[MySQL] [MHA] 03. MHA 테스트 및 Failover
■ MHA 체크 방법
먼저 MHA 설치 및 환경설정까지는 완료가 되었습니다. 이제 MHA 설정에 이상이 없는지 체크를 진행합니다.
▶ MHA 주요 명령어 확인
Script명 | 설명 |
masterha_check_ssh | SSH 접속 체크 |
masterha_manager | Manager 실행(모니터링 시작) - 장애 발생시 failover 수행됨 |
masterha_stop | Manager 중지 |
masterha_master_switch | TakeOver(relocate) 수행 |
masterha_check_repl | 복제 현황(Master/Slave 노드 정보등) |
masterha_check_status | Status 확인하기 |
위의 스크립트 수행은 mha 유저로 수행합니다. 또한 실행 역시 MHA Manager서버에서 수행합니다.
▶ 서버간 SSH 상태 확인
masterha_check_ssh 스크립트를 이용해서 서버 ssh상태를 확인합니다. 매니저 서버에서 수행합니다.
[root@mhamgr bin]# su - mha
[mha@mhamgr ~]$ masterha_check_ssh --conf=/mysqlha/conf/app1.cnf
[mha@mhamgr conf]$ masterha_check_ssh --conf=/mysqlha/conf/app1.cnf
Sun Sep 18 11:13:07 2022 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Sep 18 11:13:07 2022 - [info] Reading application default configuration from /mysqlha/conf/app1.cnf..
Sun Sep 18 11:13:07 2022 - [info] Reading server configuration from /mysqlha/conf/app1.cnf..
Sun Sep 18 11:13:07 2022 - [info] Starting SSH connection tests..
Sun Sep 18 11:13:09 2022 - [debug]
Sun Sep 18 11:13:07 2022 - [debug] Connecting via SSH from mha@db1(192.168.0.101:22) to mha@db2(192.168.0.102:22)..
Sun Sep 18 11:13:08 2022 - [debug] ok.
Sun Sep 18 11:13:08 2022 - [debug] Connecting via SSH from mha@db1(192.168.0.101:22) to mha@db3(192.168.0.103:22)..
Sun Sep 18 11:13:09 2022 - [debug] ok.
Sun Sep 18 11:13:10 2022 - [debug]
Sun Sep 18 11:13:08 2022 - [debug] Connecting via SSH from mha@db3(192.168.0.103:22) to mha@db1(192.168.0.101:22)..
Sun Sep 18 11:13:09 2022 - [debug] ok.
Sun Sep 18 11:13:09 2022 - [debug] Connecting via SSH from mha@db3(192.168.0.103:22) to mha@db2(192.168.0.102:22)..
Sun Sep 18 11:13:10 2022 - [debug] ok.
Sun Sep 18 11:13:10 2022 - [debug]
Sun Sep 18 11:13:08 2022 - [debug] Connecting via SSH from mha@db2(192.168.0.102:22) to mha@db1(192.168.0.101:22)..
Sun Sep 18 11:13:08 2022 - [debug] ok.
Sun Sep 18 11:13:08 2022 - [debug] Connecting via SSH from mha@db2(192.168.0.102:22) to mha@db3(192.168.0.103:22)..
Sun Sep 18 11:13:09 2022 - [debug] ok.
Sun Sep 18 11:13:10 2022 - [info] All SSH connection tests passed successfully.
[mha@mhamgr conf]$
▶ 서버간 복제 상태 확인
masterha_check_repl 스크립트를 이용해서 복제 상태를 확인합니다. 매니저 서버에서 수행합니다.
[root@mhamgr bin]# su - mha
[mha@mhamgr ~]$ masterha_check_repl --conf=/mysqlha/conf/app1.cnf
Sun Sep 18 11:06:55 2022 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Sep 18 11:06:55 2022 - [info] Reading application default configuration from /mysqlha/conf/app1.cnf..
Sun Sep 18 11:06:55 2022 - [info] Reading server configuration from /mysqlha/conf/app1.cnf..
Sun Sep 18 11:06:55 2022 - [info] MHA::MasterMonitor version 0.58.
Sun Sep 18 11:06:57 2022 - [info] GTID failover mode = 0
Sun Sep 18 11:06:57 2022 - [info] Dead Servers:
Sun Sep 18 11:06:57 2022 - [info] Alive Servers:
Sun Sep 18 11:06:57 2022 - [info] db1(192.168.0.101:3306)
Sun Sep 18 11:06:57 2022 - [info] db2(192.168.0.102:3306)
Sun Sep 18 11:06:57 2022 - [info] db3(192.168.0.103:3306)
Sun Sep 18 11:06:57 2022 - [info] Alive Slaves:
Sun Sep 18 11:06:57 2022 - [info] db2(192.168.0.102:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Sun Sep 18 11:06:57 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Sun Sep 18 11:06:57 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Sep 18 11:06:57 2022 - [info] db3(192.168.0.103:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Sun Sep 18 11:06:57 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Sun Sep 18 11:06:57 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Sep 18 11:06:57 2022 - [info] Current Alive Master: db1(192.168.0.101:3306)
Sun Sep 18 11:06:57 2022 - [info] Checking slave configurations..
Sun Sep 18 11:06:57 2022 - [warning] relay_log_purge=0 is not set on slave db2(192.168.0.102:3306).
Sun Sep 18 11:06:57 2022 - [warning] relay_log_purge=0 is not set on slave db3(192.168.0.103:3306).
Sun Sep 18 11:06:57 2022 - [info] Checking replication filtering settings..
Sun Sep 18 11:06:57 2022 - [info] binlog_do_db= , binlog_ignore_db=
Sun Sep 18 11:06:57 2022 - [info] Replication filtering check ok.
Sun Sep 18 11:06:57 2022 - [info] GTID (with auto-pos) is not supported
Sun Sep 18 11:06:57 2022 - [info] Starting SSH connection tests..
Sun Sep 18 11:06:59 2022 - [info] All SSH connection tests passed successfully.
Sun Sep 18 11:06:59 2022 - [info] Checking MHA Node version..
Sun Sep 18 11:07:00 2022 - [info] Version check ok.
Sun Sep 18 11:07:00 2022 - [info] Checking SSH publickey authentication settings on the current master..
Sun Sep 18 11:07:00 2022 - [info] HealthCheck: SSH to db1 is reachable.
Sun Sep 18 11:07:01 2022 - [info] Master MHA Node version is 0.58.
Sun Sep 18 11:07:01 2022 - [info] Checking recovery script configurations on db1(192.168.0.101:3306)..
Sun Sep 18 11:07:01 2022 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/logs --output_file=/mysqlha/app1/save_binary_logs_test --manager_version=0.58 --start_file=binary_log.000010
Sun Sep 18 11:07:01 2022 - [info] Connecting to mha@192.168.0.101(db1:22)..
Creating /mysqlha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /usr/local/mysql/logs, up to binary_log.000010
Sun Sep 18 11:07:01 2022 - [info] Binlog setting check done.
Sun Sep 18 11:07:01 2022 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Sep 18 11:07:01 2022 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='dbmha' --slave_host=db2 --slave_ip=192.168.0.102 --slave_port=3306 --workdir=/mysqlha/app1 --target_version=5.7.36-log --manager_version=0.58 --relay_dir=/usr/local/mysql/logs --current_relay_log=relay_log.000014 --slave_pass=xxx
Sun Sep 18 11:07:01 2022 - [info] Connecting to mha@192.168.0.102(db2:22)..
Checking slave recovery environment settings..
Relay log found at /usr/local/mysql/logs, up to relay_log.000015
Temporary relay log file is /usr/local/mysql/logs/relay_log.000015
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sun Sep 18 11:07:02 2022 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='dbmha' --slave_host=db3 --slave_ip=192.168.0.103 --slave_port=3306 --workdir=/mysqlha/app1 --target_version=5.7.36-log --manager_version=0.58 --relay_dir=/usr/local/mysql/logs --current_relay_log=relay_log.000013 --slave_pass=xxx
Sun Sep 18 11:07:02 2022 - [info] Connecting to mha@192.168.0.103(db3:22)..
Checking slave recovery environment settings..
Relay log found at /usr/local/mysql/logs, up to relay_log.000014
Temporary relay log file is /usr/local/mysql/logs/relay_log.000014
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sun Sep 18 11:07:03 2022 - [info] Slaves settings check done.
Sun Sep 18 11:07:03 2022 - [info]
db1(192.168.0.101:3306) (current master)
+--db2(192.168.0.102:3306)
+--db3(192.168.0.103:3306)
Sun Sep 18 11:07:03 2022 - [info] Checking replication health on db2..
Sun Sep 18 11:07:03 2022 - [info] ok.
Sun Sep 18 11:07:03 2022 - [info] Checking replication health on db3..
Sun Sep 18 11:07:03 2022 - [info] ok.
Sun Sep 18 11:07:03 2022 - [info] Checking master_ip_failover_script status:
Sun Sep 18 11:07:03 2022 - [info] /mysqlha/scripts/master_ip_failover --command=status --ssh_user=mha --orig_master_host=db1 --orig_master_ip=192.168.0.101 --orig_master_port=3306
Sun Sep 18 11:07:03 2022 - [info] OK.
Sun Sep 18 11:07:03 2022 - [warning] shutdown_script is not defined.
Sun Sep 18 11:07:03 2022 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
▶ 서버간 매니저 프로세스 체크
masterha_check_status 스크립트를 이용해서 프로세스 상태를 확인합니다. 매니저 서버에서 수행합니다.
[root@mhamgr ]# su - mha
• 구동되어 있을경우
[mha@mhamgr conf]$ masterha_check_status --conf=/mysqlha/conf/app1.cnf
app1 (pid:14926) is running(0:PING_OK), master:db1
[mha@mhamgr conf]$
• 구동되어 있지 않을경우
[mha@mhamgr conf]$ masterha_check_status --conf=/mysqlha/conf/app1.cnf
app1 is stopped(2:NOT_RUNNING).
[mha@mhamgr conf]$
■ MHA 구동 및 MHA 모니터링
▶ 스크립트 생성 및 구동
mha를 구동할 수 있는 스크립트를 만듭니다.
[mha@mhamgr scripts]$ cd /mysqlha/scripts/
[mha@mhamgr scripts]$ vi mha_start.sh
nohup masterha_manager \
--conf=/etc/masterha/app1.cnf \
--last_failover_minute=1 &
[mha@mhamgr scripts]$ chmod 755 mha_start.sh
[mha@mhamgr scripts]$ ./mha_start.sh
MHA가 마스터 DB 모니터링 중에 장애가 감지되어 failover 가 발생 된 이후 일정 시간내에 장애에 대해서는 failover 가 진행 되지 않습니다.
계속된 문제로 failover 가 Ping pong 형태로 반복되는 것을 고려하여 장애가 발생하여 failover 가 한번 처리 된 후 일정 시간안에는 failover가 되지 않으며 기본 값은 8시간 입니다.
이와 관련된 수행 옵션이 있으며 last_failover_minute 옵션 입니다.
포스팅에서는 반복적으로 테스트 를 위하여 1분으로 지정하고 수행하였습니다.
실제 사용시에도 장애 이후 다음 failover 시간의 gap이 8시간이 너무 크다고 생각된다면 last_failover_minute 를 통해 조절하시면 됩니다
▶ 수행후 확인
[mha@mhamgr scripts]$ ./mha_start.sh
[mha@mhamgr scripts]$ nohup: appending output to ‘nohup.out’
[mha@mhamgr scripts]$ ps -ef | grep -i mha
root 2133 1059 0 10:36 pts/0 00:00:00 su - mha
mha 2134 2133 0 10:36 pts/0 00:00:00 -bash
mha 2626 1 1 11:22 pts/0 00:00:00 perl /usr/local/bin/masterha_manager --conf=/mysqlha/conf/app1.cnf --last_failover_minute=1
mha 2671 2134 0 11:22 pts/0 00:00:00 ps -ef
mha 2672 2134 0 11:22 pts/0 00:00:00 grep --color=auto -i mha
▶ 로그 확인
[mha@mhamgr app1]$ cd /mysqlha/app1
[mha@mhamgr app1]$ ls -l
total 12
drwxr-xr-x. 2 mha mha 55 Sep 18 11:22 .
drwxr-xr-x. 5 mha mha 45 Sep 8 10:03 ..
-rw-rw-r--. 1 mha mha 5865 Sep 18 11:22 app1.log
-rw-rw-r--. 1 mha mha 25 Sep 18 11:23 app1.master_status.health
[mha@mhamgr app1]$ tail -100f app1.log
[mha@mhamgr app1]$ tail -100f app1.log
Sun Sep 18 11:22:13 2022 - [info] MHA::MasterMonitor version 0.58.
Sun Sep 18 11:22:14 2022 - [info] GTID failover mode = 0
Sun Sep 18 11:22:14 2022 - [info] Dead Servers:
Sun Sep 18 11:22:14 2022 - [info] Alive Servers:
Sun Sep 18 11:22:14 2022 - [info] db1(192.168.0.101:3306)
Sun Sep 18 11:22:14 2022 - [info] db2(192.168.0.102:3306)
Sun Sep 18 11:22:14 2022 - [info] db3(192.168.0.103:3306)
Sun Sep 18 11:22:14 2022 - [info] Alive Slaves:
Sun Sep 18 11:22:14 2022 - [info] db2(192.168.0.102:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Sun Sep 18 11:22:14 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Sun Sep 18 11:22:14 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Sep 18 11:22:14 2022 - [info] db3(192.168.0.103:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Sun Sep 18 11:22:14 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Sun Sep 18 11:22:14 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Sep 18 11:22:14 2022 - [info] Current Alive Master: db1(192.168.0.101:3306)
Sun Sep 18 11:22:14 2022 - [info] Checking slave configurations..
Sun Sep 18 11:22:14 2022 - [info] Checking replication filtering settings..
Sun Sep 18 11:22:14 2022 - [info] binlog_do_db= , binlog_ignore_db=
Sun Sep 18 11:22:14 2022 - [info] Replication filtering check ok.
Sun Sep 18 11:22:14 2022 - [info] GTID (with auto-pos) is not supported
Sun Sep 18 11:22:14 2022 - [info] Starting SSH connection tests..
Sun Sep 18 11:22:17 2022 - [info] All SSH connection tests passed successfully.
Sun Sep 18 11:22:17 2022 - [info] Checking MHA Node version..
Sun Sep 18 11:22:18 2022 - [info] Version check ok.
Sun Sep 18 11:22:18 2022 - [info] Checking SSH publickey authentication settings on the current master..
Sun Sep 18 11:22:18 2022 - [info] HealthCheck: SSH to db1 is reachable.
Sun Sep 18 11:22:18 2022 - [info] Master MHA Node version is 0.58.
Sun Sep 18 11:22:18 2022 - [info] Checking recovery script configurations on db1(192.168.0.101:3306)..
Sun Sep 18 11:22:18 2022 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/logs --output_file=/mysqlha/app1/save_binary_logs_test --manager_version=0.58 --start_file=binary_log.000010
Sun Sep 18 11:22:18 2022 - [info] Connecting to mha@192.168.0.101(db1:22)..
Creating /mysqlha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /usr/local/mysql/logs, up to binary_log.000010
Sun Sep 18 11:22:19 2022 - [info] Binlog setting check done.
Sun Sep 18 11:22:19 2022 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Sep 18 11:22:19 2022 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='dbmha' --slave_host=db2 --slave_ip=192.168.0.102 --slave_port=3306 --workdir=/mysqlha/app1 --target_version=5.7.36-log --manager_version=0.58 --relay_dir=/usr/local/mysql/logs --current_relay_log=relay_log.000014 --slave_pass=xxx
Sun Sep 18 11:22:19 2022 - [info] Connecting to mha@192.168.0.102(db2:22)..
Checking slave recovery environment settings..
Relay log found at /usr/local/mysql/logs, up to relay_log.000015
Temporary relay log file is /usr/local/mysql/logs/relay_log.000015
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sun Sep 18 11:22:19 2022 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='dbmha' --slave_host=db3 --slave_ip=192.168.0.103 --slave_port=3306 --workdir=/mysqlha/app1 --target_version=5.7.36-log --manager_version=0.58 --relay_dir=/usr/local/mysql/logs --current_relay_log=relay_log.000013 --slave_pass=xxx
Sun Sep 18 11:22:19 2022 - [info] Connecting to mha@192.168.0.103(db3:22)..
Checking slave recovery environment settings..
Relay log found at /usr/local/mysql/logs, up to relay_log.000014
Temporary relay log file is /usr/local/mysql/logs/relay_log.000014
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sun Sep 18 11:22:20 2022 - [info] Slaves settings check done.
Sun Sep 18 11:22:20 2022 - [info]
db1(192.168.0.101:3306) (current master)
+--db2(192.168.0.102:3306)
+--db3(192.168.0.103:3306)
Sun Sep 18 11:22:20 2022 - [info] Checking master_ip_failover_script status:
Sun Sep 18 11:22:20 2022 - [info] /mysqlha/scripts/master_ip_failover --command=status --ssh_user=mha --orig_master_host=db1 --orig_master_ip=192.168.0.101 --orig_master_port=3306
Sun Sep 18 11:22:20 2022 - [info] OK.
Sun Sep 18 11:22:20 2022 - [warning] shutdown_script is not defined.
Sun Sep 18 11:22:20 2022 - [info] Set master ping interval 3 seconds.
Sun Sep 18 11:22:20 2022 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s db1 -s db2 -s db3 --user=mha --master_host=db1 --master_ip=db1 --master_port=3306
Sun Sep 18 11:22:20 2022 - [info] Starting ping health check on db1(192.168.0.101:3306)..
Sun Sep 18 11:22:20 2022 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
▶ 가상 IP(VIP)기동
마스터 서버에서 VIP를 기동시킵니다.
[root@db1 ~]# su - mha
[mha@db1 ~]$ sudo ifconfig ens33:0 192.168.0.105 netmask 255.255.255.0 up
가상IP가 할당된것을 확인해봅니다.
[mha@db1 ~]$ ifconfig -a
......
ens33:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.0.105 netmask 255.255.255.0 broadcast 192.168.0.255
ether 00:50:56:22:e0:54 txqueuelen 1000 (Ethernet)
......
■ 장애 발생
장애를 발생시켜 Failover 순서를 알아봅니다.
▶ 마스터 MySQL DB 중지
• 로그 tail
실시간 로그를 확인합니다.
[mha@mhamgr ]$ cd /mysqlha/app1
[mha@mhamgr app1]$ tail -f app1.log
• MySQL DB 중지
마스터 서버에서 아래 2가지중 자신에게 맞는 방법으로 DB를 중지시킵니다.
1. [root@db1 ~]# /etc/init.d/mysqld stop
2. [root@db1 ~]# systemctl stop mysqld
3. Log 확인
Tue Sep 20 09:52:37 2022 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Tue Sep 20 09:52:37 2022 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s db1 -s db2 -s db3 --user=mha --master_host=db1 --master_ip=db1 --master_port=3306 --user=mha --master_host=db1 --master_ip=192.168.0.101 --master_port=3306 --master_user=dbmha --master_password=admin1234 --ping_type=SELECT
Tue Sep 20 09:52:37 2022 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/logs --output_file=/mysqlha/app1/save_binary_logs_test --manager_version=0.58 --binlog_prefix=binary_log
Monitoring server db1 is reachable, Master is not reachable from db1. OK.
Tue Sep 20 09:52:38 2022 - [info] HealthCheck: SSH to db1 is reachable.
Monitoring server db2 is reachable, Master is not reachable from db2. OK.
Monitoring server db3 is reachable, Master is not reachable from db3. OK.
Tue Sep 20 09:52:39 2022 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Tue Sep 20 09:52:40 2022 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.101' (111))
Tue Sep 20 09:52:40 2022 - [warning] Connection failed 2 time(s)..
Tue Sep 20 09:52:43 2022 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.101' (111))
Tue Sep 20 09:52:43 2022 - [warning] Connection failed 3 time(s)..
Tue Sep 20 09:52:46 2022 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.0.101' (111))
Tue Sep 20 09:52:46 2022 - [warning] Connection failed 4 time(s)..
Tue Sep 20 09:52:46 2022 - [warning] Master is not reachable from health checker!
Tue Sep 20 09:52:46 2022 - [warning] Master db1(192.168.0.101:3306) is not reachable!
Tue Sep 20 09:52:46 2022 - [warning] SSH is reachable.
Tue Sep 20 09:52:46 2022 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /mysqlha/conf/app1.cnf again, and trying to connect to all servers to check server status..
Tue Sep 20 09:52:46 2022 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Sep 20 09:52:46 2022 - [info] Reading application default configuration from /mysqlha/conf/app1.cnf..
Tue Sep 20 09:52:46 2022 - [info] Reading server configuration from /mysqlha/conf/app1.cnf..
Tue Sep 20 09:52:47 2022 - [info] GTID failover mode = 0
Tue Sep 20 09:52:47 2022 - [info] Dead Servers:
Tue Sep 20 09:52:47 2022 - [info] db1(192.168.0.101:3306)
Tue Sep 20 09:52:47 2022 - [info] Alive Servers:
Tue Sep 20 09:52:47 2022 - [info] db2(192.168.0.102:3306)
Tue Sep 20 09:52:47 2022 - [info] db3(192.168.0.103:3306)
Tue Sep 20 09:52:47 2022 - [info] Alive Slaves:
Tue Sep 20 09:52:47 2022 - [info] db2(192.168.0.102:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Tue Sep 20 09:52:47 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Tue Sep 20 09:52:47 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Sep 20 09:52:47 2022 - [info] db3(192.168.0.103:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Tue Sep 20 09:52:47 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Tue Sep 20 09:52:47 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Sep 20 09:52:47 2022 - [info] Checking slave configurations..
Tue Sep 20 09:52:47 2022 - [info] Checking replication filtering settings..
Tue Sep 20 09:52:47 2022 - [info] Replication filtering check ok.
Tue Sep 20 09:52:47 2022 - [info] Master is down!
Tue Sep 20 09:52:47 2022 - [info] Terminating monitoring script.
Tue Sep 20 09:52:47 2022 - [info] Got exit code 20 (Master dead).
Tue Sep 20 09:52:47 2022 - [info] MHA::MasterFailover version 0.58.
Tue Sep 20 09:52:47 2022 - [info] Starting master failover.
Tue Sep 20 09:52:47 2022 - [info]
Tue Sep 20 09:52:47 2022 - [info] * Phase 1: Configuration Check Phase..
Tue Sep 20 09:52:47 2022 - [info]
Tue Sep 20 09:52:49 2022 - [info] GTID failover mode = 0
Tue Sep 20 09:52:49 2022 - [info] Dead Servers:
Tue Sep 20 09:52:49 2022 - [info] db1(192.168.0.101:3306)
Tue Sep 20 09:52:49 2022 - [info] Checking master reachability via MySQL(double check)...
Tue Sep 20 09:52:49 2022 - [info] ok.
Tue Sep 20 09:52:49 2022 - [info] Alive Servers:
Tue Sep 20 09:52:49 2022 - [info] db2(192.168.0.102:3306)
Tue Sep 20 09:52:49 2022 - [info] db3(192.168.0.103:3306)
Tue Sep 20 09:52:49 2022 - [info] Alive Slaves:
Tue Sep 20 09:52:49 2022 - [info] db2(192.168.0.102:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Tue Sep 20 09:52:49 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Tue Sep 20 09:52:49 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Sep 20 09:52:49 2022 - [info] db3(192.168.0.103:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Tue Sep 20 09:52:49 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Tue Sep 20 09:52:49 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Sep 20 09:52:49 2022 - [info] Starting Non-GTID based failover.
Tue Sep 20 09:52:49 2022 - [info]
Tue Sep 20 09:52:49 2022 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Sep 20 09:52:49 2022 - [info]
Tue Sep 20 09:52:49 2022 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Sep 20 09:52:49 2022 - [info]
Tue Sep 20 09:52:49 2022 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Sep 20 09:52:49 2022 - [info] Executing master IP deactivation script:
Tue Sep 20 09:52:49 2022 - [info] /mysqlha/scripts/master_ip_failover --orig_master_host=db1 --orig_master_ip=192.168.0.101 --orig_master_port=3306 --command=stopssh --ssh_user=mha
Tue Sep 20 09:52:49 2022 - [info] done.
Tue Sep 20 09:52:49 2022 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Sep 20 09:52:49 2022 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Sep 20 09:52:49 2022 - [info]
Tue Sep 20 09:52:49 2022 - [info] * Phase 3: Master Recovery Phase..
Tue Sep 20 09:52:49 2022 - [info]
Tue Sep 20 09:52:49 2022 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Sep 20 09:52:49 2022 - [info]
Tue Sep 20 09:52:49 2022 - [info] The latest binary log file/position on all slaves is binary_log.000002:2376029
Tue Sep 20 09:52:49 2022 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Sep 20 09:52:49 2022 - [info] db2(192.168.0.102:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Tue Sep 20 09:52:49 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Tue Sep 20 09:52:49 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Sep 20 09:52:49 2022 - [info] db3(192.168.0.103:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Tue Sep 20 09:52:49 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Tue Sep 20 09:52:49 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Sep 20 09:52:49 2022 - [info] The oldest binary log file/position on all slaves is binary_log.000002:2376029
Tue Sep 20 09:52:49 2022 - [info] Oldest slaves:
Tue Sep 20 09:52:49 2022 - [info] db2(192.168.0.102:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Tue Sep 20 09:52:49 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Tue Sep 20 09:52:49 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Sep 20 09:52:49 2022 - [info] db3(192.168.0.103:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Tue Sep 20 09:52:49 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Tue Sep 20 09:52:49 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Sep 20 09:52:49 2022 - [info]
Tue Sep 20 09:52:49 2022 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Sep 20 09:52:49 2022 - [info]
Tue Sep 20 09:52:49 2022 - [info] Fetching dead master's binary logs..
Tue Sep 20 09:52:49 2022 - [info] Executing command on the dead master db1(192.168.0.101:3306): save_binary_logs --command=save --start_file=binary_log.000002 --start_pos=2376029 --binlog_dir=/usr/local/mysql/logs --output_file=/mysqlha/app1/saved_master_binlog_from_db1_3306_20220920095247.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58
Creating /mysqlha/app1 if not exists.. ok.
Concat binary/relay logs from binary_log.000002 pos 2376029 to binary_log.000002 EOF into /mysqlha/app1/saved_master_binlog_from_db1_3306_20220920095247.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 154.. ok.
Dumping effective binlog data from /usr/local/mysql/logs/binary_log.000002 position 2376029 to tail(2376052).. ok.
Binlog Checksum enabled
Concat succeeded.
Tue Sep 20 09:52:50 2022 - [info] scp from mha@192.168.0.101:/mysqlha/app1/saved_master_binlog_from_db1_3306_20220920095247.binlog to local:/mysqlha/app1/saved_master_binlog_from_db1_3306_20220920095247.binlog succeeded.
Tue Sep 20 09:52:50 2022 - [info] HealthCheck: SSH to db2 is reachable.
Tue Sep 20 09:52:51 2022 - [info] HealthCheck: SSH to db3 is reachable.
Tue Sep 20 09:52:52 2022 - [info]
Tue Sep 20 09:52:52 2022 - [info] * Phase 3.3: Determining New Master Phase..
Tue Sep 20 09:52:52 2022 - [info]
Tue Sep 20 09:52:52 2022 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Sep 20 09:52:52 2022 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Sep 20 09:52:52 2022 - [info] Searching new master from slaves..
Tue Sep 20 09:52:52 2022 - [info] Candidate masters from the configuration file:
Tue Sep 20 09:52:52 2022 - [info] db2(192.168.0.102:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Tue Sep 20 09:52:52 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Tue Sep 20 09:52:52 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Sep 20 09:52:52 2022 - [info] db3(192.168.0.103:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Tue Sep 20 09:52:52 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Tue Sep 20 09:52:52 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Sep 20 09:52:52 2022 - [info] Non-candidate masters:
Tue Sep 20 09:52:52 2022 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Tue Sep 20 09:52:52 2022 - [info] New master is db2(192.168.0.102:3306)
Tue Sep 20 09:52:52 2022 - [info] Starting master failover..
Tue Sep 20 09:52:52 2022 - [info]
From:
db1(192.168.0.101:3306) (current master)
+--db2(192.168.0.102:3306)
+--db3(192.168.0.103:3306)
To:
db2(192.168.0.102:3306) (new master)
+--db3(192.168.0.103:3306)
Tue Sep 20 09:52:52 2022 - [info]
Tue Sep 20 09:52:52 2022 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Tue Sep 20 09:52:52 2022 - [info]
Tue Sep 20 09:52:52 2022 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Sep 20 09:52:52 2022 - [info] Sending binlog..
Tue Sep 20 09:52:52 2022 - [info] scp from local:/mysqlha/app1/saved_master_binlog_from_db1_3306_20220920095247.binlog to mha@db2:/mysqlha/app1/saved_master_binlog_from_db1_3306_20220920095247.binlog succeeded.
Tue Sep 20 09:52:52 2022 - [info]
Tue Sep 20 09:52:52 2022 - [info] * Phase 3.5: Master Log Apply Phase..
Tue Sep 20 09:52:52 2022 - [info]
Tue Sep 20 09:52:52 2022 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Sep 20 09:52:52 2022 - [info] Starting recovery on db2(192.168.0.102:3306)..
Tue Sep 20 09:52:52 2022 - [info] Generating diffs succeeded.
Tue Sep 20 09:52:52 2022 - [info] Waiting until all relay logs are applied.
Tue Sep 20 09:52:52 2022 - [info] done.
Tue Sep 20 09:52:52 2022 - [info] Getting slave status..
Tue Sep 20 09:52:52 2022 - [info] This slave(db2)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(binary_log.000002:2376029). No need to recover from Exec_Master_Log_Pos.
Tue Sep 20 09:52:52 2022 - [info] Connecting to the target slave host db2, running recover script..
Tue Sep 20 09:52:52 2022 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dbmha' --slave_host=db2 --slave_ip=192.168.0.102 --slave_port=3306 --apply_files=/mysqlha/app1/saved_master_binlog_from_db1_3306_20220920095247.binlog --workdir=/mysqlha/app1 --target_version=5.7.36-log --timestamp=20220920095247 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58 --slave_pass=xxx
Tue Sep 20 09:52:53 2022 - [info]
MySQL client version is 5.7.36. Using --binary-mode.
Applying differential binary/relay log files /mysqlha/app1/saved_master_binlog_from_db1_3306_20220920095247.binlog on db2:3306. This may take long time...
Applying log files succeeded.
Tue Sep 20 09:52:53 2022 - [info] All relay logs were successfully applied.
Tue Sep 20 09:52:53 2022 - [info] Getting new master's binlog name and position..
Tue Sep 20 09:52:53 2022 - [info] binary_log.000003:2372711
Tue Sep 20 09:52:53 2022 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='db2 or 192.168.0.102', MASTER_PORT=3306, MASTER_LOG_FILE='binary_log.000003', MASTER_LOG_POS=2372711, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Sep 20 09:52:53 2022 - [info] Executing master IP activate script:
Tue Sep 20 09:52:53 2022 - [info] /mysqlha/scripts/master_ip_failover --command=start --ssh_user=mha --orig_master_host=db1 --orig_master_ip=192.168.0.101 --orig_master_port=3306 --new_master_host=db2 --new_master_ip=192.168.0.102 --new_master_port=3306 --new_master_user='dbmha' --new_master_password=xxx
Set read_only=0 on the new master.
VIP IS Alive, VIP Relocate db2
bind: Cannot assign requested address
Tue Sep 20 09:52:54 2022 - [info] OK.
Tue Sep 20 09:52:54 2022 - [info] ** Finished master recovery successfully.
Tue Sep 20 09:52:54 2022 - [info] * Phase 3: Master Recovery Phase completed.
Tue Sep 20 09:52:54 2022 - [info]
Tue Sep 20 09:52:54 2022 - [info] * Phase 4: Slaves Recovery Phase..
Tue Sep 20 09:52:54 2022 - [info]
Tue Sep 20 09:52:54 2022 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Sep 20 09:52:54 2022 - [info]
Tue Sep 20 09:52:54 2022 - [info] -- Slave diff file generation on host db3(192.168.0.103:3306) started, pid: 2730. Check tmp log /mysqlha/app1/db3_3306_20220920095247.log if it takes time..
Tue Sep 20 09:52:55 2022 - [info]
Tue Sep 20 09:52:55 2022 - [info] Log messages from db3 ...
Tue Sep 20 09:52:55 2022 - [info]
Tue Sep 20 09:52:54 2022 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Sep 20 09:52:55 2022 - [info] End of log messages from db3.
Tue Sep 20 09:52:55 2022 - [info] -- db3(192.168.0.103:3306) has the latest relay log events.
Tue Sep 20 09:52:55 2022 - [info] Generating relay diff files from the latest slave succeeded.
Tue Sep 20 09:52:55 2022 - [info]
Tue Sep 20 09:52:55 2022 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Sep 20 09:52:55 2022 - [info]
Tue Sep 20 09:52:55 2022 - [info] -- Slave recovery on host db3(192.168.0.103:3306) started, pid: 2732. Check tmp log /mysqlha/app1/db3_3306_20220920095247.log if it takes time..
Tue Sep 20 09:52:57 2022 - [info]
Tue Sep 20 09:52:57 2022 - [info] Log messages from db3 ...
Tue Sep 20 09:52:57 2022 - [info]
Tue Sep 20 09:52:55 2022 - [info] Sending binlog..
Tue Sep 20 09:52:55 2022 - [info] scp from local:/mysqlha/app1/saved_master_binlog_from_db1_3306_20220920095247.binlog to mha@db3:/mysqlha/app1/saved_master_binlog_from_db1_3306_20220920095247.binlog succeeded.
Tue Sep 20 09:52:55 2022 - [info] Starting recovery on db3(192.168.0.103:3306)..
Tue Sep 20 09:52:55 2022 - [info] Generating diffs succeeded.
Tue Sep 20 09:52:55 2022 - [info] Waiting until all relay logs are applied.
Tue Sep 20 09:52:55 2022 - [info] done.
Tue Sep 20 09:52:55 2022 - [info] Getting slave status..
Tue Sep 20 09:52:55 2022 - [info] This slave(db3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(binary_log.000002:2376029). No need to recover from Exec_Master_Log_Pos.
Tue Sep 20 09:52:55 2022 - [info] Connecting to the target slave host db3, running recover script..
Tue Sep 20 09:52:55 2022 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dbmha' --slave_host=db3 --slave_ip=192.168.0.103 --slave_port=3306 --apply_files=/mysqlha/app1/saved_master_binlog_from_db1_3306_20220920095247.binlog --workdir=/mysqlha/app1 --target_version=5.7.36-log --timestamp=20220920095247 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58 --slave_pass=xxx
Tue Sep 20 09:52:56 2022 - [info]
MySQL client version is 5.7.36. Using --binary-mode.
Applying differential binary/relay log files /mysqlha/app1/saved_master_binlog_from_db1_3306_20220920095247.binlog on db3:3306. This may take long time...
Applying log files succeeded.
Tue Sep 20 09:52:56 2022 - [info] All relay logs were successfully applied.
Tue Sep 20 09:52:56 2022 - [info] Resetting slave db3(192.168.0.103:3306) and starting replication from the new master db2(192.168.0.102:3306)..
Tue Sep 20 09:52:56 2022 - [info] Executed CHANGE MASTER.
Tue Sep 20 09:52:56 2022 - [info] Slave started.
Tue Sep 20 09:52:57 2022 - [info] End of log messages from db3.
Tue Sep 20 09:52:57 2022 - [info] -- Slave recovery on host db3(192.168.0.103:3306) succeeded.
Tue Sep 20 09:52:57 2022 - [info] All new slave servers recovered successfully.
Tue Sep 20 09:52:57 2022 - [info]
Tue Sep 20 09:52:57 2022 - [info] * Phase 5: New master cleanup phase..
Tue Sep 20 09:52:57 2022 - [info]
Tue Sep 20 09:52:57 2022 - [info] Resetting slave info on the new master..
Tue Sep 20 09:52:57 2022 - [info] db2: Resetting slave info succeeded.
Tue Sep 20 09:52:57 2022 - [info] Master failover to db2(192.168.0.102:3306) completed successfully.
Tue Sep 20 09:52:57 2022 - [info]
----- Failover Report -----
app1: MySQL Master failover db1(192.168.0.101:3306) to db2(192.168.0.102:3306) succeeded
Master db1(192.168.0.101:3306) is down!
Check MHA Manager logs at mhamgr:/mysqlha/app1/app1.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on db1(192.168.0.101:3306)
The latest slave db2(192.168.0.102:3306) has all relay logs for recovery.
Selected db2(192.168.0.102:3306) as a new master.
db2(192.168.0.102:3306): OK: Applying all logs succeeded.
db2(192.168.0.102:3306): OK: Activated master IP address.
db3(192.168.0.103:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
db3(192.168.0.103:3306): OK: Applying all logs succeeded. Slave started, replicating from db2(192.168.0.102:3306)
db2(192.168.0.102:3306): Resetting slave info succeeded.
Master failover to db2(192.168.0.102:3306) completed successfully.
맨 아래 즈음에 있는 Failover Report 로그를 확인해보면...
Master db1(192.168.0.101:3306) is down!
: 마스터서버가 다운되었습니다.
Selected db2(192.168.0.102:3306) as a new master
: db2가 새로운 마스터가 되었습니다.
db3(192.168.0.103:3306): OK: Applying all logs succeeded. Slave started, replicating from db2(192.168.0.102:3306)
db2(192.168.0.102:3306): Resetting slave info succeeded
: db3가 신규 마스터인 db2로 복제가 걸렸으며 성공정으로 이동되었습니다.
그리고 mha서버를 확인해보면 모니터링 프로세스가 죽어있는것을 확인해볼 수 있습니다. 위의 로그중에 다음곽 같은 내용을 확인해볼 수 있습니다.
Tue Sep 20 09:52:47 2022 - [info] Terminating monitoring script.
[mha@mhamgr app1]$ ps -ef | grep -i mha
root 2410 2393 0 09:36 pts/0 00:00:00 su - mha
mha 2411 2410 0 09:36 pts/0 00:00:00 -bash
mha 2744 2411 0 09:58 pts/0 00:00:00 ps -ef
mha 2745 2411 0 09:58 pts/0 00:00:00 grep --color=auto -i mha
죽은 Master DB인 db1을 신규 마스터인 db2에 슬레이브 ROLE로 복제를 수행합니다.
▶ 신규 마스터로 복제 구성
종료된 db1 MySQL을 구동합니다.
아래 명령어중 알맞는 명령어로 구동을 시킵니다.
1. [root@db1 mysql]# /etc/init.d/mysqld start
2. [root@db1 mysql]# systemctl start mysqld
• 마스터 로그 포지션 로그를 확인합니다.
위에 Failover 로그 내용중 다음 내용으로 검색을 합니다.
All other slaves should start replication from here
그럼 아래와 같은 내용이 나옵니다.
Tue Sep 20 09:52:53 2022 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='db2 or 192.168.0.102', MASTER_PORT=3306, MASTER_LOG_FILE='binary_log.000003', MASTER_LOG_POS=2372711, MASTER_USER='repl', MASTER_PASSWORD='xxx';
이 정보를 이용해서 db1에서 change master를 실행합니다.
mysql> CHANGE MASTER TO MASTER_HOST='192.168.0.102', MASTER_PORT=3306, MASTER_LOG_FILE='binary_log.000003', MASTER_LOG_POS=2372711, MASTER_USER='repl', MASTER_PASSWORD='admin1234';
mysql> show slave statust\G;
......
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
......
위 두 항목이 YES인지 확인합니다.
복제 확인이 완료되었다면 기존 로그들을 삭제 후 mha 매니저를 다시 띄워 모니터링을 진행합니다.
[mha@mhamgr ~]$ cd /mysqlha/app1
[mha@mhamgr app1]$ rm -rf *
[mha@mhamgr app1]$ cd /mysqlha/app1
[mha@mhamgr app1]$ cd /mysqlha/scripts
[mha@mhamgr scripts]$ ./mha_start.sh
가상 IP를 올립니다.
[mha@db2 ~]$ sudo ifconfig ens33:0 192.168.0.105 netmask 255.255.255.0 up
▶ 사용자 정의 Takeover
서버에 작업이 필요하여 MySQL의 Role 역활을 변경해야 할 일이 있습니다. 이때 임의적으로 Takeover를 시킬 수 있습니다.
중요한점은 takeover시 매니저 서버의 모니터링을 중지해야 합니다.
• 모니터링 매니저 종료
[mha@mhamgr app1]$ masterha_stop --conf=/mysqlha/conf/app1.cnf
Stopped app1 successfully.
[mha@mhamgr scripts]$ ps -ef | grep -i mha
root 1489 1388 0 09:19 pts/0 00:00:00 su - mha
mha 1490 1489 0 09:19 pts/0 00:00:00 -bash
mha 1868 1490 0 10:03 pts/0 00:00:00 ps -ef
mha 1869 1490 0 10:03 pts/0 00:00:00 grep --color=auto -i mha
• 매니저 프로세스 확인
[mha@mhamgr scripts]$ masterha_check_status --conf=/mysqlha/conf/app1.cnf
app1 is stopped(2:NOT_RUNNING).
• Takeover 수행
현재 마스터가 db2인데 이것을 원래 마스터인 db1으로 Takeover
[mha@mhamgr scripts]$ masterha_master_switch --master_state=alive \
--conf=/mysqlha/conf/app1.cnf --new_master_host=db1 \
--interactive=0
Fri Sep 30 11:07:55 2022 - [info] MHA::MasterRotate version 0.58.
Fri Sep 30 11:07:55 2022 - [info] Starting online master switch..
Fri Sep 30 11:07:55 2022 - [info]
Fri Sep 30 11:07:55 2022 - [info] * Phase 1: Configuration Check Phase..
Fri Sep 30 11:07:55 2022 - [info]
Fri Sep 30 11:07:55 2022 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Sep 30 11:07:55 2022 - [info] Reading application default configuration from /mysqlha/conf/app1.cnf..
Fri Sep 30 11:07:55 2022 - [info] Reading server configuration from /mysqlha/conf/app1.cnf..
Fri Sep 30 11:07:56 2022 - [info] GTID failover mode = 0
Fri Sep 30 11:07:56 2022 - [info] Current Alive Master: db2(192.168.0.102:3306)
Fri Sep 30 11:07:56 2022 - [info] Alive Slaves:
Fri Sep 30 11:07:56 2022 - [info] db1(192.168.0.101:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Fri Sep 30 11:07:56 2022 - [info] Replicating from 192.168.0.102(192.168.0.102:3306)
Fri Sep 30 11:07:56 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Sep 30 11:07:56 2022 - [info] db3(192.168.0.103:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Fri Sep 30 11:07:56 2022 - [info] Replicating from 192.168.0.102(192.168.0.102:3306)
Fri Sep 30 11:07:56 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Sep 30 11:07:56 2022 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Fri Sep 30 11:07:56 2022 - [info] ok.
Fri Sep 30 11:07:56 2022 - [info] Checking MHA is not monitoring or doing failover..
Fri Sep 30 11:07:56 2022 - [info] Checking replication health on db1..
Fri Sep 30 11:07:56 2022 - [info] ok.
Fri Sep 30 11:07:56 2022 - [info] Checking replication health on db3..
Fri Sep 30 11:07:56 2022 - [info] ok.
Fri Sep 30 11:07:56 2022 - [info] db1 can be new master.
Fri Sep 30 11:07:56 2022 - [info]
From:
db2(192.168.0.102:3306) (current master)
+--db1(192.168.0.101:3306)
+--db3(192.168.0.103:3306)
To:
db1(192.168.0.101:3306) (new master)
+--db3(192.168.0.103:3306)
Fri Sep 30 11:07:56 2022 - [info] Checking whether db1(192.168.0.101:3306) is ok for the new master..
Fri Sep 30 11:07:56 2022 - [info] ok.
Fri Sep 30 11:07:56 2022 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Sep 30 11:07:56 2022 - [info]
Fri Sep 30 11:07:56 2022 - [info] * Phase 2: Rejecting updates Phase..
Fri Sep 30 11:07:56 2022 - [info]
Fri Sep 30 11:07:56 2022 - [info] Executing master ip online change script to disable write on the current master:
Fri Sep 30 11:07:56 2022 - [info] /mysqlha/scripts/master_ip_online_change --command=stop --orig_master_host=db2 --orig_master_ip=192.168.0.102 --orig_master_port=3306 --orig_master_user='dbmha' --new_master_host=db1 --new_master_ip=192.168.0.101 --new_master_port=3306 --new_master_user='dbmha' --orig_master_ssh_user=mha --new_master_ssh_user=mha --orig_master_password=xxx --new_master_password=xxx
Fri Sep 30 11:07:56 2022 454436 Set read_only on the new master.. ok.
Fri Sep 30 11:07:56 2022 459273 Waiting all running 1 threads are disconnected.. (max 1500 milliseconds)
{'Time' => '6631','db' => undef,'Id' => '1','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}
Fri Sep 30 11:07:56 2022 963267 Waiting all running 1 threads are disconnected.. (max 1000 milliseconds)
{'Time' => '6631','db' => undef,'Id' => '1','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}
Fri Sep 30 11:07:57 2022 466718 Waiting all running 1 threads are disconnected.. (max 500 milliseconds)
{'Time' => '6632','db' => undef,'Id' => '1','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}
Fri Sep 30 11:07:57 2022 970189 Set read_only=1 on the orig master.. ok.
Fri Sep 30 11:07:57 2022 972187 Waiting all running 1 queries are disconnected.. (max 500 milliseconds)
{'Time' => '6632','db' => undef,'Id' => '1','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}
Fri Sep 30 11:07:58 2022 473616 Killing all application threads..
Fri Sep 30 11:07:58 2022 474206 done.
Fri Sep 30 11:07:58 2022 - [info] ok.
Fri Sep 30 11:07:58 2022 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Fri Sep 30 11:07:58 2022 - [info] Executing FLUSH TABLES WITH READ LOCK..
Fri Sep 30 11:07:58 2022 - [info] ok.
Fri Sep 30 11:07:58 2022 - [info] Orig master binlog:pos is binary_log.000001:2371352.
Fri Sep 30 11:07:58 2022 - [info] Waiting to execute all relay logs on db1(192.168.0.101:3306)..
Fri Sep 30 11:07:58 2022 - [info] master_pos_wait(binary_log.000001:2371352) completed on db1(192.168.0.101:3306). Executed 0 events.
Fri Sep 30 11:07:58 2022 - [info] done.
Fri Sep 30 11:07:58 2022 - [info] Getting new master's binlog name and position..
Fri Sep 30 11:07:58 2022 - [info] binary_log.000005:154
Fri Sep 30 11:07:58 2022 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='db1 or 192.168.0.101', MASTER_PORT=3306, MASTER_LOG_FILE='binary_log.000005', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Sep 30 11:07:58 2022 - [info] Executing master ip online change script to allow write on the new master:
Fri Sep 30 11:07:58 2022 - [info] /mysqlha/scripts/master_ip_online_change --command=start --orig_master_host=db2 --orig_master_ip=192.168.0.102 --orig_master_port=3306 --orig_master_user='dbmha' --new_master_host=db1 --new_master_ip=192.168.0.101 --new_master_port=3306 --new_master_user='dbmha' --orig_master_ssh_user=mha --new_master_ssh_user=mha --orig_master_password=xxx --new_master_password=xxx
Fri Sep 30 11:07:58 2022 569841 Set read_only=0 on the new master.
VIP IS Alive, VIP Relocate db1
packet_write_wait: Connection to 192.168.0.105 port 22: Broken pipe
ARPING 192.168.0.105 from 192.168.0.105 ens33
Sent 5 probes (5 broadcast(s))
Received 0 response(s)
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
Fri Sep 30 11:08:04 2022 - [info] ok.
Fri Sep 30 11:08:04 2022 - [info]
Fri Sep 30 11:08:04 2022 - [info] * Switching slaves in parallel..
Fri Sep 30 11:08:04 2022 - [info]
Fri Sep 30 11:08:04 2022 - [info] -- Slave switch on host db3(192.168.0.103:3306) started, pid: 2396
Fri Sep 30 11:08:04 2022 - [info]
Fri Sep 30 11:08:05 2022 - [info] Log messages from db3 ...
Fri Sep 30 11:08:05 2022 - [info]
Fri Sep 30 11:08:04 2022 - [info] Waiting to execute all relay logs on db3(192.168.0.103:3306)..
Fri Sep 30 11:08:04 2022 - [info] master_pos_wait(binary_log.000001:2371352) completed on db3(192.168.0.103:3306). Executed 0 events.
Fri Sep 30 11:08:04 2022 - [info] done.
Fri Sep 30 11:08:04 2022 - [info] Resetting slave db3(192.168.0.103:3306) and starting replication from the new master db1(192.168.0.101:3306)..
Fri Sep 30 11:08:04 2022 - [info] Executed CHANGE MASTER.
Fri Sep 30 11:08:04 2022 - [info] Slave started.
Fri Sep 30 11:08:05 2022 - [info] End of log messages from db3 ...
Fri Sep 30 11:08:05 2022 - [info]
Fri Sep 30 11:08:05 2022 - [info] -- Slave switch on host db3(192.168.0.103:3306) succeeded.
Fri Sep 30 11:08:05 2022 - [info] Unlocking all tables on the orig master:
Fri Sep 30 11:08:05 2022 - [info] Executing UNLOCK TABLES..
Fri Sep 30 11:08:05 2022 - [info] ok.
Fri Sep 30 11:08:05 2022 - [info] All new slave servers switched successfully.
Fri Sep 30 11:08:05 2022 - [info]
Fri Sep 30 11:08:05 2022 - [info] * Phase 5: New master cleanup phase..
Fri Sep 30 11:08:05 2022 - [info]
Fri Sep 30 11:08:05 2022 - [info] db1: Resetting slave info succeeded.
Fri Sep 30 11:08:05 2022 - [info] Switching master to db1(192.168.0.101:3306) completed successfully.
이제 1번 서버가 마스터가 되며 3번 슬레이브는 db1으로 다시 바라보게 됩니다.
마찬가지로 이제 db2번도 db1으로 복제를 걸어야 합니다.
위에서 확인한 "All other slaves should start replication from here" 항목을 app1.log에서 검색하여 똑같이 실행합니다.
All other slaves should start replication from here.
Statement should be:
CHANGE MASTER TO MASTER_HOST='db1 or 192.168.0.101',
MASTER_PORT=3306, MASTER_LOG_FILE='binary_log.000005',
MASTER_LOG_POS=154, MASTER_USER='repl',
MASTER_PASSWORD='xxx';
위의 로그를 참조하여 CHANGE MASTER를 수행 합니다.
mysql> CHANGE MASTER TO MASTER_HOST='192.168.0.101',
-> MASTER_PORT=3306, MASTER_LOG_FILE='binary_log.000005',
-> MASTER_LOG_POS=154, MASTER_USER='repl',
-> MASTER_PASSWORD='admin1234';
Query OK, 0 rows affected, 2 warnings (0.00 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.0.101
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binary_log.000005
Read_Master_Log_Pos: 154
Relay_Log_File: relay_log.000002
Relay_Log_Pos: 321
Relay_Master_Log_File: binary_log.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
......
ERROR:
No query specified
mysql>
다시 매니저 프로세스를 기동합니다.
[mha@mhamgr app1]$ cd /mysqlha/scripts/
[mha@mhamgr scripts]$ ./mha_start.sh
[mha@mhamgr scripts]$ nohup: appending output to ‘nohup.out’
■ 에러 발생시 참고사항
-> 노드 매지저 패키지(mha4mysql-node-0.58.tar.gz) 설치 확인
[mha@mhamgr ~]$ masterha_check_repl --conf=/mysqlha/conf/app1.cnf
Can't locate MHA/NodeConst.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/local/share/perl5/MHA/ManagerConst.pm line 25.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/ManagerConst.pm line 25.
Compilation failed in require at /usr/local/share/perl5/MHA/Server.pm line 27.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/Server.pm line 27.
Compilation failed in require at /usr/local/share/perl5/MHA/Config.pm line 29.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/Config.pm line 29.
Compilation failed in require at /usr/local/share/perl5/MHA/MasterMonitor.pm line 30.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/MasterMonitor.pm line 30.
Compilation failed in require at /usr/local/bin/masterha_check_repl line 25.
BEGIN failed--compilation aborted at /usr/local/bin/masterha_check_repl line 25.
->MySQL 서버 수행여부 확인.
[mha@mhamgr ~]$ masterha_check_repl --conf=/mysqlha/conf/app1.cnf
Sun Sep 18 09:36:17 2022 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Sep 18 09:36:17 2022 - [info] Reading application default configuration from /mysqlha/conf/app1.cnf..
Sun Sep 18 09:36:17 2022 - [info] Reading server configuration from /mysqlha/conf/app1.cnf..
Sun Sep 18 09:36:17 2022 - [info] MHA::MasterMonitor version 0.58.
Sun Sep 18 09:36:18 2022 - [error][/usr/local/share/perl5/MHA/ServerManager.pm, ln188] There is no alive server. We can't do failover
Sun Sep 18 09:36:18 2022 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/share/perl5/MHA/MasterMonitor.pm line 329.
Sun Sep 18 09:36:18 2022 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Sun Sep 18 09:36:18 2022 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
-> MySQL 퍼미션 확인
[mha@mhamgr conf]$ masterha_check_repl --conf=/mysqlha/conf/app1.cnf
Sun Sep 18 10:59:32 2022 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Sep 18 10:59:32 2022 - [info] Reading application default configuration from /mysqlha/conf/app1.cnf..
Sun Sep 18 10:59:32 2022 - [info] Reading server configuration from /mysqlha/conf/app1.cnf..
Sun Sep 18 10:59:32 2022 - [info] MHA::MasterMonitor version 0.58.
Sun Sep 18 10:59:33 2022 - [info] GTID failover mode = 0
Sun Sep 18 10:59:33 2022 - [info] Dead Servers:
Sun Sep 18 10:59:33 2022 - [info] Alive Servers:
Sun Sep 18 10:59:33 2022 - [info] db1(192.168.0.101:3306)
Sun Sep 18 10:59:33 2022 - [info] db2(192.168.0.102:3306)
Sun Sep 18 10:59:33 2022 - [info] db3(192.168.0.103:3306)
Sun Sep 18 10:59:33 2022 - [info] Alive Slaves:
Sun Sep 18 10:59:33 2022 - [info] db2(192.168.0.102:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Sun Sep 18 10:59:33 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Sun Sep 18 10:59:33 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Sep 18 10:59:33 2022 - [info] db3(192.168.0.103:3306) Version=5.7.36-log (oldest major version between slaves) log-bin:enabled
Sun Sep 18 10:59:33 2022 - [info] Replicating from 192.168.0.101(192.168.0.101:3306)
Sun Sep 18 10:59:33 2022 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Sep 18 10:59:33 2022 - [info] Current Alive Master: db1(192.168.0.101:3306)
Sun Sep 18 10:59:33 2022 - [info] Checking slave configurations..
Sun Sep 18 10:59:33 2022 - [warning] relay_log_purge=0 is not set on slave db2(192.168.0.102:3306).
Sun Sep 18 10:59:33 2022 - [warning] relay_log_purge=0 is not set on slave db3(192.168.0.103:3306).
Sun Sep 18 10:59:33 2022 - [info] Checking replication filtering settings..
Sun Sep 18 10:59:33 2022 - [info] binlog_do_db= , binlog_ignore_db=
Sun Sep 18 10:59:33 2022 - [info] Replication filtering check ok.
Sun Sep 18 10:59:33 2022 - [error][/usr/local/share/perl5/MHA/Server.pm, ln398] db2(192.168.0.102:3306): User repl_user does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.
Sun Sep 18 10:59:33 2022 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/share/perl5/MHA/ServerManager.pm line 1403.
Sun Sep 18 10:59:33 2022 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Sun Sep 18 10:59:33 2022 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
DB서버들의 Binary Log파일을 액세스 할 수 있는 퍼미션을 확인합니다.
위의 OS유저 계정 추가중 -g옵션을 주지 않고 mha유저를 생성했는지 확인합니다.
mha유저가 mysql유저의 그룹에 속해야 합니다.
■ 결론
▶ 단점
현재 MHA 는 더이상 업데이트가 되지 않고 있습니다. 따라서 잠재적인 버그를 가지고 있습니다.
에러 발생시 손수 원인을 찾아야 합니다. 펄 문법을 모르면 고치기가 어렵습니다.
언제까지 사용할 수 있을지 모릅니다.
▶ 장점
스크립트를 이용해 자신의 환경에 맞게 설정할 수 있습니다.
MySQL에 대한 HA구성시 간단히 구성할 수 있습니다.
■ 참고 사이트
https://hoing.io/archives/9812
http://sgdba.blogspot.com/2015/01/centos-mha-keepalived-mysql-installation.html
https://github.com/yoshinorim/mha4mysql-manager
■ 관련 링크
https://myinfrabox.tistory.com/267
https://myinfrabox.tistory.com/268
※ 도움이 되셨다면 광고클릭 한번 부탁드립니다. ※