This blog post discusses ProxySQL and MHA integration, and how the two work together.
MHA (Master High Availability Manager and tools for MySQL) is almost fully integrated with ProxySQL. This means you can count on MHA's standard features to manage the failover, and on ProxySQL to manage the traffic and shift it from one server to another.
This is one of the main differences between using MHA with a VIP and using MHA with ProxySQL: with MHA/ProxySQL there is no need to move IPs around or redefine DNS entries.
The following is an example of an MHA configuration file for use with ProxySQL:
[server default]
user=mha
password=mha
ssh_user=root
repl_password=replica
manager_log=/tmp/mha.log
manager_workdir=/tmp
remote_workdir=/tmp
master_binlog_dir=/opt/mysql_instances/mha1/logs
client_bindir=/opt/mysql_templates/mysql-57/bin
client_libdir=/opt/mysql_templates/mysql-57/lib
master_ip_failover_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_failover
master_ip_online_change_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_online_change
log_level=debug

[server1]
hostname=mha1r
ip=192.168.1.104
candidate_master=1

[server2]
hostname=mha2r
ip=192.168.1.107
candidate_master=1

[server3]
hostname=mha3r
ip=192.168.1.111
candidate_master=1

[server4]
hostname=mha4r
ip=192.168.1.109
no_master=1
NOTE: Be sure to comment out the "FIX ME" lines in the sample scripts.
After that, just install MHA as you normally would.
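Before moving on, it is usually worth validating the MHA setup itself. A minimal sketch, assuming the configuration shown above is saved as /etc/mha.cnf (the same path used later in this post):

# Verify SSH connectivity between all the nodes defined in the MHA configuration
masterha_check_ssh --conf=/etc/mha.cnf

# Verify the replication topology and the failover scripts referenced in the configuration
masterha_check_repl --conf=/etc/mha.cnf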
In ProxySQL, be sure to have all the MHA users and servers set.
When using ProxySQL with standard replication, it's important to set additional privileges for the ProxySQL monitor user: it must also have the REPLICATION CLIENT privilege, or it will fail to check the slave lag. The servers MUST also have a value defined for the max_replication_lag attribute, or the check will be ignored.
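As a sketch of what that means on the MySQL side, assuming the monitor user is the default one called monitor (the real name and password are whatever mysql-monitor_username and mysql-monitor_password are set to in your ProxySQL installation):

-- On each MySQL backend: the monitor user needs REPLICATION CLIENT
-- so ProxySQL can run SHOW SLAVE STATUS and measure the replication lag.
-- User name and password here are only examples.
GRANT USAGE, REPLICATION CLIENT ON *.* TO 'monitor'@'%' IDENTIFIED BY 'monitor';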
As a reminder:
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',600,3306,1000,0);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.107',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.111',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.109',601,3306,1000,10);
INSERT INTO mysql_replication_hostgroups VALUES (600,601);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;

INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES ('mha_W',600,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES ('mha_R',601,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) VALUES ('mha_RW',600,1,3,'^SELECT.*FOR UPDATE');
INSERT INTO mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) VALUES ('mha_RW',601,1,3,'^SELECT');
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;

INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('mha_W','test',1,600,'test_mha');
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('mha_R','test',1,601,'test_mha');
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('mha_RW','test',1,600,'test_mha');
LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;
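Once everything is loaded to runtime, a quick sanity check (not part of the original setup, just a sketch against ProxySQL's standard admin tables) confirms what ProxySQL is actually using:

-- From the ProxySQL admin interface (port 6032 by default)
SELECT hostgroup_id, hostname, port, status, weight, max_replication_lag
FROM runtime_mysql_servers
ORDER BY hostgroup_id, hostname;

-- The writer/reader hostgroup pair driving the read_only based moves
SELECT * FROM runtime_mysql_replication_hostgroups;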
OK, now that all is ready, let’s rock’n’roll!
Controlled fail-over
First of all, masterha_manager should not be running, or you will get an error.
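If in doubt, here is a quick sketch using the standard MHA helper commands to check and, if needed, stop the manager:

# Is the MHA manager currently monitoring this cluster?
masterha_check_status --conf=/etc/mha.cnf

# Stop it so the manual switch can be performed
masterha_stop --conf=/etc/mha.cnf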
Now let’s start some traffic:
Write:
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=off --oltp-dist-type=uniform --oltp-reconnect-mode=transaction --oltp-skip-trx=off --num-threads=10 --report-interval=10 --mysql-ignore-errors=all run

Read only:
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=on --num-threads=10 --oltp-reconnect-mode=query --oltp-skip-trx=on --report-interval=10 --mysql-ignore-errors=all run
Let it run for a bit, then check:
mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.104 | 3306     | ONLINE | 10       | 0        | 20     | 0       | 551256  | 44307633        | 0               | 285        | <--- current Master
| 601       | 192.168.1.111 | 3306     | ONLINE | 5        | 3        | 11     | 0       | 1053685 | 52798199        | 4245883580      | 1133       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 3        | 5        | 10     | 0       | 1006880 | 50473746        | 4052079567      | 369        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1040524 | 52102581        | 4178965796      | 604        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 7        | 1        | 16     | 0       | 987548  | 49458526        | 3954722258      | 285        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
Now perform the failover. To do this, instruct MHA to do a switch, and to set the OLD master as a new slave:
masterha_master_switch --master_state=alive --conf=/etc/mha.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=0
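The 2.7 seconds reported later in this post is the kind of figure you get by simply wrapping the command in time; a sketch of the same invocation:

# Measure how long the whole controlled switch takes
time masterha_master_switch --master_state=alive --conf=/etc/mha.cnf \
     --orig_master_is_new_slave --interactive=0 --running_updates_limit=0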
Check what happened:
[ 160s] threads: 10, tps: 354.50, reads: 3191.10, writes: 1418.50, response time: 48.96ms (95%), errors: 0.00, reconnects: 0.00
[ 170s] threads: 10, tps: 322.50, reads: 2901.98, writes: 1289.89, response time: 55.45ms (95%), errors: 0.00, reconnects: 0.00
[ 180s] threads: 10, tps: 304.60, reads: 2743.12, writes: 1219.91, response time: 58.09ms (95%), errors: 0.10, reconnects: 0.00 <--- moment of the switch
[ 190s] threads: 10, tps: 330.40, reads: 2973.40, writes: 1321.00, response time: 50.52ms (95%), errors: 0.00, reconnects: 0.00
[ 200s] threads: 10, tps: 304.20, reads: 2745.60, writes: 1217.60, response time: 58.40ms (95%), errors: 0.00, reconnects: 1.00
[ 210s] threads: 10, tps: 353.80, reads: 3183.80, writes: 1414.40, response time: 48.15ms (95%), errors: 0.00, reconnects: 0.00
Check ProxySQL:
mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.107 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 123457  | 9922280         | 0               | 658        | <--- new master
| 601       | 192.168.1.111 | 3306     | ONLINE | 2        | 6        | 14     | 0       | 1848302 | 91513537        | 7590137770      | 1044       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 5        | 3        | 12     | 0       | 1688789 | 83717258        | 6927354689      | 220        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1834415 | 90789405        | 7524861792      | 658        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 6        | 2        | 24     | 0       | 1667252 | 82509124        | 6789724589      | 265        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
In this case, the slaves were not lagging behind the master, and the switch happened quite fast.
We can see that the WRITE operations, which are normally the issue (given the need to move a VIP around or change name resolution), suffered only a brief hiccup.
Read operations were not affected at all. Nice, eh?
Do you know how long it takes to do a switch under these conditions?

real 0m2.710s

Yes, 2.7 seconds.
This is more evidence that most of the time spent in an MHA-based switch is consumed by the need to redirect traffic from A to B over the network.
Crash fail-over
What happens if, instead of a controlled switch, we have to cover a real failover?
First of all, let’s start masterha_manager:
nohup masterha_manager --conf=/etc/mha.cnf --wait_on_monitor_error=60 --wait_on_failover_error=60 >> /tmp/mha.log 2>&1
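To confirm the manager is actually up and monitoring before breaking anything, a quick sketch reusing the commands and paths already shown in this post:

# Confirm the manager reached its monitoring phase
masterha_check_status --conf=/etc/mha.cnf

# Or simply follow the log it writes to
tail -f /tmp/mha.log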
Then let's start the load again. Finally, go to the MySQL node hosting the current master, xxx.xxx.xxx.107:
ps aux | grep mysql
mysql 18755 0.0 0.0 113248 1608 pts/0 S Aug28 0:00 /bin/sh /opt/mysql_templates/mysql-57/bin/mysqld_safe --defaults-file=/opt/mysql_instances/mha1/my.cnf
mysql 21975 3.2 30.4 4398248 941748 pts/0 Sl Aug28 93:21 /opt/mysql_templates/mysql-57/bin/mysqld --defaults-file=/opt/mysql_instances/mha1/my.cnf --basedir=/opt/mysql_templates/mysql-57/ --datadir=/opt/mysql_instances/mha1/data --plugin-dir=/opt/mysql_templates/mysql-57//lib/plugin --log-error=/opt/mysql_instances/mha1/mysql-3306.err --open-files-limit=65536 --pid-file=/opt/mysql_instances/mha1/mysql.pid --socket=/opt/mysql_instances/mha1/mysql.sock --port=3306

And kill the MySQL process (both mysqld and mysqld_safe):

kill -9 21975 18755
As before, check what happened on the application side:
[ 80s] threads: 4, tps: 213.20, reads: 1919.10, writes: 853.20, response time: 28.74ms (95%), errors: 0.00, reconnects: 0.00
[ 90s] threads: 4, tps: 211.30, reads: 1901.80, writes: 844.70, response time: 28.63ms (95%), errors: 0.00, reconnects: 0.00
[ 100s] threads: 4, tps: 211.90, reads: 1906.40, writes: 847.90, response time: 28.60ms (95%), errors: 0.00, reconnects: 0.00
[ 110s] threads: 4, tps: 211.10, reads: 1903.10, writes: 845.30, response time: 29.27ms (95%), errors: 0.30, reconnects: 0.00 <-- issue starts
[ 120s] threads: 4, tps: 198.30, reads: 1785.10, writes: 792.40, response time: 28.43ms (95%), errors: 0.00, reconnects: 0.00
[ 130s] threads: 4, tps: 0.00, reads: 0.60, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.40 <-- total stop in write
[ 140s] threads: 4, tps: 173.80, reads: 1567.80, writes: 696.30, response time: 34.89ms (95%), errors: 0.40, reconnects: 0.00 <-- writes restart
[ 150s] threads: 4, tps: 195.20, reads: 1755.10, writes: 780.50, response time: 33.98ms (95%), errors: 0.00, reconnects: 0.00
[ 160s] threads: 4, tps: 196.90, reads: 1771.30, writes: 786.80, response time: 33.49ms (95%), errors: 0.00, reconnects: 0.00
[ 170s] threads: 4, tps: 193.70, reads: 1745.40, writes: 775.40, response time: 34.39ms (95%), errors: 0.00, reconnects: 0.00
[ 180s] threads: 4, tps: 191.60, reads: 1723.70, writes: 766.20, response time: 35.82ms (95%), errors: 0.00, reconnects: 0.00
So it takes ~10 seconds to perform the failover.
To understand this better, let's see what happened in MHA-land:
Tue Aug 30 09:33:33 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Aug 30 09:33:33 2016 - [info] Reading application default configuration from /etc/mha.cnf..
... Read conf and start
Tue Aug 30 09:33:47 2016 - [debug] Trying to get advisory lock..
Tue Aug 30 09:33:47 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
... Wait for errors
Tue Aug 30 09:34:47 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) <--- Error time
Tue Aug 30 09:34:56 2016 - [warning] Connection failed 4 time(s)..                                     <--- Finally MHA decides to do something
Tue Aug 30 09:34:56 2016 - [warning] Master is not reachable from health checker!
Tue Aug 30 09:34:56 2016 - [warning] Master mha2r(192.168.1.107:3306) is not reachable!
Tue Aug 30 09:34:56 2016 - [warning] SSH is reachable.
Tue Aug 30 09:34:58 2016 - [info] Master failover to mha1r(192.168.1.104:3306) completed successfully. <--- end of the failover
MHA sees the server failing at xx:47, but because of the retries and check validation it only fully acknowledges the downtime at xx:56 (roughly nine seconds later).
The failover itself then takes only ~2 seconds (again). Because no movable IPs or DNS changes were involved, the operations were fast. This holds as long as the slaves already have the binary log events; it's a different story if MHA also has to retrieve missing data from the master's binary logs and apply it to MySQL.
As you can see, ProxySQL helps reduce the time needed in this scenario as well, by completely skipping the network-related operations, which are the ones that cause the most trouble in these cases.
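On the ProxySQL side, the switch is driven by its read_only monitoring of the hostgroup pair defined in mysql_replication_hostgroups. As a sketch (using ProxySQL's standard admin and monitor tables), you can check how often it polls the backends and when it detected the new master:

-- How often ProxySQL checks read_only on the backends (values in milliseconds)
SELECT * FROM global_variables
WHERE variable_name IN ('mysql-monitor_read_only_interval','mysql-monitor_read_only_timeout');

-- History of the read_only checks: shows when the new master was seen with read_only=0
SELECT * FROM monitor.mysql_server_read_only_log
ORDER BY time_start_us DESC LIMIT 10;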
Is there a reason to use mysql_replication_hostgroups? This table implies that some hosts are read-only, in your case the 601 hostgroup. On failover, will MHA change read_only on the old master to 1 and on the new master to 0?
Hi Alessandro,
yes, that is correct. Here we are talking about standard MySQL replication with an MHA controller. As such, we have a master with READ_ONLY=0 and slaves with READ_ONLY=1. In case of failover, MHA will change the READ_ONLY variable on the newly elected master to 0.
ProxySQL will "see" that event and will copy the entry to the 600 hostgroup, re-enabling writes against the newly elected master.
Hope this clarifies things.
Marco, good article, as always!
Just calling attention to a small thing: when you define the default schema to hold the sysbench tables in ProxySQL, you used "test_mha", but when you run sysbench, you're running it against "mha_test":
#: proxysql default_schema for mha_* users
insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('mha_W','test',1,10,'test_mha');
#: sysbench
sysbench … --mysql-db=mha_test --db-driver=mysql … run
If it's not actually a problem, pardon me. Cheers!
Hey, I'm using Orchestrator for failover. It also sets read_only=OFF on the new master, but I didn't see any hostgroup changes. Did you use any particular parameters in ProxySQL?
Are you using the replication hostgroups table?
The best way to debug your problem is to share the contents of your mysql_servers and mysql_replication_hostgroups tables.
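For example, a sketch of the queries to run from the ProxySQL admin interface and share:

-- Configured and runtime view of the backend servers
SELECT * FROM mysql_servers;
SELECT * FROM runtime_mysql_servers;

-- The writer/reader hostgroup pair driving the read_only based moves
SELECT * FROM mysql_replication_hostgroups;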
We have a configuration with one master server and two slaves, and MHA installed on top of it. The MHA manager is installed on, and operated from, a dedicated server. Our objective is to set up MHA in such a way that when the master host (the server where the master database is running) goes down, it performs a failover. With our present infrastructure, database failover happens whenever the database running on the master host is down, but it does not happen when the host itself is down or unreachable.
As we are using virtual machines and not physical servers, the power manager script is not deployed in MHA.
Will we be able to meet our requirement if we integrate ProxySQL with MHA?
We understand that with ProxySQL there is no need to move an IP from one server to another and that ProxySQL routes the write connections, but will a database failover still take place when the master host is down?