When working with Group Replication, MySQL Router would be the obvious choice for the connection layer: it is tightly integrated with the rest of the technologies, being part of the InnoDB Cluster stack.
The problem is that, except for simple workloads, MySQL Router's performance is still not on par with that of other proxies such as HAProxy or even ProxySQL. Fortunately, HAProxy and Group Replication can work together as well. Let's see how this can be accomplished.
Architecture
For our testing, let’s deploy an InnoDB ClusterSet in single-writer mode.
There are three nodes on the primary site:
- mysql1-t1
- mysql2-t1
- mysql3-t1
And three nodes on the DR site:
- mysql1-t2
- mysql2-t2
- mysql3-t2
In this scenario, Group Replication is used locally between the nodes on each site, and the two sites are linked together via asynchronous replication. Local or regional failover can be controlled from MySQL Shell.
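For example, a controlled switchover of the primary role to the DR site can be performed from MySQL Shell using the ClusterSet AdminAPI. This is a hedged sketch: the cluster name ('dr') and the admin account are placeholders; use whatever names you registered when creating the ClusterSet.

```js
// From mysqlsh, connected to any reachable member, e.g.:
//   \connect clusteradmin@mysql1-t2
cs = dba.getClusterSet()
cs.setPrimaryCluster('dr')        // controlled switchover to the DR cluster
// If the primary site is lost entirely, force a failover instead:
// cs.forcePrimaryCluster('dr')
```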
As when using HAProxy as the connection layer for Percona XtraDB Cluster, health checks are used to determine which cluster member should receive reads or writes. This is usually accomplished through an xinetd service that runs the actual health check script. Let's discuss all the components in more detail.
Health check script
We need to create a database user with enough permissions to run the queries used by the health check script. Let's create the user on the current primary:
```sql
CREATE USER clustercheck@localhost IDENTIFIED BY 'Clust3rCh3ck@' WITH MAX_USER_CONNECTIONS 10;
GRANT PROCESS ON *.* TO 'clustercheck'@'localhost';
GRANT SELECT ON sys.* TO 'clustercheck'@'localhost';
```
The queries in our sample script require a few special objects to be created in the sys schema:
```sql
USE sys;

DELIMITER $$

CREATE FUNCTION my_id() RETURNS TEXT(36)
DETERMINISTIC NO SQL
RETURN (SELECT @@global.server_uuid as my_id);$$

CREATE FUNCTION IFZERO(a INT, b INT)
RETURNS INT
DETERMINISTIC
RETURN IF(a = 0, b, a)$$

CREATE FUNCTION LOCATE2(needle TEXT(10000), haystack TEXT(10000), offset INT)
RETURNS INT
DETERMINISTIC
RETURN IFZERO(LOCATE(needle, haystack, offset), LENGTH(haystack) + 1)$$

CREATE FUNCTION GTID_NORMALIZE(g TEXT(10000))
RETURNS TEXT(10000)
DETERMINISTIC
RETURN GTID_SUBTRACT(g, '')$$

CREATE FUNCTION GTID_COUNT(gtid_set TEXT(10000))
RETURNS INT
DETERMINISTIC
BEGIN
  DECLARE result BIGINT DEFAULT 0;
  DECLARE colon_pos INT;
  DECLARE next_dash_pos INT;
  DECLARE next_colon_pos INT;
  DECLARE next_comma_pos INT;
  SET gtid_set = GTID_NORMALIZE(gtid_set);
  SET colon_pos = LOCATE2(':', gtid_set, 1);
  WHILE colon_pos != LENGTH(gtid_set) + 1 DO
    SET next_dash_pos = LOCATE2('-', gtid_set, colon_pos + 1);
    SET next_colon_pos = LOCATE2(':', gtid_set, colon_pos + 1);
    SET next_comma_pos = LOCATE2(',', gtid_set, colon_pos + 1);
    IF next_dash_pos < next_colon_pos AND next_dash_pos < next_comma_pos THEN
      SET result = result +
        SUBSTR(gtid_set, next_dash_pos + 1,
               LEAST(next_colon_pos, next_comma_pos) - (next_dash_pos + 1)) -
        SUBSTR(gtid_set, colon_pos + 1, next_dash_pos - (colon_pos + 1)) + 1;
    ELSE
      SET result = result + 1;
    END IF;
    SET colon_pos = next_colon_pos;
  END WHILE;
  RETURN result;
END$$

CREATE FUNCTION gr_applier_queue_length()
RETURNS INT
DETERMINISTIC
BEGIN
  RETURN (SELECT sys.gtid_count( GTID_SUBTRACT(
    (SELECT Received_transaction_set FROM performance_schema.replication_connection_status
      WHERE Channel_name = 'group_replication_applier'),
    (SELECT @@global.GTID_EXECUTED) )));
END$$

CREATE FUNCTION gr_member_in_primary_partition()
RETURNS VARCHAR(3)
DETERMINISTIC
BEGIN
  RETURN (SELECT IF( MEMBER_STATE='ONLINE' AND
    ((SELECT COUNT(*) FROM performance_schema.replication_group_members
       WHERE MEMBER_STATE NOT IN ('ONLINE', 'RECOVERING')) >=
     ((SELECT COUNT(*) FROM performance_schema.replication_group_members)/2) = 0),
    'YES', 'NO')
  FROM performance_schema.replication_group_members
  JOIN performance_schema.replication_group_member_stats USING(member_id)
  WHERE member_id=my_id());
END$$

CREATE VIEW gr_member_routing_candidate_status AS
  SELECT sys.gr_member_in_primary_partition() as viable_candidate,
    IF( (SELECT (SELECT GROUP_CONCAT(variable_value)
                 FROM performance_schema.global_variables
                 WHERE variable_name IN ('read_only', 'super_read_only')) != 'OFF,OFF'),
        'YES', 'NO') as read_only,
    Count_Transactions_Remote_In_Applier_Queue as transactions_behind,
    Count_Transactions_in_queue as 'transactions_to_cert'
  FROM performance_schema.replication_group_member_stats
  WHERE member_id=my_id();$$

DELIMITER ;
```
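With these objects in place, we can query the view directly to see exactly what the health check script will read. On a healthy, writable primary the output would look similar to this:

```sql
mysql> SELECT * FROM sys.gr_member_routing_candidate_status;
+------------------+-----------+---------------------+----------------------+
| viable_candidate | read_only | transactions_behind | transactions_to_cert |
+------------------+-----------+---------------------+----------------------+
| YES              | NO        |                   0 |                    0 |
+------------------+-----------+---------------------+----------------------+
```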
Here is an example of a simple checker script that is good enough for testing purposes. For production use, you might want to develop something more robust.
```shell
tee /usr/local/bin/mysql_gr_routing_check.sh <<'EOF'
#!/bin/bash
# Frederic -lefred- Descamps <[email protected]>
# Based on the original script from Unai Rodriguez
# and later work by Olaf van Zandwijk and Raghavendra Prabhu
#
# version 0.1 - first release
# version 0.2 - add read & write check + queue length check
#
# usage: mysql_gr_routing_check.sh <maxqueue> <READ|WRITE>
#
# This password method is insecure and should not be used
# in a production environment!
MYSQL_USERNAME="clustercheck"
MYSQL_PASSWORD="Clust3rCh3ck@"
MYSQL_HOST=localhost
MYSQL_PORT=3306

MAXQUEUE=${1-100}
ROLE=${2-WRITE}

if ! [ "$MAXQUEUE" -eq "$MAXQUEUE" ] 2>/dev/null; then
    # Member is not a viable routing candidate => return HTTP 503
    # Shell return-code is 1
    echo -en "HTTP/1.1 503 Service Unavailable\r\n"
    echo -en "Content-Type: text/plain\r\n"
    echo -en "Connection: close\r\n"
    echo -en "Content-Length: 44\r\n"
    echo -en "\r\n"
    echo -en "Group Replication member is not a viable routing candidate:\r\n"
    echo -en "maxqueue argument is not a valid value: ($MAXQUEUE).\r\n"
    exit 1
fi

echo $(mysql --no-defaults -BN --connect-timeout=10 --host=$MYSQL_HOST \
    --port=$MYSQL_PORT --user="$MYSQL_USERNAME" --password="$MYSQL_PASSWORD" \
    -e 'SELECT * FROM sys.gr_member_routing_candidate_status' 2>/dev/null) \
| while read candidate readonly queue tx_to_cert
do
    if [ "$candidate" == "YES" ]
    then
        if [ "${ROLE^^}" == "READ" ]
        then
            if [ $queue -lt $MAXQUEUE ]
            then
                # Member is a viable routing candidate => return HTTP 200
                # Shell return-code is 0
                echo -en "HTTP/1.1 200 OK\r\n"
                echo -en "Content-Type: text/plain\r\n"
                echo -en "Connection: close\r\n"
                echo -en "Content-Length: 40\r\n"
                echo -en "\r\n"
                echo -en "Group Replication member is a viable routing candidate for $ROLE.\r\n"
                exit 0
            else
                # Member is not a viable routing candidate => return HTTP 503
                # Shell return-code is 1
                echo -en "HTTP/1.1 503 Service Unavailable\r\n"
                echo -en "Content-Type: text/plain\r\n"
                echo -en "Connection: close\r\n"
                echo -en "Content-Length: 44\r\n"
                echo -en "\r\n"
                echo -en "Group Replication member is not a viable routing candidate:\r\n"
                echo -en "queue exceeds ($queue) threshold ($MAXQUEUE).\r\n"
                exit 1
            fi
        elif [ "${ROLE^^}" == "WRITE" ]
        then
            if [ "$readonly" == "YES" ]
            then
                # Member is not a viable routing candidate => return HTTP 503
                # Shell return-code is 1
                echo -en "HTTP/1.1 503 Service Unavailable\r\n"
                echo -en "Content-Type: text/plain\r\n"
                echo -en "Connection: close\r\n"
                echo -en "Content-Length: 44\r\n"
                echo -en "\r\n"
                echo -en "Group Replication member is not a viable routing candidate:\r\n"
                echo -en "$ROLE cannot be routed to a readonly member.\r\n"
                exit 1
            else
                if [ $queue -lt $MAXQUEUE ]
                then
                    # Member is a viable routing candidate => return HTTP 200
                    # Shell return-code is 0
                    echo -en "HTTP/1.1 200 OK\r\n"
                    echo -en "Content-Type: text/plain\r\n"
                    echo -en "Connection: close\r\n"
                    echo -en "Content-Length: 40\r\n"
                    echo -en "\r\n"
                    echo -en "Group Replication member is a viable routing candidate for $ROLE.\r\n"
                    exit 0
                else
                    # Member is not a viable routing candidate => return HTTP 503
                    # Shell return-code is 1
                    echo -en "HTTP/1.1 503 Service Unavailable\r\n"
                    echo -en "Content-Type: text/plain\r\n"
                    echo -en "Connection: close\r\n"
                    echo -en "Content-Length: 44\r\n"
                    echo -en "\r\n"
                    echo -en "Group Replication member is not a viable routing candidate:\r\n"
                    echo -en "queue exceeds ($queue) threshold ($MAXQUEUE).\r\n"
                    exit 1
                fi
            fi
        else
            # Member is not a viable routing candidate => return HTTP 503
            # Shell return-code is 1
            echo -en "HTTP/1.1 503 Service Unavailable\r\n"
            echo -en "Content-Type: text/plain\r\n"
            echo -en "Connection: close\r\n"
            echo -en "Content-Length: 44\r\n"
            echo -en "\r\n"
            echo -en "Group Replication member is not a viable routing candidate:\r\n"
            echo -en "$ROLE is not a valid argument.\r\n"
            exit 1
        fi
    fi
    # Note: when the member is not a viable candidate, no HTTP response is
    # sent; HAProxy treats the empty reply as a failed check.
done
EOF
```
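The script's decision tree can be summarized as follows. This is a simplified sketch of the logic only (not the script itself): a pure function that takes the same inputs the health check reads from `sys.gr_member_routing_candidate_status` and returns the HTTP status code the script would send.

```shell
# Simplified sketch of the routing decision made by the health check script.
# Arguments: viable_candidate read_only queue maxqueue role
routing_decision() {
    local candidate=$1 read_only=$2 queue=$3 maxqueue=$4 role=$5
    if [ "$candidate" != "YES" ]; then
        echo 503; return            # not in the primary partition
    fi
    case "${role^^}" in
        READ)
            # reads are fine even on a read-only member, if the queue is short
            if [ "$queue" -lt "$maxqueue" ]; then echo 200; else echo 503; fi ;;
        WRITE)
            if [ "$read_only" == "YES" ]; then
                echo 503            # writes must not go to a read-only member
            elif [ "$queue" -lt "$maxqueue" ]; then
                echo 200
            else
                echo 503            # applier queue too long
            fi ;;
        *)
            echo 503 ;;             # invalid role argument
    esac
}

routing_decision YES NO 0 100 write    # -> 200
routing_decision YES YES 0 100 write   # -> 503
routing_decision YES YES 50 100 read   # -> 200
routing_decision NO NO 0 100 read      # -> 503
```

Note that reads only check the applier queue length, while writes additionally require the member not to be read-only, which in single-writer mode effectively routes all writes to the primary.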
Don't forget to give the script execute permissions:

```shell
chmod +x /usr/local/bin/mysql_gr_routing_check.sh
```
Xinetd service
We need to deploy a custom xinetd service to expose the state of MySQL on each node. We deploy one service using port 6446 to check for the write availability of a node and a service on port 6447 to check if the node is available for reads.
First, install the xinetd package:

```shell
yum -y install xinetd
```
Now we prepare the definitions of the xinetd services:
```shell
tee /etc/xinetd.d/mysql_gr_routing_check_write <<EOF
# default: on
# description: check to see if the node is a viable routing candidate
service mysql_gr_routing_check_write
{
    disable         = no
    flags           = REUSE
    socket_type     = stream
    port            = 6446
    wait            = no
    user            = mysql
    server          = /usr/local/bin/mysql_gr_routing_check.sh
    server_args     = 100 write
    log_on_failure  += USERID
    per_source      = UNLIMITED
}
EOF
```
And:
```shell
tee /etc/xinetd.d/mysql_gr_routing_check_read <<EOF
# default: on
# description: check to see if the node is a viable routing candidate
service mysql_gr_routing_check_read
{
    disable         = no
    flags           = REUSE
    socket_type     = stream
    port            = 6447
    wait            = no
    user            = mysql
    server          = /usr/local/bin/mysql_gr_routing_check.sh
    server_args     = 100 read
    log_on_failure  += USERID
    per_source      = UNLIMITED
}
EOF
```
Start the service:
```shell
service xinetd start
```
Testing the service
We can verify our service is working by using telnet (or curl) from a remote host. For example, to check if a node is available as a writer, we query the service on port 6446:
```shell
# telnet mysql1-t2 6446
Trying 10.11.100.133...
Connected to mysql1-t2.
Escape character is '^]'.
HTTP/1.1 503 Service Unavailable
Content-Type: text/plain
Connection: close
Content-Length: 44

Group Replication member is not a viable routing candidate:
write cannot be routed to a readonly member.
Connection closed by foreign host.
```
We can also verify if a node can be a reader using port 6447:
```shell
# telnet mysql1-t2 6447
Trying 10.11.100.133...
Connected to mysql1-t2.
Escape character is '^]'.
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close
Content-Length: 40

Group Replication member is a viable routing candidate for read.
Connection closed by foreign host.
```
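The same checks can be scripted with curl, which is convenient for monitoring; `-i` prints the HTTP status line along with the body (hostnames as in the test topology above):

```shell
curl -i http://mysql1-t2:6446/   # writer check
curl -i http://mysql1-t2:6447/   # reader check
```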
HAProxy configuration
HAProxy needs to be configured with two dedicated endpoints, for reads and writes, respectively. The HTTP checks we define will query our custom xinetd services to determine each node's read/write availability.
Here's an example of the HAProxy configuration:
```
frontend mysql-gr-front_write
    bind *:3307
    mode tcp
    default_backend mysql-gr-back_write

backend mysql-gr-back_write
    mode tcp
    balance leastconn
    option httpchk
    server mysql1-t1 mysql1-t1:3306 check port 6446 inter 1000 rise 1 fall 2 on-marked-up shutdown-backup-sessions
    server mysql2-t1 mysql2-t1:3306 check port 6446 inter 1000 rise 1 fall 2 backup
    server mysql3-t1 mysql3-t1:3306 check port 6446 inter 1000 rise 1 fall 2 backup
    server mysql1-t2 mysql1-t2:3306 check port 6446 inter 1000 rise 1 fall 2 backup
    server mysql2-t2 mysql2-t2:3306 check port 6446 inter 1000 rise 1 fall 2 backup
    server mysql3-t2 mysql3-t2:3306 check port 6446 inter 1000 rise 1 fall 2 backup

frontend mysql-gr-front_read
    bind *:3308
    mode tcp
    default_backend mysql-gr-back_read

backend mysql-gr-back_read
    mode tcp
    balance leastconn
    option httpchk
    server mysql1-t1 mysql1-t1:3306 check port 6447 inter 1000 rise 1 fall 2
    server mysql2-t1 mysql2-t1:3306 check port 6447 inter 1000 rise 1 fall 2
    server mysql3-t1 mysql3-t1:3306 check port 6447 inter 1000 rise 1 fall 2
    server mysql1-t2 mysql1-t2:3306 check port 6447 inter 1000 rise 1 fall 2
    server mysql2-t2 mysql2-t2:3306 check port 6447 inter 1000 rise 1 fall 2
    server mysql3-t2 mysql3-t2:3306 check port 6447 inter 1000 rise 1 fall 2

frontend stats
    mode http
    bind *:443
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST
```
Connecting our application
Here's an example of how the application would connect to the database through HAProxy to write:
```shell
mysql -h haproxy1 -P3307 -u testuser -p'testpwd'
```
For read-only connections, it would use the alternative port as configured above:
```shell
mysql -h haproxy1 -P3308 -u testuser -p'testpwd'
```
Closing thoughts
Using this architecture, both local and regional failovers can be triggered through MySQL Shell, and HAProxy will adjust the writer/reader nodes automatically, without human intervention.
We can also take individual nodes out of the pool by simply stopping the MySQL service or pausing Group Replication.
Hopefully, MySQL Router will eventually offer performance comparable to other routing solutions, so that we can take full advantage of its integration with the rest of the InnoDB Cluster components.