During the last few weeks I’ve been testing and playing a bit with the new group-replication plugin available for MySQL 5.7. Before continuing I’d like to clarify some aspects: the plugin is only available in labs and is not yet ready for production. The current version is 0.6. I used 5.7.9 GA running in a Vagrant 3 nodes cluster with CentOS 7.
As an additional note, I’ve tested previous version of plugin 0.5 against 5.7.8.rc and there are some good changes, so I recommend starting with the GA version.
For the matter of my tests I’ve followed instructions from this post. It’s not as straightforward as it looks; there were some issues that needed to be handled, but I finally managed to get a consistent 3 nodes cluster running:
1 2 3 4 5 6 7 8 9 | mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ | group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | ONLINE | | group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE | | group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ 3 rows in set (0.00 sec) |
My impressions about installation
1- The previous version relies only on Corosync for group communication, meaning that you need to install and configure an extra piece of software. For those who aren’t very familiar with it (like me), it can be a bit tricky and complex. From version 0.6.0 for MySQL 5.7.9, a new communication engine called XCom was added. This is now the default communication framework which is included in the plugin, so no extra pieces of software are needed.
2- When you initialize the MySQL database (mysqld –initialize function now replaces mysql_install_db script) you need to disable binary logging in the configuration file, otherwise information such as ‘create database mysql’ will be pushed to binary logs and cause issues with nodes joining the cluster due errors like:
2015-10-21T20:18:52.059231Z 8 [Warning] Slave: Can't create database 'mysql'; database exists Error_code: 1007
3- In group replication there isn’t a concept like SST (State Snapshot Transfer) which basically drops and recreates the datadir if it finds data discrepancies. With group replication you may end up having different datasets and replication will continue working (more on this later in the post).
4- For Incremental State Transfer (a.k.a. IST in Galera), group replication trusts in binary logs present in any of the potential donors (at the moment the selection of a donor is done randomly). So, if a node is disconnected, when it comes back online, it requests binary logs from the donor using the same IO thread as regular replication. The problem here is that if the binary log was purged on the donor then the joiner can’t be recovered and a full rebuild is needed. This is a similar approach to the gcache in Galera, but when gcache is not able to provide transactions needed for IST, an SST is performed instead. Group replication can’t do this (yet?).
These are some of the installation issues I faced. Now that we have the cluster running, what works? Well let’s try some samples.
Simple write tests
I tried running simple write operations like a few inserts, create tables and so on using sysbench like this:
1 | sysbench --test=/vagrant/sysb/doc/sysbench/tests/db/update_index.lua --mysql-user=root --mysql-db=test --mysql-socket=/data/mysql.sock --oltp-table-size=25000 --oltp-tables-count=8 --num-threads=8 --report-interval=1 --max-requests=0 run |
And checked status in the other nodes. It does what it is supposed to do; data and records are found in the rest of nodes just like this:
Node1:
1 2 3 4 5 6 7 8 9 10 | mysql> select * from sbtest1 order by 1 limit 4; +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 8 | 12410 | 05879259692-02024081996-76022818191-82933803603-81845875017-31928300264-16934042125-67052432228-92123768050-95121478647 | 79362588344-09017007031-35411714211-52054317597-26283585383 | | 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 | | 22 | 12544 | 57481185690-89398636500-16888148413-67987678267-15604944838-94210794401-18107184012-91338377776-83386272438-09451188763 | 35227182905-15234265621-59793845249-15413569710-23749555118 | | 29 | 12436 | 29279855805-99348203463-85191104223-39587263726-81794340135-73817557808-54578801760-64404111877-55434439976-37212880746 | 59222897263-22759023974-22020489960-93434521232-77981152534 | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 4 rows in set (0.08 sec) |
Node2
1 2 3 4 5 6 7 8 9 10 | mysql> select * from sbtest1 order by 1 limit 4; +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 8 | 12410 | 05879259692-02024081996-76022818191-82933803603-81845875017-31928300264-16934042125-67052432228-92123768050-95121478647 | 79362588344-09017007031-35411714211-52054317597-26283585383 | | 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 | | 22 | 12544 | 57481185690-89398636500-16888148413-67987678267-15604944838-94210794401-18107184012-91338377776-83386272438-09451188763 | 35227182905-15234265621-59793845249-15413569710-23749555118 | | 29 | 12436 | 29279855805-99348203463-85191104223-39587263726-81794340135-73817557808-54578801760-64404111877-55434439976-37212880746 | 59222897263-22759023974-22020489960-93434521232-77981152534 | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 4 rows in set (0.08 sec) |
Well we expected this so, yay, we are fine.
What about trying to write in 2 nodes at the same time? This should fire things like conflict resolution during certification; in a nutshell, if we expect to use group replication to write in multiple nodes at the same time, we need a way to resolve conflicts with the data. These are most common in PK/UK violations; in other words 2 transactions trying to insert the same record/id. This is not recommended because it is not an approach we can use to scale up writes (same as Galera) but it’s still possible to do.
An easier way to test is to run sysbench in more than one member of a cluster and wait for a failure. As expected, it does what it is supposed to do:
Node1
1 2 3 4 5 6 | [root@node1 data]# sysbench --test=/vagrant/sysb/doc/sysbench/tests/db/update_index.lua --mysql-user=root --mysql-db=test --mysql-socket=/data/mysql.sock --oltp-table-size=25000 --oltp-tables-count=8 --num-threads=8 --report-interval=1 --max-requests=0 run |grep tps [ 1s] threads: 8, tps: 0.00, reads: 0.00, writes: 1416.75, response time: 25.65ms (95%), errors: 0.00, reconnects: 0.00 [ 2s] threads: 8, tps: 0.00, reads: 0.00, writes: 1893.78, response time: 20.94ms (95%), errors: 0.00, reconnects: 0.00 [ 3s] threads: 8, tps: 0.00, reads: 0.00, writes: 1421.19, response time: 28.44ms (95%), errors: 0.00, reconnects: 0.00 [ 4s] threads: 8, tps: 0.00, reads: 0.00, writes: 1397.83, response time: 34.92ms (95%), errors: 0.00, reconnects: 0.00 [ 5s] threads: 8, tps: 0.00, reads: 0.00, writes: 1734.31, response time: 22.75ms (95%), errors: 0.00, reconnects: 0.00 |
Node2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | [root@node2 vagrant]# sysbench --test=/vagrant/sysb/doc/sysbench/tests/db/update_index.lua --mysql-user=root --mysql-db=test --mysql-socket=/data/mysql.sock --oltp-table-size=25000 --oltp-tables-count=8 --num-threads=8 --report-interval=1 --max-requests=0 run WARNING: Both max-requests and max-time are 0, running endless test sysbench 0.5: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 8 Report intermediate results every 1 second(s) Random number generator seed is 0 and will be ignored Threads started! ALERT: failed to execute MySQL query: `UPDATE sbtest6 SET k=k+1 WHERE id=12608`: ALERT: Error 1180 Got error 149 during COMMIT FATAL: failed to execute function `event': (null) ALERT: failed to execute MySQL query: `UPDATE sbtest6 SET k=k+1 WHERE id=12468`: ALERT: Error 1180 Got error 149 during COMMIT FATAL: failed to execute function `event': (null) |
Node2 eventually failed, but what happened? Let’s check the error log to see what’s reporting:
1 2 | 2015-10-27T17:12:33.894759Z 3 [Note] InnoDB: Blocking transaction: ID: 223804 - Blocked transaction ID: 223820 - MySQL thread id 59, OS thread handle 139838350866176, query id 197913 localhost root query end UPDATE sbtest5 SET k=k+1 WHERE id=12510 |
At the commit stage there was a conflict with an already committed transaction in Node1, so it forced a failure and a rollback of the operation. So far, so good.
What about a node going down?
One of the tests I ran was to kill one of the nodes during the operations to see if it resumes replication properly when back to life. For this we need to set up some variables in the configuration file as follows:
1 2 3 4 | loose-group_replication_group_name="8a94f357-aab4-11df-86ab-c80aa9429562" loose-group_replication_start_on_boot=1 loose-group_replication_recovery_user='rpl_user' loose-group_replication_recovery_password='rpl_pass' |
Note: This is interesting, that the replication credentials are not saved into a table (as is done with slave_master_info in regular replication). I guess this is part of a to do section, but it’s something to keep in mind since this implies a security risk.
Back to our test. I ran the regular sysbench command in one of my nodes and then went to node2 and killed mysql daemon. After the regular crash recovery messages we can see:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 | [root@node2 data]# killall mysqld [root@node2 data]# tail -500 error.log 2015-10-27T17:15:26.460674Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active. 2015-10-27T17:15:26.460711Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active. 2015-10-27T17:15:26.461001Z 0 [Note] InnoDB: Waiting for purge to start 2015-10-27T17:15:26.514015Z 0 [Note] InnoDB: 5.7.9 started; log sequence number 77370253 2015-10-27T17:15:26.515417Z 0 [Note] Plugin 'FEDERATED' is disabled. 2015-10-27T17:15:26.525466Z 0 [Note] InnoDB: not started 2015-10-27T17:15:26.525914Z 0 [Note] InnoDB: Loading buffer pool(s) from /data/ib_buffer_pool [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configuring Xcom group: XCom Group ID=1827295128 Name=8a94f357-aab4-11df-86ab-c80aa9429562 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Peer Nodes: 192.168.70.2:10300 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Peer Nodes: 192.168.70.3:10300 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Total number of peers: 2 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Local Node: 192.168.70.3:10300 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Bootstrap: false 2015-10-27T17:15:26.723392Z 0 [Note] InnoDB: Buffer pool(s) load completed at 151027 17:15:26 2015-10-27T17:15:27.135089Z 0 [Warning] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key 2015-10-27T17:15:27.136449Z 0 [Note] Server hostname (bind-address): '*'; port: 3306 2015-10-27T17:15:27.145198Z 0 [Note] IPv6 is available. 2015-10-27T17:15:27.145247Z 0 [Note] - '::' resolves to '::'; 2015-10-27T17:15:27.145265Z 0 [Note] Server socket created on IP: '::'. 2015-10-27T17:15:27.171615Z 0 [Warning] 'user' entry 'root@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.171711Z 0 [Warning] 'user' entry 'mysql.sys@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.172447Z 0 [Warning] 'db' entry 'sys mysql.sys@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.173089Z 0 [Warning] 'proxies_priv' entry '@ root@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.192881Z 0 [Warning] 'tables_priv' entry 'sys_config mysql.sys@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.205764Z 0 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=node2-relay-bin' to avoid this problem. 2015-10-27T17:15:27.676222Z 1 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. 2015-10-27T17:15:27.685374Z 3 [Note] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log './node2-relay-bin-group_replication_applier.000002' position: 51793711 2015-10-27T17:15:27.685985Z 0 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!' 2015-10-27T17:15:27.686009Z 0 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7' 2015-10-27T17:15:27.686017Z 0 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 2' [XCOM BINDING DEBUG] ::join() connecting to 192.168.70.3 10300 state 0 action xa_init connecting to 192.168.70.3 10300 connected to 192.168.70.3 10300 connecting to 192.168.70.3 10300 connected to 192.168.70.3 10300 ... [XCOM BINDING DEBUG] ::join():: I am NOT the boot node. [XCOM BINDING DEBUG] ::join():: xcom_client_open_connection to 192.168.70.2:10300 connecting to 192.168.70.2 10300 connected to 192.168.70.2 10300 [XCOM BINDING DEBUG] ::join():: Calling xcom_client_add_node cli_err 0 state 3489 action xa_snapshot new state x_recover state 3505 action xa_complete new state x_run get_nodeno(get_site_def()) = 2 task_now() = 1445966128.920615 n = 0 median_time() = 1.000000 executed_msg={84bca5ce 12385 1} [XCOM BINDING DEBUG] ::join():: GCS_OK [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. 2015-10-27T17:15:28.926806Z 0 [Note] Event Scheduler: Loaded 0 events 2015-10-27T17:15:28.928485Z 0 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.7.9-log' socket: '/data/mysql.sock' port: 3306 MySQL Community Server (GPL) [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. .... [XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message [XCOM_BINDING_DEBUG] ::process_possible_control_message()::Install new view [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data 2015-10-27T17:15:30.084101Z 0 [Note] Plugin group_replication reported: 'Starting group replication recovery with view_id 46750407:7' [XCOM_BINDING_DEBUG] ::install_view():: View delivered to client handler= 1382963399 2015-10-27T17:15:30.091414Z 5 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/86400' 2015-10-27T17:15:30.107078Z 5 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='node3', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. 2015-10-27T17:15:30.117379Z 5 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor e5263489-7cb7-11e5-a8ee-0800275ff74d at node3 port: 3306.' 2015-10-27T17:15:30.118109Z 6 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information. 2015-10-27T17:15:30.130001Z 7 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log './node2-relay-bin-group_replication_recovery.000001' position: 4 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 2015-10-27T17:15:30.169817Z 6 [ERROR] Slave I/O for channel 'group_replication_recovery': error connecting to master 'rpl_user@node3:3306' - retry-time: 60 retries: 1, Error_code: 1130 2015-10-27T17:15:30.169856Z 6 [Note] Slave I/O thread for channel 'group_replication_recovery' killed while connecting to master 2015-10-27T17:15:30.169862Z 6 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4 2015-10-27T17:15:30.174955Z 5 [ERROR] Plugin group_replication reported: 'There was an error when connecting to the donor server. Check group replication recovery's connection credentials.' 2015-10-27T17:15:30.175573Z 5 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/86400' 2015-10-27T17:15:30.178016Z 5 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='node3', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='node1', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. 2015-10-27T17:15:30.189233Z 5 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor a0ef74a1-7cb3-11e5-845e-0800275ff74d at node1 port: 3306.' 2015-10-27T17:15:30.190787Z 8 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information. 2015-10-27T17:15:30.610531Z 8 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'rpl_user@node1:3306',replication started in log 'FIRST' at position 4 get_nodeno(get_site_def()) = 2 task_now() = 1445966131.000425 n = 74 (n - old_n) / (task_now() - old_t) = 11.136156 |
During this process we can check the status in any node as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ | group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | RECOVERING | | group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE | | group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ 3 rows in set (0.00 sec) mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ | group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | ONLINE | | group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE | | group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ 3 rows in set (0.00 sec) |
Again, as expected, the node connected to the cluster, requested binary logs from latest GTID executed position and applied remaining changes to be back online.
The final test I’ve done so far is about data consistency. For example, what if I stop group replication in a node and make some data changes? When it gets back to replication will it send these changes?
Let’s see a very simple example:
Node2:
1 2 3 4 5 6 7 | mysql> select * from sbtest1 where id=15; +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 1 row in set (0.12 sec) |
Node1:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | mysql> select * from sbtest1 where id=15; +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> stop group_replication; Query OK, 0 rows affected (0.03 sec) mysql> delete from sbtest1 where id=15; Query OK, 1 row affected (0.02 sec) mysql> start group_replication; Query OK, 0 rows affected (0.02 sec) mysql> select * from sbtest1 where id=15; Empty set (0.00 sec) |
And now Node2 again:
1 2 3 4 5 6 7 | mysql> select * from sbtest1 where id=15; +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 1 row in set (0.00 sec) |
Hmmmm, not cool, what if I try to remove a row from Node2?
1 2 3 4 5 6 7 8 9 10 11 12 | mysql> delete from sbtest1 where id=15; Query OK, 1 row affected (0.01 sec) mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ | group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | ONLINE | | group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE | | group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ 3 rows in set (0.00 sec) |
Hmmmm, strange, everything seems to be working correctly. Is it? Let’s check node1 again:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 | mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ | group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | ONLINE | | group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE | | group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ 3 rows in set (0.01 sec) mysql> stop group_replication; Query OK, 0 rows affected (4.01 sec) mysql> start group_replication; Query OK, 0 rows affected (2.41 sec) mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ | group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | OFFLINE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ 1 row in set (0.00 sec) mysql> exit Bye [root@node1 data]# tail -100 error.log 2015-10-27T17:52:50.075274Z 15 [ERROR] Slave SQL for channel 'group_replication_applier': Could not execute Delete_rows event on table test.sbtest1; Can't find record in 'sbtest1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 346, Error_code: 1032 2015-10-27T17:52:50.075274Z 15 [Warning] Slave: Can't find record in 'sbtest1' Error_code: 1032 2015-10-27T17:52:50.075274Z 15 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'FIRST' position 0 2015-10-27T17:52:50.075294Z 2 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!' 2015-10-27T17:52:50.075308Z 2 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7' 2015-10-27T17:52:50.075312Z 2 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 1' [XCOM BINDING DEBUG] ::join() connecting to 192.168.70.2 10300 state 3489 action xa_init connecting to 192.168.70.2 10300 .... XCOM BINDING DEBUG] ::join():: I am NOT the boot node. [XCOM BINDING DEBUG] ::join():: Skipping own address. [XCOM BINDING DEBUG] ::join():: xcom_client_open_connection to 192.168.70.3:10300 connecting to 192.168.70.3 10300 connected to 192.168.70.3 10300 [XCOM BINDING DEBUG] ::join():: Calling xcom_client_add_node cli_err 0 state 3489 action xa_snapshot new state x_recover state 3505 action xa_complete new state x_run [XCOM BINDING DEBUG] ::join():: GCS_OK [XCOM_BINDING_DEBUG] ::cb_xcom_receive_global_view():: message_id.group= 2226955726 message_id.msgno= 12585 message_id.node= 0 [XCOM_BINDING_DEBUG] ::cb_xcom_receive_global_view():: node set: peer: 0 flag: 1 peer: 1 flag: 1 peer: 2 flag: 1 [XCOM_BINDING_DEBUG] ::xcom_receive_global_view():: Processing new view on Handler [XCOM_BINDING_DEBUG] ::xcom_receive_global_view():: My node_id is 2 [XCOM_BINDING_DEBUG] ::xcom_receive_global_view():: State Exchange started. get_nodeno(get_site_def()) = 2 task_now() = 1445968372.450627 n = 0 median_time() = 1.000000 executed_msg={84bca5ce 12585 1} [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12585 message_id.node= 1 [XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message [XCOM_BINDING_DEBUG] ::process_possible_control_message()::Still waiting for more State Exchange messages [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12585 message_id.node= 2 [XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message [XCOM_BINDING_DEBUG] ::process_possible_control_message()::Still waiting for more State Exchange messages [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12586 message_id.node= 0 [XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message [XCOM_BINDING_DEBUG] ::process_possible_control_message()::Install new view [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data 2015-10-27T17:52:52.455340Z 0 [Note] Plugin group_replication reported: 'Starting group replication recovery with view_id 46750407:11' 2015-10-27T17:52:52.456474Z 16 [ERROR] Plugin group_replication reported: 'Can't evaluate the group replication applier execution status. Group replication recovery will shutdown to avoid data corruption.' 2015-10-27T17:52:52.456503Z 16 [ERROR] Plugin group_replication reported: 'Fatal error during the Recovery process of Group Replication. The server will leave the group.' [XCOM BINDING DEBUG] ::leave() [XCOM BINDING DEBUG] ::leave():: Skipping own address. [XCOM BINDING DEBUG] ::leave():: xcom_client_open_connection to 192.168.70.3:10300 connecting to 192.168.70.3 10300 [XCOM_BINDING_DEBUG] ::install_view():: View delivered to client handler= 750756806 connected to 192.168.70.3 10300 [XCOM BINDING DEBUG] ::leave():: Calling xcom_client_remove_node cli_err 0 handle_remove_node /export/home2/pb2/build/sb_0-16846472-1445524610.82/build/BUILD/mysql-server/plugin/group_replication/gcs/src/bindings/xcom/xcom/xcom_base.c:1987 nodes: 0x3d05fa8 nodes->node_list_len = 1 nodes->node_list_val: 0x3da7da0 node_address n.address: 0x3d238d0 192.168.70.2:10300 getstart group_id 84bca5ce state 3551 action xa_terminate new state x_start state 3489 action xa_exit Exiting xcom thread new state x_start [XCOM BINDING DEBUG] ::leave():: Installing Leave view [XCOM_BINDING_DEBUG] ::install_view():: No exchanged data [XCOM_BINDING_DEBUG] ::install_view():: View delivered to client handler= 750756806 [XCOM BINDING DEBUG] ::leave():: Exiting with error=GCS_OK |
So it looks like a member that has data inconsistencies might be reported as ONLINE erroneously, but whenever group replication is restarted it will fail and won’t be able to join to the cluster. It seems there should be better error handling when a data inconsistency is found.
What about the operational perspective?
It looks very limited, just a few variables and status counters, plus some status tables in performance schema as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | mysql> show global variables like '%group_repli%'; +---------------------------------------------------+---------------------------------------+ | Variable_name | Value | +---------------------------------------------------+---------------------------------------+ | group_replication_allow_local_lower_version_join | OFF | | group_replication_auto_increment_increment | 7 | | group_replication_bootstrap_group | OFF | | group_replication_components_stop_timeout | 31536000 | | group_replication_gcs_engine | xcom | | group_replication_group_name | 8a94f357-aab4-11df-86ab-c80aa9429562 | | group_replication_local_address | 192.168.70.3:10300 | | group_replication_peer_addresses | 192.168.70.2:10300,192.168.70.3:10300 | | group_replication_pipeline_type_var | STANDARD | | group_replication_recovery_complete_at | TRANSACTIONS_CERTIFIED | | group_replication_recovery_password | | | group_replication_recovery_reconnect_interval | 60 | | group_replication_recovery_retry_count | 86400 | | group_replication_recovery_ssl_ca | | | group_replication_recovery_ssl_capath | | | group_replication_recovery_ssl_cert | | | group_replication_recovery_ssl_cipher | | | group_replication_recovery_ssl_crl | | | group_replication_recovery_ssl_crlpath | | | group_replication_recovery_ssl_key | | | group_replication_recovery_ssl_verify_server_cert | OFF | | group_replication_recovery_use_ssl | OFF | | group_replication_recovery_user | rpl_user | | group_replication_start_on_boot | ON | +---------------------------------------------------+---------------------------------------+ 24 rows in set (0.00 sec) mysql> show global status like '%group_repli%'; +-----------------------------+-------+ | Variable_name | Value | +-----------------------------+-------+ | Com_group_replication_start | 8 | | Com_group_replication_stop | 7 | +-----------------------------+-------+ 2 rows in set (0.01 sec) mysql> show tables from performance_schema like '%group%'; +----------------------------------------+ | Tables_in_performance_schema (%group%) | +----------------------------------------+ | replication_group_member_stats | | replication_group_members | +----------------------------------------+ 2 rows in set (0.00 sec) |
Most of values above are self-descriptive. I still need to dig into it a bit more to find the function for some of them.
Conclusions:
So far the work done with group replication is very impressive. Of course there is still a long road to travel, but it doesn’t look to be fair to compare group replication against Galera, unless it is not a side by side comparison.
Even if I like the idea of using a legacy component, I don’t like the need to install and configure Corosync because it’s another piece of software that could eventually fail. Fortunately this can be avoided with the newer version of the plugin, which can use the new XCom communication framework. I tested both versions and using XCom is far easier to setup and configure; however, the error log file can become very verbose, maybe too verbose in my opinion.
With regards to installation and configuration it’s pretty easy once you find the proper way to do it. There are few variables to configure to have a working cluster and most of the settings works just fine by default (like group_replication_auto_increment_increment).
I would still like to have some automatic control on data inconsistency handling (like SST in Galera), but in my opinion this new feature can be a good approach to consider in the future when looking for high availability solutions. A lot of tests need to be done and I’d also like to see some benchmarks. These are just my first impressions and we should wait some time before seeing this feature as GA. Paraphrasing that song “it’s a long way to the top if you wanna rock ‘n’ roll.”
How does it perform compared to Galera with
1) larger network latencies like 100ms between replicas
2) larger groups like 5 replicas?
@Mark, for both questions I have the same answer 🙂
I haven’t done performance tests yet because this was just a first approach running some VMs on my laptop, I’m planning to write another post on installation and maybe a comparison against Galera, so far I can’t do a fair comparison since group replication is not GA yet and I haven’t seen or tried Galera against 5.7
The tests you mentioned looks interesting since performance here can be mostly penalized by network roundtrips so it worth a try to see what’s going on with large groups or bad networks (I guess it should be very similar to Galera)
Cheers.
Mark,
One thing I think is good to check is handling of larger transactions, especially how certification happens. Galera essentially “stalls” the whole cluster when large transaction is being certified, at least last time I checked which can be improved.