Group Replication

During the last few weeks I've been testing and playing a bit with the new group replication plugin available for MySQL 5.7. Before continuing, I'd like to clarify some aspects: the plugin is only available in MySQL Labs and is not yet ready for production. The current version is 0.6. I used 5.7.9 GA running in a 3-node Vagrant cluster with CentOS 7.
As an additional note: I also tested the previous version of the plugin (0.5) against 5.7.8-rc, and there have been some good changes since, so I recommend starting with the GA version.

For my tests I followed the instructions from this post. It's not as straightforward as it looks: there were some issues that needed to be handled, but I finally managed to get a consistent 3-node cluster running:
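For reference, the quickest way to verify that all three members joined the group is to query performance_schema on any of the nodes; the table and column names below are as I recall them from this build, so treat them as an approximation:

    -- run on any member: all three nodes should show up as ONLINE
    SELECT MEMBER_ID, MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
      FROM performance_schema.replication_group_members;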

My impressions about installation

1- The previous version relied only on Corosync for group communication, meaning you needed to install and configure an extra piece of software. For those who aren't very familiar with it (like me), it can be a bit tricky and complex. As of version 0.6.0 for MySQL 5.7.9, a new communication engine called XCom was added. This is now the default communication framework and it is bundled with the plugin, so no extra piece of software is needed.

2- When you initialize the MySQL database (the mysqld --initialize option now replaces the mysql_install_db script) you need to disable binary logging in the configuration file; otherwise statements such as 'create database mysql' will be pushed to the binary logs and cause issues on nodes joining the cluster, due to errors like:
2015-10-21T20:18:52.059231Z 8 [Warning] Slave: Can't create database 'mysql'; database exists Error_code: 1007
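The workaround I ended up using is simply to keep binary logging out of the configuration file while initializing the datadir, and only add it back (together with the GTID settings) before starting the node and joining it to the group. Roughly, with illustrative paths and options:

    # /etc/my.cnf while initializing -- no binary logging yet
    [mysqld]
    datadir = /var/lib/mysql
    # log-bin           = mysql-bin    <- add back only after --initialize
    # log-slave-updates

    # initialize the datadir (this replaces mysql_install_db in 5.7)
    mysqld --defaults-file=/etc/my.cnf --initialize-insecure --user=mysql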

3- In group replication there isn't a concept like Galera's SST (State Snapshot Transfer), which basically drops and recreates the datadir when it finds data discrepancies. With group replication you may end up with different datasets on different members, and replication will continue working (more on this later in the post).

4- For incremental recovery (the equivalent of IST in Galera), group replication relies on the binary logs present on any of the potential donors (at the moment the donor is selected randomly). So, if a node is disconnected, when it comes back online it requests binary logs from the donor using the same IO thread as regular replication. The problem here is that if the binary logs were purged on the donor, the joiner can't be recovered and a full rebuild is needed. This is similar to the gcache in Galera, except that when the gcache can't provide the transactions needed for IST, an SST is performed instead; group replication can't do this (yet?).
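A quick manual check to anticipate this problem is to compare the joiner's executed GTID set with what has already been purged on a potential donor; if the donor's gtid_purged contains transactions the joiner is missing, recovery from that donor won't work and a rebuild is needed. This check is something I do by hand, not something the plugin does for you:

    -- on the node that needs to catch up
    SELECT @@GLOBAL.gtid_executed;

    -- on a potential donor: anything listed here is already gone from its binary logs
    SELECT @@GLOBAL.gtid_purged;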

These are some of the installation issues I faced. Now that we have the cluster running, what works? Well, let's try a few simple tests.

Simple write tests
I tried running simple write operations (a few inserts, create tables and so on) using sysbench, like this:
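The exact command line is not that important; it was a standard sysbench 0.5 OLTP prepare/run against a single node, along these lines (host, credentials and the path to oltp.lua are just examples from my Vagrant setup):

    sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
      --mysql-host=node1 --mysql-user=sbtest --mysql-password=sbtest \
      --mysql-db=sbtest --oltp-table-size=10000 prepare

    sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
      --mysql-host=node1 --mysql-user=sbtest --mysql-password=sbtest \
      --mysql-db=sbtest --oltp-table-size=10000 \
      --num-threads=4 --max-requests=0 --max-time=60 run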

Then I checked the status on the other nodes. It does what it is supposed to do; the data and records are found on the rest of the nodes, just like this:
Node1:

Node2:

Well, we expected this, so yay, we're fine.
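For the record, the check on each node was nothing fancy, just a row count and a table checksum on the sysbench table (assuming the default sbtest schema):

    -- run on every member; counts and checksums should match across nodes
    SELECT COUNT(*) FROM sbtest.sbtest1;
    CHECKSUM TABLE sbtest.sbtest1;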

What about trying to write to 2 nodes at the same time? This should trigger things like conflict resolution during certification; in a nutshell, if we expect to use group replication to write to multiple nodes at the same time, we need a way to resolve conflicts in the data. The most common ones are PK/UK violations, in other words 2 transactions trying to insert the same record/id. This is not recommended because it is not an approach we can use to scale up writes (same as with Galera), but it's still possible to do.
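To picture it, think of the same insert being executed on two members at (almost) the same time; only one of the two transactions passes certification, the other one is rolled back on its local node (test.t1 below is just an illustrative table):

    -- Node1:
    INSERT INTO test.t1 (id, v) VALUES (100, 'from node1');

    -- Node2, at (almost) the same time:
    INSERT INTO test.t1 (id, v) VALUES (100, 'from node2');

    -- only one of these commits group-wide; the other fails certification
    -- and is rolled back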

An easy way to test this is to run sysbench against more than one member of the cluster and wait for a failure. As expected, it does what it is supposed to do:
Node1:

Node2:

Node2 eventually failed, but what happened? Let's check the error log to see what it reports:

At the commit stage there was a conflict with a transaction already committed on Node1, so the plugin forced a failure and a rollback of the operation. So far, so good.

What about a node going down?
One of the tests I ran was to kill one of the nodes during operations to see if it resumes replication properly when it comes back to life. For this we need to set some variables in the configuration file, as follows:
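The relevant settings are the recovery credentials, plus making sure group replication starts automatically with mysqld. The variable names below are how I recall them from this labs build, so treat them as an approximation (and note that the credentials sit in plain text in the configuration file):

    [mysqld]
    # credentials used by the recovery channel when a member (re)joins
    group_replication_recovery_user     = rpl_user
    group_replication_recovery_password = rpl_pass
    # start group replication automatically when mysqld starts
    group_replication_start_on_boot     = ON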

Note: Interestingly, the replication credentials are not saved into a table (as is done with slave_master_info in regular replication). I guess this is still on the to-do list, but it's something to keep in mind, since it implies a security risk.

Back to our test. I ran the regular sysbench command against one of my nodes, then went to Node2 and killed the mysqld daemon. After the usual crash recovery messages we can see:

During this process we can check the status in any node as follows:
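In practice this means watching the member state of the restarted node flip from RECOVERING to ONLINE, with something like:

    SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
      FROM performance_schema.replication_group_members;
    -- the restarted node shows up as RECOVERING first, then ONLINE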

Again, as expected, the node connected to the cluster, requested binary logs from the latest executed GTID position and applied the remaining changes to get back online.

The final test I've done so far is about data consistency. For example, what if I stop group replication on a node and make some data changes? When it rejoins the group, will it send those changes?
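Conceptually the test boils down to a handful of statements; I'm sketching it here with the START/STOP GROUP_REPLICATION syntax for readability (the labs build may expose this differently), and test.t1 is again just an illustrative table:

    -- Node2: take the member out of the group and write locally
    STOP GROUP_REPLICATION;
    INSERT INTO test.t1 (id, v) VALUES (999, 'only on node2');
    START GROUP_REPLICATION;

    -- Node1: the row with id 999 never shows up here
    SELECT * FROM test.t1 WHERE id = 999;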

Let’s see a very simple example:
Node2:

Node1:

And now Node2 again:

Hmmmm, not cool, what if I try to remove a row from Node2?

Hmmmm, strange: everything seems to be working correctly. Or is it? Let's check Node1 again:

So it looks like a member that has data inconsistencies might erroneously be reported as ONLINE, but whenever group replication is restarted it will fail and won't be able to join the cluster. It seems there should be better error handling when a data inconsistency is found.

What about the operational perspective?
It looks very limited: just a few variables and status counters, plus some status tables in performance_schema, as follows:
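Essentially, besides the plugin's own variables, what you get is a couple of performance_schema tables; the exact set may differ between this labs build and newer releases, but the ones I looked at were along these lines:

    SHOW GLOBAL VARIABLES LIKE 'group_replication%';
    SELECT * FROM performance_schema.replication_group_members;
    SELECT * FROM performance_schema.replication_group_member_stats;
    SELECT * FROM performance_schema.replication_connection_status;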

Most of the values exposed there are self-descriptive; I still need to dig in a bit more to figure out the purpose of some of them.

Conclusions:

So far the work done on group replication is very impressive. Of course, there is still a long road ahead, and it doesn't seem fair to compare group replication against Galera in a side-by-side fashion just yet.

Even though I like the idea of reusing a legacy component, I don't like the need to install and configure Corosync, because it's another piece of software that could eventually fail. Fortunately this can be avoided with the newer version of the plugin, which can use the new XCom communication framework. I tested both versions, and XCom is far easier to set up and configure; however, the error log file can become very verbose, maybe too verbose in my opinion.
With regard to installation and configuration, it's pretty easy once you find the proper way to do it. There are only a few variables to configure to get a working cluster, and most of the settings work just fine at their defaults (like group_replication_auto_increment_increment).

I would still like to have some automatic handling of data inconsistencies (like SST in Galera), but in my opinion this new feature could be a good option to consider in the future when looking for high availability solutions. A lot of testing still needs to be done, and I'd also like to see some benchmarks. These are just my first impressions, and we will probably have to wait some time before seeing this feature go GA. Paraphrasing that song: "it's a long way to the top if you wanna rock 'n' roll."

Comments

Mark Callaghan

How does it perform compared to Galera with
1) larger network latencies like 100ms between replicas
2) larger groups like 5 replicas?

Peter Zaitsev

Mark,

One thing I think is good to check is the handling of larger transactions, especially how certification happens. Galera essentially "stalls" the whole cluster while a large transaction is being certified (at least the last time I checked), which could be improved.