In this blog, we’ll look at how to quickly add a node to an InnoDB Cluster or Group Replication using Percona XtraBackup.

Adding nodes to a Group Replication cluster can be easy (documented here), but it only works if the existing nodes have retained all the binary logs since the creation of the cluster. That is obviously the case if you create a new cluster from scratch, but the nodes rotate old logs away after some time. In practice, if the gtid_purged set is non-empty, you will need another method to add a new node to the cluster. You also need a different method if data becomes inconsistent across the cluster nodes for any reason, for example if you hit something similar to this bug, or fall prey to human error.

Hot Backup to the Rescue

The quick and simple method I’ll present here requires the Percona XtraBackup tool to be installed, as well as some additional small tools for convenience. I tested my example on CentOS 7, but it works similarly on other Linux distributions. First of all, you will need the Percona repository installed.
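For reference, on CentOS 7 this is usually done with the percona-release package, roughly like this (a sketch, not the exact command from my session):

[root@gr01 ~]# yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm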

Then, install Percona XtraBackup and the additional tools. You might need to enable the EPEL repo for the additional tools, and the experimental Percona repo for XtraBackup 8.0, which works with MySQL 8.0. (Note: XtraBackup 8.0 was still not GA at the time of writing this article, and we do NOT recommend or advise that you install XtraBackup 8.0 into a production environment until it is GA.) For MySQL 5.7, XtraBackup 2.4 from the regular repo is what you are looking for.
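As an illustration, assuming pv and nc (from nmap-ncat) as the small helper tools used later for streaming and monitoring (the exact tool choice here is mine), the 5.7 install could look like this:

[root@gr01 ~]# yum install -y epel-release
[root@gr01 ~]# yum install -y percona-xtrabackup-24 pv nmap-ncat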

You need to do this on both the source and destination nodes. Now, let's look at the state of my existing cluster node – gr01 (I will call it the donor).
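A quick way to inspect that state (a sketch of the queries rather than the actual output from my session):

mysql> SELECT MEMBER_HOST, MEMBER_STATE FROM performance_schema.replication_group_members;
mysql> SELECT @@GLOBAL.gtid_executed;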

The new node candidate (I will call it the joiner) – gr02 – has no data, but has the same MySQL version installed. It also has the required settings in place, such as the existing node's address in group_replication_group_seeds, etc. The next step is to stop the MySQL service on the joiner (if it is already running) and wipe out its datadir.
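Assuming a systemd-managed service called mysqld and the default datadir, that boils down to something like:

[root@gr02 ~]# systemctl stop mysqld
[root@gr02 ~]# rm -rf /var/lib/mysql/*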

Then start the “listener” process, which waits to receive the data snapshot (remember to open the TCP port if you have a firewall).
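A minimal sketch of such a listener, assuming nc for the transfer, gzip for the compression, and port 4444 (all of these are my arbitrary choices, not requirements):

[root@gr02 ~]# nc -l 4444 | pv | gunzip | xbstream -x -C /var/lib/mysql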

Then, start the backup job on the donor:
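Something along these lines, assuming credentials are available via root's .my.cnf and that gr02 resolves to the joiner (again, gzip, pv and nc are my own tool choices here):

[root@gr01 ~]# xtrabackup --backup --stream=xbstream --target-dir=/tmp | pv | gzip | nc gr02 4444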

On the joiner side, we will see the incoming stream being extracted, with pv reporting the transfer rate as the data arrives.

By the way, if you noticed the difference in transfer rate between the two, note that on the donor side I put pv before the compressor, while on the joiner I put it before the decompressor. This way, I can monitor the compression ratio at the same time!

The next step will be to prepare the backup on the joiner.
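Since the stream was extracted straight into the datadir in this layout, the prepare phase can run against it directly (a sketch, assuming the default path):

[root@gr02 ~]# xtrabackup --prepare --target-dir=/var/lib/mysql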

and fix the file ownership.
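For example, assuming the default datadir:

[root@gr02 ~]# chown -R mysql:mysql /var/lib/mysql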

Now we should verify the GTID position information and restart the joiner (I have group_replication_start_on_boot=off in my.cnf).
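The GTID set can be read from the xtrabackup_binlog_info file that XtraBackup leaves in the backup, after which the service can be started again (path and service name assumed):

[root@gr02 ~]# cat /var/lib/mysql/xtrabackup_binlog_info
[root@gr02 ~]# systemctl start mysqld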

Now, let’s check if the position reported by the node is consistent with the above:
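Again with a simple query:

mysql> SELECT @@GLOBAL.gtid_executed;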

No, it is not. We have to correct it:
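A sketch of the correction, where the GTID set must be the one reported in xtrabackup_binlog_info (the group UUID and interval below are placeholders; use the values from your own file):

mysql> RESET MASTER;
mysql> SET GLOBAL gtid_purged='aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-302662';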

Finally, start the replication:
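With group_replication_start_on_boot=off, this is done manually (if the recovery channel credentials are not present on the joiner, they would first need to be set with CHANGE MASTER TO ... FOR CHANNEL 'group_replication_recovery'):

mysql> START GROUP_REPLICATION;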

Let’s check the cluster status again:
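Using the same membership query as before:

mysql> SELECT MEMBER_HOST, MEMBER_STATE FROM performance_schema.replication_group_members;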

OK, our cluster is consistent! The new node joined successfully as a secondary. We can proceed to add more nodes!

Comments
Mike

It seems that, to simplify this article, no writes are arriving on gr01.

Before you start Percona XtraBackup, we see that gr01 has:

gtid_executed = 1-302662

Next you take a backup and prepare it on gr02, which then has:

xtrabackup_binlog_info = 1-302662

My question is, why does gr02 have a smaller @@GLOBAL.gtid_executed, requiring set global gtid_purged?

I thought maybe these transactions were FLUSH statements or similar, but remember that they occurred before PXB was started on gr01.

mppfor manu

I followed the above for MySQL 5.7.25 to recover a corrupted node. If you have the same situation, then I would add one additional step.

When I started the recovered node again, it had a new server UUID, which meant a mismatch between it and the cluster’s current metadata. To overcome this, I saved the “old” auto.cnf file from the previous server and copied it back into the MySQL data directory prior to restarting the recovered node.
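For illustration only, assuming the default datadir, that extra step could look like this (save the file before wiping the directory, and put it back before the restart):

[root@gr02 ~]# cp /var/lib/mysql/auto.cnf /root/auto.cnf.saved
[root@gr02 ~]# cp /root/auto.cnf.saved /var/lib/mysql/auto.cnf
[root@gr02 ~]# chown mysql:mysql /var/lib/mysql/auto.cnf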

In the case of a total loss of the server, you could follow the procedure as written and then remove the old instance.

Thank you for a truly excellent article. It worked very well.

Andres Ofner

Great article, thanks! Just to let you know, there is one more caveat: if you execute these steps to “clone” a node (e.g. in a group replication setup, like I did), it will complain when restarting replication with ERROR 3094, logging “Slave SQL for channel ‘group_replication_applier’: Slave failed to initialize relay log info structure from the repository, Error_code: MY-013124” to the mysql log.
This is caused by incorrect filenames in the mysql.slave_relay_log_info table, which still reference the filenames from the original node, while xtrabackup has already taken care of the actual files and renamed them correctly. For me the simple solution was to “RESET SLAVE” on the clone; truncating the above table might work as well.
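For reference, that fix boils down to running the following on the clone before starting Group Replication:

mysql> RESET SLAVE;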