Recently, I’ve been looking into issues with the interactions between MySQL asynchronous replication and Galera replication. In this blog post, I’d like to share what I’ve learned.

MySQL asynchronous replication and Galera replication

These interactions are complicated due to the number of factors involved (Galera replication vs. asynchronous replication, replication filters, and row-based vs. statement-based replication). So as a start, I’ll look at an issue that came up with setting up an asynchronous replication channel between two Percona XtraDB Cluster (PXC) clusters.

Here’s a view of the desired topology:

MySQL asynchronous replication

The Problem

We want to set up an asynchronous replication channel between two PXC clusters. We also set log-slave-updates on the async slave (PXC node 2a in the topology diagram).

This is an interesting configuration and results in unexpected behavior as the replication depends on the node where the operation was performed. Let’s use CREATE TABLE as an example.

  • Run CREATE TABLE on Node 1a.  The table replicates to Node 1b, but not to the nodes in cluster 2.
  • Run CREATE TABLE on Node 1b. The table replicates to all nodes (both cluster 1 and cluster 2).

Some background information

Understanding the problem requires some knowledge of MySQL threads. However, as a simplification, I’ll group the threads into three groups:

  • Main MySQL Client Thread: This thread handles the requests for the client connection (here the client is an external entity).
  • Async Replication Threads: There are multiple threads in use here, some handle I/O, and some apply the updates, but we will view them as a single entity for simplicity.
  • Galera Threads: There are also multiple threads here, similar to the Async Replication Threads. (The name Galera here refers to the underlying replication technology used by PXC.)

Why is the data not replicating?

In the first case (CREATE TABLE executed on Node1a)

  • The table is replicated from Node1a -> Node 1b via Galera replication.
  • The table is not replicated because the async replication threads are not picking up the changes.

In the second case (CREATE TABLE executed on Node 1b)

  • The table is replicated from Node1b -> Node 1a via Galera replication.
  • The table is replicated from Node1b -> Node 2a via async replication. This differs from the first case because the statement is executed on the Main MySQL client thread.  The async replication threads pick up the changes and send them to Node 2a.
  • The table is replicated from Node 2a -> Node 2b via Galera replication because log-slave-updates has been enabled on Node2a.

That last part is the important bit. We can view the Galera replication threads as another set of asynchronous replication threads. So if data is coming in via async replication, they have to be made visible to Galera by log-slave-updates.  This is true in the other direction also: log-slave-updates must be enabled for Galera to supply data to async replication.

The Solution

In this scenario, the answer is to set log-slave-updates on Node 1b (the async master) and on Node 2a (the async slave).

We set log-slave-updates on node 1b to allow the async threads to pickup the changes from the Galera threads.

We set log-slave-updates on node 2a to allow the Galera threads to pickup the changes from the async threads. Starting with PXC 5.7.17, calling START SLAVE on a PXC node will return an error unless log-slave-updates is enabled.

You must enable log-slave-updates on the node for data to be transferred between Galera and asynchronous replication.

Recommendations/Best Practices

If you plan to use MySQL asynchronous replication with Percona XtraDB Cluster (either as async master or slave), we recommend that you enable log-slave-updates on all nodes within the cluster. This to (1) to ensure that any async replication connections to/from the cluster work correctly and (2) to ensure that all the nodes within a cluster share the same configuration and behavior.

Recommended configuration diagram for the clusters:

MySQL asynchronous replicationYou May Also Like

Today’s developers are tasked with operating in an increasingly competitive business world. They must be ready to capitalize on new trends, as well as develop, test and deploy technologies faster than ever before. Enter MongoDB, which is as flexible as it is powerful. With it, developers can design extremely high-performance apps. Read our white paper, Why Developers Prefer MongoDB, for a discussion of each of MongoDB’s advantages and when they make the best business sense.

Embark on a seamless MySQL database setup with Percona Server for MySQL. Our solution brief guides you through establishing a robust, on-premises MySQL database, complete with failover and essential business continuity features.

Get the Solution Brief

7 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Ramesh S

Based on your recommended configuration diagram:

What is the purpose of enabling the *log-slave-updates* on Node 2a? based on my understanding, as this is under the *sync-replication* and the it has been taken care by galera itself.

Kenn Takara

Enabling log-slave-updates on node 2a will cause node 2a’s async replication thread to write its events to the binlog, thus allowing galera to see the events (galera is acting like any other async replication thread, thus this is like chained replication).

Masood Alam

What will happen when we use semi-sync replication instead of async between two cluster. Given two clusters are far apart, and latency can cause the thread to wait for slave response. Does this effect Galera internal replication and slow down the cluster due to async setup ?

Kenn Takara

I haven’t done much with semi-sync replication, so I can’t answer definitely. I can see this causing a slowdown in the cluster because the speed of the cluster is determined by the slowest node (so in this case it would probably be the node sending the transaction to the other cluster).

(1) galera cluster 1 tries to commit
(2) Before it can commit it sends the transaction to cluster2 (via semi-sync), and has to wait until cluster2 has acknowledged receipt of the data
(3) cluster2 receives the data and returns back
(4) cluster1 can now commit

Timur Solodovnikov

I was playing with such schema and I do not understand the issue. To make it working you need to enable log_bin= ON on all cluster members

Let’s imagine:
— 1. Run CREATE TABLE on Node 1a. The table replicates to Node 1b, but not to the nodes in cluster 2.:

Create statement will be in binary logs of all cluster members will not be?

— Run CREATE TABLE on Node 1b. The table replicates to all nodes (both cluster 1 and cluster 2).
The only difference in binary logs will be “server id” value, but the statement will appear in binary logs.

slave_log_updates is not an option that you need worry about.

Please correct me if I am wrong.

Kenn Takara

Hi Timur,

The issue is that the replication threads (both async replication and galera replication) do not write to the binlog unless log-slave-updates is set. So the events will not jump between async replication and galera replication unless log-slave-updates is set. So within cluster 1, the table will replicate but the binlog event may not.

Here’s the relevant section from the MySQL 5.7 documenation (https://dev.mysql.com/doc/refman/5.7/en/binary-log.html)

A replication slave server by default does not write to its own binary log any data modifications that are received from the replication master. To log these modifications, start the slave with the –log-slave-updates option in addition to the –log-bin option (see Section 16.1.6.3, “Replication Slave Options and Variables”). This is done when a slave is also to act as a master to other slaves in chained replication.

Daniel Hinrichs

Does this also require GTID or can we use bin-log’s?