The Problem
State Snapshot Transfer (SST) can be a very long and expensive process, depending on the size of your Percona XtraDB Cluster (PXC)/Galera cluster, as well as on network and disk bandwidth. Still, there are situations where it is needed, for example when a node has been separated long enough that the gcache on the other members no longer holds all the transactions it missed.
Let’s see how we can avoid SST and still recover quickly, without even needing a full backup from another node.
Below, I will present a simple scenario where one of the cluster nodes had a broken network connection for long enough that Incremental State Transfer (IST) was no longer possible.
For this solution to work, I am assuming that the cluster runs with binary logs and GTID mode enabled, and that the binary logs containing the missing transactions were not purged yet. It would still be possible without GTID, just slightly more complex.
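For reference, the prerequisites could look like this in my.cnf on all nodes (a minimal sketch with illustrative values, not the exact configuration from my test setup):

```ini
[mysqld]
# binary logging with GTIDs, required for the trick described below
log_bin                  = mysql-bin
log_slave_updates        = 1
gtid_mode                = ON
enforce_gtid_consistency = ON
```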
My example PXC member, node3, gets separated from the cluster due to a network outage. Its last applied transaction status is:
```
node3 > show global variables like 'gtid_executed';
+---------------+----------------------------------------------+
| Variable_name | Value                                        |
+---------------+----------------------------------------------+
| gtid_executed | 2cd15721-261a-ee14-4166-00c9b4945b0b:1-28578 |
+---------------+----------------------------------------------+
1 row in set (0.01 sec)

node3 > show status like 'wsrep_last_committed';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_last_committed | 28610 |
+----------------------+-------+
1 row in set (0.00 sec)
```
However, other available active nodes in the cluster have already rotated the gcache further:
```
node1 > show status like 'wsrep_local_cached_downto';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| wsrep_local_cached_downto | 42629 |
+---------------------------+-------+
1 row in set (0.00 sec)
```
Hence, after the network is restored, the node fails to re-join the cluster, as IST is no longer possible:
DONOR error log:
```
2021-06-30T21:52:02.199697Z 2 [Note] WSREP: IST request: d32ea8de-d9e5-11eb-be99-ff364b6ba4f4:28610-83551|tcp://127.0.0.1:27172
2021-06-30T21:52:02.199743Z 2 [Note] WSREP: IST first seqno 28611 not found from cache, falling back to SST
```
JOINER error log:
```
2021-06-30T21:52:02.139242Z 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 83551)
2021-06-30T21:52:02.139408Z 4 [Note] WSREP: State transfer required:
	Group state: d32ea8de-d9e5-11eb-be99-ff364b6ba4f4:83551
	Local state: d32ea8de-d9e5-11eb-be99-ff364b6ba4f4:28610
...
2021-06-30T21:52:02.200137Z 0 [Warning] WSREP: 1.0 (node1): State transfer to 0.0 (node3) failed: -61 (No data available)
2021-06-30T21:52:02.200171Z 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():805: State transfer request failed unrecoverably because the donor seqno had gone forward during IST, but SST request was not prepared from our side due to selected state transfer method (which do not supports SST during node operation). Restart required.
2021-06-30T21:52:02.200191Z 0 [Note] WSREP: gcomm: terminating thread
```
And node3 shuts down its service as a result.
The Solution
To avoid a full backup transfer from the donor, let’s use asynchronous replication to let the failed node catch up with the others, so that IST becomes possible again later.
To achieve that, let’s first modify the configuration file on the separated node, adding these settings to avoid accidental writes during the operation:
```
super_read_only = 1
skip_networking
```
and, to disable PXC mode for the time being, comment out the provider:
```
#wsrep-provider=/usr/lib64/libgalera_smm.so
```
Now, after a restart, node3 runs as a standalone MySQL node, without Galera replication. Let’s configure an asynchronous replication channel (the repl user was already created on all nodes):
```
node3 > CHANGE MASTER TO MASTER_HOST='localhost', MASTER_USER='repl', MASTER_PASSWORD='replpassword', MASTER_AUTO_POSITION=1, MASTER_PORT=27037;
Query OK, 0 rows affected, 2 warnings (0.03 sec)

node3 > start slave;
Query OK, 0 rows affected (0.00 sec)
```
And then wait for it to catch up with the source node. Once the replica is fully up to date, let’s stop it, remove the async channel configuration, and note its new GTID position:
```
node3 > stop slave;
Query OK, 0 rows affected (0.00 sec)

node3 > reset slave all;
Query OK, 0 rows affected (0.01 sec)

node3 > show global variables like 'gtid_executed';
+---------------+----------------------------------------------+
| Variable_name | Value                                        |
+---------------+----------------------------------------------+
| gtid_executed | 2cd15721-261a-ee14-4166-00c9b4945b0b:1-83553 |
+---------------+----------------------------------------------+
1 row in set (0.00 sec)
```
Now, we have to find the corresponding cluster wsrep sequence number (Xid) in the source’s binary log, like this:
```
$ mysqlbinlog mysql-bin.000005 | grep -A1000 '2cd15721-261a-ee14-4166-00c9b4945b0b:83553' | grep Xid | head -1
#210701  0:19:06 server id 100  end_log_pos 1010 CRC32 0x212d2592  Xid = 83557
```
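The manual grep can also be wrapped in a small helper. This is only a sketch of the same text processing (the function name is made up), demonstrated here against a synthetic two-line excerpt instead of a real binary log:

```shell
# Print the first Xid value that appears after a given GTID marker in
# mysqlbinlog output read from stdin. xid_after_gtid is a hypothetical
# helper, not a standard tool.
xid_after_gtid() {
  local gtid="$1"
  grep -A1000 "$gtid" \
    | grep 'Xid =' \
    | head -1 \
    | sed -n 's/.*Xid = \([0-9][0-9]*\).*/\1/p'
}

# Example with a synthetic excerpt resembling real mysqlbinlog output:
printf '%s\n' \
  "SET @@SESSION.GTID_NEXT= '2cd15721-261a-ee14-4166-00c9b4945b0b:83553'" \
  '#210701  0:19:06 server id 100  end_log_pos 1010 CRC32 0x212d2592  Xid = 83557' \
  | xid_after_gtid '2cd15721-261a-ee14-4166-00c9b4945b0b:83553'
# prints: 83557
```

Against a real log, you would pipe `mysqlbinlog mysql-bin.000005` into the function instead of `printf`.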
With this position, update the grastate.dat file on the failed node, so it looks as follows:
```
$ cat pxc_msb_pxc5_7_33/node3/data/grastate.dat
# GALERA saved state
version: 2.1
uuid:    d32ea8de-d9e5-11eb-be99-ff364b6ba4f4
seqno:   83557
safe_to_bootstrap: 0
```
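Editing the file by hand works fine; for completeness, here is a sketch of the same edit done with GNU sed (the helper name and path are just examples, and the node must be stopped while grastate.dat is modified):

```shell
# Replace the seqno line in grastate.dat with the Xid found in the binlog.
# update_seqno is a hypothetical helper; -i (in-place edit) assumes GNU sed.
update_seqno() {
  local file="$1" seqno="$2"
  sed -i "s/^seqno:.*/seqno:   ${seqno}/" "$file"
}

# Usage (example path):
# update_seqno pxc_msb_pxc5_7_33/node3/data/grastate.dat 83557
```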
The earlier configuration file modifications must now be reverted, and the service restarted again.
This time, IST was finally possible:
```
2021-06-30T22:26:10.563512Z 2 [Note] WSREP: State transfer required:
	Group state: d32ea8de-d9e5-11eb-be99-ff364b6ba4f4:85668
	Local state: d32ea8de-d9e5-11eb-be99-ff364b6ba4f4:83557
...
2021-06-30T22:26:28.860555Z 2 [Note] WSREP: Receiving IST: 2111 writesets, seqnos 83557-85668
2021-06-30T22:26:28.860812Z 0 [Note] WSREP: Receiving IST...  0.0% (   0/2111 events) complete.
2021-06-30T22:26:29.247313Z 0 [Note] WSREP: Receiving IST...100.0% (2111/2111 events) complete.
2021-06-30T22:26:29.247713Z 2 [Note] WSREP: IST received: d32ea8de-d9e5-11eb-be99-ff364b6ba4f4:85668
2021-06-30T22:26:29.247902Z 0 [Note] WSREP: 0.0 (node3): State transfer from 1.0 (node1) complete.
...
2021-06-30T22:26:29.248074Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 85668)
```
And node3 rejoins the cluster properly:
```
node3 > show global variables like 'gtid_executed';
+---------------+----------------------------------------------+
| Variable_name | Value                                        |
+---------------+----------------------------------------------+
| gtid_executed | 2cd15721-261a-ee14-4166-00c9b4945b0b:1-85664 |
+---------------+----------------------------------------------+
1 row in set (0.00 sec)

node3 > show status like 'wsrep_last_committed';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_last_committed | 85668 |
+----------------------+-------+
1 row in set (0.01 sec)
```
Summary
With the help of traditional asynchronous replication, we were able to bring the failed node back into the cluster faster and without the overhead of a full backup made by SST.
The only requirement for this method to work is that binary logging is enabled, with a long enough retention (rotation) policy.
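In 5.7, binary log retention can be controlled with expire_logs_days (8.0 uses binlog_expire_logs_seconds instead); the value below is just an example:

```ini
# keep binary logs long enough to cover the expected node downtime
expire_logs_days = 7
```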
I have tested this on version:
```
node3 > select @@version,@@version_comment\G
*************************** 1. row ***************************
        @@version: 5.7.33-36-49-log
@@version_comment: Percona XtraDB Cluster binary (GPL) 5.7.33-rel36-49, Revision a1ed9c3, wsrep_31.49
1 row in set (0.00 sec)
```
Unfortunately, a similar solution does not work with Percona XtraDB Cluster 8.0.x: because of the changed way wsrep positions are kept in the storage engine, the trick with updating grastate.dat does not work as expected there.
As a reminder: if a node is expected to stay separated from the cluster for a long time, there is a way to preserve a longer galera cache history for it, which may make the solution presented here unnecessary – check the relevant article: Want IST Not SST for Node Rejoins? We Have a Solution!
Great post, thanks! If GTID is not enabled, how would you do it? I guess by parsing the binary logs and grepping for Xid= to get the next correct position?
Thanks! Yes, without GTID it is very similar; in addition, the initial replication binlog position needs to be found in the binary log, based on the relevant Xid. Fortunately, Xids are consistent across all cluster nodes.
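A sketch of that lookup: take the event whose Xid equals the joiner’s wsrep_last_committed, and its end_log_pos is where async replication (CHANGE MASTER TO MASTER_LOG_FILE/MASTER_LOG_POS) should start. The helper name is made up, and it is shown here against a synthetic line rather than a live binlog:

```shell
# Print the end_log_pos of the mysqlbinlog event with a given Xid, read
# from stdin. pos_after_xid is a hypothetical helper, not a standard tool.
pos_after_xid() {
  local xid="$1"
  grep "Xid = ${xid}\$" \
    | sed -n 's/.*end_log_pos \([0-9][0-9]*\).*/\1/p' \
    | head -1
}

# Example against a synthetic mysqlbinlog line:
printf '%s\n' \
  '#210630 23:45:01 server id 100  end_log_pos 4096 CRC32 0x00000000  Xid = 28610' \
  | pos_after_xid 28610
# prints: 4096
```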
Does it only work for 5.7, or would a similar procedure work for Percona XtraDB Cluster 8.0 too?
This procedure fails with PXC 8.0, or at least I could not find any way to make it work.