One new feature in recent Percona XtraDB Cluster (PXC) releases is the ability for an existing cluster to auto-bootstrap after an all-node-down event. Suppose you lose power on all nodes simultaneously, or something similar happens to your cluster. Traditionally this meant manually re-bootstrapping the cluster, but not anymore.
How it works
Given the above all-down situation, if all nodes are able to restart and see each other, such that they all agree what the last state was and that every node has returned, then the nodes decide it is safe to recover the PRIMARY state as a whole.
This requires:
- All nodes went down hard, that is, via a kill -9, kernel panic, server power failure, or similar event
- All nodes from the last PRIMARY component are restarted and are able to see each other again.
Demonstration
Suppose I have a 3-node cluster in a stable state. I then kill all nodes simultaneously (simulating a power failure or similar event):
[root@node1 ~]# killall -9 mysqld
[root@node2 ~]# killall -9 mysqld
[root@node3 ~]# killall -9 mysqld
I can see that each node maintained a state file in its datadir called ‘gvwstate.dat’. This contains the last known view of the cluster:
[root@node1 ~]# cat /var/lib/mysql/gvwstate.dat
my_uuid: 78caedfe-75a5-11e4-ac69-fb694ee06530
#vwbeg
view_id: 3 78caedfe-75a5-11e4-ac69-fb694ee06530 9
bootstrap: 0
member: 78caedfe-75a5-11e4-ac69-fb694ee06530 0
member: 87da2387-75a5-11e4-900f-ba49ecdce584 0
member: 8a25acd8-75a5-11e4-9e22-97412a1263ac 0
#vwend
This file will not exist on a node that was shut down cleanly; it is written only when mysqld was uncleanly terminated. For the auto-recovery to work, this file should exist and be the same on all the nodes.
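Before restarting, it can be worth confirming the views actually match. A minimal sketch of that check is below; it recreates the sample gvwstate.dat shown above so it is self-contained, but on a real node you would point GVW at /var/lib/mysql/gvwstate.dat and compare the checksum across nodes:

```shell
# Sketch: sanity-check a node's gvwstate.dat before restarting the cluster.
# The file contents are the sample from this post; on a real node, read
# /var/lib/mysql/gvwstate.dat directly instead of recreating it.
GVW=${GVW:-/tmp/gvwstate.dat}

cat > "$GVW" <<'EOF'
my_uuid: 78caedfe-75a5-11e4-ac69-fb694ee06530
#vwbeg
view_id: 3 78caedfe-75a5-11e4-ac69-fb694ee06530 9
bootstrap: 0
member: 78caedfe-75a5-11e4-ac69-fb694ee06530 0
member: 87da2387-75a5-11e4-900f-ba49ecdce584 0
member: 8a25acd8-75a5-11e4-9e22-97412a1263ac 0
#vwend
EOF

# The last known view and the member count; every node's file should agree.
VIEW=$(grep '^view_id:' "$GVW")
MEMBERS=$(grep -c '^member:' "$GVW")
echo "view:    $VIEW"
echo "members: $MEMBERS"

# Comparing this checksum across all nodes is a quick way to confirm the
# files describe the same last PRIMARY view.
md5sum "$GVW"
```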
I can now restart all 3 nodes more or less at the same time. Note that none of these nodes are bootstrapping and all of the nodes have the wsrep_cluster_address set to a proper list of the nodes in the cluster:
[root@node1 ~]# service mysql start
[root@node2 ~]# service mysql start
[root@node3 ~]# service mysql start
I can indeed see that they all start successfully and enter the primary state:
[root@node1 ~]# mysql -e "show global status like 'wsrep_cluster%'"
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 0                                    |
| wsrep_cluster_size       | 3                                    |
| wsrep_cluster_state_uuid | 1ba6f69a-759b-11e4-89ba-62a713a26cd1 |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+
[root@node2 ~]# mysql -e "show global status like 'wsrep_cluster%'"
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 0                                    |
| wsrep_cluster_size       | 3                                    |
| wsrep_cluster_state_uuid | 1ba6f69a-759b-11e4-89ba-62a713a26cd1 |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+
[root@node3 ~]# mysql -e "show global status like 'wsrep_cluster%'"
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 0                                    |
| wsrep_cluster_size       | 3                                    |
| wsrep_cluster_state_uuid | 1ba6f69a-759b-11e4-89ba-62a713a26cd1 |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+
Checking the logs, I can see this indication that the feature is working:
2014-11-26 19:59:36 1809 [Note] WSREP: promote to primary component
2014-11-26 19:59:36 1809 [Note] WSREP: view(view_id(PRIM,78caedfe,13) memb {
        78caedfe,0
        87da2387,0
        8a25acd8,0
} joined {
} left {
} partitioned {
})
2014-11-26 19:59:36 1809 [Note] WSREP: save pc into disk
2014-11-26 19:59:36 1809 [Note] WSREP: clear restored view
Changing this behavior
This feature is enabled by default, but you can toggle it off with the pc.recovery setting in wsrep_provider_options.
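For example, to turn it off explicitly you could add something like the following to my.cnf. Note that wsrep_provider_options is a single string, so this would need to be merged with any provider options you already set:

```ini
# my.cnf (sketch): disable automatic PRIMARY-component recovery.
# pc.recovery defaults to true; setting it to false restores the old
# behavior of requiring a manual bootstrap after a full-cluster outage.
[mysqld]
wsrep_provider_options="pc.recovery=false"
```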
This feature helps cover an edge case where manual bootstrapping was previously necessary to recover properly. It was added in Percona XtraDB Cluster 5.6.19, but was broken due to this bug. It was fixed in PXC 5.6.21.
Hi Jay,
I was testing this feature out on my VMs and I am having trouble starting mysql on all 3 of the nodes consistently.
Sometimes I was able to start mysql on all 3 nodes after using the killall command listed in the tutorial, but other times I was not able to start mysql on the nodes.
Also, I was wondering what scenarios do you recommend using this feature?
@Antonio — Are you using the latest release? It had some issues prior to that. You’d need to check the logs when it fails to recover, and also confirm that they all have a gvwstate.dat file in their datadirs before the restart.
As for usage cases: it’s enabled by default, and it should gracefully handle auto-recovery in the off chance that you have a full cluster outage. The standard use case is a power failure and then recovery — once all the nodes from the last PRIMARY state recover, the cluster should auto-bootstrap itself.
Jay,
I wonder how we can find which node was actually discovered to be the latest, really was PRIMARY, and gave IST to the others (hopefully).
I am testing this and I see:
NODE1:
2014-12-05 17:03:25 2241 [Note] WSREP: promote to primary component
NODE2:
2014-12-05 17:03:25 2194 [Note] WSREP: promote to primary component
What I’m doing is shutting the boxes off with a 5-second delay between them, and I want to ensure the last box down is actually picked, so we indeed have the latest state.
@Peter — my understanding is that this works by auto-rejoining the last PRIMARY component (if any). The only reason nodes might have different GTIDs is that apply is asynchronous. My understanding is that state transfer will happen normally in that case (possibly even a full SST), but I believe this happens after the decision to go primary is reached. In that case, only the node(s) with the highest GTID should continue, while the others perform state transfer.
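To see which node is most advanced yourself, you can compare the last committed seqno from each node. A minimal sketch is below; the node names and seqno values are made-up sample data. On a real node the seqno comes from grastate.dat in the datadir, or, after an unclean shutdown (where grastate.dat often shows -1), from running mysqld_safe --wsrep-recover and checking the recovered position in the error log:

```shell
# Sketch: pick the node with the most advanced replication state.
# The node/seqno pairs below are invented sample values for illustration;
# collect the real ones from each node's grastate.dat or wsrep-recover output.
cat > /tmp/seqnos.txt <<'EOF'
node1 120
node2 125
node3 123
EOF

# Numeric sort on the seqno column; the last line is the most advanced node.
BEST=$(sort -k2,2 -n /tmp/seqnos.txt | tail -1)
echo "most advanced: $BEST"
```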
Jay,
You say that the cluster will recover if all nodes are started and they are able to recover the PRIMARY component. What will happen if they cannot recover the PRIMARY component for some reason? Will the nodes be left running but not replicating? Will the user be able to access the database?
Thanks,
@Morgan Tocker — If the nodes cannot recover, then they should eventually timeout and exit. If this happens, you can manually bootstrap one and restart the others normally. During the timeout I would not expect apps to have any access at all.
Been playing around with this. Adding some additional help for people if they come across this.
If you do lose your whole cluster, then for this to work, all of the nodes listed in gvwstate.dat need to come back online. If one machine does not come back after a power outage, for instance, you can edit gvwstate.dat to remove the dead host’s UUID and restart.
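A sketch of that edit, using the sample file and UUIDs shown earlier in the post (substitute your own, and double-check against your actual file before trying this on a real cluster):

```shell
# Sketch: remove a dead member from gvwstate.dat so the surviving nodes can
# still agree on the restored view. The file and UUIDs are the sample data
# from this post; on a real node, edit /var/lib/mysql/gvwstate.dat instead.
GVW=${GVW:-/tmp/gvwstate.dat}
DEAD_UUID="8a25acd8-75a5-11e4-9e22-97412a1263ac"   # node that will not return

cat > "$GVW" <<'EOF'
my_uuid: 78caedfe-75a5-11e4-ac69-fb694ee06530
#vwbeg
view_id: 3 78caedfe-75a5-11e4-ac69-fb694ee06530 9
bootstrap: 0
member: 78caedfe-75a5-11e4-ac69-fb694ee06530 0
member: 87da2387-75a5-11e4-900f-ba49ecdce584 0
member: 8a25acd8-75a5-11e4-9e22-97412a1263ac 0
#vwend
EOF

cp "$GVW" "$GVW.bak"                       # keep a backup of the original
sed -i "/^member: $DEAD_UUID/d" "$GVW"     # drop the dead node's member line

REMAINING=$(grep -c '^member:' "$GVW")
echo "members remaining: $REMAINING"
```

The same edit would need to be made on each surviving node so their views still match, and the backup makes it easy to fall back to a plain manual bootstrap.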
The big question is: if you have to go through all that effort, is it easier to just re-bootstrap at that point?
@Brian — gvwstate.dat is really a best-effort auto-recovery from an all-down situation. If you’re already logged in, you may as well just restart the nodes yourself.
Thanks Jay
I have followed the steps, but when I try to restart with
service mysql start
ERROR! MySQL (Percona XtraDB Cluster) is not running, but PID file exists
So, I have to manually remove the PID file in order to start it.