Global Transaction IDs (GTIDs) are one of my favorite features of MySQL 5.6. The main limitation is that you must stop all the servers at the same time to allow GTID-replication. Not everyone can afford to take a downtime so this requirement has been a showstopper for many people. Starting with Percona Server 5.6.22-72.0 enabling GTID replication can be done without almost no downtime. Let’s see how to do it.

Implementation of the Facebook patch

Finding a solution to migrate to GTIDs with no downtime is not a new idea, and several companies have already developed their own patch. The 2 best known implementations are the one from Facebook and the one from Booking.com.

Both options have pros and cons, and we finally chose to port the Facebook patch and add a new setting (gtid_deployment_step).

Performing the migration

Let’s assume we have a master-slaves setup with 4 servers A, B, C and D. A is the master:

gtid_1
The 1st step is to take each slave out of rotation, one at a time, and set gtid_mode = ON and gtid_deployment_step = ON (and also log_bin, log_slave_updates and enforce_gtid_consistency).

gtid_2
gtid_deployment_step = ON means that a server will not generate GTIDs when it executes writes, but it will record a GTID in its binary log if it gets an event from the replication stream tagged with a GTID.

The 2nd step is to promote one of the slaves to become the new master (for instance C) and to disable gtid_deployment_step. It is a regular slave promotion so you should do it the same way you deal with planned slave promotions (for instance using MHA or your own scripts). Our patch doesn’t help you do this promotion.

At this point replication will break on the old master as it has gtid_mode = OFF and gtid_deployment_step = OFF.

gtid_3
Don’t forget that you need to use CHANGE MASTER TO MASTER_AUTO_POSITION = 1 to enable GTID-based replication.

The 3rd step is to restart the old master to set gtid_mode = ON. Replication will resume automatically, but don’t forget to set MASTER_AUTO_POSITION = 1.

gtid_4
The final step is to disable gtid_deployment_step on all slaves. This can be done dynamically:

and you should remove the setting from the my.cnf file so that it is not set again when the server is restarted.

Optionally, you can promote the old master back to its original role.

That’s it, GTID replication is now available without having restarted all servers at the same time!

Limitations

At some point during the migration, a slave promotion is needed. And at this point, you are still using position-based replication. The patch will not help you with this promotion so use your regular failover scripts. If you have no scripts to deal with that kind of situation, make sure you know how to proceed.

Also be aware that this patch provides a way to migrate to GTIDs with no downtime, but not a way to migrate away from GTIDs with no downtime. So test carefully and make sure you understand all the new stuff that comes with GTIDs, like the new replication protocol, or how to skip transactions.

Other topologies

If you are using master-master replication or multiple tier replication, you can follow the same steps. With multiple tier replication, simply start by setting gtid_mode = ON and gtid_deployment_step = ON for the leaves first.

Conclusion

If you’re interested by the benefits of GTID replication but if taking a downtime has always scared you, you should definitely download the latest Percona Server 5.6 and give it a try!

1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Jiao

online upgrade GTID cause binlog damage in high write QPS situation
https://bugs.launchpad.net/percona-server/+bug/1516919