MariaDB no longer meeting your needs?
Migrate to Percona software for MySQL – an open source, production-ready, and enterprise-grade MySQL alternative.
In this blog post, I’ll look at point-in-time recovery (PITR) options for MySQL, MariaDB and Percona Server for MySQL.
It is a common good practice to extend data safety by having additional measures apart from regular data backups, such as delayed slaves and binary log backups. These two options provide the ability to restore the data to any given point in time, or just revert from some bad accidents. These methods have their limitations of course: delayed slaves only help if a deadly mistake is noticed fast enough, while full point-in-time recovery (PITR) requires the last full backup and binary logs (and therefore usually takes a lot of time).
How to reverse from disaster faster
Alibaba engineers and the MariaDB team implemented an interesting feature in their version of the mysqlbinlog tool: the --flashback
option. Based on ROW-based DML events, it can transform the binary log and reverse purposes. That means it can help undo given row changes extremely fast. For instance, it can change DELETE events to INSERTs and vice versa, and it will swap WHERE and SET parts of the UPDATE events. This simple idea can dramatically speed up recovery from certain types of mistakes or disasters.
The question is whether it works with non-MariaDB variants. To verify that, I tested this feature with the latest available Percona Server for MySQL 5.7 (which is fully compatible with upstream MySQL).
1 2 3 4 5 6 7 | master [localhost] {msandbox} ((none)) > select @@version,@@version_comment; +---------------+--------------------------------------------------------+ | @@version | @@version_comment | +---------------+--------------------------------------------------------+ | 5.7.21-20-log | Percona Server (GPL), Release 20, Revision ed217b06ca3 | +---------------+--------------------------------------------------------+ 1 row in set (0.00 sec) |
First, let’s simulate one possible deadly scenario: a forgotten WHERE in DELETE statement:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | master [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1; +----------+ | count(*) | +----------+ | 200 | +----------+ 1 row in set (0.00 sec) master [localhost] {msandbox} ((none)) > delete from test.sbtest1; Query OK, 200 rows affected (0.04 sec) slave1 [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec |
So, our data is lost on both the master and slave!
Let’s start by downloading the latest MariaDB server 10.2.x package, which I’m hoping has a mysqlbinlog tool that works with MySQL 5.7, and unpack it to some custom location:
1 2 3 | $ dpkg -x mariadb-server-10.2_10.2.13+maria~wheezy_amd64.deb /opt/maria/ $ /opt/maria/usr/bin/mysqlbinlog --help|grep flash -B, --flashback Flashback feature can rollback you committed data to a |
It has the function we are looking for. Now, we have to find the culprit transaction or set of transactions we want to revert. A simplified example may look like this:
1 2 3 | $ mysqlbinlog -v --base64-output=DECODE-ROWS mysql-bin.000002 > mysql-bin.000002.sql $ less mysql-bin.000002.sql |
By searching through the decoded binary log, we are looking for transactions that have wiped out the table test
.sbtest1
. It looks like this (as the table had 200 rows, it is pretty long, so I’ve pasting only the beginning and the end):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | BEGIN /*!*/; # at 291 #180314 15:30:34 server id 1 end_log_pos 348 CRC32 0x06cd193e Table_map: `test`.`sbtest1` mapped to number 111 # at 348 #180314 15:30:34 server id 1 end_log_pos 8510 CRC32 0x064634c5 Delete_rows: table id 111 ... ### DELETE FROM `test`.`sbtest1` ### WHERE ### @1=200 ### @2=101 ### @3='26157116088-21551255803-13077038767-89418462090-07321921109-99464656338-95996554805-68102077806-88247356874-53904987561' ### @4='51157774706-69740598871-18633441857-39587481216-98251863874' # at 38323 #180314 15:30:34 server id 1 end_log_pos 38354 CRC32 0x6dbb7127 Xid = 97 COMMIT/*!*/; |
It is very important to take the proper start and stop positions. We need the ones exactly after BEGIN and before the final COMMIT. Then, let’s test if the tool produces the reverse statements as expected. First, decode the rows to the .sql file:
1 | $ /opt/maria/usr/bin/mysqlbinlog --flashback -v --base64-output=DECODE-ROWS --start-position=291 --stop-position=38323 mysql-bin.000002 > mysql-bin.000002_flash.sql |
Inside, we find 200 of those. Looks good:
1 2 3 4 | ### INSERT INTO `test`.`sbtest1` ### SET ### @1=200 ... |
Since we verified the positions are correct, we can prepare a binary log file:
1 | $ /opt/maria/usr/bin/mysqlbinlog --flashback --start-position=291 --stop-position=38323 mysql-bin.000002 > mysql-bin.000002_flash.bin |
and load it back to our master:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | master [localhost] {msandbox} (test) > source mysql-bin.000002_flash.bin Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected, 1 warning (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.04 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected (0.00 sec) master [localhost] {msandbox} (test) > select count(*) from test.sbtest1; +----------+ | count(*) | +----------+ | 200 | +----------+ 1 row in set (0.00 sec) |
and double check they restored on slaves:
1 2 3 4 5 6 7 | slave1 [localhost] {msandbox} (test) > select count(*) from test.sbtest1; +----------+ | count(*) | +----------+ | 200 | +----------+ 1 row in set (0.00 sec) |
GTID problem
MariaDB has a completely different GTID implementation from MySQL and Percona Server. You can expect problems when decoding incompatible GTID enabled binary logs with MariaDB. As MariaDB’s mysqlbinlog does not support –start/stop-gtid options (even for its own implementation), we have to take the usual positions anyway. From a GTID-enabled binary log, for example, delete can look like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | # at 2300 #180315 9:37:31 server id 1 end_log_pos 2365 CRC32 0x09e4d815 GTID last_committed=1 sequence_number=2 rbr_only=yes /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; SET @@SESSION.GTID_NEXT= '00020996-1111-1111-1111-111111111111:2'/*!*/; # at 2365 #180315 9:37:31 server id 1 end_log_pos 2433 CRC32 0xac62a20d Query thread_id=4 exec_time=0 error_code=0 SET TIMESTAMP=1521103051/*!*/; BEGIN /*!*/; # at 2433 #180315 9:37:31 server id 1 end_log_pos 2490 CRC32 0x275601d6 Table_map: `test`.`sbtest1` mapped to number 108 # at 2490 #180315 9:37:31 server id 1 end_log_pos 10652 CRC32 0xe369e169 Delete_rows: table id 108 ... # at 42355 #180315 9:37:31 server id 1 end_log_pos 42386 CRC32 0xe01ff558 Xid = 31 COMMIT/*!*/; SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/; |
The tool seems to work, and transforms the delete transaction to a sequence of INSERTs. However, the server rejects it when we try to load it on a GTID-enabled master:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | master [localhost] {msandbox} ((none)) > source mysql-bin.000003.flash Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected, 1 warning (0.00 sec) Query OK, 0 rows affected (0.00 sec) ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON. ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON. ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON. ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON. ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON. Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected, 1 warning (0.00 sec) master [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec) |
Unfortunately, the solution here is either to disable GTID mode for the recovery time (which is surely tricky in replicated clusters), or try to add GTID-related information to the resulting binary log with the --flashback option. In my case, adding these lines worked (I used the next free available GTID sequence):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | $ diff -u mysql-bin.000003.flash mysql-bin.000003.flash.gtid --- mysql-bin.000003.flash 2018-03-15 10:20:20.080487998 +0100 +++ mysql-bin.000003.flash.gtid 2018-03-15 10:25:02.909953620 +0100 @@ -4,6 +4,10 @@ DELIMITER /*!*/; #180315 9:32:51 server id 1 end_log_pos 123 CRC32 0x941b189a Start: binlog v 4, server v 5.7.21-20-log created 180315 9:32:51 at startup ROLLBACK/*!*/; +# at 154 +#180315 9:37:05 server id 1 end_log_pos 219 CRC32 0x69e4ce26 GTID last_committed=0 sequence_number=1 rbr_only=yes +/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/; +SET @@SESSION.GTID_NEXT= '00020996-1111-1111-1111-111111111111:5'/*!*/; BINLOG ' sy+qWg8BAAAAdwAAAHsAAAAAAAQANS43LjIxLTIwLWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAACzL6paEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA @@ -724,6 +728,7 @@ '/*!*/; COMMIT /*!*/; +SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/; DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; |
1 2 3 4 5 6 7 8 9 | master [localhost] {msandbox} ((none)) > source mysql-bin.000003.flash.gtid (...) master [localhost] {msandbox} ((none)) > select count(*) from test.sbtest1; +----------+ | count(*) | +----------+ | 200 | +----------+ 1 row in set (0.00 sec |
Limitations
Obviously, flashback cannot help after DROP/TRUNCATE or other DDL commands. These are not transactional, and affected rows are never recorded in the binary log. It doesn’t work with encrypted or compressed binary logs either. But most importantly, to produce complete events that can reverse bad transactions, the binary format must be ROW. The row image also must be FULL:
1 2 3 4 5 6 7 | master [localhost] {msandbox} ((none)) > select @@binlog_format,@@binlog_row_image; +-----------------+--------------------+ | @@binlog_format | @@binlog_row_image | +-----------------+--------------------+ | ROW | FULL | +-----------------+--------------------+ 1 row in set (0.00 sec) |
If these conditions are not met (or if you’re dealing with a too-complicated GTID issue), you will have to follow the standard point-in-time recovery procedure.
It also will not recover changes made by a cascading foreign key.
Thank you for this post! Here is an awk one liner to re-inject
@@SESSION.GTID_NEXT
into the binary log:| awk 'BEGIN {"uuidgen -t" |& getline u} /^BEGIN/ {c += 1 ; print "SET @@SESSION.GTID_NEXT= \x27" u ":" c "\x27/*!*/;"} {print}'
This will create a one time, time-based UUID, and inject a
SET @@SESSION.GTID_NEXT
entry just before aBEGIN
statement in the binary log. Range is sequentially running:1
…:n
Also, this:
| sed -e s/', @@session.check_constraint_checks=1//g'
and so, for example, you would:
cat mysql-bin.000002_flash.bin | awk 'BEGIN {"uuidgen -t" |& getline u} /^BEGIN/ {c += 1 ; print "SET @@SESSION.GTID_NEXT= \x27" u ":" c "\x27/*!*/;"} {print}' | sed -e s/', @@session.check_constraint_checks=1//g' | mysql -h localhost
Thanks a lot.