Chasing bugs can be a tedious task, and multi-threaded software doesn’t make it any easier. Threads will be scheduled at different times, instructions will not have deterministic results, and in order for one to reproduce a particular issue, it might require the exact same threads, doing the exact same work, at the exact same time. As you can imagine, this is not straightforward.
Let’s say your database is crashing or even having a transient stall. By the time you get to it, the crash has happened and you are stuck restoring service quickly and doing after-the-fact forensics. Wouldn’t it be nice to replay the work from right before or during the crash and see exactly what was happening?
Record and Replay is a technique where we record the execution of a program allowing it to be replayed over and over producing the same result. Engineers at Mozilla have created RR, and basically, this open source tool allows you to record the execution of the software and replay it under the well-known GDB.
A Backup Problem
To demonstrate how powerful the tool is, we will be walking through how we used it to narrow down the issue from PXB-2180 (Special thanks to Satya Bodapati, who helped with all the InnoDB internals research for this bug).
In summary, we were seeing Percona XtraBackup crashing at the prepare stage (remember, always test your backup!). The crash was happening randomly, sometimes after the second incremental, sometimes after the 10th incremental, with no visible pattern.
The stack trace was also not always the same. It was crashing on different parts of InnoDB, but here we had one commonality from all crashes – it always happened while trying to apply a redo log record to the same block page and space id:
1 2 3 4 5 6 7 8 | #12 0x00000000015ad05f in recv_parse_or_apply_log_rec_body (type=MLOG_COMP_REC_INSERT, ptr=0x7f2849150556 "\003K4G", '\377' <repeats 13 times>, end_ptr=0x7f2849150573 "", space_id=<optimized out>, page_no=<optimized out>, block=0x7f2847d7da00, mtr=0x7f286857b4f0, parsed_bytes=18446744073709551615) at /home/marcelo.altmann/percona-xtrabackup/storage/innobase/log/log0recv.cc:2002 2002 ptr = page_cur_parse_insert_rec(FALSE, ptr, end_ptr, block, index, mtr); (gdb) p block->page->id +p block->page->id $3 = { m_space = 4294967294, m_page_no = 5 } |
Our suspicion was that the page layout on this block diverged between MySQL and XtraBackup. When working with these types of bugs, the crash is always the consequence of something that happened earlier, eg.: a crash on the sixth incremental backup could be the consequence of an issue that happened on the fourth incremental.
The main goal at this step is to prove and identify where the page layout has diverted.
With this information, we ran MySQL under RR and reran the backup until we saw the same issue at prepare. We can now replay the MySQL execution and check how it compares. Our idea is to:
- Read the LSNs for this same page before/after each backup prepare.
- Identify all changes to m_space = 4294967294 & m_page_no = 5 at mysqld.
Before we progress further, let’s explain a few things:
- m_space = 4294967294 correspond to the MySQL data dictionary (mysql.ibd) – dict0dict.h:1146
- On disk page, LSN is stored at the 16th byte of the page and has a size of 8 bytes – fil0types.h:66
- Pages are written sequentially to disk, as an example, for the default 16k page size, from bytes 1 to 16384 will have the data for page 0, from byte 16385 to 32768 data from page 1, and so on.
- Frame is raw data of a page – buf0buf.h:1358
Replaying the Execution
To start, let’s read what LSN we have on mysql.ibd for page five before the backup. We will be using od (check man od
for more information) and the information explained above:
1 2 3 | $ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd 0240020 00 00 00 00 01 10 21 85 0240030 |
And check if it matches an LSN stamp from mysqld. For that we will add a conditional breakpoint on the replay execution of MySQL at function buf_flush_note_modification:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | $ rr replay . . . . (rr) b buf_flush_note_modification if block->page->id->m_space == 4294967294 && block->page->id->m_page_no == 5 +b buf_flush_note_modification if block->page->id->m_space == 4294967294 && block->page->id->m_page_no == 5 Breakpoint 1 at 0x495beb1: file /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic, line 69. (rr) c [Switching to Thread 18839.18868] Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=17892965, end_lsn=17893015, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69 69 ut_ad(!srv_read_only_mode || ++rr-set-suppress-run-hook 1 (rr) p/x block->frame[16]@8 +p/x block->frame[16]@8 $1 = {[0x0] = 0x0, [0x1] = 0x0, [0x2] = 0x0, [0x3] = 0x0, [0x4] = 0x1, [0x5] = 0x10, [0x6] = 0x21, [0x7] = 0x85} (rr) |
We can see the LSN stamp from before the preparation of full backup and the first stamp from the replay session match. Time to prepare the backup, advance the replay execution, and recheck:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | xtrabackup --prepare --apply-log-only --target-dir=full/ . . . Shutdown completed; log sequence number 17897577 Number of pools: 1 210402 17:46:29 completed OK! $ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd 0240020 00 00 00 00 01 11 07 06 0240030 (rr) c +c Continuing. [Switching to Thread 18839.18868] Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=19077332, end_lsn=19077382, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69 69 ut_ad(!srv_read_only_mode || ++rr-set-suppress-run-hook 1 (rr) p/x block->frame[16]@8 +p/x block->frame[16]@8 $16 = {[0x0] = 0x0, [0x1] = 0x0, [0x2] = 0x0, [0x3] = 0x0, [0x4] = 0x1, [0x5] = 0x11, [0x6] = 0x7, [0x7] = 0x6} (rr) |
Same LSN stamp on both, server and backup. Time to move on and start to apply the incrementals:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc1/ . . . Shutdown completed; log sequence number 19082430 . . . 210402 18:12:20 completed OK! $ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd 0240020 00 00 00 00 01 23 19 06 0240030 (rr) c +c Continuing. Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=20262758, end_lsn=20262808, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69 69 ut_ad(!srv_read_only_mode || ++rr-set-suppress-run-hook 1 (rr) p/x block->frame[16]@8 +p/x block->frame[16]@8 $17 = {[0x0] = 0x0, [0x1] = 0x0, [0x2] = 0x0, [0x3] = 0x0, [0x4] = 0x1, [0x5] = 0x23, [0x6] = 0x19, [0x7] = 0x6} (rr) |
Once again, we have a matching LSN stamp on both sides. Moving to the next incremental:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc2/ . . . Shutdown completed; log sequence number 20269669 . . . 210402 18:15:04 completed OK! $ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd 0240020 00 00 00 00 01 35 2f 98 0240030 (rr) c +c Continuing. Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=21449997, end_lsn=21450047, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69 69 ut_ad(!srv_read_only_mode || ++rr-set-suppress-run-hook 1 (rr) p/x block->frame[16]@8 +p/x block->frame[16]@8 $18 = {[0x0] = 0x0, [0x1] = 0x0, [0x2] = 0x0, [0x3] = 0x0, [0x4] = 0x1, [0x5] = 0x35, [0x6] = 0x2f, [0x7] = 0x98} (rr) |
Incremental two applied and matching LSN stamp from mysqld. Let’s keep doing this until we find a mismatch:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 | xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc3/ . . . Shutdown completed; log sequence number 21455916 . . . 210402 18:18:25 completed OK! $ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd 0240020 00 00 00 00 01 47 4d 3f 0240030 (rr) c +c Continuing. Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=25529471, end_lsn=25529521, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69 69 ut_ad(!srv_read_only_mode || ++rr-set-suppress-run-hook 1 (rr) p/x block->frame[16]@8 +p/x block->frame[16]@8 $19 = {[0x0] = 0x0, [0x1] = 0x0, [0x2] = 0x0, [0x3] = 0x0, [0x4] = 0x1, [0x5] = 0x47, [0x6] = 0x4d, [0x7] = 0x3f} (rr) xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc4/ . . . Shutdown completed; log sequence number 23044902 . . . 210402 18:24:00 completed OK! $ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd 0240020 00 00 00 00 01 5f a3 26 0240030 (rr) c +c Continuing. Breakpoint 1, buf_flush_note_modification (block=0x7fd2df4ad750, start_lsn=27218464, end_lsn=27218532, observer=0x0) at /home/marcelo.altmann/percona-server/storage/innobase/include/buf0flu.ic:69 69 ut_ad(!srv_read_only_mode || ++rr-set-suppress-run-hook 1 (rr) p/x block->frame[16]@8 +p/x block->frame[16]@8 $242 = {[0x0] = 0x0, [0x1] = 0x0, [0x2] = 0x0, [0x3] = 0x0, [0x4] = 0x1, [0x5] = 0x9f, [0x6] = 0x3f, [0x7] = 0xc9} (rr) |
Ok, here we have something. Backup files jumped from 0x01474d3f
to 0x015fa326
when applying incremental four while the server moved from 0x01474d3f
to 0x019f3fc9
. Perhaps we missed some other place where we can update the LSN stamp of a page? But now, we are at a point in the future with our replay execution of the MySQL server.
Replaying the Execution Backward
Here is (yet) another very cool feature from RR, it allows you to replay the execution backward. To eliminate the possibility of missing a place that is also updating the LSN of this block, let’s add a hardware watchpoint on the block->frame memory address and reverse the execution:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | (rr) p block->frame +p block->frame $243 = (unsigned char *) 0x7fd2e0758000 "\327\064X[" (rr) watch *(unsigned char *) 0x7fd2e0758000 +watch *(unsigned char *) 0x7fd2e0758000 Hardware watchpoint 2: *(unsigned char *) 0x7fd2e0758000 (rr) disa 1 +disa 1 (rr) reverse-cont +reverse-cont +continue Continuing. Hardware watchpoint 2: *(unsigned char *) 0x7fd2e0758000 Old value = 215 '\327' New value = 80 'P' 0x0000000004c13903 in mach_write_to_4 (b=0x7fd2e0758000 "P\257\"\347", n=3610531931) at /home/marcelo.altmann/percona-server/storage/innobase/include/mach0data.ic:135 135 b[0] = static_cast<byte>(n >> 24); ++rr-set-suppress-run-hook 1 ++rr-set-suppress-run-hook 1 (rr) p/x buf_flush_init_for_writing::block->frame[16]@8 +p/x buf_flush_init_for_writing::block->frame[16]@8 $11 = {[0x0] = 0x0, [0x1] = 0x0, [0x2] = 0x0, [0x3] = 0x0, [0x4] = 0x1, [0x5] = 0x9f, [0x6] = 0x3f, [0x7] = 0xc9} (rr) reverse-cont +reverse-cont +continue Continuing. Hardware watchpoint 2: *(unsigned char *) 0x7fd2e0758000 Old value = 80 'P' New value = 43 '+' 0x0000000004c13903 in mach_write_to_4 (b=0x7fd2e0758000 "+k*\304", n=1353655015) at /home/marcelo.altmann/percona-server/storage/innobase/include/mach0data.ic:135 135 b[0] = static_cast<byte>(n >> 24); ++rr-set-suppress-run-hook 1 ++rr-set-suppress-run-hook 1 (rr) p/x buf_flush_init_for_writing::block->frame[16]@8 +p/x buf_flush_init_for_writing::block->frame[16]@8 $12 = {[0x0] = 0x0, [0x1] = 0x0, [0x2] = 0x0, [0x3] = 0x0, [0x4] = 0x1, [0x5] = 0x47, [0x6] = 0x4d, [0x7] = 0x3f} (rr) |
By replaying the execution backward we can see that indeed the server changed the LSN from 0x01474d3f
to 0x019f3fc9
. This confirms the issue is at incremental backup four as the LSN 0x015fa326
we see at end of incremental four was never a valid LSN at the server execution.
Root Cause
Now that we have limited the scope from six backups to a single one, things will become easier.
If we look closely at the log messages from the –prepare of the backup we can see that LSN of mysql.ibd matches the LSN stamp at the end of the backup:
1 2 3 4 5 6 7 8 9 10 11 12 13 | xtrabackup --prepare --apply-log-only --target-dir=full/ --incremental-dir=inc4/ . . . Shutdown completed; log sequence number 23044902 . . . 210402 18:24:00 completed OK! $ od -j $((16384 * 5 + 16)) -N 8 -t x1 full/mysql.ibd 0240020 00 00 00 00 01 5f a3 26 0240030 $ echo $(( 16#015fa326 )) 23044902 |
By checking the stack trace of the issue and examining further the block we have parsed we can see that this is innodb_dynamic_metadata index:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | (gdb) f 13 +f 13 #13 0x00000000015af3dd in recv_recover_page_func (just_read_in=just_read_in@entry=true, block=block@entry=0x7f59efd7da00) at /home/marcelo.altmann/percona-xtrabackup/storage/innobase/log/log0recv.cc:2624 2624 recv_parse_or_apply_log_rec_body(recv->type, buf, buf + recv->len, (gdb) p/x block->frame[66]@8 +p/x block->frame[66]@8 $4 = {[0x0] = 0x0, [0x1] = 0x0, [0x2] = 0x0, [0x3] = 0x0, [0x4] = 0x0, [0x5] = 0x0, [0x6] = 0x0, [0x7] = 0x2} |
You might be wondering where 66 came from; this is from examining position FIL_PAGE_DATA + PAGE_INDEX_ID. That gave us index ID 2. This is below 1024, which is reserved for Data Dictionary tables. By checking what is the second table on that list, we can see that it’s innodb_dynamic_metadata. With all this information summed up we can look at what the server does at shutdown, and it becomes clear what the issue is:
srv0start.cc:3965
1 2 3 4 5 6 7 8 | /** Shut down the InnoDB database. */ void srv_shutdown() { . . . /* Write dynamic metadata to DD buffer table. */ dict_persist_to_dd_table_buffer(); . . . } |
As part of the shutdown process, we are persisting dirty metadata back to the DD Buffer table (innodb_dynamic_metadata), which is wrong. Those changes will likely be persisted by the server and redo logged once the server performs a checkpoint. Also, more data can be merged together by the point of when the backup was taken and when the server itself persists this data to DD Tables. This is a result of the implementation of WL#7816 and WL#6204 which required Percona XtraBackup to change how it handles these types of redo records.
Summary
In this blog, we walked through the process of analyzing a real Percona XtraBackup bug. This bug exposes a challenge we face in various types of bugs, where the crash/malfunction is a consequence of something that happened way before, and by the time we have a stack trace/coredump, it is too late to perform a proper analysis. Record and Replay enabled us to consistently replay the execution of the source server, making it possible to narrow down the issue to where the root cause was.