WiredTiger File Forensics Part 3: Viewing all the MongoDB Data

This article continues on from “Part 1: Building wt” and “Part 2: wt dump” to show how to extract any of your MongoDB documents directly from WiredTiger’s raw data files. It’ll also show how to take a peek into the index files. Lastly, it’ll show how to look in the WT transaction log to see updates made since the latest checkpoint.

⚠ Warning: the wt dump tool usually opens files in read-write mode, even for commands you’d think would be read-only. It will automatically step through its normal recovery process most of the time, so it may change files.
Until you know its effects on data files do not use it on your only copy of precious data – make a copy of the data directory and learn with the copy first.

List up the Collections and Indexes

WiredTiger doesn’t name any of its data files according to the MongoDB object names, so as a first step you’ll have to extract a table of WT idents (=identifiers) vs. collection and index names.

The _mdb_catalog.wt file is not the top table in the WiredTiger storage engine’s own hierarchy of data sources. To the MongoDB layer of code though it is the complete definition of collection and index objects in the database. This includes both MongoDB system and user-made collections and indexes.

Dump WT ident vs Collections

As explained in the previous Part 2: wt dump blog post, you should reuse this command to save a tab-delimited file of WiredTiger ident(ifier)s vs collection names. I’ll call this file wt_ident_vs_collection_ns.tsv in this article.

$ #cd to a copy of a MongoDB data directory
$ wt dump -x table:_mdb_catalog | tail -n +7 | awk 'NR%2 == 0 { print }' | xxd -r -p | bsondump --quiet | jq -r 'select(. | has("md")) | [.ident, .ns] | @tsv' | sort > wt_ident_vs_collection_ns.tsv
$ 
$ head -n 5 wt_ident_vs_collection_ns.tsv
collection-0--4131298130356306083	config.cache.chunks.test.bar
collection-0-5834121039240263510	local.replset.initialSyncId
collection-0-5841128870485063129	local.startup_log
collection-0--6422702119521843596	config.system.sessions
collection-10--4131298130356306083	config.cache.chunks.test.foo

Eg. if an ident is “collection-4-5841128870485063129” then there will be a file collection-4-5841128870485063129.wt that has the data of a MongoDB collection.

In case you have a vague memory of seeing strings like these somewhere whilst using the mongo shell, you probably have. These idents are the same as the ones shown in the “wiredTiger.uri” field of the db.collection.stats() command output.
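Since you’ll be going from a namespace to a file name again and again, a tiny lookup helper can save some typing. This is only a sketch: the function name wt_file_for_ns is made up for this article (it is not part of wt or MongoDB), and it assumes the tab-delimited wt_ident_vs_collection_ns.tsv file built above, either in the current directory or given as a second argument.

```shell
# Hypothetical helper (not part of wt or MongoDB): print the .wt file name
# for a collection namespace, using the TSV file built above.
# Usage: wt_file_for_ns <namespace> [tsv_file]
wt_file_for_ns() {
  # Column 1 is the WT ident, column 2 is the MongoDB namespace.
  awk -F '\t' -v ns="$1" '$2 == ns { print $1 ".wt" }' \
    "${2:-wt_ident_vs_collection_ns.tsv}"
}
```

With the sample rows shown above, “wt_file_for_ns local.startup_log” would print collection-0-5841128870485063129.wt. It prints nothing if the namespace is not in the file.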

Optional: Dump WT ident vs Indexes

The index WT idents can be dumped as well. The example below will save them to wt_ident_vs_index_ns.tsv in three columns [WT ident, collection name, index name].

$ wt dump -x table:_mdb_catalog | tail -n +7 | awk 'NR%2 == 0 { print }' | xxd -r -p | bsondump --quiet | jq -r 'select(. | has("idxIdent")) | .ns as $nsT | .idxIdent | to_entries[] | [.value, $nsT, .key] | @tsv' | sort > wt_ident_vs_index_ns.tsv
$
$ head -n 5 wt_ident_vs_index_ns.tsv
index-11--4131298130356306083	config.cache.chunks.test.foo	_id_
index-12--4131298130356306083	config.cache.chunks.test.foo	lastmod_1
index-12-5841128870485063129	local.system.replset	_id_
index-1--4131298130356306083	config.cache.chunks.test.bar	_id_
index-14-5841128870485063129	admin.system.version	_id_


Looking at the Application (= MongoDB) Table Data

collection-*.wt and index-*.wt files contain the data of the MongoDB collections and indexes that you observe as a client connected to a MongoDB instance. Once you’ve identified the WT ident value for the collection or index you want to inspect you can use that in the URI argument to the wt dump command.

wt dump collections to *.bson files

The wt dump output of collection-* tables has a basic numeric type key and BSON data as the values.

Use wt dump -x in your terminal on the WT file for a collection and you will see, after the header section, records of the WT table in alternating lines of key and value.

Eg. in the example below the first key is numeric/binary value 0x81 and the first value is the one starting with binary bytes 0x68000000025f6964001e. The second key is 0x82, and its value starts with 0x6b000000025f69640021. The values are BSON and were much longer in reality; I trimmed them here for readability.

$ wt dump -x collection-X-XXXXXXXXXX
WiredTiger Dump (WiredTiger Version 10.0.0)
Format=hex
Header
file:WiredTiger.wt
access_pattern_hint=none,allocation_size=4KB,app.....
Data
81
68000000025f6964001e000000424d50524f442d425a41342d31...
82
6b000000025f69640021000000424d50524f442d574f524b4552...
...


FYI Using the alternative URI argument syntax “wt dump -x table:collection-X-XXXXXXXXXX” or “wt dump -x file:collection-X-XXXXXXXXXX.wt” will produce the same.

The key value in the collection-* tables isn’t needed to see the document content so print only every second line for the values using the following command:

$ wt dump -x collection-X-XXXXXXXXXX | tail -n +7 | awk 'NR%2 == 0 { print }'
68000000025f6964001e000000424d50524f442d425a41342d31...
6b000000025f69640021000000424d50524f442d574f524b4552....
66000000025f6964001c000000424d50524f442d574f524b4552...
6b000000025f69640021000000424d50524f442d574f524b4552...
64000000025f6964001a000000424d50524f442d44415441312d...
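The “tail -n +7” step assumes the header is always exactly six lines long. If you’d rather not depend on that, a filter along the following lines waits for the “Data” line instead. It is a sketch only; wt_dump_values is a name invented here, and it assumes hex (-x) output where keys and values strictly alternate after “Data”.

```shell
# Hypothetical helper: read `wt dump -x` output on stdin and print only the
# value lines. Instead of assuming a six-line header it waits for the
# "Data" line, then prints every second line after it (key, value, key, ...).
wt_dump_values() {
  awk 'in_data { if (++n % 2 == 0) print; next } $0 == "Data" { in_data = 1 }'
}
```

Usage would be “wt dump -x collection-X-XXXXXXXXXX | wt_dump_values”, which should give the same lines as the tail and awk pair above.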


The command above prints binary BSON in a hex string format with a newline separating each record. We can translate that hexadecimal back to the original binary using the xxd utility with the “-r” and “-p” flags. (Note: Don’t combine them as one “-rp” flag. It doesn’t work like most Unix commands’ short options.)

$ #Look for the WT ident of my test.customerOrder collection
$ grep customerOrder wt_ident_vs_collection_ns.tsv
collection-14--3398103177079662761	test.customerOrder
$ 
$ ls -lh collection-14--3398103177079662761.wt
-rw------- 1 akira akira 40K Jun 10 15:12 collection-14--3398103177079662761.wt
$
$ #dump and convert its values to a plain BSON file:
$ wt dump -x collection-14--3398103177079662761 | tail -n +7 | awk 'NR%2 == 0 { print }' | xxd -r -p > test.customerOrder.bson
$ 
$ #Confirm the content using bsondump
$ bsondump --quiet test.customerOrder.bson
{"_id":"123456","_class":"model.customer_order.CustomerOrder","orderReference":{"businessInteractionId":"str ... "orderAttributeValue":"string"}]}]}]}]}}
{"_id":"ORN-billingissue01","_class":"com.ctl.bm.servi ... startDate":"2018-02-02T00:00:00.000Z"}],"existingTN":[]}
2021-06-10T15:12:16.627+0900	2 objects found
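If bsondump complains, a quick sanity check on the hex lines can help. A BSON document starts with its own total length as a little-endian int32, so a value line beginning 68000000 declares 0x68 = 104 bytes, and an untrimmed line should be exactly that many bytes long. The two helpers below are a sketch of that check; their names are invented for this article.

```shell
# Hypothetical helper: print the BSON document length declared in the first
# four bytes (little-endian int32) of a hex-encoded value line.
bson_declared_len() {
  # Take the first 8 hex characters (4 bytes) and reverse the byte order.
  b0=$(printf '%s' "$1" | cut -c1-2)
  b1=$(printf '%s' "$1" | cut -c3-4)
  b2=$(printf '%s' "$1" | cut -c5-6)
  b3=$(printf '%s' "$1" | cut -c7-8)
  echo $(( 0x$b3$b2$b1$b0 ))
}

# Hypothetical helper: succeed when the declared length equals the number of
# bytes actually present on the line (two hex characters per byte).
bson_len_matches() {
  [ "$(bson_declared_len "$1")" -eq $(( ${#1} / 2 )) ]
}
```

The second helper only makes sense on complete lines from wt dump, not on the samples in this article, which were cut short for display.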


“wt read” a single record?

Sorry, you can’t read your MongoDB collection data with the “wt read” shell command. The blocker is trivial – as of WT v10.0.0 the “wt read” command only accepts plain text or its own “r” recordid numeric value as the lookup key value. The keys for the MongoDB collections and indexes however are ‘q’ and ‘u’ types respectively. (Documentation: WT Schema Format types.)

In MongoDB, you might know you can use a showRecordId cursor option so it’s tempting to think that this is the same “r” type that wt read can currently accept, but unfortunately it is not. See the key_format=X value in the header of wt dump output samples to confirm.

If wt read (code = utilities/util_read.c) was modified to accept an -x argument so we could pass the hex strings we already see in wt dump output this issue would be solved. But as it isn’t, for now, you have to dump all records even if you just want one.

This is only a limitation in the shell. If you use the API from within a programming language, including the Python SWIG binding available, you should be able to read just a single record.

Looking at MongoDB’s Index Files

wt dump index-*.wt files

index-* WiredTiger tables have two different formats that are easy to see when using wt dump to look inside them – one with just keys, and one with both keys and values. The WT keys are binary values generated from MongoDB’s KeyString structure.

Below is an example of an index on {state: 1, process: 1} on a collection called config.locks. This is a case where there are no values in the WT table for the index.

$ grep index-15--7492099575338322516 wt_ident_vs_index_ns.tsv
index-15--7492099575338322516	config.locks	state_1_process_1
$ 
$ wt dump -x index-15--7492099575338322516 | tail -n +7
293c436f6e66696753657276657200040008    <keys on odd-numbered line>
   <these even-numbered lines are where the WT value would be>
293c436f6e66696753657276657200040040

293c7072642d6d6f6e2d7065722d73686172642d6130333a32373031383a313538363938343138383a3139393331363932353337323937373139383300040028

....

293c70726f642d6d6f6e676f2d73686172642d6d6f6e676f732d302e6d6f6e676f2e706572636f6e612e636f6d3a32373031373a313535343633323838333a38363534383239393034393534383037333200040078

2b043c436f6e66696753657276657200040060

$


Below is an example of an {_id: 1} index on the collection test.rlru. This is a case when there are both WT keys and values.

$ grep index-1-2871579003788456567 wt_ident_vs_index_ns.tsv 
index-1-2871579003788456567	test.rlru	_id_
$
$ wt dump -x index-1-2871579003788456567 | tail -n +7
2904
0008
2b0204
0010
2b0404
0018
...
...
2c0faa04
203eb1
2c0fac04
203eb9
...


Given the point of an index record lookup is to have a value that points to a record in the collection-X-XXXX WT table, you should be asking “How can that first index above be useful without values?”

The answer is that the recordId is packed onto the end of the key. You’ll notice that in the first example every key has 0x04 as its third-last byte, followed by two more bytes. I believe this is how MongoDB packs a recordId whose value is between 0 and 2^10 – 1. See KeyString appendRecordId() if you want to get into it further.

By the way, an index’s field names are constant and thus irrelevant to sort order, so they’re not part of the keystrings.

Writing about the KeyString format even in just the shortest summary would take a whole blog post. So I’ll just punch out some translations of the binary above as a teaser and stop there.

Keystring 293c436f6e66696753657276657200040008 =>

0x29 = type marker for numeric value 0.
0x3c = type marker for (UTF-8?) string
436f6e66696753657276657200 = hex of string “ConfigServer” plus trailing null
0x04 = type marker for a recordId
0x0008 => binary 000 + binary 0000000001 + binary 000 = (recordId) value 1

Keystring 2b043c436f6e66696753657276657200040060 =>

0x2b = type marker, positive integer in small range (< 2^7?)
04 = binary 0000010 + bit 0 = value 2
0x3c = type marker for (UTF-8?) string
436f6e66696753657276657200 = hex of string “ConfigServer” plus trailing null
0x04 = type marker for a recordId
0x0060 => binary 000 + binary 00.0000.1100 + binary 000 = (recordId) value 12

Keystring 2b0404 =>

0x2b = type marker, positive integer in small range (< 2^7?)
04 = binary 0000010 + bit 0 = value 2; I’m not sure what the trailing bit is for.
0x04 = type marker to indicate the following value is recordId?

Value 0018 =>

0018 = binary 000 + binary value 00 0000 0011 = 3, plus 3 bits 000 on the end. T.b.h. I don’t always know how to translate this one, but commonly 3 bits at the beginning and end of the recordId format are used as a byte size indicator.
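The bit-shuffling for the short recordId form can be checked with shell arithmetic. The helper below is a sketch based only on the layout guessed at above (3 size bits, 10 value bits, 3 size bits in the last two bytes); record_id_from_tail is a made-up name, and it deliberately refuses anything that isn’t the two-byte form rather than guessing.

```shell
# Hypothetical helper: decode the short (two-byte) recordId form from the
# last four hex characters of a keystring or index value.
# Layout assumed from the examples: 3 size bits, 10 value bits, 3 size bits.
record_id_from_tail() {
  tail_hex=$(printf '%s' "$1" | awk '{ print substr($0, length($0) - 3) }')
  n=$(( 0x$tail_hex ))
  # Both 3-bit size fields must be zero for the two-byte form.
  if [ $(( n >> 13 )) -ne 0 ] || [ $(( n & 7 )) -ne 0 ]; then
    echo "not a two-byte recordId: $tail_hex" >&2
    return 1
  fi
  # Drop the low 3 size bits and keep the 10 value bits.
  echo $(( (n >> 3) & 0x3ff ))
}
```

For instance “record_id_from_tail 2b043c436f6e66696753657276657200040060” prints 12, and “record_id_from_tail 0018” prints 3. Longer values such as 203eb1 from the second index are rejected.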

If looking for the matching record in the collection-X-XXX table, look for the key (0x80 + the recordId value from the index file). Eg. 2 -> 0x82; 12 -> 0x8c. For some reason 0x0 – 0x80 seems to be reserved, so key values in the collection-X-XXX.wt files are all 0x80 higher than the recordId value in the index-X-XXX.wt records.
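To turn a recordId into the hex key you would search for in the wt dump -x output of the collection, something like the following works for small values. It is a sketch with an invented name, and the upper limit of 63 is an assumption on my part: as far as I understand WiredTiger’s integer packing, larger values switch to a multi-byte encoding that this helper doesn’t attempt.

```shell
# Hypothetical helper: print the collection-table key (as hex) for a small
# recordId, using the "0x80 + recordId" rule described above.
# Assumption: only 0..63 use this single-byte form, so anything else is refused.
collection_key_for_record_id() {
  if [ "$1" -lt 0 ] || [ "$1" -gt 63 ]; then
    echo "recordId $1 is outside the single-byte range this sketch handles" >&2
    return 1
  fi
  printf '%02x\n' $(( 0x80 + $1 ))
}
```

“collection_key_for_record_id 2” prints 82, which is the second key line in the collection dump sample earlier in this article.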

In the end though, as we don’t have an ability to use wt read, all this poking around in the indexes from a shell command can only be for satisfying curiosity. Not for making single-record access fast.

Looking at the WT Transaction Log

A WiredTiger checkpoint saves a copy of documents in all collections and indexes as they were at one exact point in time (the time the checkpoint is started). MongoDB will call for a checkpoint to be made once per minute by default.

Without something else being saved to disk, a sudden crash would mean that restores/restarts could only revert to the last checkpoint. The classic database concept of write-ahead log is the solution to this of course. In WiredTiger this is provided by the transaction log, often just called “log” in its own documentation. Or as the documentation also says it adds “commit-level” durability to checkpoint durability.

At restart, whether it is after a perfectly normal shutdown or a crash, WiredTiger will read and replay the writes it finds in its transaction log onto the tables, in memory. In time, when the MongoDB layer requests a new checkpoint be created, the in-memory restored data will be saved to disk.

When you use wt dump it’s not easy to say if you’re looking at the collection (or index) restored as of the last checkpoint or with the transaction log “recovered” (read and applied) as well. If the “-R” global option of the wt command is used then yes log is recovered; if the opposite “-r” option is used then no. But which is in effect if you specify neither is unclear. Also, I’ve seen comments or diagnostic messages that suggest the -r option isn’t perfect.

Two Layers of Recovery

By default, WiredTiger keeps no transaction log for the tables it manages for the application embedding it. Logging is engaged only at the application’s request.

- If you are using a standalone mongod, MongoDB code will enable the WT log for every collection and index created.
  - Unless the “recoverFromOplogAsStandalone” server parameter is used. This is a trick that is part of point-in-time recovery of hot backups.
- When replication is enabled (the typical case), only the WT tables for “local” db collections and indexes get “log=(enabled=true)” in their WT config string made by WiredTigerRecordStore::generateCreateString() in wiredtiger_record_store.cpp, and something similar in wiredtiger_index.cpp.

When a mongod node restarts with an existing data directory the WiredTiger library will run recovery. It can’t be stopped (at least not by the MongoDB user). This is not the end of the story though. When replication is used, only the “local” db collections, in particular the oplog, are restored by this step. The replication system code has a ReplicationRecovery class that is used next, and this will apply updates made since the last durable checkpoint time from the oplog to the user and system collections they’re supposed to be in. Only after that occurs will recovery be complete and the db server will make the data available to clients.

ReplicationRecovery allows MongoDB to do rollbacks (at least to the checkpoint time) and also to trim off ops in the last replication batch application. It’s an ugly point but MongoDB applies replication ops in small batches in parallel to improve performance; this only happens on secondaries usually of course, but also after a restart. If all the writes in the batch finish that is fine, but if they don’t some writes may be missing which is a clear consistency violation. So the unfinished batch’s writes should all be cleared, back to the last saved oplogTruncateAfterPoint value. repl.oplogTruncateAfterPoint is a single-document collection in the local db. It exists only for this purpose as far as I know.

Diagram for Typical Restart Recovery

“Typical” = restarting a node that is a replica. Doesn’t matter if it was a primary or secondary, or even the only node in the replica set.

WiredTiger’s recovery

⇓

WT table for local.oplog.rs updated

⇓

ReplicationRecovery

⇓

Trim oplog docs where “ts” > oplogTruncateAfterPoint

⇓

oplog application

Pseudo code:
db.oplog.rs.find({ts: {$gt: durableCheckpointTime}}).forEach(function(opdoc) {
  //re-apply write to intended collection in “test”, “mydb”, “admin”, etc. db
  applyOps(opdoc);
});

⇓

Ready!
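The trim-then-apply order in the diagram can be mimicked with a toy filter. To be clear, this is not MongoDB code and nothing here touches a real oplog: plain integers stand in for oplog “ts” values, the input format (“ts”, tab, op name) is invented, and ops_to_reapply is a made-up name. It only illustrates which ops are left to re-apply once the two cut-off points are known.

```shell
# Toy model of the diagram above, not MongoDB code. Reads "ts<TAB>op" lines
# (plain integers stand in for oplog timestamps) and prints the ops that
# would be re-applied: those after the checkpoint that survive the trim.
# Usage: ops_to_reapply <checkpoint_ts> <truncate_after_ts> < oplog.tsv
ops_to_reapply() {
  awk -F '\t' -v ckpt="$1" -v trunc="$2" \
    '$1 + 0 <= trunc + 0 && $1 + 0 > ckpt + 0 { print }'
}
```

Given ops at ts 10, 20, 30 and 40, a checkpoint at 15 and a truncate-after point of 30, only the ops at 20 and 30 are printed. The op at 10 is already in the checkpoint and the op at 40 is trimmed away.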

wt printlog

You can use the wt printlog command to see what is in the log currently in the WiredTigerLog.<nnnnnn> files in the journal/ subdirectory. If you decode the write ops in there you can understand what document versions will be restored by MongoDB when it restarts.

Preparation to Using wt printlog – Map WT idents to fileid

Until now you’ve learned that every MongoDB collection and index is in its own WT table file, which means needing to learn what the WT ident(ifier) is to find the right raw WT file to look into.

There’s another small integer id value, a.k.a. fileid, in WT metadata / WT table config strings for each file too. I suppose it saves space; at any rate, the WT transaction log references files only by this number. This means you’ll have to build a handy mapping or list of which WT ident is which fileid. Use the following command to create a file I call wt_ident_vs_fileid.tsv.

$ wt dump file:WiredTiger.wt | grep -B 1 ',id=' | sed 's/^file:(.*).wt
