Restore a Specific MongoDB Collection From a Physical Backup

We all know how critical it is to get our data back as quickly as possible. As you might be aware, MongoDB offers two restore methods for this: logical and physical.

Of course, which method we can use depends on the type of backup we have configured. For large data sets, physical backups are advisable because they offer faster backup and restore times. In this demo, I’ve used Percona Backup for MongoDB (PBM) to take a physical backup.

But here’s the catch. Restoring specific collections from a physical backup is complicated, whatever the backup method: a volume snapshot, a cold rsync copy of the data files, or a hot backup taken with Hotbackup or Percona Backup for MongoDB.

The simplest and most widely known approach is to restore the physical backup (the data files) onto a temporary “mongod”. You then take a logical dump of the required collections from it and logically restore them to the original cluster. This is usually done with mongodump and mongorestore.
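For comparison, here is a minimal Python sketch that only assembles the three commands of this conventional approach. It runs nothing. The port 27018, the paths, and the connection string are hypothetical placeholders. It also assumes the temporary mongod is started on a copy of the restored data files rather than on the backup itself.

```python
import shlex


def conventional_restore_commands(data_copy_dir, ns, temp_port=27018,
                                  target_uri="mongodb://localhost:27017",
                                  dump_dir="dump"):
    """Assemble (but do not run) the temporary mongod, mongodump and mongorestore steps.

    All paths, the port, and the URI are hypothetical placeholders.
    """
    db, coll = ns.split(".", 1)  # database names cannot contain '.', collection names can
    start_temp = ["mongod", "--dbpath", data_copy_dir, "--port", str(temp_port)]
    dump = ["mongodump", f"--port={temp_port}", f"--db={db}",
            f"--collection={coll}", f"--out={dump_dir}"]
    restore = ["mongorestore", f"--uri={target_uri}", f"--nsInclude={ns}", dump_dir]
    return [start_temp, dump, restore]


if __name__ == "__main__":
    for cmd in conventional_restore_commands("/restore/data-copy", "percona.demo"):
        print(shlex.join(cmd))
```

Every document in the collection makes two trips here: once out through mongodump and once back in through mongorestore. That round trip is the overhead the rest of this post avoids.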

But there’s another way, one that avoids provisioning a temporary mongod and the generally slow logical dump with mongodump.

In this blog post, we’ll look at how to perform the restore with the “wt” utility.

As a prerequisite, we first need to build “wt”, either from the WiredTiger GitHub repo or from a downloaded tar archive of a specific version. Let’s jump straight into the steps.

In this blog, we’ll build it from the GitHub repo.

1. Clone the WiredTiger repo

2. Check your mongod version and check out the WiredTiger branch that matches it.

3. Depending on the compression method in use, install its library and configure the corresponding WiredTiger library extension. Snappy is MongoDB’s default block compressor, so that’s what I’ve used here. If you use a different compression method, install and configure its library and provide that library’s path instead.
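The decisions in steps 2 and 3 can be sketched in Python as below. This is an illustration, not part of the original procedure. It assumes the WiredTiger repo names its release branches `mongodb-<major>.<minor>`; verify this with `git branch -r` after cloning. The library directory passed to `extension_config` is a hypothetical placeholder, because where the compressor `.so` files land depends on how you built WiredTiger.

```python
import os
import re

# Assumption: WiredTiger keeps one branch per MongoDB release series,
# named like "mongodb-5.0". Check the remote branch list before relying on it.
def wt_branch_for(mongod_version):
    """Map a version string such as 'db version v5.0.14-12' to a branch name."""
    m = re.search(r"(\d+)\.(\d+)", mongod_version)
    if not m:
        raise ValueError(f"unrecognised mongod version: {mongod_version!r}")
    return f"mongodb-{m.group(1)}.{m.group(2)}"


# Compressor extension libraries built from WiredTiger's ext/compressors tree.
EXTENSION_LIBS = {
    "snappy": "libwiredtiger_snappy.so",
    "zlib": "libwiredtiger_zlib.so",
    "zstd": "libwiredtiger_zstd.so",
    "lz4": "libwiredtiger_lz4.so",
}


def extension_config(compressor, lib_dir):
    """Return the configuration string to pass to 'wt -C' for a compressor.

    lib_dir is a placeholder for wherever your build put the .so files.
    An unknown compressor name raises KeyError.
    """
    if compressor in ("", "none"):
        return ""  # uncompressed collection: no extension needed
    lib = os.path.join(lib_dir, EXTENSION_LIBS[compressor])
    return f"extensions=[{lib}]"


if __name__ == "__main__":
    print("git checkout", wt_branch_for("db version v5.0.14-12"))
    print(extension_config("snappy", "/opt/wiredtiger/lib"))
```

Per the reader comment at the end of this post, the branch must match the mongod version closely. An incompatible `wt` build can write metadata that mongod can no longer read.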

Once the necessary dependencies are installed and verified, we can run the commands below to restore the dropped collection.

Restoration time

1. Find the WiredTiger URI of the collection you want to restore. You can check it on the original cluster:
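On the original cluster, `db.demo.stats().wiredTiger.uri` (run in the `percona` database) returns a string of the form `statistics:table:<ident>`. The small Python sketch below strips that prefix and builds the `file:` URI used with `wt dump` later on. The ident in the example is hypothetical.

```python
STATS_PREFIX = "statistics:table:"


def ident_from_stats_uri(stats_uri):
    """Turn the 'wiredTiger.uri' value reported by collStats into the bare ident."""
    if not stats_uri.startswith(STATS_PREFIX):
        raise ValueError(f"unexpected URI: {stats_uri!r}")
    return stats_uri[len(STATS_PREFIX):]


def wt_file_uri(ident):
    """URI to hand to 'wt dump': the file: prefix plus the .wt suffix."""
    return f"file:{ident}.wt"


if __name__ == "__main__":
    # Hypothetical ident; use the value from your own cluster.
    ident = ident_from_stats_uri("statistics:table:collection-7-3546124757735463473")
    print(wt_file_uri(ident))
```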

2. Check the collection’s compression method, because we need to load the matching WiredTiger library extension. Look for ‘block_compressor’:
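The compressor appears in the `creationString` field of the same stats output (`db.demo.stats().wiredTiger.creationString`). Below is a minimal Python sketch for extracting it. It assumes the usual comma-separated `key=value` layout of WiredTiger configuration strings; the sample string in the test is abbreviated and hypothetical.

```python
import re


def block_compressor(creation_string):
    """Extract the block_compressor value from a WiredTiger creation string.

    Returns 'none' when the option is present but empty (uncompressed data).
    """
    m = re.search(r"(?:^|,)block_compressor=([^,]*)", creation_string)
    if m is None:
        raise ValueError("block_compressor not found in creation string")
    return m.group(1) or "none"
```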

3. There are different ways to take a physical backup: Percona Backup for MongoDB, Hotbackup, snapshots, or even an rsync to a separate volume or path. Let’s now take a physical backup using Percona Backup for MongoDB and then drop our collection.

Using PBM, we can list all the underlying data files, which we’ll use for this demo.
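Before pointing `wt -h` at the restored files, it is worth checking that the directory is a usable WiredTiger home. This Python sketch assumes the backup has already been restored into a plain directory, and decompressed if your backup tool stores files compressed. It only checks for the two WiredTiger metadata files and the collection’s `.wt` file. Run `wt` against a copy of the files, since opening them can write to the metadata.

```python
from pathlib import Path

# WiredTiger metadata files that must sit in the home directory given to "wt -h".
REQUIRED_METADATA = ("WiredTiger.turtle", "WiredTiger.wt")


def check_wt_home(home, ident):
    """Return a list of problems found; an empty list means the home looks usable."""
    home = Path(home)
    problems = [f"missing {name}" for name in REQUIRED_METADATA
                if not (home / name).is_file()]
    if not (home / f"{ident}.wt").is_file():
        problems.append(f"missing {ident}.wt")
    return problems
```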

4. Dump the URI file from step 1 with “wt dump” and convert the output into BSON, the format MongoDB understands. On its own, “wt dump -x” prints the keys and values as hex strings, not BSON.

If the collection, or the instance globally, uses a different compression method, load the corresponding WiredTiger library extension instead.

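Note: Don’t forget to append ‘.wt’ to the ident from step 1 and add the “file:” prefix (file:<ident>.wt). If you address the collection through its “table:” URI instead, use table:<ident> without the ‘.wt’ suffix. Within WiredTiger, every collection is both a table and an underlying file.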
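To make the moving parts of step 4 explicit, the Python sketch below builds the same `wt` invocation and replaces the `tail | awk | xxd` stages with plain Python. There is one deliberate change. Instead of skipping a fixed number of header lines, it skips everything up to the `Data` line that `wt dump` prints before the key/value pairs. That is an assumption about the dump layout, so check it against your own output. The `wt` path, home directory, and URI are placeholders.

```python
import subprocess


def wt_dump_command(wt_binary, home, extension_cfg, file_uri):
    """Build the wt invocation from step 4 (not run here)."""
    cmd = [wt_binary, "-v", "-h", home]
    if extension_cfg:
        cmd += ["-C", extension_cfg]  # e.g. "extensions=[/path/libwiredtiger_snappy.so]"
    return cmd + ["dump", "-x", file_uri]


def values_to_bson(dump_text):
    """Python stand-in for the tail | awk 'NR%2 == 0' | xxd -r -p stages.

    Assumes the hex dump layout: header lines up to a line reading 'Data',
    then alternating key and value lines, where each value is one BSON document.
    """
    lines = dump_text.splitlines()
    start = lines.index("Data") + 1                       # skip the header ("tail")
    values = lines[start + 1::2]                          # keep value lines ("awk")
    return b"".join(bytes.fromhex(v) for v in values)     # hex to bytes ("xxd -r -p")


def dump_collection(cmd, out_path):
    """Run wt and write the concatenated BSON documents to out_path."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    with open(out_path, "wb") as out:
        out.write(values_to_bson(result.stdout))
```

A call such as `dump_collection(wt_dump_command("./wt", "/restore/data-copy", cfg, uri), "percona.demo.bson")` would produce the same file the shell pipeline does, under the layout assumption above.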

Let’s look at the flags and tools used in the command above.

  • extensions: loads the WiredTiger library extension that matches our compression method
  • -v, -h, -C, dump, -x: “wt” options. -v turns on verbose output, -h points to the WiredTiger home directory (the restored data files), -C passes a configuration string such as the extensions setting, and dump -x writes the keys and values of the URI as raw hex strings
  • tail trims the top seven header lines from the output of “wt dump”
  • awk keeps only the keys (odd lines, NR%2 == 1) or only the values (even lines, NR%2 == 0) of the hex output; for a collection, the values are the BSON documents
  • xxd with ‘-r’ and ‘-p’ converts the plain hex strings back into binary, i.e., BSON that MongoDB understands
  • Finally, we redirect the output to “percona.demo.bson”. The output filename is important: it should be the collection’s full namespace (<database>.<collection>), which MongoDB’s catalog maps to the WiredTiger ident from step 1. In our case, that is “percona.demo”.
  • If needed, the output can be validated with the command below:
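For a quick structural check without extra tools, the Python sketch below walks the output file using BSON’s own framing. Every document begins with its total length as a little-endian 32-bit integer and ends with a zero byte. The sketch only validates that framing and counts documents; a tool such as `bsondump` decodes the contents fully.

```python
import struct


def count_bson_documents(data):
    """Walk a concatenated BSON stream and return the number of documents.

    Raises ValueError at the first document whose length prefix or
    trailing 0x00 byte does not fit, which points to a broken dump.
    """
    count, offset = 0, 0
    while offset < len(data):
        if len(data) - offset < 5:  # smallest possible document is 5 bytes
            raise ValueError(f"truncated document at offset {offset}")
        (size,) = struct.unpack_from("<i", data, offset)
        if size < 5 or offset + size > len(data) or data[offset + size - 1] != 0:
            raise ValueError(f"invalid document at offset {offset}")
        offset += size
        count += 1
    return count
```

For example, `count_bson_documents(open("percona.demo.bson", "rb").read())` returns the document count. You can compare it with the number of documents you expect in the collection.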

5. Finally, restore the BSON file with the native mongorestore.
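The Python sketch below derives the target namespace from the file name. It assumes the `<database>.<collection>.bson` naming from step 4 and a hypothetical local connection string. It uses mongorestore’s `--db` and `--collection` options, which apply when restoring from a single BSON file.

```python
import shlex
from pathlib import Path


def mongorestore_command(bson_path, uri="mongodb://localhost:27017"):
    """Build a mongorestore call for a single '<db>.<collection>.bson' file."""
    path = Path(bson_path)
    if path.suffix != ".bson":
        raise ValueError(f"expected a .bson file, got {bson_path!r}")
    # The first dot separates the database from the collection,
    # because database names cannot contain dots but collection names can.
    db, coll = path.stem.split(".", 1)
    return ["mongorestore", f"--uri={uri}", f"--db={db}",
            f"--collection={coll}", str(path)]


if __name__ == "__main__":
    print(shlex.join(mongorestore_command("percona.demo.bson")))
```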

A few things to consider

1. You might be wondering: what about indexes? 😉 The same process can be applied by dumping the index*.wt files. However, it’s more complex, because the dumped keys and values have a slightly different format. I’ll cover this in a separate blog soon. WiredTiger also maintains a separate index* URI file for every index, so rebuilding the indexes manually with the “createIndex” command is a far easier approach.
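As a sketch of the `createIndex` route, the Python function below turns saved `getIndexes()` output into a single `createIndexes` command document. It assumes you still have the index definitions from before the drop, for example as saved output or in your application’s schema code. The specs in the example are hypothetical. The `_id_` index is skipped because MongoDB creates it together with the collection.

```python
def create_indexes_command(collection, index_specs):
    """Build a createIndexes command document from getIndexes() output.

    Skips the automatic _id_ index and drops fields that describe the
    stored index rather than options to recreate it: 'v' (index version,
    left for the server to choose) and 'ns' (present on older servers).
    """
    indexes = []
    for spec in index_specs:
        if spec.get("name") == "_id_":
            continue
        indexes.append({k: v for k, v in spec.items() if k not in ("v", "ns")})
    return {"createIndexes": collection, "indexes": indexes}
```

With PyMongo, the resulting document could be sent as `client["percona"].command(create_indexes_command("demo", specs))`. One command builds all the listed indexes together.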

2. To perform a point-in-time recovery, the incremental oplogs still need to be replayed on top of the restored backup, which we have covered.
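The core of the replay step is choosing the right window. The simplified Python sketch below is not PBM’s implementation. It keeps the insert, update, and delete entries for one namespace that are newer than the backup, and stops at the entry that dropped the collection. Oplog entries are modeled as plain dicts with `ts`, `op`, `ns`, and `o` fields, and timestamps as `(seconds, increment)` tuples. Transactions (`applyOps`) are deliberately ignored.

```python
def replay_window(oplog, ns, backup_ts):
    """Select oplog entries to re-apply for one collection after a backup.

    Keeps i/u/d entries for `ns` newer than `backup_ts` and stops at the
    command that dropped the collection. (seconds, increment) tuples
    compare in the same order as BSON Timestamps.
    """
    db, coll = ns.split(".", 1)
    selected = []
    for entry in sorted(oplog, key=lambda e: e["ts"]):
        if entry["ts"] <= backup_ts:
            continue  # already contained in the backup
        is_drop = (entry["op"] == "c" and entry["ns"] == f"{db}.$cmd"
                   and entry["o"].get("drop") == coll)
        if is_drop:
            break  # replaying past this point would drop the collection again
        if entry["ns"] == ns and entry["op"] in ("i", "u", "d"):
            selected.append(entry)
    return selected
```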

3. This method also applies to sharded clusters (both unsharded and sharded collections), but a few additional steps are needed. We’ll cover them in similarly detailed demos in upcoming blogs.

Before I wrap up, let’s talk about some drawbacks, since this is a somewhat risky and complicated approach. Test it in your lab or test environment first to get familiar with WiredTiger internals before using it in production.

Cons

  1. It’s a fairly complicated approach that requires a clear understanding of the “wt” utility and WiredTiger internals.
  2. It doesn’t restore indexes; they need to be built separately, as mentioned.


1 Comment
Scott Kurowski

The specific wt tool version to use is very closely tied to its MongoDB version to avoid writing incompatible WiredTiger metadata and corrupting the db.