When you design a backup strategy, you need to consider the business requirements, because your backups must be shaped to meet them. Let’s briefly review the basics: you need to define the RPO and the RTO. The RPO (Recovery Point Objective) defines how far back in time you must be able to recover, that is, how much data loss is acceptable. The RTO (Recovery Time Objective) defines how quickly the business expects the data to be recovered. This article focuses on one scenario that can help meet the RTO.
The scenario
Imagine a company running a Percona Server for MongoDB 6.0 (PSMDB) replica set with a footprint of one terabyte. The operations team has also configured Percona Backup for MongoDB (PBM) to generate physical and logical backups. One terrible day, a chain of unfortunate events unfolds: a developer gets a call from his manager about a critical bug found on the PRODUCTION system. He quickly goes through the code that was released yesterday and finds that the issue can be easily fixed by removing a set of documents that were inserted into a collection. Since he has read-write access to the PRODUCTION database, he decides to run the delete command directly on PRODUCTION to mitigate the issue as fast as possible. As you can imagine, when someone rushes, the tendency to make a mistake is high. That was the case here: 90% of the documents in the collection were removed, and now the problem is even bigger than a critical bug, because the system is completely down.
The solution
Since the database is rather large, it can take a long time to restore the whole thing, and given that a single collection is the culprit, it will be faster to execute a restore for that single collection.
The first thing to do is to list the backups available:
```shell
$ pbm list
Backup snapshots:
  2024-03-22T20:42:50Z <logical> [restore_to_time: 2024-03-22T20:43:12Z]
  2024-03-22T21:45:35Z <physical> [restore_to_time: 2024-03-22T21:45:36Z]
...
PITR <off>:
  2024-03-22T20:43:13Z - 2024-03-22T20:52:58Z
```
Next, we need to find the most recent logical backup, as a selective restore (restoring only specific namespaces) requires a logical backup. In this case, the backup that we need is “2024-03-22T20:42:50Z”.
Now, we have two options:
- Restore the collection on the live database: This will overwrite the existing data in the collection. If you are sure that no additional data was added to the collection after the backup, this is definitely the fastest and simplest path.
- Restore the collection on a temporary instance: This lets you export the data and import it into the live database without overwriting data generated after the backup. This alternative adds more steps to the process, but it preserves the existing data.
Option one
Restore the single collection into the live database:
```shell
$ pbm restore 2024-03-22T20:42:50Z --ns "sample_training.zips"
Starting restore 2024-03-22T22:23:56.715785074Z from '2024-03-22T20:42:50Z'...Restore of the snapshot from '2024-03-22T20:42:50Z' has started
```
You can check what PBM is doing by running the following command:
```shell
$ pbm status -s running
Currently running:
==================
(none)
```
In this case, the restore process is complete; you can list the restore operations with the following command:
```shell
$ pbm list --restore
Restores history:
  2024-03-22T22:23:56.715785074Z [backup: snapshot, selective] done
```
You can see the restore details with this command:
```shell
$ pbm describe-restore 2024-03-22T22:23:56.715785074Z
name: "2024-03-22T22:23:56.715785074Z"
opid: 65fe04fccc46cf421780bab5
backup: "2024-03-22T20:42:50Z"
type: logical
status: done
namespaces:
- sample_training.zips
last_transition_time: "2024-03-22T22:24:05Z"
replsets:
- name: rs0
  status: done
  last_transition_time: "2024-03-22T22:24:04Z"
```
Finally, confirm the data was restored as expected:
```shell
rs0 [direct: primary] sample_training> db.zips.find().count()
29470
```
Option two
Create a mongod config file for the temporary instance:
```shell
$ cat /etc/mongod_tmp.conf | grep -v "^$" | grep -v "^#"
storage:
  dbPath: /var/lib/mongodb_tmp            ### Different dbPath
  journal:
    enabled: true
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod_tmp.log   ### Different log file
processManagement:
  fork: true
  pidFilePath: /var/run/mongod_tmp.pid    ### Different pidFilePath
net:
  port: 27018                             ### Different port
  bindIp: 127.0.0.1
replication:
  replSetName: rs0
```
Create the dbPath:
```shell
$ sudo mkdir /var/lib/mongodb_tmp
$ sudo chown mongod.mongod /var/lib/mongodb_tmp/
```
Start the temporary instance:
```shell
$ sudo -u mongod /usr/bin/mongod -f /etc/mongod_tmp.conf
about to fork child process, waiting until server is ready for connections.
forked process: 16270
child process started successfully, parent exiting
```
Configure PBM to run on the new instance and make sure it has point-in-time recovery (PITR) disabled. In this case, the new instance is running on port 27018.
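As a sketch of how this could be done (the connection string, log path, and storage configuration file are illustrative, not from the original setup), you would point a pbm-agent at the temporary instance, reuse the existing storage configuration, and explicitly disable PITR:

```shell
# Point the PBM CLI and agent at the temporary instance on port 27018.
# The URI and file paths below are illustrative assumptions.
export PBM_MONGODB_URI="mongodb://127.0.0.1:27018/"

# Start a pbm-agent for the temporary node
sudo -u mongod pbm-agent --mongodb-uri "$PBM_MONGODB_URI" \
  > /var/log/pbm-agent-tmp.log 2>&1 &

# Apply the same remote storage configuration used by the original cluster,
# so the agent can see the existing backups in the S3 bucket
pbm config --file /etc/pbm_storage.yaml

# Make sure point-in-time recovery is disabled on this temporary deployment
pbm config --set pitr.enabled=false
```

The key point is that the temporary instance must read from the same remote storage as the original cluster, while PITR stays off so the temporary deployment does not write new oplog chunks to the bucket.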
```shell
$ pbm status
Cluster:
========
rs0:
  - rs0/192.168.56.3:27018 [P]: pbm-agent v2.4.0 OK

PITR incremental backup:
========================
Status [ON]

Currently running:
==================
(none)

Backups:
========
S3 us-east-1 s3://bucket-s3/mongodb_backup/test1
  (none)
```
Force a sync to pull the list of backups stored on the S3 bucket:
```shell
$ pbm config --force-resync
Storage resync started
```
List the backups and make sure the logical backup you require is present:
```shell
$ pbm list
Backup snapshots:
  2024-03-22T20:42:50Z <logical> [restore_to_time: 2024-03-22T20:43:12Z]
  2024-03-22T21:45:35Z <physical> [restore_to_time: 2024-03-22T21:45:36Z]
...
PITR <off>:
  2024-03-22T20:43:13Z - 2024-03-22T20:52:58Z
```
Restore the collection you need to recover:
```shell
$ pbm restore 2024-03-22T20:42:50Z --ns "sample_training.zips"
Starting restore 2024-03-22T22:47:52.180513787Z from '2024-03-22T20:42:50Z'...Restore of the snapshot from '2024-03-22T20:42:50Z' has started
```
Export all the documents from the recovered collection:
```shell
$ mongodump --uri=$MONGODB_URI --archive=/tmp/sample_training.zips.archive.gzip --gzip --db=sample_training --collection=zips
2024-03-22T23:05:46.047+0000	WARNING: ignoring unsupported URI parameter 'replsetname'
2024-03-22T23:05:46.099+0000	writing sample_training.zips to archive '/tmp/sample_training.zips.archive.gzip'
2024-03-22T23:05:46.368+0000	done dumping sample_training.zips (29470 documents)
```
Import the archive file into the live database; this will append the data to the existing collection:
```shell
$ mongorestore --uri=$MONGODB_URI --nsInclude="sample_training.zips" --gzip --archive=/tmp/sample_training.zips.archive.gzip
2024-03-22T23:13:28.180+0000	WARNING: ignoring unsupported URI parameter 'replsetname'
2024-03-22T23:13:28.221+0000	preparing collections to restore from
2024-03-22T23:13:28.246+0000	reading metadata for sample_training.zips from archive '/tmp/sample_training.zips.archive.gzip'
2024-03-22T23:13:28.250+0000	restoring to existing collection sample_training.zips without dropping
2024-03-22T23:13:28.251+0000	restoring sample_training.zips from archive '/tmp/sample_training.zips.archive.gzip'
2024-03-22T23:13:29.934+0000	finished restoring sample_training.zips (29470 documents, 0 failures)
2024-03-22T23:13:29.935+0000	no indexes to restore for collection sample_training.zips
2024-03-22T23:13:29.935+0000	29470 document(s) restored successfully. 0 document(s) failed to restore.
```
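As in option one, it is worth confirming the document count on the live database after the import. For example, with mongosh (the connection string here is an illustrative assumption):

```shell
# Connect to the live replica set and count the restored documents.
# The URI is a placeholder; use your production connection string.
$ mongosh "mongodb://127.0.0.1:27017/sample_training" --quiet \
    --eval 'db.zips.countDocuments({})'
```

The count should match the number of documents reported by mongorestore, plus any documents that were added to the collection after the backup was taken.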
Alternate solution
If the requirement is to recover the data up to the second before it was deleted, then we should perform a point-in-time restore (PITR). For this option to be viable, the PITR feature must be enabled in PBM. Due to the size of the database, we will need a separate server to execute the restore process. The steps are detailed in the PBM documentation page Make a point-in-time restore. Once you have the database restored up to the time you need, you can export the collection documents and import them as we did in option two.
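On the separate restore environment, the restore itself boils down to something like the following sketch (the timestamps are illustrative; you would use the moment just before the accidental delete):

```shell
# Point-in-time restore: replay the oplog on top of a base snapshot up to
# the given moment. Requires PITR oplog chunks covering that time range
# to exist in the backup storage. Timestamps below are placeholders.
$ pbm restore --base-snapshot 2024-03-22T20:42:50Z --time="2024-03-22T20:50:00"
```

After this completes, the temporary deployment holds the data exactly as it was at the requested time, and you can export and import the affected collection with mongodump and mongorestore as shown above.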
Percona Backup for MongoDB flexibility
The flexibility that PBM offers for managing backup and restore operations is unique, and this is just one simple scenario where PBM can help. It is important to understand how PBM works so you can build strategies that meet the business needs. If you need help managing your databases, don’t hesitate to contact us; we have an excellent team of experts ready to help.
Percona Distribution for MongoDB is a source-available alternative for enterprise MongoDB. A bundling of Percona Server for MongoDB and Percona Backup for MongoDB, Percona Distribution for MongoDB combines the best and most critical enterprise components from the open source community into a single feature-rich and freely available solution.
Download Percona Distribution for MongoDB Today!