Percona Backup for MongoDB in Action

We recently released Percona Backup for MongoDB (PBM) as GA. It’s our open source tool for taking a consistent backup of a running standalone mongod instance, a Replica Set, or a Sharded Cluster. The following articles can give you an overview of the tool:

But now I would like to test it for real, so let’s see how it works for taking backups of a Replica Set.

Warning: Percona Backup for MongoDB supports Percona Server for MongoDB and MongoDB Community v3.6 and higher with MongoDB Replication enabled.

Architecture

First, let’s briefly discuss the internals. Percona Backup for MongoDB consists of two actors: the pbm-agent and the pbm utility.

The pbm-agent is a process that has to be installed on each mongod node. The agent has a small footprint and is responsible for various tasks. For example, it detects whether a secondary node is the best candidate for the backup or restore operation, and it coordinates with the other nodes.

The pbm CLI controls all the agents and can be installed on any node with access to the MongoDB cluster. The following commands are available in the current 1.0.0 version:

Command          Description
store set (*)    Set up a backup store
store show (*)   Show the backup store associated with the active replica set
backup           Make a backup
restore          Restore a backup
list             List the created backups

(*) These will become config --set / config --list in version 1.1.

Installation

My test environment is the following:

  • 3-node Replica Set with MongoDB 4.2 on AWS EC2 instances
  • OS: Ubuntu 18.04

The easiest and recommended way to install it is from the official Percona repositories, using the percona-release utility. See the Percona documentation for more details about Percona repositories and percona-release usage.

Enable the tools

Install the package

The sample configuration files are placed into:

You have to install the package on all the nodes of the replica set: pbm-agent must be installed on each node.
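On Ubuntu, the two steps above (enabling the tools repository and installing the package) could look like the sketch below. It assumes percona-release is already installed and that the package is named percona-backup-mongodb, as I recall it from the Percona repositories for the 1.0 release. install_pbm is only a hypothetical wrapper name, used so the same commands can be repeated on each node.

```shell
# Sketch only: assumes percona-release is installed and sudo is available.
# install_pbm is a hypothetical helper name, not part of PBM.
install_pbm() {
  sudo percona-release enable tools              # enable the repository that ships PBM
  sudo apt-get update                            # refresh the package index
  sudo apt-get install -y percona-backup-mongodb # installs both pbm and pbm-agent
}

# Run on every node of the replica set:
#   install_pbm
```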

Storage

Backup data can be stored on Amazon S3 or compatible storage, such as MinIO. Storing backups on a local filesystem directory also works but isn’t a top recommendation as it requires all servers involved to be given mounts to the same remote backup server.

For running the backup and restore operations, we need to set up a place where the files will be stored and retrieved. In the current 1.0 pbm version, the available types of store are:

  • Amazon Simple Storage Service (S3)
  • MinIO: an object storage service compatible with Amazon S3
  • local file system

We’ll use AWS S3 in our test. It’s the easiest option at the moment, for the recovery as well.

The storage details for pbm can be put into a configuration YAML file. The file contains all required options that pertain to one store. In 1.0, only Amazon Simple Storage Service-compatible remote stores are supported.

We can create the following file storage.yaml, specifying our own access keys:
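A minimal sketch of such a file, written with a shell heredoc. The field layout is how I recall the 1.0 storage format, so treat it as an assumption and check it against the documentation for your version. The region, the bucket name, and both keys are placeholders.

```shell
# Sketch only: field names follow the 1.0 storage format as an assumption.
# Region, bucket, and keys are placeholders to replace with your own values.
cat > storage.yaml <<'EOF'
type: s3
s3:
   region: us-west-2
   bucket: pbm-test-bucket
   credentials:
      access-key-id: <your-access-key-id-here>
      secret-access-key: <your-secret-key-here>
EOF

# The file holds credentials, so keep it readable by its owner only.
chmod 600 storage.yaml
```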

To configure pbm to use this storage, execute the following command:
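A sketch of that command, using the store set subcommand listed in the table above. It assumes the file is called storage.yaml, that mongod listens on 127.0.0.1:27017 without authentication, and that the --config and --mongodb-uri flags behave as in the 1.0 release. set_store is a hypothetical wrapper name.

```shell
# Sketch only: set_store is a hypothetical wrapper, not part of PBM.
# $1: path to the storage file (defaults to ./storage.yaml).
set_store() {
  pbm store set --config="${1:-storage.yaml}" \
    --mongodb-uri="mongodb://127.0.0.1:27017"
}

# Usage:
#   set_store                 # uses ./storage.yaml
#   set_store /path/to/file   # uses another file
```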

You don’t need to supply the store information for any subsequent operations.

Run pbm-agent on the Nodes

Now it’s time to turn on the pbm-agent process on the nodes of the cluster. Let’s run the following command:

In --mongodb-uri we have to provide the connection string to the local mongod server.

For the sake of simplicity, I didn’t enable authentication, but if you have it enabled (as recommended for production environments), you also need to provide the username and password. For example, you can specify --mongodb-uri as follows: mongodb://myuser:<password>@<host>:27017
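Starting the agent with or without credentials could be sketched as follows. agent_uri and start_agent are hypothetical helper names, and the host, port, and log file name are assumptions for a local mongod on the default port. Special characters in the password must be percent-encoded before they go into the URI.

```shell
# Sketch only: agent_uri and start_agent are hypothetical helpers.
# Build the connection string for the local mongod.
# $1, $2: optional user and password (password already percent-encoded).
agent_uri() {
  if [ -n "$1" ]; then
    printf 'mongodb://%s:%s@127.0.0.1:27017\n' "$1" "$2"
  else
    printf 'mongodb://127.0.0.1:27017\n'
  fi
}

# Start the agent in the background so it survives logout,
# writing its log to pbm-agent.log in the current directory.
start_agent() {
  nohup pbm-agent --mongodb-uri "$(agent_uri "$@")" > pbm-agent.log 2>&1 &
}

# Usage, on every node:
#   start_agent                  # no authentication
#   start_agent myuser secret    # with credentials
```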

After executing the same command on all the nodes, we are ready to take the backup.

Take the Backup

Now it’s time to run our first backup. I have installed pbm on all the machines of the replica set, and we can use one of them. In a production environment with a lot of nodes, it is worth considering running it from another machine, but be sure that the machine can access mongod on the database nodes.

Let’s run the following command on one of the nodes and see what happens. Please note that the following box shows log messages coming from the different nodes.
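The backup itself is a single pbm call. Here is a sketch that assumes no authentication and a mongod on the default local port. take_backup is a hypothetical wrapper name and PBM_URI is just a shell variable of this sketch, used to keep the connection string in one place.

```shell
# Sketch only: take_backup is a hypothetical wrapper, not part of PBM.
# PBM_URI is a plain shell variable used here, defaulting to the local mongod.
take_backup() {
  pbm backup --mongodb-uri="${PBM_URI:-mongodb://127.0.0.1:27017}"
}

# Usage, on one node only (the agents pick the node that runs the dump):
#   take_backup
```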

We have launched the backup on mdb2, which is currently a SECONDARY. We can see that all the pbm-agents were contacted at the same time and decided which node was the best one for running the dump. It was mdb2. The dump of all the collections is taken using mongodump.

We can also notice from the log messages that mdb3 was not selected because it’s currently the PRIMARY.

The picture below shows what you should see on AWS S3 if the streaming of the dump files worked as expected. The JSON file contains metadata about the backup; the other two compressed files contain the documents of the collections and the oplog collection.

After taking further backups you should see the following:

List the Backups

It’s not mandatory to use the S3 dashboard to see and manage the files; you can use pbm as well. To list all available backups, run the following:
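A sketch of the list call, under the same assumptions as before: no authentication and a local mongod on the default port. list_backups is a hypothetical wrapper name.

```shell
# Sketch only: list_backups is a hypothetical wrapper, not part of PBM.
list_backups() {
  pbm list --mongodb-uri="${PBM_URI:-mongodb://127.0.0.1:27017}"
}

# Usage:
#   list_backups
```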

Restore a Backup

Finally, let’s test the recovery. As usual, you just need to use the pbm client.

For example, we would like to restore the first dump we have taken. We simply need to specify the right date and time, exactly as returned by the previous list command. That’s all.
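As a sketch, the restore call takes the backup name as its argument. A name has the form 2020-04-10T08:11:10Z; that value is only a format example, so use the one printed by your own list command. restore_backup is a hypothetical wrapper that refuses to run without a name.

```shell
# Sketch only: restore_backup is a hypothetical wrapper, not part of PBM.
# $1: backup name exactly as printed by "pbm list".
restore_backup() {
  if [ -z "$1" ]; then
    echo "usage: restore_backup <backup-name>" >&2
    return 1
  fi
  pbm restore "$1" --mongodb-uri="${PBM_URI:-mongodb://127.0.0.1:27017}"
}

# Usage:
#   restore_backup 2020-04-10T08:11:10Z
```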

Let’s take a look at the log messages on all the nodes to analyze what’s happening.

The recovery runs on the PRIMARY node, mdb3. The SECONDARY nodes reported that they were not suitable for restore, as expected.

Warning: the instance that you will restore your backup to may already have data. After running pbm restore, the instance will have both its existing data and the data from the backup. To make sure that your data is consistent, either clean up the target instance or use an instance without data.

Conclusion

Percona Backup for MongoDB is a good and reliable tool for backing up any MongoDB deployment, even a large Sharded Cluster, and as shown in this article it is also quite easy to use. That said, there’s still a lot of work to do in order to add more features.

In a future article, we’ll test pbm backup and recovery also on a larger sharded cluster. So, stay tuned for the next chapter and for the next versions of the tool.

Note: TLS does not work in v1.0 because of a bug whose fix is still awaiting release.

Consider the links below to follow the development and new releases.

6 Comments
SelvaKumar K

Trying to take backup in Ubuntu 18.04 , getting below error frequently

2019/12/19 14:05:33 Node in not suitable for backup

Also tried with docker with mongo replica , getting same error for backup . Please suggest any issues on this

Xzccc

I try to backup my Mongo Cluster in debian9, but its stuck at:

2019/12/25 16:57:08 mongodump finished, waiting to finish oplog

I wait about half an hour, but still no result, and I don’t know why its blocked.

frank

how can i use pbm restore to restore data to another cluster? it display Error: backup ‘2020-04-10T08:11:10Z’ not found,how can i register backup ‘2020-04-10T08:11:10Z’ for the another cluster

Regina Cho

I got a same error when I tried to restore on a new cluster set. I do find the solution of this. Error : backup ” not found.

Regina Cho

Hi, I found the answer.
First, you should backup admin database and restore it to new cluster config database. it has all backup history of pbm backup so pbm agent need it to restore backup files.
After restoring the admin db via mongorestore I was able to restore pbm backup files to the new cluster db set.
And don’t forget you mount the remote storage within the config you already set as well.
If you have any question reply after this comment.
Good luck.

Ivan

Hi Regina, you can also use the --force-resync option for this