This is a straightforward article showing how easy it is to refresh your Test/Dev environments with PROD data using Percona Backup for MongoDB (PBM). It covers every step from the PBM configuration through the restore, assuming that the PBM agents are up and running on all replica set members of both the PROD and Dev/Test servers.
Taking the Backup on PROD
This step is quite simple and it demands no more than two commands:
1. Configuring the Backup
    $ export PBM_MONGODB_URI='mongodb://pbmuser:<password>@127.0.0.1:40001/?replicaSet=bprepPROD&authSource=admin'
    $ pbm config --file /etc/pbm/pbm-s3.yaml
    [Config set]
    ------
    pitr:
      enabled: false
    storage:
      type: s3
      s3:
        provider: aws
        region: us-west-1
        bucket: rafapbmtest
        prefix: bpPROD
        credentials:
          access-key-id: '***'
          secret-access-key: '***'

    Backup list resync from the store has started
Two things are worth noting: I am sending my backups to an S3 bucket, and I am defining a prefix. When a prefix is defined in the PBM storage configuration, a subdirectory is created automatically and the backup files are stored in that subdirectory instead of the root of the S3 bucket.
2. Taking the Backup
With PBM properly configured, it is time to take the backup. (You can skip this step if you already have PBM backups to use, of course.)
    $ export PBM_MONGODB_URI='mongodb://pbmuser:<password>@127.0.0.1:40001/?replicaSet=bprepPROD&authSource=admin'
    $ pbm backup
    Starting backup '2021-05-08T08:34:47Z'...................
    Backup '2021-05-08T08:34:47Z' to remote store 's3://rafapbmtest/bpPROD' has started
If we run the PBM status command, we will see the snapshot running. Once it finishes, the status shows it as complete, as below:
    $ pbm status
    Cluster:
    ========
    bprepPROD:
      - bprepPROD/127.0.0.1:40001: pbm-agent v1.4.1 OK

    PITR incremental backup:
    ========================
    Status [OFF]

    Currently running:
    ==================
    (none)

    Backups:
    ========
    S3 us-west-1 rafapbmtest/bpPROD
      Snapshots:
        2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]
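Rather than re-running the status command by hand, the wait can be scripted. The following is a minimal sketch, not part of PBM: the function names snapshot_is_complete and wait_for_snapshot are made up for this example, and the check assumes the v1.4.1 status format shown above, where the line of a finished snapshot contains "[complete:". Other PBM versions may print something different.

```shell
# Succeed if the status text on stdin lists the given snapshot as complete.
# Assumption: a finished snapshot line looks like
#   "<snapshot> <size> [complete: <time>]" (format taken from the output above).
snapshot_is_complete() {
  grep -F -- "$1" | grep -q -F '[complete:'
}

# Poll "pbm status" until the snapshot is complete or the attempts run out.
# Usage: wait_for_snapshot <snapshot> [attempts] [seconds-between-attempts]
wait_for_snapshot() {
  local snapshot="$1"
  local attempts="${2:-30}"
  local delay="${3:-10}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if pbm status | snapshot_is_complete "$snapshot"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example:
#   wait_for_snapshot '2021-05-08T08:34:47Z' && echo "backup finished"
```

The defaults (30 attempts, 10 seconds apart) are arbitrary and should be sized to how long your backups usually take.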
Configuring the PBM Space on a DEV/TEST Environment
All right, now my PROD has a proper backup routine configured. The next step is to configure PBM again, this time in a Dev/Test environment, named DEV here.
    $ export PBM_MONGODB_URI='mongodb://pbmuser:<password>@127.0.0.1:50001/?replicaSet=bprepPROD&authSource=admin'
    $ pbm config --file /etc/pbm/pbm-s3.yaml
    [Config set]
    ------
    pitr:
      enabled: false
    storage:
      type: s3
      s3:
        provider: aws
        region: us-west-1
        bucket: rafapbmtest
        prefix: bpDEV
        credentials:
          access-key-id: '***'
          secret-access-key: '***'

    Backup list resync from the store has started
Note that this is the same S3 bucket where PROD stores its backups, but with a different prefix. If I run the status command, I will see that PBM is configured but no snapshots are available yet:
    $ pbm status
    Cluster:
    ========
    bprepPROD:
      - bprepPROD/127.0.0.1:50001: pbm-agent v1.4.1 OK

    PITR incremental backup:
    ========================
    Status [OFF]

    Currently running:
    ==================
    (none)

    Backups:
    ========
    S3 us-west-1 rafapbmtest/bpDEV
      (none)
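In this walkthrough the DEV storage configuration differs from the PROD one only in its prefix, so the DEV file can be derived from the PROD file instead of being written by hand. The sketch below is one possible way to do it: the function name with_prefix and the output path are hypothetical, and it assumes the file has a single "prefix:" key whose new value contains no | or & characters.

```shell
# Print the YAML read from stdin with the value of the "prefix:" key replaced.
# Indentation of the key is preserved; every other line passes through unchanged.
with_prefix() {
  sed -E "s|^([[:space:]]*prefix:).*|\1 ${1}|"
}

# Example (the /tmp path is a placeholder):
#   with_prefix bpDEV < /etc/pbm/pbm-s3.yaml > /tmp/pbm-s3-dev.yaml
#   pbm config --file /tmp/pbm-s3-dev.yaml
```

Start from the original configuration file rather than from the output of the config command, since the output shown above masks the credentials as '***'.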
Lastly, note that the replica set name is exactly the same as in PROD. If this were a sharded cluster rather than a single replica set, every replica set name would have to match in the target cluster. PBM is guided by the replica set name, and if my DEV environment had a different one, it would not be possible to load the backup metadata from PROD into DEV.
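The name requirement can be checked before anything is restored. In the S3 listing later in this article, the dump file is called 2021-05-08T08:34:47Z_bprepPROD.dump.s2, which suggests the replica set name sits between the underscore and ".dump". The sketch below relies on that naming as an assumption drawn from this one listing, not on a documented PBM contract, and the function names backup_replsets and replsets_match are invented for the example.

```shell
# Print the distinct replica set names found in backup file names on stdin.
# Assumption: dump files are named "<snapshot>_<replset>.dump.<ext>".
# Lines may be bare file names or full "aws s3 ls" rows.
backup_replsets() {
  sed -n -E 's/^[^_]*_(.+)\.dump\.[^.]+$/\1/p' | sort -u
}

# Succeed only if the names found on stdin are exactly the names given as
# arguments (one for a plain replica set, several for a sharded cluster).
replsets_match() {
  [ "$#" -gt 0 ] || return 2
  local wanted
  wanted="$(printf '%s\n' "$@" | sort -u)"
  [ "$(backup_replsets)" = "$wanted" ]
}

# Example:
#   aws s3 ls s3://rafapbmtest/bpPROD/ | replsets_match bprepPROD \
#     || echo "replica set names differ, do not restore"
```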
Transferring the Desired Backup Files
The next step is to transfer the backup files from the PROD prefix to the target prefix. I will use the AWS CLI for that, but there is one important thing to settle first: determining which files belong to a given backup set (snapshot). Let’s go back to the PBM status output taken in PROD previously:
    $ export PBM_MONGODB_URI='mongodb://pbmuser:<password>@127.0.0.1:40001/?replicaSet=bprepPROD&authSource=admin'
    $ pbm status
    Cluster:
    ========
    bprepPROD:
      - bprepPROD/127.0.0.1:40001: pbm-agent v1.4.1 OK

    PITR incremental backup:
    ========================
    Status [OFF]

    Currently running:
    ==================
    (none)

    Backups:
    ========
    S3 us-west-1 rafapbmtest/bpPROD
      Snapshots:
        2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]
PBM snapshots are named after the timestamp at which the backup started. If we look at the S3 prefix where the backup is stored, we will see that the file names contain that timestamp:
    $ aws s3 ls s3://rafapbmtest/bpPROD/
    2021-05-08 10:26:11          5 .pbm.init
    2021-05-08 10:35:14       1428 2021-05-08T08:34:47Z.pbm.json
    2021-05-08 10:35:10      11606 2021-05-08T08:34:47Z_bprepPROD.dump.s2
    2021-05-08 10:35:13        949 2021-05-08T08:34:47Z_bprepPROD.oplog.s2
So it is now easy to tell which files I have to copy:
    $ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z.pbm.json' 's3://rafapbmtest/bpDEV/'
    copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z.pbm.json to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z.pbm.json

    $ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.dump.s2' 's3://rafapbmtest/bpDEV/'
    copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.dump.s2 to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z_bprepPROD.dump.s2

    $ aws s3 cp 's3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.oplog.s2' 's3://rafapbmtest/bpDEV/'
    copy: s3://rafapbmtest/bpPROD/2021-05-08T08:34:47Z_bprepPROD.oplog.s2 to s3://rafapbmtest/bpDEV/2021-05-08T08:34:47Z_bprepPROD.oplog.s2
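Since every file of a snapshot starts with the snapshot name, the three copies can also be expressed as one recursive copy with an include filter. This is a sketch of that alternative, not what was run above: copy_snapshot is a hypothetical wrapper name, while --recursive, --exclude, --include, and --dryrun are standard options of the aws s3 cp command.

```shell
# Copy every object whose key starts with the snapshot name from one S3
# prefix to another. Any extra arguments (for example --dryrun) are passed
# through to "aws s3 cp" unchanged.
# Usage: copy_snapshot <source-prefix-url> <target-prefix-url> <snapshot> [aws options]
copy_snapshot() {
  local src="$1"
  local dst="$2"
  local snapshot="$3"
  shift 3
  # Exclude everything first, then re-include only this snapshot's files;
  # in the AWS CLI, later filters take precedence over earlier ones.
  aws s3 cp "$src" "$dst" --recursive --exclude '*' --include "${snapshot}*" "$@"
}

# Example: preview first, then copy for real.
#   copy_snapshot s3://rafapbmtest/bpPROD/ s3://rafapbmtest/bpDEV/ '2021-05-08T08:34:47Z' --dryrun
#   copy_snapshot s3://rafapbmtest/bpPROD/ s3://rafapbmtest/bpDEV/ '2021-05-08T08:34:47Z'
```

Running with --dryrun first lists what would be copied without touching the DEV prefix, which is a cheap way to confirm the filter selects only the intended snapshot.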
Checking the DEV prefix:
    $ aws s3 ls s3://rafapbmtest/bpDEV/
    2021-05-08 10:43:59          5 .pbm.init
    2021-05-08 10:52:02       1428 2021-05-08T08:34:47Z.pbm.json
    2021-05-08 10:52:13      11606 2021-05-08T08:34:47Z_bprepPROD.dump.s2
    2021-05-08 10:52:24        949 2021-05-08T08:34:47Z_bprepPROD.oplog.s2
The files are already there and PBM has already automatically loaded their metadata into the DEV PBM collections:
    $ pbm status
    Cluster:
    ========
    bprepPROD:
      - bprepPROD/127.0.0.1:50001: pbm-agent v1.4.1 OK

    PITR incremental backup:
    ========================
    Status [OFF]

    Currently running:
    ==================
    (none)

    Backups:
    ========
    S3 us-west-1 rafapbmtest/bpDEV
      Snapshots:
        2021-05-08T08:34:47Z 11.33KB [complete: 2021-05-08T08:35:08]
Finally – Restoring It
Believe it or not, now comes the easiest part: the restore. It takes a single command:
    $ pbm restore '2021-05-08T08:34:47Z'
    ....Restore of the snapshot from '2021-05-08T08:34:47Z' has started
Refreshing Dev/Test environments with PROD data is a very common and required task in corporations worldwide. I hope this article helps to clarify the practical questions regarding using PBM for it!
Just a quick note: if “pbm status” does not show any backups on the destination server after copying the files, running “pbm config --force-resync” to re-read the backup list from storage should help.
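That check can be folded into a small helper that forces the resync only when it is needed. This is a rough sketch with a hypothetical function name, ensure_snapshot_listed. It only tests whether the snapshot name appears anywhere in the status output, and since the resync is reported as "started" rather than finished, the snapshot may take a moment to show up afterwards.

```shell
# Force a resync of the backup list only if the snapshot is not yet
# visible in "pbm status" on the destination environment.
ensure_snapshot_listed() {
  local snapshot="$1"
  if pbm status | grep -q -F -- "$snapshot"; then
    echo "snapshot $snapshot is listed"
  else
    echo "snapshot $snapshot not listed, forcing resync"
    pbm config --force-resync
  fi
}

# Example:
#   ensure_snapshot_listed '2021-05-08T08:34:47Z'
```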