If you have been working in the database field for some time, you have likely come across the need to create a new database based on an existing one. The most common example I can think of is creating a copy of the production database for testing purposes.
In the case of MongoDB sharded clusters, the official documentation covers the procedure to restore a sharded cluster from a backup. But what if we want to restore the dataset to different hosts, and also rename the shards and/or replica sets? The documentation mentions metadata renaming, but the steps are incomplete.
I am providing the detailed steps for MongoDB 4.2, although I should warn you to use them at your own risk since this is not a supported procedure. I won’t cover the backup part in detail here, but normally we can stop the balancer, and then shut down one secondary member from each shard, as well as one config server, and create snapshots to seed members of the new cluster. You can also check Corrado’s blog Percona Backup for MongoDB in Action for another way to get a consistent backup of a sharded cluster with no downtime.
Overview of Restoring a MongoDB Sharded Cluster
In this case, I am assuming we want to clone a cluster where components are named as follows:
```
# config servers
prod-cfg/prod-cfg1.domain,prod-cfg2.domain,prod-cfg3.domain

# shards
prod-shard-01/prod-s01a.domain,prod-s01b.domain,prod-s01c.domain
prod-shard-02/prod-s02a.domain,prod-s02b.domain,prod-s02c.domain
...
```
The target cluster will have the following names:
```
# config servers
stage-cfg/stage-cfg1.domain,stage-cfg2.domain,stage-cfg3.domain

# shards
stage-shard-01/stage-s01a.domain,stage-s01b.domain,stage-s01c.domain
stage-shard-02/stage-s02a.domain,stage-s02b.domain,stage-s02c.domain
...
```
One caveat: the target cluster must have the same number of shards as the source.
Depending on the number of shards, the metadata editing can be time-consuming, so you should use some kind of automation to help. At least use a terminal multiplexer so you can broadcast the commands you type in a single tab to other tabs (one per shard).
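All of the metadata edits that follow rely on the same case-insensitive regex rename, so it is worth sanity-checking that transform on its own before touching real metadata. Here is a minimal sketch in plain JavaScript (runnable with Node.js; the names are only examples):

```javascript
// The rename rule used throughout this procedure: every "prod" substring
// becomes "stage", case-insensitively. Names here are illustrative only.
function rename(name) {
  return name.replace(/prod/gi, 'stage');
}

console.log(rename('prod-shard-01/prod-s01a.domain:27018,prod-s01b.domain:27018'));
// stage-shard-01/stage-s01a.domain:27018,stage-s01b.domain:27018
```

The same one-liner is what each `forEach` loop below applies to a different metadata field.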
Restoring the Config Servers
The first thing to do is to restore the backup of the config server to the new hardware. Once we have that, follow these steps:
1. Start the config server in standalone mode. I am taking the opportunity to rename the config server replica set in the config file:
```
sed -i 's/sharding:/#sharding:/' /etc/mongod.conf
sed -i 's/ clusterRole: /# clusterRole: /' /etc/mongod.conf
sed -i 's/replication:/#replication:/' /etc/mongod.conf
sed -i 's/ replSetName: prod-cfg/# replSetName: stage-cfg/' /etc/mongod.conf
```
```
service mongod start
```
2. Drop the local database
```
mongo -u$user -p$password --port=27019 <<EOF
use local
db.dropDatabase()
EOF
```
3. Update the shard metadata in config.shards collection
```
use config
db.shards.find({}).forEach(function(shard) {
  var oldId = shard["_id"];  // keep the original _id so we can remove the old document
  shard["_id"] = oldId.replace(/prod/gi, 'stage');
  shard["host"] = shard["host"].replace(/prod/gi, 'stage');
  db.shards.save(shard);            // inserts the document under the new _id
  db.shards.remove({_id: oldId});   // removes the document with the old _id
})
```
4. Modify the primary shard name for unsharded collections
```
use config
db.databases.find().forEach(function(doc) {
  doc["primary"] = doc["primary"].replace(/prod/gi, 'stage');
  db.databases.save(doc);
})
```
5. Modify each chunk metadata (this can take a while if you have lots of chunks)
```
db.chunks.find().forEach(function(chunk) {
  chunk["shard"] = chunk["shard"].replace(/prod/gi, 'stage');
  db.chunks.save(chunk);
})
```
6. Modify chunk history metadata
```
db.chunks.find().forEach(function(chunk) {
  chunk["history"][0].shard = chunk["history"][0].shard.replace(/prod/gi, 'stage');
  db.chunks.save(chunk);
})
```
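Steps 5 and 6 each do a full pass over `config.chunks`. The per-document rewrite can be collapsed into one pure function and checked outside the database first. This is a plain-JS sketch (the sample chunk document is made up); note it loops over every `history` entry, whereas the steps above touch only `history[0]`, which covers chunks with longer histories. In the mongo shell you would apply it inside a single `db.chunks.find().forEach(...)` loop followed by `db.chunks.save(chunk)`:

```javascript
// Combined transform for steps 5 and 6: rename the chunk's owning shard and
// every entry in its history array in one pass. Sample document is illustrative.
function renameChunk(chunk) {
  chunk.shard = chunk.shard.replace(/prod/gi, 'stage');
  (chunk.history || []).forEach(function (h) {
    h.shard = h.shard.replace(/prod/gi, 'stage');
  });
  return chunk;
}

var sample = {
  _id: 'mydb.mycol1-x_1',
  shard: 'prod-shard-01',
  history: [{ validAfter: 1, shard: 'prod-shard-01' }],
};
console.log(renameChunk(sample).shard);  // stage-shard-01
console.log(sample.history[0].shard);    // stage-shard-01
```

Doing both fields in one pass also halves the number of full collection scans, which matters when there are lots of chunks.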
7. Start the config server with the new name and initiate its replica set
```
sed -i 's/#sharding:/sharding:/' /etc/mongod.conf
sed -i 's/# clusterRole: / clusterRole: /' /etc/mongod.conf
sed -i 's/#replication:/replication:/' /etc/mongod.conf
sed -i 's/# replSetName:/ replSetName:/' /etc/mongod.conf
```
```
service mongod restart
```
```
mongo -u$user -p$password --port=27019
rs.initiate(
  {
    _id: "stage-cfg",
    configsvr: true,
    members: [
      { _id : 0, host : "stage-cfg1.domain:27019" }
    ]
  }
)
```
At this point, it is safe to add the other members of the config server replica set.
```
rs.add("stage-cfg2.domain:27019")
rs.add("stage-cfg3.domain:27019")
```
Next, we have to work on the shards themselves.
Restoring the Shards
Once you have restored the backup of each shard, do the following:
1. Start the member of each shard you restored in standalone mode:
```
sed -i 's/sharding:/#sharding:/' /etc/mongod.conf
sed -i 's/ clusterRole: /# clusterRole: /' /etc/mongod.conf
sed -i 's/replication:/#replication:/' /etc/mongod.conf
sed -i 's/ replSetName: prod-shard-/# replSetName: stage-shard-/' /etc/mongod.conf
```
```
service mongod start
```
2. Create a temporary user with the __system role. This is needed to edit the system collections.
```
mongo -u$user -p$password --port=27018 <<EOF
db.createUser(
  {
    user: "mySystemUser",
    pwd: "percona",
    roles: [ "__system" ]
  }
)
EOF
```
3. Drop the local database
```
db.auth("mySystemUser","percona")
use local
db.dropDatabase()
```
4. Drop the document that stores oplog recovery information (we want to avoid any kind of recovery on startup and just keep the data as-is).
```
use admin
db.system.version.deleteOne( { _id: "minOpTimeRecovery" } )
```
5. Update shard identity metadata
```
db.system.version.find({"_id" : "shardIdentity"}).forEach(function(doc) {
  doc["configsvrConnectionString"] = doc["configsvrConnectionString"].replace(/prod/gi, 'stage');
  doc["shardName"] = doc["shardName"].replace(/prod/gi, 'stage');
  db.system.version.save(doc);
})
```
6. Remove all documents in the cached metadata collections. Basically, anything in the config database whose name starts with cache. has to go.
```
use config
db.cache.chunks.config.system.sessions.remove({})
db.cache.collections.remove({})
db.cache.databases.remove({})
db.cache.chunks["mydb.mycol1"].remove({})
db.cache.chunks["mydb.mycol2"].remove({})
...
```
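The `cache.chunks.*` entries depend on which collections are sharded, so it is easy to miss one when typing the list by hand. The name filter can be expressed (and tested) as a plain function; in the mongo shell you would feed it `db.getSiblingDB('config').getCollectionNames()` and call `.remove({})` on each match. A hypothetical sketch with made-up collection names:

```javascript
// Select every cached-metadata collection under the config database:
// anything whose name starts with "cache.". Sample names are illustrative.
function cacheCollections(names) {
  return names.filter(function (c) { return c.indexOf('cache.') === 0; });
}

var result = cacheCollections([
  'cache.chunks.config.system.sessions',
  'cache.collections',
  'cache.databases',
  'cache.chunks.mydb.mycol1',
  'system.sessions',  // not cached metadata: kept out of the result
]);
console.log(result.length);  // 4
```

Generating the list this way means new sharded collections are picked up automatically instead of requiring the list to be maintained by hand.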
7. Restart each shard as a single-node replica set and add the other members. In this case, I am adding them as non-voting members, as we don't want them to accidentally become primary (at least until they finish syncing from the node that has the actual data).
```
sed -i 's/#sharding:/sharding:/' /etc/mongod.conf
sed -i 's/# clusterRole: / clusterRole: /' /etc/mongod.conf
sed -i 's/#replication:/replication:/' /etc/mongod.conf
sed -i 's/# replSetName:/ replSetName:/' /etc/mongod.conf
```
```
service mongod restart
```
```
rs.initiate( { _id: "stage-shard-01", members: [ { _id : 0, host : "stage-s01a.domain:27018" } ] })
rs.add({ host: "stage-s01b.domain:27018", priority: 0, votes: 0 })
rs.add({ host: "stage-s01c.domain:27018", priority: 0, votes: 0 })
```
8. Finally, don't forget to remove the privileged user we created earlier.
```
use admin
db.dropUser("mySystemUser")
```
Final Words
While the procedure above works, it would be nice if MongoDB had a built-in script to help with this tedious task. Hopefully, future versions will include something in this regard. If you get a chance to use this procedure, please leave a comment in the section below and let me know how it goes.