If you have been working in the database field for some time, you have likely come across the need to create a new database based on an existing one. The most common example I can think of is creating a copy of the production database for testing purposes.
In the case of MongoDB sharded clusters, the official documentation covers the procedure to restore a sharded cluster from a backup. But what if we want to restore the dataset to different hosts, and also rename the shards and/or replica sets? The documentation mentions metadata renaming, but the steps are incomplete.
I am providing the detailed steps for MongoDB 4.2, although I should warn you to use them at your own risk since this is not a supported procedure. I won’t cover the backup part in detail here, but normally we can stop the balancer, and then shut down one secondary member from each shard, as well as one config server, and create snapshots to seed members of the new cluster. You can also check Corrado’s blog Percona Backup for MongoDB in Action for another way to get a consistent backup of a sharded cluster with no downtime.
Overview of Restoring a MongoDB Sharded Cluster
In this case, I am assuming we want to clone a cluster where components are named as follows:
```
# config servers
prod-cfg/prod-cfg1.domain,prod-cfg2.domain,prod-cfg3.domain

# shards
prod-shard-01/prod-s01a.domain,prod-s01b.domain,prod-s01c.domain
prod-shard-02/prod-s02a.domain,prod-s02b.domain,prod-s02c.domain
...
```
The target cluster will have the following names:
```
# config servers
stage-cfg/stage-cfg1.domain,stage-cfg2.domain,stage-cfg3.domain

# shards
stage-shard-01/stage-s01a.domain,stage-s01b.domain,stage-s01c.domain
stage-shard-02/stage-s02a.domain,stage-s02b.domain,stage-s02c.domain
...
```
One caveat: the target cluster must have the same number of shards as the source.
Depending on the number of shards, the metadata editing can be time-consuming, so you should use some kind of automation to help. At least use a terminal multiplexer so you can broadcast the commands you type in a single tab to other tabs (one per shard).
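All of the metadata edits that follow rely on the same case-insensitive regex rename, so it is worth sanity-checking that transform on its own before touching real metadata. Here is a minimal sketch in plain JavaScript (runnable with Node.js; the names are only examples):

```javascript
// The rename rule used throughout this procedure: every "prod" substring
// becomes "stage", case-insensitively. Names here are illustrative only.
function rename(name) {
  return name.replace(/prod/gi, 'stage');
}

console.log(rename('prod-shard-01/prod-s01a.domain:27018,prod-s01b.domain:27018'));
// stage-shard-01/stage-s01a.domain:27018,stage-s01b.domain:27018
```

The same one-liner is what each `forEach` loop below applies to a different metadata field.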
Restoring the Config Servers
The first thing to do is to restore the backup of the config server to the new hardware. Once we have that, follow these steps:
1. Start the config server in standalone mode. I am taking the opportunity to rename the config server replica set in the config file:
```
sed -i 's/sharding:/#sharding:/' /etc/mongod.conf
sed -i 's/ clusterRole: /# clusterRole: /' /etc/mongod.conf
sed -i 's/replication:/#replication:/' /etc/mongod.conf
sed -i 's/ replSetName: prod-cfg/# replSetName: stage-cfg/' /etc/mongod.conf
```
```
service mongod start
```
2. Drop the local database
```
mongo -u$user -p$password --port=27019 <<EOF
use local
db.dropDatabase()
EOF
```
3. Update the shard metadata in config.shards collection
```
use config
db.shards.find({}).forEach(function(shard) {
  var oldId = shard["_id"];  // keep the original _id so we can remove the old document
  shard["_id"] = oldId.replace(/prod/gi, 'stage');
  shard["host"] = shard["host"].replace(/prod/gi, 'stage');
  db.shards.save(shard);            // inserts the document under the new _id
  db.shards.remove({_id: oldId});   // removes the document with the old _id
})
```
4. Modify the primary shard name for unsharded collections
```
use config
db.databases.find().forEach(function(doc) {
  doc["primary"] = doc["primary"].replace(/prod/gi, 'stage');
  db.databases.save(doc);
})
```
5. Modify each chunk metadata (this can take a while if you have lots of chunks)
```
db.chunks.find().forEach(function(chunk) {
  chunk["shard"] = chunk["shard"].replace(/prod/gi, 'stage');
  db.chunks.save(chunk);
})
```
6. Modify chunk history metadata
```
db.chunks.find().forEach(function(chunk) {
  chunk["history"][0].shard = chunk["history"][0].shard.replace(/prod/gi, 'stage');
  db.chunks.save(chunk);
})
```
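Steps 5 and 6 each do a full pass over `config.chunks`. The per-document rewrite can be collapsed into one pure function and checked outside the database first. This is a plain-JS sketch (the sample chunk document is made up); note it loops over every `history` entry, whereas the steps above touch only `history[0]`, which covers chunks with longer histories. In the mongo shell you would apply it inside a single `db.chunks.find().forEach(...)` loop followed by `db.chunks.save(chunk)`:

```javascript
// Combined transform for steps 5 and 6: rename the chunk's owning shard and
// every entry in its history array in one pass. Sample document is illustrative.
function renameChunk(chunk) {
  chunk.shard = chunk.shard.replace(/prod/gi, 'stage');
  (chunk.history || []).forEach(function (h) {
    h.shard = h.shard.replace(/prod/gi, 'stage');
  });
  return chunk;
}

var sample = {
  _id: 'mydb.mycol1-x_1',
  shard: 'prod-shard-01',
  history: [{ validAfter: 1, shard: 'prod-shard-01' }],
};
console.log(renameChunk(sample).shard);  // stage-shard-01
console.log(sample.history[0].shard);    // stage-shard-01
```

Doing both fields in one pass also halves the number of full collection scans, which matters when there are lots of chunks.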
7. Start the config server with the new name and initiate its replica set
```
sed -i 's/#sharding:/sharding:/' /etc/mongod.conf
sed -i 's/# clusterRole: / clusterRole: /' /etc/mongod.conf
sed -i 's/#replication:/replication:/' /etc/mongod.conf
sed -i 's/# replSetName:/ replSetName:/' /etc/mongod.conf
```
```
service mongod restart
```
```
mongo -u$user -p$password --port=27019
rs.initiate(
  {
    _id: "stage-cfg",
    configsvr: true,
    members: [
      { _id : 0, host : "stage-cfg1.domain:27019" }
    ]
  }
)
```
At this point, it is safe to add the other members of the config server replica set.
```
rs.add("stage-cfg2.domain:27019")
rs.add("stage-cfg3.domain:27019")
```
Next, we have to work on the shards themselves.
Restoring the Shards
Once you have restored the backup of each shard, do the following:
1. Start the member of each shard you restored in standalone mode:
```
sed -i 's/sharding:/#sharding:/' /etc/mongod.conf
sed -i 's/ clusterRole: /# clusterRole: /' /etc/mongod.conf
sed -i 's/replication:/#replication:/' /etc/mongod.conf
sed -i 's/ replSetName: prod-shard-/# replSetName: stage-shard-/' /etc/mongod.conf
```
```
service mongod start
```
2. Create a temporary user with the __system role. This is needed to edit the system collections.
```
mongo -u$user -p$password --port=27018 <<EOF
db.createUser(
  {
    user: "mySystemUser",
    pwd: "percona",
    roles: [ "__system" ]
  }
)
EOF
```
3. Drop the local database
```
db.auth("mySystemUser","percona")
use local
db.dropDatabase()
```
4. Drop the document that stores oplog recovery information (we want to avoid any kind of recovery on startup and just keep the data as-is).
```
use admin
db.system.version.deleteOne( { _id: "minOpTimeRecovery" } )
```
5. Update shard identity metadata
```
db.system.version.find({"_id" : "shardIdentity"}).forEach(function(doc) {
  doc["configsvrConnectionString"] = doc["configsvrConnectionString"].replace(/prod/gi, 'stage');
  doc["shardName"] = doc["shardName"].replace(/prod/gi, 'stage');
  db.system.version.save(doc);
})
```
6. Remove all documents in the cached metadata collections. Basically, anything in the config database whose name starts with cache. has to go.
```
use config
db.cache.chunks.config.system.sessions.remove({})
db.cache.collections.remove({})
db.cache.databases.remove({})
db.cache.chunks["mydb.mycol1"].remove({})
db.cache.chunks["mydb.mycol2"].remove({})
...
```
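The `cache.chunks.*` entries depend on which collections are sharded, so it is easy to miss one when typing the list by hand. The name filter can be expressed (and tested) as a plain function; in the mongo shell you would feed it `db.getSiblingDB('config').getCollectionNames()` and call `.remove({})` on each match. A hypothetical sketch with made-up collection names:

```javascript
// Select every cached-metadata collection under the config database:
// anything whose name starts with "cache.". Sample names are illustrative.
function cacheCollections(names) {
  return names.filter(function (c) { return c.indexOf('cache.') === 0; });
}

var result = cacheCollections([
  'cache.chunks.config.system.sessions',
  'cache.collections',
  'cache.databases',
  'cache.chunks.mydb.mycol1',
  'system.sessions',  // not cached metadata: kept out of the result
]);
console.log(result.length);  // 4
```

Generating the list this way means new sharded collections are picked up automatically instead of requiring the list to be maintained by hand.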
7. Restart each shard as a single-node replica set and add the other members. In this case, I am adding them as non-voting members, as we don't want them to accidentally become primary (at least until they finish syncing from the node that has the actual data).
```
sed -i 's/#sharding:/sharding:/' /etc/mongod.conf
sed -i 's/# clusterRole: / clusterRole: /' /etc/mongod.conf
sed -i 's/#replication:/replication:/' /etc/mongod.conf
sed -i 's/# replSetName:/ replSetName:/' /etc/mongod.conf
```
```
service mongod restart
```
```
rs.initiate( { _id: "stage-shard-01", members: [ { _id : 0, host : "stage-s01a.domain:27018" } ] })
rs.add({ host: "stage-s01b.domain:27018", priority: 0, votes: 0 })
rs.add({ host: "stage-s01c.domain:27018", priority: 0, votes: 0 })
```
8. Finally, don't forget to remove the privileged user we created earlier.
```
use admin
db.dropUser("mySystemUser")
```
Final Words
While the procedure above works, it would be nice if MongoDB had a built-in script to help with this tedious task. Hopefully, future versions will include something in this regard. If you get a chance to use this procedure, please leave a comment in the section below and let me know how it goes.