In this blog post, we will discuss how to migrate data from MongoDB Atlas to self-hosted MongoDB. There are a couple of third-party tools in the market to migrate data from Atlas to Percona Server for MongoDB (PSMDB), like MongoPush, Hummingbird, and MongoShake. Today, we are going to discuss how to use MongoShake to migrate and sync data from Atlas to PSMDB.
NOTE: These tools are not officially supported by Percona.
MongoShake is a powerful tool that facilitates the migration of data from one MongoDB cluster to another. These are step-by-step instructions on how to install and utilize MongoShake for data migration from Atlas to PSMDB. So, let’s get started!
Prerequisites:
A MongoDB Atlas account. I created a test account (replica set) and loaded sample data with one click in Atlas:
- Create an account in Atlas.
- Create a cluster.
- Once a cluster is created, go to browse collections.
- It will prompt you to load sample data. Once you click on it, you will see the sample data like below:

```
Atlas atlas-mhnnqy-shard-0 [primary] test> show dbs
sample_airbnb        52.69 MiB
sample_analytics      9.44 MiB
sample_geospatial     1.23 MiB
sample_guides        40.00 KiB
sample_mflix        109.43 MiB
sample_restaurants    6.42 MiB
sample_supplies       1.05 MiB
sample_training      46.77 MiB
sample_weatherdata    2.59 MiB
admin               336.00 KiB
local                20.35 GiB
Atlas atlas-mhnnqy-shard-0 [primary] test>
```
An EC2 instance with PSMDB installed. I installed PSMDB on the EC2 machine:
```
rs0 [direct: primary] test> show dbs
admin   40.00 KiB
config  12.00 KiB
local   40.00 KiB
rs0 [direct: primary] test>
```
Make sure Atlas and PSMDB are running the same major MongoDB version (I have also used this tool on MongoDB 4.2, which is already EOL).
PSMDB version:
```
rs0 [direct: primary] test> db.version()
6.0.9-7
rs0 [direct: primary] test>
```
MongoDB Atlas version:
```
Atlas atlas-mhnnqy-shard-0 [primary] test> db.version()
6.0.10
Atlas atlas-mhnnqy-shard-0 [primary] test>
```
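As a quick sanity check, you can compare the major.minor series of both clusters programmatically. This is a minimal sketch; the version strings are taken from the outputs above, and in practice you would fetch them with db.version() through a driver:

```python
def major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string like '6.0.9-7' or '6.0.10'."""
    core = version.split("-")[0]      # drop any build suffix such as '-7'
    parts = core.split(".")
    return int(parts[0]), int(parts[1])

psmdb_version = "6.0.9-7"   # from db.version() on PSMDB
atlas_version = "6.0.10"    # from db.version() on Atlas

# Source and target should share the same major.minor series before migrating.
assert major_minor(psmdb_version) == major_minor(atlas_version)
print("Version check passed:", major_minor(atlas_version))
```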
To install MongoShake, follow these steps:
Step 1: Install Go
Ensure that Go is installed on your system. If not, download it from the official website and follow the installation instructions. I used Amazon Linux 2, so I installed Go with the command below:
```
sudo yum install golang -y
```
Step 2: Install MongoShake
Open the terminal and run the following command to install MongoShake:
```
git clone https://github.com/alibaba/MongoShake.git
```
- The clone creates a folder named MongoShake.
- cd MongoShake.
- Run the ./build.sh script.
Once you have installed MongoShake, you need to configure it for the migration process. Here’s how:
- The configuration file (collector.conf) is in the conf directory inside the MongoShake directory.
- In the config file, you can edit the URIs for both replica sets and sharded clusters, as well as the tunnel (the method used to migrate the data); for a direct migration, the value is direct. You can also edit the log file path and log file name. Below are some important parameters:

```
mongo_urls = mongodb+srv://gautam:****@cluster0.teeeayh.mongodb.net/  # Atlas connection string
tunnel.address = mongodb://127.0.0.1:27017                            # PSMDB connection string
sync_mode = all                                                       # default: incr
log.dir = /home/percona/MongoShake/log/                               # default: /root/mongoshake/
```
sync_mode has three options: all/full/incr.
- all: full synchronization + incremental synchronization (copy the data, then apply the oplog once the copy completes).
- full: full synchronization only (only copy the data).
- incr: incremental synchronization only (only apply the oplog).
There are other parameters in the configuration file that you can tune to your needs. For example, if you want to read data from a secondary node and avoid overwhelming the primary with reads, you can set the parameter below:
```
mongo_connect_mode = secondaryPreferred
```
Step 3: Once you are done with the configuration, run MongoShake in a screen session like the one below:
```
./bin/collector.linux -conf=conf/collector.conf -verbose 0
```
Step 4: Monitor the log file in the log directory to check the progress of migration.
Below is the sample log when you start MongoShake:
```
[2023/09/25 21:09:13 UTC] [INFO] New session to mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/ successfully
[2023/09/25 21:09:13 UTC] [INFO] Close client with mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/
[2023/09/25 21:09:13 UTC] [INFO] New session to mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/ successfully
[2023/09/25 21:09:19 UTC] [INFO] Close client with mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/
[2023/09/25 21:09:19 UTC] [INFO] GetAllTimestamp biggestNew:{1695675385 26}, smallestNew:{1695675385 26}, biggestOld:{1695668185 9}, smallestOld:{1695668185 9}, MongoSource:[url[mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/], name[atlas-mhnnqy-shard-0]], tsMap:map[atlas-mhnnqy-shard-0:{7282839399442677769 7282870323207208986}]
[2023/09/25 21:09:19 UTC] [INFO] all node timestamp map: map[atlas-mhnnqy-shard-0:{7282839399442677769 7282870323207208986}] CheckpointStartPosition:{1 0}
[2023/09/25 21:09:19 UTC] [INFO] New session to mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/ successfully
[2023/09/25 21:09:19 UTC] [INFO] atlas-mhnnqy-shard-0 Regenerate checkpoint but won't persist. content: {"name":"atlas-mhnnqy-shard-0","ckpt":1,"version":2,"fetch_method":"","oplog_disk_queue":"","oplog_disk_queue_apply_finish_ts":1}
[2023/09/25 21:09:19 UTC] [INFO] atlas-mhnnqy-shard-0 checkpoint using mongod/replica_set: {"name":"atlas-mhnnqy-shard-0","ckpt":1,"version":2,"fetch_method":"","oplog_disk_queue":"","oplog_disk_queue_apply_finish_ts":1}, ckptRemote set? [false]
[2023/09/25 21:09:19 UTC] [INFO] atlas-mhnnqy-shard-0 syncModeAll[true] ts.Oldest[7282839399442677769], confTsMongoTs[4294967296]
[2023/09/25 21:09:19 UTC] [INFO] start running with mode[all], fullBeginTs[7282870323207208986[1695675385, 26]]
```
You will see the below log once the full sync is completed and incr starts (incr means it will start syncing live data via the oplog):
```
[2023/09/25 22:12:04 UTC] [INFO] GetAllTimestamp biggestNew:{1695679924 3}, smallestNew:{1695679924 3}, biggestOld:{1695677613 1}, smallestOld:{1695677613 1}, MongoSource:[url[mongodb+srv://gautam:***@cluster0.teeeayh.mongodb.net/], name[atlas-mhnnqy-shard-0]], tsMap:map[atlas-mhnnqy-shard-0:{7282879892394344449 7282889818063765507}]
[2023/09/25 22:12:04 UTC] [INFO] ------------------------full sync done!------------------------
[2023/09/25 22:12:04 UTC] [INFO] oldestTs[7282879892394344449[1695677613, 1]] fullBeginTs[7282889689214746625[1695679894, 1]] fullFinishTs[7282889818063765507[1695679924, 3]]
[2023/09/25 22:12:04 UTC] [INFO] finish full sync, start incr sync with timestamp: fullBeginTs[7282889689214746625[1695679894, 1]], fullFinishTs[7282889818063765507[1695679924, 3]]
[2023/09/25 22:12:04 UTC] [INFO] start incr replication
```
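The large numbers in these entries, e.g. 7282889818063765507[1695679924, 3], are packed 64-bit BSON timestamps: the high 32 bits are a Unix epoch in seconds and the low 32 bits are an increment counter. A small sketch for decoding them when reading MongoShake logs:

```python
def decode_bson_ts(ts: int) -> tuple:
    """Split a packed 64-bit BSON timestamp into (seconds, increment)."""
    return ts >> 32, ts & 0xFFFFFFFF

# Value taken from the fullFinishTs entry in the log above.
seconds, increment = decode_bson_ts(7282889818063765507)
print(seconds, increment)  # matches the bracketed [1695679924, 3] in the log
```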
You will see logs like this when both clusters are in sync (when the lag is 0, i.e., tps=0):
```
[2023/09/25 22:14:41 UTC] [INFO] [name=atlas-mhnnqy-shard-0, stage=incr, get=24, filter=24, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 00:00:00}, lsn_ack={0[0, 0], 1970-01-01 00:00:00}]]
[2023/09/25 22:14:46 UTC] [INFO] [name=atlas-mhnnqy-shard-0, stage=incr, get=24, filter=24, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 00:00:00}, lsn_ack={0[0, 0], 1970-01-01 00:00:00}]]
[2023/09/25 22:14:51 UTC] [INFO] [name=atlas-mhnnqy-shard-0, stage=incr, get=25, filter=25, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 00:00:00}, lsn_ack={0[0, 0], 1970-01-01 00:00:00}]]
[2023/09/25 22:14:56 UTC] [INFO] [name=atlas-mhnnqy-shard-0, stage=incr, get=25, filter=25, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 00:00:00}, lsn_ack={0[0, 0], 1970-01-01 00:00:00}]]
```
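When watching for the cutover point, it can help to check these incr-stage lines programmatically rather than eyeball them. A hypothetical helper (the sample line is abbreviated from the log above; the regex and the "caught up" heuristic are my assumptions, not part of MongoShake):

```python
import re

# Matches the tps=<n> field in a MongoShake incr-stage log line.
TPS_RE = re.compile(r"tps=(\d+)")

def is_caught_up(log_line: str) -> bool:
    """Return True when the line reports tps=0, i.e. no pending oplog writes."""
    m = TPS_RE.search(log_line)
    return m is not None and int(m.group(1)) == 0

line = ("[2023/09/25 22:14:56 UTC] [INFO] [name=atlas-mhnnqy-shard-0, stage=incr, "
        "get=25, filter=25, write_success=0, tps=0, ckpt_times=0]")
print(is_caught_up(line))  # True for the sample line above
```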
Once the full data replication process is complete and both clusters are in sync, you can stop pointing the application to Atlas. Check the logs of MongoShake, and when the lag is 0, as we can see in the above logs, stop the replication/sync from Atlas or stop MongoShake. Verify that the data has been successfully migrated to PSMDB. You can use MongoDB shell or any other client to connect to the PSMDB instance to verify this.
MongoDB Atlas databases and their collection count:
```
Database: sample_airbnb
-----
Collection 'listingsAndReviews' documents: 5555

Database: sample_analytics
-----
Collection 'transactions' documents: 1746
Collection 'accounts' documents: 1746
Collection 'customers' documents: 500

Database: sample_geospatial
-----
Collection 'shipwrecks' documents: 11095

Database: sample_guides
-----
Collection 'planets' documents: 8

Database: sample_mflix
-----
Collection 'embedded_movies' documents: 3483
Collection 'users' documents: 185
Collection 'theaters' documents: 1564
Collection 'movies' documents: 21349
Collection 'comments' documents: 41079
Collection 'sessions' documents: 1

Database: sample_restaurants
-----
Collection 'neighborhoods' documents: 195
Collection 'restaurants' documents: 25359

Database: sample_supplies
-----
Collection 'sales' documents: 5000

Database: sample_training
-----
Collection 'posts' documents: 500
Collection 'trips' documents: 10000
Collection 'grades' documents: 100000
Collection 'routes' documents: 66985
Collection 'inspections' documents: 80047
Collection 'companies' documents: 9500
Collection 'zips' documents: 29470

Database: sample_weatherdata
-----
Collection 'data' documents: 10000

Atlas atlas-mhnnqy-shard-0 [primary] sample_weatherdata>
```
PSMDB databases and their collection count:
```
rs0 [direct: primary] test> show dbs
admin                80.00 KiB
config              240.00 KiB
local               468.00 KiB
mongoshake           56.00 KiB
sample_airbnb        52.20 MiB
sample_analytics      9.21 MiB
sample_geospatial   984.00 KiB
sample_guides        40.00 KiB
sample_mflix        108.17 MiB
sample_restaurants    5.57 MiB
sample_supplies     980.00 KiB
sample_training      40.50 MiB
sample_weatherdata    2.39 MiB
rs0 [direct: primary] test>
```
```
Database: sample_airbnb
-----
Collection 'listingsAndReviews' documents: 5555

Database: sample_analytics
-----
Collection 'transactions' documents: 1746
Collection 'accounts' documents: 1746
Collection 'customers' documents: 500

Database: sample_geospatial
-----
Collection 'shipwrecks' documents: 11095

Database: sample_guides
-----
Collection 'planets' documents: 8

Database: sample_mflix
-----
Collection 'embedded_movies' documents: 3483
Collection 'users' documents: 185
Collection 'theaters' documents: 1564
Collection 'movies' documents: 21349
Collection 'comments' documents: 41079
Collection 'sessions' documents: 1

Database: sample_restaurants
-----
Collection 'neighborhoods' documents: 195
Collection 'restaurants' documents: 25359

Database: sample_supplies
-----
Collection 'sales' documents: 5000

Database: sample_training
-----
Collection 'posts' documents: 500
Collection 'trips' documents: 10000
Collection 'grades' documents: 100000
Collection 'routes' documents: 66985
Collection 'inspections' documents: 80047
Collection 'companies' documents: 9500
Collection 'zips' documents: 29470

Database: sample_weatherdata
-----
Collection 'data' documents: 10000

rs0 [direct: primary] sample_weatherdata>
```
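Rather than comparing the two count listings by hand, you can diff them programmatically. A minimal sketch using plain dicts, abbreviated from the listings above; in practice you would build these maps with countDocuments() through a driver:

```python
# Per-collection document counts, abbreviated from the listings above.
atlas_counts = {
    "sample_airbnb.listingsAndReviews": 5555,
    "sample_geospatial.shipwrecks": 11095,
    "sample_training.zips": 29470,
    "sample_weatherdata.data": 10000,
}
psmdb_counts = {
    "sample_airbnb.listingsAndReviews": 5555,
    "sample_geospatial.shipwrecks": 11095,
    "sample_training.zips": 29470,
    "sample_weatherdata.data": 10000,
}

# Report any namespace missing on the target or with a mismatched count.
mismatches = {
    ns: (n, psmdb_counts.get(ns))
    for ns, n in atlas_counts.items()
    if psmdb_counts.get(ns) != n
}
assert not mismatches, f"count mismatch: {mismatches}"
print("All collection counts match")
```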
Above, you can see we have verified data in PSMDB. Now, update the connection string of the application to point to PSMDB.
NOTE: Sometimes, during the migration process, it is possible for some indexes not to replicate. So, during the data verification process, please verify the indexes, and if an index is missing, create it before the cutover.
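One way to catch a missing index before cutover is to diff the index names per collection. A sketch with hypothetical index lists (in practice you would pull the names with getIndexes() through a driver):

```python
# Hypothetical index names pulled from the same collection on each side.
atlas_indexes = {"_id_", "property_type_1", "address.location_2dsphere"}
psmdb_indexes = {"_id_", "property_type_1"}

# Any index present on Atlas but absent on PSMDB must be created before cutover.
missing = atlas_indexes - psmdb_indexes
for name in sorted(missing):
    print(f"missing on PSMDB: {name}")
```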
Conclusion
MongoShake simplifies the process of migrating MongoDB data from Atlas to self-hosted MongoDB. Percona experts can assist you with migration as well. By following the steps outlined in this blog, you can seamlessly install, configure, and utilize MongoShake for migrating your data from MongoDB Atlas.
To learn more about the enterprise-grade features available in the license-free Percona Server for MongoDB, we recommend going through our blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered?
Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.
Download Percona Distribution for MongoDB Today!