Mistakes can happen. If only we could go back in time to the very second before that mistake was made.
Act 1: The Disaster
Plain text version for those who cannot run the asciicast above:
akira@perc01:/data$ #OK, let's get this party started!
akira@perc01:/data$ # The frontend has been shut down for 20 mins so they can
akira@perc01:/data$ # update that part, and I can update the schema in he
akira@perc01:/data$ # backend simultaneously.
akira@perc01:/data$ #Easy-peasy ...
akira@perc01:/data$ date
Tue Jul 2 13:34:09 JST 2019
akira@perc01:/data$ #Just set my auth details.(NO PEEKING!)
akira@perc01:/data$ conn_args="--host localhost:27017 --username akira --password secret --authenticationDatabase admin"
akira@perc01:/data$ mongo ${conn_args} --quiet
testrs:PRIMARY> use payments
switched to db payments
testrs:PRIMARY> show collections
TheImportantCollection
testrs:PRIMARY> //Ah, there it is. Time to work!
testrs:PRIMARY> db.TheImportantCollection.count()
174662
testrs:PRIMARY> db.TheImportantCollection.findOne()
{ "_id" : 0, "customer" : { "fn" : "Smith", "gn" : "Ken", "city" : "Georgevill", "street1" : "1 Wishful St.", "postcode" : "45031" }, "order_ids" : [ ] }
testrs:PRIMARY> //Ah, there it is. The "customer" object that has the
testrs:PRIMARY> //address fields in it. We're going to move those out.
testrs:PRIMARY> //Copy the whole collection, adding the new "addresses" array
testrs:PRIMARY> var counter = 0;
testrs:PRIMARY> db.TheImportantCollection.find().forEach(function(d) {
... d["adresses"] = [ ];
... db.TheImportantCollectionV2.insert(d);
... counter += 1;
... if (counter % 25000 == 0) { print(counter + " updates done"); }
... });
25000 updates done
50000 updates done
75000 updates done
100000 updates done
125000 updates done
150000 updates done
testrs:PRIMARY> //Cool. Let's look at the temp table
testrs:PRIMARY> db.TheImportantCollectionV2.findOne()
{ "_id" : 0, "customer" : { "fn" : "Smith", "gn" : "Ken", "city" : "Georgevill", "street1" : "1 Wishful St.", "postcode" : "45031" }, "order_ids" : [ ], "adresses" : [ ] }
testrs:PRIMARY> //?AH!!
testrs:PRIMARY> //typo. I misspelled "addresses".
testrs:PRIMARY> //I'll just drop this and go again
testrs:PRIMARY> db.TheImportantCollectionV2.remove({})
WriteResult({ "nRemoved" : 174662 })
testrs:PRIMARY> //ooops. Why did I bother deleting the docs?
testrs:PRIMARY> //I need to *drop* the collection
testrs:PRIMARY> db.TheImportantCollection.drop()
true
testrs:PRIMARY> //!!!!
testrs:PRIMARY> //Wait!
testrs:PRIMARY> show collections
TheImportantCollectionV2
testrs:PRIMARY> //...
testrs:PRIMARY> //I've done a bad thing ....
testrs:PRIMARY> //Let me see
testrs:PRIMARY> //in the oplog
testrs:PRIMARY> use local
switched to db local
testrs:PRIMARY> db.oplog.rs.findOne({"o.drop": "TheImportantCollection"})
{ "ts" : Timestamp(1562042272, 1), "t" : NumberLong(6), "h" : NumberLong("6726633412398410781"), "v" : 2, "op" : "c", "ns" : "payments.$cmd", "ui" : UUID("abc9c1f9-71c0-45ea-aeba-ea239b975a95"), "wall" : ISODate("2019-07-02T04:37:52.171Z"), "o" : { "drop" : "TheImportantCollection" } }
testrs:PRIMARY> //AH. 1562042272, you are the worst unix epoch second of my
testrs:PRIMARY> // life.
testrs:PRIMARY>
Act 2: Time travel with a Snapshot restore + Oplog replay
Plain text version for those who cannot run the asciicast above:
akira@perc01:/data$ #OK, OK, this is bad. I dropped TheImportantCollection
akira@perc01:/data$ #Breathe. Breathe Akira.
akira@perc01:/data$ #Right! Backups!
akira@perc01:/data$ #I have backups!
akira@perc01:/data$ ls /backups/
20190624_2300  20190626_2300  20190628_2300
20190625_2300  20190627_2300  20190629_2300
akira@perc01:/data$ #OK, I have one from 23:00 JST ... which is a while ago.
akira@perc01:/data$ #I can use the latest backup, then roll forward from
akira@perc01:/data$ # there using this neat thing you can do with
akira@perc01:/data$ # mongorestore (the standard mongo utils command)
akira@perc01:/data$ #You can replay a dumped oplog bson file
akira@perc01:/data$ # on a primary like it was receiving as a secondary
akira@perc01:/data$ #Just as a secondary can catch up from a primary so
akira@perc01:/data$ # far the oplog window of time goes, a primary can
akira@perc01:/data$ # be given an oplog history to replay, using this 'trick'
akira@perc01:/data$ #(Not really a trick, but let's call it that)
akira@perc01:/data$
akira@perc01:/data$ #
akira@perc01:/data$ #But, before doing ANYTHING with the backups,
akira@perc01:/data$ # get a full dump of the oplog of the *live* replicaset
akira@perc01:/data$ # first
akira@perc01:/data$ conn_args="--host localhost:27017 --username akira --password secret --authenticationDatabase admin"
akira@perc01:/data$ mongodump ${conn_args} -d local -c oplog.rs --out /data/oplog_dump_full
2019-07-02T13:50:02.713+0900 writing local.oplog.rs to
2019-07-02T13:50:03.635+0900 done dumping local.oplog.rs (825815 documents)
akira@perc01:/data$ #Oh wait.
akira@perc01:/data$ #We *do* need a trick
akira@perc01:/data$ #v3.6 and v4.0 added some system collections that cause
akira@perc01:/data$ # mongorestore to fail, no matter what we do.
akira@perc01:/data$ # This is just a 3.6 and 4.0 issue hopefully, but 4.2's
akira@perc01:/data$ # behaviour is not known at this date.
akira@perc01:/data$ #I'll do the dump again, removing these two collections
akira@perc01:/data$ mongodump ${conn_args} -d local -c oplog.rs \
> --query '{"ns": {"$nin": ["config.system.sessions", "config.cache.collections"]}}' --out /data/oplog_dump_full
2019-07-02T13:52:08.841+0900 writing local.oplog.rs to
2019-07-02T13:52:10.010+0900 done dumping local.oplog.rs (825781 documents)
akira@perc01:/data$ #So that was Trick #1. Removing those 2 specific
akira@perc01:/data$ # config.* collections.
akira@perc01:/data$ #Now for #Trick 2
akira@perc01:/data$ #mongodump puts the dumped oplog.rs.bson file in subdirectory "local" like that is a whole DB to restore.
akira@perc01:/data$ # But you don't do a restore of local like any other DB, it doesn't work like that.
akira@perc01:/data$ #So we MUST get rid of subdirectory structure and just keep the single *.bson file
akira@perc01:/data$ ls -lR /data/oplog_dump_full/
/data/oplog_dump_full/:
total 146032
drwxr-xr-x 2 akira akira        57 Jul 2 13:50 local
-rw-r--r-- 1 akira akira 149534510 Jul 2 10:26 oplog.rs.bson

/data/oplog_dump_full/local:
total 233008
-rw-r--r-- 1 akira akira 238596091 Jul 2 13:52 oplog.rs.bson
-rw-r--r-- 1 akira akira       120 Jul 2 13:52 oplog.rs.metadata.json
akira@perc01:/data$ mv /data/oplog_dump_full/local/oplog.rs.bson /data/oplog_dump_full/
akira@perc01:/data$ rm -rf /data/oplog_dump_full/local
akira@perc01:/data$ ls -lR /data/oplog_dump_full/
/data/oplog_dump_full/:
total 233004
-rw-r--r-- 1 akira akira 238596091 Jul 2 13:52 oplog.rs.bson
akira@perc01:/data$ #OK.
akira@perc01:/data$ #Now let's look at this oplog. Does it go back as far as
akira@perc01:/data$ # the latest backup snapshot or more?
akira@perc01:/data$ ls /backups/ | tail -n 1
20190629_2300
akira@perc01:/data$ #By the way that is my JST timezone, not UTC
akira@perc01:/data$ #let's see ... check the bson file's first timestamp
akira@perc01:/data$ bsondump /data/oplog_dump_full/oplog.rs.bson 2>/dev/null | head -n 1
{"ts":{"$timestamp":{"t":1561727517,"i":1}},"h":{"$numberLong":"212971303912007811"},"v":2,"op":"n","ns":"","wall":{"$date":"2019-06-28T13:11:57.633Z"},"o":{"msg":"initiating set"}}
akira@perc01:/data$ #I see the epoch timestamp there: 1561727517
akira@perc01:/data$ date -d @1561727517
Fri Jun 28 22:11:57 JST 2019
akira@perc01:/data$ #Ah, good, that's before 20190629_2300
akira@perc01:/data$ #We can do a oplog replay
akira@perc01:/data$ #Just for sanity's sake let's look for that "drop"
akira@perc01:/data$ # command that is the disaster we want to avoid replaying
akira@perc01:/data$ bsondump /data/oplog_dump_full/oplog.rs.bson 2>/dev/null | grep drop | grep '\bTheImportantCollection\b' | tail -n 1
{"ts":{"$timestamp":{"t":1562042272,"i":1}},"t":{"$numberLong":"6"},"h":{"$numberLong":"6726633412398410781"},"v":2,"op":"c","ns":"payments.$cmd","ui":{"$binary":"q8nB+XHARequuuojm5dalQ==","$type":"04"},"wall":{"$date":"2019-07-02T04:37:52.171Z"},"o":{"drop":"TheImportantCollection"}}
akira@perc01:/data$ #Let's see it was 1562042272, the worst epoch second of my
akira@perc01:/data$ # my life. Let's not go there again!
akira@perc01:/data$ #Time to shut the live replicaset down, restore a snapshot
akira@perc01:/data$ # backup from 20190629_2300
akira@perc01:/data$ ps -C mongod -o pid,args
  PID COMMAND
18119 mongod -f /data/n1/mongod.conf
18195 mongod -f /data/n2/mongod.conf
18225 mongod -f /data/n3/mongod.conf
akira@perc01:/data$ kill 18119 18195 18225
akira@perc01:/data$ ps -C mongod -o pid,args
  PID COMMAND
18119 mongod -f /data/n1/mongod.conf
akira@perc01:/data$ ps -C mongod -o pid,args
  PID COMMAND
18119 mongod -f /data/n1/mongod.conf
akira@perc01:/data$ ps -C mongod -o pid,args
  PID COMMAND
18119 mongod -f /data/n1/mongod.conf
akira@perc01:/data$ ps -C mongod -o pid,args
  PID COMMAND
akira@perc01:/data$ #OK, shutdown
akira@perc01:/data$ /data/dba_scripts/our_restore_script.sh
usage: /data/dba_scripts/our_restore_script.sh XXXXXX
Choose one of these subdirectory names from /backups/:
20190624_2300
20190625_2300
20190626_2300
20190627_2300
20190628_2300
20190629_2300
akira@perc01:/data$ /data/dba_scripts/our_restore_script.sh 20190629_2300
Stopping mongod nodes
Restoring backup 20190629_2300 to one node dbpath
Restarting
about to fork child process, waiting until server is ready for connections.
forked process: 21776
child process started successfully, parent exiting
akira@perc01:/data$ ps -C mongod -o pid,args
  PID COMMAND
21776 mongod -f /data/n1/mongod.conf
akira@perc01:/data$ #I'll start the secondaries too
akira@perc01:/data$ rm -rf /data/n2/data/*
akira@perc01:/data$ mongod -f /data/n2/mongod.conf
about to fork child process, waiting until server is ready for connections.
forked process: 21859
child process started successfully, parent exiting
akira@perc01:/data$ rm -rf /data/n3/data/*
akira@perc01:/data$ mongod -f /data/n3/mongod.conf
about to fork child process, waiting until server is ready for connections.
forked process: 21896
child process started successfully, parent exiting
akira@perc01:/data$ ps -C mongod -o pid,args
  PID COMMAND
21776 mongod -f /data/n1/mongod.conf
21859 mongod -f /data/n2/mongod.conf
21896 mongod -f /data/n3/mongod.conf
akira@perc01:/data$ #I'm going to check my important collection is there again
akira@perc01:/data$ mongo ${conn_args}
MongoDB shell version v4.0.10
connecting to: mongodb://localhost:27017/?authSource=admin&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("e5aa9b27-f26b-4c73-bdc1-bdaf494cf7ab") }
MongoDB server version: 4.0.10
testrs:PRIMARY> use payments
switched to db payments
testrs:PRIMARY> show collections
TheImportantCollection
testrs:PRIMARY> //YES
testrs:PRIMARY> db.TheImportantCollection.count()
174662
testrs:PRIMARY> db.TheImportantCollection.findOne()
{ "_id" : 0, "customer" : { "fn" : "Smith", "gn" : "Ken", "city" : "Georgevill", "street1" : "1 Wishful St.", "postcode" : "45031" }, "order_ids" : [ ] }
testrs:PRIMARY> //Yes yes yes ... I live
testrs:PRIMARY> bye
akira@perc01:/data$ #So the data is back ... but only some time way in the
akira@perc01:/data$ # past. I want to replay up until ...
akira@perc01:/data$ bad_drop_epoch_sec=1562042272
akira@perc01:/data$ #Trick 3: mongorestore always expects a directory name
akira@perc01:/data$ #We don't need any directories, but it's just hard-coded
akira@perc01:/data$ # to expect one. So let's make one. Can be anywhere
akira@perc01:/data$ # Just not a subdirectory under the oplog dump location please, that will confuse it maybe
akira@perc01:/data$ mkdir /tmp/fake_empty_dir
mkdir: cannot create directory ‘/tmp/fake_empty_dir’: File exists
akira@perc01:/data$ #Ah, I got it already.
akira@perc01:/data$ ls /tmp/fake_empty_dir
akira@perc01:/data$ mongorestore ${conn_args} \
> --oplogReplay \
> --oplogFile /data/oplog_dump_full/oplog.rs.bson \
> --oplogLimit ${bad_drop_epoch_sec}:0 \
> --stopOnError /tmp/fake_empty_dir
2019-07-02T14:04:35.742+0900 preparing collections to restore from
2019-07-02T14:04:35.742+0900 replaying oplog
2019-07-02T14:04:38.715+0900 oplog 5.47MB
2019-07-02T14:04:41.715+0900 oplog 11.0MB
2019-07-02T14:04:44.715+0900 oplog 16.6MB
2019-07-02T14:04:47.715+0900 oplog 22.2MB
2019-07-02T14:04:50.715+0900 oplog 27.6MB
2019-07-02T14:04:53.715+0900 oplog 32.8MB
2019-07-02T14:04:56.715+0900 oplog 37.9MB
2019-07-02T14:04:59.715+0900 oplog 43.0MB
2019-07-02T14:05:02.715+0900 oplog 48.3MB
2019-07-02T14:05:05.715+0900 oplog 53.9MB
2019-07-02T14:05:08.715+0900 oplog 59.5MB
2019-07-02T14:05:11.715+0900 oplog 65.1MB
2019-07-02T14:05:14.715+0900 oplog 70.2MB
2019-07-02T14:05:17.715+0900 oplog 75.0MB
2019-07-02T14:05:20.715+0900 oplog 79.6MB
2019-07-02T14:05:23.715+0900 oplog 84.1MB
2019-07-02T14:05:26.715+0900 oplog 88.5MB
2019-07-02T14:05:29.715+0900 oplog 93.0MB
2019-07-02T14:05:32.715+0900 oplog 97.6MB
2019-07-02T14:05:35.715+0900 oplog 101MB
2019-07-02T14:05:38.715+0900 oplog 104MB
2019-07-02T14:05:41.715+0900 oplog 107MB
2019-07-02T14:05:44.715+0900 oplog 110MB
2019-07-02T14:05:47.715+0900 oplog 113MB
2019-07-02T14:05:50.715+0900 oplog 115MB
2019-07-02T14:05:53.715+0900 oplog 118MB
2019-07-02T14:05:56.715+0900 oplog 123MB
2019-07-02T14:05:59.715+0900 oplog 128MB
2019-07-02T14:06:02.715+0900 oplog 133MB
2019-07-02T14:06:05.715+0900 oplog 138MB
2019-07-02T14:06:08.715+0900 oplog 142MB
2019-07-02T14:06:11.715+0900 oplog 146MB
2019-07-02T14:06:14.715+0900 oplog 151MB
2019-07-02T14:06:17.715+0900 oplog 156MB
2019-07-02T14:06:20.715+0900 oplog 161MB
2019-07-02T14:06:23.715+0900 oplog 166MB
2019-07-02T14:06:26.715+0900 oplog 171MB
2019-07-02T14:06:29.715+0900 oplog 176MB
2019-07-02T14:06:32.715+0900 oplog 181MB
2019-07-02T14:06:35.715+0900 oplog 186MB
2019-07-02T14:06:38.715+0900 oplog 192MB
2019-07-02T14:06:41.715+0900 oplog 197MB
2019-07-02T14:06:44.715+0900 oplog 201MB
2019-07-02T14:06:47.715+0900 oplog 204MB
2019-07-02T14:06:50.715+0900 oplog 206MB
2019-07-02T14:06:53.715+0900 oplog 209MB
2019-07-02T14:06:56.715+0900 oplog 211MB
2019-07-02T14:06:59.715+0900 oplog 213MB
2019-07-02T14:07:02.715+0900 oplog 216MB
2019-07-02T14:07:05.715+0900 oplog 218MB
2019-07-02T14:07:08.715+0900 oplog 220MB
2019-07-02T14:07:11.715+0900 oplog 223MB
2019-07-02T14:07:14.715+0900 oplog 225MB
2019-07-02T14:07:17.715+0900 oplog 227MB
2019-07-02T14:07:17.753+0900 oplog 227MB
2019-07-02T14:07:17.753+0900 done
akira@perc01:/data$ #Yay! I hope! Let's check
akira@perc01:/data$ mongo ${conn_args}
MongoDB shell version v4.0.10
connecting to: mongodb://localhost:27017/?authSource=admin&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("302f2c26-7416-4e18-bd02-1bd67626d062") }
MongoDB server version: 4.0.10
testrs:PRIMARY> use payments
switched to db payments
testrs:PRIMARY> show collections
TheImportantCollection
TheImportantCollectionV2
testrs:PRIMARY> //Yes! both there!
testrs:PRIMARY> db.TheImportantCollection.count()
174662
testrs:PRIMARY> //plus the 'V2' table I was working on when I made my
testrs:PRIMARY> // 'fat thumb' mistake
testrs:PRIMARY> //There we go, a point-in-time restore from a snapshot
testrs:PRIMARY> // backup + a mongorestore --oplogReplay --oplogFile
testrs:PRIMARY> // operation.
testrs:PRIMARY> //Hold on for one last trick (which I didn't have to use today)
testrs:PRIMARY> // Trick #4: ultimate permissions are sometimes needed.
testrs:PRIMARY> // The config.system.sessions and config.transactions(?)
testrs:PRIMARY> // system collections are currently unreplayable (3.6, 4.0,
testrs:PRIMARY> // 4.2. TBD).
testrs:PRIMARY> // They are not the only system collections that you can stuck on, because systems collections are mostly not covered by the "backup" and "restore" built-in roles.
testrs:PRIMARY> // E.g. if you are replaying updates to the admin.system.users
testrs:PRIMARY> // collection that will fail.
testrs:PRIMARY> // But you can allow if you make a *custom* role that grants
testrs:PRIMARY> // "anyAction" on "anyResource" (see the docs), and grant that
testrs:PRIMARY> // to your backup and restore user, that will make it possible for those to succeed too.
testrs:PRIMARY> //good night
testrs:PRIMARY>
The ‘TLDR’
The oplog of the damaged replica set is your valuable, idempotent history, provided you have a backup from a recent enough time to apply it to.
- Identify your disaster operation’s timestamp value in the oplog.
- Before shutting the damaged replicaset down:
mongodump connection-args --db local --collection oplog.rs
- (Necessary workaround #1) use a
--query '{"ns": {"$nin": ["config.system.sessions", "config.transactions", "config.transaction_coordinators"]}}'
argument to exclude the transaction-related system collections from v3.6 and v4.0 (and maybe 4.2+ too) that can't be restored.
- (Necessary workaround #2) Get rid of the subdirectory structure mongodump makes and keep just the oplog.rs.bson file.
- (Necessary workaround #3) Make a fake, empty directory somewhere else too, to trick mongorestore later.
- Use
bsondump oplog.rs.bson | head -n 1
to check that this oplog starts before the time of your last backup.
- Shut the damaged DB down.
- Restore to the latest backup before the disaster.
- (Possibly-required workaround #4) If the oplog updates other system collections, create a user-defined role that grants anyAction on anyResource and grant it to your user as well. (See the special section on system collections below.)
- Replay up to but not including the disaster second: mongorestore connection-args --oplogReplay --oplogFile oplog.rs.bson --oplogLimit disaster_epoch_sec:0 /tmp/fake_empty_directory
See the 'Act 2' video above for the details, or the condensed command sketch just below.
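Putting those steps together as commands: this is only a condensed sketch, reusing the demo's connection arguments, paths, and disaster timestamp as placeholders, so substitute your own values (and your own snapshot-restore mechanism for step 4).

conn_args="--host localhost:27017 --username akira --password secret --authenticationDatabase admin"
bad_drop_epoch_sec=1562042272   # the epoch second of the disaster op, found in the oplog

# 1. Dump the live oplog, excluding the system collections that can't be replayed
mongodump ${conn_args} -d local -c oplog.rs \
  --query '{"ns": {"$nin": ["config.system.sessions", "config.transactions", "config.transaction_coordinators"]}}' \
  --out /data/oplog_dump_full

# 2. Keep only the bare bson file; a "local" subdirectory would make mongorestore
#    treat it as a normal database dump
mv /data/oplog_dump_full/local/oplog.rs.bson /data/oplog_dump_full/
rm -rf /data/oplog_dump_full/local

# 3. Confirm the oplog starts before the snapshot you intend to restore
bsondump /data/oplog_dump_full/oplog.rs.bson 2>/dev/null | head -n 1

# 4. Shut the damaged replica set down and restore the latest pre-disaster snapshot
#    (site-specific; the demo used /data/dba_scripts/our_restore_script.sh)

# 5. Replay the oplog up to, but not including, the disaster second
mkdir -p /tmp/fake_empty_dir
mongorestore ${conn_args} --oplogReplay \
  --oplogFile /data/oplog_dump_full/oplog.rs.bson \
  --oplogLimit ${bad_drop_epoch_sec}:0 \
  --stopOnError /tmp/fake_empty_dir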
So how did that work?
If you're having the kind of disaster presented in this article, I assume you are already familiar with the mongodump and mongorestore tools and with MongoDB oplog idempotency. Taking that for granted, let's go to the next level of detail.
The applyOps command – Kinda secret; Actually public
In theory you could iterate over the oplog documents and write an application that runs an insert command for each "i" op, an update for each "u" op, the various different commands for each "c" op, and so on. The simpler way is to submit them as they are (well, almost exactly as they are) using the applyOps command, and this is what the mongorestore tool does.
The permission to run applyOps is granted to the “restore” role for all non-system collections, and there is no ‘reject if a primary’ rule. So you can make a primary apply oplog docs like a secondary does.
N.b. for some system collections, the “restore” role is not enough. See the bottom section for more details.
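Just to illustrate the mechanism (you wouldn't normally run this by hand, since mongorestore does it for you), submitting a single oplog-style insert entry through applyOps looks roughly like this. The namespace is the demo collection from Act 1 and the document values are made up; ${conn_args} is the same connection-string variable used in the sessions above.

mongo ${conn_args} --quiet --eval '
  db.getSiblingDB("admin").runCommand({
    "applyOps": [
      { "op": "i",                                // an insert op, as it appears in the oplog
        "ns": "payments.TheImportantCollection",  // target namespace
        "o": { "_id": 999999, "customer": { "fn": "Example" }, "order_ids": [ ] } }
    ]
  })
'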
It might seem a bit strange that users can have this privilege, but without it there would be no convenient way for dump-and-restore tools to guarantee consistency. "Consistency" here means that all the restored data will be exactly as it was at one point in time – the end of the dump – and will not contain earlier versions of documents from some midpoint during the dumping process.
Achieving that data consistency is why the --oplog option for mongodump was created, and why mongorestore has the matching --oplogReplay option. (Those two options should be on by default i.m.o., but they are not.) The short oplog span made during a normal dump will be at <dump_directory>/oplog.bson, but the --oplogFile argument lets you choose any arbitrary path.
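For reference, a minimal sketch of that ordinary consistent dump-and-restore pairing; the dump directory path is just an example:

# Dump the data plus the short oplog span covering the dump window
mongodump ${conn_args} --oplog --out /data/consistent_dump

# Restore the data, then replay that short span so the result matches the
# end of the dump rather than its start
mongorestore ${conn_args} --oplogReplay /data/consistent_dump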
--oplogLimit
We could have limited the oplog docs during mongodump to only those before the disaster time, with a --query parameter such as the following:
mongodump ... --query '{"ts": {"$lt": new Timestamp(1560915610, 0)}}' ...
But --oplogLimit makes it easier. You can dump everything, but then use --oplogLimit <epoch_sec_value>[:<counter>] when you run mongorestore with the --oplogReplay argument.
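To get the cut-off value in the first place, pull the disaster entry out of the dumped oplog, as in the Act 1 and Act 2 sessions; the grep pattern below is just the demo collection's name.

# The "t" field inside the matching entry's "ts" timestamp is the epoch second
# to pass as --oplogLimit <t>:0 so that replay stops just before the drop
bsondump /data/oplog_dump_full/oplog.rs.bson 2>/dev/null | \
  grep drop | grep '\bTheImportantCollection\b' | tail -n 1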
If you’re getting confused about whether it’s UTC or your server timezone – it’s UTC. All timestamps inside MongoDB are UTC if they represent ‘wall clock’ times, and for ‘logical clocks’ timezone is a non-applicable concept.
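So when comparing an oplog timestamp against a backup taken in your server's local timezone, convert deliberately. A quick sanity check with the disaster second from the demo, assuming GNU date:

date -u -d @1562042272   # Tue Jul  2 04:37:52 UTC 2019, matching the oplog entry's "wall" field
date    -d @1562042272   # Tue Jul  2 13:37:52 JST 2019, the same instant on the demo's JST server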
When the oplog includes system collection updates
In the built-in roles documentation, after the usual and mostly fair warnings about why you should not grant users the most powerful internal roles, comes this extra note telling you what you actually need to do to allow oplog-replay updates on all system collections too:
If you need access to all actions on all resources, for example to run applyOps commands … create a user-defined role that grants anyAction on anyResource and ensure that only the users who need access to these operations have this access.
Translation: if your oplog replay fails because it hit a system collection update that the "restore" role doesn't cover, upgrade your user so it can run with all the privileges a secondary uses when it replicates the oplog.
use admin
db.createRole({
  "role": "CustomAllPowersRole",
  "privileges": [
    { "resource": { "anyResource": true }, "actions": [ "anyAction" ] },
  ],
  "roles": [ ]
});
db.grantRolesToUser("<bk_and_restore_username>", [ "CustomAllPowersRole" ])
//For afterwards:
//use admin
//db.revokeRolesFromUser("<bk_and_restore_username>", [ "CustomAllPowersRole" ])
//db.dropRole("CustomAllPowersRole")
As an alternative to granting the role shown above, you could restart the mongod nodes with security disabled; in that mode, all operations work without access control restrictions.
It's not quite that simple, though, because the transaction-related collections are currently (v3.6, v4.0) throwing a spanner in the works. So I've found explicitly excluding config.system.sessions and config.transactions during mongodump is the best way to avoid those updates. They are logically unnecessary in a restore anyway, because the sessions/transactions finished when the replica set was completely shut down.
Akira, well written, very helpful, thank you very much!