When stepDown() Goes Wrong

We get to see and troubleshoot a lot of different problems at Percona. Here’s the latest one that had me scratching my head for a while.

The scenario

We have a sharded cluster running MongoDB 4.0 that needs to be upgraded to MongoDB 4.2. Easy, right? The only thing peculiar about this environment is that MongoDB runs in a custom Docker setup on AWS.

We started with the usual approach of disabling the balancer and upgrading the config server’s replica set first. In this case, the config server’s replica set had three members running MongoDB 4.0. Instead of upgrading them in place, we chose to add three new members running MongoDB 4.2, bringing the total to six nodes. The next step was to step down the primary so that one of the new 4.2 nodes would take over, and finally to decommission the old servers.
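
In mongo shell terms, those first steps look roughly like this (hostnames and port are illustrative, not the actual ones from this environment):

// From a mongos: stop the balancer before touching the config servers
sh.stopBalancer()

// From the config server replica set primary: add the new 4.2 members
rs.add("host4:27019")
rs.add("host5:27019")
rs.add("host6:27019")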

Testing the migration plan

We started by running the plan in a non-prod environment. At first, everything was ok: we grew the config server’s replica set to six members and set the priorities so that one of the new 4.2 servers was the only candidate to become primary once the current one stepped down. So we went ahead and ran the rs.stepDown() command as usual, and that’s when things started to go wrong.
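
For reference, the priority change and step-down were along these lines (member positions are illustrative; host6 is the new 4.2 node we wanted promoted):

// Make the new 4.2 node the preferred candidate for the next election
cfg = rs.conf()
cfg.members.forEach(function(m) { m.priority = 0; })
cfg.members[0].priority = 1    // the current 4.0 primary stays electable until it steps down
cfg.members[5].priority = 10   // host6, the new 4.2 node
rs.reconfig(cfg)

// Then, from the current primary:
rs.stepDown()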

Clients suddenly started reporting errors. My first thought was that something must be wrong at the network layer, but checking connectivity between all hosts revealed no problems. Next, we looked at Docker, but everything seemed ok there as well.

Digging deeper

We connected locally to the server that was configured to become the primary, and there we saw a strange situation: the server was unable to complete the step-up process to become the primary and was stuck halfway through it.
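
The exact status output isn’t reproduced here, but a quick way to see each member’s reported state from that local connection is something like:

// Print each member's name and state as seen by this node
rs.status().members.forEach(function(m) {
  print(m.name + " : " + m.stateStr);
});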

The member states showed that host6 was supposed to be the primary, but it was not fully promoted. We also checked db.currentOp(), which revealed that all sessions seemed to be waiting for a lock related to the replica set state transition, and one operation seemed stuck creating an index on the config.chunks collection.
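
Something along these lines surfaces the operations that are waiting on a lock (field names as they appear in db.currentOp() output):

// List in-progress operations that are waiting on a lock, with their lock info
db.currentOp(true).inprog
  .filter(function(op) { return op.waitingForLock; })
  .forEach(function(op) {
    printjson({ desc: op.desc, op: op.op, ns: op.ns, locks: op.locks });
  });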

What was strange is that this collection contained only a few documents, so the operation should have been really fast (and the index already existed).
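
A couple of quick checks against the config database confirm both points:

// config.chunks is tiny, and the index in question is already there
db.getSiblingDB("config").chunks.count()
db.getSiblingDB("config").chunks.getIndexes()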

At this point, we suspected we might be hitting some bug and started looking at the many problems reported about deadlocks in the stepDown/stepUp process. Unfortunately, we came up empty-handed again.

The solution

Next, we checked the configuration of the config server replica set itself and noticed something unusual in the “settings” section of rs.conf().

The getLastErrorDefaults setting is omitted most of the time when running rs.initiate(), as the write concern is usually controlled on a per-session basis. In this case, the config server replica set had been initialized with a getLastErrorDefaults of { w: "majority", j: true } instead of the default value of { w: 1 }.
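
In shell terms, the check and the value we found there look roughly like this (output formatting is illustrative):

rs.conf().settings.getLastErrorDefaults
// -> { "w" : "majority", "j" : true }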

We tried resetting getLastErrorDefaults back to its default value.
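
A sketch of that reconfiguration, run from the config server replica set primary (the exact invocation may have differed slightly):

// Put getLastErrorDefaults back to the default write concern
cfg = rs.conf()
cfg.settings.getLastErrorDefaults = { w: 1 }
rs.reconfig(cfg)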

After doing this, rs.stepDown() worked flawlessly and we finally had our 4.2 primary in place.

Conclusion

The moral of the story is that we should check the default values for write concern at the replica set level when promoting a MongoDB 4.2 primary for the first time.
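
A one-liner worth adding to any upgrade runbook:

// Anything other than the default { w: 1 } here deserves a second look
rs.conf().settings.getLastErrorDefaults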

The non-default write concern hadn’t been a problem for elections before, so something changed in MongoDB 4.2 that triggered this behavior. The issue was specific to the config servers; the same didn’t happen with the regular shards’ replica sets.

Interestingly, starting in MongoDB 5.0 we can no longer specify a default write concern for a replica set using settings.getLastErrorDefaults.

This also proves the value of having a test setup that mimics the real production environment, so we can catch these issues before they bite us in production.
