MongoDB 3.6 Change StreamsIn this post, I’ll look at what MongoDB 3.6 change streams are, in a creative way. Just in time for the holidays!

What is a change stream?

Change streams in MongoDB provide a cross-platform unified API that can be supported with sharding. It has an option for talking to secondaries, and even allows for security controls like restrictions and action controls.

How is this important? To demonstrate, I’ll walk through an example of using a smart oven with a Nest Thermostat to keep your kitchen from becoming a sauna while you bake a cake — without the need for you to moderate the room temperature yourself.

What does a change stream look like?

What can we watch?

We can use change streams to watch these actions:

  • Insert
  • Delete
  • Replace
  • Update
  • Invalidate

Why not just use the Oplog?

Any change presented in the oplog could be rolled back as it’s only single node durable. Change streams need at least one other node to receive the change. In general, this represents a majority for a typical three node replica-set.

In addition, change streams are resumable. Having a collector job that survives an election is easy as pie, as by default it will automatically retry once. However, you can also record the last seen token to know how to resume where it left off.

Finally, since this is sharding supported with the new cluster clock (wc in Oplog), you can trust the operations order you get, even across shards. This was problematic both with the old oplog format and when managing connections manually.

In short, this is the logical evolution of oplog scrapping, and helps fit a long help request to be able to tail the oplog via mongos, not per replica set.

So what’s the downside?

It’s estimated that after 1000 streams you will start to see very measurable performance drops. Why there is not a global change stream option to avoid having so many cursors floating around is not clear. I think it’s something that should be looked at for future versions of this feature. Up to now, many use cases of mongo, specifically in the multi-tenant world, might have > 1000 namespaces on a system. This would make the performance drop problematic.

What’s in a change stream anyhow?

The first thing to understand is that while some drivers will have db.collection.watch(XXXX) as a function, you could use, this is just an alias for an actual aggregation pipeline $changeStream. This means you could mix this with much more powerful pipelines, though you should be careful. Things like projection could break the ability to resume if the token is not passed on accurately.

So a change stream:

  1. Is a view of an oplog entry for a change This sometimes means you know the change contents, and sometimes you don’t, for example in a delete
  2. Is an explicit API for drivers and code, but also ensures you can get data via Mongos rather than having to connect directly to each node.
  3. Is scalable, resumable, and well ordered – even when sharded!
  4. Harnesses the power of aggregations.
  5. Provides superior ACL support via roles and privileges

Back to a real-world example. Let’s assume you have a Nest unit in your house (because none of us techies have those right?) Let’s also assume you’re fancy and have the Jenn-Air oven which can talk to the Nest. If you’re familiar with the Nest, you might know that its API lets you enable the Jenn-Air fan or set its oven temperature remotely. Sure the oven has a fan schedule to prevent it running at night, but its ability to work with other appliances is a bit more limited.

So for our example, assume you want the temperature in the kitchen to drop by 15 degrees F whenever the oven is on, and that the fan should run even if it’s outside its standard time window.

Hopefully, you can see how such an app, powered by MongoDB, could be useful? However, there are a few more assumptions, which we have already set up: a collection of “device_states” to record the original state of the temperature setting in the Nest; and to record the oven’s status so that we know how to reset the oven using the Nest once cooking is done.

As we know we have the state changes for the devices coming in on a regular basis, we could simply say:

This will watch for any changes to either of these devices whether it be inserting new states or updating old ones.

Now let’s assume anytime something comes in for the Nest, we are updating  db.nest_settings with that document. However, in this case, when the oven turns on we update a secondary document with an _id of “override” to indicate this is the last known nest_setting before the oven enabling. This means that we can revert to it later.

This would be accomplished via something like…

Change Event document

So you could easily run the follow from your code:

db.nest_settings.update({_id:"current"},{_id:"current",data: event.fullDocument})

Now the current document is set to the last checking from the Nest API.

That was simple enough, but now we can do even more cool things…

Change Event document

This next segment is mode pseudocode:

This code is doing a good deal, but it’s pretty basic at the same time:

  1. If the oven is on, but there is no override document, create one from the most recent thermostat settings.
  2. Decrease the current temp setting by 15, and then insert it with the “override” _id value
  3. If the power is set to on
    (a) read in the current override document
    (b) set the thermostat to that setting
    (c) enable the fan for 15 minutes
  4. If the power is now off
    (a) read in the current override document
    (b) set the thermostat to 15 degrees higher
    (c) set the fan to disabled

Assuming you are constantly tailing the watch cursor, this means you will disable the oven and fan as soon as the oven is off.

Hopefully, this blog has helped explain how change streams work by using a real-world logical application to keep your kitchen from becoming a sweat sauna while making some cake… and then eating it!