Recently I noticed the site www.isdocumentdbreallymongodb.com. The page headline is:

In short, MongoDB is saying that AWS DocumentDB, one of its DB-as-a-service competitors, is not even half-compatible for existing users of its DBAAS MongoDB Atlas.

I think the claim above is terrible, for two reasons:

  1. Innumeracy is a sin. Feature compatibility is not a percentage; it is a vector operation. Different perspectives of compatibility (as it applies to you, to the abstract product, to the entire user population) give different results, but they all start with a feature vector/set.
  2. The pass-or-fail records were not counting features, the implicit unit of compatibility. The random error output shown below the header does look compelling, but the otherwise respectable buildscripts/resmoke.py continuous integration tests are being used as a smokescreen. (See “The ‘smoke’ tests” section below.)

Compatibility Algebra

Given the vectors/sets: Available features FA; Necessary features FN; Feature Popularity Weights W

Your Viewpoint: Boolean

(FN ⨯ FA) == FN    Compatible; Migration is possible.

The set algebra expression FN ⊆ FA is an easier way to say it, of course, but for the sake of weighting features by popularity later I suggest vectors of 0, 1 values and this product statement, where ⨯ multiplies the two vectors element by element (it is not the geometric cross product).
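A minimal Python sketch of the boolean check, in both the vector form and the set form. The feature names and the two example apps are illustrative assumptions, not an official feature list.

```python
# Illustrative feature names (an assumption, not an official MongoDB list).
FEATURES = ["core_crud", "normal_index", "geo_index", "agg_group", "agg_graphLookup"]


def to_vector(feature_set):
    """Encode a set of feature names as a 0/1 vector in FEATURES order."""
    return [1 if name in feature_set else 0 for name in FEATURES]


def elementwise(a, b):
    """Multiply two equal-length 0/1 vectors position by position."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x * y for x, y in zip(a, b)]


def is_compatible(needed, available):
    """(FN x FA) == FN: every needed feature is also available."""
    return elementwise(needed, available) == needed


# Hypothetical data: what the target offers and what two apps need.
available = {"core_crud", "normal_index", "agg_group"}
app_p = {"core_crud", "normal_index", "agg_group"}
app_q = {"core_crud", "normal_index", "geo_index"}

fa = to_vector(available)
print(is_compatible(to_vector(app_p), fa))  # True
print(is_compatible(to_vector(app_q), fa))  # False

# The set form FN ⊆ FA gives the same answers.
print(app_p <= available, app_q <= available)  # True False
```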

Product Comparison Checklists: Boolean Vector

FAA ⨯ FAB    Feature compatibility vector between products A and B

There is a set algebra way to express this too: FAA ∩ FAB, but again, for the sake of weighting features by popularity, I suggest vectors of 0, 1 values multiplied element by element into a new vector of 0, 1 values.

Flattening to a Scalar, Indicative ‘Score’:

count(FAA⨯FAB) / count(FAA)    Simple score of B’s compatibility in A’s feature set.

sum(W ⨯ (FAA⨯FAB)) / sum(W ⨯ FAA)    Weighted feature compatibility of B in A’s feature set (to recognize some features are more popular than others)
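The two scores can be sketched in a few lines of Python. The vectors and weights here are made-up values chosen only to show that the weighted score can differ from the simple count.

```python
def simple_score(faa, fab):
    """count(FAA x FAB) / count(FAA): share of A's features that B also has."""
    shared = [a * b for a, b in zip(faa, fab)]
    return sum(shared) / sum(faa)


def weighted_score(w, faa, fab):
    """sum(W x (FAA x FAB)) / sum(W x FAA): the same share, weighted by popularity."""
    shared = [a * b for a, b in zip(faa, fab)]
    numerator = sum(wi * s for wi, s in zip(w, shared))
    denominator = sum(wi * a for wi, a in zip(w, faa))
    return numerator / denominator


# Hypothetical four-feature products: B lacks A's third feature.
faa = [1, 1, 1, 1]
fab = [1, 1, 0, 1]
w = [1.0, 1.0, 0.5, 0.5]  # assumed popularity weights

print(simple_score(faa, fab))                 # 0.75
print(round(weighted_score(w, faa, fab), 3))  # 0.833
```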

Database Provider’s Viewpoint: Matrix

Count(Apps where (FN⨯FA) == FN) / Count(Apps) App population compatibility
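As a sketch of the provider's view, the function below counts the apps whose needed features are all available. The app names and feature sets are hypothetical.

```python
def app_population_compatibility(apps, available):
    """Count(apps where FN ⊆ FA) / Count(apps).

    apps maps an app name to the set of features it needs.
    """
    if not apps:
        raise ValueError("need at least one app")
    compatible = sum(1 for needed in apps.values() if needed <= available)
    return compatible / len(apps)


# Hypothetical population of four apps.
available = {"crud", "index", "agg_group"}
apps = {
    "app1": {"crud", "index"},
    "app2": {"crud", "index", "agg_group"},
    "app3": {"crud", "geo_index"},
    "app4": {"crud", "index", "agg_graphLookup"},
}

print(app_population_compatibility(apps, available))  # 0.5
```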

Example with Hypothetical, Mini-MongoDB Feature Set

Columns in red (Geo Index and Aggregation $graphLookup) are less commonly used features that the new DBAAS ‘B’ doesn’t yet have compared to the older DBAAS ‘A’.

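Company       | App      | Core CRUD | Normal Indexes | Geo Index | Aggregation $group | Aggregation $graphLookup
Acme Inc.     | App P    |           |                |           |                    |
Acme Inc.     | App Q    |           |                |           |                    |
BetterAppsRUs | AlphaWow |           |                |           |                    |
BetterAppsRUs | BetaWow  |           |                |           |                    |
BetterAppsRUs | GammaWow |           |                |           |                    |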

FAA ⨯ FAB
  = [ Core CRUD: true, normal index: true, geo index: false, etc. ]
  = [ 1, 1, 0, 1, 0 ]

count(FAA⨯FAB) / count(FAA)
  = sum([ 1, 1, 0, 1, 0 ]) / sum([ 1, 1, 1, 1, 1 ])
  = 0.60

W (popularity, across apps)
  = [ 5/5, 5/5, 1/5, 4/5, 1/5 ]
  = [ 1.0, 1.0, 0.2, 0.8, 0.2 ]

sum(W ⨯ (FAA⨯FAB)) / sum(W ⨯ FAA)
  = sum([ 1.0, 1.0, 0.2, 0.8, 0.2 ] ⨯ [ 1, 1, 0, 1, 0 ]) / sum([ 1.0, 1.0, 0.2, 0.8, 0.2 ])
  = 2.8 / 3.2
  = 0.875

Compatible for App P = true
Compatible for App Q = false
Compatible for Acme Inc = false
Compatible customers = 0 / 2
  = 0

To put it into English: DBAAS B is only 60% compatible with DBAAS A by a count of features. But weighted by the popularity of the features people actually used, it ticks 87.5% of the boxes potential users would look for. In the end, though, probably neither company would migrate. Both have apps that could migrate but also ones that couldn’t.
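The popularity weights and the customer-level result can be derived from a per-app usage table. The usage sets below are an assumption: they are one assignment that is consistent with the figures above (5/5, 5/5, 1/5, 4/5, 1/5, App P compatible, App Q not), not necessarily the exact marks behind them.

```python
FEATURES = ["Core CRUD", "Normal Indexes", "Geo Index",
            "Aggregation $group", "Aggregation $graphLookup"]

# Features DBAAS B offers, i.e. the 1s in [1, 1, 0, 1, 0].
FAB = {"Core CRUD", "Normal Indexes", "Aggregation $group"}

# ASSUMED usage per (company, app); only the column totals are known.
USAGE = {
    ("Acme Inc.", "App P"): {"Core CRUD", "Normal Indexes", "Aggregation $group"},
    ("Acme Inc.", "App Q"): {"Core CRUD", "Normal Indexes", "Geo Index",
                             "Aggregation $group"},
    ("BetterAppsRUs", "AlphaWow"): {"Core CRUD", "Normal Indexes",
                                    "Aggregation $group"},
    ("BetterAppsRUs", "BetaWow"): {"Core CRUD", "Normal Indexes",
                                   "Aggregation $group", "Aggregation $graphLookup"},
    ("BetterAppsRUs", "GammaWow"): {"Core CRUD", "Normal Indexes"},
}

# W: fraction of apps using each feature.
weights = [sum(f in used for used in USAGE.values()) / len(USAGE) for f in FEATURES]

# An app can migrate only if everything it uses is available in B.
app_ok = {key: used <= FAB for key, used in USAGE.items()}

# A company counts as compatible only if all of its apps are.
companies = sorted({company for company, _ in USAGE})
company_ok = {
    c: all(ok for (company, _), ok in app_ok.items() if company == c)
    for c in companies
}

print(weights)                                        # [1.0, 1.0, 0.2, 0.8, 0.2]
print(app_ok[("Acme Inc.", "App P")])                 # True
print(app_ok[("Acme Inc.", "App Q")])                 # False
print(sum(company_ok.values()), "/", len(companies))  # 0 / 2
```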

The ‘Smoke’ Tests

The “1182 [tests]” mentioned in www.isdocumentdbreallymongodb.com sound as though they should be the FAA boolean vector of features that I want. But they aren’t.

The tests (Jan 2019 results download link) are a subset of the mongo javascript shell resmoke tests included in the core MongoDB server code. The five dbaas_*.yml test suites used are sensible-for-DBAAS subsets of the same core, aggregation, change_streams, decimal, and json_schema test suites in the normal v3.6.9 repo source.

Every single script is a valid part of continuous integration testing – none of it is cruft, or junk, for that purpose. Running them myself I ended up with exactly the same numbers shown in the table in this competition-comparison page at mongodb.com.

But the scripts are not 1-to-1 tests of features. There are more than twice as many test scripts as features. Most features have a single script, but a minority have up to 5 or even 10 scripts.

Secondly, many tests end up with a “fail” result because of entanglement with another feature unrelated to the one nominally being tested. Examples:

  • AWS DocumentDB does not support collation, so that is one family of features it lacks (collection option; index option; case level, normalization, etc.). But cross-checks on collation for 37 other test scripts caused those to be reported as failures too.
  • A missing high-verbosity option for explain plans (“executionStats”) contributes another 30+ failures.

Over 100 tests that neither failed nor passed cleanly (status was ‘further investigation needed’) were also marked “fail”.
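To illustrate why counting scripts and counting features give different numbers, here is a small sketch. The script names, the feature labels and the recorded failure causes are all invented for the example; the real test results do not come with a cause column.

```python
from collections import namedtuple

# cause is None when the script passed, otherwise the feature whose
# absence actually made it fail (hypothetical annotation).
Result = namedtuple("Result", "script feature cause")

RESULTS = [
    Result("crud_insert.js", "crud", None),
    Result("crud_update.js", "crud", None),
    Result("index_basic.js", "index", None),
    Result("index_collation.js", "index", "collation"),
    Result("agg_sort.js", "aggregation", "collation"),
    Result("collation_basic.js", "collation", "collation"),
    Result("collation_caselevel.js", "collation", "collation"),
]


def script_pass_rate(results):
    """Share of scripts that passed: the unit the headline number counts."""
    return sum(r.cause is None for r in results) / len(results)


def feature_support_rate(results):
    """Share of features that are not the root cause of any failure."""
    features = {r.feature for r in results}
    missing = {r.cause for r in results if r.cause is not None}
    return len(features - missing) / len(features)


print(round(script_pass_rate(RESULTS), 2))  # 0.43
print(feature_support_rate(RESULTS))        # 0.75
```

In this toy data one missing feature (collation) fails four of seven scripts, yet three of the four features are supported.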

Based on what I sampled, I estimate that in truth 65% ± 15% of MongoDB v3.6 features are covered by AWS DocumentDB. And, as is natural in software evolution, the as-yet unsupported features tend to be the least-used ones.

Can I Migrate From MongoDB Atlas to AWS DocumentDB or Not?

This is the compatibility equation that evaluates to a boolean.

If you can’t live without anything in the list below then, as of Dec 2019, AWS DocumentDB compatibility is false for you.

Basically All of the New Features in v4.0 or v4.2

  • Transactions
  • Wildcard indexes
  • Aggregation $merge stage
  • etc.

Typical DBMS Role-Based Access Control

AWS DocumentDB grants a fixed, high level of privilege (dbAdminAnyDatabase + readWriteAnyDatabase + clusterAdmin) to any user created for the cluster (doc link). There is no grantRolesToUser command that would let you reduce the privileges, e.g. to make a read-only role.

Differences Between MongoDB v3.6 and DocumentDB as of Dec 2019

Commonly-Used Features (Estimated Popularity Weights 0.7 ~ 0.3)

  • Capped Collections
  • Change streams: (AWS comments vs. MongoDB comments)
  • Indexing a nested array field within a compound index.
  • Full-text indexes
  • Accessing the oplog for your own fun and games.
  • $out

Features with Medium Popularity (Estimated Popularity Weights 0.3 ~ 0.1)

  • collStats (or $collStats)
  • Tailable cursors
  • Aggregation operators
    • $arrayToObject, $filter, $indexOfArray, $objectToArray, $range, $reverseArray, $reduce, $slice, $zip
    • $ceil, $exp, $floor, $ln, $log, $log10, $mod, $pow, $sqrt, $trunc
    • $dateFromParts, $dateToParts, $dateFromString
  • Geo indexes
  • Case-insensitive indexes
  • Partial indexes
  • GridFS
  • executionStats option of explain()

Uncommonly Used Features (Estimated Popularity Weights < 0.1)

  • MapReduce
  • Aggregation stages:
    • $bucket, $facet
    • $graphLookup
  • Aggregation operators:
    • $setEquals, $setIntersection, $setUnion, $setDifference, $setIsSubset, $anyElementTrue, $allElementsTrue
    • $ifNull, $switch, $map, $let
    • $literal, $type
    • $stdDevPop, $stdDevSamp, $mergeObjects
    • $replaceRoot, $$REMOVE, $$CURRENT, $$DESCEND, etc.
  • Collation
  • JSON schema validation
  • Null characters permitted in BSON string.
  • JavaScript function as a datatype
  • regex as a datatype
  • Decimal128 data type

N.b. The popularity weights above are only my own estimates, based on what I’ve seen of several hundred MongoDB deployments.

Unsupported in AWS DocDB, but Also Deprecated in v4.0 or v4.2 in MongoDB.

  • $where, $eval (they run server-side javascript)
  • The pre-pipeline legacy aggregation commands group(), distinct()
  • copydb()

How to Check What Features You Currently Use

For a DBA, being asked ‘Do we currently use any of the features on the incompatible list?’ is probably a mini-nightmare. More often than not you feel no more than 95% confident that you know everything the application developers decided to use, and that 5% risk isn’t really small enough to overlook.

  • For index types (e.g. “2d” or “text”), collection options (e.g. “capped”, (json) “validator”), etc. you can iterate the catalog.
  • For data types (e.g. Decimal128) you must scan every document to see if they are present, e.g. with the $type operator if the field names are already known, or with the JavaScript instanceof operator in the mongo shell if they are not. This will be slow business unless you have pretty much the whole dataset in cache already, or choose to sample just a small set per collection.
  • metrics.commands in serverStatus() can be used to see if commands (e.g. collStats) have been used at any time since the last restart.
  • Operators, modifiers and aggregation stages – no inbuilt way to check all of these except to use logLevel: 1 logging, or profiling or currentOp sampling.
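The catalog and command-counter checks above can be sketched as two small Python functions that work on plain dicts. The shapes assumed are those of a collection's options document, its index specifications, and the serverStatus output; with a driver such as PyMongo these would come from db.list_collections(), collection.list_indexes() and client.admin.command("serverStatus"), which is not run here. The watch lists are assumptions drawn from the feature lists in this post.

```python
# Assumed watch lists; extend them to match the features you care about.
SPECIAL_INDEX_TYPES = {"2d", "2dsphere", "text"}
INDEX_OPTIONS = ("partialFilterExpression", "collation")
COLLECTION_OPTIONS = ("capped", "validator", "collation")


def flag_collection(options, index_specs):
    """Return a set of notes on catalog features one collection uses."""
    flags = set()
    for opt in COLLECTION_OPTIONS:
        if options.get(opt):
            flags.add("collection option: " + opt)
    for spec in index_specs:
        # Index key values are 1 / -1 for normal indexes, or a type string.
        for kind in spec.get("key", {}).values():
            if kind in SPECIAL_INDEX_TYPES:
                flags.add("index type: " + kind)
        for opt in INDEX_OPTIONS:
            if opt in spec:
                flags.add("index option: " + opt)
    return flags


def used_commands(server_status, names):
    """Return {command: total} for commands run since the last restart."""
    counters = server_status.get("metrics", {}).get("commands", {})
    return {
        name: counters[name]["total"]
        for name in names
        if counters.get(name, {}).get("total", 0) > 0
    }


# Hypothetical sample data in the assumed shapes.
options = {"capped": True, "size": 1048576}
index_specs = [
    {"v": 2, "key": {"_id": 1}, "name": "_id_"},
    {"v": 2, "key": {"loc": "2dsphere"}, "name": "loc_2dsphere"},
    {"v": 2, "key": {"email": 1}, "name": "email_1",
     "partialFilterExpression": {"email": {"$exists": True}}},
]
server_status = {"metrics": {"commands": {
    "collStats": {"failed": 0, "total": 12},
    "mapReduce": {"failed": 0, "total": 0},
}}}

print(sorted(flag_collection(options, index_specs)))
print(used_commands(server_status, ["collStats", "mapReduce", "copydb"]))
```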

Summary

Feature sets are huge and it’s tempting to reduce them to a single score value, but a single score has no practical value to any given reader. A server software product is compatible or it isn’t, 0 or 1, for any given app that would interact with it.

If you haven’t got a feature compatibility list that you can cut and paste neatly into a 2- or 3-column table, you’re not being given what you need. MongoDB Inc.’s marketing did provide a spreadsheet but, misleadingly, the rows do not represent features 1-to-1.

My estimate is that the current release of DocumentDB covers two-thirds of all features in MongoDB Atlas v3.6. The unsupported features are weighted towards those least widely used in the MongoDB-verse.

If you are considering a move to AWS DocumentDB:

  • Check the known incompatibilities feature list above in “Can I migrate from MongoDB Atlas to AWS DocumentDB or not?”
  • Your MongoDB database has diagnostic methods you can use to quickly assess whether certain classes of feature are used (e.g. index types, commands), but lacks easy ways to check other classes (e.g. aggregation stages and operators don’t have counters in serverStatus). So a full audit will take some time. It may be more practical to decide on your own judgment whether DocumentDB covers all the features you require, and then commit to testing it.
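For the operators and stages that have no counters, one possible approach is to sample commands from the profiler or from currentOp and walk each command document for keys that start with "$". The sketch below assumes each sampled entry is a nested dict with the command under a "command" field; the watch list is a partial, assumed selection from the lists in this post, and the sample entry is invented.

```python
# Partial, assumed watch list drawn from the unsupported-feature lists above.
WATCHLIST = {"$out", "$facet", "$bucket", "$graphLookup",
             "$where", "$filter", "$reduce"}


def dollar_keys(value):
    """Recursively collect every dict key that starts with '$'."""
    found = set()
    if isinstance(value, dict):
        for key, child in value.items():
            if isinstance(key, str) and key.startswith("$"):
                found.add(key)
            found |= dollar_keys(child)
    elif isinstance(value, (list, tuple)):
        for item in value:
            found |= dollar_keys(item)
    return found


def watched_operators(entries):
    """Return the watch-listed operators and stages seen in sampled entries."""
    seen = set()
    for entry in entries:
        seen |= dollar_keys(entry.get("command", {}))
    return seen & WATCHLIST


# Hypothetical profiler-style entry for an aggregate command.
sample = [{
    "op": "command",
    "ns": "shop.orders",
    "command": {
        "aggregate": "orders",
        "pipeline": [
            {"$match": {"status": "A"}},
            {"$facet": {"byDay": [
                {"$group": {"_id": "$day", "n": {"$sum": 1}}},
            ]}},
        ],
        "$db": "shop",
    },
}]

print(watched_operators(sample))  # {'$facet'}
```

A sample only proves use, never absence: an operator that was not run during the sampling window will not show up.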

If you would like a feature audit in your existing MongoDB deployment, Percona can help with MongoDB or Percona Cloud Cover consulting services.

2 Comments
Akira Kurogane

Addendum: as of Feb 6th or 7th 2020, $objectToArray, $arrayToObject, $slice, $mod, and $range have been added in AWS DocumentDB.
https://aws.amazon.com/blogs/database/new-amazon-documentdb-with-mongodb-compatibility-aggregation-pipeline-operators-objecttoarray-arraytoobject-slice-mod-and-range/