Being schemaless is one of the key features of MongoDB. On the bright side, this allows developers to easily modify the structure of their collections without going through a database-side schema change. However, schemaless is not free, and one of its drawbacks is write amplification. Let’s focus on that topic.
Write amplification?
The link between schema and write amplification is not obvious at first sight. So let’s first look at a table in the relational world:
mysql> SELECT * FROM user LIMIT 2;
+----+-------+------------+-----------+-----------+----------------------------------+---------+-----------------------------------+------------+------------+
| id | login | first_name | last_name | city      | country                          | zipcode | address                           | password   | birth_year |
+----+-------+------------+-----------+-----------+----------------------------------+---------+-----------------------------------+------------+------------+
|  1 | arcu  | Vernon     | Chloe     | Paulista  | Cook Islands                     | 28529   | P.O. Box 369, 1464 Ac Rd.         | SSC44GZL5R |       1970 |
|  2 | quis  | Rogan      | Lewis     | Nashville | Saint Vincent and The Grenadines | H3T 3S6 | P.O. Box 636, 5236 Elementum, Av. | TSY29YRN6R |       1983 |
+----+-------+------------+-----------+-----------+----------------------------------+---------+-----------------------------------+------------+------------+
As all records have exactly the same fields, the field names are stored once in a separate file (the .frm file). So field names are metadata, while the value of each field for each record is, of course, data.
Now let’s look at an equivalent collection in MongoDB:
{
  "login": "arcu",
  "first_name": "Vernon",
  "last_name": "Chloe",
  "city": "Paulista",
  "country": "Cook Islands",
  "zipcode": "28529",
  "address": "P.O. Box 369, 1464 Ac Rd.",
  "password": "SSC44GZL5R",
  "birth_year": 1970
},
{
  "login": "quis",
  "first_name": "Rogan",
  "last_name": "Lewis",
  "city": "Nashville",
  "country": "Saint Vincent and The Grenadines",
  "zipcode": "H3T 3S6",
  "address": "P.O. Box 636, 5236 Elementum, Av.",
  "password": "TSY29YRN6R",
  "birth_year": 1983
}
One difference from a table in the relational world is that MongoDB doesn’t know in advance which fields each document will have. Therefore field names are data, not metadata, and they must be stored with each document.
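To get a feel for how much this repetition costs, here is a rough back-of-envelope sketch: it counts only the bytes spent on the field names themselves and ignores BSON per-element framing (type bytes, terminators, padding), so the real overhead is somewhat higher.

```python
import json

# The two sample documents shown above
docs = [
    {"login": "arcu", "first_name": "Vernon", "last_name": "Chloe",
     "city": "Paulista", "country": "Cook Islands", "zipcode": "28529",
     "address": "P.O. Box 369, 1464 Ac Rd.", "password": "SSC44GZL5R",
     "birth_year": 1970},
    {"login": "quis", "first_name": "Rogan", "last_name": "Lewis",
     "city": "Nashville", "country": "Saint Vincent and The Grenadines",
     "zipcode": "H3T 3S6", "address": "P.O. Box 636, 5236 Elementum, Av.",
     "password": "TSY29YRN6R", "birth_year": 1983},
]

def key_bytes(doc):
    """Bytes spent on field names alone, ignoring BSON framing overhead."""
    return sum(len(k.encode("utf-8")) for k in doc)

# Field names are repeated in every document, so the overhead is linear:
per_doc = key_bytes(docs[0])          # 67 bytes of key names per document
total_for_10m = per_doc * 10_000_000  # 670,000,000 bytes, roughly 670MB
```

Key names alone account for roughly 670MB across 10M documents, which is a large share of the size gap measured below.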
Then the question is: how large is the overhead in terms of disk space? To get an idea, I inserted 10M such records into an InnoDB table (adding an index on password and on birth_year to make it look like a real table): the size on disk is around 1.4GB.
I also inserted the exact same 10M records in a MongoDB collection using the regular MMAPv1 storage engine, again adding an index on password and on birth_year, and this time the size on disk is … 2.97GB!
Of course it is not an apples-to-apples comparison, as the InnoDB storage format and the MongoDB storage format are not identical. However, a 100% difference is still significant.
Compression
One way to deal with write amplification is to use compression. With MongoDB 3.0, the WiredTiger storage engine is available and one of its benefits is compression (default algorithm: snappy). Percona TokuMX also has built-in compression using zlib by default.
Rebuilding the collection with 10M documents and the 2 indexes now gives the following results:
WiredTiger: 1.14GB
TokuMX: 736MB
This is a 2.5x to 4x data size reduction, pretty good!
WiredTiger also provides zlib compression and in this case the collection is only 691MB. However CPU usage is much higher compared to snappy so zlib will not be usable in all situations.
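Repeated field names are exactly the kind of redundancy a block compressor eats for breakfast, which is why the gains above are so large. A minimal stdlib sketch (snappy is not in the Python standard library, so zlib stands in here; the documents are identical, so this ratio is an upper bound, not what a real collection achieves):

```python
import json
import zlib

# One of the sample documents from the post
doc = {"login": "arcu", "first_name": "Vernon", "last_name": "Chloe",
       "city": "Paulista", "country": "Cook Islands", "zipcode": "28529",
       "address": "P.O. Box 369, 1464 Ac Rd.", "password": "SSC44GZL5R",
       "birth_year": 1970}

# 1,000 documents sharing the same field names, as in the collection above
payload = json.dumps([doc] * 1000).encode("utf-8")
compressed = zlib.compress(payload, 6)  # zlib default-ish level, like WiredTiger's zlib option
ratio = len(payload) / len(compressed)
```

Real collections have varying values and compress less well, but the key names repeat in every document and are always easy prey for the compressor.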
Conclusion
MongoDB’s schemaless design is attractive, but it comes with several tradeoffs. Write amplification is one of them, and using either WiredTiger with MongoDB 3.0 or Percona TokuMX is a very simple way to fix the issue.
I think the benefit of using a NoSQL DB depends on the work approach. For my part, when I develop an app or a web application, the first thing I do is create the DB schema and discuss it as thoroughly as possible with the client (or with myself, if it is an independent project). Only after the schema captures reality in a sort of object-oriented way do I start coding. Hence I would not know how to start working with a schemaless DB.
The most common way to fix this issue without using compression is by choosing short keys:
Instead of customer_id use c_id, instead of order_id use o_id, etc. This keeps documents small without the overhead of compression.
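The short-key approach is usually implemented as a mapping maintained in the application layer, so the code keeps readable names while the stored documents carry the short ones. A hypothetical sketch (the mapping and field names here are illustrative, not part of any library):

```python
# Hypothetical application-level mapping from long, readable field names
# to the short keys actually stored in MongoDB.
SHORT_KEYS = {
    "customer_id": "c_id",
    "order_id": "o_id",
    "first_name": "fn",
    "last_name": "ln",
}

def shorten(doc):
    """Rename known fields to their short form before inserting."""
    return {SHORT_KEYS.get(k, k): v for k, v in doc.items()}
```

The tradeoff is readability: anyone inspecting the collection in the shell sees c_id and fn instead of self-describing names, and the mapping must stay consistent across every application that touches the collection.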
Yeah, MongoDB has a bit more overhead per record as each record stores the full “column” name inside it. So, in MySQL you would have the equivalent of first_name stored once and designated as column 3, in MongoDB you would have first_name written 10M times, once per document.
I don’t know why, internally, MongoDB doesn’t keep a map of all attribute names in a b-tree, with a number associated with each.
For each collection
create map(column_name, column_id)
key1: 1
key2: 2
..
key1000: 1000
Internally in each document in the collection:
1: horse
1000: goose
Then when you retrieve a document, the keys get populated:
key1: horse
key1000: goose
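The scheme described above can be sketched in a few lines. This is a hypothetical illustration of the commenter’s proposal, not how MongoDB actually stores documents, and it only handles flat documents (nested documents are exactly the complication raised in the reply below):

```python
# Sketch of a per-collection key map: field names are mapped to small
# integer ids on write and populated back on read.
class KeyMap:
    """Maps field names to integer ids and back (one map per collection)."""

    def __init__(self):
        self.to_id = {}   # column_name -> column_id
        self.to_key = {}  # column_id -> column_name

    def id_for(self, key):
        if key not in self.to_id:
            next_id = len(self.to_id) + 1
            self.to_id[key] = next_id
            self.to_key[next_id] = key
        return self.to_id[key]

def store(doc, key_map):
    """Replace key names with ids before writing to disk."""
    return {key_map.id_for(k): v for k, v in doc.items()}

def retrieve(stored, key_map):
    """Populate the key names back when reading a document."""
    return {key_map.to_key[i]: v for i, v in stored.items()}
```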
Given that the documents in MongoDB don’t have to be the same in a given collection, nor in the same order, this would be difficult. Also, you can have sub-documents with their own names, arrays of sub-documents, and arrays of objects in an array of sub-documents.
Example:
{
  "_id": 1234,
  "attributes": {
    "names": [
      { "first": "Jim", "last": "Smith" },
      { "first": "Jim", "middle": "Sam" }
    ],
    "alias": { "middle": "Tim", "last": "Jones" }
  }
}
A single document can belong to only one collection, and it does not matter what order the documents are in. When a document is added to the collection (under the collection-level lock), the document is scanned, the keys are extracted, and each key is looked up for its associated 32-bit id. If a key is not found, it is added. It does not matter at what level in a document the key appears: the same key anywhere in a collection would have the same id value.
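Handling keys at any level, as described above, only needs the key map plus a recursive scan. Again a hypothetical sketch of the proposal, not MongoDB’s actual storage format:

```python
# Recursive variant: ids are assigned wherever a key appears, at any
# nesting depth, so the same key gets the same id everywhere in the
# collection.
def encode(value, to_id):
    """Replace keys with integer ids at any nesting level."""
    if isinstance(value, dict):
        out = {}
        for k, v in value.items():
            if k not in to_id:
                to_id[k] = len(to_id) + 1  # next id for this collection
            out[to_id[k]] = encode(v, to_id)
        return out
    if isinstance(value, list):
        return [encode(v, to_id) for v in value]
    return value

def decode(value, to_key):
    """Populate the key names back on retrieval."""
    if isinstance(value, dict):
        return {to_key[i]: decode(v, to_key) for i, v in value.items()}
    if isinstance(value, list):
        return [decode(v, to_key) for v in value]
    return value
```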
I guess that would be more efficient for space, but it would add extra performance overhead (small, typically, but non-zero) to do the scan to store and scan to display. Performance is their claim, not storage efficiency.
The overhead would be way lower than using compression. I don’t think compression is the answer to everything. This is a basic architecture and efficiency issue. If you don’t want a collection to use compact keys (which is what I would call this architecture), it would be trivial to allow the old format with redundant keys to be used. The IO saved would far outweigh the cost of key population IMHO.
Part of the issue here is that the BSON/JSON spec says fields should be names, not numbers. Therefore using a set/enum-type constant lookup is complicated.
I do think you could use something like f202, but then your storage is 4 characters, which is not really much better. That said, “fn” vs. 202 (or some other int) means the int is actually more expensive than a 2-character string. You must choose the balance carefully here, as an id lookup might mean more storage in the end, while compression simply deduplicates the same block stored n times, removing such complicated considerations from the storage layer.
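The arithmetic behind this tradeoff is simple, assuming BSON-style element names (NUL-terminated UTF-8 cstrings) versus a fixed-width 32-bit integer id as in the proposed scheme:

```python
# Back-of-envelope byte costs for key names vs. integer ids.
def bson_key_cost(key):
    # A BSON element name is the UTF-8 bytes plus a terminating NUL byte.
    return len(key.encode("utf-8")) + 1

INT_ID_COST = 4  # a 32-bit id always costs 4 bytes, whatever the key length

# A 2-character key like "fn" is already cheaper than a 4-byte id,
# while a long key like "first_name" costs nearly 3x more than the id.
```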
It isn’t the number 1000000 in the document, it is a four-byte unsigned integer in the document. A key of any length uses only 4 bytes.
Enumeration is done by querying the btree. The biggest downside is a document that uses a key that is unique in each document. Such high-cardinality keys would bloat the btree. It would be the developer’s responsibility to use redundant storage for such collections.
Justin, you are forgetting something. Mongo is typically used in distributed environments. Whenever you insert/update any record with new fields, each cluster/shard needs to update its key map for the collection.
This will significantly reduce Mongo’s performance, further increase write amplification across the cluster, and could cause lots of problems in case of shard unavailability/errors.
It wouldn’t affect sharding. Each collection on each node would have an independent (shared-nothing) set of key mappings: key30 might be 123 on one shard and 4051 on another. When the documents are retrieved from storage, the keys are populated. Even aggregation, if it could be pushed down to the shard, could internally aggregate on integer keys (a key is a key as long as there is a 1:1 mapping), so I don’t see any problem with sharding.