Pros and Cons of Using MongoDB

Quite often we see that the main operational storage is used in conjunction with some additional services, for example, for caching or full-text search.

Another architectural approach that uses multiple databases is microservices, where every microservice has its own database, optimized for the tasks of that particular service. For example, you can use MySQL for primary storage, Redis and Memcached for caching, and Elasticsearch or native Sphinx for searching. You can use something like Kafka to transfer data to the analytics system, which is often built on something like Hadoop.

When it comes to the main operational storage, there are two options. We can choose a relational database with the SQL language, or we can opt for a non-relational database and then pick one of the available types.

If we talk about NoSQL data models, there are a lot of choices too. The most typical ones are key-value, document, and wide-column databases.

Examples are Memcached, MongoDB, and Cassandra, respectively.

Looking at DB-Engines Ranking, we will see that the popularity of open-source databases has been growing over the years, while commercial databases have been gradually declining.

What’s even more interesting, the same trend has been observed for different types of databases: open-source databases are the most popular for many types, such as columnar databases, time series databases, and document stores. Commercial licenses prevail only for classical technologies such as relational databases, or even older ones like multivalue databases.

At Percona, we work closely with the most popular relational and non-relational open-source databases (MySQL, PostgreSQL, and MongoDB), dealing with many clients; we help them make choices and provide them with the best advice for each case.

With that in mind, this article presents scenarios that are worth considering before deploying MongoDB, to help you decide when you should and should not use it. If you already have a setup in place, the article may still be useful, because some of the following topics can go unnoticed during product evaluation.

Table of Contents

Here is a list of topics that I will discuss further in this article:

  1. Team experience and preferences
  2. Development approach and app lifecycle 
  3. Data Model
  4. Transactions and Consistency (ACID)
  5. Scalability
  6. Administration

1. Team Experience and Preferences

Before diving into MongoDB, the most important thing is to take into account the team’s experience and preferences.

From the MongoDB point of view, the advantage is that we have flexible JSON format documents, and for some tasks and some developers, this is convenient. For some teams, it is difficult, especially if they have worked with SQL databases for a long time and understand relational algebra and SQL language very well.

In MongoDB, you can quickly become familiar with the basic CRUD operations (create, read, update, and delete).

Simple queries are less likely to cause problems. Still, as soon as a daily task requires deeper data processing, you need a more powerful tool to handle it, such as the MongoDB aggregation pipeline and map-reduce, which we will discuss further in this article.

There are great and free courses available at MongoDB University that can undoubtedly help grow the team’s knowledge. Still, it’s important to have in mind that the apex of the learning curve might take some time to reach if the team is not entirely familiar with it.

2. Development Approach and Application Lifecycle

If we talk about applications where MongoDB is used, they mainly focus on fast development because you can change everything at any time. You don’t have to worry about the strict format of the document.

The second point is the data schema. Here you need to understand that data always has a schema; the only question is where it is implemented. You can implement the schema in your application, because the application is what uses the data, or you can implement it at the database level.

Quite often, a single application is the only one that deals with the data in the database. In that case, where only this application saves and reads the data, the application-level schema works well. But if the same data is used by many applications, an application-level schema becomes very inconvenient and difficult to control.

From the point of view of the application development cycle, the advantages can be summarized as follows:

  • Speed of development
  • No need to synchronize the schema in the database and the application
  • It is clear how to scale further
  • Simple predetermined solutions

3. Data Model

As mentioned in the first topic, the data model is highly dependent on the application and the team’s experience.

The data of many web applications is usually easy to represent. If we store a structure such as an associative array from the application, it is straightforward and clear for the developer to serialize it into a JSON document.

Let’s take an example. We want to save a contact list from the phone. There is data that fits well into one relational table: first name, last name, etc. But if you look at phone numbers or email addresses, one person may have several of them. If we want to store this in a good relational form, it would be nice to have it in separate tables, then collect it using JOIN, which is less convenient than storing it in one collection with hierarchical documents.

Data Model – Contact List Example

Relational Database
  • First name, last name, date of birth
  • One person can have several phone numbers and emails
  • You should create separate tables for them 
  • JSON arrays are non-traditional extensions
Document-Oriented Database
  • Everything is stored in one “collection.”
  • Arrays and embedded documents
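
As a rough sketch of the contact list example, the snippet below stores the same contact twice: once in normalized relational tables and once as a single hierarchical document. SQLite from the Python standard library is used here only as a stand-in for any SQL database, and the table, column, and field names are illustrative assumptions rather than a real schema.

```python
import json
import sqlite3

# Relational form: one table per repeating attribute, reassembled with JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE phone  (person_id INTEGER REFERENCES person(id), number TEXT);
    CREATE TABLE email  (person_id INTEGER REFERENCES person(id), address TEXT);
    INSERT INTO person VALUES (1, 'Ada', 'Lovelace');
    INSERT INTO phone  VALUES (1, '+1-555-0100'), (1, '+1-555-0101');
    INSERT INTO email  VALUES (1, 'ada@example.com');
""")

# Collecting the phone numbers of a person requires a JOIN.
rows = conn.execute("""
    SELECT p.first_name, p.last_name, ph.number
    FROM person AS p
    JOIN phone AS ph ON ph.person_id = p.id
    ORDER BY ph.number
""").fetchall()

# Document form: the same contact as one self-contained document,
# with arrays holding the repeating attributes.
contact_doc = {
    "first_name": "Ada",
    "last_name": "Lovelace",
    "phones": ["+1-555-0100", "+1-555-0101"],
    "emails": ["ada@example.com"],
}
contact_json = json.dumps(contact_doc)
```

The relational version needs three tables and a join to rebuild one contact, while the document version is read and written as a single unit.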

However, it’s crucial to consider that a more flexible solution results in a list of documents that may have completely different structures. As someone said before, “With great power comes great responsibility.”

Unfortunately, it’s pretty common to see operations failing because documents exceed the 16MB limit, a single collection holding terabytes of data, or, in a worse scenario, wrongly designed shard keys.

These anomalies could be a good indication that you are turning your database into a Data Swamp. It is a term commonly used in Big Data deployment for data that is badly designed, inadequately documented, or poorly maintained.

You don’t need to normalize your data strictly, but it’s essential to take the time to analyze how you will structure your data, so that you get the best of both worlds when using MongoDB and avoid those pitfalls.

You can check the blog post “Schema Design in MongoDB vs Schema Design in MySQL” to get better clarification on data modeling and how it differs. It is worth mentioning the schema validation feature that you can use during updates and insertions. You can set validation rules on a per-collection basis, restricting the content type that is being stored.
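
A validation rule of this kind can be sketched as a `$jsonSchema` document. The example below only builds the rule as a Python dictionary and applies a deliberately tiny, hand-written check to show the idea. The field names are hypothetical, and the `check_required` helper is an illustration, not MongoDB's validator.

```python
import json

# A $jsonSchema validator, as it could be attached to a collection.
# Field names (first_name, phones) are hypothetical.
validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["first_name", "phones"],
        "properties": {
            "first_name": {"bsonType": "string"},
            "phones": {"bsonType": "array", "items": {"bsonType": "string"}},
        },
    }
}


def check_required(doc, rule):
    """Toy check: list required fields missing from doc (not MongoDB's logic)."""
    required = rule["$jsonSchema"]["required"]
    return [field for field in required if field not in doc]


missing_ok = check_required({"first_name": "Ada", "phones": []}, validator)
missing_bad = check_required({"first_name": "Ada"}, validator)

# The rule serialized as JSON, ready to be inspected or shared.
validator_json = json.dumps(validator, indent=2)
```

In the mongo shell, a document like this is passed as the `validator` option of `db.createCollection()`, and the server then rejects inserts and updates that do not match it.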

Terms

Interestingly, while modeling and querying, there is much in common between relational and non-relational DBMSs. We are talking about databases in both cases, but what we call a table in a relational database is often called a collection in a non-relational database. What is a column in SQL, is a field in MongoDB, and the list goes on.

As for JOIN, which we have been mentioning, MongoDB does not have such a concept. However, you can use $lookup in your aggregation pipeline. It performs only a left outer join, and extensive use of $lookup might indicate an error in your data modeling.
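
To make the left outer join behaviour concrete, the sketch below writes a `$lookup` stage as a plain Python dictionary and emulates its effect in memory. The collection and field names (`orders`, `customers`, `customer_id`) are made up, and the `lookup` function is a toy stand-in, not the server's implementation.

```python
# Hypothetical collections represented as lists of dicts.
orders = [
    {"_id": 1, "customer_id": 10, "total": 25},
    {"_id": 2, "customer_id": 99, "total": 40},  # no matching customer
]
customers = [{"_id": 10, "name": "Ada"}]

# Shape of a $lookup stage as it would appear in an aggregation pipeline.
lookup_stage = {
    "$lookup": {
        "from": "customers",
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }
}


def lookup(left, right, local_field, foreign_field, as_field):
    """Toy left outer join: every left document is kept, matches go in an array."""
    joined = []
    for doc in left:
        matches = [r for r in right if r.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined


spec = lookup_stage["$lookup"]
result = lookup(orders, customers, spec["localField"], spec["foreignField"], spec["as"])
```

Both orders survive the join: the first carries its customer in the `customer` array, and the second carries an empty array, which is what makes this a left outer join rather than an inner join.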

As for access: we use SQL for relational data. For MongoDB and many other NoSQL databases, we use a CRUD-style interface, which provides operations to create, read, update, and delete documents.

Below are some examples of the most typical tasks to deal with documents and their equivalent in the SQL world:

  • CREATE:

  • READ:

  • UPDATE:

  • DELETE:
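
Since the four operations map closely onto SQL statements, a side-by-side sketch may help. The SQL below runs against an in-memory SQLite database, and each statement is preceded by a comment showing a roughly equivalent mongo shell call. The `contacts` table/collection and its fields are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, city TEXT)")

# CREATE
# mongo shell: db.contacts.insertOne({ name: "Ada", city: "London" })
conn.execute("INSERT INTO contacts (name, city) VALUES (?, ?)", ("Ada", "London"))

# READ
# mongo shell: db.contacts.find({ city: "London" })
found = conn.execute(
    "SELECT name, city FROM contacts WHERE city = ?", ("London",)
).fetchall()

# UPDATE
# mongo shell: db.contacts.updateOne({ name: "Ada" }, { $set: { city: "Paris" } })
conn.execute("UPDATE contacts SET city = ? WHERE name = ?", ("Paris", "Ada"))
updated = conn.execute(
    "SELECT city FROM contacts WHERE name = ?", ("Ada",)
).fetchone()

# DELETE
# mongo shell: db.contacts.deleteOne({ name: "Ada" })
conn.execute("DELETE FROM contacts WHERE name = ?", ("Ada",))
remaining = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
```

The mapping is approximate: `updateOne` and `deleteOne` touch at most one document, whereas the SQL statements affect every matching row.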

If you are a developer familiar with the JavaScript language, the CRUD syntax provided by MongoDB will feel more natural to you than the SQL syntax.

In my opinion, the simplest operations, such as search or insert, work well enough in both. When it comes to more complex data selection, the SQL language is much more readable.

  • COUNT:

With the interface, it’s easy enough to do things like counting the number of rows in a table or a collection.

  • Aggregation

But if we do more complex things like GROUP BY in MongoDB, the Aggregation Framework will be required. This is a more complex interface that shows how we want to filter, how we want to group, etc.
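
The difference between the two interfaces can be sketched as follows. The SQL side runs on an in-memory SQLite database; the MongoDB side is shown only as the shape of the shell call and the aggregation pipeline, with a plain-Python emulation of what the pipeline computes. The `contacts` data is invented for the example.

```python
import sqlite3
from collections import Counter

docs = [
    {"name": "Ada", "city": "London"},
    {"name": "Alan", "city": "London"},
    {"name": "Grace", "city": "New York"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, city TEXT)")
conn.executemany("INSERT INTO contacts VALUES (:name, :city)", docs)

# COUNT
# mongo shell: db.contacts.countDocuments({})
total = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]

# GROUP BY in SQL is a single readable statement ...
sql_groups = conn.execute(
    "SELECT city, COUNT(*) FROM contacts GROUP BY city ORDER BY city"
).fetchall()

# ... while in MongoDB the same question becomes a pipeline of stages,
# passed to db.contacts.aggregate(pipeline).
pipeline = [
    {"$group": {"_id": "$city", "count": {"$sum": 1}}},
    {"$sort": {"_id": 1}},
]

# Plain-Python emulation of what the pipeline computes (not MongoDB code).
counts = Counter(doc["city"] for doc in docs)
emulated = sorted(counts.items())
```

Each pipeline stage describes one step (group, then sort), which is more verbose than SQL for simple cases but composes stage by stage for longer transformations.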

4. Transactions and Consistency (ACID)

This topic matters because, depending on business requirements, the database solution might need to be ACID compliant. In this game, relational databases are far ahead. An excellent example of ACID requirements is operations that involve money.

Imagine you were building a function to transfer money from one account to another. Two things can go wrong: you take money from the source account but never credit it to the destination, or you credit the destination but never take money out of the source to cover it. To keep the system sane, these two writes must either both happen or both not happen, a property also known as “all or nothing.”
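
A minimal sketch of such an all-or-nothing transfer, using SQLite from the Python standard library as a stand-in for a relational database. The `accounts` table, the account names, and the `CHECK` constraint that forbids negative balances are assumptions chosen to make a failure easy to trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, source, destination, amount):
    """Move money in one transaction: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, destination),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a dict of id -> balance."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
final = balances(conn)
```

The destination is credited first on purpose: when the debit then fails, the rollback also undoes the credit, so the second transfer leaves both balances exactly as they were.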

Prior to the release of 4.0, MongoDB did not support transactions, but it supported atomic operations within a single document. 

That means, from the point of view of one document, the operation will be atomic. If the process changes several documents, and some kind of failure occurs during the change, some of these documents will be changed, and some will not.

This restriction was lifted with the 4.0 release. For situations that require atomicity of reads and writes to multiple documents (in single or multiple collections), MongoDB supports multi-document transactions. With distributed transactions, they can be used across multiple operations, collections, databases, documents, and shards.

  • In version 4.0, MongoDB supports multi-document transactions on replica sets.
  • In version 4.2, MongoDB introduces distributed transactions, which add support for multi-document transactions on sharded clusters and incorporate the existing support for multi-document transactions on replica sets.

5. Scalability

What is scalability in this context? It is how easily you can take a small application and scale it to millions or even billions of users.

If we talk about the scalability of a cluster where our applications are already large enough, it is clear that one machine won’t cope, even if it is the most powerful one.

It also makes sense to ask whether we are scaling reads, writes, or data volume. Priorities differ between applications, but in general, a very large application has to deal with all of these.

In MongoDB, the focus was initially on scalability across multiple nodes, even for small applications. We can see this in the sharding feature, which was released in the early days and has been maturing ever since.

If you are looking for vertical scalability, it can be achieved in MongoDB via a replica set configuration. You can scale your database up and down in very few steps, but the point here is that only availability and reads are scaled. Writes are still tied to a single point, the primary.

However, we know that at some point the application will demand more write capacity, or the dataset will become too big for the replica set. At that point, it is recommended to scale horizontally with sharding, splitting the dataset and the writes across multiple shards.

MongoDB sharding has some limitations: not all operations work with it, and a badly designed shard key can decrease query performance, create unevenly distributed data, and affect internal cluster operations such as automatic data splitting. In a worse scenario, it demands manual resharding, which is an extensive and error-prone operation.
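
The effect of shard key choice on data distribution can be illustrated with a toy model. The sketch below hashes a key value and maps it onto a fixed number of shards; this is a deliberate simplification, since MongoDB actually assigns ranges of key values to chunks and balances those chunks across shards. The shard count, the `shard_for` function, and the document fields are all hypothetical.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key_value):
    """Toy placement: hash the shard key value and map it to a shard number."""
    digest = hashlib.md5(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


# 1000 documents; one in ten belongs to "BR", the rest to "US".
docs = [{"user_id": i, "country": "US" if i % 10 else "BR"} for i in range(1000)]

# High-cardinality key: documents spread over all shards.
by_user = Counter(shard_for(d["user_id"]) for d in docs)

# Low-cardinality key: only two distinct values, so at most two shards
# ever receive data, no matter how many shards the cluster has.
by_country = Counter(shard_for(d["country"]) for d in docs)
```

With `country` as the key, adding more shards does not help: the data can never be split into more pieces than there are distinct key values, and the larger group lands on a single shard.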

MongoDB 5.0 introduced a resharding feature. As with any new feature, my recommendation is to test it extensively before any production usage. If you are considering refining your shard key or resharding with the new feature, the article “Refining Shard Keys in MongoDB 4.4 and Above” may guide you to a better choice.

6. Administration

Administration covers all those things that developers don’t think about, or at least don’t treat as their first priority. It is about the need to back up, update, and monitor an application, and to restore it in case of failures.

MongoDB is more focused on the standard way – administration is minimized. But it is clear that this happens at the expense of flexibility. A community of open source solutions for MongoDB is significantly smaller. You can notice it in the DB-Engines Ranking highlighted at the beginning of this article and the annual survey by StackOverflow; undoubtedly, MongoDB is the most popular NoSQL database, but unfortunately, it lacks a strong community.

Additionally, many recommended things in MongoDB are quite rigidly tied to Ops Manager and Atlas services – which are commercial platforms of MongoDB.

Until recently, running backup and restore routines was not a trivial operation for a sharded cluster or a replica set. DBAs had to rely on methods built around the mongodump/mongorestore tools or on file system snapshots.

This scenario started to get better with features like Percona Hot-Backup and the Percona Backup for MongoDB tool. 

If we look at a popular relational database like MySQL, it is flexible enough and offers many different approaches. There are good open-source implementations for everything, which is an area where MongoDB is still weak.

Conclusion

I have discussed a few topics that should help you in your daily routine, giving a broad view of where MongoDB is beneficial. It’s important to note that this article is based on the latest available release, MongoDB 5.0. If you already have a deployment that uses older or deprecated releases, some of the observations and features might not apply.

If you are facing a problem or have a question on a more granular level, please have a look at our blog; we may have written an article about it. We also invite you to check our white paper here, in which we detail more scenarios and cases where MongoDB is a good fit, and where it’s not.

I hope this helps you!

If you have any questions, feel free to reach out over the comment section below.

Useful Resources

Finally, you can reach us through the social networks, our forum, or access our material using the links presented below:

 

Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.
