MongoDB Index Building on ReplicaSet and Shard Cluster

We all know how important a proper index is for a database to do its job effectively. We rely on indexes in daily life to make everyday tasks faster; without an index, every task would still get done, but it would take considerably longer.

The basic working of index

Imagine that we have tons of information and we want to look at something very particular and we don’t know where it is. We are going to spend a lot of time finding that particular piece of data.

If we had some kind of information about where each piece of data lives, the job would finish much faster, because we would know where to look instead of searching through each and every record.

Indexes are special data structures that store a small portion of each record in an easy-to-traverse form, so the database can go straight to the data it needs. Indexes can be created in ascending or descending order to support efficient equality matches and range-based query operations.

Index building strategy and consideration

When planning an index, many aspects have to be considered: which keys are used most frequently in queries, their cardinality, the write ratio on the collection, free memory, and storage.

If a collection has no index that supports a query, MongoDB performs a full collection scan every time that query runs, and the collection could contain millions of records. This not only slows down the operation itself but also increases the wait time for other operations.

With the createIndexes command, we can also build multiple indexes on the same collection at the same time, which saves much of the time that would otherwise be spent scanning the collection once per index.
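
As a rough sketch of that idea, the helper below assembles a single createIndexes command document that requests two indexes in one call. The collection name users and the fields mobile and createdAt are hypothetical, chosen only for illustration.

```javascript
// Build one createIndexes command document covering several indexes.
// Collection and field names are hypothetical, for illustration only.
function buildCreateIndexesCommand(collectionName, indexSpecs) {
  return {
    createIndexes: collectionName,
    // Each index needs a key pattern and a name.
    indexes: indexSpecs.map(({ key, name }) => ({ key, name })),
  };
}

const command = buildCreateIndexesCommand("users", [
  { key: { mobile: 1 }, name: "mobile_1" },
  { key: { createdAt: -1 }, name: "createdAt_-1" },
]);

// In mongosh, the document would be passed to db.runCommand(command).
```

Both indexes are requested in the same command, so the server can build them in a single pass over the collection.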

Limitations

It is very important to have enough memory to accommodate the working set. Not every index has to fit in RAM.

Index keys must be smaller than 1024 bytes in MongoDB 4.0 and earlier. Starting with v4.2 and fcv 4.2, this limit is removed.

The same applies to index names: the fully qualified index name can be up to 127 bytes in a db with fcv 4.0 and below. This limit is also removed with db v4.2 and fcv 4.2.

Only 64 indexes can be created in any given single collection.

Index types in MongoDB

Before seeing various index types, let’s see what the index name looks like.

The default name for an index is the concatenation of the indexed keys and each key’s direction in the index ( i.e. 1 or -1) using underscores as a separator. For example, an index created on { mobile : 1, points: -1 } has the name mobile_1_points_-1.

We can also give an index a custom, more human-readable name by passing the name option when creating it.
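
The naming rule can be sketched in a few lines of JavaScript. The helper defaultIndexName is a hypothetical illustration of the rule described above, not a MongoDB API, and the custom name mobile_points_lookup is likewise made up.

```javascript
// Derive the default index name from a key pattern:
// field_direction pairs joined with underscores.
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

const keyPattern = { mobile: 1, points: -1 };
const generatedName = defaultIndexName(keyPattern); // "mobile_1_points_-1"

// Options document that overrides the generated name (hypothetical name).
const customNameOptions = { name: "mobile_points_lookup" };

// In mongosh: db.users.createIndex(keyPattern, customNameOptions)
```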

Index type

MongoDB provides various types of indexes to support various data and queries.

Single field index: In a single-field index, an index is created on a single field in a document. MongoDB can traverse the index in either direction, regardless of the sort order specified when the index was created.

Syntax: db.collection.createIndex({ <field>: <1 or -1> })

Here 1 specifies ascending order for the field and -1 specifies descending order.

Example:
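
A minimal sketch, assuming a hypothetical users collection with a mobile field. The function takes the collection handle as a parameter so the same code can be called from mongosh with db.users.

```javascript
// `coll` is a collection handle, e.g. db.users in mongosh (hypothetical name).
function createMobileIndex(coll) {
  // 1 = ascending; -1 would build the same index in descending order.
  return coll.createIndex({ mobile: 1 });
}

// In mongosh: createMobileIndex(db.users)
```

Because a single-field index can be walked in both directions, it can serve sorts on mobile in ascending as well as descending order.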

 

Compound index: In a compound index, we can create indexes on multiple fields. The order of fields listed in a compound index has significance. For instance, if a compound index consists of { userid: 1, score: -1 }, the index sorts first by userid and then, within each userid value, sorts by score.

Syntax: db.collection.createIndex({ <field1>: <type>, <field2>: <type>, ... })

Example:
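
To make the ordering concrete, the sketch below sorts a few hypothetical documents the way an index on { userid: 1, score: -1 } orders its entries. The comparator is an illustration in plain JavaScript, not how MongoDB stores the index internally.

```javascript
// Compare documents field by field, honouring each field's direction,
// which mirrors the entry order of a compound index with this key pattern.
function compareByKeyPattern(keyPattern) {
  return (a, b) => {
    for (const [field, direction] of Object.entries(keyPattern)) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

const scores = [
  { userid: "b", score: 10 },
  { userid: "a", score: 5 },
  { userid: "a", score: 9 },
];
const ordered = [...scores].sort(compareByKeyPattern({ userid: 1, score: -1 }));
// ordered: a/9, a/5, b/10

// In mongosh: db.scores.createIndex({ userid: 1, score: -1 })
```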

 

Multikey index: MongoDB uses multikey indexes to index the content stored in arrays. When we create an index on a field that contains an array value, MongoDB automatically creates a separate index entry for every element of the array. We do not need to specify the multikey type explicitly; MongoDB decides on its own to create a multikey index when the indexed field contains an array value.

Syntax: db.collection.createIndex({ <field>: <1 or -1> })

Example:
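
The sketch below illustrates, in simplified form, how one document with an array field turns into several index entries. The products collection, the tags field, and the multikeyEntries helper are hypothetical.

```javascript
// Expand one document into the entries a multikey index would hold for
// `field`: one entry per array element (simplified illustration).
function multikeyEntries(doc, field) {
  const value = doc[field];
  const elements = Array.isArray(value) ? value : [value];
  return elements.map((element) => ({ key: element, docId: doc._id }));
}

const product = { _id: 1, tags: ["red", "cotton", "summer"] };
const entries = multikeyEntries(product, "tags");
// entries has three elements, one per tag, all pointing at document 1

// The index itself is created like any other.
// In mongosh: db.products.createIndex({ tags: 1 })
```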

 

Geospatial index: MongoDB provides two special indexes: 2d indexes that use planar geometry when returning results and 2dsphere indexes that use spherical geometry to return results.

Syntax: db.collection.createIndex({ <location field>: "2dsphere" })

*where the <location field> is a field whose value is either a GeoJSON object or a legacy coordinate pair.

Example:
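
A minimal sketch, assuming a hypothetical places collection whose documents store a GeoJSON Point in a location field.

```javascript
// A document whose `location` field holds a GeoJSON Point.
// GeoJSON coordinates are listed as [longitude, latitude].
const place = {
  name: "Central Park",
  location: { type: "Point", coordinates: [-73.97, 40.77] },
};

// `coll` is a collection handle, e.g. db.places in mongosh (hypothetical name).
function createLocationIndex(coll) {
  return coll.createIndex({ location: "2dsphere" });
}
```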

 

Text index: With the text index type, MongoDB supports searching for string content in a collection. A collection can only have one text search index, but that index can cover multiple fields.

Syntax: db.collection.createIndex({ <field>: "text" })

Example:
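
A minimal sketch, assuming a hypothetical articles collection with string fields title and body. Since a collection can have only one text index, both fields go into the same index.

```javascript
// `coll` is a collection handle, e.g. db.articles in mongosh (hypothetical name).
function createArticleTextIndex(coll) {
  // One text index spanning two string fields.
  return coll.createIndex({ title: "text", body: "text" });
}

// A query that uses the index, in mongosh:
// db.articles.find({ $text: { $search: "replica" } })
```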

 

Hash index: With a hashed index, MongoDB indexes the hash of the indexed field's value. This type of index is mainly useful where we want an even data distribution, e.g. in a sharded cluster environment.

Syntax: db.collection.createIndex({ <field>: "hashed" })

From version 4.4 onwards, compound hashed indexes are also supported.
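
A minimal sketch of both forms, assuming a hypothetical orders collection with customerId and region fields.

```javascript
// `coll` is a collection handle, e.g. db.orders in mongosh (hypothetical name).
function createHashedIndexes(coll) {
  // Single-field hashed index.
  coll.createIndex({ customerId: "hashed" });
  // Compound hashed index (MongoDB 4.4+): only one field may be hashed.
  coll.createIndex({ customerId: "hashed", region: 1 });
}
```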

Properties

Unique indexes: When specified, MongoDB will reject duplicate values for the indexed field. It will not allow inserting another document containing the same key-value pair which is indexed.

 

Partial indexes: Partial indexes only index the documents that match the filter criteria.

 

TTL indexes: TTL indexes are special single-field indexes that MongoDB can use to automatically delete documents from a collection after a certain amount of time.

 

Sparse indexes: Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null value.
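
The four properties above are all set through the options document passed to createIndex(). The sketch below shows one option document per property; every collection, field name, and value is hypothetical.

```javascript
// Option documents for the unique, partial, TTL, and sparse properties.
const uniqueOptions = { unique: true };
const partialOptions = { partialFilterExpression: { rating: { $gt: 5 } } };
const ttlOptions = { expireAfterSeconds: 3600 }; // indexed field must hold a date
const sparseOptions = { sparse: true };

// `coll` is a collection handle, e.g. db.restaurants in mongosh.
function createPropertyIndexes(coll) {
  coll.createIndex({ email: 1 }, uniqueOptions);
  coll.createIndex({ cuisine: 1 }, partialOptions);
  coll.createIndex({ lastModified: 1 }, ttlOptions);
  coll.createIndex({ nickname: 1 }, sparseOptions);
}
```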

 

Hidden indexes: Hidden indexes are not visible to the query planner and cannot be used to support a query. Apart from being hidden from the planner, hidden indexes behave like unhidden indexes.

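To create a new hidden index, pass the hidden: true option to db.collection.createIndex().

To change an existing index into a hidden one (works only with a db having fcv 4.4 or greater), use db.collection.hideIndex().

To unhide a hidden index, use db.collection.unhideIndex().

Either the index name or the key pattern can be used to identify the index to hide or unhide.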
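
The three operations can be sketched together as follows. The borough field and the mobile_1 index name are hypothetical.

```javascript
// `coll` is a collection handle, e.g. db.restaurants in mongosh.
function hiddenIndexLifecycle(coll) {
  // Create a new index that starts out hidden.
  coll.createIndex({ borough: 1 }, { hidden: true });
  // Hide an existing index, identified here by its name.
  coll.hideIndex("mobile_1");
  // Unhide an index, identified here by its key pattern.
  coll.unhideIndex({ borough: 1 });
}
```

Hiding an index before dropping it is a low-risk way to check whether any query still depends on it.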

Rolling index builds on replica sets

Starting with MongoDB 4.4, index builds happen simultaneously on all data-bearing nodes. For workloads that cannot tolerate the performance impact of an index build, we can follow a rolling index build strategy.

**NOTE**

Unique indexes

To create unique indexes using the following procedure, you must stop all writes to the collection during this procedure.

If you cannot stop all writes to the collection during this procedure, do not use the procedure on this page. Instead, build your unique index on the collection by issuing db.collection.createIndex() on the primary for a replica set.

Oplog size

Ensure that your oplog is large enough to permit the indexing or re-indexing operation to complete without falling too far behind to catch up.

Procedure

1. Stop one secondary and restart as a standalone on a different port number.

In this process, we stop one secondary node at a time, disable the replication parameters in the configuration file, and set disableLogicalSessionCacheRefresh to true under the setParameter section of the configuration file.

Example
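
The configuration changes can be sketched as a transformation of the parsed configuration file. The keys mirror the YAML options of mongod.conf. The replica set name myRepl and the original port 27017 are assumptions, while 27217 is the standalone port used in this walkthrough.

```javascript
// Return a copy of a parsed mongod configuration adjusted for a temporary
// standalone restart. The original object is left untouched.
function toStandaloneConfig(config, standalonePort) {
  const { replication, ...rest } = config; // drop the replication section
  return {
    ...rest,
    net: { ...config.net, port: standalonePort },
    setParameter: {
      ...config.setParameter,
      disableLogicalSessionCacheRefresh: true,
    },
  };
}

const original = {
  net: { bindIp: "localhost", port: 27017 },
  replication: { replSetName: "myRepl" },
};
const standalone = toStandaloneConfig(original, 27217);
```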

We only need to change the settings above; the rest remain the same.

Once the above changes are done, save it and restart the process.

OR

Now, the mongod process will start on port 27217 in standalone mode.

2. Build the index

Connect to the mongod instance on port 27217. Switch to the desired database and collection to create an index.

Example:
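
A minimal sketch that builds the index on the standalone and then confirms it is present before the node rejoins the replica set. The helper name and the users collection are hypothetical.

```javascript
// Build an index and confirm it shows up in the collection's index list.
// `coll` is a collection handle, e.g. db.users in mongosh.
function buildAndVerifyIndex(coll, keyPattern) {
  coll.createIndex(keyPattern);
  const wanted = JSON.stringify(keyPattern);
  return coll.getIndexes().some((index) => JSON.stringify(index.key) === wanted);
}

// In mongosh, connected to the standalone on port 27217:
// buildAndVerifyIndex(db.users, { mobile: 1 })
```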

 

3. Restart the process mongod as a replica set member

After the desired index build completes, we can add the node back as a replica set member.

Undo the configuration file change made in step one above. Restart the mongod process with the original configuration file.

After saving the configuration file, restart the process and let it become secondary.

OR

4. Repeat the above procedure for the remaining secondaries

Once the node we just worked on is a secondary again and shows no replication lag, repeat the procedure on the remaining secondaries, one node at a time.

  1. Stop one secondary and restart as a standalone.
  2. Build the index.
  3. Restart the mongod process as a replica set member.

5. Index build on primary

Once index build activity finishes up in all the secondary nodes, use the same process as above to create an index on the last remaining node.

  1. Connect to the primary node and issue rs.stepDown(). Once it successfully steps down, it becomes a secondary and a new primary is elected. Then follow steps one through three to build the index.
  2. Stop the secondary node and restart it as a standalone.
  3. Build the index.
  4. Restart the mongod process as a replica set member.
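
The order of the rolling procedure can be sketched as a small planning function: every secondary first, the current primary last, after a step-down. The host names are hypothetical; the name and stateStr fields are the ones reported for each member by rs.status().

```javascript
// Plan the order in which replica set members are taken out for the build.
function rollingBuildOrder(members) {
  const secondaries = members.filter((m) => m.stateStr === "SECONDARY");
  const primaries = members.filter((m) => m.stateStr === "PRIMARY");
  return [...secondaries, ...primaries].map((m) => ({
    host: m.name,
    stepDownFirst: m.stateStr === "PRIMARY",
  }));
}

const plan = rollingBuildOrder([
  { name: "db1:27017", stateStr: "PRIMARY" },
  { name: "db2:27017", stateStr: "SECONDARY" },
  { name: "db3:27017", stateStr: "SECONDARY" },
]);
// plan visits db2, db3, then db1 with stepDownFirst set to true

// In mongosh: rollingBuildOrder(rs.status().members)
```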

Rolling index builds on sharded clusters

Starting with MongoDB 4.4, index builds happen simultaneously on all data-bearing nodes. For workloads that cannot tolerate the performance impact of an index build, we can follow a rolling index build strategy.

**NOTE**

Unique indexes

To create unique indexes using the following procedure, you must stop all writes to the collection during this procedure.

If you cannot stop all writes to the collection during this procedure, do not use the procedure on this page. Instead, build your unique index on the collection by issuing db.collection.createIndex() on the mongos for a sharded cluster.

Oplog size

Ensure that your oplog is large enough to permit the indexing or re-indexing operation to complete without falling too far behind to catch up.

Procedure

1. Stop the balancer

In order to create an index in a rolling fashion in a shard cluster, it is necessary to stop the balancer so that we do not end up with an inconsistent index.

Connect to a mongos instance and run sh.stopBalancer() to disable the balancer.

If there is any active migration going on, the balancer will stop only after the completion of the ongoing migration.

We can check whether the balancer is stopped with sh.getBalancerState().

If the balancer is stopped, the output will be false.
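
The stop-and-check step can be sketched as one small function. It takes the sh helper object as a parameter; the function name is hypothetical.

```javascript
// Stop the balancer and confirm that it is reported as disabled.
// `shell` is the sh helper object available in mongosh on a mongos.
function stopBalancerAndConfirm(shell) {
  shell.stopBalancer();
  // getBalancerState() reports whether the balancer is enabled.
  return shell.getBalancerState() === false;
}

// In mongosh, connected to a mongos: stopBalancerAndConfirm(sh)
```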

2. Determine the distribution of the collection

In order to build indexes in a rolling fashion, it is necessary to know on which shards the collections are residing. 

Connect to one of the mongos and refresh the cache so that we get fresh distribution information of collections in the shard for which we want to build the index.

Example:

We want to create an index on the studentData collection in the students database.

We will run the below command to get a fresh distribution of that collection.
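
A sketch of the two calls involved, assuming the namespace is students.studentData. The wrapper function name is hypothetical; flushRouterConfig and getShardDistribution() are the server command and shell helper used for this purpose.

```javascript
// Refresh the routing cache for one namespace, then print its distribution.
// `db` is the database handle available in mongosh on a mongos.
function refreshAndPrintDistribution(db) {
  // Clear the cached routing table for the namespace on this mongos.
  db.adminCommand({ flushRouterConfig: "students.studentData" });
  // Prints per-shard data and chunk statistics for the collection.
  db.getSiblingDB("students").getCollection("studentData").getShardDistribution();
}
```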

We will get the output of shards containing the collection :

From the above output, we can see that students.studentData exists on shardA and shardC, so we need to build the index on both shardA and shardC.

3. Build indexes on the shards that contain collection chunks

Follow the procedure below on each shard that contains chunks of the collection.

3.1. Stop one secondary and restart as a standalone

For the identified shard, stop one of the secondary nodes and make the following changes.

  • Change the port number to a different port
  • Comment out replication parameters
  • Comment out sharding parameters
  • Under section “setParameter” add skipShardingConfigurationChecks: true and disableLogicalSessionCacheRefresh: true 

Example
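
The four changes listed above can be sketched as a transformation of the parsed configuration file. The keys mirror the YAML options of mongod.conf. The ports 27218 and 27317 are assumptions, and shardA is the shard name from the earlier step.

```javascript
// Return a copy of a shard member's parsed configuration adjusted for a
// temporary standalone restart. The original object is left untouched.
function toShardStandaloneConfig(config, standalonePort) {
  // Drop the replication and sharding sections.
  const { replication, sharding, ...rest } = config;
  return {
    ...rest,
    net: { ...config.net, port: standalonePort },
    setParameter: {
      ...config.setParameter,
      skipShardingConfigurationChecks: true,
      disableLogicalSessionCacheRefresh: true,
    },
  };
}

const shardMember = {
  net: { bindIp: "localhost", port: 27218 },
  replication: { replSetName: "shardA" },
  sharding: { clusterRole: "shardsvr" },
};
const shardStandalone = toShardStandaloneConfig(shardMember, 27317);
```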

After saving the configuration, restart the process.

OR

 

3.2. Build the index

Connect to the mongod instance running in standalone mode and start the index build process.

Here, we are building the index in the students collection on the field StudentID in ascending order.

 

3.3. Restart the MongoDB process as replicaset node

Once the index build activity is finished, shut down the instance, remove the parameters skipShardingConfigurationChecks: true and disableLogicalSessionCacheRefresh: true, and restart with the original configuration.

 

After saving the configuration, restart the process.

OR

 

3.4. Repeat the procedure for the remaining secondaries for the shard

Once the node on which the index build has completed is added back to the replica set and is in sync with the other nodes, repeat the above process from 3.1 to 3.3 on the remaining secondaries.

  3.1. Stop one secondary and restart as a standalone
  3.2. Build the index
  3.3. Restart the MongoDB process as replicaset node

3.5. Index build on primary

Once index build activity finishes up in all the secondary nodes, use the same process as above to create an index on the last remaining node.

  1. Connect to the primary node and issue rs.stepDown(). Once it successfully steps down, it becomes a secondary and a new primary is elected. Then follow steps one through three to build the index.
  2. Stop the secondary node and restart it as a standalone
  3. Build the index
  4. Restart the process mongod as a replica set member

4. Repeat for the other affected shards

Once the index build is finished for one of the identified shards, start the process outlined in step three on the next identified shard.

5. Restart the balancer

Once we are done building the index on all identified shards we can start the balancer again.

Connect to a mongos instance in the sharded cluster, and run sh.startBalancer()

Conclusion

Picking the right key based on an access pattern and having a good index is better than having multiple bad indexes. So, choose your index wisely.

There are also other interesting blogs on https://www.percona.com/blog/ which might be helpful to you.

I also recommend going and using Percona Server for MongoDB, which provides MongoDB enterprise-grade features without any license (as it is free). You can learn more about it in the blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered?

Percona also offers some more great products for MongoDB like Percona Backup for MongoDB and Percona Operator for MongoDB, and for other technologies and tools too, like MySQL Software, PostgreSQL Distribution, Percona Operators, and Monitoring & Management.
