MongoDB on ARM Processors

ARM processors have been around for a while. In 2015 and 2016, there were a couple of community attempts to port MongoDB to this architecture. At the time, the main storage engine was MMAPv1 and most of the available ARM boards were 32-bit. Overall, the port worked, but running MongoDB on a Raspberry Pi was more of a hack than a real setup. The public cloud providers didn’t yet offer machines running on these processors.

ARM processors are power-efficient and, for this reason, they are used in smartphones, smart devices and, now, even laptops. It was just a matter of time before they became available in the cloud as well. Now that AWS is offering ARM-based instances, you might be thinking: “Hmmm, these instances include the same number of cores and the same amount of memory as the traditional x86-based offerings, but cost a fraction of the price!”

But do they perform alike?

In this blog post, we selected three different AWS instances to compare: one powered by an ARM processor, a second one backed by a traditional x86_64 Intel processor with the same number of cores and memory as the ARM instance, and finally another Intel-backed instance that costs roughly the same as the ARM instance but carries half as many cores. We acknowledge these processors are not supposed to be “equivalent”, and we do not intend to go deeper into CPU architecture here. Our goal is purely to check how the ARM-backed instance fares in comparison to the Intel-based ones.

These are the instances we will consider in this blog post.

Methodology

We will use the Yahoo Cloud Serving Benchmark (YCSB, https://github.com/brianfrankcooper/YCSB) running on a dedicated instance (c5d.4xlarge) to simulate load in three distinct tests:

a load of 1 million documents into one collection having only the primary key (which we’ll call Inserts)
a workload consisting exclusively of reads (Reads)
a mixed workload of 70% reads, 25% updates, and 5% scans (Reads/Updates)

We will run each test with a varying number of concurrent threads (32, 64, and 128), repeating each set three times and keeping only the second-best result.
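
The selection rule can be sketched in a few lines of Python. This is a minimal illustration, assuming each run is summarized by a single throughput number in operations per second; the function name second_best and the sample figures are hypothetical and are not results from these tests.

```python
def second_best(throughputs):
    """Return the second-highest value from a list of run throughputs (ops/sec)."""
    if len(throughputs) < 2:
        raise ValueError("need at least two runs to pick a second-best result")
    # Sort descending and take the runner-up; with three runs this is the median.
    return sorted(throughputs, reverse=True)[1]


# Hypothetical throughputs for three repetitions of one test (not measured data).
runs = [10250.0, 9800.0, 10100.0]
print(second_best(runs))
```

With three repetitions, the second-best result is simply the median, which discards both an unusually good run and an unusually bad one.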

All instances will run the same MongoDB version (4.0.3, installed from a tarball and running with default settings) and the same operating system, Ubuntu 16.04. We chose this setup because MongoDB provides an ARM build for Ubuntu-based machines.

All the instances will be configured with:

100 GB EBS data volume with 5000 provisioned IOPS (PIOPS) and a 20 GB EBS boot device
Data volume formatted with XFS, 4k blocks
Default swappiness and disk scheduler
Default kernel parameters
Enhanced CloudWatch monitoring configured
Free monitoring tier enabled

Preparing the environment

We started with the setup of the benchmark software we use for the tests, YCSB. The first task was to spin up a powerful machine (c5d.4xlarge) to run the software and then prepare the environment.

YCSB requires Java, Maven, Python, and pymongo, which don’t come installed by default in our Linux version (Ubuntu server x86). Here are the steps we used to configure our environment:

Installing Java

sudo apt-get install default-jdk

Installing Maven

wget http://ftp.heanet.ie/mirrors/www.apache.org/dist/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz
sudo tar xzf apache-maven-*-bin.tar.gz -C /usr/local
cd /usr/local
sudo ln -s apache-maven-* maven
sudo vi /etc/profile.d/maven.sh

Add the following to maven.sh

export M2_HOME=/usr/local/maven
export PATH=${M2_HOME}/bin:${PATH}

Installing Python 2.7

sudo apt-get install python2.7

Installing pip to resolve the pymongo dependency

sudo apt-get install python-pip

Installing pymongo (driver)

sudo pip install pymongo

Installing YCSB

curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.5.0/ycsb-0.5.0.tar.gz
tar xfvz ycsb-0.5.0.tar.gz
cd ycsb-0.5.0

YCSB comes with different workloads and also allows a workload to be customized to match our own requirements. If you want to learn more about the workloads, have a look at the workload template: https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workload_template

First, we will edit the workloads/workloada file to load 1 million documents (for our first test) while also preparing it to later perform only reads (for our second test):

recordcount=1000000
operationcount=1000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=1
updateproportion=0.0

We will then change the workloads/workloadb file to provide a mixed workload for our third test. We set it to perform 10 million operations, broken down into 70% reads, 25% updates, and 5% scans, while also capping the maximum number of documents per scan at 2000 in an effort to emulate real traffic. Workloads are not perfect, right?

recordcount=10000000
operationcount=10000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.7
updateproportion=0.25
scanproportion=0.05
insertproportion=0
maxscanlength=2000
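
Because the operation proportions are easy to get wrong when editing a workload file by hand, a quick sanity check can help. The sketch below is only an illustration: the helper names parse_properties and operation_mix are hypothetical, and treating a missing proportion as 0 is a simplification, since YCSB applies its own defaults for omitted keys.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def operation_mix(props):
    """Return the operation proportions declared in a workload file.

    Missing keys are treated as 0 here (an assumption made for this sketch).
    """
    keys = ("readproportion", "updateproportion", "scanproportion", "insertproportion")
    return {key: float(props.get(key, 0)) for key in keys}


workload_b = """
recordcount=10000000
operationcount=10000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.7
updateproportion=0.25
scanproportion=0.05
insertproportion=0
maxscanlength=2000
"""

mix = operation_mix(parse_properties(workload_b))
total = sum(mix.values())
# Compare with a tolerance, since decimal fractions are not exact in floating point.
assert abs(total - 1.0) < 1e-9, f"proportions sum to {total}, expected 1.0"
print(mix)
```

The same check can be pointed at any edited workload file by reading its contents into a string first.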

With that, we have the environment configured for testing.

Running the tests

With all instances configured and ready, we ran the stress test against our MongoDB servers using the following command:

./bin/ycsb [load/run] mongodb -s -P workloads/workload[ab] -threads [32/64/128] \
 -p mongodb.url=mongodb://xxx.xxx.xxx.xxx:27017/ycsb0000[0-9] \
 -jvm-args="-Dlogback.configurationFile=disablelogs.xml"

The parameters between brackets varied according to the instance and operation being executed:

[load/run]: load inserts the data, while run executes the workload’s operations (reads, updates, scans)
workload[ab]: references the workload file used (workloada or workloadb)
[32/64/128]: indicates the number of concurrent threads used for the test
ycsb0000[0-9]: the database name we used for the tests (for reference only)
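
To avoid typing every combination by hand, the run-phase command lines can be generated programmatically. The following is a minimal sketch that only reuses the flags shown above; the function name ycsb_command, the host address, and the database name are hypothetical placeholders, and the script prints the commands rather than executing them.

```python
from itertools import product


def ycsb_command(action, workload, threads, host, database):
    """Build one YCSB command line using the flags shown in this post."""
    return (
        f"./bin/ycsb {action} mongodb -s -P workloads/workload{workload} "
        f"-threads {threads} "
        f"-p mongodb.url=mongodb://{host}:27017/{database} "
        '-jvm-args="-Dlogback.configurationFile=disablelogs.xml"'
    )


# Hypothetical host and database name; substitute your own.
HOST = "10.0.0.10"
DATABASE = "ycsb00001"

# One "run" command per workload file (a, b) and thread count (32, 64, 128).
commands = [
    ycsb_command("run", workload, threads, HOST, DATABASE)
    for workload, threads in product("ab", (32, 64, 128))
]
for command in commands:
    print(command)
```

Only the run phase is enumerated here; the load phase can be built the same way by passing "load" as the action before the first run against a fresh database.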

Results

Without further ado, the table below summarizes the results for our tests:

Performance cost

Considering throughput alone – and in the context of those tests, particularly the last one – you may get more performance for the same cost. That’s certainly not always the case, which our results above also demonstrate. And, as usual, it depends on “how much performance do you need” – a matter that is even more pertinent in the cloud. With that in mind, we had another look at our data under the “performance cost” lens.

As we saw above, the c5.4xlarge instance performed better than the other two instances while costing a little over 50% more. Did it deliver 50% more performance as well? Well, sometimes it delivered even more than that, but not always. We used the following formula to extrapolate the OPS (Operations Per Second) data from our tests into OPH (Operations Per Hour), so we could then calculate how much bang (operations) for the buck (US$1) each instance was able to provide:

operations per US$1 = (OPS * 3600) / instance cost per hour

This is, of course, an artificial metric that aims to correlate performance and cost. For this reason, instead of plotting the raw values, we have normalized the results using the best-performing instance as the baseline (100%):

The intent behind these was only to demonstrate another way to evaluate how much we’re getting for what we’re paying. Of course, you need to have a clear understanding of your own requirements in order to make a balanced decision.
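
The formula and the normalization described in this section can be sketched as follows. This is an illustration only: the function names are hypothetical, and the throughput and price figures are made-up placeholders, not the values measured in our tests or actual AWS prices.

```python
def ops_per_dollar(ops_per_second, cost_per_hour):
    """Operations delivered per US$1: (OPS * 3600) / instance cost per hour."""
    return ops_per_second * 3600 / cost_per_hour


def normalize_to_best(values):
    """Express each value as a percentage of the best (highest) one."""
    best = max(values.values())
    return {name: 100.0 * value / best for name, value in values.items()}


# Hypothetical (ops/sec, US$ per hour) pairs, NOT the figures measured in this post.
samples = {
    "instance_a": (10000.0, 0.40),
    "instance_b": (15000.0, 0.75),
    "instance_c": (6000.0, 0.30),
}

per_dollar = {name: ops_per_dollar(ops, cost) for name, (ops, cost) in samples.items()}
for name, pct in normalize_to_best(per_dollar).items():
    print(f"{name}: {pct:.1f}% of the best performer")
```

Note that in this made-up example the instance with the highest raw throughput is not the one with the best operations-per-dollar figure, which is exactly the kind of trade-off this metric is meant to surface.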

Parting thoughts

We hope this post awakens your curiosity about how MongoDB may perform on ARM-based servers and also demonstrates another way you can run your own tests with the YCSB benchmark. Feel free to reach out to us through the comments section below if you have any suggestions, questions, or other observations about the work we presented here.

6 Comments

mdcallagk Callaghan

5 years ago

Thank you for sharing details that allow us to reproduce this. The open question is how large the price/perf difference needs to be to motivate a migration to ARM.

Kay Agahd

5 years ago

Very interesting and well written. Thank you for the insights!

HarpPDX

5 years ago

Thanks for the well-written article…a couple of comments/questions:
– the result charts refer to different instance types than the table in the beginning of the article, can you confirm which instance types were used? i.e., c5.2xlarge or c5.4xlarge and, m4.2xlarge or m4.xlarge
– was there a reason you stopped at 128 ycsb threads when the performance was still increasing (compared to previous run at 64 threads)?
– I like your method of presenting normalized performance per $

Adamo Tonete

5 years ago

Hi HarpPDX,
We used m4.2xlarge and c5.4xlarge, I will fix the graphs. Thanks for letting us know.
Regarding the threads: at about 150 threads we hit the ARM limitation and the performance didn’t increase. CPU usage went to 100% and context switching killed the performance.
We will do a follow-up article with more data soon.

David Murphy

5 years ago

Did you happen to connect this to PMM? I am curious about the CPU scaling perspective: after, say, 12 cores, was x86 or ARM more effective?

3 years ago

Can you rerun these tests comparing the new Graviton2 instances? It would be interesting to see how this compares
