EBS Storage Types in AWS

Choosing an EBS storage type in AWS depends on many factors. As a consultant, I get a lot of questions about picking the best storage type for a workload. Let me share a few examples. Is io2 better than gp2/gp3 if the configured iops are the same? What can I expect when upgrading gp2 to gp3?

To answer questions like these, in this blog post we will take a deeper look and compare storage devices that are “supposed to be the same”, in order to reveal the differences between these storage types. We will examine the following devices:

    1 TB gp2 volume (has 3000 iops by definition)
    1 TB gp3 volume, with the iops set to 3000
    1 TB io1 volume, with the iops set to 3000
    1 TB io2 volume, with the iops set to 3000

All the volumes are 1 TB with 3000 iops, so in theory they are the same. Also, in theory, theory and practice are the same, but in practice, they are different. Storage performance is more complex than just capacity and the number of iops, as we will see soon. Note that these tests are far too limited to draw general conclusions like “io1 is better than gp2”. These devices have very different scalability characteristics (the io devices scale to 64k iops, while the maximum for the gp devices is 16k). Measuring the scalability of these devices, testing them over longer periods, and testing them in different availability zones are out of scope for these tests. The reason I chose devices that have the same “specs” is to understand the differences in their behavior. The tests were only run in a single availability zone (eu-west-1a).
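For reference, volumes like these can be provisioned with the AWS CLI. A sketch follows; I am taking “1 TB” as 1000 GiB so that gp2’s 3 iops/GiB baseline works out to exactly 3000 iops (the exact sizes used in the tests are an assumption):

    # gp2: iops cannot be provisioned, they come from the 3 iops/GiB baseline
    aws ec2 create-volume --availability-zone eu-west-1a \
        --volume-type gp2 --size 1000

    # gp3/io1/io2: 3000 iops set explicitly
    aws ec2 create-volume --availability-zone eu-west-1a \
        --volume-type gp3 --size 1000 --iops 3000
    aws ec2 create-volume --availability-zone eu-west-1a \
        --volume-type io1 --size 1000 --iops 3000
    aws ec2 create-volume --availability-zone eu-west-1a \
        --volume-type io2 --size 1000 --iops 3000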

For the tests, I used sysbench fileio with the following prepare command.
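A minimal sketch of what the prepare step looks like (the total file size and the number of files here are assumptions, not the exact values used):

    # create the test files on the mounted EBS volume (sizes are assumptions)
    cd /mnt/ebs-test
    sysbench fileio --file-total-size=256G --file-num=64 prepare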

The instances I used were r5.xlarge, which have up to 4,750 Mbps of bandwidth to EBS.
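Before running the benchmark, each volume has to be attached to the instance and formatted; a sketch, assuming XFS and the device name an NVMe-based instance typically exposes (the volume and instance IDs are placeholders):

    # attach the volume to the instance (IDs are placeholders)
    aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
        --instance-id i-0123456789abcdef0 --device /dev/sdf

    # on the instance: create a filesystem and mount it
    sudo mkfs.xfs /dev/nvme1n1
    sudo mkdir -p /mnt/ebs-test
    sudo mount /dev/nvme1n1 /mnt/ebs-test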

I used the following command to run the tests:
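The command below is a reconstruction based on the parameters described in this post and in the comments; the file size and run time are assumptions:

    # 16 KiB random writes with direct IO, no explicit fsyncs
    sysbench fileio \
        --file-total-size=256G \
        --file-num=64 \
        --file-test-mode=rndwr \
        --file-block-size=16384 \
        --file-extra-flags=direct \
        --file-fsync-freq=0 \
        --threads=4 \
        --time=300 \
        run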

In this command, the test mode can be rndwr (random writes only), rndrd (random reads only), or rndrw (mixed random reads and writes). The thread counts used were 1, 2, 4, 8, 16, 32, 64, and 128. All tests use 16 KiB io operations with direct io enabled (bypassing the filesystem cache); based on this, the theoretical peak throughput of the tests is 16 KiB * 3000 iops ≈ 46.9 MiB/s (48 MB/s). For completeness, the full matrix of test modes and thread counts could be scripted roughly as shown below.
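A sketch of that sweep (the run time per combination and the log file naming are assumptions):

    for mode in rndwr rndrd rndrw; do
        for threads in 1 2 4 8 16 32 64 128; do
            sysbench fileio \
                --file-total-size=256G \
                --file-num=64 \
                --file-test-mode=$mode \
                --file-block-size=16384 \
                --file-extra-flags=direct \
                --file-fsync-freq=0 \
                --threads=$threads \
                --time=300 \
                run > sysbench_${mode}_${threads}.log
        done
    done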

Random Writes

[Figure: sysbench random writes]

The gp2 and io1 devices reached the peak throughput for this benchmark with 4 threads, and the gp3 reached it with 2 threads (but with larger variance). The io2 device has more consistent performance overall. The peak throughput in these tests matches the expected peak (16 KiB * 3000 iops ≈ 46.9 MiB/s).

[Figure: sysbench random mixed read/write latency]

At low thread counts, gp3 has the highest variation in latency, while gp2’s performance is more consistent. The latencies of io1 and io2 are even more consistent, especially io2 at higher thread counts.

This means if the workload is mostly writes:

– Prefer gp3 over gp2 (better performance, lower price; an in-place upgrade sketch follows this list).
– Prefer io2 if the more consistent performance at lower thread counts is worth the price.
– If the workload is multithreaded and there are always more than 4 threads, prefer gp3 (in this case the performance is the same and gp3 is the cheapest option).
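If the advice above leads to a gp2 to gp3 upgrade, it can be done in place with the AWS CLI; a sketch with a placeholder volume ID (gp3 defaults to 125 MiB/s of throughput, so it may be worth setting it explicitly to match gp2’s 250 MiB/s at this volume size):

    # change the volume type in place, keeping 3000 iops and matching gp2's throughput
    aws ec2 modify-volume --volume-id vol-0123456789abcdef0 \
        --volume-type gp3 --iops 3000 --throughput 250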

Random Reads

[Figure: sysbench random reads]

The random read throughput shows a much bigger difference between the devices than the writes do. First of all, performance is more inconsistent with gp2 and gp3, although gp2 seems to be the slightly more consistent of the two. The io2 device delivers the same consistent performance even with a single thread.

[Figure: sysbench random read latency]

Similarly, at low thread counts there is a much bigger variance in latency for the gp2 and gp3 volumes. Even at 64 threads, the io2 device has very consistent latency characteristics.

This means if the workload is mostly reads:

– The gp2 volumes can give slightly better performance, but they are also slightly more expensive.
– Above 16 parallel threads, the devices are fairly similar; prefer gp3 because of the price.
– Prefer io2 if performance and latency are important with a low thread count (even over io1).

Random Mixed Reads/Writes

[Figure: sysbench random mixed reads/writes]

The mixed workload behaves similarly to the random read one, so the variance in read performance also shows up as variance in write performance. The more reads are added to the mix, the more inconsistent the performance becomes with the gp2/gp3 volumes. The io1 volume reaches peak throughput even with two threads, but with a high variance.

 

In the case of the mixed workload, gp3 has the least consistent performance. This can come as an unpleasant surprise when volumes are upgraded to gp3 and the workload has low concurrency. It can be an issue for lightly loaded but latency-sensitive applications. Otherwise, the same storage advice applies as for random reads.

Conclusion

The difference between these seemingly similar devices is greatest when a low number of threads are used against the device. If the io workload is parallel enough, the devices behave very similarly.

The raw data for these measurements are available on GitHub: https://github.com/pboros/aws_storage_blog.

4 Comments
Jesper Krogh

Hi Peter,

For gp3 did you leave the throughput at the default 125 MiB/s or did you increase it to 250 MiB/s to match that of gp2?

Thanks,
Jesper

Hugo

Hello,

Why do you use file-fsync-freq=0 and not file-fsync-freq=1? Isn’t it better to have an fsync after each write (for a DB workload)?

Vadim Tkachenko

In O_DIRECT mode (with --file-extra-flags=direct), each write is guaranteed to be completed, pretty much like we use in InnoDB.

Tadhg Pearson

This is super interesting. I understand these latency graphs as 95th percentile ms of read… what percentage of reads are represented by those outlying dots above the main bar? For example, for gp2 on a single thread, what percentage of all read latencies are above 25ms?