Previously I tested Tokutek’s Fractal Trees (TokuMX & TokuMXse) as MongoDB storage engines – today let’s look into the MySQL area.
I am going to use a modified LinkBench under a heavy IO load.
I compared InnoDB without compression, InnoDB with 8k compression, and TokuDB with quicklz compression.
The uncompressed data size is 115GiB; the cache size is 12GiB for InnoDB and 8GiB plus 4GiB of OS cache for TokuDB.
It is important to note that I used tokudb_fanout=128, which is available only in our latest Percona Server release.
I will write more later on Fractal Tree internals and what tokudb_fanout means. For now, let's just say it changes the shape of the fractal tree (compared to the default tokudb_fanout=16).
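As a sketch, the TokuDB-related settings described above would look roughly like this in my.cnf (the variable names are those exposed by Percona Server's TokuDB engine; treat the exact values as this post's setup, not a general recommendation):

```
[mysqld]
# widen the fractal-tree fanout from the default of 16;
# available in recent Percona Server releases
tokudb_fanout     = 128
# internal TokuDB cache size, per the setup described in this post
tokudb_cache_size = 8G
```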
I am using two storage options:
- Intel P3600 PCIe SSD 1.6TB (marked as “i3600” on charts) – as a high-end performance option
- Crucial M500 SATA SSD 900GB (marked as “M500” on charts) – as a low-end SATA SSD
The full results and engine options are available here.
Results on Crucial M500 (throughput, more is better)
| Engine    | Throughput [ADD_LINK/10sec] |
|-----------|-----------------------------|
| InnoDB    | 6029                        |
| InnoDB 8K | 6911                        |
| TokuDB    | 14633                       |
Here TokuDB outperforms InnoDB by almost a factor of two, but it also shows a large variance in results, which I attribute to checkpoint activity.
Results on Intel P3600 (throughput, more is better)
| Engine    | Throughput [ADD_LINK/10sec] |
|-----------|-----------------------------|
| InnoDB    | 27739                       |
| InnoDB 8K | 9853                        |
| TokuDB    | 20594                       |
To understand why InnoDB shines on fast storage, let's review the IO usage of each engine.
The following chart shows the reads in KiB that each engine performs, on average, per client request.
The following chart shows the writes in KiB that each engine performs, on average, per client request.
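To make the "IO per request" metric concrete, here is a minimal sketch of how such a value can be derived from block-device sector counters (as reported by /proc/diskstats or iostat) and the request count over the same interval. The numbers are made up for illustration; they are not the benchmark's actual data.

```python
# Sketch: average device IO per client request over a measurement interval.

def kib_per_request(sectors_start, sectors_end, requests, sector_size=512):
    """KiB of device IO performed per client request over an interval."""
    bytes_moved = (sectors_end - sectors_start) * sector_size
    return bytes_moved / 1024 / requests

# e.g. 12,000,000 sectors written while serving 150,000 requests
print(kib_per_request(0, 12_000_000, 150_000))  # → 40.0 KiB per request
```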
Here we can make an interesting observation: TokuDB on average performs half as many writes as InnoDB, and this is what allows TokuDB to do better on slow storage. On fast storage, where there is no penalty for a high write volume, InnoDB is able to get ahead, as InnoDB is still better at using CPUs.
Though it is worth remembering that:
- On fast, expensive storage, TokuDB provides better compression, which allows you to store more data in a limited capacity.
- TokuDB still writes half as much as InnoDB, which means twice the lifetime for an SSD (which is still expensive).
Looking at the results, I also conclude that the InnoDB compression implementation is inefficient: first, it does not benefit much from doing fewer reads (it does slightly better than uncompressed InnoDB, but not by much); and second, it does not benefit from fast storage.
Because of the title of the post, which suggests that it's a general benchmark, wouldn't it be pertinent to conclude that uncompressed InnoDB performs much better on fast storage, as shown by the later part of your benchmark?
There are quite a few popular flash storage implementations (Pure Storage, SolidFire, etc) that provide compression in which case it doesn’t really matter if InnoDB does not efficiently compress as compression is pushed to the storage layer. In such a case what would really matter is whether InnoDB is able to take advantage of fast storage.
In the end it all depends on the storage layer. Obviously InnoDB compression implementation needs a great deal of improvement.
Would be great to see results for Pure Storage compression. Can someone give Percona access to one? Otherwise I am skeptical that a clever storage device will do the right thing for compression with the IO done by an update-in-place b-tree.
Hi Mark,
Pure Storage uses both deduplication and compression. The process is roughly as follows: blocks are first written to NVRAM. Then deduplication is done at 512-byte granularity, by calculating hashes of 512-byte blocks and storing the hashes in a hash table; if a hash already exists in the table, the block is skipped. Every new block that cannot be found in the hash table is then compressed using the LZO algorithm, and the compressed block finally gets written to the SSD. This is what Pure calls inline deduplication and compression. They also have a background deduplication and compression process. This is as much as I know about the internals. To me, deduplication is essentially another level of compression, just at a higher level.
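As a toy illustration of the inline path described above (hash-based dedup at 512-byte granularity, then compression of unique blocks only), here is a sketch. It is not Pure Storage's implementation: zlib stands in for LZO since it ships with Python's standard library, and a plain set stands in for the hash table.

```python
import hashlib
import zlib

def ingest(data, seen, store, block=512):
    """Dedup data at 512-byte granularity, compressing only unique blocks."""
    for i in range(0, len(data), block):
        chunk = data[i:i + block]
        digest = hashlib.sha256(chunk).digest()
        if digest in seen:                   # duplicate block: skip it,
            continue                         # only a reference would be kept
        seen.add(digest)                     # new block: remember its hash...
        store.append(zlib.compress(chunk))   # ...then compress and store it

seen, store = set(), []
# three 512-byte blocks, the first and third identical
ingest(b"A" * 512 + b"B" * 512 + b"A" * 512, seen, store)
print(len(store))  # → 2 (the repeated block was deduplicated)
```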
On the practical side, I can share the numbers from one of our replica sets here at Lithium. The size of the uncompressed dataset as seen by the OS is 2TB. This gets compressed to about 470GB (22.9% of the uncompressed size).
When we were using compressed InnoDB tables, we were only able to reduce the dataset to 75% of the uncompressed size. Most of our data is textual, which is a good candidate for compression. Granted, we were not too smart about compression in the past and were not using padding and such (well, I wasn't at Lithium then). But even so, I wouldn't expect the current implementation of InnoDB compression to get me anywhere near the numbers I get with Pure Storage.
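A quick sanity check of the ratio quoted above, taking 2TB as 2048GB:

```python
uncompressed_gb = 2048   # 2TB dataset as seen by the OS
pure_storage_gb = 470    # after inline dedup + compression

ratio = pure_storage_gb / uncompressed_gb * 100
print(round(ratio, 1))   # → 22.9, matching the figure quoted above
```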
That is why I care more about performance, because I know I can push compression down to the storage layer. And to me the fact that InnoDB is able to utilize fast storage makes it the winner for me.
I see more and more storage companies getting smart about storage and employing different techniques to compress data at the storage layer and that is making flash more economical and would increase its adoption. There are both small players in the market doing that, such as Pure Storage, SolidFire, Kaminario as well as big players like EMC, IBM, etc.
Ovais,
I wonder whether you're getting the “compression” from real compression or rather from deduplication. 512-byte blocks are way too small for meaningful compression. Even the 16K pages in InnoDB are rather small for good compression.
I think this is one of the nicer features of TokuDB: being able to change the block size over a wide range to trade compression level for the speed of random uncached access.
Vadim,
I wonder whether the 8G+4G cache setup for TokuDB was found to be optimal in this case. It would be interesting to see a graph of results for different cache allocations between the OS cache and the internal TokuDB cache. The internal cache should be a lot more efficient, but the OS cache holds compressed data and so can fit more. I guess this will be an important decision to make for TokuDB at this point.
One thing I miss is a clear comment on the amount of RAM in the server(s) concerned. There are comments about wanting to allow the OS to cache data, but it's not clear how much RAM there is to be used for caching, and this may make quite a difference in performance.
So please, if you can, apart from showing the storage devices and the data sets, also add a comment showing exactly how much RAM was on the server you were using. Thanks.
Simon.
The total amount of memory available on the server is 128G.
I limit memory usage in the following ways:
For InnoDB, I use an InnoDB buffer pool size of 12G and O_DIRECT mode.
This effectively guarantees that InnoDB does not use more than 12G for the page cache, and InnoDB does not use the OS cache.
For TokuDB, I set tokudb_cache_size=8G, and I do not use DIRECT mode; so to limit memory usage, I set a limit through cgroups, and the total limit for mysqld in this case is 12G.
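As a sketch, a cgroup (v1) memory limit like the one described above could be set up roughly as follows (run as root; the paths and the 12G figure follow the description here, and are not the exact commands used in the benchmark):

```
# create a memory cgroup for mysqld and cap it at 12G
mkdir /sys/fs/cgroup/memory/mysqld
echo 12G > /sys/fs/cgroup/memory/mysqld/memory.limit_in_bytes
# move the running mysqld into the cgroup so that its OS page cache
# usage counts toward the 12G limit
echo "$(pidof mysqld)" > /sys/fs/cgroup/memory/mysqld/cgroup.procs
```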
I think these settings are more or less fair; however, Peter sent me a private message pointing out that these settings favor InnoDB, as InnoDB may still use the OS cache for innodb_log_files, and without a cgroup limit whole log files may get cached.
Does that answer your question?
@Peter, in the end what matters is how much data can fit on the flash; that's exactly the reason why we look at using compression: so that we can fit more data on less flash to make flash more economical.
I agree with your point though, on the compression by TokuDB as compared to InnoDB. Out of the box that is one of the benefits of using TokuDB.
I would love to have Percona do some tests on Pure Storage to compare its storage efficiency vs compressed-InnoDB vs compressed-TokuDB.
Vadim, could you run the same tests against a RAM disk, and let us know the results?