ZSTD Support in Percona XtraBackup

Having a backup of your database is like insurance: you pay a monthly price to ensure the service is there when you need it. With backups, the storage required to keep them is the main factor in that price; the bigger your backups, or the longer your retention period, the more it costs.

Compressing your backups is a common practice to reduce this cost. Currently, Percona XtraBackup (PXB) supports two compression algorithms: quicklz (an abandoned project that will soon be deprecated in PXB) and LZ4.

Today we are glad to introduce support for a new compression algorithm in Percona XtraBackup 8.0.30: Zstandard (ZSTD).

Zstandard is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better compression ratios.

Usage

A new value has been added to the xtrabackup --compress option. Passing --compress=zstd to PXB makes it use the new Zstandard algorithm for compression.

Compression can run in parallel: adjust --parallel=X for parallel file copy and --compress-threads=X for parallel compression of those files.

For decompression, whether with PXB or xbstream, the usage remains the same: users only need to pass --decompress, and the tool will use the same algorithm that was used for compression.

Please note that, as with qpress and LZ4, you will need the zstd client installed to run the --decompress operation.

Compress
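A minimal sketch of taking a zstd-compressed backup; the target directory, thread counts, and compression level below are illustrative placeholders, not values from the post:

xtrabackup --backup --compress=zstd --compress-threads=4 --parallel=4 --compress-zstd-level=1 --target-dir=/data/backups/full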

Decompress
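And a sketch of decompressing that backup in place (same placeholder path); optionally, --remove-original can be added to delete the compressed .zst files once they have been decompressed:

xtrabackup --decompress --parallel=4 --target-dir=/data/backups/full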

 

Testing

To test how ZSTD compares with LZ4 and uncompressed backups, we set up a test environment on an AWS EC2 c5.4xlarge instance (16 CPUs and 32 GB of RAM), with data and backup going to the same disk partition, an EBS io2 SSD volume with 10K provisioned IOPS.

For each round of tests, we created a set of 12 tables with 40M rows resulting in a 109G dataset.

We ran five rounds of tests creating a new database with the same amount of data on each round.

Percona XtraBackup was invoked using --parallel=16 --compress-threads=8.
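The post does not show the full commands, but based on those options the three backup variants were presumably invoked along these lines (the target directories are placeholders):

xtrabackup --backup --parallel=16 --target-dir=/backup/full
xtrabackup --backup --parallel=16 --compress=lz4 --compress-threads=8 --target-dir=/backup/lz4
xtrabackup --backup --parallel=16 --compress=zstd --compress-threads=8 --target-dir=/backup/zstd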

The first round of tests measured the size of the resulting backup (full, LZ4, and ZSTD), the time to run the backup (full, LZ4, and ZSTD), and the time to decompress the resulting backup (LZ4 and ZSTD).

The second round of tests explored how the two algorithms and uncompressed backups perform when uploading the data to S3. For this round, we ran two tests.

The first one was to take the backup again and upload it to S3, time to complete was measured.

The second part of the test was to download the backup from S3; uncompressed backups were simply stored on disk, while compressed backups were first decompressed and then stored on disk.

A ZSTD compression level of one was used in all tests.
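The streaming commands are not shown in the post either; a plausible sketch of the S3 round, assuming the AWS CLI and an xbstream-formatted stream (bucket name and paths are made up), would be:

# backup, compress with zstd, and stream straight to S3
xtrabackup --backup --stream=xbstream --compress=zstd --compress-threads=8 --parallel=16 --target-dir=/tmp | aws s3 cp - s3://my-backup-bucket/full.xbstream

# download from S3, unpack the stream, then decompress in place
aws s3 cp s3://my-backup-bucket/full.xbstream - | xbstream -x -C /data/restore
xtrabackup --decompress --target-dir=/data/restore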

Results

 

[Chart: test results]

[Chart: time to decompress]


Summary

As we can see from the test results above, ZSTD not only outperformed LZ4 in every test, it also reduced the backup to half of its uncompressed size.

The biggest difference between the two algorithms shows up when streaming is added to the mix, where ZSTD beats LZ4 by an even wider margin.

This can bring users and organizations huge savings in backup storage, either on-premises or, especially, in the cloud, where we are charged for every GB of storage we use.

Learn more about Percona XtraBackup

Comments
john xxx

Is or will this be backported to xtrabackup 5.x?

Rob Wultsch

The last few jobs I have been at we ditched compression in xtrabackup and used zstd pipelined. From reading the above it seems like xtrabackup is still writing the compressed file and then decompressing which is not good for ssd life or restore speed. Has thought been given to uncompressing directly (skipping writing the compressed files to the fs)?

Arnaud

It would be nice to see the difference between quicklz and ZSTD?!

Franco Corbelli

Have you ever thought about integrating a deduplication technology into mysql backups?
I have been using it for years and I get at least 10 times better results (when archiving daily copies, as is normally done)
If you are curious you can look at my fork of zpaq (it’s called zpaqfranz) to which I added many things, including the support of the stdin stream
https://github.com/fcorbelli/zpaqfranz

Basically you get version archives, containing “snapshots” of the various backups, first deduplicated (via a rolling SHA-1 hash), then compressed (via various algos)

As mentioned, the developer (to whom the main credit goes) is Matt Mahoney (who left the project); to be clear, I just made a fork
https://en.wikipedia.org/wiki/ZPAQ
http://mattmahoney.net/dc/zpaq.html

Indeed it is slow (the interface with stdin is very rudimentary, I use it for scheduled backups and I’m not much interested in the time, it should be buffered)

DISCLAIMER: It’s open source software, so I have nothing to gain in the form of advertising or anything else

Something like (of course works with “real” files too, much faster indeed)

mysqldump -uroot -p1 zarc |zpaqfranz a /tmp/thebackup.zpaq my-very-dump.sql -stdin

With a small dump (880MB) you get, after TWO runs (one after the other, just a test) 44KB for the second one

root@prod113:/tmp # zpaqfranz i thebackup.zpaq
zpaqfranz v55.17e-experimental-JIT-L archiver, (13 Oct 2022)
thebackup.zpaq:
2 versions, 2 files, 10.867 fragments, 58 blocks, 160.185.999 bytes (152.76 MB)
————————————————————————-
< Ver > < date > < time > < added > <removed>   <   bytes added  >
————————————————————————-
00000001 2022-11-19 09:52:49 +00000001 -00000000 ->         160.141.028
00000002 2022-11-19 09:55:49 +00000001 -00000000 ->              44.971

0.010 seconds (00:00:00) (all OK)

Here you can “time-machine” back to the various versions

zpaqfranz v55.17e-experimental-JIT-L archiver, (13 Oct 2022)
thebackup.zpaq:
2 versions, 2 files, 10.867 fragments, 58 blocks, 160.185.999 bytes (152.76 MB)

– 2022-11-19 09:52:49                  0      0001| +1 -0 -> 160.141.028
– 2022-11-19 10:52:49        883.960.314 0644 0001|my-very-dump.sql
– 2022-11-19 09:55:49                  0      0002| +1 -0 -> 44.971
– 2022-11-19 10:55:49        883.960.314 0644 0002|my-very-dump.sql

       1.767.920.628 (1.65 GB) of 1.767.920.628 (1.65 GB) in 4 files shown
         160.185.999 compressed

Just a suggestion!

Matthew (Percona)

Hello Franco.
mysqldump is a logical backup tool, whereas XtraBackup is a physical backup tool. Comparing the two is like apples/oranges; they do the same thing (back up data) but in completely different ways. Physical backups will always be faster to perform and restore. XtraBackup also supports incremental backups, where the size of the backup is only the delta difference from the previous backup. This also allows you to “time-machine” back to a previous version.

Noelle

quick test on 8.0.30-24
qpress 17G
lz4 18.5G
zstd 8.2G

but if i use --compress-zstd-level=9, then i got 75.5G
