ZSTD Support in Percona XtraBackup

Having a backup of your database is like insurance: you pay a monthly price to ensure the service is there when you need it. With backups, the storage required to keep them is the main factor in that price; the bigger your backups, or the longer your retention period, the more it costs.

Compressing your backups is a common practice to reduce this cost. Currently, Percona XtraBackup (PXB) supports two compression algorithms: quicklz (an abandoned project that will soon be deprecated in PXB) and LZ4.

Today we are glad to introduce support for a new compression algorithm in Percona XtraBackup 8.0.30: Zstandard (ZSTD).

Zstandard is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better compression ratios.

Usage

A new value has been added to the xtrabackup --compress option. Passing --compress=zstd to PXB makes it use the new Zstandard algorithm for compression.

Compression can run in parallel: adjust --parallel=X for parallel file copy and --compress-threads=X for parallel compression of those files.

For decompression, whether with PXB or xbstream, the usage remains the same: users only need to pass --decompress, and the tool will use the same algorithm that was used for compression.

Please note that, as with qpress and LZ4, you will need the zstd client installed to run the --decompress operation.

Compress
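A minimal sketch of taking a zstd-compressed backup; the target directory, thread counts, and compression level below are illustrative placeholders, not values from the post:

xtrabackup --backup --compress=zstd --compress-threads=4 --parallel=4 --compress-zstd-level=1 --target-dir=/data/backups/full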

Decompress
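And a sketch of decompressing that backup in place (same placeholder path); optionally, --remove-original can be added to delete the compressed .zst files once they have been decompressed:

xtrabackup --decompress --parallel=4 --target-dir=/data/backups/full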

 

Testing

To test how ZSTD compares with LZ4 and uncompressed backups, we set up a test environment on an AWS EC2 c5.4xlarge instance (16 CPUs and 32 GB of RAM), with data and backup going to the same disk partition, an EBS io2 SSD volume with 10K provisioned IOPS.

For each round of tests, we created a set of 12 tables with 40M rows resulting in a 109G dataset.

We ran five rounds of tests creating a new database with the same amount of data on each round.

Percona XtraBackup was invoked using --parallel=16 --compress-threads=8.
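The post does not show the full commands, but based on those options the three backup variants were presumably invoked along these lines (the target directories are placeholders):

xtrabackup --backup --parallel=16 --target-dir=/backup/full
xtrabackup --backup --parallel=16 --compress=lz4 --compress-threads=8 --target-dir=/backup/lz4
xtrabackup --backup --parallel=16 --compress=zstd --compress-threads=8 --target-dir=/backup/zstd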

The first round of tests measured the size of the resulting backup (full, LZ4, and ZSTD), the time to run the backup (full, LZ4, and ZSTD), and the time to decompress the resulting backup (LZ4 and ZSTD).

The second round of tests explored how the two algorithms and uncompressed backups perform when uploading the data to S3. For this round, we ran two tests.

The first one was to take the backup again and upload it to S3, time to complete was measured.

The second part of the test was to download the backup from S3; uncompressed backups were simply stored on disk, while compressed backups were first decompressed and then stored on disk.

A ZSTD compression level of one was used in all tests.
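The streaming commands are not shown in the post either; a plausible sketch of the S3 round, assuming the AWS CLI and an xbstream-formatted stream (bucket name and paths are made up), would be:

# backup, compress with zstd, and stream straight to S3
xtrabackup --backup --stream=xbstream --compress=zstd --compress-threads=8 --parallel=16 --target-dir=/tmp | aws s3 cp - s3://my-backup-bucket/full.xbstream

# download from S3, unpack the stream, then decompress in place
aws s3 cp s3://my-backup-bucket/full.xbstream - | xbstream -x -C /data/restore
xtrabackup --decompress --target-dir=/data/restore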

Results

 

[Chart: test results]

[Chart: time to decompress]


Summary

As we can see from the test results above, ZSTD not only outperformed LZ4 in every test, it also reduced the backup to half of its uncompressed size.

The biggest difference between the two algorithms shows up when streaming is added to the mix, where ZSTD beats LZ4 by an even wider margin.

This can bring users and organizations huge savings in backup storage, either on-premises or, especially, in the cloud, where we are charged for every GB of storage we use.

Learn more about Percona XtraBackup

Comments
john xxx

Is or will this be backported to xtrabackup 5.x?

Rob Wultsch

The last few jobs I have been at we ditched compression in xtrabackup and used zstd pipelined. From reading the above it seems like xtrabackup is still writing the compressed file and then decompressing which is not good for ssd life or restore speed. Has thought been given to uncompressing directly (skipping writing the compressed files to the fs)?

Arnaud

It would be nice to see the difference between quicklz and ZSTD?!

Franco Corbelli

Have you ever thought about integrating a deduplication technology into mysql backups?
I have been using it for years and I get at least 10 times better results (when archiving daily copies, as is normally done)
If you are curious you can look at my fork of zpaq (it’s called zpaqfranz) to which I added many things, including the support of the stdin stream
https://github.com/fcorbelli/zpaqfranz

Basically you get version archives, containing “snapshots” of the various backups, first deduplicated (via a rolling SHA-1 hash), then compressed (via various algos)

As mentioned, the developer (to whom the main credit goes) is Matt Mahoney (who left the project); to be clear, I just made a fork
https://en.wikipedia.org/wiki/ZPAQ
http://mattmahoney.net/dc/zpaq.html

Indeed it is slow (the interface with stdin is very rudimentary, I use it for scheduled backups and I’m not much interested in the time, it should be buffered)

DISCLAIMER: It’s open source software, so I have nothing to gain in the form of advertising or anything else

Something like (of course works with “real” files too, much faster indeed)

mysqldump -uroot -p1 zarc |zpaqfranz a /tmp/thebackup.zpaq my-very-dump.sql -stdin

With a small dump (880MB) you get, after TWO runs (one after the other, just a test) 44KB for the second one

root@prod113:/tmp # zpaqfranz i thebackup.zpaq
zpaqfranz v55.17e-experimental-JIT-L archiver, (13 Oct 2022)
thebackup.zpaq:
2 versions, 2 files, 10.867 fragments, 58 blocks, 160.185.999 bytes (152.76 MB)
————————————————————————-
< Ver > < date > < time > < added > <removed>   <   bytes added  >
————————————————————————-
00000001 2022-11-19 09:52:49 +00000001 -00000000 ->         160.141.028
00000002 2022-11-19 09:55:49 +00000001 -00000000 ->              44.971

0.010 seconds (00:00:00) (all OK)

Here you can “time-machine” back to the various versions

zpaqfranz v55.17e-experimental-JIT-L archiver, (13 Oct 2022)
thebackup.zpaq:
2 versions, 2 files, 10.867 fragments, 58 blocks, 160.185.999 bytes (152.76 MB)

– 2022-11-19 09:52:49                  0      0001| +1 -0 -> 160.141.028
– 2022-11-19 10:52:49        883.960.314 0644 0001|my-very-dump.sql
– 2022-11-19 09:55:49                  0      0002| +1 -0 -> 44.971
– 2022-11-19 10:55:49        883.960.314 0644 0002|my-very-dump.sql

       1.767.920.628 (1.65 GB) of 1.767.920.628 (1.65 GB) in 4 files shown
         160.185.999 compressed

Just a suggestion!

Matthew (Percona)

Hello Franco.
mysqldump is a logical backup tool, whereas XtraBackup is a physical backup tool. Comparing the two is like apples/oranges; they do the same thing (back up data) but in completely different ways. Physical backups will always be faster to perform and restore. XtraBackup also supports incremental backups, where the size of the backup is only the delta difference from the previous backup. This also allows you to “time-machine” back to a previous version.

Noelle

quick test on 8.0.30-24
qpress 17G
lz4 18.5G
zstd 8.2G

but if i use --compress-zstd-level=9, then i got 75.5G
