Upload Ongoing MyDumper Backups to S3

If you are using MyDumper as your logical backup solution and you store your backups on S3, you normally have to take a local backup and then upload it to S3. But what happens if there is not enough space to hold the backup on the server where you are taking it? And even if there is enough disk space, you still have to wait until the dump finishes before starting the upload, which makes the whole process longer.

MyDumper implemented stream backup in v0.11.3 and we have been polishing the code since then. We also implemented two ways of executing external commands:

--exec-per-thread: The worker thread that is getting the data from the database writes it to the STDIN of the external command. It is similar to executing cat FILE | command for every file that is written and closed.

--exec: In this case, the worker writes to local storage and, when the file is closed, the filename is enqueued. The exec threads pop filenames from the queue and execute the command on each one. FILENAME is a reserved word that is replaced in the command; for instance, --exec='/usr/bin/ls -l FILENAME' will execute ls -l on every single file. The command must be an absolute path.

Both implementations have different use cases, pros, and cons. We are going to use --exec, as the current --exec-per-thread implementation doesn't allow us to dynamically change the command with the filename, which changes on each iteration.
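For completeness, here is a hedged sketch of what --exec-per-thread could look like when the command does not need to change per file, for example piping every stream through gzip. The gzip path, the row count, the output directory, and the --exec-per-thread-extension option are assumptions about your MyDumper version and environment, not commands taken from this post:

# Each worker pipes its output through gzip; the extension is appended to the resulting files (assumed flags).
mydumper -T test.mydumper2S3 -r 100000 -o /tmp/mydumper_backup \
  --exec-per-thread='/usr/bin/gzip -c' \
  --exec-per-thread-extension='.gz'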

Execution

For this example, I created a table named test.mydumper2S3 with millions of rows. You need to configure a valid AWS account, install the AWS CLI, and have a bucket.
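For reference, this is roughly the AWS side of the setup; the bucket name davidducos is the one used in the upload paths later in this post, and creating it this way is just one option:

aws configure               # interactively set the access key, secret key, and default region
aws s3 mb s3://davidducos   # create the bucket if it does not exist yet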

As I stated before, there are two ways of uploading the files; the main difference is the number of executions of the AWS command, or the number of threads, that you want to use. A stream will be a single process, while --exec can control the number of executions with --exec-threads.

With stream

This might be the simplest way if you are familiar with piping your commands. In the example below you will find the table name, the split-by-rows value, the path where the temporary files will reside, and finally the --stream option:
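A minimal sketch of that pipeline, assuming a split of 100,000 rows and /tmp/mydumper_backup as the temporary path (both placeholders); the table and the S3 destination are the ones used in this example:

# Stream the dump of a single table straight into a single S3 object.
mydumper -T test.mydumper2S3 -r 100000 -o /tmp/mydumper_backup --stream \
  | aws s3 cp - s3://davidducos/mydumper_backup.sql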

On the AWS CLI side, we specify the s3 service and the cp command; the "-" means that it will read from STDIN, and then we give the location of the single file (s3://davidducos/mydumper_backup.sql) that is going to be uploaded.

In the log, you will see entries like this:

As you can see from the log, the files are streamed as soon as they are closed. However, it took more than 30 seconds after the dump finished for all the files to be streamed. Finally, the command returned a couple of seconds after the “All data transferred…” entry, as the buffer needs to flush the data and upload it to S3.

With --exec

If you need to upload every single file individually, this is the option that you should use. For instance, you can use --load-data or the --csv option directly to allow another process to consume the files.

Let’s see the example:
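The following is a sketch of the kind of invocation meant here, not the exact command from my run; the S3 prefix, row count, output directory, and thread count are placeholders, and the AWS CLI is assumed to be installed at /usr/bin/aws since --exec requires an absolute path:

# Each closed file is uploaded individually; FILENAME is replaced by MyDumper for every file.
mydumper -T test.mydumper2S3 -r 100000 -o /tmp/mydumper_backup \
  --exec='/usr/bin/aws s3 cp FILENAME s3://davidducos/mydumper_backup/' \
  --exec-threads=4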

In this case, the AWS CLI will send the status of the files being uploaded to STDERR:

And the log will be the traditional mydumper log.

Conclusion

This is an example with S3, but it is also possible to use the same approach with other vendors. If you need encryption, just pipe to your encryption command and then pipe again to AWS or any other command. I didn't use ZSTD compression, which is another option that you should explore.
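As a hedged illustration of that idea, here is one way to add encryption to the stream pipeline, using openssl enc as a stand-in for whatever encryption tool you prefer; the key file location and the object name are placeholders:

# Dump, encrypt on the fly, and upload the encrypted stream as a single object.
mydumper -T test.mydumper2S3 -r 100000 -o /tmp/mydumper_backup --stream \
  | openssl enc -aes-256-cbc -pbkdf2 -salt -pass file:/path/to/keyfile \
  | aws s3 cp - s3://davidducos/mydumper_backup.sql.enc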
