Upload Ongoing MyDumper Backups to S3

If you are using MyDumper as your logical backup solution and you store your backups on S3, you normally have to take a local backup and then upload it to S3. But what happens if there is not enough space to hold the backup on the server where you are taking it? And even if there is enough disk space, you still have to wait until the dump finishes before starting the upload, which makes the whole process longer.

MyDumper implemented stream backup in v0.11.3 and we have been polishing the code since then. We also implemented two ways of executing external commands:

--exec-per-thread: The worker thread that is getting the data from the database writes it to the STDIN of the external command. It is similar to executing cat FILE | command for every file that is written and closed.

--exec: In this case, the worker writes to local storage and, when the file is closed, the filename is enqueued. The exec threads pop filenames from the queue and execute the command on each one. FILENAME is a reserved word that is replaced in the command; for instance, --exec='/usr/bin/ls -l FILENAME' will execute ls -l on every single file. The command must be an absolute path.

Both implementations have different use cases, pros, and cons. We are going to use --exec, as the current --exec-per-thread implementation doesn't allow us to dynamically change the command with the filename, which changes on each iteration.
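For completeness, here is a hedged sketch of what --exec-per-thread could look like when the command does not need to change per file, for example piping every stream through gzip. The gzip path, the row count, the output directory, and the --exec-per-thread-extension option are assumptions about your MyDumper version and environment, not commands taken from this post:

# Each worker pipes its output through gzip; the extension is appended to the resulting files (assumed flags).
mydumper -T test.mydumper2S3 -r 100000 -o /tmp/mydumper_backup \
  --exec-per-thread='/usr/bin/gzip -c' \
  --exec-per-thread-extension='.gz'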

Execution

For this example, I created a table named test.mydumper2S3 with millions of rows. You need to configure a valid AWS account, install the AWS CLI, and have a bucket.
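For reference, this is roughly the AWS side of the setup; the bucket name davidducos is the one used in the upload paths later in this post, and creating it this way is just one option:

aws configure               # interactively set the access key, secret key, and default region
aws s3 mb s3://davidducos   # create the bucket if it does not exist yet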

As I stated before, there are two ways of uploading the files; the main difference is the number of executions of the AWS command, or the number of threads, that you want to use. A stream will be a single process, while --exec can control the number of executions with --exec-threads.

With stream

This might be the simplest way if you are familiar with piping your commands. In the example below you will find the table name, the split-by-rows value, the path where the temporary files will reside, and finally the --stream option:
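A minimal sketch of that pipeline, assuming a split of 100,000 rows and /tmp/mydumper_backup as the temporary path (both placeholders); the table and the S3 destination are the ones used in this example:

# Stream the dump of a single table straight into a single S3 object.
mydumper -T test.mydumper2S3 -r 100000 -o /tmp/mydumper_backup --stream \
  | aws s3 cp - s3://davidducos/mydumper_backup.sql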

On the AWS CLI side, we specify the s3 service and the cp command; the "-" means that it will read from STDIN, and then we give the location of the single file (s3://davidducos/mydumper_backup.sql) that is going to be uploaded.

In the log, you will see entries like this:

As you can see from the log, the files are streamed as soon as they are closed. However, it took more than 30 seconds after the dump finished for all the files to be streamed. Finally, the command returned a couple of seconds after the “All data transferred…” entry, as the buffer needs to flush the data and upload it to S3.

With --exec

If you need to upload every single file individually, this is the option that you should use. For instance, you can use --load-data or the --csv option directly to allow another process to consume the files.

Let’s see the example:
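The following is a sketch of the kind of invocation meant here, not the exact command from my run; the S3 prefix, row count, output directory, and thread count are placeholders, and the AWS CLI is assumed to be installed at /usr/bin/aws since --exec requires an absolute path:

# Each closed file is uploaded individually; FILENAME is replaced by MyDumper for every file.
mydumper -T test.mydumper2S3 -r 100000 -o /tmp/mydumper_backup \
  --exec='/usr/bin/aws s3 cp FILENAME s3://davidducos/mydumper_backup/' \
  --exec-threads=4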

In this case, the AWS CLI will send the status of the files being uploaded to STDERR:

And the log will be the traditional mydumper log.

Conclusion

This is an example with S3, but it is also possible to use the same approach with other vendors. If you need encryption, just pipe to your encryption command and then pipe again to AWS or any other command. I didn't use ZSTD compression, which is another option that you should explore.
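As a hedged illustration of that idea, here is one way to add encryption to the stream pipeline, using openssl enc as a stand-in for whatever encryption tool you prefer; the key file location and the object name are placeholders:

# Dump, encrypt on the fly, and upload the encrypted stream as a single object.
mydumper -T test.mydumper2S3 -r 100000 -o /tmp/mydumper_backup --stream \
  | openssl enc -aes-256-cbc -pbkdf2 -salt -pass file:/path/to/keyfile \
  | aws s3 cp - s3://davidducos/mydumper_backup.sql.enc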
