Restore Strategies with MyDumperIn my previous post Back From a Long Sleep, MyDumper Lives!, I told you that Fast Index Creation was in the queue and (after fixing several bugs) it will package in release 0.10.7 next month. But why am I so excited about it? Well, this feature opens new opportunities, what I’m going to call Restore Strategies.

In the Past…

In the past, the only option was to first restore the table definition and then insert the data rows in two simple steps. On tables with millions of rows, we already know why it takes more time as it inserts in the clustered index and in the secondary index, instead of building the secondary indexes after the data has been inserted, as myloader is able to do now. 

Nowadays, we have the option to do it, in three steps:

  • Create the table with the primary/unique key.
  • Execute the inserts statements in parallel.
  • Create the indexes and constraints.

But, we don’t have just one table, we have multiple tables with different amounts of indexes, different amounts of columns per index, different amounts of columns, different amounts of rows… and depending on the table order selected will be the restoration time. So, the question is: what is the best table order?

Currently…

Currently, myloader has no rule or strategy that defines the order, as we read the files using g_dir_read_name and enqueue them. We know that larger tables should be imported first, if not, the whole process could take longer, as might be possible that importing the data of the largest tables and then creating the indexes, take the same amount of time that the rest of the database to import. After working on several restore operations, I realized that not always the largest table is the one that is going to take longer. So, what we need to determine is the table that is going to take the longest to import. During a mydumper execution, we know the number of rows that each table has and we can get that info and pass it to myloader. We can also estimate the index length based on the amount of index and the type of each one. 

Another thing to consider is that the three steps are for all the tables, which means that we create all the tables, execute the inserts for all the tables, and finally, we create the indexes for all the tables. But, I think that doing inserts in a table and running in parallel a process that creates the indexes, will speed up the whole process. 

Do We Know Enough?

From my point of view, there is a lot of information that we are missing during a table restore, we don’t know inserts or MB per second per table or database. I think that this will be valuable information if we want to develop the tools to use the best Restore Strategy on our database.

Conclusion

I shared some ideas and opened some discussions about how to import using myloader. I hope we can develop some of them in the future to have a better understanding and reduce the restore timings. Feel free to open an issue on MyDumper repository or add a comment below!

Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments