… or what I should keep in mind in case of disaster

MySQL 8 frm drop recover ddl

To retrieve and maintain in SQL format the definition of all tables in a database, is a best practice that we all should adopt. To have that under version control is also another best practice to keep in mind.

While doing that may seem redundant, it can become a life saver in several situations. From the need to review what has historically changed in a table, to knowing who changed what and why… to when you need to recover your data and have your beloved MySQL instance not start…

But let’s be honest, only a few do the right thing, and even fewer keep that information up to date. Given that’s the case, what can we do when we have the need to discover/recover the table structure?

From the beginning, MySQL has used some external files to describe its internal structure.

For instance, if I have a schema named windmills and a table named wmillAUTOINC1, on the file system I will see this:

The ibd file contains the data, while the frm file contains the structure information.

Putting aside ANY discussion about if this is safe, if it’s transactional and more… when we’ve experienced some major crash and data corruption this approach has been helpful. Being able to read from the frm file was the easiest way to get the information we need.
Simple tools like DBSake made the task quite trivial, and allowed us to script table definition when needed to run long, complex tedious data recovery:

Of course, if the frm file was also corrupt, then we could try to get the information from the ibdata dictionary. If that is corrupted too (trust me I’ve seen all of these situations) … well a last resource was hoping the customer has a recent table definition stored somewhere, but as mentioned before, we are not so diligent, are we?

Now, though, in MySQL8 we do not have FRM files, they were dropped. Even more interesting is that we do not have the same dictionary, most of the things that we knew have changed, including the dictionary location. So what can be done?

Well Oracle have moved the FRM information—and more—to what is called Serialized Dictionary Information (SDI), the SDI is written INSIDE the ibd file, and represents the redundant copy of the information contained in the data dictionary.

The SDI is updated/modified by DDL operations on tables that reside in that tablespace. This is it: if you have one file per table normally, then you will have in that file ONLY the SDI for that table, but if you have multiple tables in a tablespace, the SDI information will refer to ALL of the tables.

To extract this information from the IBD files, Oracle provides a utility called ibd2sdi. This application parses the SDI information and reports a JSON file that can be easily manipulated to extract and build the table definition.

One exception is represented by Partitioned tables. The SDI information is contained ONLY in the first partition, and if you drop it, it is moved to the next one. I will show that later.

But let’s see how it works. In the next examples I will look for the table’s name, attributes, and datatype starting from the dictionary tables.

To obtain the info I will do this:

The result will be something like:

I cut the output for brevity, but if you run the above command yourself you’ll be able to see that this retrieves the information for ALL the tables residing in the IBD.

The other thing I hope you noticed is that I am NOT parsing ibdata, but mysql.ibd. Why? Because the dictionary was moved out from ibdata and is now in mysql.ibd.

Look what happens if I try to parse ibdata:

Be very careful here to not mess up your mysql.ibd file.

Now what can I do to get information about my wmillAUTOINC1 table in MySQL8?

That is quite simple:

The JSON will contains:

  • A section describing the DB object at high level
  • Array of columns and related information
  • Array of indexes
  • Partition information (not here but in the next example)
  • Table space information

That is a lot more detail compared to what we had in the FRM, and it is quite relevant and interesting information as well.

Once you have extracted the SDI, any JSON parser tool script can generate the information for the SQL DDL.

I mention partitions, so let’s look at this a bit more, given they can be tricky.

As mentioned, the SDI information is present ONLY in the first partition. All other partitions hold ONLY the tablespace information. Given that, then the first thing to do is to identify which partition is the first… OR simply try to access all partitions, and when you are able to get the details, extract them.

The process is the same:

The difference, as you can see, is that the section related to partitions and sub partitions will be filled with all the details you might need to recreate the partitions.

We will have:

  • Partition type
  • Partition expression
  • Partition values
  • …more

Same for sub partitions.

Now again see what happens if I parse the second partition:

I will get only the information about the tablespace, not the table.

As promised let me show you now what happens if I delete the first partition, and the second partition becomes the first:

As I mentioned before, each DDL updates the SDI, and here we go: I will have all the information on what’s NOW the FIRST partition. Please note the value of the attribute “created” between the first time I queried the other partition, and the one that I have now:

To be clear the second created is NOW (PT20170401) from when I dropped the other partition (PT20170301).

Conclusions

In the end, this solution is definitely more powerful than the FRM files. It will allow us to parse the file and identify the table definition more easily, providing us with much more detail and information.

The problems will arise if and when the IBD file becomes corrupt.

As for the manual:  For InnoDB, an SDI record requires a single index page, which is 16KB in size by default. However, SDI data is compressed to reduce the storage footprint.

By which it means that for each table I have a page, if I associate record=table. Which means that in case of IBD corruption I should (likely) be able to read those pages. Unless I have bad (very bad) luck.

I still wonder how the dimension of an IBD affects the SDI retrieval, but given I have not tried it yet I will have to let you know.

As an aside, I am working on a script to facilitate the generation of the SQL, it’s not yet ready but you can find it here

Last note but keep this in mind! It is stated in the manual but in a hidden place and in small letters:
DDL operations take longer due to writing to storage, undo logs, and redo logs instead of .frm files.

References

https://stedolan.github.io/jq/

https://dev.mysql.com/doc/refman/8.0/en/ibd2sdi.html

https://dev.mysql.com/doc/refman/8.0/en/serialized-dictionary-information.html

https://dev.mysql.com/doc/refman/8.0/en/data-dictionary-limitations.html


Photo by chuttersnap on Unsplash