First I want to thank everyone who attended my December 19, 2019 webinar “Top 3 Features of MySQL“. Recording and slides are available on the webinar page.

Here are answers to the questions from participants which I was not able to provide during the webinar.

Q: When do undo logs and redo logs come into play? Can you please explain the background operations of these logs?

A: These two are completely different structures.

Undo Logs

Belong to a single active transaction. It contains information on how to undo the latest change to the clustered index, performed by this transaction.

Let’s demonstrate how they work with an example.

Consider this table:

Now let’s open a transaction and retrieve rows from this table:

In another session let’s open new transaction and modify row where f1='barr'  to f1='bar':

We are using default transaction isolation level REPEATABLE-READ. Therefore the first transaction should receive the same result as it had seen before the change and it does:

The transaction retrieves the data from the undo logs because the table was already updated at the COMMIT.

Undo logs are stored by default in the shared InnoDB tablespace but could be placed into separate tablespaces for better space management. This is the recommended option.

You can find more information about undo logs in the User Reference Manual.

The Redo Log

The redo log is a disk-based data structure used during crash recovery to correct data written by incomplete transactions.

source: 15.6.5 Redo Log

During normal operations, InnoDB never reads it. Only if MySQL server was not stopped properly InnoDB uses records in the redo log to correct data.

Why is the Redo Log is Needed?

InnoDB stores data for the rows in the clustered index. The internal structure of the clustered index is B-Tree. This storage allows fast read access, but it is slow on writes because it consistently needs to re-balance the tree. Additionally, even if only non-indexed data is changed, it takes time to find the proper page on the disk where the update should have placed.

Therefore InnoDB initially stores the change in the in-memory buffer pool. This allows returning success to the application faster. Then, in the background, InnoDB flushes data from the buffer pool to the disk. Therefore at some time, newly committed data could exist only in memory. However, in case of a crash, data in the memory is gone.

Therefore, at the same time, an encoded change request is also written into the redo log. Such a write is sequential and no time needs to be spent on finding the appropriate position. Thus this operation is much faster than updating the table file. Still, it is crash-safe, because data now exists in both the memory (active data set) and on the disk (a record in the redo log).

Size of the redo log is limited by two options:

  1. innodb_log_file_size
    1. Size of the individual log file
  2. innodb_log_files_in_group
    1. Number of the log files

The total size of the redo log can be calculated by the formula:

What Will Happen if the Total Size of the Redo Log is Reached?

Every record in the redo log has its own number: LSN. Normally InnoDB consistently flushes changes to the table files and rewrites records those LSNs are already flushed in a circular fashion.

However, if the combined size of the redo log is too small for the number of writes which InnoDB receives in the short amount of time, it is possible that the redo log gets full before InnoDB writes the data into table files. In this case, InnoDB stops all write activity and performs so-called aggressive flushing. You can read more about flushing in the User Reference Manual.

Therefore it is essential to set the size of the redo log big enough, so InnoDB could keep up with the application writes. Check out how to calculate good InnoDB log file size.

Q: In the cases where InnoDB crashes (long semaphore wait occurs) what are the reasons for the crash?

A: InnoDB uses semaphores to protect internal structures for concurrent usage. Normally such semaphores are held only for a short period of time. Therefore if semaphore waited more than 600 seconds, InnoDB decides that something went terribly wrong and crashes the server. In this case, you need to do what the error message suggests: report a bug at https://bugs.mysql.com and/or https://jira.percona.com.

However, sometimes semaphore waits when the expectedly long operation finishes. For example, ANALYZE TABLE  on a very large table. In this case, the server may crash for no valid reason. We saw cases when even a diagnostic query caused such a failure. It was reported at PS-6113 and fixed in Percona Server for MySQL. Still, this limitation exists in the upstream MySQL Server as of versions 5.7.28 and 8.0.18