no primary key replication lag mysqlThe most common issue when using row-based replication (RBR) is replication lag due to the lack of Primary keys.

The problem is that any replicated DML will do a full table scan for each modified row on the replica. This bug report explains it more in-depth: https://bugs.mysql.com/bug.php?id=53375

For example, if a delete is executed on the following table definition:

 

With this amount of rows:

 

The delete being:

It can be seen that the delete on the Primary requires a full table scan (Handler_read_rnd_next matches row amount + 1) to delete 88 rows.

The additional problem is that each of the rows being deleted will be recorded in the binary log individually like this:

Which will result in 88 full table scans on the replica, and hence the performance degradation.

For these cases, the recommendation is to add a primary key to the table, but sometimes adding a PK might not be easy because:

  • There are no existing columns that could be considered a PK.
  • Or adding a new column (as the PK) is not possible as it might impact queries from a 3rd party tool that we have no control over (or too complex to fix with query rewrite plugin).

The solution is to use MySQL/Percona Server for MySQL 8 and add an invisible column

Adding a new column (named “newc”) invisible as a primary key can be done with the following line:

Note, adding a PK is an expensive operation that requires a table rebuild as shown here.

After adding an invisible PK, the table will look like this:

 

Deleting a row now will be recorded in the binary log like this:

Where @1 is the first column ( the PK in this case) which the replica can use to find the matching row without having to do a full table scan.

The operation executed on the replica would be similar to the following which requires only one scan to find the matching row:

 

Also as the name suggests, an invisible column won’t show nor it needs to be referenced when doing operations over the table, i.e:

 

But if needed, the new column (newc) can be fetched if explicitly queried:

 

What If…?

What if MySQL automatically detects that the PK is missing for InnoDB tables and adds the invisible PK?

Taking into account that an internal six bytes PK is already added when the PK is missing, it might be a good idea to allow the possibility of making the PK visible if you need to. 

This means that when you execute this CREATE TABLE statement:

 

Will be automatically translated to:

 

And then we can execute this command:

To make it visible.

Conclusion

Missing primary keys is a problem for scaling databases, as replication will require a full table scan for each updated/delete row, and the more data the more lag.

Adding a PK might not be always possible because of 3rd party tools or restrictions, but adding an invisible primary key will do the trick and have the benefits of adding a PK without impacting syntax and operations from 3rd party clients/tools. What will be awesome is to make MySQL able to detect the missing PK, add it automatically, and change it to visible if you need to.

Subscribe
Notify of
guest

3 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
La Cancellera Yoann

Hello
I cannot agree more with “What will be awesome is to make MySQL able to detect the missing PK, add it automatically” , so I made a feature request : https://jira.percona.com/browse/PS-8051

Hareesh H

Why cant the auto PK ( the invisible 6byte PK ) added by innodb if no PK or candidate Unique is defined be leveraged in binlogs. Yes I understand that the logging exposes this hidden PK

David Ducos

Hi Hareesh, That is an interesting question. I think that the simplest answer is that you can’t trust on the 6byte invisible pk. You could delete or insert rows on a replica that could break the replication, and then you will have no tool to fix it. And on the other hand, MySQL will need to be aware about that hidden pk, not just the storage engine, which in this case is InnoDB. With this workaround, we are exposing the pk to MySQL and causing no performance impact to InnoDB.