Recently, I was working with my colleagues Edwin Wang and Taras Onishchuk and found an interesting edge case involving a situation where a replica running Percona Server for MySQL 5.7, external to AWS Aurora instance version 2.10.2 (5.7-compatible), broke. I recreated the issue in my lab with a simple create database statement, as you will see below.
1 | Error 'Character set '#255' is not a compiled character set and is not specified in the '/usr/share/percona-server/charsets/Index.xml' file' on query. Default database: 'lab'. Query: 'create database test' |
The interesting thing to note here is the character set ‘#255’. You won’t see this available if you check the list of available collations in Percona Server for MySQL 5.7 for the UTF8MB4 character set.
1 2 | mysql> SHOW COLLATION WHERE Charset = 'utf8mb4' and id = 255; Empty set (0.01 sec) |
But you will see this is available in Aurora.
1 2 3 4 5 6 7 | mysql> SHOW COLLATION WHERE Charset = 'utf8mb4' and id = 255; +--------------------+---------+-----+---------+----------+---------+ | Collation | Charset | Id | Default | Compiled | Sortlen | +--------------------+---------+-----+---------+----------+---------+ | utf8mb4_0900_ai_ci | utf8mb4 | 255 | | Yes | 0 | +--------------------+---------+-----+---------+----------+---------+ 1 row in set (0.04 sec) |
So we know that the problem here is that the command to create the lab database was processed using the utf8mb4_0900_ai_ci character set/collation, which was noted in the binary log for the Aurora instance. When this was executed on the external replica running, it encountered an error because this collation was not part of the binary.
The first thing we need to ask ourselves is this: Why was utf8mb4_0900_ai_ci used when you can see above, for both instances, that utf8mb4_general_ci is the default collation? Meaning that on the Aurora instance, there had to be something that specified that the collation should be changed to utf8mb4_0900_ai_ci before running the CREATE DATABASE command. We were informed that this change was not performed manually by the user, so we checked the system and discovered that the default collation behavior was overridden.
1 2 3 4 5 6 7 8 9 | mysql> show variables like 'col%'; +----------------------+--------------------+ | Variable_name | Value | +----------------------+--------------------+ | collation_connection | utf8mb4_0900_ai_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +----------------------+--------------------+ 3 rows in set (0.05 sec) |
This was pretty strange, considering that if you check the parameter group options for the Aurora 5.7 family, you cannot set the collation_connection variable to utf8mb4_0900_ai_ci. The only other way to set this would be via the init_connect variable, but we discovered that this was blank.
1 2 3 4 5 6 7 | mysql> show variables like 'init_con%'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | init_connect | | +---------------+-------+ 1 row in set (0.04 sec) |
So it wasn’t set by the server, but the setting change had to come from somewhere. After extensive searching, we finally discovered that the problem was the MySQL client used to connect to Aurora. Specifically, if you connect to Aurora using the MySQL 8.0 client, it will set the collation_connection variable to utf8mb4_0900_ai_ci.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | [root@ip-172-31-92-215 ~]# mysql --version mysql Ver 8.0.22-13 for Linux on x86_64 (Percona Server (GPL), Release 13, Revision 6f7822f) [root@ip-172-31-92-215 ~]# mysql -h sylvester-repl-break.cluster-c5wumyjzkfnn.us-east-1.rds.amazonaws.com ... mysql> show variables like 'col%'; +----------------------+--------------------+ | Variable_name | Value | +----------------------+--------------------+ | collation_connection | utf8mb4_0900_ai_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +----------------------+--------------------+ 3 rows in set (0.04 sec) |
But if you connect using the MySQL 5.7 client, it sets the collation_connection variable to utf8_general_ci.
1 2 3 4 5 6 7 8 9 10 11 12 13 | [root@ip-172-31-92-215 ~]# mysql --version mysql Ver 14.14 Distrib 5.7.40-43, for Linux (x86_64) using 6.2 [root@ip-172-31-92-215 ~]# mysql -h sylvester-repl-break.cluster-c5wumyjzkfnn.us-east-1.rds.amazonaws.com ... mysql> show variables like 'col%'; +----------------------+-------------------+ | Variable_name | Value | +----------------------+-------------------+ | collation_connection | utf8_general_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +----------------------+-------------------+ 3 rows in set (0.00 sec) |
This likely has to do with the fact that in MySQL 8.0, the default collation was changed to utf8mb4_0900_ai_ci.
In conclusion, if you are running a 5.7-compatible version of Aurora with a 5.7 external replica, ensure you avoid using the MySQL 8.0 client. Or make sure you change the collation_connection variable to utf8_general_ci, or whatever is appropriate for your session, before executing commands on the instance.
I want to thank my colleagues Taras Onishchuk and Edwin Wang for their contributions to helping solve this problem!
Percona Distribution for MySQL is the most complete, stable, scalable, and secure open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!
So how can you fix this issue once it has occurred?
If you create the database (or other structure) in Aurora and get this error when it gets to the replica, you need to move past the event in the relay log using something like sql_slave_skip_counter because there is really no way to modify the binary log to create the new structures using a compatible collation.
Always exercise a lot of care when using skip_slave_counter. Never skip more than one event at a time. You may have to do this several times depending on the number of structures you created. Depending on if you are using GTIDs you may have to insert an empty transaction on the replica to bypass.
Once you have skipped past the failed replication events, go back to the source Aurora instance and remove the structures that you added. Before removing you should set your local session variable sql_log_bin to 0 so that the event of dropping the structure in question isn’t added to the binary log or else you’ll just run into more replication issues.
Once the structures have been dropped from the source Aurora instance, reconnect to the aurora instance using the MySQL 5.7 client, check that it’s using the proper collation, make sure sql_log_bin is set to 1, then recreate the structures. This should result in the structures being added to the replica with no replication errors.