Hey folks,
This blog is straightforward but, at the same time, handy. At Percona, we know that the audience of our blogs is not limited to DBAs but includes a broader range of IT professionals, such as sysadmins, help desk employees, NOC operators, and developers as well.
Sometimes people (and also some DBAs, why not?) may accidentally and unintentionally execute supposedly innocuous commands which do not involve data manipulation, but can and do break the replication.
At Percona Managed Services, we see this type of situation more often than one would expect, believe me.
Suppose we have the following topology
1 2 | 1. PS8_Primary=192.168.0.70 [RW] [ver: 8.0.29-21] |___ 2. PS8_Replica=192.168.0.71 [R] [ver: 8.0.29-21] (Slave_delay: 0) |
PS8_Primary is a primary server, with Percona Server for MySQL 8.0.29 installed, in Read-Write mode
PS8_Primary is a replica server, with Percona Server for MySQL 8.0.29 installed, in Read-Only mode (but NOT Super Read Only!) and zero lag.
Below is a typical example of what I just mentioned. The user (whatever their role is) runs this command on the primary, and it works: the user is going to create the ‘foo‘ user with read-only privileges:
1 2 3 4 5 | PS8_Primary (none)> CREATE USER foo IDENTIFIED BY 'kungfoo'; Query OK, 0 rows affected (0.02 sec) PS8_Primary (none)> GRANT SELECT ON *.* TO foo; Query OK, 0 rows affected (0.01 sec) |
Then you realize that the following error occurs in the replica:
1 2 3 4 5 6 7 8 9 10 11 | PS8_Replica (none)> PAGER egrep -i "error|Errno"; SHOW SLAVE STATUSG PAGER set to 'egrep -i "error|Errno"' Last_Errno: 1396 Last_Error: Error 'Operation CREATE USER failed for 'foo2'@'%'' on query. Default database: ''. Query: 'CREATE USER 'foo2'@'%' IDENTIFIED WITH 'caching_sha2_password' AS '$A$005$r.Qk*gkq}{ZA)#&Q30SJpguBsa2NgGS08qtsA/X9fD7GUH9eoIh6cAG7M2'' Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1396 Last_SQL_Error: Error 'Operation CREATE USER failed for 'foo2'@'%'' on query. Default database: ''. Query: 'CREATE USER 'foo2'@'%' IDENTIFIED WITH 'caching_sha2_password' AS '$A$005$r.Qk*gkq}{ZA)#&Q30SJpguBsa2NgGS08qtsA/X9fD7GUH9eoIh6cAG7M2'' Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 221027 08:22:30 1 row in set, 1 warning (0.00 sec) |
Replication is failing due to the user’s existence.
Now, you may be wondering: how could this have happened? Let’s explore a few possible options:
- The replica was once the primary instance, and the ‘foo’ user was created with SET sql_log_bin=0, which is absurd but possible.
- Person X explicitly created the users in the replica (which was not in Read-Only mode or Super Read-Only mode) so that they do not directly access the primary. Then person Y, without checking anything, tried to create (and managed to do so) the user.
And that’s what happened:
1 2 3 4 5 6 7 | PS8_Replica (none)> SHOW GRANTS FOR fooG *************************** 1. row *************************** Grants for foo@%: GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, SHUTDOWN, PROCESS, FILE, REFERENCES, INDEX, ALTER, SHOW DATABASES, SUPER, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER, CREATE TABLESPACE, CREATE ROLE, DROP ROLE ON *.* TO `foo`@`%` *************************** 2. row *************************** Grants for foo@%: GRANT APPLICATION_PASSWORD_ADMIN,AUDIT_ABORT_EXEMPT,AUDIT_ADMIN,AUTHENTICATION_POLICY_ADMIN,BACKUP_ADMIN,BINLOG_ADMIN,BINLOG_ENCRYPTION_ADMIN,CLONE_ADMIN,CONNECTION_ADMIN,ENCRYPTION_KEY_ADMIN,FLUSH_OPTIMIZER_COSTS,FLUSH_STATUS,FLUSH_TABLES,FLUSH_USER_RESOURCES,GROUP_REPLICATION_ADMIN,GROUP_REPLICATION_STREAM,INNODB_REDO_LOG_ARCHIVE,INNODB_REDO_LOG_ENABLE,PASSWORDLESS_USER_ADMIN,PERSIST_RO_VARIABLES_ADMIN,REPLICATION_APPLIER,REPLICATION_SLAVE_ADMIN,RESOURCE_GROUP_ADMIN,RESOURCE_GROUP_USER,ROLE_ADMIN,SENSITIVE_VARIABLES_OBSERVER,SERVICE_CONNECTION_ADMIN,SESSION_VARIABLES_ADMIN,SET_USER_ID,SHOW_ROUTINE,SYSTEM_USER,SYSTEM_VARIABLES_ADMIN,TABLE_ENCRYPTION_ADMIN,XA_RECOVER_ADMIN ON *.* TO `foo`@`%` 2 rows in set (0.01 sec) |
The user existed in the replica! It explains the error.
What’s done is done. How do we solve it?
- We could skip the command if the replication is not based on GTID.
- If the replica is based on GTID, skipping the command would be more complex but possible.
- If the replication is based on GTID and at the same time uses multi-threaded replication, skipping the command would be more complex but possible.
The below links may help you:
How To Fix MySQL Replication After an Incompatible DDL Command
Repair MySQL 5.6 GTID replication by injecting empty transactions
We can also use the pt-slave-restart utility to sort out this error. This is an example
1 | PS8_Replica $ pt-slave-restart localhost --error-numbers=1396 |
Personally, I do not recommend any of the three methods or scenarios mentioned above. I would prefer to do the following instead:
- Access the replica, verify that the user exists and see what permissions are assigned.
- Compare the permissions assigned in the replica with those that are trying to be assigned in the primary instance. We can use the pt-show-grants utility for this purpose:
1 2 3 4 5 6 7 8 | $ pt-show-grants 192.168.0.70 | grep foo > /tmp/pri.txt && pt-show-grants 192.168.0.71 | grep foo > /tmp/rpl.txt && diff /tmp/pri.txt /tmp/rpl.txt 3,4c3,5 < ALTER USER `foo`@`%` IDENTIFIED WITH 'caching_sha2_password' AS '$A$005$CGEt?8}EZ<_xx M:IjTJ/wDGbOgNaod0qqu65OoZoWcKmKVL/RgOF7Pm9e2' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK PASSWORD HISTORY DEFAULT PASSWORD REUSE INTERVAL DEFAULT PASSWORD REQUIRE CURRENT DEFAULT; < GRANT SELECT ON *.* TO `foo`@`%`; --- > ALTER USER `foo`@`%` IDENTIFIED WITH 'caching_sha2_password' AS '$A$005$1X1&7zZTLNns.p?KeA6k4NAorc1D4UUcq.osjKyDCFp4rP2p3v6/W8Nmy5' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK PASSWORD HISTORY DEFAULT PASSWORD REUSE INTERVAL DEFAULT PASSWORD REQUIRE CURRENT DEFAULT; > GRANT ALTER, ALTER ROUTINE, CREATE, CREATE ROLE, CREATE ROUTINE, CREATE TABLESPACE, CREATE TEMPORARY TABLES, CREATE USER, CREATE VIEW, DELETE, DROP, DROP ROLE, EVENT, EXECUTE, FILE, INDEX, INSERT, LOCK TABLES, PROCESS, REFERENCES, RELOAD, REPLICATION CLIENT, REPLICATION SLAVE, SELECT, SHOW DATABASES, SHOW VIEW, SHUTDOWN, SUPER, TRIGGER, UPDATE ON *.* TO `foo`@`%`; > GRANT APPLICATION_PASSWORD_ADMIN,AUDIT_ABORT_EXEMPT,AUDIT_ADMIN,AUTHENTICATION_POLICY_ADMIN,BACKUP_ADMIN,BINLOG_ADMIN,BINLOG_ENCRYPTION_ADMIN,CLONE_ADMIN,CONNECTION_ADMIN,ENCRYPTION_KEY_ADMIN,FLUSH_OPTIMIZER_COSTS,FLUSH_STATUS,FLUSH_TABLES,FLUSH_USER_RESOURCES,GROUP_REPLICATION_ADMIN,GROUP_REPLICATION_STREAM,INNODB_REDO_LOG_ARCHIVE,INNODB_REDO_LOG_ENABLE,PASSWORDLESS_USER_ADMIN,PERSIST_RO_VARIABLES_ADMIN,REPLICATION_APPLIER,REPLICATION_SLAVE_ADMIN,RESOURCE_GROUP_ADMIN,RESOURCE_GROUP_USER,ROLE_ADMIN,SENSITIVE_VARIABLES_OBSERVER,SERVICE_CONNECTION_ADMIN,SESSION_VARIABLES_ADMIN,SET_USER_ID,SHOW_ROUTINE,SYSTEM_USER,SYSTEM_VARIABLES_ADMIN,TABLE_ENCRYPTION_ADMIN,XA_RECOVER_ADMIN ON *.* TO `foo`@`%`; |
Based on that, we can:
- If the user and permissions match, we can stop the replication, DROP the user in the replica, and resume the replication.
- If the user and the permissions DO NOT match, we need to discuss the situation with the user who executed the statement and verify what permissions they need (whether those of the replica or those of the primary or something in-between) and act accordingly.
Funnily enough, the opposite could happen: some people delete a user on the primary instance, and it doesn’t exist on the replica. In this case, it would be very similar.
1 2 3 4 5 6 7 8 9 10 11 | PS8_Primary (none)> DROP USER bar; Query OK, 0 rows affected (0.01 sec) PS8_Replica (none)> PAGER grep -i error; SHOW SLAVE STATUSG PAGER set to 'grep -i error' Last_Error: Error 'Operation DROP USER failed for 'bar'@'%'' on query. Default database: ''. Query: 'DROP USER bar' Last_IO_Error: Last_SQL_Error: Error 'Operation DROP USER failed for 'bar'@'%'' on query. Default database: ''. Query: 'DROP USER bar' Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 221026 07:21:40 1 row in set, 1 warning (0.02 sec) |
Let’s prevent this. How? Using the EXISTS clause.
For the DROP USER case:
1 2 3 4 5 6 7 8 9 10 11 | PS8_Primary (none)> DROP USER IF EXISTS foo; Query OK, 0 rows affected (0.01 sec) PS8_Primary (none)> SHOW MASTER STATUSG *************************** 1. row *************************** File: mysql-bin.000008 Position: 824 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: 1 row in set (0.00 sec) |
And now, as you can see, we don’t have any errors on the replica:
1 2 3 4 5 6 7 8 9 10 11 12 13 | PS8_Replica (none)> PAGER egrep -i "log|behind"; SHOW SLAVE STATUSG PAGER set to 'egrep -i "log|behind"' Master_Log_File: mysql-bin.000008 Read_Master_Log_Pos: 824 Relay_Log_File: relay.000023 Relay_Log_Pos: 509 Relay_Master_Log_File: mysql-bin.000008 Exec_Master_Log_Pos: 824 Relay_Log_Space: 1362 Until_Log_File: Until_Log_Pos: 0 Seconds_Behind_Master: 0 Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates |
All resolved.
Below I will list most of the commands that are usually executed (in the first column), and how to make them failsafe (in the second column).
For MySQL
Instead of this | Use this |
CREATE USER | CREATE USER [IF NOT EXISTS] |
DROP USER | DROP USER [IF EXISTS] |
CREATE ROLE | CREATE ROLE [IF NOT EXISTS] |
DROP ROLE | DROP ROLE [IF EXISTS] |
CREATE TABLE | CREATE [TEMPORARY] TABLE [IF NOT EXISTS] |
DROP TABLE | DROP [TEMPORARY] TABLE [IF EXISTS] |
CREATE VIEW | CREATE OR REPLACE |
DROP VIEW | DROP VIEW [IF EXISTS] |
CREATE PROCEDURE | CREATE PROCEDURE [IF NOT EXISTS] |
DROP PROCEDURE | DROP {PROCEDURE | FUNCTION} [IF EXISTS] |
CREATE FUNCTION | CREATE FUNCTION [IF NOT EXISTS] |
DROP FUNCTION | DROP {PROCEDURE | FUNCTION} [IF EXISTS] |
CREATE TRIGGER | CREATE TRIGGER [IF NOT EXISTS] |
DROP TRIGGER | CREATE TRIGGER [IF NOT EXISTS] |
CREATE EVENT | CREATE EVENT [IF NOT EXISTS] |
DROP EVENT | DROP EVENT [IF EXISTS] |
CREATE SCHEMA | CREATE {DATABASE | SCHEMA} [IF NOT EXISTS] |
DROP SCHEMA | DROP {DATABASE | SCHEMA} [IF EXISTS] |
For MariaDB
(Surprisingly, it has not only the EXISTS clause, but REPLACE as well.)
Instead of this | Use this |
CREATE USER | CREATE [OR REPLACE] USER [IF NOT EXISTS] |
DROP USER | DROP USER [IF EXISTS] |
CREATE ROLE | CREATE [OR REPLACE] ROLE [IF NOT EXISTS] |
DROP ROLE | DROP ROLE [IF EXISTS] |
CREATE TABLE | CREATE [OR REPLACE] [TEMPORARY] TABLE [IF NOT EXISTS] |
DROP TABLE | DROP [TEMPORARY] TABLE [IF EXISTS] |
CREATE VIEW | CREATE [OR REPLACE][IF NOT EXISTS] |
DROP VIEW | DROP VIEW [IF EXISTS] |
CREATE PROCEDURE | CREATE [OR REPLACE] [IF NOT EXISTS] |
DROP PROCEDURE | DROP {PROCEDURE | FUNCTION} [IF EXISTS] |
CREATE FUNCTION | CREATE [OR REPLACE] [IF NOT EXISTS] |
DROP FUNCTION | DROP FUNCTION [IF EXISTS] |
CREATE TRIGGER | CREATE [OR REPLACE] [IF NOT EXISTS] |
DROP TRIGGER | DROP TRIGGER [IF EXISTS] |
CREATE EVENT | CREATE [OR REPLACE] [IF NOT EXISTS] |
DROP EVENT | DROP EVENT [IF EXISTS] |
CREATE SCHEMA | CREATE [OR REPLACE] {DATABASE | SCHEMA} [IF NOT EXISTS] |
DROP SCHEMA | DROP {DATABASE | SCHEMA} [IF EXISTS] |
CREATE INDEX | CREATE [OR REPLACE] INDEX [IF NOT EXISTS] |
DROP INDEX | [IF EXISTS] |
Conclusion
Next time, consider using these options in the DROP or CREATE commands. It will help ensure you’re doing the right thing for the primary database instance and don’t break the replication.
Be clever, be simple, but above all, be careful.