PXC schema changes optionsIf you are using Galera replication, you know that schema changes may be a serious problem. With its current implementation, there is no way even a simple ALTER will be unobtrusive for live production traffic. It is a fact that with the default TOI alter method, Percona XtraDB Cluster (PXC) cluster suspends writes in order to execute the ALTER in the same order on all nodes.

For factual data structure changes, we have to adapt to the limitations, and either plan for a maintenance window, or use pt-online-schema-change, where interruptions should be very short. I suggest you be extra careful here, as normally you cannot kill an ongoing ALTER query in Galera cluster.

For schema compatible changes, that is, ones that cannot break ROW replication when the writer node and applier nodes have different metadata, we can consider using the Rolling Schema Update (RSU) method. An example of 100% replication-safe DDL is OPTIMIZE TABLE (aka noop-ALTER). However, the following are safe to consider too:

  • adding and removing secondary index,
  • renaming an index,
  • changing the ROW_FORMAT (for example enabling/disabling table compression),
  • changing the KEY_BLOCK_SIZE(compression property).

However, a lesser known fact is that even using the RSU method or pt-online-schema-change for the above may not save us from some unwanted disruptions.

RSU and Concurrent Queries

Let’s take a closer look at a very simple scenario with noop ALTER. We will set wsrep_OSU_method to RSU to avoid a cluster-wide stall. In fact, this mode turns off replication for the following DDL (and only for DDL), so you have to remember to repeat the same ALTER on every cluster member later.

For simplicity, let’s assume there is only one node used for writes. In the first client session, we change the method accordingly to prepare for DDL:

(By the way, as seen above, the desync mode is not enabled yet, as it will be automatically enabled around the DDL query only, and disabled right after it finishes).

In a second client session, we start a long enough SELECT query:

And while it’s ongoing, let’s rebuild the table:

Surprisingly, immediately the client in the second session receives its SELECT failure:

So, even a simple SELECT is aborted if it conflicts with the local, concurrent ALTER (RSU)… We can see more details in the error log:

Another example – a simple sysbench test, during which I did noop ALTER in RSU mode:

So, SELECT queries are aborted to resolve MDL lock request that a DDL in RSU needs immediately. This of course applies to INSERT, UPDATE and DELETE as well. That’s quite an intrusive way to accomplish the goal…

“Manual RSU”

Let’s try a “manual RSU” workaround instead. In fact, we can achieve the same isolated DDL execution as in RSU, by putting a node in desync mode (to avoid flow control) and disabling replication for our session. That way, the ALTER will only be executed in that particular node.

Session 1:

Session 2:

Session 1:

Session 3:

In this case, there was no interruption, the ALTER waited for it’s MDL lock request to succeed gracefully, and did it’s job when it became possible.

Remember, you have to execute the same commands on the rest of the nodes to make them consistent – even for noop-alter, it’s important to make the nodes consistent in terms of table size on disk.

Kill Problem

Another fact is that you cannot cancel or kill a DDL query executed in RSU or in TOI method:

This may be an annoying problem when you need to unblock a node urgently. Fortunately, the workaround with wsrep_on=0 also allows to kill an ALTER without that restriction:

Session 1:

Session 2:

Summary

The RSU method may be more intrusive then you’d expect. For schema compatible changes, it is worth considering “manual RSU” with set global wsrep_desync=1; set wsrep_on=0;
When using it though, please remember that wsrep_on applies to all types of writes, both DDL and DML, so be extra careful to set it back to 1 after the ALTER is done. So the procedure will look like this:

Incidentally, as in my opinion the current RSU behavior is unnecessarily intrusive, I have filed this change suggestion: https://jira.percona.com/browse/PXC-2293


Photo by Pierre Bamin on Unsplash

2 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
zarere

Nice article Przemysław,

I think we can achieve the same in PXC with:

select count(*) from sysbench.sbtest1 a join sysbench.sbtest1 b where a.id<10000 on 1-session
optimize local table sbtest1 on the 2-nd session

optimize local will be run only on the local node and will not cause issues on the other nodes in PXC.

Stofa

Hi Malkowski,

Great help for DDL issues on Galeda cluster
Concerning the “Manual RSU” solution, what happens if in Session 2, we have a transaction running ( INSERT, UPDATE , DELETE ) on the table modified ( by alter ) in Session 1.
Should we stop all write on db1.sbtest1 table in the Node where “Manual RSU” is performed as required by RSU method ( SET SESSION wsrep OSU method = ’RSU’ ) ?
From what i tested, NO DML done on db1.sbtest1 in other sessions different from which the alter command is running will be aborted but all are fully replicated to other nodes. Do you confirm ?

Thanks.