Chaos Testing Leads to More Stable Percona XtraDB Cluster

In my talk at Percona Live 2021, “Creating Chaos in Databases”, I discussed how creating a controlled interruption in available resources (I used primary pod and network interruptions) allows us to test the stability of a database, and in our case, Percona XtraDB Cluster.

I also mentioned in the talk that my testing led to diagnosing a few unpleasant bugs, namely:

  • PXC-3437: Node fails to join in the endless loop
  • PXC-3580: Aggressive network outages on one node makes the whole cluster unusable
  • PXC-3596: Node stuck in aborting SST

I am happy to report that these bugs are fixed in Percona XtraDB Cluster 8.0.23. This version will give you a much better and more stable experience, especially when used in combination with our Percona Distribution for MySQL Operator.

I am not able to break Percona XtraDB Cluster 8.0.23 as I was able to in previous releases. It seems I need to be more creative to find more network-related bugs, so we will see how it goes.
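To make "not able to break" a bit more concrete, here is a minimal sketch of a health check one could run against each node after an induced outage. It is not the tooling from the talk. The function name `node_problems` and the pass criteria are my assumptions for illustration; the input is assumed to be the name/value pairs returned by `SHOW GLOBAL STATUS LIKE 'wsrep_%'`, collected by whatever client you already use.

```python
from typing import List, Mapping

# Galera status variables and the values a healthy, fully joined node reports.
# Treating exactly these three as the pass criteria is an assumption of this sketch.
EXPECTED_STATUS = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_ready": "ON",
}


def node_problems(status: Mapping[str, str], expected_size: int) -> List[str]:
    """Return human-readable problems for one node; an empty list means healthy.

    `status` holds wsrep_* status variables as strings, as the server reports them.
    `expected_size` is the number of nodes the cluster should have.
    """
    problems = []
    for name, want in EXPECTED_STATUS.items():
        got = status.get(name)
        if got != want:
            problems.append(f"{name} is {got!r}, expected {want!r}")

    # A missing wsrep_cluster_size is treated as 0, which always counts as a problem.
    size = int(status.get("wsrep_cluster_size", 0))
    if size != expected_size:
        problems.append(f"wsrep_cluster_size is {size}, expected {expected_size}")
    return problems


if __name__ == "__main__":
    # Hypothetical snapshot of a node that never finished rejoining.
    stuck_node = {
        "wsrep_cluster_status": "Primary",
        "wsrep_local_state_comment": "Joining",
        "wsrep_ready": "OFF",
        "wsrep_cluster_size": "2",
    }
    for problem in node_problems(stuck_node, expected_size=3):
        print(problem)
```

Running a check like this in a loop after every interruption, with a timeout, is one way to separate a cluster that recovers on its own from one that stays stuck.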

As a side note, I would like to mention that our fixes are available to everybody who would like to improve the stability of their products based on the Galera library. We do not hide our source code behind “Enterprise” paywalls or bury fixes in combined .tar.gz source code dumps.

For example, the fix for https://jira.percona.com/browse/PXC-3580 is available in the pull request https://github.com/percona/galera/pull/214/files. Percona is committed to providing you with a real Open Source experience.

Happy Clustering!

1 Comment
Bruno C

I agree with you. MariaDB and Galera hiding NBO (Non-Blocking Operations) schema changes under a closed license model was very frustrating and left a very bad taste in my mouth.

Because of Galera's schema change limitations, my company settled on Group Replication. That made me sad, because the Percona Operator for Kubernetes is incredible.