In this blog post, we’ll look at how recent optimizations in Percona XtraDB Cluster improved IST performance.
Introduction
Starting in version 5.7.17-29.20, Percona XtraDB Cluster significantly improved performance. Depending on the workload, throughput increased by 3-10x. (More details here). These optimizations also improved IST (Incremental State Transfer) performance. This post studies the impact on IST.
IST
IST stands for incremental state transfer. When a node leaves the cluster for a short period of time and then rejoins, it needs to catch up with the cluster state. As part of this sync process, an existing node of the cluster (aka the DONOR) donates the missing write-sets to the rejoining node (aka the JOINER). In short, the flow involves applying the missing write-sets on the JOINER, just as it does during active workload replication.
Percona XtraDB Cluster / Galera can already apply write-sets in parallel using multiple applier threads. Unfortunately, due to commit contention, the commit action was serialized. This was fixed in the above Percona XtraDB Cluster release, allowing commits to proceed in parallel.
IST uses the same path for applying write-sets, except that it is more like a batch operation.
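To see why serialized commits limit the benefit of extra applier threads, here is a minimal toy model in Python. It is only a sketch: the function name `ist_apply_time`, the per-write-set costs, and the assumption that apply and commit costs are constant are all hypothetical and not taken from Percona XtraDB Cluster internals or from the measurements in this post.

```python
def ist_apply_time(write_sets, appliers, apply_cost, commit_cost, serialized_commit):
    """Estimate seconds needed to apply a batch of write-sets (toy model).

    apply_cost and commit_cost are assumed per-write-set costs in seconds.
    """
    if appliers < 1:
        raise ValueError("appliers must be >= 1")
    # Time if both apply and commit work spread evenly over the applier threads.
    parallel_time = write_sets * (apply_cost + commit_cost) / appliers
    if not serialized_commit:
        return parallel_time
    # If only one commit can run at a time, the commit phase alone sets a floor
    # that adding more applier threads cannot lower.
    commit_floor = write_sets * commit_cost
    return max(parallel_time, commit_floor)


if __name__ == "__main__":
    # Hypothetical costs, chosen only to show the shape of the curve.
    for n in (1, 2, 4, 8, 16):
        before = ist_apply_time(1_000_000, n, 50e-6, 25e-6, True)
        after = ist_apply_time(1_000_000, n, 50e-6, 25e-6, False)
        print(f"appliers={n:2d}  serialized={before:6.1f}s  parallel={after:6.1f}s")
```

In this model the serialized case stops improving once the commit floor is reached, while the parallel case keeps scaling with the number of appliers. Real behavior also depends on write-set dependencies and storage, which the model ignores.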
IST Performance
Let’s look at IST performance before and after the fix.
Setup
1. Set up a two-node cluster (node-1 and node-2), with gcache configured large enough to avoid purging, since we need IST
2. Start a workload against node-1 for 30 seconds
3. Shut down node-2
4. Start a workload that performs 4M requests against node-1. The workload produces ~3.5M write-sets that are cached in gcache and used later for IST
5. Start node-2 with N applier threads
6. Wait until IST is done
7. Repeat steps 3-6 with different values of N
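The procedure above could be scripted roughly as follows. This is a sketch under stated assumptions, not the harness used for this post: the helper names and the `4G` gcache size are hypothetical, and the post does not list the actual configuration. The only real settings referenced are `wsrep_slave_threads` (the number of applier threads), the `gcache.size` provider option, and the `wsrep_local_state_comment` status variable, which reports `Synced` once the node has caught up.

```python
import time


def galera_cnf_fragment(applier_threads, gcache_size="4G"):
    """Build a my.cnf fragment for one test run (gcache size is an assumed value)."""
    if applier_threads < 1:
        raise ValueError("applier_threads must be >= 1")
    return (
        "[mysqld]\n"
        f"wsrep_slave_threads={applier_threads}\n"
        f'wsrep_provider_options="gcache.size={gcache_size}"\n'
    )


def time_until_synced(get_state, poll_interval=1.0, timeout=3600.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until the node reports 'Synced' and return the elapsed seconds.

    get_state is a caller-supplied function returning the current value of
    wsrep_local_state_comment, for example via
    SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'. It should return
    None while the server is not yet accepting connections.
    """
    start = clock()
    while True:
        if get_state() == "Synced":
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError("node did not reach Synced state in time")
        sleep(poll_interval)
```

Timing starts when polling begins, so `time_until_synced` should be called right after starting node-2. The result approximates IST duration to within one polling interval plus server startup time.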
Observations:
- IST is 4x faster with PXC-5.7.17 (compared to previous releases)
- Improved performance means a quicker node rejoin and an overall increase in cluster productivity, as the joiner node becomes available to process the workload sooner
Conclusion
Percona XtraDB Cluster 5.7.17 significantly improved IST performance. A faster rejoin of the node effectively means better cluster productivity and more flexibility in planning maintenance windows. So what are you waiting for? Upgrade to Percona XtraDB Cluster 5.7.17 or the latest Percona XtraDB Cluster 5.7 release and experience the power!
Can you show galera.conf variables used by this test? SST section in particular. Thanks
Is this fix yet available through Codership or MariaDB?
Currently it is available only as part of PXC.