Streamlined Percona XtraDB Cluster (or anything) testing with Consul and Vagrant

Introducing Consul
I’m always interested in what Mitchell Hashimoto and HashiCorp are up to, since I typically find their projects valuable. If you’ve heard of Vagrant, you know their work.
I recently became interested in a newer project of theirs called ‘Consul’. Consul is a bit hard to describe. It is (in part):

A highly consistent metadata store (a bit like ZooKeeper)

A monitoring system (lightweight Nagios)

A service discovery system, both DNS- and HTTP-based (think of something like HAProxy, but instead of TCP load balancing, it provides DNS lookups that return only healthy services)
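The HTTP half of that service discovery can be queried directly. Below is a minimal Ruby sketch (Ruby because that is what a Vagrantfile is written in) that asks a Consul agent which instances of a service are passing their health checks, via the `/v1/health/service/<name>?passing` endpoint. It assumes the agent’s HTTP API is on Consul’s default port 8500; the method names and the sample response data are made up for illustration, and the parsing is kept separate from the HTTP call so it can be tried without a running agent.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Turn a /v1/health/service/<name> response into "address:port" strings.
# Each entry carries a "Node" and a "Service" object; when the service did
# not register its own address, fall back to the node's address.
def passing_endpoints(body)
  JSON.parse(body).map do |entry|
    node    = entry.fetch('Node')
    service = entry.fetch('Service')
    address = service['Address'].to_s.empty? ? node['Address'] : service['Address']
    "#{address}:#{service['Port']}"
  end
end

# Ask a Consul agent (assumed at localhost:8500) for instances of +service+
# that are passing all of their health checks.
def fetch_passing_endpoints(service, host: 'localhost', port: 8500)
  uri = URI("http://#{host}:#{port}/v1/health/service/#{service}?passing")
  response = Net::HTTP.get_response(uri)
  raise "Consul returned #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  passing_endpoints(response.body)
end

# Made-up response shaped like Consul's output, for trying the parser offline.
sample = <<~JSON
  [
    {"Node": {"Node": "pxc1", "Address": "172.28.128.5"},
     "Service": {"ID": "pxc", "Service": "pxc", "Port": 3306},
     "Checks": []},
    {"Node": {"Node": "pxc2", "Address": "172.28.128.6"},
     "Service": {"ID": "pxc", "Service": "pxc", "Address": "", "Port": 3306},
     "Checks": []}
  ]
JSON
puts passing_endpoints(sample)
```

On a node with a local agent, `fetch_passing_endpoints('pxc')` would return the same kind of list from live data.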

What this has to do with Percona XtraDB Cluster

I’ve had some more complex testing for Percona XtraDB Cluster (PXC) to do on my plate for quite a while, and I started to explore Consul as a tool to help with this. I already have Vagrant setups for PXC, but ensuring all the nodes are healthy, kicking off tests, gathering results, etc. were still difficult.

So, my loose goals for Consul are:

A single dashboard to ensure my testing environment is healthy
Ability to adapt to an environment of any size, from 3-node clusters up to 20+ nodes
Coordinate starting and stopping load tests running on any number of test clients
Have the ability to collect distributed test results

I’ve succeeded on some of these fronts with a Vagrant environment I’ve been working on. This spins up:

A Consul cluster (default is a single node)
Test server(s)
A PXC cluster

Additionally, it integrates the Test servers and PXC nodes with Consul such that:

The servers set up a Consul agent in client mode, joined to the Consul cluster
They also set up a local DNS forwarder that sends all DNS requests for the ‘.consul’ domain to the local agent, to be answered by the Consul cluster.
The servers register services with Consul that run local health checks
The test server(s) set up a Consul ‘watch’ that starts sysbench when a Consul ‘event’ fires.
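A minimal sketch of the agent-side configuration these bullets describe, written as Ruby that emits the JSON a Consul agent reads from its configuration directory, plus a DNS forwarder line. All of the specifics are assumptions rather than the author’s actual files: the check script path and interval, the handler path, and the choice of dnsmasq as the forwarder (pointing the `consul` domain at the agent’s DNS interface on Consul’s default port 8600).

```ruby
require 'json'

# Hypothetical 'pxc' service with a script check. The clustercheck path
# and the 10s interval are assumptions.
def pxc_service_definition(check_script: '/usr/bin/clustercheck', interval: '10s')
  {
    'service' => {
      'name' => 'pxc',
      'port' => 3306,
      'check' => { 'script' => check_script, 'interval' => interval }
    }
  }
end

# Hypothetical event watch: Consul runs +handler+ when the named event fires.
def sysbench_watch_definition(event_name: 'sysbench_update_index',
                              handler: '/usr/local/bin/sysbench_handler')
  { 'watches' => [{ 'type' => 'event', 'name' => event_name, 'handler' => handler }] }
end

# dnsmasq directive that sends *.consul queries to the agent's DNS interface.
def dnsmasq_consul_forwarder(agent_ip: '127.0.0.1', dns_port: 8600)
  format('server=/consul/%s#%d', agent_ip, dns_port)
end

puts JSON.pretty_generate(pxc_service_definition)
puts JSON.pretty_generate(sysbench_watch_definition)
puts dnsmasq_consul_forwarder
```

Each JSON document would go in its own file under the directory given to the agent’s `-config-dir` option. For script checks, Consul treats exit code 0 as passing, 1 as warning, and anything else as critical, so a check script needs to exit accordingly.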

Seeing it in action

Once I run ‘vagrant up’, I get a Consul UI I can connect to on localhost at port 8501:

Consul’s Node Overview

I can see all 5 of my nodes. I can check the services and see that test1 is failing one health check because sysbench isn’t running yet:

Consul reporting sysbench is not running.

This is expected, because I haven’t started testing yet. I can see that my PXC cluster is healthy:

Health checks are using clustercheck from the PXC package

Involving Percona Cloud Tools in the system

So far, so good. This Vagrant configuration (if I provide a PERCONA_AGENT_API_KEY in my environment) also registers my test servers with Percona Cloud Tools, so I can see data being reported there for my nodes:

Percona Cloud Tools’ dashboard for a single node

So now I am ready to begin my test. To do so, I simply need to issue a Consul event from any of the nodes:

jayj@~/Src/pxc_consul [507]$ vagrant ssh consul1
Last login: Wed Nov 26 14:32:38 2014 from 10.0.2.2
[root@consul1 ~]# consul event -name='sysbench_update_index'
Event ID: 7c8aab42-fd2e-de6c-cb0c-1de31c02ce95
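The same event can be fired without the CLI through the agent’s HTTP API, which can be handier from a test harness. A minimal Ruby sketch, assuming an agent at localhost:8500 and using the `PUT /v1/event/fire/<name>` endpoint; the method names are hypothetical.

```ruby
require 'json'
require 'net/http'

# Fire a Consul user event over the HTTP API instead of the CLI.
# Assumes an agent on localhost:8500; an optional payload is sent as the body.
def fire_event(name, payload = '', host: 'localhost', port: 8500)
  request = Net::HTTP::Put.new("/v1/event/fire/#{name}")
  request.body = payload
  response = Net::HTTP.new(host, port).request(request)
  raise "Consul returned #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  event_id(response.body)
end

# The agent answers with the event as JSON; its "ID" corresponds to the
# "Event ID" the CLI prints.
def event_id(body)
  JSON.parse(body).fetch('ID')
end

# Usage, on any node with an agent:
#   puts fire_event('sysbench_update_index')
```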

My pre-configured watcher on the test node knows what to do with that event and launches sysbench. Consul shows that sysbench is indeed running:
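A watch handler receives the watch’s data as JSON on standard input; for an event watch that is an array of recent events with fields such as `Name`, `Payload` (base64) and `LTime`, and it can include events an earlier invocation already saw. A minimal Ruby sketch of a handler that starts sysbench at most once per event, remembering the newest `LTime` in a state file. The state-file path and launcher command are hypothetical, not the author’s actual watcher.

```ruby
require 'json'

# Events named +name+ that are newer than +last_ltime+, oldest first.
# LTime is Consul's Lamport clock value for the event.
def new_events(json, name, last_ltime)
  return [] if json.strip.empty?

  events = JSON.parse(json) || []
  events.select { |e| e['Name'] == name && e['LTime'].to_i > last_ltime }
        .sort_by { |e| e['LTime'].to_i }
end

# Payloads are base64-encoded in the JSON, or null if none was sent.
def decode_payload(event)
  event['Payload'] ? event['Payload'].unpack1('m') : ''
end

# Run +launcher+ once per unseen event and remember the newest LTime in
# +state_file+, so a repeated delivery does not start sysbench twice.
def handle(input, state_file:, launcher:, name: 'sysbench_update_index')
  last = File.exist?(state_file) ? File.read(state_file).to_i : 0
  fresh = new_events(input, name, last)
  fresh.each { |event| launcher.call(decode_payload(event)) }
  File.write(state_file, fresh.last['LTime'].to_s) unless fresh.empty?
  fresh.size
end

# Wiring in a real handler script might look like (paths hypothetical):
#   handle($stdin.read,
#          state_file: '/var/tmp/sysbench_handler.ltime',
#          launcher: ->(_payload) { Process.detach(Process.spawn('/usr/local/bin/start_sysbench')) })
```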

And I can indeed see traffic start to come in on Percona Cloud Tools:

I have limited the testing traffic for this example, but that’s easily tunable via the Vagrantfile. To show something a little more impressive, here’s a 5-node cluster hitting around 2500 tps of total throughput:
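The load level comes from the sysbench settings in the Vagrantfile (the `tables`, `rows`, `threads`, `tx_rate` and `mysql_host` keys shown further down). As a minimal sketch of how such a hash might become a command line, assuming sysbench 0.5 option names and a packaged `update_index.lua` script whose path varies by distribution; the method name is hypothetical and MySQL credentials are left out:

```ruby
# Hypothetical values mirroring the Vagrantfile keys, with pxc_nodes
# assumed to be 3 (so threads = 4 * 3).
SETTINGS = {
  'tables' => 1,
  'rows' => 1_000_000,
  'threads' => 4 * 3,
  'tx_rate' => 10,
  'mysql_host' => 'pxc.service.consul'
}.freeze

# Build a sysbench 0.5-style argument list; the Lua script path is an assumption.
def sysbench_command(settings, script: '/usr/share/doc/sysbench/tests/db/update_index.lua')
  [
    'sysbench',
    "--test=#{script}",
    "--oltp-tables-count=#{settings.fetch('tables')}",
    "--oltp-table-size=#{settings.fetch('rows')}",
    "--num-threads=#{settings.fetch('threads')}",
    "--tx-rate=#{settings.fetch('tx_rate')}", # target transaction rate (tps)
    "--mysql-host=#{settings.fetch('mysql_host')}",
    '--max-requests=0', # 0 = no request limit
    '--max-time=0',     # 0 = no time limit, so it runs until stopped
    'run'
  ]
end

puts sysbench_command(SETTINGS).join(' ')
```

Passing the array straight to `system(*cmd)` or `Process.spawn(*cmd)` avoids going through a shell, so no quoting is needed.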

So to summarize thus far:

I can spin up any size cluster I want and verify it is healthy with Consul’s UI
I can spin up any number of test servers and kick off sysbench on all of them simultaneously

Another big trick of Consul’s

So far so good, but let me point out a few things that may not be obvious. If you check the Vagrantfile, I use a Consul hostname in a few places. First, on the test servers:

            # sysbench setup
            'tables' => 1,
            'rows' => 1000000,
            'threads' => 4 * pxc_nodes,
            'tx_rate' => 10,
            'mysql_host' => 'pxc.service.consul'

Then again, in the PXC server configuration:

          # PXC setup
          "percona_server_version"  => pxc_version,
          'innodb_buffer_pool_size' => '1G',
          'innodb_log_file_size' => '1G',
          'innodb_flush_log_at_trx_commit' => '0',
          'pxc_bootstrap_node' => (i == 1 ? true : false ),
          'wsrep_cluster_address' => 'gcomm://pxc.service.consul',
          'wsrep_provider_options' => 'gcache.size=2G; gcs.fc_limit=1024',

Notice ‘pxc.service.consul’. This hostname is provided by Consul and resolves to the IPs of all current servers that have the ‘pxc’ service registered and are passing its health check:

[root@test1 ~]# host pxc.service.consul
pxc.service.consul has address 172.28.128.7
pxc.service.consul has address 172.28.128.6
pxc.service.consul has address 172.28.128.5
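Because this is plain DNS, any client can consume it. A minimal Ruby sketch using the standard `resolv` library, assuming the agent’s DNS interface is on Consul’s default port 8600 on localhost (with the local forwarder described earlier, `Resolv.getaddresses('pxc.service.consul')` through the system resolver should also work). The method names are hypothetical, and the resolver is passed in so the logic can be exercised without a live agent.

```ruby
require 'resolv'

# Resolver aimed straight at a Consul agent's DNS interface. 127.0.0.1:8600
# (Consul's default DNS port) is an assumption about where the agent listens.
def consul_resolver(host: '127.0.0.1', port: 8600)
  Resolv::DNS.new(nameserver_port: [[host, port]])
end

# Every address currently returned for a Consul service name, as strings.
def service_addresses(resolver, name = 'pxc.service.consul')
  resolver.getaddresses(name).map(&:to_s)
end

# A client that wants one backend can take the first answer; since the
# order varies between lookups, repeated connections spread across nodes.
def pick_backend(resolver, name = 'pxc.service.consul')
  service_addresses(resolver, name).first
end

# Usage on a node with a local agent:
#   puts service_addresses(consul_resolver)
```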

So I am using this to my advantage in two ways:

My PXC cluster bootstraps the first node automatically, but all the other nodes use this hostname for their wsrep_cluster_address. This means there are no specific hostnames or IPs in the my.cnf file, and the hostname is always up to date with which nodes are active in the cluster, which is precisely the list that should be in wsrep_cluster_address at any given moment.
My test servers connect to this hostname, so they always know where to connect, and they round-robin across different nodes (given enough sysbench threads and PXC nodes) based on the DNS response, which returns 3 of the active nodes in a different order each time.
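One common way to express the bootstrap/join split is sketched below in Ruby. It is an illustration only, and the author’s Vagrant setup may bootstrap node 1 differently (for example through its `pxc_bootstrap_node` flag). It relies on the fact that a joining node only needs to reach one active member, so the Consul name just has to resolve to at least one healthy node. The method names are hypothetical.

```ruby
# Hypothetical helper: the wsrep_cluster_address a node should start with.
# An empty gcomm:// tells Galera to start a new cluster (bootstrap); every
# other node points at the Consul name, which only has to resolve to at
# least one healthy member for the join to work.
def wsrep_cluster_address(bootstrap:, service_name: 'pxc.service.consul')
  bootstrap ? 'gcomm://' : "gcomm://#{service_name}"
end

# Static alternative: a comma-separated list of member IPs, e.g. taken
# from a DNS or HTTP lookup at provisioning time.
def wsrep_cluster_address_from(ips)
  "gcomm://#{ips.join(',')}"
end

# Mirrors the Vagrantfile's (i == 1 ? true : false) choice for a 3-node cluster.
(1..3).each do |i|
  puts "pxc#{i}: wsrep_cluster_address = #{wsrep_cluster_address(bootstrap: i == 1)}"
end
```

A caveat with the empty `gcomm://` form: if it stays in my.cnf, restarting that node starts a brand-new cluster instead of rejoining, so it is usually applied only for the very first start. That first-start decision is where the leader election mentioned below could help.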

(Some of) The Issues

This is still a work in progress and there are many improvements that could be made:

I’m relying on Percona Cloud Tools (PCT) to collect my data, but it’d be nice to use Consul’s central key/value store to hold the results of the independent sysbench runs.
Consul’s leader election could be used to help the cluster decide which node should bootstrap on first startup; for now I simply assume node1 should bootstrap.
Bugs in various pieces of software still make this clunky to manage at times. A sample:
- Consul events sometimes don’t fire in the current release (though a fix looks to be coming soon)
- PXC joining nodes sometimes get stuck, putting speed bumps into the automated deployment.
- Automated installs of percona-agent (which sends data to Percona Cloud Tools) are straightforward, except when different cluster nodes clobber each other’s credentials.
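The first two ideas map onto Consul’s key/value HTTP API. A minimal Ruby sketch, assuming an agent at localhost:8500: results are written with `PUT /v1/kv/<key>`, and the bootstrap decision creates a session (`PUT /v1/session/create`) and then tries `PUT /v1/kv/<key>?acquire=<session>`, which only one session can win at a time. The class name and key names are hypothetical.

```ruby
require 'json'
require 'net/http'

# Minimal Consul KV sketch; assumes an agent at localhost:8500.
# The key names (sysbench/results/<node>, pxc/bootstrap) are hypothetical.
class ConsulKV
  def initialize(host: 'localhost', port: 8500)
    @http = Net::HTTP.new(host, port)
  end

  # Save one test client's sysbench summary; Consul answers "true" on success.
  def store_result(node, text)
    put("/v1/kv/sysbench/results/#{node}", text) == 'true'
  end

  # Read a key back, or nil if it does not exist.
  def get(key)
    response = @http.get("/v1/kv/#{key}")
    return nil if response.code == '404'

    self.class.decode_value(response.body)
  end

  # Bootstrap election: create a session, then try to acquire the lock key
  # with it. Only one session can hold the lock, so only one node gets true.
  def acquire_bootstrap(node)
    created = put('/v1/session/create', JSON.generate({ 'Name' => "pxc-bootstrap-#{node}" }))
    session_id = JSON.parse(created).fetch('ID')
    put("/v1/kv/pxc/bootstrap?acquire=#{session_id}", node) == 'true'
  end

  # KV reads return a JSON array whose "Value" is base64-encoded (or null).
  def self.decode_value(body)
    entry = JSON.parse(body).first
    entry && entry['Value'] ? entry['Value'].unpack1('m') : nil
  end

  private

  def put(path, body)
    request = Net::HTTP::Put.new(path)
    request.body = body
    @http.request(request).body.to_s.strip
  end
end
```

On first start, each PXC node could call `ConsulKV.new.acquire_bootstrap(hostname)` and bootstrap only if it gets `true`, joining via `pxc.service.consul` otherwise. By default a session is tied to its node’s health, so the lock should be released if the winning node’s agent fails.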

So, in summary, I am happy with how easily Consul integrates and I’m already finding it useful for a product in its 0.4.1 release.


epcim

8 years ago

Hi, I am about to use percona + consul for production. I am about to avoid consul-template since I already use chef for configuration.

Can you please confirm that using 'wsrep_cluster_address' => 'gcomm://pxc.service.consul' will result in using/specifying “all” the hosts registered in the ‘pxc’ service defined in Consul? Like one..N IP addresses? It obviously depends on what Percona does with the “gcomm://” handler.

Thanks PMi

Jay Janssen

Author

8 years ago

Hi epcim — wsrep_cluster_address does not demand every active cluster node be listed. It is used only when the local node is starting and it only needs to find ONE active node in the cluster. pxc.service.consul only needs to give a single ip of another active node for the local node to successfully join.

epcim

8 years ago

Thanks!
