In this post, we’ll address how MaxScale monitors servers. We saw in the previous post how we could deal with high availability (HA) and read-write split using MaxScale.
If you remember from the previous post, we used this section to monitor replication:
[Replication Monitor]
type=monitor
module=mysqlmon
servers=percona1, percona2, percona3
user=maxscale
passwd=264D375EC77998F13F4D0EC739AABAD4
monitor_interval=1000
script=/usr/local/bin/failover.sh
events=master_down
But what are we monitoring? We are monitoring the assignment of master and slave roles inside MaxScale according to the actual replication tree in the cluster, using the default check from the mysqlmon monitoring module.
There are other monitoring modules available with MaxScale (galeramon for Galera Cluster, for example), but mysqlmon is the one we need for standard replication.
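As a rough sketch, that default check boils down to information mysqlmon can read from each backend at every monitor_interval. The statements below illustrate the kind of data involved; they are an assumption on my part, not the module's actual implementation:

-- run against each backend, roughly every monitor_interval milliseconds (illustrative sketch)
SELECT @@server_id;   -- identifies the node (the "Node Id" in the maxadmin output below)
SHOW SLAVE STATUS;    -- Master_Server_Id, Slave_IO_Running, Slave_SQL_Running
-- a node with running slave threads is marked "Slave"; the node its Master_Server_Id
-- points to is marked "Master", which is how the replication tree is derived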
So back to our setup. MaxScale monitors the roles of our servers involved in replication. We can see the status of every server like this:
# maxadmin -pmariadb show server percona2
Server 0x1cace90 (percona2)
    Server:                         192.168.90.3
    Status:                         Slave, Running
    Protocol:                       MySQLBackend
    Port:                           3306
    Server Version:                 5.6.28-76.1-log
    Node Id:                        2
    Master Id:                      1
    Slave Ids:
    Repl Depth:                     1
    Number of connections:          0
    Current no. of conns:           0
    Current no. of operations:      0
Now if we stop the slave, we can see:
# maxadmin -pmariadb show server percona2
Server 0x1cace90 (percona2)
    Server:                         192.168.90.3
    Status:                         Running
    Protocol:                       MySQLBackend
    Port:                           3306
    Server Version:                 5.6.28-76.1-log
    Node Id:                        2
    Master Id:                      -1
    Slave Ids:
    Repl Depth:                     -1
    Number of connections:          40
    Current no. of conns:           0
    Current no. of operations:      0

# maxadmin -pmariadb list servers
Servers.
-------------------+-----------------+-------+-------------+--------------------
Server             | Address         | Port  | Connections | Status
-------------------+-----------------+-------+-------------+--------------------
percona1           | 192.168.90.2    |  3306 |           0 | Master, Running
percona2           | 192.168.90.3    |  3306 |           0 | Running
percona3           | 192.168.90.4    |  3306 |           0 | Slave, Running
-------------------+-----------------+-------+-------------+--------------------
and in the MaxScale logs:
2016-02-23 14:29:09   notice : Server changed state: percona2[192.168.90.3:3306]: lost_slave
Now if the slave is lagging, nothing happens, and we keep sending reads to a slave that is not up to date. 🙁
To avoid that situation, we can add to the “[Replication Monitor]” section the following parameter:
detect_replication_lag=true
If we do so, MaxScale (if it has enough privileges) will create a schema maxscale_schema containing a table replication_heartbeat. This table is used to verify the replication lag, much like pt-heartbeat does.
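For that to happen, the maxscale user needs privileges to create and write to this schema. The exact grants below are an assumption for illustration; check the MaxScale documentation for your version:

-- assumed grants and host pattern; adjust to your environment
GRANT CREATE, SELECT, INSERT, UPDATE, DELETE ON maxscale_schema.* TO 'maxscale'@'%';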
When enabled, after we restart MaxScale, we can see the slave lag:
# maxadmin -pmariadb show server percona2
Server 0x2784f00 (percona2)
    Server:                         192.168.90.3
    Status:                         Slave, Running
    Protocol:                       MySQLBackend
    Port:                           3306
    Server Version:                 5.6.28-76.1-log
    Node Id:                        2
    Master Id:                      1
    Slave Ids:
    Repl Depth:                     1
    Slave delay:                    670
    Last Repl Heartbeat:            Tue Feb 23 14:25:24 2016
    Number of connections:          0
    Current no. of conns:           0
    Current no. of operations:      0
Does this mean that now the node won’t be reached (no queries will be routed to it)?
Let’s check:
percona3 mysql> select @@hostname;
+------------+
| @@hostname |
+------------+
| percona2   |
+------------+
That doesn’t sound good…
# maxadmin -pmariadb show server percona2
Server 0x2784f00 (percona2)
    Server:                         192.168.90.3
    Status:                         Slave, Running
    Protocol:                       MySQLBackend
    Port:                           3306
    Server Version:                 5.6.28-76.1-log
    Node Id:                        2
    Master Id:                      1
    Slave Ids:
    Repl Depth:                     1
    Slave delay:                    1099
    Last Repl Heartbeat:            Tue Feb 23 14:25:24 2016
    Number of connections:          1
    Current no. of conns:           1
    Current no. of operations:      0
We can see that there is 1 current connection.
How come? The monitoring actually works as expected, but we didn’t configure our Splitter Service to avoid the lagging slave.
We need to configure it like this:
[Splitter Service]
type=service
router=readwritesplit
servers=percona1, percona2
max_slave_replication_lag=30
...
And now, if the slave lags for 30 seconds or more, it won’t be used.
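A quick way to verify this is to re-run the commands shown earlier in this post and watch the Slave delay and the connection counters; with the delay above 30 seconds, percona2 should stop receiving new connections:

# same commands as earlier in this post
maxadmin -pmariadb show server percona2   # check "Slave delay" and "Current no. of conns"
maxadmin -pmariadb list servers           # check the Connections column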
But what happens if, for any reason, we need to stop all the slaves (or if replication breaks)?
To find out, I performed a STOP SLAVE; on percona2 and percona3. This is what we see in the logs:
2016-02-23 22:55:16   notice : Server changed state: percona2[192.168.90.3:3306]: lost_slave
2016-02-23 22:55:34   notice : Server changed state: percona1[192.168.90.2:3306]: lost_master
2016-02-23 22:55:34   notice : Server changed state: percona3[192.168.90.4:3306]: lost_slave
2016-02-23 22:55:34   error  : No Master can be determined. Last known was 192.168.90.2:3306
2016-02-23 22:55:45   error  : Couldn't find suitable Master from 2 candidates.
2016-02-23 22:55:45   error  : 140003532506880 [session_alloc] Error : Failed to create Splitter Service session because router could not establish a new router session, see earlier error.
2016-02-23 22:55:46   error  : Couldn't find suitable Master from 2 candidates.
2016-02-23 22:55:46   error  : 140003542996736 [session_alloc] Error : Failed to create Splitter Service session because router could not establish a new router session, see earlier error.
If there are no more slaves, the master is not a master anymore, and the routing doesn’t work. The service is unavailable!
As soon as we start a slave, the service is back:
2016-02-23 22:59:17   notice : Server changed state: percona3[192.168.90.4:3306]: new_slave
2016-02-23 22:59:17   notice : A Master Server is now available: 192.168.90.2:3306
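The new_slave event above was simply the result of restarting replication on one of the nodes (percona3, judging from the log), in the same prompt style used earlier:

percona3 mysql> START SLAVE;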
Can we avoid this situation when all slaves are stopped?
Yes we can, but we need to add the following line to the monitoring section:
detect_stale_master=true
If we stop the two slaves again, in MaxScale’s log we can now read:
2016-02-23 23:02:19   notice : Server changed state: percona2[192.168.90.3:3306]: lost_slave
2016-02-23 23:02:46   warning: [mysql_mon]: root server [192.168.90.2:3306] is no longer Master, let's use it again even if it could be a stale master, you have been warned!
2016-02-23 23:02:46   notice : Server changed state: percona3[192.168.90.4:3306]: lost_slave
And we can still connect to our service and use the single master.
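For reference, here is the full [Replication Monitor] section with both options from this post added, keeping the same values we started with:

[Replication Monitor]
type=monitor
module=mysqlmon
servers=percona1, percona2, percona3
user=maxscale
passwd=264D375EC77998F13F4D0EC739AABAD4
monitor_interval=1000
detect_replication_lag=true
detect_stale_master=true
script=/usr/local/bin/failover.sh
events=master_down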
Next time we will see how the read-write split works.