Timezone and pt-table-checksum

I recently worked through an issue with a client trying to detect data drift across some servers that were located in different timezones. Unfortunately, several of the tables had timestamp fields and were set to a default value of CURRENT_TIMESTAMP. From the manual, here is how MySQL handles timezone locality with timestamp fields:

Values for TIMESTAMP columns are converted from the current time zone to UTC for storage, and from UTC to the current time zone for retrieval.

This can result in every row being deemed out of sync when each server is using in it’s own timezone. Fortunately, there is a workaround for this as a result of how pt-table-checksum/pt-table-sync calculate their checksums. Lets look at an example:

Node1 > PDT > select timediff(now(),convert_tz(
 -> now(),
 -> @@session.time_zone,'+00:00'
 -> )) as tz_offset;
+-----------+
| tz_offset |
+-----------+
| -07:00:00 |
+-----------+

Node1 > PDT > select timediff(now(),convert_tz(

-> now(),

-> @@session.time_zone,'+00:00'

-> )) as tz_offset;

+-----------+

| tz_offset |

+-----------+

| -07:00:00 |

+-----------+

Node2 > EDT > select timediff(now(),convert_tz(
 -> now(),
 -> @@session.time_zone,'+00:00'
 -> )) as tz_offset;
+-----------+
| tz_offset |
+-----------+
| -04:00:00 |
+-----------+

Node2 > EDT > select timediff(now(),convert_tz(

-> now(),

-> @@session.time_zone,'+00:00'

-> )) as tz_offset;

+-----------+

| tz_offset |

+-----------+

| -04:00:00 |

+-----------+

Node1 is configured in PDT time while node2 is configured in EDT time. Now, lets insert some rows into a table with timestamp field configured to use CURRENT_TIMESTAMP as the default and verify that they look different when no timezone manipulation takes place:

Node1 > PDT > select * from foo.bar;
+----+-------------+---------------------+
| id | data        | date_created        |
+----+-------------+---------------------+
| 1  | node1, row1 | 2012-09-24 09:38:49 |
| 2  | node1, row2 | 2012-09-24 09:38:49 |
| 3  | node1, row3 | 2012-09-24 09:38:49 |
+----+-------------+---------------------+

Node1 > PDT > select * from foo.bar;

+----+-------------+---------------------+

| id | data | date_created |

+----+-------------+---------------------+

| 1 | node1, row1 | 2012-09-24 09:38:49 |

| 2 | node1, row2 | 2012-09-24 09:38:49 |

| 3 | node1, row3 | 2012-09-24 09:38:49 |

+----+-------------+---------------------+

Node2 > EDT > select * from foo.bar;
+----+-------------+---------------------+
| id | data        | date_created        |
+----+-------------+---------------------+
| 1  | node1, row1 | 2012-09-24 12:38:49 |
| 2  | node1, row2 | 2012-09-24 12:38:49 |
| 3  | node1, row3 | 2012-09-24 12:38:49 |
+----+-------------+---------------------+

Node2 > EDT > select * from foo.bar;

+----+-------------+---------------------+

| id | data | date_created |

+----+-------------+---------------------+

| 1 | node1, row1 | 2012-09-24 12:38:49 |

| 2 | node1, row2 | 2012-09-24 12:38:49 |

| 3 | node1, row3 | 2012-09-24 12:38:49 |

+----+-------------+---------------------+

So, when we run pt-table-checksum against node1, we see that even though MySQL is storing these values in UTC internally, we have “data drift” on both nodes:

pt-table-checksum –replicate=percona.checksums –databases=foo h=localhost

Node1 > PDT > select this_crc, master_crc from percona.checksums;
+----------+------------+
| this_crc | master_crc |
+----------+------------+
| 763d97f1 | 763d97f1   |
+----------+------------+

Node1 > PDT > select this_crc, master_crc from percona.checksums;

+----------+------------+

| this_crc | master_crc |

+----------+------------+

| 763d97f1 | 763d97f1 |

+----------+------------+

Node2 > EDT > select this_crc, master_crc from percona.checksums;
+----------+------------+
| this_crc | master_crc |
+----------+------------+
| b890c395 | 763d97f1   |
+----------+------------+

Node2 > EDT > select this_crc, master_crc from percona.checksums;

+----------+------------+

| this_crc | master_crc |

+----------+------------+

| b890c395 | 763d97f1 |

+----------+------------+

Naturally, pt-table-sync finds this and reports that all 3 rows in this chunk are different and gives the REPLACE INTO statements bring node2 in sync:

[root@test-master mbenshoof]# pt-table-sync --print --replicate percona.checksums --sync-to-master h=node2
REPLACE INTO `foo`.`bar`(`id`, `data`, `date_created`) VALUES ('1', 'node1, row1', '2012-09-24 09:38:49') /*percona-toolkit src_db:foo src_tbl:bar src_dsn:P=3306,h=192.168.1.45,p=...,u=percona dst_db:foo dst_tbl:bar dst_dsn:h=192.168.1.46,p=...,u=percona lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1821 user:root host:test-master*/;
REPLACE INTO `foo`.`bar`(`id`, `data`, `date_created`) VALUES ('2', 'node1, row2', '2012-09-24 09:38:49') /*percona-toolkit src_db:foo src_tbl:bar src_dsn:P=3306,h=192.168.1.45,p=...,u=percona dst_db:foo dst_tbl:bar dst_dsn:h=192.168.1.46,p=...,u=percona lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1821 user:root host:test-master*/;
REPLACE INTO `foo`.`bar`(`id`, `data`, `date_created`) VALUES ('3', 'node1, row3', '2012-09-24 09:38:49') /*percona-toolkit src_db:foo src_tbl:bar src_dsn:P=3306,h=192.168.1.45,p=...,u=percona dst_db:foo dst_tbl:bar dst_dsn:h=192.168.1.46,p=...,u=percona lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1821 user:root host:test-master*/;

[root@test-master mbenshoof]# pt-table-sync --print --replicate percona.checksums --sync-to-master h=node2

REPLACE INTO `foo`.`bar`(`id`, `data`, `date_created`) VALUES ('1', 'node1, row1', '2012-09-24 09:38:49') /*percona-toolkit src_db:foo src_tbl:bar src_dsn:P=3306,h=192.168.1.45,p=...,u=percona dst_db:foo dst_tbl:bar dst_dsn:h=192.168.1.46,p=...,u=percona lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1821 user:root host:test-master*/;

REPLACE INTO `foo`.`bar`(`id`, `data`, `date_created`) VALUES ('2', 'node1, row2', '2012-09-24 09:38:49') /*percona-toolkit src_db:foo src_tbl:bar src_dsn:P=3306,h=192.168.1.45,p=...,u=percona dst_db:foo dst_tbl:bar dst_dsn:h=192.168.1.46,p=...,u=percona lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1821 user:root host:test-master*/;

REPLACE INTO `foo`.`bar`(`id`, `data`, `date_created`) VALUES ('3', 'node1, row3', '2012-09-24 09:38:49') /*percona-toolkit src_db:foo src_tbl:bar src_dsn:P=3306,h=192.168.1.45,p=...,u=percona dst_db:foo dst_tbl:bar dst_dsn:h=192.168.1.46,p=...,u=percona lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1821 user:root host:test-master*/;

This is where we can use the @@session.time_zone variable to our advantage. If both servers were set up using UTC as the timezone (which is the recommended practice – store everything in UTC and then perform the timezone logic in the application), we wouldn’t see any data drift. And as explained above, the timestamps are actually stored in UTC on the server regardless of the timezone. So, to overcome the timezone inconsistencies, we can use the –set-vars option with pt-table-checksum like this:

pt-table-checksum –replicate percona.checksums –databases=foo –set-vars=”time_zone=’+00:00′” h=localhost

This effectively sets both servers to UTC (in terms of display logic) and the checksums are calculated in the same locality:

Node1 > PDT > select this_crc, master_crc from percona.checksums;
+----------+------------+
| this_crc | master_crc |
+----------+------------+
| 4cdfe786 | 4cdfe786   |
+----------+------------+

Node1 > PDT > select this_crc, master_crc from percona.checksums;

+----------+------------+

| this_crc | master_crc |

+----------+------------+

| 4cdfe786 | 4cdfe786 |

+----------+------------+

Node2 > EDT > select this_crc, master_crc from percona.checksums;
+----------+------------+
| this_crc | master_crc |
+----------+------------+
| 4cdfe786 | 4cdfe786   |
+----------+------------+

Node2 > EDT > select this_crc, master_crc from percona.checksums;

+----------+------------+

| this_crc | master_crc |

+----------+------------+

| 4cdfe786 | 4cdfe786 |

+----------+------------+

This same option can and should be applied to pt-table-sync as well, especially if the checksums were created without it. Since it uses the checksum table to find the chunks but still recalculates the checksums on the fly to get the actual rows in need of syncing, it will not find those rows even though pt-table-checksum reported the chunk as out of sync. You can verify this yourself looking at the actual SQL generated by pt-table-sync when looking at rows:

SELECT /*foo.bar:1/1*/ 0 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS(‘#’, id, data, date_created + 0, CONCAT(ISNULL(data)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM foo.bar FORCE INDEX (PRIMARY) WHERE (1=1) FOR UPDATE;

When you update your session timezone (SET @@session.time_zone = “+01:00” for example), you will see different output for this. Experiment with different timezones and each time, the checksum will change. However, the underlying data isn’t touched and remains the same.

In general, it is safest to always use UTC as your timezone for MySQL to prevent this extra logic from being required, but the use of –set-vars can be a very powerful tool when you need to influence the session variables on different servers for whatever reason.

1 Comment

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Derek Downey

11 years ago

Hey Mike, thanks for sharing this experience. I’ve never had to manage cross-timezone servers, but this is an apt reminder to ‘plan for the future’ and so I’ll be making sure my server config files include the default_time_zone option…just in case I forgot to set the server locale to be UTC 😉

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Timezone and pt-table-checksum

Related

Related Blog Articles

RECOMMENDED ARTICLES

New Valkey Packages by Percona

Can We Set up a Replicate Filter Within the Percona XtraDB Cluster?

Valkey/Redis: Not-So-Good Practices

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Timezone and pt-table-checksum

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

New Valkey Packages by Percona

Can We Set up a Replicate Filter Within the Percona XtraDB Cluster?

Valkey/Redis: Not-So-Good Practices

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation