Comments on: InnoDB Flushing in Action for Percona Server for MySQL

By: Victor Machuca

Victor Machuca — Thu, 13 Feb 2020 14:58:58 +0000

This is relevant to us, since we are seeing a boost in performance on MySQL percona-server-5.6.27-76.0 by increasing lru_scan_depth in line with io_capacity. However, we still see periods in which buf_flush_lru_manager_thread sleeps for a good 2 minutes while all the foreground threads sleep waiting on it.

By: mtb2020

mtb2020 — Mon, 27 Jan 2020 17:51:15 +0000

In reply to Yves Trudeau. I'm seeing sharp checkpoints where all dirty pages are flushed and the checkpoint age goes to 0. This was happening roughly daily when our log file size was 1GB, and now it happens roughly every 3 days after moving the log file size to 10GB. It is not preceded by any metric hitting max_checkpoint_age, far from it. I'm unclear whether this sharp checkpointing behavior is expected and I'm seeing a problem because my writes are slow and too many dirty pages have built up, or whether people normally operate MySQL in a state of fuzzy checkpointing where it never does this sharp checkpoint/full flush.

I'm not sure if it's because flushing is too slow or too late. Are those questions answered by any pre-existing PMM dashboards? I will check, and may have to add some to answer further. No existing dashboard seems to have a leading indicator of this behavior; it's not preceded by any large volume of database foreground or background activity that I can tell.

I was reaching a theory that flushing was too lazy and wasn't reaching into the older LSNs, so that when this 'occasional logical operation where everything older must be flushed' ran, it found too much old stuff and flushed everything. But if that process checks every time it writes to the log file, that doesn't make as much sense to me.

I've since started to realize that it wasn't really a direct problem with the InnoDB log file size being too small, and have started exploring tuning other things (mostly innodb_max_dirty_pages_pct and its lwm, and innodb_adaptive_flushing_lwm) to make flushing more aggressive and to kick in adaptive flushing with those large log files. I'm seeing those changes affect the metrics for checkpoint age and dirty pages. Time will tell if the periodic full flush still happens after those changes. I'm guessing the follow-up blog post about tuning guidance will be very helpful to me as well!
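
For tracking the checkpoint age metric mentioned here outside of a dashboard, the following is a minimal sketch, assuming the LOG section of SHOW ENGINE INNODB STATUS prints single-number "Log sequence number" and "Last checkpoint at" lines, as 5.6 does. The function name and the sample numbers are made up for illustration; nothing here connects to a server.

```python
import re


def parse_checkpoint_age(status_text: str) -> int:
    """Return current LSN minus last checkpoint LSN, in bytes.

    Assumes the LOG section format used by MySQL 5.6, where each
    value is printed as a single integer.
    """
    lsn = re.search(r"Log sequence number\s+(\d+)", status_text)
    ckpt = re.search(r"Last checkpoint at\s+(\d+)", status_text)
    if lsn is None or ckpt is None:
        raise ValueError("LOG section not found in status text")
    return int(lsn.group(1)) - int(ckpt.group(1))


# Illustrative status fragment with invented numbers.
SAMPLE = """
---
LOG
---
Log sequence number 1000000
Log flushed up to   1000000
Pages flushed up to 700000
Last checkpoint at  400000
"""

age = parse_checkpoint_age(SAMPLE)
print(f"checkpoint age: {age} bytes")
```

Sampling this value at a fixed interval would show whether the age climbs steadily between the full flushes or drops without warning.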

By: Yves Trudeau

Yves Trudeau — Mon, 27 Jan 2020 16:16:43 +0000

In reply to mtb2020. @mtb2020, I just looked at the InnoDB code (5.7.28): the checkpoint value is updated as part of the log_io_complete operation, so that's every time the log files are written to. In your case, if I understand correctly, the flushing is too slow or occurs too late. Does the flushing speed, the number of pages written by InnoDB every second, match io_cap_max? How many pending writes do you have?
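
One way to answer the flushing-speed question is to sample a cumulative flushed-pages counter twice and compare the rate with the configured ceiling. This is a minimal sketch, assuming the samples come from the Innodb_buffer_pool_pages_flushed status counter and that io_cap_max refers to innodb_io_capacity_max; the function names and the numbers are hypothetical.

```python
def flush_rate(flushed_start: int, flushed_end: int, interval_seconds: float) -> float:
    """Pages flushed per second between two samples of a cumulative counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (flushed_end - flushed_start) / interval_seconds


def io_capacity_utilization(rate: float, io_capacity_max: int) -> float:
    """Fraction of the configured maximum flush rate actually in use."""
    if io_capacity_max <= 0:
        raise ValueError("io_capacity_max must be positive")
    return rate / io_capacity_max


# Invented samples taken 60 seconds apart, with an assumed ceiling of 2000.
rate = flush_rate(1_200_000, 1_260_000, 60)
utilization = io_capacity_utilization(rate, 2000)
print(f"{rate:.0f} pages/s, {utilization:.0%} of io_capacity_max")
```

A rate that stays well below the ceiling while dirty pages accumulate would point toward flushing that starts too late rather than a storage limit.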

By: mtb2020

mtb2020 — Mon, 27 Jan 2020 14:57:54 +0000

This post is extremely relevant to me right now; I'm struggling to prevent sharp checkpoints in a Percona Server 5.6 instance. I went through one round of attempting to simply increase log file sizes, but I think I was just moving the timing of the problem and creating a scenario where more dirty pages were allowed to pile up before the sharp checkpoint occurred.

This blog post discusses what I think I am seeing: https://www.percona.com/blog/2012/02/17/the-relationship-between-innodb-log-checkpointing-and-dirty-buffer-pool-pages/ “The checkpoint process is really a logical operation. It occasionally (as chunks of dirty pages get flushed) has a look through the dirty pages in the buffer pool to find the one with the oldest LSN, and that’s the Checkpoint. Everything older must be fully flushed.”
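
The quoted description can be pictured with a toy model. This is a minimal sketch of the idea only, not InnoDB's actual code: the DirtyPage class and the function names are hypothetical, and the LSN values are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DirtyPage:
    page_id: int
    oldest_modification_lsn: int  # LSN of the first unflushed change to the page


def checkpoint_lsn(dirty_pages: List[DirtyPage], current_lsn: int) -> int:
    """The checkpoint is the oldest LSN still held by any dirty page.

    With no dirty pages, everything is on disk and the checkpoint
    catches up to the current LSN.
    """
    if not dirty_pages:
        return current_lsn
    return min(p.oldest_modification_lsn for p in dirty_pages)


def checkpoint_age(dirty_pages: List[DirtyPage], current_lsn: int) -> int:
    """Amount of redo log that cannot be reused yet."""
    return current_lsn - checkpoint_lsn(dirty_pages, current_lsn)


pages = [DirtyPage(1, 900), DirtyPage(2, 400), DirtyPage(3, 750)]
age_before = checkpoint_age(pages, current_lsn=1000)

# Flushing only the page with the oldest change moves the checkpoint forward.
pages = [p for p in pages if p.page_id != 2]
age_after = checkpoint_age(pages, current_lsn=1000)
print(age_before, age_after)
```

In this model a single page left dirty for a long time holds the checkpoint back no matter how many newer pages are flushed, which is why the order in which pages are flushed matters as much as the volume.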

Is there any information about what triggers the ‘checkpoint process to find the [page] with the oldest LSN’, and how to influence it so that the page is not so old and there is less to flush? I'm guessing the LRU parameter will be a good thing to tune for this.