In this blog post, I look for the bottleneck that prevented the tests in my previous post from achieving better results.
The powerful machine I used for those tests has a comparatively slow disk, so I expected the tests to hit a point where performance could not increase further because of disk speed.
Hardware configuration:
Processors: physical = 4, cores = 72, virtual = 144, hyperthreading = yes
Memory: 3.0T
Disk speed: about 3K IOPS
OS: CentOS 7.1.1503
File system: XFS
Versions tested and configuration: same as in the first post of this series (check the post for specifics).
Even though I expected disk speed to become the limiting factor, I did not observe high IO rates in the iostat output. I had already tested with a full data set that fits in memory; in that case, write performance affects only data flushes and log writes, but we should still see a visible decrease in speed. So I decided to try read-write tests entirely in memory: I created a ramdisk and put the MySQL datadir on it. Surprisingly, the results on the SSD and on the ramdisk did not differ.
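A ramdisk for such a test can be created with tmpfs. A minimal sketch, assuming a 200G mount point and default Linux paths (the mount point, size, and datadir locations here are my illustrative assumptions, and the commands need root):

```shell
# Mount a tmpfs "ramdisk" large enough for the data set (size is an example).
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=200G tmpfs /mnt/ramdisk

# Stop MySQL, copy the datadir onto the ramdisk, and point mysqld at it
# (service name and paths are hypothetical; adjust to your installation).
systemctl stop mysqld
cp -rp /var/lib/mysql /mnt/ramdisk/
chown -R mysql:mysql /mnt/ramdisk/mysql
mysqld --datadir=/mnt/ramdisk/mysql &
```

Note that tmpfs contents disappear on reboot, so this setup is only suitable for throwaway benchmark data.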
I asked my colleagues from “Postgres Professional” to test PostgreSQL with the ramdisk. They got similar results:
It’s interesting that the value of innodb_io_capacity had no effect in this situation. The data for the graph below was taken while I ran tests on the ramdisk: I wanted to see whether I could use this variable to throttle the IO activity of a device that is extremely fast by default.
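For reference, this is how the setting can be varied between benchmark runs without a restart (the numbers here are arbitrary examples, not recommendations):

```shell
# innodb_io_capacity is a dynamic variable; innodb_io_capacity_max bounds
# the burst rate InnoDB may use during aggressive background flushing.
mysql -e "SET GLOBAL innodb_io_capacity = 2000"
mysql -e "SET GLOBAL innodb_io_capacity_max = 4000"
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_io_capacity%'"
```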
This totally contradicts all my past experience with smaller machines. Percona re-purposed the machine with the faster disk (which I used before, described in this post), so I used a similar machine with slower disk speed.
Hardware configuration:
Processors: physical = 2, cores = 12, virtual = 24, hyperthreading = yes
Memory: 47.2G
Disk speed: about 3K IOPS
OS: Ubuntu 14.04.5 LTS (trusty)
File system: ext4
Again, on this machine with its smaller number of CPU cores, the innodb_io_capacity benchmarks showed more predictable results.
Conclusion:
Both MySQL and PostgreSQL on a machine with a large number of CPU cores hit CPU resource limits before disk speed can start affecting performance. We tested only one scenario, however; with other scenarios, the results might be different.
Sveta, this is a very interesting observation indeed. However, I was hoping that you would have kept using the same filesystem in both scenarios, any reasons for not keeping it the same? If they were actually different, do we risk this comparison becoming a filesystem benchmark?
Do you mean the Percona and Freematiq machines? I did not try to compare them. But the Percona machines are more like those I see when working on Support issues, and they demonstrate the behavior I am used to. So I use them as a baseline and to check that I am not doing something wrong with the test itself.
This is certainly not a filesystem benchmark, because the workload is not disk-bound by nature. Still, I expected to hit disk performance issues on the Freematiq machine at some point, and I did not.
The filesystem is different.
Yes, but I am not comparing filesystems, or even the Freematiq and Percona machines. I used the smaller machine to show the behavior I expect based on my Support experience. See my more detailed reply to the previous comment.
If hundreds or thousands of queries of faster speed ain’t worth minding, what is?
Did you get PMP output to see where the CPU/mutex bottlenecks occur?
And here are the first lines of the PMP output:
$ head -n 10 pt-pmp.log.20170303
Fri Mar 3 10:51:09 UTC 2017
29 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),sync_array_wait_event(sync0arr.cc:475),TTASEventMutex::wait(ut0mutex.ic:89),spin_and_try_lock(ib0mutex.h:850),enter(ib0mutex.h:850),PolicyMutex(ib0mutex.h:850),lock_table(lock0lock.cc:4008),row_search_mvcc(row0sel.cc:4904),ha_innobase::index_read(ha_innodb.cc:8634),handler::ha_index_read_map(handler.cc:3031),handler::read_range_first(handler.cc:7297),handler::multi_range_read_next(handler.cc:6367),DsMrr_impl::dsmrr_next(handler.cc:6751),QUICK_RANGE_SELECT::get_next(opt_range.cc:11182),rr_quick(records.cc:398),mysql_update(sql_update.cc:803),Sql_cmd_update::try_single_table_update(sql_update.cc:2884),Sql_cmd_update::execute(sql_update.cc:3011),mysql_execute_command(sql_parse.cc:3616),Prepared_statement::execute(sql_prepare.cc:3935),Prepared_statement::execute_loop(sql_prepare.cc:3543),mysqld_stmt_execute(sql_prepare.cc:2549),dispatch_command(sql_parse.cc:1395),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)
28 pthread_cond_wait,native_cond_wait(thr_cond.h:140),my_cond_wait(thr_cond.h:140),inline_mysql_cond_wait(thr_cond.h:140),Per_thread_connection_handler::block_until_new_connection(thr_cond.h:140),handle_connection(connection_handler_per_thread.cc:329),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)
20 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),sync_array_wait_event(sync0arr.cc:475),TTASEventMutex::wait(ut0mutex.ic:89),spin_and_try_lock(ib0mutex.h:850),enter(ib0mutex.h:850),PolicyMutex(ib0mutex.h:850),log_write_up_to(log0log.cc:1249),trx_flush_log_if_needed_low(trx0trx.cc:1789),trx_flush_log_if_needed(trx0trx.cc:1789),trx_commit_complete_for_mysql(trx0trx.cc:1789),innobase_commit(ha_innodb.cc:4345),ha_commit_low(handler.cc:1900),TC_LOG_DUMMY::commit(tc_log.cc:28),ha_commit_trans(handler.cc:1796),trans_commit(transaction.cc:239),mysql_execute_command(sql_parse.cc:4302),mysql_parse(sql_parse.cc:5611),dispatch_command(sql_parse.cc:1461),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)
10 __io_getevents_0_4(libaio.so.1),LinuxAIOHandler::collect(os0file.cc:2492),LinuxAIOHandler::poll(os0file.cc:2638),os_aio_linux_handler(os0file.cc:2694),os_aio_handler(os0file.cc:2694),fil_aio_wait(fil0fil.cc:5817),io_handler_thread(srv0start.cc:308),start_thread(libpthread.so.0),clone(libc.so.6)
4 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),sync_array_wait_event(sync0arr.cc:475),TTASEventMutex::wait(ut0mutex.ic:89),spin_and_try_lock(ib0mutex.h:850),enter(ib0mutex.h:850),PolicyMutex(ib0mutex.h:850),lock_clust_rec_read_check_and_lock(lock0lock.cc:6267),sel_set_rec_lock(row0sel.cc:1252),row_search_mvcc(row0sel.cc:5319),ha_innobase::index_read(ha_innodb.cc:8634),handler::ha_index_read_map(handler.cc:3031),handler::read_range_first(handler.cc:7297),handler::multi_range_read_next(handler.cc:6367),DsMrr_impl::dsmrr_next(handler.cc:6751),QUICK_RANGE_SELECT::get_next(opt_range.cc:11182),rr_quick(records.cc:398),mysql_update(sql_update.cc:803),Sql_cmd_update::try_single_table_update(sql_update.cc:2884),Sql_cmd_update::execute(sql_update.cc:3011),mysql_execute_command(sql_parse.cc:3616),Prepared_statement::execute(sql_prepare.cc:3935),Prepared_statement::execute_loop(sql_prepare.cc:3543),mysqld_stmt_execute(sql_prepare.cc:2549),dispatch_command(sql_parse.cc:1395),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)
3 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),sync_array_wait_event(sync0arr.cc:475),TTASEventMutex::wait(ut0mutex.ic:89),spin_and_try_lock(ib0mutex.h:850),enter(ib0mutex.h:850),PolicyMutex(ib0mutex.h:850),lock_sec_rec_modify_check_and_lock(lock0lock.cc:6094),btr_cur_del_mark_set_sec_rec(btr0cur.cc:4967),row_upd_sec_index_entry(row0upd.cc:2264),row_upd_sec_step(row0upd.cc:2332),row_upd(row0upd.cc:2332),row_upd_step(row0upd.cc:3158),row_update_for_mysql_using_upd_graph(row0mysql.cc:2524),row_update_for_mysql(row0mysql.cc:2722),ha_innobase::delete_row(ha_innodb.cc:8252),handler::ha_delete_row(handler.cc:7997),Sql_cmd_delete::mysql_delete(sql_delete.cc:468),Sql_cmd_delete::execute(sql_delete.cc:1382),mysql_execute_command(sql_parse.cc:3616),Prepared_statement::execute(sql_prepare.cc:3935),Prepared_statement::execute_loop(sql_prepare.cc:3543),mysqld_stmt_execute(sql_prepare.cc:2549),dispatch_command(sql_parse.cc:1395),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)
3 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),buf_flush_page_cleaner_worker(buf0flu.cc:3496),start_thread(libpthread.so.0),clone(libc.so.6)
2 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),sync_array_wait_event(sync0arr.cc:475),TTASEventMutex::wait(ut0mutex.ic:89),spin_and_try_lock(ib0mutex.h:850),enter(ib0mutex.h:850),PolicyMutex(ib0mutex.h:850),lock_update_delete(lock0lock.cc:3561),btr_cur_optimistic_delete_func(btr0cur.cc:5128),row_purge_remove_sec_if_poss_leaf(row0purge.cc:541),row_purge_remove_sec_if_poss(row0purge.cc:587),row_purge_upd_exist_or_extern_func(row0purge.cc:587),row_purge_record_func(row0purge.cc:987),row_purge(row0purge.cc:987),row_purge_step(row0purge.cc:987),que_thr_step(que0que.cc:1056),que_run_threads_low(que0que.cc:1056),que_run_threads(que0que.cc:1056),srv_task_execute(srv0srv.cc:2458),srv_worker_thread(srv0srv.cc:2458),start_thread(libpthread.so.0),clone(libc.so.6)
2 pthread_cond_wait,wait(os0event.cc:165),os_event::wait_low(os0event.cc:165),sync_array_wait_event(sync0arr.cc:475),TTASEventMutex::wait(ut0mutex.ic:89),spin_and_try_lock(ib0mutex.h:850),enter(ib0mutex.h:850),PolicyMutex(ib0mutex.h:850),lock_sec_rec_modify_check_and_lock(lock0lock.cc:6094),btr_cur_del_mark_set_sec_rec(btr0cur.cc:4967),row_upd_sec_index_entry(row0upd.cc:2264),row_upd_sec_step(row0upd.cc:2332),row_upd(row0upd.cc:2332),row_upd_step(row0upd.cc:3158),row_update_for_mysql_using_upd_graph(row0mysql.cc:2524),row_update_for_mysql(row0mysql.cc:2722),ha_innobase::update_row(ha_innodb.cc:8139),handler::ha_update_row(handler.cc:7964),mysql_update(sql_update.cc:880),Sql_cmd_update::try_single_table_update(sql_update.cc:2884),Sql_cmd_update::execute(sql_update.cc:3011),mysql_execute_command(sql_parse.cc:3616),Prepared_statement::execute(sql_prepare.cc:3935),Prepared_statement::execute_loop(sql_prepare.cc:3543),mysqld_stmt_execute(sql_prepare.cc:2549),dispatch_command(sql_parse.cc:1395),do_command(sql_parse.cc:999),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)
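For context, pt-pmp produces lines like these by collapsing gdb thread stacks into one line per thread and counting duplicates. The aggregation step can be sketched in plain shell; the input file and its contents below are a toy example, not real capture data:

```shell
# Toy stand-in for a file of collapsed, one-line-per-thread gdb stacks.
cat > /tmp/stacks.txt <<'EOF'
pthread_cond_wait,enter(ib0mutex.h:850),lock_table(lock0lock.cc:4008)
pthread_cond_wait,enter(ib0mutex.h:850),lock_table(lock0lock.cc:4008)
__io_getevents_0_4(libaio.so.1),os_aio_linux_handler(os0file.cc:2694)
EOF

# Count identical stacks and sort by frequency, like the listing above;
# the duplicated pthread_cond_wait stack comes out first with count 2.
sort /tmp/stacks.txt | uniq -c | sort -rn
```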
Sveta, I wonder if Percona Server was launched with numactl --interleave=all in both cases? We could be getting cache lines thrashed on the bigger box if the memory policy is on defaults.
Also, does the above profiler output belong to the bigger box?
Yes and yes.
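For readers unfamiliar with the option, interleaving allocations across NUMA nodes looks roughly like this (the mysqld invocation and config path are illustrative, not the exact command used in these tests):

```shell
# Spread memory allocations round-robin across all NUMA nodes instead of
# filling the local node first (the default policy on most distributions).
numactl --interleave=all mysqld --defaults-file=/etc/my.cnf &

# Inspect the node layout and per-node free memory of the box.
numactl --hardware
```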
Here is perf top output:
2,42% libc-2.17.so [.] __memmove_ssse3_back
2,36% mysqld [.] rec_get_offsets_func
2,24% mysqld [.] buf_page_get_gen
1,70% mysqld [.] row_search_mvcc
1,57% [kernel] [k] call_function_interrupt
1,35% mysqld [.] ut_delay
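For reference, per-process profiler output like the above (symbols with their CPU share) can be captured with something like:

```shell
# Live CPU sampling restricted to the running MySQL server process.
perf top -p "$(pidof mysqld)"
```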