MySQL Challenge: 100k Connections

In this post, I want to explore a way to establish 100,000 connections to MySQL. Not just idle connections, but executing queries.

100,000 connections. Is that really needed for MySQL, you may ask? Although it may seem excessive, I have seen a lot of different setups in customer deployments. Some deploy an application connection pool, with 100 application servers and 1,000 connections in each pool. Some applications use a “re-connect and repeat if the query is too slow” technique, which is a terrible practice. It can lead to a snowball effect, and could establish thousands of connections to MySQL in a matter of seconds.

So now I want to set an overachieving goal and see if we can achieve it.

Setup

For this I will use the following hardware:

Bare metal server provided by packet.net, instance size: c2.medium.x86
Physical Cores @ 2.2 GHz
(1 X AMD EPYC 7401P)
Memory: 64 GB of ECC RAM
Storage : INTEL® SSD DC S4500, 480GB

This is a server grade SATA SSD.

I will use five of these boxes, for the reason explained below. One box for the MySQL server and four boxes for client connections.

For the server I will use Percona Server for MySQL 8.0.13-4 with the thread pool plugin. The plugin will be required to support the thousands of connections.

Initial server setup

Network settings (Ansible format):

- { name: 'net.core.somaxconn', value: 32768 }
- { name: 'net.core.rmem_max', value: 134217728 }
- { name: 'net.core.wmem_max', value: 134217728 }
- { name: 'net.ipv4.tcp_rmem', value: '4096 87380 134217728' }
- { name: 'net.ipv4.tcp_wmem', value: '4096 87380 134217728' }
- { name: 'net.core.netdev_max_backlog', value: 300000 }
- { name: 'net.ipv4.tcp_moderate_rcvbuf', value: 1 }
- { name: 'net.ipv4.tcp_no_metrics_save', value: 1 }
- { name: 'net.ipv4.tcp_congestion_control', value: 'htcp' }
- { name: 'net.ipv4.tcp_mtu_probing', value: 1 }
- { name: 'net.ipv4.tcp_timestamps', value: 0 }
- { name: 'net.ipv4.tcp_sack', value: 0 }
- { name: 'net.ipv4.tcp_syncookies', value: 1 }
- { name: 'net.ipv4.tcp_max_syn_backlog', value: 4096 }
- { name: 'net.ipv4.tcp_mem', value: '50576   64768 98152' }
- { name: 'net.ipv4.ip_local_port_range', value: '4000 65000' }
- { name: 'net.ipv4.netdev_max_backlog', value: 2500 }
- { name: 'net.ipv4.tcp_tw_reuse', value: 1 }
- { name: 'net.ipv4.tcp_fin_timeout', value: 5 }

- { name: 'net.core.somaxconn', value: 32768 }

- { name: 'net.core.rmem_max', value: 134217728 }

- { name: 'net.core.wmem_max', value: 134217728 }

- { name: 'net.ipv4.tcp_rmem', value: '4096 87380 134217728' }

- { name: 'net.ipv4.tcp_wmem', value: '4096 87380 134217728' }

- { name: 'net.core.netdev_max_backlog', value: 300000 }

- { name: 'net.ipv4.tcp_moderate_rcvbuf', value: 1 }

- { name: 'net.ipv4.tcp_no_metrics_save', value: 1 }

- { name: 'net.ipv4.tcp_congestion_control', value: 'htcp' }

- { name: 'net.ipv4.tcp_mtu_probing', value: 1 }

- { name: 'net.ipv4.tcp_timestamps', value: 0 }

- { name: 'net.ipv4.tcp_sack', value: 0 }

- { name: 'net.ipv4.tcp_syncookies', value: 1 }

- { name: 'net.ipv4.tcp_max_syn_backlog', value: 4096 }

- { name: 'net.ipv4.tcp_mem', value: '50576 64768 98152' }

- { name: 'net.ipv4.ip_local_port_range', value: '4000 65000' }

- { name: 'net.ipv4.netdev_max_backlog', value: 2500 }

- { name: 'net.ipv4.tcp_tw_reuse', value: 1 }

- { name: 'net.ipv4.tcp_fin_timeout', value: 5 }

These are the typical settings recommended for 10Gb networks and high concurrent workloads.

Limits settings for systemd:

[Service]
LimitNOFILE=1000000
LimitNPROC=500000

[Service]

LimitNOFILE=1000000

LimitNPROC=500000

And the relevant setting for MySQL in my.cnf:

back_log=3500
max_connections=110000

1 2	back_log=3500 max_connections=110000

For the client I will use sysbench version 0.5 and not 1.0.x, for the reasons explained below.

The workload is sysbench --test=sysbench/tests/db/select.lua --mysql-host=139.178.82.47 --mysql-user=sbtest --mysql-password=sbtest --oltp-tables-count=10 --report-interval=1 --num-threads=10000 --max-time=300 --max-requests=0 --oltp-table-size=10000000 --rand-type=uniform --rand-init=on run

Step 1. 10,000 connections

This one is very easy, as there is not much to do to achieve this. We can do this with only one client. But you may face the following error on the client side:

FATAL: error 2004: Can't create TCP/IP socket (24)

This is caused by the open file limit, which is also a limit of TCP/IP sockets. This can be fixed by setting ulimit -n 100000 on the client.

The performance we observe:

[  26s] threads: 10000, tps: 0.00, reads: 33367.48, writes: 0.00, response time: 3681.42ms (95%), errors: 0.00, reconnects:  0.00
[  27s] threads: 10000, tps: 0.00, reads: 33289.74, writes: 0.00, response time: 3690.25ms (95%), errors: 0.00, reconnects:  0.00

1 2	[ 26s] threads: 10000, tps: 0.00, reads: 33367.48, writes: 0.00, response time: 3681.42ms (95%), errors: 0.00, reconnects: 0.00 [ 27s] threads: 10000, tps: 0.00, reads: 33289.74, writes: 0.00, response time: 3690.25ms (95%), errors: 0.00, reconnects: 0.00

Step 2. 25,000 connections

With 25,000 connections, we hit an error on MySQL side:

Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug

If you try to lookup information on this error you might find the following article.

But it does not help in our case, as we have all limits set high enough:

cat /proc/`pidof mysqld`/limits
Limit                     Soft Limit Hard Limit           Units
Max cpu time              unlimited  unlimited            seconds
Max file size             unlimited  unlimited            bytes
Max data size             unlimited  unlimited            bytes
Max stack size            8388608    unlimited            bytes
Max core file size        0          unlimited            bytes
Max resident set          unlimited  unlimited            bytes
Max processes             500000     500000               processes
Max open files            1000000    1000000              files
Max locked memory         16777216   16777216             bytes
Max address space         unlimited  unlimited            bytes
Max file locks            unlimited  unlimited            locks
Max pending signals       255051     255051               signals
Max msgqueue size         819200     819200               bytes
Max nice priority         0          0
Max realtime priority     0          0
Max realtime timeout      unlimited unlimited            us

cat /proc/`pidof mysqld`/limits

Limit Soft Limit Hard Limit Units

Max cpu time unlimited unlimited seconds

Max file size unlimited unlimited bytes

Max data size unlimited unlimited bytes

Max stack size 8388608 unlimited bytes

Max core file size 0 unlimited bytes

Max resident set unlimited unlimited bytes

Max processes 500000 500000 processes

Max open files 1000000 1000000 files

Max locked memory 16777216 16777216 bytes

Max address space unlimited unlimited bytes

Max file locks unlimited unlimited locks

Max pending signals 255051 255051 signals

Max msgqueue size 819200 819200 bytes

Max nice priority 0 0

Max realtime priority 0 0

Max realtime timeout unlimited unlimited us

This is where we start using the thread pool feature: https://docs.percona.com/percona-server/8.0/threadpool.html

Add:

thread_handling=pool-of-threads

to the my.cnf and restart Percona Server

The results:

[   7s] threads: 25000, tps: 0.00, reads: 33332.57, writes: 0.00, response time: 974.56ms (95%), errors: 0.00, reconnects:  0.00
[   8s] threads: 25000, tps: 0.00, reads: 33187.01, writes: 0.00, response time: 979.24ms (95%), errors: 0.00, reconnects:  0.00

1 2	[ 7s] threads: 25000, tps: 0.00, reads: 33332.57, writes: 0.00, response time: 974.56ms (95%), errors: 0.00, reconnects: 0.00 [ 8s] threads: 25000, tps: 0.00, reads: 33187.01, writes: 0.00, response time: 979.24ms (95%), errors: 0.00, reconnects: 0.00

We have the same throughput, but actually the 95% response time has improved (thanks to the thread pool) from 3690 ms to 979 ms.

Step 3. 50,000 connections

This is where we encountered the biggest challenge. At first, trying to get 50,000 connections in sysbench we hit the following error:

FATAL: error 2003: Can't connect to MySQL server on '139.178.82.47' (99)

Error (99) is cryptic and it means: Cannot assign requested address.

It comes from the limit of ports an application can open. By default on my system it is

cat /proc/sys/net/ipv4/ip_local_port_range : 32768 60999

This says there are only 28,231 available ports — 60999 minus 32768 — or the limit of TCP connections you can establish from or to the given IP address.

You can extend this using a wider range, on both the client and the server:

echo 4000 65000 > /proc/sys/net/ipv4/ip_local_port_range

This will give us 61,000 connections, but this is very close to the limit for one IP address (maximal port is 65535). The key takeaway from here is that if we want more connections we need to allocate more IP addresses for MySQL server. In order to achieve 100,000 connections, I will use two IP addresses on the server running MySQL.

After sorting out the port ranges, we hit the following problem with sysbench:

sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 50000

FATAL: pthread_create() for thread #32352 failed. errno = 12 (Cannot allocate memory)

sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:

Number of threads: 50000

FATAL: pthread_create() for thread #32352 failed. errno = 12 (Cannot allocate memory)

In this case, it’s a problem with sysbench memory allocation (namely lua subsystem). Sysbench can allocate memory for only 32,351 connections. This is a problem which is even more severe in sysbench 1.0.x.

Sysbench 1.0.x limitation

Sysbench 1.0.x uses a different Lua JIT, which hits memory problems even with 4000 connections, so it is impossible to go over 4000 connection in sysbench 1.0.x

So it seems we hit a limit with sysbench sooner than with Percona Server. In order to use more connections, we need to use multiple sysbench clients, and if 32,351 connections is the limit for sysbench, we have to use at least four sysbench clients to get up to 100,000 connections.

For 50,000 connections I will use 2 servers (each running separate sysbench), each running 25,000 threads from sysbench.

The results for each sysbench looks like:

[  29s] threads: 25000, tps: 0.00, reads: 16794.09, writes: 0.00, response time: 1799.63ms (95%), errors: 0.00, reconnects:  0.00
[  30s] threads: 25000, tps: 0.00, reads: 16491.03, writes: 0.00, response time: 1800.70ms (95%), errors: 0.00, reconnects:  0.00

1 2	[ 29s] threads: 25000, tps: 0.00, reads: 16794.09, writes: 0.00, response time: 1799.63ms (95%), errors: 0.00, reconnects: 0.00 [ 30s] threads: 25000, tps: 0.00, reads: 16491.03, writes: 0.00, response time: 1800.70ms (95%), errors: 0.00, reconnects: 0.00

So we have about the same throughput (16794*2 = 33588 tps in total), however the 95% response time doubled. This is to be expected as we are using twice as many connections compared to the 25,000 connections benchmark.

Step 3. 75,000 connections

To achieve 75,000 connections we will use three servers with sysbench, each running 25,000 threads.

The results for each sysbench:

[ 157s] threads: 25000, tps: 0.00, reads: 11633.87, writes: 0.00, response time: 2651.76ms (95%), errors: 0.00, reconnects:  0.00
[ 158s] threads: 25000, tps: 0.00, reads: 10783.09, writes: 0.00, response time: 2601.44ms (95%), errors: 0.00, reconnects:  0.00

1 2	[ 157s] threads: 25000, tps: 0.00, reads: 11633.87, writes: 0.00, response time: 2651.76ms (95%), errors: 0.00, reconnects: 0.00 [ 158s] threads: 25000, tps: 0.00, reads: 10783.09, writes: 0.00, response time: 2601.44ms (95%), errors: 0.00, reconnects: 0.00

Step 4. 100,000 connections

There is nothing eventful to achieve75k and 100k connections. We just spin up an additional server and start sysbench. For 100,000 connections we need four servers for sysbench, each shows:

[ 101s] threads: 25000, tps: 0.00, reads: 8033.83, writes: 0.00, response time: 3320.21ms (95%), errors: 0.00, reconnects:  0.00
[ 102s] threads: 25000, tps: 0.00, reads: 8065.02, writes: 0.00, response time: 3405.77ms (95%), errors: 0.00, reconnects:  0.00

1 2	[ 101s] threads: 25000, tps: 0.00, reads: 8033.83, writes: 0.00, response time: 3320.21ms (95%), errors: 0.00, reconnects: 0.00 [ 102s] threads: 25000, tps: 0.00, reads: 8065.02, writes: 0.00, response time: 3405.77ms (95%), errors: 0.00, reconnects: 0.00

So we have the same throughput (8065*4=32260 tps in total) with 3405ms 95% response time.

A very important takeaway from this: with 100k connections and using a thread pool, the 95% response time is even better than for 10k connections without a thread pool. The thread pool allows Percona Server to manage resources more efficiently and provides better response times.

Conclusions

100k connections is quite achievable for MySQL, and I am sure we could go even further. There are three components to achieve this:

Thread pool in Percona Server
Proper tuning of network limits
Using multiple IP addresses on the server box (one IP address per approximately 60k connections)

Appendix: full my.cnf

[mysqld]
datadir {{ mysqldir }}
ssl=0

skip-log-bin
log-error=error.log

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
character_set_server=latin1
collation_server=latin1_swedish_ci
skip-character-set-client-handshake

innodb_undo_log_truncate=off

# general
table_open_cache = 200000
table_open_cache_instances=64
back_log=3500
max_connections=110000

# files
innodb_file_per_table
innodb_log_file_size=15G
innodb_log_files_in_group=2
innodb_open_files=4000

# buffers
innodb_buffer_pool_size= 40G
innodb_buffer_pool_instances=8
innodb_log_buffer_size=64M

# tune
innodb_doublewrite= 1
innodb_thread_concurrency=0
innodb_flush_log_at_trx_commit= 0
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=10
innodb_lru_scan_depth=2048
innodb_page_cleaners=4
join_buffer_size=256K
sort_buffer_size=256K
innodb_use_native_aio=1
innodb_stats_persistent = 1

#innodb_spin_wait_delay=96
innodb_adaptive_flushing = 1
innodb_flush_neighbors = 0
innodb_read_io_threads = 16
innodb_write_io_threads = 16
innodb_io_capacity=1500
innodb_io_capacity_max=2500
innodb_purge_threads=4
innodb_adaptive_hash_index=0
max_prepared_stmt_count=1000000
innodb_monitor_enable = '%'
performance_schema = ON

[mysqld]

datadir {{ mysqldir }}

ssl=0

skip-log-bin

log-error=error.log

# Disabling symbolic-links is recommended to prevent assorted security risks

symbolic-links=0

character_set_server=latin1

collation_server=latin1_swedish_ci

skip-character-set-client-handshake

innodb_undo_log_truncate=off

# general

table_open_cache = 200000

table_open_cache_instances=64

back_log=3500

max_connections=110000

# files

innodb_file_per_table

innodb_log_file_size=15G

innodb_log_files_in_group=2

innodb_open_files=4000

# buffers

innodb_buffer_pool_size= 40G

innodb_buffer_pool_instances=8

innodb_log_buffer_size=64M

# tune

innodb_doublewrite= 1

innodb_thread_concurrency=0

innodb_flush_log_at_trx_commit= 0

innodb_flush_method=O_DIRECT_NO_FSYNC

innodb_max_dirty_pages_pct=90

innodb_max_dirty_pages_pct_lwm=10

innodb_lru_scan_depth=2048

innodb_page_cleaners=4

join_buffer_size=256K

sort_buffer_size=256K

innodb_use_native_aio=1

innodb_stats_persistent = 1

#innodb_spin_wait_delay=96

innodb_adaptive_flushing = 1

innodb_flush_neighbors = 0

innodb_read_io_threads = 16

innodb_write_io_threads = 16

innodb_io_capacity=1500

innodb_io_capacity_max=2500

innodb_purge_threads=4

innodb_adaptive_hash_index=0

max_prepared_stmt_count=1000000

innodb_monitor_enable = '%'

performance_schema = ON

Discuss on HackerNews

20 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Vardan

5 years ago

I think you missing in your final my.cnf following:
thread_handling=pool-of-threads

rautamiekka

5 years ago

Not being able to server 100k connections == useless database software.

And when MySQL (or rather, Percona and MariaDB) becomes useless, you enter The Dark Ages.

Lazuardi Nasution

5 years ago

Interesting, is this article mean that it cannot be achieved with MySQL version prior to 8?

vadimtk

Reply to Lazuardi Nasution

5 years ago

No, it means it cannot be achieved without thread pool plugin. Thread pool is available in earlier versions too.

Rudi

5 years ago

Mr. Vadimtk, which part should be changed i i use sever with 32GB of RAM and 100 computer client connected to Database Server.

John Haugeland

5 years ago

next time try jmeter or tsung instead of sysbench

Alexey Kopytov

5 years ago

For this number of connections you want to disable JIT in sysbench with the –luajit-cmd=off option. The option is available in 1.1 prereleases. In 1.0 the same could be achieved by adding a single line (“jit.off()”) to oltp_common.lua.

Alex

5 years ago

Excellent write-up! Thanks.

Adam

5 years ago

Any real life example of this case & solution? I am running a cloud infrastructure, having billions of http requests per day (peak thousands per second) and I never even come close to 500 concurrent connections (using 2 database replicas)

Lazuardi Nasution

Reply to Adam

5 years ago

Hi Adam, I don’t understand why you only get 500 concurrent connections if there is thousands HTTP request per second. My real live case was coming from single old E5-2640v2 processor where there is 8000 request per second (peak) with 400 concurrent connections to MySQL on the same server with web server. Do you use the same small resources?

Adam

Reply to Lazuardi Nasution

5 years ago

You are right. The DB is serving quick and small sets of rows also makes good use of innodb buffer pool and query cache. The indices and queries are also optimized so that looking up rows in a billion-rows table is fast as light

Yashada

5 years ago

Do you have the memory consumption metrics as the number of connections was stepped up? Curious about it.

graykingw

5 years ago

I think you may loss some CPU cores number: 24 Physical Cores @ 2.2 GHz

Vitaly

5 years ago

>. Some applications use a “re-connect and repeat if the query is too slow” technique, which is a terrible practice. It can >lead to a snowball effect, and could establish thousands of connections to MySQL in a matter of seconds.
IMHO, in such situation MySQL server becomes unusable because tons of *queries* running, before we’ll reach 100K connections.

vadimtk

Reply to Vitaly

5 years ago

Vitaly,

It really depends on the workload, but this is the point I wanted to highlight – in these cases thread pool would provide a protection for MySQL to not get overloaded.

Vitaly

Reply to vadimtk

5 years ago

I see, thank you.

Gurnish Anand

5 years ago

what were your settings for the thread-pool?

Vadim Tkachenko

Author

Reply to Gurnish Anand

5 years ago

Gurnish,

I’ve used all defaults . The default behavior will be to set number of threads == number of CPU in the system

Simon J Mudd

4 years ago

It might be interesting to show the memory usage (per client connection) and how it changes as the number of connected threads increases. MySQL documentation about expected memory use on a server has always been rather vague in this area so when you add more connections you have to be careful to avoid running the server out of memory. At the same time we want to use as much memory as reasonably possible for the buffer pool to optimise access times to data.
So even with the close to default settings you were using are there any surprises in memory usage and how does the memory usage change when going switching to using the thread pool?

Rohit Khurana

3 years ago

We are getting segmentation fault (core dump) after increasing thread to 4000

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

MySQL Challenge: 100k Connections

Setup

Initial server setup

Step 1. 10,000 connections

Step 2. 25,000 connections

Step 3. 50,000 connections

Sysbench 1.0.x limitation

Step 3. 75,000 connections

Step 4. 100,000 connections

Conclusions

Appendix: full my.cnf

Related

Related Blog Articles

RECOMMENDED ARTICLES

Can We Set up a Replicate Filter Within the Percona XtraDB Cluster?

Choosing the Right Database: Comparing MariaDB vs. MySQL, PostgreSQL, and MongoDB

Seamless Table Modifications: Leveraging pt-online-schema-change for Online Alterations

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

MySQL Challenge: 100k Connections

Setup

Initial server setup

Step 1. 10,000 connections

Step 2. 25,000 connections

Step 3. 50,000 connections

Sysbench 1.0.x limitation

Step 3. 75,000 connections

Step 4. 100,000 connections

Conclusions

Appendix: full my.cnf

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Can We Set up a Replicate Filter Within the Percona XtraDB Cluster?

Choosing the Right Database: Comparing MariaDB vs. MySQL, PostgreSQL, and MongoDB

Seamless Table Modifications: Leveraging pt-online-schema-change for Online Alterations

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation