MySQL Query Performance: Not Just Indexes

In this blog post, I’ll look at whether optimizing indexing is always the key to improving MySQL query performance (spoiler, it isn’t).

As we look at MySQL query performance, our first concern is often whether a query is using the right indexes to retrieve the data. This is based on the assumption that finding the data is the most expensive operation – and the one you should focus on for MySQL query optimization. However, this is not always the case.

Let’s look at this query for illustration:

mysql> show create table tbl \G
*************************** 1. row ***************************
      Table: tbl
Create Table: CREATE TABLE `tbl` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `k` int(11) NOT NULL DEFAULT '0',
 `g` int(10) unsigned NOT NULL,
 PRIMARY KEY (`id`),
 KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=2340933 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> explain select g,count(*) c from tbl where k<1000000 group by g having c>7 \G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: ALL
possible_keys: k_1
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 998490
    filtered: 50.00
       Extra: Using where; Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)

mysql> select g,count(*) c from tbl where k<1000000 group by g having c>7;
+--------+----+
| g      | c  |
+--------+----+
|  28846 |  8 |
| 139660 |  8 |
| 153286 |  8 |
...
| 934984 |  8 |
+--------+----+

22 rows in set (6.80 sec)

mysql> show create table tbl \G

*************************** 1. row ***************************

Table: tbl

Create Table: CREATE TABLE `tbl` (

`id` int(11) NOT NULL AUTO_INCREMENT,

`k` int(11) NOT NULL DEFAULT '0',

`g` int(10) unsigned NOT NULL,

PRIMARY KEY (`id`),

KEY `k_1` (`k`)

) ENGINE=InnoDB AUTO_INCREMENT=2340933 DEFAULT CHARSET=latin1

1 row in set (0.00 sec)

mysql> explain select g,count(*) c from tbl where k<1000000 group by g having c>7 \G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: tbl

partitions: NULL

type: ALL

possible_keys: k_1

key: NULL

key_len: NULL

ref: NULL

rows: 998490

filtered: 50.00

Extra: Using where; Using temporary; Using filesort

1 row in set, 1 warning (0.00 sec)

mysql> select g,count(*) c from tbl where k<1000000 group by g having c>7;

+--------+----+

| g | c |

+--------+----+

| 28846 | 8 |

| 139660 | 8 |

| 153286 | 8 |

...

| 934984 | 8 |

+--------+----+

22 rows in set (6.80 sec)

Looking at this query, many might assume the main problem is that this query is doing a full table scan. One could wonder then, “Why does the MySQL optimizer not use index (k)?” (It is because the clause is not selective enough, by the way.) This thought might cause someone to force using the index, and get even worse performance:

mysql> select g,count(*) c from tbl force index(k) where k<1000000 group by g having c>7;
+--------+----+
| g      | c  |
+--------+----+
|  28846 |  8 |
| 139660 |  8 |
...
| 934984 |  8 |
+--------+----+
22 rows in set (9.37 sec)

mysql> select g,count(*) c from tbl force index(k) where k<1000000 group by g having c>7;

+--------+----+

| g | c |

+--------+----+

| 28846 | 8 |

| 139660 | 8 |

...

| 934984 | 8 |

+--------+----+

22 rows in set (9.37 sec)

Or someone might extend the index on (k) to (k,g) to be a covering index for this query. This won’t improve performance either:

mysql> alter table tbl drop key k_1, add key(k,g);
Query OK, 0 rows affected (5.35 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain select g,count(*) c from tbl where k<1000000 group by g having c>7 \G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: tbl
  partitions: NULL
        type: range
possible_keys: k
         key: k
     key_len: 4
         ref: NULL
        rows: 499245
    filtered: 100.00
       Extra: Using where; Using index; Using temporary; Using filesort
1 row in set, 1 warning (0.00 sec)

mysql> select g,count(*) c from tbl where k<1000000 group by g having c>7;
+--------+----+
| g      | c  |
+--------+----+
|  28846 |  8 |
| 139660 |  8 |
...
| 915436 |  8 |
| 934984 |  8 |
+--------+----+
22 rows in set (6.80 sec)

mysql> alter table tbl drop key k_1, add key(k,g);

Query OK, 0 rows affected (5.35 sec)

Records: 0 Duplicates: 0 Warnings: 0

mysql> explain select g,count(*) c from tbl where k<1000000 group by g having c>7 \G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: tbl

partitions: NULL

type: range

possible_keys: k

key: k

key_len: 4

ref: NULL

rows: 499245

filtered: 100.00

Extra: Using where; Using index; Using temporary; Using filesort

1 row in set, 1 warning (0.00 sec)

mysql> select g,count(*) c from tbl where k<1000000 group by g having c>7;

+--------+----+

| g | c |

+--------+----+

| 28846 | 8 |

| 139660 | 8 |

...

| 915436 | 8 |

| 934984 | 8 |

+--------+----+

22 rows in set (6.80 sec)

This wasted effort is all due to focusing on the wrong thing: figuring out how can we find all the rows that match k<1000000 as soon as possible. This is not the problem in this case. In fact, the query that touches all the same columns but doesn’t use GROUP BY runs 10 times as fast:

mysql> select sum(g) from tbl where k<1000000;
+--------------+
| sum(g)       |
+--------------+
| 500383719481 |
+--------------+
1 row in set (0.68 sec)

mysql> select sum(g) from tbl where k<1000000;

+--------------+

| sum(g) |

+--------------+

| 500383719481 |

+--------------+

1 row in set (0.68 sec)

For this particular query, whether or not it is using the index for lookup should not be the main question. Instead, we should look at how to optimize GROUP BY – which is responsible for some 90% of the query response time.

In my next blog post, I will write about four ways to execute the MySQL GROUP BY operation to provide further help on optimizing these queries.

2 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Raffa Cabo Frio

6 years ago

I liked the article. It’s a topic that interests me a lot. But it seemed to lack a conclusion. 🙁

kabilesh

6 years ago

very excited about next blog

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

MySQL Query Performance: Not Just Indexes

Related

Related Blog Articles

RECOMMENDED ARTICLES

New Valkey Packages by Percona

Can We Set up a Replicate Filter Within the Percona XtraDB Cluster?

Valkey/Redis: Not-So-Good Practices

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

MySQL Query Performance: Not Just Indexes

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

New Valkey Packages by Percona

Can We Set up a Replicate Filter Within the Percona XtraDB Cluster?

Valkey/Redis: Not-So-Good Practices

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation