This blog was originally published in January 2022 and was updated in July 2023.
Working with hundreds of different customers, I often face similar problems around running queries. One very common problem when trying to optimize a database environment is index usage. A query that cannot use an index is usually a long-running one, consuming more memory or triggering more disk IOPS.
A very common case is a query that filters on a column wrapped in a function or some other expression. An ordinary index on that column cannot be used.
Starting from MySQL 8.0.13, functional indexes are supported. In this article, I will first give an overview of indexes in MySQL and cover the MySQL CREATE INDEX statement before showing what functional indexes are and how they work.
Introduction to MySQL Indexes
Indexes in MySQL are database structures used to optimize data retrieval speed and efficiency. They are a roadmap that speeds up the process of finding specific rows in a large table. By creating indexes on columns, MySQL creates a separate data structure that holds a sorted version of the indexed column’s values, enabling the database engine to quickly locate rows that match specific queries by reducing the need for full-table scans. But, while indexes greatly enhance read operations, they can also impact the speed of write operations, as the indexes need to be updated whenever data changes. As such, index design must strike a balance between improved query speed and efficient data modification.
What is an Index?
An index in MySQL is made up of several elements:
Index columns – These are the table columns on which the index is created. Each value in the index column is associated with the primary key of the related row in the table.
Index key – Representing a sorted data structure, the index key incorporates values from the index columns and pointers to the relevant table rows. This structure makes for faster data access.
The relationship between the index and the underlying data is based on a hierarchical structure that allows the database engine to quickly find the desired rows based on the index key’s sorted values.
The process of crafting indexes involves a trade-off between improved data retrieval speed and impact on data modification operations. While indexes enhance query performance by reducing the need for full-table scans, some things need consideration:
- Choosing Appropriate Columns: Optimal column selection for indexing holds significance. Columns frequently used in queries for filtering or sorting are solid candidates. Over-indexing on too many columns could lead to unnecessary overhead.
- Balancing Index Size and Query Performance: The size of an index impacts its performance; larger indexes may slow down query processing. As such, index design should consider the size of indexed columns and the potential benefits they offer.
- Trade-offs: While indexes accelerate read operations, they can slow down write operations, as data modifications require index updates. A careful approach is required to ensure optimal performance for both read and write operations.
An index within MySQL is a powerful tool that considerably improves data retrieval speed via a structured way to access specific rows within a table. However, DBAs must strike a balance between query performance, index size, and the overall efficiency of operations.
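To make the size and write-overhead trade-off more concrete, here is a minimal sketch of how you might inspect the space taken by each index of a table and spot indexes that are never read. It assumes InnoDB with persistent statistics, the Performance Schema enabled, and a hypothetical schema named shop that contains the products table used later in this post.

```sql
-- Approximate on-disk size of each index of shop.products
-- ("shop" is a hypothetical schema name).
-- For stat_name = 'size', stat_value is the number of pages in the index.
SELECT index_name,
       ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 2) AS size_mb
FROM mysql.innodb_index_stats
WHERE database_name = 'shop'
  AND table_name = 'products'
  AND stat_name = 'size';

-- Indexes with no recorded usage since the server was started.
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes
WHERE object_schema = 'shop';
```

An index that is large and shows up as unused is a candidate for removal, since it still has to be maintained on every write.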
MySQL CREATE INDEX
The MySQL CREATE INDEX statement creates various types of indexes to enhance query performance. Here’s how to create different types of indexes, along with specifying index columns and names:
CREATE [UNIQUE] INDEX index_name
ON table_name (column1 [, column2, ...]);
Explanations:
- CREATE INDEX: Initiates the creation of an index.
- UNIQUE (optional): Specifies that the index values must be unique across the table.
- index_name: Specifies the name of the index being created.
- table_name: The name of the table on which the index is being created.
- (column1 [, column2, ...]): Lists the columns that the index will be based on. Multiple columns can be specified to create composite indexes.
To create different types of indexes:
Non-Unique Index:
CREATE INDEX idx_column ON table_name (column_name);
Unique Index:
CREATE UNIQUE INDEX idx_unique_column ON table_name (column_name);
Composite Index (Index on Multiple Columns):
CREATE INDEX idx_multi_columns ON table_name (column1, column2);
Prefix Index (Index on First N Characters of a Column):
CREATE INDEX idx_prefix ON table_name (column_name(N));
Full-Text Index (For Textual Data Searches):
CREATE FULLTEXT INDEX idx_fulltext_column ON table_name (column_name);
Spatial Index (For Spatial Data Types):
CREATE SPATIAL INDEX idx_spatial_column ON table_name (column_name);
To specify index columns and names:
- List the desired column(s) within parentheses after the ON clause.
- For composite indexes, separate column names with commas.
- Provide a unique index_name to identify the index.
Example of creating a composite index:
CREATE INDEX idx_full_name ON employees (first_name, last_name);
Example of creating a unique index:
CREATE UNIQUE INDEX idx_unique_email ON users (email);
By identifying the particular columns utilized in filtering, joining, sorting, and text-based search operations and employing the CREATE INDEX statement to generate suitable indexes, you can improve the query performance of your MySQL database.
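As a rough end-to-end illustration of the statements above, the following sketch creates a table, adds the composite index, and checks that the index exists and is considered by the optimizer. The employees table definition is hypothetical and exists only for this example.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE employees (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  first_name VARCHAR(50) NOT NULL,
  last_name  VARCHAR(50) NOT NULL
);

CREATE INDEX idx_full_name ON employees (first_name, last_name);

-- List the indexes defined on the table.
SHOW INDEX FROM employees;

-- The possible_keys and key columns of the plan show which indexes
-- the optimizer considered and which one, if any, it picked.
EXPLAIN SELECT id
FROM employees
WHERE first_name = 'Ada' AND last_name = 'Lovelace';
```

Checking the plan with EXPLAIN after creating an index is a cheap way to confirm that the index actually matches the query you had in mind.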
The Well-Known Indexing Problem
As already mentioned, a very common problem with index usage is when you have a filter condition against one or more columns involved in some kind of functional expression.
Let’s see a simple example.
You have a table called products containing the details of your products, including a create_time TIMESTAMP column. If you would like to calculate the average price of your products in a specific month, you could do the following:
mysql> SELECT AVG(price) FROM products WHERE MONTH(create_time)=10;
+------------+
| AVG(price) |
+------------+
| 202.982582 |
+------------+
The query returns the right value, but take a look at the EXPLAIN:
mysql> EXPLAIN SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 99015
     filtered: 100.00
        Extra: Using where
The query triggers a full scan of the table. Let’s create an index on create_time and check again:
mysql> ALTER TABLE products ADD INDEX(create_time);
Query OK, 0 rows affected (0.71 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> EXPLAIN SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 99015
     filtered: 100.00
        Extra: Using where
A full scan again. The index we have created is not effective. Indeed, any time the filter condition wraps an indexed column in a function, the optimizer cannot use the ordinary index on that column.
To optimize the query, the workaround is to rewrite it so that the indexed column is compared directly, outside of any function.
Let’s test the following rewritten query, which returns the same result on this dataset:
mysql> SELECT AVG(price) FROM products WHERE create_time BETWEEN '2019-10-01' AND '2019-11-01';
+------------+
| AVG(price) |
+------------+
| 202.982582 |
+------------+
mysql> EXPLAIN SELECT AVG(price) FROM products WHERE create_time BETWEEN '2019-10-01' AND '2019-11-01'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: range
possible_keys: create_time
          key: create_time
      key_len: 5
          ref: NULL
         rows: 182
     filtered: 100.00
        Extra: Using index condition
Cool, now the index is used. For a long time, rewriting the query like this was the typical suggestion.
It is quite a simple solution, but for many valid reasons it was not always possible to change the application code. So, what to do then?
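One caveat about the rewrite is worth spelling out. BETWEEN is inclusive on both ends, so an upper bound of '2019-11-01' also matches a row created at exactly midnight on November 1. In addition, the range covers October 2019 only, while MONTH(create_time)=10 matches October of every year. A minimal sketch of a stricter rewrite, assuming the report is meant for October 2019 only, uses a half-open range:

```sql
-- Half-open range: includes all of October 2019,
-- excludes anything from 2019-11-01 00:00:00 onward.
SELECT AVG(price)
FROM products
WHERE create_time >= '2019-10-01'
  AND create_time <  '2019-11-01';
```

This form is still a plain range condition on create_time, so it can use the same index as the BETWEEN version.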
What is a Function-based Index?
A function-based index is an indexing technique for databases that allows indexing the results of functions applied to columns rather than the traditional index method that directly stores column values. Instead of indexing raw column data, the function-based index stores function-generated results, which helps to optimize queries and data retrieval efficiency involving complex expressions and functional operations.
Function-based indexes are very useful in scenarios where data retrieval entails complex computations like geographic calculations, date manipulations, or text transformations. By indexing the results of functions applied to columns, function-based indexes are valuable when working with large datasets, as they can enhance the efficiency of aggregations and complex analytical queries.
How Do MySQL 8.0 Functional Indexes Work?
Starting from version 8.0.13, MySQL supports functional indexes. Instead of indexing a simple column, you can create the index on the result of an expression applied to one or more columns. The expression must be deterministic, so functions such as NOW() or RAND() are not allowed.
Long story short, now you can do the following:
mysql> ALTER TABLE products ADD INDEX((MONTH(create_time)));
Query OK, 0 rows affected (0.74 sec)
Records: 0  Duplicates: 0  Warnings: 0
Note the double parentheses. They are required: the expression must be enclosed in its own parentheses to distinguish it from a column name or a column prefix.
Indeed the following returns an error:
mysql> ALTER TABLE products ADD INDEX(MONTH(create_time));
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'create_time))' at line 1
Let’s now check our original query and see what happens to the EXPLAIN:
mysql> SELECT AVG(price) FROM products WHERE MONTH(create_time)=10;
+------------+
| AVG(price) |
+------------+
| 202.982582 |
+------------+
mysql> EXPLAIN SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: ref
possible_keys: functional_index
          key: functional_index
      key_len: 5
          ref: const
         rows: 182
     filtered: 100.00
        Extra: NULL
The query is no longer a full scan and runs faster. The functional_index has been used, with only 182 rows examined. Awesome.
Thanks to the functional index we are no longer forced to rewrite the query.
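The index in this example was created without a name, which is why it appears in the plan as functional_index. A minimal sketch of creating the same kind of index with an explicit name, which is easier to recognize in EXPLAIN output and to drop later, could look like this. The name idx_create_month is chosen here purely for illustration.

```sql
-- Functional index on the same expression, with an explicit (illustrative) name.
CREATE INDEX idx_create_month ON products ((MONTH(create_time)));

-- Equivalent ALTER TABLE form (run one or the other, not both):
-- ALTER TABLE products ADD INDEX idx_create_month ((MONTH(create_time)));

-- Remove the index when it is no longer needed.
DROP INDEX idx_create_month ON products;
```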
Which Functional Indexes are Permitted
We have seen an example involving a simple function applied to a column, but you are allowed to create more complex indexes.
A functional index may contain any kind of expression, not only a single function. The following patterns are valid functional indexes:
INDEX( ( col1 + col2 ) )
INDEX( ( FUNC(col1) + col2 - col3 ) )
You can use ASC or DESC as well:
INDEX( ( MONTH(col1) ) DESC )
You can have multiple functional parts, each one included in parentheses:
INDEX( ( col1 + col2 ), ( FUNC(col2) ) )
You can mix functional with nonfunctional parts:
INDEX( (FUNC(col1)), col2, (col2 + col3), col4 )
There are also limitations you should be aware of:
- A functional key part cannot consist of just a column name. The following is not permitted:
INDEX( (col1), (col2) )
- The primary key cannot include a functional key part.
- A foreign key cannot include a functional key part.
- SPATIAL and FULLTEXT indexes cannot include functional key parts.
- A functional key part cannot refer to a column prefix.
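To ground the abstract patterns and limitations listed above, here is a small sketch on a hypothetical orders table. All table, column, and index names are invented for this example. The SUBSTRING() index illustrates one common way to get an effect similar to a column prefix inside a functional key part.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE orders (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  net       DECIMAL(10,2) NOT NULL,
  tax       DECIMAL(10,2) NOT NULL,
  placed_at DATETIME NOT NULL,
  reference VARCHAR(64) NOT NULL,
  -- Expression over two columns.
  INDEX idx_gross ((net + tax)),
  -- Functional part with explicit ordering, mixed with a plain column.
  INDEX idx_year_ref ((YEAR(placed_at)) DESC, reference),
  -- A column prefix is not allowed in a functional key part;
  -- SUBSTRING() gives a comparable result.
  INDEX idx_ref_start ((SUBSTRING(reference, 1, 10)))
);

-- Rejected by MySQL: a functional key part that is just a column name.
-- ALTER TABLE orders ADD INDEX ((net));
```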
Finally, remember that a functional index only helps queries that use exactly the same expression. An index on plain columns, by contrast, can serve several different query patterns.
For example, the following conditions cannot rely on the functional index we have created:
WHERE YEAR(create_time) = 2019
WHERE create_time > '2019-10-01'
WHERE create_time BETWEEN '2019-10-01' AND '2019-11-01'
WHERE MONTH(create_time+INTERVAL 1 YEAR)
Unless another usable index exists, such as the plain index on create_time for the two range conditions, all of these trigger a full scan.
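If one of these other patterns is frequent, one option is a second functional index on that exact expression. The following is a sketch under that assumption, using the YEAR filter as the example. The index name idx_create_year is illustrative.

```sql
-- Illustrative name; the indexed expression must match
-- the expression used in the WHERE clause.
ALTER TABLE products ADD INDEX idx_create_year ((YEAR(create_time)));

-- possible_keys should now list idx_create_year for this filter.
-- Whether the optimizer actually chooses it depends on how selective
-- the condition is for the data in the table.
EXPLAIN SELECT AVG(price) FROM products WHERE YEAR(create_time) = 2019;
```

Each additional index adds write overhead, so this is only worth doing for expressions that queries really filter on.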
Functional Index Internals
The functional indexes are implemented as hidden virtual generated columns. For this reason, you can emulate the same behavior even on MySQL 5.7 by explicitly creating the virtual column. We can test this, starting by dropping the indexes we have created so far.
mysql> SHOW CREATE TABLE products\G
*************************** 1. row ***************************
       Table: products
Create Table: CREATE TABLE `products` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `description` longtext,
  `price` decimal(8,2) DEFAULT NULL,
  `create_time` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `create_time` (`create_time`),
  KEY `functional_index` ((month(`create_time`)))
) ENGINE=InnoDB AUTO_INCREMENT=149960 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
mysql> ALTER TABLE products DROP INDEX `create_time`, DROP INDEX `functional_index`;
Query OK, 0 rows affected (0.03 sec)
We can try now to create the virtual generated column:
mysql> ALTER TABLE products ADD COLUMN create_month TINYINT GENERATED ALWAYS AS (MONTH(create_time)) VIRTUAL;
Query OK, 0 rows affected (0.04 sec)
Create the index on the virtual column:
mysql> ALTER TABLE products ADD INDEX(create_month);
Query OK, 0 rows affected (0.55 sec)
mysql> SHOW CREATE TABLE products\G
*************************** 1. row ***************************
       Table: products
Create Table: CREATE TABLE `products` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `description` longtext,
  `price` decimal(8,2) DEFAULT NULL,
  `create_time` timestamp NULL DEFAULT NULL,
  `create_month` tinyint GENERATED ALWAYS AS (month(`create_time`)) VIRTUAL,
  PRIMARY KEY (`id`),
  KEY `create_month` (`create_month`)
) ENGINE=InnoDB AUTO_INCREMENT=149960 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
We can now try our original query. We expect to see the same behavior as the functional index.
mysql> SELECT AVG(price) FROM products WHERE MONTH(create_time)=10;
+------------+
| AVG(price) |
+------------+
| 202.982582 |
+------------+
mysql> EXPLAIN SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
   partitions: NULL
         type: ref
possible_keys: create_month
          key: create_month
      key_len: 2
          ref: const
         rows: 182
     filtered: 100.00
        Extra: NULL
Indeed, the behavior is the same. The index on the virtual column can be used, and the query is optimized.
The good news is that you can use this workaround to emulate a functional index even on 5.7, getting the same benefits. The advantage of MySQL 8.0 is that it is completely transparent; no need to create the virtual column.
Since the functional index is implemented as a hidden virtual column, there is no additional space needed for the data and only the index space will be added to the table.
By the way, this is the same technique used for creating indexes on JSON documents’ fields.
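As a rough illustration of the JSON case, the sketch below extracts one field of a JSON document into a virtual generated column and indexes it, which works on both MySQL 5.7 and 8.0. The product_docs table, the brand field, and the index name are hypothetical.

```sql
-- Hypothetical table storing product attributes as JSON.
CREATE TABLE product_docs (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  doc   JSON NOT NULL,
  -- Virtual column holding the unquoted value of the "brand" field.
  brand VARCHAR(64)
    GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$.brand'))) VIRTUAL,
  INDEX idx_brand (brand)
);

-- Filtering on the generated column lets the optimizer consider idx_brand.
EXPLAIN SELECT id FROM product_docs WHERE brand = 'Acme';
```

The JSON document itself is not indexed; only the extracted value is, which keeps the index small.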
Limitations of Functional Indexes
In situations where indexed functions may rarely be used in queries, using functional indexes may not be the best choice because the maintenance overhead could outweigh any potential performance benefits. In these scenarios, different approaches may be warranted, including:
- You can optimize queries by rewriting them or by using materialized views. Rewriting queries so that the indexed column is not wrapped in a function, or storing pre-computed results, improves performance without relying on functional indexes. Note that MySQL has no native materialized views, so in practice this means a summary table refreshed by the application, triggers, or scheduled events.
- Precomputing and storing function outputs as columns within the table enables traditional indexing on these columns. This minimizes the need for function-based indexes while retaining the advantages of indexed computations.
So, while functional indexes can be great tools for optimizing complex queries, it’s important to know the limitations and potential downsides of using them.
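The second alternative above, precomputing the function output as a regular column, can be sketched with a STORED generated column. This is only one possible way to do it. The column and index names are illustrative, and unlike a VIRTUAL column, a STORED column takes space in the table rows in addition to the index space.

```sql
-- Illustrative column and index names.
-- STORED materializes the computed value in each row.
ALTER TABLE products
  ADD COLUMN create_month_stored TINYINT
    GENERATED ALWAYS AS (MONTH(create_time)) STORED,
  ADD INDEX idx_create_month_stored (create_month_stored);

-- Queries can then filter on the precomputed column directly.
SELECT AVG(price) FROM products WHERE create_month_stored = 10;
```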
Upgrade to MySQL 8.0, or Get EOL Support for MySQL 5.7 with Percona
Functional index support is an interesting improvement in MySQL 8.0. Some queries that previously had to be rewritten to be optimized no longer need it. Just remember that only queries using the same expression in the filter can rely on the functional index; for other search patterns, you need to create additional regular or functional indexes.
The same feature can be implemented on MySQL 5.7 with the explicit creation of a virtual generated column and the index.
For more detailed information, read the following page:
https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-functional-key-parts
The end of life (EOL) for MySQL 5.7 is scheduled for October 21, 2023, signifying the end of updates and, crucially, the discontinuation of security fixes for identified vulnerabilities. If you’re uncertain about transitioning to MySQL 8.0 or feel pressed for time, Percona provides comprehensive consultative and operational support for MySQL 5.7 for a period of up to three years after its EOL. This support includes operational assistance, ensuring continued security coverage, regular critical updates for Critical and High Severity CVEs, and guidance for those hesitant to migrate to MySQL 8.0.
FAQs
What are indexes in MySQL, and why are they important for database performance?
Indexes in MySQL are organized data structures that enable quicker data retrieval by offering optimized access routes to specific rows within a table. They can significantly enhance query speed and reduce the need for full-table scans.
How does the MySQL CREATE INDEX statement work, and what are the steps to create an index?
The MySQL CREATE INDEX statement defines an organized structure that speeds up data access in a table. To create an index, you specify the index columns, and MySQL automatically organizes and maintains the index data for improved query performance.
How do indexes affect database storage and resource usage in MySQL 8.0?
Indexes in MySQL 8.0 do use additional database disk space to store index data, but they also reduce resource usage by improving query performance.
What considerations should be taken into account when choosing columns for indexing?
When choosing columns for indexing, factors to consider involve picking frequently queried columns, maintaining a balanced index size for optimal performance, and avoiding over-indexing to prevent unnecessary overhead on data modification operations.