About 2 weeks ago Oracle published the MySQL 5.7.7-labs-json version which includes a very interesting feature called “Generated columns” (also know as Virtual or Computed columns). MariaDB has a similar feature as well: Virtual (Computed) Columns.

The idea is very simple: if we store a column

in our table we may want to filter or group by year(FlightDate), month(FlightDate) or even dayofweek(FlightDate). The “brute-force” approach: use the above Date and Time MySQL functions in the query; however it will prevent MySQL from using an index (see below). Generated columns will allow you to declare a “Virtual”, non-stored column which is computed based on the existing field; you can then add index on that virtual column, so the query will use that index.

Here is the original example:

Now I want to find all flights on Sundays (in 2013) and group by airline.

The problem here is: MySQL will not be able to use index when you use a function which will “extract” something from the column. The standard approach is to “materialize” the column:

Then we will need to load data into that by running “UPDATE ontime_sm SET Flight_dayofweek = dayofweek(flight_date)”. After that we will also need to change the application to support that additional column or use a trigger to update the column. Here is the trigger example:

One problem with the trigger is that it is slow. In my simple example it took almost 2x slower to “copy” the table using “insert into ontime_sm_copy select * from ontime_sm” when the trigger was on.

The Generated Columns from MySQL 5.7.7-labs-json version (only this version supports it on the time of writing) solves this problem. Here is the example which demonstrate its use:

Here we add Flight_dayofweek tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL column and index it.

Now MySQL can use this index:

To further increase performance of this query we want to add a combined index on (Flight_dayofweek, carrier) so MySQL will avoid creating temporary table. However it is not currently supported:

We can add an index on 2 generated columns thou, which is good. So a trick here will be to create a “dummy” virtual column on “carrier” and index 2 of those columns:

Now MySQL will use an index and completely avoid the filesort.

The last, but not the least: loading data to the table with generated columns is significantly faster compared to loading it into the same table with triggers:

Now the big disappointment: all operations with generated columns are not online right now.

I hope it will be fixed in the future releases.

Conclusion

Generated columns feature is very useful. Imagine an ability to add a column + index for any “logical” piece of data without actually duplicating the data. And this can be any function: date/time/calendar, text (extract(), reverse(), metaphone()) or anything else. I hope this feature will be available in MySQL 5.7 GA. Finally, I wish adding a generated column and index can be online (it is not right now).

More information:

11 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
JImmy Yang

Alex, thanks for the nice write up!

I would like to add that one big difference for this implementation (comparing to MariaDB) is the virtual columns are really “not materialized”, so they are not present in the table data (cluster index). So when user add/drop such virtual column, it is essentially a few system table updates (no need to rebuild table) and really fast. In such, there is no need to make this operation(add or drop virtual column) online. Since even for any current “online” DDL operations, the metadata update phase are not online (locked by metadata lock).

However, it is true that adding a key on it is not online in the LAB release. Such limitation is already lifted in our current implementation.

Peter Zaitsev

I’m glad time has been taken to do the implementation of the “right” way with virtual columns being really virtual and not stored anywhere unless one indexes them. This is a great feature which is long overdue in MySQL and it is great MariaDB has an implementation which works for some users for a while.

I also would point out if adding the column just modifies couple of rows in the index table I would run it with ONLINE modifier as it is online from user point of view, not give the error

Number of Ops people add ONLINE to all their DDL to make sure there is never blocking DDL run in production and causing downtime, so refusing to run it with this modifier can cause confusion.

Matthew Boehm

Your example contradicts the documentation provided by the MySQL Server Team regarding generated columns:

http://mysqlserverteam.com/generated-columns-in-mysql-5-7-5/

“… but only a stored Generated Column can be be a part of an index.”

I guess this was something recent fixed/improved?

JImmy Yang

Peter:

> I also would point out if adding the column just modifies couple of rows
> in the index table I would run it with ONLINE modifier as it is online from
> user point of view, not give the error

> Number of Ops people add ONLINE to all their DDL to make sure there
> is never blocking DDL run in production and causing downtime, so refusing
> to run it with this modifier can cause confusion.

This is a good point! I think this makes sense. I will propose such change (allowing “online” modifier for adding/dropping virtual columns), and if strong reason against it, we will just do it.

JImmy Yang

Matthew:

> I guess this was something recent fixed/improved?

This feature is only part of “MySQL 5.7.7-labs-json version” release, which means it is not yet into official 5.7 releases. Once it goes in, the docuemtation will change accordingly.

Valerie Parham-Thompson

Ah, Jimmy answered my question, “How is this implemented differently than in Maria?”

I would be curious to try, though, given that difference, if operations are faster in the Maria version or the MySQL version.

Nice writeup, looking forward to 5.7.

Aurimas Mikalauskas

This is probably going to be one of my top 3 favourite features of MySQL 5.7. Thanks for sharing some details on this, Alexander!

Matthew Boehm

Mixing indexes on stored and virtual is support in Percona Server 5.7.10. Not necessary to do the “dummy” column any more.

TahorSuiJuris

Excellent! Exactly what I was seeking.

Using MySQL Workbench to modify column with generated expressions, how would one modify a MySQL table to include a generated calculated column?

https://i.imgur.com/u9M9X2b.png

TahorSuiJuris

Am using MySQL Workbench v8.0 does this not accomodate the solution you shared above,

“The Generated Columns from MySQL 5.7.7-labs-json version (only this version supports it on the time of writing) solves this problem.”

https://i.imgur.com/NuajW3X.png