Generated (Virtual) Columns in MySQL 5.7 (labs)

About 2 weeks ago Oracle published the MySQL 5.7.7-labs-json version which includes a very interesting feature called “Generated columns” (also know as Virtual or Computed columns). MariaDB has a similar feature as well: Virtual (Computed) Columns.

The idea is very simple: if we store a column

`FlightDate` date

1	`FlightDate` date

in our table we may want to filter or group by year(FlightDate), month(FlightDate) or even dayofweek(FlightDate). The “brute-force” approach: use the above Date and Time MySQL functions in the query; however it will prevent MySQL from using an index (see below). Generated columns will allow you to declare a “Virtual”, non-stored column which is computed based on the existing field; you can then add index on that virtual column, so the query will use that index.

Here is the original example:

CREATE TABLE `ontime` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `FlightDate` date DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `OriginAirportID` int(11) DEFAULT NULL,
  `OriginCityName` varchar(100) DEFAULT NULL,
  `OriginState` char(2) DEFAULT NULL,
  `DestAirportID` int(11) DEFAULT NULL,
  `DestCityName` varchar(100) DEFAULT NULL,
  `DestState` char(2) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FlightDate` (`FlightDate`)
) ENGINE=InnoDB

CREATE TABLE `ontime` (

`id` int(11) NOT NULL AUTO_INCREMENT,

`FlightDate` date DEFAULT NULL,

`Carrier` char(2) DEFAULT NULL,

`OriginAirportID` int(11) DEFAULT NULL,

`OriginCityName` varchar(100) DEFAULT NULL,

`OriginState` char(2) DEFAULT NULL,

`DestAirportID` int(11) DEFAULT NULL,

`DestCityName` varchar(100) DEFAULT NULL,

`DestState` char(2) DEFAULT NULL,

`DepDelayMinutes` int(11) DEFAULT NULL,

`ArrDelayMinutes` int(11) DEFAULT NULL,

`Cancelled` tinyint(4) DEFAULT NULL,

`CancellationCode` char(1) DEFAULT NULL,

`Diverted` tinyint(4) DEFAULT NULL,

PRIMARY KEY (`id`),

KEY `FlightDate` (`FlightDate`)

) ENGINE=InnoDB

Now I want to find all flights on Sundays (in 2013) and group by airline.

mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm 
	    WHERE dayofweek(FlightDate) = 7 group by carrier
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 151253427
        Extra: Using where; Using temporary; Using filesort

Results:
32 rows in set (1 min 57.93 sec)

mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm

WHERE dayofweek(FlightDate) = 7 group by carrier

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: ontime_sm

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 151253427

Extra: Using where; Using temporary; Using filesort

Results:

32 rows in set (1 min 57.93 sec)

The problem here is: MySQL will not be able to use index when you use a function which will “extract” something from the column. The standard approach is to “materialize” the column:

ALTER TABLE ontime_sm ADD Flight_dayofweek tinyint NOT NULL;

1	ALTER TABLE ontime_sm ADD Flight_dayofweek tinyint NOT NULL;

Then we will need to load data into that by running “UPDATE ontime_sm SET Flight_dayofweek = dayofweek(flight_date)”. After that we will also need to change the application to support that additional column or use a trigger to update the column. Here is the trigger example:

CREATE DEFINER = CURRENT_USER 
TRIGGER ontime_insert
BEFORE INSERT ON ontime_sm_triggers 
FOR EACH ROW
SET 
NEW.Flight_dayofweek = dayofweek(NEW.FlightDate);

CREATE DEFINER = CURRENT_USER

TRIGGER ontime_insert

BEFORE INSERT ON ontime_sm_triggers

FOR EACH ROW

SET

NEW.Flight_dayofweek = dayofweek(NEW.FlightDate);

One problem with the trigger is that it is slow. In my simple example it took almost 2x slower to “copy” the table using “insert into ontime_sm_copy select * from ontime_sm” when the trigger was on.

The Generated Columns from MySQL 5.7.7-labs-json version (only this version supports it on the time of writing) solves this problem. Here is the example which demonstrate its use:

CREATE TABLE `ontime_sm_virtual` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `FlightDate` date DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `OriginAirportID` int(11) DEFAULT NULL,
  `OriginCityName` varchar(100) DEFAULT NULL,
  `OriginState` char(2) DEFAULT NULL,
  `DestAirportID` int(11) DEFAULT NULL,
  `DestCityName` varchar(100) DEFAULT NULL,
  `DestState` char(2) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
  `ArrDelayMinutes` int(11) DEFAULT NULL,
  `Cancelled` tinyint(4) DEFAULT NULL,
  `CancellationCode` char(1) DEFAULT NULL,
  `Diverted` tinyint(4) DEFAULT NULL,
  `CRSElapsedTime` int(11) DEFAULT NULL,
  `ActualElapsedTime` int(11) DEFAULT NULL,
  `AirTime` int(11) DEFAULT NULL,
  `Flights` int(11) DEFAULT NULL,
  `Distance` int(11) DEFAULT NULL,
  `Flight_dayofweek` tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL,
  PRIMARY KEY (`id`),
  KEY `Flight_dayofweek` (`Flight_dayofweek`),
) ENGINE=InnoDB

CREATE TABLE `ontime_sm_virtual` (

`id` int(11) NOT NULL AUTO_INCREMENT,

`FlightDate` date DEFAULT NULL,

`Carrier` char(2) DEFAULT NULL,

`OriginAirportID` int(11) DEFAULT NULL,

`OriginCityName` varchar(100) DEFAULT NULL,

`OriginState` char(2) DEFAULT NULL,

`DestAirportID` int(11) DEFAULT NULL,

`DestCityName` varchar(100) DEFAULT NULL,

`DestState` char(2) DEFAULT NULL,

`DepDelayMinutes` int(11) DEFAULT NULL,

`ArrDelayMinutes` int(11) DEFAULT NULL,

`Cancelled` tinyint(4) DEFAULT NULL,

`CancellationCode` char(1) DEFAULT NULL,

`Diverted` tinyint(4) DEFAULT NULL,

`CRSElapsedTime` int(11) DEFAULT NULL,

`ActualElapsedTime` int(11) DEFAULT NULL,

`AirTime` int(11) DEFAULT NULL,

`Flights` int(11) DEFAULT NULL,

`Distance` int(11) DEFAULT NULL,

`Flight_dayofweek` tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL,

PRIMARY KEY (`id`),

KEY `Flight_dayofweek` (`Flight_dayofweek`),

) ENGINE=InnoDB

Here we add Flight_dayofweek tinyint(4) GENERATED ALWAYS AS (dayofweek(FlightDate)) VIRTUAL column and index it.

Now MySQL can use this index:

mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm_virtual  WHERE Flight_dayofweek = 7 group by carrier
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm_virtual
   partitions: NULL
         type: ref
possible_keys: Flight_dayofweek
          key: Flight_dayofweek
      key_len: 2
          ref: const
         rows: 165409
     filtered: 100.00
        Extra: Using where; Using temporary; Using filesort

mysql> EXPLAIN SELECT carrier, count(*) FROM ontime_sm_virtual WHERE Flight_dayofweek = 7 group by carrier

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: ontime_sm_virtual

partitions: NULL

type: ref

possible_keys: Flight_dayofweek

key: Flight_dayofweek

key_len: 2

ref: const

rows: 165409

filtered: 100.00

Extra: Using where; Using temporary; Using filesort

To further increase performance of this query we want to add a combined index on (Flight_dayofweek, carrier) so MySQL will avoid creating temporary table. However it is not currently supported:

mysql> alter table ontime_sm_virtual 
       add key comb(Flight_dayofweek, carrier);
ERROR 3105 (HY000): 'Virtual generated column combines with other columns to be indexed together' is not supported for generated columns.

mysql> alter table ontime_sm_virtual

add key comb(Flight_dayofweek, carrier);

ERROR 3105 (HY000): 'Virtual generated column combines with other columns to be indexed together' is not supported for generated columns.

We can add an index on 2 generated columns thou, which is good. So a trick here will be to create a “dummy” virtual column on “carrier” and index 2 of those columns:

mysql> alter table ontime_sm_virtual add Carrier_virtual char(2) GENERATED ALWAYS AS (Carrier) VIRTUAL;
Query OK, 0 rows affected (0.43 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table ontime_sm_virtual add key comb(Flight_dayofweek, Carrier_virtual);
Query OK, 999999 rows affected (36.79 sec)
Records: 999999  Duplicates: 0  Warnings: 0

mysql> EXPLAIN SELECT Carrier_virtual, count(*) FROM ontime_sm_virtual WHERE Flight_dayofweek = 7 group by Carrier_virtual
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_sm_virtual
   partitions: NULL
         type: ref
possible_keys: Flight_dayofweek,comb
          key: comb
      key_len: 2
          ref: const
         rows: 141223
     filtered: 100.00
        Extra: Using where; Using index

mysql> alter table ontime_sm_virtual add Carrier_virtual char(2) GENERATED ALWAYS AS (Carrier) VIRTUAL;

Query OK, 0 rows affected (0.43 sec)

Records: 0 Duplicates: 0 Warnings: 0

mysql> alter table ontime_sm_virtual add key comb(Flight_dayofweek, Carrier_virtual);

Query OK, 999999 rows affected (36.79 sec)

Records: 999999 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT Carrier_virtual, count(*) FROM ontime_sm_virtual WHERE Flight_dayofweek = 7 group by Carrier_virtual

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: ontime_sm_virtual

partitions: NULL

type: ref

possible_keys: Flight_dayofweek,comb

key: comb

key_len: 2

ref: const

rows: 141223

filtered: 100.00

Extra: Using where; Using index

Now MySQL will use an index and completely avoid the filesort.

The last, but not the least: loading data to the table with generated columns is significantly faster compared to loading it into the same table with triggers:

mysql> insert into ontime_sm_triggers (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName,  OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode,Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;
Query OK, 999999 rows affected (27.86 sec)
Records: 999999  Duplicates: 0  Warnings: 0

mysql> insert into ontime_sm_virtual (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName,  OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode,Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;                                   
Query OK, 999999 rows affected (16.29 sec)
Records: 999999  Duplicates: 0  Warnings: 0

mysql> insert into ontime_sm_triggers (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName, OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode,Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;

Query OK, 999999 rows affected (27.86 sec)

Records: 999999 Duplicates: 0 Warnings: 0

mysql> insert into ontime_sm_virtual (id, YearD, FlightDate, Carrier, OriginAirportID, OriginCityName, OriginState, DestAirportID, DestCityName, DestState, DepDelayMinutes, ArrDelayMinutes, Cancelled, CancellationCode,Diverted, CRSElapsedTime, ActualElapsedTime, AirTime, Flights, Distance) select * from ontime_sm;

Query OK, 999999 rows affected (16.29 sec)

Records: 999999 Duplicates: 0 Warnings: 0

Now the big disappointment: all operations with generated columns are not online right now.

mysql> alter table ontime_sm_virtual add Flight_year year GENERATED ALWAYS AS (year(FlightDate)) VIRTUAL, add key (Flight_year), lock=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.

mysql> alter table ontime_sm_virtual add key (Flight_year), lock=NONE;
ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.

mysql> alter table ontime_sm_virtual add Flight_year year GENERATED ALWAYS AS (year(FlightDate)) VIRTUAL, add key (Flight_year), lock=NONE;

ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.

mysql> alter table ontime_sm_virtual add key (Flight_year), lock=NONE;

ERROR 1846 (0A000): LOCK=NONE is not supported. Reason: '%s' is not supported for generated columns.. Try LOCK=SHARED.

I hope it will be fixed in the future releases.

Conclusion

Generated columns feature is very useful. Imagine an ability to add a column + index for any “logical” piece of data without actually duplicating the data. And this can be any function: date/time/calendar, text (extract(), reverse(), metaphone()) or anything else. I hope this feature will be available in MySQL 5.7 GA. Finally, I wish adding a generated column and index can be online (it is not right now).

More information:

Original blog post (older implementation) by MySQL server team about Generated Columns in MySQL 5.7.5
New implementation of the virtual column: WL#8114: Don’t store virtual generated columns in database

11 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

JImmy Yang

9 years ago

Alex, thanks for the nice write up!

I would like to add that one big difference for this implementation (comparing to MariaDB) is the virtual columns are really “not materialized”, so they are not present in the table data (cluster index). So when user add/drop such virtual column, it is essentially a few system table updates (no need to rebuild table) and really fast. In such, there is no need to make this operation(add or drop virtual column) online. Since even for any current “online” DDL operations, the metadata update phase are not online (locked by metadata lock).

However, it is true that adding a key on it is not online in the LAB release. Such limitation is already lifted in our current implementation.

Peter Zaitsev

Admin

9 years ago

I’m glad time has been taken to do the implementation of the “right” way with virtual columns being really virtual and not stored anywhere unless one indexes them. This is a great feature which is long overdue in MySQL and it is great MariaDB has an implementation which works for some users for a while.

I also would point out if adding the column just modifies couple of rows in the index table I would run it with ONLINE modifier as it is online from user point of view, not give the error

Number of Ops people add ONLINE to all their DDL to make sure there is never blocking DDL run in production and causing downtime, so refusing to run it with this modifier can cause confusion.

Matthew Boehm

Editor

9 years ago

Your example contradicts the documentation provided by the MySQL Server Team regarding generated columns:

http://mysqlserverteam.com/generated-columns-in-mysql-5-7-5/

“… but only a stored Generated Column can be be a part of an index.”

I guess this was something recent fixed/improved?

JImmy Yang

9 years ago

Peter:

> I also would point out if adding the column just modifies couple of rows
> in the index table I would run it with ONLINE modifier as it is online from
> user point of view, not give the error

> Number of Ops people add ONLINE to all their DDL to make sure there
> is never blocking DDL run in production and causing downtime, so refusing
> to run it with this modifier can cause confusion.

This is a good point! I think this makes sense. I will propose such change (allowing “online” modifier for adding/dropping virtual columns), and if strong reason against it, we will just do it.

JImmy Yang

9 years ago

Matthew:

> I guess this was something recent fixed/improved?

This feature is only part of “MySQL 5.7.7-labs-json version” release, which means it is not yet into official 5.7 releases. Once it goes in, the docuemtation will change accordingly.

Alexander Rubin

Author

9 years ago

Jimmy, thank you!
Is there any chance that mixing indexes on stored columns and generated columns will be allowed? So we do not have to create a “dummy” generated column to create a combined index?