In this blog post, we’ll discuss JSON document fast lookup with MySQL 5.7.
Recently I attended Morgan Tocker’s talk on MySQL 5.7 and JSON at FOSDEM, and I found it awesome.
I learned some great information from the talk. Let me share one of them here: a very useful trick if you plan to store JSON documents in your MySQL database and want to retrieve the documents from some attribute’s values. So let’s look at how to do JSON document fast lookup with MySQL 5.7!
In this short example, I show you how we can speed up this type of search using JSON functions and virtual columns.
JSON document fast lookup
This our test table:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | Table: test_features Create Table: CREATE TABLE `test_features` ( `id` int(11) NOT NULL AUTO_INCREMENT, `feature` json NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=206561 DEFAULT CHARSET=latin1 mysql> show table status like 'test_features'G *************************** 1. row *************************** Name: test_features Engine: InnoDB Version: 10 Row_format: Dynamic Rows: 171828 Avg_row_length: 1340 Data_length: 230326272 Max_data_length: 0 Index_length: 0 Data_free: 3145728 Auto_increment: 206561 Create_time: 2016-03-01 15:22:34 Update_time: 2016-03-01 15:23:20 Check_time: NULL Collation: latin1_swedish_ci Checksum: NULL Create_options: Comment: |
We can see the data length is almost 230M:
1 2 3 4 5 6 7 | +--------------------+--------+-------+-------+-------+------------+---------+ | TABLE | ENGINE | ROWS | DATA | IDX | TOTAL SIZE | IDXFRAC | +--------------------+--------+-------+-------+-------+------------+---------+ | json.test_features | InnoDB | 0.17M | 0.21G | 0.00G | 0.21G | 0.00 | +--------------------+--------+-------+-------+-------+------------+---------+ -rw-r----- 1 mysql mysql 228M Mar 1 15:23 /var/lib/mysql/json/test_features.ibd |
As an example here is one record (the data is coming from https://github.com/zemirco/sf-city-lots-json):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | { "type": "Feature", "geometry": { "type": "Polygon", "coordinates": [ [ [ -122.41983177253881, 37.80720512387136, 0 ], ... [ -122.41983177253881, 37.80720512387136, 0 ] ] ] }, "properties": { "TO_ST": "600", "BLKLOT": "0010001", "STREET": "BEACH", "FROM_ST": "600", "LOT_NUM": "001", "ST_TYPE": "ST", "ODD_EVEN": "E", "BLOCK_NUM": "0010", "MAPBLKLOT": "0010001" } } |
Now let’s try to find all the records where the street is “BEACH”. “Street” is part of the array attribute properties.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | mysql> SELECT count(*) FROM test_features WHERE feature->"$.properties.STREET" = 'BEACH'; +----------+ | count(*) | +----------+ | 208 | +----------+ 1 row in set (0.21 sec) mysql> explain SELECT count(*) FROM test_features WHERE feature->"$.properties.STREET" = 'BEACH'G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: test_features partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 171828 filtered: 100.00 Extra: Using where 1 row in set, 1 warning (0.00 sec) |
As you can see, we perform a full table scan to achieve this.
With MySQL, we have the possibility of using virtually generated columns. Let’s create one for the streets:
1 | mysql> ALTER TABLE test_features ADD COLUMN street VARCHAR(30) GENERATED ALWAYS AS (json_unquote(json_extract(`feature`,'$.properties.STREET'))) VIRTUAL; |
I use “json_unquote()” to avoid to add the JSON string quotes in the column, and later in the index.
You can verify the size of the table on disk, and you will see this doesn’t increase (as it’s a virtual column).
Even if we can now use the “street” column in the search, that won’t help. We still need to add an index on it:
1 | mysql> ALTER TABLE test_features ADD KEY `street` (`street`); |
And now we can see that the size is larger because we have added the size of the index:
1 | -rw-r----- 1 mysql mysql 232M Mar 1 15:48 /var/lib/mysql/json/test_features.ibd |
Now we can try to run the query like this:
1 2 3 4 5 6 7 | mysql> SELECT count(*) FROM test_features WHERE street = 'BEACH'; +----------+ | count(*) | +----------+ | 208 | +----------+ 1 row in set (0.00 sec) |
Query Execution Plan:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | mysql> explain SELECT count(*) FROM test_features WHERE street = 'BEACH'G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: test_features partitions: NULL type: ref possible_keys: street key: street key_len: 33 ref: const rows: 208 filtered: 100.00 Extra: Using index |
And finally we can verify this in the statistics available in sys schema:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | mysql> select * from sys.schema_index_statistics where table_name='test_features'G *************************** 1. row *************************** table_schema: json table_name: test_features index_name: street rows_selected: 208 select_latency: 72.59 us rows_inserted: 0 insert_latency: 0 ps rows_updated: 0 update_latency: 0 ps rows_deleted: 0 delete_latency: 0 ps *************************** 2. row *************************** table_schema: json table_name: test_features index_name: PRIMARY rows_selected: 0 select_latency: 0 ps rows_inserted: 0 insert_latency: 0 ps rows_updated: 0 update_latency: 0 ps rows_deleted: 0 delete_latency: 0 ps 2 rows in set (0.00 sec) |
As you can see, this is very fast. If you already know how you want to retrieve data out of your JSON document, it’s very easy to add such indexes in MySQL.
http://databaseblog.myname.nl/2016/03/re-json-document-fast-lookup-with-mysql.html
Great article! What would be the best approach of indexing a property that’s an object or array?
Also, what method did you use to load the data from https://github.com/zemirco/sf-city-lots-json? I’m trying to do that locally but can’t think of an easy way.
Percona Mysql 5.7 support JSON data type?