Tags and FullText indexes in MySQL

Tags and FullText indexes in MySQL As a principal architect at Percona, one of my duties is to tune MySQL database servers for our customers. The tuning effort looks at every aspect of the database service like the operating system, the MySQL configuration, the schema, the queries, etc. We have well-defined processes to tune the operating system and the MySQL configuration. However, tuning the schema and the queries using it can be anywhere from trivial to extremely challenging.

The challenge with the schema and the queries is mostly with the indexes. The most common types of indexes are based on b-trees or hash lists. InnoDB doesn’t support hash indexes, a bummer for equality conditions. B-tree indexes are more general-purpose, they are decent for equality conditions and very good for range conditions. They are however quite heavy and very dependent on the defined order of columns. They are also poor for double range conditions.

Double range conditions can be handled by the GIS R-tree type index, a topic I’d like to explore in a future post. The focus of this post is FullText indexes but not for their primary use case like text documents. I want to show you how this type of index can be used to efficiently solve performance problems of a type of schema I’ll refer to as “Tag-based schema”. It is surprising how little attention fulltext indexes are receiving, I did a quick search on this blog and our latest post was from 2018.

What is a Tag-based schema?

At least from a high-level view, a database table is used to map the properties of an object or an entity to columns. The object can be a user or a relation in a social network database or a transaction in a credit card processing database. Some objects have few, very well-defined properties but others have many dozens if not hundreds of properties. A table with a hundred columns that may need indexes is impractical and very difficult to tune. Let’s have a look at a few examples.

Example: House rental

Let’s imagine you are designing the database for a house rental application. You’ll need a table to describe the houses you are offering for rent. In such a context, houses are objects with many properties describing their basic attributes, amenities, and surroundings. Let’s try to list the main ones:

Attributes: Number of rooms, number of bathrooms, private bathrooms, full kitchen, double beds, twin beds, parking spaces, meeting/workspace space, rating (stars), price ranges, region, etc.

Amenities: Air conditioning, swimming pool, jacuzzi, fireplace, pool table, wifi, home theater, cable tv, nice view, quiet (or not), non-smoking (or not), etc.

Surroundings: Beach, hiking, walking, fitness, skiing, restaurants, Bars, Spectacles, Theaters, parks, distance from the city center, distance from the airport, etc.

Example: Tree classification

A very different example is a database listing species of trees and how to identify them based on their general characteristics, their bark, leaves, seeds, etc.

Bark: Smooth, paper, peeling, ridges, scale, etc.

Leaves: SimpleSym, simpleAsym, multi-opposite, multi-alternate, multi-whorled, Needle flat, Needle round, lobed, toothed, etc

Seeds: Fruits, Nuts, Samaras, Acorns, Cones, Pods, etc.

These types of objects are difficult to map. Other than the wide approach (many columns) we discussed above, the other traditional approach is the entity value schema. The entity value schema uses one row per attribute, instead of being wide, we can say it is long. The entity value schema is flexible but also presents a number of tuning challenges. Let’s explore an alternative using fulltext indexes and tags. In this alternative, you store the attributes word elements (tags), concatenating them using a delimiter into a TEXT column.

Tags with a fulltext index

First, we need to define a dictionary table for the attributes/tags:

CREATE TABLE `dictTags` (
  `tag` varchar(10) NOT NULL,
  `description` varchar(100) DEFAULT NULL,
  `groupTags` int unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`tag`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

CREATE TABLE `dictTags` (

`tag` varchar(10) NOT NULL,

`description` varchar(100) DEFAULT NULL,

`groupTags` int unsigned NOT NULL DEFAULT '0',

PRIMARY KEY (`tag`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

The table contains all the tags defined. A description is a human-readable version of the tag, GroupTags allows to set exclusive groups (only one value possible). The above table is a simple example but it can easily be expanded.

Here’s a sample tag list for a house or apartment rental site. Keep in mind I am not specialized at all in that type of business, I could easily be missing critical aspects. More realistically, there could be many hundreds of tags.

mysql> select * from dictTags;
+-------------+----------------------------------------+-----------+
| tag         | description                            | groupTags |
+-------------+----------------------------------------+-----------+
| 1bath       | One bathroom                           |         1 |
| 1bedr       | One bedroom                            |         2 |
| 1kmCenter   | Less than 1 kilometer from city center |         6 |
| 2bath       | Two bathrooms                          |         1 |
| 2bedr       | Two bedrooms                           |         2 |
| 2kmCenter   | Less than 2 kilometer from city center |         6 |
| 3bath       | Three bathrooms                        |         1 |
| 3bedr       | Three bedrooms                         |         2 |
| 3kmCenter   | Less than 3 kilometer from city center |         6 |
| 4bath       | Four bathrooms                         |         1 |
| 4bedr       | Four bedrooms                          |         2 |
| aircond     | Air conditioning                       |         0 |
| beach       | Direct beach access                    |         0 |
| City        | Urban environment                      |         4 |
| ctv         | Cable TV                               |         0 |
| Doublebed   | Double beds                            |         5 |
| firep       | Fireplace                              |         0 |
| hike        | Hiking                                 |         0 |
| hometh      | Home Theater                           |         0 |
| Jacu        | Jacuzzi                                |         0 |
| Nature      | Nature environment                     |         4 |
| night       | Nightlife options close by             |         3 |
| poolt       | Pool table                             |         0 |
| quiet       | Quiet neighborhood                     |         3 |
| sky         | Skying options                         |         0 |
| swim        | Swimming pool                          |         0 |
| Twinbed     | Twin beds                              |         5 |
| view        | Nice view                              |         0 |
| wifi        | Wifi Internet                          |         0 |
+-------------+----------------------------------------+-----------+
29 rows in set (0.00 sec)

mysql> select * from dictTags;

+-------------+----------------------------------------+-----------+

| tag | description | groupTags |

+-------------+----------------------------------------+-----------+

| 1bath | One bathroom | 1 |

| 1bedr | One bedroom | 2 |

| 1kmCenter | Less than 1 kilometer from city center | 6 |

| 2bath | Two bathrooms | 1 |

| 2bedr | Two bedrooms | 2 |

| 2kmCenter | Less than 2 kilometer from city center | 6 |

| 3bath | Three bathrooms | 1 |

| 3bedr | Three bedrooms | 2 |

| 3kmCenter | Less than 3 kilometer from city center | 6 |

| 4bath | Four bathrooms | 1 |

| 4bedr | Four bedrooms | 2 |

| aircond | Air conditioning | 0 |

| beach | Direct beach access | 0 |

| City | Urban environment | 4 |

| ctv | Cable TV | 0 |

| Doublebed | Double beds | 5 |

| firep | Fireplace | 0 |

| hike | Hiking | 0 |

| hometh | Home Theater | 0 |

| Jacu | Jacuzzi | 0 |

| Nature | Nature environment | 4 |

| night | Nightlife options close by | 3 |

| poolt | Pool table | 0 |

| quiet | Quiet neighborhood | 3 |

| sky | Skying options | 0 |

| swim | Swimming pool | 0 |

| Twinbed | Twin beds | 5 |

| view | Nice view | 0 |

| wifi | Wifi Internet | 0 |

+-------------+----------------------------------------+-----------+

29 rows in set (0.00 sec)

We will store our properties to rent in this table:

CREATE TABLE `PropertiesToRent` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `owner` int unsigned NOT NULL,
  `address` varchar(100) DEFAULT NULL,
  `tags` text,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `idxFtTags` (`tags`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

CREATE TABLE `PropertiesToRent` (

`id` int unsigned NOT NULL AUTO_INCREMENT,

`owner` int unsigned NOT NULL,

`address` varchar(100) DEFAULT NULL,

`tags` text,

PRIMARY KEY (`id`),

FULLTEXT KEY `idxFtTags` (`tags`)

) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

In order to help manipulate the tags in a consistent way, we’ll create four handy SQL functions, hasTag, AddTag, AddTags, and removeTag. The SQL code for these functions is available on my GitHub. With these functions, let’s add a few properties.

mysql>; insert into PropertiesToRent (id,owner,address,tags) values (1,1,'1 Québec street',addTags('4bedr,2bath,wifi,aircond,swim',''));
Query OK, 1 row affected (0.00 sec)

mysql>; insert into PropertiesToRent (id,owner,address,tags) values (2,1,'1 Ontario street',addTags('3bedr,2bath,wifi,Jacu,poolt',''));
Query OK, 1 row affected (0.00 sec)

mysql>; insert into PropertiesToRent (id,owner,address,tags) values (3,2,'1 Manitoba street',addTags('3bedr,3bath,sky,nature,ctv,Jacu',''));
Query OK, 1 row affected (0.00 sec)

mysql>; select * from PropertiesToRent;
+----+-------+-------------------+---------------------------------+
| id | owner | address           | tags                            |
+----+-------+-------------------+---------------------------------+
|  1 |     1 | 1 Québec street   | 4bedr,2bath,wifi,aircond,swim   |
|  2 |     1 | 1 Ontario street  | 3bedr,2bath,wifi,Jacu,poolt     |
|  3 |     2 | 1 Manitoba street | 3bedr,3bath,sky,nature,ctv,Jacu |
+----+-------+-------------------+---------------------------------+
3 rows in set (0.00 sec)

mysql>; insert into PropertiesToRent (id,owner,address,tags) values (1,1,'1 Québec street',addTags('4bedr,2bath,wifi,aircond,swim',''));

Query OK, 1 row affected (0.00 sec)

mysql>; insert into PropertiesToRent (id,owner,address,tags) values (2,1,'1 Ontario street',addTags('3bedr,2bath,wifi,Jacu,poolt',''));

Query OK, 1 row affected (0.00 sec)

mysql>; insert into PropertiesToRent (id,owner,address,tags) values (3,2,'1 Manitoba street',addTags('3bedr,3bath,sky,nature,ctv,Jacu',''));

Query OK, 1 row affected (0.00 sec)

mysql>; select * from PropertiesToRent;

+----+-------+-------------------+---------------------------------+

+----+-------+-------------------+---------------------------------+

| 1 | 1 | 1 Québec street | 4bedr,2bath,wifi,aircond,swim |

| 2 | 1 | 1 Ontario street | 3bedr,2bath,wifi,Jacu,poolt |

| 3 | 2 | 1 Manitoba street | 3bedr,3bath,sky,nature,ctv,Jacu |

+----+-------+-------------------+---------------------------------+

3 rows in set (0.00 sec)

Now, we can use the fulltext index on tags to search the table. Let’s say you want to find out which properties have a Jacuzzi:

mysql> select * from PropertiesToRent where match (tags) against ('+Jacu' in Boolean mode);
+----+-------+-------------------+---------------------------------+
| id | owner | address           | tags                            |
+----+-------+-------------------+---------------------------------+
|  2 |     1 | 1 Ontario street  | 3bedr,2bath,wifi,Jacu,poolt     |
|  3 |     2 | 1 Manitoba street | 3bedr,3bath,sky,nature,ctv,Jacu |
+----+-------+-------------------+---------------------------------+
2 rows in set (0.00 sec)

mysql> select * from PropertiesToRent where match (tags) against ('+Jacu' in Boolean mode);

+----+-------+-------------------+---------------------------------+

+----+-------+-------------------+---------------------------------+

| 2 | 1 | 1 Ontario street | 3bedr,2bath,wifi,Jacu,poolt |

| 3 | 2 | 1 Manitoba street | 3bedr,3bath,sky,nature,ctv,Jacu |

+----+-------+-------------------+---------------------------------+

2 rows in set (0.00 sec)

Do you also need to have a wifi connection?

mysql> select * from PropertiesToRent where match (tags) against ('+Jacu +wifi' in Boolean mode);
+----+-------+------------------+-----------------------------+
| id | owner | address          | tags                        |
+----+-------+------------------+-----------------------------+
|  2 |     1 | 1 Ontario street | 3bedr,2bath,wifi,Jacu,poolt |
+----+-------+------------------+-----------------------------+
1 row in set (0.01 sec)

mysql> select * from PropertiesToRent where match (tags) against ('+Jacu +wifi' in Boolean mode);

+----+-------+------------------+-----------------------------+

+----+-------+------------------+-----------------------------+

| 2 | 1 | 1 Ontario street | 3bedr,2bath,wifi,Jacu,poolt |

+----+-------+------------------+-----------------------------+

1 row in set (0.01 sec)

Then you realize a swimming pool could replace the Jacuzzi:

mysql> select * from PropertiesToRent where match (tags) against ('+(Jacu swim) +wifi' in Boolean mode);
+----+-------+------------------+-------------------------------+
| id | owner | address          | tags                          |
+----+-------+------------------+-------------------------------+
|  1 |     1 | 1 Québec street  | 4bedr,2bath,wifi,aircond,swim |
|  2 |     1 | 1 Ontario street | 3bedr,2bath,wifi,Jacu,poolt   |
+----+-------+------------------+-------------------------------+
2 rows in set (0.00 sec)

mysql> select * from PropertiesToRent where match (tags) against ('+(Jacu swim) +wifi' in Boolean mode);

+----+-------+------------------+-------------------------------+

+----+-------+------------------+-------------------------------+

| 1 | 1 | 1 Québec street | 4bedr,2bath,wifi,aircond,swim |

| 2 | 1 | 1 Ontario street | 3bedr,2bath,wifi,Jacu,poolt |

+----+-------+------------------+-------------------------------+

2 rows in set (0.00 sec)

The weakness of this approach is that some attributes include other ones. For example, if you want a Property with at least three bedrooms, you’ll need to specify the additional attributes to include higher counts of bedrooms:

mysql> select * from PropertiesToRent where match (tags) against ('+(3bedr 4bedr) +(Jacu swim) +wifi' in Boolean mode);
+----+-------+------------------+-------------------------------+
| id | owner | address          | tags                          |
+----+-------+------------------+-------------------------------+
|  1 |     1   | 1 Québec street  | 4bedr,2bath,wifi,aircond,swim |
|  2 |     1   | 1 Ontario street | 3bedr,2bath,wifi,Jacu,poolt   |
+----+-------+------------------+-------------------------------+
2 rows in set (0.00 sec)

mysql> select * from PropertiesToRent where match (tags) against ('+(3bedr 4bedr) +(Jacu swim) +wifi' in Boolean mode);

+----+-------+------------------+-------------------------------+

+----+-------+------------------+-------------------------------+

| 1 | 1 | 1 Québec street | 4bedr,2bath,wifi,aircond,swim |

| 2 | 1 | 1 Ontario street | 3bedr,2bath,wifi,Jacu,poolt |

+----+-------+------------------+-------------------------------+

2 rows in set (0.00 sec)

Other implementations

My above example is fairly simple. As mentioned previously, there are two other implementations I can think of: wide and entity. Let’s look at what would be the equivalent queries of these alternate implementations.

Wide (many columns):

select * from PropertiesToRent where bedr >= 3 and (Jacuzzi = ‘y’ or swimingpool = ‘y’) and wifi = ‘y’;

1	select * from PropertiesToRent where bedr >= 3 and (Jacuzzi = ‘y’ or swimingpool = ‘y’) and wifi = ‘y’;

Entity (vertical):

select * from PropertiesToRent
where id in (select propertyId from PropertyAttributes where AttributeName = ‘Bedroom’ and iValue > 3)
and id in (select propertyId from PropertyAttributes where AttributeName in  (‘Jaccuzzi’,’Swimminpool’) and bValue = ‘y’)
and id in (select propertyId from PropertyAttributes where AttributeName in  (’wifi’) and bValue = ‘y’);

select * from PropertiesToRent

where id in (select propertyId from PropertyAttributes where AttributeName = ‘Bedroom’ and iValue > 3)

and id in (select propertyId from PropertyAttributes where AttributeName in (‘Jaccuzzi’,’Swimminpool’) and bValue = ‘y’)

and id in (select propertyId from PropertyAttributes where AttributeName in (’wifi’) and bValue = ‘y’);

Both of these implementations have their own tuning challenges. The wide implementation is easy to deploy but hard to maintain and nearly impossible to index properly given the large number of columns that can be part of the where clause. You can easily expand the Entity implementation but then, the PropertyAttributes table will grow quite large. You must also have one column for every supported data type.

Conclusion

In this post, we used fulltext indexes to solve issues with schema mapping objects with a large number of attributes. This approach, using text tags to map the attributes is quite efficient and elegant. It is much easier to maintain than a wide approach mapping attributes to columns. It is also more efficient than the entity value schema where every attribute is a row from a table.

Fulltext indexes are less common than the b-tree indexes around which InnoDB is built. With the GIS R-tree indexes, there are among the less used index types of MySQL. Yet, they can be useful out of their intended initial scope, as we demonstrated in this post.

In a future post, I’d like to explore the use of R-tree indexes to solve another common SQL problem, the double range condition. If you are interested in unorthodox usages of database features, stay posted.

0 Comments

Inline Feedbacks

View all comments

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Tags and FullText indexes in MySQL

What is a Tag-based schema?

Example: House rental

Example: Tree classification

Tags with a fulltext index

Other implementations

Wide (many columns):

Entity (vertical):

Conclusion

Related

Related Blog Articles

RECOMMENDED ARTICLES

New Valkey Packages by Percona

Can We Set up a Replicate Filter Within the Percona XtraDB Cluster?

Valkey/Redis: Not-So-Good Practices

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Tags and FullText indexes in MySQL

What is a Tag-based schema?

Example: House rental

Example: Tree classification

Tags with a fulltext index

Other implementations

Wide (many columns):

Entity (vertical):

Conclusion

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

New Valkey Packages by Percona

Can We Set up a Replicate Filter Within the Percona XtraDB Cluster?

Valkey/Redis: Not-So-Good Practices

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation