Comments on: Storing UUID and Generated Columns

By: Mario Bittencourt

Mario Bittencourt — Tue, 11 Sep 2018 23:00:49 +0000

I wonder if the performance hit for UUID4 would be the same.

By: Rick James

Rick James — Thu, 04 May 2017 17:10:21 +0000

In reply to Nils. I agree; "CHAR(36) CHARACTER SET utf8" always consumes 108 bytes (in older MySQL versions), so it is really bad to use utf8 instead of ascii (or latin1 or binary).

By: Nils

Nils — Wed, 03 May 2017 22:22:12 +0000

Now why would you use utf8 to encode what is basically a few hexadecimal numbers and minus signs?

By: David Ducos

David Ducos — Wed, 03 May 2017 21:14:39 +0000

In reply to Rick James. Hi Rick, You are right that in essence uuid is a 'random' string that on a large table the performance will drop. So, the base case is just to show how much my server can insert in the best scenario of the worse case. However, in the second case it will store it in order avoiding this issue. The second case is where I wanted to compare different approaches. Using another type of UUID version, for sure, is another valid approach that I thought, but I didn't add it, as at the end, it won't add any value to the comparison with generated columns. From my understanding, UUID is packaging data that at the end is unique, but also valuable! What I wanted to show is that decomposing this valuable info, helps us to perform queries over internal fields, and you will still have the UUID as a generated column. Some queries could be: - Which devices (MAC) sent events at "2017-01-05 09:00:00"? select mac from uuid_generated where timestamp = "2017-01-05 09:00:00" group by mac; - When this particular device send the last event? select max(timestamp) from uuid_generated where mac=conv("6d5178b4e60c",16,10); etc... MySQL 8.0 will have built-in functions but that will be useful when you don't care how the UUID is made of.

By: Rick James

Rick James — Wed, 03 May 2017 19:24:01 +0000

How big was innodb_buffer_pool_size? How big was the table? If the table was smaller than the buffer_pool, you have not yet gotten to the interesting use case.

As far as inserting into a BTree index, UUID() is essentially a ‘random’ string. Once the uuid index (the PRIMARY KEY, in your first test) becomes bigger than the buffer_pool, the ‘next’ INSERT has a chance of causing a cache miss. Once the index is, say, 5 times as big as the buffer_pool, then 80% of INSERTs will cause cache-misses! This will be dramatic on performance.

If your goal is to rearrange the UUID value so that the “time” is the first part of the string, then INSERT performance will be similar to AUTO_INCREMENT or TIMESTAMP — ‘all’ new rows go into the ‘last’ block of the BTree. The buffer_pool_size no longer matters.

But… Only Type 1 UUIDs have a “time” in them. So only those can be “fixed” that way. Note: MySQL’s UUID is Type 1, but many clients use some other Type.

My blog on such: http://mysql.rjweb.org/doc.php/uuid . Also, MySQL version 8.0 has builtin functions to facilitate rearranging UUIDs and putting them into BINARY(16) to save a significant amount of space (hence speed).