Comments on: To UUID or not to UUID ? https://www.percona.com/blog/to-uuid-or-not-to-uuid/ Mon, 26 May 2014 03:05:52 +0000 hourly 1 https://wordpress.org/?v=6.5.2 By: Jamiat Abdillah https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-6884560 Mon, 26 May 2014 03:05:52 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-6884560 Thanks, Good Article

]]>
By: orubel https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-6662989 Thu, 15 May 2014 20:52:25 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-6662989 @bar4ka Composite keys are not a good idea as they are very slow. Almost all DBA’s will tell you to avoid composite keys.

]]>
By: bar4ka https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-831000 Thu, 13 Oct 2011 13:05:32 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-831000 UUID is 2 integers long, store it as a composite key of two ints, take the upper part from UUID to the first column and the lower part for the second composite column.

]]>
By: Mike O. https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-794716 Thu, 20 Jan 2011 23:18:32 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-794716 Why make things so complicated? Innodb and MyISAM allows for a compound primary key. Why can’t you simply add a Server_Id column to each table?

CREATE TABLE cust (
Cust_Id int(10) unsigned NOT NULL AUTO_INCREMENT,
Server_Id smallint(6) NOT NULL DEFAULT ‘1’,
First_Name char(15) DEFAULT NULL,
Last_Name char(20) DEFAULT NULL,
PRIMARY KEY (Cust_Id,Server_Id)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1

When each database is defined, run an app that alters the default value of the Server_Id of each table. (You probably also want a ServerId table with a row set to the same value). When the table data gets merged from other servers, you still have unique rows and you know which server created the row. The user app would identify each row as ccccc.sss where ccccc is the Cust_Id and sss is the server id. So the user sees customer id of 1234.21 and the program behind the scene just splits it into Cust_Id=1234 and Server_Id=21.

It is fast and it is readable. Of course it all depends on the administrator setting up unique Server_Id values for each server but if they are on a network you can write an app to do it and verify it.

Mike

]]>
By: Daevid Vincent https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-786218 Fri, 10 Dec 2010 21:05:33 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-786218 Another thing I suspect may happen if you use UUIDs as PK is that every insert would cause mySQL to re-index since they’re not sequential by default as a normal numeric PK is. I think you’d take a performance hit on all INSERT operations.

]]>
By: Josh Smith https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-769681 Tue, 20 Jul 2010 08:26:22 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-769681 If you really need to find something between Jens Wagner’s solution Jay Paroline’s in PHP/MySQL, I suggest this approach.

First simply SELECT REVERSE(UNHEX(REPLACE(UUID(),’-‘,”))); this will give us the UNHEXed and REVERSEd UUID in the way that Jens Wagner above suggests. Then you can place this value into your database as you please.

Then when you retrieve the UUID from the database later, you call SELECT HEX(REVERSE(‘{$uuid}’)); And to make the UUID look more “natural,” simply run strtolower() to remove the caps.

]]>
By: Jens Wagner https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-766313 Tue, 08 Jun 2010 22:05:34 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-766313 As I understood, the last part of UUIDs changes least on one machine, e.g:

1388681F-7348-11DF-83CD-D0B6181E9833
13CF3551-7348-11DF-8556-D1B6181E9833
13F8B66F-7348-11DF-BBC6-D2B6181E9833
141E8E03-7348-11DF-BFCE-D3B6181E9833
1438CCD7-7348-11DF-807F-D4B6181E9833

To improve InnoDB insert performance, primary keys should be inserted with least changing bits first, correct?

I wrote two functions to encode and decode UUIDs, please find them below.

Best,
-jens

DELIMITER $$

CREATE FUNCTION ENCODE_UUID (uuid CHAR(36))
RETURNS BINARY(16)
DETERMINISTIC
BEGIN
RETURN REVERSE(UNHEX(REPLACE(uuid, ‘-‘,”)));
END
$$

DELIMITER $$

CREATE FUNCTION DECODE_UUID (uuid BINARY(16))
RETURNS CHAR(36)
DETERMINISTIC
BEGIN
RETURN concat(
HEX(LEFT(REVERSE(uuid),4)),’-‘,
HEX(MID(REVERSE(uuid),5,2)),’-‘,
HEX(MID(REVERSE(uuid),7,2)),’-‘,
HEX(MID(REVERSE(uuid),9,2)),’-‘,
HEX(RIGHT(REVERSE(uuid),6))
);
END
$$

SELECT DECODE_UUID(ENCODE_UUID(‘0935C9C3-7345-11DF-BADD-A8B4181E9833’))

]]>
By: David Tang https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-750948 Wed, 21 Apr 2010 03:35:54 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-750948 I am building an application that can function when network is not available, i.e. off line. I would then replicate the record back to central computer. The problem I face is the unique number to assign to the row. If I use int, I need to allocate a section of number of each of the instance of offline application. I am thinking UUID to solve the problem. Is this the right choice to solve this problem?

]]>
By: Jay Paroline https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-750501 Tue, 20 Apr 2010 08:29:52 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-750501 @me, my thoughts exactly. In fact you can make them even smaller if you strip out the information you already “know” when you store it. Portions of the UUID never change when generated on the same server, so if you have some other way of uniquely identifying the server you can toss that out when you store it and just add it back in when it’s needed. That brings the size down to a mere 10 bytes, which isn’t bad at all.

BTW in PHP that looks like this:
$uuid = substr($uuid, 0, 8) . substr($uuid, 9, 4) . substr($uuid, 14, 4) . substr($uuid, 19, 4);
where $uuid is the result of doing a SELECT UUID() in MySQL.

]]>
By: me https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-706090 Tue, 05 Jan 2010 17:50:33 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-706090 Oh boy. UUIDs are 128 bits == 16 bytes.
Store them in a BINARY(16) column.
Don’t mess around with CHAR(32) or even VARCHAR(36).

]]>
By: Josh P https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-567152 Tue, 26 May 2009 06:51:31 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-567152 Here is the solution that I’m considering for my project:

1. Concatenate the numeric user id (1-9 digits) and the UNIX timestamp (10 digits)
–> [i.e. “3333” . “1111111111” = “33331111111111”] ***GUARANTEED UNIQUE.

2. Convert to base-36
–> [ “33331111111111” –> “a3gsei3”] — (if I had 100 million users, the longest ID would still only be 10 characters)

3. Store as binary in CHAR().

My best guess is that this strategy is a win-win (for my situation) over GUIDs and INTs. I have guaranteed-unique ids (since they are tied to the user id and the timestamp) that are available without querying the database. PLUS, they are SIGNIFICANTLY shorter than GUIDs (they are 1/3 the size).

Admittedly, I’m relatively new to highly-scalable database architecture, so I’d appreciate any thoughts or feedback. Thanks.

]]>
By: Rob https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-525902 Tue, 31 Mar 2009 21:16:51 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-525902 My biggest gripe about UUID() is that it doesn’t generate random (or at least pseudo-random) values. A reason you would want to use UUID/GUID over auto incrementing integers is when you’re synchronizing data from multiple dispersant sources. If you’re constrained by incremental integers for primary keys you’re constantly updating IDs on record inserts during synchronization. If the UUID() values were random (like a Guid in .NET) there should be very little chance of collisions.

I would also like to see a UUID data type (which would use 16-byte binary representation instead of VARCHAR(36))

]]>
By: xli https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-366358 Mon, 27 Oct 2008 16:46:50 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-366358 Hi, Peter

I noticed your benchmark which shows 200 times performance differences between auto-increment and UUID(). I’m wondering if you did the same thing for InnoDB as well. We are using MySQL 5.0 and InnoDB, we got badly lock conflict when inserting rows into InnoDB tables with auto-increment column. So, we are considering to switch to UUID() as a PK. I did a very simple testing: wrote a stored procedure, which has a loop to insert a row into a table. the results for InnoDB are shown as below: inserting 100,000 rows into InnoDB tables, for a table with auto-increment, it tooks 254 sec; for a table without auto-increment but use UUID(), it took 263 sec; the same testing for MyISAM tables, I got 54 sec vs. 68 sec. the 54 sec is similar to what you got. What I did wrong?

]]>
By: Al T. https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-278544 Tue, 15 Apr 2008 17:32:00 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-278544 I have seen those problems with UUID’s in MySQL when stored as text. Ideally, MySQL would have a UUID column type to store the values as binary rather than strings but convert to string anytime the row is returned. The ability to insert an ID in hex text and convert it to binary would be a must too. That’s the biggest problem with using binary fields: you can’t read them or copy and paste them without conversion. That change would provide better performance in terms of storage and indexing (a 16-byte column instead of 36).

In an effort to improve the situation, I created a simple UUID class capable of generating random UUID’s with an option to store them in Base64 rather than hex. Since the length of the UUID is always constant, I was able to trim off the extra = in the Base64 conversion and come up with a case-sensitive 22-byte UUID representation (VTIW7xOgReOGrL3vMRjm4Q, for example). The performance increase was enormous, and the overhead is much smaller (22 bytes vs. 16) and you are able to convert to binary and hex at any time.

Later, as I thought more of the problem, I realized the Base64 encoding was inefficient for a 128-bit number. In order to get maximum efficiency out of Base64, the number of bits needs to be divisible by 6. So I created a new identifier that was only 72 bits. (Yes, the collision probability goes up, but it is still one in 4.7 * 10^21.) These UID’s (as I call them) only take 12 bytes to store in a binary column and strike a very good balance between speed and uniqueness. They can also be translated to GUID’s (########-0000-0000-0000-00##########) and back when needed. (If 72-bits is not enough, use 96 bits to make a 16-byte UID).

]]>
By: cybermonk https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-254921 Thu, 20 Mar 2008 17:54:39 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-254921 Well it’s too bad the comparison wasn’t done using the UUID in binary format, autogenerating the GUID on the client side using Jimmy Nilsson’s GUID.COMB. Can you do that comparison peter? NHibnerate has an implementation of GUID.COMB.

]]>
By: Anthony Mathews https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-218410 Sat, 15 Dec 2007 20:09:55 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-218410 It appears that the benchmark done in this article was storing the UUID as a ascii representation of a binary number when in fact the UUID should have bin stored in binary form. This is equivalent to searching using a 4 byte integer for the auto_increment primary key and then using a persons First, Middle and Last name to search for the UUID implementation.

Write a function that converts the value returned by the UUID function to binary and store it in a binary(128) column. Write yourself another function that will cast it back to a characters string with hyphenation if you need to display it.

However, the primary use for UUIDs or GUIDs is data portability, not speed for searching. If you have worked for large companies where you have redundant data stored in many locations you have to manage this primary key much closer and have the ability to generate something that you know will be unique across the company. Integers and auto increment will not cut it.

]]>
By: Peter Zaitsev https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-106459 Wed, 11 Apr 2007 09:34:29 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-106459 Jacob,

I can tell you what we do for http://www.boardreader.com with some billion rows which need quick retrieval.
The data is partitioned in “table groups” which are mapped to the servers. We use 64bit identifiers with lower byte used to store table group.

Search is done using “Sphinx” search engine and we basically need to find rows by IDs to show result set in most cases.

]]>
By: Jacob https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-106309 Tue, 10 Apr 2007 21:14:09 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-106309 Kevin,

Do you have any good links or more information on how to go about doing “client side sharding of data.” This sounds like what I need to do. I am creating a app that will need quick storage and retrieval of 8-10 Billion rows of data. Splitting up the data onto several servers with a deterministic way of finding it again is what I need.

Any help would be greatly appreciated.

]]>
By: Sachman Bhatti https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-95934 Tue, 27 Mar 2007 01:15:42 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-95934 Ok I misinterpreted that, someone else clarified it for me. I thought there was a way to do O(1) updates….not the mapping 🙂

]]>
By: Sachman Bhatti https://www.percona.com/blog/to-uuid-or-not-to-uuid/#comment-95919 Tue, 27 Mar 2007 00:41:02 +0000 https://www.percona.com/blog/2007/03/13/to-uuid-or-not-to-uuid/#comment-95919 O(1) Updates??? Awesome! What do you mean by hashcode routing though? Does this happen automatically? If I replaced my primary keys with UUID() generated varchars can I just use a normal UPDATE statement and expect the operation to happen in O(1)? If not, can someone please post a link on where one can read more about hashcode routing?

]]>