Comments on: Tuning InnoDB Primary Keys https://www.percona.com/blog/tuning-innodb-primary-keys/ Thu, 11 Apr 2024 22:28:50 +0000
By: Yves Trudeau https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10970826 Fri, 24 May 2019 14:22:50 +0000 https://www.percona.com/blog/?p=51380#comment-10970826 @H, yes, it would make perfect sense to prefix the primary key with doc_account. You need an additional index on the auto_increment doc_id column, though. The secondary index is not bad at all: remember that InnoDB appends the primary key columns to every secondary index. Naively, this gives (doc_account, doc_date, doc_account, doc_id), but InnoDB is wise enough to avoid repeating columns, so you really have (doc_account, doc_date, doc_id) for the secondary key.
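Assembling the advice in this reply with the schema from the question below, the result would look roughly like the following sketch (the ENGINE clause is an assumption; the extra index on doc_id is there because InnoDB requires an AUTO_INCREMENT column to be the first column of some index):

```sql
CREATE TABLE documents (
  doc_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  doc_account MEDIUMINT UNSIGNED NOT NULL,
  doc_title VARCHAR(128) NOT NULL,
  doc_date DATE NOT NULL,
  PRIMARY KEY (doc_account, doc_id),  -- tenant-prefixed clustered key
  INDEX (doc_id),                     -- satisfies the AUTO_INCREMENT requirement
  INDEX (doc_account, doc_date)       -- stored as (doc_account, doc_date, doc_id)
) ENGINE=InnoDB;
```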

By: H https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10970821 Thu, 23 May 2019 11:53:43 +0000 https://www.percona.com/blog/?p=51380#comment-10970821 Yves, thanks for this very helpful article! I have a question regarding the concepts you explained: when storing e.g. document data in an InnoDB table in a multi-tenant database used for multiple customer accounts, would it make sense / be wise to prefix the primary key with the account ID? (The access pattern would be to retrieve e.g. the newest documents for a specific account.)

So instead of

CREATE TABLE documents (
  doc_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  doc_account MEDIUMINT UNSIGNED NOT NULL,
  doc_title VARCHAR(128) NOT NULL,
  doc_date DATE NOT NULL,

  PRIMARY KEY (doc_id),
  INDEX (doc_account, doc_date)
)

it would be better to do this

CREATE TABLE documents (
  doc_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  doc_account MEDIUMINT UNSIGNED NOT NULL,
  doc_title VARCHAR(128) NOT NULL,
  doc_date DATE NOT NULL,

  PRIMARY KEY (doc_account, doc_id),
  INDEX (doc_account, doc_date)
)

? Would I leave the secondary index just as it is? Any help is highly appreciated!

By: Yves Trudeau https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10969890 Thu, 01 Nov 2018 13:30:47 +0000 https://www.percona.com/blog/?p=51380#comment-10969890 @John

If you search mainly by timestamp, it would be logical to define the primary key as (timestamp, id), with a secondary key on (id) to support the auto_increment column. Partitioning by range of timestamp could still be useful to prune older data.
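A sketch of that layout, with hypothetical table and column names (RANGE partitioning on the timestamp is legal here because MySQL requires every unique key, including the primary key, to contain the partitioning column, and ts leads the PK):

```sql
CREATE TABLE events (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  payload VARCHAR(255) NOT NULL,
  PRIMARY KEY (ts, id),  -- searches are mainly by timestamp
  KEY (id)               -- non-unique index supporting the AUTO_INCREMENT column
) ENGINE=InnoDB
PARTITION BY RANGE (UNIX_TIMESTAMP(ts)) (
  PARTITION p2018 VALUES LESS THAN (UNIX_TIMESTAMP('2019-01-01 00:00:00')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Pruning older data then becomes a cheap metadata operation:
-- ALTER TABLE events DROP PARTITION p2018;
```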

By: John Lee https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10969886 Thu, 01 Nov 2018 00:24:12 +0000 https://www.percona.com/blog/?p=51380#comment-10969886 Very informative. What is the relationship between the primary key and the partition key, and what is their impact on performance? For example, I’d like to use ‘id’ as the primary key because the values are unique. id is also generated incrementally, so storing rows together generally groups items that are relatively close in incoming timestamp. However, searches are usually done by timestamp, since I do not know the id in advance. I am thinking of a primary key such as (id, timestamp). Since the tables are usually well over 100G, I would also like to partition, but on timestamp.

By: SCG82 https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10969605 Sat, 25 Aug 2018 20:05:36 +0000 https://www.percona.com/blog/?p=51380#comment-10969605 Because 93% of queries are by user_id, your schema would perform very poorly for those queries.

By: Yves Trudeau https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10969506 Wed, 01 Aug 2018 13:42:20 +0000 https://www.percona.com/blog/?p=51380#comment-10969506 Only because removing it means modifying the application: some ORMs are stubborn in their insistence on having an “id” column. If the application can be modified to use the new primary key, then yes, the id column and the idx_id index can be removed.
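For what it’s worth, if the application can be changed, both can go in a single statement; a sketch against the friends table discussed here:

```sql
-- The table already has PRIMARY KEY (user_id, friend_user_id), so the
-- surrogate id column and its idx_id index are redundant.
ALTER TABLE friends
  DROP KEY idx_id,
  DROP COLUMN id;
```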

By: Sids https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10969505 Wed, 01 Aug 2018 13:28:06 +0000 https://www.percona.com/blog/?p=51380#comment-10969505 Thanks for such an informative post.

Just a random question: is there any particular reason you’ve kept the column (id int(10) unsigned NOT NULL AUTO_INCREMENT) and the index (KEY idx_id (id))? Since you’ve already set the new PRIMARY KEY (user_id, friend_user_id), do you still need the id column and the index on it?

By: Sids https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10969492 Tue, 31 Jul 2018 03:31:33 +0000 https://www.percona.com/blog/?p=51380#comment-10969492 Great article on tuning methodology.

Just a random question: is there any specific need for the id column “id int(10) unsigned NOT NULL AUTO_INCREMENT” after you’ve shifted the primary key to “PRIMARY KEY (user_id,friend_user_id)”?

Is there any use case for keeping
the column id int(10) unsigned NOT NULL AUTO_INCREMENT and
the key idx_id (id)?

By: Yves Trudeau https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10969483 Sun, 29 Jul 2018 21:58:43 +0000 https://www.percona.com/blog/?p=51380#comment-10969483 You basically have a covering index; it will work great for searches by friend_user_id, and the InnoDB change buffer will limit the impact of the index maintenance.

I see a few issues, though:

– You cannot search by user_id.
– You don’t have uniqueness for the pair (user_id, friend_user_id).
– Query forms like “select friend_user_id from friends where user_id=1234 and active = 1;” are very bad because the index doesn’t provide “active”. The primary key b-tree must be queried.

My table works fine in all cases.

By: Jose María Zaragoza https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10969482 Sun, 29 Jul 2018 16:40:58 +0000 https://www.percona.com/blog/?p=51380#comment-10969482 Thanks for your post

Why not create the ‘friends’ table like this:

CREATE TABLE friends (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  user_id int(10) unsigned NOT NULL,
  friend_user_id int(10) unsigned NOT NULL,
  created timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  active tinyint(4) NOT NULL DEFAULT '1',
  PRIMARY KEY (id),
  KEY idx_friend (friend_user_id, user_id)
) ENGINE=InnoDB AUTO_INCREMENT=144002 DEFAULT CHARSET=latin1

– The PK is auto-increment (so inserts are ordered; better for selecting the write page).
– There is only one secondary index. It supports searches by (friend_user_id) and by (friend_user_id, user_id).

Regards

By: Yves Trudeau https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10969479 Fri, 27 Jul 2018 18:04:04 +0000 https://www.percona.com/blog/?p=51380#comment-10969479 Varchars are slow to compare: the comparison is done one byte at a time and uses the charset and collation code. Integer types, other than mediumint, are compared by the CPU directly. MD5s and UUIDs are huge and random. All the secondary keys will be inflated by the huge PK. Maybe it is a good idea for developers, but it is bad for InnoDB. When I have a customer stuck with such PKs, we have to struggle to improve performance. I have an upcoming blog post on that topic.
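To make the size argument concrete, here is a hypothetical side-by-side (table and column names are made up): every secondary index entry carries a copy of the primary key, so the PK size is paid once per secondary index, per row.

```sql
-- 4-byte integer PK: each idx_email entry carries 4 extra bytes,
-- and inserts append to the rightmost page of the clustered index.
CREATE TABLE users_int (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  email VARCHAR(128) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_email (email)
) ENGINE=InnoDB;

-- CHAR(36) UUID PK: each idx_email entry carries 36 extra bytes
-- (with a single-byte charset), and inserts land on random pages.
CREATE TABLE users_uuid (
  id CHAR(36) NOT NULL,  -- e.g. the output of UUID()
  email VARCHAR(128) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_email (email)
) ENGINE=InnoDB;
```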

By: Stofa. https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10969478 Fri, 27 Jul 2018 16:15:59 +0000 https://www.percona.com/blog/?p=51380#comment-10969478 Hi,

This question may seem stupid.
What about the choice of using a varchar column as a PRIMARY KEY? Is there any limit on the maximum length of a varchar column used as the PRIMARY KEY of an InnoDB table?
Someone has suggested, as an alternative to int or bigint columns, using an MD5 hash or a UUID as the PRIMARY KEY for InnoDB.

Stofa.

Thanks.

By: Yves Trudeau https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10969476 Fri, 27 Jul 2018 12:27:16 +0000 https://www.percona.com/blog/?p=51380#comment-10969476 From https://dev.mysql.com/doc/refman/5.7/en/innodb-index-types.html

“The row ID is a 6-byte field that increases monotonically as new rows are inserted.”

By: 肖建平 https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10969474 Fri, 27 Jul 2018 02:04:38 +0000 https://www.percona.com/blog/?p=51380#comment-10969474 If there is no primary key for a table, InnoDB adds a hidden auto-incremented 6 bytes counter to the table and use that hidden counter as the primary key.
I remember it being 8 bytes.

By: Scott Klasing https://www.percona.com/blog/tuning-innodb-primary-keys/#comment-10969472 Thu, 26 Jul 2018 19:08:29 +0000 https://www.percona.com/blog/?p=51380#comment-10969472 Yves, great article on a subject so many consider only after they have placed a table in production. My only complaint is that you have not written anything on how to choose the best primary key, which requires knowledge of the data.

If more MySQL or database technicians were to focus on this during the data modeling phase, they would have far fewer problems. There are huge gains for those who best understand database modeling, normalization of data, and the relational model itself as invented by Ted Codd around 1970. The model is straight set theory and is easily proven mathematically to work well when adhered to.

I have to admit I am very surprised the database industry does not require you to specify a PK, rather than defaulting to an internal, hidden counter that is useless to users. Force the DBA/programmer/user to think about the best access path to the data. Ted Codd (inventor of the relational model) used to say that if you have not identified a table’s PK, then you have not identified the table’s purpose, with which I fully agree. The proof that PKs should be mandated: if you have a table without a PK, how do your batch programs identify which row should be updated? You might be able to view the data and make that determination, but a program cannot.

As most already know, a primary key must be unique and cannot contain nulls, otherwise you have not identified the row. So, to start, you first identify all possible candidate keys, i.e., those that are unique and don’t allow nulls. Second, choose the best of the candidate keys.

An example best explains this. In most MySQL shops you usually see lazy choices of auto-incremented PKs proliferating through the design, and at the same time you see common secondary indexes that should have, or could have, been the primary key. The point is that an improper choice of PK always results in additional unnecessary secondary indexes, which further slow inserts/updates/deletes. The pattern usually shows in the WHERE clauses: since the application did not know the value of the PK but did know a name or a description, it searched by the secondary key to then identify the PK. Note the similar pattern where MySQL includes the PK in all secondary indexes. Quite often the main secondary index, assuming it is unique, outweighs the auto-increment key that was not thought through.

Then question the impact of single-attribute PKs like auto-increment versus a key that better represents the business but is perhaps a two-attribute PK. If your model’s table hierarchy is only 2 to 3 levels deep, then a multi-attribute key is manageable. If it is much deeper, writing code becomes problematic, since the lower levels of the hierarchy have to add an additional attribute that represents the next level. Also question the order of the attributes in the PK, since your WHERE clauses must reference the leading attributes to even use the index. Also question the distinct cardinality of an attribute, because the higher its distinct cardinality, the better an attribute is in an index. For example, if an index’s first attribute has low cardinality (few distinct values), then the index has to read/lock more similar rows to answer the question.

I believe the above process is why DBAs, developers, and implementors get lazy: they don’t understand their data as well as they should. If they did, the above becomes a very easy process for identifying the best PK.

In summary, I always inform our teams: choose the candidate key that best represents the business, the one you will search by, and most likely order by and group by, since the PK is typically the clustered, already physically sorted index. Finally, for shops with huge write volumes, as in hundreds of thousands of queries per second and more, always, always, always UPDATE via the PK; never allow updates or deletes that are not PK based. Even if your volumes are lower, “design for millions of transactions per second” by only taking one-row, PK-based update/delete locks as opposed to locking many rows. If you don’t know the PK to update or delete by, then read in non-locking mode to obtain the list of PKs and then update or delete by the list.
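The read-then-update-by-PK pattern described above can be sketched like this (hypothetical table, column, and status values):

```sql
-- Bad: locks every row the scan touches while the UPDATE runs.
-- UPDATE orders SET status = 'archived' WHERE created < '2018-01-01';

-- Better: a non-locking read collects the primary keys first...
SELECT order_id FROM orders WHERE created < '2018-01-01';

-- ...then the application updates by PK, in small batches,
-- taking only single-row locks:
UPDATE orders SET status = 'archived' WHERE order_id IN (1001, 1002, 1003);
```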

I have used the above techniques for the last 37-plus years, and they have never failed to produce significantly improved performance results. The above logic applies to every database I have ever worked on (IDMS, IMS, DB2, Oracle, MySQL, ObjectStar, CouchDB, MongoDB, other NoSQL DBs, and several cloud DBs), not just InnoDB tables.

Loved the article, keep up the good work.
