I occasionally get into light arguments (ahem, healthy discussions) with students about whether or not to use foreign key constraints on InnoDB tables. My standard response has always been: “it depends on how much of a tradeoff you are willing to make for performance. In some situations the cost can be considerable.”
… that’s when they expect me to “come up with some real proof” to show them. I don’t disagree with their logic, or with proof being on their list of demands. I support using data to make decisions.
The problem is that MySQL has (traditionally) been lacking the instrumentation required to make these decisions easy. This is getting better – here is an example we recently added to our InnoDB course:
```sql
CREATE TABLE parent (
 id INT NOT NULL AUTO_INCREMENT,
 bogus_column CHAR(32),
 PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE child (
 id INT NOT NULL AUTO_INCREMENT,
 parent_id INT NOT NULL,
 bogus_column CHAR(32),
 PRIMARY KEY (id),
 KEY (parent_id),
 CONSTRAINT child_ibfk_1 FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE=InnoDB;

INSERT INTO parent (bogus_column) VALUES
 ('aaa'), ('bbb'), ('ccc'), ('ddd'), ('eee');

INSERT INTO child (parent_id, bogus_column) VALUES
 (1, 'aaa'), (2, 'bbb'), (3, 'ccc'), (4, 'ddd'), (5, 'eee');

START TRANSACTION; # session1
START TRANSACTION; # session2

# session1
UPDATE child SET parent_id = 5 WHERE parent_id = 4;

# session2
UPDATE parent SET bogus_column = 'new!' WHERE id = 4;

# session2
UPDATE parent SET bogus_column = 'new!' WHERE id = 5;
```
In the last statement, session2 will block waiting on a lock. Want to know where that lock is? Check information_schema.innodb_locks:
```
mysql> SELECT * FROM information_schema.innodb_locks\G
*************************** 1. row ***************************
    lock_id: 87035:1300:3:6
lock_trx_id: 87035
  lock_mode: X
  lock_type: RECORD
 lock_table: `test`.`parent`
 lock_index: `PRIMARY`
 lock_space: 1300
  lock_page: 3
   lock_rec: 6
  lock_data: 5
*************************** 2. row ***************************
    lock_id: 87034:1300:3:6
lock_trx_id: 87034
  lock_mode: S
  lock_type: RECORD
 lock_table: `test`.`parent`
 lock_index: `PRIMARY`
 lock_space: 1300
  lock_page: 3
   lock_rec: 6
  lock_data: 5
2 rows in set (0.00 sec)
```
The same example without the foreign key constraint does not block on the last statement, and the expected output changes to:
```
mysql> SELECT * FROM information_schema.innodb_locks;
Empty set (0.00 sec)
```
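To try the no-constraint variant without rebuilding the tables, the constraint can simply be dropped in place – a minimal sketch, using the constraint name defined above:

```sql
-- Drop only the foreign key constraint; the secondary index on
-- parent_id created by KEY (parent_id) remains in place.
ALTER TABLE child DROP FOREIGN KEY child_ibfk_1;

-- Re-running the two sessions now leaves session2 unblocked, and
-- information_schema.innodb_locks returns an empty set.
```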
This information_schema table is new in the InnoDB plugin. In earlier releases of MySQL you may be able to get the same data, but not in such an easily digestible form. Instrumentation is the most under-discussed feature in new releases of MySQL and Percona Server. See BUG #53336 for more examples of pure awesomeness.
Since update 1 happens in a different (uncommitted) transaction than updates 2 and 3, they shouldn’t be affecting each other, should they? (Also, are the transactions needed for this to work?)
Anyway, next time those students ask for proof, you’ve got it. 🙂
@Jory – My interpretation of this is: foreign key constraints add additional locking to make sure no changes can happen that would break consistency. I.e., when we update the child to parent_id = 5, the parent row 5 cannot be modified.
You do not need START TRANSACTION for this to occur – it is just the easiest way to make the locking visible. By default, each statement is its own transaction.
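To illustrate that last point, with autocommit left at its default every statement commits implicitly, so the shared lock on the parent row is only held for the duration of the single statement – a sketch:

```sql
-- With the default autocommit = 1, each statement is its own transaction:
SELECT @@autocommit;  -- returns 1 by default

-- Without an explicit START TRANSACTION, this statement still takes the
-- shared lock on parent row 5, but releases it at the implicit commit,
-- so the window in which another session can block is very short.
UPDATE child SET parent_id = 5 WHERE parent_id = 4;
```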
InnoDB’s mantra is to add locks and key checks – which undoubtedly makes it slower – in order to ensure that a statement cannot corrupt the database. My opinion is that those kinds of controls should be mandatory, and the question of performance versus safety should not even be asked.
If you want to go fast, skip the checks and denormalize your data – NoSQL and RAM caches are the solutions for that.
But please, let the database be a database ^_^
Denis, that’s exactly where the discussion starts 😉
Foreign keys can give you some warning of application logic problems. I enjoy using them in development, but there is a tradeoff that can successfully be made:
The parent row with id 5 is locked to protect against changes that could produce inconsistency. But how often would you change the primary key of a parent row? The only situation I can think of is as part of a delete – in which case I would have the application cascade down and delete the child rows itself. InnoDB is being more careful than I need it to be.
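For comparison, if you do want the database rather than the application to handle the cascade, the constraint can declare an ON DELETE action. A sketch of the child table from the example above, redefined with a cascading delete:

```sql
-- Alternative child definition: deleting a parent row now removes
-- its children automatically instead of raising an error.
CREATE TABLE child (
 id INT NOT NULL AUTO_INCREMENT,
 parent_id INT NOT NULL,
 bogus_column CHAR(32),
 PRIMARY KEY (id),
 KEY (parent_id),
 CONSTRAINT child_ibfk_1 FOREIGN KEY (parent_id)
   REFERENCES parent (id)
   ON DELETE CASCADE
) ENGINE=InnoDB;

-- DELETE FROM parent WHERE id = 4; would now also delete
-- the matching child rows.
```

Note this does not remove the extra locking discussed in the post; it only moves the cascade logic from the application into InnoDB.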
Morgan, please correct me if I’m wrong:
We have two transactions that start one after the other. Then an update of a child row happens in the first transaction. Meanwhile, in the second transaction, two statements are executed. After all this, the transactions are committed.
So, the first question: why is “UPDATE parent SET bogus_column = ‘new!’ WHERE id = 4;” not blocked? As I see it, the child row with parent_id = 4 is the one being modified! And isn’t it strange that “UPDATE parent SET bogus_column = ‘new!’ WHERE id = 5;” is blocked – why should it be? There are no operations on a child row with parent_id = 5!
What am I missing?
Please, somebody, anybody!
You will notice that a child is changed to point to parent_id = 5:
(UPDATE child SET parent_id = 5 WHERE parent_id = 4;)
So now one child points to the parent row with id = 5, but this is not committed yet… That parent row will stay locked until the transaction commits…
Question is, why is this lock needed?!?!
a) I agree that only if the parent is deleted it should be a problem…
b) If the parent is found to have been deleted at the time the transaction tries to commit, it should fail with an “original item/parent was deleted by another transaction” message…
It seems to me that MySQL is not as robust as I thought…
But I am not as educated as I would like to be, and I come from many years of using PostgreSQL, which seems to be awesome (yes, I do like journaling databases – they keep my data safe 🙂).
Any comments of comparison?
Will this only create a temporary lock, or will it create a deadlock if session 1 is never committed? (I assume there is a max wait time after which it is cancelled, but even waiting 30 seconds is a lifetime in web apps.)
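For what it’s worth, this is not a true deadlock (session2 waits on session1, but session1 is not waiting on session2); it is a lock wait, and the waiting time is bounded by a server variable that defaults to 50 seconds in stock MySQL. A sketch of inspecting and tightening it:

```sql
-- How long a statement will wait for a row lock before giving up:
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';  -- 50 (seconds) by default

-- For interactive web workloads a much shorter wait is common:
SET SESSION innodb_lock_wait_timeout = 5;

-- When the timeout expires, the waiting statement fails with:
-- ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
```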
This is an older discussion but still important since Google searches will still direct folks here.
Denis says let the database be a database. What is a database? A means of storing data that allows you to get to the data quickly and logically. A database engine could have all sorts of cool bells and whistles but if those bells and whistles cause your application to drag down and result in crappy response times, they are not appropriate for that application.
Usually I choose the bells and whistles based on the size and complexity of the database, and on where in my application those capabilities become necessary. If I have only one or two spots where a foreign key constraint becomes necessary in a large and complex application, it seems a little code that accomplishes the same thing is better than having that rule cause performance problems throughout the application.
By all means use the most sophisticated tools available within a data base engine that improves response and efficiency. But just because it has a really cool capability doesn’t mean that capability SHOULD or MUST be used.
I have used everything from DB2 to Oracle and find MySQL every bit as robust and efficient as any commercial product for what I have developed so far. But the biggest difference between the two environments is that with DB2 and Oracle I always had a group of DBA types taking my database design and doing the implementation – far better than my experience could manage. I do my own MySQL implementation. It is my belief that a good MySQL DBA could get just as much out of MySQL as any good DB2 DBA could get out of DB2.
Maybe I’m wrong. But first I would have to develop an application where the size and complexity of the data would make the difference.
So, locking and subsequent blocking is one thing, but as you said, the data referenced via your FK may not be changed all that often, depending on how your application is written. For example, someone creates an account, that account gets assigned an ID, then a transaction is created which references that account via an FK; it’s unlikely that the “reference data” of the account information will be updated frequently. The blocking issue can be mitigated by keeping it in mind while coding the application.
I have heard that with Oracle, FKs provide a performance benefit because the optimizer has more information about what needs to be done and the plan is created faster. Does innodb get any such performance benefit in the same manner?
One thing not mentioned here is: I think we’ve all run into schemas where the only way to understand the entity relationships was to read through the application. FKs can provide a very intuitive view into data relationships, not to mention help prevent the application from creating inconsistent data. FKs can help make the schema the final gate for data consistency.
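On the documentation point: the relationships that FKs encode can also be queried directly, instead of read out of application code – a sketch against the standard information_schema tables (schema name `test` is the one used in the post’s example):

```sql
-- List every foreign-key relationship in the test schema:
SELECT table_name, column_name,
       referenced_table_name, referenced_column_name
FROM information_schema.key_column_usage
WHERE table_schema = 'test'
  AND referenced_table_name IS NOT NULL;

-- For the example tables this shows child.parent_id -> parent.id.
```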
So, how about some real performance data benchmarks with and without FKs to see if the additional locking (blocking aside) has any real concern?
I’ve been caught out on exactly this, making some assumptions about what it “should” do internally. InnoDB is not as precise as you might assume when it comes to parent row locking. It does what it says on the tin: row locking. Even if you have a transaction that only works on children, the parent rows will still be locked for writes in general.
Theoretically, in this case InnoDB should only lock the parent table or row (depending on the isolation required) against insertion, deletion, and updates of the primary key. In reality, the behaviour I am observing is that the parent row is locked outright against any update. It does allow inserts to the table, however, which I suspect can cause small anomalies.
It could in theory be faster and more specific, but it isn’t.