“Hey, what’s going on with my applications? I installed a newer version of MySQL. I have queries that perfectly run with the older version and now I have a lot of errors.”

This is a question some customers have asked me after upgrading MySQL. In this article, we’ll see what one of the most frequent causes of this issue is, and how to solve it.

We are talking about this error:

Have you ever seen it?

SQL_MODE

As the first thing let me introduce the concept of SQL_MODE.

MySQL can work using different SQL modes that affect the syntax of the queries and validation checks. Based on the configured value of the variable sql_mode, it means that a query can be valid and regularly executes or can receive a validation error and cannot be executed.

The oldest versions of MySQL got users accustomed to writing queries that were not semantically correct because it was designed to work in the “forgiving mode”. Users could write any kind of syntactically valid query regardless of SQL standard compliance or semantic rules. This was a bad habit that was corrected introducing the sql_mode to instruct MySQL to work in a more restrictive way for query validation.

Some users are not aware of this feature because the default value was not so restrictive. Starting from 5.7, the default value is more restrictive and this the reason why some users have problems with unexpected query failures after migration to 5.7 or 8.0.

The sql_mode variable can be set in the configuration file (/etc/my.cnf) or can be changed at runtime. The scope of the variable can be GLOBAL and SESSION, so it can change by the purpose of the mode for any single connection.

The sql_mode variable can have more values, separated by a comma, to control different behaviors. For example, you can instruct MySQL how to deal with dates with zeros as ‘0000-00-00’, to ensure the date be considered as valid or not. In the “forgiving mode” (or if sql_mode variable is empty) you can INSERT such a value without problems.

But this is not the correct behavior as stated by the TRADITIONAL mode. As good programmers know, you have to validate dates into your source code in order to avoid to have incorrect data or incorrect results.

The following is how you can dynamically instruct MySQL to behave in the traditional mode to throw an error instead:

There are many other modes you can use. Covering all the modes is not the goal of the article, so please refer to the official documentation for more details and examples:

https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html

https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html

 

The ONLY_FULL_GROUP_BY issue

Let’s focus on the most frequent cause of errors when migrating to 5.7 or 8.0. As we said, 5.7 has a default SQL mode that is more restrictive than 5.6, and as such it’s for 8.0. This is true when you upgrade MySQL copying the old my.cnf file that doesn’t have a specific setting for the sql_mode variable. So, be aware.

Let’s create a sample table to store the clicks on the webpages of our site. We would like to log the page name and the id of the registered user.

 

Now we want to issue a query to calculate the most visited pages.

The query works, but it’s not really correct. It is easily understandable that page_url is the column of the grouping function, the value we are most interested in and we want to be unique for counting. Also, the visits column is good, as it’s the counter. But what about user_id? What does this column represent? We grouped on the page_url so the value returned for user_id is just one of the values in the group. In fact, it was not only user number1 to visit the index.html, but even users 2 and 3 visited the page. How can I consider that value? Is it the first visitor? Is it the last one?

We don’t know the right answer! We should consider the user_id column’s value as a random item of the group.

Anyway, the right answer is that the query is not semantically correct, because it has no meaning to return a value from a column that is not part of the grouping function. Then the query is expected to be invalid in the traditional sql.

Let’s test it.

Now we have an error, as expected.

The SQL mode ONLY_FULL_GROUP_BY is part of the TRADITIONAL mode and it is enabled by default starting from 5.7.

A lot of customers had this kind of issue after migration to a recent version of MySQL.

Now we know what the cause of the issue is, but our applications are still not working. What possible solutions do we have to let the applications work again?

Solution 1 – rewrite the query

Since it’s not correct to select a column that is not part of the grouping, we can rewrite the query without those columns. Very simple.

If you have a lot of queries affected by the problem, you have to potentially do a lot of work to retrieve and rewrite them. Or maybe the queries can be part of a legacy application you are not able or you don’t want to touch.

But this solution is the one that forces you to write correct queries and let your database configuration be restrictive in term of SQL validation.

 

Solution 2 – step back to the forgiving mode

You can change MySQL’s configuration and step back to the “forgiving” mode.

Or you can only drop the ONLY_FULL_GROUP_BY from the default. The default SQL mode in MySQL 5.7 includes these modes: ONLY_FULL_GROUP_BY, STRINCT_TRANS_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER

 

Solution 3 – use of aggregation functions

If your application absolutely needs to retrieve the user_id field for some valid reason, or it’s too complicated to change your source code,  you can rely on an aggregation function in order to avoid changing the sql mode configuration.

For example we can use MAX(), MIN() or even GROUP_CONCAT() aggregation functions.

 

MySQL provides even a specific function for solving the problem: ANY_VALUE().

 

Conclusion

I personally prefer solution number 1 because it forces you to write SQL-92 compliant queries. Following the standards is often considered a best practice.

Solution 2 is good in case you cannot change your application code or if rewriting all the queries is really too complicated. The solution is very good to solve the issues in a matter of seconds, however, I strongly suggest to have a long term plan to rewrite the queries that are not SQL-92 compliant.

For more details: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html

Photo of sign by Ken Treloar on Unsplash

Need a drop-in replacement for any MySQL database? Check out Percona Server for MySQL

5 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Jean-François Gagné
Tony Marston

I think that restricting yourself to the SQL-92 standard is too, er, restrictive. The SQL-99 standard introduced the idea of “functional dependency” which means that if the GROUP BY clause contains a key field then any columns which are functionally dependent on that key field (i.e. on the same record as the one with that key) can appear in the SELECT list without having to appear in the GROUP BY list.

Vijay Kumar Kanta

Good post. Thanks for clarifying with potential fixes for SQL compliance using your own example code. Well done.

Sašo Kovačič

I need to migrate to the 5,7 version of mysql server and I have some problems with queries that selects fields that are not in the group by statement.

Here is an example of one of those queries. It is really simple one but still…

I have two tables: risks and measures
Every risk can have 0 or more measures.

Here is a structure of the risks table

id
name
residual_direct_cost
residual_indirect_cost
etc…

and here is a structure of the measures table

id
risk_id
name
total_annual_cost
etc…

and here is a query that I’m executing in our application

SELECT risks.id, risks.name, risks.residual_direct_cost, risks.indirect_residual_cost, SUM(measures.total_annual_cost) AS measures_cost
FROM risks
INNER JOIN measures ON measures.risk_id = risks.id
ORDER BY risks.name
GROUP BY risks.id

What I want to get is a list of risks with the total cost of their measures…

id | name | residual_direct_cost | residual_indirect_cost | measures_cost
1 | Risk 1 | 3000.00 | 3400.00 | 432.00
2 | Risk 2 | 4320.00 | 1234.00 | 542.00
etc..

This query is not valid in mysql 5.7+. So how can I rewrite it to be valid. I can’t just group by the name or *_cost fields, because there can be more than one risk with the same value in each of these fields.

Thanks

Sašo Kovačič

Sorry for the long question. I see I have to refresh my knowledge about SQL. Using ORM tools for too long can be harmful. Slowly you forget about the very basic rules of the SQL language… Answer to my question is to put all the non-aggregate columns into the group by statement.