Comments on: MySQL: Followup on UNION for query optimization, Query profiling

By: Christian

Christian — Sun, 17 Jun 2012 14:56:38 +0000

Actually, I’ve just come across and idea and on the first sight it seems to work: if you cannot decide between an index on filtering (because of a range) and an index on sorting, is there anything wrong with using a Sub-SELECT?

E.g. SELECT … FROM table_sorting WHERE EXISTS (SELECT … FROM table_filtering AS tf WHERE tf.birthdate BETWEEN … AND … ) ORDER BY last_online DESC LIMIT 20

For me it seems to work, the main SELECT uses the index for sorting the results of the inner Sub-SELECT, which uses the index on birthdate for filtering. The query takes on my localhost about 0.09s for 2milion rows compared to about 6s for using the index for filtering or 2-3s for using the index for sorting.

Would anybody see a weakness with this solution that I have not?

By: Christian

Christian — Sun, 17 Jun 2012 14:31:47 +0000

Hi Peter,

thank you for the two great posts! Just as you said in practice you won’t store the age in a database but rather the birthdate. For this, your workarounds (IN or UNION SELECT) both would not be applicable… what should you do in this case? E.g. having a database with birthdates as timestamps, no other filtering option and ORDER BY last_online?

By: Robert

Robert — Sun, 14 Feb 2010 13:53:51 +0000

Onur,

I think it is not a good idea to concatinate 1k querys. With such a query you’ll achive the limits of max_allowed_packet very rapidly (if you did not make the mistake to rise the value up to 100MB or so). The UNION workaround is only good for a REALLY small set of numbers, because as you saw, a last filesort is still remaining.

I would recommend the usage of a temporary table. After that you’ll be able to join that table with a much better index usage.

Robert

By: Ronald Bradford

Ronald Bradford — Fri, 13 Mar 2009 17:50:20 +0000

Peter,

Could you give the specific version of MySQL you were using here?

By: Onur Aktas

Onur Aktas — Sat, 07 Feb 2009 01:23:14 +0000

Hi Peter,

I use a query like you mentioned above and i can see the performance improvement as you mention above.
However, is it also good to use such a union operation for about 1000 Queries to query for about 20 millions of total records.

I mean, about 1000 times.

(SELECT …….. LIMIT 15)
UNION ALL
(SELECT ……… LIMIT 15)
.
.
ORDER BY CREATED LIMIT 15

Or do you advice me to use another way?

By: Peter Zaitsev

Peter Zaitsev — Fri, 13 Jun 2008 16:57:05 +0000

Tobias,

UNION generally has larger overhead than IN or OR. It is faster in the cases when it allows better execution plan – avoiding full table scan with OR on different columns (before 5.0) or avoiding filesort.

By: Peter Zaitsev

Peter Zaitsev — Fri, 13 Jun 2008 16:55:11 +0000

Brian,

You would need to test large IN vs BETWEEN this likely to be rather workload dependent.
Note very large IN sets (especially cascaded) may cause MySQL to abandon full range optimization – in this case there is it should become slower than BETWEEN.

By: Tobias

Tobias — Fri, 13 Jun 2008 14:44:39 +0000

Peter,

one question on UNION vers. IN : Is it a general rule that UNION is faster then IN? Another thing I experienced is a serious speed-advantage when using UNION instead of long OR-Statements. Is this correct?

Thank You!

By: brian

brian — Wed, 09 Apr 2008 00:52:48 +0000

Hi Peter,

I wonder if you might be able to expand on the limitations of using “IN” vs. “BETWEEN” in terms of a point of decreasing returns? (e.g., at what point do the number of values in “IN” become large enough that there is no performance difference between using “IN” and “BETWEEN”?)

Also, does the performance increase involved in using “IN” apply to JOIN queries as well? Is there a difference in how “IN” is treated when the query in question utilizes a JOIN?

Of course, one thing I love about this site is that often times you tell us how we can measure for ourselves. I can see if one query is faster than the other, but am still not so awesome at unwrapping things to determine why.
Thanks.

By: Jorje

Jorje — Tue, 30 Oct 2007 04:29:21 +0000

Great article…One quaestion : Can I use UNION, if I don’t have certain values for a field ( like age=18, age=19 ) ?? What if I have a query like :

SELECT * from table WHERE type = 1 and value >= 150.4 and value 31 order by score desc limit 20 ???

How can I avoid “using filesort”, in this case ???

Thanx

By: Peter Zaitsev

Peter Zaitsev — Fri, 15 Dec 2006 10:24:37 +0000

Mocanu,

Nope there is a serious difference. Assuming you have query which matches 50.000 rows and you want to get and display only 10 of them – a way most search applications work. In case you’re using simply IN you will get 50.000 rows stored in temporary table and sorted by filesort – going to be quite slow.

If you’re using UNION – each of the queries in the UNION will produce only 10 rows, using index for sorting not filesort. If you have 5 queries in the UNION you will have 50 rows in the temporary table which you still need to do filesort for.

Sorting 50.000 rows vs 50 rows is the difference here, now avoiding filesort completely 🙂

By: Sorin Mocanu

Sorin Mocanu — Fri, 15 Dec 2006 10:00:38 +0000

Peter, I think you are making a mistake regarding the filesort – in both cases (when using “IN (18,19,20)” and when using “UNION”) mysql makes a temporary table of about the same size on which it does a filesort. Basically, in the first version the results are filtered by age and then a temporary table is created with the *filtered* records.