Updated June 1, 2023.
A suboptimal MySQL ORDER BY implementation, especially together with LIMIT, is often the cause of MySQL performance problems. Here is what you need to know about MySQL ORDER BY LIMIT optimization to avoid them.
MySQL LIMIT clause
The MySQL LIMIT clause is a valuable tool for controlling the number of rows returned by a SELECT statement. By specifying the maximum number of rows to retrieve from the result set, it enables you to work with subsets of data, especially in situations involving large tables. This feature enhances query performance and optimizes resource usage by fetching only the necessary rows.
Syntax of the MySQL LIMIT clause
The LIMIT clause in MySQL accepts one or two arguments: offset and count. Both parameters should be non-negative integers.
The offset parameter indicates the position of the first row to be returned from the result set, determining the number of rows to skip before returning the rows.
The count parameter specifies the maximum number of rows to be retrieved from the result set, setting a limit on the number of rows to be returned.
When using two parameters, the first parameter represents the offset, and the second parameter represents the count. This lets you retrieve a specific range of rows from the result set. But when using only one parameter, it signifies the number of rows to be returned from the beginning of the result set.
The basic syntax of the LIMIT clause is as below:
SELECT column1, column2, ...
FROM table_name
LIMIT offset, count;
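For example, assuming a hypothetical sites table:

```sql
-- Return the first 10 rows of the result set:
SELECT id, name FROM sites LIMIT 10;

-- Skip the first 20 rows, then return the next 10 (rows 21-30):
SELECT id, name FROM sites LIMIT 20, 10;

-- Equivalent alternative syntax:
SELECT id, name FROM sites LIMIT 10 OFFSET 20;
```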
How to use the ORDER BY and LIMIT clauses in a query
MySQL ORDER BY with LIMIT is the most common use of ORDER BY in interactive applications with large data sets being sorted. On many websites, you will find top tags, recently registered users, etc., which often require ORDER BY with LIMIT in the back end. In general, this type of ORDER BY looks like SELECT ….. WHERE [conditions] ORDER BY [sort] LIMIT N, M.
Make sure it uses an index. It is very important to have ORDER BY with LIMIT executed without scanning and sorting the full result set, so it should use an index – in that case, an index range scan is started, and query execution stops as soon as the required number of rows has been generated.
MySQL LIMIT clause examples
Below we take a look at several examples of how to use the MySQL LIMIT clause to retrieve specific results, including date created, category IDs, and running queries on multiple columns.
Using MySQL LIMIT 10 to find result sets by ‘date_created’ and ‘category_id’
For example, if I do SELECT * FROM sites ORDER BY date_created DESC LIMIT 10, I would use an index on (date_created) to get the result set very fast.
Now what if I have something like SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10;
In this case, an index on date_created may also work, but it might not be the most efficient – if the category is rare, a large portion of the table may be scanned to find 10 rows. So an index on (category_id, date_created) is a better idea.
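A sketch of that composite index (the index name is arbitrary):

```sql
-- Equality column first, then the sort column:
ALTER TABLE sites ADD KEY idx_category_date (category_id, date_created);

-- With the index in place, EXPLAIN should show index access with
-- no "Using filesort" in the Extra column:
EXPLAIN SELECT * FROM sites
WHERE category_id = 5
ORDER BY date_created DESC
LIMIT 10;
```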
Let’s take a look at a bit more complex case: SELECT * FROM sites WHERE category_id in (5,10,12) ORDER BY date_created DESC LIMIT 10;
Even though it looks quite similar to the previous one, it is quite different, as there are now multiple category_id values in the list, so an index on (category_id, date_created) can't be used directly. An index on date_created alone would still work. The good option from a performance standpoint (even though a bit ugly) is the UNION workaround I already wrote about.
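The UNION workaround can be sketched like this – each branch can use the (category_id, date_created) index, and the outer ORDER BY only has to re-sort at most 3 × 10 = 30 rows:

```sql
(SELECT * FROM sites WHERE category_id = 5  ORDER BY date_created DESC LIMIT 10)
UNION ALL
(SELECT * FROM sites WHERE category_id = 10 ORDER BY date_created DESC LIMIT 10)
UNION ALL
(SELECT * FROM sites WHERE category_id = 12 ORDER BY date_created DESC LIMIT 10)
ORDER BY date_created DESC
LIMIT 10;
```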
Using MySQL LIMIT for queries on multiple columns
So what if you have an application which can perform a search on many different columns with worse-than-perfect selectivity? Various social networking and dating sites are perfect examples of such queries.
SELECT * FROM people WHERE gender='m' AND age BETWEEN 18 AND 28 AND country_id=5 AND city_id=345 ORDER BY last_online DESC LIMIT 10;
There could be many possible limiting factors, with all of them being optional. This is a hard nut to crack, and I know high-end custom search solutions can be developed for it, but if we stick to plain MySQL, using multiple indexes on the most selective columns is a good idea for the performance of such queries.
For example, you may put index on(gender,last_online), assuming most people will have gender specified, as well as (country_id,city_id,last_online), assuming in most cases these will be specified. It takes a good look at queries actually being run and data selectivity to come up with a good set of indexes for such cases, it also may need to be adjusted in the future.
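The index set described above might look like this (index names are illustrative):

```sql
ALTER TABLE people
  ADD KEY idx_gender_online (gender, last_online),
  ADD KEY idx_geo_online    (country_id, city_id, last_online);
```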
The main thing to watch for, if you do not have the full WHERE clause resolved by the index, is how many rows you need to scan to resolve the ORDER BY (this can be found in the slow query log or by examining Handler statistics). If only 50 rows are examined to provide 10 rows of the result set, you're in decent shape. But if it is 5,000, you might need to rethink your indexing.
Also note – the number of records scanned to provide a result set will be very dynamic, based on the particular constants and other factors.
For example, in our dating example, if we use only a (last_online) index and look for people from the USA, we will likely find ten people pretty quickly. If the country is small, or there are simply few members from it – say, Slovenia – the same kind of search might need to scan thousands of times more rows to provide a result set.
In the example above, we ordered by the last column of the index. In fact, an index can be used for ORDER BY if sorting is done by the leading column(s). Note, however, that columns following the column(s) used for ORDER BY can't be used to restrict the result set. For example:
key(a,b,c) SELECT * FROM tbl WHERE c=5 ORDER BY a,b LIMIT 10 – In this case, the first two columns of the index can be used to satisfy the ORDER BY; the index, however, will not be helpful for checking c=5 (unless it is an index-covered query). An index on (c,a,b) would work better for this query.
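To illustrate with the hypothetical tbl:

```sql
-- key(a,b,c): the (a,b) prefix satisfies ORDER BY a,b, but c=5 must
-- still be checked against rows (unless the query is index-covered).
ALTER TABLE tbl ADD KEY idx_abc (a, b, c);

-- key(c,a,b): c=5 is an equality on the leading column, and rows
-- within that prefix are already sorted by (a,b) - no filesort needed.
ALTER TABLE tbl ADD KEY idx_cab (c, a, b);

EXPLAIN SELECT * FROM tbl WHERE c = 5 ORDER BY a, b LIMIT 10;
```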
Best practices for using the MySQL limit clause
Do not sort by expressions
I guess this one is obvious – expressions or functions in ORDER BY will block index usage.
Sort by column in leading table
If you have a JOIN with ORDER BY … LIMIT, you should try hard to have the sorting column(s) in the leading table. If ORDER BY goes by a field from a table which is not first in the join order, the index can't be used. Sometimes this means breaking normalization and duplicating the column(s) you're going to use in ORDER BY in other tables.
Here is an example when ORDER BY is done by the second table, which requires filesort:
mysql> explain select test.i from test, test t where test.k=5 and test.i=t.k order by t.k limit 5;
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref         | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
|  1 | SIMPLE      | test  | ref  | PRIMARY,k     | k    | 4       | const       |    1 | Using temporary; Using filesort |
|  1 | SIMPLE      | t     | ref  | k             | k    | 4       | test.test.i |    1 | Using where; Using index        |
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
2 rows in set (0.00 sec)
However, if the first table has “const” or “system” access type, it is effectively removed from join execution (replaced with constants), so ORDER BY can be optimized even if it is done by the second table:
mysql> explain select test.i from test, test t where test.i=5 and test.k=t.k order by t.k limit 5;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | test  | const | PRIMARY,k     | PRIMARY | 4       | const |    1 |             |
|  1 | SIMPLE      | t     | ref   | k             | k       | 4       | const |    1 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.01 sec)
The difference between these cases is that “i” is the primary key, while “k” is simply an indexed column.
Note: In some cases, even if it is possible to use an index to do ORDER BY with a JOIN, MySQL still will not be able to use it, as the optimizer is not yet smart enough to detect such cases:
mysql> explain select test.i from test, test t where test.k=5 and test.i=t.k order by test.k,t.j limit 5;
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref         | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
|  1 | SIMPLE      | test  | ref  | PRIMARY,k     | k    | 4       | const       |    1 | Using temporary; Using filesort |
|  1 | SIMPLE      | t     | ref  | k             | k    | 4       | test.test.i |    1 | Using where; Using index        |
+----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+
2 rows in set (0.00 sec)
In this case, there is an index (k,j) on the table, so indexes could be used on each of the tables to optimize the ORDER BY, or at least a local sort could be used for each t.k=const value for the second table. This is not done, however.
Sort in one direction
If you have ORDER BY col1, col2, it can be optimized using an index. ORDER BY col1 DESC, col2 DESC works the same way. However, if you have ORDER BY col1, col2 DESC, MySQL will have to use a filesort. The classic solution is an index sorted appropriately (ascending by col1 and descending by col2); MySQL supports such descending index parts as of version 8.0. Before that, the workaround was a separate column holding reversed values, so you can do ORDER BY col1, col2_reverse instead.
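A sketch of both options, assuming a hypothetical table tbl (on pre-8.0 servers, only the reversed-column workaround applies):

```sql
-- MySQL 8.0+: descending index parts match ORDER BY col1, col2 DESC directly.
CREATE INDEX idx_mixed ON tbl (col1 ASC, col2 DESC);

-- Pre-8.0 workaround: keep a negated copy of col2 and sort ascending on it.
ALTER TABLE tbl ADD COLUMN col2_reverse INT;
UPDATE tbl SET col2_reverse = -col2;
CREATE INDEX idx_rev ON tbl (col1, col2_reverse);

SELECT * FROM tbl ORDER BY col1, col2_reverse LIMIT 10;
```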
Beware of large LIMIT
Using an index for sorting is efficient if you need the first few rows, even if some extra filtering takes place and you have to scan more rows by index than requested by the LIMIT. However, if you're dealing with a LIMIT query with a large offset, efficiency will suffer. LIMIT 1000,10 is likely to be way slower than LIMIT 0,10. It is true that most users will not go further than 10 pages of results; however, search engine bots may very well do so. I've seen bots looking at 200+ pages in my projects. Also, many websites fail to take care of this, providing a very easy way to launch a DoS attack – requesting pages with very large offsets from a few connections is enough. If you do nothing else, make sure you block requests with overly large page numbers.
In some cases, for example, if results are static, it may make sense to precompute results so you can query them by position. So instead of a query with LIMIT 1000,10 you will have WHERE position BETWEEN 1000 AND 1009, which has the same efficiency for any position (as long as it is indexed).
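A sketch of the precomputed-position approach (the results table and position column are hypothetical):

```sql
-- Large offset: MySQL generates and throws away the first 1000 rows.
SELECT * FROM results ORDER BY score DESC LIMIT 1000, 10;

-- Precomputed, indexed position: any page is an equally cheap range read.
ALTER TABLE results ADD COLUMN position INT, ADD KEY (position);
SELECT * FROM results
WHERE position BETWEEN 1000 AND 1009
ORDER BY position;
```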
Force index if needed
In some cases, the MySQL optimizer may prefer to use a different index, one with better selectivity or just better estimates, instead of the one which allows you to avoid the sort. For example, if you have an index on (country_id,city_id) and an index on (country_id,last_online), then for the query SELECT * FROM people WHERE country_id=5 AND city_id=6 ORDER BY last_online DESC LIMIT 10, the first index will likely be selected even though it leads to a filesort.
The solution for this problem is either extending your indexes so the MySQL optimizer does not have to choose between a better sort and a better lookup, or using FORCE INDEX to make it use the appropriate index.
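For the example above, forcing the sort-friendly index might look like this (the index name is an assumption):

```sql
SELECT * FROM people FORCE INDEX (idx_country_online)
WHERE country_id = 5 AND city_id = 6
ORDER BY last_online DESC
LIMIT 10;
```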
Many of the tips I’ve mentioned here work for MySQL ORDER BY without LIMIT as well, but there are some differences. I should write another article about ORDER BY without limit and large tables soon.
For queries that combine ORDER BY with LIMIT, the optimizer may switch to an index that applies to the ORDER BY. In some cases, the decision to switch was based on a heuristic rather than on cost. The optimizer now uniformly makes the decision whether to switch on a cost basis. This should result in better performance when switching would cause a query to read an entire index, or a large part of it, to find qualifying rows.
See also: Bug #78993, Bug #22108385, Bug #73837, Bug #19579507, Bug #16522053.
Using descending index
SELECT * FROM test
ORDER BY k DESC, j ASC;
The optimizer can use an index on (k, j) if k is descending and j is ascending. It can also use an index on those columns (with a backward scan) if k is ascending and j is descending.
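In MySQL 8.0+, such an index can be declared directly (this is a sketch against the test table from the examples above):

```sql
CREATE INDEX idx_k_desc_j_asc ON test (k DESC, j ASC);

-- Served by a forward scan of idx_k_desc_j_asc:
SELECT * FROM test ORDER BY k DESC, j ASC;

-- Served by a backward scan of the same index:
SELECT * FROM test ORDER BY k ASC, j DESC;
```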
FAQs
Does MySQL have a limit?
The LIMIT clause itself accepts a row count of up to 18446744073709551615 (2^64 – 1), which is practically unlimited for real-world use cases.
Note that the max_allowed_packet variable limits the size of an individual communication packet (and therefore of a single row sent between client and server), not the number of rows a query can return. If individual rows are very large, this value can be increased accordingly to accommodate them.
What is the maximum MySQL database size?
In MySQL, the maximum size of a database is not directly imposed by MySQL itself. Instead, it is determined by the file system limits of the operating system on which the database is hosted. MySQL leverages the file system to store the database files, and the maximum size is subject to the constraints and limitations set by the underlying file system.
What is the limit option in MySQL?
In MySQL, the LIMIT option serves the purpose of limiting the number of rows that are returned by a SELECT statement. It provides a way to specify the maximum number of rows that you want to retrieve from the result set. By using the LIMIT clause, you can control the amount of data that is returned, making it particularly useful when dealing with large tables or when you only need to retrieve a subset of the data.
What is the limit of MySQL user?
In MySQL, there is no predefined limit on the number of users that can be created. The ability to create users depends on the system resources and configuration of the MySQL server. Factors such as available memory, disk space, and the maximum number of open file descriptors allowed by the operating system can impact the number of users that can be effectively managed.
To ensure efficient management of user accounts, it is advisable to create user accounts as needed and revoke unnecessary privileges when they are no longer required.
What does limit 100 mean in SQL?
In SQL, the LIMIT clause is used to restrict the number of rows returned by a SELECT statement. When LIMIT is used with a numeric value, such as LIMIT 100, it specifies that the query should return a maximum of 100 rows from the result set.
Does MySQL sort by primary key?
By default, in MySQL, when you execute a SELECT statement without specifying an explicit ordering using the ORDER BY clause, the result set is not guaranteed to be sorted by the primary key or any other specific column. If you want to ensure a specific order of the result set, you should use the ORDER BY clause and specify the column(s) by which you want the data to be sorted.
Hi! Great article (as always). Just wanted to point out one quick exception to this:
“Do not sort by expressions”
On a FULLTEXT index, if you have MATCH … AGAINST … in the WHERE clause as well as in the ORDER BY clause (for instance, for the purpose of ordering by relevancy), then the FULLTEXT index will indeed be used, and the ORDER BY will be quite efficient, as it is *not* re-executed.
Cheers, and keep up the fantastic articles, Peter and Vadim!
Jay
Thanks for another great blog post!
“Do not sort by expressions”
Picking random rows is another thing that social networking pages use a lot (random x users that live in the same city as you; random x users that share some attribute with you).
ORDER BY RAND() is evil but what alternatives are there? Is there a best practice to tackle this issue with PHP5/MySQL5, especially when I only need little amounts (
Something got stripped away:
Is there a best practice to tackle this issue with PHP5/MySQL5, especially when I only need little amounts (less than 20) of random rows?
Good point Maarten, I hope somebody knows an alternative / best practice?
One way is to have something like a unique rowid in that table. Then in the script choose a random value and try to select a row matching that number.
One algorithm would then be:
1) SELECT MAX(rowid) AS maxrowid FROM t1 LIMIT 1; (we know the lowest rowid is 0 since it's an unsigned int).
2) Run the random() function in your script, which will return a numeric value between 0 and MAX(rowid) (don't forget to seed).
3) SELECT col FROM t1 WHERE rowid >= "numeric value from 2)" LIMIT 1;
4) If 0 rows were returned, check in the other direction: SELECT col FROM t1 WHERE rowid <= "numeric value from 2)" ORDER BY rowid DESC LIMIT 1;
Uhm for some reason half my post vanished !?
Anyway…
5) If still no rows were returned then tell the client that no rows were found.
In the case of “random x user from city y” you can have the select as:
SELECT col FROM t1 WHERE city = y AND rowid >= “numeric value from 2)” LIMIT 1;
along with an index such as INDEX (city, rowid)
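The steps above can be sketched as follows (rowid, t1, and col as in the comment; the random value comes from the application):

```sql
-- 1) Upper bound for the random pick (lowest rowid is assumed to be 0):
SELECT MAX(rowid) AS maxrowid FROM t1;

-- 2) In the application, pick a random value between 0 and maxrowid,
--    then fetch the first row at or above it:
SELECT col FROM t1 WHERE rowid >= @rnd ORDER BY rowid ASC LIMIT 1;

-- 3) If nothing matched, check in the other direction:
SELECT col FROM t1 WHERE rowid <= @rnd ORDER BY rowid DESC LIMIT 1;
```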
Jay,
What do you mean exactly? First, I did not speak about full-text search indexes here but about BTREE ones. If you do full-text search using natural language search, you do not need any extra ORDER BY clause – the result is automatically sorted by “relevance”:
mysql> select title, match (title,content) against ('internet') as score from cont where match (title,content) against ('internet') limit 10;
+---------------------------------------+-----------------+
| title                                 | score           |
+---------------------------------------+-----------------+
| Internet dynamics                     | 9.0042238235474 |
| Internet Generation                   | 8.8267345428467 |
| Neutral New Zealand Internet Exchange | 8.596981048584  |
| Regional Internet Registries          | 8.3543605804443 |
| Internet organizations                | 8.2909164428711 |
| Internet exchange points              | 8.2456722259521 |
| Internet Explorer shells              | 8.2083368301392 |
| Internet celebrities                  | 8.0902004241943 |
| IPv4 internet                         | 7.9001092910767 |
| Botswana Internet Exchange            | 7.8775291442871 |
+---------------------------------------+-----------------+
10 rows in set (0.08 sec)
But you must NOT include ORDER BY score DESC if you want your query executed in decent time. It will require a filesort, which can take very long:
mysql> select title, match (title,content) against ('internet') as score from cont where match (title,content) against ('internet') order by score desc limit 10;
+---------------------------------------+-----------------+
| title                                 | score           |
+---------------------------------------+-----------------+
| Internet dynamics                     | 9.0042238235474 |
| Internet Generation                   | 8.8267345428467 |
| Neutral New Zealand Internet Exchange | 8.596981048584  |
| Regional Internet Registries          | 8.3543605804443 |
| Internet organizations                | 8.2909164428711 |
| Internet exchange points              | 8.2456722259521 |
| Internet Explorer shells              | 8.2083368301392 |
| Internet celebrities                  | 8.0902004241943 |
| IPv4 internet                         | 7.9001092910767 |
| Botswana Internet Exchange            | 7.8775291442871 |
+---------------------------------------+-----------------+
10 rows in set (1 min 46.73 sec)
In fact, it is one of the major weaknesses of the built-in MySQL full-text search – the inability to efficiently combine it with sorting by anything other than relevance. For example, if I wanted to run a boolean full-text search for our dating example and sort by date registered or last login time, it would crawl. This is where external solutions come into play.
I’ve now modified the post to include the Beware of large LIMIT section, which is something I forgot to write about during the initial revision.
ORDER BY RAND() is indeed a troublemaker you can meet in many applications. As the workarounds are not trivial, many people simply decide to leave it this way. It will not scale for large data sets, however.
Apachez gives one decent workaround – if you just need one random item, you have sequential row ids with few holes, and you do not care about a slightly skewed distribution, use this approach – select a random id in the range and find the row closest to it.
If you need multiple rows or there is no sequential integer ids, you can use other approach.
– add an indexed column called “random” and populate it with random values
– do ORDER BY random instead of ORDER BY RAND()
– once you’ve selected the row, update “random” to a new random value
– periodically run a process to reset the random values for all rows; otherwise, if a row gets a very large value on update, it would never be selected.
This works well for large data sets but a small number of selects, as you need to do a lot of updates in this case.
Another approach is to pre-create a sequence of ordered random values and use it. For example, generate 1,000,000 random rows at once. Now keep a simple counter of how many of the random rows you’ve used, i.e., first take the first 10 rows, then the second 10 rows, etc. For many applications running in circles is enough; for others, you can generate more rows as soon as they are used.
The good thing about the last technique is you can get the same distribution as with ORDER BY RAND() with very limited overhead.
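The indexed-random-column workaround described above can be sketched like this (table and column names are illustrative):

```sql
ALTER TABLE items ADD COLUMN random FLOAT, ADD KEY (random);
UPDATE items SET random = RAND();

-- Cheap indexed read instead of ORDER BY RAND():
SELECT * FROM items ORDER BY random LIMIT 1;

-- After serving a row, move it to a new random position:
UPDATE items SET random = RAND() WHERE id = @picked_id;
```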
If you know/calculate the number of rows, you could try this as a faster alternative to RAND() (example in PHP):
$var = row("SELECT COUNT(id) FROM mytable");
$var = rand(0, $var - 1);
$result = query("SELECT random_item FROM mytable WHERE … ORDER BY … LIMIT $var,1");
A dirty UNION query could produce a large result set.
Speeple,
This is a pretty bad idea. You did not write what you ORDER BY, but it may require a filesort. And even if it is done by index, on average 50% of the rows will have to be scanned because of the LIMIT.
LIMIT 50000,1 is bad – meaning 50,000 rows will be generated and thrown away.
Apachez provided correct solution for the case if you have id column 🙂
Wups, the ORDER BY is redundent in such a query.
Better 🙂 But even without ORDER BY, LIMIT with a non-zero offset is expensive.
Try LIMIT 10000000,1 on some large table 🙂
On small tables everything may look fast – full table scans, joins without indexing and this is often the root cause of scalability problems later on.
I’ve just tried the query on a table of several tens of gigabytes with 12 million rows:
SELECT * FROM index_archive LIMIT 1000000,1
True, it is very slow, especially from a web application view point.
I can’t remember where I read that this was a good method… But it’s obviously a poor option, even if it is marginally faster than RAND().
I’ve implemented this in a social network project I’m working on for “random blog” & “random profile”… guess i’ll be giving Apachez’s method a try!
Yeah. Random blog and random profile are indeed easy as they often have sequential integer ids. There could be holes due to removed accounts but still good enough
Yes, the only flaw I have found with Apachez’s method is when there’s an integer jump, e.g. 1 > 5, and id 3 is selected, resulting in an error.
Well, that is the whole point. If your random value was 3,
you do WHERE id>=3 ORDER BY id ASC LIMIT 1;
which gives you value 3, or the closest one above it if it does not exist.
I just want to disagree with Jay’s point (1st post). I’m using MySQL 5 and am having some fun adapting to the new optimization engine. I have a table with a full text index; when I run a query with no ORDER BY on it, it is incredibly fast. HOWEVER, when I use ORDER BY MATCH … AGAINST … LIMIT, the query is painstakingly slow, as it uses filesort for some reason that I do not understand.
—
SELECT * FROM sites
WHERE MATCH(title, url, stripped) AGAINST('test' IN BOOLEAN MODE) AND stat = 'A'
ORDER BY MATCH(title, url, stripped) AGAINST('test')
DESC
LIMIT 0, 50
27 rows fetched (60.67 sec)
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sites fulltext stat,match_url_title_stripped match_url_title_stripped 0 1 Using where; Using filesort
— Versus —
SELECT * FROM sites
WHERE MATCH(title, url, stripped) AGAINST('test' IN BOOLEAN MODE) AND stat = 'A'
LIMIT 0, 50
27 rows fetched (0.75 sec)
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sites fulltext stat,match_url_title_stripped match_url_title_stripped 0 1 Using where
Panos,
Jay is wrong on this. See my comment above.
The behavior you’re getting is quite expected. Check my new presentation on FullText Search performance to see other things which will be painfully slow:
http://www.mysqlperformanceblog.com/files/presentations/EuroOSCON2006-High-Performance-FullText-Search.pdf
All the workarounds for ORDER BY RAND() usually assume you are looking for a single record, or some small number. What if you want to randomly select an arbitrary number of rows (say between 500 and 15000)? If someone has a faster alternative to ORDER BY RAND() for this situation, I would be interested in hearing it. Thanks.
Rob, are you speaking about a random number of random rows?
I mentioned one workaround which may work for you – you can have a separate, pre-created table which has a sequence of random row IDs, something like:
id rnd
1 456
2 123
3 905
So you have sequential row numbers and random values. Now when you need N random rows, you can do: SELECT rnd FROM tbl WHERE id > [current position] LIMIT N;
If N is random – compute it in your application.
Now you just need to advance the position as you use data from this table. Also, you need something for when all the random data is used – for some applications looping is enough; for others, you need to make sure you generate enough random data.
Hi there….
Great blog…
I have a difficult (for me) MySQL optimisation problem. Basically, I want to optimise a query that uses a fulltext search and orders by a static field. I.e.:
SELECT * FROM WHERE MATCH() AGAINST('') ORDER BY score DESC
Where score is a precomputed score field.
I have indexes built on the fulltext and score separately. I can’t combine them as they are different types. However, as MySQL only uses the fulltext index, it is forced to filesort for the ORDER BY. Any way round this?
Thanks!
Sorry, sql should read:
SELECT * FROM WHERE MATCH(text_field) AGAINST('text') ORDER BY score DESC
Full-text search is special. First, it automatically sorts by relevance unless you’re doing a BOOLEAN search, so you’d better just remove the ORDER BY from this query.
Second – are you sure you do not want to use LIMIT ? Getting all matches will be very slow for large data sets.
Hi.
At first I have to say this is the excellent site! 🙂
And now my troubles. I discovered that a range condition and an ORDER BY column can’t both be served by the index, since each one has to be the last field of the index that is used.
Suppose I have a table with a, b and c fields and SQL query
SELECT * FROM table WHERE a=1 AND b>10 ORDER BY c
key(a,b) works, but filesort is used.
key(a,c) works, but b range goes sequentially.
key(a,b,c) doesn’t work (only a and b is used).
Do you have any idea how to go around this? How to define index for a, b and c?
Thank you anyway,
Dan
Dan,
You’re right. This is the challenge and there is no solution which works for everything.
If the b range is small, you can use this trick:
http://www.mysqlperformanceblog.com/2006/08/14/mysql-followup-on-union-for-query-optimization-query-profiling/
Hi
Your articles posted here are great. I’m a beginner in MySQL optimization, and I have a (maybe stupid) question.
Let’s say I have a table mytable(a,b,c,d,e, …) and I want to perform ORDER BY with LIMIT on more than 60-70% of the columns (I want to have many sorts available).
example queries:
select a,b,c FROM mytable ORDER BY a;
select a,b,c FROM mytable ORDER BY b;
… etc…
Is it OK to define indexes for every column used in ORDER BY? When are the indexes defined on a table considered to be too many (assuming they are not redundant and are used for different purposes)?
Thank you.
Having sorts by many fields is challenging. You basically have only two choices: you either have a lot of indexes, which slows down insertions/updates and requires space on disk and in caches, or you use filesort, which does not scale well with large data sets.
You say “Sort by column in leading table”, but could a workaround be to create a view by joining 2 tables and then sort by any column you want (from the view)? Or is that the same thing as a SELECT with a join and ORDER BY?
Not really.
A view is a virtual entity, so creating a view does not magically make it behave like a table from an indexing standpoint – the optimizer looks at the base tables instead.
Use a subquery in FROM, like:
select * from (select * from tab where … limit 0, 20) t order by rand() limit 0,1
Steve,
What is your advice about? Your outer query would not be faster than the query in the FROM clause, and you’re selecting a random row out of 20 rows selected by the first query, which is not entirely the same as fully random 🙂
Peter,
It is semi- or fake-random, to minimize sorting and table scans. I have 5 robot clients running at the same time. It is to make sure the 5 clients don’t pick up the same record out of 10,000 records.
It is random order within the first 20, instead of random order over all rows (10,000 in my case). Performance is much better.
For 5 clients, 5 would be OK instead of 20 in the SQL; even with 100 there, the performance won’t be much different. I put 20 there in case I increase the robot number.
Yes, in this case that works. A similar approach would be to create 20 rows, store them in a separate table, and do SELECT … ORDER BY RAND() from it. The benefit in that case is you can regenerate the table every few minutes and get a different set for rotation.
Good article. I didn’t realize the join order would affect the ORDER BY clause.
How to optimize this kind of query:
update targets set status='D' where status='C' order by schedule limit 1;
with an index on (status, schedule)? It was really slow, while
select * from targets where status='C' order by schedule limit 1;
update targets set status='D' where id='id';
are very fast, as expected.
How to solve it?
//Bow
Davies,
The problem in this case is that you want the UPDATE to use the same key that you’re updating, which can cause “loops” for certain updates. Consider, for example:
update tbl set i=i+10 where i>5
Updating value 10 will make it 20, and it will again match the clause, etc.
So MySQL can’t use such an index for query execution. It could use a temporary table the same way you do, but I guess it does not do that right now.
Hey guys, here’s another good solution. I have a users table where I want to select five people at random who have a certain setting on their profile. I can’t use max ID because the IDs won’t necessarily be in sequence, so here’s what I do:
Select all the user IDs that match my params and put them into an array…
$query = "SELECT user_id FROM users WHERE profile_privacy='0' AND profile_display_name<>''";
… Then use PHP to shuffle the resulting array and take the first five items:
shuffle($random_profile_ids);
$random_profile_ids = array_slice($random_profile_ids,0,5);
… Then do an IN query on the user_id field for the five items left in your array:
$in_sql = "'" . implode("','", $random_profile_ids) . "'";
$query = "SELECT * FROM users WHERE user_id IN ($in_sql) LIMIT 5";
On my 30k row table the new solution ran in a combined .0147 secs, compared to .0770 secs for ORDER BY RAND(). Not a bad savings if you ask me. Enjoy,
Rob
Rob,
That won't work well on large data sets (it will eat all memory sooner or later as the data set grows). Actually, it seems to me that using RAND() in this case would be more acceptable, even on small data sets, because the performance gain is not that big.
Btw, you can improve your code by *not* using shuffle, but instead getting 5 values from rand(0, count($random_profile_ids) - 1) (make sure they do not overlap) and then just picking the appropriate values from the array.
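Peter's suggestion — draw a handful of non-overlapping random positions instead of shuffling the whole array — might look roughly like this in Python (a sketch; `random.sample` already guarantees distinct picks, and the id list is just a stand-in for the result of the first query):

```python
import random

def pick_random_ids(ids, n):
    """Pick n distinct values without shuffling the whole list."""
    if n >= len(ids):
        return list(ids)
    # random.sample draws n distinct elements, which satisfies the
    # "make sure they do not overlap" requirement in O(n) work.
    return random.sample(ids, n)

all_ids = list(range(1, 30001))  # stand-in for the user_id list
chosen = pick_random_ids(all_ids, 5)
print(chosen)
```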
I have a question regarding order by and limit..
I used to use this system to page the results 100 rows at a time, but found it to be very slow as the page numbers grew larger and larger.
What I thought would help it scale better is something like this:
SELECT inverted_timestamp FROM whatever WHERE visible = 'yes' ORDER BY inverted_timestamp LIMIT 100,1
The inverted_timestamp field and the ORDER BY go together, as I'm sorting by a few different columns (it's like a spreadsheet table which you can page and sort by a few fields – I have added indexes for all the used cases). The 100 is just currentpage*perpage, which happens to be 100 since I'm on the 2nd page (page 1) and have 100 rows per page.
Then I do the main query to fetch the actual results:
SELECT users.username, bla bla… FROM whatever LEFT JOIN categories ON category = categories.id LEFT JOIN users ON owner = users.id WHERE visible = 'yes' AND inverted_timestamp>='3111748733' ORDER BY inverted_timestamp LIMIT 100
which I have indexed as (visible, inverted_timestamp) (and similarly for the rest of the columns I order by).
The value >='311…' comes from the first query, which I use to find the first row of that page; using that value in the main query prevents the main one (which fetches all the results) from having to do an ORDER BY over the entire data set. From my testing it seems to run very well up to around 1-2 million rows; beyond that the second query is still very fast, but the first one starts to take the hit. Previously I had just the second query with no >= condition and used LIMIT offset, perpage, which was OK up until 100k rows and then it was really slow.
It seems this solution is an improvement. My question really is: is there a way to make it better?
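The two-query pattern described above can be sketched as follows — a minimal illustration using SQLite via Python's sqlite3 in place of MySQL, with made-up data (the first, index-only query still pays for the OFFSET; the second query then seeks directly to the boundary value instead of sorting everything):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE whatever (id INTEGER PRIMARY KEY, visible TEXT, inverted_timestamp INTEGER)")
conn.executemany("INSERT INTO whatever (visible, inverted_timestamp) VALUES (?, ?)",
                 [("yes", ts) for ts in range(1000)])
conn.execute("CREATE INDEX idx_vis_ts ON whatever (visible, inverted_timestamp)")

page, per_page = 1, 100  # page 1 == the second page, as in the comment

# Query 1: seek to the first row of the page. Index-only, but it still
# has to skip page*per_page index entries.
boundary = conn.execute(
    "SELECT inverted_timestamp FROM whatever WHERE visible = 'yes' "
    "ORDER BY inverted_timestamp LIMIT ?, 1", (page * per_page,)
).fetchone()[0]

# Query 2: fetch the full page starting at the boundary value — no big
# OFFSET, so no large sort over the whole data set.
rows = conn.execute(
    "SELECT id, inverted_timestamp FROM whatever WHERE visible = 'yes' "
    "AND inverted_timestamp >= ? ORDER BY inverted_timestamp LIMIT ?",
    (boundary, per_page),
).fetchall()
print(len(rows), rows[0])
# → 100 (101, 100)
```

The remaining cost is all in query 1; remembering the last value of the previous page (the "seek" approach) would remove the offset scan entirely, at the price of only supporting next/previous navigation.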
To illustrate what I meant above, these are the queries generated with a different ordering:
SELECT size FROM whatever WHERE visible = 'yes' ORDER BY size DESC LIMIT 3800,1
SELECT users.username,bla bla FROM whatever LEFT JOIN categories ON category = categories.id LEFT JOIN users ON owner = users.id WHERE visible = ‘yes’ AND size
it got cut off:
AND size= value_found), which worked well but only when I ordered by id (so it was limited – though I can still probably use this approach when the ordering is done via an integer that has sequential properties)..
the problem really lies in how to speed up the other ORDER BYs beyond what I have done already.
hmm, it doesn't like my posts.. it cut it again :/
size
I guess it's the single quote that messes it up… I won't bother trying again.. I believe you have understood what I'm trying to say
Here is my logic to pick random rows:
$entries = array();
$count = 10;
while ($count > 0) {
$DB->query("SELECT col1,col2 FROM entries AS e1 INNER JOIN members ON userid=id AND active=1 JOIN (SELECT ROUND(RAND() * (SELECT MAX(entryid) FROM entries)) AS entryid) AS e2 WHERE e1.entryid >= e2.entryid AND e1.photos!='a:0:{}' ORDER BY e1.entryid ASC LIMIT 1");
if ($DB->get_num_rows() > 0) {
$count--;
$entries['entries'][] = $this->__processEntry($DB->fetch_array());
}
}
It works pretty well by my standards.
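The core trick in that loop — pick a random id up to MAX(entryid), then seek to the first row at or after it — can be sketched like this (SQLite via Python's sqlite3 standing in for MySQL, with Python's random in place of RAND(); the data is made up). Note the caveat discussed further down the thread: gaps in the id sequence bias the selection toward rows that follow large gaps:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (entryid INTEGER PRIMARY KEY, col1 TEXT)")
# Deliberately leave gaps in entryid to mimic deleted rows.
conn.executemany("INSERT INTO entries (entryid, col1) VALUES (?, ?)",
                 [(i, "row%d" % i) for i in (1, 2, 3, 50, 51, 100)])

def random_entry(conn):
    """Seek to the first row at or after a random id — fast, but biased."""
    max_id = conn.execute("SELECT MAX(entryid) FROM entries").fetchone()[0]
    target = random.randint(1, max_id)
    return conn.execute(
        "SELECT entryid, col1 FROM entries WHERE entryid >= ? "
        "ORDER BY entryid LIMIT 1", (target,)
    ).fetchone()

picks = [random_entry(conn)[0] for _ in range(1000)]
# entryid 50 sits after a run of 46 missing ids, so 47 of the 100
# possible targets land on it — it dominates the sample.
print(picks.count(50), picks.count(3))
```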
Great Weblog, I did often visit it with good results but now, I totally stuck in the simplest thing.
EXPLAIN SELECT p.products_id
FROM products p
ORDER BY p.manufacturers_id ASC
LIMIT 3
Gives me:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p index NULL idx_manufacturers_id 5 NULL 3490
Could anyone give me a hint why it steps through 3490 rows, even when using idx_manufacturers_id?
Thanks for any idea (using MySQL 5), and sorry if this is too stupid a post 🙂
Chris
That was extremely helpful. I was having a hard time locating good and clear resources on limit and order optimization. Thanks!
Mike's solution is by far the fastest.
http://jan.kneschke.de/projects/mysql/order-by-rand
This brings trouble as well, because the random number is generated inside the query, resulting in two different random numbers. This can be overcome by making a pre-query to generate a random number in your application, but that still leaves you with the first problem.
my choice was to combine the ORDER BY RAND() and the other one.
SELECT * FROM
(SELECT * FROM comments
WHERE count > $number
AND topic = $topic
LIMIT 100)
AS r1 ORDER BY RAND() LIMIT 5
The resulting speed is acceptable. The first LIMIT can be used to adapt the level of randomness in the query.
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY ALL NULL NULL NULL NULL 5 Using temporary; Using filesort
2 DERIVED comments ALL NULL NULL NULL NULL 140809 Using where
As you can see, the query goes through 140,000 rows in 0.0792 seconds. Not as fast as the other one, but a lot more flexible and open to other WHERE conditions.
Once again a post has been cut into pieces.
But it has one problem: what if you intend to look for multiple rows with additional criteria?
Let's say you have 100,000 rows and want to output 5 randomized rows, and of the 100,000 rows only 200 qualify to be in the result. You cannot simply run the query 5 times, because you might end up with the same row 5 times: if there is a large gap between two qualifying results, you can generate a random number that falls in that gap, giving you the same row over and over again.
Secondly, you may hit the end of the set, generating no output at all. Some people suggested running the query backwards and forwards: i.id <= a.id and i.id > a.id.
This brings trouble as well, because the random number is generated inside the query, resulting in two different random numbers. This can be overcome by making a pre-query to generate a random number in your application, but that still leaves you with the first problem.
Addendum:
I forgot to create an index over the columns in question. Now the efficiency has increased a lot.
But I just found a problem: the results are not spread evenly. They always cluster around the beginning of the table.
Any way to alter this behaviour?
Great resource Peter, and also thanks to Jan (comment #48) for his “compromise” solution to the ORDER BY RAND() dilemma. I’ve now managed to get a query (testing with 500k rows, requiring < 10 random rows) down from 0.8s execution time to just 0.05s. 🙂