Comments on: Nondeterministic Functions in MySQL (i.e. rand) Can Surprise You https://www.percona.com/blog/nondeterministic-functions-in-mysql-i-e-rand-can-surprise-you/ Mon, 10 Dec 2018 19:49:52 +0000 hourly 1 https://wordpress.org/?v=6.5.2 By: Alexander Rubin https://www.percona.com/blog/nondeterministic-functions-in-mysql-i-e-rand-can-surprise-you/#comment-10970045 Mon, 10 Dec 2018 19:49:52 +0000 https://www.percona.com/blog/?p=53997#comment-10970045 Ernie, I think a next reply by Joe shows a good point, really non-deterministic function should be evaluated for each row. Altho it is not what we can expect

]]>
By: Alexander Rubin https://www.percona.com/blog/nondeterministic-functions-in-mysql-i-e-rand-can-surprise-you/#comment-10970044 Mon, 10 Dec 2018 19:39:00 +0000 https://www.percona.com/blog/?p=53997#comment-10970044 Yes, I agree here. What is interesting, it seems that MySQL and PostgreSQL have this (described) behavior, but SQLite and (it seems like) MS SQL server has different.

]]>
By: Øystein Grøvlen https://www.percona.com/blog/nondeterministic-functions-in-mysql-i-e-rand-can-surprise-you/#comment-10970041 Mon, 10 Dec 2018 03:32:22 +0000 https://www.percona.com/blog/?p=53997#comment-10970041 AFAICT, the described behavior is according to the SQL standard. In theory, a WHERE clause should be evaluated for all rows. A database system may optimize this and not inspect every row if it has some other mechanism to identify which rows will satisfy the WHERE clause (e.g., an index). Also, functions calls that are constant (i.e., deterministic and does not depend on input from the current row) may be evaluated once. In this case, the function is non-deterministic, so it needs to be evaluated on every row.

]]>
By: Peter Zaitsev https://www.percona.com/blog/nondeterministic-functions-in-mysql-i-e-rand-can-surprise-you/#comment-10970039 Sat, 08 Dec 2018 18:41:29 +0000 https://www.percona.com/blog/?p=53997#comment-10970039 You may also wish to check this HackerNews Discussion https://news.ycombinator.com/item?id=18617809

]]>
By: Joe https://www.percona.com/blog/nondeterministic-functions-in-mysql-i-e-rand-can-surprise-you/#comment-10970034 Fri, 07 Dec 2018 03:37:43 +0000 https://www.percona.com/blog/?p=53997#comment-10970034 I say it’s not a bug. Imagine doing a simulation. Say you had a column containing the object’s (represented by the row) chance of success. Each time you run the query, you want back a random list of rows which “succeeded’ during the simulation. You’d write something like:

SELECT * FROM agents WHERE success_prob > rand()

If rand() evaluated before the the query executed (instead of row by row), then all of the rows that returned when rand() = .2 would also be returned when rand() = .1 (plus some additional rows). There would be no variation among which rows got selected… just among the quantity of rows selected .

The author correctly points out that a variable should be used to store a single random value before query exeution if you want a single random value to evaluate against all rows. But the WHERE clause evaluates for each row, so it makes sense that rand() should evaluate for each row separately if it is included in the WHERE clause.

]]>
By: Ernie Souhrada https://www.percona.com/blog/nondeterministic-functions-in-mysql-i-e-rand-can-surprise-you/#comment-10970033 Thu, 06 Dec 2018 22:41:47 +0000 https://www.percona.com/blog/?p=53997#comment-10970033 I think this is absolutely a bug. Standard rules of algebra would say that we evaluate expressions from the inside out, so we should be able to do the RAND(), then the ROUND(), and be left with a statement equivalent to SELECT * FROM table WHERE id = – in which case we should get back 0 or 1 rows.

The RAND() function is nondeterministic, yes, but it is never going to return more than one value – as opposed to some nondeterministic function that might return an unknown number of rows – so why can the optimizer / parser not be smart enough to figure that out? It used to be the case (and maybe it still is) that if you tried to do an EXPLAIN on a SELECT that contained a subquery, MySQL would actually run the subquery before giving you the optimizer’s plan – this seems like a much less complicated variant of that scenario.

]]>