When working with databases, there are always some projects oriented to performing analytics and reporting tasks over the information stored in the database. Usually, these tasks leverage window functions to do calculations “across a set of table rows that are somehow related to the current row,” as is described in the documentation.
There are several built-in windows functions available in PostgreSQL. In the latest release, PostgreSQL 15, some performance improvements were added for the rank()
, row_number()
, and count()
functions. First, let’s review what these functions can do.
The window functions
As mentioned above, the window functions let us perform some calculations on a set of table rows related to the current one. The “set of table rows” is usually identified as a “partition” defined by a column or columns. As we read in the documentation, the named functions work for:
rank () → bigint
Returns the rank of the current row, with gaps; that is, the row_number of the first row in its peer group.
row_number () → bigint
Returns the current row number within its partition, counting from 1.
count ( * ) → bigint
Computes the number of input rows.
Consider the following table and data:
1 2 3 4 5 6 7 8 9 10 | Table "public.employee" Column | Type | Collation | Nullable | Default -----------------+--------------------------+-----------+----------+------------------------------------------ emp_id | integer | | not null | nextval('employee_emp_id_seq'::regclass) emp_name | character varying(20) | | not null | emp_country | character varying(35) | | not null | emp_salary | integer | | not null | date_of_joining | timestamp with time zone | | not null | now() Indexes: "employee_pkey" PRIMARY KEY, btree (emp_id) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | demo=# SELECT * FROM employee ; emp_id | emp_name | emp_country | emp_salary | date_of_joining --------+--------------+-------------+------------+------------------------ 1 | KELIO | Japon | 2000 | 2021-10-26 00:00:00+00 2 | JEAN-VINCENT | Canada | 6500 | 2021-01-22 00:00:00+00 3 | JUNO | Japon | 4000 | 2021-02-27 00:00:00+00 4 | GUY-EMMANUEL | Salvador | 2000 | 2020-07-27 00:00:00+00 5 | WALI | Japon | 7000 | 2021-01-31 00:00:00+00 6 | HENRI-PAUL | Canada | 4500 | 2021-08-19 00:00:00+00 7 | MUHAMED | France | 5000 | 2021-07-20 00:00:00+00 8 | MUHITTIN | Madagascar | 2500 | 2021-12-31 00:00:00+00 9 | DEVLIN | Madagascar | 7000 | 2022-04-03 00:00:00+00 10 | JOSUE | Salvador | 5500 | 2020-09-25 00:00:00+00 (10 rows) |
rank()
We can use the rank()
function to get the rank of employees (id) per country based on their salary. Look at the next example.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | demo=# SELECT emp_id, emp_salary, emp_country, rank() OVER (PARTITION BY emp_country ORDER BY emp_salary DESC) FROM employee; emp_id | emp_salary | emp_country | rank --------+------------+-------------+------ 2 | 6500 | Canada | 1 6 | 4500 | Canada | 2 7 | 5000 | France | 1 5 | 7000 | Japon | 1 3 | 4000 | Japon | 2 1 | 2000 | Japon | 3 9 | 7000 | Madagascar | 1 8 | 2500 | Madagascar | 2 10 | 5500 | Salvador | 1 4 | 2000 | Salvador | 2 (10 rows) |
row_number()
In the next example, the row_number()
function gets a sorted list of employees’ names per country and their relative numeric position.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | demo=# SELECT emp_id, emp_name, emp_country, row_number() OVER (PARTITION BY emp_country ORDER BY emp_name) FROM employee; emp_id | emp_name | emp_country | row_number --------+--------------+-------------+------------ 6 | HENRI-PAUL | Canada | 1 2 | JEAN-VINCENT | Canada | 2 7 | MUHAMED | France | 1 3 | JUNO | Japon | 1 1 | KELIO | Japon | 2 5 | WALI | Japon | 3 9 | DEVLIN | Madagascar | 1 8 | MUHITTIN | Madagascar | 2 4 | GUY-EMMANUEL | Salvador | 1 10 | JOSUE | Salvador | 2 (10 rows) |
count()
The count()
function is an “old known” tool used by almost everyone with access to a SQL engine. This function is part of the aggregate functions list but can act as windows functions when the OVER
clause follows the call. So we can use it to know how many employees share the same salary as a given employee name.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | demo=# SELECT emp_name, emp_salary, emp_country, count(*) OVER (PARTITION BY emp_salary) FROM employee; emp_name | emp_salary | emp_country | count --------------+------------+-------------+------- KELIO | 2000 | Japon | 2 GUY-EMMANUEL | 2000 | Salvador | 2 MUHITTIN | 2500 | Madagascar | 1 JUNO | 4000 | Japon | 1 HENRI-PAUL | 4500 | Canada | 1 MUHAMED | 5000 | France | 1 JOSUE | 5500 | Salvador | 1 JEAN-VINCENT | 6500 | Canada | 1 WALI | 7000 | Japon | 2 DEVLIN | 7000 | Madagascar | 2 (10 rows) |
Window functions in PostgreSQL 15
Now that we have refreshed what the window functions are, let’s consider what the PostgreSQL 15 release notes say:
Improve the performance of window functions that use row_number(), rank(), and count() (David Rowley)
Accordingly, if we are users of the window functions and move from a previous version to version 15, we should see an improvement in the performance of our workload. Let’s test it.
Laboratory case
To test the performance of the window functions, I created three instances of PostgreSQL, (a) version 13, (b) version 14, and (c) version 15.
I used the same public.employee
table used in the previous examples, and I loaded it with 10K rows. Then I executed the same queries we saw before for the window functions. I got the output from an EXPLAIN (ANALYZE)
command which executes the query, and we can see the timing for the specific window function.
The EXPLAIN (ANALYZE)
output was the same for each version of PostgreSQL.
rank()
PG15
1 2 3 4 5 6 7 8 9 10 | QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- WindowAgg (cost=1288.49..1488.49 rows=10000 width=25) (actual time=11.946..18.732 rows=10000 loops=1) -> Sort (cost=1288.49..1313.49 rows=10000 width=17) (actual time=11.928..12.803 rows=10000 loops=1) Sort Key: emp_country, emp_salary DESC Sort Method: quicksort Memory: 980kB -> Seq Scan on employee (cost=0.00..180.00 rows=10000 width=17) (actual time=0.008..2.402 rows=10000 loops=1) Planning Time: 0.143 ms Execution Time: 19.268 ms (7 rows) |
PG14
1 2 3 4 5 6 7 8 9 10 | QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- WindowAgg (cost=844.39..1044.39 rows=10000 width=25) (actual time=12.585..20.921 rows=10000 loops=1) -> Sort (cost=844.39..869.39 rows=10000 width=17) (actual time=12.560..13.545 rows=10000 loops=1) Sort Key: emp_country, emp_salary DESC Sort Method: quicksort Memory: 1020kB -> Seq Scan on employee (cost=0.00..180.00 rows=10000 width=17) (actual time=0.011..1.741 rows=10000 loops=1) Planning Time: 0.449 ms Execution Time: 21.407 ms (7 rows) |
PG13
1 2 3 4 5 6 7 8 9 10 | QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- WindowAgg (cost=844.39..1044.39 rows=10000 width=25) (actual time=18.949..28.619 rows=10000 loops=1) -> Sort (cost=844.39..869.39 rows=10000 width=17) (actual time=18.896..19.998 rows=10000 loops=1) Sort Key: emp_country, emp_salary DESC Sort Method: quicksort Memory: 1020kB -> Seq Scan on employee (cost=0.00..180.00 rows=10000 width=17) (actual time=0.011..1.228 rows=10000 loops=1) Planning Time: 0.460 ms Execution Time: 29.111 ms (7 rows) |
We can easily see from the WindowAgg
node that the total time was smaller in the PG15 than in the other two. The performance improvement is clear here.
To verify this is consistent, I got the Total Time for the WindowAgg
node from 500 executions and plotted the next graph.
We can see the timing from the PG15 version is better than the other versions. Also, I added a trending line. We see the PG13 performed the “worst,” and even when the PG14 showed a better trend, the PG15 was the best.
I did the same exercise for the row_number()
and count()
functions.
row_number()
count()
I also got the results from 500 executions for these last two functions. In both cases, the difference was smaller than the rank()
function, but the PG15 still showed better results, as seen in the trend line.
The performance improvement for the rank()
, row_number()
, and count()
window function introduced in the new PostgreSQL 15 will let all those analytic and reporting projects process their data faster than in previous versions. As always, every environment has its characteristics and challenges, but as we saw in these quick tests, the new version delivers the improvement just out of the box.