Window Functions in MySQL 8.0I have recently written an article for this blog presenting Window Functions for MongoDB 5.0. I used some public Italian COVID-19 data for a few real examples. Please have a look at it if you like.

Then I thought I should provide the same even for a relational database like MySQL.

MySQL introduced Window Functions in version 8.0. This feature has been in demand by many developers for some time. In this article, I’m going to show the benefits and how you can use Window Functions on MySQL 8.0.

Load some public data

I have a Percona Server for MySQL 8.0 running and I got some public Italian data about COVID-19 infections, hospitalizations, and other info. The data are available on a per-day and per-region basis from the following link:

https://github.com/pcm-dpc/COVID-19/tree/master/dati-regioni.

I loaded just a few months’ data spanning 2021 and 2022. Original data is labeled in Italian, so I created a similar and reduced table just for the needs of this article.

Here is a sample of the data:

 

What are Window Functions

Window Functions permit the run of a window across sorted rows producing calculations over each step of the window. As a consequence, a result of the Window Functions is provided for each row returned by the query.

The main advantage of Window Functions is that you can run aggregations on the fly in more efficient and elegant ways, without the need of creating temporary tables or views to use for further queries.

Typical use cases are calculating rolling averages, correlation scores, or cumulative totals.

Two new clauses are used to define and use Windows Functions: OVER and WINDOW. The second one is not mandatory. We’ll see some examples to show how you can use them.

The first example: hospitalizations in Central Italy area

Let’s focus our attention on the specific area of “Central Italy” which includes the following five regions: Toscana, Umbria, Marche, Lazio, and Abruzzo.

We’d like to find out the number of hospitalizations per day in each region and also calculate the total of hospitalizations for the entire area for a specific period, the first week of December 2021.

 

In this example, the SUM() function works as a window function that operates on a set of rows defined by the contents of the OVER clause. Many other aggregation functions you regularly use with GROUP BY can be used as window functions this way. The OVER clause does the magic by defining the window’s set of rows. Here, the PARTITION BY day tells MySQL to consider as a window all the rows having the same value for the day column, and then calculate the SUM() over those rows only. The main difference between using the aggregation function with GROUP BY or as a window function is that in the first case a single row is returned for each group, in the second case all the rows are returned and the aggregated value is returned for each row.

The following picture shows with different colors how the windows are considered for this query:

 

Multiple window functions in one query

Here is another example to show you can create multiple windows inside a query. Each one must have its own OVER clause to define the partition.

In the following query, we would like to return the number of deceased on average for the entire country in January 2022. Also, we would like to return the maximum number of deceased for each region in the period.

As you can notice, we defined two window functions. The special case of OVER(), without any argument, means the entire result set is considered to apply the aggregation function. The second OVER clause defines partitions based on the region field.

Calculate daily new cases, the non-aggregate functions

Since we have in the table only the number of total cases, we would like to calculate the number of new cases on a per-day basis instead. This way we can understand if the status of the pandemic is getting worse or improving.

We need to define a Window Function to span the previous day’s row and calculate the difference between today’s total cases and yesterday’s total cases.

You can use the following query where a new special “non-aggregate” function is used.

The LAG function is a “non-aggregate” function introduced for the Window Functions usage only. The LAG function returns the value of the argument from the row lagging the current row within the partition. If nothing is specified as an argument, the previous row is considered. So, it’s our case, because we need yesterday’s total case to calculate the difference.

The LAG function can be also used as follows:

LAG(total_cases, 3)

In such a case it returns the value of total_cases from the row that lags (precedes) the current row by three rows within its partition.

Non-aggregate functions

We have introduced the LAG() function as non-aggregate. It is not the only one available. These functions perform for each row from a query, a calculation using rows related to that row.

The following table comes from the official documentation. There are all the available non-aggregate functions you can use with Window Functions.

 

Here is another example with some non-aggregate functions usage:

 

For more details about non-aggregate functions, look at the official documentation: https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html

 

Named windows

Take a look at the last query executed. We have used multiple non-aggregate functions and we have defined multiple times the same window. We can simplify the query by defining only once the window providing a specific name. Then we can use that name to address the window without repeating the definition.

Then the previous query can be rewritten as follows:

 

You create the named window using the WINDOW clause at the end of the query. You provide a name and the partition definition, then you can reference that window simply by name in the OVER clauses.

Multiple Window Functions can be created and named in the same query, like in the following example:

 

Frame specification

As already shown in the queries we have tested so far, we can provide a frame clause when defining the Window Function. A frame is a subset of the current partition and the frame clause specifies how to define the subset.

Frames are determined with respect to the current row, which enables a frame to move within a partition depending on the location of the current row within its partition.

 

The boundaries of the frame specification can include:

  • UNBOUNDED PRECEDING: frame starts at the first row of the partition.
  • N PRECEDING: a physical N of rows before the first current row. N can be a literal number or an expression that evaluates to a number.
  • CURRENT ROW: the row of the current calculation
  • UNBOUNDED FOLLOWING: the frame ends at the final row in the partition.
  • N FOLLOWING: a physical N of rows after the current row.

 

Conclusion

Windows Functions is a new amazing feature introduced in MySQL 8.0. This feature permits us to execute fewer and more efficient queries to solve specific problems. Without Window Functions, you may be forced to create temporary tables or not efficient views to achieve the same result by running multiple queries.

For more details, you can take a look at the official documentation:
https://dev.mysql.com/doc/refman/8.0/en/window-functions.html

Test the Window Functions on Percona Server for MySQL 8.0.

Subscribe
Notify of
guest

1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Roman Janko

Excellent article! Just like the previous about CTE 🙂