LAG functionality on SQL Data

An intermediate topic in SQL with the cosmos

Karen Warmbein
Level Up Coding

--

Image credit: NASA

“Mars is there, waiting to be reached.” — BUZZ ALDRIN

Traveling and vacationing on Mars — it’s the stuff of science fiction. Mars, a neighboring plant to Earth, has deep valleys and the tall mountains. One mountain is Olympus Mons — with a staggering height of 72,000 ft (two and a half times Mount Everest’s height above sea level). It is a shield volcano, much like the the large volcanoes making up the Hawaiian Islands.

Although Mars’ environment is hostile to humans (one characteristic is the amount of radiation received on the surface could pose health risks), I’m going to create a scenario that allows us to imagine vacationing on this red planet. Suppose we spent 4 years on a spacecraft traveling to the planet. Among seeing many sites on the planet, we are visiting Olympus Mons this week. At the mountain base, a food stand is selling astronaut ice cream and other freeze dried foods.

Today’s SQL post uses fake data for the purpose of showcasing the LAG window function. Let’s begin.

Creating the data

Since this is a hypothetical scenario, I am going to create the monthly sales imaginary data for the food stand.

Here, I create the table named sales:

CREATE TABLE sales (
months INT,
net_sales FLOAT
);

And, then I insert data into the table:

INSERT INTO sales(months, net_sales)
VALUES (1, 8964),
(2, 4212),
(3, 5618),
(4, 7319),
(5, 9282);

I finally view my work with a SELECT statement:

So, there are 5 records of net_sales, each for a month that the food stand is open.

LAG window function

Suppose I wanted to look at the net sales from the previous month while at the same time viewing the current month. This is possible with the LAG window function. This allows us to access a row at a specific offset that comes before the current row. In terms of our example, if the current row is the 4th month and thenet_sales value is 7319, then with an offset of 1 I expect to see 5618 in the next column for that row. Similarly for the other records, we can expect the following.

Coding this up:

SELECT *
, LAG(net_sales, 1) OVER (ORDER BY months) "previous_month_sales"
FROM sales;

Two comments about this:

  • The LAG function arguments are the returned value (here, net_sales) and the offset (here, 1).
  • The OVER function indicates the window of data the LAG function is applied. Here, we are applying the function over the entire dataset in oder of months.

Looking at the result of the query:

That’s nice. But, let’s take this one step further. Suppose we want to compare two month as the percent change of the sales vales. This can be done as well:

WITH comparison AS (
SELECT *
, LAG(net_sales, 1) OVER (ORDER BY months) "previous_month_sales"
FROM sales
)
SELECT *
, CAST ((net_sales - previous_month_sales) / previous_month_sales
* 100 AS INT) "vs_previous_month"
FROM comparison;

This query is a little more complex and calculates a value from the table. Let’s break it down.

  • First I need to define a temporary table (called comparison) that exists just for one query. This table allows previous_month_sales to be defined as a column we can query against. (Otherwise, it’s just present in a table’s view.)
  • In the next query, I calculate the percent change, but CAST it as an INT so that I have a clean and concise percent in my final table view.

Result:

Reading and interpreting this table, the 2nd month of sales dropped ~53% in comparison with the previous month. The net sales in the 3rd month increased by 33%, and so on.

That’s it for this fake data, folks! Another topic for your SQL toolbox.

For your reference, I have a series of SQL tutorials that cover basic and intermediate queries. Take a look!

Like these SQL posts? Have a question? Let’s chat in the comments!

--

--