Finding seasonality in sales data

An intermediate topic in SQL with the cosmos

Karen Warmbein
Level Up Coding

--

Image credit: NASA

“Mars is there, waiting to be reached.” — BUZZ ALDRIN

Last time in LAG functionality on SQL Data, we imagined traveling to Mars and visiting the base of (one of many mountains), Olympus Mons where a food stand was selling astronaut ice cream. In the example from the post, I used imaginary data to showcase the LAG window function with the sales SQL data table.

This post is going to use the LAG as well as the LEAD window functions to solve an algorithmic problem I came across on LeetCode. I will introduce LEAD functionality, as well as solve the problem: display the records in the sales table which have three or more consecutive sales values equal to or greater than a given number. Let’s begin.

Updating the data

Recall the data we used last time had five records of net_sales; each net sale for a month that the food stand was open. The table, called sales, looked like this:

For the purpose of this blog, I am going to add more data — specifically, ten more records. This represents ten additional months where the net_sales of the food stand were recorded. To do that, I use INSERT:

INSERT INTO sales
VALUES (6, 9343),
(7, 9123),
(8, 8770),
(9, 6780),
(10, 4002),
(11, 4999),
(12, 5673),
(13, 6002),
(14, 7109),
(15, 7994);

A quick view of the table (with SELECT) shows fifteen records, each representing a month, with values for net_sales:

Perfect. Now we have more records to query.

LEAD window function

The LEAD window function is similar to the LAG function. Recall that with the LAG function we can access a row at a specific offset that comes before the current row. Let’s review an example:

Suppose I want to look at the net sales from the previous month while at the same time viewing the current month. If the current row is the 4th month and the net_sales value is 7319, then with an offset of 1, I expect to see 5618 in the next column for the 4th row.

Similarly, with the LEAD window function, we can access a row at a specific offset that comes after the current row. Similar to the previous example:

Suppose I want to look at the net sales from the following month while at the same time viewing the current month. If the current row is the 4th month and the net_sales value is 7319, then with an offset of 1, I expect to see 9282 in the next column for the 4th row.

Applying this to the first five rows of sales:

The Problem

Now we are equipped with some handy tricks to solve the problem where the owner of the food stand wants to know some information about the seasonality of her sales. Specifically, she wants to know if there are three or more consecutive months that have sale values equal to or greater than 7500.

There are a handful of methods to solve this problem, but here is a solution using LAG and LEAD window functions:

SELECT
s.months
, s.net_sales
FROM
(SELECT months
, net_sales
, LAG(net_sales, 2) OVER(ORDER BY months) before_month
, LAG(net_sales, 1) OVER(ORDER BY months) last_month
-- current month sales is net_sales
, LEAD(net_sales, 1) OVER(ORDER BY months) second_month
, LEAD(net_sales, 2) OVER(ORDER BY months) third_month
FROM sales) s
WHERE (s.net_sales>=7500 AND s.second_month>=7500 AND s.third_month>=7500)
OR (s.before_month>=7500 AND s.last_month>=7500 AND s.net_sales>=7500)
OR (s.last_month>=7500 AND s.net_sales>=7500 AND s.second_month>=7500)

Wow — that’s a complex query. Let’s break it down.

First, notice the subquery:

(SELECT months
, net_sales
, LAG(net_sales, 2) OVER(ORDER BY months) before_month
, LAG(net_sales, 1) OVER(ORDER BY months) last_month
-- current month sales is net_sales
, LEAD(net_sales, 1) OVER(ORDER BY months) second_month
, LEAD(net_sales, 2) OVER(ORDER BY months) third_month
FROM sales) s
  • In it, we are creating a new view that has new columns defined by LAG and LEAD.
  • The first five rows of this temporary table are:

Second, notice the WHERE clause:

WHERE (s.net_sales>=7500 AND s.second_month>=7500 AND s.third_month>=7500)
OR (s.before_month>=7500 AND s.last_month>=7500 AND s.net_sales>=7500)
OR (s.last_month>=7500 AND s.net_sales>=7500 AND s.second_month>=7500)
  • This recognizes that there are three situations that could have “three consecutive months.”
  • First situation: Where the current record for net_sales is the 1st month and two months of high sales follow.
  • Second situation: Where the current record for net_sales is the 3rd month and two months of high sales occur before it.
  • Third and final situation: Where the current record for net_sales is the 2nd month and a month of high sales occur before it and another month of high sales occurs after.

Putting it all together, we are querying the table created by the the subquery, s, and looking for three or more consecutive rows that have three values greater than 7500.

Result:

So, it seems that months 5 through (and including) 8 have peak sales.

That’s it for this fake data, folks! Another addition to your SQL toolbox.

For your reference, I have a series of SQL tutorials that cover basic and intermediate queries. Take a look!

Like these SQL posts? Have a question? Let’s chat in the comments!

--

--