Oracle: Order by/Group by/Having clauses and Window Functions

Andrew Suschevich
Level Up Coding
Published in
7 min readAug 31, 2020

--

At first, it is worth mentioning that Oracle is presented here as a collecting SQL language. Aggregate functions and manner of their application fit the entire SQL family and are applicable to all queries considering the syntax of each language.

I’ll try to briefly and quickly explain all the details in two parts. The article will most likely be useful for beginners.

Part 1
Order by, Group by, Having

ORDER BY

The ORDER BY clause allows sorting the output values, i.e., sorting the retrieved value by a specific column. Sorting can also be applied by a column alias that is defined with a clause.

The advantage of ORDER BY is that it can be applied to both numeric and string columns. String columns are usually sorted alphabetically.

By default, the ascending (ASC) sorting is applied. For the sorting to be descending, an additional DESC clause is used.

Syntax:

SELECT column1, column2,... (indicates the name
FROM table_name
ORDER BY column1, column2... ASC | DESC;

Let’s consider the examples:

In the first table, we get all the data and sort it in ascending order by the ID column.

In the second, we also get all the data sorted by the ID column in descending order using DESC.

The third table uses several fields for sorting. First, we sort by the department. If the first statement is equal for the fields with the same department, the second sorting condition is applied; in our case, it’s salary.

It’s pretty simple. We can specify more than one sorting condition, which allows us to sort the output lists more efficiently.

GROUP BY

In SQL, the GROUP BY clause collects data retrieved from specific groups in a database. Grouping divides all data into logical sets so that statistical calculations can be performed separately in each group.

This clause is used to combine the selection results by one or more columns. After grouping, there will be only one entry for each value used in the column.

The use of GROUP BY is closely related to the use of aggregate functions and the HAVING statement. An aggregate function in SQL is a function that returns a single value over a set of column values. For example: COUNT (), MIN (), MAX (), AVG (), SUM ().

Syntax:

SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);

GROUP BY appears after the conditional WHERE clause in the SELECT query. Optionally, you can use ORDER BY to sort the output values.

Based on the table from the previous example, we need to find the maximum salary for each department. The final data selection should include the name of the department and the maximum salary.

In the first example, we solve the task without grouping but using a subquery, i.e., nest one query into another one. In the second solution, we use grouping.

The second example is shorter and more readable, although it performs the same functions as the first one.

How does GROUP BY work here: it splits two departments into QA and dev groups first, and then looks for the maximum salary for each of them.

HAVING

HAVING is a filtering tool. It indicates the result of aggregate functions performance. The HAVING clause is used when we cannot use the WHERE keyword, i.e., with aggregate functions.

While the WHERE clause defines predicate for rows filtering, the HAVING clause is used after grouping to establish a logical predicate that filters groups by the values of aggregate functions. This statement is necessary for checking the values obtained through aggregate functions from groups of returned rows.

Syntax:

SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition

First, we define the departments with an average salary higher than 4000 and then the max salary using filtering.

The first example uses two subqueries: the 1st to find the maximum salary, and the 2nd to filter the average salary. The second example, again, is much simpler and more concise.

Query plan

Quite often, there are situations when a query runs for a long time, consuming significant memory resources and disks. To understand why queries are running long and inefficiently, we can check the query plan.

The query plan is the intended execution plan for the query, i.e., how the DMS will execute it. The DMS describes all the operations that will be performed within the subquery. Having analyzed everything, we will be able to understand the weak points in the query and use the query plan to optimize them.

Execution of any SQL statement in Oracle retrieves the so-called execution plan. This query execution plan is a description of how Oracle will fetch data according to the SQL statement being executed. A plan is a tree that contains a sequence of steps and the relations between them.

The tools allowing us to get the estimated execution plan of a query include Toad, SQL Navigator, PL / SQL Developer, and others. These tools provide a number of indicators of the query resource intensiveness, among which the main ones are: cost — the cost of execution and cardinality (or rows).

The higher the value of these indicators, the less efficient the query.

Below there is the query plan analysis. The first solution uses a subquery; the second uses grouping. Note that in the first case, 22 lines are processed, in the second one — 15.

Another query plan analysis that uses two subqueries:

This is an example of the ineffective use of SQL tools, and we do not recommend to use it in your queries.

All of the above features will make your life easier when writing queries and increase the quality and readability of your code.

Part 2
Window Functions

Window functions appeared in Microsoft SQL Server 2005. They perform calculations on a given range of rows within a SELECT clause. In short, a “window” is a set of rows where computation takes place. “Window” allows us to reduce the data and process it better. Such a feature allows us to split the entire dataset into windows.

Window functions have a huge advantage. There is no need to form a data set for calculations, which allows us to save all the data set rows with their unique ID. The result of the window functions operation is added to the resulting selection in one more field.

Syntax:

SELECT column_name(s) Aggregate function (column for calculation) OVER ([PARTITION BY column to the group] FROM table_name [ORDER BY column to sort] [ROWS or RANGE to delimit rows within a group])

OVER PARTITION BY is a property to define a window size. Here you can specify additional information; for example, a row number.

Let’s consider the following example: another department has been added to our table, now there are 15 rows there. We’ll try to derive employees, their salary, and the maximum salary of the organization.

In the first field, we take the name, in the second one — the salary. Next, we use the over () window function to get the maximum salary throughout the organization since the window size is not indicated. Over () with empty brackets applies to the entire selection. Therefore, the maximum salary is 10,000. The result of the window function is added to each line.

If we remove the mention of the window function from the fourth line of the query, i.e., only max (salary) remains, the query will not work. The maximum salary simply could not be calculated. Since the data would be processed line by line, and at the time of calling max (salary), there would be only one value of the current row, i.e., of the current employee. That is where you can see the advantage of the window function. At the time of the call, it works with the entire window and with all data available.

Let’s consider another example where we need to get the maximum salary of each department:

Here we set the frames for the window (set of rows on which window function operates), dividing it into departments. We have three departments: dev, QA, and sales.

The window finds the maximum salary for each department. As a result of the selection, we see that it found the maximum salary for dev first, then for QA, and then for sales. As mentioned above, the result of the window function is written in each row’s fetch result.

That’s how PARTITION BY works.

Conclusion

SQL is not as simple as it seems at first glance. Everything described above is the basic functionality of window functions. With their help, you can “simplify” your queries. But there is much more potential hidden in them: there are utility clauses (for example, ROWS or RANGE) that can be combined to add more functionality to queries.

I hope the post was useful for everyone interested in the topic.

Originally published at https://intexsoft.com on August 31, 2020.

--

--

Full-stack developer at intexsoft.com with over 10 years of experience in software development and project management.