Introduction to SQL: Aggregate Functions

Ken Hoffman
Level Up Coding
Published in
4 min readFeb 20, 2021

--

Introduction:

SQL (Structured Query Language) is a domain-specific language that is used in programming to manage data that is stored in a relational database management system. It was initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. The system was designed in order to manage and retrieve data that was stored in IBM’s original relational database management system, System R. SQL helped introduce the concept of accessing many records with one command and eliminated the need to specify how to reach a record. SQL can be used to create and modify databases in addition to querying databases.

Aggregate Functions:

In SQL, you will often want to perform certain functions to analyze your data further and draw conclusions. There are certain functions in SQL that allow you to analyze your data to determine the maximum, minimum, average and sum of rows or columns in your data frame.

For the following examples, we will assume that there is a data frame for the New York Yankees that is titled ‘Yankees’. It includes each individual player’s statistics for the 2019–2020 season. The columns are: Player_Name, Position, Hits, HR, RBI, SB and Average.

Maximum:

Let’s assume that we want to figure out which player on the New York Yankees hit the most home runs in the 2019–2020 season. To do this, we would use the MAX function. For this example, the code would look like:

Minimum:

Now, let’s assume that we want to figure out which player on the New York Yankees had the lowest batting average in the 2019–2020 season. To do this, we would use the MIN function. For this example, the code would look like:

Average:

Now, let’s assume that we want to find out the average number of hits that each player on the New York Yankees had in the 2019–2020 season. To do this, we would use the AVG function. For this example, the code would look like:

Sum:

Finally, let’s assume that we want to find out the total number of stolen bases that the players on the New York Yankees had in the 2019–2020 season. To do this, we would use the SUM function. For this example, the code would look like:

Aggregate Functions with WHERE clause:

Often times, it is necessary to perform aggregate functions with specifications. You might want to figure out the total number of Home Runs that were hit by each position rather than the total number of Home Runs that were hit by the whole team. By doing this, you can draw further observations and conclusions from the data.

As an example, let’s say that we want to determine the total number of home runs that were hit by 3rd basemen on the New York Yankees. This code would look like:

Arithmetic with SQL:

In addition to using aggregate functions with data frames, you can perform basic arithmetic with SQL too. To perform arithmetic, you do not need a data frame. The symbols that are used to perform arithmetic are pretty self-explanatory:

  • Add = +
  • Subtract = -
  • Multiply = *
  • Divide = /

To perform these functions, you would use SELECT followed by the equation in parenthesis.

Interestingly, when you divide one whole number by another, SQL will return a whole number regardless of whether the result should have a decimal place. To combat this issue, you will need to insert the numbers in your equation with decimal places (instead of 7/4, you will need to do 7.0/4.0).

References:

  • Chamberlin, Donald (2012). “Early History of SQL”. IEEE Annals of the History of Computing. 34 (4): 78–82. doi:10.1109/MAHC.2012.61. S2CID 1322572
  • “Welcome to the Course!: SQL.” Campus.datacamp.com, campus.datacamp.com/courses/introduction-to-sql/aggregate-functions.

--

--