How to Use Pivot Tables In Pandas

Acusio Bivona
Level Up Coding
Published in
4 min readOct 11, 2020

--

A common Excel function made easier in Pandas.

In this short blog post, I will teach you about the different ways you can structure and index your dataset(s) to make it simpler to process or comprehend. The three goals of this blog post are:

  • Show the difference between a wide dataframe and a long dataframe.
  • Compare simple, flattened index structures and multi-hierarchical index structures.
  • Show how to make them yourself utilizing aggregation functions and pivot tables.

Long vs. Wide

These two distinct arrangements allude to how you can structure your data in a dataframe. In the wide arrangement, every column equals a variable and each row equals a data point. The index is normally an integer, with 0 being the first row. This is the common arrangement you’re probably used to seeing.

In the long arrangement, each index is a point in time for every observation. This means that a given observation can have multiple values over many rows. This is a particularly valuable format when managing time-series or aggregated data since we can utilize multi-hierarchical indexing. The image below provides a visual understanding of long and wide dataframes:

https://www.datasciencemadesimple.com/wp-content/uploads/2017/11/Reshape-long-to-wide-in-pandas-python-with-pivot-function-0.png

Multi-Hierarchial Indexing

Some common tasks when working with data are making pivot tables and/or gathering data utilizing aggregation functions. In this regard, the long arrangement gives you a neat, sorted out method of organizing your data with multiple indexes, or indices. It permits you to neatly and effectively represent various mixtures of data. Consider the pivot table below:

Pivot table representing number of deaths by gender and state.

You can see by looking at the left two columns that this dataframe has a multi-hierarchical index with two levels, comprising of “State” and “Gender”. Because of this, you have a simple way of aggregating and sorting the data, which allows you to answer questions like, “In Alaska, were there more men or women that died?”

Aggregation Methods

Pandas dataframes give a simple way to aggregate information utilizing the .groupby() method. To utilize this method, you simply indicate your indices, which are the columns you need to group the data by, in order, and the method will return another dataframe containing the grouped data as you requested. For example, if you needed to aggregate the above dataset only by state, you would type:

df.groupby("State")

This would give you a long dataframe with a flat index structure. Or, maybe you need to group your data by both the state and the gender, like in the example above. You would type:

df.groupby(["State", "Gender"])

This would return a wide dataframe with a multi-hierarchical index, with “State” being the outermost index.

.groupby() statements are generally used with aggregation methods, which permit you to rapidly calculate summary statistics, like the mean, median, mode, etc. Expanding on the last example, if you needed to see the mean values for each grouping of “State” and “Gender” across multiple other variables, you can utilize method chaining by typing:

df.groupby(["State", "Gender"]).mean()

Pivot Tables

Pivot tables permit you to rapidly inspect your data by “pivoting” on various variables of interest. Pandas permits you to effortlessly make pivot tables with the built-in .pivot() method. To make a pivot table, you simply just call the df.pivot() method, determine the index column & the columns to place in your pivot table, and the value to place in every cell. For example, the following code:

pivot = df.pivot(index=’State’, columns=’Gender’, values=’Deaths_mean’)
pivot.head()

Would produce the following pivot table:

*Note* The context for how “Deaths_mean” was created is not shown. It was created as a different step in my project that will not be discussed in this post. I used it in the above example solely for demonstrating how a pivot table can be created using Pandas.

This concludes my blog post on how to use aggregation methods and pivot tables to index and structure your dataset(s). I hoped this helped! Thank you for reading!

LinkedIn

--

--

Follower of Christ - Fitness Coach - Specialist in Functional Strength and Sustainable Weight Loss