Data Visualization

Data Visualization with the plot method in Pandas

Area plot, scatter plot, hexagonal bin plot, pie plot, density plot, scatter matrix with the plot method.

Tirendaz AI
Published in Level Up Coding
7 min read · Mar 29, 2021

Photo by Ben White on Unsplash

Data visualization is one of the most enjoyable stages of data analysis. Pandas is one of the most widely used Python libraries for data preprocessing and data cleaning, while libraries such as Matplotlib and Seaborn are often used to visualize data. However, you can also visualize Series and DataFrame objects directly with Pandas.

In my last article, I showed how to use the plot method and talked about the bar, histogram and box plots with this method. In this post, I’ll cover the following topics:

  • Area plot
  • Scatter plot
  • Hexagonal bin plot
  • Pie plot
  • Density plot
  • Scatter matrix plot

Let’s dive in!

Area Plots

Area plots are drawn by filling in the space below a line plot. Note that for stacked area plots, the values in each column must be either all positive or all negative. To show area plots, let’s import the necessary libraries.

import matplotlib.pyplot as plt                       
import numpy as np
import pandas as pd

Let me set the fivethirtyeight style as the plot style.

plt.style.use("fivethirtyeight")
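If you are not sure which style names your Matplotlib installation accepts, you can check before calling plt.style.use. The snippet below is a minimal sketch: the style_name variable and the fallback to "default" are my own additions for illustration. Keep in mind that in recent Matplotlib releases the seaborn styles carry a seaborn-v0_8- prefix, so older names such as seaborn-white may no longer be recognised.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Names of the style sheets known to the installed Matplotlib version.
available = plt.style.available

# Use "fivethirtyeight" when it is present; otherwise fall back to the
# built-in "default" style (the fallback is an assumption for illustration).
style_name = "fivethirtyeight" if "fivethirtyeight" in available else "default"
plt.style.use(style_name)
```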

You can draw area plots with the plot.area method. To show this method, let me create a DataFrame.

df = pd.DataFrame(np.random.rand(10, 4), columns=list("ABCD"))
df.head()
The first five rows of the DataFrame

Note that you can find the notebook and dataset here. Let’s draw an area plot for only one variable in this dataset.

df['A'].plot.area()
Area chart for one variable

Let’s draw the area plots of all columns.

df.plot.area()
Area chart for multiple variables

Area plots are stacked by default. To draw an unstacked plot, you can pass the stacked=False parameter.

df.plot.area(stacked=False)
Unstacked area chart
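The sign requirement mentioned at the start of this section only matters when the plot is stacked. Here is a small sketch with a made-up DataFrame named mixed (not part of the notebook) whose column A contains both positive and negative values. The stacked call is expected to raise a ValueError, while the unstacked call draws normally.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical data: column "A" mixes positive and negative values.
mixed = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [0.5, 0.5, 0.5]})

try:
    mixed.plot.area()  # stacked=True is the default for area plots
    stacked_ok = True
except ValueError:
    # pandas refuses to stack a column with mixed signs
    stacked_ok = False

# Without stacking there is no sign restriction.
ax = mixed.plot.area(stacked=False)
```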

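Note that if there is missing data, each missing value is automatically filled with zero. If you prefer different behavior, you can use the dropna method to remove missing data or the fillna method to fill it with another value before plotting.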
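As a quick illustration of handling missing values yourself before plotting, here is a sketch with a tiny made-up DataFrame called gappy. Filling with the column mean is just one possible choice, not a recommendation from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data with one gap in each column.
gappy = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [2.0, 2.0, np.nan]})

# Option 1: keep only the rows that have no missing values.
dropped = gappy.dropna()

# Option 2: replace each gap with the mean of its column.
filled = gappy.fillna(gappy.mean())

ax = filled.plot.area()
```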

Let’s move on and use real datasets named iris and movies. You can download these datasets here. First, I’m going to load the famous iris dataset with the read_csv method.

iris=pd.read_csv("iris.data", header=None)

There are no column names in the dataset. Let’s name the columns of the dataset with the columns attribute.

iris.columns=["sepal_length","sepal_width", "petal_length", 
"petal_width", "species"]

Let’s see the types of columns of the dataset with the dtypes attribute.

iris.dtypes

The first four columns of the iris dataset are numeric and the last column is categorical. Now let’s draw the area plot of the numerical data with the plot.area method.

iris.plot.area()
Area chart for iris dataset
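As an alternative, the column names can be passed while loading, and the numeric columns can be selected explicitly before plotting. The sketch below reads three hand-typed rows in the iris.data layout from memory instead of from the file, so the names sample, column_names, iris_sample and numeric are only for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Three hand-typed rows in the same comma-separated layout as iris.data.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

column_names = ["sepal_length", "sepal_width", "petal_length",
                "petal_width", "species"]

# names= assigns the column names at load time.
iris_sample = pd.read_csv(sample, header=None, names=column_names)

# Keep only the numeric columns, which drops the species column.
numeric = iris_sample.select_dtypes(include="number")
ax = numeric.plot.area()
```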

Let’s plot the unstacked plot of the variables with the stacked=False parameter.

iris.plot.area(stacked=False)
Unstacked area chart for iris dataset

Scatter Plots

A scatter plot is used to see the relationship between two numerical variables. The plot.scatter method draws a scatter plot. Let’s draw a scatter plot of the variables A and B in the df dataset with this method.

df.plot.scatter(x='A', y='B')
Scatter plot

Let’s now use the IMDb dataset to show the scatter plots. First of all, let’s load this dataset with the read_csv method.

movies=pd.read_csv("imdbratings.txt")

Let’s see the first rows of this dataset with the head method.

movies.head()
The first rows of imdb ratings dataset

Let’s see the types of columns in the dataset.

movies.dtypes
Variable types

Notice that the variables star_rating and duration are numeric. Let’s draw a scatter plot of these two variables with the plot.scatter method.

movies.plot.scatter(x='star_rating', y='duration')

You can draw the scatter plots of two pairs of variables on the same axes by calling the plot.scatter method twice. Let’s plot sepal_length against sepal_width and petal_length against petal_width from the iris dataset on the same plot. To do this, store the axes returned by the first call in a variable named ax, then pass it to the second call with the ax parameter.

ax=iris.plot.scatter(x='sepal_length', y='sepal_width', 
color='Blue', label='sepal')
iris.plot.scatter(x='petal_length', y='petal_width', color='red',
label='petal', ax=ax)
Scatter plot for two pairs of variables
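You can also create the axes yourself with Matplotlib and pass them to both calls. The sketch below uses a made-up DataFrame named points with hypothetical column names. Since each call sets the axis labels to its own column names, the labels are reset to neutral ones at the end.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: two (x, y) pairs stored in one DataFrame.
rng = np.random.default_rng(0)
points = pd.DataFrame(rng.random((20, 4)), columns=["x1", "y1", "x2", "y2"])

# Create the figure and axes explicitly, then draw both pairs on them.
fig, ax = plt.subplots(figsize=(8, 5))
points.plot.scatter(x="x1", y="y1", color="blue", label="first pair", ax=ax)
points.plot.scatter(x="x2", y="y2", color="red", label="second pair", ax=ax)

# Replace the column-specific axis labels with labels that fit both pairs.
ax.set_xlabel("x value")
ax.set_ylabel("y value")
```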

If you want to color each point by the value of a third variable, you can pass that column name to the c parameter as follows:

iris.plot.scatter(x='sepal_length', y='sepal_width', 
c='petal_length', s=100)

You can adjust the size of each of the points on the plot with the s parameter.

iris.plot.scatter(x='sepal_length', y='sepal_width', 
s=iris['petal_length'] * 50)
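The c and s parameters can be combined in one call, so that color and size both encode a third variable. This is a sketch on made-up data: the flowers DataFrame, its column names and the choice of the viridis colormap are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data with three positive numeric columns.
rng = np.random.default_rng(0)
flowers = pd.DataFrame({
    "length": rng.uniform(4, 8, size=30),
    "width": rng.uniform(2, 4, size=30),
    "depth": rng.uniform(1, 5, size=30),
})

# Color and size both follow the "depth" column.
ax = flowers.plot.scatter(
    x="length",
    y="width",
    c="depth",                # column name used for the point colors
    colormap="viridis",       # colormap applied to the values of c
    s=flowers["depth"] * 50,  # one marker size per row
)

sizes = ax.collections[0].get_sizes()
```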

Hexagonal Bin Plots

If the number of observations in your data is high, you can use a hexagonal plot instead of a scatter plot with the plot.hexbin method. Let’s draw the hexagonal bin plot of the star_rating and duration variables in the movies dataset.

movies.plot.hexbin(x="star_rating", y="duration", gridsize=25)
Hexagonal bin chart

The gridsize parameter sets the number of hexagons in the x-direction. This value is 100 by default, and the plot above used 25. Let’s set the gridsize to 10.

movies.plot.hexbin(x="star_rating", y="duration", gridsize=10)
Hexagonal bin chart

Notice that with gridsize=10 there are fewer hexagons, so each one is bigger.
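To see the effect of gridsize in numbers, you can count the hexagons that Matplotlib draws. The sketch below uses a made-up DataFrame named cloud with random points. Reading the count from the first collection on the axes is my own way of inspecting the plot, not something taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data: 5000 random points.
rng = np.random.default_rng(42)
cloud = pd.DataFrame({"x": rng.normal(size=5000), "y": rng.normal(size=5000)})

coarse = cloud.plot.hexbin(x="x", y="y", gridsize=10)
fine = cloud.plot.hexbin(x="x", y="y", gridsize=40)

# Each drawn hexagon has one value (its point count) in the collection's array.
n_coarse = len(coarse.collections[0].get_array())
n_fine = len(fine.collections[0].get_array())
```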

Pie Plots

A pie plot is a circular statistical plot that can show only one series of data. You can use the plot.pie method to draw pie plots of Series and DataFrame objects. Let’s use the iris dataset to show this plot. First, I’m going to select the petal_width variable, group it by the species variable, and take the mean of each group.

iris_avg=iris["petal_width"].groupby(iris["species"]).mean()
iris_avg

Now let’s plot a pie plot with the plot.pie method.

iris_avg.plot.pie()

Now let’s draw pie plots of two numerical variables of the iris dataset, grouped by the species variable. First, let’s create a variable named iris_avg_2.

iris_avg_2=iris[["petal_width", 
"petal_length"]].groupby(iris["species"]).mean()

Now let’s draw a separate pie plot for each column of this dataset. To draw a pie plot from a DataFrame, you must either specify a column with the y parameter or pass subplots=True.

iris_avg_2.plot.pie(subplots=True)
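The two options can be compared side by side. The sketch below uses a small made-up DataFrame named averages instead of iris_avg_2, so the column names and values are for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical group averages: one row per group, one column per variable.
averages = pd.DataFrame(
    {"width": [1.0, 3.0, 4.0], "length": [2.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)

# Option 1: draw a single pie for the column named in y.
ax = averages.plot.pie(y="width", legend=False)

# Option 2: draw one pie per column; this returns one axes object per column.
axes = averages.plot.pie(subplots=True, figsize=(10, 5))
```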

You can also set other properties such as labels in pie plots. Let’s handle the iris_avg data for instance and draw a pie plot of this data with the default values.

iris_avg.plot.pie()

Now let’s set the properties.

iris_avg.plot.pie(labels=["setosa","versicolor", "virginica"], 
colors=list("brg"), fontsize=25, figsize=(10,10))

To show the percentage of each slice, you can use the autopct parameter. The format string '%.2f' prints each share with two decimal places.

iris_avg.plot.pie(labels=["setosa","versicolor", "virginica"],   
colors=list("brg"),
autopct='%.2f',
fontsize=25,
figsize=(10,10))
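The numbers printed by autopct are each value’s share of the total, in percent. You can check this by hand, as in the sketch below. The averages Series holds made-up values, and the extra %% in the format string, which appends a percent sign, is my own variation on the format used above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical values chosen so that the shares are easy to verify.
averages = pd.Series([1.0, 3.0, 4.0], index=["a", "b", "c"], name="avg")

# Share of each value in the total, in percent.
shares = averages / averages.sum() * 100

# "%.1f%%" prints one decimal place followed by a literal percent sign.
ax = averages.plot.pie(autopct="%.1f%%")

# All text drawn on the axes: the slice labels and the percentage labels.
labels = [t.get_text() for t in ax.texts]
```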

Density Plots

Density plots allow you to visualize the distribution of a numeric variable for one or several groups. You can draw a density plot with the plot.kde method, which relies on SciPy being installed. This method can be used for both Series and DataFrame. Let’s draw density plots of the numerical variables in the iris dataset.

iris.plot.kde()
Density chart
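The smoothness of a density curve depends on the bandwidth, which you can control with the bw_method parameter of plot.kde. The sketch below draws two curves for the same made-up Series named values. The two bandwidth values are arbitrary choices for illustration, and SciPy needs to be installed for it to run.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data: 300 draws from a normal distribution.
rng = np.random.default_rng(1)
values = pd.Series(rng.normal(size=300), name="value")

# A small bandwidth follows the data closely; a large one smooths more.
ax = values.plot.kde(bw_method=0.2, label="bw_method=0.2")
values.plot.kde(bw_method=1.0, label="bw_method=1.0", ax=ax)
ax.legend()
```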

Scatter Matrix

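A scatter matrix, also called a scatter plot matrix, is a grid of pairwise scatter plots: each numeric column is plotted against every other numeric column, and the diagonal shows the distribution of each column. Note that this is different from the scatter matrix statistic that is used in multivariate statistics to estimate the covariance matrix. You can draw a scatter matrix with the scatter_matrix function. Let’s first import this function from pandas.plotting.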

from pandas.plotting import scatter_matrix

Now let’s see the scatter matrix of the numeric columns in the movies dataset.

scatter_matrix(movies, alpha=0.5, diagonal='kde')
Scatter matrix
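The scatter_matrix function returns a grid of axes with one row and one column per numeric variable, and non-numeric columns are left out. Here is a sketch on a made-up DataFrame named frame. It uses diagonal='hist' instead of 'kde' so that SciPy is not needed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data: three numeric columns and one text column.
rng = np.random.default_rng(0)
frame = pd.DataFrame({
    "a": rng.normal(size=50),
    "b": rng.normal(size=50),
    "c": rng.normal(size=50),
    "label": ["x"] * 50,
})

# Only the numeric columns are plotted; histograms go on the diagonal.
axes = scatter_matrix(frame, alpha=0.5, diagonal="hist", figsize=(8, 8))
```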

Conclusion

You can use the plot method in Pandas for data visualization, which lets you draw plots directly from a Series or DataFrame. In this post, I covered the area plot, scatter plot, hexagonal bin plot, pie plot, and density plot with this method, as well as the scatter matrix with the scatter_matrix function. I hope you enjoyed it. Thank you for reading. You can find this notebook here.
