Data Visualization Tutorial

Hands-on Data Visualization with Pandas

Bar, histogram, and box plots with the plot method.

Tirendaz AI
Published in Level Up Coding
6 min read · Mar 14, 2021

Photo by Lidya Nada on Unsplash

As a data science enthusiast, I count data visualization among my favorite stages of data analysis. When I visualize data, I feel like an artist. To visualize data, most people reach for Matplotlib or Seaborn. Pandas, one of Python’s most important libraries for data preprocessing and data cleaning, also has its own plotting methods. These methods are built on Matplotlib and let you visualize Series and DataFrames more easily. In this post, I’ll cover the following topics:

  • How to use the plot method?
  • Bar plots with the plot method
  • Histograms with the plot method
  • Box plots with the plot method

Let’s dive in!

How to use the plot method?

The plot method lets you draw plots directly from a Series or a DataFrame. To demonstrate this method, let me import the necessary libraries.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Note that you can find the notebook and dataset here. Next, let’s use the %matplotlib inline magic command so that the plots are displayed inline, directly below the cells that create them.

%matplotlib inline

Let’s choose fivethirtyeight as the plot style.

plt.style.use("fivethirtyeight")
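
Style names differ between Matplotlib versions (for example, the seaborn styles were renamed with a seaborn-v0_8- prefix in Matplotlib 3.6), so it can be worth checking which styles are installed before picking one. Here is a minimal sketch; the variable names are my own.

```python
import matplotlib.pyplot as plt

# List the style names that this Matplotlib installation knows about.
available = plt.style.available
print("fivethirtyeight" in available, "ggplot" in available)

# A style can also be applied temporarily, without changing the global setting.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
```

Using plt.style.context keeps the change local to the with block, which helps when you want to compare styles side by side.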

You can use the plot method to draw a plot in Pandas. Let’s draw a simple line plot. First, let’s create a Series of random numbers and use the cumsum method to compute their running total.

data=pd.Series(np.random.randn(1000).cumsum())

Now let’s draw a plot with the plot method.

data.plot()
Line plot for a Series
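
The plot method returns a Matplotlib Axes object, so you can keep customizing the figure after the call. The sketch below assumes a seeded random generator (my addition, so the result is reproducible) and uses label names I made up.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Seeded generator so the random walk is the same on every run.
rng = np.random.default_rng(42)
walk = pd.Series(rng.standard_normal(1000).cumsum())

# Draw on an explicit Axes and keep the returned object for further tweaks.
fig, ax = plt.subplots()
ax = walk.plot(ax=ax, title="Random walk")
ax.set_xlabel("step")
ax.set_ylabel("cumulative sum")
```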

The plot method for DataFrames

You can also use the plot method for DataFrames. To show this, let’s create a DataFrame.

df1 = pd.DataFrame(np.random.randn(100, 4),columns=list('ABCD'))
df1 = df1.cumsum()

Let’s draw a line plot with the plot method.

df1.plot()
Line plot for a DataFrame
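
For a DataFrame, the plot method draws one line per column and uses the column names in the legend. Here is a small sketch with seeded data (an assumption of mine) that also shows the subplots=True option, which gives each column its own panel.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("ABCD")).cumsum()

# All four columns on one Axes: one line per column.
fig, ax = plt.subplots()
df.plot(ax=ax)
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)

# One panel per column instead of a shared Axes.
axes = df.plot(subplots=True, figsize=(6, 8))
```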

Bar plots with the plot method

You can also draw bar plots with the plot method. For example, let’s take the row at position 10 of df1 and draw its bar plot with the kind='bar' parameter.

df1.iloc[10].plot(kind='bar')
Bar plot for 4 variables

The plot attribute is also an accessor with one method per plot type. For example, let’s draw the same bar plot with plot.bar.

df1.iloc[10].plot.bar()
Bar plot with the plot.bar method
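
It is worth being clear about what iloc selects here: iloc[10] returns a single row as a Series, while iloc[:10] returns the first ten rows as a DataFrame. The sketch below, with seeded data of my own, plots both so you can compare the number of bars.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("ABCD")).cumsum()

row = df.iloc[10]         # Series: the single row at position 10
first_ten = df.iloc[:10]  # DataFrame: rows at positions 0 through 9

fig, (ax_row, ax_rows) = plt.subplots(1, 2, figsize=(10, 4))
row.plot(kind="bar", ax=ax_row)  # equivalent to row.plot.bar(ax=ax_row)
first_ten.plot.bar(ax=ax_rows)   # one group of four bars per row

print(len(ax_row.patches), len(ax_rows.patches))
```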

You can also plot multi-bar plots. Let’s create a dataset to demonstrate this.

df2=pd.DataFrame(np.random.rand(7,3), columns=list("ABC"))

Now let’s draw a multi-bar plot of these 3 columns with the bar method.

df2.plot.bar()
Multi-bar plot

You can also plot a stacked bar plot with the stacked=True parameter.

df2.plot.bar(stacked=True)
Stacked bar plot

The barh method draws the bars horizontally.

df2.plot.barh(stacked=True)
Horizontally stacked bar plot
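
In a stacked bar plot, each column’s segment starts where the previous one ends, so the top of the last segment equals the row total. Here is a quick sketch to check this, assuming seeded non-negative data; the variable names are mine.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.random((7, 3)), columns=list("ABC"))

fig, ax = plt.subplots()
df.plot.bar(stacked=True, ax=ax)

# Bars are added column by column, so the last 7 patches belong to column C.
tops = [p.get_y() + p.get_height() for p in ax.patches]
top_of_stack = tops[-len(df):]
print(np.allclose(top_of_stack, df.sum(axis=1)))
```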

Histograms with the plot.hist method

You can use a histogram to see the distribution of data with the hist method. To demonstrate this, let’s load the famous iris dataset.

iris=pd.read_csv("iris.data", header=None)
iris.columns=["sepal_length","sepal_width", "petal_length",
"petal_width", "species"]

You can access this data set here. Let’s take a look at the first 5 rows of this dataset.

iris.head()

There are 4 numerical variables in the iris dataset and a categorical variable indicating three types of iris flowers. Let’s draw histograms of these variables in the same plot with the hist method.

iris.plot.hist(alpha=0.7)
Histograms of numerical variables in the iris dataset

To see the stacked histograms, you can use the stacked=True parameter.

iris.plot.hist(alpha=1, stacked=True)
Stacked histograms of numerical variables in the iris dataset

You can also adjust the number of bins. To illustrate this, let’s create a bins variable.

bins=25

Now let’s draw the histogram using this bins variable.

iris.plot.hist(alpha=1, stacked=True, bins=bins)
Histograms with the number of bins adjusted
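
To see what the bins parameter does, you can count the bars that end up on the Axes. The sketch below uses synthetic data instead of the iris file (an assumption, so that it runs without the download), and the column names are made up.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "a": rng.normal(0, 1, 500),
    "b": rng.normal(3, 1, 500),
})

bins = 25
fig, ax = plt.subplots()
df.plot.hist(bins=bins, alpha=0.7, ax=ax)

# Each column gets one bar per bin, and the bar heights are counts.
n_bars = len(ax.patches)
total_count = round(sum(p.get_height() for p in ax.patches))
print(n_bars, total_count)
```

The columns share the same bin edges, which is what makes the overlaid histograms comparable.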

You can also draw a histogram horizontally with the orientation = "horizontal" parameter.

iris["sepal_width"].plot.hist(orientation="horizontal")
Horizontal histogram

You can combine the diff method with hist to draw a histogram of the differences between consecutive rows.

iris["sepal_length"].diff().hist()
Histogram with the difference between the values
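
If diff is new to you, a tiny Series makes its behavior easy to see. This is a minimal sketch with made-up numbers: each value is replaced by its difference from the previous row, and the first row has nothing to compare against.

```python
import pandas as pd

s = pd.Series([1, 3, 6, 10])
d = s.diff()

# The first entry is NaN because there is no previous row.
print(d.tolist())

# Dropping the NaN leaves the values that would be binned in a histogram.
print(d.dropna().tolist())
```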

You can draw separate histograms of each variable in the dataset.

iris.hist(color="blue", alpha=1, bins=20)
Separate histograms

You can draw a histogram of petal length for each species with the by parameter.

iris.hist("petal_length",by="species")
Histograms for the petal length of each species
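
The by parameter groups the rows first and then draws one panel per group, much like a groupby followed by a plot. Here is a sketch on synthetic data (the numbers are generated, not taken from the iris file) that draws the grouped histograms and prints the per-group means as a numeric companion.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
df = pd.DataFrame({
    # Three synthetic groups of 50 values each, with different centers.
    "petal_length": np.concatenate([
        rng.normal(1.5, 0.2, 50),
        rng.normal(4.3, 0.5, 50),
        rng.normal(5.5, 0.5, 50),
    ]),
    "species": np.repeat(["setosa", "versicolor", "virginica"], 50),
})

# One histogram panel per species; the result is an array of Axes.
axes = df.hist("petal_length", by="species")

# The same grouping, summarized as numbers.
group_means = df.groupby("species")["petal_length"].mean()
print(group_means)
```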

Box plots with the plot.box method

The box plot summarizes the distribution of a continuous variable. This plot lets you see the median, the quartiles, the range, a rough sense of skewness, and any outliers. You can use the plot.box method for both Series and DataFrames in Pandas.

iris.plot.box()
Box plots for numerical variables
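
The numbers behind a box plot are easy to compute by hand. The sketch below uses a small made-up Series and the 1.5 × IQR whisker rule, which is Matplotlib’s default; the variable names are my own.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# The box spans the first to third quartile, with a line at the median.
q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as outliers.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = s[(s < lower_limit) | (s > upper_limit)]

print(q1, median, q3, iqr)
print(outliers.tolist())
```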

You can also adjust the color of boxes, whiskers, medians, and caps in the plot. To illustrate this, let’s create a variable named colors.

colors={'boxes': 'Red', 'whiskers': 'blue','medians': 'Black', 'caps': 'Green'}

Now let’s pass this variable to the plot.box method.

iris.plot.box(color=colors)
Colorized box plots

You can also draw the box plot horizontally with the vert=False parameter.

iris.plot.box(vert=False)
Horizontal box plots

Box plots with the boxplot method

You can also use the boxplot method to draw box plots. It draws a box plot for each numeric column of the dataset.

iris.boxplot()
Box plots using the boxplot method

You can also draw box plots of grouped data. For example, let’s use the by parameter to draw a separate box for each species, for every numeric variable in the iris dataset. Before drawing this plot, let’s set the figure size and the style.

plt.rcParams["figure.figsize"]=(8,8)
plt.style.use("ggplot")

Next, I’m going to draw the box plots for each numeric variable in the iris dataset with the by parameter.

iris.boxplot(by='species')
Box plots for each numeric variable in the iris dataset
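
The line inside each grouped box is the group median, which you can also get from a groupby. Here is a minimal sketch on a few made-up rows (illustrative values, not read from the iris file).

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "petal_length": [1.4, 1.5, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# One box per species for the numeric column.
ax = df.boxplot(by="species")

# The same medians, computed directly.
medians = df.groupby("species")["petal_length"].median()
print(medians)
```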

Conclusion

You can use the plot method in Pandas for data visualization. This method allows you to draw plots more easily. That’s it. I hope you enjoyed it. Thank you for reading. You can find this notebook here.
