Data Visualization Tutorial
Hands-on Data Visualization with Pandas
Bar, histogram, and box plots with the plot method.
As a data science lover, one of my favorite stages of data analysis is data visualization. When I visualize data, I feel like an artist. To visualize data, most people use Matplotlib and Seaborn. Pandas, one of Python’s most important libraries for data preprocessing and data cleaning, also has built-in plotting methods. These methods are built on Matplotlib and let you visualize Series and DataFrames with very little code. In this post, I’ll cover the following topics:
- How to use the plot method?
- Bar plots with the plot method
- Histograms with the plot method
- Box plots with the plot method
Let’s dive in!
How to use the plot method?
The plot method lets you draw plots directly from a Series or a DataFrame. To show this method, let me first import the necessary libraries.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
Note that you can find the notebook and dataset here. Next, let’s use the %matplotlib inline magic command so that the plots are displayed inline, right below the code cells.
%matplotlib inline
Let’s choose the fivethirtyeight style as the graphic style.
plt.style.use("fivethirtyeight")
You can use the plot method to draw a plot in Pandas. Let’s draw a simple line plot. First, let’s create a Series of random numbers and use the cumsum method to turn them into a running total.
data=pd.Series(np.random.randn(1000).cumsum())
Now let’s draw a plot with the plot method.
data.plot()
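As a minimal sketch of what happens here: the plot method returns a Matplotlib Axes object, so you can keep customizing the figure afterwards. The seeded generator, the variable name walk, and the labels below are illustrative choices, not part of the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import numpy as np
import pandas as pd

# Seeded generator so the random walk is reproducible (illustrative choice)
rng = np.random.default_rng(42)
walk = pd.Series(rng.standard_normal(1000).cumsum())

# Series.plot returns a Matplotlib Axes, which can be customized further
ax = walk.plot(title="Random walk")
ax.set_xlabel("step")
ax.set_ylabel("cumulative sum")
```

Keeping a reference to the returned Axes is optional, but it is handy when you want to add labels or combine several plots in one figure.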
The plot method for DataFrames
You can also use the plot method for DataFrames. To show this, let’s create a DataFrame.
df1 = pd.DataFrame(np.random.randn(100, 4),columns=list('ABCD'))
df1 = df1.cumsum()
Let’s draw a line plot with the plot method.
df1.plot()
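To make the DataFrame behavior concrete, here is a small sketch, assuming the default Matplotlib backend: each column becomes its own line, and the column names are used for the legend. The names frame and legend_labels are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("ABCD")).cumsum()

# One line is drawn per column, sharing the same axes
ax = frame.plot()

# The legend entries come from the column names
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(len(ax.get_lines()), legend_labels)
```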
Bar plots with the plot method
You can also draw bar plots with the plot method. For example, let’s take the row at position 10 of df1 and draw its bar plot with the kind='bar' parameter. Since a single row is a Series, this gives one bar per column.
df1.iloc[10].plot(kind='bar')
You can also draw other types of graphs by using the accessor methods of plot. For example, let’s draw the same bar plot with plot.bar.
df1.iloc[10].plot.bar()
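It is easy to mix up selecting one row with selecting the first ten rows, so here is a short sketch of the difference. The names frame, single_row, and first_ten are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
frame = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("ABCD")).cumsum()

# A single integer selects one row and returns a Series indexed by column name
single_row = frame.iloc[10]

# A slice selects several rows and returns a DataFrame (positions 0 to 9)
first_ten = frame.iloc[:10]

print(type(single_row).__name__, single_row.shape)
print(type(first_ten).__name__, first_ten.shape)
```

Calling plot.bar on single_row draws four bars, one per column, while calling it on first_ten would draw ten groups of four bars.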
You can also plot multi-bar plots. Let’s create a dataset to demonstrate this.
df2=pd.DataFrame(np.random.rand(7,3), columns=list("ABC"))
Now let’s draw the multi-bar plot of these 3 columns with the plot.bar method.
df2.plot.bar()
You can also plot a stacked bar plot with the stacked=True parameter.
df2.plot.bar(stacked=True)
The barh method draws the bars horizontally.
df2.plot.barh(stacked=True)
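To see what stacking means numerically, the sketch below inspects the rectangles that the stacked bar plot draws. It assumes non-negative values, as np.random.rand produces, and the names scores, bars, and top_of_stack are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
scores = pd.DataFrame(rng.random((7, 3)), columns=list("ABC"))

ax = scores.plot.bar(stacked=True)

# One rectangle per cell, drawn column by column: 7 rows x 3 columns
bars = list(ax.patches)

# The last column sits on top, so its upper edge is the total of the row
top_of_stack = [bar.get_y() + bar.get_height() for bar in bars[-len(scores):]]
print(len(bars), np.allclose(top_of_stack, scores.sum(axis=1)))
```

In other words, the full height of each stacked bar is the sum of that row across the columns.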
Histograms with the plot.hist method
You can use a histogram to see the distribution of data with the plot.hist method. To demonstrate this, let’s load the famous iris dataset.
iris=pd.read_csv("iris.data", header=None)
iris.columns=["sepal_length","sepal_width", "petal_length",
"petal_width", "species"]
You can access this data set here. Let’s take a look at the first 5 rows of this dataset.
iris.head()
There are 4 numerical variables in the iris dataset and one categorical variable indicating the three species of iris flowers. Let’s draw histograms of the numerical variables in the same plot with the plot.hist method.
iris.plot.hist(alpha=0.7)
To see the stacked histograms, you can use the stacked=True parameter.
iris.plot.hist(alpha=1, stacked=True)
You can also adjust the number of bins. To illustrate this, let’s create a bins variable.
bins=25
Now let’s draw the histogram using this bins variable.
iris.plot.hist(alpha=1, stacked=True, bins=bins)
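To get a feel for what an integer bins value does, here is a sketch that uses np.histogram on synthetic data instead of the iris file. An integer asks for that many equal-width bins between the minimum and the maximum. The synthetic values are a stand-in, not the real measurements.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a numeric column (not the real iris measurements)
rng = np.random.default_rng(7)
values = pd.Series(rng.normal(loc=5.8, scale=0.8, size=150))

for bins in (10, 25):
    counts, edges = np.histogram(values, bins=bins)
    # bins intervals need bins + 1 edges; every value falls in exactly one bin
    print(bins, len(edges), counts.sum(), round(edges[1] - edges[0], 3))
```

More bins give narrower intervals and a more detailed, but also noisier, picture of the distribution.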
You can also draw a histogram horizontally with the orientation="horizontal" parameter.
iris["sepal_width"].plot.hist(orientation="horizontal")
You can use the diff method to draw a histogram of the differences between the values of consecutive rows.
iris["sepal_length"].diff().hist()
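If the diff method is new to you, this tiny sketch shows what it computes before anything is plotted. The numbers are made up for illustration.

```python
import pandas as pd

# Made-up values, chosen so the differences are easy to check by hand
lengths = pd.Series([10, 13, 11, 16])

# Each element minus the previous one; the first row has no predecessor
steps = lengths.diff()
print(steps.tolist())  # [nan, 3.0, -2.0, 5.0]
```

The first value is NaN because there is no earlier row to subtract, so the histogram is effectively built from one value fewer than the original column.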
You can draw separate histograms of each variable in the dataset.
iris.hist(color="blue", alpha=1, bins=20)
You can draw a separate histogram of the petal length for each species with the by parameter.
iris.hist("petal_length",by="species")
Box plots with the plot.box method
The box plot summarizes the distribution of a continuous variable. This plot lets you read off the median, the quartiles, the range, and the outlier values, and it gives a rough sense of the skewness of the distribution. You can use the plot.box method for both Series and DataFrames in Pandas.
iris.plot.box()
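The numbers behind a box plot can be computed by hand. The sketch below assumes the common convention, which is also Matplotlib’s default, that whiskers reach at most 1.5 times the interquartile range beyond the box. The values are made up for illustration.

```python
import pandas as pd

# Made-up measurements with one low and one high extreme value
widths = pd.Series([2.0, 2.8, 3.0, 3.0, 3.1, 3.2, 3.4, 4.4])

# The box spans the first to the third quartile; the line inside is the median
q1, median, q3 = widths.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points beyond these fences are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = widths[(widths < lower_fence) | (widths > upper_fence)]
print(outliers.tolist())
```

Here the two extreme values fall outside the fences, so a box plot of this Series would show them as separate points.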
You can also adjust the color of boxes, whiskers, medians, and caps in the plot. To illustrate this, let’s create a dictionary named colors.
colors={'boxes': 'Red', 'whiskers': 'blue','medians': 'Black', 'caps': 'Green'}
Now let’s pass this dictionary to the plot.box method.
iris.plot.box(color=colors)
You can also draw the box plot horizontally with the vert=False parameter.
iris.plot.box(vert=False)
Box plots with the boxplot method
You can also use the boxplot method of a DataFrame to draw box plots. This method draws a box plot for each numeric column of the dataset.
iris.boxplot()
You can draw box plots of the grouped data. For example, let’s draw the species separately for each numeric variable in the iris dataset with the by parameter. Before drawing this plot, let’s set the figure size and the style of the plot.
plt.rcParams["figure.figsize"]=(8,8)
plt.style.use("ggplot")
Next, I’m going to draw the box plots for each numeric variable in the iris dataset with the by parameter.
iris.boxplot(by='species')
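Grouping with by is conceptually similar to a groupby: each box summarizes one group. As a rough sketch on a tiny made-up table (not the real iris file), the medians that the grouped boxes would show can be computed like this:

```python
import pandas as pd

# Tiny made-up table with the same kind of layout as the iris data
flowers = pd.DataFrame({
    "species": ["a", "a", "a", "b", "b", "b"],
    "petal_length": [1.4, 1.5, 1.3, 4.7, 4.5, 4.9],
})

# One median per group, which is the center line of each grouped box
medians = flowers.groupby("species")["petal_length"].median()
print(medians.to_dict())
```

Checking a few group statistics this way is a quick sanity test for what the grouped box plot displays.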
Conclusion
You can use the plot method in Pandas for data visualization. This method lets you draw bar plots, histograms, and box plots directly from a Series or a DataFrame. That’s it. I hope you enjoy it. Thank you for reading. You can find this notebook here.