Building a Dashboard in Python Ploty

🐼 panData
Level Up Coding
Published in
11 min readApr 28, 2024

--

Photo by Alex McCarthy on Unsplash

In the age of data-driven decision-making, businesses are continually seeking ways to harness their data for a competitive edge. Sales data, with its wealth of quantitative insights, stands as a cornerstone for strategic analysis.

This article embarks on a journey to build an analytical dashboard that distills sales transactions into actionable intelligence using Python.

Each transaction in a business’s ledger tells a story — a story of customer preferences, market trends, and operational efficiency.

Our dataset, structured as a tableau of sales transactions, captures these narratives in the granularity of its rows, each one a distinct beacon of commercial exchange. From the product sold in a windswept mid-western town to the bulk office supplies ordered by a bustling start-up on the east coast, the transactions encode the pulse of commerce.

As we dissect this dataset, our guide will meticulously craft a narrative on extracting, transforming, and visualizing sales data. We will unfold the secrets hidden within numbers and categories, transforming raw data into a visual symphony of graphs and charts that speak volumes of the underlying business currents.

Simulating Sales Transactions for Analytical Insights

Our objective was to create a dataset that not only reflects typical sales transactions but also caters to the nuances of an American business context.

To accomplish this, I wrote a function, generate_american_sales_data, designed to produce a robust set of records that mimic actual sales data.

Defining the Variables

The function encapsulates the diversity of a sales database, detailing attributes such as:

- Order ID: A unique identifier for each transaction.
- Order Date: The date on which the transaction occurred, from 19 to 2023.
- Customer ID: A pseudo-anonymous tag representing the customer.
- Segment: The market segment to which the sale belongs, including Consumer, Corporate, and Home Office.
- Region: The geographical region of the sale, encompassing East, West, Central, and South.
- Country: The national context of the sale, set to the United States.
- Product ID, Category: Detailed classification of the sold product.
- Sales: The dollar value of the transaction.
- Quantity: The number of items sold in the transaction.
- Discount and Profit: Financial metrics that reveal the transaction’s profitability.
- Priority: The urgency level of fulfilling the order, from Critical to Low.

Each of these variables was carefully chosen to ensure that our simulated dataset would provide a comprehensive field for analysis, mirroring the complexities businesses face when interpreting sales data.

Generating the Data

Utilizing Python’s random library, we infused variability into the dataset, from the sales amount to the product categories, ensuring a realistic distribution of data.

This randomness reflects the unpredictability of the market, allowing for a more robust analytical exercise.

Outputting to CSV

After populating a pandas DataFrame with 100 simulated transactions, the data was exported to a CSV file. This format was chosen for its wide use in data analytics workflows, ensuring that our subsequent steps are representative of typical industry practices.

The CSV file, named American_Sales_Data.csv, became the cornerstone of our analysis, a tableau on which we would paint our insights using the Python libraries pandas and Seaborn.

With the dataset in place, we are poised to dive into the heart of data analysis, transforming these simulated sales records into meaningful patterns and narratives. Our next steps will involve data cleaning, exploratory analysis, and the creation of a visually compelling dashboard that would enable stakeholders to glean actionable insights at a glance.

Stay tuned as we transition from data crafting to data storytelling, where each number and figure translate into a strategic decision-making tool.

Conducting a Targeted Exploratory Analysis

Exploratory Data Analysis is a critical phase in the data science process, serving as the bridge between raw data and actionable insights.

Our EDA will not wander aimlessly through the dataset; instead, it will focus on uncovering insights that directly impact business outcomes.

Setting the Stage for Analysis

Before diving into the analysis, we prepare our dataset. This involves loading the data, examining its structure, and ensuring cleanliness by handling missing values or incorrect entries. This step ensures that our findings are based on accurate and complete information.

# Display basic information and statistics
print(sales_data.info())
print(sales_data.describe())

Focused Analysis Areas

Our exploratory analysis will concentrate on several key areas that typically influence business strategies:

1. Sales Performance Over Time

Understanding how sales fluctuate over time is crucial for forecasting and planning. We’ll visualize sales trends across different timeframes and investigate any seasonal patterns or anomalies.

import seaborn as sns
import matplotlib.pyplot as plt

# Convert 'Order Date' to datetime
sales_data['Order Date'] = pd.to_datetime(sales_data['Order Date'])

# Aggregate sales by month
monthly_sales = sales_data.set_index('Order Date').resample('M').sum()

# Plotting monthly sales trends
plt.figure(figsize=(12, 6))
sns.lineplot(data=monthly_sales, x=monthly_sales.index, y='Sales')

plt.title('Monthly Sales Trends')
plt.xlabel('Month')
plt.ylabel('Total Sales')
plt.show()

The chart presents a dynamic portrayal of monthly sales, characterized by notable ebbs and peaks throughout the observed period. The rhythmic fluctuations suggest a potential seasonal impact on sales performance, with pronounced spikes possibly reflecting increased demand during specific times of the year, such as holidays or promotional events.

Equally important are the discernible dips, which may pinpoint opportunities for strategic promotions or inventory adjustments. A professional analysis would delve into these variations to optimize sales cycles and address any underlying issues that the data may imply.

2. Regional Sales Distribution

By analyzing sales by region, we can identify high-performing and underperforming areas to optimize regional strategies.

# Sales by region
plt.figure(figsize=(10, 6))
sns.barplot(data=sales_data, x='Region', y='Sales', estimator=sum, ci=None)

plt.title('Sales by Region')
plt.xlabel('Region')
plt.ylabel('Total Sales')
plt.xticks(rotation=45)
plt.show()

Examining the bar chart, an obvious observation is the Central region’s standout performance, which surpasses the others. This might suggest that the Central region has a larger customer base, more effective sales strategies, or a broader range of products that resonate with its demographic.

A non-obvious insight to explore further is whether the Central region’s high performance is due to a few large transactions or a greater volume of sales.

Analyzing the average transaction size and the number of transactions per region could reveal whether the strategy should focus on increasing transaction volume or value in other regions. This insight could guide tailored regional strategies to leverage the Central region’s success model or identify untapped potential in other regions.

3. Product Category Analysis

We’ll break down sales by product categories to determine which products are driving revenue and which may require reevaluation.

# Sales by category and sub-category
category_sales = sales_data.groupby(['Category', 'Sub-Category']).agg({'Sales': 'sum'}).reset_index()
plt.figure(figsize=(12, 8))
sns.barplot(data=category_sales, x='Category', y='Sales', hue='Sub-Category')

plt.title('Sales by Product Category and Sub-Category')
plt.xlabel('Product Category')
plt.ylabel('Total Sales')
plt.show()

Phones and chairs show strong sales within their categories, signaling high demand or successful sales strategies.

The minimal sales in tables and storage could indicate a need for marketing focus or product development. Considering the high performance of phones, it might be beneficial to explore bundling strategies with lower-performing sub-categories to boost overall sales.

4. Profitability Insights

Finally, analyzing the profitability per sale, considering discounts and their impacts, will offer insights into pricing strategies.

# Calculate profit margin
sales_data['Profit Margin'] = sales_data['Profit'] / sales_data['Sales']
# Plot profit margins
plt.figure(figsize=(10, 6))
sns.boxplot(x='Region', y='Profit Margin', data=sales_data)

plt.title('Profit Margins by Region')
plt.xlabel('Region')
plt.ylabel('Profit Margin')
plt.show()

The profit margin across all regions is hovering around the break-even point, which is quite tight, indicating that there is little room for error before a sale becomes unprofitable.

The Central region shows a wider range of profitability, suggesting variability in the cost structure or discounting practices that could be optimized.

Consistency across the other regions points to a stable, albeit narrow, profit margin that could benefit from a focused strategy on either cost reduction or premium pricing where the market can bear it. This could also signal an opportunity to innovate in product or service delivery to increase margins.

A Seaborn Approach

A key element in data analysis is the ability to visualize data effectively. With our rich dataset in hand, we aim to answer five critical business questions using Python’s Seaborn library to create clear, informative visualizations.

Total Sales Value

Understanding the overall performance starts with knowing the total sales value. This metric provides a high-level view of the company’s performance.

# Calculate total sales
total_sales = sales_data['Sales'].sum()
print(f"Total Sales Value: ${total_sales:,.2f}")

Sales by Product Category

Next, we dissect the sales based on product categories. This breakdown can reveal which categories are the most popular and generate the most revenue.

# Plot sales by product category
plt.figure(figsize=(10, 6))
sns.countplot(data=sales_data, x='Category')
plt.title('Number of Sales by Product Category')
plt.show()

Sales by Country and Priority

To tailor logistical operations, we analyze sales by country, factoring in the delivery priority.

# Plot sales by country with delivery priority
plt.figure(figsize=(12, 8))
sns.catplot(data=sales_data, x='Country', hue='Priority', kind='count', height=5, aspect=2)
plt.title('Number of Sales by Country and Delivery Priority')
plt.show()

Average Discount by Product Subcategory

Delving into discounts, we assess if any particular subcategories are more frequently associated with higher discounts.

# Plot average discount by subcategory
plt.figure(figsize=(14, 7))
sns.barplot(data=sales_data, x='Sub-Category', y='Discount')

plt.title('Average Discount by Product Sub-Category')
plt.xticks(rotation=45)
plt.show()

This bar chart represents the average discount given for different types of products, categorized as sub-categories. Here’s how to interpret it:

  • The bars represent different product sub-categories, like Tables, Art, Accessories, and so on.
  • The height of each bar shows the average discount level for that sub-category. A taller bar means a higher average discount.
  • The lines extending from the top of each bar, called “error bars,” give us an idea of how much the discounts vary within each sub-category.
  • A longer line means more variation in the discounts given for that type of product.

Discounts are pretty similar across most product types, but some sub-categories, like Furnishings and Chairs, have a slightly higher average discount.

There’s considerable variation in the discounts for products like Computers and Furnishings, as indicated by the long error bars. This means that for these products, some sales might have a high discount, while others have a low discount. Products like Binders and Chairs seem to have less variation in the discounts they offer, as their error bars are shorter.

In a nutshell, this chart can help a company understand where they’re giving more discounts and where the discount rates are most inconsistent. With this info, a business can make more informed decisions about pricing strategies and promotions.

Introduction to Plotly

The goal of the dashboard is to offer a multifaceted view of the company’s operational success, highlighting areas for potential improvement and driving strategic decision-making.

Data Preparation

The initial step involves loading the sales data from a CSV file into a Pandas DataFrame.

This data is preprocessed to convert the ‘Order Date’ from a string format to a datetime format, which facilitates time-based aggregation and analysis.

import pandas as pd

data_path = 'American_Sales_Data.csv'
df = pd.read_csv(data_path)
df['Order Date'] = pd.to_datetime(df['Order Date'])

1. Total Sales by Category

This bar chart provides a straightforward comparison of total sales across different product categories, identifying which categories contribute most to revenue.

This visualization is essential for understanding product performance and guiding inventory decisions.

import plotly.express as px

category_sales_fig = px.bar(df.groupby('Category')['Sales'].sum().reset_index(),
x='Category', y='Sales', title='Total Sales by Category', color='Category')
category_sales_fig.show()

2. Profit by Region

By examining profits across different regions, this chart helps in pinpointing geographical areas where the company is most profitable and where it may be facing challenges

profit_region_fig = px.bar(df.groupby('Region')['Profit'].sum().reset_index(),
x='Region', y='Profit', title='Profit by Region', color='Region')
profit_region_fig.show()

3. Sales Performance Over Time

A line chart displays sales trends over time, allowing stakeholders to observe seasonal patterns, growth trajectories, and potential cyclical fluctuations in sales.

df_time = df.groupby(df['Order Date'].dt.to_period("M")).agg({'Sales': 'sum'}).reset_index()
df_time['Order Date'] = df_time['Order Date'].dt.to_timestamp()
sales_time_fig = px.line(df_time, x='Order Date', y='Sales', title='Sales Performance Over Time',
line_group='Order Date', color_discrete_sequence=px.colors.sequential.Viridis)
sales_time_fig.show()

4. Relationship Between Discount and Profit

This scatter plot with a trend line explores whether there is a correlation between the discounts offered and the profits earned, providing insights into pricing strategies.

discount_profit_fig = px.scatter(df, x='Discount', y='Profit',
title='Relationship Between Discount and Profit', trendline="ols",
color='Discount', color_continuous_scale=px.colors.diverging.Tealrose)
discount_profit_fig.show()

5. Top Selling Products

Identifies the products that have generated the most sales, highlighting successful products or potential stock issues.

top_products_fig = px.bar(df.groupby('Product ID')['Sales'].sum().nlargest(10).reset_index(),
x='Product ID', y='Sales', title='Top Selling Products', color='Sales',
color_continuous_scale=px.colors.sequential.Agsunset)
top_products_fig.show()

6. Sales by Customer Segment

This bar chart breaks down the total sales by different customer segments, such as Corporate, Home Office, and Consumer. Understanding which segments are most lucrative can help tailor marketing strategies and sales efforts to the most profitable areas.

sales_by_segment_fig = px.bar(df.groupby('Segment')['Sales'].sum().reset_index(),
x='Segment', y='Sales', title='Sales by Customer Segment', color='Segment')
sales_by_segment_fig.show()

7. Sales by Order Priority

This box plot provides insights into how the priority of orders affects sales volume. It can reveal whether high-priority orders contribute to higher sales and might help refine logistic and customer service strategies.

priority_sales_fig = px.box(df, x='Priority', y='Sales', title='Sales by Order Priority', color='Priority')
priority_sales_fig.show()

8. Profit Efficiency per Quantity Sold

This scatter plot examines the efficiency of sales in terms of profit generated per unit sold. It helps identify whether selling more units correlates with higher profits, or if there are diminishing returns on selling larger quantities.

df['Profit per Quantity'] = df['Profit'] / df['Quantity']
quantity_efficiency_fig = px.scatter(df, x='Quantity', y='Profit per Quantity',
title='Profit Efficiency per Quantity Sold', trendline='ols',
color='Profit per Quantity', color_continuous_scale=px.colors.sequential.Plasma)
quantity_efficiency_fig.show()

9. Discount Distribution by Category

This box plot explores how discounts are distributed across different product categories. It can highlight whether certain categories are more frequently discounted and how deep those discounts are, offering insights into sales and pricing strategies.

discount_distribution_fig = px.box(df, x='Category', y='Discount', title='Discount Distribution by Category', color='Category')
discount_distribution_fig.show()

10. Profit by Sub-Category

This bar chart details the profit made in each product sub-category, helping to identify which sub-categories are most and least profitable. Such insights are crucial for strategic decisions regarding product lines.

subcategory_profit_fig = px.bar(df.groupby('Sub-Category')['Profit'].sum().reset_index(),
x='Sub-Category', y='Profit', title='Profit by Sub-Category', color='Profit',
color_continuous_scale=px.colors.sequential.Inferno)
subcategory_profit_fig.show()

Each graph in the dashboard is designed to provide specific, actionable insights that can aid various departments within a company — from marketing and sales to finance and operations — in making informed decisions. By examining sales and profit data through various lenses (geographic, temporal, and categorical), stakeholders can identify trends, assess performance against strategic objectives, and adjust business strategies accordingly.

This comprehensive approach to data visualization ensures that the dashboard is not only a tool for visual representation but also a critical component of the business intelligence framework that drives data-driven decision-making and supports sustainable growth.

--

--

Derived from the Latin word "pan," which means "all" or "every," a personal repository of Data I have studied and somewhat self-taught. 🐼❤️