Skewed Data on a Scatterplot

Published in Level Up Coding, Apr 25, 2024

This article continues the discussion of skewed data from the previous article, Skewed Data on a Bar Chart. Skewness is most common in numeric data, so you are more likely to deal with it on a scatterplot than on a bar chart. A classic example is the relationship between GDP (Gross Domestic Product, which measures a territory’s income) and population. If you plot each territory’s GDP against its population, you may find that the observations are heavily skewed to one side, like this:

Territories’ GDP and Population in Absolute Terms

Unfortunately, most of the territories are clustered in the lower-left corner, so the scatterplot does not clearly show the relationship between GDP and population.

As the previous article suggested, you can take the logarithm of the data. If you are visualizing the data with Python and Plotly, you don’t need to take the logarithm of each data point manually; simply set the axis type to log in the layout settings, as below:

import plotly.graph_objects as go

# df is assumed to be a pandas DataFrame with the columns
# 'Population', 'Nominal_GDP', 'color' and 'Territory'
data = []
data.append(go.Scatter(x=df['Population'], # No need to take log
    y=df['Nominal_GDP'], # No need to take log
    marker_color=df['color'],
    text=df['Territory'],
    hoverinfo='text',
    mode='markers'))

layout = {'title':{'text':'Nations\' GDP vs Population', 'x':0.5},
   'xaxis': {'gridcolor': 'lightgray',
    'type':'log' # Add this parameter to use a log scale on the x-axis
   },
   'yaxis': {'gridcolor': 'lightgray',
    'type':'log' # Add this parameter to use a log scale on the y-axis
   },
   'plot_bgcolor': 'rgba(0,0,0,0)'}

fig = go.Figure(data=data, layout=layout)

Once you have passed these arguments to Plotly, it generates a scatterplot like the one below:

Territories’ GDP and Population after Taking Logarithm

Now the scatterplot is more readable because the observations are no longer clustered, and it shows the upward-sloping relationship between GDP and population more clearly.
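To see why a log axis spreads the observations out, here is a minimal sketch in plain Python that computes where a value lands along an axis, as a fraction from 0 (left edge) to 1 (right edge). The helper axis_position and the population figures are made up for illustration; they are not part of Plotly or of the dataset used above.

```python
import math

def axis_position(value, lo, hi, log_scale=False):
    """Return where value falls on an axis spanning lo..hi, as a fraction from 0 to 1."""
    if log_scale:
        # A log axis places points by the logarithm of their value
        value, lo, hi = math.log10(value), math.log10(lo), math.log10(hi)
    return (value - lo) / (hi - lo)

# Hypothetical populations spanning several orders of magnitude (not real data)
populations = [10_000, 1_000_000, 100_000_000, 1_000_000_000]
lo, hi = min(populations), max(populations)

linear = [round(axis_position(p, lo, hi), 3) for p in populations]
logged = [round(axis_position(p, lo, hi, log_scale=True), 3) for p in populations]

print(linear)  # positions on a linear axis
print(logged)  # positions on a log axis
```

With these made-up values, the first three points sit within the leftmost 10% of the linear axis, while on the log axis they are spread across it at 0, 0.4 and 0.8.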

The scripts for generating these scatterplots can be found on my GitHub:

DataViz_notes/skewed_scatterplot at main · jacquessham/DataViz_notes


My LinkedIn:

https://www.linkedin.com/in/jacquessham/


Written by Jacques Sham