Skewed Data on a Scatterplot

Jacques Sham
Published in Level Up Coding
2 min read · Apr 25, 2024


This article continues the discussion of skewed data from the previous article, Skewed Data on a Bar Chart. Skewness shows up most often in numeric data, so you are more likely to run into it on a scatterplot than on a bar chart. A classic example is the relationship between GDP (Gross Domestic Product, a measure of a territory’s economic output) and population. If you plot each territory’s GDP against its population, the observations end up heavily skewed to one side, like this:

Territories’ GDP and Population in Absolute Terms

Unfortunately, most of the territories are clustered in the lower-left corner, and the scatterplot does not clearly show the relationship between GDP and population.

As the previous article suggested, you can take the logarithm of the data. If you are visualizing the data with Python and Plotly, you do not need to compute the logarithm of each data point yourself. Instead, set the axis type to 'log' in the layout settings, as below:

import plotly.graph_objects as go

# df is assumed to be a pandas DataFrame with the columns
# 'Population', 'Nominal_GDP', 'color', and 'Territory'
data = []
data.append(go.Scatter(x=df['Population'],   # No need to take log
                       y=df['Nominal_GDP'],  # No need to take log
                       marker_color=df['color'],
                       text=df['Territory'],
                       hoverinfo='text',
                       mode='markers'))

layout = {'title': {'text': 'Nations\' GDP vs Population', 'x': 0.5},
          'xaxis': {'gridcolor': 'lightgray',
                    'type': 'log'  # Add this parameter to use a log scale on the x-axis
                    },
          'yaxis': {'gridcolor': 'lightgray',
                    'type': 'log'  # Add this parameter to use a log scale on the y-axis
                    },
          'plot_bgcolor': 'rgba(0,0,0,0)'}

fig = go.Figure(data=data, layout=layout)
fig.show()

With these arguments, Plotly generates the following scatterplot:

Territories’ GDP and Population after Taking Logarithm

Now the scatterplot is more readable because the observations are no longer clustered, and it shows the upward-sloping relationship between GDP and population more clearly.
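
To see why a log axis declusters the points, here is a minimal sketch in plain Python. The population figures are hypothetical round numbers chosen only for illustration, not real statistics, and axis_positions is a helper made up for this example, not part of Plotly. It computes where each value would fall along an axis, as a fraction from 0 to 1, under a linear scale and under a logarithmic scale.

```python
import math

# Hypothetical values spanning several orders of magnitude
# (illustrative only, not real population statistics).
populations = [10_000, 100_000, 1_000_000, 10_000_000, 100_000_000, 1_000_000_000]


def axis_positions(values, log_scale=False):
    """Return each value's relative position (0 to 1) along an axis."""
    scaled = [math.log10(v) for v in values] if log_scale else list(values)
    lo, hi = min(scaled), max(scaled)
    return [(s - lo) / (hi - lo) for s in scaled]


linear = axis_positions(populations)
logged = axis_positions(populations, log_scale=True)

print([round(p, 4) for p in linear])  # positions on a linear axis
print([round(p, 2) for p in logged])  # positions on a log axis
```

On the linear scale, every value except the largest lands in the first tenth of the axis, which is the clustering seen in the first scatterplot. On the log scale, the same values are spread evenly, because each step of one order of magnitude takes up the same distance.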

The scripts for generating these scatterplots can be found on my GitHub.

My LinkedIn:

https://www.linkedin.com/in/jacquessham/


A data analyst at a digital consulting firm, a whisky lover, an aviation enthusiast, and a gamer. Concerned with how to use data science to answer questions.