Hands On data

How I Learned SQL Using BigQuery

Digging into SQL with BigQuery.

Jody Roberts
Level Up Coding
Published in
5 min readOct 11, 2020

--

Photo by Edho Pratama on Unsplash

Working with SQL and understanding the data that is all around you is very important to be successful in today’s data-oriented business world.

In this article, I’m digging into how I got started using SQL with Google’s BigQuery tool. To better understand my background as well as some key tips, tricks and takeaways from my journey check out this article first:

Where the rubber meets the road

BigQuery is available via a web-based UI so you can access your data and run queries via your browser and all you need is an internet connection. BigQuery does support other methods of access as well — the bq command-line tool and API access with a variety of client libraries. For the purposes of this article, I’ll stick to the web-based UI.

Another huge benefit of just getting starting with SQL is that BigQuery offers free public datasets (with 1TB of querying included) which you can utilize to hone your skills.

For our purposes, we will look at the Google Analytics Sample dataset. It just so happens as an added benefit Google offers sample queries in conjunction with this dataset.

Google offers public & sample datasets covering many verticals. Images by Author.

Once you’ve got the dataset added to your BigQuery project you’ll see it listed under “Resources” within your BigQuery Project as shown below. In addition, if you select the now available table series — ga_sessions_ — you’ll see the Schema of the table listed. The schema represents the available data columns within the table and their structure.

BigQuery Google Analytics Sample Dataset & Schema. Images by Author.

An easy way to get started is to simply run the sample queries provided. These are presented with questions and each produces a result that has been verified.

One thing I mentioned earlier but worth mentioning again is that in many cases data can be manipulated to tell any story. It’s important not to have a pre-determined bias or you may produce a query and thus a result that tells you what you want to hear.

Google Analytics Sample Dataset Sample Queries. Images by Author.
Sample 1
What is the average number of transactions per purchaser?

Calculate the average number of transactions per purchaser for April 2017.
SELECT
(SUM (total_transactions_per_user) / COUNT(fullVisitorId) ) AS avg_total_transactions_per_user
FROM (
SELECT
fullVisitorId,
SUM (totals.transactions) AS total_transactions_per_user
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701'
AND '20170731'
AND totals.transactions IS NOT NULL
GROUP BY
fullVisitorId )
Sample 2
What is the total number of transactions generated per device browser?
Identify the top 5 browsers used for making transactions in July 2017, listed in descending order by total transactions.
SELECT
device.browser,
SUM ( totals.transactions ) AS total_transactions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701'
AND '20170731'
GROUP BY
device.browser
ORDER BY
total_transactions DESC

Breaking it down

These queries hit on a few very important syntax items worth noting:

  • StandardSQL — BigQuery supports the use of both Legacy and Standard SQL. My preference has been to utilized StandardSQL however when I started out with BigQuery I utilized Legacy SQL. There are advantages to both — You can find more information here.
  • WHERE is your filter. Use it.
  • _TABLE_SUFFIX — As noted earlier the sample dataset exists as a table series that includes data separated by date (each day of data is a separate instance). Utilizing the _TABLE_SUFFIX wild card in a WHERE clause allows you to query across multiple days of data. This also can be utilized to limit the amount of data queried!

Sample Queries

  • Both queries utilize the WHERE clause to limit the query to data from July 1st — July 31st, 2020
  • Both queries utilize theSUM operator. The questions are quantitative in nature so its only natural you’d expect a mathematical computation. One thing to note, when utilizing an aggregate function and selecting additional data columns, you must use a GROUP BY operator.
  • Sample 2 is asking for “top 5 browsers” but the query produces more results than 5. Since there is an ORDER BY operator, it's quite easy to pick out the top 5 results. ORDER BY sorts the data either smallest to largest[ascending ASC] or largest to smallest [descending DESC]

Where do we go from here?

This is just a quick jumpstart on digging into real data with SQL via Google’s BigQuery tool. If you don’t have a real dataset already, don’t forget, data is all around you. Track something in a spreadsheet and then upload it to BigQuery as your own table (you can upload a CSV file to create a table) then use SQL to slice and dice it. For further BigQuery details check out — BigQuery’s documentation.

Now that I’ve laid a foundation and established a platform to utilize for honing our SQL-foo, be sure to follow me and stay tuned for subsequent in-depth pieces on analysis using SQL, including a piece on using data visualization and the power of dashboards.

This article is by no means comprehensive in covering SQL or BigQuery and I’d love to hear your opinions or suggestions for future topics. Feel free to share in the comments.

About Me

I work solving problems, driving operational excellence, and launching products. Along with being a DaaS focused Product Manager, I consult for organizations focused on maximizing value through product-focused growth and data value. Reach out to continue the conversation or work together.

--

--

Problem solving is my passion. Tech & Data execution is my profession. When problem solving, I’ll share, learn or leverage an expert. jody@hornetsnest.io