Google Sheet API and Python

Hua Shi
Level Up Coding
Published in
4 min readNov 16, 2020

--

Microsoft released the first version of Excel for the Macintosh on September 30, 1985. After that Google released the Google Sheet which has the same functions as MS Excel. Most companies and their data analysts get used to using Excel or Google Sheet to manipulate and analyze the data. Almost every data analyst use Excel or Google Sheet every day. For me, Excel or Google Sheet is good for storing the data but not comfortable to clean, mine, and visualize the data.

If you are the same as me, there is a way to connect Google Sheet and Python. Then we can extract the data from Google Sheet and upload the updated data to the same Google Sheet but a new sheet.

There are several steps to connect Google Sheet and Python.

Necessary pre-work that we need to do

First of all, you need an account to log in Google developer console and create a new project. You can name it as anything you want. Then you need to click the “Library” on the left side and search “Google Drive API” then select it and enable it. After that, we need to go back to click the “Library” again and search “Google Sheet API”. After you enable it you need to click “Credential” on the left side to create credentials for a Web Server to access Application Data then name the service account and grant it a Project Role of Editor. Then we can download the JSON file to the desired location and rename it as “creds.json”.

Don’t worry. For the first time, you may feel very confused. Here is the video tutorial to all the steps in the pre-work part.

Python Code

  • Install/import libraries
#If you did not install gspread, you need to install it first
!pip install gspread oauth2client
# import necessary libraries
import json
import gspread
from oauth2client.service_account import ServiceAccountCredentials
from pprint import pprint
import pandas as pd
  • Function for obtaining client_email
def client_email(creds_file):
# read and open creds.json file
with open(creds_file) as f:
creds = json.load(f)
# get client_email
client_email=creds['client_email']
return client_email

There is one information from the “creds.json” that is very important — client_email. For example, we have a google sheet which contains all the data that we need to analyze, and if we want to share it, we normally use email to share the Google sheet file. Here we can just simply copy can paste your client_email, and share your Google Sheet with this email.

  • Create a function to extract data from a specific Google Sheet
# obtain the client email and the data from google sheet with dataframe type
def gs_to_df(creds_file,sheet_name):
# Read Data from a Spreadsheet
gc = gspread.service_account(filename=creds_file) # gspread – to interact with Google Spreadsheets
# extract data from google sheet by the name of the sheet
sheet = gc.open(sheet_name)
#For the first sheet, pass the index 0 and so on.
sheet_instance = sheet.get_worksheet(0)
# get all the records of the data
records_data = sheet_instance.get_all_records()
# convert the json to dataframe
df = pd.DataFrame(records_data)
return df, sheet
  • Create a function to upload an updated data to your Google Sheets on a new sheet
def upload_data_to_gs(title,sheet_page,updated_df):
row=updated_df.shape[0] # row size
col=updated_df.shape[1] # column size
# add a sheet with 20 rows and 2 columns
sheet.add_worksheet(rows=row,cols=col,title=title) # title of new data
# get the instance of the second sheet,for the first sheet, pass the index 0 and so on.
new_sheet = sheet.get_worksheet(sheet_page)
# Update values to the sheet
new_sheet.insert_rows(updated_df.values.tolist())
# add headers
header = list(updated_df.columns)
index = 1
new_sheet.insert_row(header, index)
  • Get client_email and Extract data from a Google Sheet
# Get client_email and share the Google Sheet with this email
client_email= client_email=client_email('creds.json')
# After shared the google sheet with the client email,let us extract the data
df, sheet=gs_to_df('creds.json',"Consumer_Behavior_Data")
# my file's name is 'Consumer_Behavior_Data'
df.head()
  • Let us create new data using ‘groupby’
new_data=df.groupby(['Clicked on Ad','Gender'])["Daily Internet Usage"].mean().reset_index()
new_data

You can also clean the data with python and upload the cleaned data or any data that you want to upload the same google sheet file.

  • Now let us upload this new_data to the fourth sheet of the Google Sheet file.
upload_data_to_gs('target_gender_DailyInternetUsage',4,new_data)

Conclusion

For me, it is easier to visualize the data using python because I can flexibly manipulate the data with Python. If you are not very familiar with some functions of Google Sheet, you can just manipulate the desired data using Python, then upload those datasets to the same google sheet file in different sheets. The reason why I want to do this because some companies require you to create a dashboard and sometimes it is very time-consuming to create some pivot tables or any tables using query/sumifs or other functions in Google Sheet. So we can just directly use the datasets that we uploaded to easily create a dashboard.

--

--

Data Engineer /Data Analyst /Machine Learning / Data Engineer/ MS in Economics