How a Simple Change Accelerates Your Google Cloud Spanner Queries by 3x

A surprisingly simple and highly effective optimization for your Spanner SQL queries.

Simon Hawe
Level Up Coding

--

Photo by Curioso Photography on Unsplash

Assume you are serving a machine-learning model via an API. When exposing such an API, you probably always want to optimize for low response latency. The faster you can respond to a request, the happier your customers are. That is a given.

Multiple things can contribute to the latency of your API that you have under control. These include but are not limited to too complex computations during model inference, the choice of your database technology, or how you interact with your chosen database. In the following, we are going to focus on the latter two as this can be solved independently of the machine learning task you are trying to solve.

Considering which database technology to pick, Google Cloud Spanner is a good choice due to its high performance, horizontal scalability, and global availability. Furthermore, Spanner supports a standard SQL interface, allowing you to use familiar SQL queries and tools to interact with the database. This simplifies development and integration with your existing applications.

Though you can use standard SQL, the performance of your query can be better or worse depending on various Spanner-specific factors. In the following, I will highlight a small modification to your current queries that can greatly enhance performance and reduce costs. Are you curious? Then read on!

The Problem

Let’s start with outlining the general problem that we want to solve. All my code examples are written in Python, but the same principles apply to any other programming language offering a Spanner interface.

Now, assume you have an application that requires retrieving information about a user on request time. That information is stored in a Spanner table called users . A user is uniquely identified via theiruser_id , which is also the table's primary key. On request time a user_id is passed to your application and you can retrieve the corresponding information like

Let’s quickly go through this tiny piece of code.

First, we set up a spanner client. Next, there is a function get_userdata that gets the user information from Spanner. The interesting part here is in line 8. There, we insert or interpolate the value user_id into the query string using an f-string and send the resulting string to Spanner.

Now, why is this interesting at all? That seems to be straightforward and logical to do. That’s what I also thought until my API got more requests, and suddenly Spanner didn’t respond anymore at all due to too high CPU usage.

Photo by Jeff Kingma on Unsplash

What a bummer!

Now, to solve that problem we have two options. First, scale up Spanner and pay more to Google 💰, which is not an option. Second, optimize the query to be less CPU-hungry 🍴. The latter sounds more desirable to me.

But how do you want to optimize such a simple query that already uses a primary key for fast look-up? It turns out that this is quite easy.

The Solution

Without further ado, let’s directly jump into the code that solves the problem

Hmm, this looks similar to before. The only thing I have changed is moving away from string interpolation to include the user_id into the query to use query parameters. You use a query parameter in a query via @your_parameter and populate it via a dictionary called params. You can have multiple query parameters in a query and also use the same one in multiple places.

Now, what huge effect can that small change have? Let’s have a look at the latency of my query and the CPU usage when I made the switch

Latency 50th Percentile
Total CPU Usage

What you can see here is a 3x decrease in latency and a similar decrease in CPU usage. That’s a lot! When I first saw this, I was like

Photo by Shubham Dhage on Unsplash

Now, why do we see such a performance gain? This gain is because Spanner can cache the query execution plan of queries that use query parameters. So if you run the same query many times but just parameterized differently, you see that huge benefit. For most HTTP-based applications, this is exactly the case.

To emphasize that once again, this benefit is not only on reduced latency but also on reduced Spanner costs and stability. And the best is that it comes at zero costs for you. That’s nice, isn’t it?

Last, using query parameters also makes your code more secure as it protects you from SQL injections and you don’t have to take care of quoting. That’s all handled by Spanner for you.

Photo by Joyce Adams on Unsplash

Conclusion

In this short article, I have shown you how to improve performance and reduce costs at no cost in Cloud Spanner using query parameters. Have a look at all your queries and apply that change. It is worth it!

Thank you for following this post. As always, feel free to contact or follow me for questions, comments, or suggestions either here on Medium or via LinkedIn. Cheers!

--

--

Tech and programming enthusiast working at Joyn mainly focusing on data sciene, machine learning, data engineering, and python coding.