Effects of Docker Image Size on AutoScaling w.r.t Single and Multi-Node Kube Cluster

mourya venkat · Published in Level Up Coding · Dec 12, 2020 · 6 min read


Ask a developer what two key things an application needs for better performance, and the answer will almost surely be:

  • Number of CPU Cores.
  • Application RAM.

With a monolithic architecture, we estimate the total CPU and RAM the application requires and pick one huge machine with a very large number of CPU cores and hundreds of GBs of RAM.

The issue with such huge machines is that they are pretty costly compared to multiple horizontally scaled lightweight machines, and they are hard to provision instantly, unlike lightweight machines which are available on the fly.

As this article is mainly about autoscaling and Docker image size, let’s not get into the monolithic vs microservices discussion.

Things that we will be discussing in this article are

  • What’s autoscaling and when is it required.
  • What’s the relation between docker image size and autoscaling.
  • How it impacts end-users w.r.t Single Node & Multi-Node Kubernetes Deployments

Let’s start our discussion

  • What is autoscaling

Let’s say my estimated load is 50000 RPM (requests per minute) and my SLA on response time is around 50 milliseconds. Based on this load and my SLA, I benchmark my server and database and come up with the number of containers required and the capacity of each container (CPU & RAM).

So let’s assume that at 50000 RPM, 3 containers are enough to serve requests within my SLA. But with a huge user base, we can’t guarantee that the request rate will never exceed 50000 RPM, nor can we simply declare that our service doesn’t support more.
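To make the baseline concrete, here is a minimal sketch of what it could look like as a Kubernetes Deployment. The name app1, the image, and the resource figures are hypothetical placeholders, not output from a real benchmark:

```yaml
# Hypothetical baseline sized from the 50000 RPM benchmark:
# 3 replicas, each with an explicit CPU/RAM request.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app1
spec:
  replicas: 3                      # benchmarked capacity for ~50000 RPM
  selector:
    matchLabels:
      app: app1
  template:
    metadata:
      labels:
        app: app1
    spec:
      containers:
        - name: app1
          image: myrepo/app1:1.0   # placeholder image
          resources:
            requests:
              cpu: "1"             # assumed per-container sizing
              memory: 2Gi
            limits:
              cpu: "1"
              memory: 2Gi
```

Pinning explicit CPU and memory requests is what lets the scheduler, and later the autoscaler, reason about per-container capacity.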

Now consider that, due to some flash sale, we start getting around 70000 RPM with the same 3 containers.

We are still running 3 containers, now serving 70000 RPM, with no idea whether the RPM will climb even further. At this point, all my CPU cores are busy serving user requests and there is a whole lot of queuing at each core. As the queuing increases, response time increases, which breaches my SLA timeout and even cancels a few requests, because my server can handle at most X parallel connections.

As developers, we estimate the least, mean, and highest RPM the system can get at any point in time. Let’s say the least RPM is 20000 (2 containers can handle the load), the mean is 50000 (3 containers), and the max is 100000 (6 containers). If we had a system that could shrink to 2 containers when the load is low and grow to 6 containers when the load is high (based on our [low, high] configuration), all without manual intervention, it would let us scale even at peak workloads. This is where auto-scaling kicks into the picture.

Now I put all my containers under an auto-scaling group. But for the auto-scaling group to kick in and dynamically update the number of containers, it needs a metric on which to scale up or scale down.

These are a set of conditions, defined while creating the auto-scaling group, that determine when to scale the containers up or down based on application health and resource usage (CPU, RAM).
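In Kubernetes, these conditions are typically expressed as a HorizontalPodAutoscaler. A minimal sketch, reusing the [2, 6] replica range from above and an 80% CPU trigger (the Deployment name app1 is a placeholder):

```yaml
# Keep app1 between 2 and 6 replicas, scaling on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app1-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app1
  minReplicas: 2                   # enough for the least RPM (20000)
  maxReplicas: 6                   # enough for the max RPM (100000)
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # scale up once average CPU crosses 80%
```

The one-line equivalent is `kubectl autoscale deployment app1 --min=2 --max=6 --cpu-percent=80`.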

Now that you have a basic understanding of horizontal scaling and auto-scaling, let’s dive into our second point, which describes what actually happens during auto-scaling.

  • How does Docker image size affect auto-scaling

Consider that our application serving 50K RPM starts getting 70K RPM and our existing servers are about to breach the SLA. The auto-scaling group acts as a listener on the containers’ runtime stats, and as soon as one of the conditions is met, it tries to spin up a new container.

What’s the procedure involved in spinning up a new container?

Container spawning time = time to download the Docker image + time to run the application boot-up scripts + time to health-check the newly spawned container.

One thing is clear from this equation. Say auto-scaling is triggered to add one container when CPU usage > 80%, and it takes 2 minutes to download the Docker image, 30 seconds for the boot-up scripts, and 30 seconds for the application health check.

Within this 3-minute gap, many things can happen.

  • The servers can reach 100% CPU utilization, crash, and interrupt many in-flight user requests.
  • All the servers experience peak load, drastically increasing response time and resulting in a bad user experience.

Now one might ask: why does the image download take so long when most of the layers should come from the existing cache? After all, there are already 3 containers running on the node, so if a 4th container has to be spawned there, it reuses the image layers already cached on that node and needs little or no download time.
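Kubernetes captures exactly this behaviour in the container’s imagePullPolicy. A small sketch (pod and image names are placeholders):

```yaml
# With IfNotPresent (the default for non-:latest tags), the kubelet
# reuses image layers already on the node and only pulls on a cache miss.
apiVersion: v1
kind: Pod
metadata:
  name: app1-pod
spec:
  containers:
    - name: app1
      image: myrepo/app1:1.0       # placeholder image
      imagePullPolicy: IfNotPresent
```

Strictly speaking, what the node holds is the pulled image layer cache rather than Docker’s build cache, but the effect on spawn time is the same.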

This is where multi-node Deployments come into the picture.

  • How it impacts end-users w.r.t Single Node & Multi-Node Kubernetes Deployments

In a real product, multiple such applications will be deployed in parallel along with ours. Let’s consider an example.

There are 3 applications with 2 containers each, each able to scale up to a max of 4. We deploy these 3 applications across multiple nodes.

Now App 1 faces severe CPU load, auto-scaling gets triggered, and another instance of App 1 needs to be spawned. The scheduler goes through the available nodes looking for one with enough free resources to run this new container.

In the best case, Node 1 or Node 2 has spare resources, and the new container gets deployed on one of them, where the image layers are already cached. Here the download and start-up will be much quicker.

But suppose Node 1 and Node 2 are out of resources to spawn another container of App 1 and only Node 3 is available. The Kubernetes scheduler places the new container on Node 3, but this time Node 3 doesn’t hold a cached copy of the App 1 image, so the image has to be downloaded from the public or private registry at runtime. That download time depends entirely on the image size and the network bandwidth, and it surely takes longer.
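A common mitigation, sketched below, is to pre-pull the large image onto every node with a DaemonSet, so a scale-up scheduled onto any node finds the layers already cached. The names and image are placeholders, and the init container assumes the image ships a shell:

```yaml
# Runs on every node; the init container pulls the big app image and
# exits, leaving its layers in the node's cache, while a tiny pause
# container keeps the pod alive.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: app1-prepuller
spec:
  selector:
    matchLabels:
      app: app1-prepuller
  template:
    metadata:
      labels:
        app: app1-prepuller
    spec:
      initContainers:
        - name: prepull
          image: myrepo/app1:1.0           # placeholder: the large image
          command: ["sh", "-c", "exit 0"]  # no-op; pulling is the point
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9 # minimal container that just sleeps
```

The complementary lever is the thesis of this article: keep the image itself small, so even a cold pull on a cache-less node stays fast.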

By now you should have basic clarity on how Docker image size plays an important role in auto-scaling and, ultimately, customer satisfaction.

If you have any questions or suggestions, please drop a comment in the comment section. I’ll reply as soon as I can.
