Golang: Optimizing Docker Image Size from 2.4 GB to 100 MB using Docker Multi-Stage Build Process.

mourya venkat
Published in Level Up Coding · 6 min read · Apr 24, 2021

Photo by Mark Olsen on Unsplash

Out of the hundreds of optimizations a developer thinks about, reducing Docker image size usually gets the least priority. We were no different.

As a result, our Docker image grew to around 2.4 GB, which ended up hurting our developers' productivity. Below are the issues we faced with the larger image size.

-> Delay in running automated Integration Tests

Whenever we raise a merge request to the release branch, an automated process builds a new image from the MR code, spins up a container from it, and fires thousands of automated requests against both the current release build (connected to the prod DB) and the stable production build. It then compares the two sets of responses to verify that the release doesn't break existing functionality; this is our automated integration testing.

Pain Points

  • The build process here is not cached, so every merge request for a release takes 20–25 minutes to build the image plus about 10 minutes for the integration tests.
  • If one of the integration tests fails, we abort the process. The developer then has to guess at the corner cases, fix them, and push again, with no guarantee that the tests will pass after the fix. Waiting through these cycles hurts his/her productivity.
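As an aside on the first pain point (a hedged sketch, not our actual CI configuration): uncached CI builds can often be sped up with docker build's --cache-from flag, pulling a previously pushed image so its layers can be reused; the registry path and tags here are hypothetical.

```shell
# Pull the last successful build so its layers can seed the cache
# (|| true keeps the pipeline going if no previous image exists yet).
docker pull registry.example.com/myapp:latest || true

# Reuse those layers when building the MR image.
docker build --cache-from registry.example.com/myapp:latest \
  -t registry.example.com/myapp:mr-build .
```

This only helps when the early Dockerfile layers (base image, dependency installation) are unchanged between builds, which is exactly the common case for an MR that touches application code.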

-> Delay in creating the build tag

Once the integration tests pass and the code is reviewed and merged to release, the OpenCI automation kicks in and creates a build tag (release-<version>). In this process, the application's dependencies are pulled and the final generated image is pushed to the artifact repository.

We then go ahead and create a JIRA ticket with the release tag and post-approval, the Edge Automation kicks in and creates the newer containers.

Pain Points

  • Pulling dependencies and pushing the generated bulky images over the network to the artifact repository consumes a huge amount of time, and the speed depends on network bandwidth.

-> Auto Scaling

For details on how Docker image size affects auto-scaling, refer to my other article below.

Now that we know the pain points of having such a huge Docker image, let's analyze what made it so large. For that, let's take a look at our initial Dockerfile.

FROM centos:7
RUN \
yum install -y epel-release bison python-setuptools bzip2 wget make gcc gcc-c++ zlib-devel git lsof && \
easy_install supervisor && \
mkdir -p /opt/logs /etc/supervisord.d && \
yum clean all && \
rm -f /etc/localtime && \
ln -s /usr/share/zoneinfo/Asia/Kolkata /etc/localtime

# Install Go
RUN \
cd /tmp && \
wget https://storage.googleapis.com/golang/go1.12.6.linux-amd64.tar.gz && \
tar -C /usr/local -xzf go1.12.6.linux-amd64.tar.gz && \
ln -s /usr/local/go/bin/go /bin/go && \
ln -s /usr/local/go/bin/gofmt /bin/gofmt

ENV PATH=$PATH:/usr/local/go/bin:/usr/local/goibibo/<projectName>/bin
ENV GO111MODULE=auto CGO_ENABLED=1
ENV GODEBUG="madvdontneed=1"

ARG GIT_TOKEN
RUN git config --global url."https://$GIT_TOKEN@github.com/goibibo/".insteadOf "https://github.com/goibibo/"

EXPOSE 80

# Set argument env - to receive input
ARG env

WORKDIR /usr/local/goibibo/<projectName>

# Add supervisord and functions in startup scripts
COPY ./init/supervisord /etc/rc.d/init.d/
COPY ./init/services/* /etc/supervisord.d/

COPY ./go.mod .
RUN go mod download

# Copy source directory
COPY ./ .
RUN make

RUN chmod 0644 /etc/deployments/<build>.sh && \
chmod 0755 /etc/rc.d/init.d/supervisord

Let us go ahead and analyze the steps that are responsible for increasing the image size.

  • First, we pull centos 7 as the base image.
  • We then install all the external software dependencies required.
  • We then download and install the required Golang version on the centos base image to build the code.
  • We then run go mod download to pull all the module dependencies, copy all internal code modules to the working directory, and finally run make to generate the binary.

Each of the steps needs some storage and all the pulled dependencies will be copied to the final generated image.

But for code to run in production, all we need is the binary generated from the code and a base operating system (with just enough functionality to execute that binary). In our Docker image, however, we have a ton of dependencies that are useless once the binary is generated, including Git, GCC, GCC-C++, the Golang toolchain, Go modules, code modules, etc., which together occupy close to 2 GB of space. On top of that, for our use case we don't need the full centos functionality to run the binary; a bare Alpine image (5 MB) suffices.
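To make the "all we need is the binary" point concrete, here is an illustrative sketch (not the exact flags from our make target; the output name app is hypothetical) of building a Go binary with debug symbols stripped, so the runtime image needs little beyond the binary itself:

```shell
# Hypothetical build command: -s -w strip the symbol table and DWARF
# debug info, shrinking the binary.
go build -ldflags="-s -w" -o app .

# Inspect the resulting binary size.
ls -lh app
```

Note that with CGO_ENABLED=1 (as in our image) the binary still links against a C library at runtime, which Alpine provides via musl; with CGO disabled the binary can be fully static.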

There are two ways in which we can eliminate copying this 2 GB of data to the final image.

  • Option 1 -> As a final step after building the binary, go into each directory containing unnecessary modules and trim it. This takes far too much manual effort to identify and run rm -R on every single directory where unwanted dependencies are installed, and it gets hectic whenever we need to add a few more dependencies to an existing image.
  • Option 2 -> The convenient way: Docker's multi-stage build process. Let's get into the details of how it works.

So, how do Docker multi-stage builds work?

Multi-stage builds let us do all sorts of juggling to download or install the dependencies required to build the binary, without copying those dependencies into the final image. Let's look at the transformed multi-stage Dockerfile and walk through it.

# Stage 1
FROM alpine:3.11.5 AS builder
RUN apk update \
&& apk add --virtual build-dependencies \
bash \
build-base \
coreutils \
ca-certificates \
gcc \
g++ \
git \
make \
lsof \
wget \
curl \
musl-dev \
tzdata \
pkgconfig \
pkgconf

# Install Go
RUN \
cd /tmp && \
wget https://storage.googleapis.com/golang/go1.12.6.linux-amd64.tar.gz && \
tar -C /usr/local -xzf go1.12.6.linux-amd64.tar.gz && \
ln -s /usr/local/go/bin/go /bin/go && \
ln -s /usr/local/go/bin/gofmt /bin/gofmt

ENV PATH=$PATH:/usr/local/go/bin:/usr/local/goibibo/<projectName>/bin
ENV GO111MODULE=auto CGO_ENABLED=1
ENV GODEBUG="madvdontneed=1"
ENV GOPRIVATE=<Private-Repo-Link>

ARG GIT_TOKEN
RUN git config --global url."https://$GIT_TOKEN@github.com/goibibo/".insteadOf "https://github.com/goibibo/"

WORKDIR /usr/local/goibibo/<projectName>

COPY ./pkg ./pkg
COPY ./go.mod .
RUN go mod download

# Copy source directory
COPY ./ .
RUN make

# Final Stage - Stage 2
FROM alpine:3.11.5 AS baseImage
RUN apk update \
&& apk add --virtual build-dependencies \
tzdata \
supervisor \
ca-certificates
RUN \
mkdir -p /opt/logs /etc/supervisord.d && \
rm -f /etc/localtime && \
ln -s /usr/share/zoneinfo/Asia/Kolkata /etc/localtime

ARG BINARY_PATH=/usr/local/goibibo/<projectName>/bin
ENV PATH=$PATH:/usr/local/go/bin:/usr/local/goibibo/<projectName>/bin
COPY --from=builder $BINARY_PATH $BINARY_PATH

WORKDIR /usr/local/goibibo/<projectName>

# Add supervisord and functions in startup scripts
COPY ./init/supervisord /etc/rc.d/init.d/
COPY ./init/services/* /etc/supervisord.d/
COPY ./deployments/ /etc/deployments/
RUN chmod 0644 /etc/deployments/<build>.sh && \
chmod 0755 /etc/rc.d/init.d/supervisord

Let us go through the revised multi-stage Docker build process. We divided the entire build into two stages.

Stage 1 (named builder) is where we pull in all the software and external module dependencies that are required to generate the build from the code.

In stage 2, all we do is

  • Use Alpine as the base image (5 MB), and install supervisor (to manage processes) and a few other required dependencies (20 MB).
  • Copy the binary generated in Stage 1 into Stage 2 using COPY --from=<stageName> <fromPath> <toPath>. In our case the binary is 65 MB. With multi-stage builds, this command can copy from any earlier stage into any later stage, in a top-down manner.
  • We aren't copying the code or the Go modules into the final stage. It is just the binary.
  • Copy the static files into the final image.
  • Have a RUN command that starts supervisor, which executes the binary and monitors the health of the process it starts.
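One handy side effect of naming stages (a hedged sketch; the image and tag names are hypothetical): docker build's --target flag can stop the build at a chosen stage, which makes it easy to debug the builder stage on its own.

```shell
# Build only the first stage (named "builder") for debugging.
docker build --target builder -t myapp:builder .

# Build the full multi-stage image (the final stage is the default target).
docker build -t myapp:release .
```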

This way, with very minimal effort, the multi-stage build process let us go from a bulky 2.4 GB Docker image to a slim ~100 MB one (the exact size depends on the binary and any additional runtime dependencies).
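If you want to verify the savings on your own images (a sketch; the image name myapp is hypothetical), the standard docker CLI reports overall and per-layer sizes:

```shell
# Compare overall image sizes across tags.
docker image ls myapp

# Inspect which layers contribute most to the size.
docker history myapp:release
```

docker history is particularly useful here: it shows each Dockerfile instruction's layer size, so steps like dependency installation stand out immediately.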

Let me know in the comments section if the article needs a more detailed explanation or any improvements.
