
How Caching Can Save Build Minutes in Bitbucket Pipelines


Photo by Aron Visuals on Unsplash

In this post I will show you how to properly set up caching in Bitbucket Pipelines. Caching speeds up the build time of your pipelines, so you can deliver faster and spend fewer build minutes. I will also share another approach to speeding up your pipelines.

What is Caching in General?

Caching (pronounced “cashing”) is the process of storing data in a cache.

A cache is a temporary storage area. For example, the files you automatically request by looking at a Web page are stored on your hard disk in a cache subdirectory under the directory for your browser. When you return to a page you’ve recently looked at, the browser can get those files from the cache rather than the original server, saving you time and saving the network the burden of additional traffic.

Source: whatis.techtarget.com

What Can Be Cached in Bitbucket Pipelines?

Bitbucket Pipelines is able to cache external build dependencies and directories, such as 3rd-party libraries, between builds providing faster builds, and reducing the number of consumed build minutes.

Source: support.atlassian.com

Depending on what you are doing in a pipeline, different things can be cached in Bitbucket Pipelines. The biggest win comes from storing third-party libraries downloaded by a package manager, so you do not have to download them over and over again. Caching Docker images/layers or the output of build steps can also save you a lot of time; the latter can additionally be shared between steps as artifacts.
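To illustrate sharing build output between steps as artifacts, here is a minimal sketch (the step names, the npm run build command and the dist folder are assumptions of mine, not part of the measurements later in this post):

image: node:10.15.3

pipelines:
  default:
    - step:
        name: Build
        script:
          - npm install
          # assumes package.json defines a build script that outputs to dist
          - npm run build
        artifacts:
          # files matching this glob are passed on to the following steps
          - dist/**
    - step:
        name: Deploy
        script:
          # the dist folder produced by the Build step is available here
          - ls dist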

You can use pre-defined caches, such as docker, pip, node and maven; these already cache a specific folder. You can also set up custom caches, where you define the path that needs to be cached yourself.
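The examples later in this post use the pre-defined caches. For a custom cache, a minimal sketch could look like the following (the bundler cache name, the vendor/bundle path and the Ruby image are just an assumed example):

image: ruby:2.7

definitions:
  caches:
    # custom cache: a name of your choosing plus the directory to store
    bundler: vendor/bundle

pipelines:
  default:
    - step:
        caches:
          # reference the custom cache by its name
          - bundler
        script:
          # install gems into the cached directory
          - bundle install --path vendor/bundle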

Good to Know

Below is a short, to-the-point summary of things you should know.

  • A cache will be stored on a successful build (when there’s no cache already)
  • Only caches less than 1 GB (compressed) will be stored
  • Caches will expire after one week
  • You should not cache sensitive data
  • You can clear caches manually

Caching Third Party Packages

I will share several Bitbucket Pipelines configuration files that show you how to cache downloaded third-party packages, using npm and pip as package managers, and I will provide pipeline run times with and without caching.

Node Caching

I started with a simple pipeline configuration that installs tensorflow. You will probably never need tensorflow within a pipeline; I just wanted an example with a somewhat bigger package. This can of course be any package, or even a full dependency file such as a package.json or requirements.txt.

image: node:10.15.3

pipelines:
  default:
    - step:
        script:
          - npm install tensorflow@0.7.0

Running this pipeline the first time took 40 seconds; the second and third runs took 15 and 14 seconds, which makes an average of 23 seconds. Let us see if enabling caching really improves the pipeline run time.

image: node:10.15.3

pipelines:
  default:
    - step:
        caches:
          - node
        script:
          - npm install tensorflow@0.7.0

The first run of the above pipeline took 20 seconds; on this first run there is no cache yet, so the pipeline is also responsible for creating and storing the cache, which of course takes a bit longer. We should see the difference in the second and third runs: the second run took 11 seconds and the third 12 seconds, an average of around 14 seconds. So caching node packages already saves at least a few seconds.

Pip Caching

We can do the same for Python and pip, and I want to demonstrate it with the tensorflow library again. Below is the pipeline configuration I used.

image: python:3.7.3

pipelines:
  default:
    - step:
        script:
          - pip install tensorflow==2.3.0

The first run, without caching, was finished in 47 seconds, the second in 38 seconds and the third in 1 minute and 28 seconds. Average run time without caching: around 58 seconds. I changed the pipeline configuration and added caching for pip.

image: python:3.7.3

pipelines:
  default:
    - step:
        caches:
          - pip
        script:
          - pip install tensorflow==2.3.0

Enabling the cache gave the following run times: 1 minute and 14 seconds (without an existing cache), 42 seconds and 43 seconds. Average run time: 53 seconds. Again an improvement compared to the pipeline without caching, looking at the averages.

Evaluation and Downsides of Caching

The results have “outliers” up and down, which makes me wonder if they are really good examples. The same configuration can sometimes differ by tens of seconds between runs, although the run times of the cache-enabled pipelines look more stable, which may be because they are less affected by the latency of downloading the third-party packages.

When a cache with the same name is already in place, it will not be updated, not even when something changed within the folder you cached. You have to delete the specific cache on the ‘Pipelines’ page in Bitbucket when you want a new cache to be created. Also, the 1 GB limit can be reached very quickly, in which case caching is of little or no use.

Other Approaches

Another approach, when you need external libraries within your steps, is to create your own Docker image for your pipelines. You can set the Docker image for a whole pipeline, or set it per step. This can be handy if you have a lint step that uses a Python image with pylint already installed, and a test step that uses an image with all the packages needed to test your Python application. This way you do not have to install these dependencies again and again. See the example below, which assumes I have built and pushed an image to Docker Hub containing Python 3.7.3, pylint 2.6.0 and pytest 6.0.2, with these names and versions reflected in the image name and tag.

image:
  name: sschrijver/python-pylint-pytest:3.7.3-2.6.0-6.0.2

pipelines:
  default:
    - step:
        name: Lint
        script:
          - pylint .
    - step:
        name: Test
        script:
          - pytest .

The Dockerfile of this image can look like the following:

FROM python:3.7.3

RUN pip install pylint==2.6.0 pytest==6.0.2

Or you can set the image per step, using the same principle as mentioned above.

pipelines:
  default:
    - step:
        name: Lint
        image: sschrijver/python-pylint:3.7.3-2.6.0
        script:
          - pylint .
    - step:
        name: Test
        image: sschrijver/python-pytest:3.7.3-6.0.2
        script:
          - pytest .
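The image itself can also be built and pushed to Docker Hub from a pipeline. Below is a rough sketch of how that could look (the custom pipeline name and the DOCKERHUB_USERNAME and DOCKERHUB_PASSWORD repository variables are assumptions of mine); it uses the pre-defined docker cache mentioned earlier to cache image layers:

pipelines:
  custom:
    build-tooling-image:
      - step:
          name: Build and push tooling image
          services:
            # makes the Docker daemon available inside the step
            - docker
          caches:
            # pre-defined docker cache for image layers
            - docker
          script:
            - docker build -t sschrijver/python-pylint-pytest:3.7.3-2.6.0-6.0.2 .
            # DOCKERHUB_USERNAME and DOCKERHUB_PASSWORD are assumed repository variables
            - echo $DOCKERHUB_PASSWORD | docker login --username $DOCKERHUB_USERNAME --password-stdin
            - docker push sschrijver/python-pylint-pytest:3.7.3-2.6.0-6.0.2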

Questions, Suggestions or Feedback

If you have any questions, suggestions or feedback regarding this article, please let me know!
