Level Up Coding

Coding tutorials and news. The developer homepage gitconnected.com && skilled.dev && levelup.dev

Follow publication

Photo by Mike Von on Unsplash

Implementing Base64 in Rust

Niklas Büchner
Level Up Coding
Published in
5 min readApr 5, 2020

Pretty much every developer uses base64 in one way or another, but how does the algorithm actually work? For me the easiest way to really understand an algorithm is to implement it. So let’s do that.

The base64 algorithm itself is defined in the RFC 4648. Its idea is super simple: any input can be seen as a stream of bits. Usually, these bits are split into bytes, so into groups of 8. Base64 does the same thing except it groups the bits into groups of 6. Every value is then translated into the characters a-z, A-Z, 0-9 and + / using a predefined table. Additionally, = is used as padding. But let’s not get ahead of ourselves.

Let’s start with defining the base64 function interface for encoding and writing a first test. I am going to only implement base64 for strings and not actually for byte streams.

So here is the first test. Our encoding function takes a &str and encodes the string.

Separating the bits into groups of 6

The first step is to regroup our bytes into groups of 6 bits. We start with 3 ascii characters since they have 24 bits in utf8 (Rust’s encoding) and we can regroup them into 4 base64 characters. So first, let’s convert the characters into bytes.

To get our 4 base64 characters we need to apply bit masks to remove unwanted information and shift the characters into the correct positions.

For the first group of 6 we just remove the last two bits and shift everything 2 bits to the right. (Actually, just shifting everything two bits to the right does the job as well… but I did not think of that when I wrote the code.)

The second group consists of the last two bits of the first ascii character and the first 4 bits of the second ascii character. So we remove the first 6 bits of the first ascii character and shift them 4 places to the left. Then we remove the last 4 bits of the second ascii character, move them 4 places to the right and binary OR them together.

The third group needs the last 4 bits of the second ascii character and the first two of the third ascii character. Just as above we remove the excess bits and merge the relevant information together.

And finally the 3rd ascii character’s last 6 bits are the last base64 character. Since you have seen the code for the first three base64 character and the code is very simple, I will not add the code here.

Converting the base64 characters to an actual base64 string

The base64 characters now have to be converted into the characters which will be the base64 output string. Here BASE64_ALPHABET is a slice which codifies the base64 alphabet table displayed at the beginning of the article.

And if we return base64_output we should have our first working test. 🎉

Let’s add a second test to allow the string to be longer than 3 characters.

To achieve this we need to loop over groups of 3 characters in our string and convert them to base64.

And after each iteration we move our index forward by the length of encoded bytes, so 3.

Encode strings with less than 3 bytes

Up until now we have only encoded string with a multiple of 3 bytes. How can we encode 2 or even 1 byte? If we filled the missing characters with zeros the decoder would not be able to know if the last byte(s) were missing or if they actually are just 0. (Remember that base64 is actually for any kind of byte stream and not just strings.)

There is a clever solution. If we only encode 2 bytes, we need 3 base64 characters to contain all the bit information. So if we add a = as padding, the decoder can know that there were only 2 bytes encoded and not 3. Let’s add a test for that.

So we can extract the 3 base64 characters and add the padding.

The same system is true for encoding 1 byte. 2 base64 characters are needed to carry the byte and we can pad the base64 string with == thereby telling the decoder how to correctly decode the string.

Summary

Wrapping up, base64 is the simple idea of splitting bit streams into groups of 6 bits and encoding the values using a fixed character table. It is easy to implement yet I learned a lot through the implementation. Not only is this my first algorithm implemented using its spec, but also the first time I actually had to use bit masking.

Finally, within this blog post I only implemented the encoding. Now it is your turn to implement the decoding. Alternatively, you can check it out in my GitHub repo.

The full code can be found on GitHub: https://github.com/niklasbuechner/experiments/blob/bbdb3136293bb2833529a47edcc5d9fa2dc5d405/base64/src/encode.rs

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Responses (1)

Write a response