All your Base64 are belong to us...

I've used Base64 plenty of times without really knowing why. One popular use is storing small icons in the text of CSS files, avoiding the need to make multiple requests to the server. JSON web tokens store payloads encoded in Base64. It's also used a lot in cryptography (but please, PLEASE don't confuse Base64 encoding with encryption!). I encountered it recently when revisiting the first cryptopals challenge. Every time I've knowingly Base64 encoded something, it was because whatever I was using required it, but I still couldn't really wrap my head around why I was using that particular scheme.

One question led to another, and off I was chasing this strange white rabbit. In many ways, this is a frivolous exercise. You don't need to know any of this to Base64 encode something. Existing libraries in most of your favorite languages will do that for you without too much fuss. Even so, it's not a bad exercise to take a trip to the lower levels of how things work every now and then. By the end of this post, it should be fairly easy to answer the question: should I Base64 encode this? You'll know the trade offs and the strengths of Base64 vs sending information in pure binary format. You'll know why, if you send a just under 5MB Base64 encoded email attachment, and you have a 5MB hard limit, your message will still be too big. And you'll understand why 64 is the magic number for this sort of thing.

Let's get to it!

Basic encoding and decoding

Say you want to transfer an image, but you know that at least some of the places it's going to have to pass through only speak in regular old text. How do you make sure it gets from point A to point B without getting easily mangled?

Well, at its core, that image is going to be a bunch of 1s and 0s (or, in other words, binary information). So, you could just make those 1s and 0s into text and send that.

1010111010111010101111101011110101010110
0101010110101101010101101001100110101011
0111010100100110101010001010010010100101

...and so on.

We don't even have to worry about the line endings here. They're meaningless. We'll throw them away on the other side. All we're interested in are those 1s and 0s, which we'll turn back into real bits on the other end.

(I have no idea whether the above would be in any section of a valid image... as you may suspect, I just randomly typed 1s and 0s.)

Problem solved?

Well, sort of. You could do that. And on the other side, you could take the textual 1s and 0s and turn them back into actual bits. However, it's not going to be very efficient.

Each character is going to take at least a byte of bandwidth. That's 8 bits. 8 times the amount of information you're actually sending. So a 1MB image will now take 8MB of bandwidth to send in this way.

Perhaps you remember that binary, as base 2, is pretty easy to convert into hexadecimal format. That's just taking every possible sequence of 4 of those bits (there will be 2⁴ == 16 of those) and assigning a digit from 0-9 or a letter from A-F to it. So you put two of those together and you're able to send a byte of information.

AEBABEBD56
55AD5699AB
7526A8A4A5

Each of those basic ASCII characters you send are going to be a byte. Now you're only having to send information that's twice as big as your image. We're making progress!

But wait a minute. There are way more ASCII characters at our disposal than those 16. Why don't we use more of them to reduce our overhead? In fact, why are we even doing these gymnastics in the first place? Why don't we just use the entire ASCII character set and send our image with no encoding overhead at all?

Easy corruption

If you take a look at the first few entries in the ASCII table, you may see a problem:

0   NUL  Null
1   SOH  Start of Heading
2   STX  Start of Text
3   ETX  End of Text
4   EOT  End of Transmission
5   ENQ  Enquiry
6   ACK  Acknowledgement
7   BEL  Bell
8   BS   Backspace
9   HT   Horizontal Tab
10  LF   Line Feed

None of those are characters you'd expect to be able to see. Anything along the way that might remove a character (perhaps guarding against malicious input) or interpret it differently would be able to corrupt your image. And certainly, we'd be very surprised if someone was able to copy and paste the text representation and then decode it into the original image. What would it even mean to copy a backspace to the clipboard?

So trying to go for no overhead at all is a recipe for disaster. Even if we weren't worried about humans copying and pasting the data, we'd be crossing our fingers every step of the way that we don't encounter anything that can only handle readable text.

How about only the readable characters?

There are 95 printable characters in ASCII. What if we used all of those? The more characters we have available, the more information we should be able to encode, right? Well, sort of, but here's the problem. Remember that we're ultimately trying to efficiently encode binary information. In addition to making it as efficient as possible to send back and forth, we also want to make it as easy as possible to encode and decode back to binary.

If we used all 95 characters we could, let's think of what would happen when we reached the threshold of a single character:

95th value == 1011111
96th value == 1100000

95 characters allow us to potentially send 7 bits of information, meaning we'd be using an 8 bit character to send the 7 bits we actually want to send, resulting in an approximately 14% overhead. However, there would still be 33 combinations (2⁷ - 95) of 1s and 0s in every one of our image's 7-bit chunks that we need to be able to represent. The next possible value we'd need to represent by using a 2nd character would be the 96th value, which would still have 7 bits. That's pretty weird. Now we have to read two characters to determine whether or not the 7 bits represented by the 1st character is the real value? This problem is just going to keep compounding, and it would create huge problems any time we're trying to stream data. If we can't know that what we've got so far is a valid representation of bits until we read the next character, that means we won't really be able to know it's valid until we read the last character.

In fact, any base that isn't a power of 2 is going to have this problem. Let's look at converting something in base 10 to base 2.

  10 (base 10) == 1010             (base 2)
1010 (base 10) == 1111110010       (base 2)

10 turns into 1010 in binary. Put 2 10s in sequence, which we'd read as 1010 in our regular base 10 system, and you don't get 10101010 in binary. You get 1111110010. There's not even a single 1010 pattern in that.

Looking at a similar situation, but with a base 16 encoding, and you can see the encoding/decoding advantages of a base that's a power of 2. Here 5F5F

  5F (base 16) == 01011111         (base 2)
5F5F (base 16) == 0101111101011111 (base 2)

Here 5F5F, which is 5F twice, is also the 01011111 sequence twice. With every single character we get, we get a representation of every possible pattern of 1s and 0s in a chunk of 4 bits. So once we get that hexadecimal character, those 4 bits are figured out, and we can safely move onto the next. But now we're back to sending 2 times the size of our image.

Lowered expectations

Okay, so we know we can't really make use of those 95 printable characters without creating other unacceptable problems. How many can we make use of? Well, we know we can make use of 16 characters to represent every possible combination of 4 bits.

We'd need 2⁵ == 32 characters to represent every possible combination of 5 bits.

We'd need 2⁶ == 64 characters to represent every possible combination of 6 bits.

And we'd need 2⁷ == 128 characters to represent every possible combination of 7 bits.

We know already that we have a maximum of 95 printable characters to work with, so 7 bits is just too high. We're going to have to settle for 6 bits.

Conclusion

In other words, Base64 is the most compact way to transfer binary information using printable ASCII characters, and that's why we use it and not Base128 or Base95 or any other base when we need to do this sort of thing.

That's also why anything you encode will be about 33% bigger. You're using 8 bits for every 6 bits you have to send. So if you want to avoid the overhead of multiple requests for a bunch of small files, it might be a good trade off. But if you're sending several TB of information, you probably want to use a pure binary format, even if it's a little less flexible.