February 15, 2018

Base64 encoding in Python

By Mohammed Abualrob

Introduction

Today, we are going to talk about base64 encoding in Python. Understanding how characters are represented is very important to make sense of base64. For beginners, I highly recommend that you check the following article. It explains how Python handles Unicode and string data types.

Let us get started…

What is base64 encoding?

Base64 encoding starts with a stream of bytes that could represent any data. The input can be a string of characters (e.g. ASCII or UTF-8 text), an image, an email attachment, etc. The goal is to regroup the 8-bit byte stream into 6-bit values and map each value to a printable character. With 6 bits we can represent exactly 64 distinct values, hence the name base64. Each 6-bit value is mapped to an ASCII character from the set A-Z, a-z, 0-9, + and /. A 65th character, =, is used only for padding (we will see what padding means later). Note that every 3 bytes of input produce 4 characters of base64 output, so the encoded data is about one third larger than the original.
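To make the size relationship concrete, here is a minimal sketch in Python 3. The names ALPHABET and encoded_length are illustrative helpers, not part of the base64 module; the sketch simply compares the 4-characters-per-3-bytes rule against what the standard library produces.

```python
import base64
import string

# The 64 symbols of the standard alphabet, in index order (0..63).
# Illustrative constant, not something exported by the base64 module.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encoded_length(n_bytes):
    """Length of the padded base64 output for n_bytes of input (hypothetical helper)."""
    return 4 * ((n_bytes + 2) // 3)


for n in (1, 2, 3, 4, 30):
    raw = bytes(n)  # n zero bytes, the content does not matter here
    encoded = base64.b64encode(raw)
    print("{:2d} input bytes -> {:2d} base64 characters".format(n, len(encoded)))
```

The output length is always rounded up to a multiple of 4, which is where the padding discussed later comes from.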

Why is base64 needed?

Converting bytes to plain text has many benefits. For example, some systems were designed to work only with text data, such as the SMTP protocol. We can use base64 to convert email attachments to text and send the email as if it were entirely text. If we are building a web service that communicates using JSON, which is a text-based format, we can attach binary data encoded in base64. Not only can we send binary data disguised as text, but we can also view it easily in text editing software.
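As a rough illustration of the JSON case, the sketch below embeds a few arbitrary bytes in a JSON message and recovers them on the other side. The field names ("filename", "content") and the sample bytes are made up for this example; a real service would define its own message format.

```python
import base64
import json

# A few arbitrary bytes standing in for a binary attachment (made-up sample data)
payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])

# JSON can only carry text, so the bytes are base64 encoded and turned into str
message = json.dumps({
    "filename": "example.bin",  # hypothetical field and value
    "content": base64.b64encode(payload).decode("ascii"),
})
print(message)

# The receiver parses the JSON and decodes the text back into the original bytes
received = json.loads(message)
restored = base64.b64decode(received["content"])
print(restored == payload)
```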

Byte data in Python

Depending on which Python version you are using (2.x vs 3.x) strings are handled differently. Python 2.x defines the following types…

str: byte data (often used for ASCII text). This is immutable (cannot be modified)
bytearray: like str but mutable (changeable)
unicode: Unicode data, a sequence of code points (the article mentioned in the introduction has more information about Unicode)

On the other hand, Python 3 defines the following data types…

bytes: immutable bytes data
bytearray: mutable bytes data
str: Unicode data

Long story short, when dealing with base64, make sure your input data is a stream of bytes. Keep in mind that if the input is already plain ASCII text, base64 adds little, since the data can usually travel as text as is. If you are using Unicode strings then you need to call the encode function to convert them to a byte stream before doing any base64 encoding. The intention here is not to confuse the reader by mentioning UTF encoding and base64 encoding in a row; the point is that you need to start with a byte stream, regardless of what it represents, in order to encode it in base64. Let us see how base64 works, then we will provide example code…
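Before moving on, here is a minimal sketch of the point about byte streams, assuming Python 3. The sample text is arbitrary. Passing a str directly to base64.b64encode raises a TypeError, so the string is first converted to bytes with encode.

```python
import base64

text = "héllo"  # a Unicode string (str in Python 3), arbitrary sample text

# Passing str directly is rejected in Python 3
try:
    base64.b64encode(text)
except TypeError as error:
    rejected = True
    print("b64encode rejected str input:", error)
else:
    rejected = False

# Convert the text to bytes first, then base64 encode the bytes
raw = text.encode("utf-8")
encoded = base64.b64encode(raw)

# Decoding reverses both steps: base64 decode, then UTF-8 decode
round_trip = base64.b64decode(encoded).decode("utf-8")
print(round_trip == text)
```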

How does base64 work?

We are going to demonstrate base64 using a simple example. Follow the steps below…

Given the following 2 bytes as input (0xFB, 0xFF)
In binary F=1111 B=1011 F=1111 F=1111 or (11111011, 11111111)
Splitting the 16 bits into groups of 6 bits (111110, 111111, 111100)
Note that we added 00 to the last group to make it 6 bits
Converting these numbers to decimal (62, 63, 60)
Doing a simple base64 lookup (62 is +, 63 is /, 60 is 8)
The encoded characters are +/8
One = is added for padding so that the output length is a multiple of 4, giving +/8=
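The steps above can be sketched as a small hand-written encoder. This is for illustration only, assuming Python 3; manual_b64encode and ALPHABET are names invented for this sketch, and real code should use the base64 module instead.

```python
import base64

# Standard base64 alphabet in index order (illustrative constant)
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def manual_b64encode(data):
    """Encode bytes to a base64 str by following the steps one by one (hypothetical helper)."""
    # Write every byte as 8 binary digits and join them into one bit string
    bits = "".join("{:08b}".format(byte) for byte in data)
    # Add zero bits so the total length is a multiple of 6
    bits += "0" * (-len(bits) % 6)
    # Look up each 6-bit group in the alphabet
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Add = so the output length is a multiple of 4
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)


result = manual_b64encode(b"\xfb\xff")
print(result)  # +/8=
print(result == base64.b64encode(b"\xfb\xff").decode("ascii"))
```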

Let us now see how to perform base64 encoding in Python…

Base64 encoding and decoding example

Below is a simple encoding and decoding example…

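# Import base64 module
import base64

# Input string
data = 'Hello world'

# In Python 3, b64encode expects bytes, so convert the string first
encoded = base64.b64encode(data.encode('utf-8'))
# Base64 decode (the result is bytes)
decoded = base64.b64decode(encoded)

# Print original data
print("Original data : {}".format(data))
# Print encoded data (converted to str for display)
print("Encoded data  : {}".format(encoded.decode('ascii')))
# Print decoded data (converted to str for display)
print("Decoded data  : {}".format(decoded.decode('utf-8')))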

If you run the code snippet above, you should get the following output…

Original data : Hello world
Encoded data  : SGVsbG8gd29ybGQ=
Decoded data  : Hello world

Base64 padding

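As we indicated earlier, the output stream is a sequence of 6-bit segments. Since the input stream consists of whole bytes, the last segment may hold only 2, 4 or 6 bits of real data. If it holds 4 bits (2 input bytes left over) we append =, if it holds 2 bits (1 input byte left over) we append ==, otherwise no padding is added. In principle, padding is not necessary to recover the original data, because the number of missing bytes can be inferred from the length of the encoded text. Note, however, that some decoders, including Python's base64.b64decode, expect the padding to be present. Padding mainly matters when encoded strings are concatenated: without it, it is not possible to tell where one string ends and the next begins.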
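A short sketch of the padding rules, assuming Python 3. The helper decode_unpadded is hypothetical (it is not part of the base64 module) and shows one possible way to restore stripped padding before decoding.

```python
import base64
import binascii

# 1, 2 and 3 input bytes give ==, = and no padding respectively
for raw in (b"a", b"ab", b"abc"):
    print(len(raw), base64.b64encode(raw))


def decode_unpadded(text):
    """Restore any stripped = characters, then decode (hypothetical helper)."""
    return base64.b64decode(text + "=" * (-len(text) % 4))


# Python's decoder complains when the padding is missing
try:
    base64.b64decode("+/8")
    strict_failed = False
except binascii.Error as error:
    strict_failed = True
    print("decoding without padding failed:", error)

# Adding the padding back makes the same text decodable
restored = decode_unpadded("+/8")
print(restored)
```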

URL safe base64

The default base64 alphabet uses + and /, both of which have a special meaning in URLs. This may cause side effects, so using an alternate alphabet can solve the problem. In URL safe base64, + is replaced with a hyphen (-) and / is replaced with an underscore (_). Otherwise, the alphabet is the same. Here is an example…

# Import base64 module
import base64

# Input binary data
# \x is used to denote a byte
# In this example we have 2 bytes (a bytes literal in Python 3)
data = b'\xfb\xff'
# Note that we are printing the representation
# of the data because the data is not printable
print("original data bytes     : {}".format(repr(data)))
print("original data in binary : {}".format(bin(0xfbff)))
print("6 bit numbers           : 111110 111111 1111")
print("Add 2 zeros to 1111     : 111100")
print("In decimal              : 62     63     60")
print("Lookup these numbers    : +      /      8")


# Encode data using standard base64
standard_encoded_data = base64.standard_b64encode(data)
print("standard encoded data   : {}".format(repr(standard_encoded_data)))
# Decode data using standard base64
standard_decoded_data = base64.standard_b64decode(standard_encoded_data)
print("standard decoded data   : {}".format(repr(standard_decoded_data)))

# Encode data using url safe base64
urlsafe_encoded_data = base64.urlsafe_b64encode(data)
print("url safe encoded data   : {}".format(repr(urlsafe_encoded_data)))
# Decode data using url safe base64
urlsafe_decoded_data = base64.urlsafe_b64decode(urlsafe_encoded_data)
print("url safe decoded data   : {}".format(repr(urlsafe_decoded_data)))

# Encode data using base64 with custom characters in place of + and /
# In Python 3, altchars must be a bytes object of length 2
custom_encoded_data = base64.b64encode(data, altchars=b'*~')
print("custom encoded data     : {}".format(repr(custom_encoded_data)))
# Decode data using base64 with the same custom characters
custom_decoded_data = base64.b64decode(custom_encoded_data, altchars=b'*~')
print("custom decoded data     : {}".format(repr(custom_decoded_data)))

If you run the code snippet above with Python 3, you should get the following output…

original data bytes     : b'\xfb\xff'
original data in binary : 0b1111101111111111
6 bit numbers           : 111110 111111 1111
Add 2 zeros to 1111     : 111100
In decimal              : 62     63     60
Lookup these numbers    : +      /      8
standard encoded data   : b'+/8='
standard decoded data   : b'\xfb\xff'
url safe encoded data   : b'-_8='
url safe decoded data   : b'\xfb\xff'
custom encoded data     : b'*~8='
custom decoded data     : b'\xfb\xff'

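To see why the alternate alphabet helps, the sketch below percent-encodes both variants the way a URL component would be escaped. It assumes Python 3 and uses urllib.parse.quote with safe="" purely for illustration. Note that the padding character = is still escaped in both cases.

```python
import base64
from urllib.parse import quote

data = b"\xfb\xff"

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

# quote(..., safe="") escapes every character that is not always safe in a URL
standard_in_url = quote(standard, safe="")
url_safe_in_url = quote(url_safe, safe="")

print(standard, "->", standard_in_url)
print(url_safe, "->", url_safe_in_url)
```

In the standard variant + and / both have to be escaped, while the URL safe characters - and _ pass through unchanged.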

Summary

Base64 encoding converts data in binary format into text
Exchanging data in text format has many benefits. For example, sending an image using a JSON based web service
Base64 output is about one third longer than the input because every 3 bytes are split into four 6-bit segments, each stored as one character
Dealing with Base64 encoding in Python is as easy as importing the base64 module then calling the appropriate function
The standard base64 encoding contains + and / in the output stream. These characters have a special meaning in web URLs, which may cause problems. To fix this issue, + and / are replaced with - and _ respectively. Python supports URL safe base64 encoding. You just need to call the right function
