Explore the Base64 algorithm’s encoding and decoding techniques through both manual execution and pseudocode. This article offers a concise breakdown of the process, shedding light on how data transformation occurs between binary and textual formats.
Introduction to Base64 Algorithm
The Base64 algorithm is a widely used encoding method that transforms binary data into printable text. It represents the data with a set of 64 ASCII characters made up of uppercase and lowercase letters, numerals, and two symbols.
Base64 is useful for a variety of tasks, including encoding binary attachments in emails and transmitting data via protocols that might not consistently accept binary data. By offering a standardized method to encode and decode data and bridging the gap between binary and text representations, this algorithm plays a significant role in modern computing.
The Base64 technique is a fundamental idea in data manipulation that makes processing and delivering data across many platforms and systems simpler. Its simple methodology enables data encoding and decoding without the need for specialist libraries or challenging implementations. This makes it a useful tool for a variety of applications, including network communication and web development.
In the following sections of this article, we will take a closer look at how the Base64 algorithm works, and learn how to encode and decode manually and programmatically.
Base64 Characters and Table
To understand the Base64 algorithm, we first need to know the 64-character set used by this binary-to-text encoding scheme. We will need the data in the table below during both encoding and decoding, so let’s study it first. (You don’t need to know it by heart, but being familiar with its structure makes the rest easier to follow.)
The Base64 character set is a collection of 64 characters taken from the standard (not the extended) ASCII table. This set includes uppercase letters (A-Z), lowercase letters (a-z), numerical digits (0-9), and two additional symbols: “+” and “/”.
Base64-encoded data travels well across many systems and platforms because all 64 characters are printable ASCII characters, which text-based systems and protocols generally handle without alteration.
The Base64 character table is a reference guide that maps characters to their corresponding values.
Here is the comprehensive Base64 character table:
Character | Binary | Decimal |
---|---|---|
A | 000000 | 0 |
B | 000001 | 1 |
C | 000010 | 2 |
D | 000011 | 3 |
E | 000100 | 4 |
F | 000101 | 5 |
G | 000110 | 6 |
H | 000111 | 7 |
I | 001000 | 8 |
J | 001001 | 9 |
K | 001010 | 10 |
L | 001011 | 11 |
M | 001100 | 12 |
N | 001101 | 13 |
O | 001110 | 14 |
P | 001111 | 15 |
Q | 010000 | 16 |
R | 010001 | 17 |
S | 010010 | 18 |
T | 010011 | 19 |
U | 010100 | 20 |
V | 010101 | 21 |
W | 010110 | 22 |
X | 010111 | 23 |
Y | 011000 | 24 |
Z | 011001 | 25 |
a | 011010 | 26 |
b | 011011 | 27 |
c | 011100 | 28 |
d | 011101 | 29 |
e | 011110 | 30 |
f | 011111 | 31 |
g | 100000 | 32 |
h | 100001 | 33 |
i | 100010 | 34 |
j | 100011 | 35 |
k | 100100 | 36 |
l | 100101 | 37 |
m | 100110 | 38 |
n | 100111 | 39 |
o | 101000 | 40 |
p | 101001 | 41 |
q | 101010 | 42 |
r | 101011 | 43 |
s | 101100 | 44 |
t | 101101 | 45 |
u | 101110 | 46 |
v | 101111 | 47 |
w | 110000 | 48 |
x | 110001 | 49 |
y | 110010 | 50 |
z | 110011 | 51 |
0 | 110100 | 52 |
1 | 110101 | 53 |
2 | 110110 | 54 |
3 | 110111 | 55 |
4 | 111000 | 56 |
5 | 111001 | 57 |
6 | 111010 | 58 |
7 | 111011 | 59 |
8 | 111100 | 60 |
9 | 111101 | 61 |
+ | 111110 | 62 |
/ | 111111 | 63 |
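The table follows a regular pattern, so it can be generated rather than memorized. The following is a minimal Python sketch of that idea; the names BASE64_CHARS, BASE64_VALUES, and table_row are illustrative choices, not part of any standard.

```python
import string

# Build the 64-character alphabet in table order: A-Z, a-z, 0-9, '+', '/'.
BASE64_CHARS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Reverse lookup (character -> value), which is what decoding needs.
BASE64_VALUES = {char: value for value, char in enumerate(BASE64_CHARS)}


def table_row(char: str) -> str:
    """Return one row of the table: character, 6-bit binary, decimal."""
    value = BASE64_VALUES[char]
    return f"{char} | {value:06b} | {value} |"


print(table_row("A"))  # A | 000000 | 0 |
print(table_row("m"))  # m | 100110 | 38 |
print(table_row("/"))  # / | 111111 | 63 |
```

Encoding uses the string (value to character), while decoding uses the dictionary (character to value).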
How Base64 Encoding Works
We first present a simplified infographic on Base64 encoding, and then explain the process in a bit more detail below.
Here’s a detailed explanation of how the Base64 encoding algorithm works:
- Input data preparation: The input binary data is grouped into blocks of 3 bytes (24 bits). If the last block is shorter than 3 bytes, zero bits are appended so that it can be split into complete 6-bit groups.
- Binary to decimal conversion: Each 24-bit block is split into four 6-bit groups, and each group is converted from binary to a decimal value between 0 and 63.
- Decimal to Base64 conversion: The decimal values obtained in the previous step are mapped to the Base64 character set. Each decimal value corresponds to a specific character in the set.
- Padding: If the input length is not divisible by 3, padding characters (‘=’ symbols) are added to the encoded output so that its length is a multiple of 4 characters: two for a final block of 1 byte, one for a final block of 2 bytes.
- Final encoded output: The encoded characters from each block are concatenated to form the final Base64 encoded string.
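To see the padding rule in practice, here is a small sketch that uses Python's standard base64 module on inputs of 1, 2, and 3 bytes. The helper encoded_length is a hypothetical name introduced only for this illustration.

```python
import base64

# Inputs of 1, 2 and 3 bytes need two, one and zero '=' characters.
for data in (b"M", b"Ma", b"Man"):
    encoded = base64.b64encode(data).decode("ascii")
    print(len(data), "byte(s) ->", encoded)
# 1 byte(s) -> TQ==
# 2 byte(s) -> TWE=
# 3 byte(s) -> TWFu


def encoded_length(n_bytes: int) -> int:
    """Length of the padded output: 4 characters per started 3-byte block."""
    return 4 * ((n_bytes + 2) // 3)


print(encoded_length(6))  # 8
```

The output length depends only on the input length, which is why the padded result is always a multiple of 4 characters.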
Example of Base64 Encoding
Now let’s look at an example of how to convert text to Base64 values.
Assume we want to convert the string “Base64” to Base64.
- Convert the characters of the string into their ASCII values:
  - B: 66
  - a: 97
  - s: 115
  - e: 101
  - 6: 54
  - 4: 52
- Convert the ASCII values into 8-bit binary representation:
  - 66: 01000010
  - 97: 01100001
  - 115: 01110011
  - 101: 01100101
  - 54: 00110110
  - 52: 00110100
- Combine the binary representations:
  - 01000010 01100001 01110011 01100101 00110110 00110100
- Group the binary bits into sets of 6 bits each:
  - 010000 100110 000101 110011 011001 010011 011000 110100
- Convert the groups of 6 bits into decimal:
  - 16 38 5 51 25 19 24 52
- Use the Base64 character table to convert the decimal values to characters:
  - 16: Q
  - 38: m
  - 5: F
  - 51: z
  - 25: Z
  - 19: T
  - 24: Y
  - 52: 0
So if we encode the text “Base64”, the result is “QmFzZTY0”.
You can check your work with our free Base64 Encoder.
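The manual steps of this example can also be reproduced with a few lines of Python. This is only an illustrative sketch that works on bit strings, mirroring the hand calculation rather than aiming for efficiency; the variable names are arbitrary.

```python
BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

text = "Base64"

# ASCII values -> 8-bit binary -> one combined bit string.
bits = "".join(f"{byte:08b}" for byte in text.encode("ascii"))

# Regroup into 6-bit chunks (48 bits divide evenly, so no padding is needed here).
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]

# 6-bit chunks -> decimal values.
values = [int(group, 2) for group in groups]

# Decimal values -> characters from the Base64 table.
encoded = "".join(BASE64_CHARS[value] for value in values)

print(values)   # [16, 38, 5, 51, 25, 19, 24, 52]
print(encoded)  # QmFzZTY0
```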
Implementing Base64 Encoding Algorithm in Pseudocode
Base64 can be implemented in any programming language. The pseudocode below is a language-independent outline to help you implement the Base64 encoding method in languages that do not support it natively.
Here’s an example of a language-agnostic algorithm in pseudocode:

    function base64_encode(input)
        // The base character set
        const BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
        // The length of the input
        let input_length = length(input)
        // The output
        let output = ""

        // Process the input in 3-byte blocks
        for i from 0 to input_length - 1 step 3
            // Bytes missing from a short final block count as 0
            let b1 = input[i]
            let b2 = 0
            let b3 = 0
            if i + 1 < input_length
                b2 = input[i + 1]
            end if
            if i + 2 < input_length
                b3 = input[i + 2]
            end if

            // The value of the block
            let block_value = (b1 << 16) + (b2 << 8) + b3

            // Encode the block into 4 characters
            for j from 0 to 3
                let index = (block_value >> ((3 - j) * 6)) & 0x3F
                output += BASE64_CHARS[index]
            end for
        end for

        // Pad the output with '=' characters if necessary
        let padding = (3 - input_length % 3) % 3
        if padding > 0
            for i from 0 to padding - 1
                output[output.length - i - 1] = '='
            end for
        end if

        return output
    end function

Here is an explanation of the code, broken down into a list:
- The binary data to be encoded is passed as an input parameter to the base64_encode function.
- The constant BASE64_CHARS defines the 64-character set used for the encoding.
- The input_length variable holds the length of the input data.
- The output variable, which will hold the result of the encoding, is initially set to an empty string.
- A for loop that iterates from 0 to input_length - 1 with a step of 3 processes the input data in 3-byte blocks. Loop bounds in this pseudocode are inclusive.
- For each block, up to three bytes are read into b1, b2, and b3. Bytes missing from a short final block are set to 0 so that the loop never reads past the end of the input.
- The block_value variable is then computed for each block: the first byte is shifted left by 16 bits, the second byte is shifted left by 8 bits, and the third byte is added.
- A second for loop that iterates from 0 to 3 is then used to encode the block into 4 characters.
- For each character, an index is calculated by shifting the block_value to the right by a multiple of 6 bits and masking it with 0x3F.
- The character at this index in the BASE64_CHARS constant is then appended to the output string.
- After all blocks have been processed, the output is padded with ‘=’ characters if necessary.
- To accomplish this, padding is set to the number of bytes missing from the final block, (3 - input_length % 3) % 3, which is 0, 1, or 2.
- If padding is greater than 0, a for loop iterates from 0 to padding - 1 and replaces the final characters of the output string with ‘=’ characters.
- The function then returns the encoded result stored in the output variable.
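For comparison, here is one possible Python translation of the pseudocode, checked against Python's built-in base64 module. It is a sketch under the assumption that the input is a bytes object; in practice the standard library function would normally be used instead.

```python
import base64

BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def base64_encode(data: bytes) -> str:
    """Encode bytes to a padded Base64 string using shifts and masks."""
    output = []
    for i in range(0, len(data), 3):
        block = data[i:i + 3]
        missing = 3 - len(block)  # 0, 1 or 2 bytes short in the final block
        # Pack the block into 24 bits, filling missing bytes with zeros.
        block_value = int.from_bytes(block + b"\x00" * missing, "big")
        # Extract four 6-bit indices, most significant group first.
        chars = [BASE64_CHARS[(block_value >> ((3 - j) * 6)) & 0x3F] for j in range(4)]
        # Characters that carry only filler bits are replaced by '='.
        if missing:
            chars[4 - missing:] = "=" * missing
        output.extend(chars)
    return "".join(output)


print(base64_encode(b"Base64"))  # QmFzZTY0
print(base64_encode(b"Ma"))      # TWE=

# Cross-check against the standard library implementation.
assert base64_encode(b"Base64") == base64.b64encode(b"Base64").decode("ascii")
```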
How Base64 Decoding Works
We first present a simplified infographic on Base64 decoding, and then explain the process in a bit more detail below.
Here’s a detailed explanation of how the Base64 decoding algorithm works:
- Remove Padding: If the Base64-encoded string has padding characters (‘=’), remove them. Padding characters are added to ensure that the encoded data is a multiple of 4 characters, but they are not needed for decoding.
- Convert Base64 Characters to Values: Each Base64 character in the encoded string is converted back to its value according to the Base64 character set. This is essentially the reverse lookup of the encoding process.
- Convert Decimal Values to 6-Bit Form: Each decimal value is written as a 6-bit binary number, with leading zeros where needed.
- Concatenate 6-Bit Values: The 6-bit values from the previous step are concatenated to form a single sequence of bits.
- Divide Bits into Bytes: The concatenated bits are divided into groups of 8 bits (1 byte). If the number of bits is not a multiple of 8, the trailing bits are ignored; they are the zero bits that were appended during encoding.
- Convert Bytes to Original Data: Each group of 8 bits is converted to the byte value it represents.
- Reconstruct Original Data: The bytes obtained from the previous step are concatenated in order to reconstruct the original binary data.
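The bit arithmetic behind these steps can be checked with a short calculation. The sketch below uses a hypothetical helper, decoded_size, that only counts bits: it reports how many whole bytes a Base64 string decodes to and how many trailing bits are ignored.

```python
def decoded_size(encoded: str) -> tuple[int, int]:
    """Return (number of decoded bytes, number of ignored trailing bits)."""
    symbols = encoded.rstrip("=")   # padding characters carry no data
    total_bits = 6 * len(symbols)   # every remaining character holds 6 bits
    return total_bits // 8, total_bits % 8


print(decoded_size("QmFzZTY0"))  # (6, 0)
print(decoded_size("TWE="))      # (2, 2)
print(decoded_size("TQ=="))      # (1, 4)
```

One padding character corresponds to 2 ignored bits and two padding characters to 4 ignored bits, which matches the zero bits appended during encoding.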
Example of Base64 Decoding
Now let’s look at how to decode a Base64 value manually.
Let’s decode the Base64 value “QmFzZTY0” back to its original string.
- Convert the Base64 characters to their decimal values:
  - Q: 16
  - m: 38
  - F: 5
  - z: 51
  - Z: 25
  - T: 19
  - Y: 24
  - 0: 52
- Convert the decimal values to 6-bit binary representations:
  - 16: 010000
  - 38: 100110
  - 5: 000101
  - 51: 110011
  - 25: 011001
  - 19: 010011
  - 24: 011000
  - 52: 110100
- Combine the binary representations:
  - 010000 100110 000101 110011 011001 010011 011000 110100
- Split the combined binary into groups of 8 bits:
  - 01000010 01100001 01110011 01100101 00110110 00110100
- Convert the binary groups to their ASCII values:
  - 66 97 115 101 54 52
- Convert the ASCII values to characters:
  - 66: B
  - 97: a
  - 115: s
  - 101: e
  - 54: 6
  - 52: 4
The Base64 value “QmFzZTY0” therefore decodes to the text “Base64”.
You can check your work with our free Base64 Decoder.
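As with encoding, the manual decoding steps can be mirrored in Python. The sketch below works on bit strings to follow the hand calculation; it assumes the input contains no padding characters, as is the case for “QmFzZTY0”.

```python
BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

encoded = "QmFzZTY0"

# Base64 characters -> decimal values (their positions in the table).
values = [BASE64_CHARS.index(char) for char in encoded]

# Decimal values -> 6-bit binary -> one combined bit string.
bits = "".join(f"{value:06b}" for value in values)

# Split into complete 8-bit groups; any leftover trailing bits are dropped.
usable = len(bits) - len(bits) % 8
byte_groups = [bits[i:i + 8] for i in range(0, usable, 8)]

# 8-bit groups -> ASCII values -> characters.
ascii_values = [int(group, 2) for group in byte_groups]
decoded = bytes(ascii_values).decode("ascii")

print(ascii_values)  # [66, 97, 115, 101, 54, 52]
print(decoded)       # Base64
```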
Implementing Base64 Decoding Algorithm in Pseudocode
Now let’s look at decoding independently of the programming language. The pseudocode assumes that the input is padded, so its length is a multiple of 4.

    // The base character set
    const BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

    // The 6-bit value of a character; the padding character '=' counts as 0
    function value_of(c)
        if c == '='
            return 0
        end if
        return index_of(BASE64_CHARS, c)
    end function

    function base64_decode(input)
        // The length of the input
        let input_length = length(input)
        // The output
        let output = []

        // Process the input in 4-character blocks
        for i from 0 to input_length - 1 step 4
            // The value of the block
            let block_value = (value_of(input[i]) << 18) + (value_of(input[i + 1]) << 12) + (value_of(input[i + 2]) << 6) + value_of(input[i + 3])

            // Decode the block into 3 bytes
            for j from 0 to 2
                let byte = (block_value >> ((2 - j) * 8)) & 0xFF
                output.append(byte)
            end for
        end for

        // Remove any padding bytes from the output
        let padding = count(input, '=')
        if padding > 0
            output = output[0:output.length - padding]
        end if

        return output
    end function

Here is an explanation of the code:
- The BASE64_CHARS constant is defined as the base character set for the decoding, which consists of 64 characters.
- The value_of helper returns the position of a character in BASE64_CHARS, or 0 for the padding character ‘=’, which is not part of the character set.
- The base64_decode function takes an input parameter, which is the string of characters to be decoded.
- The input_length variable holds the length of the input string.
- The output variable, which will hold the result of the decoding, is initialized as an empty array.
- The input data is processed in 4-character blocks using a for loop that iterates from 0 to input_length - 1 with a step of 4.
- For each block, the block_value variable is calculated by shifting the value of each character to the left by a multiple of 6 bits (18, 12, 6, and 0) and adding them together.
- Then, a second for loop iterating from 0 to 2 decodes the block into 3 bytes.
- By moving the block_value to the right by a multiple of 8 bits and masking it with 0xFF, a value is computed for each byte. This value is then appended to the output array.
- If necessary, any padding bytes are taken out of the output once all blocks have been processed. The count function is used to determine how many padding characters (‘=’) are present in the input, and then the same number of bytes is removed from the end of the output array.
- Finally, the function returns the decoded result stored in the output array.
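Here is one possible Python translation of the decoding pseudocode, again checked against the built-in base64 module. It is a sketch that assumes well-formed, padded input: it rejects lengths that are not a multiple of 4 but does not otherwise validate where padding characters appear.

```python
import base64

BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
BASE64_VALUES = {char: value for value, char in enumerate(BASE64_CHARS)}


def base64_decode(encoded: str) -> bytes:
    """Decode a padded Base64 string using shifts and masks."""
    if len(encoded) % 4 != 0:
        raise ValueError("padded Base64 input must have a length that is a multiple of 4")
    padding = len(encoded) - len(encoded.rstrip("="))
    output = bytearray()
    for i in range(0, len(encoded), 4):
        block_value = 0
        for char in encoded[i:i + 4]:
            # '=' contributes six zero bits; other unknown characters raise KeyError.
            value = 0 if char == "=" else BASE64_VALUES[char]
            block_value = (block_value << 6) | value
        # Split the 24-bit block into 3 bytes, most significant first.
        output += block_value.to_bytes(3, "big")
    # Drop the bytes that were produced only by padding.
    return bytes(output[:len(output) - padding])


print(base64_decode("QmFzZTY0"))  # b'Base64'
print(base64_decode("TWE="))      # b'Ma'

# Cross-check against the standard library implementation.
assert base64_decode("QmFzZTY0") == base64.b64decode("QmFzZTY0")
```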