In this article, we’ll look at the Base64 character set. We’ll introduce the characters used in Base64 encoding, provide a detailed Base64 character table, and show you how to use regular expressions to work with Base64-encoded data.

Base64 Characters: A Comprehensive Guide with Tables, Regular Expressions, and More

This article examines the composition of the Base64 character set and provides a complete table for reference. It also explores how regular expressions can be used to detect and validate Base64-encoded data, for example in Angular applications.

The Role of Characters in Base64 Encoding

Base64 encoding relies on a fixed set of 64 characters to represent binary data in text form. Each character in this set corresponds to a specific 6-bit value. The selection is deliberate: every character is printable ASCII, so encoded data survives text-based channels such as email. Two of the characters, “+” and “/”, have special meaning in URLs, which is why the Base64URL variant described later replaces them.

This limited character set gives Base64 a consistent and predictable representation of data, which makes it easy to transmit and store in text-based environments. Understanding the role of the individual characters is the foundation for understanding how binary data is converted to text and decoded again. The following sections cover the Base64 character set in detail.

The Base64 Character Set

The Base64 character set is a collection of 64 characters, exactly enough to represent every possible 6-bit value. This set includes uppercase letters (A-Z), lowercase letters (a-z), numerical digits (0-9), and two additional symbols: “+” and “/”.

All 64 characters are printable ASCII, which lets Base64-encoded data pass unchanged through systems and platforms that handle plain text.

The mapping between characters and values is fixed, so encoding and decoding always produce the same results. Each character corresponds to a distinct 6-bit binary value, so every group of three bytes (24 bits) of binary data is expressed as four characters of text.
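To make the 6-bit mapping concrete, here is a minimal Python sketch that encodes one full 3-byte group by hand and compares the result with the standard library. The names `ALPHABET` and `encode_three_bytes` are illustrative choices for this example, and padding for incomplete groups is deliberately left out.

```python
import base64

# The 64 characters in index order: A-Z, a-z, 0-9, then "+" and "/".
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_three_bytes(chunk: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters (no padding logic)."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    # Pack the three bytes into a single 24-bit integer.
    value = int.from_bytes(chunk, "big")
    # Slice the 24 bits into four 6-bit indices, most significant first.
    indices = [(value >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)


manual = encode_three_bytes(b"B64")
library = base64.b64encode(b"B64").decode("ascii")
print(manual, library)  # both print QjY0
```

The three bytes of "B64" become the indices 16, 35, 24, and 52, which map to the characters Q, j, Y, and 0.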

Base64 Character Table and Reference

The Base64 character table maps each character to its 6-bit binary value and the corresponding decimal index.

Here is the complete Base64 character table:

Character  Binary  Decimal
A          000000  0
B          000001  1
C          000010  2
D          000011  3
E          000100  4
F          000101  5
G          000110  6
H          000111  7
I          001000  8
J          001001  9
K          001010  10
L          001011  11
M          001100  12
N          001101  13
O          001110  14
P          001111  15
Q          010000  16
R          010001  17
S          010010  18
T          010011  19
U          010100  20
V          010101  21
W          010110  22
X          010111  23
Y          011000  24
Z          011001  25
a          011010  26
b          011011  27
c          011100  28
d          011101  29
e          011110  30
f          011111  31
g          100000  32
h          100001  33
i          100010  34
j          100011  35
k          100100  36
l          100101  37
m          100110  38
n          100111  39
o          101000  40
p          101001  41
q          101010  42
r          101011  43
s          101100  44
t          101101  45
u          101110  46
v          101111  47
w          110000  48
x          110001  49
y          110010  50
z          110011  51
0          110100  52
1          110101  53
2          110110  54
3          110111  55
4          111000  56
5          111001  57
6          111010  58
7          111011  59
8          111100  60
9          111101  61
+          111110  62
/          111111  63
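The table does not need to be memorized, because it follows directly from the order of the alphabet. As a rough sketch, the following Python snippet rebuilds it from the standard library's `string` constants; the function name `base64_table` is an assumption made for this example.

```python
import string

# Index order of the standard Base64 alphabet: A-Z, a-z, 0-9, "+", "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def base64_table():
    """Return (character, 6-bit binary string, decimal index) for all 64 entries."""
    return [(char, format(index, "06b"), index) for index, char in enumerate(ALPHABET)]


for char, binary, decimal in base64_table():
    print(f"{char}  {binary}  {decimal}")
```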

Base64 Character Groups

The Base64 character set can be categorized into several distinct groups:

  • Uppercase Letters (Indices 0-25): The uppercase alphabet forms the first segment of the Base64 character set. Each letter represents a value between 0 and 25.
  • Lowercase Letters (Indices 26-51): This group follows the uppercase letters and contains the lowercase alphabet. These characters represent the values 26 to 51.
  • Digits (Indices 52-61): The digit group consists of the numerical digits 0 through 9 and represents the values 52 to 61. Note that a digit's index is not its numeric value: the character “0” stands for 52, not 0.
  • Special Symbols (Indices 62-63): The Base64 character set concludes with two special symbols, ‘+’ and ‘/’, at indices 62 and 63. They bring the total to the 64 characters needed to cover every 6-bit value.
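Because each group is a contiguous range, a character's index can be computed with simple offsets instead of a lookup table. The following Python sketch illustrates the idea; `base64_index` is a hypothetical helper name used only for this example.

```python
def base64_index(char: str) -> int:
    """Return the 6-bit value (0-63) of a single standard Base64 character."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    if "A" <= char <= "Z":  # uppercase letters: indices 0-25
        return ord(char) - ord("A")
    if "a" <= char <= "z":  # lowercase letters: indices 26-51
        return ord(char) - ord("a") + 26
    if "0" <= char <= "9":  # digits: indices 52-61
        return ord(char) - ord("0") + 52
    if char == "+":  # special symbols: indices 62 and 63
        return 62
    if char == "/":
        return 63
    raise ValueError(f"{char!r} is not a standard Base64 character")


print([base64_index(c) for c in "QjY0"])  # [16, 35, 24, 52]
```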

Base64URL Differences

Base64 uses a character set that includes the special characters “+” and “/”, plus “=” for padding. All three have their own meaning in URLs, so they can cause problems there. Base64URL, on the other hand, uses a URL-safe character set: it replaces “+” with “-” and “/” with “_”, and the “=” padding is usually omitted.

Below are the characters in which Base64 and Base64URL differ:

[Infographic: The difference between Base64 and Base64URL characters]

The same differences in table format:

Base64 Character  Base64URL Equivalent
+                 -
/                 _
= (Padding)       (Padding omitted)
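As an illustration of these differences, here is a small Python sketch built on the standard library's `base64.urlsafe_b64encode` and `base64.urlsafe_b64decode`. The helper names `to_base64url` and `from_base64url` are assumptions for this example, and stripping the padding reflects the common convention shown in the table rather than a requirement of every system.

```python
import base64


def to_base64url(data: bytes) -> str:
    """Encode bytes with the URL-safe alphabet and drop the '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def from_base64url(text: str) -> bytes:
    """Decode unpadded Base64URL text by restoring the padding first."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


raw = b"\xfb\xff"  # bytes chosen so that the output contains "+" and "/"
print(base64.b64encode(raw).decode("ascii"))  # +/8=
print(to_base64url(raw))                      # -_8
print(from_base64url("-_8") == raw)           # True
```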

Case Sensitivity in Base64 Encoding

A common question about Base64 is whether the encoding is case sensitive. In short, it is. Uppercase and lowercase letters in the input are different bytes, so they produce different encoded output. The encoded text is case sensitive as well, because uppercase and lowercase letters in the Base64 alphabet stand for different 6-bit values.

For example, if you encode the same text but change the case of its letters, you’ll get different Base64-encoded strings. This is because uppercase and lowercase letters are handled as separate characters during the encoding process.

Let’s explore an example to illustrate the impact of case sensitivity in Base64 encoding:

B64ENCODE = QjY0RU5DT0RF
b64encode = YjY0ZW5jb2Rl
B64Encode = QjY0RW5jb2Rl
B64encode = QjY0ZW5jb2Rl

As you can see, the encoded strings differ when the case of the letters changes.

The same applies in the other direction. If you take the Base64 value of “B64Encode”, which is “QjY0RW5jb2Rl”, and convert all of its characters to lowercase or uppercase, decoding produces a result that is completely different from the original content.

qjy0rw5jb2rl = ª<´¯cojå
QJY0RW5JB2RL = @–4EnIdK
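The examples above can be reproduced with a few lines of Python using the standard `base64` module. This is only a sketch: `encode_text` is an illustrative helper, and the input is assumed to be UTF-8 text.

```python
import base64


def encode_text(text: str) -> str:
    """Base64-encode a string, assuming UTF-8 for the text-to-bytes step."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


for word in ("B64ENCODE", "b64encode", "B64Encode", "B64encode"):
    print(word, "=", encode_text(word))

# Changing the case of the *encoded* text changes the decoded bytes.
original = encode_text("B64Encode")            # QjY0RW5jb2Rl
lowered = base64.b64decode(original.lower())   # decodes qjy0rw5jb2rl
print(lowered == b"B64Encode")                 # False
```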

Using Regular Expressions for Base64 Detection

Regular expressions (regex) provide a powerful tool for identifying and manipulating Base64 encoded data within larger datasets. Their effectiveness stems from their ability to define precise patterns that match the specific characteristics of Base64 encoding.

These patterns typically include:

  • Valid Characters: Uppercase and lowercase letters (A-Z, a-z), digits (0-9), and the symbols '+' and '/'.
  • Length: A multiple of four characters when the string is padded.
  • Padding: Up to two padding characters ('=') at the end, which bring the length to a multiple of four.

By leveraging regex, developers can efficiently:

  • Search: Locate Base64 encoded strings within larger data sets.
  • Extract: Isolate specific portions of encoded data.
  • Validate: Verify the integrity and format of Base64 strings.

Here's a simple example of a regular expression pattern for detecting potential Base64-encoded strings:

^[A-Za-z0-9+/]*={0,2}$

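This pattern checks for strings that consist only of Base64 characters and allows for up to two padding characters at the end. It does not check that the length is a multiple of four, and it also matches the empty string, so treat a match as a hint that the string might be Base64 rather than as proof.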
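The following Python sketch applies the pattern above and compares it with a stricter variant that also enforces the multiple-of-four length. The stricter pattern and the names `LOOSE`, `STRICT`, `is_loose_base64`, and `is_strict_base64` are not from the original text; they are one possible way to tighten the check.

```python
import re

# The simple pattern from the text: valid characters plus up to two "=".
LOOSE = re.compile(r"^[A-Za-z0-9+/]*={0,2}$")

# A stricter variant (an assumption of this sketch): full groups of four
# characters, optionally followed by one padded final group.
STRICT = re.compile(
    r"^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$"
)


def is_loose_base64(text: str) -> bool:
    # fullmatch avoids accepting a trailing newline, which "$" alone allows.
    return LOOSE.fullmatch(text) is not None


def is_strict_base64(text: str) -> bool:
    return STRICT.fullmatch(text) is not None


for candidate in ("QjY0RW5jb2Rl", "QjY0RQ==", "QjY0R", "Qj=Y"):
    print(candidate, is_loose_base64(candidate), is_strict_base64(candidate))
```

Here "QjY0R" passes the simple pattern but fails the stricter one, because its length of five is not a multiple of four.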

Additionally, the following regular expression matches any character that should never appear in standard Base64-encoded text:

[^A-Za-z0-9+/=]
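A short Python sketch shows how this pattern could be used to report where invalid characters occur. The function name `invalid_characters` is hypothetical, and the sample input deliberately contains the Base64URL characters "-" and "_", which are not part of the standard alphabet.

```python
import re

# Matches any character outside the standard Base64 alphabet and padding.
INVALID = re.compile(r"[^A-Za-z0-9+/=]")


def invalid_characters(text: str) -> list[tuple[int, str]]:
    """Return (position, character) for every character that is not valid Base64."""
    return [(match.start(), match.group()) for match in INVALID.finditer(text)]


print(invalid_characters("QjY0RW5jb2Rl"))    # []
print(invalid_characters("QjY0-RW5j_b2Rl"))  # [(4, '-'), (9, '_')]
```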