What is Base32? Characters, Tables, Regular Expressions, Z-Base32, Base32Hex and Crockford’s Base32

In this article, we will go over Base32 encoding in detail. We will look at its character set, provide a reference table, and look at the two groups its characters fall into. We'll also look at the differences introduced by Z-Base32, clear up the question of case sensitivity, show how regular expressions can be used to detect Base32, and finish with two further variants, Base32Hex and Crockford's Base32.

The Base32 Character Set

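In contrast to its sibling Base64, which uses 64 characters, Base32 uses a smaller set of 32 characters. RFC 4648 describes Base32 as an encoding for data that must be handled case-insensitively, and the smaller alphabet also leaves room to drop characters that are easily mistaken for one another, which reduces transcription and display errors. The cost is size: Base32 output is 60% larger than the input (8 characters for every 5 bytes), compared with about 33% for Base64 (4 characters for every 3 bytes).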

The Base32 character set defined in RFC 4648 consists of the uppercase letters A–Z and the digits 2–7, with ‘=’ used as a padding character. The digits 0 and 1 are left out so that they cannot be confused with the letters ‘O’ and ‘I’.
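Because 32 = 2^5, each Base32 character carries exactly 5 bits, so every 5 input bytes (40 bits) become 8 output characters, and a short final group is padded with ‘=’ to a full 8 characters. The following minimal Python sketch encodes bytes by hand to make that mechanism visible; `base32_encode` is a hypothetical helper written for illustration, and its output is checked against Python's standard `base64.b32encode`.

```python
import base64

# RFC 4648 standard alphabet: A-Z for values 0-25, 2-7 for values 26-31.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def base32_encode(data: bytes) -> str:
    """Encode bytes as RFC 4648 Base32 (a teaching sketch, not optimized)."""
    # Turn the input into one long string of bits, most significant bit first.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Fill the last group with zero bits so the length is a multiple of 5.
    if len(bits) % 5:
        bits += "0" * (5 - len(bits) % 5)
    # Each 5-bit group becomes one character of the alphabet.
    encoded = "".join(ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))
    # Pad with '=' up to a multiple of 8 characters (one 40-bit block).
    return encoded + "=" * (-len(encoded) % 8)


# Compare against the standard library on the RFC 4648 test inputs.
for sample in [b"", b"f", b"fo", b"foo", b"foob", b"fooba", b"foobar"]:
    assert base32_encode(sample) == base64.b32encode(sample).decode("ascii")

print(base32_encode(b"foobar"))  # MZXW6YTBOI======
```

The string-of-bits approach is slow but mirrors the description exactly; production code should simply call `base64.b32encode`.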

Base32 Character Table and Reference

A reference table makes Base32 encoding much easier to follow. The table below maps each 5-bit binary value to its decimal value and its Base32 character, giving a quick way to look up any character.

Binary   Decimal   Base32
00000    0         A
00001    1         B
00010    2         C
00011    3         D
00100    4         E
00101    5         F
00110    6         G
00111    7         H
01000    8         I
01001    9         J
01010    10        K
01011    11        L
01100    12        M
01101    13        N
01110    14        O
01111    15        P
10000    16        Q
10001    17        R
10010    18        S
10011    19        T
10100    20        U
10101    21        V
10110    22        W
10111    23        X
11000    24        Y
11001    25        Z
11010    26        2
11011    27        3
11100    28        4
11101    29        5
11110    30        6
11111    31        7

This table works in both directions. When decoding, replace each character with its 5-bit binary value and regroup the bits into bytes; when encoding, split the input bits into 5-bit groups and replace each group with its character.
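The decoding direction can be sketched in Python by turning the table into a dictionary. `base32_decode` is a hypothetical helper for illustration only: it assumes well-formed, uppercase, padded input and performs no validation (an unknown character raises `KeyError`). Its results are compared with Python's standard `base64.b32decode`.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
# Reverse lookup built from the reference table: character -> 5-bit value.
VALUES = {char: index for index, char in enumerate(ALPHABET)}


def base32_decode(text: str) -> bytes:
    """Decode RFC 4648 Base32 text (uppercase, '=' padded); a teaching sketch."""
    text = text.rstrip("=")
    # Look up each character and write its value as exactly 5 bits.
    bits = "".join(f"{VALUES[char]:05b}" for char in text)
    # Keep whole bytes only; the leftover (fewer than 8) bits are the zero fill
    # the encoder added to complete the last 5-bit group.
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


for encoded in ["MY======", "MZXW6===", "MZXW6YTBOI======"]:
    assert base32_decode(encoded) == base64.b32decode(encoded)

print(base32_decode("MZXW6YTBOI======"))  # b'foobar'
```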

Base32 Character Groups

By index, the 32 characters of the Base32 alphabet fall into two groups:

Uppercase Letters (Indices 0-25): A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z

Numbers (Indices 26-31): 2, 3, 4, 5, 6, 7
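Because each group is a contiguous run of character codes, the reference table can also be computed instead of stored. The sketch below uses hypothetical helper names, `char_for_index` and `index_for_char`, and simple arithmetic on character codes; it then checks that the result reproduces the A–Z, 2–7 alphabet.

```python
import string

# Group 1: indices 0-25 are the uppercase letters A-Z.
# Group 2: indices 26-31 are the digits 2-7.
ALPHABET = string.ascii_uppercase + "234567"


def char_for_index(index: int) -> str:
    """Return the Base32 character for a 5-bit value (0-31)."""
    if not 0 <= index < 32:
        raise ValueError("a Base32 character encodes a value from 0 to 31")
    if index < 26:
        return chr(ord("A") + index)    # letter group
    return chr(ord("2") + index - 26)   # digit group


def index_for_char(char: str) -> int:
    """Inverse lookup: Base32 character -> 5-bit value (uppercase only)."""
    if "A" <= char <= "Z":
        return ord(char) - ord("A")
    if "2" <= char <= "7":
        return ord(char) - ord("2") + 26
    raise ValueError(f"{char!r} is not in the Base32 alphabet")


# The arithmetic rules reproduce the reference table in both directions.
assert "".join(char_for_index(i) for i in range(32)) == ALPHABET
assert [index_for_char(c) for c in ALPHABET] == list(range(32))

print(char_for_index(26), index_for_char("Z"))  # 2 25
```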

Z-Base32 Differences

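While the traditional Base32 encoding uses uppercase letters and the digits 2–7, the Z-Base32 variant takes a different approach. It uses lowercase letters and a rearranged alphabet, designed to be easier for people to read and to reduce errors caused by visual similarity between characters. As the table below shows, it leaves out 0, 2, l and v, and the same 5-bit value maps to a different character than in standard Base32 (for example, 0 is ‘y’ rather than ‘A’).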

Decimal   Z-Base32   Binary
0         y          00000
1         b          00001
2         n          00010
3         d          00011
4         r          00100
5         f          00101
6         g          00110
7         8          00111
8         e          01000
9         j          01001
10        k          01010
11        m          01011
12        c          01100
13        p          01101
14        q          01110
15        x          01111
16        o          10000
17        t          10001
18        1          10010
19        u          10011
20        w          10100
21        i          10101
22        s          10110
23        z          10111
24        a          11000
25        3          11001
26        4          11010
27        5          11011
28        h          11100
29        7          11101
30        6          11110
31        9          11111
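Since the table only swaps symbols, one way to sketch Z-Base32 in Python is to encode with the standard alphabet and translate character by character. This rests on assumptions worth stating plainly: that Z-Base32 groups bits the same way as RFC 4648 (5 bits at a time, most significant bit first), that it omits the ‘=’ padding, and that the input is whole bytes (the Z-Base32 design can also encode bit strings whose length is not a multiple of 8, which this sketch does not handle). The helpers `zbase32_encode` and `zbase32_decode` are hypothetical; check them against the Z-Base32 specification or an established library before relying on them for interoperability.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
# Z-Base32 alphabet read off the table (value 0 -> 'y', ..., value 31 -> '9').
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"

TO_Z = str.maketrans(STANDARD, ZBASE32)
FROM_Z = str.maketrans(ZBASE32, STANDARD)


def zbase32_encode(data: bytes) -> str:
    # Assumption: same 5-bit, most-significant-bit-first grouping as RFC 4648,
    # with the alphabet swapped and the '=' padding dropped.
    standard = base64.b32encode(data).decode("ascii").rstrip("=")
    return standard.translate(TO_Z)


def zbase32_decode(text: str) -> bytes:
    standard = text.translate(FROM_Z)
    # Restore '=' padding so the standard-library decoder accepts the input.
    standard += "=" * (-len(standard) % 8)
    return base64.b32decode(standard)


print(zbase32_encode(b"\x00" * 5))  # yyyyyyyy
```

An all-zero input encodes to a run of ‘y’, the symbol for value 0, which makes the alphabet swap easy to see.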

Case Sensitivity in Base32 Encoding

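The RFC 4648 Base32 alphabet is meant for data that may pass through case-insensitive systems: encoders emit uppercase, and whether a decoder also accepts lowercase depends on the implementation. What is case-sensitive is the input. Changing the case of the text you encode changes its bytes (‘B’ is 0x42, ‘b’ is 0x62), so the encoded output changes as well. Here are some examples: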

B64ENCODER = II3DIRKOINHUIRKS
b64encoder = MI3DIZLOMNXWIZLS
B64encoder = II3DIZLOMNXWIZLS
B64Encoder = II3DIRLOMNXWIZLS
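These values can be reproduced with Python's standard `base64` module. The sketch also shows the decoding side: `base64.b32decode` rejects lowercase Base32 text by default and accepts it when called with `casefold=True`, which illustrates why lowercase tolerance is a property of the decoder rather than of the encoding.

```python
import base64

# Different letter case in the input means different bytes, hence different output.
for text in ["B64ENCODER", "b64encoder", "B64encoder", "B64Encoder"]:
    encoded = base64.b32encode(text.encode("ascii")).decode("ascii")
    print(f"{text} = {encoded}")

encoded = "II3DIRKOINHUIRKS"
assert base64.b32decode(encoded) == b"B64ENCODER"
# The encoded text itself can be decoded in lowercase if the decoder folds case.
assert base64.b32decode(encoded.lower(), casefold=True) == b"B64ENCODER"

# Without casefold=True, Python's decoder rejects lowercase input.
try:
    base64.b32decode(encoded.lower())
except ValueError as exc:  # binascii.Error is a subclass of ValueError
    print("rejected:", exc)
```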

Using Regular Expressions for Base32 Detection

Regular expressions are an effective tool for pattern matching, and they can help identify and extract Base32-encoded strings from a larger body of text.

Here’s a simple regular expression pattern that can be used to detect Base32 encoded strings:

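\b[A-Z2-7]+=*(?![\w=])

Let’s dissect this pattern’s components:

  • \b: This is a word boundary anchor, so a match cannot start in the middle of a word.
  • [A-Z2-7]+: This character class matches the uppercase letters A to Z and the digits 2 to 7. The + requires one or more of these characters.
  • =*: This matches any number of equal signs (padding characters) at the end of the Base32 encoded string.
  • (?![\w=]): This negative lookahead rejects the match if it is immediately followed by another word character or ‘=’. A plain \b cannot be used here: after a trailing ‘=’ (a non-word character) followed by a space or the end of the text there is no word boundary, so a pattern ending in =*\b silently leaves the padding out of the match.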
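The simple pattern accepts any run of the allowed characters, so ordinary uppercase words such as HELLO are flagged, and it ignores the fact that padded RFC 4648 output always comes in blocks of 8 characters, where only the last block may be partial (2, 4, 5 or 7 characters followed by 6, 4, 3 or 1 ‘=’). A stricter sketch in Python, assuming standard uppercase, padded Base32 (the name `BASE32_STRICT` is illustrative):

```python
import re

# Strict RFC 4648 Base32 detector (a sketch): the encoded text must be a
# whole number of 8-character blocks, and only the last block may be padded.
BASE32_STRICT = re.compile(
    r"(?<![\w=])"          # not preceded by a word character or '='
    r"(?:[A-Z2-7]{8})*"    # zero or more complete, unpadded blocks
    r"(?:[A-Z2-7]{8}"      # final block: either complete ...
    r"|[A-Z2-7]{7}="       # ... or 7 characters + 1 '='
    r"|[A-Z2-7]{5}={3}"    # ... or 5 characters + 3 '='
    r"|[A-Z2-7]{4}={4}"    # ... or 4 characters + 4 '='
    r"|[A-Z2-7]{2}={6})"   # ... or 2 characters + 6 '='
    r"(?![\w=])"           # not followed by a word character or '='
)

text = "token MZXW6YTBOI====== word HELLO key MZXW6==="
print(BASE32_STRICT.findall(text))  # ['MZXW6YTBOI======', 'MZXW6===']
```

Even this pattern matches any eight-letter uppercase word, such as PASSWORD, because such words are valid Base32 too. Treating matches as candidates and confirming them by decoding is the safer design.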

Base32Hex and Crockford’s Base32

Base32Hex is an alternative alphabet defined alongside standard Base32 in RFC 4648, under the name “Base 32 Encoding with Extended Hex Alphabet”. It uses the digits 0–9 followed by the letters A–V, so the values 0–15 are written exactly as in hexadecimal. A useful property of this ordering, which the standard Base32 alphabet lacks, is that encoded data keeps its sort order when compared bitwise.

On the other hand, Crockford’s Base32 is another alternative design, created by Douglas Crockford as a textual 32-symbol notation for expressing numbers in a form that can be conveniently and accurately transmitted between humans and computer systems. It aims to be human readable, machine readable, compact, error resistant, and pronounceable. Its alphabet excludes the letters I, L and O to avoid confusion with digits, and U to prevent accidental obscenity. When decoding, upper and lower case letters are accepted, I and L are treated as 1, and O is treated as 0. When encoding, only upper case letters are used. Crockford also proposes five additional symbols for an optional mod-37 checksum.