ASCII - CameronAuler/python-devops GitHub Wiki

Introduction to ASCII

ASCII (American Standard Code for Information Interchange) is a 7-bit character encoding standard used in computers, communication systems, and text-based data processing. Each ASCII character is represented by a 7-bit binary number (values 0 to 127), allowing for a total of 128 distinct characters.

Purpose and History

ASCII was first published in 1963 by the American Standards Association (ASA), the predecessor of today's American National Standards Institute (ANSI), to create a universal encoding system for text and control characters. It was derived from telegraph codes and designed for interoperability between different hardware and software systems.

The ASCII standard became widely adopted in computing, replacing earlier proprietary character sets. It was integrated into early operating systems such as UNIX and MS-DOS and remains foundational in modern character encoding.

Technical Characteristics

  • 7-bit encoding (values 0–127)
  • Character set categories:
    • Control characters (0–31, 127) – Non-printable commands (e.g., line feed, carriage return)
    • Printable characters (32–126) – Letters, numbers, punctuation, and symbols
  • Standard byte representation: Stored as one byte with the most significant bit (MSB) set to 0 in 8-bit systems
  • Compatible with modern encodings (UTF-8, Unicode)

Role in Computing

ASCII serves as the basis for text representation in:

  • Programming (source code storage)
  • Networking protocols (HTTP, SMTP, FTP headers)
  • Command-line interfaces (Linux, Windows shells)
  • Data transmission (ASCII-based file formats like .txt and .csv)

ASCII remains fundamental in computing despite the evolution of Unicode, ensuring backward compatibility and efficient text representation in legacy systems.

ASCII Character Encoding

Encoding Structure

ASCII uses a 7-bit encoding scheme, assigning a unique binary value (0000000 to 1111111) to each character. It supports 128 distinct characters (0-127) and is stored in a single byte (8-bit systems) with the most significant bit (MSB) set to 0.

Bit Representation

Each ASCII character is represented as:

b6  b5  b4  b3  b2  b1  b0  (7-bit ASCII)

Where b6 is the most significant bit (MSB), and b0 is the least significant bit (LSB).
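
As a small illustration of the bit layout above, the following sketch prints the seven bits of a character from b6 down to b0. The helper name `ascii_bits` is an assumption made for this example and is not part of any library.

```python
def ascii_bits(char):
    """Return the 7-bit pattern (b6..b0) of an ASCII character as a string."""
    code = ord(char)
    if code > 127:
        raise ValueError(f"{char!r} is not an ASCII character")
    # Zero-pad to 7 binary digits: b6 is the leftmost digit, b0 the rightmost
    return format(code, "07b")

print(ascii_bits("A"))  # Output: 1000001
print(ascii_bits("a"))  # Output: 1100001
```

The two outputs differ only in b5 (value 32), which is why case conversion can be done with simple arithmetic or bitwise operations.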

Encoding Ranges

| Decimal | Binary          | Hex  | Category         | Example Characters |
|---------|----------------|------|-----------------|--------------------|
| 0–31    | 0000000–0011111 | 0x00–0x1F | Control Characters | NUL, LF, CR, ESC |
| 32–47   | 0100000–0101111 | 0x20–0x2F | Punctuation        | Space, `!`, `"`, `#`, `$` |
| 48–57   | 0110000–0111001 | 0x30–0x39 | Digits (0-9)       | `0`, `1`, `2` ... `9` |
| 58–64   | 0111010–1000000 | 0x3A–0x40 | Special Symbols    | `:`, `;`, `<`, `=`, `>` |
| 65–90   | 1000001–1011010 | 0x41–0x5A | Uppercase Letters  | `A` - `Z` |
| 91–96   | 1011011–1100000 | 0x5B–0x60 | Special Symbols    | `[`, `\`, `]`, `^`, `_` |
| 97–122  | 1100001–1111010 | 0x61–0x7A | Lowercase Letters  | `a` - `z` |
| 123–126 | 1111011–1111110 | 0x7B–0x7E | Special Symbols    | `{`, `\|`, `}`, `~` |
| 127     | 1111111 | 0x7F | Control Character | DEL |
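
The ranges in this table can be expressed as a small lookup function. This is a minimal sketch; the function name `ascii_category` and the merged "Punctuation or Symbol" label are assumptions made for the example.

```python
def ascii_category(code):
    """Map an ASCII code (0-127) to a category from the encoding ranges table."""
    if not 0 <= code <= 127:
        raise ValueError("not an ASCII code")
    if code <= 31 or code == 127:
        return "Control Character"
    if 48 <= code <= 57:
        return "Digit"
    if 65 <= code <= 90:
        return "Uppercase Letter"
    if 97 <= code <= 122:
        return "Lowercase Letter"
    # Everything else in 32-126 is the space, punctuation, or a symbol
    return "Punctuation or Symbol"

for ch in ("\n", " ", "7", "Q", "q", "~"):
    print(repr(ch), ord(ch), ascii_category(ord(ch)))
```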

Control Characters (0–31, 127)

Control characters are non-printable and used for text formatting, device control, and communication.

| ASCII Code | Binary  | Name | Function |
|------------|--------|------|----------|
| 0 (0x00)  | 0000000 | NUL  | Terminates strings in C |
| 7 (0x07)  | 0000111 | BEL  | Triggers system beep |
| 8 (0x08)  | 0001000 | BS   | Backspace |
| 9 (0x09)  | 0001001 | HT   | Horizontal tab |
| 10 (0x0A) | 0001010 | LF   | Line feed (Newline) |
| 13 (0x0D) | 0001101 | CR   | Carriage return |
| 27 (0x1B) | 0011011 | ESC  | Escape sequence starter |
| 127 (0x7F) | 1111111 | DEL | Delete character |

Printable Characters (32–126)

Includes letters, digits, punctuation, and special symbols. These are directly visible in text representation.

| Decimal | Binary  | Hex  | Character |
|---------|--------|------|-----------|
| 32      | 0100000 | 0x20 | (Space)   |
| 33      | 0100001 | 0x21 | !         |
| 34      | 0100010 | 0x22 | "         |
| 35      | 0100011 | 0x23 | #         |
| 36      | 0100100 | 0x24 | $         |
| ...     | ...     | ...  | ...       |
| 65      | 1000001 | 0x41 | A         |
| 66      | 1000010 | 0x42 | B         |
| 67      | 1000011 | 0x43 | C         |
| ...     | ...     | ...  | ...       |
| 97      | 1100001 | 0x61 | a         |
| 98      | 1100010 | 0x62 | b         |
| 99      | 1100011 | 0x63 | c         |
| ...     | ...     | ...  | ...       |
| 122     | 1111010 | 0x7A | z         |
| 123     | 1111011 | 0x7B | {         |
| 124     | 1111100 | 0x7C | \|        |
| 125     | 1111101 | 0x7D | }         |
| 126     | 1111110 | 0x7E | ~         |

Storage in Memory

  • Stored as 7-bit values in legacy systems.
  • In 8-bit systems, ASCII characters are stored with the 8th bit (MSB) set to 0.
  • In modern Unicode (UTF-8), ASCII values remain unchanged for backward compatibility.

Python Example

# Convert character to ASCII value
print(ord('A'))  # Output: 65

# Convert ASCII value to character
print(chr(65))  # Output: A

ASCII Table

The following table lists the ASCII character set (0–127), including control characters and printable characters, with their decimal, binary, hexadecimal, and character representations. Runs of consecutive digits and letters are abbreviated with `...` rows.

| Dec | Bin       | Hex  | Char | Description           |
|-----|----------|------|------|-----------------------|
| 0   | 0000000  | 0x00 | NUL  | Null                 |
| 1   | 0000001  | 0x01 | SOH  | Start of Heading     |
| 2   | 0000010  | 0x02 | STX  | Start of Text        |
| 3   | 0000011  | 0x03 | ETX  | End of Text          |
| 4   | 0000100  | 0x04 | EOT  | End of Transmission  |
| 5   | 0000101  | 0x05 | ENQ  | Enquiry              |
| 6   | 0000110  | 0x06 | ACK  | Acknowledge          |
| 7   | 0000111  | 0x07 | BEL  | Bell (Beep)          |
| 8   | 0001000  | 0x08 | BS   | Backspace            |
| 9   | 0001001  | 0x09 | HT   | Horizontal Tab       |
| 10  | 0001010  | 0x0A | LF   | Line Feed (Newline)  |
| 11  | 0001011  | 0x0B | VT   | Vertical Tab         |
| 12  | 0001100  | 0x0C | FF   | Form Feed           |
| 13  | 0001101  | 0x0D | CR   | Carriage Return      |
| 14  | 0001110  | 0x0E | SO   | Shift Out            |
| 15  | 0001111  | 0x0F | SI   | Shift In             |
| 16  | 0010000  | 0x10 | DLE  | Data Link Escape     |
| 17  | 0010001  | 0x11 | DC1  | Device Control 1     |
| 18  | 0010010  | 0x12 | DC2  | Device Control 2     |
| 19  | 0010011  | 0x13 | DC3  | Device Control 3     |
| 20  | 0010100  | 0x14 | DC4  | Device Control 4     |
| 21  | 0010101  | 0x15 | NAK  | Negative Acknowledge |
| 22  | 0010110  | 0x16 | SYN  | Synchronous Idle     |
| 23  | 0010111  | 0x17 | ETB  | End of Transmission Block |
| 24  | 0011000  | 0x18 | CAN  | Cancel              |
| 25  | 0011001  | 0x19 | EM   | End of Medium       |
| 26  | 0011010  | 0x1A | SUB  | Substitute          |
| 27  | 0011011  | 0x1B | ESC  | Escape              |
| 28  | 0011100  | 0x1C | FS   | File Separator      |
| 29  | 0011101  | 0x1D | GS   | Group Separator     |
| 30  | 0011110  | 0x1E | RS   | Record Separator    |
| 31  | 0011111  | 0x1F | US   | Unit Separator      |
| 32  | 0100000  | 0x20 | (space) | Space             |
| 33  | 0100001  | 0x21 | !    | Exclamation mark    |
| 34  | 0100010  | 0x22 | "    | Double quote        |
| 35  | 0100011  | 0x23 | #    | Hash                |
| 36  | 0100100  | 0x24 | $    | Dollar sign         |
| 37  | 0100101  | 0x25 | %    | Percent            |
| 38  | 0100110  | 0x26 | &    | Ampersand          |
| 39  | 0100111  | 0x27 | '    | Apostrophe         |
| 40  | 0101000  | 0x28 | (    | Left parenthesis   |
| 41  | 0101001  | 0x29 | )    | Right parenthesis  |
| 42  | 0101010  | 0x2A | *    | Asterisk          |
| 43  | 0101011  | 0x2B | +    | Plus sign         |
| 44  | 0101100  | 0x2C | ,    | Comma             |
| 45  | 0101101  | 0x2D | -    | Minus sign        |
| 46  | 0101110  | 0x2E | .    | Period (Dot)      |
| 47  | 0101111  | 0x2F | /    | Slash             |
| 48  | 0110000  | 0x30 | 0    | Digit 0           |
| 49  | 0110001  | 0x31 | 1    | Digit 1           |
| 50  | 0110010  | 0x32 | 2    | Digit 2           |
| ... | ...      | ...  | ...  | ...               |
| 65  | 1000001  | 0x41 | A    | Uppercase A       |
| 66  | 1000010  | 0x42 | B    | Uppercase B       |
| 67  | 1000011  | 0x43 | C    | Uppercase C       |
| ... | ...      | ...  | ...  | ...               |
| 97  | 1100001  | 0x61 | a    | Lowercase a       |
| 98  | 1100010  | 0x62 | b    | Lowercase b       |
| 99  | 1100011  | 0x63 | c    | Lowercase c       |
| ... | ...      | ...  | ...  | ...               |
| 122 | 1111010  | 0x7A | z    | Lowercase z       |
| 123 | 1111011  | 0x7B | {    | Left Brace        |
| 124 | 1111100  | 0x7C | \|   | Vertical Bar      |
| 125 | 1111101  | 0x7D | }    | Right Brace       |
| 126 | 1111110  | 0x7E | ~    | Tilde             |
| 127 | 1111111  | 0x7F | DEL  | Delete            |
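
The rows abbreviated in the table can be regenerated programmatically. The sketch below prints all 128 codes; the generator name `ascii_rows` is an assumption made for this example, and control characters are shown in escaped form because they have no visible glyph.

```python
def ascii_rows():
    """Yield (dec, bin, hex, char) for every ASCII code from 0 to 127."""
    for code in range(128):
        if code < 32 or code == 127:
            # Control characters have no glyph; show an escaped form instead
            shown = repr(chr(code))
        else:
            shown = chr(code)
        yield code, format(code, "07b"), f"0x{code:02X}", shown

for dec, binary, hexa, shown in ascii_rows():
    print(f"{dec:>3}  {binary}  {hexa}  {shown}")
```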

Usage of ASCII Table

  • Control characters (0-31, 127) are used for text formatting, device control, and communication.
  • Printable characters (32-126) include letters, digits, punctuation, and symbols.
  • ASCII remains unchanged in UTF-8 for backward compatibility.

Python Example

# Convert a character to ASCII
print(ord('A'))  # Output: 65

# Convert ASCII to a character
print(chr(65))  # Output: A

Extended ASCII (8-bit)

Extended ASCII is an 8-bit character encoding system that expands the original 7-bit ASCII (0-127) by adding an additional 128 characters (128-255). This allows for 256 unique characters, incorporating accented letters, symbols, and graphical characters used in different languages and operating systems.

Technical Characteristics

  • 8-bit encoding (0-255)
  • Backwards compatible with standard ASCII (0-127 remain unchanged)
  • Different variations exist based on specific character needs (e.g., ISO-8859-1, Windows-1252)
  • Supports international characters (French, German, Spanish, etc.)

The sample below draws on two different code pages, so the values are not interchangeable: 128–143 follow Code Page 437, while 176–180, 224–231, and 255 follow ISO-8859-1.

| Decimal | Binary     | Hex  | Character | Description                      | Encoding |
|---------|-----------|------|-----------|----------------------------------|----------|
| 128     | 10000000  | 0x80 | Ç         | Latin Capital Letter C with Cedilla | Code Page 437 |
| 129     | 10000001  | 0x81 | ü         | Latin Small Letter U with Diaeresis | Code Page 437 |
| 130     | 10000010  | 0x82 | é         | Latin Small Letter E with Acute | Code Page 437 |
| 131     | 10000011  | 0x83 | â         | Latin Small Letter A with Circumflex | Code Page 437 |
| 132     | 10000100  | 0x84 | ä         | Latin Small Letter A with Diaeresis | Code Page 437 |
| 133     | 10000101  | 0x85 | à         | Latin Small Letter A with Grave | Code Page 437 |
| 134     | 10000110  | 0x86 | å         | Latin Small Letter A with Ring Above | Code Page 437 |
| 135     | 10000111  | 0x87 | ç         | Latin Small Letter C with Cedilla | Code Page 437 |
| 136     | 10001000  | 0x88 | ê         | Latin Small Letter E with Circumflex | Code Page 437 |
| 137     | 10001001  | 0x89 | ë         | Latin Small Letter E with Diaeresis | Code Page 437 |
| 138     | 10001010  | 0x8A | è         | Latin Small Letter E with Grave | Code Page 437 |
| 139     | 10001011  | 0x8B | ï         | Latin Small Letter I with Diaeresis | Code Page 437 |
| 140     | 10001100  | 0x8C | î         | Latin Small Letter I with Circumflex | Code Page 437 |
| 141     | 10001101  | 0x8D | ì         | Latin Small Letter I with Grave | Code Page 437 |
| 142     | 10001110  | 0x8E | Ä         | Latin Capital Letter A with Diaeresis | Code Page 437 |
| 143     | 10001111  | 0x8F | Å         | Latin Capital Letter A with Ring Above | Code Page 437 |
| ...     | ...       | ...  | ...       | ...                              | ... |
| 176     | 10110000  | 0xB0 | °         | Degree Symbol                     | ISO-8859-1 |
| 177     | 10110001  | 0xB1 | ±         | Plus-Minus Symbol                 | ISO-8859-1 |
| 178     | 10110010  | 0xB2 | ²         | Superscript 2                     | ISO-8859-1 |
| 179     | 10110011  | 0xB3 | ³         | Superscript 3                     | ISO-8859-1 |
| 180     | 10110100  | 0xB4 | ´         | Acute Accent                       | ISO-8859-1 |
| ...     | ...       | ...  | ...       | ...                              | ... |
| 224     | 11100000  | 0xE0 | à         | Latin Small Letter A with Grave  | ISO-8859-1 |
| 225     | 11100001  | 0xE1 | á         | Latin Small Letter A with Acute  | ISO-8859-1 |
| 226     | 11100010  | 0xE2 | â         | Latin Small Letter A with Circumflex  | ISO-8859-1 |
| 227     | 11100011  | 0xE3 | ã         | Latin Small Letter A with Tilde  | ISO-8859-1 |
| 228     | 11100100  | 0xE4 | ä         | Latin Small Letter A with Diaeresis  | ISO-8859-1 |
| 229     | 11100101  | 0xE5 | å         | Latin Small Letter A with Ring Above  | ISO-8859-1 |
| 230     | 11100110  | 0xE6 | æ         | Latin Small Letter AE  | ISO-8859-1 |
| 231     | 11100111  | 0xE7 | ç         | Latin Small Letter C with Cedilla  | ISO-8859-1 |
| ...     | ...       | ...  | ...       | ...                              | ... |
| 255     | 11111111  | 0xFF | ÿ         | Latin Small Letter Y with Diaeresis | ISO-8859-1 |

Extended ASCII Variants

Because ASCII was originally only 7-bit, different 8-bit extended ASCII encodings were created. Some common variations include:

1. ISO-8859-1 (Latin-1):

  • Used in Western European languages
  • Standard in older Unix and Windows systems
  • Supports accented characters (é, ñ, ø, etc.)

2. Windows-1252:

  • Microsoft extension of ISO-8859-1
  • Includes additional symbols like € (Euro sign)
  • Used in legacy Windows applications

3. Code Page 437 (OEM-US):

  • Original IBM PC character set
  • Includes box-drawing characters and symbols

4. ISO-8859-15 (Latin-9):

  • Revision of ISO-8859-1
  • Includes the Euro (€) sign and adds letters needed for French and Finnish (Œ, œ, Ÿ, Š, š, Ž, ž)

Storage & Encoding in Systems

  • Legacy text encoding stored characters as single 8-bit bytes.
  • Modern systems use Unicode (UTF-8, UTF-16, UTF-32) but remain backward-compatible with ASCII.
  • Extended ASCII is NOT a single standard: values above 127 vary by system and locale.
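
To make the last point concrete, the sketch below decodes the same byte values under four of the code pages named in this section, using Python's built-in codecs. The helper name `interpretations` and the choice of sample bytes are assumptions made for this example.

```python
CODE_PAGES = ["cp437", "cp1252", "latin-1", "iso8859-15"]

def interpretations(value):
    """Decode one byte value under several 8-bit code pages."""
    raw = bytes([value])
    # errors="replace" yields U+FFFD where a code page leaves the byte undefined
    return {name: raw.decode(name, errors="replace") for name in CODE_PAGES}

for value in (0x80, 0x82, 0xA4, 0xE9):
    print(f"0x{value:02X}", interpretations(value))
```

For instance, é is byte 0x82 in Code Page 437 but 0xE9 in ISO-8859-1, and the Euro sign is 0x80 in Windows-1252 but 0xA4 in ISO-8859-15, so a file must be decoded with the same code page it was written in.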

Python Example

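# Look up the code point of an accented character
char = 'é'
code_point = ord(char)
print(code_point)  # Output: 233 (Unicode code point, same value as in ISO-8859-1)

# ord() works on Unicode code points; the byte value depends on the encoding
print(char.encode('latin-1'))  # Output: b'\xe9' (233 in ISO-8859-1)
print(char.encode('cp437'))    # Output: b'\x82' (130 in Code Page 437)

# Convert the code point back to a character
print(chr(233))  # Output: é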

Key Takeaways

  • Extended ASCII expands original ASCII to 8-bit (256 characters).
  • Different encoding systems define values above 127 differently.
  • Not all systems use the same extended ASCII characters.
  • Modern Unicode (UTF-8) supersedes extended ASCII but maintains compatibility.

ASCII Control Characters (0–31, 127)

Control characters are non-printable ASCII characters (0–31, 127) used for text formatting, device communication, and control signaling. They were originally designed for teletypes, terminals, and network protocols.

Technical Characteristics

  • 7-bit encoding (0–31, 127)
  • Non-printable characters
  • Used for cursor movement, device control, and text formatting
  • Common in command-line interfaces, networking, and serial communication

Complete ASCII Control Character Table (0–31, 127)

| Dec  | Bin       | Hex  | Abbr | Name                        | Function                                  | Python Escape Sequence |
|------|----------|------|------|-----------------------------|------------------------------------------|------------------------|
| 0    | 0000000  | 0x00 | NUL  | Null                        | String terminator in C, no effect in text | `\x00`                |
| 1    | 0000001  | 0x01 | SOH  | Start of Heading            | Marks the start of a message header      | `\x01`                |
| 2    | 0000010  | 0x02 | STX  | Start of Text               | Marks the start of the message body      | `\x02`                |
| 3    | 0000011  | 0x03 | ETX  | End of Text                 | Indicates end of a text transmission     | `\x03`                |
| 4    | 0000100  | 0x04 | EOT  | End of Transmission         | Terminates a transmission session        | `\x04`                |
| 5    | 0000101  | 0x05 | ENQ  | Enquiry                     | Requests a response from the receiver    | `\x05`                |
| 6    | 0000110  | 0x06 | ACK  | Acknowledge                 | Confirms successful reception            | `\x06`                |
| 7    | 0000111  | 0x07 | BEL  | Bell (Alert)                | Triggers an audible beep                 | `\a` or `\x07`        |
| 8    | 0001000  | 0x08 | BS   | Backspace                   | Moves cursor one position back           | `\b` or `\x08`        |
| 9    | 0001001  | 0x09 | HT   | Horizontal Tab              | Moves cursor to the next tab stop        | `\t` or `\x09`        |
| 10   | 0001010  | 0x0A | LF   | Line Feed (Newline)         | Moves cursor to the next line            | `\n` or `\x0A`        |
| 11   | 0001011  | 0x0B | VT   | Vertical Tab                | Moves cursor vertically                  | `\v` or `\x0B`        |
| 12   | 0001100  | 0x0C | FF   | Form Feed                   | Advances paper to a new page (printers)  | `\f` or `\x0C`        |
| 13   | 0001101  | 0x0D | CR   | Carriage Return             | Moves cursor to the start of the line    | `\r` or `\x0D`        |
| 14   | 0001110  | 0x0E | SO   | Shift Out                   | Switches to alternate character set      | `\x0E`                |
| 15   | 0001111  | 0x0F | SI   | Shift In                    | Switches back to default character set   | `\x0F`                |
| 16   | 0010000  | 0x10 | DLE  | Data Link Escape            | Marks start of a control sequence        | `\x10`                |
| 17   | 0010001  | 0x11 | DC1  | Device Control 1 (XON)      | Resumes paused transmission              | `\x11`                |
| 18   | 0010010  | 0x12 | DC2  | Device Control 2            | User-defined device control              | `\x12`                |
| 19   | 0010011  | 0x13 | DC3  | Device Control 3 (XOFF)     | Pauses transmission                      | `\x13`                |
| 20   | 0010100  | 0x14 | DC4  | Device Control 4            | User-defined device control              | `\x14`                |
| 21   | 0010101  | 0x15 | NAK  | Negative Acknowledge        | Signals an error or failed reception     | `\x15`                |
| 22   | 0010110  | 0x16 | SYN  | Synchronous Idle            | Synchronization signal for transmission  | `\x16`                |
| 23   | 0010111  | 0x17 | ETB  | End of Transmission Block   | Signals end of a data block              | `\x17`                |
| 24   | 0011000  | 0x18 | CAN  | Cancel                      | Cancels previous command                 | `\x18`                |
| 25   | 0011001  | 0x19 | EM   | End of Medium               | Marks end of a storage medium            | `\x19`                |
| 26   | 0011010  | 0x1A | SUB  | Substitute                  | Used as a placeholder for corrupt data   | `\x1A`                |
| 27   | 0011011  | 0x1B | ESC  | Escape                      | Starts an escape sequence (ANSI codes)   | `\x1B`                |
| 28   | 0011100  | 0x1C | FS   | File Separator              | Separates data within a file             | `\x1C`                |
| 29   | 0011101  | 0x1D | GS   | Group Separator             | Separates groups of data                 | `\x1D`                |
| 30   | 0011110  | 0x1E | RS   | Record Separator            | Separates records in a database          | `\x1E`                |
| 31   | 0011111  | 0x1F | US   | Unit Separator              | Separates units of data                  | `\x1F`                |
| 127  | 1111111  | 0x7F | DEL  | Delete                      | Erases previous character (historical)   | `\x7F`                |
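
Because control characters are invisible, it can help to render them in caret notation (for example `^[` for ESC), the convention many terminals and editors use. The sketch below is one way to do that; the function name `caret_notation` is an assumption made for this example.

```python
def caret_notation(text):
    """Replace ASCII control characters with caret notation (e.g. ESC -> ^[)."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 32 or code == 127:
            # Flipping b6 (value 64) maps 0-31 to '@'..'_' and 127 (DEL) to '?'
            out.append("^" + chr(code ^ 64))
        else:
            out.append(ch)
    return "".join(out)

print(caret_notation("a\tb\r\n"))         # Output: a^Ib^M^J
print(caret_notation("\x1b[0m\x00\x7f"))  # Output: ^[[0m^@^?
```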

Python Example

print("Hello\bWorld!")  # Backspace (\b) moves the cursor back, so 'W' overwrites 'o' on most terminals
print("Column1\tColumn2")  # Uses tab (\t) for spacing
print("New\nLine")  # Uses newline (\n) for line break
print("\033[1;31mRed Text\033[0m")  # Uses escape (\x1B) for terminal color formatting

Key Takeaways

  • Control characters (0–31, 127) are non-printable ASCII characters used for device control, formatting, and communication.
  • Python provides escape sequences for common control characters.
  • ANSI escape sequences (ESC, \x1B) are widely used for terminal manipulation.
  • Many control characters originated from teletype machines but are still relevant in networking and text processing.

ASCII in Programming

ASCII is fundamental in programming for text encoding, data transmission, and character manipulation. It ensures cross-platform compatibility and serves as the basis for modern encodings like UTF-8.

Key Uses of ASCII in Programming

  • String Manipulation – Converting between characters and their ASCII values.
  • Text Encoding & Decoding – Handling different character sets.
  • Data Transmission & Protocols – Used in network communication (HTTP, SMTP).
  • Terminal Control – ANSI escape codes for color, cursor movement, and formatting.
  • File Handling – Processing text-based file formats (CSV, TXT, JSON).
  • Low-Level Operations – Character arithmetic, bitwise operations.

1. Character Conversion

Python provides built-in functions for converting between characters and ASCII values.

# Convert character to ASCII
print(ord('A'))  # Output: 65

# Convert ASCII value to character
print(chr(65))  # Output: A

# Convert lowercase to uppercase using ASCII arithmetic
print(chr(ord('a') - 32))  # Output: A

# Convert uppercase to lowercase
print(chr(ord('G') + 32))  # Output: g

# Convert entire string to ASCII values
ascii_values = [ord(char) for char in "Python"]
print(ascii_values)  # Output: [80, 121, 116, 104, 111, 110]

# Convert ASCII values back to string
char_string = ''.join(chr(num) for num in ascii_values)
print(char_string)  # Output: Python

2. String Processing (Uppercase-lowercase Conversion)

text = "Hello, World!"

# Convert to uppercase using ASCII
uppercase_text = ''.join(chr(ord(char) - 32) if 'a' <= char <= 'z' else char for char in text)
print(uppercase_text)  # Output: HELLO, WORLD!

# Convert to lowercase using ASCII
lowercase_text = ''.join(chr(ord(char) + 32) if 'A' <= char <= 'Z' else char for char in text)
print(lowercase_text)  # Output: hello, world!

Filtering for ASCII Characters:

text = "Python3 is cool! 😎"

# Keep only ASCII characters
ascii_only = ''.join(char for char in text if ord(char) < 128)
print(ascii_only)  # Output: Python3 is cool!

3. ASCII in Networking & Data Transmission

import socket

host = "example.com"
port = 80

# ASCII-encoded HTTP request
request = "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"

# Open a socket connection
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((host, port))
    s.sendall(request.encode('ascii'))  # Send as ASCII
    # Only the first chunk is read; the body may contain non-ASCII bytes, so replace them
    response = s.recv(1024).decode('ascii', errors='replace')

print(response)

4. ASCII in Terminal Control (ANSI Escape Sequences)

Formatting:

# Print colored text using ANSI escape sequences (introduced by the ASCII ESC character)
print("\033[1;31mThis is red text\033[0m")
print("\033[1;32mThis is green text\033[0m")
print("\033[1;33mThis is yellow text\033[0m")
print("\033[1;34mThis is blue text\033[0m")

# Resetting the terminal (which also clears the screen) using the ESC c sequence
print("\033c")  # Resets the terminal

Moving Cursor:

import time

print("This will be overwritten", end="\r", flush=True)  # flush so the text appears before the pause
time.sleep(2)  # Wait for 2 seconds
print("New text after overwrite")

5. ASCII-Based File Processing

Writing and reading ASCII Files:

# Writing an ASCII text file
with open("ascii_example.txt", "w", encoding="ascii") as file:
    file.write("Hello, ASCII!\nThis is a test file.")

# Reading an ASCII text file
with open("ascii_example.txt", "r", encoding="ascii") as file:
    content = file.read()
    print(content)  # Prints the two lines written above

Checking for Non-ASCII Characters:

# Check if a file contains non-ASCII characters
with open("ascii_example.txt", "r", encoding="utf-8") as file:
    content = file.read()
    contains_non_ascii = any(ord(char) > 127 for char in content)
    print("Contains non-ASCII characters:", contains_non_ascii)

6. ASCII in Cryptography & Encoding

Base64 Encoding (ASCII Compatible):

import base64

# Encode a string in Base64 (ASCII-safe)
text = "Hello, ASCII!"
encoded_text = base64.b64encode(text.encode('ascii'))
print(encoded_text)  # Output: b'SGVsbG8sIEFTQ0lJIQ=='

# Decode Base64
decoded_text = base64.b64decode(encoded_text).decode('ascii')
print(decoded_text)  # Output: Hello, ASCII!

Hex Encoding ASCII Characters:

# Convert string to hex representation
hex_encoded = ' '.join(hex(ord(char)) for char in "ASCII")
print(hex_encoded)  # Output: 0x41 0x53 0x43 0x49 0x49

# Convert hex back to string
hex_decoded = ''.join(chr(int(hex_code, 16)) for hex_code in hex_encoded.split())
print(hex_decoded)  # Output: ASCII

7. ASCII in Bitwise Operations

Checking If a Character is Uppercase Using Bitwise AND:

def is_uppercase(char):
    return not (ord(char) & 32) if 'A' <= char <= 'Z' else False

print(is_uppercase('A'))  # Output: True
print(is_uppercase('a'))  # Output: False

Bitwise Character Manipulation:

# Convert uppercase to lowercase using bitwise OR
lower = chr(ord('A') | 32)
print(lower)  # Output: a

# Convert lowercase to uppercase using bitwise AND
upper = chr(ord('a') & ~32)
print(upper)  # Output: A

# Toggle case using XOR (only for ASCII letters)
def toggle_case(char):
    return chr(ord(char) ^ 32) if 'A' <= char <= 'Z' or 'a' <= char <= 'z' else char

print(toggle_case('a'))  # Output: A
print(toggle_case('Z'))  # Output: z

Key Takeaways

  • Python provides built-in ASCII handling via ord(), chr(), and string operations.
  • ASCII arithmetic allows case conversion and character manipulation efficiently.
  • ANSI escape sequences enable advanced terminal formatting.
  • ASCII-based networking ensures compatibility in HTTP, SMTP, and other protocols.
  • Bitwise operations provide efficient ASCII character processing.

ASCII in Text Files and Communication

ASCII plays a crucial role in text files and communication protocols by ensuring consistent encoding, data transmission, and formatting. It is widely used in log files, configuration files, network protocols (HTTP, SMTP, FTP), and structured text formats (CSV, JSON, XML).

Writing and Reading ASCII Text Files

Writing to an ASCII File:

# Writing ASCII text to a file
with open("ascii_text.txt", "w", encoding="ascii") as file:
    file.write("Hello, ASCII!\nThis is a text file.\nLine 3.")

Reading an ASCII File:

# Reading an ASCII text file
with open("ascii_text.txt", "r", encoding="ascii") as file:
    content = file.read()
    print(content)

Appending to an ASCII File:

# Append new content to an existing ASCII file
with open("ascii_text.txt", "a", encoding="ascii") as file:
    file.write("\nAppending new ASCII content.")

Handling Non-ASCII Characters in Files

Detecting Non-ASCII Characters:

# Detect and list non-ASCII characters in a file
with open("ascii_text.txt", "r", encoding="utf-8") as file:
    content = file.read()

non_ascii_chars = [char for char in content if ord(char) > 127]
print("Non-ASCII Characters Found:", non_ascii_chars if non_ascii_chars else "None")

Removing Non-ASCII Characters:

# Remove all non-ASCII characters from text
def remove_non_ascii(text):
    return ''.join(char for char in text if ord(char) < 128)

cleaned_text = remove_non_ascii(content)
print(cleaned_text)
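
Dropping non-ASCII characters outright loses information (for example, "Café" becomes "Caf"). A common alternative, sketched here with the standard `unicodedata` module, is to strip diacritics first so accented letters fall back to their base letters. The function name `to_ascii` is an assumption made for this example, and characters with no ASCII base letter are still removed.

```python
import unicodedata

def to_ascii(text):
    """Approximate text in ASCII by dropping diacritics; other non-ASCII is removed."""
    # NFKD splits 'é' into 'e' plus a combining accent (U+0301)
    decomposed = unicodedata.normalize("NFKD", text)
    # errors="ignore" discards whatever still has no ASCII equivalent
    return decomposed.encode("ascii", errors="ignore").decode("ascii")

print(to_ascii("Café Zürich"))  # Output: Cafe Zurich
```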

ASCII in CSV and JSON Files

Saving ASCII Data to a CSV File:

import csv

# Writing ASCII data to a CSV file
data = [["Name", "Age"], ["Alice", "25"], ["Bob", "30"]]

with open("ascii_data.csv", "w", newline='', encoding="ascii") as file:
    writer = csv.writer(file)
    writer.writerows(data)

Reading ASCII Data from a CSV File:

# Reading ASCII data from a CSV file
with open("ascii_data.csv", "r", newline='', encoding="ascii") as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

Saving ASCII Data to a JSON File:

import json

data = {"name": "Alice", "age": 25, "city": "New York"}

# Writing ASCII JSON file
with open("ascii_data.json", "w", encoding="ascii") as file:
    json.dump(data, file, ensure_ascii=True)

# Reading ASCII JSON file
with open("ascii_data.json", "r", encoding="ascii") as file:
    loaded_data = json.load(file)
    print(loaded_data)
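
The `ensure_ascii=True` setting used above matters once the data contains non-ASCII text: the `json` module then writes `\uXXXX` escapes so the output stays pure ASCII. A short sketch, with a sample record chosen for this example:

```python
import json

record = {"city": "Zürich"}

# ensure_ascii=True (the default) escapes non-ASCII characters as \uXXXX
escaped = json.dumps(record, ensure_ascii=True)
print(escaped)            # Output: {"city": "Z\u00fcrich"}
print(escaped.isascii())  # Output: True

# ensure_ascii=False keeps the character; the file must then use UTF-8
print(json.dumps(record, ensure_ascii=False))  # Output: {"city": "Zürich"}

# Both forms load back to the same data
assert json.loads(escaped) == record
```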

ASCII in Network Communication

Sending an ASCII-Based HTTP Request:

import socket

host = "example.com"
port = 80

# ASCII-encoded HTTP request
request = "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((host, port))
    s.sendall(request.encode('ascii'))  # Send as ASCII
    # Headers are ASCII, but the body may not be; replace any non-ASCII bytes
    response = s.recv(4096).decode('ascii', errors='replace')

print(response)

Encoding an Email Message (SMTP ASCII Communication):

import smtplib

# Placeholder addresses for illustration
sender = "sender@example.com"
receiver = "receiver@example.com"

# ASCII-encoded email message
message = """\
From: sender@example.com
To: receiver@example.com
Subject: ASCII Test Email

Hello, this is an ASCII-encoded email message.
"""

# Sending email using ASCII SMTP
with smtplib.SMTP("smtp.example.com", 25) as server:
    server.sendmail(sender, receiver, message.encode('ascii'))

ASCII in Log Files and System Communication

Writing Logs in ASCII Format:

import logging

# Configure logging to store messages in ASCII format
# (the encoding argument of basicConfig requires Python 3.9 or later)
logging.basicConfig(filename="ascii_logs.txt", level=logging.INFO, encoding="ascii")

# Write ASCII log messages
logging.info("INFO: ASCII log entry recorded.")
logging.warning("WARNING: This is an ASCII-based warning.")

Reading and Filtering Logs for ASCII Compliance:

# Read ASCII log file and filter for errors
with open("ascii_logs.txt", "r", encoding="ascii") as file:
    logs = file.readlines()

error_logs = [log for log in logs if "ERROR" in log]
print("Error Logs:", error_logs if error_logs else "No errors found.")

ASCII in URL Encoding (Percent-Encoding)

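URLs may contain only a limited set of ASCII characters. Anything outside the unreserved set (letters, digits, `-`, `.`, `_`, `~`), including spaces, reserved delimiters used as data, and all non-ASCII characters, must be percent-encoded to ensure safe transmission.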

Encoding a URL:

import urllib.parse

url = "https://example.com/search?q=Hello ASCII!"
# quote() keeps only letters, digits, and "_.-~/" by default, so ':', '?', and '=' are escaped too
encoded_url = urllib.parse.quote(url)
print(encoded_url)  # Output: https%3A//example.com/search%3Fq%3DHello%20ASCII%21

Decoding an ASCII URL:

decoded_url = urllib.parse.unquote(encoded_url)
print(decoded_url)  # Output: https://example.com/search?q=Hello ASCII!
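
Quoting a whole URL also escapes its structural characters (`:`, `?`, `=`), which is rarely what a request needs. A more typical pattern, sketched below with standard `urllib.parse` functions, is to encode only the individual components. The base URL and query values are taken from the example above.

```python
from urllib.parse import quote, urlencode

base = "https://example.com/search"

# urlencode() escapes each query value; spaces become '+'
query = urlencode({"q": "Hello ASCII!"})
print(f"{base}?{query}")  # Output: https://example.com/search?q=Hello+ASCII%21

# quote() on a single component uses %20 for spaces; safe="" also escapes '/'
print(quote("Hello ASCII!", safe=""))  # Output: Hello%20ASCII%21

# Non-ASCII characters are encoded as UTF-8 bytes, then percent-escaped
print(quote("café"))  # Output: caf%C3%A9
```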

ASCII in Serial Communication

Many devices (e.g., Arduino, microcontrollers) communicate via ASCII-based serial protocols.

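import serial  # Third-party pyserial package

# Open serial connection ('COM3' is a Windows port name; Linux uses names like '/dev/ttyUSB0')
ser = serial.Serial('COM3', 9600, timeout=1)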

# Send ASCII data
ser.write("Hello, device!\n".encode('ascii'))

# Read response
response = ser.readline().decode('ascii')
print(response)

# Close serial connection
ser.close()

Key Takeaways

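  • ASCII is a common baseline encoding for plain-text files (TXT, CSV, JSON); a UTF-8 file that contains only ASCII characters is byte-for-byte identical to its ASCII form.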
  • Networking protocols (HTTP, SMTP, FTP) use ASCII encoding for communication.
  • ASCII encoding ensures compatibility in log files, system messages, and URLs.
  • Serial communication and hardware interfaces rely on ASCII-based protocols.

ASCII Vs. Unicode

ASCII and Unicode are both character encoding standards, but Unicode extends ASCII to support all languages and symbols.

| Feature | ASCII | Unicode (UTF-8) |
|---------|-------|-----------------|
| Bit Size | 7-bit (0-127) | Variable (1 to 4 bytes per character in UTF-8) |
| Character Count | 128 characters | Over 143,000 characters |
| Language Support | English only | Multilingual (all scripts) |
| File Size | Smaller | Larger for non-ASCII text |
| Encoding Types | Fixed-width (7-bit) | UTF-8, UTF-16, UTF-32 |

| Character | ASCII (Binary) | ASCII (Hex) | UTF-8 (Hex)   |
|-----------|---------------|-------------|---------------|
| A         | 01000001      | 0x41        | 0x41          |
| €         | N/A           | N/A         | 0xE2 0x82 0xAC |
| 😊        | N/A           | N/A         | 0xF0 0x9F 0x98 0x8A |

ASCII Characteristics

  • 7-bit encoding (values 0-127), stored in 8-bit bytes with the MSB set to 0.
  • Supports basic English characters, digits, punctuation, and control characters.
  • Efficient for text storage but lacks support for non-English languages.

Python Example:

# ASCII encoding
ascii_text = "Hello, ASCII!".encode("ascii")
print(ascii_text)  # Output: b'Hello, ASCII!'

# Decoding ASCII back to string
decoded_text = ascii_text.decode("ascii")
print(decoded_text)  # Output: Hello, ASCII!

Unicode Characteristics

  • Supports all written languages, symbols, and emojis.
  • Encoding forms (variable-width except UTF-32):
    • UTF-8: 1-4 bytes per character (backward-compatible with ASCII).
    • UTF-16: 2 or 4 bytes per character.
    • UTF-32: Fixed 4 bytes per character.
  • Unicode includes ASCII as a subset, ensuring compatibility.
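The byte widths listed above can be checked directly. A minimal sketch, assuming the little-endian codec variants so that no byte order mark is counted; `encoded_lengths` is an illustrative name:

```python
samples = {"A": "A", "euro": "\u20ac", "emoji": "\U0001F60A"}

def encoded_lengths(char: str) -> dict:
    """Number of bytes one character occupies in each Unicode encoding form."""
    # The -le variants avoid counting the byte order mark (BOM)
    return {enc: len(char.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

for name, char in samples.items():
    print(name, encoded_lengths(char))
# A {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
# euro {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# emoji {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```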

Python Example:

# Unicode encoding (UTF-8)
unicode_text = "Hello, 你好, 😊".encode("utf-8")
print(unicode_text)  # Output: b'Hello, \xe4\xbd\xa0\xe5\xa5\xbd, \xf0\x9f\x98\x8a'

# Decoding UTF-8 back to string
decoded_unicode = unicode_text.decode("utf-8")
print(decoded_unicode)  # Output: Hello, 你好, 😊

Compatibility

  • ASCII files remain valid in Unicode (UTF-8) since ASCII (0-127) maps directly to UTF-8.
  • Non-ASCII characters require multi-byte encoding in Unicode.

Python Example:

def is_ascii(text):
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(is_ascii("Hello"))  # Output: True
print(is_ascii("你好"))  # Output: False
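The same check is available as the built-in `str.isascii()` (Python 3.7 and later). The short sketch below also illustrates the compatibility claim: ASCII bytes decode to the same string under both the `ascii` and `utf-8` codecs.

```python
text = "Hello, ASCII!"

# str.isascii() tests the same property without try/except
print(text.isascii())         # True
print("caf\u00e9".isascii())  # False, U+00E9 is outside 0-127

# ASCII bytes decode identically under both codecs
raw = text.encode("ascii")
print(raw.decode("utf-8") == raw.decode("ascii"))  # True
```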

Key Takeaways

  • ASCII (7-bit) is limited to English, whereas Unicode (UTF-8) supports all languages.
  • UTF-8 is the most widely used encoding, with ASCII as a subset.
  • ASCII is efficient for small text files, while Unicode is required for global applications.
  • Python 3 strings are Unicode, and both source files and str.encode() default to UTF-8, ensuring full Unicode support.

ASCII Art

ASCII art is a technique that represents images, symbols, or designs using ASCII characters. It is commonly used in command-line tools, email signatures, retro computing, and visual effects.

| Feature        | Description                                                  |
|----------------|--------------------------------------------------------------|
| Character Set  | Uses ASCII characters (32-126) for visual representation     |
| Common Uses    | CLI applications, email signatures, banners, text-based UIs  |
| Tools          | FIGlet, cowsay, pyfiglet, art Python module                  |
| Display Medium | Terminals, web pages, text files                             |

  /\_/\
 ( o.o )
 > ^_^ <

Using pyfiglet for Styled ASCII Text

import pyfiglet

ascii_art = pyfiglet.figlet_format("ASCII Art")
print(ascii_art)

Output (illustrative; the actual rendering depends on the input text and font):

     _    _       _   
    / \  (_) __ _| |_ 
   / _ \ | |/ _` | __|
  / ___ \| | (_| | |_ 
 /_/   \_\_|\__,_|\__|

Using art Library for Predefined ASCII Art

from art import text2art

ascii_text = text2art("Python", font="block")
print(ascii_text)

Output (illustrative; the actual rendering depends on the font and library version):

 ██████╗ ██╗   ██╗████████╗██╗  ██╗ ██████╗ ███╗   ██╗
 ██╔══██╗██║   ██║╚══██╔══╝██║  ██║██╔═══██╗████╗  ██║
 ██████╔╝██║   ██║   ██║   ███████║██║   ██║██╔██╗ ██║
 ██╔═══╝ ██║   ██║   ██║   ██╔══██║██║   ██║██║╚██╗██║
 ██║     ╚██████╔╝   ██║   ██║  ██║╚██████╔╝██║ ╚████║
 ╚═╝      ╚═════╝    ╚═╝   ╚═╝  ╚═╝ ╚═════╝ ╚═╝  ╚═══╝

Generating Animated ASCII Art

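ASCII animations are created by redrawing frames in place, using either \r (carriage return, which moves the cursor back to the start of the line) or the escape sequence \033c (which resets and clears the terminal). The loop below runs until interrupted with Ctrl+C.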

import time
frames = [
    "   (o_o)   ",
    "  (o_o)    ",
    " (o_o)     ",
    "  (o_o)    "
]

while True:
    for frame in frames:
        print("\033c", end="")  # Clear the screen
        print(frame)
        time.sleep(0.2)
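A variant that uses `\r` instead of clearing the whole screen, and that stops after a fixed number of cycles. This is a sketch; the `animate` function and its `cycles`, `delay`, and `stream` parameters are illustrative assumptions:

```python
import io
import sys
import time

def animate(frames, cycles=3, delay=0.2, stream=sys.stdout):
    """Redraw frames on a single line for a fixed number of cycles."""
    for _ in range(cycles):
        for frame in frames:
            # "\r" moves the cursor to column 0 so the frame overwrites the previous one
            stream.write("\r" + frame)
            stream.flush()
            time.sleep(delay)
    stream.write("\n")  # leave the cursor on a fresh line when done

frames = ["   (o_o)   ", "  (o_o)    ", " (o_o)     ", "  (o_o)    "]
animate(frames, cycles=1, delay=0.05)

# Capturing to a buffer shows exactly which characters are written
buffer = io.StringIO()
animate(["ab", "cd"], cycles=1, delay=0, stream=buffer)
print(repr(buffer.getvalue()))  # '\rab\rcd\n'
```

Because every frame has the same width, each redraw fully covers the previous one; frames of different lengths would need padding.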

ASCII Art in Cybersecurity (Steganography)

ASCII can be used to hide messages inside text-based images.

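hidden_message = "SECRET"

# Raw string: keeps the backslashes literal, so a trailing "\" does not
# act as a line continuation and "\_" is not treated as an escape
ascii_art = r"""
  /\_/\
 ( o.o )
 > ^_^ <
""".strip("\n")

# Append one character of the hidden message to the end of each line
stego_art = "\n".join(
    f"{line} {hidden_message[i % len(hidden_message)]}"
    for i, line in enumerate(ascii_art.split("\n"))
)
print(stego_art)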
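Hiding a message is only useful if it can be recovered. A minimal sketch of the matching extraction step, assuming the same scheme of one appended character per line; the `embed` and `extract` function names are hypothetical:

```python
def embed(art_lines, message):
    """Append one message character to each line, cycling through the message."""
    return [f"{line} {message[i % len(message)]}" for i, line in enumerate(art_lines)]

def extract(stego_lines):
    """Recover the appended characters by reading the last character of each line."""
    return "".join(line[-1] for line in stego_lines if line)

# Raw strings keep the backslashes in the art literal
cat = [r"  /\_/\ ", r" ( o.o )", r" > ^_^ <"]
stego = embed(cat, "SECRET")
print("\n".join(stego))
print(extract(stego))  # SEC
```

With this scheme the capacity is one character per line, so a three-line image carries only the first three characters of the message.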

ASCII in Cybersecurity

ASCII is widely used in cybersecurity for data encoding, obfuscation, exploits, payload delivery, and forensic analysis. ASCII-based encodings (Base64, URL encoding, hexadecimal) are used to transform data in network attacks, malware, and encryption schemes, and defenders rely on the same encodings to analyze them.

| Security Use Case      | Description                                                   |
|------------------------|---------------------------------------------------------------|
| Encoding & Obfuscation | ASCII-based encodings (Base64, Hex, URL encoding) hide data   |
| Payload Injection      | ASCII characters are used in exploits and buffer overflows    |
| Cryptography           | ASCII characters represent encrypted text and hashes          |
| Log Analysis           | ASCII logs track system and network activities                |
| Malware Analysis       | ASCII strings reveal hidden commands inside malware binaries  |

ASCII Encoding & Obfuscation in Cybersecurity

Base64 Encoding (Common in Malware & Data Exfiltration):

Attackers encode payloads in Base64 to evade detection in network traffic.

import base64

payload = "DROP TABLE users;"  # SQL Injection payload
encoded_payload = base64.b64encode(payload.encode("ascii")).decode("ascii")
print(encoded_payload)  # Output: RFJPUCBUQUJMRSB1c2Vyczs=

# Decoding Base64
decoded_payload = base64.b64decode(encoded_payload).decode("ascii")
print(decoded_payload)  # Output: DROP TABLE users;
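On the defensive side, a scanner can test whether a suspicious token is valid Base64 that decodes to readable text. A hedged sketch, with `try_base64_ascii` as a hypothetical helper name; real detection tooling would need more context than this:

```python
import base64
import binascii

def try_base64_ascii(candidate: str):
    """Return the decoded text if candidate is valid Base64 for printable ASCII, else None."""
    try:
        # validate=True rejects characters outside the Base64 alphabet
        raw = base64.b64decode(candidate, validate=True)
    except (binascii.Error, ValueError):
        return None
    # Keep only results made entirely of printable ASCII (space through tilde)
    if raw and all(32 <= b <= 126 for b in raw):
        return raw.decode("ascii")
    return None

print(try_base64_ascii("RFJPUCBUQUJMRSB1c2Vyczs="))  # DROP TABLE users;
print(try_base64_ascii("not base64!"))               # None
```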

Hexadecimal Encoding (Used in Exploits & Binary Obfuscation):

Malware uses hex encoding to disguise payloads in binary files.

# Convert ASCII to Hex
hex_payload = "DROP TABLE users;".encode("ascii").hex()
print(hex_payload)  # Output: 44524f50205441424c452075736572733b

# Convert Hex back to ASCII
decoded_hex = bytes.fromhex(hex_payload).decode("ascii")
print(decoded_hex)  # Output: DROP TABLE users;

URL Encoding (Used in Web Exploits & Phishing):

Attackers use URL encoding to try to bypass input validation that inspects only the raw, still-encoded input.

import urllib.parse

url_payload = "admin' OR '1'='1"
encoded_url = urllib.parse.quote(url_payload)
print(encoded_url)  # Output: admin%27%20OR%20%271%27%3D%271

# Decode URL payload
decoded_url = urllib.parse.unquote(encoded_url)
print(decoded_url)  # Output: admin' OR '1'='1
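Because a value can be percent-encoded more than once, filters usually normalize input before inspecting it. A minimal sketch of bounded repeated decoding; `fully_unquote` and its `max_rounds` parameter are illustrative assumptions:

```python
from urllib.parse import unquote

def fully_unquote(value: str, max_rounds: int = 5) -> str:
    """Percent-decode repeatedly until the value stops changing (bounded)."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value

print(fully_unquote("admin%27%20OR%20%271%27%3D%271"))  # admin' OR '1'='1
print(fully_unquote("admin%2527"))                      # admin' (double-encoded quote)
```

Normalizing input this way helps inspection and logging; it does not replace parameterized queries or proper output escaping.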

ASCII-Based Exploits

ASCII Buffer Overflow (Injecting Shellcode):

Sending an excessively long ASCII input to a program that does not check buffer bounds can overwrite adjacent memory and, in the worst case, execute malicious code.

# Generating a simple over-long test string
payload = "A" * 100  # 100 'A' characters (0x41), easy to spot in a memory dump
print(payload)

ASCII Character Injection (SQL Injection):

Injecting SQL syntax through ASCII input that is concatenated into a query changes the query's logic.

user_input = "' OR 1=1 -- "
query = f"SELECT * FROM users WHERE username = '{user_input}'"
print(query)

Output:

SELECT * FROM users WHERE username = '' OR 1=1 -- '
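The standard defence is to pass user input as a bound parameter rather than formatting it into the SQL text. A self-contained sketch using the standard-library `sqlite3` module with an in-memory database; the table layout and sample rows are assumptions made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "admin"), ("bob", "user")])

user_input = "' OR 1=1 -- "

# Vulnerable: the input becomes part of the SQL text
unsafe_rows = conn.execute(
    f"SELECT * FROM users WHERE username = '{user_input}'"
).fetchall()

# Safer: the ? placeholder passes the input as data, never as SQL
safe_rows = conn.execute(
    "SELECT * FROM users WHERE username = ?", (user_input,)
).fetchall()

print(len(unsafe_rows))  # 2, every row matches because of OR 1=1
print(len(safe_rows))    # 0, no user is literally named "' OR 1=1 -- "
conn.close()
```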

ASCII in Malware Analysis

Extracting ASCII Strings from Malware:

Analysts extract ASCII strings from binaries to identify hidden commands.

import re

binary_data = b"\x50\x72\x69\x76\x69\x6C\x65\x67\x65\x20\x45\x73\x63\x61\x6C\x61\x74\x69\x6F\x6E"
ascii_strings = re.findall(b"[ -~]{4,}", binary_data)
print(ascii_strings)  # Output: [b'Privilege Escalation']
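In practice analysts usually also want to know where each string sits in the file. A sketch in the spirit of the Unix `strings` utility that reports byte offsets; the `extract_strings` name and the sample bytes are illustrative assumptions:

```python
import re

def extract_strings(data: bytes, min_length: int = 4):
    """Yield (offset, text) for each run of printable ASCII of at least min_length bytes."""
    pattern = re.compile(rb"[ -~]{%d,}" % min_length)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")

# Printable runs separated by non-printable bytes; "abc" is too short to report
sample = b"\x00\x01Privilege Escalation\xff\x00abc\x00cmd.exe"
print(list(extract_strings(sample)))
# [(2, 'Privilege Escalation'), (28, 'cmd.exe')]
```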

ASCII XOR Encryption (Basic Malware Encryption):

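Malware often XORs ASCII strings with a fixed key to hide commands from simple string searches. This is obfuscation rather than strong encryption, because applying the same key again restores the text.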

def xor_encrypt(text, key):
    return ''.join(chr(ord(c) ^ key) for c in text)

payload = "SensitiveData"
key = 42
encrypted = xor_encrypt(payload, key)
print(encrypted)  # Output: yODYC^C\OnK^K

# Decrypt
decrypted = xor_encrypt(encrypted, key)
print(decrypted)  # Output: SensitiveData
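Since a single-byte key has only 256 possible values, an analyst can recover it by trying each one and looking for an expected substring. A minimal sketch, assuming the analyst knows a marker that should appear in the plaintext; `xor_bytes` and `find_xor_keys` are hypothetical names:

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (the operation is its own inverse)."""
    return bytes(b ^ key for b in data)

def find_xor_keys(data: bytes, marker: bytes):
    """Return every single-byte key whose decoding contains the expected marker."""
    return [key for key in range(256) if marker in xor_bytes(data, key)]

obfuscated = xor_bytes(b"SensitiveData", 42)
keys = find_xor_keys(obfuscated, b"Data")
print(keys)                            # [42]
print(xor_bytes(obfuscated, keys[0]))  # b'SensitiveData'
```

Working on `bytes` rather than `str` avoids producing characters outside the ASCII range when the key has its high bit set.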

ASCII in Log Analysis & Cyber Forensics

Converting ASCII Text to SHA-256 Hash:

Hash functions operate on bytes, so text such as a password is first encoded (here as ASCII) and then hashed; the resulting hex digest is itself an ASCII string.

import hashlib

message = "SecureData"
hashed = hashlib.sha256(message.encode("ascii")).hexdigest()
print(hashed)  # Output: a 64-character hexadecimal digest
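For password storage specifically, a single unsalted SHA-256 pass is generally considered too fast to resist guessing. A hedged sketch using the standard library's `hashlib.pbkdf2_hmac` with a random salt and a constant-time comparison; the helper names and the iteration count are illustrative assumptions, not a recommendation from the original text:

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value; tune for your environment

def hash_password(password: str, salt: bytes = None):
    """Derive a salted PBKDF2-HMAC-SHA256 hash; returns (salt, digest)."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("SecureData")
print(stored.hex())  # 64 hex characters; differs on every run because the salt is random
print(verify_password("SecureData", salt, stored))  # True
print(verify_password("WrongGuess", salt, stored))  # False
```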

Key Takeaways

  • ASCII-based encodings (Base64, Hex, URL encoding) are widely used in cybersecurity for data obfuscation.
  • ASCII-based attacks (SQL Injection, Buffer Overflow) manipulate character input.
  • ASCII forensic analysis helps detect hidden malware commands.
  • Cryptographic hashes operate on encoded bytes, and their digests are usually stored and compared as ASCII hex strings.
  • Malware leverages ASCII encoding to hide payloads in legitimate-looking files.