ASCII - CameronAuler/python-devops GitHub Wiki
- Introduction to ASCII
- ASCII Character Encoding
- ASCII Table
- Extended ASCII (8-bit)
- ASCII Control Characters (0–31, 127)
- ASCII in Programming
- ASCII in Text Files and Communication
- ASCII vs. Unicode
- ASCII Art
- ASCII in Cybersecurity
ASCII (American Standard Code for Information Interchange) is a 7-bit character encoding standard used in computers, communication systems, and text-based data processing. Each ASCII character is represented by a 7-bit binary number (values 0 to 127), allowing for a total of 128 distinct characters.
ASCII was first published in 1963 by the American Standards Association (ASA), the body that later became the American National Standards Institute (ANSI), to create a universal encoding system for text and control characters. It was derived from telegraph codes and designed for interoperability between different hardware and software systems.
The ASCII standard became widely adopted in computing, replacing earlier proprietary character sets. It was integrated into early operating systems such as UNIX and MS-DOS and remains foundational in modern character encoding.
- 7-bit encoding (values 0–127)
- Character set categories:
  - Control characters (0–31, 127): non-printable commands (e.g., line feed, carriage return)
  - Printable characters (32–126): letters, numbers, punctuation, and symbols
- Standard byte representation: stored as one byte with the most significant bit (MSB) set to 0 in 8-bit systems
- Compatible with modern encodings (UTF-8, Unicode)
ASCII serves as the basis for text representation in:
- Programming (source code storage)
- Networking protocols (HTTP, SMTP, FTP headers)
- Command-line interfaces (Linux, Windows shells)
- Data transmission (ASCII-based file formats like .txt and .csv)

ASCII remains fundamental in computing despite the evolution of Unicode, ensuring backward compatibility and efficient text representation in legacy systems.
ASCII uses a 7-bit encoding scheme, assigning a unique binary value (0000000 to 1111111) to each character. It supports 128 distinct characters (0-127) and is stored in a single byte (8-bit systems) with the most significant bit (MSB) set to 0.
Each ASCII character is represented as:
b6 b5 b4 b3 b2 b1 b0 (7-bit ASCII)
Where b6 is the most significant bit (MSB), and b0 is the least significant bit (LSB).
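As a small illustration of the bit layout above, the following sketch formats a character's code as a 7-bit string and reads individual bits. The helper names `to_7bit` and `bit` are made up for this example.

```python
def to_7bit(char: str) -> str:
    """Return the 7-bit binary string (b6..b0) for an ASCII character."""
    code = ord(char)
    if code > 127:
        raise ValueError(f"{char!r} is not an ASCII character")
    return format(code, "07b")


def bit(char: str, position: int) -> int:
    """Return bit `position` (0 = LSB, 6 = MSB) of the character's code."""
    return (ord(char) >> position) & 1


print(to_7bit("A"))  # 1000001
print(bit("A", 6))   # 1 (MSB)
print(bit("A", 0))   # 1 (LSB)
print(bit("A", 1))   # 0
```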
| Decimal | Binary | Hex | Category | Example Characters |
|---------|----------------|------|-----------------|--------------------|
| 0–31 | 0000000–0011111 | 0x00–0x1F | Control Characters | NUL, LF, CR, ESC |
| 32–47 | 0100000–0101111 | 0x20–0x2F | Punctuation | Space, `!`, `"`, `#`, `$` |
| 48–57 | 0110000–0111001 | 0x30–0x39 | Digits (0-9) | `0`, `1`, `2` ... `9` |
| 58–64 | 0111010–1000000 | 0x3A–0x40 | Special Symbols | `:`, `;`, `<`, `=`, `>` |
| 65–90 | 1000001–1011010 | 0x41–0x5A | Uppercase Letters | `A` - `Z` |
| 91–96 | 1011011–1100000 | 0x5B–0x60 | Special Symbols | `[`, `\`, `]`, `^`, `_` |
| 97–122 | 1100001–1111010 | 0x61–0x7A | Lowercase Letters | `a` - `z` |
| 123–126 | 1111011–1111110 | 0x7B–0x7E | Special Symbols | `{`, `\|`, `}`, `~` |
| 127 | 1111111 | 0x7F | Control Character | DEL |
Control characters are non-printable and used for text formatting, device control, and communication.
| ASCII Code | Binary | Name | Function |
|------------|--------|------|----------|
| 0 (0x00) | 0000000 | NUL | Terminates strings in C |
| 7 (0x07) | 0000111 | BEL | Triggers system beep |
| 8 (0x08) | 0001000 | BS | Backspace |
| 9 (0x09) | 0001001 | TAB | Horizontal tab |
| 10 (0x0A) | 0001010 | LF | Line feed (Newline) |
| 13 (0x0D) | 0001101 | CR | Carriage return |
| 27 (0x1B) | 0011011 | ESC | Escape sequence starter |
| 127 (0x7F) | 1111111 | DEL | Delete character |
Printable characters include letters, digits, punctuation, and special symbols. They are directly visible in text.
| Decimal | Binary | Hex | Character |
|---------|--------|------|-----------|
| 32 | 0100000 | 0x20 | (Space) |
| 33 | 0100001 | 0x21 | ! |
| 34 | 0100010 | 0x22 | " |
| 35 | 0100011 | 0x23 | # |
| 36 | 0100100 | 0x24 | $ |
| ... | ... | ... | ... |
| 65 | 1000001 | 0x41 | A |
| 66 | 1000010 | 0x42 | B |
| 67 | 1000011 | 0x43 | C |
| ... | ... | ... | ... |
| 97 | 1100001 | 0x61 | a |
| 98 | 1100010 | 0x62 | b |
| 99 | 1100011 | 0x63 | c |
| ... | ... | ... | ... |
| 122 | 1111010 | 0x7A | z |
| 123 | 1111011 | 0x7B | { |
| 124 | 1111100 | 0x7C | \| |
| 125 | 1111101 | 0x7D | } |
| 126 | 1111110 | 0x7E | ~ |
- Stored as 7-bit values in legacy systems.
- In 8-bit systems, ASCII characters are stored with the 8th bit (MSB) set to 0.
- In modern Unicode (UTF-8), ASCII values remain unchanged for backward compatibility.
# Convert character to ASCII value
print(ord('A')) # Output: 65
# Convert ASCII value to character
print(chr(65)) # Output: 'A'
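The storage rules above can be checked directly. This is a minimal sketch: it encodes a sample string (chosen for this example) as ASCII and as UTF-8, then inspects the bytes.

```python
text = "ASCII text"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# Pure ASCII text produces the same bytes in both encodings.
print(ascii_bytes == utf8_bytes)  # True

# Every byte has its most significant bit (0x80) cleared.
print(all((byte & 0x80) == 0 for byte in ascii_bytes))  # True

# A non-ASCII character (U+00E9, e with acute) needs bytes with the MSB set in UTF-8.
print("\u00e9".encode("utf-8"))  # b'\xc3\xa9'
```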
The following table lists the ASCII character set (0–127) in abridged form, covering control characters and printable characters with their decimal, binary, hexadecimal, and character representations. Rows marked `...` stand for omitted ranges.
| Dec | Bin | Hex | Char | Description |
|-----|----------|------|------|-----------------------|
| 0 | 0000000 | 0x00 | NUL | Null |
| 1 | 0000001 | 0x01 | SOH | Start of Heading |
| 2 | 0000010 | 0x02 | STX | Start of Text |
| 3 | 0000011 | 0x03 | ETX | End of Text |
| 4 | 0000100 | 0x04 | EOT | End of Transmission |
| 5 | 0000101 | 0x05 | ENQ | Enquiry |
| 6 | 0000110 | 0x06 | ACK | Acknowledge |
| 7 | 0000111 | 0x07 | BEL | Bell (Beep) |
| 8 | 0001000 | 0x08 | BS | Backspace |
| 9 | 0001001 | 0x09 | TAB | Horizontal Tab |
| 10 | 0001010 | 0x0A | LF | Line Feed (Newline) |
| 11 | 0001011 | 0x0B | VT | Vertical Tab |
| 12 | 0001100 | 0x0C | FF | Form Feed |
| 13 | 0001101 | 0x0D | CR | Carriage Return |
| 14 | 0001110 | 0x0E | SO | Shift Out |
| 15 | 0001111 | 0x0F | SI | Shift In |
| 16 | 0010000 | 0x10 | DLE | Data Link Escape |
| 17 | 0010001 | 0x11 | DC1 | Device Control 1 |
| 18 | 0010010 | 0x12 | DC2 | Device Control 2 |
| 19 | 0010011 | 0x13 | DC3 | Device Control 3 |
| 20 | 0010100 | 0x14 | DC4 | Device Control 4 |
| 21 | 0010101 | 0x15 | NAK | Negative Acknowledge |
| 22 | 0010110 | 0x16 | SYN | Synchronous Idle |
| 23 | 0010111 | 0x17 | ETB | End of Transmission Block |
| 24 | 0011000 | 0x18 | CAN | Cancel |
| 25 | 0011001 | 0x19 | EM | End of Medium |
| 26 | 0011010 | 0x1A | SUB | Substitute |
| 27 | 0011011 | 0x1B | ESC | Escape |
| 28 | 0011100 | 0x1C | FS | File Separator |
| 29 | 0011101 | 0x1D | GS | Group Separator |
| 30 | 0011110 | 0x1E | RS | Record Separator |
| 31 | 0011111 | 0x1F | US | Unit Separator |
| 32 | 0100000 | 0x20 | (space) | Space |
| 33 | 0100001 | 0x21 | ! | Exclamation mark |
| 34 | 0100010 | 0x22 | " | Double quote |
| 35 | 0100011 | 0x23 | # | Hash |
| 36 | 0100100 | 0x24 | $ | Dollar sign |
| 37 | 0100101 | 0x25 | % | Percent |
| 38 | 0100110 | 0x26 | & | Ampersand |
| 39 | 0100111 | 0x27 | ' | Apostrophe |
| 40 | 0101000 | 0x28 | ( | Left parenthesis |
| 41 | 0101001 | 0x29 | ) | Right parenthesis |
| 42 | 0101010 | 0x2A | * | Asterisk |
| 43 | 0101011 | 0x2B | + | Plus sign |
| 44 | 0101100 | 0x2C | , | Comma |
| 45 | 0101101 | 0x2D | - | Minus sign |
| 46 | 0101110 | 0x2E | . | Period (Dot) |
| 47 | 0101111 | 0x2F | / | Slash |
| 48 | 0110000 | 0x30 | 0 | Digit 0 |
| 49 | 0110001 | 0x31 | 1 | Digit 1 |
| 50 | 0110010 | 0x32 | 2 | Digit 2 |
| ... | ... | ... | ... | ... |
| 65 | 1000001 | 0x41 | A | Uppercase A |
| 66 | 1000010 | 0x42 | B | Uppercase B |
| 67 | 1000011 | 0x43 | C | Uppercase C |
| ... | ... | ... | ... | ... |
| 97 | 1100001 | 0x61 | a | Lowercase a |
| 98 | 1100010 | 0x62 | b | Lowercase b |
| 99 | 1100011 | 0x63 | c | Lowercase c |
| ... | ... | ... | ... | ... |
| 122 | 1111010 | 0x7A | z | Lowercase z |
| 123 | 1111011 | 0x7B | { | Left Brace |
| 124 | 1111100 | 0x7C | \| | Vertical Bar |
| 125 | 1111101 | 0x7D | } | Right Brace |
| 126 | 1111110 | 0x7E | ~ | Tilde |
| 127 | 1111111 | 0x7F | DEL | Delete |
- Control characters (0-31, 127) are used for text formatting, device control, and communication.
- Printable characters (32-126) include letters, digits, punctuation, and symbols.
- ASCII remains unchanged in UTF-8 for backward compatibility.
# Convert a character to ASCII
print(ord('A')) # Output: 65
# Convert ASCII to a character
print(chr(65)) # Output: 'A'
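Since the table above skips several ranges, the full set of 128 rows can be generated rather than typed out. The sketch below is one way to do that; `CONTROL_NAMES` and `ascii_row` are names invented for this example, and code 9 uses the standard abbreviation `HT` where the table above writes `TAB`.

```python
# Standard abbreviations for codes 0-31; 127 (DEL) is handled separately.
CONTROL_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]


def ascii_row(code: int) -> str:
    """Format one row: decimal, 7-bit binary, hex, and character or name."""
    if code < 32:
        label = CONTROL_NAMES[code]
    elif code == 127:
        label = "DEL"
    elif code == 32:
        label = "(space)"
    else:
        label = chr(code)
    return f"{code:>3} | {code:07b} | 0x{code:02X} | {label}"


for code in range(128):
    print(ascii_row(code))
```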
Extended ASCII is an 8-bit character encoding system that expands the original 7-bit ASCII (0-127) by adding an additional 128 characters (128-255). This allows for 256 unique characters, incorporating accented letters, symbols, and graphical characters used in different languages and operating systems.
- 8-bit encoding (0-255)
- Backwards compatible with standard ASCII (0-127 remain unchanged)
- Different variations exist based on specific character needs (e.g., ISO-8859-1, Windows-1252)
- Supports international characters (French, German, Spanish, etc.)

The sample rows below mix two code pages: values 128–143 follow IBM code page 437, while values 176–255 follow ISO-8859-1. The same byte value maps to a different character in other extended ASCII variants.

| Decimal | Binary | Hex | Character | Description |
|---------|-----------|------|-----------|----------------------------------|
| 128 | 10000000 | 0x80 | Ç | Latin Capital Letter C with Cedilla |
| 129 | 10000001 | 0x81 | ü | Latin Small Letter U with Diaeresis |
| 130 | 10000010 | 0x82 | é | Latin Small Letter E with Acute |
| 131 | 10000011 | 0x83 | â | Latin Small Letter A with Circumflex |
| 132 | 10000100 | 0x84 | ä | Latin Small Letter A with Diaeresis |
| 133 | 10000101 | 0x85 | à | Latin Small Letter A with Grave |
| 134 | 10000110 | 0x86 | å | Latin Small Letter A with Ring Above |
| 135 | 10000111 | 0x87 | ç | Latin Small Letter C with Cedilla |
| 136 | 10001000 | 0x88 | ê | Latin Small Letter E with Circumflex |
| 137 | 10001001 | 0x89 | ë | Latin Small Letter E with Diaeresis |
| 138 | 10001010 | 0x8A | è | Latin Small Letter E with Grave |
| 139 | 10001011 | 0x8B | ï | Latin Small Letter I with Diaeresis |
| 140 | 10001100 | 0x8C | î | Latin Small Letter I with Circumflex |
| 141 | 10001101 | 0x8D | ì | Latin Small Letter I with Grave |
| 142 | 10001110 | 0x8E | Ä | Latin Capital Letter A with Diaeresis |
| 143 | 10001111 | 0x8F | Å | Latin Capital Letter A with Ring Above |
| ... | ... | ... | ... | ... |
| 176 | 10110000 | 0xB0 | ° | Degree Symbol |
| 177 | 10110001 | 0xB1 | ± | Plus-Minus Symbol |
| 178 | 10110010 | 0xB2 | ² | Superscript 2 |
| 179 | 10110011 | 0xB3 | ³ | Superscript 3 |
| 180 | 10110100 | 0xB4 | ´ | Acute Accent |
| ... | ... | ... | ... | ... |
| 224 | 11100000 | 0xE0 | à | Latin Small Letter A with Grave |
| 225 | 11100001 | 0xE1 | á | Latin Small Letter A with Acute |
| 226 | 11100010 | 0xE2 | â | Latin Small Letter A with Circumflex |
| 227 | 11100011 | 0xE3 | ã | Latin Small Letter A with Tilde |
| 228 | 11100100 | 0xE4 | ä | Latin Small Letter A with Diaeresis |
| 229 | 11100101 | 0xE5 | å | Latin Small Letter A with Ring Above |
| 230 | 11100110 | 0xE6 | æ | Latin Small Letter AE |
| 231 | 11100111 | 0xE7 | ç | Latin Small Letter C with Cedilla |
| ... | ... | ... | ... | ... |
| 255 | 11111111 | 0xFF | ÿ | Latin Small Letter Y with Diaeresis |

Because ASCII was originally only 7-bit, different 8-bit extended ASCII encodings were created. Some common variations include:
- ISO-8859-1 (Latin-1)
  - Used in Western European languages
  - Standard in older Unix and Windows systems
  - Supports accented characters (é, ñ, ø, etc.)
- Windows-1252
  - Microsoft extension of ISO-8859-1
  - Includes additional symbols like € (Euro sign)
  - Used in legacy Windows applications
- IBM Code Page 437
  - Original IBM PC character set
  - Includes box-drawing characters and symbols
- ISO-8859-15 (Latin-9)
  - Replacement for ISO-8859-1
  - Includes the Euro (€) sign and corrected French/Finnish characters
- Legacy text encoding stored characters as single 8-bit bytes.
- Modern systems use Unicode (UTF-8, UTF-16, UTF-32) but remain backward-compatible with ASCII.
- Extended ASCII is NOT a single standard: values above 127 vary by code page, system, and locale.
# Encoding and decoding extended ASCII characters
char = 'é'
ascii_code = ord(char) # ord() returns the Unicode code point
print(ascii_code) # Output: 233 (the same value as the ISO-8859-1 byte)
# Convert the code back to a character
print(chr(233)) # Output: 'é'
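To make the point about diverging code pages concrete, the sketch below decodes the same three bytes under four single-byte encodings that ship with Python. The byte values were picked for this example because they differ between the code pages.

```python
raw = bytes([0x80, 0xA4, 0xE9])

# The same three bytes decoded under four single-byte encodings.
for encoding in ("cp437", "latin-1", "cp1252", "iso-8859-15"):
    print(f"{encoding:<12} -> {raw.decode(encoding)!r}")

# 0x80 is the Euro sign in Windows-1252 but a C with cedilla in code page 437.
print(bytes([0x80]).decode("cp1252") == "\u20ac")  # True
print(bytes([0x80]).decode("cp437") == "\u00c7")   # True

# ISO-8859-15 places the Euro sign at 0xA4 instead.
print(bytes([0xA4]).decode("iso-8859-15") == "\u20ac")  # True
```

Because a byte above 127 carries no indication of its code page, the encoding has to be known from context (file metadata, protocol headers, or convention) before decoding.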
- Extended ASCII expands original ASCII to 8-bit (256 characters).
- Different encoding systems define values above 127 differently.
- Not all systems use the same extended ASCII characters.
- Modern Unicode (UTF-8) supersedes extended ASCII but maintains compatibility.
Control characters are non-printable ASCII characters (0–31, 127) used for text formatting, device communication, and control signaling. They were originally designed for teletypes, terminals, and network protocols.
- 7-bit encoding (0–31, 127)
- Non-printable characters
- Used for cursor movement, device control, and text formatting
- Common in command-line interfaces, networking, and serial communication
| Dec | Bin | Hex | Abbr | Name | Function | Python Escape Sequence |
|------|----------|------|------|-----------------------------|------------------------------------------|------------------------|
| 0 | 0000000 | 0x00 | NUL | Null | String terminator in C, no effect in text | `\x00` |
| 1 | 0000001 | 0x01 | SOH | Start of Heading | Marks the start of a message header | `\x01` |
| 2 | 0000010 | 0x02 | STX | Start of Text | Marks the start of the message body | `\x02` |
| 3 | 0000011 | 0x03 | ETX | End of Text | Indicates end of a text transmission | `\x03` |
| 4 | 0000100 | 0x04 | EOT | End of Transmission | Terminates a transmission session | `\x04` |
| 5 | 0000101 | 0x05 | ENQ | Enquiry | Requests a response from the receiver | `\x05` |
| 6 | 0000110 | 0x06 | ACK | Acknowledge | Confirms successful reception | `\x06` |
| 7 | 0000111 | 0x07 | BEL | Bell (Alert) | Triggers an audible beep | `\a` or `\x07` |
| 8 | 0001000 | 0x08 | BS | Backspace | Moves cursor one position back | `\b` or `\x08` |
| 9 | 0001001 | 0x09 | TAB | Horizontal Tab | Moves cursor to the next tab stop | `\t` or `\x09` |
| 10 | 0001010 | 0x0A | LF | Line Feed (Newline) | Moves cursor to the next line | `\n` or `\x0A` |
| 11 | 0001011 | 0x0B | VT | Vertical Tab | Moves cursor vertically | `\v` or `\x0B` |
| 12 | 0001100 | 0x0C | FF | Form Feed | Advances paper to a new page (printers) | `\f` or `\x0C` |
| 13 | 0001101 | 0x0D | CR | Carriage Return | Moves cursor to the start of the line | `\r` or `\x0D` |
| 14 | 0001110 | 0x0E | SO | Shift Out | Switches to alternate character set | `\x0E` |
| 15 | 0001111 | 0x0F | SI | Shift In | Switches back to default character set | `\x0F` |
| 16 | 0010000 | 0x10 | DLE | Data Link Escape | Marks start of a control sequence | `\x10` |
| 17 | 0010001 | 0x11 | DC1 | Device Control 1 (XON) | Resumes paused transmission | `\x11` |
| 18 | 0010010 | 0x12 | DC2 | Device Control 2 | User-defined device control | `\x12` |
| 19 | 0010011 | 0x13 | DC3 | Device Control 3 (XOFF) | Pauses transmission | `\x13` |
| 20 | 0010100 | 0x14 | DC4 | Device Control 4 | User-defined device control | `\x14` |
| 21 | 0010101 | 0x15 | NAK | Negative Acknowledge | Signals an error or failed reception | `\x15` |
| 22 | 0010110 | 0x16 | SYN | Synchronous Idle | Synchronization signal for transmission | `\x16` |
| 23 | 0010111 | 0x17 | ETB | End of Transmission Block | Signals end of a data block | `\x17` |
| 24 | 0011000 | 0x18 | CAN | Cancel | Cancels previous command | `\x18` |
| 25 | 0011001 | 0x19 | EM | End of Medium | Marks end of a storage medium | `\x19` |
| 26 | 0011010 | 0x1A | SUB | Substitute | Used as a placeholder for corrupt data | `\x1A` |
| 27 | 0011011 | 0x1B | ESC | Escape | Starts an escape sequence (ANSI codes) | `\x1B` |
| 28 | 0011100 | 0x1C | FS | File Separator | Separates data within a file | `\x1C` |
| 29 | 0011101 | 0x1D | GS | Group Separator | Separates groups of data | `\x1D` |
| 30 | 0011110 | 0x1E | RS | Record Separator | Separates records in a database | `\x1E` |
| 31 | 0011111 | 0x1F | US | Unit Separator | Separates units of data | `\x1F` |
| 127 | 1111111 | 0x7F | DEL | Delete | Erases previous character (historical) | `\x7F` |
print("Hello\bWorld!") # Backspace (\b) moves the cursor back, so 'W' overwrites 'o' on most terminals
print("Column1\tColumn2") # Uses tab (\t) for spacing
print("New\nLine") # Uses newline (\n) for line break
print("\033[1;31mRed Text\033[0m") # Uses escape (\x1B) for terminal color formatting
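Beyond cursor control, the separator characters in the table above (US, 0x1F and RS, 0x1E) can delimit fields and records without any quoting rules. The following is a sketch with invented helper names `pack` and `unpack`, assuming the data itself never contains these two control characters.

```python
UNIT_SEP = "\x1f"    # US: separates fields within a record
RECORD_SEP = "\x1e"  # RS: separates records


def pack(records):
    """Join a list of field lists into one delimited string."""
    return RECORD_SEP.join(UNIT_SEP.join(fields) for fields in records)


def unpack(data):
    """Split a delimited string back into a list of field lists."""
    return [record.split(UNIT_SEP) for record in data.split(RECORD_SEP)]


# Commas and quotes in the data need no escaping.
rows = [["Alice", "New York, NY"], ["Bob", 'He said "hi"']]
packed = pack(rows)
print(repr(packed))
print(unpack(packed) == rows)  # True
```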
- Control characters (0–31, 127) are non-printable ASCII characters used for device control, formatting, and communication.
- Python provides escape sequences for common control characters.
- ANSI escape sequences (ESC, \x1B) are widely used for terminal manipulation.
- Many control characters originated from teletype machines but are still relevant in networking and text processing.
ASCII is fundamental in programming for text encoding, data transmission, and character manipulation. It ensures cross-platform compatibility and serves as the basis for modern encodings like UTF-8.
- String Manipulation: Converting between characters and their ASCII values.
- Text Encoding & Decoding: Handling different character sets.
- Data Transmission & Protocols: Used in network communication (HTTP, SMTP).
- Terminal Control: ANSI escape codes for color, cursor movement, and formatting.
- File Handling: Processing text-based file formats (CSV, TXT, JSON).
- Low-Level Operations: Character arithmetic, bitwise operations.
Python provides built-in functions for converting between characters and ASCII values.
# Convert character to ASCII
print(ord('A')) # Output: 65
# Convert ASCII value to character
print(chr(65)) # Output: 'A'
# Convert lowercase to uppercase using ASCII arithmetic
print(chr(ord('a') - 32)) # Output: 'A'
# Convert uppercase to lowercase
print(chr(ord('G') + 32)) # Output: 'g'
# Convert entire string to ASCII values
ascii_values = [ord(char) for char in "Python"]
print(ascii_values) # Output: [80, 121, 116, 104, 111, 110]
# Convert ASCII values back to string
char_string = ''.join(chr(num) for num in ascii_values)
print(char_string) # Output: Python
text = "Hello, World!"
# Convert to uppercase using ASCII
uppercase_text = ''.join(chr(ord(char) - 32) if 'a' <= char <= 'z' else char for char in text)
print(uppercase_text) # Output: HELLO, WORLD!
# Convert to lowercase using ASCII
lowercase_text = ''.join(chr(ord(char) + 32) if 'A' <= char <= 'Z' else char for char in text)
print(lowercase_text) # Output: hello, world!
text = "Python3 is cool! 😊"
# Keep only ASCII characters
ascii_only = ''.join(char for char in text if ord(char) < 128)
print(ascii_only) # Output: Python3 is cool!
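The standard library offers two shortcuts for the filtering shown above: `str.isascii()` (Python 3.7 and later) and the `errors` argument of `str.encode()`. The sample text here is an arbitrary example.

```python
# "Café menu: crème brûlée", written with escapes so the code points are unambiguous
text = "Caf\u00e9 menu: cr\u00e8me br\u00fbl\u00e9e"

# str.isascii() reports whether every character is below 128.
print(text.isascii())          # False
print("plain text".isascii())  # True

# errors="ignore" silently drops non-ASCII characters...
dropped = text.encode("ascii", errors="ignore").decode("ascii")
print(dropped)  # Caf menu: crme brle

# ...while errors="replace" leaves a visible "?" for each one.
replaced = text.encode("ascii", errors="replace").decode("ascii")
print(replaced)  # Caf? menu: cr?me br?l?e
```

Both approaches lose information, so they suit cleanup for legacy systems rather than general text handling.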
import socket
host = "example.com"
port = 80
# ASCII-encoded HTTP request
request = "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
# Open a socket connection
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((host, port))
    s.sendall(request.encode('ascii')) # Send as ASCII
    # Only the first 1024 bytes are read; the body may contain non-ASCII bytes, so replace them instead of failing
    response = s.recv(1024).decode('ascii', errors='replace')
print(response)
# Print colored text using ANSI escape sequences (introduced by the ASCII ESC character)
print("\033[1;31mThis is red text\033[0m")
print("\033[1;32mThis is green text\033[0m")
print("\033[1;33mThis is yellow text\033[0m")
print("\033[1;34mThis is blue text\033[0m")
# Clearing the screen using the ESC c sequence
print("\033c") # Resets the terminal
import time
# flush=True forces the text out before the pause, since no newline is written
print("This will be overwritten", end="\r", flush=True)
time.sleep(2) # Wait for 2 seconds
print("New text after overwrite")
# Writing an ASCII text file
with open("ascii_example.txt", "w", encoding="ascii") as file:
    file.write("Hello, ASCII!\nThis is a test file.")
# Reading an ASCII text file
with open("ascii_example.txt", "r", encoding="ascii") as file:
    content = file.read()
print(content) # Output: Hello, ASCII!\nThis is a test file.
# Check if a file contains non-ASCII characters
with open("ascii_example.txt", "r", encoding="utf-8") as file:
    content = file.read()
contains_non_ascii = any(ord(char) > 127 for char in content)
print("Contains non-ASCII characters:", contains_non_ascii)
import base64
# Encode a string in Base64 (ASCII-safe)
text = "Hello, ASCII!"
encoded_text = base64.b64encode(text.encode('ascii'))
print(encoded_text) # Output: b'SGVsbG8sIEFTQ0lJIQ=='
# Decode Base64
decoded_text = base64.b64decode(encoded_text).decode('ascii')
print(decoded_text) # Output: Hello, ASCII!
# Convert string to hex representation
hex_encoded = ' '.join(hex(ord(char)) for char in "ASCII")
print(hex_encoded) # Output: 0x41 0x53 0x43 0x49 0x49
# Convert hex back to string
hex_decoded = ''.join(chr(int(hex_code, 16)) for hex_code in hex_encoded.split())
print(hex_decoded) # Output: ASCII
def is_uppercase(char):
    # Bit 5 (value 32) is clear for uppercase letters and set for lowercase ones
    return not (ord(char) & 32) if 'A' <= char <= 'Z' else False
print(is_uppercase('A')) # Output: True
print(is_uppercase('a')) # Output: False
# Convert uppercase to lowercase using bitwise OR
lower = chr(ord('A') | 32)
print(lower) # Output: 'a'
# Convert lowercase to uppercase using bitwise AND
upper = chr(ord('a') & ~32)
print(upper) # Output: 'A'
# Toggle case using XOR
toggle_case = lambda char: chr(ord(char) ^ 32) if 'A' <= char <= 'Z' or 'a' <= char <= 'z' else char
print(toggle_case('a')) # Output: 'A'
print(toggle_case('Z')) # Output: 'z'
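The same kind of arithmetic applies to digits: `'0'` to `'9'` occupy codes 48 to 57, so subtracting `ord('0')` (or masking the low four bits) yields the numeric value. The functions `digit_value` and `parse_decimal` below are illustrative names, not library functions.

```python
def digit_value(char: str) -> int:
    """Return the numeric value of an ASCII digit using character arithmetic."""
    if not "0" <= char <= "9":
        raise ValueError(f"{char!r} is not an ASCII digit")
    return ord(char) - ord("0")


def parse_decimal(text: str) -> int:
    """Parse a non-negative decimal string without calling int()."""
    value = 0
    for char in text:
        value = value * 10 + digit_value(char)
    return value


print(digit_value("7"))       # 7
print(ord("7") & 0x0F)        # 7 (the low four bits hold the digit's value)
print(parse_decimal("2024"))  # 2024
```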
- Python provides built-in ASCII handling via ord(), chr(), and string operations.
- ASCII arithmetic allows case conversion and character manipulation efficiently.
- ANSI escape sequences enable advanced terminal formatting.
- ASCII-based networking ensures compatibility in HTTP, SMTP, and other protocols.
- Bitwise operations provide efficient ASCII character processing.
ASCII plays a crucial role in text files and communication protocols by ensuring consistent encoding, data transmission, and formatting. It is widely used in log files, configuration files, network protocols (HTTP, SMTP, FTP), and structured text formats (CSV, JSON, XML).
# Writing ASCII text to a file
with open("ascii_text.txt", "w", encoding="ascii") as file:
    file.write("Hello, ASCII!\nThis is a text file.\nLine 3.")
# Reading an ASCII text file
with open("ascii_text.txt", "r", encoding="ascii") as file:
    content = file.read()
print(content)
# Append new content to an existing ASCII file
with open("ascii_text.txt", "a", encoding="ascii") as file:
    file.write("\nAppending new ASCII content.")
# Detect and list non-ASCII characters in a file
with open("ascii_text.txt", "r", encoding="utf-8") as file:
    content = file.read()
non_ascii_chars = [char for char in content if ord(char) > 127]
print("Non-ASCII Characters Found:", non_ascii_chars if non_ascii_chars else "None")
# Remove all non-ASCII characters from text
def remove_non_ascii(text):
    return ''.join(char for char in text if ord(char) < 128)
cleaned_text = remove_non_ascii(content)
print(cleaned_text)
import csv
# Writing ASCII data to a CSV file
data = [["Name", "Age"], ["Alice", "25"], ["Bob", "30"]]
with open("ascii_data.csv", "w", newline='', encoding="ascii") as file:
    writer = csv.writer(file)
    writer.writerows(data)
# Reading ASCII data from a CSV file (newline='' lets the csv module handle line endings)
with open("ascii_data.csv", "r", newline='', encoding="ascii") as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
import json
data = {"name": "Alice", "age": 25, "city": "New York"}
# Writing ASCII JSON file
with open("ascii_data.json", "w", encoding="ascii") as file:
    json.dump(data, file, ensure_ascii=True)
# Reading ASCII JSON file
with open("ascii_data.json", "r", encoding="ascii") as file:
    loaded_data = json.load(file)
print(loaded_data)
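The `ensure_ascii=True` flag used above only matters once the data contains non-ASCII text. This sketch, with made-up sample values, shows how such characters are escaped so the JSON output stays pure ASCII.

```python
import json

data = {"city": "Zürich", "note": "naïve"}

# Non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(data, ensure_ascii=True)
print(escaped)

# The escaped form is pure ASCII, so it can be written with encoding="ascii".
print(escaped.isascii())  # True

# Loading restores the original characters.
print(json.loads(escaped) == data)  # True

# With ensure_ascii=False the characters are kept as-is, so an ASCII-only file would reject them.
raw = json.dumps(data, ensure_ascii=False)
print(raw.isascii())  # False
```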
import socket
host = "example.com"
port = 80
# ASCII-encoded HTTP request
request = "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((host, port))
    s.sendall(request.encode('ascii')) # Send as ASCII
    # Headers are ASCII, but the body may not be, so undecodable bytes are replaced
    response = s.recv(4096).decode('ascii', errors='replace')
print(response)
import smtplib
# Placeholder addresses
sender = "sender@example.com"
receiver = "receiver@example.com"
# ASCII-encoded email message; a blank line separates the headers from the body
message = """\
From: sender@example.com
To: receiver@example.com
Subject: ASCII Test Email

Hello, this is an ASCII-encoded email message.
"""
# Sending email using ASCII SMTP
with smtplib.SMTP("smtp.example.com", 25) as server:
    server.sendmail(sender, receiver, message.encode('ascii'))
import logging
# Configure logging to store messages in ASCII format (the encoding argument requires Python 3.9+)
logging.basicConfig(filename="ascii_logs.txt", level=logging.INFO, encoding="ascii")
# Write ASCII log messages
logging.info("INFO: ASCII log entry recorded.")
logging.warning("WARNING: This is an ASCII-based warning.")
# Read ASCII log file and filter for errors
with open("ascii_logs.txt", "r", encoding="ascii") as file:
    logs = file.readlines()
error_logs = [log for log in logs if "ERROR" in log]
print("Error Logs:", error_logs if error_logs else "No errors found.")
URLs may contain only a limited set of ASCII characters. Reserved characters, spaces, and all non-ASCII characters must be percent-encoded to ensure safe transmission.
import urllib.parse
url = "https://example.com/search?q=Hello ASCII!"
encoded_url = urllib.parse.quote(url)
print(encoded_url) # Output: https%3A//example.com/search%3Fq%3DHello%20ASCII%21
decoded_url = urllib.parse.unquote(encoded_url)
print(decoded_url) # Output: https://example.com/search?q=Hello ASCII!
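The example above quotes the whole URL, which also escapes the `:` and `?` delimiters. In practice it is more common to encode only the query parameters. A sketch using `urllib.parse.urlencode`, with an assumed base URL and parameters:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base = "https://example.com/search"
params = {"q": "Hello ASCII!", "lang": "en"}

# urlencode escapes each key and value, leaving the URL structure intact.
url = f"{base}?{urlencode(params)}"
print(url)  # https://example.com/search?q=Hello+ASCII%21&lang=en

# Round trip: parse the query string back into a dictionary of lists.
query = urlsplit(url).query
print(parse_qs(query))  # {'q': ['Hello ASCII!'], 'lang': ['en']}
```

In query strings, `urlencode` represents a space as `+`, whereas `quote` uses `%20`.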
Many devices (e.g., Arduino, microcontrollers) communicate via ASCII-based serial protocols.
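import serial # Third-party pyserial package (pip install pyserial)
# Open serial connection ('COM3' is a Windows port name; Linux typically uses a path such as /dev/ttyUSB0)
ser = serial.Serial('COM3', 9600, timeout=1)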
# Send ASCII data
ser.write("Hello, device!\n".encode('ascii'))
# Read response
response = ser.readline().decode('ascii')
print(response)
# Close serial connection
ser.close()
- ASCII is the standard format for text file storage (TXT, CSV, JSON).
- Networking protocols (HTTP, SMTP, FTP) use ASCII encoding for communication.
- ASCII encoding ensures compatibility in log files, system messages, and URLs.
- Serial communication and hardware interfaces rely on ASCII-based protocols.
ASCII and Unicode are both character encoding standards, but Unicode extends ASCII to support all languages and symbols.
Feature | ASCII | Unicode (UTF-8) |
---|---|---|
Bit Size | 7-bit (0-127) | Variable: 1–4 bytes per character in UTF-8 |
Character Count | 128 characters | Over 143,000 characters |
Language Support | English only | Multilingual (all scripts) |
File Size | Smaller | Larger for non-ASCII text |
Encoding Types | Fixed-width (7-bit) | UTF-8, UTF-16, UTF-32 |
| Character | ASCII (Binary) | ASCII (Hex) | UTF-8 (Hex) |
|-----------|---------------|-------------|---------------|
| A | 01000001 | 0x41 | 0x41 |
| € | N/A | N/A | 0xE2 0x82 0xAC |
| 😊 | N/A | N/A | 0xF0 0x9F 0x98 0x8A |
- 7-bit encoding (values 0-127), stored in 8-bit bytes with the MSB set to 0.
- Supports basic English characters, digits, punctuation, and control characters.
- Efficient for text storage but lacks support for non-English languages.
# ASCII encoding
ascii_text = "Hello, ASCII!".encode("ascii")
print(ascii_text) # Output: b'Hello, ASCII!'
# Decoding ASCII back to string
decoded_text = ascii_text.decode("ascii")
print(decoded_text) # Output: Hello, ASCII!
- Supports all written languages, symbols, and emojis.
- Encoding forms:
  - UTF-8: 1-4 bytes per character (backward-compatible with ASCII).
  - UTF-16: 2 or 4 bytes per character.
  - UTF-32: Fixed 4 bytes per character.
- Unicode includes ASCII as a subset, ensuring compatibility.
# Unicode encoding (UTF-8)
unicode_text = "Hello, 你好, 😊".encode("utf-8")
print(unicode_text) # Output: b'Hello, \xe4\xbd\xa0\xe5\xa5\xbd, \xf0\x9f\x98\x8a'
# Decoding UTF-8 back to string
decoded_unicode = unicode_text.decode("utf-8")
print(decoded_unicode) # Output: Hello, 你好, 😊
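The byte counts listed for UTF-8, UTF-16, and UTF-32 can be measured directly. In this sketch, `encoded_sizes` is a helper written for the example. It uses the little-endian codec variants so that no byte order mark is included in the count.

```python
def encoded_sizes(char: str) -> dict:
    """Return the number of bytes `char` occupies in each Unicode encoding form."""
    # The "-le" variants are used so that no byte order mark is counted.
    return {
        "utf-8": len(char.encode("utf-8")),
        "utf-16": len(char.encode("utf-16-le")),
        "utf-32": len(char.encode("utf-32-le")),
    }


# U+0041 (A), U+20AC (Euro sign), U+1F60A (smiling face emoji)
for char in ("A", "\u20ac", "\U0001F60A"):
    print(f"U+{ord(char):04X}", encoded_sizes(char))
```

Only the ASCII character is a single byte in UTF-8. The emoji lies outside the Basic Multilingual Plane, so UTF-16 needs a surrogate pair (4 bytes) for it.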
- ASCII files remain valid in Unicode (UTF-8) since ASCII (0-127) maps directly to UTF-8.
- Non-ASCII characters require multi-byte encoding in Unicode.
def is_ascii(text):
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False
print(is_ascii("Hello")) # Output: True
print(is_ascii("你好")) # Output: False
- ASCII (7-bit) is limited to English, whereas Unicode (UTF-8) supports all languages.
- UTF-8 is the most widely used encoding, with ASCII as a subset.
- ASCII is efficient for small text files, while Unicode is required for global applications.
- Python 3 strings are Unicode, and source files default to UTF-8, ensuring full Unicode support.
ASCII art is a technique that represents images, symbols, or designs using ASCII characters. It is commonly used in command-line tools, email signatures, retro computing, and visual effects.
Feature | Description |
---|---|
Character Set | Uses ASCII characters (32-126) for visual representation |
Common Uses | CLI applications, email signatures, banners, text-based UIs |
Tools | FIGlet, cowsay, pyfiglet, art Python module |
Display Medium | Terminals, web pages, text files |
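One common way to produce ASCII art programmatically is to map brightness values onto a ramp of characters that range from sparse to dense. The sketch below assumes a grid of values from 0 to 255. The ramp string, the sample grid, and the function name `to_ascii_art` are all arbitrary choices for this illustration.

```python
# Characters ordered from lightest to darkest (an arbitrary ramp for this sketch).
RAMP = " .:-=+*#%@"


def to_ascii_art(pixels):
    """Map rows of values (0 = empty, 255 = densest) to ASCII characters."""
    lines = []
    for row in pixels:
        chars = [RAMP[value * (len(RAMP) - 1) // 255] for value in row]
        lines.append("".join(chars))
    return "\n".join(lines)


# A hypothetical 3x8 "image": two gradient rows above a solid row.
image = [
    [0, 36, 72, 108, 144, 180, 216, 255],
    [0, 36, 72, 108, 144, 180, 216, 255],
    [255, 255, 255, 255, 255, 255, 255, 255],
]
print(to_ascii_art(image))
```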
/\_/\
( o.o )
> ^_^ <
import pyfiglet
ascii_art = pyfiglet.figlet_format("ASCII Art")
print(ascii_art)
_ _ _
/ \ (_) __ _| |_
/ _ \ | |/ _` | __|
/ ___ \| | (_| | |_
/_/ \_\_|\__,_|\__|
from art import text2art
ascii_text = text2art("Python", font="block")
print(ascii_text)
[Output: the word "Python" rendered in large block letters]
ASCII animations can be created using `\r` to return to the start of the line and `\033c` to clear the terminal.
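import time
# Frame offsets are illustrative: the face moves right, then back
frames = [
    "(o_o)      ",
    "   (o_o)   ",
    "      (o_o)",
    "   (o_o)   "
]
while True: # Runs until interrupted with Ctrl+C
    for frame in frames:
        print("\033c", end="") # Clear the screen
        print(frame)
        time.sleep(0.2)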
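ASCII art can be used to hide messages inside text-based images.
hidden_message = "SECRET"
# Raw string so the backslashes in the art are kept literally
ascii_art = r"""
 /\_/\
( o.o )
 > ^_^ <
""".strip("\n")
# Append one character of the message to each line of the art
stego_art = "\n".join(f"{line} {hidden_message[i % len(hidden_message)]}" for i, line in enumerate(ascii_art.split("\n")))
print(stego_art)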
ASCII is widely used in cybersecurity for data encoding, obfuscation, exploits, payload delivery, and forensic analysis. Many security mechanisms rely on ASCII-based encoding (Base64, URL encoding, hexadecimal) to manipulate data in network attacks, malware, and encryption schemes.
Security Use Case | Description |
---|---|
Encoding & Obfuscation | ASCII-based encodings (Base64, Hex, URL encoding) hide data |
Payload Injection | ASCII characters are used in exploits and buffer overflows |
Cryptography | ASCII characters represent encrypted text and hashes |
Log Analysis | ASCII logs track system and network activities |
Malware Analysis | ASCII strings reveal hidden commands inside malware binaries |
Attackers encode payloads in Base64 to evade detection in network traffic.
import base64
payload = "DROP TABLE users;" # SQL Injection payload
encoded_payload = base64.b64encode(payload.encode("ascii")).decode("ascii")
print(encoded_payload) # Output: RFJPUCBUQUJMRSB1c2Vyczs=
# Decoding Base64
decoded_payload = base64.b64decode(encoded_payload).decode("ascii")
print(decoded_payload) # Output: DROP TABLE users;
Malware uses hex encoding to disguise payloads in binary files.
# Convert ASCII to Hex
hex_payload = "DROP TABLE users;".encode("ascii").hex()
print(hex_payload) # Output: 44524f50205441424c452075736572733b
# Convert Hex back to ASCII
decoded_hex = bytes.fromhex(hex_payload).decode("ascii")
print(decoded_hex) # Output: DROP TABLE users;
Attackers use URL encoding to try to bypass naive input validation in web applications.
import urllib.parse
url_payload = "admin' OR '1'='1"
encoded_url = urllib.parse.quote(url_payload)
print(encoded_url) # Output: admin%27%20OR%20%271%27%3D%271
# Decode URL payload
decoded_url = urllib.parse.unquote(encoded_url)
print(decoded_url) # Output: admin' OR '1'='1
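In programs that copy input into a fixed-size buffer without bounds checking (typically in C or C++), an excessively long ASCII input can overwrite adjacent memory and may lead to execution of malicious code.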
# Generating a simple buffer overflow string
payload = "A" * 100 # 100 'A' characters to overflow memory
print(payload)
Injecting ASCII-based SQL queries manipulates database logic.
user_input = "' OR 1=1 -- "
query = f"SELECT * FROM users WHERE username = '{user_input}'"
print(query)
Output:
SELECT * FROM users WHERE username = '' OR 1=1 -- '
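The usual defence against the injection shown above is a parameterized query, where the input is passed as data and never spliced into the SQL text. The sketch below uses an in-memory SQLite database with a made-up `users` table to contrast the two approaches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)

user_input = "' OR 1=1 -- "

# Vulnerable: the input becomes part of the SQL text and changes the query logic.
unsafe_query = f"SELECT * FROM users WHERE username = '{user_input}'"
unsafe_rows = conn.execute(unsafe_query).fetchall()
print(len(unsafe_rows))  # 2 (every row matches)

# Safer: the "?" placeholder makes the driver treat the input purely as a value.
safe_rows = conn.execute(
    "SELECT * FROM users WHERE username = ?", (user_input,)
).fetchall()
print(len(safe_rows))  # 0 (no user has that literal name)

conn.close()
```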
Analysts extract ASCII strings from binaries to identify hidden commands.
import re
binary_data = b"\x50\x72\x69\x76\x69\x6C\x65\x67\x65\x20\x45\x73\x63\x61\x6C\x61\x74\x69\x6F\x6E"
ascii_strings = re.findall(b"[ -~]{4,}", binary_data)
print(ascii_strings) # Output: [b'Privilege Escalation']
Malware often uses simple XOR obfuscation to hide ASCII commands.
def xor_encrypt(text, key):
    # XOR is its own inverse, so the same function encrypts and decrypts
    return ''.join(chr(ord(c) ^ key) for c in text)
payload = "SensitiveData"
key = 42
encrypted = xor_encrypt(payload, key)
print(encrypted) # Output: yODYC^C\OnK^K (obfuscated text)
# Decrypt
decrypted = xor_encrypt(encrypted, key)
print(decrypted) # Output: SensitiveData
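Single-byte XOR offers obfuscation rather than real protection, because there are only 256 possible keys. As a rough sketch of how an analyst might recover the text, the code below tries every key and keeps the results that consist entirely of printable ASCII. The helper names and the printable-ASCII heuristic are assumptions made for this example.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (the same call encrypts and decrypts)."""
    return bytes(b ^ key for b in data)


def looks_like_text(data: bytes) -> bool:
    """Heuristic used in this sketch: every byte is printable ASCII (32-126)."""
    return all(32 <= b <= 126 for b in data)


ciphertext = xor_bytes(b"SensitiveData", 42)

# Try all 256 possible keys and keep the ones that yield printable ASCII.
candidates = {
    key: xor_bytes(ciphertext, key)
    for key in range(256)
    if looks_like_text(xor_bytes(ciphertext, key))
}
print(candidates[42])  # b'SensitiveData'
print(len(candidates), "candidate keys out of 256")
```

The heuristic leaves a short list of candidates that a person or a dictionary check can narrow down further.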
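Cryptographic hash functions operate on bytes, so text such as a password is first encoded (here as ASCII), and the resulting digest is usually displayed as an ASCII hexadecimal string.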
import hashlib
message = "SecureData"
hashed = hashlib.sha256(message.encode("ascii")).hexdigest()
print(hashed) # Output: 93e6086d4b8b25a1d9...
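A single unsalted SHA-256 pass, as above, is fast to brute-force, so password storage normally uses a salted, deliberately slow key-derivation function. The sketch below uses `hashlib.pbkdf2_hmac` from the standard library. The function names and the iteration count are illustrative assumptions, not a recommendation from the original text.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes = None, iterations: int = 200_000):
    """Derive a key from the password with PBKDF2-HMAC-SHA256 and a random salt."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes, iterations: int = 200_000) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("SecureData")
print(stored.hex())  # 64 hex characters; differs on every run because of the random salt
print(verify_password("SecureData", salt, stored))  # True
print(verify_password("WrongGuess", salt, stored))  # False
```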
- ASCII encoding (Base64, Hex, URL encoding) is widely used in cybersecurity for data obfuscation.
- ASCII-based attacks (SQL Injection, Buffer Overflow) manipulate character input.
- ASCII forensic analysis helps detect hidden malware commands.
- Cryptographic hashes are computed over bytes and are typically stored or displayed as ASCII hexadecimal strings.
- Malware leverages ASCII encoding to hide payloads in legitimate-looking files.