Regular Expressions - CameronAuler/python-devops GitHub Wiki
Regular expressions (regex) are used for pattern matching and text processing in Python. The re
module provides powerful tools for searching, extracting, and replacing text using regex patterns.
Table of Contents
- Regular Expressions (re Module)
- Basic Pattern Matching
- String Manipulation (re.split() & re.sub())
- Regex Patterns
Regular Expressions (re Module)
The re module allows working with regular expressions in Python.
Import re
import re
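When the same pattern is applied many times, compiling it once with re.compile() is a common approach. The following is a minimal sketch, assuming a hypothetical list of log lines; the names error_pattern and log_lines are illustrative and not part of the original page.

```python
import re

# Compile once and reuse; re.IGNORECASE makes the match case-insensitive.
error_pattern = re.compile(r"error\s+(\d+)", re.IGNORECASE)

# Hypothetical sample data, used only for illustration.
log_lines = ["ERROR 404 not found", "ok", "error 500 server fault"]

codes = []
for line in log_lines:
    match = error_pattern.search(line)
    if match:
        codes.append(match.group(1))  # text captured by the first group

print(codes)  # ['404', '500']
```

A compiled pattern exposes the same search(), match(), findall(), sub(), and split() methods as the module-level functions.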
Basic Pattern Matching
The re.search(), re.match(), and re.findall() functions are used for pattern matching.
re.search() (Find First Match Anywhere)
re.search() scans the whole string and returns a match object for the first match, or None if there is none. It is commonly used to extract a specific pattern such as a phone number, email address, or date.
import re
text = "Hello, my number is 123-456-7890."
match = re.search(r"\d{3}-\d{3}-\d{4}", text)
if match:
    print("Phone number found:", match.group())
# Output:
# Phone number found: 123-456-7890
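A match object carries more than the matched text. As a rough sketch, the same phone number pattern can be wrapped in capturing groups to pull out individual parts; the variable names area_code and full_number are illustrative.

```python
import re

text = "Hello, my number is 123-456-7890."

# Parentheses create capturing groups, numbered from 1.
match = re.search(r"(\d{3})-(\d{3})-(\d{4})", text)

if match:
    area_code = match.group(1)    # first group
    full_number = match.group(0)  # whole match, same as match.group()
    start, end = match.span()     # index range of the match within text
    print(area_code)              # 123
    print(full_number)            # 123-456-7890
    print(text[start:end])        # 123-456-7890
```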
re.match() (Match Only at the Beginning)
re.match() succeeds only if the pattern matches at the start of the string. It is commonly used to check whether a string starts with a specific pattern.
import re
text = "123-456-7890 is my number."
match = re.match(r"\d{3}-\d{3}-\d{4}", text)
if match:
    print("Match found:", match.group())
# Output:
# Match found: 123-456-7890
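The difference between re.match() and re.search() is easiest to see on a string where the pattern does not appear at the start. The sketch below also shows re.fullmatch(), which is one way to validate that an entire string fits a pattern; the sample strings are illustrative.

```python
import re

pattern = r"\d{3}-\d{3}-\d{4}"
text = "My number is 123-456-7890."

at_start = re.match(pattern, text)    # None: the string starts with "My"
anywhere = re.search(pattern, text)   # finds the number mid-string

# fullmatch requires the whole string to match the pattern.
whole = re.fullmatch(pattern, "123-456-7890")
partial = re.fullmatch(pattern, "123-456-7890 ext. 5")  # None: trailing text

print(at_start is None)   # True
print(anywhere.group())   # 123-456-7890
print(whole is not None)  # True
print(partial is None)    # True
```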
re.findall() (Find All Matches)
re.findall() returns all non-overlapping matches of the pattern as a list. A typical use is extracting every email address from a block of text.
import re
# Placeholder addresses used for illustration
text = "Emails: alice@example.com, bob@example.org"
emails = re.findall(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", text)
print("Emails found:", emails)
# Output:
# Emails found: ['alice@example.com', 'bob@example.org']
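One detail worth knowing: when the pattern contains capturing groups, re.findall() returns the groups rather than the whole match. The sketch below uses a hypothetical key=value string to show this, along with re.finditer(), which yields match objects when positions are needed.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "cpu=85 mem=62 disk=40"

# With two capturing groups, findall returns a list of (name, value) tuples.
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)  # [('cpu', '85'), ('mem', '62'), ('disk', '40')]

# finditer yields match objects, so the start index of each match is available.
positions = [(m.group(1), m.start()) for m in re.finditer(r"(\w+)=(\d+)", text)]
print(positions)  # [('cpu', 0), ('mem', 7), ('disk', 14)]
```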
String Manipulation (re.split() & re.sub())
re.sub() (Replacing Text)
re.sub(pattern, replacement, string) replaces every non-overlapping match of the pattern in the string. It is commonly used for masking sensitive information such as phone numbers or emails.
import re
text = "My phone is 123-456-7890."
new_text = re.sub(r"\d{3}-\d{3}-\d{4}", "XXX-XXX-XXXX", text)
print(new_text)
# Output:
# My phone is XXX-XXX-XXXX.
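The replacement string can refer back to captured groups, and the optional count argument limits how many matches are replaced. A minimal sketch, assuming the goal is to keep the last four digits visible; the sample text and variable names are illustrative.

```python
import re

text = "Call 123-456-7890 or 987-654-3210."

# \1 in the replacement inserts whatever group 1 captured (the last four digits).
masked = re.sub(r"\d{3}-\d{3}-(\d{4})", r"XXX-XXX-\1", text)
print(masked)  # Call XXX-XXX-7890 or XXX-XXX-3210.

# count=1 replaces only the first match and leaves the rest untouched.
first_only = re.sub(r"\d{3}-\d{3}-\d{4}", "XXX-XXX-XXXX", text, count=1)
print(first_only)  # Call XXX-XXX-XXXX or 987-654-3210.
```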
re.split() (Splitting Strings)
re.split(pattern, string) splits a string wherever the regex pattern matches. It is commonly used to tokenize text that mixes several delimiters.
import re
text = "apple, orange; banana | grape"
words = re.split(r"[,\s;|]+", text)  # Split on runs of commas, whitespace, semicolons, or pipes
print(words)
# Output:
# ['apple', 'orange', 'banana', 'grape']
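A few behaviours of re.split() are easy to miss: maxsplit caps the number of splits, a capturing group in the pattern keeps the delimiters in the result, and a delimiter at the end of the string produces a trailing empty string. The sketch below illustrates each; the variable names are illustrative.

```python
import re

text = "apple, orange; banana | grape"

# maxsplit=1 stops after the first split.
first_two = re.split(r"[,\s;|]+", text, maxsplit=1)
print(first_two)  # ['apple', 'orange; banana | grape']

# A capturing group around the delimiter keeps it in the output list.
with_delims = re.split(r"\s*([,;|])\s*", text)
print(with_delims)  # ['apple', ',', 'orange', ';', 'banana', '|', 'grape']

# A trailing delimiter yields an empty string, which can be filtered out.
trailing = re.split(r"[,\s;|]+", "apple, orange;")
print(trailing)  # ['apple', 'orange', '']
cleaned = [word for word in trailing if word]
print(cleaned)  # ['apple', 'orange']
```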
Regex Patterns
Pattern | Description | Example Match |
---|---|---|
\d | Matches any digit (0-9) | "123" → 1, 2, 3 |
\D | Matches any non-digit | "A1B2" → A, B |
\w | Matches letters, digits, and _ | "Hello_123" → every character (\w+ matches Hello_123 as a whole) |
\W | Matches non-word characters | "Hello@123" → @ |
\s | Matches whitespace (spaces, tabs, newlines) | "Hello World" → " " |
\S | Matches non-whitespace | "Hello World" → every character except the space (\S+ matches "Hello", "World") |
^ | Matches start of string (or of each line with re.MULTILINE) | "^Hello" matches "Hello world" |
$ | Matches end of string, or just before a trailing newline | "world!$" matches "Hello world!" |
. | Matches any character except newline | "abc" → a, b, c |
* | Matches 0 or more repetitions | "ab*" → "a", "ab", "abb" |
+ | Matches 1 or more repetitions | "ab+" → "ab", "abb" |
? | Matches 0 or 1 occurrence | "ab?" → "a", "ab" |
{n} | Matches exactly n times | "\d{3}" → "123" |
{n,} | Matches at least n times | "\d{2,}" → "12", "123" |
{n,m} | Matches between n and m times | "\d{2,4}" → "12", "123", "1234" |
| | OR operator | "cat|dog" → "cat" or "dog" |
() | Groups patterns | "(ab)+" → "ab", "abab" |
\b | Matches a word boundary | "\bword\b" → "word" (but not "wording") |
\B | Matches a non-word boundary | "\Bing" matches the "ing" in "wording" but not a standalone "ing" |
\A | Matches only at the start of the string | "\AHello" matches "Hello world" |
\Z | Matches only at the end of the string | "world\Z" matches "Hello world" |
\G | Matches the position where the last match ended | Not supported by Python's built-in re module; found in engines such as Perl and PCRE |
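Several rows of the table can be checked directly with re.findall() and re.search(). The following sketch is illustrative only; the sample strings and variable names are assumptions chosen to make each behaviour visible.

```python
import re

# Character classes
non_digits = re.findall(r"\D", "A1B2")          # ['A', 'B']
symbols = re.findall(r"\W", "Hello@123")        # ['@']
chunks = re.findall(r"\S+", "Hello World")      # ['Hello', 'World']

# Quantifiers are greedy: \d{2,4} takes up to four digits at a time.
runs = re.findall(r"\d{2,4}", "1 12 123 12345")  # ['12', '123', '1234']

# (?:...) groups without capturing, so findall returns whole matches.
repeats = re.findall(r"(?:ab)+", "ab abab")      # ['ab', 'abab']

# Word boundaries
exact = re.findall(r"\bword\b", "word wording sword")  # ['word']
inner = re.findall(r"\Bing", "wording ing")            # ['ing'], from "wording" only

# Anchors: ^ respects re.MULTILINE, \A does not.
text = "one two\nthree four"
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)    # ['one', 'three']
string_start = re.findall(r"\A\w+", text, flags=re.MULTILINE)  # ['one']

# $ tolerates a single trailing newline, \Z does not.
dollar = re.search(r"world$", "Hello world\n")   # match object
strict = re.search(r"world\Z", "Hello world\n")  # None

print(non_digits, symbols, chunks, runs, repeats)
print(exact, inner, line_starts, string_start)
print(dollar is not None, strict is None)  # True True
```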