Regular Expressions - CameronAuler/python-devops GitHub Wiki

Regular expressions (regex) are used for pattern matching and text processing in Python. The re module provides powerful tools for searching, extracting, and replacing text using regex patterns.

re Module

The re module allows working with regular expressions in Python.

Import re

import re

Basic Pattern Matching

The re.search(), re.match(), and re.findall() functions are used for pattern matching.

re.search() (Find First Match Anywhere)

re.search() scans the whole string and returns a match object for the first place the pattern matches, or None if there is no match. It is mainly used for extracting specific patterns like phone numbers, emails, or dates.

import re

text = "Hello, my number is 123-456-7890."
match = re.search(r"\d{3}-\d{3}-\d{4}", text)

if match:
    print("Phone number found:", match.group())
# Output:
# Phone number found: 123-456-7890
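A match object carries more than the matched text. The sketch below reuses the phone-number example and adds named groups (area, exchange, and line are illustrative names, not part of the original example) to show how positions and sub-parts of a match might be read.

```python
import re

text = "Hello, my number is 123-456-7890."
# The group names below are hypothetical, chosen only for illustration.
pattern = r"(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<line>\d{4})"
match = re.search(pattern, text)

if match:
    full = match.group()         # the whole matched text
    area = match.group("area")   # a single named group
    parts = match.groups()       # all captured groups as a tuple
    start, end = match.span()    # index range of the match within text
    print(full)                  # 123-456-7890
    print(area)                  # 123
    print(parts)                 # ('123', '456', '7890')
    print(text[start:end] == full)  # True
```

Slicing the original string with the span gives back the same text as match.group(), which is handy when the surrounding context is needed as well.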

re.match() (Match Only at the Beginning)

re.match() matches only if the pattern occurs at the start of the string and returns None otherwise. It is mainly used for checking whether a string starts with a specific pattern.

import re

text = "123-456-7890 is my number."
match = re.match(r"\d{3}-\d{3}-\d{4}", text)

if match:
    print("Match found:", match.group())
# Output:
# Match found: 123-456-7890
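The difference between the matching functions is easiest to see when the pattern does not sit at the start of the string. The following sketch uses a made-up sentence and compares re.match(), re.search(), and re.fullmatch() on the same pattern.

```python
import re

text = "My number is 123-456-7890."
pattern = r"\d{3}-\d{3}-\d{4}"

at_start = re.match(pattern, text)    # None: the string starts with "My"
anywhere = re.search(pattern, text)   # finds the number later in the string
whole = re.fullmatch(pattern, "123-456-7890")  # the entire string must match

print(at_start)           # None
print(anywhere.group())   # 123-456-7890
print(whole is not None)  # True
```

re.match() anchors only the start; it does not require the pattern to cover the whole string. When the entire string must conform, re.fullmatch() is the closer fit.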

re.findall() (Find All Matches)

re.findall() returns all non-overlapping matches of the pattern as a list of strings. It is mainly used for extracting every occurrence of a pattern, such as all email addresses in a block of text.

import re

# Placeholder addresses on reserved example domains
text = "Emails: alice@example.com, bob@example.org"
emails = re.findall(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", text)

print("Emails found:", emails)
# Output:
# Emails found: ['alice@example.com', 'bob@example.org']
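The shape of the list returned by re.findall() depends on whether the pattern contains capturing groups. The sketch below uses a made-up deployment log line to show both cases, plus re.finditer() for when match positions are needed.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Deployed v1.4 on 2024-01-15, rolled back on 2024-02-03."

# No groups: each item is the full matched string.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# With groups: each item is a tuple of the captured groups.
parts = re.findall(r"(\d{4})-(\d{2})-(\d{2})", text)

# finditer yields match objects, so positions are available too.
starts = [m.start() for m in re.finditer(r"\d{4}-\d{2}-\d{2}", text)]

print(dates)  # ['2024-01-15', '2024-02-03']
print(parts)  # [('2024', '01', '15'), ('2024', '02', '03')]
```

If groups are needed only for structure and not for extraction, a non-capturing group, written (?:...), keeps the result as a flat list of strings.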

String Manipulation (re.split() & re.sub())

re.sub() (Replacing Text)

re.sub(pattern, replacement, string) returns a new string in which every non-overlapping match of the pattern is replaced; the original string is left unchanged. It is mainly used for masking sensitive information like phone numbers or emails.

import re

text = "My phone is 123-456-7890."
new_text = re.sub(r"\d{3}-\d{3}-\d{4}", "XXX-XXX-XXXX", text)

print(new_text)
# Output:
# My phone is XXX-XXX-XXXX.
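The replacement does not have to be a fixed string. As a rough sketch, the example below shows three variations on the masking idea: reusing a captured group, computing the replacement with a function (mask_digits is a hypothetical helper name), and limiting the number of replacements with count.

```python
import re

text = "My phone is 123-456-7890."

# Keep the last four digits by capturing them and reusing the group as \1.
masked = re.sub(r"\d{3}-\d{3}-(\d{4})", r"XXX-XXX-\1", text)

# A function can compute each replacement from the match object.
def mask_digits(match):
    # Replace every digit in the matched text with "X", keeping the dashes.
    return re.sub(r"\d", "X", match.group())

masked_all = re.sub(r"\d{3}-\d{3}-\d{4}", mask_digits, text)

# count limits how many matches are replaced (here only the first digit).
first_only = re.sub(r"\d", "#", text, count=1)

print(masked)      # My phone is XXX-XXX-7890.
print(masked_all)  # My phone is XXX-XXX-XXXX.
print(first_only)  # My phone is #23-456-7890.
```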

re.split() (Splitting Strings)

re.split(pattern, string) splits a string wherever the pattern matches and returns the pieces as a list. It is mainly used for tokenizing text based on multiple delimiters.

import re

text = "apple, orange; banana | grape"
words = re.split(r"[,\s;|]+", text)  # Split on runs of commas, whitespace, semicolons, or pipes

print(words)
# Output:
# ['apple', 'orange', 'banana', 'grape']
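Two further options of re.split() are worth a quick look. This sketch, reusing the same sample string, shows maxsplit limiting the number of splits and a capturing group keeping the delimiters in the result.

```python
import re

text = "apple, orange; banana | grape"

# maxsplit stops after the given number of splits; the rest stays intact.
first_split = re.split(r"[,\s;|]+", text, maxsplit=1)

# A capturing group keeps the delimiters in the result list.
with_delims = re.split(r"\s*([,;|])\s*", text)

print(first_split)  # ['apple', 'orange; banana | grape']
print(with_delims)  # ['apple', ',', 'orange', ';', 'banana', '|', 'grape']
```

Keeping the delimiters can be useful when the text has to be reassembled later or when the delimiter itself carries meaning.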

Regex Patterns

In the rows for \d, \D, \w, \W, \s, \S, and ., Example is an input string and Match lists what the pattern matches in it. In the other rows, Example is a pattern and Match lists text it matches.

Pattern  Description                                               Example        Match
\d       Matches any digit (0-9)                                   "123"          1, 2, 3
\D       Matches any non-digit                                     "A1B2"         A, B
\w       Matches a letter, digit, or _                             "Hello_123"    each character of Hello_123
\W       Matches a non-word character                              "Hello@123"    @
\s       Matches whitespace (space, tab, newline)                  "Hello World"  " "
\S       Matches a non-whitespace character                        "Hello World"  each character except the space
^        Matches start of string (each line with re.MULTILINE)     "^Hello"       "Hello" in "Hello world"
$        Matches end of string (each line with re.MULTILINE)       "world!$"      "world!" in "Hello world!"
.        Matches any character except newline                      "abc"          a, b, c
*        Matches 0 or more repetitions                             "ab*"          "a", "ab", "abb"
+        Matches 1 or more repetitions                             "ab+"          "ab", "abb"
?        Matches 0 or 1 occurrence                                 "ab?"          "a", "ab"
{n}      Matches exactly n times                                   "\d{3}"        "123"
{n,}     Matches at least n times                                  "\d{2,}"       "12", "123"
{n,m}    Matches between n and m times                             "\d{2,4}"      "12", "123", "1234"
|        OR operator (alternation)                                 "cat|dog"      "cat" or "dog"
()       Groups patterns                                           "(ab)+"        "ab", "abab"
\b       Matches a word boundary                                   "\bword\b"     "word" (but not "wording")
\B       Matches a non-word boundary                               "\Bing"        "ing" in "wording" (but not in "ing")
\A       Matches start of the string only                          "\AHello"      "Hello" in "Hello world"
\Z       Matches end of the string only                            "world\Z"      "world" in "Hello world"
\G       End of previous match; not supported by Python's re       (none)         (none)
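A few of the patterns above are easier to judge when run against concrete input. The sketch below uses made-up sample strings to exercise word boundaries, a bounded quantifier, alternation, and the \A and \Z anchors.

```python
import re

# Word boundaries: match "word" as a whole word only.
whole_words = re.findall(r"\bword\b", "word wording sword")

# Quantifier: runs of two to four consecutive digits.
digit_runs = re.findall(r"\d{2,4}", "7 42 123 98765")

# Alternation inside a non-capturing group, with an optional trailing "s".
animals = re.findall(r"(?:cat|dog)s?", "cats and dog")

# Anchors: \A and \Z tie the pattern to the ends of the string.
starts_with_hello = bool(re.search(r"\AHello", "Hello world"))
ends_with_world = bool(re.search(r"world\Z", "Hello world"))

print(whole_words)        # ['word']
print(digit_runs)         # ['42', '123', '9876']
print(animals)            # ['cats', 'dog']
print(starts_with_hello)  # True
print(ends_with_world)    # True
```

Note that \d{2,4} is greedy: in "98765" it takes the first four digits and leaves the final "5" unmatched, since a single digit is too short to match on its own.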