regex cheat sheet - TeamFlowerPower/kb GitHub Wiki
Regex Cheat Sheet
Escape special characters by prefixing them with a backslash (\).
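As a small illustration of escaping, the sketch below compares an unescaped pattern, a manually escaped one, and `re.escape`, which escapes a whole literal string for you. The sample string is made up for the example.

```python
import re

literal = "1+1=2?"

# Unescaped: + and ? act as quantifiers, so the literal text is not found
unescaped = re.search(r"1+1=2?", literal)

# Escaped by hand: \+ and \? match the characters themselves
manual = re.search(r"1\+1=2\?", literal)

# re.escape() adds the backslashes automatically
auto = re.search(re.escape(literal), "is 1+1=2? correct")

print(unescaped)       # None
print(manual.group())  # 1+1=2?
print(auto.group())    # 1+1=2?
```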
Greediness
Quantifiers are greedy by default: they match as many characters as possible while still satisfying the pattern.
Appending ? to a quantifier makes it non-greedy (lazy): it then matches as few characters as possible.
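The difference is easiest to see on a string with several possible end points. A minimal sketch with a made-up input string:

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* takes as much as possible, so the match runs to the last '>'
greedy = re.search(r"<.*>", text).group()

# Non-greedy: .*? stops at the first '>' that lets the pattern succeed
lazy = re.search(r"<.*?>", text).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```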
Basic Regex

| Pattern | Description |
| --- | --- |
| `.` | Any character (except a newline, by default) |
| `^` | Beginning of line |
| `$` | End of line |
| `[a-c8]` | One of the characters a, b, c or 8 |
| `[^chars]` | Any character except c, h, a, r, s |
| `( )` | Capture group |
| `(a(b)c)` | Nested capture groups: `\1` = abc; `\2` = b |
| `(a)?` | Optional capture group; `Abc(a)?` matches Abc and Abca |
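The following sketch exercises the basic patterns in Python. The input strings are invented for illustration; group numbering follows the position of each opening parenthesis.

```python
import re

# Nested groups: group 1 is the outer group, group 2 the inner one
nested = re.search(r"(a(b)c)", "xabcx")
outer, inner = nested.group(1), nested.group(2)

# Optional group: it is None when it did not take part in the match
optional = re.compile(r"Abc(a)?")
with_a = optional.fullmatch("Abca").group(1)
without_a = optional.fullmatch("Abc").group(1)

# Character class and negated character class
in_class = re.findall(r"[a-c8]", "a1b2c8d")
not_in_class = re.findall(r"[^chars]", "charts!")

print(outer, inner)       # abc b
print(with_a, without_a)  # a None
print(in_class)           # ['a', 'b', 'c', '8']
print(not_in_class)       # ['t', '!']
```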
Quantifier

| Pattern | Description |
| --- | --- |
| `*` | 0 or more |
| `+` | 1 or more |
| `{N}` | Exactly N occurrences |
| `{N,M}` | N to M occurrences; if omitted, N = 0 and M = infinity |
| `{N,M}?` | N to M occurrences, as few as possible |
| `?` | 0 or 1 |
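A short sketch of how the quantifiers behave on the same input; the digit string is arbitrary.

```python
import re

digits = "1234567"

exact = re.match(r"\d{3}", digits).group()           # exactly 3
ranged = re.match(r"\d{2,4}", digits).group()        # 2 to 4, greedy
lazy_ranged = re.match(r"\d{2,4}?", digits).group()  # 2 to 4, as few as possible
open_ended = re.match(r"\d{3,}", digits).group()     # 3 or more (M omitted)
up_to = re.match(r"\d{,2}", digits).group()          # 0 to 2 (N omitted)
optional = re.findall(r"colou?r", "color colour")    # the u is optional

print(exact, ranged, lazy_ranged)  # 123 1234 12
print(open_ended, up_to)           # 1234567 12
print(optional)                    # ['color', 'colour']
```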
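Extended Regex

| Pattern | Description |
| --- | --- |
| `\w` | `[a-zA-Z0-9_]` (alphanumeric or underscore) |
| `\W` | `[^a-zA-Z0-9_]` (anything else) |
| `\d` | `[0-9]` (digit) |
| `\D` | `[^0-9]` (non-digit) |
| `\b` | Empty string at a word boundary (between `\w` and `\W`, or at the start or end of the string) |
| `\B` | Empty string not at a word boundary |
| `\s` | `[ \t\n\r\f\v]` (whitespace) |
| `\S` | `[^ \t\n\r\f\v]` (non-whitespace) |
| `\A` | Beginning of string |
| `\Z` | End of string |
| `\g<id>` | Reference to a previously defined group (in Python, only valid in the replacement string of `re.sub`) |
| `R\|S` | Regex R or S |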
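To make the anchors and shorthand classes concrete, here is a minimal Python sketch. The sample text is made up; `re.M` (multiline mode) is used to show how `^` differs from `\A`.

```python
import re

text = "cat concat cats\ncat"

# \b matches "cat" only as a whole word, not inside "concat" or "cats"
whole_words = re.findall(r"\bcat\b", text)

# With re.M, ^ matches at every line start; \A only at the start of the string
line_starts = re.findall(r"^cat", text, re.M)
string_start = re.findall(r"\Acat", text, re.M)

# \s+ splits on any run of whitespace
fields = re.split(r"\s+", "a \t b\nc")

# Alternation: a run of digits OR a run of lowercase letters
tokens = re.findall(r"\d+|[a-z]+", "abc123def")

print(len(whole_words), len(line_starts), len(string_start))  # 2 2 1
print(fields)  # ['a', 'b', 'c']
print(tokens)  # ['abc', '123', 'def']
```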
PyRegex Extensions

| Pattern | Description |
| --- | --- |
| `(?:...)` | Non-capturing group (matches, but the match is not stored) |
| `(?<name>A)` | Define a named group; A = regex, name = group name. Not supported by Python's `re` |
| `(?P<name>A)` | Same as before, in the syntax Python's `re` requires |
| `(?P=name)` | Backreference to a named group; matches the same text that group matched |
| `(?#...)` | Comment (use for documentation) |
| `(?=...)` | Lookahead; matches without consuming |
| `(?!...)` | Negative lookahead |
| `(?<=...)` | Lookbehind; matches without consuming |
| `(?<!...)` | Negative lookbehind |
| `(?(A)B\|C)` | Match B if group A (number or name) matched, else C |
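A few of these extensions in action, as a sketch for Python's `re` module. The patterns and inputs are illustrative only: the second finds a doubled word through a named backreference, the third uses a conditional to require a closing parenthesis only when an opening one was matched.

```python
import re

# (?:...) groups without capturing, so only "c" is stored
only_c = re.search(r"(?:ab)+(c)", "ababc").groups()

# Named group plus named backreference: find a repeated word
doubled = re.search(r"\b(?P<word>\w+)\s+(?P=word)\b", "this is is a test")

# Conditional: if group 1 (the opening parenthesis) matched, require ")"
balanced = re.compile(r"^(\()?\d+(?(1)\))$")
results = {s: bool(balanced.search(s)) for s in ["123", "(123)", "(123", "123)"]}

print(only_c)                 # ('c',)
print(doubled.group("word"))  # is
print(results)
```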
Because lookarounds (lookahead, lookbehind and their negative forms) do not consume any characters, you can chain several of them as a logical AND. For example, to match lines that contain both product and development, in any order: ^(?=.*product)(?=.*development).*$.
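The combined lookahead pattern can be tried out as follows. The sample lines are invented; the second pattern is an added variation that swaps one lookahead for a negative one.

```python
import re

# Both words must appear somewhere in the line, in any order
both = re.compile(r"^(?=.*product)(?=.*development).*$")

# Variation: "product" must appear, "development" must not
product_only = re.compile(r"^(?=.*product)(?!.*development).*$")

samples = [
    "product development roadmap",
    "development of the product",
    "product only",
]

matching_both = [s for s in samples if both.search(s)]
matching_product_only = [s for s in samples if product_only.search(s)]

print(matching_both)
print(matching_product_only)
```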
Search & Replace / Reference a Group

| Pattern | Description |
| --- | --- |
| `\1`, `\2`, ... `\n` | Backreference; the text matched by the n-th capturing group |
Search & replace in some IDEs
Groups captured in the find pattern can be referenced in the replacement text.
The backreference syntax differs between tools:
PyCharm: $n instead of \n
Notepad++: \n
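In Python, the same idea is available through `re.sub`, whose replacement string accepts `\n` as well as `\g<n>` and `\g<name>`. A minimal sketch with a made-up date string:

```python
import re

date = "2024-01-31"

# Numbered backreferences in the replacement string
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3.\2.\1", date)

# Named groups are referenced with \g<name>
named = re.sub(
    r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})",
    r"\g<d>/\g<m>/\g<y>",
    date,
)

# \g<1>0 means "group 1 followed by a literal 0"; \10 would mean group 10
padded = re.sub(r"(\d)", r"\g<1>0", "7")

print(swapped)  # 31.01.2024
print(named)    # 31/01/2024
print(padded)   # 70
```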
Exemplary Basic Regex Workflow in Python
re.compile()
re.search()
match.groups() or match.group(<group_name>)
```python
import re

# "Normal" syntax
pattModuleSummary = re.compile(r"[0-9a-f]{8}")  # Matches 8-character hex numbers

lines = ["module at 0040a1f0", "nothing here"]  # example input (assumption)

# Find and print matches
for line in lines:
    match = re.search(pattModuleSummary, line)
    # Check if we have a match
    if match:
        # The pattern has no capture groups, so print the whole match
        print(match.group())
```
Exemplary Extended Regex Workflow in Python
Verbose (multiline) syntax with comments; whitespace and Python-style # comments inside the pattern are ignored:
```python
import re

pattModuleSummary = re.compile(r"""
    ([0-9a-f]{8})   # Origin
    (?:\+{1})
    ([0-9a-f]{8})   # Size
""", re.X)  # <-- re.X is important!

lines = ["0040a1f0+00001000", "nothing here"]  # example input (assumption)

# Find and print matches
for line in lines:
    match = re.search(pattModuleSummary, line)
    # Check if we have a match
    if match:
        # Print matched groups
        print(match.groups())
```
re.X (short for re.VERBOSE) is necessary if you want to use the multiline re.compile syntax.
Exemplary Regex Workflow in Python with Named Capture Groups
```python
import re

pattern1 = re.compile(r'^(?P<addr>[0-9a-f]{8,16})\+(?P<size>[0-9a-f]{8,})$')
line = "0040a1f0+00001000"  # example input (assumption)
match = pattern1.search(line)
if match:
    print(match.group('addr'))  # References only the group `addr`
```
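When a pattern has several named groups, `match.groupdict()` returns all of them at once. The sketch below reuses the pattern from above; the `parse_region` helper and the sample lines are hypothetical and only serve the illustration.

```python
import re

pattern1 = re.compile(r'^(?P<addr>[0-9a-f]{8,16})\+(?P<size>[0-9a-f]{8,})$')

def parse_region(line):
    """Return (addr, size) as integers, or None if the line does not match."""
    match = pattern1.search(line)
    if match is None:
        return None
    fields = match.groupdict()  # maps group names to the matched strings
    return int(fields["addr"], 16), int(fields["size"], 16)

sample_lines = ["0040a1f0+00001000", "not a region"]  # made-up input
regions = [parse_region(line) for line in sample_lines]
print(regions)
```

Returning None for non-matching lines keeps the caller in charge of how to handle malformed input.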