9 ‐ REGREX - CloudScope/DevOpsWithCloudScope GitHub Wiki

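Regular expressions (regex) are a powerful tool for searching, matching, and manipulating text on Linux. They are widely used in commands such as grep, sed, awk, and find. These tools do not all share one regex flavor: grep and sed default to POSIX basic regex (BRE), switch to extended regex (ERE) with -E, and GNU grep additionally offers Perl-compatible regex (PCRE) with -P. Here's an overview of the basics and some advanced concepts: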

Basic Regex Elements

Literals: Characters that match themselves, e.g., a matches the character 'a'.
Metacharacters: Special characters with specific meanings:
- . : Matches any single character except a newline.
- ^ : Anchors the pattern to the start of the line.
- $ : Anchors the pattern to the end of the line.
- \ : Escapes a metacharacter to treat it as a literal.
- [] : Bracket expression matches any single character inside the brackets, e.g., [abc] matches 'a', 'b', or 'c'.
- [^] : Matches any single character not inside the brackets, e.g., [^abc] matches any character except 'a', 'b', or 'c'.
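The metacharacters above can be tried out with plain grep. The following is a minimal sketch; the word list and the variable names are made up for illustration.

```shell
# Hypothetical sample data: one word per line.
words=$(printf '%s\n' cat cot cut coat a.c abc)

# '.' matches any single character; '^' and '$' pin the match to the whole line.
dot=$(printf '%s\n' "$words" | grep '^c.t$')

# '\.' matches only a literal dot, so 'abc' is excluded.
literal=$(printf '%s\n' "$words" | grep '^a\.c$')

# [ao] accepts 'a' or 'o'; [^ao] accepts any character except those two.
bracket=$(printf '%s\n' "$words" | grep '^c[ao]t$')
negated=$(printf '%s\n' "$words" | grep '^c[^ao]t$')

printf 'dot:\n%s\n\n' "$dot"          # cat, cot, cut
printf 'literal:\n%s\n\n' "$literal"  # a.c
printf 'bracket:\n%s\n\n' "$bracket"  # cat, cot
printf 'negated:\n%s\n' "$negated"    # cut
```

Single-quoting each pattern keeps the shell from interpreting characters such as $ and \ before grep sees them.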
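Quantifiers: Specify the number of occurrences. In basic regex (the default for grep and sed), only * works unescaped; +, ?, and {} need -E or a backslash (\{n\} is standard, while \+ and \? are GNU extensions).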
- * : Matches 0 or more occurrences of the preceding element.
- + : Matches 1 or more occurrences of the preceding element.
- ? : Matches 0 or 1 occurrence of the preceding element.
- {n} : Matches exactly n occurrences.
- {n,} : Matches n or more occurrences.
- {n,m} : Matches between n and m occurrences.
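To see how the quantifiers differ, the sketch below runs each one against the same four lines using extended regex (grep -E). The sample lines and variable names are illustrative only.

```shell
# Hypothetical sample data: 'c', then zero to three 'a's, then 't'.
lines=$(printf '%s\n' ct cat caat caaat)

star=$(printf '%s\n' "$lines" | grep -E '^ca*t$')        # zero or more 'a'
plus=$(printf '%s\n' "$lines" | grep -E '^ca+t$')        # one or more 'a'
opt=$(printf '%s\n' "$lines" | grep -E '^ca?t$')         # zero or one 'a'
exact=$(printf '%s\n' "$lines" | grep -E '^ca{2}t$')     # exactly two 'a'
atleast=$(printf '%s\n' "$lines" | grep -E '^ca{2,}t$')  # two or more 'a'
range=$(printf '%s\n' "$lines" | grep -E '^ca{1,2}t$')   # one or two 'a'

printf 'a*     : %s\n' "$(printf '%s' "$star" | tr '\n' ' ')"
printf 'a+     : %s\n' "$(printf '%s' "$plus" | tr '\n' ' ')"
printf 'a?     : %s\n' "$(printf '%s' "$opt" | tr '\n' ' ')"
printf 'a{2}   : %s\n' "$exact"
printf 'a{2,}  : %s\n' "$(printf '%s' "$atleast" | tr '\n' ' ')"
printf 'a{1,2} : %s\n' "$(printf '%s' "$range" | tr '\n' ' ')"
```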
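Grouping and Alternation (unescaped with -E; in basic regex write \( \) for groups, and GNU tools accept \| for alternation):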
- () : Groups patterns together, e.g., (abc) treats 'abc' as a single unit.
- | : Alternation, works like a logical OR, e.g., a|b matches 'a' or 'b'.
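Grouping and alternation are easiest to read in extended regex. As a rough illustration (sample words and variable names are assumptions), the same group is also written in basic-regex form for comparison:

```shell
# Hypothetical sample data.
input=$(printf '%s\n' ab abab ababab cat dog bird)

# (ab){2,} treats 'ab' as one unit and requires it at least twice.
grouped=$(printf '%s\n' "$input" | grep -E '^(ab){2,}$')

# The same pattern in basic regex needs backslashes on the group and braces.
grouped_bre=$(printf '%s\n' "$input" | grep '^\(ab\)\{2,\}$')

# cat|dog matches either alternative; the group limits what '|' applies to.
either=$(printf '%s\n' "$input" | grep -E '^(cat|dog)$')

printf 'grouped:\n%s\n\n' "$grouped"          # abab, ababab
printf 'grouped (BRE):\n%s\n\n' "$grouped_bre"
printf 'either:\n%s\n' "$either"              # cat, dog
```

Without the parentheses, '^cat|dog$' would mean "starts with cat, or ends with dog", which is a common source of surprises.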

Advanced Regex Concepts

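Character Classes (shorthand escapes; support varies by tool, and the \w, \W, \s, and \S forms below are accepted by GNU grep and GNU sed as extensions):
- \d : Matches any digit (equivalent to [0-9]). This is Perl-style syntax: it works with grep -P but not with grep -E or sed, where [0-9] or [[:digit:]] should be used instead.
- \D : Matches any non-digit (same restriction as \d; the portable form is [^0-9]).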
- \w : Matches any word character (alphanumeric + underscore).
- \W : Matches any non-word character.
- \s : Matches any whitespace character (space, tab, newline).
- \S : Matches any non-whitespace character.
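Because shorthand classes are not supported everywhere, a portable way to express them is with POSIX bracket classes such as [[:digit:]]. The sketch below uses those with grep -E and only tries \d if the local grep accepts -P; the sample lines and variable names are illustrative.

```shell
# Hypothetical sample data.
input=$(printf '%s\n' 'order 66' 'no-digits' 'user_01')

# [[:digit:]] is the portable counterpart of \d.
digits=$(printf '%s\n' "$input" | grep -E '[[:digit:]]')

# [^[:space:]] is the portable counterpart of \S: lines with no whitespace.
no_space=$(printf '%s\n' "$input" | grep -E '^[^[:space:]]+$')

# [[:alnum:]_] is the portable counterpart of \w: lines made of word characters only.
word_only=$(printf '%s\n' "$input" | grep -E '^[[:alnum:]_]+$')

# \d itself needs PCRE; check for -P support first (assumes a grep built with PCRE).
have_pcre=no
if printf '1\n' | grep -qP '\d' 2>/dev/null; then
  have_pcre=yes
  pcre_digits=$(printf '%s\n' "$input" | grep -P '\d')
fi

printf 'digits:\n%s\n\n' "$digits"        # order 66, user_01
printf 'no_space:\n%s\n\n' "$no_space"    # no-digits, user_01
printf 'word_only:\n%s\n\n' "$word_only"  # user_01
printf 'grep -P available: %s\n' "$have_pcre"
```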
Anchors and Boundaries (GNU extensions in grep and sed):
- \b : Matches a word boundary (position between a word and a non-word character).
- \B : Matches a position that is not a word boundary.
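A small sketch of the two boundary escapes, assuming GNU grep (where \b and \B are supported as extensions). The sample lines and variable names are made up for illustration.

```shell
# Hypothetical sample data.
input=$(printf '%s\n' 'cat' 'concatenate' 'the cat sat' 'cats')

# \bcat\b: 'cat' as a whole word, with a boundary on both sides.
whole=$(printf '%s\n' "$input" | grep '\bcat\b')

# \Bcat\B: 'cat' with word characters on both sides, i.e. buried inside a word.
inside=$(printf '%s\n' "$input" | grep '\Bcat\B')

printf 'whole word:\n%s\n\n' "$whole"    # cat, the cat sat
printf 'inside a word:\n%s\n' "$inside"  # concatenate
```

The line 'cats' matches neither pattern: the start of the line is a boundary (so \B fails), and the trailing 's' means there is no boundary after 't' (so \b fails).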
Lookahead and Lookbehind (Perl-compatible regex only, e.g. grep -P; not available in grep -E, sed, or awk):
- Lookahead: (?=...) ensures that the following text matches the expression inside the lookahead.
- Negative Lookahead: (?!...) ensures that the following text does not match the expression inside.
- Lookbehind: (?<=...) ensures that the preceding text matches the expression inside the lookbehind.
- Negative Lookbehind: (?<!...) ensures that the preceding text does not match the expression inside.
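Lookarounds check the surrounding text without including it in the match, which pairs well with grep -o (print only the matched part). The sketch below assumes a grep built with PCRE support and skips the examples otherwise; all data and variable names are illustrative.

```shell
# Hypothetical sample data.
line='price=100 cost=250 price=75'

have_pcre=no
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
  have_pcre=yes

  # Lookbehind: digits that directly follow 'price=' (the prefix is not printed).
  prices=$(printf '%s\n' "$line" | grep -oP '(?<=price=)[0-9]+')

  # Lookahead: lowercase words directly followed by '=' (the '=' is not printed).
  keys=$(printf '%s\n' "$line" | grep -oP '[a-z]+(?==)')

  # Negative lookahead: lines that do not start with '#'.
  active=$(printf '# comment\nkey=value\n' | grep -P '^(?!#)')

  # Negative lookbehind: digit runs not preceded by '-' or by another digit.
  positive=$(printf '10 -20 30\n' | grep -oP '(?<![-0-9])[0-9]+')

  printf 'prices:\n%s\n\n' "$prices"      # 100, 75
  printf 'keys:\n%s\n\n' "$keys"          # price, cost, price
  printf 'active:\n%s\n\n' "$active"      # key=value
  printf 'positive:\n%s\n' "$positive"    # 10, 30
else
  echo 'grep -P is not available here; skipping lookaround examples'
fi
```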

Common Commands Using Regex in Linux

grep:
- Basic usage: grep 'pattern' file.txt
- Recursive search: grep -r 'pattern' /path/to/directory
- Use extended regex: grep -E 'pattern1|pattern2' file.txt
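The three grep invocations above can be exercised on throwaway files. This is a sketch only: the directory layout, file names, and log lines are invented for the example.

```shell
# Build a hypothetical log directory in a temporary location.
dir=$(mktemp -d)
mkdir -p "$dir/logs"
printf 'error: disk full\ninfo: ok\nwarning: slow\n' > "$dir/app.log"
printf 'error: timeout\n' > "$dir/logs/db.log"

# Basic usage: lines containing 'error' in one file.
basic=$(grep 'error' "$dir/app.log")

# Recursive search: -r walks the directory tree, -h hides the file names.
recursive=$(grep -rh 'error' "$dir" | sort)

# Extended regex: '|' matches either alternative.
either=$(grep -E 'error|warning' "$dir/app.log")

printf 'basic:\n%s\n\n' "$basic"          # error: disk full
printf 'recursive:\n%s\n\n' "$recursive"  # error: disk full, error: timeout
printf 'either:\n%s\n' "$either"          # error: disk full, warning: slow

rm -rf "$dir"
```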
sed:
- Search and replace: sed 's/old/new/' file.txt
- Use regex groups: sed 's/\(pattern1\)/replacement/' file.txt (the text captured by the group can be reused in the replacement as \1)
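As a rough sketch of substitution and capture groups in sed, the example below rewrites a date using back-references, first in basic regex and then with -E. The input strings and variable names are assumptions made for the example.

```shell
# Hypothetical input line.
line='2024-01-15 deploy finished'

# Without a flag, only the first match on each line is replaced.
first=$(printf 'old old\n' | sed 's/old/new/')

# The g flag replaces every match on the line.
all=$(printf 'old old\n' | sed 's/old/new/g')

# Basic regex: groups are written \( \) and reused as \1, \2, \3.
# '|' is used as the delimiter so the '/' in the replacement needs no escaping.
swapped=$(printf '%s\n' "$line" | sed 's|\([0-9]*\)-\([0-9]*\)-\([0-9]*\)|\3/\2/\1|')

# Extended regex (-E): the same groups without backslashes, and '+' is available.
swapped_ere=$(printf '%s\n' "$line" | sed -E 's|([0-9]+)-([0-9]+)-([0-9]+)|\3/\2/\1|')

echo "$first"        # new old
echo "$all"          # new new
echo "$swapped"      # 15/01/2024 deploy finished
echo "$swapped_ere"  # 15/01/2024 deploy finished
```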
awk:
- Pattern matching: awk '/pattern/ {print $0}' file.txt
- Use regex in conditions: awk '$1 ~ /pattern/' file.txt
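The two awk forms above differ in scope: /pattern/ tests the whole line, while $1 ~ /pattern/ tests a single field. A minimal sketch with invented data and variable names:

```shell
# Hypothetical whitespace-separated records: name, age, role.
data=$(printf 'alice 30 admin\nbob 25 user\nanna 41 user\n')

# /admin/ is tested against the whole line.
matched=$(printf '%s\n' "$data" | awk '/admin/ {print $0}')

# $1 ~ /^a/ is tested against the first field only.
a_names=$(printf '%s\n' "$data" | awk '$1 ~ /^a/ {print $1}')

# !~ negates the match: records whose third field is not exactly 'user'.
non_users=$(printf '%s\n' "$data" | awk '$3 !~ /^user$/ {print $1}')

printf 'matched:\n%s\n\n' "$matched"      # alice 30 admin
printf 'a_names:\n%s\n\n' "$a_names"      # alice, anna
printf 'non_users:\n%s\n' "$non_users"    # alice
```

awk uses extended regex syntax, so +, ?, |, and () work without backslashes.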
find:
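- Find files with regex: find /path -regex '.*pattern.*' (the regex is matched against the entire path, not just the file name, which is why the pattern starts and ends with .*)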
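A point that often trips people up is that find -regex must match the whole path. The sketch below demonstrates this on a temporary directory; the file names are invented, and [0-9][0-9]* is used instead of [0-9]+ because the default regex syntax of find differs between implementations.

```shell
# Build a hypothetical directory with three files.
dir=$(mktemp -d)
touch "$dir/app.log" "$dir/app.log.1" "$dir/notes.txt"

# Paths ending in '.log'; the leading '.*' absorbs the directory part.
logs=$(find "$dir" -type f -regex '.*\.log' | sort)

# Paths ending in '.log.' followed by one or more digits.
rotated=$(find "$dir" -type f -regex '.*\.log\.[0-9][0-9]*' | sort)

# No leading '.*': the pattern cannot cover the full path, so nothing matches.
partial=$(find "$dir" -type f -regex 'app\.log')

printf 'logs:    %s\n' "$logs"
printf 'rotated: %s\n' "$rotated"
printf 'partial: [%s]\n' "$partial"   # empty between the brackets

rm -rf "$dir"
```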

Tips

Always be mindful of escaping special characters when needed.
Use extended regex with -E in commands like grep and sed for more complex patterns.
Test your regex with tools like regex101.com for validation and troubleshooting.
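The escaping tip can be illustrated with a pattern that contains a dot. This is a small sketch with invented input; grep -F is shown as an alternative that treats the whole pattern as a fixed string.

```shell
# Hypothetical sample data.
input=$(printf '%s\n' 'version 1.5' 'version 105' 'cost: $5')

# Unescaped '.' matches any character, so '105' matches as well.
loose=$(printf '%s\n' "$input" | grep 'version 1.5')

# Escaping the dot restricts the match to a literal '.'.
strict=$(printf '%s\n' "$input" | grep 'version 1\.5')

# -F disables regex interpretation entirely: no escaping needed.
fixed=$(printf '%s\n' "$input" | grep -F '1.5')
dollar=$(printf '%s\n' "$input" | grep -F '$5')

printf 'loose:\n%s\n\n' "$loose"    # version 1.5, version 105
printf 'strict:\n%s\n\n' "$strict"  # version 1.5
printf 'fixed:\n%s\n\n' "$fixed"    # version 1.5
printf 'dollar:\n%s\n' "$dollar"    # cost: $5
```

Single quotes around the pattern matter here too: inside double quotes the shell would try to expand $5 before grep ever sees it.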