9 ‐ REGREX - CloudScope/DevOpsWithCloudScope GitHub Wiki

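Regular expressions (regex) are a compact pattern language for searching, matching, and manipulating text. On Linux they are used by commands such as grep, sed, awk, and find. These tools do not all share one syntax: grep and sed default to POSIX basic regular expressions (BRE), switch to extended regular expressions (ERE) with -E, and GNU grep additionally offers Perl-compatible expressions (PCRE) with -P. Here's an overview of the basics and some advanced concepts of regex in Linux: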

Basic Regex Elements

  1. Literals: Characters that match themselves, e.g., a matches the character 'a'.

  2. Metacharacters: Special characters with specific meanings:

    • . : Matches any single character except a newline.
    • ^ : Anchors the pattern to the start of the line.
    • $ : Anchors the pattern to the end of the line.
    • \ : Escapes a metacharacter to treat it as a literal.
    • [] : Bracket expression matches any single character inside the brackets, e.g., [abc] matches 'a', 'b', or 'c'.
    • [^] : Matches any single character not inside the brackets, e.g., [^abc] matches any character except 'a', 'b', or 'c'.
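  3. Quantifiers: Specify how many times the preceding element may occur. Apart from *, these are extended-regex (ERE) syntax, so use grep -E or sed -E; in basic regex, intervals are written \{n,m\} and GNU tools accept \+ and \? as extensions.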

    • * : Matches 0 or more occurrences of the preceding element.
    • + : Matches 1 or more occurrences of the preceding element.
    • ? : Matches 0 or 1 occurrence of the preceding element.
    • {n} : Matches exactly n occurrences of the preceding element.
    • {n,} : Matches n or more occurrences of the preceding element.
    • {n,m} : Matches between n and m occurrences (inclusive) of the preceding element.
  4. Grouping and Alternation (ERE syntax; in basic regex, groups are written \( \) and GNU tools accept \| for alternation):

    • () : Groups patterns together, e.g., (abc) treats 'abc' as a single unit.
    • | : Alternation, works like a logical OR, e.g., a|b matches 'a' or 'b'.
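
The basic elements above can be tried out with a short shell sketch. It assumes grep -E and mktemp are available; the word list and the variable names are made up for illustration.

```shell
# Hypothetical sample data: one word per line.
sample=$(mktemp)
printf '%s\n' cat cot coat ct cart a.b axb > "$sample"

# -E selects extended regex, so + ? {} () | need no backslashes.
dot=$(grep -E '^c.t$' "$sample")                # . is exactly one character
negated=$(grep -E '^c[^a]t$' "$sample")         # [^a] is any character except 'a'
star=$(grep -E '^co*t$' "$sample")              # zero or more 'o'
plus=$(grep -E '^co+t$' "$sample")              # one or more 'o'
optional=$(grep -E '^ca?t$' "$sample")          # zero or one 'a'
interval=$(grep -E '^c[a-z]{2,3}t$' "$sample")  # two or three letters between c and t
group=$(grep -E '^c(a|o)t$' "$sample")          # grouped alternation
literal_dot=$(grep -E '^a\.b$' "$sample")       # escaped . matches only a real dot

printf 'co*t matches:\n%s\n' "$star"
printf 'co+t matches:\n%s\n' "$plus"
rm -f "$sample"
```

Anchoring every pattern with ^ and $ makes each result depend only on the element being demonstrated, since without the anchors a pattern may match anywhere inside a longer line.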

Advanced Regex Concepts

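  1. Character Classes (shorthand escapes; support varies by tool):

    • \d : Matches any digit (equivalent to [0-9]). PCRE only (grep -P); with grep -E, sed, or awk use [0-9] or [[:digit:]].
    • \D : Matches any non-digit. PCRE only; elsewhere use [^0-9].
    • \w : Matches any word character (alphanumeric + underscore). A GNU extension in grep, sed, and gawk; the portable form is [[:alnum:]_].
    • \W : Matches any non-word character. Portable form: [^[:alnum:]_].
    • \s : Matches any whitespace character (space, tab, newline). A GNU extension; the portable form is [[:space:]].
    • \S : Matches any non-whitespace character. Portable form: [^[:space:]].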
  2. Anchors and Boundaries:

    • \b : Matches a word boundary (position between a word and a non-word character). A GNU extension in grep and sed; gawk uses \y instead, because \b there means backspace.
    • \B : Matches a position that is not a word boundary.
  3. Lookahead and Lookbehind (PCRE only: available through grep -P, not in grep -E, sed, or awk):

    • Lookahead: (?=...) ensures that the following text matches the expression inside the lookahead.
    • Negative Lookahead: (?!...) ensures that the following text does not match the expression inside.
    • Lookbehind: (?<=...) ensures that the preceding text matches the expression inside the lookbehind.
    • Negative Lookbehind: (?<!...) ensures that the preceding text does not match the expression inside.
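
A hedged sketch of these advanced constructs follows. It assumes GNU grep; the \b escape is a GNU extension and the lookaround lines need a grep built with -P (PCRE) support, so the script checks for that first. The log lines and variable names are hypothetical.

```shell
# Hypothetical sample data.
log=$(mktemp)
printf '%s\n' 'error 404 at line 12' 'price: 100USD' 'the cat sat' 'concatenate' > "$log"

# POSIX class: portable replacement for \d, printing each match on its own line.
digits=$(grep -oE '[[:digit:]]+' "$log")

# Word boundary: matches 'cat' as a whole word, not inside 'concatenate'.
whole_word=$(grep '\bcat\b' "$log")
substring_count=$(grep -c 'cat' "$log")

# Lookarounds exist only in PCRE, so probe for -P before using it.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    before_usd=$(grep -oP '\d+(?=USD)' "$log")        # digits followed by USD
    after_error=$(grep -oP '(?<=error )\d+' "$log")   # digits preceded by 'error '
else
    before_usd='unavailable'
    after_error='unavailable'
fi

printf 'digits:\n%s\n' "$digits"
printf 'whole word: %s\n' "$whole_word"
printf 'lookahead: %s, lookbehind: %s\n' "$before_usd" "$after_error"
rm -f "$log"
```

Because lookarounds assert context without consuming it, the -o output contains only the digits and not the surrounding 'USD' or 'error ' text.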

Common Commands Using Regex in Linux

  1. grep:

    • Basic usage: grep 'pattern' file.txt
    • Recursive search: grep -r 'pattern' /path/to/directory
    • Use extended regex: grep -E 'pattern1|pattern2' file.txt
  2. sed:

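    • Search and replace: sed 's/old/new/' file.txt (replaces the first match on each line; append g, as in 's/old/new/g', to replace every match)
    • Use regex groups: sed 's/\(pattern1\)/[\1]/' file.txt (\1 in the replacement refers to the text captured by the first group)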
  3. awk:

    • Pattern matching: awk '/pattern/ {print $0}' file.txt
    • Use regex in conditions: awk '$1 ~ /pattern/' file.txt
  4. find:

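    • Find files with regex: find /path -regex '.*pattern.*' (the expression must match the whole path, not just the file name, hence the leading and trailing .*)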
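
To see the four commands side by side, here is a minimal sketch that runs each against a throwaway directory. The file names, their contents, and the variable names are assumptions made for the example; sed -E and mktemp -d are assumed to be available, as they are on GNU and BSD systems.

```shell
# Hypothetical working directory and data.
workdir=$(mktemp -d)
printf '%s\n' 'alice 30 admin' 'bob 25 user' 'carol 41 admin' > "$workdir/users.txt"
: > "$workdir/report_2024.log"

# grep: extended regex with alternation inside a group.
grep_out=$(grep -E '^(alice|bob) ' "$workdir/users.txt")

# grep -r: search a directory tree; -l prints only the names of matching files.
recursive_out=$(grep -rl 'admin' "$workdir")

# sed: capture two fields and swap them with the \1 and \2 backreferences.
sed_out=$(sed -E 's/^([a-z]+) ([0-9]+)/\2 \1/' "$workdir/users.txt")

# awk: print the first field of lines whose third field matches the regex.
awk_out=$(awk '$3 ~ /^admin$/ {print $1}' "$workdir/users.txt")

# find: -regex is tested against the whole path, so the pattern starts with .*/
find_out=$(find "$workdir" -regex '.*/report_[0-9]*\.log')

printf '%s\n' "$grep_out" "$sed_out" "$awk_out" "$find_out"
rm -rf "$workdir"
```

The find pattern sticks to syntax that is common to GNU find's default regex type and basic regex; GNU find also accepts -regextype posix-extended when ERE operators are needed.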

Tips

  • Always be mindful of escaping special characters when needed.
  • Use extended regex with -E in commands like grep and sed so that +, ?, |, (), and {} work as operators without backslashes.
  • Test your regex with tools like regex101.com for validation and troubleshooting.
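
The first two tips can be illustrated with a small sketch. The input strings and variable names are made up; the only tools assumed are printf and grep.

```shell
# Hypothetical input: three version-like strings.
versions=$(printf '%s\n' 'v1.5' 'v125' 'v1x5')

# Unescaped . matches any character, so all three lines match.
unescaped=$(printf '%s\n' "$versions" | grep 'v1.5')

# Escaping the dot, or using -F for fixed strings, matches only the literal text.
escaped=$(printf '%s\n' "$versions" | grep 'v1\.5')
fixed=$(printf '%s\n' "$versions" | grep -F 'v1.5')

# Without -E the + is a literal plus sign; with -E it is a quantifier.
bre_plus=$(printf '%s\n' 'a+b' 'aab' | grep 'a+b')
ere_plus=$(printf '%s\n' 'a+b' 'aab' | grep -E 'a+b')

printf 'unescaped:\n%s\n' "$unescaped"
printf 'escaped: %s, fixed: %s\n' "$escaped" "$fixed"
printf 'basic: %s, extended: %s\n' "$bre_plus" "$ere_plus"
```

When the search text is not meant to be a pattern at all, grep -F is usually the simplest choice because nothing in it needs escaping.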