Regular Expressions (REGEXP) - cunhapaulo/ReferenceCard GitHub Wiki

VS Code

Como substituir texto no VS Code usando regexp:

Descrição Pesquisar por Substituir por Explicação
Trocar aspas ASCII pelas aspas do LaTeX "([^"]*)" ``$1'' ([^"]*): Captura qualquer sequência de caracteres que não sejam aspas duplas.

Python

Interesting sites about REGEXP:

Código Python para usar regexp:


import re

def main():

    # creates an regexp operator with the proper pattern
    # ATTENTION: use an raw string begining with 'r'

    cepREGEXP = re.compile(r"((\d{2})[.]*(\d{3})-(\d{3}\b))")

    # captures the result of pattern matching

    result = cepREGEXP.search("Na minha cidade o CEP é 66.040-100 ou 66050-520")

    # uses patterns found

    print(f"1. First CEP found: {result.group()}")
    print(f"2. First CEP found: {result.groups()}")

    # Finds all ucurrences

    results = cepREGEXP.findall("Na minha cidade o CEP é 66.040-100 ou 66050-520")

    for cep in results:
        print(f"> CEP found: {cep}")

if __name__ == "__main__":
    # calls the main funtion
    main()

  • RESULT:
1.First CEP found: 66.040-100
2.First CEP found: ('66.040-100', '66', '040', '100')
> CEP found: ('66.040-100', '66', '040', '100')
> CEP found: ('66050-520', '66', '050', '520')

Modelos de Expressões Regulares

Item REGEXP Exemplo(s) Parte(s)
Endereço de e-mail ([\w\-.]+)@([\w\-]+\.\w+\.?\w*) [email protected] (paulo.cunha.doc), (gmail.com)
CEP brasileiro (\d{2})[.]*(\d{3})-(\d{3}\b) 66050-520 ou 66.050-520 (66050), (520) ou (66.050), (520)
Telefone brasileiro ([\(\d{2}\)]*) (\d{5})-(\d{4}\b) (91) 98113-5678 ou 98113-5678 ((91)), (98113), (5678) ou (empty) (98113), (5678)
Data dd/mm/yyyy \d{1,2}\/\d{1,2}\/\d{2,4} 31/12/2023 31/12/2023
Data dd/mm/yyyy (\d{1,2})\/(\d{1,2})\/(\d{2,4}) 31/12/2023 (31), (12), (2023)
Data yyyy-mm-dd (\d{4})-(0[1-9] or 1[0-2])-(0[1-9] or 12[0-9] or 3[01]) 2023-12-31 (2023), (12), (31)
Números Americanos ^-?\d+(,\d+)*(\.\d+(e\d+)?)?$ 3.14529, -255.34, 1.9e10, 123,340.00, 128
Funções Python def\s+(\w+)\((.*)\): def transverse_check(directory, pattern="*"):
Funções Python 2 def\s+(_*\w*)\((.*)\)
URIs ^(\w+)://([\w\-\.]+)(:(\d+))? figure below figure below
expressões 'str' '[\w+\s*]+' 'text' '([\w+\s*]+)'

URI´s reached by regexp above

Lookarounds

Type Expression Example Result
Positive Lookahead (?=) \d+(?=PM) Date: 4 Aug 3PM
Negative Lookahead (?!) \d+(?!PM) Date: 4 Aug 3PM
Positive Lookbehind (?<=) (?<=\$)\d+ Product Code: 1064 Price: $5
Negative Lookbehind (?<!) (?<!\$)\d+ Product Code: 1064 Price: $5

Symbols

Symbol Use
. any
? optional (0 or 1)
^ not
[a-zA-Z] range of character
{} repetition
{n} n-occurences
{m,n} from m to n occurences
{m,n}? from m to n occurences, the least
\ Escape character
^ begin of string
$ end of string
() capture group
* 0 or more
+ 1 or more

Classes of Characters

Shorthand character class Represents
\d Any numeric digit from 0 to 9.
\D Any character that is NOT a numeric digit from 0 to 9.
\w Any letter, numeric digit, or the underscore character. Think of this as matching “word” characters.)
\W Any character that is NOT a letter, numeric digit, or the underscore character.
\s Any space, tab, or newline character. (Think of this as matching “space” characters.)
\S Any character that is NOT a space, tab, or newline.

Special Sequences

item Description
\A start of string
\b matches empty string at word boundary (between \w and \W)
\B matches empty string NOT at word boundary
\d digit
\D non-digit
\s whitespace: [ \t\n\r\f\v]
\S non-whitespace
\w alphanumeric: [0-9a-zA-Z_]
\W non-alphanumeric
\Z end of string
\g<id> matches a previously defined group