Regular Expressions - patrickcole/learning GitHub Wiki

Regular Expressions

Basics

A regular expression literal is made up of a pattern and optional flags:

/pattern/flags

Basic Pattern

// just check for the phrase 'hello':
const regex = /hello/;
console.log(regex.test(`hello world`));
// => true

To get matches in an array, use .exec():

const regex = /hello/;
const string = `hello world`;
const result = regex.exec(string);
console.log(result);

// => ['hello', index: 0, input: 'hello world', groups: undefined]
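When there is no match, .exec() returns null rather than an empty array, so it is worth guarding before reading the result. A minimal sketch, assuming made-up sample strings and variable names:

```javascript
// Variable names and sample strings here are illustrative only.
const greeting = /hello/;

const found = greeting.exec('say hello');
const missing = greeting.exec('goodbye');

if (found !== null) {
  // found[0] is the matched text, found.index is where it starts
  console.log(found[0], found.index);
  // => hello 4
}

console.log(missing);
// => null
```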

Using Flags

g: global; finds every match instead of stopping at the first
i: case insensitive
m: multi-line mode; ^ and $ match the start and end of each line. Without this flag, they match only the start and end of the entire string
u: unicode; treats the pattern as a sequence of Unicode code points
s: single-line (dotAll) mode; . also matches newline characters

console.log(/hello/ig.test(`HEllo`));
// => true;
// this also works:
console.log(new RegExp('hello', 'ig').test('HEllo'));
// => true;
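The effect of the g, m, and s flags is easiest to see side by side. The following is a rough sketch using a made-up two-line string; the variable names are illustrative only:

```javascript
// Hypothetical sample text containing a newline
const text = 'hello world\nhello again';

// Without g, match() reports only the first match; with g, every match
const firstOnly = text.match(/hello/);
const allMatches = text.match(/hello/g);
console.log(firstOnly.index); // => 0
console.log(allMatches);      // => ['hello', 'hello']

// Without m, ^ matches only at the start of the whole string
const startsNoM = text.match(/^hello/g).length;
// With m, ^ also matches at the start of each line
const startsWithM = text.match(/^hello/gm).length;
console.log(startsNoM, startsWithM); // => 1 2

// Without s, . does not match a newline; with s, it does
const dotNoS = /world.hello/.test(text);
const dotWithS = /world.hello/s.test(text);
console.log(dotNoS, dotWithS); // => false true

// A regex with the g flag remembers where its last match ended (lastIndex),
// so calling test() repeatedly on the same regex object can alternate results
const stateful = /hello/g;
const firstCall = stateful.test('hello');  // true, lastIndex is now 5
const secondCall = stateful.test('hello'); // false, the search resumed at index 5
console.log(firstCall, secondCall); // => true false
```

Because of that last behavior, it is usually safer to leave off g when a regex is only used with .test().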

Character Groups

Character Set

Matching anything that is enclosed in set:

const regex = /[hc]ello/;
console.log(regex.test('hello'));
// => true;
console.log(regex.test('cello'));
// => true;
console.log(regex.test('jello'));
// => false;

Negated Character Set

Matching anything that is not in set:

const regex = /[^hc]ello/;
console.log(regex.test('hello'));
// => false;
console.log(regex.test('cello'));
// => false;
console.log(regex.test('jello'));
// => true;

Ranges

const regex = /[a-z]ello/;
console.log(regex.test('hello'));
// => true;
console.log(regex.test('cello'));
// => true;
console.log(regex.test('jello'));
// => true;
console.log(regex.test('Hello'));
// => false; as set is all lowercase!

Combining Ranges

// ranges are written back to back; a '-' between them would also match a literal hyphen
const regex = /[A-Z0-9]/;
console.log(regex.test('a'));
// => false;
console.log(regex.test('A'));
// => true;
console.log(regex.test('1'));
// => true;

Multiple Ranges

console.log(/^[A-Z]$/.test('A'));
// => true;
console.log(/^[A-Z]$/.test('AB'));
// => false;
console.log(/^[A-Z]$/.test('Ab'));
// => false;
console.log(/^[A-Z0-9]$/.test('1'));
// => true;
console.log(/^[A-Z0-9]$/.test('A1'));
// => false; ^ and $ allow exactly one character from the set

Meta Characters

\d - any digit 0-9
\D - any character NOT a digit
\w - any alphanumeric character and underscore
\W - any character that is not alphanumeric and not an underscore
\s - any whitespace character (spaces, tabs, newlines and Unicode spaces)
\S - any non-whitespace character
\0 - null character
\n - newline
\t - tab character
\uXXXX - unicode character, where XXXX is the four-digit hexadecimal code
. - any character that is not a newline character, unless the s flag is set
[^] - matches any character including newline characters
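A short sketch of several of these meta characters applied to one string. The sample text and variable names are made up for illustration:

```javascript
// Hypothetical sample containing a space, a comma, a tab, and a newline
const sample = 'Room 42,\tfloor_3\nWing B';

const digits = sample.match(/\d+/g);
console.log(digits);   // => ['42', '3']

const words = sample.match(/\w+/g);
console.log(words);    // => ['Room', '42', 'floor_3', 'Wing', 'B']

const nonWord = sample.match(/\W/g);
console.log(nonWord);  // => [' ', ',', '\t', '\n', ' ']

// \s+ treats any run of whitespace (space, tab, newline) as one separator
const pieces = sample.split(/\s+/);
console.log(pieces);   // => ['Room', '42,', 'floor_3', 'Wing', 'B']

// . stops at the newline, while [^] continues past it
const untilNewline = sample.match(/floor.*/)[0];
const pastNewline = sample.match(/floor[^]*/)[0];
console.log(untilNewline); // => floor_3
console.log(pastNewline === 'floor_3\nWing B'); // => true
```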

Quantifiers

+ - matches preceding expression 1 or more times

const regex = /\d+/;
console.log(regex.test('1'));
// => true;
console.log(regex.test('1122'));
// => true;
console.log(regex.test('Abdd'));
// => false;

* - matches preceding expression 0 or more times:

const regex = /hi*d/;
console.log(regex.test('hd'));
// => true; because i can be omitted entirely
console.log(regex.test('hid'));
// => true;

? - matches preceding expression 0 or 1 time:

const regex = /hii?d/;
console.log(regex.test('hid'));
// => true; because second i is not provided, and that's ok due to rule (0 or 1);
console.log(regex.test('hiid'));
// => true; because second i is provided
console.log(regex.test('hiiid'));
// => false; one too many i characters

^ - anchor; matches the beginning of the string

const regex = /^h/;
console.log(regex.test('hi'));
// => true;
console.log(regex.test('bye'));
// => false;
console.log(regex.test('hello'));
// => true;

$ - anchor; matches the end of the string

// escape the dot so it matches a literal '.' rather than any character
const regex = /\.com$/;
console.log(regex.test('test@test.com'));
// => true;
console.log(regex.test('test@test'));
// => false;
console.log(regex.test('user@example.com'));
// => true;
console.log(regex.test('.com'));
// => true;
console.log(regex.test('com'));
// => false;

{N} - matches exactly N occurrences

const regex = /hi{2}d/;
console.log(regex.test('hiid'));
// => true;
console.log(regex.test('hid'));
// => false;

{N,} - matches at least N occurrences of the preceding expression

const regex = /hi{2,}d/;
console.log(regex.test('hiid'));
// => true;
console.log(regex.test('hiiid'));
// => true; because at least two i characters exist
console.log(regex.test('hiiiid'));
// => true;

{N,M} - matches at least N and no more than M occurrences of the preceding expression (M must not be less than N)

const regex = /hi{1,2}d/;
console.log(regex.test('hid'));
// => true;
console.log(regex.test('hiid'));
// => true;
console.log(regex.test('hiiid'));
// => false;

X|Y - matches either X or Y

const regex = /(red|green) apple/;
console.log(regex.test('red apple'));
// => true;
console.log(regex.test('green apple'));
// => true;
console.log(regex.test('delicious apple'));
// => false;
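The parentheses in the pattern above do two jobs: they limit how far the | reaches, and they capture which alternative matched. A minimal sketch, with made-up sample strings and variable names:

```javascript
// Names and sample strings are illustrative only.
const applePattern = /(red|green) apple/;

// exec() exposes the captured alternative at index 1
const appleMatch = applePattern.exec('one green apple');
console.log(appleMatch[0]);    // => green apple
console.log(appleMatch[1]);    // => green
console.log(appleMatch.index); // => 4

// Without the parentheses, | splits the whole pattern:
// "red" OR "green apple"
const ungrouped = /red|green apple/;
const ungroupedResult = ungrouped.test('red pear');
const groupedResult = applePattern.test('red pear');
console.log(ungroupedResult); // => true; 'red' alone is enough
console.log(groupedResult);   // => false; ' apple' is still required
```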

Special Characters

To match characters that have a special meaning in patterns (such as +, . or *), escape them with a backslash in the regex:

// this won't match the literal text 'a+b'; here + means "one or more a":
const regex = /a+b/;
console.log(regex.test('a+b'));
// => false;
// escaping the + makes it a literal character:
const updated = /a\+b/;
console.log(updated.test('a+b'));
// => true;
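When the text to match is only known at runtime, the escaping has to be done in code before calling new RegExp(). The helper below is a sketch of one common approach; the name escapeRegExp is hypothetical and not a built-in:

```javascript
// Hypothetical helper: prefixes every regex special character with a backslash
function escapeRegExp(text) {
  // '$&' in the replacement stands for the character that was matched
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const escaped = escapeRegExp('a+b');
console.log(escaped); // => a\+b

// Build a regex that matches the literal text 'a+b'
const literal = new RegExp(escaped);
const matchesLiteral = literal.test('a+b');
const matchesOther = literal.test('aab');
console.log(matchesLiteral); // => true
console.log(matchesOther);   // => false
```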

Examples

Match Any 10 Digit Number

const regex = /^\d{10}$/;
console.log(regex.test('39930'));
// => false;
console.log(regex.test('9294628302'));
// => true;

String Transformations

Capitalize First Letter of String

Pattern: /^\w/

/: begin RegEx
^: the beginning of the string
\w: matches any word character (alphanumeric & underscore)
/: end RegEx

let phrase = 'the quick green alligator...';
// strings are immutable, so assign the result back
phrase = phrase.trim().replace(/^\w/, (char) => char.toUpperCase());
console.log(phrase);
// => "The quick green alligator..."

Capitalize First Letter of Each Word

Pattern: /\w\S*/g

/: begin RegEx
\w: matches any word character (alphanumeric & underscore)
\S: matches any character that is not a whitespace character (spaces, tabs or line breaks)
*: quantifier, match 0 or more of the preceding token
/: end RegEx
g: global flag, match every occurrence in the string rather than only the first

let phrase = 'the quick green alligator...';
// capitalize the first character of every word and assign the result back
phrase = phrase.replace(/\w\S*/g, (w) => w.replace(/^\w/, (c) => c.toUpperCase()));
console.log(phrase);
// => "The Quick Green Alligator..."

As before, .trim() can be called first to remove leading and trailing spaces.
If the text is mixed case (upper and lower), call .toLowerCase() before capitalizing the first letter of each word.
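Putting those pieces together, the whole transformation might be wrapped in one function. This is a sketch only; the name toTitleCase is hypothetical:

```javascript
// Hypothetical helper combining trim, lowercase, and per-word capitalization
function toTitleCase(text) {
  return text
    .trim()         // drop leading and trailing whitespace
    .toLowerCase()  // normalize mixed-case input first
    .replace(/\w\S*/g, (word) => word.replace(/^\w/, (c) => c.toUpperCase()));
}

const titled = toTitleCase('  the QUICK green alligator...');
console.log(titled);
// => "The Quick Green Alligator..."
```

Whitespace between words is left as it was, since the pattern only touches runs of non-whitespace characters.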