3 Regular Expression - theoriginalvisagie/JavaScript-Algorithms-and-Data-Structures GitHub Wiki

What Are Regular Expressions?

Regular expressions, often shortened to "regex" or "regexp", are patterns that help programmers match, search, and replace text. Regular expressions are very powerful, but can be hard to read because they use special characters to make more complex, flexible matches.

JavaScript has multiple ways to use regexes. One way to test a regex is the .test() method. The .test() method takes the regex, applies it to a string (which is placed inside the parentheses), and returns true if the pattern finds a match and false if it does not.

Literal Matching:

If you want to find the word the in the string The dog chased the cat, you could use the following regular expression: /the/. Notice that quote marks are not required within the regular expression.

let testStr = "freeCodeCamp";
let testRegex = /Code/;
testRegex.test(testStr);

When you search for a word using .test(), it looks for text that matches the regex exactly, including letter case. In other words, if we searched for "code" or "CODE" we would not get a match.
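
As a small sketch of that case sensitivity (the variable names here are illustrative only), the same string can be tested against patterns that differ only in letter case:

```javascript
const sample = "freeCodeCamp";

// The pattern must match the letter case of the text exactly.
const exactCase = /Code/.test(sample);
const lowerCase = /code/.test(sample);
const upperCase = /CODE/.test(sample);

console.log(exactCase); // true
console.log(lowerCase); // false
console.log(upperCase); // false
```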

We can search for multiple patterns at once by separating them with a pipe, |.

let petString = "James has a pet cat.";
let petRegex = /dog|cat|bird|fish/; 
let result = petRegex.test(petString);

Flags:

Ignoring Case:

We can ignore case by using what we call flags. This is the i flag. We append it to the regex after the last forward slash as follows:

let petString = "James has a pet cat.";
let petRegex = /james/i; 
let result = petRegex.test(petString);

Return more than one match:

To find every match in a string, rather than only the first one, we use the g flag. We append it after the last forward slash, just as we did with the i flag:

let testStr = "Repeat, Repeat, Repeat";
let ourRegex = /Repeat/g;
testStr.match(ourRegex);

We can combine multiple flags together.

let testStr = "Repeat, repeat, repeat";
let ourRegex = /Repeat/gi;
testStr.match(ourRegex);

The above block of code will search for all matches and ignore case.
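
To make the effect of each flag visible, here is a minimal sketch that runs the same string through three variants of the pattern. The variable names are made up for this illustration.

```javascript
const phrase = "Repeat, repeat, repeat";

// No flag: .match() stops at the first match.
const firstOnly = phrase.match(/Repeat/);
// g: every case-sensitive match is returned.
const allExact = phrase.match(/Repeat/g);
// gi: every match is returned, regardless of case.
const allAnyCase = phrase.match(/Repeat/gi);

console.log(firstOnly[0]); // "Repeat"
console.log(allExact);     // ["Repeat"]
console.log(allAnyCase);   // ["Repeat", "repeat", "repeat"]
```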

Extracting Matches:

We can extract the match by using the .match() method.

let ourStr = "Regular expressions";
let ourRegex = /expressions/;
ourStr.match(ourRegex);
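
The snippet above does not show what .match() hands back. As a rough sketch: without the g flag it returns an array whose first element is the matched text and which carries an index property, and it returns null when nothing matches. The pattern /grammar/ is just an arbitrary non-matching example.

```javascript
const ourStr = "Regular expressions";

const found = ourStr.match(/expressions/);
const missing = ourStr.match(/grammar/);

console.log(found[0]);    // "expressions"
console.log(found.index); // 8 (position where the match starts)
console.log(missing);     // null

// Because of the null case, check the result before indexing into it.
const firstMatch = missing ? missing[0] : "no match";
console.log(firstMatch);  // "no match"
```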

Wildcards:

Sometimes you won't (or don't need to) know the exact characters in your patterns. Thinking of all words that match, say, a misspelling would take a long time. Luckily, you can save time using the wildcard character, which is a period: .

The . character matches any single character. It can be placed anywhere in the pattern: at the start, in the middle, or at the end, between the two forward slashes.

let humStr = "I'll hum a song";
let hugStr = "Bear hug";
// Words that start with "hu".
let huRegex = /hu./;
huRegex.test(humStr);
huRegex.test(hugStr);

let exampleStr = "Let's have fun with regular expressions!";
// Words that end with "un".
let unRegex = /.un/;
let result = unRegex.test(exampleStr);
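
The following sketch, using made-up test words, shows that the dot stands for exactly one character wherever it sits in the pattern, and that a literal period has to be escaped with a backslash:

```javascript
// The dot matches exactly one character, here between "h" and "t".
const middle = /h.t/;

console.log(middle.test("hat"));  // true
console.log(middle.test("hot"));  // true
console.log(middle.test("ht"));   // false: nothing for the dot to match
console.log(middle.test("heat")); // false: two characters between h and t

// To match a literal period, escape it.
const literalDot = /3\.14/;
console.log(literalDot.test("3.14")); // true
console.log(literalDot.test("3x14")); // false
```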

Multiple Possibilities:

You can search for a literal pattern with some flexibility with character classes. Character classes allow you to define a group of characters you wish to match by placing them inside square ([ and ]) brackets.

let bigStr = "big";
let bagStr = "bag";
let bugStr = "bug";
let bogStr = "bog";
// Here we are looking for words that start with "b" and end with "g", but contain one of the following, a,i,u.
let bgRegex = /b[aiu]g/;
bigStr.match(bgRegex);
bagStr.match(bgRegex);
bugStr.match(bgRegex);
bogStr.match(bgRegex);

If we are looking for multiple characters that follow one another within the alphabet, we can shorten our regex as follows:

let catStr = "cat";
let batStr = "bat";
let matStr = "mat";
// The below line looks for all characters from "a" to "e".
let bgRegex = /[a-e]at/;
catStr.match(bgRegex);
batStr.match(bgRegex);
matStr.match(bgRegex);

We can expand our regex further to include numbers:

let jennyStr = "Jenny8675309";
let myRegex = /[a-z0-9]/ig;
jennyStr.match(myRegex);

The above looks for characters from "a" to "z" and from 0 to 9, ignoring case and returning every match rather than only the first.

Characters:

So far, you have created a set of characters that you want to match, but you could also create a set of characters that you do not want to match. These types of character sets are called negated character sets.

Match Single Characters That Aren't Specified:

To create a negated character set, you place a caret character (^) after the opening bracket and before the characters you do not want to match, e.g. /[^aeiou]/gi.

let quoteSample = "3 blind mice.";
let myRegex = /[^aeiou0-9]/gi; // Matches all that aren't vowels or numbers.
let result = quoteSample.match(myRegex); 
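
As a quick check of what that negated set actually returns, here is the same pattern with its result printed. Note that spaces and punctuation are matched too, since they are neither vowels nor digits.

```javascript
const quoteSample = "3 blind mice.";
const nonVowelNonDigit = quoteSample.match(/[^aeiou0-9]/gi);

console.log(nonVowelNonDigit);
// [" ", "b", "l", "n", "d", " ", "m", "c", "."]
console.log(nonVowelNonDigit.length); // 9
```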

Match Characters that Occur One or More Times:

Sometimes, you need to match a character (or group of characters) that appears one or more times in a row. This means it occurs at least once, and may be repeated.

You can use the + character to check if that is the case. Remember, the character or pattern has to be present consecutively. That is, the character has to repeat one after the other.

For example, /a+/g would find one match in abc and return ["a"]. Because of the +, it would also find a single match in "aabc" and return ["aa"].

let difficultSpelling = "Mississippi";
let myRegex = /s+/gi; 
let result = difficultSpelling.match(myRegex);
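
A short sketch of the difference the + makes, comparing it with the same pattern without a quantifier. The string "bcd" is an arbitrary example with nothing to match.

```javascript
const difficultSpelling = "Mississippi";

// With +, each run of consecutive s characters is one match.
const runs = difficultSpelling.match(/s+/g);
// Without +, every s is its own match.
const singles = difficultSpelling.match(/s/g);
// + needs at least one occurrence, so no "a" means no match.
const none = "bcd".match(/a+/g);

console.log(runs);    // ["ss", "ss"]
console.log(singles); // ["s", "s", "s", "s"]
console.log(none);    // null
```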

Match Characters that Occur Zero or More Times:

There's also an option that matches characters that occur zero or more times.

The character to do this is the asterisk or star: *.

let soccerWord = "gooooooooal!";
let gPhrase = "gut feeling";
let oPhrase = "over the moon";
let goRegex = /go*/;
soccerWord.match(goRegex);
gPhrase.match(goRegex);
oPhrase.match(goRegex);

In order, the three match calls would return the values ["goooooooo"], ["g"], and null.

Lazy Matching:

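By default, quantifiers such as * and + are greedy: they match the longest possible part of the string that satisfies the pattern. A lazy match finds the smallest possible part instead. You can place the ? character after a quantifier to change it to lazy matching. "titanic" matched against the regex /t[a-z]*?i/ returns ["ti"].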

let text = "<h1>Winter is coming</h1>";
let myRegex = /<.*?>/; 
let result = text.match(myRegex);

Patterns:

We can also look for patterns in specific positions in a string.

Match Beginning String Patterns:

We used the caret character ^ inside a character set to find things that would not match, i.e. [^aeiou]. Outside of a character set, a caret placed at the start of the regex looks for a match at the starting position of the string.

let firstString = "Ricky is first and can be found.";
let firstRegex = /^Ricky/;
firstRegex.test(firstString);

Match Ending String Patterns:

You can search the end of strings using the dollar sign character $ at the end of the regex.

let theEnding = "This is a never ending story";
let storyRegex = /story$/;
storyRegex.test(theEnding);
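
The examples above only show strings that match. A minimal sketch with some non-matching strings (the extra sentences are invented for illustration) makes the position requirement clearer, and shows that both anchors can be combined to match a whole string:

```javascript
const firstRegex = /^Ricky/;
const startsWith = firstRegex.test("Ricky is first and can be found.");
const notAtStart = firstRegex.test("You can't find Ricky now.");

const storyRegex = /story$/;
const endsWith = storyRegex.test("This is a never ending story");
const notAtEnd = storyRegex.test("Sometimes a story will have to end");

// Using ^ and $ together requires the entire string to match.
const wholeString = /^cat$/.test("cat");
const partOfString = /^cat$/.test("concat");

console.log(startsWith, notAtStart);    // true false
console.log(endsWith, notAtEnd);        // true false
console.log(wholeString, partOfString); // true false
```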

Letters And Numbers:

Using character classes, you were able to search for all letters of the alphabet with [a-z]. This kind of character class is common enough that there is a shortcut for it, although it includes a few extra characters as well.

Match All Letters and Numbers:

The closest character class in JavaScript to match the alphabet is \w. This shortcut is equal to [A-Za-z0-9_]. This character class matches upper and lowercase letters plus numbers. Note, this character class also includes the underscore character (_).

let longHand = /[A-Za-z0-9_]+/;
let shortHand = /\w+/;
let numbers = "42";
let varNames = "important_var";
longHand.test(numbers);
shortHand.test(numbers);
longHand.test(varNames);
shortHand.test(varNames);

Match Everything But Letters and Numbers:

A natural pattern you might want to search for is the opposite of alphanumerics.

You can search for the opposite of the \w with \W. Note, the opposite pattern uses a capital letter. This shortcut is the same as [^A-Za-z0-9_].

NB! Note that the W is a capital letter.

let shortHand = /\W/;
let numbers = "42%";
let sentence = "Coding!";
numbers.match(shortHand);
sentence.match(shortHand);

Match All Numbers:

The shortcut to look for digit characters is \d, with a lowercase d. This is equal to the character class [0-9], which looks for a single character of any number between zero and nine.

let movieName = "2001: A Space Odyssey";
let numRegex = /\d/g;
let result = movieName.match(numRegex).length;

Match All Non-Numbers:

You can also search for non-digits using a similar shortcut that uses an uppercase D instead.

The shortcut to look for non-digit characters is \D. This is equal to the character class [^0-9], which looks for a single character that is not a number between zero and nine.

let movieName = "2001: A Space Odyssey";
let noNumRegex = /\D/g; // Matches every character that is not a digit.
let result = movieName.match(noNumRegex).length;
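
Since \d and \D are opposites, every character in a string is matched by exactly one of them. The sketch below checks this on the movie title used above:

```javascript
const movieName = "2001: A Space Odyssey";

const digitCount = movieName.match(/\d/g).length;
const nonDigitCount = movieName.match(/\D/g).length;

console.log(digitCount);    // 4
console.log(nonDigitCount); // 17
// Together they account for the whole string.
console.log(digitCount + nonDigitCount === movieName.length); // true
```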

Restriction:

Usernames are used everywhere on the internet. They are what give users a unique identity on their favorite sites.

You need to check all the usernames in a database. Here are some simple rules that users have to follow when creating their username.

  1. Usernames can only use alpha-numeric characters.
  2. The only numbers in the username have to be at the end. There can be zero or more of them at the end. Usernames cannot start with a number.
  3. Username letters can be lowercase and uppercase.
  4. Usernames have to be at least two characters long. A two-character username can only use alphabet letters as characters.

let username = "JackOfAllTrades";
let userCheck = /^[a-z][a-z]+\d*$|^[a-z]\d\d+$/i;
let result = userCheck.test(username);
console.log(result)
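
One way to convince yourself that the pattern follows the four rules is to run it against a handful of usernames. The sample names and the expectations table below are assumptions made for this sketch, not part of the original exercise.

```javascript
const userCheck = /^[a-z][a-z]+\d*$|^[a-z]\d\d+$/i;

// Each entry pairs a candidate username with the result the rules imply.
const cases = [
  ["JackOfAllTrades", true], // letters only
  ["Oceans11", true],        // digits only at the end
  ["Jo", true],              // two letters is the minimum
  ["Z97", true],             // one letter followed by two or more digits
  ["J", false],              // too short
  ["J1", false],             // two characters must both be letters
  ["007", false],            // cannot start with a number
  ["BadUs3rnam3", false],    // digits in the middle
];

const results = cases.map(([name]) => userCheck.test(name));

cases.forEach(([name, expected], i) => {
  console.log(name, results[i], results[i] === expected ? "ok" : "unexpected");
});
```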

Whitespace:

You can also match the whitespace or spaces between letters.

Match Whitespace:

You can search for whitespace using \s, which is a lowercase s. This pattern not only matches whitespace, but also carriage return, tab, form feed, and new line characters.

let whiteSpace = "Whitespace. Whitespace everywhere!";
let spaceRegex = /\s/g;
whiteSpace.match(spaceRegex);

Match Non-Whitespace Characters:

Search for non-whitespace using \S, which is an uppercase S. This pattern will not match whitespace, carriage return, tab, form feed, and new line characters.

let whiteSpace = "Whitespace. Whitespace everywhere!";
let nonSpaceRegex = /\S/g;
whiteSpace.match(nonSpaceRegex).length;
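
For reference, here is a sketch of what the two shortcuts return on that sentence, plus a small made-up string showing that \s also matches tabs and newlines:

```javascript
const whiteSpace = "Whitespace. Whitespace everywhere!";

const spaces = whiteSpace.match(/\s/g);
const nonSpaceCount = whiteSpace.match(/\S/g).length;

console.log(spaces);        // [" ", " "]
console.log(nonSpaceCount); // 32

// A tab and a newline both count as whitespace.
const tabAndNewline = "a\tb\nc".match(/\s/g).length;
console.log(tabAndNewline); // 2
```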

Upper and Lower Number of Matches:

Specify Upper and Lower Number of Matches:

You can specify the lower and upper number of patterns with quantity specifiers. Quantity specifiers are used with curly brackets ({ and }). You put two numbers between the curly brackets - for the lower and upper number of patterns.

For example, to match only the letter a appearing between 3 and 5 times in the string ah, your regex would be /a{3,5}h/.

let A4 = "aaaah";
let A2 = "aah";
let multipleA = /a{3,5}h/;
multipleA.test(A4);
multipleA.test(A2);
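
One caveat worth sketching: /a{3,5}h/ is not anchored, so a string with more than five a characters still passes, because the regex can start matching partway through the run. Adding ^ and $ (an addition made here for illustration) enforces the upper bound on the whole string.

```javascript
const multipleA = /a{3,5}h/;

console.log(multipleA.test("aaaah")); // true: four a's
console.log(multipleA.test("aah"));   // false: only two a's

// Seven a's still pass: the last five a's plus "h" satisfy the pattern.
const sevenAs = "a".repeat(7) + "h";
console.log(multipleA.test(sevenAs)); // true

// Anchoring both ends checks the entire string instead.
const anchored = /^a{3,5}h$/;
console.log(anchored.test(sevenAs));  // false
console.log(anchored.test("aaaah"));  // true
```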

Specify Only the Lower Number of Matches:

To only specify the lower number of patterns, keep the first number followed by a comma.

For example, to match only the string hah with the letter a appearing at least 3 times, your regex would be /ha{3,}h/.

let A4 = "haaaah";
let A2 = "haah";
let A100 = "h" + "a".repeat(100) + "h";
let multipleA = /ha{3,}h/;
multipleA.test(A4);
multipleA.test(A2);
multipleA.test(A100);

Specify Exact Number of Matches:

To specify a certain number of patterns, just have that one number between the curly brackets.

For example, to match only the word hah with the letter a 3 times, your regex would be /ha{3}h/.

let A4 = "haaaah";
let A3 = "haaah";
let A100 = "h" + "a".repeat(100) + "h";
let multipleHA = /ha{3}h/;
multipleHA.test(A4);
multipleHA.test(A3);
multipleHA.test(A100);
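
The three quantity specifier forms can be compared side by side. This sketch builds test strings with 2, 3, 5, and 6 a characters between two h characters and runs each form against them. The helper name hah is made up for this example.

```javascript
// Hypothetical helper: "h", then n copies of "a", then "h".
const hah = (n) => "h" + "a".repeat(n) + "h";

const range = /ha{3,5}h/;  // between 3 and 5
const atLeast = /ha{3,}h/; // 3 or more
const exact = /ha{3}h/;    // exactly 3

const counts = [2, 3, 5, 6];
const rangeResults = counts.map((n) => range.test(hah(n)));
const atLeastResults = counts.map((n) => atLeast.test(hah(n)));
const exactResults = counts.map((n) => exact.test(hah(n)));

console.log(rangeResults);   // [false, true, true, false]
console.log(atLeastResults); // [false, true, true, true]
console.log(exactResults);   // [false, true, false, false]
```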

Check for All or None:

Sometimes the patterns you want to search for may have parts of it that may or may not exist. However, it may be important to check for them nonetheless.

You can specify the possible existence of an element with a question mark, ?. This checks for zero or one of the preceding element.

let american = "color";
let british = "colour";
let rainbowRegex = /colou?r/;
rainbowRegex.test(american);
rainbowRegex.test(british);

Lookaheads:

Lookaheads are patterns that tell JavaScript to look-ahead in your string to check for patterns further along. This can be useful when you want to search for multiple patterns over the same string.

There are two kinds of lookaheads: positive lookahead and negative lookahead.

Positive Lookahead:

A positive lookahead will look to make sure the element in the search pattern is there, but won't actually match it. A positive lookahead is used as (?=...) where the ... is the required part that is not matched.

let quit = "qu";
let quRegex = /q(?=u)/;
quit.match(quRegex);

Negative Lookahead:

A negative lookahead will look to make sure the element in the search pattern is not there. A negative lookahead is used as (?!...) where the ... is the pattern that you do not want to be there. The rest of the pattern is returned if the negative lookahead part is not present.

let noquit = "qt";
let qRegex = /q(?!u)/;
noquit.match(qRegex);
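
The sketch below shows two things. First, the lookahead itself is not part of the returned match. Second, two lookaheads can check different conditions over the same string. The checkPass pattern is a deliberately naive illustration and an assumption of this example: it only requires a run of at least three word characters and a digit somewhere after the current position.

```javascript
// The "u" is required by the lookahead but is not included in the match.
const quMatch = "qu".match(/q(?=u)/);
console.log(quMatch[0]); // "q"

// The positive lookahead fails on "qt"; the negative one succeeds.
const positiveOnQt = "qt".match(/q(?=u)/);
const negativeOnQt = "qt".match(/q(?!u)/);
console.log(positiveOnQt);    // null
console.log(negativeOnQt[0]); // "q"

// Two lookaheads over the same string: both must succeed.
const checkPass = /(?=\w{3,6})(?=\D*\d)/;
const withDigit = checkPass.test("abc123");
const noDigit = checkPass.test("abcdef");
const tooShort = checkPass.test("a1");
console.log(withDigit); // true
console.log(noDigit);   // false: no digit anywhere
console.log(tooShort);  // false: fewer than three word characters
```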

Groups Of Characters:

Sometimes we want to check for groups of characters using a Regular Expression and to achieve that we use parentheses ().

If you want to find either Penguin or Pumpkin in a string, you can use the following Regular Expression: /P(engu|umpk)in/g.

let testStr = "Pumpkin";
let testRegex = /P(engu|umpk)in/;
testRegex.test(testStr);

Capture Groups:

Capture groups can be used to find repeated substrings.

Capture groups are constructed by enclosing the regex pattern to be captured in parentheses. In this case, the goal is to capture a word consisting of alphanumeric characters so the capture group will be \w+ enclosed by parentheses: /(\w+)/.

The substring matched by the group is saved to a temporary "variable", which can be accessed within the same regex using a backslash and the number of the capture group (e.g. \1). Capture groups are automatically numbered by the position of their opening parentheses (left to right), starting at 1.

let repeatStr = "row row row your boat"; // Example string with a word repeated three times.
let repeatRegex = /(\w+) \1 \1/;
repeatRegex.test(repeatStr);
repeatStr.match(repeatRegex);
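
As a further sketch of backreferences, the pattern below (an assumed example, using digits instead of words) accepts a number only when it is repeated exactly three times, separated by spaces. The anchors keep a fourth repetition from slipping through.

```javascript
const threeTimes = /^(\d+) \1 \1$/;

const okResult = threeTimes.test("42 42 42");
const fourResult = threeTimes.test("42 42 42 42");
const twoResult = threeTimes.test("42 42");
const mixedResult = threeTimes.test("10 10 11");

console.log(okResult);    // true
console.log(fourResult);  // false: one repetition too many
console.log(twoResult);   // false: one repetition too few
console.log(mixedResult); // false: \1 must repeat the captured text exactly

// .match() returns the full match first, then each capture group.
const groups = "42 42 42".match(threeTimes);
console.log(groups[0]); // "42 42 42"
console.log(groups[1]); // "42"
```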

Search And Replace:

You can search and replace text in a string using .replace() on a string. The first input to .replace() is the regex pattern you want to search for. The second is either the string that replaces the match or a function that computes the replacement.

let wrongText = "The sky is silver.";
let silverRegex = /silver/;
wrongText.replace(silverRegex, "blue");
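
A few variations on .replace() are sketched below with made-up strings: the original string is left unchanged, capture groups can be reused in the replacement with $1 and $2, only the first match is replaced unless the g flag is set, and a function can compute each replacement.

```javascript
const wrongText = "The sky is silver.";
const fixedText = wrongText.replace(/silver/, "blue");
console.log(fixedText); // "The sky is blue."
console.log(wrongText); // "The sky is silver." (strings are not modified in place)

// $1 and $2 refer to the first and second capture groups.
const swapped = "Code Camp".replace(/(\w+)\s(\w+)/, "$2 $1");
console.log(swapped); // "Camp Code"

// Without g, only the first match is replaced.
const firstOnly = "one one".replace(/one/, "two");
const everyMatch = "one one".replace(/one/g, "two");
console.log(firstOnly);  // "two one"
console.log(everyMatch); // "two two"

// A function receives each match and returns its replacement.
const doubled = "a1b2".replace(/\d/g, (digit) => String(Number(digit) * 2));
console.log(doubled); // "a2b4"
```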