Regular Expressions - ThePix/QuestJS GitHub Wiki

Regular expressions are a powerful tool in most modern programming languages (and implemented almost the same across all languages, which is very unusual!), but are not straightforward.

A regular expression (regex for short) is a pattern that we can compare to a string. Regexes look similar to strings in that they are sequences of characters, but rather than quote marks at the start and end they have forward slashes. Here is a simple example:

/example/

This regex will match any string that contains that exact sequence of characters, so it will match the previous paragraph, but not this one. However, we can do much more than matching specific sequences.
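As a rough sketch of the above, the following checks the regex against two sample strings. It uses the test function, described later on this page; the variable names and sample strings are just for illustration.

```javascript
// Sample strings chosen for illustration only.
const simpleRegex = /example/
const withWord = "Here is a simple example:"
const withoutWord = "We can do much more than matching specific sequences."

// test() returns true if the pattern occurs anywhere in the string.
const foundInFirst = simpleRegex.test(withWord)      // true
const foundInSecond = simpleRegex.test(withoutWord)  // false
console.log(foundInFirst, foundInSecond)
```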

Wildcards

You can match various wildcards. The simplest is the full stop (period), ., which will match any character. Thus, this regex will match "they" or "the " or indeed anywhere that "th" is followed by two more characters.

/th../

If you only want to match letters and numbers, use \w (which also matches the underscore). This example will match "that" but not "the ".

/th\w\w/

There are a number of wildcard options, but you can also set up your own by putting the options in square brackets. This example will match "tha" or "the" but not "thr".

/th[ea]/

You can use ranges inside square brackets. This example will match "tha", "thm" and "thx", but not "thn".

/th[a-mx-z]/
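The four wildcard patterns above can be tried out with a short sketch like this one. The variable names are made up for the example, and test (covered later on this page) is used to check each string.

```javascript
const anyTwo = /th../
const wordChars = /th\w\w/
const eOrA = /th[ea]/
const ranges = /th[a-mx-z]/

const wildcardResults = {
  anyTwoThey: anyTwo.test("they"),       // true
  anyTwoTheSpace: anyTwo.test("the "),   // true: "." also matches a space
  wordThat: wordChars.test("that"),      // true
  wordTheSpace: wordChars.test("the "),  // false: a space is not a word character
  setThe: eOrA.test("the"),              // true
  setThr: eOrA.test("thr"),              // false: "r" is not in the set
  rangeThm: ranges.test("thm"),          // true: "m" is in a-m
  rangeThn: ranges.test("thn"),          // false: "n" is in neither range
}
console.log(wildcardResults)
```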

Some or none

You can add a question mark to make a character optional - the pattern will match one or zero. Use a plus to match one or more, or an asterisk to match zero or more. The latter two can have a question mark appended to make them "non-greedy". The table shows what each regex matches in the strings "th", "the" and "there"; a dash means no match.

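| Regex | th | the | there |
|---|---|---|---|
| th\w? | th | the | the |
| th\w+ | - | the | there |
| th\w* | th | the | there |
| th\w+? | - | the | the |
| th\w*? | th | th | th |
| th\w{3,6} | - | - | there |
| th\w{3,} | - | - | there |
| th\w{3} | - | - | there |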

The first here matches "th" and optionally one more letter. The second matches "th", but then demands at least one more letter, so fails to match the first string. Furthermore, it is greedy so will grab all the letters it can. Compare to the fourth, which is non-greedy, so grabs the minimum it can.

The last three show how you can specify the number more precisely. Within these three, the first requires between 3 and 6 letters, the second at least 3 and the third exactly 3.
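If you want to check these for yourself, here is a small sketch that reproduces some of the rows. The helper function matched is invented for this example; it returns the text that was matched, or "-" if there was no match.

```javascript
// Hypothetical helper: the matched text, or "-" when there is no match.
function matched(regex, s) {
  const m = s.match(regex)
  return m ? m[0] : "-"
}

const words = ["th", "the", "there"]
const optional = words.map(w => matched(/th\w?/, w))    // ["th", "the", "the"]
const greedyPlus = words.map(w => matched(/th\w+/, w))  // ["-", "the", "there"]
const lazyPlus = words.map(w => matched(/th\w+?/, w))   // ["-", "the", "the"]
const lazyStar = words.map(w => matched(/th\w*?/, w))   // ["th", "th", "th"]
const exactly3 = words.map(w => matched(/th\w{3}/, w))  // ["-", "-", "there"]
console.log(optional, greedyPlus, lazyPlus, lazyStar, exactly3)
```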

Anchors

Anchors do not match characters, instead they match boundaries.

| Symbol | Placement |
|---|---|
| ^ | Start of the string |
| $ | End of the string |
| \b | Word boundary (start or end) |
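A quick sketch of the three anchors in use; the patterns and strings here are invented for the example.

```javascript
const startsWithGet = /^get/
const endsWithHat = /hat$/
const wholeWordHat = /\bhat\b/

const anchorResults = {
  start: startsWithGet.test("get hat"),         // true
  startMiss: startsWithGet.test("forget hat"),  // false: "get" is not at the start
  end: endsWithHat.test("get hat"),             // true
  endMiss: endsWithHat.test("get hats"),        // false: "hat" is not at the end
  word: wholeWordHat.test("get hat now"),       // true
  wordMiss: wholeWordHat.test("get that now"),  // false: "hat" is inside "that"
}
console.log(anchorResults)
```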

Groups

Often it is convenient to group things. There may be a set number of specific sequences, and you are looking for one or the other. Sequences like this are grouped inside parentheses, and separated by vertical bars. This example will match "put hat in box" and "put hat on box".

put .+ (on|in) .+

If the sequences overlap, the order is important. In this example "in the" is before "in". Either way the regex will match "put hat in the box", but the way it is done here, "in the" is matched to the group, and "box" is matched to the final ".+". If I had done it the other way around, it would have matched just "in" to the group, and "the box" to the last bit. This can be important!

put .+ (in the|on the|on|in) .+
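To see the effect of the ordering, this sketch compares the regex above with a reordered version (shortFirst, made up for the comparison). It relies on the fact that a group in parentheses is also captured, as the Capture Groups section explains, so index 1 of the result from match shows what the group grabbed.

```javascript
const longFirst = /put .+ (in the|on the|on|in) .+/
const shortFirst = /put .+ (on|in|in the|on the) .+/
const command = "put hat in the box"

// Index 1 of the result holds whatever the parenthesised group matched.
const groupLongFirst = command.match(longFirst)[1]    // "in the"
const groupShortFirst = command.match(shortFirst)[1]  // "in"
console.log(groupLongFirst, groupShortFirst)
```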

Capture Groups

Often you want to grab what was matched, but you do not want the whole thing. Looking at the previous example, I want "hat" and "box". We can do that with capture groups - which is just the same as other groups really.

put (.+) (in the|on the|on|in) (.+)

This will give us three captured groups: "hat", "in the" and "box".
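In code, the captured groups come back in the array returned by match, after the full match at index 0. A minimal sketch, with variable names chosen just for this example:

```javascript
const putRegex = /put (.+) (in the|on the|on|in) (.+)/
const result = "put hat in the box".match(putRegex)

// Index 0 is the whole match; the captured groups follow from index 1.
const [whole, item, preposition, container] = result
console.log(item, preposition, container)  // hat, in the, box
```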

Non-capture groups

We do not really need that second one, so we can flag it as not to be captured:

put (.+) (?:in the|on the|on|in) (.+)
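With the non-capture group, the array from match has only two captures after the full match. A short sketch (the names are invented for the example):

```javascript
const putRegexNC = /put (.+) (?:in the|on the|on|in) (.+)/
const resultNC = "put hat on the box".match(putRegexNC)

const captureCount = resultNC.length - 1  // 2: the (?:...) group is not captured
const itemName = resultNC[1]              // "hat"
const containerName = resultNC[2]         // "box"
console.log(captureCount, itemName, containerName)
```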

More on Greedy and Non-Greedy

Generally this is not important, but occasionally it is. Consider these two examples; the first capture group is greedy in the first, and non-greedy in the second.

/^ask (.+) (about|what|who|how|why|where|when) (.+)$/
/^ask (.+?) (about|what|who|how|why|where|when) (.+)$/

Suppose the user types:

ASK DOCTOR KYLE ABOUT HOW TO ESCAPE

The first example is greedy, so will grab all it can, whilst still getting a match, so we end up with "DOCTOR KYLE ABOUT" in the first capture group, "HOW" in the second and "TO ESCAPE" in the third.

The second regex, being non-greedy, takes the minimum it can, so now the first capture group is "DOCTOR KYLE", the second is "ABOUT" and the third is "HOW TO ESCAPE", which is probably what we want.

Which you need is a bit of a judgement call. For ASK/ABOUT and TELL/ABOUT it is more likely the joining words will appear in the topic than a character's name, so the second example is probably best (and is how Quest does it), but that will not always be the case.
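The difference can be checked with a sketch like this. The input is converted to lower case first because the patterns are lower case; that step is an assumption made for this example, not a description of what Quest does internally.

```javascript
const greedy = /^ask (.+) (about|what|who|how|why|where|when) (.+)$/
const nonGreedy = /^ask (.+?) (about|what|who|how|why|where|when) (.+)$/

// Assumption for this sketch: normalise the input to lower case before matching.
const input = "ASK DOCTOR KYLE ABOUT HOW TO ESCAPE".toLowerCase()

// slice(1) drops the full match, leaving just the captured groups.
const greedyGroups = input.match(greedy).slice(1)
// ["doctor kyle about", "how", "to escape"]
const nonGreedyGroups = input.match(nonGreedy).slice(1)
// ["doctor kyle", "about", "how to escape"]
console.log(greedyGroups, nonGreedyGroups)
```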

Flags

Flags modify the behaviour of a regex, and are appended to the end, after the terminating slash. The two important ones are "i" to make it case insensitive, and "g" to make it global, i.e., to get multiple matches in one string. Neither is applicable to regexes used in commands, but they could be useful in functions.
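A small sketch of both flags, using made-up strings. It uses test and match, which are described in the Functions section.

```javascript
// "i" ignores case.
const caseInsensitive = /hat/i
const matchesUpper = caseInsensitive.test("Get HAT")  // true

// Without "g", match stops at the first match; with "g" it returns them all.
const sentence = "the hat and the cat"
const firstOnly = sentence.match(/\wat/)[0]  // "hat"
const allMatches = sentence.match(/\wat/g)   // ["hat", "cat"]
console.log(matchesUpper, firstOnly, allMatches)
```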

Functions

In Quest 6 the most common place to find a regex is a command or the regex attribute of an item, and you can let Quest handle what to do with them. However they can be useful in code, so how do you use them?

match

Use match on a string, with a regex as the parameter (or a string that will get converted to a regex). It returns an array if there is a match, or null if not. More here.
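For instance, a minimal sketch with made-up strings:

```javascript
const matchResult = "get the hat".match(/get (.+)/)
const matchedAll = matchResult[0]     // "get the hat", the whole match
const matchedObject = matchResult[1]  // "the hat", the first capture group

// No match, so we get null rather than an array.
const noMatch = "drop the hat".match(/get (.+)/)
console.log(matchedAll, matchedObject, noMatch)
```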

search

Use search on a string, with a regex as the parameter (or a string that will get converted to a regex). It returns the position of the first character in a match if there is a match, or -1 if not. More here.
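For instance, with made-up strings (positions count from zero):

```javascript
const position = "get the hat".search(/hat/)  // 8
const missing = "get the hat".search(/box/)   // -1
console.log(position, missing)
```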

exec

This is kind of the reverse of match in that it is used on a regex, with a string as the parameter. Again, it returns an array if there is a match, or null if not. More here.

test

This is like exec, except it just tells you if there is a match or not, returning true or false accordingly. More here.
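A short sketch of exec and test side by side, both called on the regex rather than the string. The names and strings are invented for the example.

```javascript
const getRegex = /get (.+)/

const execResult = getRegex.exec("get the hat")
const execCapture = execResult[1]          // "the hat"
const execMiss = getRegex.exec("drop it")  // null

const testHit = getRegex.test("get the hat")  // true
const testMiss = getRegex.test("drop it")     // false
console.log(execCapture, execMiss, testHit, testMiss)
```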

replace

Use replace on a string, with a regex as the first parameter. The first match will get replaced by the second parameter. If the regex is flagged as global, then every occurrence of the match will be replaced. More here.
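For instance, with and without the global flag (the strings are made up for the example):

```javascript
const firstReplaced = "hat on hat".replace(/hat/, "cap")  // "cap on hat"
const allReplaced = "hat on hat".replace(/hat/g, "cap")   // "cap on cap"
console.log(firstReplaced, allReplaced)
```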

Creating

Note that regexes can be created using new RegExp. This is useful if you want to create one dynamically.

const regex1 = /get hat/
const s = "hat"
const regex2 = new RegExp("get " + s)
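Two things are worth knowing when building a regex from a string; the sketch below illustrates both, with names invented for the example. Flags go in an optional second parameter, and because the pattern is an ordinary string, any backslash has to be doubled.

```javascript
const noun = "hat"

// Flags are passed as a second parameter.
const dynamic = new RegExp("^get (?:the )?" + noun + "$", "i")
const dynHit = dynamic.test("Get the hat")        // true
const dynMiss = dynamic.test("get the hatstand")  // false: "$" demands the end of the string

// In a string, \w has to be written as "\\w".
const fromString = new RegExp("th\\w+")
const sameAsLiteral = fromString.source === /th\w+/.source  // true
console.log(dynHit, dynMiss, sameAsLiteral)
```

If the string comes from the player or from data, bear in mind that any regex special characters in it (such as a full stop or question mark) will be treated as part of the pattern.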

Resources

This web page can only give an overview. If you want to play around with regexes, the Regex101 web site is excellent; you can experiment all you like, and there is a quick reference section too.

An alternative tutorial can be found here. A more technical reference can be found here.