Just Say No to Regex - google/mug GitHub Wiki
String manipulation (matching, extraction, splitting, removals, replacements etc.) in Java traditionally resorts to two approaches.
For the simplest cases (such as taking the part before or after a delimiter, or between two delimiters), it takes an input.indexOf(myChar) call followed by an input.substring(startIndex, endIndex) call. Along the way, some remember to check whether the index is -1, and some just feel lucky and don't bother.
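As a reference point, here is a minimal sketch of that index-based approach with the -1 checks spelled out. The class and method names (BetweenDelimiters, between) are made up for illustration and are not part of any library.

```java
import java.util.Optional;

final class BetweenDelimiters {
  /** Returns the text between the first {@code open} and the next {@code close}, if both exist. */
  static Optional<String> between(String input, char open, char close) {
    int start = input.indexOf(open);
    if (start < 0) {
      return Optional.empty(); // the check that is easy to forget
    }
    int end = input.indexOf(close, start + 1);
    if (end < 0) {
      return Optional.empty(); // opening delimiter without a closing one
    }
    // start + 1 skips the opening delimiter itself; a classic off-by-one spot.
    return Optional.of(input.substring(start + 1, end));
  }

  public static void main(String[] args) {
    System.out.println(between("key=[value]", '[', ']')); // prints Optional[value]
    System.out.println(between("no brackets", '[', ']')); // prints Optional.empty
  }
}
```

Skipping either check would make substring() throw StringIndexOutOfBoundsException on input that lacks a delimiter.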
For anything more complex, there's regex.
But regex in Java is in a sad state:
- Its recursive backtracking implementation suffers from worst-case exponential time complexity. See the StackOverflow outage.
- Regex patterns tend to be cryptic to read, especially in Java, where you need to escape \, which then requires \\ for any regex escape.
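To make the escaping point concrete, here is a small sketch using only java.util.regex. The class and helper names are hypothetical; the patterns are ordinary examples, not taken from any particular codebase.

```java
import java.util.regex.Pattern;

final class RegexEscaping {
  // Regex \d+\.\d+ (digits, a literal dot, digits): every backslash is doubled in Java source.
  static final Pattern DECIMAL = Pattern.compile("\\d+\\.\\d+");

  // Regex \\ (one literal backslash) takes four backslashes in Java source.
  static final Pattern BACKSLASH = Pattern.compile("\\\\");

  static boolean isDecimal(String s) {
    return DECIMAL.matcher(s).matches();
  }

  static boolean containsBackslash(String s) {
    return BACKSLASH.matcher(s).find();
  }

  public static void main(String[] args) {
    System.out.println(isDecimal("3.14"));             // prints true
    System.out.println(isDecimal("3x14"));             // prints false
    System.out.println(containsBackslash("C:\\temp")); // prints true
  }
}
```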
Luckily, you may not need regex as much as you thought!
In this page I'll try to give a few examples so hopefully you can see where I'm going.
Imagine you need to find the ChromeOS version from a device model string that looks like "Linux,CrOS,eve|x86_64,EVE D6B-A6B-C4C-F8N-P8A-A36|10863.0.0". In summary, the device model string is in the format of {OS}|{hardware}|{OS-version}.
Being a regex wizard, you may come up with a regex pattern like "^\\w+,CrOS,[^|]+\\|[^|]+\\|([0-9\\.]+)". But it's not exactly easy to read, is it (at least for the regex muggles)?
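For comparison, this is roughly what using that pattern looks like with the standard java.util.regex API. It is a sketch only: the class name CrosVersionRegex and the method osVersion are invented here, and returning an Optional is one choice among several.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CrosVersionRegex {
  // Same pattern as in the text; group 1 captures the trailing OS version.
  private static final Pattern DEVICE_MODEL =
      Pattern.compile("^\\w+,CrOS,[^|]+\\|[^|]+\\|([0-9\\.]+)");

  static Optional<String> osVersion(String deviceModel) {
    Matcher matcher = DEVICE_MODEL.matcher(deviceModel);
    return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
  }

  public static void main(String[] args) {
    String model = "Linux,CrOS,eve|x86_64,EVE D6B-A6B-C4C-F8N-P8A-A36|10863.0.0";
    System.out.println(osVersion(model)); // prints Optional[10863.0.0]
  }
}
```

Note that the captured version is a dotted string such as 10863.0.0, so it cannot be parsed as a single int.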
Let's just say no to regex. Try the following:
// The version looks like "10863.0.0", so it is kept as a String rather than parsed as an int.
String version = new StringFormat("{...},CrOS,{...}|{hardware}|{version}")
    .parseOrThrow(deviceModel, (hardware, v) -> v);
- The {hardware} and {version} syntaxes are placeholders captured by the lambda.
- {...} is a wildcard placeholder not captured by the lambda.
- All other characters (such as , and |) are literal.
The code is intuitive to read. And StringFormat does no backtracking.
Need to split around a pattern?
Substring.consecutive(Character::isSpace)
    .repeatedly()
    .split(...);
Need to replace some patterns?
Substring.between("<password>", "</password>")
    .repeatedly()
    .replaceAllFrom(input, pwd -> "***");
Want string substitution?
String template = "{who} is going to {where}";
Map<String, String> substitutions = Map.of(
    "who", "Arya",
    "where", "Braavos"
);
// Matches all {placeholder} syntaxes
Substring.RepeatingPattern placeholders =
    Substring.word()
        .immediatelyBetween("{", INCLUSIVE, "}", INCLUSIVE)
        .repeatedly();
// Returns "Arya is going to Braavos"
String result = placeholders.replaceAllFrom(
    template,
    // Skip the braces to turn {who} to "who",
    // then look up the map to get "Arya".
    placeholder -> substitutions.get(placeholder.skip(1, 1).toString()));
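To see what the library call saves, here is a hand-rolled sketch of the same substitution using only indexOf() and StringBuilder. It is not how Mug implements it; the class name ManualTemplate is hypothetical, and leaving unknown placeholders untouched is an assumption made for this sketch.

```java
import java.util.Map;

final class ManualTemplate {
  static String substitute(String template, Map<String, String> substitutions) {
    StringBuilder out = new StringBuilder();
    int pos = 0;
    while (true) {
      int open = template.indexOf('{', pos);
      int close = open < 0 ? -1 : template.indexOf('}', open + 1);
      if (close < 0) {
        // No complete {placeholder} left: copy the remainder and stop.
        return out.append(template, pos, template.length()).toString();
      }
      String key = template.substring(open + 1, close);
      String value = substitutions.get(key);
      out.append(template, pos, open);
      // Assumption for this sketch: unknown placeholders are copied through unchanged.
      out.append(value != null ? value : template.substring(open, close + 1));
      pos = close + 1;
    }
  }

  public static void main(String[] args) {
    Map<String, String> substitutions = Map.of("who", "Arya", "where", "Braavos");
    // prints Arya is going to Braavos
    System.out.println(substitute("{who} is going to {where}", substitutions));
  }
}
```

Every index in the loop (open + 1, close + 1, the -1 checks) is a place where an off-by-one can hide, which is the bookkeeping the Substring version avoids.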
What about the simple cases where you may be used to reaching for indexOf()? Fiddling with indexes is prone to off-by-one errors and leads to unreadable code. Instead, consider using either StringFormat like:
new StringFormat("'{quoted}'").scan(input, quoted -> quoted);
Or Substring like:
Substring.between('\'', '\'').repeatedly().from(input);
Life will be easier without regexes, my friend.