Regular Expressions - pford68/groovy-examples GitHub Wiki

def result = 'abc' =~ /[a-z]+/
result.matches()  // TRUE

Creating a Regular Expression

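Surround a string with forward slashes (/) to write it as a "slashy" string, much like a regex literal in JavaScript. The result is still an ordinary String, but backslashes do not need to be doubled, which makes it convenient for writing patterns. It becomes a Pattern or Matcher only when combined with one of the operators below.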
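As a minimal sketch of the point above (the variable names slashy, quoted and pattern are only illustrative), the following compares a slashy string with its quoted equivalent and shows that the ~ operator is what actually produces a Pattern:

```groovy
// A slashy string is an ordinary String; backslashes need no doubling.
def slashy = /\d+\.\d+/
def quoted = '\\d+\\.\\d+'

assert slashy instanceof String
assert slashy == quoted

// The ~ operator compiles the String into a java.util.regex.Pattern.
def pattern = ~slashy
assert pattern instanceof java.util.regex.Pattern
assert pattern.matcher('3.14').matches()
```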

Operators

  • "~" - placed before a string, it compiles the string to a Pattern for later use

    // \b means word boundary, [A-Z] means any capital letter, + means one or more
    // so this matches any run of one or more capital letters with a word boundary (the position between a word character and a non-word character, or the start/end of the string) on either side of it
    def shoutedWord = ~/\b[A-Z]+\b/   
    
  • "=~" - creates a Matcher from the String on the left-hand side and the Pattern on the right.

    def matcher = ("EUREKA" =~ shoutedWord)  
    assert matcher.matches()         // TRUE
    
    def numberMatcher = "1234" =~ /\d+/  
    assert numberMatcher.matches()   // TRUE
    
  • "==~" - returns a boolean that specifies whether the full String matches the Pattern

    assert "1234" ==~ /\d+/    // TRUE
    assert !("FOO2" ==~ /\d+/)  // the match itself is FALSE: the whole string must be digits
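
The difference between a partial and a full match is easy to miss with these operators. The sketch below, using a made-up sample string, assumes the standard Groovy behaviour that a Matcher used as a boolean is evaluated with find(), while ==~ and matches() test the entire string:

```groovy
def text = 'release 42 shipped'

// In a boolean context, the Matcher from =~ is evaluated with find(),
// so a match anywhere in the string is enough.
def found = (text =~ /\d+/) ? true : false
assert found

// ==~ requires the pattern to cover the entire string.
def wholeMatch = text ==~ /\d+/
assert !wholeMatch

// Calling matches() on the Matcher is also a whole-string test.
def matcher = text =~ /\d+/
assert !matcher.matches()
```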
    

Strings

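Groovy Strings have replace and replaceAll methods. replace substitutes literal text, while replaceAll treats its first argument as a regular expression.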
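
A short sketch of how these methods behave with patterns; the sample strings are invented for illustration. The last assertion uses the closure form of replaceAll that Groovy adds to String, which computes each replacement from the matched text:

```groovy
def line = 'a.b.c 10 20'

// replace() treats its arguments as literal text, so '.' is just a dot.
assert line.replace('.', '-') == 'a-b-c 10 20'

// replaceAll() treats its first argument as a regular expression.
assert line.replaceAll(/\d+/, '#') == 'a.b.c # #'

// $1 and $2 in the replacement refer to the capture groups.
// The replacement is single-quoted so Groovy does not try to interpolate it.
assert 'John Smith'.replaceAll(/(\w+) (\w+)/, '$2, $1') == 'Smith, John'

// Closure form: the closure receives each match and returns its replacement.
def doubled = line.replaceAll(/\d+/) { (it.toInteger() * 2).toString() }
assert doubled == 'a.b.c 20 40'
```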

Collections

Groovy also makes significant additions to what you can do with Collections. In addition to each, collect, inject, etc., there is a regular-expression-aware iterator called grep that passes each item in the Collection through a filter and returns the subset of items that match the filter. We can use a regular expression as the filter, in which case an item is kept only if the pattern matches the whole item:

// regular expression says 0 or more characters (".*") followed by the string "bar" that is at the end of the string ("$")
assert ["foobar", "bazbar"] == ["foobar", "bazbar", "barquux"].grep(~/.*bar$/)

You can achieve the same thing with findAll but it takes a little more typing:

assert ["foobar", "bazbar"] == ["foobar", "bazbar", "barquux"].findAll { it ==~ /.*bar$/ } 
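
To illustrate the whole-item matching, here is a small sketch reusing the same list (the variable name words is only for illustration). It assumes that grep with a Pattern requires a full match, and that =~ inside a closure is evaluated as a boolean with find(), so it accepts a partial match:

```groovy
def words = ['foobar', 'bazbar', 'barquux']

// grep with a Pattern keeps only items that the pattern matches in full.
assert words.grep(~/bar/) == []
assert words.grep(~/.*bar.*/) == words

// Inside a closure, =~ succeeds on a partial match, ==~ only on a full one.
assert words.findAll { it =~ /bar/ } == words
assert words.findAll { it ==~ /bar/ } == []
```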

Matchers

Again, using the =~ operator returns a Matcher object. The Groovy way to work with Matchers leverages collection iterators and the built-in closures that Groovy provides to them. Matcher supports the iterator() method and, with that, gets everything else that any Groovy List or Collection would have, including collect, inject, findAll, etc.

def paragraph = """
    Lorem ipsum dolor 12:30 AM sit amet, 
    consectetuer adipiscing 1:15 AM elit. 
    Nunc rutrum diam sagittis nisi 9:22 PM.
"""

def HOUR = /10|11|12|[0-9]/
def MINUTE = /[0-5][0-9]/
def AM_PM = /AM|PM/
def time = /($HOUR):($MINUTE) ($AM_PM)/

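// the pattern has capture groups, so each iterated item is a list: [fullMatch, hour, minute, amPm]
assert ["12:30 AM", "1:15 AM", "9:22 PM"] == (paragraph =~ time).collect { it[0] }

assert ["12:30 AM", "1:15 AM"] == (paragraph =~ time).collect { it[0] }.grep(~/.*AM$/)

In current Groovy versions, when the pattern contains capture groups, the iterator-based methods receive each match as a list holding the full matched string ("12:30 AM") followed by the individual groups (hour, minute, am/pm), which is why the examples above pick out it[0]. The each method is often more convenient because a closure declared with several parameters receives the full match and each of the individual groups as separate arguments.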

("foo1 bar30 foo27 baz9 foo600" =~ /foo(\d+)/).each { match, digit -> println "+$digit" }

// result:
// +1
// +27
// +600

Another example (using the paragraph and time pattern from above) showing how to pretty print all of the timestamps:

(paragraph =~ time).each {match, hour, minute, amPm -> 
    println "$hour:$minute ${amPm == 'AM' ? 'this morning' : 'this evening' }"
}

// result: 
// 12:30 this morning
// 1:15 this morning
// 9:22 this evening
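
Individual matches and groups can also be reached by index instead of through a closure. The following is a minimal sketch that reuses a shortened version of the foo(\d+) example; it relies on Groovy's subscript support for Matcher:

```groovy
def matcher = 'foo1 bar30 foo27' =~ /foo(\d+)/

// matcher[n] is the n-th match; with capture groups it is a list
// of the form [fullMatch, group1, ...].
assert matcher[0] == ['foo1', '1']
assert matcher[1] == ['foo27', '27']

// A second index selects a single group within that match.
assert matcher[1][1] == '27'

// size() reports how many matches were found.
assert matcher.size() == 2

// Collect just the captured digits from every match.
assert matcher.collect { it[1] } == ['1', '27']
```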
