GREL String Functions - tfmorris/OpenRefine GitHub Wiki

String functions supported by OpenRefine Expression Language (GREL)

See also: All GREL Functions.

Basic

length(string s)

Returns the length of s as a number.

Testing String Characteristics

startsWith(string s, string sub)

Returns boolean indicating whether s starts with sub. For example, startsWith("food", "foo") returns true, whereas startsWith("food", "bar") returns false. You could also write the first case as "food".startsWith("foo").

endsWith(string s, string sub)

Returns boolean indicating whether s ends with sub. For example, endsWith("food", "ood") returns true, whereas endsWith("food", "odd") returns false. You could also write the first case as "food".endsWith("foo").

contains(string s, string sub)

Returns boolean indicating whether s contains sub. For example, contains("food", "oo") returns true whereas contains("food", "ee") returns false. You could also write the first case as "food".contains("oo").

Basic String Modification

Case Conversion

toLowercase(string s)

Returns s converted to lowercase.

toUppercase(string s)

Returns s converted to uppercase.

toTitlecase(string s)

Returns s converted to titlecase. For example, toTitlecase("Once upon a midnight dreary") returns the string Once Upon A Midnight Dreary.

Trimming

trim(string s) and strip(string s)

Returns a copy of the string, with leading and trailing whitespace removed. For example, trim(" island ") returns the string island.

chomp(string s, string sep)

Returns a copy of s with sep removed from the end if s ends with sep; otherwise, just returns s. For example, chomp("hardly", "ly") and chomp("hard", "ly") both return the string hard.

Substring

substring(s, number from, optional number to)

Returns the substring of s starting from character index from and upto character index to. If to is omitted, it's understood as the end of the string s. For example, substring("profound", 3) returns the string found, and substring("profound", 2, 4) returns the string of.

Character indexes start from zero. Negative character indexes are understood as counting from the end of the string. For example, substring("profound", 1, -1) returns the string rofoun.

slice(s, number from, optional number to)

See substring function above.

get(o, number or string from, optional number to)

See substring function above.

Find and Replace

indexOf(string s, string sub)

Returns the index of sub first ocurring in s as a character index; or -1 if s does not contain sub. For example, indexOf("internationalization", "nation") returns 5, whereas indexOf("internationalization", "world") returns -1.

lastIndexOf(string s, string sub)

Returns the index of sub last ocurring in s as a character index; or -1 if s does not contain sub. For example, lastIndexOf("parallel", "a") returns 3 (pointing at the second character "a").

replace(string s, string f, string r)

Returns the string obtained by replacing f with r in s. f can be a regular expression, in which case r can also contain capture groups declared in f.

For a simple example, replace("The cow jumps over the moon and moos", "oo", "ee") returns the string The cow jumps over the meen and mees.

replaceChars(string s, string f, string r)

Returns the string obtained by replacing any character in s that is also in f with the character r. For example, replaceChars("commas , and semicolons ; are separators", ",;", "**") returns the string commas ** and semicolons ** are separators.

match(string s, regexp p)

Attempts to match the string s in its entirety against the regex pattern p and returns an array of capture groups. For example, match("230.22398, 12.3480", /\.(\d\d\d+)/) returns an array of 1 string: 3480. match("230.22398, 12.3480", /.**\.(\d+).*\.(\d+)/) returns an array of 2 strings: 22398 and 3480.

String Parsing and Splitting

toNumber(o)

Returns o converted to a number.

split(s, sep)

Returns the array of strings obtained by splitting s at wherever sep is found in it. sep can be either a string or a regular expression. For example, split("fire, water, earth, air", ",") returns the array of 4 strings: "fire", " water", " earth" , and " air". The double quotation marks are shown here only to highlight the fact that the spaces are retained.

splitByLengths(string s, number n1, number n2, ...)

Returns the array of strings obtained by splitting s into substrings with the given lengths. For example, splitByLengths("internationalization", 5, 6, 3) returns an array of 3 strings: inter, nation, and ali.

smartSplit(string s, optional string sep)

Returns the array of strings obtained by splitting s by the separator sep. Handles quotes properly. Guesses tab or comma separator if sep is not given. Also, value.escape('javascript') is useful for previewing unprintable chars prior to using smartSplit.

smartSplit(value,"\n")  //split cell at Carriage Return or New Line char

splitByCharType(s)

Returns an array of strings obtained by splitting s into groups of consecutive characters where the characters within each group share the same unicode type, and consecutive groups differ in their unicode types.

partition(string s, string or regex frag, optional boolean omitFragment)

Returns an array of strings [ a, frag, b ] where a is the substring within s before the first occurrence of frag in s, and b is the substring after frag in s. For example, partition("internationalization", "nation") returns 3 strings: inter, nation, and alization. If s does not contain frag, it returns an array of [ s, "", "" ] (the first string is the original s and the second and third strings are empty).

If omitFragment is true, frag is not returned. That is, the result is an array of only 2 elements.

rpartition(string s, string or regex frag, optional boolean omitFragment)

Returns an array of strings [ a, frag, b ] where a is the substring within s before the last occurrence of frag in s, and b is the substring after frag in s. For example, partition("parallel", "a") returns 3 strings: par, a, and llel. If s does not contain frag, it returns an array of [ s, "", "" ] (the first string is the original s and the second and third strings are empty).

Encoding and Hashing

diff(o1, o2, optional string timeUnit)

For strings, returns the portion where they differ. For dates, it returns the difference in given time units.

escape(string s, string mode)

Escapes s in the given escaping mode: html, xml, csv, url, javascript.

unescape(string s, string mode)

Unescapes s in the given escaping mode: html, xml, csv, url, javascript.

md5(string s)

Returns the MD5 hash of s.

sha1(string s)

Returns the SHA-1 hash of s.

phonetic(string s, optional string encoding)

Returns the a phonetic encoding of s, optionally indicating which encoding to use (defaults to DoubleMetaphone).

reinterpret(string s, string encoder)

Returns s reinterpreted thru the given encoder. Supported encodings here.

fingerprint(string s)

Returns the fingerprint of s, a derived string that aims to be a more canonical form of it (this is mostly useful for finding clusters of strings related to the same information).

ngram(string s, number n)

Returns an array of the word ngrams of s.

ngramFingerprint(string s, number n)

Returns the n-gram fingerprint of s.

unicode(string s)

Returns an array of strings describing each character of s in their full unicode notation.

unicodeType(string s)

Returns an array of strings describing each character of s in their full unicode notation.

Freebase Specific

mqlKeyQuote(string s)

Quotes s so that it can be used as a Freebase key. More info: MQL key escaping

mqlKeyUnquote(string key)

Unquotes the MQL quoted string key back to its original form. More info: MQL key escaping

⚠️ **GitHub.com Fallback** ⚠️