ftlRegex - SCM-NV/ftl GitHub Wiki
ftlRegex is a convenient Fortran wrapper around either the PCRE library or, alternatively, the POSIX regular expression functionality in the C standard library (aka `regex.h`).
Here is a small example that shows what ftlRegex can do for you:
```fortran
type(ftlString) :: line
type(ftlRegex)  :: regex
line = 'Element: mass=12 Z=6 symbol=C name=Carbon'
call regex%New('(\w+)\s*=\s*(\w+)')
line = regex%Replace(line, '\2<-\1', doGroupSub=.true.)
```
The ftlString `line` now holds:

```
Element: 12<-mass 6<-Z C<-symbol Carbon<-name
```
That is quite a lot of work for a single line of Fortran, isn't it?
Note that since ftlRegex internally uses the regular expression engine of either the PCRE library or the C standard library, the supported regular expression elements depend on the implementation of these libraries. It's probably best to use the PCRE library, which is available on all platforms (Windows, I'm looking at you here ...). It is also more powerful than POSIX regular expressions: for example, it supports non-capturing groups and many more features on top of what POSIX offers. Everything from the POSIX standard should also work with PCRE, but if you want to keep the option of linking against either one, you should stick to standard POSIX regular expressions, specifically the POSIX Extended Regular Expression syntax.
Check how ftlRegex is built in the makefile that comes with the FTL. In summary, you have two options:
- Compile with `-DUSE_PCRE` and link with `-lpcreposix -lpcre`. This will use the PCRE library as the regex engine.
- Compile without `-DUSE_PCRE` and do not link any additional library. This will use the standard POSIX regular expressions of the C standard library as the regex engine.
Note that compilation of the ftlRegex.F90 file requires the "configure_ftlRegex.inc" file, which contains the numeric values of some of the enums in the C headers. This file can be generated with the small C program in the configure directory. Again, just check what the makefile of the FTL does ...
Unfortunately, these numeric values tend to differ between the PCRE POSIX header file and `regex.h`, so you need to make sure that compilation and linking are consistent, i.e. do not compile with `-DUSE_PCRE` and then link against the standard POSIX regex engine. The linking will succeed, but the resulting ftlRegex library will do strange things at runtime ...
In addition to the `ftlRegex` type itself, the `ftlRegexModule` defines some other types that are used as return types of the matching methods of the `ftlRegex` type.
```fortran
type, public :: ftlRegexMatch
   logical                          :: matches = .false.
   type(ftlString)                  :: text
   integer                          :: begin = 0
   integer                          :: end = 0
   type(ftlRegexGroup), allocatable :: group(:)
end type
```
Here the `matches` member is `.true.` if a match was found. If a match was found, the text that matches the regular expression is stored as an `ftlString` in the `text` member variable. The position of the match in the original string is given by the range [`begin`, `end`). Note that this (like all ranges used in the FTL) is a half-open interval, meaning that `begin` is included and `end` is the first excluded character. So the `text` member compares equal to `string(begin:end-1)`, if `string` is a raw Fortran string. The `group` member holds the contents of the regular expression's capture groups, if the expression uses any. The `ftlRegexGroup` type is defined as:
```fortran
type, public :: ftlRegexGroup
   type(ftlString) :: text
   integer         :: begin = 0
   integer         :: end = 0
end type
```
Here `text` is the text captured by the group, and `begin` and `end` delimit where the captured group is found in the original string, again as a half-open interval.
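A minimal sketch of how `begin` and `end` relate to the searched string. It uses `MatchFirst` (documented further below) on a raw Fortran string and relies only on the relation `text == string(begin:end-1)` stated above, which implies 1-based positions. The program name is illustrative.

```fortran
program halfopen_demo
   use ftlRegexModule
   implicit none
   character(len=*), parameter :: str = 'mass=12 Z=6'
   type(ftlRegex)      :: regex
   type(ftlRegexMatch) :: m

   call regex%New('Z=[0-9]+')
   m = regex%MatchFirst(str)

   ! 'Z=6' occupies characters 9, 10 and 11 of str
   write (*,*) m%begin, m%end            ! prints 9 and 12 (end is one past the match)
   write (*,*) str(m%begin:m%end-1)      ! prints Z=6
   write (*,*) m%end - m%begin           ! length of the match: prints 3
end program
```

Because `end` is exclusive, the length of a match is simply `end - begin`, and an empty match would have `begin == end`.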
Constructs a new `ftlRegex`, either from a pattern or as a copy of another `ftlRegex`:
Pattern constructor. Constructs an `ftlRegex` using either an `ftlString` (or alternatively a normal Fortran string) containing the regular expression pattern, and a number of optional logical arguments.

```fortran
subroutine New(self, pattern, basic, icase, nosub, newline)
   type(ftlRegex) , intent(inout)        :: self
   type(ftlString), intent(in)           :: pattern
   logical        , intent(in), optional :: basic, icase, nosub, newline
```

The optional logicals have the following meaning:
- `basic`: This flag is only relevant when linking against the regular expression engine in the C standard library instead of the (recommended) PCRE library. In this case, basic POSIX regular expressions are used instead of the POSIX Extended Regular Expression syntax that ftlRegex uses by default.
- `icase`: Ignore case. Subsequent searches using the `ftlRegex` will be case insensitive.
- `nosub`: Do not report the position of matches or capture groups. The resulting `ftlRegex` can pretty much only be used to test whether something matches, but not where exactly. However, testing for matches will be faster. (Hopefully; this depends on your libc implementation ...)
- `newline`: Match-any-character operators don't match a newline. A nonmatching list (`[^...]`) not containing a newline does not match a newline.
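To illustrate the `icase` flag, here is a small sketch that tests the same raw Fortran string against a case-sensitive and a case-insensitive regex. The program name is made up; the `^` and `$` anchors are plain POSIX ERE, so this should behave the same with either engine.

```fortran
program icase_demo
   use ftlRegexModule
   implicit none
   type(ftlRegex) :: sensitive, insensitive

   call sensitive%New('^hello$')                  ! default: case matters
   call insensitive%New('^hello$', icase=.true.)  ! ignore case

   write (*,*) ('Hello' .matches. sensitive)      ! prints F
   write (*,*) ('Hello' .matches. insensitive)    ! prints T
end program
```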
Example usage:

```fortran
type(ftlRegex)  :: regex
type(ftlString) :: pattern
call regex%New('\s*=\s*')             ! construction from raw Fortran string ...
pattern = 'TeSt'
call regex%New(pattern, icase=.true.) ! ... or from an ftlString
```

Copy constructor. Constructs one regular expression as a copy of another.
```fortran
subroutine New(self, other)
   type(ftlRegex), intent(inout) :: self
   type(ftlRegex), intent(in)    :: other
```

Note that the constructors are also available as free functions named `ftlRegex()` that take the same parameters as the type-bound subroutines above and return an `ftlRegex` instance. This is sometimes useful if one wants to use a regular expression only once:

```fortran
write (*,*) ('T12T' .matches. ftlRegex('T[0-9]+T')) ! prints T
```
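A short sketch of the copy constructor: the copy keeps the flags of the original, so a copied case-insensitive regex stays case-insensitive. The second part passes `icase` by keyword to the free function, assuming its dummy argument names match those of the type-bound `New` shown above. The program name is illustrative.

```fortran
program copy_ctor_demo
   use ftlRegexModule
   implicit none
   type(ftlRegex) :: original, copy

   call original%New('^section', icase=.true.)
   call copy%New(original)                      ! same pattern, same flags
   write (*,*) ('SECTION 1' .matches. copy)     ! prints T

   ! one-off use of the free-function constructor with an optional flag
   write (*,*) ('ABC' .matches. ftlRegex('abc', icase=.true.))   ! prints T
end program
```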
Destructs the regular expression. All used memory is deallocated.
```fortran
subroutine Delete(self)
   type(ftlRegex), intent(inout) :: self
```

It's not necessary to call `Delete` manually. It is used as the finalizer of the `ftlRegex` type and will be called automatically when an `ftlRegex` goes out of scope.
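A sketch of both ways a regex gets released. The local variable in the hypothetical subroutine `CountDigitRuns` is finalized automatically on return. The main program calls `Delete` explicitly before reusing its variable for another pattern; this is a safe choice, since whether `New` cleans up an existing pattern by itself is not stated here. Keep in mind that in Fortran, variables declared in the main program are not finalized when the program ends.

```fortran
program delete_demo
   use ftlRegexModule
   implicit none
   type(ftlRegex) :: regex

   call CountDigitRuns('x1 y22 z333')            ! prints 3

   call regex%New('[a-z]+')
   write (*,*) regex%NumMatches('x1 y22 z333')   ! prints 3
   call regex%Delete()                           ! release before reuse
   call regex%New('[0-9]{3}')
   write (*,*) regex%NumMatches('x1 y22 z333')   ! prints 1
   call regex%Delete()                           ! not finalized automatically here

contains

   subroutine CountDigitRuns(text)
      character(len=*), intent(in) :: text
      type(ftlRegex) :: local                    ! finalized automatically on return
      call local%New('[0-9]+')
      write (*,*) local%NumMatches(text)
   end subroutine

end program
```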
Copy assignment. Replaces the contents with a copy of the contents of other.
```fortran
subroutine assignment(=)(self, other)
   type(ftlRegex), intent(inout) :: self
   type(ftlRegex), intent(in)    :: other
```

This is exactly the same as using the copy constructor. (The assignment has only been implemented because intrinsic assignment would do the wrong thing and crash the program when the assigned regexes go out of scope.)
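A minimal sketch showing that a copy made by assignment is independent of its source: after the original is deleted, the copy still works. The program name is illustrative.

```fortran
program assign_demo
   use ftlRegexModule
   implicit none
   type(ftlRegex) :: a, b

   call a%New('[0-9]+')
   b = a                                   ! defined assignment, same as b%New(a)
   call a%Delete()                         ! destroying a does not affect b
   write (*,*) b%NumMatches('1 22 333')    ! prints 3
end program
```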
Compares two regular expressions for (in)equality.
```fortran
logical function operator(==)(lhs, rhs)
   type(ftlRegex), intent(in) :: lhs, rhs
logical function operator(/=)(lhs, rhs)
   type(ftlRegex), intent(in) :: lhs, rhs
```

Two regular expressions are considered equal if both the pattern and the (optional) flags passed to their constructors are equal.
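A short sketch of the equality rule: the same pattern alone is not enough, the flags have to agree as well. The program name is illustrative.

```fortran
program equality_demo
   use ftlRegexModule
   implicit none
   type(ftlRegex) :: r1, r2, r3

   call r1%New('abc')
   call r2%New('abc')
   call r3%New('abc', icase=.true.)

   write (*,*) (r1 == r2)   ! prints T: same pattern, same flags
   write (*,*) (r1 == r3)   ! prints F: same pattern, different flags
   write (*,*) (r1 /= r3)   ! prints T
end program
```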
Checks whether a string (either an `ftlString` or a raw Fortran string) matches a regular expression.

```fortran
logical function operator(.matches.)(lhs, rhs)
   type(ftlString), intent(in) :: lhs
   type(ftlRegex) , intent(in) :: rhs
```

Example usage:
```fortran
type(ftlRegex)  :: newsec
type(ftlString) :: line
integer :: unit, iostat, numSections

! open some file as unit, e.g. (the file name is just a placeholder):
open (newunit=unit, file='input.txt', status='old', action='read')
call newsec%New('^\s*SECTION\s*$', icase=.true., nosub=.true.)
numSections = 0
do
   call line%ReadLine(unit, iostat)
   if (is_iostat_end(iostat)) exit
   if (line .matches. newsec) numSections = numSections + 1
enddo
close (unit)
write (*,*) 'Found ', numSections, ' sections in file'
```
Returns the number of non-overlapping matches of the regular expression in `string` (which can either be an `ftlString` or a raw Fortran string).

```fortran
integer function NumMatches(self, string)
   type(ftlRegex) , intent(in) :: self
   type(ftlString), intent(in) :: string
```

Example usage:
```fortran
type(ftlRegex) :: regex
call regex%New('[a-zA-Z]\s*=\s*[0-9]+')
write (*,*) regex%NumMatches('u=12 F=32 a=b x=7') ! prints 3
```
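To make "non-overlapping" concrete, a small sketch: the pattern `aa` occurs at three overlapping positions in `aaaa` (starting at characters 1, 2 and 3), but at most two of those occurrences can be taken without overlap. The program name is illustrative.

```fortran
program nummatches_demo
   use ftlRegexModule
   implicit none
   type(ftlRegex) :: regex

   call regex%New('aa')
   ! matches are 'aaaa'(1:2) and 'aaaa'(3:4); 'aaaa'(2:3) overlaps both
   write (*,*) regex%NumMatches('aaaa')   ! prints 2
end program
```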
Returns an array of all non-overlapping matches of the regular expression in `string` (which can either be an `ftlString` or a raw Fortran string).

```fortran
function Match(self, string) result(matches)
   type(ftlRegex)     , intent(in) :: self
   type(ftlString)    , intent(in) :: string
   type(ftlRegexMatch), allocatable :: matches(:)
```

If no matches are found, the returned array has a size of 0.
Example usage:

```fortran
type(ftlString) :: line
type(ftlRegex)  :: r
type(ftlRegexMatch), allocatable :: m(:)
line = 'keyword option1=value option2=othervalue'
call r%New('(\w+)\s*=\s*(\w+)')
m = r%Match(line)
! m(1)%text now holds 'option1=value'
! m(2)%text now holds 'option2=othervalue'
! m(:)%group is also populated with the contents of the capture groups,
! e.g. m(1)%group(2)%text holds 'value'
```
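A sketch of processing the result of `Match` in a loop, reusing the key/value pattern from the example above. It assumes that `ftlString` lives in a module named `ftlStringModule` and exposes its raw character contents through a `raw` component, and that `group(1)` is the first capture group (consistent with `m(1)%group(2)%text` holding `'value'`). The program name is illustrative.

```fortran
program match_demo
   use ftlStringModule   ! assumed module name for ftlString
   use ftlRegexModule
   implicit none
   type(ftlRegex) :: r
   type(ftlRegexMatch), allocatable :: m(:)
   integer :: i

   call r%New('(\w+)\s*=\s*(\w+)')
   m = r%Match('keyword option1=value option2=othervalue')

   ! group(1) holds the key, group(2) the value of each key=value pair
   do i = 1, size(m)
      write (*, '(a, " -> ", a)') m(i)%group(1)%text%raw, m(i)%group(2)%text%raw
   end do
   ! prints:
   ! option1 -> value
   ! option2 -> othervalue
end program
```

Since an empty result array has size 0, the loop needs no special case for strings without any match.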
Returns an `ftlRegexMatch` for the first match of the regular expression in `string` (which can either be an `ftlString` or a raw Fortran string).

```fortran
type(ftlRegexMatch) function MatchFirst(self, string)
   type(ftlRegex) , intent(in) :: self
   type(ftlString), intent(in) :: string
```
If no match is found, the `matches` member variable of the returned `ftlRegexMatch` is set to `.false.`.

Example usage:
```fortran
type(ftlRegex)      :: regex
type(ftlRegexMatch) :: match
call regex%New('[a-zA-Z]\s*=\s*[0-9]+')
match = regex%MatchFirst('u=12 F=32 a=b x=7')
! match%text now holds 'u=12'
```
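A sketch of handling the no-match case: check `matches` before touching `text`, `begin` or `end`. As above, `ftlStringModule` and the `raw` component of `ftlString` are assumptions, and the program name is illustrative.

```fortran
program matchfirst_demo
   use ftlStringModule   ! assumed module name for ftlString
   use ftlRegexModule
   implicit none
   type(ftlRegex)      :: regex
   type(ftlRegexMatch) :: match

   call regex%New('[a-zA-Z]\s*=\s*[0-9]+')
   match = regex%MatchFirst('a=b c=d')        ! no value is numeric

   if (match%matches) then
      write (*,*) 'found: ', match%text%raw
   else
      write (*,*) 'no numeric assignment found' ! this branch is taken
   end if
end program
```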
Returns an `ftlString` where all matches of the regular expression in `string` have been replaced with `sub`. Note that both `string` and `sub` can be either `ftlString` or raw Fortran strings.

```fortran
type(ftlString) function Replace(self, string, sub, doGroupSub)
   class(ftlRegex), intent(in)           :: self
   type(ftlString), intent(in)           :: string
   type(ftlString), intent(in)           :: sub
   logical        , intent(in), optional :: doGroupSub
```

If the optional argument `doGroupSub` is present and `.true.`, the contents of the regular expression's capture groups can be used in the substitution string: `\n` will be replaced by the contents of the n-th capture group.

Example usage:
```fortran
type(ftlString) :: line
type(ftlRegex)  :: regex
line = 'Element: mass=12 Z=6 symbol=C name=Carbon'
call regex%New('(\w+)\s*=\s*(\w+)')
line = regex%Replace(line, '\2<-\1', doGroupSub=.true.)
! line now holds: 'Element: 12<-mass 6<-Z C<-symbol Carbon<-name'
```
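Without `doGroupSub`, `Replace` is a plain search-and-replace. A minimal sketch that collapses runs of whitespace into a single blank; it uses the POSIX bracket class `[[:space:]]`, which both engines understand, in line with the portability advice above. `ftlStringModule` and the `raw` component of `ftlString` are assumptions, and the program name is illustrative.

```fortran
program replace_demo
   use ftlStringModule   ! assumed module name for ftlString
   use ftlRegexModule
   implicit none
   type(ftlString) :: line
   type(ftlRegex)  :: blanks

   line = 'a   b    c'
   call blanks%New('[[:space:]]+')     ! one or more whitespace characters
   line = blanks%Replace(line, ' ')    ! each run becomes a single blank
   write (*, '(a)') line%raw           ! prints a b c
end program
```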