Core File Import Rules - Georgetown-University-Libraries/File-Analyzer GitHub Wiki
This rule imports a delimited file (comma-separated, tab-separated, etc.). Specify the delimiter character to use.
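As a rough illustration of what a delimited import with a configurable delimiter does, here is a minimal Python sketch. It is not the File Analyzer implementation; the function name `import_delimited` and the sample data are assumptions made for this example.

```python
import csv
import io


def import_delimited(text, delimiter=","):
    """Parse delimited text into a list of rows (each row is a list of strings).

    Hypothetical helper for illustration; the delimiter plays the role of the
    delimiter character the import rule asks for.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row for row in reader]


# Made-up tab-separated sample
sample = "FIRST\tLAST\tCOST\nJoe\tSmith\t100\nJane\tDoe\t400\n"
rows = import_delimited(sample, delimiter="\t")
for row in rows:
    print(row)
```

Passing `delimiter=","` instead would handle a comma-separated file in the same way.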
This rule parses each line of a file with a regular expression and adds the result to the results table. This rule requires an understanding of regular expressions.
This rule parses each line of a file against a sequence of regular expressions. Resulting values are stored in named matching groups. The sequence of patterns to apply to the file is stored in a rule file.
On the properties tab, indicate the location of your Parser Rule File.
Here is a sample Parser Rule File:
[COLS]
FIRST,LAST,ID,COST
[PATTERNS]
# Sample Comment 1
^(?<FIRST>[^\t\-]+)-(?<ID>[^\t]+)\t(?<LAST>[^\t]+).*\$(?<COST>\d+).*$
^(?<FIRST>[^\t\-]+)-(?<ID>[^\t]+)\t(?<LAST>[^\t]+).*$
# Sample Comment 2
^(?<FIRST>[^\t\-]+)\t(?<LAST>[^\t]+).*\$(?<COST>\d+).*$
^(?<FIRST>[^\t\-]+)\t(?<LAST>[^\t]+).*$
# Ignore empty lines
^$
If you run this rule against the following tab-delimited test data:
Joe Smith test $100 test
Jane Doe $400 aa 22
Bob-F123 Foo
Dan-X222 Foober $671
this is a line with no tabs at all
Jim Davis aaa $12345678
The resulting data is displayed in columns based on the named groups that were defined.
Note: the sample files used in this example are available here.
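The behavior described above can be approximated with a short Python sketch. This is not the File Analyzer code. It makes several assumptions: the first matching pattern wins, named groups missing from a match are left empty, and lines that match no pattern return `None`. The names `load_rules` and `parse_line` are hypothetical, and the tab positions in the test lines are guesses, since tabs are not visible in the sample data as displayed. Python writes named groups as `(?P<NAME>...)`, so the sketch converts the `(?<NAME>...)` syntax used in the rule file.

```python
import re

# The sample rule file from this page (raw string keeps the backslashes intact)
RULE_FILE = r"""[COLS]
FIRST,LAST,ID,COST
[PATTERNS]
# Sample Comment 1
^(?<FIRST>[^\t\-]+)-(?<ID>[^\t]+)\t(?<LAST>[^\t]+).*\$(?<COST>\d+).*$
^(?<FIRST>[^\t\-]+)-(?<ID>[^\t]+)\t(?<LAST>[^\t]+).*$
# Sample Comment 2
^(?<FIRST>[^\t\-]+)\t(?<LAST>[^\t]+).*\$(?<COST>\d+).*$
^(?<FIRST>[^\t\-]+)\t(?<LAST>[^\t]+).*$
# Ignore empty lines
^$
"""

# Matches the opening of a (?<NAME>...) group so it can be rewritten for Python
NAMED_GROUP = re.compile(r"\(\?<([A-Za-z][A-Za-z0-9]*)>")


def load_rules(text):
    """Return (column names, compiled patterns) from rule file text."""
    cols, patterns, section = [], [], None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        if line in ("[COLS]", "[PATTERNS]"):
            section = line
        elif section == "[COLS]":
            cols.extend(name.strip() for name in line.split(","))
        elif section == "[PATTERNS]":
            patterns.append(re.compile(NAMED_GROUP.sub(r"(?P<\1>", line)))
    return cols, patterns


def parse_line(line, cols, patterns):
    """Apply patterns in order; return a column dict for the first match."""
    for pattern in patterns:
        match = pattern.match(line)
        if match:
            groups = match.groupdict()
            return {col: groups.get(col) or "" for col in cols}
    return None  # assumption: unmatched lines yield no row


cols, patterns = load_rules(RULE_FILE)
for line in ["Joe\tSmith\ttest $100 test", "Bob-F123\tFoo",
             "this is a line with no tabs at all"]:
    print(parse_line(line, cols, patterns))
```

Ordering the more specific patterns first (those that also capture `COST`) lets a line fall through to a looser pattern only when the stricter one fails.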
Count the number of times a key value appears in a file.
![](Count Key Dedup.jpg)
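Counting key occurrences can be sketched in a few lines of Python. The page does not say how the rule identifies the key, so this example assumes the key is one column of a tab-delimited line; `count_keys`, its parameters, and the sample data are hypothetical.

```python
from collections import Counter


def count_keys(lines, delimiter="\t", key_column=0):
    """Count how often each key value appears.

    Assumption: the key is the field at index key_column of each line.
    """
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split(delimiter)
        if key_column < len(fields):
            counts[fields[key_column]] += 1
    return counts


# Made-up sample: the key A100 appears twice
sample = ["A100\tfirst", "B200\tsecond", "A100\tthird"]
counts = count_keys(sample)
for key, n in counts.items():
    print(key, n)
```

Keys with a count greater than one are the duplicates a dedup check would flag.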
In the core package, the Counter Compliance tests apply only to text files. Use the updated version in the Demo package to parse XLSX files as well. The Counter Compliance Importer operates on a single file rather than on a collection of files.