Core File Import Rules - Georgetown-University-Libraries/File-Analyzer GitHub Wiki

Core package contents

Import Delimited File

This rule imports a delimited file (comma-separated, tab-separated, etc.). Specify the delimiter character to use.
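As a rough illustration, splitting lines on a chosen delimiter can be sketched in Java as below. This assumes a single-character delimiter and no quoted fields (data with quoted commas needs a full CSV parser). `DelimitedImportSketch` and `importLines` are hypothetical names, not part of File-Analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of File-Analyzer.
class DelimitedImportSketch {
    // Splits every line on one delimiter character and returns the fields.
    // Assumption: fields are never quoted, so the delimiter cannot appear
    // inside a value.
    static List<String[]> importLines(List<String> lines, char delimiter) {
        // Pattern.quote treats characters such as '|' or '.' literally.
        Pattern splitter = Pattern.compile(Pattern.quote(String.valueOf(delimiter)));
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            // A negative limit keeps trailing empty fields: "a,b," -> 3 fields.
            rows.add(splitter.split(line, -1));
        }
        return rows;
    }
}
```

For tab-separated input, pass `'\t'` as the delimiter.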

Regular Expression Parser

This rule parses each line of a file with a regular expression and adds the result to the results table. Using it requires familiarity with regular expressions.
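A minimal sketch of the idea, assuming the pattern must match the whole line, each capturing group becomes one column, and lines that do not match are skipped (the real rule may handle unmatched lines differently). `RegexParserSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch for illustration; not File-Analyzer's implementation.
class RegexParserSketch {
    // Matches each whole line against one pattern and turns its capturing
    // groups into a row. Assumption: lines that do not match are skipped.
    static List<List<String>> parse(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<List<String>> table = new ArrayList<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            if (!m.matches()) {
                continue;
            }
            List<String> row = new ArrayList<>();
            for (int g = 1; g <= m.groupCount(); g++) {
                // group(g) is null when an optional group took no part in the match.
                row.add(m.group(g));
            }
            table.add(row);
        }
        return table;
    }
}
```

For example, the pattern `(\d{4}-\d{2}-\d{2}) (\w+) (.*)` turns the line `2023-01-05 ERROR disk full` into a three-column row.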

Multi Parser

This rule parses each line of a file against a sequence of regular expressions. Extracted values are captured by named groups. The sequence of patterns to apply to the file is stored in a rule file.

On the properties tab, indicate the location of your Parser Rule File.

Here is a sample Parser Rule File:

[COLS]
FIRST,LAST,ID,COST

[PATTERNS]
# Sample Comment 1
^(?<FIRST>[^\t\-]+)-(?<ID>[^\t]+)\t(?<LAST>[^\t]+).*\$(?<COST>\d+).*$
^(?<FIRST>[^\t\-]+)-(?<ID>[^\t]+)\t(?<LAST>[^\t]+).*$
# Sample Comment 2
^(?<FIRST>[^\t\-]+)\t(?<LAST>[^\t]+).*\$(?<COST>\d+).*$
^(?<FIRST>[^\t\-]+)\t(?<LAST>[^\t]+).*$
# Ignore empty lines
^$
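One way to read this layout is sketched below, under assumptions drawn only from the sample: `[COLS]` and `[PATTERNS]` start sections, `[COLS]` holds a comma-separated list of column names, each remaining line under `[PATTERNS]` is one regular expression, and blank lines and lines starting with `#` are ignored. Java's `java.util.regex` accepts the `(?<NAME>...)` named-group syntax used above. `RuleFileSketch` is hypothetical, not File-Analyzer's parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical reader for the sample layout; File-Analyzer's own parser may
// treat whitespace, comments, or malformed lines differently.
class RuleFileSketch {
    final List<String> cols = new ArrayList<>();
    final List<Pattern> patterns = new ArrayList<>();

    static RuleFileSketch parse(List<String> lines) {
        RuleFileSketch rules = new RuleFileSketch();
        String section = "";
        for (String raw : lines) {
            // Assumption: leading and trailing whitespace is not significant.
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and comments carry no rules
            }
            if (line.equals("[COLS]") || line.equals("[PATTERNS]")) {
                section = line;
            } else if (section.equals("[COLS]")) {
                for (String col : line.split(",")) {
                    rules.cols.add(col.strip());
                }
            } else if (section.equals("[PATTERNS]")) {
                // Patterns keep their file order; throws PatternSyntaxException if invalid.
                rules.patterns.add(Pattern.compile(line));
            }
        }
        return rules;
    }
}
```

In practice the lines could come from `Files.readAllLines(path)`.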

If you run this rule against the following test data:

Joe	Smith	test $100 test
Jane	Doe	$400 aa 22
Bob-F123	Foo 
Dan-X222	Foober	$671
this is a line with no tabs at all
Jim Davis	aaa	$12345678 

The resulting data will be displayed based on the named groups that were defined.
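The matching step can be sketched as follows, assuming patterns are tried in file order and the first full match wins. The sample's ordering (patterns capturing both `ID` and `COST` first, the most general pattern last) suggests this, but the page does not state it. Columns the winning pattern does not define are left empty, and a line that matches nothing yields `null`. `MultiParserSketch` and `parseLine` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of sequential pattern matching; not File-Analyzer's code.
class MultiParserSketch {
    // Tries each pattern in order; the first one that matches the whole line
    // wins (assumption). Each column is read from the named group of the same
    // name. Returns null when no pattern matches.
    static Map<String, String> parseLine(String line, List<Pattern> patterns, List<String> cols) {
        for (Pattern p : patterns) {
            Matcher m = p.matcher(line);
            if (!m.matches()) {
                continue;
            }
            Map<String, String> row = new LinkedHashMap<>(); // keeps column order
            for (String col : cols) {
                String value;
                try {
                    value = m.group(col);
                } catch (IllegalArgumentException e) {
                    value = null; // this pattern defines no group with that name
                }
                row.put(col, value == null ? "" : value);
            }
            return row;
        }
        return null;
    }
}
```

With the sample rules, `Dan-X222\tFoober\t$671` matches the first pattern and fills all four columns, while `this is a line with no tabs at all` matches none of them.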

Note: the sample files used in this example are available here.

Count Key

This rule counts the number of times each key value appears in a file.
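Counting key values amounts to building a frequency map. A minimal sketch, assuming the key is one column of a delimited line; the `delimiter` and `keyColumn` parameters and the `CountKeySketch` class are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical sketch; not File-Analyzer's implementation.
class CountKeySketch {
    // Counts how often each key value occurs. Assumption: every line is
    // delimited and the key sits in column keyColumn (0-based); lines too
    // short to have that column are ignored.
    static Map<String, Integer> countKeys(List<String> lines, String delimiter, int keyColumn) {
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        Map<String, Integer> counts = new TreeMap<>(); // sorted by key for display
        for (String line : lines) {
            String[] fields = splitter.split(line, -1);
            if (keyColumn < fields.length) {
                counts.merge(fields[keyColumn], 1, Integer::sum);
            }
        }
        return counts;
    }
}
```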

Optionally, this rule can also generate de-duplicated file sets for the input file.
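Assuming a de-duplicated file set means keeping the first record for each key and setting later repeats aside (an interpretation, not documented behavior), the split can be sketched as below. `DedupSketch` and its parameters are hypothetical and may not match what the rule actually writes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of one reading of "de-duplicated file sets".
class DedupSketch {
    // Returns two lists: the first line seen for each key, then every later
    // line whose key was already seen. Assumption: a line without the key
    // column is keyed by the empty string.
    static List<List<String>> partition(List<String> lines, String delimiter, int keyColumn) {
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        Set<String> seen = new HashSet<>();
        List<String> firstOccurrences = new ArrayList<>();
        List<String> repeats = new ArrayList<>();
        for (String line : lines) {
            String[] fields = splitter.split(line, -1);
            String key = keyColumn < fields.length ? fields[keyColumn] : "";
            // Set.add returns false when the key is already present.
            if (seen.add(key)) {
                firstOccurrences.add(line);
            } else {
                repeats.add(line);
            }
        }
        return List.of(firstOccurrences, repeats);
    }
}
```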

Counter Compliance - CSV

In the core package, the Counter Compliance tests apply only to text files. Use the updated version in the Demo package to parse XLSX files as well. The Counter Compliance Importer operates on a single file rather than on a collection of files.

See Counter compliant reports