Core File Test Rules - Georgetown-University-Libraries/File-Analyzer GitHub Wiki

Core package contents

Count Files By Type

This test counts the number of files found by file extension. A report will be generated listing the number of files found for each extension as well as a cumulative number of bytes for files of each type.
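The report described above can be approximated outside the File Analyzer with a short directory walk. This is a minimal sketch, not the tool's own implementation: the class `ExtensionCounter` and its methods are hypothetical names, and lowercasing the extension (so that `TXT` and `txt` share a row) is an assumption rather than documented behavior.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper; not part of the File Analyzer code base.
final class ExtensionCounter {

    // Returns the text after the last dot, lowercased (an assumption),
    // or "" when the name has no extension (e.g. "README" or ".bashrc").
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Adds one file to the running totals: index 0 is the file count,
    // index 1 is the cumulative size in bytes.
    static void tally(Map<String, long[]> totals, String fileName, long bytes) {
        long[] t = totals.computeIfAbsent(extensionOf(fileName), k -> new long[2]);
        t[0]++;
        t[1] += bytes;
    }

    // Walks the tree under root and tallies every regular file.
    static Map<String, long[]> countByExtension(Path root) throws IOException {
        Map<String, long[]> totals = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                try {
                    tally(totals, p.getFileName().toString(), Files.size(p));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return totals;
    }
}
```

Calling `ExtensionCounter.countByExtension(Paths.get("some/dir"))` returns one entry per extension, sorted alphabetically because a `TreeMap` is used.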

screen shot

List Files

This rule will generate a listing of the full path to every file it finds. The purpose of this tool is to generate a file list for import into other applications.

screen shot

List Directories

This rule will generate a listing of the unique directory names found within a specific directory. The purpose of this rule is to generate a tracking list when performing a similar batch process on a collection of directories.

screen shot

Match By Name

This test reports on file size by name regardless of the directory in which a file name is found.

screen shot

Match by Base Name

This test reports on file size by base name (no extension) regardless of the directory in which a file name is found.
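To illustrate what matching by base name means, the sketch below strips the extension from each file name and groups paths that share the result, so `tiff/page001.tif` and `jpg/page001.jpg` land in the same group. The class `BaseNameMatcher` is a hypothetical name, and the sketch groups paths only; the file-size reporting the test performs is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not part of the File Analyzer code base.
final class BaseNameMatcher {

    // Drops the final extension: "page001.tif" becomes "page001".
    // Names without a dot, or starting with one, are returned unchanged.
    static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    // Groups full paths by the base name of their last path segment,
    // ignoring the directory each file lives in.
    static Map<String, List<String>> groupByBaseName(List<String> paths) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String path : paths) {
            int sep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
            String name = path.substring(sep + 1);
            groups.computeIfAbsent(baseName(name), k -> new ArrayList<>()).add(path);
        }
        return groups;
    }
}
```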

screen shot

Sort By Checksum

This test reports the checksum for a given filename. The summary report will identify files with the same checksum value. You may select from a number of standard checksum algorithms.
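The duplicate detection described above can be sketched with the JDK's `java.security.MessageDigest`. The class `ChecksumGrouper` is a hypothetical name, and for brevity the sketch hashes in-memory byte arrays; for real files the bytes would more likely be streamed into the digest. The algorithm name is passed straight to `MessageDigest.getInstance`, so values such as `MD5`, `SHA-1` and `SHA-256` are accepted.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not part of the File Analyzer code base.
final class ChecksumGrouper {

    // Returns the lowercase hex digest of data under the named algorithm.
    static String hexDigest(String algorithm, byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    // Maps each checksum to the names that share it, keeping only
    // checksums that occur more than once (the likely duplicates).
    static Map<String, List<String>> duplicates(String algorithm, Map<String, byte[]> contentsByName) {
        Map<String, List<String>> byChecksum = new TreeMap<>();
        for (Map.Entry<String, byte[]> e : contentsByName.entrySet()) {
            String sum = hexDigest(algorithm, e.getValue());
            byChecksum.computeIfAbsent(sum, k -> new ArrayList<>()).add(e.getKey());
        }
        byChecksum.values().removeIf(names -> names.size() < 2);
        return byChecksum;
    }
}
```

Matching checksums strongly suggest identical content, but a byte-for-byte comparison is the only way to be certain, particularly with older algorithms such as MD5.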

screen shot

Match by Path

This test counts the number of items found in a specific directory. This test will also compute cumulative totals found for each directory that is scanned.

screen shot

Count by Type and Directory

This test counts the number of items found in a specific directory. This test will also compute cumulative totals found for each directory that is scanned.

screen shot

Random Sampling Mil 105E

This test will return a list of files in random order for QC processing. Select the AQL (acceptable quality level) target for your test.
This rule will generate a random sample of the appropriate size based on the number of files found. See http://en.wikipedia.org/wiki/MIL-STD-105 for an explanation.

screen shot

Using the Filter and Export capabilities of the File Analyzer, a random sampling can be exported for use in a quality control process.
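The random-ordering step can be sketched as a shuffle followed by a cut at the sample size. This is an illustration under stated assumptions, not the rule's actual code: `RandomSampler` is a hypothetical name, and the sample size is supplied by the caller because the MIL-STD-105E lookup from lot size and AQL is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper; not part of the File Analyzer code base.
final class RandomSampler {

    // Returns up to sampleSize items in random order without modifying
    // the input list. A fixed seed makes the sample reproducible, which
    // can help when a QC run has to be audited later.
    static <T> List<T> sample(List<T> files, int sampleSize, long seed) {
        List<T> shuffled = new ArrayList<>(files);
        Collections.shuffle(shuffled, new Random(seed));
        int n = Math.max(0, Math.min(sampleSize, shuffled.size()));
        return new ArrayList<>(shuffled.subList(0, n));
    }
}
```

Shuffling a copy and taking the first `n` entries samples without replacement, so no file appears twice in the QC list.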

Lowercase Test

This test will check that all files are named with only lowercase characters. The File Analyzer can be recompiled to allow the actual rename to take place.

This code is provided as an example of how to create a file name validation routine in the File Analyzer. A robust set of pattern-matching rules can be applied to ensure that a collection of files conforms to naming standards. When files within a directory should contain a numeric sequence, pattern matching can be performed to ensure that there are no breaks in the sequence.

screen shot

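import java.io.File;
import java.util.regex.Matcher;
// NameValidationTest, ValidPattern, RenameablePattern, FTDriver and FileTest
// come from the File Analyzer source tree.

class LowercaseTest extends NameValidationTest {

    public LowercaseTest(FTDriver dt, FileTest nextTest) {
        // A name is valid only if it contains no uppercase ASCII letters.
        super(dt, new ValidPattern("^[^A-Z]*$", false), nextTest, "Lowercase", "Lowercase");
        // Catch-all pattern that proposes the lowercase form as the new name.
        testPatterns.add(new RenameablePattern(".*", false) {
            public String getMessage(File f, Matcher m) {
                return "";
            }

            public File getNewFile(File f, Matcher m) {
                return new File(f.getParentFile(), f.getName().toLowerCase());
            }
        });
    }
}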
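The numeric-sequence check mentioned in this section could be sketched independently of the File Analyzer classes as follows. The class `SequenceChecker` is a hypothetical name, and the naming convention it assumes (a run of digits immediately before the extension, as in `page_0001.tif`) is an illustration rather than a rule the tool imposes. Names that do not fit the pattern are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the File Analyzer code base.
final class SequenceChecker {

    // Assumed convention: digits directly before the final extension.
    private static final Pattern NUMBERED = Pattern.compile("(\\d+)\\.[^.]+$");

    // Returns the numbers missing between the lowest and highest
    // sequence numbers found in the given file names.
    static List<Integer> missingNumbers(List<String> fileNames) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (String name : fileNames) {
            Matcher m = NUMBERED.matcher(name);
            if (m.find()) {
                // Assumes the digit run fits in an int.
                seen.add(Integer.parseInt(m.group(1)));
            }
        }
        List<Integer> missing = new ArrayList<>();
        if (seen.isEmpty()) {
            return missing;
        }
        for (int n = seen.first(); n < seen.last(); n++) {
            if (!seen.contains(n)) {
                missing.add(n);
            }
        }
        return missing;
    }
}
```

An empty result means the numbered files form an unbroken run; it does not confirm that the run starts at the expected first number.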

Digital Derivatives

See Identify digital derivatives

Counter Compliance (CSV)

In the core package, the Counter Compliance tests apply only to text files. Use the updated version in the Demo package to parse XLSX files as well.

See Counter compliant reports