Core File Test Rules - Georgetown-University-Libraries/File-Analyzer GitHub Wiki
Count Files By Type
This test counts the files found for each file extension. The report lists the number of files with each extension, along with the cumulative size in bytes of the files of each type.
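Below is a minimal sketch of this kind of tally, written in Java to match the example later on this page. It is a stand-alone illustration, not the File Analyzer's own code. The class and method names are hypothetical. Counting extensions case-insensitively (so `.TIF` and `.tif` are tallied together) is also an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical stand-alone sketch; not the File Analyzer's implementation.
class CountByType {

    /** Lowercased text after the last '.', or "" when there is no extension (e.g. "README", ".hidden"). */
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }

    /** Adds one file to the per-extension tally; each value is {fileCount, totalBytes}. */
    static void tally(Map<String, long[]> totals, String fileName, long sizeInBytes) {
        long[] t = totals.computeIfAbsent(extension(fileName), k -> new long[2]);
        t[0]++;               // number of files with this extension
        t[1] += sizeInBytes;  // cumulative bytes for this extension
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Map<String, long[]> totals = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            Iterator<Path> it = paths.filter(Files::isRegularFile).iterator();
            while (it.hasNext()) {
                Path p = it.next();
                tally(totals, p.getFileName().toString(), Files.size(p));
            }
        }
        totals.forEach((ext, t) ->
            System.out.printf("%s\t%d files\t%d bytes%n", ext.isEmpty() ? "(none)" : ext, t[0], t[1]));
    }
}
```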
List Files
This rule will generate a listing of the full path to every file it finds. The purpose of this tool is to generate a file list for import into other applications.
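The following hypothetical `ListFiles` class sketches the same idea using `java.nio.file.Files.walk`. Sorting the output and printing absolute, normalized paths are choices made for this illustration. They are not documented behavior of the rule.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; not the File Analyzer's implementation.
class ListFiles {

    /** Absolute, normalized paths of all regular files below root, sorted for stable output. */
    static List<String> listFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .map(p -> p.toAbsolutePath().normalize().toString())
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // One path per line, ready to import into a spreadsheet or another tool.
        listFiles(Paths.get(args.length > 0 ? args[0] : ".")).forEach(System.out::println);
    }
}
```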
List Directories
This rule generates a listing of the unique directory names found within a specific directory. Its purpose is to produce a tracking list when the same batch process will be performed on each directory in a collection.
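Here is a sketch of such a listing. It assumes that "within a specific directory" means the immediate subdirectories by default. The hypothetical `maxDepth` argument lets the same code recurse further. Paths are reported relative to the starting directory, and a sorted set removes any duplicates.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; not the File Analyzer's implementation.
class ListDirectories {

    /** Sorted, de-duplicated directory paths below root (relative to root), at most maxDepth levels down. */
    static SortedSet<String> listDirectories(Path root, int maxDepth) throws IOException {
        try (Stream<Path> paths = Files.walk(root, maxDepth)) {
            return paths.filter(Files::isDirectory)
                        .filter(p -> !p.equals(root))            // skip the starting directory itself
                        .map(p -> root.relativize(p).toString())
                        .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        int maxDepth = args.length > 1 ? Integer.parseInt(args[1]) : 1;  // 1 = immediate subdirectories
        listDirectories(root, maxDepth).forEach(System.out::println);
    }
}
```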
Match By Name
This test reports file sizes grouped by file name, regardless of the directory in which each file is found.
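Below is a minimal sketch of grouping sizes by file name. It assumes the sizes have already been collected into a map while walking the directory tree. `MatchByName` and `sizesByName` are hypothetical names. Flagging names whose copies differ in size is an illustrative addition.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the File Analyzer's implementation.
class MatchByName {

    /** Groups file sizes by file name (last path element), ignoring the directory each file is in. */
    static Map<String, List<Long>> sizesByName(Map<Path, Long> fileSizes) {
        Map<String, List<Long>> byName = new TreeMap<>();
        fileSizes.forEach((path, size) ->
            byName.computeIfAbsent(path.getFileName().toString(), k -> new ArrayList<>()).add(size));
        return byName;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        Map<Path, Long> sizes = new LinkedHashMap<>();
        sizes.put(Paths.get("batch1", "img001.tif"), 1024L);
        sizes.put(Paths.get("batch2", "img001.tif"), 1024L);
        sizes.put(Paths.get("batch2", "img002.tif"), 2048L);
        sizesByName(sizes).forEach((name, list) -> System.out.println(
            name + "\tfound " + list.size() + "\tsizes " + list
            + (list.stream().distinct().count() > 1 ? "\tsizes differ" : "")));
    }
}
```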
Match by Base Name
This test reports file sizes grouped by base name (the file name without its extension), regardless of the directory in which each file is found.
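Compared with matching on the full name, only the grouping key changes. This sketch assumes the base name is everything before the *last* dot, so `archive.tar.gz` groups under `archive.tar`. The File Analyzer may define it differently. The class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the File Analyzer's implementation.
class MatchByBaseName {

    /** Drops the last extension: "img001.tif" -> "img001"; "README" and ".hidden" are returned unchanged. */
    static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    /** Groups "name (size bytes)" entries by base name, ignoring the directory each file is in. */
    static Map<String, List<String>> sizesByBaseName(Map<Path, Long> fileSizes) {
        Map<String, List<String>> groups = new TreeMap<>();
        fileSizes.forEach((path, size) -> {
            String name = path.getFileName().toString();
            groups.computeIfAbsent(baseName(name), k -> new ArrayList<>()).add(name + " (" + size + " bytes)");
        });
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical sample: a master TIFF and its JPEG derivative in different folders.
        Map<Path, Long> sizes = new LinkedHashMap<>();
        sizes.put(Paths.get("masters", "img001.tif"), 52_428_800L);
        sizes.put(Paths.get("access", "img001.jpg"), 409_600L);
        sizes.put(Paths.get("access", "img002.jpg"), 398_000L);
        sizesByBaseName(sizes).forEach((base, entries) -> System.out.println(base + "\t" + entries));
    }
}
```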
Sort By Checksum
This test reports the checksum of each file. The summary report identifies files that share the same checksum value. You may choose from a number of standard checksum algorithms.
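The sketch below computes a digest with the JDK's `java.security.MessageDigest` and then groups files whose checksums are equal. The algorithm is passed by its standard JDK name. `MD5`, `SHA-1`, and `SHA-256` are available in every Java SE runtime. The sketch makes no assumption about which algorithms the File Analyzer itself offers. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical sketch; not the File Analyzer's implementation.
class SortByChecksum {

    /** Lowercase hex digest of the stream, e.g. algorithm "MD5", "SHA-1" or "SHA-256". */
    static String checksum(InputStream in, String algorithm) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);  // stream the file so large files are not loaded into memory
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String algorithm = args.length > 1 ? args[1] : "SHA-256";
        Map<String, List<Path>> byChecksum = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            Iterator<Path> it = paths.filter(Files::isRegularFile).iterator();
            while (it.hasNext()) {
                Path p = it.next();
                try (InputStream in = Files.newInputStream(p)) {
                    byChecksum.computeIfAbsent(checksum(in, algorithm), k -> new ArrayList<>()).add(p);
                }
            }
        }
        // Summary: only checksums shared by more than one file (probable duplicates).
        byChecksum.forEach((sum, files) -> {
            if (files.size() > 1) System.out.println(sum + "\t" + files);
        });
    }
}
```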
Match by Path
This test counts the number of items found in a specific directory and also computes cumulative totals for each directory that is scanned.
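The description does not define "cumulative totals." One plausible reading is a roll-up, in which each directory's total includes the items in all of its subdirectories. This hypothetical sketch assumes that reading and reports both the direct count and the cumulative count for each directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; "cumulative" is interpreted here as a roll-up into parent directories.
class DirectoryTotals {

    /**
     * Maps each directory to {direct, cumulative}: files directly inside it, and files anywhere
     * beneath it. Paths are expected to start at the scanned root, e.g. "collection/box1/img.tif".
     */
    static Map<Path, long[]> totals(List<Path> files) {
        Map<Path, long[]> result = new TreeMap<>();
        for (Path file : files) {
            Path dir = file.getParent();
            if (dir == null) continue;                              // no directory component to count against
            result.computeIfAbsent(dir, k -> new long[2])[0]++;     // direct count
            for (Path d = dir; d != null; d = d.getParent()) {
                result.computeIfAbsent(d, k -> new long[2])[1]++;   // roll up into every ancestor
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Path> files = Arrays.asList(
            Paths.get("collection", "readme.txt"),
            Paths.get("collection", "box1", "img001.tif"),
            Paths.get("collection", "box1", "img002.tif"),
            Paths.get("collection", "box2", "img003.tif"));
        totals(files).forEach((dir, t) ->
            System.out.printf("%s\t%d direct\t%d cumulative%n", dir, t[0], t[1]));
    }
}
```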
Count by Type and Directory
This test counts the number of items found in a specific directory and also computes cumulative totals for each directory that is scanned.
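Going by the rule's name, a count by type and directory can be sketched as a nested tally: directory, then file extension, then count. This is an illustrative reading, not the rule's source. The class name and the case-insensitive extension handling are assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the File Analyzer's implementation.
class CountByTypeAndDirectory {

    /** Lowercased text after the last '.', or "" when there is no extension. */
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }

    /** For each directory, the number of files with each extension. */
    static Map<Path, Map<String, Integer>> counts(List<Path> files) {
        Map<Path, Map<String, Integer>> result = new TreeMap<>();
        for (Path file : files) {
            Path dir = file.getParent() != null ? file.getParent() : Paths.get("");
            result.computeIfAbsent(dir, k -> new TreeMap<>())
                  .merge(extension(file.getFileName().toString()), 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Path> files = Arrays.asList(
            Paths.get("box1", "img001.tif"), Paths.get("box1", "img002.tif"),
            Paths.get("box1", "box1.xml"), Paths.get("box2", "img003.tif"));
        counts(files).forEach((dir, byExt) -> System.out.println(dir + "\t" + byExt));
    }
}
```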
Random Sampling Mil 105E
This test will return a list of files in random order for QC processing.
Select the AQL (acceptable quality level) target for your test.
This rule will generate a random sample of the appropriate size based on the number of files found.
See http://en.wikipedia.org/wiki/MIL-STD-105 for an explanation.
The File Analyzer's Filter and Export capabilities can then be used to export the random sample for use in a quality control process.
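At the core of the sampling step is drawing a fixed number of files in random order. In this sketch, the caller supplies the sample size (a hypothetical value of 20 in `main`). Deriving the sample size from the MIL-STD-105E lot-size tables and the chosen AQL is deliberately not shown. A seeded `java.util.Random` makes a draw reproducible for QC records, which is a design choice of this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; sample-size lookup from MIL-STD-105E tables is not included.
class RandomSample {

    /** Returns up to sampleSize items chosen uniformly at random, in random order; the input is not modified. */
    static <T> List<T> sample(List<T> files, int sampleSize, Random rng) {
        if (sampleSize < 0) throw new IllegalArgumentException("sampleSize must be >= 0");
        List<T> shuffled = new ArrayList<>(files);  // copy so the caller's list is untouched
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, Math.min(sampleSize, shuffled.size())));
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 1; i <= 100; i++) files.add(String.format("img%03d.tif", i));
        // Hypothetical sample size; a fixed seed lets the same draw be repeated later.
        sample(files, 20, new Random(42)).forEach(System.out::println);
    }
}
```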
Lowercase Test
This test checks that all files are named using only lowercase characters. The File Analyzer can be recompiled to allow the actual rename to take place.
This code is provided as an example of how to create a file name validation routine in the File Analyzer. A robust set of patterns can be applied to ensure that a collection of files conforms to naming standards. When the files within a directory should contain a numeric sequence, pattern matching can be used to ensure that there are no breaks in the sequence.
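```java
import java.io.File;
import java.util.regex.Matcher;

// FTDriver, FileTest, NameValidationTest, ValidPattern and RenameablePattern are File Analyzer core classes.
class LowercaseTest extends NameValidationTest {
    public LowercaseTest(FTDriver dt, FileTest nextTest) {
        // Names matching this pattern (no uppercase A-Z characters) are treated as valid.
        super(dt, new ValidPattern("^[^A-Z]*$", false), nextTest, "Lowercase", "Lowercase");
        // Matches any name and proposes a lowercase version of it as the rename target.
        testPatterns.add(new RenameablePattern(".*", false) {
            public String getMessage(File f, Matcher m) {
                return "";
            }
            public File getNewFile(File f, Matcher m) {
                // Same parent directory, file name converted to lowercase.
                return new File(f.getParentFile(), f.getName().toLowerCase());
            }
        });
    }
}
```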
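The numeric-sequence check described above could be sketched as follows. It is not part of the File Analyzer. The naming convention `img###.tif` and the class name are hypothetical. The regular expression's first capture group must hold the sequence number, and names that do not match the pattern are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a sequence-gap check; not File Analyzer code.
class SequenceCheck {

    /** Numbers missing between the lowest and highest sequence number found in the matching names. */
    static List<Integer> missingNumbers(Collection<String> names, Pattern pattern) {
        SortedSet<Integer> found = new TreeSet<>();
        for (String name : names) {
            Matcher m = pattern.matcher(name);
            if (m.matches()) found.add(Integer.parseInt(m.group(1)));  // group 1 holds the number
        }
        List<Integer> missing = new ArrayList<>();
        if (found.isEmpty()) return missing;
        // Stop before last (which is present) so n + 1 can never overflow.
        for (int n = found.first(); n < found.last(); n++) {
            if (!found.contains(n)) missing.add(n);
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("img001.tif", "img002.tif", "img004.tif", "img007.tif", "notes.txt");
        System.out.println("Breaks in sequence: " + missingNumbers(names, Pattern.compile("^img(\\d+)\\.tif$")));
    }
}
```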
Digital Derivatives
See Identify digital derivatives
Counter Compliance (CSV)
In the core package, the Counter Compliance tests apply only to text files. Use the updated version in the Demo package to parse XLSX files as well.