Home - AdventistMediaMinistries/KeyfilePaperlessDocumentStorageExtraction GitHub Wiki
Introduction
The project consists of a Ruby gem, a handful of Ruby utilities, and a library of code that the utilities use and that could serve as building blocks for additional extractions.
Utilities
Prerequisites: For best performance, a recent version of JRuby is recommended. JRuby is capable of true multithreading, and performance matters when dealing with millions of files. The software should work under stock Ruby, but unless the database is very small, any extraction will be slow. For example, on one particular dataset, system Ruby took more than 8 hours, while JRuby took less than 3.
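To illustrate why true multithreading matters here, the following is a minimal sketch of spreading per-file work across a fixed pool of threads. It is not taken from the project's source: `process_in_parallel` and its `thread_count` parameter are hypothetical names, and only the standard `Thread` and `Queue` classes are used. Under JRuby the workers can run on separate cores, while under stock Ruby the global interpreter lock keeps CPU-bound Ruby code from running in parallel.

```ruby
# Hypothetical helper (not part of the project's API): apply a block to every
# item using a fixed number of worker threads and return results in order.
def process_in_parallel(items, thread_count: 4)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [index, item] }
  queue.close # once drained, pop returns nil and the workers stop

  workers = Array.new(thread_count) do
    Thread.new do
      local = [] # each worker collects its own results to avoid shared writes
      while (job = queue.pop)
        index, item = job
        local << [index, yield(item)]
      end
      local
    end
  end

  # Merge the per-worker results back into input order.
  results = Array.new(items.size)
  workers.each do |worker|
    worker.value.each { |index, value| results[index] = value }
  end
  results
end

# Example: compute the length of each name with two workers.
sizes = process_in_parallel(%w[a bb ccc], thread_count: 2) { |name| name.length }
```

In a real extraction the block would do the per-document work (reading, decoding, writing) rather than a trivial string operation.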
diff
Subtract one index file from another.
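The idea of subtracting one index from another can be sketched as a set difference. This is an illustration only, assuming an index is a plain-text list with one entry per line; the real index format and the implementation of `diff` may differ, and `subtract_index` is a hypothetical name.

```ruby
require "set"

# Assumption: each index is an array of lines, one document entry per line.
# Returns the entries of the first index that do not appear in the second.
def subtract_index(minuend_lines, subtrahend_lines)
  exclude = Set.new(subtrahend_lines.map(&:strip))
  minuend_lines.map(&:strip).reject { |line| line.empty? || exclude.include?(line) }
end

all_docs  = ["a/0001", "a/0002", "b/0003"]
done_docs = ["a/0002"]
remaining = subtract_index(all_docs, done_docs)
```

With real files, the two arrays could come from `File.readlines(path, chomp: true)`. A result like this is useful for resuming an extraction: subtract the index of documents already processed from the full index.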
pf
Extract a folder of documents into a date-based folder structure, without regard to the original document organization. Multi-page documents are kept together, but no other structure is preserved.
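As a rough illustration of a date-based layout, the sketch below builds a destination path from a document date. The `<root>/<YYYY>/<MM>/<DD>/<name>` layout and the `dated_destination` helper are assumptions made for this example; the layout that `pf` actually produces may differ.

```ruby
require "date"

# Hypothetical layout: <root>/<YYYY>/<MM>/<DD>/<name>.
# This is an assumption for illustration, not the documented behavior of pf.
def dated_destination(root, date, name)
  File.join(root, date.strftime("%Y"), date.strftime("%m"), date.strftime("%d"), name)
end

path = dated_destination("out", Date.new(2014, 3, 9), "invoice.tif")
# path is "out/2014/03/09/invoice.tif"
```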
pm
Extract a top-level metadata file and everything under it. Document organization is preserved.
ex
Extract everything referenced by the passed index file.
sherlock
A general-purpose utility that performs a number of tasks:
- Decode single metadata or index files (same as pm above)
- Filter various Keyfile types from an indiscriminate list of files
- Search a folder of Keyfile data for files containing a particular byte pattern (produces a list)
- List every Keyfile document in a particular folder (this is how the index files referenced above are produced)
- Verify that all the files listed in an index file actually exist
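Two of the tasks above, verifying an index and searching for a byte pattern, can be sketched with Ruby's standard library alone. This is not sherlock's implementation: `missing_entries` and `files_containing` are hypothetical names, and the sketch assumes index entries are file paths relative to a root folder, one per line.

```ruby
# Assumption: index entries are paths relative to root, one per line.
# Returns the entries that do not exist as regular files under root.
def missing_entries(index_lines, root)
  index_lines.map(&:strip).reject(&:empty?).reject do |entry|
    File.file?(File.join(root, entry))
  end
end

# Returns the sorted list of files under root whose raw bytes contain pattern.
# Each file is read fully into memory, which is fine for a sketch but may
# need chunked reads for very large files.
def files_containing(root, pattern)
  needle = pattern.b # compare as raw bytes, not as encoded text
  Dir.glob(File.join(root, "**", "*")).select do |path|
    File.file?(path) && File.binread(path).include?(needle)
  end.sort
end
```

An empty array from `missing_entries` means every listed file was found; `files_containing` yields a list that could itself be saved and used as an index.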
Source Code
Click for a high-level description of the source code and its objects.