Source Code - AdventistMediaMinistries/KeyfilePaperlessDocumentStorageExtraction GitHub Wiki
As previously stated, the source code consists of a Ruby library file (kf_decoder.rb), a number of objects (described further below), and a handful of utilities (described here). Because the Keyfile database format is not fully understood, some of the utilities may be of limited use: they were written to extract one particular entity's data in a specific format. However, the objects and functions these utilities use should apply to any Keyfile database, and could serve as the basis for a new extraction.
Celluloid is used throughout to make the code multi-threaded. However, this feature is mostly used in the utilities, where it allows thousands of files to be extracted at high speed by taking full advantage of a multi-core architecture.
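The general shape of such a worker pool can be sketched as follows. This is only an illustration and not the library's code: it uses Ruby's standard Thread and Queue classes instead of Celluloid, and the method name `extract_in_parallel`, the `worker_count` parameter, and the file names are all hypothetical.

```ruby
# Hypothetical worker pool built on Ruby's standard Thread and Queue.
# It does not use Celluloid and does not reproduce the workers in kf_decoder.rb.
def extract_in_parallel(items, worker_count: 4, &work)
  jobs = Queue.new
  items.each { |item| jobs << item }
  jobs.close # once closed and drained, pop returns nil and the workers stop

  results = Queue.new
  workers = Array.new(worker_count) do
    Thread.new do
      # Items must be non-nil and non-false, since nil ends the loop.
      while (item = jobs.pop)
        results << [item, work.call(item)]
      end
    end
  end
  workers.each(&:join)

  # Collect the [item, result] pairs into a hash keyed by item.
  Array.new(results.size) { results.pop }.to_h
end

# Stand-in workload: a real utility would decode and write out each file here.
sizes = extract_in_parallel(%w[a.kob bb.kob ccc.kob]) { |name| name.length }
```

Note that on the standard Ruby interpreter (MRI), threads mainly overlap while waiting on I/O; running Ruby code on several cores at once requires an implementation such as JRuby.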
Files and Objects
What follows is a brief description of the key Ruby files/objects in the library:
kf_decoder.rb This is a library file. It contains methods used by many parts of the system, such as the command-line parameter checking code for the various utilities. It also defines a number of objects used by those utilities, in particular several versions of the workers that the multi-threaded code uses to extract large amounts of data quickly.
cabinet.rb Defines the file cabinet object (Cabinet), which understands how to decode and manipulate cabinet kob files.
dataStore.rb Defines the DataStore object, a library of code able to take a Kob or Node object and write/extract its contents to disk in a hierarchy of common folders/image files.
index.rb Defines the Index object, which understands how to decode and manipulate index kob files.
kob.rb Defines the Kob object, the foundational object which does the basic decoding of a kob file. Kob.process() calls the appropriate sub-type to complete the decoding once the kob file type is identified.
meta.rb Defines the Meta object, which understands how to decode and manipulate metadata kob files.
node.rb Defines the Node object, which is a library of code that creates a tree of objects in memory representing all the data referenced by a particular metadata file. node.rb uses recursion and can fail if the tree it is trying to decode is too large.
search.rb Defines the Search object, used to create a list of occurrences of a particular byte pattern in a kob file. It is used, for example, to generate an index of references within a metadata file.
tiff.rb Defines the Tiff object, able to decode and extract the TIFF data found in many kob files.
pdf.rb Defines the PDF object, able to decode and extract the PDF data found in many kob files.
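To make the byte-pattern search described for search.rb more concrete, here is a minimal sketch of listing every offset at which a pattern occurs in binary data. It is not the Search object's actual interface: the method name `find_pattern_offsets` is hypothetical, and the sample bytes are made up rather than taken from a real kob file.

```ruby
# Hypothetical helper, not the library's Search object: returns the byte
# offset of every occurrence of `pattern` in `data`.
def find_pattern_offsets(data, pattern)
  haystack = data.b # binary copies, so indexes are byte offsets
  needle = pattern.b
  raise ArgumentError, "pattern must not be empty" if needle.empty?

  offsets = []
  position = 0
  while (hit = haystack.index(needle, position))
    offsets << hit
    position = hit + 1 # step one byte so overlapping matches are also found
  end
  offsets
end

# Made-up bytes standing in for the contents of a kob file.
sample = "\x00\x01KOB\xFF\x00\x01KOB\x00".b
offsets = find_pattern_offsets(sample, "\x00\x01")
```

To search a real file, the data could be loaded with `File.binread(path)` and passed in the same way.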