Ingesting CDM - ashish-gehani/SPADE GitHub Wiki
The DARPA Transparent Computing program defined a Common Data Model (CDM) to represent data provenance and information flow. SPADE's CDM reporter ingests provenance emitted by SPADE's CDM storage, in either Avro binary or JSON format, that conforms to the schema in the cfg/spade.storage.CDM.avsc file.
Configuring CDM reporting
The CDM reporter requires at least one argument, inputFile, which names the file containing the CDM records. If the file is in JSON format, its name must include the .json extension. If the file is in Avro binary format, its name must include the .bin extension. The reporter must be added from the SPADE controller, after the SPADE server has been started:
-> add reporter CDM inputFile=/tmp/cdm.json
Adding reporter CDM... done
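The naming rule above can be checked before handing a file to the reporter. The following is a minimal Python sketch, not SPADE's own code: the helper name cdm_format and its return labels are hypothetical, and it assumes that "include" means the extension may appear anywhere in the file name (so a name such as cdm.json.1 still counts as JSON).

```python
from pathlib import Path


def cdm_format(input_file: str) -> str:
    """Guess the CDM encoding from the file name (hypothetical helper).

    Assumption: the extension only has to appear somewhere in the file
    name, so "cdm.json.1" is still treated as JSON.
    """
    name = Path(input_file).name
    if ".json" in name:
        return "json"
    if ".bin" in name:
        return "avro-binary"
    raise ValueError(f"{input_file}: name must include .json or .bin")


if __name__ == "__main__":
    for candidate in ("/tmp/cdm.json", "/tmp/cdm.bin", "/tmp/cdm.json.1"):
        print(candidate, "->", cdm_format(candidate))
```

A name that contains neither extension raises ValueError, which makes a misnamed file visible before the reporter is added.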
The waitForLog=false option can be used to ensure that ingestion stops when the reporter is removed. By default, the reporter continues to process all records even after it is removed.
-> add reporter CDM inputFile=/tmp/cdm.json waitForLog=false
Adding reporter CDM... done
Collection ingestion
If the CDM records are stored in a collection of files, they can be ingested together with the rotate option. If rotate=true is specified, the inputFile is processed first. Next, files with the same name but .1, .2, ... extensions are processed in ascending order. For example, /tmp/cdm.json, /tmp/cdm.json.1, /tmp/cdm.json.2, and /tmp/cdm.json.3 can be ingested with the command:
-> add reporter CDM inputFile=/tmp/cdm.json rotate=true
Adding reporter CDM... done
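To preview which files a rotate=true ingestion would cover, the ordering described above can be sketched in Python. This is an illustration under stated assumptions, not the reporter's implementation: the function name rotated_files is hypothetical, and the sketch assumes the sequence ends at the first missing numeric suffix, which the description above does not specify.

```python
import os
from typing import List


def rotated_files(input_file: str) -> List[str]:
    """List input_file followed by input_file.1, input_file.2, ... in order.

    Assumption: the sequence stops at the first numeric suffix that does
    not exist on disk.
    """
    files = [input_file] if os.path.exists(input_file) else []
    index = 1
    while os.path.exists(f"{input_file}.{index}"):
        files.append(f"{input_file}.{index}")
        index += 1
    return files


if __name__ == "__main__":
    # Prints one path per line, in the order described above.
    for path in rotated_files("/tmp/cdm.json"):
        print(path)
```

Counting upward avoids the lexicographic trap of sorting file names as strings, where .10 would sort before .2.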
The reporter can be deactivated using the following command in the SPADE controller:
-> remove reporter CDM
Shutting down reporter CDM... done