Setup Guide - datascience/c3po Wiki
This is an installation guide for C3PO 0.6.0. It will help you set up and run the command line application as well as the web app on a server of your choice.
C3PO can now also be installed using Docker; the instructions are at the bottom of the page.
C3PO requires:
- Java 1.8
- MongoDB 3.2 or higher
- sbt 1.0 or higher
- FITS 0.6 or higher (optional)
Install Java, MongoDB (http://www.mongodb.org) and FITS 0.6 (http://projects.iq.harvard.edu/fits) if you haven't already. Take note of the port on which the mongo daemon is running (27017 by default). Clone this repository to a location of your choice (this guide assumes ~/c3po). Then run sbt:
    cd ~/c3po
    sbt clean compile assembly
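Before building, it can help to confirm that the required tools are on your PATH. The following is a minimal sketch, not part of C3PO: the helper name `have_tool` is hypothetical, and the check assumes the MongoDB daemon is installed under its usual binary name `mongod`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named tool can be found on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Check the tools the build and the command line application rely on.
missing=0
for tool in java sbt mongod; do
  if have_tool "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
    missing=$((missing + 1))
  fi
done
echo "$missing required tool(s) missing"
```

The script only reports what it finds; it does not check versions, so Java 1.8 and sbt 1.0 or higher still need to be verified by hand.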
You can find the command line c3po in ~/c3po/c3po-cmd/target/scala-2.11.
The c3po command line has several modes you can choose from. To run c3po, use the following command:
java -jar c3po-cmd-assembly-0.1-SNAPSHOT.jar
This will output an error message listing the modes that you can use. All available modes and their options are described below. Options marked with '*' are required.
The help mode prints all the available modes and options.
Usage: c3po help
The version mode prints version information.
Usage: c3po version
The gather mode is used to read metadata into the MongoDB database.
    Usage: c3po gather [options]
      Options:
      * -c, --collection
           The name of the collection
      * -i, --inputdir
           The input directory where the meta data is stored
        -r, --recursive
           Whether or not to gather recursively
           Default: false
        -t, --type
           Optional parameter to define the meta data type. Use one of 'FITS' or
           'TIKA', to select the type of the input files. Default is FITS
           Default: FITS
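As an illustration of the gather options, here is a minimal sketch of a recursive gather. It rests on a few assumptions: that the mode name is passed as the first argument to the jar, that `-r` is a bare flag, and that the jar sits where the build step above places it. The collection name `demo` and the input folder are hypothetical. The `c3po` wrapper only prints the command unless `RUN=1` is set.

```shell
#!/bin/sh
# Jar location assumed from the build step; override C3PO_JAR if yours differs.
C3PO_JAR="${C3PO_JAR:-$HOME/c3po/c3po-cmd/target/scala-2.11/c3po-cmd-assembly-0.1-SNAPSHOT.jar}"

# Hypothetical wrapper: prints the command by default, executes it when RUN=1.
c3po() {
  if [ "${RUN:-0}" = "1" ]; then
    java -jar "$C3PO_JAR" "$@"
  else
    echo "java -jar $C3PO_JAR $*"
  fi
}

# Gather FITS metadata recursively from a (hypothetical) folder
# into a (hypothetical) collection named "demo".
c3po gather -c demo -i "$HOME/fits-output" -r
```

Printing the command first makes it easy to check the options before anything is written to the database.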
The profile mode is used to generate a profile in XML format.
    Usage: c3po profile [options]
      Options:
        -a, --algorithm
           The algorithm that will be used for selecting the samples records.
           Supported values are: 'sizesampling', 'syssampling', 'distsampling'
           Default: sizesampling
      * -c, --collection
           The name of the collection
        -ie, --includeelements
           If this flag is present, the profile will include a list of element
           identifiers. Note, that this might be a long list.
           Default: false
        -o, --outputdir
           The output directory where the profile will be stored
           Default: <empty string>
        -props, --properties
           The list of properties for the 'distsampling' algorithm
           Default:
        -s, --size
           The size of the samples set.
           Default: 5
The samples mode is used to select representative samples based on different strategies.
    Usage: c3po samples [options]
      Options:
        -a, --algorithm
           The algorithm that will be used for selecting the samples records.
           Use one of 'sizesampling', 'syssampling', 'distsampling'
           Default: sizesampling
      * -c, --collection
           The name of the collection
        -o, --outputdir
           The output directory where the samples will be output. If nothing
           is provided the output is written to the console
        -props, --properties
           The list of properties for the 'distsampling' algorithm
           Default:
        -s, --size
           The size of the samples set.
           Default: 5
The export mode is used to export the data in CSV format.
    Usage: c3po export [options]
      Options:
      * -c, --collection
           The name of the collection
        -o, --outputdir
           The output directory where the profile will be stored
           Default: <empty string>
The remove mode is used to remove a collection.
    Usage: c3po remove [options]
      Options:
      * -c, --collection
           The name of the collection
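The modes described above can be chained into a small workflow. The sketch below is illustrative only: it assumes the mode name is the first argument to the jar, uses the jar path produced by the build step, and introduces a hypothetical collection name and output folder. The `c3po` wrapper prints each command unless `RUN=1` is set, so nothing is generated or removed by default.

```shell
#!/bin/sh
# Jar location assumed from the build step; override C3PO_JAR if yours differs.
C3PO_JAR="${C3PO_JAR:-$HOME/c3po/c3po-cmd/target/scala-2.11/c3po-cmd-assembly-0.1-SNAPSHOT.jar}"

# Hypothetical wrapper: prints the command by default, executes it when RUN=1.
c3po() {
  if [ "${RUN:-0}" = "1" ]; then
    java -jar "$C3PO_JAR" "$@"
  else
    echo "java -jar $C3PO_JAR $*"
  fi
}

COLLECTION="demo"         # hypothetical collection name
OUT_DIR="$HOME/c3po-out"  # hypothetical output folder

# 1. Generate an XML profile that includes the element identifiers.
c3po profile -c "$COLLECTION" -o "$OUT_DIR" -ie
# 2. Pick 10 samples with the 'syssampling' algorithm; output goes to the console.
c3po samples -c "$COLLECTION" -a syssampling -s 10
# 3. Export the collection as CSV.
c3po export -c "$COLLECTION" -o "$OUT_DIR"
# 4. Remove the collection once it is no longer needed.
c3po remove -c "$COLLECTION"
```

The 'distsampling' algorithm is left out on purpose, since the expected format of the `-props` list is not documented here.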
C3PO relies on some simple configuration parameters, like the db name, db host, db port, etc. Defaults are supplied within the jar, so you don't have to do anything. However, if you want to override them, create a file called .c3poconfig in your home directory and set the properties you want to replace. C3PO will use the defaults for all properties that you skip. Here are the defaults:
    #Application default properties.
    c3po.persistence=default # the class provider for the persistence layer (or default)
    c3po.controller.adaptors.count=4 # the count of the adaptors
    c3po.controller.consolidators.count=2 # the count of the consolidators
    c3po.rule.infer_date_from_file_name=false # a rule that tries to infer a date from the file names
    c3po.rule.html_info_processing=false # a rule that cleans up special fits meta data
    c3po.rule.format_version_resolution=true # a rule that fixes some errors in format version parsing
    c3po.rule.empty_value_processing=true # a rule that does not allow empty values
    c3po.rule.create_element_identifier=true # a rule that creates element identifiers if none are provided by the adaptor
    c3po.adaptor.tika.version="unknown" # the tika version (if tika files were processed)

    #DB default Properties
    db.host=127.0.0.1 # the host where mongo is running
    db.port=27017 # the port where mongo is listening
    db.name=c3po # the name of the db
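To make the override mechanism concrete, the sketch below writes a small override file that changes only the database settings. The property names come from the defaults list; the host, port, and database name values are hypothetical, and so is the helper `write_c3po_config`. Comments are placed on their own lines as a precaution, since how C3PO treats trailing comments on a value line is not stated here.

```shell
#!/bin/sh
# Hypothetical helper: writes an override file to the given path,
# refusing to clobber a file that already exists.
write_c3po_config() {
  target="$1"
  if [ -e "$target" ]; then
    echo "refusing to overwrite existing file: $target" >&2
    return 1
  fi
  cat > "$target" <<'EOF'
# Overrides for C3PO; every property not listed here keeps its default.
# The values below are examples only.
db.host=mongo.example.org
db.port=27018
db.name=c3po_test
EOF
}

# Demonstrate in a scratch folder so an existing ~/.c3poconfig is left alone.
demo_dir=$(mktemp -d)
write_c3po_config "$demo_dir/.c3poconfig"
cat "$demo_dir/.c3poconfig"
```

For real use, call `write_c3po_config "$HOME/.c3poconfig"` and adjust the values to match your MongoDB instance.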
The Web App provides a UI for the data. It allows you to filter the data, select sample records, and export data (XML profile and CSV), and it also integrates with tools like PLATO and SCOUT.
Build and Deploy
Note that version 0.6.0 uses Play 2.4, so make sure you install the correct version.
To run the web API, execute the command
sbt "project c3po-webapi" run from
Fire up a browser and navigate to localhost:9000/c3po. You should see the application running.
sbt clean compile assembly will generate everything you need for the standalone version. Just run the generated binary
~/c3po/c3po-webapi/target/scala-2.11/c3po-webapi-assembly-0.1-SNAPSHOT.jar. This will run the app in production mode.
Docker allows users to start a local instance of C3PO without manually installing sbt, Java, and MongoDB. Make sure Docker v.17 (or higher) is installed. Replace "/path/on/host" with the location of the folder containing your FITS files and execute:
    cd ~/c3po/
    docker build . -t c3pobundle
    docker run -it -p 9000:9000 -v /path/on/host:/data/FITS c3pobundle
Alternatively, we have prepared an image with the bundle and pushed it to Docker Hub. You can use the image directly, replacing port with the host port of your choice and /path/on/host with the folder containing your FITS files:
    docker run -it -p port:9000 -v /path/on/host:/data/FITS artourkin/c3po:latest
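As a sketch of how the two placeholders might be filled in, the helper below assembles the `docker run` command for a given host port and FITS folder and prints it. The function name `c3po_docker_cmd` and the example values are hypothetical. It rejects anything that is not an absolute path to an existing folder, because with `-v` a bare name can be taken as a named volume rather than a host folder.

```shell
#!/bin/sh
# Hypothetical helper: prints the docker command for a host port and FITS folder.
c3po_docker_cmd() {
  host_port="$1"
  fits_dir="$2"
  # Only accept absolute paths, so the value is treated as a host folder.
  case "$fits_dir" in
    /*) ;;
    *) echo "FITS folder must be an absolute path: $fits_dir" >&2; return 1 ;;
  esac
  if [ ! -d "$fits_dir" ]; then
    echo "FITS folder does not exist: $fits_dir" >&2
    return 1
  fi
  echo "docker run -it -p ${host_port}:9000 -v ${fits_dir}:/data/FITS artourkin/c3po:latest"
}

# Example: publish the app on host port 8080 and mount /tmp as the FITS folder.
c3po_docker_cmd 8080 /tmp
```

With a host port other than 9000, the app would presumably be reached on that port instead, for example http://localhost:8080/c3po in this sketch.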
Once the message "(Server started, use Ctrl+D to stop and go back to the console...)" is printed, C3PO is available at http://localhost:9000/c3po.
If you have any additional questions, please contact us.