Setting Up and Running eXpress-D
Running eXpress-D has only two requirements: the eXpress-D sources and the Spark sources. eXpress-D is compatible with Spark-0.7.X sources, which can be found on the Spark Downloads page.
To get eXpress-D sources, clone a local copy of our GitHub repository.
$ git clone git://github.com/adarob/express-d express-d
Program flags and properties that can be customized are in the express-d/config directory. To start, copy the express-d/config/config.py.template file to create a new express-d/config/config.py file.
$ cd express-d
$ cp config/config.py.template config/config.py
The required flags in config.py are:
- SPARK_HOME: Absolute path to the home directory of the Spark sources that eXpress-D will compile against.
- EXPRESS_D_HOME: Absolute path to the home directory of the eXpress-D sources, e.g. path/to/express-d.
- SPARK_CLUSTER_URL: If running locally, set to "local". If running on EC2, this should be the URL of the Spark master instance for your cluster. If you used the provided Spark EC2 scripts to launch the cluster, you can use SPARK_CLUSTER_URL = open("/root/spark-ec2/cluster-url", 'r').readline().strip().
- EXPRESS_RUNTIME_LOCAL_OPTS: A list of required and optional properties that the eXpress-D program reads at runtime.
  - Required properties:
    - "hits-file-path": Absolute path to the preprocessed protobuf file that contains alignment data.
    - "targets-file-path": Absolute path to the preprocessed protobuf file that contains targets data.
  - Optional properties:
    - "should-use-bias": Whether eXpress-D should use its bias model and update relevant parameters. Defaults to true.
    - "should-cache": Whether alignments and target datasets should be cached in memory. Defaults to true.
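The SPARK_CLUSTER_URL choice described above can be made automatic. The following is a minimal sketch, not part of eXpress-D: resolve_cluster_url is a hypothetical helper that reads the master URL from the file written by the Spark EC2 scripts and falls back to "local" when that file is missing or empty.

```python
import os

# Path written by the Spark EC2 scripts, as mentioned in the flag description.
EC2_CLUSTER_URL_FILE = "/root/spark-ec2/cluster-url"


def resolve_cluster_url(path=EC2_CLUSTER_URL_FILE, default="local"):
    """Hypothetical helper (not part of eXpress-D).

    Returns the first line of `path` with whitespace stripped, or
    `default` when the file does not exist or its first line is empty.
    """
    if not os.path.isfile(path):
        return default
    with open(path, "r") as f:
        url = f.readline().strip()
    return url or default


# In config.py this could replace a hard-coded value.
SPARK_CLUSTER_URL = resolve_cluster_url()
```

With this in place, the same config.py could be used both on a laptop and on an EC2 cluster launched with the Spark EC2 scripts.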
An example of how the above flags can be set in a minimal config.py:
SPARK_HOME="/path/to/spark"
EXPRESS_D_HOME="/path/to/express-d"
SPARK_CLUSTER_URL="local"
...
EXPRESS_RUNTIME_LOCAL_OPTS = [
    OptionSet("hits-file-path", ["/path/to/hits.1M.pb"]),
    OptionSet("targets-file-path", ["/path/to/targets.pb"]),
    OptionSet("should-use-bias", ["true"]),
    OptionSet("should-cache", ["true"]),
    ...
]
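Before building, it can help to sanity-check the runtime properties. The sketch below is an illustration only: check_runtime_properties is a hypothetical function that takes a plain dict of property names to string values (it does not use the project's OptionSet class), and it assumes the boolean properties are written as the strings "true" or "false", as in the example above.

```python
import os

# Property names taken from the flag descriptions above.
REQUIRED_PROPERTIES = ("hits-file-path", "targets-file-path")
BOOLEAN_PROPERTIES = ("should-use-bias", "should-cache")


def check_runtime_properties(props):
    """Hypothetical check (not part of eXpress-D).

    `props` maps property names to string values. Returns a list of
    human-readable problems; the list is empty when nothing looks wrong.
    """
    problems = []
    for name in REQUIRED_PROPERTIES:
        value = props.get(name)
        if not value:
            problems.append("missing required property: " + name)
        elif not os.path.isabs(value):
            problems.append(name + " must be an absolute path: " + value)
    for name in BOOLEAN_PROPERTIES:
        # Both optional properties default to true when omitted.
        value = props.get(name, "true")
        if value not in ("true", "false"):
            problems.append(name + " must be 'true' or 'false': " + value)
    return problems
```

An empty result only means the values are well formed; it does not confirm that the files exist or contain valid protobuf data.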
Once flags and properties have been set, eXpress-D is ready to be compiled using the bin/build script.
$ cd express-d; bin/build
This creates a (fat) express-d-assembly.jar that contains all dependencies needed to run eXpress-D (i.e. all eXpress-D, Spark, and Apache commons jars are packaged into that jar).
To run eXpress-D, simply do:
$ bin/run
The output file has the naming format hits.<num iterations to convergence>.<time in milliseconds>.<withBias or withoutBias> and will be created in the EXPRESS_D_HOME directory.
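Because the output file name encodes the run's iteration count, elapsed time, and bias setting, those values can be recovered from the name alone. The following is a minimal sketch; parse_output_name and OutputName are hypothetical names, and the pattern assumes both numeric fields are plain decimal integers.

```python
import re
from collections import namedtuple

# Hypothetical record type for the three fields encoded in the file name.
OutputName = namedtuple("OutputName", ["iterations", "millis", "with_bias"])

# hits.<num iterations>.<time in ms>.<withBias or withoutBias>
_OUTPUT_PATTERN = re.compile(r"^hits\.(\d+)\.(\d+)\.(withBias|withoutBias)$")


def parse_output_name(filename):
    """Return an OutputName for `filename`, or None if it does not match."""
    match = _OUTPUT_PATTERN.match(filename)
    if match is None:
        return None
    iterations, millis, bias = match.groups()
    return OutputName(int(iterations), int(millis), bias == "withBias")
```

For a made-up name such as hits.1000.52340.withBias, this yields 1000 iterations, 52340 milliseconds, and with_bias set to True. Pass only the base name, not the full path.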