Setting Up and Running eXpress-D - adarob/eXpress-d GitHub Wiki

There are only two requirements for running eXpress-D: the eXpress-D sources and the Spark sources. eXpress-D is compatible with Spark 0.7.x, whose sources can be found on the Spark Downloads page.

To get eXpress-D sources, clone a local copy of our GitHub repository.

$ git clone https://github.com/adarob/express-d.git express-d

Program flags and properties that can be customized are in the express-d/config directory. To start, copy the express-d/config/config.py.template file to create a new express-d/config/config.py file.

$ cd express-d
$ cp config/config.py.template config/config.py

The required flags in config.py are:

  • SPARK_HOME: Absolute path to the home directory of the Spark sources that eXpress-D will compile against.
  • EXPRESS_D_HOME: Absolute path to the home directory of the eXpress-D sources, e.g. path/to/express-d.
  • SPARK_CLUSTER_URL: Set to "local" when running locally. When running on EC2, set it to the URL of your cluster's Spark master. If you launched the cluster with the provided Spark EC2 scripts, you can read the URL from the file those scripts write: SPARK_CLUSTER_URL = open("/root/spark-ec2/cluster-url", 'r').readline().strip().
  • EXPRESS_RUNTIME_LOCAL_OPTS: A list of the required and optional properties that eXpress-D reads at runtime.
    • Required properties:
      • "hits-file-path": Absolute path to the preprocessed protobuf file that contains the alignment data.
      • "targets-file-path": Absolute path to the preprocessed protobuf file that contains the target data.
    • Optional properties:
      • "should-use-bias": Whether eXpress-D should use its bias model and update relevant parameters. Defaults to true.
      • "should-cache": Whether alignments and target datasets should be cached in memory. Defaults to true.
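If the same config.py is shared between a local machine and an EC2 cluster, SPARK_CLUSTER_URL can choose its value when the file is loaded. The sketch below is a minimal illustration, assuming that /root/spark-ec2/cluster-url only exists on clusters launched with the Spark EC2 scripts; the helper name choose_cluster_url is hypothetical and not part of eXpress-D.

```python
import os

# File written by the Spark EC2 launch scripts (path taken from the
# SPARK_CLUSTER_URL description above).
EC2_CLUSTER_URL_FILE = "/root/spark-ec2/cluster-url"


def choose_cluster_url(url_file=EC2_CLUSTER_URL_FILE):
    """Hypothetical helper: return the master URL stored in url_file, else "local"."""
    if os.path.isfile(url_file):
        with open(url_file, "r") as f:
            url = f.readline().strip()
        if url:
            return url
    # No EC2 cluster-url file (or it is empty): fall back to local mode.
    return "local"


SPARK_CLUSTER_URL = choose_cluster_url()
```

On a cluster launched with the EC2 scripts this yields the master URL; anywhere else it quietly falls back to "local".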

An example of how the above flags can be set in a minimal config.py:

SPARK_HOME="/path/to/spark"
EXPRESS_D_HOME="/path/to/express-d"
SPARK_CLUSTER_URL="local"
...
EXPRESS_RUNTIME_LOCAL_OPTS = [
    OptionSet("hits-file-path", ["/path/to/hits.1M.pb"]),
    OptionSet("targets-file-path", ["/path/to/targets.pb"]),
    OptionSet("should-use-bias", ["true"]),
    OptionSet("should-cache", ["true"]),
    ...
]
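To catch mistakes in EXPRESS_RUNTIME_LOCAL_OPTS before building and running, the list can be sanity-checked from Python against the rules above: both required properties must be present and hold absolute paths. This is a hypothetical sketch. check_runtime_opts is not part of eXpress-D, the namedtuple OptionSet is only a stand-in that assumes a name and a list of values (the real OptionSet comes from the config template), and treating "true"/"false" as the only valid boolean spellings is an assumption.

```python
import os
from collections import namedtuple

# Hypothetical stand-in for the OptionSet used in config.py; the real class
# comes from the eXpress-D config template and may carry more fields.
OptionSet = namedtuple("OptionSet", ["name", "vals"])

REQUIRED_PROPERTIES = ("hits-file-path", "targets-file-path")
# Assumption: these flags accept the strings "true" and "false".
BOOLEAN_PROPERTIES = ("should-use-bias", "should-cache")


def check_runtime_opts(opts):
    """Return a list of problems found in an EXPRESS_RUNTIME_LOCAL_OPTS-style list."""
    problems = []
    by_name = {opt.name: opt.vals for opt in opts}
    for name in REQUIRED_PROPERTIES:
        vals = by_name.get(name)
        if not vals:
            problems.append("missing required property: " + name)
        elif not all(os.path.isabs(v) for v in vals):
            problems.append(name + " must be an absolute path")
    for name in BOOLEAN_PROPERTIES:
        # Optional properties may be absent; only check values that are given.
        for v in by_name.get(name, []):
            if v not in ("true", "false"):
                problems.append(name + ' should be "true" or "false", got ' + repr(v))
    return problems


opts = [
    OptionSet("hits-file-path", ["/path/to/hits.1M.pb"]),
    OptionSet("targets-file-path", ["/path/to/targets.pb"]),
    OptionSet("should-use-bias", ["true"]),
]
print(check_runtime_opts(opts))  # prints []
```

An empty list means the required properties are present and the boolean flags look well formed; it does not check that the protobuf files themselves exist.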

Once the flags and properties have been set, eXpress-D is ready to be compiled with the bin/build script. From the express-d directory, run:

$ bin/build

This creates a fat JAR, express-d-assembly.jar, that contains all the dependencies needed to run eXpress-D (i.e., the eXpress-D, Spark, and Apache Commons classes are all packaged into that one JAR).
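One way to confirm that the assembly really is a fat JAR is to list the package prefixes of the classes inside it. A JAR is a zip archive, so Python's standard zipfile module can read it. This is a minimal sketch: top_level_packages is a hypothetical helper, and the path in the usage line is a placeholder because the build output location is not specified here.

```python
import zipfile


def top_level_packages(jar_path, depth=2):
    """Return the sorted package prefixes (first `depth` components) of the .class files in a JAR."""
    prefixes = set()
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.endswith(".class"):
                parts = name.split("/")[:-1]  # drop the class file name itself
                if parts:  # skip classes in the default (unnamed) package
                    prefixes.add(".".join(parts[:depth]))
    return sorted(prefixes)


# Placeholder path: point this at wherever the build wrote the assembly.
# print(top_level_packages("/path/to/express-d-assembly.jar"))
```

For a correctly assembled JAR you would expect prefixes from eXpress-D, Spark, and Apache Commons to appear side by side.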

To run eXpress-D, simply do:

$ bin/run

The output file is named hits.<number of iterations to convergence>.<time in milliseconds>.<withBias or withoutBias> and is written to the EXPRESS_D_HOME directory.
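When comparing several runs, the naming format above can be parsed back into its parts. The sketch below assumes the iteration count and the time are plain integers and that the name carries no extra extension; parse_output_name, find_runs, and RunSummary are hypothetical names, and the file names in the usage note are illustrative only.

```python
import os
import re
from typing import List, NamedTuple, Optional, Tuple

# Naming format described above: hits.<iterations>.<time in ms>.<withBias|withoutBias>
OUTPUT_NAME = re.compile(r"^hits\.(\d+)\.(\d+)\.(withBias|withoutBias)$")


class RunSummary(NamedTuple):
    iterations: int
    millis: int
    used_bias: bool


def parse_output_name(filename: str) -> Optional[RunSummary]:
    """Parse an eXpress-D output file name; return None if it does not match the format."""
    m = OUTPUT_NAME.match(filename)
    if m is None:
        return None
    iterations, millis, bias = m.groups()
    return RunSummary(int(iterations), int(millis), bias == "withBias")


def find_runs(directory: str) -> List[Tuple[str, RunSummary]]:
    """List output files in `directory` (e.g. EXPRESS_D_HOME), fastest run first."""
    runs = []
    for name in os.listdir(directory):
        summary = parse_output_name(name)
        if summary is not None:
            runs.append((name, summary))
    return sorted(runs, key=lambda run: run[1].millis)
```

For example, parse_output_name("hits.42.123456.withBias") returns RunSummary(iterations=42, millis=123456, used_bias=True), while input files such as hits.1M.pb are ignored because they do not match the pattern.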
