
Arivale ISB Analytics images on Dalek

Dalek comes configured with an analytics docker setup that provides the following two images:

analytics2 The Jupyter environment currently containing Python 3 and R environments.

analytics2-rstudio An alternative Rstudio environment that is leaner and probably more familiar to R users.

Contacts

  • Hood Lab: Noa Rappaport
  • Gibbons Lab: Christian Diener

How do I use it?

The Arivale research server

In order to work with data from Arivale you will need the following:

  1. A user account for the research server, which you can get from Noa Rappaport
  2. A research docker container started for you on the server

Step 2 can be performed by almost any member of the Arivale research group so let us know if you would like help :)

Connecting to the server

You need to connect via SSH. An SSH client is available by default on Mac and Linux, but on Windows you will need to install one (for instance PuTTY).

To connect to the Arivale server you have to be in the internal ISB network or connected via VPN.
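
Before opening an SSH session it can help to check that the server is reachable from your current network. The following is a minimal sketch rather than an official tool: the function name is made up for illustration, and port 22 (the SSH default) is an assumption about the server setup.

```python
import socket


def can_reach(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP connection
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and failed name lookups
        return False
```

Calling can_reach("dalek.systemsbiology.net") from your own machine should return False when you are neither on the internal ISB network nor connected via VPN. The host name is the one used in the browser URLs further down this page.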

Getting around the server

All analysis of Arivale data has to happen on the server; data should never leave the server. To make this possible, you start a docker research container on the server, which gives you access to Jupyter via your browser. You can then run all analyses directly on the server. Get in contact if you need more than the resources currently provided.

On the server there are two locations that are potentially important:

~/notebooks

This is your personal folder of research notebooks. It includes some tutorial notebooks and is the default location where your notebooks are saved in the research container.

/home/cache/libs/docker-research/scripts/notebooks

You can change into this directory from the command line using cd followed by the path above.

Starting up an Analytics environment

By default only a single environment is supported for any single user. See below how to request more resources.

First log in to the Arivale research server as described above.

You can start your research environment using the analytics2 script, whose usage is summarized as follows:

> analytics2 -h
usage: analytics2 [-h] [--ports PORT_FILE] [--version]
                  {start,stop,restart,logs,config} ...

Manage your analytics image.

optional arguments:
  -h, --help            show this help message and exit
  --ports PORT_FILE, -p PORT_FILE
                        a file mapping users to ports
  --version, -v         show program's version number and exit

subcommands:
  See `analytics2 CMD -h` for command specific help.

  {start,stop,restart,logs,config}
    start               start a new analytics container
    stop                stop a running analytics container
    restart             restart a clean analytics container
    logs                show logs for a running analytics container
    config              show your configuration

Thus, to start a new environment, use:

> analytics2 start

In the same vein, use restart or stop to restart or stop your environment. This persists any files in the standard locations (/notebooks in the Jupyter environment and /home/rstudio, which is the default in Rstudio) and resets the installed software to the factory state. Note that any files in non-standard locations are lost this way, so please make sure to back up or download all relevant files before restarting or stopping an environment.
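
One way to keep files from a non-standard location is to archive them into the persisted folder before stopping or restarting. The sketch below is only an illustration to run inside the Jupyter environment: the function name and the example source directory are hypothetical.

```python
import shutil
from pathlib import Path


def backup_directory(source, persisted_dir, name="backup"):
    """Pack `source` into <persisted_dir>/<name>.tar.gz and return that path."""
    source = Path(source)
    persisted_dir = Path(persisted_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"{source} is not a directory")
    persisted_dir.mkdir(parents=True, exist_ok=True)
    # make_archive appends the .tar.gz suffix itself
    archive = shutil.make_archive(
        str(persisted_dir / name), "gztar", root_dir=str(source)
    )
    return Path(archive)
```

For example, backup_directory("/tmp/results", "/notebooks") would write /notebooks/backup.tar.gz, where /tmp/results stands in for whatever non-standard directory you used. The source directory should not contain the persisted folder, otherwise the archive would try to include itself.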

Using the Jupyter environment (analytics2)

After starting your environment you can access it from within the ISB internal network with any browser using https://dalek.systemsbiology.net:PORT, where PORT is the port assigned to you (ask us or use analytics2 config if you do not know your port). You may have to create a security exception for the self-signed SSL certificate. The password is research1.
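
As a small illustration of how the address is put together, the following sketch builds the URL from a port number. The helper name is made up for this example, and 8123 is only a placeholder port, not necessarily the one assigned to you.

```python
HOST = "dalek.systemsbiology.net"  # host name from the URL above


def environment_url(port):
    """Build the browser URL for the environment listening on `port`."""
    port = int(port)  # accepts 8123 as well as "8123"
    if not 1 <= port <= 65535:
        raise ValueError(f"not a valid TCP port: {port}")
    return f"https://{HOST}:{port}"


print(environment_url("8123"))
```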

You will be presented with a launcher that lets you run notebooks in any of the environments or start up a terminal for administrative tasks.

For notebooks only the environments starting with arivale- are correctly configured and set up so please use those.

Installing software

If you use a package/software regularly consider adding it to the environment permanently as described below in section "Requesting software to include".

For a non-permanent installation do the following:

  1. Open a terminal in the launcher
  2. Activate the environment you want to install to. For instance, for Python 3 use arivale-py3:
source activate arivale-py3

The prompt of your terminal will change to reflect the active environment.

  3. Install packages with conda, for instance:
conda install tensorflow
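
To confirm that an installation is visible to the environment you activated, you could run a quick check in a notebook or Python session of that environment. This is a minimal sketch with an illustrative helper name; it only looks up top-level module names and does not import the package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found by the active interpreter."""
    return importlib.util.find_spec(module_name) is not None


for name in ["tensorflow"]:
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If a package shows up as missing right after installing it, the notebook is probably running in a different environment than the one you installed into.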

Using the Rstudio environment

Start your environment using the --rstudio flag, for instance:

analytics2 --rstudio start

After starting your environment you can access it from within the ISB internal network with any browser using https://dalek.systemsbiology.net:PORT, where PORT is the port assigned to you (ask us or use analytics2 config if you do not know your port). You may have to create a security exception for the self-signed SSL certificate. The username is rstudio and the password is USERandarivale, where USER is your username on the server.

Any files in the default directory opened by Rstudio (/home/rstudio/) will be persisted after deleting the container.

Installing software

Use Tools > Install Packages....

Observing resource usage

All containers are managed by docker, which you can use to inspect them. For instance, to see which containers are running, use:

docker ps

on the server.

To see the resource usage of the containers on the server, use:

docker stats

You can exit the resource view with Ctrl-C.
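
If you want to inspect containers from a script instead of reading the terminal output, docker ps accepts a --format template. The sketch below assumes you are allowed to call docker on the server; the function names are illustrative and not part of the analytics2 script.

```python
import subprocess

# Go template understood by `docker ps --format`: name and status, tab-separated
PS_FORMAT = "{{.Names}}\t{{.Status}}"


def parse_ps(output):
    """Turn the text printed by `docker ps --format PS_FORMAT` into a dict."""
    containers = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, status = line.partition("\t")
        containers[name] = status
    return containers


def running_containers():
    """Ask docker for the running containers; needs docker access on the server."""
    result = subprocess.run(
        ["docker", "ps", "--format", PS_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)
```

Calling running_containers() on the server returns a dictionary that maps each container name to its status string.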

How do I adapt it to my needs?

Requesting software to include

If you use a particular package/software regularly it may be a good idea to include it by default, as this aids reproducibility. To do so, first request read access to the GitHub repository. You can then request new software by opening a pull request against any of the following files:

  • python3.yml: Packages for Python 3
  • r.yml: Packages for R in Jupyter
  • rstudio/packages.txt: packages for Rstudio

If you want to request R packages please add them to both R package lists.

Custom startup scripts

The analytics2 script also serves as a Python module that gives you fine-grained control over the startup process. For instance, to access the configuration:

from analytics2 import build_config

config = build_config()
print(config["port"])

Or to start a container with a changed config and a different image:

from analytics2 import build_config, start_container

config = build_config()
config["port"] = "8123"

start_container(config, "my-analytics2")

For developers

Code can be found at https://github.com/gibbons-lab/arivale_docker.

The script and package are distributed along with the docker images. If you change the script make sure to run the tests before pushing:

python -m pytest

in the root directory. This will check if the script still runs. You can update releases by running

bumpversion {major,minor,patch}

on a clean git branch.

To build the images just run make, which will download updated base images, build both images, and tag them automatically with a date-based version (YY.MM).
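
To illustrate what a YY.MM tag looks like, here is a small sketch that formats a date this way. It only demonstrates the format; how the Makefile actually computes the tag is not shown on this page, and the function name is hypothetical.

```python
from datetime import date


def version_tag(day=None):
    """Format a date as a YY.MM tag, using today's date if none is given."""
    day = day or date.today()
    return day.strftime("%y.%m")


print(version_tag(date(2019, 3, 5)))  # prints 19.03
```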