Setting up code dependencies - matthewcornell/wikitest GitHub Wiki

This page documents the steps for setting up dependencies for the import scripts.

The main import scripts are written in R and PL/pgSQL. They are run through a makefile, so you need GNU Make installed.

Setting up R

The R packages required by the import process are installed locally (in part) by packrat when you start an R session in the project's root directory. The dependencies' sources are bundled in ./packrat/src and are not gitignored, because some of them are not available on CRAN.

The report creation hook requires additional packages, which are set up automatically in a post-import hook. Generating the realtime dengue report also needs some non-R dependencies. A detailed description can be found in $CODE_DIR/hooks/post-import/04-make-realtime-dengue-report.hook, copied here for convenience:

# The report generation process needs a TeX distribution present in the system.
# Finally credentials need to be present for database access at `~/.creds.rds`
# Credential file has the following object template
# list(port=0000, host="localhost", user="user", password="password", dbname="db")
#
# A relatively recent version of subversion is needed for the commit script to work
# (checked with v1.9.3-1)

Setting up Python

Python is used in the Slack bot, the import reporting tool and the database tests. Python >=3.6 is needed. We recommend using pyenv to manage Python versions. Anaconda distributions don't work well with virtualenv, which pipenv uses.
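To confirm that the active interpreter meets the version requirement, a check along these lines can help. It is only a sketch: the name `meets_minimum` is made up for illustration and is not part of the project.

```python
import sys

# Minimum version stated on this page (Python >= 3.6).
MINIMUM = (3, 6)


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only the major and minor components.
    return tuple(version[:2]) >= tuple(minimum)


print("Python", sys.version.split()[0], "OK" if meets_minimum() else "too old")
```

Running it inside `pipenv run python` shows which interpreter pipenv actually picked up.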

For installing the dependencies, first install pipenv and then run pipenv install in the project's root.

Pandoc

Pandoc is used to create the PDF document for the database import report. It should be available in your distribution's package repository. See the Pandoc installation instructions for more.
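The external tools mentioned so far (GNU Make, R, a TeX distribution, subversion, Pandoc) can be checked in one go. The sketch below only looks for executables on PATH. The executable names are assumptions, in particular `pdflatex` as a stand-in for "a TeX distribution", and `missing_tools` is a hypothetical helper rather than anything shipped with the import scripts.

```python
import shutil

# Executable names are assumptions: `pdflatex` stands in for "a TeX
# distribution"; the others are the usual command names for the tools
# mentioned on this page.
REQUIRED_TOOLS = ["make", "Rscript", "pdflatex", "svn", "pandoc"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


for tool in missing_tools():
    print("not found on PATH:", tool)
```

It prints nothing when every tool is found. It does not check versions, so the subversion version note above still has to be verified by hand.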

Tips for debugging

Inevitably there will be problems. These tips, gleaned while setting up a new sudo user from scratch, may help solve them. The details are not perfect, but hopefully they cover the majority of steps and issues.

Test R's access to the database using this:

    echo "SELECT * FROM unique_case_data LIMIT 10;" > $CODE_DIR/sql_code/test.sql
    cd $CODE_DIR
    Rscript run_sql_file.R test.sql
    rm $CODE_DIR/sql_code/test.sql

Look at $CODE_DIR/logs for errors.
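If the logs are long, a small script can pull out the suspicious lines. This is a sketch under assumptions: it treats everything in $CODE_DIR/logs as a text file and searches for the word "error", which may not match how the import scripts actually report failures. `matching_lines` is a name invented here.

```python
import os
from pathlib import Path


def matching_lines(lines, keyword="error"):
    """Return (line number, text) for lines containing `keyword`, ignoring case."""
    needle = keyword.lower()
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]


# CODE_DIR is read from the environment, as in the shell snippets on this page.
log_dir = Path(os.environ.get("CODE_DIR", ".")) / "logs"
if log_dir.is_dir():
    for path in sorted(p for p in log_dir.iterdir() if p.is_file()):
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, text in matching_lines(handle):
                print(f"{path.name}:{number}: {text}")
```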

Try running failed steps manually. The logs should indicate where the problem was.

E.g.,

export SPAMD_PATH=/mnt/dengue/spamd
bash $CODE_DIR/hooks/post-import/03-setup-spamd-dependencies.hook

Example output:

Running 8 post-import hooks
✓ 01-diffport-snapshot.hook
✓ 02-diffport-report.hook
✖ 03-setup-spamd-dependencies.hook
✖ 04-make-realtime-dengue-report.hook
✓ 05-slack-notify-import-end.hook
✓ 06-slack-send-diffport-report.hook
✖ 07-slack-send-dengue-report.hook
✓ 08-save-code-snapshot.hook
3 hooks failed
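To get the list of hooks to re-run by hand, the summary can be parsed. The sketch below assumes the output always looks like the example above, with failed hooks marked by a leading ✖. `failed_hooks` is a hypothetical helper.

```python
# Marker used for failed hooks in the example output above.
FAIL_MARK = "✖"


def failed_hooks(output):
    """Return the names of hooks whose line starts with the failure marker."""
    failed = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(FAIL_MARK):
            failed.append(line[len(FAIL_MARK):].strip())
    return failed


example = """\
Running 8 post-import hooks
✓ 02-diffport-report.hook
✖ 03-setup-spamd-dependencies.hook
✖ 04-make-realtime-dengue-report.hook
"""
for name in failed_hooks(example):
    # Each failed hook can then be run manually with bash, as shown earlier.
    print("re-run:", name)
```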

Double-check permissions

We had to set various permissions, including group write on the locations listed below, using something like:

cd $DATA_DIR
sudo chmod -R g+w .git/

  • $DATA_DIR/.git
  • $CODE_DIR/logs
  • SPaMD svn. You can test this via an innocuous commit:
cd $SPAMD_PATH
svn up
svn status
svn commit trunk/source/realtime-dengue-reports/run-forecast-make-report.sh -m "NOP commit from matthew cornell at Umass to test svn push"
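Before reaching for chmod, it can help to see which files actually lack group write. The following is a minimal sketch that assumes DATA_DIR and CODE_DIR are set in the environment. The function names are invented for illustration, and the check only looks at the permission bits, not at whether your user is in the owning group.

```python
import os
import stat


def has_group_write(mode):
    """Return True if the group-write bit is set in a st_mode value."""
    return bool(mode & stat.S_IWGRP)


def not_group_writable(root):
    """Walk `root` and return every file or directory lacking group write."""
    offenders = [] if has_group_write(os.stat(root).st_mode) else [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks: chmod -R does not change the links themselves.
            if not os.path.islink(full) and not has_group_write(os.stat(full).st_mode):
                offenders.append(full)
    return offenders


for var, sub in (("DATA_DIR", ".git"), ("CODE_DIR", "logs")):
    base = os.environ.get(var)
    if base and os.path.isdir(os.path.join(base, sub)):
        for path in not_group_writable(os.path.join(base, sub)):
            print("missing g+w:", path)
```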

Installing R libraries

Some R packages are set up in $CODE_DIR, but others needed to be installed manually from R. Note that it is important to be in the right directory when installing: either ~/ or $CODE_DIR (we are not sure which). For example:

install.packages(c("futile.logger", "abind", "lubridate", "dplyr", "reshape2", "tidyr", "ggplot2", "foreach", "sp", "maptools", "rgeos", "RPostgreSQL", "xtable"))

Create ~/.creds.rds

For example:

creds <- list(port = 6392, host = "localhost", user = "cornell", password = "<your postgres password>", dbname = "dengue_cases")
saveRDS(creds, "~/.creds.rds")

Set up Python venv, pyenv, etc.

To install pyenv:

curl -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | bash

To install Python itself:

pyenv install 3.6.2
pyenv global 3.6.2

To install pipenv and then the project's dependencies (run from the project's root):

pip install pipenv
pipenv install

To get a shell in that env:

pipenv shell

To send a message to Slack:

pipenv run python misc/slack-bot.py msg test

Other useful commands:

pyenv versions
pyenv install -l
pipenv shell
pipenv run diffport

Create ~/pgsql-main-db-creds.json

It might look like this:

{"port": 6392,
 "host": "localhost",
 "user": "cornell",
 "password": "<your postgres password>",
 "dbname": "dengue_cases"
}
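A malformed credentials file tends to surface later as a confusing connection error, so it can be worth validating it up front. This sketch assumes the five keys shown in the example are all required. `parse_db_creds` is a hypothetical helper, not the loader the project's Python tools use.

```python
import json
import os

# Keys taken from the example file above.
REQUIRED_KEYS = {"port", "host", "user", "password", "dbname"}


def parse_db_creds(text):
    """Parse credential JSON and fail early if a key is missing."""
    creds = json.loads(text)
    missing = REQUIRED_KEYS - set(creds)
    if missing:
        raise ValueError("missing keys: " + ", ".join(sorted(missing)))
    return creds


path = os.path.expanduser("~/pgsql-main-db-creds.json")
if os.path.exists(path):
    with open(path, encoding="utf-8") as handle:
        creds = parse_db_creds(handle.read())
    # Deliberately avoid printing the password.
    print("credentials OK for", creds["user"], "on", creds["host"], creds["port"])
```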