Setting up code dependencies - matthewcornell/wikitest GitHub Wiki
This page documents the steps for setting up dependencies for the import scripts.
The main import scripts are written in R and PL/pgSQL. They are run via a Makefile, so you also need to install GNU Make.
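On some systems (the BSDs, for example) the default make is not GNU Make, and GNU Make is installed as gmake instead. Below is a minimal Python sketch for checking which one you have. It assumes only that GNU Make's --version output starts with "GNU Make"; the function names are hypothetical.

```python
import subprocess


def is_gnu_make(version_text):
    """True if `make --version` output identifies GNU Make (e.g. "GNU Make 4.3")."""
    return version_text.lstrip().startswith("GNU Make")


def installed_make_is_gnu(make="make"):
    """Run `<make> --version`; a missing binary or a make that rejects the flag counts as 'no'."""
    try:
        out = subprocess.check_output(
            [make, "--version"], universal_newlines=True, stderr=subprocess.STDOUT
        )
    except (OSError, subprocess.CalledProcessError):
        return False
    return is_gnu_make(out)
```

For example, installed_make_is_gnu("gmake") checks the gmake binary instead of make.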
Setting up R
Some of the R packages required by the import process are installed locally by
packrat when you start an R session in the project's root directory. Their
sources are bundled in ./packrat/src, which is not gitignored because some of
these packages are not available on CRAN.
The report-creation hook needs additional R packages; these are also set up automatically, in a
post-import hook. Generating the realtime dengue report also needs some non-R dependencies, which are described
in $CODE_DIR/hooks/post-import/04-make-realtime-dengue-report.hook. The description is copied here for convenience:
# The report generation process needs a TeX distribution present in the system.
# Finally credentials need to be present for database access at `~/.creds.rds`
# Credential file has the following object template
# list(port=0000, host="localhost", user="user", password="password", dbname="db")
#
# A relatively recent version of subversion is needed for the commit script to work
# (checked with v1.9.3-1)
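The subversion requirement above can be checked with a small Python sketch. It assumes that svn --version --quiet prints just the version number (e.g. 1.9.3), and it treats the "-1" in "v1.9.3-1" as a distribution package revision, so it compares against 1.9.3. The function names are hypothetical.

```python
import re
import subprocess

MIN_SVN = (1, 9, 3)  # the version the hook's comment says was checked


def parse_svn_version(text):
    """Extract (major, minor, patch) from `svn --version --quiet` output such as '1.9.3'."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised svn version: %r" % text)
    return tuple(int(part) for part in match.groups())


def svn_recent_enough(minimum=MIN_SVN):
    """Ask the installed svn client for its version and compare it with `minimum`."""
    out = subprocess.check_output(["svn", "--version", "--quiet"], universal_newlines=True)
    return parse_svn_version(out) >= minimum
```

Tuples compare element by element, so 1.14.1 correctly counts as newer than 1.9.3, which a plain string comparison would get wrong.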
Setting up Python
Python is used by the Slack bot, the import reporting tool and the database tests. Python 3.6 or later
is required. We
recommend using pyenv to manage Python versions.
Avoid Anaconda distributions: they don't work well with virtualenv, which pipenv uses.
To install the dependencies, first install pipenv and then run pipenv install
in the project's root.
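A quick way to confirm that the interpreter pyenv selected meets the 3.6 requirement, and to spot an Anaconda Python, is sketched below. The conda check is a heuristic (conda installations contain a conda-meta directory), not something the project itself uses; the function names are hypothetical.

```python
import os
import sys

MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info):
    """True if the interpreter meets the Python >= 3.6 requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def looks_like_conda(prefix=sys.prefix):
    """Heuristic: conda installs and environments contain a `conda-meta` directory."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", python_is_supported())
    if looks_like_conda():
        print("Warning: this looks like an Anaconda/conda Python; pipenv may misbehave.")
```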
Pandoc
Pandoc is used to create the PDF version of the database import report. It should be available in your distribution's package repository; see Pandoc's installation instructions for more.
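To check that the report tooling is on your PATH, a sketch like the following can help. It assumes the PDF is produced with pandoc's default LaTeX engine, pdflatex; if the report is configured for another engine (e.g. xelatex), adjust the list.

```python
import shutil

# Assumption: pandoc's default PDF engine (pdflatex) is used for the report.
REPORT_TOOLS = ["pandoc", "pdflatex"]


def missing_tools(tools=REPORT_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```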
Tips for debugging
Inevitably there will be problems. The tips below, gleaned while setting up a new sudo user from scratch, may help. They are not perfect, but hopefully they cover most of the steps and issues.
Test R access to the database with:
echo "SELECT * FROM unique_case_data LIMIT 10;" > $CODE_DIR/sql_code/test.sql
cd $CODE_DIR
Rscript run_sql_file.R test.sql
rm $CODE_DIR/sql_code/test.sql
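The same smoke test can be wrapped in Python so that test.sql is removed even if Rscript fails. This sketch assumes, like the shell steps, that run_sql_file.R finds test.sql under sql_code/ when given just the file name. The function name and its runner parameter are hypothetical; runner only exists so the function can be exercised without R.

```python
import subprocess
from pathlib import Path

TEST_QUERY = "SELECT * FROM unique_case_data LIMIT 10;\n"


def run_db_smoke_test(code_dir, runner=subprocess.run):
    """Mirror the shell steps: write test.sql, run it from code_dir, always delete it."""
    code_dir = Path(code_dir)
    sql_file = code_dir / "sql_code" / "test.sql"
    sql_file.write_text(TEST_QUERY)
    try:
        # Like the shell version, pass only the file name and run from $CODE_DIR.
        result = runner(["Rscript", "run_sql_file.R", "test.sql"], cwd=str(code_dir))
        return result.returncode
    finally:
        sql_file.unlink()
```

For example, run_db_smoke_test(os.environ["CODE_DIR"]) returns Rscript's exit status, which is 0 on success.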
Look at $CODE_DIR/logs for errors.
Try running failed steps manually. The logs should indicate where the problem was.
E.g.,
export SPAMD_PATH=/mnt/dengue/spamd
bash $CODE_DIR/hooks/post-import/03-setup-spamd-dependencies.hook
Example output from a run in which some post-import hooks failed:
Running 8 post-import hooks
✓ 01-diffport-snapshot.hook
✓ 02-diffport-report.hook
✖ 03-setup-spamd-dependencies.hook
✖ 04-make-realtime-dengue-report.hook
✓ 05-slack-notify-import-end.hook
✓ 06-slack-send-diffport-report.hook
✖ 07-slack-send-dengue-report.hook
✓ 08-save-code-snapshot.hook
3 hooks failed
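When the summary lists several failures, the hooks to re-run can be pulled out of it mechanically. A minimal sketch, assuming the summary text has been copied from the console or logs and keeps the ✓/✖ markers shown above; the function names are hypothetical.

```python
def failed_hooks(summary):
    """Return the names of hooks marked with ✖ in a post-import summary."""
    failed = []
    for line in summary.splitlines():
        line = line.strip()
        if line.startswith("✖"):
            failed.append(line.lstrip("✖").strip())
    return failed


def rerun_commands(summary, hooks_dir="$CODE_DIR/hooks/post-import"):
    """Build the manual re-run command for each failed hook, in the order listed."""
    return ["bash %s/%s" % (hooks_dir, name) for name in failed_hooks(summary)]
```

Some hooks need environment variables set first, such as SPAMD_PATH for 03-setup-spamd-dependencies.hook.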
Double-check permissions
We had to set permissions on several paths, including the following, using something like:
cd $DATA_DIR
sudo chmod -R g+w .git/
- $DATA_DIR/.git
- $CODE_DIR/logs
- The SPaMD svn working copy ($SPAMD_PATH). You can test write access with an innocuous commit:
cd $SPAMD_PATH
svn up
svn status
svn commit trunk/source/realtime-dengue-reports/run-forecast-make-report.sh -m "NOP commit from matthew cornell at Umass to test svn push"
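To find files that still lack group-write permission (e.g. under $DATA_DIR/.git or $CODE_DIR/logs) before reaching for chmod, a sketch like this can help. It only checks the group-write bit, not group ownership, and it skips symlinks; the function name is hypothetical.

```python
import os
import stat


def missing_group_write(root):
    """List root and every directory/file under it that lacks the group-write bit."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Subdirectories show up later as their own dirpath, so check dirpath + files only.
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # permission bits on symlinks are not meaningful
            if not mode & stat.S_IWGRP:
                missing.append(path)
    return missing
```

For example, missing_group_write(os.path.expandvars("$DATA_DIR/.git")) should return an empty list once the chmod above has been applied.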
Installing R libraries
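Some R packages are set up automatically in $CODE_DIR, but others had to be installed manually from R.
Which directory you start R in matters when installing: packrat is activated when R starts in $CODE_DIR,
so packages installed there go into the project library rather than your user library. We are not sure
whether ~/ or $CODE_DIR is the right place (?). For example: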
install.packages(c("futile.logger", "abind", "lubridate", "dplyr", "reshape2", "tidyr", "ggplot2", "foreach", "sp", "maptools", "rgeos", "RPostgreSQL", "xtable"))
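To see which of these packages R can already find, you could ask Rscript directly. This sketch assumes that running Rscript from $CODE_DIR activates packrat through the project's .Rprofile, so the answer reflects the project library; if packrat prints startup messages they will also appear in the output and need filtering. The helper names are hypothetical.

```python
import subprocess

R_PACKAGES = ["futile.logger", "abind", "lubridate", "dplyr", "reshape2", "tidyr",
              "ggplot2", "foreach", "sp", "maptools", "rgeos", "RPostgreSQL", "xtable"]


def missing_r_packages_expr(packages):
    """R expression that prints, one per line, the packages not installed."""
    quoted = ", ".join('"%s"' % p for p in packages)
    return 'cat(setdiff(c(%s), rownames(installed.packages())), sep = "\\n")' % quoted


def missing_r_packages(packages=R_PACKAGES, cwd=None):
    """Run Rscript from `cwd` (e.g. $CODE_DIR) and return the missing package names."""
    out = subprocess.check_output(["Rscript", "-e", missing_r_packages_expr(packages)],
                                  cwd=cwd, universal_newlines=True)
    return [line for line in out.splitlines() if line.strip()]
```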
Create ~/.creds.rds
For example:
r <- list(port = 6392, dbname = "dengue_cases", user = "cornell",
          host = "localhost", password = "<your postgres password>")
saveRDS(r, "~/.creds.rds")
Set up Python venv, pyenv, etc.
To install pyenv:
curl -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | bash
To install Python itself:
pyenv install 3.6.2
pyenv global 3.6.2
To install pipenv and then the project's dependencies (run the second command in the project's root):
pip install pipenv
pipenv install
To get a shell in that env:
pipenv shell
To send a message to Slack:
pipenv run python misc/slack-bot.py msg test
Other useful commands:
pyenv versions
pyenv install -l
pipenv shell
pipenv run diffport
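To confirm which interpreter you are actually running (for example before sending a Slack message), a small check like this can help. The PIPENV_ACTIVE variable is set by pipenv shell and pipenv run in recent pipenv releases; treat that check as an assumption worth verifying for your version. The function names are hypothetical.

```python
import os
import sys


def in_virtualenv():
    """True when running inside a venv/virtualenv, which is what pipenv creates."""
    # venv and virtualenv >= 20 change sys.prefix but keep sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def pipenv_active(environ=os.environ):
    """Assumed: pipenv exports PIPENV_ACTIVE=1 inside `pipenv shell` / `pipenv run`."""
    return environ.get("PIPENV_ACTIVE") == "1"


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("in virtualenv:", in_virtualenv(), "| pipenv active:", pipenv_active())
```

For example, save it as a hypothetical check_env.py and compare the output of python check_env.py with pipenv run python check_env.py.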
Create ~/pgsql-main-db-creds.json
It might look like this:
{"port": 6392,
"host": "localhost",
"user": "cornell",
"password": "<your postgres password>",
"dbname": "dengue_cases"
}
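A sketch for loading this file and failing early on a missing key, assuming the Python tools read it as plain JSON (check the code that actually loads it). The key names match libpq's connection keywords, so the resulting dict could be passed to a driver such as psycopg2.connect(**creds). The function name is hypothetical.

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"port", "host", "user", "password", "dbname"}


def load_db_creds(path="~/pgsql-main-db-creds.json"):
    """Load the credentials JSON and check that every required key is present."""
    with open(str(Path(path).expanduser())) as fh:
        creds = json.load(fh)
    if not isinstance(creds, dict):
        raise ValueError("credentials file must contain a JSON object")
    missing = REQUIRED_KEYS - creds.keys()
    if missing:
        raise ValueError("missing keys: " + ", ".join(sorted(missing)))
    if not isinstance(creds["port"], int):
        raise ValueError("port must be an integer, e.g. 6392")
    return creds
```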