Using Reproducible R Environments with the `renv` Package - EpiModel/EpiModeling GitHub Wiki
As we work with different research projects, we often need different versions of our R packages.
Using the `renv` package from RStudio, we can easily record the current state of a project, including the specific versions of the packages required to make it work.
Let's start with a new project on our personal computer:
- Create the project with RStudio
- Run `renv::init()` in the R console (from the project)
- Reload the project
- Work as usual
Once `renv` is initialized, we find ourselves in a project with no packages installed. We can install what we need using `install.packages()` and `remotes::install_github()` as usual. The packages are installed in the project's library, allowing multiple projects to use different versions of the same package (e.g. EpiModelHIV).
The `renv::snapshot()` command records the state of the packages in the project in a "renv.lock" file. This file lists every package and its version, which makes it easy to rebuild the project's environment on another computer.
Once the project works with `renv`, we can track the "renv.lock" file with git. However, this is the only file from `renv` we should put on git/GitHub. The "renv" folder and the ".Rprofile" file, both created automatically, should remain untracked because they contain information potentially specific to the user's configuration.
If you are used to adding everything with git, you can append the following lines to your ".gitignore" file to force git to ignore these files:
renv/*
.Rprofile
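
The two rules above can also be appended from the shell. The following is a minimal sketch, assuming a POSIX shell; `add_renv_ignores` is a hypothetical helper name, not something provided by `renv` or git. It adds each rule only if it is not already present, so it can be run more than once safely.

```shell
# Hypothetical helper: append the renv ignore rules to a .gitignore file,
# skipping any rule that is already listed.
add_renv_ignores() {
  gitignore="${1:-.gitignore}"
  touch "$gitignore"
  # If the file is non-empty and does not end with a newline, add one first
  if [ -s "$gitignore" ] && [ -n "$(tail -c1 "$gitignore")" ]; then
    echo >> "$gitignore"
  fi
  for rule in 'renv/*' '.Rprofile'; do
    # -q: quiet, -x: match the whole line, -F: treat the rule as a fixed string
    grep -qxF -e "$rule" "$gitignore" || printf '%s\n' "$rule" >> "$gitignore"
  done
}

# Usage, from the project root:
#   add_renv_ignores
```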
NOTE
When pulling an updated version of the "renv.lock" file (for example, when a contributor has updated one of the packages), you should manually run `renv::status()` to check whether `renv::restore()` is necessary. R will not tell you on its own if the lockfile is out of sync.
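
Since nothing warns you automatically, one option is to ask git whether the lockfile changed in the last pull. This is only a sketch and an assumption on our part, not part of the `renv` workflow: `lockfile_changed` is a hypothetical helper, and it relies on git setting `ORIG_HEAD` to the commit you were on before the pull.

```shell
# Hypothetical helper: succeeds (exit status 0) when renv.lock differs
# between two commits. Defaults compare the pre-pull commit with HEAD.
lockfile_changed() {
  old="${1:-ORIG_HEAD}"
  new="${2:-HEAD}"
  status=0
  # git diff --quiet exits 0 when identical, 1 when different, >1 on error
  git diff --quiet "$old" "$new" -- renv.lock || status=$?
  [ "$status" -eq 1 ]
}

# Usage, right after `git pull` in the project root:
#   lockfile_changed && echo 'renv.lock changed: run renv::status() in R'
```

This only tells you that the file changed; `renv::status()` remains the way to see what is actually out of sync.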
Let's consider now that our project has grown and needs to run simulations on the HPC.
I assume your project is now living in a private GitHub repository, with the "renv.lock" file up to date and in the repository.
Before we can continue we need to add the following lines to our "~/.bashrc" file:
export RENV_PATHS_ROOT="/projects/epimodel/renv/"
export GITHUB_PAT="<your github personal access token>"
The first line tells `renv` to store package files under the "/projects/epimodel" directory rather than in the home folder. Storing them in the home folder can cause problems because its size is limited on HPCs.
`renv` will create this directory automatically on first use. However, for other users to be able to use it, the directory's owner must run the following command:
chmod -R g+rwx /projects/epimodel/renv/
Currently on RSPH, Adrien Le Guillou is the owner of the folder and the permissions are set correctly for all members of the "epimodel" group.
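
If you are unsure whether the permissions are set as described, they can be inspected with `find`. The sketch below assumes a `find` that accepts symbolic modes for `-perm` (GNU and BSD versions do); `list_missing_group_perms` is a hypothetical helper name.

```shell
# Hypothetical helper: print every file or directory under the given path
# that lacks at least one of the group read, write or execute bits.
list_missing_group_perms() {
  # -perm -g=rwx matches entries with all three group bits set; ! negates it
  find "$1" ! -perm -g=rwx -print
}

# Usage on the HPC (prints nothing when every entry has g+rwx):
#   list_missing_group_perms /projects/epimodel/renv/
```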
The second line should contain your GitHub Personal Access Token. This allows `renv` to download packages from private GitHub repositories. You can skip this line if the token is already stored in your ".Renviron" file.
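
After editing "~/.bashrc" and opening a new shell, a quick check can confirm that the variables are visible. This is a minimal sketch; `check_renv_env` is a hypothetical helper, not part of `renv`. A missing `GITHUB_PAT` is only reported as a note, since the token may live in ".Renviron" instead.

```shell
# Hypothetical helper: fail if RENV_PATHS_ROOT is not set in this shell,
# and print a note (without failing) if GITHUB_PAT is not set.
check_renv_env() {
  if [ -z "${RENV_PATHS_ROOT:-}" ]; then
    echo "RENV_PATHS_ROOT is not set" >&2
    return 1
  fi
  if [ -z "${GITHUB_PAT:-}" ]; then
    # Not fatal: the token may be defined in ~/.Renviron instead
    echo "note: GITHUB_PAT is not set in this shell" >&2
  fi
  return 0
}

# Usage:
#   check_renv_env && echo "renv cache root: $RENV_PATHS_ROOT"
```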
We may want to start a build session if we expect the package installation to require a lot of CPU and RAM.
- First we move to our "project" folder: `cd /project/<user>/`
- Clone the repository: `git clone https://<your github personal access token>@github.com/<your/project.git>`
- Enter the project: `cd <project>`
- Copy your "loadR.sh" script into your project: `cp ~/loadR.sh ./` (see the corresponding wiki section)
- Load R: `source loadR.sh`
- Start R: `R`
- In R: `renv::init()`. This will read the "renv.lock" file and install the correct version of each package. `renv` may ask what to do with the existing lockfile; if so, choose to restore the project from it.
Once `renv` is set up for your project, you can run your sbatch scripts. For sbatch to use `renv`, you MUST run your scripts from the project root folder.
If your project structure is as follows:
project_root/
- R/
  - script1.R
  - script2.R
  - script3.R
- abc/
  - out/
  - master.sh # script containing the sbatch calls
- data/
- renv/ # folder automatically created by renv
- .Rprofile # file automatically created by renv
- renv.lock
- loadR.sh # script to load R module from spack
You must then run `source abc/master.sh` from the "project_root" folder (the shell prompt should look like this: `[username@clogin01 project_root]$`).
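
The working directory matters because R reads the ".Rprofile" found in the directory it is started from, and that file is what activates `renv`. A small guard can catch a wrong directory before any job is submitted. This is a sketch under that assumption; `in_renv_project_root` is a hypothetical helper, not part of `renv` or Slurm.

```shell
# Hypothetical helper: succeeds only when the current directory contains
# the lockfile plus the .Rprofile and renv/ folder created by renv.
in_renv_project_root() {
  [ -f renv.lock ] && [ -f .Rprofile ] && [ -d renv ]
}

# Usage, from the shell on the login node:
#   in_renv_project_root && source abc/master.sh
```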
`renv` offers many advanced features that are explained in more detail on the official website.