Building R Packages on Hyak - statnet/computing GitHub Wiki

This page describes how to install R packages on Hyak, which requires some specific steps to log in to build nodes. The second section discusses how to use our GitHub repositories on Hyak and build R packages from local copies of our packages.

Types of Hyak Nodes

There are four types of nodes on Hyak, and each should be used only for its intended purpose:

  1. Login nodes are the nodes you arrive at when you ssh into Hyak. Because they have internet access, these nodes should be used for transferring data between Hyak and CSDE or your local computer. They are also the nodes from which we submit simulation jobs with qsub, monitor those jobs, and do other file management. These nodes should not be used for computationally intensive work like simulations, or even for small-scale tasks like building R packages. If you ever need to load the R module (described below), don't do it here.
  2. Build nodes are meant specifically for building software like R packages. They may be used for small-scale jobs like package installation, and because they are connected to the internet, they can easily be used to install R packages from within R using the install.packages() function. They should only be used in "interactive mode", where package installation is done manually (i.e., rather than by submitting batch jobs). To connect to a build node, use the build alias that was set up in Step 1. This is shorthand for starting a new job with qsub but flagging it with the -q build option, which submits the job to the build queue.
  3. Compute nodes are nodes to be used for large-scale simulation jobs. Of the roughly 900 compute nodes on Hyak, CSDE "owns" 13 of them and the rest are owned by other research groups. The next tutorial will detail how to submit simulation jobs to either CSDE nodes, other compute nodes, or both. Briefly, you will always connect to compute nodes by submitting a job with qsub. You never connect directly to these nodes with ssh.
  4. Interactive node: this is a single node that CSDE "owns" specifically for running interactive computing jobs, for example running R interactively to test a simulation on a small scale, or performing some data analysis. You connect to this node using the shell alias described in Step 1, which is shorthand for qsub -q int -I; this tells Hyak that you want a node in the interactive queue and that the job should be interactive. This node is not connected to the internet, so it should not be used to build packages or transfer data.
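For reference, the two aliases from Step 1 might look something like the following in your .bash_profile. This is only a sketch: the page spells out the shell alias (qsub -q int -I), but the exact flags behind build are assumed here to be qsub -I -q build, and any resource requests you added in Step 1 (walltime, memory, and so on) are not shown.

```shell
# Sketch of the Step 1 aliases for ~/.bash_profile.
# ASSUMPTION: the build flags below are a guess; only the shell alias is given on this page.
alias build='qsub -I -q build'   # interactive job in the build queue (internet access)
alias shell='qsub -q int -I'     # interactive job in CSDE's interactive queue
```

After reloading your profile, typing build or shell submits the corresponding interactive job and drops you onto that node once the job starts.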

R Packages on CRAN

To get started building your personal R library on Hyak, log on to a build node.

build

Software like R is loaded in a "module". You can see all the available modules on Hyak with:

module available
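The full module list on Hyak is long. If you only want to see which R builds are installed, a small helper like the hypothetical list_r_modules below could filter it. It assumes the classic Environment Modules behavior of printing the list to stderr, and that R modules follow the r_<version> and msropen_<version> naming used on this page.

```shell
# Hypothetical helper: print only the R-related module names, one per line.
# ASSUMPTIONS: `module avail` writes to stderr (classic Environment Modules),
# and R modules are named like r_3.2.4 or msropen_3.3.2.
list_r_modules() {
  # Merge stderr into the pipe, split on spaces/tabs, keep R module names
  module avail 2>&1 | tr -s ' \t' '\n' | grep -E '^(r|msropen)_[0-9]'
}
```

Run list_r_modules on the build node, then pick one of the printed names for module load.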

To load R, load the R module (there may be a more recent R version available; which one you use is up to you, but more up-to-date is better).

module load r_3.2.4

A newer option is Microsoft R Open:

module load msropen_3.3.2

Type R in the terminal to start R, then install packages from CRAN interactively as usual. Start with EpiModel and all of its dependencies. If everything works, you should see no error messages during the installation. You will likely get a message asking whether you want to use a personal library; answer yes.

install.packages("EpiModel", dependencies = TRUE)

R may then ask whether you would like to create a personal library; if so, answer yes to that as well. If you are using a more recent version of R (such as Microsoft R Open 3.3.2), you may need to specify a CRAN mirror explicitly with the repos argument in order to install packages.

install.packages("EpiModel", dependencies = TRUE, repos = "https://cran.rstudio.com")

Install any other packages you use on a regular basis at this point. Please feel free to add to this list for other users:

install.packages("bindata")
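If you would rather install CRAN packages in one step from the build node's shell instead of an interactive R session, a hypothetical installcran helper along these lines could work (you still run it by hand on the build node). It assumes the r_3.2.4 module and that your R build can reach https://cran.rstudio.com. It pre-creates the personal library (the path in R_LIBS_USER) so that R does not prompt for it, then passes the package names to install.packages().

```shell
# Hypothetical helper: installcran <package> [package ...]
# ASSUMPTIONS: the r_3.2.4 module, and HTTPS access to cran.rstudio.com.
installcran() {
  module load r_3.2.4
  # Create the personal library up front; a new R session then installs there by default
  Rscript -e 'dir.create(Sys.getenv("R_LIBS_USER"), recursive = TRUE, showWarnings = FALSE)'
  # Package names arrive in R as trailing command-line arguments
  Rscript -e 'install.packages(commandArgs(trailingOnly = TRUE), dependencies = TRUE, repos = "https://cran.rstudio.com")' "$@"
}
```

For example, installcran EpiModel bindata installs both packages and their dependencies into your personal library.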

R Packages on GitHub

Many of the packages that we use are not hosted on CRAN, only on GitHub. This includes much of our applied epidemic modeling software. We have both public and private repositories on GitHub. The install process is the same for each, but if you try to install a package from a private repository that you do not have access to, you'll get an error message. At that point, contact Sam or Steve and we'll give you access as necessary.

We used to install these packages directly from GitHub using the devtools package in R, but there have been a number of challenges to getting devtools installed on Hyak, so we created an alternative.

First, open your .bashrc or .bash_profile file with vi and enter insert mode by pressing i (you did the same thing in Step 1 when editing this file to add your aliases).

vi .bash_profile
i

Next, copy and paste the following shell function into your file. If you are using a different version of R, replace module load r_3.2.4 with module load msropen_3.3.2 or whichever R module you are using.

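installgit() {
  # Usage: installgit <owner> <repository> [subdirectory]
  module load r_3.2.4
  # Download the master branch as a zip archive, replacing any old copy
  wget -q -O master.zip "https://github.com/$1/$2/archive/master.zip" || { rm -f master.zip; return 1; }
  # Extract to <repository>-master, overwriting without prompting
  unzip -oq master.zip
  # Install from the repository root, or from the subdirectory if one is given
  R CMD INSTALL "${2}-master/$3"
  # Clean up the extracted source and the archive
  rm -rf "${2}-master" master.zip
}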

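Finally, save and close the file by pressing Esc and then typing :wq followed by Enter. Then run source ~/.bash_profile (or source the file you edited, or log out and back in) so the function is available in your current session.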

This shell function will download any R package contained in a GitHub repository and install it for you. Run it from a build node, since it needs internet access and loads the R module. Here's what you will type into the terminal:

installgit <owner> <repository> <subdirectory>

So, for example, to install the GitHub version of EpiModel:

installgit statnet EpiModel

Here the subdirectory argument is omitted because the R package is located in the root directory of the repository. As an example where it was needed, tergmLite used to be installed with:

installgit statnet tergmLite tergmLite

The package used to be located in a subdirectory called tergmLite within the repository, but it is now stored in the repository root and can be installed with:

installgit statnet tergmLite
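installgit downloads and unzips into whatever directory you happen to run it from. If you would rather keep that directory untouched, a variant along these lines could do the same work inside a throwaway directory. The name installgit_tmp is hypothetical, and it keeps the same assumptions as installgit (the r_3.2.4 module and a master branch).

```shell
# Hypothetical variant of installgit that works in a temporary directory.
# Usage: installgit_tmp <owner> <repository> [subdirectory]
installgit_tmp() {
  if [ "$#" -lt 2 ]; then
    echo "usage: installgit_tmp <owner> <repository> [subdirectory]" >&2
    return 2
  fi
  local tmp status
  tmp="$(mktemp -d)" || return 1
  module load r_3.2.4
  # Run the steps in a subshell so the cd does not change your working directory
  (
    cd "$tmp" &&
      wget -q -O master.zip "https://github.com/$1/$2/archive/master.zip" &&
      unzip -q master.zip &&
      R CMD INSTALL "${2}-master/$3"
  )
  status=$?
  # Always remove the temporary directory, even if a step failed
  rm -rf "$tmp"
  return "$status"
}
```

It is called exactly like installgit, e.g. installgit_tmp statnet tergmLite, and returns a non-zero status if any step fails.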