R parallel Setups - tobigithub/R-parallel GitHub Wiki

Simplified setups for the most common parallel packages are collected here. Many of the examples will work on OS X, Windows, and Linux. For the sake of laziness I focus mostly on Windows. R has a decent history of breaking working code after package updates, and although experts can solve many of these problems, getting old working scripts to run again can be quite frustrating. All setups were performed on a fresh install of R 3.2.2 or PRO 3.2.2 under Windows. Here we go.

library (doParallel) installation

This gist describes the installation of the doParallel package. If errors occur during installation, check the R-parallel error page. Common causes of installation errors are that R needs to be run as administrator or, better, installed into a different user directory.

### Installation of package with all dependent packages
chooseCRANmirror()
install.packages("doParallel", dependencies = c("Depends", "Imports")) 
# also installing the dependencies ‘foreach’, ‘iterators’
#package ‘foreach’ successfully unpacked and MD5 sums checked
#package ‘iterators’ successfully unpacked and MD5 sums checked
#package ‘doParallel’ successfully unpacked and MD5 sums checked
### End of installation, this needs to be run only once

# load doParallel for direct use
library(doParallel)
# make a cluster with all available threads (logical CPUs, not physical cores)
cl <- makeCluster(detectCores())
# register the cluster as the parallel backend (here all CPUs)
registerDoParallel(cl)
# return the number of parallel workers
getDoParWorkers()
# insert parallel calculations here
# stop the cluster and remove the Rscript.exe child processes (Windows)
stopCluster(cl)
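
As a minimal sketch of what could go in place of the "insert parallel calculations here" comment, the following uses `foreach` with `%dopar%` on the registered cluster. The two-worker cluster and the variable name `squares` are assumptions for illustration only; they are not part of the original setup.

```r
library(doParallel)

# Two workers keep the example light; detectCores() could be used instead
cl <- makeCluster(2)
registerDoParallel(cl)

# Each iteration is sent to a worker; .combine = c collects the results in a vector
squares <- foreach(i = 1:8, .combine = c) %dopar% {
  i^2
}
print(squares)

# Always stop the cluster when done
stopCluster(cl)
```

Without `.combine`, `foreach` returns a list, which is usually the safer default when each iteration produces more than a single number.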

When this script is run for the first time, Windows Firewall and any other personal firewall will ask for permission. In the dialog "Windows Firewall has blocked some features of this program", select Private Networks and click Allow access.

Allow Rscript.exe to access IP:127.0.0.1 Port 11414

It is important to make sure that stopCluster(cl) is always called at the end, because otherwise [Rscript zombies](https://github.com/tobigithub/R-parallel/wiki/R-parallel-Errors#r-engine-killed-or-unresponsive) will linger around and harass the system by eating memory and CPU. Opening the Task Manager (Ctrl-Alt-Del) and showing the processes reveals a number of conhost.exe and Rscript.exe child processes.
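
One way to make sure the cluster is stopped even when the parallel code fails is to wrap the work in a function and register the cleanup with `on.exit()`. This is only a sketch: the helper name `run_with_cluster` and its arguments are hypothetical and not part of doParallel.

```r
library(doParallel)

# Hypothetical helper: run a job on a temporary cluster and always release the workers
run_with_cluster <- function(n_workers, job) {
  cl <- makeCluster(n_workers)
  # on.exit() fires on normal return and on error alike
  on.exit({
    stopCluster(cl)
    registerDoSEQ()  # fall back to the serial backend afterwards
  }, add = TRUE)
  registerDoParallel(cl)
  job()
}

result <- run_with_cluster(2, function() {
  foreach(i = 1:4, .combine = c) %dopar% (i * 10)
})
print(result)
```

With this pattern no Rscript.exe workers should be left behind if `job()` throws an error, because the cleanup is tied to the function exit rather than to the last line of the script.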


library (doSNOW) installation

The doSNOW package is a "foreach" parallel adaptor for the snow package. The installation of doSNOW requires a number of dependent packages including foreach, iterators and snow. The cluster types available for a doSNOW cluster include "SOCK", "PVM", "MPI", and "NWS". For multi-core computers and simple Windows clusters the "SOCK" type is recommended. The number of connections for snow and doSNOW is currently limited to 128 nodes (see R source code connections.c).

One can simply increase the number of local cluster nodes to 128. Each Rscript.exe requires around 44 MByte RAM (128 x 44 MByte is about 5.5 GByte for the workers alone), so around 6-8 GByte RAM are required to create a 128-node snow cluster. The cluster can be started even if the computer has a lower CPU core count. However, using a local workstation cluster with nNodes > nThreads will not increase performance.
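
The arithmetic behind the memory estimate, and a simple way to cap the node count at the number of threads, can be sketched as follows. The 44 MByte figure is the one quoted above; the helper `cluster_ram_gb` and the variable names are made up for this illustration.

```r
library(parallel)

# Figure quoted above: roughly 44 MByte RAM per Rscript.exe worker under Windows
mb_per_node <- 44

# Hypothetical helper: estimated RAM in GByte used by the workers alone
cluster_ram_gb <- function(n_nodes) n_nodes * mb_per_node / 1024

cluster_ram_gb(128)  # 5.5

# Cap the requested node count at the number of logical CPUs,
# since more nodes than threads will not increase performance
n_threads <- detectCores(logical = TRUE)
if (is.na(n_threads)) n_threads <- 1  # detectCores() can return NA
n_nodes <- min(128, n_threads)
n_nodes
```

The estimate covers only the worker processes; the master R session, the data sent to each worker, and the operating system need memory on top of that.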

After the cluster is stopped, it is important to re-register the serial backend using the function registerDoSEQ(). Otherwise the following error occurs for subsequent code tests: "Error in summary.connection(connection) : invalid connection". This is currently not documented in doSNOW.


### Installation of the doSNOW parallel library with all dependencies
chooseCRANmirror()
install.packages("doSNOW", dependencies = c("Depends", "Imports")) 

# Cluster
# load the doSNOW library
library(doSNOW)
## Loading required package: foreach
## foreach: simple, scalable parallel programming from Revolution Analytics
## Use Revolution R for scalability, fault tolerance and more.
## http://www.revolutionanalytics.com
## Loading required package: iterators
## Loading required package: snow

# Create a compute cluster of 4 nodes (try 64)
# One can increase this up to 128 nodes
# Each node requires about 44 MByte RAM under Windows
cluster <- makeCluster(4, type = "SOCK")

# register the cluster
registerDoSNOW(cluster)

# insert parallel computation here

# stop cluster and remove clients
stopCluster(cluster)

# re-register the serial backend, otherwise repeated runs fail with "invalid connection"
registerDoSEQ()
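
To illustrate the stop and re-register sequence, here is a small sketch that runs the same `foreach` loop first on a doSNOW cluster and then, after `registerDoSEQ()`, on the serial backend. The two-node cluster and the variable names `par_result` and `seq_result` are assumptions chosen for the example.

```r
library(doSNOW)

# Small socket cluster for illustration
cluster <- makeCluster(2, type = "SOCK")
registerDoSNOW(cluster)

# Runs on the two snow workers
par_result <- foreach(i = 1:4, .combine = c) %dopar% (i + 1)

stopCluster(cluster)

# Without this call the next %dopar% would still point at the stopped cluster
registerDoSEQ()

# Same loop, now executed sequentially in the master session
seq_result <- foreach(i = 1:4, .combine = c) %dopar% (i + 1)

print(par_result)
print(seq_result)
```

Both loops should return the same values; only the backend that executes them differs.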


library(parallel) installation

The parallel package ships with base R and can be used right away with library(parallel). No CRAN install or update is required. Its use differs between Windows (socket clusters only) and Linux/OS X (forked clusters are also available). The package can also be used to detect the number of cores and threads.

# parallel is a base R package, no CRAN install required
library(parallel)

# detect physical cores
nCores <- detectCores(logical = FALSE)
# detect threads (logical CPUs)
nThreads <- detectCores(logical = TRUE)
# report the result
cat("CPU with", nCores, "cores and", nThreads, "threads detected.\n")

# makeCluster() creates a socket cluster by default; forking is not available under Windows
# maximum number of cluster nodes is 128
cl <- makeCluster(nThreads)
cl
# insert parallel calculations here
# stop the cluster and remove parallel instances
stopCluster(cl)

# END
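
A minimal sketch of a parallel calculation with the parallel package alone, assuming a two-worker socket cluster: `parLapply()` is the cluster counterpart of `lapply()`, and `clusterExport()` copies variables to the workers. The variable `offset` is made up for the example.

```r
library(parallel)

# Socket cluster with two workers; this works on Windows, Linux and OS X
cl <- makeCluster(2)

# Socket workers start with an empty workspace, so export what they need
offset <- 100
clusterExport(cl, "offset", envir = environment())

# Apply the function to each element on the workers
res <- parLapply(cl, 1:4, function(x) x + offset)

# stop the cluster and remove the parallel instances
stopCluster(cl)

print(unlist(res))
```

On Linux and OS X, `mclapply()` from the same package uses forked processes instead and needs no explicit cluster or export step; under Windows it only runs with `mc.cores = 1`.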

Other packages: doMC, doSMP, doSNOW, foreach