R parallel package overview - tobigithub/R-parallel GitHub Wiki

The world of parallel R packages is wonderfully cluttered, shaped by OS divergence (Linux, Mac, Win) plus the history of clusters, grids and now clouds. What the end user gets is many parallel code snippets that do not work out of the box (see the R parallel issues on stackexchange.com and stackoverflow.com) and endless discussions about simple parallel code.

[Image. Photo: Tony Hisgett@Flickr]

I understand that parallel packages supporting large clusters have other requirements, such as complex schedulers, than those for simple multi-CPU workstations. I can therefore offer no remedy, just some simple guidance in tables and text. There are also some OpenCL and CUDA packages, but few have Windows support. The list below is a compilation of the most commonly used R packages.

| Num | R parallel package | WIN | LINUX | MAC | Information | GitHub LOC |
|-----|--------------------|-----|-------|-----|-------------|------------|
| 1 | doParallel | Y | Y | Y | merger of doSNOW and doMC | 7,861 |
| 2 | parallel | Y | Y | Y | In R core (not CRAN), merger of snow and multicore | 12,473 |
| 3 | snowfall | Y | Y | Y | wrapper for snow | 935 |
| 4 | snow | Y | Y | Y | Network of Workstations | 1,680 |
| 5 | doSNOW | Y | Y | Y | Foreach parallel for snow | 1,360 |
| 6 | doMC | N | Y | Y | better use doParallel | 5,346 |
| 7 | Rmpi | Y | Y | Y | Message Passing Interface for R | 783 |
| 8 | doMPI | N | Y | Y | Foreach parallel for Rmpi | 281 |
| 9 | doRNG | Y | Y | Y | reproducible parallel foreach loops | 267 |
| 11 | Rth | - | Y | Y | allows CUDA and OpenMP | 18 |
| 12 | future | Y | Y | Y | parallel expressions | 179 |
| 13 | rpvm | - | - | - | removed from CRAN (do not use) | |
| 14 | multicore | - | - | - | removed from CRAN (do not use) | |

Which library to use for Windows? I think the R core library(parallel), library(doParallel) and library(doSNOW) were the least painful to install and worked almost out of the box.
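As a rough illustration of the library(parallel) route, here is a minimal sketch that assumes nothing beyond a standard R installation. It starts a small socket (PSOCK) cluster, which is the cluster type that also works on Windows, and maps a function over a vector. The function slow_square and the worker count of two are hypothetical placeholders, not a recommendation.

```r
library(parallel)

# Use at most two workers so the sketch also runs on small machines.
# detectCores() may return NA, hence na.rm = TRUE.
n_workers <- max(1L, min(2L, detectCores(), na.rm = TRUE))

# A PSOCK cluster launches fresh R worker processes over sockets;
# this is the default type of makeCluster() and works on Windows, Linux and Mac.
cl <- makeCluster(n_workers)

# Hypothetical toy workload standing in for a real computation.
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

# Always stop the cluster, even if the parallel call fails.
squares <- tryCatch(
  unlist(parLapply(cl, 1:8, slow_square)),
  finally = stopCluster(cl)
)

print(squares)
```

Because the workers are separate R processes, any packages or global objects the function depends on would have to be sent to them explicitly (for example with clusterExport or clusterEvalQ), which is a common reason why snippets copied from elsewhere fail.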

Which libraries not to use for Windows? I think all code that requires the unsupported libraries (rpvm or multicore) should not be used, although examples may be ported. Also, doMC is not supported under Windows, and library(Rmpi) requires the installation of OpenMPI or MPICH. Both OpenMPI and MPICH work fine under Windows, but once R gets involved the library settings, DLL calls and other setups will not work out of the box.

Which parallel libraries to use for OSX and LINUX? I think library(parallel), followed by the packages with the most lines of code (LOC, see table above), because those could provide some good examples.
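On OSX and LINUX, library(parallel) also offers the fork-based mclapply inherited from multicore. The following is a minimal sketch under the assumption of a standard R installation. The function cube and the core count of two are hypothetical placeholders, and the Windows guard is only there so that the snippet does not fail when copied to another machine.

```r
library(parallel)

# Forking is not available on Windows, where mclapply() only accepts
# mc.cores = 1, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical toy workload standing in for a real computation.
cube <- function(x) x^3

# mclapply() is a drop-in replacement for lapply(); forked children
# inherit the parent's workspace, so nothing has to be exported.
cubes <- unlist(mclapply(1:6, cube, mc.cores = n_cores))

print(cubes)
```

Compared with a socket cluster, the forked version needs no setup or teardown, which is one reason it tends to be the least painful option on these two platforms.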


Links: