R parallel package overview - tobigithub/R-parallel GitHub Wiki
The world of parallel R packages is wonderfully cluttered, shaped by OS divergence (Linux, Mac, Windows) and by the history of clusters, grids and now clouds. As a result, many parallel code snippets do not work out of the box (see the R parallel questions on stackexchange.com and stackoverflow.com), and what the end user gets is endless discussion about simple parallel code.
Photo: Tony Hisgett@Flickr
I understand that parallel packages supporting large clusters have different requirements (such as complex schedulers) than those for simple multi-CPU workstations, so I can offer no remedy, just some simple guidance in tables and text. There are also some OpenCL and CUDA packages, but few of them support Windows. The table below compiles the most commonly used R parallel packages.
Num | R parallel package | WIN | LINUX | MAC | Information | GitHub LOC |
---|---|---|---|---|---|---|
1 | doParallel | Y | Y | Y | merger of doSNOW and doMC | 7,861 |
2 | parallel | Y | Y | Y | part of R core (not on CRAN), merger of snow and multicore | 12,473 |
3 | snowfall | Y | Y | Y | wrapper for snow | 935 |
4 | snow | Y | Y | Y | Simple Network of Workstations | 1,680 |
5 | doSNOW | Y | Y | Y | foreach parallel backend for snow | 1,360 |
6 | doMC | N | Y | Y | better to use doParallel | 5,346 |
7 | Rmpi | Y | Y | Y | Message Passing Interface (MPI) for R | 783 |
8 | doMPI | N | Y | Y | foreach parallel backend for Rmpi | 281 |
9 | doRNG | Y | Y | Y | reproducible parallel foreach loops | 267 |
10 | Rth | - | Y | Y | allows CUDA and OpenMP | 18 |
11 | future | Y | Y | Y | parallel expressions | 179 |
12 | rpvm | - | - | - | removed from CRAN (do not use) | |
13 | multicore | - | - | - | removed from CRAN (do not use) | |
Which libraries to use for Windows? In my experience, the R core library(parallel), together with library(doParallel) and library(doSNOW), were the least painful to install and almost worked out of the box.
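A minimal sketch of the socket (PSOCK) approach with the R core library(parallel), which is the cluster type that also works on Windows. It assumes two local workers and a hypothetical global `offset`; the point is that socket workers start as fresh R sessions, so anything they need must be shipped to them with clusterExport(). Forgetting that step is one common reason snippets only "almost" work.

```r
library(parallel)

offset <- 10  # hypothetical global defined in the master session

# PSOCK (socket) clusters launch fresh Rscript workers; this works on Windows too
cl <- makeCluster(2)

# Workers do not inherit the master's globals, so export what they need
clusterExport(cl, "offset")

# Each worker evaluates the function on its share of the input
res <- parLapply(cl, 1:4, function(x) x + offset)

# Shut the worker processes down again
stopCluster(cl)

print(unlist(res))
```

With library(doParallel), the same kind of cluster object can be passed to registerDoParallel(cl) so that foreach loops run on it.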
Which libraries not to use for Windows? I think code that requires the unsupported libraries (rpvm or multicore) should not be used, although such examples may be ported. doMC is also not supported under Windows, and library(Rmpi) requires an installation of OpenMPI or MPICH. Both OpenMPI and MPICH work fine under Windows, but once R gets involved, the library settings, DLL calls and other setup steps do not work out of the box.
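Code written for multicore or doMC relies on forking, which Windows lacks; parallel::mclapply() refuses mc.cores > 1 there. A minimal porting sketch, assuming a hypothetical helper `par_map()` (not part of any package) that falls back to a socket cluster on Windows and forks on Linux and Mac:

```r
library(parallel)

# Hypothetical helper (not part of any package): choose a backend per OS
par_map <- function(X, FUN, cores = 2L) {
  if (.Platform$OS.type == "windows") {
    # No fork() on Windows: start socket workers and stop them on exit
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl), add = TRUE)
    parLapply(cl, X, FUN)
  } else {
    # Linux and macOS: forked workers share the master's memory
    mclapply(X, FUN, mc.cores = cores)
  }
}

squares <- par_map(1:4, function(x) x^2)
print(unlist(squares))
```

On the Windows branch, any global variables used inside FUN would still need clusterExport(), because socket workers start empty.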
Which parallel libraries to use for OS X and Linux? I would suggest library(parallel) and the packages with the most examples or lines of code (LOC) in the table above, because they offer the most good examples to learn from.
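On OS X and Linux, library(parallel) can fork the running R session with mclapply(), so workers see the master's objects without any export step. A minimal sketch, assuming a hypothetical lookup table `lookup`; mc.cores is set to 1 on Windows, where mclapply() then simply runs sequentially:

```r
library(parallel)

# Hypothetical data created in the master session
lookup <- setNames(seq_along(letters), letters)

# Forking is unavailable on Windows, where mclapply() only accepts mc.cores = 1
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Forked children inherit 'lookup' directly; no clusterExport() needed
res <- mclapply(c("a", "b", "z"), function(k) lookup[[k]], mc.cores = cores)

print(unlist(res))
```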
Links:
- A Survey of R Software for Parallel Computing - Esam Mahdi
- CRAN Parallel package overview - High Performance Computing with R
- Difference of parallel packages - explains some of the heritage of R parallel packages