RStan Getting Started - PrinceWangR/rstan GitHub Wiki
Almost all install instructions below are for the aforementioned version of RStan.
RStan is the R (http://www.r-project.org/) interface for Stan. This how-to includes:
For more information on Stan and its modeling language, see:
R version 3.0.2 or later is required (if you are on a Mac, you may need to reinstall various things if you subsequently upgraded Xcode). However, RStan is known not to work in some respects for versions of R earlier than 3.2.0: plots based on ggplot2 will fail, the rstan.package.skeleton function will not download the needed files, etc. The minimum requirement of R 3.0.2 is intended to allow RStan to mostly work on remote servers that cannot be upgraded to a more recent version of R. RStan is barely tested on anything but the latest stable version of R and the development version of R. If you have administrative rights to the computer you are using (including, especially, anyone participating in a conference workshop involving RStan), you should use the latest stable version of R, which is available from
Follow the download link, then choose a mirror (we recommend http://cran.rstudio.com/ because it redirects to the closest reliable mirror), then click on the link for your platform (Windows, Linux, or Mac). For Windows, there is an additional step of choosing the "base" package before the download.
The Linux and Mac versions of the R command-line and GUI should install and work with the default configurations.
Although it is not required, for most users we strongly recommend installing RStudio version 0.99.1259 or later from
http://www.rstudio.com/products/rstudio/download/preview
which has basic support for .stan file types and syntax highlighting for Stan 2.10.0 and higher.
- For Mac, see RStan Mac OS X Prerequisite Installation Instructions .
- For Windows, see How to install Rtools on Windows.
- For Linux, use your package manager to install build-essential and a recent version of either g++ or clang++.
This subsection is optional in the sense that RStan should work without it. Nevertheless, the following is strongly recommended. If you do not already have one, create a personal Makevars file (see https://cran.r-project.org/doc/manuals/r-release/R-admin.html#Customizing-package-compilation for details). The following should work to create this file programmatically after you open R (either the R GUI, in the terminal using the command R, or by opening the recommended RStudio application).
dotR <- file.path(Sys.getenv("HOME"), ".R")
if (!file.exists(dotR)) dir.create(dotR)
M <- file.path(dotR, "Makevars")
if (!file.exists(M)) file.create(M)
cat("\nCXXFLAGS=-O3 -mtune=native -march=native -Wno-unused-variable -Wno-unused-function",
file = M, sep = "\n", append = TRUE)
Be advised that setting the optimization level to 3 may prevent some other R packages from installing from source if they are only tested with the stock R configuration.
If using g++ version 4.9 or higher (which is rare on a Mac), we recommend executing in R
cat("\nCXXFLAGS+=-flto -ffat-lto-objects -Wno-unused-local-typedefs",
file = M, sep = "\n", append = TRUE)
In addition, on OS X only you should (unless you do not have clang++ installed) execute in R
cat("\nCC=clang", "CXX=clang++ -arch x86_64 -ftemplate-depth-256",
file = M, sep = "\n", append = TRUE)
Starting with R version 3.3.x, it is possible to download a version of Rtools for Windows that uses g++ 4.9.x, which supports the C++11 standard. Stan does not currently support using the C++11 standard with versions of g++ up to and including 4.6, but it is believed to work with later versions of g++ and any recent version of clang++.
Regardless of whether you utilize the C++11 standard, if you use Rtools33 (or higher), then you need to execute the following once
cat('Sys.setenv(BINPREF = "C:/Rtools/mingw_$(WIN)/bin/")',
file = file.path(Sys.getenv("HOME"), ".Rprofile"),
sep = "\n", append = TRUE)
If you use g++ version 6 or higher, you may want to turn off some verbose warnings that are not relevant to Stan by executing
cat("\nCXXFLAGS += -Wno-ignored-attributes -Wno-deprecated-declarations",
file = M, sep = "\n", append = TRUE)
You can verify that your configuration is correct by executing
cat(readLines(M), sep = "\n")
and if not, opening the file whose path is
cat(M)
with a text editor.
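Because each of the `cat(..., append = TRUE)` calls above adds its line again every time it is run, re-running the setup can leave duplicate entries in Makevars. The following is a minimal sketch of one way to avoid that. The helper name `append_once` is hypothetical (it is not part of R or rstan), and the demonstration writes to a temporary file rather than to the real `~/.R/Makevars`.

```r
# Hypothetical helper (not part of R or rstan): append a line to a file only
# if an identical line is not already present.
append_once <- function(path, line) {
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  if (line %in% existing) return(invisible(FALSE))
  con <- file(path, open = "a")
  on.exit(close(con))
  # The leading empty string guards against a file whose last line has no
  # trailing newline; a blank line in Makevars is harmless.
  writeLines(c("", line), con)
  invisible(TRUE)
}

# Demonstrated on a temporary file so the real Makevars is left untouched.
demo_makevars <- tempfile("Makevars")
flag <- "CXXFLAGS += -Wno-ignored-attributes -Wno-deprecated-declarations"
append_once(demo_makevars, flag)  # writes the line
append_once(demo_makevars, flag)  # second call changes nothing
cat(readLines(demo_makevars), sep = "\n")
```

To apply the same idea to the real file, pass `M` (as defined above) instead of `demo_makevars`.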
- Open R (either the R GUI, in the terminal using the command R, or by opening the recommended RStudio application).
- For source builds only (which is atypical on Windows and OS X), set the number of processes to use for the build to the number of cores on your machine you want to devote to the build. For example, to use 4 processes, execute the following in R.
Sys.setenv(MAKEFLAGS = "-j4")
- You can install the latest rstan package and the packages it depends on and suggests from CRAN exactly like this:
# omit the 's' in 'https' if you cannot handle https downloads
install.packages('rstan', repos = 'https://cloud.r-project.org/', dependencies=TRUE)
If all else fails, you can try to install rstan from source via
install.packages("rstan", type = "source")
- Restart R. You may well need to restart R after the installation and verify that no objects created by an older version of RStan are (perhaps automatically) loaded into R before loading the rstan package.
- Verify that your toolchain works by executing in R
fx <- inline::cxxfunction( signature(x = "integer", y = "numeric" ) , '
return ScalarReal( INTEGER(x)[0] * REAL(y)[0] ) ;
' )
fx( 2L, 5 ) # should be 10
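For the source-build step in the list above, the process count does not have to be hard-coded. As a sketch, it can be derived from the number of detected cores. Leaving one core free is an assumption made here for illustration, not a recommendation from the RStan documentation.

```r
# Derive the -j value from the number of detected cores.
# detectCores() can return NA on some platforms, so fall back to 1.
n_cores <- parallel::detectCores()
n_jobs <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

Sys.setenv(MAKEFLAGS = paste0("-j", n_jobs))
Sys.getenv("MAKEFLAGS")
```

The setting only lasts for the current R session, so it needs to be set again before a later source build.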
The package name is rstan, so we need to use library(rstan) to load the package.
library(rstan) # observe startup messages
As the startup message says, if you are using rstan locally on a multicore machine and have plenty of RAM to estimate your model in parallel, at this point execute
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
These options respectively allow you to automatically save a bare version of a compiled Stan program to the hard disk so that it does not need to be recompiled and to execute multiple Markov chains in parallel.
This example comes from Section 5.5 of Gelman et al. (2003), which studied coaching effects in eight schools. For simplicity, we call this example "eight schools."
First, we specify this model in a file called 8schools.stan as follows (it can be found here):
data {
  int<lower=0> J;          // number of schools
  real y[J];               // estimated treatment effects
  real<lower=0> sigma[J];  // s.e. of effect estimates
}
parameters {
  real mu;
  real<lower=0> tau;
  real eta[J];
}
transformed parameters {
  real theta[J];
  for (j in 1:J)
    theta[j] = mu + tau * eta[j];
}
model {
  target += normal_lpdf(eta | 0, 1);
  target += normal_lpdf(y | theta, sigma);
}
In this model, we let theta be a transformed parameter of mu, tau, and eta instead of declaring theta directly as a parameter. Parameterizing the model this way lets the sampler run more efficiently. Assuming the 8schools.stan file is in our working directory, we can prepare the data and fit the model as the following R code shows.
schools_dat <- list(J = 8,
y = c(28, 8, -3, 7, -1, 1, 18, 12),
sigma = c(15, 10, 16, 11, 9, 11, 10, 18))
fit <- stan(file = '8schools.stan', data = schools_dat,
iter = 1000, chains = 4)
We can also specify a Stan model using a character string by using the argument model_code of function stan instead. However, this is not recommended.
The object fit, returned from function stan, is an S4 object of class stanfit. Methods such as print, plot, and pairs are associated with the fitted result, so we can use the following code to check the results in fit. print provides a summary of the model's parameters as well as the log-posterior, named lp__ (see the example output further down). For more methods and details of class stanfit, see the help for class stanfit.
In particular, we can use the extract function on stanfit objects to obtain the samples. extract returns the samples from the stanfit object as a list of arrays for the parameters of interest, or as a single array. In addition, the S3 functions as.array and as.matrix are defined for stanfit objects (use help("as.array.stanfit") to see the help document in R).
print(fit)
plot(fit)
pairs(fit, pars = c("mu", "tau", "lp__"))
la <- extract(fit, permuted = TRUE) # return a list of arrays
mu <- la$mu
### return an array of three dimensions: iterations, chains, parameters
a <- extract(fit, permuted = FALSE)
### use S3 functions as.array (or as.matrix) on stanfit objects
a2 <- as.array(fit)
m <- as.matrix(fit)
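Once the draws are extracted, posterior summaries are ordinary R computations on vectors and matrices. The sketch below assumes a list shaped like the result of `extract(fit, permuted = TRUE)`. So that it runs without a fitted model, `la` is filled with simulated stand-in draws, which are not real posterior output.

```r
# Stand-in for la <- extract(fit, permuted = TRUE). These are simulated
# numbers, NOT real posterior draws; replace with the real list when a fit
# is available.
set.seed(1)
la <- list(
  mu    = rnorm(2000, mean = 8, sd = 5),
  theta = matrix(rnorm(2000 * 8, mean = 8, sd = 7), ncol = 8)
)

mean(la$mu)                               # posterior mean of mu
quantile(la$mu, probs = c(0.025, 0.975))  # central 95% interval for mu
mean(la$mu > 0)                           # estimated Pr(mu > 0)
colMeans(la$theta)                        # one posterior mean per school
```

With `permuted = TRUE`, a vector parameter such as theta comes back with one row per draw and one column per element, which is why `colMeans` gives per-school summaries.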
> print(fit, digits = 1)
Inference for Stan model: schools_code.
4 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=2000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
mu 7.9 0.2 4.9 -2.1 4.5 7.9 11.0 17.8 422 1
tau 6.3 0.3 5.0 0.2 2.5 5.2 8.9 18.7 214 1
eta[1] 0.4 0.0 0.9 -1.5 -0.2 0.4 1.0 2.1 928 1
eta[2] 0.0 0.0 0.9 -1.8 -0.6 0.0 0.5 1.8 1640 1
eta[3] -0.2 0.0 1.0 -2.1 -0.8 -0.2 0.4 1.8 1243 1
eta[4] 0.0 0.0 0.9 -1.7 -0.6 0.0 0.6 1.7 1421 1
eta[5] -0.3 0.0 0.9 -2.0 -1.0 -0.4 0.3 1.5 883 1
eta[6] -0.2 0.0 0.9 -2.0 -0.8 -0.2 0.4 1.6 926 1
eta[7] 0.4 0.0 0.9 -1.4 -0.2 0.4 0.9 2.1 969 1
eta[8] 0.1 0.0 1.0 -1.8 -0.6 0.1 0.7 2.0 1365 1
theta[1] 11.4 0.3 8.1 -1.4 5.9 10.3 15.2 30.6 574 1
theta[2] 7.7 0.2 6.1 -3.7 3.9 7.8 11.4 19.5 762 1
theta[3] 5.8 0.3 7.9 -12.1 1.8 6.5 10.5 19.9 715 1
theta[4] 8.0 0.2 6.5 -5.4 3.9 8.1 12.3 20.2 977 1
theta[5] 5.0 0.3 6.7 -10.3 1.3 5.7 9.5 16.5 667 1
theta[6] 6.0 0.2 6.6 -8.4 2.0 6.2 10.2 18.6 976 1
theta[7] 10.8 0.3 6.8 -1.1 6.2 10.2 14.9 26.0 596 1
theta[8] 8.6 0.3 7.9 -6.1 4.0 8.1 12.6 27.7 629 1
lp__ -5.0 0.1 2.6 -10.7 -6.6 -4.8 -3.1 -0.5 367 1
Samples were drawn using NUTS2 at Fri Apr 12 22:09:54 2013.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
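To make the Rhat column more concrete, here is a sketch of the classic split-chain potential scale reduction factor in base R. It is an illustration under stated assumptions: the function name `split_rhat` is hypothetical, the input draws are simulated, and the exact computation used by a given rstan version may differ from this textbook form.

```r
# Hypothetical helper: textbook split-chain Rhat for one parameter.
# draws: numeric matrix with one row per iteration and one column per chain.
split_rhat <- function(draws) {
  half <- floor(nrow(draws) / 2)
  # Treat the first and second half of every chain as separate chains
  # (a middle row is dropped if the number of iterations is odd).
  halves <- cbind(
    draws[seq_len(half), , drop = FALSE],
    draws[(nrow(draws) - half + 1):nrow(draws), , drop = FALSE]
  )
  n <- nrow(halves)
  B <- n * var(colMeans(halves))       # between-chain variance
  W <- mean(apply(halves, 2, var))     # within-chain variance
  var_plus <- (n - 1) / n * W + B / n  # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(123)
mixed <- matrix(rnorm(500 * 4), ncol = 4)     # four chains, same distribution
stuck <- sweep(mixed, 2, c(0, 0, 0, 3), "+")  # fourth chain shifted away

split_rhat(mixed)  # close to 1
split_rhat(stuck)  # clearly above 1
```

With a real fit, the same function could be applied to one slice of the three-dimensional array, for example `split_rhat(extract(fit, permuted = FALSE)[, , "mu"])`.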
In addition, as with BUGS (or JAGS), CmdStan (the command-line interface to Stan) needs all the data to be in an R dump file. If we have such a file, rstan provides the function read_rdump to read all the data into an R list. For example, suppose we have a file named "8schools.rdump" in our working directory that contains the following text.
J <- 8
y <- c(28, 8, -3, 7, -1, 1, 18, 12)
sigma <- c(15, 10, 16, 11, 9, 11, 10, 18)
Then we can read the data from "8schools.rdump" as follows.
schools_dat <- read_rdump('8schools.rdump')
The R dump file can also be sourced into the global environment using the function source in R. In this case, we can omit the data argument and stan will search the calling environment for objects that have the same names as in the data block of 8schools.stan. That is,
source('8schools.rdump')
fit <- stan(file = '8schools.stan', iter = 1000, chains = 4)
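If the dump file does not exist yet, base R's `dump` can write one, and `source` can read it back into a separate environment so the global environment is not cluttered. This is a sketch of that round trip. The file is written to a temporary directory, and the object names are chosen to match the data block of 8schools.stan shown earlier. rstan also provides `stan_rdump` for writing such files.

```r
J <- 8
y <- c(28, 8, -3, 7, -1, 1, 18, 12)
sigma <- c(15, 10, 16, 11, 9, 11, 10, 18)

# Write the three objects in R dump format (base R writer).
rdump_file <- file.path(tempdir(), "8schools.rdump")
dump(c("J", "y", "sigma"), file = rdump_file)

# Source into a fresh environment instead of the global one, then collect
# the objects into the named list that the data argument expects.
env <- new.env()
source(rdump_file, local = env)
schools_dat <- mget(c("J", "y", "sigma"), envir = env)
str(schools_dat)
```

The resulting `schools_dat` has the same structure as the list built by hand earlier, so it can be passed as `data = schools_dat` in a call to stan.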
The Rats example is another popular example. The OpenBUGS version can be found here; it originally comes from Gelfand et al. (1990). The data describe the growth of 30 rats, measured weekly for five weeks. The following table lists the data, where x denotes the day on which the data were collected. We can try this example using the linked data rats.txt and model code rats.stan.
Rat | x=8 | x=15 | x=22 | x=29 | x=36 | Rat | x=8 | x=15 | x=22 | x=29 | x=36
---|---|---|---|---|---|---|---|---|---|---|---
1 | 151 | 199 | 246 | 283 | 320 | 16 | 160 | 207 | 248 | 288 | 324
2 | 145 | 199 | 249 | 293 | 354 | 17 | 142 | 187 | 234 | 280 | 316
3 | 147 | 214 | 263 | 312 | 328 | 18 | 156 | 203 | 243 | 283 | 317
4 | 155 | 200 | 237 | 272 | 297 | 19 | 157 | 212 | 259 | 307 | 336
5 | 135 | 188 | 230 | 280 | 323 | 20 | 152 | 203 | 246 | 286 | 321
6 | 159 | 210 | 252 | 298 | 331 | 21 | 154 | 205 | 253 | 298 | 334
7 | 141 | 189 | 231 | 275 | 305 | 22 | 139 | 190 | 225 | 267 | 302
8 | 159 | 201 | 248 | 297 | 338 | 23 | 146 | 191 | 229 | 272 | 302
9 | 177 | 236 | 285 | 350 | 376 | 24 | 157 | 211 | 250 | 285 | 323
10 | 134 | 182 | 220 | 260 | 296 | 25 | 132 | 185 | 237 | 286 | 331
11 | 160 | 208 | 261 | 313 | 352 | 26 | 160 | 207 | 257 | 303 | 345
12 | 143 | 188 | 220 | 273 | 314 | 27 | 169 | 216 | 261 | 295 | 333
13 | 154 | 200 | 244 | 289 | 325 | 28 | 157 | 205 | 248 | 289 | 316
14 | 171 | 221 | 270 | 326 | 358 | 29 | 137 | 180 | 219 | 258 | 291
15 | 163 | 216 | 242 | 281 | 312 | 30 | 153 | 200 | 244 | 286 | 324
y <- read.table('https://raw.github.com/wiki/stan-dev/rstan/rats.txt', header = TRUE)
x <- c(8, 15, 22, 29, 36)
xbar <- mean(x)
N <- nrow(y)
T <- ncol(y)
rats_fit <- stan(file = 'https://raw.githubusercontent.com/stan-dev/example-models/master/bugs_examples/vol1/rats/rats.stan')
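The call above relies on stan finding N, T, x, xbar, and y in the calling environment. An alternative is to bundle them into an explicit list, which makes it easier to see what the model receives. The sketch below uses only the first three rats from the table so that it is self-contained. The element names are assumed to match the data block of rats.stan, and the local file path in the commented call is a placeholder.

```r
# First three rats from the table; the full data set has 30 rows.
y <- rbind(
  c(151, 199, 246, 283, 320),
  c(145, 199, 249, 293, 354),
  c(147, 214, 263, 312, 328)
)
x <- c(8, 15, 22, 29, 36)

# Element names are assumed to match the data block of rats.stan.
rats_dat <- list(N = nrow(y), T = ncol(y), x = x, xbar = mean(x), y = y)
str(rats_dat)

# With rstan loaded and rats.stan saved locally (placeholder path):
# rats_fit <- stan(file = "rats.stan", data = rats_dat)
```

Passing the list explicitly also avoids accidentally picking up an unrelated object that happens to be named `y` or `T` in the workspace.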
You can run many of the BUGS examples and some others that we have created in Stan by executing
model <- stan_demo()
and choosing an example model from the list that pops up. The first time you call stan_demo(), it will ask you if you want to download these examples. You should choose option 1 to put them in the directory where rstan was installed so that they can be used in the future without redownloading them. The model object above is an instance of class stanfit, so you can call print, plot, pairs, extract, etc. on it afterward.
More details about RStan can be found in the documentation, including the vignette of package rstan. For example, use help(stan) and help("stanfit-class") to see the help for function stan and S4 class stanfit. See also Stan's modeling language manual for details about Stan's samplers, optimizers, and the Stan modeling language.
In addition, the Stan User's Mailing list can be used to discuss the use of Stan, post examples or ask questions about (R)Stan. When help is needed, it is important to provide enough information such as the following:
- model code in Stan modeling language
- data
- necessary R code
- dump of error message using verbose=TRUE and cores=1 when calling the stan function
- version of the C++ compiler, for example, using g++ -v to obtain this if gcc is used
- information about R by using function sessionInfo in R
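Several of the items in the list above can be gathered from within R in one step. The following is a sketch only: `collect_support_info` is a hypothetical helper, not an rstan function, and it queries the compiler with `--version` rather than `-v` so that the first line of output is the version string.

```r
# Hypothetical helper (not part of rstan) that gathers version information
# worth including in a help request.
collect_support_info <- function(compiler = "g++") {
  compiler_path <- unname(Sys.which(compiler))
  compiler_version <- if (nzchar(compiler_path)) {
    system2(compiler_path, "--version", stdout = TRUE, stderr = TRUE)[1]
  } else {
    paste(compiler, "not found on PATH")
  }
  rstan_version <- if (requireNamespace("rstan", quietly = TRUE)) {
    as.character(utils::packageVersion("rstan"))
  } else {
    "rstan not installed"
  }
  list(
    r_version     = R.version.string,
    rstan_version = rstan_version,
    compiler      = compiler_version,
    session       = utils::sessionInfo()
  )
}

info <- collect_support_info()
info$r_version
info$rstan_version
info$compiler
```

If clang++ is the compiler in use, call `collect_support_info("clang++")` instead. The model code, data, and error output still have to be attached separately.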
- Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003). Bayesian Data Analysis, CRC Press, London, 2nd Edition.
- The Stan Development Team (2015). Stan Modeling Language User's Guide and Reference Manual.
- Gelfand, A. E., Hills, S. E., Racine-Poon, A., and Smith, A. F. M. (1990). "Illustration of Bayesian Inference in Normal Data Models Using Gibbs Sampling", Journal of the American Statistical Association, 85, 972-985.
- Stan
- R
- BUGS
- OpenBUGS
- JAGS
- Rcpp