This document provides an introduction and overview of Stan, a programming language for Bayesian statistical modeling and inference. It discusses Stan's motivation as a faster alternative to BUGS that compiles models to C++. Key points covered include:
- How Stan models are specified using blocks like data, transformed data, parameters, model, and generated quantities.
- Stan's support for scalar, vector, matrix, and array variable types.
- An example Stan model that replicates a hierarchical Bayesian regression of rat weight data from a BUGS example.
- How to install Stan and its R interface RStan on Windows, compile Stan models, and run models from the command line or within R for analysis.
2. INDEX
• Motivation
• What Is Stan?
• How to Install It (Windows)
• Grammar of Stan
• Rat Data Model
• Execution from Command Line
• Execution from R: RStan
• Reference
9. What Is Stan?
• What Is Stan?
• Who Develops Stan?
• Sample Code of Stan
• Execution of Stan
10. What Is Stan?
• “Stan is a package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo.” (Official site: http://mc-stan.org/)
– Similar to BUGS but more procedural
– Still being updated
– Fast: models are compiled to an executable file
– Easy to use: has an R interface
– Fast convergence: Hamiltonian Monte Carlo and NUTS
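To give a feel for what Hamiltonian Monte Carlo does, here is a minimal sketch in Python of plain HMC with a fixed number of leapfrog steps, targeting a one-dimensional standard normal. It is not NUTS and not Stan's implementation; the target density, the step size, the number of steps, and all function names are assumptions chosen only for illustration.

```python
import math
import random

def neg_log_density(q):
    # Assumed toy target: standard normal, -log p(q) = q^2 / 2 up to a constant
    return 0.5 * q * q

def grad_neg_log_density(q):
    return q

def leapfrog(q, p, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    p = p - 0.5 * step_size * grad_neg_log_density(q)      # half step for momentum
    for i in range(n_steps):
        q = q + step_size * p                               # full step for position
        if i < n_steps - 1:
            p = p - step_size * grad_neg_log_density(q)     # full step for momentum
    p = p - 0.5 * step_size * grad_neg_log_density(q)      # final half step
    return q, p

def hmc_sample(n_samples, step_size=0.2, n_steps=10, seed=1):
    rng = random.Random(seed)
    q = 0.0
    samples = []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)                            # resample the momentum
        q_new, p_new = leapfrog(q, p0, step_size, n_steps)
        # Metropolis accept/reject on the change in total energy
        h_old = neg_log_density(q) + 0.5 * p0 * p0
        h_new = neg_log_density(q_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples

samples = hmc_sample(2000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))
```

The sample mean and variance should land near 0 and 1. NUTS removes the need to hand-pick `n_steps`, which is fixed in this sketch.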
11. Who Develops Stan?
• Andrew Gelman, his staff, Jiqiang Guo, and Marcus Brubaker
12. Sample Code of Stan
– Similar to BUGS but more Procedural
# http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf
# Page 3: Rats
data {
int<lower=0> N;
int<lower=0> T;
real x[T];
real y[N,T];
real xbar;
}
...
model {
mu_alpha ~ normal(0, 100);
mu_beta ~ normal(0, 100);
sigmasq_y ~ inv_gamma(0.001, 0.001);
From https://github.com/stan-dev/stan/tree/master/src/models/bugs_examples/vol1/rats
13. Execution of Stan
– Fast: compiles to an executable file
1. stanc: translates the Stan program to C++
2. make: compiles the resulting C++ to an executable
3. exe: runs the compiled Stan program
Details are discussed later.
>\bin\stanc --name=rats --o=rats.cpp .\rats.stan
>make src/models/bugs_examples/vol1/rats/rats
>.\rats --data=rats.data.R --init=rats.init.R
15. How to Install It (Windows)
1. Environment
2. Install Rtools
3. Install RStan
4. Install Stan
5. Build Stan
16. 1.Environment
• I tested the installation and the model runs described here on the following PC.
• Windows 8 64-bit
• Intel(R) Core(TM) i7-2600 CPU, 3.4 GHz
• 4 cores
• 8 threads
• 12.0 GB memory
• R 3.0.1
• Rtools 3.1
• Stan 1.3.0
• RStan 1.3.0
17. 2.Install Rtools
• Rtools is a collection of resources for building packages for R under Microsoft Windows.
• g++ is installed by Rtools.
• Download the installer and run it.
– http://cran.r-project.org/bin/windows/Rtools/
• Check the installation notes on the official site; in most cases you can install it by just clicking “Next”.
[Screenshot: Rtools installer]
18. 3.Install RStan
• RStan is a library for using Stan from R.
• It is not registered on CRAN.
• You can install it by running the following script from R.
– The script is a modified version of the one at https://code.google.com/p/stan/wiki/RStanGettingStarted#Install_Rstan
19. 3.Install RStan
# install additional packages
install.packages('inline')
install.packages('Rcpp')
# check that Rcpp works: if it does, "hello world" is printed
library(inline)
library(Rcpp)
src <- ' std::vector<std::string> s;
s.push_back("hello");
s.push_back("world");
return Rcpp::wrap(s);'
hellofun <- cxxfunction(body = src, includes = '', plugin = 'Rcpp', verbose = FALSE)
cat(hellofun(), '\n')
# install rstan
Sys.setenv(R_MAKEVARS_USER='')
options(repos = c(getOption("repos"), rstan = "http://wiki.stan.googlecode.com/git/R"))
install.packages('rstan', type = 'source')
# load rstan
library(rstan)
20. 4.Install Stan
To use Stan from the command line, install Stan itself with the following steps.
1. Download the tar file stan-src-1.m.p.tgz
– Download site: https://code.google.com/p/stan/downloads/list
2. Unpack the file in the Documents directory with the following command
– tar is already available on Windows if Rtools has been installed.
> tar --no-same-owner -xzf stan-src-1.m.p.tgz
21. 5.Build Stan
Build Stan once after installing it.
1. Make the library
2. Make the model parser and code generator
* <stan-home> is the directory created by the previous tar command.
>cd <stan-home>
>make bin/libstan.a
>cd <stan-home>
>make bin/stanc
The build results are placed in <stan-home>/bin.
23. Grammar of Stan
1. Grammar of Stan
2. Blocks
3. Data Types
4. Scope of Variables
24. Stan program …
• Stan Program defines a statistical model
through conditional probability.
• Stan Program consists of variable type
declarations and statements.
• Stan Program has specific blocks.
• Stan Program can deal with various variable
types.
• Stan Program is different from BUGS.
25. Stan Program consists of variable type
declarations and statements.
data {
int<lower=0> N;
int<lower=0> T;
real x[T];
real y[N,T];
real xbar;
}
transformed data {
real x_minus_xbar[T];
real y_linear[N*T];
for (t in 1:T)
x_minus_xbar[t] <- x[t] - xbar;
…
(from rats_vec.stan)
• Block: each braced section, such as data { ... } or transformed data { ... }
• Variable type declaration: defines a variable
• Statement: assignment, sampling, loop, or condition
26. Stan Program has specific blocks.
• Skeletal Stan program
• The order of the blocks must be kept.
• All blocks are optional except the model block.
data {
... declarations ...
}
transformed data {
... declarations ... statements ...
}
parameters {
... declarations ...
}
transformed parameters {
... declarations ... statements ...
}
model {
... declarations ... statements ...
}
generated quantities {
... declarations ... statements ...
}
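• Order and scope: the blocks appear in the order shown, and a variable declared in one block is visible in the blocks that follow (variables declared inside the model block are local to it).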
27. Stan Program has specific blocks.
• Data
– Input data given to the program.
– Executed first, when the data are loaded.
• Transformed data
– Transforms data variables for convenience.
• Parameters
– The parameters that are output as results.
– Updated at every iteration.
28. Stan Program has specific blocks.
• Transformed parameters
– Transforms parameters for convenience.
• Model
– The model itself; write it according to what you want to describe.
• Generated quantities
– Generates quantities for monitoring convergence.
29. Stan Program can deal with various
variable types.
From http://stan.googlecode.com/files/stan-reference-1.3.0.pdf
30. Stan Program can deal with various
variable types.
• Scalar
– int is a 32-bit integer. Lower and upper bound constraints are allowed.
e.g. int N; int<lower=0,upper=1> cond;
– real is a 64-bit floating-point value.
e.g. real<lower=0> sigma; real<lower=-1,upper=1> rho;
• Vector Data Types
– Only real values are allowed.
– Vector: a general (column) vector.
e.g. vector<lower=0>[3] u;
– Unit simplex: for categorical or multinomial data; a vector of non-negative values that sum to 1.
e.g. simplex[5] theta;
31. Stan Program can deal with various
variable types.
• Vector Data Types
– Unit vector: a vector with a norm of one.
e.g. unit_vector[5] theta;
– Ordered vector: most often used as cut points in ordered logistic regression models.
e.g. ordered[5] c;
– Positive, ordered vector: an ordered vector whose values are all positive.
e.g. positive_ordered[5] d;
– Row vector: distinct from vector; Stan distinguishes between row vectors and column vectors.
e.g. row_vector<lower=-1,upper=1>[10] u;
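The constraints behind these vector types can be written as simple predicates. The Python sketch below is only an illustration of what each type name promises; the function names and the tolerance are hypothetical, and "ordered" is assumed to mean strictly increasing.

```python
import math

def is_simplex(v, tol=1e-8):
    """simplex: non-negative entries that sum to 1."""
    return all(x >= 0 for x in v) and abs(sum(v) - 1.0) <= tol

def is_unit_vector(v, tol=1e-8):
    """unit_vector: Euclidean norm equal to 1."""
    return abs(math.sqrt(sum(x * x for x in v)) - 1.0) <= tol

def is_ordered(v):
    """ordered: entries in strictly increasing order (assumption)."""
    return all(a < b for a, b in zip(v, v[1:]))

def is_positive_ordered(v):
    """positive_ordered: ordered, and every entry is positive."""
    return is_ordered(v) and all(x > 0 for x in v)

print(is_simplex([0.2, 0.3, 0.5]), is_unit_vector([0.6, 0.8]))
```

In a Stan program these checks are implied by the declaration itself, so the model code does not have to test them by hand.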
32. Stan Program can deal with various
variable types.
• Matrix Data Types
– Matrix: a general real-valued matrix.
e.g. matrix<upper=0>[3,4] B;
– Correlation matrix: a symmetric, positive definite matrix with ones on the diagonal, so every entry lies between -1 and 1.
e.g. corr_matrix[3] Sigma;
– Covariance matrix: symmetric and positive definite.
e.g. cov_matrix[K] Omega;
• Array Data Types
– Arrays are declared by enclosing the dimensions in square brackets following the name of the variable.
– An array’s elements may be any of the basic data types.
e.g. cov_matrix[5] mu[2,3,4];
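As a rough illustration of what the covariance and correlation matrix constraints mean, the Python sketch below tests symmetry and positive definiteness with a hand-written Cholesky factorization. The function names and tolerance are hypothetical, and this is not how Stan itself validates or transforms these types.

```python
import math

def is_symmetric(m, tol=1e-8):
    n = len(m)
    return all(len(row) == n for row in m) and all(
        abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(i)
    )

def is_positive_definite(m):
    """Attempt a Cholesky factorization; it succeeds only for positive definite input."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False          # a non-positive pivot rules out positive definiteness
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True

def is_cov_matrix(m):
    """cov_matrix: symmetric and positive definite."""
    return is_symmetric(m) and is_positive_definite(m)

def is_corr_matrix(m, tol=1e-8):
    """corr_matrix: a covariance matrix with ones on the diagonal."""
    return is_cov_matrix(m) and all(abs(m[i][i] - 1.0) <= tol for i in range(len(m)))

print(is_cov_matrix([[2.0, 0.5], [0.5, 1.0]]), is_corr_matrix([[1.0, 0.3], [0.3, 1.0]]))
```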
35. Rats Data
• The rats data and its model are included in the WinBUGS Examples, Volume I.
(http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf)
• The original article is Gelfand et al. (1990).
• Weights of young rats, measured weekly and analyzed with a hierarchical model
• Rows: individual rats (N=30)
• Columns: measurement days (M=5)
From http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf
36. Rats Model
• Hierarchical Regression Model considering individual
and time differences.
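$$Y_{ij} \sim \mathrm{Normal}\left(\alpha_i + \beta_i (x_j - \bar{x}),\ \tau\right)$$
– $Y_{ij}$: observed data; $x_j$: days; $\bar{x}$: median of days (22)
– $\alpha_i$: effect of individual $i$; $\beta_i$: effect of day for individual $i$
– $\tau$: precision of the Normal distribution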
From http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf
37. Rats Model
model {
mu_alpha ~ normal(0, 100);
mu_beta ~ normal(0, 100);
sigmasq_y ~ inv_gamma(0.001, 0.001);
sigmasq_alpha ~ inv_gamma(0.001, 0.001);
sigmasq_beta ~ inv_gamma(0.001, 0.001);
alpha ~ normal(mu_alpha, sigma_alpha); // vectorized
beta ~ normal(mu_beta, sigma_beta); // vectorized
for (n in 1:N)
for (t in 1:T)
y[n,t] ~ normal(alpha[n] + beta[n] * (x[t] - xbar), sigma_y);
}
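One way to read the model block is as a recipe for generating data. The Python sketch below simulates that generative process using only the standard library. All numeric values (the hyperparameters and the list of measurement days) are assumptions chosen for illustration, not values taken from the slides or from a fitted model, and the function name is hypothetical.

```python
import random

def simulate_rats(mu_alpha, mu_beta, sigma_alpha, sigma_beta, sigma_y,
                  x, xbar, n_rats, seed=0):
    """Draw per-rat intercepts and slopes, then weights for every rat and day."""
    rng = random.Random(seed)
    # alpha ~ normal(mu_alpha, sigma_alpha); beta ~ normal(mu_beta, sigma_beta)
    alpha = [rng.gauss(mu_alpha, sigma_alpha) for _ in range(n_rats)]
    beta = [rng.gauss(mu_beta, sigma_beta) for _ in range(n_rats)]
    # y[n,t] ~ normal(alpha[n] + beta[n] * (x[t] - xbar), sigma_y)
    y = [[rng.gauss(alpha[n] + beta[n] * (x_t - xbar), sigma_y) for x_t in x]
         for n in range(n_rats)]
    return alpha, beta, y

x = [8.0, 15.0, 22.0, 29.0, 36.0]   # assumed measurement days, centered at xbar = 22
alpha, beta, y = simulate_rats(240.0, 6.0, 15.0, 0.5, 6.0, x, xbar=22.0, n_rats=30)
print(len(y), len(y[0]))
```

Fitting the Stan model goes the other way: given `y`, it infers the intercepts, slopes, and variance parameters that this simulation takes as inputs.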
40. Execution of Stan
1. stanc: translates the Stan program to C++
2. make: compiles the resulting C++ to an executable
3. exe: runs the compiled Stan program
>\bin\stanc --name=rats --o=rats.cpp .\rats.stan
>make src/models/bugs_examples/vol1/rats/rats
>.\rats --data=rats.data.R --init=rats.init.R
41. stanc
• The model translation program stanc translates a .stan file into a .cpp file.
USAGE: stanc [options] <model_file>
--name=<string>  Model name (default = "$model_filename_model")
--o=<file>       Output file for generated C++ code (default = "$name.cpp")
>\bin\stanc --name=rats --o=rats.cpp .\rats.stan
42. make
• We can compile the generated .cpp file with the make command.
>make src/models/bugs_examples/vol1/rats/rats
43. execution
• We can run the Stan sampler by executing the generated .exe file.
USAGE: .\src\models\bugs_examples\vol1\rats\rats [options]
OPTIONS:
--data=<file>      Read data from specified dump-format file (required if model declares data)
--init=<file>      Use initial values from specified file or zero values if <file>=0 (default is random initialization)
--samples=<file>   File into which samples are written (default = samples.csv)
--append_samples   Append samples to existing file if it exists (does not write header)
--seed=<int>       Random number generation seed (default = randomly generated from time)
--chain_id=<int>   Markov chain identifier (default = 1)
--iter=<+int>      Total number of iterations, including warmup (default = 2000)
--thin=<+int>      Period between saved samples after warm up (default = max(1, floor(iter - warmup) / 1000))
--refresh=<int>    Period between samples updating progress report print (0 for no printing) (default = max(1,iter/200))
>.\rats --data=rats.data.R --init=rats.init.R
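When the sampler is driven from a script, it can help to assemble the command from named settings. The Python sketch below builds an argument list using only option names that appear in the usage text above. The helper name and its parameters are hypothetical, and it only constructs the command; it does not run anything.

```python
def build_sampler_command(executable, data=None, init=None, samples=None,
                          seed=None, chain_id=None, iterations=None, thin=None):
    """Assemble an argument list from the options listed in the usage text."""
    options = [("data", data), ("init", init), ("samples", samples),
               ("seed", seed), ("chain_id", chain_id),
               ("iter", iterations), ("thin", thin)]
    command = [executable]
    for name, value in options:
        if value is not None:      # omitted options fall back to the sampler's defaults
            command.append("--{}={}".format(name, value))
    return command

cmd = build_sampler_command(r".\rats", data="rats.data.R", init="rats.init.R")
print(" ".join(cmd))
```

The resulting list could be passed to Python's `subprocess.run` on a machine where the compiled model exists. Varying `chain_id` and `samples` per call is one way to launch several chains that write to separate files.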
45. Execution from R
• RStan
• Execution from R
• plot(stanfit)
• traceplot(stanfit)
• fit using previous model
• parallel execution from R
46. RStan
• RStan is an interface to Stan
– Compiles Stan code to C++ code and executes it from R
– Provides visualization functions for Stan results (stanfit class)
Architecture of RStan:
Stan code -> stanc() -> C++ code -> stan_model() -> exe -> sampling() -> S4: stanfit
stanfit -> plot(), traceplot(), extract()
stan() performs all of the steps from Stan code to stanfit in one call.
47. Execution from R
library(rstan)
# set STAN_HOME to the directory that contains the Stan source files
STAN_HOME <- "<STAN_HOME>"
dirpath <- paste0(STAN_HOME, "/include/stansrc/models/bugs_examples/vol1/rats")
# load the data into a list: dat
source(paste0(dirpath, "/rats.data.R"))
dat <- list(y = y, x = x, xbar = xbar, N = N, T = T)
# fit1: run the whole model fit in one call
fit1 <- stan(file = paste0(dirpath, "/rats.stan"), data = dat,
iter = 1000, chains = 4)
# fit2: run the model fit step by step
# translate the Stan code to C++ code
rt <- stanc(file = paste0(dirpath, "/rats.stan"), model_name = "stan", verbose = TRUE)
# compile the C++ code for the model
sm <- stan_model(stanc_ret = rt, verbose = FALSE)
# run the sampler
fit2 <- sampling(sm, data = dat, chains = 4, iter = 1000)
50. fit using previous model
Once a model has been fitted, we can pass the fitted result as an input to fit the same model with other data or settings. This saves the time needed to compile the C++ code for the model.
https://code.google.com/p/stan/wiki/RStanGettingStarted
# fit again, reusing the previous fit result
fit3 <- stan(fit = fit1, data = dat, iter = 400, chains = 4)
51. Parallel Execution from R
# parallel processing of chains
library(doSNOW)
library(foreach)
cl <- makeCluster(4) # change 4 to your number of CPU cores
registerDoSNOW(cl)
# run each Stan chain in parallel
sflist1 <- foreach(i = 1:10, .packages = 'rstan') %dopar% {
stan(fit = fit1, data = dat, chains = 1, chain_id = i, refresh = -1)
}
# merge the chains
f3 <- sflist2stanfit(sflist1)
# release the worker processes
stopCluster(cl)
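The pattern in the R code (one chain per worker, a distinct chain id for each, then a merge) can be sketched independently of Stan. The Python example below uses a toy random-walk Metropolis chain for a standard normal as a stand-in for a Stan chain. The sampler, the seeds, and the function names are all assumptions, and threads are used only to keep the sketch self-contained; they do not speed up CPU-bound Python code.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def run_chain(chain_id, n_iter=500, base_seed=1234):
    """Toy random-walk Metropolis chain targeting a standard normal."""
    rng = random.Random(base_seed + chain_id)    # distinct random stream per chain
    q, draws = 0.0, []
    for _ in range(n_iter):
        proposal = q + rng.gauss(0.0, 1.0)
        log_ratio = 0.5 * q * q - 0.5 * proposal * proposal
        if rng.random() < math.exp(min(0.0, log_ratio)):
            q = proposal
        draws.append(q)
    return chain_id, draws

def run_chains(n_chains=4, n_workers=4):
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_chain, range(1, n_chains + 1)))
    # Keep the draws of each chain under its own chain id
    return {chain_id: draws for chain_id, draws in results}

chains = run_chains()
merged = [d for cid in sorted(chains) for d in chains[cid]]
print(len(chains), len(merged))
```

Keeping the draws grouped by chain id before merging preserves the ability to compare chains against each other, which is what convergence diagnostics rely on.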
55. Reference
• Reference
– User's Guide and Reference Manual: grammar, differences from BUGS, and getting started (http://stan.googlecode.com/files/stan-reference-1.3.0.pdf)
– Official site (http://mc-stan.org/)