Introduction of Stan
@Teito Nakagawa
#TokyoBUGS 1st
29 September 2013
INDEX
• Motivation
• What Is Stan?
• How to Install It (Windows)
• Grammar of Stan
• Rats Data Model
• Execution from Command Line
• Execution from R: RStan
• Reference
Motivation
As an analyst, I usually work with SMALL DATA: census, report, deficit data.
But the requirements are BIG: I must build a model, and I must explain many things with it.
That is why I started to learn BUGS. BUT IT TAKES MUCH TIME.
So I started to learn Stan.
What Is Stan?
• What Is Stan?
• Who Develops Stan?
• Sample Code of Stan
• Execution of Stan
What Is Stan?
• “Stan is a package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo.” (Official site: http://mc-stan.org/)
– Similar to BUGS, but more procedural
– Under active development
– Fast: compiles the model to an executable file
– Easy to use: has an R interface
– Fast convergence: Hamiltonian Monte Carlo and NUTS
Who Develops Stan?
• Andrew Gelman and his staff, Jiqiang Guo, and Marcus Brubaker
Sample Code of Stan
– Similar to BUGS but more Procedural
# http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf
# Page 3: Rats
data {
  int<lower=0> N;
  int<lower=0> T;
  real x[T];
  real y[N,T];
  real xbar;
}
...
model {
  mu_alpha ~ normal(0, 100);
  mu_beta ~ normal(0, 100);
  sigmasq_y ~ inv_gamma(0.001, 0.001);
From https://github.com/stan-dev/stan/tree/master/src/models/bugs_examples/vol1/rats
Execution of Stan
– Fast: compiles to an executable file
1. stanc: translate the Stan program to C++
2. make: compile the resulting C++ to an executable
3. exe: run the compiled Stan program
Details are discussed later.
>\bin\stanc --name=rats --o=rats.cpp .\rats.stan
>make src/models/bugs_examples/vol1/rats/rats
>.\rats --data=rats.data.R --init=rats.init.R
How to Install It (Windows)
1. Environment
2. Install Rtools
3. Install RStan
4. Install Stan
5. Build Stan
1. Environment
• I tested the installation and the model runs below on my PC:
• Windows 8 64-bit
• Intel(R) Core(TM) i7-2600 CPU, 3.4 GHz
• 4 cores
• 8 threads
• 12.0 GB memory
• R 3.0.1
• Rtools 3.1
• Stan 1.3.0
• RStan 1.3.0
2. Install Rtools
• Rtools is a collection of resources for building packages for R under Microsoft Windows.
• g++ is installed by Rtools.
• Download the installer and execute it.
– http://cran.r-project.org/bin/windows/Rtools/
• Check the installation notes on the official site, but in most cases you can install it by just clicking “Next”.
[Installation screenshot]
3. Install RStan
• RStan is a library for using Stan from R.
• It is not registered on CRAN (as of RStan 1.3.0).
• You can install it by running the following script from R.
– The script is a modified version of the one at https://code.google.com/p/stan/wiki/RStanGettingStarted#Install_Rstan
# install additional packages
install.packages('inline')
install.packages('Rcpp')
# check that Rcpp works: if it does, "hello world" is printed
library(inline)
library(Rcpp)
src <- ' std::vector<std::string> s;
  s.push_back("hello");
  s.push_back("world");
  return Rcpp::wrap(s);'
hellofun <- cxxfunction(body = src, includes = '', plugin = 'Rcpp', verbose = FALSE)
cat(hellofun(), '\n')
# install rstan
Sys.setenv(R_MAKEVARS_USER='')
options(repos = c(getOption("repos"), rstan = "http://wiki.stan.googlecode.com/git/R"))
install.packages('rstan', type = 'source')
# load rstan
library(rstan)
4. Install Stan
To use Stan from the command line, install Stan itself with the following steps.
1. Download the tar file stan-src-1.m.p.tgz
– Download site: https://code.google.com/p/stan/downloads/list
2. Unpack the file in the Documents directory with the following command
– tar is already available on Windows if Rtools has been installed.
> tar --no-same-owner -xzf stan-src-1.m.p.tgz
5. Build Stan
Build Stan once after installing it.
1. Make the library
2. Make the model parser and code generator
* <stan-home> is the directory created by the previous tar command.
>cd <stan-home>
>make bin/libstan.a
>cd <stan-home>
>make bin/stanc
As a result, the binaries are generated in <stan-home>/bin.
Grammar of Stan
1. Grammar of Stan
2. Blocks
3. Data Types
4. Scope of Variables
A Stan program …
• A Stan program defines a statistical model through conditional probability.
• A Stan program consists of variable type declarations and statements.
• A Stan program has specific blocks.
• A Stan program can deal with various variable types.
• A Stan program is different from a BUGS program.
A Stan program consists of variable type declarations and statements.
data {
  int<lower=0> N;
  int<lower=0> T;
  real x[T];
  real y[N,T];
  real xbar;
}
transformed data {
  real x_minus_xbar[T];
  real y_linear[N*T];
  for (t in 1:T)
    x_minus_xbar[t] <- x[t] - xbar;
  …
(excerpt from rats_vec.stan)
• Block: each of data { ... } and transformed data { ... } above
• Variable type declaration: defines a variable
• Statement: assignments, sampling, loops, conditions
A Stan program has specific blocks.
• Skeletal Stan program
• The order of the blocks must be kept.
• All blocks are optional except the model block.
data {
  ... declarations ...
}
transformed data {
  ... declarations ... statements ...
}
parameters {
  ... declarations ...
}
transformed parameters {
  ... declarations ... statements ...
}
model {
  ... declarations ... statements ...
}
generated quantities {
  ... declarations ... statements ...
}
[Figure annotations: Order, Scope]
What each block does:
• Data: the given input data; executed first, when the data is loaded.
• Transformed data: transforms variables for convenience.
• Parameters: the parameters output as results; updated at each iteration.
• Transformed parameters: transforms parameters for convenience.
• Model: the model itself; write this based on what you want to describe.
• Generated quantities: generates quantities for monitoring convergence.
A Stan program can deal with various variable types.
(Overview table from http://stan.googlecode.com/files/stan-reference-1.3.0.pdf)
• Scalar
– int is a 32-bit integer scalar. Lower and upper constraints are allowed.
e.g. int N; int<lower=0,upper=1> cond;
– real is a 64-bit floating-point scalar.
e.g. real<lower=0> sigma; real<lower=-1,upper=1> rho;
• Vector Data Types
– Only real values are allowed.
– Vector: a general (column) vector.
e.g. vector<lower=0>[3] u;
– Unit simplex: for categorical or multinomial data; a vector of non-negative values that sum to 1.
e.g. simplex[5] theta;
– Unit vector: a vector with a norm of one.
e.g. unit_vector[5] theta;
– Ordered vector: most often employed as cut points in ordered logistic regression models.
e.g. ordered[5] c;
– Positive, ordered vector:
e.g. positive_ordered[5] d;
– Row vector: different from vector; Stan distinguishes between row and column vectors.
e.g. row_vector<lower=-1,upper=1>[10] u;
• Matrix Data Types
– Matrix: a general matrix.
e.g. matrix<upper=0>[3,4] B;
– Correlation matrix: symmetric and positive definite with a unit diagonal; entries lie between -1 and 1.
e.g. corr_matrix[3] Sigma;
– Covariance matrix: symmetric and positive definite.
e.g. cov_matrix[K] Omega;
• Array Data Types
– Arrays are declared by enclosing the dimensions in square brackets following the name of the variable.
– An array’s elements may be any of the basic data types.
e.g. cov_matrix[5] mu[2,3,4];
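The constrained types can be read as plain numerical conditions on a value. The following is a minimal sketch in base R (not Stan code) that checks whether a value would satisfy some of these constraints. The helper names `is_simplex`, `is_unit_vector`, `is_ordered`, `is_corr_matrix` and the tolerance `tol` are hypothetical, chosen only for illustration.

```r
# Hypothetical helpers illustrating the constraints behind some Stan types.
# These are not part of Stan or RStan; tol is an assumed numerical tolerance.
is_simplex <- function(v, tol = 1e-8) {
  all(v >= 0) && abs(sum(v) - 1) < tol          # non-negative, sums to 1
}
is_unit_vector <- function(v, tol = 1e-8) {
  abs(sqrt(sum(v^2)) - 1) < tol                 # Euclidean norm equals 1
}
is_ordered <- function(v) {
  all(diff(v) > 0)                              # strictly increasing
}
is_corr_matrix <- function(m, tol = 1e-8) {
  isSymmetric(m) &&
    all(abs(diag(m) - 1) < tol) &&              # unit diagonal
    all(eigen(m, symmetric = TRUE)$values > 0)  # positive definite
}

theta <- c(0.2, 0.3, 0.5)
Sigma <- matrix(c(1, 0.4, 0.4, 1), nrow = 2)
is_simplex(theta)
is_unit_vector(c(0.6, 0.8))
is_ordered(c(-1, 0, 2.5))
is_corr_matrix(Sigma)
```

In Stan itself these checks are implied by the declaration: data that violate a declared constraint are rejected, and constrained parameters are sampled so that the constraint always holds.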
Rats Data Model
1. Rats Data
2. Rats Model
Rats Data
• The rats data and its model are included in the WinBUGS Examples Volume I.
(http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf)
• The original article is Gelfand et al. (1990).
• Weights of young rats measured weekly, used for a hierarchical model
• Rows: individual rats (N=30)
• Columns: days (M=5)
From http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf
Rats Model
• Hierarchical Regression Model considering individual
and time differences.
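$$Y_{ij} \sim \mathrm{Normal}\left(\alpha_i + \beta_i (x_j - \bar{x}),\ \tau_c\right)$$
$$\alpha_i \sim \mathrm{Normal}(\alpha_c, \tau_\alpha), \qquad \beta_i \sim \mathrm{Normal}(\beta_c, \tau_\beta)$$
$Y_{ij}$: observed data; $x_j$: days; $\bar{x}$: median of days (22)
$\alpha_i$: effect of individual $i$; $\beta_i$: effect of day for individual $i$
$\tau$: precision of the Normal distribution (Stan's normal() takes a standard deviation instead)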
From http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf
Rats Model
model {
  mu_alpha ~ normal(0, 100);
  mu_beta ~ normal(0, 100);
  sigmasq_y ~ inv_gamma(0.001, 0.001);
  sigmasq_alpha ~ inv_gamma(0.001, 0.001);
  sigmasq_beta ~ inv_gamma(0.001, 0.001);
  alpha ~ normal(mu_alpha, sigma_alpha); // vectorized
  beta ~ normal(mu_beta, sigma_beta); // vectorized
  for (n in 1:N)
    for (t in 1:T)
      y[n,t] ~ normal(alpha[n] + beta[n] * (x[t] - xbar), sigma_y);
}
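One way to see what this model block encodes is to simulate data with the same generative structure. The following is a minimal sketch in base R. The hyperparameter values are made up for illustration (they are not estimates), and the measurement days x = 8, 15, 22, 29, 36 are assumed from the WinBUGS example. The number of time points is named `T_days` here because `T` is also shorthand for TRUE in R.

```r
set.seed(1)                        # reproducible draws
N <- 30                            # number of rats (rows)
x <- c(8, 15, 22, 29, 36)          # measurement days (assumed from the WinBUGS example)
T_days <- length(x)                # number of time points (T in the Stan code)
xbar <- 22                         # centering constant

# Illustrative (made-up) hyperparameters, not fitted values
mu_alpha <- 240; sigma_alpha <- 15
mu_beta  <- 6;   sigma_beta  <- 0.5
sigma_y  <- 6

# alpha ~ normal(mu_alpha, sigma_alpha); beta ~ normal(mu_beta, sigma_beta)
alpha <- rnorm(N, mu_alpha, sigma_alpha)
beta  <- rnorm(N, mu_beta, sigma_beta)

# y[n,t] ~ normal(alpha[n] + beta[n] * (x[t] - xbar), sigma_y)
mu <- outer(alpha, rep(1, T_days)) + outer(beta, x - xbar)
y  <- mu + matrix(rnorm(N * T_days, 0, sigma_y), nrow = N)

dim(y)
```

Because x is centered at xbar, alpha[n] is the expected weight of rat n on day 22, which is why the third column of `mu` equals `alpha`.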
Execution from Command Line
• Execution of Stan
• stanc
• make
• execution
Execution of Stan
1. stanc: translate the Stan program to C++
2. make: compile the resulting C++ to an executable
3. exe: run the compiled Stan program
>\bin\stanc --name=rats --o=rats.cpp .\rats.stan
>make src/models/bugs_examples/vol1/rats/rats
>.\rats --data=rats.data.R --init=rats.init.R
stanc
• The model translation program stanc translates a .stan file into a .cpp file.
USAGE: stanc [options] <model_file>
  --name=<string>  Model name (default = "$model_filename_model")
  --o=<file>       Output file for generated C++ code (default = "$name.cpp")
>\bin\stanc --name=rats --o=rats.cpp .\rats.stan
make
• We can compile the generated .cpp file with the make command.
>make src/models/bugs_examples/vol1/rats/rats
execution
• We can run the Stan sampler by executing the generated .exe file.
USAGE: .\src\models\bugs_examples\vol1\rats\rats [options]
OPTIONS:
  --data=<file>      Read data from specified dump-format file (required if model declares data)
  --init=<file>      Use initial values from specified file or zero values if <file>=0 (default is random initialization)
  --samples=<file>   File into which samples are written (default = samples.csv)
  --append_samples   Append samples to existing file if it exists (does not write header)
  --seed=<int>       Random number generation seed (default = randomly generated from time)
  --chain_id=<int>   Markov chain identifier (default = 1)
  --iter=<+int>      Total number of iterations, including warmup (default = 2000)
  --thin=<+int>      Period between saved samples after warm up (default = max(1, floor(iter - warmup) / 1000))
  --refresh=<int>    Period between samples updating progress report print (0 for no printing) (default = max(1,iter/200))
>.\rats --data=rats.data.R --init=rats.init.R
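The --data option expects a dump-format file such as rats.data.R. The following is a minimal sketch of writing such a file from R with base `dump()` and reading it back with `source()`. The toy values and the temporary file name are made up for illustration. Whether a given Stan version parses every construct that `dump()` emits is an assumption to verify; RStan also provides `stan_rdump()` for this purpose.

```r
# Made-up toy data; the names mirror the data block of the rats model
N <- 2
T <- 3
x <- c(8, 15, 22)
xbar <- 22
y <- matrix(c(151, 145, 199, 199, 246, 249), nrow = N)

# Write the variables in R dump format to a temporary file (hypothetical path)
data_file <- tempfile(fileext = ".data.R")
dump(c("N", "T", "x", "xbar", "y"), file = data_file)

# Read the file back into a separate environment to confirm the round trip
e <- new.env()
source(data_file, local = e)
all.equal(e$y, y)
```

The same file can be loaded with `source()` before building the data list for RStan, so one data file can serve both the command line and R.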
Execution from R
• RStan
• Execution from R
• plot(stanfit)
• traceplot(stanfit)
• fit using previous model
• parallel execution from R
RStan
• RStan is an interface to Stan.
– Compiles Stan code to C++ code and runs it from R
– Provides visualization functions for Stan results (stanfit class)
Architecture of RStan
Stan code → [stanc()] → C++ code → [stan_model()] → exe → [sampling()] → S4: stanfit
stan() covers the steps from Stan code to stanfit in one call.
stanfit methods: plot(), traceplot(), extract()
Execution from R
# set to the directory that contains the source files
STAN_HOME <- "<STAN_HOME>"  # replace with the path to your Stan home directory
dirpath <- paste0(STAN_HOME, "/include/stansrc/models/bugs_examples/vol1/rats")
# load the data into a list: dat
source(paste0(dirpath, "/rats.data.R"))
dat <- list(y = y, x = x, xbar = xbar, N = N, T = T)
# fit1: simulate the model as a one-liner
fit1 <- stan(file = paste0(dirpath, "/rats.stan"), data = dat,
             iter = 1000, chains = 4)
# fit2: simulate the model step by step
# translate the Stan code to C++ code
rt <- stanc(file = paste0(dirpath, "/rats.stan"), model_name = "stan", verbose = TRUE)
# compile the C++ code for the model
sm <- stan_model(stanc_ret = rt, verbose = FALSE)
# run the model simulation
fit2 <- sampling(sm, data = dat, chains = 4, iter = 1000)
plot(stanfit)
We can check the estimate and R-hat of each parameter.
traceplot(stanfit)
We can trace each chain.
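R-hat compares the variance between chains with the variance within chains; values close to 1 suggest that the chains have mixed. The following is a minimal sketch of the classic (non-split) potential scale reduction factor in base R, assuming `draws` is an iterations-by-chains matrix of post-warmup draws for one parameter. The function name `rhat_basic` is hypothetical, and RStan's own computation may differ in detail.

```r
rhat_basic <- function(draws) {
  n <- nrow(draws)                       # iterations per chain
  chain_means <- colMeans(draws)
  B <- n * var(chain_means)              # between-chain variance
  W <- mean(apply(draws, 2, var))        # mean within-chain variance
  var_plus <- (n - 1) / n * W + B / n    # pooled estimate of the posterior variance
  sqrt(var_plus / W)
}

set.seed(42)
mixed   <- matrix(rnorm(4000), ncol = 4)            # four chains, same distribution
shifted <- mixed + rep(c(0, 0, 0, 5), each = 1000)  # one chain stuck elsewhere
rhat_basic(mixed)    # close to 1
rhat_basic(shifted)  # clearly above 1
```

A chain that wanders in a different region, like the fourth column of `shifted`, also shows up in traceplot() as a trace that does not overlap the others.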
Fit using a previous model
Once a model is fitted, we can use the fitted result as an input to fit the model with other data or settings. This saves us the time of compiling the C++ code for the model.
https://code.google.com/p/stan/wiki/RStanGettingStarted
# fit again using the previous fit result
fit3 <- stan(fit = fit1, data = dat, iter = 400, chains = 4)
Parallel Execution from R
# parallel processing of Stan chains
library(doSNOW)
library(foreach)
cl <- makeCluster(4)  # change 4 to your number of CPU cores
registerDoSNOW(cl)
# run each chain of Stan in parallel
sflist1 <- foreach(i = 1:10, .packages = 'rstan') %dopar% {
  stan(fit = fit1, data = dat, chains = 1, chain_id = i, refresh = -1)
}
stopCluster(cl)  # release the worker processes
# merge the chains
f3 <- sflist2stanfit(sflist1)
Parallel Execution Performance
# elapsed time by number of processes (rows) and iterations (columns)
library(doSNOW)
library(foreach)
timecalc <- matrix(0, nrow = 4, ncol = 7)
iter <- c(1000, 3000, 5000, 10000, 30000, 50000, 100000)
numproc <- c(1, 2, 4, 8)
# single process
for (i in 1:7) {
  cat("p:", 1, ", iter:", iter[i], "\r\n")
  t <- proc.time()
  a <- stan(fit = fit1, data = dat, chains = 8, refresh = -1, iter = iter[i])
  timecalc[1, i] <- (proc.time() - t)["elapsed"]
}
# parallel processing
for (p in 2:4) {
  for (i in 1:7) {
    cat("proc:", numproc[p], "iter:", iter[i], "\r\n")
    t <- proc.time()
    cl <- makeCluster(numproc[p])
    registerDoSNOW(cl)
    # run each chain of Stan in parallel
    sflist1 <- foreach(k = 1:8, .packages = 'rstan') %dopar% {
      stan(fit = fit1, data = dat, chains = 1, chain_id = k, refresh = -1, iter = iter[i])
    }
    # merge the chains
    f3 <- sflist2stanfit(sflist1)
    timecalc[p, i] <- (proc.time() - t)["elapsed"]
    stopCluster(cl)  # release the workers before the next run
  }
}
Performance result
A cluster of 4 processes was the fastest on my PC.
Reference
• Reference
– User’s Guide and Reference Manual: grammar, differences from BUGS, and getting started
(http://stan.googlecode.com/files/stan-reference-1.3.0.pdf)
– Official site (http://mc-stan.org/)
End of Slides
Stanislaw Marcin Ulam
(13 April 1909 – 13 May 1984)
http://en.wikipedia.org/wiki/Stanislaw_Ulam
Introduction of stan

  • 1. Introduction of Stan @Teito Nakagawa #TokyoBUGS 1st 29 September 2013
  • 2. INDEX • Motivation • What Is Stan? • How to Install it(Windows). • Grammer of Stan • Rat Data Model • Execution from Command Line • Execution from R:RStan • Reference
  • 3. INDEX • Motivation • What Is Stan? • How to Install it(Windows). • Grammer of Stan • Rat Data Model • Execution from Command Line • Execution from R:RStan • Reference
  • 4. Motivation As an analyst, I’m using… SMALL DATA census report deficit data
  • 5. Motivation But a requirement is BIG. I must make a model. I must tell many things.
  • 6. Motivation That’s the reason that I start to learn BUGS. BUT IT TAKES MUCH TIME
  • 8. INDEX • Motivation • What Is Stan? • How to Install it(Windows). • Grammer of Stan • Rat Data Model • Execution from Command Line • Execution from R:RStan • Reference
  • 9. What Is Stan? • What Is Stan? • Who Develop Stan? • Sample Code of Stan • Execution of Stan
  • 10. What Is Stan? • “Stan is a package for obtaining Bayesian inference using the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo.”(Official Site http://mc-stan.org/) – Similar to BUGS but more Procedural – Still updating – Fast:Compile to Execution File – Easy to use:Having R Interface – First Converge:Hamilton Monte Carlo and NUTS
  • 11. Who Develop Stan? • Andrew Gelman, his stuffs, Jiqiang Guo and Marcus Brubaker Photo Photo Photo
  • 12. Sample Code of Stan – Similar to BUGS but more Procedural # http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf # Page 3: Rats data { int<lower=0> N; int<lower=0> T; real x[T]; real y[N,T]; real xbar; } ... model { mu_alpha ~ normal(0, 100); mu_beta ~ normal(0, 100); sigmasq_y ~ inv_gamma(0.001, 0.001); From https://github.com/stan-dev/stan/tree/master/src/models/bugs_examples/vol1/rats
  • 13. Execution of Stan – Fast:Compile to Execution File 1. stanc:translating the Stan program to C++ 2. make:compiling the resulting C++ to an executable 3. exe:Running the stan program. In Detail, Discuss in later >¥bin¥stanc --name=rats --o=rats.cpp .¥rats.stan >make src/models/bugs_examples/vol1/rats/rats >.¥rats --data=rats.data.R --init=rats.init.R
  • 14. INDEX • Motivation • What Is Stan? • How to Install it(Windows). • Grammer of Stan • Rat Data Model • Execution from Command Line • Execution from R:RStan • Reference
  • 15. How to Install it(Windows). 1. Environment 2. Install rtools 3. Install Rstan 4. Install stan 5. Build Stan
  • 16. 1.Environment • I tested following model executions and install at my PC. •Windows 8 64bit •Intel(R) Core(TM) i7-2600 CPU 3.4GHZ •4core •8thread •12.0 GB memory •R 3.0.1 •Rtools 3.1 •Stan 1.3.0 •RStan1.3.0
  • 17. 2.Install Rtools • Rtools is a collectionof resources for building packages for R under Microsoft Windows • g++ is installed by Rtools. • Download the installer and execute it. – http://cran.r-project.org/bin/windows/Rtools/ • You shall check install notice of official site but in most cases you can install it with just clicking “next” . Installation screen shot
  • 18. 3.Install RStan • Rstan is a library for using Stan from R. • It is not registered at CRAN. • You can install it just doing following script from R. – The script was a modified script originally written in https://code.google.com/p/stan/wiki/RStanGettin gStarted#Install_Rstan
  • 19. 3.Install RStan #additional package instllation install.packages('inline') install.packages('Rcpp') #check to use rcpp:if it works, then it is printed “hello world” library(inline) library(Rcpp) src <- ' std::vector<std::string> s; s.push_back("hello"); s.push_back("world"); return Rcpp::wrap(s);‘ hellofun <- cxxfunction(body = src, includes = '', plugin = 'Rcpp', verbose = FALSE) cat(hellofun(), '¥n') #rstan instllation Sys.setenv(R_MAKEVARS_USER='') options(repos = c(getOption("repos"), rstan = "http://wiki.stan.googlecode.com/git/R")) install.packages('rstan', type = 'source') #load rstan library(rstan)
  • 20. 4.Install Stan To use Stan from command line, we can install stan itself by following step. 1. Download tar file stan-src-1.m.p.tgz – Downloading Site: https://code.google.com/p/stan/downloads/list 2. Just unzip the above file in Documents directory following command – tar has been already installed in Windows if Rtools has been installed. > tar --no-same-owner -xzf stan-src-1.m.p.tgz
  • 21. 5.Build Stan Bulid stan at a once after installing Stan. 1. Make the library 2. Make the model parser and code generator *<stan-home> is the directory which is generated by the previous tar command. >cd <stan-home> >make bin libstan.a >cd <stan-home> >make bin/stanc <stan-home>/bin as a result
  • 22. INDEX • Motivation • What Is Stan? • How to Install it(Windows). • Grammer of Stan • Rat Data Model • Execution from Command Line • Execution from R:RStan • Reference
  • 23. Grammer of Stan 1. Grammer of Stan 2. Blocks 3. DataTypes 4. Scope of Variables
  • 24. Stan program … • Stan Program defines a statistical model through conditional probability. • Stan Program consists of variable type declarations and statements. • Stan Program has specific blocks. • Stan Program can deal with various variable types. • Stan Program is different from BUGS.
  • 25. Stan Program consits of variable type declarations and statements. data { int<lower=0> N; int<lower=0> T; real x[T]; real y[N,T]; real xbar; } transformed data { real x_minus_xbar[T]; real y_linear[N*T]; for (t in 1:T) x_minus_xbar[t] <- x[t] - xbar; … rats_vec.stan block block Variable type declaration defines variable Statement Assingnments, Sampling Loop, Condition
  • 26. Stan Program has specific blocks. • Skeletetal Stan Program • The order must be kept. • Blocks are optional except model block data { ... declarations ... } transformed data { ... declarations ... statements ... } parameters { ... declarations ... } transformed parameters { ... declarations ... statements ... } model { ... declarations ... statements ... } generated quantities { ... declarations ... statements ... } Order Scope
  • 27. Stan Program has specific blocks. • Given input data. • Executed first and load Data • Transform variables for a convenience Transformed data • Result output parameter • Updated on iterations. Parameters
  • 28. Stan Program has specific blocks. • Transform parameters for a convenience Transformed Parameters • Model itself, Write this based on what you want to describe.Model • Generate Quantitie for monitoring convergence. Generated Quantities
  • 29. Stan Program can deal with various variable types. From http://stan.googlecode.com/files/stan-reference-1.3.0.pdf
  • 30. Stan Program can deal with various variable types. • Scalar – int is a 32-bit integer scalar. Lower and upper constraints are allowed. e.g. int N; int<lower=0,upper=1> cond; – real is a 64-bit floating-point scalar. e.g. real<lower=0> sigma; real<lower=-1,upper=1> rho; • Vector Data Types – Only real values are allowed. – vector is a general (column) vector. e.g. vector<lower=0>[3] u; – Unit simplex: for categorical or multinomial data; a vector of non-negative values that sum to 1. e.g. simplex[5] theta;
  • 31. Stan Program can deal with various variable types. • Vector Data Types – Unit vector: a vector with a norm of one. e.g. unit_vector[5] theta; – Ordered vector: most often employed as cut points in ordered logistic regression models. e.g. ordered[5] c; – Positive, ordered vector: e.g. positive_ordered[5] d; – Row vector: different from vector, because Stan distinguishes between row and column vectors. e.g. row_vector<lower=-1,upper=1>[10] u;
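What each constrained vector type demands can be sketched as plain predicates. This is an illustration in Python, not Stan's own validation code; the function names and the tolerance `TOL` are assumptions made for this sketch, and "ordered" is taken to mean strictly increasing.

```python
import math

TOL = 1e-8  # numerical tolerance; an arbitrary choice for this sketch


def is_simplex(v):
    """simplex: non-negative entries that sum to 1."""
    return all(x >= 0 for x in v) and abs(sum(v) - 1.0) < TOL


def is_unit_vector(v):
    """unit_vector: Euclidean norm equal to 1."""
    return abs(math.sqrt(sum(x * x for x in v)) - 1.0) < TOL


def is_ordered(v):
    """ordered: entries strictly increasing (assumed interpretation)."""
    return all(a < b for a, b in zip(v, v[1:]))


def is_positive_ordered(v):
    """positive_ordered: ordered, and every entry positive."""
    return is_ordered(v) and all(x > 0 for x in v)
```

For instance, `[0.2, 0.3, 0.5]` satisfies `is_simplex`, and `[0.6, 0.8]` satisfies `is_unit_vector`.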
  • 32. Stan Program can deal with various variable types. • Matrix Data Types – Matrix: a general real-valued matrix. e.g. matrix<upper=0>[3,4] B; – Correlation matrix: symmetric and positive definite with a unit diagonal, so all entries lie between -1 and 1. e.g. corr_matrix[3] Sigma; – Covariance matrix: symmetric and positive definite. e.g. cov_matrix[K] Omega; • Array Data Types – Arrays are declared by enclosing the dimensions in square brackets following the name of the variable. – An array's elements may be any of the basic data types. e.g. cov_matrix[5] mu[2,3,4];
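The matrix constraints can be checked in the same spirit. The sketch below is plain Python and not Stan's implementation; the helper names are assumptions. It tests positive definiteness by attempting a Cholesky factorization, which succeeds for a symmetric matrix only when the matrix is positive definite.

```python
import math


def is_symmetric(m, tol=1e-8):
    """True if m is square and equal to its transpose within tol."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(i))


def is_positive_definite(m):
    """Attempt a Cholesky factorization of a symmetric matrix."""
    n = len(m)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    return False  # a non-positive pivot means not positive definite
                chol[i][j] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return True


def is_cov_matrix(m):
    """cov_matrix: symmetric and positive definite."""
    return is_symmetric(m) and is_positive_definite(m)


def is_corr_matrix(m, tol=1e-8):
    """corr_matrix: a covariance matrix whose diagonal entries are all 1."""
    return is_cov_matrix(m) and all(abs(m[i][i] - 1.0) <= tol for i in range(len(m)))
```

With these helpers, `[[2, 0.5], [0.5, 1]]` is a valid covariance matrix but not a correlation matrix, because its diagonal is not all ones.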
  • 34. Rats Data Model 1. Rats Data 2. Rats Model
  • 35. Rats Data • The rats data and its model are contained in WinBUGS Examples Volume I. (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf) • The original article is Gelfand et al. (1990). • Weights of young rats measured weekly, used for a hierarchical model • Rows: individual rats (N=30) • Columns: day (M=5) From http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf
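  • 36. Rats Model • Hierarchical regression model considering individual and time differences.
      $$Y_{ij} \sim \mathrm{Normal}\left(\alpha_i + \beta_i (x_j - \bar{x}),\ \tau_c\right), \qquad \alpha_i \sim \mathrm{Normal}(\alpha_c, \tau_\alpha), \qquad \beta_i \sim \mathrm{Normal}(\beta_c, \tau_\beta)$$
      $Y$: observed data, $x_j$: days, $\bar{x}$: median of days (22), $\alpha_i$: effect of individual $i$, $\beta_i$: effect of day (for individual $i$), $\tau$: precision of the Normal distribution.
      From http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/Vol1.pdf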
  • 37. Rats Model
      model {
        mu_alpha ~ normal(0, 100);
        mu_beta ~ normal(0, 100);
        sigmasq_y ~ inv_gamma(0.001, 0.001);
        sigmasq_alpha ~ inv_gamma(0.001, 0.001);
        sigmasq_beta ~ inv_gamma(0.001, 0.001);
        alpha ~ normal(mu_alpha, sigma_alpha); // vectorized
        beta ~ normal(mu_beta, sigma_beta); // vectorized
        for (n in 1:N)
          for (t in 1:T)
            y[n,t] ~ normal(alpha[n] + beta[n] * (x[t] - xbar), sigma_y);
      }
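To make the likelihood part of the model block concrete, here is a rough Python sketch of what the double loop over `y[n,t]` adds to the log density. It is not Stan's implementation, and the numbers are toy values invented for illustration, not the real rats data. Note that Stan's `normal` takes a standard deviation, while the BUGS formulation uses a precision, with precision = 1 / sigma^2.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma), where sigma is a standard deviation."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def rats_log_likelihood(y, x, xbar, alpha, beta, sigma_y):
    """Sum of log densities for y[n][t] ~ normal(alpha[n] + beta[n] * (x[t] - xbar), sigma_y)."""
    total = 0.0
    for n in range(len(y)):
        for t in range(len(x)):
            mu = alpha[n] + beta[n] * (x[t] - xbar)
            total += normal_lpdf(y[n][t], mu, sigma_y)
    return total


# Toy inputs (hypothetical): two rats, five measurement days.
x = [8.0, 15.0, 22.0, 29.0, 36.0]
xbar = 22.0
alpha = [240.0, 250.0]
beta = [6.0, 7.0]
# Observations placed exactly on each rat's regression line, so residuals are zero.
y = [[alpha[n] + beta[n] * (xt - xbar) for xt in x] for n in range(2)]

ll = rats_log_likelihood(y, x, xbar, alpha, beta, sigma_y=6.0)
```

Because every residual is zero here, `ll` reduces to ten copies of the normalizing term `-log(6) - 0.5 * log(2 * pi)`. The priors on `mu_alpha`, `sigmasq_y` and the other parameters would add further terms that this sketch leaves out.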
  • 39. Execution from Command Line • Execution of Stan • stanc • make • execution
  • 40. Execution of Stan
      1. stanc: translates the Stan program to C++
      2. make: compiles the resulting C++ to an executable
      3. exe: runs the Stan program
      >\bin\stanc --name=rats --o=rats.cpp .\rats.stan
      >make src/models/bugs_examples/vol1/rats/rats
      >.\rats --data=rats.data.R --init=rats.init.R
  • 41. stanc • The model translation program stanc translates a .stan file into a .cpp file.
      USAGE: stanc [options] <model_file>
        --name=<string>  Model name (default = "$model_filename_model")
        --o=<file>       Output file for generated C++ code (default = "$name.cpp")
      >\bin\stanc --name=rats --o=rats.cpp .\rats.stan
  • 42. make • We can compile the generated .cpp file with the make command.
      >make src/models/bugs_examples/vol1/rats/rats
  • 43. execution • We can run the Stan sampler by executing the generated .exe file.
      USAGE: .\src\models\bugs_examples\vol1\rats\rats [options]
      OPTIONS:
        --data=<file>     Read data from specified dump-format file (required if model declares data)
        --init=<file>     Use initial values from specified file or zero values if <file>=0 (default is random initialization)
        --samples=<file>  File into which samples are written (default = samples.csv)
        --append_samples  Append samples to existing file if it exists (does not write header)
        --seed=<int>      Random number generation seed (default = randomly generated from time)
        --chain_id=<int>  Markov chain identifier (default = 1)
        --iter=<+int>     Total number of iterations, including warmup (default = 2000)
        --thin=<+int>     Period between saved samples after warmup (default = max(1, floor(iter - warmup) / 1000))
        --refresh=<int>   Period between samples updating progress report print (0 for no printing) (default = max(1, iter/200))
      >.\rats --data=rats.data.R --init=rats.init.R
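The default formulas for `--thin` and `--refresh` are easier to read with numbers plugged in. A small Python sketch follows; it only re-expresses the formulas printed in the usage text. Integer division is an assumption about how the executable rounds, and `warmup` is passed in explicitly because its default is not listed on the slide.

```python
def default_thin(iter_total, warmup):
    """Default thinning period: max(1, floor((iter - warmup) / 1000))."""
    return max(1, (iter_total - warmup) // 1000)


def default_refresh(iter_total):
    """Default progress-report period: max(1, iter / 200), integer division assumed."""
    return max(1, iter_total // 200)
```

With `iter = 2000` and a hypothetical warmup of 1000 iterations, this gives a thinning period of 1 (every post-warmup draw is saved) and a progress report every 10 iterations.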
  • 45. Execution from R • RStan • Execution from R • plot(stanfit) • traceplot(stanfit) • Fit using previous model • Parallel execution from R
  • 46. RStan • RStan is an interface to Stan – Compiles Stan code and C++ code, and executes the model from R – Visualization functions for Stan results (stanfit class) • Architecture of RStan: Stan code → stanc() → C++ code → stan_model() → exe → sampling() → S4: stanfit → plot(), traceplot(), extract(). stan() covers the whole path from Stan code to the stanfit object.
  • 47. Execution from R
      library(rstan)
      # set to the directory that contains the source files
      STAN_HOME <- "<STAN_HOME>"  # replace with your own path
      dirpath <- paste0(STAN_HOME, "/include/stansrc/models/bugs_examples/vol1/rats")
      # load data into a list: dat
      source(paste0(dirpath, "/rats.data.R"))
      dat <- list(y = y, x = x, xbar = xbar, N = N, T = T)
      # fit1: simulate the model as a one-liner
      fit1 <- stan(file = paste0(dirpath, "/rats.stan"), data = dat, iter = 1000, chains = 4)
      # fit2: simulate the model step by step
      # translate Stan code to C++ code
      rt <- stanc(file = paste0(dirpath, "/rats.stan"), model_name = "stan", verbose = TRUE)
      # compile the C++ code for the model
      sm <- stan_model(stanc_ret = rt, verbose = FALSE)
      # run the model simulation
      fit2 <- sampling(sm, data = dat, chains = 4, iter = 1000)
  • 48. plot(stanfit) We can check the estimated value and R-hat of each parameter.
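For readers unfamiliar with R-hat, the following Python sketch computes the basic (non-split) potential scale reduction factor from several chains of one parameter. It is an illustration of the classic Gelman-Rubin formula under the assumption of equal-length chains with non-zero within-chain variance; RStan's own computation may differ in detail (for example, by splitting chains).

```python
from statistics import mean, variance


def rhat(chains):
    """Basic potential scale reduction factor for one parameter.

    chains: list of equal-length lists of draws, one list per chain.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    w = mean([variance(c) for c in chains])  # average within-chain variance
    b = n * variance(chain_means)            # between-chain variance
    var_plus = (n - 1) / n * w + b / n       # pooled estimate of the posterior variance
    return (var_plus / w) ** 0.5
```

Chains that explore the same distribution give values close to 1, while chains stuck in different regions give values well above 1, which is the signal to look for in the plot.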
  • 50. Fit using previous model • Once a model is fitted, we can use the fitted result as an input to fit the model again with other data or settings. This saves the time needed to compile the C++ code for the model. https://code.google.com/p/stan/wiki/RStanGettingStarted
      # fit again using the previous fit result
      fit3 <- stan(fit = fit1, data = dat, iter = 400, chains = 4)
  • 51. Parallel Execution from R
      # parallel processing of chains
      library(doSNOW)
      library(foreach)
      cl <- makeCluster(4)  # change 4 to your number of CPU cores
      registerDoSNOW(cl)
      # run each Stan chain in parallel
      sflist1 <- foreach(i = 1:10, .packages = 'rstan') %dopar% {
        stan(fit = fit1, data = dat, chains = 1, chain_id = i, refresh = -1)
      }
      stopCluster(cl)  # release the worker processes
      # merge the chains
      f3 <- sflist2stanfit(sflist1)
  • 52. Parallel Execution Performance
      library(doSNOW)
      library(foreach)
      # rows: number of processes, columns: number of iterations
      timecalc <- matrix(0, nrow = 4, ncol = 7)
      iter <- c(1000, 3000, 5000, 10000, 30000, 50000, 100000)
      numproc <- c(1, 2, 4, 8)
      # Single processing
      for (i in 1:7) {
        cat("p:", 1, ", iter:", iter[i], "\r\n")
        t <- proc.time()
        a <- stan(fit = fit1, data = dat, chains = 8, refresh = -1, iter = iter[i])
        timecalc[1, i] <- (proc.time() - t)["elapsed"]
      }
      # Parallel processing
      for (p in 2:4) {
        for (i in 1:7) {
          cat("proc:", numproc[p], "iter:", iter[i], "\r\n")
          t <- proc.time()
          cl <- makeCluster(numproc[p])
          registerDoSNOW(cl)
          # run each Stan chain in parallel
          sflist1 <- foreach(k = 1:8, .packages = 'rstan') %dopar% {
            stan(fit = fit1, data = dat, chains = 1, chain_id = k, refresh = -1, iter = iter[i])
          }
          # merge the chains
          f3 <- sflist2stanfit(sflist1)
          timecalc[p, i] <- (proc.time() - t)["elapsed"]
          stopCluster(cl)  # release the workers before the next run
        }
      }
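The pattern on this slide (one chain per worker, a distinct `chain_id` per chain, then a merge) can be sketched outside R as well. The Python below is only an analogy under stated assumptions: `run_chain` is a toy random-walk Metropolis sampler for a standard normal target, not Stan's NUTS, and all names are hypothetical. Threads are used to keep the sketch simple; for CPU-bound sampling in CPython, a process pool would be needed to get a real speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def run_chain(chain_id, n_iter=500, seed=1234):
    """Toy random-walk Metropolis chain targeting a standard normal."""
    rng = random.Random(seed + chain_id)  # distinct random stream per chain
    x = rng.uniform(-2.0, 2.0)
    draws = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, 1.0)
        # log of the density ratio N(proposal) / N(x) for a standard normal
        log_accept = 0.5 * (x * x - proposal * proposal)
        if rng.random() < math.exp(min(0.0, log_accept)):
            x = proposal
        draws.append(x)
    return chain_id, draws


def run_chains_in_parallel(n_chains=4, workers=4):
    """Run one chain per task and merge the results into {chain_id: draws}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_chain, range(1, n_chains + 1)))
    return dict(results)
```

Calling `run_chains_in_parallel(4)` returns a dictionary keyed by chain id, which plays the role that `sflist2stanfit` plays for the list of single-chain fits in the R code.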
  • 55. Reference – User's Guide and Reference Manual: grammar, differences from BUGS, and getting started (http://stan.googlecode.com/files/stan-reference-1.3.0.pdf) – Official Site (http://mc-stan.org/)
  • 56. End Of Slide Stanislaw Marcin Ulam (13 April 1909 – 13 May 1984) http://en.wikipedia.org/wiki/Stanislaw_Ulam