Final degree project by Rubén Salgado, supervised by Carlos Maté: a Bayesian regression system applied to interval-valued data.
Bayesian Regression for Interval Data
Submission of the project by the student is authorized:
Rubén Salgado Fernández

THE PROJECT DIRECTOR
Carlos Maté Jiménez
Signed:        Date: 12/06/2007

APPROVAL OF THE PROJECTS COORDINATOR
Claudia Meseguer Velasco
Signed:        Date: 12/06/2007
UNIVERSIDAD PONTIFICIA DE COMILLAS
ESCUELA TÉCNICA SUPERIOR DE INGENIERÍA (ICAI)
INGENIERO EN ORGANIZACIÓN INDUSTRIAL

FINAL DEGREE PROJECT

Bayesian Regression System for Interval-Valued Data.
Application to the Spanish Continuous Stock Market

AUTHOR: Salgado Fernández, Rubén
MADRID, June 2007
Acknowledgements

Firstly, I would like to thank my director, Carlos Maté Jiménez, PhD, for giving me the chance to carry out this project. With him I have learnt not only about Statistics and research, but also how to enjoy them.

Special thanks to my parents. Their love, and all they have taught me in this life, have made me the person I am now.

Thanks to my brothers, my sister and the rest of my family for their support and for the time I stole from them.

Thanks to Charo for putting up with my bad mood in the bad moments, for supporting me and for giving me the inspiration to go ahead.

Madrid, June 2007
Resumen

In recent years, Bayesian methods have spread and have been used successfully in many varied fields such as marketing, medicine, engineering, econometrics and financial markets. The main characteristic that makes Bayesian data analysis (ANBAD, after its Spanish acronym) stand out against other alternatives is that it takes into account not only the objective information coming from the data of the event under study, but also the knowledge prior to it. The benefits obtained from this approach are many, since the greater the knowledge of the situation, the more reliable and accurate the decisions that can be taken. But it has not always been all advantages. Until a few years ago, ANBAD presented a series of difficulties that limited its development by researchers. Although the Bayesian methodology has existed as such for quite some time, it did not begin to be employed in a generalized way until the 1990s. This expansion has been favoured largely by advances in computing and by the improvement and refinement of different calculation methods, such as Markov chain Monte Carlo methods.

In particular, this methodology has proved extraordinarily useful when applied to regression models, which are widely adopted. In practice there are many situations in which the relationship between two quantitative variables needs to be analysed. The two fundamental objectives of this analysis are, on the one hand, to determine whether those variables are associated and in what sense the association occurs (that is, whether the values of one of the variables tend to increase, or decrease, as the values of the other increase); and, on the other, to study whether the values of one variable can be used to predict the value of the other. A regression model tries to provide information about one or more events through their relationship with the behaviour of others. The Bayesian methodology makes it possible to incorporate the researcher's knowledge into the analysis, making the results more precise, since they are not restricted to the data of one particular sample.

On the other hand, it is beginning to be accepted that, in the field of statistics, the twenty-first century will be the century of the "statistics of knowledge", in contrast to the previous one, which was that of the "statistics of data". The basic concept for building that statistics is the symbolic datum, and statistical methods have been developed for some types of symbolic data.

At present, the demands of the market and, in general, of the world keep growing. This implies an ever greater desire to predict the occurrence of an event, or to control the behaviour of certain quantities with the smallest possible error, in order to offer better products and to obtain greater profits, scientific advances and better results.

Against this background, this project tries to respond to those needs by providing extensive documentation on several of the most widely used and most advanced techniques of today, namely Bayesian data analysis, regression models and symbolic data, and by proposing different regression techniques. Likewise, a tool will be developed that allows all the acquired knowledge to be put into practice. This application will be aimed at the Spanish stock market and will let the user operate it in a simple and friendly way. For the development of this tool, one of the newest languages with the greatest future projection will be employed: R.

It is, therefore, a project that combines the newest techniques with the greatest projection both in theory, Bayesian regression applied to interval-valued data, and in practice, the use of the R language.
Abstract

In recent years, Bayesian methods have spread and been used successfully in many varied fields such as marketing, medicine, engineering, econometrics and financial markets. The main characteristic that makes Bayesian data analysis (BADAN) stand out against other alternatives is that it takes into account not only the objective information coming from the analysed event, but also the knowledge prior to it. The benefits obtained from this approach are numerous, because the more knowledge of the situation one has, the more reliable and accurate the decisions that can be taken. However, although the Bayesian methodology was established a long time ago, it was not applied in a general way until the 1990s because of its computational difficulties. Its expansion has been favoured mainly by the advances in that field and by the improvement of different computational methods, such as Markov chain Monte Carlo methods.

In particular, this Bayesian methodology has proved extraordinarily useful in its application to regression models, which are widely adopted. There are many real-life situations in which it is necessary to analyse the relationship between two quantitative variables. The two main objectives of this analysis are, on the one hand, to determine whether such variables are associated and in what sense that association comes about (that is, whether the values of one of the variables tend to rise, or to fall, as the values of the other increase); and, on the other hand, to study whether the values of one variable can be used to predict the value of the other. A regression model offers information about one or more events through their relationship with the behaviour of others. With the Bayesian methodology it is possible to add the researcher's knowledge to the analysis, thus making the results more accurate, since they are not restricted to the data of one particular sample.

On the other hand, it is increasingly accepted in the field of statistics that the twenty-first century will be the century of the "statistics of knowledge", in contrast to the last one, which was that of the "statistics of data". The basic concept on which to build such statistics is symbolic data, and statistical methods have been developed for some types of symbolic data.

Nowadays, the requirements of the market and the demands of the world in general keep growing. This implies a continuous increase in the desire to predict the occurrence of an event, or to control the behaviour of certain quantities with the minimum error, with the aim of offering better products and obtaining greater profits, scientific advances and better outcomes.

Within this frame, this project tries to respond to such needs by offering extensive documentation on several of the most widely applied and leading techniques of today, such as Bayesian data analysis, regression models and symbolic data, and by suggesting different regression techniques. Likewise, a tool has been developed that allows the reader to put all the acquired knowledge into practice. This application is aimed at the Spanish Continuous Stock Market and lets the user operate it easily. As for the development of this tool, one of the most innovative languages with the greatest future projection has been used: R.

The project therefore combines the most innovative techniques with the greatest projection, both in theoretical matters, such as Bayesian regression applied to interval-valued data, and in practical matters, such as the use of the R language.
List of Figures

1.1 Project Work Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 Univariate Normal Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
6.1 Interval time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.1 Classical Regression with single values in training set . . . . . . . . . . . . . . . . 73
7.2 Classical Regression with single values in testing set . . . . . . . . . . . . . . . . . 74
7.3 Classical Regression with interval-valued data . . . . . . . . . . . . . . . . . . . . 75
7.4 Centre Method (2000) in training set . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.5 Centre Method (2000) in testing set . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.6 Centre and Radius Method in training set . . . . . . . . . . . . . . . . . . . . . . . 77
7.7 Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . . . . . . 78
7.8 Bayesian Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . 80
7.9 Classical Regression with single values in training set . . . . . . . . . . . . . . . . 81
7.10 Classical Regression with single values in testing set . . . . . . . . . . . . . . . . . 81
7.11 Centre Method (2000) in training set . . . . . . . . . . . . . . . . . . . . . . . . . . 82
7.12 Centre Method (2000) in testing set . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.13 Centre and Radius Method in training set . . . . . . . . . . . . . . . . . . . . . . . 85
7.14 Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . . . . . . 85
7.15 Bayesian Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . 87
9.1 BARESIMDA MDI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
10.1 Interface between BARESIMDA and R . . . . . . . . . . . . . . . . . . . . . . . . 104
10.2 Interface between BARESIMDA and Excel . . . . . . . . . . . . . . . . . . . . . . 105
10.3 Logical Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Chapter 1
Introduction
1.1 Project Motivation
Statistics is primarily concerned with the analysis of data, either to assist in arriving at an improved understanding of some underlying mechanism, or as a means for making informed rational decisions. Both of these aspects generally involve some degree of uncertainty. The statistician's task is then to explain such uncertainty, and to reduce it to the extent that this is possible. Problems of this type occur throughout the physical, social and other sciences. One way of looking at statistics stems from the perception that, ultimately, probability is the only appropriate way to describe and systematically deal with uncertainty, as if it were the language for the logic of uncertainty. Thus, inference statements are precisely framed as probability statements on the possible values of the unknown quantities of interest (parameters or future observations), conditional on the observed, available data. The scientific discipline based on this understanding is called Bayesian Statistics. Moreover, the increasingly needed and sophisticated models, often hierarchical models, used to describe available data are typically too complex for conventional statistics to handle, but can be tackled within Bayesian Statistics. In principle, Bayesian Statistics is designed to handle all situations where uncertainty is found. Since some uncertainty is present in most aspects of life, it may be argued that Bayesian Statistics should be appreciated and used by everyone; it is the logic of contemporary society and science. According to [Rupp04], whether to apply the Bayesian methodology is no longer discussed; the question is when this has to be done.
Bayesian methods have matured and improved in several ways during the last fifteen years. They are becoming increasingly attractive to researchers, and successful applications of Bayesian data analysis have appeared in many different fields, including actuarial science, biometrics, finance, market research, marketing, medicine, engineering and social science. It is not only that the Bayesian approach produces appropriate answers to many current important problems; there is also an evident need for it, given the inapplicability of conventional statistics to many of them. The main characteristic offered by Bayesian data analysis is thus the possibility of incorporating the researcher's knowledge about the problem to be handled: the more precise the prior knowledge, the better and more reliable the results obtained. But Bayesian Statistics was held back until the mid-1990s by its computational complexity. Since then, it has expanded greatly, favoured by the development and improvement of computational methods in this field such as Markov chain Monte Carlo.
This methodology has proved extremely useful in its application to regression models, which are widely accepted. Let us remember that the general purpose of regression analysis is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. The Bayesian methodology lets the researcher incorporate her or his knowledge into the analysis, improving the results since they do not depend on the sampled data alone.

On the other hand, datasets are increasingly so large that they must be summarized in some fashion, so that the resulting summary dataset is of a more manageable size while still retaining as much of the knowledge inherent in the entire dataset as possible. One consequence of this situation is that data may no longer be formatted as single values, as is the case for classical data, but rather may be represented by lists, intervals, distributions, and the like. These summarized data are examples of symbolic data. This kind of data also lets us better represent the knowledge and beliefs we hold in mind, which are hard to capture with classical statistics. According to [Bill02], this responds to the current need to move from a statistics of data in the past century to a statistics of knowledge in the twenty-first century.

Market and demand requirements are increasing continuously over time. This implies a need for better and more accurate methods to forecast new situations and to control different quantities with the minimum error, in order to supply better products and to obtain higher incomes, scientific advances and better results.
Facing this outlook, this project is intended to respond to those requirements by providing wide and exhaustive documentation about some of the currently most used and advanced techniques, including Bayesian data analysis, regression models and symbolic data. Different examples related to the Spanish Continuous Stock Market are explained throughout this text, making clear the advantages of employing the described methods. Likewise, a software tool with a user-friendly graphical interface has been developed to practise and check all the acquired knowledge.

Therefore, this is a project that combines the most recent techniques with major future implications in theoretical issues, such as Bayesian regression applied to interval-valued data, with a technological part dealing with the problem of interconnecting two software programs: one used to show the graphical user interface and the other one employed to make the computations.
Regarding a more personal motivation, when accepting this project, several factors were taken into consideration by the author:

• A great challenge: it is an ambitious project with a high technical complexity related to both its theoretical basis and its technological basis. This represents a very good letter of introduction for entering the labour world.

• Good timing: this project was designed to be finished before June 2007, which means being able to finish the degree in June and enter the labour world in September.

• Some very interesting issues: on the one hand, it deals with the ever-present need for forecasting and modelling observations and situations in order to get the best possible results. On the other hand, it focuses on the stock market, which matches my personal hobbies.

• A new programming language: the possibility of learning in depth a new and relatively recent programming language, such as R, was an extra motivating factor.

• The project director: Carlos Maté is considered a demanding and very competent director by the students of the university.

• A research scholarship: the possibility of being in the Industrial Organization department of the university, learning from people such as the director mentioned above and other highly recognized professors, was a great factor.
1.2 Objectives

This project pursues the following aims:

• To provide wide and rigorous documentation about the following issues: Bayesian data analysis, regression models and symbolic data. From this starting point, documentation about Bayesian regression will be developed, as well as the software tool designed.

• To build a software tool to fit Bayesian regression models to interval-valued data, finding the most efficient way to design the graphical user interface. This must be as user-friendly as possible.

• To find the most efficient way to offer that system to future clients, based on the tests carried out with the application.

• To design a survey to measure the quality of the tool and users' satisfaction.

• Possibly, to write an article for a scientific journal.
1.3 Methodology

As the title of the project indicates, the ultimate purpose is the development of an application aimed at stock markets and based on a Bayesian regression system; therefore, some previous knowledge is required.

The first stage is familiarization with Bayesian data analysis, regression models applied within the Bayesian methodology, and symbolic data.

Within this phase, Bayesian data analysis will be studied first, trying to synthesize and capture its most important elements. Special dedication will be given to posterior simulation and computational algorithms. Then regression models will be treated, quickly reviewing the classical approach before delving into the different Bayesian regression models, applying a great part of what was explained about the Bayesian methodology. Finally, this first stage will be completed with the application to symbolic data, paying special attention to interval-valued data.

The second stage concerns the development of the software application, employing an incremental methodology of programming and testing iterative prototypes. This methodology has been considered the most suitable for this project since it will let us introduce successive models into the application.

The following figure shows the structure of the work packages into which the project is divided:
Figure 1.1: Project Work Packages
Chapter 2
Bayesian Data Analysis
2.1 What is Bayesian Data Analysis?

Statistics can be defined as the discipline that provides us with a methodology to collect, organize, summarize and analyze a set of data.

Data analysis can be divided into two kinds: exploratory data analysis and confirmatory data analysis. The former is used to represent, describe and analyze a set of data through simple methods in the first stages of statistical analysis. The latter is applied to make inferences from data, based on probability models.

In the same way, confirmatory data analysis is divided into two branches depending on the adopted approach. The first one, known as frequentist, makes inferences from the data resulting from a sampling through classical methods. The second branch, known as Bayesian, goes further in the analysis and adds to those data the prior knowledge which the researcher has about the treated problem. Since it is not worthwhile to explain the frequentist approach in full here, an extended revision of different classical methods related to it can be found in [Mont02].

[Diagram: data analysis divides into exploratory and confirmatory; the confirmatory branch divides into frequentist and Bayesian.]
As far as Bayesian analysis is concerned, and according to [Gelm04], the process can be divided into the following three steps:

• To set up a full probability model, through a joint probability distribution for all observable and unobservable quantities in the problem.

• To condition on the observed data, obtaining the posterior distribution.

• Finally, to evaluate the fit of the model and the implications of the resulting posterior distribution.
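As a concrete illustration of these three steps, consider a minimal sketch (not code from this project, whose tool is written in R; all numbers below are made up) that infers the mean µ of a normal model with known σ on a discrete grid:

```python
import numpy as np

# Illustrative sketch of the three steps for the mean mu of a normal model
# with known sigma, using a crude grid; all numbers here are made up.
rng = np.random.default_rng(0)
sigma = 1.0
y = rng.normal(0.5, sigma, size=20)      # observed data

mu = np.linspace(-3, 3, 1201)            # grid over the unknown mean

# Step 1: full probability model -- prior f(mu) and sampling model f(y|mu).
prior = np.exp(-0.5 * mu**2)             # standard normal prior, up to a constant
loglike = np.array([-0.5 * np.sum((y - m) ** 2) / sigma**2 for m in mu])

# Step 2: condition on the observed data -- posterior proportional to
# prior times likelihood (normalised over the grid).
post = prior * np.exp(loglike - loglike.max())
post /= post.sum()

# Step 3: evaluate the implications, e.g. the posterior mean of mu.
post_mean = (mu * post).sum()

# Conjugate closed form for comparison: precision-weighted average of the
# prior mean 0 (precision 1) and the sample (precision n/sigma^2).
closed = (y.sum() / sigma**2) / (1.0 + len(y) / sigma**2)
print(abs(post_mean - closed) < 1e-3)    # the grid agrees with the closed form
```

The grid is only a didactic device; the closed-form conjugate result it reproduces is the subject of Section 2.2.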
The joint probability distribution f(θ, y) (where θ may equally be a vector of parameters; the same expressions hold in that case) is obtained by means of

    f(θ, y) = f(y|θ) f(θ)                                                    (2.1)

where y is the set of sampled data. So this distribution is the product of two densities, referred to as the sampling distribution f(y|θ) and the prior distribution f(θ).

The sampling distribution, as its name suggests, is the probability model that the researcher assigns to the statistic (or set of statistics) to be studied after the data have been observed. Here an important problem stands out in relation to the parametric approach: the probability model that the researcher chooses might not be adequate. The nonparametric approach overcomes this inconvenience, as will be seen later.

When y is considered fixed, so that f(y|θ) is a function of θ, the sampling distribution is called the likelihood function and obeys the likelihood principle, which states that, for a given sample of data, any two probability models f(y|θ) with the same likelihood function yield the same inference for θ.
The prior distribution does not depend upon the data. Accordingly, it contains the information and the knowledge that the researcher has about the situation or problem to be solved. When there is no significant previous population from which the researcher can draw knowledge, that is, when the researcher has no prior information about the problem, a non-informative prior distribution must be used in the analysis in order to let the data speak for themselves. Hence, it is assumed that the prior knowledge will have very little importance in the results. But most non-informative priors are "improper", in that they do not integrate to 1, and this fact can cause problems. In these cases it is necessary to be sure that the posterior distribution is proper. Another possibility is to use an informative prior distribution but with an insignificant weight (around zero) associated with it.
Though the prior distribution can take any form, it is common to choose particular classes of priors that make computation and interpretation easier: the conjugate priors. A conjugate prior distribution is one which, when combined with the likelihood function, gives a posterior that falls in the same class of distributions as the prior. Furthermore, according to [Koop03], a natural conjugate prior has the additional property that it has the same form as the likelihood. But it is not always possible to find this kind of distribution, and the researcher then has to handle many distributions in order to express his prior knowledge about the problem. This is another handicap that the nonparametric approach reduces.
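Conjugacy can be sketched with the textbook Beta-binomial pair (a hypothetical example, not one used later in this document): a Beta prior combined with a binomial likelihood yields a Beta posterior whose parameters are updated simply by the observed counts.

```python
import numpy as np

# Beta prior + binomial likelihood -> Beta posterior (conjugate family).
# All numbers are illustrative.
a0, b0 = 2.0, 2.0          # prior Beta(a0, b0), mildly informative around 0.5
y, n = 7, 10               # data: y successes out of n trials

# Conjugate update: the posterior stays in the Beta family.
a1, b1 = a0 + y, b0 + (n - y)
post_mean_closed = a1 / (a1 + b1)

# Cross-check with an explicit prior-times-likelihood computation on a grid.
theta = np.linspace(0.001, 0.999, 999)
prior = theta ** (a0 - 1) * (1 - theta) ** (b0 - 1)
like = theta ** y * (1 - theta) ** (n - y)
post = prior * like
post /= post.sum()
post_mean_grid = (theta * post).sum()

print(round(post_mean_closed, 3))   # 9/14, i.e. 0.643
print(round(post_mean_grid, 3))     # the grid computation agrees
```

The appeal of conjugacy is exactly what the two lines of the update show: no integration is needed, only bookkeeping on the prior parameters.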
In relation to the prior, what distribution should be chosen? There are three different points of view, corresponding to different styles of Bayesians:

• Classical Bayesians consider that the prior is a necessary evil, and that priors that interject the least information possible should be chosen.

• Modern parametric Bayesians consider that the prior is a useful convenience, and that priors with desirable properties such as conjugacy should be chosen. They remark that, given a distributional choice, prior hyper-parameters that interject the least information possible should be chosen.

• Subjective Bayesians give essential importance to the prior, in the sense that they consider it a summary of old beliefs. So prior distributions based on previous knowledge (either the results of earlier studies or non-scientific opinion) should be chosen.
Returning to the Bayesian data analysis process, simply conditioning on the observed data y and applying Bayes' Theorem, the posterior distribution, namely f(θ|y), yields

    f(θ|y) = f(θ, y) / f(y) = f(θ) f(y|θ) / f(y)                             (2.2)

where

    f(y) = ∫ f(θ) f(y|θ) dθ                                                  (2.3)

(the integral running over the whole parameter space) is known as the prior predictive distribution, since it is not conditional upon a previous observation of the process and is applied to an observable quantity.

An equivalent form of the posterior distribution displayed above omits the prior predictive distribution, since it does not involve θ and the interest lies in learning about θ. So, with fixed y, it can be said that the posterior distribution is proportional to the joint probability distribution f(θ, y).
Once the posterior distribution is calculated, some kind of summary measure will be required to estimate the uncertainty about the parameter θ. This is due to the fact that the posterior distribution is a high-dimensional object whose direct use is not practical for a problem. The measure that summarizes the posterior distribution can be the posterior mean, mode, median or variance, among others; the choice will depend on the requirements of the problem. So the posterior distribution has great importance, since it lets the researcher manage the uncertainty about θ and provides information about it, taking into account both the prior knowledge and the data collected by sampling.

According to [Maté06], it is not difficult to deduce that posterior inference will agree with the non-Bayesian one as long as the estimate which the researcher gives to the parameter θ is the same as the one resulting from the sampling.
Once the data y have been observed, a new unknown observable quantity ỹ from the same process can be predicted through the posterior predictive distribution, namely f(ỹ|y):

    f(ỹ|y) = ∫ f(ỹ, θ|y) dθ = ∫ f(ỹ|θ, y) f(θ|y) dθ = ∫ f(ỹ|θ) f(θ|y) dθ    (2.4)

To sum up, the basic idea is to update the prior distribution f(θ) through Bayes' theorem by observing the data y, in order to get a posterior distribution f(θ|y). Then a summary measure or a prediction for new data can be obtained from f(θ|y). Table 2.1 reflects what has been said.
Distribution   Expression                 Information Required               Result
Likelihood     f(y|θ)                     Data                               Distribution f(y|θ)
Prior          f(θ)                       Researcher's knowledge             Parameter distribution f(θ)
Joint          f(y|θ) f(θ)                Likelihood and prior distributions Distribution f(θ, y)
Posterior      f(θ) f(y|θ) / f(y)         Prior and joint distributions      Distribution f(θ|y)
Predictive     ∫ f(ỹ|θ) f(θ|y) dθ         New data and posterior             Distribution f(ỹ|y)

Table 2.1: Distributions in Bayesian Data Analysis
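The predictive step in the last row of the table, equation (2.4), can be sketched by simulation: draw θ from the posterior, then ỹ from the sampling model given that θ. The posterior N(10, 0.5) and σ = 2 below are made-up values for illustration, not figures from this project.

```python
import numpy as np

# Two-stage simulation of the posterior predictive distribution; the
# posterior N(10, 0.5) for theta and sigma = 2 are made-up values.
rng = np.random.default_rng(1)
sigma = 2.0
mu_post, v_post = 10.0, 0.5

# Draw theta ~ f(theta|y), then y_tilde ~ f(y_tilde|theta).
theta_draws = rng.normal(mu_post, np.sqrt(v_post), size=200_000)
y_tilde = rng.normal(theta_draws, sigma)

# Predictive uncertainty combines parameter and sampling uncertainty:
# Var(y_tilde) = v_post + sigma^2 = 4.5 here.
print(round(y_tilde.mean(), 1))   # approximately 10.0
print(round(y_tilde.var(), 1))    # approximately 4.5
```

The two-stage draw is exactly the integral in (2.4) evaluated by Monte Carlo, and shows why predictive intervals are always wider than posterior intervals for θ.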
2.2 Bayesian Analysis for Normal and other distributions
2.2.1 Univariate Normal distribution
The basic model to be discussed concerns an observable variable y, normally distributed with mean µ and unknown variance σ²:

    y | µ, σ² ~ N(µ, σ²)                                                     (2.5)

As can be seen in Appendix A, the likelihood function for a single observation is

    f(y | µ, σ²) ∝ (σ²)^(−1/2) exp( −(y − µ)² / (2σ²) )                      (2.6)

This means that the likelihood function is proportional to a Normal distribution, omitting those terms that are constant.
Now let us consider that we have n independent observations y1, y2, ..., yn. According to the previous section, the parameters to be estimated are

    θ = (θ1, θ2) = (µ, σ²)                                                   (2.7)

A full probability model must be set up through a joint probability distribution:

    f(θ, (y1, y2, ..., yn)) = f(θ, y) = f(y|θ) f(θ)                          (2.8)

The likelihood function for a sample of n iid observations in this case is

    f(y|θ) = f(y|µ, σ²) ∝ (σ²)^(−n/2) exp( −(1/(2σ²)) Σ_{i=1}^{n} (yi − µ)² )    (2.9)
As recommended previously, a conjugate prior will be chosen; in fact, it will be a natural conjugate prior. According to [Gelm04], this likelihood function suggests a conjugate prior distribution of the form

    f(θ) = f(µ, σ²) = f(µ|σ²) f(σ²)                                          (2.10)

where the marginal distribution of σ² is the Scaled Inverse-χ² and the conditional distribution of µ given σ² is Normal (details about these distributions in Appendix A):

    µ | σ² ~ N(µ0, σ² V0)                                                    (2.11)
    σ² ~ Inv-χ²(ν0, s0²)                                                     (2.12)

So the joint prior distribution is

    f(θ) = f(µ, σ²) = f(µ|σ²) f(σ²) ∝ N-Inv-χ²(µ0, s0² V0; ν0, s0²)          (2.13)

Its four parameters can be identified as the location and scale of µ and the degrees of freedom and scale of σ², respectively.
As a natural conjugate prior was employed, the posterior joint distribution will have the same form as the prior. So, conditioning on the data and according to Bayes' Theorem, we have

    f(θ|y) = f(µ, σ²|y) ∝ f(y|µ, σ²) f(µ, σ²) ∝ N-Inv-χ²(µ1, s1² V1; ν1, s1²)    (2.14)

where it can be shown that

    µ1 = (V0⁻¹ + n)⁻¹ (V0⁻¹ µ0 + n ȳ)                                        (2.15)
    V1 = (V0⁻¹ + n)⁻¹                                                        (2.16)
    ν1 = ν0 + n                                                              (2.17)
    ν1 s1² = ν0 s0² + (n − 1) s² + (V0⁻¹ n / (V0⁻¹ + n)) (ȳ − µ0)²           (2.18)
All these formulae show that Bayesian inference combines prior and sample information.

The first expression means that the posterior mean µ1 is a weighted mean of the prior mean µ0 and the empirical mean ȳ, divided by the sum of their respective weights, these being represented by V0⁻¹ and the sample size n.

The second expression represents the weight that the posterior mean carries, and it can be seen as a compromise between the sample size and the significance given to the prior mean.

The third expression indicates that the degrees of freedom of the posterior variance are the sum of the prior degrees of freedom and the sample size. That is, the prior degrees of freedom can be understood as a fictitious sample size on which the expert's prior information is based.

The last expression explains the posterior sum of square errors as a combination of the prior and empirical sums of square errors, plus a term that measures the conflict between prior and sample information. A more detailed explanation of this last step can be found in [Gelm04], [Koop03] or [Cong06].
From this, the conditional and marginal posterior distributions are:
µ|σ², y ~ N(µ₁, σ²V₁)   (2.19)
σ²|y ~ Inv-χ²(ν₁, s₁²)   (2.20)
If we integrate out σ², the marginal for µ will be a t-distribution (see Appendix A for details):
µ|y ~ t_ν₁(µ₁, s₁²V₁)   (2.21)
Let us see an application to the Spanish stock market. Suppose that the monthly close values of the Ibex 35 are normally distributed. If we take the values at which the Spanish index closed during the first two weeks of January 2006, it can be shown that the mean was 10893.29 and the standard deviation was 61.66. The non-Bayesian approach would therefore infer a Normal distribution with this mean and standard deviation. Now suppose we had asked an analyst about the evolution of the Ibex 35 in January, and he had firmly stated that it would decrease slightly, that the mean close value at the end of the month would be around 10870 and that, hence, the standard deviation would be higher, around 100. Then, according to the previous formulas, the posterior parameters would be
µ₁ = (100 + 10)⁻¹(100 × 10870 + 10 × 10893.29) = 10872.12
V₁ = (100 + 10)⁻¹ = 0.0091
ν₁ = 100 + 10 = 110
s₁ = √[(100 × 100² + 9 × 61.66² + (1000/110)(10893.29 − 10870)²)/110] = 97.19
This means that there is a difference of almost 20 points between the Bayesian and the non-Bayesian estimates of the mean close value for January. Once January had passed, we could compare both results and note that the Bayesian estimates were closer to the actual mean close value and standard deviation, which turned out to be 10871.2 and 112.44. Figure 2.1 shows how the Bayesian estimate (blue) lies closer to the actual mean close value (cyan) than the frequentist estimate (red).
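As an illustration, the update formulas (2.15)-(2.18) and the Ibex 35 numbers above can be checked with a short script. This is only a sketch; the function name and the flat treatment of the inputs are our own choices:

```python
import math

def normal_posterior(mu0, V0_inv, nu0, s0_sq, ybar, s_sq, n):
    """Conjugate N-Inv-chi^2 update for a Normal mean and variance.

    mu0, V0_inv : prior mean and its weight V0^-1
    nu0, s0_sq  : prior degrees of freedom and scale of sigma^2
    ybar, s_sq  : sample mean and sample variance of the n observations
    """
    mu1 = (V0_inv * mu0 + n * ybar) / (V0_inv + n)            # (2.15)
    V1 = 1.0 / (V0_inv + n)                                   # (2.16)
    nu1 = nu0 + n                                             # (2.17)
    nu1_s1_sq = (nu0 * s0_sq + (n - 1) * s_sq                 # (2.18)
                 + (V0_inv * n / (V0_inv + n)) * (ybar - mu0) ** 2)
    return mu1, V1, nu1, nu1_s1_sq / nu1

# Ibex 35 example: analyst prior mu0 = 10870 with weight 100, prior scale
# s0 = 100 with nu0 = 100; n = 10 close values with mean 10893.29, s.d. 61.66
mu1, V1, nu1, s1_sq = normal_posterior(10870, 100, 100, 100 ** 2,
                                       10893.29, 61.66 ** 2, 10)
s1 = math.sqrt(s1_sq)
```

With these inputs the script reproduces µ₁ = 10872.12, V₁ ≈ 0.0091 and ν₁ = 110.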
2.2.2 Multivariate Normal distribution
Now, let us consider that we have an observable vector y of d components with the multivariate
Normal distribution:
y ~ N(µ, Σ)   (2.22)
where the first parameter is the mean column vector and the second one is the variance-covariance
matrix.
Extending what was said above to the multivariate case, we have:
[Figure 2.1: Univariate Normal Example — density estimates of the January mean close value under the frequentist approach (red) and the Bayesian approach (blue), together with the real mean close value in January (cyan).]
f(y|µ, Σ) ∝ |Σ|^(−1/2) exp(−½(y − µ)′Σ⁻¹(y − µ))   (2.23)
And for n iid observations:
f(y₁, y₂, . . . , yₙ|µ, Σ) ∝ |Σ|^(−n/2) exp(−½ ∑ᵢ₌₁ⁿ (yᵢ − µ)′Σ⁻¹(yᵢ − µ))   (2.24)
A multivariate generalization of the Scaled Inverse-χ² is the Inverse-Wishart distribution (see details in Appendix A), so the joint prior distribution is
f(θ) = f(µ, Σ) ∝ N-Inv-Wishart(µ₀, Λ₀/k₀, ν₀, Λ₀)   (2.25)
due to the fact that
µ|Σ ~ N(µ₀, Σ/k₀)   (2.26)
Σ ~ Inv-Wishart(ν₀, Λ₀⁻¹)   (2.27)
Expression:
  Univariate:   y ~ N(µ, σ²)
  Multivariate: y ~ N(µ, Σ)
Parameters to estimate:
  Univariate:   µ, σ²
  Multivariate: µ, Σ
Prior distributions:
  Univariate:   µ|σ² ~ N(µ₀, σ₀²/k₀);  σ² ~ Inv-χ²(ν₀, σ₀²);  µ, σ² ~ N-Inv-χ²(µ₀, σ₀²/k₀, ν₀, σ₀²)
  Multivariate: µ|Σ ~ N(µ₀, Σ/k₀);  Σ ~ Inv-Wishart(ν₀, Λ₀⁻¹);  µ, Σ ~ N-Inv-Wishart(µ₀, Λ₀/k₀, ν₀, Λ₀)
Posterior distributions:
  Univariate:   µ|σ², y ~ N(µ₁, σ₁²/k₁);  σ²|y ~ Inv-χ²(ν₁, σ₁²);  µ, σ²|y ~ N-Inv-χ²(µ₁, σ₁²/k₁, ν₁, σ₁²)
  Multivariate: µ|Σ, y ~ N(µ₁, Σ/k₁);  Σ|y ~ Inv-Wishart(ν₁, Λ₁⁻¹);  µ, Σ|y ~ N-Inv-Wishart(µ₁, Λ₁/k₁, ν₁, Λ₁)

Table 2.2: Comparison between Univariate and Multivariate Normal
The posterior results are analogous to those given for the univariate case, but applying these distributions. Interested readers can find more information in [Gelm04] or [Cong06].
A summary is shown in Table 2.2 in order to capture the most important ideas.
2.2.3 Other distributions
As has just been done with the Normal distribution, a Bayesian analysis could be carried out for other distributions. For instance, the exponential distribution is commonly used in reliability analysis. Because this project will deal with the Normal distribution for the likelihood, the analysis with other distributions will not be explained in detail. Table 2.3 shows the conjugate prior and posterior distributions
for other likelihood distributions. More details can be found in [Cong06], [Gelm04], or [Rossi06].
Likelihood     Parameter   Conjugate Prior   Hyperparameters   Posterior Hyperparameters
Bin(y|n, θ)    θ           Beta              α, β              α + y, β + n − y
P(y|θ)         θ           Gamma             α, β              α + nȳ, β + n
Exp(y|θ)       θ           Gamma             α, β              α + 1, β + y
Geo(y|θ)       θ           Beta              α, β              α + 1, β + y

Table 2.3: Conjugate distributions for other likelihood distributions
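As a quick sketch of how the table is used in practice, the Beta-Binomial and Gamma-Poisson rows can be coded directly; the numerical inputs below are invented purely for illustration:

```python
def beta_binomial_update(alpha, beta, y, n):
    """Beta(alpha, beta) prior + Bin(y|n, theta) likelihood -> Beta posterior."""
    return alpha + y, beta + n - y

def gamma_poisson_update(alpha, beta, ys):
    """Gamma(alpha, beta) prior + Poisson likelihood -> Gamma posterior."""
    return alpha + sum(ys), beta + len(ys)

# A Beta(2, 2) prior and y = 7 successes observed in n = 10 trials
a1, b1 = beta_binomial_update(2, 2, 7, 10)
post_mean = a1 / (a1 + b1)          # posterior mean of theta

# A Gamma(1, 1) prior and Poisson counts 2, 3, 4
a2, b2 = gamma_poisson_update(1, 1, [2, 3, 4])
```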
2.3 Hierarchical Models
Hierarchical data arise when observations are structured in groups or otherwise related among themselves. When this occurs, standard techniques either assume that these groups belong to entirely different populations or ignore the aggregate information entirely.
Hierarchical models provide a way of pooling the information for the disparate groups without
assuming that they belong to precisely the same population.
Suppose we have collected data about some random variable Y from m different populations with
n observations for each population.
Let yᵢⱼ represent observation j from population i. Now suppose yᵢⱼ ~ f(θᵢ), where θᵢ is a vector of parameters for population i. Furthermore, θᵢ ~ f(Θ), where Θ may also be a vector. Up to this point, we have only rewritten what was said previously.
Now let us extend the model and assume that the parameters Θ that govern the distribution of the θ's are themselves random variables, and assign a prior distribution to these variables as well:
Θ ~ f(ψ)   (2.28)
where the distribution f(ψ) is called the hyperprior. The vector parameter ψ of the hyperprior may be "known" and represent our prior beliefs about Θ; in theory, we can also assign a probability distribution to these quantities as well, and proceed to another layer of hierarchy.
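The two-level structure just described can be sketched with a small generative simulation; the hyperparameter values below are invented purely for illustration:

```python
import random

random.seed(0)

# Each population mean theta_i is drawn from the hyperdistribution
# f(theta | Theta), and each observation y_ij from f(y | theta_i).
m, n = 5, 20                        # populations, observations per population
Theta_mean, Theta_sd = 0.0, 2.0     # hyperparameters, assumed known here
sigma = 1.0                         # within-population standard deviation

theta = [random.gauss(Theta_mean, Theta_sd) for _ in range(m)]
y = [[random.gauss(theta[i], sigma) for _ in range(n)] for i in range(m)]
```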
According to [Gelm04], the idea of exchangeability will be used to create a joint probability
distribution model for all the parameters θ. A formal definition to explain what exchangeability
consists of is:
”The parameters θ1 , θ2 , . . . , θn are exchangeable in their joint distribution if f (θ1 , θ2 , . . . , θn ) is
invariant to permutations in the index 1, 2, . . . , n”.
This means that if no information other than the data is available to distinguish any of the θi from
any of the others, and no ordering of the parameters can be made, one must assume symmetry among
the parameters in the prior distribution. So we can treat the parameters for each sub-population as
exchangeable units. This can be formulated by:
f(θ₁, θ₂, . . . , θₙ|Θ) = ∏ᵢ₌₁ⁿ f(θᵢ|Θ)   (2.29)
The joint prior distribution is now:
f(θ₁, θ₂, . . . , θₙ, Θ) = f(θ₁, θ₂, . . . , θₙ|Θ)f(Θ)   (2.30)
And conditioning on the data, it yields:
f(θ₁, θ₂, . . . , θₙ, Θ|y) ∝ f(θ₁, θ₂, . . . , θₙ, Θ)f(y|θ₁, θ₂, . . . , θₙ)   (2.31)
Perhaps the most important point in practice is that non-hierarchical models are usually inappro-
priate for hierarchical data, while non-hierarchical data can be modelled following the hierarchical
structure and assigning concrete values to the hyperprior parameters.
These kinds of models will be used in Bayesian regression models with autocorrelated errors, as will be seen in the following chapters.
For more details about Bayesian hierarchical models, the reader is referred to [Cong06], [Gelm04] and [Rossi06].
2.4 Nonparametric Bayesian
To overcome the limitations that have been mentioned throughout this chapter, the nonparametric approach relaxes the restrictions of the parametric one. This kind of analysis can be performed through the so-called Dirichlet process, which allows us to express in a simple way the prior distribution of F, or of the distribution family of F, where F is the distribution function of the variable under study. This process has a parameter, a measure α, which once normalized yields a distribution function.
According to [Mate06], a Dirichlet process for F(t) requires knowing:
• A prior proposal for F(t), denoted F₀(t), which corresponds to the distribution function expressing the prior knowledge the engineer has, given by
F₀(t) = α(t)/M   (2.32)
• A measure of the confidence in the prior proposal, denoted by M, whose values can vary between 0 and ∞, depending on whether there is total confidence in the data or in the prior proposal, respectively.
It can be shown that the posterior estimate of F(t), F̂ₙ(t), after sampling n data points, is given by
F̂ₙ(t) = pₙF₀(t) + (1 − pₙ)Fₙ(t)   (2.33)
where Fₙ(t) is the empirical distribution function and pₙ = M/(M + n).
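Equation (2.33) is simple enough to compute directly. The sketch below assumes a Uniform(0, 1) prior proposal F₀ and a handful of invented data points:

```python
def dp_posterior_cdf(t, data, F0, M):
    """Posterior mean CDF under a Dirichlet process prior with base F0 and
    confidence M: the mixture p_n*F0(t) + (1 - p_n)*F_n(t), p_n = M/(M + n)."""
    n = len(data)
    p_n = M / (M + n)
    F_emp = sum(1 for x in data if x <= t) / n   # empirical CDF F_n(t)
    return p_n * F0(t) + (1 - p_n) * F_emp

F0 = lambda t: min(max(t, 0.0), 1.0)         # prior proposal: Uniform(0, 1) CDF
data = [0.2, 0.4, 0.9, 0.95]                 # n = 4 observations, p_n = 0.5 for M = 4
val = dp_posterior_cdf(0.9, data, F0, M=4)   # 0.5*0.9 + 0.5*0.75 = 0.825
```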
More detailed information about the nonparametric approach and how Dirichlet processes are used can be found in [Mull04] or [Gosh03].
With this approach, not only is the parametric limitation concerning the probability model of the variable under study avoided, since no such hypothesis is required, but it also allows us to give a quantified importance to the prior knowledge provided by the engineer, depending on the confidence in the certainty of this knowledge.
Chapter 3
Posterior Simulation
3.1 Introduction
A practical problem with Bayesian inference is the difficulty of summarizing realistically complex posterior distributions. In most practical problems, posterior densities will not take the form of any well-known and well-understood density, so summary statistics, such as the posterior mean and variance of the parameters of interest, will not be analytically available. It is at this point where the importance of Bayesian computation arises, and computational tools are required to gain meaningful inference from the posterior distribution. Its importance is such that the computing revolution of the last 20 years has led to a blossoming of Bayesian methods in many fields such as Econometrics, Ecology or Health.
In this regard, the most important simulation methods are the Markov chain Monte Carlo (MCMC) methods. MCMC methods date from the original work of [Metr53], who were interested in methods for the efficient simulation of the energy levels of atoms in a crystalline structure. The original idea was subsequently generalized by [Hast70], but its true potential was not fully realized within the statistical literature until [Gelf90] demonstrated its application to the estimation of integrals commonly occurring in the context of Bayesian statistical inference.
As [Berg05] points out, the underlying principle is simple: if one wishes to sample randomly from a specific probability distribution, then design a Markov chain whose long-time equilibrium is that distribution, write a computer program to simulate the Markov chain, run it for a time long enough to be confident that approximate equilibrium has been attained, and then record the state of the Markov chain as an approximate draw from equilibrium.
The technique has been developed strongly in different fields and with rather different emphases
in the computer science community concerned with the study of random algorithms (where the em-
phasis is on whether the resulting algorithm scales well with increasing size of the problem), in the
spatial statistics community (where one is interested in understanding what kinds of patterns arise
from complex stochastic models), and also in the applied statistics community (where it is applied
largely in Bayesian contexts, enabling researchers to formulate statistical models which would other-
wise be resistant to effective statistical analyses).
The development of the theoretical work also benefits the development of statistical applications.
The MCMC simulation techniques have been applied to develop practical statistical inferences for
almost all problems in (bio) statistics, for example, the problems in longitudinal data analysis, im-
age analysis, genetics, contagious disease epidemics, random spatial pattern, and financial statistical
models such as GARCH and stochastic volatility.
The simplicity of the underlying principle of MCMC is a major reason for its success. However
a substantial complication arises as the underlying target problem becomes more complex; namely,
how long should one run the Markov chain so as to ensure that it is close to equilibrium? According to [Gelm04], n = 100 independent samples should be enough for reasonable posterior summaries, but in some cases more samples are needed to ensure greater accuracy.
3.2 Markov chains
The essential theory required in developing Monte Carlo methods based on Markov chains is pre-
sented here. The most fundamental result is that certain Markov chains converge to a unique invariant
distribution, and can be used to estimate expectations with respect to this distribution. But in order to
reach this conclusion, some concepts need to be defined first.
A Markov chain is a series of random variables, X₀, . . . , Xₙ, also called a stochastic process, in which only the value of Xₙ₋₁ influences the distribution of Xₙ. Formally:
P(Xₙ = xₙ|X₀ = x₀, . . . , Xₙ₋₁ = xₙ₋₁) = P(Xₙ = xₙ|Xₙ₋₁ = xₙ₋₁)   (3.1)
where the Xn−1 have a common range called the state space of the Markov chain.
The common language used to refer to the different situations in which a Markov chain can be found is the following. If Xₙ = i, it is said that the chain is in state i at step n, or that it has the value i at step n. This language gives the chain a certain dynamic character, which is reflected in the main tool for studying it: the transition probabilities P(Xₙ₊₁ = j|Xₙ = i), collected in the transition matrix P = (Pᵢⱼ), where Pᵢⱼ = P(Xₙ₊₁ = j|Xₙ = i) is the probability of moving from state i to state j.
Since in most interesting applications Markov chains are homogeneous, the transition matrix can be defined from the initial probabilities, P(X₁ = j|X₀ = i). In this respect, a Markov chain Xₜ is homogeneous if P(Xₙ₊₁ = j|Xₙ = i) = P(X₁ = j|X₀ = i) for all n, i, j.
Furthermore, using the Chapman-Kolmogorov equation, it can be shown that, given the transition matrices P and Pₙ (for step n) of a homogeneous Markov chain, then Pₙ = Pⁿ.
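The identity Pₙ = Pⁿ can be checked numerically for a small two-state chain; the transition probabilities below are invented for illustration:

```python
def mat_mul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    k = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(k)) for j in range(k)]
            for i in range(k)]

def n_step(P, n):
    """n-step transition matrix P_n = P^n (Chapman-Kolmogorov)."""
    R = P
    for _ in range(n - 1):
        R = mat_mul(R, P)
    return R

# Two-state homogeneous chain; each row sums to one
P = [[0.9, 0.1],
     [0.5, 0.5]]
P2 = n_step(P, 2)     # e.g. P2[0][0] = 0.9*0.9 + 0.1*0.5 = 0.86
P50 = n_step(P, 50)   # rows approach the stationary distribution (5/6, 1/6)
```

For large n the rows of Pⁿ become identical, illustrating the convergence to a stationary distribution discussed in the rest of this section.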
Next, we will see the concepts of invariant or stationary distribution, ergodicity and irreducibility, which are indispensable to reach the main result. It will be assumed that Xₜ is a homogeneous Markov chain.
A vector π is an invariant distribution of the chain Xₜ if it satisfies:
a) πⱼ ≥ 0 for all j, with ∑ⱼ πⱼ = 1;
b) π = πP.
That is, a stationary distribution over the states of a Markov chain is one that persists forever once
it is reached.
The concept of ergodic state requires making other definitions clear such as recurrence and aperi-
odicity:
• The state i is recurrent if P(Xₙ = i for some n ≥ 1|X₀ = i) = 1. Otherwise, it is transient. Moreover, i is positive recurrent if the expected (average) return time is finite, and null recurrent if it is not.
• The period of a state i, denoted by dᵢ, is defined as dᵢ = gcd{n : [Pₙ]ᵢᵢ > 0}. The state i is aperiodic if dᵢ = 1, and periodic if dᵢ > 1.
A state is ergodic if it is positive recurrent and aperiodic. The last concept to define is irreducibility. A set of states C ⊆ S, where S is the set of all possible states, is irreducible if for all i, j ∈ C:
• i and j have the same period;
• i is transient if and only if j is transient;
• i is null recurrent if and only if j is null recurrent.
Now, with all these concepts in mind, we can determine whether a Markov chain has a stationary distribution by means of the following lemma:
Lemma 3.2.1. Let Xₜ be a homogeneous and irreducible Markov chain. The chain has exactly one stationary distribution if, and only if, all the states are positive recurrent. In that case, its entries are given by πᵢ = µᵢ⁻¹, where µᵢ denotes the expected return time of state i.
The relation with the long-time behaviour is given by this other lemma:
Lemma 3.2.2. Let Xₜ be a homogeneous, irreducible and aperiodic Markov chain. Then
[Pₙ]ᵢⱼ → 1/µⱼ for all i, j ∈ S as n → ∞   (3.2)
3.3 Monte Carlo Integration
Monte Carlo integration estimates the integral E[g(θ)] by obtaining samples θₜ, t = 1, . . . , n, from the posterior distribution p(θ|y) and averaging:
E[g(θ)] ≈ (1/n) ∑ₜ₌₁ⁿ g(θₜ)   (3.3)
where g(θ) is the function of interest to estimate. Note that if the samples θₜ, t = 1, . . . , n, have p(θ|y) as their stationary distribution, the θₜ form a Markov chain.
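A minimal sketch of (3.3), using independent draws from a known "posterior" so that the answer can be checked; here the target is N(0, 1) and g(θ) = θ², whose true expectation is 1:

```python
import random

random.seed(1)

# Monte Carlo estimate of E[g(theta)] from draws of the target distribution
n = 200_000
draws = [random.gauss(0.0, 1.0) for _ in range(n)]
estimate = sum(t * t for t in draws) / n   # should be close to Var(theta) = 1
```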
3.4 Gibbs sampler
In many models, it is not easy to draw directly from the posterior distribution p(θ|y). However, if the parameter θ is partitioned into several blocks as θ = (θ₁, . . . , θₚ), then the full conditional posterior distributions, p(θ₁|y, θ₂, . . . , θₚ), . . . , p(θₚ|y, θ₁, . . . , θₚ₋₁), may be simple to draw from. For instance, in the Normal linear regression model it is convenient to set p = 2, with θ₁ = β and θ₂ = σ², and the full conditional distributions would be p(β|y, σ²) and p(σ²|y, β), which are very useful in the Normal independent model that will be explained later.
The Gibbs sampler is defined by iteratively sampling from each of these p conditional distributions:
1. Set a starting value, θ⁽⁰⁾ = (θ₂⁽⁰⁾, . . . , θₚ⁽⁰⁾).
2. Take random draws:
- θ₁⁽¹⁾ from p(θ₁|y, θ₂⁽⁰⁾, . . . , θₚ⁽⁰⁾)
- θ₂⁽¹⁾ from p(θ₂|y, θ₁⁽¹⁾, θ₃⁽⁰⁾, . . . , θₚ⁽⁰⁾)
...
- θₚ⁽¹⁾ from p(θₚ|y, θ₁⁽¹⁾, . . . , θₚ₋₁⁽¹⁾)
3. Repeat step 2 as necessary.
4. Discard the first draws, which are affected by the starting value θ⁽⁰⁾ = (θ₂⁽⁰⁾, . . . , θₚ⁽⁰⁾), and average the rest of the draws applying Monte Carlo integration.
For instance, in the Normal regression model we would have:
1. Set a starting value, θ₂⁽⁰⁾ = (σ²)⁽⁰⁾.
2. Take random draws:
- θ₁⁽¹⁾ = β⁽¹⁾ from p(β|y, σ² = (σ²)⁽⁰⁾)
- θ₂⁽¹⁾ = (σ²)⁽¹⁾ from p(σ²|y, β = β⁽¹⁾)
3. Repeat step 2 as necessary.
4. Discard the first draws and average the rest applying Monte Carlo integration.
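The two-block scheme can be sketched for the simpler unknown-mean, unknown-variance Normal model. The improper prior f(µ, σ²) ∝ 1/σ² is our own choice for illustration; under it the full conditionals are µ|σ², y ~ N(ȳ, σ²/n) and σ²|µ, y ~ Inv-Gamma(n/2, ∑(yᵢ − µ)²/2):

```python
import random

random.seed(2)

y = [random.gauss(5.0, 2.0) for _ in range(500)]   # synthetic data
n = len(y)
ybar = sum(y) / n

mu_draws, sig2_draws = [], []
sigma2 = 1.0                          # starting value (sigma^2)^(0)
for _ in range(3000):
    # draw mu from its full conditional N(ybar, sigma2/n)
    mu = random.gauss(ybar, (sigma2 / n) ** 0.5)
    # draw sigma2 from Inv-Gamma(n/2, ss/2) via a Gamma(n/2, 1) draw
    ss = sum((yi - mu) ** 2 for yi in y)
    sigma2 = (ss / 2) / random.gammavariate(n / 2, 1.0)
    mu_draws.append(mu)
    sig2_draws.append(sigma2)

burn = 500                            # discard the burn-in draws
mu_hat = sum(mu_draws[burn:]) / (3000 - burn)
sig2_hat = sum(sig2_draws[burn:]) / (3000 - burn)
```

With these settings the posterior means land near the generating values µ = 5 and σ² = 4.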
The values dropped because they are affected by the starting point are called the burn-in. More generally, any set of values discarded in an MCMC simulation is called the burn-in; the size of the burn-in period is the subject of current research in MCMC methods.
As the state of each draw depends on the state of the previous one, the sequence is a Markov chain. More detailed information can be found in [Chen00], [Mart01] or [Rossi06].
3.5 Metropolis-Hastings sampler and its special cases
3.5.1 Metropolis-Hastings sampler
The Metropolis-Hastings method is adequate for simulating models that are not conditionally conjugate. Furthermore, it can be combined with the Gibbs sampler to simulate posterior distributions where some of the conditional posterior distributions are easy to sample from and others are not. Like the algorithms explained above, it is based on formulating a Markov chain, but using a proposal distribution, q(·|θₜ), which depends on the current state θₜ, to generate a new proposed sample θ*. This proposal is accepted as the next state with probability given by
α(θₜ, θ*) = min{1, [p(θ*|y)q(θₜ|θ*)] / [p(θₜ|y)q(θ*|θₜ)]}   (3.4)
If the point θ* is not accepted, the chain does not move and θₜ₊₁ = θₜ. According to [Mart01], the steps to follow are:
1. Initialize the chain to θ₀ and set t = 0.
2. Generate a candidate point θ* from q(·|θₜ).
3. Generate U from a Uniform(0, 1) distribution.
4. If U ≤ α(θₜ, θ*), set θₜ₊₁ = θ*; otherwise set θₜ₊₁ = θₜ.
5. Set t = t + 1 and repeat steps 2 through 5.
6. Take the average of the draws g(θ₁), . . . , g(θₙ).
Note that it is not only advisable but essential that the proposal distribution q(·|θₜ) be easy to sample from.
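The steps above can be sketched for a toy unnormalized target; both the target (a N(1, 0.5²) density) and the Normal proposal are our invented choices, and the proposal's normalizing constant is omitted since it cancels in (3.4):

```python
import math
import random

random.seed(3)

def p_unnorm(theta):
    """Unnormalized target density: a N(1, 0.5^2) 'posterior'."""
    return math.exp(-0.5 * ((theta - 1.0) / 0.5) ** 2)

def q_dens(x, given):
    """Proposal density q(x|given) up to a constant: N(given, 1)."""
    return math.exp(-0.5 * (x - given) ** 2)

theta = 0.0                          # initialize the chain
draws = []
for _ in range(20000):
    prop = random.gauss(theta, 1.0)  # candidate from q(.|theta_t)
    ratio = ((p_unnorm(prop) * q_dens(theta, prop))
             / (p_unnorm(theta) * q_dens(prop, theta)))
    if random.random() <= min(1.0, ratio):
        theta = prop                 # accept; otherwise the chain stays put
    draws.append(theta)

post = draws[2000:]                  # drop the burn-in
post_mean = sum(post) / len(post)    # should approach the target mean, 1
```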
There are some special cases of this method; the most important are briefly explained below. In addition, according to [Gelm04], it can be shown that the Gibbs sampler is another special case of the Metropolis-Hastings algorithm in which the proposed point is always accepted.
3.5.2 Metropolis sampler
This method is a particular case of the Metropolis-Hastings sampler where the proposal distribution
has to be symmetric. That is,
q(θ*|θₜ) = q(θₜ|θ*)   (3.5)
for all θ* and θₜ. Then the probability of accepting the new point is
α(θₜ, θ*) = min{1, p(θ*|y)/p(θₜ|y)}   (3.6)
The same procedure seen for the Metropolis-Hastings sampler is then followed.
3.5.3 Random-walk sampler
This special case refers to a proposal distribution of the form
q(θ*|θₜ) = q(|θₜ − θ*|)   (3.7)
The candidate point is θ* = θₜ + z, where z is the increment random variable drawn from q. Then the probability of accepting the new point is
α(θₜ, θ*) = min{1, p(θ*|y)/p(θₜ|y)}   (3.8)
The same procedure seen for the Metropolis-Hastings sampler is then followed.
3.5.4 Independence sampler
The last variation has a proposal distribution such that
q(θ*|θₜ) = q(θ*)   (3.9)
so it does not depend on θₜ. Then the probability of accepting the new point is
α(θₜ, θ*) = min{1, [p(θ*|y)q(θₜ)] / [p(θₜ|y)q(θ*)]} = min{1, w(θ*)/w(θₜ)}   (3.10)
where
w(θ) = p(θ|y)/q(θ)   (3.11)
It is important to remark that, for this method to work well, the proposal distribution q should be very similar to the posterior distribution p(θ|y).
The same procedure seen for the Metropolis-Hastings sampler is then followed.
3.6 Importance sampling
Importance sampling is a variance reduction technique that can be used in the Monte Carlo method.
The idea behind this method is that certain values of the input random variables in a simulation have
more impact on the parameter being estimated than others. So instead of taking a simple average,
importance sampling takes a weighted average.
Let q(θ) be a density from which it is easy to obtain random draws θ⁽ˢ⁾, s = 1, . . . , S. Then q(θ) is called the importance function, and the importance sampling estimator can be defined as follows: the function
ĝ_S = ∑ₛ w(θ⁽ˢ⁾)g(θ⁽ˢ⁾) / ∑ₛ w(θ⁽ˢ⁾), where w(θ⁽ˢ⁾) = p(θ = θ⁽ˢ⁾|y)/q(θ = θ⁽ˢ⁾) and the sums run over s = 1, . . . , S,
converges to E[g(θ)|y] as S → ∞.
In fact, w(θ⁽ˢ⁾) can also be computed as w(θ⁽ˢ⁾) = p*(θ|y)/q*(θ|y), where the starred densities are proportional to the original ones.
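A sketch of the self-normalized estimator, with an invented target (a N(1, 1) posterior known only up to a constant) and a wider N(0, 2²) importance function; the true value of E[θ|y] is 1:

```python
import math
import random

random.seed(4)

def p_unnorm(theta):
    """Unnormalized target p(theta|y): a N(1, 1) density."""
    return math.exp(-0.5 * (theta - 1.0) ** 2)

def q_dens(theta):
    """Importance function q: the N(0, 2^2) density."""
    return math.exp(-0.5 * (theta / 2.0) ** 2) / (2.0 * math.sqrt(2 * math.pi))

S = 100_000
draws = [random.gauss(0.0, 2.0) for _ in range(S)]
weights = [p_unnorm(t) / q_dens(t) for t in draws]

# self-normalized importance sampling estimate of E[theta | y]
g_hat = sum(w * t for w, t in zip(weights, draws)) / sum(weights)
```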
For more information and details about Markov chain Monte Carlo methods and their application,
the reader is referred to [Chen00], [Gilk95], [Berg05] and [Kend05].