Submission of the student's project is authorized:

              Rubén Salgado Fernández




        THE PROJECT DIRECTOR

              Carlos Maté Jiménez




Signed:                            Date: 12/06/2007




APPROVAL OF THE PROJECT COORDINATOR

           Claudia Meseguer Velasco




Signed:                            Date: 12/06/2007
UNIVERSIDAD PONTIFICIA DE COMILLAS


        ESCUELA TÉCNICA SUPERIOR DE INGENIERÍA (ICAI)


           INGENIERO EN ORGANIZACIÓN INDUSTRIAL




         FINAL DEGREE PROJECT (PROYECTO FIN DE CARRERA)


    Bayesian Regression System
      for Interval-Valued Data.
Application to the Spanish Continuous
             Stock Market




                          AUTHOR: Salgado Fernández, Rubén

                                      MADRID, June 2007
Acknowledgements

Firstly, I would like to thank my director, Carlos Maté Jiménez, PhD, for giving me the chance to
carry out this project. With him, I have learnt not only about Statistics and research, but also how
to enjoy them.


   Special thanks to my parents. Their love and all they have taught me in this life have made me
the person I am today.


   Thanks to my brothers, my sister and the rest of my family for their support and for the time I
stole from them.


   Thanks to Charo for putting up with my bad moods in the bad moments, for supporting me and
for giving me the inspiration to go ahead.



                                                                                    Madrid, June 2007




Resumen

In recent years, Bayesian methods have spread and have been used successfully in many varied
fields such as marketing, medicine, engineering, econometrics or financial markets. The main
characteristic that makes Bayesian data analysis (ANBAD, from its Spanish initials) stand out from
other alternatives is that it takes into account not only the objective information coming from the
data of the event under study, but also the knowledge available prior to it. The benefits obtained
from this approach are manifold since, the greater the knowledge of the situation, the more reliably
decisions can be taken and the more accurate they will be. But it has not always been all advantages.
Until a few years ago, Bayesian data analysis presented a series of difficulties that limited its
development by researchers. Although the Bayesian methodology has existed as such for quite some
time, it did not begin to be employed in a generalized way until the 1990s. This expansion has been
driven largely by advances in computing and by the improvement and refinement of different
calculation methods, such as Markov chain Monte Carlo methods.


    In particular, this methodology has proven extraordinarily useful in its application to the widely
adopted regression models. In practice, situations frequently arise in which the relationship between
two quantitative variables needs to be analysed. The two fundamental objectives of this analysis
are, on the one hand, to determine whether those variables are associated and in what direction the
association goes (that is, whether the values of one of the variables tend to increase, or decrease, as
the values of the other increase); and, on the other hand, to study whether the values of one variable
can be used to predict the value of the other. A regression model tries to provide information about
one or several events through their relationship with the behaviour of others. The Bayesian
methodology makes it possible to incorporate the researcher's knowledge into the analysis, making
the results more precise, since the results are not restricted to the data of one particular sample.


    On the other hand, it is beginning to be accepted that, in the field of statistics, the 21st century
will be the century of the "statistics of knowledge", in contrast to the previous one, which was that
of the "statistics of data". The basic concept for building such statistics is the symbolic datum, and
statistical methods have been developed for some types of symbolic data.


    Nowadays, the demands of the market and of the world in general keep growing. This implies
an ever greater desire to predict the occurrence of an event, or to control the behaviour of certain
quantities with the smallest possible error, in order to offer better products and to obtain greater
profits, scientific advances and better results.


    Against this background, this project tries to respond to those needs by providing extensive
documentation on several of the most widely used and most advanced techniques available today,
namely Bayesian data analysis, regression models and symbolic data, and by proposing different
regression techniques. Likewise, a tool is developed that allows all the acquired knowledge to be
put into practice. This application is aimed at the Spanish stock market and allows the user to
operate it in a simple and friendly way. For the development of this tool, one of the newest
languages with the greatest projection of the moment is employed: R.


    It is, therefore, a project that combines the newest techniques with the greatest projection, both
in theory, such as Bayesian regression applied to interval-valued data, and in practice, such as the
use of the R language.
Abstract

In recent years, Bayesian methods have spread and have been successfully used in many different
fields such as Marketing, Medicine, Engineering, Econometrics or Financial Markets. The main
characteristic that makes Bayesian Data Analysis (BADAN) stand out compared with other
alternatives is that it takes into account not only the objective information coming from the analysed
event, but also the knowledge available before it. The benefits obtained from this approach are
considerable, since the more knowledge of the situation one has, the more reliable and accurate the
decisions that can be taken. However, although the Bayesian methodology was established long ago,
it was not applied in a generalized way until the 1990s because of its computational difficulties. Its
expansion has been mainly favoured by the advances in that field and the improvement of different
calculation methods, such as Markov chain Monte Carlo methods.


    In particular, the Bayesian methodology has proven extraordinarily useful when applied to
regression models, which are widely adopted. In real life it is often necessary to analyse the
relationship between two quantitative variables. The two main objectives of this analysis are, on
the one hand, to determine whether such variables are associated and in what sense that association
comes about (that is, whether the values of one of the variables tend to rise, or to decrease, when
the values of the other increase); and, on the other hand, to study whether the values of one variable
can be used to predict the value of the other. A regression model offers information about one or
more events through their relationship with the behaviour of others. With the Bayesian methodology
it is possible to add the researcher's knowledge to the analysis, thus making the results more
accurate, since they are not confined to the data of one particular sample.


    On the other hand, in the field of Statistics it is more and more accepted that the 21st century
will be the century of the "Statistics of knowledge", in contrast to the last one, which was the one
of the "Statistics of data". The basic concept on which such Statistics is built is symbolic data;
furthermore, statistical methods have been developed for some types of symbolic data.


    Nowadays, the requirements of the market, and the demands of the world in general, keep
growing. This implies a continuously increasing desire to predict the occurrence of an event, or to
control the behaviour of certain quantities with the minimum error, with the aim of offering better
products and obtaining greater benefits, scientific improvements and better outcomes.


    Within this framework, this project tries to respond to such needs by offering extensive
documentation on several of the most widely applied and leading techniques of today, such as
Bayesian data analysis, regression models and symbolic data, and by suggesting different regression
techniques. In addition, a tool has been developed that allows the reader to put all the acquired
knowledge into practice. This application is aimed at the Spanish Continuous Stock Market and lets
the user operate it easily. As far as the development of this tool is concerned, one of the most
innovative languages with the greatest projection of the moment has been used: R.


    The project is therefore a combination of the most innovative techniques with the greatest
projection, both in theoretical questions, such as Bayesian regression applied to interval-valued
data, and in practical questions, such as the use of the R language.
List of Figures

 1.1   Project Work Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       5

 2.1   Univariate Normal Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        14

 6.1   Interval time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   62

 7.1   Classical Regression with single values in training set . . . . . . . . . . . . . . .     73
 7.2   Classical Regression with single values in testing set . . . . . . . . . . . . . . . .    74
 7.3   Classical Regression with interval-valued data . . . . . . . . . . . . . . . . . . .      75
 7.4   Centre Method (2000) in training set . . . . . . . . . . . . . . . . . . . . . . . . . .     75
 7.5   Centre Method (2000) in testing set . . . . . . . . . . . . . . . . . . . . . . . . . .      76
 7.6   Centre and Radius Method in training set . . . . . . . . . . . . . . . . . . . . . . .       77
 7.7   Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . . . . . .      78
 7.8   Bayesian Centre and Radius Method in testing set . . . . . . . . . . . . . . . . .        80
 7.9   Classical Regression with single values in training set . . . . . . . . . . . . . . .     81
 7.10 Classical Regression with single values in testing set . . . . . . . . . . . . . . . .     81
 7.11 Centre Method (2000) in training set . . . . . . . . . . . . . . . . . . . . . . . . . .      82
 7.12 Centre Method (2000) in testing set . . . . . . . . . . . . . . . . . . . . . . . . . .       83
 7.13 Centre and Radius Method in training set . . . . . . . . . . . . . . . . . . . . . . .        85
 7.14 Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . . . . . . .       85
 7.15 Bayesian Centre and Radius Method in testing set . . . . . . . . . . . . . . . . . . .        87

 9.1   BARESIMDA MDI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

 10.1 Interface between BARESIMDA and R . . . . . . . . . . . . . . . . . . . . . . . . 104
 10.2 Interface between BARESIMDA and Excel . . . . . . . . . . . . . . . . . . . . . . 105
 10.3 Logical Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105



  C.1 Load Data Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
  C.2 Select File Dialog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
  C.3 Display Loaded Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
  C.4 Define New Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
  C.5 Enter New Variable Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
  C.6 Display New Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
  C.7 Edit Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
  C.8 Select Variable to Be Edited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
  C.9 Enter New Name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
  C.10 Confirmation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
  C.11 New Row data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
  C.12 Type Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
  C.13 Look And Feel Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
  C.14 Look And Feel Styles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
  C.15 New Look And Feel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
  C.16 Type Of User Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
  C.17 Select Type Of User . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
  C.18 Non-Symbolic Classical Regression Menu . . . . . . . . . . . . . . . . . . . . . . . 131
  C.19 Select Non-Symbolic Variables in Simple Regression . . . . . . . . . . . . . . . . . 131
  C.20 Brief Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
  C.21 Analysis Options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . 132
  C.22 New Prediction in Non-Symbolic Classical Simple Regression . . . . . . . . . . . . 133
  C.23 Graphics options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . 133
  C.24 Save options in Non-Symbolic Classical Simple Regression . . . . . . . . . . . . . . 134
  C.25 Non-Symbolic Classical Multiple Regression Menu . . . . . . . . . . . . . . . . . . 134
  C.26 Select Variables in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . 134
  C.27 Analysis options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . 135
  C.28 Graphics options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . 135
  C.29 Save options in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . . . 136
  C.30 Intercept in Non-Symbolic Classical Multiple Regression . . . . . . . . . . . . . . . 136
  C.31 Non-Symbolic Bayesian Simple Regression Menu . . . . . . . . . . . . . . . . . . . 136
  C.32 Select Variables in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . . 137
  C.33 Analysis Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . 137
  C.34 Graphics Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . 138

  C.35 Save Options in Non-Symbolic Bayesian Simple Regression . . . . . . . . . . . . . 138
  C.36 Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regres-
       sion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
  C.37 Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression . . 139
  C.38 Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression . . . 139
  C.39 Non-Symbolic Bayesian Multiple Regression menu . . . . . . . . . . . . . . . . . . 139
  C.40 Analysis Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . 140
  C.41 Graphics Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . 140
  C.42 Save Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . . . 140
  C.43 Model Options in Non-Symbolic Bayesian Multiple Regression . . . . . . . . . . . 141
  C.44 Symbolic Classical Simple Regression Menu . . . . . . . . . . . . . . . . . . . . . 141
  C.45 Select Variables in Symbolic Classical Simple Regression . . . . . . . . . . . . . . . 141
  C.46 Analysis Options in Symbolic Classical Simple Regression . . . . . . . . . . . . . . 142
  C.47 Graphics Options in Symbolic Classical Simple Regression . . . . . . . . . . . . . . 142
  C.48 Symbolic Classical Multiple Regression Menu . . . . . . . . . . . . . . . . . . . . . 143
  C.49 Select Variables in Symbolic Classical Multiple Regression . . . . . . . . . . . . . . 143
  C.50 Analysis Options in Symbolic Classical Multiple Regression . . . . . . . . . . . . . 144
  C.51 Graphics Options in Symbolic Classical Multiple Regression . . . . . . . . . . . . . 144
  C.52 Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . . . . . . . . . . . . 145
  C.53 Select Variables in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . 145
  C.54 Analysis Options in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . 145
  C.55 Graphics Options in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . 146
  C.56 Model Options in Symbolic Bayesian Simple Regression . . . . . . . . . . . . . . . 147
  C.57 Symbolic Bayesian Multiple Regression Menu . . . . . . . . . . . . . . . . . . . . 147
  C.58 Select Variables in Symbolic Bayesian Multiple Regression . . . . . . . . . . . . . . 147
  C.59 Graphics Options in Symbolic Bayesian Multiple Regression . . . . . . . . . . . . . 148
List of Tables

 2.1   Distributions in Bayesian Data Analysis . . . . . . . . . . . . . . . . . . . . . . . .        10
 2.2   Comparison between Univariate and Multivariate Normal . . . . . . . . . . . . . . .            15
 2.3   Conjugate distributions for other likelihood distributions . . . . . . . . . . . . . . .       16

 4.1   Bayes Factor Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      29
 4.2   Sensitivity Summary I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        33
 4.3   Sensitivity Summary II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       34

 5.1   Multiple and Simple Regression Comparison . . . . . . . . . . . . . . . . . . . . .            40
 5.2   Sensitivity analysis of parameter β . . . . . . . . . . . . . . . . . . . . . . . . . . .      45
 5.3   Sensitivity analysis of parameter σ² . . . . . . . . . . . . . . . . . . . . . . . . .    46
 5.4   Classical and Bayesian regression comparison . . . . . . . . . . . . . . . . . . . . .         48
 5.5   Main Prior Distributions Summary . . . . . . . . . . . . . . . . . . . . . . . . . . .         57
 5.6   Main Posterior Distributions Summary . . . . . . . . . . . . . . . . . . . . . . . . .         58
 5.7   Prior and Posterior Parameters Summary . . . . . . . . . . . . . . . . . . . . . . . .         59
 5.8   Main Posterior Predictive Distributions Summary . . . . . . . . . . . . . . . . . . .          60

 6.1   Multivalued Data Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .         63
 6.2   Modal-multivalued Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .          64

 7.1   Error Measures for Classical Regression with single values . . . . . . . . . . . . . .         74
 7.2   Error Measure for Centre Method (2000) . . . . . . . . . . . . . . . . . . . . . . . .         76
 7.3   Error Measure for Centre Method (2002) . . . . . . . . . . . . . . . . . . . . . . . .         77
 7.4   Error Measures for Centre and Radius Method . . . . . . . . . . . . . . . . . . . . .          78
 7.5   Error Measures in Bayesian Centre and Radius Method . . . . . . . . . . . . . . . .            80
 7.6   Error Measures for Classical Regression with single values . . . . . . . . . . . . . .         82




  7.7   Error Measure for Centre Method (2000) . . . . . . . . . . . . . . . . . . . . . . . .   83
  7.8   Error Measure for Centre Method (2002) . . . . . . . . . . . . . . . . . . . . . . . .   84
  7.9   Error Measures for Centre and Radius Method . . . . . . . . . . . . . . . . . . . . .    84
  7.10 Error Measures in Bayesian Centre and Radius Method . . . . . . . . . . . . . . . .       86

  11.1 Estimated material costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
  11.2 Amortization Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
  11.3 Summarized Budget . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Contents

Acknowledgements                                                                                          i

Resumen                                                                                                  ii

Abstract                                                                                                 iv

List of Figures                                                                                          vi

List of Tables                                                                                           x

Contents                                                                                                xvi

1 Introduction                                                                                           1
   1.1     Project Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      1
   1.2     Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    4
   1.3     Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       4

2 Bayesian Data Analysis                                                                                 6
   2.1     What is Bayesian Data Analysis? . . . . . . . . . . . . . . . . . . . . . . . . . . .         6
   2.2     Bayesian Analysis for Normal and other distributions . . . . . . . . . . . . . . . . .       10
           2.2.1   Univariate Normal distribution . . . . . . . . . . . . . . . . . . . . . . . . .     10
           2.2.2   Multivariate Normal distribution . . . . . . . . . . . . . . . . . . . . . . . .     13
           2.2.3   Other distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    15
   2.3     Hierarchical Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    16
   2.4     Nonparametric Bayesian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       18

3 Posterior Simulation                                                                                  20
   3.1     Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   20


   3.2   Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    21
   3.3   Monte Carlo Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      23
   3.4   Gibbs sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    24
   3.5   Metropolis-Hastings sampler and its special cases . . . . . . . . . . . . . . . . . . .      25
         3.5.1   Metropolis-Hastings sampler . . . . . . . . . . . . . . . . . . . . . . . . . .      25
         3.5.2   Metropolis sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     26
         3.5.3   Random-walk sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        26
         3.5.4   Independence sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       26
   3.6   Importance sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      27

4 Sensitivity Analysis                                                                                28
   4.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   28
   4.2   Bayes Factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     29
   4.3   Alternative Stats to Bayes Factor . . . . . . . . . . . . . . . . . . . . . . . . . . . .    30
   4.4   Highest Posterior Density Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . .    31
   4.5   Model Comparison Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .         32

5 Regression Analysis                                                                                 35
   5.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   35
   5.2   Classical Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     36
   5.3   The Bayesian Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      39
   5.4   Normal Linear Regression Model subject to inequality constraints . . . . . . . . . .         48
   5.5   Normal Linear Regression Model with Independent Parameters . . . . . . . . . . . .           49
   5.6   Normal Linear Regression Model with Heteroscedasticity and Correlation . . . . . .           51
         5.6.1   Heteroscedasticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     53
         5.6.2   Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    54
   5.7   Models Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       56

6 Symbolic Data                                                                                       61
   6.1   What is symbolic data analysis? . . . . . . . . . . . . . . . . . . . . . . . . . . . .      61
   6.2   Interval-valued variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    65
   6.3   Classical regression analysis with Interval-valued data . . . . . . . . . . . . . . . . .    67
   6.4   Bayesian regression analysis with Interval-valued data . . . . . . . . . . . . . . . .       70

7 Results                                                                                              72
   7.1   Spanish Continuous Stock Market data sets . . . . . . . . . . . . . . . . . . . . . .         72
   7.2   Direct Relation between Variables . . . . . . . . . . . . . . . . . . . . . . . . . . .       72
   7.3   Uncorrelated Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      79

8 A Guide to Statistical Software Today                                                                88
   8.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    88
   8.2   Commercial Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       89
         8.2.1   The SAS System for Statistical Analysis . . . . . . . . . . . . . . . . . . . .       89
         8.2.2   Minitab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     90
         8.2.3   BMDP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        90
         8.2.4   SPSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      91
         8.2.5   S-PLUS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      91
         8.2.6   Others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    92
   8.3   Public License Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       93
         8.3.1   R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     93
         8.3.2   BUGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      93
   8.4   Analysis Packages with Statistical Libraries . . . . . . . . . . . . . . . . . . . . . .      94
         8.4.1   Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      94
         8.4.2   Mathematica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       95
         8.4.3   Others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    95
   8.5   Some General Languages with Statistical Libraries . . . . . . . . . . . . . . . . . .         95
         8.5.1   Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    95
         8.5.2   C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     96
   8.6   Developed Software Tool: BARESIMDA . . . . . . . . . . . . . . . . . . . . . . .              96

9 Software Requirements Specification                                                                   98
   9.1   Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     98
   9.2   Intended Audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       98
   9.3   Functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     99
          9.3.1   Classical Regression with crisp data . . . . . . . . . . . . . . . . . . . . . .      99
          9.3.2   Classical Regression with interval-valued data . . . . . . . . . . . . . . . .        99
          9.3.3   Bayesian Regression with crisp data . . . . . . . . . . . . . . . . . . . . . . 100
          9.3.4   Bayesian Regression with interval-valued data . . . . . . . . . . . . . . . . 100

         9.3.5   Data Manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
         9.3.6   Portability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
         9.3.7   Maintainability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
   9.4   External Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
         9.4.1   User Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
         9.4.2   Software Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

10 Software Architecture Study                                                                      103
   10.1 Hardware/ Software Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
   10.2 Logical Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

11 Project Budget                                                                                   106
   11.1 Engineering Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
   11.2 Investment and Elements Costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
         11.2.1 Summarized Budget . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

12 Conclusions                                                                                      110
   12.1 Bayesian Regression applied to Symbolic Data . . . . . . . . . . . . . . . . . . . . 110
   12.2 BARESIMDA Software Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
   12.3 Future Developments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
   12.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

A Probability Distributions                                                                         113
   A.1 Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
         A.1.1 Binomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
         A.1.2 Geometric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
         A.1.3 Poisson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
   A.2 Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
         A.2.1 Uniform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
         A.2.2 Univariate Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
         A.2.3 Exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
         A.2.4 Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
          A.2.5 Inverse-Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
          A.2.6 Chi-square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
          A.2.7 Inverse-Chi-square and Inverse-Scaled Chi-Square . . . . . . . . . . . . . . 118

         A.2.8 Univariate Student-t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
        A.2.9 Beta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
        A.2.10 Multivariate Normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
         A.2.11 Multivariate Student-t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
         A.2.12 Wishart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
         A.2.13 Inverse-Wishart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

B Installation Guide                                                                              122
   B.1 From source folder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
   B.2 From installer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

C User’s Guide                                                                                    123
   C.1 Data Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
         C.1.1   Loading an Excel file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
        C.1.2   Defining a new variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
        C.1.3   Editing an existing variable . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
        C.1.4   Deleting an existing variable . . . . . . . . . . . . . . . . . . . . . . . . . . 127
        C.1.5   Typing in a new data row . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
        C.1.6   Deleting an existing data row . . . . . . . . . . . . . . . . . . . . . . . . . . 128
        C.1.7   Modifying an existing data . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
   C.2 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
         C.2.1   Setting the look & feel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
        C.2.2   Selecting the type of user . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
   C.3 Non Symbolic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
        C.3.1   Simple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 131
        C.3.2   Multiple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . 133
        C.3.3   Simple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 136
        C.3.4   Multiple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . 139
   C.4 Symbolic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
        C.4.1   Simple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 140
        C.4.2   Multiple Classical Regression . . . . . . . . . . . . . . . . . . . . . . . . . 143
        C.4.3   Simple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 144
        C.4.4   Multiple Bayesian Regression . . . . . . . . . . . . . . . . . . . . . . . . . 146

D Obtaining and Installing R                                                                     149
   D.1 Binary distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
   D.2 Installation from source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
   D.3 Package installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

E Obtaining and installing Java Runtime Environment                                              152
   E.1 Microsoft Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
   E.2 Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
        E.2.1   Installation of Self-Extracting Binary . . . . . . . . . . . . . . . . . . . . . 153
        E.2.2   Installation of RPM File . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
   E.3 UNIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

Bibliography                                                                                     157
Chapter 1

Introduction

1.1 Project Motivation

Statistics is primarily concerned with the analysis of data, either to assist in arriving at an improved
understanding of some underlying mechanism, or as a means for making informed rational decisions.
Both these aspects generally involve some degree of uncertainty. The statistician's task is then to
explain such uncertainty, and to reduce it to the extent possible. Problems of this type occur
throughout the physical, social and other sciences. One way of looking at statistics stems
from the perception that, ultimately, probability is the only appropriate way to describe and system-
atically deal with uncertainty, as if it were the language for the logic of uncertainty. Thus, inference
statements are framed precisely as probability statements on the possible values of the unknown quan-
tities of interest (parameters or future observations), conditional on the observed, available data. The
scientific discipline based on this understanding is called Bayesian Statistics. Moreover, the increas-
ingly needed and sophisticated models, often hierarchical, required to describe available data are
typically too complex for conventional statistics to handle, but can be tackled within Bayesian Statis-
tics. In principle, Bayesian Statistics is designed to handle all situations where uncertainty is found.
Since some uncertainty is present in most aspects of life, it may be argued that Bayesian Statistics
should be appreciated and used by everyone; it is the logic of contemporary society and science.
According to [Rupp04], whether to apply Bayesian methodology is no longer under discussion; the
question is when it has to be applied.


    Bayesian methods have matured and improved in several ways during the last fifteen years. Indeed,
they are becoming increasingly attractive to researchers, and successful applications of Bayesian




data analysis have appeared in many different fields, including Actuarial Science, Biometrics,
Finance, Market Research, Marketing, Medicine, Engineering or Social Science. It is not only that
the Bayesian approach produces appropriate answers to many current important problems, but also
there is an evident need for it, given the inapplicability of conventional statistics to many of them.


    Thus, the main characteristic offered by Bayesian data analysis is the possibility of incorporating
the researcher's knowledge about the problem at hand. The more precise the prior knowledge, the
better and more reliable the results obtained. But Bayesian Statistics was held back until the
mid-1990s by its computational complexity. Since then, it has expanded greatly, favoured by the
development and improvement of different computational methods in this field, such as Markov chain
Monte Carlo.


    This methodology has proven extremely useful in its application to regression models, which are
widely accepted. Let us remember that the general purpose of regression analysis is to learn more
about the relationship between several independent or predictor variables and a dependent or criterion
variable. Bayesian methodology lets the researcher incorporate his or her knowledge into the analysis,
improving the results, since they no longer depend only on the sampled data.


    On the other hand, increasingly, datasets are so large that they must be summarized in some fash-
ion so that the resulting summary dataset is of a more manageable size, while still retaining as much
knowledge inherent to the entire dataset as possible. One consequence of this situation is that data
may no longer be formatted as single values such as is the case for classical data, but rather may be
represented by lists, intervals, distributions, and the like. These summarized data are examples of
symbolic data. This kind of data also lets us better represent the knowledge and beliefs we hold in
our minds, which are hard to capture with classical Statistics. According to [Bill02], this responds
to the current need to move from a Statistics of data in the past century to a Statistics of knowledge
in the 21st century.


    Market and demand requirements are increasing continuously over time. This implies a need for
better and more accurate methods to forecast new situations and to control different quantities with
the minimum error, in order to supply better products and to obtain higher incomes, scientific
advances and better results.


    Against this backdrop, this project is intended to respond to those requirements by providing a



wide and exhaustive documentation on some of the most used and advanced techniques of today,
including Bayesian data analysis, regression models and symbolic data. Different examples related
to the Spanish Continuous Stock Market are explained throughout this document, making clear
the advantages of employing the described methods. Likewise, a software tool with a user-friendly
graphical interface has been developed to practise and check all the acquired knowledge.


    Therefore, this is a project that combines the most recent techniques with major future implications
in theoretical issues, such as Bayesian regression applied to interval-valued data, with a technological
part dealing with the problem of interconnecting two software programs: one used to display the
graphical user interface and the other employed to perform the computations.


    Regarding more personal motivations, several factors were taken into consideration by the author
when accepting this project:

    • A great challenge: it is an ambitious project with a high technical complexity related to both its
       theoretical basis and its technological basis. This represents a very good letter of introduction
       for entering the labour market.

    • A good timeline: this project was designed to be finished before June 2007, which means being
       able to finish the degree in June and enter the labour market in September.

    • Some very interesting issues: on one hand, it deals with the ever-present need of forecasting
       and modelling observations and situations in order to get the best possible results. On the other
       hand, it focuses on the Stock Market, which matches my personal interests.

    • A new programming language: the possibility of learning in depth a new and relatively recent
       programming language, such as R, was an extra motivating factor.

    • The project director: Carlos Maté is considered a demanding and very competent director by
       the students of the university.

    • A research scholarship: the possibility of working in the Industrial Organization department
       of the University, learning from people such as the director mentioned above and other highly
       recognized professors, was another important factor.






1.2 Objectives

This project aims to achieve the following objectives.

    • To provide wide and rigorous documentation on the following issues: Bayesian data anal-
       ysis, regression models and symbolic data. From this basis, documentation on Bayesian
       regression will be developed, as well as the software tool designed.

    • To build a software tool in order to fit Bayesian regression models to interval-valued data,
       finding out the most efficient way to design the graphical user interface. This must be as user-
       friendly as possible.

    • To find out the most efficient way to offer that system to future clients from the tests carried out
       with the application.

    • To design a survey to measure the quality of the tool and users’ satisfaction.

    • Possibly, to write an article for a scientific journal.


1.3 Methodology

As the title of the project indicates, the ultimate purpose is the development of an application aimed
at stock markets and based on a Bayesian regression system; therefore, some previous knowledge is
required.


    The first stage is familiarization with Bayesian data analysis, regression models under the
Bayesian methodology, and symbolic data.


    Within this phase, Bayesian data analysis will be studied first, trying to synthesize its most
important elements. Special attention will be given to posterior simulation and computational
algorithms. Then, regression models will be treated, quickly reviewing the classical approach before
delving into the different Bayesian regression models, applying a great part of what was explained
about the Bayesian methodology. Finally, this first stage will be completed with the application to
symbolic data, paying special attention to interval-valued data.


    The second stage concerns the development of the software application, employing an incre-
mental methodology for programming and testing iterative prototypes. This methodology has been



considered the most suitable for this project since it will let us introduce successive models into the
application.


    The following figure shows the structure of the work packages the project is divided into:




                                 Figure 1.1: Project Work Packages




Chapter 2

Bayesian Data Analysis

2.1     What is Bayesian Data Analysis?

Statistics can be defined as the discipline that provides us with a methodology to collect, organize,
summarize and analyze a set of data.


    Regarding data analysis, it can be divided into two modes: exploratory data analysis and
confirmatory data analysis. The former is used to represent, describe and analyze a set of data through
simple methods in the first stages of statistical analysis. The latter is applied to make inferences from
data, based on probability models.


    In the same way, confirmatory data analysis is divided into two branches depending on the adopted
approach. The first one, known as frequentist, makes inferences from sampled data through classical
methods. The second branch, known as Bayesian, goes further in the analysis and adds to those data
the prior knowledge that the researcher has about the problem being treated. Since it is not worthwhile
to explain the whole frequentist approach here, a more extended review of the classical methods
related to it can be found in [Mont02].


                         Data Analysis
                             • Exploratory
                             • Confirmatory
                                 – Frequentist
                                 – Bayesian




    As far as Bayesian analysis is concerned and according to [Gelm04], the process can be divided
into the following three steps:

    • To set up a full probability model, through a joint probability distribution for all observable and
      unobservable quantities in a problem.

    • To condition on observed data, obtaining the posterior distribution.

    • Finally, to evaluate the fit of the model and the implications of the resulting posterior distribu-
      tion.

    f(θ, y), known as the joint probability distribution, is obtained by means of


                             f(θ, y) = f(y|θ) f(θ)                                                 (2.1)

where y is the set of sampled data. (When there are several parameters, θ denotes the parameter
vector and the same expressions apply throughout.) So this distribution is the product of two densities
that are referred to as the sampling distribution f(y|θ) and the prior distribution f(θ).


    The sampling distribution, as its name suggests, is the probability model that the researcher as-
signs to the quantity (or set of quantities) to be studied after the data have been observed. Here,
an important problem arises in relation to the parametric approach, due to the fact that the probability
model that the researcher chooses may not be adequate. The nonparametric approach overcomes this
inconvenience, as will be seen later.


    When y is considered fixed, so that the sampling distribution is a function of θ, it is called the
likelihood function and obeys the likelihood principle, which states that, for a given sample of data,
any two probability models f(y|θ) with the same likelihood function yield the same inference for θ.


    The prior distribution does not depend upon the data. Accordingly, it contains the information
and the knowledge that the researcher has about the situation or problem to be solved. When there
is no significant previous population from which the researcher can draw knowledge, that is, when
the researcher has no prior information about the problem, a non-informative prior distribution
must be used in the analysis in order to let the data speak for themselves. Hence, it is assumed that
the prior knowledge will have very little influence on the results. But most non-informative priors




are "improper" in that they do not integrate to 1, and this fact can cause problems. In these cases
it is necessary to make sure that the posterior distribution is proper. Another possibility is to use an
informative prior distribution but with an insignificant weight (around zero) associated with it.


    Though the prior distribution can take any form, it is common to choose particular classes of
priors that make computation and interpretation easier. These are the conjugate priors. A conjugate
prior distribution is one which, when combined with the likelihood function, gives a posterior distri-
bution that falls in the same class of distributions as the prior. Furthermore, according to [Koop03],
a natural conjugate prior has the additional property of having the same form as the likelihood. But
it is not always possible to find this kind of distribution, and the researcher must handle many distri-
butions to be able to express his prior knowledge about the problem. This is another handicap that
the nonparametric approach reduces.
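
    As an illustration of conjugacy, consider the Beta-Binomial pair in R — a minimal sketch with
assumed numbers, unrelated to the project's data: a Beta prior combined with a Binomial likelihood
yields a Beta posterior, so the update reduces to simple arithmetic on the hyper-parameters.

    # Conjugacy with the Beta-Binomial pair: a Beta(a, b) prior combined with
    # a Binomial likelihood gives a Beta(a + successes, b + failures) posterior,
    # i.e. a distribution in the same family as the prior.
    a <- 2; b <- 2            # prior hyper-parameters (assumed for the example)
    n <- 20; successes <- 14  # observed data (assumed)
    post_a <- a + successes
    post_b <- b + (n - successes)

    # Posterior summaries follow directly from the known Beta form
    post_mean <- post_a / (post_a + post_b)
    cat("Posterior: Beta(", post_a, ",", post_b, "), mean =", round(post_mean, 3), "\n")

    # Plot prior (dashed) and posterior (solid) to see how the data update the prior
    curve(dbeta(x, a, b), from = 0, to = 1, ylim = c(0, 5), ylab = "density", lty = 2)
    curve(dbeta(x, post_a, post_b), from = 0, to = 1, add = TRUE)
    legend("topleft", c("prior", "posterior"), lty = c(2, 1))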


    In relation to the prior, what distribution should be chosen? There are three different points of
view corresponding to different styles of Bayesians:

    • Classical Bayesians consider that the prior is a necessary evil and priors that interject the least
        information possible should be chosen.

    • Modern parametric Bayesians consider that the prior is a useful convenience and that priors
        with desirable properties, such as conjugacy, should be chosen. They remark that, given a
        distributional choice, prior hyper-parameters that interject the least information possible
        should be chosen.

    • Subjective Bayesians give essential importance to the prior, in the sense that they consider it a
        summary of old beliefs. So prior distributions based on previous knowledge (either the results
        of earlier studies or non-scientific opinion) should be chosen.

    Returning to the Bayesian data analysis process, simply conditioning on the observed data y and
applying Bayes' Theorem, the posterior distribution, namely f(θ|y), yields:

                   f(θ|y) = f(θ, y) / f(y) = f(θ) f(y|θ) / f(y)                                    (2.2)

where

                   f(y) = ∫₀^∞ f(θ) f(y|θ) dθ                                                      (2.3)

                                                           8
2. Bayesian Data Analysis


is known as the prior predictive distribution, since it is not conditional upon a previous observation of
the process and is applied to an observable quantity.


    An equivalent form of the posterior distribution displayed above omits the prior predictive distri-
bution, since it does not involve θ and the interest lies in learning about θ. So, with y fixed, it can
be said that the posterior distribution is proportional to the joint probability distribution f(θ, y).


    Once the posterior distribution is calculated, some kind of summary measure will be required to
estimate the uncertainty about the parameter θ. This is due to the fact that the posterior distribution
is a high-dimensional object whose direct use is not practical for a problem. The measure that
summarizes the posterior distribution can be the posterior mean, mode, median or variance, among
others. The choice will depend on the requirements of the problem. So the posterior distribution
has great importance, since it lets the researcher manage the uncertainty about θ and provides
information about it taking into account both his prior knowledge and the data collected by sampling
on that parameter.


    According to [Maté06], it is not difficult to deduce that posterior inference will coincide with
non-Bayesian inference as long as the estimate which the researcher gives to the parameter θ is the
same as the one resulting from the sampling.


    Once the data y have been observed, a new unknown observable quantity ỹ can be predicted for
the same process through the posterior predictive distribution f(ỹ|y):


          f(ỹ|y) = ∫ f(ỹ, θ|y) dθ = ∫ f(ỹ|θ, y) f(θ|y) dθ = ∫ f(ỹ|θ) f(θ|y) dθ          (2.4)

    To sum up, the basic idea is to update the prior distribution f (θ) through Bayes’ theorem by
observing the data y in order to get a posterior distribution f (θ|y). Then a summary measure or a
prediction for new data can be obtained from f (θ|y). Table 2.1 reflects what has been said.








   Distribution    Expression              Information Required                       Result

   Likelihood      f(y|θ)                  Data distribution                          f(y|θ)

   Prior           f(θ)                    Researcher's knowledge of the parameter    f(θ)

   Joint           f(y|θ)f(θ)              Likelihood and prior distributions         f(θ, y)

   Posterior       f(θ)f(y|θ)              Prior and joint distributions              f(θ|y)

   Predictive      ∫ f(ỹ|θ)f(θ|y)dθ        Posterior distribution and new data        f(ỹ|y)


                          Table 2.1: Distributions in Bayesian Data Analysis



2.2 Bayesian Analysis for Normal and other distributions

2.2.1 Univariate Normal distribution

The basic model to be discussed concerns an observable variable y, normally distributed with unknown
mean µ and unknown variance σ²:


                              y|µ, σ² ~ N(µ, σ²)                                (2.5)

    As can be seen in Appendix A, the likelihood function for a single observation is

                  f(y|µ, σ²) ∝ (σ²)^(−1/2) exp( −(1/(2σ²)) (y − µ)² )           (2.6)
    This means that the likelihood function is proportional to a Normal distribution, omitting those
terms that are constant.


    Now let us consider that we have n independent observations y1, y2, ..., yn. According to the
previous section, the parameters to be estimated are


                              θ = (θ1, θ2) = (µ, σ²)                            (2.7)

    A full probability model must be set up through a joint probability distribution:


                  f(θ, (y1, y2, ..., yn)) = f(θ, y) = f(y|θ)f(θ)                (2.8)

    The likelihood function for a sample of n iid observations in this case is

          f(y|θ) = f(y|µ, σ²) ∝ (σ²)^(−n/2) exp( −(1/(2σ²)) Σ_{i=1}^{n} (yi − µ)² )   (2.9)

    As it was recommended previously, a conjugate prior will be chosen; in fact, it will be a natural
conjugate prior. According to [Gelm04], this likelihood function suggests a conjugate prior distribu-
tion of the form


                                      f (θ) = f (µ, σ 2 ) = f (µ|σ 2 )f (σ 2 )                      (2.10)

where the marginal distribution of σ 2 is the Scaled Inverse-χ2 and the conditional distribution of µ
given σ 2 is Normal (details about these distributions in Appendix A):



                              µ|σ² ~ N(µ0, σ² V0)                               (2.11)
                              σ² ~ Inv-χ²(ν0, s0²)                              (2.12)

    So the joint prior distribution is:


          f(θ) = f(µ, σ²) = f(µ|σ²)f(σ²) ∝ N-Inv-χ²(µ0, s0² V0; ν0, s0²)        (2.13)

    Its four parameters can be identified as the location and scale of µ and the degrees of freedom and
scale of σ 2 , respectively.


    As a natural conjugate prior was employed, the joint posterior distribution will have the same
form as the prior. So, conditioning on the data, and according to Bayes' Theorem, we have:



      f(θ|y) = f(µ, σ²|y) ∝ f(y|µ, σ²)f(µ, σ²) ∝ N-Inv-χ²(µ1, s1² V1; ν1, s1²)        (2.14)

where it can be shown that

              µ1 = (V0^(−1) + n)^(−1) (V0^(−1) µ0 + n ȳ)                        (2.15)
              V1 = (V0^(−1) + n)^(−1)                                           (2.16)
              ν1 = ν0 + n                                                       (2.17)
              ν1 s1² = ν0 s0² + (n − 1)s² + (V0^(−1) n / (V0^(−1) + n)) (ȳ − µ0)²   (2.18)

where ȳ and s² denote the sample mean and the sample variance.

    All these formulae show that Bayesian inference combines prior and sample information.


    The first expression shows that the posterior mean µ1 is a weighted average of the prior mean µ0
and the sample mean ȳ, divided by the sum of their respective weights, which are V0^(−1) and the
sample size n.


    The second expression gives the posterior scale factor V1, which can be seen as a compromise
between the sample size and the weight given to the prior mean.


    The third term indicates that the degrees of freedom of posterior variance are the sum of the prior
degrees of freedom and the sample size. That is, the prior degrees of freedom can be understood as a
fictitious sample size on which the expert’s prior information is based.


    The last expression describes the posterior sum of squared errors as a combination of the prior and
empirical sums of squared errors, plus a term that measures the conflict between prior and sample
information.


    A more detailed explanation of this last step can be found in [Gelm04], [Koop03] or [Cong06].


    From this joint posterior, the conditional and marginal posterior distributions follow:


                              µ|σ², y ~ N(µ1, σ² V1)                            (2.19)
                              σ²|y ~ Inv-χ²(ν1, s1²)                            (2.20)

    If we integrate out σ², the marginal for µ will be a t-distribution (see Appendix A for details):


                              µ|y ~ t_{ν1}(µ1, s1² V1)                          (2.21)




    Let us see an application to the Spanish Stock Market. Suppose that the monthly close values of
the Ibex 35 are normally distributed. Taking the values at which the Spanish index closed during the
first two weeks of January 2006, the sample mean was 10893.29 and the sample standard deviation
was 61.66. The non-Bayesian approach would therefore infer a Normal distribution with this mean
and standard deviation. Now suppose that, had we asked an analyst about the evolution of the Ibex 35
in January, he would have affirmed strongly that it would decrease slightly, that the mean close value
at the end of the month would be around 10870 and that, accordingly, the standard deviation would
be higher, around 100. Then, taking µ0 = 10870, s0 = 100, ν0 = 100 and V0^(−1) = 100, and applying
the previous formulas with n = 10, the posterior parameters would be


        µ1 = (100 + 10)^(−1) (100 × 10870 + 10 × 10893.29) = 10872.12
        V1 = (100 + 10)^(−1) = 0.0091
        ν1 = 100 + 10 = 110
        s1² = (100 × 100² + 9 × 61.66² + (1000/110) × (10893.29 − 10870)²) / 110 = 9446.81,  so s1 = 97.19
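
    As a minimal illustration, this update can be reproduced with a few lines of R; the function below
is a sketch of the Normal-Inverse-χ² update in equations (2.15)-(2.18), and its name, arguments and
the commented data vector are illustrative assumptions rather than code from this project.

    # Sketch of the Normal-Inverse-chi^2 posterior update (equations 2.15-2.18).
    ni_chi2_update <- function(mu0, V0inv, nu0, s0, y) {
      n    <- length(y)
      ybar <- mean(y)
      s2   <- var(y)                               # sample variance (divisor n - 1)
      mu1    <- (V0inv * mu0 + n * ybar) / (V0inv + n)
      V1     <- 1 / (V0inv + n)
      nu1    <- nu0 + n
      nu1s12 <- nu0 * s0^2 + (n - 1) * s2 +
                (V0inv * n / (V0inv + n)) * (ybar - mu0)^2
      list(mu1 = mu1, V1 = V1, nu1 = nu1, s1 = sqrt(nu1s12 / nu1))
    }
    # Any vector 'closes' of the ten close values, with mean 10893.29 and
    # standard deviation 61.66, reproduces the figures above:
    # ni_chi2_update(mu0 = 10870, V0inv = 100, nu0 = 100, s0 = 100, y = closes)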

    This means that there is a difference of about 20 points between the Bayesian and non-Bayesian
estimates of the mean close value for January. Once January had passed, both results could be
compared, and the Bayesian estimates turned out to be closer to the actual mean close value and
standard deviation: 10871.2 and 112.44. In Figure 2.1, the blue line representing the Bayesian
estimation is closer to the cyan line representing the actual mean close value than the red line
representing the frequentist estimation.


2.2.2 Multivariate Normal distribution

Now, let us consider that we have an observable vector y of d components with the multivariate
Normal distribution:


                                    y ~ N(µ, Σ)                                 (2.22)

where the first parameter is the mean column vector and the second one is the variance-covariance
matrix.


    Extending what was said above to the multivariate case, we have:



[Figure: densities of the mean close value estimates, comparing the frequentist approach (red), the
Bayesian approach (blue) and the real mean close value in January (cyan).]

                              Figure 2.1: Univariate Normal Example




          f(y|µ, Σ) ∝ |Σ|^(−1/2) exp( −(1/2) (y − µ)ᵀ Σ^(−1) (y − µ) )          (2.23)

    And for n iid observations:

    f(y1, y2, ..., yn|µ, Σ) ∝ |Σ|^(−n/2) exp( −(1/2) sum_{i=1}^{n} (yi − µ)ᵀ Σ^(−1) (yi − µ) )   (2.24)
                                                                     i=1

    A multivariate generalization of the Scaled Inverse-χ² is the Inverse-Wishart distribution (see
details in Appendix A), so the joint prior distribution is

          f(θ) = f(µ, Σ) ∝ N-Inv-Wishart(µ0, Λ0/k0; ν0, Λ0)                     (2.25)

due to the fact that


                              µ|Σ ~ N(µ0, Σ/k0)                                 (2.26)
                              Σ ~ Inv-Wishart(ν0, Λ0^(−1))                      (2.27)




                         Univariate Normal                       Multivariate Normal

  Expression             y ~ N(µ, σ²)                            y ~ N(µ, Σ)

  Parameters to
  estimate               µ, σ²                                   µ, Σ

  Prior                  µ|σ² ~ N(µ0, σ0²/k0)                    µ|Σ ~ N(µ0, Σ/k0)
  distributions          σ² ~ Inv-χ²(ν0, σ0²)                    Σ ~ Inv-Wishart(ν0, Λ0^(−1))
                         µ, σ² ~ N-Inv-χ²(µ0, σ0²/k0; ν0, σ0²)   µ, Σ ~ N-Inv-Wishart(µ0, Λ0/k0; ν0, Λ0)

  Posterior              µ|σ², y ~ N(µ1, σ1²/k1)                 µ|Σ, y ~ N(µ1, Σ/k1)
  distributions          σ²|y ~ Inv-χ²(ν1, σ1²)                  Σ|y ~ Inv-Wishart(ν1, Λ1^(−1))
                         µ, σ²|y ~ N-Inv-χ²(µ1, σ1²/k1; ν1, σ1²) µ, Σ|y ~ N-Inv-Wishart(µ1, Λ1/k1; ν1, Λ1)


              Table 2.2: Comparison between Univariate and Multivariate Normal


    The posterior results are obtained exactly as in the univariate case, using these distributions
instead. Interested readers will find more information in [Gelm04] or [Cong06].


    A summary is shown in Table 2.2 in order to collect the most important ideas.


        2.2.3 Other distributions

As has just been done with the Normal distribution, a Bayesian analysis for other distributions could
be carried out. For instance, the exponential distribution is commonly used in reliability analysis.
Because this project deals with the Normal distribution for the likelihood, the analysis with other
distributions will not be explained in detail. Table 2.3 shows the conjugate prior and posterior
distributions for other likelihood distributions. More details can be found in [Cong06], [Gelm04] or
[Rossi06].



   Likelihood      Parameter    Conjugate    Prior Hyperparameters    Posterior Hyperparameters

   Bin(y|n, θ)     θ            Beta         α, β                     α + y, β + n − y

   P(y|θ)          θ            Gamma        α, β                     α + nȳ, β + n

   Exp(y|θ)        θ            Gamma        α, β                     α + 1, β + y

   Geo(y|θ)        θ            Beta         α, β                     α + 1, β + y


              Table 2.3: Conjugate distributions for other likelihood distributions
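
    To make the first row of the table concrete, here is a minimal R sketch of Beta-Binomial updating;
the hyperparameter and data values are illustrative choices, not values taken from this project.

    # Beta prior combined with a Binomial likelihood:
    # the posterior is Beta(alpha + y, beta + n - y).
    alpha <- 2; beta <- 2               # illustrative prior hyperparameters
    n <- 50; y <- 18                    # hypothetical sample: 18 successes in 50 trials
    post_alpha <- alpha + y
    post_beta  <- beta + n - y
    post_alpha / (post_alpha + post_beta)         # posterior mean of theta
    qbeta(c(0.025, 0.975), post_alpha, post_beta) # 95% credible interval for theta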




2.3 Hierarchical Models

Hierarchical data arise when observations are structured in groups or related among themselves.
When this occurs, standard techniques either assume that these groups belong to entirely different
populations or ignore the aggregate information entirely.


    Hierarchical models provide a way of pooling the information for the disparate groups without
assuming that they belong to precisely the same population.


    Suppose we have collected data about some random variable Y from m different populations with
n observations for each population.


    Let yij represent observation j from population i. Now suppose yij ~ f(θi), where θi is a vector
of parameters for population i. Furthermore, θi ~ f(Θ), where Θ may also be a vector. Up to this
point, we have only rewritten what was said previously.






    Now let us extend the model and assume that the hyperparameters Θ that govern the distribution
of the θi are themselves random variables, and assign a prior distribution to these variables as well:


                                    Θ ~ f(ψ)                                    (2.28)

where f(ψ) is called the hyperprior distribution. The vector parameter ψ of the hyperprior may be
"known" and represent our prior beliefs about Θ or, in theory, we can also assign a probability
distribution to these quantities as well, and proceed to another layer of hierarchy.


    According to [Gelm04], the idea of exchangeability will be used to create a joint probability
distribution model for all the parameters θ. A formal definition of exchangeability is:
    "The parameters θ1, θ2, ..., θn are exchangeable in their joint distribution if f(θ1, θ2, ..., θn) is
invariant to permutations of the indices 1, 2, ..., n."


    This means that if no information other than the data is available to distinguish any of the θi from
any of the others, and no ordering of the parameters can be made, one must assume symmetry among
the parameters in the prior distribution. So we can treat the parameters of each sub-population as
exchangeable units. This can be formulated as:


                  f(θ1, θ2, ..., θn|Θ) = Π_{i=1}^{n} f(θi|Θ)                    (2.29)

    The joint prior distribution is now:


          f(θ1, θ2, ..., θn, Θ) = f(θ1, θ2, ..., θn|Θ) f(Θ)                     (2.30)

    And conditioning on the data yields:


          f(θ1, θ2, ..., θn, Θ|y) ∝ f(θ1, θ2, ..., θn, Θ) f(y|θ1, θ2, ..., θn)  (2.31)

    Perhaps the most important point in practice is that non-hierarchical models are usually inappro-
priate for hierarchical data, while non-hierarchical data can be modelled within the hierarchical
structure by assigning concrete values to the hyperprior parameters. A generative sketch of this
two-level structure is given below.


    Models of this kind will be used in Bayesian regression models with autocorrelated errors, as
will be seen in the following chapters.
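
    As a minimal generative sketch of this structure in R (all numerical values are illustrative
assumptions, not estimates from this project):

    # Two-level hierarchical Normal model: hyperparameters -> group parameters -> data.
    set.seed(1)
    m <- 5; n <- 20                                 # m populations, n observations each
    Theta_mean <- 0; Theta_sd <- 2                  # "known" hyperparameters (illustrative)
    theta <- rnorm(m, Theta_mean, Theta_sd)         # group parameters theta_i ~ f(Theta)
    y <- matrix(rnorm(m * n, mean = rep(theta, each = n), sd = 1),
                nrow = m, byrow = TRUE)             # y_ij ~ f(theta_i), here N(theta_i, 1)
    cbind(theta, empirical = rowMeans(y))           # group parameters vs. empirical means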



                                                          17
2. Bayesian Data Analysis


    For more details about Bayesian hierarchical models, the reader is referred to [Cong06], [Gelm04]
and [Rossi06].




2.4 Nonparametric Bayesian

To overcome the limitations that have been mentioned throughout this chapter, it is the nonparametric
approach which manages to get around and reduce the restrictions of the parametric one. This kind
of analysis can be performed through the so-called Dirichlet process, which allows us to express in a
simple way the prior distribution over F, where F is the distribution function of the variable under
study. This process is indexed by a measure α which, suitably normalized, yields a probability
distribution.


    According to [Maté06], a Dirichlet process for F(t) requires specifying:

    • A prior proposal for F(t), denoted F0(t), which is the distribution function expressing the prior
      knowledge of the engineer, given by

                                    F0(t) = α(t)/M                              (2.32)

    • A measure of the confidence in the prior proposal, denoted M, whose value can vary between
      0 and ∞, depending on whether total confidence is placed in the data or in the prior proposal,
      respectively.



    It can be demonstrated that the posterior estimate of F(t), denoted F̂n(t), after sampling n data
points, is given by

                        F̂n(t) = pn F0(t) + (1 − pn) Fn(t)                      (2.33)

where Fn(t) is the empirical distribution function and pn = M/(M + n).
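
    A minimal R sketch of equation (2.33), assuming a standard Normal prior proposal F0 and
illustrative values for M and the data:

    # Posterior estimate of F(t) under a Dirichlet process prior, equation (2.33).
    dp_posterior_cdf <- function(t, y, M, F0 = pnorm) {
      pn <- M / (M + length(y))
      Fn <- ecdf(y)                      # empirical distribution function
      pn * F0(t) + (1 - pn) * Fn(t)
    }
    y <- rnorm(30, mean = 0.5)           # hypothetical sample
    dp_posterior_cdf(t = 0, y = y, M = 10)   # balances prior proposal and data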


    More detailed information about the nonparametric approach and how Dirichlet processes are
used can be found in [Mull04] or [Gosh03].





    With this approach, not only is the parametric limitation concerning the probability model of the
variable under study avoided, since no distributional hypothesis is required, but it also allows us to
give a quantified weight to the prior knowledge supplied by the engineer, depending on the
confidence in the certainty of that knowledge.




Chapter 3

Posterior Simulation

3.1 Introduction

A practical problem with Bayesian inference is the difficulty of summarizing realistically complex
posterior distributions. In most practical problems, posterior densities will not take the form of any
well-known and understood density, so summary statistics, such as the posterior mean and variance of
parameters of interest, will not be analytically available. It is at this point where the importance of
Bayesian computation arises, and computational tools are required to obtain meaningful inference
from the posterior distribution. Its importance is such that the computing revolution of the last 20
years has led to a blossoming of Bayesian methods in many fields such as Econometrics, Ecology or
Health.


    In this regard, the most important simulation methods are Markov chain Monte Carlo (MCMC)
methods. MCMC methods date from the original work of [Metr53], who were interested in methods
for the efficient simulation of the energy levels of atoms in a crystalline structure. The original idea
was subsequently generalized by [Hast70], but its true potential was not fully realized within the
statistical literature until [Gelf90] demonstrated its application to the estimation of integrals
commonly occurring in the context of Bayesian statistical inference.


    As [Berg05] points out, the underlying principle is simple: if one wishes to sample randomly from
a specific probability distribution, then design a Markov chain whose long-time equilibrium is that
distribution, write a computer program to simulate the Markov chain, run it for a time long enough
to be confident that approximate equilibrium has been attained, and then record the state of the
Markov chain as an approximate draw from equilibrium.


    The technique has been developed strongly in different fields and with rather different emphases
in the computer science community concerned with the study of random algorithms (where the em-
phasis is on whether the resulting algorithm scales well with increasing size of the problem), in the
spatial statistics community (where one is interested in understanding what kinds of patterns arise
from complex stochastic models), and also in the applied statistics community (where it is applied
largely in Bayesian contexts, enabling researchers to formulate statistical models which would other-
wise be resistant to effective statistical analyses).


    The development of the theoretical work also benefits the development of statistical applications.
The MCMC simulation techniques have been applied to develop practical statistical inferences for
almost all problems in (bio) statistics, for example, the problems in longitudinal data analysis, im-
age analysis, genetics, contagious disease epidemics, random spatial pattern, and financial statistical
models such as GARCH and stochastic volatility.


    The simplicity of the underlying principle of MCMC is a major reason for its success. However,
a substantial complication arises as the underlying target problem becomes more complex; namely,
how long should one run the Markov chain so as to ensure that it is close to equilibrium? According to
[Gelm04], n = 100 independent samples are often enough for reasonable posterior summaries, but in
some cases more samples are needed to ensure greater accuracy.




3.2 Markov chains

The essential theory required in developing Monte Carlo methods based on Markov chains is pre-
sented here. The most fundamental result is that certain Markov chains converge to a unique invariant
distribution, and can be used to estimate expectations with respect to this distribution. But in order to
reach this conclusion, some concepts need to be defined first.


    A Markov chain is a series of random variables, X0, ..., Xn, also called a stochastic process, in
which only the value of Xn−1 influences the distribution of Xn. Formally:


          P(Xn = xn | X0 = x0, ..., Xn−1 = xn−1) = P(Xn = xn | Xn−1 = xn−1)     (3.1)



where the Xn share a common range called the state space of the Markov chain.


    The common language used to refer to the different situations in which a Markov chain can be
found is the following. If Xn = i, the chain is said to be in state i at step n, or to take the value i at
step n. This language confers on the chain a certain dynamic character, which is reflected in the
main tool used to study it: the transition probabilities P(Xn+1 = j|Xn = i), collected in the
transition matrix P = (Pij) with Pij = P(Xn+1 = j|Xn = i). These give the probability of moving
from state i to state j.


    Due to the fact that in most interesting applications Markov chains are homogeneous, the transi-
tion matrix can be defined from the one-step probabilities Pij = P(X1 = j|X0 = i). In this regard, a
Markov chain Xt is homogeneous if P(Xn+1 = j|Xn = i) = P(X1 = j|X0 = i) for all n, i, j.


    Furthermore, using the Chapman-Kolmogorov equations, it can be shown that, given the one-step
transition matrix P and the n-step transition matrix Pn of a homogeneous Markov chain, Pn = P^n.


    On the other hand, we will need the concepts of invariant (or stationary) distribution, ergodicity
and irreducibility, which are indispensable to reach the main result. It will be assumed that Xt is a
homogeneous Markov chain.


    Then, a vector π is an invariant distribution of the chain Xt if it satisfies:

   a) πj ≥ 0 for all j, with Σj πj = 1.

   b) π = πP.

    That is, a stationary distribution over the states of a Markov chain is one that persists forever once
it is reached.


    The concept of an ergodic state requires making other definitions clear, namely recurrence and
aperiodicity:

    • The state i is recurrent if, starting from i, the chain returns to i with probability 1, that is,
       P(Xn = i for some n ≥ 1 | X0 = i) = 1. Otherwise, it is transient. Moreover, i will be positive
       recurrent if the expected (average) return time is finite, and null recurrent if it is not.

    • The period of a state i, denoted di, is defined as di = gcd{n : [P^n]ii > 0}. The state i is
       aperiodic if di = 1, or periodic if di is greater.



    Then a state is ergodic if it is positive recurrent and aperiodic. The last concept to define is
irreducibility. A set of states C ⊆ S, where S is the set of all possible states, is irreducible if all its
states communicate; in that case, for all i, j ∈ C:

    • i and j have the same period.

    • i is transient if and only if j is transient.

    • i is null recurrent if and only if j is null recurrent.

    Now, having all these concepts in mind, we can establish whether a Markov chain has a stationary
distribution with the next lemma:

Lemma 3.2.1. Let Xt be a homogeneous and irreducible Markov chain. The chain has exactly one
stationary distribution if, and only if, all its states are positive recurrent. In that case, the stationary
distribution has entries given by πi = µi^(−1), where µi denotes the expected return time of state i.

    The relation with the long-run behaviour is given by this other lemma:

Lemma 3.2.2. Let Xt be a homogeneous, irreducible and aperiodic Markov chain. Then

                  [P^n]ij → 1/µj    for all i, j ∈ S, as n → ∞                  (3.2)
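
    These results can be checked numerically. The following R sketch uses an arbitrary three-state
transition matrix (an illustrative choice) and compares a high power of P with the stationary
distribution obtained from the left eigenvector of P associated with eigenvalue 1:

    # Stationary distribution of a small Markov chain, computed two ways.
    P <- matrix(c(0.5, 0.3, 0.2,
                  0.2, 0.6, 0.2,
                  0.1, 0.4, 0.5), nrow = 3, byrow = TRUE)  # illustrative transition matrix
    e  <- eigen(t(P))                             # left eigenvectors of P
    pi <- Re(e$vectors[, 1]); pi <- pi / sum(pi)  # eigenvalue-1 vector, normalized
    Pn <- P
    for (k in 1:50) Pn <- Pn %*% P                # high power of P (Lemma 3.2.2)
    rbind(pi, Pn[1, ], Pn[2, ])                   # rows agree to numerical precision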


3.3 Monte Carlo Integration

Monte Carlo integration estimates the expectation E[g(θ)|y] by obtaining samples θt, t = 1, ..., n,
from the posterior distribution p(θ|y) and averaging:

                        E[g(θ)|y] ≈ (1/n) Σ_{t=1}^{n} g(θt)                     (3.3)

    where g(θ) is the function of interest. Note that if the samples θt, t = 1, ..., n, come from a
process that has p(θ|y) as its stationary distribution, the θt form a Markov chain.
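
    A minimal sketch in R, with an artificial posterior chosen so that the true answer is known:

    # Monte Carlo integration: estimate E[g(theta)|y] from posterior draws.
    set.seed(2)
    draws <- rnorm(10000, mean = 1, sd = 2)  # stand-in for draws from p(theta|y)
    g <- function(theta) theta^2
    mean(g(draws))                           # estimate of E[theta^2] = 1^2 + 2^2 = 5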






3.4 Gibbs sampler

In many models, it is not easy to draw directly from the posterior distribution p(θ|y). However, if the
parameter θ is partitioned into several blocks as θ = (θ1, ..., θp), then the full conditional posterior
distributions, p(θ1|y, θ2, ..., θp), ..., p(θp|y, θ1, ..., θp−1), may be simple to draw from. For instance,
in the Normal linear regression model it is convenient to set p = 2, with θ1 = β and θ2 = σ², and the
full conditional distributions would be p(β|y, σ²) and p(σ²|y, β), which are very useful in the Normal
independent model that will be explained later.


    The Gibbs sampler is defined by iterative sampling from each of those p conditional distributions:

   1. Set a starting value, θ^0 = (θ2^0, ..., θp^0).

   2. Take random draws:
      - θ1^1 from p(θ1|y, θ2^0, ..., θp^0)
      - θ2^1 from p(θ2|y, θ1^1, θ3^0, ..., θp^0)
      - ...
      - θp^1 from p(θp|y, θ1^1, ..., θp−1^1)

   3. Repeat step 2 as necessary.

   4. Discard the draws still affected by the starting value θ^0 and average the remaining draws by
      applying Monte Carlo integration.



    For instance, in the Normal regression model we would have:

   1. Set a starting value, θ^0 = ((σ²)^0).

   2. Take random draws:
      - β^1 from p(β|y, (σ²)^0)
      - (σ²)^1 from p(σ²|y, β^1)

   3. Repeat step 2 as necessary.

   4. Discard the initial draws and average the rest by applying Monte Carlo integration.



    The values dropped because they are affected by the starting point are called the burn-in. In
general, any set of values discarded in an MCMC simulation is called the burn-in. The size of the
burn-in period is the subject of current research in MCMC methods.


    As the state of each draw depends on the state of the previous one, the sequence is a Markov
chain. More detailed information can be found in [Chen00], [Mart01] or [Rossi06]. A minimal sketch
of the two-block sampler just described is given below.
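
    The following R sketch implements the two-block Gibbs sampler for (µ, σ²) in the Normal model,
assuming the improper prior p(µ, σ²) ∝ 1/σ² (an illustrative choice, under which both full
conditionals are standard):

    # Gibbs sampler for the Normal model with unknown mean and variance.
    set.seed(3)
    y <- rnorm(50, mean = 10, sd = 2)       # hypothetical data
    n <- length(y); ybar <- mean(y)
    S <- 5000
    mu <- numeric(S); sigma2 <- numeric(S)
    mu[1] <- ybar; sigma2[1] <- var(y)      # starting values
    for (s in 2:S) {
      mu[s]     <- rnorm(1, ybar, sqrt(sigma2[s - 1] / n))               # mu | sigma2, y
      sigma2[s] <- 1 / rgamma(1, n / 2, rate = sum((y - mu[s])^2) / 2)   # sigma2 | mu, y
    }
    keep <- 501:S                           # discard the burn-in
    c(mean(mu[keep]), mean(sigma2[keep]))   # posterior means of mu and sigma2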




3.5 Metropolis-Hastings sampler and its special cases

3.5.1 Metropolis-Hastings sampler

The Metropolis-Hastings method is adequate for simulating models that are not conditionally conjugate.
Furthermore, it can be combined with the Gibbs sampler to simulate posterior distributions where
some of the conditional posterior distributions are easy to sample from and others are not. Like
the algorithms explained above, it is based on formulating a Markov chain, but using a proposal
distribution q(·|θt), which depends on the current state θt, to generate a new proposed sample θ*.
This proposal is accepted as the next state with probability given by

          α(θt, θ*) = min{ 1, [p(θ*|y) q(θt|θ*)] / [p(θt|y) q(θ*|θt)] }         (3.4)

    If the point θ* is not accepted, then the chain does not move and θt+1 = θt. According to
[Mart01], the steps to follow are:

   1. Initialize the chain to θ0 and set t=0.

   2. Generate a candidate point θ∗ from q(.|θt ).

   3. Generate U from a uniform (0,1) distribution.

   4. If U ≤ α(θt , θ∗ ) then set θt+1 = θ∗ , else set θt+1 = θt .

   5. Set t = t + 1 and repeat steps 2 through 5.

   6. Take the average of the draws g(θ1), ..., g(θn).

    Note that it is not only advisable but essential that the proposal distribution q(·|θt) be easy to
sample from.
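
    A compact R sketch of the algorithm, targeting an artificial posterior and using a symmetric
random-walk proposal (see Section 3.5.3), so that the q terms cancel in (3.4); the target, the
proposal scale and the starting point are all illustrative assumptions:

    # Metropolis-Hastings with a symmetric random-walk proposal.
    set.seed(4)
    log_post <- function(theta) dnorm(theta, mean = 1, sd = 2, log = TRUE)  # stand-in log posterior
    S <- 10000; theta <- numeric(S); theta[1] <- 0
    for (t in 2:S) {
      prop <- rnorm(1, theta[t - 1], 1)             # candidate from q(.|theta_t)
      log_alpha <- log_post(prop) - log_post(theta[t - 1])
      theta[t] <- if (log(runif(1)) <= log_alpha) prop else theta[t - 1]
    }
    mean(theta[-(1:1000)])                          # posterior mean estimate after burn-in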





    There are some special cases of this method. The most important ones are briefly explained below.
In addition, it can be shown, according to [Gelm04], that the Gibbs sampler is another special case of
the Metropolis-Hastings algorithm in which the proposed point is always accepted.



3.5.2 Metropolis sampler

This method is a particular case of the Metropolis-Hastings sampler where the proposal distribution
has to be symmetric. That is,


                                          q(θ∗ |θt ) = q(θt |θ∗ )                             (3.5)

    for all θ* and θt. Then, the probability of accepting the new point is

                        α(θt, θ*) = min{ 1, p(θ*|y) / p(θt|y) }                 (3.6)
    The same procedure seen in the Metropolis-Hastings sampler has to be followed.


3.5.3 Random-walk sampler

This special case refers to a proposal distribution of the form


                              q(θ*|θt) = q(|θt − θ*|)                           (3.7)

    The candidate point is θ* = θt + z, where z is the increment random variable drawn from q.
Then, the probability of accepting the new point is

                        α(θt, θ*) = min{ 1, p(θ*|y) / p(θt|y) }                 (3.8)
    The same procedure seen in the Metropolis-Hastings sampler has to be followed.


3.5.4 Independence sampler

The last variation has a proposal distribution such that


                                 q(θ*|θt) = q(θ*)                               (3.9)

    so it does not depend on θt. Then, the probability of accepting the new point is


    α(θt, θ*) = min{ 1, [p(θ*|y) q(θt)] / [p(θt|y) q(θ*)] } = min{ 1, w(θ*)/w(θt) }   (3.10)

    where

                                 w(θ) = p(θ|y)/q(θ)                             (3.11)

    It is important to remark that, for this method to work well, the proposal distribution q should
be very similar to the posterior distribution p(θ|y).


    The same procedure seen in the Metropolis-Hastings sampler has to be followed.


3.6 Importance sampling

Importance sampling is a variance reduction technique that can be used in the Monte Carlo method.
The idea behind this method is that certain values of the input random variables in a simulation have
more impact on the parameter being estimated than others. So instead of taking a simple average,
importance sampling takes a weighted average.


    Let q(θ) be a density from which it is easy to obtain random draws θ^(s) for s = 1, ..., S. Then q(θ)
is called the importance function, and the importance sampling estimator can be defined as

        ĝS = [ Σ_{s=1}^{S} w(θ^(s)) g(θ^(s)) ] / [ Σ_{s=1}^{S} w(θ^(s)) ],
        where  w(θ^(s)) = p(θ = θ^(s)|y) / q(θ = θ^(s)),

which converges to E[g(θ)|y] as S → ∞.

    In fact, w(θ^(s)) can also be computed as w(θ^(s)) = p*(θ^(s)|y)/q*(θ^(s)), where the starred
densities are proportional to the original ones.
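
    A minimal R sketch with self-normalized weights, using a t distribution as importance function
for a Normal stand-in posterior (both choices are illustrative):

    # Importance sampling estimate of E[g(theta)|y].
    set.seed(5)
    S <- 20000
    theta <- rt(S, df = 3)                                    # draws from the importance function q
    w <- dnorm(theta, mean = 1, sd = 2) / dt(theta, df = 3)   # w = p(theta|y)/q(theta)
    g <- function(th) th
    sum(w * g(theta)) / sum(w)                                # converges to E[theta|y] = 1 as S grows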


    For more information and details about Markov chain Monte Carlo methods and their applications,
the reader is referred to [Chen00], [Gilk95], [Berg05] and [Kend05].




                                                              27
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data
Bayesian Regression for Interval Data

Weitere ähnliche Inhalte

Ähnlich wie Bayesian Regression for Interval Data

Doktorsavhandling Rolf Olsson
Doktorsavhandling Rolf OlssonDoktorsavhandling Rolf Olsson
Doktorsavhandling Rolf OlssonRolf Olsson
 
Types of Research Design for Social Sciences
Types of Research Design for Social SciencesTypes of Research Design for Social Sciences
Types of Research Design for Social Scienceskryzedj
 
Introduction to Research methodology: Orientation for Doctoral Program Course...
Introduction to Research methodology: Orientation for Doctoral Program Course...Introduction to Research methodology: Orientation for Doctoral Program Course...
Introduction to Research methodology: Orientation for Doctoral Program Course...niloysarkar
 
Machine Learning and the Value of Health Technologies
Machine Learning and the Value of Health TechnologiesMachine Learning and the Value of Health Technologies
Machine Learning and the Value of Health TechnologiesCovance
 
Maintaining Momentum Post-­ASHG 2014: Maximizing the Value of Large Genomic D...
Maintaining Momentum Post-­ASHG 2014: Maximizing the Value of Large Genomic D...Maintaining Momentum Post-­ASHG 2014: Maximizing the Value of Large Genomic D...
Maintaining Momentum Post-­ASHG 2014: Maximizing the Value of Large Genomic D...Hannes Smárason
 
Finding employment and education discovery framework
Finding employment and education discovery frameworkFinding employment and education discovery framework
Finding employment and education discovery frameworkFutureGov
 
Big Data and Healthcare Big Opportunity or Big Problem Abstract
Big Data and Healthcare Big Opportunity or Big Problem AbstractBig Data and Healthcare Big Opportunity or Big Problem Abstract
Big Data and Healthcare Big Opportunity or Big Problem AbstractJames Selley
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and AnalyticsDhruv Saxena
 
Tech v Trust: scaling simulation for the 21C student
Tech v Trust: scaling simulation for the 21C studentTech v Trust: scaling simulation for the 21C student
Tech v Trust: scaling simulation for the 21C studentdebbieholley1
 
Co-creating Sustainability Strategies for PSS Development
Co-creating Sustainability Strategies for PSS Development  Co-creating Sustainability Strategies for PSS Development
Co-creating Sustainability Strategies for PSS Development Adrià Garcia i Mateu
 
The impact of foresight on innovation
The impact of foresight on innovationThe impact of foresight on innovation
The impact of foresight on innovationatelier t*h
 
Best Indicators of Clinical Development Success - Tom Macek, Takeda Global Re...
Best Indicators of Clinical Development Success - Tom Macek, Takeda Global Re...Best Indicators of Clinical Development Success - Tom Macek, Takeda Global Re...
Best Indicators of Clinical Development Success - Tom Macek, Takeda Global Re...Life Sciences Network marcus evans
 
Online Graduate Programs in Bioinformatics at NYU
Online Graduate Programs in Bioinformatics at NYUOnline Graduate Programs in Bioinformatics at NYU
Online Graduate Programs in Bioinformatics at NYUNYU Tandon Online
 
IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...
IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...
IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...IRJET Journal
 
Social Media Datasets for Analysis and Modeling Drug Usage
Social Media Datasets for Analysis and Modeling Drug UsageSocial Media Datasets for Analysis and Modeling Drug Usage
Social Media Datasets for Analysis and Modeling Drug Usageijtsrd
 
Pioneering-New-Operating-Models-and-Measurement-Techniques-for-Private-Sector...
Pioneering-New-Operating-Models-and-Measurement-Techniques-for-Private-Sector...Pioneering-New-Operating-Models-and-Measurement-Techniques-for-Private-Sector...
Pioneering-New-Operating-Models-and-Measurement-Techniques-for-Private-Sector...Adrienne Gifford
 
A Guide to Data Innovation for Development - From idea to proof-of-concept
A Guide to Data Innovation for Development - From idea to proof-of-conceptA Guide to Data Innovation for Development - From idea to proof-of-concept
A Guide to Data Innovation for Development - From idea to proof-of-conceptUN Global Pulse
 

Ähnlich wie Bayesian Regression for Interval Data (20)

Doktorsavhandling Rolf Olsson
Doktorsavhandling Rolf OlssonDoktorsavhandling Rolf Olsson
Doktorsavhandling Rolf Olsson
 
Types of Research Design for Social Sciences
Types of Research Design for Social SciencesTypes of Research Design for Social Sciences
Types of Research Design for Social Sciences
 
Introduction to Research methodology: Orientation for Doctoral Program Course...
Introduction to Research methodology: Orientation for Doctoral Program Course...Introduction to Research methodology: Orientation for Doctoral Program Course...
Introduction to Research methodology: Orientation for Doctoral Program Course...
 
Machine Learning and the Value of Health Technologies
Machine Learning and the Value of Health TechnologiesMachine Learning and the Value of Health Technologies
Machine Learning and the Value of Health Technologies
 
Maintaining Momentum Post-­ASHG 2014: Maximizing the Value of Large Genomic D...
Maintaining Momentum Post-­ASHG 2014: Maximizing the Value of Large Genomic D...Maintaining Momentum Post-­ASHG 2014: Maximizing the Value of Large Genomic D...
Maintaining Momentum Post-­ASHG 2014: Maximizing the Value of Large Genomic D...
 
Mapping innovation missions
Mapping innovation missionsMapping innovation missions
Mapping innovation missions
 
Finding employment and education discovery framework
Finding employment and education discovery frameworkFinding employment and education discovery framework
Finding employment and education discovery framework
 
Big Data and Healthcare Big Opportunity or Big Problem Abstract
Big Data and Healthcare Big Opportunity or Big Problem AbstractBig Data and Healthcare Big Opportunity or Big Problem Abstract
Big Data and Healthcare Big Opportunity or Big Problem Abstract
 
Introduction to Data Science and Analytics
Introduction to Data Science and AnalyticsIntroduction to Data Science and Analytics
Introduction to Data Science and Analytics
 
Tech v Trust: scaling simulation for the 21C student
Tech v Trust: scaling simulation for the 21C studentTech v Trust: scaling simulation for the 21C student
Tech v Trust: scaling simulation for the 21C student
 
Co-creating Sustainability Strategies for PSS Development
Co-creating Sustainability Strategies for PSS Development  Co-creating Sustainability Strategies for PSS Development
Co-creating Sustainability Strategies for PSS Development
 
The impact of foresight on innovation
The impact of foresight on innovationThe impact of foresight on innovation
The impact of foresight on innovation
 
Best Indicators of Clinical Development Success - Tom Macek, Takeda Global Re...
Best Indicators of Clinical Development Success - Tom Macek, Takeda Global Re...Best Indicators of Clinical Development Success - Tom Macek, Takeda Global Re...
Best Indicators of Clinical Development Success - Tom Macek, Takeda Global Re...
 
Online Graduate Programs in Bioinformatics at NYU
Online Graduate Programs in Bioinformatics at NYUOnline Graduate Programs in Bioinformatics at NYU
Online Graduate Programs in Bioinformatics at NYU
 
IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...
IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...
IRJET- An Extensive Study of Sentiment Analysis Techniques and its Progressio...
 
All Things Biocuration
All Things BiocurationAll Things Biocuration
All Things Biocuration
 
Social Media Datasets for Analysis and Modeling Drug Usage
Social Media Datasets for Analysis and Modeling Drug UsageSocial Media Datasets for Analysis and Modeling Drug Usage
Social Media Datasets for Analysis and Modeling Drug Usage
 
BIG-DATAPPTFINAL.ppt
BIG-DATAPPTFINAL.pptBIG-DATAPPTFINAL.ppt
BIG-DATAPPTFINAL.ppt
 
Pioneering-New-Operating-Models-and-Measurement-Techniques-for-Private-Sector...
Pioneering-New-Operating-Models-and-Measurement-Techniques-for-Private-Sector...Pioneering-New-Operating-Models-and-Measurement-Techniques-for-Private-Sector...
Pioneering-New-Operating-Models-and-Measurement-Techniques-for-Private-Sector...
 
A Guide to Data Innovation for Development - From idea to proof-of-concept
A Guide to Data Innovation for Development - From idea to proof-of-conceptA Guide to Data Innovation for Development - From idea to proof-of-concept
A Guide to Data Innovation for Development - From idea to proof-of-concept
 

Bayesian Regression for Interval Data

  • 1. Autorizada la entrega del proyecto del alumno: Rub´ n Salgado Fern´ ndez e a EL DIRECTOR DEL PROYECTO Carlos Mat´ Jim´ nez e e Fdo.: Fecha: 12/06/2007 Vo Bo DEL COORDINADOR DE PROYECTOS Claudia Meseguer Velasco Fdo.: Fecha: 12/06/2007
  • 2. UNIVERSIDAD PONTIFICIA DE COMILLAS ESCUELA TECNICA SUPERIOR DE INGENIER´ (ICAI) ´ IA ´ INGENIERO EN ORGANIZACION INDUSTRIAL PROYECTO FIN DE CARRERA Bayesian Regression System for Interval-Valued Data. Application to the Spanish Continuous Stock Market AUTOR : Salgado Fern´ ndez, Rub´ n a e M ADRID , Junio 2007
  • 3. Acknowlegdements Firstly, I would like to thank my director, Carlos Mat´ Jim´ nez, PhD, for giving me the chance of e e making this project. With him, I have learnt, not only about Statistics and investigation, but also about how to enjoy with them. Special thanks to my parents. Their love and all they have taught me in this life are the things what have made possible being the person I am now. Thanks to my brothers, my sister and the rest of my family for their support and for the stolen time. Thanks to Charo for standing my bad mood in the bad moments, for supporting me and for giving me the inspiration to go ahead. Madrid, June 2007 i
  • 4. Resumen ´ En los ultimos a˜ os los m´ todos Bayesianos se han extendido y se han venido utilizando de forma n e exitosa en muchos y variados campos tales como marketing, medicina, ingenier´a, econometr´a o mer- ı ı cados financieros. La principal caracter´stica que hace destacar al an´ lisis Bayesiano de datos (AN- ı a BAD) frente a otras alternativas es que, no s´ lo tiene en cuenta la informaci´ n objetiva procedente de o o los datos del suceso en estudio, sino tambi´ n el conocimiento anterior al mismo. Los beneficios que e se obtienen de este enfoque son m´ ltiples ya que, cuanto mayor sea el conocimiento de la situaci´ n, u o a ´ con mayor fiabilidad se podr´ n tomar las decisiones y estas ser´ n m´ s acertadas. Pero no siempre todo a a han sido ventajas. El ANBAD, hasta hace unos a˜ os, presentaba una serie de dificultades que limita- n ban el desarrollo del mismo a los investigadores. Si bien la metodolog´a Bayesiana existe como tal ı desde hace bastante tiempo, no se ha empezado emplear de manera generalizada hasta los 90’s. Esta expansi´ n ha sido propiciada en gran parte por el avance en el desarrollo computacional y la mejora y o perfeccionamiento de distintos m´ todos de c´ lculo como los m´ todos de cadenas de Markov-Monte e a e Carlo. ı ´ En especial, esta metodolog´a se ha mostrado extraordinariamente util en la aplicaci´ n a los mod- o elos de regresi´ n, ampliamente adoptados. En m´ ltiples ocasiones en la pr´ ctica, se dan situaciones o u a en las que se requiere analizar la relaci´ n entre dos variables cuantitativas. Los dos objetivos fun- o damentales de este an´ lisis ser´ n, por un lado, determinar si dichas variables est´ n asociadas y en a a a qu´ sentido se da dicha asociaci´ n (es decir, si los valores de una de las variables tienden a aumentar e o -o disminuir- al aumentar los valores de la otra); y por otro, estudiar si los valores de una variable pueden ser utilizados para predecir el valor de la otra. Un modelo de regresi´ n trata de proporcionar o informaci´ n sobre uno o varios sucesos a trav´ s de su relaci´ n con el comportamiento de otros. Con o e o la metodolog´a Bayesiana se permite incorporar el conocimiento del investigador al an´ lisis, haciendo ı a los resultados m´ s precisos, ya que no se a´slan los resultados a los datos de una determinada muestra. a ı ii
Por otro lado, se está empezando a aceptar que el siglo XXI en el ámbito de la estadística va a ser el siglo de la "estadística del conocimiento", a diferencia del anterior, que fue el de la "estadística de los datos". El concepto básico para construir dicha estadística es el de dato simbólico, y se han desarrollado métodos estadísticos para algunos tipos de datos simbólicos.

En la actualidad, la exigencia del mercado, de la demanda y, en general, del mundo crece. Esto implica que cada vez sea mayor el deseo de predecir la ocurrencia de un evento o de poder controlar el comportamiento de ciertas cantidades con el menor error posible, con el fin de ofrecer mejores productos, obtener mayores beneficios o adelantos científicos y mejores resultados.

Sobre esta realidad, este proyecto trata de responder a dichas necesidades proporcionando una amplia documentación sobre varias de las técnicas más utilizadas y más punteras a día de hoy, como son el análisis Bayesiano de datos, los modelos de regresión y los datos simbólicos, y proponiendo diferentes técnicas de regresión. De igual forma, se desarrollará una herramienta que permita poner en práctica todos los conocimientos adquiridos. Dicha aplicación estará dirigida al mercado bursátil español y permitirá al usuario utilizarla de manera sencilla y amigable. En cuanto al desarrollo de esta herramienta, se empleará uno de los lenguajes más novedosos y con más proyección del momento: R.

Se trata, por tanto, de un proyecto que combina las técnicas más novedosas y con mayor proyección tanto en materia teórica, como es la regresión Bayesiana aplicada a datos de tipo intervalo, como en materia práctica, como es el empleo del lenguaje R.
Abstract

In recent years, Bayesian methods have spread and been successfully used in many different fields, such as marketing, medicine, engineering, econometrics or financial markets. The main characteristic that makes Bayesian data analysis stand out from the alternatives is that it takes into account not only the objective information coming from the data of the event under study, but also the knowledge available before it. The benefits of this approach are considerable: the more knowledge of the situation one has, the more reliable and accurate the decisions that can be taken. However, although the Bayesian methodology was established a long time ago, it was not applied in a general way until the 1990s because of computational difficulties. Its expansion has been favoured mainly by the advances in computing and the improvement of different calculation methods, such as Markov chain Monte Carlo methods.

In particular, this methodology has proved extraordinarily useful when applied to regression models, which are widely adopted. In practice there are many situations in which it is necessary to analyse the relationship between two quantitative variables. The two main objectives of this analysis are, on the one hand, to determine whether such variables are associated and in what sense that association comes about (that is, whether the values of one of the variables tend to rise -or to decrease- when the values of the other increase); and on the other hand, to study whether the values of one variable can be used to predict the value of the other. A regression model tries to provide information about one or more events through their relationship with the behaviour of others. The Bayesian methodology makes it possible to add the researcher's knowledge to the analysis, making the results more accurate, since they are not restricted to the data of one particular sample.

On the other hand, it is becoming more and more accepted in the field of Statistics that the 21st century will be the century of the "Statistics of knowledge", in contrast to the last one, which was the
one of the "Statistics of data". The basic concept on which such Statistics is built is that of symbolic data, and statistical methods have already been developed for some types of symbolic data.

Nowadays, the requirements of the market, and the demands of the world in general, keep growing. This implies a continuously increasing desire to predict the occurrence of an event, or to control the behaviour of certain quantities with the minimum possible error, with the aim of offering better products and obtaining greater benefits, scientific advances and better outcomes.

Within this frame, this project tries to respond to such needs by offering extensive documentation about several of the most widely applied and leading techniques of today, such as Bayesian data analysis, regression models and symbolic data, and by proposing different regression techniques. Likewise, a tool has been developed that allows the user to put all the acquired knowledge into practice. This application is aimed at the Spanish Continuous Stock Market and lets the user operate it easily. As far as the development of this tool is concerned, one of the most innovative and promising languages of the moment has been used: R.

The project is therefore a combination of the most innovative and promising techniques, both in theoretical questions, such as Bayesian regression applied to interval-valued data, and in practical questions, such as the use of the R language.
List of Figures

1.1 Project Work Packages  5
2.1 Univariate Normal Example  14
6.1 Interval time series  62
7.1 Classical Regression with single values in training set  73
7.2 Classical Regression with single values in testing set  74
7.3 Classical Regression with interval-valued data  75
7.4 Centre Method (2000) in training set  75
7.5 Centre Method (2000) in testing set  76
7.6 Centre and Radius Method in training set  77
7.7 Centre and Radius Method in testing set  78
7.8 Bayesian Centre and Radius Method in testing set  80
7.9 Classical Regression with single values in training set  81
7.10 Classical Regression with single values in testing set  81
7.11 Centre Method (2000) in training set  82
7.12 Centre Method (2000) in testing set  83
7.13 Centre and Radius Method in training set  85
7.14 Centre and Radius Method in testing set  85
7.15 Bayesian Centre and Radius Method in testing set  87
9.1 BARESIMDA MDI  101
10.1 Interface between BARESIMDA and R  104
10.2 Interface between BARESIMDA and Excel  105
10.3 Logical Architecture  105
C.1 Load Data Menu  123
C.2 Select File Dialog  124
C.3 Display Loaded Data  124
C.4 Define New Variable  125
C.5 Enter New Variable Name  125
C.6 Display New Variable  125
C.7 Edit Variable  126
C.8 Select Variable to Be Edited  126
C.9 Enter New Name  127
C.10 Confirmation  127
C.11 New Row Data  128
C.12 Type Data  128
C.13 Look And Feel Menu  129
C.14 Look And Feel Styles  129
C.15 New Look And Feel  130
C.16 Type Of User Menu  130
C.17 Select Type Of User  130
C.18 Non-Symbolic Classical Regression Menu  131
C.19 Select Non-Symbolic Variables in Simple Regression  131
C.20 Brief Report  132
C.21 Analysis Options in Non-Symbolic Classical Simple Regression  132
C.22 New Prediction in Non-Symbolic Classical Simple Regression  133
C.23 Graphics Options in Non-Symbolic Classical Simple Regression  133
C.24 Save Options in Non-Symbolic Classical Simple Regression  134
C.25 Non-Symbolic Classical Multiple Regression Menu  134
C.26 Select Variables in Non-Symbolic Classical Multiple Regression  134
C.27 Analysis Options in Non-Symbolic Classical Multiple Regression  135
C.28 Graphics Options in Non-Symbolic Classical Multiple Regression  135
C.29 Save Options in Non-Symbolic Classical Multiple Regression  136
C.30 Intercept in Non-Symbolic Classical Multiple Regression  136
C.31 Non-Symbolic Bayesian Simple Regression Menu  136
C.32 Select Variables in Non-Symbolic Bayesian Simple Regression  137
C.33 Analysis Options in Non-Symbolic Bayesian Simple Regression  137
C.34 Graphics Options in Non-Symbolic Bayesian Simple Regression  138
C.35 Save Options in Non-Symbolic Bayesian Simple Regression  138
C.36 Prior Experienced Specification Options in Non-Symbolic Bayesian Simple Regression  138
C.37 Prior Inexperienced Specification in Non-Symbolic Bayesian Simple Regression  139
C.38 Prior Experienced Specification in Non-Symbolic Bayesian Simple Regression  139
C.39 Non-Symbolic Bayesian Multiple Regression Menu  139
C.40 Analysis Options in Non-Symbolic Bayesian Multiple Regression  140
C.41 Graphics Options in Non-Symbolic Bayesian Multiple Regression  140
C.42 Save Options in Non-Symbolic Bayesian Multiple Regression  140
C.43 Model Options in Non-Symbolic Bayesian Multiple Regression  141
C.44 Symbolic Classical Simple Regression Menu  141
C.45 Select Variables in Symbolic Classical Simple Regression  141
C.46 Analysis Options in Symbolic Classical Simple Regression  142
C.47 Graphics Options in Symbolic Classical Simple Regression  142
C.48 Symbolic Classical Multiple Regression Menu  143
C.49 Select Variables in Symbolic Classical Multiple Regression  143
C.50 Analysis Options in Symbolic Classical Multiple Regression  144
C.51 Graphics Options in Symbolic Classical Multiple Regression  144
C.52 Symbolic Bayesian Simple Regression  145
C.53 Select Variables in Symbolic Bayesian Simple Regression  145
C.54 Analysis Options in Symbolic Bayesian Simple Regression  145
C.55 Graphics Options in Symbolic Bayesian Simple Regression  146
C.56 Model Options in Symbolic Bayesian Simple Regression  147
C.57 Symbolic Bayesian Multiple Regression Menu  147
C.58 Select Variables in Symbolic Bayesian Multiple Regression  147
C.59 Graphics Options in Symbolic Bayesian Multiple Regression  148
List of Tables

2.1 Distributions in Bayesian Data Analysis  10
2.2 Comparison between Univariate and Multivariate Normal  15
2.3 Conjugate distributions for other likelihood distributions  16
4.1 Bayes Factor Interpretation  29
4.2 Sensitivity Summary I  33
4.3 Sensitivity Summary II  34
5.1 Multiple and Simple Regression Comparison  40
5.2 Sensitivity analysis of parameter $\beta$  45
5.3 Sensitivity analysis of parameter $\sigma^2$  46
5.4 Classical and Bayesian regression comparison  48
5.5 Main Prior Distributions Summary  57
5.6 Main Posterior Distributions Summary  58
5.7 Prior and Posterior Parameters Summary  59
5.8 Main Posterior Predictive Distributions Summary  60
6.1 Multivalued Data Example  63
6.2 Modal-multivalued Example  64
7.1 Error Measures for Classical Regression with single values  74
7.2 Error Measure for Centre Method (2000)  76
7.3 Error Measure for Centre Method (2002)  77
7.4 Error Measures for Centre and Radius Method  78
7.5 Error Measures in Bayesian Centre and Radius Method  80
7.6 Error Measures for Classical Regression with single values  82
7.7 Error Measure for Centre Method (2000)  83
7.8 Error Measure for Centre Method (2002)  84
7.9 Error Measures for Centre and Radius Method  84
7.10 Error Measures in Bayesian Centre and Radius Method  86
11.1 Estimated material costs  107
11.2 Amortization Costs  108
11.3 Summarized Budget  109
Contents

Acknowledgements  i
Resumen  ii
Abstract  iv
List of Figures  vi
List of Tables  x
Contents  xvi
1 Introduction  1
1.1 Project Motivation  1
1.2 Objectives  4
1.3 Methodology  4
2 Bayesian Data Analysis  6
2.1 What is Bayesian Data Analysis?  6
2.2 Bayesian Analysis for Normal and other distributions  10
2.2.1 Univariate Normal distribution  10
2.2.2 Multivariate Normal distribution  13
2.2.3 Other distributions  15
2.3 Hierarchical Models  16
2.4 Nonparametric Bayesian  18
3 Posterior Simulation  20
3.1 Introduction  20
3.2 Markov chains  21
3.3 Monte Carlo Integration  23
3.4 Gibbs sampler  24
3.5 Metropolis-Hastings sampler and its special cases  25
3.5.1 Metropolis-Hastings sampler  25
3.5.2 Metropolis sampler  26
3.5.3 Random-walk sampler  26
3.5.4 Independence sampler  26
3.6 Importance sampling  27
4 Sensitivity Analysis  28
4.1 Introduction  28
4.2 Bayes Factor  29
4.3 Alternative Stats to Bayes Factor  30
4.4 Highest Posterior Density Intervals  31
4.5 Model Comparison Summary  32
5 Regression Analysis  35
5.1 Introduction  35
5.2 Classical Regression Model  36
5.3 The Bayesian Approach  39
5.4 Normal Linear Regression Model subject to inequality constraints  48
5.5 Normal Linear Regression Model with Independent Parameters  49
5.6 Normal Linear Regression Model with Heteroscedasticity and Correlation  51
5.6.1 Heteroscedasticity  53
5.6.2 Correlation  54
5.7 Models Summary  56
6 Symbolic Data  61
6.1 What is symbolic data analysis?  61
6.2 Interval-valued variables  65
6.3 Classical regression analysis with Interval-valued data  67
6.4 Bayesian regression analysis with Interval-valued data  70
7 Results  72
7.1 Spanish Continuous Stock Market data sets  72
7.2 Direct Relation between Variables  72
7.3 Uncorrelated Variables  79
8 A Guide to Statistical Software Today  88
8.1 Introduction  88
8.2 Commercial Packages  89
8.2.1 The SAS System for Statistical Analysis  89
8.2.2 Minitab  90
8.2.3 BMDP  90
8.2.4 SPSS  91
8.2.5 S-PLUS  91
8.2.6 Others  92
8.3 Public License Packages  93
8.3.1 R  93
8.3.2 BUGS  93
8.4 Analysis Packages with Statistical Libraries  94
8.4.1 Matlab  94
8.4.2 Mathematica  95
8.4.3 Others  95
8.5 Some General Languages with Statistical Libraries  95
8.5.1 Java  95
8.5.2 C++  96
8.6 Developed Software Tool: BARESIMDA  96
9 Software Requirements Specification  98
9.1 Purpose  98
9.2 Intended Audience  98
9.3 Functionality  99
9.3.1 Classical Regression with crisp data  99
9.3.2 Classical Regression with interval-valued data  99
9.3.3 Bayesian Regression with crisp data  100
9.3.4 Bayesian Regression with interval-valued data  100
9.3.5 Data Manipulation  100
9.3.6 Portability  100
9.3.7 Maintainability  100
9.4 External Interfaces  101
9.4.1 User Interfaces  101
9.4.2 Software Interfaces  102
10 Software Architecture Study  103
10.1 Hardware/Software Architecture  103
10.2 Logical Architecture  104
11 Project Budget  106
11.1 Engineering Costs  106
11.2 Investment and Elements Costs  107
11.2.1 Summarized Budget  108
12 Conclusions  110
12.1 Bayesian Regression applied to Symbolic Data  110
12.2 BARESIMDA Software Tool  111
12.3 Future Developments  111
12.4 Summary  112
A Probability Distributions  113
A.1 Discrete Distributions  113
A.1.1 Binomial  113
A.1.2 Geometric  114
A.1.3 Poisson  114
A.2 Continuous Distributions  115
A.2.1 Uniform  115
A.2.2 Univariate Normal  115
A.2.3 Exponential  116
A.2.4 Gamma  116
A.2.5 Inverse-Gamma  117
A.2.6 Chi-square  117
A.2.7 Inverse-Chi-square and Inverse-Scaled Chi-Square  118
A.2.8 Univariate Student-t  118
A.2.9 Beta  119
A.2.10 Multivariate Normal  119
A.2.11 Multivariate Student-t  120
A.2.12 Wishart  120
A.2.13 Inverse-Wishart  121
B Installation Guide  122
B.1 From source folder  122
B.2 From installer  122
C User's Guide  123
C.1 Data Entry  123
C.1.1 Loading an Excel file  123
C.1.2 Defining a new variable  123
C.1.3 Editing an existing variable  126
C.1.4 Deleting an existing variable  127
C.1.5 Typing in a new data row  127
C.1.6 Deleting an existing data row  128
C.1.7 Modifying existing data  129
C.2 Configuration  129
C.2.1 Setting the look & feel  129
C.2.2 Selecting the type of user  129
C.3 Non-Symbolic Regression  131
C.3.1 Simple Classical Regression  131
C.3.2 Multiple Classical Regression  133
C.3.3 Simple Bayesian Regression  136
C.3.4 Multiple Bayesian Regression  139
C.4 Symbolic Regression  140
C.4.1 Simple Classical Regression  140
C.4.2 Multiple Classical Regression  143
C.4.3 Simple Bayesian Regression  144
C.4.4 Multiple Bayesian Regression  146
D Obtaining and Installing R  149
D.1 Binary distributions  149
D.2 Installation from source  150
D.3 Package installation  151
E Obtaining and installing Java Runtime Environment  152
E.1 Microsoft Windows  152
E.2 Linux  153
E.2.1 Installation of Self-Extracting Binary  153
E.2.2 Installation of RPM File  154
E.3 UNIX  155
Bibliography  157
Chapter 1

Introduction

1.1 Project Motivation

Statistics is primarily concerned with the analysis of data, either to assist in arriving at an improved understanding of some underlying mechanism, or as a means for making informed, rational decisions. Both of these aspects generally involve some degree of uncertainty. The statistician's task is then to explain such uncertainty, and to reduce it to the extent that this is possible. Problems of this type occur throughout all the physical, social and other sciences. One way of looking at statistics stems from the perception that, ultimately, probability is the only appropriate way to describe and systematically deal with uncertainty, as if it were the language for the logic of uncertainty. Thus, inference statements are framed precisely as probability statements on the possible values of the unknown quantities of interest (parameters or future observations), conditional on the observed, available data. The scientific discipline based on this understanding is called Bayesian Statistics. Moreover, the increasingly needed and sophisticated models, often hierarchical models, used to describe available data are typically too complex for conventional statistics to handle, but can be tackled within Bayesian Statistics.

In principle, Bayesian Statistics is designed to handle all situations where uncertainty is found. Since some uncertainty is present in most aspects of life, it may be argued that Bayesian Statistics should be appreciated and used by everyone; it is the logic of contemporary society and science. According to [Rupp04], whether to apply the Bayesian methodology is no longer under discussion; the question is when this has to be done.

Bayesian methods have matured and improved in several ways during the last fifteen years. They are becoming increasingly attractive to researchers, and successful applications of Bayesian
data analysis have appeared in many different fields, including actuarial science, biometrics, finance, market research, marketing, medicine, engineering and social science. It is not only that the Bayesian approach produces appropriate answers to many current important problems; there is also an evident need for it, given the inapplicability of conventional statistics to many of them. Thus, the main feature offered by Bayesian data analysis is the possibility of incorporating the researcher's knowledge about the problem to be handled. The more precise the prior knowledge is, the better and more reliable the results obtained. But Bayesian Statistics was held back until the mid 1990s by its computational complexity. Since then, it has expanded greatly, favoured by the development and improvement of different computational methods in this field, such as Markov chain Monte Carlo.

This methodology has been shown to be extremely useful in its application to regression models, which are widely accepted. Let us remember that the general purpose of regression analysis is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. The Bayesian methodology lets the researcher incorporate her or his knowledge into the analysis, improving the results since they do not depend only on the sampling data.

On the other hand, datasets are increasingly so large that they must be summarized in some fashion, so that the resulting summary dataset is of a more manageable size while still retaining as much of the knowledge inherent in the entire dataset as possible. One consequence of this situation is that data may no longer be formatted as single values, as is the case for classical data, but rather may be represented by lists, intervals, distributions, and the like. These summarized data are examples of symbolic data. This kind of data also lets us better represent the knowledge and beliefs held in our minds, which are limited and hard to extract with classical Statistics. According to [Bill02], this responds to the current need to change from a Statistics of data in the past century to a Statistics of knowledge in the 21st century.

Market and demand requirements are increasing continuously over time. This implies a need for better and more accurate methods to forecast new situations and to control different quantities with the minimum error, in order to supply better products and to obtain higher incomes, scientific advances and better results.

Dealing with this outlook, this project is intended to respond to those requirements by providing a
wide and exhaustive documentation about some of the currently most used and advanced techniques, including Bayesian data analysis, regression models and symbolic data. Different examples related to the Spanish Continuous Stock Market are explained throughout this text, making clear the advantages of employing the described methods. Likewise, a software tool with a user-friendly graphical interface has been developed to practise and check all the acquired knowledge.

Therefore, this is a project combining the most recent techniques with major future implications in theoretical issues, such as Bayesian regression applied to interval-valued data, with a technological part dealing with the problem of interconnecting two software programs: one used to show the graphical user interface and the other one employed to make the computations.

Regarding a more personal motivation, when accepting this project, several factors were taken into consideration by the author:

• A great challenge: it is an ambitious project with a high technical complexity related to both its theoretical basis and its technological basis. This represents a very good letter of introduction for joining the labour market.

• A good time plan: this project was designed to be finished before June 2007, which means being able to finish the degree in June and join the labour market in September.

• Some very interesting issues: on the one hand, it deals with the ever-present need of forecasting and modelling observations and situations in order to get the best possible results. On the other hand, it focuses on the Stock Market, which matches my personal interests.

• A new programming language: the possibility of learning in depth a new and relatively recent programming language, such as R, was an extra motivating factor.

• The project director: Carlos Maté is considered a demanding and very competent director by the students of the university.

• A research scholarship: the possibility of being in the Industrial Organization department of the University, learning from people such as the director mentioned above and other highly regarded professors, was a great factor.
1.2 Objectives

This project aims to achieve the following goals:

• To provide wide and rigorous documentation about the following issues: Bayesian data analysis, regression models and symbolic data. From this point, documentation about Bayesian regression will be developed, as well as the software tool designed.

• To build a software tool to fit Bayesian regression models to interval-valued data, finding the most efficient way to design the graphical user interface, which must be as user-friendly as possible.

• To find the most efficient way to offer that system to future clients, based on the tests carried out with the application.

• To design a survey to measure the quality of the tool and users' satisfaction.

• Possibly, to write an article for a scientific journal.

1.3 Methodology

As the title of the project indicates, the final purpose is the development of an application aimed at stock markets and based on a Bayesian regression system; therefore, some previous knowledge is required.

The first stage is familiarization with Bayesian data analysis, regression models applied to the Bayesian methodology, and symbolic data. Within this phase, Bayesian data analysis will be studied first, trying to synthesize and extract the most important elements. Special dedication will be given to posterior simulation and computational algorithms. Then, regression models will be treated, quickly reviewing the classical approach before going deeper into the different Bayesian regression models, applying a great part of what was explained in the Bayesian methodology. Finally, this first stage will be completed with the application to symbolic data, paying special attention to interval-valued data.

The second stage refers to the development of the software application, employing an incremental methodology for programming and testing iterative prototypes. This methodology has been
considered the most suitable for this project, since it will let us introduce successive models into the application. The following figure shows the structure of the work packages into which the project is divided:

[Figure 1.1: Project Work Packages]
Chapter 2

Bayesian Data Analysis

2.1 What is Bayesian Data Analysis?

Statistics can be defined as the discipline that provides us with a methodology to collect, organize, summarize and analyze a set of data. Data analysis can be divided into two kinds of analysis: exploratory data analysis and confirmatory data analysis. The former is used to represent, describe and analyze a set of data through simple methods in the first stages of statistical analysis. The latter is applied to make inferences from data, based on probability models.

In the same way, confirmatory data analysis is divided into two branches, depending on the adopted approach. The first one, known as frequentist, makes inferences from the data resulting from a sampling through classical methods. The second branch, known as Bayesian, goes further in the analysis and adds to those data the prior knowledge that the researcher has about the problem being treated. Since it is not worth explaining the frequentist approach at length here, a more extended review of the different classical methods related to it can be found in [Mont02].

Data Analysis
- Exploratory
- Confirmatory
  - Frequentist
  - Bayesian
As far as Bayesian analysis is concerned, and according to [Gelm04], the process can be divided into the following three steps:

• To set up a full probability model, through a joint probability distribution for all observable and unobservable quantities in a problem.

• To condition on observed data, obtaining the posterior distribution.

• Finally, to evaluate the fit of the model and the implications of the resulting posterior distribution.

$f(\theta, y)$, known as the joint probability distribution (or $f(\boldsymbol{\theta}, y)$, if there are several parameters $\boldsymbol{\theta}$), is obtained by means of

$$f(\theta, y) = f(y|\theta)f(\theta) \quad \left(\text{resp. } f(\boldsymbol{\theta}, y) = f(y|\boldsymbol{\theta})f(\boldsymbol{\theta})\right) \qquad (2.1)$$

where $y$ is the set of sampled data. This distribution is therefore the product of two densities, which are referred to as the sampling distribution $f(y|\theta)$ (resp. $f(y|\boldsymbol{\theta})$) and the prior distribution $f(\theta)$ (resp. $f(\boldsymbol{\theta})$).

The sampling distribution, as its name suggests, is the probability model that the researcher assigns to the statistic (resp. set of statistics) to be studied after the data have been observed. Here, an important problem stands out in relation to the parametric approach, due to the fact that the probability model the researcher chooses might not be adequate. The nonparametric approach overcomes this inconvenience, as will be seen later.

When $y$ is considered fixed, so that the distribution is a function of $\theta$ (resp. $\boldsymbol{\theta}$), the sampling distribution is called the likelihood function and obeys the likelihood principle, which states that, for a given sample of data, any two probability models $f(y|\theta)$ (resp. $f(y|\boldsymbol{\theta})$) with the same likelihood function yield the same inference for $\theta$ (resp. $\boldsymbol{\theta}$).

The prior distribution does not depend upon the data. Accordingly, it contains the information and the knowledge that the researcher has about the situation or problem to be solved. When there is no previous significant population from which the engineer can take his knowledge, that is, when the researcher has no prior information about the problem, a non-informative prior distribution must be used in the analysis in order to let the data speak for themselves. Hence, it is assumed that the prior knowledge will have very little importance in the results. But most non-informative priors
are "improper" in that they do not integrate to 1, and this fact can cause problems. In these cases it is necessary to be sure that the posterior distribution is proper. Another possibility is to use an informative prior distribution but with an insignificant weight (around zero) associated with it.

Though the prior distribution can take any form, it is common to choose particular classes of priors that make computation and interpretation easier. These are the conjugate priors. A conjugate prior distribution is one which, when combined with the likelihood function, gives a posterior distribution that falls in the same class of distributions as the prior. Furthermore, and according to [Koop03], a natural conjugate prior has the additional property of having the same form as the likelihood. But it is not always possible to find this kind of distribution, and the researcher has to handle many distributions in order to express his prior knowledge about the problem. This is another handicap that the nonparametric approach reduces.

In relation to the prior, what distribution should be chosen? There are three different points of view, corresponding to different styles of Bayesians:

• Classical Bayesians consider that the prior is a necessary evil, and that priors interjecting the least information possible should be chosen.

• Modern parametric Bayesians consider that the prior is a useful convenience, and that priors with desirable properties, such as conjugacy, should be chosen. They remark that, given a distributional choice, prior hyperparameters interjecting the least information possible should be chosen.

• Subjective Bayesians give essential importance to the prior, in the sense that they consider it a summary of old beliefs. So prior distributions based on previous knowledge (either the results of earlier studies or non-scientific opinion) should be chosen.

Returning to the Bayesian data analysis process, simply conditioning on the observed data $y$ and applying Bayes' theorem, the posterior distribution, namely $f(\theta|y)$ (resp. $f(\boldsymbol{\theta}|y)$), yields:

$$f(\theta|y) = \frac{f(\theta, y)}{f(y)} = \frac{f(\theta)f(y|\theta)}{f(y)} \quad \left(\text{resp. } f(\boldsymbol{\theta}|y) = \frac{f(\boldsymbol{\theta}, y)}{f(y)} = \frac{f(\boldsymbol{\theta})f(y|\boldsymbol{\theta})}{f(y)}\right) \qquad (2.2)$$

where

$$f(y) = \int_0^\infty f(\theta)f(y|\theta)\,d\theta \quad \left(\text{resp. } f(y) = \int_0^\infty f(\boldsymbol{\theta})f(y|\boldsymbol{\theta})\,d\boldsymbol{\theta}\right) \qquad (2.3)$$
is known as the prior predictive distribution, since it is not conditional upon a previous observation of the process and is applied to an observable quantity.

An equivalent form of the posterior distribution displayed above omits the prior predictive distribution $f(y)$, since it does not involve $\theta$ (resp. $\boldsymbol{\theta}$) and the interest lies in learning about $\theta$ (resp. $\boldsymbol{\theta}$). So, with fixed $y$, it can be said that the posterior distribution is proportional to the joint probability distribution $f(\theta, y)$.

Once the posterior distribution is calculated, some kind of summary measure will be required to estimate the uncertainty about the parameter $\theta$ (resp. $\boldsymbol{\theta}$). This is due to the fact that the posterior distribution is a high-dimensional object whose direct use is not practical for a problem. The measure that summarizes the posterior distribution can be the posterior mean, mode, median or variance, among others; its choice will depend on the requirements of the problem. So the posterior distribution is of great importance, since it lets the researcher manage the uncertainty about $\theta$ (resp. $\boldsymbol{\theta}$) and provides him with information about it (resp. them), taking into account both his prior knowledge and the data collected by sampling on that parameter. According to [Maté06], it is not difficult to deduce that the posterior inference will agree with the non-Bayesian one as long as the estimate that the researcher gives for the parameter $\theta$ (resp. $\boldsymbol{\theta}$) is the same as the one resulting from the sampling.

Once the data $y$ have been observed, a new unknown observable quantity $\tilde{y}$ can be predicted for the same process through the posterior predictive distribution, namely $f(\tilde{y}|y)$:

$$f(\tilde{y}|y) = \int f(\tilde{y}, \theta|y)\,d\theta = \int f(\tilde{y}|\theta, y)f(\theta|y)\,d\theta = \int f(\tilde{y}|\theta)f(\theta|y)\,d\theta \qquad (2.4)$$

To sum up, the basic idea is to update the prior distribution $f(\theta)$ through Bayes' theorem by observing the data $y$ in order to get a posterior distribution $f(\theta|y)$. Then a summary measure or a prediction for new data can be obtained from $f(\theta|y)$. Table 2.1 reflects what has been said.

Table 2.1: Distributions in Bayesian Data Analysis

Distribution | Expression | Information Required | Result
Likelihood | $f(y|\theta)$ | Data | Distribution $f(y|\theta)$
Prior | $f(\theta)$ | Researcher's knowledge | Parameter distribution $f(\theta)$
Joint | $f(y|\theta)f(\theta)$ | Likelihood distribution, prior distribution | $f(\theta, y)$
Posterior | $f(\theta)f(y|\theta)/f(y)$ | Prior, joint distribution | $f(\theta|y)$
Predictive | $\int f(\tilde{y}|\theta)f(\theta|y)\,d\theta$ | New data distribution, posterior distribution | $f(\tilde{y}|y)$
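As a minimal illustration of this updating cycle, the following R sketch (R being the language chosen for the computational part of this project) approximates the posterior and the posterior predictive distribution on a grid for a Binomial likelihood with a Beta prior. The prior hyperparameters and the data are invented, chosen only for illustration:

```r
# Minimal grid approximation of the prior -> posterior -> predictive cycle
# for a Binomial likelihood with a Beta prior. Values are illustrative only.
theta <- seq(0.001, 0.999, length.out = 999)   # grid over the parameter
step  <- theta[2] - theta[1]
prior <- dbeta(theta, 2, 2)                    # prior f(theta)

y <- 7; n <- 10                                # observed data
likelihood <- dbinom(y, n, theta)              # sampling distribution f(y|theta)

joint     <- prior * likelihood                # joint f(theta, y), eq. (2.1)
f.y       <- sum(joint) * step                 # prior predictive f(y), eq. (2.3)
posterior <- joint / f.y                       # posterior f(theta|y), eq. (2.2)

# Posterior summary measures, as discussed above
post.mean <- sum(theta * posterior) * step
post.mode <- theta[which.max(posterior)]

# Posterior predictive f(y.tilde|y) for 5 new trials, eq. (2.4)
pred <- sapply(0:5, function(y.tilde)
  sum(dbinom(y.tilde, 5, theta) * posterior) * step)

round(c(mean = post.mean, mode = post.mode), 3)
round(pred, 3)
```

For conjugate pairs, of course, the posterior is available in closed form and no grid is needed; the point of the sketch is only to make the roles of the distributions in Table 2.1 explicit.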
2.2 Bayesian Analysis for Normal and other distributions

2.2.1 Univariate Normal distribution

The basic model to be discussed concerns an observable variable, normally distributed with mean $\mu$ and unknown variance $\sigma^2$:

$$y|\mu, \sigma^2 \sim N(\mu, \sigma^2) \qquad (2.5)$$

As can be seen in Appendix A, the likelihood function for a single observation is

$$f(y|\mu, \sigma^2) \propto (\sigma^2)^{-1/2} \exp\left(-\frac{1}{2\sigma^2}(y - \mu)^2\right) \qquad (2.6)$$

This means that the likelihood function is proportional to a Normal distribution, omitting those terms that are constant.

Now let us consider that we have $n$ independent observations $y_1, y_2, \ldots, y_n$. According to the previous section, the parameters to be estimated, $\boldsymbol{\theta}$, are $\mu$ and $\sigma^2$:
$$\boldsymbol{\theta} = (\theta_1, \theta_2) = (\mu, \sigma^2) \qquad (2.7)$$

A full probability model must be set up through a joint probability distribution:

$$f(\boldsymbol{\theta}, (y_1, y_2, \ldots, y_n)) = f(\boldsymbol{\theta}, y) = f(y|\boldsymbol{\theta})f(\boldsymbol{\theta}) \qquad (2.8)$$

The likelihood function for a sample of $n$ iid observations is in this case

$$f(y|\boldsymbol{\theta}) = f(y|\mu, \sigma^2) \propto \prod_{i=1}^n (\sigma^2)^{-1/2} \exp\left(-\frac{1}{2\sigma^2}(y_i - \mu)^2\right) \qquad (2.9)$$

As was recommended previously, a conjugate prior will be chosen; in fact, it will be a natural conjugate prior. According to [Gelm04], this likelihood function suggests a conjugate prior distribution of the form

$$f(\boldsymbol{\theta}) = f(\mu, \sigma^2) = f(\mu|\sigma^2)f(\sigma^2) \qquad (2.10)$$

where the marginal distribution of $\sigma^2$ is the Scaled Inverse-$\chi^2$ and the conditional distribution of $\mu$ given $\sigma^2$ is Normal (details about these distributions in Appendix A):

$$\mu|\sigma^2 \sim N(\mu_0, \sigma^2 V_0) \qquad (2.11)$$

$$\sigma^2 \sim Inv\text{-}\chi^2(\nu_0, s_0^2) \qquad (2.12)$$

So the joint prior distribution is:

$$f(\boldsymbol{\theta}) = f(\mu, \sigma^2) = f(\mu|\sigma^2)f(\sigma^2) \propto N\text{-}Inv\text{-}\chi^2(\mu_0, s_0^2 V_0, \nu_0, s_0^2) \qquad (2.13)$$

Its four parameters can be identified as the location and scale of $\mu$ and the degrees of freedom and scale of $\sigma^2$, respectively.

As a natural conjugate prior was employed, the joint posterior distribution will have the same form as the prior. So, conditioning on the data and according to Bayes' theorem, we have:

$$f(\boldsymbol{\theta}|y) = f(\mu, \sigma^2|y) \propto f(y|\mu, \sigma^2)f(\mu, \sigma^2) \propto N\text{-}Inv\text{-}\chi^2(\mu_1, s_1^2 V_1, \nu_1, s_1^2) \qquad (2.14)$$

where it can be shown that
$$\mu_1 = (V_0^{-1} + n)^{-1}(V_0^{-1}\mu_0 + n\bar{y}) \qquad (2.15)$$

$$V_1 = (V_0^{-1} + n)^{-1} \qquad (2.16)$$

$$\nu_1 = \nu_0 + n \qquad (2.17)$$

$$\nu_1 s_1^2 = \nu_0 s_0^2 + (n-1)s^2 + \frac{V_0^{-1} n}{V_0^{-1} + n}(\bar{y} - \mu_0)^2 \qquad (2.18)$$

All these formulae show that Bayesian inference combines prior and sample information.

The first one means that the posterior mean $\mu_1$ is a weighted mean of the prior mean $\mu_0$ and the empirical mean, divided by the sum of their respective weights, which are represented by $V_0^{-1}$ and the sample size $n$.

The second one gives the weight of the posterior mean, and can be seen as a compromise between the sample size and the significance given to the prior mean.

The third one indicates that the degrees of freedom of the posterior variance are the sum of the prior degrees of freedom and the sample size. That is, the prior degrees of freedom can be understood as a fictitious sample size on which the expert's prior information is based.

The last one explains the posterior sum of squared errors as a combination of the prior and empirical sums of squared errors, plus a term that measures the conflict between prior and sample information. A more detailed explanation of this last step can be found in [Gelm04], [Koop03] or [Cong06].

It follows that the marginal posterior distributions are:

$$\mu|\sigma^2, y \sim N(\mu_1, \sigma^2 V_1) \qquad (2.19)$$

$$\sigma^2|y \sim Inv\text{-}\chi^2(\nu_1, s_1^2) \qquad (2.20)$$

If we integrate out $\sigma^2$, the marginal for $\mu$ will be a t-distribution (see Appendix A for details):

$$\mu|y \sim t_{\nu_1}(\mu_1, s_1^2 V_1) \qquad (2.21)$$
Let us see an application to the Spanish Stock Market. Let us suppose that the monthly close values associated with the Ibex 35 are normally distributed. If we take the values at which the Spanish index closed during the first two weeks of January 2006, it can be shown that the mean was 10893.29 and the standard deviation was 61.66. So the non-Bayesian approach would infer a Normal distribution with the previous mean and standard deviation. Suppose that we had asked an analyst about the evolution of the Ibex 35 in January; he would have affirmed strongly that it would decrease slightly, that the mean close value at the end of the month would be around 10870 and, hence, that the standard deviation would be higher, around 100. Then, according to the previous formulas, the posterior parameters would be

$$\mu_1 = (100 + 10)^{-1}(100 \times 10870 + 10 \times 10893.29) = 10872.12$$

$$V_1 = (100 + 10)^{-1} = 0.0091$$

$$\nu_1 = 100 + 10 = 110$$

$$s_1 = \sqrt{\frac{100 \times 100^2 + 9 \times 61.66^2 + \frac{1000}{110}(10893.29 - 10870)^2}{110}} = 97.19$$

This means that there is a difference of about 20 points between the Bayesian estimate and the non-Bayesian one for the mean close value of January. Once the month of January had passed, we could compare both results and note that the Bayesian estimate was closer to the finally observed mean close value and standard deviation: 10871.2 and 112.44. In Figure 2.1 it can be seen how the blue line representing the Bayesian estimate is closer to the cyan line representing the final real mean close value than the red line representing the frequentist estimate.

[Figure 2.1: Univariate Normal Example - estimated densities under the frequentist approach (red) and the Bayesian approach (blue), together with the real mean close value in January (cyan).]
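The computation above is easy to reproduce. The following R sketch implements the updating formulas (2.15)-(2.18) as a small helper function and applies it to the Ibex 35 figures; the function name and its argument names are ours, chosen only for this illustration (note that applying (2.18) literally gives $s_1 \approx 97.2$):

```r
# Conjugate updating for the Normal model with unknown mean and variance,
# following equations (2.15)-(2.18). Illustrative helper, not from BARESIMDA.
normal.posterior <- function(mu0, V0, nu0, s0, ybar, s, n) {
  prec0 <- 1 / V0                                   # prior weight V0^{-1}
  mu1   <- (prec0 * mu0 + n * ybar) / (prec0 + n)   # posterior location, (2.15)
  V1    <- 1 / (prec0 + n)                          # posterior scale factor, (2.16)
  nu1   <- nu0 + n                                  # posterior d.o.f., (2.17)
  s1sq  <- (nu0 * s0^2 + (n - 1) * s^2 +
            prec0 * n / (prec0 + n) * (ybar - mu0)^2) / nu1   # (2.18)
  list(mu1 = mu1, V1 = V1, nu1 = nu1, s1 = sqrt(s1sq))
}

# Ibex 35 example: prior mean 10870 with weight 100 (so V0 = 1/100),
# prior scale 100 with nu0 = 100; sample of n = 10 daily close values.
normal.posterior(mu0 = 10870, V0 = 1/100, nu0 = 100, s0 = 100,
                 ybar = 10893.29, s = 61.66, n = 10)
# mu1 = 10872.12, V1 = 0.0091, nu1 = 110, s1 = 97.19
```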
2.2.2 Multivariate Normal distribution

Now let us consider that we have an observable vector $\mathbf{y}$ of $d$ components with the multivariate Normal distribution:

$$\mathbf{y} \sim N(\boldsymbol{\mu}, \Sigma) \qquad (2.22)$$

where the first parameter is the mean column vector and the second one is the variance-covariance matrix. Extending what was said above to the multivariate case, we have:

$$f(\mathbf{y}|\boldsymbol{\mu}, \Sigma) \propto |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{y} - \boldsymbol{\mu})\right) \qquad (2.23)$$

And for $n$ iid observations:

$$f(\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n|\boldsymbol{\mu}, \Sigma) \propto |\Sigma|^{-n/2} \exp\left(-\frac{1}{2}\sum_{i=1}^n (\mathbf{y}_i - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{y}_i - \boldsymbol{\mu})\right) \qquad (2.24)$$

A multivariate generalization of the Scaled Inverse-$\chi^2$ is the Inverse-Wishart distribution (see details in Appendix A), so the joint prior distribution is

$$f(\boldsymbol{\mu}, \Sigma) \propto N\text{-}Inv\text{-}Wishart\left(\boldsymbol{\mu}_0, \frac{\Lambda_0}{k_0}, \nu_0, \Lambda_0\right) \qquad (2.25)$$

due to the fact that

$$\boldsymbol{\mu}|\Sigma \sim N\left(\boldsymbol{\mu}_0, \frac{\Sigma}{k_0}\right) \qquad (2.26)$$

$$\Sigma \sim Inv\text{-}Wishart(\nu_0, \Lambda_0^{-1}) \qquad (2.27)$$
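As in the univariate case, the posterior hyperparameters can be computed in a few lines. The sketch below, in base R, updates the Normal-Inverse-Wishart hyperparameters using the standard formulas for this conjugate family (see [Gelm04]); the function and variable names are ours, and the data are simulated only for illustration:

```r
# Normal-Inverse-Wishart conjugate update for a multivariate Normal sample.
# Standard formulas for this family (see [Gelm04]); names are illustrative.
niw.posterior <- function(Y, mu0, k0, nu0, Lambda0) {
  n    <- nrow(Y)
  ybar <- colMeans(Y)
  S    <- crossprod(sweep(Y, 2, ybar))   # sum of squares around the sample mean
  mu1  <- (k0 * mu0 + n * ybar) / (k0 + n)
  k1   <- k0 + n
  nu1  <- nu0 + n
  Lambda1 <- Lambda0 + S +
    (k0 * n / (k0 + n)) * tcrossprod(ybar - mu0)   # prior-data conflict term
  list(mu1 = mu1, k1 = k1, nu1 = nu1, Lambda1 = Lambda1)
}

# Illustrative call with d = 2 simulated series of 20 observations each
set.seed(1)
Y <- matrix(rnorm(20 * 2, mean = c(0, 1)), ncol = 2, byrow = TRUE)
niw.posterior(Y, mu0 = c(0, 0), k0 = 1, nu0 = 4, Lambda0 = diag(2))
```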
The posterior results are the same as those given for the univariate case, but applying these distributions. Interested readers can find more information in [Gelm04] or [Cong06]. A summary is shown in Table 2.2 in order to capture the most important ideas.

Table 2.2: Comparison between Univariate and Multivariate Normal

 | Univariate Normal | Multivariate Normal
Expression | $y \sim N(\mu, \sigma^2)$ | $\mathbf{y} \sim N(\boldsymbol{\mu}, \Sigma)$
Parameters to estimate | $\mu, \sigma^2$ | $\boldsymbol{\mu}, \Sigma$
Prior distributions | $\mu|\sigma^2 \sim N(\mu_0, \sigma_0^2/k_0)$; $\sigma^2 \sim Inv\text{-}\chi^2(\nu_0, \sigma_0^2)$; $(\mu, \sigma^2) \sim N\text{-}Inv\text{-}\chi^2(\mu_0, \sigma_0^2/k_0, \nu_0, \sigma_0^2)$ | $\boldsymbol{\mu}|\Sigma \sim N(\boldsymbol{\mu}_0, \Sigma/k_0)$; $\Sigma \sim Inv\text{-}Wishart(\nu_0, \Lambda_0^{-1})$; $(\boldsymbol{\mu}, \Sigma) \sim N\text{-}Inv\text{-}Wishart(\boldsymbol{\mu}_0, \Lambda_0/k_0, \nu_0, \Lambda_0)$
Posterior distributions | $\mu|\sigma^2, y \sim N(\mu_1, \sigma_1^2/k_1)$; $\sigma^2|y \sim Inv\text{-}\chi^2(\nu_1, \sigma_1^2)$; $(\mu, \sigma^2)|y \sim N\text{-}Inv\text{-}\chi^2(\mu_1, \sigma_1^2/k_1, \nu_1, \sigma_1^2)$ | $\boldsymbol{\mu}|\Sigma, \mathbf{y} \sim N(\boldsymbol{\mu}_1, \Sigma/k_1)$; $\Sigma|\mathbf{y} \sim Inv\text{-}Wishart(\nu_1, \Lambda_1^{-1})$; $(\boldsymbol{\mu}, \Sigma)|\mathbf{y} \sim N\text{-}Inv\text{-}Wishart(\boldsymbol{\mu}_1, \Lambda_1/k_1, \nu_1, \Lambda_1)$

2.2.3 Other distributions

As has just been done with the Normal distribution, a Bayesian analysis could be carried out for other distributions. For instance, the exponential distribution is commonly used in reliability analysis. Because this project deals with the Normal distribution for the likelihood, the analysis with other distributions will not be explained in detail. Table 2.3 shows the conjugate prior and posterior distributions
for other likelihood distributions. More details can be found in [Cong06], [Gelm04] or [Rossi06].

Table 2.3: Conjugate distributions for other likelihood distributions

Likelihood | Parameter | Conjugate Prior | Hyperparameters | Posterior Hyperparameters
$Bin(y|n, \theta)$ | $\theta$ | Beta | $\alpha, \beta$ | $\alpha + y, \beta + n - y$
$P(y|\theta)$ | $\theta$ | Gamma | $\alpha, \beta$ | $\alpha + n\bar{y}, \beta + n$
$Exp(y|\theta)$ | $\theta$ | Gamma | $\alpha, \beta$ | $\alpha + 1, \beta + y$
$Geo(y|\theta)$ | $\theta$ | Beta | $\alpha, \beta$ | $\alpha + 1, \beta + y$
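As an example of how Table 2.3 is used in practice, the following R lines (an illustrative sketch of ours, assuming the rate parameterization of the Gamma) update a Gamma prior with Poisson count data and plot the prior against the posterior:

```r
# Poisson-Gamma conjugate pair from Table 2.3: prior Gamma(alpha, beta),
# likelihood P(y|theta), posterior Gamma(alpha + n*ybar, beta + n).
alpha <- 3; beta <- 1            # illustrative prior hyperparameters
y <- c(2, 0, 3, 1, 2, 4, 1, 2)   # illustrative Poisson counts
n <- length(y); ybar <- mean(y)

alpha1 <- alpha + n * ybar       # posterior hyperparameters, as in Table 2.3
beta1  <- beta + n

theta <- seq(0, 8, length.out = 300)
plot(theta, dgamma(theta, alpha1, rate = beta1), type = "l",
     ylab = "density", main = "Poisson rate: prior (dashed) vs posterior")
lines(theta, dgamma(theta, alpha, rate = beta), lty = 2)
```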
2.3 Hierarchical Models

Hierarchical data arise when the data are structured or related among themselves. When this occurs, standard techniques either assume that the groups belong to entirely different populations, or ignore the aggregate information entirely. Hierarchical models provide a way of pooling the information for the disparate groups without assuming that they belong to precisely the same population.

Suppose we have collected data about some random variable $Y$ from $m$ different populations, with $n$ observations for each population. Let $y_{ij}$ represent observation $j$ from population $i$. Now suppose $y_{ij} \sim f(\boldsymbol{\theta}_i)$, where $\boldsymbol{\theta}_i$ is a vector of parameters for population $i$. Furthermore, $\boldsymbol{\theta}_i \sim f(\boldsymbol{\Theta})$, where $\boldsymbol{\Theta}$ may also be a vector. Up to this point, we have only rewritten what was said previously.

Now let us extend the model and assume that the parameters $\boldsymbol{\Theta} = (\Theta_1, \Theta_2)$ that govern the distribution of the $\boldsymbol{\theta}_i$ are themselves random variables, and assign a prior distribution to these variables as well:

$$\boldsymbol{\Theta} \sim f(\boldsymbol{\psi}) \qquad (2.28)$$

where $f(\boldsymbol{\psi})$ is called the hyperprior. The vector parameter $\boldsymbol{\psi}$ of the hyperprior may be "known" and represent our prior beliefs about $\boldsymbol{\Theta}$ or, in theory, we can also assign a probability distribution to these quantities as well and proceed to another layer of the hierarchy.

According to [Gelm04], the idea of exchangeability will be used to create a joint probability model for all the parameters $\boldsymbol{\theta}_i$. A formal definition of exchangeability is: "The parameters $\boldsymbol{\theta}_1, \boldsymbol{\theta}_2, \ldots, \boldsymbol{\theta}_n$ are exchangeable in their joint distribution if $f(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2, \ldots, \boldsymbol{\theta}_n)$ is invariant to permutations of the indexes $1, 2, \ldots, n$". This means that if no information other than the data is available to distinguish any of the $\boldsymbol{\theta}_i$ from any of the others, and no ordering of the parameters can be made, one must assume symmetry among the parameters in the prior distribution. So we can treat the parameters for each sub-population as exchangeable units. This can be formulated as:

$$f(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2, \ldots, \boldsymbol{\theta}_n|\boldsymbol{\Theta}) = \prod_{i=1}^n f(\boldsymbol{\theta}_i|\boldsymbol{\Theta}) \qquad (2.29)$$

The joint prior distribution is now:

$$f(\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_n, \boldsymbol{\Theta}) = f(\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_n|\boldsymbol{\Theta})\, f(\boldsymbol{\Theta}) \qquad (2.30)$$

And conditioning on the data, it yields:

$$f(\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_n, \boldsymbol{\Theta}|y) \propto f(\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_n, \boldsymbol{\Theta})\, f(y|\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_n, \boldsymbol{\Theta}) \qquad (2.31)$$

Perhaps the most important point in practice is that non-hierarchical models are usually inappropriate for hierarchical data, while non-hierarchical data can be modelled following the hierarchical structure by assigning concrete values to the hyperprior parameters. This kind of model will be used in the Bayesian regression models with autocorrelated errors, as will be seen in the following chapters.
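To make the hierarchy concrete, this short R sketch (ours, with illustrative values) simulates the generative structure $\boldsymbol{\psi} \rightarrow \boldsymbol{\Theta} \rightarrow \boldsymbol{\theta}_i \rightarrow y_{ij}$ for a Normal hierarchical model with $m$ groups:

```r
# Generative simulation of a two-level hierarchical Normal model:
# hyperprior -> group parameters -> observations. Illustrative values only.
set.seed(42)
m <- 5; n <- 20                         # m populations, n observations each

# Hyperprior level: Theta = (mu.pop, tau), drawn given fixed hyperparameters psi
mu.pop <- rnorm(1, mean = 0, sd = 10)   # overall mean
tau    <- 2                             # between-group standard deviation

theta <- rnorm(m, mean = mu.pop, sd = tau)   # group-level parameters theta_i

# Observation level: y_ij ~ N(theta_i, sigma^2)
sigma <- 1
y <- matrix(rnorm(m * n, mean = rep(theta, each = n), sd = sigma),
            nrow = m, byrow = TRUE)

# Compare the raw group means with the generating parameters; under the
# fitted hierarchical model, group estimates shrink toward the overall mean
print(rowMeans(y)); print(theta); print(mu.pop)
```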
For more details about Bayesian hierarchical models, the reader is referred to [Cong06], [Gelm04] and [Rossi06].

2.4 Nonparametric Bayesian

It is the nonparametric approach which manages to overcome the limitations that have been mentioned throughout this chapter and to relax the restrictions of the parametric approach. This kind of analysis can be performed through the so-called Dirichlet process, which allows us to express in a simple way the prior distribution over F, or over the family of distributions containing F, where F is the distribution function of the variable under study. This process has a parameter α, a measure which is normalized into a probability distribution. According to [Maté06], a Dirichlet process for F(t) requires knowing:

• A prior proposal for F(t), denoted F0(t), which is the distribution function expressing the prior knowledge that the engineer has:

      F0(t) = α(t) / M                                                (2.32)

• A measure of the confidence in that prior proposal, denoted by M, whose values can vary between 0 and ∞, depending on whether there is total confidence in the data or in the prior proposal, respectively.

It can be shown that the posterior estimate of F(t), F̂n(t), after sampling n data, is given by

      F̂n(t) = pn F0(t) + (1 − pn) Fn(t)                               (2.33)

where Fn(t) is the empirical distribution function and pn = M / (M + n).
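A minimal sketch of the posterior estimate (2.33) follows, assuming a standard Normal prior proposal F0 and treating M as the engineer's confidence weight; the function name, F0 and all numerical values are assumptions for illustration.

    import numpy as np
    from scipy.stats import norm

    def dp_posterior_cdf(t, data, F0=norm.cdf, M=10.0):
        """Posterior mean of F(t) under a Dirichlet process prior: eq. (2.33)."""
        n = len(data)
        p_n = M / (M + n)                          # weight given to the prior proposal
        t = np.atleast_1d(t)
        # empirical distribution function Fn(t), evaluated at each point of t
        F_emp = np.mean(np.asarray(data)[None, :] <= t[:, None], axis=1)
        return p_n * F0(t) + (1.0 - p_n) * F_emp

    data = np.random.default_rng(1).normal(0.5, 1.0, size=30)
    print(dp_posterior_cdf([0.0, 0.5, 1.0], data))

With M = 0 the estimate reduces to the empirical distribution function, while M → ∞ returns the prior proposal F0 untouched.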
More detailed information about the nonparametric approach and how Dirichlet processes are used can be found in [Mull04] or [Gosh03].

With this approach, not only is the parametric restriction of fixing a probability model for the variable under study avoided, since no such hypothesis is required, but it also allows us to give a quantified weight to the prior knowledge supplied by the engineer, depending on the confidence placed in the certainty of that knowledge.
Chapter 3

Posterior Simulation

3.1 Introduction

A practical problem with Bayesian inference is the difficulty of summarizing realistically complex posterior distributions. In most practical problems, posterior densities will not take the form of any well-known and understood density, so summary statistics, such as the posterior mean and variance of parameters of interest, will not be analytically available. It is at this point where the importance of Bayesian computation arises and computational tools are required to gain meaningful inference from the posterior distribution. Its importance is such that the computing revolution of the last 20 years has led to a blossoming of Bayesian methods in many fields such as Econometrics, Ecology or Health.

In this regard, the most important simulation methods are the Markov chain Monte Carlo (MCMC) methods. MCMC methods date from the original work of [Metr53], who were interested in methods for the efficient simulation of the energy levels of atoms in a crystalline structure. The original idea was subsequently generalized by [Hast70], but its true potential was not fully realized within the statistical literature until [Gelf90] demonstrated its application to the estimation of integrals commonly occurring in the context of Bayesian statistical inference.

As [Berg05] points out, the underlying principle is simple: if one wishes to sample randomly from a specific probability distribution, then design a Markov chain whose long-time equilibrium is that distribution, write a computer program to simulate the Markov chain, run it for a time long enough to be confident that approximate equilibrium has been attained, then record the state of the Markov
chain as an approximate draw from equilibrium.

The technique has been developed strongly in different fields and with rather different emphases: in the computer science community concerned with the study of random algorithms (where the emphasis is on whether the resulting algorithm scales well with increasing size of the problem), in the spatial statistics community (where one is interested in understanding what kinds of patterns arise from complex stochastic models), and also in the applied statistics community (where it is applied largely in Bayesian contexts, enabling researchers to formulate statistical models which would otherwise be resistant to effective statistical analysis).

The development of the theoretical work also benefits the development of statistical applications. MCMC simulation techniques have been applied to develop practical statistical inferences for almost all problems in (bio)statistics, for example, problems in longitudinal data analysis, image analysis, genetics, contagious disease epidemics, random spatial patterns, and financial statistical models such as GARCH and stochastic volatility models.

The simplicity of the underlying principle of MCMC is a major reason for its success. However, a substantial complication arises as the underlying target problem becomes more complex; namely, how long should one run the Markov chain so as to ensure that it is close to equilibrium? According to [Gelm04], n = 100 independent samples should be enough for reasonable posterior summaries, but in some cases more samples are needed to ensure greater accuracy.

3.2 Markov chains

The essential theory required to develop Monte Carlo methods based on Markov chains is presented here. The most fundamental result is that certain Markov chains converge to a unique invariant distribution and can be used to estimate expectations with respect to this distribution. But in order to reach this conclusion, some concepts need to be defined first.

A Markov chain is a sequence of random variables, X0, . . . , Xn, also called a stochastic process, in which only the value of Xn−1 influences the distribution of Xn. Formally:

    P(Xn = xn | X0 = x0, . . . , Xn−1 = xn−1) = P(Xn = xn | Xn−1 = xn−1)    (3.1)
where the Xi take values in a common set called the state space of the Markov chain.

The usual language to refer to the different situations in which a Markov chain can be found is the following. If Xn = i, it is said that the chain is in state i at step n, or that it has the value i at step n. This language confers on the chain a certain dynamic view, which is corroborated by the main tool used to study it: the transition probabilities P(Xn+1 = j | Xn = i), represented by the transition matrix P = (Pij) with Pij = P(Xn+1 = j | Xn = i). This expresses the probability of moving from state i to state j.

Due to the fact that in most interesting applications Markov chains are homogeneous, the transition matrix can be defined from the one-step probability P(X1 = j | X0 = i). In this regard, a Markov chain Xt is homogeneous if P(Xn+1 = j | Xn = i) = P(X1 = j | X0 = i) for all n, i, j. Furthermore, using the Chapman-Kolmogorov equation, it can be shown that, for a homogeneous Markov chain with one-step transition matrix P, the n-step transition matrix Pn satisfies Pn = P^n.

On the other hand, we will see the concepts of invariant or stationary distribution, ergodicity and irreducibility, which are indispensable to reach the main result. It will be assumed that Xt is a homogeneous Markov chain. Then, a vector π is an invariant distribution of the chain Xt if it satisfies:

a) πj ≥ 0 with Σj πj = 1.

b) π = πP.

That is, a stationary distribution over the states of a Markov chain is one that persists forever once it is reached.

The concept of an ergodic state requires making other definitions clear, such as recurrence and aperiodicity:

• The state i is recurrent if P(Xn = i for some n ≥ 1 | X0 = i) = 1. Otherwise, it is transient. Moreover, i is positive recurrent if the expected (average) return time is finite, and null recurrent if it is not.
• The period of a state i, denoted by di, is defined as di = gcd{n : [P^n]ii > 0}. The state i is aperiodic if di = 1, and periodic if di is greater.

Then a state is ergodic if it is positive recurrent and aperiodic.

The last concept to define is irreducibility. A set of states C ⊆ S, where S is the set of all possible states, is irreducible if for all i, j ∈ C:

• i and j have the same period.

• i is transient if and only if j is transient.

• i is null recurrent if and only if j is null recurrent.

Now, having all these concepts in mind, we can know whether a Markov chain has a stationary distribution with the next lemma:

Lemma 3.2.1. Let Xt be a homogeneous and irreducible Markov chain. The chain will have one and only one stationary distribution if, and only if, all the states are positive recurrent. In that case, its entries are given by πi = 1/µi, where µi denotes the expected return time to state i.

The relation with the long-time behaviour is given by this other lemma:

Lemma 3.2.2. Let Xt be a homogeneous, irreducible and aperiodic Markov chain. Then

    [P^n]ij → 1/µj   for all i, j ∈ S, as n → ∞                        (3.2)

3.3 Monte Carlo Integration

Monte Carlo integration estimates the integral E[g(θ)] by obtaining samples θt, t = 1, . . . , n, from the posterior distribution p(θ|y) and averaging

    E[g(θ)] ≈ (1/n) Σ_{t=1}^{n} g(θt)                                  (3.3)

where g(θ) represents the function of interest to estimate. Note that the samples θt need not be independent: if they are generated by a process whose stationary distribution is p(θ|y), the θt form a Markov chain and the average above still converges to E[g(θ)|y].
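The following sketch ties Sections 3.2 and 3.3 together on a toy two-state chain: the rows of P^n converge to the stationary distribution π, as in Lemma 3.2.2, and the long-run average of g along a simulated path approximates E[g(X)] under π; the transition matrix and g are arbitrary assumptions for the example.

    import numpy as np

    rng = np.random.default_rng(7)
    P = np.array([[0.9, 0.1],          # transition matrix of a two-state chain
                  [0.4, 0.6]])

    # rows of P^n converge to the stationary distribution pi = (0.8, 0.2)
    print(np.linalg.matrix_power(P, 50))

    # simulate the chain and average g(x) = x over the path (Monte Carlo)
    n, x = 100_000, 0
    draws = np.empty(n, dtype=int)
    for t in range(n):
        x = rng.choice(2, p=P[x])      # one step of the chain from state x
        draws[t] = x
    print(draws.mean())                # ~ pi[1] = 0.2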
3.4 Gibbs sampler

In many models it is not easy to draw directly from the posterior distribution p(θ|y). However, if the parameter θ is partitioned into several blocks as θ = (θ1, . . . , θp), then the full conditional posterior distributions, p(θ1 | y, θ2, . . . , θp), . . . , p(θp | y, θ1, . . . , θp−1), may be simple to draw from in order to obtain a sequence of draws of θ1, . . . , θp. For instance, in the Normal linear regression model it is convenient to set p = 2, with θ1 = β and θ2 = σ², and the full conditional distributions would be p(θ1 = β | y, θ2 = σ²) and p(θ2 = σ² | y, θ1 = β), which are very useful in the Normal independent model that will be explained later.

The Gibbs sampler is defined by iterative sampling from each of those p conditional distributions:

1. Set a starting value, θ^(0) = (θ2^(0), . . . , θp^(0)).

2. Take random draws:
   - θ1^(1) from p(θ1 | y, θ2^(0), . . . , θp^(0))
   - θ2^(1) from p(θ2 | y, θ1^(1), θ3^(0), . . . , θp^(0))
   - . . .
   - θp^(1) from p(θp | y, θ1^(1), . . . , θp−1^(1))

3. Repeat step 2 as necessary.

4. Discard those draws affected by the starting value θ^(0) = (θ2^(0), . . . , θp^(0)) and average the rest of the draws applying Monte Carlo integration.

For instance, in the Normal regression model we would have:

1. Set a starting value, θ^(0) = (θ2^(0) = (σ²)^(0)).

2. Take random draws:
   - θ1^(1) = β^(1) from p(θ1 = β | y, θ2 = (σ²)^(0))
   - θ2^(1) = (σ²)^(1) from p(θ2 = σ² | y, θ1 = β^(1))

3. Repeat step 2 as necessary.

4. Discard the first draws affected by the starting value and average the rest of the draws applying Monte Carlo integration.

Those discarded values, which are affected by the starting point, are called the burn-in. Generally, any set of values discarded in an MCMC simulation is called the burn-in. The size of the burn-in period is the subject of current research in MCMC methods. As the state of each draw depends on the state of the previous one, the sequence is a Markov chain. More detailed information can be found in [Chen00], [Mart01] or [Rossi06].
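As a minimal sketch of the two-block scheme above, the code below runs a Gibbs sampler for a univariate Normal model with unknown µ and σ², assuming the noninformative prior p(µ, σ²) ∝ 1/σ², whose full conditionals are µ | σ², y ~ N(ȳ, σ²/n) and σ² | µ, y ~ Inv-χ²(n, Σ(yi − µ)²/n); the data and all settings are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.normal(5.0, 2.0, size=200)          # synthetic data
    n, ybar = len(y), y.mean()

    n_iter, burn_in = 5000, 500
    mu, sigma2 = np.empty(n_iter), np.empty(n_iter)
    mu[0], sigma2[0] = ybar, y.var()            # starting values theta^(0)

    for t in range(1, n_iter):
        # draw mu from its full conditional: N(ybar, sigma2 / n)
        mu[t] = rng.normal(ybar, np.sqrt(sigma2[t - 1] / n))
        # draw sigma2 from its full conditional: sum((y - mu)^2) / chi2_n draw
        s2 = np.sum((y - mu[t]) ** 2)
        sigma2[t] = s2 / rng.chisquare(n)

    # discard the burn-in, then take Monte Carlo averages
    print(mu[burn_in:].mean(), sigma2[burn_in:].mean())

The same pattern extends directly to the regression case, replacing the two conditionals by p(β | y, σ²) and p(σ² | y, β).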
3.5 Metropolis-Hastings sampler and its special cases

3.5.1 Metropolis-Hastings sampler

The Metropolis-Hastings method is adequate for simulating models that are not conditionally conjugate. Furthermore, it can be combined with the Gibbs sampler to simulate posterior distributions where some of the conditional posterior distributions are easy to sample from and others are not.

Like the algorithms explained above, this one is based on formulating a Markov chain, but it uses a proposal distribution, q(·|θt), which depends on the current state θt, to generate a new proposed sample θ*. This proposal is accepted as the next state with probability given by

    α(θt, θ*) = min{1, [p(θ*|y) q(θt|θ*)] / [p(θt|y) q(θ*|θt)]}        (3.4)

If the point θ* is not accepted, then the chain does not move and θt+1 = θt. According to [Mart01], the steps to follow are:

1. Initialize the chain to θ0 and set t = 0.

2. Generate a candidate point θ* from q(·|θt).

3. Generate U from a Uniform(0,1) distribution.

4. If U ≤ α(θt, θ*), then set θt+1 = θ*; otherwise set θt+1 = θt.

5. Set t = t + 1 and repeat steps 2 through 5.

6. Take the average of the draws g(θ1), . . . , g(θn).

Note that it is not only advisable but essential that the proposal distribution q(·|θt) be easy to sample from.
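A minimal implementation of the algorithm above, using a Gaussian random-walk proposal and a toy unnormalized log posterior; both the target and the tuning scale are assumptions for illustration. Because the Gaussian proposal is symmetric, the q-ratio in (3.4) cancels, which is in fact the Metropolis special case described next.

    import numpy as np

    rng = np.random.default_rng(1)

    def log_post(theta):
        # unnormalized log posterior: a standard Normal as a toy target (assumption)
        return -0.5 * theta ** 2

    n_iter, scale = 10_000, 1.0
    theta = np.empty(n_iter)
    theta[0] = 0.0
    for t in range(n_iter - 1):
        prop = rng.normal(theta[t], scale)               # q(.|theta_t), symmetric
        log_alpha = log_post(prop) - log_post(theta[t])  # q-ratio cancels here
        if np.log(rng.uniform()) <= min(0.0, log_alpha):
            theta[t + 1] = prop                          # accept the candidate
        else:
            theta[t + 1] = theta[t]                      # reject: chain stays put

    print(theta[1000:].mean(), theta[1000:].var())       # ~ 0 and ~ 1 after burn-in

Working on the log scale avoids numerical underflow of the density ratio, which is standard practice in MCMC implementations.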
There are some special cases of this method; the most important ones are briefly explained below. In addition, it can be shown, according to [Gelm04], that the Gibbs sampler is another special case of the Metropolis-Hastings algorithm in which the proposed point is always accepted.

3.5.2 Metropolis sampler

This method is a particular case of the Metropolis-Hastings sampler where the proposal distribution has to be symmetric. That is,

    q(θ*|θt) = q(θt|θ*)                                                (3.5)

for all θ* and θt. Then, the probability of accepting the new point is

    α(θt, θ*) = min{1, p(θ*|y) / p(θt|y)}                              (3.6)

The same procedure seen in the Metropolis-Hastings sampler has to be followed.

3.5.3 Random-walk sampler

This special case refers to a proposal distribution of the form

    q(θ*|θt) = q(|θt − θ*|)                                            (3.7)

The candidate point is θ* = θt + z, where z is called the increment random variable, drawn from q. Then, the probability of accepting the new point is

    α(θt, θ*) = min{1, p(θ*|y) / p(θt|y)}                              (3.8)

The same procedure seen in the Metropolis-Hastings sampler has to be followed.

3.5.4 Independence sampler

The last variation has a proposal distribution such that

    q(θ*|θt) = q(θ*)                                                   (3.9)

so it does not depend on θt. Then, the probability of accepting the new point is
    α(θt, θ*) = min{1, [p(θ*|y) q(θt)] / [p(θt|y) q(θ*)]} = min{1, w(θ*) / w(θt)}   (3.10)

where

    w(θ) = p(θ|y) / q(θ)                                               (3.11)

It is important to remark that, for this method to work well, the proposal distribution q should be very similar to the posterior distribution p(θ|y). The same procedure seen in the Metropolis-Hastings sampler has to be followed.

3.6 Importance sampling

Importance sampling is a variance reduction technique that can be used in the Monte Carlo method. The idea behind this method is that certain values of the input random variables in a simulation have more impact on the parameter being estimated than others, so instead of taking a simple average, importance sampling takes a weighted average.

Let q(θ) be a density from which it is easy to obtain random draws θ^(s) for s = 1, . . . , S. Then q(θ) is called the importance function, and the importance sampling estimator can be defined as follows. The function

    ĝS = Σ_{s=1}^{S} w(θ^(s)) g(θ^(s)) / Σ_{s=1}^{S} w(θ^(s)),   where w(θ^(s)) = p(θ^(s)|y) / q(θ^(s)),

converges to E[g(θ)|y] as S → ∞.

In fact, w(θ^(s)) can be computed as w(θ^(s)) = p*(θ^(s)|y) / q*(θ^(s)), where the new densities p* and q* are only proportional to the old ones, since the normalizing constants cancel in the ratio defining ĝS.

For more information and details about Markov chain Monte Carlo methods and their application, the reader is referred to [Chen00], [Gilk95], [Berg05] and [Kend05].
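To close the chapter, a minimal self-normalized importance sampling sketch: a Student-t importance function q with heavier tails than a toy "posterior" p, and the weighted average ĝS defined above; the target, the importance function and g are assumptions chosen only for illustration.

    import numpy as np
    from scipy.stats import norm, t as student_t

    rng = np.random.default_rng(3)
    S = 100_000

    # toy posterior p(theta|y): N(1, 0.5^2); importance function q: t_5 (heavier tails)
    theta = student_t.rvs(df=5, size=S, random_state=rng)
    w = norm.pdf(theta, loc=1.0, scale=0.5) / student_t.pdf(theta, df=5)

    g = theta ** 2                               # function of interest g(theta)
    g_hat = np.sum(w * g) / np.sum(w)            # self-normalized weighted average
    print(g_hat)                                 # ~ E[theta^2|y] = 1^2 + 0.5^2 = 1.25

Choosing q with heavier tails than p keeps the weights w bounded, which is what makes the weighted average stable in practice.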