# A Bayesian Search for the Needle in the Haystack

Master project by Timothée Stumpf-Fétizon. Barcelona GSE Master's Degree in Data Science

Published in: Economy & Finance

A Bayesian Search for the Needle in the Haystack
Timothée Stumpf-Fétizon [timstf@gmail.com]
July 23, 2015, Barcelona GSE
### Introduction: Abstract

Bayesian Model Averaging is a technique that systematically searches a model space (e.g. linear regression models) for promising models. It estimates the coefficients as weighted averages over all models, weighted by how likely each model is given the data. The estimates will be close to those you would obtain from fitting the "true" nested model, and no knowledge of that model is required. Implementing the technique in high dimensions is computationally challenging, and I propose an improvement to the state-of-the-art algorithm.
## Motivation
### Motivation: Monte Carlo Methods

- Many inference problems in statistics are analytically intractable (integration, massive discrete sample spaces, ...).
- We can approximate many quantities by sampling from the distribution and computing sample statistics. This is the Monte Carlo approach.
- The problem may be as fundamental as computing an expected value:

  $$\mathbb{E}[X] = \int x \, f(x) \, \mathrm{d}x$$

  Even if the integral does not have an analytical solution, we can draw from $f(x)$ and compute its sample mean! The more we draw, the better, so we need efficient algorithms.
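The expected-value example can be sketched in a few lines of Python (my own illustration, not code from the project). I assume a lognormal $f(x)$, purely because its mean has a closed form against which the Monte Carlo estimate can be checked:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw from a hypothetical f(x): the standard lognormal density.
draws = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

# Sample mean approximates the integral of x * f(x) dx.
mc_estimate = draws.mean()

# Exact mean of the standard lognormal, exp(1/2), for comparison.
exact = np.exp(0.5)

print(mc_estimate, exact)
```

With 100,000 draws the two numbers agree to roughly two decimal places, and the error shrinks at the usual $1/\sqrt{n}$ Monte Carlo rate.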
### Motivation: Approximating a Distribution

- Consider the problem of choosing a hypothesis $H_k$ from a space $\mathcal{H}$.
- The Bayesian solution is, as always, seductively simple (but only on the face of it). Compute the posterior given the data $X$!

  $$\pi(H_k \mid X) = \frac{\pi(X \mid H_k)\, \pi(H_k)}{\pi(X)}$$

- $\mathcal{H}$ may be too large to compute all probabilities. But if most of the probabilities are very close to zero, we do not need to! Instead, we draw from $\pi(H \mid X)$ and compute empirical frequencies.
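The empirical-frequency idea can be sketched as follows (my own illustration; the posterior values are made up). Even when we can only draw from a distribution, the relative frequencies of the draws recover its probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior over 4 hypotheses, most mass on the first two.
posterior = np.array([0.70, 0.25, 0.04, 0.01])

# Draw hypothesis indices from the posterior and tabulate frequencies.
draws = rng.choice(len(posterior), size=50_000, p=posterior)
freqs = np.bincount(draws, minlength=len(posterior)) / len(draws)

print(freqs)  # close to the posterior probabilities
```

In the real problem we cannot enumerate the posterior to call `rng.choice` on it, which is exactly why the Markov chain samplers of the next section are needed.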
### Motivation: Selecting Models

- Specific setting: selecting a linear regression model. As always, $y = X\beta + \varepsilon$.
- Given a design matrix $X$ with $d$ predictors, there are $2^d$ possible models. Don't even try to look at all of them if $d > 20$.
- But if $d$ is large, we definitely need an algorithmic model selection method, like the one above. This is a problem we want to solve!
- Monte Carlo is the way to go here.
## Approach
### Approach: Markov Chain Sampling

$$H^{(0)} \xrightarrow{\;Q(H \mid H^{(0)})\;} H^{(1)} \xrightarrow{\;Q(H \mid H^{(1)})\;} \cdots$$

- Markov Chain Monte Carlo is an umbrella term for algorithms that draw dependent samples from any distribution.
- The samples are drawn from a Markov process: the current sample $H^{(k)}$ depends on the last sample only through the distribution $Q(H \mid H^{(k-1)})$.
- Less dependence is better, because dependence slows down the convergence of sample statistics. We can reduce dependence by setting $Q(\cdot)$ appropriately.
### Approach: Drawing Models

- In our specific setting, we draw linear regression models from the posterior $\pi(H \mid X, y)$. A model $H \in \{0, 1\}^d$ is defined by the subset of variables in $X$ it includes, so $H_i$ is 1 if the $i$-th variable is included and 0 otherwise.
- The standard version of $Q(\cdot)$ flips a random element of $H$. Hence, if a variable was previously included, it will be excluded, and vice versa. This rule is symmetric: a move from $H^{(k)}$ to $H^{(k+1)}$ is as likely as a move in the opposite direction.
- It also privileges intermediate model sizes: if more variables are excluded than included, flipping a random bit is more likely to include a new variable.
- This is inappropriate if our prior expectation of the model size says the model should be small or large.
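The standard flip proposal is a one-liner in spirit; a minimal sketch (my own illustration, not the project's code):

```python
import numpy as np

def flip_proposal(H, rng):
    """Standard symmetric proposal: flip one uniformly chosen inclusion bit."""
    H_new = H.copy()
    i = rng.integers(len(H))      # pick a variable uniformly at random
    H_new[i] = 1 - H_new[i]       # include if excluded, exclude if included
    return H_new

rng = np.random.default_rng(0)
H = np.zeros(20, dtype=int)       # d = 20, start from the empty model
H_new = flip_proposal(H, rng)

print(H_new.sum())  # exactly one variable was toggled
```

Because every variable is equally likely to be flipped, the move and its reverse have the same proposal probability, which is the symmetry property noted above.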
### Approach: Model Size Priors

Figure 1: The binomial prior on the model size. It peaks around its expected value, which is given by the prior probability $p$ that any single variable is included, times the total number of variables $d$. In this plot, $d$ is 20 and $p$ is 1/4 (green), 1/2 (red) and 3/4 (blue).
### Approach: Setting Q(·)

- We should set $Q(\cdot)$ such that it is consistent with that (or any) prior.
- One way of doing that is to set asymmetric probabilities of growing and shrinking the model. If the prior on the model size $d_H$ is given by $\pi(d_H)$, I propose to increase the model size with probability

  $$\Pr\left(\text{growth} \mid d_H^{(k)}\right) = \frac{\pi\left(d_H^{(k)} + 1\right)}{\pi\left(d_H^{(k)} + 1\right) + \pi\left(d_H^{(k)} - 1\right)}$$

- This discourages the proposal of models that are much larger than the prior expectation.
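The growth probability is easy to compute under the binomial prior of Figure 1. A sketch (my own illustration; the function names are mine), using $d = 20$ and $p = 1/4$ so the prior mean model size is 5:

```python
from math import comb

def binom_prior(k, d=20, p=0.25):
    """Binomial prior on the model size: Pr(d_H = k)."""
    if k < 0 or k > d:
        return 0.0  # impossible sizes get zero mass
    return comb(d, k) * p**k * (1 - p)**(d - k)

def growth_prob(k, d=20, p=0.25):
    """Asymmetric rule: grow with probability pi(k+1) / (pi(k+1) + pi(k-1))."""
    up, down = binom_prior(k + 1, d, p), binom_prior(k - 1, d, p)
    return up / (up + down)

# Near the prior mean (d * p = 5) growth and shrinkage are roughly balanced;
# far above it, growth becomes very unlikely.
print(growth_prob(5), growth_prob(15))
```

Note that at the boundaries the rule does the right thing automatically: at $d_H = 0$ the prior mass at $-1$ is zero, so the model grows with probability 1.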
### Approach: Prior-consistent Q(·)

Figure 2: The curves represent the probability of growing the model for different choices of $Q(\cdot)$. The gray line corresponds to the standard symmetric rule, the colored lines to my asymmetric rule. In this plot, $d$ is 20 and $p$ is 1/4 (green), 1/2 (red) and 3/4 (blue).
### Approach: Autocorrelation

- Sample dependence is the performance criterion for an MCMC algorithm: less dependence means we're getting more out of every draw.
- We measure dependence by way of an inclusion indicator's autocorrelation function (ACF).
- I test the rule with a 20-variable simulation in which a fourth of the variables are relevant. Thus, I set $p = 1/4$, which corresponds to the green prior.
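An ACF for a 0/1 inclusion trace is straightforward to compute; a sketch (my own illustration, using an artificial "sticky" chain in place of real sampler output):

```python
import numpy as np

def acf(x, max_lag=20):
    """Empirical autocorrelation of a trace, lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / var
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(0)

# Artificial sticky chain: keep the previous state with probability 0.9,
# otherwise resample the bit uniformly. High stickiness = slow ACF decay.
trace = [0]
for _ in range(10_000):
    trace.append(trace[-1] if rng.random() < 0.9 else int(rng.integers(2)))

rho = acf(trace)
print(rho[:3])  # rho[0] is 1 by construction; rho[1] is high for this chain
```

A proposal rule that mixes better produces a trace whose `rho` decays toward zero in fewer lags, which is exactly the comparison made in Figure 3 below.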
### Approach: Simulations

Figure 3: The curves represent ACFs for the 20 variable inclusion indicators under the standard rule (blue) and the modified rule (red). Most curves decay much more quickly when using the modified rule, which translates into more efficient sampling.
### Final Remarks

- In the case above, the effective sample size increased by up to a factor of 4.
- This is work in progress, and there is no telling whether the rule works better in all situations!
- If you're interested in using BMA in practice, you can fork the software on my GitHub (working knowledge of Python required!): https://github.com/timsf/bma