# A non-Gaussian model for causal discovery in the presence of hidden common causes

Talk slides at 2016 Munich Workshop on
Causal Inference and Information Theory



1. Shohei Shimizu, Shiga University / Osaka University, Japan. A non-Gaussian model for causal discovery in the presence of hidden common causes. 2016 Munich Workshop on Causal Inference and Information Theory.
2. Abstract
   • Managing hidden common causes is essential in causal discovery
   • Observed variables that are not causally related can be correlated due to hidden common causes
   • We propose a linear non-Gaussian model for estimating causal direction in cases with hidden common causes
3. Motivation: Illustrative example
4. Strong correlation between chocolate consumption and number of Nobel laureates (Messerli12NEJM)
   [Figure: scatter plot, 2002-2011. x-axis: chocolate consumption (kg/yr/capita); y-axis: number of Nobel laureates per 10 million population.]
   Corr. 0.791, P-value < 0.001
5. Does eating more chocolate increase the number of Nobel laureates?
   • Interpretational drift (Maurage+13, J. Nutrition)
   • Observed: correlation between Chocolate and Nobel (Corr. 0.791, P-value < 0.001)
   • Candidate explanations: Chclt → Nobel, Chclt ← Nobel, or no direct causal relation, each with GDP as a hidden common cause
   • Manage this gap!
   [Figure: causal graphs over Chclt, Nobel, and the hidden common cause GDP.]
6. Formulating the problem
7. Structural causal models (Pearl, 2000, 2009; cf. Bollen, 1989)
   • A framework for describing causal relations
   • Roughly speaking, if changing the value of $x_1$ changes the value of $x_2$, then $x_1$ causes $x_2$
   $$x_1 = g_1(f, e_1), \qquad x_2 = g_2(x_1, f, e_2)$$
   [Figure: graph with $x_1 \to x_2$, hidden common cause $f$, and errors $e_1$, $e_2$; example labels: GDP, Chclt, Nobel.]
8. Challenge in causal discovery
   • Assume that one of the following three models, each with a hidden common cause $f$, generated the data:
   $$x_1 \to x_2: \quad x_1 = g_1(f, e_1), \quad x_2 = g_2(x_1, f, e_2)$$
   $$x_1 \leftarrow x_2: \quad x_1 = g_1(x_2, f, e_1), \quad x_2 = g_2(f, e_2)$$
   $$\text{no direct edge}: \quad x_1 = g_1(f, e_1), \quad x_2 = g_2(f, e_2)$$
   each with distributions $p(e_1)$, $p(e_2)$, $p(f)$
   • Data matrix: observations obs. 1, obs. 2, …, obs. $n$ of $(x_1, x_2)$, i.i.d. $\sim p(x_1, x_2)$
   • Estimate which of the three models generated the data
9. Under what conditions can we manage the gap?
   • We have shown that it is possible under three assumptions: i) linearity; ii) acyclicity; iii) non-Gaussianity (Hoyer+08IJAR; Shimizu+14JMLR)
   • The classical Bayesian network approach is incapable of this
   $$x_1 \to x_2: \quad x_1 = \lambda_{11} f_1 + e_1, \quad x_2 = b_{21} x_1 + \lambda_{21} f_1 + e_2$$
   $$x_1 \leftarrow x_2: \quad x_1 = b_{12} x_2 + \lambda_{11} f_1 + e_1, \quad x_2 = \lambda_{21} f_1 + e_2$$
   $$\text{no direct edge}: \quad x_1 = \lambda_{11} f_1 + e_1, \quad x_2 = \lambda_{21} f_1 + e_2$$
10. Basic non-Gaussian model (no hidden common cause). S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen. Journal of Machine Learning Research, 2006
11. Linear Non-Gaussian Acyclic Model (LiNGAM) (Shimizu et al., 2006)
   $$x_i = \sum_{j \neq i} b_{ij} x_j + e_i$$
   • Assumptions: linearity; acyclicity; non-Gaussian errors $e_i$; independence of errors $e_i$ (no hidden common causes)
   • Identifiable: causal directions and coefficients
   • Various extensions, including nonlinear (Hoyer+08NIPS, Zhang+09UAI) and cyclic (Lacerda+08UAI) models
   [Figure: example graph over $x_1$, $x_2$, $x_3$ with coefficients $b_{21}$, $b_{23}$, $b_{13}$ and errors $e_1$, $e_2$, $e_3$.]
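12. Different directions give different data distributions
   $$\text{Model 1}: \quad x_1 = e_1, \quad x_2 = 0.8\, x_1 + e_2$$
   $$\text{Model 2}: \quad x_1 = 0.8\, x_2 + e_1, \quad x_2 = e_2$$
   with $E(e_1) = E(e_2) = 0$ and $\mathrm{var}(x_1) = \mathrm{var}(x_2) = 1$
   [Figure: scatter plots of $x_1$ vs. $x_2$ under Model 1 and Model 2, for Gaussian errors and for non-Gaussian (e.g. uniform) errors.]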
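
The asymmetry on this slide can be checked numerically. The sketch below simulates Model 1 with uniform errors and regresses in both directions. The dependence measure (correlation between the squared regressor and the squared residual) is an assumption chosen for simplicity, not a measure taken from the slides. The function name `residual_dependence`, the sample size, and the seed are likewise illustrative.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def residual_dependence(cause, effect):
    """Regress `effect` on `cause` by least squares and return
    corr(cause^2, residual^2), a crude check of independence."""
    mc, me = mean(cause), mean(effect)
    slope = (sum((c - mc) * (e - me) for c, e in zip(cause, effect))
             / sum((c - mc) ** 2 for c in cause))
    centered = [c - mc for c in cause]
    resid = [(e - me) - slope * c for c, e in zip(centered, effect)]
    return corr([c * c for c in centered], [r * r for r in resid])


rng = random.Random(0)            # illustrative seed
n = 5000                          # illustrative sample size
a1 = math.sqrt(3.0)               # uniform on [-a1, a1] has variance 1
a2 = math.sqrt(3.0 * 0.36)        # var(e2) = 0.36, so var(x2) = 0.64 + 0.36 = 1

# Model 1: x1 = e1, x2 = 0.8 * x1 + e2
x1 = [rng.uniform(-a1, a1) for _ in range(n)]
x2 = [0.8 * v + rng.uniform(-a2, a2) for v in x1]

forward = residual_dependence(x1, x2)    # correct direction
backward = residual_dependence(x2, x1)   # wrong direction
print(f"x1 -> x2: {forward:+.3f}   x2 -> x1: {backward:+.3f}")
```

In the correct direction the residual is the independent error $e_2$, so the statistic should be near zero. In the wrong direction the residual is uncorrelated with the regressor but not independent of it, so the statistic should be clearly non-zero. With Gaussian errors both directions would give values near zero.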
13. Independent Component Analysis (ICA) (Jutten & Herault, 1991; Comon, 1994; Hyvärinen et al., 2001)
   • Observed variables $x_i$ are modeled by
   $$x_i = \sum_{j=1}^{p} a_{ij} s_j \quad \text{or} \quad x = As$$
   where the hidden variables $s_j$ ($j = 1, \ldots, p$) are non-Gaussian and independent
   • Then the mixing matrix $A$ is identifiable up to permutation and scaling of the columns
14. Sketch of the identifiability proof
   • Different directions give different zero/non-zero patterns of the mixing matrices
     – No zeros on the diagonal in the causal model
     – No permutation indeterminacy
   Model 1: $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$, i.e. $x = As$ with
   $$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ b_{21} & 1 \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}$$
   Model 2: $x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$, i.e. $x = As$ with
   $$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & b_{12} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}$$
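
The zero patterns on this slide follow from solving $x = Bx + e$ for $x$, which gives $A = (I - B)^{-1}$. A minimal sketch for the two-variable case is below. The helper names `mixing_matrix` and `zero_pattern` are hypothetical, and the coefficient value 0.8 is borrowed from the example on slide 12.

```python
def mixing_matrix(b12, b21):
    """Return A = (I - B)^{-1} for B = [[0, b12], [b21, 0]].

    I - B = [[1, -b12], [-b21, 1]], whose inverse is
    (1 / det) * [[1, b12], [b21, 1]] with det = 1 - b12 * b21.
    """
    det = 1.0 - b12 * b21
    return [[1.0 / det, b12 / det],
            [b21 / det, 1.0 / det]]


def zero_pattern(matrix):
    """Mark each entry as non-zero ('*') or zero ('0')."""
    return [["*" if entry != 0.0 else "0" for entry in row] for row in matrix]


# Model 1 (x1 -> x2): only b21 is non-zero.
pattern_model1 = zero_pattern(mixing_matrix(b12=0.0, b21=0.8))
# Model 2 (x1 <- x2): only b12 is non-zero.
pattern_model2 = zero_pattern(mixing_matrix(b12=0.8, b21=0.0))

for name, pattern in [("Model 1", pattern_model1), ("Model 2", pattern_model2)]:
    print(name, [" ".join(row) for row in pattern])
```

The two patterns differ in which off-diagonal entry is zero, so an ICA estimate of $A$ distinguishes the two directions.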
15. LiNGAM with hidden common causes. P. O. Hoyer, S. Shimizu, A. Kerminen, and M. Palviainen. Int. J. Approximate Reasoning, 2008
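16. LiNGAM with hidden common causes (Hoyer+08IJAR)
   • Extension to incorporate non-Gaussian hidden common causes $f_q$:
   $$x_i = \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$$
   where $f_q$ ($q = 1, \ldots, Q$) are independent
   • Two-variable example:
   $$x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1, \qquad x_2 = \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$$
   [Figure: graph with $x_1 \to x_2$, hidden common causes $f_1$, $f_2$, and errors $e_1$, $e_2$.]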
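17. WLG, hidden common causes are assumed to be independent
   $$x_i = \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$$
   • Dependent hidden common causes $f_1$, $f_2$ can be written in terms of independent ones $e_{f_1}$, $e_{f_2}$:
   $$\begin{pmatrix} f_1 \\ f_2 \end{pmatrix} = \begin{pmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} e_{f_1} \\ e_{f_2} \end{pmatrix}$$
   [Figure: left, graph with dependent hidden common causes $f_1$, $f_2$; right, equivalent graph with independent hidden common causes $e_{f_1}$, $e_{f_2}$.]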
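18. Different directions give different zero/non-zero patterns (Hoyer+08IJAR)
   • Assumptions: faithfulness on $x_i$, $f_i$, and the number of $f_i$ is given
   • Writing $x = As$ with $s = (e_1, e_2, f_1)^\top$, the three models give ($*$ denotes a non-zero entry):
   $$x_1 \to x_2: \; A = \begin{pmatrix} * & 0 & * \\ * & * & * \end{pmatrix}, \qquad x_1 \leftarrow x_2: \; A = \begin{pmatrix} * & * & * \\ 0 & * & * \end{pmatrix}, \qquad \text{no direct edge}: \; A = \begin{pmatrix} * & 0 & * \\ 0 & * & * \end{pmatrix}$$
   [Figure: the three graphs over $x_1$, $x_2$, $f_1$, and scatter plots of $x_1$ vs. $x_2$ for Gaussian and non-Gaussian $e_1$, $e_2$, $f_1$.]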
19. Previous estimation methods (Hoyer+08IJAR; Henao+11JMLR)
   • Explicitly model hidden common causes
   • Do model comparison based on the maximum likelihood principle or a Bayesian approach
   • Need to specify the number and distributions of the hidden common causes, which is difficult in general
   [Figure: graphs with $x_1 \to x_2$ or $x_1 \leftarrow x_2$, hidden common causes $f_1, \ldots, f_Q$, and errors $e_1$, $e_2$.]
20. Our proposal: A Bayesian LiNGAM approach. S. Shimizu and K. Bollen. Journal of Machine Learning Research, 2014, and something extra
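21. Key idea (1/2)
   • Transform the model to a model with no hidden common causes
   [Figure: left, LiNGAM with hidden common causes: $x_1 \to x_2$ with coefficient $b_{21}$, hidden common causes $f_1, \ldots, f_Q$, and errors $e_1$, $e_2$. Right, LiNGAM with no hidden common causes but with possibly different intercepts over observations: for each observation $m$, $x_1^{(m)} \to x_2^{(m)}$ with the same coefficient $b_{21}$, intercepts $\mu_1^{(m)}$, $\mu_2^{(m)}$, and errors $e_1^{(m)}$, $e_2^{(m)}$.]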
22. Key idea (2/2)
   • Include the sums of hidden common causes as model parameters, i.e., observation-specific intercepts. For the $m$-th observation:
   $$x_2^{(m)} = \underbrace{\sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}}_{\mu_2^{(m)}:\ \text{obs.-specific intercept}} + b_{21} x_1^{(m)} + e_2^{(m)}$$
   • Hidden common causes are not modeled explicitly
     – It is necessary neither to specify the number of hidden common causes $Q$ nor to estimate their coefficients
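
The reparameterization above can be illustrated with a small simulation: data generated from the model with $Q$ hidden common causes are reproduced exactly by the model with observation-specific intercepts. This is only a sketch under assumed values. The coefficients `lam1`, `lam2`, `b21`, the choice $Q = 3$, and the Laplace distributions for $f_q$ and $e_i$ are illustrative, not taken from the slides.

```python
import math
import random


def laplace(rng):
    """Laplace(0, 1) draw as the difference of two Exp(1) draws."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


rng = random.Random(1)          # illustrative seed
lam1 = [0.5, -0.3, 0.8]         # hypothetical lambda_{1q}, Q = 3
lam2 = [0.4, 0.7, -0.2]         # hypothetical lambda_{2q}
b21 = 0.8                       # hypothetical connection strength

rows = []
for m in range(5):
    f = [laplace(rng) for _ in lam1]          # hidden common causes f_q^(m)
    e1, e2 = laplace(rng), laplace(rng)

    # LiNGAM with hidden common causes
    x1 = sum(l * fq for l, fq in zip(lam1, f)) + e1
    x2 = sum(l * fq for l, fq in zip(lam2, f)) + b21 * x1 + e2

    # Same observation written with observation-specific intercepts
    mu1 = sum(l * fq for l, fq in zip(lam1, f))
    mu2 = sum(l * fq for l, fq in zip(lam2, f))
    x1_alt = mu1 + e1
    x2_alt = mu2 + b21 * x1_alt + e2

    rows.append((x1, x2, x1_alt, x2_alt))

for x1, x2, x1_alt, x2_alt in rows:
    print(f"x1={x1:+.3f} x2={x2:+.3f}  via intercepts: {x1_alt:+.3f} {x2_alt:+.3f}")
```

The second form never refers to $Q$ or to the individual $\lambda_{iq}$, only to the two intercepts per observation, which is why neither needs to be specified.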
23. Bayesian model selection
   $$\text{Model 3 } (x_1 \to x_2): \quad x_1^{(m)} = \mu_1^{(m)} + e_1^{(m)}, \quad x_2^{(m)} = \mu_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$$
   $$\text{Model 4 } (x_1 \leftarrow x_2): \quad x_1^{(m)} = \mu_1^{(m)} + b_{12} x_2^{(m)} + e_1^{(m)}, \quad x_2^{(m)} = \mu_2^{(m)} + e_2^{(m)}$$
   • Compare the marginal likelihoods of the two models, with the data standardized
   • Once a direction has been estimated, compute the posterior of the connection strength $b_{21}$ or $b_{12}$
   • Many observation-specific intercepts $\mu_i^{(m)}$ ($i = 1, 2$; $m = 1, \ldots, n$)
     – Similar to mixed models and multi-level models
     – Informative prior
24. Prior for the observation-specific intercepts
   $$\mu_1^{(m)} = \sum_{q=1}^{Q} \lambda_{1q} f_q^{(m)}, \qquad \mu_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$$
   • Motivation: central limit theorem
     – Sums of independent variables $f_q^{(m)}$ tend to be more Gaussian
   • Approximate the density by a bell-shaped distribution
     – $\mu_1^{(m)}$ and $\mu_2^{(m)}$ are dependent due to hidden common causes
   $$\begin{pmatrix} \mu_1^{(m)} \\ \mu_2^{(m)} \end{pmatrix} \sim t\text{-distribution with sd } \tau_1, \tau_2, \text{ correlation } \sigma_{12}, \text{ and DOF } v \text{ (here, 8)}, \qquad \tau_1, \tau_2 \in \{0.4, 0.6, 0.8\}$$
   • Select the hyper-parameter values that maximize the marginal likelihood
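
The central-limit motivation can be checked numerically: the excess kurtosis of a sum of independent non-Gaussian variables moves toward 0, the Gaussian value, as the number of summands grows. The sketch below makes simplifying assumptions that are not from the slides: the $f_q$ are Laplace distributed, all weights equal 1, and $Q = 10$.

```python
import random


def laplace(rng):
    """Laplace(0, 1) draw as the difference of two Exp(1) draws."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def excess_kurtosis(values):
    """Sample excess kurtosis m4 / m2^2 - 3 (0 for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


rng = random.Random(0)   # illustrative seed
n = 20000                # illustrative sample size
Q = 10                   # hypothetical number of hidden common causes

single = [laplace(rng) for _ in range(n)]
summed = [sum(laplace(rng) for _ in range(Q)) for _ in range(n)]

k_single = excess_kurtosis(single)
k_sum = excess_kurtosis(summed)
print(f"one variable: {k_single:.2f}   sum of {Q}: {k_sum:.2f}")
```

For a Laplace variable the theoretical excess kurtosis is 3, and for a sum of $Q$ i.i.d. copies it is $3/Q$, so the sum is much closer to bell-shaped while still slightly heavier-tailed than a Gaussian.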
25. Error distributions and other priors used in the experiment
   • Error distributions $p(e_1)$, $p(e_2)$
     – Fixed to be the Laplace distribution
     – Could instead be estimated, for example by assuming a family of generalized Gaussian distributions
   • Priors for the other parameters
   $$b_{12} \sim N(0, 0.75^2), \qquad b_{21} \sim N(0, 0.75^2), \qquad \sigma_{12} \sim U(-1, 1)$$
   $$\mathrm{std}(e_1) \sim U(0, 1), \qquad \mathrm{std}(e_2) \sim U(0, 1)$$
26. Experiment on sociology data
27. Sociology data
   • Source: General Social Survey (n=1380)
     – Non-farm background, ages 35-44, white, male, in the labor force, no missing data for any of the covariates, 1972-2006
   • 15 pairs with known temporal directions (Duncan+1972)
   [Figure: status attainment model (Duncan et al., 1972); e.g. $x_2$: Son's Income.]
28. Numbers of successes (n=1380)
   • Known (temporal) orderings of 15 pairs among variables such as Father's Education, Son's Education, Son's Occupation, Son's Income, …
   • Cf. LiNGAM-GU-UK (Chen+13NECO): 0.20; PNL (Zhang+09UAI): 0.60
   [Table: successes per pair (✔ marks); entries lost in extraction.]
29. Conclusion
30. Conclusion
   • Estimation of causal direction in the presence of hidden common causes is a major challenge in causal discovery
   • Proposed a linear non-Gaussian SEM approach
     – Not necessary to model individual hidden common causes
   • Future directions
     – Cyclic cases: use some prior to force the identifiability condition of Lacerda+08UAI?
     – Non-stationarity: combine with Kun's method (Huang+15IJCAI)?