SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Forest Learning from Data
Joe Suzuki
July 17, 2017
Road Map
PART-II: July 24, 2017 (based on PART-I)
1. Estimating Mutual Information (15 mins)
2. Learning Forests from Data (25 mins)
3. Learning Bayesian Networks from Data (5 mins)
4. Exercise (45 mins)
PART-I: July 17, 2017
A Bayesian Approach to Data Compression
Entropy
Mutual Information (MI)
Correlation may not detect independence!
ML Estimator of MI
Bayesian Testing of Independence
Bayesian Estimation of MI
From Stirling’s formula
For large n
Experiments 500 trials for binary seq. of length n=200
BNSL: a CRAN package (J. Suzuki and J. Kawahara, 2017)
Bayesian Network Learning Structure
https://cran.r-project.org/web/packages/BNSL/index.html
collects research results by Joe Suzuki.
install(“BNSL”)
library(BNSL)
n=200; p=0.5; x=rbinom(n,1,p); y=rbinom(n,1,p) # seqs are generated
mi(x,y, proc=9) # I_n
mi(x,y) # J_n
Tree Approximation
Factorization w.r.t. A Tree
Find E s.t. D(P||P’) is minimized
Kruskal’s Algorithm
Chow-Liu Algorithm
Experiments using Asia data set
• library(BNSL)
• mm=mi_matrix(asia, proc=9) # I_n is used
• edge.list=kruskal(mm)
• g=graph_from_edgelist(edge.list, directed=FALSE)
• plot(g)
• mm=mi_matrix(asia) # J_n is used
• edge.list=kruskal(mm)
• g=graph_from_edgelist(edge.list, directed=FALSE)
• plot(g)
Asia (8 variables)
S. Lauritzen, D. Spiegelhalter. Local
Computation with Probabilities on
Graphical Structures and their
Application to Expert Systems (with
discussion). Journal of the Royal
Statistical Society: Series B
(Statistical Methodology), 50(2):157-
224, 1988
Asia Data Set
I. A. Beinlich, H. J. Suermondt, R. M.
Chavez, and G. F. Cooper. The ALARM
Monitoring System: A Case Study
with Two Probabilistic Inference
Techniques for Belief Networks. In
Proceedings of the 2nd European
Conference on Artificial Intelligence
in Medicine, pages 247-256.
Springer-Verlag, 1989.
Alarm (37 varibles)
Alarm Data Set
Learning Bayesian Networks from Data
The # of candidate structures with p nodes is more than exponential with p
25 DAGs exist for p=3 but only 11 BNs are considered
7 local scores and 11 global scores
• Estimating Mutual Information
• Learning Forests from Data
• Learning Bayesian Networks from Data
Summary
Problem Set #2

Weitere ähnliche Inhalte

Mehr von Joe Suzuki

RとPythonを比較する
RとPythonを比較するRとPythonを比較する
RとPythonを比較するJoe Suzuki
 
R集会@統数研
R集会@統数研R集会@統数研
R集会@統数研Joe Suzuki
 
E-learning Development of Statistics and in Duex: Practical Approaches and Th...
E-learning Development of Statistics and in Duex: Practical Approaches and Th...E-learning Development of Statistics and in Duex: Practical Approaches and Th...
E-learning Development of Statistics and in Duex: Practical Approaches and Th...Joe Suzuki
 
分枝限定法でモデル選択の計算量を低減する
分枝限定法でモデル選択の計算量を低減する分枝限定法でモデル選択の計算量を低減する
分枝限定法でモデル選択の計算量を低減するJoe Suzuki
 
連続変量を含む条件付相互情報量の推定
連続変量を含む条件付相互情報量の推定連続変量を含む条件付相互情報量の推定
連続変量を含む条件付相互情報量の推定Joe Suzuki
 
E-learning Design and Development for Data Science in Osaka University
E-learning Design and Development for Data Science in Osaka UniversityE-learning Design and Development for Data Science in Osaka University
E-learning Design and Development for Data Science in Osaka UniversityJoe Suzuki
 
AMBN2017 サテライトワークショップ
AMBN2017 サテライトワークショップAMBN2017 サテライトワークショップ
AMBN2017 サテライトワークショップJoe Suzuki
 
CRAN Rパッケージ BNSLの概要
CRAN Rパッケージ BNSLの概要CRAN Rパッケージ BNSLの概要
CRAN Rパッケージ BNSLの概要Joe Suzuki
 
A Conjecture on Strongly Consistent Learning
A Conjecture on Strongly Consistent LearningA Conjecture on Strongly Consistent Learning
A Conjecture on Strongly Consistent LearningJoe Suzuki
 
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...Joe Suzuki
 
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...Joe Suzuki
 
研究紹介(学生向け)
研究紹介(学生向け)研究紹介(学生向け)
研究紹介(学生向け)Joe Suzuki
 
Bayesian Criteria based on Universal Measures
Bayesian Criteria based on Universal MeasuresBayesian Criteria based on Universal Measures
Bayesian Criteria based on Universal MeasuresJoe Suzuki
 
MDL/Bayesian Criteria based on Universal Coding/Measure
MDL/Bayesian Criteria based on Universal Coding/MeasureMDL/Bayesian Criteria based on Universal Coding/Measure
MDL/Bayesian Criteria based on Universal Coding/MeasureJoe Suzuki
 
The Universal Measure for General Sources and its Application to MDL/Bayesian...
The Universal Measure for General Sources and its Application to MDL/Bayesian...The Universal Measure for General Sources and its Application to MDL/Bayesian...
The Universal Measure for General Sources and its Application to MDL/Bayesian...Joe Suzuki
 
Universal Prediction without assuming either Discrete or Continuous
Universal Prediction without assuming either Discrete or ContinuousUniversal Prediction without assuming either Discrete or Continuous
Universal Prediction without assuming either Discrete or ContinuousJoe Suzuki
 
Bayesian network structure estimation based on the Bayesian/MDL criteria when...
Bayesian network structure estimation based on the Bayesian/MDL criteria when...Bayesian network structure estimation based on the Bayesian/MDL criteria when...
Bayesian network structure estimation based on the Bayesian/MDL criteria when...Joe Suzuki
 
The Universal Bayesian Chow-Liu Algorithm
The Universal Bayesian Chow-Liu AlgorithmThe Universal Bayesian Chow-Liu Algorithm
The Universal Bayesian Chow-Liu AlgorithmJoe Suzuki
 
Bayes Independence Test
Bayes Independence TestBayes Independence Test
Bayes Independence TestJoe Suzuki
 

Mehr von Joe Suzuki (20)

RとPythonを比較する
RとPythonを比較するRとPythonを比較する
RとPythonを比較する
 
R集会@統数研
R集会@統数研R集会@統数研
R集会@統数研
 
E-learning Development of Statistics and in Duex: Practical Approaches and Th...
E-learning Development of Statistics and in Duex: Practical Approaches and Th...E-learning Development of Statistics and in Duex: Practical Approaches and Th...
E-learning Development of Statistics and in Duex: Practical Approaches and Th...
 
分枝限定法でモデル選択の計算量を低減する
分枝限定法でモデル選択の計算量を低減する分枝限定法でモデル選択の計算量を低減する
分枝限定法でモデル選択の計算量を低減する
 
連続変量を含む条件付相互情報量の推定
連続変量を含む条件付相互情報量の推定連続変量を含む条件付相互情報量の推定
連続変量を含む条件付相互情報量の推定
 
E-learning Design and Development for Data Science in Osaka University
E-learning Design and Development for Data Science in Osaka UniversityE-learning Design and Development for Data Science in Osaka University
E-learning Design and Development for Data Science in Osaka University
 
UAI 2017
UAI 2017UAI 2017
UAI 2017
 
AMBN2017 サテライトワークショップ
AMBN2017 サテライトワークショップAMBN2017 サテライトワークショップ
AMBN2017 サテライトワークショップ
 
CRAN Rパッケージ BNSLの概要
CRAN Rパッケージ BNSLの概要CRAN Rパッケージ BNSLの概要
CRAN Rパッケージ BNSLの概要
 
A Conjecture on Strongly Consistent Learning
A Conjecture on Strongly Consistent LearningA Conjecture on Strongly Consistent Learning
A Conjecture on Strongly Consistent Learning
 
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
A Generalization of the Chow-Liu Algorithm and its Applications to Artificial...
 
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
A Generalization of Nonparametric Estimation and On-Line Prediction for Stati...
 
研究紹介(学生向け)
研究紹介(学生向け)研究紹介(学生向け)
研究紹介(学生向け)
 
Bayesian Criteria based on Universal Measures
Bayesian Criteria based on Universal MeasuresBayesian Criteria based on Universal Measures
Bayesian Criteria based on Universal Measures
 
MDL/Bayesian Criteria based on Universal Coding/Measure
MDL/Bayesian Criteria based on Universal Coding/MeasureMDL/Bayesian Criteria based on Universal Coding/Measure
MDL/Bayesian Criteria based on Universal Coding/Measure
 
The Universal Measure for General Sources and its Application to MDL/Bayesian...
The Universal Measure for General Sources and its Application to MDL/Bayesian...The Universal Measure for General Sources and its Application to MDL/Bayesian...
The Universal Measure for General Sources and its Application to MDL/Bayesian...
 
Universal Prediction without assuming either Discrete or Continuous
Universal Prediction without assuming either Discrete or ContinuousUniversal Prediction without assuming either Discrete or Continuous
Universal Prediction without assuming either Discrete or Continuous
 
Bayesian network structure estimation based on the Bayesian/MDL criteria when...
Bayesian network structure estimation based on the Bayesian/MDL criteria when...Bayesian network structure estimation based on the Bayesian/MDL criteria when...
Bayesian network structure estimation based on the Bayesian/MDL criteria when...
 
The Universal Bayesian Chow-Liu Algorithm
The Universal Bayesian Chow-Liu AlgorithmThe Universal Bayesian Chow-Liu Algorithm
The Universal Bayesian Chow-Liu Algorithm
 
Bayes Independence Test
Bayes Independence TestBayes Independence Test
Bayes Independence Test
 

Kürzlich hochgeladen

Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionPriyansha Singh
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptxkhadijarafiq2012
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 

Kürzlich hochgeladen (20)

Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorption
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptx
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 

Forest Learning from Data

  • 1. Forest Learning from Data Joe Suzuki July 17, 2017
  • 2. Road Map PART-II: July 24, 2017 (based on PART-I) 1. Estimating Mutual Information (15 mins) 2. Learning Forests from Data (25 mins) 3. Learning Bayesian Networks from Data (5 mins) 4. Exercise (45 mins) PART-I: July 17, 2017 A Bayesian Approach to Data Compression
  • 5. Correlation may not detect independence!
  • 7.
  • 8. Bayesian Testing of Independence
  • 9.
  • 10. Bayesian Estimation of MI From Stirling’s formula For large n
  • 11. Experiments 500 trials for binary seq. of length n=200
  • 12. BNSL: a CRAN package (J. Suzuki and J. Kawahara, 2017) Bayesian Network Learning Structure https://cran.r-project.org/web/packages/BNSL/index.html collects research results by Joe Suzuki. install(“BNSL”) library(BNSL) n=200; p=0.5; x=rbinom(n,1,p); y=rbinom(n,1,p) # seqs are generated mi(x,y, proc=9) # I_n mi(x,y) # J_n
  • 15. Find E s.t. D(P||P’) is minimized
  • 18.
  • 19.
  • 20.
  • 21.
  • 22. Experiments using Asia data set • library(BNSL) • mm=mi_matrix(asia, proc=9) # I_n is used • edge.list=kruskal(mm) • g=graph_from_edgelist(edge.list, directed=FALSE) • plot(g) • mm=mi_matrix(asia) # J_n is used • edge.list=kruskal(mm) • g=graph_from_edgelist(edge.list, directed=FALSE) • plot(g)
  • 23. Asia (8 variables) S. Lauritzen, D. Spiegelhalter. Local Computation with Probabilities on Graphical Structures and their Application to Expert Systems (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 50(2):157- 224, 1988
  • 25. I. A. Beinlich, H. J. Suermondt, R. M. Chavez, and G. F. Cooper. The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks. In Proceedings of the 2nd European Conference on Artificial Intelligence in Medicine, pages 247-256. Springer-Verlag, 1989. Alarm (37 varibles)
  • 27. Learning Bayesian Networks from Data The # of candidate structures with p nodes is more than exponential with p
  • 28. 25 DAGs exist for p=3 but only 11 BNs are considered
  • 29.
  • 30. 7 local scores and 11 global scores
  • 31. • Estimating Mutual Information • Learning Forests from Data • Learning Bayesian Networks from Data Summary