Summary:
Probability-based Learning
(from the book Machine Learning for Predictive Data Analytics)
Duyen Do
CONTENTS
Bayes’ Theorem
Fundamentals
Bayes Prediction
Standard Approach: The Naïve Bayes Model
Conditional Independence and Factorization
Smoothing
Extensions and Variations
Continuous Features
Bayesian Network
Summary
Q&A
Probability Basics
 Probability function: P()
Probability mass function (categorical features) and Probability density function (continuous features)
0 ≤ P(f = l) ≤ 1
Σ_{l ∈ levels(f)} P(f = l) = 1.0
 Prior probability or Unconditional probability
P(X)
 Posterior probability or Conditional probability
P(X|Y)
 Joint probability
P(X, Y)
 Joint probability distribution
Example: Table 6.1
Fundamentals
 Conditional probability in terms of joint probability:
P(X|Y) = P(X, Y) / P(Y)
 Product rule:
P(X,Y) = P(X|Y) P(Y) = P(Y|X) P(X)
0 ≤ P(X|Y) ≤ 1
Σ_i P(X_i | Y) = 1.0
 Chain rule:
P(A,B,C,…,Z) = P(Z) P(Y|Z) P(X|Y,Z)…P(A|B,…,X,Y,Z)
 Theorem of Total Probability:
P(X) = Σ_i P(X | Y_i) P(Y_i)
Fundamentals
Fundamentals: Bayes’s Theorem
P(X|Y) =
𝑃(𝑌|𝑋)𝑃(𝑋)
𝑃(𝑌)
A doctor gives a patient both bad news and good news:
- Bad news: the patient has tested positive for a serious disease, and the test is 99% accurate
- Good news: the disease is rare, striking only 1 in 10,000 people
What is the actual probability that the patient has the disease?
Using Bayes' Theorem:
P(d|t) = P(t|d) P(d) / P(t)
P(t) = P(t|d)P(d) + P(t|-d)P(-d) = (0.99 * 0.0001) + (0.01 * 0.9999) = 0.0101
P(d|t) = (0.99 * 0.0001) / 0.0101 = 0.0098
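A quick sketch of this calculation in Python (the variable names are mine; the probabilities are the ones quoted above):

```python
# Bayes' Theorem applied to the diagnostic test example:
# d = has the disease, t = positive test result.
p_d = 0.0001            # prior: 1 in 10,000 people have the disease
p_t_given_d = 0.99      # test is 99% accurate: P(t | d)
p_t_given_not_d = 0.01  # false positive rate: P(t | -d)

# Theorem of Total Probability: P(t) = P(t|d)P(d) + P(t|-d)P(-d)
p_t = p_t_given_d * p_d + p_t_given_not_d * (1 - p_d)

# Bayes' Theorem: P(d|t) = P(t|d)P(d) / P(t)
p_d_given_t = p_t_given_d * p_d / p_t

print(f"P(t)   = {p_t:.4f}")          # ~0.0101
print(f"P(d|t) = {p_d_given_t:.4f}")  # ~0.0098
```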
Bayesian prediction
P(t=l | q[1],…,q[m]) = P(q[1],…,q[m] | t=l) * P(t=l) / P(q[1],…,q[m])
where P(t=l) is the prior probability of the target feature t taking the level l, P(q[1],…,q[m]) is the joint probability of the descriptive feature values, and P(q[1],…,q[m] | t=l) is the conditional probability of those values given t=l.
Example 1: Table 6.1
What is the probability that a person has MENINGITIS given HEADACHE=true, FEVER=false, VOMITING=true?
=> P(m | h, -f, v)
P(m | h, -f, v) = P(h, -f, v | m) P(m) / P(h, -f, v)
P(m) = |{ID5, ID8, ID10}| / |{ID1, ID2, …, ID10}| = 0.3
P(h, -f, v) = |{ID3, ID4, ID6, ID7, ID8, ID10}| / |{ID1, ID2, …, ID10}| = 0.6
P(h, -f, v | m) = P(h | m) P(-f | h, m) P(v | -f, h, m)
= (|{ID8, ID10}| / |{ID5, ID8, ID10}|) x (|{ID8, ID10}| / |{ID8, ID10}|) x (|{ID8, ID10}| / |{ID8, ID10}|)
= 2/3 x 2/2 x 2/2 = 0.667
P(m | h, -f, v) = (0.667 * 0.3) / 0.6 = 0.333
P(-m | h, -f, v) = P(h, -f, v | -m) P(-m) / P(h, -f, v)
= P(h | -m) P(-f | h, -m) P(v | -f, h, -m) P(-m) / P(h, -f, v)
= 0.667
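The same count-based calculation as a small Python sketch, using only the set sizes quoted above (Table 6.1 itself is not reproduced here):

```python
# Exact Bayesian prediction for P(m | h, -f, v) from the counts on this slide.
n_total = 10
n_m = 3                      # |{ID5, ID8, ID10}|
n_h_nf_v = 6                 # |{ID3, ID4, ID6, ID7, ID8, ID10}|

p_m = n_m / n_total                      # P(m) = 0.3
p_evidence = n_h_nf_v / n_total          # P(h, -f, v) = 0.6

# Chain-rule factors restricted to the meningitis rows:
p_h_given_m = 2 / 3                      # P(h | m)
p_nf_given_h_m = 2 / 2                   # P(-f | h, m)
p_v_given_nf_h_m = 2 / 2                 # P(v | -f, h, m)

likelihood = p_h_given_m * p_nf_given_h_m * p_v_given_nf_h_m
posterior = likelihood * p_m / p_evidence
print(f"P(m | h, -f, v) = {posterior:.3f}")   # ~0.333
```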
Bayesian prediction
Maximum a posteriori (MAP) prediction:
M(q) = arg max_{l ∈ levels(t)} P(t=l | q[1],…,q[m])
     = arg max_{l ∈ levels(t)} P(q[1],…,q[m] | t=l) P(t=l) / P(q[1],…,q[m])
Bayesian MAP prediction model (the denominator is the same for every level l, so it can be dropped):
M(q) = arg max_{l ∈ levels(t)} P(q[1],…,q[m] | t=l) P(t=l)
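A minimal sketch of the MAP decision rule, assuming the per-level likelihoods and priors have already been estimated; the function name and the numbers in the usage example are illustrative placeholders, not values from the book:

```python
def map_prediction(levels, prior, likelihood):
    """Return the target level that maximizes P(q | t=l) * P(t=l).

    prior:      dict mapping level -> P(t=l)
    likelihood: dict mapping level -> P(q[1],...,q[m] | t=l)
    The shared denominator P(q[1],...,q[m]) is dropped because it does
    not change which level wins the arg max.
    """
    return max(levels, key=lambda l: likelihood[l] * prior[l])

# Illustrative numbers for a two-level target (m / -m):
levels = ["m", "-m"]
prior = {"m": 0.3, "-m": 0.7}
likelihood = {"m": 0.667, "-m": 0.571}
print(map_prediction(levels, prior, likelihood))   # "-m" wins the arg max here
```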
Example 2: Table 6.1
What is the probability that a person has MENINGITIS given HEADACHE=true, FEVER=true, VOMITING=false?
 P(m | h, f, -v) = P(h | m) P(f | h, m) P(-v | f, h, m) P(m) / P(h, f, -v)
= (0.667 * 0 * 0 * 0.3) / 0.1 = 0.0
P(-m | h, f, -v) = 1.0 - 0.0 = 1.0
 Overfitting: this exact combination of feature values never occurs in the training data, so the estimated probabilities collapse to 0 and 1.
Conditional independence
X and Y are conditionally independent given Z (for example, when X and Y share the same cause Z):
 P(X | Y, Z) = P(X | Z)
P(X, Y | Z) = P(X | Z) * P(Y | Z)
 Chain rule with the conditional independence assumption:
P(q[1],…,q[m] | t=l) = P(q[1] | t=l) * P(q[2] | t=l) * … * P(q[m] | t=l) = Π_i P(q[i] | t=l)
P(t=l | q[1],…,q[m]) = Π_i P(q[i] | t=l) * P(t=l) / P(q[1],…,q[m])
Factorized joint probability:
P(H, F, V, M) = P(M) * P(H|M) * P(F|M) * P(V|M)
Conditional independence and Factorization
If X and Y are independent, then:
P(X|Y) = P(X)
P(X, Y) = P(X) P(Y)
Example 2 (now assuming conditional independence): Table 6.1
What is the probability that a person has MENINGITIS given HEADACHE=true, FEVER=true, VOMITING=false?
P(m | h, f, -v) = P(h | m) * P(f | m) * P(-v | m) * P(m) / P(h, f, -v)
= P(h | m) * P(f | m) * P(-v | m) * P(m) / Σ_i P(h | M_i) * P(f | M_i) * P(-v | M_i) * P(M_i)
= 0.1948
P(-m | h, f, -v) = 0.8052
Conditional independence and Factorization
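A sketch of the naive Bayes posterior computation under the conditional independence assumption. The helper name is mine, and the conditional probabilities in the usage example are illustrative placeholders to be replaced by values read off Table 6.1 (only P(m) = 0.3 and P(h | m) = 2/3 appear earlier in these slides):

```python
def naive_bayes_posterior(priors, cond_probs, evidence):
    """Posterior P(t=l | evidence) for every level l, assuming the
    descriptive features are conditionally independent given the target.

    priors:     dict level -> P(t=l)
    cond_probs: dict level -> dict feature_value -> P(feature_value | t=l)
    evidence:   list of observed feature values, e.g. ["h", "f", "-v"]
    """
    scores = {}
    for level, prior in priors.items():
        score = prior
        for value in evidence:
            score *= cond_probs[level][value]
        scores[level] = score
    total = sum(scores.values())   # denominator: Σ_i Π_j P(q[j] | M_i) * P(M_i)
    return {level: score / total for level, score in scores.items()}

# Illustrative values only; read the real ones off Table 6.1:
priors = {"m": 0.3, "-m": 0.7}
cond_probs = {
    "m":  {"h": 0.667, "f": 0.333, "-v": 0.333},
    "-m": {"h": 0.571, "f": 0.571, "-v": 0.429},
}
print(naive_bayes_posterior(priors, cond_probs, ["h", "f", "-v"]))
```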
Naïve Bayes Model
Example: Table 6.2
MAP (maximum a posteriori) prediction:
M(q) = arg max_{l ∈ levels(t)} (Π_i P(q[i] | t=l)) * P(t=l)
Extensions and Variations: Smoothing
Example: Table 6.2
CREDIT HISTORY = paid, GUARANTOR/COAPPLICANT = guarantor, ACCOMMODATION = free?
Laplace smoothing:
P(f = l | t) = (count(f = l | t) + k) / (count(f | t) + k * |Domain(f)|)
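The smoothed estimate as a small Python helper, a sketch assuming the raw counts are available (the function name and the example counts are mine):

```python
def laplace_smoothed(count_f_l_and_t, count_f_given_t, domain_size, k=1):
    """P(f = l | t) with Laplace smoothing.

    count_f_l_and_t: number of rows with f = l among rows where t holds
    count_f_given_t: total number of rows where t holds
    domain_size:     |Domain(f)|, the number of levels of feature f
    k:               smoothing pseudo-count
    """
    return (count_f_l_and_t + k) / (count_f_given_t + k * domain_size)

# A level that never co-occurs with the target no longer gets probability 0:
print(laplace_smoothed(0, 3, 3, k=3))   # (0 + 3) / (3 + 3 * 3) = 0.25 instead of 0.0
```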
Extensions and Variations: Continuous Features - Binning
Transform a continuous feature into a categorical feature with:
 Equal-width binning
 Equal-frequency binning
Example: Table 6.11
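A minimal sketch of the two binning strategies using numpy; the sample values and the choice of three bins are made up for illustration:

```python
import numpy as np

values = np.array([3.2, 7.8, 1.1, 9.4, 4.5, 6.3, 2.2, 8.1, 5.0, 0.7])
n_bins = 3

# Equal-width binning: split the range [min, max] into bins of equal width.
edges_width = np.linspace(values.min(), values.max(), n_bins + 1)
width_bins = np.digitize(values, edges_width[1:-1])

# Equal-frequency binning: choose edges so each bin holds roughly the same
# number of instances (the edges are quantiles of the data).
edges_freq = np.quantile(values, np.linspace(0, 1, n_bins + 1))
freq_bins = np.digitize(values, edges_freq[1:-1])

print("equal-width bins:    ", width_bins)
print("equal-frequency bins:", freq_bins)
```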
A Bayesian network (also called a Bayes network, Bayes(ian) model, or probabilistic directed acyclic graphical model) is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG).
P(A, B) = P(B|A) * P(A)
Ex1: P(a, -b) = P(-b|a) * P(a) = 0.7 * 0.4 = 0.28
The joint probability of an assignment x_1,…,x_n factorizes along the graph:
P(x_1,…,x_n) = Π_i P(x_i | Parents(x_i))
Bayesian Network
Network structure: A → B, with CPTs:
P(A=T) = 0.4, P(A=F) = 0.6
A | P(B=T|A) | P(B=F|A)
T | 0.3      | 0.7
F | 0.4      | 0.6
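A sketch of Ex1 in Python using the CPTs above (the dictionary layout is my own choice):

```python
# CPTs for the two-node network A -> B, as given above.
p_a = {True: 0.4, False: 0.6}                     # P(A)
p_b_given_a = {True: {True: 0.3, False: 0.7},     # P(B | A=T)
               False: {True: 0.4, False: 0.6}}    # P(B | A=F)

# The joint probability factorizes along the DAG: P(A, B) = P(B | A) * P(A)
def joint(a, b):
    return p_b_given_a[a][b] * p_a[a]

print(joint(True, False))   # P(a, -b) = 0.7 * 0.4 = 0.28
```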
Ex2: P(a, -b, -c, d) = P(-b | a, -c) * P(-c | d) * P(a) * P(d) = 0.5 * 0.8 * 0.4 * 0.4 = 0.064
Bayesian Network
Network structure: D → C, A → B, C → B, with CPTs:
P(D=T) = 0.4
P(A=T) = 0.4
D | P(C=T|D)
T | 0.2
F | 0.5
A C | P(B=T|A,C)
T T | 0.2
T F | 0.5
F T | 0.4
F F | 0.3
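The same idea for this four-node network: a sketch that encodes the CPTs above and reproduces Ex2:

```python
# CPTs as given above (True = T, False = F).
p_d = {True: 0.4, False: 0.6}
p_a = {True: 0.4, False: 0.6}
p_c_given_d = {True: 0.2, False: 0.5}                     # P(C=T | D)
p_b_given_ac = {(True, True): 0.2, (True, False): 0.5,    # P(B=T | A, C)
                (False, True): 0.4, (False, False): 0.3}

def bern(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1 - p_true

def joint(a, b, c, d):
    # P(A, B, C, D) = P(B | A, C) * P(C | D) * P(A) * P(D)
    return (bern(p_b_given_ac[(a, c)], b)
            * bern(p_c_given_d[d], c)
            * p_a[a] * p_d[d])

print(joint(True, False, False, True))   # P(a, -b, -c, d) = 0.5 * 0.8 * 0.4 * 0.4 = 0.064
```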
The conditional probability of node x_i given all the other nodes (its Markov blanket):
P(x_i | x_1,…, x_{i-1}, x_{i+1},…, x_n) ∝ P(x_i | Parents(x_i)) * Π_j P(x_j | Parents(x_j)), with j ∈ Children(x_i)
(the product is only proportional to the conditional probability; normalizing over the levels of x_i gives the probability itself)
Ex2: P(c | -a, b, d) ∝ P(c | d) * P(b | -a, c) = 0.2 * 0.4 = 0.08
(normalizing against P(-c | d) * P(b | -a, -c) = 0.8 * 0.3 = 0.24 gives P(c | -a, b, d) = 0.08 / (0.08 + 0.24) = 0.25)
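A sketch of the Markov-blanket computation above, including the normalization over both values of C (the normalized figure 0.25 goes beyond the slide, which reports only the unnormalized product 0.08):

```python
# CPTs of the same four-node network (D -> C, A -> B, C -> B).
p_c_given_d = {True: 0.2, False: 0.5}                     # P(C=T | D)
p_b_given_ac = {(True, True): 0.2, (True, False): 0.5,    # P(B=T | A, C)
                (False, True): 0.4, (False, False): 0.3}

def bern(p_true, value):
    return p_true if value else 1 - p_true

def blanket_score(c, a, b, d):
    # P(C=c | D=d) * P(B=b | A=a, C=c), proportional to P(C=c | A=a, B=b, D=d)
    return bern(p_c_given_d[d], c) * bern(p_b_given_ac[(a, c)], b)

score_c     = blanket_score(True,  False, True, True)   # 0.2 * 0.4 = 0.08 (the slide's figure)
score_not_c = blanket_score(False, False, True, True)   # 0.8 * 0.3 = 0.24
print(score_c / (score_c + score_not_c))                 # normalized: P(c | -a, b, d) = 0.25
```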
Naïve Bayes Classifier
A naïve Bayes classifier is a Bayesian network in which the target is the single parent of every descriptive feature node (Target → Descriptive feature 1, Descriptive feature 2, …, Descriptive feature n), so:
P(t | d[1], …, d[n]) ∝ P(t) * Π_j P(d[j] | t)
Building a Bayesian Network
Two valid factorizations of the same joint distribution give two different network structures:
P(A, B, C) = P(C | A, B) * P(B | A) * P(A)
P(A, B, C) = P(A | C, B) * P(B | C) * P(C)
Network 1 (A → B, A → C, B → C):
P(A=T) = 0.6
A | P(B=T|A)
T | 0.333
F | 0.5
A B | P(C=T|A,B)
T T | 0.25
T F | 0.125
F T | 0.25
F F | 0.25
Network 2 (C → B, C → A, B → A):
P(C=T) = 0.2
C | P(B=T|C)
T | 0.5
F | 0.375
C B | P(A=T|B,C)
T T | 0.5
T F | 0.5
F T | 0.5
F F | 0.7
Building a Bayesian Network
Hybrid approach:
 1. The topology of the network is given (hand-crafted)
 2. The CPTs are induced from the data
What is the best topology to give the algorithm as input?
 Causal graphs
Example: Table 6.18
Building a Bayesian Network
A potential causal theory: the more equal a society, the higher the investment that society will make in health and education, and this in turn results in lower levels of corruption.
Network structure: GC → SY, GC → LE, SY → CPI, LE → CPI, with CPTs:
P(GC=L) = 0.5
GC | P(SY=L|GC)
L | 0.2
H | 0.8
GC | P(LE=L|GC)
L | 0.2
H | 0.8
SY LE | P(CPI|SY,LE)
L L | 1.0
L H | 0
H L | 1.0
H H | 1.0
Using a Bayesian Network to make predictions
M(q) = arg max_{l ∈ levels(t)} BayesianNetwork(t=l, q)
Making predictions with missing descriptive feature values:
GC = high, SY = high => ?
Example: Table 6.18
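A hedged sketch of how prediction with a missing descriptive feature can work: marginalize the factorized joint over every value of the unobserved node, then take the arg max over target levels. To stay with numbers given earlier, it reuses the four-node A/B/C/D network from the Bayesian Network slides rather than the Table 6.18 CPTs; treating B as the target and C as the unobserved feature is purely illustrative:

```python
# CPTs of the earlier A/B/C/D network (D -> C, A -> B, C -> B).
p_d = {True: 0.4, False: 0.6}
p_a = {True: 0.4, False: 0.6}
p_c_given_d = {True: 0.2, False: 0.5}
p_b_given_ac = {(True, True): 0.2, (True, False): 0.5,
                (False, True): 0.4, (False, False): 0.3}

def bern(p_true, value):
    return p_true if value else 1 - p_true

def joint(a, b, c, d):
    return bern(p_b_given_ac[(a, c)], b) * bern(p_c_given_d[d], c) * p_a[a] * p_d[d]

def predict_b(a, d):
    """MAP prediction for B when C is unobserved: sum C out of the joint."""
    scores = {}
    for b in (True, False):
        scores[b] = sum(joint(a, b, c, d) for c in (True, False))
    return max(scores, key=scores.get), scores

print(predict_b(a=True, d=True))
```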