Probably, Definitely, 
Maybe 
James McGivern
Does intelligent life 
exist in our galaxy?
N = R* · fp · ne · fL · fi · fc · L 
N = the number of civilisations in our galaxy with which radio-communication might be possible
R* = the average rate of star formation in our galaxy
fp = the fraction of those stars that have planets
ne = the average number of planets that can potentially support life per star that has planets
fL = the fraction of planets that could support life that actually develop life at some point
fi = the fraction of planets with life that actually go on to develop intelligent life (civilisations)
fc = the fraction of civilisations that develop a technology that releases detectable signs of their existence into space
L = the length of time for which such civilisations release detectable signals into space
Maths
Warning
- Chapter I - 
Bayesian Probability 
& 
Bayesian Statistics
The Mean
Given a set of variables x1, ..., xn
we define the mean (average) as x̄ = (x1 + ... + xn) / n
Example: Amazon 
‣ Users can rate a product by voting 1-5 stars
‣ product rating is the mean of the user votes
Q: how can we rank products with different numbers of votes?
Simple “Bayesian Ranking” 
‣ C - the mean vote across all items 
‣ v - number of votes for a given item 
‣ R - the mean rating of the item
‣ m - number of votes required to be in the top n percentile
Book Number of votes (v) Average Rating (R) Bayesian Rank
A 100 5 4.333333... 
B 70 5 4.17 
C 50 4 3.5 
D 30 4 3.375 
E 20 3.5 3.14 
F 30 3 3 
G 5 2 2.91 
C = 3 m = 50
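The table above can be reproduced with one weighted mean: rank = (v·R + m·C) / (v + m), which pulls an item's rating towards the global mean until it has enough votes. A minimal sketch in Java (the class and method names are mine, not from the talk):

public class BayesianRank {
    // Weighted mean of the item's rating R (weight v) and the global mean C (weight m).
    static double rank(double v, double r, double m, double c) {
        return (v * r + m * c) / (v + m);
    }

    public static void main(String[] args) {
        double c = 3.0;  // mean vote across all items
        double m = 50.0; // votes required to be in the top n percentile
        System.out.println(rank(100, 5.0, m, c)); // Book A -> 4.333...
        System.out.println(rank(5, 2.0, m, c));   // Book G -> 2.909...
    }
}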
A Detour Into
Probability Basics
Events 
‣ Consider an experiment whose set of all possible outcomes Ω, called the sample space, is {x1, ..., xn}
‣ We define an event E as a subset of Ω and say that E occurs iff the outcome of the experiment is an element of E
‣ Union: E⋃F occurs iff the outcome is in E or in F
‣ Intersection: E⋂F occurs iff the outcome is in both E and F
Probability Axioms: 1 
‣ We denote the probability of an event A by P(A) 
‣ For any event A, 0 ≤ P(A) ≤ 1 
‣ The certain event, Ω, always occurs and P(Ω)=1 
‣ The impossible event Ø never occurs and P(Ø)=0
Probability Axioms: 2 
‣ We say that events A and B are disjoint if A⋂B = Ø 
‣ if A and B are disjoint then P(A⋃B) = P(A) + P(B) 
‣ for a set of disjoint events A1, A2, ..., the addition law gives us: P(⋃i Ai) = Σi P(Ai)
Probability Lemmas 
‣ For any event E, P(Ec) = P(¬E) = 1 - P(E) 
‣ P(A−B) = P(A) − P(A∩B) 
‣ If A⊂B then P(A) ≤ P(B) 
‣ P(A∪B) = P(A) + P(B) − P(A∩B)
Random Variables 
‣ Consider a random variable X, then {X≤x} is the event that X has a value less 
than or equal to the real number x. Hence the probability that this event 
occurs is P(X≤x) 
‣ If we allow x to vary we can define the distribution function
F(x) = P(X≤x), -∞ < x < ∞
‣ Note that:
‣ P(X>x) = 1 - F(x)
‣ P(a<X≤b) = F(b) - F(a)
Probability Mass Function 
The probability mass function (PMF) of X, p(x) = P(X=x), is a probability measure of the possible values for the random variable. Of course the probabilities sum to one: Σx p(x) = 1.
(plot: PMF for a fair six-sided die, 1/6 for each face)
Example: PMF of Two Fair Dice
Each of the 36 ordered pairs (1,1), (1,2), ..., (6,6) is equally likely, so the PMF of the sum is:
Sum: 2    3    4    5    6    7    8    9    10   11   12
P:   1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
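The same PMF can be checked by brute-force enumeration of the 36 pairs; a small Java sketch (illustrative, not from the deck):

public class TwoDicePmf {
    public static void main(String[] args) {
        int[] counts = new int[13]; // counts[s] = number of pairs summing to s
        for (int a = 1; a <= 6; a++)
            for (int b = 1; b <= 6; b++)
                counts[a + b]++;
        for (int s = 2; s <= 12; s++)
            System.out.printf("P(X=%d) = %d/36%n", s, counts[s]);
    }
}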
Uniform Distribution
(plot: a discrete uniform PMF — every outcome has the same probability)
Bernoulli Distribution
(plot: a Bernoulli PMF on outcomes {0, 1}, e.g. P(1) = 0.7)
Binomial Distribution
Probability Axioms: 3 
‣ Given two events A and B we define the conditional probability P(A | B) by:
P(A⋂B) = P(A | B) P(B) 
‣ Given two events A and B we say that they are independent iff: 
P(A⋂B) = P(A) P(B)
Prior & Posterior Distributions
‣ P(θ) is the prior probability distribution of θ
‣ P(θ|X) is the posterior probability of θ given X
‣ The posterior probability can be written in the memorable form:
posterior probability ∝ likelihood × prior probability
‣ If the posterior distributions P(θ|X) are in the same family as the prior probability distribution P(θ), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior
Bayes’ Theorem 
From the definition of conditional probability we know:
P(A⋂B) = P(A|B) P(B) = P(B|A) P(A)
Hence
P(A|B) = P(B|A) P(A) / P(B)
Or if B1, ..., Bn form a partition of the sample space:
P(Bi|A) = P(A|Bi) P(Bi) / Σj P(A|Bj) P(Bj)
Example: Pregnancy Tests 
‣ Pregnancy tests detect the presence of hCG, or human chorionic gonadotropin, in the blood or urine
‣ A “false positive” is when the test incorrectly returns a positive result, and a “false negative” when it incorrectly returns a negative one
‣ Sources of false positives in the hCG test include:
‣ non-pregnant production of the hCG molecule
‣ use of drugs containing the hCG molecule
‣ Some medications cause a positive reaction in the tests
‣ The actual probability of being pregnant depends on many messy biological factors
Pregnant?
Pregnant: P(B)
  Test Positive: P(A|B), giving P(B⋂A)
  Test Negative: P(¬A|B), giving P(B⋂¬A)
Not Pregnant: P(¬B)
  Test Positive: P(A|¬B), giving P(¬B⋂A)
  Test Negative: P(¬A|¬B), giving P(¬B⋂¬A)
Q: Given the test is positive, what is the probability that the subject is pregnant?
Test Result
Positive: P(A)
  Pregnant: P(B|A), giving P(A⋂B)
  Not Pregnant (false positive): P(¬B|A), giving P(A⋂¬B)
Negative: P(¬A)
  Pregnant (false negative): P(B|¬A), giving P(¬A⋂B)
  Not Pregnant: P(¬B|¬A), giving P(¬A⋂¬B)
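Answering the question needs three numbers the slides leave open: the prior P(B), the sensitivity P(A|B) and the false-positive rate P(A|¬B). A sketch applying Bayes' theorem with illustrative values (all three probabilities below are assumptions, not figures from the talk):

public class PregnancyTest {
    public static void main(String[] args) {
        double pB = 0.10;       // assumed prior P(pregnant)
        double sens = 0.99;     // assumed P(positive | pregnant)
        double falsePos = 0.02; // assumed P(positive | not pregnant)
        // Total probability of a positive test: P(A) = P(A|B)P(B) + P(A|not B)P(not B)
        double pA = sens * pB + falsePos * (1 - pB);
        // Bayes' theorem: P(B|A) = P(A|B)P(B) / P(A)
        System.out.println("P(pregnant | positive) = " + (sens * pB / pA)); // ~0.846
    }
}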
Example: Disease Diagnosis 
Q: Consider the set S={si : i = 0...N} of all disease symptoms, Ds+={si : si in S} the diagnostically inclusive symptoms of Ebola, and Ds-={si : si in S} the exclusionary symptoms. Given a patient has some combination of symptoms Ps={si : si in S}, what is the probability they have Ebola?
‣ The presence or absence of some symptoms can completely rule out the diagnosis
‣ By updating the model based on real outcomes it is possible to provide more and more accurate predictions
Expected Values & Moments 
‣ Suppose random variable X can take value x1 with probability p1, value x2 with probability p2, and so on. Then the expectation of this random variable X is defined as:
E[X] = p1x1 + p2x2 + ... + pkxk
‣ The variance of a random variable X is its second central moment, the expected value of the squared deviation from the mean μ = E[X]:
Var(X) = E[(X-μ)2]
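As a concrete check, a sketch computing both moments for the fair six-sided die from the PMF slide:

public class DieMoments {
    public static void main(String[] args) {
        double p = 1.0 / 6.0, mean = 0, var = 0;
        for (int x = 1; x <= 6; x++) mean += p * x;                      // E[X] = 3.5
        for (int x = 1; x <= 6; x++) var += p * (x - mean) * (x - mean); // Var(X) ~ 2.9167
        System.out.println("E[X] = " + mean + ", Var(X) = " + var);
    }
}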
Variance & Covariance 
‣ Variance is a measure of how far a set of numbers differs from the mean of those numbers. The square root of the variance is the standard deviation σ
‣ CERN uses the 5-sigma rule to rule out statistical anomalies in sensor readings, i.e. is the value NOT the expected value of noise
‣ The covariance between two jointly distributed random variables X and Y with finite second moments is defined as:
σ(X,Y) = E[(X - E[X]) • (Y - E[Y])]
Gaussian Distribution 
(plot: Gaussian PDF P(x) against x, centred on μ, with bands marked at σ, 2σ, 3σ, 4σ and 5σ)
Beta Distribution
End of Detour
Example: Amazon (Revisited) 
‣ Users can rate a product by voting 1-5 stars
‣ product rating is the mean of the user votes
Q: how can we rank products with different numbers of votes?
Simple “Bayesian Ranking” 
Assume the votes are Normally distributed with known precision τ; the conjugate prior is then also a Normal*, whose posterior mean is the precision-weighted combination
μ(posterior) = (τ0 · μ0 + τ · Σ votes) / (τ0 + n · τ)
where μ0 is the prior mean, τ0 the prior precision, and τ the precision of the vote distribution
(*) http://en.wikipedia.org/wiki/Conjugate_prior
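A sketch of that conjugate update in Java, assuming each vote is Normal with known precision τ and a Normal(μ0, τ0) prior (variable names are mine; see the masanjin.net link in the resources for a full derivation):

public class NormalRating {
    // Posterior mean = (tau0*mu0 + tau*sum(votes)) / (tau0 + n*tau)
    static double posteriorMean(double mu0, double tau0, double tau, double[] votes) {
        double sum = 0;
        for (double v : votes) sum += v;
        return (tau0 * mu0 + tau * sum) / (tau0 + votes.length * tau);
    }

    public static void main(String[] args) {
        double[] votes = {5, 5, 4, 5}; // vote mean 4.75
        // Prior centred on the site-wide mean of 3 stars: result lands between 3 and 4.75.
        System.out.println(posteriorMean(3.0, 5.0, 1.0, votes)); // ~3.78
    }
}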
Example: YouTube 
‣ Users can rate a clip by voting +/-1
‣ clip rating is the mean of the user votes
‣ clips also record the number of views
Q: how can we compare clip ratings with different numbers of votes and views?
Q: how can we make results more relevant? e.g. take into account how old ratings are, author provenance, cost of incorrect ranking promotion
“Bayesian Rank” 
‣ Since this is a +/- Bernoulli Trial we can model the prior belief distribution by a Beta distribution*
‣ Let α = upvote bias + number of up votes
‣ Let β = downvote bias + number of down votes + 1
‣ Every time we receive a new vote we just recalculate the distribution
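A sketch of that recalculation, scoring each clip by the mean α/(α+β) of its Beta belief (the bias pseudo-counts are assumptions):

public class BetaRank {
    static double score(int upVotes, int downVotes) {
        double upvoteBias = 1, downvoteBias = 1; // assumed pseudo-counts
        double alpha = upvoteBias + upVotes;
        double beta = downvoteBias + downVotes + 1;
        return alpha / (alpha + beta);           // mean of Beta(alpha, beta)
    }

    public static void main(String[] args) {
        System.out.println(score(60, 40)); // many votes: ~0.59, close to the observed ratio
        System.out.println(score(3, 0));   // few votes: ~0.67, the prior keeps it cautious
    }
}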
‣ To map between a belief and a sorting criterion we make a decision using a loss function L
‣ Since the value of L depends on the value of a random variable we use instead the expected value of L
‣ Consider a multilinear loss function: since we want to minimise the loss, we rank by the value that minimises E[L]
Extending The Loss Function 
‣ Suppose in addition to the vote counts we also record the timestamp of the votes
‣ Items become less relevant in the rank the longer it is since the last vote, following a pattern of exponential decay
‣ Hence the current up or down vote count is now determined by the exponentially decayed sum of the votes, as sketched below
‣ Thus we derive an updated rank function from the decayed counts
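A sketch of the decayed counts under these assumptions (the decay rate λ and the timestamps are illustrative):

public class DecayedVotes {
    // Each vote's weight decays exponentially with its age, so old votes
    // contribute less and less to the effective up/down counts.
    static double decayedCount(double[] voteTimes, double now, double lambda) {
        double total = 0;
        for (double t : voteTimes) total += Math.exp(-lambda * (now - t));
        return total;
    }

    public static void main(String[] args) {
        double[] upvoteTimes = {0, 10, 90, 95}; // assumed timestamps
        // Recent votes (90, 95) count almost fully; the old ones barely register.
        System.out.println(decayedCount(upvoteTimes, 100.0, 0.05));
    }
}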
The House That Skynet Built
‣ SkyNet SmartHome™ is a system designed to manage the state of various household resources, e.g. heating, lighting, media-centre, etc
‣ it communicates with users' smartphones to identify their location
‣ its aims are to maximise the comfort (e.g. room temperature, hot water) of the users and minimise waste (e.g. power consumption)
‣ users can provide feedback to correct inappropriate behaviour, e.g. turning the heating up too high
Predicting Behaviour 
Q: Given the user is leaving the office, what actions should Skynet smarthome take?
‣ the variables could include:
‣ day of the week
‣ work day or holiday
‣ weather
‣ season
‣ calendar events
Bayesian Networks 
‣ A BN is a probabilistic directed acyclic graphical model that represents a set of random variables and their conditional dependencies
‣ Vertices may be observable quantities, latent variables, unknown parameters or hypotheses
‣ Vertices that are not connected represent variables that are conditionally independent of each other
Example: Cancer 
(network: Pollution and Smoker are the parents of Cancer; Cancer is the parent of X-Ray and Dyspnoea)
Example: Cancer 
P(P): Low 0.9, High 0.1
P(S): True 0.3, False 0.7
P(C | P,S): Low,T 0.03; Low,F 0.001; High,T 0.05; High,F 0.02
P(XRay+ | C): T 0.9, F 0.2
P(D+ | C): T 0.65, F 0.3
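These five tables fully specify the joint distribution, so any query can be answered by enumeration. A sketch computing the marginal P(Cancer) by summing over Pollution and Smoker (the probabilities are taken from the tables above):

public class CancerNet {
    public static void main(String[] args) {
        double[] pP = {0.9, 0.1}; // Pollution: {Low, High}
        double[] pS = {0.3, 0.7}; // Smoker: {True, False}
        double[][] pCgivenPS = {
            {0.03, 0.001},        // P=Low:  S=True, S=False
            {0.05, 0.02}          // P=High: S=True, S=False
        };
        double pC = 0;
        // P(C) = sum over P,S of P(P) * P(S) * P(C | P,S)
        for (int p = 0; p < 2; p++)
            for (int s = 0; s < 2; s++)
                pC += pP[p] * pS[s] * pCgivenPS[p][s];
        System.out.println("P(Cancer) = " + pC); // ~0.0116
    }
}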
Predictive Reasoning
(reasoning follows the edge direction: EVIDENCE at a cause, e.g. Smoker; QUERY at the effects: Cancer, X-Ray, Dyspnoea)
Diagnostic Reasoning
(reasoning runs against the edge direction: EVIDENCE at an effect, e.g. X-Ray or Dyspnoea; QUERY at the causes: Pollution, Smoker, Cancer)
Intercausal Reasoning
(EVIDENCE at Smoker and at Cancer; QUERY at Pollution — observing one cause of a known effect changes belief in the other cause)
Combined Reasoning
(EVIDENCE at Smoker and at Dyspnoea; QUERY at Cancer — diagnostic and predictive reasoning at once)
- Chapter II - 
Markov Models
Stochastic Process 
‣ A stochastic process, or random process, is a collection of random variables representing the evolution of some system over time
‣ Examples: stock market value and exchange rate fluctuations, audio and video signals, EKG & EEG readings
‣ They can be classified as:
‣ Discrete time & discrete space
‣ Discrete time & continuous space
‣ Continuous time & discrete space
‣ Continuous time & continuous space
A Detour Into Matrix
Algebra
Vectors & Matrices
(examples: a 1x3 matrix, i.e. a row vector, and a 3x2 matrix)
Matrix Addition
If A and B are two m by n matrices then addition is defined element-wise by:
(A + B)ij = Aij + Bij
Matrix Multiplication
If A is an n*m matrix and B is an m*p matrix then multiplication is defined by:
C = AB, where Cij = Σk Aik Bkj (k = 1...m)
The Identity Matrix
If A is an n*m matrix then the identity matrix I is the m*m matrix such that A I = A.
I is defined by: Iij = 1 if i = j, 0 otherwise
So the 3*3 identity matrix is:
1 0 0
0 1 0
0 0 1
Inverse Matrix
If A is an n*n matrix then the inverse of A, A^-1, is an n*n matrix such that A A^-1 = I. Non-square matrices do not have inverses
Matrix Transposition
If A is an n*m matrix then the transpose of A, A^T, is an m*n matrix defined by: (A^T)ij = Aji
For example, the transpose of the 2*2 matrix [[a, b], [c, d]] is [[a, c], [b, d]]
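To tie the detour together, a naive Java sketch of matrix multiplication over plain arrays (the libraries listed at the end of the talk provide production-grade versions of all of these operations):

public class MatMul {
    // (AB)ij = sum over k of Aik * Bkj, where A is n*m and B is m*p.
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b.length, p = b[0].length;
        double[][] c = new double[n][p];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < m; k++)
                for (int j = 0; j < p; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}};              // 1*3 row vector
        double[][] b = {{1, 0}, {0, 1}, {1, 1}}; // 3*2 matrix
        double[][] c = multiply(a, b);           // 1*2 result: [4, 5]
        System.out.println(c[0][0] + " " + c[0][1]);
    }
}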
End of Detour
‣ A Markov chain is a directed graph whose vertices represent states and the edges the probability of transition between the two states
‣ Frequently we use an adjacency matrix representation of the graph T called the transition matrix
‣ Hence for a chain with N vertices the transition matrix is NxN
‣ The initial state of the system, S0, is a 1xN row vector over the states
‣ The state evolves according to S(n+1) = S(n)·T, hence S(n) = S0·T^n
‣ The value of S(n+1)(i) is the probability of being in state i at step n+1
Example: British Weather
‣ England is the land of rain. We never have two nice days in a row: a nice day is always followed by either rain or snow. If the weather changes from rain or snow, only half the time is this a change to a nice day
Transition matrix T (rows = today's weather, columns = tomorrow's):
     R     N     S
R   1/2   1/4   1/4
N   1/2    0    1/2
S   1/4   1/4   1/2
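A sketch evolving the state vector S(n+1) = S(n)·T for this chain (the starting state is an assumption):

public class WeatherChain {
    public static void main(String[] args) {
        double[][] t = {
            {0.50, 0.25, 0.25}, // Rain -> R, N, S
            {0.50, 0.00, 0.50}, // Nice -> R, N, S
            {0.25, 0.25, 0.50}  // Snow -> R, N, S
        };
        double[] s = {0, 1, 0}; // assume we start on a nice day
        for (int n = 0; n < 20; n++) {
            double[] next = new double[3];
            for (int j = 0; j < 3; j++)
                for (int i = 0; i < 3; i++)
                    next[j] += s[i] * t[i][j];
            s = next; // S(n+1) = S(n) * T
        }
        // The distribution settles to (0.4, 0.2, 0.4) whatever the starting state.
        System.out.printf("P(R)=%.3f P(N)=%.3f P(S)=%.3f%n", s[0], s[1], s[2]);
    }
}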
Absorbing Markov Chains 
‣ A state si of a Markov chain is called absorbing if it is impossible to leave it (i.e., 
Ti,i = 1) 
‣ A Markov chain is absorbing if: 
‣ it has at least one absorbing state 
‣ every state is connected to at least one absorbing state 
‣ In an absorbing Markov chain, the probability that the process will be absorbed 
is 1 (i.e., Qn → 0 as n → ∞)
Example: The Wandering Drunk
‣ Consider a city divided up into a grid, e.g. square blocks
‣ The drunk can move 1 block per turn, each direction has equal probability
‣ If the drunk reaches Home or the Bar they will stay there
‣ Questions we can answer (see the simulation sketch after this list):
‣ What is the expected time until an absorbing state is reached?
‣ How many times does the drunk visit each intersection?
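A Monte Carlo sketch of the first question on a one-dimensional street, with Home and the Bar as the absorbing ends (the street length and start position are assumptions):

import java.util.Random;

public class WanderingDrunk {
    public static void main(String[] args) {
        int n = 5, start = 2, trials = 100_000; // assumed layout: Home=0, Bar=5
        Random rng = new Random(42);
        long totalSteps = 0;
        for (int t = 0; t < trials; t++) {
            int pos = start, steps = 0;
            while (pos != 0 && pos != n) {         // walk until absorbed
                pos += rng.nextBoolean() ? 1 : -1; // each direction equally likely
                steps++;
            }
            totalSteps += steps;
        }
        // For this walk the exact expected absorption time is start*(n-start) = 6.
        System.out.println("mean steps to absorption = " + (double) totalSteps / trials);
    }
}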
Ergodic Markov Chains 
‣ A Markov chain is called a regular chain if some power of the transition matrix has only positive elements
‣ A Markov chain is called an ergodic chain if it is possible to go from every state to every state
‣ Hence every regular chain is ergodic but not every ergodic chain is regular!
‣ For a regular chain it can be shown that T^n converges to a constant matrix with all positive entries as n→∞
Example: Queuing Theory 
‣ Queuing theory is, naturally, the mathematical formulation of queues, aimed at making predictions of queue length, waiting times and throughput
‣ Single queueing nodes are usually described using Kendall's notation in the form A/S/C
‣ The first rigorous model was M/D/1, followed by M/D/k
‣ M stands for Markov and means arrivals occur according to a Poisson process
‣ D stands for deterministic and means jobs arriving at the queue require a fixed amount of service
‣ k describes the number of servers at the queueing node (k = 1, 2, ...)
Hidden Markov models 
‣ The name is misleading! Nothing is unknown...
‣ In a basic Markov model the states of the system are visible, e.g. we can see if it is snowing
‣ In a hidden Markov model the (entire) state is not directly visible but some outputs dependent on the state are observable
‣ However we still need to know all the transition probability values!
Markov decision processes 
‣ Markov decision processes are an extension of Markov chains; the difference is the addition of actions (allowing choice) and rewards (giving motivation)
‣ A Markov decision process is a 5-tuple (S, A, PA, RA, L)
‣ The core problem is to choose a policy π that will maximise some cumulative function, e.g. the expected discounted sum of rewards
‣ MDPs can be solved by linear programming (e.g. the Simplex method) or dynamic programming (e.g. value iteration)
- Chapter III - 
Kalman Filters
A Linear Dynamic System 
‣ Continuous time definition: dx/dt = A·x(t) + B·u(t)
‣ Discrete time definition: x(n+1) = A·x(n) + B·u(n)
‣ The systems are called linear since given any two solutions x(t) & y(t), any linear combination of these solutions is also a solution
Example: Climate Control 
‣ SkyNet SmartHome™ is able to monitor the temperature of rooms in the house and control heating/AC to regulate the temperature
‣ The temperature sensor readings contain noise
‣ heating and AC are either ON or OFF
‣ similar systems exist for humidity
Consider...
Kalman Filters 
‣ Developed ~1960 by Rudolf E. Kálmán, Peter Swerling, and Richard S. Bucy
‣ First implemented at NASA as part of the Apollo navigation computer
‣ Still used in many aeronautic and military applications, e.g. submarines, cruise missiles, the NASA Space Shuttle, the ISS
‣ There are generalisations of the basic Kalman filter for continuous time systems as well as non-linear systems
‣ Kalman filters work by making a prediction of the future, getting a measurement from reality, comparing the two, moderating this difference, and adjusting the estimate with this moderated value
‣ Kalman filters are:
‣ discrete
‣ recursive
‣ extremely accurate if you have a good model
Overview
current estimate = previous estimate + Kalman gain × (measured value − previous estimate)
Basic Kalman Filters 
‣ Modelled on a Markov chain built on linear operators perturbed by errors that may include Gaussian noise
‣ The state of the system is represented by a vector of real numbers
‣ The filter is recursive, i.e. only the estimated state from the previous time step and the current measurement are needed to compute the estimate for the current state
‣ typically we describe the algorithm in two phases: predict & update
1: Predict: State Estimation
x⁻(k) = A·x(k-1) + B·u(k)
where x⁻(k) = predicted state, x(k-1) = previous estimate of state, A = state transition matrix, B = control matrix, u(k) = control vector
2: Predict: Error Estimation
P⁻(k) = A·P(k-1)·Aᵀ + Q
where P⁻(k) = covariance prediction, P(k-1) = previous covariance estimate, A = state transition matrix, Q = estimated process error covariance
3: Update: Innovation
y(k) = z(k) − H·x⁻(k)
where y(k) = innovation, z(k) = measurement vector, H = observation matrix, x⁻(k) = predicted state (step 1)
4: Update: Innovation Covariance
S(k) = H·P⁻(k)·Hᵀ + R
where S(k) = innovation covariance, H = observation matrix, P⁻(k) = covariance prediction (step 2), R = estimated measurement error covariance
5: Update: Kalman Gain
K(k) = P⁻(k)·Hᵀ·S(k)⁻¹
where K(k) = Kalman gain, H = observation matrix, P⁻(k) = covariance prediction (step 2), S(k) = innovation covariance (step 4)
6: Update: State
x(k) = x⁻(k) + K(k)·y(k)
where x(k) = new state estimate, x⁻(k) = predicted state (step 1), K(k) = Kalman gain (step 5), y(k) = innovation (step 3)
7: Update: Covariance
P(k) = (I − K(k)·H)·P⁻(k)
where P(k) = new estimate of the error covariance, K(k) = Kalman gain (step 5), H = observation matrix, P⁻(k) = covariance prediction (step 2)
Example: Voltmeter 
‣ Consider a voltmeter measuring a constant DC voltage via a sensor with noise
‣ The system can be described by: Vn = Vn-1 + wn
‣ Since the voltage is constant, using a Kalman filter allows us to filter out the noise wn
‣ Also, since this is a single-state example all matrices are of size 1*1
‣ A: State transition - since the previous state should equal the current state, A=1
‣ H: Observation transform - since we’re taking direct measurements from the sensor, H=1
‣ B: Control matrix - we have no controls so B=0
‣ Q: Process covariance - since we know the model very accurately, Q=0.00001
‣ R: Measurement covariance - we don’t trust the sensor too much so R=0.1
‣ X: Initial state estimate = any number
‣ P: Initial covariance estimate = 1 (because)
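A sketch of the resulting scalar filter with exactly these parameters (the simulated true voltage is an assumption):

import java.util.Random;

public class VoltmeterKalman {
    public static void main(String[] args) {
        double q = 0.00001, r = 0.1; // process / measurement covariance, as above
        double x = 0.0, p = 1.0;     // initial state estimate and covariance
        double trueVoltage = 1.25;   // assumed constant DC voltage
        Random rng = new Random(7);
        for (int k = 0; k < 50; k++) {
            double z = trueVoltage + Math.sqrt(r) * rng.nextGaussian(); // noisy reading
            double xPred = x, pPred = p + q;   // predict: A=1, B=0
            double gain = pPred / (pPred + r); // Kalman gain: H=1
            x = xPred + gain * (z - xPred);    // correct with the innovation
            p = (1 - gain) * pPred;            // update covariance
        }
        System.out.printf("estimate after 50 readings: %.4f%n", x);
    }
}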
As a programmer your 
challenge is to find the right 
filter model and determine 
the values of the matrices
Example: Robo-copter 
(photos: XCell Tempest helicopter; “Freezin Eskimo”)
RL:Helicopter 
‣ http://library.rl-community.org/wiki/Helicopter_(Java)
‣ Sensors to determine:
‣ bearing
‣ acceleration (velocity)
‣ position (GPS)
‣ rotational rates
‣ inertial measurement unit
‣ and more...
Summary
Events & Random Variables
Conditional Probability
Bayesian Networks
Markov Chains
Random Walks
Markov Decision Processes
Kalman Filters
Books 
‣ Introduction to Probability - Grinstead & Snell
http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/book.html
‣ Bayesian Artificial Intelligence - Kevin B. Korb & Ann E. Nicholson
‣ An Introduction to Stochastic Modelling - Mark A. Pinsky & Samuel Karlin
‣ Stochastic Processes and Filtering Theory - Andrew H. Jazwinski
‣ Artificial Intelligence: A Modern Approach - Stuart Russell and Peter Norvig
Java Libraries 
‣ Apache Commons Math: http://commons.apache.org/proper/commons-math/
‣ Colt - high performance data structures and algorithms: http://dst.lbl.gov/ACSSoftware/colt/
‣ Parallel Colt: https://sites.google.com/site/piotrwendykier/software/parallelcolt
‣ JBlas - high performance Java API for native libraries LAPACK, BLAS, & ATLAS: http://mikiobraun.github.io/jblas/
‣ The rest... http://code.google.com/p/java-matrix-benchmark/
Other Resources 
‣ http://www.probabilitycourse.com
‣ http://masanjin.net/blog/bayesian-average - detailed derivation of Bayesian averaging via normal distributions
‣ http://fulmicoton.com/posts/bayesian_rating/ - an alternative derivation of Bayesian “averaging”
‣ http://www.tina-vision.net/docs/memos/1996-002.pdf - a beautifully simple derivation of Kalman filters
‣ http://www.intechopen.com/books/kalman-filter - articles on applications of Kalman filters
Thank You
Probably, Definitely, 
Maybe 
James McGivern
