Hidden Markov Models
1. Markov Chains and Hidden Markov Models
Nasir and Rajab
Dr. Pan
Spring 2014
2. MARKOV CHAINS:
One key assumption leads from a general stochastic process to a Markov chain:
A stochastic process {X_t} is said to have the Markovian property if the state of the system at time t+1 depends only on the state of the system at time t, not on the earlier history:
Pr(X_{t+1} = x_{t+1} | X_t = x_t, X_{t-1} = x_{t-1}, ..., X_1 = x_1, X_0 = x_0) = Pr(X_{t+1} = x_{t+1} | X_t = x_t)
4. _ Irreducible Markov chain :
A Markov Chain is irreducible if the corresponding graph is strongly connected.
- Recurrent and transient states
[Figure: state transition diagram with states A, B, C, D, E]
A and B are transient states; C and D are recurrent states.
Once the process moves from B to D, it will never come back.
5. The period of a state
The period of a state is the greatest common divisor of the lengths of all cycles that return to it.
A Markov chain is periodic if all the states in it have a period k > 1.
It is aperiodic otherwise.
Ergodic
A Markov chain is ergodic if:
1. the corresponding graph is strongly connected (it is irreducible), and
2. it is not periodic.
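These definitions can be checked mechanically. The sketch below (not from the slides) tests irreducibility by checking reachability in both directions, and computes the period with the standard BFS gcd argument; the alternating two-state chain is irreducible but has period 2, hence is not ergodic.

```python
from math import gcd
from collections import deque

def is_strongly_connected(P):
    """Irreducibility: every state reaches every other state."""
    n = len(P)
    def reachable(adj, s):
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in range(n):
                if adj[u][v] > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen
    transpose = [[P[j][i] for j in range(n)] for i in range(n)]
    return len(reachable(P, 0)) == n and len(reachable(transpose, 0)) == n

def period(P):
    """Period of a strongly connected chain: gcd of
    (level[u] + 1 - level[v]) over all edges u -> v in a BFS tree."""
    n = len(P)
    level, q, g = {0: 0}, deque([0]), 0
    while q:
        u = q.popleft()
        for v in range(n):
            if P[u][v] > 0:
                if v not in level:
                    level[v] = level[u] + 1
                    q.append(v)
                else:
                    g = gcd(g, level[u] + 1 - level[v])
    return g

# Deterministically alternating chain: irreducible, but period 2.
P = [[0.0, 1.0],
     [1.0, 0.0]]
print(is_strongly_connected(P), period(P))   # True 2
```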
6. Markov Chain Example
• Based on the weather today, what will it be tomorrow?
• Assuming only four possible weather states
Sunny
Cloudy
Rainy
Snowing
7. Markov Chain Structure
• Each state is an observable event
• At each time interval the state changes to another state or stays the same (q_t ∈ {S1, S2, S3, S4})
[Figure: four-state diagram — S1 (Sunny), S2 (Cloudy), S3 (Rainy), S4 (Snowing)]
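A minimal simulation of such a chain, assuming an illustrative transition matrix (the slides do not give numbers for the weather example):

```python
import random

STATES = ["Sunny", "Cloudy", "Rainy", "Snowing"]

# Transition probabilities are illustrative only -- not from the slides.
P = {
    "Sunny":   {"Sunny": 0.6, "Cloudy": 0.3, "Rainy": 0.1, "Snowing": 0.0},
    "Cloudy":  {"Sunny": 0.3, "Cloudy": 0.3, "Rainy": 0.3, "Snowing": 0.1},
    "Rainy":   {"Sunny": 0.2, "Cloudy": 0.4, "Rainy": 0.3, "Snowing": 0.1},
    "Snowing": {"Sunny": 0.1, "Cloudy": 0.3, "Rainy": 0.2, "Snowing": 0.4},
}

def step(state, rng=random):
    """Draw tomorrow's weather given only today's (the Markov property)."""
    r, cumulative = rng.random(), 0.0
    for nxt, p in P[state].items():
        cumulative += p
        if r < cumulative:
            return nxt
    return nxt

random.seed(0)
chain = ["Sunny"]
for _ in range(6):
    chain.append(step(chain[-1]))
print(" -> ".join(chain))
```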
13. The Probability of a Sequence for a Given
Markov Chain Model
[Figure: Markov chain over states A, C, G, T with begin and end states]
Pr(cggt) = Pr(c|begin) Pr(g|c) Pr(g|g) Pr(t|g) Pr(end|t)
14. Markov Chain Notation
• The transition parameters can be denoted by a_{x_{i-1} x_i}, where
  a_{x_{i-1} x_i} = Pr(X_i = x_i | X_{i-1} = x_{i-1})
• Similarly we can denote the probability of a sequence x as
  Pr(x) = a_{B x_1} ∏_{i=2}^{M} a_{x_{i-1} x_i} = Pr(x_1) ∏_{i=2}^{M} Pr(x_i | x_{i-1})
  where a_{B x_1} represents the transition from the begin state
• This gives a probability distribution over sequences of length M
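The formula takes only a few lines of code. The transition parameters below are hypothetical placeholders, since the slides do not specify numbers for the DNA chain:

```python
# Hypothetical transition parameters a[prev][next]; "B" is the begin state.
# The numbers are illustrative -- they are not from the slides.
a = {
    "B": {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25},
    "a": {"a": 0.2, "c": 0.3, "g": 0.3, "t": 0.2},
    "c": {"a": 0.1, "c": 0.3, "g": 0.4, "t": 0.2},
    "g": {"a": 0.2, "c": 0.3, "g": 0.3, "t": 0.2},
    "t": {"a": 0.2, "c": 0.3, "g": 0.3, "t": 0.2},
}

def sequence_probability(x):
    """Pr(x) = a_{B x1} * prod_{i=2..M} a_{x_{i-1} x_i}."""
    prob = a["B"][x[0]]
    for prev, cur in zip(x, x[1:]):
        prob *= a[prev][cur]
    return prob

print(sequence_probability("cggt"))
# = 0.25 * a[c][g] * a[g][g] * a[g][t] = 0.25 * 0.4 * 0.3 * 0.2 = 0.006
```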
15. Estimating the Model Parameters
Given some data (e.g. a set of sequences from CpG islands), how can we
determine the probability parameters of our model?
* One approach: maximum likelihood estimation
* A Bayesian approach
The "p" in CpG indicates that the C and the G are next to each other in
sequence, regardless of being single- or double- stranded. In a CpG site, both C
and G are found on the same strand of DNA or RNA and are connected by a
phosphodiester bond. This is a covalent bond between atoms.
16. Maximum Likelihood Estimation
• Let’s use a very simple sequence model:
  every position is independent of the others;
  every position is generated from the same multinomial distribution.
We want to estimate the parameters Pr(a), Pr(c), Pr(g), Pr(t),
and we’re given the sequences
accgcgctta
gcttagtgac
tagccgttac
then the maximum likelihood estimates are the observed
frequencies of the bases:
Pr(a) = n_a / Σ_i n_i
Pr(a) = 6/30 = 0.2
Pr(c) = 9/30 = 0.3
Pr(g) = 7/30 ≈ 0.233
Pr(t) = 8/30 ≈ 0.267
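These estimates can be reproduced directly from the three given sequences:

```python
from collections import Counter

sequences = ["accgcgctta", "gcttagtgac", "tagccgttac"]

counts = Counter("".join(sequences))
total = sum(counts.values())   # 30 bases in all

# Maximum likelihood estimate: Pr(a) = n_a / sum_i n_i
mle = {base: counts[base] / total for base in "acgt"}
print(mle)
# Pr(a) = 6/30 = 0.2, Pr(c) = 9/30 = 0.3,
# Pr(g) = 7/30 ~ 0.233, Pr(t) = 8/30 ~ 0.267
```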
17. Maximum Likelihood Estimation
• Suppose instead we saw the following sequences
gccgcgcttg
gcttggtggc
tggccgttgc
• Then the maximum likelihood estimates are
Pr(a) = 0/30 = 0
Pr(c) = 9/30 = 0.3
Pr(g) = 13/30 ≈ 0.433
Pr(t) = 8/30 ≈ 0.267
Note that Pr(a) = 0: any sequence containing an a would be assigned probability zero, which motivates the Bayesian approach below.
18. A Bayesian Approach
• A more general form: m-estimates
  Pr(a) = (n_a + m p_a) / (Σ_i n_i + m)
  where m is the number of “virtual” instances and p_a is the prior probability of a
• with m = 8 and uniform priors (p_a = 0.25), for the sequences
gccgcgcttg
gcttggtggc
tggccgttgc
  Pr(c) = (9 + 8 × 0.25) / (30 + 8) = 11/38 ≈ 0.289
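A sketch of the m-estimate computation for the same three sequences:

```python
from collections import Counter

sequences = ["gccgcgcttg", "gcttggtggc", "tggccgttgc"]
counts = Counter("".join(sequences))
total = sum(counts.values())   # 30

def m_estimate(base, m=8, prior=0.25):
    """Pr(a) = (n_a + m * p_a) / (sum_i n_i + m), m 'virtual' instances."""
    return (counts[base] + m * prior) / (total + m)

print(m_estimate("c"))   # (9 + 8*0.25) / (30 + 8) = 11/38 ~ 0.289
print(m_estimate("a"))   # (0 + 2) / 38 -- no more zero probabilities
```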
19. Estimation for 1st Order Probabilities
To estimate a 1st order parameter, such as Pr(c|g), we count
the number of times that c follows the history g in our given
sequences
using Laplace estimates with the sequences
gccgcgcttg
gcttggtggc
tggccgttgc
Pr(a|g) = (0 + 1) / (12 + 4) = 1/16
Pr(c|g) = (7 + 1) / (12 + 4) = 8/16
Pr(g|g) = (3 + 1) / (12 + 4) = 4/16
Pr(t|g) = (2 + 1) / (12 + 4) = 3/16
Pr(a|c) = (0 + 1) / (7 + 4) = 1/11
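These Laplace estimates can be reproduced by counting adjacent pairs in the three sequences:

```python
from collections import Counter

sequences = ["gccgcgcttg", "gcttggtggc", "tggccgttgc"]
alphabet = "acgt"

pair_counts = Counter()
history_counts = Counter()
for seq in sequences:
    for prev, cur in zip(seq, seq[1:]):
        pair_counts[(prev, cur)] += 1   # times cur follows prev
        history_counts[prev] += 1       # times prev appears as a history

def laplace(cur, prev):
    """Pr(cur | prev) with one pseudocount per letter of the alphabet."""
    return (pair_counts[(prev, cur)] + 1) / (history_counts[prev] + len(alphabet))

print(laplace("c", "g"))   # (7 + 1) / (12 + 4) = 0.5
print(laplace("a", "c"))   # (0 + 1) / (7 + 4)  ~ 0.091
```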
20. Example Application
Markov Chains for Discrimination
• Suppose we want to distinguish CpG islands from other sequence
regions
• given sequences from CpG islands, and sequences from other
regions, we can construct
• A model to represent CpG islands.
• A null model to represent the other regions.
21. Markov Chains for Discrimination
+ (CpG model)
     a    c    g    t
a   .18  .27  .43  .12
c   .17  .37  .27  .19
g   .16  .34  .38  .12
t   .08  .36  .38  .18

- (null model)
     a    c    g    t
a   .30  .21  .28  .21
c   .32  .30  .08  .30
g   .25  .24  .30  .21
t   .18  .24  .29  .29

• Parameters estimated for the CpG (+) and null (-) models from human sequences containing 48 CpG islands, about 60,000 nucleotides. Rows are the previous base and columns the next base, so e.g. the entry in row a, column c is Pr(c | a).
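A common way to use the two models for discrimination is a log-odds score over the transitions. The scoring scheme below is the standard log-likelihood ratio, which the slides do not spell out; start probabilities are ignored for simplicity:

```python
import math

# Transition tables from the slides: rows are the previous base,
# columns the next base (order a, c, g, t); entry is Pr(next | prev).
bases = "acgt"
cpg = {"a": [.18, .27, .43, .12],
       "c": [.17, .37, .27, .19],
       "g": [.16, .34, .38, .12],
       "t": [.08, .36, .38, .18]}
null = {"a": [.30, .21, .28, .21],
        "c": [.32, .30, .08, .30],
        "g": [.25, .24, .30, .21],
        "t": [.18, .24, .29, .29]}

def log_odds(seq):
    """Sum of log(a+_{prev,cur} / a-_{prev,cur}); positive scores
    favour the CpG model, negative scores the null model."""
    score = 0.0
    for prev, cur in zip(seq, seq[1:]):
        j = bases.index(cur)
        score += math.log(cpg[prev][j] / null[prev][j])
    return score

print(log_odds("cgcg"))    # positive: looks like a CpG island
print(log_odds("atata"))   # negative: looks like background sequence
```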
23. Hidden Markov Models (HMM)
“A doubly stochastic process with an underlying stochastic process that is not observable (it is hidden), but can only be observed through another set of stochastic processes that produce the sequence of observed symbols.”
— Rabiner & Juang, 1986
24. Difference between Markov chains and HMMs
• In a Markov chain, the state is directly visible to the observer, and
therefore the state transition probabilities are the only parameters. In a
hidden Markov model, the state is not directly visible, but the
output, which depends on the state, is visible. Each state has a probability
distribution over the possible output tokens. Therefore the sequence of
tokens generated by an HMM gives some information about the
sequence of states.
25. HMM Example
• Suppose we want to determine the average annual temperature at a
particular location on earth over a series of years.
• We consider two annual temperature states, “hot” (H) and “cold” (C).

Transition probabilities (rows: state at time t, columns: state at time t+1):
State   H     C
H       0.7   0.3
C       0.4   0.6
The state is the average annual temperature.
The transition from one state to the next is a Markov process
26. Since we can’t observe the state (the temperature) in the past, we
observe the size of tree rings instead.
27. Now suppose that current research indicates a correlation between the
size of tree growth rings and temperature. We consider only three
different tree ring sizes: small (S), medium (M), and large (L).

The probabilistic relationship between annual temperature and tree ring
sizes (rows: state, columns: observation):
State   S     M     L
H       0.1   0.4   0.5
C       0.7   0.2   0.1
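A generative sketch of this HMM: it samples a hidden temperature sequence and the ring sizes those temperatures emit. The start state here is an assumption, not given on the slides:

```python
import random

A = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}   # transitions
B = {"H": {"S": 0.1, "M": 0.4, "L": 0.5},                    # emissions
     "C": {"S": 0.7, "M": 0.2, "L": 0.1}}

def draw(dist, rng):
    r, cum = rng.random(), 0.0
    for outcome, p in dist.items():
        cum += p
        if r < cum:
            return outcome
    return outcome

def sample(n_years, start="H", seed=0):
    """Generate hidden temperatures and the tree-ring sizes they emit.
    The start state "H" is an assumption for illustration."""
    rng = random.Random(seed)
    state, states, rings = start, [], []
    for _ in range(n_years):
        states.append(state)
        rings.append(draw(B[state], rng))   # observed
        state = draw(A[state], rng)         # hidden step
    return states, rings

states, rings = sample(5)
print(states, rings)   # only `rings` would be visible to an observer
```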
30. State sequence   Probability   Normalized
HHHH 0.000412 0.042787
HHHC 0.000035 0.003635
HHCH 0.000706 0.073320
HHCC 0.000212 0.022017
HCHH 0.000050 0.005193
HCHC 0.000004 0.000415
HCCH 0.000302 0.031364
HCCC 0.000091 0.009451
CHHH 0.001098 0.114031
CHHC 0.000094 0.009762
CHCH 0.001882 0.195451
CHCC 0.000564 0.058573
CCHH 0.000470 0.048811
CCHC 0.000040 0.004154
CCCH 0.002822 0.293073
CCCC 0.000847 0.087963
Table 1: State sequence probabilities
To find the optimal state sequence in the dynamic
programming (DP) sense, we simply choose the
sequence with the highest
probability, namely CCCH.
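Table 1 can be reproduced by brute force over all 2^4 state sequences. The initial distribution and observation sequence are not shown on these slides; pi = (0.6, 0.4) for (H, C) and the observations (S, M, S, L) are assumptions, chosen because they reproduce the table's numbers:

```python
from itertools import product

A = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
B = {"H": {"S": 0.1, "M": 0.4, "L": 0.5},
     "C": {"S": 0.7, "M": 0.2, "L": 0.1}}
pi = {"H": 0.6, "C": 0.4}     # assumed; consistent with Table 1
obs = ["S", "M", "S", "L"]    # assumed; consistent with Table 1

def joint(states):
    """P(states, obs) = pi * emission, then transition * emission at each step."""
    p = pi[states[0]] * B[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    return p

table = {"".join(s): joint(s) for s in product("HC", repeat=4)}
best = max(table, key=table.get)
print(best, round(table[best], 6))   # CCCH 0.002822
```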
31. 0 1 2 3
P(H) 0.188182 0.519576 0.228788 0.804029
P(C) 0.811818 0.480424 0.771212 0.195971
Table 2: HMM probabilities
From Table 2 we find that the optimal sequence in the HMM sense (choosing the most
probable state at each time step) is CHCH. So the optimal DP sequence differs from the
optimal HMM sequence, even though here all of CHCH's state transitions happen to be valid.
Note that the DP solution and the HMM solution are not necessarily the same. For
example, the DP solution must consist of valid state transitions, while this is not
necessarily the case for the HMM solution.