From Logistic Regression to Linear-Chain CRF
Yow-Bang (Darren) Wang
12/20/2012
Outline
● Introduction
● Logistic Regression
● Log-Linear Model
● Linear-Chain CRF
○ Example: Part of Speech (POS) Tagging
● CRF Training and Testing
○ Example: Part of Speech (POS) Tagging
● Example: Speech Disfluency Detection
Introduction
We can approach the theory of CRF from
1. Maximum Entropy
2. Probabilistic Graphical Model
3. Logistic Regression ← today's talk
Linear Regression
● Input x: real-valued features (RV)
● Output y: Gaussian distribution (RV)
● Model parameter Ө
● ML (conditional likelihood) estimation of Ө, where {X, Y} are the training data:
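The model and the ML estimate were equation images on the original slide; a standard reconstruction, consistent with the graphical model on the next slide and with Ө = (a0, a1, …, aN), is:

y = a_0 + \sum_{n=1}^{N} a_n x_n + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)

\hat{\Theta}_{\mathrm{ML}} = \arg\max_{\Theta} \; p(Y \mid X; \Theta)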
Linear Regression
● Input x: real-valued features (RV)
● Output y: Gaussian distribution (RV)
● Represented with a graphical model:
[Figure: input nodes 1, x1, …, xN connected to output y through weights a0, a1, …, aN]
Logistic Regression
● Input x: real-valued features (RV)
● Output y: Bernoulli distribution (RV)
● Model parameter a = (a0, a1, …, aN)
Q: Why this form?
A: Both sides have range of values (-∞, ∞)
No analytical solution for ML → gradient descent
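The model equation was an image on the original slide; the standard form it refers to, in which both sides (the log-odds and the linear combination) range over (-∞, ∞), is:

\log \frac{p}{1 - p} = a_0 + \sum_{n=1}^{N} a_n x_n
\quad\Longleftrightarrow\quad
p = P(y = 1 \mid x) = \frac{1}{1 + \exp\!\big(-(a_0 + \sum_{n=1}^{N} a_n x_n)\big)}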
Logistic Regression
● Input x: real-valued features (RV)
● Output y: Bernoulli distribution (RV)
● Represented with a graphical model:
[Figure: input nodes 1, x1, …, xN feed weights a0, a1, …, aN into a sigmoid node that outputs p]
Logistic Regression
Advantages of Logistic Regression:
1. Correlated features x don't lead to problems (in contrast to Naive Bayes)
2. Well-calibrated probabilities (in contrast to SVM)
3. Not sensitive to unbalanced training data (e.g. the number of "Y = 1" examples vs. "Y = 0")
Multinomial Logistic Regression
● Input x: real-valued features (RV), N-dimensional
● Output y: categorical distribution (RV) over M classes
● Represented with a graphical model:
[Figure: input nodes 1, x1, …, xN feed a softmax layer that outputs p1, …, pM, where pm is the probability of the m-th class. A neural network with 2 layers!]
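The softmax output referred to in the figure can be written as follows (a standard reconstruction, assuming one weight vector (am0, …, amN) per class m):

p_m = P(y = m \mid x) = \frac{\exp\!\big(a_{m0} + \sum_{n=1}^{N} a_{mn} x_n\big)}{\sum_{m'=1}^{M} \exp\!\big(a_{m'0} + \sum_{n=1}^{N} a_{m'n} x_n\big)}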
Log-Linear Model
An interpretation: a Log-Linear Model is a Structured Logistic Regression
● Structured: allows non-numerical input and output by defining proper feature functions
● Special case: Logistic Regression
General form:
● Fj(x, y): j-th feature function
Log-Linear Model
Note:
1. "Feature" vs. "feature function"
○ Feature: corresponds only to the input
○ Feature function: corresponds to both input and output
2. Must sum over all possible labels y' in the denominator
→ normalization into [0, 1]
General form:
● Fj(x, y): j-th feature function
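The general form on this slide was an equation image; following the notation of [1], it is:

p(y \mid x; w) = \frac{\exp\!\Big(\sum_{j} w_j F_j(x, y)\Big)}{\sum_{y'} \exp\!\Big(\sum_{j} w_j F_j(x, y')\Big)}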
Linear-Chain CRF
Conditional Random Field (CRF)
From the probabilistic graphical model perspective:
● A CRF is a Markov Random Field in which some disjoint subsets of the RVs are observed and the rest are hidden.
[Figure: an MRF over nodes x, y, z, p, q, r, some observed and some hidden]
Linear-Chain CRF
From the probabilistic graphical model perspective:
● Linear-Chain CRF: a specific structure of CRF
[Figure: a chain of hidden label nodes, each connected to an observed input node]
We often refer to "linear-chain CRF" as simply "CRF".
Linear-Chain CRF
From the Log-Linear Model point of view: a Linear-Chain CRF is a Log-Linear Model in which
1. The length L of the output y can vary.
2. Each feature function is a sum of "low-level feature functions":
[Figure: hidden label sequence y over observed input sequence x]
Linear-Chain CRF
From the Log-Linear Model point of view: a Linear-Chain CRF is a Log-Linear Model in which
1. The length L of the output y can vary.
2. Each feature function is a sum of "low-level feature functions":
"We can have a fixed set of feature functions Fj for log-linear training, even though the training examples are not fixed-length." [1]
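The decomposition referred to in point 2 was an equation image; following [1], with L the length of y, it is:

F_j(x, y) = \sum_{i=1}^{L} f_j(x, y_i, y_{i-1}, i)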
Example: Part of Speech (POS) Tagging
Input (observed) x: word sequence
Output (hidden) y: POS tag sequence
● For example:
x = "He sat on the mat."
y = "pronoun verb preposition article noun"
[Figure: "He sat on the mat." aligned with the tags pron. v. prep. art. n.]
Example: Part of Speech (POS) Tagging
Input (observed) x: word sequence
Output (hidden) y: POS tag sequence
● With CRF we hope the correct tag sequence y receives the highest conditional probability.
CRF: p(y | x; w) = exp(Σj wj Fj(x, y)) / Σy' exp(Σj wj Fj(x, y')), where Fj(x, y) = Σi fj(x, yi, yi-1, i)
Example: Part of Speech (POS) Tagging
An example of a low-level feature function fj(x, yi, yi-1, i):
● "The i-th word in x is capitalized, and POS tag yi = proper noun." [TRUE (1) or FALSE (0)]
If wj is positively large: given x and the other conditions fixed, y is more probable if fj(x, yi, yi-1, i) is activated.
CRF: p(y | x; w) = exp(Σj wj Fj(x, y)) / Σy' exp(Σj wj Fj(x, y')), where Fj(x, y) = Σi fj(x, yi, yi-1, i)
Note: a feature function may not use all of the given information.
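As an illustration, here is a minimal Python sketch of such a low-level feature function. The signature mirrors fj(x, yi, yi-1, i); the tag name "NNP" and the function name are illustrative assumptions, not notation from the slides.

def f_capitalized_proper_noun(x, y_i, y_prev, i):
    """Fires (returns 1) if the i-th word of x is capitalized and y_i is the
    proper-noun tag ("NNP" here, an assumed tag name); otherwise returns 0.
    y_prev is unused, illustrating that a feature function need not use all
    of the given information."""
    return 1 if x[i][0].isupper() and y_i == "NNP" else 0

x = ["He", "sat", "on", "the", "mat"]
print(f_capitalized_proper_noun(x, "NNP", None, 0))   # -> 1 (activated)
print(f_capitalized_proper_noun(x, "VBD", "NNP", 1))  # -> 0 (not activated)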
CRF Training and Testing
Training
Stochastic Gradient Ascent
● Partial derivative of conditional log-likelihood:
● Update weight by
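The derivative and the update rule were equation images on the original slide; the standard expressions, following [1] and writing λ for the learning rate, are:

\frac{\partial}{\partial w_j} \log p(y \mid x; w) = F_j(x, y) - \mathbb{E}_{y' \sim p(y' \mid x;\, w)}\big[F_j(x, y')\big]

w_j \leftarrow w_j + \lambda \Big(F_j(x, y) - \mathbb{E}_{y' \sim p(y' \mid x;\, w)}\big[F_j(x, y')\big]\Big)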
Training
Note: if the j-th feature function is not activated by a given training example
→ we don't need to update its weight!
→ usually only a few weights need to be updated in each iteration
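For concreteness, below is a brute-force Python sketch of one stochastic-gradient-ascent step on a single training example. The tag set, the two toy low-level feature functions, the learning rate, and the exhaustive enumeration over tag sequences are all illustrative assumptions; a practical implementation computes the expectation with the forward-backward algorithm instead of enumeration. Note how the final loop only touches the weights of feature functions that are active for this example.

import itertools
import math

TAGS = ["N", "V", "D"]  # hypothetical tag set

def low_level_features(x, y_i, y_prev, i):
    """Return the indices j of the low-level feature functions fj that fire."""
    active = []
    if x[i][0].isupper() and y_i == "N":
        active.append(0)  # f0: capitalized word tagged as noun
    if y_prev == "D" and y_i == "N":
        active.append(1)  # f1: determiner followed by noun
    return active

def global_features(x, y):
    """Fj(x, y) = sum_i fj(x, yi, yi-1, i), stored as a sparse dict."""
    F = {}
    for i in range(len(x)):
        y_prev = y[i - 1] if i > 0 else None
        for j in low_level_features(x, y[i], y_prev, i):
            F[j] = F.get(j, 0) + 1
    return F

def score(w, F):
    return sum(w.get(j, 0.0) * v for j, v in F.items())

def sga_step(w, x, y, lr=0.1):
    """One update: wj += lr * (Fj(x, y) - E_{y'}[Fj(x, y')])."""
    F_obs = global_features(x, y)
    # Brute-force expectation over all tag sequences (for illustration only).
    cand_feats = [global_features(x, c)
                  for c in itertools.product(TAGS, repeat=len(x))]
    scores = [score(w, F) for F in cand_feats]
    z = sum(math.exp(s) for s in scores)
    F_exp = {}
    for F, s in zip(cand_feats, scores):
        p = math.exp(s) / z
        for j, v in F.items():
            F_exp[j] = F_exp.get(j, 0.0) + p * v
    # Only the weights of active feature functions need updating.
    for j in set(F_obs) | set(F_exp):
        w[j] = w.get(j, 0.0) + lr * (F_obs.get(j, 0) - F_exp.get(j, 0.0))
    return w

w = {}
print(sga_step(w, ["The", "Dog", "ran"], ["D", "N", "V"]))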
Testing
For 1-best derivation:
[Figure: the g(yi-1, yi) table, with POS tags N, V, Adj, … indexing both rows and columns]
Example: Part of Speech (POS) Tagging
For 1-best derivation:
1. Pre-compute g(yi-1, yi) as a table for each i
2. Perform dynamic programming to find the best sequence y
[Figure: the dynamic-programming recursion, rendered as equations in the original slide]
Example: Part of Speech (POS) Tagging
For 1-best derivation:
1. Pre-compute g(yi-1, yi) as a table (build a table) for each element i in the sequence
2. Perform dynamic programming to find the best sequence y
● Complexity: O(M²LD), with M the number of tags, L the sequence length, and D the number of feature functions
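A minimal Python sketch of this dynamic program is given below, assuming the tables have already been pre-computed. The recursion it implements is a standard reconstruction, U(i, v) = max_u [U(i-1, u) + gi(u, v)]; the names viterbi, g, and g_start are illustrative, not from the slides.

def viterbi(tags, L, g, g_start):
    """g_start[v]: score of tag v at position 0.
    g[i][u][v]: score of moving from tag u at position i-1 to tag v at i.
    Returns the highest-scoring tag sequence of length L."""
    best = {v: g_start[v] for v in tags}   # U(0, v)
    back = [{} for _ in range(L)]          # backpointers
    for i in range(1, L):
        new_best = {}
        for v in tags:
            # O(M) inner loop over previous tags -> O(M^2 L) table lookups overall
            u_star = max(tags, key=lambda u: best[u] + g[i][u][v])
            new_best[v] = best[u_star] + g[i][u_star][v]
            back[i][v] = u_star
        best = new_best
    # Trace back the best path from the best final tag.
    path = [max(tags, key=lambda v: best[v])]
    for i in range(L - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy usage with M = 2 tags and L = 3 positions
tags = ["N", "V"]
g_start = {"N": 1.0, "V": 0.0}
g = [None] + [{"N": {"N": 0.0, "V": 1.0}, "V": {"N": 1.0, "V": 0.0}} for _ in range(2)]
print(viterbi(tags, 3, g, g_start))  # -> ['N', 'V', 'N']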
Testing
For probability estimation:
● must also sum over all possible y (e.g. all possible POS sequences) for the denominator…
Can be calculated by matrix multiplication!
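One common way to organize this computation (a sketch; the exact handling of start and stop states varies between formulations) is to collect the per-position scores gi(u, v) = Σj wj fj(x, v, u, i) into one matrix per position and multiply them:

(M_i)_{u,v} = \exp g_i(u, v), \qquad
Z(x, w) = \sum_{y'} \exp\!\Big(\sum_j w_j F_j(x, y')\Big) = \mathbf{e}_{\text{start}}^{\top}\Big(\prod_{i=1}^{L} M_i\Big)\mathbf{1}

where e_start selects a designated start state for position 0 and 1 is the all-ones vector; the matrix product can also be computed row by row as a forward recursion.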
Example: Speech Disfluency Detection
Example: Speech Disfluency Detection
One application of CRF in speech recognition: Boundary/Disfluency Detection [5]
● Repetition : “It is is Tuesday.”
● Hesitation : “It is uh… Tuesday.”
● Correction: “It is Monday, I mean, Tuesday.”
● etc.
Possible clues: prosody
● Pitch
● Duration
● Energy
● Pause
● etc.
"It is uh… Tuesday."
● Pitch reset?
● Long duration?
● Low energy?
● Pause existence?
Example: Speech Disfluency Detection
One application of CRF in speech recognition: Boundary/Disfluency Detection [5]
● CRF Input x: prosodic features
● CRF Output y: boundary/disfluency labels
[Figure: the CRF output combined with speech recognition rescoring]
Reference
[1] Charles Elkan, “Log-linear Models and Conditional Random Fields”
○ Tutorial at CIKM08 (ACM International Conference on Information and Knowledge Management)
○ Video: http://videolectures.net/cikm08_elkan_llmacrf/
○ Lecture notes: http://cseweb.ucsd.edu/~elkan/250B/cikmtutorial.pdf
[2] Hanna M. Wallach, “Conditional Random Fields: An Introduction”
[3] Jeremy Morris, “Conditional Random Fields: An Overview”
○ Presented at OSU Clippers 2008, January 11, 2008
[4] J. Lafferty, A. McCallum, F. Pereira, “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data”, 2001.
[5] Y. Liu, E. Shriberg, A. Stolcke, D. Hillard, M. Ostendorf, M. Harper, “Enriching Speech Recognition with Automatic Detection of Sentence Boundaries and Disfluencies”, IEEE Transactions on Audio, Speech, and Language Processing, 2006.
  • 34. Reference [4] C. Sutton, K. Rohanimanesh, A. McCallum, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data”, 2001. [5] Liu, Y. and Shriberg, E. and Stolcke, A. and Hillard, D. and Ostendorf, M. and Harper, M., “Enriching speech recognition with automatic detection of sentence boundaries and disfluencies”, in IEEE Transactions on Audio, Speech, and Language Processing, 2006.