Automatic Key Term Extraction and Summarization from Spoken Course Lectures
課程錄音之自動關鍵用語擷取及摘要

Speaker: Yun-Nung Chen 陳縕儂
Advisor: Prof. Lin-Shan Lee 李琳山
Master Defense, National Taiwan University

Introduction
Target: extract key terms and summaries from course lectures

Key Term
• Indexing and retrieval
• The relations between key terms and segments of documents

Summary
• Efficiently understand the document

Both are related to document understanding and to the semantics of the document; both are forms of “Information Extraction”.
Definition
• Key Term
  • Higher term frequency
  • Core content
  • Two types:
    • Keyword, e.g. “語音” (speech)
    • Key phrase, e.g. “語言 模型” (language model)
Automatic Key Term Extraction

[System overview: an archive of original spoken documents is decoded by ASR (speech signal → ASR transcriptions); the transcriptions then pass through Branching Entropy, Feature Extraction, and Learning Methods (1) AdaBoost, 2) Neural Network) to produce the key terms.]
Phrase Identification: first, branching entropy is used to identify phrases in the ASR transcriptions.
Key Term Extraction: then, learning methods extract key terms from the candidate phrases using a set of features, producing the key term list (e.g. “entropy”, “acoustic model”, …).
Branching Entropy

How to decide the boundary of a phrase? Consider the example “hidden Markov model”:
• Inside the phrase, the next word is highly predictable (“hidden” is almost always followed by “Markov”, and “Markov” by “model”).
• At the boundary of the phrase, many different words can precede it (“represent”, “is”, “can”, …) or follow it (“is”, “of”, “in”, …).
Branching entropy is defined on this predictability pattern to decide possible boundaries.
Branching Entropy (continued)

Definition of Right Branching Entropy
• Probability of the next word xi given the preceding word sequence X
• Right branching entropy for X: the entropy of this next-word distribution

Decision of Right Boundary
• Find the right boundary located between X and xi where the branching entropy meets the boundary criterion (the slide formulas are reconstructed below)
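The formulas on these slides are images lost in extraction; the following is a hedged reconstruction of the standard branching entropy definitions consistent with the slide text (f(·) denotes the corpus frequency of a word sequence), not necessarily the thesis's exact notation:

    P(x_i \mid X) = \frac{f(X x_i)}{f(X)}

    H_r(X) = -\sum_{x_i} P(x_i \mid X)\,\log P(x_i \mid X)

    % a right boundary is placed between X and x_i where the
    % entropy rises, i.e. the context stops being predictable:
    H_r(X x_i) > H_r(X)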
The same test is applied at each position while scanning through the word sequence (the left boundary is decided symmetrically with a left branching entropy); a PAT tree is used to implement the computation efficiently.
Feature Extraction: prosodic, lexical, and semantic features are extracted for each candidate term produced by phrase identification.
Feature Extraction

Prosodic features
• Extracted for each candidate term at its first appearance in the document
• Rationale: the speaker tends to use longer duration to emphasize key terms; higher pitch may represent significant information; higher energy emphasizes important information
• Durations are normalized per phone (the duration of phone “a” is divided by the average duration of phone “a”), and four statistics of the normalized values are used per term

Feature Name       Feature Description
Duration (I - IV)  normalized duration (max, min, mean, range)
Pitch (I - IV)     F0 (max, min, mean, range)
Energy (I - IV)    energy (max, min, mean, range)
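As an illustration only, a minimal sketch of how the four duration features could be computed, assuming a hypothetical phone-level alignment (phone label, duration) and precomputed per-phone average durations; none of these names come from the thesis:

    from statistics import mean

    def duration_features(term_phones, avg_phone_duration):
        """Duration (I-IV): max, min, mean, range of phone durations,
        each normalized by the average duration of the same phone."""
        normalized = [dur / avg_phone_duration[phone]
                      for phone, dur in term_phones]
        return {
            "dur_max": max(normalized),
            "dur_min": min(normalized),
            "dur_mean": mean(normalized),
            "dur_range": max(normalized) - min(normalized),
        }

    # Usage: phones of one candidate term with their durations (seconds)
    avg = {"a": 0.08, "k": 0.05, "u": 0.07}
    print(duration_features([("a", 0.10), ("k", 0.04), ("u", 0.09)], avg))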
Lexical features
Using some well-known lexical features for each candidate term:

Feature Name  Feature Description
TF            term frequency
IDF           inverse document frequency
TFIDF         tf * idf
PoS           the PoS tag
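For concreteness, a toy TF-IDF computation in one common variant; the thesis does not specify its exact weighting scheme, so this is a sketch:

    import math
    from collections import Counter

    def tfidf(term, doc_tokens, corpus_docs):
        """TF-IDF of `term` in one document of a corpus."""
        tf = Counter(doc_tokens)[term]                     # term frequency
        df = sum(1 for d in corpus_docs if term in d)      # document frequency
        idf = math.log(len(corpus_docs) / (1 + df))        # inverse doc frequency
        return tf * idf

    docs = [["acoustic", "model", "entropy"], ["language", "model"]]
    print(tfidf("entropy", docs[0], docs))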
Semantic features
• Probabilistic Latent Semantic Analysis (PLSA)
• Latent Topic Probability
• Key terms tend to focus on limited topics

PLSA models each document Di as a mixture of latent topics Tk, which in turn generate the terms tj; the model parameters are P(Tk|Di) and P(tj|Tk).
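Written out, the generative decomposition implied by the slide's diagram is the standard PLSA factorization over K latent topics:

    P(t_j \mid D_i) = \sum_{k=1}^{K} P(t_j \mid T_k)\, P(T_k \mid D_i)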
Semantic features (continued)
The latent topic distribution of a term describes a probability distribution over the K topics. Key terms tend to focus on limited topics, so their distributions are sharply peaked, while non-key terms spread probability across many topics. This distribution is summarized into three kinds of features:

Feature Name   Feature Description
LTP (I - III)  Latent Topic Probability (mean, variance, standard deviation)
LTS (I - III)  Latent Topic Significance (mean, variance, standard deviation); the within-topic to out-of-topic frequency ratio of the term for each topic
LTE            term entropy over latent topics; key terms tend to have lower LTE, non-key terms higher LTE
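Hedged reconstructions of the two measures (the slide formulas are images; these follow the definitions used in the author's SLT 2010 paper, with n(t_j, D_i) the count of term t_j in document D_i):

    % Latent Topic Significance: within-topic vs. out-of-topic weighted frequency
    LTS_{t_j}(T_k) = \frac{\sum_{i} n(t_j, D_i)\, P(T_k \mid D_i)}
                          {\sum_{i} n(t_j, D_i)\,\bigl(1 - P(T_k \mid D_i)\bigr)}

    % Latent Topic Entropy: entropy of the term's topic distribution
    LTE(t_j) = -\sum_{k=1}^{K} P(T_k \mid t_j)\,\log P(T_k \mid t_j)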
Key Term Extraction: supervised learning approaches are then applied to these features to extract the key terms.
Learning Methods
• Adaptive Boosting (AdaBoost)
• Neural Network
Both automatically adjust the weights of the features to train a classifier.
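As an illustration only, a minimal scikit-learn sketch of the two classifier types, assuming a feature matrix X (prosodic + lexical + semantic features per candidate term) and binary key-term labels y; the thesis does not specify implementations or hyperparameters:

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 22))    # toy stand-in for per-term feature vectors
    y = rng.integers(0, 2, size=200)  # 1 = key term, 0 = not

    # 1) AdaBoost: weighted combination of weak classifiers
    ada = AdaBoostClassifier(n_estimators=100).fit(X, y)

    # 2) Neural network: one hidden layer adjusting feature weights jointly
    mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)

    print(ada.predict(X[:5]), mlp.predict(X[:5]))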
Experiments
• Corpus
  • NTU lecture corpus: Mandarin Chinese embedded with English words, single speaker, 45.2 hours
• ASR System
  • Bilingual acoustic model with model adaptation [1]
  • Language model with adaptation using random forests [2]

Language      Mandarin  English  Overall
Char Acc (%)  78.15     53.44    76.26

[1] Ching-Feng Yeh, “Bilingual Code-Mixed Acoustic Modeling by Unit Mapping and Model Recovery,” Master Thesis, 2011.
[2] Chao-Yu Huang, “Language Model Adaptation for Mandarin-English Code-Mixed Lectures Using Word Classes and Random Forests,” Master Thesis, 2011.
Experiments
• Reference Key Terms
  • Annotations from 61 students who had taken the course
  • If an annotator labeled 150 key terms, each of them received a score of 1/150, and all other terms 0
  • Terms are ranked by the sum of the scores given by all annotators
  • The top N terms are chosen from the list, where N is the average number of key terms per annotator: N = 154 key terms (59 key phrases and 95 keywords)
• Evaluation
  • 3-fold cross validation
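A small sketch of this scoring scheme, with hypothetical names (not from the thesis):

    from collections import defaultdict

    def rank_reference_terms(annotations, n_terms):
        """Each annotator contributes 1/len(own list) per labeled term;
        terms are ranked by the summed score, as described above."""
        scores = defaultdict(float)
        for labeled_terms in annotations:
            for term in labeled_terms:
                scores[term] += 1.0 / len(labeled_terms)
        ranked = sorted(scores, key=scores.get, reverse=True)
        return ranked[:n_terms]

    # Toy usage: three annotators, top-2 reference terms
    annotators = [["entropy", "acoustic model"], ["entropy"], ["entropy", "HMM"]]
    print(rank_reference_terms(annotators, 2))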
Experiments: Feature Effectiveness
• Neural network for keywords, on ASR transcriptions (F-measure, %):

Feature set       F-measure
Pr (Prosodic)     20.78
Lx (Lexical)      42.86
Sm (Semantic)     35.63
Pr + Lx           48.15
Pr + Lx + Sm      56.55

Each set of features alone gives an F1 of roughly 20% to 42%; prosodic and lexical features are additive; all three sets of features are useful.
Experiments: Overall Performance (Keywords & Key Phrases)

F-measure (%)                        ASR    Manual
N-Gram + TFIDF (baseline)            23.44  32.19
Branching Entropy + TFIDF            52.60  55.84
Branching Entropy + AdaBoost         57.68  62.39
Branching Entropy + Neural Network   62.70  67.31

Branching entropy performs well; performance on manual transcriptions is slightly better than on ASR transcriptions; supervised learning with the neural network gives the best results.
Automatic Summarization
Introduction
• Extractive Summary: the important sentences in the document
• Computing Importance of Sentences: statistical measure, linguistic measure, confidence score, N-gram score, grammatical structure score
• Ranking sentences by importance and deciding the ratio of the summary

This work proposes a better statistical measure of a term.

Statistical Measure of a Term
• LTE-Based Statistical Measure (baseline)
• Key-Term-Based Statistical Measure: considers only key terms (ti ∈ key), since key terms can represent the core content of the document, and weights each term by its LTS, so the latent topic probability can be estimated more accurately
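The slide shows these two measures only as images; as a hedged reconstruction consistent with the slide text and the author's InterSpeech 2011 paper (γ a scaling constant), one plausible form is:

    s_{\mathrm{LTE}}(t_i) = \frac{\gamma}{\mathrm{LTE}(t_i)}

    s_{\mathrm{key}}(t_i) =
      \begin{cases}
        \sum_{k} \mathrm{LTS}_{t_i}(T_k) & \text{if } t_i \in \text{key terms} \\
        0 & \text{otherwise}
      \end{cases}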
Importance of the Sentence
• Original Importance: computed with either the LTE-based or the key-term-based statistical measure
• New Importance: considers both the original importance and the similarity to other sentences; sentences similar to more sentences should get higher importance

Random Walk on a Graph
• Idea: sentences similar to more important sentences should themselves be more important; nodes connecting to more nodes with higher scores should get higher scores
• Graph Construction
  • Node: a sentence in the document
  • Edge: weighted by the similarity between nodes
• Node Score: interpolates two scores, the normalized original score of sentence Si and the scores propagated from its neighbors according to the edge weights p(j, i); the score of Si at iteration k is updated from its neighbors' scores (see the formula below)
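The update rule implied by the slide (the slide's own formula is an image; here v_k(i) is the score of Si at iteration k, r(i) the normalized original score, and α the interpolation weight):

    v_{k+1}(i) = (1 - \alpha)\, r(i) + \alpha \sum_{j} p(j, i)\, v_k(j)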
Random Walk on a Graph
• Topical Similarity between Sentences
  • Edge weight sim(Si, Sj) (sentence i → sentence j) is computed from the latent topic probabilities of the sentences, using Latent Topic Significance to weight the terms
Random Walk on a Graph
• Scores of Sentences
  • Converged equation: at convergence the iterative update becomes a fixed-point equation
  • Matrix form and solution: the converged score vector is the dominant eigenvector of the combined transition matrix P'
• Integrated with the original importance
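A minimal power-iteration sketch of this random walk, assuming a precomputed sentence-similarity matrix; the names and toy numbers are illustrative, not from the thesis:

    import numpy as np

    def random_walk_scores(sim, orig, alpha=0.9, iters=100):
        """Iterate v <- (1-alpha)*r + alpha * P^T v; the fixed point
        is the dominant eigenvector of the combined matrix."""
        p = sim / sim.sum(axis=1, keepdims=True)  # row-normalize: p(j, i)
        r = orig / orig.sum()                     # normalized original scores
        v = np.full(len(r), 1.0 / len(r))
        for _ in range(iters):
            v = (1 - alpha) * r + alpha * p.T @ v
        return v

    sim = np.array([[1.0, 0.6, 0.1],
                    [0.6, 1.0, 0.4],
                    [0.1, 0.4, 1.0]])
    orig = np.array([0.5, 0.3, 0.2])
    print(random_walk_scores(sim, orig))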
Experiments
• Same corpus and ASR system (NTU lecture corpus)
• Reference Summaries
  • Two human-produced reference summaries for each document
  • Sentences ranked from “the most important” to “of average importance”
• Evaluation Metric
  • ROUGE-1, ROUGE-2, ROUGE-3
  • ROUGE-L: Longest Common Subsequence (LCS)
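For reference, a bare-bones ROUGE-1 recall computation; real evaluations use the official ROUGE toolkit with stemming and multiple references, so this toy version only shows the unigram-overlap idea:

    from collections import Counter

    def rouge1_recall(candidate, reference):
        """Unigram overlap between candidate and reference summaries,
        divided by the reference length (ROUGE-1 recall)."""
        cand, ref = Counter(candidate.split()), Counter(reference.split())
        overlap = sum(min(cand[w], n) for w, n in ref.items())
        return overlap / sum(ref.values())

    print(rouge1_recall("the acoustic model is trained",
                        "the acoustic model is trained on lectures"))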
Evaluation (ASR transcriptions)
[Figure: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L scores of the LTE-based (LTE) and key-term-based (Key) measures at 10%, 20%, and 30% summarization ratios; exact values are shown only graphically.]
The key-term-based statistical measure is helpful at all ratios and under all ROUGE metrics.
Evaluation (ASR transcriptions)
[Figure: ROUGE-1/2/3/L at 10%, 20%, and 30% ratios for LTE vs. LTE + random walk (RW) and Key vs. Key + RW; values shown graphically only.]
The random walk helps both the LTE-based and the key-term-based statistical measures; topical similarity can compensate for recognition errors.
Evaluation (ASR vs. manual transcriptions)
[Figure: ROUGE-1/2/3/L at 10%, 20%, and 30% ratios for LTE, LTE + RW, Key, and Key + RW on both ASR and manual transcriptions; values shown graphically only.]
The key-term-based statistical measure and the random walk using topical similarity are useful for summarization.
Conclusions
• Automatic Key Term Extraction: the performance can be improved by
  ▫ identifying phrases by branching entropy
  ▫ using prosodic, lexical, and semantic features together
• Automatic Summarization: the performance can be improved by
  ▫ the key-term-based statistical measure
  ▫ random walk with topical similarity, which compensates for recognition errors, gives higher scores to sentences topically similar to more important sentences, and considers all sentences in the document
Published Papers:
[1] Yun-Nung Chen, Yu Huang, Sheng-Yi Kong, and Lin-Shan Lee, “Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features,” in Proceedings of SLT, 2010.
[2] Yun-Nung Chen, Yu Huang, Ching-Feng Yeh, and Lin-Shan Lee, “Spoken Lecture Summarization by Random Walk over a Graph Constructed with Automatically Extracted Key Terms,” in Proceedings of InterSpeech, 2011.

Weitere ähnliche Inhalte

Was ist angesagt?

Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Chunyang Chen
 
Taking into account communities of practice’s specific vocabularies in inform...
Taking into account communities of practice’s specific vocabularies in inform...Taking into account communities of practice’s specific vocabularies in inform...
Taking into account communities of practice’s specific vocabularies in inform...inscit2006
 
Automated Abstracts and Big Data
Automated Abstracts and Big DataAutomated Abstracts and Big Data
Automated Abstracts and Big DataSameer Wadkar
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language ProcessingSebastian Ruder
 
2010 INTERSPEECH
2010 INTERSPEECH 2010 INTERSPEECH
2010 INTERSPEECH WarNik Chow
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Bhaskar Mitra
 
Topic Extraction on Domain Ontology
Topic Extraction on Domain OntologyTopic Extraction on Domain Ontology
Topic Extraction on Domain OntologyKeerti Bhogaraju
 
Latent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text SummarizationLatent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text SummarizationElaheh Barati
 
Processing short-message communications in low-resource languages
Processing short-message communications in low-resource languages�Processing short-message communications in low-resource languages�
Processing short-message communications in low-resource languages Robert Munro
 
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...Facultad de Informática UCM
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationSeonghyun Kim
 
download
downloaddownload
downloadbutest
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...alessio_ferrari
 
Requirements Engineering: focus on Natural Language Processing, Lecture 2
Requirements Engineering: focus on Natural Language Processing, Lecture 2Requirements Engineering: focus on Natural Language Processing, Lecture 2
Requirements Engineering: focus on Natural Language Processing, Lecture 2alessio_ferrari
 
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...Editor IJARCET
 
Generating Lexical Information for Terminology in a Bioinformatics Ontology
Generating Lexical Information for Terminologyin a Bioinformatics OntologyGenerating Lexical Information for Terminologyin a Bioinformatics Ontology
Generating Lexical Information for Terminology in a Bioinformatics OntologyHammad Afzal
 

Was ist angesagt? (20)

Packing and Unpacking the Bag of Words: Introducing a Toolkit for Inductive A...
Packing and Unpacking the Bag of Words: Introducing a Toolkit for Inductive A...Packing and Unpacking the Bag of Words: Introducing a Toolkit for Inductive A...
Packing and Unpacking the Bag of Words: Introducing a Toolkit for Inductive A...
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
 
Taking into account communities of practice’s specific vocabularies in inform...
Taking into account communities of practice’s specific vocabularies in inform...Taking into account communities of practice’s specific vocabularies in inform...
Taking into account communities of practice’s specific vocabularies in inform...
 
co:op-READ-Convention Marburg - Roger Labahn
co:op-READ-Convention Marburg - Roger Labahnco:op-READ-Convention Marburg - Roger Labahn
co:op-READ-Convention Marburg - Roger Labahn
 
Automated Abstracts and Big Data
Automated Abstracts and Big DataAutomated Abstracts and Big Data
Automated Abstracts and Big Data
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language Processing
 
Asr
AsrAsr
Asr
 
2010 INTERSPEECH
2010 INTERSPEECH 2010 INTERSPEECH
2010 INTERSPEECH
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
Topic Extraction on Domain Ontology
Topic Extraction on Domain OntologyTopic Extraction on Domain Ontology
Topic Extraction on Domain Ontology
 
Latent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text SummarizationLatent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text Summarization
 
Processing short-message communications in low-resource languages
Processing short-message communications in low-resource languages�Processing short-message communications in low-resource languages�
Processing short-message communications in low-resource languages
 
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
 
Enriching Word Vectors with Subword Information
Enriching Word Vectors with Subword InformationEnriching Word Vectors with Subword Information
Enriching Word Vectors with Subword Information
 
download
downloaddownload
download
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
 
Requirements Engineering: focus on Natural Language Processing, Lecture 2
Requirements Engineering: focus on Natural Language Processing, Lecture 2Requirements Engineering: focus on Natural Language Processing, Lecture 2
Requirements Engineering: focus on Natural Language Processing, Lecture 2
 
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
A Combined Approach to Part-of-Speech Tagging Using Features Extraction and H...
 
ppt
pptppt
ppt
 
Generating Lexical Information for Terminology in a Bioinformatics Ontology
Generating Lexical Information for Terminologyin a Bioinformatics OntologyGenerating Lexical Information for Terminologyin a Bioinformatics Ontology
Generating Lexical Information for Terminology in a Bioinformatics Ontology
 

Andere mochten auch

End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...Yun-Nung (Vivian) Chen
 
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...Yun-Nung (Vivian) Chen
 
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...Yun-Nung (Vivian) Chen
 
Language Empowering Intelligent Assistants (CHT)
Language Empowering Intelligent Assistants (CHT)Language Empowering Intelligent Assistants (CHT)
Language Empowering Intelligent Assistants (CHT)Yun-Nung (Vivian) Chen
 
Statistical Learning from Dialogues for Intelligent Assistants
Statistical Learning from Dialogues for Intelligent AssistantsStatistical Learning from Dialogues for Intelligent Assistants
Statistical Learning from Dialogues for Intelligent AssistantsYun-Nung (Vivian) Chen
 
End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
End-to-End Joint Learning of Natural Language Understanding and Dialogue ManagerEnd-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
End-to-End Joint Learning of Natural Language Understanding and Dialogue ManagerYun-Nung (Vivian) Chen
 
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Yun-Nung (Vivian) Chen
 
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Yun-Nung (Vivian) Chen
 
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...Yun-Nung (Vivian) Chen
 
An Intelligent Assistant for High-Level Task Understanding
An Intelligent Assistant for High-Level Task UnderstandingAn Intelligent Assistant for High-Level Task Understanding
An Intelligent Assistant for High-Level Task UnderstandingYun-Nung (Vivian) Chen
 
Deep Learning for Dialogue Modeling - NTHU
Deep Learning for Dialogue Modeling - NTHUDeep Learning for Dialogue Modeling - NTHU
Deep Learning for Dialogue Modeling - NTHUYun-Nung (Vivian) Chen
 
significance of theory for nursing as a profession
significance of theory for nursing as a professionsignificance of theory for nursing as a profession
significance of theory for nursing as a professionJohn Christian Villanueva
 
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli..."Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...Yun-Nung (Vivian) Chen
 
1st Seminar- Intelligent Agent for Medium-Level Artificial Intelligence in Re...
1st Seminar- Intelligent Agent for Medium-Level Artificial Intelligence in Re...1st Seminar- Intelligent Agent for Medium-Level Artificial Intelligence in Re...
1st Seminar- Intelligent Agent for Medium-Level Artificial Intelligence in Re...Muhamad Hesham
 
From data to numbers to knowledge: semantic embeddings By Alvaro Barbero
From data to numbers to knowledge: semantic embeddings By Alvaro BarberoFrom data to numbers to knowledge: semantic embeddings By Alvaro Barbero
From data to numbers to knowledge: semantic embeddings By Alvaro BarberoBig Data Spain
 
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...Big Data Spain
 
Intelligent Agent Perception
Intelligent Agent PerceptionIntelligent Agent Perception
Intelligent Agent PerceptionMolly Maymar
 
Los MOOC ventajas y desventajas
Los MOOC ventajas y desventajasLos MOOC ventajas y desventajas
Los MOOC ventajas y desventajasluz deluna
 

Andere mochten auch (20)

End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
 
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
Detecting Actionable Items in Meetings by Convolutional Deep Structured Seman...
 
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken...
 
Language Empowering Intelligent Assistants (CHT)
Language Empowering Intelligent Assistants (CHT)Language Empowering Intelligent Assistants (CHT)
Language Empowering Intelligent Assistants (CHT)
 
Statistical Learning from Dialogues for Intelligent Assistants
Statistical Learning from Dialogues for Intelligent AssistantsStatistical Learning from Dialogues for Intelligent Assistants
Statistical Learning from Dialogues for Intelligent Assistants
 
End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
End-to-End Joint Learning of Natural Language Understanding and Dialogue ManagerEnd-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
 
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
 
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
Unsupervised Learning and Modeling of Knowledge and Intent for Spoken Dialogu...
 
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken...
 
An Intelligent Assistant for High-Level Task Understanding
An Intelligent Assistant for High-Level Task UnderstandingAn Intelligent Assistant for High-Level Task Understanding
An Intelligent Assistant for High-Level Task Understanding
 
Deep Learning for Dialogue Modeling - NTHU
Deep Learning for Dialogue Modeling - NTHUDeep Learning for Dialogue Modeling - NTHU
Deep Learning for Dialogue Modeling - NTHU
 
significance of theory for nursing as a profession
significance of theory for nursing as a professionsignificance of theory for nursing as a profession
significance of theory for nursing as a profession
 
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli..."Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
"Sorry, I didn't get that!" - Statistical Learning from Dialogues for Intelli...
 
1st Seminar- Intelligent Agent for Medium-Level Artificial Intelligence in Re...
1st Seminar- Intelligent Agent for Medium-Level Artificial Intelligence in Re...1st Seminar- Intelligent Agent for Medium-Level Artificial Intelligence in Re...
1st Seminar- Intelligent Agent for Medium-Level Artificial Intelligence in Re...
 
On using monolingual corpora in neural machine translation
On using monolingual corpora in neural machine translationOn using monolingual corpora in neural machine translation
On using monolingual corpora in neural machine translation
 
Deep Learning for Machine Translation
Deep Learning for Machine TranslationDeep Learning for Machine Translation
Deep Learning for Machine Translation
 
From data to numbers to knowledge: semantic embeddings By Alvaro Barbero
From data to numbers to knowledge: semantic embeddings By Alvaro BarberoFrom data to numbers to knowledge: semantic embeddings By Alvaro Barbero
From data to numbers to knowledge: semantic embeddings By Alvaro Barbero
 
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
 
Intelligent Agent Perception
Intelligent Agent PerceptionIntelligent Agent Perception
Intelligent Agent Perception
 
Los MOOC ventajas y desventajas
Los MOOC ventajas y desventajasLos MOOC ventajas y desventajas
Los MOOC ventajas y desventajas
 

Ähnlich wie Automatic Key Term Extraction and Summarization from Spoken Course Lectures

Information extraction for Free Text
Information extraction for Free TextInformation extraction for Free Text
Information extraction for Free Textbutest
 
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...Grammarly
 
download
downloaddownload
downloadbutest
 
download
downloaddownload
downloadbutest
 
KiwiPyCon 2014 - NLP with Python tutorial
KiwiPyCon 2014 - NLP with Python tutorialKiwiPyCon 2014 - NLP with Python tutorial
KiwiPyCon 2014 - NLP with Python tutorialAlyona Medelyan
 
李宏毅/當語音處理遇上深度學習
李宏毅/當語音處理遇上深度學習李宏毅/當語音處理遇上深度學習
李宏毅/當語音處理遇上深度學習台灣資料科學年會
 
Lecture14 xing fei-fei
Lecture14 xing fei-feiLecture14 xing fei-fei
Lecture14 xing fei-feiTianlu Wang
 
NLP Deep Learning with Tensorflow
NLP Deep Learning with TensorflowNLP Deep Learning with Tensorflow
NLP Deep Learning with Tensorflowseungwoo kim
 
Spoken Content Retrieval - Lattices and Beyond
Spoken Content Retrieval - Lattices and BeyondSpoken Content Retrieval - Lattices and Beyond
Spoken Content Retrieval - Lattices and Beyondlinshanleearchive
 
A Simple Introduction to Neural Information Retrieval
A Simple Introduction to Neural Information RetrievalA Simple Introduction to Neural Information Retrieval
A Simple Introduction to Neural Information RetrievalBhaskar Mitra
 
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept AnalysisExtracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept AnalysisMathieu d'Aquin
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4DigiGurukul
 
A Featherweight Approach to FOOL
A Featherweight Approach to FOOLA Featherweight Approach to FOOL
A Featherweight Approach to FOOLgreenwop
 
SPARQL and Linked Data
SPARQL and Linked DataSPARQL and Linked Data
SPARQL and Linked DataFulvio Corno
 
MEBI 591C/598 – Data and Text Mining in Biomedical Informatics
MEBI 591C/598 – Data and Text Mining in Biomedical InformaticsMEBI 591C/598 – Data and Text Mining in Biomedical Informatics
MEBI 591C/598 – Data and Text Mining in Biomedical Informaticsbutest
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesOntotext
 

Ähnlich wie Automatic Key Term Extraction and Summarization from Spoken Course Lectures (20)

Information extraction for Free Text
Information extraction for Free TextInformation extraction for Free Text
Information extraction for Free Text
 
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
 
download
downloaddownload
download
 
download
downloaddownload
download
 
KiwiPyCon 2014 - NLP with Python tutorial
KiwiPyCon 2014 - NLP with Python tutorialKiwiPyCon 2014 - NLP with Python tutorial
KiwiPyCon 2014 - NLP with Python tutorial
 
李宏毅/當語音處理遇上深度學習
李宏毅/當語音處理遇上深度學習李宏毅/當語音處理遇上深度學習
李宏毅/當語音處理遇上深度學習
 
Lecture14 xing fei-fei
Lecture14 xing fei-feiLecture14 xing fei-fei
Lecture14 xing fei-fei
 
Lecture20 xing
Lecture20 xingLecture20 xing
Lecture20 xing
 
NLP Deep Learning with Tensorflow
NLP Deep Learning with TensorflowNLP Deep Learning with Tensorflow
NLP Deep Learning with Tensorflow
 
All good things
All good thingsAll good things
All good things
 
Spoken Content Retrieval - Lattices and Beyond
Spoken Content Retrieval - Lattices and BeyondSpoken Content Retrieval - Lattices and Beyond
Spoken Content Retrieval - Lattices and Beyond
 
Text Similarity
Text SimilarityText Similarity
Text Similarity
 
A Simple Introduction to Neural Information Retrieval
A Simple Introduction to Neural Information RetrievalA Simple Introduction to Neural Information Retrieval
A Simple Introduction to Neural Information Retrieval
 
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept AnalysisExtracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4
 
A Featherweight Approach to FOOL
A Featherweight Approach to FOOLA Featherweight Approach to FOOL
A Featherweight Approach to FOOL
 
SPARQL and Linked Data
SPARQL and Linked DataSPARQL and Linked Data
SPARQL and Linked Data
 
MEBI 591C/598 – Data and Text Mining in Biomedical Informatics
MEBI 591C/598 – Data and Text Mining in Biomedical InformaticsMEBI 591C/598 – Data and Text Mining in Biomedical Informatics
MEBI 591C/598 – Data and Text Mining in Biomedical Informatics
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
 
Some Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBASome Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBA
 

Mehr von Yun-Nung (Vivian) Chen

End-to-End Task-Completion Neural Dialogue Systems
End-to-End Task-Completion Neural Dialogue SystemsEnd-to-End Task-Completion Neural Dialogue Systems
End-to-End Task-Completion Neural Dialogue SystemsYun-Nung (Vivian) Chen
 
How the Context Matters Language and Interaction in Dialogues
How the Context Matters Language and Interaction in DialoguesHow the Context Matters Language and Interaction in Dialogues
How the Context Matters Language and Interaction in DialoguesYun-Nung (Vivian) Chen
 
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...Yun-Nung (Vivian) Chen
 
One Day for Bot 一天搞懂聊天機器人
One Day for Bot 一天搞懂聊天機器人One Day for Bot 一天搞懂聊天機器人
One Day for Bot 一天搞懂聊天機器人Yun-Nung (Vivian) Chen
 

Mehr von Yun-Nung (Vivian) Chen (6)

End-to-End Task-Completion Neural Dialogue Systems
End-to-End Task-Completion Neural Dialogue SystemsEnd-to-End Task-Completion Neural Dialogue Systems
End-to-End Task-Completion Neural Dialogue Systems
 
How the Context Matters Language and Interaction in Dialogues
How the Context Matters Language and Interaction in DialoguesHow the Context Matters Language and Interaction in Dialogues
How the Context Matters Language and Interaction in Dialogues
 
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
 
Chatbot的智慧與靈魂
Chatbot的智慧與靈魂Chatbot的智慧與靈魂
Chatbot的智慧與靈魂
 
Deep Learning for Dialogue Systems
Deep Learning for Dialogue SystemsDeep Learning for Dialogue Systems
Deep Learning for Dialogue Systems
 
One Day for Bot 一天搞懂聊天機器人
One Day for Bot 一天搞懂聊天機器人One Day for Bot 一天搞懂聊天機器人
One Day for Bot 一天搞懂聊天機器人
 

Kürzlich hochgeladen

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Kürzlich hochgeladen (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Automatic Key Term Extraction and Summarization from Spoken Course Lectures

  • 1. Speaker: Yun-Nung Chen 陳縕儂 Advisor: Prof. Lin-Shan Lee 李琳山 National Taiwan University Automatic Key Term Extraction and Summarization from Spoken Course Lectures 課程錄音之自動關鍵用語擷取及摘要
  • 2. Introduction 2 Master Defense, National Taiwan University Target: extract key terms and summaries from course lectures
  • 3. Key Term Summary O Indexing and retrieval O The relations between key terms and segments of documents 3 Introduction Master Defense, National Taiwan University O Efficiently understand the document Related to document understanding and semantics from the document Both are “Information Extraction”
  • 4. 4 Master Defense, National Taiwan University
  • 5. Definition O Key Term O Higher term frequency O Core content O Two types O Keyword O Ex. “語音” O Key phrase O Ex. “語言 模型” 5 Master Defense, National Taiwan University
  • 6. Automatic Key Term Extraction 6 ▼ Original spoken documents Archive of spoken documents Branching Entropy Feature Extraction Learning Methods 1) AdaBoost 2) Neural Network ASR speech signal ASR trans Master Defense, National Taiwan University
  • 7. Automatic Key Term Extraction 7 Archive of spoken documents Branching Entropy Feature Extraction Learning Methods 1) AdaBoost 2) Neural Network ASR speech signal ASR trans Master Defense, National Taiwan University
  • 8. Automatic Key Term Extraction 8 Archive of spoken documents Branching Entropy Feature Extraction Learning Methods 1) AdaBoost 2) Neural Network ASR speech signal ASR trans Master Defense, National Taiwan University
  • 9. Phrase Identification Automatic Key Term Extraction 9 Archive of spoken documents Branching Entropy Feature Extraction Learning Methods 1) AdaBoost 2) Neural Network ASR speech signal First using branching entropy to identify phrases ASR trans Master Defense, National Taiwan University
  • 10. Phrase Identification Key Term Extraction Automatic Key Term Extraction 10 Archive of spoken documents Branching Entropy Feature Extraction Learning Methods 1) AdaBoost 2) Neural Network ASR speech signal Key terms entropy acoustic model : Then using learning methods to extract key terms by some features ASR trans Master Defense, National Taiwan University
  • 11. Phrase Identification Key Term Extraction Automatic Key Term Extraction 11 Archive of spoken documents Branching Entropy Feature Extraction Learning Methods 1) AdaBoost 2) Neural Network ASR speech signal Key terms entropy acoustic model : ASR trans Master Defense, National Taiwan University
  • 12. Branching Entropy 12 O Inside the phrase hidden Markov model How to decide the boundary of a phrase? represent is can : : is of in : : Master Defense, National Taiwan University
  • 13. Branching Entropy 13 O Inside the phrase O Inside the phrase hidden Markov model How to decide the boundary of a phrase? represent is can : : is of in : : Master Defense, National Taiwan University
  • 14. Branching Entropy 14 hidden Markov model boundary Define branching entropy to decide possible boundary How to decide the boundary of a phrase? represent is can : : is of in : : O Inside the phrase O Inside the phrase O Boundary of the phrase Master Defense, National Taiwan University
  • 15. Branching Entropy 15 hidden Markov model O Definition of Right Branching Entropy O Probability of xi given X O Right branching entropy for X X xi How to decide the boundary of a phrase? represent is can : : is of in : : Master Defense, National Taiwan University
  • 16. Branching Entropy 16 hidden Markov model O Decision of Right Boundary O Find the right boundary located between X and xi where X boundary How to decide the boundary of a phrase? represent is can : : is of in : : Master Defense, National Taiwan University
  • 17. Branching Entropy 17 How to decide the boundary of a phrase? [figure: "hidden Markov model" and its neighboring words] Master Defense, National Taiwan University
  • 18. Branching Entropy 18 How to decide the boundary of a phrase? [figure: "hidden Markov model" and its neighboring words] Master Defense, National Taiwan University
  • 19. Branching Entropy 19 How to decide the boundary of a phrase? [figure: "hidden Markov model" with the boundary marked] Implemented using a PAT tree Master Defense, National Taiwan University
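The thesis implements the frequency counts with a PAT tree. As an illustration only, a minimal sketch using a plain hash map of n-gram counts (not the PAT tree actually used) could look like this:

import math
from collections import defaultdict

def right_branching_entropy(tokens, max_n=4):
    """For every n-gram X in the token stream, compute the entropy of
    the distribution of the word that follows X (right branching entropy)."""
    followers = defaultdict(lambda: defaultdict(int))  # X -> {next word: count}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n):
            x = tuple(tokens[i:i + n])
            followers[x][tokens[i + n]] += 1
    h_r = {}
    for x, nexts in followers.items():
        total = sum(nexts.values())
        h_r[x] = -sum((c / total) * math.log(c / total) for c in nexts.values())
    return h_r

# A right phrase boundary is hypothesized where the entropy rises, e.g.
# h_r[("hidden", "Markov", "model")] > h_r[("hidden", "Markov")].

The left branching entropy is computed symmetrically over the words preceding X (e.g., by running the same counts on the reversed token stream).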
  • 20. Phrase Identification Key Term Extraction Automatic Key Term Extraction 20 Archive of spoken documents Branching Entropy Feature Extraction Learning Methods 1) AdaBoost 2) Neural Network ASR speech signal Key terms entropy acoustic model : Extract prosodic, lexical, and semantic features for each candidate term ASR trans Master Defense, National Taiwan University
  • 21. Feature Extraction 21 O Prosodic features O For each candidate term appearing for the first time. Feature: Duration (I-IV): normalized duration (max, min, mean, range). A speaker tends to use longer duration to emphasize key terms; the duration of phone "a" is normalized by the average duration of phone "a", and four values summarize the duration of the term. Master Defense, National Taiwan University
  • 22. Feature Extraction 22 O Prosodic features O For each candidate term appearing for the first time. Higher pitch may represent significant information. Feature: Duration (I-IV): normalized duration (max, min, mean, range). Master Defense, National Taiwan University
  • 23. Feature Extraction 23 O Prosodic features O For each candidate term appearing for the first time. Higher pitch may represent significant information. Features: Duration (I-IV): normalized duration (max, min, mean, range); Pitch (I-IV): F0 (max, min, mean, range). Master Defense, National Taiwan University
  • 24. Feature Extraction 24 O Prosodic features O For each candidate term appearing for the first time. Higher energy emphasizes important information. Features: Duration (I-IV): normalized duration (max, min, mean, range); Pitch (I-IV): F0 (max, min, mean, range). Master Defense, National Taiwan University
  • 25. Feature Extraction 25 O Prosodic features O For each candidate term appearing for the first time. Higher energy emphasizes important information. Features: Duration (I-IV): normalized duration (max, min, mean, range); Pitch (I-IV): F0 (max, min, mean, range); Energy (I-IV): energy (max, min, mean, range). Master Defense, National Taiwan University
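Each prosodic feature set reduces to the same four statistics. A minimal sketch of this extraction, assuming per-phone durations with corpus-wide phone averages and per-frame F0/energy values (the function and variable names are illustrative, not from the thesis):

def four_stats(values):
    """The (max, min, mean, range) summary used for each prosodic feature set."""
    return [max(values), min(values),
            sum(values) / len(values), max(values) - min(values)]

def prosodic_features(phones, avg_phone_dur, f0, energy):
    """phones: list of (phone, duration) at the term's first appearance;
    avg_phone_dur: corpus-wide mean duration per phone;
    f0, energy: per-frame pitch and energy values within the term."""
    norm_dur = [d / avg_phone_dur[p] for p, d in phones]
    return {"Duration I-IV": four_stats(norm_dur),  # longer duration ~ emphasis
            "Pitch I-IV": four_stats(f0),           # higher pitch ~ significance
            "Energy I-IV": four_stats(energy)}      # higher energy ~ emphasis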
  • 26. Feature Extraction 26 O Lexical features: TF: term frequency; IDF: inverse document frequency; TFIDF: tf * idf; PoS: the PoS tag. Some well-known lexical features are used for each candidate term. Master Defense, National Taiwan University
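For reference, the TFIDF entry in the table is the standard product, here shown with the usual logarithmic IDF (the exact IDF variant used in the thesis is not shown on the slide):

    tfidf(t, d) = tf(t, d) \times \log( N / df(t) )

where N is the number of documents and df(t) the number of documents containing t.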
  • 27. Feature Extraction 27 O Semantic features O Probabilistic Latent Semantic Analysis (PLSA) O Latent Topic Probability. Key terms tend to focus on limited topics. [figure: PLSA graphical model connecting documents D_1 ... D_N to terms t_1 ... t_n through latent topics T_1 ... T_K, with parameters P(T_k|D_i) and P(t_j|T_k); D_i: documents, T_k: latent topics, t_j: terms] Master Defense, National Taiwan University
  • 28. Feature Extraction 28 O Semantic features O Probabilistic Latent Semantic Analysis (PLSA) O Latent Topic Probability. Key terms tend to focus on limited topics. Feature: LTP (I-III): Latent Topic Probability (mean, variance, standard deviation), describing the probability distribution. [figure: a flat topic distribution for a non-key term vs. a peaked one for a key term] Master Defense, National Taiwan University
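The LTP formulas are not preserved in the transcript. Assuming LTP (I-III) summarizes the latent topic distribution P(T_k|t_i) of a term over the K topics, one plausible reading is:

    LTP I (mean) = (1/K) \sum_k P(T_k | t_i)
    LTP II (variance) = (1/K) \sum_k ( P(T_k | t_i) - LTP I )^2
    LTP III (standard deviation) = \sqrt{ LTP II }

Since the distribution is normalized, the spread statistics carry most of the signal: a key term concentrated on a few topics has a peaked distribution and hence large variance, while a function word spread over many topics has small variance.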
  • 29. Feature Extraction 29 O Semantic features O Probabilistic Latent Semantic Analysis (PLSA) O Latent Topic Significance: within-topic to out-of-topic ratio (within-topic frequency vs. out-of-topic frequency). Key terms tend to focus on limited topics. Feature: LTP (I-III): Latent Topic Probability (mean, variance, standard deviation). [figure: non-key term vs. key term distributions] Master Defense, National Taiwan University
  • 30. Feature Extraction 30 O Semantic features O Probabilistic Latent Semantic Analysis (PLSA) O Latent Topic Significance: within-topic to out-of-topic ratio (within-topic frequency vs. out-of-topic frequency). Key terms tend to focus on limited topics. Features: LTP (I-III): Latent Topic Probability (mean, variance, standard deviation); LTS (I-III): Latent Topic Significance (mean, variance, standard deviation). [figure: non-key term vs. key term distributions] Master Defense, National Taiwan University
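The LTS equation is garbled in the transcript. Consistent with the speaker notes, where n(t_i, d_j) is the count of term t_i in document d_j and the measure is a within-topic to out-of-topic ratio, the usual form is:

    LTS(t_i, T_k) = \frac{ \sum_j n(t_i, d_j) P(T_k | d_j) }{ \sum_j n(t_i, d_j) (1 - P(T_k | d_j)) }

with the numerator the within-topic frequency and the denominator the out-of-topic frequency; LTS (I-III) then takes the mean, variance, and standard deviation of this quantity over topics.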
  • 31. Feature Extraction 31 O Semantic features O Probabilistic Latent Semantic Analysis (PLSA) O Latent Topic Entropy. Key terms tend to focus on limited topics. Features: LTP (I-III): Latent Topic Probability (mean, variance, standard deviation); LTS (I-III): Latent Topic Significance (mean, variance, standard deviation). [figure: non-key term vs. key term distributions] Master Defense, National Taiwan University
  • 32. Feature Extraction 32 O Semantic features O Probabilistic Latent Semantic Analysis (PLSA) O Latent Topic Entropy. Key terms tend to focus on limited topics: a non-key term has higher LTE, a key term lower LTE. Features: LTP (I-III): Latent Topic Probability (mean, variance, standard deviation); LTS (I-III): Latent Topic Significance (mean, variance, standard deviation); LTE: term entropy over latent topics. Master Defense, National Taiwan University
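LTE itself can be reconstructed with confidence, since the slide defines it as the term's entropy over latent topics:

    LTE(t_i) = - \sum_{k=1}^{K} P(T_k | t_i) \log P(T_k | t_i)

A term spread evenly across topics has high entropy (likely a non-key term); a term concentrated on a few topics has low entropy (likely a key term).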
  • 33. Phrase Identification Key Term Extraction Automatic Key Term Extraction 33 Archive of spoken documents Branching Entropy Feature Extraction Learning Methods 1) AdaBoost 2) Neural Network ASR speech signal ASR trans Key terms entropy acoustic model : Using supervised approaches to extract key terms Master Defense, National Taiwan University
  • 34. Learning Methods 34 OAdaptive Boosting (AdaBoost) ONeural Network Automatically adjust the weights of features to train a classifier Master Defense, National Taiwan University
  • 35. Automatic Key Term Extraction 35 Master Defense, National Taiwan University
  • 36. Experiments 36 O Corpus O NTU lecture corpus O Mandarin Chinese embedded with English words O Single speaker O 45.2 hours O ASR System O Bilingual AM with model adaptation [1] O LM with adaptation using random forests [2] Character accuracy (%): Mandarin 78.15, English 53.44, Overall 76.26. Master Defense, National Taiwan University [1] Ching-Feng Yeh, "Bilingual Code-Mixed Acoustic Modeling by Unit Mapping and Model Recovery," Master Thesis, 2011. [2] Chao-Yu Huang, "Language Model Adaptation for Mandarin-English Code-Mixed Lectures Using Word Classes and Random Forests," Master Thesis, 2011.
  • 37. Experiments 37 O Reference Key Terms O Annotations from 61 students who had taken the course O If an annotator labeled 150 key terms, each of them received a score of 1/150, and all other terms a score of 0 O Terms are ranked by the sum of the scores given by all annotators O The top N terms are chosen from the list O N is the average number of key terms per annotator O N = 154 key terms O 59 key phrases and 95 keywords O Evaluation O 3-fold cross validation Master Defense, National Taiwan University
  • 38. Experiments 38 O Feature Effectiveness O Neural network for keywords from ASR transcriptions. F-measure by feature set (Pr: Prosodic, Lx: Lexical, Sm: Semantic): Pr 20.78, Lx 42.86, Sm 35.63, Pr+Lx 48.15, Pr+Lx+Sm 56.55. Each set of features alone gives F1 from 20% to 42%; prosodic and lexical features are additive; all three sets of features are useful. Master Defense, National Taiwan University
  • 39. Experiments 39 O Overall Performance (Keywords & Key Phrases) O F-measure on ASR transcriptions: N-Gram + TFIDF (baseline) 23.44, Branching Entropy + TFIDF 52.60, Branching Entropy + AdaBoost 57.68, Branching Entropy + Neural Network 62.70. Branching entropy performs well. Master Defense, National Taiwan University
  • 40. Experiments 40 O Overall Performance (Keywords & Key Phrases) O F-measure on ASR / Manual transcriptions: N-Gram + TFIDF (baseline) 23.44 / 32.19, Branching Entropy + TFIDF 52.60 / 55.84, Branching Entropy + AdaBoost 57.68 / 62.39, Branching Entropy + Neural Network 62.70 / 67.31. The performance on manual transcriptions is slightly better than on ASR; supervised learning using a neural network gives the best results. Master Defense, National Taiwan University
  • 41. 41 Master Defense, National Taiwan University
  • 42. Introduction 42 O Extractive Summary O Important sentences in the document O Computing Importance of Sentences O Statistical Measure, Linguistic Measure, Confidence Score, N-Gram Score, Grammatical Structure Score O Ranking Sentences by Importance and Deciding the Ratio of the Summary. This work proposes a better statistical measure of a term. Master Defense, National Taiwan University
  • 43. Statistical Measure of a Term 43 O LTE-Based Statistical Measure (Baseline) O Key-Term-Based Statistical Measure O Considering only key terms (t_i ∈ key) O Weighted by the LTS of the term. Key terms can represent the core content of the document; the latent topic probability can be estimated more accurately. [equations on slide not preserved] Master Defense, National Taiwan University
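Only the stated properties of the two measures survive in the transcript. A hedged sketch: the LTE-based baseline typically scores a term in document d inversely to its entropy, e.g.

    s_LTE(t_i, d) = n(t_i, d) / LTE(t_i)

while the key-term-based measure restricts the sum to t_i ∈ key and weights each key term by its latent topic significance LTS(t_i, T_k). The exact weighting is not recoverable from the slide; what the slide does assert is that only key terms contribute and that LTS supplies the weights.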
  • 44. Importance of the Sentence 44 O Original Importance O LTE-based statistical measure O Key-term-based statistical measure O New Importance O Considering both the original importance and the similarity to other sentences: sentences similar to more sentences should get higher importance. Master Defense, National Taiwan University
  • 45. Random Walk on a Graph 45 O Idea O Sentences similar to more important sentences should be more important O Graph Construction O Node: sentence in the document O Edge: weighted by the similarity between nodes O Node Score O Interpolating two scores: the normalized original score of sentence S_i, and the scores propagated from neighbors according to edge weight p(j, i) (updating the score of S_i in the k-th iteration). Nodes connecting to more nodes with higher scores should get higher scores. Master Defense, National Taiwan University
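The iteration formula on the slide is not preserved; from the two bullets it interpolates, the standard random-walk update would be:

    v^{(k+1)}(i) = (1 - \alpha) r(i) + \alpha \sum_j p(j, i) v^{(k)}(j)

where v^{(k)}(i) is the score of S_i in the k-th iteration, r(i) the normalized original importance of S_i, p(j, i) the normalized edge weight from S_j to S_i, and \alpha an interpolation weight whose value the slide does not give.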
  • 46. Random Walk on a Graph 46 O Topical Similarity between Sentences O Edge weight sim(S_i, S_j): (sentence i to sentence j) O Latent topic probability of the sentence O Using Latent Topic Significance [figure: terms of sentences S_i and S_j linked through latent topics T_{k-1}, T_k, T_{k+1}, with LTS weighting] Master Defense, National Taiwan University
  • 47. Random Walk on a Graph 47 O Scores of Sentences O Converged equation O Matrix form O Solution: the dominant eigenvector of P' O Integrated with the Original Importance Master Defense, National Taiwan University
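At convergence the score vector is the dominant eigenvector of P'. This can be found by power iteration; a minimal sketch, assuming a nonnegative similarity matrix W with positive row sums, and treating alpha and the tolerance as illustrative choices rather than thesis values:

import numpy as np

def random_walk_scores(W, r, alpha=0.9, tol=1e-8, max_iter=1000):
    """W[j, i]: similarity from sentence S_j to S_i; r: original importance scores."""
    P = W / W.sum(axis=1, keepdims=True)  # row-normalize: p(j, i)
    r = r / r.sum()                       # normalized original importance
    v = np.full(len(r), 1.0 / len(r))     # uniform initial scores
    for _ in range(max_iter):
        # interpolate original importance with scores propagated along edges
        v_new = (1 - alpha) * r + alpha * (P.T @ v)
        if np.abs(v_new - v).sum() < tol:
            break
        v = v_new
    return v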
  • 48. Automatic Summarization 48 Master Defense, National Taiwan University
  • 49. Experiments 49 O Same Corpus and ASR System O NTU lecture corpus O Reference Summaries O Two human-produced reference summaries for each document O Sentences ranked from "the most important" to "of average importance" O Evaluation Metric O ROUGE-1, ROUGE-2, ROUGE-3 O ROUGE-L: Longest Common Subsequence (LCS) Master Defense, National Taiwan University
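For reference, ROUGE-N (Lin, 2004) is the n-gram recall of the candidate summary against the reference summaries:

    ROUGE\text{-}N = \frac{ \sum_{S \in refs} \sum_{g_n \in S} Count_{match}(g_n) }{ \sum_{S \in refs} \sum_{g_n \in S} Count(g_n) }

and ROUGE-L replaces n-gram matching with the length of the longest common subsequence between candidate and reference.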
  • 50. Evaluation 50 Master Defense, National Taiwan University [plots: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L at summary ratios of 10%, 20%, and 30% on ASR transcriptions, comparing LTE against Key] The key-term-based statistical measure is helpful.
  • 51. Evaluation 51 Master Defense, National Taiwan University [plots: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L at 10%, 20%, and 30% summary ratios on ASR transcriptions, comparing LTE vs. LTE + RW and Key vs. Key + RW] Random walk can help the LTE-based statistical measure; random walk can also help the key-term-based statistical measure; topical similarity can compensate for recognition errors.
  • 52. Evaluation 52 Master Defense, National Taiwan University [plots: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-L at 10%, 20%, and 30% summary ratios on both ASR and Manual transcriptions, comparing LTE, LTE + RW, Key, and Key + RW] The key-term-based statistical measure and random walk using topical similarity are useful for summarization.
  • 53. 53 Master Defense, National Taiwan University
  • 54. Conclusions 54 Automatic Key Term Extraction • The performance can be improved by ▫ Identifying phrases by branching entropy ▫ Using prosodic, lexical, and semantic features together Automatic Summarization • The performance can be improved by ▫ The key-term-based statistical measure ▫ Random walk with topical similarity: compensating for recognition errors, giving higher scores to sentences topically similar to more important sentences, and considering all sentences in the document Master Defense, National Taiwan University
  • 55. Published Papers: [1] Yun-Nung Chen, Yu Huang, Sheng-Yi Kong, and Lin-Shan Lee, “Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features,” in Proceedings of SLT, 2010. [2] Yun-Nung Chen, Yu Huang, Ching-Feng Yeh, and Lin-Shan Lee, “Spoken Lecture Summarization by Random Walk over a Graph Constructed with Automatically Extracted Key Terms,” in Proceedings of InterSpeech, 2011. 55 Master Defense, National Taiwan University

Editor's notes

  1. Hello, everybody. I am Vivian Chen, coming from National Taiwan University. Today I’m going to present my work about automatic key term extraction from spoken course lectures using branching entropy and prosodic/semantic features.
  2. We can show some key terms related to "acoustic model". If a key term and "acoustic model" co-occur in the same document, they are related, so we can show them to users.
  3. First I will define what a key term is. A key term is a term that has higher term frequency and carries core content. There are two types of key terms. One of them is a phrase, which we call a key phrase. For example, "language model" is a key phrase. The other type is a single word, which we call a keyword, like "entropy". There are two advantages of key term extraction: key terms help with indexing and retrieval, and we can construct the relationships between key terms and segments of documents. Here's an example.
  5. Here’s the flow chart. Now there are a lot of spoken documents.
  6. First, with a speech recognition system, we can get ASR transcriptions.
  7. The input of our work includes transcriptions and speech signals.
  8. The first part is to identify phrases. We propose branching entropy to accomplish this.
  9. Then we can enter key term extraction part. We first extract some features to represent a candidate term, and use machine learning methods to decide the key terms.
  10. First we explain how to do phrase identification.
  11. The target of this work is to decide the boundary of a phrase, but where is the boundary? We can observe some characteristics first: "hidden" is almost always followed by "Markov".
  12. Similarly, "hidden Markov" is almost always followed by "model".
  13. But at this position, we find that "hidden Markov model" is followed by several different words, so this position is more likely to be a boundary. We then define branching entropy based on this concept.
  14. First, I will define the right branching entropy. We assume that X is "hidden Markov" and x_i is a child of X, i.e., a word following X. We define p(x_i) to be the frequency of X x_i over the frequency of X, which is the probability of the child x_i given X. Then we define the right branching entropy by this equation: H_r(X), the entropy of X's children.
  15. The idea is that the branching entropy at a boundary is higher, so we can find the right boundary where the branching entropy is high, as after "hidden Markov model".
  16. Similarly, left branching entropy has the same attribute.
  17. The approach, called branching entropy, is used to find the boundaries of key phrases. The idea is that the word entropy at a boundary is higher. There are four parts in this approach, and we present each part in detail.
  19. With all candidate terms, including words and phrases, we can extract prosodic, lexical, and semantic features for each term.
  20. For each word, we also compute some prosodic features. First, we believe that a lecturer uses longer duration to emphasize key terms. For the first appearance of a candidate term, we compute the duration of each phone in the term. Then we normalize the duration of each phone by the average duration of that phone. For this feature, we use only four values to represent the term: the maximum, minimum, mean, and range over all phones in the term.
  21. We believe that higher pitch may represent important information. So we can extract the pitch contour like this.
  22. The method is like that for duration, but the segment unit is changed to a single frame. We also use the same four values to represent the features.
  23. Similarly, we think that higher energy may represent important information. We also can extract energy for each frame in a candidate term.
  24. The features are computed like those for pitch. This completes the first set of features.
  25. The second set of features is lexical features. These well-known features may indicate the importance of a term. We use them directly to represent each term.
  26. Third set of features is semantic features. The assumption is that key terms tend to focus on limited topics. We use PLSA to compute some semantic features. We can get the probability of each topic given a candidate term.
  27. Here are the distributions of two terms. Notice that the first term has similar probability for most topics; it may be a function word, so this candidate term may be a non-key term. In the other one, the term focuses on fewer topics, so it is more likely to be a key term. In order to describe the distribution, we compute its mean, variance, and standard deviation to be the features of the term.
  28. Similarly, we can compute the latent topic significance, which is the within-topic to out-of-topic ratio given by this equation. This is the significance of topic T_k for term t_i. This one is the within-topic frequency, where n(t_i, d_j) is the count of t_i in document d_j, and this one is the out-of-topic frequency.
  29. Because the significance of a key term focuses on limited topics, we also compute these three values to represent the distribution.
  30. Then we can also compute the latent topic entropy, since this feature can also describe the difference between these two distributions. A non-key term tends to have higher LTE, and a key term a lower one.
  31. Finally, we use some machine learning methods to extract key terms.
  32. We use two supervised learning methods, adaptive boosting and neural networks, to automatically adjust the weights of the features and produce a classifier.
  33. Then we do some experiments to evaluate our approach. The corpus is the NTU lecture corpus, which includes Mandarin Chinese and some English words, as in this example. This sentence is ~~, which means "our solution is the Viterbi algorithm". The lectures are from a single speaker, and the total corpus is about 45 hours.
  34. To evaluate our results, we need to generate a reference key term list. The reference key terms come from annotations by students who have taken the course. We sort all terms and take the top N as key terms. N is the average number of key terms extracted by the students, and N is 154. The reference key term list includes 59 key phrases and 95 keywords.
  35. In this experiment, we examine feature effectiveness. The results include only keywords, and they are from the neural network using ASR transcriptions. Rows a, b, and c give F1 measures from 20% to 42%. Row d shows that prosodic and lexical features are additive. Row e shows that adding semantic features further improves the performance, so all three sets of features are useful.
  36. Finally, we show the overall performance for manual and ASR transcriptions. The baseline is conventional TFIDF without extracting phrases using branching entropy. From the better performance, we can see that branching entropy is very useful; this supports our assumption that terms bounded where branching entropy is high are more likely to be key terms. The best results come from supervised learning using a neural network, achieving F1 measures of 67 and 62.
  38. From the above experiments, we conclude that the proposed approach extracts key terms effectively. The performance is improved by two ideas: using branching entropy to extract key phrases, and using the three sets of features together.