Towards Machine Comprehension of Spoken Content
1. TAIPEI | SEP. 21-22, 2016
李宏毅 Hung-yi Lee
TOWARDS MACHINE COMPREHENSION
OF SPOKEN CONTENT
2. MULTIMEDIA INTERNET CONTENT
300 hours of multimedia are uploaded per minute (2015.01)
1874 courses on Coursera (2016.04)
• Nobody is able to go through all of this data.
• We need machines to listen to the audio data, understand it, and extract useful information for humans.
• In multimedia, the spoken part carries very important information about the content.
• This talk gives an overview of the technology developed at NTU Speech Lab.
7. SPEECH SUMMARIZATION
Extractive Summaries
Select the most informative segments to form a compact version
[Figure: Retrieved Audio File (1 hour long) → Summary (10 minutes)]
Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015/Structured%20Lecture/Summarization%20Hidden_2.ecm.mp4/index.html
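The idea of selecting the most informative segments under a length limit can be sketched as a greedy selection. This is only an illustration, not the method used at NTU Speech Lab: the `Segment` tuple, the scores, and the time budget are all made-up assumptions, and a real system would learn the informativeness scores from data.

```python
from typing import List, Tuple

# Each segment is (start_sec, duration_sec, score). The scores are hypothetical
# stand-ins for whatever informativeness measure a real system would learn.
Segment = Tuple[float, float, float]


def extractive_summary(segments: List[Segment], budget_sec: float) -> List[Segment]:
    """Greedily keep the highest-scoring segments that fit in the time budget."""
    chosen: List[Segment] = []
    used = 0.0
    # Rank by score per second so that long segments do not dominate the budget.
    for seg in sorted(segments, key=lambda s: s[2] / s[1], reverse=True):
        if used + seg[1] <= budget_sec:
            chosen.append(seg)
            used += seg[1]
    # Return the selected segments in their original playback order.
    return sorted(chosen, key=lambda s: s[0])


# Toy input: five segments of a recording, with invented scores.
segments = [(0, 60, 0.2), (60, 120, 0.9), (180, 60, 0.7), (240, 300, 0.4), (540, 60, 0.1)]
summary = extractive_summary(segments, budget_sec=240)
```

Greedy selection by score density is a simple heuristic; it does not guarantee the best possible total score for a given budget.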
12. Spoken Content Retrieval
• Transcribe spoken content into text by speech recognition
• Use text retrieval approach to search the transcriptions
[Figure: Spoken Content → Speech Recognition (Models) → Text → Text Retrieval (Query from the learner) → Retrieval Result; labeled "Black Box"]
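The two steps above (recognize, then search the transcriptions) can be sketched as a small cascade. Everything here is an assumption for illustration: `recognize` is a hypothetical stand-in that returns canned transcripts instead of running a recognizer, and the ranking is a plain TF-IDF overlap rather than any particular retrieval engine.

```python
import math
from collections import Counter
from typing import Dict, List


def recognize(audio_id: str) -> str:
    """Hypothetical stand-in for a speech recognizer: returns a canned transcript."""
    fake_transcripts = {
        "lec1": "deep learning for speech recognition",
        "lec2": "text retrieval with inverted index",
        "lec3": "speech summarization selects informative segments",
    }
    return fake_transcripts[audio_id]


def search(query: str, transcripts: Dict[str, str]) -> List[str]:
    """Rank transcripts by a simple TF-IDF overlap with the query terms."""
    docs = {doc_id: Counter(text.split()) for doc_id, text in transcripts.items()}
    n_docs = len(docs)

    def idf(term: str) -> float:
        # Smoothed inverse document frequency.
        df = sum(1 for tf in docs.values() if term in tf)
        return math.log((n_docs + 1) / (df + 1)) + 1.0

    scores = {
        doc_id: sum(tf[term] * idf(term) for term in query.split())
        for doc_id, tf in docs.items()
    }
    ranked = sorted(scores, key=lambda d: scores[d], reverse=True)
    # Drop documents that share no term with the query.
    return [d for d in ranked if scores[d] > 0]


# Step 1: transcribe every audio file. Step 2: search the transcriptions.
transcripts = {audio_id: recognize(audio_id) for audio_id in ["lec1", "lec2", "lec3"]}
results = search("speech recognition", transcripts)
```

In this cascade the retrieval step only sees the recognizer's text output, so recognition errors pass straight through to the search results.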
16. Challenges
• Given the information entered by the users, which action should be taken?
“Give me an example.”
“Is it relevant to XXX?”
“More precisely, please.”
“Show the results.”
The retrieval system learns to take the most effective actions from historical interaction experiences.
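As a rough illustration of learning actions from past interactions, the sketch below averages the reward each action received in each dialogue state and then picks the best-scoring action per state. The state names, action names, rewards, and log entries are all hypothetical, and this simple averaging is a swapped-in simplification, not the learning method used by the actual system.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical action names, loosely matching the four system responses above.
ACTIONS = ["ask_for_example", "ask_if_relevant", "ask_for_more_detail", "show_results"]

# Hypothetical interaction history: (state, action, reward) triples.
log: List[Tuple[str, str, float]] = [
    ("low_confidence", "show_results", -1.0),
    ("low_confidence", "ask_for_more_detail", 0.5),
    ("low_confidence", "ask_for_more_detail", 1.0),
    ("low_confidence", "ask_if_relevant", 0.2),
    ("high_confidence", "show_results", 1.0),
    ("high_confidence", "ask_for_more_detail", -0.5),
]


def learn_policy(history: List[Tuple[str, str, float]]) -> Dict[str, str]:
    """For each state, pick the action with the highest average logged reward."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for state, action, reward in history:
        totals[(state, action)] += reward
        counts[(state, action)] += 1

    policy: Dict[str, str] = {}
    for state in {s for s, _, _ in history}:
        # Only consider actions that were actually tried in this state.
        tried = [a for a in ACTIONS if counts[(state, a)] > 0]
        policy[state] = max(tried, key=lambda a: totals[(state, a)] / counts[(state, a)])
    return policy


policy = learn_policy(log)
```

With this toy log, the learned policy asks for more detail when confidence is low and shows the results when confidence is high.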
22. More is less ...
• Given all the related lectures from different courses
Learner: “Which lecture should I go to first?”
Learning Map
• Nodes: lectures on the same topic
• Edges: suggested learning order
[Shen & Lee, Interspeech 15]
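A learning map of this kind is a directed graph, so one way to read a study order off it is a topological sort. The sketch below assumes the map has no cycles and uses invented lecture names; it says nothing about how the map itself is constructed in [Shen & Lee, Interspeech 15].

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical learning map: each node maps to the nodes suggested before it.
prerequisites = {
    "Linear Models": set(),
    "Neural Networks": {"Linear Models"},
    "Backpropagation": {"Neural Networks"},
    "Recurrent Networks": {"Backpropagation"},
}

# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(prerequisites).static_order())
first_lecture = order[0]
```

Here `first_lecture` answers the learner's question for this toy map. When several nodes have no prerequisites, more than one valid order exists.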
26. Spoken Question Answering
• TOEFL Listening Comprehension Test by Machine
Audio Story: (The original story is 5 min long.)
Question: “What is a possible origin of Venus’ clouds?”
Choices:
(A) gases released as a result of volcanic activity
(B) chemical reactions caused by high surface temperatures
(C) bursts of radio energy from the planet's surface
(D) strong winds that blow dust into the atmosphere
[Tseng & Lee, Interspeech 16]
29. Model Architecture
Question: “What is a possible origin of Venus' clouds?”
Question → Semantic Analysis → Question Semantics
Audio Story → Speech Recognition → transcript:
“…… It be quite possible that this be due to volcanic eruption because volcanic eruption often emit gas. If that be the case volcanism could very well be the root cause of Venus 's thick cloud cover. And also we have observe burst of radio energy from the planet 's surface. These burst be similar to what we see when volcano ……”
Transcript → Semantic Analysis → Attention (shown twice in the diagram) → Answer
Select the choice most similar to the answer.
The model is learned end-to-end.
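The attention and choice-selection steps can be illustrated with toy vectors. This is a minimal sketch under strong assumptions: the 3-dimensional vectors are invented, dot-product attention and cosine similarity are swapped in as simple stand-ins, and nothing is trained, whereas the model on the slide is a neural network learned end-to-end.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(question: Vector, sentences: List[Vector]) -> Vector:
    """Weight each story sentence by its match to the question, then average."""
    weights = softmax([dot(question, s) for s in sentences])
    return [sum(w * s[i] for w, s in zip(weights, sentences)) for i in range(len(question))]


def pick_choice(answer: Vector, choices: List[Vector]) -> int:
    """Return the index of the choice most similar to the answer vector."""
    sims = [cosine(answer, c) for c in choices]
    return sims.index(max(sims))


# Hypothetical semantic vectors for the question, story sentences, and choices.
question = [1.0, 0.0, 0.0]
story = [[2.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
choices = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-1.0, 0.0, 0.0]]

answer = attend(question, story)
best = pick_choice(answer, choices)
```

In this toy setup the first story sentence matches the question most strongly, so the answer vector leans toward it and the first choice is selected.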
32. CHALLENGES IN SPEECH RECOGNITION?
Lots of audio files in different languages on the Internet
Most languages have little annotated data for training speech recognition systems.
Some audio files are produced in several different languages
Some languages do not even have a written form
Out-of-vocabulary (OOV) problem
45. If you want to “deeply learn deep learning”
My Course: Machine learning and having it deep and structured
http://speech.ee.ntu.edu.tw/~tlkagk/courses_MLSD15_2.html
6 hour version: http://www.slideshare.net/tw_dsconf/ss-62245351
“Neural Networks and Deep Learning”
written by Michael Nielsen
http://neuralnetworksanddeeplearning.com/
“Deep Learning”
Written by Yoshua Bengio, Ian J. Goodfellow and Aaron Courville
http://www.deeplearningbook.org