Asr

Automatic Speech Recognition

OUTLINE

Multilayer Structure of Speech Production:
- Pragmatic Layer
- Semantic Layer
- Syntactic Layer
- Prosodic/Phonetic Layer
- Acoustic Layer
Example utterance: [I] [would] [like] [to] [book] [a] [flight] [from] [Rome] [to] [London] [tomorrow] [morning]
Example phonetic transcription: [book] -> [b/uh/k]

What is Speech Recognition?

Capabilities of ASR:

Uses and Applications

A Timeline & History of Voice Recognition Software
- 1936: AT&T's Bell Labs produced the first electronic speech synthesizer, called the Voder.
- Early 1970s: The HMM approach to speech & voice recognition was invented by Lenny Baum of Princeton University.
- 1971: DARPA established the Speech Understanding Research (SUR) program, funded with $3 million per year of government money for 5 years. It was the largest speech recognition project ever.
- 1982: Dragon Systems was founded.
- 1984: SpeechWorks, the leading provider of over-the-telephone automated speech recognition (ASR) solutions, was founded.
- 1995: Dragon released discrete word dictation-level speech recognition software. It was the first time dictation speech & voice recognition technology was available to consumers.

Timeline (continued)
- 1997: Dragon introduced "Naturally Speaking", the first "continuous speech" dictation software available.
- 1998: Microsoft invested $45 million to be able to use speech & voice recognition technology in its systems.
- 2000: Lernout & Hauspie acquired Dragon Systems for approximately $460 million.
- 2003: ScanSoft shipped Dragon NaturallySpeaking 7 Medical, promoted as lowering healthcare costs through highly accurate speech recognition. ScanSoft, Inc. is presently the world leader in speech recognition technology in the commercial market.

The Structure of ASR System:
[Figure: Functional scheme of an ASR system. Blocks: Speech samples, Database, Signal Interface, Feature Extraction, Recognition, Databases, Training HMM. Signal labels: X, Y, S, W*.]

Speech Database:

Transcription of speech:
[Figure: Segmentation and labeling example]

Many databases are distributed by the Linguistic Data Consortium www.ldc.upenn.edu

Speech Signal Analysis
Feature Extraction for ASR:
- The aim is to extract the voice features that distinguish the different phonemes of a language.

MFCC extraction:
Speech signal x(n) -> Pre-emphasis -> x'(n) -> WINDOW -> x_t(n) -> DFT -> X_t(k) -> Mel filter banks -> Y_t(m) -> Log(| |^2) -> IDFT -> MFCC y_t^(m)(k)
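The block chain on this slide can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the frame length, frame step, FFT size, filter count, pre-emphasis coefficient, and the mel formula are common textbook defaults that do not come from the deck. The power spectrum is taken before the mel filters (a common variant of the slide's ordering), and the final "IDFT" step is realized as a DCT-II, which is the usual practical choice. The function names `mel_filterbank` and `mfcc` are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (assumed design)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):        # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, frame_step=160,
         n_fft=512, n_filters=26, n_ceps=13, pre_emph=0.97):
    """Return an (n_frames, n_ceps) array; signal must hold >= frame_len samples."""
    # Pre-emphasis: x'(n) = x(n) - a * x(n-1)
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # Framing and Hamming window: x_t(n)
    n_frames = 1 + (len(emphasized) - frame_len) // frame_step
    idx = (np.arange(frame_len)[None, :]
           + frame_step * np.arange(n_frames)[:, None])
    frames = emphasized[idx] * np.hamming(frame_len)
    # DFT and power spectrum |X_t(k)|^2 (frames are zero-padded to n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filter bank energies Y_t(m), then log (floored to avoid log(0))
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # DCT-II over the filter index gives the cepstral coefficients
    m = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * m + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T
```

Calling `mfcc(x)` on a one-dimensional array of samples returns one row of coefficients per frame; with the assumed defaults a frame covers 25 ms and frames start every 10 ms at a 16 kHz sampling rate.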

Spectral Analysis:

Speech waveform of a phoneme “e”
[Figure panels: waveform after pre-emphasis and Hamming windowing; power spectrum; MFCC]

Training and Recognition:

Deterministic vs. Stochastic framework:

Implementing HMM for Speech Modeling: Training and Recognition
[Figure: Speech Samples, Feature Extraction, Training HMM, Recognition. Signal labels: S, Y, W*.]

Implementation of HMM:
Word-level transition probabilities:
- P(w_t = yes | w_{t-1} = sil) = 0.2
- P(w_t = no | w_{t-1} = sil) = 0.2
- P(w_t = sil | w_{t-1} = yes) = 1
- P(w_t = sil | w_{t-1} = no) = 1
State-level quantities: transition probability P(s_t | s_{t-1}) and observation likelihood P(Y | s_t = s(9)).
[Figure: HMM state network. s(0): Silence (Start); states s(1) to s(12) model the phonemes ‘YE’ and ‘S’ of w = YES and the phonemes ‘N’ and ‘O’ of w = NO. A value of 0.6 is marked in the figure.]
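As a small illustration, the word-level probabilities on this slide can be collected into a transition matrix. This is only a sketch: the slide gives the four transitions between silence, YES, and NO, while the 0.6 self-loop on silence is an assumption here, chosen so that the row sums to one (it happens to match the 0.6 that appears in the figure, but the deck does not state what that value labels). The names `WORDS`, `A`, and `sequence_probability` are hypothetical.

```python
import numpy as np

WORDS = ["sil", "yes", "no"]

# Rows: previous word w_{t-1}; columns: current word w_t.
A = np.array([
    # sil  yes  no
    [0.6, 0.2, 0.2],  # from sil: 0.2 / 0.2 from the slide; 0.6 is ASSUMED
    [1.0, 0.0, 0.0],  # from yes: always back to silence (slide)
    [1.0, 0.0, 0.0],  # from no:  always back to silence (slide)
])

def sequence_probability(words, start="sil"):
    """Probability of a word sequence under this first-order word model."""
    prob = 1.0
    prev = WORDS.index(start)
    for w in words:
        cur = WORDS.index(w)
        prob *= A[prev, cur]
        prev = cur
    return prob
```

For example, `sequence_probability(["yes", "sil", "no"])` multiplies 0.2, 1, and 0.2, while a sequence that moves from "yes" directly to "no" gets probability zero because the model forces a return to silence between words.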

The Search Algorithm:
[Figure: search trellis over Time=1, Time=2, Time=3, expanding from s(0) to the states s(0), s(1), s(2), s(7), s(8). Values shown on the trellis: 0.1, 0.4, 0.1, 0.025, 0.021, 0.051, 0.041, 0.045, 0.036, 0.032]
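The slide's bullet text is lost, but a trellis that expands state hypotheses frame by frame is how dynamic-programming search over an HMM is usually drawn. Assuming the algorithm meant here is the Viterbi search (an assumption, since the deck does not name it in the surviving text), a minimal log-domain sketch looks like this. The function name `viterbi` and its argument layout are hypothetical.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most likely state path through an HMM.

    log_pi: (N,)   log initial state probabilities
    log_A:  (N, N) log transition probabilities, entry [i, j] for i -> j
    log_B:  (T, N) log likelihood of the frame at time t in state j
    Returns (path, log_score).
    """
    T, N = log_B.shape
    delta = np.full((T, N), -np.inf)   # best log score ending in each state
    psi = np.zeros((T, N), dtype=int)  # back-pointers to the best predecessor
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A      # scores[i, j]
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(N)] + log_B[t]
    # Backtrack from the best final state
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path.reverse()
    return path, float(np.max(delta[-1]))
```

A toy call with two states and three frames, using made-up numbers that are not taken from the slide's trellis:

```python
pi = np.log([0.6, 0.4])
A = np.log([[0.7, 0.3], [0.4, 0.6]])
B = np.log([[0.5, 0.1], [0.4, 0.3], [0.1, 0.6]])
path, score = viterbi(pi, A, B)
```

Working in the log domain replaces products of small probabilities with sums, which avoids numerical underflow on long utterances.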

Conclusions:
