INSTANT SPEECH TRANSLATION
By SATHIYASEELAN M
10BM60080
I Year M.B.A
VGSOM, IIT Kharagpur
Index
1. Abstract
2. Instant Speech Translation – Eliminating Language Barriers
3. System Requirements
3.1. Speech Recognition
3.2. Language Parsing
3.3. Translation
4. Applications and their Business Potential
4.1. Mobile Applications and Services
4.2. Voice Interface Devices with Local Language support
4.3. Data Entry Applications – in Multiple Languages
4.4. e-Learning
4.5. Business Applications
5. Key Players
6. Challenges Ahead
7. Conclusion
8. References
1. Abstract
With the current pace of globalization, every industry needs to look beyond geographical
borders. Indian IT firms, for example, provide services to Japanese and Korean clients, and they
invest heavily in foreign-language training programs. An application that provides instant
translation would not only cut these costs but also help gather requirements more precisely and
in less time. Instant speech translation (IST) has wide applications in other industries as well.
In a country like India, where numerous vernacular languages are in use, IST can be used in
many ways in day-to-day life. There is huge potential for IST applications on mobile phones.
Major players such as Google, Microsoft, and IBM have already come up with prototypes for
this kind of application; Google Translator is one such early example. Many more such
applications will be in our gadgets soon. This paper elaborates on a few such applications and
their business potential.
2. Instant Speech Translation – Eliminating Language Barriers
Internet and mobile services have reached even remote villages, and rural markets are now
considered significant in countries like China and India. Breaking language barriers will further
open up these markets for international business. Knowledge, wherever and in whatever form it
exists, should be used for the growth of humanity. We should create opportunities for those who
want to learn and share knowledge in their own native languages; instant speech translation will
create a platform for them. This could bring to light many things that are not yet known to the world.
In “The Hitchhiker’s Guide to the Galaxy”, the Babel fish is a fictitious animal that performs
instant translation when placed in the ear. Suppose such an application were available on a
mobile phone: when I call a person in Japan and speak in English, the application would
translate my speech into Japanese and then transmit it through a telecom service provider. This
would eliminate language boundaries and create a truly connected world.
3. System Requirements
“We think speech-to-speech translation should be possible and work reasonably well in a few years’
time. Clearly, for it to work smoothly, you need a combination of high-accuracy machine translation
and high-accuracy voice recognition, and that’s what we’re working on. If you look at the progress in
machine translation and corresponding advances in voice recognition, there has been huge progress
recently.”
- Franz Och, Google’s head of translation services
To develop an instant speech translation application, we need robust speech recognition
and machine translation systems. The following figure depicts the basic blocks of an instant
speech translation system.
Fig. Basic Functional Blocks of Instant Speech Translation
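The chain of blocks can be illustrated with a minimal sketch. It assumes the stage order follows Sections 3.1 to 3.3 (speech recognition, language parsing, translation); the class `IstPipeline` and the toy stand-in functions and dictionary are hypothetical and only show how the stages hand data to each other, not how a real recognizer or translator works.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class IstPipeline:
    """Hypothetical container that chains the three functional blocks."""
    recognize: Callable[[bytes], str]       # speech -> source-language text
    parse: Callable[[str], List[str]]       # text -> parsed word units
    translate: Callable[[List[str]], str]   # word units -> target-language text

    def run(self, audio: bytes) -> str:
        text = self.recognize(audio)
        units = self.parse(text)
        return self.translate(units)


# Toy stand-ins (assumptions for illustration only).
def fake_recognize(audio: bytes) -> str:
    # Pretends the audio bytes already contain the spoken text.
    return audio.decode("utf-8")


def whitespace_parse(text: str) -> List[str]:
    return text.lower().split()


TOY_DICTIONARY = {"hello": "konnichiwa", "friend": "tomodachi"}


def word_by_word_translate(units: List[str]) -> str:
    # Unknown words are passed through unchanged.
    return " ".join(TOY_DICTIONARY.get(unit, unit) for unit in units)


pipeline = IstPipeline(fake_recognize, whitespace_parse, word_by_word_translate)
print(pipeline.run(b"Hello friend"))
```

Keeping each block behind a plain function interface means any one stage could be swapped for a better implementation without touching the others.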
3.1. Speech Recognition
Speech recognition and dictation technology has made stunning leaps forward in recent
years, although it is not yet perfect. Word Error Rate (WER) has come down drastically in the
recent past.
Fig. Word Error Rate of Speech Recognition Systems over Years
Source - http://cacm.acm.org/ Communications of the ACM
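For readers unfamiliar with the metric, WER is commonly computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below computes it with a word-level edit distance; the function name `word_error_rate` and the sample sentences are illustrative assumptions, not taken from the cited source.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("a" -> "the") and one deletion ("in") over 5 words.
print(word_error_rate("call a person in japan", "call the person japan"))
```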
Speech recognition has achieved good usability, and there has been a sudden surge in
speech-controlled devices. Even Microsoft Windows Vista had speech recognition capabilities;
they turned out to be a failure, although basic commands did work. A system that merely listens
and guesses is not going to take this forward.
Robust speech recognition technology is a crucial part of instant speech translation. The
main problem systems face is understanding the nuances of a user’s enunciation and voice
patterns. A system that is used by the same person over a period of time can learn these
patterns and reduce its recognition error rate. Mobile phones have the upper hand over other
gadgets here, because a mobile phone is mostly used by a single user, who can hardly avoid
using it. Mobile phones may also soon recognise a user’s natural, free-style speech. Speech
recognition systems can be customized to a particular user by having the user utter a
predefined set of commands or words, which helps the system recognize its owner’s voice
patterns. In the early stages of development, this sort of customization could be done with the
help of a professional.
3.2. Language Parsing
Programs cannot parse human sentences as easily as they parse mathematical
expressions, because there is substantial ambiguity in the structure of human language.
Some linguistic analysis is needed to extract the relevant information. The language parser
splits the raw text into word units, selects the correct form and class for each word that can
have more than one interpretation, and identifies the head words of the sentence. The
information analysed by the language parser is passed to the machine translation engine for
further processing.
A set of protocols should be defined for mapping between different languages. For
example, Indian languages generally use a SUBJECT-OBJECT-VERB pattern, whereas
English generally uses a SUBJECT-VERB-OBJECT pattern. The language parser’s role is to
provide a parsed language stream that translators can easily interpret.
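The word-order difference can be made concrete with a toy sketch. It assumes the parser has already tagged each word unit with a role of "S", "O", or "V"; the tag set, the function name `sov_to_svo`, and the English glosses in the example are all hypothetical, and real reordering between languages is far more involved than this.

```python
from typing import List, Tuple

# (word, role) pairs; the roles "S", "O" and "V" are an assumed tag set.
Tagged = Tuple[str, str]

# Target position of each role in a SUBJECT-VERB-OBJECT sentence.
SVO_ORDER = {"S": 0, "V": 1, "O": 2}


def sov_to_svo(units: List[Tagged]) -> List[Tagged]:
    """Reorder role-tagged units from SOV order into SVO order.

    sorted() is stable, so words that share a role (for example a
    two-word object) keep their original relative order.
    """
    return sorted(units, key=lambda unit: SVO_ORDER[unit[1]])


# English glosses arranged in SUBJECT-OBJECT-VERB order.
sov_sentence = [("Ram", "S"), ("red", "O"), ("apple", "O"), ("eats", "V")]
print(" ".join(word for word, _ in sov_to_svo(sov_sentence)))
```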
3.3. Translation
The machine translator translates a parsed input language stream into a well-defined
output language stream. The translation abides by the set of protocols defined for
communication between the given pair of languages.
Fig. Machine Translation
4. Applications and their Business Potential
IST applications have great business potential. Various players are almost ready to
roll out these services on various types of gadgets.
4.1. Mobile Applications and Services
IST as a service:
Instant speech translation will have many applications on mobile phones. It is practically
impossible for a single IST service provider to cover all languages and their various colloquial
forms. The service provider can therefore expose Application Programming Interfaces
(APIs) so that interested third parties can develop language support and sell it back to the IST
service provider. This will become a viable business model once regional-language enthusiasts
get involved. The IST service provider can bill users based on usage. Such services
can be launched in collaboration with telecom service providers.
Fig. A Model of IST Services on mobile
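One way to picture the service model is a registry in which third parties plug in translators for individual language pairs while the provider meters usage for billing. The sketch below is an assumption-laden illustration: the class `IstService`, its methods, the per-word pricing, and the stand-in translator are all hypothetical and do not describe any real provider's API.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple


class IstService:
    """Hypothetical IST service with pluggable translators and usage billing."""

    def __init__(self, price_per_word: float) -> None:
        self.price_per_word = price_per_word
        self._translators: Dict[Tuple[str, str], Callable[[str], str]] = {}
        self._words_used: Dict[str, int] = defaultdict(int)

    def register(self, source: str, target: str,
                 translator: Callable[[str], str]) -> None:
        """Called by a third party to contribute a language pair."""
        self._translators[(source, target)] = translator

    def translate(self, user: str, source: str, target: str, text: str) -> str:
        translator = self._translators.get((source, target))
        if translator is None:
            raise LookupError(f"no translator registered for {source}->{target}")
        # Meter usage only for requests the service can actually serve.
        self._words_used[user] += len(text.split())
        return translator(text)

    def bill(self, user: str) -> float:
        return self._words_used[user] * self.price_per_word


service = IstService(price_per_word=0.5)
# Stand-in translator: upper-casing marks the text as "translated".
service.register("ta", "en", lambda text: text.upper())
print(service.translate("alice", "ta", "en", "two words"))
print(service.bill("alice"))
```

Billing per word is just one possible usage measure; per-minute or per-call metering would fit the same structure.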
IST as a product:
These services can also be packaged into a product. However, an application that
supports near-perfect translation will be heavy, so in the initial stages the user’s preferred
language packs can be bundled into a product and sold to the user.
Fig. Users interacting through an IST application on mobile
The service model will suit Indian languages, and the product model will suit international
languages like Japanese. The service model will facilitate the wide spread of these applications
and will also bring various players into the market.
IST applications can also be used on other types of gadgets, such as the iPod and iPad. A
few basic applications are already available in the App Store, e.g. Jibbigo Voice Translation.
Fig. Screenshot of Jibbigo Application on iPod
IST Development Standards
To facilitate easy development and learning, a set of standards needs to be
established, similar to HTML in web design. Just as XML and JSON are used for sharing
machine-readable data, VoiceXML (Voice XML) can be used for these types of applications.
4.2. Voice Interface Devices with Local Language support
Voice interface devices that support local languages will soon be in use. Consider, for
example, local residents interacting with a railway information kiosk by speaking in their own
language. Instant speech translation will play a vital role in these types of interfaces, and IST
applications can sit at the front end of such devices. This will also take less time to resolve a
query than a traditional key-entry enquiry system. Most voice-driven applications currently
support English; the same is the case with the Windows 7 operating system. An IST
application at the front end can translate local-language speech input into English, which
can then be processed by the speech recognition systems supported by various operating
systems.
Fig. Various blocks in a Railway Information Kiosk that supports regional languages
through speech
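The kiosk flow described above (local-language input, translation to English, then normal query processing) can be sketched as follows. Everything here is a simplifying assumption: the regular expression, the query dictionary layout, the function names, and the canned stand-in translator are hypothetical, and a real kiosk would need a far richer command grammar.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical pattern for English requests of the form "... from X to Y".
ROUTE_PATTERN = re.compile(r"from (\w+) to (\w+)")


def generate_query(english_text: str) -> Optional[Dict[str, str]]:
    """Turn translated English text into a structured kiosk query."""
    match = ROUTE_PATTERN.search(english_text.lower())
    if match is None:
        return None
    return {
        "action": "find_trains",
        "origin": match.group(1),
        "destination": match.group(2),
    }


def handle_utterance(local_text: str,
                     translate_to_english: Callable[[str], str]
                     ) -> Optional[Dict[str, str]]:
    """IST front end: translate first, then hand over to normal processing."""
    return generate_query(translate_to_english(local_text))


def stub_translator(local_text: str) -> str:
    # Stand-in for the IST application: one canned translation only.
    canned = {"<local-language request>": "Which trains go from Kharagpur to Howrah?"}
    return canned.get(local_text, "")


print(handle_utterance("<local-language request>", stub_translator))
```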
4.3. Data Entry Applications – in Multiple Languages
IST applications can help with data entry in multiple languages. They could assist in
translating legal documents into various languages. Many court proceedings have been
delayed because documents were not available in regional languages, and our Government
also invests a lot in translating various documents into regional languages. In the years to come,
Microsoft Word may offer options to view translated versions while typing. This could cut the
cost and time involved in such activities.
4.4. e-Learning
Advances in computing and bandwidth have brought the benefits of traditional classroom
education into a distance learning environment. IST will take this a step forward by removing
the language barriers that impede the sharing of ideas and knowledge. The figure below depicts
the schema of an e-classroom that uses IST.
(Block labels from the Railway Information Kiosk figure in Section 4.2: Local Language Speech input; IST Applications; English; Command / Query Generator; Normal Processing done in a Railway Information Kiosk)
Fig. IST Applications supporting Distance Learning in Various Languages
IST applications could also be used in webcasting in a similar way.
4.5. Business Applications
IST applications could also help business enterprises interact with customers located
across different geographies. IST will help in understanding customer requirements in a short
span of time.
Users’ contributions to IST applications are crucial. Users can suggest improvements to
the translations provided by the application, and regular users who provide valuable
suggestions can be given credits. This will encourage local participation, which would
ultimately help improve the quality of service provided by IST applications.
The applications of IST discussed here are just the tip of the iceberg. We will see many more
such applications in the future, once IST applications are usable in real time. IST applications
could then be extended to sensitive areas such as health care and defence.
5. Key Players
Google was the first company to announce that it was working on speech-to-speech
translation for mobile phones. The latest application on Google Android that supports
translation is Babylon, which gives dictionary results in 75 different languages as well as
full-text translation in over 12 languages. Apple is working with IBM to roll out a
speech-to-speech translator for iPhones. IBM and Apple are already working closely on a few
applications that will run on the iPhone and iPad.
IBM has been working on translation software and machine translation for years. In fact,
it created MASTOR and the SMT (Statistical Machine Translation) technology that many
other translation applications use.
Microsoft has built-in speech recognition support in its operating systems. It has
recently demonstrated German-English translation of a conversation between two Microsoft
employees. It has made no official announcements on projects pertaining to instant speech
translation.
Videos of instant speech translation applications by other major players such as AT&T,
NEC, and ATR can be found on YouTube. Nespole, Babylon, Verbmobil, and MATRIX are a few
well-known speech translation systems developed by players in this field. Extensive research
projects are under way to improve the usability of speech translation systems. PDA
manufacturers could work in collaboration with these application developers to accelerate
these projects, which would also help them gain the upper hand over their competitors.
6. Challenges Ahead
Only a system that works well in a real-time environment will be successful in the long run.
Numerous hurdles need to be crossed to reach a perfect real-time IST system. One of them is
high-accuracy speech recognition, which depends heavily on the quality of the input speech.
Acoustic degradation produced by additive noise is an obstacle to reaching the desired accuracy.
In real life, users are not going to use IST applications in a noise-free environment. Hence an
IST application should be intelligent enough to separate the user’s voice from the noise in the
environment.
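A standard way to quantify how badly additive noise degrades the input is the signal-to-noise ratio, SNR = 10 log10(P_signal / P_noise) in decibels, where P is mean power. The sketch below only measures SNR on plain lists of samples; it does not separate voice from noise, and the function names and sample values are illustrative assumptions.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average of the squared sample values."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(sample * sample for sample in samples) / len(samples)


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    signal_power = mean_power(signal)
    noise_power = mean_power(noise)
    if noise_power == 0:
        return math.inf    # no noise at all
    if signal_power == 0:
        return -math.inf   # silence buried in noise
    return 10 * math.log10(signal_power / noise_power)


speech = [1.0, -1.0, 1.0, -1.0]       # toy "speech" samples
background = [0.1, -0.1, 0.1, -0.1]   # toy additive noise
print(round(snr_db(speech, background), 2))
```

A higher SNR means cleaner input speech; a recognizer front end could use such a measure to decide when noise suppression is needed.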
IST applications are also expected to become intelligent enough to capture the user’s
mood. A monotonous voice from an IST application will soon bore its users; a customisable
voice would make these applications more expressive and friendly. Adding phonemes to the
computerised voice will bring it nearer to a human voice.
Industry should work in collaboration with research communities to resolve these
hurdles and achieve humanlike performance.
7. Conclusion
Speech and text translation applications are already in use in a variety of forms on a
number of devices. To attain humanlike performance, we must continue to invest in research.
Along with speech, other sensory user inputs can be integrated with IST applications to move
towards that goal. Once it is achieved, instant speech translation will soon spread to devices
like televisions. It would not be a surprise if text on the web were replaced by audio and
video in the future “glocal” world.
8. References
1. “Enhancing Global and Synchronous Distance Learning and Teaching by Using Instant
Transcript and Translation” by Ivan Ho, Hajime Kiyohara, Akira Sugimoto, and Kazuo
Yana, Hosei University Research Institute, California.
2. http://mashable.com/2010/02/08/speech-to-speech/
3. http://domino.research.ibm.com/comm/research.nsf/pages/r.uit.innovation.html
4. http://technology.timesonline.co.uk/tol/news/tech_and_web/personal_tech/article7017831.ece
5. http://blog.gts-translation.com/2010/03/02/microsoft-demos-speech-to-speech-translator/
6. http://www.jibbigo.com/website/index.php
7. http://cacm.acm.org/magazines/2004/1/6588-challenges-in-adopting-speech-recognition

Weitere ähnliche Inhalte

Was ist angesagt?

Programming language (JGMNHS)
Programming language (JGMNHS)Programming language (JGMNHS)
Programming language (JGMNHS)
Katherine Gamboa
 
Cmp104 lec 6 computer lang
Cmp104 lec 6 computer langCmp104 lec 6 computer lang
Cmp104 lec 6 computer lang
kapil078
 
Voice based web browser
Voice based web browserVoice based web browser
Voice based web browser
Gowsalyasri
 

Was ist angesagt? (18)

Introduction to computer science
Introduction to computer scienceIntroduction to computer science
Introduction to computer science
 
Chat bots and AI
Chat bots and AIChat bots and AI
Chat bots and AI
 
Programming language (JGMNHS)
Programming language (JGMNHS)Programming language (JGMNHS)
Programming language (JGMNHS)
 
Cmp104 lec 6 computer lang
Cmp104 lec 6 computer langCmp104 lec 6 computer lang
Cmp104 lec 6 computer lang
 
Final presentation on chatbot
Final presentation on chatbotFinal presentation on chatbot
Final presentation on chatbot
 
Presentation on computer language
Presentation on computer languagePresentation on computer language
Presentation on computer language
 
Artificially Intelligent chatbot Implementation
Artificially Intelligent chatbot ImplementationArtificially Intelligent chatbot Implementation
Artificially Intelligent chatbot Implementation
 
IRJET- Vocal Code
IRJET- Vocal CodeIRJET- Vocal Code
IRJET- Vocal Code
 
Voice browser
Voice browserVoice browser
Voice browser
 
Chatbot and Virtual AI Assistant Implementation in Natural Language Processing
Chatbot and Virtual AI Assistant Implementation in Natural Language Processing Chatbot and Virtual AI Assistant Implementation in Natural Language Processing
Chatbot and Virtual AI Assistant Implementation in Natural Language Processing
 
Chatbot ppt
Chatbot pptChatbot ppt
Chatbot ppt
 
Voice based web browser
Voice based web browserVoice based web browser
Voice based web browser
 
Voice based email for blinds
Voice based email for blindsVoice based email for blinds
Voice based email for blinds
 
The Ultimate Guide to Implementing Conversational AI
The Ultimate Guide to Implementing Conversational AIThe Ultimate Guide to Implementing Conversational AI
The Ultimate Guide to Implementing Conversational AI
 
Conversational AI - 2020
Conversational AI - 2020Conversational AI - 2020
Conversational AI - 2020
 
Voice Browser
Voice BrowserVoice Browser
Voice Browser
 
Conversational ai, conversational ui
Conversational ai, conversational uiConversational ai, conversational ui
Conversational ai, conversational ui
 
voice browser
voice browservoice browser
voice browser
 

Andere mochten auch

Personalising speech to-speech translation
Personalising speech to-speech translationPersonalising speech to-speech translation
Personalising speech to-speech translation
behzad66
 
Sign language translator ieee power point
Sign language translator ieee power pointSign language translator ieee power point
Sign language translator ieee power point
Madhuri Yellapu
 

Andere mochten auch (9)

Conversational Speech Translation - Challenges and Techniques, by Chris Wendt...
Conversational Speech Translation - Challenges and Techniques, by Chris Wendt...Conversational Speech Translation - Challenges and Techniques, by Chris Wendt...
Conversational Speech Translation - Challenges and Techniques, by Chris Wendt...
 
65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ...
65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ...65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ...
65 - An Empirical Simulation-based Study of Real-Time Speech Translation for ...
 
The translator (session 3)
The translator (session 3)The translator (session 3)
The translator (session 3)
 
Personalising speech to-speech translation
Personalising speech to-speech translationPersonalising speech to-speech translation
Personalising speech to-speech translation
 
Effect of Machine Translation in Interlingual Conversation: Lessons from a Fo...
Effect of Machine Translation in Interlingual Conversation: Lessons from a Fo...Effect of Machine Translation in Interlingual Conversation: Lessons from a Fo...
Effect of Machine Translation in Interlingual Conversation: Lessons from a Fo...
 
Gujarati Text-to-Speech Presentation
Gujarati Text-to-Speech PresentationGujarati Text-to-Speech Presentation
Gujarati Text-to-Speech Presentation
 
Wearable Technology Design
Wearable Technology DesignWearable Technology Design
Wearable Technology Design
 
fire detection and alarm system
fire detection and alarm systemfire detection and alarm system
fire detection and alarm system
 
Sign language translator ieee power point
Sign language translator ieee power pointSign language translator ieee power point
Sign language translator ieee power point
 

Ähnlich wie Instant speech translation 10BM60080 - VGSOM

An Application for Performing Real Time Speech Translation in Mobile Environment
An Application for Performing Real Time Speech Translation in Mobile EnvironmentAn Application for Performing Real Time Speech Translation in Mobile Environment
An Application for Performing Real Time Speech Translation in Mobile Environment
Association of Scientists, Developers and Faculties
 
Voice Command Mobile Phone Dialer
Voice Command Mobile Phone DialerVoice Command Mobile Phone Dialer
Voice Command Mobile Phone Dialer
ijtsrd
 
Abstract of speech recognition
Abstract of speech recognitionAbstract of speech recognition
Abstract of speech recognition
Vinay Jaisriram
 

Ähnlich wie Instant speech translation 10BM60080 - VGSOM (20)

D1803041822
D1803041822D1803041822
D1803041822
 
An Application for Performing Real Time Speech Translation in Mobile Environment
An Application for Performing Real Time Speech Translation in Mobile EnvironmentAn Application for Performing Real Time Speech Translation in Mobile Environment
An Application for Performing Real Time Speech Translation in Mobile Environment
 
IRJET- Voice based Billing System
IRJET-  	  Voice based Billing SystemIRJET-  	  Voice based Billing System
IRJET- Voice based Billing System
 
Voice Command Mobile Phone Dialer
Voice Command Mobile Phone DialerVoice Command Mobile Phone Dialer
Voice Command Mobile Phone Dialer
 
VOCAL- Voice Command Application using Artificial Intelligence
VOCAL- Voice Command Application using Artificial IntelligenceVOCAL- Voice Command Application using Artificial Intelligence
VOCAL- Voice Command Application using Artificial Intelligence
 
Advanced Computational Intelligence: An International Journal (ACII)
Advanced Computational Intelligence: An International Journal (ACII)Advanced Computational Intelligence: An International Journal (ACII)
Advanced Computational Intelligence: An International Journal (ACII)
 
VOICE COMMAND SYSTEM USING RASPBERRY PI
VOICE COMMAND SYSTEM USING RASPBERRY PIVOICE COMMAND SYSTEM USING RASPBERRY PI
VOICE COMMAND SYSTEM USING RASPBERRY PI
 
Voice Command System Using Raspberry PI
Voice Command System Using Raspberry PIVoice Command System Using Raspberry PI
Voice Command System Using Raspberry PI
 
Seminar
SeminarSeminar
Seminar
 
IRJET- Voice to Code Editor using Speech Recognition
IRJET- Voice to Code Editor using Speech RecognitionIRJET- Voice to Code Editor using Speech Recognition
IRJET- Voice to Code Editor using Speech Recognition
 
Wake-up-word speech recognition using GPS on smart phone
Wake-up-word speech recognition using GPS on smart phoneWake-up-word speech recognition using GPS on smart phone
Wake-up-word speech recognition using GPS on smart phone
 
voice browser
voice browservoice browser
voice browser
 
Desktop assistant
Desktop assistant Desktop assistant
Desktop assistant
 
Voice Assistant Using Python and AI
Voice Assistant Using Python and AIVoice Assistant Using Python and AI
Voice Assistant Using Python and AI
 
IRJET - Sign Language Recognition System
IRJET -  	  Sign Language Recognition SystemIRJET -  	  Sign Language Recognition System
IRJET - Sign Language Recognition System
 
Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by Iqbal
 
Abstract of speech recognition
Abstract of speech recognitionAbstract of speech recognition
Abstract of speech recognition
 
IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...
IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...
IRJET- Communication System for Blind, Deaf and Dumb People using Internet of...
 
30
3030
30
 
Top 10 Best Speech Recognition Software
Top 10 Best Speech Recognition Software Top 10 Best Speech Recognition Software
Top 10 Best Speech Recognition Software
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Kürzlich hochgeladen (20)

Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Instant speech translation 10BM60080 - VGSOM

  • 1. INSTANT SPEECH TRANSLATION By SATHIYASEELAN M 10BM60080 I Year M.B.A VGSOM, IIT Kharagpur
  • 2. Index 1. Abstract.................................................................................................3 2. Instant Speech Translation – Eliminating Language Barriers ...........3 3. System Requirements ..........................................................................3 3.1. Speech Recognition ...............................................................................4 3.2. Language Parsing ..................................................................................5 3.3. Translation .............................................................................................5 4. Applications and their Business Potential..........................................6 4.1. Mobile Applications and Services ...........................................................6 4.2. Voice Interface Devices with Local Language support ............................8 4.3. Data Entry Applications – in Multiple Languages ....................................9 4.4. e-Learning .............................................................................................9 4.5. Business Applications ..........................................................................10 5. Key Players .........................................................................................11 6. Challenges Ahead...............................................................................11 7. Conclusion..........................................................................................12 8. References ..........................................................................................13
  • 3. 1. Abstract With the current pace of globalization, any Industry needs to look beyond Geographical borders. Indian IT firms provide services to Japanese, Korean clients etc. These firms also invest a lot on foreign language training programs. An Application that provides instant translation will not only cut down these costs but will also help gathering requirements more precisely and in a short span of time. Instant speech translation [IST] finds wide applications in other industries as well. Say in a country like India where numerous vernacular languages are in use, IST can be used in a number of ways in day-to-day life. There is huge potential for IST applications in mobile phones. All major players such as Google, Microsoft, and IBM have already come up with some sort of prototype for these kind of applications. Google Translator is one such primitive example. A lot many such applications will be in our gadgets soon. This Paper elaborates on few such applications and their business potential. 2. Instant Speech Translation – Eliminating Language Barriers Internet and mobile services has reached even remote villages. Now rural markets are considered significant in countries like China and India. Breaking Language barriers will further open up these markets for international business. Knowledge anywhere in any form should be used for the growth of the humanity. We should create opportunities for those who want to learn and share knowledge using their own native languages. Instant Speech translation will create a platform for them. This could unravel many things that are not known to the world. In “The Hitchhiker’s Guide to the Galaxy” Babel fish, a fictitious animal performs instant translations when kept in the ear. If such an application is there on the mobile, Say I call a person in Japan, I speak to him in English which would be translated to Japanese by the application and then transmitted through a telecom service provider. 
This will eliminate language boundaries and create a truly connected world.
3. System Requirements
“We think speech-to-speech translation should be possible and work reasonably well in a few years’ time. Clearly, for it to work smoothly, you need a combination of high-accuracy machine translation and high-accuracy voice recognition, and that’s what we’re working on. If you look at the progress in machine translation and corresponding advances in voice recognition, there has been huge progress recently.” - Franz Och, Google’s head of translation services
Developing an instant speech translation application requires robust speech recognition and machine translation systems. The following figure depicts the basic blocks of an instant speech translation system.
  • 4. Fig. Basic Functional Blocks of Instant Speech Translation
3.1. Speech Recognition
Speech-recognition and dictation technology has made stunning leaps forward in recent years, although it is not yet perfect. Word Error Rate (WER), a common measure of recognition errors, has come down drastically in the recent past.
Fig. Word Error Rate of Speech Recognition Systems over Years. Source: Communications of the ACM, http://cacm.acm.org/
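Since WER is the yardstick used above, a short worked example may help. WER is conventionally computed as the word-level edit distance between a reference transcript and the recognizer's output (substitutions + deletions + insertions), divided by the number of words in the reference. The sketch below is a minimal illustration of that definition; the function name and the sample sentences are made up for this example and do not come from any particular recognizer.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two strings, divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("a") and one substituted word ("delhi" -> "deli")
# against a five-word reference.
print(word_error_rate("book a ticket to delhi", "book ticket to deli"))
```

Because insertions count as errors, WER can exceed 1.0 when the recognizer outputs many more words than were actually spoken.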
  • 5. Speech recognition has reached good usability, and there has been a sudden surge in speech-controlled devices. Even Microsoft Windows Vista had speech recognition capabilities. They turned out to be a failure, but basic commands did work in it. A system that merely listens and guesses is not going to take this forward.
Robust speech recognition technology is a crucial part of instant speech translation. The main problem such systems face is understanding the nuances of a user’s enunciation and voice patterns. A system that adapts to its user over a period of time can reduce its recognition error rate. Mobile phones have an advantage over other gadgets here: a phone is mostly used by a single user, and that user relies on it constantly. Mobile phones may also soon recognise a user’s natural, free-style speech.
Speech recognition systems can be customized to a particular user by having the user utter a predefined set of commands or words, which helps the system learn its owner’s voice patterns. In the early stages of development, this customization could be done with the help of a professional.
3.2. Language Parsing
Programs cannot parse human sentences as easily as they parse mathematical expressions, because the structure of human language is substantially ambiguous. Some linguistic analysis is needed to extract the relevant information. The language parser splits the raw text into word units, selects the correct form and class for each word that has more than one interpretation, and identifies the head words of the sentence. The parser’s analysis is then passed to the machine translation engine for further processing. A set of protocols should be defined for mapping between languages. For example, Indian languages generally use a SUBJECT-OBJECT-VERB pattern, whereas English generally uses SUBJECT-VERB-OBJECT.
The language parser’s role is to provide a parsed language stream that translators can interpret easily.
3.3. Translation
The machine translator converts a parsed input-language stream into a well-defined output-language stream. Its translations abide by the set of protocols defined for communication between the languages involved.
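The parse, reorder, and translate steps described in sections 3.2 and 3.3 can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the "parser" only accepts three-word English sentences in subject-verb-object order, the word-order table stands in for the inter-language protocols mentioned above, and the three-entry romanised Hindi lexicon is a toy. A real system would use a proper parser and a statistical translation engine.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Token:
    text: str
    role: str  # "S" (subject), "V" (verb) or "O" (object)


# Stand-in for the "protocols" between languages: the word order each one uses.
WORD_ORDER: Dict[str, str] = {"en": "SVO", "hi": "SOV"}

# Toy lexicon (illustrative only): English word -> romanised Hindi.
EN_TO_HI: Dict[str, str] = {"ram": "Ram", "eats": "khata hai", "apples": "seb"}


def parse_svo(sentence: str) -> List[Token]:
    """Toy parser: assumes exactly three words in subject-verb-object order."""
    words = sentence.lower().split()
    if len(words) != 3:
        raise ValueError("toy parser handles three-word SVO sentences only")
    return [Token(word, role) for word, role in zip(words, "SVO")]


def reorder(tokens: List[Token], target_order: str) -> List[Token]:
    """Rearrange the parsed tokens into the target language's word order."""
    by_role = {token.role: token for token in tokens}
    return [by_role[role] for role in target_order]


def translate(sentence: str, target: str = "hi") -> str:
    """Parse, reorder, then substitute each word using the toy lexicon."""
    tokens = reorder(parse_svo(sentence), WORD_ORDER[target])
    return " ".join(EN_TO_HI[token.text] for token in tokens)


print(translate("Ram eats apples"))  # prints "Ram seb khata hai"
```

The point of the sketch is the separation of concerns: the parser labels roles, the reordering step applies the language-pair rule, and only the last step touches vocabulary.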
  • 6. Fig. Machine Translation
4. Applications and their Business Potential
IST applications have great business potential. Various players are almost ready to roll out these services on various types of gadgets.
4.1. Mobile Applications and Services
IST as a service: Instant speech translation will have many applications on mobile phones. It is practically impossible for a single IST service provider to cover all languages and their various colloquial forms. The provider can therefore expose Application Programming Interfaces (APIs) so that interested third parties can develop language support and sell it back to the provider. This will become a viable business model once regional-language enthusiasts get involved. The IST service provider can bill users based on usage. Such services can be launched in collaboration with telecom service providers.
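One way to picture the service model is a provider-side registry into which third parties plug language packs, with each user's usage metered for billing. The sketch below is purely hypothetical: the class and method names, and the choice of billing per translated word, are assumptions made for illustration and do not describe any existing service.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

# A language pack is modelled as any function that maps source text to target text.
Translator = Callable[[str], str]


class ISTService:
    """Hypothetical IST provider: third-party packs plus usage-based billing."""

    def __init__(self, price_per_word: float) -> None:
        self.price_per_word = price_per_word
        self._packs: Dict[Tuple[str, str], Translator] = {}
        self._words_used: Dict[str, int] = defaultdict(int)

    def register_pack(self, source: str, target: str, translator: Translator) -> None:
        """API exposed to third parties for contributing a language pair."""
        self._packs[(source, target)] = translator

    def translate(self, user: str, source: str, target: str, text: str) -> str:
        """Translate on behalf of a user and record the words consumed."""
        if (source, target) not in self._packs:
            raise LookupError(f"no language pack registered for {source}->{target}")
        self._words_used[user] += len(text.split())
        return self._packs[(source, target)](text)

    def bill(self, user: str) -> float:
        """Amount owed by the user under the assumed per-word tariff."""
        return self._words_used[user] * self.price_per_word


service = ISTService(price_per_word=0.5)
# str.upper is only a placeholder for a real third-party translator.
service.register_pack("en", "ta", str.upper)
service.translate("asha", "en", "ta", "where is the station")
print(service.bill("asha"))  # four words at 0.5 each
```

Keeping the pack interface this narrow is what would let regional-language contributors work independently of the provider's billing and telecom integration.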
  • 7. Fig. A Model of IST Services on mobile
IST as a product: These services can also be packaged into a product. However, an application that supports near-perfect translation will be heavy, so in the initial stages the user’s preferred language packs can be bundled into a product and sold to the user.
Fig. Users interacting through an IST application on mobile
The service model will suit Indian languages, and the product model will suit international languages like Japanese. The service model will help these applications spread widely and will also bring various players into the field.
  • 8. IST applications can also be used on other types of gadgets such as the iPod and iPad. A few basic applications are already available in the App Store, e.g. Jibbigo Voice Translation.
Fig. Screenshot of Jibbigo Application on iPod
IST Development Standards
To make development and learning easier, a set of standards needs to be established, as HTML did for web design. Just as XML and JSON are used for machine-readable data sharing, VoiceXML could be used for these types of applications.
4.2. Voice Interface Devices with Local Language support
Voice interface devices that support local languages will soon be in use. Consider local residents interacting with a railway information kiosk by speaking their own language. Instant speech translation will play a vital role in such interfaces, with IST applications sitting at the front end of these devices. This will also take less time to resolve a query than a traditional key-entry enquiry system. Most voice-driven applications currently work primarily in English, and the same is the case with the Windows 7 operating system. An IST application at the front end can translate local-language speech input into English, which can then be processed by the speech recognition systems that the various operating systems support.
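In the kiosk scenario, once the IST front end has produced English text, some component has to turn that text into a structured command or query for the existing kiosk software. A minimal sketch of such a step follows. It assumes keyword matching and a five-digit train number; both are simplifications chosen for illustration, and the intent names are hypothetical rather than taken from any real kiosk system.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KioskQuery:
    intent: str                       # "platform", "running_status" or "unknown"
    train_number: Optional[str] = None


def generate_query(english_text: str) -> KioskQuery:
    """Map translated English text to a structured kiosk query (keyword rules only)."""
    text = english_text.lower()

    # Assumption: train numbers are written as five consecutive digits.
    match = re.search(r"\b(\d{5})\b", text)
    train_number = match.group(1) if match else None

    if "platform" in text:
        return KioskQuery("platform", train_number)
    if any(word in text for word in ("late", "status", "arrive")):
        return KioskQuery("running_status", train_number)
    return KioskQuery("unknown", train_number)


print(generate_query("Which platform does train 12841 leave from?"))
print(generate_query("Is train 12841 running late?"))
```

Because the query generator only ever sees English, the same rules serve every local language that the IST front end can translate.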
  • 9. Fig. Various blocks in a Railway Information Kiosk that supports regional languages through speech. Figure labels: Local Language Speech Input, IST Applications, Command / Query Generator, Normal Processing done in a Railway Information Kiosk, English.
4.3. Data Entry Applications – in Multiple Languages
IST applications can help with data entry in multiple languages. This could assist in translating legal documents into various languages. Many court proceedings have been delayed for lack of documents in regional languages, and our Government also invests heavily in translating documents into regional languages. In the years to come, Microsoft Word may well offer an option to view a translated version while typing. This could cut the cost and time involved in such activities.
4.4. e-Learning
Advances in computing and bandwidth have brought the benefits of traditional classroom education into the distance-learning environment. IST will take this a step further by removing the language barriers that impede the sharing of ideas and knowledge. The figure below depicts the schema of an e-classroom that uses IST.
  • 10. Fig. IST Applications supporting Distance Learning in Various Languages
IST applications could be used in webcasting in a similar way.
4.5. Business Applications
IST applications could also help business enterprises interact with customers located across different geographies, and help them understand customer requirements in a short span of time.
Users’ contributions to IST applications are crucial. Users can suggest improvements to the translations the application provides, and regular users who offer valuable suggestions can be given credits. This will encourage local participation, which would ultimately improve the quality of service provided by IST applications.
The applications of IST discussed here are just the tip of the iceberg. We will see many more once IST applications are usable in real time. IST could then be extended to sensitive areas such as health care and defence.
  • 11. 5. Key Players
Google was the first company to announce that it was working on speech-to-speech translation for mobile phones. One of the latest translation apps for Google Android is Babylon, which gives dictionary results in 75 languages as well as full-text translation in over 12 languages.
Apple is working with IBM to roll out a speech-to-speech translator for iPhones. IBM and Apple are already working closely on a few applications that will run on the iPhone and iPad. IBM has been working on translation software and machine translation for years. In fact, it created MASTOR and the SMT (Statistical Machine Translation) technology that many other translation applications use.
Microsoft has built-in speech recognition support in its operating systems. It recently demonstrated German-English translation of a conversation between two Microsoft employees, but it has made no official announcements on projects pertaining to instant speech translation.
Videos of instant speech translation applications by other major players such as AT&T, NEC, and ATR can be found on YouTube. Nespole, Babylon, Verbmobil, and MATRIX are a few well-known speech translation systems developed by players in this field.
Extensive research projects are under way to improve the usability of speech translation systems. PDA manufacturers could collaborate with these application developers to accelerate the projects, which would also help them gain an upper hand over their competitors.
6. Challenges Ahead
Only a system that works well in a real-time environment will succeed in the long run. Numerous hurdles must be crossed to reach perfect real-time IST. One is high-accuracy speech recognition, which depends heavily on the quality of the input speech. Acoustic degradation caused by additive noise is an obstacle to reaching the desired accuracy. In real use, a user will not be operating an IST application in a noise-free environment.
Hence an IST application should be intelligent enough to separate the user’s voice from the noise in the environment.
In the future, IST applications are also expected to be intelligent enough to capture the user’s mood. A monotonous voice from an IST application will soon bore its users. A customisable voice would make these applications more expressive and friendly. Adding phonemes to the computerised voice will bring it nearer to a human voice.
  • 12. Industry should collaborate with research communities to resolve these hurdles and achieve human-like performance.
7. Conclusion
Speech and text translation applications are being used in a variety of forms on a number of devices. To attain human-like performance, we must continue to invest in research. Along with speech, other sensory user inputs can be integrated with IST applications. Once that is achieved, instant speech translation will soon spread to devices such as televisions. It would not be a surprise if text on the web is replaced by audio and video in the future “glocal” world.
  • 13. 8. References
1. Ivan Ho, Hajime Kiyohara, Akira Sugimoto, and Kazuo Yana, “Enhancing Global and Synchronous Distance Learning and Teaching by Using Instant Transcript and Translation”, Hosei University Research Institute, California.
2. http://mashable.com/2010/02/08/speech-to-speech/
3. http://domino.research.ibm.com/comm/research.nsf/pages/r.uit.innovation.html
4. http://technology.timesonline.co.uk/tol/news/tech_and_web/personal_tech/article7017831.ece
5. http://blog.gts-translation.com/2010/03/02/microsoft-demos-speech-to-speech-translator/
6. http://www.jibbigo.com/website/index.php
7. http://cacm.acm.org/magazines/2004/1/6588-challenges-in-adopting-speech-recognition