Assamese-English Statistical Machine Translation
Using Moses
PRESENTED BY
KALYANEE KANCHAN BARUAH
AND
PRANJAL DAS
CONTENTS
• INTRODUCTION
• LITERATURE REVIEW
• IMPLEMENTATION
• TRANSLITERATION IN TRANSLATION
• EVALUATION
• CONCLUSION AND FUTURE WORK
• REFERENCES
INTRODUCTION
What is Natural Language Processing?
• Natural Language Processing (NLP) is the ability of a computer program to understand human language as it is spoken and written.
• NLP lets computers and humans communicate in natural language rather than in a formal language.
WHAT IS MACHINE TRANSLATION?
• Machine translation (MT) is automated translation: computer software translates a text from one natural language (such as Assamese) into another (such as English).
• The ideal aim of an MT system is to produce the best possible translation without human assistance. Most MT systems need translation programs, supported by automated dictionaries and grammars.
ADVANTAGES OF MACHINE
TRANSLATION
• Quick Translation
• Low price
• Confidentiality
• Online translation and translation of web page
content
• Overcomes technological barriers
PROBLEMS IN MACHINE
TRANSLATION
• Translation is not straightforward
• Word order
• Word sense
• Idioms
TYPES OF MACHINE
TRANSLATION
• BILINGUAL
– MT systems that translate between two particular languages.
• MULTILINGUAL
– MT systems that translate between any given pair of languages.
– They are preferred to bilingual and bidirectional systems because they can translate from any given language into any other, and vice versa.
SOME EXISTING MT SYSTEMS
• Google Translate
• Systran
• Bing Translator
• Babel Fish
• Apertium
SOME MAJOR MT PROJECTS IN
INDIA
• Anglabharati (and Anubharati)
• Anusaaraka
• MaTra
• UCSG-based English-Kannada MT
• Tamil-Hindi Anusaaraka and English-Tamil
MT
• Anuvadak English-Hindi software
• Sampark
MACHINE TRANSLATION
APPROACHES
STATISTICAL MACHINE
TRANSLATION
• Enables us to build machine translation systems automatically, using statistical models trained on text data.
• Every sentence in one language is treated as a possible translation of a sentence in the other language, with some probability.
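LANGUAGE MODEL
• Gives the probability of a sentence
• Uses an n-gram model
• IRSTLM is used to build the language model
The probability of a sentence, P(S), is broken down by the chain rule into the conditional probabilities of its words:
\[ P(S) = P(w_1, w_2, \ldots, w_n) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1 w_2)\, P(w_4 \mid w_1 w_2 w_3) \cdots P(w_n \mid w_1 w_2 \ldots w_{n-1}) \]
An n-gram model approximates each factor by conditioning only on the previous n-1 words; a bigram model, for example, uses P(w_i | w_{i-1}).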
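As a rough illustration of how bigram probabilities can be obtained, the sketch below estimates P(w2 | w1) by relative frequency, count(w1 w2) / count(w1). The three-sentence corpus and the function name `bigram_probabilities` are made up for this example. It is not how IRSTLM works internally: a real toolkit also smooths the estimates so that unseen bigrams do not get probability zero.

```python
from collections import Counter

def bigram_probabilities(sentences):
    """Estimate P(w2 | w1) by relative frequency: count(w1 w2) / count(w1)."""
    context_counts = Counter()   # how often each word appears as the left context
    bigram_counts = Counter()    # how often each (previous word, word) pair appears
    for sentence in sentences:
        tokens = ["<start>"] + sentence.split()
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return {
        (prev, word): count / context_counts[prev]
        for (prev, word), count in bigram_counts.items()
    }

# Hypothetical toy corpus, for illustration only.
corpus = ["I want to eat", "I want food", "I eat lunch"]
probs = bigram_probabilities(corpus)
print(probs[("I", "want")])  # 2 of the 3 occurrences of "I" are followed by "want"
```

No smoothing is applied here, so any bigram missing from the corpus is simply absent from the returned dictionary.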
Suppose that, from a large corpus, we have the following bigram probabilities:

Bigram               Prob.     Bigram               Prob.
eat British          .001      eat today            .03
eat dessert          .007      eat Indian           .04
eat tomorrow         .01       eat a                .04
eat Mexican          .02       eat at               .04
eat Chinese          .02       eat dinner           .05
eat in               .02       eat lunch            .06
eat breakfast        .03       eat some             .06
eat Thai             .03       eat on               .16
British lunch        .01       want a               .05
British cuisine      .01       want to              .65
British restaurant   .15       I have               .04
British food         .60       I don’t              .08
to be                .02       I would              .29
to spend             .09       I want               .32
to have              .14       <start> I’m          .02
to eat               .26       <start> Tell         .04
want Thai            .01       <start> I’d          .06
want some            .04       <start> I            .25
Then the probability of the sentence “I want to eat British food” is
P(I want to eat British food)
= P(I | <start>) P(want | I) P(to | want) P(eat | to) P(British | eat) P(food | British)
= .25 * .32 * .65 * .26 * .001 * .60 ≈ .0000081
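The product can be checked with a few lines of Python. The sketch below copies the six bigram values used for this sentence from the slides and sums their logarithms, which is how language-model scores are usually accumulated to avoid underflow on long sentences. The names `BIGRAM` and `sentence_probability` are chosen for this example only.

```python
import math

# Bigram probabilities P(word | previous word), taken from the slides' table.
BIGRAM = {
    ("<start>", "I"): 0.25,
    ("I", "want"): 0.32,
    ("want", "to"): 0.65,
    ("to", "eat"): 0.26,
    ("eat", "British"): 0.001,
    ("British", "food"): 0.60,
}

def sentence_probability(sentence, bigram):
    """Multiply bigram probabilities along the sentence, working in log space."""
    tokens = ["<start>"] + sentence.split()
    log_p = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        log_p += math.log(bigram[(prev, word)])
    return math.exp(log_p)

p = sentence_probability("I want to eat British food", BIGRAM)
print(f"{p:.3e}")  # about 8.112e-06
```

A bigram missing from the table raises `KeyError` in this sketch; a smoothed model would instead back off to a lower-order estimate.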
TRANSLATION MODEL
• Computes the probability of a source sentence S given a target sentence T, i.e. P(S|T).
• May be word-based or phrase-based.
• The output of the translation model (TM) is fed into the Moses decoder.
• GIZA++, together with mkcls, is used to build the translation model.
Example:
জয়পুৰ ৰাজস্থানৰ এখন বিখযাত চহৰ
Jaipur is a famous city of Rajasthan
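DECODER
• Finds the most probable translation of the input sentence.
• Searches for the target sentence T that maximizes P(T|S). By Bayes' rule, and because P(S) is fixed for a given input, this is
\[ \hat{T} = \arg\max_T P(T \mid S) = \arg\max_T P(S \mid T)\, P(T) \]
[Figure: decoding algorithm, translation model, language model]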
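The argmax can be illustrated on a toy scale. In the sketch below, the candidate translations and their scores are invented for this example: `tm` stands for P(S|T) and `lm` for P(T). The Moses decoder does not enumerate a fixed list like this; it builds hypotheses phrase by phrase with beam search, so this only shows how the two models combine to rank candidates.

```python
import math

# Hypothetical candidate translations T for one source sentence S.
# "tm" is an assumed translation-model score P(S | T),
# "lm" is an assumed language-model score P(T).
candidates = [
    {"text": "Jaipur is a famous city of Rajasthan", "tm": 0.020, "lm": 0.0010},
    {"text": "Jaipur Rajasthan a famous city is", "tm": 0.030, "lm": 0.00001},
    {"text": "Jaipur is famous", "tm": 0.002, "lm": 0.0050},
]

def noisy_channel_score(candidate):
    """Return log P(S | T) + log P(T) for one candidate."""
    return math.log(candidate["tm"]) + math.log(candidate["lm"])

best = max(candidates, key=noisy_channel_score)
print(best["text"])
```

The second candidate has the highest translation-model score but is disfluent, so its low language-model score pushes it below the first candidate. This is the role the language model plays in the product P(S|T) P(T).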
ARCHITECTURE OF OUR SMT
HOW OUR SMT WORKS
IMPLEMENTATION
• Install all packages in Moses
– Install GIZA++
– Install IRSTLM
• Training
• Tuning
• Generate output (decoding)
TRAINING THE MOSES DECODER
• Prepare data
• Run GIZA++
• Align words
• Get lexical translation table
• Extract phrases
• Build lexicalized reordering model
• Build generation models
• Create configuration file
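To give a feel for the "get lexical translation table" step, the sketch below estimates word translation probabilities w(e | f) = count(f, e) / count(f) from word-aligned pairs. It is a simplified stand-in, not the Moses training script: the function name, the placeholder source tokens (`src_city`, `src_capital`) and the alignments are all hypothetical.

```python
from collections import Counter, defaultdict

def lexical_translation_table(aligned_pairs):
    """Maximum-likelihood estimate w(e | f) = count(f, e) / count(f)."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(f for f, _ in aligned_pairs)
    table = defaultdict(dict)
    for (f, e), count in pair_counts.items():
        table[f][e] = count / source_counts[f]
    return table

# Hypothetical word alignments (source word, target word); the source tokens
# are placeholders, not words from the authors' corpus.
aligned_pairs = [
    ("src_city", "city"), ("src_city", "city"), ("src_city", "town"),
    ("src_capital", "capital"),
]
table = lexical_translation_table(aligned_pairs)
print(table["src_city"])
```

In a real run the aligned pairs come from the word alignments produced in the preceding steps, not from a hand-written list.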
PREPARING THE DATA
• Tokenising - inserting spaces between words and punctuation.
• Truecasing - converting the first word of each sentence to its most probable casing.
• Cleaning - removing empty lines, redundant spaces, and lines that are too short or too long.
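The tokenising and cleaning steps can be sketched in Python as follows. This is a simplified illustration, not the Moses preprocessing scripts: the function names, the punctuation set and the length limits (1 to 80 tokens) are assumptions made for this example, and truecasing is left out because it needs a model trained on the corpus.

```python
import re

def tokenise(text):
    """Very rough tokeniser: put spaces around common punctuation marks."""
    spaced = re.sub(r'([.,!?;:()"।])', r" \1 ", text)
    return re.sub(r"\s+", " ", spaced).strip()

def clean_parallel_corpus(pairs, min_len=1, max_len=80):
    """Normalise whitespace and drop pairs that are empty, too short or too long."""
    cleaned = []
    for source, target in pairs:
        # Collapse redundant spaces and trim both sides.
        source = re.sub(r"\s+", " ", source).strip()
        target = re.sub(r"\s+", " ", target).strip()
        src_len, tgt_len = len(source.split()), len(target.split())
        # Both sides of a pair must be within the limits, otherwise drop the pair.
        if min_len <= src_len <= max_len and min_len <= tgt_len <= max_len:
            cleaned.append((source, target))
    return cleaned

print(tokenise("Kanak Vrindavan is a popular picnic spot in Jaipur."))
```

Pairs are filtered together so that the two sides of the parallel corpus stay aligned line by line.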
EXAMPLE PARALLEL DATA
ass-eng1.as ass-eng1.en
বিকাণেবি ভূ বিয়া আিু বিঠাই হৈণে
বিকাণেিত বকবিি পিা অবত উত্তি
সািগ্ৰীসিূৈি বভতিত বকেুিাি।
The famous Bikaneri Bhujias and sweets
are some of the best items to purchase in
Bikaner.
ভািতিৰ্ষি গ ালপীয়া চৈি িাণি খ্যাত
িয়পুি, িািস্থাি িািযি িািধািী।
Jaipur, popularly known as the Pink City,
is the capital of Rajasthan state, India.
অম্বি গপণলচটণটা হৈণে গিা ল আিু বৈন্দু
স্থাপতয বিদ্যাি আদ্ৰ্ষ উদ্াৈিে।
The Amber Palace is a classic example of
Mughal and Hindu architecture.
কিক িৃন্দািি হৈণে িয়পুিি এখ্ি িিবিয়
িিণভাি স্থাি।
Kanak Vrindavan is a popular picnic spot
in Jaipur.
িয়পুি িািষলি িূবত্তষ, িীলা কলৈ আিু
িািস্থািী গিাতাি িাণিও বিখ্যাত।
Jaipur is also famous for marble statues,
blue pottery and the Rajasthani shoes
SAMPLE OUTPUTS
Input (Assamese): জয়পুৰ ৰাজস্থানৰ এখন বিখযাত চহৰ ।
Output (English): Jaipur is a famous city of Rajasthan .

Input (Assamese): তাজমহল আগ্ৰাত অৱবস্থত ।
Output (English): the Taj Mahal , is located in the heart of the Agra .

Input (Assamese): জামা মছবজদ শ্বাহজাহানন বনমমান কবৰবছল।
Output (English): Jama Masjid built by Shahjahan .

Input (Assamese): অন্ধ্ৰপ্ৰনদশ ভাৰতৰ এখন অনযতম ৰাজযৰ বভতৰত এক।
Output (English): Andhra Pradesh is one of the state of one of India .

Input (Assamese): গুৱাহাটী অসমৰ ৰাজধানী।
Output (English): Guwahati is connected by the capital of the State .

Input (Assamese): আগ্ৰা এখন বিখযাত চহৰ
Output (English): Agra is the one of the famous city

Input (Assamese): বদল্লী ভাৰতৰ ৰাজধানী।
Output (English): Delhi is the capital of India .
PROBLEMS WITH PROPER NOUNS
Input (Assamese): কানাদা এখন বিশাল দদশ ।
Output (English): কানাদা is a vast country .

Input (Assamese): মুলতান চহৰখন ৰাজস্থানৰ পৰা ৯৯৯ বক.বম. দুৰত্বত অৱবস্থত।
Output (English): মুলতান from the city is located at a distance of ৯৯৯ of Rajasthan .

Input (Assamese): পানাবজ দ াৱাৰ ৰাজধানী ।
Output (English): the capital of Goa , পানাবজ|
TRANSLITERATION IN TRANSLATION
• Transliteration
– Conversion of text from one script to another
• Proper nouns that do not occur in our corpus are left untranslated.
• For example, translating “কানাদা এখন বিশাল দদশ” gives “কানাদা is a vast country.”, because ‘কানাদা’ is not in our corpus.
• Each Assamese character and its English transliteration are stored in a Perl script. For example:
ক -> k
খ -> kh
গ -> g
• The Perl script is run on the output of Moses with the following command:
echo 'কানাদা এখন বিশাল দদশ' | ~/mymoses/bin/moses -f ~/work/mert-work/moses.ini | ./transliterate.pl
• Output: kanada is a vast country .
IMPLEMENTING TRANSLITERATION
Input Assamese sentence: কানাদা এখন বিশাল দদশ
Output before transliteration: কানাদা is a vast country .
Output after transliteration: kanada is a vast country .

Input Assamese sentence: মুলতান চহৰখন ৰাজস্থানৰ পৰা ৯৯৯ বক.বম. দুৰত্বত অৱবস্থত।
Output before transliteration: মুলতান from the city is located at a distance of ৯৯৯ of Rajasthan .
Output after transliteration: multan from the city is located at a distance of 999 of Rajasthan .

Input Assamese sentence: পানাবজ দ াৱাৰ ৰাজধানী ।
Output before transliteration: the capital of Goa , পানাবজ|
Output after transliteration: the capital of Goa , panaji .
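The post-processing idea can be sketched as a character-by-character lookup. The version below is written in Python rather than the authors' Perl, whose contents are not shown on the slides. Only the pairs ক -> k and খ -> kh are taken from the slides; every other entry in `CHAR_MAP` is an assumption added so that the example runs. A complete mapping would also need rules for inherent vowels and conjuncts, which this sketch ignores.

```python
# Character map. "ক" -> "k" and "খ" -> "kh" follow the slides; all other
# entries are assumptions made for this illustration.
CHAR_MAP = {
    "ক": "k", "খ": "kh", "গ": "g",
    "ন": "n", "দ": "d",
    "া": "a",
}
# Assamese digits (U+09E6 to U+09EF) map to the ASCII digits 0-9.
CHAR_MAP.update({chr(0x09E6 + i): str(i) for i in range(10)})

def transliterate(text):
    """Replace every mapped character; leave all other characters unchanged."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in text)

print(transliterate("কানাদা is a vast country ."))
```

Because unmapped characters pass through untouched, English words already produced by the decoder are not affected; only leftover Assamese characters are rewritten.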
EVALUATION OF BLEU SCORE
Source/Target: Assamese – English
BLEU score: 7.02
1/2/3/4-gram precision: 30.5/8.5/4.1/2.3
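BLEU is a brevity penalty multiplied by the geometric mean of the 1- to 4-gram precisions. The slides do not report the brevity penalty, so the sketch below only computes the geometric mean of the four reported precisions, as a consistency check under that stated gap; the function name is chosen for this example.

```python
import math

def geometric_mean(values):
    """Geometric mean, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# 1- to 4-gram precisions (in percent) as reported on the slide.
precisions = [30.5, 8.5, 4.1, 2.3]
gm = geometric_mean(precisions)
print(round(gm, 2))  # about 7.03
```

The result is close to the reported score of 7.02, which suggests a brevity penalty near 1; the small gap may also come from rounding of the published precisions.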
CONCLUSION AND FUTURE WORK
• SMT is a corpus-based approach to MT: it requires a parallel corpus before any translation can be done.
• A parallel corpus of about 2500 Assamese and English sentences was used to train the system.
• The SMT system accepts Assamese sentences as input and generates the corresponding English translations.
• The results show that significant improvements can be made by increasing the size of the parallel corpus.
• In the future, we will try to integrate transliteration into our system.
• We will try to increase the size of our corpus to obtain a much better translation system.
• We will also try to implement the translation process without using the Moses toolkit.
REFERENCES
• “Machine Translation”, [Online]. Available: http://en.wikipedia.org/wiki/Machine_translation
• “Statistical Machine Translation”, [Online]. Available: http://en.wikipedia.org/wiki/Statistical_machine_translation
• “Problems in Machine Translation system”, [Online]. Available: http://languagedirect.org/machine-translation/
• “Machine Translation”, [Online]. Available: http://faculty.ksu.edu.sa/homiedan/Publications/Machine%20Translation.pdf
• D. D. Rao, “Machine Translation: A Gentle Introduction”, Resonance, July 1998.
• S. K. Dwivedi and P. P. Sukhadeve, “Machine Translation System in Indian Perspectives”, Journal of Computer Science, vol. 6, no. 10, pp. 1082-1087, May 2010.
• P. F. Brown, S. A. Della Pietra, V. J. Della Pietra and R. L. Mercer, “The Mathematics of Statistical Machine Translation: Parameter Estimation”, Computational Linguistics, vol. 19, no. 2, June 1993.
• “Natural Language Processing”, [Online]. Available: http://www.techopedia.com/definition/653/natural-language-processing-nlp
THANK YOU

Weitere ähnliche Inhalte

Was ist angesagt?

6. Khalil Sima'an (UVA) Statistical Machine Translation
6. Khalil Sima'an (UVA) Statistical Machine Translation6. Khalil Sima'an (UVA) Statistical Machine Translation
6. Khalil Sima'an (UVA) Statistical Machine Translation
RIILP
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation
RIILP
 

Was ist angesagt? (18)

A Review on a web based Punjabi t o English Machine Transliteration System
A Review on a web based Punjabi t o English Machine Transliteration SystemA Review on a web based Punjabi t o English Machine Transliteration System
A Review on a web based Punjabi t o English Machine Transliteration System
 
Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translation
 
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
[PACLING2019] Improving Context-aware Neural Machine Translation with Target-...
 
Machine Translation: What it is?
Machine Translation: What it is?Machine Translation: What it is?
Machine Translation: What it is?
 
2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?
2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?
2013 ALC Boston: Your Trained Moses SMT System doesn't work. What can you do?
 
Pbsmt presenation waleed_oransa_29_april2010
Pbsmt presenation waleed_oransa_29_april2010Pbsmt presenation waleed_oransa_29_april2010
Pbsmt presenation waleed_oransa_29_april2010
 
Part of speech tagging for Arabic
Part of speech tagging for ArabicPart of speech tagging for Arabic
Part of speech tagging for Arabic
 
6. Khalil Sima'an (UVA) Statistical Machine Translation
6. Khalil Sima'an (UVA) Statistical Machine Translation6. Khalil Sima'an (UVA) Statistical Machine Translation
6. Khalil Sima'an (UVA) Statistical Machine Translation
 
"Machine Translation 101" and the Challenge of Patents
"Machine Translation 101" and the Challenge of Patents"Machine Translation 101" and the Challenge of Patents
"Machine Translation 101" and the Challenge of Patents
 
Parallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologiesParallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologies
 
part of speech tagger for ARABIC TEXT
part of speech tagger for ARABIC TEXTpart of speech tagger for ARABIC TEXT
part of speech tagger for ARABIC TEXT
 
Classification of Machine Translation Outputs Using NB Classifier and SVM for...
Classification of Machine Translation Outputs Using NB Classifier and SVM for...Classification of Machine Translation Outputs Using NB Classifier and SVM for...
Classification of Machine Translation Outputs Using NB Classifier and SVM for...
 
Tamil Morphological Analysis
Tamil Morphological AnalysisTamil Morphological Analysis
Tamil Morphological Analysis
 
Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSIS
 
Introduction To Translation Technologies
Introduction To Translation TechnologiesIntroduction To Translation Technologies
Introduction To Translation Technologies
 

Andere mochten auch

Localization and globalization in c#
Localization and globalization in c#Localization and globalization in c#
Localization and globalization in c#
PaYal Umraliya
 
Machine translation with statistical approach
Machine translation with statistical approachMachine translation with statistical approach
Machine translation with statistical approach
vini89
 

Andere mochten auch (20)

Machine Tanslation
Machine TanslationMachine Tanslation
Machine Tanslation
 
Escaping style and script data
Escaping style and script dataEscaping style and script data
Escaping style and script data
 
Sec16.3: Reordering Integration
Sec16.3: Reordering IntegrationSec16.3: Reordering Integration
Sec16.3: Reordering Integration
 
Designing e-Learning Content for Localization
Designing e-Learning Content for LocalizationDesigning e-Learning Content for Localization
Designing e-Learning Content for Localization
 
7. ebmt based on st sm
7. ebmt based on st sm7. ebmt based on st sm
7. ebmt based on st sm
 
Summary of Rule-based Reordering Space in Statistical Machine Translation
Summary of Rule-based Reordering Space in Statistical Machine TranslationSummary of Rule-based Reordering Space in Statistical Machine Translation
Summary of Rule-based Reordering Space in Statistical Machine Translation
 
Towards OpenLogos Hybrid Machine Translation - Anabela Barreiro
Towards OpenLogos Hybrid Machine Translation - Anabela BarreiroTowards OpenLogos Hybrid Machine Translation - Anabela Barreiro
Towards OpenLogos Hybrid Machine Translation - Anabela Barreiro
 
A statistical approach to machine translation
A statistical approach to machine translationA statistical approach to machine translation
A statistical approach to machine translation
 
Data Localization and Translation
Data Localization and TranslationData Localization and Translation
Data Localization and Translation
 
Going Global? The ABC of Localization-Friendly Content
Going Global? The ABC of Localization-Friendly ContentGoing Global? The ABC of Localization-Friendly Content
Going Global? The ABC of Localization-Friendly Content
 
Translation & Localization
Translation & LocalizationTranslation & Localization
Translation & Localization
 
Statistical machine translation in a few slides
Statistical machine translation in a few slidesStatistical machine translation in a few slides
Statistical machine translation in a few slides
 
Machine Translation: Latest Innovations and their Impact on Commercial Transl...
Machine Translation: Latest Innovations and their Impact on Commercial Transl...Machine Translation: Latest Innovations and their Impact on Commercial Transl...
Machine Translation: Latest Innovations and their Impact on Commercial Transl...
 
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Manuel Herranz, Pangean...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Manuel Herranz, Pangean...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Manuel Herranz, Pangean...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Manuel Herranz, Pangean...
 
WEBINAR: TAUS Outlook 2013
WEBINAR: TAUS Outlook 2013WEBINAR: TAUS Outlook 2013
WEBINAR: TAUS Outlook 2013
 
TAUS webinar The Big Picture View On The Translation Industry, March 2013
TAUS webinar The Big Picture View On The Translation Industry, March 2013TAUS webinar The Big Picture View On The Translation Industry, March 2013
TAUS webinar The Big Picture View On The Translation Industry, March 2013
 
Localization and globalization in c#
Localization and globalization in c#Localization and globalization in c#
Localization and globalization in c#
 
Localization framework
Localization frameworkLocalization framework
Localization framework
 
Machine translation with statistical approach
Machine translation with statistical approachMachine translation with statistical approach
Machine translation with statistical approach
 
TAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engine
TAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engineTAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engine
TAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engine
 

Ähnlich wie Assamese to English Statistical Machine Translation

Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
Abdullah al Mamun
 
Mtvectorspace 161101214722
Mtvectorspace 161101214722Mtvectorspace 161101214722
Mtvectorspace 161101214722
LinkedIn
 
Mtvectorspace 161101214722
Mtvectorspace 161101214722Mtvectorspace 161101214722
Mtvectorspace 161101214722
LinkedIn
 

Ähnlich wie Assamese to English Statistical Machine Translation (20)

Text Representations for Deep learning
Text Representations for Deep learningText Representations for Deep learning
Text Representations for Deep learning
 
Experiments with Different Models of Statistcial Machine Translation
Experiments with Different Models of Statistcial Machine TranslationExperiments with Different Models of Statistcial Machine Translation
Experiments with Different Models of Statistcial Machine Translation
 
project present
project presentproject present
project present
 
Add more Speech API to your bot
Add more Speech API to your botAdd more Speech API to your bot
Add more Speech API to your bot
 
An Arabizi-English Social Media Statistical Machine Translation System
An Arabizi-English Social Media Statistical Machine Translation SystemAn Arabizi-English Social Media Statistical Machine Translation System
An Arabizi-English Social Media Statistical Machine Translation System
 
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Beijing, Chengqing Zong, Casia...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Beijing, Chengqing Zong, Casia...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Beijing, Chengqing Zong, Casia...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Beijing, Chengqing Zong, Casia...
 
How to Translate from English to Khmer using Moses
How to Translate from English to Khmer using MosesHow to Translate from English to Khmer using Moses
How to Translate from English to Khmer using Moses
 
Machine Translation System: Chhattisgarhi to Hindi
Machine Translation System: Chhattisgarhi to HindiMachine Translation System: Chhattisgarhi to Hindi
Machine Translation System: Chhattisgarhi to Hindi
 
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABIRULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
 
Rule Based Transliteration Scheme for English to Punjabi
Rule Based Transliteration Scheme for English to PunjabiRule Based Transliteration Scheme for English to Punjabi
Rule Based Transliteration Scheme for English to Punjabi
 
Intern presentation
Intern presentationIntern presentation
Intern presentation
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Progress on Bangla Text-To-Speech System by Dr. M. Shahidur Rahman
Progress on Bangla Text-To-Speech System by Dr. M. Shahidur RahmanProgress on Bangla Text-To-Speech System by Dr. M. Shahidur Rahman
Progress on Bangla Text-To-Speech System by Dr. M. Shahidur Rahman
 
“Neural Machine Translation for low resource languages: Use case anglais - wo...
“Neural Machine Translation for low resource languages: Use case anglais - wo...“Neural Machine Translation for low resource languages: Use case anglais - wo...
“Neural Machine Translation for low resource languages: Use case anglais - wo...
 
Mtvectorspace 161101214722
Mtvectorspace 161101214722Mtvectorspace 161101214722
Mtvectorspace 161101214722
 
Mtvectorspace 161101214722
Mtvectorspace 161101214722Mtvectorspace 161101214722
Mtvectorspace 161101214722
 
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language ModelsIRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
 
Deep network notes.pdf
Deep network notes.pdfDeep network notes.pdf
Deep network notes.pdf
 
Integration of speech recognition with computer assisted translation
Integration of speech recognition with computer assisted translationIntegration of speech recognition with computer assisted translation
Integration of speech recognition with computer assisted translation
 
Language translation system p
Language translation system pLanguage translation system p
Language translation system p
 

Kürzlich hochgeladen

%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 

Kürzlich hochgeladen (20)

WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 

Assamese to English Statistical Machine Translation

  • 1. Assamese- ENGLISH Statistical Machine Translation Using Moses PRESENTED BY KALYANEE KANCHAN BARUAH AND PRANJAL DAS
  • 2. CONTENTS • INTRODUCTION • LITERATURE REVIEW • IMPLEMENTATION • TRANSLITERATION IN TRANSLATION • EVALUATION • CONCLUSION AND FURURE WORK • REFERENCES
  • 3. INTRODUCTION What is Natural Language Processing ? • Natural Language Processing (NLP) is the ability of a computer program to understand human speech as it is spoken. • NLP automates the translation between computers and humans.
  • 4. WHAT IS MACHINE TRANSLATION • Machine translation (MT) is automated translation. It is the process by which computer software is used to translate a text from one natural language (such as Assamese) to another (such as English).
  • 5. WHAT IS MACHINE TRANSLATION • The ideal aim of machine translation systems is to produce the best possible translation without human assistance. Basically every machine translation system requires programs for translation and automated dictionaries and grammars to support translation.
  • 6. ADVANTAGES OF MACHINE TRANSLATION • Quick Translation • Low price • Confidentiality • Online translation and translation of web page content • Overcomes technological barriers
  • 7. PROBLEMS IN MACHINE TRANSLATION • Translation is not straightforward • Word order • Word sense • Idioms
  • 8. TYPES OF MACHINE TRANSLATION • BILINGUAL – MT systems that produce translations between any two particular languages. • MULTILINGUAL – MT systems that produce translations for any given pair of languages. – They are preferred to bi-directional and bi-lingual as they have ability to translate from any given language to any other given language and vice versa
  • 9. SOME EXISTING MT SYSTEMS • Google Translate • Systran • Bing Translator • Bable Fish • Apertium
  • 10. SOME MAJOR MT PROECTS IN INDIA • Anglabharat (and Anubharati) • Anusaaraka • MaTra • UCSG-based English-Kannada MT • Tamil-Hindi Anusaaraka and English-Tamil MT • Anuvadak English-Hindi software • Sampark
  • 12. STATISTICAL MACHINE TRANSLATION • Enables us to automatically build machine translation systems using statistical models trained by text data. • Every sentence in a language has a possible translation in another language.
  • 14. LANGUAGE MODEL • Gives the probability of a sentence • Uses an n-gram model • IRSTLM is used to develop the language model • The probability of a sentence, $P(S)$, is broken down by the chain rule into the probability of each word given the words before it: $P(S) = P(w_1, w_2, \ldots, w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2)\,P(w_4 \mid w_1 w_2 w_3) \cdots P(w_n \mid w_1 w_2 \ldots w_{n-1})$
  • 15. LANGUAGE MODEL Suppose that from a large corpus we have the following bigram probabilities: • Eat on: .16 • Eat some: .06 • Eat lunch: .06 • Eat dinner: .05 • Eat at: .04 • Eat a: .04 • Eat Indian: .04 • Eat today: .03 • Eat Thai: .03 • Eat breakfast: .03 • Eat in: .02 • Eat Chinese: .02 • Eat Mexican: .02 • Eat tomorrow: .01 • Eat dessert: .007 • Eat British: .001
  • 16. LANGUAGE MODEL • <start> I: .25 • <start> I’d: .06 • <start> Tell: .04 • <start> I’m: .02 • I want: .32 • I would: .29 • I don’t: .08 • I have: .04 • Want to: .65 • Want a: .05 • Want some: .04 • Want Thai: .01 • To eat: .26 • To have: .14 • To spend: .09 • To be: .02 • British food: .60 • British restaurant: .15 • British cuisine: .01 • British lunch: .01
  • 17. LANGUAGE MODEL Then, under the bigram model, the probability of the sentence “I want to eat British food” is P(I want to eat British food) = P(I | <start>) P(want | I) P(to | want) P(eat | to) P(British | eat) P(food | British) = .25 * .32 * .65 * .26 * .001 * .60 ≈ .0000081
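The bigram calculation on the language model slides can be checked with a few lines of Python. This is only a sketch: the probabilities are the six values from the slides that the example sentence needs, the function name `sentence_probability` is made up for illustration, and unseen bigrams are simply given probability 0, whereas a toolkit such as IRSTLM would apply smoothing.

```python
# Bigram probabilities copied from the slides; "<start>" marks the sentence start.
# Keys are lower-cased where the slides capitalise the first word ("Want to", "To eat").
BIGRAM_PROB = {
    ("<start>", "I"): 0.25,
    ("I", "want"): 0.32,
    ("want", "to"): 0.65,
    ("to", "eat"): 0.26,
    ("eat", "British"): 0.001,
    ("British", "food"): 0.60,
}


def sentence_probability(sentence, bigram_prob, start="<start>"):
    """Multiply P(w_i | w_{i-1}) over all words of the sentence (bigram model)."""
    words = [start] + sentence.split()
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        # Assumption: an unseen bigram gets probability 0 (no smoothing).
        prob *= bigram_prob.get((prev, cur), 0.0)
    return prob


p = sentence_probability("I want to eat British food", BIGRAM_PROB)
print(f"P(sentence) = {p:.3g}")
```

Multiplying the six factors gives roughly 8.1e-06, so even a fluent sentence receives a very small absolute probability; what matters to the decoder is how candidates compare with each other.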
  • 18. TRANSLATION MODEL • Computes the probability of a source sentence S given a target sentence T, i.e. P(S|T). • May be word-based or phrase-based. • The output of the TM is fed into the Moses decoder. • GIZA++, along with mkcls, is used to develop the translation model.
  • 19. TRANSLATION MODEL Example : জয়পুৰ ৰাজস্থানৰ এখন বিখযাত চহৰ Jaipur is a famous city of Rajasthan
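  • 20. DECODER • Maximizes the probability of the translated text • Searches for the target sentence T that maximizes P(T|S), which by Bayes' rule is equivalent to maximizing P(S|T) P(T): $\hat{T} = \arg\max_{T} P(S \mid T)\,P(T)$, where the argmax is the decoding algorithm, $P(S \mid T)$ is the translation model, and $P(T)$ is the language model.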
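The decoder's objective can be illustrated with a toy example that scores a fixed list of candidate translations. Everything here is hypothetical: the function name `decode`, the two candidates, and all probability values are invented for illustration. A real decoder such as Moses does not enumerate candidates like this; it searches a very large hypothesis space.

```python
def decode(source, candidates, translation_prob, language_prob):
    """Return the candidate T that maximises P(S|T) * P(T)."""
    return max(
        candidates,
        key=lambda t: translation_prob.get((source, t), 0.0) * language_prob.get(t, 0.0),
    )


# Source sentence taken from the translation model slide.
source = "জয়পুৰ ৰাজস্থানৰ এখন বিখযাত চহৰ"
fluent = "Jaipur is a famous city of Rajasthan"
literal = "Jaipur Rajasthan a famous city"

# Made-up scores: the word-for-word candidate fits the source slightly better,
# but the language model strongly prefers the fluent English sentence.
translation_prob = {(source, fluent): 0.20, (source, literal): 0.30}
language_prob = {fluent: 0.010, literal: 0.0001}

best = decode(source, [fluent, literal], translation_prob, language_prob)
print(best)
```

The point of the product is visible in the numbers: the translation model alone would pick the word-for-word candidate, while the language model term pulls the choice toward well-formed English.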
  • 22. HOW OUR SMT WORKS
  • 23. IMPLEMENTATION • Install all packages required by Moses • Install GIZA++ • Install IRSTLM • Training • Tuning • Generate output (decoding)
  • 24. TRAINING THE MOSES DECODER • Prepare data • Run GIZA++ • Align words • Get lexical translation table • Extract phrases • Build lexicalized reordering model • Build generation models • Create configuration file
  • 25. PREPARING THE DATA • Tokenising - inserting spaces between words and punctuation. • Truecasing - converting the first word of each sentence to its most probable casing. • Cleaning - removing empty lines, redundant spaces, and lines that are too short or too long.
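The tokenising and cleaning steps can be sketched in Python as follows. This is a simplified illustration and not the scripts shipped with Moses: the function names, the punctuation set, and the length limits `min_len` and `max_len` are assumptions, and the naive regular expression would also split decimals and abbreviations. Truecasing is left out because it needs casing statistics collected from the training data.

```python
import re

# Assumed punctuation set, including the danda "।" that ends Assamese sentences.
PUNCTUATION = re.compile(r'([.,!?;:()"।])')


def tokenise(line):
    """Put spaces around punctuation and collapse redundant whitespace."""
    spaced = PUNCTUATION.sub(r" \1 ", line)
    return " ".join(spaced.split())


def clean_pairs(pairs, min_len=1, max_len=80):
    """Keep only sentence pairs where both sides have min_len..max_len tokens."""
    kept = []
    for src, tgt in pairs:
        src_tok, tgt_tok = tokenise(src), tokenise(tgt)
        n_src, n_tgt = len(src_tok.split()), len(tgt_tok.split())
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            kept.append((src_tok, tgt_tok))
    return kept


print(tokenise("Jaipur, popularly known as the Pink City, is the capital."))
print(clean_pairs([("", "An English line with no source."), ("a b।", "A b.")]))
```

Pairs are filtered together rather than line by line so that the two sides of the parallel corpus stay aligned after cleaning.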
  • 26. EXAMPLE PARALLEL DATA (ass-eng1.as | ass-eng1.en) • বিকাণেবি ভূ বিয়া আিু বিঠাই হৈণে বিকাণেিত বকবিি পিা অবত উত্তি সািগ্ৰীসিূৈি বভতিত বকেুিাি। | The famous Bikaneri Bhujias and sweets are some of the best items to purchase in Bikaner. • ভািতিৰ্ষি গ ালপীয়া চৈি িাণি খ্যাত িয়পুি, িািস্থাি িািযি িািধািী। | Jaipur, popularly known as the Pink City, is the capital of Rajasthan state, India. • অম্বি গপণলচটণটা হৈণে গিা ল আিু বৈন্দু স্থাপতয বিদ্যাি আদ্ৰ্ষ উদ্াৈিে। | The Amber Palace is a classic example of Mughal and Hindu architecture. • কিক িৃন্দািি হৈণে িয়পুিি এখ্ি িিবিয় িিণভাি স্থাি। | Kanak Vrindavan is a popular picnic spot in Jaipur. • িয়পুি িািষলি িূবত্তষ, িীলা কলৈ আিু িািস্থািী গিাতাি িাণিও বিখ্যাত। | Jaipur is also famous for marble statues, blue pottery and the Rajasthani shoes
  • 27. SAMPLE OUTPUTS (Input Assamese sentence | Output English sentence) • জয়পুৰ ৰাজস্থানৰ এখন বিখযাত চহৰ । | Jaipur is a famous city of Rajasthan . • তাজমহল আগ্ৰাত অৱবস্থত । | the Taj Mahal , is located in the heart of the Agra . • জামা মছবজদ শ্বাহজাহানন বনমমান কবৰবছল। | Jama Masjid built by Shahjahan . • অন্ধ্ৰপ্ৰনদশ ভাৰতৰ এখন অনযতম ৰাজযৰ বভতৰত এক। | Andhra Pradesh is one of the state of one of India . • গুৱাহাটী অসমৰ ৰাজধানী। | Guwahati is connected by the capital of the State . • আগ্ৰা এখন বিখযাত চহৰ | Agra is the one of the famous city • বদল্লী ভাৰতৰ ৰাজধানী। | Delhi is the capital of India .
  • 28. PROBLEMS WITH PROPER NOUNS (Input Assamese sentence | Output English sentence) • কানাদা এখন বিশাল দদশ । | কানাদা is a vast country . • মুলতান চহৰখন ৰাজস্থানৰ পৰা ৯৯৯ বক.বম. দুৰত্বত অৱবস্থত। | মুলতান from the city is located at a distance of ৯৯৯ of Rajasthan . • পানাবজ দ াৱাৰ ৰাজধানী । | the capital of Goa , পানাবজ|
  • 29. TRANSLITERATION IN TRANSLATION • Transliteration – conversion of text from one script to another • Proper nouns that are not in our corpus are left untranslated. • For example, translating “কানাদা এখন বিশাল দদশ” gives “কানাদা is a vast country.” because ‘কানাদা’ is not in our corpus.
  • 30. TRANSLITERATION IN TRANSLATION • Store each Assamese letter and its English transliteration in a Perl script. For example: ক -> k, খ -> kh, গ -> g • Pipe the Moses output through this Perl script using the following command: echo 'কানাদা এখন বিশাল দদশ' | ~/mymoses/bin/moses -f ~/work/mert-work/moses.ini | ./transliterate.pl • Output: kanada is a vast country .
  • 31. IMPLEMENTING TRANSLITERATION (Input Assamese sentence | Output before transliteration | Output after transliteration) • কানাদা এখন বিশাল দদশ | কানাদা is a vast country . | kanada is a vast country . • মুলতান চহৰখন ৰাজস্থানৰ পৰা ৯৯৯ বক.বম. দুৰত্বত অৱবস্থত। | মুলতান from the city is located at a distance of ৯৯৯ of Rajasthan . | multan from the city is located at a distance of 999 of Rajasthan . • পানাবজ দ াৱাৰ ৰাজধানী । | the capital of Goa , পানাবজ| | the capital of Goa , panaji .
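The transliteration post-processing step can be sketched in Python. The slides use a Perl script whose contents are not shown, so this is an assumed equivalent rather than the authors' code: the name `transliterate` and the small `CHAR_MAP` table are illustrative, and only the characters needed for the examples on the slides are included. Characters outside the table, including the English words Moses already produced, pass through unchanged.

```python
# Tiny hand-written sample of an Assamese-to-Latin character map (assumption);
# a full table would cover every letter, vowel sign and digit.
CHAR_MAP = {
    "ক": "k", "খ": "kh", "গ": "g",
    "ন": "n", "দ": "d", "ম": "m", "ল": "l", "ত": "t",
    "া": "a", "ু": "u",
    "৯": "9",
}


def transliterate(text, char_map=CHAR_MAP):
    """Replace each mapped character; leave everything else untouched."""
    return "".join(char_map.get(ch, ch) for ch in text)


# Untranslated proper noun left in the decoder output, as on the slides.
print(transliterate("কানাদা is a vast country ."))
print(transliterate("মুলতান"), transliterate("৯৯৯"))
```

A per-character table is easy to apply after decoding, but it ignores context, so the resulting spellings ("kanada", "multan") will not always match the conventional English forms.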
  • 32. EVALUATION OF BLEU SCORE • Source/Target: Assamese – English • BLEU score: 7.02 • 1/2/3/4-gram precision: 30.5/8.5/4.1/2.3
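The reported figures can be cross-checked against each other. Assuming the standard BLEU definition (brevity penalty times the geometric mean of the 1- to 4-gram precisions with uniform weights), the sketch below recomputes the geometric mean from the four precisions on the slide and derives the brevity penalty that the reported score would imply. The variable names are illustrative, and the inputs are rounded, so the result is approximate.

```python
import math

# Values reported on the evaluation slide (percentages).
precisions = [30.5, 8.5, 4.1, 2.3]
reported_bleu = 7.02

# Geometric mean of the n-gram precisions (uniform weights, standard BLEU assumed).
geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))

# Brevity penalty implied by the reported score under that assumption.
implied_bp = reported_bleu / geo_mean

print(f"geometric mean = {geo_mean:.2f}, implied brevity penalty = {implied_bp:.3f}")
```

The geometric mean comes out at about 7.03, so the reported score of 7.02 is consistent with a brevity penalty very close to 1, meaning the system outputs are about as long as the references. The low score is driven mainly by the steep drop in precision from unigrams to 4-grams.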
  • 33. CONCLUSION AND FUTURE WORK • SMT is a corpus-based approach to MT that requires a parallel corpus before translation can be undertaken. • A parallel corpus of about 2500 Assamese and English sentences was used to train the system. • The SMT system developed accepts Assamese sentences as input and generates the corresponding translation in English. • The results suggest that significant improvements can be made by increasing the size of the parallel corpus.
  • 34. CONCLUSION AND FUTURE WORK • In the future, we will try to integrate transliteration into our system. • We will try to increase the volume of our corpus so that we get a much better translation system. • We will also try to implement the translation process without using the Moses toolkit.
  • 35. REFERENCES • “Machine Translation”, [Online]. Available: http://en.wikipedia.org/wiki/Machine_translation • “Statistical Machine Translation”, [Online]. Available: http://en.wikipedia.org/wiki/Statistical_machine_translation • “Problems in Machine Translation system”, [Online]. Available: http://languagedirect.org/machine-translation/ • “Machine Translation”, [Online]. Available: http://faculty.ksu.edu.sa/homiedan/Publications/Machine%20Translation.pdf • D. D. Rao, “Machine Translation: A Gentle Introduction”, Resonance, July 1998. • S. K. Dwivedi and P. P. Sukhadeve, “Machine Translation System in Indian Perspectives”, Journal of Computer Science, vol. 6, no. 10, pp. 1082-1087, May 2010.
  • 36. REFERENCES • P. F. Brown, S. A. Della Pietra, V. J. Della Pietra and R. L. Mercer, “The mathematics of statistical machine translation: parameter estimation”, Computational Linguistics, vol. 19, no. 2, June 1993. • “Natural Language Processing”, [Online]. Available: http://www.techopedia.com/definition/653/natural-language-processing-nlp