2. Main Agenda
• Introduction to Statistical Machine Translation (SMT).
• Tools.
• Popular Machine Translation Systems.
• Machine Translation Projects in India.
• Machine Translation Tools and Punjabi Language.
• Conclusion and future work.
• References.
3. Introduction
• SMT is a form of corpus-based machine translation.
• An SMT system consists of three components:
– Language Model (LM).
– Translation Model (TM).
– Decoder.
5. Language Model (LM)
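• Gives the probability of a whole sentence s = w1 w2 ... wn. By the chain rule, this is the product of the probability of each word given all the words before it:
  $$P(s) = P(w_1, w_2, \ldots, w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 \ldots w_{n-1})$$
• N-gram model: each history is truncated to the previous n-1 words:
  $$P(w_i \mid w_1 \ldots w_{i-1}) \approx P(w_i \mid w_{i-n+1} \ldots w_{i-1})$$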
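A minimal sketch of the bigram case (n = 2), assuming maximum-likelihood (relative-frequency) estimates with no smoothing and a three-sentence toy corpus invented for illustration. `train_bigram_lm` and `sentence_prob` are hypothetical helper names, not part of any toolkit.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Count history words and bigrams, padding each sentence with <s> and </s>."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        history_counts.update(tokens[:-1])                  # every token that precedes another
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))  # (w_{i-1}, w_i) pairs
    return history_counts, bigram_counts


def sentence_prob(sentence, history_counts, bigram_counts):
    """P(s) under the bigram approximation, using relative-frequency estimates."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        if history_counts[prev] == 0:
            return 0.0  # unseen history: no smoothing in this sketch
        prob *= bigram_counts[(prev, word)] / history_counts[prev]
    return prob


corpus = ["she slept in garden", "she slept", "he slept in garden"]  # toy corpus (invented)
hist, bi = train_bigram_lm(corpus)
# P(she|<s>) * P(slept|she) * P(in|slept) * P(garden|in) * P(</s>|garden) = 2/3 * 1 * 2/3 * 1 * 1
print(sentence_prob("she slept in garden", hist, bi))
```

Without smoothing, a single unseen bigram drives the whole product to zero, which is one reason practical language-modeling toolkits apply smoothing and back-off.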
6. Translation Model (TM)
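• Estimates the translation probability P(S|T), where S is the source-language sentence and T is the target-language sentence (the noisy-channel direction used by the decoder).
• Breaks the translation process into smaller units (words, phrases).
• Example: ਉਹ ਬਾਗ ਵਿੱਚ ਸੌਂ ਗਈ। (Punjabi) → "She slept in the garden." (English)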
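One way to make P(S|T) concrete is IBM Model 1, the simplest word-based model of Brown et al., in which each source word is generated independently by some target word or by an empty NULL word. The sketch below assumes romanized Punjabi tokens for part of the example sentence (uh = ਉਹ, saun = ਸੌਂ, gai = ਗਈ), a hand-written translation table whose probabilities are invented, and ε = 1. `ibm1_prob` is a hypothetical name.

```python
def ibm1_prob(source, target, t_table, epsilon=1.0):
    """IBM Model 1: P(S|T) = eps / (l+1)^m * prod_j sum_i t(s_j | t_i), where t_0 is NULL."""
    src_words = source.split()
    tgt_words = ["NULL"] + target.split()  # position 0 is the empty (NULL) word
    l, m = len(tgt_words) - 1, len(src_words)
    prob = epsilon / (l + 1) ** m
    for s in src_words:
        # each source word may be generated by any target word, including NULL
        prob *= sum(t_table.get((s, t), 0.0) for t in tgt_words)
    return prob


# Hypothetical t(s | t) values: romanized Punjabi source word given an English word.
t_table = {
    ("uh", "she"): 0.7,
    ("saun", "slept"): 0.6,
    ("gai", "slept"): 0.3,
    ("gai", "NULL"): 0.2,
}
print(ibm1_prob("uh saun gai", "she slept", t_table))  # 0.7 * 0.6 * 0.5 / 3**3
```

A target sentence that cannot generate some source word (for example "he slept", with no entry for `("uh", "he")`) receives probability zero under this table.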
7. Decoder
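• Searches for the target sentence T that maximizes P(T|S). Since P(S) is fixed for a given input sentence, Bayes' rule gives:
– $$\hat{T} = \arg\max_T P(T \mid S) = \arg\max_T P(T)\,P(S \mid T)$$
• Starts with the null (empty) hypothesis, in which no source words have been translated yet, and extends it word by word (or phrase by phrase) until the whole source sentence is covered.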
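The search can be sketched as a small stack (beam) decoder that starts from the null hypothesis and extends it one source unit at a time, scoring each hypothesis by log P(T) + log P(S|T). Everything here is assumed for illustration: the pre-segmented romanized source units, the translation options and their probabilities, the bigram LM values, the 1e-4 floor for unseen bigrams, and the beam size. Real decoders such as Moses also use future-cost estimation, distortion limits and hypothesis recombination, which this sketch omits.

```python
import math

UNSEEN = 1e-4  # hypothetical floor probability for bigrams missing from the toy LM


def score(words, tm_logprob, bigram, final=False):
    """log P(S|T) + log P(T) for a (partial) hypothesis; adds </s> only when complete."""
    tokens = ["<s>"] + list(words) + (["</s>"] if final else [])
    lm_logprob = sum(math.log(bigram.get(pair, UNSEEN)) for pair in zip(tokens, tokens[1:]))
    return tm_logprob + lm_logprob


def decode(units, options, bigram, beam_size=5):
    """Stack decoding: stacks[n] holds hypotheses that have translated n source units."""
    # A hypothesis is (covered unit indices, target words so far, TM log-probability).
    stacks = [[] for _ in range(len(units) + 1)]
    stacks[0].append((frozenset(), (), 0.0))  # the null hypothesis: nothing translated yet
    for n in range(len(units)):
        # Keep only the best beam_size hypotheses before expanding them.
        beam = sorted(stacks[n], key=lambda h: score(h[1], h[2], bigram), reverse=True)[:beam_size]
        for covered, words, tm in beam:
            for i, unit in enumerate(units):
                if i in covered:
                    continue
                for phrase, p in options[unit]:
                    stacks[n + 1].append((covered | {i}, words + tuple(phrase.split()), tm + math.log(p)))
    best = max(stacks[-1], key=lambda h: score(h[1], h[2], bigram, final=True))
    return " ".join(best[1])


units = ["uh", "bag vich", "saun gai"]  # romanized source, pre-segmented into phrases
options = {                             # hypothetical P(s | t) for each option
    "uh": [("she", 0.6), ("he", 0.4)],
    "bag vich": [("in garden", 0.7), ("inside garden", 0.3)],
    "saun gai": [("slept", 0.8), ("sleep", 0.2)],
}
bigram = {                              # hypothetical P(w_i | w_{i-1})
    ("<s>", "she"): 0.5, ("<s>", "he"): 0.3, ("she", "slept"): 0.4, ("he", "slept"): 0.4,
    ("slept", "in"): 0.5, ("in", "garden"): 0.3, ("garden", "</s>"): 0.6,
}
print(decode(units, options, bigram))
```

Because hypotheses may cover source units in any order, the language model is what moves the Punjabi verb phrase ("saun gai") ahead of the locative phrase in the English output.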
10. LM Tools
• CMU Statistical Language Modeling (SLM) Toolkit.
– Set of Unix software tools.
– Written by Roni Rosenfeld.
• SRILM
– Developed by the SRI Speech Technology and Research Laboratory.
– Toolkit for building and applying statistical language models.
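Both toolkits can save a trained model in the ARPA back-off text format, which stores a base-10 log probability (and, for lower orders, a base-10 log back-off weight) for each n-gram. As a hedged illustration of applying such a model, the sketch below parses a tiny hand-written ARPA file (all numbers invented) and looks up a bigram, backing off to the unigram when the bigram is missing. It assumes both words are in the vocabulary and ignores `<unk>` handling; `parse_arpa` and `bigram_log10` are hypothetical names.

```python
ARPA_TEXT = r"""
\data\
ngram 1=4
ngram 2=2

\1-grams:
-1.0 <s> -0.3
-0.5 she -0.2
-0.6 slept -0.25
-0.7 </s>

\2-grams:
-0.1 <s> she
-0.2 she slept

\end\
"""


def parse_arpa(text):
    """Map each n-gram tuple to (log10 probability, log10 back-off weight)."""
    entries, order = {}, 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line in ("\\data\\", "\\end\\") or line.startswith("ngram "):
            continue  # blank lines and header lines carry no n-gram entries
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])  # e.g. "\2-grams:" -> 2
            continue
        fields = line.split()
        words = tuple(fields[1:1 + order])
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        entries[words] = (float(fields[0]), backoff)
    return entries


def bigram_log10(entries, prev, word):
    """log10 P(word | prev), backing off to the unigram when the bigram is absent."""
    if (prev, word) in entries:
        return entries[(prev, word)][0]
    return entries[(prev,)][1] + entries[(word,)][0]


lm = parse_arpa(ARPA_TEXT)
print(bigram_log10(lm, "she", "slept"))  # listed bigram
print(bigram_log10(lm, "slept", "she"))  # backs off: -0.25 + (-0.5)
```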
14. TM Tools
• GIZA++
– Implements the IBM alignment Models 1 to 5 and an HMM alignment model.
– Performs word alignment.
• MGIZA++
– Multi-threaded word alignment
– Memory optimization.
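GIZA++ training starts from IBM Model 1, whose lexical translation table is estimated with expectation-maximization (EM) before the higher models are trained. The sketch below is a from-scratch illustration of Model 1 EM, not GIZA++ code: it omits the NULL word, runs a fixed number of iterations, and uses a small German-English toy corpus (a common textbook example) only because the expected alignments are easy to check. `train_ibm1` is a hypothetical name.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """EM for IBM Model 1; returns t[(s, tw)] = t(s | tw). NULL word omitted for brevity."""
    src_vocab = {s for src, _ in pairs for s in src.split()}
    t = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(s, tw)
        total = defaultdict(float)  # expected counts c(tw)
        for src, tgt in pairs:
            tgt_words = tgt.split()
            for s in src.split():
                # E-step: share one count for s among all target words it could align to
                norm = sum(t[(s, tw)] for tw in tgt_words)
                for tw in tgt_words:
                    frac = t[(s, tw)] / norm
                    count[(s, tw)] += frac
                    total[tw] += frac
        # M-step: renormalise expected counts into conditional probabilities
        t = defaultdict(float, {(s, tw): c / total[tw] for (s, tw), c in count.items()})
    return t


corpus = [("das haus", "the house"), ("das buch", "the book"), ("ein buch", "a book")]
table = train_ibm1(corpus)
for pair in [("das", "the"), ("haus", "house"), ("buch", "book"), ("ein", "a")]:
    print(pair, round(table[pair], 3))
```

No word pair is ever observed in isolation, yet co-occurrence across sentences pulls probability mass toward das/the, haus/house, buch/book and ein/a as the iterations proceed.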
15. GIZA++ Output: the t3.final File
• First column: ids of source words.
• Second column: ids of target words.
• Third column: translation probability of the target word given the source word.
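Reading such a file takes only a few lines. A minimal sketch, assuming whitespace-separated lines in the three-column layout described on this slide; the sample lines and ids are invented, and mapping ids back to words would additionally require GIZA++'s vocabulary files, which this sketch does not read. `load_t_table` and `best_translation` are hypothetical names.

```python
from collections import defaultdict


def load_t_table(lines):
    """Parse 'source_id target_id probability' lines into {source_id: {target_id: prob}}."""
    table = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        src_id, tgt_id, prob = int(fields[0]), int(fields[1]), float(fields[2])
        table[src_id][tgt_id] = prob
    return table


def best_translation(table, src_id):
    """Target id with the highest translation probability for a given source id."""
    candidates = table[src_id]
    return max(candidates, key=candidates.get)


sample = ["0 3 0.12", "2 5 0.81", "2 7 0.19", "4 6 0.65", "4 5 0.35"]  # invented lines
t3 = load_t_table(sample)
print(best_translation(t3, 2))  # 5
```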
16. Decoder Tools
• Moses
– Automatic training of translation models for any
language pair.
– Works with SRILM and GIZA++.
• ISI ReWrite Decoder
– Performs the search (decoding) step of an SMT system.
– Works with the CMU Statistical Language Modeling Toolkit and GIZA++.
19. Machine Translation Projects in India
• AnglaBharati and Anubharati
• Anusaaraka
• MaTra
• Mantra
• UCSG-based English-Kannada MT
• UNL-based MT between English, Hindi and Marathi
• Tamil-Hindi Anusaaraka and English-Tamil MT
• English-Hindi SMT
20. Machine Translation Tools and Punjabi Language
• Punjabi University.
– Online Hindi-Punjabi and Punjabi-Hindi machine translation.
• Thapar University.
– Punjabi language server, which includes a Punjabi-UNL EnConverter and a UNL-Punjabi DeConverter.
21. Conclusion and Future Work
• Applications supporting translation to and from regional languages already exist.
• Future research directions include tree-to-string alignment templates and clause-based restructuring.
• Combining various MT techniques can lead to more efficient translation.
22. References
[01] Adam Lopez, “Statistical Machine Translation”, ACM Computing Surveys, Vol. 40, No. 3, Article 8, Aug. 2008.
[02] Durgesh Rao, “Machine Translation in India: A Brief Survey”.
[03] Franz Josef Och, “GIZA++: Training of Statistical Translation Models”, available at http://fjoch.com/GIZA++.html, accessed on 26/03/2010.
[04] Hindi to Punjabi Translation System, available at http://h2p.learnpunjabi.org, accessed on 03/04/2010.
[05] Gurpreet Singh Lehal, “A Survey of the State of the Art in Punjabi Language Processing”, Language in India, Oct. 2009.
[06] ISI ReWrite Decoder User's Manual, Version 0.2, available at http://www.isi.edu/~germann/software/ReWrite-Decoder/isi-decoder-manual.html, accessed on 12/03/2010.
[07] Jaime G. Carbonell, Teruko Mitamura, Eric H. Nyberg, “The KANT Perspective: A Critique of Pure Transfer (and Pure Interlingua, Pure Statistics, ...)”.
[08] Jayprasad J. Hegde, Ananthakrishnan R, Kavitha M, Chandra Shekhar, Ritesh Shah, Sawani Bade, Sasikumar M, “MaTra: A Practical Approach to Fully-Automatic Indicative English-Hindi Machine Translation”.
[09] Jean Senellart, Péter Dienes, Tamás Váradi, “New Generation Systran Translation System”, MT Summit VIII, Sept. 2001.
23. References (Cont.)
[10] Online translation system, available at www.translate.google.com, accessed on 03/04/2010.
[11] Online manual of the CMU Statistical Language Modeling Toolkit, available at http://mi.eng.cam.ac.uk/~prc14/toolkit_documentation.html, accessed on 15/03/2010.
[12] P. Brown, S. Della Pietra, V. Della Pietra, and R. Mercer, “The Mathematics of Statistical Machine Translation: Parameter Estimation”, Computational Linguistics, 19(2), 263-311, 1993.
[13] Parteek Bhatia, Sandeep Singh, “Punjabi Deconverter Architecture”, National Seminar on Creation of Lexical Resources for Indian Language Computing and Processing, CDAC Mumbai, March 26-28, 2007.