2. What is machine translation? Machine translation is the study of designing systems that translate from one human language into another. A machine translation system essentially takes a text in one language (called the source language) and translates it into another language (called the target language). The source and target languages are natural languages such as English and Hindi.
3. Contd… This is a hard problem, since processing natural language requires work at several levels, and complexities and ambiguities arise at each of those levels. Hence an MT system can be said to be doing natural language processing (NLP). In fact, most machine translation applications require some degree of natural language understanding to do the translation.
4. History of Machine Translation Machine translation as a discipline dates back to the early nineteen-fifties. The complexity of the problem was originally underestimated, and some early successful demonstrations of experimental systems led to unrealistic expectations which were hard to fulfil. In the early eighties, the Japanese Fifth Generation Computing Project revived interest in this work. The current approach to MT is more pragmatic and realistic.
5. Contd… It is now widely accepted that fully automatic, general-purpose, high-quality machine translation is a very difficult problem, but very useful and practical systems can nevertheless be developed by relaxing one or more of these criteria. Several useful systems have been built by doing so and are in use today. Such systems are being used to translate public announcements, weather bulletins, technical documents, and web pages.
6. Contd… Some machine translation services are starting to become available on the World Wide Web. For example, the web page of the Google search engine also provides a translation service that can translate simple sentences among a handful of languages.
7. Translation telephone technology (speech-to-speech translation) The ‘Janus’ project at the Interactive Systems Lab, Carnegie Mellon University, is working on a set of translation projects. Suppose you dial your colleague in Tokyo. You do not speak Japanese, and he does not speak English. So you need a system such that you speak into the phone in English, your speech is automatically translated into Japanese for him, he replies in Japanese, and you hear it in English.
8. Research MT System Example: the Janus Translating Phone project This prototype system allows two users to communicate in a given domain via a videoconferencing connection. Each party sees the other conversant, hears his/her original voice, and sees/hears a translation of what he/she says as subtitles, captions and synthetic speech. The situation is cooperative; that is, both users want to understand each other and collaborate via the system to achieve understanding.
9. Contd… After the record button is activated, the station accepts spoken input and first produces a paraphrase of the input sentence. Once the user has verified that the system properly understood the intended meaning, he/she activates the send button to send a translation of this intended meaning to the other side in the desired language. Various interactive correction mechanisms facilitate quick recovery should processing errors or miscommunication have altered the intended meaning.
10. Machine Translation & Artificial Intelligence MT is an important sub-discipline of the wider field of Artificial Intelligence (AI). AI (among other things) deals with getting machines to exhibit intelligent behaviour. As we might imagine, both AI and MT are interesting and challenging fields.
11. Components of MT We can divide the machine translation task into three main phases: The system first has to analyse the source language input to create some internal representation. It then typically manipulates this internal representation to transfer it to a form suitable for the target language. Finally, it generates the output in the target language.
12. [Diagram] Source Language -> Analysis -> Intermediate Representation based on source language -> Transfer -> Intermediate Representation based on target language -> Generation -> Target Language
13. Contd… A typical MT system contains components for analysis, transfer and generation, as shown in the diagram. These components incorporate a lot of knowledge about words (lexical knowledge) and about the language (linguistic knowledge). Such knowledge is stored in one or more lexicons, and possibly other sources of linguistic knowledge, such as grammars.
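The three phases described above can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions: the three-word sentence shape, the tiny English-to-Hindi (romanised) word list, the subject-object-verb reordering rule, and all function names are hypothetical and are not taken from any real MT system.

```python
# Toy sketch of the analysis -> transfer -> generation pipeline.
# The lexicon, the reordering rule and every name here are hypothetical.

# Hypothetical transfer lexicon: English word -> romanised Hindi equivalent.
TRANSFER_LEXICON = {"he": "vah", "reads": "padhta hai", "books": "kitabein"}


def analyse(sentence):
    """Analysis: build a source-side representation (subject, verb, object).

    Assumes the input is exactly three words in subject-verb-object order.
    """
    subject, verb, obj = sentence.lower().rstrip(".").split()
    return {"subject": subject, "verb": verb, "object": obj}


def transfer(source_ir):
    """Transfer: replace each source word with its target-language equivalent."""
    return {role: TRANSFER_LEXICON[word] for role, word in source_ir.items()}


def generate(target_ir):
    """Generation: linearise in subject-object-verb order for the target."""
    return " ".join([target_ir["subject"], target_ir["object"], target_ir["verb"]])


def translate(sentence):
    """Run the three phases in sequence."""
    return generate(transfer(analyse(sentence)))


print(translate("He reads books."))  # prints: vah kitabein padhta hai
```

A real system would replace each function with a far richer component (a parser, a transfer grammar, a morphological generator), but the data flow between the phases stays the same.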
14. Contd… The user interface is invariably a crucial part of most MT systems. The interface allows the user to verify, disambiguate and, if necessary, correct the output of the system. Another common feature of NLP work is the use of large ‘corpora’. A corpus is a large collection of text which is used for acquiring the required lexical and linguistic knowledge.
15. Contd… Some systems prefer to split the lexicon into a source lexicon, a target lexicon, and a transfer lexicon that maps between the two. An MT lexicon typically needs to be much more formal, precise and elaborate than a typical human dictionary, since it is meant for mechanical processing and not for reading by humans. The lexicon plays a central role in modern MT systems.
16. Lexicon The lexicon is an important component of an MT system. A lexicon contains all the relevant information about words and phrases that is required for the various levels of analysis and generation. A typical lexicon entry for a word would contain the following information about the word: the part of speech, and information about the equivalent word in the target language.
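As a rough sketch of what such an entry might look like in code, the snippet below stores the two pieces of information named above: the part of speech and the target-language equivalents. The class name, field names and the romanised Hindi equivalents are illustrative assumptions, not the format of any particular MT lexicon.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    """One hypothetical lexicon entry: a word, its part of speech,
    and its candidate equivalents in the target language."""
    word: str
    part_of_speech: str
    target_equivalents: list = field(default_factory=list)


# Illustrative entries only; the romanised Hindi equivalents are assumptions.
LEXICON = {
    "party": LexiconEntry("party", "noun", ["dal", "daavat"]),
    "join": LexiconEntry("join", "verb", ["shaamil hona"]),
}


def lookup(word):
    """Return the entry for a word (case-insensitive), or None if unknown."""
    return LEXICON.get(word.lower())


entry = lookup("Party")
print(entry.part_of_speech, entry.target_equivalents)
```

Note that "party" carries two equivalents: choosing between them is exactly the ambiguity problem that the later levels of analysis have to resolve.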
17. Approaches to MT Based on how closely the internal representation depends on the source and target languages, approaches to MT can be divided into three major classes: Direct. Transfer-based. Inter-lingual.
18. A direct MT system tries to directly map the source language to the target language, and is therefore highly dependent on both the source and target languages. A transfer-based approach first converts the source language into an internal representation (IRs) which is dependent on the source but not the target language. The system then transforms IRs into a form IRt which is independent of the source language and depends only on the target language, and finally generates the target language output from IRt.
19. … The inter-lingual approach converts the input into a single internal representation (IR) that is independent of both source and target languages, and then converts from this into the output.
20. Levels of Natural Language Processing Dealing with natural language typically requires processing at various levels. In increasing order of difficulty, they are: The Lexical Level (or the Word Level). The Syntactic Level (or the Sentence Level). The Semantic Level (or the Meaning Level). The Discourse and Pragmatic Level (or the Conversation Context Level).
21. The Lexical Level This level deals with looking at the input string of characters and separating them into tokens, which may be words, spaces or punctuation. This level also deals with issues like hyphenated words and misspelt words. It is the lexical level which tells us that the input "he joined the parti" consists of four words, of which the last is incorrect. This level is sometimes called ‘tokenisation’ or ‘lexical analysis’.
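A minimal sketch of this level, assuming a tiny hypothetical word list in place of a real lexicon: the tokeniser below splits the input into words (keeping hyphenated words whole) and punctuation marks, and a second step flags words that are not in the word list. Unlike the description above, it discards whitespace instead of emitting it as tokens.

```python
import re

# Hypothetical word list standing in for a real lexicon.
KNOWN_WORDS = {"he", "joined", "the", "party"}


def tokenise(text):
    """Split text into word tokens (hyphenated words stay whole)
    and single punctuation tokens; whitespace is dropped."""
    return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*|[^\w\s]", text)


def unknown_words(tokens):
    """Return the word tokens that are missing from the word list."""
    return [t for t in tokens if t[0].isalpha() and t.lower() not in KNOWN_WORDS]


tokens = tokenise("he joined the parti")
print(tokens)                 # ['he', 'joined', 'the', 'parti']
print(unknown_words(tokens))  # ['parti']
```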
22. The Syntactic Level This level deals with identifying the structure of a sentence and verifying whether a sentence is grammatically correct. This level typically consists of a ‘parser’ which looks at the grammar of the language and the input sentence, and tries to form a ‘parse tree’. If it can form a parse tree, the sentence is syntactically correct, and the parse tree gives us the structure and the function of the various components.
23. For example, a typical English sentence would consist of a subject and a predicate. The subject is normally a noun phrase and the predicate is a verb phrase, and so on. The syntactic level tells us that the sentence "He the party joined" is (syntactically) incorrect, even though each word in it is (lexically) correct.
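The idea of a parser that either builds a parse tree or rejects the sentence can be sketched with a very small hand-written recogniser. The grammar used here (S -> NP VP, NP -> Pron | Det Noun, VP -> Verb NP), the part-of-speech table and the function names are all simplifying assumptions chosen to fit the example sentence; real grammars are far larger.

```python
# Hypothetical part-of-speech table for the example vocabulary.
POS = {"he": "Pron", "the": "Det", "party": "Noun", "joined": "Verb"}


def parse_np(tags, i):
    """NP -> Pron | Det Noun. Return (tree, next_index) or None."""
    if i < len(tags) and tags[i][1] == "Pron":
        return ("NP", tags[i][0]), i + 1
    if i + 1 < len(tags) and tags[i][1] == "Det" and tags[i + 1][1] == "Noun":
        return ("NP", tags[i][0], tags[i + 1][0]), i + 2
    return None


def parse_vp(tags, i):
    """VP -> Verb NP. Return (tree, next_index) or None."""
    if i < len(tags) and tags[i][1] == "Verb":
        np = parse_np(tags, i + 1)
        if np:
            return ("VP", tags[i][0], np[0]), np[1]
    return None


def parse_sentence(sentence):
    """S -> NP VP. Return a parse tree, or None if no tree can be formed."""
    tags = [(w, POS.get(w)) for w in sentence.lower().split()]
    np = parse_np(tags, 0)
    if not np:
        return None
    vp = parse_vp(tags, np[1])
    if vp and vp[1] == len(tags):  # the whole input must be consumed
        return ("S", np[0], vp[0])
    return None


print(parse_sentence("He joined the party"))
print(parse_sentence("He the party joined"))  # None: no parse tree exists
```

Both sentences contain only known words, so the lexical level accepts them; only the word order lets the recogniser tell them apart.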
24. The Semantic Level This level deals with the meaning of the input and its components. It is the semantic level which tells us that the sentence "He ate the party" is semantically incorrect, though it is lexically and syntactically well formed. In general, semantic analysis involves knowledge about the world, or at least the relevant aspects of the world.
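One simple way to approximate this kind of check is to record, for each verb, which semantic classes its object may belong to. The sketch below does that with two small hand-made tables; the class labels, the tables and the function name are hypothetical, and a real system would need far broader world knowledge.

```python
# Hypothetical semantic classes for a few nouns.
SEMANTIC_CLASS = {"party": "event", "apple": "food", "club": "organisation"}

# Hypothetical restrictions: the classes each verb accepts as its object.
OBJECT_RESTRICTION = {
    "ate": {"food"},
    "joined": {"event", "organisation"},
}


def is_semantically_plausible(verb, obj):
    """Return True if the object's class is allowed for the verb.

    Verbs without a recorded restriction are accepted by default.
    """
    allowed = OBJECT_RESTRICTION.get(verb)
    if allowed is None:
        return True
    return SEMANTIC_CLASS.get(obj) in allowed


print(is_semantically_plausible("ate", "party"))     # False
print(is_semantically_plausible("joined", "party"))  # True
```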
25. The Conversation Context Level This level deals with the information carried across multiple sentences, and with information that is not explicit in the input but is implicit in the socio-cultural context of the input passage or conversation. For example, the expected answer to the question "Do you know what the time is?" is something like "4 p.m.", and not just "Yes", though the latter is lexically, syntactically and semantically accurate.
26. Issues in Machine Translation Machine translation (and natural language processing) is a difficult problem. There are two main, related reasons. The first reason is that natural language is highly ambiguous. The ambiguity occurs at all levels: lexical, syntactic, semantic and pragmatic. A given word or sentence can have more than one meaning. For example, the word "party" could mean a political party or a social event, and deciding the suitable one in a particular case is crucial to getting the right analysis and therefore the right translation.
27. The second reason is that when humans use natural language, they use an enormous amount of common sense and knowledge about the world, which helps to resolve the ambiguity. For example, in "He went to the bank, but it was closed for lunch", we can infer that ‘bank’ refers to a financial institution and not a river bank, because we know from our knowledge of the world that only the former type of bank can be closed for lunch.
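A crude stand-in for this kind of world knowledge is a list of cue words associated with each sense of an ambiguous word; the sense whose cues overlap most with the sentence wins. The cue lists and function name below are hypothetical, and this word-overlap heuristic is a simplification chosen for illustration, not how humans (or serious MT systems) resolve ambiguity.

```python
import re

# Hypothetical cue words for two senses of "bank".
SENSE_CUES = {
    "financial institution": {"closed", "lunch", "money", "loan", "account"},
    "river bank": {"river", "water", "fishing", "shore"},
}


def disambiguate(sentence, senses=SENSE_CUES):
    """Pick the sense whose cue words overlap most with the sentence.

    Ties are resolved in favour of the sense listed first.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return max(senses, key=lambda sense: len(words & senses[sense]))


print(disambiguate("He went to the bank, but it was closed for lunch"))
# prints: financial institution
```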
28. The Statistical Approach (Warren Weaver, 1949) Proponents of this approach consider only the translation of individual sentences. Usually, there are many acceptable translations of a particular sentence, the choice among them being largely a matter of taste. They take the view that every sentence in one language is a possible translation of any sentence in the other.
29. They assign to every pair of sentences (S, T) a probability P(T|S), i.e. the probability that a translator will produce T in the target language when presented with S in the source language. Given a sentence T in the target language, they seek the sentence S from which the translator produced T. The chance of error is minimized by choosing the sentence S that is most probable given T. Thus, they wish to choose S so as to maximize P(S|T).
30. Using Bayes' theorem: P(S|T) = P(S) P(T|S) / P(T). The denominator on the right of this equation does not depend on S, so it suffices to choose the S that maximizes the product P(S) P(T|S), where P(S) is the language model probability of S, and P(T|S) is the translation probability of T given S.
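The decision rule above, choosing the S that maximizes P(S) P(T|S), can be sketched as a search over a handful of candidate sentences. All probabilities below are made-up numbers for illustration, the romanised Hindi sentence is an assumed example of T, and the function and table names are hypothetical; a real system estimates both models from large corpora and searches a vastly larger space.

```python
# Made-up language model probabilities P(S) for three candidate sentences.
LANGUAGE_MODEL = {
    "he joined the party": 0.60,
    "he the party joined": 0.05,
    "he joined the feast": 0.35,
}

# Made-up translation probabilities P(T|S) for one fixed observed sentence T.
OBSERVED_T = "vah party mein shaamil hua"
TRANSLATION_MODEL = {
    (OBSERVED_T, "he joined the party"): 0.40,
    (OBSERVED_T, "he the party joined"): 0.50,
    (OBSERVED_T, "he joined the feast"): 0.10,
}


def score(s, t):
    """Return P(S) * P(T|S); P(T) is omitted because it does not depend on S."""
    return LANGUAGE_MODEL[s] * TRANSLATION_MODEL[(t, s)]


def decode(t, candidates):
    """Choose the candidate S that maximizes P(S) * P(T|S)."""
    return max(candidates, key=lambda s: score(s, t))


print(decode(OBSERVED_T, list(LANGUAGE_MODEL)))  # prints: he joined the party
```

In this toy setting the translation model alone slightly prefers the ungrammatical word order, but the language model assigns it such a low P(S) that the product favours the well-formed sentence, which is the division of labour the two factors are meant to provide.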
31. Conclusion Two phenomena have given a new impetus to machine translation work: the globalisation of the world economy, and the explosion of the Internet and World Wide Web. Both these developments mean that there is a need for making an immense collection of natural language documents available to a multilingual global audience, and translation tools and systems can go a long way in meeting that need.
32. The global translation market is estimated to be at least 12 billion dollars. Systems that automatically translate Kalidasa and Shakespeare may still be a distant dream, but systems that translate stock market reports, weather bulletins and technical measures are a reality today, and will continue to play an increasingly important role in the society of the next millennium.