Anabela Barreiro
barreiro_anabela@hotmail.com
FLUP & CLUP-Linguateca
New York University
New Tools and Resources to Support
Machine Translation
Mestrado em Tradução Jurídica e Empresarial
Anabela Barreiro Lisboa, 8 January 2008
Outline
Human Translation vs Machine Translation
A clear distinction of objective and purpose must be established
between human translation and machine translation!
•They use different methods
•They apply to different types of texts
•They serve different purposes
•They face different barriers
•They are NOT in competition!
Human Translation
Professional translation requires:
•profound knowledge of the source language and native
proficiency in the target language
•above-average writing skills
•insightful knowledge of the sociocultural aspects of the
source and target languages
•knowledge of the grammar of both languages, their
writing conventions, and the situational and cultural context
•in the case of scientific and technical translation, subject-matter
knowledge, including the terminology of the
field or knowledge domain.
Human Translation
Translation theory has long dealt with controversial
issues:
•problems related to privileging meaning over form
•visibility or invisibility of the translator
•being faithful to the author or trying to make the text
accessible to the reader (and which kind of reader)
•giving value to the source language culture (foreignise) or
making the text suitable for the target language culture
(domesticate)
•allowing languages/cultures with more impact to
predominate over languages/cultures with less impact, or being
creative, etc.
Human Translation
The most relevant aspect in translation is to define the
purpose of each translation, which is related to the
characteristics of each text.
… And to define paraphrasing capabilities.
Human Translation: Types of Texts
A certain subjectivity and distance from the source
language text is allowed in the translation of literary texts, for the
sake of maintaining the artistic and aesthetic aspects of the
target language text [Hermans, 1985] [Landers, 2001].
Literary translation may be considered an ART [Leighton,
1990] [Weaver, 2002], where the translator has more freedom
of expression.
Human Translation: Types of Texts
Technical, commercial, and legal translators, like the
authors of the original texts, are more restrained in their use of
language, and they need to be precise and convey the exact
meaning of the original text.
Technical texts are not meant to be beautiful but rather
to be informative, instructive and explanatory. Their main
function is to be clear, so the easier they are to read, the better
they are understood.
Technical translation may be regarded as a CRAFT
[Newmark, 1988] [Biguenet & Schulte, 1989], for which both
technical and linguistic competence are essential, while creativity
and vagueness are prohibited.
Machine Translation
As more translation is performed by machines,
new challenges arise in the field, theoretical traditions are
shaken, and the need to rethink the status of translation
becomes more evident. Of all automated applications, machine
translation compels us to reconsider the nature of translation.
ART and CRAFT are NOT appropriate concepts for
machine translation, because it necessarily relies on
linguistics and computer science.
Machine Translation
1- Automated translation of text or speech from one natural
language into another
2- An important tool that assists human translators
3- It has become available to the general public in the last few
years due to:
• sophisticated computers
• continuous development of computer software capabilities
• internet boom
Machine Translation (cont.)
Machine Translation Bottlenecks
1.Complexity of language
2.Ambiguity of language
3.Wordiness (related to text quality)
Machine Translation: Limitations
• The task of delivering high-quality machine translation of certain
types of texts and complex linguistic phenomena is difficult
• It is difficult to grasp humour, sarcasm, and other human feelings
expressed in/by means of sophisticated linguistic expression
• Difficulty in handling extra-sentential, extra-textual and
extra-linguistic information (problems of culture or context),
because knowledge of the world cannot be assumed
• Difficulty in resolving anaphora
Machine Translation Linguistic Challenges
1.Homography
2.Cross-language phenomena (lexical divergences, idioms,
and cross-language syntactic transformations such as
passives)
3.Identification of named entities
4.Handling of long sentences and wordiness
5.Unusual changes to word order in the target
language
6.The need for enhanced dictionaries and grammars to recognize and
translate multiword expressions
Machine Translation Linguistic Challenges: Examples
• Handling of ellipsis
advanced ambiguity problems related to anaphora
O João visitou muitos países do mundo. A Maria não visitou nenhum.
=> João has visited many countries in the world. Maria hasn’t visited any.
Machine Translation Linguistic Challenges: Examples
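• Common-noun nuance resolution / homography
(1) ele não quis tomar partido de ninguém [he did not want to take anyone's side]
(2) ele é um bom partido [he is a good catch]
(3) ele tirou partido da situação [he took advantage of the situation]
(4) ele pertence a esse partido (político) [he belongs to that (political) party]
(5) o copo está partido [the glass is broken]
(6) já esteve em melhor partido [it has been in better shape]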
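The six sentences above use the same form, partido, as several different nouns and as a past participle, so a word-by-word lookup cannot choose a translation. Below is a minimal Python sketch of context-based selection, assuming a hand-written list of cue patterns. The patterns and English renderings are hypothetical illustrations, not taken from Port4NooJ; in a dictionary-based system such cues would more naturally be multiword entries.

```python
import re

# Illustrative cue patterns (hypothetical): the first pattern that matches
# the lower-cased sentence decides how "partido" is rendered in English.
SENSE_RULES = [
    (r"\btomar partido\b", "take sides"),
    (r"\bbom partido\b", "good catch"),
    (r"\btir\w* partido\b", "take advantage"),
    (r"\b(?:está|estão|ficou) partid[oa]s?\b", "broken"),
    (r"\bem melhor partido\b", "in better shape"),
]
DEFAULT_SENSE = "political party"  # fallback when no cue matches


def partido_sense(sentence: str) -> str:
    """Return an English rendering of 'partido' chosen by the first matching cue."""
    text = sentence.lower()
    for pattern, sense in SENSE_RULES:
        if re.search(pattern, text):
            return sense
    return DEFAULT_SENSE


if __name__ == "__main__":
    for s in ["ele não quis tomar partido de ninguém", "o copo está partido"]:
        print(s, "->", partido_sense(s))
```

The fallback is the weak point of such a scheme: any unforeseen context silently receives the political-party reading, which is one reason richer lexical resources are needed.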
Machine Translation Linguistic Challenges: Examples
Portuguese to English, source sentence [CdP]:
Francisco Vieira adianta ainda que está a fazer um esforço no sentido de
tomar uma decisão ainda esta semana, definindo se avança ou não para
uma candidatura à RTLRS.
Translation Engine | Translation Results
FreeTranslation | Francisco Scallop advances even if is it do an effort in the sense of take a decision still this week, defined advances or not for a candidacy to the RTLRS.
WorldLingo | advances despite he is to make an effort in the direction to still take a decision this week, defining if he advances or he does not stop a candidacy to the RTLRS.
English to Portuguese, source sentence [Compara]:
I can't make a decision about anything these days.
Translation Engine | Translation Results
Google | Eu não posso fazer a uma decisão sobre qualquer coisa estes dias.
Amikai | que eu não posso fazer para uma decisão sobre qualquer coisa estes dias.
FreeTranslation | Eu não posso tomar uma decisão sobre algo estes dias.
Babelfish | Eu não posso fazer a uma decisão sobre qualquer coisa estes dias.
WorldLingo | Eu no posso fazer a uma deciso sobre qualquer coisa estes dias.
E-Translation Server | Não posso tomar uma decisão sobre qualquer coisa estes dias.
Multiword Expressions: Support Verb Constructions
A support verb construction (also called a predicate noun construction)
is a multiword expression made up of a verb with weak semantic value
and a noun that is the predicate of the sentence.
Predicate nouns can be:
•morphologically related to a verb
fazer uma apresentação de ("make a presentation of") = apresentar ("present")
pay a visit to = to visit
•autonomous
fazer um mestrado ("do a master's degree") - *mestrar
have fun - *to fun
Main Objectives
1.Build a body of lexical, syntactic and semantic knowledge
around support verb constructions
2.Apply this linguistic knowledge to paraphrasing
3.Improve machine translation
Outcome: Resources
Port4NooJ
•an open-source, ontology-driven Portuguese linguistic
system, which integrates a bilingual extension for
Portuguese-English machine translation
DicTUM
•Dicionário de Termos e Unidades Multipalavra
•a Dictionary of Multiword Expressions
Outcome: Tools
ReWriter
•a monolingual paraphraser for pre-editing texts
•Portuguese version ReEscreve
ParaMT
•a bilingual/multilingual paraphraser to be integrated in
machine translation systems
Resources
Port4NooJ - Publicly available at:
http://www.nooj4nlp.net
http://www.linguateca.pt/Repositorio/Port4Nooj/
Based on:
•NooJ linguistic environment (http://www.nooj4nlp.net/)
•OpenLogos English-Portuguese dictionary (http://logos-os.dfki.de/)
OpenLogos is an open-source derivative of the Logos Machine Translation System
Data Used
•COMPARA (http://www.linguateca.pt/COMPARA)
•METRA (http://www.linguateca.pt/metra)
•Other corpora
Port4NooJ Dictionaries
Entry layout: Lemma,Part of Speech+Inflectional Paradigm+Syntactic-Semantic Attributes+English Transfer
Sample of the dictionary of Biomedical Terms
HIV,N+FLX=PORTUGAL+AB+state+IMMUN+EN=HIV
doença maníaco-depressiva,N+FLX=CASA+AB+state+MH+EN=manic-depressive disorder
doença bipolar,N+FLX=CASA+AB+state+MH+EN=bipolar disorder
asma,N+FLX=CASA+AB+state+PULM+EN=asthma
Sample of the dictionary of Proper Names
Amesterdão,N+PL+city+EN=Amsterdam
Estados Unidos da América,N+PL+coun+EN=United States of America
África,N+PL+cont+EN=Africa
Extremo Oriente,N+PL+othprop+EN=Far East
Mediterrâneo,N+FLX=ANO+PL+water+EN=Mediterranean
Alpes Peninos,N+FLX=ALPES+PL+othprop+EN=Pennine Alps
ONU,N+AN+org+EN=UN
General dictionary sample representing all PoS, variable and invariable forms
mesa,N+FLX=CASA+CO+surf+EN=table
cair,V+FLX=ATRAIR+INMO+IntoType+EN=fall
holandês,A+FLX=INGLÊS+AN+lang+EN=Dutch
actualmente,ADV+FLX=FACILMENTE+TEMP+punc+pres+EN=nowadays
alguém,PRO+IMPERS+INDEF+EN=somebody
porque,RELINT+why+EN=why
e,CONJ+JOIN+EN=and
durante,PREP+TEMP+EN=during
cada,DET+IMPERS+INDEF+SG+EN=each
terceiro,NUM+ord+EN=one third
Sample of invariable compounds in the general dictionary
a curto prazo,ADV+TEMP+EN=in the short run
a favor de,PREP+CAUS+EN=in favor of
cada um,PRO+INDEF+SG+EN=each one
de quem,INT+ThatType+EN=whose
quem quer que seja,REL+WhateverType+EN=whoever
além disso,CONJ+COOR+EN=besides
um quarto,NUM+frac+EN=one fourth
Sample of the dictionary of Terms and Multiword Expressions (DicTUM)
adro da igreja,N+FLX=MENINO+PL+encl+EN=churchyard
cabo de vassoura,N+FLX=MENINO+COtool+EN=broomstick
bebida alcoólica,N+FLX=CASA+MA+liqu+EN=alcoholic drink+UNAMB
bebida alcoólica,N+FLX=CASA+MA+liqu+EN=booze+slang
cor de laranja,A+NAV+Apred+EN=orange
sul-americano,A+FLX=ALTO+AN+des+EN=South American
a curto prazo,ADV+LocTime+TEMP+EN=in the short run
fora de serviço,ADV+STAT+phr+EN=out of order
há muito tempo,ADV+LocTime+TEMP+puncpast+EN=a long time ago
isto é,CONJ+COOR+EN=i.e.
já não,CONJ+COOR+EN=no longer
mesmo assim,CONJ+SUB+EN=even so
juntamente com,PREP+ASSOC+EN=along with
à direita de,PREP+Loc+AT+EN=at the right of
em conformidade com,PREP+ALOG+EN=in congruence with
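The entries in these samples share one layout: a lemma, a comma, then '+'-separated codes. The first code is the part of speech, key=value codes carry properties such as the inflectional paradigm (FLX) or the English transfer (EN), and bare codes are syntactic-semantic attributes. Below is a minimal Python sketch of a reader for lines in this simplified form. It is an illustration under that assumption, not NooJ's own dictionary parser, and it ignores any other syntax the real NooJ format supports. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DictEntry:
    lemma: str
    pos: str                                              # part of speech: N, V, A, ADV, PREP, ...
    props: dict[str, str] = field(default_factory=dict)  # key=value codes: FLX, EN, VSUP, DRV, ...
    flags: list[str] = field(default_factory=list)       # bare codes: AB, state, CO, surf, ...


def parse_entry(line: str) -> DictEntry:
    """Parse a 'lemma,POS+code+key=value...' line in the simplified form shown above."""
    lemma, sep, info = line.strip().partition(",")
    if not sep:
        raise ValueError(f"no comma between lemma and codes: {line!r}")
    pos, *codes = info.split("+")
    entry = DictEntry(lemma=lemma, pos=pos)
    for code in codes:
        key, has_value, value = code.partition("=")
        if has_value:
            entry.props[key] = value  # a repeated key keeps its last value
        else:
            entry.flags.append(code)
    return entry


if __name__ == "__main__":
    e = parse_entry("mesa,N+FLX=CASA+CO+surf+EN=table")
    print(e.lemma, e.pos, e.props["FLX"], e.props["EN"], e.flags)
```

Splitting on '+' is what keeps the reader short. It also means an EN value that itself contained '+' would be split incorrectly, a case none of the sample entries shows.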
Port4NooJ Dictionaries
Sample of terms
classified as Information
+ Instructional/legal
Syntactic-Semantic Ontology
•Abstract representation language
•Hierarchical taxonomy (supersets, sets and, sometimes, subsets)
•Based on the Logos SAL (Semantico-syntactic Abstraction Language) ontology
•Integrated in the dictionary
•Represents both meaning (semantics) and structure (syntax)
•Over 1,000 categories
Syntactic-Semantic Ontology
Noun Supersets
concrete, mass, animate, place, information, abstract, process (intr), process (tr), measure, time, aspective
Sets and Subsets of the CONCRETE Noun Superset
functionals: receptacles; bearing surfaces; links/bridges; thresholds, focal points, barriers; conduits; fasteners; devices, tools; cloth things; structural elements; concretizations of verbals; concretizations of mass nouns; undifferentiated functionals; product/brand names
agentives: software; vehicles; meters; machines/systems; communication agents; concrete chemical agents; undifferentiated agentives
natural things: minute flora; plants; trees; trees/wood; miscellaneous natural things
other concrete sets*: impulses/lights; blemishes/marks; edibles (non-mass); edibles/color; classifiers; amorphous; atomistic; undifferentiated concrete things
*With one exception, these sets have no subsets
Syntactic-Semantic Ontology
Categories of CONCRETE nouns
Category | Mnemonic | Examples in English | Examples in Portuguese
agentives | CO+undagt | See subsets | See subsets
software | CO+soft | routine | rotina, ficheiro
concrete chemical agents | CO+chem | catalyst, warhead | ácido sulfúrico
machines/systems | CO+mach | battery, camera | máquina fotográfica
vehicles | CO+vehic | truck, ship | automóvel
meters | CO+meter | clock, gauge | manómetro
communication agents | CO+comm | radio, radar | rádio
functionals | CO+undfunc | trinket, ornament | ornamento
devices/tools | CO+tool | pliers | alicate
fasteners | CO+fast | nail, tendon | prego
bearing surfaces | CO+surf | table, shelf | mesa
receptacles | CO+recp | bottle, barrel | garrafa
conduits | CO+cond | chute, artery | artéria
thresholds/focal points/barriers | CO+barr | wall, door | porta
links/bridges | CO+link | circuit, nerve | circuito
cloth things | CO+cloth | shirt, blanket | camisola
structural elements | CO+struc | spar, bone | osso
concretizations of verbals | CO+verb | threading | -
concretizations of mass nouns | CO+mass | acid lining | -
product/brand names | CO+brand | Windows NT | Windows NT
natural things | CO+nat | See subsets | See subsets
minute flora | CO+flora | algae, spore | alga
plants | CO+plant | rose, weed | erva
trees | CO+tree | apple, willow | macieira
trees/wood | CO+trwd | oak, maple | carvalho
misc. natural things | CO+mnat | pebble, iceberg | iceberg
edibles (non-mass) | CO+ednm | pork chop | costoleta
edibles/color | CO+edcol | orange, cherry | laranja
impulses/lights | CO+light | lamp, beam | lâmpada
blemishes/marks | CO+blem | scratch, freckle | sarda
classifiers | CO+class | element | elemento
amorphous | CO+amor | breeze, tide | brisa
atomistic | CO+atom | electron, atom | átomo
undifferentiated | CO+obj | trifle, curio | -
Syntactic-Semantic Ontology
Categories of MEASURE nouns
ME - MEASURE Noun Sets and Subsets
Sets and Subsets | Mnemonics (= SynSem) | Examples
abstract concepts measured by unit | ME+abs | humidity, length
discrete measurable concepts | ME+dis | sum, increment
units of measure | ME+unit | See subsets
units of weight | ME+unit+wt | ounce, pound
units of velocity | ME+unit+vel | mph, megahertz
units of volume measure | ME+unit+vol | gallon, liter
units of temperature | ME+unit+temp | degrees celsius
units of energy/force | ME+unit+ener | watt, horsepower
measurement systems | ME+unit+sys | fahrenheit, kelvin
units of duration | ME+unit+dur | hour, minute, year
specialized units of measure | ME+unit+spec | oersted, ohm, phon
units of money/value | ME+unit+value | dollar, euro, forint
units of linear/area measure | ME+unit+lin | inch, yard, mile
general undifferentiated measure | ME+undif | degree, gross, share
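The MEASURE mnemonics spell out the hierarchy: ME is the superset, ME+unit a set, and ME+unit+wt one of its subsets. Suppose a mnemonic's '+'-separated parts always list its path from the superset downward. That holds for the MEASURE rows, but not for every CONCRETE code: the software subset CO+soft, for instance, does not repeat its set's code CO+undagt. Under that assumption, a subsumption test reduces to a prefix comparison. Below is a minimal Python sketch, not the Logos SAL implementation. The function names are hypothetical.

```python
def path(mnemonic: str) -> tuple[str, ...]:
    """Split a SynSem mnemonic such as 'ME+unit+wt' into its hierarchy path."""
    return tuple(part for part in mnemonic.split("+") if part)


def is_a(child: str, ancestor: str) -> bool:
    """True if `child` equals `ancestor` or lies below it (prefix comparison)."""
    c, a = path(child), path(ancestor)
    return len(c) >= len(a) and c[: len(a)] == a


# A few rows from the MEASURE table: mnemonic -> English examples.
MEASURE = {
    "ME+unit+wt": ["ounce", "pound"],
    "ME+unit+vol": ["gallon", "liter"],
    "ME+unit+dur": ["hour", "minute", "year"],
    "ME+abs": ["humidity", "length"],
}


def words_under(category: str) -> list[str]:
    """Collect the example words of every row subsumed by `category`."""
    return [w for code, words in MEASURE.items() if is_a(code, category) for w in words]


if __name__ == "__main__":
    print(words_under("ME+unit"))
```

In this encoding, a rule written for ME+unit accepts ounce, gallon or hour alike, without listing each subset.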
Inflectional and Derivational Description
Noun Inflectional Paradigm
Adjective Inflectional Paradigm
Pronoun Inflectional Paradigm
Verb Inflectional Paradigm
Adverb Inflectional Paradigm
Determiner Inflectional Paradigm
Interrogative Pronoun Inflectional Paradigm
Nominalization Derivational Paradigm
Paraphrasing and Translation Grammars
Translation and bilingual paraphrasing of simple sentences
Graph to translate simple sentences
Explicit Marking of Derivation and Support Verb
Verb entries:
• Identification of derivational paradigms for nominalizations
(annotation NDRV) and predicate adjectives (annotation ADRV)
• Link to the derived noun’s support verbs and to the adjective’s
copula verbs (annotation VSUP and annotation VCOP)
adaptar,V+FLX=FALAR+Aux=1+INOP57+Subset=132+EN=adapt+VSUP=fazer+DRV=NDRV00:CANÇÃO
azedar,V+FLX=LIMPAR+Aux=1+OBJTRundif98+Subset=740+EN=sour+VCOP=estar+DRV=ADRV00:ALTO
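The VSUP, VCOP and DRV codes are what let a paraphraser move between a verb and its support-verb or copula counterpart. Below is a minimal Python sketch. It assumes two hypothetical derivation rules as stand-ins for the NDRV00 and ADRV00 paradigms: drop the final -r and add -ção (adaptar to adaptação), and drop -ar and add -o (azedar to azedo). The real Port4NooJ paradigms are not reproduced here, and the helper names are illustrative.

```python
from typing import Optional

# Hypothetical stand-ins for the derivational paradigms named in the DRV codes;
# they only cover regular -ar verbs like the two entries above.
DERIVATIONS = {
    "NDRV00": lambda verb: verb[:-1] + "ção",  # adaptar -> adaptação
    "ADRV00": lambda verb: verb[:-2] + "o",    # azedar  -> azedo
}


def parse_codes(line: str) -> tuple[str, dict[str, str]]:
    """Return the lemma and the key=value codes of a dictionary line."""
    lemma, _, info = line.partition(",")
    props = dict(code.split("=", 1) for code in info.split("+") if "=" in code)
    return lemma, props


def periphrasis(line: str) -> Optional[str]:
    """Build the support-verb or copula paraphrase licensed by VSUP/VCOP plus DRV."""
    lemma, props = parse_codes(line)
    paradigm = props.get("DRV", "").split(":")[0]  # "NDRV00:CANÇÃO" -> "NDRV00"
    derive = DERIVATIONS.get(paradigm)
    if derive is None:
        return None
    derived = derive(lemma)
    if "VSUP" in props:
        # Nominalization: support verb + determiner + noun ("uma" assumes a feminine -ção noun).
        return f"{props['VSUP']} uma {derived}"
    if "VCOP" in props:
        # Predicate adjective: copula + adjective.
        return f"{props['VCOP']} {derived}"
    return None
```

For the adaptar entry this yields fazer uma adaptação, and for azedar it yields estar azedo. Read in reverse, the same link lets a paraphraser replace fazer uma adaptação by adaptar, which is the direction the ReWriter slides describe.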
Explicit Marking of Derivation and Semantic Verb Association
Adjective entries:
• Identification of derivational paradigms for adverbializations
(annotation AVDRV)
literal,A+FLX=PRINCIPAL+IN+symb+EN=literal+DRV=AVDRV00:LITERALMENTE
Autonomous predicate nouns:
• Identification of autonomous predicate nouns (annotation
Npred)
• Identification of a semantically related verb
curso,N+FLX=ANO+Npred+IN+inst+EN=course+VSUP=tirar+VRB=estudar+NPrep=de+Det=um
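Autonomous predicate nouns have no derivation to follow, so the entry itself lists the parts of the construction: the support verb (VSUP), the determiner (Det), the following preposition (NPrep) and the semantically related verb (VRB). Below is a minimal Python sketch, assuming those four codes are enough to assemble tirar um curso de and rewrite it as estudar. It matches the infinitive only and ignores verb inflection, which a real grammar would need to handle. The function names are hypothetical.

```python
import re


def npred_rule(line: str) -> tuple[re.Pattern, str]:
    """Turn an Npred entry into (pattern for its construction, related lexical verb)."""
    noun, _, info = line.partition(",")
    props = dict(code.split("=", 1) for code in info.split("+") if "=" in code)
    words = [props["VSUP"], props.get("Det", ""), noun, props.get("NPrep", "")]
    phrase = " ".join(w for w in words if w)  # e.g. "tirar um curso de"
    return re.compile(rf"\b{re.escape(phrase)}\b"), props["VRB"]


def paraphrase(sentence: str, line: str) -> str:
    """Replace the infinitive construction by the lexical verb (no inflection handled)."""
    pattern, verb = npred_rule(line)
    return pattern.sub(verb, sentence)


if __name__ == "__main__":
    curso = "curso,N+FLX=ANO+Npred+IN+inst+EN=course+VSUP=tirar+VRB=estudar+NPrep=de+Det=um"
    print(paraphrase("ela quer tirar um curso de medicina", curso))
```

With the curso entry, ela quer tirar um curso de medicina becomes ela quer estudar medicina. By contrast, ela tirou um curso de medicina is left untouched because tirou is an inflected form.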
ReWriter: a Monolingual Standalone Paraphraser
Recognition and monolingual paraphrasing
of support verb constructions
(support verb construction / morphologically related lexical verb)
ReWriter: Examples
Recognition and paraphrasing of elementary support verb constructions
co-occurring with predicate nouns of the biomedical field
(support verb construction / lexical verb or stylistic variant / non-elementary
support verb construction)
Elementary SVC > Lexical Verb
Elementary SVC > non-elementary SVC: realizar/efectuar ("carry out/perform")
Elementary SVC > sujeitar-se a / submeter-se a ("undergo"), ONLY if the SUBJECT is a patient
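The last rule on this slide is conditional. Below is a minimal Python sketch of that gate. It assumes the recognizer has already found the construction and labelled the subject's semantic role. The role labels and the example noun análise, with its related verb analisar, are hypothetical choices for illustration. The sketch also withholds the lexical verb when the subject is a patient, an assumption of ours. In o doente fez uma análise, the patient had a test done, whereas o doente analisou would say the patient did the analysing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SVC:
    support_verb: str            # elementary support verb, e.g. "fazer"
    det: str                     # determiner, e.g. "uma"
    noun: str                    # biomedical predicate noun, e.g. "análise"
    lexical_verb: Optional[str]  # morphologically related verb, if any
    subject_role: str            # hypothetical role label, e.g. "agent" or "patient"


def alternatives(svc: SVC) -> list[str]:
    """List infinitive paraphrases of an elementary SVC, gated by the subject's role."""
    np = f"{svc.det} {svc.noun}"
    options = []
    # Elementary SVC > lexical verb; withheld for patients (our assumption).
    if svc.lexical_verb and svc.subject_role != "patient":
        options.append(svc.lexical_verb)
    # Elementary SVC > non-elementary SVC.
    options += [f"realizar {np}", f"efectuar {np}"]
    # Elementary SVC > sujeitar-se a / submeter-se a, only if the subject is a patient.
    # (A definite article would need contraction, a + a = à, which is not handled.)
    if svc.subject_role == "patient":
        options += [f"sujeitar-se a {np}", f"submeter-se a {np}"]
    return options
```

With a doctor as subject, the sketch offers analisar, realizar uma análise and efectuar uma análise. With a patient as subject, it offers the two non-elementary variants plus sujeitar-se a uma análise and submeter-se a uma análise.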
ReWriter: Application - Interface
Interactive ReWriter
for word processing applications
such as text editing
ReWriter: Extensibility
1.Applications to General Language
2.Applications to Technical Language
ReWriter: Extensibility - Examples
[Paraphrasing adverbials]
à volta da órbita ≡ periorbital (popular versus technical)
around the orbit of the eye ≡ periorbital
[Paraphrasing relative clauses - into adjectival past
participles]
N0 que têm sido escritos ≡ N0 que foram escritos ≡ N0 escritos
N0 that have been written ≡ N0 that were written ≡ N0 written
 
[Paraphrasing if clauses]
se for necessário ≡ se necessário
if it is necessary ≡ if necessary
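A rule such as se for necessário ≡ se necessário cannot be a bare string substitution: se for ao cinema ("if you go to the cinema") opens with the same words but has no adjective to keep. Below is a minimal Python sketch that limits the reduction to a hypothetical list of predicative adjectives. In a dictionary-driven system, that restriction would more naturally come from the part-of-speech code (A) of the entries.

```python
import re

# Hypothetical adjective list; a dictionary-driven rule would check for
# entries with part of speech A instead of hard-coding words.
ADJECTIVES = {"necessário", "necessária", "possível", "preciso", "conveniente"}

IF_CLAUSE = re.compile(r"\b(se) for (\w+)", re.IGNORECASE)


def reduce_if_clause(text: str) -> str:
    """Rewrite 'se for ADJ' as 'se ADJ'; leave any other 'se for ...' unchanged."""
    def repl(m: re.Match) -> str:
        conj, word = m.group(1), m.group(2)
        if word.lower() in ADJECTIVES:
            return f"{conj} {word}"
        return m.group(0)

    return IF_CLAUSE.sub(repl, text)


if __name__ == "__main__":
    print(reduce_if_clause("Contacte-nos se for necessário."))
```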
ReWriter: Extensibility - Examples
[Paraphrasing coordinated noun phrases - conjoining
or disjoining]
recursos linguísticos para o ensino e para a investigação
=> ?linguistic resources for teaching and for research
≡ recursos linguísticos para o ensino e a investigação
=> linguistic resources for teaching and research
[Paraphrasing subjunctive clauses - into infinitives]
pedimos o favor que confirme a sua participação
=> *we ask the favor that you confirm your attendance
≡ pedimos o favor de confirmar a sua participação
=> *we ask the favor of confirming your attendance
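The conjoining direction of the coordination rule deletes a preposition that is repeated in the second conjunct. Below is a minimal Python sketch of that direction only. It assumes each conjunct is a determiner followed by a single noun and uses an assumed, non-exhaustive preposition list. Noun phrases with modifiers would need a real noun-phrase grammar.

```python
import re

PREPOSITIONS = ("para", "com", "sem", "sobre")  # assumed, non-exhaustive

# PREP DET NOUN e PREP DET NOUN  ->  PREP DET NOUN e DET NOUN (same preposition only)
COORD = re.compile(
    r"\b(?P<prep>" + "|".join(PREPOSITIONS) + r") (?P<first>\w+ \w+) e (?P=prep) (?P<second>\w+ \w+)"
)


def conjoin(text: str) -> str:
    """Drop the repeated preposition, as in 'para o ensino e para a investigação'."""
    return COORD.sub(r"\g<prep> \g<first> e \g<second>", text)


if __name__ == "__main__":
    print(conjoin("recursos linguísticos para o ensino e para a investigação"))
```

The backreference (?P=prep) restricts the rule to identical prepositions, so para o ensino e com a investigação is left alone. The disjoining direction would insert the preposition instead. In the example above, conjoining is the helpful direction, since the English rendering with the repeated for is marked as questionable.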
ReWriter: Extensibility - Examples
[Paraphrasing marked-up constructions]
se a necessidade do utilizador é criar um texto em linguagem controlada
=> ?if the end-user need is to create controlled language text
≡ se o utilizador necessita de criar um texto em linguagem controlada
=> if the end-user needs to create controlled language text
[Paraphrasing of vague and undefined or null subject sentences]
(whenever the real subject/actor is known)
[-] houve um grito na rua ≡ [N-PRON]/alguém gritou na rua
=> there was shouting in the street ≡ [N-PRON]/someone shouted in the street
ReWriter: Extensibility - Examples
[Paraphrasing passives - whenever suitable]
Esse livro foi escrito por Saramago em 2008 ≡ Saramago escreveu esse livro em 2008
That book was written by Saramago in 2008 ≡ Saramago wrote that book in 2008
Florida foi atingida por um tornado ≡ Um tornado atingiu a Florida
Florida was hit by a tornado ≡ A tornado hit Florida
O carro foi roubado ≡ Alguém roubou o carro
The car was stolen ≡ Someone stole the car
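Passive-to-active paraphrasing needs the parts of the clause and a way back from the participle to an active verb form. Below is a minimal Python sketch. It starts from an already analysed passive: the grammar is assumed to have found the patient, participle, agent and any trailing adjunct. It then uses a small hypothetical participle table. The patient is stored as it should appear in the active clause (a Florida, with its article). Agentless passives receive the indefinite subject alguém, as in the last example above. Only third person singular preterite forms are covered.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical participle -> 3rd person singular preterite table (illustrative only).
PRETERITE = {
    "escrito": "escreveu", "escrita": "escreveu",
    "roubado": "roubou", "roubada": "roubou",
    "atingido": "atingiu", "atingida": "atingiu",
}


@dataclass
class Passive:
    patient: str                  # e.g. "esse livro"
    participle: str               # e.g. "escrito"
    agent: Optional[str] = None   # e.g. "Saramago"; None for agentless passives
    adjunct: str = ""             # e.g. "em 2008"


def to_active(p: Passive) -> Optional[str]:
    """Return the active paraphrase, or None if the participle is not in the table."""
    verb = PRETERITE.get(p.participle)
    if verb is None:
        return None
    subject = p.agent or "alguém"  # indefinite subject when no agent is expressed
    sentence = " ".join(w for w in [subject, verb, p.patient, p.adjunct] if w)
    return sentence[0].upper() + sentence[1:]


if __name__ == "__main__":
    print(to_active(Passive("esse livro", "escrito", "Saramago", "em 2008")))
    print(to_active(Passive("o carro", "roubado")))
```

Returning None for an unknown participle leaves the passive in place, which matches the "whenever suitable" restriction better than guessing a verb form.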
ParaMT: a Bilingual/Multilingual Paraphraser for MT
Recognition and bilingual paraphrasing of support verb constructions 
(Portuguese support verb construction / corresponding English verb)
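ParaMT links a Portuguese support verb construction to the English verb that translates it. Below is a minimal Python sketch of that link. It assumes the input has already been lemmatized and uses a small hypothetical bilingual table built from examples in this presentation. Keying the table on lemmas lets the construction be recognized as a unit instead of being translated word by word, the failure behind the fazer a uma decisão outputs shown earlier for the opposite direction.

```python
from typing import Optional

# Hypothetical bilingual table: (support verb lemma, predicate noun lemma) -> English verb.
SVC_TO_EN = {
    ("tomar", "decisão"): "decide",
    ("fazer", "apresentação"): "present",
    ("fazer", "adaptação"): "adapt",
    ("tirar", "curso"): "study",
}
DETERMINERS = {"um", "uma", "o", "a"}


def find_svc(lemmas: list[str]) -> Optional[tuple[int, int, str]]:
    """Find the first SVC in a lemmatized sentence.

    Returns (start index, end index exclusive, English verb), or None.
    At most one determiner may stand between the support verb and the noun.
    """
    for i, verb in enumerate(lemmas):
        j = i + 1
        if j < len(lemmas) and lemmas[j] in DETERMINERS:
            j += 1
        if j < len(lemmas) and (verb, lemmas[j]) in SVC_TO_EN:
            return i, j + 1, SVC_TO_EN[(verb, lemmas[j])]
    return None


if __name__ == "__main__":
    # Lemmas of "Eu não posso tomar uma decisão".
    print(find_svc(["eu", "não", "poder", "tomar", "um", "decisão"]))
```

The returned span marks the tokens that a transfer step would render with a single English verb, leaving the rest of the sentence to the ordinary translation rules.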
Preliminary Quantitative Results
Evaluation of recognition and paraphrasing of support verb constructions
500 sentences, 100 for each elementary support verb
Support verb | SVC recognition precision | SVC recognition recall | SVC paraphrasing precision
Pôr | 73/73 (100%) | 73/100 (73%) | 72/73 (98.6%)
Tomar | 75/75 (100%) | 75/100 (75%) | 68/73 (93.1%)
Ter | 65/65 (100%) | 65/100 (65%) | 59/65 (90.7%)
Dar | 57/60 (95%) | 57/100 (57%) | 46/51 (90.1%)
Fazer | 43/45 (95.5%) | 43/100 (43%) | 40/45 (88.8%)
Average | 62.6/63.6 (98.4%) | 62.6/100 (62.6%) | 57/61 (93.4%)
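The averages in the table can be checked against the per-verb counts. Pooling the counts (micro-averaging) reproduces the recognition figures: 313/318 ≈ 98.4% precision and 313/500 = 62.6% recall. For paraphrasing, the pooled counts give 285/307 ≈ 92.8%. That is slightly below the 93.4% shown, which corresponds to 57/61, with the mean denominator 61.4 rounded down to 61. Below is a short Python check, assuming the recall denominator is the 100 test sentences per verb.

```python
# Per-verb counts from the table.
# "rec":  (correctly recognized SVCs, SVCs recognized)
# "para": (correct paraphrases, paraphrases evaluated)
COUNTS = {
    "pôr":   {"rec": (73, 73), "para": (72, 73)},
    "tomar": {"rec": (75, 75), "para": (68, 73)},
    "ter":   {"rec": (65, 65), "para": (59, 65)},
    "dar":   {"rec": (57, 60), "para": (46, 51)},
    "fazer": {"rec": (43, 45), "para": (40, 45)},
}
SENTENCES_PER_VERB = 100  # each test sentence is assumed to contain one SVC


def micro(metric: str) -> tuple[int, int]:
    """Sum numerators and denominators over all verbs (micro-average)."""
    num = sum(counts[metric][0] for counts in COUNTS.values())
    den = sum(counts[metric][1] for counts in COUNTS.values())
    return num, den


if __name__ == "__main__":
    rec_ok, rec_found = micro("rec")
    para_ok, para_done = micro("para")
    total = SENTENCES_PER_VERB * len(COUNTS)
    print(f"recognition precision  {rec_ok}/{rec_found} = {rec_ok / rec_found:.1%}")
    print(f"recognition recall     {rec_ok}/{total} = {rec_ok / total:.1%}")
    print(f"paraphrasing precision {para_ok}/{para_done} = {para_ok / para_done:.1%}")
```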
Conclusions
Linguistic knowledge applied to a machine
translation system improves its output quality.
Effective results from linguistically based research
on paraphrases can save machine translation systems
substantial effort and resources.
Thank you for your attention!
Acknowledgements
This work was partly supported by grant SFRH/BD/14076/2003
from Fundação para a Ciência e a Tecnologia, co-financed by
POSI and partly by Fundação para a Computação Científica
Nacional.
It’s getting crowded! A critical view of what crowdsourcing can do for termin...It’s getting crowded! A critical view of what crowdsourcing can do for termin...
It’s getting crowded! A critical view of what crowdsourcing can do for termin...
 
AAT Translation Assessment Process
AAT Translation Assessment ProcessAAT Translation Assessment Process
AAT Translation Assessment Process
 
Proposal presentation.pptx
Proposal presentation.pptxProposal presentation.pptx
Proposal presentation.pptx
 
TRANSLATION TECHNIQUES.ppt
TRANSLATION TECHNIQUES.pptTRANSLATION TECHNIQUES.ppt
TRANSLATION TECHNIQUES.ppt
 
NCIHC WEBINAR: Translation as a Tool in the Interpreter Toolbox
NCIHC WEBINAR: Translation as a Tool in the Interpreter ToolboxNCIHC WEBINAR: Translation as a Tool in the Interpreter Toolbox
NCIHC WEBINAR: Translation as a Tool in the Interpreter Toolbox
 
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHHANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
 
"Just Read This to The Patient" An Introduction to Teaching Sight Translation...
"Just Read This to The Patient" An Introduction to Teaching Sight Translation..."Just Read This to The Patient" An Introduction to Teaching Sight Translation...
"Just Read This to The Patient" An Introduction to Teaching Sight Translation...
 
The Communicative Event
The Communicative EventThe Communicative Event
The Communicative Event
 

Mehr von INESC-ID (Spoken Language Systems Laboratory - L2F)

Mehr von INESC-ID (Spoken Language Systems Laboratory - L2F) (20)

Multi3Generation@INGL2020
Multi3Generation@INGL2020Multi3Generation@INGL2020
Multi3Generation@INGL2020
 
NooJ 2020 presentation
NooJ 2020 presentationNooJ 2020 presentation
NooJ 2020 presentation
 
PROPOR2020_Barreiroetal
PROPOR2020_BarreiroetalPROPOR2020_Barreiroetal
PROPOR2020_Barreiroetal
 
Anålise comparativa das ediçþes portuguesa e brasileira de Os livros que dev...
Anålise comparativa das ediçþes portuguesa e brasileira de  Os livros que dev...Anålise comparativa das ediçþes portuguesa e brasileira de  Os livros que dev...
Anålise comparativa das ediçþes portuguesa e brasileira de Os livros que dev...
 
Welcome session 3rd Annual MC Meeting - enetCollect COST Action
Welcome session 3rd Annual MC Meeting - enetCollect COST ActionWelcome session 3rd Annual MC Meeting - enetCollect COST Action
Welcome session 3rd Annual MC Meeting - enetCollect COST Action
 
Syntactic-semantic analysis for information extraction in biomedicine
Syntactic-semantic analysis for information extraction in biomedicineSyntactic-semantic analysis for information extraction in biomedicine
Syntactic-semantic analysis for information extraction in biomedicine
 
Cross language semantic relations between English and Portuguese
Cross language semantic relations between English and PortugueseCross language semantic relations between English and Portuguese
Cross language semantic relations between English and Portuguese
 
Paraphrasing biomedical support verb constructions for machine translation
Paraphrasing biomedical support verb constructions for machine translationParaphrasing biomedical support verb constructions for machine translation
Paraphrasing biomedical support verb constructions for machine translation
 
ReWriter for legal text
ReWriter for legal textReWriter for legal text
ReWriter for legal text
 
Chatbots for Language Learning
Chatbots for Language LearningChatbots for Language Learning
Chatbots for Language Learning
 
eSPERTo’s Paraphrastic Knowledge Applied to Question-Answering and Summarization
eSPERTo’s Paraphrastic Knowledge Applied to Question-Answering and SummarizationeSPERTo’s Paraphrastic Knowledge Applied to Question-Answering and Summarization
eSPERTo’s Paraphrastic Knowledge Applied to Question-Answering and Summarization
 
Barreiro et al POP@PROPOR2018-informal2formal-language
Barreiro et al POP@PROPOR2018-informal2formal-languageBarreiro et al POP@PROPOR2018-informal2formal-language
Barreiro et al POP@PROPOR2018-informal2formal-language
 
Rebelo-Arnold et al POP@PROPOR2018-EP-BP-alignments
Rebelo-Arnold et al POP@PROPOR2018-EP-BP-alignmentsRebelo-Arnold et al POP@PROPOR2018-EP-BP-alignments
Rebelo-Arnold et al POP@PROPOR2018-EP-BP-alignments
 
Barreiro-Batista-LR4NLP@Coling2018-presentation
Barreiro-Batista-LR4NLP@Coling2018-presentationBarreiro-Batista-LR4NLP@Coling2018-presentation
Barreiro-Batista-LR4NLP@Coling2018-presentation
 
Barreiro-Mota-VarDial@Coling2018-poster
Barreiro-Mota-VarDial@Coling2018-posterBarreiro-Mota-VarDial@Coling2018-poster
Barreiro-Mota-VarDial@Coling2018-poster
 
NooJ-2018-Palermo
NooJ-2018-PalermoNooJ-2018-Palermo
NooJ-2018-Palermo
 
Poster @ enetCollect CA MC meeting in Iasi, Romania
Poster @ enetCollect CA MC meeting in Iasi, Romania Poster @ enetCollect CA MC meeting in Iasi, Romania
Poster @ enetCollect CA MC meeting in Iasi, Romania
 
projeto-eSPERTo
projeto-eSPERToprojeto-eSPERTo
projeto-eSPERTo
 
ReEscreve: A Translator-Friendly Multi-Purpose Paraphrasing Software Tool
ReEscreve: A Translator-Friendly Multi-Purpose Paraphrasing Software ToolReEscreve: A Translator-Friendly Multi-Purpose Paraphrasing Software Tool
ReEscreve: A Translator-Friendly Multi-Purpose Paraphrasing Software Tool
 
Poster l2f 2017
Poster l2f 2017Poster l2f 2017
Poster l2f 2017
 

KĂźrzlich hochgeladen

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel AraĂşjo
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

KĂźrzlich hochgeladen (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

New Tools and Resources to Support Machine Translation

  • 1. Anabela Barreiro (barreiro_anabela@hotmail.com), FLUP & CLUP-Linguateca, New York University. New Tools and Resources to Support Machine Translation. Mestrado em Tradução Jurídica e Empresarial (Master's in Legal and Business Translation), Anabela Barreiro, Lisbon, 8 January 2008
  • 2. Outline
  • 3. Human Translation vs. Machine Translation. A clear distinction in objective and purpose must be drawn between human translation and machine translation! • They use different methods • They apply to different types of texts • They serve different purposes • They face different barriers • They are NOT in competition!
  • 4. Human Translation. Professional translation requires: • profound knowledge of the source language and native proficiency in the target language • above-average writing skills • insightful knowledge of the sociocultural aspects of the source and target languages • knowledge of the grammar of both languages, their writing conventions, and the situational and cultural context • in scientific and technical translation, subject-matter knowledge, including the terminology of the field or knowledge domain.
  • 5. Human Translation. Translation theory has long dealt with controversial issues: • whether to privilege meaning over form • the visibility or invisibility of the translator • being faithful to the author or making the text accessible to the reader (and to which kind of reader) • giving value to the source-language culture (foreignise) or adapting the text to the target-language culture (domesticate) • allowing languages/cultures with more impact to predominate over languages/cultures with less impact, or being creative, etc.
  • 6. Human Translation. The most important aspect of translation is defining the purpose of each translation, which depends on the characteristics of each text … and defining the paraphrasing capabilities.
  • 7. Human Translation: Types of Texts. In literary translation, a certain subjectivity and distance from the source text are allowed for the sake of preserving the artistic and aesthetic qualities of the target text [Hermans, 1985] [Landers, 2001]. Literary translation may be considered an ART [Leighton, 1990] [Weaver, 2002], in which the translator has more freedom of expression.
  • 8. Human Translation: Types of Texts. Technical, commercial, and legal translators, like the authors of the original texts, are more restrained in their use of language: they need to be precise and convey the exact meaning of the original text. Technical texts are meant not to be beautiful but to be informative, instructive, and explanatory. Their main function is to be clear: the easier they are to read, the better they are understood. Technical translation may be regarded as a CRAFT [Newmark, 1988] [Biguenet & Schulte, 1989], for which both technical and linguistic competence are essential, while creativity and vagueness are prohibited.
  • 9. Machine Translation. As more translation is performed by machines, new challenges arise in the field, theoretical traditions are shaken, and the need to rethink the status of translation becomes more evident. Of all automated applications, it is machine translation that compels us to reconsider the nature of translation. ART and CRAFT are NOT appropriate concepts for machine translation, because it must necessarily rely on linguistics and computer science.
  • 10. Machine Translation. 1. Automated translation of text or speech from one natural language into another. 2. An important tool that assists human translators. 3. It has become available to the general public in the last few years thanks to: • sophisticated computers • continuous development of software capabilities • the internet boom
  • 11. Machine Translation (cont.)
  • 12. Machine Translation Bottlenecks: 1. Complexity of language 2. Ambiguity of language 3. Wordiness (related to text quality)
  • 13. Machine Translation: Limitations • Delivering high-quality machine translation of certain types of texts and of complex linguistic phenomena is difficult • Humour, sarcasm, and other human feelings expressed through sophisticated linguistic means are hard to grasp • Extra-sentential, extra-textual, and extra-linguistic information (problems of culture or context) is difficult to handle, because knowledge of the world cannot be assumed • Anaphora resolution is difficult
  • 14. Machine Translation Linguistic Challenges: 1. Homography 2. Cross-language phenomena (lexical divergences, idioms, and cross-language syntactic transformations such as passives) 3. Identification of named entities 4. Handling long sentences and wordiness 5. Unusual changes to word order in the target language 6. Enhanced dictionaries and grammars to recognize and translate multiword expressions
  • 15. Machine Translation Linguistic Challenges: Examples • Handling ellipsis: advanced ambiguity problems related to anaphora. O João visitou muitos países do mundo. A Maria não visitou nenhum. => João has visited many countries in the world. Maria hasn't visited any.
  • 16. Machine Translation Linguistic Challenges: Examples • Common-noun nuance resolution / homography of "partido": (1) ele não quis tomar partido de ninguém ("he didn't want to take anyone's side") (2) ele é um bom partido ("he is a good catch") (3) ele tirou partido da situação ("he took advantage of the situation") (4) ele pertence a esse partido (político) ("he belongs to that (political) party") (5) o copo está partido ("the glass is broken") (6) já esteve em melhor partido ("he has been in a better situation")
  • 17. Machine Translation Linguistic Challenges: Examples
    Portuguese source [CdP]: Francisco Vieira adianta ainda que está a fazer um esforço no sentido de tomar uma decisão ainda esta semana, definindo se avança ou não para uma candidatura à RTLRS. ("Francisco Vieira also stated that he is making an effort to reach a decision this week, deciding whether or not to run for the RTLRS.")
    Translation engine: translation result (Portuguese to English)
      FreeTranslation: Francisco Scallop advances even if is it do an effort in the sense of take a decision still this week, defined advances or not for a candidacy to the RTLRS.
      WorldLingo: advances despite he is to make an effort in the direction to still take a decision this week, defining if he advances or he does not stop a candidacy to the RTLRS.
    (FreeTranslation translates the surname Vieira literally: "vieira" is Portuguese for "scallop".)
    English source [Compara]: I can't make a decision about anything these days.
    Translation engine: translation result (English to Portuguese)
      Google: Eu não posso fazer a uma decisão sobre qualquer coisa estes dias.
      Amikai: que eu não posso fazer para uma decisão sobre qualquer coisa estes dias.
      FreeTranslation: Eu não posso tomar uma decisão sobre algo estes dias.
      Babelfish: Eu não posso fazer a uma decisão sobre qualquer coisa estes dias.
      WorldLingo: Eu no posso fazer a uma deciso sobre qualquer coisa estes dias.
      E-Translation Server: Não posso tomar uma decisão sobre qualquer coisa estes dias.
  • 18. Multiword Expressions: Support Verb Constructions. A support verb construction (also called a predicate noun construction) is a multiword expression that contains a semantically weak verb and a noun that acts as the predicate of the sentence. Predicate nouns can be: • morphologically related to a verb: fazer uma apresentação de ("make a presentation of") = apresentar ("present"); pay a visit to = to visit • autonomous: fazer um mestrado ("do a master's degree"), but not *mestrar; have fun, but not *to fun
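The following sketch makes the paraphrasing idea concrete. It rewrites a support verb construction into its single-verb equivalent using a small hand-built lexicon. The lexicon, the function name `paraphrase_svc`, and the plain regular-expression matching are illustrative assumptions. This is not the ReWriter/Port4NooJ implementation, which relies on NooJ dictionaries and grammars.

```python
from __future__ import annotations

import re

# Hypothetical mini-lexicon: support verb construction -> single-verb paraphrase.
# Only morphologically related predicate nouns get an entry; autonomous ones such
# as "fazer um mestrado" have no verbal counterpart (*mestrar) and are left alone.
SVC_LEXICON = {
    "fazer uma apresentação de": "apresentar",
    "pay a visit to": "visit",
}


def paraphrase_svc(sentence: str, lexicon: dict[str, str] = SVC_LEXICON) -> str:
    """Replace each known support verb construction with its verbal paraphrase.

    Matching is case-insensitive and anchored on word boundaries. The sketch does
    not inflect anything, so it only handles constructions in their base form.
    """
    for svc, verb in lexicon.items():
        pattern = r"\b" + re.escape(svc) + r"\b"
        sentence = re.sub(pattern, verb, sentence, flags=re.IGNORECASE)
    return sentence


print(paraphrase_svc("We should pay a visit to the lab."))
print(paraphrase_svc("Vamos fazer uma apresentação de resultados."))
```

A real paraphraser also has to handle inflected forms (fez uma apresentação de → apresentou). That is where dictionary-driven morphology, rather than string matching, becomes necessary.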
  • 19. Main Objectives: 1. Build a body of lexical, syntactic, and semantic knowledge around support verb constructions 2. Apply this linguistic knowledge to paraphrasing 3. Improve machine translation
  • 20. Outcome: Resources. Port4NooJ • an open-source, ontology-driven Portuguese linguistic system that integrates a bilingual extension for Portuguese-English machine translation. DicTUM • Dicionário de Termos e Unidades Multipalavra (Dictionary of Terms and Multiword Units) • a dictionary of multiword expressions
  • 21. Outcome: Tools. ReWriter • a monolingual paraphraser that pre-edits texts using paraphrasing capabilities • Portuguese version: ReEscreve. ParaMT • a bilingual/multilingual paraphraser to be integrated into machine translation systems
  • 22. Resources Port4NooJ - Publicly available at: http://www.nooj4nlp.net http://www.linguateca.pt/Repositorio/Port4Nooj/ Based on: •NooJ linguistic environment (http://www.nooj4nlp.net/) •OpenLogos English-Portuguese dictionary (http://logos- os.dfki.de/) OpenLogos is an open-source derivative of the Logos Machine Translation System Data Used •COMPARA (http://www.linguateca.pt/COMPARA) •METRA (http://www.linguateca.pt/metra) •Other corpora Mestrado em Tradução JurĂ­dica e Empresarial Anabela Barreiro Lisboa, 8 January 2008
  • 23. Port4NooJ Dictionaries
    Entry format: lemma,part of speech+FLX=inflectional paradigm+syntactic-semantic attributes+EN=English transfer
    General dictionary sample representing all PoS, variable and invariable forms:
      mesa,N+FLX=CASA+CO+surf+EN=table
      cair,V+FLX=ATRAIR+INMO+IntoType+EN=fall
      holandês,A+FLX=INGLÊS+AN+lang+EN=Dutch
      actualmente,ADV+FLX=FACILMENTE+TEMP+punc+pres+EN=nowadays
      alguém,PRO+IMPERS+INDEF+EN=somebody
      porque,RELINT+why+EN=why
      e,CONJ+JOIN+EN=and
      durante,PREP+TEMP+EN=during
      cada,DET+IMPERS+INDEF+SG+EN=each
      terceiro,NUM+ord+EN=one third
    Sample of the dictionary of Terms and Multiword Expressions (DicTUM):
      adro da igreja,N+FLX=MENINO+PL+encl+EN=churchyard
      cabo de vassoura,N+FLX=MENINO+COtool+EN=broomstick
      bebida alcoólica,N+FLX=CASA+MA+liqu+EN=alcoholic drink+UNAMB
      bebida alcoólica,N+FLX=CASA+MA+liqu+EN=booze+slang
      cor de laranja,A+NAV+Apred+EN=orange
      sul-americano,A+FLX=ALTO+AN+des+EN=South American
      a curto prazo,ADV+LocTime+TEMP+EN=in the short run
      fora de serviço,ADV+STAT+phr+EN=out of order
      há muito tempo,ADV+LocTime+TEMP+puncpast+EN=a long time ago
      isto é,CONJ+COOR+EN=i.e.
      já não,CONJ+COOR+EN=no longer
      mesmo assim,CONJ+SUB+EN=even so
      juntamente com,PREP+ASSOC+EN=along with
      à direita de,PREP+Loc+AT+EN=at the right of
      em conformidade com,PREP+ALOG+EN=in congruence with
    Sample of invariable compounds in the general dictionary:
      a curto prazo,ADV+TEMP+EN=in the short run
      a favor de,PREP+CAUS+EN=in favor of
      cada um,PRO+INDEF+SG+EN=each one
      de quem,INT+ThatType+EN=whose
      quem quer que seja,REL+WhateverType+EN=whoever
      além disso,CONJ+COOR+EN=besides
      um quarto,NUM+frac+EN=one fourth
    Sample of the dictionary of Biomedical Terms:
      HIV,N+FLX=PORTUGAL+AB+state+IMMUN+EN=HIV
      doença maníaco-depressiva,N+FLX=CASA+AB+state+MH+EN=manic-depressive disorder
      doença bipolar,N+FLX=CASA+AB+state+MH+EN=bipolar disorder
      asma,N+FLX=CASA+AB+state+PULM+EN=asthma
    Sample of the dictionary of Proper Names:
      Amesterdão,N+PL+city+EN=Amsterdam
      Estados Unidos da América,N+PL+coun+EN=United States of America
      África,N+PL+cont+EN=Africa
      Extremo Oriente,N+PL+othprop+EN=Far East
      Mediterrâneo,N+FLX=ANO+PL+water+EN=Mediterranean
      Alpes Peninos,N+FLX=ALPES+PL+othprop+EN=Pennine Alps
      ONU,N+AN+org+EN=UN
  • 24. Port4NooJ Dictionaries: sample of terms classified as Information + Instructional/legal
  • 26. Syntactic-Semantic Ontology
    Noun supersets: concrete, mass, animate, place, information, abstract, process (intr), process (tr), measure, time, aspective
    Sets and subsets of the CONCRETE noun superset:
      functionals: receptacles, bearing surfaces, links/bridges, thresholds/focal points/barriers, conduits, fasteners, devices/tools, cloth things, structural elements, concretizations of verbals, concretizations of mass nouns, undifferentiated functionals, product/brand names
      agentives: software, vehicles, meters, machines/systems, communication agents, concrete chemical agents, undifferentiated agentives
      natural things: minute flora, plants, trees, trees/wood, miscellaneous natural things
      other concrete sets*: impulses/lights, blemishes/marks, edibles (non-mass), edibles/color, classifiers, amorphous, atomistic, undifferentiated concrete things
    *With one exception, these sets have no subsets.
  • 27. Syntactic-Semantic Ontology: Categories of CONCRETE Nouns
    Category | Mnemonic | Examples in English | Examples in Portuguese
    agentives | CO+undagt | see subsets | see subsets
    software | CO+soft | routine | rotina, ficheiro
    concrete chemical agents | CO+chem | catalyst, warhead | ácido sulfúrico
    machines/systems | CO+mach | battery, camera | máquina fotográfica
    vehicles | CO+vehic | truck, ship | automóvel
    meters | CO+meter | clock, gauge | manómetro
    communication agents | CO+comm | radio, radar | rádio
    functionals | CO+undfunc | trinket, ornament | ornamento
    devices/tools | CO+tool | pliers | alicate
    fasteners | CO+fast | nail, tendon | prego
    bearing surfaces | CO+surf | table, shelf | mesa
    receptacles | CO+recp | bottle, barrel | garrafa
    conduits | CO+cond | chute, artery | artéria
    thresholds/focal points/barriers | CO+barr | wall, door | porta
    links/bridges | CO+link | circuit, nerve | circuito
    cloth things | CO+cloth | shirt, blanket | camisola
    structural elements | CO+struc | spar, bone | osso
    concretizations of verbals | CO+verb | threading | -
    concretizations of mass nouns | CO+mass | acid lining | -
    product/brand names | CO+brand | Windows NT | Windows NT
    natural things | CO+nat | see subsets | see subsets
    minute flora | CO+flora | algae, spore | alga
    plants | CO+plant | rose, weed | erva
    trees | CO+tree | apple, willow | macieira
    trees/wood | CO+trwd | oak, maple | carvalho
    misc. natural things | CO+mnat | pebble, iceberg | iceberg
    edibles (non-mass) | CO+ednm | pork chop | costoleta
    edibles/color | CO+edcol | orange, cherry | laranja
    impulses/lights | CO+light | lamp, beam | lâmpada
    blemishes/marks | CO+blem | scratch, freckle | sarda
    classifiers | CO+class | element | elemento
    amorphous | CO+amor | breeze, tide | brisa
    atomistic | CO+atom | electron, atom | átomo
    undifferentiated | CO+obj | trifle, curio | -
  • 28. Syntactic-Semantic Ontology: Categories of MEASURE (ME) nouns
    Sets and Subsets | Mnemonic (= SynSem) | Examples
    abstract concepts measured by unit | ME+abs | humidity, length
    discrete measurable concepts | ME+dis | sum, increment
    units of measure | ME+unit | see subsets
    units of weight | ME+unit+wt | ounce, pound
    units of velocity | ME+unit+vel | mph, megahertz
    units of volume measure | ME+unit+vol | gallon, liter
    units of temperature | ME+unit+temp | degrees Celsius
    units of energy/force | ME+unit+ener | watt, horsepower
    measurement systems | ME+unit+sys | Fahrenheit, Kelvin
    units of duration | ME+unit+dur | hour, minute, year
    specialized units of measure | ME+unit+spec | oersted, ohm, phon
    units of money/value | ME+unit+value | dollar, euro, forint
    units of linear/area measure | ME+unit+lin | inch, yard, mile
    general undifferentiated measure | ME+undif | degree, gross, share
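The mnemonics on slides 27 and 28 encode the hierarchy directly: ME+unit+wt (units of weight) sits under ME+unit (units of measure), which sits under the MEASURE superset ME. A minimal Python sketch of how a grammar constraint could test membership in a set, assuming that subsumption is a prefix match on the "+"-separated levels. The functions `is_subsumed` and `words_in` and the small lexicon are hypothetical illustrations, not part of Port4NooJ or NooJ.

```python
def components(code: str) -> list[str]:
    """Split a SynSem mnemonic such as "ME+unit+wt" into its levels."""
    return code.split("+")


def is_subsumed(code: str, pattern: str) -> bool:
    """True if `code` falls under `pattern` in the set/subset hierarchy.

    Assumption: `pattern` subsumes `code` when the pattern's levels are a
    prefix of the code's levels (so ME+unit covers ME+unit+wt).
    """
    code_parts, pattern_parts = components(code), components(pattern)
    return code_parts[: len(pattern_parts)] == pattern_parts


# Hypothetical lexicon entries using mnemonics from slides 27 and 28.
LEXICON = {
    "ounce": "ME+unit+wt",
    "gallon": "ME+unit+vol",
    "hour": "ME+unit+dur",
    "humidity": "ME+abs",
    "pliers": "CO+tool",
}


def words_in(pattern: str) -> list[str]:
    """Words whose SynSem code falls under `pattern`, sorted alphabetically."""
    return sorted(w for w, code in LEXICON.items() if is_subsumed(code, pattern))


if __name__ == "__main__":
    print(words_in("ME+unit"))  # ['gallon', 'hour', 'ounce']
    print(words_in("ME"))       # ['gallon', 'hour', 'humidity', 'ounce']
```

Comparing whole levels rather than raw string prefixes keeps a hypothetical code such as ME+unitx from matching ME+unit.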
  • 29. Inflectional and Derivational Description
    Noun Inflectional Paradigm; Adjective Inflectional Paradigm; Pronoun Inflectional Paradigm; Verb Inflectional Paradigm; Adverb Inflectional Paradigm; Determiner Inflectional Paradigm; Interrogative Pronoun Inflectional Paradigm; Nominalization Derivational Paradigm
  • 30. Paraphrasing and Translation Grammars
    Translation and bilingual paraphrasing of simple sentences; graph to translate simple sentences
  • 31. Explicit Marking of Derivation and Support Verb
    Verb entries:
    • Identification of derivational paradigms for nominalizations (annotation NDRV) and predicate adjectives (annotation ADRV)
    • Link to the derived noun's support verbs and to the adjective's copula verbs (annotations VSUP and VCOP)
    adaptar,V+FLX=FALAR+Aux=1+INOP57+Subset=132+EN=adapt+VSUP=fazer+DRV=NDRV00:CANÇÃO
    azedar,V+FLX=LIMPAR+Aux=1+OBJTRundif98+Subset=740+EN=sour+VCOP=estar+DRV=ADRV00:ALTO
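These entries appear to follow the NooJ dictionary line format: a lemma, a comma, a part-of-speech code, and then "+"-separated properties, some of them KEY=VALUE pairs. A minimal Python sketch of reading such a line into a structure, assuming lemmas contain no commas, values contain no "+", and each key occurs once. The `Entry` class and `parse_entry` function are hypothetical names, not NooJ's own dictionary compiler. We read DRV=NDRV00:CANÇÃO as a derivational paradigm (NDRV00) followed, presumably, by the inflectional paradigm of the derived word (CANÇÃO); that reading is our interpretation.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    lemma: str
    category: str                      # part of speech, e.g. "V"
    features: dict[str, str] = field(default_factory=dict)
    flags: list[str] = field(default_factory=list)


def parse_entry(line: str) -> Entry:
    """Parse 'lemma,CAT+KEY=VALUE+FLAG+...' into an Entry.

    Assumptions: the lemma contains no comma, no value contains '+',
    and each key occurs at most once (a repeated key would overwrite).
    """
    lemma, info = line.split(",", 1)
    category, *props = info.split("+")
    entry = Entry(lemma=lemma, category=category)
    for prop in props:
        if "=" in prop:
            key, value = prop.split("=", 1)
            entry.features[key] = value
        else:
            entry.flags.append(prop)   # bare codes such as INOP57
    return entry


if __name__ == "__main__":
    e = parse_entry("adaptar,V+FLX=FALAR+Aux=1+INOP57+Subset=132"
                    "+EN=adapt+VSUP=fazer+DRV=NDRV00:CANÇÃO")
    derivation, inflection = e.features["DRV"].split(":", 1)
    print(e.features["EN"], e.features["VSUP"], derivation, inflection)
    # adapt fazer NDRV00 CANÇÃO
```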
  • 32. Explicit Marking of Derivation and Semantic Verb Association
    Adjective entries:
    • Identification of derivational paradigms for adverbializations (annotation AVDRV)
    literal,A+FLX=PRINCIPAL+IN+symb+EN=literal+DRV=AVDRV00:LITERALMENTE
    Autonomous predicate nouns:
    • Identification of autonomous predicate nouns (annotation Npred)
    • Identification of a semantically related verb (annotation VRB)
    curso,N+FLX=ANO+Npred+IN+inst+EN=course+VSUP=tirar+VRB=estudar+NPrep=de+Det=um
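To illustrate how these annotations can drive paraphrasing, here is a minimal Python sketch that assembles a support verb construction from the curso entry (support verb tirar, determiner um, preposition de) and pairs it with the semantically related verb estudar. The assembly order, the function names `features` and `svc_paraphrase`, and the example complement Direito are our assumptions for illustration, not taken from the slide.

```python
def features(entry: str) -> dict[str, str]:
    """Collect the KEY=VALUE properties of a dictionary line.

    Assumption: properties are '+'-separated and the lemma precedes the first
    comma; bare codes such as Npred or IN are ignored here.
    """
    lemma, info = entry.split(",", 1)
    props = dict(p.split("=", 1) for p in info.split("+") if "=" in p)
    props["lemma"] = lemma
    return props


def svc_paraphrase(entry: str, complement: str) -> tuple[str, str]:
    """Return (support verb construction, lexical-verb paraphrase).

    Hypothetical assembly order: VSUP + Det + noun + NPrep + complement,
    paraphrased as VRB + complement. Inflection, agreement, and optional
    determiners are deliberately left out of this sketch.
    """
    f = features(entry)
    svc = " ".join([f["VSUP"], f["Det"], f["lemma"], f["NPrep"], complement])
    return svc, f"{f['VRB']} {complement}"


if __name__ == "__main__":
    curso = ("curso,N+FLX=ANO+Npred+IN+inst+EN=course"
             "+VSUP=tirar+VRB=estudar+NPrep=de+Det=um")
    print(svc_paraphrase(curso, "Direito"))
    # ('tirar um curso de Direito', 'estudar Direito')
```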
  • 33. ReWriter: a Monolingual Standalone Paraphraser
    Recognition and monolingual paraphrasing of support verb constructions (support verb construction / morphologically related lexical verb)
  • 34. ReWriter: Examples
    Recognition and paraphrasing of elementary support verb constructions co-occurring with predicate nouns of the biomedical field (support verb construction / lexical verb or stylistic variant / non-elementary support verb construction)
    • Elementary SVC > lexical verb
    • Elementary SVC > non-elementary SVC (realizar/efectuar)
    • Elementary SVC > sujeitar-se a / submeter-se a, ONLY if the SUBJECT is a patient
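The three rewriting options on this slide can be sketched as a small rule function. This is a minimal Python sketch, not ReWriter itself: it assumes the input has already been recognized as "<subject> fez <noun phrase>" with fazer as the elementary support verb, uses a hypothetical hand-written lexicon of our own examples (uma operação, uma análise), and, as our own simplification, offers the lexical-verb and realizar/efectuar options only when the subject is not a patient.

```python
# Hypothetical mini-lexicon (our examples, not ReWriter's resources).
PATIENTS = {"o doente", "a doente"}
LEXICAL_VERBS = {"uma operação": "operou", "uma análise": "analisou"}


def paraphrases(subject: str, noun_phrase: str) -> list[str]:
    """Paraphrase '<subject> fez <noun_phrase>' (fazer as elementary support verb).

    Assumes the SVC has already been recognized and that everything is in
    the 3rd person singular preterite.
    """
    if subject in PATIENTS:
        # Elementary SVC > sujeitar-se a / submeter-se a: patient subjects ONLY.
        return [f"{subject} {verb} {noun_phrase}"
                for verb in ("submeteu-se a", "sujeitou-se a")]
    results = []
    lexical = LEXICAL_VERBS.get(noun_phrase)
    if lexical:
        # Elementary SVC > lexical verb.
        results.append(f"{subject} {lexical}")
    # Elementary SVC > non-elementary SVC with realizar/efectuar.
    results += [f"{subject} {verb} {noun_phrase}" for verb in ("realizou", "efectuou")]
    return results


if __name__ == "__main__":
    print(paraphrases("o cirurgião", "uma operação"))
    print(paraphrases("o doente", "uma operação"))
```

The patient check comes first because, for "o doente fez uma operação", a lexical paraphrase such as "o doente operou" would change who performed the action.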
  • 35. ReWriter: Application - Interface
    Interactive ReWriter for word processing applications such as text editing
  • 36. ReWriter: Application - Interface
    Interactive ReWriter for word processing applications such as text editing
  • 37. ReWriter: Application - Interface
  • 38. ReWriter: Application - Interface
  • 39. ReWriter: Application - Interface
  • 40. ReWriter: Extensibility
    1. Applications to General Language
    2. Applications to Technical Language
  • 41. ReWriter: Extensibility - Examples
    [Paraphrasing adverbials] à volta da órbita ≡ periorbital (popular versus technical); around the orbit of the eye ≡ periorbital
    [Paraphrasing relative clauses into adjectival past participles] N0 que têm sido escritos ≡ N0 escritos; N0 que foram descritos ≡ N0 descritos; N0 that have been written ≡ N0 written; N0 that were described ≡ N0 described
    [Paraphrasing if clauses] se for necessário ≡ se necessário; if it is necessary ≡ if necessary
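The relative-clause and if-clause examples are regular enough to sketch with pattern rules. This is a minimal Python sketch using plain regular expressions; the adjective list (necessário, possível) and the participle-ending heuristic are our simplifications, not how ReWriter, which relies on Port4NooJ resources, recognizes these constructions.

```python
import re

# Hypothetical pattern rules modelled on the slide's examples. A real grammar
# would use dictionary information instead of word endings.
PARTICIPLE = r"\w+(?:[ai]d|it)[oa]s?"
RULES = [
    # [if clauses] "se for necessário" ≡ "se necessário"
    (re.compile(r"\bse for (necessário|possível)\b"), r"se \1"),
    # [relative clauses] "que têm sido escritos" ≡ "escritos"
    (re.compile(rf"\bque (?:têm sido|foram) ({PARTICIPLE})\b"), r"\1"),
]


def rewrite(sentence: str) -> str:
    """Apply each rule in turn and return the paraphrased sentence."""
    for pattern, replacement in RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence


if __name__ == "__main__":
    print(rewrite("Os artigos que foram descritos serão revistos, se for necessário."))
    # Os artigos descritos serão revistos, se necessário.
```

Restricting the if-clause rule to a few adjectives avoids rewriting sentences such as "se for a Lisboa", where "for" is a form of ir rather than ser.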
  • 42. ReWriter: Extensibility - Examples
    [Paraphrasing coordinated noun phrases: conjoining or disjoining] recursos linguísticos para o ensino e para a investigação → ?linguistic resources for teaching and for research ≡ recursos linguísticos para o ensino e a investigação → linguistic resources for teaching and research
    [Paraphrasing subjunctive clauses into infinitives] pedimos o favor que confirme a sua participação → *we ask the favor that you confirm your attendance ≡ pedimos o favor de confirmar a sua participação → *we ask the favor of confirming your attendance
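The conjoining direction of the coordination example can be sketched as one regular expression with a back-reference that requires the same preposition in both conjuncts. This is a simplified Python sketch under our own assumptions: it covers only prepositions that do not contract with the article (para, com, sobre) and single-word nouns, and it implements neither the disjoining direction nor the subjunctive-to-infinitive rule.

```python
import re

# Hypothetical rule: "P DET N1 e P DET N2" ≡ "P DET N1 e DET N2".
# The back-reference \1 requires the same preposition in both conjuncts.
COORDINATION = re.compile(
    r"\b(para|com|sobre) (o|a|os|as) (\w+) e \1 (o|a|os|as) (\w+)"
)


def conjoin(text: str) -> str:
    """Drop the repeated preposition in a coordinated noun phrase."""
    return COORDINATION.sub(r"\1 \2 \3 e \4 \5", text)


if __name__ == "__main__":
    print(conjoin("recursos linguísticos para o ensino e para a investigação"))
    # recursos linguísticos para o ensino e a investigação
```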
  • 43. ReWriter: Extensibility - Examples
    [Paraphrasing marked-up constructions] se a necessidade do utilizador é criar um texto em linguagem controlada → ?if the end-user need is to create controlled language text ≡ se o utilizador necessita de criar um texto em linguagem controlada → if the end-user needs to create controlled language text
    [Paraphrasing vague, undefined, or null-subject sentences, whenever the real subject/actor is known] [-] houve um grito na rua ≡ [N-PRON] alguém gritou na rua → there was shouting in the street ≡ someone shouted in the street
  • 44. ReWriter: Extensibility - Examples
    [Paraphrasing passives, whenever suitable]
    Esse livro foi escrito por Saramago em 2008 ≡ Saramago escreveu esse livro em 2008 (That book was written by Saramago in 2008 ≡ Saramago wrote that book in 2008)
    Florida foi atingida por um tornado ≡ Um tornado atingiu a Florida (Florida was hit by a tornado ≡ A tornado hit Florida)
    O carro foi roubado ≡ Alguém roubou o carro (The car was stolen ≡ Someone stole the car)
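The passive examples follow a regular pattern: the agent introduced by por becomes the subject, the participle is replaced by the corresponding active verb, and an agentless passive receives the indefinite subject alguém. A minimal Python sketch under strong assumptions: a hypothetical hand-written participle-to-preterite lexicon, third-person singular only, and a regular expression in place of real syntactic analysis. It is not ReWriter's implementation.

```python
import re
from typing import Optional

# Hypothetical mini-lexicon: past participle -> 3rd person singular preterite.
PRETERITE = {
    "escrito": "escreveu", "escrita": "escreveu",
    "roubado": "roubou", "roubada": "roubou",
    "atingido": "atingiu", "atingida": "atingiu",
}
DETERMINERS = {"o", "a", "os", "as", "esse", "essa", "este", "esta", "um", "uma"}

# "<patient> foi <participle> [por <agent>] [em <word>]"
PASSIVE = re.compile(
    r"^(?P<patient>.+?) foi (?P<pp>\w+)"
    r"(?: por (?P<agent>.+?))?(?P<rest>(?: em \w+)?)$"
)


def to_active(sentence: str) -> Optional[str]:
    """Rewrite a simple passive in the active voice; None if it does not apply."""
    m = PASSIVE.match(sentence)
    if m is None or m["pp"] not in PRETERITE:
        return None
    patient = m["patient"]
    first = patient.split(" ", 1)[0]
    if first.lower() in DETERMINERS:
        # A sentence-initial determiner loses its capital in object position.
        patient = patient[0].lower() + patient[1:]
    agent = m["agent"] or "Alguém"  # agentless passive: indefinite subject
    return f"{agent} {PRETERITE[m['pp']]} {patient}{m['rest']}"


if __name__ == "__main__":
    print(to_active("Esse livro foi escrito por Saramago em 2008"))
    # Saramago escreveu esse livro em 2008
    print(to_active("O carro foi roubado"))
    # Alguém roubou o carro
```

The sketch does not insert articles or fix capitalization for agents such as um tornado, so the Florida example is beyond it.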
  • 45. ParaMT: a Bilingual/Multilingual Paraphraser for MT
    Recognition and bilingual paraphrasing of support verb constructions (Portuguese support verb construction / corresponding English verb)
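ParaMT pairs a recognized Portuguese SVC with a single corresponding English verb. A minimal Python sketch of that lookup, assuming a tiny hand-written table; the entries (tomar uma decisão, fazer uma adaptação, pôr fim) and the form-to-lemma list are our illustrative examples. Real recognition, based on Port4NooJ dictionaries and grammars, would also need to handle discontinuous constructions, ambiguity, and inflection of the English verb, none of which this sketch does.

```python
import re

# Hypothetical fragment of a Portuguese-English SVC table:
# (support verb lemma, predicate noun) -> corresponding English verb.
SVC_TO_EN = {
    ("tomar", "decisão"): "decide",
    ("fazer", "adaptação"): "adapt",
    ("pôr", "fim"): "end",
}
# Hypothetical inflected forms of the elementary support verbs -> lemma.
LEMMA = {
    "tomar": "tomar", "toma": "tomar", "tomou": "tomar",
    "fazer": "fazer", "faz": "fazer", "fez": "fazer",
    "pôr": "pôr", "põe": "pôr", "pôs": "pôr",
}
DETERMINERS = {"um", "uma", "o", "a"}


def find_svcs(sentence: str) -> list[tuple[str, str]]:
    """Return (Portuguese SVC, English verb) pairs found in the sentence."""
    tokens = re.findall(r"\w+", sentence.lower())
    found = []
    for i, token in enumerate(tokens):
        lemma = LEMMA.get(token)
        if lemma is None:
            continue
        j = i + 1
        if j < len(tokens) and tokens[j] in DETERMINERS:
            j += 1  # skip an optional determiner
        if j < len(tokens) and (lemma, tokens[j]) in SVC_TO_EN:
            found.append((" ".join(tokens[i : j + 1]), SVC_TO_EN[(lemma, tokens[j])]))
    return found


if __name__ == "__main__":
    print(find_svcs("O governo tomou uma decisão ontem."))
    # [('tomou uma decisão', 'decide')]
```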
  • 46. Preliminary Quantitative Results: evaluation of recognition and paraphrasing of support verb constructions (500 sentences, 100 for each elementary support verb)
    Support verb | SVC Recognition Precision | SVC Recognition Recall | SVC Paraphrasing Precision
    Pôr | 73/73 (100%) | 73/100 (73%) | 72/73 (98.6%)
    Tomar | 75/75 (100%) | 75/100 (75%) | 68/73 (93.1%)
    Ter | 65/65 (100%) | 65/100 (65%) | 59/65 (90.7%)
    Dar | 57/60 (95%) | 57/100 (57%) | 46/51 (90.1%)
    Fazer | 43/45 (95.5%) | 43/100 (43%) | 40/45 (88.8%)
    Average | 62.6/63.6 (98.4%) | 62.6/100 (62.6%) | 57/61 (93.4%)
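The percentages on this slide can be recomputed from the raw counts. A minimal Python sketch, assuming recognition precision is correct/recognized, recall is correct/100 (the test sentences per verb), and that the slide truncates rather than rounds (68/73 is 93.15%, shown as 93.1%). Pooling the counts over the five verbs reproduces the recognition averages exactly; for paraphrasing it gives 285/307, about 92.8%, whereas the slide's 93.4% corresponds to 57/61, that is, the mean denominator 61.4 rounded to 61.

```python
import math

# Counts transcribed from slide 46: verb -> (correctly recognized SVCs,
# recognized SVCs, correct paraphrases, evaluated paraphrases).
RESULTS = {
    "Pôr": (73, 73, 72, 73),
    "Tomar": (75, 75, 68, 73),
    "Ter": (65, 65, 59, 65),
    "Dar": (57, 60, 46, 51),
    "Fazer": (43, 45, 40, 45),
}
SENTENCES_PER_VERB = 100  # test sentences per elementary support verb


def pct(num: int, den: int) -> float:
    """Percentage truncated (not rounded) to one decimal place."""
    return math.floor(1000 * num / den) / 10


def report(results: dict[str, tuple[int, int, int, int]]) -> dict[str, tuple[float, float, float]]:
    """Recognition precision, recognition recall, and paraphrasing precision."""
    table = {}
    for verb, (correct, recognized, para_ok, para_total) in results.items():
        table[verb] = (
            pct(correct, recognized),
            pct(correct, SENTENCES_PER_VERB),
            pct(para_ok, para_total),
        )
    # Pooled (micro-averaged) figures: sum each column over all verbs.
    correct, recognized, para_ok, para_total = (sum(col) for col in zip(*results.values()))
    table["Pooled"] = (
        pct(correct, recognized),
        pct(correct, SENTENCES_PER_VERB * len(results)),
        pct(para_ok, para_total),
    )
    return table


if __name__ == "__main__":
    for verb, row in report(RESULTS).items():
        print(verb, row)
```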
  • 47. Conclusions
    Linguistic knowledge applied to a machine translation system improves its output quality.
    Effective results from linguistically based research on paraphrases can save substantial effort and resources in machine translation systems.
  • 48. Thank you for your attention!
    Acknowledgements: This work was partly supported by grant SFRH/BD/14076/2003 from Fundação para a Ciência e a Tecnologia, co-financed by POSI, and partly by Fundação para a Computação Científica Nacional.

Speaker notes

  1. Good afternoon! My name is AB and I am a PhD student working on MT. I am affiliated with Universidade do Porto-Linguateca and New York University. My interest in MT grew out of more than 7 years of work on a commercial MT system. In this presentation, I will introduce ParaMT, a paraphraser applied to machine translation, which I developed during my research.
  2. Outline: first, an introduction distinguishing HT from MT; then, the resources and tools developed within the scope of my PhD research.
  3. Human translation cannot be replaced by machine translation, at least not until machine translation overcomes its limitation to sentence-level translation and there are breakthroughs in artificial intelligence.
  4. Some facts about machine translation: For most of human history, translation was an exclusively human activity. Until recently, machine translation was accessible only to a very restricted niche of the market, and computer-aided translation was used only by professional translators.
  5. Despite the availability of funding and many talented researchers worldwide, most efforts since the first attempts in the 1950s to build cost-effective, industrial-strength, high-quality machine translation have fallen short of their goals. Successful machine translation has been difficult to achieve because of two major hurdles: the complexity and the ambiguity of language.
  6. Human translators use ingenuity and skill to artfully reproduce feeling and sound. Human beings can easily and intuitively retrieve information from distant parts of the text or from extra-textual knowledge. This is the big advantage human translators have over machines, and one of the reasons why human and machine translation do not compete. Human translators can easily solve most ambiguity problems, as well as the problems of culture and context that machine translation cannot. Human use of language assumes knowledge of the world, or of particular user-centric worlds, something that is difficult or impossible to program into a machine. Machine translation therefore works best in situations where the writer cannot assume shared knowledge of the world. Domain-specific contexts and controlled language make machine translation reliable, demonstrating the advantages of a scientific approach to language.
  7. More challenging grey areas for machine translation are anaphora resolution, common-noun nuance resolution, and the handling of ellipsis, which constitute a class of more advanced ambiguity problems. To solve them, analysis must go beyond the sentence level to the extra-sentential (discourse) level, because they involve referential links between utterances in neighboring clauses or elsewhere in the text. These are typical problems in machine translation, and they often produce errors.
  9. "bom partido" can also be considered a compound, and "tirar partido de" a fixed or semi-fixed expression.
  11. A support verb construction is defined as a predicate noun construction whose main verb has a weak semantic value. Support verb constructions are an area where statistical systems tend to fall into “traps”: if a statistical system is not sensitive to these constructions, it may produce misleading translations. Linguistic knowledge about support verb constructions can provide a statistical system with special training data that could correct this problem.
  12. So, motivated by this desire to see better results, my main objectives were: (read objectives 1, 2, and 3 from the slide).
  13. The outcomes of this work were new resources (the Port4NooJ system) and two new automated tools that recognize multiword expressions and generate paraphrases of them: ReWriter and ParaMT. Both paraphrasers are based on Port4NooJ resources.
  15. In any language processing application, linguistic resources are the foundation. In machine translation especially, they are the driving force behind the translation process. Port4NooJ is built on two original sources: the NooJ linguistic environment and the OpenLogos lexical resources. Linguateca’s resources were also used.
  16. The system includes several dictionaries. The structure of the dictionary is XXX
  18. I will skip this slide on the inflectional and derivational descriptions.
  19. This slide presents a local grammar for the analysis and recognition of elementary support verb constructions, together with the monolingual paraphrasing that we can see in the concordance. Side by side, we find the SVC on the left and an equivalent lexical verb on the right.
  20. This slide shows another concordance, this time for the recognition and paraphrasing of elementary support verb constructions that co-occur with predicate nouns from the biomedical field. The SVC is shown on the left and its paraphrases on the right: an equivalent lexical verb, or a stylistic variant of the construction, which can be built with a non-elementary support verb such as efectuar or realizar, or with a construction of the type “sujeitar-se a” or “submeter-se a” when the subject of the SVC is necessarily a patient.
  21. One of the possible applications of ReWriter is its interactive use in word processing. Paraphrases can work much like synonym suggestions in text editing.
  31. The concordance shown in this slide illustrates the bilingual PT-EN recognition and paraphrasing of SVCs. On the left we have the SVC in Portuguese and on the right an equivalent lexical verb in English.
  32. Two main conclusions derived from this work are: