Material presented at the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011), Riga, Latvia.
Download paper: http://hal.archives-ouvertes.fr/hal-00585187
Institutions: Laboratoire d'Informatique de Nantes Atlantique (LINA), Lingua et Machina
2. Outline
1. Context and scope of work
2. Comparable corpora and terminology evaluation
3. Applicative evaluation protocol
4. Experimentation and results
5. Future improvements
4. Context of the work
• Bilingual terminology mining from comparable corpora
• Application to:
– computer-aided translation
– computer-aided terminology
5. Scope of the work
• Find a way to show the added value of the acquired terminology when it is used for technical translation
– do translators translate better and/or faster?
• Design and testing of an "applicative" evaluation protocol for bilingual terminologies
7. Comparable corpora
• English texts on breast cancer:
– "It has been suggested that breast magnetic resonance imaging (MRI) is more accurate in the diagnosis of breast cancer..."
– "Histological evaluation revealed the presence of DCIS..."
• French texts on breast cancer:
– "L'imagerie par résonance magnétique avec injection de gadolinium (IRM) est une technique indépendante de la densité mammaire..."
– "Un diagnostic histologique est nécessaire..."
9. Comparable corpora
• English texts on breast cancer:
– "It has been suggested that breast magnetic resonance imaging (MRI) is more accurate in the diagnosis of breast cancer..."
– "Histological evaluation revealed the presence of ductal carcinoma in situ."
• French texts on breast cancer:
– "L'imagerie par résonance magnétique avec injection de gadolinium (IRM) est une technique indépendante de la densité mammaire..."
– "Un diagnostic histologique est nécessaire..."
10. Advantages of comparable corpora
• More widely available
– new domains
– unprecedented language pairs
• Quality
– spontaneous language
– not influenced by source texts
11. Reference evaluation of bilingual terminologies
• Reference evaluation:
– the output of the program is compared with a list of reference translations
• Precision:
– percentage of output translations which are in the reference
$$\mathrm{precision} = \frac{|\mathrm{output} \cap \mathrm{reference}|}{|\mathrm{output}|}$$
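The precision measure above can be sketched in a few lines. This is only an illustration, assuming the program output and the reference are both available as sets of (source term, translation) pairs; the function name and the toy pairs are hypothetical and do not come from the presented system.

```python
def precision(output, reference):
    """Share of output translation pairs that also appear in the reference."""
    output, reference = set(output), set(reference)
    if not output:
        return 0.0  # convention chosen here for an empty output
    return len(output & reference) / len(output)


# Hypothetical toy data, for illustration only.
output_pairs = {
    ("histological", "histologique"),
    ("histological", "diagnostic"),
    ("breast", "sein"),
    ("cancer", "cancer"),
}
reference_pairs = {
    ("histological", "histologique"),
    ("breast", "sein"),
    ("cancer", "cancer"),
    ("tumor", "tumeur"),
}

score = precision(output_pairs, reference_pairs)  # 3 of the 4 output pairs are in the reference
```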
12. Reference evaluation with comparable corpora
• Output:
– source term → ordered list of candidate translations
• Example:
– histological → diagnostic₁, histologie₂, histologique₃, …, nécessaireₙ
13. Reference evaluation with comparable corpora
• Precision:
– percentage of source terms for which a reference translation appears among the Top 10 or Top 20 candidate translations
• State of the art:
– between 42% and 80% on the Top 20, depending on corpus size, corpus type and the nature of the translated elements [Morin and Daille, 2009]
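A possible sketch of Top N precision over ranked candidate lists follows. It assumes a source term counts as correctly translated when at least one reference translation appears among its first N candidates, which is a common reading of the measure rather than something the slides spell out. The function name, the data layout and the toy entries are hypothetical.

```python
def top_n_precision(candidates, reference, n=20):
    """Share of source terms with a reference translation in their first n candidates.

    candidates: dict mapping a source term to its ranked list of candidate translations
    reference:  dict mapping a source term to its set of accepted translations
    """
    if not candidates:
        return 0.0
    hits = 0
    for term, ranked in candidates.items():
        accepted = reference.get(term, set())
        if any(candidate in accepted for candidate in ranked[:n]):
            hits += 1
    return hits / len(candidates)


# Hypothetical toy data, for illustration only.
candidates = {
    "histological": ["diagnostic", "histologie", "histologique", "nécessaire"],
    "breast": ["sein", "mammaire"],
    "accurate": ["technique", "densité"],
}
reference = {
    "histological": {"histologique"},
    "breast": {"sein"},
    "accurate": {"précis"},
}

top1 = top_n_precision(candidates, reference, n=1)
top3 = top_n_precision(candidates, reference, n=3)
```

With this toy data, widening the window from the first candidate to the first three lets "histological" count as a hit, which is why scores are reported for a given N.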
14. Reference vs. applicative evaluation
• Reference evaluation:
– suitable for testing/developing the alignment program
– fast, cheap, reproducible, objective
• Applicative evaluation:
– how much does the alignment program help the end users?
– can the terminologies improve translation quality?
20. Questions raised
1) How do you assess translation quality?
2) Evaluate the translations as a whole, or technical terms only?
21. 1) How do you assess translation quality?
• Translation studies evaluation grids:
– SICAL, SAE J2450
– too complex, poorly documented
• Machine translation objective metrics:
– BLEU, METEOR
– not suited to human translation
– reproducibility is not an advantage in our case
22. 1) How do you assess translation quality?
• Machine translation subjective evaluation
– translations evaluated by humans:
• quality judgement: adequacy, fluency...
• ranking
– use an inter-annotator agreement measure to check that the judges agree sufficiently
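Agreement is reported as a K value on the result slides, but the exact coefficient is not stated. As one possible illustration, here is a minimal sketch of Cohen's kappa for two judges, chosen as an assumption. The function name and the sample labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both judges must label the same non-empty list of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both judges.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each judge's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgements, for illustration only.
judge_a = ["correct", "correct", "acceptable", "wrong"]
judge_b = ["correct", "acceptable", "acceptable", "wrong"]
kappa = cohen_kappa(judge_a, judge_b)
```

Kappa corrects raw agreement for the agreement expected by chance, so two judges who agree on three items out of four obtain a value below 0.75.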
23. 2) Evaluate the whole text or just some terms?
• Quality of a text translation = complex interaction of several parameters
• Focus on the elements for which the translator felt they needed a linguistic resource:
– evaluates only the part of the translation on which the terminology has an impact
– easier and faster
24. Applicative evaluation protocol
• Compare 3 different "translation situations"
– one situation = one type of resource
• Translators produce the translation and note down the terms they had to look up
• The quality of the translations of these terms is assessed by human judges
29. Assessment of the translations
1. Quality judgement:
– correct: standard term or expression
– acceptable: meaning is retained
– wrong: no meaning is retained
2. Ranking:
– from best to worst
– ties allowed
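The two kinds of assessment above could be aggregated as in the sketch below. The situation keys ("CC", "GEN", "WEB"), the function names and the sample data are hypothetical, and the sketch assumes that each judge gives one quality label per translation and one rank per situation, with equal ranks standing for ties.

```python
from collections import Counter

LABELS = ("correct", "acceptable", "wrong")


def judgement_shares(judgements):
    """Proportion of each quality label in a list of judgements."""
    if not judgements:
        return {label: 0.0 for label in LABELS}
    counts = Counter(judgements)
    return {label: counts[label] / len(judgements) for label in LABELS}


def pairwise_outcome(ranks, first, second):
    """Compare two situations within one ranking (lower rank = better, equal = tie)."""
    if ranks[first] < ranks[second]:
        return "better"
    if ranks[first] > ranks[second]:
        return "worse"
    return "tie"


# Hypothetical data, for illustration only.
shares = judgement_shares(["correct", "correct", "acceptable", "wrong"])
ranking = {"CC": 1, "GEN": 2, "WEB": 1}  # CC and WEB are tied for first place
cc_vs_gen = pairwise_outcome(ranking, "CC", "GEN")
cc_vs_web = pairwise_outcome(ranking, "CC", "WEB")
```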
31. Data
• Comparable corpora:
– breast cancer: 400k words/language
– water science: 2M words/language
• Texts to translate:
– research paper abstracts: ~500 words/domain
– lay science texts: ~500 words/domain
32. Translators' feedback
"Globally, 75% of technical words aren't in the glossary, and for the other 25%, 99% have between 10 and 20 candidate translations and none has been validated. So most of the time, you are just partly sure, but you are never totally sure of your translation. And in the worst cases, you translate instinctively."
• Translators were not prepared to use a bilingual terminology with many candidate translations
• The terminology only partially covered the vocabulary of the texts to translate
33. Terminology coverage of the texts to translate
• Breast Cancer
– 94% of the vocabulary of the texts is in the terminology
– fine-grained topic
• Water Science
– 14% of the vocabulary of the texts is in the terminology
– topic is too general
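A coverage figure of this kind could be computed as in the sketch below. It assumes that "vocabulary" means the distinct lowercased word forms of a text and that the terminology is a flat collection of single-word source entries. Both are simplifying assumptions, and the function names and toy data are hypothetical.

```python
import re


def vocabulary(text):
    """Distinct lowercased word forms of a text (a deliberately naive tokenisation)."""
    return set(re.findall(r"\w+", text.lower()))


def coverage(text, terminology_entries):
    """Share of the text's vocabulary that has an entry in the terminology."""
    vocab = vocabulary(text)
    if not vocab:
        return 0.0
    known = {entry.lower() for entry in terminology_entries}
    return len(vocab & known) / len(vocab)


# Hypothetical toy data, for illustration only.
text = "Histological evaluation revealed the presence of ductal carcinoma"
entries = {"histological", "carcinoma", "ductal", "MRI"}
ratio = coverage(text, entries)  # 3 of the 8 distinct words have an entry
```

A real measure would probably need to handle multi-word terms and to filter out function words, which this sketch ignores.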
34. Quality judgement / Breast Cancer
• Equivalent proportion of incorrect translations in all situations
• The Web yields the most correct translations, followed by the comparable corpora
[Bar chart, BREAST CANCER, K = 0.25: quality judgements (0% to 100%) for SIT. 0 / GEN. LANG., SIT. 1 / CC and SIT. 2 / WEB. Data labels as extracted: 20%, 19%, 18%; 42%, 38%, 35%; 38%, 43%, 47%.]
35. Quality judgement / Water Science
• Translations are much better with the Web
• Comparable corpora produce worse translations than the general language resources
[Bar chart, WATER SCIENCE, K = 0.42: quality judgements (0% to 100%) for SIT. 0 / GEN. LANG., SIT. 1 / CC and SIT. 2 / WEB. Data labels as extracted: 18%, 21%, 23%, 23%, 7%, 16%, 77%, 59%, 56%.]
36. Results seem inconsistent
• Translations produced in situation 1 are worse than translations produced in situation 0 (baseline)
• But both situations share the same "general language resource" basis:
– BASELINE: general language resources
– Situation 1: general language resources + terminology mined from comparable corpora
37. Possible explanation
                            BASELINE   SITUATION 1            SITUATION 2
                                       (Comparable corpora)   (Web)
General language resource   43%        14%                    3%
Specialized resource        -          25%                    56%
Intuition                   79%        77%                    44%
When translators have a specialized resource, they tend to ignore the general language resource
38. Possible explanation
If the translators in situation 1 had always looked up the general language resource first, the translations of situation 1 would have been at least as good as those of situation 0
39. Ranking / Breast Cancer
[Bar chart, BREAST CANCER, K = 0.69: ranking comparisons CC vs. GEN. LANG. and CC vs. WEB. Data labels as extracted: 42%, 47%, 32%, 28%, 26%, 26%.]
40. Ranking / Water science
[Bar chart, WATER SCIENCE, K = 0.63: ranking comparisons CC vs. GEN. LANG. and CC vs. WEB. Data labels as extracted: 49%, 41%, 43%, 33%, 18%, 16%.]
41. Outline
1. Context and scope of work
2. Bilingual terminology mining: comparable vs. parallel corpora
3. Evaluation of bilingual terminologies
4. Applicative evaluation protocol
5. Experimentation and results
6. Future improvements
42. Improvement 1: terminology coverage
• Dependency between:
– the added value of the bilingual terminology
– its coverage of the texts to translate
• Any added-value measure should also indicate to what extent the terminology contains the vocabulary of the translated texts
43. Improvement 1: terminology coverage
• Outlook:
– create a "coverage" measure
– determine the minimum coverage a terminology needs in order to be "useful" for translating a given text
– gather smaller but finer-grained corpora
44. Improvement 2: translation situations
• When translators have several resources at their disposal, they tend to ignore the general language resource
• Consequence: the same resource is used differently depending on the situation
• This seems to be the cause of the inconsistent results
45. Improvement 2: translation situations
• Outlook: use 0 or 1 resource per translation situation
– Situation 0: no resource
– Situation 1: terminology mined from comparable corpora
– Situation 2: Web
46. Improvement 3: train translators
• Prepare translators to use "ambiguous", unvalidated terminologies
• Run a preliminary trial evaluation to:
– train the translators
– train the judges → results in higher agreement
47. Acknowledgements
This work was funded by:
– French National Research Agency, grant no. ANR-08-CORD-009
– Lingua et Machina, www.lingua-et-machina.com
Annotators:
– Clémence De Baudus
– Mathieu Delage