This document discusses current research topics in terminology and ontologies. It covers trends like term variation, culture-specific semantic differences, definitions, contexts, and knowledge-rich contexts. It also discusses term extraction and mapping. Key areas of research include improving techniques for specialised domains, identifying term variants, providing richer semantic descriptions, and supporting terminological workflows and users.
1. Terminology and Ontologies
Section 2: Current Research Topics
Anne-Kathrin Schumann
Saarland University
“Expert“ Winter School
Birmingham
November 13, 2013
2. Overview
Current trends in research
Term variation
Culture-specific semantic differences
Definitions, contexts, knowledge-rich
contexts
Usability aspects
Term extraction and term mapping
3. Current trends in research
Controversial paper by Cabré in Terminology 5 (1),
1998/1999, pp. 5-19: Do we need an autonomous theory
of terms?
“It is increasingly being accepted that Wüster‘s
theoretical stance […] is proving inadequate for the
different current needs of term description and
processing because of its idealising and simplifying
approach.“
(markup is mine)
4. Current trends in research
What have we been talking about?
terminology adopts a decompositional, structuralist approach to
the description of specialised meanings
the meaning of a terminological unit (concept+term) can be
described by a set of sufficient and necessary semantic invariants
no interest in the linguistic domain of the field:
“Only the designations of the concepts, the lexicon, are relevant to
the terminologist. Syntax and inflection are not. For the latter, the
same rules apply as in general language .“
(my translation from Wüster 1985: 2, markup as in the original)
5. Current trends in research
Terminology, then, is an exercise of reducing the complexity of
reality to simpler feature structures
“[D]iscreteness is in the head and fuzziness is in the world.“
(Geeraerts 2010: 132)
6. Current trends in research
Main criticism: No account for
the multidisciplinary (denominative, cognitive and
functional) nature of terms
the communicative dimension of terminology
connotational aspects in terminology
the linguistic dependence of terms on particular languages
pragmatic/functional aspects of term variation
7. Current trends in research
Small recap: term variation
is ubiquitous
is a problem for applications that use terminology
Wüster‘s solution: standardisation
counter-proposal: systematic study and handling of term
variation
8. Current trends in research
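Da jedoch der Massenstrom gleich bleiben muss, weitet sich bei einer frei
angeströmten Windkraftanlage der Wind auf, da eben trotz der geringeren
Geschwindigkeit hinter der Anlage die gleiche Menge Luft abtransportiert werden
muss. Aus eben diesem Grund ist die komplette Umwandlung der Windenergie in
Rotationsenergie mit einer Windkraftanlage nicht möglich: Dafür müssten die
Luftmassen hinter der Windkraftanlage ruhen, könnten also nicht abtransportiert
werden.
(Wikipedia)
[English: “However, since the mass flow must remain constant, the wind stream
widens at a wind turbine in free flow, because the same amount of air has to be
carried away behind the turbine despite its lower speed. For this very reason, a
wind turbine cannot convert all of the wind energy into rotational energy: the
air masses behind the wind turbine would have to be at rest and so could not be
carried away.“ Note the alternation Windkraftanlage / Anlage in the German text.]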
-> coreference chains for text cohesion
9. Current trends in research
Term variation:
cannot be treated only prescriptively because it is
functional from a linguistic point of view
terms are reiterated in discourse for reasons of cohesion
the informativity of the term is managed by altering the
form of the term (especially if it is a MWT)
the whole form can normally be retrieved from context
(Collet 2004: 102)
-> term variation is influenced by text-linguistic aspects
10. Current trends in research
Other reasons for terminological variation:
dialects and geographical variation
chronological variation
social variation (e.g. academic expert vs. practitioner)
creativity, emphasis, expressiveness
language contact
conceptual imprecision, ideological reasons (e.g. “armchair
linguistics“) and different points of view (ozone layer depletion,
ozone layer destruction, ozone layer loss, ozone layer reduction)
(Freixa 2006)
11. Current trends in research
What is a term variant?
“ … an utterance which is semantically and conceptually related to
an original term.“
(Daille et al. 1996: 201)
-> an attested form found in a text
-> there is a codified (authorised) original term
-> semantically and conceptually related
12. Current trends in research
Types of variants:
graphical: missing hyphen (e.g. Windkraftanlage vs.
Windkraft-Anlage) or case differences
inflectional: orthographic (e.g. conservation de produit vs.
conservation de produits)
shallow syntactic:
variation of preposition (e.g. chromatographie sur/en
colonne)
optional characters (e. g. fixation de l‘azote vs. fixation
d‘azote)
predicative use of the adjective
13. Current trends in research
Types of variants:
syntactic:
additional modifier
additional nominal modifier (closed list, e.g. protéine
végétale vs. protéine d‘origine végétale)
expansion of the nominal head
permutations (e.g. air pressure vs. pressure of the air)
14. Current trends in research
Types of variants:
morphosyntactic:
alternation between preposition/prefix (e.g. pourrissement
après récolte vs. pourrissement post-récolte)
derivations (e.g. acidité du sang vs. acidité sanguine)
paradigmatic substitution (e. g. Ehemann vs. Ehegatte)
anaphoric uses
acronyms
(Daille 2005)
15. Current trends in research
Variant recognition given a set of candidate terms:
string similarity for inflectional/orthographical variants
(candidates with same POS shape and same length):
rule-based correction of lemmatisation errors
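The string-similarity step can be illustrated with a minimal sketch. It assumes that candidates arrive as (term, POS shape) pairs, that “same length“ means the same number of tokens, and that a difflib ratio of 0.9 is a sensible cut-off; none of these details come from the slides.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib ratio, case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def variant_pairs(candidates, threshold=0.9):
    """Pair candidates with the same POS shape and token count whose
    surface strings are nearly identical.

    `candidates` is a list of (term, pos_shape) tuples; the threshold is
    an arbitrary value chosen for this illustration.
    """
    pairs = []
    for (t1, p1), (t2, p2) in combinations(candidates, 2):
        same_shape = p1 == p2 and len(t1.split()) == len(t2.split())
        if same_shape and t1 != t2 and similarity(t1, t2) >= threshold:
            pairs.append((t1, t2))
    return pairs


candidates = [
    ("conservation de produit", "N P N"),
    ("conservation de produits", "N P N"),
    ("fixation de l'azote", "N P N"),
]
print(variant_pairs(candidates))
```

Pairs found this way could then be reviewed or merged by rule, for instance to correct lemmatisation errors.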
16. Current trends in research
Variant recognition given a set of candidate terms:
term variation patterns for rule-based variant
recognition
(Weller et al. 2011)
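A rule-based pattern of this kind might look as follows. This is only a sketch: the single rule (the permutation “air pressure“ vs. “pressure of the air“) and the function names are hypothetical, and a real system would operate on POS-tagged lemmas rather than raw strings.

```python
import re

# Hypothetical rewrite rules: a pattern over the canonical term and the
# variant templates it licenses.
RULES = [
    # permutation: "air pressure" -> "pressure of air" / "pressure of the air"
    (re.compile(r"^(\w+) (\w+)$"), [r"\2 of \1", r"\2 of the \1"]),
]


def generate_variants(term):
    """Apply every matching rule and collect the generated variant strings."""
    variants = set()
    for pattern, templates in RULES:
        match = pattern.match(term)
        if match:
            variants.update(match.expand(template) for template in templates)
    return variants


def find_variants(term, candidates):
    """Return the generated variants that are attested in the candidate list."""
    return sorted(generate_variants(term) & set(candidates))


print(find_variants("air pressure", ["pressure of the air", "air flow"]))
```

Generating variants from the canonical term and intersecting them with the candidate list keeps the rules declarative; the alternative is to parse each candidate and normalise it back to a canonical form.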
17. Current trends in research
Culture-specific semantic differences
Terminology considers specialised concepts to be
universal across languages
For general language, this view is outdated (pragmatics,
text linguistics, cultural differences etc.)
But also for LSP, things are not that easy
18. Current trends in research
Culture-specific semantic differences
Schmitt (1999) mentions different types of semantic
differences on the CONCEPTUAL level, e.g.
culture-dependent differences between conceptual
hierarchies
culture-dependent semantic prototypes
19. Current trends in research
Culture-specific semantic differences
culture-dependent differences between conceptual
hierarchies
e.g. different concept systems for steel in Germany and the
USA
“Primary coolant system interconnecting piping is carbon steel
with internal austenitic stainless steel weld deposit cladding.“
carbon steel = Kohlenstoffstahl?
20. Current trends in research
carbon steel = Baustahl
(+ term variation …)
“Most dictionaries fail to provide
accurate descriptions, especially in
problematic cases …“
(Schmitt 1999: 219, my translation from
German)
21. Current trends in research
Culture-specific semantic differences
culture-dependent semantic prototypes
• typical “German“ hammer:
nr. 1 (second from left)
• typical hammer in UK and
US: nr. 4 (first from right)
-> complicated translation
strategies, e. g.
• insertion of a functional
equivalent
• insertion of semantic markup (“In the US, the hammer
typically used is the …“)
• adaptation of drawings etc.
22. Current trends in research
Culture-specific semantic differences
culture-dependent semantic prototypes
“Apply the parking brake firmly. Shift the automatic transaxle to
Park (or manual transaxle to Neutral).“
->
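„Handbremse fest anziehen. Schalthebel in Leerlaufstellung
bringen (bei Automatikgetriebe Wählhebel in Stellung P bringen).“
[back-translation: “Apply the handbrake firmly. Move the gear lever to neutral
(for automatic transmission, move the selector lever to position P).“]
(Schmitt 1999: 255)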
23. Current trends in research
Intermediate summary
Translation is a knowledge-based activity involving deep
semantic analysis, functional adaptation and the creation of
discursive cohesion.
These issues affect terminological choices.
Detailed terminological descriptions are needed
to cope with lexical issues (term variation),
to constrain terminological (semantic) and, consequently,
translational choices.
The quality of a translation is a matter of functional adequacy (usability
in the target system and language and the intended context) rather
than linguistic (surface or structural) or even semantic similarity (skopos
theory).
24. Current trends in research
Intermediate summary: some research questions
How to improve (or adapt) NLP techniques (lemmatisation,
spelling correction/variant detection, compound splitting) for
specialised domains?
How can we identify term variants and map them to their
“canonical“ counterparts?
Can we use term variants for making (automatic) translation
or any other NLP task more fluent?
To what degree are variants detected by TM systems and can
we improve on that?
How can we provide richer semantic descriptions for terms?
25. Current trends in research
Definitions, contexts, knowledge-rich contexts
(ISOCat)
26. Current trends in research
Definitions, contexts, knowledge-rich contexts
Definitions are traditional parts of lexicographic entries
and were “inherited“ by terminology (but few resources
really provide them).
There are different kinds of definitions and different
ways of using them.
Lexicographic definitions explain lexical meanings
whereas terminographic definitions describe concepts.
Terminography normally requires richer descriptions
than standard definitions.
27. Current trends in research
Definitions, contexts, knowledge-rich contexts
Examples of lexicographic definitions
Linguistics: The scientific study of language
Categorical: Of or belonging to the categories.
- Usually not a complete sentence
- Often only with reduced information (certainly not enough
for learning the concept)
- Direct reference to specific lexical units
28. Current trends in research
Definitions, contexts, knowledge-rich contexts
Terminological definitions
Definition types
Intensional definition (“intension“ of the concept)
relate the concept to its hypernym (class of objects, “genus proximum“)
state how it differs from other hyponyms of the genus proximum
(“differentia specifica“)
“A definition which describes the intension of a concept by stating the
superordinate concept and the delimiting characteristics.“ (ISO 12620, ISOCat)
Extensional definition (“extension“ of the concept; Wüster:
“Umfangsdefinition“)
enumerate all objects that fall under the category in question
“A description of a concept by enumerating all of its subordinate concepts
under one criterion of subdivision.“ (ISO 12620, ISOCat)
29. Current trends in research
Definitions, contexts, knowledge-rich contexts
Terminological definitions
Examples
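Extensional:
“The planets of the solar system are Mercury, Venus, Earth, Mars, Jupiter,
Saturn, Uranus, Neptune and Pluto.“
(Bessé: „Terminological Definitions“. In Wright/Budin 1997, pp. 63-74)
Intensional:
„Defektivum. Wort, das im Vergleich zu anderen Vertretern seiner Klasse
‚defekt‘ ist in bezug (sic!) auf seine grammatische Verwendung, z. B. bestimmte
Adjektive wie hiesig, dortig, mutmaßlich, die nur attributiv verwendet werden
können.“
(Bußmann: Lexikon der Sprachwissenschaft)
[English: “Defective word. A word that, compared with other members of its
class, is ‘defective‘ with respect to its grammatical use, e.g. certain
adjectives such as hiesig, dortig, mutmaßlich, which can only be used
attributively.“]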
Many other classifications, see e.g. Cramer 2011
30. Current trends in research
Definitions, contexts, knowledge-rich contexts
Context
Standard category in terminological entries
Important, but under-specified
Context as usage example, e. g. „Photosynthesis takes place primarily in
plant leaves, and little to none occurs in stems, etc.”
-> can provide linguistic information (selectional preferences,
collocates)
Context as semantic description, e. g. „The parts of a typical leaf include
the upper and lower epidermis, the mesophyll, the vascular bundle(s)
(veins), and the stomates.”
-> provide semantic information, including information about conceptual
relations
(examples from IATE)
31. Current trends in research
Definitions, contexts, knowledge-rich contexts
Knowledge-rich contexts (KRCs, e.g. Meyer 2001)
My take on KRCs
Sentences that provide relevant bits and pieces of information (subject to
the definition of relevant semantic relations) that, taken together, can be
used for building rich semantic descriptions.
(Intensional or extensional) definitions are subtypes of KRCs.
There is much more information in texts than just restricted types of
definitions.
Annotating KRCs in corpora is hard
Which is the domain?
Which is the definiendum?
Which semantic relations are relevant for (generic or domain-specific)
terminological descriptions?
Annotators prefer Aristotelian statements and are biased by the presence or
absence of domain knowledge (Cramer 2011, Schumann 2013).
Research results for different languages mentioned in references section
32. Current trends in research
Usability aspects
How to support terminological workflows?
For which groups of language workers is terminology
relevant?
What kind of information do they look for?
Which kinds of software and formats do they use?
Survey (1782 respondents) conducted within the TAAS
project (http://www.taas-project.eu/)
information and graphics provided by KD Schmitz
38. Current trends in research
Intermediate summary
The needs of language workers are rather clear (tools, data
formats, time constraints, information needs, …).
Rich terminological descriptions are needed.
Semantic (conceptual) information seems to be more
important than linguistic information (a point for Wüster).
However, some linguistic issues need to be handled.
Almost all terminological resources are deficient in the most
important types of information (semantic information).
39. Term extraction and term mapping
Term extraction
Standard approach (for European languages)
POS filtering
Statistical filtering against a reference corpus
(filtering against stop list, frequency threshold)
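The POS, stop-list and frequency filters can be sketched in a few lines. The sketch assumes input that is already tokenised and POS-tagged with coarse tags, a hypothetical stop list, and the pattern (ADJ|NOUN)* NOUN as the only admissible term shape; the comparison against a reference corpus is left out here.

```python
from collections import Counter

STOP_WORDS = {"thing", "way"}  # hypothetical stop list


def is_term_shape(tags):
    """Accept (ADJ|NOUN)* NOUN sequences."""
    return bool(tags) and tags[-1] == "NOUN" and all(t in {"ADJ", "NOUN"} for t in tags)


def extract_candidates(tagged_sentences, max_len=3, min_freq=2):
    """Count all spans of up to `max_len` tokens that pass the POS filter
    and the stop list, then apply a frequency threshold."""
    counts = Counter()
    for sentence in tagged_sentences:
        for start in range(len(sentence)):
            for end in range(start + 1, min(start + max_len, len(sentence)) + 1):
                span = sentence[start:end]
                words = tuple(word.lower() for word, _ in span)
                tags = [tag for _, tag in span]
                if is_term_shape(tags) and not STOP_WORDS & set(words):
                    counts[" ".join(words)] += 1
    return {term: freq for term, freq in counts.items() if freq >= min_freq}


sentences = [
    [("The", "DET"), ("wind", "NOUN"), ("turbine", "NOUN"), ("rotates", "VERB")],
    [("A", "DET"), ("wind", "NOUN"), ("turbine", "NOUN"), ("is", "VERB"),
     ("a", "DET"), ("thing", "NOUN")],
]
print(extract_candidates(sentences))
```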
40. Term extraction and term mapping
Term extraction
Statistical scores, e.g.
Tf.idf (cf. Manning/Schütze 1999: 543)
C-value (Frantzi et al. 2000), and many others …
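As an illustration of one such score, here is a simplified sketch of C-value. It assumes the commonly cited form: log2 of the candidate length times its frequency, where the frequency of a nested candidate is reduced by the mean frequency of the longer candidates that contain it. The bookkeeping of the original algorithm is more involved, so this should be read as an approximation.

```python
import math


def contains(longer, shorter):
    """True if token tuple `shorter` occurs contiguously inside `longer`."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def c_value(freqs):
    """Simplified C-value; `freqs` maps candidate strings to corpus frequencies."""
    tokens = {term: tuple(term.split()) for term in freqs}
    scores = {}
    for term, toks in tokens.items():
        # frequencies of all longer candidates in which this one is nested
        nesting = [freqs[other] for other, o in tokens.items() if contains(o, toks)]
        freq = freqs[term]
        if nesting:
            freq -= sum(nesting) / len(nesting)
        scores[term] = math.log2(len(toks)) * freq
    return scores


freqs = {"wind turbine": 10, "offshore wind turbine": 4, "wind turbine blade": 2}
print(c_value(freqs))
```

Because of the log2 length factor, single-word candidates always score 0 in this form; the measure was designed for multi-word terms.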
41. Term extraction and term mapping
Term extraction
Statistical scores
Zhang et al. (2008) distinguish
unithood measures (mutual information, log-likelihood, t-test
etc.)
termhood measures (tf.idf, weirdness, domain pertinence,
domain specificity)
Combined methods (e.g. C-value)
They compare several methods
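The difference between the two families can be made concrete with one measure from each: pointwise mutual information for unithood and weirdness for termhood. This is a sketch under assumptions: the add-one smoothing of the reference counts is a choice made here to avoid division by zero, not something taken from Zhang et al.

```python
import math


def pmi(pair_count, w1_count, w2_count, total):
    """Pointwise mutual information of a two-word candidate (unithood):
    how much more often the words co-occur than chance would predict."""
    p_pair = pair_count / total
    return math.log2(p_pair / ((w1_count / total) * (w2_count / total)))


def weirdness(domain_count, domain_size, ref_count, ref_size, smoothing=1.0):
    """Ratio of relative frequencies in the domain corpus and in a
    general-language reference corpus (termhood)."""
    domain_rel = domain_count / domain_size
    ref_rel = (ref_count + smoothing) / (ref_size + smoothing)
    return domain_rel / ref_rel


print(pmi(8, 16, 32, 1024))             # strength of association between the parts
print(weirdness(50, 1000, 4, 99999))    # domain specificity of the whole unit
```

Unithood asks whether the words form a stable unit at all; termhood asks whether that unit belongs to the domain. A candidate can score high on one and low on the other, which is why combined methods exist.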
42. Term extraction and term mapping
Term extraction
TermExtractor (Sclano and Velardi 2007) combines
several approaches
Domain pertinence, where D_i is the domain of interest and
D_j is a document in another domain
Domain consensus, where norm_freq is a normalised
frequency in a domain-specific document
43. Term extraction and term mapping
Term extraction
TermExtractor (Sclano and Velardi 2007) combines
several approaches
Lexical cohesion, where n is the number of words
composing a candidate and w_j a word in the candidate
The final score is a linear combination of the three scores
Information about structural mark-up + a set of heuristics
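The formulas themselves are not reproduced in this text. The sketch below follows the forms usually cited for TermExtractor and should be checked against Sclano and Velardi (2007): pertinence as freq(t, D_i) / max_j freq(t, D_j), consensus as the entropy of the normalised per-document frequencies, and cohesion as n * freq(t, D_i) * log freq(t, D_i) / sum_j freq(w_j, D_i). The natural logarithm, the inclusion of D_i in the maximum and the equal weights are assumptions made here.

```python
import math


def domain_pertinence(freq_in_domain, freqs_in_other_domains):
    """freq(t, D_i) divided by the highest frequency of t in any domain.
    D_i is included in the maximum (an assumption) so the score stays in [0, 1]."""
    return freq_in_domain / max([freq_in_domain, *freqs_in_other_domains])


def domain_consensus(doc_freqs):
    """Entropy of the term's distribution over the documents of the domain:
    high when the term is spread evenly, 0 when it sits in one document."""
    total = sum(doc_freqs)
    probs = [f / total for f in doc_freqs if f > 0]
    return -sum(p * math.log(p) for p in probs)


def lexical_cohesion(term_freq, word_freqs):
    """n * freq(t) * log freq(t) / sum of the frequencies of the n words in t."""
    n = len(word_freqs)
    return n * term_freq * math.log(term_freq) / sum(word_freqs)


def combined_score(dr, dc, lc, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Linear combination of the three scores; the weights are hypothetical."""
    a, b, c = weights
    return a * dr + b * dc + c * lc


dr = domain_pertinence(20, [5, 2])
dc = domain_consensus([4, 4, 4, 4])
lc = lexical_cohesion(10, [12, 15])
print(combined_score(dr, dc, lc))
```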
44. Term extraction and term mapping
Term extraction
Nazar and Cabré (2012) present a supervised learning
approach to term extraction
Input
A POS-tagged list of domain terms
A reference corpus of general language
45. Term extraction and term mapping
Term extraction
Nazar and Cabré (2012) present a supervised learning
approach to term extraction
Algorithm
Calculate frequency distribution of POS sequences
Calculate frequency distribution of lexical units (word forms and
lemmas)
Calculate character ngrams for each word type
Accept, in the test data, only candidates with frequent POS
patterns
Rank candidates with frequent features higher than others
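The algorithm steps can be sketched roughly as follows. This is not the authors' implementation: the class name, the trigram size, the unweighted sum used for ranking and the pattern threshold are all assumptions, and the contrast with the general-language reference corpus is omitted.

```python
from collections import Counter


def char_ngrams(word, n=3):
    """Character n-grams of a word, with '#' marking the word boundaries."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class TermModel:
    """Frequency distributions learned from a POS-tagged list of known terms."""

    def __init__(self, tagged_terms, min_pattern_freq=2):
        # tagged_terms: list of terms, each a list of (word, pos) pairs
        self.patterns = Counter(tuple(p for _, p in t) for t in tagged_terms)
        self.words = Counter(w.lower() for t in tagged_terms for w, _ in t)
        self.ngrams = Counter(
            g for t in tagged_terms for w, _ in t for g in char_ngrams(w)
        )
        self.min_pattern_freq = min_pattern_freq

    def accepts(self, candidate):
        """Keep only candidates whose POS sequence is frequent among known terms."""
        return self.patterns[tuple(p for _, p in candidate)] >= self.min_pattern_freq

    def score(self, candidate):
        """Sum of the training frequencies of the candidate's words and n-grams."""
        word_score = sum(self.words[w.lower()] for w, _ in candidate)
        ngram_score = sum(self.ngrams[g] for w, _ in candidate for g in char_ngrams(w))
        return word_score + ngram_score

    def rank(self, candidates):
        kept = [c for c in candidates if self.accepts(c)]
        return sorted(kept, key=self.score, reverse=True)


training = [
    [("protein", "N")], [("glycoprotein", "N")],
    [("plant", "N"), ("protein", "N")],
    [("amino", "A"), ("acid", "N")], [("nucleic", "A"), ("acid", "N")],
]
test_candidates = [[("table", "N")], [("lipoprotein", "N")], [("quickly", "ADV")]]
model = TermModel(training)
print(model.rank(test_candidates))
```

In this toy run, “lipoprotein“ outranks “table“ because it shares character trigrams with the training terms, while “quickly“ is rejected for its unseen POS pattern.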
46. Term extraction and term mapping
Term alignment
Extract term candidates from comparable multilingual
corpora and map SL terms onto TL terms
Weller et al. (2011) deal only with neoclassical terms
(internationalisms)
Detect candidate equivalents using string similarity
Decompose SL candidates into morphemes (rule-based) and
translate morphemes into TL
For compounds, split the compound first
Check against TL candidate list
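The decompose-translate-check idea for neoclassical terms can be illustrated with a toy example. The German-to-English morpheme table and the greedy longest-match segmentation are hypothetical stand-ins for the rule-based decomposition described by Weller et al.; compound splitting is not shown.

```python
# Hypothetical German -> English morpheme dictionary.
MORPHEMES = {
    "kardio": "cardio",
    "logie": "logy",
    "hydro": "hydro",
    "graphie": "graphy",
}


def decompose(term):
    """Greedy longest-match segmentation into known morphemes.
    Returns None if some part of the term cannot be covered."""
    parts, rest = [], term.lower()
    while rest:
        match = max((m for m in MORPHEMES if rest.startswith(m)), key=len, default=None)
        if match is None:
            return None
        parts.append(match)
        rest = rest[len(match):]
    return parts


def align(source_term, target_candidates):
    """Translate morpheme by morpheme and accept the result only if it is
    attested in the target-language candidate list."""
    parts = decompose(source_term)
    if parts is None:
        return None
    translation = "".join(MORPHEMES[p] for p in parts)
    return translation if translation in target_candidates else None


target_candidates = {"cardiology", "hydrography", "wind turbine"}
print(align("Kardiologie", target_candidates))
```

The final check against the target-language list is what keeps the method from proposing well-formed but unattested words.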
47. Term extraction and term mapping
Term alignment
Pinnis (2013) presents a context-independent (knowledge-poor) method for term mapping
Pre-processing
Lowercase candidate terms
Apply simple transliteration rules for converting from other scripts
to Latin
Find top N translation equivalents from a probabilistic dictionary
Find top M transliteration equivalents using Moses character-based
MT
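A minimal sketch of the first three pre-processing steps, assuming a tiny hand-made Cyrillic-to-Latin table and a toy probabilistic dictionary (both hypothetical). The character-based Moses step is not reproduced.

```python
# Hypothetical transliteration rules (Cyrillic -> Latin), far from complete.
CYRILLIC_TO_LATIN = str.maketrans(
    {"т": "t", "е": "e", "р": "r", "м": "m", "и": "i", "н": "n", "а": "a", "л": "l"}
)

# Hypothetical probabilistic dictionary: token -> [(translation, probability)].
PROB_DICT = {"термин": [("term", 0.7), ("notion", 0.1), ("terminus", 0.2)]}


def preprocess(term, top_n=2):
    """Lowercase the term, transliterate each token and look up its
    top-N dictionary translations."""
    processed = []
    for token in term.lower().split():
        entries = sorted(PROB_DICT.get(token, []), key=lambda e: e[1], reverse=True)
        processed.append({
            "token": token,
            "translit": token.translate(CYRILLIC_TO_LATIN),
            "translations": [word for word, _ in entries[:top_n]],
        })
    return processed


print(preprocess("Термин"))
```

Each token ends up with several Latin-script representations, which gives the later string comparison more chances to find an overlap.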
48. Term extraction and term mapping
Term alignment
Pinnis (2013) presents a context-independent (resource- and knowledge-poor) method for term mapping
Example of pre-processed terms
49. Term extraction and term mapping
Term alignment
Pinnis (2013) presents a context-independent (resource- and knowledge-poor) method for term mapping
Mapping
For each token in each pre-processed term, find the longest
common substring in all other terms‘ constituents
Otherwise, fall back on a Levenshtein-based similarity metric
Maximise overlaps and score them
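The mapping step can be approximated as below. The 0.7 overlap threshold, the normalisation by the longer token and the averaging over source tokens are assumptions made for this sketch; the scoring in Pinnis (2013) is more elaborate.

```python
from difflib import SequenceMatcher


def lcs_ratio(a, b):
    """Length of the longest common substring relative to the longer token."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / max(len(a), len(b))


def levenshtein(a, b):
    """Classic edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def token_similarity(a, b, min_lcs=0.7):
    """Use the substring overlap if it is large enough,
    otherwise fall back on a Levenshtein-based similarity."""
    overlap = lcs_ratio(a, b)
    if overlap >= min_lcs:
        return overlap
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def term_similarity(src_tokens, tgt_tokens):
    """Average, over the source tokens, of the best match among the target tokens."""
    best = [max(token_similarity(s, t) for t in tgt_tokens) for s in src_tokens]
    return sum(best) / len(best)


print(term_similarity(["termin"], ["term", "terminal"]))
```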
50. Conclusion of the session
To sum up: You have learned about
The role of terminology in translation and LSP
The theoretical foundations of the discipline
The structure, parts and basic principles of terminological
entries
Other kinds of onomasiological resources
Some journals, conferences and other resources
The importance of terminological variation and methods for
finding term variants
Semantic differences between concepts/terms that cannot yet be
tackled automatically
51. Conclusion of the session
To sum up: You have learned about (continued)
Terminological definitions, contexts and knowledge-rich
contexts
The need for rich terminological representations and
approaches for providing them
Some practical aspects of terminological workflows
Knowledge-rich and knowledge-poor approaches to
term extraction and term mapping
52. References: Literature
Bessé, Bruno de (1997): “Terminological definitions“. Wright, Sue Ellen / Budin, Gerhard
(eds.): Handbook of Terminology Management. Vol. 1: Basic Aspects of Terminology
Management. Amsterdam/Philadelphia: John Benjamins, pp. 63-74.
Bußmann, Hadumod (1990): Lexikon der Sprachwissenschaft. Stuttgart: Kröner.
Cabré, M. Teresa (1998): “Do we need an autonomous theory of terms?“. Terminology 5
(1), pp. 5-19.
Cramer, Irene (2011): Definitionen in Wörterbuch und Text: Zur manuellen Annotation,
korpusgestützten Analyse und automatischen Extraktion definitorischer Textsegmente im
Kontext der computergestützten Lexikographie. PhD dissertation, University of
Dortmund, Germany.
Collet, Tanja (2004): “What’s a term? An attempt to define the term within the
theoretical framework of text linguistics”. Linguistica Antverpiensia 3, pp. 99-111.
Daille, Béatrice (2005): “Variations and application-oriented terminology engineering“.
Terminology 11 (1), pp. 181-197.
Daille, Béatrice / Habert, Benoît / Jacquemin, Christian / Royauté, Jean (1996): “Empirical
observation of term variations and principles for their description“. Terminology 3 (2),
pp. 197-257.
53. References: Literature
Del Gaudio, Rosa / Branco, Antonio (2007): “Automatic Extraction of Definitions in
Portuguese: A Rule-Based Approach“. Neves, José / Santos, Manuel Filipe / Machado,
José Manuel (eds): Progress in Artificial Intelligence. Berlin/Heidelberg: Springer, pp. 659-670.
Fahmi, Ismail / Bouma, Gosse (2006): “Learning to Identify Definitions using Syntactic
Features“. Workshop on Learning Structured Information in Natural Language
Applications at EACL 2006, Trento, Italy, April 3, pp. 64-71.
Fišer, Darja / Pollak, Senja / Vintar, Špela (2010): “Learning to Mine Definitions from
Slovene Structured and Unstructured Knowledge-Rich Resources“. LREC 2010, Valletta,
Malta, May 19-21, pp. 2932-2936.
Frantzi, Katerina / Ananiadou, Sophia / Mima, Hideki (2000): “Automatic Recognition of
Multi-Word Terms: the C-value/NC-value Method“. International Journal on Digital
Libraries 3 (2), pp. 115-130.
Freixa, Judit (2006): “Causes of denominative variation in terminology. A typology
proposal”. Terminology 12 (1), pp. 51-77.
Geeraerts, Dirk (2010): Theories of Lexical Semantics. Oxford: Oxford University Press.
54. References: Literature
Manning, Christopher D. / Schütze, Hinrich (1999): Foundations of statistical natural
language processing. Cambridge: MIT Press.
Meyer, Ingrid (2001): “Extracting Knowledge-Rich Contexts for Terminography: A
conceptual and methodological framework”. Bourigault, Didier / Jacquemin, Christian /
L’Homme, Marie-Claude (eds.): Recent Advances in Computational Terminology.
Amsterdam/Philadelphia: John Benjamins, pp. 279-302.
Malaisé, Véronique / Zweigenbaum, Pierre / Bachimont, Bruno (2005): “Mining defining
contexts to help structuring differential ontologies”. Terminology 11 (1), pp. 21-53.
Marshman, Elizabeth (2008): “Expressions of uncertainty in candidate knowledge-rich
contexts”. Terminology 14 (1), pp. 124-151.
Muresan, Smaranda / Klavans, Judith (2002): “A Method for Automatically Building and
Evaluating Dictionary Resources”. LREC 2002, Las Palmas, Spain, May 29-31, pp. 231-234.
Nazar, Rogelio / Cabré, Maria Teresa (2012): “Supervised Learning Algorithms Applied to
Terminology Extraction“. TKE 2012, Madrid, Spain, June 19-22, pp. 209-217.
Pearson, Jennifer (1998): Terms in Context. Amsterdam/Philadelphia: John Benjamins.
Pinnis, Mārcis (2013): “Context Independent Term Mapper for European Languages“.
RANLP 2013, Hissar, Bulgaria, September 7-13, pp. 562-570.
55. References: Literature
Przepiórkowski, Adam / Degórski, Łukasz / Spousta, Miroslav / Simov, Kiril / Osenova,
Petya / Lemnitzer, Lothar / Kuboň, Vladislav / Wójtowicz, Beata (2007): “Towards the
Automatic Extraction of Definitions in Slavic“. BSNLP workshop at ACL 2007, Prague,
Czech Republic, June 29, pp. 43-50.
Sclano, Francesco / Velardi, Paola (2007): “TermExtractor: a Web Application to Learn
the Shared Terminology of Emergent Web Communities“. TIA 2007, Sophia Antipolis,
France, October 8-9.
Schmitt, Peter A. (1999): Translation und Technik. Tübingen: Stauffenburg.
Schumann, Anne-Kathrin (2013): “Collection, Annotation and Analysis of Gold Standard
Corpora for Knowledge-Rich Context Extraction in Russian and German“. Student
workshop at RANLP 2013, Hissar, Bulgaria, September 7-13, pp. 134-141.
Sierra, Gerardo / Alarcón, Rodrigo / Aguilar, César / Bach, Carme (2008): “Definitional
verbal patterns for semantic relation extraction”. Terminology 14 (1), pp. 74-98.
Storrer, Angelika / Wellinghoff, Sandra (2006): “Automated detection and annotation of
term definitions in German text corpora”. LREC 2006, Genoa, Italy, May 24-26, pp. 2373-2376.
56. References: Literature
Weller, Marion / Gojun, Anita / Heid, Ulrich / Daille, Béatrice / Harastani,
Rima (2011): “Simple methods for dealing with term variation and term
alignment“. TIA 2011, Paris, France, November 8-10, pp. 87-93.
Westerhout, Eline (2009): “Definition Extraction using Linguistic and
Structural Features“. First Workshop on Definition Extraction at RANLP
2009, Borovets, Bulgaria, September 14-16, pp. 61-67.
Wüster, Eugen (1985): Einführung in die Allgemeine Terminologielehre und
terminologische Lexikographie. 2nd edition. Wien: Infoterm.
Zhang, Ziqi / Iria, José / Brewster, Christopher / Ciravegna, Fabio (2008):
“A Comparative Evaluation of Term Recognition Algorithms“. LREC 2008,
Marrakech, Morocco, May 28-30, pp. 2108-2113.
58. Contributions to this Presentation
Prof. Klaus-Dirk Schmitz, Cologne University of Applied Sciences
Thanks to Dr. Alessandro Cattelan for backing me up!