2010 CRC PhD Student Conference
Presupposition Analysis in Requirements
Lin Ma
l.ma@open.ac.uk
Supervisors: Prof. Bashar Nuseibeh, Prof. Anne De Roeck, Dr. Paul Piwek, Dr. Alistair Willis
Department/Institute: Department of Computing
Status: Full-time
Probation viva: After
Starting date: 1-Feb-2009
Motivation
Natural language is the most commonly used representation language in requirements
engineering [1]. However, compared with formal logics, natural language is
inherently ambiguous and lacks a formal semantics [2]. Communicating requirements
perfectly through natural language is thus not easy. Examining the linguistic
phenomena in natural language requirements can help with decoding what a person
means in communication. This method was originally used in psychotherapy and then
adopted in requirements engineering [3]. Presupposition is one of these linguistic
phenomena. It simplifies communication by referring to pieces of
knowledge that the document writer takes for granted. In requirements
engineering, however, we must know exactly what information we’ve lost by
simplification, or we run the risk of a misunderstanding. For instance, the requirement
(1) Accessibility in the experimental hall is required for changing the piggy board
where the device will be mounted.
commits the reader to the presuppositions that there is an experimental hall, there is a
piggy board and there is a device. Implicit commitments of this kind might be
misinterpreted or overlooked because stakeholders from other domains bring
different background knowledge. More precisely, concerning the presupposition that
there is a piggy board in example (1), the reader of this requirement may know a
piggy board A and choose to believe A is the thing that the document writer is writing
about. However, the document writer may mean piggy board B or just any new piggy
board. In this research, we propose to use natural language processing techniques for
automatically detecting such implicit commitments in requirements documents, and
identifying which of those are not made explicit.
Background
Presuppositions are triggered by certain types of syntactic structures – presupposition
triggers [4]. Therefore, presuppositions can be found by identifying the triggers in the
text. The presupposition trigger types can be divided into two general classes –
definite descriptions (noun phrases starting with the definite article the, such as the
piggy board in example (1)) and other trigger types (for example, clefts: It + be + noun +
subordinate clause; stressed constituents: words in italics in texts). Definite
descriptions differ from other trigger types because they occur very frequently in all
styles of natural language [5], are easy to retrieve (because of their distinct structure
with the determiner the) and they often have possible referential relations with earlier
text [6]. We hence focus on presuppositions triggered by definite descriptions in this
research.
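The observation that definite descriptions are easy to retrieve can be illustrated with a minimal sketch. It assumes a naive, parser-free strategy: a candidate starts at the word "the" and runs until punctuation or a word from a small stop list. The function name find_definite_descriptions and the stop list are hypothetical and are not part of the approach described here; a real system would more likely rely on a noun phrase chunker or parser.

```python
import re

# Hypothetical stop list: words that end a simple noun phrase in this sketch.
STOP_WORDS = {
    "is", "are", "was", "be", "will", "shall", "in", "on", "for", "of",
    "to", "by", "with", "where", "that", "which", "and", "or", "if",
}


def find_definite_descriptions(text):
    """Return candidate phrases that start with 'the' (naive, no parsing)."""
    # Words and single punctuation marks become separate tokens.
    tokens = re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text)
    found = []
    i = 0
    while i < len(tokens):
        if tokens[i].lower() == "the":
            j = i + 1
            # Extend the phrase until punctuation, a stop word, or another 'the'.
            while (
                j < len(tokens)
                and tokens[j].isalpha()
                and tokens[j].lower() not in STOP_WORDS
                and tokens[j].lower() != "the"
            ):
                j += 1
            if j > i + 1:
                found.append(" ".join(tokens[i:j]).lower())
            i = j
        else:
            i += 1
    return found


requirement = (
    "Accessibility in the experimental hall is required for changing "
    "the piggy board where the device will be mounted."
)
print(find_definite_descriptions(requirement))
```

Applied to requirement (1), the sketch returns the three definite descriptions discussed in the Motivation section. It will over- or under-extend phrases in less regular sentences, which is why it should be read only as an illustration.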
One major problem in the study of presupposition is presupposition projection. An
elementary presupposition is a presupposition of part of an utterance. Presupposition
projection, as the name suggests, is the study of whether an elementary presupposition
is a presupposition of the whole utterance (termed an actual presupposition). Here two
examples are given for distinct scenarios in requirements, one where an elementary
presupposition projects out and one where it does not:
(2) a. If funds are inadequate, the system will notify….
b. If there is a system, the system will notify…
Intuitively, when a reader accepts utterance (2b), he/she does not take the
presupposition that there is a system for granted. The elementary presupposition that
there is a system in the consequent of the conditional somehow does not project. The
same elementary presupposition that there is a system nevertheless projects out in
example (2a), which signals to the reader that the document writer takes for granted
that there is a system.
Methodology
The Binding Theory [7] of presupposition is a widely accepted formal framework for
modelling presupposition, in which presupposition is viewed as anaphora (an anaphor
is an expression, such as a pronoun, whose interpretation depends on a
preceding expression, i.e., an antecedent). Presupposition projection is treated as
looking for a path to an earlier part of the discourse which hosts an antecedent that
can bind the presupposition. Whenever an antecedent is found in the discourse, the
presupposition is bound, and thus does not project out. Therefore, according to the
Binding Theory, the actual presuppositions in a discourse are those which do not have
any antecedent existing earlier in the discourse. We adopt this view as our theoretical
foundation.
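The binding view of projection can be sketched in a few lines. The sketch below is a strong simplification and not the formal theory of [7]: it assumes the discourse has already been reduced to an ordered list of mentions, and that an antecedent is any earlier mention with the same head noun. The function name actual_presuppositions and the (is_definite, head) representation are hypothetical.

```python
def actual_presuppositions(mentions):
    """Return the presuppositions that project out of a discourse.

    mentions: list of (is_definite, head_noun) pairs in discourse order.
    Assumption: an antecedent is an earlier mention with the same head noun.
    """
    seen = set()
    projected = []
    for is_definite, head in mentions:
        if is_definite and head not in seen:
            # No antecedent earlier in the discourse: the presupposition projects.
            projected.append(f"there is a {head}")
        # Every mention becomes a possible antecedent for later ones.
        seen.add(head)
    return projected


# (2a) If funds are inadequate, the system will notify...
example_2a = [(False, "funds"), (True, "system")]
# (2b) If there is a system, the system will notify...
example_2b = [(False, "system"), (True, "system")]

print(actual_presuppositions(example_2a))
print(actual_presuppositions(example_2b))
```

Under these assumptions, the presupposition that there is a system projects in (2a) and is bound in (2b), matching the intuition described in the Background section.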
[8] presents an automated approach for classifying definite descriptions. This
approach is compatible with the Binding Theory. It classifies definite descriptions as:
Discourse new: those that are independent from previous discourse elements for
the description interpretation (according to the Binding Theory, discourse new
definite descriptions introduce actual presuppositions with respect to a discourse,
because they do not have any antecedent);
Anaphoric: those that have co-referential antecedents (footnote 1) in the
previous discourse, where co-reference means that multiple expressions in a
sentence or document have the same referent;
Bridging [9]: those that either (i) have an antecedent denoting the same discourse
entity, but using a different head noun (e.g. a house . . . the building), or (ii) are
related by a relation other than identity to an entity already introduced in the
discourse (e.g. the part-of relation between memory…the buffer).
In example (3), “the experimental hall” has an antecedent in an earlier sentence,
“An experimental hall”, so it will be classified as anaphoric. If we know that
a piggy board is a small circuit board mounted on a larger board, “the
piggy board” is a bridging definite description referring to part of “PAD boards”.
Finally, “the device” is a discourse new definite description which triggers the actual
presupposition that there is a device with respect to the discourse.
(3) An experimental hall shall be built….
PAD boards shall be used….
Accessibility in the experimental hall is required for changing the piggy board
where the device will be mounted.
In [8], the authors used a set of heuristics based on an empirical study of definite
descriptions [6] for performing the classification task. The heuristics include, for
example:
For discourse new definite descriptions: one of the heuristics is to examine a list
of special predicates (e.g. fact). If the head noun of the definite description
appears in the list, it is classified as discourse new.
For anaphoric definite descriptions: the head noun and modifiers are matched against
earlier noun phrases. If there is a match, the description is classified as anaphoric. For
example, An experimental hall…the experimental hall.
For bridging: one of the heuristics is to use WordNet [10] for identifying relations
between head nouns and earlier noun phrases. If there is a relation, such as a
part-of relation, it is classified as bridging. For example, PAD boards…the piggy
board.
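The three example heuristics can be sketched as a toy classifier. This is an illustration under stated assumptions, not the system of [8]: the order in which the heuristics are tried, the normalise helper, and the hand-written PART_OF table (standing in for a WordNet lookup) are all hypothetical.

```python
# Special predicates whose presence as head noun signals discourse new (e.g. fact).
SPECIAL_PREDICATES = {"fact"}

# Hypothetical part-of knowledge standing in for a WordNet lookup.
PART_OF = {("piggy board", "pad board")}


def normalise(noun_phrase):
    """Lower-case, drop a leading article, and crudely strip a plural 's'."""
    words = noun_phrase.lower().split()
    if words and words[0] in {"a", "an", "the"}:
        words = words[1:]
    if words and words[-1].endswith("s") and not words[-1].endswith("ss"):
        words[-1] = words[-1][:-1]
    return " ".join(words)


def classify(definite_description, earlier_noun_phrases):
    """Classify a definite description as discourse new, anaphoric or bridging."""
    target = normalise(definite_description)
    earlier = [normalise(np) for np in earlier_noun_phrases]
    head = target.split()[-1] if target else ""
    # Heuristic 1: special predicates are discourse new.
    if head in SPECIAL_PREDICATES:
        return "discourse new"
    # Heuristic 2: same head noun and modifiers as an earlier noun phrase.
    if target in earlier:
        return "anaphoric"
    # Heuristic 3: a known part-of relation to an earlier noun phrase.
    if any((target, np) in PART_OF for np in earlier):
        return "bridging"
    # Default: no antecedent found, so the description is discourse new.
    return "discourse new"


earlier = ["An experimental hall", "PAD boards"]
for description in ["the experimental hall", "the piggy board", "the device"]:
    print(description, "->", classify(description, earlier))
```

On example (3) the sketch reproduces the classification given above: anaphoric, bridging and discourse new respectively. The bridging result depends entirely on the hand-written PART_OF entry, which mirrors the dependence of the real task on a suitable knowledge base.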
However, as stated by the authors of [8], this approach is insufficient to deal with
complex definite descriptions with modifiers and lacks a good knowledge base to
resolve the bridging definite descriptions (WordNet performed poorly in this
case). In this research, we will further develop this approach and implement a software
system that is able to analyze the projection behavior of presuppositions triggered by
definite descriptions in requirements documents. The development focus is on
analyzing modifiers of definite descriptions and making use of external knowledge
sources (such as ontologies built upon Wikipedia [11]) for resolving bridging definite
descriptions. Especially for bridging definite descriptions, if the relation can be
identified in the knowledge base, it will help with choosing between creating a
new discourse entity and picking up an existing antecedent. As a result, the actual
presuppositions (the discourse new definite descriptions) can be identified. The
system will be evaluated on existing corpora with annotated noun phrases, such
as the GNOME corpus [12]. We will also manually annotate several requirements
documents and perform the evaluation on the annotation results.
Footnote 1: In a strict sense, the concept of anaphora is different from co-reference because the former requires
the meaning of its antecedents to interpret, but the latter does not. Here they are used as synonyms, meaning that
multiple expressions in a sentence or document have the same referent.
References
[1] L. Mich and R. Garigliano, “NL-OOPS: A requirements analysis tool based on
natural language processing,” Proceedings of the 3rd International Conference
on Data Mining Methods and Databases for Engineering, Bologna, Italy: 2002.
[2] V. Gervasi and D. Zowghi, “Reasoning about inconsistencies in natural language
requirements,” ACM Transactions on Software Engineering and Methodology
(TOSEM), vol. 14, 2005, pp. 277–330.
[3] R. Goetz and C. Rupp, “Psychotherapy for system requirements,” Cognitive
Informatics, 2003. Proceedings. The Second IEEE International Conference on,
2003, pp. 75–80.
[4] S.C. Levinson, Pragmatics, Cambridge, UK: Cambridge University Press, 2000.
[5] J. Spenader, “Presuppositions in Spoken Discourse,” PhD thesis, Department of
Linguistics, Stockholm University, 2002.
[6] M. Poesio and R. Vieira, “A corpus-based investigation of definite description
use,” Computational Linguistics, vol. 24, 1998, pp. 183–216.
[7] R.A. Van der Sandt and B. Geurts, “Presupposition, anaphora, and lexical
content,” Text Understanding in LILOG, O. Herzog and C. Rollinger, Eds.,
Springer, 1991, pp. 259–296.
[8] R. Vieira and M. Poesio, “An empirically based system for processing definite
descriptions,” Computational Linguistics, vol. 26, 2000, pp. 539–593.
[9] H.H. Clark, “Bridging,” Thinking, 1977, pp. 411–420.
[10] C. Fellbaum, WordNet: An Electronic Lexical Database, Cambridge, MA: MIT
Press, 1998.
[11] M.C. Müller, M. Mieskes, and M. Strube, “Knowledge Sources for Bridging
Resolution in Multi-Party Dialog,” Proceedings of the 6th International
Conference on Language Resources and Evaluation, Marrakech, Morocco:
2008.
[12] M. Poesio, “Annotating a corpus to develop and evaluate discourse entity
realization algorithms: issues and preliminary results,” Proc. of the 2nd LREC,
2000, pp. 211–218.