The document discusses requirements and natural language in public administration. It describes the LearnPAd project, which aims to model public administration procedures using BPMN and allow describing them further with natural language. It discusses LearnPAd's requirements process, including identifying typical defects in natural language descriptions of procedures through interviews. It also presents an approach to detect pragmatic ambiguities in natural language requirements using collective intelligence by having different readers analyze and compare their interpretations of requirements based on their domain knowledge.
Public Administration, Laws, Requirements, Natural Language
1. Public Administration, Laws,
Requirements, Natural Language
Alessio Ferrari
alessio.ferrari@isti.cnr.it
ISTI-CNR, Pisa, Italy
Ferrari (ISTI-CNR) PA, Laws, Requirements, NL 1 / 45
2. Preliminaries
Who am I?
Alessio Ferrari, Ph.D. in Computer Engineering
Three years at GE Transportation Systems s.p.a. (Modelling and
Code Generation)
Three years at ISTI-CNR (Requirements Engineering and NLP)
Main interests: artificial intelligence, natural language
Content of this Talk
LearnPAd EU Project: model-based learning for Public
Administrations (www.learnpad.eu)
Requirements in LearnPAd
Natural language pragmatic ambiguities
5. LearnPAd Project
FP7-ICT-2013.8.2 European Project
Model-based learning in the Public Administration (PA) domain
IDEA 1: PA procedures can be modelled with Business Process
Model and Notation (BPMN)
IDEA 2: PA procedures can be enriched by civil servants with
Natural Language (NL) descriptions
10. EU Projects Peculiarities
number/distribution of partners: 9 partners, plenary discussion
difficult
culture: Italy, France, Switzerland, Austria, Lithuania, need to
meet/talk
industrial vs academic mindsets: 4 academic, 2 closed-source
companies, 2 open-source, 1 PA; industry partners more practical in RE
background: different domains and terminology
abstraction: focus on specific background leads to lack of
abstraction
age/roles: uneasiness of young vs old
objectives: requirements introduced to pursue specific interests
focus: the project is not the main activity of participants
What often happens...
Everyone develops their piece of the project → integration issues
12. KJ Sessions
Activity
24 people in 3 groups: Modelling, Learning, Quality
Description of the task by the moderator
Write requirements in cards
Discuss the requirements
Second session to add new requirements
People really excited and high degree of participation
Initial individual activity mitigated age/role effects and objective
discrepancies
Second session to align terminology
Moderators: with recognized authority, or external (not
representative of any group)
Still, most of the 249 requirements were poorly specified
13. Collaborative Refinement
Requirements uploaded in a Wiki platform (XWiki)
Justifications given and Refinements provided
People rather motivated (even if motivation was not perceived)
249 → 337 requirements
People do not contribute to the requirements of others
Still, requirements were poorly specified
A selected task force of project participants provided a set of 191
consolidated requirements
People directly asked to clarify their requirements
Excel sheets used for refinement and consolidation
14. Goal Modelling
Bottom-up goal model definition
From requirements to justifications (goals)
Provide higher degree of abstraction and spot out missing needs
Goal Models                      Stage 0     Stage 1     Stage 2     Stage 3     Score
                                 Reqs: -     R: 82       R: 78       R: 90
                                   G  S  E     G  S  E     G  S  E     G  S  E
Main                              24  4  3    32  4  5    24  4  4    24  4  4   H
Learning content accessed          -  -  -     9  0  1    32  4  1    47  5  4   H
Quality of WIKI Documents          -  -  -    17  0  2    17  0  2    17  0  2   M
Quality of BP Models               -  -  -    12  0  3    17  0  4    17  0  4   M
Learning support provided          -  -  -    13  0  1    17  1  1    17  1  1   H
BP Models edited                   -  -  -     -  -  -    15  0  2    15  0  2   M
BP Models reused                   -  -  -     -  -  -     8  0  0     8  0  0   M
Quality by logging                 -  -  -     -  -  -    15  0  0    15  0  0   M
Iterative definition of content    -  -  -     -  -  -    19  0  1    19  0  1   M
Platform flexibility enforced      -  -  -     -  -  -    11  0  0    11  0  0   H
Knowledge assessment               -  -  -     -  -  -     8  0  0    39  1  5   M
Procedural learning provided       -  -  -     -  -  -     -  -  -    24  2  1   L
TOTAL                             24  4  3    83  4 12   183  9 15   283 13 24
Table: Growth of the goal models at each stage. R = number of original
requirements. G = number of hard-goals and requirements. S = number of
soft-goals. E = number of expectations.
15. What have we learnt
People have to be trained about writing requirements
People from academia less confident in collaborative
requirements elicitation
Too few user requirements → involve users in separate meetings
Need for a web-moderator/leader to motivate collaborative
refinement
XWiki is good to get statistics on requirements
Goal modelling useful to have abstract view and spot out missing
needs but requires effort
Tooling not appropriate for goal modelling and sharing (we
preferred sharing with Google Docs but traceability was poor)
Integrated tools for the whole requirements process are missing
16. Improved Requirements Process
[Figure: improved requirements process. Diagram labels: KJ Sessions;
Collaborative Requirements Sessions (XWiki); Requirements Analysis;
Preliminary Requirements; Structured Requirements; Justifications;
Goal Model (Learning, Modelling, Quality); Glossary; Tags; VOLERE;
Requirements Analysis; Consolidated Requirements and Justifications;
GOAL Modelling (Objectiver); Goals evaluation; Requirements Lesson;
Preliminary Glossary; Web Moderator]
17. LearnPAd: Quality of NL Descriptions
18. LearnPAd: Quality of NL descriptions ensured
[Figure: mock-up of the quality check on a WIKI Doc attached to a BP Model.
The BP Manager loads the document, selects the criteria (Complexity,
Structuring, Ambiguity) and presses Validate.
Quality Evaluation Page: Complexity: 0.9 (Reduce); Structuring: 0.1
(Increase); Ambiguity: 0.7 (Reduce); with INSPECT actions.
Pressing Inspect opens the Inspection Page, which shows the WIKI Doc
(Non Editable) with a MODIFY action. Example text:
"The document shall be sent to the proper authorities as soon as
possible after the document has been signed by the officer"]
19. LearnPAd: Quality of NL descriptions ensured
Objective
Identify typical NL defects of PA documents
Rationale
We do not have contributions of civil servants
We ask civil servants about their difficulties with their current
documents
We identify quality defects of currently existing PA documents,
normally edited (and read) by civil servants
20. Defects in NL Descriptions: Process
[Figure: process for identifying defects in NL descriptions.
Activities: Perform Interviews; Define Questionnaire; Deliver
Questionnaire; Evaluate Questionnaire; Evaluate Web-links defining
guidelines for editing PA procedures; Define guidelines for editing PA
procedures; Evaluate guidelines; Define defect categories to be
identified with machine-learning; Implement rule-based approach for the
identification of most relevant defects; Select PA procedures from the
Web; Select a sub-set of PA procedures as data-set; Tag data-set
according to categories; Implement machine-learning approach.
Artifacts: List of most relevant categories of defects to be detected in
PA procedures; Guidelines for editing PA procedures; List of categories
of defects to be detected in PA procedures; Rule-based identifiable
defects; Non-rule-based identifiable defects]
21. Defects in NL Descriptions: From the interviews
7 people interviewed
1 EU officer, 4 people from administrative staff of CNR (Research
Institute), 2 municipality employees from the Marche Region
Which are the defects in the NL documents you deal with?
Defects
Most of the time, procedures are not described anywhere!
Cross-references with too many laws
Ambiguity and Vagueness
Lack of context
Redundancy
22. Defects in NL Descriptions
24. Ambiguity in Natural Language Requirements
It would be nice to have formal requirements, but NL is the most
widely understood communication code
NL is inherently ambiguous
Ambiguous requirements might cause misinterpretations
among stakeholders
The developer/modeller might decide a possible interpretation of
the requirement - unconscious disambiguation
Ambiguities are lexical, syntactic, semantic, and...
PRAGMATIC
25. A Mole at Work
"There is a MOLE at WORK"
mh...
26. Pragmatic Ambiguities depend on the CONTEXT
[Figure: elements of the context: Common Sense Knowledge; Domain
Knowledge; Other Requirements; Other Situational Aspects]
27. Approach for Pragmatic Ambiguity Detection
28. Domain knowledge acquisition for different readers
DOCUMENT SET 1 DOCUMENT SET 2
29. Different readers analyse the same requirement
REQUIREMENT
30. Different readers compare their interpretations
32. Domain Knowledge Modelling
We model the domain knowledge as a weighted graph
Each node is a concept
Each edge represents a connection among concepts
The weight of an edge represents how close the connection
between two concepts is
The lower the weight, the closer the connection
The weight is derived from the number of co-occurrences
We build this weighted graph starting from Web pages
concerning the domain of the requirements document
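A minimal sketch of how such a graph could be built. The slides state only that the weight is derived from the number of co-occurrences and that a lower weight means a closer connection; the sentence-level co-occurrence window, the formula weight = 1 / count, the crude tokenizer, and the name `build_graph` are all illustrative assumptions.

```python
import itertools
import re
from collections import Counter


def build_graph(sentences):
    """Return {concept: {neighbor: weight}} from a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        # Crude tokenization: lowercase alphabetic words, deduplicated per sentence.
        terms = sorted(set(re.findall(r"[a-z]+", sentence.lower())))
        for a, b in itertools.combinations(terms, 2):
            counts[(a, b)] += 1
    graph = {}
    for (a, b), n in counts.items():
        weight = 1.0 / n  # assumed formula: more co-occurrences -> lower weight
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


# Hypothetical stand-ins for sentences taken from domain Web pages.
pages = [
    "patient data database",
    "patient data doctor",
    "system database",
]
graph = build_graph(pages)
print(graph["patient"]["data"])     # the two terms co-occur twice
print(graph["system"]["database"])  # the two terms co-occur once
```

In this toy input, "patient" and "data" appear together in two sentences, so their edge gets a lower weight than the edge between "system" and "database".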
34. Requirements interpretation as a least-cost path search
Interpreting a requirement is activating the concepts of the
requirement in the knowledge graph
Activating two concepts in a requirement implies the activation of
other neighboring concepts
The concepts that are activated are those that are more closely
connected with the concepts in the requirement (i.e., their edges
have lower weight)
The interpretation of the requirement is a least-cost path search
within the domain knowledge graph
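The least-cost path idea can be sketched with Dijkstra's algorithm. The slides do not name the search algorithm, so Dijkstra is an assumption here, as are the toy graph, its weights, and the reading of the intermediate nodes on the path as the "activated" concepts.

```python
import heapq


def least_cost_path(graph, source, target):
    """Dijkstra's algorithm on {node: {neighbor: weight}}; returns (cost, path)."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []  # target not reachable from source


# Hypothetical toy graph; the weights are invented for illustration.
toy_graph = {
    "store": {"database": 0.2, "memory": 0.5},
    "database": {"store": 0.2, "data": 0.1},
    "memory": {"store": 0.5, "data": 0.5},
    "data": {"database": 0.1, "memory": 0.5},
}
cost, path = least_cost_path(toy_graph, "store", "data")
print(path)
```

With these weights the path from "store" to "data" goes through "database" rather than "memory". A reader whose graph was built from different documents could have different weights and therefore activate a different concept.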
35. Requirements Interpretation
REQ. 1 - The system shall store patient data
[Figure: fragment of the knowledge graph for REQ. 1. Concepts shown:
system, store, patient, data, button, feedback, screen, database,
retrieve, memory, content, location, vaccine, name, sickness, doctor,
surname, ram, disk]
37. Issues on Coverage and Threshold
Coverage
The content of the domain document shall cover the content of the
requirements specification
Minimum coverage: $\rho = \dfrac{|\text{terms in requirements} \cap \text{terms in documents}|}{|\text{terms in requirements}|}$
Threshold
Multiple analyses with different combinations of documents to
compute similarities: $\bar{\sigma}(R_i)$ and $\sigma_{\min}(R_i)$
Thresholds computed as the average of the similarities for $R_1, \dots, R_n$
$\tau_{\bar{\sigma}}$ and $\tau_{\sigma_{\min}}$ are the considered thresholds
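A small sketch of the two quantities above. The coverage function follows the ratio on the slide directly; for the thresholds, the data layout (one list of similarity values per requirement, one value per analysis) and the function names are assumptions, and the numbers are hypothetical rather than taken from the experiment.

```python
from statistics import mean


def coverage(requirement_terms, document_terms):
    """Fraction of requirement terms that also occur in the domain documents."""
    req = set(requirement_terms)
    return len(req & set(document_terms)) / len(req)


def thresholds(similarities):
    """similarities[i] holds the similarity values obtained for requirement R_i
    across the analyses run with different document combinations (assumed layout)."""
    mean_sigma = [mean(values) for values in similarities]  # mean similarity per R_i
    min_sigma = [min(values) for values in similarities]    # minimum similarity per R_i
    # Each threshold is the average of the per-requirement values over R_1..R_n.
    return mean(mean_sigma), mean(min_sigma)


# Hypothetical values, not taken from the experiment.
rho = coverage({"system", "store", "patient", "data"}, {"patient", "data", "doctor"})
tau_mean, tau_min = thresholds([[0.5, 1.0], [0.25, 0.75]])
print(rho, tau_mean, tau_min)
```

How a requirement's similarity is then compared against a threshold is not stated in this part of the slides, so the sketch stops at computing the thresholds.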
39. Experimental Evaluation
Source
Requirement specification of a system for Outbreak Management
(OM) issued by the Public Health Information Network (PHIN)
Data collection (names, vaccines, clinical samples) from people
that might be affected by an epidemic health event
Set-up
114 requirements
43 include pragmatic ambiguities (manually identified)
25 domain documents
5 different combinations of documents
40. Experimental Evaluation: Domain Documents
ID Title Link
d1 PHEMCE strategy http://goo.gl/hYaipm
d2 Application to clinical and Public Health Practice http://goo.gl/hVVy1Y
d3 Biodefense countermeasure Department of Defense http://goo.gl/I6U0Ns
d4 Wikipedia page for “Case Definition” http://goo.gl/yPndtx
d5 Wikipedia page for “Chain of Custody” http://goo.gl/4uvTuc
d6 Definition of “Chain of custody” http://goo.gl/OUgcQd
d7 Communicable disease outbreak plan http://goo.gl/rV72wX
d8 Foodborne outbreak management http://goo.gl/pTlgp9
d9 Guidelines for the investigation and control of outbreaks http://goo.gl/Sv4Ebu
d10 Practice guidelines of the infectious diseases http://goo.gl/GjLvg2
d11 Implementation guide ambulatory healthcare http://goo.gl/qEiLGR
d12 Management of scabies outbreaks http://goo.gl/GUAbKS
d13 Modeling information systems architectures by P. Grefen http://goo.gl/j2E4Lx
d14 Outbreak control http://goo.gl/f0HC1h
d15 Outbreak management guidelines for healthcare http://goo.gl/EcYVEi
d16 Surveillance and response in humanitarian emergencies http://goo.gl/ybje6i
d17 PHIN guide for syndromic surveillance http://goo.gl/lEz8zw
d18 PHIN messaging guide for syndromic surveillance http://goo.gl/3AAXNE
d19 Developing a management system: an overview http://goo.gl/0l5sth
d20 Industrial system 800xA system architecture http://goo.gl/RSaBnD
d21 System architecture and complexity http://goo.gl/v44tC0
d22 WHO guidelines for epidemic preparedness and response http://goo.gl/PK9yn7
d23 Wikipedia page for “Management System” http://goo.gl/mgWfhh
d24 Wikipedia page for “Outbreak” http://goo.gl/LUQEWm
d25 Wikipedia page for “Scabies” http://goo.gl/fjYYrQ
42. Observations
Requirements analysis tools shall be tuned to favour recall over
precision (Dan Berry)
False negative cases are the main issue
“Demographic information should be collected about the
investigator [...]”
→ influence of the other terms in the computation of the similarity
“Mapping interfaces and data dictionaries must be defined [...]”
→ multi-word terms
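To make "recall over precision" concrete, the sketch below computes precision, recall, and an F-beta score with beta > 1, which weights recall more heavily. F-beta is a standard metric but is not mentioned in the slides; the requirement IDs and the function name are hypothetical and do not reproduce the talk's results.

```python
def precision_recall_fbeta(flagged, truly_ambiguous, beta=2.0):
    """Compare the requirements flagged by a tool with a manual ground truth."""
    flagged, truly_ambiguous = set(flagged), set(truly_ambiguous)
    true_positives = len(flagged & truly_ambiguous)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truly_ambiguous) if truly_ambiguous else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    # beta > 1 gives recall more weight than precision.
    fbeta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, fbeta


# Hypothetical requirement IDs: requirement 7 is a false negative.
p, r, f2 = precision_recall_fbeta(flagged={1, 2, 3, 4, 5, 6}, truly_ambiguous={1, 2, 3, 7})
print(p, r, round(f2, 3))
```

In this toy case the missed requirement 7 lowers recall, which is the kind of false negative the slide identifies as the main issue.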
43. Summary and Future Work
Unsupervised and statistical (not rule-based) method
Consider novel similarity metrics to emphasize the role of
single ambiguous terms
Consider multi-word terms
Include the common-sense knowledge
Concepts that are highly connected in the domain knowledge are
less connected in the common sense knowledge
Integrate structural and dynamic beliefs about the world and the
domain within the knowledge graphs