WP4 WS CEA 2020
1. Methods and Tools for GDPR Compliance through Privacy and Data Protection 4 Engineering
PDP4E-Req: Managing privacy and GDPR-related requirements
CEA, University of Duisburg-Essen
10.03.2020
2. Innovation Scenario
• Privacy engineering posits that privacy must be treated as a primary development concern:
• Privacy must be addressed from the early stages of a system’s life cycle (by design)
• Requirements related to privacy and data protection must be properly elicited and documented!
• This calls for systematic methods and tools that support software engineers in the
identification and documentation of privacy requirements!
• Legal frameworks such as the GDPR introduce new requirements engineering challenges:
• Legal provisions are expressed in a jargon alien to most software developers
• In contrast to system requirements, provisions are described at a high level of abstraction
• Multiple interpretations can lead to ambiguous or contradictory software requirements
• Mapping legal provisions to system requirements is not always straightforward
3. Scope and Objectives
• Develop requirements engineering methods and tools that:
• Support the elicitation of privacy and data protection requirements
• Systematic
• Structured
• Computer-aided
• Aligned with the GDPR legal provisions
• Aligned with privacy and data protection standards (ISO 29100)
• Driven by privacy goals such as
• Integrity
• Confidentiality
• Transparency…
4. Method Background
• Privacy scholars have introduced various requirements engineering methods and techniques.
• ProPAn (Problem-based Privacy Analysis) is a computer-aided method for privacy requirements
engineering.
• It systematically analyses the functional requirements of a system-to-be with regard to a set of
privacy engineering protection goals.
• Goals are represented through a taxonomy of high-level privacy requirements.
• The taxonomy is derived from the legal framework relevant for the data controller and the data
subject (e.g. the GDPR).
• The taxonomy guides the identification of critical points in the system’s data flow that may
raise privacy concerns from the stakeholders.
• The input of the method is a representation of the system-to-be as a Problem Diagram.
5. The ProPAn Method
• The ProPAn method can be divided into two phases:
1. PHASE 1: Identification of Privacy-Relevant Information Flows
2. PHASE 2: Generation of privacy requirements
• In Phase 1, ProPAn elaborates on a set of software artefacts:
• Context Diagram (1..1)
• Problem Diagrams (1..N)
• Domain Knowledge diagram (1..N)
• Stakeholder Information Flow diagram (1..N)
• Personal Information Diagram (1..N)
• Available Information Diagram (1..N)
[Figure: Phase 1 — Identification of Privacy-Relevant Information Flows.
Method steps: Context Elicitation → Graph Generation → Identification of Personal Data → Personal Data Flow Analysis.
External input: Functional Requirements.
Internal inputs/outputs: Context Diagram, Domain Knowledge, Problem Diagrams; Detailed Stakeholder Information Flow Graphs; Personal Information Diagrams; Available Information Diagrams.]
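For illustration, the Phase 1 artefact set and its multiplicities could be represented as plain data classes (a hypothetical sketch, not the tool's actual data model; all names and example values are assumptions):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Diagram:
    """Placeholder for any ProPAn diagram artefact."""
    name: str

@dataclass
class Phase1Artefacts:
    # Multiplicities follow the slide: exactly one context diagram (1..1),
    # one or more of each remaining diagram kind (1..N).
    context_diagram: Diagram
    problem_diagrams: List[Diagram] = field(default_factory=list)
    domain_knowledge_diagrams: List[Diagram] = field(default_factory=list)
    stakeholder_information_flow_diagrams: List[Diagram] = field(default_factory=list)
    personal_information_diagrams: List[Diagram] = field(default_factory=list)
    available_information_diagrams: List[Diagram] = field(default_factory=list)

# Hypothetical instance:
artefacts = Phase1Artefacts(
    context_diagram=Diagram("system context"),
    problem_diagrams=[Diagram("R3: accounting")],
)
```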
6. The ProPAn Method: Phase 1
• ProPAn and its artefacts are based on the problem frame notation by Jackson:
• Context Diagram (1..1)
• Problem Diagrams (1..N)
• Domain Knowledge (1..N)
• The “world” is modelled in terms of:
• Domains
• Interfaces
• Phenomena (events) exchanged between domains
• Causal: Events, actions, messages, and operations.
• Symbolic: Data and states.
• In this sense, a problem diagram is an event-oriented artefact which is used to specify requirements.
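As an illustration (not ProPAn's actual metamodel), the problem-frame vocabulary of domains, interfaces, and causal vs. symbolic phenomena can be sketched as follows; the doctor/insurance example values are assumptions:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

class PhenomenonKind(Enum):
    CAUSAL = "causal"      # events, actions, messages, operations
    SYMBOLIC = "symbolic"  # data and states

@dataclass(frozen=True)
class Phenomenon:
    name: str
    kind: PhenomenonKind

@dataclass
class Domain:
    name: str

@dataclass
class Interface:
    between: Tuple[Domain, Domain]  # the two domains sharing this interface
    phenomena: List[Phenomenon]     # phenomena exchanged over it

# Hypothetical example: a doctor sends a diagnosis to an application.
doctor = Domain("Doctor")
app = Domain("InsuranceApplication")
iface = Interface(
    between=(doctor, app),
    phenomena=[
        Phenomenon("sendDiagnosis", PhenomenonKind.CAUSAL),  # an event
        Phenomenon("diagnosis", PhenomenonKind.SYMBOLIC),    # the data itself
    ],
)
```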
7. The ProPAn Method: Phase 1
• In order to perform a privacy analysis over a set of functional requirements we need requirements
expressed in a data-oriented fashion.
• To represent the exchange of personal data, ProPAn introduces a set of refinements and alternative
data structures based on the original problem diagrams:
• Detailed Stakeholder Information Flow Diagrams (DSIFDs)
• Available Information Diagrams (AIDs)
• Personal Information Diagrams (PIDs)
• These new diagrams incorporate additional information that is necessary to conduct a privacy
analysis and, therefore, to derive privacy requirements.
8. ProPAn Requirement Taxonomies
• ProPAn analyses the GDPR and ISO 29100 using the Privacy Engineering Protection Goals (PEPGs):
unlinkability, transparency, intervenability, confidentiality, integrity, and availability
• A set of extensible taxonomies is generated, representing a collection of meta-requirements
that are necessary for achieving these PEPGs.
• Generation of <transparency> meta-requirements:
• <transparency> related verbs and nouns such as inform, notify, present, provide, explain and
communicate are searched in the GDPR and ISO 29100 to identify <transparency>
requirements.
• Refinement relations between the identified <transparency> requirements are identified.
• If a requirement B refines a requirement A, it means that B adds further details on how or
what information has to be made <transparent>.
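The keyword-driven scan described above can be illustrated with a naive sketch. The verb list comes from the slide; the provision texts below are paraphrases, not verbatim GDPR wording, and a real analysis would also need stemming to catch variants such as "informs" or "notified":

```python
import re

# Transparency-related verbs listed on the slide.
TRANSPARENCY_VERBS = {"inform", "notify", "present", "provide", "explain", "communicate"}

def find_transparency_candidates(provisions: dict) -> list:
    """Return the ids of provisions whose text mentions a transparency verb."""
    hits = []
    for pid, text in provisions.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & TRANSPARENCY_VERBS:
            hits.append(pid)
    return hits

# Paraphrased (non-verbatim) provision texts, for illustration only:
provisions = {
    "Art. 13(1)": "the controller shall provide the data subject with information ...",
    "Art. 33(1)": "the controller shall notify the supervisory authority of a breach ...",
    "Art. 5(1)(c)": "personal data shall be adequate, relevant and limited ...",
}
print(find_transparency_candidates(provisions))  # ['Art. 13(1)', 'Art. 33(1)']
```

The refinement relations between the identified requirements still have to be established manually; the scan only produces candidates.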
9. ProPAn Transparency Taxonomy
Excerpt (thesis Chapter 5, “Refining the Privacy Goal Transparency”, Section 5.2.2, “Setting up a Transparency Requirements Taxonomy”):

The identified preliminary transparency requirements are structured into a transparency requirements taxonomy, shown in Figure 5.2 as a metamodel in the form of a UML class diagram. The hierarchy of transparency requirements is derived from the initial ontology shown in Figure 5.1. Table 5.1 gives an overview of the mapping between the taxonomy and the initial transparency requirements. The top-level element of the hierarchy is the general TransparencyRequirement, which corresponds to the initial requirement T1 and is declared abstract in the metamodel.

[Figure 5.2: Proposed taxonomy of transparency requirements.]

Table 5.1. Mapping of transparency requirements to preliminary requirements:

Requirement                        Attribute                     Tn
TransparencyRequirement            data subject, personal data   T1
                                   controller                    T5
                                   counterstakeholder            T4, T14
                                   linkability                   T16
                                   sensitiveData                 T19
PresentationRequirement            accessibility                 T2
                                   language                      T18
                                   time                          T16, T29, T30
ExceptionalInformationRequirement  case                          T17, T21, T24, T30
                                   authorities                   T21
ProcessingInformationRequirement   controlOptions                T6, T7, T8, T26
                                   mandatory                     T10, T20
                                   purpose, reason               T3, T17, T23
                                   security                      T22
CollectionInformationRequirement   method                        T11, T28
StorageInformationRequirement      retention                     T13, T15, T25
FlowInformationRequirement         contract, country             T9, T12, T27
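One way to read the taxonomy is as a class hierarchy. Below is a hypothetical Python encoding of a few of the classes and attributes from Table 5.1 (class and attribute names come from the table; types, defaults, and the example values are assumptions):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TransparencyRequirement:
    # Top-level element; declared abstract in the metamodel (only its
    # subclasses are meant to be instantiated).
    data_subject: str
    personal_data: List[str]
    controller: str
    counterstakeholders: List[str] = field(default_factory=list)

@dataclass
class ProcessingInformationRequirement(TransparencyRequirement):
    purpose: str = ""        # attribute "purpose, reason" (T3, T17, T23)
    mandatory: bool = True   # attribute "mandatory" (T10, T20)

@dataclass
class StorageInformationRequirement(TransparencyRequirement):
    retention: str = ""      # attribute "retention" (T13, T15, T25)

# Hypothetical instance, loosely following the insurance running example:
req = StorageInformationRequirement(
    data_subject="patient",
    personal_data=["diagnosis", "treatment"],
    controller="insurance company",
    retention="statutory retention period",
)
```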
11. Generation of Privacy Req. Candidates
Excerpt from the running example (Information 2016, 7, 28):

Figure 8 shows all personal information that was identified for the patient. During the personal data flow analysis, new personal information was identified based on the personal data identified during the step Identification of Personal Data (cf. Figure 6). All newly identified personal data can be derived from, or is contained in, the initially identified personal data; additionally, contains and derivedFrom relations were identified among the initially identified personal data. For example, the healthInformation of patients can be derived from the patient’s healthStatus. Furthermore, the diagnosis, which doctors create for patients, and the chosen treatment are derived from the patient’s healthStatus by doctors. In addition, from the diagnosis and treatment, the costs for the performed treatment (treatmentCosts) can be derived.

Figure 7 shows which personal information of patients is available at the insurance application. For example, from requirement R3 (attribute origin) it was identified that, for accounting, the patient’s diagnosis, treatment, treatment costs, and insurance number are sent to the insurance application.

During the information flow analysis, two additional relations between the identified personal data are elicited. The relation contains documents that one personal information contains another personal information; it is documented using an aggregation with stereotype «contains». The relation derivedFrom documents that a personal information can be derived from one or several other pieces of personal information; it is documented using a dependency with stereotype «derivedFrom». The contains and derivedFrom relations are, in contrast to the linkable relation, globally valid: they are not only available at specific domains but valid at all domains. Figures 8 and 9 present views on a personal information diagram and an available information diagram, showing how the contains, derivedFrom, and linkable relations are modeled.

[Figure 7: Available information diagram for the insurance application (Insurance Employee Available Information Diagram), showing which personal data of patients are available at the insurance application.]
[Figure 8: Final personal information diagram for the patient (Patient Personal Information Diagram), showing the personal information of the patient and the relations between this personal information.]
[Figure 9: View on the available information diagram for the insurance application, showing which links between the personal data of the patient are available at the insurance application.]
Unlinkability Requirements Taxonomy
[Figure 10: Used Taxonomy of Unlinkability Requirements.]
UndetectabilityRequirement Pfitzmann and Hansen define undetectability as: “Undetectability of
an item of interest (IOI) from an attacker’s perspective means that the attacker cannot sufficiently
distinguish whether it exists or not.”
AnonymityRequirement Pfitzmann and Hansen define anonymity as: “Anonymity of a subject
from an attacker’s perspective means that the attacker cannot sufficiently identify the subject within a
set of subjects, the anonymity set.”
DataUnlinkabilityRequirement Pfitzmann and Hansen define unlinkability as: “Unlinkability
of two or more items of interest (IOIs, e.g., subjects, messages, actions, ...) from an attacker’s
perspective means that within the system (comprising these and possibly other items), the attacker
cannot sufficiently distinguish whether these IOIs are related or not.” Our data unlinkability
requirements express the intended relations between messages, actions, and the like which a subject
performs and do not concern the relations of these to the subject itself, as these are represented by
anonymity requirements.
6.1.1. Undetectability
Based on the above given definition of Pfitzmann and Hansen, an undetectability requirement of
our taxonomy (cf. Figure 10) has the following meaning:
The <counterstakeholder>s shall not be able to sufficiently distinguish whether the personal
information <phenomena> of the <stakeholder> exists or not.
If a personal information of a stakeholder is not available at a counterstakeholder and also not
part of any personal information available at the counterstakeholder, then we assume that this personal
information is undetectable for the counterstakeholder. Note that an undetectability requirement
may be too strong for this case, because the counterstakeholder may be allowed to know that a
specific personal information exists, but may not be allowed to know the content of it. Hence, the
user may weaken an undetectability requirement in the next step of our method (Section 7) to a
confidentiality requirement.
To keep the number of generated requirements small, we create for each pair of stakeholder
and counterstakeholder only one undetectability requirement, containing all personal information of
the stakeholder that shall be undetectable for the counterstakeholder.
Application to Running Example
For the sake of simplicity, we only consider the stakeholder patient and the counterstakeholder
insurance employee for the generation of the unlinkability requirements.
At the biddable domain insurance employee, the same personal information of the patient is
available as at the insurance application (cf. Figure 7). Hence, an undetectability requirement is
generated for the counterstakeholder insurance employee and the stakeholder patient, with all personal
data of the patient (cf. Figure 8) that is not available at the insurance employee as the value of the
attribute phenomena. This undetectability requirement is represented in the first row of Table 1.
Instantiating the above template for the meaning of an undetectability requirement yields the
following textual representation:
The insurance employee shall not be able to sufficiently distinguish whether the personal
information healthStatus, mobileDevices, deviceId, vitalSigns, and notes of the patient exist
or not.
[Table 1: Unlinkability requirements for the stakeholder patient and the counterstakeholder
insurance employee (columns: UnlinkabilityRequirement, Phenomena/Pairs).]
Semantic Template + DEDUCTION → Unlinkability Requirement Candidate
12. Generation of Privacy Req. Candidates
• The software artefacts generated in Phase 1 are used together with the taxonomies to generate
a set of privacy requirement candidates in Phase 2.
• Generating undetectability requirements for a stakeholder S regarding a counterstakeholder C:
• Analyse the Personal Information Diagram of S and the Available Information Diagram of C
• “A personal information p of S shall be undetectable for C when p is not available to C”
Each type of privacy requirement (i.e. unlinkability, transparency, intervenability, etc.)
is generated by reasoning over the information available in ProPAn’s models.
An undetectability requirement is created for each piece of information in the Personal
Information Diagram of S that is not present in the Available Information Diagram of C.
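The generation rule above amounts to a set difference between the stakeholder's PID and the counterstakeholder's AID. A minimal Python sketch (the dict-based requirement representation is an assumption; the example values loosely follow the paper's patient / insurance-employee running example):

```python
def undetectability_requirement(stakeholder, counterstakeholder,
                                pid: set, aid: set):
    """pid: personal information of the stakeholder (from the PID of S);
    aid: information available to the counterstakeholder (from the AID of C)."""
    phenomena = pid - aid  # items of S that C cannot observe
    if not phenomena:
        return None  # nothing undetectable: no requirement is generated
    return {
        "type": "UndetectabilityRequirement",
        "stakeholder": stakeholder,
        "counterstakeholder": counterstakeholder,
        "phenomena": sorted(phenomena),
    }

# Example values taken from the running example quoted earlier:
pid_patient = {"healthStatus", "mobileDevices", "deviceId", "vitalSigns",
               "notes", "diagnosis", "treatment", "treatmentCosts",
               "insuranceNumber"}
aid_employee = {"diagnosis", "treatment", "treatmentCosts", "insuranceNumber"}
req = undetectability_requirement("patient", "insurance employee",
                                  pid_patient, aid_employee)
print(req["phenomena"])
# ['deviceId', 'healthStatus', 'mobileDevices', 'notes', 'vitalSigns']
```

Only one requirement is created per (stakeholder, counterstakeholder) pair, bundling all undetectable phenomena, which keeps the number of generated requirements small.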
13. ProPAn: Issues and disadvantages
• The meta-requirement taxonomies of ProPAn may not be exhaustive (w.r.t. the GDPR).
• There is considerable redundancy across the ProPAn diagrams:
• The attribute linkability appears in both the PID and the AID
• The attribute origin appears in both the PID and the AID
• Stakeholders (including data subjects), data storages, and processes are all represented together in the
same diagram (as a consequence of starting from contextual information)
• In terms of context modelling, this representation is adequate.
• In terms of privacy analysis, a more structured representation is necessary!
We want to maintain the level of granularity achieved by ProPAn
while improving its interpretability and eliminating redundancy.
15. PDP4E-REQ Tool
• There was a constant exchange of ideas and views between CEA and UDE:
• Harmonize the method/tool according to the needs of requirements engineers
• Reduce overhead to optimize the method’s applicability
• Our most salient achievements so far:
• Definition of a lightweight methodology for privacy and GDPR requirements engineering
• Definition of data structures that support the methodology (e.g. DFDs)
• Reuse and adaptation of ProPAn code in PDP4E-Req
• Implementation of the method and its data structures
• Validation of the tool with a case study (Smart Grids)
16. DFDs in PDP4E-Req
• Functional requirements are translated into one or more DFD elements:
• Data Record Requirement (DRR): Collection of data
records (e.g. personal data)
• Data Process Requirement (DPR): Activities that are
performed over data records.
• Data Flow Requirement (DFR): Exchange of
information between DRR and DPR.
• The DFD elements are annotated with attributes such as
data sensitivity, degree of linkability and retention time.
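The three DFD element kinds and their annotations could be sketched as plain data classes. This is a hypothetical encoding, not the tool's actual metamodel: attribute types and the Smart Grid example values are assumptions.

```python
from dataclasses import dataclass

@dataclass
class DataRecordRequirement:
    """DRR: a collection of data records (e.g. personal data)."""
    name: str
    sensitivity: str = "personal"  # assumed levels, e.g. personal / sensitive
    linkability: int = 0           # assumed numeric degree of linkability
    retention_days: int = 0        # assumed unit for retention time

@dataclass
class DataProcessRequirement:
    """DPR: an activity performed over data records."""
    name: str

@dataclass
class DataFlowRequirement:
    """DFR: an exchange of information between a DRR and a DPR."""
    source: DataRecordRequirement
    target: DataProcessRequirement

# Hypothetical instance inspired by the Smart Grid case study:
record = DataRecordRequirement("meter readings", sensitivity="personal",
                               retention_days=365)
flow = DataFlowRequirement(record, DataProcessRequirement("billing"))
```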
[Figure: PDP4E-Req DFD Metamodel]
17. Future plans
• This is a first attempt to bridge the accountability gap in requirements engineering.
• Harmonization between legal provisions and technical requirements is an ongoing challenge:
• New legal provisions will appear in the future.
• Privacy requirement taxonomies are elaborated manually.
• Methodological point of view:
• Update the methodology accordingly
• Achieve better coverage of the GDPR
• Tool point of view:
• Improve the GUI
• Implement filters to help navigate models and requirements
• Extend the coverage of requirement generation
18. Methods and Tools for GDPR Compliance through Privacy and Data Protection 4 Engineering
For more information, visit:
www.pdp4e-project.org
Thank you for your attention
Questions?
Contributors:
patrick.tessier@cea.fr
gabriel.pedroza@cea.fr
nicolas.diaz-ferreyra@uni-due.de