DEGREE PROJECT IN COMPUTER SCIENCE, SECOND LEVEL
STOCKHOLM, SWEDEN 2015
Threat modelling of historical attacks
with CySeMoL
CARL SVENSSON
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION (CSC)
Hotmodellering av historiska attacker med CySeMoL
Master’s Thesis at CSC
Supervisor: Sonja Buchegger
Examiner: Mads Dam
Abstract
This report investigates the modelling power of the Cyber Security Modelling
Language, CySeMoL, by looking at
three documented cyber attacks and attempting to model
the respective systems in which they occurred. By doing
this, strengths and weaknesses of the model are investigated
and proposals for improvements to the CySeMoL model are
explored.
Summary
Threat modelling of historical attacks with CySeMoL
This report examines the modelling capability of the Cyber Security Modelling
Language, CySeMoL, by looking at three documented IT attacks and attempting to
model the systems in which each attack took place. In doing so, strengths and
weaknesses of the model are examined and proposals for improvements to the model
are explored.
Acknowledgements
I would like to thank my supervisor at KTH, Sonja Buchegger, for her invaluable
input and support throughout the project.
I would also like to thank my supervisor at Foreseeti, Mathias Ekstedt who
provided great discussions about the work and helped me in the right direction
throughout the course of the project.
Finally, I would like to thank the staff at Foreseeti, especially Joakim, Pontus
and Dan, who have been supportive of my work, and my friend Fredrik Hilding for
his enormous help with proofreading and feedback.
Chapter 1
Introduction
Over time, IT systems have grown larger. This has led to an increase in both
complexity and the difficulty of maintaining full knowledge about the system[1].
Furthermore the attack surfaces and the number of vulnerabilities in a system grow
with the size. This presents a problem for administrators and security officers who
often work under a constrained budget and need to prioritize where to investigate or
improve the system. In order to effectively be able to make these kinds of decisions
it is desirable to have relevant information to base the decisions on. Ideally, one
might want to have full understanding of the entire system including both hardware
and software components and their interactions. Unfortunately due to the sheer size
and complexity of modern systems, this is usually infeasible.
Different tools have been proposed to aid decision makers with these kinds of
problems. In addition to traditional methods such as penetration testing and code
review, one proposed class of tools is various kinds of models where the analyst tries
to create a representation of the system to aid in decision making.
One such tool is CySeMoL (Cyber Security Modelling Language) which uses
Bayesian networks to calculate security risks in a model of the system. The Cy-
SeMoL model was created at KTH[2] and is being further developed by Foreseeti,
a startup company at KTH, into a fully integrated threat modelling tool. By using
CySeMoL to model known previous attacks, it is possible to both validate the model
and find areas that can be improved.
1.1 Goal and scope
The goal of this study is to look at several documented breaches in IT systems
and use them to evaluate some aspects of CySeMoL. We will consider how well
the model is capable of representing the selected attacks and the systems in which
they occurred. Improvements to CySeMoL will be proposed in cases where it is
not possible to satisfactorily model the studied attacks. The proposed improvements
will be analysed in terms of how they affect the complexity of the model and the
difficulty of modelling.
1.2 This report
This report is divided into five parts. This introduction aims to frame the discussion
and give some context to the problem. This is followed by some background where
threat modelling is described and some different alternatives are discussed. We
also introduce CySeMoL and describe how it works. With this in place we can
move on to the actual methods and experiments where several attacks are studied,
modelled and analysed. Finally, we finish with some conclusions about these
attacks, CySeMoL in particular and threat modelling in general.
Chapter 2
Background
When designing or maintaining any system of non-trivial size there are many quali-
ties that can be assessed. Security is one of them and is the focus of this study. The
analysis has been performed using CySeMoL, a threat modelling tool used to create and
evaluate system-centric threat models using Bayesian networks. This chapter aims
to provide some context on threat modelling and overall background on CySeMoL.
2.1 Threat modelling
Threat modelling is a process whereby a model is created which represents a subset
of possible attacks that can be performed against a system. Such a model is useful
when reasoning about the system and to determine where focus should be put in
security efforts and which mechanisms and policies can be effective in different areas
of the system. Such a model is of interest before deploying a system, as a design
tool to investigate different scenarios and variants of the system without having to
actually implement them. It can also be used to assess the security properties of
an existing system in order to understand where improvement is needed.
A threat model can be built in several ways, for example by starting from dif-
ferent perspectives. It is possible to take an attacker-centric view and try to answer
the question: "What is this particular attacker capable of doing?". By moving from
this question and looking at which attacks are applicable on the analysed system it
is possible to create a threat model.
A second view, the one taken in CySeMoL, is the system-centric approach where
the modeller instead starts with the actual system[2]. Here questions such as "What
software and hardware are present?" and "What does the network look like?" are the
basis of the model. By looking at the system it is possible to determine what attacks
and attack steps are possible and how they affect each other.
There are several existing tools for threat modelling. Among the more popular is
Microsoft’s SDL Threat Modelling Tool which is "designed for developers and centred
on software."[3][4] In particular, this means that it is designed to help maintain
the security of individual pieces of software and not larger networks with multiple
servers, network zones and users.
Figure 2.1. The result of a CORAS model[6]
Another modelling framework is Secure Tropos which extends Tropos, a method-
ology for software engineering, to include security considerations[5]. Tropos is based
on considering large IT systems as a group of smaller individual agents with specific
goals for each agent. This means that Secure Tropos is a methodology for develop-
ing secure software and is not intended to be used for analysing existing software
or the interactions between them.
A higher level framework for threat modelling is CORAS. CORAS, like CySeMoL,
uses a visual tool to model systems. However, CORAS is more similar to
traditional risk assessment methods in that it focuses on general classes of problems
and how these can lead to valuable assets being compromised[6][7]. It combines
estimating probabilities and consequences for different scenarios with their relations to
each other. An example of a part of a CORAS model can be seen in Figure 2.1.
Here we see that an actor, "Employee", has certain attributes associated with it, e.g.
"Insufficient training", and how those relate to some risks, e.g. "Sloppy handling of
records". These risks are then associated with a consequence, e.g. "Compromises
confidentiality of health records", which in turn affects concrete business aspects, e.g.
"Patient’s health".
2.2 Bayesian networks
Figure 2.2. A simple Bayesian network (example from Wikipedia)
CySeMoL uses Bayesian networks to create its statistical model. A Bayesian
network is a model which represents a set of random variables and their conditional
dependencies. It can be visualized as a directed acyclic graph (DAG), where an
edge from a node A to a node B indicates that B has a probability distribution that
is conditioned on A. For example, Figure 2.2 shows a simple Bayesian network
with three boolean variables and their conditional dependencies. By inspecting the
tables one can see that if the sprinklers are on and it is not raining, there is a 90%
probability that the grass is wet, in other words:
P(Grass wet = True | Rain = False ∧ Sprinklers = True) = 0.9.
It is also possible to make back inferences from a Bayesian network[8], i.e. if we
know that the grass is indeed wet, we can use conditional probability to calculate
the probability that it is raining and the probability that the sprinklers are on.
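The back inference described above can be illustrated with a minimal Python sketch that enumerates the joint distribution of the three variables. Only the value 0.9 for a wet lawn given sprinklers and no rain is quoted in the text. All other table entries, the assumption that the sprinkler depends on the rain, and the function names are illustrative assumptions.

```python
from itertools import product

# Conditional probability tables. Only P(wet | sprinkler on, no rain) = 0.9
# is quoted in the text; every other number here is an assumed example value.
P_RAIN = 0.2
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}
P_WET_GIVEN = {  # key: (sprinkler, rain)
    (False, False): 0.0,
    (False, True): 0.8,
    (True, False): 0.9,
    (True, True): 0.99,
}


def joint(rain, sprinkler, wet):
    """Probability of one full assignment, using the chain rule of the DAG."""
    p_rain = P_RAIN if rain else 1 - P_RAIN
    p_sprinkler = P_SPRINKLER_GIVEN_RAIN[rain]
    p_sprinkler = p_sprinkler if sprinkler else 1 - p_sprinkler
    p_wet = P_WET_GIVEN[(sprinkler, rain)]
    p_wet = p_wet if wet else 1 - p_wet
    return p_rain * p_sprinkler * p_wet


def posterior(query, evidence):
    """P(query is True | evidence), by summing the joint over all worlds."""
    names = ("rain", "sprinkler", "wet")
    numerator = denominator = 0.0
    for values in product((True, False), repeat=3):
        world = dict(zip(names, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue  # world contradicts the evidence
        p = joint(**world)
        denominator += p
        if world[query]:
            numerator += p
    return numerator / denominator


p_rain_given_wet = posterior("rain", {"wet": True})
p_sprinkler_given_wet = posterior("sprinkler", {"wet": True})
```

Enumeration is only feasible for very small networks, but it shows that the same tables answer both forward and backward questions.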
Bayesian networks provide at least two major advantages over just having a
single joint distribution over all the variables. First it saves memory, especially if
the graph is sparse and secondly it is more intuitive to understand the relations of
the variables from the graph than a single large distribution. They are used in a lot
of different fields such as computational biology, image processing and risk analysis.
Bayesian networks can also be used for threat modelling[9] and play a central role
in CySeMoL[10], where the variables are either attack steps that the attacker must
perform or defence mechanisms that negatively impact the attacker's probability of
succeeding.
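The memory argument above can be made concrete by counting table entries. The sketch below assumes boolean variables only. The dependency structures (a sprinkler network in which the sprinkler depends on the rain, and a 20-variable chain) and the function names are illustrative assumptions.

```python
def joint_table_size(n_variables):
    """Independent entries in one joint distribution over boolean variables."""
    return 2 ** n_variables - 1


def network_table_size(parents):
    """Independent entries over all conditional tables of a Bayesian network.

    parents maps each boolean variable to the list of its parent variables;
    a variable with k parents needs one probability per parent assignment.
    """
    return sum(2 ** len(parent_list) for parent_list in parents.values())


# Assumed structure of the three-variable sprinkler example (dense graph).
SPRINKLER = {
    "rain": [],
    "sprinkler": ["rain"],
    "grass_wet": ["rain", "sprinkler"],
}

# A sparse graph: a chain of 20 variables, each depending on the previous one.
CHAIN = {f"x{i}": ([f"x{i - 1}"] if i > 0 else []) for i in range(20)}

dense_sizes = (joint_table_size(len(SPRINKLER)), network_table_size(SPRINKLER))
sparse_sizes = (joint_table_size(len(CHAIN)), network_table_size(CHAIN))
```

For the small, fully connected example the two representations need the same number of entries. The saving appears once the graph is sparse, as in the chain.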
2.3 CySeMoL
CySeMoL is used to create models of a system. The system is represented as a graph
with nodes representing parts of the system and edges how they are connected.
The edges can be of different types, creating different conditional relations between
nodes, even between the same pair of types depending on the kind of relation the
nodes have.
CySeMoL was previously based on the Probabilistic Relational Model proposed
by Sommestad et al.[9] but is now instead based on the P2AMF framework by
Johnson et al.[1]. The two frameworks are two different meta models from which
different concrete models can be instantiated. They describe what kind of objects
should be contained in the model and how they can be related to each other. They
are both based on Bayesian networks but differ on what classes of objects they
contain and how different real world objects are mapped to objects in the model.
Figure 2.3. A part of a CySeMoL model
The CySeMoL graph is not a Bayesian network in itself but is instead used
to generate one when the actual computation is performed. Every node in the
CySeMoL graph has several attributes belonging to one of two categories: "attack
steps" and "defences". Every such attribute is a node in the Bayesian network and
their conditional dependencies are based on the CySeMoL model. For example, the
CySeMoL subgraph shown in Figure 2.3 has three nodes and two edges. When the
calculations are to be performed, this is transformed into a Bayesian network with
27 nodes and even more edges. The relations in the network are based both on
previous research and assessments from domain experts with Cooke’s method[11].
By grouping attack steps together and focusing on more concrete parts of the
system, CySeMoL helps abstract away a lot of the details of the attack graph by
allowing the user to focus on the larger components of the system instead of details
of the exact method of attack. For example, there are a lot of different ways in
which a server can be compromised that lead to the same outcome, but the user
only needs to define the server in terms of its operating system, other software and
relations to the world around it.
As stated above, the actual calculations are performed on the Bayesian network
inferred from the CySeMoL model. CySeMoL is built on P2AMF which is based on
OCL. OCL is a declarative language which performs its calculations with a recursive
dynamic programming algorithm. When the state of one node is to be calculated,
the algorithm recurses on all nodes on which it depends and calculates those, making
sure to save results and only calculate each node once.
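The recursive evaluation with saved results described above can be sketched in Python as follows. This is not OCL and not CySeMoL's actual evaluation code. The dependency graph, the local probabilities and the rule that multiplies a step's local value with the values of its prerequisites are placeholder assumptions chosen only to show the memoisation.

```python
# Hypothetical dependency graph: each step lists the steps it depends on.
DEPENDS_ON = {
    "start": [],
    "access": ["start"],
    "findExploit": ["access"],
    "deployExploit": ["access", "findExploit"],
    "compromise": ["deployExploit", "access"],
}

# Hypothetical local success probabilities for each step.
LOCAL_PROB = {
    "start": 1.0,
    "access": 0.9,
    "findExploit": 0.5,
    "deployExploit": 0.8,
    "compromise": 1.0,
}


def evaluate(node, cache, evaluated):
    """Compute the value of node, recursing on its dependencies.

    cache stores finished results so that every node is calculated once;
    evaluated records the order in which nodes were actually calculated.
    """
    if node in cache:
        return cache[node]
    value = LOCAL_PROB[node]
    for dependency in DEPENDS_ON[node]:
        # Placeholder combination rule, not the rule used by CySeMoL.
        value *= evaluate(dependency, cache, evaluated)
    cache[node] = value
    evaluated.append(node)
    return value


cache = {}
evaluated = []
result = evaluate("compromise", cache, evaluated)
```

Although "access" is needed by three other steps, it appears only once in `evaluated`, which is the point of saving intermediate results.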
P2AMF on the other hand uses OCL to create a forward algorithm much like
Dijkstra’s shortest path algorithm with the additional constraint of logical AND-
nodes. Some nodes can only be traversed if two or more conditions are fulfilled.
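A forward traversal of this kind can be sketched as a variant of Dijkstra's algorithm in which an AND-node is only reached once all of its prerequisites have been settled. The sketch below is an assumption about the general shape of such an algorithm, not the P2AMF implementation. The function name, the effort values and the example graph are hypothetical.

```python
import heapq


def forward_times(kinds, costs, edges, source):
    """Earliest time at which each step can be reached from source.

    kinds: step -> "OR" or "AND"; costs: step -> non-negative effort;
    edges: list of (prerequisite, step) pairs without duplicates.
    An OR step needs any one prerequisite, an AND step needs all of them.
    """
    children = {step: [] for step in kinds}
    waiting = {step: 0 for step in kinds}  # unsettled prerequisites per step
    for parent, child in edges:
        children[parent].append(child)
        waiting[child] += 1

    best = {source: 0.0}
    settled = set()
    heap = [(0.0, source)]
    while heap:
        time, step = heapq.heappop(heap)
        if step in settled:
            continue
        settled.add(step)
        for child in children[step]:
            if child in settled:
                continue
            if kinds[child] == "AND":
                # Steps are settled in order of increasing time, so the last
                # prerequisite to be settled is also the slowest one.
                waiting[child] -= 1
                if waiting[child] == 0:
                    best[child] = time + costs[child]
                    heapq.heappush(heap, (best[child], child))
            else:
                candidate = time + costs[child]
                if candidate < best.get(child, float("inf")):
                    best[child] = candidate
                    heapq.heappush(heap, (candidate, child))
    return {step: best[step] for step in settled}


KINDS = {"start": "OR", "access": "OR", "credentials": "OR", "exploit": "OR",
         "login": "AND", "compromise": "OR", "insider": "OR", "vault": "AND"}
COSTS = {"start": 0, "access": 1, "credentials": 5, "exploit": 4,
         "login": 2, "compromise": 0, "insider": 1, "vault": 1}
EDGES = [("start", "access"), ("start", "credentials"), ("access", "exploit"),
         ("access", "login"), ("credentials", "login"),
         ("login", "compromise"), ("exploit", "compromise"),
         ("access", "vault"), ("insider", "vault")]

TIMES = forward_times(KINDS, COSTS, EDGES, "start")
```

In this example "login" has to wait for both "access" and "credentials", while "vault" is never reached because one of its prerequisites is unreachable from the attacker's starting point.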
CySeMoL is an instantiation of the P2AMF framework with actual relations
and probabilities defined. The possibility to "inject" evidence into the model, and
thus sidestep any calculations for a particular node, forces CySeMoL to perform
something slightly more complicated than a simple forward traversal of the graph.
CySeMoL uses Monte Carlo sampling with either the acceptance-rejection algorithm
or Metropolis-Hastings algorithm[1] to instantiate valid states of the nodes.
The acceptance-rejection algorithm generates a large number of uniformly
distributed samples over the whole sample space and then removes (rejects) those
that do not fall within the target distribution[12]. For example, if we would like
to sample points in the unit disc, we simply sample pairs (x, y) uniformly from the
square [-1, 1] × [-1, 1] and reject any for which x^2 + y^2 > 1. The resulting samples
are uniformly distributed over the unit disc. The Metropolis-Hastings algorithm is an
alternative that draws each new sample based on the previous one, which can
decrease the number of rejected samples.
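The example above, sampling points inside the unit circle (the unit disc) by rejection, can be written as a short Python sketch. The function name and the choice of the square [-1, 1] × [-1, 1] as the proposal region are assumptions made for illustration.

```python
import random


def sample_unit_disc(n_samples, rng):
    """Draw n_samples points uniformly from the unit disc by rejection.

    Candidates are drawn uniformly from the enclosing square; those with
    x^2 + y^2 > 1 are rejected. Returns the accepted points and the total
    number of candidates that had to be proposed.
    """
    accepted = []
    proposed = 0
    while len(accepted) < n_samples:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        proposed += 1
        if x * x + y * y > 1.0:
            continue  # outside the disc: reject
        accepted.append((x, y))
    return accepted, proposed


rng = random.Random(2015)  # fixed seed so that runs are repeatable
points, proposed = sample_unit_disc(20000, rng)
acceptance_rate = len(points) / proposed
```

The acceptance rate approaches the ratio of the two areas, π/4, so roughly one candidate in five is wasted here. In higher-dimensional or more constrained sample spaces the wasted fraction can be much larger, which is the motivation for alternatives such as Metropolis-Hastings.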
Finally, when nodes with evidence have been sampled and the graph traversed
with the P2AMF algorithm, CySeMoL calculates a result. The result is a graph
where the "local" conditional dependencies have been transformed into probabilities
denoting the risk of an attacker succeeding with that particular attack step through
some path through the network. It is also possible to display back inferences to
show which previous steps influenced the results of a particular attack step. By
applying back inference from the attack step of interest back to the attacker, it
is possible to trace the most likely attack paths in the network. This can help in
identifying problematic parts of the network which are extra susceptible to attack.
Such parts require extra attention and resources in the case of an existing system
or a redesign in the case of a design analysis.
Chapter 3
Method
The study has been done by performing three case studies of documented attacks on
IT systems. Each case study can be divided into three phases: research, modelling
and analysis. In the research phase, information about the attack has been gathered
and studied. Due to the sensitive nature of these attacks it has proven difficult to
find technical details of them. In several cases some details have been replaced by
qualified predictions of what probably occurred. As such, the studied models may
not accurately represent the actual attack that occurred and should be viewed as a
model of an attack that could have occurred in a system similar to the one studied.
In the modelling phase two kinds of models are constructed. First a "free-hand"
model is created where the attack is represented with a graph describing both the
attacked system and the performed attack. In practice this model was created with
pen and paper to minimize any constraints that could exist in other kinds of tools.
From this model, the documented attack path is expressed in terms of CySeMoL
attack steps to make the comparison with the results from CySeMoL easier. After
the free-hand model is created, it is translated into a CySeMoL model. In this step,
all hardware and software are implemented in the CySeMoL model. All components
that have been implicitly assumed to exist, such as dataflows and users, have to be
explicitly defined.
When both models are created it is possible to do the actual analysis by studying
the CySeMoL model and comparing it to the "free-hand" model. Here, questions
such as: "Does the CySeMoL model accurately describe the attack?", "Do the most
likely attack paths in CySeMoL correspond with what happened?" and "Could more
details be added to the model?" are studied. This is a qualitative analysis for which
no metric is defined. One challenge with the analysis is to assess the risks that
CySeMoL provides. It is known for a fact that the attack happened. However, it
is difficult to relate this posterior knowledge to the probability of compromise that
CySeMoL calculates.
In cases where possible additions or modifications to the CySeMoL model are
found, these additions are analysed from three perspectives. The first, and most
important, is whether anything is actually gained from these changes to the
CySeMoL model. Adding more levels of depth and detail to the Bayesian network is
only productive if something is known about the conditional probabilities between
the new nodes. On the other hand, this can also expose the fact that some parts
of the model hide internal dependencies for which the relations are unknown and
which might be of interest for further research. The other two factors are how much
these additions affect the computational complexity of the model and how much they
affect the "modelling" complexity, i.e. the extra burden put on the user to provide
knowledge on the existence of the added parts.
The results from each case study are summarized in their respective section.
Furthermore, general observations and broader ideas are summarized at the end of
the case studies section.
Chapter 4
Case studies
Three different attacks from the last few years have been studied. The attacks were
chosen based both on their relevance and on the availability of information about the
attacks as this is typically information that is difficult to acquire. First the Stuxnet
attack, which struck and disrupted operations in Iran’s nuclear facilities[13][14], has
been studied. Furthermore, the attack on Diginotar[15], which ultimately led to the
bankruptcy of the company, was studied. Finally, an attack on Logica, a Swedish
server provider which was hacked and from which sensitive information was
stolen[16][17], was studied.
4.1 Stuxnet
4.1.1 Background
Stuxnet is a computer worm that was discovered in 2010. At the time it was
considered one of the most sophisticated pieces of malware ever created. Samples of
the worm have been thoroughly analysed by researchers[13]. Stuxnet's goal was to
infect programmable logic controllers (PLCs) in industrial systems. Specifically, it is
believed that the targets were Siemens SCADA systems in the nuclear facilities of Iran.
While it is not known exactly what occurred in that specific facility and how the
worm propagated, there are several models of the attack based on reference systems
and best practice specifications[14]. Based on the findings of the study by Byres
et al., a network which could be similar to the facility and is representative of
SCADA networks in general has been modelled.
This reference system can be seen in Figure 4.1. It shows four major parts of the
network divided into five network zones. The bottom of the image shows the core
of the network with the Process Control Network and the Control System Network.
The latter is the zone containing the actual PLCs. At the top of the image is
the Enterprise Control Network, a typical office network from which day-to-day
operations are performed. This may be physically separated from the more interior
zones and connected over WAN. The Perimeter Network allows some data to travel
between the internal network and the Enterprise Control Network.
Figure 4.1. The Siemens best practice reference network[18]
When modelling an attack such as Stuxnet there is one major difference from
a traditional attack where an attacker manually goes through and tries different
attack steps. Stuxnet spreads and is replicated between computers which results
in new instances of Stuxnet that operate independently from the "parent" instance.
Consequently, compared to a human attacker, the capabilities of
Stuxnet grow exponentially as it spreads through the network.
In the aforementioned study by Byres et al. it is proposed that Stuxnet reached
the PLC by the attack path described below and shown in Figure 4.2. Following
an initial handoff via a physical drive, the malware spread through the Enterprise
Control Network via SMB shares until it found a computer with the right capabilities,
namely VPN access to the Perimeter Network. From there it piggybacked on the
connection to the Central Archive Server, CAS, and exploited it to gain a foothold on
the Perimeter Network. Basically the same procedure was repeated to gain access to
the Process Control Network, where it eventually infected PCS7 project files which
were uploaded to the PLCs, which were thus compromised. This can be split into a
few distinct attack parts described below:
Figure 4.2. The Stuxnet attack as described by Byres et. al.[14]
1. An infected USB drive is given to an off-site contractor, for example by
planting it in their office or handing it out at a conference.
2. The infected drive is inserted into a workstation in the Enterprise Control
Network allowing Stuxnet to infect it.
3. Stuxnet spreads to other computers on the network until it finds one belonging
to a privileged user.
4. Stuxnet piggybacks on the SQL database connection established by the
privileged user to the server on the Perimeter Network.
5. Stuxnet spreads within the Perimeter Network and infects several servers.
6. Stuxnet again piggybacks on the connection to the historian server on the
Process Control Network.
7. There it infects PCS7 project files which are ultimately downloaded on an
engineering workstation.
8. Stuxnet installs itself on the PLC and performs two tasks: it causes harmful
operation of the machinery and tricks the monitoring systems into reporting that
everything is running as normal.
4.1.2 Modelling
With the description of the attack, it is possible to create a sequence of CySeMoL
attack steps which later can be compared to the actual output of CySeMoL. In those
terms, the attack can be described as listed below. In Figure 4.3 it can be seen how
the eight parts of the attack have been roughly mapped to the CySeMoL objects they
involve.
1. SocialZone.sharePortableMedia
2. OperatingSystem.accessThroughPortableMedia, OperatingSystem.deployExploit,
OperatingSystem.compromise
3. NetworkZone.access, OperatingSystem.deployExploit, OperatingSystem.compromise
4. ApplicationClient.compromise, DataFlow.produceRequest, ApplicationServer.access,
ApplicationServer.deployExploit, ApplicationServer.compromise
5. NetworkZone.access, OperatingSystem.deployExploit, OperatingSystem.compromise
6. ApplicationClient.compromise, DataFlow.produceRequest, ApplicationServer.access,
ApplicationServer.deployExploit, ApplicationServer.compromise
7. DataFlow.produceResponse, ApplicationClient.deployExploit, ApplicationClient.compromise,
OperatingSystem.compromise
8. ApplicationClient.compromise, DataFlow.produceRequest, ApplicationServer.access,
ApplicationServer.deployExploit, ApplicationServer.compromise
It should be noted that this is not a one to one mapping and that several
translations are possible. It is possible to describe the attack with more or less
detail but this translation was chosen as a reasonably detailed description of the
attack in CySeMoL terms. Finally, the last part about what Stuxnet did with
the PLC once it was compromised, i.e. causing harmful behaviour and disabling
monitoring, is not reflected in the CySeMoL model at all.
Figure 4.4. The network zones, interfaces and firewalls of the Stuxnet model
Based on the descriptions of the network topology and data flows, a CySeMoL
model was created. Even though the network is quite small and the details have
been kept to a minimum the resulting model consists of around 80 nodes. Images of
the full model can be found in Appendix A. Overall, the network has been assumed
to employ good security measures with strict firewall rules and regularly updated
software.
A part of the model is shown in Figure 4.4. This sub view of the CySeMoL
model shows the overall network topology of the system, excluding any computers.
4.1.3 Analysis
Finally, from the model an attack path to one of the PLCs was calculated.
CySeMoL is quite detailed and as a result the attack path from the attacker to the
PLC contains many steps. Furthermore, CySeMoL does not generate a single
attack path but the whole attack graph, so several choices of attack path are
possible. The one which most closely matches the hypothesized path has been picked.
The full attack path can be found below. Steps marked in bold correspond to the
steps in the proposed path. It should be noted that almost all of those steps are
included in the path, so CySeMoL agrees that this was a possible and probable
attack path.
1. Attacker.start, Contractor Office.sharePortableMedia
2. ECN Workstation 2.accessThroughPortableMedia,
ECN Workstation 2.executeArbitraryCode, ECN Workstation 2.compromise
3. Enterprise Control Network.access, ECN Workstation.findUnknownService,
ECN Workstation.findExploit, ECN Workstation.deployExploit,
ECN Workstation.executeArbitraryCode, ECN Workstation.compromise
4. Historian Web Client.compromise, CAS ECN-PN.produceRequest,
CAS Server.access, CAS Server.findExploit, CAS Server.deployExploit,
Historian Server OS.executeArbitraryCode, Historian Server OS.compromise
5. Skipped in the model
6. OS Web Client - PN.compromise, PCS7 PN-PCN.produceRequest,
OS Web Server - PCN.access, OS Web Server - PCN.compromise
7. PCS7 PCN Server-Engineer.produceResponse,
OS Web Client Engineer.findExploit, OS Web Client Engineer.deployExploit,
Engineering Workstation.executeArbitraryCode, Engineering Workstation.compromise
8. Siemens PLC Studio.compromise, Siemens PLC Transfer.produceRequest,
Siemens PLCStudio Server.access, Siemens PLCStudio Server.findExploit,
Siemens PLCStudio Server.deployExploit, S7-400H.executeArbitraryCode,
S7-400H.compromise
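The comparison between the proposed path and the calculated path can be sketched in Python for the first three parts of the attack. The step lists are copied from the two lists in this section. Matching entries only by attack step name, ignoring the class or object name in front of the last dot, is a simplifying assumption, and the helper names are hypothetical.

```python
# Parts 1-3 of the proposed path (CySeMoL class and attack step).
PROPOSED = {
    1: ["SocialZone.sharePortableMedia"],
    2: ["OperatingSystem.accessThroughPortableMedia",
        "OperatingSystem.deployExploit",
        "OperatingSystem.compromise"],
    3: ["NetworkZone.access",
        "OperatingSystem.deployExploit",
        "OperatingSystem.compromise"],
}

# Parts 1-3 of the calculated path (model object and attack step).
CALCULATED = {
    1: ["Attacker.start", "Contractor Office.sharePortableMedia"],
    2: ["ECN Workstation 2.accessThroughPortableMedia",
        "ECN Workstation 2.executeArbitraryCode",
        "ECN Workstation 2.compromise"],
    3: ["Enterprise Control Network.access",
        "ECN Workstation.findUnknownService",
        "ECN Workstation.findExploit",
        "ECN Workstation.deployExploit",
        "ECN Workstation.executeArbitraryCode",
        "ECN Workstation.compromise"],
}


def attack_step(entry):
    """Return the attack step name, i.e. the text after the last dot."""
    return entry.rsplit(".", 1)[1]


def missing_steps(proposed, calculated):
    """For each part, list proposed step names absent from the calculated path."""
    result = {}
    for part, entries in proposed.items():
        found = {attack_step(entry) for entry in calculated.get(part, [])}
        result[part] = [attack_step(entry) for entry in entries
                        if attack_step(entry) not in found]
    return result


MISSING = missing_steps(PROPOSED, CALCULATED)
```

Under this matching rule only the `deployExploit` step of part 2 has no counterpart in the calculated path, which is in line with the observation that almost all proposed steps are included.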
The attack path is also depicted in the CySeMoL graph shown in Figures 4.5,
4.6 and 4.7. The red arrows indicate all properties that influence the value of a node
in the attack path while the overlaid blue path shows the attack path. The images
are cluttered and can be hard to decipher, especially in the operating system nodes.
This is made easier by cross-referencing the attack step list above.
Note that the model did not include the spreading within the perimeter network
since it contained fewer servers than the described system. However, this would sim-
ply be a repetition of step 3 in the attack. Also, the spreading between workstations
in step 3 is modelled as finding and exploiting an unknown service. This could also
have been modelled explicitly by adding the Windows resource sharing protocol,
SMB, but since it is possible that this kind of connection between workstations
was unintended, an unknown service captures this aspect. An explicit model of the
SMB protocol could have looked something like the example in Figure 4.8. Note
that CySeMoL does not have the concept of peer-to-peer applications like SMB
but must instead model the client and server parts separately.
Figure 4.5. The Stuxnet attack calculated by CySeMoL, pt.1
From these results it can be seen that CySeMoL agrees that it was possible that
the attack occurred as described by Byres et al., given that the network looked like
this. However, CySeMoL attributes some positive probability to every connection in
the model and it is thus hard to draw any conclusions about the actual probabilities
in this case. There are two aspects of the attack that cannot be properly modelled
by CySeMoL. First of all, CySeMoL does not have a concept of privileges. In the
real attack, Stuxnet spread between hosts in the network through SMB shares. This
in itself only required regular user privileges and not root access to the machine.
In the CySeMoL model, however, it is modelled as a full compromise of the host. The
second aspect is domain specific attacks like destroying the PLC. Currently the attack
only goes as far as considering the PLC compromised and not what that results in.
Currently, one way to represent access levels in CySeMoL is to have multiple
copies of the same application and connect different AccessControlPoints to them.
This way, one physical application is represented by several virtual applications,
each representing the environment the user sees. An example of this is shown in
Figure 4.9. This can be extended to the operating system level by creating two
copies of the same computer with slight variations depending on what capabilities
the user has. There are at least two problems with this approach. First of all, it
duplicates a lot of work and makes the model larger. Secondly, it might not properly
reflect the real conditional probabilities between the objects involved. For example,
in the case of two user environments CySeMoL would treat this as two separate
computers connected to the same network. This is a problem since the probability
of compromising an admin account given that you have compromised a regular user
is not the same as the probability of compromising a computer given that you have
compromised another computer in the network.
Figure 4.6. The Stuxnet attack calculated by CySeMoL, pt.2
Figure 4.7. The Stuxnet attack calculated by CySeMoL, pt.3
A more intuitive way to represent access levels without cluttering the model
with too much detail could be to introduce two access levels, regular users and
administrators, as is common in computer systems. Many systems have more
fine-grained access controls, but it has to be investigated whether such detail
contributes anything to the model. If done this way, it would be enough to have only
one instance of each software and computer, but the connection between the
"PasswordAccount" and "AccessControlPoint" could be chosen to be either "User" or
"Admin" instead of the current "Credentials". This would only introduce a slight
additional modelling burden but could potentially improve results. An example of what
this could look like is shown in Figure 4.10.
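The proposal above can be sketched as a small Python data model. Only the names PasswordAccount, AccessControlPoint, "User" and "Admin" come from the text. The `Grant` class, the `level_gained` helper and the example objects are hypothetical and do not correspond to anything in the CySeMoL tool.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    USER = "User"
    ADMIN = "Admin"


@dataclass(frozen=True)
class PasswordAccount:
    name: str


@dataclass(frozen=True)
class AccessControlPoint:
    protects: str  # the single instance of the application or computer


@dataclass(frozen=True)
class Grant:
    """Typed connection replacing the untyped "Credentials" connection."""
    account: PasswordAccount
    point: AccessControlPoint
    level: AccessLevel


def level_gained(grants, compromised_account, target):
    """Highest access level the compromised account gives on target, or None."""
    levels = [grant.level for grant in grants
              if grant.account == compromised_account
              and grant.point.protects == target]
    if not levels:
        return None
    if AccessLevel.ADMIN in levels:
        return AccessLevel.ADMIN
    return AccessLevel.USER


# One workstation instance, two accounts with different levels.
workstation_acp = AccessControlPoint(protects="ECN Workstation")
operator = PasswordAccount("operator")
administrator = PasswordAccount("administrator")
GRANTS = [
    Grant(operator, workstation_acp, AccessLevel.USER),
    Grant(administrator, workstation_acp, AccessLevel.ADMIN),
]
```

With this representation the workstation appears once in the model, and the level an attacker obtains depends on which account was compromised rather than on which copy of the computer was reached.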
Figure 4.8. How SMB could be modelled in the Stuxnet network
Figure 4.9. How ACL could be modelled in CySeMoL
Figure 4.10. How ACL could be modelled in CySeMoL
4.2 Diginotar
4.2.1 Background
Diginotar was a Dutch certificate authority. In the summer of 2011 they fell victim
to an attack. This led to the compromise of the keys of several certificate
authorities (CAs). With these keys, the attackers were able to forge certificates for
a number of host names including "*.google.com" and "*.*.com", i.e. all sites with a
.com top-level domain. An investigation[19][15] by the Dutch security company Fox-IT
showed that it could not be ruled out that all of Diginotar’s CA certificates had
been compromised. This eventually led to the Dutch government taking over
operations of Diginotar’s systems, and the company was declared bankrupt. This is
a prime example of what the consequences of an attack can be.
4.2.2 Modelling
The report on Diginotar[15] contains a lot of detail on how the attack happened.
An overview of the network zones of Diginotar and some of the central systems in
them can be seen in Figure 4.11. Unfortunately, due to limitations in the
investigations, it was not possible to perform a forensic analysis of one of the
computers involved in the attack. Therefore it is unknown how that computer was
compromised and thus one step of the attack is missing. The attack was modelled up to
the known part. Furthermore, part of how the rest of the attack was performed was
also modelled. It is difficult to say exactly in which order each step of the attack
occurred, as there was a lot of lateral movement in the attack and many systems in
the same network were compromised. The investigations did however reveal a likely
attack path. The attack can roughly be described as follows:
1. The web servers Main-web and Docproof2 were compromised through an outdated
version of DotNetNuke, a content management system, with known
vulnerabilities.
2. The attacker used a connection from the Main-web to the database server
BAPI-db, which was allowed through the firewall, to compromise it.
3. The attacker escalated privileges to compromise the whole database server.
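As an illustration only, the known steps above can be written down as a small directed graph and searched. This is a plain Python sketch, not how CySeMoL computes attack paths; the start node "attacker" and the label "BAPI-db (full control)" are hypothetical names introduced for the example, while the system names come from the list.

```python
from collections import deque

# Known steps of the attack as directed edges (source -> targets).
KNOWN_STEPS = {
    "attacker": ["Main-web", "Docproof2"],   # step 1: outdated DotNetNuke
    "Main-web": ["BAPI-db"],                 # step 2: connection allowed by firewall
    "BAPI-db": ["BAPI-db (full control)"],   # step 3: privilege escalation
}


def reachable(graph, start):
    """Return every node reachable from start, using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


compromised = reachable(KNOWN_STEPS, "attacker")
```

Any system that does not appear in the resulting set is not explained by the documented steps alone and needs an additional edge, i.e. an additional hypothesis.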
What happened next is unclear, but somehow the attacker managed to compromise
the BAPI-production server in Secure-net and connected back to Main-web in
DMZ-ext-net to use it as a stepping stone for further attacks in Secure-net. In
particular, the report offers no explanation for how a connection could have been
initiated from outside Secure-net into it. Three different hypotheses are proposed:
1. BAPI-production is connected to BAPI-db. Judging from the name, this
sounds like a likely scenario. BAPI is a system for the Dutch tax agency
and could have needed communications with the office network.
Figure 4.11. An overview of the central network zones and important systems of
Diginotar.[15]
2. BAPI-production is reachable from the DMZ-ext-net network in violation of
the descriptions of the firewall policies.
3. BAPI-production was compromised through physically transmitted malware,
by infecting a storage drive in Office-net which was later brought into
Secure-net.
The important thing is that all three of these scenarios are easily represented
by CySeMoL as seen in the previous example, the Stuxnet model. Overall, it is
probable that if more details were known about the attack, it would be possible to
fully model the attack.
The network described in the report is considerably larger than in the Stuxnet
case. However, most details were not thoroughly investigated and were thus left out
of the report. Consequently, most of the network has been left out of the model.
Furthermore, only the connections which are explicitly part of the suspected main
attack are modelled. Even with a lot left out and some network zones just labelled
"other nets", this resulted in a CySeMoL model of about 70 nodes. Images of the
full model can be found in Appendix B.
4.2.3 Analysis
There are at least two important points which can be gathered from the model.
The first is that the first half of the attack is satisfactorily described by the model.
The model shows that by compromising the web server of Main-web it is possible
to compromise the whole system and use it as a stepping stone for the attack. Also,
because an SQL connection is allowed from DMZ-ext-net into Office-net, it is
possible to compromise the database server and thus the whole network.
The other, and maybe more interesting, point is that CySeMoL claims that
there is a risk that the HTTP connection from within Secure-net out to
DMZ-ext-net can cause a compromise inside Secure-net. This is a lot like the latter
part of the Stuxnet attack, where the engineering workstation is infected by a
compromised PCS7 server. In that case the model describes the scenario correctly.
In this case, however, we want to illustrate the fact that the firewall technically
allows this HTTP connection but that no such connection is normally made. This is a
potential problem. There are two real-world scenarios that both most naturally
translate to the same CySeMoL model, i.e. that there is a compromised server with a
possible dataflow to a client. In one scenario the emphasis is on the fact that this
dataflow actually exists and can be used to compromise the client. In the other
scenario, however, the emphasis is on the fact that the dataflow is allowed and
could be used to compromise the server if it wasn’t already.
How to discern these two situations is not straightforward, and for the time being
the proposal is that dataflows should only be used to model actual dataflows that
regularly occur. One idea is to add some way to model potential dataflows,
analogously to how the attack step "Discover hidden service" exists on operating
systems to represent the fact that in real-world systems full information about our
systems may not be available.
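One way to picture the distinction between actual and potential dataflows is to tag each dataflow with a kind and let the analysis decide which kinds to include. The sketch below is plain Python and not part of CySeMoL; FlowKind, Dataflow and flows_for_analysis are hypothetical names, and classifying the SQL connection as regularly used is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class FlowKind(Enum):
    ACTUAL = "actual"        # dataflow that regularly occurs
    POTENTIAL = "potential"  # allowed by the firewall but not normally used


@dataclass(frozen=True)
class Dataflow:
    protocol: str
    source_zone: str
    target_zone: str
    kind: FlowKind


def flows_for_analysis(flows, include_potential=False):
    """Select dataflows; potential ones are only included on request."""
    return [f for f in flows
            if f.kind is FlowKind.ACTUAL or include_potential]


diginotar_flows = [
    # Treating the SQL connection as regularly used is an assumption.
    Dataflow("SQL", "DMZ-ext-net", "Office-net", FlowKind.ACTUAL),
    # Allowed by the firewall but, according to the text, not normally made.
    Dataflow("HTTP", "Secure-net", "DMZ-ext-net", FlowKind.POTENTIAL),
]
```

Running the analysis once with and once without potential dataflows would show how much of the calculated risk depends on connections that are merely allowed.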
Figure 4.12. How virtual environments currently can be modelled with CySeMoL
4.3 Logica
4.3.1 Background
In 2012, an attack on the Swedish company Logica (now CGI Group) was discovered.
The attack was suspected to have been going on for as long as two years. Logica is
a service provider for several customers including the Swedish Tax Agency, which
is presumed to have been the main target of the attack. Eventually two people
were arrested and convicted for the attack. As a part of the trial, investigation
reports from Logica and other affected parties were used as evidence[17][20]. Based
on this material it is possible to understand, at least partially, how the attack was
performed. Unfortunately, many of the details are redacted from the material[17].
4.3.2 Modelling
Even though the details are limited, it is known that the attack involved a large
IBM mainframe computer which is a central part of Logica’s IT system. This poses a
problem for CySeMoL since the mainframe is divided into multiple logical partitions
which serve as a virtualization environment for the system. Since the mainframe
plays a central role in the Logica IT system and the attack itself was centred
around it, it is of little interest to try to model the rest of the system.
Instead, this section considers how virtualization could be modelled.
Currently CySeMoL has no way of representing virtualization; doing so would require
an expansion of the model. Conceptually, it is possible to think of a virtual
environment as multiple systems connected to a common network zone in which one
system represents the virtualization host and the others represent the guest
systems. All these systems can run different operating systems and software
independently of each other. For example, a simple environment with one host system
and two guest systems could look as depicted in Figure 4.12.
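The workaround described above can be sketched as follows. This is a plain Python illustration, not CySeMoL code; the node type names NetworkZone and OperatingSystem appear in the text, while the zone name and the system names host, guest-1 and guest-2 are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OperatingSystem:
    name: str


@dataclass
class NetworkZone:
    name: str
    members: list = field(default_factory=list)

    def connect(self, os_node: OperatingSystem) -> None:
        self.members.append(os_node)


# One host and two guests, all attached to the same zone in the same way.
virtual_zone = NetworkZone("virtual-environment-as-zone")
for system_name in ("host", "guest-1", "guest-2"):
    virtual_zone.connect(OperatingSystem(system_name))

# Every member has the same node type and the same kind of connection.
member_types = {type(member).__name__ for member in virtual_zone.members}
```

Note that in this representation the only thing separating the host from the guests is the name the modeller chose for it.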
Figure 4.13. How real virtual environments could be implemented in CySeMoL
4.3.3 Analysis
There are at least two major problems with the model above. The first is that the
model should be able to discern the host system from the guest systems, as
compromising them has very different implications. A compromise of one virtual host
(guest system) does not lead to the same capabilities as compromising the host
system, in which case all guest systems are automatically compromised. Secondly, the
probabilities involved in these relations are not the same as for a physical system.
It is unreasonable to assume that the conditional probability to compromise a
physical system given that you have compromised another system in the network is the
same as the conditional probability to compromise a virtual host given that you have
compromised another virtual host on the same host system. It is however possible to
model the two scenarios in conceptually similar ways.
In addition to the regular "NetworkZone" node, there would be a "VirtualEnvironment"
node with two types of connection to "OperatingSystem" nodes, one representing host
OS connections and one representing guest OS connections, instead of just the single
"Zone" connection. An example of this model is shown in Figure 4.13. The conditional
probabilities involved would have to be explored further to be able to implement
these additions to the model.
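A rough sketch of the proposed node, assuming nothing about how CySeMoL would implement it: the name VirtualEnvironment is taken from the text, while the methods connect_host, connect_guest and implied_compromise are hypothetical. Only the deterministic part of the reasoning is captured, namely that a compromised host implies compromised guests; the conditional probabilities between guests are deliberately left out since they are unknown.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualEnvironment:
    """Node with two distinct connection types to operating systems."""
    name: str
    host: str = ""
    guests: list = field(default_factory=list)

    def connect_host(self, os_name: str) -> None:
        self.host = os_name

    def connect_guest(self, os_name: str) -> None:
        self.guests.append(os_name)

    def implied_compromise(self, compromised_os: str) -> set:
        """Systems that are certainly compromised once compromised_os is."""
        if compromised_os == self.host:
            # Compromising the host automatically compromises all guests.
            return {self.host, *self.guests}
        if compromised_os in self.guests:
            # A compromised guest does not by itself imply anything more.
            return {compromised_os}
        return set()


env = VirtualEnvironment("mainframe")
env.connect_host("host")
env.connect_guest("guest-1")
env.connect_guest("guest-2")
```

Keeping the host and guest connections separate is what makes the asymmetry expressible; with a single "Zone" connection both calls would have to return the same kind of answer.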
4.4 Summary of analysis
Overall, it can be seen that CySeMoL is capable of modelling the first attack
satisfactorily but fails to provide a fully satisfactory representation of the
second and third attacks. The third attack, which involves virtualization, is
especially problematic for CySeMoL. Access levels are not represented in the model,
which has the effect that a reasonable attack path is still calculated but with less
accuracy regarding the extent to which the system is actually compromised. The model
also does not represent domain-specific attack steps like causing malfunction in
SCADA systems. There is also the problem of to what extent a client-to-server
dataflow should be considered to exist: is it used regularly or is it simply
technically possible? Overall, office systems with regular computers, network
appliances and software are easy to model, whilst more advanced features like
virtualization are impossible to model and "soft" parts like access control and
people are both difficult and cumbersome.
Currently access levels can be implemented by creating separate systems for
separate accounts as shown in Figure 4.9. Instead it could be fruitful to model it
as every system having two access levels: "user" and "admin" to reduce duplication
and simplify modelling as shown in Figure 4.10.
The uncertainty in the dataflows could be modelled by adding something similar
to the "discover hidden service" attack step, which is an attack step present in the
"Operating System" CySeMoL node.
Virtualization can currently be modelled by representing virtual systems as phys-
ical systems as shown in Figure 4.12. While this might work on a conceptual level
it will introduce errors in the calculations. A better way to implement it would be
analogous to the physical systems but with its own nodes and connections as shown
in Figure 4.13.
Domain-specific attacks and parts could be made easier to implement by allowing
for easier creation of custom objects, attack steps and connections. Currently, great
effort has to be expended deep down in the CySeMoL tools to create additions to
the CySeMoL model. This could be vastly simplified by creating helpful tools and
descriptions so that domain-specific attacks can easily be added.
Chapter 5
Conclusion
As seen from the modelling, CySeMoL handles many aspects of threat modelling
but there is still room for improvement. One of the challenges is to strike a balance
in level of detail. An over-detailed model will be cumbersome and hard to work
with, but an overly simplistic model will give meaningless results.
Due to the varying levels of detail in the descriptions of the attacks it was not
meaningful to follow the intended method fully. Nonetheless, several insights were
gathered throughout the project.
Privileges and access control are central concepts in any system and are therefore
something that should be possible to model explicitly. A challenge is that access
control and identity management are already very difficult problems. Creating an
adequate model of them must be done with great care to strike the above-mentioned
balance.
Domain specific attacks are currently not present in the model. To keep the
model simple it might be better to not add specific concrete concepts to the model
for covering this. Instead the model should be flexible and proper tools created to
allow for customization to ease modelling within fields that require specific concepts.
Overall, there are a number of usability issues with the CySeMoL tool. These
are outside the scope of this study but have been recorded in Appendix C with
suggestions on how to improve modelling and visualisation.
There is also the issue of how to interpret the results in CySeMoL. The idea
with the model is to be able to calculate the probability that an attacker will
succeed with a certain attack step within a specified timeframe. This does not
currently work as intended and therefore the numbers should only be looked at in
relative terms. Even when this is implemented, the meaning of the results will
vary between use cases. In this project, only the relative sizes of the probabilities
and whether any significant probability exists at all have been looked at.
As CySeMoL evolves it should be evaluated further. Revisiting the Logica attack
with a future version of CySeMoL could be a good way to verify that the way
virtualization is handled makes sense and is usable. There are also some other
recent famous attacks to look at. Threat modelling tools like CySeMoL could prove
to be valuable tools for system administrators and decision makers in the future.
This study has shown that CySeMoL manages to represent a large portion of systems,
with the possibility to manage even more in the future.
Bibliography
[1] Pontus Johnson et al. “An Architecture Modeling Framework for Probabilistic
Prediction”. In: (2014).
[2] Teodor Sommestad, Mathias Ekstedt, and Hannes Holm. “The Cyber Secu-
rity Modeling Language: A Tool for Assessing the Vulnerability of Enterprise
System Architectures”. In: (2014).
[3] Introduction to Microsoft Security Development Lifecycle (SDL) Threat Mod-
eling.
[4] Adam Shostack. “Experiences threat modeling at microsoft”. In:
[5] Haralambos Mouratidis and Paolo Giorgini. Secure Tropos: A Security-oriented
Extension Of The Tropos Methodology. 2006.
[6] Haralambos Mouratidis and Paolo Giorgini. “Secure Tropos: A Security-oriented
Extension Of The Tropos Methodology”. In: International Journal of Software
Engineering and Knowledge Engineering 17.02 (2007), pp. 285–309.
doi: 10.1142/S0218194007003240.
eprint: http://www.worldscientific.com/doi/pdf/10.1142/S0218194007003240.
url: http://www.worldscientific.com/doi/abs/10.1142/S0218194007003240.
[7] Fredrik Vraalsen et al. “Specifying Legal Risk Scenarios Using the CORAS
Threat Modelling Language”. English. In: Trust Management. Ed. by Peter
Herrmann, Valérie Issarny, and Simon Shiu. Vol. 3477. Lecture Notes in
Computer Science. Springer Berlin Heidelberg, 2005, pp. 45–60.
isbn: 978-3-540-26042-4. doi: 10.1007/11429760_4.
url: http://dx.doi.org/10.1007/11429760_4.
[8] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. 2012.
[9] Teodor Sommestad, Mathias Ekstedt, and Pontus Johnson. “A Probabilistic
Relational Model for Security Risk Analysis”. In: (2010).
[10] Hannes Holm, Matus Korman, and Mathias Ekstedt. “A Bayesian network
model for likelihood estimations of acquirement of critical software vulnera-
bilities and exploits”. In: (2014).
[11] Roger M. Cooke. “Validating Expert Judgment with the Classical Model”. In:
(2013).
[12] Michael I. Jordan. “Stat260: Bayesian Modeling and Inference”. In: (2010).
[13] Aleksandr Matrosov et al. “Stuxnet Under the Microscope”. In: (2011).
[14] Eric Byres and Andrew Ginter. “How Stuxnet Spreads - A Study of Infection
Paths in Best Practice Systems”. In: (2011).
[15] Fox-IT. “Black Tulip - DigiNotar Certificate Authority breach - "Operation
Black Tulip"”. In: (2011).
[16] Polismyndigheten. “Förundersökningsprotokoll - Logicafallet”. In: (2012).
[17] Polismyndigheten and Logica. “Bilaga A - Logicas utredningsrapport”. In:
(2012).
[18] Siemens. “Process Control System PCS 7 Security concept PCS 7 & WinCC
(Basic)”. In: (2012).
[19] Fox-IT. “Interim Report - DigiNotar Certificate Authority breach - "Operation
Black Tulip"”. In: (2011).
[20] Polismyndigheten. “Bilaga B - Övriga externa rapporter”. In: (2012).
Appendix A
The Stuxnet model
The Stuxnet model was created as a CySeMoL model with the Enterprise Architecture
Analysis Tool (EAAT). The tool allows the model to be broken down into different
views to make it more manageable. Below are all the views from the Stuxnet model.
They are included for completeness and to give a better understanding of what a
CySeMoL model looks like.
Figure A.1. View showing people and accounts in the Stuxnet model
Figure A.2. View showing half of the dataflows in the Stuxnet model
Figure A.3. View showing other half of the dataflows in the Stuxnet model
Figure A.4. View showing the Control Systems Network in the Stuxnet model
Figure A.5. View showing the Enterprise Control Network in the Stuxnet model
Figure A.6. View showing the Manufacturing Operations Network in the Stuxnet
model
Figure A.7. View showing the network topology in the Stuxnet model
Figure A.8. View showing the Perimeter Network in the Stuxnet model
Figure A.9. View showing the protocols in the Stuxnet model
Figure A.10. View showing the Process Control Network in the Stuxnet model
Figure A.11. View showing the software in the Stuxnet model
Appendix B
The Diginotar model
Figure B.2. View showing the DMZ-ext network in the Diginotar model
Figure B.3. View showing DMZ-int network in the Diginotar model
Figure B.4. View showing the firewalls in the Diginotar model
Figure B.5. View showing the description of the internet in the Diginotar model
Figure B.6. View showing the network overview in the Diginotar model
Figure B.7. View showing the office network in the Diginotar model
Figure B.8. View showing the protocols in the Diginotar model
Figure B.9. View showing the secure net in the Diginotar model
Figure B.10. View showing the software in the Diginotar model
Appendix C
Other findings
This section contains other findings discovered throughout the project. They mostly
concern the modelling tool itself and not the actual CySeMoL model. Many of the
issues here are outside the scope of the work performed but still directly or indirectly
related to some of the issues discovered and are thus included for completeness.
C.1 Visualization
The visualization options in CySeMoL are currently very primitive. It is difficult
to get a good overview of the model and understand what the important aspects
are. Below are some suggestions for how it could be improved.
• Give different amounts of space to different objects. Currently, every node is a
rectangle occupying roughly the same amount of space. An operating system
or network zone could be more important than a single piece of software.
• Create visual cues for how objects fit together. Currently, if you don’t know what
types of node a specific node can connect to, you have to consult the manual
or guess. A better way would be, for example, to add something like a jigsaw
puzzle edge to indicate connection types and, when connecting nodes, to
explicitly state "Can connect to: A, B or C". It is also possible to highlight
objects which the selected object can connect to.
• Enable expandable and collapsible nodes. Views are great but it’s better to
be able to instantly shift focus within the same view. For example, make it
possible to encapsulate OS+software into a box and collapse it. This way,
one can immediately zoom in on parts of the model, make changes and then
switch back to a more overview-like perspective.
• Don’t show attack steps on objects by default. They are irrelevant in the
modelling phase. They can be turned off in the view settings but this is not the
default. The space could be better used to show other things like warnings
and/or hints.
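The "Can connect to" hint and the highlighting suggested in the list above could be driven by a simple lookup table. The sketch below is plain Python and does not describe the existing tool; the ALLOWED_CONNECTIONS table is a hypothetical, incomplete stand-in for the real CySeMoL connection rules, and connect_hint and connectable are made-up helper names.

```python
# Hypothetical connection rules; NOT the actual CySeMoL metamodel.
ALLOWED_CONNECTIONS = {
    "OperatingSystem": ["NetworkZone", "Software", "AccessControlPoint"],
    "PasswordAccount": ["AccessControlPoint"],
}


def connect_hint(node_type):
    """Build the hint text shown when the user starts connecting a node."""
    targets = ALLOWED_CONNECTIONS.get(node_type, [])
    if not targets:
        return "Can connect to: nothing known"
    if len(targets) == 1:
        return f"Can connect to: {targets[0]}"
    return f"Can connect to: {', '.join(targets[:-1])} or {targets[-1]}"


def connectable(selected_type, nodes):
    """Return names of the nodes that should be highlighted.

    nodes is a list of (name, node_type) pairs currently in the view.
    """
    allowed = set(ALLOWED_CONNECTIONS.get(selected_type, []))
    return [name for name, node_type in nodes if node_type in allowed]
```

For example, selecting a PasswordAccount in a view containing one access control point and one network zone would highlight only the access control point.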
C.2 Modelling
There are also some problems with the actual modelling which make it difficult to
work with the tool. The tool should really be something which helps you in your work
and not something you have to struggle with to get things right. Below are some
suggestions for how to improve the modelling tools.
• Make it easier to create and duplicate compound objects like combinations
of OS and software. For example it should be possible to create a template
for a typical workstation in a system consisting of an operating system, some
software and connection to some kind of authentication mechanism. This
template can then be named and reused throughout the model. Ideally it
should be possible to create both shallow and deep copies of the template
which enables the user to choose if changes to the template propagate to the
copies or not.
• Create "wizards" or "generators" to help create compound objects. These
guides should remind the user that for example an OS typically has some
software connected to it and is usually connected to a network zone. It could
present the user with some pre-created templates to base their model on.
• Inform of missing mandatory components. The presence of some objects does
not make sense without being connected to certain other objects. The tool
should clearly inform the user of this and mark objects red and provide sug-
gestions to solve the problems. Currently nothing happens and it is possible
to make calculations but the results are probably not what one expects.
• Give hints of what types of objects are usually added in connection with
others. This is almost the same as the previous point but for non-compulsory
objects. It may mention that a certain kind of object typically is connected
to another object.
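The difference between shallow and deep copies of a template, mentioned in the first suggestion above, can be illustrated with Python's standard copy module. This is only an analogy for the behaviour the tool could offer; the template contents (the software names in particular) are hypothetical.

```python
import copy

# Hypothetical workstation template: an OS, some software and an
# authentication mechanism, as described in the first suggestion.
workstation_template = {
    "os": "OperatingSystem",
    "software": ["browser", "office suite"],
    "auth": "AccessControlPoint",
}

# Shallow copy: shares the nested software list with the template.
linked_copy = copy.copy(workstation_template)

# Deep copy: gets its own independent software list.
independent_copy = copy.deepcopy(workstation_template)

# A later change to the template's software list...
workstation_template["software"].append("mail client")

# ...is visible through the shallow copy but not through the deep copy.
propagated = "mail client" in linked_copy["software"]
isolated = "mail client" not in independent_copy["software"]
```

In this analogy only changes to shared nested objects propagate; a modelling tool would have to decide explicitly which parts of a template stay linked to their copies.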