This presentation was given at the International Conference on Automated Software Engineering (ASE 2013) in Palo Alto, November 2013.
We describe a technique for automatically extracting specifications from execution traces of an application. The particular specification that we extract are scenarios in the form of conditional existential Live-Sequence Charts (LSC), which are similar to UML Sequence Diagrams.
The technique is implemented in a tool and was evaluated on two real-life event logs.
2. Eindhoven University of Technology
Singapore
Management
University
Tel Aviv University
Dirk Fahland
David Lo
Shahar Maoz
Mining
Branching-Time Scenarios
5. Understanding Existing Applications
fix bugs, add
features, test,
document, …
specification
Usually, there is no specification
or worse, if it exists, it is
outdated…
understanding of objects and
object interplay
4
12. Understanding Object Interplay
FTP server: How does Login/Logout work?
In object-oriented applications,
the hard part is to understand
how the different objects relate
to and interact with each other.
Here are some classes of an
FTP server. How does
login/logout work?
11
13. Understanding Object Interplay
An object of class A invokes
method onConnect() on an Scenario: Login/Logout
object of class B.
A
UserCmd
B
C
C
onConnect()
UserCmd
A
B
12
15. Understanding Object Interplay
Scenario: Login/Logout
A
UserCmd
B
C
C
onConnect()
onLogin()
setUser()
UserCmd
setLogout()
A
• non-local behavior:
multiple objects
• logically related
B
14
16. When does it happen?
Scenario: Login/Logout
This scenario tells us
which objects and
methods are involved in
login/logout.
A
UserCmd
B
C
onConnect()
onLogin()
setUser()
But it does not tell when
this scenario occurs in
the application.
setLogout()
15
17. When does it happen?
Scenario: Login/Logout
A
whenever the prechart
happens…
eventually the mainchart
will happen
UserCmd
B
C
onConnect()
onLogin()
setUser()
setLogout()
16
18. Linear-Time LSCs - Invariants
pre
Login
Login
There can be other behaviors
between the behaviors shown in
the LSC.
2 runs
17
19. Linear-Time LSCs - Invariants
pre
Login
Login
This run does not continue with
the complete main chart of the
LSC.
2 runs
18
22. Branching Time
We merge all runs on their joint
prefixes into an execution tree.
execution tree
21
23. Branching-Time LSCs
[Sibay, Uchitel, Braberman ICSE 2008]
whenever the prechart happens
there exists a branch
where the mainchart happens
execution tree
22
28. LSC Mining from Event Logs
log
automatically
extract
complete set of LSCs
(linear / branching)
We want to discover a set of
LSCs that can describe all
behaviors (or as much as
possible of the behaviors).
understanding of objects and
object interplay
27
29. Logs
log
automatically
extract
log method calls
complete set of LSCs
(linear / branching)
caller1, callee1, method1(…)
caller2, callee2, method2(…)
…
Each execution of the application gives
one trace. Run application multiple
times for a log.
28
32. Mining Algorithm
github.com/scenario-based-tools/sam/
tree
1. enumerate all sequences of events
occurring ≥ s times
onConnect()
onConnect()
onConnect()
onLogin()
onLogin()
onLogin()
setLogin()
setLogin()
setLogin()
setLogout()
setLogout()
setLogout()
candidate words
Starting from sequences of
length 1, recursively append
events and check if it occurs
often enough.
Efficient implementation
uses branch and bound and
some heuristics.
31
33. Mining Algorithm
github.com/scenario-based-tools/sam/
tree
1. enumerate all sequences of events
occurring ≥ s times
onConnect()
onConnect()
onConnect()
onLogin()
onLogin()
onLogin()
setLogin()
setLogin()
setLogin()
setLogout()
setLogout()
setLogout()
candidate words
From LSC with pre-chart
length 1 to LSC with mainchart length 1.
2. generate all
candidate LSCs
onConnect()
onLogin()
setLogin()
setLogout()
onConnect()
onLogin()
setLogin()
setLogout()
32
34. Mining Algorithm
github.com/scenario-based-tools/sam/
tree
1. enumerate all sequences of events
occurring ≥ s times
onConnect()
onConnect()
onConnect()
onLogin()
onLogin()
onLogin()
setLogin()
setLogin()
setLogin()
setLogout()
setLogout()
setLogout()
candidate words
3. test for each LSC
if satisfied ≥ c%
2. generate all
candidate LSCs
onConnect()
onLogin()
setLogin()
setLogout()
onConnect()
onLogin()
setLogin()
setLogout()
33
35. Mining Algorithm
github.com/scenario-based-tools/sam/
tree
1. enumerate all sequences of events
occurring ≥ s times
onConnect()
onConnect()
onConnect()
onLogin()
onLogin()
onLogin()
setLogin()
setLogin()
setLogin()
setLogout()
setLogout()
setLogout()
candidate words
3. test for each LSC
if satisfied ≥ c%
2. generate all
candidate LSCs
onConnect()
onLogin()
setLogin()
setLogout()
onConnect()
onLogin()
setLogin()
setLogout()
34
36. LSC Mining from Event Logs
log
automatically
extract
complete set of LSCs
(linear and branching)
What do branching
scenarios add to
specification mining?
understanding of objects and
object interplay
35
37. Experiments
CrossFTP server: 54 traces, 50 event types
s
Linear
LSC
covered
events
avg.
length
time
[s]
Branching
LSC
covered
events
avg.
length
time
[s]
20
7
90%
7
3
7+0
90%
7
3
14
9
90%
5
31
9+12
95%
13
26
10
9
90%
5
1008
9+18
95%
18
685
Branching LSC contain the linear LSC and some more
strictly branching LSC that were not found before.
Branching LSC are less frequent (lower support
threshold).
36
38. Experiments
CrossFTP server: 54 traces, 50 event types
s
Linear
LSC
covered
events
avg.
length
time
[s]
Branching
LSC
covered
events
avg.
length
time
[s]
20
7
90%
7
3
7+0
90%
7
3
14
9
90%
5
31
9+12
95%
13
26
10
9
90%
5
1008
9+18
95%
18
685
Branching LSC can explore more events of the log than
just Linear LSC.
37
39. Experiments
CrossFTP server: 54 traces, 50 event types
s
Linear
LSC
covered
events
avg.
length
time
[s]
Branching
LSC
covered
events
avg.
length
time
[s]
20
7
90%
7
3
7+0
90%
7
3
14
9
90%
5
31
9+12
95%
13
26
10
9
90%
5
1008
9+18
95%
18
685
Branching LSC are longer than Linear LSC. In other
words, they show more details for a particular behavior.
38
40. Experiments
CrossFTP server: 54 traces, 50 event types
s
Linear
LSC
covered
events
avg.
length
time
[s]
Branching
LSC
covered
events
avg.
length
time
[s]
20
7
90%
7
3
7+0
90%
7
3
14
9
90%
5
31
9+12
95%
13
26
10
9
90%
5
1008
9+18
95%
18
685
Running times for extraction are feasible.
Note that LSCs shown here are the LSCs left after
removing subsumed ones. Originally, the algorithm finds
around 6 million branching LSC in 685 seconds.
39
43. Linear vs. Branching: CrossFTP
What is the qualitative contribution
of branching LSC to specification
mining?
application life-cycle
from end to end
connect
login
logout
clean up
42
44. Linear vs. Branching: CrossFTP
application life-cycle
from end to end
short invariants of
individual FTP
commands
invariant of RENAME
43
45. Linear vs. Branching: CrossFTP
application life-cycle
from end to end FTP command
+
where triggered
short invariants of
individual FTP
commands
The branching LSC fills
the gap between large
and small invariants.
login
rename command
44
46. Linear vs. Branching: CrossFTP
application life-cycle individual FTP
individual FTP
FTP commands
from end to end all commands +
commands +
+where they are
where they are
can be triggered in
triggered
triggered
the same situation
short invariants of
individual FTP
commands
We found all ftp commands
supported by the server, as
alternative LSC.
45
47. Linear vs. Branching: CrossFTP
application life-cycle individual FTP
individual FTP
FTP commands
from end to end all commands +
commands +
+where they are
where they are
can be triggered in
triggered
triggered
the same situation
… and we could discover
cyclic behavior: after
rename, there could be
another delete command
short invariants of
individual FTP
commands
cycles: rename delete
46
48. Take Home Points
http://github.com/scenario-based-tools/sam/
log
• mining branching scenarios
alternatives, cycles
• combined with linear:
comprehensive specification
• future work:
visualizing results
distributed scenarios
complete set of LSCs
understanding of objects and
object interplay
47
50. Q&A …is branching time really necessary?
Yes, here is a linear LSC showing a disjunction
for continuing after the pre-chart.
if
then
delete
or
download
49
51. The full execution tree satisfies this Linear LSC with disjunction
Branching Time vs. Disjunction
and two branching LSCs describing the two alternatives in
separate LSCs.
50
52. Branching Time vs. Disjunction
Removing one branch
from the tree (the
execution of the
download command),
violates the branching
LSCs, but still satisfies
the disjunctive linear
LSCs (because only
one of them has to
hold).
51