18. Big Data
“All of the world's music can be stored on a $600 disk drive.”
“Enterprises globally stored more than 7 exabytes of new data on disk drives in 2010, while consumers stored more than 6 exabytes of new data on devices such as PCs and notebooks.”
“Indeed, we are generating so much data today that it is physically impossible to store it all. Health care providers, for instance, discard 90 percent of the data that they generate.”
Source: “Big Data: The Next Frontier for Innovation, Competition, and Productivity”, McKinsey Global Institute, 2011.
19. Hilbert and Lopez. The World's Technological Capacity to Store, Communicate,
and Compute Information. Science, 332(6025):60-65, 2011.
25. Four Competing Quality Criteria
• fitness: “able to replay event log”
• simplicity: “Occam’s razor”
• generalization: “not overfitting the log”
• precision: “not underfitting the log”
26. Example: one log four models
Activities: a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, h = reject request.
Event log (21 trace variants, 1391 cases in total):
# trace
455 acdeh
191 abdeg
177 adceh
144 abdeh
111 acdeg
82 adceg
56 adbeh
47 acdefdbeh
38 adbeg
33 acdefbdeh
14 acdefbdeg
11 acdefdbeg
9 adcefcdeh
8 adcefdbeh
5 adcefbdeg
3 acdefbdefdbeg
2 adcefdbeg
2 adcefbdefbdeg
1 adcefdbefbdeh
1 adbefbdefdbeg
1 adcefdbefcdefdbeg
[Figure: process discovery produces four models, N1 to N4, from this log, each running from start to end. N2 is the single sequence a c d e h. N4 lists all 21 variants seen in the log as separate sequences.]
Quality criteria: fitness (“able to replay event log”), simplicity (“Occam’s razor”), generalization (“not overfitting the log”), precision (“not underfitting the log”).
N1 : fitness = +, precision = +, generalization = +, simplicity = +
N2 : fitness = -, precision = +, generalization = -, simplicity = +
N3 : fitness = +, precision = -, generalization = +, simplicity = +
N4 : fitness = +, precision = +, generalization = -, simplicity = -
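The fitness ratings of N2 and N4 can be made concrete with a very coarse check: the fraction of cases whose trace a model can replay exactly. The sketch below is only an illustration under assumptions. It treats a model as the set of traces it accepts, taking N2 to accept only acdeh and N4 to accept exactly the 21 logged variants, as the slide describes them. The names `LOG`, `replay_fraction`, `N2_LANGUAGE` and `N4_LANGUAGE` are made up for this example.

```python
# Event log from the slide: trace variant -> number of cases.
LOG = {
    "acdeh": 455, "abdeg": 191, "adceh": 177, "abdeh": 144, "acdeg": 111,
    "adceg": 82, "adbeh": 56, "acdefdbeh": 47, "adbeg": 38, "acdefbdeh": 33,
    "acdefbdeg": 14, "acdefdbeg": 11, "adcefcdeh": 9, "adcefdbeh": 8,
    "adcefbdeg": 5, "acdefbdefdbeg": 3, "adcefdbeg": 2, "adcefbdefbdeg": 2,
    "adcefdbefbdeh": 1, "adbefbdefdbeg": 1, "adcefdbefcdefdbeg": 1,
}


def replay_fraction(log, accepted_traces):
    """Fraction of cases whose trace is accepted by the model (hypothetical helper)."""
    total = sum(log.values())
    replayed = sum(n for trace, n in log.items() if trace in accepted_traces)
    return replayed / total


# Assumption: each model is represented by the set of traces it accepts.
N2_LANGUAGE = {"acdeh"}   # the single sequence a c d e h
N4_LANGUAGE = set(LOG)    # exactly the 21 variants seen in the log

print(sum(LOG.values()))                  # 1391 cases
print(replay_fraction(LOG, N2_LANGUAGE))  # only the 455 acdeh cases replay
print(replay_fraction(LOG, N4_LANGUAGE))  # every case replays

# N4 accepts nothing beyond the log: a plausible but unseen trace is rejected,
# which is the overfitting that "generalization = -" refers to.
print("adcefbdeh" in N4_LANGUAGE)
```

N4 replays every case but rejects any trace it has not seen, while N2 replays only the most frequent variant.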
30. Model N4
[Figure: model N4 lists all 21 variants seen in the log as separate sequences between start and end, for example a d c e g (register request, check ticket, examine casually, decide, pay compensation) and a b d e h (register request, examine thoroughly, check ticket, decide, reject request). It is shown next to the same event log as on slide 26 (21 variants, 1391 cases).]
N4 : fitness = +, precision = +, generalization = -, simplicity = -
33. Petri net view:
Just discover the places …
Adding a place limits behavior:
• overfitting ≈ adding too many places
• underfitting ≈ adding too few places
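34. Example: Process Discovery Using State-Based Regions
[Figure: a transition system with states [], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], [a,b,c,d] and transitions labeled a, b, c, d, e, together with the Petri net discovered from it: transitions a, b, c, d, e and places start, p1, p2, p3, p4, end.]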
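35. Example of Region
[Figure: the transition system of slide 34 (states [], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], [a,b,c,d]) with one region highlighted, and the corresponding Petri net with places start, p1, p2, p3, p4, end.]
For the highlighted region:
enter: b,e
leave: d
do-not-cross: a,c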
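The enter / leave / do-not-cross classification can be checked mechanically. The sketch below is a minimal illustration, not the discovery algorithm itself. It assumes the transition system is the one induced by the traces abcd, acbd and aed (which yields exactly the eight states named on the slide), writes states as strings, and assumes the highlighted region is {[a,b], [a,b,c], [a,e]}. The edge list and the function name `classify` are reconstructions for this example.

```python
# Assumed transition system: (source state, label, target state).
TRANSITIONS = [
    ("", "a", "a"),
    ("a", "b", "ab"), ("a", "c", "ac"), ("a", "e", "ae"),
    ("ab", "c", "abc"), ("ac", "b", "abc"),
    ("abc", "d", "abcd"), ("ae", "d", "ade"),
]


def classify(region, transitions):
    """Return label -> 'enter' / 'leave' / 'do-not-cross' if `region` is a
    region, or None if some label behaves inconsistently across its edges."""
    kinds = {}
    for src, label, dst in transitions:
        if src not in region and dst in region:
            kind = "enter"
        elif src in region and dst not in region:
            kind = "leave"
        else:
            kind = "do-not-cross"
        # All edges with the same label must relate to the region the same way.
        if kinds.setdefault(label, kind) != kind:
            return None
    return kinds


print(classify({"ab", "abc", "ae"}, TRANSITIONS))
print(classify({"ab"}, TRANSITIONS))  # b enters on one edge but not the other
```

Under these assumptions the first call reproduces the slide's classification (b and e enter, d leaves, a and c do not cross), so the set corresponds to a place with inputs b, e and output d.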
36. Example: Process Discovery Using Language-Based Regions
A place is feasible if it can be added without disabling any of the traces in the event log.
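The feasibility condition can be tested by replaying every trace against a candidate place. The sketch below assumes a place is described by the transitions that put a token into it (`produce`), the transitions that take one out (`consume`), and its initial token count. These names and the helper `is_feasible` are illustrative, and the variants are the 21 traces of the example log used earlier in the deck.

```python
VARIANTS = [
    "acdeh", "abdeg", "adceh", "abdeh", "acdeg", "adceg", "adbeh",
    "acdefdbeh", "adbeg", "acdefbdeh", "acdefbdeg", "acdefdbeg",
    "adcefcdeh", "adcefdbeh", "adcefbdeg", "acdefbdefdbeg", "adcefdbeg",
    "adcefbdefbdeg", "adcefdbefbdeh", "adbefbdefdbeg", "adcefdbefcdefdbeg",
]


def is_feasible(traces, produce, consume, initial_tokens=0):
    """True if adding the place disables none of the traces.

    An event in `consume` needs a token in the place before it can occur;
    an event in `produce` adds a token after it has occurred.
    """
    for trace in traces:
        tokens = initial_tokens
        for event in trace:
            if event in consume:
                if tokens == 0:
                    return False  # the place would block this event
                tokens -= 1
            if event in produce:
                tokens += 1
    return True


# A place filled by a and f, emptied by b or c: no trace is disabled.
print(is_feasible(VARIANTS, produce={"a", "f"}, consume={"b", "c"}))
# Without f as a producer, every trace that reinitiates the request is blocked.
print(is_feasible(VARIANTS, produce={"a"}, consume={"b", "c"}))
```

The check only verifies that tokens never go negative. Whether tokens may remain in the place at the end of a trace is a separate design choice that this sketch does not enforce.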
39. Can be lifted to log level
[Figure: model N1 with places start, p1, p2, p3, p4, p5, end and transitions a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), h (reject request), shown next to the same event log as on slide 26 (21 variants, 1391 cases).]
40. From “playing the token game” to optimal alignments …
observed trace: “abeg”
trace: a b » e g
model: a b d e g
The column (», d) is a move in model only.
41. Another alignment
observed trace: “abcdeg”
trace: a b c d e g
model: a b » d e g
The column (c, ») is a move in log only.
42. Moves in an alignment
trace in event log: a b » d e g
possible run of model: a » c d e g
• move in log: (b, »)
• move in model: (», c)
• move in both: (a, a), (d, d), (e, e), (g, g)
Optimal alignment describes modeled behavior closest to observed behavior
43. Moves have costs
Possible moves (trace on top, model below): (a, »), (», a), (a, a), (a, b)
• Standard cost function:
− c(x,») = 1
− c(»,y) = 1
− c(x,y) = 0, if x=y
− c(x,y) = ∞, if x≠y
44. Non-fitting trace: abefdeg
Alignment with cost 2:
trace: a b » e f d » e g
model: a b d e f d b e g
Another alignment with cost 2:
trace: a b e f d e g
model: a b » » d e g
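Under the standard cost function, an optimal alignment between a trace and one given model run can be found with a small dynamic program. The sketch below is a simplification: it takes the model run as given, whereas a real conformance checker searches over all runs of the model. The function name `align` and the constant `SKIP` are chosen for this example.

```python
SKIP = "»"


def align(trace, run):
    """Optimal alignment of `trace` against a single model `run` using
    c(x,») = c(»,y) = 1, c(x,x) = 0, and no moves (x,y) with x != y."""
    n, m = len(trace), len(run)
    # cost[i][j] = minimal cost of aligning trace[i:] with run[j:]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][m] = n - i  # only moves in log remain
    for j in range(m + 1):
        cost[n][j] = m - j  # only moves in model remain
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = 1 + min(cost[i + 1][j], cost[i][j + 1])
            if trace[i] == run[j]:
                best = min(best, cost[i + 1][j + 1])
            cost[i][j] = best
    # Walk forward through the table to recover one optimal alignment.
    moves, i, j = [], 0, 0
    while i < n or j < m:
        if i < n and j < m and trace[i] == run[j] and cost[i][j] == cost[i + 1][j + 1]:
            moves.append((trace[i], run[j]))  # move in both
            i, j = i + 1, j + 1
        elif j < m and (i == n or cost[i][j] == 1 + cost[i][j + 1]):
            moves.append((SKIP, run[j]))      # move in model only
            j += 1
        else:
            moves.append((trace[i], SKIP))    # move in log only
            i += 1
    return cost[0][0], moves


print(align("abeg", "abdeg"))
print(align("abefdeg", "abdefdbeg")[0], align("abefdeg", "abdeg")[0])
```

For the non-fitting trace abefdeg, both model runs from the slide give cost 2, so either alignment is optimal among these two runs.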
45. Any cost structure is possible
… send-letter(John, 2 weeks, $400) …
… send-email(Sue, 3 weeks, $500) …
• Similar activities (more similarity implies lower costs).
• Resource conformance (done by someone that does not have the specified role).
• Data conformance (path is not possible for this customer).
• Time conformance (missed the legal deadline).
46. Fitness
[Figure: the four models N1 to N4 and the event log of slide 26 (21 variants, 1391 cases), each model annotated with its fitness value.]
Fitness: N1 = 1.0, N2 = 0.8, N3 = 1.0, N4 = 1.0
Our A* algorithm exploits the Petri net marking equation and uses other “tricks” to prune the search space.
Aligned event log is starting point for other types of analysis.
N1 : fitness = +, precision = +, generalization = +, simplicity = +
N2 : fitness = -, precision = +, generalization = -, simplicity = +
N3 : fitness = +, precision = -, generalization = +, simplicity = +
N4 : fitness = +, precision = +, generalization = -, simplicity = -
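A fitness value between 0 and 1 can be derived from alignment costs by normalising against a worst case. The slide does not state the formula, so the sketch below assumes one common normalisation: one minus the total optimal cost divided by the cost of aligning each trace using only log moves plus only model moves for the shortest model run. It is applied to N2, taken here to have the single run acdeh. With one run and the standard cost function, the optimal cost equals the two lengths minus twice the longest common subsequence. All names are illustrative.

```python
from functools import lru_cache

# Event log from the deck: trace variant -> number of cases.
LOG = {
    "acdeh": 455, "abdeg": 191, "adceh": 177, "abdeh": 144, "acdeg": 111,
    "adceg": 82, "adbeh": 56, "acdefdbeh": 47, "adbeg": 38, "acdefbdeh": 33,
    "acdefbdeg": 14, "acdefdbeg": 11, "adcefcdeh": 9, "adcefdbeh": 8,
    "adcefbdeg": 5, "acdefbdefdbeg": 3, "adcefdbeg": 2, "adcefbdefbdeg": 2,
    "adcefdbefbdeh": 1, "adbefbdefdbeg": 1, "adcefdbefcdefdbeg": 1,
}


def alignment_cost(trace, run):
    """Optimal standard-cost alignment of a trace against a single run."""

    @lru_cache(maxsize=None)
    def lcs(i, j):
        if i == len(trace) or j == len(run):
            return 0
        if trace[i] == run[j]:
            return 1 + lcs(i + 1, j + 1)
        return max(lcs(i + 1, j), lcs(i, j + 1))

    # Every symbol outside the common subsequence costs one move.
    return len(trace) + len(run) - 2 * lcs(0, 0)


def fitness(log, run):
    """Assumed normalisation: 1 - (total cost / worst-case cost)."""
    total = sum(n * alignment_cost(t, run) for t, n in log.items())
    worst = sum(n * (len(t) + len(run)) for t, n in log.items())
    return 1 - total / worst


print(round(fitness(LOG, "acdeh"), 3))
```

With these assumptions the value for N2 comes out close to the 0.8 shown on the slide, while a model that can replay every trace has total cost 0 and therefore fitness 1.0.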
49. What if?
• there are more than 100,000,000 events?
• there are more than 1000 different activities?
• there are more than 1,000,000 cases?
[Figure: an event log (acefgijkl, acddefhkjil, abdefjkgil, acdddefkhijl, acefgijkl, abefgjikl, ...) linked by process discovery and conformance checking to a car rental Petri net with activities a to l (book car, add extra insurance, skip extra insurance, change booking, confirm, initiate check-in, select car, check driver’s license, charge credit card, supply car) and places in, c1 to c11, out.]
52. How to distribute conformance checking?
[Figure: an event log (traces such as abcdeg, abdceg, abcdefbcdeg, abcdefbdceg, abdcefbdcefbdceg, abcdefbcdefbdceg) is checked against a Petri net with transitions a to g and places in, c1 to c6, out. A second group of traces (adcefbcfdeg, abdfcefdceg, acdefbdceg, acdefg, adcfeg, abdcefcdfeg) is shown with the diagnostics “f occurs too often” and “b is often skipped”.]
53. Classification based on partitioning of event log: vertical and horizontal
• vertical: sets of cases
• horizontal: sets of activities
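The two ways of splitting a log can be sketched in a few lines. The example assumes the reading suggested by the slide layout: vertical partitioning distributes whole cases over nodes, and horizontal partitioning projects every case onto a subset of activities. The log is a small dictionary from case id to trace, using traces from the earlier example. The function names and the round-robin assignment are illustrative choices.

```python
def vertical_partition(cases, n_nodes):
    """Distribute whole cases over nodes (round-robin on sorted case id)."""
    parts = [dict() for _ in range(n_nodes)]
    for k, case_id in enumerate(sorted(cases)):
        parts[k % n_nodes][case_id] = cases[case_id]
    return parts


def horizontal_partition(cases, activity_sets):
    """Give each node every case, projected onto that node's activities."""
    return [
        {cid: "".join(a for a in trace if a in acts) for cid, trace in cases.items()}
        for acts in activity_sets
    ]


cases = {1: "abcdeg", 2: "abdceg", 3: "acdefg", 4: "adcfeg"}

print(vertical_partition(cases, 2))
print(horizontal_partition(cases, [{"a", "b", "c", "d"}, {"e", "f", "g"}]))
```

In the vertical split each node sees complete traces for some cases. In the horizontal split each node sees all cases but only part of each trace, so the results of the nodes have to be combined afterwards.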
54. Replication: Same event log on all computing nodes
Only makes sense if the algorithm has random elements, e.g., genetic process mining.
69. So What?
• Any process model can be partitioned in minimal passages.
• Discovery and conformance checking can be done per passage!
[Figure: a process model with activities a to p connected through clouds.]
Clouds may contain arbitrary subprocesses not explicitly recorded in the event log (invisible activities or small networks used for routing, e.g. XOR/AND/OR-split/joins).
70. Example result for Petri nets
“The event log fits all passages if and only if the event log fits the whole model.”
[Figure: the process model of slide 69 with activities a to p.]
Key insight: interface transitions controlled by event log
71. Discovery example
[Figure: a causal structure over activities a to g between in and out, fragments of that structure, and a Petri net with transitions a to g and places in, c1 to c6, out.]
causal structure obtained using heuristics & domain knowledge
72. Conformance checking
[Figure: an event log (acefl, acddefl, abdefl, acdddefl, acefl, abefl, ...) checked against the car rental Petri net with activities a to l (book car, add extra insurance, skip extra insurance, change booking, confirm, initiate check-in, select car, check driver’s license, charge credit card, supply car) and places in, c1 to c11, out.]