We are pleased to announce the lecture: “Process Mining: Understanding and Improving Desire Lines in Big Data”
in honour of doctor honoris causa Wil van der Aalst.
Wednesday May 30th - 10.00 a.m. - 12 a.m.,
Hasselt University, campus Diepenbeek (Agoralaan, building D) - auditorium H5
The Faculty of Business Economics of Hasselt University is pleased to invite you to the lecture
“Process Mining: Understanding and Improving Desire Lines in Big Data”.
This lecture is organised to honour prof. dr. Wil van der Aalst, on whom the degree of ‘doctor honoris causa’ will be conferred by Hasselt University, Faculty of Business Economics (promotor prof. Koen Vanhoof). Professor van der Aalst is a full professor of Information Systems at the Technische Universiteit Eindhoven (TU/e). Currently he is also an adjunct professor at Queensland University of Technology (QUT).His research interests include workflow management, process mining, Petri nets, business process management, process modeling, and process analysis. Many of his ideas have influenced researchers, software developers and standardization committees working on process support.
2. Let’s Play: Play-Out, Play-In, Replay
Big Data
Desire Lines
Process Mining
How Good is My Model?
Process Discovery
Conformance Checking
Food for Thought: Lasagna and Spaghetti
Google Maps and TomTom
How to Get Started?
Conclusion
PAGE 2
19. “All of the world's music
Big Data can be stored on a
$600 disk drive.”
“Enterprises globally
stored more than 7
exabytes
of new data on disk
drives in 2010, while
consumers stored
more
than 6 exabytes of
new data on devices
such as PCs and “Indeed, we are
notebooks.” generating so much
data today that it is
physically impossible to
store it all. Health care
providers, for instance,
discard 90 percent of
the data that they
generate.”
Source: “Big Data: The Next Frontier for Innovation, Competition, and
Productivity” McKinsey Global Institute, 2011.
PAGE 19
20. Hilbert and Lopez. The World's Technological Capacity to Store, Communicate, and
Compute Information. Science, 332(6025):60-65, 2011.
PAGE 20
28. Process Mining =
Event Data + Processes
Data Mining + Process Analysis
Machine Learning + Formal Methods
PAGE 28
29. Process Mining
supports/
“world” business
controls
processes software
people machines system
components
organizations records
events, e.g.,
messages,
specifies transactions,
models configures
analyzes etc.
implements
analyzes
discovery
(process) event
conformance
model logs
enhancement
31. Simplified event log
a = register request,
b = examine thoroughly,
c = examine casually,
d = check ticket,
e = decide,
f = reinitiate request,
g = pay compensation,
and h = reject request
PAGE 31
32. Process
discovery
b
examine
thoroughly
g
c1 c3 pay
c compensation
a examine
e
start register casually decide c5 end
request
h
c2 d c4 reject
check ticket request
f
reinitiate
request
PAGE 32
33. Conformance
checking
b case 7: e is
executed
examine without
thoroughly case 8: g or h
being g is missing
enabled
c1 c3 pay
c compensation
a examine
e
start register casually decide c5 end
request case 10: e h
is missing in
c2 d c4 reject
second
check ticket round request
f
reinitiate
request PAGE 33
34. Extension: Adding perspectives to
model based on event log
The event log can be used to
discover roles in the organization
(e.g., groups of people with similar
work patterns). These roles can be Performance information (e.g., the
used to relate individuals and average time between two
activities. subsequent activities) can be
extracted from the event log and
visualized on top of the model .
Role A: Role E: Role M:
Assistant Expert Manager
Decision rules (e.g., a decision tree
based on data known at the time a
Pete Sue Sara particular choice was made) can be
learned from the event log and used
Mike Sean to annotated decisions.
Ellen E
b
A
examine
thoroughly
A g
A M
c1 c3 pay
c compensation
a examine
e
A
start register casually
A decide c5 end
request
h
c2 d c4 M reject
check ticket request
f
reinitiate
PAGE 34
request
36. Four Competing Quality Criteria
“able to replay event log” “Occam’s razor”
fitness simplicity
process
discovery
generalization precision
“not overfitting the log” “not underfitting the log”
PAGE 36
37. Example: one log four models
b
examine
thoroughly
g
pay
c compensation
a examine e
start register casually decide end
# trace
request
h 455 acdeh
d reject
check ticket request 191 abdeg
f reinitiate
request 177 adceh
N1 : fitness = +, precision = +, generalization = +, simplicity = +
144 abdeh
111 acdeg
a c d e h
82 adceg
start register examine check decide reject end
request casually ticket request
56 adbeh
N2 : fitness = -, precision = +, generalization = -, simplicity = +
47 acdefdbeh
“able to replay event log” “Occam’s razor”
38 adbeg
examine check
thoroughly b d ticket g
fitness simplicity pay
compensation
33 acdefbdeh
a 14 acdefbdeg
start register examine
c end 11 acdefdbeg
request casually
e f
reinitiate h
process decide request reject
request
9 adcefcdeh
discovery N3 : fitness = +, precision = -, generalization = +, simplicity = + 8 adcefdbeh
5 adcefbdeg
a d c e g
3 acdefbdefdbeg
generalization precision register
request
check
ticket
examine
casually
decide pay
compensation
2 adcefdbeg
a c d e g 2 adcefbdefbdeg
“not overfitting the log” “not underfitting the log” register examine check decide pay
request casually ticket compensation 1 adcefdbefbdeh
a d c e h 1 adbefbdefdbeg
register check examine decide reject
request ticket casually request 1 adcefdbefcdefdbeg
a c d e h 1391
start end
register examine check decide reject
request casually ticket request
… (all 21 variants seen in the log )
a b d e g
register examine check decide pay
request thoroughly ticket compensation
a d b e h
register check examine decide reject
request ticket thoroughly request
a b d e h
register examine check decide reject
request thoroughly ticket request PAGE 37
N4 : fitness = +, precision = +, generalization = -, simplicity = -
38. # trace
455 acdeh
Model N1 191 abdeg
177 adceh
144 abdeh
111 acdeg
82 adceg
56 adbeh
b 47 acdefdbeh
examine
thoroughly 38 adbeg
g 33 acdefbdeh
pay
c compensation 14 acdefbdeg
a examine e
11 acdefdbeg
start register casually decide end
request 9 adcefcdeh
h
d reject 8 adcefdbeh
check ticket request 5 adcefbdeg
f reinitiate 3 acdefbdefdbeg
request
N1 : fitness = +, precision = +, generalization = +, simplicity = + 2 adcefdbeg
2 adcefbdefbdeg
1 adcefdbefbdeh
1 adbefbdefdbeg
1 adcefdbefcdefdbeg
PAGE 38
1391
40. # trace
455 acdeh
Model N3 191 abdeg
177 adceh
144 abdeh
111 acdeg
82 adceg
56 adbeh
47 acdefdbeh
examine check
thoroughly b d ticket g 38 adbeg
pay 33 acdefbdeh
compensation
a 14 acdefbdeg
start register examine end 11 acdefdbeg
request casually c
e f reinitiate
reject
h 9 adcefcdeh
decide request
request 8 adcefdbeh
N3 : fitness = +, precision = -, generalization = +, simplicity = +
5 adcefbdeg
3 acdefbdefdbeg
2 adcefdbeg
2 adcefbdefbdeg
1 adcefdbefbdeh
1 adbefbdefdbeg
1 adcefdbefcdefdbeg
PAGE 40
1391
41. # trace
455 acdeh
Model N4 191 abdeg
177 adceh
144 abdeh
a d c e g 111 acdeg
register check examine decide pay
request ticket casually compensation 82 adceg
a c d e g 56 adbeh
register examine check decide pay
request casually ticket compensation 47 acdefdbeh
a d c e h 38 adbeg
register check examine decide reject
request ticket casually request 33 acdefbdeh
a c d e h 14 acdefbdeg
start end
register examine check decide reject
request casually ticket request 11 acdefdbeg
… (all 21 variants seen in the log )
9 adcefcdeh
8 adcefdbeh
5 adcefbdeg
a b d e g
register examine check decide pay 3 acdefbdefdbeg
request thoroughly ticket compensation
2 adcefdbeg
a d b e h
register check examine decide reject 2 adcefbdefbdeg
request ticket thoroughly request
1 adcefdbefbdeh
a b d e h
register examine check decide reject 1 adbefbdefdbeg
request thoroughly ticket request
1 adcefdbefcdefdbeg
N 4 : fitness = +, precision = +, generalization = -, simplicity = -
PAGE 41
1391
44. Petri net view:
Just discover the places …
“able to replay event log” “Occam’s razor”
fitness simplicity
process
discovery
generalization precision
“not overfitting the log” “not underfitting the log”
a1 b1
a2 b2
... p(A,B) ...
am bn Adding a place limits behavior:
•overfitting ≈ adding too many places
•underfitting ≈ adding too few places
A={a1,a2, … am} B={b1,b2, … bn}
PAGE 44
45. Example: Process Discovery Using
State-Based Regions
01011001101101001
01111110110100011
01100111101110000
01101101001001100 d
e
[a,e] [a,d,e]
[ a,b]
a b
event log [] [a]
c
c
b d
[a,c] [a,b,c] [a,b,c,d]
b
a p1 e p3 d
start end
p2 c p4
PAGE 45
46. Example of State-Based Region
d
e
[a,e] [a,d,e]
[ a,b]
a b
[] [a] c
c
b d
[a,c] [a,b,c] [a,b,c,d]
enter: b,e
leave: d
do-not-cross: a,c
b
a p1 e p3 d
start end
p2 c p4
PAGE 46
47. Example: Process Discovery Using
Language-Based Regions
A place is feasible if it can
be added without
f c1 disabling any of the
traces in the event log.
a1 b1
e c d
pR
a2 b2
X Y
PAGE 47
48. Example of Language-Based Regions
• accd
• bd ↓accd : 0 + 0 - 0 ≥ 0
c
• bce a↓ccd : 0 + 1 - 1 ≥ 0
• ace
b d ac↓cd : 0 + 2 - 2 ≥ 0
• acd
acc↓d : 0 + 3 - 3 ≥ 0
• bcce
• ade
a e
↓ade : 0 + 0 - 0 ≥ 0
X Y a↓de : 0 + 1 - 1 ≥ 0
ad↓e : 0 + 1 - 2 < 0
PAGE 48
49. Example of a completely different process
discovery technique: Genetic Mining
PAGE 49
50. Genetic process mining: Overview
create initial
population
event log mutation
next generation
compute
fitness
elitism
termination
tournament children
crossover
select best parents
individual
“dead” individuals
PAGE 50
51. Example: crossover
b b
examine examine
thoroughly thoroughly
g g
pay pay
c c
compensation compensation
a e a e
examine examine
start register casually decide end start register casually decide end
request request
h h
d d
reject reject
check ticket request check ticket request
f f
reinitiate
reinitiate
request request
b b
examine examine
thoroughly thoroughly
g g
pay pay
c c
compensation compensation
a e a e
examine examine
start register casually decide end start register casually decide end
request request
h h
d d
reject reject
check ticket request check ticket request
f f
reinitiate
reinitiate
request
request
PAGE 51
52. Example: mutation
remove place
b b
examine examine
thoroughly thoroughly
g g
pay pay
c c
compensation compensation
a e a e
examine examine
start register casually decide end start register casually decide end
request request
h h
d d
reject reject
check ticket request check ticket request
f f
reinitiate reinitiate
request
added arc request
PAGE 52
54. Replaying trace “abeg”
a b e g b
examine
thoroughly
g
pay
c compensation
a examine e
start register casually decide end
request r=1
h
d m=1
reject
check ticket request
f reinitiate
request
1 1
= 0.83333
6 6
PAGE 54
55. # trace
455 acdeh
Can be lifted to log level 191 abdeg
177 adceh
N1 b 144 abdeh
examine
thoroughly
g
111 acdeg
p1
c p3 pay
compensation
82 adceg
a examine e
start register casually decide p5 end 56 adbeh
request
h
p2 d p4 reject 47 acdefdbeh
check ticket request
f reinitiate 38 adbeg
request
N2 b pay 33 acdefbdeh
compensation
examine g
thoroughly 14 acdefbdeg
a c d e
start register p1 examine p2 check p3
p4 11 acdefdbeg
decide end
request casually ticket
h 9 adcefcdeh
f reject request
reinitiate request 8 adcefdbeh
N3 c
p1 examine p3
5 adcefbdeg
casually
a e h 3 acdefbdefdbeg
start register decide p5 reject end
request
d request 2 adcefdbeg
p2 check p4
ticket 2 adcefbdefbdeg
N4 examine
b d check
1 adcefdbefbdeh
thoroughly ticket g
pay
compensation 1 adbefbdefdbeg
a p1
start register examine
c end 1 adcefdbefcdefdbeg
request casually
e f
reinitiate
reject
h PAGE 55
decide request
request 1391
56. From “playing the token game” to
optimal alignments …
observed trace: “abeg”
a b » e g
a b d e g
b
examine
thoroughly
g
pay
move in a
c
examine e
compensation
model only start register
request
casually decide
h
end
d reject
check ticket request
f reinitiate
request
PAGE 56
57. Another alignment
observed trace: “abcdeg”
a b c d e g
a b » d e g
b
examine
thoroughly
g
pay
c compensation
a examine e
start register casually decide end
move in log request
d
h
reject
only check ticket
f
request
reinitiate
request
PAGE 57
58. Moves in an alignment
move in log
trace in
event log
a b » d e g
a » c d e g
possible run
of model
move in
model move in both
Optimal alignment describes modeled behavior closest to
observed behavior PAGE 58
59. Moves have costs
… a … … » …
… » … … a …
… a … … a …
… a … … b …
• Standard cost function:
− c(x,») = 1
− c(»,y) = 1
− c(x,y) = 0, if x=y
− c(x,y) = ∞, if x≠y PAGE 59
60. Non-fitting trace: abefdeg
b
examine
thoroughly
g
abefdeg
pay
c compensation
a examine e
start register casually decide end
request
h
d reject
check ticket request
f reinitiate
request
a b » e f d » e g
2
a b d e f d b e g
a b e f d e g
2
a b » » d e g
PAGE 60
61. Any cost structure is possible
… send-letter(John,2 …
weeks, $400)
… send-email(Sue,3 …
weeks,$500)
• Similar activities (more similarity implies lower costs).
• Resource conformance (done by someone that does
not have the specified role).
• Data conformance (path is not possible for this
customer).
• Time conformance (missed the legal deadline)
PAGE 61
62. b
examine
thoroughly
g
pay
c
Fitness
compensation
1.0
a examine e
start register casually decide end
# trace
request
h 455 acdeh
d reject
check ticket request 191 abdeg
f reinitiate
request 177 adceh
N1 : fitness = +, precision = +, generalization = +, simplicity = +
144 abdeh
111 acdeg
a c d e h
0.8
82 adceg
Our A* algorithm start register
request
examine
casually
check
ticket
N2 : fitness = -, precision = +, generalization = -, simplicity = +
decide reject
request
end
56 adbeh
exploits the Petri net 47 acdefdbeh
38 adbeg
marking equation examine
thoroughly b d check
ticket
pay
g 33 acdefbdeh
and uses other compensation
14 acdefbdeg
1.0
a
start register examine 11 acdefdbeg
“tricks” to prune the c end
request casually
e f
reinitiate h
decide request reject 9 adcefcdeh
request
search space. N3 : fitness = +, precision = -, generalization = +, simplicity = + 8 adcefdbeh
5 adcefbdeg
a d c e g
3 acdefbdefdbeg
register check examine decide pay
request ticket casually compensation
2 adcefdbeg
a c d e g 2 adcefbdefbdeg
register examine check decide pay
request casually ticket compensation 1 adcefdbefbdeh
a d c e h 1 adbefbdefdbeg
register check examine decide reject
request ticket casually request 1 adcefdbefcdefdbeg
1.0 start
a
register
request
c
examine
casually
d
check
ticket
e
decide
h
reject
request
end
1391
… (all 21 variants seen in the log )
a b d e g
examine
Aligned event log is
register check decide pay
request thoroughly ticket compensation
a d b e h
starting point for other register
request
check
ticket
examine
thoroughly
decide reject
request
types of analysis. a
register
b d
check
e
decide
h
reject
examine
request thoroughly ticket request
PAGE 62
N4 : fitness = +, precision = +, generalization = -, simplicity = -
65. We applied ProM in >100 organizations
• Municipalities (e.g., Alkmaar, Heusden, Harderwijk, etc.)
• Government agencies (e.g., Rijkswaterstaat, Centraal
Justitieel Incasso Bureau, Justice department)
• Insurance related agencies (e.g., UWV)
• Banks (e.g., ING Bank)
• Hospitals (e.g., AMC hospital, Catharina hospital)
• Multinationals (e.g., DSM, Deloitte)
• High-tech system manufacturers and their customers
(e.g., Philips Healthcare, ASML, Ricoh, Thales)
• Media companies (e.g. Winkwaves)
• ...
PAGE 65
66. How can process mining help?
• Uncover bottlenecks • Provide new insights
• Detect deviations • Highlight important
• Performance measurement problems
• Auditing/compliance • An organization’s mirror
• Business Process (in two ways)
Redesign (BPR) • Helps to avoid ICT
• Continuous improvement failures
(Six Sigma) • Avoid “management by
• Operational support (e.g., PowerPoint”
recommendation and • From “politics” to
prediction) “analytics”
PAGE 66
68. Example of a Lasagna process: WMO
process of a Dutch municipality
Each line corresponds to one of the 528 requests that were handled
in the period from 4-1-2009 until 28-2-2010. In total there are 5498
events represented as dots. The mean time needed to handled a
case is approximately 25 days. PAGE 68
69. WMO process
(Wet Maatschappelijke Ondersteuning)
• WMO refers to the social support act that came into
force in The Netherlands on January 1st, 2007.
• The aim of this act is to assist people with disabilities
and impairments. Under the act, local authorities are
required to give support to those who need it, e.g.,
household help, providing wheelchairs and
scootmobiles, and adaptations to homes.
• There are different processes for the different kinds of
help. We focus on the process for handling requests
for household help.
• In a period of about one year, 528 requests for
household WMO support were received.
• These 528 requests generated 5498 events.
PAGE 69
75. Conformance check WMO process (3/3)
The fitness of the discovered process
is 0.99521667. Of the 528 cases, 496
cases fit perfectly whereas for 32
cases there are missing or remaining
tokens.
PAGE 75
78. Bottleneck analysis WMO process (3/3)
flow time of
approx. 25 days
with a standard
deviation of
approx. 28
PAGE 78
79. Two additional Lasagna processes
RWS
(“Rijkswaterstaat”)
process
WOZ (“Waardering
Onroerende Zaken”)
process
PAGE 79
80. RWS Process
• The Dutch national public works department, called
“Rijkswaterstaat” (RWS), has twelve provincial offices.
We analyzed the handling of invoices in one of these
offices.
• The office employs about 1,000 civil servants and is
primarily responsible for the construction and
maintenance of the road and water infrastructure in its
province.
• To perform its functions, the RWS office subcontracts
various parties such as road construction companies,
cleaning companies, and environmental bureaus. Also,
it purchases services and products to support its
construction, maintenance, and administrative
activities. PAGE 80
82. Social network constructed based on
handovers of work
Each of the 271 nodes
corresponds to a civil
servant. Two civil servants
are
connected if one executed
an activity causally
following an activity
executed by the other civil
servant
PAGE 82
83. Social network consisting of civil servants that
executed more than 2000 activities in a 9 month period.
The darker arcs
indicate the strongest
relationships in the
social network.
Nodes having the same
color belong to the
same clique.
PAGE 83
84. WOZ process
• Event log containing information about 745 objections
against the so-called WOZ (“Waardering Onroerende
Zaken”) valuation.
• Dutch municipalities need to estimate the value of
houses and apartments. The WOZ value is used as a
basis for determining the real-estate property tax.
• The higher the WOZ value, the more tax the owner needs
to pay. Therefore, there are many objections (i.e.,
appeals) of citizens that assert that the WOZ value is too
high.
• “WOZ process” discovered for another municipality (i.e.,
different from the one for which we analyzed the WMO
process).
PAGE 84
85. Discovered process model
The log contains events related to 745 objections against the so-
called WOZ valuation. These 745 objections generated 9583 events.
There are 13 activities. For 12 of these activities both start and
complete events are recorded. Hence, the WF-net has 25
PAGE 85
transitions.
87. Performance analysis
bottleneck detection: places are
colored based on average durations
time required to
move from one
activity to another
information on
total flow time
PAGE 87
90. Example of a Spaghetti process
Spaghetti process describing the diagnosis and treatment of 2765 patients
in a Dutch hospital. The process model was constructed based on an event
log containing 114,592 events. There are 619 different activities (taking
event types into account) executed by 266 different individuals (doctors,
nurses, etc.).
PAGE 90
92. Another example
(event log of Dutch housing agency)
The event log contains 208
cases that generated 5987
events. There are 74
different activities.
PAGE 92
97. Business information systems should
offer “TomTom” functionality
Recommend: How to get home ASAP? Take a left turn!
Detect: You drive too fast!
Predict: When will I be home? At 11.26! PAGE 97
99. Hundreds of plug-ins available covering
the whole process mining spectrum
open-source (L-GPL)
Download from: www.processmining.org PAGE 99
100. How to Get Started?
Collect event data Collect questions
• Minimal requirement: • What kind problems would
events referring to an you like to address (cost,
activity name and a time, risk, compliance,
process instance. service, etc.)?
• Good to have: • Related to discovery,
timestamps, resource conformance,
information, additional enhancement?
data elements. • Iterative process: can be
• Challenges: scoping and “curiosity driven” initially.
sometimes correlation.
PAGE 100