Extendible data model for real-time business process analysis

Marcello Leida

IEEE-IEEM, 10-13 December 2012, Hong Kong

Our target: real-time process monitoring

Limits of current approaches
• Experience collected by deploying Business Process Mining tools in enterprise environments (BT, Etisalat) highlighted the need for a more flexible data layer and for real-time capabilities.
• Offline analytical tools;
• Rigid data model;
• BPML and BPEL require a big effort from the enterprise;
• Tight coupling between the applications that capture the information and the ones that analyse it;
• Need to deal with increasingly complex systems;
• Monitoring tools have to be flexible and robust enough to also process information that is absent or unknown at the time the data model is defined.

Research challenges for a new generation of BPM tools
• Flexibility: representing the process model with a formalism that allows an increased degree of flexibility, able to address also situations where the process definition and its data are not known a priori.
• Handling very large datasets: dealing with large amounts of information while maintaining a high level of performance.
• Real-time performance: being able to respond to a continuous flow of events. Keep the representation simple so that it can be queried efficiently.

Novel representation of the process model
• The need for a less constraining and rigid model has arisen;
• Represent the model with a formalism that allows the required degree of flexibility;
• Keep the representation simple: this allows the level of flexibility we need but, on the other hand, increases complexity.

RDF
• We use RDF to define the process model;
• RDF is a standard vocabulary definition at the basis of the Semantic Web vision; it is composed of three elements: concepts, relations between concepts, and attributes of concepts;
• RDF is an extremely extendible, flexible and publicly available data representation.

RDF
Concepts, relations and attributes are modelled as a labelled directed graph, defined by a set of triples <s,p,o>, where s is called the subject, p the predicate and o the object.
Formally, a graph G can be defined as:
G ⊆ (U ∪ B ∪ L) × U × (U ∪ B ∪ L)
where:
U is an infinite set of constant values (called URI references) whose semantics are well defined, for example by the RDF and RDFS vocabularies;
B is an infinite set of identifiers (called blank nodes) that identify instantiations of concepts. Elements in this set do not have a defined semantics;
L is an infinite set of values (called literals). Elements in this set do not have a defined semantics.
The elements of a triple <s,p,o> are respectively: s ∈ (U ∪ B ∪ L), p ∈ U, and o ∈ (U ∪ B ∪ L).
The elements in U define the schema or vocabulary, while the elements in B and L are used to define instances.
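The position constraints in this definition can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the classes `URI`, `Blank` and `Literal` and the function `is_valid_triple` are hypothetical names, not part of any RDF library. By default it follows the definition above, which also admits literals as subjects. The optional `strict` flag applies the stricter RDF 1.1 rule, in which a subject must be a URI reference or a blank node.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical encodings of the three sets in the definition.
@dataclass(frozen=True)
class URI:
    """Element of U: a URI reference, e.g. 'ebtic-bpm:Task'."""
    value: str


@dataclass(frozen=True)
class Blank:
    """Element of B: a blank node identifying an instantiation of a concept."""
    label: str


@dataclass(frozen=True)
class Literal:
    """Element of L: a value with an (assumed) datatype name."""
    lexical: str
    datatype: str = "xs:string"


Term = Union[URI, Blank, Literal]


def is_valid_triple(s: Term, p: Term, o: Term, strict: bool = False) -> bool:
    """Check s in (U ∪ B ∪ L), p in U and o in (U ∪ B ∪ L).

    With strict=True, literal subjects are rejected, as in RDF 1.1.
    """
    subject_types = (URI, Blank) if strict else (URI, Blank, Literal)
    object_types = (URI, Blank, Literal)
    return (
        isinstance(s, subject_types)
        and isinstance(p, URI)
        and isinstance(o, object_types)
    )
```

For instance, a blank node with an `ebtic-bpm:startTime` literal is accepted. Any triple whose predicate is a blank node or a literal is rejected.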

An Extensible Business Process Data Model
Figure: Basic process model representation (schema level). Concepts: Process, Task. Relations: hasTask (Process to Task); precededBy, followedBy and hasSubTask (Task to Task). Attributes: startTime and endTime (on both Process and Task).

RDF graph of the basic process model: EBTIC-BPM vocabulary
The elements in the conceptual model of the previous slide are defined as a vocabulary, EBTIC-BPM, that extends the set U, which already contains the vocabularies RDF and RDFS.

S                       P             O
ebtic-bpm:hasTask       rdfs:range    ebtic-bpm:Task
ebtic-bpm:hasTask       rdfs:domain   ebtic-bpm:Process
ebtic-bpm:precededBy    rdfs:range    ebtic-bpm:Task
ebtic-bpm:precededBy    rdfs:domain   ebtic-bpm:Task
ebtic-bpm:followedBy    rdfs:range    ebtic-bpm:Task
ebtic-bpm:followedBy    rdfs:domain   ebtic-bpm:Task
ebtic-bpm:hasSubtask    rdfs:range    ebtic-bpm:Task
ebtic-bpm:hasSubtask    rdfs:domain   ebtic-bpm:Task
ebtic-bpm:startTime     rdfs:domain   ebtic-bpm:Process
ebtic-bpm:startTime     rdfs:domain   ebtic-bpm:Task
ebtic-bpm:startTime     rdfs:range    xs:dateTime
ebtic-bpm:endTime       rdfs:domain   ebtic-bpm:Process
ebtic-bpm:endTime       rdfs:domain   ebtic-bpm:Task
ebtic-bpm:endTime       rdfs:range    xs:dateTime
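In RDFS, rdfs:domain and rdfs:range declarations are not constraints but inference rules (rdfs2 and rdfs3 in the RDF Semantics specification). If a property P has domain C, every subject of P is inferred to be of type C. Likewise, if P has range C, every object of P is inferred to be of type C. The Python sketch below applies these two rules once to plain string triples. The string encoding, the chosen subset of the vocabulary and the function name `infer_types` are assumptions for illustration only. Datatype ranges such as xs:dateTime are left out so that no literal ends up in subject position.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Subset of the EBTIC-BPM vocabulary, written as plain (s, p, o) strings.
EBTIC_BPM: Set[Triple] = {
    ("ebtic-bpm:hasTask", "rdfs:domain", "ebtic-bpm:Process"),
    ("ebtic-bpm:hasTask", "rdfs:range", "ebtic-bpm:Task"),
    ("ebtic-bpm:followedBy", "rdfs:domain", "ebtic-bpm:Task"),
    ("ebtic-bpm:followedBy", "rdfs:range", "ebtic-bpm:Task"),
}


def infer_types(graph: Set[Triple]) -> Set[Triple]:
    """Apply rdfs2 (domain) and rdfs3 (range) once; return only new triples."""
    domains: Dict[str, Set[str]] = {}
    ranges: Dict[str, Set[str]] = {}
    for s, p, o in graph:
        if p == "rdfs:domain":
            domains.setdefault(s, set()).add(o)
        elif p == "rdfs:range":
            ranges.setdefault(s, set()).add(o)

    inferred: Set[Triple] = set()
    for s, p, o in graph:
        for cls in domains.get(p, set()):
            inferred.add((s, "rdf:type", cls))  # subject gets the domain class
        for cls in ranges.get(p, set()):
            inferred.add((o, "rdf:type", cls))  # object gets the range class
    return inferred - graph


instance_data = {
    ("Process1", "ebtic-bpm:hasTask", "Step1"),
    ("Step1", "ebtic-bpm:followedBy", "Step2"),
}
print(sorted(infer_types(EBTIC_BPM | instance_data)))
```

Under the same rules, the two rdfs:domain triples declared for ebtic-bpm:startTime (and for ebtic-bpm:endTime) mean that anything with a start time would be inferred to be both an ebtic-bpm:Process and an ebtic-bpm:Task. This is worth keeping in mind if an RDFS reasoner is enabled on the triple store.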

Extending the EBTIC-BPM vocabulary
Figure: Domain-specific process model representation (schema level). CreateProductA (attribute department) is a subconcept of Process; Assemble and Test are subconcepts of Task. Assemble uses and creates Components (useComponent, createComponent); Test is related to a Component by testedComponent and to an Employee by executedBy. Component has the attribute serialNumber and Employee the attribute EIN.

RDF graph of the domain-specific extension vocabulary
The elements in the conceptual model of the previous slide are defined as a vocabulary, PA, that extends the set U, which already contains the vocabularies RDF, RDFS and EBTIC-BPM.

S                     P                 O
…                     …                 …
pa:CreateProductA     rdfs:subClassOf   ebtic-bpm:Process
pa:Assemble           rdfs:subClassOf   ebtic-bpm:Task
pa:Test               rdfs:subClassOf   ebtic-bpm:Task
pa:useComponent       rdfs:domain       pa:Assemble
pa:useComponent       rdfs:range        pa:Component
pa:createComponent    rdfs:domain       pa:Assemble
pa:createComponent    rdfs:range        pa:Component
pa:testedComponent    rdfs:domain       pa:Test
pa:testedComponent    rdfs:range        pa:Component
pa:executedBy         rdfs:domain       pa:Test
pa:executedBy         rdfs:range        pa:Employee
pa:department         rdfs:domain       pa:CreateProductA
pa:department         rdfs:range        xs:string
pa:serialNumber       rdfs:domain       pa:Component
pa:serialNumber       rdfs:range        xs:integer
pa:EIN                rdfs:domain       pa:Employee
pa:EIN                rdfs:range        xs:integer
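pa:Assemble and pa:Test are declared as subclasses of ebtic-bpm:Task. A client that knows only the EBTIC-BPM vocabulary can therefore still find the domain-specific tasks by following rdfs:subClassOf (RDFS rules rdfs9 and rdfs11). The following is a minimal Python sketch over plain string triples, assuming the hypothetical helper name `instances_of`. In SPARQL 1.1 the same effect can be obtained with the property path `?t rdf:type/rdfs:subClassOf* ebtic-bpm:Task`.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def instances_of(graph: Set[Triple], cls: str) -> Set[str]:
    """Nodes typed with `cls` or with any direct or indirect subclass of it."""
    # Collect cls and all classes below it through rdfs:subClassOf.
    classes = {cls}
    frontier = [cls]
    while frontier:
        parent = frontier.pop()
        for s, p, o in graph:
            if p == "rdfs:subClassOf" and o == parent and s not in classes:
                classes.add(s)
                frontier.append(s)
    # Every node whose rdf:type is one of those classes is an instance of cls.
    return {s for s, p, o in graph if p == "rdf:type" and o in classes}


# PA schema triples plus the rdf:type triples of the process instance.
graph: Set[Triple] = {
    ("pa:CreateProductA", "rdfs:subClassOf", "ebtic-bpm:Process"),
    ("pa:Assemble", "rdfs:subClassOf", "ebtic-bpm:Task"),
    ("pa:Test", "rdfs:subClassOf", "ebtic-bpm:Task"),
    ("Process1", "rdf:type", "pa:CreateProductA"),
    ("Step1", "rdf:type", "pa:Assemble"),
    ("Step2", "rdf:type", "pa:Assemble"),
    ("Step3", "rdf:type", "pa:Assemble"),
    ("Step4", "rdf:type", "pa:Test"),
}
```

Asking for instances of `ebtic-bpm:Task` returns all four steps, even though none of them is typed with that class directly.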

Instances – process execution data: 1 process instance
Figure: One process instance. Process1 (startTime=10:39 12/2/10, endTime=11:02 12/2/10, department=DBX) has the tasks Step1 to Step4, chained by followedBy, each with its own startTime and endTime. The Assemble tasks use and create components (e.g. Comp 2, serialNumber=003345), and the Test task is executedBy an employee (Name=Mario Rossi, EIN=566568).

Instances – process execution: many instances
Figure: Several instances of the same process (Process1, Process3, Process4, Process5, all with department=DBX), each with its own tasks, start and end times, components and employees (e.g. Mario Rossi, Mark Redi, Mario Fettuccini, John Smith).

Linking Schema and Instances
Figure: Schema information (Process, CreateProductA, Task, Assemble, Test, Component, Employee and their relations and attributes) and instance information (Process1 with its tasks Step1 to Step4, components and employee) are linked by TYPE edges, which connect each instance node to its concept.

RDF graph of a process instance
The elements in the process instance of the previous slide are defined as a set of RDF triples that extends the RDF graph defined previously by the vocabularies RDF, RDFS, EBTIC-BPM and PA.

S          P                     O
…          …                     …
Process1   rdf:type              pa:CreateProductA
Step1      rdf:type              pa:Assemble
Step2      rdf:type              pa:Assemble
Step3      rdf:type              pa:Assemble
Step4      rdf:type              pa:Test
Empl32     rdf:type              pa:Employee
Comp2      pa:serialNumber       "003345"^^xs:integer
Process1   pa:department         "DBX"^^xs:string
Step1      ebtic-bpm:startTime   "10:39 12/2/2010"^^xs:dateTime
Step1      ebtic-bpm:endTime     "10:42 12/2/2010"^^xs:dateTime
Step2      ebtic-bpm:startTime   "10:43 12/2/2010"^^xs:dateTime
Step2      ebtic-bpm:endTime     "10:48 12/2/2010"^^xs:dateTime
…          …                     …

What can I do with the RDF data model?
• An important aspect of RDF is the possibility to continuously add information to the graph. This is possible because every triple is a valid piece of RDF information that identifies nodes and connections in the RDF graph. Another important feature of RDF is that both schema-level and instance-level information are stored in the same graph.
• I can query the RDF graph!
• SPARQL is the standard query language for RDF.
• A SPARQL query can return any node of the graph as a result.

Querying the RDF graph
Queries over the triple graph are conjunctive queries.
As an example, to obtain the start times of all the assembling tasks, I need to identify the values ST (start time) that satisfy the following path in the graph:
T TYPE Assemble ∧ T startTime ST

Step1 TYPE Assemble → T = Step1 → Step1 startTime 10:39 12/2/2010
Step2 TYPE Assemble → T = Step2 → Step2 startTime 10:42 12/2/2010
Step3 TYPE Assemble → T = Step3 → Step3 startTime 10:48 12/2/2010
Step4 TYPE Test → NO PATH MATCHING!

ST = { 10:39 12/2/2010, 10:42 12/2/2010, 10:48 12/2/2010 }
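The path-matching procedure above is the evaluation of a conjunctive query, i.e. a join of triple patterns. The Python sketch below evaluates such a conjunction over a small in-memory set of string triples taken from the example. The helper names `match` and `evaluate`, and the convention that variables start with "?" (borrowed from SPARQL), are assumptions of the sketch. A real triple store would use indexes instead of scanning every triple. Step4's start time is not given in the example, so it is omitted.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None  # constant does not match
    return extended


def evaluate(patterns: List[Triple], graph: Set[Triple]) -> List[Binding]:
    """Join the triple patterns left to right (a conjunctive query)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


GRAPH: Set[Triple] = {
    ("Step1", "TYPE", "Assemble"), ("Step1", "startTime", "10:39 12/2/2010"),
    ("Step2", "TYPE", "Assemble"), ("Step2", "startTime", "10:42 12/2/2010"),
    ("Step3", "TYPE", "Assemble"), ("Step3", "startTime", "10:48 12/2/2010"),
    ("Step4", "TYPE", "Test"),
}

# T TYPE Assemble ∧ T startTime ST
query = [("?T", "TYPE", "Assemble"), ("?T", "startTime", "?ST")]
start_times = {b["?ST"] for b in evaluate(query, GRAPH)}
```

`start_times` holds the three values of ST listed above. Step4 never gets past the first pattern, which is the "no path matching" case.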

What can I do with the EBTIC-BPM vocabulary?
• The use of the EBTIC-BPM vocabulary makes the applications that generate business process data independent of the applications that consume it.
• Assuming that the EBTIC-BPM vocabulary is present in the RDF graph, process discovery and analysis of domain-specific extensions, which may also be created at run time by third-party applications, are possible just with the use of SPARQL queries.

Sample deployment: an application is used to capture process execution data. A listener stores the triples in a triple store and provides a SPARQL query interface through which a client application can analyse the process information.

A sample client application is a Process Visualizer that is able to display domain-specific process information just by constructing SPARQL queries, with knowledge of the EBTIC-BPM vocabulary only. The numbers in the boxes correspond to the queries defined in the next slides.
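The listener's job of turning captured execution data into triples can be sketched as follows. This is a hypothetical illustration, not the deployed component. The event fields (`process_id`, `task_id`, `task_type`, `start`, `end`, `previous_task`), the function `event_to_triples` and the class `Listener` are invented for the example, and the triple store is reduced to an in-memory Python set.

```python
from typing import Any, Dict, Set, Tuple

Triple = Tuple[str, str, str]


def event_to_triples(event: Dict[str, Any]) -> Set[Triple]:
    """Translate one (hypothetical) task-execution event into EBTIC-BPM triples."""
    task = event["task_id"]
    triples = {
        (event["process_id"], "ebtic-bpm:hasTask", task),
        (task, "rdf:type", event["task_type"]),
        (task, "ebtic-bpm:startTime", event["start"]),
    }
    if event.get("end"):
        triples.add((task, "ebtic-bpm:endTime", event["end"]))
    previous = event.get("previous_task")
    if previous:
        # Record the control flow in both directions.
        triples.add((previous, "ebtic-bpm:followedBy", task))
        triples.add((task, "ebtic-bpm:precededBy", previous))
    return triples


class Listener:
    """Receives events and appends the resulting triples to an in-memory store."""

    def __init__(self) -> None:
        self.store: Set[Triple] = set()

    def on_event(self, event: Dict[str, Any]) -> Set[Triple]:
        new_triples = event_to_triples(event) - self.store
        self.store |= new_triples
        return new_triples  # only what was actually added
```

Every triple is self-contained, so the store only ever grows by set union. The EBTIC-BPM and PA schema triples could be loaded into the same set.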

Client SPARQL Query 1
SELECT ?process
WHERE { ?process rdfs:subClassOf ebtic-bpm:Process.}

This query returns all the concepts that directly extend the basic ebtic-bpm:Process class.
With the data in the previous examples, the variable ?process contains the value pa:CreateProductA.

SELECT ?processID ?startTime ?endTime
WHERE { ?processID rdf:type pa:CreateProductA.
?processID ebtic-bpm:startTime ?startTime.
OPTIONAL { ?processID ebtic-bpm:endTime ?endTime.}}

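This query returns information (?processID, ?startTime, ?endTime) about the instances of the process pa:CreateProductA. The end time is matched inside OPTIONAL, so instances that are still running (no ebtic-bpm:endTime yet) are returned too, with ?endTime unbound.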
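OPTIONAL behaves like a left outer join. Every process instance that matches the required patterns is kept, and ?endTime is bound only where an end time exists. The following is a small Python sketch of that behaviour on already-computed solution rows. The process identifiers `ProcessA` and `ProcessB`, their times, and the function name `optional_join` are hypothetical. The sketch joins on a single shared variable only, whereas SPARQL checks compatibility on all shared variables.

```python
from typing import Dict, List

Row = Dict[str, str]


def optional_join(required: List[Row], optional: List[Row], var: str) -> List[Row]:
    """Left outer join on one shared variable, as OPTIONAL does in SPARQL."""
    result: List[Row] = []
    for row in required:
        matches = [extra for extra in optional if extra.get(var) == row[var]]
        if matches:
            result.extend({**row, **extra} for extra in matches)
        else:
            result.append(dict(row))  # keep the row; optional variables stay unbound
    return result


# Hypothetical solutions for the two required patterns and for the OPTIONAL one.
required = [
    {"?processID": "ProcessA", "?startTime": "10:39 12/2/2010"},
    {"?processID": "ProcessB", "?startTime": "11:10 12/2/2010"},  # still running
]
end_times = [{"?processID": "ProcessA", "?endTime": "11:02 12/2/2010"}]

rows = optional_join(required, end_times, "?processID")
```

Without OPTIONAL, the running instance `ProcessB` would simply disappear from the result, which would defeat real-time monitoring.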

SELECT ?attribute ?value
WHERE { pa:01 ?attribute ?value.
FILTER (?attribute != rdf:type)}

This query will return all the tasks, process attributes and their
values associated with a specific process instance (pa:01 in
this case).

Client SPARQL query for flexible visualization
Display the process instance workflow (create a GraphML document from the query results).

SELECT ?PID ?startTime ?endTime ?taskID ?taskType ?follBy ?precBy
WHERE { ?PID ebtic-bpm:hasTask ?taskID.
?taskID rdf:type ?taskType.
?PID ebtic-bpm:startTime ?startTime.
OPTIONAL {?PID ebtic-bpm:endTime ?endTime.}.
OPTIONAL { ?taskID ebtic-bpm:followedBy ?follBy.}.
OPTIONAL { ?taskID ebtic-bpm:precededBy ?precBy.}.
FILTER (?PID = pa:01)}
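The sketch below shows one way to turn the rows returned by this query into a GraphML document, using Python's standard `xml.etree.ElementTree`. It assumes each row is a dictionary keyed by the SPARQL variable names (`?taskID`, `?taskType`, `?follBy`); the actual row format depends on the client library used. The function name `rows_to_graphml` and the GraphML key `taskType` are hypothetical choices. Each task becomes a node carrying its type, and each followedBy link becomes a directed edge.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
ET.register_namespace("", GRAPHML_NS)  # serialise GraphML as the default namespace


def _tag(name: str) -> str:
    return f"{{{GRAPHML_NS}}}{name}"


def rows_to_graphml(rows: Iterable[Dict[str, str]]) -> str:
    """Build a GraphML workflow graph from workflow-query result rows."""
    root = ET.Element(_tag("graphml"))
    # Declare a node attribute that will hold the task type.
    ET.SubElement(root, _tag("key"), {"id": "taskType", "for": "node",
                                      "attr.name": "taskType", "attr.type": "string"})
    graph = ET.SubElement(root, _tag("graph"),
                          {"id": "workflow", "edgedefault": "directed"})

    nodes, edges = set(), set()
    for row in rows:
        task = row["?taskID"]
        if task not in nodes:  # one node per task, even if it appears in many rows
            nodes.add(task)
            node = ET.SubElement(graph, _tag("node"), {"id": task})
            ET.SubElement(node, _tag("data"), {"key": "taskType"}).text = row["?taskType"]
        successor = row.get("?follBy")
        if successor and (task, successor) not in edges:
            edges.add((task, successor))
            ET.SubElement(graph, _tag("edge"), {"source": task, "target": successor})
    return ET.tostring(root, encoding="unicode")
```

Because the query uses OPTIONAL for ?follBy, the last task of a workflow arrives without a successor and becomes a node with no outgoing edge.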

(7) Choose an alternative representation whenever a task attribute exists (pa:createComponent in this case).

SELECT ?PID ?startTime ?endTime ?taskID
?taskType ?followedBy ?precededBy ?alternativeName
WHERE { ?PID ebtic-bpm:hasTask ?taskID.
?taskID rdf:type ?taskType.
?PID ebtic-bpm:startTime ?startTime.
OPTIONAL { ?PID ebtic-bpm:endTime ?endTime.}.
OPTIONAL { ?taskID ebtic-bpm:followedBy ?followedBy.}.
OPTIONAL { ?taskID ebtic-bpm:precededBy ?precededBy.}.
OPTIONAL { ?taskID pa:createComponent ?alternativeName.}.
FILTER (?PID = pa:01)}

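Choose a task-specific alternative representation (use of UNION).
SELECT ?PID ?startTime ?endTime ?taskID ?taskType ?follBy ?precBy ?altName
WHERE {{ ?PID ebtic-bpm:hasTask ?taskID.
?taskID rdf:type ?taskType.
OPTIONAL { ?taskID pa:createComponent ?altName.}.
FILTER (?PID = pa:01 && ?taskType = pa:Assemble)}
UNION
{ ?PID ebtic-bpm:hasTask ?taskID.
?taskID rdf:type ?taskType.
OPTIONAL { ?taskID pa:executedBy ?altName.}.
FILTER (?PID = pa:01 && ?taskType = pa:Test)}
UNION
{ ?PID ebtic-bpm:hasTask ?taskID.
?taskID rdf:type ?taskType.
FILTER (?PID = pa:01 && ?taskType != pa:Assemble
&& ?taskType != pa:Test)}
?PID ebtic-bpm:startTime ?startTime.
OPTIONAL { ?PID ebtic-bpm:endTime ?endTime.}.
OPTIONAL { ?taskID ebtic-bpm:followedBy ?follBy.}.
OPTIONAL { ?taskID ebtic-bpm:precededBy ?precBy.}.}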

Real-time aspects
• The queries presented so far can be registered in the triple store as continuous queries, and the application will be notified of every new result.
• Assuming that the process monitor continuously intercepts process execution data and translates it into triples, the visualisation application is able to monitor the processes in real time.
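A continuous query can be approximated by re-running each registered query after every insertion and notifying the application only of results it has not seen before. The Python sketch below does exactly that. It is a naive illustration under assumptions: the class `ContinuousQueryStore`, its methods and the example query `started_tasks` are hypothetical. A production triple store would evaluate registered queries incrementally rather than from scratch.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Query = Callable[[Set[Triple]], Iterable]
Callback = Callable[[set], None]


class ContinuousQueryStore:
    """In-memory triple store that pushes new query results to subscribers."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self._subscriptions: List[Tuple[Query, Callback, set]] = []

    def register(self, query: Query, callback: Callback) -> None:
        self._subscriptions.append((query, callback, set()))

    def add(self, triples: Iterable[Triple]) -> None:
        self.triples |= set(triples)
        for query, callback, reported in self._subscriptions:
            fresh = set(query(self.triples)) - reported
            if fresh:
                reported |= fresh  # remember what this subscriber has already seen
                callback(fresh)


def started_tasks(triples: Set[Triple]) -> Set[str]:
    """Example query: every node that has an ebtic-bpm:startTime."""
    return {s for s, p, _ in triples if p == "ebtic-bpm:startTime"}
```

In this sketch, adding an end time for a task that was already reported triggers no notification. Only a newly started task does.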

Real Time Flexible Visualization

Concluding Remarks
• Presented an extremely extendible and flexible data representation model, based on RDF, oriented towards real-time business process monitoring and discovery.
• Demonstrated that this approach allows process discovery and analysis of domain-specific extensions, which may also be created at run time by third-party applications, just with the use of SPARQL queries.
• Future work in this direction will be to develop a set of non-invasive monitoring and analytical applications that will allow us to deploy and test this approach within any enterprise-scale environment.
