These slides present a promising data representation model for real-time monitoring of business processes. The main benefit of this representation is that it is transparent to the data creation and analysis processes and that it is extensible at run time.
The model is based on a shared vocabulary defined using the standard RDF representation, which allows independence between applications.
This model is a novel approach to real-time process data representation and paves the way for a completely new breed of applications for business process analysis.
3. Limits of current approaches
• Experience collected by deploying Business Process Mining tools in enterprise
environments (BT, Etisalat) highlighted the need for a more flexible data layer and
real-time capabilities.
• Offline analytical tools;
• Rigid data model;
• BPML and BPEL require a large effort from the enterprise;
• Tight coupling between the applications that capture the information and the ones
that analyse it;
• Need to deal with increasingly complex systems;
• Monitoring tools have to be flexible and robust enough to also process
information that is absent or unknown at the time the data model is defined.
4. Research Challenges for a new
generation of BPM tools
• Flexibility: representing the process model using a formalism
that allows an increased degree of flexibility, so that it can also
address situations where the process definition and its data are not
known a priori.
• Handling very large datasets: dealing with large amounts of
information while maintaining a high level of performance.
• Real-time performance: being able to respond to a continuous flow of
events. Keep the representation simple so that it can be queried
efficiently.
5. Novel representation of Process
Model
• The need for a less constraining and rigid model has
arisen;
• Represent the model with a formalism that allows
the degree of flexibility required;
• Keep the representation simple: this allows the level
of flexibility we need but, on the other hand,
increases complexity.
6. RDF
• We use RDF to define the process model;
• RDF is a standard for defining vocabularies and is at
the basis of the Semantic Web vision. It is composed
of three elements: concepts, relations between
concepts and attributes of concepts;
• RDF is a data representation which is extremely
extensible, flexible and publicly available;
7. RDF
Concepts, relations and attributes are modelled as a labelled directed graph, defined by a set of triples
<s,p,o>
where s is called the subject, p the predicate and o the object.
Formally, a graph G can be defined as:
G ⊆ (U ∪ B ∪ L) × U × (U ∪ B ∪ L)
where:
U is an infinite set of constant values (called URI references); their semantics are well defined,
for example by the RDF and RDFS vocabularies;
B is an infinite set of identifiers (called blank nodes) which identify instantiations of concepts. Elements in
this set do not have defined semantics;
L is an infinite set of values (called literals). Elements in this set do not have defined semantics.
The elements of a triple <s,p,o> are respectively: s ∈ (U ∪ B ∪ L), p ∈ U, and o ∈ (U ∪ B ∪ L).
The elements in U define the schema or vocabulary, while the elements in B and L are used to define
instances.
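The definition above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Term` type, the `KINDS` set and the helper names are hypothetical, and the node identifiers are made up for the example.

```python
from typing import NamedTuple

class Term(NamedTuple):
    kind: str   # "U" = URI reference, "B" = blank node, "L" = literal
    value: str

KINDS = {"U", "B", "L"}

def is_valid_triple(s: Term, p: Term, o: Term) -> bool:
    # s in (U ∪ B ∪ L), p in U, o in (U ∪ B ∪ L), following the slide's definition
    return s.kind in KINDS and p.kind == "U" and o.kind in KINDS

def add_triple(graph: set, s: Term, p: Term, o: Term) -> None:
    # A graph is just a set of valid triples, so adding one is a set insertion.
    if not is_valid_triple(s, p, o):
        raise ValueError("not a valid triple")
    graph.add((s, p, o))

graph = set()
step = Term("B", "_:step1")  # hypothetical blank node standing for a task instance
add_triple(graph, step, Term("U", "rdf:type"), Term("U", "ebtic-bpm:Task"))
add_triple(graph, step, Term("U", "ebtic-bpm:startTime"), Term("L", "10:39 12/2/2010"))
print(len(graph))
```

Because the graph is a plain set, adding a triple twice has no effect, which matches the set-based definition.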
8. An Extensible Business Process
Data Model
Basic Process Model representation (schema level)
• Concepts: Process, Task
• Relations: hasTask (Process to Task); precededBy, followedBy, hasSubtask (Task to Task)
• Attributes: startTime, endTime (on both Process and Task)
9. RDF graph of the basic process model: EBTIC-BPM vocabulary
The elements in the conceptual model of the previous slide are defined as a vocabulary
EBTIC-BPM that extends the set U, which already contains the vocabularies RDF and RDFS.
S                     P            O
ebtic-bpm:hasTask     rdfs:range   ebtic-bpm:Task
ebtic-bpm:hasTask     rdfs:domain  ebtic-bpm:Process
ebtic-bpm:precededBy  rdfs:range   ebtic-bpm:Task
ebtic-bpm:precededBy  rdfs:domain  ebtic-bpm:Task
ebtic-bpm:followedBy  rdfs:range   ebtic-bpm:Task
ebtic-bpm:followedBy  rdfs:domain  ebtic-bpm:Task
ebtic-bpm:hasSubtask  rdfs:range   ebtic-bpm:Task
ebtic-bpm:hasSubtask  rdfs:domain  ebtic-bpm:Task
ebtic-bpm:startTime   rdfs:domain  ebtic-bpm:Process
ebtic-bpm:startTime   rdfs:domain  ebtic-bpm:Task
ebtic-bpm:startTime   rdfs:range   xs:dateTime
ebtic-bpm:endTime     rdfs:domain  ebtic-bpm:Process
ebtic-bpm:endTime     rdfs:domain  ebtic-bpm:Task
ebtic-bpm:endTime     rdfs:range   xs:dateTime
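As a rough illustration of how an application might read this vocabulary, the sketch below stores a few of the schema triples as Python tuples and looks up which properties apply to a concept. The function name `properties_of` is hypothetical, and the lookup is a plain scan over the triples rather than RDFS inference.

```python
# A subset of the EBTIC-BPM schema triples, written as (S, P, O) tuples.
SCHEMA = {
    ("ebtic-bpm:hasTask", "rdfs:domain", "ebtic-bpm:Process"),
    ("ebtic-bpm:hasTask", "rdfs:range", "ebtic-bpm:Task"),
    ("ebtic-bpm:followedBy", "rdfs:domain", "ebtic-bpm:Task"),
    ("ebtic-bpm:followedBy", "rdfs:range", "ebtic-bpm:Task"),
    ("ebtic-bpm:startTime", "rdfs:domain", "ebtic-bpm:Process"),
    ("ebtic-bpm:startTime", "rdfs:domain", "ebtic-bpm:Task"),
    ("ebtic-bpm:startTime", "rdfs:range", "xs:dateTime"),
}

def properties_of(concept, schema):
    """Map each property whose rdfs:domain is `concept` to its declared ranges."""
    result = {}
    for prop, pred, obj in schema:
        if pred == "rdfs:domain" and obj == concept:
            result[prop] = sorted(
                rng for p2, pred2, rng in schema
                if p2 == prop and pred2 == "rdfs:range"
            )
    return result

print(properties_of("ebtic-bpm:Task", SCHEMA))
```

A consumer that only knows the EBTIC-BPM terms could use a lookup like this to decide which attributes and relations to display for a task.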
10. Extending the EBTIC-BPM vocabulary
Domain Specific Process Model representation (schema level)
• Concepts: CreateProductA (subConceptOf Process); Assemble and Test (subConceptOf Task); Component; Employee
• Relations: hasTask, followedBy (from the basic model); useComponent, createComponent, testedComponent, executedBy
• Attributes: startTime, endTime (from the basic model); department, serialNumber, EIN
11. RDF graph of the domain specific extension vocabulary
The elements in the conceptual model of the previous slide are defined as a vocabulary
PA that extends the set U, which already contains the vocabularies RDF, RDFS and EBTIC-BPM.
S                   P                O
…                   …                …
pa:CreateProductA   rdfs:subClassOf  ebtic-bpm:Process
pa:Assemble         rdfs:subClassOf  ebtic-bpm:Task
pa:Test             rdfs:subClassOf  ebtic-bpm:Task
pa:useComponent     rdfs:domain      pa:Assemble
pa:useComponent     rdfs:range       pa:Component
pa:createComponent  rdfs:domain      pa:Assemble
pa:createComponent  rdfs:range       pa:Component
pa:testedComponent  rdfs:domain      pa:Test
pa:testedComponent  rdfs:range       pa:Component
pa:executedBy       rdfs:domain      pa:Test
pa:executedBy       rdfs:range       pa:Employee
pa:department       rdfs:domain      pa:CreateProductA
pa:department       rdfs:range       xs:string
pa:serialNumber     rdfs:domain      pa:Component
pa:serialNumber     rdfs:range       xs:integer
pa:EIN              rdfs:domain      pa:Employee
pa:EIN              rdfs:range       xs:integer
14. Linking Schema and Instances
[Figure: schema information (Process, Task, CreateProductA, Assemble, Test, Component, Employee) linked through TYPE edges to instance information. Process1 (department=DBX, processStartTime=10:39 12/2/10, processEndTime=11:02 12/2/10) has four tasks, Step1 to Step4, connected by followedBy. Their start and end times are 10:39 to 10:42, 10:43 to 10:48, 10:50 to 10:54 and 10:55 to 11:02 on 12/2/10. The steps are linked by useComponent and createComponent to the components Comp.1, Comp.2, Comp.35 and Comp.699, each with a serialNumber, and by executedBy to the employee Empl.32 (Name=Mario Rossi, EIN=566568).]
15. RDF graph of a process instance
The elements in the process instance of the previous slide are defined as a set of RDF
triples that extend the RDF graph defined previously by the vocabularies RDF, RDFS,
EBTIC-BPM and PA.
S         P                    O
…         …                    …
Process1  rdf:type             pa:CreateProductA
Step1     rdf:type             pa:Assemble
Step2     rdf:type             pa:Assemble
Step3     rdf:type             pa:Assemble
Step4     rdf:type             pa:Test
Empl32    rdf:type             pa:Employee
Comp2     pa:serialNumber      "003345"^^xs:integer
Comp699   pa:serialNumber      "00445"^^xs:integer
Comp35    pa:serialNumber      "00800"^^xs:integer
Comp1     pa:serialNumber      "003234"^^xs:integer
Process1  pa:department        "DBX"^^xs:string
Step1     ebtic-bpm:startTime  "10:39 12/2/2010"^^xs:dateTime
Step1     ebtic-bpm:endTime    "10:42 12/2/2010"^^xs:dateTime
Step2     ebtic-bpm:startTime  "10:43 12/2/2010"^^xs:dateTime
Step2     ebtic-bpm:endTime    "10:48 12/2/2010"^^xs:dateTime
…         …                    …
16. What can I do with the RDF data
model?
• An important aspect of RDF is the possibility of continuously adding
information to the graph. This is possible because every
triple is a valid piece of RDF information that identifies nodes and
connections in the RDF graph. Another important feature of RDF is
that both schema-level and instance-level information are stored in the
same graph.
• I can query the RDF graph!
• SPARQL is the standard query language for RDF.
• A SPARQL query can return any node in the graph as a result.
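The points above can be illustrated with a minimal Python sketch in which the graph is a set of tuples and schema-level and instance-level triples live side by side. The `match` helper is a hypothetical stand-in for a triple-pattern lookup, not a SPARQL engine.

```python
def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

graph = {
    ("pa:Assemble", "rdfs:subClassOf", "ebtic-bpm:Task"),  # schema level
    ("Step1", "rdf:type", "pa:Assemble"),                  # instance level
}

# Later, a new task type and an instance of it arrive.
# They are simply added; nothing already in the graph has to change.
graph.add(("pa:Test", "rdfs:subClassOf", "ebtic-bpm:Task"))
graph.add(("Step4", "rdf:type", "pa:Test"))

task_types = sorted(s for s, _, _ in match(graph, p="rdfs:subClassOf", o="ebtic-bpm:Task"))
print(task_types)  # ['pa:Assemble', 'pa:Test']
```

The same lookup that found one task type before the additions finds two afterwards, without any change to the querying code.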
17. Querying the RDF graph
Queries over the triples graph are conjunctive queries.
As an example, if I want to obtain all the start times of the
assembling tasks, I need to identify the values ST (start time)
that satisfy the following path in the graph:
T TYPE Assemble ∧ T startTime ST
Step1 TYPE Assemble   T = Step1   Step1 startTime 10:39 12/2/2010
Step2 TYPE Assemble   T = Step2   Step2 startTime 10:42 12/2/2010
Step3 TYPE Assemble   T = Step3   Step3 startTime 10:48 12/2/2010
Step4 TYPE Test       NO PATH MATCHING!
ST = { 10:39 12/2/2010, 10:42 12/2/2010, 10:48 12/2/2010 }
[Figure: instance graph showing Step1 to Step4, their TYPE edges to Assemble and Test, and their startTime values.]
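The conjunctive query on this slide can be evaluated with a small backtracking matcher. The sketch below is one possible way to do it, not how a SPARQL engine works internally. The helper names `unify` and `solve` are hypothetical, terms starting with "?" are treated as variables, and the start time given to Step4 is an assumption taken from the instance figure.

```python
graph = {
    ("Step1", "TYPE", "Assemble"), ("Step1", "startTime", "10:39 12/2/2010"),
    ("Step2", "TYPE", "Assemble"), ("Step2", "startTime", "10:42 12/2/2010"),
    ("Step3", "TYPE", "Assemble"), ("Step3", "startTime", "10:48 12/2/2010"),
    ("Step4", "TYPE", "Test"),     ("Step4", "startTime", "10:55 12/2/2010"),  # assumed value
}

def unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def solve(graph, patterns, binding=None):
    """Return every variable binding that satisfies all patterns (a conjunction)."""
    binding = {} if binding is None else binding
    if not patterns:
        return [binding]
    results = []
    for triple in graph:
        new = unify(patterns[0], triple, binding)
        if new is not None:
            results.extend(solve(graph, patterns[1:], new))
    return results

solutions = solve(graph, [("?T", "TYPE", "Assemble"), ("?T", "startTime", "?ST")])
ST = sorted(b["?ST"] for b in solutions)
print(ST)  # ['10:39 12/2/2010', '10:42 12/2/2010', '10:48 12/2/2010']
```

Step4 has a start time but fails the first conjunct, so it contributes no binding, which corresponds to the "NO PATH MATCHING!" row.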
18. What can I do with EBTIC-BPM
vocabulary?
• The use of the EBTIC-BPM vocabulary allows independence
between applications that generate business process data
and applications that consume it.
• Provided that the EBTIC-BPM vocabulary is present in the
RDF graph, SPARQL queries alone are enough for process discovery
and for the analysis of domain-specific extensions, which may also
be created at run time by third-party applications.
19. Sample deployment: an application
is used to capture process
execution data.
A listener stores the triples in a
triple store and provides a SPARQL
query interface so that a client
application can analyse
the process information.
20. A sample client application is a Process Visualizer that is able to display domain-specific
process information just by constructing SPARQL queries, with knowledge of the EBTIC-BPM
vocabulary only. The numbers in the boxes correspond to the queries defined in the next
slides.
21. Client SPARQL Query 1
SELECT ?process
WHERE { ?process rdfs:subClassOf ebtic-bpm:Process.}
This query will return all the concepts extending the basic
ebtic-bpm:Process class.
The variable ?process will contain the value
pa:CreateProductA.
(from the data in the previous examples)
22. Client SPARQL Query 2
SELECT ?processID ?startTime ?endTime
WHERE { ?processID rdf:type pa:CreateProductA.
?processID ebtic-bpm:startTime ?startTime.
OPTIONAL { ?processID ebtic-bpm:endTime ?endTime.}}
This query returns information (?processID ?startTime
?endTime) about the instances of the process
pa:CreateProductA.
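The effect of OPTIONAL in this query can be sketched in Python as a left join: an instance without an end time still produces a row, with the missing value left empty. This is a hand-written equivalent under stated assumptions, not the output of a SPARQL engine. The function names are hypothetical, pa:02 is a made-up instance that is still running, and the timestamps are illustrative.

```python
graph = {
    ("pa:01", "rdf:type", "pa:CreateProductA"),
    ("pa:01", "ebtic-bpm:startTime", "10:39 12/2/2010"),
    ("pa:01", "ebtic-bpm:endTime", "11:02 12/2/2010"),
    ("pa:02", "rdf:type", "pa:CreateProductA"),           # hypothetical running instance
    ("pa:02", "ebtic-bpm:startTime", "11:10 12/2/2010"),  # no endTime yet
}

def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]

def query2(graph):
    rows = []
    for pid, pred, obj in graph:
        if pred == "rdf:type" and obj == "pa:CreateProductA":
            # Required pattern: an instance without a startTime yields no row.
            for start in objects(graph, pid, "ebtic-bpm:startTime"):
                # OPTIONAL pattern: keep the row even when no endTime exists.
                for end in objects(graph, pid, "ebtic-bpm:endTime") or [None]:
                    rows.append((pid, start, end))
    return sorted(rows, key=lambda row: (row[0], row[1]))

for row in query2(graph):
    print(row)
```

This is what lets a monitoring client list processes that have started but not yet finished.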
23. Client SPARQL Query 3
SELECT ?attribute ?value
WHERE { pa:01 ?attribute ?value.
FILTER (?attribute != rdf:type)}
This query will return all the tasks, process attributes and their
values associated with a specific process instance (pa:01 in
this case).
24. Client SPARQL query for flexible visualization
Display the process instance workflow (create a GraphML document from the query results)
SELECT ?PID ?startTime ?endTime ?taskID ?taskType ?follBy ?precBy
WHERE { ?PID ebtic-bpm:hasTask ?taskID.
?taskID rdf:type ?taskType.
?PID ebtic-bpm:startTime ?startTime.
OPTIONAL {?PID ebtic-bpm:endTime ?endTime.}.
OPTIONAL { ?taskID ebtic-bpm:followedBy ?follBy.}.
OPTIONAL { ?taskID ebtic-bpm:precededBy ?precBy.}.
FILTER (?PID = pa:01)}
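One way to turn the rows returned by such a query into a GraphML document is sketched below with the Python standard library. The rows are hypothetical and reduced to three columns (?taskID, ?taskType, ?follBy), the followedBy chain from Step1 to Step4 is an assumption, and `to_graphml` is an illustrative helper, not part of the presented system.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# Hypothetical result rows: (taskID, taskType, follBy); follBy is None when unbound.
rows = [
    ("Step1", "pa:Assemble", "Step2"),
    ("Step2", "pa:Assemble", "Step3"),
    ("Step3", "pa:Assemble", "Step4"),
    ("Step4", "pa:Test", None),
]

def to_graphml(rows):
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    graph = ET.SubElement(root, "graph", id="process", edgedefault="directed")
    seen = set()
    for task_id, _task_type, _foll_by in rows:
        if task_id not in seen:      # a task may appear in several result rows
            seen.add(task_id)
            ET.SubElement(graph, "node", id=task_id)
    for task_id, _task_type, foll_by in rows:
        if foll_by is not None:      # an unbound OPTIONAL produces no edge
            ET.SubElement(graph, "edge", source=task_id, target=foll_by)
    return ET.tostring(root, encoding="unicode")

print(to_graphml(rows))
```

The task type is ignored here; carrying it into the document would need GraphML key and data elements, which are left out to keep the sketch short.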
25. Client SPARQL query for flexible visualization
Choose an alternative representation whenever a task attribute exists (pa:createComponent in this
case).
SELECT ?PID ?startTime ?endTime ?taskID
?taskType ?followedBy ?precededBy ?alternativeName
WHERE { ?PID ebtic-bpm:hasTask ?taskID.
?taskID rdf:type ?taskType.
?PID ebtic-bpm:startTime ?startTime.
OPTIONAL {?PID ebtic-bpm:endTime ?endTime.}.
OPTIONAL { ?taskID ebtic-bpm:followedBy ?followedBy.}.
OPTIONAL { ?taskID ebtic-bpm:precededBy ?precededBy.}.
OPTIONAL { ?taskID pa:createComponent ?alternativeName.}.
FILTER (?PID = pa:01)}
28. Real time aspects
• The queries presented so far can be registered
in the triple store as continuous queries, and the
application will be notified of every new result.
• Assuming that the process monitor continuously
intercepts process execution data and translates it into
triples, the visualisation application is able to
monitor the processes in real time.
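The notification mechanism can be pictured with the following minimal sketch. It assumes a hypothetical in-memory store, `TinyStore`, that supports only single triple patterns. Real triple stores expose continuous queries through their own interfaces, which are not shown here.

```python
class TinyStore:
    """Hypothetical in-memory store; not the API of any real triple store."""

    def __init__(self):
        self.triples = set()
        self.subscriptions = []  # (pattern, callback) pairs

    def register(self, pattern, callback):
        """Register a continuous query: a triple pattern with None as wildcard."""
        self.subscriptions.append((pattern, callback))

    def add(self, triple):
        if triple in self.triples:
            return  # already known, so there is no new result to report
        self.triples.add(triple)
        for pattern, callback in self.subscriptions:
            if all(p is None or p == t for p, t in zip(pattern, triple)):
                callback(triple)

store = TinyStore()
notifications = []
store.register((None, "ebtic-bpm:startTime", None), notifications.append)

# The process monitor translates execution events into triples as they happen.
store.add(("Step1", "rdf:type", "pa:Assemble"))
store.add(("Step1", "ebtic-bpm:startTime", "10:39 12/2/2010"))
store.add(("Step1", "ebtic-bpm:startTime", "10:39 12/2/2010"))  # duplicate, ignored
print(notifications)
```

The client is called back once, for the single new triple that matches its registered pattern.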
30. Concluding Remarks
• Presented an extremely extensible and flexible data representation model,
based on RDF and oriented towards real-time business process monitoring
and discovery.
• Demonstrated that this approach allows process discovery and analysis of
domain-specific extensions, which may also be created at run time by
third-party applications, using SPARQL queries alone.
• Future work in this direction will be to develop a set of non-invasive
monitoring and analytical applications that will allow us to deploy and
test this approach within any enterprise-scale environment.