Process modeling is an important activity in business transformation projects. Free-form diagramming tools, such as PowerPoint and Visio, are the preferred tools for creating process models. However, the designs created using such tools are informal sketches, which are not amenable to automated analysis. Formal models, although desirable, are rarely created (during early design) because of the usability problems associated with formal-modeling tools. In this paper, we present an approach for automatically inferring formal process models from informal business process diagrams, so that the strengths of both types of tools can be leveraged. We discuss different sources of structural and semantic ambiguities, commonly present in informal diagrams, which pose challenges for automated inference. Our approach consists of two phases. First, it performs structural inference to identify the set of nodes and edges that constitute a process model. Then, it performs semantic interpretation, using a classifier that mimics human reasoning to associate modeling semantics with the nodes and edges. We discuss both supervised and unsupervised techniques for training such a classifier. Finally, we report results of empirical studies, conducted using flow diagrams from real projects, which illustrate the effectiveness of our approach.
From Informal Process Diagrams To Formal Process Models
1. IBM Research - India, New Delhi, India‡
IBM TJ Watson Research Center, New York, USA†
2. Free-form diagramming tools (e.g., Visio, PowerPoint) are preferred for
creating initial process models
Ease of use, Intuitiveness
Ubiquity
Doesn’t hinder your creativity
Process modeling software (e.g., WBM, ARIS) creates models with formal
underpinnings
Allow formal analysis, model checking
Process Reuse
Process Improvement
Traceability with realized executable process
Sound, automatic approach to convert process diagrams to formal process
models is essential
A bridge between the worlds of diagramming and formal modeling
September 14, 2010, International Conference on Business Process Management, Hoboken, NJ, USA
3. Challenges
Ambiguities in diagrams
Limitation of existing capabilities
Approach
Structure Inference
Semantic Interpretation
Empirical Study
Related Work & Future directions
4. Humans can interpret different visual cues in
drawings to correctly resolve the structure and
semantics of the models, but machines cannot
do the same!
5. Connectors not glued to shapes at their endpoints
[Figure: three diagram examples, each annotated "Missing Edge"]
7. Same shape conveys multiple semantics
Same semantics conveyed by multiple shapes
8. Popular BPM tools such as WebSphere Business
Modeler, ARIS, Lombardi, and Telelogic System Architect
have Visio import capabilities
Create imprecise flow structure when faced with
structural ambiguities
Employ a simple mapping (fixed or pluggable) from a
set of diagram shapes to a target set of process
semantics to interpret semantics
Such an approach cannot deal with under-specification
Building an exhaustive mapping is painful in presence of
over-specification
9. Process Diagram Parsing
Parse information about diagram shapes
Diagram attributes such as coordinates, dimensions, text, geometry
Use format-specific SDKs or parse XML formats
Output: Shapes & Attributes
Structure Inference
Precisely determine the underlying flow graph
Deal with structural ambiguities
Extract features for each node and edge
Output: Flow Graph
Semantic Interpretation
Assign process semantics to every node and edge in the flow graph using a trained classifier
Supervised and unsupervised schemes to train such a classifier
Output: Process Model
11. [Figure: two example diagrams over nodes A, B, C, and D]
12. Uses notion of connection points created at node-line and line-line intersections
Assign direction to connection points
Starting at connection points attached to nodes, propagate their directions along paths in which the directions are consistent and identify the reached nodes
Create edges if connection point at reached node has a different direction
[Figure: nodes A, B, C, D with connection points C1 to C8, labeled SRC, TGT, NEU, or UNK]
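The propagation idea on this slide can be sketched roughly as follows. This is a simplified illustration, not the tool's actual algorithm: the data layout (`points`, `links`), the function name `infer_edges`, and the rule that only SRC points start a walk and only NEU or UNK intersections are walked through are all assumptions made for the example.

```python
from collections import deque

SRC, TGT, NEU, UNK = "SRC", "TGT", "NEU", "UNK"

def infer_edges(points, links):
    """Infer directed edges between nodes from connection points.

    points: {point_id: (node_or_None, direction)}; node is None for a
            line-line intersection (hypothetical layout).
    links:  list of (point_id, point_id) pairs joined by a line segment.
    """
    neighbors = {p: set() for p in points}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    edges = set()
    for start, (start_node, start_dir) in points.items():
        # Assumption: only source points attached to a node start a walk.
        if start_node is None or start_dir != SRC:
            continue
        seen, queue = {start}, deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current]:
                if nxt in seen:
                    continue
                seen.add(nxt)
                node, direction = points[nxt]
                if node is None:
                    # Keep walking only through direction-compatible points.
                    if direction in (NEU, UNK):
                        queue.append(nxt)
                elif node != start_node and direction != start_dir:
                    # Reached another node whose point has a different direction.
                    edges.add((start_node, node))
    return edges

# One line leaves A, branches at an intersection, and ends with arrowheads at B and C.
points = {"c1": ("A", SRC), "j": (None, NEU), "c2": ("B", TGT), "c3": ("C", TGT)}
links = [("c1", "j"), ("j", "c2"), ("j", "c3")]
result = infer_edges(points, links)
```

Here a single branching connector yields two edges, A to B and A to C, even though no one connector shape links A directly to either target.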
14. Train a classifier to mimic human reasoning
to decide process semantics
Features used for classification:
Relational: Indegree, Outdegree, Count of nodes
contained within
Geometric: Shape name, Count of horizontal,
vertical, diagonal lines
Textual: Count of cue words for every target entity
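As a rough illustration of the feature categories above, the sketch below builds a feature dictionary for one node. The cue-word lists, the function name `node_features`, and the dictionary keys are hypothetical; the slides do not give the actual vocabulary or encoding. The containment count and the line-orientation counts are omitted for brevity.

```python
import re

# Hypothetical cue words per target entity (not taken from the slides).
CUE_WORDS = {
    "task": {"create", "review", "send", "approve"},
    "decision": {"if", "whether", "yes", "no"},
}

def node_features(node_id, shape_name, text, edges):
    """Return relational, geometric, and textual features for one node.

    edges is a list of directed (source_id, target_id) pairs.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    features = {
        "shape_name": shape_name,                                   # geometric
        "indegree": sum(1 for _, tgt in edges if tgt == node_id),   # relational
        "outdegree": sum(1 for src, _ in edges if src == node_id),  # relational
    }
    for entity, words in CUE_WORDS.items():
        # Textual: count of cue words for every target entity.
        features[f"cue_{entity}"] = sum(1 for t in tokens if t in words)
    return features

edges = [("a", "d"), ("d", "b"), ("d", "c")]
feats = node_features("d", "Diamond", "Approved? Yes / No", edges)
```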
15. Flow Diagrams → Structure Inference → {Nodes, Edges} Annotated by Features
An expert labels all nodes & edges in the input set of diagrams by their semantics
{Nodes, Edges} Annotated by Features + Process Semantic → Classifier
Classifier establishes correspondence between the features and labels for process semantics
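To make the supervised scheme concrete, here is a minimal sketch that uses a 1-nearest-neighbor rule over expert-labeled feature dictionaries. The slides do not say which classifier was used, so nearest-neighbor is a stand-in chosen for brevity; the distance function, the feature keys, and the labels "Task" and "Gateway" are assumptions.

```python
def distance(a, b):
    """Mixed distance: absolute difference for numbers, 0/1 mismatch otherwise."""
    total = 0.0
    for key in a.keys() | b.keys():
        x, y = a.get(key), b.get(key)
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += abs(x - y)
        elif x != y:
            total += 1.0
    return total

def classify(features, labeled_examples):
    """Return the label of the closest expert-labeled example (1-NN)."""
    _, label = min(labeled_examples, key=lambda ex: distance(features, ex[0]))
    return label

# Hypothetical expert-labeled training examples.
labeled = [
    ({"shape_name": "Rectangle", "indegree": 1, "outdegree": 1}, "Task"),
    ({"shape_name": "Diamond", "indegree": 1, "outdegree": 2}, "Gateway"),
]

# An unfamiliar shape is classified by its relational features.
query = {"shape_name": "Rounded Rectangle", "indegree": 1, "outdegree": 2}
predicted = classify(query, labeled)
```

Because the decision uses degree features as well as the shape name, a shape that no fixed mapping anticipates can still receive a plausible label.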
16. Flow Diagrams → Structure Inference → {Nodes, Edges} Annotated by Features → Clusterer → Cluster A, Cluster B
Clusters have common semantics
An expert looks at exemplars from each cluster to label process semantic of the cluster
Cluster A = Semantic X, Cluster B = Semantic Y
{Nodes, Edges} Annotated by Features + Process Semantic → Classifier
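The cluster-then-label workflow can be sketched as below. The slides do not name a clustering algorithm, so a small k-means with a deterministic start (the first k vectors as centroids) is assumed here purely for illustration; the function names, the two-feature vectors, and the labels are hypothetical.

```python
def kmeans(vectors, k, iterations=10):
    """Tiny k-means; assumes the first k vectors are distinct starting centroids."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        for i, v in enumerate(vectors):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def label_clusters(assignment, expert_labels):
    """Spread the expert's per-cluster label to every member of the cluster."""
    return [expert_labels[c] for c in assignment]

# Feature vectors (indegree, outdegree) for four nodes.
vectors = [(1, 1), (1, 2), (1, 1), (1, 2)]
assignment = kmeans(vectors, k=2)
# The expert inspects exemplars of each cluster and names its semantics once.
labels = label_clusters(assignment, {0: "Task", 1: "Gateway"})
```

The expert supplies one label per cluster rather than one per node, which is where the labeling effort is saved relative to the supervised scheme.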
17. Data Set: 185 Visio process diagrams created in real
business-transformation projects
Objective: Compare accuracy of our tool iDISCOVER
and a popular modeling tool (called PMT for
proprietary reasons)
Method: Compare tool outputs with models created
manually by human experts to measure precision &
recall
Precision = |Actual ∩ Retrieved| / |Retrieved|, Recall = |Actual ∩ Retrieved| / |Actual|
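A small sketch of how these two measures could be computed over sets of model elements, for example edges. The function name and the convention of returning 0.0 for an empty denominator are assumptions of this example.

```python
def precision_recall(actual, retrieved):
    """Precision and recall of retrieved elements against expert-created ones."""
    actual, retrieved = set(actual), set(retrieved)
    hits = len(actual & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall

# Edges in the expert's model versus edges produced by a tool.
actual = {("A", "B"), ("B", "C"), ("C", "D")}
retrieved = {("A", "B"), ("B", "C"), ("A", "D"), ("B", "D")}
precision, recall = precision_recall(actual, retrieved)
```

In this example two of the four retrieved edges are correct (precision 0.5) and two of the three actual edges are found (recall 2/3).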
18. Node 96.93 95.91 70.44 86.29
Edge 93.26 90.86 63.43 59.87
Dangling Connector 47 (100%) 3 (14%) 56%
Unlinked Labels 46 (39%) 2 (3.7%) 38%
Count of dangling connectors has a greater correlation with the edge recall of PMT
(ρ = −0.48) than with the edge recall of iDISCOVER (ρ = −0.08).
19. Our (Overall Δ ≈30%) and (Overall Δ ≈20%) for all process semantic classes are greater than that of
• Accuracy is low only for scarce entities like Intermediate Events and Data Objects (together less than 3% of the data set)
• Better results possible with a more equitable distribution of entities
is almost as good as
Size of the training data need not be huge. Classification could work almost as well with only a third of the dataset size
20. Large body of work in the area of
understanding line drawings and hand
sketches (e.g., Futrelle, Gross, Barbu)
Focus on identifying shape geometry
Semantic interpretation follows directly from a
fixed mapping between shape geometry and target
semantics
Visual Language theory prescribes geometry
detection with grammar rules.
21. More efficient modeling of textual cues
Text is the only reliable feature in highly ambiguous
scenarios
Tracking spatial patterns of shapes and labels
that emerge due to local styles
Identification of higher-level relations (block
structures) between model entities (e.g., sub-
process, loop, and fork-merge)
Extend to other diagram types
22. Informal process diagrams contain structural and
semantic ambiguities – need to be dealt with in order to
discover precise formal models
Existing capabilities are limited because:
Do not resolve structural ambiguities
Interpreting semantics based on shape name alone does not suffice
Standard pattern-classification techniques can be
successfully employed in interpreting process semantics if
the feature space is carefully modeled to mimic human
reasoning
Unsupervised clustering can almost match supervised
techniques in performance