This PhD thesis explores using problem solving methods (PSMs) to support subject matter experts (SMEs) in acquiring and understanding process knowledge. It proposes four main contributions: 1) a process metamodel providing terminology for processes, 2) a reusable PSM library, 3) a graphical modeling environment enabling SMEs to formulate processes using PSMs without knowledge engineers, and 4) a method for automatically synthesizing executable process models from SME-authored diagrams supported by an underlying representation and reasoning formalism. The goal is to empower SMEs to generate computer-readable process content and apply reasoning techniques to analyze process outcomes.
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
Acquisition And Understanding Of Process Knowledgev1 1
1. Acquisition and Understanding of
Process Knowledge Using
Problem Solving Methods
Jose Manuel Gómez Pérez
Facultad de Informática
Universidad Politécnica de Madrid
Campus de Montegancedo sn
28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
jmgomez@fi.upm.es
Phone: 34.91.3363670
Fax: 34.91.3524819
PhD thesis Date: 08/05/2009
2. Outline
Introduction and motivation
Open research problems and work hypotheses
Acquisition of process knowledge by SMEs
Provenance analysis of process executions by SMEs
Evaluation
Conclusions and future research problems
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 2
3. Knowledge Acquisition: Towards SME empowerment
DARPA’s
KA Frameworks HPKB & RKF
The Knowledge The Role
Acquisition Bottleneck Differentiation
Principle
The Knowledge Level KA by Subject
Matter Experts
DARPA’s KSE KA by (SMEs)
Subject Matter Expert (SME)
Knowledge
Engineers KB edition by SMES
(KEs) KRAKEN
Knowledge Knowledge DISCIPL-RKF
formulation
modeling KRR Languages
by SMEs
KB
OCML
maintenance
Ontologies Ontology editors KARL SHAKEN CHIMAERA
Collaborative
Knowledge
Engineer (KE) Knowledge knowledge creation
programming Problem Solving SEMANTIC WIKIS
Methods
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 3
4. Knowledge types
Source: the Halo project KR analysis phase for
the domains of Chemistry, Biology, and Physics
• Processes are special knowledge
RUL CLS
FACT
MAT types that
(classification) (factual (mathematics)
(inference)
knowledge)
• Relate to the sequence of
operations and involved
CMP TAB PCS CAUS
(comparison) (tables) (processes) (cause-effect) resources leading to the
production of some outcome
DAT PROC EXP US • Encapsulate preconditions,
(data structures) (procedural) (experiments) (underspecified)
results, contents, actors, and
TRANS causes
NF SPACE PWR
(translation)
(non functional) (spatial) (part-whole) • Process knowledge is complex
• It builds on top of other simpler
TIME GRA
(temporal) (diagrammatic)
knowledge types, like facts and
rules
“What is released/added/increased upon binding
of two amino acids?”
“A piece of solid calcium is heated in oxygen gas. ...”
“Find correct RNA sequence for a given DNA sequence.”
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 4
5. Why processes are important
• Processes appear in
37% (average) in the
domains of Biology,
Chemistry, and Physics
• The most important
knowledge type in
Chemistry (53%)
• Second in Biology
(35%)
• Fourth in Physics (22%)
Source: The Halo project KR analysis phase for
the domains of Chemistry, Biology, and Physics
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 5
6. Work objectives
Objective 1: To enable SMEs to Provide reusable guidelines
formulate processes without KEs to formulate process
How knowledge
Objective 2: To support SMEs in
Support reasoning
understanding process executions
Describe the main rationale
behind a process
What
PSMs
Whom
PCS SMEs
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 6
7. PSM perspectives
Task-method Interaction
decomposition
Black-box perspective
Knowledge transformation
within the PSM
Hierarchically defines how
PSM establishes and controls tasks decompose into simpler
the sequence of actions (sub)tasks
required to perform a task Describes tasks at several
Defines knowledge required at levels of detail
each task step Provides alternative ways to
achieve a task
Knowledge flow
Task
Method
Role
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 7
8. Provenance analysis of process executions
?
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 8
9. In summary
• This thesis proposes the use of PSMs as a novel approach for
supporting SMEs both in the formulation of process knowledge
and in the provenance analysis of process executions
• It also explores to what extent it is possible to build such tools
that take KEs out of the formulation and analysis loop
• Ultimately, it aims at showing that it is possible to
engage users
• To generate computer-readable content
represented in formal languages
• To apply knowledge representation and reasoning
techniques to analyze the outcomes of
automated, knowledge-intensive processes
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 9
10. Outline
Introduction and motivation
Open research problems and work hypotheses
Acquisition of process knowledge by SMEs
Provenance analysis of process executions by SMEs
Evaluation
Conclusions and future research problems
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 10
11. Open research problems and work hypotheses: Objective 1
• H1: Empowering SMEs can increase KB
Objective 1: To provide SMEs with the
means to formulate process knowledge in quality and reduce costs
their domains of expertise without the
intervention of KEs
• H2: The complexity of process
knowledge requires providing SMEs with
specific means to acquire and reason
with processes
• H3: PSMs can reduce the complexity of
acquiring process knowledge by SMEs
• H4: The proposed methods and tools
abstract SMEs from the underlying
representation
• H5: The proposed methods and tools
produce sound and complete executable
process models
• H6: The proposed method and tools are
flexible and reusable across domains
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 11
12. Open research problems and work hypotheses: Objective 2
Objective 2: To support SMEs in
analyzing and understanding process
executions
• H7: The analytical capabilities of
PSMs can provide SMEs with
meaningful interpretations of
process executions
• H8: The method proposed identifies
the main rationale behind
processes by detecting occurrences
of PSMs in their execution logs
• H9: The method proposed can use
the hierarchical structure of PSMs
to describe process executions at
different levels of detail
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 12
13. Outline
Introduction and motivation
Open research problems and work hypotheses
Acquisition of process knowledge by SMEs
Provenance analysis of process executions by SMEs
Evaluation
Conclusions and future research problems
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 13
14. Acquisition of process knowledge by SMEs
Objective 1: To provide SMEs with the means to
formulate process knowledge in their domains of
• Four main contributions expertise without the intervention of KEs
• C1: a process metamodel, which provides the terminology
necessary to express process entities in scientific domains, and
the relations between them
• C2: a PSM library, which provides high-level, reusable
abstractions for process representation and a method for its
development
• C3: a graphical process modeling and
reasoning environment, which applies
the previous contributions in order to
enable SMEs to formulate process
knowledge
• C4: a method for the automatic
synthesis of executable process
models from SME-authored process
diagrams, supported by an underlying
representation and reasoning
formalism
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 14
15. Contribution 1: The process metamodel
• Resources (roles) • Actions
• Containers of domain conceptsthat • Inspired by activities in EO and TOVE
can play a particular role
• Relations
• Forks
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 15
16. Contribution 2: Building a PSM library for acquisition of process knowledge
Identification
Decomposition
and abstraction
Extends work done in the Halo
analysis phase by Omniscience and
Ontoprise teams Reusable
PSM library
• 755 AP questions analyzed
• >100 domain-specific processes
• Four main process categories
• Join
• Split
• Modify
• Locate
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 16
17. Contribution 2: A PSM example (decompose & combine)
“Crystallization occurs when certain
pairs of oppositely charged ions
attract each other so strongly that they
form an insoluble ionic solid. This
process coexists with dissolution
processes in precipitation reactions”
The addition of a colorless solution of potassium iodide (KI)
to a colorless solution of lead nitrate [Pb(NO3)2] produces a
yellow precipitate of lead iodide (Pbl2) that slowly settles to
the bottom of the beaker.
name decompose & combine
goal member(Recombination set, Element) and
member(Constituents set, Piece) and
part-of(Piece, Element) and
part-of(Piece, Combination) and
properties(Element, ep) and
properties(Combination, cp) and
not equal(ep, cp)
actions decompose, combine
input action decompose
output action combine
input roles Recombination set, Decomposer, Combinator
output roles Combination, Byproduct
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 17
18. Contribution 3: The graphical process modeling environment
Domain process
to which this
process
diagram is Consistency
bound maintenance
Process (knowledge base and
metamodel process data flow)
PSM library
(e.g.
decompose &
recombine)
Domain-level
reasoning and
control flow
evaluation
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 18
19. Contribution 4: The process representation and reasoning formalism
• Bridges the gap between the
knowledge level and the
operational level
• Focus on three main aspects
• Process frame
• Data flow
• Control flow
Input
action
Conditional Output
precedence action
(control flow)
“In a long-distance jump competition, an athlete
can jump only after his mitochondria have
accumulated enough energy for his muscles to
contract.”
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 19
20. Contribution 4: Addressing the frame problem
Pre state of
action
• Two states (pre and post) per Dissolve
process action
• Pre state: portion of the process
frame in the scope of an action
Post state of
• Post state: updated pre state of action
the action after its the execution Dissolve
• Actions read from their pre state
and write into their post state
• At modeling time we automatically synthesize process
rules that manage the process frame during execution
• Setup rules: build the pre state of the input actions of the Explicit manipulation of
process the process frame
• Precedence rules: describe what actions can be allows runtime
management of data
connected with each other by means of their outputs and and control flow
inputs
• Transition rules: describe the transition between pre and
post states
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 20
21. Contribution 4: The Code synthesis mechanism
• Action-centric algorithm
• Each action results into a set
of process rules in F-logic
intermediate output
input actions
actions actions
setup rules x - -
transition
x x x
rules setup FORALL m, e, v
m:mitochondrion@preState(accumulateEnergy) AND
precedence m:TOOL@preState(accumulateEnergy) AND
- x x
rules e: energy@preState(accumulateEnergy) AND
transition e:RESOURCE@preState(accumulateEnergy) AND
e[hasEnergyValue -> v]@preState(accumulateEnergy)
FORALL m, e, j <-
j: jump@postState(muscleContraction) AND m:mitochondrion AND
j: OUTPUT@postState(muscleContraction) AND e:energy AND
muscleContraction[PROVIDES -> j] @postState(muscleContraction) e[hasEnergyValue -> v].
<.- precedence
m:muscle @preState(muscleContraction) AND FORALL e, v
m:TOOL@preState(muscleContraction) AND e:energy@preState(muscleContraction) AND
m[IS_USED_BY -> muscleContraction]@preState(muscleContraction) AND e[hasEnergyValue -> v]@ preState(muscleContraction)
e:energy@ preState(muscleContraction) AND <.-
e:RESOURCE@preState(muscleContraction) AND e:energy@postState(accumulateEnergy) AND
e[IS_CONSUMED_BY -> muscleContraction] @preState(muscleContraction). e[hasEnergyValue -> v]@ postState(accumulateEnergy).
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 21
22. Contribution 4: Domain-level reasoning within processes
“The minimum amount
of energy needed to
jump are 5 calories”
“The length of the jump
is directly proportional
to the amount of
energy accumulated”
transition
check
FORALL j, m, e, length
FORALL anEnergy, v
j: jump@postState(muscleContraction) AND
enough_energy_for_contraction(anEnergy)
j: OUTPUT@postState(muscleContraction) AND
@check_enough_energy_for_contraction(accumulateEnergy)
muscle contraction[PROVIDES -> j]@postState(muscleContraction) AND
<-
j[hasLength-> length] @postState(muscleContraction)
anEnergy:energy @preState(muscleContraction) AND
<.-
anEnergy:energy[hasEnergyValue -> v] @preState(muscleContraction) AND
m:muscle @preState(muscleContraction) AND
greater(v, 5).
m:TOOL@preState(muscleContraction) AND
update m[IS_USED_BY -> muscleContraction]@preState(muscleContraction) AND
FORALL length, anEnergy, v e:energy@ preState(muscleContraction) AND
aJump(out(hasLength, length):jump@update(muscleContraction) e:RESOURCE@preState(muscleContraction) AND
aJump(out(hasLength, length) [hasLength -> length]@update(muscleContraction) e[IS_CONSUMED_BY -> muscleContraction] @preState(muscleContraction) AND
<- enough_energy_for_contraction(e)
anEnergy:energy @preState(muscleContraction) AND @check_enough_energy_for_contraction(accumulateEnergy) AND
anEnergy:energy[hasEnergyValue -> v] @preState(muscleContraction) AND j:jump@update(muscleContraction) AND
multiply(length, 2, v). j[hasLength -> length]@update(muscleContraction).
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 22
23. Contribution 4: Sample question
“In a long-distance jump competition, an
athlete can jump only after his mitochondria
have accumulated enough energy for his
muscles to contract.”
At least, what amount of energy does a long jump energy1:energy[hasValue -> 100].n FORALL j, oa, v <-
athlete need to consume in order to jump more than “long jump”:PROCESS@ProcessModule AND
8m long? “long jump”[OUTPUT_ACTION ->> oa]@ProcessModule AND
j:Jump[hasValue -> v]@postState(oa) AND
√ a. 100 cal greater(v, 8).
√ b. 50 cal
√ c. 250 cal
d. 1 cal
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 23
24. Contribution 4: Properties of the process models
• Sound and complete
• Based on F-logic’s proof theory plus additional proof for the
process formalism
• A process action is sound ↔ its post state can be deduced from its
pre state
• A process action is complete ↔ it allows deducing all the possible
clauses of its post state from the clauses in the pre state
• A process model is sound and complete ↔ all its actions are sound
and complete
• Optimized
• Attribute and concept names ground
• person(Peter) instead of instanceOf(person, Peter)
• Allows OntoBroker to index tuples by class and attribute name
• Process rules are generally stratified
• Critical in the presence of negation (forks and loops)
• Avoid costly well-founded evaluation mode
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 24
25. Outline
Introduction and motivation
Open research problems and work hypotheses
Acquisition of process knowledge by SMEs
Provenance analysis of process executions by SMEs
Evaluation
Conclusions and future research problems
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 25
26. Provenance analysis of process executions by SMEs
Objective 2: To support SMEs in
analyzing and understanding process
executions
• Two main contributions
• C5: A method and algorithm that uses PSMs as high-level,
reusable process abstractions and visualization paradigm to
identify and explain the reasoning strategies and rationale of
executed processes
• C6: An architecture and integrated environment for the analysis
of process executions at the knowledge level
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 26
27. Contribution 5: Towards knowledge provenance
• PSMs as semantic overlays
on top of existing process
documentation
• Task: What is going to be
achieved by executing a
process
• PSM: HOW
• Provenance, from a knowledge perspective
• How provenance relates to the execution of a process
• Simpler process analysis proposing decompositions into
simpler subprocesses
• Visualize provenance at different levels of detail
• Supporting SMEs in two main ways
• Validation of process executions
• Identification of reasoning patterns in process
Source: myGrid executions
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 27
28. Contribution 5: The twig join function
• Based on XML pattern matching algorithms on Directed
Acyclic Graphs (Bruno et al., 2002)
• twig_join detects the occurrence of a pattern in a XML DAG
• Given
• P, a process
• T, a task potentially describing P
• M, a PSM providing a strategy on how to achieve T
• i(T), the set of input roles of T
• o(T), the set of output roles of T
• D, the DAG resulting from documenting the execution of P
• twig_join(D,i(T),o(T)) checks whether a twig exists for M that
connects i(T) with o(T) in D
• In this case, PSM M is the pattern to be identified in the
process documentation DAG D
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 28
29. Contribution 5: A twig join example
Domain Bridges PSM
entities (mapping) entities
twig join!
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 29
30. Contribution 5: The matching algorithm
• twig_join recursively applied
Task-method
decomposition
at each decomposition level
• Each task decomposed by
one or several PSMs (task-
twig_join(Ti, D) method decomposition view)
• Knowledge flow defines the
sequence of evaluation
decompose(Ti)
twig_join(T11, D)
Knowledge flow
twig_join(T12, D)
twig_join(T13, D)
Backtracking
possible at PSM
and role levels
twig_join(T14, D)
Interaction
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 30
31. Contribution 6: A Knowledge-Oriented Provenance Environment
PSM- Matching
Ontology visualization
bridges
Provenance Matching
query detection
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 31
32. Outline
Introduction and motivation
Open research problems and work hypotheses
Acquisition of process knowledge by SMEs
Provenance analysis of process executions by SMEs
Evaluation
Conclusions and future research problems
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 32
33. Objective 1
• Evaluation settings
• 2 Chemistry SMEs, 2 Biology SMEs, and 2 Physics SMEs
Judith Lennart Christianne Martina Markus Andreas
• Syllabus
• Chemistry: Stoichiometry, solutions and equilibrium (Brown & Lemay,
pages 75-83, 113-133, and 613-653)
• Biology: Cell and DNA structure and processes (Campbell and Reece,
pages 112-124, 217-223, 239-245, 293-301, 304-311, and 317-319)
• Physics: Kinematics and Dynamics (Serway and Faughn, chapters 2,3,
and 4)
• Two main dimensions: usability and utility
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 33
34. Evaluation results: utilization of the PSM library
Objective 1
# of
processes
modeled
SME1 (Physics) 0 H3: PSMs can reduce the H1: SME empowerment
SME2 (Biology) 2 complexity of process KA can increase KB quality
SME3 (Biology) 6 and reduce costs
H6: The proposed
SME4 (Chemistry) 0 methods and tools are
flexible and reusable
SME5 (Chemistry) 3
SME6 (Physics) 0
Total 11 Processes PSMs
Transition from G2 phase to mitosis n.a.
SME2 (Biology)
Mitosis n.a.
Mitosis decompose & combine
Carbohydrate metabolism consume, transform
Cellular respiration decompose, consume
SME3 (Biology)
Detoxification transform
Photosynthesis form by combination
Ribosome protein synthesis situate & combine
Complete ionic equation form by combination
SME5 (Chemistry) Molecular equation decompose & combine
Net ionic equation form by combination
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 34
35. Evaluation results: performance of process models
Objective 1
with respect to configuration C0
H5: The proposed
Query C0 C1 C2 methods and tools
produce sound and
SME3-q0 31 1,00 0 0,00 16 0,52 complete executable
SME3-q1 63 1,00 16 0,25 16 0,25 process models
SME3-q2 31 1,00 16 0,52 16 0,52
SME3-q3 47 1,00 16 0,34 16 0,34
SME3-q4 15 1,00 0 0,00 0 0,00
SME3-q5 32 1,00 16 0,50 0 0,00
SME3-q6 203 1,00 219 1,08 234 1,15
SME3-q7 63 1,00 31 0,49 31 0,49
• C0
SME3-q8 47 1,00 31 0,66 16 0,34 • Well-founded evaluation on
SME3-q9 62 1,00 32 0,52 16 0,26
SME3-q10 203 1,00 218 1,07 203 1,00 • Concept/attr. names ground off
Average 79,7 1,00 59,5 56,4
Median 47 1,00 16
0,75
0,34 16
0,71
0,34
• C1
Min 15 1,00 0 0,00 0 0,00 • Well-founded evaluation on
Max 203 1,00 219 1,08 234 1,15
<1 - faster • Concept/attr. names ground on
=1 - same time as config C0
>1 - slower
• C2
• Well-founded evaluation off
• Concept/attr. names ground on
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 35
36. Evaluation results: utility and usability
Objective 1
• Physics SMEs did not use processes • System Usability (SU) scale
• Not so important for Chemistry SMEs • SMEs answered a questionnaire about
the system with a quantitative value
• SME2 (Biology): “It makes the
ranging between 0 and 100
representation of biological models
easier” • Average obtained: 64,5
• SME3 (Biology): “The modeling of H2: Due to its H4: The method and
processes is very useful. It must be complexity, SMEs tools proposed
possible to ask questions about the require specific abstract SMEs from
various states of a process. And asking means for process the underlying KRR
questions with T&D worked okay” KA formalism
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 36
37. Evaluation settings (Provenance Challenge)
Objective 2
Brain Atlas Brain Atlas
Workflow Provenance Data
Flow
Catalogation PSM
37
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 37
38. Evaluation results
Objective 2
• Focus on precision and
recall metrics
• Identified at three different
layered contexts
• Method
• Task
H7: PSMs can provide
SMEs with explanations • Decomposition-level
of process executions
H8: The method
proposed identifies the
main rationale behind
processes by detecting
PSM occurrences
H9: PSMs describe
process executions at
different levels of detail
Perfect match
Partial match
No match
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 38
39. Outline
Introduction and motivation
Open research problems and work hypotheses
Acquisition of process knowledge by SMEs
Provenance analysis of process executions by SMEs
Evaluation
Conclusions and future research problems
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 39
40. Conclusions
Objective 1: To enable SMEs to
acquire processes without KEs
• Qualitative evidence rather than statistical proof (only 6 SMEs)
• However, evidence found that it is possible to engage users in
acquiring process knowledge without the intervention of KEs
• SMEs using the PSM library (SME3 and SME5) produced more and
better quality process models (82%) than the rest (SME2)
• The method used to create the PSM library has also shown evidence
to be reusable in other domains
Objective 2: To support SMEs in
understanding process executions
• Semantic overlays e.g. PSMs on top of process documentation
provide the required abstractions to analyze provenance from a
knowledge perspective
• Provenance analysis by SMEs favors from a hierarchical structure in
such overlays
• The matching algorithm has not been applied to large PSM libraries
and provenance logs
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 40
41. The ubiquity of processes
Business
Biology
Healthcare
Climate
Ecology
prediction
Chemistry
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 41
42. Future research problems
• The Web is driving a new computing paradigm through the
involvement of users forming online communities
• Additionally, focus change from data to process
• The solutions proposed live in the Semantic Web in the small
• Challenge: move to the Web in the large
Process Performance, Collaboration Process
representation coverage, in user validation, trust
and reasoning scale communities maintenance
More expressivity Distributed
(events, qualitative reasoning Share and reuse
reasoning) algorithms processes Process reliability
and validation
Incomplete, Conciliation of
partial results Compare and
inconsistent,
recommend
contradictory
Heuristics processes
knowledge bases
(assumptions,
defaults) Trust
Uncertainty, Process-specific
nonmonotonicity Caching query mechanisms
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 42
43. Acquisition and Understanding of
Process Knowledge Using
Problem Solving Methods
Jose Manuel Gómez Pérez
Facultad de Informática
Universidad Politécnica de Madrid
Campus de Montegancedo sn
28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
jmgomez@fi.upm.es
Phone: 34.91.3363670
Fax: 34.91.3524819
PhD thesis Date: 08/05/2009
44. Previous steps towards provenance analysis
• PASOA (Groth et al., 2008)
process documentation
• Relationship p-assertions
• Who exchanges messages
• Interaction p-assertions
• How input data becomes
output data
• Actor-state p-assertions
PASOA interaction p-assertion
• Non-functional properties
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 44
45. Contribution 2: PSMs for the acquisition of process knowledge
• PSMs separate process from
domain knowledge by explicitly
representing the different roles
domain entities play in a process
• Focus on how to achieve
processes
• Provide SMEs with guidelines and
templates to formulate process
models
Modelling framework for process knowledge,
extending TMDA
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 45
46. Contribution 2: The PSM library for Process Modelling
Reusable
across the
three
domains
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 46
47. The Halo Project
• “Project Halo aims to create an intelligent computer system able
to answer novel questions in scientific domains with expertise
equivalent to Advanced Placement (AP )competence level”
• Three scientific domains: Chemistry, Biology, and Physics
• Focus on tools that allow Subject Matter Experts (SMEs) to
author scientific knowledge
The elongation of the leading strand during DNA synthesis: The primer that initiates the synthesis of a new DNA
strand is usually:
a. produces Okazaki fragments
b. depends on the action of DNA polymerase a. RNA
c. does not require a template strand b. DNA
c. an Okazaki fragment
d. a structural protein
Which part of the animal cell is required only in the first stage
e. a thymine dimer
of mitosis and what is the name of such stage?
a. chromatin and prophase
b. chromatid and prometaphase
c. centromere and anaphase
d. plasma membrane and telophase
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 47
48. Validating process models: The testing & debugging perspective
Test set
Query
Test results
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 48
49. The process representation and reasoning formalism
• Bridges the gap between the knowledge level and the operational level
• Three main aspects to address
• Process frame
• On what portion of the knowledge base can a process action have effect?
• How can the effects of actions on the knowledge base be propagated to
successive actions?
• Data flow: path followed by the data throughout a process
• Control flow: run-time evaluation and enactment of the actual flow of data
Data flow Data flow
Input action Conditional Output action
precedence
(control flow)
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 49
50. Addressing the frame problem
• Two states (pre and post) per process action
• Pre state: portion of the process frame in the scope of an action
• Post state: updated pre state of the action after its the execution
• Actions read from their pre state and write into their post state
Post state of
action Dissolve
Pre state of
action Dissolve
• We automatically synthesize rules at modeling time that create Explicit manipulation of
these modules upon process execution the process frame
allows runtime
• Setup rules: build the pre state of the input actions of the process
management of data
• Precedence rules: describe what actions can be connected with and control flow
each other by means of their outputs and inputs
• Transition rules: describe the transition between pre and post states
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 50
51. Correlation between the process metamodel and the formalism
Process metamodel Process representation and reasoning formalism
Action Process rule
1 1..*
Setup rule Precedence rule
Atomic action Iterative action
Transition rule Domain rule
1 *
Update rule Check rule
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 51
52. F-logic as process representation and reasoning language
• Process diagrams authored by SMEs are automatically translated into
F-logic (OntoBroker) code
• Some key properties
• Module system (encoding the process frame; pre and post states)
• Negation: (control flow evaluation)
• Evaluation methods: dynamic filtering (Kifer and Loziinski, 1986) and well-
founded evaluation (vanGelder et a., 1991)
• Performing and scalable, as shown by OpenRuleBench (Liang et al., 2009)
• Functional advantages
• Enabling a single entry point for reasoning across the whole system
• Domain-level reasoning within processes, through rules
• Keeping introspective properties for reasoning with meta-level information,
like subprocesses and intermediate results
• Support for expressing and reasoning with processes , allowing
inference of process outcomes, grounded in the particular domains of
application
• The synthesized code implements data and control flow within
processes
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 52
53. Evaluation results: utilization of process metamodel
Objective 1
# of # of
Processes # of relations # of forks Total
resources actions
Transition from
G2 phase to 12 21 1 3 37
SME2 mitosis
(Biology)
Mitosis 34 37 0 5 76
Mitosis 29 11 0 5 45
Carbohydrate
5 6 0 2 13
metabolism
Cellular
SME3 5 7 0 2 14
respiration
(Biology)
Detoxification 4 4 0 1 9
Photosynthesis 6 6 0 1 13
Ribosome protein
2 2 0 1 5
synthesis
Complete ionic
7 7 0 1 15
equation
SME5 Molecular
4 4 0 1 9
(Chemistry) equation
Net ionic
3 3 0 1 7
equation
Total 111 108 1 23 243
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 53
54. Contribution 6: A hierarchical visualization paradigm
Task-method
decomposition view
Perfect match
Partial match
No match
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 54
55. Future research problems
• The Web is driving a new computing paradigm through the
involvement of users forming online communities
• Migrating from data to process
• The solutions proposed live in the Semantic Web in the small
• Challenge: move to the Web in the large
• Process representation and reasoning
• More expressivity required (events, qualitative)
• Incomplete, inconsistent, even contradictory knowledge bases
• Uncertainty, nonmonotonicity
• Performance, coverage, and scale
• Distributed reasoning algorithms, conciliation of partial results,
heuristics, caching, well-founded assumptions, defaults
• Collaboration in communities of users
• Share, merge, compare, and recommend process models
• Process-specific query mechanisms
• Web-scale process validation and trust maintenance
• Process reliability, trust and validation
Jose Manuel Gómez Pérez – Acquisition and Understanding of Process Knowledge Using Problem Solving Methods, PhD thesis 55