Khalid Belhajjame
Joint work with
Daniela Grigori, Mariem Harmassi and Manel BenYahia
LAMSADE, Université Paris Dauphine
Keyword-Based Search of Workflow
Fragments and their Composition
Workflows
A business process specified using the
BPMN notation
A scientific workflow system (Taverna)
A workflow consists of an orchestrated and repeatable pattern of business activity enabled
by the systematic organization of resources into processes that transform materials, provide
services, or process information (Workflow Coalition)
CAIR, Lyon2 18 Feb 2016
Workflows are difficult to design
—  The design of scientific workflows, just like that of business processes, can be a difficult task. It requires:
—  Deep knowledge of the domain
—  Awareness of the resources, e.g., programs and web services, that can enact the steps of the workflow
—  Repositories exist to publish and share workflows and to promote their reuse:
—  myExperiment, CrowdLabs, Galaxy, and various business process repositories
—  Reuse, however, remains largely an aspiration:
—  There are no capabilities that support the user in identifying the
workflows, or fragments thereof, that are relevant for the task
at hand.
CAIR, Lyon4 18 Feb 2016
Assisting designers of workflows
through reuse
CAIR, Lyon5
—  There has been a good body of work in the literature on workflow discovery.
—  The typical approach requires the user to specify a query in the
form of a workflow that is compared against a collection of
workflows.
—  In our work, we focus on a problem that is different and that has received little attention.
—  We position ourselves in a context where a designer is specifying his/her workflow and needs assistance to design parts of it (which we term fragments).
18 Feb 2016
Approach
CAIR, Lyon6 18 Feb 2016
Mining Frequent Workflow Fragments
CAIR, Lyon, 18 Feb 2016
Given a repository of workflows, we would like to mine
frequent workflow fragments, i.e., fragments that are used
across multiple workflows. This raises two questions:
—  How to deal with the heterogeneity of the labels used by
different users to model the activities of their workflows
within the repository?
—  Which graph representation is best suited for encoding workflows for mining frequent fragments?
Homogenizing Activity Labels
CAIR, Lyon, 18 Feb 2016
—  Activities implementing the same functionality should have
the same names.
—  Workflows authored by different people are likely to use different labels for naming activities that perform the same task.
—  To homogenize activity labels, we use state-of-the-art techniques, which rely on a common dictionary (a minimal sketch is given below) for:
—  Identifying the activities that perform the same task, and
—  Selecting the same label for naming them.
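To make the idea concrete, here is a minimal sketch of such a homogenization step, assuming WordNet (via NLTK) as the common dictionary; the function names and the synset-overlap matching rule are illustrative assumptions, not the exact technique used in the work.

```python
# Minimal sketch: homogenize activity labels using WordNet as the dictionary.
# Assumption: labels are snake_case English phrases; the matching rule is illustrative.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def _synsets(label):
    """Collect the WordNet synsets of every token of a snake_case label."""
    return {s for tok in label.lower().split("_") for s in wn.synsets(tok)}

def perform_same_task(label_a, label_b):
    """Heuristic: two activities perform the same task if their labels share a synset."""
    return bool(_synsets(label_a) & _synsets(label_b))

def homogenize(labels):
    """Give every group of equivalent labels the same (first-seen) label."""
    mapping = {}
    for lab in labels:
        match = next((m for m in set(mapping.values())
                      if perform_same_task(lab, m)), None)
        mapping[lab] = match or lab
    return mapping

print(homogenize(["fetch_sequence", "get_sequence", "align_reads"]))
# e.g. {'fetch_sequence': 'fetch_sequence', 'get_sequence': 'fetch_sequence',
#       'align_reads': 'align_reads'}
```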
Detecting Recurrent Fragments
—  We use graph mining algorithms to identify the fragments in
the repository that are recurrent.
—  We use the SUBDUE algorithm.
—  Which graph representation to use to represent (workflow)
fragments?
—  We examined a number of workflow representations (a toy encoding sketch follows).
CAIR, Lyon12 18 Feb 2016
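As a rough sketch (not the authors' code), each workflow can be encoded as a labeled directed graph and the whole repository handed to a frequent-subgraph miner such as SUBDUE; the helper below uses networkx, and the toy repository is made up.

```python
# Illustrative sketch: encode workflows as labeled directed graphs so that a
# frequent-subgraph miner (e.g. SUBDUE) can search the repository for
# substructures that recur across workflows.
import networkx as nx

def workflow_to_graph(activities, control_edges):
    """activities: (node_id, homogenized_label) pairs;
       control_edges: (src, dst, operator_label) triples, e.g. ('a1', 'a2', 'SEQ')."""
    g = nx.DiGraph()
    for node_id, label in activities:
        g.add_node(node_id, label=label)
    for src, dst, op in control_edges:
        g.add_edge(src, dst, label=op)
    return g

repository = [
    workflow_to_graph([("a1", "att1"), ("a2", "att2"), ("a4", "att4")],
                      [("a1", "a2", "And"), ("a2", "a4", "SEQ")]),
    workflow_to_graph([("b1", "att1"), ("b2", "att2"), ("b5", "att5")],
                      [("b1", "b2", "And"), ("b2", "b5", "SEQ")]),
]
# SUBDUE would then be run over this graph database (after serializing it into
# the miner's input format) and would report the recurrent substructure
# att1 -And-> att2 -SEQ-> ... shared by the two toy workflows.
```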
Representation A
[Figure: activities att1–att5 are nodes; control flow is expressed through explicit operator nodes (And, Xor, sequence) linked to the activities by operator, operand, type, and next edges.]
Representation B
[Figure: activities att1–att5 are nodes; control flow is expressed through dedicated split/join nodes (Split-And, Join-Xor) linked to the activities by next and sequence edges.]
CAIR, Lyon13 18 Feb 2016
Representation C
[Figure: example workflows over activities att1–att5 in which control flow is folded into composite edge labels between activities, e.g. S-att1-att2, S-att1-att3, seq-att2-att4, seq-att4-att5, seq-att3-att5.]
CAIR, Lyon14 18 Feb 2016
Representation D
[Figure: activities att1–att5 are nodes; each control edge is labeled with the operator and the two activities it connects, e.g. And_att1_att2, And_att1_att3, SEQ_att2_att4, XOR_att3_att5, XOR_att4_att5.]
Representation D1
[Figure: the same graph with each control edge labeled by the operator only (And, SEQ, XOR).]
(A toy comparison of the two encodings is sketched below.)
CAIR, Lyon15 18 Feb 2016
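To illustrate the difference between these last two encodings (assuming the labeling scheme suggested by the figures), here is a toy derivation of the edge labels; the link list is made up for illustration.

```python
# Illustrative only: edge labels of Representations D and D1 derived from the
# same control links (operator plus the pair of activities it connects).
links = [("And", "att1", "att2"), ("And", "att1", "att3"),
         ("SEQ", "att2", "att4"), ("XOR", "att3", "att5"),
         ("XOR", "att4", "att5")]

rep_d  = [(src, dst, f"{op}_{src}_{dst}") for op, src, dst in links]  # e.g. And_att1_att2
rep_d1 = [(src, dst, op) for op, src, dst in links]                   # e.g. And

# Representation D repeats the activity names inside each edge label, whereas
# D1 keeps only the operator; the experiments below compare how such choices
# affect input graph size and mining precision/recall.
```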
Experiment
l Objective:To assess the suitability of the graph
representations for mining workflow graphs
l Effectiveness : Precision/ Recall
l Size of the graphs
l Execution time
CAIR, Lyon16 18 Feb 2016
Dataset
—  We created three datasets of workflow specifications,
containing respectively 30, 42, and 71 workflows.
—  9 of these workflows are similar to each other and, as such, contain recurrent structures that should be detected by the mining algorithm.
—  Despite the small size of the collection, these datasets allowed us to distinguish, to a certain extent, between the different representations.
CAIR, Lyon17 18 Feb 2016
Experiment results:
Input Data size
CAIR, Lyon18 18 Feb 2016
Experiment:
Effectiveness (Precision/ Recall)
CAIR, Lyon19
Precision = #TruePositives / (#TruePositives + #FalsePositives)
Recall = #TruePositives / (#TruePositives + #FalseNegatives)
18 Feb 2016
Representation A
[Figure repeated for comparison: operator-node encoding of activities att1–att5.]
Representation B
[Figure repeated for comparison: split/join-node encoding of activities att1–att5.]
CAIR, Lyon20 18 Feb 2016
Experiment 1:
Effectiveness (Precision / Recall)
CAIR, Lyon21
Precision = #TruePositives / (#TruePositives + #FalsePositives)
Recall = #TruePositives / (#TruePositives + #FalseNegatives)
18 Feb 2016
Approach
CAIR, Lyon24 18 Feb 2016
Keyword-Based Search for frequent
workflow fragments
CAIR, Lyon, 18 Feb 2016
Given an initial workflow, the user issues a query to characterize the desired missing workflow fragment. The query is composed of two elements:
1.  A set of keywords {kw1,…,kwn} characterizing the
functionality that the fragment should implement
2.  The set of activities in the initial workflow that are to be
connected to the fragment in question.
Acommon = {a1,…,am}. We call these activities common or joint activities, interchangeably. (A possible in-memory form of such a query is sketched below.)
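A minimal sketch of how such a query might be represented in code; the class and field names are illustrative, not taken from the work itself, and the example values reuse the keywords from the eScience example later in the deck.

```python
# Illustrative representation of the user query described above.
from dataclasses import dataclass, field

@dataclass
class FragmentQuery:
    keywords: set                 # {kw1, ..., kwn}: the desired functionality
    common_activities: set = field(default_factory=set)  # Acommon = {a1, ..., am}

uq = FragmentQuery(keywords={"concept", "score", "rank", "profile"},
                   common_activities={"GetConceptIds"})
```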
Retrieving Relevant Fragments
CAIR, Lyon, 18 Feb 2016
—  The first step in processing the user query consists in
identifying the fragments in the repository that are relevant
given the specified keywords.
—  In doing so, we adopt the widely used TF/IDF (term frequency / inverse document frequency) measure.
—  Note, however, that workflow fragments are indexed by the labels of the activities obtained by homogenization, which we refer to as IDX,
—  whereas the user uses keywords of his/her own choosing (a toy retrieval sketch follows).
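A toy sketch of this retrieval step with an off-the-shelf TF/IDF implementation (scikit-learn here; the work does not prescribe a particular library, and the fragment "documents" below are made up for illustration).

```python
# Sketch: rank workflow fragments against the query keywords with TF/IDF.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One "document" per fragment: the terms of its homogenized activity labels (IDX).
fragment_docs = {
    "frg1": "map gene concept identify similar concept get concept id",
    "frg2": "score concept rank concept build gene profile",
}

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(fragment_docs.values())

def relevance(keywords):
    """Cosine similarity between the keyword query and each indexed fragment."""
    query_vec = vectorizer.transform([" ".join(keywords)])
    return dict(zip(fragment_docs, cosine_similarity(query_vec, doc_matrix)[0]))

print(relevance({"concept", "score", "rank", "profile"}))  # frg2 is expected to score highest
```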
Retrieving Relevant Fragments
CAIR, Lyon, 18 Feb 2016
—  For each keyword kwi provided by the user, we retrieve its synonyms, syn(kwi), from an existing thesaurus, e.g., WordNet.
—  The index IDX is searched to identify whether there is a term in syn(kwi) ∪ {kwi} that can be used to represent the keyword in the vector used for retrieving fragments.
—  Once the fragments are retrieved, their constituent activities are matched against the common activities that the user identified in the initial workflow (a sketch of the synonym lookup follows).
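A sketch of the synonym-expansion step with NLTK's WordNet interface; this is illustrative (the slides only require "an existing thesaurus, e.g., WordNet"), and the toy index terms are made up.

```python
# Sketch: map each user keyword to an index term of IDX via WordNet synonyms.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def syn(keyword):
    """Return syn(kw) U {kw}: the keyword together with its WordNet synonyms."""
    synonyms = {lemma.name().lower().replace("_", " ")
                for synset in wn.synsets(keyword)
                for lemma in synset.lemmas()}
    return synonyms | {keyword.lower()}

def index_term(keyword, idx_terms):
    """Pick a term of the index IDX that can represent the keyword, if any."""
    candidates = syn(keyword) & set(idx_terms)
    return next(iter(candidates), None)

idx = {"concept", "grade", "profile", "identifier"}  # toy homogenized index terms
print(index_term("score", idx))    # may resolve to 'grade' through a shared synset
print(index_term("concept", idx))  # direct hit: 'concept'
```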
Ranking Fragments
CAIR, Lyon, 18 Feb 2016
The score used for ranking candidate fragments given a user query uq combines three factors:
—  Relevance: the score obtained using TF-IDF for the query keywords
—  Frequency: a workflow fragment that is frequently used across multiple workflows is likely to implement best practices that are useful for the designer, compared with a fragment that appears in, say, one workflow
—  Compatibility: how well the fragment's activities match the common activities uq.Acommon specified in the query (defined on the next slide)
Score(wffrg, uq) = wr · Relevance(wffrg, uq) + wf · Frequency(wffrg) / MaxFrequency + wc · Compatibility(wffrg, uq.Acommon)
where wr, wf and wc are positive real weights such that wr + wf + wc = 1, and MaxFrequency denotes the frequency of the fragment that appears the most often in the harvested workflow repository. The score takes a value between 0 and 1. Once the candidate fragments are scored, they are ranked in descending order of their scores, and the top-k fragments (e.g., k = 5) are presented to the designer, who selects the one to be composed with the initial workflow.
Ranking Fragments: Compatibility
CAIR, Lyon, 18 Feb 2016
To estimate the compatibility, we consider the number of
activities in Acommon that have a matching activity in the
workflow fragment and their associated matching score.
Specifically, the compatibility of a workflow fragment wffrg given the activities uq.Acommon specified in the user query uq is defined as:
Compatibility(wffrg, uq.Acommon) = ( Σ_{aj ∈ Acommon} matchingScore(aj, wffrg) ) / |Acommon|
where matchingScore(aj, wffrg) is the matching score between aj and the best-matching activity in wffrg, and |Acommon| is the number of activities in Acommon. Compatibility(wffrg, Acommon) takes a value between 0 and 1: the more activities of Acommon that have a matching activity in wffrg, and the higher the matching scores, the higher the compatibility between the candidate workflow fragment and the initial workflow. (A transcription of this formula and of the overall ranking score into code is sketched below.)
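A direct transcription of the two formulas into code; matching_score is left abstract (any label-similarity measure returning values in [0, 1] would do), and the default weights are only an example, not values from the work.

```python
# Sketch: compatibility and overall ranking score, transcribed from the
# formulas above; `matching_score(a, b)` stands for the label similarity
# between a common activity and a fragment activity (in [0, 1]).
def compatibility(fragment_activities, common_activities, matching_score):
    if not common_activities or not fragment_activities:
        return 0.0
    best = lambda a: max(matching_score(a, b) for b in fragment_activities)
    return sum(best(a) for a in common_activities) / len(common_activities)

def score(relevance, frequency, max_frequency, compat, wr=0.4, wf=0.3, wc=0.3):
    # wr + wf + wc must equal 1 so that the score stays within [0, 1].
    return wr * relevance + wf * frequency / max_frequency + wc * compat
```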
Composing Workflow Fragments
CAIR, Lyon, 18 Feb 2016
When the common activity acommon has succeeding activities in both wfinitial and wffrg, a configurable split operator is used to connect the two workflows, as illustrated in Figure 10 (lines 9 and 10). We call such an operator configurable because it is up to the user to choose whether it is an and-split, an or-split, or an xor-split. The preceding activity of such an operator is acommon, and its succeeding activities are the succeeding activities of wfinitial and wffrg.
Fig. 8: Merging the initial workflow and a fragment in sequence based on the common activity acommon.
Fig. 9: Merging the preceding activities of the initial workflow and a fragment using a configurable join control operator.
Fig. 10: Merging the succeeding activities of the initial workflow and a fragment using a configurable split control operator.
(A sketch of these merge rules in code follows.)
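A rough sketch of the merge rules described above, assuming both workflows are directed graphs over activity names and that the common activity appears under the same node name in both; the concrete operator type is left to the user, so it is a parameter here. This is illustrative code, not the algorithm from the paper.

```python
# Sketch of the merge cases around a common activity (Figs. 8-10): sequence,
# configurable join of preceding activities, configurable split of succeeding
# activities. Illustrative code over networkx DiGraphs.
import networkx as nx

def compose(wf_initial, wf_frg, a_common, join_op="Join-XOR", split_op="Split-AND"):
    merged = nx.compose(wf_initial, wf_frg)  # graph union on shared node names
    preds = set(wf_initial.predecessors(a_common)) | set(wf_frg.predecessors(a_common))
    succs = set(wf_initial.successors(a_common)) | set(wf_frg.successors(a_common))

    # Fig. 9: both workflows precede a_common -> insert a configurable join.
    if wf_initial.in_degree(a_common) and wf_frg.in_degree(a_common):
        merged.add_node(join_op)
        for p in preds:
            merged.remove_edge(p, a_common)
            merged.add_edge(p, join_op)
        merged.add_edge(join_op, a_common)

    # Fig. 10: both workflows follow a_common -> insert a configurable split.
    if wf_initial.out_degree(a_common) and wf_frg.out_degree(a_common):
        merged.add_node(split_op)
        for s in succs:
            merged.remove_edge(a_common, s)
            merged.add_edge(split_op, s)
        merged.add_edge(a_common, split_op)

    # Fig. 8: otherwise the two workflows are simply connected in sequence
    # through a_common (as in the Huntington disease example).
    return merged
```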
Example from eScience
CAIR, Lyon, 18 Feb 2016
To assist the designer in specifying a workflow for profiling genes that are suspected to be involved in a genetic disease, in this case Huntington's disease (HD). Although the genetic mutation that causes HD was identified 20 years ago [19], the downstream molecular mechanisms leading to the HD phenotype are still poorly understood. Relating alterations in gene expression to epigenetic information might shed light on the disease aetiology. With this in mind, the scientist wanted to specify a workflow that annotates HD gene expression data with publicly available epigenetic data [20].
Fig. 11: Initial Huntington Disease profiling workflow.
The initial workflow specified by the designer is illustrated in Figure 11. The workflow contains three activities ordered in sequence. The first activity, MapGeneToConcept, extracts the concepts used in domain ontologies to characterize a given gene that is known or suspected to be involved in an infection or disease, in this case Huntington's disease. The second activity, IdentifySimilarConcepts, identifies, for each of those concepts, the concepts that are similar to it; the rationale is that genes associated with concepts similar to those involved in the disease are likely to be involved in the disease as well. This activity runs ontological similarity tests to identify similar concepts. The third activity, GetConceptIds, extracts the identifiers of the similar concepts.
keywords = {concept, score, rank, profile}
The corresponding activity in the retrieved fragment does not carry exactly the same label as the common activity in the initial workflow, GetConceptIds. Still, our method was able to detect it as a common activity thanks to the string similarity measure utilized.
Fig. 12: Candidate fragment selected by the workflow designer.
By applying the composition algorithm presented in Section 5 to merge the
initial workflow and the fragment selected by the user, we obtained the workflow
depicted in Figure 13. Indeed, the common activity has no succeeding activity
in the initial workflow and has no preceding activity in the workflow fragment
(lines 1, 2 in Figure 7). Therefore, the two workflows are connected in sequence
using the common activity.
Example from eScience
CAIR, Lyon, 18 Feb 2016
Fig. 13: Workflow obtained by merging the initial workflow and the selected fragment.
Conclusions
CAIR, Lyon, 18 Feb 2016
1.  The method we have described for mining workflow fragments has been empirically tested.
2.  The method for retrieving and composing workflow fragments has been used in an eScience use case to assist the design of a Huntington disease workflow.
3.  The work on mining frequent workflow fragments has been published at the International Keystone conference.
4.  An extended version, with results on the retrieval and composition of workflow fragments, has been submitted to a special issue on keyword search of the Springer journal Transactions on Computational Collective Intelligence.
Khalid Belhajjame
Joint work with
Daniela Grigori, Mariem Harmassi and Manel BenYahia
LAMSADE, Université Paris Dauphine
Keyword-Based Search of Workflow
Fragments and their Composition

More Related Content

What's hot

Odersky week1 notes
Odersky week1 notesOdersky week1 notes
Odersky week1 notesDoug Chang
 
Java8 training - class 3
Java8 training - class 3Java8 training - class 3
Java8 training - class 3Marut Singh
 
06. operator overloading
06. operator overloading06. operator overloading
06. operator overloadingHaresh Jaiswal
 
A Brief Conceptual Introduction to Functional Java 8 and its API
A Brief Conceptual Introduction to Functional Java 8 and its APIA Brief Conceptual Introduction to Functional Java 8 and its API
A Brief Conceptual Introduction to Functional Java 8 and its APIJörn Guy Süß JGS
 
Introduction to Control systems in scilab
Introduction to Control systems in scilabIntroduction to Control systems in scilab
Introduction to Control systems in scilabScilab
 
Hoisting Nested Functions
Hoisting Nested FunctionsHoisting Nested Functions
Hoisting Nested FunctionsFeras Tanan
 
Stack organization
Stack organizationStack organization
Stack organizationchauhankapil
 
Hoisting Nested Functions
Hoisting Nested Functions Hoisting Nested Functions
Hoisting Nested Functions Feras Tanan
 
Lambda Expressions in Java
Lambda Expressions in JavaLambda Expressions in Java
Lambda Expressions in JavaErhan Bagdemir
 
Programming with Lambda Expressions in Java
Programming with Lambda Expressions in Java Programming with Lambda Expressions in Java
Programming with Lambda Expressions in Java langer4711
 
ICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-LocalizationICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-LocalizationMasud Rahman
 
Data Type Conversion in C++
Data Type Conversion in C++Data Type Conversion in C++
Data Type Conversion in C++Danial Mirza
 
Java 8 - Project Lambda
Java 8 - Project LambdaJava 8 - Project Lambda
Java 8 - Project LambdaRahman USTA
 

What's hot (20)

5.program structure
5.program structure5.program structure
5.program structure
 
Java 8 Intro - Core Features
Java 8 Intro - Core FeaturesJava 8 Intro - Core Features
Java 8 Intro - Core Features
 
Odersky week1 notes
Odersky week1 notesOdersky week1 notes
Odersky week1 notes
 
Java8 training - class 3
Java8 training - class 3Java8 training - class 3
Java8 training - class 3
 
06. operator overloading
06. operator overloading06. operator overloading
06. operator overloading
 
A Brief Conceptual Introduction to Functional Java 8 and its API
A Brief Conceptual Introduction to Functional Java 8 and its APIA Brief Conceptual Introduction to Functional Java 8 and its API
A Brief Conceptual Introduction to Functional Java 8 and its API
 
Introduction to Control systems in scilab
Introduction to Control systems in scilabIntroduction to Control systems in scilab
Introduction to Control systems in scilab
 
Hoisting Nested Functions
Hoisting Nested FunctionsHoisting Nested Functions
Hoisting Nested Functions
 
Stack organization
Stack organizationStack organization
Stack organization
 
Scilab vs matlab
Scilab vs matlabScilab vs matlab
Scilab vs matlab
 
Hoisting Nested Functions
Hoisting Nested Functions Hoisting Nested Functions
Hoisting Nested Functions
 
Symbexecsearch
SymbexecsearchSymbexecsearch
Symbexecsearch
 
Unit3 cspc
Unit3 cspcUnit3 cspc
Unit3 cspc
 
Major Java 8 features
Major Java 8 featuresMajor Java 8 features
Major Java 8 features
 
Lambda Expressions in Java
Lambda Expressions in JavaLambda Expressions in Java
Lambda Expressions in Java
 
Programming with Lambda Expressions in Java
Programming with Lambda Expressions in Java Programming with Lambda Expressions in Java
Programming with Lambda Expressions in Java
 
Class 6 2ciclo
Class 6 2cicloClass 6 2ciclo
Class 6 2ciclo
 
ICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-LocalizationICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-Localization
 
Data Type Conversion in C++
Data Type Conversion in C++Data Type Conversion in C++
Data Type Conversion in C++
 
Java 8 - Project Lambda
Java 8 - Project LambdaJava 8 - Project Lambda
Java 8 - Project Lambda
 

Viewers also liked

مشروع قانون 88.13 يتعلق بالصحافة والنشر
مشروع قانون 88.13 يتعلق بالصحافة والنشر مشروع قانون 88.13 يتعلق بالصحافة والنشر
مشروع قانون 88.13 يتعلق بالصحافة والنشر Mustapha Khalfi
 
Filters two design_with_matlab
Filters two design_with_matlabFilters two design_with_matlab
Filters two design_with_matlabresearchwork
 
Q3 WP Topic two-Alternative Ways to Measure Internet Community Dynamics_EN
Q3 WP Topic two-Alternative Ways to Measure Internet Community Dynamics_ENQ3 WP Topic two-Alternative Ways to Measure Internet Community Dynamics_EN
Q3 WP Topic two-Alternative Ways to Measure Internet Community Dynamics_ENKantar Media CIC
 
Matlab Based Decimeter Design Analysis Wimax Appliacation
Matlab Based Decimeter Design Analysis Wimax AppliacationMatlab Based Decimeter Design Analysis Wimax Appliacation
Matlab Based Decimeter Design Analysis Wimax Appliacationiosrjce
 
Digital image processing using matlab: basic transformations, filters and ope...
Digital image processing using matlab: basic transformations, filters and ope...Digital image processing using matlab: basic transformations, filters and ope...
Digital image processing using matlab: basic transformations, filters and ope...thanh nguyen
 

Viewers also liked (6)

مشروع قانون 88.13 يتعلق بالصحافة والنشر
مشروع قانون 88.13 يتعلق بالصحافة والنشر مشروع قانون 88.13 يتعلق بالصحافة والنشر
مشروع قانون 88.13 يتعلق بالصحافة والنشر
 
Filters two design_with_matlab
Filters two design_with_matlabFilters two design_with_matlab
Filters two design_with_matlab
 
Q3 WP Topic two-Alternative Ways to Measure Internet Community Dynamics_EN
Q3 WP Topic two-Alternative Ways to Measure Internet Community Dynamics_ENQ3 WP Topic two-Alternative Ways to Measure Internet Community Dynamics_EN
Q3 WP Topic two-Alternative Ways to Measure Internet Community Dynamics_EN
 
Low pass filter
Low pass filterLow pass filter
Low pass filter
 
Matlab Based Decimeter Design Analysis Wimax Appliacation
Matlab Based Decimeter Design Analysis Wimax AppliacationMatlab Based Decimeter Design Analysis Wimax Appliacation
Matlab Based Decimeter Design Analysis Wimax Appliacation
 
Digital image processing using matlab: basic transformations, filters and ope...
Digital image processing using matlab: basic transformations, filters and ope...Digital image processing using matlab: basic transformations, filters and ope...
Digital image processing using matlab: basic transformations, filters and ope...
 

Similar to Anr cair meeting feb 2016

Improving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer RecommendationsImproving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer RecommendationsThe University of Adelaide
 
Learning to Rank Relevant Files for Bug Reports using Domain Knowledge
Learning to Rank Relevant Files for Bug Reports using Domain KnowledgeLearning to Rank Relevant Files for Bug Reports using Domain Knowledge
Learning to Rank Relevant Files for Bug Reports using Domain KnowledgeXin Ye
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee AttritionShruti Mohan
 
Load Runner Methodology to Performance Testing
Load Runner Methodology to Performance TestingLoad Runner Methodology to Performance Testing
Load Runner Methodology to Performance Testingijtsrd
 
Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
Link Discovery Tutorial Part III: Benchmarking for Instance Matching SystemsLink Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
Link Discovery Tutorial Part III: Benchmarking for Instance Matching SystemsHolistic Benchmarking of Big Linked Data
 
Ui path certificate question set 1
Ui path certificate question set 1Ui path certificate question set 1
Ui path certificate question set 1Majid Hashmi
 
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSIS
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSISCORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSIS
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSISijseajournal
 
An AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance ManagementAn AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance ManagementDatabricks
 
Analytics for Process Excellence
Analytics for Process ExcellenceAnalytics for Process Excellence
Analytics for Process ExcellenceDenis Gagné
 
PrOnto: an Ontology Driven Business Process Mining Tool
PrOnto: an Ontology Driven Business Process Mining ToolPrOnto: an Ontology Driven Business Process Mining Tool
PrOnto: an Ontology Driven Business Process Mining ToolFrancesco Nocera
 
Runtime Behavior of JavaScript Programs
Runtime Behavior of JavaScript ProgramsRuntime Behavior of JavaScript Programs
Runtime Behavior of JavaScript ProgramsIRJET Journal
 
Functional reactive programming
Functional reactive programmingFunctional reactive programming
Functional reactive programmingAraf Karsh Hamid
 
Api specification based function search engine using natural language query-S...
Api specification based function search engine using natural language query-S...Api specification based function search engine using natural language query-S...
Api specification based function search engine using natural language query-S...Sanif Sanif
 
A Survey on Bug Tracking System for Effective Bug Clearance
A Survey on Bug Tracking System for Effective Bug ClearanceA Survey on Bug Tracking System for Effective Bug Clearance
A Survey on Bug Tracking System for Effective Bug ClearanceIRJET Journal
 
Software Effort Measurement Using Abstraction Techniques
Software Effort Measurement Using Abstraction TechniquesSoftware Effort Measurement Using Abstraction Techniques
Software Effort Measurement Using Abstraction Techniquesaliraza786
 
Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Makoto Yui
 
NEr using N-Gram techniqueppt
NEr using N-Gram techniquepptNEr using N-Gram techniqueppt
NEr using N-Gram techniquepptGyandeep Kansal
 

Similar to Anr cair meeting feb 2016 (20)

Improving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer RecommendationsImproving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer Recommendations
 
Learning to Rank Relevant Files for Bug Reports using Domain Knowledge
Learning to Rank Relevant Files for Bug Reports using Domain KnowledgeLearning to Rank Relevant Files for Bug Reports using Domain Knowledge
Learning to Rank Relevant Files for Bug Reports using Domain Knowledge
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee Attrition
 
Load Runner Methodology to Performance Testing
Load Runner Methodology to Performance TestingLoad Runner Methodology to Performance Testing
Load Runner Methodology to Performance Testing
 
Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
Link Discovery Tutorial Part III: Benchmarking for Instance Matching SystemsLink Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
Link Discovery Tutorial Part III: Benchmarking for Instance Matching Systems
 
Ui path certificate question set 1
Ui path certificate question set 1Ui path certificate question set 1
Ui path certificate question set 1
 
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSIS
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSISCORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSIS
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSIS
 
An AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance ManagementAn AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance Management
 
Analytics for Process Excellence
Analytics for Process ExcellenceAnalytics for Process Excellence
Analytics for Process Excellence
 
Spring aop
Spring aopSpring aop
Spring aop
 
PrOnto: an Ontology Driven Business Process Mining Tool
PrOnto: an Ontology Driven Business Process Mining ToolPrOnto: an Ontology Driven Business Process Mining Tool
PrOnto: an Ontology Driven Business Process Mining Tool
 
Runtime Behavior of JavaScript Programs
Runtime Behavior of JavaScript ProgramsRuntime Behavior of JavaScript Programs
Runtime Behavior of JavaScript Programs
 
Functional reactive programming
Functional reactive programmingFunctional reactive programming
Functional reactive programming
 
Api specification based function search engine using natural language query-S...
Api specification based function search engine using natural language query-S...Api specification based function search engine using natural language query-S...
Api specification based function search engine using natural language query-S...
 
Benchmarking PyCon AU 2011 v0
Benchmarking PyCon AU 2011 v0Benchmarking PyCon AU 2011 v0
Benchmarking PyCon AU 2011 v0
 
Performance testing wreaking balls
Performance testing wreaking ballsPerformance testing wreaking balls
Performance testing wreaking balls
 
A Survey on Bug Tracking System for Effective Bug Clearance
A Survey on Bug Tracking System for Effective Bug ClearanceA Survey on Bug Tracking System for Effective Bug Clearance
A Survey on Bug Tracking System for Effective Bug Clearance
 
Software Effort Measurement Using Abstraction Techniques
Software Effort Measurement Using Abstraction TechniquesSoftware Effort Measurement Using Abstraction Techniques
Software Effort Measurement Using Abstraction Techniques
 
Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0
 
NEr using N-Gram techniqueppt
NEr using N-Gram techniquepptNEr using N-Gram techniqueppt
NEr using N-Gram techniqueppt
 

More from Khalid Belhajjame

Lineage-Preserving Anonymization of the Provenance of Collection-Based Workflows
Lineage-Preserving Anonymization of the Provenance of Collection-Based WorkflowsLineage-Preserving Anonymization of the Provenance of Collection-Based Workflows
Lineage-Preserving Anonymization of the Provenance of Collection-Based WorkflowsKhalid Belhajjame
 
Privacy-Preserving Data Analysis Workflows for eScience
Privacy-Preserving Data Analysis Workflows for eSciencePrivacy-Preserving Data Analysis Workflows for eScience
Privacy-Preserving Data Analysis Workflows for eScienceKhalid Belhajjame
 
Converting scripts into reproducible workflow research objects
Converting scripts into reproducible workflow research objectsConverting scripts into reproducible workflow research objects
Converting scripts into reproducible workflow research objectsKhalid Belhajjame
 
A Sightseeing Tour of Prov and Some of its Extensions
A Sightseeing Tour of Prov and Some of its ExtensionsA Sightseeing Tour of Prov and Some of its Extensions
A Sightseeing Tour of Prov and Some of its ExtensionsKhalid Belhajjame
 
Introduction to ProvBench @ Provenance Week 2014
Introduction to ProvBench @ Provenance Week 2014Introduction to ProvBench @ Provenance Week 2014
Introduction to ProvBench @ Provenance Week 2014Khalid Belhajjame
 
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotat...
Small Is Beautiful:  Summarizing Scientific Workflows  Using Semantic Annotat...Small Is Beautiful:  Summarizing Scientific Workflows  Using Semantic Annotat...
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotat...Khalid Belhajjame
 
Detecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow ResultsDetecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow ResultsKhalid Belhajjame
 
Research Object Model in Sepublica
Research Object Model in SepublicaResearch Object Model in Sepublica
Research Object Model in SepublicaKhalid Belhajjame
 
Case studyworkshoponprovenance
Case studyworkshoponprovenanceCase studyworkshoponprovenance
Case studyworkshoponprovenanceKhalid Belhajjame
 
Intégration incrémentale de données (Valenciennes juin 2010)
Intégration incrémentale de données (Valenciennes juin 2010)Intégration incrémentale de données (Valenciennes juin 2010)
Intégration incrémentale de données (Valenciennes juin 2010)Khalid Belhajjame
 

More from Khalid Belhajjame (19)

Provenance witha purpose
Provenance witha purposeProvenance witha purpose
Provenance witha purpose
 
Lineage-Preserving Anonymization of the Provenance of Collection-Based Workflows
Lineage-Preserving Anonymization of the Provenance of Collection-Based WorkflowsLineage-Preserving Anonymization of the Provenance of Collection-Based Workflows
Lineage-Preserving Anonymization of the Provenance of Collection-Based Workflows
 
Privacy-Preserving Data Analysis Workflows for eScience
Privacy-Preserving Data Analysis Workflows for eSciencePrivacy-Preserving Data Analysis Workflows for eScience
Privacy-Preserving Data Analysis Workflows for eScience
 
Irpb workshop
Irpb workshopIrpb workshop
Irpb workshop
 
Aussois bda-mdd-2018
Aussois bda-mdd-2018Aussois bda-mdd-2018
Aussois bda-mdd-2018
 
Converting scripts into reproducible workflow research objects
Converting scripts into reproducible workflow research objectsConverting scripts into reproducible workflow research objects
Converting scripts into reproducible workflow research objects
 
A Sightseeing Tour of Prov and Some of its Extensions
A Sightseeing Tour of Prov and Some of its ExtensionsA Sightseeing Tour of Prov and Some of its Extensions
A Sightseeing Tour of Prov and Some of its Extensions
 
Reproducibility 1
Reproducibility 1Reproducibility 1
Reproducibility 1
 
Introduction to ProvBench @ Provenance Week 2014
Introduction to ProvBench @ Provenance Week 2014Introduction to ProvBench @ Provenance Week 2014
Introduction to ProvBench @ Provenance Week 2014
 
Edbt2014 talk
Edbt2014 talkEdbt2014 talk
Edbt2014 talk
 
Credible workshop
Credible workshopCredible workshop
Credible workshop
 
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotat...
Small Is Beautiful:  Summarizing Scientific Workflows  Using Semantic Annotat...Small Is Beautiful:  Summarizing Scientific Workflows  Using Semantic Annotat...
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotat...
 
Why Workflows Break
Why Workflows BreakWhy Workflows Break
Why Workflows Break
 
D-prov use-case
D-prov use-caseD-prov use-case
D-prov use-case
 
Detecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow ResultsDetecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow Results
 
Research Object Model in Sepublica
Research Object Model in SepublicaResearch Object Model in Sepublica
Research Object Model in Sepublica
 
Case studyworkshoponprovenance
Case studyworkshoponprovenanceCase studyworkshoponprovenance
Case studyworkshoponprovenance
 
Intégration incrémentale de données (Valenciennes juin 2010)
Intégration incrémentale de données (Valenciennes juin 2010)Intégration incrémentale de données (Valenciennes juin 2010)
Intégration incrémentale de données (Valenciennes juin 2010)
 
Edbt 2010, Belhajjame
Edbt 2010, BelhajjameEdbt 2010, Belhajjame
Edbt 2010, Belhajjame
 

Recently uploaded

Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxPoojaSen20
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 

Recently uploaded (20)

Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 

Anr cair meeting feb 2016

  • 1. Khalid Belhajjame Joint work with Daniela Grigori, Mariem Harmassi et Manel BenYahia LAMSADE, Université Paris Dauphine Keyword-Based Search of Workflow Fragments and their Composition
  • 2. Workflows A business process specified using the BPMN notation A ScientificWorkflow system (Taverna) A workflow consists of an orchestrated and repeatable pattern of business activity enabled by the systematic organization of resources into processes that transform materials, provide services, or process information (Workflow Coalition) CAIR, Lyon2 18 Feb 2016
  • 3. Workflows are difficult to design —  The design of scientific workflows, just like business process, can be a difficult task —  Deep knowledge of the domain —  Awareness of the resources, e.g., programs and web services, that can enact the steps of the workflow —  Publish and share workflows, and promote their reuse. —  myExperiment, CrowldLab, Galaxy, and other various business process repository —  Reuse is still an aim. —  There are no capabilities that support the user in identifying the workflows, or fragments thereof, that are relevant for the task at hand. CAIR, Lyon4 18 Feb 2016
  • 4. Assisting designers of workflows through reuse CAIR, Lyon5 —  There has been a good body of work in the literature on workflow discover. —  The typical approach requires the user to specify a query in the form of a workflow that is compared against a collection of workflows. —  In our work, we focus on a problem that is different and that received little attention. —  We position ourselves in a context where a designer is specifying his/her workflow and needs assistance to design parts of her workflow (which we term fragments). 18 Feb 2016
  • 6. Mining Frequent Workflow Fragments 18 Feb 2016CAIR, Lyon7 Given a repository of workflows, we would like to mine frequent workflow fragments, i.e., fragments that are used across multiple workflows.This raises two questions: —  How to deal with the heterogeneity of the labels used by different users to model the activities of their workflows within the repository? —  Which graph representation is best suited for formatting workflows for mining frequent fragments?
  • 7. Homogenizing Activity Labels 18 Feb 2016CAIR, Lyon8 —  Activities implementing the same functionality should have the same names. —  Workflow authored by different people are likely to use different labels for naming activities that perform the same task. —  To homogenize activity labels we use state of the art techniques, which rely on a common dictionary for: —  Identifying the activities that perform the same task, and —  Selecting the same label for naming them.
  • 8. Detecting Recurrent Fragments —  We use graph mining algorithms to identify the fragments in the repository that are recurrent. —  We use the SUBDUE algorithm. —  Which graph representation to use to represent (workflow) fragments? —  We examined a number of workflow representation CAIR, Lyon12 18 Feb 2016
  • 9. Representation A att1 att2 att3 att4 att5 next operator And operator sequence next operand operator Xor type type operand next operand typeoperand operand Representation B att1 att2 att3 att4 att5 next Split-And next Join-Xor J-Xor sequence next sp-and sp-and CAIR, Lyon13 18 Feb 2016
  • 12. Experiment l Objective:To assess the suitability of the graph representations for mining workflow graphs l Effectiveness : Precision/ Recall l Size of the graphs l Execution time CAIR, Lyon16 18 Feb 2016
  • 13. Dataset —  We created three datasets of workflow specifications, containing respectively 30, 42, and 71 workflows. —  9 out of these workflows are similar to each other and, as such contain recurrent structures, that should be detected by the mining algorithm. —  Despite the small size of the collection, these datasets allowed to distinguish to a certain extent between the different representations. CAIR, Lyon17 18 Feb 2016
  • 14. Experiment results: Input Data size CAIR, Lyon18 18 Feb 2016
  • 15. Experiment: Effectiveness (Precision/ Recall) CAIR, Lyon19 Precision = #Truepositives #Truepositives + #Falsepositives Recall = #Truepositives #Truepositives + #Falsenegatives 18 Feb 2016
  • 16. Representation A att1 att2 att3 att4 att5 next operator And operator sequence next operand operator Xor type type operand next operand typeoperand operand Representation B att1 att2 att3 att4 att5 next Split-And next Join-Xor J-Xor sequence next sp-and sp-and CAIR, Lyon20 18 Feb 2016
  • 17. Experiment1: Effectiveness (Precision/ Recall) CAIR, Lyon21 Precision = #Truepositives #Truepositives + #Falsepositives Recall = #Truepositives #Truepositives + #Falsenegatives 18 Feb 2016
  • 19. Keyword-Based Search for frequent workflow fragments 18 Feb 2016CAIR, Lyon25 Given an initial workflow, the user issues a query to characterize the desired missing workflow fragment.The query is composed of two elements: 1.  A set of keywords {kw1,…,kwn} characterizing the functionality that the fragment should implement 2.  The set of activities in the initial workflow that are to be connected to the fragment in question. Acommon = {a1,…,am},We call these activities common or joint activities, interchangeably.
  • 20. Retrieving Relevant Fragments 18 Feb 2016CAIR, Lyon26 —  The first step in processing the user query consists in identifying the fragments in the repository that are relevant given the specified keywords. —  In doing so, we adopt the widely used technique ofTF/IDF (term frequency / inverse document frequency) measure. —  Note however that workflow fragments are indexed by the labels of the activities obtained by homogenization, which we refer to by IDX —  Whereas, the user uses keywords of his own choosing
  • 21. Retrieving relevant Fragments 18 Feb 2016CAIR, Lyon27 —  For each keyword kwi provided by the user, we retrieve its synonyms, syn(kwi), from an existing thesaurus, e.g., Wordnet —  The index IDX is trawled to identify if there is a term in syn(kwi) U kwi, which is can be used to represent the keyword in the vector used for retrieving fragments. —  Once the fragments are retrieved, their constitent activities are matched against the common activities in the initial workflow that the user identified
  • 22. Ranking Fragments 18 Feb 2016CAIR, Lyon28 The score used for ranking candidate fragments given a user query uq is defined as follows: —  Relevance: denotes the score obtained usingTF-IDF —  Frequency:A workflow fragment that is frequently used across multiple workflow is likely to implement best practices that are useful for the designer compared with a fragment that apperas in, say one workflow Compatibility(wffrg, uq.Acommon) = aj 2Acommon matchingScore(aj, wffrg) |Acommon| where matchingScore(aj, wffrg) is the matching score between aj and the best matching activity in wffrg, and |Acommon| the number of activities in Acommon. Compatibility(wffrg, Acommon) takes a value between 0 and 1. The larger is the number of activities Acommon that have a matching activity in wffrg and the higher are the matching scores, the higher is the compati- bility between the candidate workflow fragment and the initial workflow. Based on the above factors, we define the score used for ranking candidate fragments given a user query uq as: Score(wffrg, uq) = wr.Relvance(wffrg, uq) + wf . F requency(wffrg) MaxF requency + wc.Compatibility(wffrg, uq.Acommon) where wr, wf and wc are positive real numbers representing weights such that wr + wf + wc = 1. MaxFrequency is positive number denoting the frequency of the fragment that appears the maximum of times in the workflow repository harvested. Notice that the score takes a value between 0 and 1. Once the can- didate fragments are scored, they are ranked in the descendant order of their associated scores. The top-k fragments, e.g., 5, are then presented to the designer who selects to the one to be composed with the initial workflow. 5 Composing Workflow Fragments Once the user has examined the fragments that are retrieved given the set of keywords s/he specified, s/he can choose a fragment to be composed with
  • 23. Ranking Fragment: Compatibility 18 Feb 2016CAIR, Lyon29 To estimate the compatibility, we consider the number of activities in Acommon that have a matching activity in the workflow fragment and their associated matching score. ated matching score. Specifically, we define the compatibility of a workflow fragment given the activities uq.Acommon specified in the user query uq as follows: Compatibility(wffrg, uq.Acommon) = P aj 2Acommon matchingScore(aj, wffrg) |Acommon| where matchingScore(aj, wffrg) is the matching score between aj and the best matching activity in wffrg, and |Acommon| the number of activities in Acommon. Compatibility(wffrg, Acommon) takes a value between 0 and 1. The larger is the number of activities Acommon that have a matching activity in wffrg and the higher are the matching scores, the higher is the compati- bility between the candidate workflow fragment and the initial workflow. Based on the above factors, we define the score used for ranking candidate fragments given a user query uq as: Score(wffrg, uq) = wr.Relvance(wffrg, uq) + wf . F requency(wffrg) MaxF requency + wc.Compatibility(wffrg, uq.Acommon)
  • 24. Composing Workflow Fragments 18 Feb 2016CAIR, Lyon30 configurable split operator to connect the two workflows as illustrated in Figure 10 (lines 9 and 10). We call such an operator configurable because it is up to the user to choose if such an operator is of a type and split, or split or xor split. The preceding activity of such an operator is acommon and its succeeding activities are the succeeding activities of wfinitial and wffrg. Fig. 8: Merging the initial workflow and a fragment in sequence based on the common activity acommon. Fig. 9: Merging the preceding activities of the initial workflow and a fragment using a configurable join control operator. Fig. 10: Merging the succeeding activities of the initial workflow and a fragment Fig. 9: Merging the preceding activities of the initial workflow and a fragment using a configurable join control operator. Fig. 10: Merging the succeeding activities of the initial workflow and a fragment using a configurable split control operator.
  • 25. Example from eScience 18 Feb 2016CAIR, Lyon31 To assist the designer is specifying a workflow for profiling genes that are suspected to be involved in a genetic disease, in this case the huntignton disease. people. Although the genetic mutation that causes HD was identified 20 years ago [19], the downstream molecular mechanisms leading to the HD phenotype are still poorly understood. Relating alterations in gene expression to epigenetic information might shed light on the disease aetiology. With this in mind, the scientist wanted to specify a workflow that annotate HD gene expression data with publicly available epigenetic data [20]. Fig. 11: Initial Huntington Disease profiling workflow. The initial workflow specified by the designer is illustrated in Figure 11. The workflow contains three activities that are ordered in sequence. The first activity MapGeneT oConcept is used to extract the concepts that are used in domain ontologies to characterize a given gene that is known or suspected to be involved in an infection or disease, in this case the Huntigton Disease. The second activity IndentifySimilarConcepts is then used to identify for each of those concept the concepts that are similar. The rational behind doing so is that genes that are associated with concepts that are similar to concepts that are involved in the disease have chances of being also involved in the disease. This activity involves running ontological similarity tests to identify similar concepts. The third activity GetConceptIds is then used to extract the identifiers of the similar concepts. keywords = {concept, score, rank, profile} the common activity in the initial workflow, GetConceptIds. Still, our method was able detect it as a common activity thanks to the string similarity utilized. Fig. 12: Candidate fragment selected by the workflow designer. By applying the composition algorithm presented in Section 5 to merge the initial workflow and the fragment selected by the user, we obtained the workflow depicted in Figure 13. Indeed, the common activity has no succeeding activity in the initial workflow and has no preceding activity in the workflow fragment (lines 1, 2 in Figure 7). Therefore, the two workflows are connected ins sequence using the common activity.
  • 26. Example from eScience 18 Feb 2016CAIR, Lyon32 To assist the designer is specifying a workflow for profiling genes that are suspected to be involved in a genetic disease, in this case the huntignton disease. people. Although the genetic mutation that causes HD was identified 20 years ago [19], the downstream molecular mechanisms leading to the HD phenotype are still poorly understood. Relating alterations in gene expression to epigenetic information might shed light on the disease aetiology. With this in mind, the scientist wanted to specify a workflow that annotate HD gene expression data with publicly available epigenetic data [20]. Fig. 11: Initial Huntington Disease profiling workflow. The initial workflow specified by the designer is illustrated in Figure 11. The workflow contains three activities that are ordered in sequence. The first activity MapGeneT oConcept is used to extract the concepts that are used in domain ontologies to characterize a given gene that is known or suspected to be involved in an infection or disease, in this case the Huntigton Disease. The second activity IndentifySimilarConcepts is then used to identify for each of those concept the concepts that are similar. The rational behind doing so is that genes that are associated with concepts that are similar to concepts that are involved in the disease have chances of being also involved in the disease. This activity involves running ontological similarity tests to identify similar concepts. The third activity GetConceptIds is then used to extract the identifiers of the similar concepts. keywords = {concept, score, rank, profile} Fig. 13: Workflow obtained by merging the initial workflow and selected frag- ment.
  • 27. Conclusions 18 Feb 2016CAIR, Lyon33 1.  The method we have described for mining workflow fragments has been empirically evaluated. 2.  The method for retrieving and composing workflow fragments has been used in an eScience use case to assist the design of a Huntington's disease workflow. 3.  The work on mining frequent workflow fragments has been published at the International Keystone Conference. 4.  The extended version, with results on the retrieval and composition of workflow fragments, has been submitted to a special issue on keyword search of the Springer journal Transactions on Computational Collective Intelligence.
  • 28. Khalid Belhajjame Joint work with Daniela Grigori, Mariem Harmassi and Manel BenYahia LAMSADE, Université Paris Dauphine Keyword-Based Search of Workflow Fragments and their Composition

Editor's Notes

  1. Workflows are increasingly used by scientists as a means for specifying and enacting their experiments. Such workflows are often data intensive [5]. The data sets obtained by their enactment have several applications, e.g., they can be used to understand new phenomena or confirm known facts, and therefore such data sets are worth storing (or preserving) for future analyses.
  2. -Scientific workflows have been used to encode in-silico experiments. -The design of scientific workflows can be a difficult task. It requires a deep knowledge of the domain as well as awareness of the programs and services available for implementing the workflow steps. -In 2009, De Roure and coauthors pointed out the advantages of sharing and reusing workflows from scientific workflow repositories like myExperiment, CrowdLabs, Galaxy and others. -The problem is that the size of these repositories is continuously growing, and many problems relating to the reuse of available workflows have emerged; for example, it becomes difficult to distinguish a special use case from a usage pattern. -Using mining techniques is therefore a good solution. Let us discuss the most important contributions in mining workflows.
  3. Designing a workflow is a time-consuming and sometimes expensive task. Not only does the designer need to be knowledgeable of the tools (services) that are available to implement the steps in the process s/he desires, but she also needs to know how such services can (or should) be connected together in a network taking into consideration, amongst other criteria, the types and semantic domains of their input and output parameters. There has been a good body of work in the literature on workflow discovery, see e.g., [8,9,10]. The typical approach requires the user to specify a query in the form of a workflow that is compared against a collection of workflows. The workflows that are retrieved are then ranked taking into consideration, amongst other things, the structural similarity between the workflow issued by the user and the workflows retrieved. In our work, we focus on a problem that is different and that has received little attention in the literature. Specifically, we position ourselves in a context where a designer is specifying his/her workflow and needs assistance to design parts of her workflow, e.g., because she does not know the services that are available and that can be used to enact the steps within the part in question. In certain situations, the designer may know the services that can be used for such purpose, but would still like to acquire knowledge about the best way/practice for connecting such services together. The solution we describe in this paper has been elaborated with such designers in mind. It can be used to assist them in finding existing fragment(s) that can be used to complete the workflow being designed. Furthermore, we provide the user with suggestions on the way such fragments can be composed with the initial workflow. Figure 1 illustrates our approach.
  4. Given a repository of workflows, we would like to mine frequent workflow fragments, i.e., fragments that are used across multiple workflows. Such fragments are likely to implement tasks that can be reused in newly specified workflows. Mining workflows raises the following questions. – How to deal with the heterogeneity of the labels used by different users to model the activities of their workflows within the repository? Different designers use different labels to name the activities that compose their workflows. We need a means to homogenize the labels before mining the workflows so as not to miss relevant fragments (false negatives). – Which graph representation is best suited for formatting workflows for mining frequent fragments? We argue that the effectiveness and efficiency of the mining algorithm used to retrieve frequent fragments depend heavily on the chosen graph representation.
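As a rough illustration of the representation question, the sketch below (assuming the networkx library; all activity names and edge labels are illustrative) encodes a workflow as a labelled directed graph so that a frequent-subgraph miner such as SUBDUE can be run over the resulting graph database. The concrete encodings compared in this work (models A to D1) differ precisely in how control nodes and edge labels are represented, so this is only one possible encoding, not the paper's exact model.

```python
import networkx as nx

def workflow_to_graph(activities, edges):
    """activities: list of homogenized activity labels.
    edges: list of (source, target, connection_type) tuples, where the
    connection type (e.g. 'seq', 'split-and', 'join-xor') is illustrative."""
    g = nx.DiGraph()
    for label in activities:
        g.add_node(label, kind="activity")
    for src, dst, conn in edges:
        g.add_edge(src, dst, label=conn)   # edge label encodes the connection semantics
    return g

# A three-activity sequence, as in the eScience example later in the deck.
wf = workflow_to_graph(
    ["MapGeneToConcept", "IdentifySimilarConcepts", "GetConceptIds"],
    [("MapGeneToConcept", "IdentifySimilarConcepts", "seq"),
     ("IdentifySimilarConcepts", "GetConceptIds", "seq")],
)
```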
  5. To be able to extract frequent workflow fragments, we first need to identify common activities. Thus, activities implementing the same functionality should have the same names. Some workflow modelling tools (see for example Signavio) handle a dictionary and allow dictionary entries to be reused via a search or via the auto-completion function (when the user starts typing an activity label, the system suggests similar activity labels from the dictionary). If models in the repository come from different tools and use different naming conventions, a preprocessing step is applied to homogenize activity labels and to extract the dictionary [11]. To facilitate this step, we rely on existing techniques like [12]. These techniques are able to recognize the labelling styles and to automatically refactor labels with quality issues. These labels are transformed into a verb–object label by the derivation of actions and business objects from activity labels. Labels in the verb–object style contain an action that is followed by a business object, like Create invoice and Validate order. (The benefits of this style of labelling have been promoted by several practical studies and modelling guidelines.) Synonyms are also handled in this step. They are recognized using the WordNet lexical database and are replaced with a common term.
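A minimal sketch of the synonym-handling step, assuming the NLTK interface to WordNet (the WordNet corpus must have been downloaded beforehand). The rule used here for picking the common term, the lexicographically smallest lemma among a token's synonyms, is an illustrative assumption and not the refactoring technique of [12].

```python
# Requires: pip install nltk  and  nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def canonical(token):
    """Pick a deterministic representative among a token and its WordNet
    synonyms (here: the lexicographically smallest lemma)."""
    lemmas = {l.name().lower() for s in wn.synsets(token) for l in s.lemmas()}
    lemmas.add(token.lower())
    return min(lemmas)

def homogenize_label(label):
    """Homogenize a verb-object activity label such as 'Validate order'."""
    return " ".join(canonical(tok) for tok in label.split())
```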
  6. Filtering: our system extracts from this graph file (the user's workflow) a set of words, namely the set of words appearing in the labels of the activity nodes; note that a label may contain more than one word, concatenated with a separator. We extract the complete list of words and then submit it to the JAWS WordNet API, which returns the list of all synonyms for each word. We thus obtain a semantically enriched list. The search is then performed using this enriched list: if a workflow contains a word from this list, it is retained.
  7. The concept is simple. First, the user enters his/her workflow (sub-workflow) in an XML format; we transform it into a graph format and then extract the list of unique words mentioned in all the labels of the workflow. We establish a list of the keywords and their synonyms thanks to WordNet (the Java API for WordNet Searching, JAWS, is used to retrieve the synsets of a given label from WordNet). After that, we select from the repository only the BPs/workflows that match at least one word from the resulting list.
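A hypothetical sketch of this filter, with NLTK's WordNet interface standing in for the JAWS Java API mentioned in the notes: the query words are semantically enriched with their synonyms, and a workflow is retained if any of its label words appears in the enriched list.

```python
from nltk.corpus import wordnet as wn

def enrich(words):
    """Return the query words together with all their WordNet synonyms."""
    enriched = {w.lower() for w in words}
    for w in words:
        enriched |= {l.name().lower() for s in wn.synsets(w) for l in s.lemmas()}
    return enriched

def filter_repository(repository, query_words):
    """repository: dict mapping workflow id -> set of lower-cased label words.
    Keep only the workflows whose labels share a word with the enriched list."""
    keep = enrich(query_words)
    return [wid for wid, words in repository.items() if words & keep]
```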
  8. The challenges to be addressed are the following: – Which mining algorithm to employ for finding frequent patterns in the repository? – Which graph representation is best suited for formatting workflows for mining frequent fragments? – How to deal with the heterogeneity of the labels used by different users to model the activities of their workflows within the repository?
  9. We conducted two experiments. The first aims to validate our proposed representation models D/D1 and to show the drawbacks of the other models. The second experiment aims to validate the filter. We compare the efficiency and effectiveness of the models. On the effectiveness side, we focus on demonstrating the drawback of representation model C when it comes to extracting recurrent fragments that contain a XOR link. So, we manually created a synthetic dataset which ensures that the following sub-structure is the most recurrent. As the size of the synthetic dataset is limited (9 BPs), we extended it into three datasets by adding workflows from the Taverna 1 repository, while preserving the property that the most recurrent sub-workflow is the one already presented. We then compared the efficiency and effectiveness of the representation models. The second experiment assesses the impact of the semantic filter.
  10. Model A is the most expensive in terms of the disk space required to encode the base in graph format. Concerning model C, as expected it required more than twice the number of edges, nodes and bits required by the models that we propose, namely D and D1; however, this ratio decreases, reaching between a quarter and a tenth, with larger bases. This decrease is due to the content of these bases, which have a low percentage of BPs with XOR nodes. In third position comes model B: it requires between 25% and 40% more than models D and D1 in terms of the number of nodes, edges and bits used. Models D and D1 require the same number of edges and nodes to encode the input data; however, the labelling of edges consumes more bits to encode.
  11. We do not care about correctly classifying negative instances; we just do not want too many of them polluting our results. Model C: concerning these experiments, as expected, model C led to the worst qualitative performances. C achieves a recall rate that varies between 0% and 61.54%, with an average recall around 35%; model C can, at best, discover only one alternative at a time (in our case there are 2 alternatives attached to the XOR node). Model A: the top extracted substructures are more significant than those of model C, and less significant than those of the other models. However, when it comes to larger databases, the results show a dramatic decline in the quality of its sub-structures, reaching 0% in terms of precision and recall, which means there is no extracted substructure related to the user expectation. This limitation can be explained by the excessive use of control nodes: on large input data, their percentage becomes quite significant, leading the Subdue algorithm to consider them as important sub-structures. Model B: model B performs much better than the previous two models, A and C. In fact, model B successfully retrieved almost 67% of the BP elements of the target sub-structure, more than twice as many as model C and between 13% and 66% more than model A. Comparing model B to model D: models B and D led to very similar accuracy performances. Although model B was able to discover more relevant BP elements than model D (about 10% more), it returned more useless or irrelevant BP elements (around 7% more). Labelling the edges leads to specializations of the same abstract workflow template and consequently affects the quality of the results returned (it decreases recall). Model D: we can notice a common behaviour of models D and D1, which distinguishes them from the other models. Both of them led to a good precision rate. This performance is due to the fact that these two models do not use control nodes and thereby avoid a negative influence on the results. (On large input data, the percentage of control nodes becomes quite significant, leading the Subdue algorithm to consider typical sub-structures of the model's coding scheme as significant, which decreases precision.) The results of the first experiment show clearly that model D1 records the best performances on all levels without exception. Accuracy = (TP + TN) / (TP + TN + FP + FN).
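For reference, the precision, recall and accuracy figures discussed above follow the standard definitions, with TP/FP denoting true/false positives and TN/FN true/false negatives:

$$ Precision = \frac{TP}{TP+FP}, \qquad Recall = \frac{TP}{TP+FN}, \qquad Accuracy = \frac{TP+TN}{TP+TN+FP+FN} $$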
  12. Same analysis as note 11: model C performs worst, model A degrades sharply on larger bases, model B comes close to D, and model D1 records the best performances on all levels.
  13. Model A is the most expensive in terms of execution time, taking around 25 to 55 times longer than models D and D1. Comparing the other models: although on the qualitative level model B performs better than model C, model C seems to be far less expensive. As expected, models D and D1 led to very similar performances, with model D1 performing slightly better.
  14. Given an initial workflow, the user issues a query to characterize the desired missing workflow fragment. The query is composed of two elements: (i) a set of keywords $\{kw_1,\dots,kw_n\}$ characterizing the functionality that the fragment should implement. The user selects the terms of his/her own choice; in other words, we do not impose any vocabulary for keyword specification. (ii) The second element in the user query specifies the activities in the initial workflow that are to be connected to the fragment in question, $A_{common}=\{a_1,\dots,a_m\}$. Generally speaking, the user will specify one or two activities in the initial workflow. We call such activities common activities or joint activities, interchangeably.
  15. The first step in processing the user query consists in identifying the fragments in the repository that are relevant given the specified keywords. In doing so, we adopt the widely used TF/IDF (term frequency / inverse document frequency) measure. It is worth mentioning here that the workflow fragments in the repository are indexed by the labels of the activities obtained after the homogenization of the workflows (see Section \ref{sec:homogonization}). We refer to such an index by $IDX$. Applying TF/IDF directly to the set of keywords provided by the user is likely to miss relevant fragments (false negatives). This is because the designer provides keywords of his/her own choosing; a given keyword may not be present in the repository, while one of its synonyms could be.
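A minimal sketch, assuming scikit-learn, of how the index $IDX$ and the TF/IDF relevance scores could be computed; the fragment identifiers and the label "documents" below are illustrative, and the query terms are assumed to have already been mapped to index terms by the enrichment step described next.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Each fragment is a "document" made of its homogenized activity labels.
fragments = {
    "frg_1": "map gene to concept identify similar concepts get concept ids",
    "frg_2": "get concept ids score concepts rank profile",
}

vectorizer = TfidfVectorizer()
idx = vectorizer.fit_transform(fragments.values())   # the index IDX

def relevance(query_terms):
    """Cosine similarity between the (enriched) query and each fragment."""
    q = vectorizer.transform([" ".join(query_terms)])
    scores = cosine_similarity(q, idx)[0]
    return dict(zip(fragments.keys(), scores))

print(relevance(["concept", "score", "rank", "profile"]))
```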
  16. To address this issue, we adopt the following method which augments traditional TF/IDF with a semantic enrichment phase. Specifically, given a set of keywords provided by the user $\{kw_1,\dots,kw_n\}$, for each keyword $kw_i$ we retrieve its synonyms, which we denote by $syn(kw_i)$, from an existing thesaurus, e.g., Wordnet \cite{18} or a specialized domain-specific thesaurus if we are dealing with workflows from a specific domain. The index $IDX$ is trawled to identify whether a term in $syn(kw_i) \cup \{kw_i\}$ appears in it. If this is the case, then the term in the index is used to represent $kw_i$ in the vector that will be used to compare the query with the vectors representing the workflow fragments in the repository. The set of fragments retrieved in the previous step, which we call candidate fragments, are then examined. Specifically, their constituent activities are compared and matched against the activities in $A_{common}$ that are specified by the user. Given a candidate fragment $wf_{frg}$, each activity $a_j$ in $A_{common}$ is compared (matched) against the activities that constitute the fragment $wf_{frg}$. The objective of this step is to identify the activities that will be in common between the fragment and the initial workflow. Note that for a given activity $a_j$ in $A_{common}$ there may be more than one matching activity in the fragment $wf_{frg}$. In this case, we associate $a_j$ with the activity in $wf_{frg}$ with the highest matching score. Reciprocally, if two activities $a_i$, $a_j$ in $A_{common}$ are associated with the same activity in $wf_{frg}$, the one with the highest score is kept and another match is searched for the second one (such that the sum of the similarities of the global mapping is maximised). Note also that it is possible that $a_j$ may not have any matching activities among those of the fragment $wf_{frg}$. The matching is performed using existing techniques for matching activity labels.
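The following sketch illustrates the activity-matching step with a simple greedy one-to-one assignment, which only approximates the global-maximisation rule described above. The difflib ratio is a stand-in for the label-matching techniques referred to in the note, and the activity names are made up.

```python
from difflib import SequenceMatcher

def sim(a, b):
    """Illustrative string similarity between two activity labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_common_activities(a_common, fragment_activities):
    """Greedily pair each activity in A_common with its best-scoring,
    not-yet-used activity in the candidate fragment."""
    pairs = sorted(
        ((sim(a, f), a, f) for a in a_common for f in fragment_activities),
        reverse=True,
    )
    mapping, used = {}, set()
    for score, a, f in pairs:
        if a not in mapping and f not in used:
            mapping[a] = (f, score)
            used.add(f)
    return mapping   # activities with no counterpart simply stay unmatched

print(match_common_activities(
    ["GetConceptIds"], ["getConceptIDs", "ScoreConcepts", "RankProfiles"]))
```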
  17. The last step in the query processing consists in ranking the candidate fragments. The ranking takes into consideration the following factors. \begin{enumerate} \item The relevance score of the candidate fragment, calculated based on TF/IDF given a user query $uq$. We use $Relevance(wf_{frg},uq)$ to denote the relevance score associated with a candidate fragment $wf_{frg}$ given the user query. \item The frequency of use of the fragment in the repository. Fragments that are used across multiple workflows are likely to implement best practices that are useful for the workflow designer, compared with workflow fragments that are used in, say, only 2 workflows. We use $Frequency(wf_{frg})$ to denote the frequency, i.e., the number of times a candidate fragment $wf_{frg}$ appears in the mined workflow repository. \item The compatibility of the candidate fragment with the initial workflow. To estimate the compatibility, we consider the number of activities in $A_{common}$ that have a matching activity in the workflow fragment and their associated matching scores. Specifically, we define the compatibility of a workflow fragment given the activities $uq.A_{common}$ specified in the user query $uq$ as follows: $$ Compatibility(wf_{frg},uq.A_{common}) = \frac{\sum_{a_j \in A_{common}} matchingScore(a_j,wf_{frg})}{|A_{common}|} $$ \noindent where $matchingScore(a_j,wf_{frg})$ is the matching score between $a_j$ and the best matching activity in $wf_{frg}$, and $|A_{common}|$ is the number of activities in $A_{common}$. $Compatibility(wf_{frg},A_{common})$ takes a value between $0$ and $1$. The larger the number of activities in $A_{common}$ that have a matching activity in $wf_{frg}$ and the higher the matching scores, the higher the compatibility between the candidate workflow fragment and the initial workflow. \end{enumerate} Based on the above factors, we define the score used for ranking candidate fragments given a user query $uq$ as: \noindent \begin{small} $Score(wf_{frg},uq) \; = \; w_r \cdot Relevance(wf_{frg},uq) \; + \; w_f \cdot \frac{Frequency(wf_{frg})}{MaxFrequency} \; + \; w_c \cdot Compatibility(wf_{frg},uq.A_{common})$ \end{small} \noindent where $w_r$, $w_f$ and $w_c$ are positive real numbers representing weights such that $w_r + w_f + w_c = 1$, and $MaxFrequency$ is a positive number denoting the frequency of the fragment that appears the largest number of times in the harvested workflow repository. Notice that the score takes a value between $0$ and $1$. Once the candidate fragments are scored, they are ranked in descending order of their associated scores. The top-k fragments, e.g., 5, are then presented to the designer, who selects the one to be composed with the initial workflow.
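A small sketch of the ranking score defined above. The weight values and the candidate figures are purely illustrative; the relevance, frequency and compatibility inputs are assumed to have been computed as described (TF/IDF, mining frequency, activity-matching scores).

```python
def score(relevance, frequency, max_frequency, compatibility,
          w_r=0.4, w_f=0.2, w_c=0.4):
    """Score(wf_frg, uq) = w_r*Relevance + w_f*Frequency/MaxFrequency + w_c*Compatibility."""
    assert abs(w_r + w_f + w_c - 1.0) < 1e-9   # weights must sum to 1
    return w_r * relevance + w_f * frequency / max_frequency + w_c * compatibility

# Hypothetical candidates: fragment id -> (relevance, frequency, compatibility).
candidates = {"frg_1": (0.8, 12, 0.9), "frg_2": (0.6, 30, 0.5)}
max_freq = 30

# Rank in descending order of score and keep the top-k (here k = 5).
ranked = sorted(candidates.items(),
                key=lambda kv: score(kv[1][0], kv[1][1], max_freq, kv[1][2]),
                reverse=True)
top_k = ranked[:5]
```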
  18. In this section, we illustrate the use of the method we have proposed in this paper to assist a designer of a workflow from the eScience field. Specifically, we show how the method described can help the designer specify a workflow that is used for analyzing Huntington's disease (HD) data. Huntington's disease is the most common inherited neurodegenerative disorder in Europe, affecting 1 out of 10000 people. Although the genetic mutation that causes HD was identified 20 years ago \cite{_novel_1993}, the downstream molecular mechanisms leading to the HD phenotype are still poorly understood. Relating alterations in gene expression to epigenetic information might shed light on the disease aetiology.
  19. Same context as note 18: the Huntington's disease use case illustrating how the method assists an eScience workflow designer.