Full paper: http://boole.diiga.univpm.it/paper/cts11.pdf
Knowledge Discovery in Databases (KDD), like any scientific experiment in e-Science, is a complex and
computationally intensive process aimed at gaining knowledge from a huge set of data. Often performed in
distributed settings, a KDD project usually involves deep interaction between tools and several users with specific expertise, who can form either a co-located group or a geographically distributed virtual team of experts. Given the complexity of the process, such users need support to achieve their goal of knowledge extraction. This paper introduces KDDesigner, a web-based, semantic-driven tool aimed at supporting users in the collaborative design of a KDD process. In this paper we address semantic issues related to collaborative project management: tool localization, tool integration, interface matchmaking, process execution, team building and process versioning.
A Semantic-Aided Designer for Knowledge Discovery
1. Università Politecnica delle Marche
Department of Computer Science, Management and Automation
Ancona, Italy
Claudia Diamantini, Domenico Potena, Emanuele Storti
e.storti@univpm.it
CTS2011, Philadelphia, May 23-27
2. Introduction
Data explosion & KDD
Organizations need methods and technologies to analyze huge amounts of data in support of decision-making processes
Knowledge Discovery in Databases (KDD) is the process of identifying valid, novel, potentially useful patterns in data:
● many steps and iterations
● interaction
● user knowledge
CTS2011, May 23-27 A Semantic-Aided Designer for KD
3. Introduction
1st generation of IDAs (Intelligent Data Analysis systems):
● local frameworks
● single-user
● predefined set of tools (little extensibility)
2nd generation: distribution of tools & computational aspects
Evolution of organizations: distribution of users, collaboration among
● domain experts
● DB / DWH administrators
● DM specialists
● KDD specialists
How to support the design of a KDD project in an open, distributed and collaborative scenario?
4. Issues
Heterogeneity & tool distribution
Many KDD and Data Mining tools are available for any domain/task, with many possible combinations
Heterogeneous interfaces: programming languages, OSs, transfer protocols, ...
Complex to use: process design, data preparation, precondition satisfaction, I/O interpretation
tools should be easily and dynamically added to the platform
they should be accessible, searchable and executable via a standard API
suggestions about the best tool sequences
support for tool setup and process execution
5. Issues
User distribution
Distributed organizations:
● multiple-branch enterprises
● e-Science projects
Collaboration:
● source of complexity
● distributed computation: several users can succeed where a single user is likely to fail
collaborative design of KDD processes
tool/process sharing and annotation
easy joining of new partners in virtual teams
6. Methodology
Service Oriented Architecture
Basic Services: services for any KDD task. Every KDD tool is wrapped as a Web Service, deployed on the publisher's server, and published in a common repository (e.g. C4.5 tool → C4.5 service).
Support Services:
Back-end services:
● access control
● data transfer
● service publishing
● UDDI registry
High-level functionalities:
● service discovery
● interface matchmaking
● process composition
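The publish-then-look-up cycle of Basic Services can be sketched with an in-memory registry. This is a minimal sketch, assuming hypothetical ServiceDescriptor and ServiceRegistry classes and a made-up endpoint URL; it stands in for the UDDI registry and is not a UDDI client or the platform's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceDescriptor:
    """Hypothetical metadata record for a KDD tool wrapped as a Web Service."""
    name: str        # service name
    tool: str        # the wrapped KDD tool
    endpoint: str    # address on the publisher's server (hypothetical URL)
    publisher: str   # organization that deployed the service


@dataclass
class ServiceRegistry:
    """In-memory stand-in for the common repository; not a real UDDI client."""
    services: dict = field(default_factory=dict)

    def publish(self, desc: ServiceDescriptor) -> None:
        # Only metadata is stored; the service keeps running on the publisher's server.
        if desc.name in self.services:
            raise ValueError(f"service {desc.name!r} is already published")
        self.services[desc.name] = desc

    def find_by_tool(self, tool: str) -> list:
        # Every published service that wraps the given tool.
        return [d for d in self.services.values() if d.tool == tool]


registry = ServiceRegistry()
registry.publish(ServiceDescriptor("C4.5_service", "C4.5",
                                   "http://publisher.example.org/c45", "PublisherA"))
print([d.name for d in registry.find_by_tool("C4.5")])  # ['C4.5_service']
```

Keeping only metadata in the repository mirrors the slide's split: execution stays with the publisher, while discovery needs just one shared index.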
7. Methodology
Semantic descriptors for Basic Services
Information is separated into 3 abstraction layers:
● Tools/services are annotated through XML descriptors, which give details about interfaces and QoS
● Algorithms are formally described in a KDD ontology, which contains an algorithm taxonomy and high-level information about their tasks, methods and functionalities
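To make the descriptor layer concrete, here is a sketch that reads interface and QoS details from an XML descriptor with Python's standard ElementTree. The element and attribute names (service, input, output, qos, algorithm, responseTimeMs, availability) and the ARFF/PMML formats are assumptions for illustration; the actual descriptor schema is described in the full paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor layout; element and attribute names are assumptions.
DESCRIPTOR = """
<service name="C4.5_v.2.0" algorithm="C4.5">
  <input name="train" datatype="string" format="ARFF"/>
  <output name="model" datatype="string" format="PMML"/>
  <qos responseTimeMs="1200" availability="0.99"/>
</service>
"""


def parse_descriptor(xml_text: str) -> dict:
    """Extract interface and QoS details from a service descriptor."""
    root = ET.fromstring(xml_text.strip())

    def ports(tag):
        # (name, primitive datatype, format) for each input or output port
        return [(p.get("name"), p.get("datatype"), p.get("format"))
                for p in root.findall(tag)]

    qos = root.find("qos")
    return {
        "name": root.get("name"),
        "algorithm": root.get("algorithm"),  # link to the ontology layer
        "inputs": ports("input"),
        "outputs": ports("output"),
        "qos": {k: float(v) for k, v in qos.attrib.items()} if qos is not None else {},
    }


desc = parse_descriptor(DESCRIPTOR)
print(desc["inputs"], desc["qos"])
```

The algorithm attribute is what ties a concrete service back to an ontology concept, so syntactic details (datatype, format) and semantic ones can be checked separately.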
8. Methodology
● KDD algorithms (e.g. ID3)
● KDD tools
● KDD services
Benefits: loose-coupling, reusability
Support services rely on such layers:
service discovery
interface matchmaking
process composition
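The three layers and the loose coupling they give can be sketched as linked records: a service wraps a tool, and a tool implements an algorithm. This is a minimal sketch; the class names and the tool and service names (apart from ID3, C4.5 and C4.5_v.2.0, which appear on the slides) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Algorithm:
    """Ontology layer: an abstract algorithm and the task it solves."""
    name: str
    task: str


@dataclass(frozen=True)
class Tool:
    """Tool layer: a concrete implementation of an algorithm."""
    name: str
    implements: Algorithm


@dataclass(frozen=True)
class Service:
    """Service layer: a deployed Web Service wrapping a tool."""
    name: str
    wraps: Tool


def services_for(algorithm, services):
    """Names of the services that (through their tool) realize `algorithm`."""
    return [s.name for s in services if s.wraps.implements == algorithm]


id3 = Algorithm("ID3", "classification")
c45 = Algorithm("C4.5", "classification")
id3_tool = Tool("id3-impl", id3)      # hypothetical tool names
c45_tool = Tool("c45-impl", c45)
services = [
    Service("ID3_service", id3_tool),
    Service("C4.5_v.2.0", c45_tool),
    Service("ID3_mirror", id3_tool),   # a second deployment of the same tool
]
print(services_for(id3, services))  # ['ID3_service', 'ID3_mirror']
```

A process designed at the algorithm level ("use ID3") survives the loss of one deployment, since any other service realizing the same algorithm can be substituted.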
9. Methodology
[Figure: the KDD ontology layer describes the algorithms "Remove missing values" and "C4.5" and the concept "Labeled Dataset"; the KDD services layer holds the services "abc" and "C4.5_v.2.0"]
10. KDDesigner
A web-based tool aimed at supporting users in collaborative
KDD process design
11. Service discovery
Retrieval of KDD services satisfying user requirements
12. Service discovery
Retrieval of KDD services satisfying user requirements
[Screenshot: service discovery in KDDesigner, numbered steps 1 to 4, and the KDDONTO ontology]
13. Service discovery
Retrieval of KDD services satisfying user requirements
[Screenshot: service discovery in KDDesigner, numbered steps 1 to 4]
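Ontology-based discovery can be sketched as a taxonomy query: a service qualifies if the algorithm it implements is the requested concept or one of its subconcepts. This is a minimal sketch, assuming a hypothetical flattened taxonomy (TAXONOMY) in which tasks and algorithms share a single is-a hierarchy, and a hypothetical SERVICES table; it is not KDDONTO's actual structure or query interface.

```python
# Hypothetical fragment of an is-a taxonomy: concept -> parent concept.
TAXONOMY = {
    "DecisionTree": "Classification",
    "C4.5": "DecisionTree",
    "ID3": "DecisionTree",
    "SVM": "Classification",
    "KMeans": "Clustering",
}

# Hypothetical published services: service name -> algorithm it implements.
SERVICES = {"C4.5_v.2.0": "C4.5", "ID3_ws": "ID3", "svm_ws": "SVM", "km_ws": "KMeans"}


def is_subconcept(concept, ancestor):
    """True if `concept` equals `ancestor` or lies below it in the taxonomy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = TAXONOMY.get(concept)  # climb one is-a step (None at the root)
    return False


def discover(goal):
    """Sorted names of services whose algorithm falls under `goal`."""
    return sorted(s for s, alg in SERVICES.items() if is_subconcept(alg, goal))


print(discover("Classification"))  # ['C4.5_v.2.0', 'ID3_ws', 'svm_ws']
print(discover("DecisionTree"))    # ['C4.5_v.2.0', 'ID3_ws']
```

Querying by a general concept such as "Classification" returns services the user did not name explicitly, which is the benefit of annotating services against a taxonomy rather than matching names.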
15. Interface matchmaking
Verification of data compatibility in an I/O connection
16. Interface matchmaking
Matchmaker service checks the validity of the match:
● syntactic compatibility: comparison between service descriptors (I/O primitive datatype and syntax). Same format? Same primitive datatype?
● Output: cost of match
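The syntactic check can be sketched as a function that compares an output port with an input port and returns a cost of match. The cost values (0 for an exact match, 1 when only the format differs, infinity when the primitive datatypes differ) are assumptions for illustration, not the Matchmaker's actual cost model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """One I/O port as read from a service descriptor (hypothetical fields)."""
    datatype: str   # primitive datatype, e.g. "string", "double"
    format: str     # syntax, e.g. "ARFF", "CSV"


def syntactic_cost(out: Port, inp: Port) -> float:
    """Cost of connecting `out` to `inp`; math.inf means the match is invalid."""
    if out.datatype != inp.datatype:
        return math.inf   # different primitive datatypes: no valid connection
    if out.format != inp.format:
        return 1.0        # same datatype, different syntax: conversion needed (assumed cost)
    return 0.0            # same datatype and same format: exact syntactic match


print(syntactic_cost(Port("string", "ARFF"), Port("string", "CSV")))  # 1.0
```

Returning a number rather than a boolean lets the composer rank alternative connections later instead of only accepting or rejecting them.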
17. Interface matchmaking
Matchmaker service checks the validity of the match:
● syntactic compatibility: comparison between service descriptors (I/O primitive datatype and syntax)
● semantic compatibility: comparison between ontological annotations of the services (kind of match between I/O, preconditions/postconditions, and more). Given concepts x and y in the KDD ontology: same concept? subconcept? part-of concept?
● Output: cost of match
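The semantic check can be sketched in the same style, pricing the three kinds of match listed above. This is a minimal sketch, assuming hypothetical IS_A and PART_OF fragments and made-up costs: an exact concept costs 0, a more specific output costs one unit per is-a step, and an input that is part of the produced output costs 1.5; anything else is invalid.

```python
import math

# Hypothetical ontology fragments (not taken from the actual KDD ontology).
IS_A = {"LabeledDataset": "Dataset", "DiscreteLabeledDataset": "LabeledDataset"}
PART_OF = {"ClassLabel": "LabeledDataset"}   # part -> whole


def is_a_distance(x, y):
    """Number of is-a steps from x up to y, or None if y is not x or an ancestor of x."""
    steps = 0
    while x is not None:
        if x == y:
            return steps
        x = IS_A.get(x)
        steps += 1
    return None


def semantic_cost(out_concept, in_concept):
    """Cost of feeding `out_concept` into an input annotated with `in_concept`."""
    d = is_a_distance(out_concept, in_concept)
    if d is not None:
        return float(d)   # 0: same concept; d > 0: output is a subconcept of the input
    if PART_OF.get(in_concept) == out_concept:
        return 1.5        # the required input is part of the produced output (assumed cost)
    return math.inf       # no semantic relation: invalid match


print(semantic_cost("DiscreteLabeledDataset", "Dataset"))  # 2.0
```

The direction matters: a more specific output can feed a more general input, but not the reverse, which is why semantic_cost("Dataset", "LabeledDataset") is infinite.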
18. Semi-automatic composition
KDDComposer: advanced service for composition
Input:
● user dataset
● a set of requirements (max number of algorithms, computational complexity, max cost of match)
● user goal (classification, regression, ...)
Output:
A ranked list of abstract processes (suggestions about processes useful to solve the user's problem)
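The composition step can be sketched as a bounded search that chains algorithms from the dataset concept to the user goal, pruning by the maximum number of algorithms and the maximum cost of match, then ranking the results. This is a hypothetical sketch, not KDDComposer's algorithm: the ALGOS catalogue, the IS_A link and the 0/1 match costs are assumptions, and the computational-complexity requirement is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Algo:
    name: str
    needs: str    # input concept
    yields: str   # output concept
    task: str     # task the algorithm achieves


# Hypothetical algorithm catalogue.
ALGOS = [
    Algo("RemoveMissingValues", "Dataset", "CleanDataset", "preprocessing"),
    Algo("Discretize", "CleanDataset", "DiscreteDataset", "preprocessing"),
    Algo("C4.5", "CleanDataset", "DecisionTree", "classification"),
    Algo("ID3", "DiscreteDataset", "DecisionTree", "classification"),
]
IS_A = {"DiscreteDataset": "CleanDataset"}  # hypothetical subconcept -> concept link


def match_cost(produced, required):
    """0 for the same concept, 1 for a direct subconcept, None if no match."""
    if produced == required:
        return 0
    if IS_A.get(produced) == required:
        return 1
    return None


def compose(data_concept, goal, max_algorithms, max_cost):
    """Return (total cost, process) pairs reaching `goal`, best first."""
    results = []
    stack = [((), data_concept, 0)]  # (algorithms so far, concept available now, cost so far)
    while stack:
        process, concept, cost = stack.pop()
        for algo in ALGOS:
            if algo.name in process:
                continue  # do not reuse an algorithm inside one process
            step = match_cost(concept, algo.needs)
            if step is None or cost + step > max_cost:
                continue  # invalid link, or over the user's cost budget
            extended = process + (algo.name,)
            if algo.task == goal:
                results.append((cost + step, extended))
            elif len(extended) < max_algorithms:
                stack.append((extended, algo.yields, cost + step))
    # Rank: cheapest matches first, then shorter processes.
    results.sort(key=lambda r: (r[0], len(r[1])))
    return results


for total, proc in compose("Dataset", "classification", max_algorithms=3, max_cost=2):
    print(total, " -> ".join(proc))
```

With these assumptions the ranking puts RemoveMissingValues -> C4.5 first (cost 0, two steps), then RemoveMissingValues -> Discretize -> ID3 (cost 0, three steps), then RemoveMissingValues -> Discretize -> C4.5 (cost 1, because a discretized dataset only matches C4.5's input as a subconcept).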
19. Collaboration
● collaborative process editing/annotation (wiki-style)
● versioning system
● team management and addition of new users
● manual parameter setting
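A wiki-style versioning system for processes can be sketched as an append-only history in which every edit, including a revert, adds a new version. The ProcessHistory and Version classes and their fields are hypothetical; they illustrate the idea rather than KDDesigner's implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    number: int
    author: str
    steps: tuple   # the process as an ordered tuple of service names
    note: str      # wiki-style annotation attached to the edit


@dataclass
class ProcessHistory:
    """Append-only version history of one KDD process (hypothetical)."""
    versions: list = field(default_factory=list)

    def commit(self, author, steps, note=""):
        # Each edit becomes a new numbered version; nothing is overwritten.
        v = Version(len(self.versions) + 1, author, tuple(steps), note)
        self.versions.append(v)
        return v.number

    def head(self):
        # Latest version of the process.
        return self.versions[-1]

    def revert(self, number, author):
        # Reverting re-commits an old version, so the history stays intact.
        old = self.versions[number - 1]
        return self.commit(author, old.steps, f"revert to v{number}")


history = ProcessHistory()
history.commit("alice", ["RemoveMissingValues", "C4.5"], "first draft")
history.commit("bob", ["C4.5"], "drop preprocessing")
history.revert(1, "carol")
print(history.head())
```

Making reverts ordinary commits keeps every team member's contribution visible, which suits a virtual team where partners join and leave.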
20. Conclusion
SOA for KDD
● Basic Services and Support Services
● KDDesigner: a semantic-aided designer for KDD
Open environment and heterogeneous tools
● different interfaces: need for a common representation (service)
● abstraction for a high-level description of tools (algorithm)
● semantics for interoperability and high-level functionalities
Future work
● extension with new support services
● process export in more workflow languages
● more collaborative features (real-time editor)