Università Politecnica delle Marche
                Department of Computer Science, Management and Automation
                Ancona, Italy




         A Semantic-Aided Designer
          for Knowledge Discovery

        Claudia Diamantini, Domenico Potena, Emanuele Storti

                                e.storti@univpm.it



CTS2011, Philadelphia, May 23-27
Introduction

  Data explosion & KDD
  Organizations need methods and technologies to analyze
  huge amounts of data in support of decision-making processes

                        Knowledge Discovery
                        in Databases (KDD) is the
                        process of identifying valid, novel,
                        potentially useful patterns in data


                         many steps, iterations
                         interaction
                         user knowledge

CTS2011, May 23-27                A Semantic-Aided Designer for KD
Introduction
  1st generation of IDAs (Intelligent Data Analysis systems):
  ● local frameworks

  ● single-user

  ● predefined set of tools (little extensibility)



  2nd generation: distribution of tools & computational aspects

  Evolution of organizations: distribution of users, collaboration


  Actors: domain expert, DB / DWH administrator, DM specialists, KDD specialists

  How to support the design of a KDD project in an open, distributed
  and collaborative scenario?

Issues

  Heterogeneity & tool distribution
  Many KDD and Data Mining tools available for any domain/task,
  many possible combinations

   Heterogeneous interfaces
    programming languages, OSs,
    transfer protocols, ...

   Complex to use
    process design, data preparation,
    precondition satisfaction, I/O interpretation

   tools should be easily and dynamically added to the platform
   they should be accessible, searchable and executable via a standard API
   suggestions about the best tool sequences
   support for tool setup and process execution

Issues

  User distribution

  Distributed organizations:
  ● multi-branch enterprises
  ● e-Science projects

  Collaboration:
  ● source of complexity
  ● distributed computation: several users can succeed where a single user is likely to fail

    collaborative design of KDD processes
    tool/process sharing and annotation
    new partners can easily join Virtual Teams

Methodology

  Service Oriented Architecture

  Basic Services
  Services for any KDD task: every KDD tool is wrapped as a Web Service,
  deployed on the publisher's server, and published in a common repository
  Example: C4.5 tool wrapped as C4.5 service

  Support Services
  Back-end services:
  ● access control
  ● data transfer
  ● service publishing
  ● UDDI registry
  High-level functionalities:
  ● service discovery
  ● interface matchmaking
  ● process composition

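The wrap-and-publish idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the in-memory registry (a stand-in for the common repository, not a real UDDI client), the endpoint URL and the toy "tool" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class KDDService:
    """A KDD tool wrapped behind a uniform service interface (hypothetical)."""
    name: str                       # e.g. "C4.5_v.2.0"
    algorithm: str                  # algorithm the tool implements, e.g. "C4.5"
    endpoint: str                   # where the publisher deployed it (made up here)
    invoke: Callable[[dict], dict]  # uniform call: request dict in, response dict out


class ServiceRegistry:
    """In-memory stand-in for the common repository (not a real UDDI client)."""

    def __init__(self) -> None:
        self._services: Dict[str, KDDService] = {}

    def publish(self, service: KDDService) -> None:
        self._services[service.name] = service

    def find_by_algorithm(self, algorithm: str) -> List[KDDService]:
        return [s for s in self._services.values() if s.algorithm == algorithm]


def majority_class_tool(labels):
    """Toy local tool standing in for a real classifier such as C4.5."""
    return max(set(labels), key=labels.count)


def wrap_tool(name, algorithm, endpoint, tool):
    """Wrap a plain function so that callers only see the dict-based interface."""
    return KDDService(name, algorithm, endpoint,
                      invoke=lambda req: {"result": tool(req["labels"])})


registry = ServiceRegistry()
registry.publish(wrap_tool("C4.5_v.2.0", "C4.5",
                           "http://example.org/services/c45", majority_class_tool))
found = registry.find_by_algorithm("C4.5")
print(found[0].name, found[0].invoke({"labels": ["yes", "no", "yes"]}))
```

Callers depend only on the registry and on the uniform `invoke` signature, which is what keeps tools loosely coupled to the platform in this sketch.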
Methodology

  Semantic descriptors for Basic Services
                     Separation of information in 3 abstraction layers

                     Tools/services are annotated through XML descriptors:
                     details about interfaces and QoS

                     Algorithms are formally described in a KDD ontology,
                     which contains an algorithm taxonomy and high-level
                     information about their tasks, methods and functionalities

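A small Python sketch of how a service descriptor and the ontology layer could fit together. The slides do not show the descriptor schema, so every element and attribute name below, the QoS values and the tiny dictionary used as an "ontology" are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor: element and attribute names are illustrative only.
DESCRIPTOR = """
<service name="C4.5_v.2.0" tool="C4.5 tool" algorithm="C4.5">
  <input name="training_set" datatype="string" format="arff"/>
  <output name="model" datatype="string" format="pmml"/>
  <qos responseTimeMs="1200" availability="0.99"/>
</service>
""".strip()

# Ontology layer reduced to a dict: algorithm -> high-level information.
KDD_ONTOLOGY = {"C4.5": {"task": "classification", "method": "decision tree"}}


def parse_descriptor(xml_text):
    """Extract interface and QoS details from the service/tool layer."""
    root = ET.fromstring(xml_text)
    return {
        "service": root.get("name"),
        "tool": root.get("tool"),
        "algorithm": root.get("algorithm"),
        "inputs": [dict(e.attrib) for e in root.findall("input")],
        "outputs": [dict(e.attrib) for e in root.findall("output")],
        "qos": dict(root.find("qos").attrib),
    }


descriptor = parse_descriptor(DESCRIPTOR)
# The descriptor only names the algorithm; its task comes from the ontology layer.
task = KDD_ONTOLOGY[descriptor["algorithm"]]["task"]
print(descriptor["service"], "solves", task)
```

Keeping interface details in the descriptor and task-level knowledge in the ontology means a new version of a tool only needs a new descriptor.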
Methodology
  Three layers: KDD algorithms (e.g. ID3), KDD tools, KDD services

  Benefits: loose coupling, reusability
  Support services rely on these layers:
        service discovery
        interface matchmaking
        process composition
Methodology

  [Figure: KDD ontology layer with the "Remove missing values" algorithm, the
  "Labeled Dataset" concept and the C4.5 algorithm; KDD services layer with
  the service C4.5_v.2.0]

KDDesigner
  A web-based tool aimed at supporting users in collaborative
  KDD process design




Service discovery
  Retrieval of KDD services satisfying user requirements

  [Figure: four numbered steps of the service discovery workflow; step 3 is labeled KDDONTO]

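One way to picture discovery over the layers is a taxonomy lookup followed by a filter over published services. The Python sketch below is an assumption-laden illustration, not the KDDesigner algorithm: the taxonomy fragment, the service list and the function names are invented.

```python
# Hypothetical fragment of an algorithm taxonomy: child -> parent.
TAXONOMY = {
    "ClassificationAlgorithm": "KDDAlgorithm",
    "DecisionTree": "ClassificationAlgorithm",
    "C4.5": "DecisionTree",
    "ID3": "DecisionTree",
    "ClusteringAlgorithm": "KDDAlgorithm",
    "KMeans": "ClusteringAlgorithm",
}

# Hypothetical published services, each annotated with the algorithm it implements.
SERVICES = [
    {"name": "C4.5_v.2.0", "algorithm": "C4.5"},
    {"name": "ID3_service", "algorithm": "ID3"},
    {"name": "KMeans_service", "algorithm": "KMeans"},
]


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or descends from it in the taxonomy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = TAXONOMY.get(concept)
    return False


def discover(required_class):
    """Return the names of services whose algorithm falls under `required_class`."""
    return [s["name"] for s in SERVICES if is_a(s["algorithm"], required_class)]


print(discover("DecisionTree"))
print(discover("ClusteringAlgorithm"))
```

Because the query is phrased against the taxonomy, a user can ask for any decision tree service without knowing which concrete tools have been published.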
Process design




Interface matchmaking
  Verification of data compatibility in an I/O connection

 The Matchmaker service checks the validity of the match
 ●
     syntactic compatibility
     comparison between service descriptors
     (I/O primitive datatype and syntax):
     same format? same primitive datatype?
 ●
     semantic compatibility
     comparison between ontological annotations of the services
     (kind of match between I/O, preconditions/postconditions, and more):
     same concept? subconcept? part-of concept?
 ●
     Output: cost of match

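The two checks can be combined into a single cost function, sketched below in Python. The slides give no cost values, so the numbers (0.0 for the same concept, 0.5 for a subconcept, 1.0 for a part-of relation), the concept names and the port dictionaries are all assumptions.

```python
# Hypothetical ontology fragments: child -> parent and part -> whole.
IS_A = {"LabeledDataset": "Dataset", "UnlabeledDataset": "Dataset"}
PART_OF = {"ClassLabelColumn": "LabeledDataset"}


def ancestors(concept, relation):
    """Follow `relation` upwards and collect every concept reached."""
    reached = []
    while concept in relation:
        concept = relation[concept]
        reached.append(concept)
    return reached


def syntactic_ok(out_port, in_port):
    """Same format and same primitive datatype, as stated in the descriptors."""
    return (out_port["format"] == in_port["format"]
            and out_port["datatype"] == in_port["datatype"])


def semantic_cost(out_concept, in_concept):
    """Assumed costs: same concept 0.0, subconcept 0.5, part-of 1.0, else None."""
    if out_concept == in_concept:
        return 0.0
    if in_concept in ancestors(out_concept, IS_A):
        return 0.5
    if in_concept in ancestors(out_concept, PART_OF):
        return 1.0
    return None


def match_cost(out_port, in_port):
    """Cost of connecting an output to an input, or None if they are incompatible."""
    if not syntactic_ok(out_port, in_port):
        return None
    return semantic_cost(out_port["concept"], in_port["concept"])


produced = {"format": "arff", "datatype": "string", "concept": "LabeledDataset"}
required = {"format": "arff", "datatype": "string", "concept": "Dataset"}
print(match_cost(produced, required))
```

Returning a cost instead of a yes/no answer lets the designer rank alternative connections rather than merely reject invalid ones.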
Semi-automatic composition
 KDDComposer: advanced service for composition

 Input
 ● user dataset
 ● a set of requirements
   (max number of algorithms, computational complexity, max cost of match)
 ● user goal (classification, regression, ...)

 Output
 ● a ranked list of abstract processes
   (suggestions about processes useful to solve the user problem)

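The inputs and output listed above suggest a bounded search over algorithm chains. The Python sketch below shows one possible reading, not the actual KDDComposer procedure: the algorithm table, the concept names, the cost values, the per-link interpretation of "max cost of match" and the ranking key are all assumptions, and computational complexity is left out.

```python
# Hypothetical algorithm annotations: name -> (input concept, output concept, task).
ALGORITHMS = {
    "RemoveMissingValues": ("RawLabeledDataset", "LabeledDataset", "preprocessing"),
    "Discretize": ("LabeledDataset", "NominalLabeledDataset", "preprocessing"),
    "C4.5": ("LabeledDataset", "Model", "classification"),
    "ID3": ("NominalLabeledDataset", "Model", "classification"),
}
IS_A = {"NominalLabeledDataset": "LabeledDataset"}  # child -> parent


def link_cost(produced, required):
    """Assumed costs: 0.0 for the same concept, 0.5 for a direct subconcept."""
    if produced == required:
        return 0.0
    if IS_A.get(produced) == required:
        return 0.5
    return None


def compose(dataset_concept, goal, max_algorithms, max_cost):
    """Enumerate algorithm chains that start from the dataset and end in the goal task."""
    results = []

    def extend(concept, chain, cost):
        if len(chain) == max_algorithms:
            return
        for name, (inp, out, task) in ALGORITHMS.items():
            if name in chain:
                continue  # do not reuse an algorithm within one process
            step = link_cost(concept, inp)
            if step is None or step > max_cost:
                continue  # incompatible link, or match too costly
            new_chain, new_cost = chain + [name], cost + step
            if task == goal:
                results.append((new_cost, new_chain))
            else:
                extend(out, new_chain, new_cost)

    extend(dataset_concept, [], 0.0)
    # Rank by cumulative match cost, then by process length.
    return sorted(results, key=lambda r: (r[0], len(r[1])))


ranked = compose("RawLabeledDataset", "classification", max_algorithms=3, max_cost=0.5)
for cost, process in ranked:
    print(cost, " -> ".join(process))
```

The processes are abstract in the sense that they name algorithms only; binding each step to a concrete service would be a separate discovery step.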
Collaboration
  ● collaborative process editing/annotation (wiki-style)
  ● versioning system
  ● team management and addition of new users
  ● manual parameter setting

Conclusion

 SOA for KDD
      ●   Basic Services and Support Services
      ●   KDDesigner: a semantic-aided designer for KDD

 Open environment and heterogeneous tools
      ●   different interfaces: need for a common representation
          (service)
      ●   abstraction for a high-level description of tools (algorithm)
      ●   semantics for interoperability and high-level functionalities

 Future work
      ●   extension with new support services
      ●   process export to more workflow languages
      ●   more collaborative features (real-time editor)
