Using Semantic Web Resources for Data Quality Management
Christian Fürber and Martin Hepp
christian@fuerber.com, mhepp@computer.org

Presentation at the 17th International Conference on Knowledge Engineering and Knowledge Management, October 10-15, 2010, Lisbon, Portugal
Purpose of Data
[Figure: DATA at the center, surrounded by Measurement, Information & Knowledge, Automation, and Decisions]

C. Fürber, M. Hepp:                                          2
Using SemWeb Resources for DQM
Data Quality in Practice




       Reference: http://www.heise.de/newsticker/meldung/Comdirect-Bank-macht-Kunden-zu-Billiardaeren-996088.html


The Web of Messy Data?
 Retrieved from http://dbpedia.org/sparql on July 20th




Which one is the correct population?




The Web of Messy Data?
 Retrieved from http://dbpedia.org/sparql on July 20th




Places with negative population?!?




Risk of Failure
[Figure: DATA at the center, surrounded by Measurement, Information & Knowledge, Automation, and Decisions, all of which are at risk when the data is poor]

Data Quality Problem Types
[Figure: data quality problem types overlaid on the Linking Open Data cloud diagram]
• Inconsistent duplicates
• Invalid characters
• Missing classification
• Incorrect reference
• Approximate duplicates
• Character alignment violation
• Word transpositions
• Invalid substrings
• Mistyping / misspelling errors
• Cardinality violation
• Missing values
• Referential integrity violation
• Misfielded values
• Unique value violation
• False values
• Functional dependency violation
• Out of range values
• Imprecise values
• Existence of homonyms
• Meaningless values
• Incorrect classification
• Existence of synonyms
• Contradictory relationships
• Outdated conceptual elements
• Untyped literals
• Outdated values

Reference: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/



Goals

• Use Semantic Web data to identify data quality problems at the instance level

• Support the Data Quality Management (DQM) process


Total Data Quality Management for and based on the Semantic Web
[Figure: the TDQM cycle Define → Measure → Analyze → Improve, with DQ at the center]
• Define: define what's good and/or what's poor data quality
• Measure: develop and apply SPARQL queries based on the DQ definition

Reference: Richard Wang (1998)




How can the Semantic Web support Data Quality Management?

   Availability of FREE Data Quality Knowledge,
   e.g. for the identification of…

                • Legal value violations
                • Functional dependency violations


Using Trusted References
[Figure: DQ-Constraints compare a local:Location in the tested knowledge base (Las Vegas, France) with a tref:Location in the trusted reference (Las Vegas, USA)]


Basic Architecture




Basic Characteristics of SPIN (http://spinrdf.org/)
• Allows definition of generalized SPARQL query templates
• Constraint checking based on SPARQL
• Definition of inferencing rules via SPARQL
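A SPARQL query template keeps the query shape fixed and leaves the class and property names as arguments, so one generic check can be reused across vocabularies. The sketch below imitates that idea with Python's `string.Template`; it is not SPIN syntax or the SPIN API, and the helper name `instantiate_legal_value_check` and its parameters are hypothetical.

```python
from string import Template

# Generic legal-value check. $cls, $prop, $refcls and $refprop are placeholders,
# loosely analogous to template arguments (this is not SPIN's own notation).
LEGAL_VALUE_TEMPLATE = Template(
    """SELECT ?s
WHERE {
    ?s a $cls .
    ?s $prop ?value .
    OPTIONAL {
        ?s2 a $refcls .
        ?s2 $refprop ?value1 .
        FILTER(str(?value1) = str(?value))
    }
    FILTER(!bound(?value1))
}"""
)


def instantiate_legal_value_check(cls: str, prop: str, refcls: str, refprop: str) -> str:
    """Return a concrete SPARQL query for one class/property pair (hypothetical helper)."""
    return LEGAL_VALUE_TEMPLATE.substitute(
        cls=cls, prop=prop, refcls=refcls, refprop=refprop
    )


if __name__ == "__main__":
    print(instantiate_legal_value_check(
        "vcard:Address", "vcard:country-name", "tref:Location", "tref:country"
    ))
```

With the arguments shown, the output is the legal value query used later in the deck; other arguments yield the same check for a different class and property.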



Generic Data Quality Constraints Library for Easy DQ-Definition
• Mandatory properties & literals
• Legal values*
• Legal value ranges
• Functional dependencies*
• Legal syntaxes
• Uniqueness

* Designed to use trusted references

Available at http://semwebquality.org/ontologies/dq-constraints#
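To make four of these constraint types concrete outside of RDF (mandatory properties, legal value ranges, legal syntaxes, uniqueness), here is a sketch over plain Python dictionaries. The library itself expresses its constraints as SPIN templates; the record layout, the field names (`id`, `name`, `population`, `code`), the regex and the function names below are assumptions for illustration only.

```python
import re
from collections import Counter


def missing_mandatory(records, prop):
    """Mandatory property: ids of records without a non-empty value for `prop`."""
    return [r["id"] for r in records if r.get(prop) in (None, "")]


def out_of_range(records, prop, low, high):
    """Legal value range: ids whose value lies outside the closed interval [low, high]."""
    return [r["id"] for r in records
            if r.get(prop) is not None and not (low <= r[prop] <= high)]


def illegal_syntax(records, prop, pattern):
    """Legal syntax: ids whose value does not fully match the regex `pattern`."""
    regex = re.compile(pattern)
    return [r["id"] for r in records
            if r.get(prop) is not None and regex.fullmatch(str(r[prop])) is None]


def duplicate_values(records, prop):
    """Uniqueness: values of `prop` that appear on more than one record."""
    counts = Counter(r[prop] for r in records if r.get(prop) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical sample data: one dict per instance, keyed by an assumed "id".
places = [
    {"id": "p1", "name": "Town A", "population": 1200, "code": "AA-0001"},
    {"id": "p2", "name": "Town B", "population": -30, "code": "bad code"},
    {"id": "p3", "name": "", "population": 500, "code": "AA-0001"},
]

if __name__ == "__main__":
    print(missing_mandatory(places, "name"))                  # ['p3']
    print(out_of_range(places, "population", 0, 10**10))      # ['p2']
    print(illegal_syntax(places, "code", r"[A-Z]{2}-\d{4}"))  # ['p2']
    print(duplicate_values(places, "code"))                   # ['AA-0001']
```

The negative population of `p2` is exactly the kind of out-of-range value shown in the DBpedia example earlier.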
Definition of Data Quality Constraints based on SPIN




Constraint checking in Practice




Legal Value Constraints
   Return all instances of class vcard:Address whose vcard:country-name
   value does not match (as a string) any tref:country value of a
   tref:Location in the trusted reference
    SELECT ?s
    WHERE {
        ?s a vcard:Address .
        ?s vcard:country-name ?value .
        OPTIONAL {
            ?s2 a tref:Location .
            ?s2 tref:country ?value1 .
            FILTER(str(?value1) = str(?value))
        }
        FILTER(!bound(?value1))
    }
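The OPTIONAL / FILTER(!bound(...)) pattern in the legal value query is an anti-join: an address is kept only if no trusted-reference value matches its country name. Here is a minimal Python sketch of the same logic, assuming the data has already been extracted into (subject, value) pairs; the function name and the sample values are hypothetical.

```python
def legal_value_violations(addresses, reference_countries):
    """Return subjects whose country name has no string-equal match in the reference.

    addresses: iterable of (subject, country_name) pairs, mirroring
               ?s vcard:country-name ?value
    reference_countries: iterable of values, mirroring ?s2 tref:country ?value1
    """
    # Values are plain strings here, so the language tags and datatypes that
    # str() strips in SPARQL are not modelled; this is exact string equality.
    legal = {str(v) for v in reference_countries}
    # Anti-join: keep ?s only when no matching ?value1 exists (!bound(?value1)).
    # The set removes duplicate subjects, as SELECT DISTINCT ?s would.
    return sorted({s for s, value in addresses if str(value) not in legal})


if __name__ == "__main__":
    addresses = [("addr1", "Portugal"), ("addr2", "Portgual"), ("addr3", "Germany")]
    reference = ["Portugal", "Germany", "USA"]
    print(legal_value_violations(addresses, reference))  # ['addr2']
```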
Functional Dependency Constraints
   Return all instances of gr:LocationOfSalesOrServiceProvisioning whose address
   (vcard:ADR) has a city-country combination with no matching pair in instances of gn:Location.

    SELECT ?s
    WHERE {
        ?s a gr:LocationOfSalesOrServiceProvisioning .
        ?s vcard:ADR ?node .
        ?node vcard:city ?value1 .
        ?node vcard:country ?value2 .
        FILTER NOT EXISTS {
            ?s2 a gn:Location .
            ?s2 gn:asciiname ?value1 .
            ?s2 gn:country ?value2 .
        }
    }
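The functional dependency check compares (city, country) pairs rather than single values. Below is a Python sketch using the Las Vegas example from the trusted-reference slide. It assumes exact string matching as in the query, and the function name and sample subjects are hypothetical.

```python
def functional_dependency_violations(addresses, reference_pairs):
    """Return subjects whose (city, country) pair is absent from the reference.

    addresses: iterable of (subject, city, country) triples, mirroring
               ?s vcard:ADR ?node . ?node vcard:city ?value1 . ?node vcard:country ?value2
    reference_pairs: iterable of (city, country) pairs, mirroring
               ?s2 gn:asciiname ?value1 . ?s2 gn:country ?value2
    """
    known = set(reference_pairs)
    # FILTER NOT EXISTS: flag the subject when no reference entry has both values together.
    return sorted({s for s, city, country in addresses if (city, country) not in known})


if __name__ == "__main__":
    local = [("loc1", "Las Vegas", "USA"), ("loc2", "Las Vegas", "France")]
    trusted = [("Las Vegas", "USA")]
    print(functional_dependency_violations(local, trusted))  # ['loc2']
```

Each value of `loc2` could pass a single-value legal value check on its own, since France is a valid country. Only the pair comparison reveals the error.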



Acquisition of Semantic Web Sources for DQM
        (1)          Replication of relevant knowledge bases
        (2)          On the fly via federated SPARQL queries:
                            PREFIX dbo: <http://dbpedia.org/ontology/>
                            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
                            SELECT *
                            WHERE {
                                ?s1 :location_CITY ?city .
                                OPTIONAL {
                                    SERVICE <http://dbpedia.org/sparql> {
                                        ?s2 a dbo:City .
                                        ?s2 rdfs:label ?city .
                                        FILTER(lang(?city) = "en")
                                    }
                                }
                                FILTER(!bound(?s2))
                            }
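Option (2) relies on the remote endpoint being reachable over the SPARQL Protocol, where a query can be sent as the `query` parameter of an HTTP GET request. The sketch below only prepares such a request with Python's standard library and does not send it. The endpoint is the one named above; the helper name and the example ASK query are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"

# Hypothetical lookup: is there an English-labelled dbo:City called "Lisbon"?
ASK_CITY = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
ASK { ?s a dbo:City ; rdfs:label "Lisbon"@en }"""


def build_sparql_get(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a SPARQL Protocol GET request for `query`."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")


if __name__ == "__main__":
    req = build_sparql_get(ENDPOINT, ASK_CITY)
    print(req.get_method(), req.full_url[:60])
```

Passing the request to `urllib.request.urlopen` would actually contact the endpoint. Whether it answers, and how fast, is outside what this sketch covers and is one reason to prefer option (1) for large checks.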

Limitations
• High degree of uncertainty about the quality of Semantic Web resources
• Risk of proliferating data quality problems
• Lack of Semantic Web resources for certain domains
• The flexible design of RDF and structural heterogeneity complicate the definition of generic DQ constraints
• Scalability on large data sets
• DQ constraints close the world (they impose a closed-world view on open-world RDF data)
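The last point can be illustrated briefly. RDF follows an open-world assumption: a missing value or a missing reference entry means "unknown". A constraint such as FILTER(!bound(...)) instead treats absence as a violation. The Python sketch below contrasts the two readings. The three-valued verdict and all names are illustrative assumptions, not part of the constraint library.

```python
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    VIOLATION = "violation"
    UNKNOWN = "unknown"


def check_country(known_countries, value, closed_world):
    """Judge one country value against a reference list (illustrative only)."""
    if value in known_countries:
        return Verdict.OK
    # Closed world: "not in the reference" is read as "illegal".
    # Open world: the reference may simply be incomplete, so the value is unknown.
    return Verdict.VIOLATION if closed_world else Verdict.UNKNOWN


if __name__ == "__main__":
    reference = {"Portugal", "USA"}
    print(check_country(reference, "Country X", closed_world=True))   # Verdict.VIOLATION
    print(check_country(reference, "Country X", closed_world=False))  # Verdict.UNKNOWN
```

This is why an incomplete trusted reference, one of the limitations above, directly produces false alarms under constraint checking.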



Contributions
• Data quality control for Semantic Web data
• Identification of potential inconsistencies between Semantic Web resources
• Reduced effort for defining functional dependency rules and legal value rules
• Reuse of shared data quality rules at Web scale


Future Work
• Semantic Web information quality assessment framework (SWIQA) with computation of KPIs
• Analysis and identification of useful "trusted references" based on SWIQA
• Application to multi-source master data of information systems
• Evaluation on large data sets


Data Quality Constraints Library for SPIN @
http://semwebquality.org/ontologies/dq-constraints#

          Christian Fürber
          Researcher
          E-Business & Web Science Research Group

                        Werner-Heisenberg-Weg 39
                        85577 Neubiberg
                        Germany

                        skype            c.fuerber
                        email            christian@fuerber.com
                        web              http://www.unibw.de/ebusiness
                        homepage         http://www.fuerber.com
                        twitter          http://www.twitter.com/cfuerber




     Paper available at http://bit.ly/c5v6TM


Using Semantic Web Resources for Data Quality Management

  • 1. Using Semantic Web Resources for Data Quality Management Christian Fürber and Martin Hepp christian@fuerber.com, mhepp@computer.org Presentation at the 17th International Conference on Knowledge Engineering and Knowledge Management, October 10-15, 2010, Lisbon, Portugal
  • 2. Purpose of Data Measurement Information & Knowledge 101010101 010101010 DATA 101010101 001010101 Automation 001010101 Decisions C. Fürber, M. Hepp: 2 Using SemWeb Resources for DQM
  • 3. Data Quality in Practice Reference: http://www.heise.de/newsticker/meldung/Comdirect-Bank-macht-Kunden-zu-Billiardaeren-996088.html C. Fürber, M. Hepp: 3 Using SemWeb Resources for DQM
  • 4. The Web of Messy Data? Retrieved from http://dbpedia.org/sparql on July 20th Which one is the correct population? C. Fürber, M. Hepp: 4 Using SemWeb Resources for DQM
  • 5. The Web of Messy Data? Retrieved from http://dbpedia.org/sparql on July 20th Places with negative population?!? C. Fürber, M. Hepp: 5 Using SemWeb Resources for DQM
  • 6. Risk of Failure Measurement Information & Knowledge 101010101 010101010 DATA 101010101 001010101 Automation 001010101 Decisions C. Fürber, M. Hepp: 6 Using SemWeb Resources for DQM
  • 7. Data Quality Problem Types Inconsistent duplicates Invalid characters Missing classification Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ Incorrect reference Approximate duplicates Reference: Linking Open Data cloud diagram, by Character alignment violation Word transpositions Invalid substrings Mistyping / Misspelling errors Cardinality violation Missing values Referential integrity violation Misfielded values Unique value violation False values Functional Dependency Out of range values Violation Imprecise values Existence of Homonyms Meaningless values Incorrect classification Existence of Synonyms Contradictory relationships Outdated conceptual elements Untyped literals Outdated values C. Fürber, M. Hepp: 7 Using SemWeb Resources for DQM
  • 8. Goals • Use Semantic Web data to identify data quality problems at the instance level • Support the Data Quality Management (DQM) process
  • 9. Total Data Quality Management for and based on the Semantic Web. The TDQM cycle Define → Measure → Analyze → Improve (reference: Richard Wang, 1998), where Define means defining what is good and/or poor data quality, and Measure means developing and applying SPARQL queries based on the DQ definition.
  • 10. How can the Semantic Web support Data Quality Management? Through freely available data quality knowledge, e.g. for the identification of • legal value violations • functional dependency violations
  • 11. Using Trusted References. Diagram: DQ constraints check the location data of a tested knowledge base (local:Location: Las Vegas, France) against a trusted reference (tref:Location: Las Vegas, USA).
  • 12. Basic Architecture
  • 13. Basic Characteristics of SPIN (http://spinrdf.org/) • Allows the definition of generalized SPARQL query templates • Constraint checking based on SPARQL • Definition of inference rules via SPARQL
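SPIN lets a constraint be written once as a SPARQL template and reused with different arguments. The sketch below only imitates that idea with Python's `string.Template`, filling hypothetical placeholders (`$cls`, `$prop`) to produce a concrete "mandatory property" query. It is an assumption-laden stand-in: real SPIN templates are RDF resources whose arguments are declared with `spl:Argument` and pre-bound as SPARQL variables at execution time, not spliced in as text.

```python
from string import Template

# Hypothetical generic "mandatory property" template: it returns every
# instance of $cls that has no value at all for $prop.
MANDATORY_PROPERTY = Template(
    "SELECT ?s WHERE {\n"
    "  ?s a $cls .\n"
    "  FILTER NOT EXISTS { ?s $prop ?value }\n"
    "}"
)


def instantiate(template: Template, **args: str) -> str:
    """Fill every placeholder; Template.substitute raises KeyError if one is missing."""
    return template.substitute(**args)


if __name__ == "__main__":
    print(instantiate(MANDATORY_PROPERTY, cls="vcard:Address", prop="vcard:country-name"))
```

Substituting text is only safe for trusted, well-formed prefixed names; SPIN's variable pre-binding avoids that concern.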
  • 14. Generic Data Quality Constraints Library for Easy DQ Definition • Mandatory properties & literals • Legal values* • Legal value ranges • Functional dependencies* • Legal syntaxes • Uniqueness (* designed to use trusted references). Available at http://semwebquality.org/ontologies/dq-constraints#
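Several constraint types in this list need no trusted reference. A plain-Python sketch of three of them (mandatory property, legal syntax via a regular expression, and uniqueness) over hypothetical triples; the property names `ex:postalCode` and `ex:customerId` and the five-digit pattern are illustrative assumptions, not the library's own definitions.

```python
import re
from collections import defaultdict

# Hypothetical triples (subject, property, value).
TRIPLES = [
    ("ex:addr1", "rdf:type", "vcard:Address"),
    ("ex:addr1", "ex:postalCode", "85577"),
    ("ex:addr1", "ex:customerId", "C-1"),
    ("ex:addr2", "rdf:type", "vcard:Address"),
    ("ex:addr2", "ex:postalCode", "8557"),      # too short for the assumed pattern
    ("ex:addr2", "ex:customerId", "C-1"),       # duplicates addr1's identifier
    ("ex:addr3", "rdf:type", "vcard:Address"),  # no postal code at all
]


def instances_of(triples, cls):
    """All subjects typed with cls."""
    return {s for s, p, o in triples if p == "rdf:type" and o == cls}


def missing_property(triples, cls, prop):
    """Mandatory property: instances of cls without any value for prop."""
    has_prop = {s for s, p, _ in triples if p == prop}
    return sorted(instances_of(triples, cls) - has_prop)


def syntax_violations(triples, prop, pattern):
    """Legal syntax: (subject, value) pairs whose value does not fully match pattern."""
    regex = re.compile(pattern)
    return sorted((s, o) for s, p, o in triples if p == prop and not regex.fullmatch(o))


def uniqueness_violations(triples, prop):
    """Uniqueness: values of prop shared by more than one subject."""
    subjects = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            subjects[o].add(s)
    return {o: sorted(ss) for o, ss in subjects.items() if len(ss) > 1}


if __name__ == "__main__":
    print(missing_property(TRIPLES, "vcard:Address", "ex:postalCode"))  # ['ex:addr3']
    print(syntax_violations(TRIPLES, "ex:postalCode", r"\d{5}"))        # [('ex:addr2', '8557')]
    print(uniqueness_violations(TRIPLES, "ex:customerId"))              # {'C-1': ['ex:addr1', 'ex:addr2']}
```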
  • 15. Definition of Data Quality Constraints based on SPIN
  • 16. Constraint checking in Practice
  • 17. Legal Value Constraints. Return all instances of class vcard:Address whose vcard:country-name value matches no tref:country value of a tref:Location instance in the trusted reference:

      SELECT ?s
      WHERE {
        ?s a vcard:Address .
        ?s vcard:country-name ?value .
        OPTIONAL {
          ?s2 a tref:Location .
          ?s2 tref:country ?value1 .
          FILTER (str(?value1) = str(?value))
        }
        FILTER (!bound(?value1))
      }
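The `str()` calls in this query compare lexical forms, so a country name stored as "France"@en still matches a plain "France". A minimal Python sketch of the same anti-join, modelling literals as hypothetical (lexical form, language tag) pairs with invented data, and contrasting it with a naive full-term comparison:

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Minimal stand-in for an RDF literal: lexical form plus optional language tag."""
    lexical: str
    lang: Optional[str] = None


# Hypothetical tested data: vcard:Address instances and their vcard:country-name values.
ADDRESS_COUNTRY = {
    "ex:addr1": Literal("France"),
    "ex:addr2": Literal("Germany", "en"),
    "ex:addr3": Literal("Frannce"),  # misspelled
}

# Hypothetical trusted reference: tref:country values of tref:Location instances.
TREF_COUNTRIES = [Literal("France", "en"), Literal("Germany", "en"), Literal("USA")]


def legal_value_violations(values, reference):
    """Flag subjects whose lexical form (like SPARQL str()) is absent from the reference."""
    legal = {lit.lexical for lit in reference}
    return sorted(s for s, lit in values.items() if lit.lexical not in legal)


def naive_violations(values, reference):
    """Same check with full term equality: plain and language-tagged literals never match here."""
    legal = set(reference)
    return sorted(s for s, lit in values.items() if lit not in legal)


if __name__ == "__main__":
    print(legal_value_violations(ADDRESS_COUNTRY, TREF_COUNTRIES))  # ['ex:addr3']
    print(naive_violations(ADDRESS_COUNTRY, TREF_COUNTRIES))        # ['ex:addr1', 'ex:addr3']
```

The naive variant wrongly reports `ex:addr1` only because of the language tag, which is the mismatch the `str()` comparison avoids.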
  • 18. Functional Dependency Constraints. Return all instances of gr:LocationOfSalesOrServiceProvisioning whose address (vcard:ADR) has a city-country combination with no matching pair among the instances of gn:Location:

      SELECT ?s
      WHERE {
        ?s a gr:LocationOfSalesOrServiceProvisioning .
        ?s vcard:ADR ?node .
        ?node vcard:city ?value1 .
        ?node vcard:country ?value2 .
        FILTER NOT EXISTS {
          ?s2 a gn:Location .
          ?s2 gn:asciiname ?value1 .
          ?s2 gn:country ?value2 .
        }
      }
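A functional dependency check tests value combinations rather than single values: a (Las Vegas, France) pair is flagged because the trusted reference only knows (Las Vegas, USA). A plain-Python sketch of the same NOT EXISTS logic; the address records and the reference set standing in for gn:Location are hypothetical, and matching is exact on the string pair.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical tested addresses: (subject, city, country).
ADDRESSES = [
    ("ex:shop1", "Las Vegas", "USA"),
    ("ex:shop2", "Las Vegas", "France"),  # city/country combination not in the reference
    ("ex:shop3", "Lisbon", "Portugal"),
]

# Hypothetical trusted reference of known (city, country) pairs, standing in for gn:Location.
REFERENCE_PAIRS: Set[Tuple[str, str]] = {
    ("Las Vegas", "USA"),
    ("Lisbon", "Portugal"),
    ("Paris", "France"),
}


def fd_violations(addresses: Iterable[Tuple[str, str, str]],
                  reference: Set[Tuple[str, str]]) -> List[str]:
    """Return subjects whose (city, country) pair has no exact match in the reference."""
    return [s for s, city, country in addresses if (city, country) not in reference]


if __name__ == "__main__":
    print(fd_violations(ADDRESSES, REFERENCE_PAIRS))  # ['ex:shop2']
```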
  • 19. Acquisition of Semantic Web Sources for DQM: (1) replication of relevant knowledge bases; (2) on-the-fly access via federated SPARQL queries. The following query returns local records whose city has no English-labelled dbo:City counterpart in DBpedia:

      PREFIX dbo: <http://dbpedia.org/ontology/>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      SELECT *
      WHERE {
        ?s1 :location_CITY ?city .
        OPTIONAL {
          SERVICE <http://dbpedia.org/sparql> {
            ?s2 a dbo:City .
            ?s2 rdfs:label ?city .
            FILTER (lang(?city) = "en")
          }
        }
        FILTER (!bound(?s2))
      }
  • 20. Limitations • High degree of uncertainty about the quality of Semantic Web resources • Risk of proliferating data quality problems • Lack of Semantic Web resources for certain domains • The flexible design of RDF and structural heterogeneity complicate the definition of generic DQ constraints • Scalability to large data sets • DQ constraints close the world, i.e. anything missing from a reference is treated as an error, unlike RDF's open-world assumption
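The closed-world point can be seen with a deliberately incomplete, hypothetical reference: a correct pair that the reference simply lacks is reported as an error. The second function is a hypothetical refinement, not part of the talk, that separates pairs the reference actually contradicts from pairs it merely does not know.

```python
# Hypothetical, deliberately incomplete trusted reference of (city, country) pairs.
REFERENCE = {("Las Vegas", "USA"), ("Lisbon", "Portugal")}

# Hypothetical tested data: ex:a and ex:b are correct, ex:c is wrong.
TESTED = {
    "ex:a": ("Lisbon", "Portugal"),
    "ex:b": ("Porto", "Portugal"),   # correct, but absent from the reference
    "ex:c": ("Las Vegas", "France"),
}


def flag_unmatched(tested, reference):
    """Closed-world reading: every pair missing from the reference counts as an error."""
    return sorted(s for s, pair in tested.items() if pair not in reference)


def classify(tested, reference):
    """Split results into matches, contradictions (city known with another country), and unknowns."""
    known_cities = {city for city, _ in reference}
    result = {"ok": [], "contradicted": [], "unknown": []}
    for s, (city, country) in sorted(tested.items()):
        if (city, country) in reference:
            result["ok"].append(s)
        elif city in known_cities:
            result["contradicted"].append(s)
        else:
            result["unknown"].append(s)
    return result


if __name__ == "__main__":
    print(flag_unmatched(TESTED, REFERENCE))  # ['ex:b', 'ex:c']
    print(classify(TESTED, REFERENCE))
```

Homonymous city names, listed among the problem types on slide 7, would still be misreported as contradictions by this refinement.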
  • 21. Contributions • Data quality control for Semantic Web data • Identification of potential inconsistencies between Semantic Web Resources • Reduction of effort for the definition of functional dependency rules and legal value rules • Reuse of shared data quality rules on a Web scale
  • 22. Future Work • Semantic Web information quality assessment framework (SWIQA) with computation of KPIs • Analysis and identification of useful "trusted references" based on SWIQA • Application to multi-source master data of information systems • Evaluation on large data sets
  • 23. Data Quality Constraints Library for SPIN at http://semwebquality.org/ontologies/dq-constraints#. Christian Fürber, Researcher, E-Business & Web Science Research Group, Werner-Heisenberg-Weg 39, 85577 Neubiberg, Germany; skype: c.fuerber; email: christian@fuerber.com; web: http://www.unibw.de/ebusiness; homepage: http://www.fuerber.com; twitter: http://www.twitter.com/cfuerber. Paper available at http://bit.ly/c5v6TM