SlideShare ist ein Scribd-Unternehmen logo
1 von 51
About 22 years ago..




                       1
11 years later…




Image from Scientific American Website
3
4
5
Tim Berners-Lee 2006
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful
   information, using the standards (RDF*, SPARQL)

4. Include links to other URIs. so that they can discover more
   things.


                                                                 6
In 2006 Web of Data




                      7
Linked Open Data
       • Massive collection of instance data
       • Primarily connected via owl:sameAs relationship
       • Excellent source of information for background
            knowledge

       • Labeled as mainstream Semantic Web
6/26/2012                                                      8
                                                           8
Is it really mainstream Semantic Web?
• What is the relationship between the models
 whose instances are being linked?

• How to do querying on LOD without knowing
 individual datasets?

• How to perform schema level reasoning over
 LOD cloud?

                                                9
What can be done?
• Relationships are at the heart of Semantics
• LOD primarily consists of owl:sameAs links
• LOD captures instance level relationships, but lacks
  class level relationships.
   o Superclass
   o Subclass
   o Equivalence


• How to find these relationships?
   o Perform a matching of the LOD Ontology’s using state of the art ontology matching tools.



                                                                                                10
Linked Data Alignment
   and Enrichment
 Proposal Defense June 11th, 2012

           Prateek Jain
         Kno.e.sis Center
Wright State University, Dayton, OH
Agenda
      •     Motivation and Significance of this research
      •     Research questions and proposed solutions
      •     State of the current research and planned
            work

      •     Questions and comments



14th February 2012                                         12
Linked Open Data
      •     A set of best practices for publishing and
           connecting structured data on the Web

      • Practices have been adopted by an increasing
           number of data providers in the past 5 years

      • Latest count is at 295 datasets with over 50
           Billion triples (approx)
14th February 2012                                        13
Linked Open Data 2007 (May)




Linking Open Data cloud diagram, this and subsequent pages, by Richard Cyganiak and AnjaJentzsch. http://lod-cloud.net/



                                                                                                                          14
Linked Open Data 2007 (Oct)




                              15
Linked Open Data 2009




                        16
Linked Open Data 2011




                        17
Linked Open Data
Number of Datasets            Number of triples (Sept 2011)

                              31,634,213,770
2011-09-19    295
                              with 503,998,829 out-links
2010-09-22    203
2009-07-14    95
2008-09-18    45
2007-10-08    25
2007-05-01    12
                 From http://www4.wiwiss.fu-berlin.de/lodcloud/state/

                                                                        18
6 years of existence how
            many applications come to
                   your mind?



6/26/2012                               19
20
Reality…
       • “We DID NOT use the entire Dbpedia or LOD.
            The only component of LOD which helped us
            with Watson was YAGO class hierarchy present
            in DBpedia. We had strict information gain
            requirements and other components honestly
            did not help much“
            – Researcher with the Watson Team


6/26/2012                                                   21
                                                           21
Why?
A simple query..
“Identify congress members, who have voted “No”
   on pro environmental legislation in the past four
   years, with
   high-pollution industry in their congressional
   districts.”



But even with LOD we cannot answer this query.

                                                       23
Example: GovTrack
                           Vote: 2009-       vote:hasOption
 vote:vote                    887                                  Votes:2009-887/+


                                                                        vote:votedBy
                                    Aye        rdfs:label
     vote:hasAction
                                                                     people/P000197
           H.R. 3962: Affordable
          Health Care for America
                                                        dc:title
                   Act                                                      name
                                          On Passage: H R
                dc:title                  3962 Affordable           Nancy Pelosi
                                           Health Care for
 Bills:h3962                                America Act




                                                                                       24
Example: GeoNames




        rdfs:subClassOf?




                           25
Our Approach
Use knowledge contributed by users




                                     To enhance existing approaches
                                     to solve these issues:

                                     • Ontology integration

                                     • Detection relationships within
LOD
                                       and across datasets
Cloud
                                     • Querying multiple datasets




                                                                        26
Circling Back


       • LOD captures instance level relationships, but
            lacks class level relationships.
             o Superclass
             o Subclass
             o Equivalence




6/26/2012                                                  28
                                                          28
BLOOMS – Bootstrapping …
• BLOOMS - Bootstrapping-based Linked Open
  Data Ontology Matching System

• Developed specifically for LOD Ontologies
• Identifies schema level links between different
  LOD datasets

• Aligns ontologies belonging to diverse domains
  using diverse data sources
                                                    30
Existing Approaches




A survey of approaches to automatic Ontology matching by Erhard Rahm, Philip A. Bernstein in the VLDB Journal 10:
334–350 (2001)
                                                                                                                    31
LOD Ontology Alignment
• Actual Results from these techniques
    Nation = Menstruation, Confidence=0.9 


• They perform extremely well on established benchmarks, but
  typically not in the wilds.



• LOD Ontology’s are of very different nature
  •   Created by community for community.
  •   Emphasis on number of instances, not number of meaningful relationships.
  •   Require solutions beyond syntactic and structural matching.
                                                                                 32
Rabbit out of a hat?
• Traditional auxiliary data sources
  (WordNet, Upper Level Ontologies) have limited
  coverage.

• Community generated is noisy, but is rich in
  •   Content
  •   Structure
  •   Has a “self healing property”


• Problems like Ontology Matching have a
  dimension of context associated with them.
                                                   33
Wikipedia
• The English version alone has more than 2.9
  million articles

• Continually expanded by approx. 100,000 active
  volunteer editors

• Multiple points of view are mentioned with
  proper contexts

• Article creation/correction is an ongoing activity   34
Ontology Matching using Wikipedia
• On Wikipedia, categories are used to organize
  the entire project.

• Wikipedia's category system consists of
  overlapping trees.

• Simple rules for categorization

                                                  35
BLOOMS Approach – Step 1


• Pre-process the input ontology
     Remove property restrictions
     Remove individuals, properties




• Tokenize the class names
     Remove underscores, hyphens and other delimiters
     Breakdown complex class names
       •  example: SemanticWeb => Semantic Web


                                                         36
BLOOMS Approach – Step 2
• Identify article in Wikipedia corresponding to the concept.
   o Each article related to the concept indicates a sense of the usage of the
     word.


• For each article found in the previous step
   o Identify the Wikipedia category to which it belongs.
   o For each category found, find its parent categories till level 4.


• Once the “BLOOMS tree” for each of the sense of the source
  concept is created (Ts), utilize it for comparison with the
  “BLOOMS tree” of the target concepts (Tt).

                                                                                 37
BLOOMS Approach – Step 3
• In the tree Ts, remove all nodes for which the parent node
  which occurs in Tt to create Ts’.
   o All leaves of Ts are of level 4 or occur in Tt.
   o The pruned nodes do not contribute any additional new knowledge.


• Compute overlap Os between the source and target tree.
   o Os= n/(k-1), n = |z|, zε Ts’ ΠTt, k= |s|, sε Ts’


• The decision of alignment is made as follows.
   o For Ts εTc and Ttε Td, we have Ts=Tt, then C=D.
   o If min{o(Ts,Tt),o(Tt,Ts)} ≥ x, then set C rdfs:subClassOf D if o(Ts,Tt) ≤
     o(Tt, Ts), and set D rdfs:subClassOf C if o(Ts, Tt) ≥ o(Tt, Ts).



                                                                                 38
Example




          39
Evaluation Objectives

 • To examine BLOOMS as a tool for the purpose of LOD
   ontology matching.




 • To examine the ability of BLOOMS to serve as a general
   purpose ontology matching system.


                                                            40
Circling Back


       • LOD primarily consists of owl:sameAs links




6/26/2012                                              41
                                                      41
Part of Relationship
  Identification
Partonomy Identification
•   Currently entities across datasets are linked using primarily the
    owl:sameAs relationship


•   Relationships such as partonomy (part-of), and causality can
    allow creating even more intelligent applications such as Watson


•   Approach PLATO (Part-Of relation finder on Linked Open DAta
    Tool)




                                                                        43
PLATO Approach


• PLATO generates all possible partonomically
  linked pairs between the entities in the dataset.
  o Utilize “strongly” associated entities


• Identify the type of each entity in the pair using
  WordNet.
  o Use Class Names
  o Gives the lexicographer files for the synsets
    corresponding to these entities
                                                       44
Winston’s Taxonomy




                     45
PLATO Approach – Step 2
• PLATO generates linguistic patterns for each applicable
  property based on linguistic cues suggested by Winston.
   o Cell Wall is made of Cellulose


• Tests the lexical patterns for each entity pair in a corpus-
  driven manner.
   o Using Web as a corpus


• PLATO counts the total number of web pages that contain
  the pattern
   o Parse the page and identify the occurance of pattern.

                                                                 46
PLATO Approach – Step 3
• Asserts the partonomy property with strongest supporting
  evidence
   o Cell Wall is made of Cellulose, 48
   o Cellulose is made of Cell Wall, 10



• PLATO also enriches the schema by generalizing from the
  instance level assertions.




                                                             47
Evaluation Objectives

 • To examine PLATO as a tool for finding different kinds of
   part-of relation.

 • To examine PLATO as a tool for finding part-of relation
   within a dataset

 • To examine PLATO as a tool for finding part-of relation
   across dataset


                                                               48
49
BLOOMS                 BLOOMS+                 PLATO                 Others


       2010          1.   1 paper at ISWC                                                 1. Paper at AAAI SS
                     2.   1 paper at OM                                                   2. Paper at GEOS
                          workshop



       2011                                 1.   1 paper at ESWC
                                            2.   Workshop at ICBO




       2012                                                         1.   1 paper at ACM
                                                                         Hypertext




                                   Total of 7 publications covering this research
14th February 2012                                                                                              50
Research Plan
      • Evaluation of BLOOMS on LOD ontologies
      • Evaluation of PLATO
      • Automatic classification of datasets
      • Property alignment on LOD
14th February 2012                               51
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Research Discovery, Social Networks and VIVO
Research Discovery, Social Networks and VIVO Research Discovery, Social Networks and VIVO
Research Discovery, Social Networks and VIVO Simon Caton
 
DBpedia Spotlight at I-SEMANTICS 2011
DBpedia Spotlight at I-SEMANTICS 2011DBpedia Spotlight at I-SEMANTICS 2011
DBpedia Spotlight at I-SEMANTICS 2011Pablo Mendes
 
Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic WebOscar Corcho
 
Metadata Training for Staff and Librarians for the New Data Environment
Metadata Training for Staff and Librarians for the New Data EnvironmentMetadata Training for Staff and Librarians for the New Data Environment
Metadata Training for Staff and Librarians for the New Data EnvironmentDiane Hillmann
 
Question answering in linked data
Question answering in linked dataQuestion answering in linked data
Question answering in linked dataReza Ramezani
 
IASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesIASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesDr.-Ing. Thomas Hartmann
 
Assessing, Creating and Using Knowledge Graph Restrictions
Assessing, Creating and Using Knowledge Graph RestrictionsAssessing, Creating and Using Knowledge Graph Restrictions
Assessing, Creating and Using Knowledge Graph RestrictionsSven Lieber
 
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...Cataldo Musto
 
Development of Semantic Web based Disaster Management System
Development of Semantic Web based Disaster Management SystemDevelopment of Semantic Web based Disaster Management System
Development of Semantic Web based Disaster Management SystemNIT Durgapur
 
Experience from 10 months of University Linked Data
Experience from 10 months of University Linked Data Experience from 10 months of University Linked Data
Experience from 10 months of University Linked Data Mathieu d'Aquin
 
Data and Information Extraction on the Web
Data and Information Extraction on the WebData and Information Extraction on the Web
Data and Information Extraction on the WebTommaso Teofili
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityunivTope Omitola
 
LUCERO - Building the Open University Web of Linked Data
LUCERO - Building the Open University Web of Linked DataLUCERO - Building the Open University Web of Linked Data
LUCERO - Building the Open University Web of Linked DataMathieu d'Aquin
 
Working with data.open.ac.uk, the Linked Data Platform of the Open University
Working with data.open.ac.uk, the Linked Data Platform of the Open UniversityWorking with data.open.ac.uk, the Linked Data Platform of the Open University
Working with data.open.ac.uk, the Linked Data Platform of the Open UniversityMathieu d'Aquin
 
BESOCIAL A Knowledge Graph for Social Media Archiving
BESOCIAL A Knowledge Graph for Social Media ArchivingBESOCIAL A Knowledge Graph for Social Media Archiving
BESOCIAL A Knowledge Graph for Social Media ArchivingSven Lieber
 

Was ist angesagt? (18)

Research Discovery, Social Networks and VIVO
Research Discovery, Social Networks and VIVO Research Discovery, Social Networks and VIVO
Research Discovery, Social Networks and VIVO
 
NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International A...
NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International A...NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International A...
NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International A...
 
DBpedia Spotlight at I-SEMANTICS 2011
DBpedia Spotlight at I-SEMANTICS 2011DBpedia Spotlight at I-SEMANTICS 2011
DBpedia Spotlight at I-SEMANTICS 2011
 
Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic Web
 
Metadata Training for Staff and Librarians for the New Data Environment
Metadata Training for Staff and Librarians for the New Data EnvironmentMetadata Training for Staff and Librarians for the New Data Environment
Metadata Training for Staff and Librarians for the New Data Environment
 
2013.05 - LDOW 2013 @ WWW 2013
2013.05 - LDOW 2013 @ WWW 20132013.05 - LDOW 2013 @ WWW 2013
2013.05 - LDOW 2013 @ WWW 2013
 
Question answering in linked data
Question answering in linked dataQuestion answering in linked data
Question answering in linked data
 
IASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesIASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with Triples
 
Assessing, Creating and Using Knowledge Graph Restrictions
Assessing, Creating and Using Knowledge Graph RestrictionsAssessing, Creating and Using Knowledge Graph Restrictions
Assessing, Creating and Using Knowledge Graph Restrictions
 
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
 
Domain-specific Knowledge Extraction from the Web of Data
Domain-specific Knowledge Extraction from the Web of DataDomain-specific Knowledge Extraction from the Web of Data
Domain-specific Knowledge Extraction from the Web of Data
 
Development of Semantic Web based Disaster Management System
Development of Semantic Web based Disaster Management SystemDevelopment of Semantic Web based Disaster Management System
Development of Semantic Web based Disaster Management System
 
Experience from 10 months of University Linked Data
Experience from 10 months of University Linked Data Experience from 10 months of University Linked Data
Experience from 10 months of University Linked Data
 
Data and Information Extraction on the Web
Data and Information Extraction on the WebData and Information Extraction on the Web
Data and Information Extraction on the Web
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityuniv
 
LUCERO - Building the Open University Web of Linked Data
LUCERO - Building the Open University Web of Linked DataLUCERO - Building the Open University Web of Linked Data
LUCERO - Building the Open University Web of Linked Data
 
Working with data.open.ac.uk, the Linked Data Platform of the Open University
Working with data.open.ac.uk, the Linked Data Platform of the Open UniversityWorking with data.open.ac.uk, the Linked Data Platform of the Open University
Working with data.open.ac.uk, the Linked Data Platform of the Open University
 
BESOCIAL A Knowledge Graph for Social Media Archiving
BESOCIAL A Knowledge Graph for Social Media ArchivingBESOCIAL A Knowledge Graph for Social Media Archiving
BESOCIAL A Knowledge Graph for Social Media Archiving
 

Ähnlich wie Linked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques

ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ Prateek Jain
 
2012.10 - Workshop on Semantic Statistics - 1
2012.10 - Workshop on Semantic Statistics - 12012.10 - Workshop on Semantic Statistics - 1
2012.10 - Workshop on Semantic Statistics - 1Dr.-Ing. Thomas Hartmann
 
Data Big and Broad (Oxford, 2012)
Data Big and Broad (Oxford, 2012)Data Big and Broad (Oxford, 2012)
Data Big and Broad (Oxford, 2012)James Hendler
 
What is New in W3C land?
What is New in W3C land?What is New in W3C land?
What is New in W3C land?Ivan Herman
 
20111120 warsaw learning curve by b hyland notes
20111120 warsaw   learning curve by b hyland notes20111120 warsaw   learning curve by b hyland notes
20111120 warsaw learning curve by b hyland notesBernadette Hyland-Wood
 
20130622 okfn hackathon t2
20130622 okfn hackathon t220130622 okfn hackathon t2
20130622 okfn hackathon t2Seonho Kim
 
The Research Data Alliance--Creating the culture and technology for an intern...
The Research Data Alliance--Creating the culture and technology for an intern...The Research Data Alliance--Creating the culture and technology for an intern...
The Research Data Alliance--Creating the culture and technology for an intern...Research Data Alliance
 
"Plans are worthless, but planning is essential"
"Plans are worthless, but planning is essential""Plans are worthless, but planning is essential"
"Plans are worthless, but planning is essential"Research Data Alliance
 
Keynote: Mark Parsons - Plans are Useless, But Planning is Essential
Keynote: Mark Parsons - Plans are Useless, But Planning is EssentialKeynote: Mark Parsons - Plans are Useless, But Planning is Essential
Keynote: Mark Parsons - Plans are Useless, But Planning is EssentialCASRAI
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...Armin Haller
 
Linked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsLinked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsJon Voss
 
DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...
DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...
DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...Dr.-Ing. Thomas Hartmann
 
Tutorial Data Management and workflows
Tutorial Data Management and workflowsTutorial Data Management and workflows
Tutorial Data Management and workflowsSSSW
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jDebanjan Mahata
 

Ähnlich wie Linked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques (20)

Prateek Jain's Dissertation Defense - Linked Open Data Alignment and Querying
Prateek Jain's Dissertation Defense - Linked Open Data Alignment and QueryingPrateek Jain's Dissertation Defense - Linked Open Data Alignment and Querying
Prateek Jain's Dissertation Defense - Linked Open Data Alignment and Querying
 
ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+
 
2012.10 - Workshop on Semantic Statistics - 1
2012.10 - Workshop on Semantic Statistics - 12012.10 - Workshop on Semantic Statistics - 1
2012.10 - Workshop on Semantic Statistics - 1
 
Data Big and Broad (Oxford, 2012)
Data Big and Broad (Oxford, 2012)Data Big and Broad (Oxford, 2012)
Data Big and Broad (Oxford, 2012)
 
What is New in W3C land?
What is New in W3C land?What is New in W3C land?
What is New in W3C land?
 
20111120 warsaw learning curve by b hyland notes
20111120 warsaw   learning curve by b hyland notes20111120 warsaw   learning curve by b hyland notes
20111120 warsaw learning curve by b hyland notes
 
20130622 okfn hackathon t2
20130622 okfn hackathon t220130622 okfn hackathon t2
20130622 okfn hackathon t2
 
The Research Data Alliance--Creating the culture and technology for an intern...
The Research Data Alliance--Creating the culture and technology for an intern...The Research Data Alliance--Creating the culture and technology for an intern...
The Research Data Alliance--Creating the culture and technology for an intern...
 
2012.10 - DDI Lifecycle - Moving Forward
2012.10 - DDI Lifecycle - Moving Forward2012.10 - DDI Lifecycle - Moving Forward
2012.10 - DDI Lifecycle - Moving Forward
 
"Plans are worthless, but planning is essential"
"Plans are worthless, but planning is essential""Plans are worthless, but planning is essential"
"Plans are worthless, but planning is essential"
 
Keynote: Mark Parsons - Plans are Useless, But Planning is Essential
Keynote: Mark Parsons - Plans are Useless, But Planning is EssentialKeynote: Mark Parsons - Plans are Useless, But Planning is Essential
Keynote: Mark Parsons - Plans are Useless, But Planning is Essential
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow Tutorial
 
A theory of Metadata enriching & filtering
A theory of  Metadata enriching & filteringA theory of  Metadata enriching & filtering
A theory of Metadata enriching & filtering
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
 
Linked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsLinked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & Museums
 
Linked data 20171106
Linked data 20171106Linked data 20171106
Linked data 20171106
 
DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...
DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...
DC 2012 - Leveraging the DDI Model for Linked Statistical Data in the Social...
 
NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...
NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...
NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...
 
Tutorial Data Management and workflows
Tutorial Data Management and workflowsTutorial Data Management and workflows
Tutorial Data Management and workflows
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4j
 

Kürzlich hochgeladen

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Kürzlich hochgeladen (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Linked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques

  • 1. About 22 years ago.. 1
  • 2. 11 years later… Image from Scientific American Website
  • 3. 3
  • 4. 4
  • 5. 5
  • 6. Tim Berners-Lee 2006 1. Use URIs as names for things 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL) 4. Include links to other URIs. so that they can discover more things. 6
  • 7. In 2006 Web of Data 7
  • 8. Linked Open Data • Massive collection of instance data • Primarily connected via owl:sameAs relationship • Excellent source of information for background knowledge • Labeled as mainstream Semantic Web 6/26/2012 8 8
  • 9. Is it really mainstream Semantic Web? • What is the relationship between the models whose instances are being linked? • How to do querying on LOD without knowing individual datasets? • How to perform schema level reasoning over LOD cloud? 9
  • 10. What can be done? • Relationships are at the heart of Semantics • LOD primarily consists of owl:sameAs links • LOD captures instance level relationships, but lacks class level relationships. o Superclass o Subclass o Equivalence • How to find these relationships? o Perform a matching of the LOD Ontology’s using state of the art ontology matching tools. 10
  • 11. Linked Data Alignment and Enrichment Proposal Defense June 11th, 2012 Prateek Jain Kno.e.sis Center Wright State University, Dayton, OH
  • 12. Agenda • Motivation and Significance of this research • Research questions and proposed solutions • State of the current research and planned work • Questions and comments 14th February 2012 12
  • 13. Linked Open Data • A set of best practices for publishing and connecting structured data on the Web • Practices have been adopted by an increasing number of data providers in the past 5 years • Latest count is at 295 datasets with over 50 Billion triples (approx) 14th February 2012 13
  • 14. Linked Open Data 2007 (May) Linking Open Data cloud diagram, this and subsequent pages, by Richard Cyganiak and AnjaJentzsch. http://lod-cloud.net/ 14
  • 15. Linked Open Data 2007 (Oct) 15
  • 16. Linked Open Data 2009 16
  • 17. Linked Open Data 2011 17
  • 18. Linked Open Data Number of Datasets Number of triples (Sept 2011) 31,634,213,770 2011-09-19 295 with 503,998,829 out-links 2010-09-22 203 2009-07-14 95 2008-09-18 45 2007-10-08 25 2007-05-01 12 From http://www4.wiwiss.fu-berlin.de/lodcloud/state/ 18
  • 19. 6 years of existence how many applications come to your mind? 6/26/2012 19
  • 20. 20
  • 21. Reality… • “We DID NOT use the entire Dbpedia or LOD. The only component of LOD which helped us with Watson was YAGO class hierarchy present in DBpedia. We had strict information gain requirements and other components honestly did not help much“ – Researcher with the Watson Team 6/26/2012 21 21
  • 22. Why?
  • 23. A simple query.. “Identify congress members, who have voted “No” on pro environmental legislation in the past four years, with high-pollution industry in their congressional districts.” But even with LOD we cannot answer this query. 23
  • 24. Example: GovTrack Vote: 2009- vote:hasOption vote:vote 887 Votes:2009-887/+ vote:votedBy Aye rdfs:label vote:hasAction people/P000197 H.R. 3962: Affordable Health Care for America dc:title Act name On Passage: H R dc:title 3962 Affordable Nancy Pelosi Health Care for Bills:h3962 America Act 24
  • 25. Example: GeoNames rdfs:subClassOf? 25
  • 26. Our Approach Use knowledge contributed by users To enhance existing approaches to solve these issues: • Ontology integration • Detection relationships within LOD and across datasets Cloud • Querying multiple datasets 26
  • 27. Circling Back • LOD captures instance level relationships, but lacks class level relationships. o Superclass o Subclass o Equivalence 6/26/2012 28 28
  • 29. • BLOOMS - Bootstrapping-based Linked Open Data Ontology Matching System • Developed specifically for LOD Ontologies • Identifies schema level links between different LOD datasets • Aligns ontologies belonging to diverse domains using diverse data sources 30
  • 30. Existing Approaches A survey of approaches to automatic Ontology matching by Erhard Rahm, Philip A. Bernstein in the VLDB Journal 10: 334–350 (2001) 31
  • 31. LOD Ontology Alignment • Actual Results from these techniques  Nation = Menstruation, Confidence=0.9  • They perform extremely well on established benchmarks, but typically not in the wilds. • LOD Ontology’s are of very different nature • Created by community for community. • Emphasis on number of instances, not number of meaningful relationships. • Require solutions beyond syntactic and structural matching. 32
  • 32. Rabbit out of a hat? • Traditional auxiliary data sources (WordNet, Upper Level Ontologies) have limited coverage. • Community generated is noisy, but is rich in • Content • Structure • Has a “self healing property” • Problems like Ontology Matching have a dimension of context associated with them. 33
  • 33. Wikipedia • The English version alone has more than 2.9 million articles • Continually expanded by approx. 100,000 active volunteer editors • Multiple points of view are mentioned with proper contexts • Article creation/correction is an ongoing activity 34
  • 34. Ontology Matching using Wikipedia • On Wikipedia, categories are used to organize the entire project. • Wikipedia's category system consists of overlapping trees. • Simple rules for categorization 35
  • 35. BLOOMS Approach – Step 1 • Pre-process the input ontology  Remove property restrictions  Remove individuals, properties • Tokenize the class names  Remove underscores, hyphens and other delimiters  Breakdown complex class names • example: SemanticWeb => Semantic Web 36
  • 36. BLOOMS Approach – Step 2 • Identify article in Wikipedia corresponding to the concept. o Each article related to the concept indicates a sense of the usage of the word. • For each article found in the previous step o Identify the Wikipedia category to which it belongs. o For each category found, find its parent categories till level 4. • Once the “BLOOMS tree” for each of the sense of the source concept is created (Ts), utilize it for comparison with the “BLOOMS tree” of the target concepts (Tt). 37
  • 37. BLOOMS Approach – Step 3 • In the tree Ts, remove all nodes for which the parent node which occurs in Tt to create Ts’. o All leaves of Ts are of level 4 or occur in Tt. o The pruned nodes do not contribute any additional new knowledge. • Compute overlap Os between the source and target tree. o Os= n/(k-1), n = |z|, zε Ts’ ΠTt, k= |s|, sε Ts’ • The decision of alignment is made as follows. o For Ts εTc and Ttε Td, we have Ts=Tt, then C=D. o If min{o(Ts,Tt),o(Tt,Ts)} ≥ x, then set C rdfs:subClassOf D if o(Ts,Tt) ≤ o(Tt, Ts), and set D rdfs:subClassOf C if o(Ts, Tt) ≥ o(Tt, Ts). 38
  • 38. Example 39
  • 39. Evaluation Objectives • To examine BLOOMS as a tool for the purpose of LOD ontology matching. • To examine the ability of BLOOMS to serve as a general purpose ontology matching system. 40
  • 40. Circling Back • LOD primarily consists of owl:sameAs links 6/26/2012 41 41
  • 41. Part of Relationship Identification
  • 42. Partonomy Identification • Currently entities across datasets are linked using primarily the owl:sameAs relationship • Relationships such as partonomy (part-of), and causality can allow creating even more intelligent applications such as Watson • Approach PLATO (Part-Of relation finder on Linked Open DAta Tool) 43
  • 43. PLATO Approach • PLATO generates all possible partonomically linked pairs between the entities in the dataset. o Utilize “strongly” associated entities • Identify the type of each entity in the pair using WordNet. o Use Class Names o Gives the lexicographer files for the synsets corresponding to these entities 44
  • 45. PLATO Approach – Step 2 • PLATO generates linguistic patterns for each applicable property based on linguistic cues suggested by Winston. o Cell Wall is made of Cellulose • Tests the lexical patterns for each entity pair in a corpus- driven manner. o Using Web as a corpus • PLATO counts the total number of web pages that contain the pattern o Parse the page and identify the occurance of pattern. 46
  • 46. PLATO Approach – Step 3 • Asserts the partonomy property with strongest supporting evidence o Cell Wall is made of Cellulose, 48 o Cellulose is made of Cell Wall, 10 • PLATO also enriches the schema by generalizing from the instance level assertions. 47
  • 47. Evaluation Objectives • To examine PLATO as a tool for finding different kinds of part-of relation. • To examine PLATO as a tool for finding part-of relation within a dataset • To examine PLATO as a tool for finding part-of relation across dataset 48
  • 48. 49
  • 49. BLOOMS BLOOMS+ PLATO Others 2010 1. 1 paper at ISWC 1. Paper at AAAI SS 2. 1 paper at OM 2. Paper at GEOS workshop 2011 1. 1 paper at ESWC 2. Workshop at ICBO 2012 1. 1 paper at ACM Hypertext Total of 7 publications covering this research 14th February 2012 50
  • 50. Research Plan • Evaluation of BLOOMS on LOD ontologies • Evaluation of PLATO • Automatic classification of datasets • Property alignment on LOD 14th February 2012 51

Hinweis der Redaktion

  1. both as a knowledge source and test bed
  2. “If logical membership of one category implies logical membership of a second, then the first category should be made a subcategory”“Pages are not placed directly into every possible category, only into the most specific one in any branch”“Every Wikipedia article should belong to at least one category.”