Contextual Ontology Alignment of LOD with an Upper Ontology: A Case Study with Proton

Prateek Jain, Peter Z. Yeh, Kunal Verma, Reymonrod Vasquez, Mariana Damova, Pascal Hitzler and Amit P. Sheth

Kno.e.sis, Wright State University, Dayton, OH
Ontotext, Sofia, Bulgaria
Accenture Technology Labs, San Jose, CA
Outline

•   Introduction
•   Background
•   Challenges
•   Existing Approaches
•   BLOOMS+ Approach
•   Conclusion & Future Work
•   References




Web of Data




Linked Open Data

• “The term Linked Data is used to describe a method of exposing,
  sharing, and connecting data via de-referenceable URIs on the
  Web.”- Wikipedia

• Datasets that are part of Linked Open Data include:
   –   Geographical Datasets
   –   Movies
   –   Life Science, Genes, Proteins
   –   General Information (Wikipedia), Customer Reviews,…
   –   US Census, Senator Voting Records,….


• Links are primarily at the instance level, asserting equality between
  entities
   Example: linkedMDB:film/77 owl:sameAs dbpedia:resource/Pulp_Fiction




• As of September 2010, LOD was estimated to contain 25 billion RDF
  triples, interlinked by around 395 million RDF links.
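
Because owl:sameAs asserts that two URIs denote the same entity, a consumer can treat these instance-level links as an equivalence relation and group co-referent resources. The sketch below does this with a small union-find over CURIE strings; it is an illustration, not part of BLOOMS+, and the `example:movie/42` resource is hypothetical, added only to show transitivity.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def same_as_clusters(triples: Iterable[Triple]) -> List[Set[str]]:
    """Group resources linked directly or transitively by owl:sameAs."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":  # only equality links are merged
            parent[find(s)] = find(o)

    clusters: Dict[str, Set[str]] = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


links = [
    ("linkedMDB:film/77", "owl:sameAs", "dbpedia:resource/Pulp_Fiction"),
    ("dbpedia:resource/Pulp_Fiction", "owl:sameAs", "example:movie/42"),
    ("dbpedia:resource/Pulp_Fiction", "rdfs:label", "Pulp Fiction"),
]
print(same_as_clusters(links))
```

Note that nothing here says anything about the classes these resources belong to, which is exactly the gap discussed in the following slides.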


If everything is nice, why am I here?

• Lack of Conceptual Description of Datasets



• Absence of Schema Level Links



• Lack of expressivity



• Difficulties with respect to querying using SPARQL
   – Schema heterogeneity
   – Entity disambiguation
   – Ranking of results


What can be done?
• Relationships are at the heart of semantics.

• LOD captures instance-level relationships, but lacks class-level
  relationships.
   – Superclass
   – Subclass
   – Equivalence


• How can these relationships be found?
   – Match the LOD ontologies using state-of-the-art schema
     matching tools.


• Desirable
   – Given the size of LOD, at minimum produce results that a human can
     curate.



Schema Matching

• Schema matching is the process of identifying whether two objects
  are semantically related.

• Given two schemas, DB1.Student (Name, SSN, Level, Major, Marks)
  and DB2.Grad-Student (Name, ID, Major, Grades), possible matches
  include DB1.Student ≈ DB2.Grad-Student and DB1.SSN = DB2.ID, and a
  possible transformation (mapping) is DB1.Marks to DB2.Grades
  (100-90 → A; 90-80 → B; ...).

• Large enterprises need high-quality data for querying and
  analytics.

• Schema mapping provides a way of resolving discrepancies in
  data.
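
The Student / Grad-Student example can be made concrete as a set of correspondences plus one value transformation. This is a minimal sketch under stated assumptions: the Name and Major correspondences stand in for the slide's "etc.", a mark of exactly 90 is assumed to map to A, and grade bands below 80 are left undefined because the slide elides them. `MATCHES`, `marks_to_grade` and `translate_row` are hypothetical names.

```python
from typing import Dict, Optional

# DB1 element -> DB2 element. Name and Major are assumed (the slide says "etc.").
MATCHES: Dict[str, str] = {
    "DB1.Student": "DB2.Grad-Student",
    "DB1.Name": "DB2.Name",
    "DB1.SSN": "DB2.ID",
    "DB1.Major": "DB2.Major",
    "DB1.Marks": "DB2.Grades",
}


def marks_to_grade(marks: float) -> Optional[str]:
    """Transform a DB1.Marks value into a DB2.Grades letter."""
    if marks >= 90:  # assumption: the 90 boundary belongs to A
        return "A"
    if marks >= 80:
        return "B"
    return None  # lower bands are elided on the slide


def translate_row(row: Dict[str, object]) -> Dict[str, object]:
    """Rewrite one DB1.Student row into DB2.Grad-Student columns."""
    out: Dict[str, object] = {}
    for col, value in row.items():
        target = MATCHES.get(f"DB1.{col}")
        if target is None:
            continue  # e.g. Level has no counterpart in DB2
        new_col = target.split(".", 1)[1]
        out[new_col] = marks_to_grade(float(value)) if col == "Marks" else value
    return out


print(translate_row({"Name": "Ann", "SSN": "123", "Level": "G", "Major": "CS", "Marks": 92}))
```

The split mirrors the slide: the dictionary records which elements match, while the function encodes how values must be transformed when they are moved across.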



Why does it matter?

• Enterprises hold massive amounts of data that refer to the same
  entities, but with different terminology.

• Enterprise information asset awareness.

• Finding relevant and related schemata.

• Project planning.
   – Can project-specific requirements be fulfilled with the data at
     hand?


• Generating an exchange schema.
   – Collaboration with clients that use different schemas.



     Reference: K. Smith, P. Mork, L. Seligman, A. Rosenthal, M. Morse, D. Allen, and M. Li. The Role
     of Schema Matching in Large Enterprises. CIDR, 2009.
Existing Approaches




   Reference: E. Rahm and P. A. Bernstein. A Survey of Approaches to Automatic Schema Matching. The VLDB
   Journal 10: 334–350 (2001).
Our Approach


Use knowledge contributed by users           Structured knowledge contributed by
                                             users

                                To improve




Rabbit out of a hat?


• Traditional auxiliary data sources (e.g., WordNet, upper-level
  ontologies) have limited coverage and are insufficient for LOD
  datasets.
    •   LOD datasets span diverse domains.


• Community-generated data, although noisy,
    •   is rich in content
    •   is rich in structure
    •   has a “self-healing” property


•   Problems like schema matching have a dimension of context
    associated with them. Because community-generated data is
    created by a diverse set of people, it captures diverse
    contexts.

Wikipedia

• The English version alone contains more than 2.9 million
  articles.

• It is continually expanded by approximately 100,000 active
  volunteer editors worldwide.

• It allows multiple points of view to be presented in their proper
  contexts.

• Article creation and correction is an ongoing activity with no
  downtime.




Schema Matching on LOD using Wikipedia Categorization
• On Wikipedia, categories are used to organize the entire project.

• Wikipedia's category system consists of overlapping trees.

• Simple rules for categorization
   – “If logical membership of one category implies logical
     membership of a second, then the first category should be
     made a subcategory”
   – “Pages are not placed directly into every possible category,
     only into the most specific one in any branch”
   – “Every Wikipedia article should belong to at least one
     category.”
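
The second rule can be checked mechanically: if a page is placed in two categories and one is an ancestor of the other, the more general one is redundant. A minimal sketch, assuming the category graph is available as a hypothetical in-memory mapping from each category to its parent categories (this is not a Wikipedia API); because the category system consists of overlapping trees, the traversal tolerates shared ancestors and cycles.

```python
from typing import Dict, List, Set


def ancestors(cat: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All categories reachable from cat by following parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(cat, []))
    while stack:
        c = stack.pop()
        if c not in seen:  # the seen-set prevents looping on cycles
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def redundant_categories(page_cats: List[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Categories of a page that are ancestors of another of its categories."""
    assigned = set(page_cats)
    redundant: Set[str] = set()
    for cat in page_cats:
        redundant |= (ancestors(cat, parents) - {cat}) & assigned
    return redundant


# Hypothetical category hierarchy, for illustration only.
hierarchy = {"Semantic Web": ["World Wide Web"], "World Wide Web": ["Internet"]}
print(redundant_categories(["Semantic Web", "Internet"], hierarchy))
```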




BLOOMS+ Approach – Step 1


• Pre-process the input schema
   •   Remove property restrictions
   •   Remove individuals and properties


• Tokenize the class names
   •   Remove underscores, hyphens and other delimiters
   •   Break down compound class names
        – example: SemanticWeb => Semantic Web
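
Step 1 can be sketched with two regular expressions: one splits on delimiters, the other on camel-case boundaries (including acronym boundaries such as "HTMLParser"). The schema is assumed here to be a hypothetical list of (kind, name) pairs; the exact delimiter set and splitting rules BLOOMS+ uses are not given on the slide, so treat these as assumptions. The zero-width split requires Python 3.7 or later.

```python
import re
from typing import Iterable, List, Tuple

# Assumed delimiter set: underscore, hyphen, dot, whitespace.
_DELIMITERS = re.compile(r"[_\-.\s]+")
# Zero-width boundaries: "aB" (lower/digit then upper) and "ABc" (acronym then word).
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def tokenize_class_name(name: str) -> str:
    """Turn a class name such as 'SemanticWeb' into 'Semantic Web'."""
    parts: List[str] = []
    for chunk in _DELIMITERS.split(name):
        parts.extend(p for p in _CAMEL_BOUNDARY.split(chunk) if p)
    return " ".join(parts)


def preprocess(entities: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep only classes (dropping individuals, properties, restrictions) and tokenize them."""
    return [tokenize_class_name(name) for kind, name in entities if kind == "class"]


schema = [("class", "SemanticWeb"), ("property", "hasAuthor"), ("individual", "Pulp_Fiction")]
print(preprocess(schema))
```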




BLOOMS+ Approach – Step 2

• For each concept name processed in the previous step
   – Identify the Wikipedia articles corresponding to the concept.
   – Each article related to the concept indicates one sense in which the
     word is used.


• For each article found in the previous step
   – Identify the Wikipedia categories to which it belongs.
   – For each category found, find its parent categories up to level 4.


• Once the “BLOOMS tree” Ti has been created for each sense of the
  source concept, compare it with the “BLOOMS trees” Tj of the
  target concepts.
   – BLOOMS trees are created for the individual senses of each concept.
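
Building one BLOOMS tree per sense amounts to a breadth-first walk up the category graph with a depth cut-off. A minimal sketch, assuming the article (sense) is the root at depth 0, its direct categories are at depth 1, and the article-to-category and category-to-parent mappings are hypothetical pre-fetched dictionaries rather than live Wikipedia calls; the result maps each node to its depth, which the overlap computation needs.

```python
from collections import deque
from typing import Deque, Dict, List


def build_blooms_tree(
    article: str,
    article_categories: Dict[str, List[str]],
    parent_categories: Dict[str, List[str]],
    max_depth: int = 4,
) -> Dict[str, int]:
    """Return node -> depth for one sense, following parent categories up to max_depth."""
    tree: Dict[str, int] = {article: 0}
    queue: Deque[str] = deque()
    for cat in article_categories.get(article, []):
        if cat not in tree:
            tree[cat] = 1
            queue.append(cat)
    while queue:  # BFS, so each node keeps its shallowest depth
        node = queue.popleft()
        depth = tree[node]
        if depth >= max_depth:
            continue  # do not expand beyond the level limit
        for parent in parent_categories.get(node, []):
            if parent not in tree:
                tree[parent] = depth + 1
                queue.append(parent)
    return tree
```

Called once per candidate article, this yields the set of trees for a concept, from which pairs (Ti, Tj) are formed.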




BLOOMS+ Approach – Step 3

• In tree Ti, find n, the number of nodes that also occur in Tj.

• Compute the overlap Os between the source and target trees.




• Exponentiating the inverse depth of each common node gives less
  weight to nodes that appear lower in the hierarchy (generic
  nodes).

• Taking the log of the tree size avoids a bias against large trees.
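
The equation on the original slide did not survive extraction. The sketch below implements one overlap measure with the two properties the slide describes; its exact form, O(Ti, Tj) = log Σ_{n common} (1 + e^{1/(d(n)+1)}) / log(2|Ti|), with d(n) the depth of n in Ti, is an assumption for illustration and not a reproduction of the slide's formula. The +1 shift is also an assumption, keeping the inverse defined at the root (depth 0).

```python
import math
from typing import Dict


def overlap(t_i: Dict[str, int], t_j: Dict[str, int]) -> float:
    """Asymmetric overlap of tree t_i with tree t_j (each maps node -> depth)."""
    common = t_i.keys() & t_j.keys()
    if not common:
        return 0.0
    # Deeper (more generic) nodes in t_i get a smaller exponential weight.
    weight_sum = sum(1.0 + math.exp(1.0 / (t_i[n] + 1)) for n in common)
    # Dividing by the log of the tree size, not the size itself, softens
    # the bias against large trees.
    return math.log(weight_sum) / math.log(2 * len(t_i))
```

Because the weights come from depths in Ti and the normalizer is |Ti|, O(Ti, Tj) and O(Tj, Ti) generally differ; that asymmetry is what the alignment decision later compares.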


Contextual Similarity

• BLOOMS+ computes the contextual similarity between a source
  class C and a target class D to further determine whether they
  should be aligned.

• Information about the superclasses of C and D is a good source of
  contextual information.

• If the superclasses agree, the alignment is likely good; otherwise it
  should be penalized.

• For example, if Jaguar has superclasses such as Car and Vehicle,
  and Cat has superclasses such as Feline and Mammal, then their
  alignment should be penalized because its contextual similarity
  is low.



BLOOMS+ Approach – Step 4


• BLOOMS+ retrieves all superclasses of C and D up to level 2
  (configurable). The resulting sets of superclasses are N(C) and N(D).

• For each BLOOMS+ tree pair (Ti, Tj) between C and D, BLOOMS+
  determines how many superclasses in N(C) and N(D) are supported,
  as follows.

• A superclass c ∈ N(C) is supported by Ti if either of the following
  conditions is satisfied:
   – The name of c matches a node in Ti.
   – The Wikipedia article (or article category) corresponding to c,
     found via a Wikipedia search web service call using the name
     of c, matches a node in Ti.
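
The two support conditions can be sketched as follows. The Wikipedia search web service is replaced by a hypothetical `wiki_lookup` callable that returns the matching article or category name (or None), and name matching is assumed to be case-insensitive exact string comparison; BLOOMS+ may normalize names differently.

```python
from typing import Callable, Iterable, Optional, Set


def supported_superclasses(
    superclasses: Iterable[str],
    tree_nodes: Iterable[str],
    wiki_lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> Set[str]:
    """Return the superclasses that are supported by a BLOOMS+ tree."""
    nodes = {n.casefold() for n in tree_nodes}
    supported: Set[str] = set()
    for c in superclasses:
        if c.casefold() in nodes:  # condition 1: the name of c matches a node
            supported.add(c)
        elif wiki_lookup is not None:  # condition 2: c's Wikipedia article/category matches a node
            hit = wiki_lookup(c)
            if hit is not None and hit.casefold() in nodes:
                supported.add(c)
    return supported


# Hypothetical tree and lookup table for the Jaguar example.
tree = ["Jaguar Cars", "Car", "Vehicles"]
lookup = {"Vehicle": "Vehicles"}.get
print(supported_superclasses(["Car", "Vehicle"], tree, lookup))
print(supported_superclasses(["Feline", "Mammal"], tree, lookup))
```

Running the same function on N(D) against Tj gives the counts for the other side of the pair.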


BLOOMS+ Approach – Step 5

• BLOOMS+ computes the overall contextual similarity between C
  and D with respect to Ti and Tj using the harmonic mean, which
  is instantiated as:




• We chose the harmonic mean to emphasize superclass
  neighborhoods that are not well supported (and hence should
  significantly lower the overall contextual similarity).
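
The instantiated formula is also missing from the slide. Assuming each class's support score is the fraction of its superclasses that the tree pair supports (an assumption), the harmonic mean of the two scores can be sketched as below; treating a class with no superclasses as having zero contextual support is likewise an assumption.

```python
def contextual_similarity(supported_c: int, total_c: int, supported_d: int, total_d: int) -> float:
    """Harmonic mean of the superclass support ratios of C and D."""
    if total_c == 0 or total_d == 0:
        return 0.0  # assumption: no superclass context means no contextual support
    s_c = supported_c / total_c
    s_d = supported_d / total_d
    if s_c + s_d == 0:
        return 0.0
    return 2 * s_c * s_d / (s_c + s_d)
```

With ratios 1.0 and 0.5 this gives about 0.67, whereas the arithmetic mean would give 0.75: the poorly supported side pulls the score down harder, which is the behavior the slide motivates.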



BLOOMS+ Approach – Step 6

• BLOOMS+ computes the overall similarity between classes C
  and D with respect to BLOOMS+ trees Ti and Tj by taking the weighted
  average of the class similarity and the contextual similarity.




• BLOOMS+ defaults α and β to 1 to give both equal importance.



• BLOOMS+ then selects the tree pair (Ti, Tj) ∈ FC × FD with the
  highest overall similarity score and, if this score is greater than
  the alignment threshold HA, aligns C and D.
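
The weighted average and the tree-pair selection can be sketched as follows, assuming the average is normalized by α + β (so the default α = β = 1 yields the plain mean) and that overall scores for every pair in FC × FD have already been computed and keyed by hypothetical (i, j) tree indices. The function names are illustrative only.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[int, int]


def overall_similarity(class_sim: float, context_sim: float,
                       alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted average of class similarity and contextual similarity."""
    if alpha + beta == 0:
        raise ValueError("alpha + beta must be non-zero")
    return (alpha * class_sim + beta * context_sim) / (alpha + beta)


def select_best_pair(scores: Dict[Pair, float], threshold: float) -> Optional[Pair]:
    """Return the highest-scoring tree pair if its score exceeds the threshold HA."""
    if not scores:
        return None
    pair, score = max(scores.items(), key=lambda kv: kv[1])
    return pair if score > threshold else None
```

Only the single best pair is considered, so one strongly matching sense can justify an alignment even when the other senses of C and D are unrelated.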


Alignment decision

• If O(Ti, Tj) = O(Tj, Ti), then BLOOMS+ sets
    – C owl:equivalentClass D.



• If O(Ti, Tj) < O(Tj, Ti), then BLOOMS+ sets
    – C rdfs:subClassOf D.



• Otherwise, BLOOMS+ sets D rdfs:subClassOf C.
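
The three rules transcribe directly into a function over the two asymmetric overlap scores. The tolerance used for the equality test is an assumption, since exact equality between floating-point scores is fragile; the returned triples use the class placeholders C and D from the slide.

```python
from typing import Tuple


def alignment_decision(o_ij: float, o_ji: float, tol: float = 1e-9) -> Tuple[str, str, str]:
    """Map O(Ti, Tj) and O(Tj, Ti) to the asserted alignment triple."""
    if abs(o_ij - o_ji) <= tol:  # assumed tolerance for "equal"
        return ("C", "owl:equivalentClass", "D")
    if o_ij < o_ji:
        return ("C", "rdfs:subClassOf", "D")
    return ("D", "rdfs:subClassOf", "C")
```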




Results BLOOMS+




Conclusion

• We have presented a system called BLOOMS+ for performing
  ontology alignment using contextual information.

• BLOOMS+ has been evaluated on aligning three different LOD
  ontologies to PROTON, against alignments created manually by
  human experts for a real-world application called FactForge.

• To the best of our knowledge, BLOOMS+ is the only system
  that uses both the contextual information present in ontologies and
  the Wikipedia category hierarchy for ontology matching.

• BLOOMS+ significantly outperforms state-of-the-art solutions for
  the task of ontology alignment.




Future Work

• Extend BLOOMS to utilize contextual information available in
  community-generated data.

• A new weighting mechanism for identifying matches between the
  concepts in the datasets.

• Develop a polling mechanism for identifying the best source to
  assist in the process of schema alignment.

• Allow seamless querying across datasets by utilizing the
  generated alignments (preliminary work: LOQUS).




References

•   Prateek Jain, Peter Z. Yeh, Kunal Verma, Reymonrod Vasquez, Mariana
    Damova, Pascal Hitzler and Amit P. Sheth. “Contextual Ontology Alignment
    of LOD with an Upper Ontology: A Case Study with Proton”. In: Proceedings
    of the 8th Extended Semantic Web Conference (ESWC 2011), volume 6643 of
    Lecture Notes in Computer Science. Springer, Berlin/Heidelberg, 2011.

•   Prateek Jain, Pascal Hitzler, Amit P. Sheth, Kunal Verma and Peter Z. Yeh.
    “Ontology Alignment for Linked Open Data”. In: Proceedings of the 9th
    International Semantic Web Conference (ISWC 2010), Shanghai, China,
    November 7–11, 2010, pp. 402–417.

•   Prateek Jain, Pascal Hitzler, Peter Z. Yeh, Kunal Verma and Amit P. Sheth.
    “Linked Data Is Merely More Data”. In: Dan Brickley, Vinay K. Chaudhri, Harry
    Halpin and Deborah McGuinness (eds.), Linked Data Meets Artificial Intelligence.
    Technical Report SS-10-07, AAAI Press, Menlo Park, California, 2010,
    pp. 82–86. ISBN 978-1-57735-461-1.
Thank You!


Questions?

Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 

ESWC 2011 BLOOMS+

  • 1. Contextual Ontology Alignment of LOD with an Upper Ontology: A Case Study with Proton • Prateek Jain, Peter Z. Yeh, Kunal Verma, Reymonrod Vasquez, Mariana Damova, Pascal Hitzler and Amit P. Sheth • Kno.e.sis, Wright State University, Dayton, OH; Ontotext, Sofia, Bulgaria; Accenture Technology Labs, San Jose, CA
  • 2. Outline • Introduction • Background • Challenges • Existing Approaches • BLOOMS+ Approach • Conclusion & Future Work • References
  • 3. Outline • Introduction • Background • Challenges • Existing Approaches • BLOOMS+ Approach • Conclusion & Future Work • References
  • 5. Linked Open Data • “The term Linked Data is used to describe a method of exposing, sharing, and connecting data via de-referenceable URIs on the Web.” (Wikipedia) • Datasets that are part of Linked Open Data include – Geographical datasets – Movies – Life science, genes, proteins – General information (Wikipedia), customer reviews, … – US Census, Senator voting records, … • Links are primarily at the instance level, asserting equality between entities. Example: linkedMDB:film/77 owl:sameAs dbpedia:resource/Pulp_Fiction • As of September 2010, LOD was estimated to contain 25 billion RDF triples, interlinked by around 395 million RDF links.
  • 6. Outline • Introduction • Background • Challenges • Existing Approaches • BLOOMS+ Approach • Conclusion & Future Work • References
  • 7. If everything is nice, why am I here? • Lack of conceptual description of datasets • Absence of schema-level links • Lack of expressivity • Difficulties querying with SPARQL – Schema heterogeneity – Entity disambiguation – Ranking of results
  • 8. What can be done? • Relationships are at the heart of semantics. • LOD captures instance-level relationships but lacks class-level relationships – Superclass – Subclass – Equivalence • How can these relationships be found? – Match the LOD ontologies using state-of-the-art schema matching tools. • Desirable: given the size of LOD, at least produce results that a human can curate.
  • 9. Schema Matching • Schema matching is the process of identifying that two objects are semantically related. • Given two schemas DB1.Student (Name, SSN, Level, Major, Marks) and DB2.Grad-Student (Name, ID, Major, Grades), possible matches are DB1.Student ≈ DB2.Grad-Student, DB1.SSN = DB2.ID, etc., and a possible transformation (mapping) is DB1.Marks to DB2.Grades (100–90: A; 90–80: B; …). • Large enterprises need high-quality data for querying and analytics. • Schema mapping provides a way of resolving discrepancies in data.
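The DB1/DB2 example above can be made concrete with a small Python sketch. It encodes the attribute matches named on the slide and the Marks-to-Grades transformation for the two bands the slide gives. The function names, the record format, and the choice to map the shared boundary 90 to A are illustrative assumptions. The elided lower bands are deliberately left unmapped.

```python
from typing import Any, Dict, Optional

# Attribute correspondences from the slide's example (DB1.Student -> DB2.Grad-Student).
# DB1.Level has no counterpart in DB2 and is therefore not mapped.
ATTRIBUTE_MATCHES: Dict[str, str] = {
    "Name": "Name",
    "SSN": "ID",
    "Major": "Major",
    "Marks": "Grades",
}


def marks_to_grade(marks: float) -> Optional[str]:
    """Transform DB1.Marks into a DB2.Grades letter (hypothetical helper).

    Only the two bands given on the slide are encoded (100-90 -> A, 90-80 -> B).
    Assumption: the shared boundary 90 maps to A. Lower bands are elided on the
    slide, so they return None instead of being guessed.
    """
    if 90 <= marks <= 100:
        return "A"
    if 80 <= marks < 90:
        return "B"
    return None


def translate_student(row: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite one DB1.Student record into the DB2.Grad-Student schema."""
    result: Dict[str, Any] = {}
    for source_attr, target_attr in ATTRIBUTE_MATCHES.items():
        value = row[source_attr]
        # Marks need a value transformation; the other matches are copied as-is.
        result[target_attr] = marks_to_grade(value) if source_attr == "Marks" else value
    return result


if __name__ == "__main__":
    print(translate_student({"Name": "Ann", "SSN": "123-45-6789", "Level": "MS",
                             "Major": "CS", "Marks": 92}))
```

The sketch keeps matching (which attributes correspond) separate from mapping (how values are converted), as the slide does.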
  • 10. Why does it matter? • Enterprises hold massive amounts of data that refer to the same entities using different terminology. • Enterprise information asset awareness. • Finding relevant and related schemata. • Project planning – Can project-specific requirements be fulfilled with the data at hand? • Generating an exchange schema – Collaboration with clients that use different schemas. Reference: K. Smith, P. Mork, L. Seligman, A. Rosenthal, M. Morse, D. Allen, and M. Li. The Role of Schema Matching in Large Enterprises. CIDR, 2009.
  • 11. Outline • Introduction • Background • Challenges • Existing Approaches • BLOOMS+ Approach • Conclusion & Future Work • References
  • 12. Existing Approaches • Source: Erhard Rahm and Philip A. Bernstein, “A survey of approaches to automatic schema matching”, The VLDB Journal 10: 334–350 (2001).
  • 13. Outline • Introduction • Background • Challenges • Existing Approaches • BLOOMS+ Approach • Conclusion & Future Work • References
  • 14. Our Approach • Use structured knowledge contributed by users to improve schema matching.
  • 15. Rabbit out of a hat? • Traditional auxiliary data sources (such as WordNet and upper-level ontologies) have limited coverage and are insufficient for LOD datasets. • LOD datasets span diverse domains. • Community-generated data, although noisy, is rich in – Content – Structure • It has a “self-healing property”. • Problems like schema matching have a dimension of context associated with them. Because community-generated data is created by a diverse set of people, it captures diverse contexts.
  • 16. Wikipedia • The English version alone contains more than 2.9 million articles. • It is continually expanded by approximately 100,000 active volunteer editors worldwide. • It allows multiple points of view to be presented with their proper contexts. • Article creation and correction is an ongoing activity with no downtime.
  • 17. Schema Matching on LOD using Wikipedia Categorization • On Wikipedia, categories are used to organize the entire project. • Wikipedia's category system consists of overlapping trees. • Simple rules for categorization: – “If logical membership of one category implies logical membership of a second, then the first category should be made a subcategory” – “Pages are not placed directly into every possible category, only into the most specific one in any branch” – “Every Wikipedia article should belong to at least one category.”
  • 18. BLOOMS+ Approach – Step 1 • Pre-process the input schema – Remove property restrictions – Remove individuals and properties • Tokenize the class names – Remove underscores, hyphens and other delimiters – Break down complex class names, e.g., SemanticWeb => Semantic Web
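Here is a minimal sketch of the tokenization part of Step 1. It assumes class names arrive as plain strings. Removing restrictions, individuals, and properties from the schema is not shown. The helper name `tokenize_class_name` is hypothetical, and the extra acronym rule (HTMLParser => HTML Parser) goes beyond what the slide states.

```python
import re


def tokenize_class_name(name: str) -> str:
    """Turn an ontology class name into space-separated words (hypothetical helper)."""
    # Replace underscores, hyphens and any other non-alphanumeric delimiters with spaces.
    text = re.sub(r"[^A-Za-z0-9]+", " ", name)
    # Split lower-to-upper camel-case boundaries: "SemanticWeb" -> "Semantic Web".
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # Assumed extra rule: split an acronym from a following word: "HTMLParser" -> "HTML Parser".
    text = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", text)
    # Collapse repeated spaces and trim the ends.
    return " ".join(text.split())


if __name__ == "__main__":
    for raw in ["SemanticWeb", "Semantic_Web-Service", "HTMLParser"]:
        print(raw, "=>", tokenize_class_name(raw))
```

The tokenized names are the strings that would then be looked up in Wikipedia in Step 2.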
  • 19. BLOOMS+ Approach – Step 2 • For each concept name processed in the previous step – Identify the Wikipedia articles corresponding to the concept. – Each article related to the concept indicates one sense in which the word is used. • For each article found in the previous step – Identify the Wikipedia categories to which it belongs. – For each category found, find its parent categories up to level 4. • Once the “BLOOMS tree” Ti has been created for each sense of the source concept, compare it with the “BLOOMS trees” Tj of the target concepts. – BLOOMS trees are created for individual senses of the concepts.
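The tree construction in Step 2 can be sketched as a breadth-first walk up the category graph. To keep the example offline, Wikipedia is replaced by two hypothetical dictionaries: one maps articles to categories and one maps categories to parent categories. A real system would query Wikipedia instead. Depth 0 is the article (one sense), depth 1 its categories, and expansion stops at level 4. This depth convention is an assumption.

```python
from collections import deque
from typing import Dict, List


def build_blooms_tree(article: str,
                      article_categories: Dict[str, List[str]],
                      parent_categories: Dict[str, List[str]],
                      max_level: int = 4) -> Dict[str, int]:
    """Return {node: depth} for the BLOOMS tree of one sense (Wikipedia article).

    Depth 0 is the article, depth 1 its categories, depth 2 their parent
    categories, and so on; nodes deeper than max_level are not added.
    """
    tree: Dict[str, int] = {article: 0}
    queue = deque()
    for category in article_categories.get(article, []):
        if category not in tree:
            tree[category] = 1
            queue.append(category)
    while queue:
        node = queue.popleft()
        depth = tree[node]
        if depth >= max_level:
            continue  # do not expand beyond the maximum level
        for parent in parent_categories.get(node, []):
            if parent not in tree:  # BFS order keeps the shallowest depth
                tree[parent] = depth + 1
                queue.append(parent)
    return tree


if __name__ == "__main__":
    # Hypothetical toy data standing in for Wikipedia's category graph.
    cats = {"Jaguar_Cars": ["Car_manufacturers"]}
    parents = {"Car_manufacturers": ["Vehicle_manufacturers"],
               "Vehicle_manufacturers": ["Manufacturing_companies"],
               "Manufacturing_companies": ["Companies"],
               "Companies": ["Organizations"]}
    print(build_blooms_tree("Jaguar_Cars", cats, parents))
```

Because the walk is breadth-first, each node keeps the shallowest depth at which it is reached. This matters for depth-weighted overlap measures.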
  • 20. BLOOMS+ Approach – Step 3 • In tree Ti, find n, the number of nodes that also occur in Tj. • Compute the overlap Os between the source and target trees. • Exponentiating the inverse depth of each common node gives less weight to nodes that appear lower in the hierarchy (more generic nodes). • Taking the log of the tree size avoids a bias against large trees.
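The overlap formula itself is not reproduced in this transcript. The sketch below is one assumed instantiation that follows the slide's description. Each common node contributes 1 + e^(β/(d+1)), where d is its 0-based depth in Ti; the +1 avoids dividing by zero at the root. The log of the summed weights is then normalized by log(2|Ti|). Treat the exact form and the parameter `beta` as assumptions, not as the paper's definition.

```python
import math
from typing import Dict


def overlap(tree_i: Dict[str, int], tree_j: Dict[str, int], beta: float = 1.0) -> float:
    """Assumed overlap O(Ti, Tj) of BLOOMS tree Ti ({node: depth}) with tree Tj.

    Each node of Ti that also occurs in Tj contributes 1 + exp(beta / (d + 1)),
    so deeper (more generic) nodes weigh less. The log of the summed weights is
    divided by log(2 * |Ti|) to avoid a bias against large trees.
    """
    common = [node for node in tree_i if node in tree_j]
    if not common:
        return 0.0  # no shared nodes, no overlap
    weight = sum(1.0 + math.exp(beta / (tree_i[node] + 1)) for node in common)
    return math.log(weight) / math.log(2 * len(tree_i))


if __name__ == "__main__":
    t_i = {"A": 0, "B": 1, "C": 2}
    t_j = {"X": 0, "B": 1, "C": 1}
    print(overlap(t_i, t_j), overlap(t_j, t_i))
```

Because the score is normalized by the size of Ti only, it is asymmetric. That asymmetry is what allows O(Ti, Tj) and O(Tj, Ti) to be compared when deciding the direction of an alignment.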
  • 21. Contextual Similarity • BLOOMS+ computes the contextual similarity between a source class C and a target class D to further determine whether they should be aligned. • Information about the super classes of C and D is a good source of contextual information. • If the super classes agree, the alignment is good; otherwise it should be penalized. • For example, if Jaguar has super classes such as Car and Vehicle, and Cat has super classes such as Feline and Mammal, then aligning Jaguar with Cat should be penalized because its contextual similarity is low.
  • 22. BLOOMS+ Approach – Step 4 • BLOOMS+ retrieves all super classes of C and D up to level 2 (configurable). These sets of super classes are N(C) and N(D). • For each BLOOMS+ tree pair (Ti, Tj) between C and D, BLOOMS+ determines the number of super classes in N(C) and N(D) that are supported by Ti and Tj, respectively, as follows. • A super class c ∈ N(C) is supported by Ti if either of the following conditions is satisfied: – The name of c matches a node in Ti. – The Wikipedia article (or article category) corresponding to c, found via a Wikipedia search web service call using the name of c, matches a node in Ti.
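The support test of Step 4 can be sketched as follows, reading both conditions as matching nodes of Ti. The hypothetical `lookup` callable stands in for the Wikipedia search web service. The case-insensitive and underscore-insensitive name comparison is an assumption. The toy values in the usage block loosely echo the Jaguar/Cat example of the previous slide.

```python
from typing import Callable, Dict, Iterable, List, Set


def _norm(label: str) -> str:
    """Case- and delimiter-insensitive form used for name matching (an assumption)."""
    return " ".join(label.replace("_", " ").lower().split())


def supported_superclasses(superclasses: Iterable[str],
                           tree: Dict[str, int],
                           lookup: Callable[[str], List[str]]) -> Set[str]:
    """Return the superclasses (e.g. N(C)) supported by a BLOOMS+ tree (e.g. Ti).

    A superclass is supported if its name matches a node of the tree, or if one
    of the Wikipedia articles/categories returned by `lookup` for that name does.
    """
    nodes = {_norm(node) for node in tree}
    supported = set()
    for c in superclasses:
        if _norm(c) in nodes or any(_norm(hit) in nodes for hit in lookup(c)):
            supported.add(c)
    return supported


if __name__ == "__main__":
    # Toy tree for the animal sense of "Jaguar" and a toy lookup table.
    cat_tree = {"Jaguar": 0, "Big cats": 1, "Felines": 2, "Mammals": 3}
    toy_lookup = {"Feline": ["Felines"]}
    print(supported_superclasses(["Mammals", "Feline", "Car"], cat_tree,
                                 lambda name: toy_lookup.get(name, [])))
```

The same function applied to N(D) and Tj gives the count for the target class.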
  • 23. BLOOMS+ Approach – Step 5 • BLOOMS+ computes the overall contextual similarity between C and D with respect to Ti and Tj using the harmonic mean. • We chose the harmonic mean to emphasize super class neighborhoods that are not well supported (and hence should significantly lower the overall contextual similarity).
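This sketch assumes the harmonic mean is taken over two support ratios: the fraction of N(C) supported by Ti and the fraction of N(D) supported by Tj. Under that assumption, Step 5 reduces to a few lines. The exact formula from the slide is not preserved here, so this form and the zero handling are assumptions.

```python
def contextual_similarity(supported_c: int, total_c: int,
                          supported_d: int, total_d: int) -> float:
    """Assumed contextual similarity: harmonic mean of the two support ratios.

    supported_c / total_c is the fraction of N(C) supported by Ti, and
    supported_d / total_d the fraction of N(D) supported by Tj.
    """
    if total_c == 0 or total_d == 0:
        return 0.0  # assumption: no super classes means no contextual evidence
    r_c = supported_c / total_c
    r_d = supported_d / total_d
    if r_c + r_d == 0:
        return 0.0  # neither side is supported
    # The harmonic mean is pulled toward the smaller ratio.
    return 2 * r_c * r_d / (r_c + r_d)


if __name__ == "__main__":
    print(contextual_similarity(2, 2, 1, 4))
```

For example, ratios of 1.0 and 0.25 give 0.4, whereas their arithmetic mean would be 0.625. The weakly supported side dominates, as the slide intends.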
  • 24. BLOOMS+ Approach – Step 6 • BLOOMS+ computes the overall similarity between classes C and D w.r.t. BLOOMS+ trees Ti and Tj by taking the weighted average of the class similarity and the contextual similarity. • BLOOMS+ defaults the weights alpha and beta to 1 to give both equal importance. • BLOOMS+ then selects the tree pair (Ti, Tj) ∈ FC × FD with the highest overall similarity score and aligns C and D only if this score is greater than the alignment threshold HA.
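Step 6 can be sketched as a search over all tree pairs. Here the weighted average is assumed to be (alpha·class + beta·context)/(alpha + beta), which with the default alpha = beta = 1 is the plain mean. The two similarity functions are passed in as parameters. The threshold HA is left to the caller because the transcript gives no value. All names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence, Tuple

Tree = Dict[str, int]  # node -> depth, one BLOOMS+ tree per sense


def overall_similarity(class_sim: float, context_sim: float,
                       alpha: float = 1.0, beta: float = 1.0) -> float:
    """Assumed weighted average of class similarity and contextual similarity."""
    return (alpha * class_sim + beta * context_sim) / (alpha + beta)


def best_tree_pair(forest_c: Sequence[Tree], forest_d: Sequence[Tree],
                   class_sim: Callable[[Tree, Tree], float],
                   context_sim: Callable[[Tree, Tree], float],
                   threshold: float) -> Optional[Tuple[Tree, Tree, float]]:
    """Pick the (Ti, Tj) in FC x FD with the highest overall similarity.

    Returns None when even the best score does not exceed the threshold HA.
    """
    best = None
    for t_i, t_j in product(forest_c, forest_d):
        score = overall_similarity(class_sim(t_i, t_j), context_sim(t_i, t_j))
        if best is None or score > best[2]:
            best = (t_i, t_j, score)
    if best is None or best[2] <= threshold:
        return None
    return best


if __name__ == "__main__":
    fc = [{"a": 0}, {"b": 0}]
    fd = [{"x": 0}]
    print(best_tree_pair(fc, fd,
                         lambda ti, tj: 0.8 if "a" in ti else 0.2,
                         lambda ti, tj: 0.6 if "a" in ti else 0.9,
                         threshold=0.5))
```

Passing the similarity functions in keeps the selection logic independent of how the overlap and contextual scores are defined.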
  • 25. Alignment decision • If O(Ti, Tj) = O(Tj, Ti), then BLOOMS+ sets C owl:equivalentClass D. • If O(Ti, Tj) < O(Tj, Ti), then BLOOMS+ sets C rdfs:subClassOf D. • Otherwise, BLOOMS+ sets D rdfs:subClassOf C.
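The alignment decision maps directly onto a small function. Comparing floating-point overlaps for exact equality is brittle, so this sketch adds a tolerance `tol`; this tolerance is an assumption not stated on the slide. The returned strings simply spell out the triple that would be asserted.

```python
import math


def alignment_relation(c: str, d: str, o_ij: float, o_ji: float,
                       tol: float = 1e-9) -> str:
    """Return the alignment triple chosen from O(Ti, Tj) and O(Tj, Ti).

    Equal overlaps -> owl:equivalentClass; O(Ti, Tj) < O(Tj, Ti) -> C is a
    subclass of D; otherwise D is a subclass of C.
    """
    # Added floating-point tolerance for the equality test (not on the slide).
    if math.isclose(o_ij, o_ji, abs_tol=tol):
        return f"{c} owl:equivalentClass {d}"
    if o_ij < o_ji:
        return f"{c} rdfs:subClassOf {d}"
    return f"{d} rdfs:subClassOf {c}"


if __name__ == "__main__":
    print(alignment_relation("C", "D", 0.3, 0.6))
```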
  • 27. Outline • Introduction • Background • Challenges • Existing Approaches • BLOOMS+ Approach • Conclusion & Future Work • References
  • 28. Conclusion • We have presented a system called BLOOMS+ for performing ontology alignment using contextual information. • BLOOMS+ has been evaluated on aligning three different LOD ontologies to PROTON, against alignments created manually by human experts for a real-world application called FactForge. • To the best of our knowledge, BLOOMS+ is the only system that utilizes the contextual information present in the ontology and in the Wikipedia category hierarchy for ontology matching. • BLOOMS+ significantly outperforms state-of-the-art solutions for the task of ontology alignment.
  • 29. Future Work • Extended BLOOMS to utilize contextual information available in community-generated data. • New weighting mechanism for identifying matches between the concepts in the datasets. • Develop a polling mechanism for identifying the best source to assist in the process of schema alignment. • Allow seamless querying across datasets by utilizing the generated alignments (preliminary work: LOQUS).
  • 30. References • Prateek Jain, Peter Z. Yeh, Kunal Verma, Reymonrod Vasquez, Mariana Damova, Pascal Hitzler and Amit P. Sheth, “Contextual Ontology Alignment of LOD with an Upper Ontology: A Case Study with Proton”. In: Proceedings of the 8th Extended Semantic Web Conference (ESWC 2011), volume 6643 of Lecture Notes in Computer Science. Springer, Berlin/Heidelberg, 2011. • Prateek Jain, Pascal Hitzler, Amit P. Sheth, Kunal Verma and Peter Z. Yeh, “Ontology Alignment for Linked Open Data”. In: Proceedings of the 9th International Semantic Web Conference (ISWC 2010), Shanghai, China, November 7–11, 2010, pp. 402–417. • Prateek Jain, Pascal Hitzler, Peter Z. Yeh, Kunal Verma and Amit P. Sheth, “Linked Data Is Merely More Data”. In: Dan Brickley, Vinay K. Chaudhri, Harry Halpin and Deborah McGuinness (eds.), Linked Data Meets Artificial Intelligence. Technical Report SS-10-07, AAAI Press, Menlo Park, California, 2010, pp. 82–86. ISBN 978-1-57735-461-1.

Editor's Notes

  1. A brief introduction about us, and how the work came about through the collaboration between Kno.e.sis, Accenture and Ontotext.
  2–6. Some introduction to LOD, since this track is not an LOD-specific track.