Knowledge graphs on the Web


This invited keynote at the Social Computing Track at WI-IAT21 gives an introduction to Knowledge Graphs and how they are built collaboratively by us. It also presents a brief analysis of the links in Wikidata.


  1. Collaboratively building Knowledge Graphs on the Web. Armin Haller, Associate Professor, ANU
  2. Data deluge: it is impossible to manually process even a fraction of this information … we need to prepare for a post-big-data world.
  3. Machine Learning/AI
     ML/AI approaches perform extremely well on such massive amounts of data in tasks such as:
     – Image recognition
     – Speech recognition
     – Product recommendations
     – Question answering
     – Spam filtering
     … and for none of these applications do we need an explanation of the learned facts.
  4. Machine Learning/AI and its limitations
     However, when it comes to:
     – Self-driving cars
     – Medical diagnosis
     – Drug design
     – Robot interactions
     – Military applications
     – etc.
     humans need to understand the rationale of a decision.
     – Facebook employs nearly 15,000 people to moderate posts deemed inappropriate by ML/AI
  5. eXplainable AI
     XAI requires:
     • Encoding of context (Who, What, How, When...)
     • Encoding the semantics of inputs, outputs and their properties
     • Encoding of common sense knowledge (e.g., one sits on a chair and eats at a table)
  6. Knowledge Graphs (KGs)
     • Performance and explainability of ML improve when data is given a context
       – a Knowledge Graph increases the informative value of the collected data that is given to the model
     Knowledge Graphs [Paulheim 2017]
     – describe real-world entities and their interrelations
     – define possible classes and relations of entities in a schema (ontology)
     – allow for interrelating arbitrary entities with each other
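The three properties listed on this slide can be illustrated with a minimal sketch that stores a graph as subject-predicate-object triples. The entity and relation names below are toy values chosen for illustration; they are not taken from any real KG, and the plain-string representation is an assumption made to keep the example dependency-free.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical toy data: names are illustrative only.
TRIPLES = [
    Triple("Mozart", "type", "Person"),      # typing against a schema class
    Triple("Salieri", "type", "Person"),
    Triple("Mozart", "knows", "Salieri"),    # relation between two entities
    Triple("Mozart", "bornIn", "Salzburg"),
    Triple("Salzburg", "type", "City"),
]


def build_index(triples):
    """Index outgoing edges by subject so an entity's facts can be looked up."""
    index = defaultdict(list)
    for t in triples:
        index[t.subject].append((t.predicate, t.obj))
    return index


def describe(entity, index):
    """Return every (predicate, object) pair recorded for an entity, sorted."""
    return sorted(index.get(entity, []))


INDEX = build_index(TRIPLES)
print(describe("Mozart", INDEX))
```

Because any subject can point to any object, adding a new kind of relation needs no change to the data structure, which is what makes arbitrary interlinking cheap.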
  7. Knowledge Graphs (KGs)
     • Knowledge graphs are (generally) created collaboratively by many users
     • Information can be added in a relatively arbitrary manner as structural constraints are few
     Closed KGs (~2019) [Noy et al., 2019]
     – Microsoft: ~2bn entities, ~55bn facts
     – Google: ~1bn entities, ~70bn assertions
     – Facebook: ~50m entities, ~500m assertions
     – eBay: ~1bn triples
     – IBM: ~100m entities, 5bn relationships
     Open KGs (April 2021)
     – DBpedia: ~4.58m entities, ~9.25GB
     – Yago4: ~50m entities, ~18.4GB
     – Wikidata: ~93m entities, ~99GB
  8. Knowledge Graphs (KGs)
     • Graphs: a natural way of structuring and presenting knowledge
     • Heterogeneous: knowledge from different sources can be integrated and/or interlinked
     • Schema-later: the schema is often not decided until later, and does not impose integrity constraints
  9. Schema in KGs
     Ontologies as schemas in KGs: an ontology is an “explicit specification of a conceptualization consisting of a set of objects, and the describable relationships among them” [Gruber, 1993]
     Components of an Ontology
     • Classes: abstract groups (sets) of objects that are defined by properties that all their members share (e.g., Person, Organisation, Event)
     • Attributes: characteristics or parameters that objects (and classes) can have (e.g., date of birth, longitude, latitude, timestamp)
     • Relationships: ways in which classes and individuals can be related to one another (e.g., role, attributed to, observed by)
     • Individuals: concrete objects that are inherent to the domain of discourse, such as specific people, organisations or abstract individuals such as numbers (e.g., g, π)
  10. KG modelling detail
      [Figure: two axes. Data: from limited (many entities) to comprehensive (fewer entities). Schema: from generic (applies to many) to specific (applies to few). Labels shown in the figure: Barack Obama (Q76; 3,947 axioms), Armin Haller (189 axioms), Entity (Q35120), partOf (P361), minimum no of players (P1872), Chess, Person, Q58043963, Q73145133.]
  11. Types of Schemas (Ontologies) [Haller & Polleres, 2020a]
      Ordered from most general (highest reusability) to most specific (lowest reusability):
      • Upper Ontologies, e.g., Cyc, SUMO, DOLCE, BFO
      • Mid-Level Ontologies, e.g., PROV-O, FOAF, ORG, SOSA/SSN, AGRIF
      • Domain Ontologies, e.g., GO, ChEBI, DO, BTO
      • Use-Case Ontologies
  12. KG Engineering
      [Diagram labels: KG Creation; Extract data from existing resources; KG Usage; KG Linking; Add instance assertions; KG Curation; Add schema assertions]
  13. KG Creation
      • Top-Down: schema first, data later
      • Bottom-Up: data first, schema later
      • Middle-Out
  14. KG Creation (cont’d)
      Bottom-Up KG Creation
      • Schema is not defined, and data is added organically and manually using tools such as:
        – OntoWiki [Frischmuth et al., 2015]
        – Semantic MediaWiki [Krötzsch et al., 2006]
        – Wikibase
        – Schímatos [Wright et al., 2020]
      Top-Down KG Creation
      • Schema is created upfront, and existing data is mapped to the schema using languages/tools such as:
        – R2RML
        – SPARQL Generate [Lefrançois et al., 2017]
        – SHACL Rules
        – TARQL
        – Metadata Extractor & Loader (MEL) [Méndez et al., 2021]
        – JSON to RDF Mappings (J2RM) [Méndez et al., 2020]
      Middle-Out KG Creation [Sure et al., 2004]
      • Schema is partly defined upfront based on use cases, with mappings added later when data defines semantics
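The top-down approach on this slide (schema first, then map existing data onto it) can be sketched in a few lines. This is not R2RML, TARQL or any of the listed tools; the `MAPPING` table, the `http://example.org/` namespace and the field names are hypothetical stand-ins that only show the idea of a declarative mapping applied to source records.

```python
# Hypothetical schema and namespace, fixed before any data is seen (top-down).
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

MAPPING = {
    "class": EX + "Person",
    "id_field": "id",
    "properties": {              # source field -> schema property
        "name": EX + "name",
        "employer": EX + "worksFor",
    },
}


def map_record(record, mapping):
    """Translate one source record into (subject, predicate, object) triples."""
    subject = EX + "person/" + str(record[mapping["id_field"]])
    triples = [(subject, RDF_TYPE, mapping["class"])]
    for field, prop in mapping["properties"].items():
        if record.get(field) is not None:
            triples.append((subject, prop, record[field]))
    return triples


ROWS = [
    {"id": 1, "name": "Ada", "employer": "ANU"},
    {"id": 2, "name": "Bob", "shoe_size": 44},   # unmapped field is dropped
]

for row in ROWS:
    for triple in map_record(row, MAPPING):
        print(triple)
```

Fields without a mapping entry never reach the graph, which is the trade-off of the top-down approach: the schema is consistent, but data that does not fit it is lost unless the mapping is extended.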
  15. Collaboratively building KGs
      • The biggest KGs on the Web are built collaboratively, bottom-up:
        – Schema.org Ontology and KG: over 10 million sites use Schema.org to mark up their web pages and email messages
        – Wikidata Ontology and KG: Wikipedia for Data, 149GB
      Comparison of schema.org and Wikidata:
      • Availability
        – schema.org: ontology highly available; data availability depends on the publisher
        – Wikidata: ontology highly available; data highly available
      • Discoverability
        – schema.org: ontology easy; instances very difficult
        – Wikidata: ontology relatively difficult; instances very easy
      • Completeness & Adaptability
        – schema.org: domain specific (E-Commerce); community extensions available
        – Wikidata: (all of) human knowledge
      • Maintenance & Versioning
        – schema.org: continuous curation; versions are not made explicit
        – Wikidata: continuous curation; explicit entity versions + version history
      • Modularization
        – schema.org: fully distributed, easily accessible ontology; fully distributed, difficult to access data
        – Wikidata: fully distributed, relatively difficult to access ontology; fully distributed, easy to access data
      • Quality
        – schema.org: high quality ontology; low quality data
        – Wikidata: high quality ontology; high quality data
  16. Meta-modelling issues
      Without enforced (upfront designed) schemas, KGs suffer from, e.g.:
      • Inconsistent modelling of classes/instances
        <Q1412680> <P279> <Q28100368> | <Beef Wellington> <subclass of> <Beef Dish>
        <Q6497852> <P31> <Q28100665> | <Wiener Schnitzel> <instance of> <Veal Dish>
      • Subclassing of disjoint super-classes
        <Q190928> <P279> <Q124282> | <shipyard> <subclass of> <dock>
        <Q190928> <P279> <Q4830453> | <shipyard> <subclass of> <business>
        <Q124282> <P279> <Q7184903> | <shipyard> <subclass of> <abstract object>
        <Q190928> <P279> <Q223557> | <shipyard> <subclass of> <physical object>
      • Instance-of relations between first-order classes
        <Q12156> <P31> <Q12136> | <Malaria> <instance of> <Disease>
        <Q12156> <P279> <Q12136> | <Malaria> <subclass of> <Disease>
      • Redundant/circular inheritances between first-order classes
        <Q18557307> <P279> <Q692536> | <muscle tissue disease> <subclass of> <muscular disease>
        <Q692536> <P279> <Q18557307> | <muscular disease> <subclass of> <muscle tissue disease>
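Two of the issues on this slide can be detected mechanically. The sketch below uses statements copied from the slide and checks for (a) an item that is both an instance and a subclass of the same class and (b) circular subclass-of chains. The function names are hypothetical, and running such checks against all of Wikidata would need a far more scalable approach than these in-memory sets.

```python
from collections import defaultdict

P31, P279 = "P31", "P279"   # Wikidata properties: instance of, subclass of

# Statements copied from the slide above.
STATEMENTS = [
    ("Q12156", P31, "Q12136"),        # Malaria instance of Disease
    ("Q12156", P279, "Q12136"),       # Malaria subclass of Disease
    ("Q18557307", P279, "Q692536"),   # muscle tissue disease -> muscular disease
    ("Q692536", P279, "Q18557307"),   # muscular disease -> muscle tissue disease
    ("Q1412680", P279, "Q28100368"),  # Beef Wellington subclass of Beef Dish
]


def instance_and_subclass(statements):
    """Pairs (x, c) where x is both an instance of and a subclass of c."""
    inst = {(s, o) for s, p, o in statements if p == P31}
    sub = {(s, o) for s, p, o in statements if p == P279}
    return sorted(inst & sub)


def classes_on_cycles(statements):
    """Classes that can reach themselves by following subclass-of edges."""
    parents = defaultdict(set)
    for s, p, o in statements:
        if p == P279:
            parents[s].add(o)

    def reaches_itself(start):
        stack, seen = list(parents[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
        return False

    return sorted(c for c in list(parents) if reaches_itself(c))


print(instance_and_subclass(STATEMENTS))
print(classes_on_cycles(STATEMENTS))
```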
  17. KG Curation
      Correctness
      – Evaluation: Accessibility, Accuracy, Consistency, Conciseness, Trustability, Dynamicity, Representationality [Zaveri et al., 2016]
      – Correction: evaluating data quality (SHACL, ShEx)
        • Syntactic errors
        • Semantic errors
      Completeness
      – KG Completion [Paulheim, 2017]: using structural information observed in triples
        • Classification
        • Probabilistic and Statistical Methods
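The distinction between syntactic and semantic errors can be made concrete with a small validator. This is plain Python written in the spirit of a shape check, not SHACL or ShEx syntax; `PERSON_SHAPE`, the property names and the three error categories are assumptions made for the example.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

# Hypothetical shape: property -> (min count, max count).
PERSON_SHAPE = {"name": (1, 1), "birthDate": (0, 1)}


def validate(entity, shape=PERSON_SHAPE):
    """Return a list of human-readable violations for one entity description."""
    errors = []
    for prop, (lo, hi) in shape.items():
        count = len(entity.get(prop, []))
        if not lo <= count <= hi:
            errors.append(f"cardinality: {prop} has {count} values, expected {lo}..{hi}")
    for value in entity.get("birthDate", []):
        if not ISO_DATE.match(value):
            errors.append(f"syntactic: birthDate {value!r} is not YYYY-MM-DD")
            continue
        try:
            parsed = date.fromisoformat(value)
        except ValueError:
            errors.append(f"syntactic: birthDate {value!r} is not a calendar date")
            continue
        if parsed > date.today():
            errors.append(f"semantic: birthDate {value!r} lies in the future")
    return errors


print(validate({"name": ["Mozart"], "birthDate": ["1756-01-27"]}))
print(validate({"birthDate": ["27/01/1756"]}))
```

A well-formed value can still be wrong about the world, so the semantic check only runs once the syntactic one has passed.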
  18. KG Linking
      Internal vs. external links [Haller et al., 2020b]
      – Internal links: links between parts of one coherent KG, i.e., edges linking nodes within the graph
        • Link prediction techniques are used to learn those new links
      – External links: links between different KGs, i.e., edges between nodes from different graphs, or reusing edges from a different graph to link nodes in one KG
      Linking issues [Haller et al., 2020b]
      • References to many inaccessible URIs (i.e., broken links) may render a KG largely useless
      • Changes in linked external KGs are out of the control of the KG publisher
  19. KG Linking
      • Ontology links [Haller et al., 2020b]
        – class link t:[dbo:Person, rdfs:subClassOf, foaf:Person]
        – instance typing link t:[dbr:Wolfgang_Amadeus_Mozart, rdf:type, foaf:Person]
        – property link t:[dbr:Wolfgang_Amadeus_Mozart, foaf:name, "Wolfgang Amadeus Mozart"@en]
        – instance role link t:[dbr:Wolfgang_Amadeus_Mozart, foaf:knows, wd:Q51088] (Antonio Salieri)
      • Instance link t:[dbr:Wolfgang_Amadeus_Mozart, owl:sameAs, wd:Q254]
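The five example triples on this slide suggest a simple rule set for telling the link types apart. The sketch below encodes rules inferred only from those examples; it is not the formal characterization given in [Haller et al., 2020b]. Treating `dbr:` and `dbo:` as the local namespace and working on prefixed names as plain strings are both assumptions.

```python
# Prefixes treated as the "home" namespace of the KG under analysis (assumption).
LOCAL_PREFIXES = ("dbr:", "dbo:")
# Vocabulary prefixes that do not count as links to another KG by themselves.
CORE_PREFIXES = ("rdf", "rdfs", "owl")


def is_literal(term):
    return term.startswith('"')


def is_external(term):
    """A prefixed name is external if it does not use a local prefix."""
    return not is_literal(term) and ":" in term and not term.startswith(LOCAL_PREFIXES)


def classify(triple):
    """Assign one of the slide's link types to a (subject, predicate, object) triple."""
    _, p, o = triple
    if p == "owl:sameAs" and is_external(o):
        return "instance link"
    if p == "rdfs:subClassOf" and is_external(o):
        return "class link"
    if p == "rdf:type" and is_external(o):
        return "instance typing link"
    if is_external(p) and p.split(":")[0] not in CORE_PREFIXES:
        return "property link" if is_literal(o) else "instance role link"
    return "internal"


EXAMPLES = [
    ("dbo:Person", "rdfs:subClassOf", "foaf:Person"),
    ("dbr:Wolfgang_Amadeus_Mozart", "rdf:type", "foaf:Person"),
    ("dbr:Wolfgang_Amadeus_Mozart", "foaf:name", '"Wolfgang Amadeus Mozart"@en'),
    ("dbr:Wolfgang_Amadeus_Mozart", "foaf:knows", "wd:Q51088"),
    ("dbr:Wolfgang_Amadeus_Mozart", "owl:sameAs", "wd:Q254"),
]

for example in EXAMPLES:
    print(classify(example), example)
```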
  20. KG Linking in Wikidata
      • Wikidata is by far the largest openly available KG, with both schema (ontology) and data truly built bottom-up
      • Wikidata dump (in HDT) from 3rd of March 2021, 53GB (149GB uncompressed)
      General Statistics
      – # Triples (Facts): 1,693,668,039
      – # Subjects: 1,625,057,179
      – # Predicates (edges): 38,867
      – # Unique objects: 2,538,585,808
      – # Unique entities: 89,120,227
      – # Unique Classes: 2,522,595
      – # Unique Properties: 74,309
      Links
      – # Class Links: 3,955 (0.001 per class)
      – # Property Links: 835 (0.01 per property)
      – # Instance Typing Links: 0
      – # Instance Links: 173,177,045 (1.94 per entity)
        • Exact Match (P2888): 3,268,021
        • Said to be the Same (P460): 2
        • Inverse Property (P1696): 0
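The per-class, per-property and per-entity figures on this slide are plain ratios of the link counts to the corresponding totals. The short calculation below reproduces them from the numbers given; the variable names are ours.

```python
# Counts copied from the slide above (Wikidata HDT dump, 3 March 2021).
UNIQUE_ENTITIES = 89_120_227
UNIQUE_CLASSES = 2_522_595
UNIQUE_PROPERTIES = 74_309
CLASS_LINKS = 3_955
PROPERTY_LINKS = 835
INSTANCE_LINKS = 173_177_045


def links_per_unit(links, units):
    """Average number of links per class, property or entity."""
    return links / units


class_ratio = links_per_unit(CLASS_LINKS, UNIQUE_CLASSES)
property_ratio = links_per_unit(PROPERTY_LINKS, UNIQUE_PROPERTIES)
instance_ratio = links_per_unit(INSTANCE_LINKS, UNIQUE_ENTITIES)

print(f"class links per class:       {class_ratio:.4f}")
print(f"property links per property: {property_ratio:.4f}")
print(f"instance links per entity:   {instance_ratio:.2f}")
```

The class ratio comes out at roughly 0.0016, so the slide's 0.001 appears to be truncated rather than rounded; the other two figures match the slide after rounding.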
  21. KG Linking in Wikidata (cont’d)
      • The Wikidata ontology includes links to other ontologies, but relatively fewer class and property links than other open KGs on the Web
      • Wikidata defines an extensive ontology (schema) that is used to define entities within its KG
      • Wikidata links to other KGs, but uses relatively fewer instance links than other KGs on the Web
        – It does not (yet) include many similarity relations, even though it should not be the authoritative source for many of its entities
  22. KG Usage
      • Knowledge Management, Knowledge Discovery
      • Training of ML models with KGs
      • Conversational Agents
        – Q&A
        – Personal Assistants
        – Chatbots
      • Open Data
  23. Conclusions
      • Stronger focus on the KG contributors and end users needed
        – Tools/methods needed for creating/maintaining KGs
        – Tools/methods needed to support querying/analysing KG schemas
      • KGs need to be more strongly interlinked, e.g., link prediction techniques need to be deployed between KGs rather than just on a single KG
      • Improved NLP/NER-based learning techniques needed (distant supervision) that build s-p-o relations from unstructured text [Mintz et al., 2009]
      • Permanent distributed querying/replication of data/schema
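The core idea of distant supervision mentioned on this slide is to use facts already in a KG to label sentences automatically, producing training data for a relation extractor. The sketch below shows only that labelling step, with toy facts and sentences that are hypothetical; matching entities by substring is a deliberate simplification where a real pipeline would use NER.

```python
# Hypothetical toy KG: (subject, object) -> relation.
FACTS = {
    ("Mozart", "Salzburg"): "bornIn",
    ("Mozart", "Salieri"): "knows",
}

SENTENCES = [
    "Mozart was born in Salzburg in 1756.",
    "Salieri and Mozart met in Vienna.",
    "Vienna is the capital of Austria.",
]


def label_sentences(sentences, facts):
    """Label a sentence with a relation when it mentions both entities of a fact."""
    examples = []
    for sentence in sentences:
        for (subj, obj), relation in facts.items():
            if subj in sentence and obj in sentence:
                examples.append((sentence, subj, relation, obj))
    return examples


for example in label_sentences(SENTENCES, FACTS):
    print(example)
```

The labels are noisy by construction: a sentence that mentions both entities need not express the relation, which is why the resulting examples serve as weak training signal rather than ground truth.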
  24. References
      • Hogan, A., et al.: Knowledge Graphs. ACM Computing Surveys (to appear), 2021.
      • Noy, N., Gao, Y., Jain, A., Narayanan, A., Patterson, A., Taylor, J.: Industry-scale Knowledge Graphs: Lessons and Challenges. ACM Queue 17(2), 2019.
      • Gruber, T.: A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition 5(2): 199-220, 1993.
      • Frischmuth, P., Martin, M., Tramp, S., Riechert, T., Auer, S.: OntoWiki – An Authoring, Publication and Visualization Interface for the Data Web. Semantic Web 6(3): 215-240, 2015.
      • Krötzsch, M., Vrandečić, D., Völkel, M.: Semantic MediaWiki. The Semantic Web – ISWC 2006.
      • Wright, J., Méndez, S. J. R., Haller, A., Taylor, K., Omran, P. G.: Schímatos: a SHACL-based Web-Form Generator for Knowledge Graph Editing. The Semantic Web – ISWC 2020.
      • Lefrançois, M., Zimmermann, A., Bakerally, N.: A SPARQL Extension for Generating RDF from Heterogeneous Formats. ESWC (1), 2017.
      • Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality assessment for linked data: A survey. Semantic Web 7(1): 63-93, 2016.
      • Paulheim, H.: Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3): 489-508, 2017.
      • Berners-Lee, T.: Linked Data. W3C Design Issues. URL: http://www.w3.org/DesignIssues/LinkedData.html, 2006.
      • Haller, A., Polleres, A.: Are we better off with just one ontology on the Web? Semantic Web 11(1): 87-99, 2020a.
      • Sure, Y., Staab, S., Studer, R.: On-To-Knowledge Methodology (OTKM). Handbook on Ontologies, pp. 117-132, 2004.
      • Haller, A., Fernández, J. D., Kamdar, M. R., Polleres, A.: What Are Links in Linked Open Data? A Characterization and Evaluation of Links between Knowledge Graphs on the Web. ACM J. Data Inf. Qual. 12(2): 9:1-9:34, 2020b.
      • Abele, A., McCrae, J. P., Buitelaar, P., Jentzsch, A., Cyganiak, R.: Linking open data cloud diagram. URL: http://lod-cloud.net. Insight-Centre, 2017.
      • Méndez, S. J. R., Haller, A., Omran, P. G., Wright, J., Taylor, K.: J2RM: An ontology-based JSON-to-RDF Mapping tool. ISWC (Demos/Industry), 2020.
      • Méndez, S. J. R., Haller, A., Omran, P. G., Taylor, K.: MEL: Metadata Extractor & Loader. ISWC (Posters/Demos/Industry), 2021.
      • Omran, P. G., Taylor, K., Méndez, S. J. R., Haller, A.: Towards SHACL Learning from Knowledge Graphs. ISWC (Demos/Industry), 2020.
      • Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL ’09), 2009.
