
Knowledge Graph Research and Innovation Challenges

Gives an overview of some challenges in combining machine learning and knowledge graph technologies, and of the vision of Cognitive Knowledge Graphs that consist of graphlets instead of mere entity descriptions.

Knowledge Graph Research and Innovation Challenges

  1. Knowledge Graph Research and Innovation Challenges
     Sören Auer
     Symposium of the Knowledge Graph IG at the Alan Turing Institute, June 17, 2022
  2. Knowledge Graphs – A definition
     • Fabric of concepts, classes, properties, relationships, and entity descriptions
     • Uses a knowledge representation formalism (typically RDF, RDF Schema, OWL)
     • Holistic knowledge (multiple domains, sources, and granularities):
       • instance data (ground truth)
       • open data (e.g. DBpedia, Wikidata), private data (e.g. supply chain data), closed data (e.g. product models)
       • derived and aggregated data
       • schema data (vocabularies, ontologies)
       • metadata (e.g. provenance, versioning, documentation, licensing)
       • comprehensive taxonomies to categorize entities
       • links between internal and external data
       • mappings to data stored in other systems and databases
  3. Industry Knowledge Graph Adoption
     Source: https://www.slideshare.net/Frank.van.Harmelen/adoption-of-knowledge-graphs-late-2019
     Eccenca aims to make KGs a commodity
  4. Comparison of various enterprise data integration paradigms [1]
     | Paradigm | Data model | Integr. strategy | Conceptual/operational | Heterogeneous data | Intern./extern. data | No. of sources | Type of integr. | Domain coverage | Semantic repres. |
     |---|---|---|---|---|---|---|---|---|---|
     | XML Schema | DOM trees | LaV | operational | yes | yes | medium | both | medium | high |
     | Data Warehouse | relational | GaV | operational | - | partially | medium | physical | small | medium |
     | Data Lake | various | LaV | operational | yes | yes | large | physical | high | medium |
     | MDM | UML | GaV | conceptual | - | - | small | physical | small | medium |
     | PIM / PCS | trees | GaV | operational | partially | partially | - | physical | medium | medium |
     | Enterprise search | document | - | operational | yes | partially | large | virtual | high | low |
     | EKG | RDF | LaV | both | yes | yes | medium | both | high | very high |
     KGs are pretty much established for data integration, but what about real knowledge?
     [1] M. Galkin, S. Auer, M.-E. Vidal, S. Scerri: Enterprise Knowledge Graphs: A Semantic Approach for Knowledge Management in the Next Generation of Enterprise Information Systems. ICEIS (2) 2017: 88-98
  5. From KGs for Data Integration to KGs for Knowledge Integration
     1. Integrate KGs with ML (neuro-symbolic AI)
     2. Extend the concept of KGs
     3. Establish true human-machine collaboration
  6. Integrate KGs with ML - Neuro-symbolic AI
  7. How can we combine ML and KG?
     • ML researcher: We can learn on graphs (GNN)
     • KG researcher: We can use ML for KG completion (KG embedding)
  8. Towards Neuro-Symbolic Perception
     Figure: an input is mapped to an output via a small knowledge graph (nodes: Horse, Tail, 4, Pony, small, Zebra, Stripes; edge labels: has, hasLegs, size, subClassOf)
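
The background knowledge in the figure on this slide can be written as plain triples and used to derive facts a perception model never saw directly. The following is only a minimal sketch: the exact edges (for example, Pony subClassOf Horse) are read off the flattened figure and are therefore an assumption, and the helper names `superclasses` and `inherited_facts` are hypothetical.

```python
# Triples ASSUMED from the slide's figure (the extraction lost the exact edges).
TRIPLES = [
    ("Horse", "has", "Tail"),
    ("Horse", "hasLegs", "4"),
    ("Pony", "subClassOf", "Horse"),
    ("Pony", "size", "small"),
    ("Zebra", "subClassOf", "Horse"),
    ("Zebra", "has", "Stripes"),
]


def superclasses(cls, triples):
    """Return cls plus every class reachable from it via subClassOf edges."""
    seen = [cls]
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in seen:
                seen.append(o)
                frontier.append(o)
    return seen


def inherited_facts(cls, triples):
    """Collect (property, value) pairs stated for cls or any of its superclasses."""
    classes = superclasses(cls, triples)
    return sorted(
        (p, o) for s, p, o in triples if s in classes and p != "subClassOf"
    )


if __name__ == "__main__":
    # A classifier that outputs "Zebra" can be enriched with inherited facts.
    print(inherited_facts("Zebra", TRIPLES))
```

Under these assumed triples, a model that recognises a zebra can also be told that it has a tail and four legs, because those facts are inherited from Horse.
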
  9. What do we need?
     1. Use KGs as contextual/background knowledge for ML in addition to raw data → causal reasoning
     2. Use ML to extend and revise KGs
     3. Integrate human and machine intelligence
  10. Synergistic Combination of Human & Machine Intelligence leveraging Knowledge Graphs
      Figure: a Cognitive Knowledge Graph (concept KG nodes/graphlets) sits between machine intelligence and human intelligence
      • Machine intelligence: connecting KG graphlets with ML models
      • Human intelligence: KG graphlet authoring, curation, validation
  11. Extend the concept of KGs
  12. Towards Cognitive Knowledge Graphs
      KGs are proven to capture factual knowledge.
      Research challenge: manage
      • Uncertainty & disagreement
      • Varying semantic granularity
      • Emergence, evolution & provenance
      • Integrating existing domain models
      But maintain flexibility and simplicity.
      Cognitive Knowledge Graphs for scholarly knowledge:
      • Fabric of knowledge molecules (graphlets): compact, relatively simple, structured units of knowledge
      • Can be incrementally enriched, annotated, interlinked …
  13. KG Graphlets: initial working definition
      Formally, a CKG graphlet is a tuple (C, P) of a set of classes C and a set of properties P, where
      1. ∀ p ∈ P the domain of p (either explicitly defined or implicitly inferred from a concrete CKG) includes at least one of the types c ∈ C: domain(p) ⊂ C, and
      2. all classes in C are connected via a property chain in P: ∀ c_1, c_2 ∈ C ∃ p_1, ..., p_j, ..., p_n ∈ P: domain(p_1) = c_1, range(p_j) = domain(p_{j+1}), range(p_n) = c_2.
      Alternatively, a graphlet can be seen as (a) a special type of connected graph pattern in which variables occur in the positions of concrete instances and literals, or (b) a specific set of SHACL shapes.
      Graphlets can serve as a structuring element between entity/resource descriptions and whole ontologies/KGs → KG management (e.g. reasoning, querying, completion) can be adapted to KG graphlet handling.
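
The two conditions of the working definition can be checked mechanically. The sketch below is one possible reading, not the author's implementation: each property is assumed to have a single domain and a single range, condition 1 is read as "the domain of every property is a class in C", and the `directed=False` switch is a relaxation that is not part of the slide. The class and property names in the usage note are hypothetical.

```python
from itertools import permutations


def reachable(start, edges):
    """Return all classes reachable from start by following (domain, range) edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def is_graphlet(classes, properties, directed=True):
    """Check the two conditions of the working definition for a pair (C, P).

    classes:    set of class names (C).
    properties: dict mapping a property name to a (domain, range) pair (P).
    directed:   if False, property chains may be followed in either direction
                (an assumption-level relaxation, not stated on the slide).
    """
    # Condition 1: every property's domain is one of the classes in C.
    if any(domain not in classes for domain, _ in properties.values()):
        return False
    # Only edges that end in a class of C can take part in a chain inside C.
    edges = [(d, r) for d, r in properties.values() if r in classes]
    if not directed:
        edges += [(r, d) for d, r in edges]
    # Condition 2: every class reaches every other class via a property chain.
    return all(c2 in reachable(c1, edges) for c1, c2 in permutations(classes, 2))
```

For example, with C = {Paper, Contribution, ResearchProblem} and P = {hasContribution: Paper → Contribution, addresses: Contribution → ResearchProblem}, the literal (directed) reading rejects the pair because no chain leads back to Paper, while the undirected relaxation accepts it. This suggests the connectivity condition may be intended in the weaker sense.
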
  14. Graphlet Example "Scholarly Contribution"
  15. Graphlet Example "Security Advice"
  16. From Factual Knowledge Graphs
      | Aspect | Factual (today) |
      |---|---|
      | Base entities | Real world |
      | Granularity | Atomic entities |
      | Evolution | Addition/deletion of facts |
      | Collaboration | Fact enrichment |
  17. From Factual to Cognitive Knowledge Graphs
      | Aspect | Factual (today) | Cognitive (needed for SKG) |
      |---|---|---|
      | Base entities | Real world | Conceptual |
      | Granularity | Atomic entities | Interlinked descriptions (molecules) with annotations (provenance) |
      | Evolution | Addition/deletion of facts | Concept drift, varying aggregation levels |
      | Collaboration | Fact enrichment | Emergent semantics |
  18. Organizing Scholarly Communication with Knowledge Graphs
  19. How did information flows change in the digital era?
  20. How does it work today? The world of publishing & communication has profoundly changed
      • New means adapted to the new possibilities were developed, e.g. "zooming", dynamics
      • Business models changed completely
      • More focus on data, interlinking of data / services and search in the data
      • Integration, crowdsourcing, data curation play an important role
  21. What about Scholarly Communication?
  22. Scholarly Communication has not changed (much)
      17th century, 19th century, 20th century, 21st century
  23. Challenges we are facing: We need to rethink the way research is represented and communicated
      • Digitalisation of science: data integration and analysis; digital collaboration
      • Monopolisation by commercial actors: publisher lock-in effects; maximization of profits [1]
      • Reproducibility crisis: the majority of experiments are hard or impossible to reproduce [2]
      • Proliferation of publications: publication output doubled within a decade and continues to rise [3]
      • Deficiency of peer review: deteriorating quality [4]; predatory publishing
      [1] http://thecostofknowledge.com, https://www.projekt-deal.de
      [2] M. Baker: 1,500 scientists lift the lid on reproducibility. Nature, 2016.
      [3] Science and Engineering Publication Output Trends. National Science Foundation, 2018.
      [4] J. Couzin-Frankel: Secretive and Subjective, Peer Review Proves Resistant to Study. Science, 2013.
  24. Root Cause – Deficiency of Scholarly Communication?
      Lack of…
      • Transparency: information is hidden in text
      • Integratability: fitting different research results together
      • Machine assistance: unstructured content is hard to process
      • Identifiability of concepts beyond metadata
      • Collaboration: one-brain barrier
      • Overview: scientists look for the needle in the haystack
  25. Search for CRISPR: > 238,000 results
      • How good is CRISPR (with respect to precision, safety, cost)?
      • What are the specifics of genome editing in insects?
      • Who has applied it to butterflies?
      Source: https://scholar.google.de/scholar?hl=de&as_sdt=0%2C5&q=CRISPR&btnG=, 04.2019
  26. How can we fix it?
  27. Concepts
      Overarching concepts: research problems, definitions, research approaches, methods
      Artefacts: publications, data, software, image/audio/video, knowledge graphs / ontologies
      Domain-specific concepts:
      • Mathematics: definitions, theorems, proofs, methods, …
      • Physics: experiments, data, models, …
      • Chemistry: substances, structures, reactions, …
      • Computer Science: concepts, implementations, evaluations, …
      • Technology: standards, processes, elements, units, sensor data
      • Architecture: regulations, elements, models, …
  28. Chemistry Example: CRISPR Genome Editing
      Source: https://cacm.acm.org/system/assets/0002/2618/021116_Google_KnowledgeGraph.large.jpg?1476779500&1455222197
  29. Chemistry Example: Populating the Graph
      1. Original publication
      2. Adaptive graph curation & completion
         • Author: Robert Reed
         • Research problem: Genome editing in Lepidoptera
         • Methods: CRISPR/Cas9
         • Applied on: Lepidoptera
         • Experimental data: https://doi.org/10.5281/zenodo.896916
      3. Graph representation
         • Paper: CRISPR/Cas9 editing in Lepidoptera (https://doi.org/10.1101/130344)
         • Author: Robert Reed (https://orcid.org/0000-0002-6065-6728)
         • Research problem: Genome editing in Lepidoptera
         • Experimental data: https://doi.org/10.5281/zenodo.896916
         • Method: CRISPR/Cas9
         • Related concept: Genome editing (https://www.wikidata.org/wiki/Q24630389)
         • Edge labels: addresses, isEvaluatedWith
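
The graph representation on this slide can be sketched as a handful of subject-predicate-object triples. The identifiers and the edge labels `addresses` and `isEvaluatedWith` come from the slide. The remaining property names (`label`, `hasAuthor`, `usesMethod`, `relatedTo`), which node each edge attaches to, and the `describe` helper are assumptions made for illustration only.

```python
# Identifiers as given on the slide.
PAPER = "https://doi.org/10.1101/130344"
AUTHOR = "https://orcid.org/0000-0002-6065-6728"
DATA = "https://doi.org/10.5281/zenodo.896916"
GENOME_EDITING = "https://www.wikidata.org/wiki/Q24630389"
PROBLEM = "Genome editing in Lepidoptera"

triples = [
    (PAPER, "label", "CRISPR/Cas9 editing in Lepidoptera"),  # hypothetical property
    (PAPER, "hasAuthor", AUTHOR),                            # hypothetical property
    (AUTHOR, "label", "Robert Reed"),                        # hypothetical property
    (PAPER, "addresses", PROBLEM),                           # edge label from the slide
    (PAPER, "usesMethod", "CRISPR/Cas9"),                    # hypothetical property
    (PAPER, "isEvaluatedWith", DATA),                        # edge label from the slide
    (PROBLEM, "relatedTo", GENOME_EDITING),                  # hypothetical property
]


def describe(subject, triples):
    """Return all property/value pairs recorded for one subject."""
    return {p: o for s, p, o in triples if s == subject}


if __name__ == "__main__":
    for prop, value in describe(PAPER, triples).items():
        print(f"{prop}: {value}")
```
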
  30. Exploration and Question Answering
      Research challenge:
      • Intuitive exploration leveraging the rich semantic representations
      • Answer natural language questions
      Question answering pipeline: question parsing → named entity recognition (NER) & linking (NEL) → relation extraction → query construction → query execution → result rendering
      Q: How do different genome editing techniques compare?
      SELECT Approach, Feature WHERE { Approach addresses GenomeEditing . Approach hasFeature Feature }
      [1] K. Singh, S. Auer et al.: Why Reinvent the Wheel? Let's Build Question Answering Systems Together. The Web Conference (WWW 2018).
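
The query-execution step of the pipeline can be illustrated without a triple store. The sketch below evaluates the two triple patterns of the slide's query over a tiny in-memory list. It is a toy matcher written for this example, not a SPARQL engine; the sample triples are made up, and variables are marked with a leading `?` as in SPARQL.

```python
def match(pattern, triple, binding):
    """Try to extend binding so that pattern matches triple; return None on conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it occurs.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def select(variables, patterns, triples):
    """Evaluate a conjunction of triple patterns and project the given variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return sorted({tuple(b[v] for v in variables) for b in bindings})


# Hypothetical sample data, for illustration only.
TRIPLES = [
    ("ZFN", "addresses", "GenomeEditing"),
    ("ZFN", "hasFeature", "Safety"),
    ("CRISPR/Cas9", "addresses", "GenomeEditing"),
    ("CRISPR/Cas9", "hasFeature", "Site-specificity"),
    ("GNN", "addresses", "GraphLearning"),
    ("GNN", "hasFeature", "Scalability"),
]

QUERY = [
    ("?approach", "addresses", "GenomeEditing"),
    ("?approach", "hasFeature", "?feature"),
]

if __name__ == "__main__":
    for row in select(["?approach", "?feature"], QUERY, TRIPLES):
        print(row)
```

Only approaches that address GenomeEditing survive the first pattern, so the unrelated GNN entry is filtered out before features are collected.
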
  31. Result: Automatic Generation of Comparisons / Surveys
      Q: How do different genome editing techniques compare?
      | Engineered nucleases | Site-specificity | Safety | Ease-of-use / costs / speed |
      |---|---|---|---|
      | Zinc finger nucleases (ZFN) | ++ (9-18 nt) | + | -- ($$$: screening, testing to define efficiency) |
      | Transcription activator-like effector nucleases (TALENs) | +++ (9-16 nt) | ++ | ++ (easy to engineer; 1 week / a few hundred dollars) |
      | Engineered meganucleases | +++ (12-40 nt) | 0 | -- ($$$: protein engineering, high-throughput screening) |
      | CRISPR system/Cas9 | ++ (5-12 nt) | - | +++ (easy to engineer; a few days / less than 200 dollars) |
  32. The Open Research Knowledge Graph
  33. Establish true Human-Machine Collaboration
  34. ORKG | Knowledge transformation
      To create a scholarly knowledge graph, unstructured knowledge has to be transformed into structured knowledge.
      Unstructured knowledge → Structured knowledge
      Can we use Natural Language Processing (NLP) for the transformation process?
  35. ORKG | Knowledge transformation
      Can we use Natural Language Processing (NLP) for the transformation process?
      ● NLP techniques are not sufficiently accurate to perform this task autonomously
      ● But we can intertwine machine intelligence with human intelligence to get a synergy → the best of both worlds!
      Error propagation: 74% × 84% × 78% ≈ 48%
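
The error-propagation figure on this slide follows from multiplying the per-stage accuracies. The sketch below reproduces that arithmetic. It assumes the three stages fail independently and that a result is only correct if every stage is correct, and the function name `pipeline_accuracy` is hypothetical.

```python
from math import prod


def pipeline_accuracy(stage_accuracies):
    """Overall accuracy of a sequential pipeline, assuming independent stage errors."""
    return prod(stage_accuracies)


# Per-stage accuracies as shown on the slide.
stages = [0.74, 0.84, 0.78]
overall = pipeline_accuracy(stages)

if __name__ == "__main__":
    print(f"overall accuracy: {overall:.1%}")
```

Even though each stage is individually above 70%, fewer than half of the end-to-end results are expected to be correct, which is the motivation for keeping a human in the process.
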
  36. Gradations of automation
      • Manual data entry: human adds paper manually
      • Machine-in-the-loop: human is assisted by a machine
      • Human-in-the-loop: machine is assisted by a human
      • Fully automated: machine adds paper automatically
      Scalability improves towards the fully automated end.
  38. Gradations of automation
      Machine-in-the-loop:
      1. Add paper wizard: main entry point for adding new papers to the ORKG
      2. Paper annotator: annotation of key sentences in scholarly PDF articles
      Human-in-the-loop:
      3. TinyGenius: microtasks to validate NLP-generated statements
  40. Machine-in-the-loop | Add paper wizard | Step 1
      ● Collect metadata of the paper
      ● Fetched automatically if a DOI is available
      ● Manual entry possible
  41. Machine-in-the-loop | Add paper wizard | Step 2
      ● Selection of a research field
      ● Shows the ORKG research field taxonomy
  42. Machine-in-the-loop | Add paper wizard | Step 3
      The third step is the description of contribution data
  43. Add paper wizard - Step 3
      ● The third step is the description of contribution data
      ● This includes the possibility to annotate the abstract
      ● The user is in charge and makes the final decision on whether the automatically generated data is added or not (i.e., machine-in-the-loop)
      ● Annotations can be added or removed
      ● A confidence slider hides suggestions with a low score
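
The confidence slider and the "user is in charge" rule from this slide can be sketched as two small functions. This is an illustration under assumptions, not ORKG code: the `Suggestion` type, its fields, and both function names are hypothetical, and the sample annotations are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """One machine-generated annotation of a span in the abstract (hypothetical type)."""
    text: str
    label: str
    score: float  # model confidence in [0, 1]


def visible_suggestions(suggestions, threshold):
    """Hide suggestions whose score is below the slider value."""
    return [s for s in suggestions if s.score >= threshold]


def final_annotations(suggestions, threshold, rejected_by_user=()):
    """The user makes the final decision: visible suggestions minus rejected ones."""
    return [
        s for s in visible_suggestions(suggestions, threshold)
        if s.text not in rejected_by_user
    ]


SUGGESTIONS = [
    Suggestion("CRISPR/Cas9", "method", 0.92),
    Suggestion("Lepidoptera", "material", 0.71),
    Suggestion("we show", "result", 0.18),
]

if __name__ == "__main__":
    for s in final_annotations(SUGGESTIONS, 0.5, rejected_by_user={"Lepidoptera"}):
        print(s.label, "->", s.text)
```

Low-scoring suggestions never reach the user at the chosen slider position, and anything the user rejects is dropped regardless of its score.
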
  44. Machine-in-the-loop | Add paper wizard
      Try it yourself! https://www.orkg.org/orkg/add-paper
  46. Machine-in-the-loop | Paper annotator
      ● Goal: annotate key sentences in scholarly articles with discourse classes
      ● Two machine-in-the-loop approaches: sentence highlighting and class recommendations
  47. Sentence highlighting
      ● Highlights potentially interesting sentences within the article
      ● Can be ignored by users
  48. Class recommendations
      Recommends potentially relevant classes based on the selected sentence, called "Smart suggestions"
  49. Machine-in-the-loop | Paper annotator
      Try it yourself! https://www.orkg.org/orkg/pdf-text-annotation
  50. Machine-in-the-loop takeaways
      ● The human takes the lead, the machine assists where possible
      ● The user interface integration plays a key role
      ● The machine provides non-intrusive suggestions; wrong suggestions can easily be ignored
      ● Indicate to users that suggestions are based on AI (for example by using a dedicated color scheme)
  52. Human-in-the-loop | TinyGenius
      ● Leverage existing NLP tools to process large quantities of scholarly data
      ● Ask any user/visitor to validate the statements using simple tasks (aka microtasks)
      ● Users that are normally "content consumers" can become "content creators", as microtasks significantly lower the entry barrier to contributing
  53. Human-in-the-loop | TinyGenius | NLP tasks
      ● Use question templates to ask relevant questions for a variety of NLP tasks:
        ● Summarization (Hugging Face)
        ● Entity linking (Ambiverse NLU)
        ● Open information extraction (ORKG abstract annotator & ORKG title parser)
        ● Topic modeling (CSO Classifier)
  54. Human-in-the-loop | TinyGenius | Prototype
      Show only validated statements by default
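
The microtask idea can be sketched as vote aggregation: visitors answer yes or no on NLP-generated statements, and only statements with enough agreement are shown by default. The slides do not say how TinyGenius decides that a statement is validated, so the thresholds (`min_votes`, `min_agreement`), the status names, and the sample statements below are all assumptions.

```python
from collections import Counter


def validation_status(votes, min_votes=3, min_agreement=0.8):
    """Classify a statement from a list of boolean votes (True = 'statement is correct').

    The thresholds are illustrative assumptions, not TinyGenius settings.
    """
    counts = Counter(votes)
    total = counts[True] + counts[False]
    if total < min_votes:
        return "unvalidated"
    if counts[True] / total >= min_agreement:
        return "validated"
    if counts[False] / total >= min_agreement:
        return "rejected"
    return "disputed"


def visible_statements(statements, show_all=False):
    """Show only validated statements by default, as on the prototype slide."""
    return [
        text for text, votes in statements.items()
        if show_all or validation_status(votes) == "validated"
    ]


# Hypothetical NLP-generated statements with the votes collected so far.
STATEMENTS = {
    "Paper topic is 'knowledge graphs'": [True, True, True, True],
    "Paper mentions entity 'Berlin'": [True, False],
    "Summary: the paper is about cooking": [False, False, False],
}

if __name__ == "__main__":
    print(visible_statements(STATEMENTS))
```
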
  55. Conclusion
  56. The grand KG challenges
      1. Neuro-symbolic AI: combination of knowledge graphs and machine learning
      2. Extend the concept of KGs (e.g. with graphlets)
      3. Integration of human and machine intelligence (e.g. with crowdsourcing)
  57. The Team Prof. (Univ. S. Bolivar) Dr. Maria Esther Vidal Software Development Dr. Kemele Endris Collaborators TIB Scientific Data Mgmt. Group Leaders PostDocs Project Management Doctoral Researchers Dr. Markus Stocker Dr. Gábor Kismihók Dr. Javad Chamanara Dr. Jennifer D’Souza Allard Oelen Yaser Jaradeh Manuel Prinz Alex Garatzogianni Collaborators InfAI Leipzig / AKSW Dr. Michael Martin Natanael Arndt Dr. Lars Vogt Vitalis Wiens Kheir Eddine Farfar Muhammad Haris Administration Katja Bartel Simone Matern
  58. Prof. Dr. Sören Auer
      TIB & Leibniz University of Hannover
      auer@tib.eu
      https://de.linkedin.com/in/soerenauer
      https://twitter.com/soerenauer
      https://www.xing.com/profile/Soeren_Auer
      http://www.researchgate.net/profile/Soeren_Auer
