Programming with Semantic Broad Data


Joint Keynote at Int. Conference on Knowledge Engineering and Semantic Web and Prague Computer Science Seminar, Prague, September 22, 2016

The challenges of Big Data are frequently described in terms of Volume, Velocity, Variety and Veracity. The large variety of data in organizations results from accessing different information systems with heterogeneous schemata or ontologies. In this talk I will present research efforts that target the management of such broad data.

They include: (i) an integrated development environment for programming with broad data, (ii) a query language that allows for typing of query results, (iii) a typed lambda-calculus based on description logics, and (iv) efficient access to data repositories via schema indices.


Programming with Semantic Broad Data

  1. Steffen Staab: Programming with Semantic Broad Data. Institute for Web Science and Technologies · University of Koblenz-Landau, Germany & Web and Internet Science Group · ECS · University of Southampton, UK. @ststaab, west.uni-koblenz.de
  2. The World of Big Data: Volume & Velocity. Genome data: up to 200 GB/person. Video data: upload of 300 hrs/min. Sensor data: 5000 sensors/jet engine, 1 Tbit/s. 360 TB/disc. Images: https://flic.kr/p/8zuDTm, https://flic.kr/p/59jc2h
  3. The World of Big Data: Volume & Velocity. Genome data: up to 200 GB/person. Video data: upload of 300 hrs/min. Sensor data: 5000 sensors/jet engine, 1 Tbit/s. Annotations: 18 concepts; noise amplitudes. Images: https://flic.kr/p/8zuDTm, https://flic.kr/p/59jc2h
  4. The World of Big Data: Variety. Data models: graph data; relational; XML; RDF; CSV; JPEG; MPEG-1, 2, 4; DICOM; PDF; Excel; ... Conceptual models, aka ER schemata, aka logical schemata, aka XML schemata, aka RDFS / OWL ontologies: Foaf, Dublin Core, Marc81, Unifact, ... Dozens to hundreds.
  5. The World of Big Data: Variety, 15 years ago. SAP: in the order of 10,000 'concepts'; days to find the right column. Medical information system (Lars): treating transplant patients; approx. 10,000 concepts. Callouts: only my very limited experiences; big consulting business.
  6. The World of Big Data: Variety, today! Wikidata: 1,148,230 concepts; 2515 relations. UMLS: 1 million concepts. Bioinformatics: 1000s of public databases; 35 in Bio2rdf (11 billion triples). eGov datasets: 200,000 by Fraunhofer FOKUS; 20,000 by ODI. Knowledge graphs: ask Google, Microsoft, Samsung, HP, ... Sensor types: 330 broad types in Wikipedia; tens of thousands. How to write valid, robust programs? How to find data?
  7. How to write a valid, robust program? Queries: SELECT ?x WHERE { ?x a CONCEPT15 } and SELECT ?x WHERE { ?x a CONCEPT151735 }. Labels: 18 concepts; 1,166,040 concepts; 1,148,230 concepts; Sept. '16; March '16. Image: https://flic.kr/p/8zuDTm
  8. How to approach big data. In the following I am guessing what Axel Polleres might have told you about Enterprise Linked Data.
  9. Traditional Information Architecture. Layers: business logic; structured data; unstructured data; presentation and interaction. Characteristics: processes are known; data structures are known; meaning of data primarily in schema and code.
  10. Big Data in Today's Information Architecture. Characteristics: little structure; semi-structured data; meaning of data of primary importance!
  11. Variety Issue 1: Data Models. Data models: relational; tree (XML, ...); document-oriented; stream; array; graph DB. RDF graph data model as common denominator.
  12. Dealing with Issue 1: RDF as Data Model. RDF graph data model as common denominator. Example graph: nodes Bowie, Sarandon, 8-1-1947; edges knows, bornOn.
  13. Variety Issue 2: Conceptual Models. Conceptual models: ER; UML; ... RDFS ontology as common denominator.
  14. Variety Issue 2: RDFS as common conceptual meta model. RDFS for explicit conceptual description. Example graph: nodes Bowie, Sarandon, 8-1-1947, MusicArtist, Actor; edges knows, bornOn, type, type.
  15. Variety Issue 3: System Boundaries. IRIs for globally unique referencing. Example graph: nodes m:Bowie, d:Sarandon, 8-1-1947, m:MusicArtist, d:Actor; edges f:knows, m:bornOn, rdf:type, rdf:type. Prefixes: m = http://musicbrainz.org; d = http://dbpedia.org; f = http://xmlns.com/foaf/0.1/; rdf = https://www.w3.org/2001/sw/
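
The example graph of slides 12 to 15 can be written down as a small set of triples. The following is a minimal sketch in plain Python, not code from the talk. It makes several assumptions: the direction of the `knows` edge (Bowie to Sarandon), the trailing slash added to the `m` and `d` prefixes, and the use of the standard RDF namespace IRI for `rdf` in place of the URL printed on the slide. The helper names `expand` and `types_of` are hypothetical.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain literal value, as opposed to an IRI-identified resource."""
    value: str


# Prefix table modelled on slide 15 (see the assumptions in the lead-in).
PREFIXES = {
    "m": "http://musicbrainz.org/",
    "d": "http://dbpedia.org/",
    "f": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# (subject, predicate, object) triples of the example graph.
TRIPLES = [
    ("m:Bowie", "f:knows", "d:Sarandon"),           # edge direction is assumed
    ("m:Bowie", "m:bornOn", Literal("8-1-1947")),
    ("m:Bowie", "rdf:type", "m:MusicArtist"),
    ("d:Sarandon", "rdf:type", "d:Actor"),
]


def expand(qname: str) -> str:
    """Turn a prefixed name such as 'f:knows' into a full IRI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


def types_of(subject: str) -> set:
    """Return the prefixed class names asserted for a subject via rdf:type."""
    return {o for s, p, o in TRIPLES if s == subject and p == "rdf:type"}


if __name__ == "__main__":
    print(expand("f:knows"))
    print(types_of("m:Bowie"))
```

Because every node and edge label expands to a globally unique IRI, triples from different systems (here MusicBrainz and DBpedia) can sit in one graph without name clashes.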
  16. A Practical Perspective on Broad Data with LITEQ
  17. Drosophila: Linked Open Data Cloud. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/. Dozens of domains; hundreds of data sources; thousands of concepts; millions of entities; billions of triples. Semantic Broad Data.
  18. Programming with Linked Data
  19. Programming with Linked Data: Tasks of the Programmer. 1. Schema exploration; 2. Programming code types; 3. Programming queries; 4. Programming procedures for creating, manipulating, and persisting objects.
  20. Node Path Query Language Using Autocompletion. Exploration of classes.
  21. Node Path Query Language Using Autocompletion. Exploration of classes; exploration of relations.
  22. Node Path Query Language: Query Formulation. Exploration of classes; exploration of relations; querying for instances. Type: set of mo:MusicArtist. No definition or declaration needed.
  23. Node Path Query Language for Code Development. Exploration of classes; exploration of relations; querying for instances; developing code with queries. All translated into SPARQL queries at: development time; type inference at compile time (but also as part of IDE); querying again at run time. One language to bind them all.
  24. Node Path Query Language for Code Development. Exploration of classes; exploration of relations; querying for instances; developing code with queries; developing code with new classes. All translated into SPARQL queries at: development time; run time update; persistence!
  25. Formal NPQL Syntax. Labels: data browsing; restricting class expressions; evaluating class expressions; navigating from data to classes; navigating from data to property types; URI set; intensional queries; extensional queries; navigational queries.
  26. NPQL Algebra (Example). Reversibility can be used to simplify path expressions.
  27. Summary on LITEQ: Language Integrated Types, Extensions, and Queries. NPQL (Node Path Query Language): navigational queries; intensional queries; extensional queries; compilation to SPARQL. LITEQ: implementation of NPQL as F# Type Provider in Visual Studio; autocompletion using NPQL queries; automatic typing of extensional query results by intensional queries.
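
The transcript does not preserve the NPQL grammar or its translation rules, so the following is only an illustrative sketch of what "compilation to SPARQL" could look like for the simplest case: start at a class, optionally follow a chain of properties, and ask for the resulting instances. The function `compile_extension` and its parameters are hypothetical and are not part of LITEQ. The Music Ontology and FOAF IRIs in the usage example are assumptions chosen to match the `mo:MusicArtist` example on the slides.

```python
def compile_extension(class_iri: str, path: tuple = ()) -> str:
    """Build a SPARQL SELECT for the instances reached from a class.

    class_iri: IRI of the starting class.
    path: property IRIs to follow, in order, from the class instances.
    """
    for iri in (class_iri, *path):
        # Reject characters that would break out of the <...> IRI syntax.
        if any(ch in iri for ch in '<>" '):
            raise ValueError(f"not a usable IRI: {iri!r}")

    patterns = [f"?x0 a <{class_iri}> ."]
    for i, prop in enumerate(path):
        patterns.append(f"?x{i} <{prop}> ?x{i + 1} .")

    target = f"?x{len(path)}"
    body = "\n  ".join(patterns)
    return f"SELECT DISTINCT {target} WHERE {{\n  {body}\n}}"


if __name__ == "__main__":
    # All music artists.
    print(compile_extension("http://purl.org/ontology/mo/MusicArtist"))
    # Everything made by some music artist (one navigation step).
    print(compile_extension(
        "http://purl.org/ontology/mo/MusicArtist",
        ("http://xmlns.com/foaf/0.1/made",),
    ))
```

In this sketch the same generated query text could be issued at development time (for autocompletion) and again at run time (for the actual data), which is the reuse the slides describe.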
  28. "That seems to work very well in practice, but how does it work in theory?" let allArtists = Store.NPQL().``mo:MusicArtist``.Extension What is implied by such a line... ...for the program? ...for the compiler?
  29. A Foundational Perspective on Semantic Broad Data Using λ-DL
  30. What we want to have: Static Type Checking. Programming with Data from a Knowledge Base. But: in LITEQ, queries must receive types; the number of types in our system is very/infinitely large; existing type systems expect complete knowledge. Issue in our prototype.
  31. Related Work. Generic types: everything is a node or an edge; no type checking! Result: only 2nd place in Halo competition. Mapping approaches: Hibernate; LITEQ; ActiveRDF; Summer / Winter; ... Callouts: preferred in SemWeb now; been there, done that.
  32. Example, and Issues with Mapping. Mapping DL types to PL types is problematic because of: 1. the mix of nominal (MusicArtist) and structural typing (recorded.Song); 2. schema-less information (influencedBy); 3. inference (hendrix:MusicArtist); 4. the sheer size of the terminology. How to type a query?
  33. Example Code. To be rejected: "is not subtype of". How to type a query?
  34. Example Code. To be accepted: "is a". How to type a query?
  35. What we want to have: Static Type Checking. Programming with Data from a Knowledge Base. Challenge: a programming language that accepts concept expressions as types and can deal with inferences. λ-DL.
  36. Core Ideas of λ-DL. Given: atomic types A = {..., A_i, ...} plus function types T = {..., A_i, ..., T_i → T_j, ...}. Add elements: concept expressions (intensional NPQL queries); instances (extensional NPQL queries). Add knowledge: typing and subtyping derived from the knowledge base.
  37. Description Logics Fragment.
      Concept-forming expressions (name: syntax, semantics):
      • Top: $\top$, $\Delta^I$
      • Bottom: $\bot$, $\emptyset$
      • Concept name: $A$, $A^I$
      • Intersection: $A \sqcap B$, $A^I \cap B^I$
      • Negation: $\neg A$, $\Delta^I \setminus A^I$
      • Existential restriction: $\exists R.C$, $\{ a \in \Delta^I \mid \exists b: (a,b) \in R^I \text{ and } b \in C^I \}$
      Axioms (name: syntax, semantics):
      • T-Box subclass: $C \sqsubseteq D$, $C^I \subseteq D^I$
      • A-Box concept assertion: $a : C$, $a^I \in C^I$
      • A-Box role assertion: $(a,b) : R$, $(a^I, b^I) \in R^I$
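
The semantics of the concept-forming expressions on slide 37 can be read as a recursive set computation over a finite interpretation. The sketch below is an illustration under stated assumptions, not the talk's implementation: the class names (`Top`, `Name`, `And`, ...) and the function `extension` are hypothetical, and the tiny interpretation (individuals `hendrix`, `dylan`, `purple_haze`) is invented for the example, loosely following the MusicArtist and recorded.Song examples mentioned on slide 32.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Top:
    pass


@dataclass(frozen=True)
class Bottom:
    pass


@dataclass(frozen=True)
class Name:
    name: str


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Not:
    arg: object


@dataclass(frozen=True)
class Exists:
    role: str
    filler: object


def extension(c, domain, concepts, roles):
    """Compute the set of individuals denoted by concept expression c.

    domain: set of all individuals.
    concepts: dict from concept name to its set of individuals.
    roles: dict from role name to its set of (a, b) pairs.
    """
    if isinstance(c, Top):
        return set(domain)
    if isinstance(c, Bottom):
        return set()
    if isinstance(c, Name):
        return set(concepts.get(c.name, ()))
    if isinstance(c, And):
        return (extension(c.left, domain, concepts, roles)
                & extension(c.right, domain, concepts, roles))
    if isinstance(c, Not):
        return set(domain) - extension(c.arg, domain, concepts, roles)
    if isinstance(c, Exists):
        fillers = extension(c.filler, domain, concepts, roles)
        return {a for (a, b) in roles.get(c.role, ()) if b in fillers}
    raise TypeError(f"unknown concept expression: {c!r}")


# Invented interpretation, for illustration only.
DOMAIN = {"hendrix", "dylan", "purple_haze"}
CONCEPTS = {"MusicArtist": {"hendrix", "dylan"}, "Song": {"purple_haze"}}
ROLES = {"recorded": {("hendrix", "purple_haze")}}

if __name__ == "__main__":
    artists_with_song = And(Name("MusicArtist"), Exists("recorded", Name("Song")))
    print(extension(artists_with_song, DOMAIN, CONCEPTS, ROLES))
```

This evaluates a closed, finite interpretation only. A description logic reasoner works under the open-world assumption and over all models of the knowledge base, which this sketch does not attempt.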
  38. λ-Calculus. Universal model of computation: abstraction; application. Example: λf.λx.f (f x). Evaluation rules.
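
The example term λf.λx.f (f x) takes a function f and an argument x and applies f twice. As a quick illustration (the name `twice` is chosen here and is not from the slides), abstraction corresponds to a Python `lambda` and application to a function call:

```python
# Abstraction: λf.λx.f (f x)
twice = lambda f: lambda x: f(f(x))

# Application: apply the term to a successor function and then to 0.
successor = lambda n: n + 1

if __name__ == "__main__":
    print(twice(successor)(0))         # 2
    print(twice(twice)(successor)(0))  # 4
```

Python evaluates arguments eagerly, so this mirrors a call-by-value reading of the evaluation rules.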
  39. Syntax for core λ-DL
  40. Core λ-DL: Evaluation and Typing. Nominal DL-Type.
  41. Subtyping: many types. Add KB knowledge only when needed for checking application, not proactively.
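
The idea on slide 41, consulting the knowledge base only when a function application has to be checked, can be sketched as follows. This is a simplified illustration and not the λ-DL type system: types are either atomic concept names or function types, the "reasoner" is a stand-in that walks told subclass axioms, and its answers are cached so that each question is asked at most once. The axioms in `TOLD_SUBCLASS` and all names (`Fun`, `kb_subsumes`, `is_subtype`, `check_application`) are hypothetical.

```python
from dataclasses import dataclass
from functools import lru_cache

# Hypothetical, acyclic told subclass axioms (not taken from the talk).
TOLD_SUBCLASS = {
    "MusicArtist": ("Artist",),
    "Artist": ("Person",),
}


@dataclass(frozen=True)
class Fun:
    """Function type arg -> res."""
    arg: object
    res: object


@lru_cache(maxsize=None)
def kb_subsumes(sub: str, sup: str) -> bool:
    """Stand-in for a reasoner call: is concept sub subsumed by sup?

    Called lazily, only for the pairs the type checker actually needs.
    """
    if sub == sup:
        return True
    return any(kb_subsumes(parent, sup) for parent in TOLD_SUBCLASS.get(sub, ()))


def is_subtype(s, t) -> bool:
    """Subtyping: KB lookup for concepts, the usual rule for functions."""
    if isinstance(s, Fun) and isinstance(t, Fun):
        # Contravariant in the argument, covariant in the result.
        return is_subtype(t.arg, s.arg) and is_subtype(s.res, t.res)
    if isinstance(s, str) and isinstance(t, str):
        return kb_subsumes(s, t)
    return False


def check_application(fun_type, arg_type):
    """Type of applying a function to an argument, or TypeError."""
    if not isinstance(fun_type, Fun):
        raise TypeError(f"{fun_type!r} is not a function type")
    if not is_subtype(arg_type, fun_type.arg):
        raise TypeError(f"{arg_type!r} is not a subtype of {fun_type.arg!r}")
    return fun_type.res


if __name__ == "__main__":
    # Accepted: a MusicArtist "is a" Person.
    print(check_application(Fun("Person", "Song"), "MusicArtist"))
    # Rejected: Person "is not subtype of" MusicArtist.
    try:
        check_application(Fun("MusicArtist", "Song"), "Person")
    except TypeError as err:
        print("rejected:", err)
```

The accepted and rejected cases correspond to the two situations on slides 33 and 34. With a terminology of 10^6 concepts, computing the full subtype relation up front would be wasteful, which is why the lookup happens per application.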
  42. Further issues and opportunities in λ-DL. Queries return sets: concept set type needed; set operators needed (Map, Fold, Element). Queries may return infinite sets: no theoretical problem, but lack of well-defined stopping conditions in KBs. Type dispatch based on inferencing.
  43. λ-DL Interpreter in F# and using HermiT
  44. Result for λ-DL. Theorem: A well-typed closed term does not get stuck during evaluation (with common exceptions). Typing is a safety net, but does not solve the halting problem (empty list).
  45. Conclusion
  46. Present of Broad Data. Broad data: has grown from 10^4 to 10^6 concepts (plus data); continues to grow (more integration of distributed databases, more sensors of different types, more crowdwork); has not been recognized as a problem of its own, yet; will lead to brittleness, high maintenance efforts, and loss of opportunities.
  47. Future of Broad Data. New methods for broad data: explore, understand; find; relate (see e.g. Linda's talk today); program; maintain.
  48. Thank you for your attention! Thanks to my collaborators for this work: Stefan Schegelmann, Martin Leinberger, Matthias Thimm (WeST, Koblenz); Evelyne Viegas (Microsoft Research, Redmond); Ralf Lämmel (SOFTLANG, Koblenz).
