LDIF Lightning Talk

LDIF translates heterogeneous Linked Data from the Web into a clean, local target representation while keeping track of data provenance.

  1. LDIF: Linked Data Integration Framework
  2. LINKED DATA CHALLENGES
     • Data sources that overlap in content may:
         • use a wide range of different RDF vocabularies
         • use different identifiers for the same real-world entity
         • provide conflicting values for the same properties
     • Implications:
         • Queries are usually hand-crafted against individual sources, no different from querying an API
         • Entities are merged in an improvised or manual way
     • Integrating public datasets with internal databases poses the same problems
  3. LDIF
     • LDIF homogenizes Linked Data from multiple sources into a clean, local target representation while keeping track of data provenance:
         1. Collect data: managed download and update
         2. Translate data into a single target vocabulary
         3. Resolve identifier aliases into local target URIs
         4. Cleanse data by resolving conflicting values
         5. Output
     • Open source (Apache License, Version 2.0)
     • Collaboration between Freie Universität Berlin and mes|semantics
  4. LDIF PIPELINE, step 1 of 5: Collect data
     Supported data sources:
     • RDF dumps (various formats)
     • SPARQL endpoints
     • Crawling Linked Data
  5. LDIF PIPELINE, step 2 of 5: Translate data
     Sources use a wide range of different RDF vocabularies.
     [Diagram: R2R maps dbpedia-owl:City, schema:Place and fb:location.citytown to local:City]
     • Mappings expressed in RDF (Turtle)
     • Simple mappings using OWL / RDFS statements (x rdfs:subClassOf y)
     • Complex mappings with SPARQL expressivity
     • Transformation functions
  6. LDIF PIPELINE, step 3 of 5: Resolve identities
     Sources use different identifiers for the same entity.
     [Diagram: among several places named Berlin (Berlin, Germany; Berlin, CT; Berlin, MD; Berlin, NJ; Berlin, MA), Silk identifies "Berlin" and "Berlin, Germany" (52° 31′ N, 13° 24′ E) as the same entity]
     • Profiles expressed in XML
     • Supports various comparators and transformations
  7. LDIF PIPELINE, step 4 of 5: Cleanse data
     Sources provide different values for the same property.
     [Diagram: one source (★★) gives Berlin's population as 3.4M, another (★★★) gives 3.5M; Sieve outputs a population of 3.5M]
     • Profiles expressed in XML
     • Supports various quality assessment policies and conflict resolution methods
  8. LDIF PIPELINE, step 5 of 5: Output
     Output options:
     • N-Quads
     • N-Triples
     • SPARQL Update stream
     • Provenance tracking using named graphs
  9. LDIF ARCHITECTURE
     [Diagram] Three layers, top to bottom:
     • Application Layer: application code, using SPARQL or an RDF API
     • Data Access, Integration and Storage Layer (LDIF): Web Data Access Module, Data Translation Module, Identity Resolution Module, Data Quality and Fusion Module, Integrated Web Data
     • Publication Layer: sources on the Web of Data, reached over HTTP: RDF/XML, RDFa, LD Wrappers around Database A and Database B, and a CMS
  10. LDIF VERSIONS
      • In-memory
          • keeps all intermediate results in memory
          • fast, but scalability is limited by local RAM
      • RDF Store (TDB)
          • stores intermediate results in a Jena TDB RDF store
          • can process more data than the in-memory version, but does not scale out to multiple machines
      • Cluster (Hadoop)
          • scales by parallelizing work across multiple machines using Hadoop
          • can process a virtually unlimited amount of data
  11. THANK YOU
      • Website: http://ldif.wbsg.de
      • Google group: http://bit.ly/ldifgroup
      • Supported in part by:
          • Vulcan Inc. as part of its Project Halo
          • EU FP7 project LOD2: Creating Knowledge out of Interlinked Data (Grant No. 257943)
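Step 1 (Collect data) lists SPARQL endpoints among the supported sources. LDIF's own access module handles this internally; the Python below is only a minimal sketch of what such a request looks like under the SPARQL 1.1 Protocol (query via GET, with the query text URL-encoded in the `query` parameter). The endpoint URL, the CONSTRUCT query and the requested `Accept` type are assumptions chosen for illustration, and the request is built but not sent.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Hypothetical endpoint; substitute the SPARQL endpoint of a real source.
ENDPOINT = "http://example.org/sparql"

# A CONSTRUCT query returns RDF triples rather than a result table, which
# suits loading data into a local store. The query itself is illustrative.
QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT { ?city a dbo:City ; rdfs:label ?label . }
WHERE { ?city a dbo:City ; rdfs:label ?label . }
LIMIT 100"""


def sparql_get_request(endpoint: str, query: str,
                       accept: str = "application/n-triples") -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request.

    The query text travels URL-encoded in the `query` parameter; the Accept
    header asks for an RDF serialization (support varies by endpoint).
    """
    sep = "&" if "?" in endpoint else "?"
    url = endpoint + sep + urlencode({"query": query})
    return Request(url, headers={"Accept": accept})


req = sparql_get_request(ENDPOINT, QUERY)
# Sending it would be: urllib.request.urlopen(req).read()
```

A managed download, as the LDIF slide describes it, would additionally schedule such requests and re-run them to keep the local copy up to date; that part is not sketched here.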
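Step 2 (Translate data) uses R2R mappings, written in RDF, to rewrite source vocabularies such as dbpedia-owl:City and fb:location.citytown into local:City. The Python below is not R2R's mapping language; it is a minimal sketch of the two simplest kinds of mapping on the slide: a class mapping in the spirit of `x rdfs:subClassOf y`, and a property mapping combined with a transformation function. The `local:` namespace URI, the chosen property and the comma-stripping transformation are assumptions, and dropping unmapped triples is a design choice of this sketch.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LOCAL = "http://example.org/local#"  # hypothetical target vocabulary

# Class mappings in the spirit of "x rdfs:subClassOf y": every instance of a
# source class becomes an instance of the target class local:City.
CLASS_MAP: Dict[str, str] = {
    "http://dbpedia.org/ontology/City": LOCAL + "City",
    "http://rdf.freebase.com/ns/location.citytown": LOCAL + "City",
}

# Property mappings paired with a transformation function applied to the value.
# Here: strip thousands separators from population literals (illustrative).
PROPERTY_MAP: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "http://dbpedia.org/ontology/populationTotal":
        (LOCAL + "population", lambda v: v.replace(",", "")),
}


def translate(triples: Iterable[Triple]) -> Iterator[Triple]:
    """Rewrite triples into the target vocabulary; unmapped triples are dropped."""
    for s, p, o in triples:
        if p == RDF_TYPE and o in CLASS_MAP:
            yield (s, RDF_TYPE, CLASS_MAP[o])
        elif p in PROPERTY_MAP:
            target_property, transform = PROPERTY_MAP[p]
            yield (s, target_property, transform(o))
```

Mappings that need joins or conditions across several triples correspond to the "complex mappings with SPARQL expressivity" on the slide and are beyond this lookup-table sketch.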
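Step 3 (Resolve identities) uses Silk, configured with XML profiles of comparators and transformations, to decide which "Berlin" records denote the same place. The Python below is not Silk's API; it is a minimal sketch of the same idea under stated assumptions: a transformation (keep the label text before the first comma, lower-cased), a string comparator on the result, a geographic distance comparator, and a rule that links two records only if both pass. Linked URIs are then grouped with union-find and rewritten to one minted local URI per group. All URIs, thresholds and the local URI scheme are hypothetical; the coordinates are approximate.

```python
import math
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical records from different sources: label plus (lat, lon) in degrees.
RECORDS: Dict[str, dict] = {
    "http://dbpedia.org/resource/Berlin": {"label": "Berlin", "coords": (52.52, 13.405)},
    "http://example.org/geo/berlin-de": {"label": "Berlin, Germany", "coords": (52.5167, 13.4)},
    "http://example.org/geo/berlin-ct": {"label": "Berlin, CT", "coords": (41.62, -72.75)},
}


def normalize(label: str) -> str:
    """Transformation: keep the part before the first comma, lower-cased."""
    return label.split(",")[0].strip().lower()


def label_similarity(a: str, b: str) -> float:
    """String comparator on normalized labels (1.0 means identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def distance_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Geographic comparator: great-circle (haversine) distance in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def find_links(records, min_label=0.9, max_km=50.0) -> List[Tuple[str, str]]:
    """Link two records only if BOTH comparators pass their thresholds."""
    return [
        (a, b)
        for a, b in combinations(sorted(records), 2)
        if label_similarity(records[a]["label"], records[b]["label"]) >= min_label
        and distance_km(records[a]["coords"], records[b]["coords"]) <= max_km
    ]


def assign_local_uris(uris, links, prefix="http://example.org/local/entity/"):
    """Union-find over the links, then mint one local URI per cluster."""
    parent = {u: u for u in uris}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # deterministic representative
    clusters = sorted({find(u) for u in uris})
    local = {rep: f"{prefix}{i}" for i, rep in enumerate(clusters)}
    return {u: local[find(u)] for u in uris}
```

With these records, `find_links(RECORDS)` links the DBpedia resource only to "Berlin, Germany": "Berlin, CT" has the same normalized label but lies thousands of kilometres away, which is why the label comparator alone would not be enough.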
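Step 4 (Cleanse data) uses Sieve, which assesses source quality and then resolves conflicting values. Sieve supports several policies; the Python below sketches just one plausible conflict resolution method, "keep the value from the highest-scored source", with the per-graph scores given directly as numbers standing in for the slide's star ratings. The function name, URIs and scores are hypothetical, and this is not Sieve's configuration format.

```python
from typing import Dict, Iterable, Tuple

# (subject, property, value, source graph)
Quad = Tuple[str, str, str, str]


def fuse_keep_best(quads: Iterable[Quad],
                   graph_score: Dict[str, float]) -> Dict[Tuple[str, str], Tuple[str, str]]:
    """For each (subject, property), keep the value from the highest-scored graph.

    Graphs without a score count as 0.0; on a tie the first value seen wins.
    The surviving graph is returned alongside the value to preserve provenance.
    """
    best: Dict[Tuple[str, str], Tuple[str, float, str]] = {}
    for s, p, value, graph in quads:
        score = graph_score.get(graph, 0.0)
        key = (s, p)
        if key not in best or score > best[key][1]:
            best[key] = (value, score, graph)
    return {key: (value, graph) for key, (value, _score, graph) in best.items()}


BERLIN = "http://example.org/local/entity/0"   # hypothetical local URI
POP = "http://example.org/local#population"    # hypothetical property
quads = [
    (BERLIN, POP, "3400000", "http://example.org/graph/A"),
    (BERLIN, POP, "3500000", "http://example.org/graph/B"),
]
# Scores standing in for the slide's star ratings (2 vs 3 stars).
scores = {"http://example.org/graph/A": 2.0, "http://example.org/graph/B": 3.0}
fused = fuse_keep_best(quads, scores)
```

As in the slide's example, the 3.5M value from the better-rated source survives. Other policies (for example averaging numeric values) would replace only the selection rule inside the loop.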
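Step 5 (Output) can write N-Quads and tracks provenance with named graphs: the fourth element of each quad names the graph a statement belongs to, and statements about that graph can record where it came from. The Python below is a minimal sketch of the N-Quads line format only, not LDIF's writer. The `Literal` wrapper, the graph URIs and the use of `dcterms:source` in a separate provenance graph are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A plain string literal; any bare str term is treated as an IRI."""
    value: str


Term = Union[str, Literal]


def escape(text: str) -> str:
    # Backslash first, so the escapes added afterwards are not doubled.
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def n_term(term: Term) -> str:
    if isinstance(term, Literal):
        return f'"{escape(term.value)}"'
    return f"<{term}>"


def to_nquads(quads: Iterable[Tuple[Term, Term, Term, Term]]) -> str:
    """Serialize (subject, predicate, object, graph) quads, one per line."""
    return "".join(
        f"{n_term(s)} {n_term(p)} {n_term(o)} {n_term(g)} .\n" for s, p, o, g in quads
    )


GRAPH_B = "http://example.org/graph/B"      # hypothetical source graph
PROV = "http://example.org/provenance"      # hypothetical provenance graph
doc = to_nquads([
    # The fused value stays in the named graph it came from ...
    ("http://example.org/local/entity/0", "http://example.org/local#population",
     Literal("3500000"), GRAPH_B),
    # ... and a statement about that graph records where it was imported from.
    (GRAPH_B, "http://purl.org/dc/terms/source", "http://example.org/sparql", PROV),
])
```

Dropping the fourth term from each line would give N-Triples, the second output option, at the cost of losing the per-graph provenance.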
