Introduction to linked data

Researcher at University of Modena and Reggio Emilia
18 Jul 2016

  1. Introduction to Linked Data. Laura Po. Exploration, Visualization and Querying of Linked Open Data sources. 2nd Keystone Training School: Keyword Search in Big Linked Data, University of Santiago de Compostela (USC), Spain.
  2. Objectives By the end of this module you should understand: • What linked data is • What open data is • The difference between linked and open data • How to publish linked data (the 5-star scheme) • The linked data principles and the linked data technologies (the semantic web stack) • The economic and social impact of linked data
  3. The Web of Data ● The evolution from a Web of linked documents to a Web of linked data ● The Web as a huge decentralized database (knowledge base) of machine-accessible data [Figure: Web of documents vs. Web of linked data]
  4. The evolution of the web • The Web started as a collection of documents published online, each accessible at a Web location identified by a URL. • These documents often contain data about real-world resources which is mainly human-readable and cannot be understood by machines. • The Web of Data is about enabling access to this data by making it available in machine-readable formats and connecting it using Uniform Resource Identifiers (URIs), so that people and machines can collect the data and put it together to do all kinds of things with it (as permitted by the licence). Machine-readable data (or metadata) is data in a format that can be interpreted by a computer. Two types of machine-readable data: • human-readable data that is marked up so that it can also be understood by computers, e.g. microformats, RDFa; • data formats intended principally for computers, e.g. RDF, XML and JSON.
  5. Linked Data and the 'Web of Data' ● The term refers to an idea originally from Tim Berners-Lee (Tim Berners-Lee, Linked Data, 2006, http://www.w3.org/DesignIssues/LinkedData.html) ● A set of best practices for publishing and linking structured data on the web ● Basic assumption: the value of data on the web increases when it is connected to other data sources (M. Hausenblas, Quick Linked Data Introduction, http://www.slideshare.net/mediasemanticweb/quick-linked-data-introduction) ● The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.
  6. Defining linked data “Linked data is a set of design principles for sharing machine-readable data on the Web for use by public administrations, business and citizens.” EC ISA Case Study: How Linked Data is transforming eGovernment
  7. Linked Data Principles 1. Use URIs as names for things. 2. Use HTTP URIs, so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) 4. Include links to other URIs, so that they can discover more things.
  8. How to get Data from the Web? ● Data can only be found on the Web if it is available at some website [Figure: a browser talks to a web server over HTTP; the web server reads a database via JDBC]
  9. How to get Data from the Web? ● There are a number of different (proprietary) Web APIs and data exchange formats, with mashups built on top of them [Figure: Databases 1 to 4, each exposed through its own Web API 1 to 4, combined in a mashup]
  10. In the Web today... ● Data is locked up in small data islands ● Other applications usually cannot access this data... [Figure: ten isolated databases]
  11. Semantic Web Technologies, Dr. Harald Sack, Hass http://www.w3.org/2009/Talks/0204-ted-tbl/#(22)
  12. How to get rid of Closed Data Islands? ● Apply Semantic Web technologies ○ to publish (structured) data on the web ○ to draw connections from one data source to data from other data sources [Figure: Databases 1 to 4, each published as RDF data]
  13. Linked Data Principles (1/4) 1. Use URIs as names for things. ○ URIs identify not only documents but also arbitrary real-world objects and abstract concepts ○ Examples: https://viaf.org/viaf/32197206/ ; http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart ; http://musicbrainz.org/artist/20244d07-534f-4eff-b4d4-930878889970 ; http://www.imdb.com/title/tt3659388
  14. Linked Data Principles (2/4) 2. Use HTTP URIs, so that people can look up those names. ○ HTTP URIs (URLs) as globally unique names enable dereferencing of associated information on the Web ○ Via HTTP content negotiation, machines and humans can access the resource identified by the URI [Figure: http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart is the URI that represents the designatum (the thing itself); http://dbpedia.org/page/Wolfgang_Amadeus_Mozart is a designator for the HTML document (for humans); http://dbpedia.org/data/Wolfgang_Amadeus_Mozart is a designator for the RDF document (for machines)] Dereferenceable: every term in a LOD source must be accessible via its URI through an HTTP GET. Once we access the URI, we find the definition of the term.
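
The lookup described on this slide can be sketched in Python. This is a minimal illustration, not DBpedia's documented API: the `/resource/`, `/page/` and `/data/` paths are taken from the slide, while the function names and the `text/turtle` media type are assumptions chosen for the example. The request is only prepared, not sent.

```python
from urllib.request import Request

RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def representation_uris(resource_uri):
    """Map a resource URI to the two document URIs shown on the slide."""
    if not resource_uri.startswith(RESOURCE_PREFIX):
        raise ValueError("not a DBpedia resource URI")
    name = resource_uri[len(RESOURCE_PREFIX):]
    return {
        "html": "http://dbpedia.org/page/" + name,  # document for humans
        "rdf": "http://dbpedia.org/data/" + name,   # document for machines
    }


def build_lookup(resource_uri, accept="text/turtle"):
    """Prepare (but do not send) an HTTP GET asking for a given media type.

    The Accept header is what drives content negotiation; the default
    media type here is an assumption for illustration.
    """
    return Request(resource_uri, headers={"Accept": accept}, method="GET")


mozart = "http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart"
docs = representation_uris(mozart)
request = build_lookup(mozart)
print(docs["html"])
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing `request` to `urllib.request.urlopen` would perform the actual GET; a browser sending `Accept: text/html` to the same URI would be directed to the HTML page instead.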
  15. Linked Data Principles (3/4) 3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL). ○ RDF as universal data model for publishing structured data on the Web ○ Make all URIs in the RDF graph dereferenceable ○ Avoid RDF constructs that cause problems in a Linked Data context ■ RDF Reification ■ RDF Collections and Containers ■ unnamed Blank Nodes
  16. Linked Data Principles (4/4) 4. Include links to other URIs, so that they can discover more things. ○ Set RDF links between data in different data sources: ○ owl:sameAs: creates an identity link between individuals ○ rdfs:seeAlso: states that a resource may provide additional information ○ Relationship links: links to external LOD entities related to the original entity ○ Identity links: links to external LOD entities referring to the same object or concept ○ Vocabulary links: links to definitions of the original entity
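
Because owl:sameAs is symmetric and transitive, identity links from several sources can be merged into sets of URIs that name the same thing. The following Python sketch does this with a small union-find structure. The URIs are the ones from slide 13, but the links between them are made up for illustration, as are the `example.org` URIs.

```python
def same_as_clusters(links):
    """Group URIs connected by owl:sameAs links into identity sets."""
    parent = {}

    def find(uri):
        # Return the representative of the set that contains uri.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in links:
        parent[find(left)] = find(right)  # merge the two sets

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Illustrative links only: not retrieved from any of the datasets.
links = [
    ("http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart",
     "https://viaf.org/viaf/32197206/"),
    ("http://musicbrainz.org/artist/20244d07-534f-4eff-b4d4-930878889970",
     "http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart"),
    ("http://example.org/a", "http://example.org/b"),
]
clusters = same_as_clusters(links)
for cluster in clusters:
    print(cluster)
```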
  17. Advantages of Linked Open Data vs. APIs ○ A simple, generic API for heterogeneous data sources enables simple reuse and data sharing among applications ○ The RDF data model guarantees (simple) extensibility ○ Transport via HTTP on the standard port 80 avoids firewall reconfiguration ○ Ontologies enable meaningful connections between data sources ○ Reasoning over Linked Data makes it possible to generate new knowledge, i.e. to infer explicit knowledge from implicit knowledge
  18. The Semantic Web Technology Stack: URI (Uniform Resource Identifier). Example: http://dbpedia.org/resource/Santiago_de_Compostela identifies Santiago de Compostela.
  19. From Wikipedia to DBpedia https://en.wikipedia.org/wiki/Santiago_de_Compostela http://dbpedia.org/resource/Santiago_de_Compostela
  20. From Wikipedia to DBpedia http://dbpedia.org/resource/Santiago_de_Compostela
  21. RDF (Resource Description Framework). From Wikipedia to DBpedia: http://dbpedia.org/resource/Santiago_de_Compostela ● :Santiago_de_Compostela rdf:type dbo:City . ● :Santiago_de_Compostela dbo:country dbr:Spain . ● :Santiago_de_Compostela owl:sameAs geodata:Santiago di Compostela . ● dbr:University_of_Santiago_de_Compostela dbp:city dbr:Santiago_de_Compostela . ● :Santiago_de_Compostela dbp:populationTotal 95671 (xsd:integer) . ● ... Each statement is an RDF triple made of an RDF subject, an RDF property and an RDF object, e.g. :Santiago rdf:type dbo:City .
  22. Resource Description Framework ● Resource ○ can be anything ○ must be uniquely identified and referenceable via a URI ● Description ○ = description of resources ○ by representing properties and relationships among resources as graphs ● Framework ○ = combination of web-based protocols and formats (URI, HTTP, XML, Turtle, JSON, …) ○ based on a formal model (semantics) ● Knowledge in RDF is expressed as a list of statements ● All RDF statements follow the same simple schema (= RDF triple)
  23. Resource Description Framework ● RDF building blocks: an RDF statement (RDF triple) = Subject (URI) + Property (URI) + Object / Value (URI or literal) ● N-Triples serialization: <http://dbpedia.org/resource/Santiago_de_Compostela> <http://dbpedia.org/ontology/populationTotal> "95671" . ● Graph representation: the same triple drawn as an edge labelled <http://dbpedia.org/ontology/populationTotal> from the node <http://dbpedia.org/resource/Santiago_de_Compostela> to the literal "95671"
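
As a rough illustration of the N-Triples serialization shown on this slide, the Python sketch below writes one triple as one line. It covers only URIs and plain string literals; blank nodes, language tags and datatypes are left out, and the `("uri", ...)` / `("literal", ...)` tagging is a convention invented for this example.

```python
def nt_term(term):
    """Serialise one term given as ("uri", value) or ("literal", value)."""
    kind, value = term
    if kind == "uri":
        return "<" + value + ">"
    if kind == "literal":
        # Escape the characters that may not appear raw in a literal.
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n")
                        .replace("\r", "\\r"))
        return '"' + escaped + '"'
    raise ValueError("unknown term kind: " + kind)


def nt_line(subject, prop, obj):
    """Serialise a triple as a single N-Triples line."""
    if subject[0] != "uri" or prop[0] != "uri":
        # Simplification for this sketch: no blank-node subjects.
        raise ValueError("subject and property must be URIs in this sketch")
    return " ".join(nt_term(t) for t in (subject, prop, obj)) + " ."


line = nt_line(
    ("uri", "http://dbpedia.org/resource/Santiago_de_Compostela"),
    ("uri", "http://dbpedia.org/ontology/populationTotal"),
    ("literal", "95671"),
)
print(line)
```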
  24. Resource Description Framework ● URIs and Literals ○ URIs reference resources uniquely ○ Literals describe data values that don't have a separate existence ● Example: <http://dbpedia.org/resource/Santiago_de_Compostela> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Spain> . <http://dbpedia.org/resource/Santiago_de_Compostela> <http://dbpedia.org/ontology/populationTotal> "95671" .
  25. The Semantic Web Technology Stack: RDF Schema. Example for http://dbpedia.org/ontology/City: dbo:City rdf:type owl:Class . dbo:City rdfs:subClassOf dbo:Settlement . dbo:foundationPlace rdfs:range dbo:City . ... [Figure: City rdfs:subClassOf Settlement; foundationPlace rdfs:range City]
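
The schema statements on this slide let a machine derive facts that were never stated. The Python sketch below applies two RDFS entailment patterns (class membership through rdfs:subClassOf, and typing of objects through rdfs:range) until nothing new appears. The schema triples come from the slide; `ex:SomeCompany` and `ex:SomeTown` are hypothetical instance data, and the sketch ignores literals and every other RDFS rule.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply two RDFS entailment patterns until a fixed point is reached."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == RDF_TYPE:
                # x rdf:type C, C rdfs:subClassOf D  =>  x rdf:type D
                new |= {(s, RDF_TYPE, d)
                        for c, q, d in facts if q == SUBCLASS and c == o}
            # s p o, p rdfs:range C  =>  o rdf:type C
            new |= {(o, RDF_TYPE, c)
                    for prop, q, c in facts if q == RANGE and prop == p}
        if new <= facts:
            return facts
        facts |= new


graph = {
    ("dbo:City", SUBCLASS, "dbo:Settlement"),
    ("dbo:foundationPlace", RANGE, "dbo:City"),
    ("dbr:Santiago_de_Compostela", RDF_TYPE, "dbo:City"),
    ("ex:SomeCompany", "dbo:foundationPlace", "ex:SomeTown"),  # hypothetical
}
closure = rdfs_closure(graph)
for triple in sorted(closure - graph):
    print(triple)
```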
  26. The Semantic Web Technology Stack: description logics + logical rules. [Figure: the class City and the entities Spain and Madrid, connected by dbo:country, rdf:type and rdfs:subClassOf] ● Logical constraint: Small_town ∩ Capital = ∅ ● Rule: ∀x. ( City(x) ∧ seatOfGovernment(x) → Capital(x) )
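
The rule and the disjointness constraint on this slide can be tried out on a toy knowledge base. In this Python sketch, classes are plain strings and seatOfGovernment is modelled as a set of entities. `Villa_X` is a hypothetical entity added only to trigger the constraint; a real reasoner works on OWL axioms, not on dictionaries.

```python
def infer_capitals(types, seats_of_government):
    """Apply City(x) and seatOfGovernment(x) -> Capital(x) to every entity."""
    inferred = {entity: set(classes) for entity, classes in types.items()}
    for entity, classes in inferred.items():
        if "City" in classes and entity in seats_of_government:
            classes.add("Capital")
    return inferred


def disjointness_violations(types, left="Small_town", right="Capital"):
    """Entities that belong to two classes declared disjoint."""
    return sorted(e for e, classes in types.items()
                  if left in classes and right in classes)


types = {
    "Madrid": {"City"},
    "Villa_X": {"City", "Small_town"},  # hypothetical entity
}
seats = {"Madrid", "Villa_X"}

inferred = infer_capitals(types, seats)
violations = disjointness_violations(inferred)
print(sorted(inferred["Madrid"]))
print(violations)
```

The second print shows why constraints matter: the rule makes `Villa_X` a Capital, which contradicts Small_town ∩ Capital = ∅, so the input data must be inconsistent.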
  27. The Semantic Web Technology Stack: SPARQL query against http://dbpedia.org/sparql. Look for all cities located in the same area as Santiago de Compostela (use the property dbp:subdivisionName): PREFIX dcterms: <http://purl.org/dc/terms/> PREFIX skos: <http://www.w3.org/2004/02/skos/core#> PREFIX dbp: <http://dbpedia.org/property/> PREFIX dbr: <http://dbpedia.org/resource/> SELECT DISTINCT ?area ?city FROM <http://dbpedia.org/> WHERE { dbr:Santiago_de_Compostela dbp:subdivisionName ?area . ?city dbp:subdivisionName ?area . }
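
A query like this is sent to the endpoint over HTTP. The SPARQL protocol allows a GET request that carries the query text in a `query` parameter, which the Python sketch below builds without contacting the server. The triple patterns assume that the city is the subject of dbp:subdivisionName and the area its object; that is an assumption about how DBpedia models the property, not something verified here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed modelling: <city> dbp:subdivisionName <area>.
QUERY = """PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?area ?city
WHERE {
  dbr:Santiago_de_Compostela dbp:subdivisionName ?area .
  ?city dbp:subdivisionName ?area .
}"""


def sparql_get_url(endpoint, query):
    """Build the URL of a SPARQL-protocol GET request (nothing is sent)."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Fetching `url` with an `Accept: application/sparql-results+json` header would be the usual next step; the result format an endpoint actually returns depends on its configuration.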
  28. http://dbpedia.org/sparql Look for all cities located in the same area as Santiago de Compostela (use the property dbp:subdivisionName)
  29. SPARQL (SPARQL Protocol and RDF Query Language) ● A query language for RDF whose syntax was designed to resemble SQL, the language used to retrieve data from relational databases. ● Different query forms: • SELECT returns variables and their bindings directly. • CONSTRUCT returns a single RDF graph specified by a graph template. • ASK tests whether or not a query pattern has a solution and returns yes/no. • DESCRIBE returns a single RDF graph containing RDF data about resources.
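
To make the difference between the query forms concrete, here is a toy pattern matcher in Python over an in-memory list of triples. It is a teaching sketch, not a SPARQL engine: variables are strings starting with `?`, there are no filters or optional patterns, and the CONSTRUCT template is limited to a single triple. `ex:hasCity` is a made-up property.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return result


def solutions(graph, patterns):
    """All variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


def select(graph, variables, patterns):
    """SELECT: a table of variable bindings."""
    return [tuple(b[v] for v in variables) for b in solutions(graph, patterns)]


def ask(graph, patterns):
    """ASK: does at least one solution exist?"""
    return bool(solutions(graph, patterns))


def construct(graph, template, patterns):
    """CONSTRUCT: a new graph built by filling a one-triple template."""
    return {tuple(b.get(t, t) for t in template)
            for b in solutions(graph, patterns)}


graph = [
    ("dbr:Santiago_de_Compostela", "rdf:type", "dbo:City"),
    ("dbr:Santiago_de_Compostela", "dbo:country", "dbr:Spain"),
    ("dbr:University_of_Santiago_de_Compostela", "dbp:city",
     "dbr:Santiago_de_Compostela"),
]

cities = select(graph, ["?x"], [("?x", "rdf:type", "dbo:City")])
in_france = ask(graph, [("?x", "dbo:country", "dbr:France")])
inverse = construct(graph, ("?c", "ex:hasCity", "?x"),
                    [("?x", "dbo:country", "?c")])
print(cities, in_france, inverse)
```

SELECT yields rows, ASK yields a boolean, and CONSTRUCT yields triples again, which is the distinction the slide draws.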
  30. SQL versus SPARQL • Data model: SQL is based on relations (tables); SPARQL is based on labelled directed graphs. • Source: in SQL the relations (tables) to be matched over should be indicated; SPARQL assumes a default graph (the FROM clause populates this with specific identified subgraphs). • Results: SQL (retrieval) queries produce a relation from a relation; SPARQL SELECT queries produce a relation from a graph, and CONSTRUCT queries (considered later) produce a graph from a graph.
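
One way to see the contrast is to store a small graph in a single three-column table and answer a graph-shaped question with SQL. In the Python sketch below (using the standard `sqlite3` module), the two triple patterns `?uni dbp:city ?city . ?city dbo:country dbr:Spain .` become a self-join on a table that must be named explicitly. The table layout and the data are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("dbr:Santiago_de_Compostela", "rdf:type", "dbo:City"),
        ("dbr:Santiago_de_Compostela", "dbo:country", "dbr:Spain"),
        ("dbr:University_of_Santiago_de_Compostela", "dbp:city",
         "dbr:Santiago_de_Compostela"),
    ],
)

# Each triple pattern is one alias of the table; a shared variable
# (?city) turns into a join condition.
rows = conn.execute(
    """
    SELECT t1.s AS uni, t1.o AS city
    FROM triples AS t1
    JOIN triples AS t2 ON t2.s = t1.o
    WHERE t1.p = 'dbp:city'
      AND t2.p = 'dbo:country'
      AND t2.o = 'dbr:Spain'
    """
).fetchall()
print(rows)
conn.close()
```

The SQL version has to say which table to read and how to join it; the SPARQL version only states the graph pattern and relies on the default graph.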
  31. The application of the Linked Data Principles leads to a 'Web of Data': >1014 datasets, >74B RDF triples, 808M links (as of August 2014)
  32. The Development of the Web of Data May 2007
  33. The Development of the Web of Data Nov 2007
  34. The Development of the Web of Data
  35. The Development of the Web of Data July 2009
  36. The Development of the Web of Data Aug 2014
  37. Linked Open Data ○ Public Linked Data resources on the Web, published under an open licence such as Creative Commons CC-BY ○ Tim Berners-Lee's 5-Star Criteria for Linked Open Data: ★ Available on the web (whatever format) but with an open licence, to be Open Data ★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table) ★★★ As (2), plus a non-proprietary format (e.g. CSV instead of Excel) ★★★★ All the above, plus: use open standards from W3C (URI, RDF and SPARQL) to identify things, so that people can point at your stuff ★★★★★ All the above, plus: link your data to other people's data to provide context
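
Since each star presupposes the previous ones, a dataset's rating is the number of criteria it meets in order before the first one it fails. A small Python sketch of that reading follows; the parameter names are invented for the example.

```python
def star_rating(open_licence_on_web, machine_readable, non_proprietary_format,
                w3c_standards, linked_to_other_data):
    """Count stars cumulatively: stop at the first unmet criterion."""
    criteria = (
        open_licence_on_web,     # 1 star
        machine_readable,        # 2 stars
        non_proprietary_format,  # 3 stars
        w3c_standards,           # 4 stars
        linked_to_other_data,    # 5 stars
    )
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


scanned_pdf = star_rating(True, False, False, False, False)
csv_file = star_rating(True, True, True, False, False)
linked_rdf = star_rating(True, True, True, True, True)
print(scanned_pdf, csv_file, linked_rdf)
```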
  38. Linked Open Data http://5stardata.info/en/
  39. 8 principles of Open Government Data (December 2007): • Complete • Primary (not aggregate) • Up to date • Accessible • Machine processable • Non-discriminatory • Non-proprietary • No license fees. https://opengovdata.org/
  40. Linked Data vs Open Data ● Open data: data can be published and be publicly available under an open licence without linking to other data sources. ● Linked data: data can be linked to URIs from other data sources, using open standards such as RDF, without being publicly available under an open licence. “Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.” (OpenDefinition.org) See also: Cobden et al., A research agenda for Linked Closed Data, http://ceur-ws.org/Vol-782/CobdenEtAl_COLD2011.pdf
  41. Linked (open) government data • Flexible data integration: LOGD facilitates data integration and enables the interconnection of previously disparate government datasets. • Increase in data quality: The increased (re)use of LOGD triggers a growing demand to improve data quality. Through crowd-sourcing and self-service mechanisms, errors are progressively corrected. • New services: The availability of LOGD gives rise to new services offered by the public and/or private sector. • Cost reduction: The reuse of LOGD in e-Government applications leads to considerable cost reductions. See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
  42. Key milestones for linked government data
  43. Linked Data - A Guided Tour ● Datasets ordered by category http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
  44. Government ● 183 datasets ● top 10 highest indegree: reference.data.gov.uk ● 48 proprietary vocabularies used ● c. 21% fully dereferenceable. Dereferenceable: every term in a LOD source must be accessible via its URI through an HTTP GET; once we access the URI, we find the definition of the term. The dereferenceability quota of a LOD source is defined as the number of dereferenceable terms divided by the number of all terms collected in the source. Fully dereferenceable LOD source: a definition exists for all URIs. Partially dereferenceable LOD source: a definition could be retrieved for some terms, but not for all.
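
The quota defined on this slide is a simple ratio, sketched below in Python. The input is assumed to be a mapping from each term URI to whether an HTTP GET returned a definition; collecting that mapping (the actual lookups) is outside the sketch, and the third label, for sources where no term resolves, is an addition not named on the slide.

```python
def dereferenceability_quota(term_status):
    """Dereferenceable terms divided by all terms of the source."""
    if not term_status:
        raise ValueError("source has no terms")
    return sum(1 for ok in term_status.values() if ok) / len(term_status)


def classify(term_status):
    """Label a source according to its quota."""
    quota = dereferenceability_quota(term_status)
    if quota == 1:
        return "fully dereferenceable"
    if quota > 0:
        return "partially dereferenceable"
    return "not dereferenceable"  # label added for this sketch


# Hypothetical lookup results for four terms of one source.
status = {
    "http://example.org/vocab#a": True,
    "http://example.org/vocab#b": True,
    "http://example.org/vocab#c": False,
    "http://example.org/vocab#d": False,
}
quota = dereferenceability_quota(status)
label = classify(status)
print(quota, label)
```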
  45. Media ● 22 datasets ● 22 proprietary vocabularies used ● 0% fully dereferenceable ● 9% partially dereferenceable
  46. User Generated Content ● 48 datasets ● top 10 highest outdegree: semanticweb.org ● 30 proprietary vocabularies used ● 13% fully dereferenceable ● 10% partially dereferenceable
  47. Linguistics ● no statistics available so far
  48. Bibliographic Data ● 96 datasets ● top 10 highest indegree: data.semanticweb.org ● top 10 highest outdegree: bibsonomy.org ● 58 proprietary vocabularies used ● 21% fully dereferenceable ● 7% partially dereferenceable
  49. Life Sciences ● 83 datasets ● 35 proprietary vocabularies used ● 28% fully dereferenceable ● 6% partially dereferenceable
  50. Cross Domain ● 41 datasets ● top 10 highest indegree: dbpedia.org, w3.org, lexvo.org ● 55 proprietary vocabularies used ● 27% fully dereferenceable ● 11% partially dereferenceable
  51. Social Networking ● 520 datasets ● top 10 highest indegree: quitter.se, status.net, … ● top 10 highest outdegree: deri.org, harth.org, ... ● 128 proprietary vocabularies used ● 16% fully dereferenceable ● 6% partially dereferenceable
  52. Geographic ● 21 datasets ● top 10 highest indegree: geonames.org ● 24 proprietary vocabularies used ● 21% fully dereferenceable ● 4% partially dereferenceable
  53. Linked Data Ontologies ● Ontologies hold the Linked Data Cloud together ● OWL ○ owl:sameAs connects identical individuals ○ owl:equivalentClass connects equivalent classes
  54. Linked Data Ontologies ● Ontologies hold the Linked Data Cloud together ● SKOS ○ "Simple Knowledge Organization System" ○ based on RDF and RDFS ○ applied for definitions and mappings of vocabularies and ontologies ■ skos:Concept (classes) ■ skos:narrower ■ skos:broader ■ skos:related ■ skos:exactMatch (vocabulary) ■ skos:narrowMatch ■ skos:broadMatch ■ skos:relatedMatch
  55. Linked Data Ontologies ● Ontologies hold the Linked Data Cloud together ● UMBEL ○ "Upper Mapping and Binding Exchange Layer" ○ Subset of OpenCyc as RDF triples based on SKOS and OWL 2 ○ Upper ontology with 28,000 concepts (skos:Concept) ○ 46,000 mappings into DBpedia, GeoNames and others (owl:equivalentClass, rdfs:subClassOf) ○ Links to more than 2 million Wikipedia pages
  56. Member State initiatives: some examples. Some examples of supra-national, national, regional and private initiatives in the area of linked (open) data across Europe. ● DE, Bibliotheksverbund Bayern: linked data from 180 academic libraries in Bavaria, Berlin and Brandenburg. ● IT, Agenzia per l'Italia Digitale: three datasets published as linked data: the Index of Public Administration, the SPC contracts for web services and conduction systems, and the Classifications for the data in Public Administration. ● NL, Building and address register: the Dutch Address and Buildings base register published as linked data. ● UK, Ordnance Survey: three OS Open Data products published as linked data: the 1:50 000 Scale Gazetteer, Code-Point Open and the administrative geography taken from Boundary Line. ● UK, Companies House: publishing basic company details as linked data using a simple URI for each company in their database. See also: ISA Study on Business Models for LOGD, https://joinup.ec.europa.eu/community/semic/document/study-business-models-linked-open-government-data-bm4logd
  57. Linked Government Data & Metadata initiatives funded by the European Commission [Figure: logos including ADMS.SW and Core Public Service Vocabulary]
  58. Linked Government Data Pilots http://health.testproject.eu/PPP/ http://maritime.testproject.eu/CISE/ http://cpsv.testproject.eu/CPSV/
  59. Non-governmental applications
  60. Conclusion • Linked data is a set of design principles for sharing machine-readable data on the Web. • Linked data and open data are not the same. • URIs, RDF and SPARQL form the foundational layer for linked data. • Linked data offers a number of advantages: • It allows data integration with small impact on legacy systems; • It enables semantic interoperability; • It enables creativity and innovation through context and knowledge creation.
  61. Group questions Is there supply and demand for (Linked) Open Government Data in your country? What are, in your opinion, the expected benefits and pitfalls of Linked Data? Do you know if there are any Linked (Open) Data initiatives in your country? If so, how many stars would you give them?
  62. Download the slides from: my research group website, www.dbgroup.unimore.it; SlideShare, http://www.slideshare.net/polaura
  63. References Some of the materials used in these slides have been rearranged from: - Slides of the "Knowledge Engineering with Semantic Web Technologies 2015" course held by Dr. Harald Sack, https://open.hpi.de/courses/semanticweb2015 - Slides of "Introduction to linked data" by Open Data Support, http://www.slideshare.net/OpenDataSupport/introduction-to-linked-data-23402165 - Slides of "Usage of Linked Data Introduction and Application Scenarios" and "Querying Linked Data" by Barry Norton, EUCLID project
  64. Further readings - Linked Open Government Data. Li Ding (Qualcomm), Vassilios Peristeras and Michael Hausenblas. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6237454 - EUCLID, Course 1: Introduction and Application Scenarios. http://www.euclid-project.eu/modules/course1 - Linked Open Data: The Essentials. Florian Bauer, Martin Kaltenböck. http://www.semantic-web.at/LOD-TheEssentials.pdf - Linked Data: Evolving the Web into a Global Data Space. Tom Heath and Christian Bizer. http://linkeddatabook.com/editions/1.0/
  65. Related projects and initiatives - LOD2 FP7 project, http://lod2.eu/ - The Open Knowledge Foundation, http://okfn.org/ - W3C Semantic Web, http://www.w3.org/standards/semanticweb/ - EUCLID, http://www.euclid-project.eu/ - ISA Programme, http://ec.europa.eu/isa/ - W3C LOGD WG, http://www.w3.org/2011/gld/wiki/Main_Page - LOD Around The Clock FP7 project, http://latc-project.eu/ - Data.gov.uk, http://data.gov.uk/linked-data