Basic introductory talk about the Web of Linked Data, given to undergraduate and postgraduate students of Universidad del Valle (Cali, Colombia) in September 2010. Knowledge about the Semantic Web is required.
Introduction to Linked Data
1. Introduction to Linked Data Oscar Corcho, Asunción Gómez Pérez ({ocorcho, asun}@fi.upm.es) Universidad Politécnica de Madrid Universidad del Valle, Cali, Colombia September 10th 2010 Credits: Raúl García Castro, Oscar Muñoz, Jose Angel Ramos Gargantilla, María del Carmen Suárez de Figueroa, Boris Villazón, Alex de León, Víctor Saquicela, Luis Vilches, Miguel Angel García, Manuel Salvadores, Guillermo Alvaro, Juan Sequeda, Carlos Ruiz Moreno and many others Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0
2. Contents Introduction to Linked Data Linked Data publication Methodological guidelines for Linked Data publication RDB2RDF tools Technical aspects of Linked Data publication Linked Data consumption
3. What is the Web of Linked Data? An extension of the current Web… … where information and services are given well-defined and explicitly represented meaning, … … so that it can be shared and used by humans and machines, ... ... better enabling them to work in cooperation How? Promoting information exchange by tagging web content with machine-processable descriptions of its meaning. And technologies and infrastructure to do this And clear principles on how to publish data
4. What is Linked Data? Linked Data is a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF. Part of the Semantic Web Exposing, sharing and connecting data Technologies: URIs and RDF (although others are also important)
5. The four principles (Tim Berners-Lee, 2006) Use URIs as names for things Use HTTP URIs so that people can look up those names. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL) Include links to other URIs, so that they can discover more things. http://www.w3.org/DesignIssues/LinkedData.html http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
23. What we were actually looking for Example courtesy of Guillermo Alvaro Rey
24. It would be better to make a data query… (football players from Albacete who played Eurocup 2008) Example courtesy of Guillermo Alvaro Rey
25. How should we publish data? Formats in which data is published nowadays… XML HTML DBs APIs CSV XLS … However, main limitations from a Web of Data point of view Difficult to integrate Data items are not linked to each other, as happens with Web documents.
26. Which format do we use then? RDF (Resource Description Framework) Data model Based on triples: subject, predicate, object <Oscar> <lives in> <Madrid> <Madrid> <is the capital of> <Spain> <Spain> <is champion of> <Football World Cup> … Serialised in different formats RDF/XML, RDFa, N3, Turtle, JSON…
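The triple model on this slide can be sketched in a few lines of Python. This is only a toy illustration: the `http://example.org/` namespace, the property names (`livesIn`, `isCapitalOf`, `isChampionOf`) and the helper functions are assumptions made for the example, not part of the talk, and a real application would normally use an RDF library or a triple store instead.

```python
# Toy illustration of the RDF data model: a graph is a set of
# (subject, predicate, object) triples. All URIs below are hypothetical.
BASE = "http://example.org/"

triples = {
    (BASE + "Oscar", BASE + "livesIn", BASE + "Madrid"),
    (BASE + "Madrid", BASE + "isCapitalOf", BASE + "Spain"),
    (BASE + "Spain", BASE + "isChampionOf", BASE + "FootballWorldCup"),
}


def to_ntriples(graph):
    """Serialise URI-only triples as N-Triples lines, sorted for stable output."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(graph))


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


print(to_ntriples(triples))
print(objects(triples, BASE + "Oscar", BASE + "livesIn"))
```

The same set of triples could equally be written in RDF/XML or Turtle; the serialisation changes, the underlying graph does not.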
27. URIs (Universal/Uniform Resource Identifiers) Two types of identifiers can be used to identify Linked Data resources URIRefs (URI references) A URI and an optional fragment identifier separated from the URI by the hash symbol ‘#’ http://www.ontology.org/people#Person people:Person Plain URIs can also be used, as in FOAF: http://xmlns.com/foaf/0.1/Person
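As a small sketch of the two identifier styles, the standard `urllib.parse.urldefrag` function separates a URI reference into its base URI and fragment identifier. The `expand` helper and the prefix table are hypothetical names introduced here only to show how a prefixed name such as `people:Person` maps back to a full URI.

```python
from urllib.parse import urldefrag


def split_uriref(uriref):
    """Split a URI reference into (base URI, fragment identifier)."""
    base, fragment = urldefrag(uriref)
    return base, fragment


def expand(qname, prefixes):
    """Expand a prefixed name such as people:Person using a prefix table."""
    prefix, local = qname.split(":", 1)
    return prefixes[prefix] + local


# Prefix table for the two examples on the slide.
prefixes = {
    "people": "http://www.ontology.org/people#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

print(split_uriref("http://www.ontology.org/people#Person"))
print(expand("people:Person", prefixes))
print(split_uriref(expand("foaf:Person", prefixes)))  # plain URI: empty fragment
```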
28. How do we publish Linked Data? Exposing relational databases or other similar formats as Linked Data D2R Triplify R2O NOR2O Virtuoso Ultrawrap … Using native RDF triplestores Sesame Jena Owlim Talis platform … Incorporating it in the form of RDFa in CMSs like Drupal
29. How do we consume Linked Data? Linked Data browsers To explore things and datasets and to navigate between them. Tabulator Browser (MIT, USA), Marbles (FU Berlin, DE), OpenLink RDF Browser (OpenLink, UK), Zitgist RDF Browser (Zitgist, USA), Disco Hyperdata Browser (FU Berlin, DE), Fenfire (DERI, Ireland) Linked Data mashups Sites that mash up (i.e. combine) Linked Data. Revyu.com (KMI, UK), DBtune Slashfacet (Queen Mary, UK), DBPedia Mobile (FU Berlin, DE), Semantic Web Pipes (DERI, Ireland) Search engines To search for Linked Data. Falcons (IWS, China), Sindice (DERI, Ireland), MicroSearch (Yahoo, Spain), Watson (Open University, UK), SWSE (DERI, Ireland), Swoogle (UMBC, USA) Listing on this slide by T. Heath, M. Hausenblas, C. Bizer, R. Cyganiak, O. Hartig
42. Linked Data Mashup (data.gov) Clean Air Status and Trends (CASTNET) http://data-gov.tw.rpi.edu/demo/exhibit/demo-8-castnet.php
43. Linked Data in the UK Education http://education.data.gov.uk/id/school/106661 Parliament http://parliament.psi.enakting.org/id/member/1227 Maps E.g., London: http://data.ordnancesurvey.co.uk/id/7000000000041428 http://map.psi.enakting.org Transport http://www.dft.gov.uk/naptan/ SameAs service http://www.sameas.org Challenges http://gov.tso.co.uk/openup/sparql/gov-transport
44. Linked Data Mashup (data.gov.uk) Research Funding Explorer http://bis.clients.talis.com/
49. Linked Data Mashup (Water quality) Water quality in Asturias’ beaches http://datos.fundacionctic.org/sandbox/asturias/playas/
51. GeoLinkedData It is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data. This initiative has started off by publishing diverse information sources, such as National Geographic Institute of Spain (IGN-E) and National Statistics Institute (INE) http://geo.linkeddata.es
52. Motivation 99.171 % English 0.019 % Spanish The Web of Data is mainly for English speakers Poor presence of Spanish Source: Billion Triples dataset at http://km.aifb.kit.edu/projects/btc-2010/ Thanks to Aidan and Richard
54. Impact of Geo.linkeddata.es Number of triples in Spanish (July 2010): 1,412,248 Number of triples in Spanish (end of August 2010): 21,463,088
55. Process for Publishing Linked Data on the Web Identification of the data sources Vocabulary development Generation of the RDF Data Publication of the RDF data Data cleansing Linking the RDF data Enable effective discovery
56. 1. Identification and selection of the data sources Instituto Geográfico Nacional Instituto Nacional de Estadística
62. 1. Identification and selection of the data sources Industry Production Index Year Province
63. 2. Vocabulary development http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/#whichvocabs This is not enough
64. 2. Vocabulary development Features Lightweight: taxonomies and a few properties Consensual vocabularies, to avoid mapping problems Multilingual: Linked Data are multilingual The NeOn methodology can help to re-engineer non-ontological resources into ontologies Pros: use domain terminology already agreed on by domain experts Remove from heavyweight ontologies those features that you don’t need Reuse existing vocabularies
65. [Figure: ontology development scenarios, numbered 1 to 9, over knowledge resources. Non-ontological resources (glossaries, dictionaries, lexicons, classification schemas, thesauri, taxonomies): non-ontological resource reuse and re-engineering. Ontological resources (RDF(S), OWL, Flogic; ontology repositories and registries; ontology design patterns; alignments): ontological resource reuse, re-engineering, aligning and merging; ontology design pattern reuse. Core activities: scheduling, ontology specification, conceptualization, formalization and implementation (RDF(S), OWL, Flogic). Ontology restructuring (pruning, extension, specialization, modularization) and ontology localization. Ontology support activities: knowledge acquisition (elicitation); documentation; configuration management; evaluation (V&V); assessment.]
66. Vocabulary development: Specification Content requirements: identify the set of questions that the ontology should answer Which are the provinces in Spain? Where are the beaches? Where are the reservoirs? Identify the production index in Madrid Which is the city with the highest production index? Give me Madrid's latitude and altitude … Non-content requirements The ontology must be in the four official Spanish languages
67. 2. Lightweight Ontology Development Vocabularies reused: hydrOntology (hydrographical phenomena: rivers, lakes, etc.), SCOVO (scv:Dimension, scv:Item, scv:Dataset), FAO Geopolitical (names and international code systems for territories and groups), WGS84 Geo Positioning (an RDF vocabulary), GML (ontology for the OGC Geography Markup Language), and Time (vocabulary for instants, intervals, durations, etc.) Following the INSPIRE (INfrastructure for SPatial InfoRmation in Europe) recommendation.
68. Context: INSPIRE Directive Objectives: INSPIRE seeks to obtain harmonised sources of geographic information to support the formulation, implementation and evaluation of Community policies (environment, etc.). Geographic information sources: databases of the (EU) Member States at local, regional, national and international level. Luis Manuel Vilches Blázquez
70. hydrOntology Geographic information suffers from a wide range of problems (multiple sources, heterogeneous content and structure, ambiguity of natural language, etc.). A shared model is needed to solve the problems of harmonising and structuring hydrographic information. hydrOntology is a global domain ontology developed following a top-down approach. Cover most of the cartographically representable phenomena associated with the hydrographic domain. Serve as a harmonisation framework among the different producers of geospatial information at national and international level. Begin with the steps needed to achieve a better organisation and management of geographic (hydrographic) information.
71. Sources Thesauri and bibliography Feature catalogues Getty FTT ADL BCN25 GEMET WFD CC.AA. (autonomous communities) EGM & ERM Dictionaries and monographs BCN200 Nomenclátor Geográfico Nacional Nomenclátor Conciso
72. Structuring criteria Water Framework Directive Proposed by the EU Parliament and Council List of definitions of hydrographic phenomena SDIGER project INSPIRE pilot project Two river basins, countries and languages Semantic criteria Geographic dictionaries Diccionario de la Real Academia de la Lengua WordNet Wikipedia Bibliography from several areas of knowledge Inheritance: current structuring of catalogues Advice from IGN toponymy experts
76. 3. Generation of RDF From the data sources Geographic information (databases) Statistical information (.xls) Geospatial information Different technologies for RDF generation Reengineering patterns R2O and ODEMapster Annotation tools Geometry generation
77. 3. Generation of the RDF Data NOR2O INE ODEMapster IGN Geometry2RDF Geospatial column IGN
78. 3. Generation of the RDF Data / instances NOR2O is a software library that implements the transformations proposed by the Patterns for Re-engineering Non-Ontological Resources (PR-NOR). Currently we have 16 PR-NORs. PR-NORs define a procedure that transforms the components of a Non-Ontological Resource (NOR) into ontology elements. http://ontologydesignpatterns.org/ NOR2O inputs: classification schemes, thesauri, lexicons Example: FAO Water classification, a classification scheme using the path enumeration data model, implemented in a database
83. 3. Generation of the RDF Data – R2O & ODEMapster Creation of the R2O Mappings
84. 3. Generation of the RDF Data – Geometry2RDF Oracle SDO_UTIL package SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry FROM "BCN200"."BCN200_0301L_RIO" c WHERE c.Etiqueta='Arroyo'
87. 3. Generation of the RDF data – RDF graphs IGN INE So far 7 RDF Named Graphs 1,412,248 triples BTN25 BCN200 IPI …. http://geo.linkeddata.es/dataset/IGN/BTN25 http://geo.linkeddata.es/dataset/IGN/BCN200 http://geo.linkeddata.es/dataset/INE/IPI
88. 4. Publication of the RDF Data SPARQL Linked Data HTML Including Provenance Support Pubby 0.3 Virtuoso 6.1.0
90. 4. Publication of the RDF Data - License License for GeoLinkedData Creative Commons Attribution-ShareAlike 3.0 GNU Free Documentation License Each dataset will have its own specific license, IGN, INE, etc.
91. 5. Data cleansing Lack of documentation of the IGN datasets Broken links: Spain, IGN resources Lack of documentation of the ontology Missing English and Spanish labels Building a Spanish ontology and importing some concepts from another ontology (in English), options: Importing the English ontology and adding annotations such as a Spanish label to its terms. Importing the English ontology, creating new concepts and properties with a Spanish name and mapping those to the English equivalents. Re-declaring the terms of the English ontology that we need (using the same URI as in the English ontology), and adding a Spanish label. Creating your own classes and properties that model the same things as the English ontology.
92. 5. Data cleansing URIs in Spanish http://geo.linkeddata.es/ontology/Río RDF allows UTF-8 characters in URIs But Linked Data URIs have to be URLs as well So non-US-ASCII characters have to be percent-encoded http://geo.linkeddata.es/ontology/R%C3%ADo
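The encoding step on this slide can be reproduced with Python's standard `urllib.parse.quote`, which percent-encodes the UTF-8 bytes of each non-ASCII character. The `to_url` helper name and the choice of `safe` characters are assumptions made for this sketch; a production converter would treat each URI component separately.

```python
from urllib.parse import quote, unquote


def to_url(iri):
    """Percent-encode non-ASCII characters, leaving common URL delimiters intact."""
    return quote(iri, safe=":/#?&=%")


# "\u00ed" is the letter í, written as an escape to avoid encoding surprises.
iri = "http://geo.linkeddata.es/ontology/R\u00edo"
url = to_url(iri)

print(url)           # http://geo.linkeddata.es/ontology/R%C3%ADo
print(unquote(url))  # decodes back to the original IRI
```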
93. 6. Linking of the RDF Data Silk - A Link Discovery Framework for the Web of Data First set of links: Provinces of Spain 86% accuracy Geonames GeoLinkedData DBPedia
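To give an intuition of what link discovery does, the sketch below compares resource labels from two datasets and emits `owl:sameAs` triples when the labels are similar enough. It is a toy stand-in, not Silk: Silk is configured through its own link specification language, whereas here the similarity measure (`difflib.SequenceMatcher`), the 0.9 threshold, and every resource URI are assumptions made for illustration.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source, target, threshold=0.9):
    """Link each source resource to its most similar target label, if similar enough.

    source and target map resource URIs to labels.
    """
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_label = max(
            target.items(), key=lambda item: similarity(s_label, item[1])
        )
        if similarity(s_label, best_label) >= threshold:
            links.append((s_uri, OWL_SAME_AS, best_uri))
    return links


# Hypothetical resources; none of these URIs are taken from the real datasets.
source = {
    "http://example.org/source/Provincia/Granada": "Granada",
    "http://example.org/source/Provincia/Madrid": "Madrid",
}
target = {
    "http://example.org/target/Granada": "granada",
    "http://example.org/target/Grenada": "Grenada",
}

links = discover_links(source, target)
print(links)
```

Purely label-based matching produces wrong or missing links for ambiguous names, which is one reason a reported accuracy such as 86% has to be checked against manually verified links.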
94. 6. Linking of the RDF Data http://geo.linkeddata.es/page/Provincia/Granada
95. 7. Enable effective discovery
102. Ontology-based Access to DBs 1. Build a new ontology from 1 DB schema and 1 DB 2. Align the ontology built with approach 1 with a legacy ontology 3. Align an existing DB with a legacy ontology a) Massive dump (semantic data warehouse) b) Query-driven 4. Align an ontology network with n DB schemas and other data sources a) Massive dump (semantic data warehouse) b) Query-driven Legend: new ontology, existing ontology
103. Ontology-based Access to Databases Ontology: University, Professor, PhD student Relational model (RDB): tables Organización, Personal Question: names of the professors of the university UPM * A professor is a person whose position (puesto) is “docente” (teaching staff) * A university is an organization of type “3” Processor: the query is processed according to the formal description of the correspondence Query: values of the column nombre of the records of the table Personal for which the value of the column puesto is “docente” and which are related to at least one record of the table Organización with the value “3” in the column tipo and “UPM” in the column nombre.
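The rewritten query described on this slide can be tried out against an in-memory SQLite database. This is a minimal sketch under several assumptions: the slide only names the tables and the columns nombre, puesto and tipo, so the `id` and `org_id` columns that relate the two tables, the unaccented table name `Organizacion`, the sample rows and the `professors_of` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Organizacion (id INTEGER PRIMARY KEY, nombre TEXT, tipo TEXT);
    CREATE TABLE Personal (id INTEGER PRIMARY KEY, nombre TEXT, puesto TEXT,
                           org_id INTEGER REFERENCES Organizacion(id));
    -- Sample data (hypothetical)
    INSERT INTO Organizacion VALUES (1, 'UPM', '3'), (2, 'ACME', '1');
    INSERT INTO Personal VALUES
        (1, 'Ana', 'docente', 1),
        (2, 'Luis', 'administrativo', 1),
        (3, 'Eva', 'docente', 2);
""")


def professors_of(university):
    """Names of the Professor instances of a given University.

    Professor  -> rows of Personal whose puesto is 'docente'
    University -> rows of Organizacion whose tipo is '3'
    """
    rows = conn.execute(
        """
        SELECT p.nombre FROM Personal AS p
        WHERE p.puesto = 'docente'
          AND EXISTS (SELECT 1 FROM Organizacion AS o
                      WHERE o.id = p.org_id AND o.tipo = '3' AND o.nombre = ?)
        ORDER BY p.nombre
        """,
        (university,),
    ).fetchall()
    return [name for (name,) in rows]


print(professors_of("UPM"))
```

In a query-driven setting, the processor would generate such SQL on the fly from the ontology-level question and the mapping, instead of having it written by hand.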
104. Align data sources with legacy ontologies Aeropuertos Ontology O2 Ontology O1 Centro Comunicaciones PuntoGPS Estación PuntoEuropeo Aeropuerto PuntoAsiatico PuntoEspañol Aeropuerto f (Aeropuertos) = PuntoEuropeo f (Aeropuertos) = RC(O2,M1) RC(O1,M1) Relational model M1
105. R2O is a declarative language to specify mappings between relational data sources and ontologies. <xml> R2O Mapping </xml> Organization Persons University RDB Professor Student Relational Model Ontology
106. Example: types of mappings needed Attribute Mapping with transformation (Regular Expression) Attribute Direct Mapping Relation Mapping w. Transformation (Regular Expression) Relation Mapping w. Transformation (Keyword search)
107. Population example (II) The Operation element defines a transformation based on a regular expression to be applied to the database column for extracting property values
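The idea behind such an operation, applying a regular expression to a column value to obtain a property value, can be illustrated as follows. This is not R2O syntax: the column content, the pattern and the `extract_property` helper are hypothetical and only mimic the effect of the transformation.

```python
import re


def extract_property(column_value, pattern):
    """Return the first capture group of pattern in column_value, or None."""
    match = re.search(pattern, column_value)
    return match.group(1) if match else None


# Hypothetical column that mixes a title, a name and an affiliation.
NAME_PATTERN = r"^Prof\.\s+(.+?)\s+\("

print(extract_property("Prof. Ana García (UPM)", NAME_PATTERN))
print(extract_property("Ana García", NAME_PATTERN))  # no match, so None
```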
108. R2O (Relational-to-Ontology) Language For concepts... A view maps exactly one concept in the ontology. A subset of the columns in the view maps a concept in the ontology. A subset (selection) of the records of a database view maps a concept in the ontology. A subset of the records of a database view maps a concept in the ontology, but the selection cannot be made using SQL. One or more concepts can be extracted from a single data field (not in 1NF). For attributes... A column in a database view maps directly to an attribute or a relation. A column in a database view maps to an attribute or a relation after some transformation. A set of columns in a database view maps to an attribute or a relation.
118. Using an RDF repository It allows storing and accessing RDF data For example, SESAME (http://www.openrdf.org/) Download it from http://www.openrdf.org/download.jsp openrdf-sesame-2.3.0-sdk.zip Deploy the .war in Tomcat (JDK and Tomcat needed) Create a repository at http://localhost:8080/openrdf-sesame Check: http://localhost:8080/openrdf-sesame/repositories/XXXX http://localhost:8080/openrdf-sesame/repositories/XXX/statements
119. Linked Data frontend To expose data as Linked Data Including content negotiation, etc. For example, Pubby http://www4.wiwiss.fu-berlin.de/pubby/ Installation Use pubby-0.3.zip Deploy the webapp folder (and rename) in Tomcat Modify config.n3 Restart Tomcat Check: http://localhost:8080/XXX/
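Content negotiation means that the same resource URI leads to an HTML page for browsers and to RDF for data clients, depending on the HTTP `Accept` header. The sketch below is a deliberately simplified decision function, not Pubby's actual implementation: the `/page/` and `/data/` path prefixes, the list of media types and the function names are assumptions, and a real frontend would also answer with a proper HTTP redirect.

```python
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}
HTML_TYPES = {"text/html", "application/xhtml+xml", "*/*"}


def parse_accept(header):
    """Return (media_type, q) pairs in header order; q defaults to 1.0."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        entries.append((fields[0].lower(), q))
    return entries


def redirect_target(resource_path, accept_header):
    """Choose the RDF document or the HTML page for a resource."""
    entries = parse_accept(accept_header)
    rdf_q = max((q for t, q in entries if t in RDF_TYPES), default=0.0)
    html_q = max((q for t, q in entries if t in HTML_TYPES), default=0.0)
    kind = "data" if rdf_q > html_q else "page"  # HTML is the fallback
    return f"/{kind}/{resource_path}"


print(redirect_target("Provincia/Granada", "application/rdf+xml"))
print(redirect_target("Provincia/Granada", "text/html,application/xhtml+xml,*/*;q=0.8"))
```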
125. RelFinder: finding relations in Linked Data E.g., relations between films “Pulp Fiction”, “Kill Bill” and “Reservoir Dogs”
127. Designing URI sets for the Public Sector (UK) http://www.cabinetoffice.gov.uk/media/301253/puiblic_sector_uri.pdf