Datalift: A Catalyser for the Web of Data


                    François Scharffe
                    LIRMM/CNRS/University of Montpellier
                       francois.scharffe@lirmm.fr
                       @lechatpito




With the help of the Datalift team
And the support of the French National Research Agency



                FOSDEM 5/02/2011
The data revolution is on its way!

     As Open Data meets the Semantic Web
The promises of linked data
Richer Applications




Linked Data Lite | the Web on Steroids 1.0 (iPhone)
Richer applications




    BBC Programmes
More precise search and QA
Making your data 5 stars




http://www.w3.org/DesignIssues/LinkedData.html
So, how to lift data?
    How to publish data on the Web as linked data?
●   Basic principles, Tim Berners-Lee [2006] (Design Issues)
       –   Use URIs to identify things (not only documents)
       –   Use HTTP URIs
       –   When dereferencing URIs, return a description of the resource
       –   Include links to other resources on the Web
Welcome aboard the data lift
From raw data (bottom) to published and interlinked data on the Web (top), floor by floor:
    Applications
    Interconnection
    Publication infrastructure
    Data conversion
    Vocabulary selection
    Raw data
Datalift


Datasets publication
R&D to automate the publication process
Tool suite to help publish data
Training, tutorials, data publication camps
1st floor - Selection
SemWebPro 18/01/2011
My friends' vocabularies …

●   What is a (good) vocabulary for linked data?
    –   Usability criteria
            Simplicity, visibility, sustainability, integration, coherence …

●   Different types of vocabularies
    –   Metadata, reference, domain, generalist …
    –   The pillars of Linked Data: Dublin Core, FOAF, SKOS
●   Good and less good practices
    –   E.g. BBC Programmes vs legislation.gov.uk
    –   Vocabulary of a Friend: networked vocabularies
●   Linguistic problems
    –   99% of existing vocabularies are in English
    –   Terminological approach: which vocabularies for « Event », « Organization »?
Did you say « vocabulary »?

… And why not « ontology »?
    –   Or « schema » or « metadata schema »?
    –   Or « model » (of data? of the world?)
●   All these terms are used and justifiable
They are all « vocabularies »
    –   They define types of objects (or classes)
        and the properties (or attributes) attached to these objects.
    –   Types and attributes are logically defined
        and named using natural language
    –   A (semantic) vocabulary
        is an explicit formalization
        of concepts existing in natural language

Vocabularies for linked data


●   Are meant to describe resources in RDF
●   Are based on one of the standard W3C languages
    –   RDF Schema (RDFS)
         • For vocabularies without too much logical complexity
    –   OWL
         • For more complex ontological constructs
    –   These two languages are (almost) compatible
●   They can be composed « ad libitum »
    –   One can reuse a few elements of a vocabulary
    –   The original semantics have to be followed
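A minimal sketch of what composing vocabularies can look like in practice: one description that borrows a couple of terms from Dublin Core Terms and a couple from FOAF, serialized as N-Triples. The `example.org` URIs, the helper names and the sample values are hypothetical; only the two vocabulary namespaces are real.

```python
# Namespaces of the two reused vocabularies.
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an absolute IRI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(value):
    """Serialize a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_document(doc, title, author, author_name):
    """Describe a document with DC Terms and its author with FOAF."""
    return [
        (iri(doc), iri(DCT + "title"), literal(title)),
        # dct:creator links the document to an agent, here a foaf:Person.
        (iri(doc), iri(DCT + "creator"), iri(author)),
        (iri(author), iri(RDF_TYPE), iri(FOAF + "Person")),
        (iri(author), iri(FOAF + "name"), literal(author_name)),
    ]


def to_ntriples(triples):
    """Join (subject, predicate, object) strings into N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical identifiers, for illustration only.
triples = describe_document(
    "http://example.org/doc/slides",
    "Datalift",
    "http://example.org/person/1",
    "A. Speaker",
)
print(to_ntriples(triples))
```

Only four terms are reused here, and each is used with the meaning its own vocabulary gives it.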
What makes a good vocabulary?


●   A good vocabulary is a used vocabulary
    –   Data published on CKAN give an idea of vocabulary usage
    –   Example: list of datasets using FOAF http://xmlns.com/foaf/0.1/
●   Other usability criteria
    –   Simplicity and readability in natural language
    –   Documentation of elements (definitions in natural language)
    –   Visibility and sustainability of the publication
    –   Flexibility and extensibility
    –   Semantic integration (with other vocabularies)
    –   Social integration (with the user community)
A vocabulary is also a community


●   Bad (but common) practice
    –   Building a vocabulary in isolation
         • For example as a research project
         • Without basing it on any existing vocabulary
    –   Publishing it (or not) and then forgetting about it
    –   Not caring about its users
●   A good vocabulary has an organic life
    –   Users and use cases
    –   Revisions and extensions
    –   Like a « natural » vocabulary
Types of vocabularies


●   Metadata vocabularies
    –   Used to annotate other vocabularies
         • Dublin Core, Vann, cc REL, Status
●   Reference vocabularies
    –   Provide « common » classes and properties
         • FOAF, Event, Time, Org Ontology
●   Domain vocabularies
    –   Specific to a domain of knowledge
         • Geonames, Music Ontology, WildLife Ontology
●   « General » vocabularies
    –   Describe « everything » at an arbitrary level of detail
         • DBpedia Ontology, Cyc Ontology, SUMO
Vocabulary of a Friend


●   http://www.mondeca.com/foaf/voaf
●   A simple vocabulary...
●   To represent interconnections between vocabularies
●   A unique entry point to the vocabularies and datasets of
    the Linked Data Cloud
●   Ongoing work in Datalift
2nd floor - Conversion
URL design and URL patterns


●   Good practices for linked data
    –   Resource: http://dbpedia.org/resource/Paris
    –   Document: http://dbpedia.org/page/Paris
    –   Data: http://dbpedia.org/data/Paris
●   … served using content negotiation
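The three URL patterns above can be tied together by content negotiation: a request for the resource URI is redirected to the HTML page or to the data document depending on the `Accept` header. The sketch below is a simplified, assumed version of that decision; the function names and the list of RDF media types are illustrative choices, and the real DBpedia server may behave differently.

```python
# Media types treated as "the client wants RDF" (an illustrative choice).
RDF_TYPES = ("application/rdf+xml", "text/turtle", "application/n-triples")
HTML_TYPES = ("text/html", "application/xhtml+xml", "*/*")

RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def parse_accept(header):
    """Return the acceptable media types of an Accept header, best first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by order of appearance.
            entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def redirect_target(resource_uri, accept):
    """Choose the document URL a redirect would point to."""
    if not resource_uri.startswith(RESOURCE_PREFIX):
        raise ValueError("not a resource URI: " + resource_uri)
    name = resource_uri[len(RESOURCE_PREFIX):]
    for media_type in parse_accept(accept):
        if media_type in RDF_TYPES:
            return "http://dbpedia.org/data/" + name
        if media_type in HTML_TYPES:
            return "http://dbpedia.org/page/" + name
    # Fall back to the human-readable page.
    return "http://dbpedia.org/page/" + name


print(redirect_target("http://dbpedia.org/resource/Paris", "text/turtle"))
```

A browser sending `text/html` ends up on the page, while an RDF client asking for `text/turtle` is sent to the data document.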
URI patterns in REST


●   REST (Representational State Transfer) services
    manipulate resources, and URLs are
    mainly used to address these resources
●   A base URI:
    –   http://www.example.com/bookstore/
●   A resource has a unique URL: (retrieve, update,
    create, delete)
    –   http://www.example.com/bookstore/books/ISBN123
●   Notion of collection: (list, replace, create, delete)
    –   http://www.example.com/bookstore/books
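A small sketch of how the bookstore URLs above could be mapped to the listed operations. The pairing of HTTP methods with operations is the usual REST convention and is an assumption here, as is the function name `rest_action`; the slide itself only lists the operations.

```python
BASE = "http://www.example.com/bookstore/"

# Assumed method-to-operation mapping for a single resource.
ITEM_ACTIONS = {
    "GET": "retrieve",
    "PUT": "update",
    "POST": "create",
    "DELETE": "delete",
}

# Assumed method-to-operation mapping for a collection.
COLLECTION_ACTIONS = {
    "GET": "list",
    "PUT": "replace",
    "POST": "create",
    "DELETE": "delete",
}


def rest_action(method, url):
    """Return (kind, operation) for a request under the base URI."""
    if not url.startswith(BASE):
        raise ValueError("URL is outside the base URI: " + url)
    segments = [s for s in url[len(BASE):].split("/") if s]
    if len(segments) == 1:
        kind, actions = "collection", COLLECTION_ACTIONS
    elif len(segments) == 2:
        kind, actions = "item", ITEM_ACTIONS
    else:
        raise ValueError("unsupported URL shape: " + url)
    try:
        return kind, actions[method.upper()]
    except KeyError:
        raise ValueError("unsupported method: " + method) from None


print(rest_action("GET", BASE + "books"))
print(rest_action("PUT", BASE + "books/ISBN123"))
```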
Tools for converting to RDF


●   How is the raw data to be converted?
    –   Relational database?
    –   (Semi-)structured formats?
    –   Programmatic access (API)?
●   There are solutions for all cases
D2RQ Map
Triplify: Relational data to JSON/RDF

●   Extract a folder into your Web app:
    http://sourceforge.net/projects/triplify/
●   Modify a config file:
    –   SQL query … URI pattern
    –   PHP lover!
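To make the "SQL query … URI pattern" idea concrete, here is a rough Python sketch of the general technique: run a query, mint a subject URI from a pattern, and emit one triple per column. It is not Triplify's configuration format (Triplify itself is a PHP tool); the table, the `example.com` vocabulary namespace and the function name are all hypothetical.

```python
import sqlite3


def rows_to_ntriples(connection, query, uri_pattern, property_base):
    """Turn each result row into N-Triples lines.

    uri_pattern is filled with the row's columns (e.g. "{id}");
    every other non-null column becomes a literal-valued property.
    """
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    lines = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = uri_pattern.format(**record)
        for column, value in record.items():
            if column == "id" or value is None:
                continue
            escaped = (
                str(value)
                .replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            lines.append(f'<{subject}> <{property_base}{column}> "{escaped}" .')
    return lines


# Hypothetical in-memory database standing in for the Web app's database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE books (id TEXT PRIMARY KEY, title TEXT, author TEXT)")
connection.execute("INSERT INTO books VALUES ('ISBN123', 'Linked Data', 'A. Author')")

ntriples = rows_to_ntriples(
    connection,
    "SELECT id, title, author FROM books ORDER BY id",
    "http://www.example.com/bookstore/books/{id}",
    "http://www.example.com/vocab#",
)
print("\n".join(ntriples))
```

In a real conversion the property URIs would come from the vocabularies selected on the first floor rather than from column names.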
Working on spreadsheets
Google acquired Freebase




http://code.google.com/p/google-refine/
RDF extension for Google Refine


●   A graphical extension for Google Refine for
    exporting the cleaned data as RDF
    http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/

Name | Job Title | Grade | Organization | Annual pay rate - including taxable benefits and allowances | Notes
Stephan Wilcke | Chief Executive Officer | | Asset Protection Agency | £150,000 - £154,999 |
Jens Bech | Chief Risk Officer | | Asset Protection Agency | £165,000 - £169,999 | No pension
Ion Dagtoglou | Chief Investment Officer | | Asset Protection Agency | £165,000 - £169,999 | No pension
Brian Scammell | Chief Credit Officer | | Asset Protection Agency | £130,000 - £134,999 | 4 days per week
Google Refine and RDF
3rd floor - Publication
Publication components

    Querying and browsing: SPARQL endpoint, REST
    RDF storage, with an inference engine
    Data feeds (three in the diagram) into the RDF storage

    A few products:
    Virtuoso, Sesame, Mulgara, 4store,
    OWLIM, AllegroGraph, Bigdata, Jena
Named graphs



●   RDF graphs are bags of triples: everything is mixed
●   Delete on a graph
●   SPARQL queries define graphs
[Figure: a graph with nodes numbered 1 to 16]
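A toy illustration of the named graphs idea, under the assumption that each triple is stored together with the name of the graph it came from. With that extra piece of information, everything loaded from one source can be deleted in one step without touching the rest. The class, the graph names and the data are hypothetical and do not correspond to any particular RDF store.

```python
from collections import defaultdict


class QuadStore:
    """Minimal in-memory store: graph name -> set of triples."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph, subject, predicate, obj):
        """Store one triple in the given named graph."""
        self._graphs[graph].add((subject, predicate, obj))

    def drop_graph(self, graph):
        """Delete a whole named graph; return how many triples it held."""
        return len(self._graphs.pop(graph, set()))

    def triples(self, graph=None):
        """Triples of one graph, or the merged bag of all graphs."""
        if graph is not None:
            return set(self._graphs.get(graph, set()))
        merged = set()
        for graph_triples in self._graphs.values():
            merged |= graph_triples
        return merged


store = QuadStore()
store.add("http://example.org/graph/a", "ex:1", "ex:linksTo", "ex:2")
store.add("http://example.org/graph/a", "ex:2", "ex:linksTo", "ex:3")
store.add("http://example.org/graph/b", "ex:9", "ex:linksTo", "ex:10")

removed = store.drop_graph("http://example.org/graph/a")
print(removed, store.triples())
```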
Inference

●   Generating triples from other triples
●   Deduction mechanism
    –   All men are mortal, Socrates is a man, so Socrates is
        mortal
●   Avoids stating every fact explicitly and gives meaning to
    defining hierarchies
●   Constraints: cardinality, NFPs, ...
[Figure: a graph with nodes numbered 1 to 16]
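The Socrates deduction can be sketched as a tiny forward-chaining loop over two RDFS-style rules: class membership is inherited along `rdfs:subClassOf`, and `rdfs:subClassOf` is transitive. The prefixed names are plain strings and the `ex:` terms are hypothetical; a real inference engine supports many more rules.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(triples):
    """Return the input triples plus everything the two rules derive."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # Only pairs of the form (s p o) and (o subClassOf o2) matter.
                if p2 != SUBCLASS or s2 != o:
                    continue
                if p == TYPE:
                    new.add((s, TYPE, o2))       # inherit class membership
                elif p == SUBCLASS:
                    new.add((s, SUBCLASS, o2))   # transitivity of subclassing
        new -= closure
        if not new:
            return closure
        closure |= new


facts = {
    ("ex:Socrates", TYPE, "ex:Man"),
    ("ex:Man", SUBCLASS, "ex:Mortal"),
}
derived = infer(facts) - facts
print(derived)
```

Only the two stated facts need to be published; the membership of Socrates in the class of mortals is generated.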
Analysis of RDF stores: the QSOS method

●   Qualification and Selection of Open Source Software
    –   An open source project about open source solutions
    –   http://www.qsos.org
●   Goals of QSOS
    –   Qualify software
    –   Compare solutions after defining requirements and weighting the criteria
    –   Select the product best suited to a given need
●   QSOS provides
    –   An objective and formalized method
    –   A repository of available studies
    –   Tools that make the method easier to apply
4th floor - Interconnection
Linked data and interconnections


●   Without links there is no Web, only data silos
●   Links can be part of the dataset design (reference
    datasets)
●   Links can be found after publication: equivalence
    links between resources
How to interconnect your data?
Tools


●   RKB-CRS: a coreference resolution service for the RKB
    knowledge base
●   LD-mapper: a linkage tool for datasets described using the
    Music Ontology
●   ODD Linker: a linkage tool based on SQL
●   RDF-AI: multi-purpose data linkage and fusion
●   Silk and Silk LSL: linkage tool and linkage specification language
●   Knofuss architecture: dataset linkage and fusion
Example Silk specification
<Silk>
 <Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#" />
 <Prefix id="dbpedia" namespace="http://dbpedia.org/ontology/" />
 <Prefix id="gn" namespace="http://www.geonames.org/ontology#" />

 <DataSource id="dbpedia">
  <EndpointURI>http://demo_sparql_server1/sparql</EndpointURI>
  <Graph>http://dbpedia.org</Graph>
 </DataSource>

 <DataSource id="geonames">
  <EndpointURI>http://demo_sparql_server2/sparql</EndpointURI>
  <Graph>http://sws.geonames.org/</Graph>
 </DataSource>

 <Thresholds accept="0.9" verify="0.7" />
 <Output acceptedLinks="accepted_links.n3"
   verifyLinks="verify_links.n3"
   mode="truncate" />

 <Interlink id="cities">
   <LinkType>owl:sameAs</LinkType>
   <SourceDataset dataSource="dbpedia" var="a">
     <RestrictTo>
       ?a rdf:type dbpedia:City
     </RestrictTo>
   </SourceDataset>
   <TargetDataset dataSource="geonames" var="b">
     <RestrictTo>
       ?b rdf:type gn:P
     </RestrictTo>
   </TargetDataset>
   <LinkCondition>
     <AVG>
       <Compare metric="jaroSimilarity">
        <Param name="str1" path="?a/rdfs:label" />
        <Param name="str2" path="?b/gn:name" />
       </Compare>
       <Compare metric="numSimilarity">
        <Param name="num1"
             path="?a/dbpedia:populationTotal" />
        <Param name="num2" path="?b/gn:population" />
       </Compare>
     </AVG>
   </LinkCondition>
 </Interlink>
</Silk>
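The link condition in this specification averages a string similarity on labels with a numeric similarity on population, then compares the result with the accept (0.9) and verify (0.7) thresholds. The Python sketch below mimics that logic outside of Silk. The Jaro implementation follows the textbook definition; the numeric similarity (ratio of the smaller to the larger value) and the use of `>=` for the thresholds are assumptions, not Silk's documented behaviour, and the sample values are made up.

```python
def jaro(a, b):
    """Textbook Jaro similarity between two strings, in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    matched_a = [False] * len(a)
    matched_b = [False] * len(b)
    matches = 0
    for i, char in enumerate(a):
        start = max(0, i - window)
        end = min(len(b), i + window + 1)
        for j in range(start, end):
            if not matched_b[j] and b[j] == char:
                matched_a[i] = matched_b[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_chars = [c for c, m in zip(a, matched_a) if m]
    b_chars = [c for c, m in zip(b, matched_b) if m]
    transpositions = sum(x != y for x, y in zip(a_chars, b_chars)) // 2
    return (
        matches / len(a)
        + matches / len(b)
        + (matches - transpositions) / matches
    ) / 3


def num_similarity(x, y):
    """Assumed numeric similarity: smaller value divided by larger value."""
    if x < 0 or y < 0:
        raise ValueError("expected non-negative numbers")
    if x == y:
        return 1.0
    return min(x, y) / max(x, y)


def classify_pair(label_a, name_b, pop_a, pop_b, accept=0.9, verify=0.7):
    """Average both similarities and apply the two thresholds."""
    score = (jaro(label_a, name_b) + num_similarity(pop_a, pop_b)) / 2
    if score >= accept:
        return "accept", score
    if score >= verify:
        return "verify", score
    return "reject", score


# Made-up values, for illustration only.
print(classify_pair("Paris", "Paris", 100, 100))
print(classify_pair("Paris", "Paris", 100, 50))
```

Pairs in the "verify" band would go to the file of links to be checked by hand, as `verifyLinks` suggests.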
Where to find links?
Towards automated interconnection services


●   The linkage specification could be simplified
    –   Using alignments between vocabularies
    –   Detecting discriminating properties
    –   Indicating comparison methods by attaching metadata to
        ontologies
●   Work in progress in Datalift
5th floor - Applications
Data visualization




                Tabulator
                (CSAIL, MIT)
VisiNav
Sig.ma
NosDéputés.fr
A few examples from the US




http://data-gov.tw.rpi.edu/demo/USForeignAid/demo-1554.html
Mashups … Mashups … Mashups …
That's it!
●   Datalift.org
●   We're looking for a data geek!

GenCyber Cyber Security Day Presentation
 

Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011

  • 1. Datalift: A Catalyser for the Web of Data. François Scharffe, LIRMM/CNRS/University of Montpellier, francois.scharffe@lirmm.fr, @lechatpito. With the help of the Datalift team and the support of the French National Research Agency. FOSDEM, 5/02/2011
  • 2. The data revolution is on its way! As Open Data meets the Semantic Web
  • 3. The promises of linked data
  • 4. Richer applications: Linked Data Lite | the Web on Steroids 1.0 (iPhone)
  • 5. Richer applications: BBC Programmes
  • 7. Making your data 5 stars: http://www.w3.org/DesignIssues/LinkedData.html
  • 8. So, how to lift data? How to publish data on the Web as linked data?
      ● Basic principles, Tim Berners-Lee [2006] (Design Issues)
          – Use URIs to identify things (not only documents)
          – Use HTTP URIs
          – When dereferencing URIs, return a description of the resource
          – Include links to other resources on the Web
  • 9. Welcome aboard the data lift. Floors, from top to bottom: published and interlinked data on the Web; applications; interconnection; publication infrastructure; data conversion; vocabulary selection; raw data
  • 10. Datalift
      ● Dataset publication
      ● R&D to automate the publication process
      ● Tool suite to help publish data
      ● Training, tutorials, data publication camps
  • 11. 1st floor - Selection (SemWebPro 18/01/2011)
  • 12. My friends' vocabularies …
      ● What is a (good) vocabulary for linked data?
          – Usability criteria: simplicity, visibility, sustainability, integration, coherence …
      ● Different types of vocabularies
          – Metadata, reference, domain, generalist …
          – The pillars of Linked Data: Dublin Core, FOAF, SKOS
      ● Good and less good practices
          – E.g. BBC Programmes vs legislation.gov.uk
          – Vocabulary of a Friend: networked vocabularies
      ● Linguistic problems
          – 99% of existing vocabularies are in English
          – Terminological approach: which vocabularies for "Event", "Organization"?
  • 13. Did you say "vocabulary" …
      ● And why not "ontology"?
          – Or "schema" or "metadata schema"?
          – Or "model" (of data? of the world?)
      ● All these terms are used and justifiable. They are all "vocabularies"
          – They define types of objects (or classes) and the properties (or attributes) attached to these objects
          – Types and attributes are logically defined and named using natural language
          – A (semantic) vocabulary is an explicit formalization of concepts existing in natural language
  • 14. Vocabularies for linked data
      ● Are meant to describe resources in RDF
      ● Are based on one of the standard W3C languages
          – RDF Schema (RDFS): for vocabularies without too much logical complexity
          – OWL: for more complex ontological constructs
          – These two languages are (almost) compatible
      ● They can be composed ad libitum
          – One can reuse just a few elements of a vocabulary
          – The original semantics have to be followed
  • 15. What makes a good vocabulary?
      ● A good vocabulary is a used vocabulary
          – Data published on CKAN give an idea of vocabulary usage
          – Example: list of datasets using FOAF, http://xmlns.com/foaf/0.1/
      ● Other usability criteria
          – Simplicity and readability in natural language
          – Documentation of elements (definitions in natural language)
          – Visibility and sustainability of the publication
          – Flexibility and extensibility
          – Semantic integration (with other vocabularies)
          – Social integration (with the user community)
  • 16. A vocabulary is also a community
      ● Bad (but common) practice
          – Building a lonely vocabulary, for example as a research project, without basing it on any existing vocabulary
          – Publishing it (or not) and then forgetting about it
          – Not caring about its users
      ● A good vocabulary has an organic life
          – Users and use cases
          – Revisions and extensions
          – Like a "natural" vocabulary
  • 17. Types of vocabularies
      ● Metadata vocabularies
          – Used to annotate other vocabularies: Dublin Core, Vann, cc REL, Status
      ● Reference vocabularies
          – Provide "common" classes and properties: FOAF, Event, Time, Org Ontology
      ● Domain vocabularies
          – Specific to a domain of knowledge: Geonames, Music Ontology, WildLife Ontology
      ● "General" vocabularies
          – Describe "everything" at an arbitrary level of detail: DBpedia Ontology, Cyc Ontology, SUMO
  • 18. Vocabulary of a Friend
      ● http://www.mondeca.com/foaf/voaf
      ● A simple vocabulary...
      ● To represent interconnections between vocabularies
      ● A single entry point to the vocabularies and datasets of the Linked Data Cloud
      ● Ongoing work in Datalift
  • 19. 2nd floor - Conversion
  • 20. URL design and URL patterns
      ● Good practices for linked data
          – Resource: http://dbpedia.org/resource/Paris
          – Document: http://dbpedia.org/page/Paris
          – Data: http://dbpedia.org/data/Paris
      ● … served using content negotiation
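The three URL patterns on slide 20 can be tied together with a small sketch of content negotiation. This is only an illustration, not how DBpedia is actually implemented: the function name `negotiate`, the list of RDF media types, and the plain substring match on the `Accept` header (quality values are ignored) are all assumptions made for the example.

```python
# Hypothetical helper: the redirect rule below is an assumption that
# simply mirrors the resource / page / data patterns from the slide.
RESOURCE_PREFIX = "http://dbpedia.org/resource/"

# Media types treated here as a request for machine-readable RDF.
RDF_TYPES = ("application/rdf+xml", "text/turtle", "text/n3")


def negotiate(resource_uri: str, accept: str) -> str:
    """Return the URL a server could redirect to for this Accept header."""
    if not resource_uri.startswith(RESOURCE_PREFIX):
        raise ValueError("not a resource URI: " + resource_uri)
    name = resource_uri[len(RESOURCE_PREFIX):]
    # Simplification: substring match, no q-value handling.
    wants_rdf = any(media_type in accept for media_type in RDF_TYPES)
    kind = "data" if wants_rdf else "page"
    return "http://dbpedia.org/" + kind + "/" + name
```

A browser sending `text/html` would be sent to the document URL, while an RDF client asking for `application/rdf+xml` would be sent to the data URL; the resource URI itself names the thing (the city), not either representation.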
  • 21. URI patterns in REST
      ● REST (Representational State Transfer) services manipulate resources, and URLs are mainly used to address these resources
      ● A base URI:
          – http://www.example.com/bookstore/
      ● Each resource has a unique URL (retrieve, update, create, delete):
          – http://www.example.com/bookstore/books/ISBN123
      ● The notion of a collection (list, replace, create, delete):
          – http://www.example.com/bookstore/books
  • 22. Tools for conversion to RDF
      ● How is the raw data to be converted?
          – Relational database?
          – (Semi-)structured formats?
          – Programmatic access (API)?
      ● There are solutions for all cases
  • 24. Triplify: relational data to JSON/RDF
      ● Extract a folder into your web app: http://sourceforge.net/projects/triplify/
      ● Modify a config file:
          – SQL query … URI pattern
          – PHP lover!
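The idea behind slide 24, pairing an SQL query with a URI pattern, can be sketched in a few lines. This is not Triplify itself (which is a PHP tool with its own config format); it is a Python illustration of the same mapping. The function `rows_to_ntriples`, the `city` table, and the `http://example.org/...` namespaces are all hypothetical.

```python
import sqlite3


def rows_to_ntriples(conn, query, uri_pattern, predicate_base):
    """Turn each row of an SQL result into N-Triples lines.

    The first column fills the {id} slot of uri_pattern; every other
    column becomes one triple whose object is a plain literal.
    """
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    lines = []
    for row in cursor:
        subject = "<" + uri_pattern.format(id=row[0]) + ">"
        for column, value in zip(columns[1:], row[1:]):
            if value is None:
                continue  # SQL NULL: emit no triple
            # Only backslash and double quote are escaped in this sketch.
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{predicate_base}{column}> "{literal}" .')
    return lines


# Tiny in-memory database standing in for the web app's relational data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER, name TEXT, population INTEGER)")
conn.execute("INSERT INTO city VALUES (1, 'Paris', NULL)")

triples = rows_to_ntriples(
    conn,
    "SELECT id, name, population FROM city",
    "http://example.org/city/{id}",   # hypothetical URI pattern
    "http://example.org/vocab/",      # hypothetical vocabulary namespace
)
```

A real conversion would map columns to properties of an existing vocabulary rather than minting one predicate per column name, in line with the vocabulary selection advice earlier in the deck.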
  • 27. RDF extension for Google Refine
      ● A graphical extension for Google Refine that exports the cleaned data as RDF: http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/
      ● Sample data shown on the slide:
          Name | Job Title | Grade | Organization | Annual pay rate - including taxable benefits and allowances | Notes
          Stephan Wilcke | Chief Executive Officer | | Asset Protection Agency | £150,000 - £154,999 |
          Jens Bech | Chief Risk Officer | | Asset Protection Agency | £165,000 - £169,999 | No pension
          Ion Dagtoglou | Chief Investment Officer | | Asset Protection Agency | £165,000 - £169,999 | No pension
          Brian Scammell | Chief Credit Officer | | Asset Protection Agency | £130,000 - £134,999 | 4 days per week
  • 29. 3rd floor - Publication
  • 30. Publication components
      ● Querying and browsing
      ● SPARQL / REST endpoint
      ● Inference engine
      ● RDF storage
      ● Data loading (three feeds in the diagram)
      ● A few products: Virtuoso, Sesame, Mulgara, 4store, OWLIM, AllegroGraph, Big Data, Jena
  • 31. Named graphs
      ● RDF graphs are bags of triples: everything is mixed
      ● Delete on a graph
      ● SPARQL queries define graphs
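A minimal sketch of what named graphs add over a single bag of triples: each triple is stored under a graph name, so one source can be dropped without touching the rest. The `QuadStore` class and the `ex:` names are hypothetical and only illustrate the idea; they are not the API of any of the stores listed on the previous slide.

```python
from collections import defaultdict


class QuadStore:
    """Toy store that keeps triples grouped by graph name."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph, subject, predicate, obj):
        self._graphs[graph].add((subject, predicate, obj))

    def drop_graph(self, graph):
        # "Delete on a graph": removes every triple loaded under that name.
        self._graphs.pop(graph, None)

    def triples(self, graph=None):
        """Triples of one named graph, or the merged bag when graph is None."""
        if graph is not None:
            return set(self._graphs.get(graph, set()))
        merged = set()
        for graph_triples in self._graphs.values():
            merged |= graph_triples
        return merged


store = QuadStore()
store.add("ex:dbpedia", "ex:Paris", "ex:population", "2200000")
store.add("ex:geonames", "ex:Paris", "ex:name", "Paris")
store.drop_graph("ex:dbpedia")
remaining = store.triples()
```

Without the graph name, the two triples would be indistinguishable by origin, and removing one dataset would mean re-deriving which triples it had contributed.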
  • 32. Inference
      ● Generating triples from other triples
      ● Deduction mechanism
          – Men are mortal, Socrates is a man, so Socrates is mortal
      ● Avoids having to state everything exhaustively, and gives meaning to defining hierarchies
      ● Constraints: cardinality, NFPs, ...
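The Socrates example from slide 32 can be written as a tiny forward-chaining loop. This is a sketch under stated assumptions: it applies only two RDFS-style rules (type propagation along `rdfs:subClassOf`, and transitivity of `rdfs:subClassOf`), the `ex:` identifiers are made up, and real inference engines cover far more rules and are far more efficient.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(triples):
    """Forward-chain two subclass rules until no new triple appears."""
    closure = set(triples)
    while True:
        new = set()
        for (sub, pred, sup) in closure:
            if pred != SUBCLASS:
                continue
            for (s, p, o) in closure:
                if o != sub:
                    continue
                if p == TYPE:
                    # s is a 'sub', every 'sub' is a 'sup', so s is a 'sup'.
                    new.add((s, TYPE, sup))
                elif p == SUBCLASS:
                    # Subclass relation is transitive.
                    new.add((s, SUBCLASS, sup))
        if new <= closure:
            return closure
        closure |= new


facts = {
    ("ex:Socrates", TYPE, "ex:Man"),
    ("ex:Man", SUBCLASS, "ex:Mortal"),
}
entailed = infer(facts)
```

Only two triples were stated, yet `("ex:Socrates", "rdf:type", "ex:Mortal")` ends up in `entailed`; this is the sense in which a class hierarchy saves the publisher from listing every fact explicitly.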
  • 33. Analysing RDF stores: the QSOS method
      ● Qualification and Selection of Open Source Software
          – An open source project about open source solutions
          – http://www.qsos.org
      ● Goals of QSOS
          – Qualify software
          – Compare solutions after defining requirements and weighting the criteria
          – Select the product best suited to a given need
      ● QSOS provides
          – An objective, formalized method
          – A repository of available studies
          – Tools that support carrying out the method
  • 34. 4th floor - Interconnection
  • 35. Linked data and interconnections
      ● Without links there is no Web, only data silos
      ● Links can be part of the design of a dataset (reference datasets)
      ● Links can be found after publication: equivalence links between resources
  • 37. Tools
      ● RKB-CRS: a coreference resolution service for the RKB knowledge base
      ● LD-mapper: a linkage tool for datasets described using the Music Ontology
      ● ODD Linker: a linkage tool based on SQL
      ● RDF-AI: multi-purpose data linkage and fusion
      ● Silk and Silk LSL: linkage tool and linkage specification language
      ● Knofuss architecture: dataset linkage and fusion
  • 38. Example Silk specification
      <Silk>
        <Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#" />
        <Prefix id="dbpedia" namespace="http://dbpedia.org/ontology/" />
        <Prefix id="gn" namespace="http://www.geonames.org/ontology#" />
        <DataSource id="dbpedia">
          <EndpointURI>http://demo_sparql_server1/sparql</EndpointURI>
          <Graph>http://dbpedia.org</Graph>
        </DataSource>
        <DataSource id="geonames">
          <EndpointURI>http://demo_sparql_server2/sparql</EndpointURI>
          <Graph>http://sws.geonames.org/</Graph>
        </DataSource>
        <Thresholds accept="0.9" verify="0.7" />
        <Output acceptedLinks="accepted_links.n3" verifyLinks="verify_links.n3" mode="truncate" />
        <Interlink id="cities">
          <LinkType>owl:sameAs</LinkType>
          <SourceDataset dataSource="dbpedia" var="a">
            <RestrictTo>?a rdf:type dbpedia:City</RestrictTo>
          </SourceDataset>
          <TargetDataset dataSource="geonames" var="b">
            <RestrictTo>?b rdf:type gn:P</RestrictTo>
          </TargetDataset>
          <LinkCondition>
            <AVG>
              <Compare metric="jaroSimilarity">
                <Param name="str1" path="?a/rdfs:label" />
                <Param name="str2" path="?b/gn:name" />
              </Compare>
              <Compare metric="numSimilarity">
                <Param name="num1" path="?a/dbpedia:populationTotal" />
                <Param name="num2" path="?b/gn:population" />
              </Compare>
            </AVG>
          </LinkCondition>
        </Interlink>
      </Silk>
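To make the link condition on slide 38 concrete, here is a rough Python sketch of its logic: average a string similarity on the labels with a numeric similarity on the populations, then compare the score with the `accept` (0.9) and `verify` (0.7) thresholds. Several things are assumptions rather than Silk's behaviour: `difflib`'s ratio stands in for the Jaro metric, the numeric similarity is defined here as smaller value over larger value, and thresholds are treated as inclusive.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    # Stand-in for jaroSimilarity: difflib's ratio, NOT the Jaro metric.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def num_similarity(x, y):
    # Assumed definition: ratio of the smaller to the larger value.
    if x == y:
        return 1.0
    if x <= 0 or y <= 0:
        return 0.0
    return min(x, y) / max(x, y)


def classify(label_a, population_a, label_b, population_b,
             accept=0.9, verify=0.7):
    """Average the two similarities (the AVG node) and apply the thresholds."""
    score = (string_similarity(label_a, label_b)
             + num_similarity(population_a, population_b)) / 2
    if score >= accept:
        return "accept"   # would go to the accepted links output
    if score >= verify:
        return "verify"   # would go to the links needing manual review
    return "reject"
```

With these assumed metrics, two records both labelled "Paris" with equal populations are accepted, while the same label with populations 100 and 50 scores 0.75 and lands in the verify band.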
  • 39. Where to find links?
  • 40. Towards automated interconnection services
      ● The linkage specification could be simplified
          – Using alignments between vocabularies
          – Detecting discriminating properties
          – Indicating comparison methods by attaching metadata to ontologies
      ● Work in progress in Datalift
  • 41. 5th floor - Applications
  • 42. Data visualization: Tabulator (CSAIL, MIT)
  • 47. A few examples from the US: http://data-gov.tw.rpi.edu/demo/USForeignAid/demo-1554.html
  • 48. Mashups … Mashups … Mashups …
  • 49. That's it!
      ● Datalift.org
      ● We're looking for a Datageek!