SlideShare ist ein Scribd-Unternehmen logo
1 von 43
Linked Data for the Life Sciences

           Release 2
           Michel Dumontier
Associate Professor, Carleton University

    on behalf of the Bio2RDF team

                                           Dumontier::EBI:Jan 23, 2013
is an open source framework

  that makes biological data available

       on the emerging semantic web

           using a set of simple conventions



                                               Dumontier::EBI:Jan 23, 2013
reduces the time and effort

  involved in data integration

      so that you can get to

         the business
           of doing science


                                 Dumontier::EBI:Jan 23, 2013
the Semantic Web
is the new global web of knowledge
    It provides standards for publishing, sharing and querying
                        facts, expert knowledge and services

                              It is a scalable approach to the
                      discovery of independently formulated
                            and highly distributed knowledge




                                                Dumontier::EBI:Jan 23, 2013
a rapidly growing network of linked data




“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
                                                                                         Dumontier::EBI:Jan 23, 2013
Link all the data!!!




                   Dumontier::EBI:Jan 23, 2013
Four Simple Rules for Linked Data
1) Use Internationalized Resource Identifiers
  (IRIs;unicode) or Uniform Resource Identifiers
  (URIs; ascii) for names for things
2) Use HTTP URIs so that people can look up those
  names.
3) When someone looks up a URI, provide useful
  information about them using the standards (RDF)
4) Include links to other URIs so that they can discover
  more things.
http://www.w3.org/DesignIssues/LinkedData.html
                                                 Dumontier::EBI:Jan 23, 2013
Bio2RDF is a framework to
create and provide linked
data for the life sciences




                             Dumontier::EBI:Jan 23, 2013
Main features of Bio2RDF Release 2
• Bio2RDF conversion scripts, mapping files and web
  application are open source and freely available at
  http://github.com/bio2rdf

• Bio2RDF enables (syntactic) data integration within and
  across datasets by using one language (RDF), having a
  common URI pattern and a common resource registry

• 19 Release 2 datasets have provenance and endpoints
  feature pre-computed graph summaries for fast lookup

• Bio2RDF web application enables entity resolution, query
  federation across an expandable distributed network of
  SPARQL endpoints
                                                   Dumontier::EBI:Jan 23, 2013
Bio2RDF RDFization guidelines are
      available at our wiki




  https://github.com/bio2rdf/bio2rdf-scripts/wiki/RDFization-Guide-v1.1
                                                             Dumontier::EBI:Jan 23, 2013
Bio2RDF converters
are open-source and available at GitHub




    http://github.com/bio2rdf/bio2rdf-scripts
                                           Dumontier::EBI:Jan 23, 2013
Bio2RDF data are identified using
       simple http URI patterns

When available, use the provider’s identifier in
the naming the resource

     http://bio2rdf.org/namespace:identifier

e.g.: DrugBank’s resource IRI for Leucovorin

      http://bio2rdf.org/drugbank:DB00650

                                          Dumontier::EBI:Jan 23, 2013
Linked Data: You can look it up




                           Dumontier::EBI:Jan 23, 2013
Valid Bio2RDF namespaces are listed
          in a dataset registry
• An initial registry of ~600 datasets is accessible through an API provided by
  my PHP-LIB library (available on github). It includes
    – Dataset title, Preferred namespace prefix, Alternative namespace prefixes
• In the summer, we consolidated and curated nearly 2100 entries in a
  Google spreadsheet, which includes a mostly complete coverage of
  datasets/collections listed in
  Bio2RDF, MIRIAM, BioPortal, UniProt, NCBI, NAR database issue. New
  fields were added including:
    – Dataset description, organization, website, HTML template
    – Identifier syntax, license and rights
• Working with identifiers.org team (Nick Juty, Camille Laibe, Nicolas Le
  Novere) to have a single dataset registry that we can use for both Bio2RDF
  and identifiers.org
    – enable automatic cross-links between Bio2RDF and identifiers.org


                                                                      Dumontier::EBI:Jan 23, 2013
vocabulary and resource namespaces are used
       to describe auxiliary resources
• types and predicates that are generated to
  support the semantic annotation are in the
  vocabulary namespace
  http://bio2rdf.org/drugbank_vocabulary:Drug (type)
  http://bio2rdf.org/drugbank_vocabulary:target (predicate)


• n-ary relations are named in the resource
  namespace
  http://bio2rdf.org/drugbank_resource:DB00440_DB00650


                                                        Dumontier::EBI:Jan 23, 2013
The Resource Description Framework (RDF) is a
            formal knowledge representation language
                capable of expressing a statement


  A RDF statement consists of:
   – Subject: resource identified by a URI
   – Predicate: resource identified by a URI
   – Object: resource or literal


   http://bio2rdf.org/drugbank:DB00650            drugbank:DB00650


                       rdf:type                                 rdf:type


http://bio2rdf.org/drugbank_vocabulary:Drug    drugbank_vocabulary:Drug

                                                        Dumontier::EBI:Jan 23, 2013
Every statement expands the network
            of linked data
        drugbank_vocabulary:Drug


            rdf:type
                                     rdfs:label
            drugbank:DB00650                             Leucovorin [drugbank:DB00650]


 drugbank_vocabulary:ddi-interactor-in

                                            rdfs:label      DDI between Trimethoprim and
 drugbank_resource:DB00440_DB00650                         Leucovorin [drugbank_resource:
                                                                 DB00440_DB00650]
            rdf:type

drugbank_vocabulary:Drug-Drug-Interaction


                                                                        Dumontier::EBI:Jan 23, 2013
Syntactic integration across datasets
                                                               DrugBank
   drugbank_vocabulary:Drug


      rdf:type
                              rdfs:label
      drugbank:DB00650                     Leucovorin [drugbank:DB00650]




                 pharmgkb_vocabulary:xref

                              rdfs:label
     pharmgkb:PA450198                     leucovorin [pharmgkb:PA450198]




   pharmgkb_vocabulary:Drug                                    PharmGKB
                                                             Dumontier::EBI:Jan 23, 2013
You can get what links to it
What links to DrugBank’s Leucovorin?

http://bio2rdf.org/linksns/drugbank/drugbank:DB00650




                                                       Dumontier::EBI:Jan 23, 2013
The Bio2RDF Network of Linked Data




                              Dumontier::EBI:Jan 23, 2013
Every Bio2RDF dataset now contains
       provenance metadata
                           Features
                           - Dates
                           - Licensing
                           - Source
                           - Creators
                           - Publishers
                           - Availability

                           Vocabularies
                           VoID
                           Dublin Core
                           W3C Provenance
                           Bio2RDF vocabulary




                               Dumontier::EBI:Jan 23, 2013
For every Bio2RDF dataset we pre-
      compute 9 descriptive metrics
• total number of triples
• number of unique subjects
• number of unique predicates
• number of unique objects
• number of unique types
• unique predicate-object links and their frequencies
• unique predicate-literal links and their frequencies
• unique subject type-predicate-object type links and
  their frequencies
• unique subject type-predicate-literal links and their
  frequencies

                                                 Dumontier::EBI:Jan 23, 2013
Accessing Bio2RDF dataset metrics
    • Each Bio2RDF endpoint contains a named
      graph that holds the pre-computed metrics
         – http://bio2rdf.org/bio2rdf-[namespace]-statistics
    • Metrics can be queried using SPARQL, e.g.:
     SELECT *
     FROM <http://bio2rdf.org/bio2rdf-drugbank-statistics>
     WHERE {
              ?dataset a <http://bio2rdf.org/dataset_vocabulary:Endpoint> .
              ?dataset <http://bio2rdf.org/dataset_vocabulary:has_triple_count> ?tc .
              ?dataset <http://bio2rdf.org/dataset_vocabulary:has_unique_subject_count> ?sc .
              ?dataset <http://bio2rdf.org/dataset_vocabulary:has_unique_predicate_count> ?pc .
              ?dataset <http://bio2rdf.org/dataset_vocabulary:has_unique_object_count> ?oc .
              …
     }

https://github.com/bio2rdf/bio2rdf-scripts/wiki/Bio2RDF-dataset-metrics         Dumontier::EBI:Jan 23, 2013
Graph summaries can also assist in
               query formulation
         Subject Type            Subject Count         Predicate           Object Type       Object Count
Pharmaceutical                      11512        form                Unit                         56
Drug-Transporter-Interaction         1440        drug                Drug                        534
Drug-Transporter-Interaction         1440        transporter         Target                       88
Drug                                 1266        dosage              Dosage                      230
Patent                               1255        country             Country                      2
Drug                                 1127        product             Pharmaceutical             11512
Drug                                 1074        ddi-interactor-in   Drug-Drug-Interaction      10891
Drug                                  532        patent              Patent                      1255
Drug                                  277        mixture             Mixture                     3317
Dosage                                230        route               Route                        42
Drug-Target-Interaction               84         target              Target                       43

                    PREFIX drugbank_vocabulary: <http://bio2rdf.org/drugbank_vocabulary:>
                    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
                    SELECT ?ddi ?d1name ?d2name
                    WHERE {
                                  ?ddi a drugbank_vocabulary:Drug-Drug-Interaction .
                                  ?d1 drugbank_vocabulary:ddi-interactor-in ?ddi .
                                  ?d1 rdfs:label ?d1name .
                                  ?d2 drugbank_vocabulary:ddi-interactor-in ?ddi .
                                  ?d2 rdfs:label ?d2name.
                        FILTER (?d1 != ?d2)
                    }
                                                                                                Dumontier::EBI:Jan 23, 2013
You can use the SPARQLed query
assistant with updated endpoints
              http://sindicetech.com/sindice-suite/sparqled/




             graph: http://sindicetech.com/analytics
                                             Dumontier::EBI:Jan 23, 2013
Bio2RDF covers the major biological
           databases




                             Dumontier::EBI:Jan 23, 2013
Bio2RDF Release 2 – New and Updated Datasets
    Dataset                               Namespace     # of triples
    Affymetrix                             affymetrix   44469611
    Biomodels*                             biomodels     589753
    Comparative Toxicogenomics Database         ctd     141845167
    DrugBank                               drugbank      1121468
    NCBI Gene                               ncbigene    394026267
    Gene Ontology Annotations                   goa     80028873
    HUGO Gene Nomenclature Committee           hgnc      836060
    Homologene                            homologene     1281881
    InterPro*                                interpro    999031
    iProClass                               iproclass   211365460
    iRefIndex                               irefindex   31042135
    Medical Subject Headings                   mesh      4172230
    NCBO BioPortal*                         bioportal   15384622
    National Drug Code Directory*               ndc     17814216
    Online Mendelian Inheritance in Man        omim      1848729
    Pharmacogenomics Knowledge Base        pharmgkb     37949275
    SABIO-RK*                                 sabiork    2618288
    Saccharomyces Genome Database               sgd      5551009
    NCBI Taxonomy                              taxon    17814216

    Total                                     19        1010758291


                                                               Dumontier::EBI:Jan 23, 2013
Status of Bio2RDF Release 1 datasets
Dataset      Status
Atlas        Maintained – will not be updated
BIND         Deprecated – in iRefIndex
BioCarta     Deprecated – in Pathway Commons
BioCyc       Deprecated – in Pathway Commons
EC           Deprecated – in Gene Ontology/UniProt
GenBank      Maintained – will be updated
HHPID        Maintained – will not be updated
INOH         Deprecated – in Pathway Commons
KEGG         Maintained – will not be updated
MGI          Maintained – will be updated
Pubmed       Maintained – will be updated
PID          Deprecated – in Pathway Commons
Reactome     Deprecated – in Pathway Commons
RefSeq       Maintained – will be updated
                                                     Dumontier::EBI:Jan 23, 2013
A PHP-based library acts a point of
            integration
• Provides a set of APIs
  – to produce RDF and OWL statements
  – to generate valid Bio2RDF URIs by checking
    against a dataset registry
  – to generate dataset provenance




                                            Dumontier::EBI:Jan 23, 2013
Oh Bio2RDF, how can I access you?

Let me count the ways:
1. Downloads (data, stats, virtuoso db)
2. Web interface (lookup + services)
3. SPARQL endpoint
4. SPARQLed editor
5. Virtuoso Faceted Browser



                                          Dumontier::EBI:Jan 23, 2013
Use virtuoso’s built in faceted
browser to construct increasingly
complex queries with little effort




                              Dumontier::EBI:Jan 23, 2013
Heterogeneous biological data on the
    semantic web is difficult to query
Question: Find all proteins that interact with beta
  amyloid (uniprot:P05067)
                                       UniProt Protein              PDB Protein
                                                             ?
SELECT * WHERE {                       iRefIndex Protein

  ?protein a bio2rdf:Protein .
  ?protein bio2rdf:interacts_with uniprot:P05067 .
}
                         Physical interaction?       Genetic interaction?

                                      Pathway interaction?


                                                                 Dumontier::EBI:Jan 23, 2013
RDF-based Linked Data is a great first
     step, but it’s not enough.




     From linked data to linked knowledge through syntactic and semantic normalization.   Dumontier::EBI:Jan 23, 2013
ontology as a
 strategy to formally
    represent and
integrate knowledge




             Dumontier::EBI:Jan 23, 2013
SIO provides an OWL ontology for the representation
          of diverse biomedical knowledge




                                         Dumontier::EBI:Jan 23, 2013
Dumontier::EBI:Jan 23, 2013
Semantic data integration, consistency checking and
           query answering over Bio2RDF with the
         Semanticscience Integrated Ontology (SIO)


                      uniprot:P05067
                     uniprot:P05067                           refseq:NP_009225.1

                         is a                                      is a


                        uniprot:Protein
                       uniprot:Protein                             refseq:Protein
                                                                  refseq:Protein
                                                                                        dataset


                                  is a               is a
                                                                          is a

                                             sio:protein

                                                                                  ontology
                                                                                 Knowledge Base
Querying Bio2RDF Linked Open Data with a Global Schema. Alison Callahan, José Cruz-Toledo and
Michel Dumontier. Bio-ontologies 2012.
                                                                                           Dumontier::EBI:Jan 23, 2013
Bio2RDF types include processes, material
       entities and informational entities
CTD: Chemical, Disease, Chemical-Disease Interaction, Chemical-
   Gene Interaction
NCBIGene: Gene, Protein, Model Organism, Publication
HGNC: Accession Number, Gene, Gene Symbol
iRefIndex: Protein Complex, Protein Interaction
MGI: Gene Marker, Gene Symbol
PharmGKB: Gene-Disease Associations, Disease, Drug, Gene
SGD: Enzyme, Pathway, Protein, RNA, Reaction,
Location, Experiment



                                                    Dumontier::EBI:Jan 23, 2013
Bio2RDF and SIO powered SPARQL 1.1 federated query:
Find chemicals in CTD and proteins in SGD that participate
                 in the same GO process
 SELECT ?chem, ?prot, ?proc
 FROM <http://bio2rdf.org/ctd>
 WHERE {
    ?chemical a sio:chemical-entity.
    ?chemical rdfs:label ?chem.
    ?chemical sio:is-participant-in ?process.
    ?process rdfs:label ?proc.
   FILTER regex (?process, "http://bio2rdf.org/go:")
    SERVICE <http://sgd.bio2rdf.org/sparql> {
          ?protein a sio:protein .
          ?protein sio:is-participant-in ?process.
          ?protein rdfs:label ?prot .
    }
 }
                                                       Dumontier::EBI:Jan 23, 2013
Bio2RDF Release 2 – A summary
• Updated data conversion source code to use
  PHP API (all available through GitHub)
• Simple Bio2RDF IRI design patterns that
  facilitate syntactic consistency and
  interoperability backed by simple registry
• Dataset provenance and metrics
• We welcome contributions from the
  community
     • Join our mailing list at bio2rdf@googlegroups.com


                                                  Dumontier::EBI:Jan 23, 2013
Future Directions
• Aiming for twice yearly release schedule
• Update large scale datasets
   – RefSeq, Genbank, PubMed, PDB
• Incorporate EBI & W3C Linking Open Drug Data (LODD)
  effort
   – SIDER (beta), TCM (RDF available), CHEMBL (in dev),
     OMIM (released), DailyMed (RDF available), LinkedCT
     (beta)
• Extended dataset coverage by tapping into existing
  endpoints (uniprot, bioportal, ebi-rdf?)
• OpenBioCloud w/DERI + SindiceTech
• Showcase with other third party tools

                                                   Dumontier::EBI:Jan 23, 2013
Acknowledgements
Bio2RDF                                       SADI: Christopher Baker, Melanie Courtot, Jose
Peter Ansell, Francois Belleau, Allison       Cruz-Toledo, Steve Etlinger, Nichealla
Callahan, Jacques Corbeil, Jose Cruz-         Keath, Artjom Klein, Luke McCarthy, Silvane
Toledo, Alex De Leon, Steve Etlinger, James   Paixao, Ben Vandervalk, Natalia Villanueva-
Hogan, Nichealla Keath, Jean                  Rosales, Mark Wilkinson
Morissette, Marc-Alexandre Nolin, Nicole
Tourigny, Philippe Rigault and Paul Roe
                                              W3C HCLS: J Luciano, B Andersson, C Batchelor, O
                                              Bodenreider, T Clark, C Denney, C Domarew, T
OpenBioCloud                                  Gambet, L Harland, A Jentzsch, V Kashyap, P
Dana Klassen and Giovanni Tumarello           Kos, J Kozlovsky, T Lebo, SM Marshall, JP
                                              McCusker, DL McGuinness, C Ogbuji, E Pichler, R
                                              Powers, E Prud hommeaux, M Samwald, L
                                              Schriml, PJ Tonellato, PL Whetzel, J Zhao, S
                                              Stephens, C Denney, J Luciano, J McGurk, Lynn
                                              Schriml, and Peter J. Tonellato.




                                                                            Dumontier::EBI:Jan 23, 2013
dumontierlab.com
michel_dumontier@carleton.ca
                         Website: http://dumontierlab.com
    Presentations: http://slideshare.com/micheldumontier




                                         Dumontier::EBI:Jan 23, 2013

Weitere ähnliche Inhalte

Andere mochten auch

Wimmics seminar--drug interaction knowledge base, micropublication, open anno...
Wimmics seminar--drug interaction knowledge base, micropublication, open anno...Wimmics seminar--drug interaction knowledge base, micropublication, open anno...
Wimmics seminar--drug interaction knowledge base, micropublication, open anno...jodischneider
 
Increasingly Accurate Representation of Biochemistry (v2)
Increasingly Accurate Representation of Biochemistry (v2)Increasingly Accurate Representation of Biochemistry (v2)
Increasingly Accurate Representation of Biochemistry (v2)Michel Dumontier
 
UserZoom & Key Lime Interactive Healthcare Webinar
UserZoom & Key Lime Interactive Healthcare WebinarUserZoom & Key Lime Interactive Healthcare Webinar
UserZoom & Key Lime Interactive Healthcare WebinarUserZoom
 
Webinar: Combining Lab and Online Usability Testing: Lessons Learned
Webinar: Combining Lab and Online Usability Testing: Lessons LearnedWebinar: Combining Lab and Online Usability Testing: Lessons Learned
Webinar: Combining Lab and Online Usability Testing: Lessons LearnedUserZoom
 
GeekMeet Iasi Intro
GeekMeet Iasi IntroGeekMeet Iasi Intro
GeekMeet Iasi IntroGeekMeet
 
In Sync Consulting - Profile
In Sync Consulting - ProfileIn Sync Consulting - Profile
In Sync Consulting - Profilerenurag
 
IVI Program, 'Scaling Up Entrepreneurship,' progam description
IVI Program, 'Scaling Up Entrepreneurship,' progam descriptionIVI Program, 'Scaling Up Entrepreneurship,' progam description
IVI Program, 'Scaling Up Entrepreneurship,' progam descriptionThomas Nastas
 
Vinyl sulfones: Click applications in bioconjugation. The resurgence of a che...
Vinyl sulfones: Click applications in bioconjugation. The resurgence of a che...Vinyl sulfones: Click applications in bioconjugation. The resurgence of a che...
Vinyl sulfones: Click applications in bioconjugation. The resurgence of a che...Fernando Hernandez-Mateo
 
Card Sorting & Tree Testing Webinar
Card Sorting & Tree Testing WebinarCard Sorting & Tree Testing Webinar
Card Sorting & Tree Testing WebinarUserZoom
 
Spiceworksintro 1219952728712087 9
Spiceworksintro 1219952728712087 9Spiceworksintro 1219952728712087 9
Spiceworksintro 1219952728712087 9fredjr
 

Andere mochten auch (20)

Wimmics seminar--drug interaction knowledge base, micropublication, open anno...
Wimmics seminar--drug interaction knowledge base, micropublication, open anno...Wimmics seminar--drug interaction knowledge base, micropublication, open anno...
Wimmics seminar--drug interaction knowledge base, micropublication, open anno...
 
SPARQL Cheat Sheet
SPARQL Cheat SheetSPARQL Cheat Sheet
SPARQL Cheat Sheet
 
Increasingly Accurate Representation of Biochemistry (v2)
Increasingly Accurate Representation of Biochemistry (v2)Increasingly Accurate Representation of Biochemistry (v2)
Increasingly Accurate Representation of Biochemistry (v2)
 
CSS Information
CSS InformationCSS Information
CSS Information
 
D4 I Intro Web
D4 I Intro WebD4 I Intro Web
D4 I Intro Web
 
Health-e-cITi NJ
Health-e-cITi NJHealth-e-cITi NJ
Health-e-cITi NJ
 
UserZoom & Key Lime Interactive Healthcare Webinar
UserZoom & Key Lime Interactive Healthcare WebinarUserZoom & Key Lime Interactive Healthcare Webinar
UserZoom & Key Lime Interactive Healthcare Webinar
 
Webinar: Combining Lab and Online Usability Testing: Lessons Learned
Webinar: Combining Lab and Online Usability Testing: Lessons LearnedWebinar: Combining Lab and Online Usability Testing: Lessons Learned
Webinar: Combining Lab and Online Usability Testing: Lessons Learned
 
GeekMeet Iasi Intro
GeekMeet Iasi IntroGeekMeet Iasi Intro
GeekMeet Iasi Intro
 
Para ver y bajar los ppt
Para ver y bajar los pptPara ver y bajar los ppt
Para ver y bajar los ppt
 
Music
MusicMusic
Music
 
Make Love Not War
Make Love Not WarMake Love Not War
Make Love Not War
 
The Beatles Parte II
The Beatles Parte IIThe Beatles Parte II
The Beatles Parte II
 
In Sync Consulting - Profile
In Sync Consulting - ProfileIn Sync Consulting - Profile
In Sync Consulting - Profile
 
IVI Program, 'Scaling Up Entrepreneurship,' progam description
IVI Program, 'Scaling Up Entrepreneurship,' progam descriptionIVI Program, 'Scaling Up Entrepreneurship,' progam description
IVI Program, 'Scaling Up Entrepreneurship,' progam description
 
Vinyl sulfones: Click applications in bioconjugation. The resurgence of a che...
Vinyl sulfones: Click applications in bioconjugation. The resurgence of a che...Vinyl sulfones: Click applications in bioconjugation. The resurgence of a che...
Vinyl sulfones: Click applications in bioconjugation. The resurgence of a che...
 
Card Sorting & Tree Testing Webinar
Card Sorting & Tree Testing WebinarCard Sorting & Tree Testing Webinar
Card Sorting & Tree Testing Webinar
 
Glucose english
Glucose englishGlucose english
Glucose english
 
MSP-AzureDev101
MSP-AzureDev101MSP-AzureDev101
MSP-AzureDev101
 
Spiceworksintro 1219952728712087 9
Spiceworksintro 1219952728712087 9Spiceworksintro 1219952728712087 9
Spiceworksintro 1219952728712087 9
 

Ähnlich wie Bio2RDF Release 2: Improved coverage, interoperability and provenance of Linked Data for the Life Sciences

W4 4 marc-alexandre-nolin-v2
W4 4 marc-alexandre-nolin-v2W4 4 marc-alexandre-nolin-v2
W4 4 marc-alexandre-nolin-v2nolmar01
 
Connecting the dots: drug information and Linked Data
Connecting the dots: drug information and Linked DataConnecting the dots: drug information and Linked Data
Connecting the dots: drug information and Linked DataTomasz Adamusiak
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
 
Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...Michel Dumontier
 
GBIF web services for biodiversity data, for USDA GRIN, Washington DC, USA (2...
GBIF web services for biodiversity data, for USDA GRIN, Washington DC, USA (2...GBIF web services for biodiversity data, for USDA GRIN, Washington DC, USA (2...
GBIF web services for biodiversity data, for USDA GRIN, Washington DC, USA (2...Dag Endresen
 
Providing named entity based search with a common biological database naming ...
Providing named entity based search with a common biological database naming ...Providing named entity based search with a common biological database naming ...
Providing named entity based search with a common biological database naming ...nolmar01
 
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseAnita de Waard
 
Global Biodiversity Information Facility - 2013
Global Biodiversity Information Facility - 2013Global Biodiversity Information Facility - 2013
Global Biodiversity Information Facility - 2013Dag Endresen
 
dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...
dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...
dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...dkNET
 
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF DatasetsBOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF DatasetsKemele M. Endris
 
dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023
dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023
dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023dkNET
 
Observing Linked Data Dynamics
Observing Linked Data DynamicsObserving Linked Data Dynamics
Observing Linked Data Dynamicskaefer3000
 
Knowledge Discovery using an Integrated Semantic Web
Knowledge Discovery using an Integrated Semantic WebKnowledge Discovery using an Integrated Semantic Web
Knowledge Discovery using an Integrated Semantic WebMichel Dumontier
 
Linked Data for Federation of OER Data &amp; Repositories
Linked Data for Federation of OER Data &amp; RepositoriesLinked Data for Federation of OER Data &amp; Repositories
Linked Data for Federation of OER Data &amp; RepositoriesStefan Dietze
 
TDWG and GBIF, at European genbank network meeting (Bonn, April 2004)
TDWG and GBIF, at European genbank network meeting (Bonn, April 2004)TDWG and GBIF, at European genbank network meeting (Bonn, April 2004)
TDWG and GBIF, at European genbank network meeting (Bonn, April 2004)Dag Endresen
 
Best practices for generating Bio2RDF linked data
Best practices for generating Bio2RDF linked dataBest practices for generating Bio2RDF linked data
Best practices for generating Bio2RDF linked dataalison.callahan
 
ITWS 4310: Building and Consuming the Web of Data (Fall 2013)
ITWS 4310: Building and Consuming the Web of Data (Fall 2013)ITWS 4310: Building and Consuming the Web of Data (Fall 2013)
ITWS 4310: Building and Consuming the Web of Data (Fall 2013)Rensselaer Polytechnic Institute
 

Ähnlich wie Bio2RDF Release 2: Improved coverage, interoperability and provenance of Linked Data for the Life Sciences (20)

W4 4 marc-alexandre-nolin-v2
W4 4 marc-alexandre-nolin-v2W4 4 marc-alexandre-nolin-v2
W4 4 marc-alexandre-nolin-v2
 
2013 eswc-bio2rdf-r2
2013 eswc-bio2rdf-r22013 eswc-bio2rdf-r2
2013 eswc-bio2rdf-r2
 
Connecting the dots: drug information and Linked Data
Connecting the dots: drug information and Linked DataConnecting the dots: drug information and Linked Data
Connecting the dots: drug information and Linked Data
 
2016 bmdid-mappings
2016 bmdid-mappings2016 bmdid-mappings
2016 bmdid-mappings
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...
 
GBIF web services for biodiversity data, for USDA GRIN, Washington DC, USA (2...
GBIF web services for biodiversity data, for USDA GRIN, Washington DC, USA (2...GBIF web services for biodiversity data, for USDA GRIN, Washington DC, USA (2...
GBIF web services for biodiversity data, for USDA GRIN, Washington DC, USA (2...
 
Providing named entity based search with a common biological database naming ...
Providing named entity based search with a common biological database naming ...Providing named entity based search with a common biological database naming ...
Providing named entity based search with a common biological database naming ...
 
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
 
Global Biodiversity Information Facility - 2013
Global Biodiversity Information Facility - 2013Global Biodiversity Information Facility - 2013
Global Biodiversity Information Facility - 2013
 
dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...
dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...
dkNET Webinar: Discovering and Evaluating Antibodies, Cell Lines, Software To...
 
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF DatasetsBOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
BOUNCER: A Privacy-aware Query Processing Over Federations of RDF Datasets
 
dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023
dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023
dkNET Webinar: Discover the Latest from dkNET - Biomed Resource Watch 06/02/2023
 
Observing Linked Data Dynamics
Observing Linked Data DynamicsObserving Linked Data Dynamics
Observing Linked Data Dynamics
 
Knowledge Discovery using an Integrated Semantic Web
Knowledge Discovery using an Integrated Semantic WebKnowledge Discovery using an Integrated Semantic Web
Knowledge Discovery using an Integrated Semantic Web
 
Linked Data for Federation of OER Data &amp; Repositories
Linked Data for Federation of OER Data &amp; RepositoriesLinked Data for Federation of OER Data &amp; Repositories
Linked Data for Federation of OER Data &amp; Repositories
 
TDWG and GBIF, at European genbank network meeting (Bonn, April 2004)
TDWG and GBIF, at European genbank network meeting (Bonn, April 2004)TDWG and GBIF, at European genbank network meeting (Bonn, April 2004)
TDWG and GBIF, at European genbank network meeting (Bonn, April 2004)
 
Best practices for generating Bio2RDF linked data
Best practices for generating Bio2RDF linked dataBest practices for generating Bio2RDF linked data
Best practices for generating Bio2RDF linked data
 
What is a DMP
What is a DMPWhat is a DMP
What is a DMP
 
ITWS 4310: Building and Consuming the Web of Data (Fall 2013)
ITWS 4310: Building and Consuming the Web of Data (Fall 2013)ITWS 4310: Building and Consuming the Web of Data (Fall 2013)
ITWS 4310: Building and Consuming the Web of Data (Fall 2013)
 

Mehr von Michel Dumontier

A metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsA metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsMichel Dumontier
 
Data-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsData-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsMichel Dumontier
 
The Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health SystemThe Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health SystemMichel Dumontier
 
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...Michel Dumontier
 
The role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health SystemThe role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health SystemMichel Dumontier
 
Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Michel Dumontier
 
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...Michel Dumontier
 
Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Michel Dumontier
 
The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...Michel Dumontier
 
Keynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University DinnerKeynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University DinnerMichel Dumontier
 
The future of science and business - a UM Star Lecture
The future of science and business - a UM Star LectureThe future of science and business - a UM Star Lecture
The future of science and business - a UM Star LectureMichel Dumontier
 
Developing and assessing FAIR digital resources
Developing and assessing FAIR digital resourcesDeveloping and assessing FAIR digital resources
Developing and assessing FAIR digital resourcesMichel Dumontier
 
Advancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRAdvancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRMichel Dumontier
 
A Framework to develop the FAIR Metrics
A Framework to develop the FAIR MetricsA Framework to develop the FAIR Metrics
A Framework to develop the FAIR MetricsMichel Dumontier
 
FAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationFAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationMichel Dumontier
 
Towards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessTowards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessMichel Dumontier
 

Mehr von Michel Dumontier (20)

A metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsA metadata standard for Knowledge Graphs
A metadata standard for Knowledge Graphs
 
Data-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge GraphsData-Driven Discovery Science with FAIR Knowledge Graphs
Data-Driven Discovery Science with FAIR Knowledge Graphs
 
Evaluating FAIRness
Evaluating FAIRnessEvaluating FAIRness
Evaluating FAIRness
 
The Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health SystemThe Role of the FAIR Guiding Principles for an effective Learning Health System
The Role of the FAIR Guiding Principles for an effective Learning Health System
 
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
CIKM2020 Keynote: Accelerating discovery science with an Internet of FAIR dat...
 
The role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health SystemThe role of the FAIR Guiding Principles in a Learning Health System
The role of the FAIR Guiding Principles in a Learning Health System
 
Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...
 
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
Accelerating Biomedical Research with the Emerging Internet of FAIR Data and ...
 
Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?Are we FAIR yet? And will it be worth it?
Are we FAIR yet? And will it be worth it?
 
The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...The Future of FAIR Data: An international social, legal and technological inf...
The Future of FAIR Data: An international social, legal and technological inf...
 
Keynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University DinnerKeynote at the 2018 Maastricht University Dinner
Keynote at the 2018 Maastricht University Dinner
 
The future of science and business - a UM Star Lecture
The future of science and business - a UM Star LectureThe future of science and business - a UM Star Lecture
The future of science and business - a UM Star Lecture
 
Are we FAIR yet?
Are we FAIR yet?Are we FAIR yet?
Are we FAIR yet?
 
Developing and assessing FAIR digital resources
Developing and assessing FAIR digital resourcesDeveloping and assessing FAIR digital resources
Developing and assessing FAIR digital resources
 
Advancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIRAdvancing Biomedical Knowledge Reuse with FAIR
Advancing Biomedical Knowledge Reuse with FAIR
 
A Framework to develop the FAIR Metrics
A Framework to develop the FAIR MetricsA Framework to develop the FAIR Metrics
A Framework to develop the FAIR Metrics
 
FAIR principles and metrics for evaluation
FAIR principles and metrics for evaluationFAIR principles and metrics for evaluation
FAIR principles and metrics for evaluation
 
Towards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRnessTowards metrics to assess and encourage FAIRness
Towards metrics to assess and encourage FAIRness
 
Data Science for the Win
Data Science for the WinData Science for the Win
Data Science for the Win
 
Ontologies
OntologiesOntologies
Ontologies
 

Kürzlich hochgeladen

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Kürzlich hochgeladen (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Bio2RDF Release 2: Improved coverage, interoperability and provenance of Linked Data for the Life Sciences

  • 1. Linked Data for the Life Sciences Release 2 Michel Dumontier Associate Professor, Carleton University on behalf of the Bio2RDF team Dumontier::EBI:Jan 23, 2013
  • 2. is an open source framework that makes biological data available on the emerging semantic web using a set of simple conventions Dumontier::EBI:Jan 23, 2013
  • 3. reduces the time and effort involved in data integration so that you can get to the business of doing science Dumontier::EBI:Jan 23, 2013
  • 4. the Semantic Web is the new global web of knowledge It provides standards for publishing, sharing and querying facts, expert knowledge and services It is a scalable approach to the discovery of independently formulated and highly distributed knowledge Dumontier::EBI:Jan 23, 2013
  • 5. a rapidly growing network of linked data “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/” Dumontier::EBI:Jan 23, 2013
  • 6. Link all the data!!! Dumontier::EBI:Jan 23, 2013
  • 7. Four Simple Rules for Linked Data 1) Use Internationalized Resource Identifiers (IRIs;unicode) or Uniform Resource Identifiers (URIs; ascii) for names for things 2) Use HTTP URIs so that people can look up those names. 3) When someone looks up a URI, provide useful information about them using the standards (RDF) 4) Include links to other URIs so that they can discover more things. http://www.w3.org/DesignIssues/LinkedData.html Dumontier::EBI:Jan 23, 2013
  • 8. Bio2RDF is a framework to create and provide linked data for the life sciences Dumontier::EBI:Jan 23, 2013
  • 9. Main features of Bio2RDF Release 2 • Bio2RDF conversion scripts, mapping files and web application are open source and freely available at http://github.com/bio2rdf • Bio2RDF enables (syntactic) data integration within and across datasets by using one language (RDF), having a common URI pattern and a common resource registry • 19 Release 2 datasets have provenance and endpoints feature pre-computed graph summaries for fast lookup • Bio2RDF web application enables entity resolution, query federation across an expandable distributed network of SPARQL endpoints Dumontier::EBI:Jan 23, 2013
  • 10. Bio2RDF RDFization guidelines are available at our wiki https://github.com/bio2rdf/bio2rdf-scripts/wiki/RDFization-Guide-v1.1 Dumontier::EBI:Jan 23, 2013
  • 11. Bio2RDF converters are open-source and available at GitHub http://github.com/bio2rdf/bio2rdf-scripts Dumontier::EBI:Jan 23, 2013
  • 12. Bio2RDF data are identified using simple http URI patterns When available, use the provider’s identifier in the naming the resource http://bio2rdf.org/namespace:identifier e.g.: DrugBank’s resource IRI for Leucovorin http://bio2rdf.org/drugbank:DB00650 Dumontier::EBI:Jan 23, 2013
  • 13. Linked Data: You can look it up Dumontier::EBI:Jan 23, 2013
  • 14. Valid Bio2RDF namespaces are listed in a dataset registry • An initial registry of ~600 datasets is accessible through an API provided by my PHP-LIB library (available on github). It includes – Dataset title, Preferred namespace prefix, Alternative namespace prefixes • In the summer, we consolidated and curated nearly 2100 entries in a Google spreadsheet, which includes a mostly complete coverage of datasets/collections listed in Bio2RDF, MIRIAM, BioPortal, UniProt, NCBI, NAR database issue. New fields were added including: – Dataset description, organization, website, HTML template – Identifier syntax, license and rights • Working with identifiers.org team (Nick Juty, Camille Laibe, Nicolas Le Novere) to have a single dataset registry that we can use for both Bio2RDF and identifiers.org – enable automatic cross-links between Bio2RDF and identifiers.org Dumontier::EBI:Jan 23, 2013
  • 15. vocabulary and resource namespaces are used to describe auxiliary resources • types and predicates that are generated to support the semantic annotation are in the vocabulary namespace http://bio2rdf.org/drugbank_vocabulary:Drug (type) http://bio2rdf.org/drugbank_vocabulary:target (predicate) • n-ary relations are named in the resource namespace http://bio2rdf.org/drugbank_resource:DB00440_DB00650 Dumontier::EBI:Jan 23, 2013
  • 16. The Resource Description Framework (RDF) is a formal knowledge representation language capable of expressing a statement A RDF statement consists of: – Subject: resource identified by a URI – Predicate: resource identified by a URI – Object: resource or literal http://bio2rdf.org/drugbank:DB00650 drugbank:DB00650 rdf:type rdf:type http://bio2rdf.org/drugbank_vocabulary:Drug drugbank_vocabulary:Drug Dumontier::EBI:Jan 23, 2013
  • 17. Every statement expands the network of linked data drugbank_vocabulary:Drug rdf:type rdfs:label drugbank:DB00650 Leucovorin [drugbank:DB00650] drugbank_vocabulary:ddi-interactor-in rdfs:label DDI between Trimethoprim and drugbank_resource:DB00440_DB00650 Leucovorin [drugbank_resource: DB00440_DB00650] rdf:type drugbank_vocabulary:Drug-Drug-Interaction Dumontier::EBI:Jan 23, 2013
  • 18. Syntactic integration across datasets DrugBank drugbank_vocabulary:Drug rdf:type rdfs:label drugbank:DB00650 Leucovorin [drugbank:DB00650] pharmgkb_vocabulary:xref rdfs:label pharmgkb:PA450198 leucovorin [pharmgkb:PA450198] pharmgkb_vocabulary:Drug PharmGKB Dumontier::EBI:Jan 23, 2013
  • 19. You can get what links to it What links to DrugBank’s Leucovorin? http://bio2rdf.org/linksns/drugbank/drugbank:DB00650 Dumontier::EBI:Jan 23, 2013
  • 20. The Bio2RDF Network of Linked Data Dumontier::EBI:Jan 23, 2013
  • 21. Every Bio2RDF dataset now contains provenance metadata Features - Dates - Licensing - Source - Creators - Publishers - Availability Vocabularies VoID Dublin Core W3C Provenance Bio2RDF vocabulary Dumontier::EBI:Jan 23, 2013
  • 22. For every Bio2RDF dataset we pre- compute 9 descriptive metrics • total number of triples • number of unique subjects • number of unique predicates • number of unique objects • number of unique types • unique predicate-object links and their frequencies • unique predicate-literal links and their frequencies • unique subject type-predicate-object type links and their frequencies • unique subject type-predicate-literal links and their frequencies Dumontier::EBI:Jan 23, 2013
  • 23. Accessing Bio2RDF dataset metrics • Each Bio2RDF endpoint contains a named graph that holds the pre-computed metrics – http://bio2rdf.org/bio2rdf-[namespace]-statistics • Metrics can be queried using SPARQL, e.g.: SELECT * FROM <http://bio2rdf.org/bio2rdf-drugbank-statistics> WHERE { ?dataset a <http://bio2rdf.org/dataset_vocabulary:Endpoint> . ?dataset <http://bio2rdf.org/dataset_vocabulary:has_triple_count> ?tc . ?dataset <http://bio2rdf.org/dataset_vocabulary:has_unique_subject_count> ?sc . ?dataset <http://bio2rdf.org/dataset_vocabulary:has_unique_predicate_count> ?pc . ?dataset <http://bio2rdf.org/dataset_vocabulary:has_unique_object_count> ?oc . … } https://github.com/bio2rdf/bio2rdf-scripts/wiki/Bio2RDF-dataset-metrics Dumontier::EBI:Jan 23, 2013
  • 24. Graph summaries can also assist in query formulation Subject Type Subject Count Predicate Object Type Object Count Pharmaceutical 11512 form Unit 56 Drug-Transporter-Interaction 1440 drug Drug 534 Drug-Transporter-Interaction 1440 transporter Target 88 Drug 1266 dosage Dosage 230 Patent 1255 country Country 2 Drug 1127 product Pharmaceutical 11512 Drug 1074 ddi-interactor-in Drug-Drug-Interaction 10891 Drug 532 patent Patent 1255 Drug 277 mixture Mixture 3317 Dosage 230 route Route 42 Drug-Target-Interaction 84 target Target 43 PREFIX drugbank_vocabulary: <http://bio2rdf.org/drugbank_vocabulary:> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT ?ddi ?d1name ?d2name WHERE { ?ddi a drugbank_vocabulary:Drug-Drug-Interaction . ?d1 drugbank_vocabulary:ddi-interactor-in ?ddi . ?d1 rdfs:label ?d1name . ?d2 drugbank_vocabulary:ddi-interactor-in ?ddi . ?d2 rdfs:label ?d2name. FILTER (?d1 != ?d2) } Dumontier::EBI:Jan 23, 2013
  • 25. You can use the SPARQLed query assistant with updated endpoints http://sindicetech.com/sindice-suite/sparqled/ graph: http://sindicetech.com/analytics Dumontier::EBI:Jan 23, 2013
  • 26. Bio2RDF covers the major biological databases Dumontier::EBI:Jan 23, 2013
  • 27. Bio2RDF Release 2 – New and Updated Datasets Dataset Namespace # of triples Affymetrix affymetrix 44469611 Biomodels* biomodels 589753 Comparative Toxicogenomics Database ctd 141845167 DrugBank drugbank 1121468 NCBI Gene ncbigene 394026267 Gene Ontology Annotations goa 80028873 HUGO Gene Nomenclature Committee hgnc 836060 Homologene homologene 1281881 InterPro* interpro 999031 iProClass iproclass 211365460 iRefIndex irefindex 31042135 Medical Subject Headings mesh 4172230 NCBO BioPortal* bioportal 15384622 National Drug Code Directory* ndc 17814216 Online Mendelian Inheritance in Man omim 1848729 Pharmacogenomics Knowledge Base pharmgkb 37949275 SABIO-RK* sabiork 2618288 Saccharomyces Genome Database sgd 5551009 NCBI Taxonomy taxon 17814216 Total 19 1010758291 Dumontier::EBI:Jan 23, 2013
  • 28. Status of Bio2RDF Release 1 datasets Dataset Status Atlas Maintained – will not be updated BIND Deprecated – in iRefIndex BioCarta Deprecated – in Pathway Commons BioCyc Deprecated – in Pathway Commons EC Deprecated – in Gene Ontology/UniProt GenBank Maintained – will be updated HHPID Maintained – will not be updated INOH Deprecated – in Pathway Commons KEGG Maintained – will not be updated MGI Maintained – will be updated Pubmed Maintained – will be updated PID Deprecated – in Pathway Commons Reactome Deprecated – in Pathway Commons RefSeq Maintained – will be updated Dumontier::EBI:Jan 23, 2013
  • 29. A PHP-based library acts a point of integration • Provides a set of APIs – to produce RDF and OWL statements – to generate valid Bio2RDF URIs by checking against a dataset registry – to generate dataset provenance Dumontier::EBI:Jan 23, 2013
  • 30. Oh Bio2RDF, how can I access you? Let me count the ways: 1. Downloads (data, stats, virtuoso db) 2. Web interface (lookup + services) 3. SPARQL endpoint 4. SPARQLed editor 5. Virtuoso Faceted Browser Dumontier::EBI:Jan 23, 2013
  • 31. Use virtuoso’s built in faceted browser to construct increasingly complex queries with little effort Dumontier::EBI:Jan 23, 2013
  • 32. Heterogeneous biological data on the semantic web is difficult to query Question: Find all proteins that interact with beta amyloid (uniprot:P05067) UniProt Protein PDB Protein ? SELECT * WHERE { iRefIndex Protein ?protein a bio2rdf:Protein . ?protein bio2rdf:interacts_with uniprot:P05067 . } Physical interaction? Genetic interaction? Pathway interaction? Dumontier::EBI:Jan 23, 2013
  • 33. RDF-based Linked Data is a great first step, but it’s not enough. From linked data to linked knowledge through syntactic and semantic normalization. Dumontier::EBI:Jan 23, 2013
  • 34. ontology as a strategy to formally represent and integrate knowledge Dumontier::EBI:Jan 23, 2013
  • 35. SIO provides an OWL ontology for the representation of diverse biomedical knowledge Dumontier::EBI:Jan 23, 2013
  • 37. Semantic data integration, consistency checking and query answering over Bio2RDF with the Semanticscience Integrated Ontology (SIO) uniprot:P05067 uniprot:P05067 refseq:NP_009225.1 is a is a uniprot:Protein uniprot:Protein refseq:Protein refseq:Protein dataset is a is a is a sio:protein ontology Knowledge Base Querying Bio2RDF Linked Open Data with a Global Schema. Alison Callahan, José Cruz-Toledo and Michel Dumontier. Bio-ontologies 2012. Dumontier::EBI:Jan 23, 2013
  • 38. Bio2RDF types include processes, material entities and informational entities CTD: Chemical, Disease, Chemical-Disease Interaction, Chemical- Gene Interaction NCBIGene: Gene, Protein, Model Organism, Publication HGNC: Accession Number, Gene, Gene Symbol iRefIndex: Protein Complex, Protein Interaction MGI: Gene Marker, Gene Symbol PharmGKB: Gene-Disease Associations, Disease, Drug, Gene SGD: Enzyme, Pathway, Protein, RNA, Reaction, Location, Experiment Dumontier::EBI:Jan 23, 2013
  • 39. Bio2RDF and SIO powered SPARQL 1.1 federated query: Find chemicals in CTD and proteins in SGD that participate in the same GO process SELECT ?chem, ?prot, ?proc FROM <http://bio2rdf.org/ctd> WHERE { ?chemical a sio:chemical-entity. ?chemical rdfs:label ?chem. ?chemical sio:is-participant-in ?process. ?process rdfs:label ?proc. FILTER regex (?process, "http://bio2rdf.org/go:") SERVICE <http://sgd.bio2rdf.org/sparql> { ?protein a sio:protein . ?protein sio:is-participant-in ?process. ?protein rdfs:label ?prot . } } Dumontier::EBI:Jan 23, 2013
  • 40. Bio2RDF Release 2 – A summary • Updated data conversion source code to use PHP API (all available through GitHub) • Simple Bio2RDF IRI design patterns that facilitate syntactic consistency and interoperability backed by simple registry • Dataset provenance and metrics • We welcome contributions from the community • Join our mailing list at bio2rdf@googlegroups.com Dumontier::EBI:Jan 23, 2013
  • 41. Future Directions • Aiming for twice yearly release schedule • Update large scale datasets – RefSeq, Genbank, PubMed, PDB • Incorporate EBI & W3C Linking Open Drug Data (LODD) effort – SIDER (beta), TCM (RDF available), CHEMBL (in dev), OMIM (released), DailyMed (RDF available), LinkedCT (beta) • Extended dataset coverage by tapping into existing endpoints (uniprot, bioportal, ebi-rdf?) • OpenBioCloud w/DERI + SindiceTech • Showcase with other third party tools Dumontier::EBI:Jan 23, 2013
  • 42. Acknowledgements Bio2RDF SADI: Christopher Baker, Melanie Courtot, Jose Peter Ansell, Francois Belleau, Allison Cruz-Toledo, Steve Etlinger, Nichealla Callahan, Jacques Corbeil, Jose Cruz- Keath, Artjom Klein, Luke McCarthy, Silvane Toledo, Alex De Leon, Steve Etlinger, James Paixao, Ben Vandervalk, Natalia Villanueva- Hogan, Nichealla Keath, Jean Rosales, Mark Wilkinson Morissette, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault and Paul Roe W3C HCLS: J Luciano, B Andersson, C Batchelor, O Bodenreider, T Clark, C Denney, C Domarew, T OpenBioCloud Gambet, L Harland, A Jentzsch, V Kashyap, P Dana Klassen and Giovanni Tumarello Kos, J Kozlovsky, T Lebo, SM Marshall, JP McCusker, DL McGuinness, C Ogbuji, E Pichler, R Powers, E Prud hommeaux, M Samwald, L Schriml, PJ Tonellato, PL Whetzel, J Zhao, S Stephens, C Denney, J Luciano, J McGurk, Lynn Schriml, and Peter J. Tonellato. Dumontier::EBI:Jan 23, 2013
  • 43. dumontierlab.com michel_dumontier@carleton.ca Website: http://dumontierlab.com Presentations: http://slideshare.com/micheldumontier Dumontier::EBI:Jan 23, 2013

Hinweis der Redaktion

  1. Bio2RDF facilitates data sharing and reuse in a number of ways: Access the scripts that generate our linked data. These scripts are openly licensed for modification, redistribution or use by anyone wishing to generate RDF data on their own We make use of syntactically consistent IRI patterns We do not impose a structural scheme for representing our data Bio2RDF allows for multiple mirrors host our data, a DNS “round robin” procedure balances incoming requests between participating mirrors, this prevents a single point of failure and allows for additional mirrors to be added to the network A centralized resource registry has been developed for consistent usage of namespaces and vocabularies- In the next slides I will go into a bit more detail about the first 3 points
  2. In order to facilitate collaboration by the community, one of the most important developments to this second release of Bio2RDF was the creation of a publicly available code repository for all of the programs used for converting biological data to linked data.- Anyone can reuse or modify our openly (MIT) licensed code
  3. In order to keep a clear link back to the original data, in our RDFized datasets we maintain the original data provider’s record identifiers by making use of the following URI pattern:- namespace: preferred short name for a biological dataset
  4. In order to keep a clear link back to the original data, in our RDFized datasets we maintain the original data provider’s record identifiers by making use of the following URI pattern:- namespace: preferred short name for a biological dataset
  5. All resources in every bio2RDF graph is linked to a graph that describes the provenance of the generated linked data. We make use of the W3C Vocabulary of Interlinked Datasets (VoID), the Provenance vocabulary and Dublin core vocabulary to describe items such as: URL to the source data Date of generation Licensing information (if available) URL to the script used to generate the data Download URL for linked data files URL of SPARQL endpoint
  6. In order to eliminate the need for our users to run expensive queries on our endpoints we have made available several metrics that can aid in the description of the contents of our linked data sets.- These metrics have been included into each of our datasets and also serve as guides for developing queries over the data based on the data’s structure
  7. As I had mentioned earlier we have included dataset metrics for every one of our endpoints. - These metrics are stored in a named graph and can be queried for as shown:
  8. Here I am showing with more detail the top 10 subject type – predicate - object type frequencies that can be found in our Drugbank endpoint So for example to find drugs that participate in drug-drug interactions one would use the ddi-interactor-in predicate that associates the 1074 resources of type drug to the 10891 resources typed as drug-drug-interaction.
  9. Here I present some numbers for the 19 updated datasets
  10. HHPID is database of HIV-1 human protein interactions that was created to catalog all interactions between HIV-1 and human proteins published in the peer-reviewed literature. The database serves the scientific community exploring the discovery of novel HIV vaccine candidates and therapeutic targets.MGI – Mouse Genome Informatics-==-====Boutique endpoints maintained in Release 2 (will not be updated): Bio2RDF Atlas and HHPID
  11. We are currently in the process of exposing Bio2RDF data using a variety of 3rd party tools- specifically, all of our endpoints can be navigated using Virtuoso’s faceted browser- We have also generated Sindice metrics that enable the use of SPARQLedautomed query builder. (This tool allows users to semiautomatically construct SPARQL queries based on namespaces used therein)-- finally it is possible to use Sig.ma browser to aggreate data from multiple endpoints and construct mashups
  12. However, types and predicates are dataset specific and therefore not consistentWouldn’t it be nice to be able to query all of these datasets with a common nomenclature? (our objective)