3. Bio2RDF Best Practices
1. Assign URIs for all things
2. Assign labels and identifiers
3. Declare and assign types
4. Provide dataset provenance
4. 1. Assign URIs for all things
● The base Bio2RDF URI pattern:
http://bio2rdf.org/namespace:identifier
● Record identifiers are kept exactly as issued
by the data provider
● Linked Data = no blank nodes!
5. 1. Assign URIs for all things
● Record identifiers are kept exactly as issued
by the data provider
○ e.g. the Bio2RDF IRI for DrugBank’s
Leucovorin record
http://bio2rdf.org/drugbank:DB00650
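The URI pattern can be sketched in a few lines of Python. This is only an illustration of the pattern on the slide; the function name `bio2rdf_uri` is hypothetical and is not part of any Bio2RDF library.

```python
BASE = "http://bio2rdf.org/"

def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build a URI of the form http://bio2rdf.org/namespace:identifier."""
    if not namespace or not identifier:
        raise ValueError("namespace and identifier must both be non-empty")
    # The provider's identifier is used verbatim, as the slide requires.
    return f"{BASE}{namespace}:{identifier}"

print(bio2rdf_uri("drugbank", "DB00650"))
```

Because every resource gets a URI built this way, there is never a need to fall back on a blank node.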
6. 1. Assign URIs for all things
● Vocabulary namespaces are used for
dataset specific types and predicates
http://bio2rdf.org/drugbank_vocabulary:Drug
● Resource namespaces are used to assign
an identifier when one isn't provided by the
source
- generate a unique identifier from a UUID, hash, counter, concatenated
strings, etc.
http://bio2rdf.org/drugbank_resource:DB00440_DB00650
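A minimal Python sketch of the two namespace kinds follows. The function names are hypothetical, and the choice of MD5 for the hashed variant is an assumption; the slide only says "hash".

```python
import hashlib

BASE = "http://bio2rdf.org/"

def vocabulary_uri(namespace: str, term: str) -> str:
    """URI for a dataset-specific type or predicate."""
    return f"{BASE}{namespace}_vocabulary:{term}"

def resource_uri(namespace: str, *parts: str) -> str:
    """URI for an entity the source does not identify, built by
    concatenating existing identifiers with underscores."""
    if not parts:
        raise ValueError("at least one identifier part is required")
    return f"{BASE}{namespace}_resource:{'_'.join(parts)}"

def hashed_resource_uri(namespace: str, text: str) -> str:
    """Alternative: derive the identifier from a hash of some source text.
    MD5 is an assumption here; any stable hash would do."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return f"{BASE}{namespace}_resource:{digest}"

print(vocabulary_uri("drugbank", "Drug"))
print(resource_uri("drugbank", "DB00440", "DB00650"))
```

Concatenation keeps the generated identifier readable and reproducible across runs, whereas a counter or UUID needs extra bookkeeping to stay stable between releases.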
7. 1. Assign URIs for all things
● All valid namespaces are listed in the
Bio2RDF Life Sciences Registry
○ ensures that URIs are consistent across all Bio2RDF
datasets
○ registry is publicly available at http://tinyurl.com/dataregistry
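One way to enforce this consistency is to refuse to mint a URI for an unregistered namespace. The sketch below assumes the registry has already been loaded into a Python set; the set shown is a hand-written stand-in, not the real registry, and `checked_uri` is a hypothetical name.

```python
# Hypothetical stand-in for the registry contents; the real registry is the
# spreadsheet referenced on the slide and is not loaded here.
REGISTERED_NAMESPACES = {"drugbank"}

def checked_uri(namespace: str, identifier: str,
                registry=REGISTERED_NAMESPACES) -> str:
    """Build a Bio2RDF URI only if the namespace is registered."""
    if namespace not in registry:
        raise KeyError(f"namespace {namespace!r} is not in the registry")
    return f"http://bio2rdf.org/{namespace}:{identifier}"

print(checked_uri("drugbank", "DB00650"))
```

A misspelled namespace such as `drugbnk` then fails immediately instead of silently producing a URI that no other dataset links to.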
8. 2. Assign labels and identifiers
● Use rdfs:label to assign a language-tagged
label to every resource
○ can be a source provided title, a script generated
phrase, or a phrase provided in a third party dataset
○ Pattern: rdfs:label "label [ns:id]"@lang
● Use Dublin Core predicates for source-provided
labels and identifiers
○ Pattern: dc:title "label"@lang (assign language tag
only when one is provided)
○ Pattern: dc:identifier "ns:id"^^xsd:string
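The three patterns can be generated together. This Python sketch emits Turtle-style lines with prefixed names and assumes the prefixes are declared elsewhere; the function names and the default label language `en` are assumptions made for illustration.

```python
def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a quoted RDF literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def label_triples(subject, namespace, identifier, title,
                  label_lang="en", title_lang=None):
    """Return the rdfs:label, dc:title and dc:identifier statements."""
    qname = f"{namespace}:{identifier}"
    safe = escape_literal(title)
    # dc:title carries a language tag only when the source provides one.
    title_tag = f"@{title_lang}" if title_lang else ""
    return [
        f'{subject} rdfs:label "{safe} [{qname}]"@{label_lang} .',
        f'{subject} dc:title "{safe}"{title_tag} .',
        f'{subject} dc:identifier "{qname}"^^xsd:string .',
    ]

for line in label_triples("drugbank:DB00650", "drugbank", "DB00650", "Leucovorin"):
    print(line)
```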
9. 2. Assign labels and identifiers
● Use Bio2RDF predicates to assign Bio2RDF
namespace and Bio2RDF identifiers:
○ Pattern: bio2rdf_vocabulary:namespace "ns"^^xsd:string
○ Pattern: bio2rdf_vocabulary:identifier "id"^^xsd:string
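Both values can be derived from the `ns:id` form of the resource. The Python sketch below splits at the first colon, which assumes the namespace itself never contains one; the function names are hypothetical.

```python
def split_qname(qname: str):
    """Split 'ns:id' at the first colon into (namespace, identifier)."""
    namespace, sep, identifier = qname.partition(":")
    if not sep or not namespace or not identifier:
        raise ValueError(f"not of the form ns:id: {qname!r}")
    return namespace, identifier

def bio2rdf_id_triples(qname: str):
    """Return the bio2rdf_vocabulary namespace and identifier statements."""
    namespace, identifier = split_qname(qname)
    return [
        f'{qname} bio2rdf_vocabulary:namespace "{namespace}"^^xsd:string .',
        f'{qname} bio2rdf_vocabulary:identifier "{identifier}"^^xsd:string .',
    ]

for line in bio2rdf_id_triples("drugbank:DB00650"):
    print(line)
```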
11. 3. Declare and assign types
● All resources should be typed as being
resources of the dataset
○ Pattern: rdf:type namespace_vocabulary:Resource
● Instances of a dataset vocabulary type
should also be typed as owl:NamedIndividual
○ Pattern: rdf:type namespace_vocabulary:Type
○ Pattern: rdf:type owl:NamedIndividual
● Classes should be typed as owl:Class
○ Pattern: rdf:type owl:Class
○ If the superclass has been described using the
namespace_vocabulary pattern, link the class
to it using rdfs:subClassOf
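The typing rules for individuals and classes can be combined into one helper. This is a Python sketch with hypothetical names (`type_triples`, `kind`), emitting Turtle-style lines with prefixed names.

```python
def type_triples(subject, namespace, kind, dataset_type=None, superclass=None):
    """Return rdf:type (and rdfs:subClassOf) statements for a resource.

    kind is 'individual' or 'class'; dataset_type and superclass are local
    names in the dataset's vocabulary namespace.
    """
    voc = f"{namespace}_vocabulary"
    # Every resource is typed as a resource of its dataset.
    lines = [f"{subject} rdf:type {voc}:Resource ."]
    if kind == "individual":
        if dataset_type:
            lines.append(f"{subject} rdf:type {voc}:{dataset_type} .")
        lines.append(f"{subject} rdf:type owl:NamedIndividual .")
    elif kind == "class":
        lines.append(f"{subject} rdf:type owl:Class .")
        if superclass:
            lines.append(f"{subject} rdfs:subClassOf {voc}:{superclass} .")
    else:
        raise ValueError("kind must be 'individual' or 'class'")
    return lines

for line in type_triples("drugbank:DB0159", "drugbank", "class", superclass="Drug"):
    print(line)
```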
12. 3. Declare and assign types
● Object properties and datatype properties
should also be typed
○ Pattern: rdf:type owl:ObjectProperty
○ Pattern: rdf:type owl:DatatypeProperty
● Examples:
drugbank:DB0159
    rdf:type drugbank_vocabulary:Resource ;
    rdf:type owl:Class ;
    rdfs:subClassOf drugbank_vocabulary:Drug .

drugbank_vocabulary:ddi-interactor-in
    rdf:type owl:ObjectProperty .
13. 4. Provide dataset provenance
● Provenance links
○ data item void:inDataset Bio2RDF dataset
○ Bio2RDF dataset prov:wasDerivedFrom source dataset
● Features
○ Entity-dataset link
○ Creator
○ Publisher
○ Date created
○ License & rights
○ Source
○ Availability (SPARQL endpoint, data dump)
● Vocabularies
○ VoID
○ Dublin Core
○ W3C Provenance
○ Bio2RDF vocabulary
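A dataset description covering these features might look like the Python sketch below. The mapping of each feature to a specific predicate (dcterms:creator, dcterms:publisher, dcterms:created, dcterms:license, void:sparqlEndpoint, void:dataDump) is an assumption drawn from the named vocabularies, not something the slide spells out, and `describe_dataset` is a hypothetical name. Callers supply all values.

```python
def describe_dataset(dataset, source, creator, publisher, created,
                     license_uri, sparql_endpoint, data_dump):
    """Return Turtle-style lines describing a dataset.

    All arguments are IRIs except `created`, an ISO date string (yyyy-mm-dd).
    Prefixes (rdf, void, dcterms, prov, xsd) are assumed to be declared.
    """
    return [
        f"<{dataset}> rdf:type void:Dataset ;",
        f"    dcterms:creator <{creator}> ;",
        f"    dcterms:publisher <{publisher}> ;",
        f'    dcterms:created "{created}"^^xsd:date ;',
        f"    dcterms:license <{license_uri}> ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:dataDump <{data_dump}> ;",
        f"    prov:wasDerivedFrom <{source}> .",
    ]
```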
14. 4. Provide dataset provenance
● Link every resource to the versioned/dated
Bio2RDF dataset in which it is described
○ Pattern: void:inDataset <http://bio2rdf.org/dataset:namespace-dd-mm-yyyy.rdf>
○ Example:
drugbank:DB0159 void:inDataset <http://bio2rdf.org/dataset:drugbank-03-07-2013> .
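The dated dataset IRI can be derived from the namespace and the release date. This Python sketch follows the form used in the example (no `.rdf` suffix) and assumes that `03-07-2013` means 3 July 2013, per the dd-mm-yyyy pattern; the function names are hypothetical.

```python
from datetime import date

def dataset_uri(namespace: str, release: date) -> str:
    """Build the dated dataset IRI using the dd-mm-yyyy convention."""
    return f"http://bio2rdf.org/dataset:{namespace}-{release.strftime('%d-%m-%Y')}"

def in_dataset_triple(subject: str, namespace: str, release: date) -> str:
    """Link a resource to the dataset release in which it is described."""
    return f"{subject} void:inDataset <{dataset_uri(namespace, release)}> ."

print(in_dataset_triple("drugbank:DB0159", "drugbank", date(2013, 7, 3)))
```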
16. PHP: Hypertext Preprocessor
● A general-purpose open source scripting
language
○ homepage : http://php.net
● PHP scripts can be executed from the
command line or embedded in HTML
documents
● Syntactically similar to C/C++/Java, but
dynamically and weakly typed
17. A hello world PHP script
● PHP code is enclosed in <?php and ?> tags
(the closing tag may be omitted in files that
contain only PHP)
19. Using the Bio2RDF PHP API to create an
RDFizer
● Basic structure of a Bio2RDFizer script:
○ Initialize script parameters - input file(s), default
dataset namespace, etc.
○ Define a Run() function that handles downloading
and iterating over input files, as well as function calls
to parse and convert input data to RDF
○ Define function(s) to convert input data to RDF using
Bio2RDF API helper functions
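The three-part structure can be illustrated with a Python analogue. This is not the Bio2RDF PHP API: the class name `DrugRdfizer`, its methods, and the two-column tab-separated input are all assumptions, and the download step is replaced by an in-memory string so the sketch stays self-contained.

```python
import csv
import io

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

class DrugRdfizer:
    def __init__(self, namespace: str, source_text: str):
        # 1. Initialize script parameters: default namespace and input data.
        self.namespace = namespace
        self.source_text = source_text
        self.triples = []

    def run(self):
        # 2. Iterate over the input and hand each record to the converter.
        #    A real script would download and open the source files here.
        reader = csv.reader(io.StringIO(self.source_text), delimiter="\t")
        for row in reader:
            if len(row) >= 2:
                self.convert_record(row[0], row[1])
        return self.triples

    def convert_record(self, identifier: str, name: str):
        # 3. Convert one record to RDF (here, a single N-Triples label line).
        subject = f"http://bio2rdf.org/{self.namespace}:{identifier}"
        label = f"{name} [{self.namespace}:{identifier}]"
        label = label.replace("\\", "\\\\").replace('"', '\\"')
        self.triples.append(f'<{subject}> <{RDFS_LABEL}> "{label}"@en .')

for triple in DrugRdfizer("drugbank", "DB00650\tLeucovorin\n").run():
    print(triple)
```

Keeping the per-record conversion in its own method means the iteration logic in `run` does not change when the record layout does.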
20. Using the Bio2RDF PHP API to create an
RDFizer
● Bio2RDF PHP API defines helper functions
that implement Bio2RDF best practices:
○ getNamespace()
○ getVoc()
○ getRes()
○ triplify($subject, $predicate, $object) // object is an RDF resource
○ triplifyString($subject, $predicate, "string") // object is a literal
○ describeIndividual($uri, $label, $type, $title, $description, $language)
○ describeClass( ... )
○ describeProperty( ... )
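To show the difference between the resource and literal helpers, here is a Python analogue of the first two. It does not reproduce the PHP API's behavior or signatures: these hypothetical functions take full IRIs and return N-Triples lines, which is an assumption made to keep the sketch self-contained.

```python
def triplify(subject: str, predicate: str, obj: str) -> str:
    """Statement whose object is an RDF resource (an IRI)."""
    return f"<{subject}> <{predicate}> <{obj}> ."

def triplify_string(subject: str, predicate: str, text: str, lang=None) -> str:
    """Statement whose object is a literal, with an optional language tag."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    suffix = f"@{lang}" if lang else ""
    return f'<{subject}> <{predicate}> "{escaped}"{suffix} .'

print(triplify("http://bio2rdf.org/drugbank:DB00650",
               "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
               "http://bio2rdf.org/drugbank_vocabulary:Drug"))
print(triplify_string("http://bio2rdf.org/drugbank:DB00650",
                      "http://www.w3.org/2000/01/rdf-schema#label",
                      "Leucovorin [drugbank:DB00650]", lang="en"))
```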
23. Using and contributing to the
Bio2RDF project on GitHub
1. Fork the bio2rdf-scripts and php-lib
repositories on GitHub
https://help.github.com/articles/fork-a-repo
2. Write some code!
3. Commit code to your fork
4. Make a pull request to the bio2rdf-scripts
repo