Best practices for generating
linked data
Tutorial @ ICBO 2013
Tutorial Roadmap
Bio2RDF Best Practices
1. Assign a URI for all things
2. Assign labels and identifiers
3. Declare and assign types
4. Provide dataset provenance
1. Assign URIs for all things
● The base Bio2RDF URI pattern:
http://bio2rdf.org/namespace:identifier
● Record identifiers from the data provider are
preserved exactly as given by the source
● Linked Data = no blank nodes!
1. Assign URIs for all things
● Record identifiers from the data provider are
preserved from the source
○ e.g. DrugBank’s resource IRI for
Leucovorin
http://bio2rdf.org/drugbank:DB00650
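The URI pattern above can be sketched in a few lines of Python. This is an illustration only, not part of the Bio2RDF tooling (which is written in PHP); the function name bio2rdf_uri and its input check are assumptions made for this example.

```python
BIO2RDF_BASE = "http://bio2rdf.org/"


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build a URI following the http://bio2rdf.org/namespace:identifier pattern.

    Hypothetical helper for illustration; not a Bio2RDF API function.
    """
    if not namespace or not identifier:
        raise ValueError("both namespace and identifier are required")
    # The source identifier is used unchanged, as the data provider wrote it.
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"


# DrugBank record for Leucovorin, as in the slide.
print(bio2rdf_uri("drugbank", "DB00650"))
```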
1. Assign URIs for all things
● Vocabulary namespaces are used for
dataset-specific types and predicates
http://bio2rdf.org/drugbank_vocabulary:Drug
● Resource namespaces are used to assign
an identifier when one isn't provided by the
source
- unique identifier built from a UUID, hash, counter, concatenated
strings, etc.
http://bio2rdf.org/drugbank_resource:DB00440_DB00650
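As a rough sketch of the two namespace kinds, the Python below builds a vocabulary URI and mints a resource URI by concatenating source identifiers. The helper names and the choice of "_" as the joining character are assumptions inferred from the example URI on the slide, not a documented Bio2RDF rule.

```python
BIO2RDF_BASE = "http://bio2rdf.org/"


def vocabulary_uri(namespace: str, term: str) -> str:
    """URI for a dataset-specific type or predicate (hypothetical helper)."""
    return f"{BIO2RDF_BASE}{namespace}_vocabulary:{term}"


def resource_uri(namespace: str, *parts: str) -> str:
    """Mint a URI in the dataset's resource namespace (hypothetical helper).

    Used when the source provides no identifier; here the local identifier
    is the concatenation of existing identifiers joined with "_".
    """
    if not parts:
        raise ValueError("at least one identifier part is required")
    local_id = "_".join(parts)
    return f"{BIO2RDF_BASE}{namespace}_resource:{local_id}"


print(vocabulary_uri("drugbank", "Drug"))
# An identifier built from the identifiers of two DrugBank drugs.
print(resource_uri("drugbank", "DB00440", "DB00650"))
```

Concatenation keeps the minted identifier reproducible across runs; a UUID or counter would also be unique but would change each time the script is rerun.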
1. Assign URIs for all things
● All valid namespaces are listed in the
Bio2RDF Life Sciences Registry
○ ensures that URIs are consistent across all Bio2RDF
datasets
○ registry is publicly available at http://tinyurl.com/dataregistry
2. Assign labels and identifiers
● Use rdfs:label to assign a language-tagged
label to every resource
○ can be a source-provided title, a script-generated
phrase, or a phrase provided in a third-party dataset
○ Pattern: rdfs:label "label [ns:id]"@lang
● Use Dublin Core predicates for source-provided
labels and identifiers
○ Pattern: dc:title "label"@lang (assign a language tag
only when one is provided)
○ Pattern: dc:identifier "ns:id"^^xsd:string
2. Assign labels and identifiers
● Use Bio2RDF predicates to assign the Bio2RDF
namespace and the Bio2RDF identifier:
○ Pattern: bio2rdf_vocabulary:namespace "ns"^^xsd:string
○ Pattern: bio2rdf_vocabulary:identifier "id"^^xsd:string
2. Assign labels and identifiers
Example: DrugBank entry for Nitrazepam
drugbank:DB0159
    rdfs:label "Nitrazepam [drugbank:DB0159]"@en ;
    dc:title "Nitrazepam"@en ;
    dc:identifier "drugbank:DB0159"^^xsd:string ;
    bio2rdf_vocabulary:namespace "drugbank"^^xsd:string ;
    bio2rdf_vocabulary:identifier "DB0159"^^xsd:string .
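The labeling patterns can be generated mechanically from a namespace, an identifier, and a title. The Python sketch below does this for the Nitrazepam example. The function names are hypothetical, prefix declarations are omitted, and the optional title_lang argument is an assumed way of modeling "assign a language tag only when one is provided".

```python
from typing import Optional


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def label_statements(ns: str, identifier: str, title: str,
                     label_lang: str = "en",
                     title_lang: Optional[str] = None) -> str:
    """Return the label and identifier statements for one resource as Turtle.

    Hypothetical helper for illustration; not a Bio2RDF API function.
    """
    qname = f"{ns}:{identifier}"
    text = escape_literal(title)
    # dc:title carries a language tag only when the source provides one.
    title_tag = f"@{title_lang}" if title_lang else ""
    lines = [
        qname,
        f'    rdfs:label "{text} [{qname}]"@{label_lang} ;',
        f'    dc:title "{text}"{title_tag} ;',
        f'    dc:identifier "{qname}"^^xsd:string ;',
        f'    bio2rdf_vocabulary:namespace "{ns}"^^xsd:string ;',
        f'    bio2rdf_vocabulary:identifier "{identifier}"^^xsd:string .',
    ]
    return "\n".join(lines)


print(label_statements("drugbank", "DB0159", "Nitrazepam", title_lang="en"))
```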
3. Declare and assign types
● All resources should be typed as being
resources of the dataset
○ Pattern: rdf:type namespace_vocabulary:Resource
● Instances of a dataset vocabulary type
should also be typed as owl:NamedIndividual
○ Pattern: rdf:type namespace_vocabulary:Type
○ Pattern: rdf:type owl:NamedIndividual
● Classes should be typed as owl:Class
○ Pattern: rdf:type owl:Class
○ If superclass has been described using
namespace_vocabulary pattern, then link class
using rdfs:subClassOf
3. Declare and assign types
● Object properties and datatype properties
should also be typed
○ Pattern: rdf:type owl:ObjectProperty
○ Pattern: rdf:type owl:DatatypeProperty
● Examples:
drugbank:DB0159
    rdf:type drugbank_vocabulary:Resource ;
    rdf:type owl:Class ;
    rdfs:subClassOf drugbank_vocabulary:Drug .
drugbank_vocabulary:ddi-interactor-in
    rdf:type owl:ObjectProperty .
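The typing patterns for individuals and classes can be summarized as a small function that returns the triples to emit. This is a sketch under assumptions: the function name, the kind argument, and the tuple representation are invented for illustration, and property typing is left out.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def typing_triples(subject: str, namespace: str, kind: str,
                   vocabulary_type: Optional[str] = None) -> List[Triple]:
    """Return (subject, predicate, object) tuples for the typing patterns.

    kind is "individual" or "class"; vocabulary_type is the local name of a
    type in the dataset vocabulary, e.g. "Drug". Hypothetical helper.
    """
    voc = f"{namespace}_vocabulary"
    # Every resource is typed as a resource of its dataset.
    triples: List[Triple] = [(subject, "rdf:type", f"{voc}:Resource")]
    if kind == "individual":
        if vocabulary_type:
            triples.append((subject, "rdf:type", f"{voc}:{vocabulary_type}"))
        triples.append((subject, "rdf:type", "owl:NamedIndividual"))
    elif kind == "class":
        triples.append((subject, "rdf:type", "owl:Class"))
        if vocabulary_type:
            # Link to a superclass described in the dataset vocabulary.
            triples.append((subject, "rdfs:subClassOf", f"{voc}:{vocabulary_type}"))
    else:
        raise ValueError("kind must be 'individual' or 'class'")
    return triples


for triple in typing_triples("drugbank:DB0159", "drugbank", "class", "Drug"):
    print(" ".join(triple), ".")
```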
4. Provide dataset provenance
[Diagram: a data item is linked to its Bio2RDF dataset by void:inDataset; the Bio2RDF dataset is linked to the source dataset by prov:wasDerivedFrom]
Features
- Entity-dataset link
- Creator
- Publisher
- Date created
- License & rights
- Source
- Availability
  - SPARQL endpoint
  - Data dump
Vocabularies
- VoID
- Dublin Core
- W3C Provenance
- Bio2RDF vocabulary
4. Provide dataset provenance
● Link every resource to the versioned/dated
Bio2RDF dataset in which it is described
○ Pattern: void:inDataset <http://bio2rdf.org/dataset:namespace-dd-mm-yyyy.rdf>
○ Example:
drugbank:DB0159 void:inDataset <http://bio2rdf.org/dataset:drugbank-03-07-2013> .
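A dated dataset URI of this shape can be derived from the namespace and a release date. The Python sketch below follows the form used in the slide's example (no .rdf suffix) and reads 03-07-2013 as day-month-year, per the dd-mm-yyyy pattern. The function names are hypothetical.

```python
from datetime import date


def dataset_uri(namespace: str, release: date) -> str:
    """URI of the dated Bio2RDF dataset for a namespace (hypothetical helper)."""
    stamp = release.strftime("%d-%m-%Y")  # dd-mm-yyyy, as in the pattern
    return f"http://bio2rdf.org/dataset:{namespace}-{stamp}"


def in_dataset_statement(qname: str, namespace: str, release: date) -> str:
    """Turtle statement linking a resource to the dataset that describes it."""
    return f"{qname} void:inDataset <{dataset_uri(namespace, release)}> ."


print(in_dataset_statement("drugbank:DB0159", "drugbank", date(2013, 7, 3)))
```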
A crash course in PHP
PHP : Hypertext Preprocessor
● A general-purpose open source scripting
language
○ homepage : http://php.net
● PHP scripts can be executed from the
command line or embedded in HTML
documents
● Syntactically similar to C/C++/Java, but
dynamically and weakly typed
A hello world PHP script
● PHP code is enclosed in the <?php and ?> tags
(the closing ?> may be omitted in files that contain only PHP)
Declaring and instantiating classes
Using the Bio2RDF PHP API to create an
RDFizer
● Basic structure of a Bio2RDFizer script:
○ Initialize script parameters - input file(s), default
dataset namespace, etc.
○ Define a Run() function that handles downloading
and iterating over input files, as well as function calls
to parse and convert input data to RDF
○ Define function(s) to convert input data to RDF using
Bio2RDF API helper functions
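The three-part structure described above can be outlined as follows. This is a Python sketch of the shape of such a script, not the Bio2RDF PHP API: the input format (tab-separated identifier and name), the function names, and the use of in-memory file objects in place of downloads are all assumptions.

```python
import io
from typing import Iterable, Iterator, TextIO

# 1. Script parameters: default dataset namespace (input files are passed to run()).
NAMESPACE = "drugbank"


def convert_record(record_id: str, name: str) -> str:
    """3. Convert one parsed record to a Turtle statement.

    Hypothetical converter; for brevity it does not escape quotes in name.
    """
    qname = f"{NAMESPACE}:{record_id}"
    return f'{qname} rdfs:label "{name} [{qname}]"@en .'


def run(input_files: Iterable[TextIO]) -> Iterator[str]:
    """2. Iterate over input files, parse each line, and convert it to RDF."""
    for handle in input_files:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            record_id, name = line.split("\t", 1)
            yield convert_record(record_id, name)


# Stand-in for a downloaded source file.
source = io.StringIO("# id\tname\nDB00650\tLeucovorin\n")
for statement in run([source]):
    print(statement)
```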
Using the Bio2RDF PHP API to create an
RDFizer
● Bio2RDF PHP API defines helper functions
that implement Bio2RDF best practices:
○ getNamespace()
○ getVoc()
○ getRes()
○ triplify($subject, $predicate, $object) //object is an rdf resource
○ triplifyString($subject, $predicate, "string")// object is a literal
○ describeIndividual($uri, $label, $type, $title, $description, $language)
○ describeClass( ... )
○ describeProperty ( ... )
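To make the difference between a resource object and a literal object concrete, here is a Python sketch of two analogous functions that emit N-Triples lines. They only mirror the names on the slide: the signatures, the expansion of qualified names against http://bio2rdf.org/, and the output format are assumptions and do not reproduce the behavior of the PHP API.

```python
from typing import Optional

BIO2RDF_BASE = "http://bio2rdf.org/"


def expand(qname: str) -> str:
    """Expand a Bio2RDF 'namespace:identifier' name to a bracketed URI.

    Only Bio2RDF namespaces are handled; external prefixes such as rdfs:
    would need a prefix map, which this sketch omits.
    """
    return f"<{BIO2RDF_BASE}{qname}>"


def triplify(subject: str, predicate: str, obj: str) -> str:
    """Statement whose object is an RDF resource (hypothetical analogue)."""
    return f"{expand(subject)} {expand(predicate)} {expand(obj)} ."


def triplify_string(subject: str, predicate: str, text: str,
                    lang: Optional[str] = None) -> str:
    """Statement whose object is a literal (hypothetical analogue)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    literal = f'"{escaped}"' + (f"@{lang}" if lang else "")
    return f"{expand(subject)} {expand(predicate)} {literal} ."


print(triplify("drugbank:DB00440",
               "drugbank_vocabulary:ddi-interactor-in",
               "drugbank_resource:DB00440_DB00650"))
print(triplify_string("drugbank:DB00650",
                      "bio2rdf_vocabulary:identifier",
                      "DB00650"))
```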
Example: The Comparative
Toxicogenomics Database
CTD Bio2RDFizer
script is available
on GitHub
Using and contributing to the
Bio2RDF project on GitHub
1. Fork the bio2rdf-scripts and php-lib
repositories on GitHub
https://help.github.com/articles/fork-a-repo
2. Write some code!
3. Commit code to your fork
4. Make a pull request to the bio2rdf-scripts
repo
