This document discusses bringing model organism databases onto the Semantic Web using SADI (Semantic Automated Discovery and Integration). SADI allows bioinformatics data and software to be integrated automatically through web services that consume and generate RDF. The document describes how SADI has been implemented for GMOD (Generic Model Organism Database) to provide services for accessing sequence feature data from model organism databases. It outlines the structure of the SADI services and their inputs and outputs, and provides instructions for setting up and registering the services.
Bringing Model Organism Databases onto the Semantic Web
1. SADI for GMOD: Bringing Model Organism Databases onto the Semantic Web
Ben Vandervalk, Luke McCarthy, Edward Kawas, Mark Wilkinson
James Hogg Research Centre, Heart + Lung Institute
University of British Columbia
http://code.google.com/p/sadi/wiki/SADIforGMOD
2. SADI for GMOD: Background
SADI (Semantic Automated Discovery and Integration)
• Standard for Web services that consume/generate RDF
• Motivation: automated integration of bioinformatics data and software
GMOD (Generic Model Organism Database)
• Toolkit for building a model organism database and website
• Collection of related open source projects: e.g. Chado, GBrowse, Pathway Tools
• Many sites use GMOD components: FlyBase, BeetleBase, dictyBase, etc.
3. SADI in a Nutshell
• To invoke a SADI service:
o HTTP POST an RDF document to the service URL
o e.g. $ curl --data-binary @input.rdf http://sadiframework.org/examples/hello
• To get service metadata:
o HTTP GET on the service URL
o returns an RDF document with the service name, description, etc.
o e.g. $ curl http://sadiframework.org/examples/hello
• The structure of the input/output data is described in OWL
o the service provider specifies one input OWL class and one output OWL class
• Strengths of SADI
o no framework-specific messaging formats or ontologies
o supports batch processing of inputs
o supports long-running services (asynchronous services)
More info: http://sadiframework.org/
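The two curl calls above can also be sketched in Python using only the standard library. This is a minimal illustration, not part of the SADI distribution: the function names are hypothetical, and the explicit Content-Type and Accept headers (RDF/XML by default) are an assumption, since the slide's curl example does not set them.

```python
import urllib.request

# Example service URL taken from the slide.
HELLO_SERVICE = "http://sadiframework.org/examples/hello"


def build_invocation(service_url, rdf_bytes, content_type="application/rdf+xml"):
    """Build (but do not send) the HTTP POST that invokes a SADI service."""
    return urllib.request.Request(
        service_url,
        data=rdf_bytes,
        # Assumption: the serialization of the payload is declared explicitly.
        headers={"Content-Type": content_type, "Accept": content_type},
        method="POST",
    )


def build_metadata_request(service_url):
    """Build the HTTP GET that retrieves the service description (RDF)."""
    return urllib.request.Request(service_url, method="GET")


def send(request, timeout=30):
    """Send a prepared request and return the response body as bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A call might look like `send(build_invocation(HELLO_SERVICE, open("input.rdf", "rb").read()))`. Building the request separately from sending it keeps the sketch easy to inspect without network access.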
4. SADI for GMOD
• SADI services for accessing sequence feature data
• implemented as Perl CGI scripts
Service Name                    | Input               | Relationship              | Output
get_feature_info                | database identifier | is about                  | feature description
get_features_overlapping_region | genomic coordinates | overlaps                  | collection of feature descriptions
get_sequence_for_region         | genomic coordinates | is represented by         | DNA, RNA, or amino acid sequence
get_child_features              | feature description | has part / derives into   | collection of feature descriptions
get_parent_feature              | feature description | is part of / derives from | collection of feature descriptions
5. SADI for GMOD: Structure of Service Input/Output RDF
Input RDF (N3), sent by HTTP POST to get_feature_info:

    @prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
    @prefix GeneID: <http://lsrn.org/GeneID:> .

    GeneID:49962
        a lsrn:GeneID_Record ;
        sio:SIO_000008 [                # p = 'has attribute'
            a lsrn:GeneID_Identifier ;
            sio:SIO_000300 "49962"      # p = 'has value'
        ] .

Output RDF (N3):

    @prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
    @prefix GeneID: <http://lsrn.org/GeneID:> .
    @prefix FlyBase: <http://flybase.org/cgi-bin/sadi.gmod/feature?id=> .
    @prefix GenBank: <http://lsrn.org/GB:> .

    # p = 'is about'
    GeneID:49962 sio:SIO_000332 FlyBase:FBgn0040037 .

    # feature
    FlyBase:FBgn0040037
        a SO:SO_0000704 ;               # o = 'gene'
        range:position [
            a range:RangedSequencePosition ;
            sio:SIO_000053              # p = 'has proper part'
                [ a range:StartPosition ; sio:SIO_000300 26994 ] ;
            sio:SIO_000053              # p = 'has proper part'
                [ a range:EndPosition ; sio:SIO_000300 32391 ] ;
            range:in_relation_to _:minus_strand_seq
        ] .

    _:minus_strand_seq
        sio:SIO_000011 [                # p = 'represents'
            a strand:MinusStrand ;
            sio:SIO_000093 GenBank:AE014135    # p = 'is proper part of'
        ] .

    # reference feature (chromosome)
    FlyBase:4                           # chromosome 4
        a SO:SO_0000105 .               # o = 'chromosome arm'
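As a rough sketch of how a client might generate the input document for get_feature_info, the following Python builds the N3 shown on the slide from one or more GeneID values. The function and constant names are hypothetical. The sio: prefix declaration is not shown on the slide; the namespace used here is that of the Semanticscience Integrated Ontology and is added as an assumption so the output is self-contained. Placing several records in one document is one way to use the batch processing mentioned earlier.

```python
# Assumption: the slide omits the sio: prefix declaration; this is the
# namespace of the Semanticscience Integrated Ontology (SIO).
PREFIXES = """\
@prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
@prefix GeneID: <http://lsrn.org/GeneID:> .
@prefix sio: <http://semanticscience.org/resource/> .
"""

# One input node per identifier, mirroring the slide's input RDF.
RECORD_TEMPLATE = """\
GeneID:{gene_id}
    a lsrn:GeneID_Record ;
    sio:SIO_000008 [                  # 'has attribute'
        a lsrn:GeneID_Identifier ;
        sio:SIO_000300 "{gene_id}"    # 'has value'
    ] .
"""


def build_input_n3(gene_ids):
    """Return an N3 document with one GeneID_Record per identifier."""
    records = []
    for gene_id in gene_ids:
        gene_id = str(gene_id)
        # Reject anything that is not a plain numeric identifier, so the
        # value cannot break out of the quoted literal.
        if not gene_id.isdigit():
            raise ValueError(f"not a numeric GeneID: {gene_id!r}")
        records.append(RECORD_TEMPLATE.format(gene_id=gene_id))
    return PREFIXES + "\n" + "\n".join(records)


if __name__ == "__main__":
    print(build_input_n3(["49962"]))
```

String templating keeps the sketch dependency-free; an RDF library would be the more robust choice for anything beyond fixed-shape inputs like this one.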
8. Acknowledgements
Team
Mark Wilkinson: Principal Investigator
Luke McCarthy: Lead Programmer, SADI & SHARE
Edward Kawas: Perl Programmer, SADI
Funding
Microsoft Research
http://sadiframework.org/