COMPREHENSIVE SELF-SERVICE
LIFE SCIENCE DATA FEDERATION
WITH SADI SEMANTIC WEB SERVICES
AND HYDRA
Alexandre Riazanov, CTO
IPSNP Computing Inc
Oslo University, Sep 23, 2015
WHO WE ARE
• IPSNP Computing Inc -- a Canadian startup,
building on and commercializing prior academic
research on SADI.
• Founded to develop an industrial-strength query
tool for SADI, to supersede a research
proof-of-concept prototype.
• Looking for customers/partners and investors.
BIOMEDICAL RESEARCHERS AND CLINICIANS USE DATA
FROM MULTIPLE SOURCES
• Online and in-house databases, spreadsheets.
• Web services, e.g., literature search, etc.
• Nomenclatures, ontologies, controlled
vocabularies.
• Web sites, scientific publications, patents, etc.
• Algorithms, e.g., BLAST, molecular structure
prediction, various text mining programs, etc.
BIG VISION: FEDERATED QUERYING OF
HETEROGENEOUS AND DISTRIBUTED DATA SOURCES
• We want to query 1000s of data sources as a
single database.
• We want more agility than data warehousing
can provide: e.g., just-in-time algorithm
execution, plug-and-play data source addition,
live data querying.
• We want to use simple and declarative queries,
not to program workflow scripts.
IS THIS SCI-FI?
WE CAN ACTUALLY DO THIS
WITH SEMANTIC WEB SERVICES
Here is how our data federation engine HYDRA works:
HOW IS THIS ALL POSSIBLE?
• Key ingredient: the SADI framework for
Semantic Web services (Semantic Automated
Discovery and Integration).
• SADI services are:
• RESTful services
• consuming and producing one format -- RDF,
• with semantic descriptions (in OWL) fully defining
their functionality.
PLAN OF THE TALK
• What are SADI services?
• Automatic service discovery and
invocation in query engines (HYDRA).
• Self-service querying vision.
• Query composition with HYDRA GUI.
• An overview of Bioinformatics and Clinical
Intelligence case studies.
Tons of screenshots!
SADI SERVICE I/O
• Input: RDF description of an input object.
• Output: another RDF graph providing more
(computed or retrieved) info about the input
object or linking it to other objects.
• Since all SADI services “talk the same
language” (RDF), they are 100% syntactically
interoperable:
– output of one SADI service can be directly
consumed by any other SADI services.
“Describe your input, and I will tell you something else about it.”
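A minimal sketch of the I/O contract described above, assuming RDF triples are modelled as plain Python tuples. Both functions, the property names (`hasKeyword`, `hasResult`, `hasPdfUrl`) and the identifiers are hypothetical stand-ins, not real SADI services or vocabularies. The point is only that the output of one service can be fed straight into the next because both speak the same triple format.

```python
from typing import Set, Tuple

# Toy model: an RDF graph is a set of (subject, predicate, object) tuples.
Triple = Tuple[str, str, str]


def document_search(graph: Set[Triple]) -> Set[Triple]:
    """Hypothetical service: attach a result document to each keyword node."""
    out: Set[Triple] = set()
    for subject, predicate, _obj in graph:
        if predicate == "hasKeyword":
            out.add((subject, "hasResult", "doc:12345"))  # made-up document ID
    return out


def pdf_lookup(graph: Set[Triple]) -> Set[Triple]:
    """Hypothetical service: attach a PDF URL to each result document."""
    out: Set[Triple] = set()
    for _subject, predicate, obj in graph:
        if predicate == "hasResult":
            doc_id = obj.split(":")[1]
            out.add((obj, "hasPdfUrl", "http://example.org/" + doc_id + ".pdf"))
    return out


query = {("q1", "hasKeyword", "haloalkane dehalogenase activity")}
step1 = document_search(query)   # consumes triples, produces triples
step2 = pdf_lookup(step1)        # consumes the previous output unchanged
merged = query | step1 | step2   # results accumulate into one graph
```

Each service only adds statements about nodes it was given, so the merged graph grows monotonically as services are chained.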
COMPLETE SEMANTIC DESCRIPTIONS
OF SERVICE FUNCTIONALITY
• SADI services carry semantic descriptions of their
I/O that completely define what the service expects
and can accept as input, and what RDF assertions the
service can output.
• Unique and extremely powerful property: it facilitates
completely automatic discovery and orchestration of services.
HYDRA QUERY ENGINE
● Given a SPARQL query, HYDRA analyses it
by using an intelligent logic-based algorithm
(proprietary, unlike SADI itself).
● HYDRA requests descriptions of potentially
useful services from available SADI service
registries.
● HYDRA processes the descriptions and
figures out which services have to be
invoked, on what data and in what order.
SPARQL is a W3C
standard semantic
query language --
much more intuitive
than SQL.
QUERY EXAMPLE
• Find documents mentioning "haloalkane dehalogenase
activity", extract information about mutations and visualise the
mutations on 3D protein structure images.
• HYDRA automatically finds and orchestrates 5 services from
our registry:
– PubMed search: keyword query ⟶ document PubMed IDs
– PDF retrieval: PubMed ID ⟶ PDF file URL
– ASCII extraction: PDF file ⟶ ASCII text
– Text mining: ASCII text ⟶ mutation info
– Visualisation: mutation & protein ⟶ 3D image (Jmol)
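The idea of deriving an invocation order from service descriptions can be illustrated with a toy planner. This is not HYDRA's algorithm (which is proprietary and logic-based); it is a simple greedy forward-chaining sketch, and the `ServiceDescription` type, the property names and the single-input simplification of the visualisation service are assumptions made for the example.

```python
from typing import List, NamedTuple, Set


class ServiceDescription(NamedTuple):
    name: str
    needs: Set[str]     # properties the input must already carry
    provides: Set[str]  # properties the service adds to its output


def plan(known: Set[str], wanted: Set[str],
         registry: List[ServiceDescription]) -> List[str]:
    """Greedy forward chaining: invoke any service whose inputs are available."""
    known = set(known)
    order: List[str] = []
    progress = True
    while not wanted <= known and progress:
        progress = False
        for svc in registry:
            if wanted <= known:
                break  # the query is answerable, stop adding services
            if (svc.name not in order and svc.needs <= known
                    and not svc.provides <= known):
                order.append(svc.name)
                known |= svc.provides
                progress = True
    if not wanted <= known:
        raise LookupError("no chain of registered services answers the query")
    return order


# The registry is deliberately listed out of order: the plan is derived
# from the descriptions alone, not from the listing order.
registry = [
    ServiceDescription("visualisation", {"mutationInfo"}, {"image3d"}),
    ServiceDescription("text_mining", {"asciiText"}, {"mutationInfo"}),
    ServiceDescription("pubmed_search", {"keyword"}, {"pubmedId"}),
    ServiceDescription("ascii_extraction", {"pdfUrl"}, {"asciiText"}),
    ServiceDescription("pdf_retrieval", {"pubmedId"}, {"pdfUrl"}),
]

order = plan({"keyword"}, {"image3d"}, registry)
print(order)
```

A greedy planner like this may also invoke services that turn out to be unnecessary for the query; a real engine would need to prune those and decide on which data each service is called.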
RESULTS
Deploying mutation impact text-mining software with the SADI Semantic Web Services framework
http://www.biomedcentral.com/qc/1471-2105/12/S4/S6
WHAT IS SO COOL ABOUT IT?
• Data federation at its best:
– independent, heterogeneous data sources (PubMed
doc search, PubMed Central for PDFs);
– not only data is integrated: ASCII extraction, text
mining and 3D visualisation are algorithms!
• Execution is completely automatic: HYDRA finds and
invokes the services without any help from the user.
MORE QUERY EXAMPLES
• Find drug products that contain active ingredient X.
• Find drugs that have been studied in clinical trials targeting
infections caused by bacteria X.
• Annotate a DNA sequence X with molecular functions of
proteins produced by the corresponding gene.
• Find patients with precondition X diagnosed with infections Y
resulting from procedure Z.
• Many, many other questions that life scientists and
clinicians ask on a daily basis.
IT’S ONLY ½ OF THE STORY
REMEMBER THE BIG VISION?
HERE IS AN EVEN BIGGER VISION:
Self-service ad hoc querying of federated data.
HYDRA IMPLEMENTS SEMANTIC QUERYING
• Users need not know how the source data
is organised or accessed.
• They just need to know the terminology of
their subject domain.
• Queries are completely declarative:
specify what you want to find, not how.
HYDRA ALSO SUPPORTS
CONCEPT HIERARCHIES AND RULES
● Some queries would be too complex if we could not
exploit generality:
o a query concerning all antibiotics requires
generalisation, otherwise all types of antibiotics would
have to be enumerated in the query.
● A much better way to do this is to import a classification of
drugs and use it in query execution.
● HYDRA facilitates such reasoning and even more
complex reasoning with rules.
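The antibiotics example can be sketched as follows, assuming a tiny hand-written drug classification. The class names, the prescription records and the helper functions are all hypothetical; a real deployment would import an existing ontology rather than a Python dictionary.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical fragment of a drug classification (child -> parent).
SUBCLASS_OF: Dict[str, str] = {
    "Penicillin": "BetaLactam",
    "Cephalosporin": "BetaLactam",
    "BetaLactam": "Antibiotic",
    "Macrolide": "Antibiotic",
    "Antibiotic": "Drug",
    "Statin": "Drug",
}

# Made-up source data: (patient, prescribed drug class).
PRESCRIPTIONS: List[Tuple[str, str]] = [
    ("patient1", "Penicillin"),
    ("patient2", "Statin"),
    ("patient3", "Macrolide"),
]


def subclasses(cls: str) -> Set[str]:
    """Return cls together with every class below it in the hierarchy."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def patients_on(cls: str) -> List[str]:
    """Answer a query about a general class without enumerating its subtypes."""
    wanted = subclasses(cls)
    return sorted(patient for patient, drug in PRESCRIPTIONS if drug in wanted)


print(patients_on("Antibiotic"))
```

The query names only `Antibiotic`; the hierarchy supplies the specific drug classes, so adding a new antibiotic type to the classification requires no change to the query.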
THERE ARE NO OBSTACLES IN PRINCIPLE
TO SELF-SERVICE QUERYING
We just need an adequate user interface
for building queries.
HYDRA QUERY TOOL = ENGINE + GUI
QUERY COMPOSITION
Queries are built from “Google-like” keyphrases:
Keyphrase: “document mentions protein “P22607””
A QUERY GRAPH IS GENERATED
FOR THE KEYPHRASE
“document mentions protein “P22607””
ADDING ANOTHER KEYPHRASE
Keyphrase: “has pubmed id”
QUERY GRAPH IS EXTENDED WITH NODES
CORRESPONDING TO THE SECOND KEYPHRASE
Keyphrase: “has pubmed id”
Keyphrase: “document mentions protein “P22607””
OPTION 2: MANUALLY ADD/DELETE CLASSES,
INCOMING AND OUTGOING PROPERTIES
MANUALLY ADDED PROPERTY
FINISHED QUERY: FIND PUBMED IDS OF DOCUMENTS MENTIONING
PROTEIN P22607 AND CO-MENTIONED PROTEINS
SERVICES IN THE REGISTRY
SPARQL GENERATION
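As a rough illustration of this step, a query graph can be serialised to SPARQL by writing each edge as a triple pattern. The sketch below assumes a made-up `ex:` vocabulary (`ex:mentions`, `ex:hasPubmedId`) and a made-up namespace URL; it does not reproduce the vocabulary or the generator used by the HYDRA GUI.

```python
from typing import List, Tuple

# One edge of the query graph: (subject, predicate, object).
Pattern = Tuple[str, str, str]


def to_sparql(select: List[str], patterns: List[Pattern], prefix: str) -> str:
    """Serialise a list of triple patterns as a SPARQL SELECT query."""
    lines = ["PREFIX ex: <" + prefix + ">",
             "SELECT " + " ".join(select),
             "WHERE {"]
    lines += ["  {} {} {} .".format(s, p, o) for s, p, o in patterns]
    lines.append("}")
    return "\n".join(lines)


# Query graph for: PubMed IDs of documents mentioning protein P22607,
# together with the other proteins mentioned in the same documents.
query_graph: List[Pattern] = [
    ("?doc", "ex:mentions", "ex:P22607"),
    ("?doc", "ex:mentions", "?other"),
    ("?doc", "ex:hasPubmedId", "?pmid"),
]

sparql = to_sparql(["?pmid", "?other"], query_graph, "http://example.org/vocab#")
print(sparql)
```

As written, `?other` can also bind to P22607 itself; a fuller generator would add a `FILTER` to exclude it.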
QUERY EXECUTION WITH THE HYDRA ENGINE
EXPORTED RESULTS IN AN EXCEL SPREADSHEET
SADI AND HYDRA QUERY TOOL
AT WORK
BIOINFORMATICS AND CHEMINFORMATICS CASE
STUDIES AND PILOTS WITH SADI AND HYDRA
• Integrating genomics text mining results with online
biomedical data and visualisation algorithms.
• Integrating programs for lipid molecule structural
analysis and classification.
• Interpreting toxicity experiment data by discovering
relevant info in online databases.
• Large-scale retrieval of toxicity information from
publications.
INTERPRETING TOXICITY EXPERIMENT DATA
• Partner: university lab studying effects of
environmental pollutants.
• Querying needs: finding relevant prior experiments,
gene annotation, protein domain annotation, etc.
• Data sources: ArrayExpress, BLAST, HMMER3,
RefSeq, Pfam, ORFPredictor, GO, UniProt, NCBI
Taxonomy -- all queried as a single DB!
SUBTASK: DNA MICROARRAY ANNOTATION
• Toxicity experiments with microarrays: which DNA sequences
are under/overexpressed after an organism’s exposure to toxin X?
• Interpretation requires knowing affected protein functions and
domains.
• HYDRA virtually implements this workflow:
RETRIEVAL OF TOXICITY DATA FROM
PUBLICATIONS
• Customer: government agency (Canada).
• Querying needs: online publication search by
organism and chemical types, text-mining for
toxicity data.
• Data sources: NCBI Taxonomy and ChEBI with
free-text search, PubMed search, electronic
libraries, journal Web sites, Google Scholar,
specialised text-mining algorithm, text utilities.
Apparent value: some queries save many man-weeks of work of a postdoc.
CLASSIFYING NEW LIPID MOLECULES
• One of the early experiments with SADI.
• A group at Carleton U. had a program for
identifying functional groups in a molecule
structure.
• A group at the U. of New Brunswick had a classifier
estimating lipid classes based on
presence/absence of functional groups.
• Publishing the prototypes as SADI services
allowed us to integrate them with each other and
relevant external resources.
CLINICAL IT CASE STUDIES AND PILOTS
WITH SADI AND HYDRA
• Ad hoc querying of clinical data for Hospital
Acquired Infections surveillance and research
(with UNB, McGill SoM and Ottawa H.)
• Ongoing pilot with a US hospital.
• Looking for pilot opportunities for Clinical Trial
Cohort selection:
• trial eligibility criteria can be implemented as queries
over heterogeneous and distributed clinical data;
• benefits: cost reduction and timely alerts.
THANK YOU!
Further materials/services are available on request:
• Live and recorded demos.
• Publications on previous (academic) case studies.
• Training/consulting.
• http://ipsnp.com/ (Canada) and http://ipsnp.co/ (UK)
