SADI, SHARE and the Scientific Method: The Quest for the Holy Grail
The Problem
The Holy Grail (this slide created circa 2002): Align the promoters of all serine/threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M | A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
Two novel technologies developed in our lab are getting us very close to the Holy Grail!
Holy Grail Demo #1
Imagine there is a “virtual database” containing all of the data from all of the databases, together with the output of every conceivable analysis
How do we query that database?
A Brief Digression…
“Database”
?
Boxes became ovals… Straight lines became curvy lines… …and you want us to give you a grant for THAT??
Relational Database “Graph”
Gene Table: Gene ID | Tissue ID | Type ID
Protein Table: Protein Index | Protein Name | Regulates ID
Graph: isRepressorOf links http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487
“Foreign keys” are used to link tables in a database
Links in Graphs consist of statements called “TRIPLES”
Both Data Sources are on the Same Machine
Graph Data Sources (may be) on Independent Machines on the Web
“Meaning” of the connection between data-points is understood only by the database administrator (Protein regulates Gene)
“Meaning” of the connection in a Graph is explicitly labeled (and machine-readable!): isRepressorOf
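As a rough illustration of the contrast above, the sketch below models one labelled link as a (subject, predicate, object) tuple in Python. The direction of the isRepressorOf statement (protein record as subject, gene record as object) is an assumption made for the example; the slide shows only the label and the two URLs. The `Triple` type and `objects_of` helper are hypothetical names, not part of any SADI or RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement in a graph: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# Assumption: the protein record is the subject and the gene record the object.
graph = [
    Triple("http://pdb.org/114487", "isRepressorOf", "http://ncbi.nlm/NR/NR_14487"),
]


def objects_of(triples, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


# The label travels with the link, so a program can ask for it by name.
repressed = objects_of(graph, "http://pdb.org/114487", "isRepressorOf")
print(repressed)
```

Unlike a foreign key, the predicate is part of the data itself, so the two URLs can live on different machines and the link still carries its own meaning.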
Connect all of the graphs in the world to one another. And what do you get?
Mark Butler (2003) Is the semantic web hype? Hewlett Packard laboratories presentation at MMU, 2003-03-12
The lavender portion represents biology – currently ~40,000,000,000 Triples (we and our collaborators will be doubling that number in the next 12 months)
How do you find information on this “Semantic Web” ??
SPARQL The query language used to discover and extract information represented in Graphs
SPARQL: Unfortunately, YOU have to know which Web resources contain which Triples (HARD!). Even if you do know this, SPARQL has significant limitations when attempting to query over disparate Graphs (SLOW AND CUMBERSOME)
SPARQL: If the data doesn’t exist in any Graph at all…
Basically… A novel way of making Triples available on the Semantic Web, using a technology called Web Services (“Services” for short)
Basically… We invented SADI to overcome some/all of these problems …but I won’t bore you with the technical details…
Detour Ends: Please resume speed
Holy Grail Demo #1: Imagine there is a “virtual database” containing all of the data from all of the databases, together with the output of every conceivable analysis. How do we query that database?
SHARE: Semantic Health And Research Environment (SPARQL enhanced by SADI)
A Novel SPARQL Query Engine: Overcomes some of the limitations of traditional SPARQL query-handlers …and more… MUCH more!!
What pathways does UniProt protein P47989 belong to?
PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
    uniprot:P47989 pred:isEncodedBy ?gene .
    ?gene ont:isParticipantIn ?pathway .
}
Note that there is no “FROM” clause in this query… I have neglected to tell the system where to look for the answer; I am simply asking my question
Now stick that query into SHARE
Recap: what we just saw
A standard SPARQL query was entered into SHARE, a SADI-aware query engine.
The query was interpreted to extract the individual data/relationships being requested (and any component/sub-properties, as we shall see later!).
The “triple-patterns” required to answer the query were passed to SADI for Web Service discovery.
Services capable of generating those triple-patterns were automatically executed, the triples were stored, and the query was resolved.
We posed, and answered, a ~complex database query WITHOUT A DATABASE (in fact, the data didn’t even have to exist...)
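The recap above can be sketched as a toy loop in Python. Everything here is hypothetical: the `SERVICES` registry stands in for SADI service discovery, the two service functions return made-up placeholder identifiers rather than real answers for P47989, and `resolve` handles only a simple chain of predicates, which is far less than a real SPARQL engine does.

```python
# Toy stand-ins for Web Services: each one generates triples on demand.
# The lookup tables are invented placeholders, not real service output.
def encoded_by_service(subject):
    fake = {"uniprot:P47989": ["gene:EXAMPLE_GENE"]}
    return [(subject, "pred:isEncodedBy", g) for g in fake.get(subject, [])]


def participant_in_service(subject):
    fake = {"gene:EXAMPLE_GENE": ["pathway:EXAMPLE_PATHWAY"]}
    return [(subject, "ont:isParticipantIn", p) for p in fake.get(subject, [])]


# Hypothetical registry: which service can generate which predicate.
SERVICES = {
    "pred:isEncodedBy": encoded_by_service,
    "ont:isParticipantIn": participant_in_service,
}


def resolve(start, predicates):
    """Follow a chain of predicates from `start`, calling one service per hop."""
    store = []            # triples generated so far
    frontier = [start]    # nodes bound by the previous triple-pattern
    for predicate in predicates:
        service = SERVICES[predicate]          # "discovery" step
        next_frontier = []
        for node in frontier:
            triples = service(node)            # "execution" step
            store.extend(triples)              # "storage" step
            next_frontier.extend(obj for _, _, obj in triples)
        frontier = next_frontier
    return store, frontier


store, pathways = resolve("uniprot:P47989", ["pred:isEncodedBy", "ont:isParticipantIn"])
print(pathways)
```

No triple exists before the query is asked; the store is filled only by the services that the query's own predicates select.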
Holy Grail Demo #1: Align the promoters of all serine/threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M | A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
Holy Grail Demo #2
Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
    ?patient rdf:type patient:LikelyRejecter .
    ?patient l:latestBUN ?bun .
    ?patient l:latestCreatinine ?creat .
}
Likely Rejecter: A patient who has creatinine levels that are increasing over time (Wilkinson MD)
Likely Rejecter: …but there is no “likely rejecter” column or table in our database… only blood chemistry measurements at various time-points
?
The definition of a LikelyRejecter is encoded in a machine-readable document written in the OWL language (an “Ontology”): “the regression line over creatinine measurements should have an increasing slope”
The machine continues to burrow down through the definition and discovers that regression lines have things like slopes and intercepts, etc…
Then…  Two magical events occur…
The machine figures out by itself the need to do a Linear Regression analysis in order to answer your question
The machine figures out by itself how and where that analysis can be done, and does it automatically!
http://www.impactlab.net/2009/03/22/improve-your-brain-power/
The SHARE system utilizes SADI to discover analytical services on the Web that do linear regression analysis
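To make the LikelyRejecter definition concrete, here is a minimal Python sketch of the kind of analysis such a service would perform: an ordinary least-squares slope over (time, creatinine) pairs, with "increasing slope" read as slope > 0. The function names and the sample numbers are invented for illustration; the deck does not show the actual service code or any threshold beyond "increasing".

```python
def regression_slope(points):
    """Ordinary least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def is_likely_rejecter(creatinine_series):
    """Assumed reading of the definition: the regression slope is positive."""
    return regression_slope(creatinine_series) > 0


# Made-up (time-point, creatinine) series, for illustration only.
rising = [(0, 1.0), (1, 1.2), (2, 1.5), (3, 1.9)]
stable = [(0, 1.1), (1, 1.0), (2, 1.1), (3, 1.0)]

print(is_likely_rejecter(rising), is_likely_rejecter(stable))
```

In the system described here, this computation is not hard-coded into the query; it is discovered because the ontology says a LikelyRejecter is defined in terms of a regression slope.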
VOILA!
How do we do that?!? We let the data describe itself! This is different from most of the bioinformatics world, where the person giving you the data also tells you how to interpret it
Data exhibits “late binding”
Late binding: the “purpose and meaning” of the data is not determined until the moment it is required
Benefit of late binding Data is amenable to constant re-interpretation
Example?
Blood Creatinine measurements were not dictated to be (only) Blood Creatinine measurements!
The data had the ‘qualities/properties’ that allowed the machine to infer that they were Blood Creatinine measurements.
But the data also had the ‘qualities/properties’ that allowed them to be interpreted as X/Y coordinate data by another Service.
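A small Python sketch of this late-binding idea follows. The records carry only their own properties, and each consumer decides at the moment of use whether those properties are enough for its purpose. The property names (`analyte`, `specimen`, `time`, `value`) and the values are hypothetical, chosen only for the example.

```python
# Made-up records: nothing here declares "this is regression input".
measurements = [
    {"analyte": "creatinine", "specimen": "blood", "time": 0, "value": 1.0},
    {"analyte": "creatinine", "specimen": "blood", "time": 1, "value": 1.3},
    {"analyte": "urea nitrogen", "specimen": "blood", "time": 1, "value": 18.0},
]


def as_blood_creatinine(records):
    """One consumer: keeps records whose properties mark them as blood creatinine."""
    return [
        r for r in records
        if r.get("analyte") == "creatinine" and r.get("specimen") == "blood"
    ]


def as_xy_points(records):
    """Another consumer: ignores clinical meaning, needs only two numbers."""
    return [(r["time"], r["value"]) for r in records if "time" in r and "value" in r]


# The same data, interpreted twice, with the meaning bound at use time.
creatinine = as_blood_creatinine(measurements)
points = as_xy_points(creatinine)
print(points)
```

Neither interpretation is stored with the data, which is why the same measurements can be re-interpreted later by a service nobody anticipated.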
http://www.flickr.com/people/faernworks/
Holy Grail Demo #2: Align the promoters of all serine/threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M | A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
The Holy Grail may not yet be in hand, but we can at least see it from here! So… now what?
Mark’s Manifesto What is my next “Holy Grail”?
Science Support for the in silico Scientific Method
The Scientific Method
Discourse: What do you believe? What do I believe?
Disagreement: You’re wrong! And I’m gonna prove it!
Clarity: This is the experiment I am going to do
Reproducibility: This is how I did it (“provenance”)
Clarity: This is my new hypothesis
Workflows (e.g. myExperiment)
Another Brief Digression…
“Facebook” for Scientists http://myexperiment.org
An exciting evolution in the way Researchers express and share their in silico “Materials and Methods” Through things called ‘Workflows’
Workflows are explicit representations of the method by which an analysis was done and which resources are used to do it
Workflows can be very simple…    “Blast this sequence”
Or not... This workflow takes in a CEL file and a normalisation method then returns a series of images/graphs which represent the same output obtained using the MADAT software package (MicroArray Data Analysis Tool). Also returned by this workflow is a list of the top differentially expressed genes (size dependent on the number specified as input - geneNumber), which are then used to find the candidate pathways which may be influencing the observed changes in the microarray data.
Why bother?
Taverna: A workbench for designing and executing Scientific Workflows
Load-up your data and press “play”! …Then go home for the weekend!  You are just one click away from your M.Sc.!!
By the by… The SHARE application automatically creates a Workflow and then automatically runs it. This is where the data comes from to answer the queries… Workflows are a Good Thing™
Detour Ends: Please resume speed
WORKFLOWS
At the moment the Semantic Web in Healthcare and Life Sciences addresses these issues by attempting to create “consensus”
Large, centralized ontologies  (e.g. the Gene Ontology) that claim to represent community agreement about “biological reality”
…is that Science?
To restore the “traditions of Science” to in silico science The Semantic Web needs to encourage/facilitate personal opinion and debate
What has this got to do with SADI and SHARE?
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
    ?patient rdf:type patient:LikelyRejecter .
    ?patient l:latestBUN ?bun .
    ?patient l:latestCreatinine ?creat .
}
Likely Rejecter
I created a small ontology describing my definition of a Likely Rejecter
… it was MY ontology!
I can re-use it
I can modify it as I change my world-view
I can publish it for others to use
Others can modify it and/or compare it to THEIR world-view
Sharing my ontology gives opportunities for “micro-attribution”: “Credit” to me is automatic when someone uses my ontology in their ontology/query
Using SADI and SHARE, my personal world-view is explicitly expressed and can be dynamically evaluated against global data and knowledge
http://www.dailymail.co.uk/femail/article-488234/Friends-dignity-self-respect---weight-wasnt-I-lost-slimming-club.html
…but there’s more…
“Likely Rejecter”
I made that up!  It came out of my head!
What’s another word for a world-view that you make-up? Hypothesis
The “Likely Rejecter” OWL Class is an explicitly-expressed hypothesis; Members of that class may or may not exist!
Ontologically-expressed Hypotheses drive the discovery, assembly, and analysis of data capable of evaluating their validity
[Diagram labels: Hypothesis; Ischemia; Hypertension; Blood Pressure; SADI + SHARE; Analytical Algorithm; Database 1; Database 2]
Join us! SADI and CardioSHARE are Open-Source projects Come join us – we’re having a lot of fun!! http://sadiframework.org
Credits
Benjamin VanderValk (SHARE & SADI)
Luke McCarthy (SADI, SHARE, Taverna, CardioSHARE)
Soroush Samadian (CardioSHARE)
David Withers (Taverna)
Edward Kawas (SADI Service auto-generator)
U of New Brunswick: Dr. Chris Baker, Alexandre Riazanov
Carleton University: Dr. Michel Dumontier, Marc-Alexandre Nolin, Leonid Chepelev, Steve Etlinger, Nichaella Kieth, Jose Cruz
Microsoft Research
Credits
Benjamin VanderValk (SADI & CardioSHARE)
Luke McCarthy (SADI & CardioSHARE)
Soroush Samadian (CardioSHARE)
IO Informatics (Knowledge Explorer API)
Microsoft Research
Fin
This presentation is available on SlideShare: keywords ‘wilkinson’ ‘iCAPTURE’ ‘HLI’

The Scientific Method on the Semantic Web

  • 1. SADI, SHARE and the Scientific Method The Quest for the Holy Grail
  • 4. The Holy Grail: (this slide created circa 2002) Align the promoters of all serine/threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M | A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
  • 5. Two novel technologies developed in our lab are getting us very close to the Holy Grail!
  • 7. Imagine there is a “virtual database” containing all of the data from all of the databases, together with the output of every conceivable analysis
  • 8. How do we query that database?
  • 14. ?
  • 15. Boxes became ovals… Straight lines became curvy lines…
  • 16. Boxes became ovals… Straight lines became curvy lines… …and you want us to give you a grant for THAT??
  • 18. Gene Table ----------------------- Gene ID Tissue ID Type ID Protein Table ----------------------- Protein Index Protein Name Regulates ID isRepressorOf http://ncbi.nlm/NR/NR_14487 http://pdb.org/114487
  • 19. “Foreign keys” are used to link tables in a database Gene Table ----------------------- Gene ID Tissue ID Type ID Protein Table ----------------------- Protein Index Protein Name Regulates ID isRepressorOf http://ncbi.nlm/NR/NR_14487 http://pdb.org/114487
  • 20. Gene Table ----------------------- Gene ID Tissue ID Type ID Protein Table ----------------------- Protein Index Protein Name Regulates ID Links in Graphs consist of statements called “TRIPLES” isRepressorOf http://ncbi.nlm/NR/NR_14487 http://pdb.org/114487
  • 21. Both Data Sources are on the Same Machine Gene Table ----------------------- Gene ID Tissue ID Type ID Protein Table ----------------------- Protein Index Protein Name Regulates ID isRepressorOf http://ncbi.nlm/NR/NR_14487 http://pdb.org/114487
  • 22. Gene Table ----------------------- Gene ID Tissue ID Type ID Protein Table ----------------------- Protein Index Protein Name Regulates ID Graph Data Sources (may be) on Independent Machines on the Web isRepressorOf http://ncbi.nlm/NR/NR_14487 http://pdb.org/114487
  • 23. “Meaning” of the connection between data-points is understood only by the database administrator Protein regulates Gene Gene Table ----------------------- Gene ID Tissue ID Type ID Protein Table ----------------------- Protein Index Protein Name Regulates ID isRepressorOf http://ncbi.nlm/NR/NR_14487 http://pdb.org/114487
  • 24. Gene Table ----------------------- Gene ID Tissue ID Type ID Protein Table ----------------------- Protein Index Protein Name Regulates ID “Meaning” of the connection in a Graph is explicitly labeled (and machine-readable!) isRepressorOf http://ncbi.nlm/NR/NR_14487 http://pdb.org/114487
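The contrast drawn on slides 18 to 24 can be sketched in a few lines of Python. This is only an illustration: the `Triple` type and the `objects_of` helper are names invented here, the two URIs and the `isRepressorOf` label are the ones shown on the slides, and the direction of the link (which URI is the subject) is an assumption.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One graph statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# The link from the slides, written as a single triple. Which URI is the
# subject and which is the object is an assumption made for this sketch.
graph = [
    Triple("http://pdb.org/114487", "isRepressorOf", "http://ncbi.nlm/NR/NR_14487"),
]


def objects_of(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


# The meaning of the link travels with the data as the predicate label,
# so no database administrator is needed to explain a foreign key.
repressed = objects_of(graph, "http://pdb.org/114487", "isRepressorOf")
print(repressed)
```

Because the subject and object are plain URIs, the two ends of a triple do not have to live on the same machine, which is the point made on slide 22.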
  • 25. Connect all of the graphs in the world to one another And what do you get?
  • 26. Mark Butler (2003) Is the semantic web hype? Hewlett Packard laboratories presentation at MMU, 2003-03-12
  • 27. The lavender portion represents biology – currently ~40,000,000,000 Triples (we and our collaborators will be doubling that number in the next 12 months)
  • 28. How do you find information on this “Semantic Web” ??
  • 29. SPARQL The query language used to discover and extract information represented in Graphs
  • 30. SPARQL Unfortunately, YOU have to know which Web resources contain which Triples (HARD!) Even if you do know this, SPARQL has significant limitations when attempting to query over disparate Graphs (SLOW AND CUMBERSOME)
  • 31. SPARQL If the data doesn’t exist in any Graph at all…
  • 33. Basically… A novel way of making Triples available on the Semantic Web, using a technology called Web Services “Services” for short
  • 34. Basically… We invented SADI to overcome some/all of these problems …but I won’t bore you with the technical details…
  • 36. Holy Grail Demo #1 Imagine there is a “virtual database” containing all of the data from all of the databases, together with the output of every conceivable analysis How do we query that database?
  • 37. SHARE: Semantic Health And Research Environment SPARQL enhanced by SADI
  • 38. A Novel SPARQL Query Engine Overcomes some of the limitations of traditional SPARQL query-handlers
  • 39. A Novel SPARQL Query Engine Overcomes some of the limitations of traditional SPARQL query-handlers …and more…
  • 40. A Novel SPARQL Query Engine Overcomes some of the limitations of traditional SPARQL query-handlers …and more… MUCH more!!
  • 41. What pathways does UniProt protein P47989 belong to? PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> PREFIX ont: <http://ontology.dumontierlab.com/> PREFIX uniprot: <http://lsrn.org/UniProt:> SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . }
  • 43. What pathways does UniProt protein P47989 belong to? PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> PREFIX ont: <http://ontology.dumontierlab.com/> PREFIX uniprot: <http://lsrn.org/UniProt:> SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . } Note that there is no “From” clause… I have neglected to tell the system where to look for the answer, I am simply asking my question
  • 44. Now stick that query into SHARE
  • 47. Recap: what we just saw A standard SPARQL query was entered into SHARE, a SADI-aware query engine
  • 48. Recap: what we just saw The query was interpreted to extract the individual data/relationships being requested (and any component/sub-properties, as we shall see later!)
  • 49. Recap: what we just saw The “triple-patterns” required to answer the query are passed to SADI for Web Service discovery
  • 50. Recap: what we just saw Services capable of generating those triple-patterns are automatically executed, the triples are stored, and the query is resolved.
  • 51. Recap: what we just saw We posed, and answered, a ~complex database query WITHOUT A DATABASE (in fact, the data didn’t even have to exist...)
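The recap on slides 47 to 51 can be sketched as a toy resolver. This is not how SHARE or SADI are implemented; it is a minimal sketch under loud assumptions: the `SERVICES` registry, the two service functions, and `resolve` are hypothetical names, "discovery" is reduced to a dictionary lookup, and the returned gene and pathway identifiers are placeholders rather than real data.

```python
# Hypothetical services: each one generates triples on demand for one predicate.
# The returned identifiers are placeholders, not real gene or pathway records.
def encoded_by_service(protein):
    return [(protein, "pred:isEncodedBy", "gene:PLACEHOLDER")]


def participant_service(gene):
    return [(gene, "ont:isParticipantIn", "pathway:PLACEHOLDER")]


# Hypothetical registry mapping a predicate to a service that can produce it.
SERVICES = {
    "pred:isEncodedBy": encoded_by_service,
    "ont:isParticipantIn": participant_service,
}


def resolve(start, predicates):
    """Follow a chain of predicates from `start`, calling one service per step."""
    store = []          # triples generated so far (no database existed beforehand)
    frontier = [start]  # nodes whose outgoing links are still needed
    for predicate in predicates:
        service = SERVICES[predicate]  # stands in for Web Service discovery
        next_frontier = []
        for node in frontier:
            triples = service(node)
            store.extend(triples)
            next_frontier.extend(obj for _, _, obj in triples)
        frontier = next_frontier
    return store, frontier


triples, answers = resolve(
    "uniprot:P47989", ["pred:isEncodedBy", "ont:isParticipantIn"]
)
print(answers)
```

The two predicates passed to `resolve` mirror the two triple-patterns in the query from slide 43; the triples only come into existence when a service is called, which is the sense in which the query is answered without a database.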
  • 52. Holy Grail Demo #1 Align the promoters of all serine/threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M | A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
  • 54. Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#> PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creat FROM <http://sadiframework.org/ontologies/patients.rdf> WHERE { ?patient rdf:type patient:LikelyRejecter . ?patient l:latestBUN ?bun . ?patient l:latestCreatinine ?creat . }
  • 55. Likely Rejecter: A patient who has creatinine levels that are increasing over time -- Wilkinson MD
  • 56. Likely Rejecter: …but there is no “likely rejecter” column or table in our database… only blood chemistry measurements at various time-points
  • 57. ?
  • 58. The definition of a LikelyRejecter is encoded in a machine-readable document written in the OWL language (“Ontology”) “the regression line over creatinine measurements should have an increasing slope”
  • 59. The machine continues to burrow down through the definition and discovers that regression lines have things like slopes and intercepts, etc…
  • 60. Then… Two magical events occur…
  • 61. The machine figures out by itself the need to do a Linear Regression analysis in order to answer your question
  • 62. The machine figures out by itself how and where that analysis can be done and does it automatically!
  • 64. The SHARE system utilizes SADI to discover analytical services on the Web that do linear regression analysis
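The definition quoted on slide 58 ("the regression line over creatinine measurements should have an increasing slope") boils down to a small calculation. The sketch below is an assumption-laden illustration, not the service SHARE actually discovers: `regression_slope` and `is_likely_rejecter` are made-up names, the ordinary least-squares formula is a standard choice rather than one stated in the slides, and the measurement values are invented for the example.

```python
def regression_slope(points):
    """Ordinary least-squares slope of y against x for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def is_likely_rejecter(measurements):
    """Apply the slide-58 definition: creatinine trend has an increasing slope."""
    return regression_slope(measurements) > 0


# Invented (time-point, creatinine) pairs, for illustration only.
rising = [(0, 1.0), (1, 1.2), (2, 1.5), (3, 1.9)]
falling = [(0, 1.9), (1, 1.5), (2, 1.2), (3, 1.0)]

print(is_likely_rejecter(rising), is_likely_rejecter(falling))
```

Nothing in the stored measurements says "likely rejecter"; membership in the class is computed from them at query time.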
  • 66. How do we do that?!? We let the data describe itself! This is different from most of the bioinformatics world, where the person giving you the data also tells you how to interpret it
  • 67. Data exhibits “late binding”
  • 69. Late binding: “purpose and meaning” of the data is not determined until the moment it is required
  • 70. Benefit of late binding Data is amenable to constant re-interpretation
  • 71. Example? Blood Creatinine measurements were not dictated to be (only) Blood Creatinine measurements!
  • 72. Example? The data had the ‘qualities/properties’ that allowed the machine to infer that they were Blood Creatinine measurements
  • 73. Example? But the data also had the ‘qualities/properties’ that allowed them to be interpreted as X/Y coordinate data by another Service
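Late binding as described on slides 69 to 73 can be illustrated with a record that carries only properties, leaving each consumer to decide what it is. This is a loose sketch: the field names and both interpreter functions are hypothetical, and the values are invented.

```python
# An invented record that describes itself only through its properties.
record = {
    "value": 1.4,
    "unit": "mg/dL",
    "analyte": "creatinine",
    "days_after_transplant": 10,
}


def as_creatinine_measurement(rec):
    """Interpret the record as a blood creatinine measurement, if it qualifies."""
    if rec.get("analyte") == "creatinine" and "value" in rec:
        return {"creatinine": rec["value"], "unit": rec.get("unit")}
    return None


def as_xy_point(rec):
    """Interpret the same record as an X/Y coordinate, if it qualifies."""
    if "days_after_transplant" in rec and "value" in rec:
        return (rec["days_after_transplant"], rec["value"])
    return None


# Neither interpretation was fixed when the record was created;
# each is chosen only at the moment a consumer needs it.
print(as_creatinine_measurement(record))
print(as_xy_point(record))
```

The second interpretation is what lets a generic regression service consume clinical data that was never labeled as coordinate data.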
  • 75. Holy Grail Demo #2 Align the promoters of all serine/threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M | A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
  • 76. The Holy Grail may not yet be in hand, but we can at least see it from here! So… now what?
  • 77. Mark’s Manifesto What is my next “Holy Grail”?
  • 78. Science Support for the in silico Scientific Method
  • 80. The Scientific Method Discourse: What do you believe? What do I believe? Disagreement: You’re wrong! And I’m gonna prove it! Clarity: This is the experiment I am going to do Reproducibility: This is how I did it (“provenance”) Clarity: This is my new hypothesis
  • 81. The Scientific Method Discourse: What do you believe? What do I believe? Disagreement: You’re wrong! And I’m gonna prove it! Clarity: This is the experiment I am going to do Reproducibility: This is how I did it (“provenance”) Clarity: This is my new hypothesis Workflows (e.g. myExperiment)
  • 83. “Facebook” for Scientists http://myexperiment.org
  • 84. An exciting evolution in the way Researchers express and share their in silico “Materials and Methods” Through things called ‘Workflows’
  • 86. Workflows are explicit representations of the method by which an analysis was done and which resources are used to do it
  • 87. Workflows can be very simple… “Blast this sequence”
  • 88. Or not... This workflow takes in a CEL file and a normalisation method, then returns a series of images/graphs which represent the same output obtained using the MADAT software package (MicroArray Data Analysis Tool). Also returned by this workflow is a list of the top differentially expressed genes (size dependent on the number specified as input - geneNumber), which are then used to find the candidate pathways which may be influencing the observed changes in the microarray data.
  • 90. Taverna A workbench for designing and executing Scientific Workflows
  • 92. Load-up your data and press “play”! …Then go home for the weekend! You are just one click away from your M.Sc.!!
  • 93. By the by… The SHARE application automatically creates a Workflow and then automatically runs it. This is where the data comes from to answer the queries… Workflows are a Good Thing™
  • 97. At the moment the Semantic Web in Healthcare and Life Sciences addresses these issues by attempting to create “consensus”
  • 98. Large, centralized ontologies (e.g. the Gene Ontology) that claim to represent community agreement about “biological reality”
  • 104. To restore the “traditions of Science” to in silico science The Semantic Web needs to encourage/facilitate personal opinion and debate
  • 105. What has this got to do with SADI and SHARE?
  • 106. PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#> PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creat FROM <http://sadiframework.org/ontologies/patients.rdf> WHERE { ?patient rdf:type patient:LikelyRejecter . ?patient l:latestBUN ?bun . ?patient l:latestCreatinine ?creat . }
  • 108. I created a small ontology describing my definition of a Likely Rejecter
  • 109. … it was MY ontology!
  • 111. I can modify it as I change my world-view
  • 112. I can publish it for others to use
  • 113. Others can modify it and/or compare it to THEIR world-view
  • 114. Sharing my ontology gives opportunities for “micro-attribution”. “Credit” to me is automatic when someone uses my ontology in their ontology/query
  • 115. Using SADI and SHARE, my personal world-view is explicitly expressed and can be dynamically evaluated against global data and knowledge
  • 119. I made that up! It came out of my head!
  • 120. What’s another word for a world-view that you make up? Hypothesis
  • 121. The “Likely Rejecter” OWL Class is an explicitly-expressed hypothesis; Members of that class may or may not exist!
  • 124. Ontologically-expressed Hypotheses drive the discovery, assembly, and analysis of data capable of evaluating their validity Hypothesis Ischemia SADI + SHARE Hypertension Blood Pressure Analytical Algorithm Database 1 Database 2
  • 125. Join us! SADI and CardioSHARE are Open-Source projects Come join us – we’re having a lot of fun!! http://sadiframework.org
  • 126. Credits: Benjamin VanderValk (SHARE & SADI), Luke McCarthy (SADI, SHARE, Taverna, CardioSHARE), Soroush Samadian (CardioSHARE), David Withers (Taverna), Edward Kawas (SADI Service auto-generator)
  • 127. U of New Brunswick: Dr. Chris Baker, Alexandre Riazanov. Carleton University: Dr. Michel Dumontier, Marc-Alexandre Nolin, Leonid Chepelev, Steve Etlinger, Nichaella Kieth, Jose Cruz
  • 129. Credits: Benjamin VanderValk (SADI & CardioSHARE), Luke McCarthy (SADI & CardioSHARE), Soroush Samadian (CardioSHARE), IO Informatics (Knowledge Explorer API), Microsoft Research. Fin. This presentation is available on SlideShare: keywords ‘wilkinson’ ‘iCAPTURE’ ‘HLI’