SADI, SHARE and the Scientific Method: The Quest for the Holy Grail
The Problem
The Holy Grail (this slide created circa 2002): Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
Two novel technologies developed in our lab are getting us very close to the Holy Grail!
Holy Grail Demo #1
Imagine there is a “virtual database” containing all of the data from all of the databases, together with the output of every conceivable analysis
How do we query that database?
A Brief Digression…
“Database”
?
Boxes became ovals… Straight lines became curvy lines… …and you want us to give you a grant for THAT??
Relational Database “Graph”
Gene Table
-----------------------
Gene ID
Tissue ID
Type ID

Protein Table
-----------------------
Protein Index
Protein Name
Regulates ID

Graph
-----------------------
isRepressorOf
http://ncbi.nlm/NR/NR_14487
http://pdb.org/114487

“Foreign keys” are used to link tables in a database
Links in Graphs consist of statements called “TRIPLES”
Both Data Sources are on the Same Machine
Graph Data Sources (may be) on Independent Machines on the Web
“Meaning” of the connection between data-points is understood only by the database administrator: Protein regulates Gene
“Meaning” of the connection in a Graph is explicitly labeled (and machine-readable!)
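A minimal sketch, in Python, of what a triple looks like and why graphs on independent machines can be joined. The two URIs and the isRepressorOf label come from the slide. The subject/object order (protein first, gene second), the second graph, its isExpressedIn predicate and the example.org tissue URI are assumptions made up for illustration.

```python
# A triple is a (subject, predicate, object) statement.
# URIs are taken from the slide; the subject/object order is an assumption.
PROTEIN = "http://pdb.org/114487"
GENE = "http://ncbi.nlm/NR/NR_14487"

# Two graphs that could live on independent machines.
protein_graph = {(PROTEIN, "isRepressorOf", GENE)}
# Hypothetical second source: predicate and tissue URI are invented here.
gene_graph = {(GENE, "isExpressedIn", "http://example.org/tissue/tunica_media")}


def merge(*graphs):
    """Merging graphs is a set union; shared URIs join them automatically."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


web = merge(protein_graph, gene_graph)
repressed = objects(web, PROTEIN, "isRepressorOf")
# Follow the labeled link across sources: where are the repressed genes expressed?
tissues = {t for g in repressed for t in objects(web, g, "isExpressedIn")}
```

No foreign key had to be agreed in advance: the gene URI appearing in both graphs is enough to connect them, and the predicate name carries the meaning of the link.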
Connect all of the graphs in the world to one another. And what do you get?
Mark Butler (2003) Is the semantic web hype? Hewlett Packard laboratories presentation at MMU, 2003-03-12
The lavender portion represents biology – currently ~40,000,000,000 Triples (we and our collaborators will be doubling that number in the next 12 months)
How do you find information on this “Semantic Web” ??
SPARQL The query language used to discover and extract information represented in Graphs
SPARQL Unfortunately, YOU have to know which Web resources contain which Triples (HARD!) Even if you do know this, SPARQL has significant limitations when attempting to query over disparate Graphs (SLOW AND CUMBERSOME)
SPARQL If the data doesn’t exist in any Graph at all…
Basically… A novel way of making Triples available on the Semantic Web, using a technology called Web Services “Services” for short
Basically… We invented SADI to overcome some/all of these problems …but I won’t bore you with the technical details…
Detour Ends. Please resume speed
Holy Grail Demo #1 Imagine there is a “virtual database” containing all of the data from all of the databases, together with the output of every conceivable analysis. How do we query that database?
SHARE: Semantic Health And Research Environment. SPARQL enhanced by SADI
A Novel SPARQL Query Engine. Overcomes some of the limitations of traditional SPARQL query-handlers …and more… MUCH more!!
What pathways does UniProt protein P47989 belong to?

PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
    uniprot:P47989 pred:isEncodedBy ?gene .
    ?gene ont:isParticipantIn ?pathway .
}

Note that there is no “FROM” clause… I have neglected to tell the system where to look for the answer; I am simply asking my question
Now stick that query into SHARE
Recap: what we just saw
A standard SPARQL query was entered into SHARE, a SADI-aware query engine
The query was interpreted to extract the individual data/relationships being requested (and any component/sub-properties, as we shall see later!)
The “triple-patterns” required to answer the query are passed to SADI for Web Service discovery
Services capable of generating those triple-patterns are automatically executed, the triples are stored, and the query is resolved.
We posed, and answered, a ~complex database query WITHOUT A DATABASE (in fact, the data didn’t even have to exist...)
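The recap can be mimicked with a toy Python sketch: look up a service by the predicate it can generate, call it, store the returned triples, and chain the results. This is not the SADI or SHARE API. The registry, the two service functions and the gene and pathway identifiers they return are all hypothetical placeholders; only the protein identifier and the two predicate names come from the query on the earlier slide.

```python
# Toy illustration only: this is NOT the SADI or SHARE API.
# Service names and the data they return are hypothetical placeholders.

def encoded_by_service(subject):
    # Pretend Web Service that attaches isEncodedBy triples to a protein.
    return {(subject, "isEncodedBy", "gene:EXAMPLE1")}


def pathway_service(subject):
    # Pretend Web Service that attaches isParticipantIn triples to a gene.
    return {(subject, "isParticipantIn", "pathway:EXAMPLE_A")}


# Registry: which service can generate triples with which predicate.
REGISTRY = {
    "isEncodedBy": encoded_by_service,
    "isParticipantIn": pathway_service,
}


def resolve(start, predicates):
    """Resolve a chain of triple patterns, calling services on demand.

    Each step's objects become the next step's subjects. Returns the
    stored triples and the final set of bindings.
    """
    store = set()
    subjects = {start}
    for predicate in predicates:
        service = REGISTRY[predicate]      # service discovery
        next_subjects = set()
        for s in subjects:
            triples = service(s)           # service invocation
            store |= triples               # generated triples are stored
            next_subjects |= {o for (_, _, o) in triples}
        subjects = next_subjects
    return store, subjects


store, pathways = resolve("uniprot:P47989", ["isEncodedBy", "isParticipantIn"])
```

The triple store starts empty and is filled only by what the query needs, which is the sense in which the data "didn't even have to exist" beforehand.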
Holy Grail Demo #1 Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
Holy Grail Demo #2
Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
    ?patient rdf:type patient:LikelyRejecter .
    ?patient l:latestBUN ?bun .
    ?patient l:latestCreatinine ?creat .
}
Likely Rejecter: A patient who has creatinine levels that are increasing over time (Wilkinson MD)
Likely Rejecter: …but there is no “likely rejecter” column or table in our database… only blood chemistry measurements at various time-points
?
The definition of a LikelyRejecter is encoded in a machine-readable document written in the OWL language (“Ontology”): “the regression line over creatinine measurements should have an increasing slope”
The machine continues to burrow down through the definition and discovers that regression lines have things like slopes and intercepts, etc…
Then… Two magical events occur…
The machine figures out by itself the need to do a Linear Regression analysis in order to answer your question
The machine figures out by itself how and where that analysis can be done and does it automatically!
http://www.impactlab.net/2009/03/22/improve-your-brain-power/
The SHARE system utilizes SADI to discover analytical services on the Web that do linear regression analysis
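For readers who want to see the analysis itself, here is a minimal Python sketch of the rule quoted from the ontology: fit a least-squares line through the creatinine readings and check whether its slope is increasing. The function names and the two series of readings are invented for illustration; the actual Web Service that SHARE discovers is not shown on the slides. The sketch assumes at least two distinct time points per patient.

```python
def regression_slope(points):
    """Ordinary least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


def is_likely_rejecter(measurements):
    """Apply the slide's definition: creatinine is increasing over time."""
    return regression_slope(measurements) > 0


# Invented (day, creatinine) readings for two hypothetical patients.
rising = [(0, 1.0), (1, 1.2), (2, 1.5), (3, 1.9)]
falling = [(0, 1.9), (1, 1.6), (2, 1.4), (3, 1.1)]
```

Calling is_likely_rejecter(rising) classifies the first patient as a Likely Rejecter and the second as not, without any “likely rejecter” column ever being stored.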
VOILA!
How do we do that?!? We let the data describe itself! This is different from most of the bioinformatics world, where the person giving you the data also tells you how to interpret it
Data exhibits “late binding”
Late binding: “purpose and meaning” of the data is not determined until the moment it is required
Benefit of late binding Data is amenable to constant re-interpretation
Example?
Blood Creatinine measurements were not dictated to be (only) Blood Creatinine measurements!
The data had the ‘qualities/properties’ that allowed the machine to infer that they were Blood Creatinine measurements
But the data also had the ‘qualities/properties’ that allowed them to be interpreted as X/Y coordinate data by another Service
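A small Python sketch of late binding as described here: one record, two consumers, and each consumer decides at the moment of use whether the record has the properties it needs. The record layout and property names (time_days, value, analyte) are hypothetical, chosen only to make the idea concrete.

```python
# Invented record: property names are hypothetical, chosen for illustration.
record = {
    "patient": "p1",
    "time_days": 3,
    "value": 1.9,
    "unit": "mg/dL",
    "analyte": "creatinine",
}


def as_creatinine(rec):
    """Interpretation 1: a blood creatinine measurement, if the properties fit."""
    if rec.get("analyte") == "creatinine" and "value" in rec:
        return {"patient": rec["patient"], "creatinine": rec["value"]}
    return None


def as_xy(rec):
    """Interpretation 2: a plain (x, y) coordinate, if the properties fit."""
    if "time_days" in rec and "value" in rec:
        return (rec["time_days"], rec["value"])
    return None
```

Nothing in the record says “I am an X/Y point”; the regression service can still use it because the properties it needs are present.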
http://www.flickr.com/people/faernworks/
Holy Grail Demo #2 Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
The Holy Grail may not yet be in hand, but we can at least see it from here! So… now what?
Mark’s Manifesto What is my next “Holy Grail”?
Science Support for the in silico Scientific Method
The Scientific Method
Discourse: What do you believe? What do I believe?
Disagreement: You’re wrong! And I’m gonna prove it!
Clarity: This is the experiment I am going to do
Reproducibility: This is how I did it (“provenance”)
Clarity: This is my new hypothesis
Workflows (e.g. myExperiment)
Another Brief Digression…
“Facebook” for Scientists http://myexperiment.org
An exciting evolution in the way Researchers express and share their in silico “Materials and Methods” Through things called ‘Workflows’
Workflows are explicit representations of the method by which an analysis was done and which resources are used to do it
Workflows can be very simple… “Blast this sequence”
Or not... This workflow takes in a CEL file and a normalisation method, then returns a series of images/graphs which represent the same output obtained using the MADAT software package (MicroArray Data Analysis Tool). Also returned by this workflow is a list of the top differentially expressed genes (size dependent on the number specified as input - geneNumber), which are then used to find the candidate pathways which may be influencing the observed changes in the microarray data.
Why bother?
Taverna: A workbench for designing and executing Scientific Workflows
Load up your data and press “play”! …Then go home for the weekend! You are just one click away from your M.Sc.!!
By the by… The SHARE application automatically creates a Workflow and then automatically runs it. This is where the data comes from to answer the queries… Workflows are a Good Thing™
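To make “explicit representation of the method” concrete, here is a minimal Python sketch of a workflow as an ordered list of named steps, run by a tiny engine that records provenance for each step. It does not reflect how Taverna or SHARE represent workflows; the step names, the functions and the input data are invented for illustration.

```python
def run_workflow(steps, data):
    """Run named steps in order and record provenance for each one."""
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append({"step": name, "input": data, "output": result})
        data = result
    return data, provenance


# Hypothetical two-step workflow: names and functions are illustrative only.
steps = [
    ("drop_missing", lambda xs: [x for x in xs if x is not None]),
    ("mean", lambda xs: sum(xs) / len(xs)),
]

result, provenance = run_workflow(steps, [1.0, None, 3.0])
```

Because the steps are data rather than hidden code, the same list that produced the result also documents “this is how I did it” and can be handed to someone else to rerun.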
Detour Ends. Please resume speed
WORKFLOWS
At the moment, the Semantic Web in Healthcare and Life Sciences addresses these issues by attempting to create “consensus”
Large, centralized ontologies (e.g. the Gene Ontology) that claim to represent community agreement about “biological reality”
…is that Science?
To restore the “traditions of Science” to in silico science, the Semantic Web needs to encourage/facilitate personal opinion and debate
What has this got to do with SADI and SHARE?
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
    ?patient rdf:type patient:LikelyRejecter .
    ?patient l:latestBUN ?bun .
    ?patient l:latestCreatinine ?creat .
}
Likely Rejecter
I created a small ontology describing my definition of a Likely Rejecter
… it was MY ontology!
I can re-use it
I can modify it as I change my world-view
I can publish it for others to use
Others can modify it and/or compare it to THEIR world-view
Sharing my ontology gives opportunities for “micro-attribution”: “Credit” to me is automatic when someone uses my ontology in their ontology/query
Using SADI and SHARE, my personal world-view is explicitly expressed and can be dynamically evaluated against global data and knowledge
http://www.dailymail.co.uk/femail/article-488234/Friends-dignity-self-respect---weight-wasnt-I-lost-slimming-club.html
…but there’s more…
“Likely Rejecter”
I made that up!  It came out of my head!
What’s another word for a world-view that you make up? Hypothesis
The “Likely Rejecter” OWL Class is an explicitly-expressed hypothesis; Members of that class may or may not exist!
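A rough Python sketch of a class definition acting as a hypothesis: two people define “Likely Rejecter” differently, both definitions are evaluated against the same data, and one of them turns out to have no members. The patient series, the 0.5 threshold and the simplified first-to-last rate of change (used here instead of a regression) are all assumptions for illustration; in the talk the definition is an OWL class, not a Python function.

```python
# Invented (day, creatinine) series per patient; purely illustrative.
patients = {
    "p1": [(0, 1.0), (2, 1.4), (4, 1.9)],
    "p2": [(0, 1.8), (2, 1.5), (4, 1.2)],
}


def rise_per_day(series):
    """Average change per day between the first and last reading."""
    (t0, v0), (t1, v1) = series[0], series[-1]
    return (v1 - v0) / (t1 - t0)


# Two competing definitions of "Likely Rejecter" (both hypothetical).
hypotheses = {
    "mine": lambda s: rise_per_day(s) > 0,
    "yours": lambda s: rise_per_day(s) > 0.5,
}


def members(definition, data):
    """Evaluate a hypothesis: which individuals fall into the class?"""
    return {pid for pid, series in data.items() if definition(series)}


results = {name: members(d, patients) for name, d in hypotheses.items()}
```

With this data, “mine” has one member (p1) and “yours” has none. An empty class is still a useful answer: the hypothesis was stated explicitly and the data did not support it.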
Ontologically-expressed Hypotheses drive the discovery, assembly, and analysis of data capable of evaluating their validity
[Diagram labels: Hypothesis; Ischemia; SADI + SHARE; Hypertension; Blood Pressure; Analytical Algorithm; Database 1; Database 2]
Join us! SADI and CardioSHARE are Open-Source projects Come join us – we’re having a lot of fun!! http://sadiframework.org
Credits
Benjamin VanderValk (SHARE & SADI)
Luke McCarthy (SADI, SHARE, Taverna, CardioSHARE)
Soroush Samadian (CardioSHARE)
David Withers (Taverna)
Edward Kawas (SADI Service auto-generator)
U of New Brunswick
Dr. Chris Baker
Alexandre Riazanov
Carleton University
Dr. Michel Dumontier
Marc-Alexandre Nolin
Leonid Chepelev
Steve Etlinger
Nichaella Kieth
Jose Cruz
Microsoft Research
Credits
Benjamin VanderValk (SADI & CardioSHARE)
Luke McCarthy (SADI & CardioSHARE)
Soroush Samadian (CardioSHARE)
IO Informatics (Knowledge Explorer API)
Microsoft Research
Fin
This presentation available on SlideShare: keywords ‘wilkinson’ ‘iCAPTURE’ ‘HLI’

More Related Content

Viewers also liked

Crew, Foia, Documents 010156 - 010573
Crew, Foia, Documents  010156 - 010573Crew, Foia, Documents  010156 - 010573
Crew, Foia, Documents 010156 - 010573Obama White House
 
Fantasia In Zaire
Fantasia In ZaireFantasia In Zaire
Fantasia In ZaireStelarosa .
 
What Is Literary Criticism[1]2
What Is Literary Criticism[1]2What Is Literary Criticism[1]2
What Is Literary Criticism[1]2makeefer
 
New Media Presentation
New Media PresentationNew Media Presentation
New Media Presentationgaskinjo
 
Power Point 802.3
Power Point 802.3Power Point 802.3
Power Point 802.3roby90f
 
Crew documents 020334 - 020392
Crew documents 020334 - 020392Crew documents 020334 - 020392
Crew documents 020334 - 020392Obama White House
 
Intro a finanzas
Intro a finanzasIntro a finanzas
Intro a finanzasancrzamo
 
Old Ivy Vs Ort 2008
Old Ivy Vs Ort 2008Old Ivy Vs Ort 2008
Old Ivy Vs Ort 2008paobazzi
 
Why Choose My Sensei
Why Choose My SenseiWhy Choose My Sensei
Why Choose My Senseiguest5c9bc8
 
Crew, Foia, Documents 011528 - 011622
Crew, Foia, Documents 011528 - 011622Crew, Foia, Documents 011528 - 011622
Crew, Foia, Documents 011528 - 011622Obama White House
 
Crew, Foia, Documents 008692 - 008793
Crew, Foia, Documents 008692 - 008793Crew, Foia, Documents 008692 - 008793
Crew, Foia, Documents 008692 - 008793Obama White House
 
Another Introduce to Redis
Another Introduce to RedisAnother Introduce to Redis
Another Introduce to Redisjiaqing zheng
 
Almost 2009
Almost 2009Almost 2009
Almost 2009paobazzi
 
VC_flier_HYD.compressed
VC_flier_HYD.compressedVC_flier_HYD.compressed
VC_flier_HYD.compressedSuneel Sharma
 
Crew, Foia, Documents 011994 - 012108
Crew, Foia, Documents 011994 - 012108Crew, Foia, Documents 011994 - 012108
Crew, Foia, Documents 011994 - 012108Obama White House
 
Beautiful Lanscape
Beautiful LanscapeBeautiful Lanscape
Beautiful Lanscapefauzanmuslim
 

Viewers also liked (20)

Crew, Foia, Documents 010156 - 010573
Crew, Foia, Documents  010156 - 010573Crew, Foia, Documents  010156 - 010573
Crew, Foia, Documents 010156 - 010573
 
France 3 Lorraine : présence numérique 2010-2014
France 3 Lorraine : présence numérique 2010-2014France 3 Lorraine : présence numérique 2010-2014
France 3 Lorraine : présence numérique 2010-2014
 
Fantasia In Zaire
Fantasia In ZaireFantasia In Zaire
Fantasia In Zaire
 
What Is Literary Criticism[1]2
What Is Literary Criticism[1]2What Is Literary Criticism[1]2
What Is Literary Criticism[1]2
 
New Media Presentation
New Media PresentationNew Media Presentation
New Media Presentation
 
Power Point 802.3
Power Point 802.3Power Point 802.3
Power Point 802.3
 
Crew documents 020334 - 020392
Crew documents 020334 - 020392Crew documents 020334 - 020392
Crew documents 020334 - 020392
 
Intro a finanzas
Intro a finanzasIntro a finanzas
Intro a finanzas
 
MSE Part1-Chapter3
MSE Part1-Chapter3MSE Part1-Chapter3
MSE Part1-Chapter3
 
Old Ivy Vs Ort 2008
Old Ivy Vs Ort 2008Old Ivy Vs Ort 2008
Old Ivy Vs Ort 2008
 
Introdução
IntroduçãoIntrodução
Introdução
 
Why Choose My Sensei
Why Choose My SenseiWhy Choose My Sensei
Why Choose My Sensei
 
Crew, Foia, Documents 011528 - 011622
Crew, Foia, Documents 011528 - 011622Crew, Foia, Documents 011528 - 011622
Crew, Foia, Documents 011528 - 011622
 
Crew, Foia, Documents 008692 - 008793
Crew, Foia, Documents 008692 - 008793Crew, Foia, Documents 008692 - 008793
Crew, Foia, Documents 008692 - 008793
 
Another Introduce to Redis
Another Introduce to RedisAnother Introduce to Redis
Another Introduce to Redis
 
Almost 2009
Almost 2009Almost 2009
Almost 2009
 
VC_flier_HYD.compressed
VC_flier_HYD.compressedVC_flier_HYD.compressed
VC_flier_HYD.compressed
 
Crew, Foia, Documents 011994 - 012108
Crew, Foia, Documents 011994 - 012108Crew, Foia, Documents 011994 - 012108
Crew, Foia, Documents 011994 - 012108
 
Pysec
PysecPysec
Pysec
 
Beautiful Lanscape
Beautiful LanscapeBeautiful Lanscape
Beautiful Lanscape
 

Similar to The Scientific Method on the Semantic Web

The Semantic Web - This time... its Personal
The Semantic Web - This time... its PersonalThe Semantic Web - This time... its Personal
The Semantic Web - This time... its PersonalMark Wilkinson
 
How SADI & SHARE help restore the Scientific Method to in silico science
How SADI & SHARE help restore the Scientific Method to in silico scienceHow SADI & SHARE help restore the Scientific Method to in silico science
How SADI & SHARE help restore the Scientific Method to in silico scienceMark Wilkinson
 
Bioinformatics MiRON
Bioinformatics MiRONBioinformatics MiRON
Bioinformatics MiRONPrabin Shakya
 
Research - this time it's personal
Research - this time it's personalResearch - this time it's personal
Research - this time it's personalMark Wilkinson
 
C:\fakepath\bioit world2010
C:\fakepath\bioit world2010C:\fakepath\bioit world2010
C:\fakepath\bioit world2010guestdde063f8
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataPhilip Cheung
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchAnshika Bansal
 
Visualization Approaches for Biomedical Omics Data: Putting It All Together
Visualization Approaches for Biomedical Omics Data: Putting It All TogetherVisualization Approaches for Biomedical Omics Data: Putting It All Together
Visualization Approaches for Biomedical Omics Data: Putting It All TogetherNils Gehlenborg
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsmikaelhuss
 
Novel Approaches to Elucidating Structure Activity Relationships
Novel Approaches to Elucidating Structure Activity RelationshipsNovel Approaches to Elucidating Structure Activity Relationships
Novel Approaches to Elucidating Structure Activity RelationshipsChristopher Petersen
 
Session i overview bioinfo dm and app mmc
Session i overview bioinfo dm and app mmcSession i overview bioinfo dm and app mmc
Session i overview bioinfo dm and app mmcUSD Bioinformatics
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuKAUSHAL SAHU
 
Talk_linked_data_for_hcls_at_iswc2009
Talk_linked_data_for_hcls_at_iswc2009Talk_linked_data_for_hcls_at_iswc2009
Talk_linked_data_for_hcls_at_iswc2009Jun Zhao
 
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015Mark Wilkinson
 
100505 koenig biological_databases
100505 koenig biological_databases100505 koenig biological_databases
100505 koenig biological_databasesMeetika Gupta
 

Similar to The Scientific Method on the Semantic Web (20)

The Semantic Web - This time... its Personal
The Semantic Web - This time... its PersonalThe Semantic Web - This time... its Personal
The Semantic Web - This time... its Personal
 
How SADI & SHARE help restore the Scientific Method to in silico science
How SADI & SHARE help restore the Scientific Method to in silico scienceHow SADI & SHARE help restore the Scientific Method to in silico science
How SADI & SHARE help restore the Scientific Method to in silico science
 
Bioinformatics MiRON
Bioinformatics MiRONBioinformatics MiRON
Bioinformatics MiRON
 
2012 03 01_bioinformatics_ii_les1
2012 03 01_bioinformatics_ii_les12012 03 01_bioinformatics_ii_les1
2012 03 01_bioinformatics_ii_les1
 
Research - this time it's personal
Research - this time it's personalResearch - this time it's personal
Research - this time it's personal
 
C:\fakepath\bioit world2010
C:\fakepath\bioit world2010C:\fakepath\bioit world2010
C:\fakepath\bioit world2010
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 
Article
ArticleArticle
Article
 
Visualization Approaches for Biomedical Omics Data: Putting It All Together
Visualization Approaches for Biomedical Omics Data: Putting It All TogetherVisualization Approaches for Biomedical Omics Data: Putting It All Together
Visualization Approaches for Biomedical Omics Data: Putting It All Together
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 
Practical semantics in the pharmaceutical industry - the Open PHACTS project
Practical semantics in the pharmaceutical industry - the Open PHACTS projectPractical semantics in the pharmaceutical industry - the Open PHACTS project
Practical semantics in the pharmaceutical industry - the Open PHACTS project
 
Novel Approaches to Elucidating Structure Activity Relationships
Novel Approaches to Elucidating Structure Activity RelationshipsNovel Approaches to Elucidating Structure Activity Relationships
Novel Approaches to Elucidating Structure Activity Relationships
 
Session i overview bioinfo dm and app mmc
Session i overview bioinfo dm and app mmcSession i overview bioinfo dm and app mmc
Session i overview bioinfo dm and app mmc
 
ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
ChemSpider – A Platform to Gather, Host and Integrate Structure Based Data Ac...
 
Improving online chemistry one structure at a time
Improving online chemistry one structure at a timeImproving online chemistry one structure at a time
Improving online chemistry one structure at a time
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahu
 
Talk_linked_data_for_hcls_at_iswc2009
Talk_linked_data_for_hcls_at_iswc2009Talk_linked_data_for_hcls_at_iswc2009
Talk_linked_data_for_hcls_at_iswc2009
 
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
Building SADI Services Tutorial - SIB Workshop, Geneva, December 2015
 
100505 koenig biological_databases
100505 koenig biological_databases100505 koenig biological_databases
100505 koenig biological_databases
 

More from Mark Wilkinson

FAIR Metrics - Presentation to NIH KC1
FAIR Metrics - Presentation to NIH KC1FAIR Metrics - Presentation to NIH KC1
FAIR Metrics - Presentation to NIH KC1Mark Wilkinson
 
Introducing the fair evaluator
Introducing the fair evaluatorIntroducing the fair evaluator
Introducing the fair evaluatorMark Wilkinson
 
FAIR Projector Builder
FAIR Projector BuilderFAIR Projector Builder
FAIR Projector BuilderMark Wilkinson
 
Tech. session : Interoperability and Data FAIRness emerges from a novel combi...
Tech. session : Interoperability and Data FAIRness emerges from a novel combi...Tech. session : Interoperability and Data FAIRness emerges from a novel combi...
Tech. session : Interoperability and Data FAIRness emerges from a novel combi...Mark Wilkinson
 
smartAPIs: EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
smartAPIs:  EUDAT Semantic Working Group Presentation @ RDA 9th PlenarysmartAPIs:  EUDAT Semantic Working Group Presentation @ RDA 9th Plenary
smartAPIs: EUDAT Semantic Working Group Presentation @ RDA 9th PlenaryMark Wilkinson
 
IBC FAIR Data Prototype Implementation slideshow
IBC FAIR Data Prototype Implementation   slideshowIBC FAIR Data Prototype Implementation   slideshow
IBC FAIR Data Prototype Implementation slideshowMark Wilkinson
 
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...Mark Wilkinson
 
Sample data and other ur ls
Sample data and other ur lsSample data and other ur ls
Sample data and other ur lsMark Wilkinson
 
Example code for the SADI BMI Calculator Web Service
Example code for the SADI BMI Calculator Web ServiceExample code for the SADI BMI Calculator Web Service
Example code for the SADI BMI Calculator Web ServiceMark Wilkinson
 
Tutorial - Creating SADI semantic-web-services
Tutorial - Creating SADI semantic-web-servicesTutorial - Creating SADI semantic-web-services
Tutorial - Creating SADI semantic-web-servicesMark Wilkinson
 
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015Mark Wilkinson
 
Force11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, OxfordForce11 JDDCP workshop presentation, @ Force2015, Oxford
Force11 JDDCP workshop presentation, @ Force2015, OxfordMark Wilkinson
 
Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Mark Wilkinson
 
Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...Enhancing Reproducibility and Transparency in Clinical Research through Seman...
Enhancing Reproducibility and Transparency in Clinical Research through Seman...Mark Wilkinson
 
Web Science 2.0 - in silico science
Web Science 2.0 - in silico scienceWeb Science 2.0 - in silico science
Web Science 2.0 - in silico scienceMark Wilkinson
 
Web Science - ISoLA 2012
Web Science - ISoLA 2012Web Science - ISoLA 2012
Web Science - ISoLA 2012Mark Wilkinson
 
Web Science, SADI, and the Singularity
Web Science, SADI, and the SingularityWeb Science, SADI, and the Singularity
Web Science, SADI, and the SingularityMark Wilkinson
 
Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...
Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...
Evaluating Hypotheses using SPARQL-DL as an abstract workflow language to cho...Mark Wilkinson
 

More from Mark Wilkinson (20)

FAIR Metrics - Presentation to NIH KC1
The Scientific Method on the Semantic Web

  • 1. SADI, SHARE and the Scientific Method The Quest for the Holy Grail
  • 4. The Holy Grail (this slide created circa 2002): Align the promoters of all serine/threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M | A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
  • 5. Two novel technologies developed in our lab are getting us very close to the Holy Grail!
  • 7. Imagine there is a “virtual database” containing all of the data from all of the databases, together with the output of every conceivable analysis
  • 8. How do we query that database?
  • 14. ?
  • 15. Boxes became ovals… Straight lines became curvy lines…
  • 16. Boxes became ovals… Straight lines became curvy lines… …and you want us to give you a grant for THAT??
  • 18. Relational database: Gene Table (Gene ID, Tissue ID, Type ID); Protein Table (Protein Index, Protein Name, Regulates ID). Graph: http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487, linked by isRepressorOf
  • 19. “Foreign keys” are used to link tables in a database. Relational database: Gene Table (Gene ID, Tissue ID, Type ID); Protein Table (Protein Index, Protein Name, Regulates ID). Graph: http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487, linked by isRepressorOf
  • 20. Links in Graphs consist of statements called “TRIPLES”. Relational database: Gene Table (Gene ID, Tissue ID, Type ID); Protein Table (Protein Index, Protein Name, Regulates ID). Graph: http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487, linked by isRepressorOf
  • 21. Both Data Sources are on the Same Machine. Relational database: Gene Table (Gene ID, Tissue ID, Type ID); Protein Table (Protein Index, Protein Name, Regulates ID). Graph: http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487, linked by isRepressorOf
  • 22. Graph Data Sources (may be) on Independent Machines on the Web. Relational database: Gene Table (Gene ID, Tissue ID, Type ID); Protein Table (Protein Index, Protein Name, Regulates ID). Graph: http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487, linked by isRepressorOf
  • 23. “Meaning” of the connection between data-points is understood only by the database administrator (Protein regulates Gene). Relational database: Gene Table (Gene ID, Tissue ID, Type ID); Protein Table (Protein Index, Protein Name, Regulates ID). Graph: http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487, linked by isRepressorOf
  • 24. “Meaning” of the connection in a Graph is explicitly labeled (and machine-readable!). Relational database: Gene Table (Gene ID, Tissue ID, Type ID); Protein Table (Protein Index, Protein Name, Regulates ID). Graph: http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487, linked by isRepressorOf
  • 25. Connect all of the graphs in the world to one another And what do you get?
  • 26. Mark Butler (2003) Is the semantic web hype? Hewlett Packard laboratories presentation at MMU, 2003-03-12
  • 27. The lavender portion represents biology – currently ~40,000,000,000 Triples (we and our collaborators will be doubling that number in the next 12 months)
  • 28. How do you find information on this “Semantic Web” ??
  • 29. SPARQL The query language used to discover and extract information represented in Graphs
  • 30. SPARQL Unfortunately, YOU have to know which Web resources contain which Triples (HARD!) Even if you do know this, SPARQL has significant limitations when attempting to query over disparate Graphs (SLOW AND CUMBERSOME)
  • 31. SPARQL If the data doesn’t exist in any Graph at all…
  • 33. Basically… A novel way of making Triples available on the Semantic Web, using a technology called Web Services (“Services” for short)
  • 34. Basically… We invented SADI to overcome some/all of these problems …but I won’t bore you with the technical details…
  • 36. Holy Grail Demo #1 Imagine there is a “virtual database” containing all of the data from all of the databases, together with the output of every conceivable analysis. How do we query that database?
  • 37. SHARE: Semantic Health And Research Environment. SPARQL enhanced by SADI
  • 38. A Novel SPARQL Query Engine Overcomes some of the limitations of traditional SPARQL query-handlers
  • 39. A Novel SPARQL Query Engine Overcomes some of the limitations of traditional SPARQL query-handlers …and more…
  • 40. A Novel SPARQL Query Engine Overcomes some of the limitations of traditional SPARQL query-handlers …and more… MUCH more!!
  • 41. What pathways does UniProt protein P47989 belong to? PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> PREFIX ont: <http://ontology.dumontierlab.com/> PREFIX uniprot: <http://lsrn.org/UniProt:> SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . }
  • 43. What pathways does UniProt protein P47989 belong to? PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> PREFIX ont: <http://ontology.dumontierlab.com/> PREFIX uniprot: <http://lsrn.org/UniProt:> SELECT ?gene ?pathway WHERE { uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway . } Note that there is no “FROM” clause… I have neglected to tell the system where to look for the answer; I am simply asking my question
  • 44. Now stick that query into SHARE
  • 47. Recap: what we just saw. A standard SPARQL query was entered into SHARE, a SADI-aware query engine
  • 48. Recap: what we just saw. The query was interpreted to extract the individual data/relationships being requested (and any component/sub-properties, as we shall see later!)
  • 49. Recap: what we just saw. The “triple-patterns” required to answer the query are passed to SADI for Web Service discovery
  • 50. Recap: what we just saw. Services capable of generating those triple-patterns are automatically executed, the triples are stored, and the query is resolved.
  • 51. Recap: what we just saw. We posed, and answered, a ~complex database query WITHOUT A DATABASE (in fact, the data didn’t even have to exist...)
  • 52. Holy Grail Demo #1 Align the promoters of all serine/threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M | A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
  • 54. Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#> PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creat FROM <http://sadiframework.org/ontologies/patients.rdf> WHERE { ?patient rdf:type patient:LikelyRejecter . ?patient l:latestBUN ?bun . ?patient l:latestCreatinine ?creat . }
  • 55. Likely Rejecter: A patient who has creatinine levels that are increasing over time (Wilkinson MD)
  • 56. Likely Rejecter: …but there is no “likely rejecter” column or table in our database… only blood chemistry measurements at various time-points
  • 57. ?
  • 58. The definition of a LikelyRejecter is encoded in a machine-readable document written in the OWL language (“Ontology”) “the regression line over creatinine measurements should have an increasing slope”
  • 59. The machine continues to burrow down through the definition and discovers that regression lines have things like slopes and intercepts, etc…
  • 60. Then… Two magical events occur…
  • 61. The machine figures out by itself the need to do a Linear Regression analysis in order to answer your question
  • 62. The machine figures out by itself how and where that analysis can be done, and does it automatically!
  • 64. The SHARE system utilizes SADI to discover analytical services on the Web that do linear regression analysis
  • 66. How do we do that?!? We let the data describe itself! This is different from most of the bioinformatics world, where the person giving you the data also tells you how to interpret it
  • 67. Data exhibits “late binding”
  • 69. Late binding: “purpose and meaning” of the data is not determined until the moment it is required
  • 70. Benefit of late binding Data is amenable to constant re-interpretation
  • 71. Example? Blood Creatinine measurements were not dictated to be (only) Blood Creatinine measurements!
  • 72. Example? The data had the ‘qualities/properties’ that allowed the machine to infer that they were Blood Creatinine measurements
  • 73. Example? But the data also had the ‘qualities/properties’ that allowed them to be interpreted as X/Y coordinate data by another Service
  • 75. Holy Grail Demo #2 Align the promoters of all serine/threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M | A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
  • 76. The Holy Grail may not yet be in-hand, but we can at least see it from here! So… now what?
  • 77. Mark’s Manifesto What is my next “Holy Grail”?
  • 78. Science Support for the in silico Scientific Method
  • 80. The Scientific Method Discourse: What do you believe? What do I believe? Disagreement: You’re wrong! And I’m gonna prove it! Clarity: This is the experiment I am going to do Reproducibility: This is how I did it (“provenance”) Clarity: This is my new hypothesis
  • 81. The Scientific Method Discourse: What do you believe? What do I believe? Disagreement: You’re wrong! And I’m gonna prove it! Clarity: This is the experiment I am going to do Reproducibility: This is how I did it (“provenance”) Clarity: This is my new hypothesis Workflows (e.g. myExperiment)
  • 83. “Facebook” for Scientists http://myexperiment.org
  • 84. An exciting evolution in the way Researchers express and share their in silico “Materials and Methods” Through things called ‘Workflows’
  • 86. Workflows are explicit representations of the method by which an analysis was done and which resources are used to do it
  • 87. Workflows can be very simple… “Blast this sequence”
  • 88. Or not... This workflow takes in a CEL file and a normalisation method, then returns a series of images/graphs which represent the same output obtained using the MADAT software package (MicroArray Data Analysis Tool). Also returned by this workflow is a list of the top differentially expressed genes (size dependent on the number specified as input, geneNumber), which are then used to find the candidate pathways which may be influencing the observed changes in the microarray data.
  • 90. Taverna A workbench for designing and executing Scientific Workflows
  • 92. Load-up your data and press “play”! …Then go home for the weekend! You are just one click away from your M.Sc.!!
  • 93. By the by… The SHARE application automatically creates a Workflow and then automatically runs it. This is where the data comes from to answer the queries… Workflows are a Good Thing™
  • 97. At the moment the Semantic Web in Healthcare and Life Sciences addresses these issues by attempting to create “consensus”
  • 98. Large, centralized ontologies (e.g. the Gene Ontology) that claim to represent community agreement about “biological reality”
  • 104. To restore the “traditions of Science” to in silico science, the Semantic Web needs to encourage/facilitate personal opinion and debate
  • 105. What has this got to do with SADI and SHARE?
  • 106. PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#> PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creat FROM <http://sadiframework.org/ontologies/patients.rdf> WHERE { ?patient rdf:type patient:LikelyRejecter . ?patient l:latestBUN ?bun . ?patient l:latestCreatinine ?creat . }
  • 108. I created a small ontology describing my definition of a Likely Rejecter
  • 109. … it was MY ontology!
  • 111. I can modify it as I change my world-view
  • 112. I can publish it for others to use
  • 113. Others can modify it and/or compare it to THEIR world-view
  • 114. Sharing my ontology gives opportunities for “micro-attribution”. “Credit” to me is automatic when someone uses my ontology in their ontology/query
  • 115. Using SADI and SHARE, my personal world-view is explicitly expressed and can be dynamically evaluated against global data and knowledge
  • 119. I made that up! It came out of my head!
  • 120. What’s another word for a world-view that you make-up? Hypothesis
  • 121. The “Likely Rejecter” OWL Class is an explicitly-expressed hypothesis; Members of that class may or may not exist!
  • 124. Ontologically-expressed Hypotheses drive the discovery, assembly, and analysis of data capable of evaluating their validity (diagram labels: Hypothesis, Ischemia, SADI + SHARE, Hypertension, Blood Pressure, Analytical Algorithm, Database 1, Database 2)
  • 125. Join us! SADI and CardioSHARE are Open-Source projects Come join us – we’re having a lot of fun!! http://sadiframework.org
  • 126. Credits: Benjamin VanderValk (SHARE & SADI), Luke McCarthy (SADI, SHARE, Taverna, CardioSHARE), Soroush Samadian (CardioSHARE), David Withers (Taverna), Edward Kawas (SADI Service auto-generator)
  • 127. U of New Brunswick: Dr. Chris Baker, Alexandre Riazanov. Carleton University: Dr. Michel Dumontier, Marc-Alexandre Nolin, Leonid Chepelev, Steve Etlinger, Nichaella Kieth, Jose Cruz
  • 129. Credits: Benjamin VanderValk (SADI & CardioSHARE), Luke McCarthy (SADI & CardioSHARE), Soroush Samadian (CardioSHARE), IO Informatics (Knowledge Explorer API), Microsoft Research. Fin. This presentation is available on SlideShare: keywords ‘wilkinson’ ‘iCAPTURE’ ‘HLI’
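
Slide 58 defines a Likely Rejecter by the rule “the regression line over creatinine measurements should have an increasing slope”. The following is a minimal Python sketch of that test only, not of SHARE or SADI: per the slides, SHARE discovers a linear-regression Web Service and invokes it remotely, whereas this sketch computes the slope locally. The function names `regression_slope` and `is_likely_rejecter`, and the sample measurements, are hypothetical and chosen for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # generic (x, y) pair


def regression_slope(points: Sequence[Point]) -> float:
    """Ordinary least-squares slope of y on x for a series of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of squared deviations in x, and of cross-deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all points share the same x value")
    return sxy / sxx


def is_likely_rejecter(creatinine_by_day: Sequence[Point]) -> bool:
    """Hypothetical check: creatinine trend over time has a positive slope.

    The input is (day, creatinine level) pairs, but the regression itself
    only sees generic X/Y coordinates.
    """
    return regression_slope(creatinine_by_day) > 0


if __name__ == "__main__":
    # Made-up measurements for two imaginary patients: (day, creatinine).
    rising = [(0, 1.0), (1, 1.2), (2, 1.4)]
    falling = [(0, 1.4), (1, 1.2), (2, 1.0)]
    print(is_likely_rejecter(rising), is_likely_rejecter(falling))
```

Note that `regression_slope` knows nothing about creatinine: it accepts any X/Y pairs. That mirrors the late-binding point on slides 71 to 73, where the same measurements are read as blood chemistry by one consumer and as plain coordinate data by the regression service.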