4. The Holy Grail (this slide created circa 2002): Align the promoters of all serine/threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000 nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M | A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
18. [Figure: Gene Table (columns: Gene ID, Tissue ID, Type ID) and Protein Table (columns: Protein Index, Protein Name, Regulates ID); graph edge isRepressorOf linking http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487]
19. “Foreign keys” are used to link tables in a database. [Figure: Gene Table (columns: Gene ID, Tissue ID, Type ID) and Protein Table (columns: Protein Index, Protein Name, Regulates ID); graph edge isRepressorOf linking http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487]
20. Links in Graphs consist of statements called “TRIPLES”. [Figure: Gene Table (columns: Gene ID, Tissue ID, Type ID) and Protein Table (columns: Protein Index, Protein Name, Regulates ID); graph edge isRepressorOf linking http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487]
21. Both Data Sources are on the Same Machine. [Figure: Gene Table (columns: Gene ID, Tissue ID, Type ID) and Protein Table (columns: Protein Index, Protein Name, Regulates ID); graph edge isRepressorOf linking http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487]
22. Graph Data Sources (may be) on Independent Machines on the Web. [Figure: Gene Table (columns: Gene ID, Tissue ID, Type ID) and Protein Table (columns: Protein Index, Protein Name, Regulates ID); graph edge isRepressorOf linking http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487]
23. “Meaning” of the connection between data-points is understood only by the database administrator: Protein regulates Gene. [Figure: Gene Table (columns: Gene ID, Tissue ID, Type ID) and Protein Table (columns: Protein Index, Protein Name, Regulates ID); graph edge isRepressorOf linking http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487]
24. “Meaning” of the connection in a Graph is explicitly labeled (and machine-readable!). [Figure: Gene Table (columns: Gene ID, Tissue ID, Type ID) and Protein Table (columns: Protein Index, Protein Name, Regulates ID); graph edge isRepressorOf linking http://ncbi.nlm/NR/NR_14487 and http://pdb.org/114487]
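The contrast drawn on the slides above, between a foreign key and a labeled triple, can be sketched in a few lines of Python. This is only an illustration: the row values, the protein name, and the direction of the isRepressorOf edge (protein to gene) are assumptions, since the slides show only the column names and the two URLs.

```python
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Relational style: the link is a bare key. Nothing in the data says what
# "regulates_id" means; that knowledge lives with whoever designed the schema.
# All row values here are hypothetical.
gene_table = [{"gene_id": 14487, "tissue_id": 3, "type_id": 7}]
protein_table = [
    {"protein_index": 114487, "protein_name": "example protein", "regulates_id": 14487}
]


def regulated_gene(protein_row: dict, genes: List[dict]) -> dict:
    """Follow the foreign key from a protein row to its gene row."""
    return next(g for g in genes if g["gene_id"] == protein_row["regulates_id"])


# Graph style: the link itself is a statement with a named relationship.
# The edge direction (protein isRepressorOf gene) is an assumption.
GENE = "http://ncbi.nlm/NR/NR_14487"
PROTEIN = "http://pdb.org/114487"
graph: Set[Triple] = {Triple(PROTEIN, "isRepressorOf", GENE)}


def objects(triples: Set[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


print(regulated_gene(protein_table[0], gene_table))
print(objects(graph, PROTEIN, "isRepressorOf"))
```

Because the subject and object are URLs, the two ends of a triple do not need to live in the same database or on the same machine.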
25. Connect all of the graphs in the world to one another. And what do you get?
26. Mark Butler (2003) Is the semantic web hype? Hewlett Packard laboratories presentation at MMU, 2003-03-12
27. The lavender portion represents biology – currently ~40,000,000,000 Triples (we and our collaborators will be doubling that number in the next 12 months)
28. How do you find information on this “Semantic Web” ??
29. SPARQL The query language used to discover and extract information represented in Graphs
30. SPARQL Unfortunately, YOU have to know which Web resources contain which Triples (HARD!) Even if you do know this, SPARQL has significant limitations when attempting to query over disparate Graphs (SLOW AND CUMBERSOME)
31. SPARQL If the data doesn’t exist in any Graph at all…
33. Basically… A novel way of making Triples available on the Semantic Web, using a technology called Web Services (“Services” for short)
34. Basically… We invented SADI to overcome some/all of these problems …but I won’t bore you with the technical details…
36. Holy Grail Demo #1 Imagine there is a “virtual database” containing all of the data from all of the databases, together with the output of every conceivable analysis. How do we query that database?
38. A Novel SPARQL Query Engine Overcomes some of the limitations of traditional SPARQL query-handlers
39. A Novel SPARQL Query Engine Overcomes some of the limitations of traditional SPARQL query-handlers …and more…
40. A Novel SPARQL Query Engine Overcomes some of the limitations of traditional SPARQL query-handlers …and more… MUCH more!!
41. What pathways does UniProt protein P47989 belong to?
PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
    uniprot:P47989 pred:isEncodedBy ?gene .
    ?gene ont:isParticipantIn ?pathway .
}
43. What pathways does UniProt protein P47989 belong to?
PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
    uniprot:P47989 pred:isEncodedBy ?gene .
    ?gene ont:isParticipantIn ?pathway .
}
Note that there is no “FROM” clause… I have neglected to tell the system where to look for the answer; I am simply asking my question.
47. Recap: what we just saw. A standard SPARQL query was entered into SHARE, a SADI-aware query engine.
48. Recap: what we just saw. The query was interpreted to extract the individual data/relationships being requested (and any component/sub-properties, as we shall see later!)
49. Recap: what we just saw. The “triple-patterns” required to answer the query are passed to SADI for Web Service discovery.
50. Recap: what we just saw. Services capable of generating those triple-patterns are automatically executed, the triples are stored, and the query is resolved.
51. Recap: what we just saw. We posed, and answered, a ~complex database query WITHOUT A DATABASE (in fact, the data didn’t even have to exist...)
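The recap above (interpret the query, discover services by triple-pattern, run them, store the triples, resolve the query) can be illustrated with a toy Python sketch. This is not the SHARE or SADI implementation or API: the registry, the two service functions, and the gene and pathway identifiers are all hypothetical stand-ins, and real SADI services are Web Services that consume and produce RDF.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Service = Callable[[str], List[Triple]]


# Hypothetical stand-ins for Web Services. Each takes a subject and
# generates triples on demand; the identifiers are made up.
def encoded_by_service(protein: str) -> List[Triple]:
    fake = {"uniprot:P47989": ["gene:EXAMPLE1"]}
    return [(protein, "pred:isEncodedBy", g) for g in fake.get(protein, [])]


def pathway_service(gene: str) -> List[Triple]:
    fake = {"gene:EXAMPLE1": ["pathway:EXAMPLE_A", "pathway:EXAMPLE_B"]}
    return [(gene, "ont:isParticipantIn", p) for p in fake.get(gene, [])]


# "Service discovery" reduced to a lookup: which service can generate
# triples with a given predicate?
REGISTRY: Dict[str, Service] = {
    "pred:isEncodedBy": encoded_by_service,
    "ont:isParticipantIn": pathway_service,
}


def resolve(patterns: List[Triple], registry: Dict[str, Service]):
    """Resolve triple-patterns left to right; names starting with '?' are variables."""
    store: Set[Triple] = set()          # triples generated along the way
    bindings: List[Dict[str, str]] = [{}]
    for subj, pred, obj in patterns:
        service = registry[pred]        # find a service for this pattern
        next_bindings = []
        for b in bindings:
            s = b.get(subj, subj)       # substitute an already-bound variable
            if s.startswith("?"):
                raise ValueError(f"subject {subj} is not bound yet")
            for triple in service(s):   # execute the service
                store.add(triple)
                o = triple[2]
                if obj.startswith("?"):
                    if obj in b and b[obj] != o:
                        continue
                    next_bindings.append({**b, obj: o})
                elif o == obj:
                    next_bindings.append(b)
        bindings = next_bindings
    return bindings, store


rows, store = resolve(
    [
        ("uniprot:P47989", "pred:isEncodedBy", "?gene"),
        ("?gene", "ont:isParticipantIn", "?pathway"),
    ],
    REGISTRY,
)
for row in rows:
    print(row["?gene"], row["?pathway"])
```

Note that the store starts empty: every triple used to answer the query is produced by a service call made while the query is being resolved.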
52. Holy Grail Demo #1 Align the promoters of all serine/threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000 nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M | A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
54. Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
    ?patient rdf:type patient:LikelyRejecter .
    ?patient l:latestBUN ?bun .
    ?patient l:latestCreatinine ?creat .
}
55. Likely Rejecter: A patient who has creatinine levels that are increasing over time (Wilkinson MD)
56. Likely Rejecter: …but there is no “likely rejecter” column or table in our database… only blood chemistry measurements at various time-points
58. The definition of a LikelyRejecter is encoded in a machine-readable document written in the OWL language (“Ontology”): “the regression line over creatinine measurements should have an increasing slope”
59. The machine continues to burrow down through the definition and discovers that regression lines have things like slopes and intercepts, etc…
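The LikelyRejecter definition quoted above comes down to a small calculation: fit a regression line to the creatinine measurements and check the sign of its slope. A minimal Python sketch follows, assuming ordinary least squares and measurements given as (time, creatinine) pairs. The function names and sample values are hypothetical, and in the talk the definition lives in an OWL document rather than in code like this.

```python
from typing import List, Sequence, Tuple


def regression_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def is_likely_rejecter(measurements: List[Tuple[float, float]]) -> bool:
    """True if the regression line over (time, creatinine) pairs slopes upward."""
    times = [t for t, _ in measurements]
    levels = [c for _, c in measurements]
    slope, _ = regression_line(times, levels)
    return slope > 0


# Hypothetical measurements: (time-point, creatinine level)
rising = [(0, 1.0), (1, 1.2), (2, 1.5)]
falling = [(0, 1.5), (1, 1.2), (2, 1.0)]
print(is_likely_rejecter(rising), is_likely_rejecter(falling))
```

No “likely rejecter” value is stored anywhere; membership in the class is computed from the blood chemistry time-points when the question is asked.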
66. How do we do that?!? We let the data describe itself! This is different from most of the bioinformatics world, where the person giving you the data also tells you how to interpret it
75. Holy Grail Demo #2 Align the promoters of all serine/threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels. Retrieve and align 2000 nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M | A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
76. The Holy Grail may not yet be in-hand, but we can at least see it from here! So… now what?
80. The Scientific Method Discourse: What do you believe? What do I believe? Disagreement: You’re wrong! And I’m gonna prove it! Clarity: This is the experiment I am going to do Reproducibility: This is how I did it (“provenance”) Clarity: This is my new hypothesis
81. The Scientific Method Discourse: What do you believe? What do I believe? Disagreement: You’re wrong! And I’m gonna prove it! Clarity: This is the experiment I am going to do Reproducibility: This is how I did it (“provenance”) Clarity: This is my new hypothesis Workflows (e.g. myExperiment)
88. Or not... “This workflow takes in a CEL file and a normalisation method then returns a series of images/graphs which represent the same output obtained using the MADAT software package (MicroArray Data Analysis Tool). Also returned by this workflow is a list of the top differentially expressed genes (size dependent on the number specified as input - geneNumber), which are then used to find the candidate pathways which may be influencing the observed changes in the microarray data.”
92. Load-up your data and press “play”! …Then go home for the weekend! You are just one click away from your M.Sc.!!
93. By the by… The SHARE application automatically creates a Workflow and then automatically runs it. This is where the data comes from to answer the queries… Workflows are a Good Thing™
121. The “Likely Rejecter” OWL Class is an explicitly-expressed hypothesis; Members of that class may or may not exist!
124. Ontologically-expressed Hypotheses drive the discovery, assembly, and analysis of data capable of evaluating their validity. [Figure labels: Hypothesis; Ischemia; SADI + SHARE; Hypertension; Blood Pressure; Analytical Algorithm; Database 1; Database 2]
125. Join us! SADI and CardioSHARE are Open-Source projects Come join us – we’re having a lot of fun!! http://sadiframework.org
126. Credits: Benjamin VanderValk (SHARE & SADI), Luke McCarthy (SADI, SHARE, Taverna, CardioSHARE), Soroush Samadian (CardioSHARE), David Withers (Taverna), Edward Kawas (SADI Service auto-generator)
127. U of New Brunswick: Dr. Chris Baker, Alexandre Riazanov. Carleton University: Dr. Michel Dumontier, Marc-Alexandre Nolin, Leonid Chepelev, Steve Etlinger, Nichaella Kieth, Jose Cruz
129. Credits: Benjamin VanderValk (SADI & CardioSHARE), Luke McCarthy (SADI & CardioSHARE), Soroush Samadian (CardioSHARE), IO Informatics (Knowledge Explorer API), Microsoft Research. Fin. This presentation available on SlideShare: keywords ‘wilkinson’ ‘iCAPTURE’ ‘HLI’