This is largely a compilation of various other talks that I have posted here: a summary of the past 3+ years of work on SADI/SHARE. It includes the (now well-worn!!) slides about SHARE, as well as some of the more contemporary stuff about how we extended GALEN clinical classes with richer semantic descriptions, and then used them to do automated clinical phenotype analysis. It also includes the slide-deck on automated Measurement Unit conversion (related to our work on semantically representing Framingham clinical risk assessment rules).
So... for anyone who regularly follows my uploads, there isn't much "new" in here, but at least it's all in one place now! :-)
Presentation to the J. Craig Venter Institute, Dec. 2014
1. “Shopping for data should be as easy as
shopping for shoes!”
Dr. Carole Goble
Professor, Dept. of Computer Science
University of Manchester
2. “A little bit of semantics goes a long way”
Dr. James Hendler
Artificial Intelligence Researcher
Rensselaer Polytechnic Institute
One of the originators of the Semantic Web
3. …but a lot of semantics goes a long, long way!
Mark Wilkinson
Isaac Peral Distinguished Researcher
Director, Fundación BBVA Chair in Biological Informatics
Center for Plant Biotechnology and Genomics
Technical University of Madrid
4. Making the Web a
biomedical research platform
from hypothesis through to publication
9. Trend #1
Multiple recent surveys of high-throughput biology
reveal that upwards of 50% of published studies
are not reproducible
- Baggerly, 2009
- Ioannidis, 2009
10. Trend #1
Similar (if not worse!) in clinical studies
- Begley & Ellis, Nature, 2012
- Booth, Forbes, 2012
- Huang & Gottardo, Briefings in Bioinformatics, 2012
11. Trend #1
“the most common errors are simple,
the most simple errors are common”
At least partially because the
analytical methodology was inappropriate
and/or not sufficiently described
- Baggerly, 2009
12. Trend #1
These errors pass peer review
The researcher is (sometimes) unaware of the error
The process that led to the error is not recorded
Therefore it cannot be detected during peer-review
13. Agencies have Noticed!
In March 2012, the US Institute of Medicine said, in effect,
“Enough is enough!”
14. Agencies have Noticed!
Institute of Medicine Recommendations
For Conduct of High-Throughput Research:
1. Rigorously-described, -annotated, and -followed data
management and manipulation procedures
2. “Lock down” the computational analysis pipeline once it
has been selected
3. Publish the analytical workflow in a formal manner,
together with the full starting and result datasets
Evolution of Translational Omics: Lessons Learned and the Path Forward. The
Institute of Medicine of the National Academies, Report Brief, March 2012.
17. Trend #2
High-throughput technologies are becoming
cheaper and easier to use
But there are still very few experts trained in
statistical analysis of high-throughput data
18. Trend #2
The number of job postings for data scientist
positions increased by 15,000% between the
summers of 2011 and 2012
-- Indeed.com job trends data reported by
http://blogs.nature.com/naturejobs/2013/03/18/so-you-want-to-be-a-data-scientist
20. Trend #2
Therefore
Even small, moderately-funded laboratories
can now afford to produce more data
than they can manage or interpret
These labs will likely never be able to afford
a qualified data scientist
22. Trend #3
The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam, et al, The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey (Editor), 2009
Slide adapted with permission from Joanne Luciano, Presentation at Health Web Science Workshop 2012, Evanston IL, USA, June 22, 2012.
23. “The Singularity”
The X-intercept is where, the moment a discovery is made,
it is immediately put into practice
24. You
Are
Here
Scientific research would have to be
conducted within a medium that
immediately interpreted
and disseminated the results...
25. ...in a form that immediately (actively!) affected the
results of other researchers...
You
Are
Here
27. 3 intersecting
and problematic trends
Non-reproducible science that passes peer-review
Cheaper production of larger and more complex datasets
that require specialized expertise to analyze properly
Need to more rapidly disseminate and use new discoveries
32. When I do my analysis
I want to draw on the knowledge
of global domain-experts like
statisticians and pathologists...
...as if they were mentors sitting
in the chair beside me.
33. Please don’t make me find
all of the data and knowledge
that I require to do my experiment
...it simply isn’t possible anymore...
Image from: Mark Smiciklas
Intersection Consulting, cc-nca
34. Image from AJ Cann
cc-by-a license
I want to support peer review(ers)
so that I do better science.
43. This is the critical bit!
The link is explicitly labeled!
causally related to
???
44. http://semanticscience.org/resource/SIO_000243
SIO_000243:
<owl:ObjectProperty rdf:about="&resource;SIO_000243">
<rdfs:label xml:lang="en"> is causally related with</rdfs:label>
<rdf:type rdf:resource="&owl;SymmetricProperty"/>
<rdf:type rdf:resource="&owl;TransitiveProperty"/>
<dc:description xml:lang="en"> A transitive, symmetric, temporal relation
in which one entity is causally related with another non-identical entity.
</dc:description>
<rdfs:subPropertyOf rdf:resource="&resource;SIO_000322"/>
</owl:ObjectProperty>
causally related with
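To make the "symmetric" and "transitive" declarations above concrete, here is a minimal Python sketch of what a reasoner could infer from two asserted links. The entity names (ex:A, ex:B, ex:C) are placeholders, not real data, and the function is an illustration rather than how any particular OWL reasoner is implemented.

```python
from itertools import product


def symmetric_transitive_closure(pairs):
    """Return every pair entailed by a symmetric, transitive relation."""
    closure = set(pairs)
    # Symmetry: if (a, b) holds then (b, a) holds.
    closure |= {(b, a) for a, b in closure}
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            # Transitivity: (a, b) and (b, d) entail (a, d).
            # The description says "non-identical", so (a, a) is skipped here;
            # a strict OWL reasoner would also entail those reflexive pairs.
            if b == c and a != d and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


# Hypothetical assertions: A is causally related with B, and B with C.
asserted = {("ex:A", "ex:B"), ("ex:B", "ex:C")}
inferred = symmetric_transitive_closure(asserted)
```

With only two asserted links, the closure also contains the link between ex:A and ex:C in both directions, which is the kind of "free" knowledge an explicitly labelled relationship buys you.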
48. Ontology Spectrum
(figure: a spectrum running from the least to the most expressive kinds of ontology)
Catalog/ID
Terms/glossary
Thesauri “narrower term” relation
Informal is-a
Formal is-a
Formal instance
Frames (Properties)
Value Restrs.
Selected Logical Constraints (disjointness, inverse, …)
General Logical constraints
Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness.
Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
49. Ontology Spectrum
(same figure, annotated)
Most biomedical ontologies, e.g. Gene Ontology
50. Ontology Spectrum
(same figure, annotated)
Ontologies being used in today’s talk
Most biomedical ontologies, e.g. Gene Ontology
51. Ontology Spectrum
(same figure, annotated)
Discovery & Interpretation systems – flexible!
Categorization Systems: like library shelves, inflexible
52. Remember, this is the critical bit!
causally related with
http://semanticscience.org/resource/SIO_000243
It’s relationships that make
the Semantic Web “Semantic”
54. Even with “deep semantics”
a lot of important information cannot be represented
on the Semantic Web
For example, all of the data that results from
analytical algorithms and statistical analyses
57. Varying estimates
put the size of the
Deep Web between
500 and 800 times
larger than the
surface Web
58. On the WWW
“automation” of
access to Deep Web
data happens through
“Web Services”
…because describing what a Service does is HARD!
64. Scientific Web Services
are DIFFERENT!
Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
65. “The service interfaces within bioinformatics are relatively
simple. An extensible or constrained interoperability
framework is likely to suffice for current demands: a fully
generic framework is currently not necessary.”
Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.
66. Scientific Web Services are DIFFERENT!
They’re simpler!
So perhaps we can solve the Semantic Web Service problem
as it pertains to this (important!) domain
67. With respect to the Semantic Web
What is missing from this list?
Describe input data
Describe output data
Describe how the system manipulates the data
Describe how the world changes as a result
70. causally related with
http://semanticscience.org/resource/SIO_000243
The Semantic Web gets its semantics from relationships
In 2008 I published a set of design-patterns
for scientific Semantic Web Services
that focuses on the biological relationship that the Service “exposes”
73. (diagram: a SADI BLAST service consumes a sequence and attaches an explicitly labelled link to it)
Input: sequence, has_seq_string AACTCTTCGTAGTG...
Output: the same sequence, now with “has homology to” Terminal Flower (type: gene; species: A. thal.; has_seq_string: sequence)
SADI requires you to explicitly declare,
as part of your analytical output,
the biological relationship that your
algorithm “exposed”.
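As a rough illustration of that requirement, the sketch below shows a toy service whose output is a set of new triples attached to the node it was given. Everything here is an assumption made for illustration: the function name, the prefix-matching stand-in for a real alignment, and the reference sequences are all hypothetical, and this is not the SADI API.

```python
SEQ_STRING = "has_seq_string"
HOMOLOGY = "has_homology_to"


def toy_blast_service(input_node, triples, known_genes):
    """Toy stand-in for a BLAST-like service.

    Returns NEW triples whose subject is the input node, so the relationship
    the analysis 'exposed' is stated explicitly in the output.
    """
    sequence = next(o for s, p, o in triples if s == input_node and p == SEQ_STRING)
    output = []
    for gene, gene_sequence in known_genes.items():
        # Stand-in for a real alignment: identical first 8 characters.
        if gene_sequence[:8] == sequence[:8]:
            output.append((input_node, HOMOLOGY, gene))
    return output


# Hypothetical input graph and reference sequences.
input_triples = [("seq1", SEQ_STRING, "AACTCTTCGTAGTG")]
reference = {"TerminalFlower": "AACTCTTCGTAGTGCC", "OtherGene": "GGGGGGGGTTTT"}
result = toy_blast_service("seq1", input_triples, reference)
```

The point of the design is that the caller never has to guess what the result means: the predicate on each returned triple says it.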
74. I want to share several stories that demonstrate
the cool things that happen when you use
SADI + deep semantics
75. Story #1: SHARE
The Semantic Health
and Research Environment
76. A proof-of-concept workflow orchestrator
+ SADI Semantic Web Service registry
Objective: answer biologists’ questions
77. The SHARE registry
indexes all of the input/output/relationship
triples that can be generated by all known services
This is how SHARE discovers services
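A minimal sketch of predicate-based discovery, assuming a registry that simply indexes each service by the relationship it can generate. The class, the service URLs, and the input/output class names are hypothetical; only the predicate names are taken from the example query in this talk.

```python
from collections import defaultdict


class ToyRegistry:
    """Hypothetical in-memory index of service descriptions, keyed by predicate."""

    def __init__(self):
        self._by_predicate = defaultdict(list)

    def register(self, url, input_class, predicate, output_class):
        self._by_predicate[predicate].append(
            {"url": url, "input": input_class, "output": output_class}
        )

    def find(self, predicate):
        # Return a copy so callers cannot mutate the index.
        return list(self._by_predicate.get(predicate, []))


registry = ToyRegistry()
registry.register("http://example.org/services/alleles",
                  "Locus", "genetics:hasVariant", "Allele")
registry.register("http://example.org/services/images",
                  "Allele", "info:visualizedByImage", "Image")
registry.register("http://example.org/services/descriptions",
                  "Image", "info:hasDescription", "Description")

# For each relationship in a query, look up the services able to generate it.
query_predicates = ["genetics:hasVariant", "info:visualizedByImage", "info:hasDescription"]
plan = [(p, [s["url"] for s in registry.find(p)]) for p in query_predicates]
```

Each entry in plan pairs one relationship from the query with the services that could produce it, which is the information an orchestrator needs before it invokes anything.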
79. What is the phenotype of every allele of the
Antirrhinum majus DEFICIENS gene
SELECT ?allele ?image ?desc
WHERE {
locus:DEF genetics:hasVariant ?allele .
?allele info:visualizedByImage ?image .
?image info:hasDescription ?desc
}
80. What is the phenotype of every allele of the
Antirrhinum majus DEFICIENS gene
SELECT ?allele ?image ?desc
WHERE {
locus:DEF genetics:hasVariant ?allele .
?allele info:visualizedByImage ?image .
?image info:hasDescription ?desc
}
The query language here is SPARQL
The W3C-approved, standard query language for the Semantic Web
81. What is the phenotype of every allele of the
Antirrhinum majus DEFICIENS gene
SELECT ?allele ?image ?desc
WHERE {
locus:DEF genetics:hasVariant ?allele .
?allele info:visualizedByImage ?image .
?image info:hasDescription ?desc
}
Note that there is no “FROM” clause!
We don’t tell it where it should get the information,
The machine has to figure that out by itself...
82. What is the phenotype of every allele of the
Antirrhinum majus DEFICIENS gene
SELECT ?allele ?image ?desc
WHERE {
locus:DEF genetics:hasVariant ?allele .
?allele info:visualizedByImage ?image .
?image info:hasDescription ?desc
}
Starting data: the locus “DEF” (Deficiens)
83. What is the phenotype of every allele of the
Antirrhinum majus DEFICIENS gene
SELECT ?allele ?image ?desc
WHERE {
locus:DEF genetics:hasVariant ?allele .
?allele info:visualizedByImage ?image .
?image info:hasDescription ?desc
}
Query: a series of relationships vis-à-vis DEF
86. ...and in a few seconds you get your answer.
Based on the relationships in your query, SHARE queried its registry
to automatically discover SADI Services capable of generating those triples
87. Because it is the Semantic Web
The query results are live hyperlinks
to the respective Database or images
(The answer is IN the Web!)
88. What pathways does UniProt protein P47989 belong to?
PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
uniprot:P47989 pred:isEncodedBy ?gene .
?gene ont:isParticipantIn ?pathway .
}
90. What pathways does UniProt protein P47989 belong to?
PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
uniprot:P47989 pred:isEncodedBy ?gene .
?gene ont:isParticipantIn ?pathway .
}
Note again that there is no “From” clause…
I have not told SHARE where to look for the
answer, I am simply asking my question
94. Two different providers of gene information (KEGG & NCBI) were found & accessed
Two different providers of pathway information (KEGG and GO) were found & accessed
95. The results are all links to the original data
(The answer is IN the Web!)
97. Show me the latest Blood Urea Nitrogen (BUN) and
Creatinine levels of patients who appear to be
rejecting their transplants
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
?patient rdf:type patient:LikelyRejecter .
?patient l:latestBUN ?bun .
?patient l:latestCreatinine ?creat .
}
98. Likely Rejecter:
A patient who has creatinine levels
that are increasing over time
- Mark D Wilkinson’s definition
99. Likely Rejecter:
…but there is no “likely rejecter”
column or table in our database…
only blood chemistry measurements
at various time-points
100. Likely Rejecter:
So the data required to answer this question
DOESN’T EXIST!
101. My definition of a Likely Rejecter is encoded in
a machine-readable document written in the OWL Ontology language
Basically:
“the regression line over creatinine measurements should have an increasing slope”
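In plain code, that definition amounts to an ordinary least-squares slope test. The sketch below is only an illustration of the rule: the function names and the (day, creatinine) values are made up, and in SHARE the calculation is delegated to a discovered service rather than written locally.

```python
def regression_slope(points):
    """Ordinary least-squares slope of y over x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("need measurements at two or more distinct time-points")
    return sxy / sxx


def is_likely_rejecter(creatinine_series):
    # The class definition: the regression line has an increasing slope.
    return regression_slope(creatinine_series) > 0


# Hypothetical (day, creatinine) measurements for two patients.
rising = [(0, 1.0), (7, 1.2), (14, 1.5)]
falling = [(0, 1.4), (7, 1.2), (14, 1.1)]
```

Here is_likely_rejecter(rising) is true and is_likely_rejecter(falling) is false, even though neither dataset contains a "likely rejecter" column.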
102. Our ontology refers to other ontologies (possibly published by other people)
to learn about what the properties of “regression models” are
e.g. that regression models have slopes and intercepts
and that slopes and intercepts have decimal values
105. SHARE examines the query
Burrows around the Web reading
the various ontologies
then uses the discovered Class definitions as a template
to map a path from what it has, to what it needs, using
SADI services
106. Based on the Class definition
SHARE decides that it needs to do a
Linear Regression analysis
on the blood creatinine measurements
108. The conversation between SHARE and the registry
reveals the use of “Deep Semantics”
Q: Is there a SADI service that will consume instances of Patient and give
me instances of LikelyRejecter?
A: No
Q: Okay... So LikelyRejecters need a regression model of increasing slope
over their BloodCreatinine, so... Is there a SADI service that will consume
BloodCreatinine over time and give me its linear regression model?
A: No
Q: Okay... Blood Creatinine over time is a subclass of data of type
X/Y coordinate, so is there a service that consumes X/Y data and
returns its regression model?
A: Yes, here’s the URL.
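The last two questions in that conversation can be sketched as a walk up the class hierarchy: ask for the specific class first, then for each superclass in turn. The dictionaries, class names, and service URL below are hypothetical placeholders, and the real system relies on OWL reasoning rather than a hand-written lookup.

```python
# Hypothetical subclass-to-superclass map and service index.
SUPERCLASS = {
    "BloodCreatinineOverTime": "XYCoordinateData",
    "XYCoordinateData": None,
}
SERVICES = {
    ("XYCoordinateData", "RegressionModel"): "http://example.org/services/linear-regression",
}


def find_service(input_class, output_class, superclass_of, services):
    """Ask for the most specific input class first, then generalise."""
    asked = []
    current = input_class
    while current is not None:
        asked.append(current)
        url = services.get((current, output_class))
        if url is not None:
            return url, asked
        # "A: No" ... so try the superclass instead.
        current = superclass_of.get(current)
    return None, asked


url, asked = find_service("BloodCreatinineOverTime", "RegressionModel", SUPERCLASS, SERVICES)
```

The asked list records the sequence of questions, mirroring the dialogue: the specific class fails, the more general X/Y class succeeds.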
109. The SHARE system utilizes SADI to discover
analytical services on the Web that do linear regression analysis
and sends the data to be analyzed
110. This happens iteratively
(e.g. SHARE also has to examine the slope of the regression line
using another service, find the “latest” in a series of time measurements, etc.)
There is reasoning after every Service invocation
(i.e. after every clause in the query)
Once it is able to find instances (OWL Individuals)
of the LikelyRejecter class, it continues with the
rest of the query
112. The way SHARE “interprets” data varies
depending on the context of the query
(i.e. which ontologies it reads – Mine? Yours?)
and on what part of the query
it is trying to answer at any given moment
(which ontological concept is relevant to that clause)
114. Example?
The data had the ‘qualities/properties’ that
allowed one machine to interpret
that they were Blood Creatinine measurements
(e.g. to determine which patients were rejecting)
115. Example?
But the data also had the ‘qualities/properties’ that
allowed another machine to interpret them as
Simple X/Y coordinate data
(e.g. the Linear Regression calculation tool)
116. Benefit
of Deep Semantics
Data is amenable to
constant re-interpretation
122. The Reality of Clinical Datasets
(this is a small snapshot of a dataset we worked on,
courtesy of Dr. Bruce McManus & Janet McManus, from the PROOF COE)
ID   HEIGHT  WEIGHT  SBP   CHOL  HDL  BMI GR  SBP GR  CHOL GR  HDL GR
pt1  1.82    177     128   227   55   0       0       1        0
pt2  179     196     13.4  5.9   1.7  1       0       1        0
Height in m and cm; Chol in mmol/l and mg/dl
...and other delicious weirdness
The clinical analyses described here
were supported in part by the
PROOF Center of Excellence
for the Prevention of Organ Failure
123. GOAL: reduce the likelihood of errors by
getting the clinical researcher
“out of the loop”
(as per the Institute of Medicine Recommendations)
124. Experiment:
Reproduce a clinical study
(from >10 years ago)
by logically encoding
the clinical diagnosis guidelines
of the American Heart Association
then ask SHARE to automatically
analyse the patient clinical data
125. Semantically defining globally-accepted clinical phenotypes;
Building on the expertise of others
SystolicBloodPressure =
GALEN:SystolicBloodPressure and
("sio:has measurement value" some ("sio:measurement" and
("sio:has unit" some "om:unit of measure") and
("om:dimension" value "om:pressure or stress dimension") and
"sio:has value" some rdfs:Literal))
GALEN is a popular biomedical ontology
but it is largely, like GO, a series of
named but undefined Classes
126. Semantically defining globally-accepted clinical phenotypes;
Building on the expertise of others
SystolicBloodPressure =
GALEN:SystolicBloodPressure and
("sio:has measurement value" some ("sio:measurement" and
("sio:has unit" some "om:unit of measure") and
("om:dimension" value "om:pressure or stress dimension") and
"sio:has value" some rdfs:Literal))
So we use OWL to extend the GALEN
Classes with rich, logical descriptors
that take advantage of rich semantic
relationships like “has measurement value”
and “dimension” and “has unit”
127. Semantically defining globally-accepted clinical phenotypes;
Building on the expertise of others
SystolicBloodPressure =
GALEN:SystolicBloodPressure and
("sio:has measurement value" some ("sio:measurement" and
("sio:has unit" some "om:unit of measure") and
("om:dimension" value "om:pressure or stress dimension") and
"sio:has value" some rdfs:Literal))
Very general definition
“some kind of pressure unit”
(so that others can build on this as they wish!)
128. Semantically defining globally-accepted clinical phenotypes;
Building on the expertise of others
HighRiskSystolicBloodPressure (as defined by Framingham) =
SystolicBloodPressure and
sio:hasMeasurement some
(sio:Measurement and
("sio:has unit" value om:kilopascal) and
(sio:hasValue some double[>= "18.7"^^double]))
Now we are specific to our clinical study (Framingham definitions):
MUST be in kilopascal and must be >= 18.7
129. Running the Clinical Analysis
“Select the patients who are at-risk”
SELECT ?record ?convertedvalue ?convertedunit
FROM <./patient.rdf>
WHERE {
?record rdf:type measure:HighRiskSystolicBloodPressure .
?record sio:hasMeasurement ?measurement.
?measurement sio:hasValue ?Pressure.
}
All measurements have now been automatically
harmonized to KiloPascal, because we encoded the
semantics in the model
RecordID  Start Val  Start Unit  Pressure  End Unit
Pt1       15         cmHg        19.998    KiloPascal
Pt2       14.6       cmHg        19.465    KiloPascal
Pt1       148        mmHg        19.731    KiloPascal
Pt2       146        mmHg        19.465    KiloPascal
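The arithmetic behind that harmonization is simple once the unit is known. The sketch below assumes the conventional factor of 133.322387415 Pa per mmHg and re-uses the 18.7 kPa threshold from the class definition; the function and table names are illustrative only, and in the talk this conversion is driven by the unit semantics in the model rather than by a hard-coded lookup.

```python
# Conversion factors to kilopascal (1 mmHg = 133.322387415 Pa by convention).
KPA_PER_UNIT = {
    "mmHg": 0.133322387415,
    "cmHg": 1.33322387415,
    "kPa": 1.0,
}
HIGH_RISK_SBP_KPA = 18.7  # threshold from the class definition above


def to_kilopascal(value, unit):
    """Convert a pressure reading to kilopascal; unknown units raise KeyError."""
    return value * KPA_PER_UNIT[unit]


def is_high_risk_sbp(value, unit):
    return to_kilopascal(value, unit) >= HIGH_RISK_SBP_KPA


# The four readings from the result table on this slide.
readings = [("Pt1", 15, "cmHg"), ("Pt2", 14.6, "cmHg"),
            ("Pt1", 148, "mmHg"), ("Pt2", 146, "mmHg")]
converted = [(rid, round(to_kilopascal(v, u), 3)) for rid, v, u in readings]
```

The converted values agree with the table to within rounding in the last decimal place, and all four readings fall above the 18.7 kPa threshold.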
130. While doing this experiment, we noticed
some interesting anomalies…
131. Visual inspection of our output data and the AHA guidelines
showed that in many cases the clinician
“tweaked” the guidelines when doing their analysis
------------------
AHA BMI risk threshold: BMI=25
In our dataset the clinical researcher used BMI=26
------------------
AHA HDL guideline HDL<=1.03mmol/l
The dataset from our researcher: HDL<=0.89mmol/l
-------------------
132. Visual inspection of our output data and the AHA guidelines
showed that in many cases the clinician
“tweaked” the guidelines when doing their analysis
These Alterations Were Not Recorded
in Their Study Notes!
133. Adjusting our Semantic definitions and re-running the analysis
resulted in nearly 100% correspondence with the clinical researcher
HighRiskCholesterolRecord=
PatientRecord and
(sio:hasAttribute some
(cardio:SerumCholesterolConcentration and
sio:hasMeasurement some ( sio:Measurement and
(sio:hasUnit value cardio:mili-mole-per-liter) and
(sio:hasValue some double[>= 5.0]))))
HighRiskCholesterolRecord=
PatientRecord and
(sio:hasAttribute some
(cardio:SerumCholesterolConcentration and
sio:hasMeasurement some ( sio:Measurement and
(sio:hasUnit value cardio:mili-mole-per-liter) and
(sio:hasValue some double[>= 5.2]))))
134. Reflect on this for a second... Because this is important!
1. We semantically encoded clinical guidelines
2. We found that clinical researchers did not follow the official guidelines
3. Their “personalization” of the guidelines was unreported
4. Nevertheless, we were able to create “personalized” Semantic Models
5. These models reflect the opinion of an individual domain-expert
6. These models are shared on the Web
7. Can be automatically re-used by others to interpret their own data using
that clinical expert’s viewpoint
135. PREFIX AHA: <http://americanheart.org/measurements/>
PREFIX McManus: <http://stpaulshospital.org/researchers/mcmanus/>
AHA:HighRiskCholesterolRecord
PatientRecord and
(sio:hasAttribute some
(cardio:SerumCholesterolConcentration and
sio:hasMeasurement some ( sio:Measurement and
(sio:hasUnit value cardio:mili-mole-per-liter) and
(sio:hasValue some double[>= 5.0]))))
McManus:HighRiskCholesterolRecord
PatientRecord and
(sio:hasAttribute some
(cardio:SerumCholesterolConcentration and
sio:hasMeasurement some ( sio:Measurement and
(sio:hasUnit value cardio:mili-mole-per-liter) and
(sio:hasValue some double[>= 5.2]))))
136. To do the analysis using AHA guidelines
SELECT ?patient ?risk
WHERE {
?patient rdf:type AHA:HighRiskCholesterolRecord .
?patient ex:hasCholesterolProfile ?risk
}
137. To do the analysis using McManus’ expert-opinion
SELECT ?patient ?risk
WHERE {
?patient rdf:type McManus:HighRiskCholesterolRecord .
?patient ex:hasCholesterolProfile ?risk
}
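The effect of swapping one class for the other can be sketched as choosing which threshold to apply to the same data. The two cut-offs (5.0 and 5.2 mmol/l) come from the definitions above; the patient values and function name are hypothetical.

```python
# Cholesterol cut-offs (mmol/l) from the two class definitions.
THRESHOLDS_MMOL_PER_L = {"AHA": 5.0, "McManus": 5.2}


def high_risk_cholesterol(records, model):
    """Return the IDs of records classified as high risk under the named model."""
    threshold = THRESHOLDS_MMOL_PER_L[model]
    return sorted(pid for pid, value in records.items() if value >= threshold)


# Hypothetical serum cholesterol values in mmol/l.
records = {"pt1": 5.1, "pt2": 5.9, "pt3": 4.8}
by_aha = high_risk_cholesterol(records, "AHA")
by_mcmanus = high_risk_cholesterol(records, "McManus")
```

The same dataset yields different high-risk groups depending on whose definition is named, which is exactly what changing the prefix in the query does.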
144. Semantic Model of the Experiment
Note that every word in this
diagram is, in reality, a URL
(it’s a Semantic Web model)
i.e. It refers to the expertise of
other researchers, distributed
around the world on the Web
145. Set-up the Experimental Conditions
In a local data-file
provide the protein we are interested in
and the two species we wish to use in our comparison
taxon:9606 a i:OrganismOfInterest . # human
uniprot:Q9UK53 a i:ProteinOfInterest . # ING1
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
147. SELECT ?protein
FROM <file:/local/workflow.input.n3>
WHERE {
?protein a i:ProbableInteractor .
}
Run the Experiment
This is the URL that leads our computer
to the Semantic model of the problem
148. SHARE examines the semantic model of
Probable Interactors
Retrieves third-party expertise from the Web
Discusses with SADI
what analytical tools are necessary
Chooses the right tools for the problem
Solves the problem!
150. SHARE is aware of the context of the specific question being asked
152. There are five very cool things about what you just saw...
153. There are five very cool things about what you just saw...
was able to create a
workflow based on a
semantic model
1.
154. There are five very cool things about what you just saw...
was able to create a
COMPUTATIONAL workflow
based on a BIOLOGICAL model
2.
155. There are five very cool things about what you just saw...
(this is important because we want
this system to be used by clinicians and biologists
who don’t speak computerese!)
2.
156. There are five very cool things about what you just saw...
The workflow it created, and services
selected, differed depending on the
context of the question
3.
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
157. There are five very cool things about what you just saw...
The workflow it created, and services
chosen, differed depending on the
context of the question
3.
The machine was contextually “aware of”
BOTH the biological model
AND the data it was analysing
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly
(...remember this... It will be important later!)
160. There are five very cool things about what you just saw...
The ontological model was abstract (and
shareable!), but the workflow generated
from that model was explicit and concrete
4.
This matters because…
162. Remember
Trend #1
“the most common errors are simple,
the most simple errors are common”
At least partially because the
analytical methodology was inappropriate
and/or not sufficiently described
Here, the methodology leading to a result is explicit
and automatically constructed from an abstract template
so this is (at least in part) a
Solved Problem
164. There are five very cool things about what you just saw...
The choice of tool-selection was
guided by the knowledge of
worldwide domain-experts encoded in
globally-distributed ontologies
(e.g. Expert high-throughput statisticians, etc...)
And this matters because…
5.
166. Remember
Trend #2
Even small, moderately-funded laboratories
can now afford to produce more data
than they can manage or interpret
These labs will likely never be able to afford
a qualified data scientist
But if the expert knowledge of data scientists is
encoded in ontologies, and can be discovered
in a contextually-aware manner… then this is a
SOLVED PROBLEM
167. Story #4: Personalized Health Info
Can we make the Health information
on the Web
more “personal”?
168. Remember when I said...
The machine was contextually “aware of”
BOTH the biological model
AND the data it was analysing
169. This “dual-awareness” provides some
very interesting opportunities
for personalizing a patient’s Health Research activity
170. PROBLEM:
Patients are self-educating
both about their personal medical situation
(e.g. getting themselves sequenced)
also surfing the Web, getting dubious advice
from sites of dubious authority
and joining social-health groups
to exchange (often anecdotal)
medical “advice” with other patients
171. PROBLEM:
Patients are self-educating
The information on any given site
may or may not
be relevant to THAT patient
Information on the Web is, by nature, not personalized
172. PROBLEM:
Clinicians often have patients
(especially chronically-ill patients)
on a “trajectory” of treatment
Medicine is complicated!
e.g. the treatment trajectory of the patient can be
multi-step, and a specific sign/symptom might be
perfectly normal at a particular phase in their
“flow” of treatment
173. PROBLEM SUMMARY
Patients are reading non-personalized medical text
of dubious quality and relevance
Clinicians have no way to intervene
in this self-education process
explaining to patients how the information they read
relates to their personal “health trajectory”
174. Now you might see why this is so relevant!
The machine was contextually “aware of”
BOTH the biological model
AND the data it was analysing
175. This is an early prototype of a
Patient-driven Personalized Medicine
Web interface
176. Basically, it is a set of SHARE queries
Attached to a local database
of patient information
Running behind a Web bookmarklet
178. The queries text-mine a Web page
then compare the concepts in the page
to the patient’s personal data
using a SHARE query
(that could contain ontologies...
...ontologies designed by their clinician!!)
183. Matching based on official
name, compound name,
brand name, trade name,
or “common name”
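One way to picture that matching step is a lookup from every known name of a substance to a single canonical entry, which is then compared with the patient's own record. This is a simplified sketch under stated assumptions: the function names, the single-word tokenizer, and the example page text are hypothetical, and the prototype described here uses text-mining services and SHARE queries rather than a local dictionary.

```python
import re


def build_name_index(drugs):
    """Map every known name (official, brand, common...) to its canonical entry."""
    index = {}
    for canonical, names in drugs.items():
        for name in names:
            index[name.lower()] = canonical
    return index


def match_page(page_text, index, patient_medications):
    """Return names found on the page that refer to the patient's own medications."""
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    hits = {}
    for word in sorted(words):
        canonical = index.get(word)
        if canonical in patient_medications:
            hits.setdefault(canonical, []).append(word)
    return hits


# Illustrative synonym list and patient record.
drugs = {"acetaminophen": ["acetaminophen", "paracetamol", "Tylenol"]}
index = build_name_index(drugs)
page = "A forum post comparing Tylenol and paracetamol with ibuprofen."
hits = match_page(page, index, {"acetaminophen"})
```

Only names that resolve to something in the patient's record are reported, so a mention of an unrelated drug on the same page is ignored.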
191. In future iterations, we will enable the workflow
to be further customized through “personalized”
OWL Classes (e.g. Provided by your Clinician!!)
192. These OWL Classes might include information about the
current trajectory of your treatment for a chronic disease,
for example, such that what you read on the Web is
placed in the context of your expert Clinical care...
193. Frankly, I think it’s quite cool that patients
are creating and running
“personal health-research” workflows
at the touch of a button!
197. The Semantic Model represents
a possible solution to a problem
By my definition, that is a hypothesis
199. The Semantic Model represents
a possible solution to a problem
That hypothesis is tested by automatically converting it into a workflow;
the workflow, and the results of the workflow are intimately tied to the hypothesis
200. The Semantic Model represents
a possible solution to a problem
i.e. You (or anyone!) can determine exactly which aspect
of the hypothesis led to which output data element, why, and how
201. The Semantic Model represents
a possible solution to a problem
“Exquisite Provenance”
a perfect record not only of what was done, when, and how
but also WHY
204. Richly annotated, citable, and queryable snippets of
scientific knowledge encoded in Linked Data/OWL
i.e. a way to publish data and knowledge on the Semantic Web
213. SADI services consume Linked Data on the Web
The ontologies provided to SHARE are
written in OWL, and are therefore
inherently part of the Web
SADI services create novel semantic links
between existing data-points on the Web, or
between existing data and new data
The output of the automatically-generated workflow
is therefore Linked Data
and is therefore inherently part of the Web
The concluding NanoPublications are a combination
of Linked Data and OWL, and are published directly to the Web
214. The Life Science “Singularity”
We
Are
Here!
The Semantic Web is a cradle-to-grave
biomedical research platform
that can, and will, dramatically improve
how biomedical research is done
215. The important people
Luke McCarthy
(SADI/SHARE)
Benjamin Vandervalk
(SHARE)
Dr. Soroush Samadian
(clinical experiments)
Ian Wood
(Experiment-replication experiment)