Presentation to the J. Craig Venter Institute, Dec. 2014

“Shopping for data should be as easy as
shopping for shoes!”
Dr. Carole Goble
Professor, Dept. of Computer Science
University of Manchester

“A little bit of semantics goes a long way”
Dr. James Hendler
Artificial Intelligence Researcher
Rensselaer Polytechnic Institute
One of the originators of the Semantic Web

…but a lot of semantics goes a long, long way!
Mark Wilkinson
Isaac Peral Distinguished Researcher
Director, Fundación BBVA Chair in Biological Informatics
Center for Plant Biotechnology and Genomics
Technical University of Madrid

Making the Web a
biomedical research platform
from hypothesis through to publication

Publication
Discourse
Interpretation
Hypothesis
Experiment

Motivation:
3 intersecting trends in the Life Sciences
that are now, or soon will be,
extremely problematic

TREND #1
NON-REPRODUCIBLE SCIENCE &
THE FAILURE OF PEER REVIEW

Trend #1
Multiple recent surveys of high-throughput biology
reveal that upwards of 50% of published studies
are not reproducible
- Baggerly, 2009
- Ioannidis, 2009

Trend #1
Similar (if not worse!) in clinical studies
- Begley & Ellis, Nature, 2012
- Booth, Forbes, 2012
- Huang & Gottardo, Briefings in Bioinformatics, 2012

Trend #1
“the most common errors are simple,
the most simple errors are common”
At least partially because the
analytical methodology was inappropriate
and/or not sufficiently described
- Baggerly, 2009

Trend #1
These errors pass peer review
The researcher is (sometimes) unaware of the error
The process that led to the error is not recorded
Therefore it cannot be detected during peer-review

Agencies have Noticed!
In March 2012, the US Institute of Medicine said
“Enough is enough!”

Agencies have Noticed!
Institute of Medicine Recommendations
For Conduct of High-Throughput Research:
1. Rigorously-described, -annotated, and -followed data
management and manipulation procedures
2. “Lock down” the computational analysis pipeline once it
has been selected
3. Publish the analytical workflow in a formal manner,
together with the full starting and result datasets
Evolution of Translational Omics: Lessons Learned and the Path Forward. The
Institute of Medicine of the National Academies, Report Brief, March 2012.

TREND #2
BIGGER, CHEAPER DATA

Trend #2
High-throughput technologies are becoming
cheaper and easier to use
But there are still very few experts trained in
statistical analysis of high-throughput data

Trend #2
The number of job postings for data scientist
positions increased by 15,000% between the
summers of 2011 and 2012
-- Indeed.com job trends data reported by
http://blogs.nature.com/naturejobs/2013/03/18/so-you-want-to-be-a-data-scientist

Trend #2
Therefore
Even small, moderately-funded laboratories
can now afford to produce more data
than they can manage or interpret

Trend #2
Therefore
These labs will likely never be able to afford
a qualified data scientist

TREND #3
“THE SINGULARITY”

Trend #3
The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam, et al., The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey (Editor), 2009
Slide adapted with permission from Joanne Luciano, Presentation at Health Web Science Workshop 2012, Evanston IL, USA, June 22, 2012.

“The Singularity”
The X-intercept is where, the moment a discovery is made,
it is immediately put into practice

You
Are
Here
Scientific research would have to be
conducted within a medium that
immediately interpreted
and disseminated the results...

...in a form that immediately (actively!) affected the
results of other researchers...
You
Are
Here

...without requiring them to be aware
of these new discoveries.
You
Are
Here

3 intersecting
and problematic trends
Non-reproducible science that passes peer-review
Cheaper production of larger and more complex datasets
that require specialized expertise to analyze properly
Need to more rapidly disseminate and use new discoveries

I don’t just want to reproduce
your experiment...

I want to re-use your experiment

In my own laboratory... On MY DATA!

When I do my analysis
I want to draw on the knowledge
of global domain-experts like
statisticians and pathologists...
...as if they were mentors sitting
in the chair beside me.

Please don’t make me find
all of the data and knowledge
that I require to do my experiment
...it simply isn’t possible anymore...
Image from: Mark Smiciklas
Intersection Consulting, cc-nca

Image from AJ Cann
cc-by-a license
I want to support peer review(ers)
so that I do better science.

How do we get there from here?

To overcome these intersecting problems
and to achieve the goals of transparent
reproducible research

We must learn how to
do research IN the Web
Not OVER the Web

The Semantic Web
causally related to

This is the critical bit!
The link is explicitly labeled!
causally related to
???

http://semanticscience.org/resource/SIO_000243
SIO_000243:
<owl:ObjectProperty rdf:about="&resource;SIO_000243">
<rdfs:label xml:lang="en">is causally related with</rdfs:label>
<rdf:type rdf:resource="&owl;SymmetricProperty"/>
<rdf:type rdf:resource="&owl;TransitiveProperty"/>
<dc:description xml:lang="en"> A transitive, symmetric, temporal relation
in which one entity is causally related with another non-identical entity.
</dc:description>
<rdfs:subPropertyOf rdf:resource="&resource;SIO_000322"/>
</owl:ObjectProperty>
causally related with

Semantic Web Technologies
“deep semantics”

Ontology Spectrum (from least to most expressive)
Catalog/ID
Terms/glossary
Thesauri (“narrower term” relation)
Informal is-a
Formal is-a
Formal instance
Frames (Properties)
Value Restrs.
Selected Logical Constraints (disjointness, inverse, …)
General Logical constraints
Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness.
Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html

Ontology Spectrum
Most biomedical ontologies, e.g. Gene Ontology

Ontology Spectrum
Most biomedical ontologies, e.g. Gene Ontology
Ontologies being used in today’s talk

Ontology Spectrum
Categorization Systems: like library shelves, inflexible
Discovery & Interpretation systems: flexible!

Remember, this is the critical bit!
It’s relationships that make
the Semantic Web “Semantic”

Even with “deep semantics”
a lot of important information cannot be represented
on the Semantic Web
For example, all of the data that results from
analytical algorithms and statistical analyses

Varying estimates
put the size of the
Deep Web between
500 and 800 times
larger than the
surface Web

On the WWW
“automation” of
access to Deep Web
data happens through
“Web Services”

There are many suggestions for how to bring the Deep Web
into the Semantic Web using Semantic Web Services (SWS)

Describe input data
Describe output data
Describe how the system manipulates the data
Describe how the world changes as a result

Describe input data
None, so far, has proven to be wildly successful
(in my opinion)
…because describing what a Service does is HARD!

Lord, Phillip, et al. The Semantic Web–ISWC 2004 (2004): 350-364.

Scientific Web Services
are DIFFERENT!

“The service interfaces within bioinformatics are relatively
simple. An extensible or constrained interoperability
framework is likely to suffice for current demands: a fully
generic framework is currently not necessary.”

Scientific Web Services are DIFFERENT!
They’re simpler!
So perhaps we can solve the Semantic Web Service problem
as it pertains to this (important!) domain

With respect to the Semantic Web
What is missing from this list?
Describe input data

The Semantic Web gets its semantics from relationships
In 2008 I published a set of design-patterns
for scientific Semantic Web Services
that focuses on the biological relationship that the Service “exposes”

Design Pattern for
Web Services on the Semantic Web

[Figure: a conventional Web Service. Input: AACTCTTCGTAGTG... Service: BLAST]

[Figure: the same analysis as a SADI service. Input: a sequence with has_seq_string AACTCTTCGTAGTG... Service: BLAST.
Output: the input sequence “has homology to” Terminal Flower (type: gene; species: A. thal.; has_seq_string: sequence)]
SADI requires you to explicitly declare,
as part of your analytical output,
the biological relationship that your
algorithm “exposed”.
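
The SADI pattern on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SADI API: triples are plain tuples, and `blast_like_service`, `HOMOLOGY_DB`, and every `ex:` identifier are hypothetical names invented for the example. What it tries to show is that the output stays attached to the input node through an explicitly named relationship.

```python
# Triples are modelled as plain (subject, predicate, object) tuples.
# All identifiers below are hypothetical stand-ins, not real SADI URIs.

# Toy lookup standing in for a real homology search such as BLAST.
HOMOLOGY_DB = {
    "AACTCTTCGTAGTG": {
        "id": "ex:TerminalFlower",
        "type": "ex:Gene",
        "species": "A. thal.",
    },
}


def blast_like_service(input_triples):
    """Return the input triples plus new triples about the same subjects."""
    output = list(input_triples)
    for subject, predicate, obj in input_triples:
        if predicate != "ex:has_seq_string":
            continue
        hit = HOMOLOGY_DB.get(obj)
        if hit is None:
            continue
        # The relationship the algorithm "exposed" is declared explicitly,
        # and it hangs off the same subject that came in as input.
        output.append((subject, "ex:has_homology_to", hit["id"]))
        output.append((hit["id"], "rdf:type", hit["type"]))
        output.append((hit["id"], "ex:species", hit["species"]))
    return output


query = [("ex:mySequence", "ex:has_seq_string", "AACTCTTCGTAGTG")]
result = blast_like_service(query)
for triple in result:
    print(triple)
```

Because the input triple is returned unchanged alongside the new ones, a client could merge the result into its existing data without any extra bookkeeping.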

I want to share several stories that demonstrate
the cool things that happen when you use
SADI + deep semantics

Story #1: SHARE
The Semantic Health
and Research Environment

A proof-of-concept workflow orchestrator
+ SADI Semantic Web Service registry
Objective: answer biologists’ questions

The SHARE registry
indexes all of the input/output/relationship
triples that can be generated by all known services
This is how SHARE discovers services

SHARE demonstrations
with increasing
semantic complexity

What is the phenotype of every allele of the
Antirrhinum majus DEFICIENS gene?
SELECT ?allele ?image ?desc
WHERE {
locus:DEF genetics:hasVariant ?allele .
?allele info:visualizedByImage ?image .
?image info:hasDescription ?desc
}

The query language here is SPARQL,
the W3C-approved, standard query language for the Semantic Web

Note that there is no “FROM” clause!
We don’t tell it where it should get the information;
the machine has to figure that out by itself...

Starting data: the locus “DEF” (Deficiens)

Query: A series of relationships vis-à-vis DEF

...and in a few seconds you get your answer.
Based on the relationships in your query, SHARE queried its registry
to automatically discover SADI Services capable of generating those triples
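
A rough sketch of that discovery step, assuming a registry that is nothing more than an index from predicate to the services able to generate it. The `Registry` class and the `example.org` service URLs are hypothetical; the predicates are the ones used in the DEFICIENS query above.

```python
from collections import defaultdict


class Registry:
    """Toy index: predicate -> services that can generate that predicate."""

    def __init__(self):
        self._by_predicate = defaultdict(list)

    def register(self, service_url, input_class, predicates):
        for predicate in predicates:
            self._by_predicate[predicate].append((service_url, input_class))

    def find(self, predicate):
        # .get() avoids creating empty entries for unknown predicates.
        return list(self._by_predicate.get(predicate, []))


registry = Registry()
# Hypothetical services; the URLs are placeholders.
registry.register("http://example.org/services/variants", "Locus",
                  ["genetics:hasVariant"])
registry.register("http://example.org/services/images", "Allele",
                  ["info:visualizedByImage"])
registry.register("http://example.org/services/descriptions", "Image",
                  ["info:hasDescription"])

# The triple patterns from the WHERE clause of the query.
query_patterns = [
    ("locus:DEF", "genetics:hasVariant", "?allele"),
    ("?allele", "info:visualizedByImage", "?image"),
    ("?image", "info:hasDescription", "?desc"),
]

# For each pattern, look up which services could produce its predicate.
plan = [(predicate, registry.find(predicate))
        for _, predicate, _ in query_patterns]

for predicate, services in plan:
    print(predicate, "->", [url for url, _ in services])
```

The query never names a data source; the predicates alone are enough to select candidate services.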

Because it is the Semantic Web
The query results are live hyperlinks
to the respective Database or images
(The answer is IN the Web!)

What pathways does UniProt protein P47989 belong to?
PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
uniprot:P47989 pred:isEncodedBy ?gene .
?gene ont:isParticipantIn ?pathway .
}
Note again that there is no “FROM” clause…
I have not told SHARE where to look for the
answer, I am simply asking my question

Two different providers of gene information (KEGG & NCBI) were found & accessed
Two different providers of pathway information (KEGG and GO) were found & accessed

The results are all links to the original data
(The answer is IN the Web!)

Show me the latest Blood Urea Nitrogen (BUN) and
Creatinine levels of patients who appear to be
rejecting their transplants
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
?patient rdf:type patient:LikelyRejecter .
?patient l:latestBUN ?bun .
?patient l:latestCreatinine ?creat .
}

Likely Rejecter:
A patient who has creatinine levels
that are increasing over time
- - Mark D Wilkinson’s definition

Likely Rejecter:
…but there is no “likely rejecter”
column or table in our database…
only blood chemistry measurements
at various time-points

Likely Rejecter:
So the data required to answer this question
DOESN’T EXIST!

My definition of a Likely Rejecter is encoded in
a machine-readable document written in the OWL Ontology language
Basically:
“the regression line over creatinine measurements should have an increasing slope”

Our ontology refers to other ontologies (possibly published by other people)
to learn about what the properties of “regression models” are
e.g. that regression models have slopes and intercepts
and that slopes and intercepts have decimal values

SHARE examines the query
Burrows around the Web reading
the various ontologies
then uses the discovered Class definitions as a template
to map a path from what it has, to what it needs, using
SADI services

Based on the Class definition
SHARE decides that it needs to do a
Linear Regression analysis
on the blood creatinine measurements
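
The test implied by that definition, "the regression line over creatinine measurements should have an increasing slope", can be sketched as below. This is an ordinary least-squares slope written with the standard library only; the function names and the patient measurements are hypothetical, and in SHARE the regression is delegated to a discovered service rather than computed locally.

```python
def regression_slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all measurements share the same time-point")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def is_likely_rejecter(creatinine_series):
    """True when creatinine is increasing over time (positive slope)."""
    return regression_slope(creatinine_series) > 0


# Hypothetical (day, creatinine) measurements for two illustrative patients.
patients = {
    "pt_a": [(0, 1.0), (30, 1.2), (60, 1.5)],
    "pt_b": [(0, 1.4), (30, 1.3), (60, 1.1)],
}

likely_rejecters = [pid for pid, series in patients.items()
                    if is_likely_rejecter(series)]
print(likely_rejecters)
```

Nothing in the input data is labelled "rejecter"; the classification only appears once the slope has been computed.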

The conversation between SHARE and the registry
reveals the use of “Deep Semantics”
Q: Is there a SADI service that will consume instances of Patient and give
me instances of LikelyRejecter?
A: No
Q: Okay... So LikelyRejecters need a regression model of increasing slope
over their BloodCreatinine, so... Is there a SADI service that will consume
BloodCreatinine over time and give me its linear regression model?
A: No
Q: Okay... Blood Creatinine over time is a subclass of data of type
X/Y coordinate, so is there a service that consumes X/Y data and
returns its regression model?
A: Yes, here’s the URL.
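
The last step of that conversation, falling back from a specific class to its superclass until a service is found, might look like the sketch below. The class hierarchy, the `SERVICES` table, and the service URL are hypothetical; a real system would derive the hierarchy by reasoning over the ontologies rather than from a hand-written dictionary.

```python
# Hypothetical class hierarchy: child class -> parent class.
SUPERCLASS = {
    "BloodCreatinineOverTime": "XYCoordinateData",
    "XYCoordinateData": None,
}

# Hypothetical registry: (input class, output class) -> service URL.
SERVICES = {
    ("XYCoordinateData", "LinearRegressionModel"):
        "http://example.org/services/regression",
}


def find_service(input_class, output_class):
    """Walk up the class hierarchy until some service accepts the input."""
    current = input_class
    while current is not None:
        url = SERVICES.get((current, output_class))
        if url is not None:
            return current, url
        current = SUPERCLASS.get(current)
    return None


# Q: Patient -> LikelyRejecter?  A: No
print(find_service("Patient", "LikelyRejecter"))
# Q: BloodCreatinineOverTime -> LinearRegressionModel?
# No direct match, so the search retries with the superclass.
match = find_service("BloodCreatinineOverTime", "LinearRegressionModel")
print(match)
```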

The SHARE system utilizes SADI to discover
analytical services on the Web that do linear regression analysis
and sends the data to be analyzed

This happens iteratively
(e.g. SHARE also has to examine the slope of the regression line
using another service, find the “latest” in a series of time measurements, etc.)
There is reasoning after every Service invocation
(i.e. after every clause in the query)
Once it is able to find instances (OWL Individuals)
of the LikelyRejecter class, it continues with the
rest of the query

The way SHARE “interprets” data varies
depending on the context of the query
(i.e. which ontologies it reads – Mine? Yours?)
and on what part of the query
it is trying to answer at any given moment
(which ontological concept is relevant to that clause)

Example?
Blood Creatinine measurements
were not dictated to be
Blood Creatinine measurements

Example?
The data had the ‘qualities/properties’ that
allowed one machine to interpret
that they were Blood Creatinine measurements
(e.g. to determine which patients were rejecting)

Example?
But the data also had the ‘qualities/properties’ that
allowed another machine to interpret them as
Simple X/Y coordinate data
(e.g. the Linear Regression calculation tool)

Benefit
of Deep Semantics
Data is amenable to
constant re-interpretation

http://www.flickr.com/people/faernworks/

Story #2: Measurement Units
One example of the “little ways”
that Semantics will help researchers
day-by-day

Units must be harmonized
Don’t leave this up to the researcher
(it’s fiddly, time-consuming, and error-prone)

The Reality of Clinical Datasets
(this is a small snapshot of a dataset we worked on,
courtesy of Dr. Bruce McManus & Janet McManus, from the PROOF COE)
ID   HEIGHT  WEIGHT  SBP   CHOL  HDL  BMI GR  SBP GR  CHOL GR  HDL GR
pt1  1.82    177     128   227   55   0       0       1        0
pt2  179     196     13.4  5.9   1.7  1       0       1        0
Height in m and cm; Chol in mmol/l and mg/dl
...and other delicious weirdness
The clinical analyses described here
were supported in part by the
PROOF Center of Excellence
for the Prevention of Organ Failure

GOAL: reduce the likelihood of errors by
getting the clinical researcher
“out of the loop”
(as per the Institute of Medicine Recommendations)

Experiment:
Reproduce a clinical study
(from >10 years ago)
by logically encoding
the clinical diagnosis guidelines
of the American Heart Association
then ask SHARE to automatically
analyse the patient clinical data

Semantically defining globally-accepted clinical phenotypes;
Building on the expertise of others
SystolicBloodPressure =
GALEN:SystolicBloodPressure and
("sio:has measurement value" some ("sio:measurement" and
("sio:has unit" some "om:unit of measure") and
("om:dimension" value "om:pressure or stress dimension") and
"sio:has value" some rdfs:Literal))

GALEN is a popular biomedical ontology
but it is largely, like GO, a series of
named but undefined Classes
So we use OWL to extend the GALEN
Classes with rich, logical descriptors
that take advantage of rich semantic
relationships like “has measurement value”
and “dimension” and “has unit”

Very general definition
“some kind of pressure unit”
(so that others can build on this as they wish!)

HighRiskSystolicBloodPressure (as defined by Framingham) =
SystolicBloodPressure and
sio:hasMeasurement some
(sio:Measurement and
(“sio:has unit” value om:kilopascal) and
(sio:hasValue some double[>= "18.7"^^double]))
Now we are specific to our clinical study (Framingham definitions):
the value MUST be in kilopascal and must be >= 18.7

Running the Clinical Analysis
“Select the patients who are at-risk”
SELECT ?record ?convertedvalue ?convertedunit
FROM <./patient.rdf>
WHERE {
?record rdf:type measure:HighRiskSystolicBloodPressure .
?record sio:hasMeasurement ?measurement.
?measurement sio:hasValue ?Pressure.
}
All measurements have now been automatically
harmonized to KiloPascal, because we encoded the
semantics in the model
RecordID  Start Val  Start Unit  Pressure  End Unit
Pt1       15         cmHg        19.998    KiloPascal
Pt2       14.6       cmHg        19.465    KiloPascal
Pt1       148        mmHg        19.731    KiloPascal
Pt2       146        mmHg        19.465    KiloPascal
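
The harmonization itself is simple arithmetic once the unit is known. The sketch below shows that arithmetic under a few assumptions: the conversion table and function names are hypothetical, the factor 1 mmHg ≈ 0.133322 kPa is the standard one, and the 18.7 kPa threshold is the one from the HighRiskSystolicBloodPressure definition. In the system described here the conversion is driven by the unit semantics in the model, not by a hard-coded table.

```python
# Conversion factors to kilopascal (1 mmHg is approximately 0.133322 kPa).
KPA_PER_UNIT = {
    "mmHg": 0.133322,
    "cmHg": 1.33322,
    "kPa": 1.0,
}

# Threshold taken from the HighRiskSystolicBloodPressure definition.
HIGH_RISK_SBP_KPA = 18.7


def to_kilopascal(value, unit):
    """Convert a pressure reading to kilopascal, rejecting unknown units."""
    try:
        return value * KPA_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unknown pressure unit: {unit}") from None


def is_high_risk(value, unit):
    """Apply the threshold only after the value has been harmonized."""
    return to_kilopascal(value, unit) >= HIGH_RISK_SBP_KPA


# The start values and units from the table above.
records = [
    ("Pt1", 15, "cmHg"),
    ("Pt2", 14.6, "cmHg"),
    ("Pt1", 148, "mmHg"),
    ("Pt2", 146, "mmHg"),
]

for record_id, value, unit in records:
    kpa = to_kilopascal(value, unit)
    print(record_id, value, unit, round(kpa, 3), is_high_risk(value, unit))
```

Raising an error on an unrecognized unit, rather than guessing, is a deliberate choice here: a silently mis-converted value is exactly the kind of simple error the talk is concerned with.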

While doing this experiment, we noticed
some interesting anomalies…

Visual inspection of our output data and the AHA guidelines
showed that in many cases the clinician
“tweaked” the guidelines when doing their analysis
------------------
AHA BMI risk threshold: BMI=25
In our dataset the clinical researcher used BMI=26
------------------
AHA HDL guideline HDL<=1.03mmol/l
The dataset from our researcher: HDL<=0.89mmol/l
-------------------

Visual inspection of our output data and the AHA guidelines
showed that in many cases the clinician
“tweaked” the guidelines when doing their analysis
These Alterations Were Not Recorded
in Their Study Notes!

Adjusting our Semantic definitions and re-running the analysis
resulted in nearly 100% correspondence with the clinical researcher
HighRiskCholesterolRecord=
PatientRecord and
(sio:hasAttribute some
(cardio:SerumCholesterolConcentration and
sio:hasMeasurement some ( sio:Measurement and
(sio:hasUnit value cardio:mili-mole-per-liter) and
(sio:hasValue some double[>= 5.0]))))
HighRiskCholesterolRecord=
PatientRecord and

Reflect on this for a second... Because this is important!
1. We semantically encoded clinical guidelines
2. We found that clinical researchers did not follow the official guidelines
3. Their “personalization” of the guidelines was unreported
4. Nevertheless, we were able to create “personalized” Semantic Models
5. These models reflect the opinion of an individual domain-expert
6. These models are shared on the Web
7. Can be automatically re-used by others to interpret their own data using
that clinical expert’s viewpoint

PREFIX AHA: <http://americanheart.org/measurements/>
PREFIX McManus: <http://stpaulshospital.org/researchers/mcmanus/>
AHA:HighRiskCholesterolRecord
PatientRecord and
McManus:HighRiskCholesterolRecord
PatientRecord and

To do the analysis using AHA guidelines
SELECT ?patient ?risk
WHERE {
?patient rdf:type AHA:HighRiskCholesterolRecord .
?patient ex:hasCholesterolProfile ?risk
}

To do the analysis using McManus’ expert-opinion
SELECT ?patient ?risk
WHERE {
?patient rdf:type McManus:HighRiskCholesterolRecord .
?patient ex:hasCholesterolProfile ?risk
}
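
The effect of swapping one namespace for another can be illustrated with the two sets of thresholds quoted earlier (AHA: BMI 25, HDL <= 1.03 mmol/l; the clinical researcher: BMI 26, HDL <= 0.89 mmol/l). The dictionary layout, the function name, and the patient record are hypothetical, and treating the BMI threshold as "greater than or equal to" is an assumption, since the slides give only the threshold value.

```python
# Thresholds quoted earlier in the talk; the layout is hypothetical.
DEFINITIONS = {
    "AHA": {"bmi_min": 25.0, "hdl_max_mmol_l": 1.03},
    "McManus": {"bmi_min": 26.0, "hdl_max_mmol_l": 0.89},
}


def at_risk(patient, viewpoint):
    """Evaluate one patient record under one named set of definitions."""
    rule = DEFINITIONS[viewpoint]
    return {
        # ">=" is an assumption; the slides state only the threshold.
        "bmi": patient["bmi"] >= rule["bmi_min"],
        "hdl": patient["hdl_mmol_l"] <= rule["hdl_max_mmol_l"],
    }


# Hypothetical patient record, chosen to fall between the two definitions.
patient = {"bmi": 25.5, "hdl_mmol_l": 0.95}

aha = at_risk(patient, "AHA")
mcmanus = at_risk(patient, "McManus")
print("AHA:", aha)
print("McManus:", mcmanus)
```

The data does not change between the two calls; only the named definition does, which is what makes the two analyses directly comparable.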

Flexibility Transparency
Reproducibility Shareability Comparability
Simplicity Automation

Personalization
(I’m going to return to this point several times)

Story #3: in silico Science
Reproduce a peer-reviewed
scientific publication
by semantically modelling
the problem

The Publication
Discovering Protein Partners of a
Human Tumor Suppressor Protein

Original Study Simplified
Using what is known about protein interactions
in fly & yeast
predict new interactions with this
Human Tumor Suppressor

Semantic Model of the Experiment
OWL

Semantic Model of the Experiment
Note that every word in this
diagram is, in reality, a URL
(it’s a Semantic Web model)
i.e. It refers to the expertise of
other researchers, distributed
around the world on the Web

Set-up the Experimental Conditions
In a local data-file
provide the protein we are interested in
and the two species we wish to use in our comparison
taxon:9606 a i:OrganismOfInterest . # human
uniprot:Q9UK53 a i:ProteinOfInterest . # ING1
taxon:4932 a i:ModelOrganism1 . # yeast
taxon:7227 a i:ModelOrganism2 . # fly

Run the Experiment
SELECT ?protein
FROM <file:/local/workflow.input.n3>
WHERE {
?protein a i:ProbableInteractor .
}
i:ProbableInteractor is the URL that leads our computer
to the Semantic model of the problem

SHARE examines the semantic model of
Probable Interactors
Retrieves third-party expertise from the Web
Discusses with SADI
what analytical tools are necessary
Chooses the right tools for the problem
Solves the problem!

SHARE derives (and executes) the following analysis automatically

SHARE is aware of the context of the specific question being asked

There are five very cool things about what you just saw...

1. SHARE was able to create a
workflow based on a
semantic model

2. SHARE was able to create a
COMPUTATIONAL workflow
based on a BIOLOGICAL model
(this is important because we want
this system to be used by clinicians and biologists
who don’t speak computerese!)

3. The workflow it created, and services
chosen, differed depending on the
context of the question
The machine was contextually “aware of”
BOTH the biological model
AND the data it was analysing
(...remember this... It will be important later!)

4. The ontological model was abstract (and
shareable!), but the workflow generated
from that model was explicit and concrete
This matters because…

Remember
Trend #1
Here, the methodology leading to a result is explicit
and automatically constructed from an abstract template
so this is (at least in part) a
Solved Problem

5. The selection of tools was
guided by the knowledge of
worldwide domain-experts encoded in
globally-distributed ontologies
(e.g. Expert high-throughput statisticians, etc...)
And this matters because…

Remember
Trend #2
But if the expert knowledge of data scientists is
encoded in ontologies, and can be discovered
in a contextually-aware manner… then this is a
SOLVED PROBLEM

Story #4: Personalized Health Info
Can we make the Health information
on the Web
more “personal”?

Remember when I said...

This “dual-awareness” provides some
very interesting opportunities
for personalizing a patient’s Health Research activity

PROBLEM:
Patients are self-educating
both about their personal medical situation
(e.g. getting themselves sequenced)
also surfing the Web, getting dubious advice
from sites of dubious authority
and joining social-health groups
to exchange (often anecdotal)
medical “advice” with other patients

PROBLEM:
Patients are self-educating
The information on any given site
may or may not
be relevant to THAT patient
Information on the Web is, by nature, not personalized

PROBLEM:
Clinicians often have patients
(especially chronically-ill patients)
on a “trajectory” of treatment
Medicine is complicated!
e.g. the treatment trajectory of the patient can be
multi-step, and a specific sign/symptom might be
perfectly normal at a particular phase in their
“flow” of treatment

PROBLEM SUMMARY
Patients are reading non-personalized medical text
of dubious quality and relevance
Clinicians have no way to intervene
in this self-education process
explaining to patients how the information they read
relates to their personal “health trajectory”

Now you might see why this is so relevant!

This is an early prototype of a
Patient-driven Personalized Medicine
Web interface

Basically, it is a set of SHARE queries
Attached to a local database
of patient information
Running behind a Web bookmarklet

The queries text-mine a Web page
then compare the concepts in the page
to the patient’s personal data
using a SHARE query
(that could contain ontologies...
...ontologies designed by their clinician!!)

Matching based on official
name, compound name,
brand name, trade name,
or “common name” 
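
A very rough sketch of that matching step, assuming a synonym table that maps official, brand, and common names onto one canonical concept. The table, the function names, the page text, and the patient's medication list are all hypothetical; the prototype described here uses text-mining services and SHARE queries rather than a hand-written dictionary.

```python
import re

# Hypothetical synonym table: every known name maps to one canonical concept.
SYNONYMS = {
    "acetaminophen": "acetaminophen",
    "paracetamol": "acetaminophen",
    "tylenol": "acetaminophen",
    "ibuprofen": "ibuprofen",
    "advil": "ibuprofen",
}


def concepts_in_text(text):
    """Return the canonical concepts mentioned anywhere in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {SYNONYMS[word] for word in words if word in SYNONYMS}


def relevant_concepts(page_text, patient_medications):
    """Concepts on the page that also appear in the patient's own data."""
    patient_concepts = {SYNONYMS.get(name.lower(), name.lower())
                        for name in patient_medications}
    return concepts_in_text(page_text) & patient_concepts


# Hypothetical page text and patient record.
page = "This page mentions Tylenol and Advil."
patient_medications = ["Paracetamol"]

hits = relevant_concepts(page, patient_medications)
print(hits)
```

The page says "Tylenol" and the record says "Paracetamol", yet the two still match because both resolve to the same concept.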

Still needs some work...
??!?!?

Link out to PubMed
Why the alert?

The SADI+SHARE workflow and reasoning was
personalized to YOUR medical data

In future iterations, we will enable the workflow
to be further customized through “personalized”
OWL Classes (e.g. Provided by your Clinician!!)

These OWL Classes might include information about the
current trajectory of your treatment for a chronic disease,
for example, such that what you read on the Web is
placed in the context of your expert Clinical care...

Frankly, I think it’s quite cool that patients
are creating and running
“personal health-research” workflows
at the touch of a button!

Almost the end…
Three brief final points....

Publication
Discourse
Interpretation
Hypothesis
Experiment
?
?

The Semantic Model represents
a possible solution to a problem

By my definition, that is a hypothesis

That hypothesis is tested by automatically converting it into a workflow;
the workflow, and the results of the workflow are intimately tied to the hypothesis

i.e. You (or anyone!) can determine exactly which aspect
of the hypothesis led to which output data element, why, and how

“Exquisite Provenance”
a perfect record not only of what was done, when, and how
but also WHY

And this is important because...

“Exquisite Provenance”
is required
for the output data and knowledge
to be published as...

Richly annotated, citable, and queryable snippets of
scientific knowledge encoded in Linked Data/OWL
i.e. a way to publish data and knowledge on the Semantic Web

A “modest” vision for
pure in silico Science

Last point… perhaps this is not yet obvious…

SADI services consume Linked Data on the Web

The ontologies provided to SHARE are
written in OWL, and are therefore
inherently part of the Web

SADI services create novel semantic links
between existing data-points on the Web, or
between existing data and new data

The output of the automatically-generated workflow
is therefore Linked Data
and is therefore inherently part of the Web
The concluding NanoPublications are a combination
of Linked Data and OWL, and are published directly to the Web

The Life Science “Singularity”
We
Are
Here!
The Semantic Web is a cradle-to-grave
biomedical research platform
that can, and will, dramatically improve
how biomedical research is done

The important people
Luke McCarthy
(SADI/SHARE)
Benjamin Vandervalk
(SHARE)
Dr. Soroush Samadian
(clinical experiments)
Ian Wood
(Experiment-replication experiment)
