the power of graphs for analyzing biological datasets
Davy Suvee
Janssen Pharmaceutica
about me
who am i ...
➡ working as an it lead / software architect @ janssen pharmaceutica
• dealing with big scientific data sets
• hands-on expertise in big data and NoSQL technologies
➡ founder of datablend
• provide big data and NoSQL consultancy
• share practical knowledge and big data use cases via blog
@DSUVEE
outline
➡ getting visual insights into big data sets
★ gene expression clustering (mongodb, Neo4j, Gephi)
★ Mutation prevalence (cassandra, Neo4j, Gephi)
➡ fluxgraph, a time machine for your graphs ...
insights in big data
➡ typical approach through warehousing
★ star schema with fact tables and dimension tables
insights in big data
★ real-time visualization
★ filtering
★ metrics
★ layouting
★ modular (1, 2)
1. http://gephi.org/plugins/neo4j-graph-database-support/
2. http://github.com/datablend/gephi-blueprints-plugin
gene expression clustering
➡ oncology data set:
★ 4,800 samples
★ 27,000 genes
➡ Question:
★ for a particular subset of samples,
which genes are co-expressed?
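One common way to make "co-expressed" concrete is to correlate the expression profiles of two genes over the selected samples and to connect the genes when the correlation is high. The sketch below assumes Pearson correlation and a cutoff of 0.9; neither is stated on the slides, and the class name, method names and toy values are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: Pearson correlation and the threshold are assumptions.
class CoExpressionSketch {

    // Pearson correlation of two equally long expression profiles.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    // Returns index pairs {i, j} with i < j for genes whose profiles
    // correlate at or above the threshold.
    static List<int[]> coExpressedPairs(double[][] expression, double threshold) {
        List<int[]> edges = new ArrayList<>();
        for (int i = 0; i < expression.length; i++) {
            for (int j = i + 1; j < expression.length; j++) {
                if (pearson(expression[i], expression[j]) >= threshold) {
                    edges.add(new int[] {i, j});
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Toy values: rows are genes, columns are the selected samples.
        double[][] expression = {
            {1.0, 2.0, 3.0, 4.0},
            {2.0, 4.0, 6.0, 8.0},   // rises together with gene 0
            {4.0, 3.0, 2.0, 1.0}    // moves opposite to gene 0
        };
        for (int[] edge : coExpressedPairs(expression, 0.9)) {
            System.out.println("gene " + edge[0] + " -- gene " + edge[1]);
        }
    }
}
```

Each returned pair could then be stored as an edge between two gene vertices, which is the kind of structure a graph layout tool can cluster visually.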
graphs and time ...
➡ reproducible graph state
➡ towards a time-aware graph ...
➡ FluxGraph: a Blueprints-compatible graph on top of Datomic
➡ make FluxGraph fully time-aware
★ travel your graph through time
★ time-scoped iteration of vertices and edges
★ temporal graph comparison
travel through time
FluxGraph fg = new FluxGraph();
Vertex davy = fg.addVertex();
davy.setProperty("name","Davy");
Vertex peter = ...
Vertex michael = ...
Edge e1 = fg.addEdge(davy, peter,"knows");
(diagram: Davy knows Peter; Michael is not yet connected)
travel through time
Date checkpoint = new Date();
davy.setProperty("name","David");
Edge e2 = fg.addEdge(davy, michael,"knows");
(diagram: David knows Peter and Michael)
travel through time
(diagram: a time axis with two graph states; at checkpoint, Davy knows Peter and Michael is unconnected; at current, the default, David knows Peter and Michael)
fg.setCheckpointTime(checkpoint);
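To make the checkpoint idea concrete, here is a minimal in-memory sketch of a time-aware property: every write is stored under its timestamp, and a read returns the latest value set at or before the requested time. This is not how FluxGraph or Datomic are implemented; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not the FluxGraph implementation.
class TimeAwareProperty {

    // Value history keyed by the time (in milliseconds) at which each value was set.
    private final TreeMap<Long, String> history = new TreeMap<>();

    void set(long time, String value) {
        history.put(time, value);
    }

    // Returns the value visible at the given time, or null if none was set yet.
    String valueAt(long time) {
        Map.Entry<Long, String> entry = history.floorEntry(time);
        return entry == null ? null : entry.getValue();
    }

    // Returns the most recent value, i.e. the current state.
    String current() {
        return history.isEmpty() ? null : history.lastEntry().getValue();
    }

    public static void main(String[] args) {
        TimeAwareProperty name = new TimeAwareProperty();
        name.set(100L, "Davy");
        long checkpoint = 150L;
        name.set(200L, "David");

        System.out.println(name.current());            // David
        System.out.println(name.valueAt(checkpoint));  // Davy
    }
}
```

Keeping the full history in a sorted map makes a read at any past time a single floor lookup, at the cost of never overwriting old values.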
time-scoped iteration
(diagram: versions Davy, Davy’, Davy’’ and Davy’’’ at t1, t2, t3 and tcurrent, with a change between consecutive versions)
➡ how to find the version of the vertex you are interested in?
time-scoped iteration
(diagram: versions Davy, Davy’, Davy’’ and Davy’’’ at t1, t2, t3 and tcurrent, linked by next and previous)
Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();
Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
Interval valid = davy.getTimerInterval();
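The version navigation above can be pictured as a doubly linked chain of snapshots, each valid from its own start time until the next version begins. The sketch below is a simplified model with hypothetical names (`VersionedValue`, `Version`); it does not reproduce FluxGraph's classes.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of versioned elements; all names are hypothetical.
class VersionedValue {

    // One snapshot, valid from validFrom (inclusive) until the next version starts.
    static final class Version {
        final String value;
        final long validFrom;
        Version previous;
        Version next;

        Version(String value, long validFrom) {
            this.value = value;
            this.validFrom = validFrom;
        }

        // End of validity: start of the next version, or Long.MAX_VALUE for the current one.
        long validUntil() {
            return next == null ? Long.MAX_VALUE : next.validFrom;
        }

        // All older versions, newest first.
        List<Version> previousVersions() {
            List<Version> result = new ArrayList<>();
            for (Version v = previous; v != null; v = v.previous) {
                result.add(v);
            }
            return result;
        }
    }

    private Version latest;

    // Appends a new version and links it to its predecessor.
    Version change(String value, long time) {
        Version v = new Version(value, time);
        if (latest != null) {
            latest.next = v;
            v.previous = latest;
        }
        latest = v;
        return v;
    }

    Version current() {
        return latest;
    }

    public static void main(String[] args) {
        VersionedValue davy = new VersionedValue();
        davy.change("Davy", 1);
        davy.change("Davy'", 2);
        davy.change("Davy''", 3);

        Version current = davy.current();
        System.out.println(current.previous.value);             // Davy'
        System.out.println(current.previousVersions().size());  // 2
        System.out.println(current.previous.validUntil());      // 3
    }
}
```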
time-scoped iteration
➡ When does an element change?
➡ vertex:
★ setting or removing a property
★ being added to or removed from an edge
★ being removed
➡ edge:
★ setting or removing a property
★ being removed
➡ ... and each element is time-scoped!
temporal graph comparison
➡ difference (A , B) = union (A , B) - B
➡ ... as an (immutable) graph!
(diagram: the difference of two graph states, itself drawn as a graph containing David and a knows edge)
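Read as set algebra over graph elements, the definition keeps everything in the union of A and B that is not in B. The sketch below uses plain Java sets whose elements are illustrative strings. How FluxGraph decides that two element versions are equal is not covered by the slides and is simplified here to string equality.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of difference(A, B) = union(A, B) - B over sets of element descriptions.
class GraphDifferenceSketch {

    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.addAll(b);      // union(A, B)
        result.removeAll(b);   // ... minus B
        return result;
    }

    public static void main(String[] args) {
        // Current state of the example graph (element descriptions are illustrative).
        Set<String> current = new LinkedHashSet<>(Arrays.asList(
            "vertex name=David",
            "vertex name=Peter",
            "vertex name=Michael",
            "edge davy-knows-peter",
            "edge davy-knows-michael"));

        // State at the checkpoint, before the rename and before the second edge.
        Set<String> checkpoint = new LinkedHashSet<>(Arrays.asList(
            "vertex name=Davy",
            "vertex name=Peter",
            "vertex name=Michael",
            "edge davy-knows-peter"));

        System.out.println(difference(current, checkpoint));
        // prints [vertex name=David, edge davy-knows-michael]
    }
}
```

Only the elements that changed after the checkpoint survive, which is why the result is small enough to inspect as a graph of its own.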
use case: longitudinal patient data
(diagram: the same patient vertex at t1 to t5, connected at different times to smoking, cancer and death nodes)
use case: longitudinal patient data
➡ historical data for 15,000 patients over a period of 10 years (2001-2010)
➡ example analysis:
★ if a male patient is no longer smoking in 2005,
★ what are his chances of having lung cancer in 2010? compare:
   patients that smoked before 2005
   patients that never smoked
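The comparison boils down to two proportions: the share of patients with lung cancer in 2010 among those who smoked before 2005, and the same share among those who never smoked. The sketch below shows only that arithmetic. The `Patient` class is hypothetical, the list is assumed to already contain only male non-smokers in 2005, and the values are made-up toy data, not results from the data set.

```java
import java.util.Arrays;
import java.util.List;

// Arithmetic sketch only; class names and values are hypothetical toy data.
class CohortRiskSketch {

    static final class Patient {
        final boolean smokedBefore2005;
        final boolean cancerIn2010;

        Patient(boolean smokedBefore2005, boolean cancerIn2010) {
            this.smokedBefore2005 = smokedBefore2005;
            this.cancerIn2010 = cancerIn2010;
        }
    }

    // Fraction of patients in the chosen cohort that have cancer in 2010.
    static double risk(List<Patient> patients, boolean smokedBefore2005) {
        int cohort = 0, cases = 0;
        for (Patient p : patients) {
            if (p.smokedBefore2005 == smokedBefore2005) {
                cohort++;
                if (p.cancerIn2010) {
                    cases++;
                }
            }
        }
        return cohort == 0 ? Double.NaN : (double) cases / cohort;
    }

    public static void main(String[] args) {
        List<Patient> nonSmokers2005 = Arrays.asList(
            new Patient(true, true), new Patient(true, true),
            new Patient(true, false), new Patient(true, false),
            new Patient(false, true), new Patient(false, false),
            new Patient(false, false), new Patient(false, false));

        double formerSmokers = risk(nonSmokers2005, true);
        double neverSmokers = risk(nonSmokers2005, false);
        System.out.println(formerSmokers);                 // 0.5
        System.out.println(neverSmokers);                  // 0.25
        System.out.println(formerSmokers / neverSmokers);  // 2.0
    }
}
```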
use case: longitudinal patient data
➡ get all male non-smokers in 2005
fg.setCheckpointTime(new DateTime(2005,12,31).toDate());
Iterator<Vertex> males =
    fg.getVertices("gender", "male").iterator();
while (males.hasNext()) {
    Vertex p2005 = males.next();
    // an outgoing smokingStatus edge at the checkpoint marks the patient as smoking
    boolean smoking2005 =
        p2005.getEdges(OUT,"smokingStatus").iterator().hasNext();
}
use case: longitudinal patient data
➡ which patients were smoking before 2005?
boolean smokingBefore2005 =
    ((FluxVertex)p2005).getPreviousVersions(new TimeAwareFilter() {
        public TimeAwareElement filter(TimeAwareVertex element) {
            return element.getEdges(OUT, "smokingStatus").iterator().hasNext()
                ? element : null;
        }
    }).iterator().hasNext();
use case: longitudinal patient data
➡ which patients have cancer in 2010?
Graph g =
    fg.difference(smokerws,          // working set of smokers
                  time2010.toDate(),
                  time2005.toDate());
➡ extract the patients that have an edge to the cancer node