Thursday
23.5
graph databases
about me
who am I...
Davy Suvee
@DSUVEE
➡ big data architect @ datablend-continuum
• provide big data and nosql consultancy
• share practical knowledge and big data use cases via blog
Big Data
2-3 years ago...
Nowadays...
What is big data...
...large and complex data sets that are difficult to process with traditional database management tools...
➡ store (nosql)
➡ enrich (data mining, ml, nlp, ...)
➡ visualize (d3, gephi, mapbox, tableau, ...)
➡ process/analyze (map/reduce, cep, storm, ...)
Volume, Variety, Velocity
➡ Volume: data exceeds the limits of vertically scalable tools, requiring novel storage solutions
➡ Variety: data takes different formats that make integration complex and expensive
➡ Velocity: data analysis time windows are small compared to the speed of data acquisition
The world has changed...
Tackling the volume problem...
What we are currently doing...
➡ Throwing our data away :-(
➡ Storing preprocessed data :-/
➡ Trying to store it anyway ;-(
But why?
Tackling the volume problem...
Vertical scaling (your database): cost grows from € to €², €³, €⁴
Horizontal scaling (NoSQL): cost grows as € x #nodes
Tackling the variety problem...
Massive, unstructured data:
➡ Video
➡ Audio
➡ Social streams
➡ Log files
➡ Text
Your database: one, schema-structured model
NoSQL: best-fit, schema-less model
➡ Key-Value Databases
➡ Document-Based Databases
➡ Graph Databases
➡ Wide-column Databases
AS IS...
Tackling the velocity problem...
We want to ...
➡ Collect
➡ Process
➡ Query
➡ Analyze
MASSIVE amounts of unstructured data in real-time
Tackling the velocity problem...
Your stack: slow and outdated information (APP, SYNC, ETL, BI)
NoSQL & Big Data: fast and real-time (APP, SYNC, Map-Reduce, BI (+ANALYTICS))
graphs are everywhere...
a little bit of graph theory...
node/vertex: Davy (age = 33)
node/vertex: Kim (age = 26, gender = F)
node/vertex: Datablend (btw = 123...)
node/vertex: Janssen (sector = pharma)
edge: founded (in: 2011)
edge: worked_for (from: 2008, to: 2013)
edge: knows (since: 2013)
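The vertex/edge/property vocabulary above can be sketched in plain Java. This is only an illustration: `PropertyGraphSketch`, `Vertex`, `Edge`, `connect` and `neighbours` are hypothetical names, not the API of any graph database. Each vertex keeps direct references to its outgoing edges, so following a relationship needs no lookup in a separate table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory property graph; not the API of any real product.
final class PropertyGraphSketch {

    static final class Vertex {
        final Map<String, Object> properties = new LinkedHashMap<>();
        // Direct references to outgoing edges: neighbours are reached without a join.
        final List<Edge> outgoing = new ArrayList<>();

        Vertex set(String key, Object value) {
            properties.put(key, value);
            return this;
        }

        Edge connect(String label, Vertex target) {
            Edge edge = new Edge(label, this, target);
            outgoing.add(edge);
            return edge;
        }
    }

    static final class Edge {
        final String label;
        final Vertex source;
        final Vertex target;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Edge(String label, Vertex source, Vertex target) {
            this.label = label;
            this.source = source;
            this.target = target;
        }

        Edge set(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    // Follows the outgoing edges of 'start' that carry the given label.
    static List<Vertex> neighbours(Vertex start, String label) {
        List<Vertex> result = new ArrayList<>();
        for (Edge edge : start.outgoing) {
            if (edge.label.equals(label)) {
                result.add(edge.target);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Vertex davy = new Vertex().set("name", "Davy").set("age", 33);
        Vertex kim = new Vertex().set("name", "Kim").set("age", 26).set("gender", "F");
        Vertex datablend = new Vertex().set("name", "Datablend");
        davy.connect("knows", kim).set("since", 2013);
        davy.connect("founded", datablend).set("in", 2011);
        for (Vertex v : neighbours(davy, "knows")) {
            System.out.println(v.properties.get("name")); // prints Kim
        }
    }
}
```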
Advantages...?
Graph Database
➡ whiteboard friendly
➡ schema-less
➡ index-free adjacency (no joins!)
➡ queries as traversals
➡ queries as pattern matching
Products/projects...?
Graph Database
➡ databases: neo4j, orientdb, allegrograph, dex, ...
➡ processing: pregel, giraph, hama, goldenorb, ...
➡ APIs: blueprints
➡ query languages: gremlin, cypher, sparql
Graph database 101 (neo4j)
GraphDatabaseService graph = ...
Node davy = graph.createNode();
davy.setProperty("name", "Davy");
Node kim = graph.createNode();
kim.setProperty("name", "Kim");
Graph database 101 (neo4j)
enum RelTypes implements RelationshipType
{
    KNOWS, WORKED_FOR, FOUNDED
}
Relationship davy_kim =
    davy.createRelationshipTo(kim, RelTypes.KNOWS);
davy_kim.setProperty("since", 2013);
Graph database 101 (neo4j)
Relationship davy_datablend =
    davy.createRelationshipTo(datablend, RelTypes.FOUNDED);
davy_datablend.setProperty("in", 2011);
➡ how to access the datablend node?
Graph database 101 (neo4j)
Index<Node> nodeIndex =
    graph.index().forNodes("nodes");
Node datablend = graph.createNode();
datablend.setProperty("name", "Datablend");
nodeIndex.add(datablend, "name", "Datablend");
Node found = nodeIndex.get("name", "Datablend").getSingle();
Graph database 101 (neo4j)
➡ find friends of my friends...
TraversalDescription td =
    Traversal.description()
        .breadthFirst()
        .relationships(RelTypes.KNOWS, Direction.OUTGOING)
        .evaluator(Evaluators.toDepth(2));
Traverser traverser = td.traverse(davy);
for (Path path : traverser) { ... }
Graph database 101 (neo4j)
➡ find friends of my friends...
START davy=node:node_auto_index(name = "Davy")
MATCH davy-[:KNOWS]->()-[:KNOWS]->fof
RETURN davy, fof
ExecutionEngine engine = new ExecutionEngine(graph);
ExecutionResult result = engine.execute(query);
for (Map<String,Object> row : result) { ... }
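For readers without a Neo4j instance at hand, the two-hop pattern can be sketched in plain Java over an adjacency map. This is an assumption-laden illustration, not Neo4j code: `FriendsOfFriendsSketch` and `friendsOfFriends` are hypothetical names, and the "Kim knows Peter" relationship is invented sample data. Like the pattern above, it follows two outgoing knows hops and does not filter out direct friends.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical plain-Java stand-in for the two-hop KNOWS pattern.
final class FriendsOfFriendsSketch {

    // Returns every person reachable over exactly two outgoing "knows" hops.
    // Duplicates collapse because the result is a set.
    static Set<String> friendsOfFriends(Map<String, List<String>> knows, String start) {
        Set<String> result = new LinkedHashSet<>();
        for (String friend : knows.getOrDefault(start, List.of())) {
            for (String fof : knows.getOrDefault(friend, List.of())) {
                result.add(fof);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data: Davy knows Kim (from the slides), Kim knows Peter (assumed).
        Map<String, List<String>> knows = Map.of(
                "Davy", List.of("Kim"),
                "Kim", List.of("Peter"));
        System.out.println(friendsOfFriends(knows, "Davy")); // prints [Peter]
    }
}
```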
Use cases...?
Graph Database
➡ recommendations
➡ access control
➡ routing
➡ social computing/networks
➡ genealogy
insights in big data
➡ typical approach through warehousing
★ star schema with fact tables and dimension tables
insights in big data
★ real-time visualization
★ filtering
★ metrics
★ layouting
★ modular [1, 2]
1. http://gephi.org/plugins/neo4j-graph-database-support/
2. http://github.com/datablend/gephi-blueprints-plugin
gene expression clustering
➡ oncology data set:
★ 4,800 samples
★ 27,000 genes
➡ Question:
★ for a particular subset of samples, which genes are co-expressed?
mongodb for storing gene expressions
{ "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3"} ,
  "sample_name" : "122551hp133a21.cel" ,
  "genomics_id" : 122551 ,
  "sample_id" : 343981 ,
  "donor_id" : 143981 ,
  "sample_type" : "Tissue" ,
  "sample_site" : "Ascending colon" ,
  "pathology_category" : "MALIGNANT" ,
  "pathology_morphology" : "Adenocarcinoma" ,
  "pathology_type" : "Primary malignant neoplasm of colon" ,
  "primary_site" : "Colon" ,
  "expressions" : [ { "gene" : "X1_at" , "expression" : 5.54217719084415} ,
                    { "gene" : "X10_at" , "expression" : 3.92335121981739} ,
                    { "gene" : "X100_at" , "expression" : 7.81638155662255} ,
                    { "gene" : "X1000_at" , "expression" : 5.44318512260619} ,
                     … ]
}
pearson correlation through map-reduce
pearson correlation
x   y
43  99
21  65
25  79
42  75
57  87
59  81
r = 0.52
co-expression graph
➡ create a node for each gene
➡ if correlation between two genes >= 0.8, draw an edge between both nodes
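The two steps (Pearson correlation, then an edge for every gene pair at or above 0.8) can be sketched as a single-machine Java loop. The slides compute the correlation through map-reduce; this sketch swaps that for a plain nested loop, and `CoExpressionSketch`, `pearson`, `coExpressionEdges` and the `geneA`/`geneB`/`geneC` values are hypothetical. The x/y series in `main` is the one from the correlation slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-machine sketch; the slides use map-reduce instead.
final class CoExpressionSketch {

    // Pearson correlation coefficient of two equally long series.
    // Returns NaN when one of the series is constant.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("series must have equal length >= 2");
        }
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
            sumYY += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        double denominator =
                Math.sqrt((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));
        return numerator / denominator;
    }

    // One "geneA -- geneB" entry per gene pair whose correlation reaches the threshold.
    static List<String> coExpressionEdges(Map<String, double[]> expressions, double threshold) {
        List<String> genes = new ArrayList<>(expressions.keySet());
        List<String> edges = new ArrayList<>();
        for (int i = 0; i < genes.size(); i++) {
            for (int j = i + 1; j < genes.size(); j++) {
                double r = pearson(expressions.get(genes.get(i)), expressions.get(genes.get(j)));
                if (r >= threshold) { // a NaN correlation never produces an edge
                    edges.add(genes.get(i) + " -- " + genes.get(j));
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        double[] x = {43, 21, 25, 42, 57, 59};
        double[] y = {99, 65, 79, 75, 87, 81};
        System.out.println(pearson(x, y)); // roughly 0.53

        Map<String, double[]> expressions = new LinkedHashMap<>();
        expressions.put("geneA", new double[] {1, 2, 3, 4});
        expressions.put("geneB", new double[] {2, 4, 6, 8.5});
        expressions.put("geneC", new double[] {4, 1, 3, 2});
        System.out.println(coExpressionEdges(expressions, 0.8)); // prints [geneA -- geneB]
    }
}
```

The nested loop is quadratic in the number of genes, which is presumably why a data set of this size is processed through map-reduce rather than on one machine.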
mutation prevalence
analyzing running data
<trkpt lon="4.723870977759361" lat="51.075748661533">
    <ele>29.799999237060547</ele>
    <time>2011-11-08T19:18:39.000Z</time>
</trkpt>
<trkpt lon="4.724105251953006" lat="51.075623352080584">
    <ele>29.799999237060547</ele>
    <time>2011-11-08T19:18:45.000Z</time>
</trkpt>
<trkpt lon="4.724143054336309" lat="51.07560558244586">
    <ele>29.799999237060547</ele>
    <time>2011-11-08T19:18:46.000Z</time>
</trkpt>
analyzing running data through neo4j
➡ using neo4j spatial extension
➡ create a node for each tracked point
List<GeoPipeFlow> closests =
    GeoPipeline.startNearestNeighborLatLonSearch(
        runningLayer, to, 0.02).
    sort("OrthodromicDistance").
    getMin("OrthodromicDistance").toList();
➡ connect succeeding tracking nodes in a graph
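To give a feel for what an orthodromic (great-circle) distance between two succeeding track points looks like, here is a minimal Java sketch using the haversine formula. It is not the Neo4j Spatial implementation; `TrackDistanceSketch`, `haversineMeters` and the mean Earth radius constant are assumptions made for the illustration. The coordinates and timestamps in `main` are the first two track points of the GPX fragment.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper; not part of Neo4j Spatial.
final class TrackDistanceSketch {

    // Mean Earth radius in meters; a common approximation.
    static final double EARTH_RADIUS_METERS = 6_371_000.0;

    // Great-circle distance between two lat/lon points (degrees), via the haversine formula.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2)
                * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        double meters = haversineMeters(
                51.075748661533, 4.723870977759361,
                51.075623352080584, 4.724105251953006);
        long seconds = Duration.between(
                Instant.parse("2011-11-08T19:18:39.000Z"),
                Instant.parse("2011-11-08T19:18:45.000Z")).getSeconds();
        // Distance and elapsed time together give the pace on this edge of the track.
        System.out.println(meters + " m in " + seconds + " s");
    }
}
```

Stored as properties on the edge between two succeeding track nodes, distance and elapsed time would let a traversal sum up a run or compare segments.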
analyzing running data
analyzing google analytics data
➡ source url -> target url
graphs and time...
➡ fluxgraph: a blueprints-compatible graph on top of Datomic
➡ make FluxGraph fully time-aware
★ travel your graph through time
★ time-scoped iteration of vertices and edges
★ temporal graph comparison
➡ towards a time-aware graph...
➡ reproducible graph state
travel through time
FluxGraph fg = new FluxGraph();
Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");
Vertex kim = ...
Vertex peter = ...
Edge e1 =
    fg.addEdge(davy, kim, "knows");
travel through time
Date checkpoint = new Date();
davy.setProperty("name", "David");
Edge e2 =
    fg.addEdge(davy, peter, "knows");
travel through time
checkpoint: Davy knows Kim; Peter is not connected
current time (by default): David knows Kim, David knows Peter
travel through time
fg.setCheckpointTime(checkpoint);
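The idea behind reading the graph as of a checkpoint can be sketched with a single versioned property. This is not how FluxGraph or Datomic are implemented; `VersionedPropertySketch`, `set`, `valueAt` and `current` are hypothetical names, and the dates are made up. Every write is kept with the instant at which it became valid, and a read at a checkpoint returns the latest write at or before that instant.

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a time-aware property; not FluxGraph's implementation.
final class VersionedPropertySketch {

    // Every write is kept, keyed by the instant at which it became valid.
    private final TreeMap<Instant, String> history = new TreeMap<>();

    void set(Instant at, String value) {
        history.put(at, value);
    }

    // Value as of the given instant: the latest write at or before it, or null if none.
    String valueAt(Instant checkpoint) {
        Map.Entry<Instant, String> entry = history.floorEntry(checkpoint);
        return entry == null ? null : entry.getValue();
    }

    // The most recent value, which is what a reader sees by default.
    String current() {
        return history.isEmpty() ? null : history.lastEntry().getValue();
    }

    public static void main(String[] args) {
        VersionedPropertySketch name = new VersionedPropertySketch();
        Instant created = Instant.parse("2013-01-01T00:00:00Z");
        Instant checkpoint = Instant.parse("2013-02-01T00:00:00Z");
        Instant renamed = Instant.parse("2013-03-01T00:00:00Z");
        name.set(created, "Davy");
        name.set(renamed, "David");
        System.out.println(name.current());           // prints David
        System.out.println(name.valueAt(checkpoint)); // prints Davy
    }
}
```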
time-scoped iteration
t1: Davy, t2: Davy', t3: Davy'', tcurrent: Davy''' (each change creates a new version)
➡ how to find the version of the vertex you are interested in?
versions are linked: next, next, next / previous, previous, previous
Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();
Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
Interval valid = davy.getTimerInterval();
temporal graph comparison
current: David knows Kim, David knows Peter
checkpoint: Davy knows Kim; Peter is not connected
what changed?
temporal graph comparison
➡ difference(A, B) = union(A, B) - B
➡ ... as an (immutable) graph!
difference(current, checkpoint) = David, knows
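The formula difference(A, B) = union(A, B) - B can be tried out on plain sets. In this sketch each graph is flattened to a set of strings describing vertex states and edges, which is an assumption made for illustration and not how FluxGraph represents graphs; `GraphDifferenceSketch` and `difference` are hypothetical names. The result is wrapped as an unmodifiable set to mirror the immutable result graph.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical set-based sketch of temporal graph comparison.
final class GraphDifferenceSketch {

    // difference(A, B) = union(A, B) - B, returned as an unmodifiable set.
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> union = new LinkedHashSet<>(a);
        union.addAll(b);
        union.removeAll(b);
        return Collections.unmodifiableSet(union);
    }

    public static void main(String[] args) {
        // Vertex ids 1, 2, 3 stand for Davy/David, Kim and Peter.
        Set<String> current = new LinkedHashSet<>(List.of(
                "vertex 1 name=David", "vertex 2 name=Kim", "vertex 3 name=Peter",
                "edge 1-knows->2", "edge 1-knows->3"));
        Set<String> checkpoint = new LinkedHashSet<>(List.of(
                "vertex 1 name=Davy", "vertex 2 name=Kim", "vertex 3 name=Peter",
                "edge 1-knows->2"));
        // prints [vertex 1 name=David, edge 1-knows->3]
        System.out.println(difference(current, checkpoint));
    }
}
```

Only the renamed vertex and the new knows edge survive, which matches the result shown on the slide.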
use case: longitudinal patient data
patient timeline t1 ... t5: patient, smoking, cancer, death
use case: longitudinal patient data
➡ historical data for 15,000 patients over a period of 10 years (2001-2010)
➡ example analysis:
★ if a male patient is no longer smoking in 2005
★ what are the chances of getting lung cancer in 2010, comparing
  patients that smoked before 2005
  patients that never smoked
FluxGraph
➡ available on github
http://github.com/datablend/fluxgraph
Open Innovation Networking Tool
➡ Many different projects, many different partners, many different domains...
★ how do we keep track?
★ how can we learn from the data?
➡ Store the data in its most natural form, a graph
➡ use graph algorithms to identify the importance of each node and their related ones
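The slides do not say which graph algorithms the tool uses. As one simple stand-in for "importance", the sketch below counts how many edges touch each node (degree centrality). `DegreeCentralitySketch`, `degrees` and the project/partner names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: degree centrality as a simple importance measure.
final class DegreeCentralitySketch {

    // Counts, for every node, how many edges touch it (undirected degree).
    // Each edge is a two-element array {from, to}.
    static Map<String, Integer> degrees(List<String[]> edges) {
        Map<String, Integer> degree = new TreeMap<>();
        for (String[] edge : edges) {
            degree.merge(edge[0], 1, Integer::sum);
            degree.merge(edge[1], 1, Integer::sum);
        }
        return degree;
    }

    public static void main(String[] args) {
        List<String[]> edges = List.of(
                new String[] {"projectA", "partnerX"},
                new String[] {"projectA", "partnerY"},
                new String[] {"projectB", "partnerX"});
        // prints {partnerX=2, partnerY=1, projectA=2, projectB=1}
        System.out.println(degrees(edges));
    }
}
```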
More graphs...
➡ pharma
➡ geospatial
➡ dependency analysis
➡ ontology
➡ ...
Questions?
E-MAIL
info@datablend.be
Follow us
twitter.com/data_blend
www.datablend.be info@datablend.be 0499/05.00.89
datablend-continuum

Introduction to Graph Databases @ SAI