www.Objectivity.com
Nick Quinn
Lead Developer - InfiniteGraph
What are we talking about today?
Not that Bacon. This Bacon!
• Intro to the Six Degrees Problem
• What is a Graph Database?
• Why Bacon in a Graph Database?
• How we solved the problem
Images Courtesy of IMDB (www.imdb.com)
Six Degrees of Bacon
“…any individual involved in the Hollywood, California film industry
can be linked through his or her film roles to actor Kevin Bacon
within six steps”
[http://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon]
Gina Menza
A Tale of Two Kevins
Why Six Degrees of Bacon?
Actor               Age   # of Projects
Kevin Bacon         54    76
Harrison Ford       70    70
Tom Cruise          50    40
Julia Roberts       45    50
Tom Hanks           56    73
Denzel Washington   58    53
Michael Caine       80    157
Kiefer Sutherland   46    82
Kevin Bacon
Bacon Numbers in Google
In the summer of 2012, Google started to allow users to find the
Bacon number of any actor simply by typing the actor's name
followed by “bacon number”.
Graphical Representation (www.google.com):
Morgan Freeman -[appeared in]-> The Dark Knight Rises <-[appeared in]- Gary Oldman -[appeared in]-> Criminal Law <-[appeared in]- Kevin Bacon
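The chain above can be checked with a small breadth-first search. This is a minimal sketch in plain Java, not Google's or InfiniteGraph's implementation: the class `BaconNumberSketch` and its methods are hypothetical names, and the only data loaded is the four "appeared in" links from the slide. Actors and movies are both nodes, so the Bacon number is the number of hops divided by two.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

final class BaconNumberSketch {
    // Undirected "appeared in" graph; actors and movies share one node space.
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    void appearedIn(String actor, String movie) {
        adjacency.computeIfAbsent(actor, k -> new HashSet<>()).add(movie);
        adjacency.computeIfAbsent(movie, k -> new HashSet<>()).add(actor);
    }

    // Shortest-path hops divided by two, or -1 if either name is unknown or unreachable.
    int baconNumber(String actor, String target) {
        if (!adjacency.containsKey(actor) || !adjacency.containsKey(target)) {
            return -1;
        }
        Map<String, Integer> hops = new HashMap<>();
        Queue<String> frontier = new ArrayDeque<>();
        hops.put(actor, 0);
        frontier.add(actor);
        while (!frontier.isEmpty()) {
            String current = frontier.remove();
            if (current.equals(target)) {
                return hops.get(current) / 2;
            }
            for (String next : adjacency.get(current)) {
                if (!hops.containsKey(next)) {
                    hops.put(next, hops.get(current) + 1);
                    frontier.add(next);
                }
            }
        }
        return -1;
    }

    // The four links shown on the slide.
    static BaconNumberSketch slideExample() {
        BaconNumberSketch graph = new BaconNumberSketch();
        graph.appearedIn("Morgan Freeman", "The Dark Knight Rises");
        graph.appearedIn("Gary Oldman", "The Dark Knight Rises");
        graph.appearedIn("Gary Oldman", "Criminal Law");
        graph.appearedIn("Kevin Bacon", "Criminal Law");
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(slideExample().baconNumber("Morgan Freeman", "Kevin Bacon"));
    }
}
```

On the slide's data this gives Morgan Freeman a Bacon number of 2 and Gary Oldman a Bacon number of 1.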
What is a Graph Database?
The Physical Data Model
• Difference between relational & graph databases
Rows/Columns/Tables:

Meetings
P1      P2    Place    Time
Alice   Bob   Denver   5-27-10

Calls
From   To        Time    Duration
Bob    Carlos    13:20   25
Bob    Charlie   17:10   15

Payments
From     To        Date      Amount
Carlos   Charlie   5-12-10   100000

Relationship/Graph Optimized:

Alice -[Met, 5-27-10]- Bob
Bob -[Called, 13:20]-> Carlos
Bob -[Called, 17:10]-> Charlie
Carlos -[Paid, 100000]-> Charlie
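To make the difference concrete, here is a minimal plain-Java sketch that stores the slide's four relationships both ways. The class `PhysicalModelSketch` and its method names are assumptions made for illustration; neither layout is how a real RDBMS or InfiniteGraph stores data physically. In the table style every lookup scans all rows; in the graph style each vertex already holds its own outgoing edges.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PhysicalModelSketch {
    /** One relationship: a row in the table model, an edge in the graph model. */
    static final class Link {
        final String kind;
        final String from;
        final String to;

        Link(String kind, String from, String to) {
            this.kind = kind;
            this.from = from;
            this.to = to;
        }
    }

    // The relationships from the slide, flattened into one list of rows.
    static List<Link> slideRows() {
        List<Link> rows = new ArrayList<>();
        rows.add(new Link("Met", "Alice", "Bob"));
        rows.add(new Link("Called", "Bob", "Carlos"));
        rows.add(new Link("Called", "Bob", "Charlie"));
        rows.add(new Link("Paid", "Carlos", "Charlie"));
        return rows;
    }

    // Table style: each query walks every row (a multi-hop query repeats this per hop).
    static List<String> targetsByScan(List<Link> rows, String person) {
        List<String> targets = new ArrayList<>();
        for (Link row : rows) {
            if (row.from.equals(person)) {
                targets.add(row.to);
            }
        }
        return targets;
    }

    // Graph style: group edges under their source vertex once, up front.
    static Map<String, List<Link>> toAdjacency(List<Link> rows) {
        Map<String, List<Link>> adjacency = new LinkedHashMap<>();
        for (Link row : rows) {
            adjacency.computeIfAbsent(row.from, k -> new ArrayList<>()).add(row);
        }
        return adjacency;
    }

    // Graph style lookup: only the edges attached to this vertex are touched.
    static List<String> targetsByEdges(Map<String, List<Link>> adjacency, String person) {
        List<String> targets = new ArrayList<>();
        for (Link edge : adjacency.getOrDefault(person, List.of())) {
            targets.add(edge.to);
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Link> rows = slideRows();
        System.out.println(targetsByScan(rows, "Bob"));
        System.out.println(targetsByEdges(toAdjacency(rows), "Bob"));
    }
}
```

Both lookups return Carlos and Charlie for Bob; the difference is how much data each one has to touch as the data set grows.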
Connecting Data
Person -[?]- Building
Possible relationships: Work, Live, RR, Visit, Eat, Shop
Who is Gina Menza?
• How do we get meaning from highly
connected data?
Gina Menza
Jury Forewoman Miss Jeffries
Strength of Connections Matters!
• Why 6 degrees of separation and not 3.74?
• We need analysis tools in order to
– identify and filter out “unimportant” data and
– infer what needs to be filtered as we investigate it.
“When considering another
person in the world, a friend
of your friend knows a
friend of their friend”
- Facebook
Why Bacon in a Graph Database?
Graph Analysis
• Why use Graph Databases for graph analysis?
– Dynamic on Live Data
– Feedback/Inference
– Optimized for concurrent user access
– Handles big data problems
– Native Graph Traversal API
– Manages memory efficiently
Paths to Bacon
Using the IMDB (www.imdb.com) data set, we can study how many paths
can be found by degrees of separation from Kevin Bacon. Out of 5,067,124
nodes and 11,505,797 edges, we get the following:

Bacon Number (Degree of Separation / 2)   # of People
1                                         2823
2                                         323677
3                                         1088560
4                                         272905
5                                         22533
6                                         2300

[Bar chart: # of People (axis 0 to 1,200,000) by Bacon Number (1 to 6)]
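A count like the one in the table can be produced with a single breadth-first pass that tallies people per level. The sketch below is plain Java on a made-up toy graph, not the IMDB data set and not the code used for the talk; `BaconHistogramSketch`, `toyGraph` and the "Actor A" / "Movie 1" names are hypothetical. Because actors and movies alternate along any path, only nodes at an even hop count are people, and their Bacon number is hops / 2.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BaconHistogramSketch {
    // Adds an undirected actor-movie link.
    static void link(Map<String, List<String>> adjacency, String actor, String movie) {
        adjacency.computeIfAbsent(actor, k -> new ArrayList<>()).add(movie);
        adjacency.computeIfAbsent(movie, k -> new ArrayList<>()).add(actor);
    }

    // Hypothetical toy data: a short chain of co-stars hanging off one movie.
    static Map<String, List<String>> toyGraph() {
        Map<String, List<String>> graph = new HashMap<>();
        link(graph, "Kevin Bacon", "Movie 1");
        link(graph, "Actor A", "Movie 1");
        link(graph, "Actor B", "Movie 1");
        link(graph, "Actor B", "Movie 2");
        link(graph, "Actor C", "Movie 2");
        link(graph, "Actor C", "Movie 3");
        link(graph, "Actor D", "Movie 3");
        return graph;
    }

    // Maps each Bacon number to the number of people first reached at that distance.
    static Map<Integer, Integer> peoplePerBaconNumber(Map<String, List<String>> adjacency,
                                                      String start) {
        Map<Integer, Integer> histogram = new TreeMap<>();
        Map<String, Integer> hops = new HashMap<>();
        Deque<String> frontier = new ArrayDeque<>();
        hops.put(start, 0);
        frontier.add(start);
        while (!frontier.isEmpty()) {
            String current = frontier.remove();
            int distance = hops.get(current);
            // Even, non-zero distances are people; odd distances are movies.
            if (distance > 0 && distance % 2 == 0) {
                histogram.merge(distance / 2, 1, Integer::sum);
            }
            for (String next : adjacency.getOrDefault(current, List.of())) {
                if (hops.putIfAbsent(next, distance + 1) == null) {
                    frontier.add(next);
                }
            }
        }
        return histogram;
    }

    public static void main(String[] args) {
        System.out.println(peoplePerBaconNumber(toyGraph(), "Kevin Bacon"));
    }
}
```

On the toy graph, two people have Bacon number 1 and one person each has Bacon numbers 2 and 3.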
Big Data + Graph = Big Graph Data
4 Degrees of Kevin Bacon
(Breadth First up to 20K connections)
Images generated using the IG Visualizer
Analyzing Bacon
• To perform meaningful analysis, you will need:
– Ingest of the IMDB data set: about 50 formatted,
compressed files (largest > 200 MB)
– Custom algorithm support to perform meaningful
analysis
• Optimized queries to get results back in reasonable time
– A visualization tool to test and view the results of
the navigation (optional)
How IG Sizzles Your Bacon
Ingest
Update
Navigate
Massive graph data requires efficient and intelligent tools
to analyze and understand it.
Super Simple Java API
// Create two vertices and add them to the graph
Actor bacon = new Actor("Kevin Bacon");
imdbGraphDB.addVertex(bacon);
Movie apollo = new Movie("Apollo 13", 1995);
imdbGraphDB.addVertex(apollo);
// Connect them with an edge that records the character played
ActedIn bacon2apollo = new ActedIn("Jack Swigert");
imdbGraphDB.addEdge(bacon2apollo, bacon, apollo,
    EdgeKind.BIDIRECTIONAL, 1 /* weight */);
Ingest
Scaling Writes
• Big/Fast data demands write performance
• Most NoSQL solutions allow you to scale
writes by…
– Partitioning the data
– Understanding your consistency requirements
– Allowing you to defer conflicts
Ingest
Scaling Graph Writes
[Diagram: three applications write through InfiniteGraph, using ACID transactions, to the Objectivity/DB persistence layer. App-1 ingests V1 and edge E12 {V1, V2}; App-2 ingests V2 and edge E23 {V2, V3}; App-3 ingests V3.]
Ingest
High Performance Edge Ingest
[Diagram: edge requests such as E(1->2), E(2->3), E(3->1), E(2->1) and E(3->2) pass through the IG Core/API into pipeline containers; a pipeline agent moves them into target containers C1, C2 and C3, where edges such as E12 and E23 are built.]
Ingest
Trade-offs
• Excellent for efficient use of page cache
• Able to maintain full database consistency
• Achieves highest ingest rate in distributed
environments
• Almost always has highest “perceived” rate
• Trading off:
– Eventual consistency in graph (connections)
– Updates are still atomic, isolated and durable but phased
– External agent performs graph building
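The phased approach can be pictured with a small in-memory model. This is a sketch under stated assumptions, not InfiniteGraph's pipeline: `PipelinedIngestSketch`, `submitEdge` and `runAgentOnce` are hypothetical names, a queue stands in for the pipeline containers, and a map keyed by source vertex stands in for the target containers. Clients only append requests; a single agent later drains them, groups them by target and applies each group together, which is why connections become visible only eventually.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentLinkedQueue;

final class PipelinedIngestSketch {
    /** A pending request to connect two vertices. */
    static final class EdgeRequest {
        final int from;
        final int to;

        EdgeRequest(int from, int to) {
            this.from = from;
            this.to = to;
        }
    }

    // Stand-in for the pipeline containers: cheap, append-only, safe for many writers.
    private final Queue<EdgeRequest> pipeline = new ConcurrentLinkedQueue<>();
    // Stand-in for the target containers; touched only by the single agent thread.
    private final Map<Integer, List<Integer>> adjacency = new TreeMap<>();

    // Called by ingest clients. The edge is accepted but not yet navigable.
    void submitEdge(int from, int to) {
        pipeline.add(new EdgeRequest(from, to));
    }

    // Called by the agent: drain the pipeline, group by source vertex, apply per group.
    int runAgentOnce() {
        Map<Integer, List<Integer>> batch = new TreeMap<>();
        int applied = 0;
        EdgeRequest request;
        while ((request = pipeline.poll()) != null) {
            batch.computeIfAbsent(request.from, k -> new ArrayList<>()).add(request.to);
            applied++;
        }
        for (Map.Entry<Integer, List<Integer>> group : batch.entrySet()) {
            adjacency.computeIfAbsent(group.getKey(), k -> new ArrayList<>())
                     .addAll(group.getValue());
        }
        return applied;
    }

    // What a navigation would currently see from vertex v.
    List<Integer> neighbours(int v) {
        return adjacency.getOrDefault(v, List.of());
    }

    public static void main(String[] args) {
        PipelinedIngestSketch graph = new PipelinedIngestSketch();
        graph.submitEdge(1, 2);
        graph.submitEdge(3, 1);
        graph.submitEdge(2, 3);
        graph.submitEdge(2, 1);
        System.out.println("before agent: " + graph.neighbours(2));
        System.out.println("applied: " + graph.runAgentOnce());
        System.out.println("after agent: " + graph.neighbours(2));
    }
}
```

Grouping writes by target is what lets the agent touch each container once per batch, which is one plausible reading of the page-cache benefit listed above.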
Ingest
Result…
[Bar chart: nodes and edges per second (axis 0 to 500,000), with series for 1, 2, 4 and 8 clients]
Ingest
Scaling Reads and Query
[Diagram: application(s) call a distributed API over Partition 1, Partition 2, Partition 3, ... Partition n, each with its own processor.]
Partitioning and read replicas… easy, right?
Why are Graphs Different?
[Diagram: application(s) call a distributed API over Partition 1, Partition 2, Partition 3, ... Partition n, each with its own processor.]
Navigate
Distributed Navigation
• Detect local hops and perform in memory
traversal
• Send the partial path to the distributed processor
that owns the next vertex to continue the navigation
• Intelligently cache remote data when accessed
frequently
• Route tasks to other hosts when it is optimal
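The first two bullets can be sketched in plain Java as follows. This is an illustrative model only, not the InfiniteGraph navigation server: `DistributedNavigationSketch`, `Task` and the toy graph (vertices A to D on partition 1, E to G on partition 2) are assumptions. Each task extends its path in memory while the next vertex lives on the same partition and hands the partial path off to another partition's task queue when it does not.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DistributedNavigationSketch {
    /** A partial path plus the partition whose processor should continue it. */
    static final class Task {
        final int partition;
        final List<String> path;

        Task(int partition, List<String> path) {
            this.partition = partition;
            this.path = path;
        }
    }

    private final Map<String, List<String>> edges = new HashMap<>();
    private final Map<String, Integer> partitionOf = new HashMap<>();
    int handOffs = 0; // how many partial paths crossed a partition boundary

    void vertex(String name, int partition) {
        partitionOf.put(name, partition);
    }

    void edge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Hypothetical layout: A..D on partition 1, E..G on partition 2.
    static DistributedNavigationSketch toyGraph() {
        DistributedNavigationSketch graph = new DistributedNavigationSketch();
        for (String v : List.of("A", "B", "C", "D")) graph.vertex(v, 1);
        for (String v : List.of("E", "F", "G")) graph.vertex(v, 2);
        graph.edge("A", "B");
        graph.edge("B", "C");
        graph.edge("C", "D");
        graph.edge("D", "E");
        graph.edge("E", "F");
        graph.edge("E", "G");
        return graph;
    }

    // Finds all simple paths from start to target with at most maxDepth edges.
    List<List<String>> navigate(String start, String target, int maxDepth) {
        List<List<String>> results = new ArrayList<>();
        Deque<Task> tasks = new ArrayDeque<>();
        tasks.add(new Task(partitionOf.get(start), new ArrayList<>(List.of(start))));
        while (!tasks.isEmpty()) {
            extendLocally(tasks.remove(), target, maxDepth, tasks, results);
        }
        return results;
    }

    private void extendLocally(Task task, String target, int maxDepth,
                               Deque<Task> tasks, List<List<String>> results) {
        List<String> path = task.path;
        String last = path.get(path.size() - 1);
        if (last.equals(target)) {
            results.add(path);
            return;
        }
        if (path.size() - 1 >= maxDepth) {
            return;
        }
        for (String next : edges.getOrDefault(last, List.of())) {
            if (path.contains(next)) {
                continue; // keep paths simple
            }
            List<String> longer = new ArrayList<>(path);
            longer.add(next);
            int nextPartition = partitionOf.get(next);
            if (nextPartition == task.partition) {
                // Local hop: keep traversing in memory on this partition.
                extendLocally(new Task(task.partition, longer), target, maxDepth, tasks, results);
            } else {
                // Remote hop: ship the partial path to the owning partition.
                handOffs++;
                tasks.add(new Task(nextPartition, longer));
            }
        }
    }

    public static void main(String[] args) {
        DistributedNavigationSketch graph = toyGraph();
        System.out.println(graph.navigate("A", "G", 6) + " handOffs=" + graph.handOffs);
    }
}
```

On the toy layout the path from A to G crosses the partition boundary once, so four of its five hops are served locally and only one partial path is shipped.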
Navigate
Distributed Navigation Server
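[Diagram: an application uses the distributed API over Partition 1 and Partition 2, each with its own processor. Vertices A, B, C, D, E, F, G, X and Y are spread across the partitions, and the partial path P(A,B,C,D) is shown alongside them.]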
Navigate
GraphViews
Leveraging Schema in the Graph
[Schema diagram with types: Patient, Prescription, Drug, Ingredient, Outcome, Complaint, Visit, Allergy, Physician]
Navigate
Schema Enables Views
• GraphViews are extremely powerful
• Allow Big Data to appear small!
• Connection inference can lead to exponential
gains in query performance
• Views are reusable between queries
• Views can be persisted
• Built into the native kernel
Navigate
Problem of Supernodes
In graph theory, a “supernode” is a vertex with a
disproportionately high number of connected edges.
Supernodes make real-time navigational queries difficult
because of the effort spent pursuing paths through them
that may prove unfruitful.
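One simple way to spot candidates is to compare each vertex's degree with the average. The sketch below is plain Java with made-up degrees; the class `SupernodeSketch`, the `factor` threshold and the vertex names are assumptions chosen for illustration, and "disproportionately high" could just as well be defined by a percentile or a fixed cut-off.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SupernodeSketch {
    // Hypothetical edge counts per vertex; not taken from the IMDB data set.
    static Map<String, Integer> toyDegrees() {
        Map<String, Integer> degree = new LinkedHashMap<>();
        degree.put("Movie A", 5);
        degree.put("Movie B", 4);
        degree.put("Movie C", 6);
        degree.put("Awards Show X", 185);
        return degree;
    }

    // Flags vertices whose degree exceeds factor times the mean degree.
    static List<String> supernodes(Map<String, Integer> degree, double factor) {
        double mean = degree.values().stream()
                            .mapToInt(Integer::intValue)
                            .average()
                            .orElse(0.0);
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : degree.entrySet()) {
            if (entry.getValue() > factor * mean) {
                flagged.add(entry.getKey());
            }
        }
        Collections.sort(flagged);
        return flagged;
    }

    public static void main(String[] args) {
        // Mean degree is 50, so with factor 3.0 the cut-off is 150 edges.
        System.out.println(supernodes(toyDegrees(), 3.0));
    }
}
```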
Navigate
Supernodes in Bacon
Navigate
In the IMDB data set, some examples of supernodes may be talk
shows, awards shows, compilations or variety shows.
How to avoid supernodes
1. Set policies on the navigation, such as
NoRevisitPolicy, MaximumResultCountPolicy and
MaximumPathDepthPolicy, to customize its
overall behavior.
PolicyChain policies = new PolicyChain();
// Only traverse the same vertex once
policies.addPolicy(new NoRevisitPolicy());
// limits the number of paths that will be returned to 10K
policies.addPolicy(new MaximumResultCountPolicy(10000));
// limits the path depth to 6
policies.addPolicy(new MaximumPathDepthPolicy(6));
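What these three limits do to a traversal can be pictured without the InfiniteGraph API. The following is a plain-Java sketch, not the library's implementation: `PolicySketch`, `navigate` and the toy graph are hypothetical, a visited set plays the role of the no-revisit rule, and two integer parameters play the roles of the depth and result-count limits.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PolicySketch {
    // Hypothetical toy graph: A -> B, A -> C, B -> D, C -> D, D -> E.
    static Map<String, List<String>> toyGraph() {
        Map<String, List<String>> adjacency = new HashMap<>();
        adjacency.put("A", List.of("B", "C"));
        adjacency.put("B", List.of("D"));
        adjacency.put("C", List.of("D"));
        adjacency.put("D", List.of("E"));
        return adjacency;
    }

    // Breadth-first navigation that reports every path found, subject to three limits.
    static List<List<String>> navigate(Map<String, List<String>> adjacency, String start,
                                       int maxDepth, int maxResults) {
        List<List<String>> results = new ArrayList<>();
        Set<String> visited = new HashSet<>();          // no-revisit rule
        Deque<List<String>> frontier = new ArrayDeque<>();
        visited.add(start);
        frontier.add(List.of(start));
        // Result-count limit: stop as soon as enough paths were reported.
        while (!frontier.isEmpty() && results.size() < maxResults) {
            List<String> path = frontier.remove();
            if (path.size() > 1) {
                results.add(path);
            }
            // Depth limit: a path with maxDepth edges is not extended further.
            if (path.size() - 1 >= maxDepth) {
                continue;
            }
            String last = path.get(path.size() - 1);
            for (String next : adjacency.getOrDefault(last, List.of())) {
                if (visited.add(next)) {                // true only on the first visit
                    List<String> longer = new ArrayList<>(path);
                    longer.add(next);
                    frontier.add(longer);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(navigate(toyGraph(), "A", 2, 10));
        System.out.println(navigate(toyGraph(), "A", 2, 2));
    }
}
```

With a depth limit of 2, vertex E is never reached, and D is reported only through B because the second route through C would revisit it.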
Navigate
How to avoid supernodes
2. Graph View to exclude or limit types
GraphView view = new GraphView();
// Exclude all instances of TvShow from navigation
view.excludeClass(myDb.getTypeId(TvShow.class.getName()));
// Exclude all movies made for TV/video
view.excludeClass(myDb.getTypeId(Movie.class.getName()),
    "details.madeForTv || details.madeForVideo");
// Exclude WorkedOn edges, then include only ActedIn edges
// whose characterName does not contain "Himself"
view.excludeClass(myDb.getTypeId(WorkedOn.class.getName()));
view.includeClass(myDb.getTypeId(ActedIn.class.getName()),
    "!CONTAINS(characterName, \"Himself\")");
Navigate
[Diagram: Kevin Bacon (Actor) is connected to The Following (TV Show) as Ryan Hardy, to Behind the Scenes (Movie) as Himself, and to Apollo 13 (Movie) as Jack Swigert.]
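The effect of such a view on the three roles in the diagram can be reproduced with ordinary Java predicates. This is a sketch of the filtering idea only, not the GraphView implementation: `ViewFilterSketch`, `Role` and the two predicates are hypothetical names, and only the TV-show exclusion and the "Himself" rule are modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class ViewFilterSketch {
    /** One ActedIn connection from an actor to a title. */
    static final class Role {
        final String title;
        final String titleType;
        final String characterName;

        Role(String title, String titleType, String characterName) {
            this.title = title;
            this.titleType = titleType;
            this.characterName = characterName;
        }
    }

    // Kevin Bacon's three connections as drawn in the diagram.
    static List<Role> slideRoles() {
        List<Role> roles = new ArrayList<>();
        roles.add(new Role("The Following", "TvShow", "Ryan Hardy"));
        roles.add(new Role("Behind the Scenes", "Movie", "Himself"));
        roles.add(new Role("Apollo 13", "Movie", "Jack Swigert"));
        return roles;
    }

    // A view is modeled as a predicate that decides which connections stay navigable.
    static Predicate<Role> view() {
        Predicate<Role> notTvShow = role -> !role.titleType.equals("TvShow");
        Predicate<Role> notHimself = role -> !role.characterName.contains("Himself");
        return notTvShow.and(notHimself);
    }

    static List<String> visibleTitles(List<Role> roles, Predicate<Role> view) {
        List<String> titles = new ArrayList<>();
        for (Role role : roles) {
            if (view.test(role)) {
                titles.add(role.title);
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(visibleTitles(slideRoles(), view()));
    }
}
```

Of the three connections, only Apollo 13 survives the view, so a navigation from Kevin Bacon never enters the TV show or the appearance as himself.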
How to avoid supernodes
3. Using these policies and the graph view, we can
limit the size of the result set in our navigation:
Navigator navigator = bacon.navigate(
    view,
    Guide.SIMPLE_BREADTH_FIRST,
    Qualifier.ANY,
    new VertexPredicate(Person.class, ""),
    policies,
    myResultHandler);
navigator.start();
Navigate
Filtered Views in Bacon
The results of this navigation would look something like this…
Navigate
Why InfiniteGraph™?
• Objectivity/DB is a proven foundation
– Building distributed databases since 1993
– A complete database management system
• Concurrency, transactions, cache, schema, query, indexing
• It’s a Graph Specialist!
– Simple but powerful API tailored for data navigation.
– Easy to configure distribution model
Advanced Configured Placement
• Physically co-locate “closely related” data
• Driven through a declarative placement model
• Dramatically speeds “local” reads
[Diagram: related objects co-located on data pages. Patient data page(s): Mr Citizen and his two Visits. Facility data page(s): the San Jose facility with Dr Jones and Dr Smith, and the Sunnyvale facility with Dr Blake and Dr Quinn. Edge labels: Primary Physician, Has, With, At, Located.]
Fully Distributed Data Model
[Diagram: HostA, HostB, HostC ... HostX are spread across Zone 1 and Zone 2. An AddVertex() call passes through the IG Core/API and customizable placement into the distributed object and relationship persistence layer.]
Polyglot NoSQL Architectures
[Diagram: users, applications and the business sit on top of a distributed data processing platform, a document store, a graph database and an RDBMS, backed by a partitioned distributed DB (often document / KV). External/legacy data enters through transformation and MDM.]
What else!
• Distributed update.
Update
… we are working on it.
Conclusion
I hope that you enjoyed the bacon.
My apologies to my kosher friends for any offense.
Look out for new features coming soon!
QUESTIONS?

Revisiting the Six Degrees Problem with a Graph Database - Nick Quinn

  • 2. What are we talking about today? Not that BaconThis Bacon! • Intro to the Six Degrees Problem • What is a Graph Database? • Why Bacon in Graph Database? • How we solved the problem Images Courtesy of IMDB (www.imdb.com)
  • 3. Six Degrees of Bacon “…any individual involved in the Hollywood, California film industry can be linked through his or her film roles to actor Kevin Bacon within six steps” [http://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon] Gina Menza Images Courtesy of IMDB (www.imdb.com) A Tale of Two Kevins
  • 4. Why Six Degrees of Bacon? Actor Age # of Projects Kevin Bacon 54 76 Harrison Ford 70 70 Tom Cruise 50 40 Julia Roberts 45 50 Tom Hanks 56 73 Denzel Washington 58 53 Michael Caine 80 157 Kiefer Sutherland 46 82 Kevin Bacon Images Courtesy of IMDB (www.imdb.com)
  • 5. Bacon Numbers in Google In the summer of 2012, Google started to allow users to find the bacon number of any actor simply by following his or her name with “bacon number”. Morgan Freeman The Dark Night Rises appeared in Gary Oldman appeared in Kevin Bacon Criminal Law appeared in appeared in www.google.com Graphical Representation
  • 6. What is a Graph Database ?
  • 7. The Physical Data Model • Difference between relational & graph databases Meetings P1 Place TimeP2 Alice Denver 5-27-10Bob Calls From Time DurationTo Bob 13:20 25Carlos Bob 17:10 15Charlie Payments From Date AmountTo Carlos 5-12-10 100000Charlie Met 5-27-10 Alice Called 13:20 Bob Paid 100000 Carlos Charlie Called 17:10 Rows/Columns/Tables Relationship/Graph Optimized
  • 8. Connecting Data Person Building ? Work Live RR Visit Eat Shop
  • 9. Who is Gina Menza? • How do we get meaning from highly connected data? Gina Menza Jury Forewoman Miss Jeffries Images Courtesy of IMDB (www.imdb.com)
  • 10. Strength of Connections Matter! • Why 6 degrees of separation and not 3.74? • We need analysis tools in order to – identify and filter out “unimportant” data and – infer what needs to be filtered as we investigate it. “When considering another person in the world, a friend of your friend knows a friend of their friend” - facebook
  • 11. Why Bacon in a Graph Database ?
  • 12. Graph Analysis • Why use Graph Databases for graph analysis? – Dynamic on Live Data – Feedback/Inference – Optimized for concurrent user access – Handles big data problems – Native Graph Traversal API – Manage memory efficiently
  • 13. Paths to Bacon Bacon Number (Degree of Separation / 2) # of People 1 2823 2 323677 3 1088560 4 272905 5 22533 6 2300 Using the IMDB (www.imdb.com) data set, we can study how many paths can be found by degrees of separation from Kevin Bacon. Out of 5,067,124 nodes and 11,505,797 edges, we get the following: 0 200000 400000 600000 800000 1000000 1200000 1 2 3 4 5 6 # of People
  • 14. Big Data + Graph = Big Graph Data 4 Degrees of Kevin Bacon (Breadth First up to 20K connections) Images generated using the IG Visualizer
  • 15. Analyzing Bacon • To be able to perform meaningful analysis, these are things that you will need: – Ingest IMDB Dataset – About 50 Formatted compressed files (Largest > 200 MB) – Custom algorithm support to perform meaningful analysis • Optimize queries to get results back in reasonable time – Visualization tool to test and view the results of the navigation (optional)
  • 16. How IG Sizzles Your Bacon Ingest Update Navigate Massive graph data require efficient and intelligent tools to analyze and understand it.
  • 17. Super Simple Java API Actor bacon = new Actor(“Kevin Bacon”); imdbGraphDB.addVertex( bacon ); Movie apollo= new Movie(“Apollo 13”, 1995); imdbGraphDB.addVertex( apollo ); ActedIn bacon2apollo = new ActedIn(“Jack Swigert”); imdbGraphDB.addEdge(bacon2apollo, bacon, apollo, Ed geKind.BIDIRECTIONAL, 1 /**weight**/); Ingest
  • 18. Scaling Writes • Big/Fast data demands write performance • Most NoSQL solutions allow you to scale writes by… – Partitioning the data – Understanding your consistency requirements – Allowing you to defer conflicts Ingest
  • 19. App-2 (Ingest V2) App-2 (E23{ V2V3}) Scaling Graph Writes ACID Transactions InfiniteGraph Objectivity/DB Persistence Layer App-1 (Ingest V1) App-3 (Ingest V3) V1 V2 V3 App-1 (E1 2{ V1V2}) App-3 E12 E23 Ingest
  • 20. High Performance Edge Ingest IG Core/API C1 C2 C3 E12 E23 TargetContainers PipelineContainers E(1->2) E(3->1) E(2->3) E(2->1) E(2->3) E(3->1) E(1->2) E(3->2) E(1->2) E(2->3) E(3->1) E(2->1) E(2->3) E(3->1) E(3->2) E(1->2) Pipeline Agent Ingest
  • 21. Trade offs • Excellent for efficient use of page cache • Able to maintain full database consistency • Achieves highest ingest rate in distributed environments • Almost always has highest “perceived” rate • Trading Off : • Eventual consistency in graph (connections) • Updates are still atomic, isolated and durable but phased • External agent performs graph building Ingest
  • 22. Result… [Chart: nodes and edges per second, axis 0 to 500,000, with series for 1, 2, 4 and 8 clients] Ingest
  • 23. Scaling Reads and Query [Diagram: application(s) call a distributed API over Partition 1, Partition 2, Partition 3 … Partition n, each with its own processor] Partitioning and Read Replicas… easy right!
  • 24. Why are Graphs Different? [Diagram: application(s) call a distributed API over Partition 1, Partition 2, Partition 3 … Partition n, each with its own processor] Navigate
  • 25. Distributed Navigation • Detect local hops and perform in memory traversal • Send the partial path to the distributed processing to continue the navigation. • Intelligently cache remote data when accessed frequently • Route tasks to other hosts when it is optimal Navigate
  • 26. Distributed Navigation Server Processor Distributed API Partition 1 Partition 2 Processor Application A X Y B C D E P(A,B,C,D) F G Navigate
  • 27. GraphViews Leveraging Schema in the Graph Patient Prescription Drug Ingredient Outcome Complaint Visit Allergy Physician Navigate
  • 28. Schema Enables Views • GraphViews are extremely powerful • Allow Big Data to appear small ! • Connection inference can lead to exponential gains in query performance • Views are reusable between queries • Views can be persisted • Built into the native kernel Navigate
  • 29. Problem of Supernodes In graph theory, a “supernode” is a vertex with a disproportionately high number of connected edges. Supernodes make real-time navigational queries difficult because of the effort spent pursuing paths through them that may prove unfruitful. Navigate Images generated using the IG Visualizer
  • 30. Supernodes in Bacon Navigate In the IMDB data set, some examples of supernodes may be talk shows, awards shows, compilations or variety shows. Images generated using the IG Visualizer
  • 31. How to avoid supernodes 1. Policies on the navigation, such as NoRevisitPolicy, MaximumResultCountPolicy and MaximumPathDepthPolicy, can be used to customize the overall behavior of the navigation. PolicyChain policies = new PolicyChain(); // Only traverse the same vertex once policies.addPolicy(new NoRevisitPolicy()); // Limits the number of paths that will be returned to 10K policies.addPolicy(new MaximumResultCountPolicy(10000)); // Limits the path depth to 6 policies.addPolicy(new MaximumPathDepthPolicy(6)); Navigate
  • 32. How to avoid supernodes 2. Graph View to exclude or limit types GraphView view = new GraphView(); // Excludes all instances of TvShow from navigation view.excludeClass(myDb.getTypeId(TvShow.class.getName())); // Excludes all movies made for TV/Video view.excludeClass(myDb.getTypeId(Movie.class.getName()), "details.madeForTv || details.madeForVideo"); // Excludes all WorkedOn edges view.excludeClass(myDb.getTypeId(WorkedOn.class.getName())); // Includes ActedIn with characterName not containing "Himself" view.includeClass(myDb.getTypeId(ActedIn.class.getName()), "!CONTAINS(characterName, \"Himself\")"); Navigate [Diagram: Kevin Bacon (Actor) linked to The Following (TV Show) as Ryan Hardy, to Behind the Scenes (Movie) as Himself, and to Apollo 13 (Movie) as Jack Swigert]
  • 33. How to avoid supernodes 3. Using these policies and graph view, we can filter the size of the result set in our navigation: Navigator navigator = bacon.navigate(view, Guide.SIMPLE_BREADTH_FIRST, Qualifier.ANY, new VertexPredicate(Person.class, ""), policies, myResultHandler); navigator.start(); Navigate
  • 34. Filtered Views in Bacon The results of this navigation would look something like this… Navigate Images generated using the IG Visualizer
  • 35. Why InfiniteGraph™? • Objectivity/DB is a proven foundation – Building distributed databases since 1993 – A complete database management system • Concurrency, transactions, cache, schema, query, indexing • It’s a Graph Specialist ! – Simple but powerful API tailored for data navigation. – Easy to configure distribution model
  • 36. Advanced Configured Placement • Physically co-locate “closely related” data • Driven through a declarative placement model • Dramatically speeds “local” reads [Diagram: patient data page(s) holding Mr Citizen and his visits; facility data page(s) holding Dr Jones, Dr Smith and the San Jose facility; facility data page(s) holding Dr Blake, Dr Quinn and Sunnyvale; edges labeled Primary Physician, Has, With, At and Located]
  • 37. Fully Distributed Data Model [Diagram: Zone 1 and Zone 2 containing HostA, HostB, HostC … HostX; AddVertex() goes through the IG Core/API, customizable placement, and the distributed object and relationship persistence layer]
  • 38. Polyglot NoSQL Architectures [Diagram: users, business applications and external/legacy data; transformation and MDM; a partitioned distributed DB (often document / KV), a distributed data processing platform, and document, graph database and RDBMS stores]
  • 39. What else! • Distributed update. Update … we are working on it.
  • 40. Conclusion I hope that you enjoyed the bacon. My apologies to my kosher friends for any offense. Look out for new features coming soon!
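
The filtering idea from slides 31 to 33 can be illustrated without InfiniteGraph at all. The following is a minimal plain-Java sketch under stated assumptions, not the InfiniteGraph API: `BaconSketch`, `addRole` and `degrees` are hypothetical names, the graph is a toy in-memory map, and "Hypothetical Talk Show" is invented data standing in for a supernode. It runs a breadth-first search with a no-revisit rule, a maximum depth, and a predicate that plays the role of the graph view.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Predicate;

/** Toy in-memory actor/project graph; an illustration only, not the InfiniteGraph API. */
class BaconSketch {
    // Bipartite graph: actor -> projects and project -> actors.
    private final Map<String, Set<String>> projectsByActor = new HashMap<>();
    private final Map<String, Set<String>> actorsByProject = new HashMap<>();

    void addRole(String actor, String project) {
        projectsByActor.computeIfAbsent(actor, k -> new HashSet<>()).add(project);
        actorsByProject.computeIfAbsent(project, k -> new HashSet<>()).add(actor);
    }

    /**
     * Breadth-first search from start to target. Returns the number of projects
     * on the shortest accepted path, or -1 if none exists within maxDegrees.
     * Projects rejected by includeProject are never traversed.
     */
    int degrees(String start, String target, int maxDegrees, Predicate<String> includeProject) {
        if (start.equals(target)) {
            return 0;
        }
        Set<String> seenActors = new HashSet<>();    // no-revisit rule for actors
        Set<String> seenProjects = new HashSet<>();  // no-revisit rule for projects
        seenActors.add(start);
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        for (int depth = 1; depth <= maxDegrees && !frontier.isEmpty(); depth++) {
            Queue<String> next = new ArrayDeque<>();
            for (String actor : frontier) {
                Set<String> projects = projectsByActor.getOrDefault(actor, Collections.emptySet());
                for (String project : projects) {
                    // View-style filter first, then skip projects already expanded.
                    if (!includeProject.test(project) || !seenProjects.add(project)) {
                        continue;
                    }
                    for (String coStar : actorsByProject.get(project)) {
                        if (coStar.equals(target)) {
                            return depth;
                        }
                        if (seenActors.add(coStar)) {
                            next.add(coStar);
                        }
                    }
                }
            }
            frontier = next;
        }
        return -1;
    }

    public static void main(String[] args) {
        BaconSketch g = new BaconSketch();
        g.addRole("Morgan Freeman", "The Dark Knight Rises");
        g.addRole("Gary Oldman", "The Dark Knight Rises");
        g.addRole("Gary Oldman", "Criminal Law");
        g.addRole("Kevin Bacon", "Criminal Law");
        // Invented supernode-style project, for illustration only.
        g.addRole("Morgan Freeman", "Hypothetical Talk Show");
        g.addRole("Kevin Bacon", "Hypothetical Talk Show");

        System.out.println(g.degrees("Kevin Bacon", "Morgan Freeman", 6, p -> true)); // prints 1
        System.out.println(g.degrees("Kevin Bacon", "Morgan Freeman", 6,
                p -> !p.contains("Talk Show"))); // prints 2
    }
}
```

With the talk show allowed, the search takes the shortcut through it; with the filter applied, it finds the two-step path through Gary Oldman from the Google slide. A real data set would also need the result-count cap and billing-position filters mentioned in the deck, which this sketch omits.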

Editor's notes

  1. Kevin Norwood Bacon, born 1958 in Pennsylvania
  2. The “Six Degrees of Kevin Bacon” game was created in early 1994 by three Albright College students after Kevin Bacon, while promoting “The River Wild”, commented in Premiere magazine that he had worked with everyone in Hollywood. Initially he disliked the game because he thought he was being ridiculed, but he has since used it in commercials and, because of its popularity, started his own social charity website, sixdegrees.org.
  3. First project: Animal House (1978) as “Chip Diller”. According to IMDB, he has acted in 76 projects (movies, TV shows). That is more than typical for his age because of his willingness to take small parts…
  4. Bacon Number: How many movies (degrees) separate any Hollywood film actor from Kevin Bacon?
  5. A Graph Database is not just a graph-style interface on top of a relational database. We store the data in a way that is optimized for graph use cases. For any element, you can find the adjacent object without having to look it up in an index; instead, the element has a direct pointer. Graph databases offer some kind of traversal query API, and the traversal of relationships is optimized because there are no lookups. Explain the slide: As the number of people goes up, the number of meetings, calls and payments will go up and the size of these tables will grow. If you are interested in graph-type questions, like how many calls Bob made or how many times Bob paid Charlie, it is much easier and better suited to your needs to store the data in a way that makes those questions easy to answer.
  6. So what is a Graph Database? It is a database that represents and stores data in its connected form, as a graph structure. So instead of rows and columns, you have vertex and edge objects. A graph is a data structure for representing complicated relationships between entities. Some obvious use cases lie in health care, bioinformatics, social networks, network management, crime detection, fraud analysis, etc.
  7. Gina Menza had a non-speaking part in both Sleepers and Outbreak. She was in “Wag the Dog”, but never really had any future parts. She is not a strong connector between the Kevins. In November of 2011, Facebook and the University of Milan released an announcement that “there are on average 3.74 degrees of separation between any one Facebook user and another within the US”. Does this kill the six degrees problem? No, because we are not asking for any “Gina Menza” connection, we are looking for “Gary Oldman” connections. In other words, we are looking for meaningful connections between people. For example, the Facebook finding breaks down if two people who both liked “Pepsi” is not enough of a connection for you, and you only count people who are connected through others they are tagged in photos with. The strength of the connections matters!
  9. Alternate ways to handle graph analysis. Batch processing: MapReduce with Hadoop with some layer on top (Hama, GraphLab, Faunus); used for global analytics, not real-time. All in memory: Jung, NetworkX, iGraph; rich ecosystem of visualization and algorithmic packages, but the data set size is limited by the amount of memory that can be used. At around millions of edges, the amount of memory is usually too small.
  10. Kevin Bacon: 4 degrees, up to 20,000 connections. Edge line color = black (which is why it seems blacked out). Green background = movies; pink background = TV shows; purple background = actors/actresses; white background = distribution companies.
  11. Some data is only actionable momentarily: intelligence, IT security, site/page visits, financial/trading behavior. This presents a different type of challenge: the latency of batch data processing becomes problematic.
  12. 3 nodes, 2 edges in the middle. If both edges come in at the same time, one writer will have to wait for the lock on the vertex in the middle. If this type of ingest is happening at a much larger scale, you will constantly be waiting for locks on shared vertex objects.
  13. This is our answer to how to scale the ingest. Updates to the vertex objects are staged in target containers and then sorted and moved to pipeline containers where they are picked up and processed by pipeline agents which apply the updates asynchronously.
  14. This is an eventually consistent solution. We still support fully ACID transactions, but you have the choice. As you are doing the navigation, you may not necessarily see the update right away. You can set the consistency preference on a per-transaction basis, so some can be fully ACID and some eventually consistent.
  15. Ingest rates on arrays of machines with pretty modest hardware on racks
  16. In a lot of cases, especially with KV stores and even RDBMSs, it is relatively simple to scale the reads, especially if you have partitioned the data.
  17. In a graph, it looks like more of a mess. It is not as easy to do this where the data is connected and we are performing a navigational query.
  18. Pregel-like: wraps up a message and sends it over to the next host. Distributed cache: pieces of the graph are stored in memory, and frequently accessed remote data is cached. InfiniteGraph makes smart decisions about when to send a message versus bringing the remote data into the in-memory cache.
  19. InfiniteGraph supports schema, but does not restrict connection types between vertices. We take advantage of schema when applying filters to the graph using graph views. Working on hybrid schema support.
  20. 1: All movies and shows that are associated with Kevin Bacon. 2: All projects with the word “Show” are highlighted. 3: All projects with the word “Award” are highlighted. 4: Just shows the actors associated with the Jay Leno show, which is clearly a supernode.
  21. Also filtered on billing position on the edge types to only get the top 5 actors/actresses for each project. Path to Kevin Spacey through “Tremors”, Fred Ward and “Henry and June”. Path to Morgan Freeman through “White Water Summer”, Sean Astin, and “The Long Way Home”.
  22. Mixing multiple databases together and using the strengths of each one. Use case: a web-scale application on the frontend with a relevant database to store user profiles (read and write); Hadoop/MapReduce on the backend to crunch user data to target relevant advertising to them; store the output of that data in such a way as to get good query performance. Use the graph database when the connected data can be seen in such a way that it makes sense to retrieve it using graph traversals.
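
The lock contention described in note 12 and the phased approach in notes 13 and 14 can be sketched roughly as follows. This is a toy model under stated assumptions, not how InfiniteGraph implements its pipeline: `StagedIngestSketch`, `stage`, `runAgent` and the lock counter are hypothetical names, and containers, transactions and distribution are not modeled. Writers only append edge requests to a staging buffer; a separate agent step later groups the pending updates by vertex, so each vertex is touched once per batch rather than once per edge.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of phased edge ingest; names are illustrative, not InfiniteGraph classes. */
class StagedIngestSketch {
    /** A requested edge between two vertex ids. */
    static final class EdgeRequest {
        final int from;
        final int to;

        EdgeRequest(int from, int to) {
            this.from = from;
            this.to = to;
        }
    }

    private final List<EdgeRequest> staged = new ArrayList<>();            // phase 1 buffer
    private final Map<Integer, List<Integer>> adjacency = new HashMap<>(); // the built graph
    private int vertexLockAcquisitions = 0;

    /** Phase 1: writers only append to the buffer; no vertex is touched here. */
    synchronized void stage(int from, int to) {
        staged.add(new EdgeRequest(from, to));
    }

    /** Phase 2: group pending endpoint updates by vertex and apply each group once. */
    synchronized void runAgent() {
        Map<Integer, List<Integer>> byVertex = new TreeMap<>(); // sorted by vertex id
        for (EdgeRequest e : staged) {
            byVertex.computeIfAbsent(e.from, k -> new ArrayList<>()).add(e.to);
            byVertex.computeIfAbsent(e.to, k -> new ArrayList<>()).add(e.from);
        }
        staged.clear();
        for (Map.Entry<Integer, List<Integer>> group : byVertex.entrySet()) {
            vertexLockAcquisitions++; // stands in for locking this vertex once per batch
            adjacency.computeIfAbsent(group.getKey(), k -> new ArrayList<>())
                     .addAll(group.getValue());
        }
    }

    /** Neighbours visible so far; staged edges do not appear until the agent runs. */
    synchronized List<Integer> neighbours(int vertex) {
        return new ArrayList<>(adjacency.getOrDefault(vertex, new ArrayList<>()));
    }

    synchronized int lockAcquisitions() {
        return vertexLockAcquisitions;
    }

    public static void main(String[] args) {
        StagedIngestSketch ingest = new StagedIngestSketch();
        ingest.stage(1, 2); // E12
        ingest.stage(2, 3); // E23, shares vertex 2 with E12
        System.out.println(ingest.neighbours(2)); // prints []
        ingest.runAgent();
        System.out.println(ingest.neighbours(2)); // prints [1, 3]
    }
}
```

The empty result before `runAgent()` mirrors the trade-off in note 14: the connections become visible only after the agent has applied them. In this toy run, the two edges cause four endpoint updates but only three vertex visits, because both updates to vertex 2 are applied together.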