Making sense of the Graph Revolution

www.Objectivity.com

Nick Quinn, Principal Engineer, InfiniteGraph
11/21/13


Why Call it a Revolution?
• “a forcible overthrow of the current order in
favor of a new system.”
• NoSQL (Not Only SQL)
– Driven by Choice + Big Data Needs
• Scalable
• Performing
• Distributed
• Highly Available

Big Data + Graph = Big Graph Data
• Social Scale
– 1 billion vertices, 100 billion edges

• Web Scale
– 50 billion vertices, 1 trillion edges

• Brain Scale
– 100 billion vertices, 100 trillion edges
AND GROWING!

Why Call it a Graph Revolution?
• After 2011, NoSQL and graph databases begin to
follow the same trend line and forecast.

The Growing Graph Database Landscape

What is a Graph Database?
• A graph database is a native storage engine that
enables efficient storage and retrieval of graph
structured data.
• Graph databases are typically used:
– When the data source is highly connected,
– Where the connections are important (add value to the
data), and
– When the user access pattern requires traversals of
those connections.

What is a Graph Database?
• Graph Databases have a unique data model
(Vertices and Edges).
[Diagram: vertices (VERTEX) connected by edges (EDGE)]

• They are optimized around concurrent access of
persisted data, so users can navigate the data as it
is being added or updated.
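
The vertex-and-edge data model can be illustrated with a minimal in-memory sketch. The names `TinyGraph`, `addVertex`, `addEdge` and `neighbors` are hypothetical and are not the InfiniteGraph API; persistence and concurrent access, which a real graph database provides, are left out here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the InfiniteGraph API.
class TinyGraph {
    // A directed, labeled edge between two vertex ids.
    static final class Edge {
        final String from;
        final String to;
        final String label;

        Edge(String from, String to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    // Adjacency list: vertex id -> outgoing edges.
    private final Map<String, List<Edge>> outgoing = new HashMap<>();

    void addVertex(String id) {
        outgoing.putIfAbsent(id, new ArrayList<>());
    }

    void addEdge(String from, String to, String label) {
        addVertex(from);
        addVertex(to);
        outgoing.get(from).add(new Edge(from, to, label));
    }

    // Ids of the vertices one hop away from the given vertex.
    List<String> neighbors(String id) {
        List<String> result = new ArrayList<>();
        for (Edge e : outgoing.getOrDefault(id, List.of())) {
            result.add(e.to);
        }
        return result;
    }

    public static void main(String[] args) {
        TinyGraph g = new TinyGraph();
        g.addEdge("alice", "bob", "knows");
        g.addEdge("alice", "carol", "knows");
        System.out.println(g.neighbors("alice")); // prints [bob, carol]
    }
}
```

Storing edges next to the vertex they leave is what makes a one-hop lookup a direct read rather than a join.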

Why Use a Graph Database?
Relational Database
Think about the SQL query for finding all links between the two “blue” rows... Good luck!
[Diagram: Table_A through Table_G, with two highlighted rows]
Relational databases aren’t good at handling complex relationships!

Why Use a Graph Database?

Relational Database
[Diagram: Table_A through Table_G]

Objectivity/DB or InfiniteGraph: the solution can be found with a few lines of code
[Diagram labels: A3, G4]
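
To give a feel for why linked data makes this query short, here is a minimal sketch of a breadth-first search between two items over a plain adjacency map. It is not InfiniteGraph or Objectivity/DB code: `PathFinder` and `shortestPath` are hypothetical names, and the intermediate ids `C1` and `E2` are invented for the example (only A3 and G4 appear on the slide).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a database API.
class PathFinder {
    // Breadth-first search; returns the path with the fewest hops,
    // or an empty list when the goal cannot be reached.
    static List<String> shortestPath(Map<String, List<String>> adj, String start, String goal) {
        Map<String, String> parent = new HashMap<>(); // visited vertex -> predecessor
        Deque<String> queue = new ArrayDeque<>();
        parent.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(goal)) {
                // Walk the predecessor links back to the start.
                LinkedList<String> path = new LinkedList<>();
                for (String v = goal; v != null; v = parent.get(v)) {
                    path.addFirst(v);
                }
                return path;
            }
            for (String next : adj.getOrDefault(current, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        // C1 and E2 are made-up intermediate items.
        Map<String, List<String>> adj = Map.of(
            "A3", List.of("C1"),
            "C1", List.of("E2"),
            "E2", List.of("G4"));
        System.out.println(shortestPath(adj, "A3", "G4")); // prints [A3, C1, E2, G4]
    }
}
```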

Specialized Graph Use Cases
• Cyber Security – Identifying potential cyber threats
and their targets
• Network Management – Answering very complex
navigational queries on a social network in
near real time
• Targeted Advertising – Customize marketing to
the consumer by compiling a large knowledge
graph with an integrated recommendation engine

Example 1 - Ad Placement Networks
Smartphone Ad placement, based on the user’s profile and location data
captured by opt-in applications.

• The location data can be stored and distilled in a key-value and column-store
hybrid database, such as Cassandra.

• The locations are matched with geospatial data to deduce user interests.

• As Ad placement orders arrive, an application built on a graph database, such
as InfiniteGraph, matches groups of users with Ads:
– Maximizes relevance for the user.
– Yields maximum value for the advertiser and the placer.

Example 2 - Market Analysis
The 10 companies that control a majority of U.S. consumer goods brands

Example 3 - Seed To Consumer Tracking


Supply Chain Management Use Case
• Identifying the optimal route for a fleet of trucks at a
particular time of the year is quite complex.
– number of drivers to pay and their salaries
– gas, weather patterns, timing requirements, container
sizes, distances, roads, hazards, repairs

• Find the optimal route during the winter, when
certain highways around the Great Lakes tend to
become hazardous.

• Find the most cost-effective route in December with
weather conditions X and highway conditions Y, and
stay below latitude Z while optimizing costs to
achieve a rush delivery.
GraphView myView = new GraphView();
// Exclude highways with bad weather, slow traffic or too many accidents
myView.excludeClass(myGraphDb.getTypeId(Highway.class.getName()),
    "(weather.precipitation > precipitationX && weather.temperature < temperatureX)"
    + " || traffic.speed < speedY || traffic.accidents > accidentsY");
// Exclude cities at or above latitude Z
myView.excludeClass(myGraphDb.getTypeId(City.class.getName()), "latitude >= Z");

City origin, target = …; // Use query or index to look up "origin" & "target" city
VertexIdentifier resultQualifier = new VertexIdentifier(target);
// Set policies
PolicyChain myPolicies = new PolicyChain();
myPolicies.addPolicy(new MaximumPathDepthPolicy(MAXIMUM_STEPS));
myPolicies.addPolicy(new NoRevisitPolicy()); // Don't visit any city more than once
// Define logic on how to process results
NavigationResultHandler myNavHandler = new NavigationResultHandler()
{
    @Override
    public void handleResultPath(Path result)
    {
        // The first path returned is the shortest path, but may not be the cheapest
        float cost = calculateCost(result);
        float time = calculateTime(result);
        // Minimize cost
        …
    }
    @Override
    public void handleNavigatorFinished(Navigator navigator) {}
};
Navigator navigator = origin.navigate(myView, Guide.DEPTH_FIRST_SEARCH,
    Qualifier.ANY /* path qualifier */, resultQualifier, myPolicies, myNavHandler);
navigator.start();
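
The same idea (filter the graph, bound the depth, never revisit a city, keep the cheapest result path) can be sketched without any database. This is a plain-Java illustration under stated assumptions, not the InfiniteGraph implementation: `CheapestRoute`, `Road`, `addRoad` and `find` are hypothetical names, the exclusion predicate stands in for a GraphView filter, and path cost is simply the sum of per-road costs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only; not the InfiniteGraph API.
class CheapestRoute {
    static final class Road {
        final String to;
        final double cost;
        final boolean hazardous;

        Road(String to, double cost, boolean hazardous) {
            this.to = to;
            this.cost = cost;
            this.hazardous = hazardous;
        }
    }

    private final Map<String, List<Road>> roads = new HashMap<>();
    private double bestCost = Double.POSITIVE_INFINITY;
    private List<String> bestPath = List.of();

    void addRoad(String from, String to, double cost, boolean hazardous) {
        roads.computeIfAbsent(from, k -> new ArrayList<>()).add(new Road(to, cost, hazardous));
    }

    double bestCost() {
        return bestCost;
    }

    // Depth-first search over roads that pass the filter; returns the cheapest path found.
    List<String> find(String origin, String target, int maxSteps, Predicate<Road> excluded) {
        bestCost = Double.POSITIVE_INFINITY;
        bestPath = List.of();
        Deque<String> path = new ArrayDeque<>();
        path.addLast(origin);
        search(origin, target, maxSteps, excluded, path, 0.0);
        return bestPath;
    }

    private void search(String city, String target, int stepsLeft,
                        Predicate<Road> excluded, Deque<String> path, double cost) {
        if (city.equals(target)) {
            if (cost < bestCost) { // keep the cheapest result path seen so far
                bestCost = cost;
                bestPath = new ArrayList<>(path);
            }
            return;
        }
        if (stepsLeft == 0) {
            return; // maximum path depth reached
        }
        for (Road r : roads.getOrDefault(city, List.of())) {
            if (excluded.test(r) || path.contains(r.to)) {
                continue; // filtered out, or city already on this path
            }
            path.addLast(r.to);
            search(r.to, target, stepsLeft - 1, excluded, path, cost + r.cost);
            path.removeLast();
        }
    }

    public static void main(String[] args) {
        CheapestRoute routes = new CheapestRoute();
        routes.addRoad("Chicago", "Detroit", 3, true);
        routes.addRoad("Chicago", "Indianapolis", 4, false);
        routes.addRoad("Indianapolis", "Detroit", 5, false);
        // Excluding hazardous roads forces the detour.
        System.out.println(routes.find("Chicago", "Detroit", 5, r -> r.hazardous));
        // prints [Chicago, Indianapolis, Detroit]
    }
}
```

The city names and costs are invented for the example. Exhaustive depth-first search is only practical for small graphs or tight depth limits.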

Graph Database Challenge #1:
Reading Distributed Data
• If your graph data is distributed, traversing a
desired path across partitions can be extremely
difficult and slow.

Reading Distributed Data
• Mitigate bottlenecks and optimize performance by
using the following strategies:
– Custom Placement: data isolation/localization of
logically related information (to achieve close to
subgraph partitioning) in order to minimize the number of
network calls
– Distributed Navigation Engine: Distributes the load on
the partitions where the data is located.

Reading Distributed Data:
Custom Placement

• Consider the case where you are placing medical data for
hospitals and patients. Using a custom placement model
you can achieve fairly high isolation of the subgraphs.
– Doctor ↔ Hospitals, Patients ↔ Visits.
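
One simple way to picture custom placement is a function that assigns every object to a partition by a grouping key, so that logically related objects land together. The sketch below is an assumption-laden illustration, not InfiniteGraph's placement model: `Placement`, `partitionFor` and `crossesPartitions` are hypothetical names, and hashing a hospital or patient id is just one possible grouping rule.

```java
// Hypothetical illustration only; not InfiniteGraph's placement API.
class Placement {
    // Map a grouping key (for example a hospital id) to a partition index.
    static int partitionFor(String groupKey, int partitionCount) {
        return Math.floorMod(groupKey.hashCode(), partitionCount);
    }

    // An edge needs a network call when its two endpoints live on different partitions.
    static boolean crossesPartitions(String groupKeyA, String groupKeyB, int partitionCount) {
        return partitionFor(groupKeyA, partitionCount) != partitionFor(groupKeyB, partitionCount);
    }

    public static void main(String[] args) {
        int partitions = 4;
        // A doctor and a hospital placed by the same hospital id are always co-located.
        System.out.println(crossesPartitions("hospital-17", "hospital-17", partitions)); // prints false
        // A patient and one of that patient's visits, both placed by the patient id.
        System.out.println(crossesPartitions("patient-42", "patient-42", partitions)); // prints false
    }
}
```

Traversals that stay inside one group (Doctor ↔ Hospital, Patient ↔ Visit) then touch a single partition; only edges between groups can cross the network.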

Distributed Navigation Engine
• Google Pregel (2010)
– Batch algorithms on large graphs
– Avoids passing graph state; sends messages instead
– Apache Giraph, JPregel, Hama
while any vertex is active and max iterations not reached:
    for each vertex:  // this loop is run in parallel
        process messages from neighbors (update internal state)
        send messages to neighbors
        possibly synchronize results
        set active flag (unless no messages or state doesn’t change)
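
The loop above can be sketched in a few lines of Java. This is a single-threaded toy under stated assumptions, not Pregel, Giraph or InfiniteGraph code: `MiniPregel` and `propagateMax` are hypothetical names, the per-vertex work runs sequentially rather than in parallel, and the vertex program is the classic "propagate the maximum value" example. A vertex that receives no message, or whose state does not change, sends nothing and is therefore inactive in the next superstep.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; sequential stand-in for a parallel engine.
class MiniPregel {
    static Map<String, Integer> propagateMax(Map<String, Integer> initial,
                                             Map<String, List<String>> neighbors,
                                             int maxIterations) {
        Map<String, Integer> state = new TreeMap<>(initial);

        // Superstep 0: every vertex sends its value to its neighbors.
        Map<String, List<Integer>> inbox = new HashMap<>();
        for (Map.Entry<String, Integer> v : state.entrySet()) {
            for (String n : neighbors.getOrDefault(v.getKey(), List.of())) {
                inbox.computeIfAbsent(n, k -> new ArrayList<>()).add(v.getValue());
            }
        }

        // Later supersteps: only vertices with pending messages are active.
        for (int i = 0; i < maxIterations && !inbox.isEmpty(); i++) {
            Map<String, List<Integer>> nextInbox = new HashMap<>();
            for (Map.Entry<String, List<Integer>> e : inbox.entrySet()) {
                String vertex = e.getKey();
                int best = Collections.max(e.getValue());
                if (best > state.getOrDefault(vertex, Integer.MIN_VALUE)) {
                    state.put(vertex, best); // update internal state
                    for (String n : neighbors.getOrDefault(vertex, List.of())) {
                        nextInbox.computeIfAbsent(n, k -> new ArrayList<>()).add(best);
                    }
                }
                // Otherwise the state did not change: the vertex stays silent.
            }
            inbox = nextInbox;
        }
        return state;
    }

    public static void main(String[] args) {
        Map<String, Integer> values = Map.of("a", 3, "b", 6, "c", 2);
        Map<String, List<String>> neighbors = Map.of(
            "a", List.of("b"),
            "b", List.of("a", "c"),
            "c", List.of("b"));
        System.out.println(propagateMax(values, neighbors, 10)); // prints {a=6, b=6, c=6}
    }
}
```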

Distributed Navigation Engine
• Pregel is optimized for large distributed graph analytics
• Limitation on Pregel logic: Even when the traversal is
occurring locally, the logic still executes by sending
messages from vertex to vertex
• Ideally, when local, the traversal should be executed in
memory and when remote, Pregel logic should be used.
– InfiniteGraph’s Distributed Navigation Engine uses the
QueryServer (oqs) to achieve this optimized behavior.

Supernodes
• A supernode is a vertex with a disproportionately
high number of outgoing edges.
– Inefficient to traverse through these vertices

Supernodes (Avoid the Tonight Show!)
In the IMDB data set, some examples of supernodes may be talk
shows, awards shows, compilations or variety shows.

Supernodes:
GraphViews and Policies
• With InfiniteGraph, we offer two strategies for
addressing the supernode problem within the
navigation context.
– Use GraphViews to filter out vertex or edge types
– Globally limit the number of edges traversed using the
FanoutLimitPolicy

Supernodes:
GraphViews and Policies
• Consider calculating the number of links to interesting
companies on LinkedIn.
– If you are connected to recruiters, the navigation can
be slowed down and the result set possibly polluted
when traversing through these recruiters.
GraphView myView = new GraphView();
// Exclude any Person whose profession contains 'recruiter'
myView.excludeClass(myGraphDb.getTypeId(Person.class.getName()),
    "CONTAINS(profession, 'recruiter')");
PolicyChain chain = new PolicyChain();
// Limits # of edges traversed to 10
chain.addPolicy(new FanoutLimitPolicy(10));
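
As a rough picture of what a fanout limit buys, the sketch below walks a plain adjacency map and follows at most a fixed number of outgoing edges from each vertex. It rests on an assumption: that the limit applies per vertex. The exact semantics of `FanoutLimitPolicy` are not spelled out here, and `FanoutLimitedWalk` and `reachable` are hypothetical names, not InfiniteGraph API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; assumes the limit is applied per vertex.
class FanoutLimitedWalk {
    static Set<String> reachable(Map<String, List<String>> adj, String start, int fanoutLimit) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String vertex = queue.remove();
            List<String> out = adj.getOrDefault(vertex, List.of());
            // Follow only the first fanoutLimit edges, so a supernode cannot flood the walk.
            for (String next : out.subList(0, Math.min(fanoutLimit, out.size()))) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> adj = Map.of("hub", List.of("a", "b", "c", "d"));
        System.out.println(reachable(adj, "hub", 2)); // prints [hub, a, b]
    }
}
```

The trade-off is completeness: edges beyond the limit are never explored, so results reachable only through them are missed.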

Supernodes:
Edge Discovery Methods
• If walking the graph, edge discovery methods
available on the vertex API allow for easy lookup.
Vertex start = …; // lookup by query or index
// Get all 'Facebook' connections
EdgePredicate edgeQualifier = new EdgePredicate(Knows.class, "how == 'Facebook'");
Iterable edgeHandles = start.getEdges(edgeQualifier);

• More edge discovery methods and optimizations
are coming!

Writing Distributed Data
[Diagram: App-1 (ingest V1, E12 {V1→V2}), App-2 (ingest V2, E23 {V2→V3}) and App-3 (ingest V3) write through InfiniteGraph to the Objectivity/DB Persistence Layer, which holds V1, E12, V2, E23 and V3]

Writing Distributed Data
• Concurrent writes (multithreaded, multiprocess
and/or multiuser access) to a database that holds
highly connected data lead to:
– highly contentious locking behavior
– poor write performance from retrying transactions

• NoSQL databases with relaxed consistency modes
typically offer higher write performance
– System maintains data integrity (ACID), handles lock
conflicts, optimizes batch processing

Writing Distributed Data:
Accelerated Ingest (Pipelining)
• InfiniteGraph offers a relaxed-consistency ingest
mode, Accelerated Ingest.
– Vertex, Edge objects are placed immediately
– Edge updates are “pipelined” (no lock contention) and
updates are batch processed (optimized)
– Graph is built up in background
– Achieves highest rate of ingest in distributed
environments
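
The pipelining idea can be sketched as a queue that ingest threads append to without locking any vertex, plus a background step that drains the queue and applies the updates grouped by vertex. This is a simplified illustration under stated assumptions, not InfiniteGraph's Accelerated Ingest: `EdgePipeline`, `submit` and `applyBatch` are hypothetical names, and an in-memory map stands in for the persistent containers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration only; not the Accelerated Ingest implementation.
class EdgePipeline {
    static final class EdgeUpdate {
        final int from;
        final int to;

        EdgeUpdate(int from, int to) {
            this.from = from;
            this.to = to;
        }
    }

    // Pending edge updates; appending never blocks on a vertex lock.
    private final Queue<EdgeUpdate> pipeline = new ConcurrentLinkedQueue<>();
    // The "built" graph: vertex id -> connected vertex ids.
    private final Map<Integer, List<Integer>> adjacency = new TreeMap<>();

    // Called by ingest threads: enqueue only.
    void submit(int from, int to) {
        pipeline.add(new EdgeUpdate(from, to));
    }

    // Called by a single background agent: drain, group by source vertex, apply once per vertex.
    int applyBatch() {
        Map<Integer, List<Integer>> grouped = new TreeMap<>();
        int count = 0;
        EdgeUpdate update;
        while ((update = pipeline.poll()) != null) {
            grouped.computeIfAbsent(update.from, k -> new ArrayList<>()).add(update.to);
            count++;
        }
        for (Map.Entry<Integer, List<Integer>> e : grouped.entrySet()) {
            adjacency.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
        }
        return count;
    }

    List<Integer> neighbors(int vertex) {
        return adjacency.getOrDefault(vertex, List.of());
    }

    public static void main(String[] args) {
        EdgePipeline ingest = new EdgePipeline();
        ingest.submit(1, 2);
        ingest.submit(2, 3);
        ingest.submit(1, 3);
        System.out.println(ingest.neighbors(1)); // prints [] (graph not built yet)
        ingest.applyBatch();
        System.out.println(ingest.neighbors(1)); // prints [2, 3]
    }
}
```

The gap between `submit` and `applyBatch` is the relaxed consistency: edges become navigable only after the background step has run.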

Writing Distributed Data:
Accelerated Ingest (Pipelining)
[Diagram: the IG Core/API places edges E12 and E23 in Target Containers; edge updates such as E(1->2), E(2->3), E(3->1), E(2->1) and E(3->2) flow through the Pipeline into Pipeline Containers C1, C2 and C3, which are processed by an Agent]

Accelerated Ingest Performance Results

Tools
• Typically, when databases don’t offer tools for
analysis or visualization, the tools that are used are
general purpose.
• Tools offered by databases are generally integrated
well with native features.
– Sometimes exposing “hidden” features
– These tools can generally be useful for debugging and
development of applications built on top of the database.

Tools: The IG Visualizer
• Excellent for development and debugging of
applications built on top of an IG database.

Why InfiniteGraph™?

• Objectivity/DB is a proven foundation
– Building distributed databases since 1993
– A complete database management system
• Concurrency, transactions, cache, schema, query, indexing

• It’s a Graph Specialist!
– Simple but powerful API tailored for data navigation.
– Easy to configure distribution model
