managing big data

Introduction to Neo4j
By: Suveeksha Jain
M.Tech, 2nd semester

Agenda
• Trends in Data
• NoSQL
• What is a Graph?
• What is a Graph Database?
• What is Neo4j?

Data is getting bigger:
“Every 2 days we create as much information as we did up to 2003”
– Eric Schmidt, Google

Less than 10% of the NoSQL Vendors

Types of NoSQL Databases
• Key-value database
• Document database
• Column family stores
• Graph database

Key Value Stores
• Mostly based on Dynamo: Amazon’s Highly Available Key-Value Store
• Data Model:
– Global key-value mapping
– Big scalable HashMap
– Highly fault tolerant (typically)
• Examples:
– Redis, Riak, Voldemort.

Key Value Stores: Pros and Cons
• Pros:
– Simple data model
– Scalable
• Cons
– Create your own “foreign keys”
– Poor for complex data
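To make the “create your own foreign keys” drawback concrete, here is a minimal Python sketch, assuming a plain dict stands in for the store and that only get/put by key are available. The `user:N` keys, the `friends` field and the helper names are hypothetical.

```python
from __future__ import annotations

# A plain dict stands in for a Dynamo-style key-value store (assumption:
# the only operations are put/get by key).
store: dict[str, dict] = {}


def put(key: str, value: dict) -> None:
    store[key] = value


def get(key: str) -> dict | None:
    return store.get(key)


# Relationships are kept by hand as lists of other keys ("foreign keys");
# nothing in the store checks that those keys actually exist.
put("user:1", {"name": "Alice", "friends": ["user:2"]})
put("user:2", {"name": "Bob", "friends": ["user:1", "user:3"]})
put("user:3", {"name": "Carol", "friends": ["user:2"]})


def friends_of_friends(key: str) -> set[str]:
    """Follow the hand-made keys; every hop is another get() round trip."""
    names: set[str] = set()
    for friend_key in get(key)["friends"]:
        for fof_key in get(friend_key)["friends"]:
            if fof_key != key:
                names.add(get(fof_key)["name"])
    return names


print(friends_of_friends("user:1"))  # {'Carol'}
```

The application, not the store, is responsible for keeping these key lists consistent, which is why this model gets awkward as data becomes more interconnected.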

Column Family
• Mostly based on BigTable: Google’s Distributed Storage System for Structured Data
• Data Model:
– A big table, with column families
– Map Reduce for querying/processing
• Examples:
– HBase, HyperTable, Cassandra
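A rough Python sketch of the column-family data model, assuming a nested dict stands in for the table (row key, then column family, then column). The family names, row keys and helpers are hypothetical; real stores such as HBase add versioned cells, persistence and distribution.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical BigTable-style table: row key -> column family -> column -> value.
# Families are declared up front; columns inside a family are free-form,
# so rows can be semi-structured.
COLUMN_FAMILIES = {"info", "visits"}
table: dict[str, dict[str, dict[str, str]]] = defaultdict(lambda: defaultdict(dict))


def put(row: str, family: str, column: str, value: str) -> None:
    if family not in COLUMN_FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    table[row][family][column] = value


def get_family(row: str, family: str) -> dict[str, str]:
    # .get() avoids creating empty rows as a side effect of reading.
    return dict(table.get(row, {}).get(family, {}))


put("com.example/home", "info", "title", "Home")
put("com.example/home", "visits", "2014-01-01", "120")
put("com.example/home", "visits", "2014-01-02", "95")
put("com.example/about", "info", "title", "About")  # no "visits" columns at all

# Stores of this kind keep rows ordered by row key, which makes range scans cheap.
for row in sorted(table):
    print(row, get_family(row, "info"))
```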

Column Family: Pros and Cons
• Pros:
– Supports Semi-Structured Data
– Naturally Indexed (columns)
– Scalable
• Cons
– Poor for interconnected data

Document Databases
• Data Model:
– A collection of documents
– A document is a key value collection
– Index-centric, lots of map-reduce
• Examples:
– CouchDB, MongoDB

Document Databases: Pros and Cons
• Pros:
– Simple, powerful data model
– Scalable
• Cons
– Poor for interconnected data
– Query model limited to keys and indexes
– Map reduce for larger queries
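The document model and its main limitation (queries go through keys or prepared indexes) can be sketched in Python as follows. This assumes an in-memory dict of documents; the field names and the `build_index` helper are hypothetical, not any product's API.

```python
from __future__ import annotations

from collections import defaultdict

# Each document is itself a key-value collection, stored under an id.
# Documents need not share a schema ("tags" appears only once).
collection: dict[str, dict] = {
    "u1": {"name": "Alice", "city": "Delhi", "tags": ["graph"]},
    "u2": {"name": "Bob", "city": "Pune"},
    "u3": {"name": "Carol", "city": "Delhi"},
}


def build_index(field: str) -> dict[object, list[str]]:
    """Secondary index: field value -> ids of the documents holding it."""
    index: dict[object, list[str]] = defaultdict(list)
    for doc_id, doc in collection.items():
        if field in doc:
            index[doc[field]].append(doc_id)
    return index


city_index = build_index("city")

# Lookups go through the primary key or a prepared index; any other
# question means scanning every document (map-reduce, at scale).
print(collection["u2"]["name"])                              # Bob
print([collection[i]["name"] for i in city_index["Delhi"]])  # ['Alice', 'Carol']
```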

Graph Databases
• Data Model:
– Nodes and Relationships
• Examples:
– Neo4j, OrientDB, InfiniteGraph, AllegroGraph

Graph Databases: Pros and Cons
• Pros:
– Powerful data model, as general as RDBMS
– Connected data locally indexed
– Easy to query
• Cons
– Sharding (lots of people are working on this)
• Scales up (vertically) reasonably well

What is a Graph?
• An abstract representation of a set of objects where some pairs are connected by links.
Object (Vertex, Node)
Link (Edge, Arc, Relationship)
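A graph in this sense can be held in memory as an adjacency list. The Python sketch below assumes an undirected graph with made-up vertex names.

```python
from __future__ import annotations

# Objects (vertices) and the links (edges) between some pairs of them.
vertices = {"A", "B", "C", "D"}
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]

# Adjacency-list form: each vertex maps to the set of vertices it links to.
adjacency: dict[str, set[str]] = {v: set() for v in vertices}
for u, v in edges:
    adjacency[u].add(v)
    adjacency[v].add(u)  # undirected: a link can be followed both ways

print(sorted(adjacency["C"]))  # ['A', 'B', 'D']
```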

Different Kinds of Graphs
• Undirected Graph
• Directed Graph
• Pseudo Graph
• Multi Graph
• Hyper Graph

More Kinds of Graphs
• Weighted Graph
• Labeled Graph
• Property Graph
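Several of these kinds differ mainly in how the same edges are read. The Python sketch below, using a hypothetical edge list, contrasts directed, undirected, multi- and weighted readings; pseudo-, hyper-, labeled and property graphs are not covered here.

```python
from collections import Counter

# Hypothetical edges as (source, target, weight); A->B appears twice.
edges = [("A", "B", 2.0), ("A", "B", 5.0), ("B", "C", 1.0), ("C", "A", 4.0)]

# Directed graph: an edge belongs to its source only (out-degree).
out_degree = Counter(src for src, _, _ in edges)

# Undirected graph: an edge touches both endpoints.
degree = Counter()
for src, dst, _ in edges:
    degree[src] += 1
    degree[dst] += 1

# Multigraph: parallel edges between the same pair are kept.
parallel = Counter((src, dst) for src, dst, _ in edges)

# Weighted graph: each edge carries a number, e.g. a cost or distance.
cheapest_a_to_b = min(w for src, dst, w in edges if (src, dst) == ("A", "B"))

print(out_degree["A"], degree["A"], parallel[("A", "B")], cheapest_a_to_b)  # 2 3 2 2.0
```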

What is a Graph Database?
• A database with an explicit graph structure
• Each node knows its adjacent nodes
• As the number of nodes increases, the cost of a local step (or hop) remains the same
• Plus an Index for lookups
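The “each node knows its adjacent nodes” idea (often called index-free adjacency) and the separate lookup index can be sketched in Python as below. This is an illustration under simplifying assumptions, not how Neo4j lays out data on disk; the class and helper names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    # Direct references to neighbouring nodes; repr=False keeps the
    # a <-> b reference cycle out of the printed representation.
    neighbours: list[Node] = field(default_factory=list, repr=False)


# The index is only used to find where a traversal starts.
index: dict[str, Node] = {}


def add_node(name: str) -> Node:
    node = Node(name)
    index[name] = node
    return node


def connect(a: Node, b: Node) -> None:
    a.neighbours.append(b)
    b.neighbours.append(a)


alice, bob, carol = add_node("Alice"), add_node("Bob"), add_node("Carol")
connect(alice, bob)
connect(bob, carol)

# Adding many unrelated nodes does not change what one hop from Bob costs:
# it walks Bob's own neighbour list, not the whole graph.
for i in range(10_000):
    add_node(f"user{i}")

start = index["Bob"]                         # lookup via the index
print([n.name for n in start.neighbours])    # ['Alice', 'Carol']
```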

Compared to Relational Databases
• Relational databases: optimized for aggregation
• Graph databases: optimized for connections

Compared to Key Value Stores
• Key-value stores: optimized for simple look-ups
• Graph databases: optimized for traversing connected data

Compared to Document Stores
• Document stores: optimized for “trees” of data
• Graph databases: optimized for seeing the forest and the trees, and the branches, and the trunks

What is Neo4j?
• A Graph Database + Lucene Index
• Full ACID (atomicity, consistency, isolation, durability)
• A schema-free labeled Property Graph
• High Availability (with Enterprise Edition)
• Perfect for complex, highly connected data

The property graph model
• Core abstractions:
– Nodes
– Relationships between nodes
– Properties on both
• Traversal framework
– High performance queries on connected data sets
(Diagram: a node with name = “Jordi” and age = 29, a KNOWS relationship with time = 4 years, and a node with type = car, vendor = “Honda”, model = “Civic”)
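The property graph model can be mirrored with two small Python classes. The property values come from the slide; the labels (`Person`, `Car`), the second person `Alex` and the `outgoing` helper are hypothetical additions for the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)
class Node:
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass(eq=False)
class Relationship:
    type: str            # every relationship has exactly one type
    start: Node          # and a direction, from start to end
    end: Node
    properties: dict[str, Any] = field(default_factory=dict)


# Property values taken from the slide; labels and "Alex" are hypothetical.
jordi = Node({"Person"}, {"name": "Jordi", "age": 29})
alex = Node({"Person"}, {"name": "Alex"})
car = Node({"Car"}, {"type": "car", "vendor": "Honda", "model": "Civic"})
relationships = [Relationship("KNOWS", jordi, alex, {"time": "4 years"})]


def outgoing(node: Node, rel_type: str) -> list[Relationship]:
    # Linear scan for brevity; a graph store keeps these lists per node.
    return [r for r in relationships if r.start is node and r.type == rel_type]


for rel in outgoing(jordi, "KNOWS"):
    print(rel.end.properties["name"], rel.properties["time"])  # Alex 4 years
```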

Good For
• Highly connected data (social networks)
• Recommendations (e-commerce)
• Path Finding (how do I know you?)
• A* (Least Cost path)
• Data First Schema (bottom-up, but you still need to design)
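Path finding of the “how do I know you?” kind is, at its simplest, a breadth-first search for the fewest hops. A minimal Python sketch over a hypothetical who-knows-whom adjacency list:

```python
from __future__ import annotations

from collections import deque


def shortest_path(graph: dict[str, list[str]], start: str, goal: str) -> list[str] | None:
    """Breadth-first search: the path with the fewest hops, or None."""
    previous: dict[str, str | None] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:       # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:  # visit each person once
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical KNOWS relationships, listed in both directions.
knows = {
    "You": ["Ana"],
    "Ana": ["You", "Ben"],
    "Ben": ["Ana", "Me"],
    "Me": ["Ben"],
}

print(shortest_path(knows, "Me", "You"))  # ['Me', 'Ben', 'Ana', 'You']
```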

Ten Reasons for Choosing Neo4j
• World’s Best and First Graph Database
• Biggest and Most Active Graph Community on the Planet
– 1,000,000+ downloads, adding 50,000 downloads per month
– 20,000+ graph education registrants
– 20,000+ Meetup members
– 500+ Neo4j events per year
– 100+ technology and service partners
– 200 enterprise subscription customers, including 50+ of the Global 2000
• Highly Performant Read and Write Scalability, Without Compromise
• High Performance Thanks to Native Graph Storage & Processing
• Easy to Learn
• Easy to Use
• Easier than Ever to Load Your Data into Neo4j
• Whiteboard-friendly Data Modeling to Simplify the Development Cycle
• Superb Value for Enterprise and Startup Projects

// then traverse to find results
// find "Andreas", then every node two relationships away (friend of a friend)
MATCH (n {name: "Andreas"})--()--(foaf)
RETURN foaf

Cypher
Pattern Matching Query Language (like SQL for graphs)
// get node 0
MATCH (a) WHERE id(a) = 0 RETURN a
// traverse from node 1
MATCH (a)-->(b) WHERE id(a) = 1 RETURN b
// return friends of friends
MATCH (a)--()--(c) WHERE id(a) = 1 RETURN c
// note: Neo4j 5 deprecates id() in favour of elementId()
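The friends-of-friends pattern follows a matching rule worth knowing: within one MATCH pattern, Cypher never uses the same relationship twice, but it does not drop people who are also direct friends. The Python sketch below imitates the undirected pattern `(a)--()--(c)` over a hypothetical relationship list to show which rows come back; it is an illustration, not Neo4j's matcher.

```python
# Hypothetical relationships as (id, start node, end node):
# a-b, b-c and a-c, so c is both a friend and a friend of a friend of a.
relationships = [(1, "a", "b"), (2, "b", "c"), (3, "a", "c")]


def other_end(rel, node):
    _, start, end = rel
    return end if start == node else start


def friends_of_friends(a):
    """Imitate MATCH (a)--()--(c) RETURN c, ignoring direction."""
    rows = []
    for r1 in relationships:
        if a not in r1[1:]:
            continue
        middle = other_end(r1, a)
        for r2 in relationships:
            # The same relationship may not be used twice in one pattern.
            if r2 is r1 or middle not in r2[1:]:
                continue
            rows.append(other_end(r2, middle))
    return rows  # one row per matching path, duplicates included


print(friends_of_friends("a"))  # ['c', 'b']
```

Both of a's direct friends come back here, each reached through the other; excluding direct friends needs an extra condition in the query.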

Social data (customer: brand-name social network)
(Diagram: person nodes such as Mike (age = 29, disclosure = public), Charlie Runkle, Dani California (age = 27), Hank Moody (age = 42), Karen and Marcy Runkle, connected by KNOWS relationships)

Spatial data (customer: large telecom company)
(Diagram: place nodes such as Omni Hotel, Swedland and The Tavern with lat and long properties, connected by ROAD relationships with a length property such as 3 miles or 7 miles)
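For the least-cost (A*) case listed under “Good For”, here is a compact Python sketch over a road network like this one. The network, its lengths in miles and the heuristic estimates are hypothetical; the heuristic must never overestimate the remaining distance, and with a zero heuristic the search reduces to Dijkstra's algorithm.

```python
from __future__ import annotations

import heapq
from typing import Callable


def a_star(roads: dict[str, list[tuple[str, float]]], start: str, goal: str,
           heuristic: Callable[[str], float]) -> tuple[float, list[str]] | None:
    """Least-cost path from start to goal, or None if goal is unreachable.

    heuristic(node) estimates the remaining cost and must never overestimate it.
    """
    frontier = [(heuristic(start), 0.0, start)]   # (estimate, cost so far, node)
    best_cost = {start: 0.0}
    previous: dict[str, str | None] = {start: None}
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if cost > best_cost[node]:
            continue  # stale entry: a cheaper route to this node was found later
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return cost, path[::-1]
        for neighbour, length in roads.get(node, []):
            new_cost = cost + length
            if new_cost < best_cost.get(neighbour, float("inf")):
                best_cost[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(frontier, (new_cost + heuristic(neighbour), new_cost, neighbour))
    return None


# Hypothetical road network: lengths in miles, roads usable both ways.
roads = {
    "A": [("B", 7.0), ("C", 3.0)],
    "B": [("A", 7.0), ("D", 1.0)],
    "C": [("A", 3.0), ("D", 6.0)],
    "D": [("B", 1.0), ("C", 6.0)],
}
# Hypothetical straight-line estimates to D (never more than the true distance).
estimate = {"A": 5.0, "B": 1.0, "C": 4.0, "D": 0.0}

print(a_star(roads, "A", "D", lambda n: estimate[n]))  # (8.0, ['A', 'B', 'D'])
```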

Social AND spatial data
(Diagram: person nodes Pere (beer_qual = expert), Maria (age = 30, beer_qual = non-existent) and Jordi, LIKES relationships, and place nodes Omni Hotel and The Tavern with lat and long properties, connected by ROAD relationships carrying length and weight properties)

Query structure
MATCH (n:Label)-[:REL]->(m:Label)
WHERE n.prop < 42
WITH n, count(m) AS cnt,
     collect(m.attr) AS attrs
WHERE cnt > 12
RETURN n.prop,
       // extract() and filter() were removed in Neo4j 4.0;
       // a list comprehension does both in one step
       [a IN attrs WHERE a =~ "...-.*" | substring(a, 4)]
       AS ids
ORDER BY size(ids) DESC
LIMIT 10

MATCH
describes the pattern

WHERE
filters the result set

RETURN
returns the result rows

ORDER BY, SKIP, LIMIT
sort and paginate

WITH
combines query parts, like a pipe

Collections
powerful data structure handling
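The example query in “Query structure” can be read as a pipeline. The Python sketch below mirrors it step by step over pairs that MATCH would already have produced; the `id`, `prop` and `attr` keys and the `run_query` helper are hypothetical, and the point is the order of operations, especially that the second WHERE filters the rows that WITH passes along.

```python
import re


def run_query(matches, min_count=12, limit=10):
    """Mimic the example query over (n, m) pairs already found by MATCH."""
    # WHERE n.prop < 42 filters individual matched rows.
    rows = [(n, m) for n, m in matches if n["prop"] < 42]

    # WITH n, count(m) AS cnt, collect(m.attr) AS attrs groups rows per n.
    groups = {}
    for n, m in rows:
        entry = groups.setdefault(n["id"], [n, 0, []])
        entry[1] += 1
        if m.get("attr") is not None:  # collect() skips nulls
            entry[2].append(m["attr"])

    # WHERE cnt > 12 filters the aggregated rows that WITH piped along.
    kept = [(n, attrs) for n, cnt, attrs in groups.values() if cnt > min_count]

    # RETURN n.prop, [a IN attrs WHERE a =~ "...-.*" | substring(a, 4)] AS ids
    # (=~ must match the whole string, hence re.fullmatch).
    result = [
        (n["prop"], [a[4:] for a in attrs if re.fullmatch(r"...-.*", a)])
        for n, attrs in kept
    ]

    # ORDER BY size(ids) DESC LIMIT 10
    result.sort(key=lambda row: len(row[1]), reverse=True)
    return result[:limit]
```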

CREATE
creates nodes, relationships and patterns
CREATE (y:Year {year:2014})
FOREACH (m IN range(1,12) |
CREATE
(:Month {month:m})-[:IN]->(y)
)

MERGE
matches or creates
MERGE (y:Year {year:2014})
ON CREATE
SET y.created = timestamp()
FOREACH (m IN range(1,12) |
MERGE
(:Month {month:m})-[:IN]->(y)
)
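The practical difference between CREATE and MERGE is idempotency. A minimal Python sketch over a hypothetical in-memory node list, simplified to single nodes (a pattern MERGE, like the Month one above, matches or creates the whole pattern):

```python
from __future__ import annotations

import time

nodes: list[dict] = []


def create_node(label: str, props: dict) -> dict:
    """CREATE: always adds a new node."""
    node = {"label": label, "props": dict(props)}
    nodes.append(node)
    return node


def merge_node(label: str, props: dict, on_create: dict | None = None) -> dict:
    """MERGE: reuse a node matching label and props, else create it."""
    for node in nodes:
        if node["label"] == label and all(
            node["props"].get(k) == v for k, v in props.items()
        ):
            return node                           # matched: ON CREATE is not run
    node = create_node(label, props)
    node["props"].update(on_create or {})         # ON CREATE SET ...
    return node


# Cypher's timestamp() is milliseconds since the epoch.
first = merge_node("Year", {"year": 2014}, {"created": int(time.time() * 1000)})
again = merge_node("Year", {"year": 2014}, {"created": 0})
print(first is again, len(nodes))  # True 1
```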

SET, REMOVE
update attributes and labels
MATCH (year:Year)
WHERE (year.year % 4 = 0 AND year.year % 100 <> 0) OR
year.year % 400 = 0
SET year:Leap
WITH year
MATCH (year)<-[:IN]-(feb:Month {month:2})
SET feb.days = 29
CREATE (feb)<-[:IN]-(:Day {day:29})
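The WHERE clause has to express the Gregorian leap-year rule. In Cypher, as in Python, AND binds more tightly than OR, so the parentheses matter. A small Python sketch of the rule, useful for checking which Year nodes should get the Leap label:

```python
def is_leap(year: int) -> bool:
    # Gregorian rule: every 4th year, except century years,
    # unless the century year is divisible by 400.
    return (year % 4 == 0 and year % 100 != 0) or year % 400 == 0


print([y for y in (1900, 2000, 2012, 2014, 2016) if is_leap(y)])  # [2000, 2012, 2016]
```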
