7. Types of NoSQL Databases
• Key-value database
• Document database
• Column family stores
• Graph database
8. Key Value Stores
• Most are based on Dynamo: Amazon’s Highly
Available Key-Value Store
• Data Model:
– Global key-value mapping
– Big scalable HashMap
– Highly fault tolerant (typically)
• Examples:
– Redis, Riak, Voldemort.
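The "big scalable HashMap" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `TinyKVStore`, the fixed partition count, and the hash-based routing are hypothetical and do not describe how Redis, Riak, or Voldemort are actually implemented.

```python
import hashlib


class TinyKVStore:
    """Toy key-value store: one global key -> value mapping split over partitions."""

    def __init__(self, partitions=4):
        # Each partition stands in for a separate server (assumption for illustration).
        self._partitions = [dict() for _ in range(partitions)]

    def _partition_for(self, key):
        # Route a key to a partition using a stable hash of the key.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self._partitions[int(digest, 16) % len(self._partitions)]

    def put(self, key, value):
        self._partition_for(key)[key] = value

    def get(self, key, default=None):
        return self._partition_for(key).get(key, default)


store = TinyKVStore()
store.put("user:42", {"name": "Jordi", "age": 29})
print(store.get("user:42"))
```

Because every operation touches exactly one key, adding partitions spreads the load without changing the interface, which is roughly why this model scales so easily.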
10. Key Value Stores: Pros and Cons
• Pros:
– Simple data model
– Scalable
• Cons
– Create your own “foreign keys”
– Poor for complex data
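A minimal sketch of what "create your own foreign keys" tends to mean in practice, using a plain Python dict in place of a real store. The key names (`user:1`, `car:7`) and the `car_key` field are made up for illustration.

```python
# Plain dict standing in for a key-value store (keys and values are made up).
kv = {
    "user:1": {"name": "Jordi", "car_key": "car:7"},
    "car:7": {"vendor": "Honda", "model": "Civic"},
}


def car_of(user_key):
    """Follow the hand-made 'foreign key' with a second lookup."""
    user = kv[user_key]
    # The store itself does not check that car_key points at an existing entry.
    return kv.get(user["car_key"])


print(car_of("user:1"))
```

The application, not the store, is responsible for keeping `car_key` valid, and every extra hop costs another round trip.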
11. Column Family
• Most are based on Bigtable: Google’s Distributed
Storage System for Structured Data
• Data Model:
– A big table, with column families
– Map Reduce for querying/processing
• Examples:
– HBase, Hypertable, Cassandra
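One way to picture the "big table with column families" is as a nested map: row key, then column family, then column name. The sketch below assumes that simplified view; the helper names `put` and `get_family` and the sample rows are hypothetical, and real systems add timestamps, on-disk sorting, and distribution.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    table[row_key][family][column] = value


def get_family(row_key, family):
    """Return all columns of one family for a row (empty dict if absent)."""
    row = table.get(row_key, {})
    return dict(row.get(family, {}))


put("user1", "profile", "name", "Jordi")
put("user1", "profile", "age", 29)
put("user2", "profile", "name", "Maria")  # rows need not share the same columns

print(get_family("user1", "profile"))
print(get_family("user2", "profile"))
```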
13. Column Family: Pros and Cons
• Pros:
– Supports Semi-Structured Data
– Naturally Indexed (columns)
– Scalable
• Cons
– Poor for interconnected data
14. Document Databases
• Data Model:
– A collection of documents
– A document is a key-value collection
– Index-centric, lots of map-reduce
• Examples:
– CouchDB, MongoDB
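The document model and its reliance on indexes can be sketched as follows. This is a toy illustration, not the CouchDB or MongoDB API: the `insert` and `find_by_city` helpers, the `city` field, and the sample documents are all assumptions.

```python
from collections import defaultdict

collection = {}                    # document id -> document (a dict)
index_by_city = defaultdict(set)   # secondary index: city -> document ids


def insert(doc_id, document):
    collection[doc_id] = document
    # Keep the index in step with the collection.
    if "city" in document:
        index_by_city[document["city"]].add(doc_id)


def find_by_city(city):
    """Answer the query from the index instead of scanning every document."""
    ids = index_by_city.get(city, set())
    return [collection[doc_id] for doc_id in sorted(ids)]


insert("d1", {"name": "Pere", "city": "Barcelona"})
insert("d2", {"name": "Maria", "city": "Barcelona", "age": 30})
insert("d3", {"name": "Jordi"})  # documents may omit fields

print([doc["name"] for doc in find_by_city("Barcelona")])
```

Queries on fields that have no index fall back to scanning or to a map-reduce job, which is the limitation the cons list refers to.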
16. Document Databases: Pros and Cons
• Pros:
– Simple, powerful data model
– Scalable
• Cons
– Poor for interconnected data
– Query model limited to keys and indexes
– Map reduce for larger queries
17. Graph Databases
• Data Model:
– Nodes and Relationships
• Examples:
– Neo4j, OrientDB, InfiniteGraph, AllegroGraph
18. Graph Databases: Pros and Cons
• Pros:
– Powerful data model, as general as RDBMS
– Connected data locally indexed
– Easy to query
• Cons
– Sharding (lots of people working on this)
• Scales up reasonably well
20. What is a Graph?
• An abstract representation of a set of objects
where some pairs are connected by links.
Object (Vertex, Node)
Link (Edge, Arc, Relationship)
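The definition translates directly into two sets: one of objects and one of links between pairs of them. A minimal Python sketch with made-up vertex names:

```python
# Vertices (objects) and edges (links) of a small undirected graph.
vertices = {"A", "B", "C", "D"}
edges = {("A", "B"), ("B", "C"), ("C", "A")}  # D has no links


def neighbours(vertex):
    """Return every vertex that shares an edge with the given one."""
    linked = set()
    for left, right in edges:
        if left == vertex:
            linked.add(right)
        elif right == vertex:
            linked.add(left)
    return linked


print(sorted(neighbours("A")))  # ['B', 'C']
print(sorted(neighbours("D")))  # []
```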
21. Different Kinds of Graphs
• Undirected Graph
• Directed Graph
• Pseudo Graph
• Multi Graph
• Hyper Graph
22. More Kinds of Graphs
• Weighted Graph
• Labeled Graph
• Property Graph
23. What is a Graph Database?
• A database with an explicit graph structure
• Each node knows its adjacent nodes
• As the number of nodes increases, the cost of
a local step (or hop) remains the same
• Plus an Index for lookups
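The points above can be sketched in Python: each node keeps direct references to its adjacent nodes, and an index is used only to find a starting point. The class and helper names and the sample people are hypothetical, and the sketch says nothing about how any particular graph database stores data on disk.

```python
class Node:
    """A node that holds direct references to its adjacent nodes."""

    def __init__(self, name):
        self.name = name
        self.adjacent = []  # references, not keys that need another lookup


index = {}  # name -> Node; used only to find where a traversal starts


def add_node(name):
    node = Node(name)
    index[name] = node
    return node


def connect(a, b):
    a.adjacent.append(b)
    b.adjacent.append(a)


alice, bob, carol = add_node("alice"), add_node("bob"), add_node("carol")
connect(alice, bob)
connect(bob, carol)

start = index["bob"]                           # one index lookup
print(sorted(n.name for n in start.adjacent))  # then only local hops
```

Following `start.adjacent` costs the same whether the graph holds three nodes or three million, which is the property the slide describes.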
29. What is Neo4j?
• A Graph Database + Lucene Index
• Full ACID (atomicity, consistency, isolation,
durability)
• A schema-free labeled Property Graph
• High Availability (with Enterprise Edition)
• Perfect for complex, highly connected data
30. The property graph model
• Core abstractions:
– Nodes
– Relationships between nodes
– Properties on both
• Traversal framework
– High performance queries on connected data sets
[Figure: example property graph. Node properties: name = “Jordi”, age = 29. Relationship properties: type = KNOWS, time = 4 years. Node properties: type = car, vendor = “Honda”, model = “Civic”.]
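A rough Python rendering of the three core abstractions, reusing the property values from the slide. The endpoints of the relationships are assumptions: the second person (`friend`) and the `DRIVES` relationship are hypothetical and were added only so the example is connected.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


jordi = Node({"name": "Jordi", "age": 29})
friend = Node({"name": "Maria"})  # hypothetical second person
car = Node({"type": "car", "vendor": "Honda", "model": "Civic"})

knows = Relationship(jordi, friend, {"type": "KNOWS", "time": "4 years"})
drives = Relationship(jordi, car, {"type": "DRIVES"})  # hypothetical relationship
relationships = [knows, drives]


def outgoing(node):
    """Relationships that start at the given node (compared by identity)."""
    return [rel for rel in relationships if rel.start is node]


print([rel.properties["type"] for rel in outgoing(jordi)])
```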
31. Good For
• Highly connected data (social networks)
• Recommendations (e-commerce)
• Path Finding (how do I know you?)
• A* (Least Cost path)
• Data First Schema (bottom-up, but you still
need to design)
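As an illustration of the least-cost path item, here is a small A* search in Python over a weighted adjacency map. It is a generic sketch rather than Neo4j's implementation; the graph, its weights, and the function name `least_cost_path` are made up, and with the default zero heuristic the search behaves like Dijkstra's algorithm.

```python
import heapq


def least_cost_path(graph, start, goal, heuristic=lambda node: 0):
    """A* search; returns (cost, path) or None if the goal is unreachable."""
    # Queue entries: (estimated total cost, cost so far, node, path taken).
    frontier = [(heuristic(start), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best_cost.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best_cost.get(neighbour, float("inf")):
                best_cost[neighbour] = new_cost
                heapq.heappush(
                    frontier,
                    (new_cost + heuristic(neighbour), new_cost,
                     neighbour, path + [neighbour]),
                )
    return None


roads = {
    "A": {"B": 7, "C": 3},
    "B": {"D": 1},
    "C": {"B": 2, "D": 8},
    "D": {},
}
print(least_cost_path(roads, "A", "D"))  # (6, ['A', 'C', 'B', 'D'])
```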
32. Ten Reasons for Choosing Neo4j
• World’s Best and First Graph Database
• Biggest and Most Active Graph Community on the
Planet
o 1,000,000+ downloads, adding 50,000 downloads per month
o 20,000+ graph education registrants
o 20,000+ Meetup members
o 500+ Neo4j events per year
o 100+ technology and service partners
o 200 enterprise subscription customers, including 50+ of the Global 2000
33. Ten Reasons for Choosing Neo4j (continued)
• Highly Performant Read and Write Scalability, Without
Compromise
• High Performance Thanks to Native Graph Storage &
Processing
• Easy to Learn
• Easy to Use
• Easier than Ever to Load Your Data into Neo4j
• Whiteboard-friendly Data Modeling to Simplify the
Development Cycle
• Superb Value for Enterprise and Startup Projects
35. // then traverse to find results
start n=(people-index, name, "Andreas")
match (n)--()--(foaf) return foaf
36. Cypher
Pattern Matching Query Language (like SQL for graphs)
// get node 0
START a=node(0) RETURN a
// traverse from node 1
START a=node(1) MATCH (a)-->(b) RETURN b
// return friends of friends
START a=node(1) MATCH (a)--()--(c) RETURN c
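To make the friends-of-friends pattern concrete, the same two-hop traversal can be written by hand in Python. The adjacency map is made-up sample data. Unlike the Cypher query, this sketch returns each person only once.

```python
# Undirected friendships as an adjacency map (sample data is made up).
friends = {
    1: {2, 3},
    2: {1, 4},
    3: {1, 4, 5},
    4: {2, 3},
    5: {3},
}


def friends_of_friends(person):
    """Nodes reachable in exactly two hops, excluding the starting person."""
    result = set()
    for friend in friends[person]:
        for candidate in friends[friend]:
            if candidate != person:
                result.add(candidate)
    return result


print(sorted(friends_of_friends(1)))  # [4, 5]
```

The pattern `(a)--()--(c)` expresses the two nested loops declaratively, which is what "like SQL for graphs" refers to.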
37. Social data (customer: brand-name social network)
[Figure: social graph of person nodes (Mike, Charlie Runkle, Dani California, Hank Moody, Karen, Marcy Runkle) connected by KNOWS relationships. Properties shown: name, last_name, age, disclosure.]
38. Spatial data (customer: large telecom company)
[Figure: road network. Place nodes with name, lat, and long properties (Omni Hotel, Swedland, The Tavern, and others) connected by ROAD relationships that carry a length property (7 miles, 3 miles).]
39. Social AND spatial data
[Figure: combined social and spatial graph. Place nodes (Omni Hotel, The Tavern) with lat and long properties; person nodes Pere (beer_qual = expert), Maria (age = 30, beer_qual = non-existent), and Jordi. Relationships shown: LIKES and ROAD. Relationship properties shown: weight = 10, length = 3 miles.]
40. Query structure
MATCH (n:Label)-[:REL]->(m:Label)
WHERE n.prop < 42
WITH n, count(m) AS cnt,
     collect(m.attr) AS attrs
WHERE cnt > 12
RETURN n.prop,
       extract(a2 IN
         filter(a1 IN attrs
                WHERE a1 =~ "...-.*")
         | substring(a2, 4, size(a2) - 1))
       AS ids
ORDER BY length(ids) DESC
LIMIT 10
41. MATCH
describes the pattern
MATCH (n:Label)-[:REL]->(m:Label)
42. WHERE
filters the result set
WHERE n.prop < 42
43. RETURN
returns the result rows
RETURN n.prop,
       extract(a2 IN
         filter(a1 IN attrs
                WHERE a1 =~ "...-.*")
         | substring(a2, 4, size(a2) - 1))
       AS ids
44. ORDER BY, SKIP, LIMIT
sort and paginate
ORDER BY length(ids) DESC
SKIP 5 LIMIT 10
45. WITH
combines query parts like a pipe
WITH n, count(m) AS cnt,
     collect(m.attr) AS attrs
WHERE cnt > 12
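The pipe behaviour of WITH, aggregating first and then filtering on the aggregate, can be imitated in Python. This is a loose analogy rather than a description of how Cypher executes; the sample rows and the `with_stage` helper are made up.

```python
from collections import defaultdict

# Rows as produced by a MATCH step: (n, m.attr) pairs. Sample data is made up.
matched = [("n1", "abc-1"), ("n1", "abc-2"), ("n1", "xyz-3"), ("n2", "abc-9")]


def with_stage(rows, min_count):
    """Group rows by n, aggregate, then filter on the aggregate."""
    attrs_by_n = defaultdict(list)
    for n, attr in rows:
        attrs_by_n[n].append(attr)      # collect(m.attr) AS attrs
    return {
        n: attrs
        for n, attrs in attrs_by_n.items()
        if len(attrs) > min_count       # WHERE cnt > min_count
    }


print(with_stage(matched, 2))  # {'n1': ['abc-1', 'abc-2', 'xyz-3']}
```

The second WHERE can only refer to `cnt` because WITH has already produced it, just as the filter here runs on the grouped result rather than on the raw rows.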
48. MERGE
matches or creates
MERGE (y:Year {year:2014})
ON CREATE
SET y.created = timestamp()
FOREACH (m IN range(1,12) |
MERGE
(:Month {month:m})-[:IN]->(y)
)
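The "matches or creates" behaviour is essentially get-or-create. A minimal Python sketch, assuming an in-memory dict in place of the database; the `merge_year` helper is hypothetical.

```python
import time

years = {}  # year number -> node properties


def merge_year(year):
    """Return the existing Year node, or create it if it is missing."""
    node = years.get(year)
    if node is None:
        # Runs only on creation, like ON CREATE SET.
        node = {"year": year, "created": int(time.time() * 1000)}
        years[year] = node
    return node


first = merge_year(2014)
second = merge_year(2014)
print(first is second, len(years))  # True 1
```

Calling it twice yields the same node and leaves `created` untouched, which is why MERGE is safe to rerun.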
49. SET, REMOVE
update attributes and labels
MATCH (year:Year)
WHERE year.year % 4 = 0 AND
(year.year % 100 <> 0 OR
year.year % 400 = 0)
SET year:Leap
WITH year
MATCH (year)<-[:IN]-(feb:Month {month:2})
SET feb.days = 29
CREATE (feb)<-[:IN]-(:Day {day:29})
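The Gregorian leap-year rule that the WHERE clause is meant to express can be checked quickly in Python against the standard library. The function name `is_leap` is just for illustration.

```python
import calendar


def is_leap(year):
    """Divisible by 4, except century years that are not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


for year in (1900, 2000, 2014, 2016):
    # Compare against the standard library's implementation.
    print(year, is_leap(year), calendar.isleap(year))
```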