Neo4j Introduction at Imperial College London

An Introduction to Neo4j
Michal Bachman
@bachmanm

Roadmap
• Intro to NOSQL
• Intro to Graph Databases
• Intro to Neo4j
• A bit of hacking
• Current research
• Q&A

Not Only SQL

Why NOSQL now?

Driving trends

Trend 1: Data Size

Trend 2: Connectedness
[Figure: information connectivity, rising from text documents through hypertext, feeds, blogs, UGC, wikis, tagging, folksonomies and RDFa to ontologies and the GGG]

Trend 3: Semi-structured Data

Trend 4: Application Architecture (80’s)

Application

DB

Trend 4: Application Architecture (90’s)

App App App

DB

Application Application Application

DB DB DB

Side note: RDBMS performance
Salary List

Four NOSQL Categories

Key-Value Stores
• “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)
• Data model:
– Global key-value mapping
– Big scalable HashMap
– Highly fault tolerant (typically)
• Examples:
– Riak, Redis, Voldemort
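
A minimal sketch of the data model above, in plain Java with no real store involved: a key-value store behaves like one big HashMap from keys to opaque values. The class name KeyValueSketch and its methods are hypothetical, and the replication, partitioning and fault tolerance that real stores provide are left out entirely.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, single-process illustration of the key-value data model.
final class KeyValueSketch {
    // Values are opaque bytes: the store knows nothing about their structure.
    private final Map<String, byte[]> data = new HashMap<>();

    void put(String key, String value) {
        data.put(key, value.getBytes(StandardCharsets.UTF_8));
    }

    Optional<String> get(String key) {
        byte[] raw = data.get(key);
        return raw == null
                ? Optional.empty()
                : Optional.of(new String(raw, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        KeyValueSketch store = new KeyValueSketch();
        store.put("user:42", "{\"name\":\"Michal Bachman\"}");
        System.out.println(store.get("user:42").orElse("<missing>"));
        System.out.println(store.get("user:43").orElse("<missing>"));
    }
}
```

The only query is a lookup by key, which is what makes the model easy to scale and poor for complex data.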

Pros and Cons
• Strengths
– Simple data model
– Great at scaling out horizontally
• Scalable
• Available
• Weaknesses:
– Simplistic data model
– Poor for complex data

Column Family (BigTable)
• Google’s “Bigtable: A Distributed Storage System for Structured Data” (2006)
• Data model:
– A big table, with column families
– Map-reduce for querying/processing
• Examples:
– HBase, HyperTable, Cassandra
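
One way to picture the "big table, with column families" model is as nested sorted maps: row key, then column family, then column, then value. The sketch below is plain Java under that assumption; ColumnFamilySketch and the row and column names are hypothetical and do not correspond to any HBase or Cassandra API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of a column-family layout:
// row key -> column family -> column -> value.
final class ColumnFamilySketch {
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    void put(String rowKey, String family, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(column, value);
    }

    // Returns the columns of one family for a row; empty if the row or family is absent.
    Map<String, String> family(String rowKey, String family) {
        return rows.getOrDefault(rowKey, Map.of()).getOrDefault(family, Map.of());
    }

    public static void main(String[] args) {
        ColumnFamilySketch table = new ColumnFamilySketch();
        // Rows need not share the same columns, which suits semi-structured data.
        table.put("talk:1", "meta", "title", "Intro to Neo4j");
        table.put("talk:1", "meta", "duration", "45");
        table.put("talk:2", "meta", "title", "Cognitive Psychology");
        System.out.println(table.family("talk:1", "meta"));
        System.out.println(table.family("talk:2", "meta"));
    }
}
```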

Pros and Cons
• Strengths
– Data model supports semi-structured data
– Naturally indexed (columns)
– Good at scaling out horizontally
• Weaknesses:
– Unsuited for interconnected data

Document Databases
• Data model
– Collections of documents
– A document is a key-value collection
– Index-centric, lots of map-reduce
• Examples
– CouchDB, MongoDB

Pros and Cons
• Strengths
– Simple, powerful data model (just like SVN!)
– Good scaling (especially if sharding supported)
• Weaknesses:
– Unsuited for interconnected data
– Query model limited to keys (and indexes)
• Map reduce for larger queries

Graph Databases
• Data model:
– Nodes with properties
– Named relationships with properties
– Hypergraph, sometimes
• Examples:
– Neo4j (of course), Sones GraphDB, OrientDB,
InfiniteGraph, AllegroGraph

Pros and Cons
• Strengths
– Powerful data model
– Fast
• For connected data, can be many orders of magnitude
faster than RDBMS
• Weaknesses:
– Sharding
• Though they can scale reasonably well
• And for some domains you can shard too!

Social Network “path exists” Performance
• Experiment:
• ~1k persons
• Average 50 friends per person
• pathExists(a,b) limited to depth 4
• Caches warm to eliminate disk IO

Database             # persons   query time
Relational database  1000        2000ms
Neo4j                1000        2ms
Neo4j                1000000     2ms
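
For readers who want to see what a depth-limited pathExists(a,b) check involves, here is a minimal breadth-first sketch in plain Java over an in-memory friendship map. It is not the code used in the experiment and does not use Neo4j; PathExistsSketch, addFriendship and the person names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical illustration of a depth-limited "path exists" check.
final class PathExistsSketch {
    private final Map<String, Set<String>> friends = new HashMap<>();

    // Friendship is treated as undirected.
    void addFriendship(String a, String b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Breadth-first search that gives up after maxDepth hops.
    boolean pathExists(String from, String to, int maxDepth) {
        if (from.equals(to)) {
            return true;
        }
        Set<String> visited = new HashSet<>();
        visited.add(from);
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(from);
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Queue<String> next = new ArrayDeque<>();
            for (String person : frontier) {
                for (String friend : friends.getOrDefault(person, Set.of())) {
                    if (friend.equals(to)) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        PathExistsSketch graph = new PathExistsSketch();
        graph.addFriendship("a", "b");
        graph.addFriendship("b", "c");
        graph.addFriendship("c", "d");
        graph.addFriendship("d", "e");
        graph.addFriendship("e", "f");
        System.out.println(graph.pathExists("a", "e", 4)); // e is 4 hops from a
        System.out.println(graph.pathExists("a", "f", 4)); // f is 5 hops from a
    }
}
```

The work done depends on how many people are reachable within four hops of the start person, not on the total number of persons stored.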

What are graphs good for?
• Recommendations
• Business intelligence
• Social computing
• Geospatial
• MDM
• Systems management
• Web of things
• Genealogy
• Time series data
• Product catalogue
• Web analytics
• Scientific computing (especially bioinformatics)
• Indexing your slow RDBMS
• And much more!

Neo4j is a Graph Database

So we need to detour through a little
graph theory

Meet Leonhard Euler
• Swiss mathematician
• Inventor of Graph
Theory (1736)

http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg

http://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg

Property Graph Model
• nodes / vertices
• relationships / edges
• properties
[Diagram: nodes (name: Michal Bachman), (title: Intro to Neo4j, duration: 45), (name: Neo4j), (name: NOSQL)]
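
As a rough illustration of the three ingredients above, the plain-Java sketch below models vertices and edges that both carry properties. It is not the Neo4j API: PropertyGraphSketch, Vertex and Edge are hypothetical names, and the DELIVERS and ABOUT relationship types are borrowed from the code later in this deck as an assumption about what the diagram shows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the property graph model (not the Neo4j API).
final class PropertyGraphSketch {

    static final class Vertex {
        final Map<String, Object> properties = new HashMap<>();
        final List<Edge> outgoing = new ArrayList<>();

        Vertex with(String key, Object value) {
            properties.put(key, value);
            return this;
        }

        // Creates a named, directed edge from this vertex to `end`.
        Edge relateTo(Vertex end, String type) {
            Edge edge = new Edge(this, end, type);
            outgoing.add(edge);
            return edge;
        }
    }

    static final class Edge {
        final Vertex start;
        final Vertex end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Edge(Vertex start, Vertex end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    public static void main(String[] args) {
        Vertex speaker = new Vertex().with("name", "Michal Bachman");
        Vertex talk = new Vertex().with("title", "Intro to Neo4j").with("duration", 45);
        Vertex topic = new Vertex().with("name", "Neo4j");

        speaker.relateTo(talk, "DELIVERS"); // assumed relationship type
        talk.relateTo(topic, "ABOUT");      // assumed relationship type

        for (Edge edge : speaker.outgoing) {
            System.out.println(edge.start.properties.get("name")
                    + " -[" + edge.type + "]-> "
                    + edge.end.properties.get("title"));
        }
    }
}
```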

Graphs are very whiteboard-friendly

32 billion nodes
32 billion relationships
64 billion properties

http://opfm.jpl.nasa.gov/

http://news.xinhuanet.com

Community

Advanced

Enterprise

How do I use it?

Getting started is easy
• Single package download, includes server stuff
– http://neo4j.org/download/
• For developer convenience, Ivy (or whatever):
– <dependency org="org.neo4j" name="neo4j-community" rev="1.9.M04"/>

Run it!
• Server is easy to start and stop
– cd <install directory>
– bin/neo4j start
– bin/neo4j stop
• Provides a REST API in addition to the other
APIs we’ve seen
• Provides some ops support
– JMX, data browser, graph visualisation

Embed it!
• If you want to host the database in your
process just load the jars

• And point the config at the right place on disk

• Embedded databases can be HA too
– You don’t have to run as server

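[Diagram: example conference graph with nodes (name: Phil Johnson), (name: Michal Bachman), (name: Martin Macke), (name: Jeremy White), (title: Cognitive Psychology, duration: 30), (title: Intro to Neo4j, duration: 45), (name: UX), (name: Neo4j), (name: NOSQL); relationships shown include INTERESTED]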

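import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

GraphDatabaseService neo = new EmbeddedGraphDatabase("/data/webexpo");

Transaction tx = neo.beginTx();
try {
    Node speaker = neo.createNode();
    speaker.setProperty("name", "Michal Bachman");

    Node talk = neo.createNode();
    talk.setProperty("title", "Intro to Neo4j");

    Relationship delivers = speaker.createRelationshipTo(talk,
            DynamicRelationshipType.withName("DELIVERS"));
    delivers.setProperty("day", "Saturday");

    // Index the speaker by name so it can be looked up later.
    neo.index().forNodes("people")
            .add(speaker, "name", "Michal Bachman");

    // Mark the transaction as successful; without this, finish() rolls back.
    tx.success();
} finally {
    tx.finish();
}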

(name: Michal Bachman)--[DELIVERS, day: Saturday]-->(title: Intro to Neo4j)

Core API
• Nodes
– Properties (optional K-V pairs)
• Relationships
– Start node (required)
– End node (required)
– Properties (optional K-V pairs)

All Conference Topics

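// INCOMING and OUTGOING are org.neo4j.graphdb.Direction values (static imports).
// TALKS_AT, DELIVERS and ABOUT are relationship types and NAME is a property key,
// all assumed to be constants defined elsewhere in the project.
Node webExpo = neo.getReferenceNode();
for (Relationship talksAt : webExpo.getRelationships(INCOMING, TALKS_AT)) {
    Node speaker = talksAt.getStartNode();
    for (Relationship delivers : speaker.getRelationships(OUTGOING, DELIVERS)) {
        Node talk = delivers.getEndNode();
        for (Relationship about : talk.getRelationships(OUTGOING, ABOUT)) {
            String topicName = (String) about.getEndNode().getProperty(NAME);
            //add to result...
        }
    }
}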

-------------------
Printing all topics
All topics: development, data, advertising, education, usa, business, microsoft, webdesign, software,
responsiveness, ux, e-commerce, php, psychology, crm, api, chef, javascript, patterns, product design,
marketing, metro, social media, web, startup, analytics, lean, cqrs, node.js, branding, cloud, testing, neo4j,
rest, css, design, publishing, nosql. Took: 2 ms

Which talks should I attend?

// Follow INTERESTED out to a topic, then ABOUT back in to a talk (depth 2).
TraversalDescription talksTraversal = Traversal.description()
        .uniqueness(Uniqueness.NONE)
        .breadthFirst()
        .relationships(INTERESTED, OUTGOING)
        .relationships(ABOUT, INCOMING)
        .evaluator(Evaluators.atDepth(2));

Node attendee =
        neo.index().forNodes("people").get("name", "Jeremy White").getSingle();

Iterable<Node> talks = talksTraversal.traverse(attendee).nodes();

//iterate over talks and print

------------------------------------------
Suggesting talks for 100 random attendees.
...
Aneta Lebedova: Measure Everything!, To the USA, The real me. Took: 1 ms
Bohumir Kubat: Beyond the polar bear, How (not) to do API, Critical interface design. Took: 1 ms
Vladimir Vales: Application Development for Windows 8 Metro. Took: 1 ms
Suggested talks for 100 random attendees in 449 ms

What do we have in common?

//retrieve attendeeOne and attendeeTwo from index

int maxDepth = 2;
Iterable<Path> paths = GraphAlgoFactory
        .allPaths(Traversal.expanderForAllTypes(), maxDepth)
        .findAllPaths(attendeeOne, attendeeTwo);

for (Path path : paths) {
    //print it
}

------------------------------------------------------------
Finding things in common for 100 random couples of attendees
...
Karel Kunc and Phil Smith:

(Karel Kunc)--[INTERESTED]-->(ux)<--[INTERESTED]--(Phil Smith),
(Karel Kunc)--[DISLIKED]-->(Be a punk consumer!)<--[DISLIKED]--(Phil Smith),
(Karel Kunc)--[DISLIKED]-->(Beyond the polar bear)<--[LIKED]--(Phil Smith),
(Karel Kunc)--[LIKED]-->(Shipito.com – business in USA)<--[LIKED]--(Phil Smith).
Took: 0 ms.
...

Found things in common for 100 random couples of attendees in 142 ms.

Youngsters, Y U No Like Java?

Who is my beer mate?

myself beerMate:?

talk:?

(myself) (beerMate)

(talk)

start myself=node:people(name = "Emil Votruba")

match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)

return distinct beerMate.name, count(beerMate)

order by count(beerMate) desc

limit 5;
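
To make explicit what the query above computes, here is a plain-Java sketch of the same idea over an in-memory map of who liked which talk: count the talks each other attendee liked in common with "myself", order by that count descending, and keep the top few. It is not how Cypher executes the query; BeerMateSketch, the talk names and the alphabetical tie-break are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the "beer mate" query logic (not Cypher, not Neo4j).
final class BeerMateSketch {

    // Returns up to `limit` attendees ordered by number of talks liked in common with `myself`.
    static List<String> beerMates(Map<String, Set<String>> likes, String myself, int limit) {
        Set<String> mine = likes.getOrDefault(myself, Set.of());
        Map<String, Integer> shared = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : likes.entrySet()) {
            if (entry.getKey().equals(myself)) {
                continue; // never recommend yourself
            }
            int count = 0;
            for (String talk : entry.getValue()) {
                if (mine.contains(talk)) {
                    count++;
                }
            }
            if (count > 0) {
                shared.put(entry.getKey(), count);
            }
        }
        List<String> names = new ArrayList<>(shared.keySet());
        // Highest shared count first; ties broken alphabetically (an assumption for determinism).
        names.sort((a, b) -> {
            int byCount = Integer.compare(shared.get(b), shared.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(names.subList(0, Math.min(limit, names.size())));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> likes = Map.of(
                "Emil Votruba", Set.of("Talk A", "Talk B", "Talk C"),
                "Karel Kunc", Set.of("Talk A", "Talk B"),
                "Phil Smith", Set.of("Talk C"),
                "Alex Smart", Set.of("Talk D"));
        System.out.println(beerMates(likes, "Emil Votruba", 5));
    }
}
```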

Cypher Query
start myself=node:people(name = "Alex Smart")

match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)



limit 5;

Cypher Query
start myself=node:people(name = "Emil Votruba")

match (myself)-[:LIKED]->()<-[:LIKED]-(beerMate)



limit 5;

Current Research
• Graph partitioning
• Graph analytics (“OLAP” and predictive)
• Performance improvements
• Query languages
• MVCC and single-threaded write models
• ACID (tradeoffs for weakening C and I)
• Yield and Harvest in distributed systems
• Application-level
– Recommendations
– Protein interactions
–…

Questions?
Neo4j: http://neo4j.org
Neo Technology: http://neotechnology.com
Twitter: @bachmanm
Code: git://github.com/bachmanm/neo4j-imperial.git
