
# Neo4j: Graph-like power

Graph Databases in NoSQL world. Neo4j and Cypher.

Published in: Technology

### Neo4j: Graph-like power

1. Graph-like power (Roman R.)
    - `MATCH (a:Actor),(m:Movie) WHERE a.name = 'Keanu Reeves' AND m.title = 'The Matrix' CREATE (a)-[:ACTS_IN]->(m)`
2. Today
    - Graphs in the NoSQL world
    - classification
    - definition
    - components
    - Neo4j
    - nodes, rels, props, indexes
    - Cypher
    - PHP and Neo4j
    - Demo
    - Alternatives
    - Q/A
3. NoSQL Databases
    - Key-Value: MemcacheDB, Redis, Riak, Tokyo Cabinet
    - Document: CouchDB, MongoDB, RavenDB, Elasticsearch
    - Graph: Neo4j, TITAN, OrientDB, InfiniteGraph, AllegroGraph
    - Column (BigTable): Cassandra, HBase/Hadoop
4. What is a graph in math
    - represents a connected set of objects
    - graph:
        - vertex (node/point)
        - edge (arc/line/relationship/arrow), undirected
        - attribute (property) on a node or relationship
    - types:
        - pair: G = (V, E)
        - digraph: D = (V, A)
        - mixed: G = (V, E, A)
    - example: V = {1, 2, 3, 4, 5, 6}, E = {{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}}
5. What is a graph database
    - stores data in a graph and retrieves vast networks of data
    - shines when storing richly connected data
    - consists of nodes, connected by relationships
        - A Graph —records data in→ Nodes —which have→ Properties
        - Nodes —are organized by→ Rels —which also have→ Properties
        - Nodes —are grouped by→ Labels —into→ Sets
        - A Traversal —navigates→ a Graph; it —identifies→ Paths —which order→ Nodes
        - An Index —maps from→ Properties —to either→ Nodes or Rels
        - A Graph Database —manages a→ Graph and —also manages related→ Indexes
6. Nodes, Rels, Props, Labels
    - A Graph —records data in→ Nodes —which have→ Properties
    - Nodes —are organized by→ Relationships —which also have→ Properties
    - Nodes —are grouped by→ Labels —into→ Sets
7. Graph Traversal
    - A Traversal —navigates→ a Graph; it —identifies→ Paths —which order→ Nodes
    - example questions:
        - what music do my friends like that I don't yet own?
        - if this power supply goes down, what web services are affected?
8. Graph Index
    - An Index —maps from→ Properties —to either→ Nodes or Rels
    - example: find the Account for username master-of-graphs
9. Graph
    - A Graph Database —manages a→ Graph and —also manages related→ Indexes
10. What a graph database looks like
11. A Graph Database transforms an RDBMS
12. A Graph Database elaborates a Key-Value Store (legend: K* = key, V* = value)
13. A Graph Database relates Column-Family
    - BigTable databases are an evolution of key-value stores, using "families" to allow grouping of rows
    - stored in a graph, the families could become hierarchical, and the relationships among data become explicit
14. A Graph Database navigates a Document Store (legend: D = Document, S = Subdocument, V = Value, D2/S2 = reference)
15. NoSQL Data Models (figure labels: "90% of all use cases", "Relational Databases")
16. (slide without text)
17. Neo4j features
    - intuitive, using a graph model for data representation
    - reliable, fully transactional, upholds ACID
    - durable and fast, using a custom disk-based, native storage engine
    - massively scalable, up to several billion nodes/relationships/properties
    - highly available when distributed across multiple machines
    - expressive, with a powerful, human-readable declarative graph query language
    - fast, with a powerful traversal framework for high-speed graph queries
    - embeddable, with a few small jars
    - simple, accessible through a convenient REST interface or an object-oriented Java API
    - indexes are based on Apache Lucene; supports secondary indexes
    - in development since 2003: 10 years of commercial development and over 7 years in production
    - cross-platform; simple set-up; well documented; open source
    - GPL for Community, AGPL for Enterprise
18. Neo4j requirements
    - CPU: Intel Core i3 .. i7
    - Memory: 2GB .. 16/32GB
    - Disk: 10GB SATA .. SSD w/ SATA
    - Filesystem: ext4 .. ext4/ZFS
    - Software: Oracle Java 7
19. Neo4j license
    - Neo4j Community
        - open-source, high performance
        - fully ACID transactional graph database
    - Neo4j Enterprise
        - High-Performance Cache (up to 10x faster)
        - horizontal scalability with Neo4j Clustering (predictable scalability)
        - high availability and online backups
        - cache-based sharding (shard your graph in memory)
        - advanced monitoring (operational metrics)
        - certified for Windows and Linux
        - email/phone support (10x5, 24x7 hours)
        - subscriptions
            - Personal (up to 3 devs, \$100k annual revenue) = FREE
            - Startups (<\$10M funding, <\$5M annual revenue) = \$12k
            - Business (medium, to Global 2000) = Contact Sales
20. Neo4j vs. MySQL
    - for the simple friends-of-friends query, Neo4j is 60% faster than MySQL
    - for friends of friends of friends, Neo4j is 180 times faster
    - for the depth-four query, Neo4j is 1,135 times faster
    - MySQL just chokes on the depth-five query
21. Neo4j: Nodes
    - fundamental units that form a graph
    - can have key/value-style properties
    - nodes and relationships can be indexed by {key, value} pairs
    - represent entities
22. Neo4j: Relationships #1/2
    - connect entities and structure the domain
    - allow for finding related data
    - are always directed (outgoing or incoming)
    - are equally well traversed in either direction
    - a node can have a relationship to itself
    - have a relationship type (label)
23. Neo4j: Relationships #2/2
24. Neo4j: Properties
    - nodes and relationships can have properties
    - properties are key-value pairs
        - the key is a string
        - the value is either a primitive or an array of one primitive type
            - boolean, String, int, int[], etc.
            - types follow the Java Language Specification
    - used for entity attributes, relationship qualities, and metadata
25. Neo4j: Labels
    - used to group nodes into sets
    - a node can have any number of labels, including none
    - can be added and removed at runtime
    - can be used to mark temporary states for nodes
    - names are case-sensitive
    - CamelCase by convention
26. Neo4j: Paths
    - a path is one or more nodes with connecting relationships
    - figure captions: shortest path; a path of length one; a path of length one
27. Neo4j: Traversal
    - Traversal Framework available out of the box
    - traversing means visiting nodes, following relationships according to rules
    - in most cases only a subgraph is visited
    - callback-based traversal API
        - you can specify the traversal rules
    - traverses breadth-first or depth-first
    - open Java API
28. Neo4j: graph algorithms
    - A*: uses the A* algorithm to find the cheapest path between two nodes
    - Dijkstra (`dijkstra`): uses the Dijkstra algorithm to find the cheapest path between two nodes
    - PathWithLength: finds all paths of a certain length (depth) between two nodes
    - Shortest paths (`shortestPath`, default): finds all the shortest paths between two nodes
    - All simple paths (`allSimplePaths`): finds all simple paths (without loops) between two nodes
    - All paths (`allPaths`): finds all available paths between two nodes
29. Neo4j: Schema
    - Neo4j is a schema-optional graph database
30. Neo4j: Index
    - introduced in Neo4j 2.0
    - eventually available (populated in the background, not immediately available for querying)
        - comes online once fully populated
        - on failed status, drop and recreate the index
    - can be created per label
    - indexes nodes and rels
    - settings: `node_auto_indexing=false`, `node_keys_indexable`
31. Neo4j: Constraints
    - help you keep your data clean
    - specify the rules for what your data should look like
    - uniqueness constraints are the only available constraint type
32. Neo4j: Data Size
    - single server instance
        - nodes = 2^35 (~34 billion)
        - relationships = 2^35 (~34 billion)
        - labels = 2^31 (~2 billion)
        - properties = 2^36 to 2^38 depending on property types (maximum ~274 billion, always at least ~68 billion)
        - relationship types = 2^15 (~32,000)
33. Cypher Query Language
    - "Makes the simple things easy, and the complex things possible"
    - powerful graph query language
    - relatively simple
    - declarative grammar (say what you want, not how)
    - humane query language
    - self-explanatory (based on English prose and neat iconography)
    - written in Scala
    - pattern matching (borrows expression approaches from SPARQL)
    - aggregation, ordering, limits
    - create, update, delete
    - structure and most keywords inspired by SQL
    - changing rather rapidly (Cypher 1.9: `START ...`)
34. Cypher patterns #1/2
    - `(a)`
    - `(b)`
    - `(a)-->(b)`
    - `(a)-->(b)-->(c)`
    - `(b)-->(c)<--(a)`
    - `(b)-->()<--(a)`
    - `(a)--(b)`
    - `(a)-[*5]->(b)`
    - `(a)-[*3..5]->(b)`
        - `(a)-[*3..]->(b)`
        - `(a)-[*..5]->(b)`
        - `(a)-[*]->(b)`
35. Cypher patterns #2/2
    - `(a:Label)-->(m)`
    - `(a:User:Admin)-->(m)`
    - `(a)--(m)`
    - `(a)-[r]->(m)`
    - `(a)-[:ACTED_IN]->(m)`
    - `(a)-[r:SOME|ELSE|WTH]->(m)`
36. Cypher: START / RETURN
    - "It all starts with the START" (Michael Hunger, Cypher webinar, Sep 2012)
    - designates the start points
    - START is optional (in Neo4j >= 2.0)
    - Examples:
        - `START <lookup> RETURN <expression>`
        - `START n=node(0) RETURN n`
        - `START n=node(*) RETURN n.name`
37. Cypher: MATCH
    - primary way of getting data from the database
    - `START <lookup> MATCH <pattern> RETURN <expr>`
    - `OPTIONAL MATCH <pattern> RETURN <expr>`
    - Examples:
        - `MATCH (n) RETURN count(n)`
        - `MATCH (actor:Actor) RETURN actor.name;`
        - `START me=node(0) MATCH (me)--(f) RETURN f.name`
        - ``MATCH (n)-[r]->(m) RETURN n AS FROM, r AS `->`, m AS TO``
38. Cypher: CREATE / SET
    - creates nodes and relationships
    - `CREATE (<name>[:label] [properties,..])`
    - `CREATE (<node-in>)-[<var>:RELATION [properties,..]]->(<node-out>);`
    - `CREATE UNIQUE ...`
    - Examples:
        - `CREATE (n:Actor { name:"Keanu Reeves" });`
        - `CREATE (keanu)-[:ACTED_IN]->(matrix)`
        - `MATCH (keanu {name:".."}) SET keanu.age=49 RETURN`
39. Cypher: WHERE
    - filters the results
    - `MATCH <pattern> WHERE <condition> RETURN <expr>`
    - Examples:
        - `WHERE n.name =~ "(?i)John.*"`
        - `WHERE NOT ..`
        - `WHERE type(rel) =~ "Perso.*"`
40. Cypher: RETURN
    - creates the result table
    - any query can return data
    - returned values can be nodes, relationships, or properties on these
    - `RETURN DISTINCT <expression> AS x`
    - `RETURN aggregate(expr) AS alias`
    - RETURN nodes, rels, properties
    - RETURN expressions of functions and operators
    - RETURN aggregation functions on the above
41. Cypher: etc
    - CASE / WHEN / ELSE
    - `ORDER BY node.key, node2.key, .. ASC|DESC`
    - LIMIT / SKIP
    - WITH (e.g. `WITH count(*) AS c`)
    - UNION / UNION ALL (combining results from multiple queries)
    - USING INDEX / SCAN
    - MERGE / SET / DELETE / REMOVE / FOREACH
    - Expressions
    - Operators
    - Comments
    - Functions: ALL, ANY, LENGTH, {Math}, {String}, ...
42. Cypher: Transactions
    - any updating query runs in a transaction
    - ACID
    - "it is very important to finish each transaction"
    - write lock on the node/rel:
        - adding, changing or removing a property on a node/rel
    - write lock on the node:
        - creating or deleting a node
    - write lock on the relationship and both its nodes:
        - creating or deleting a relationship
43. Cypher: Aggregation
    - count(node/rel/prop)
    - `count(n)`, `count(n.prop)`
    - `sum(n.prop)`
    - `avg(n.prop)`
    - `percentileDisc(n.prop, {median})`
    - `stdev(n.prop)`: standard deviation for a group
    - `max(n.prop)`
    - `collect(n.prop)`
    - `RETURN n, count(*)`
44. Cypher: back to SQL #1/5
    - SQL: `SELECT * FROM Person WHERE name = 'Valentin' AND age > 30`
    - Cypher: `START person=node:Person(name="Valentin") WHERE person.age > 30 RETURN person`
45. Cypher: back to SQL #2/5
    - SQL: `SELECT "Email".* FROM Person JOIN "Email" ON "Person".id = "Email".person_id WHERE "Person".name = 'Benedikt'`
    - Cypher: `START person=node:Person(name="Benedikt") MATCH person-[:email]->email RETURN email`
46. Cypher: back to SQL #3/5
    - show me all people that are both actors and directors
    - SQL: `SELECT name FROM Person WHERE person_id IN (SELECT person_id FROM Actor) AND person_id IN (SELECT person_id FROM Director)`
    - Cypher: `START person=node:Person("name:*") WHERE (person)-[:ACTS_IN]->() AND (person)-[:DIRECTED]->() RETURN person.name`
47. Cypher: back to SQL #4/5
    - show me all of Tom Hanks's co-actors
    - SQL: `SELECT DISTINCT co_actor.name FROM Person tom JOIN Actor a1 ON tom.person_id = a1.person_id JOIN Actor a2 ON a1.movie_id = a2.movie_id JOIN Person co_actor ON co_actor.person_id = a2.person_id WHERE tom.name = 'Tom Hanks'`
    - Cypher: `START tom=node:Person(name="Tom Hanks") MATCH tom-[:ACTS_IN]->movie, co_actor-[:ACTS_IN]->movie RETURN DISTINCT co_actor.name`
48. Cypher: back to SQL #5/5
    - show me all of Lucy's favorite directors
    - SQL: `SELECT dir.name, count(*) FROM Person lucy JOIN Actor ON lucy.person_id = Actor.person_id JOIN Director ON Actor.movie_id = Director.movie_id JOIN Person dir ON Director.person_id = dir.person_id WHERE lucy.name = 'Lucy Liu' GROUP BY dir.name ORDER BY count(*) DESC`
    - Cypher: `START lucy=node:Person(name="Lucy Liu") MATCH lucy-[:ACTS_IN]->movie, director-[:DIRECTED]->movie RETURN director.name, count(*) ORDER BY count(*) DESC`
49. Cypher: back to SQL #6/5
    - `START lucy=node:Person(name="Lucy Liu"), kevin=node:Person(name="Kevin Bacon") MATCH p = shortestPath( lucy-[:ACTS_IN*]-kevin ) RETURN EXTRACT(n in NODES(p): COALESCE(n.name?, n.title?))`
50. Neo4j Shell
    - command-line shell for running Cypher queries
    - supports remote shell
    - `:schema`
    - bash# `neo4j-shell -path data/graph.db -readonly -config conf/neo4j.properties -c "<command>"`
51. Neo4j: Security
    - does not deal with data encryption explicitly
    - all security mechanisms built into Java can be used
    - an encrypted datastore can be used
    - webadmin over HTTPS
52. SPARQL
    - manipulates data stored in RDF format
    - focused on matching sets of triples
    - `PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?email WHERE { ?person a foaf:Person. ?person foaf:name ?name. ?person foaf:mbox ?email. }`
53. Gremlin
    - graph traversal language
    - scripting language
    - Pipe & Filter (similar to jQuery)
    - works across different graph databases
    - based on Groovy (limited to Java)
    - not as stable in Neo4j
    - XPath-like
    - `./outE[label="family"]/inV/@name`
    - `g.v(1).out('likes').in('likes').out('likes').groupCount(m)`
    - `g.V.as('x').out.groupCount(m).loop('x'){c++ < 1000}`
    - `g.v(1).in('LOVE_OF').out('SOME_IN').has('title','abc').back(2)`
54. Neo4j and PHP
    - everyman/neo4jphp (on packagist.org)
        - PHP wrapper for Neo4j using the REST interface
        - follows the PSR-0 autoloading standard
        - basic wrappers for all components
        - last update: a month ago
        - supports Gremlin
    - Neo4j-PHP OGM < a lot of based on
        - Object Graph Mapper, inspired by Doctrine
        - based on DoctrineCommon
        - borrows significantly from the DoctrineORM design
        - uses annotations on classes
        - MIT Licence
    - Neo4J PHP REST API client
        - uses the Neo4j REST API
        - node create/find/delete
        - relationship create/list/filter
55. High Availability with Neo4j
    - in HA mode there is a single master and zero or more slaves
    - slaves synchronize with the master to preserve consistency
    - the master writes to a slave before the transaction completes
56. Demo
    - Neo4j.org example datasets:
        - DrWho (nodes=1'060; rels=2'286)
        - Cineasts Movies & Actors (nodes=64'069; rels=121'778)
        - Hubway Data Challenge (nodes=554'674; rels=2'011'904)
    - GraphGist:
        - JIRA and neo4j
        - PHP and neo4j
        - Kant in neo4j
    - XSS
57. Gephi (win, nix, mac)
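
The property-graph vocabulary from slides 5 to 8 (nodes with labels and properties, directed relationships, an index from properties to nodes, and a traversal that follows relationships) can be illustrated with a small in-memory sketch. This is plain Python and not Neo4j's API: the class `PropertyGraph`, its methods, and the sample people are all hypothetical names chosen for the example. It answers a "friends of friends" question with a breadth-first traversal, one of the two traversal orders the deck mentions.

```python
from collections import deque


class PropertyGraph:
    """Toy property graph (hypothetical, for illustration only; not Neo4j's API)."""

    def __init__(self):
        self.nodes = {}   # node id -> {"labels": set, "props": dict}
        self.rels = []    # (start id, relationship type, end id, props)
        self.index = {}   # (label, key, value) -> set of node ids

    def add_node(self, node_id, labels=(), **props):
        self.nodes[node_id] = {"labels": set(labels), "props": props}
        # An index maps from properties to nodes.
        for label in labels:
            for key, value in props.items():
                self.index.setdefault((label, key, value), set()).add(node_id)

    def add_rel(self, start, rel_type, end, **props):
        # Relationships are directed and may carry their own properties.
        self.rels.append((start, rel_type, end, props))

    def find(self, label, key, value):
        """Index lookup: node ids with this label and property value."""
        return self.index.get((label, key, value), set())

    def neighbours(self, node_id, rel_type):
        # A directed relationship can still be followed in either direction.
        for start, rtype, end, _props in self.rels:
            if rtype != rel_type:
                continue
            if start == node_id:
                yield end
            elif end == node_id:
                yield start

    def traverse(self, start, rel_type, max_depth):
        """Breadth-first traversal; returns {node id: depth from start}."""
        depths = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if depths[current] == max_depth:
                continue  # do not expand beyond the requested depth
            for nxt in self.neighbours(current, rel_type):
                if nxt not in depths:
                    depths[nxt] = depths[current] + 1
                    queue.append(nxt)
        return depths


g = PropertyGraph()
g.add_node(1, ["Person"], name="Alice")
g.add_node(2, ["Person"], name="Bob")
g.add_node(3, ["Person"], name="Carol")
g.add_node(4, ["Person"], name="Dave")
g.add_rel(1, "FRIEND", 2, since=2010)
g.add_rel(2, "FRIEND", 3)
g.add_rel(3, "FRIEND", 4)

(start,) = g.find("Person", "name", "Alice")        # index lookup gives the start point
depths = g.traverse(start, "FRIEND", max_depth=2)   # only a subgraph is visited
friends_of_friends = sorted(n for n, d in depths.items() if d == 2)
print(friends_of_friends)  # [3]
```

The linear scan in `neighbours` keeps the sketch short. A native graph store avoids it by keeping each node's relationships directly reachable from the node, which is why deep traversals stay cheap there.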