Odessapy2013 - Graph databases and Python

Page 10: "Я из Одессы, я просто бухаю." Translation: "I'm from Odessa, I just drink." Meaning he is drinking a lot of vodka ^_^ (@tuc @hackernews)
This is a local meme, used when someone asks a question and you would look stupid because you don't have an answer.

    Odessapy2013 - Graph databases and Python: Presentation Transcript

    • Graph databases and Python. Maksym Klymyshyn, CTO @ GVMachines Inc. (zakaz.ua)
    • What’s inside? ‣ PostgreSQL ‣ Neo4j ‣ ArangoDB
    • Python Frameworks ‣ Bulbflow ‣ py2neo ‣ NetworkX ‣ Arango-python
    • Relational to Graph model crash course: “Switching from relational to the graph model” by Luca Garulli http://goo.gl/z08qwk http://www.slideshare.net/lvca/switching-from-relational-to-the-graph-model
    • My motivation is quite simple:
    • “The best material model of a cat is another, or preferably the same, cat.” –Norbert Wiener
    • Good old Postgres
    • create table nodes (
        node integer primary key,
        name varchar(10) not null,
        feat1 char(1),
        feat2 char(1));

      create table edges (
        a integer not null references nodes(node) on update cascade on delete cascade,
        b integer not null references nodes(node) on update cascade on delete cascade,
        primary key (a, b));

      create index a_idx ON edges(a);
      create index b_idx ON edges(b);

      -- each unordered pair of nodes may be connected only once
      create unique index pair_unique_idx on edges (LEAST(a, b), GREATEST(a, b));

      -- and no self-loops
      alter table edges add constraint no_self_loops_chk check (a <> b);

      insert into nodes values (1, 'node1', 'x', 'y');
      insert into nodes values (2, 'node2', 'x', 'w');
      insert into nodes values (3, 'node3', 'x', 'w');
      insert into nodes values (4, 'node4', 'z', 'w');
      insert into nodes values (5, 'node5', 'x', 'y');
      insert into nodes values (6, 'node6', 'x', 'z');
      insert into nodes values (7, 'node7', 'x', 'y');

      insert into edges values (1, 3), (2, 1), (2, 4), (3, 4), (3, 5), (3, 6), (4, 7), (5, 1), (5, 6), (6, 1);

      -- directed graph
      select * from nodes n left join edges e on n.node = e.b where e.a = 2;

      -- undirected graph
      select * from nodes where node in (select case when a=1 then b else a end from edges where 1 in (a,b));
    • “I'm from Odessa, I just drink.” (Я из Одессы, я просто бухаю.)
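The two selects on this slide can be tried from Python without a PostgreSQL server. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in (not what the slide uses); the helper names `successors` and `neighbours` are hypothetical, and the feature columns and the LEAST/GREATEST index are left out for brevity.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL so the sketch is portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table nodes (node integer primary key, name varchar(10) not null);
    create table edges (
        a integer not null references nodes(node),
        b integer not null references nodes(node),
        primary key (a, b),
        check (a <> b));
""")
conn.executemany("insert into nodes values (?, ?)",
                 [(i, "node{}".format(i)) for i in range(1, 8)])
conn.executemany("insert into edges values (?, ?)",
                 [(1, 3), (2, 1), (2, 4), (3, 4), (3, 5),
                  (3, 6), (4, 7), (5, 1), (5, 6), (6, 1)])


def successors(conn, node):
    """Directed view: nodes reached by an edge that starts at `node`."""
    rows = conn.execute(
        "select n.node from nodes n join edges e on n.node = e.b "
        "where e.a = ? order by n.node", (node,))
    return [r[0] for r in rows]


def neighbours(conn, node):
    """Undirected view: nodes sharing an edge with `node` in either direction."""
    rows = conn.execute(
        "select node from nodes where node in ("
        "  select case when a = ? then b else a end"
        "  from edges where ? in (a, b)) order by node", (node, node))
    return [r[0] for r in rows]


print(successors(conn, 2))   # nodes that node 2 points to
print(neighbours(conn, 1))   # nodes connected to node 1, ignoring direction
```

The directed query follows edges only from `a` to `b`, while the undirected one picks "the other end" of every edge that touches the node.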
    • Neo4j
    • Most famous graph database. • 1,333 mentions within repositories on Github • 1,140,000 results in Google • 26,868 tweets • Really nice Admin interface • Awesome help tips
    • A lot of Python libraries: Py2Neo, Neomodel, neo4django, bulbflow
    • // Create node1, node2 and a
      // relation RELATED between the two nodes
      CREATE (node1 {name:"node1"}),
             (node2 {name: "node2"}),
             (node1)-[:RELATED]->(node2);
    • Neo4j is friendly and powerful. The only drawback is its somewhat complex query language, Cypher.
    • py2neo nodes
      from py2neo import neo4j, node, rel

      graph_db = neo4j.GraphDatabaseService(
          "http://localhost:7474/db/data/")

      die_hard = graph_db.create(
          node(name="Bruce Willis"),
          node(name="John McClane"),
          node(name="Alan Rickman"),
          node(name="Hans Gruber"),
          node(name="Nakatomi Plaza"),
          rel(0, "PLAYS", 1),
          rel(2, "PLAYS", 3),
          rel(1, "VISITS", 4),
          rel(3, "STEALS_FROM", 4),
          rel(1, "KILLS", 3))
    • py2neo paths
      from py2neo import neo4j, node

      graph_db = neo4j.GraphDatabaseService(
          "http://localhost:7474/db/data/")
      alice, bob, carol = node(name="Alice"), node(name="Bob"), node(name="Carol")
      abc = neo4j.Path(
          alice, "KNOWS", bob, "KNOWS", carol)
      abc.create(graph_db)
      abc.nodes
      # [node(**{'name': 'Alice'}),
      #  node(**{'name': 'Bob'}),
      #  node(**{'name': 'Carol'})]
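In the `create` call on this slide, the integers passed to `rel(...)` refer to the positions of the nodes in the same call. The following plain-Python sketch only illustrates that indexing; it does not use py2neo, and `resolve` is a hypothetical helper.

```python
# Plain-Python stand-ins for the entities passed to graph_db.create(...).
nodes = ["Bruce Willis", "John McClane", "Alan Rickman",
         "Hans Gruber", "Nakatomi Plaza"]
rels = [(0, "PLAYS", 1), (2, "PLAYS", 3), (1, "VISITS", 4),
        (3, "STEALS_FROM", 4), (1, "KILLS", 3)]


def resolve(nodes, rels):
    """Replace positional indexes with the node names they point to."""
    return [(nodes[start], kind, nodes[end]) for start, kind, end in rels]


triples = resolve(nodes, rels)
for start, kind, end in triples:
    print("{} -[{}]-> {}".format(start, kind, end))
```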
    • Alice KNOWS Bob KNOWS Carol
    • bulbflow framework
      from bulbs.neo4jserver import Graph

      g = Graph()
      james = g.vertices.create(name="James")
      julie = g.vertices.create(name="Julie")
      g.edges.create(james, "knows", julie)
    • FlockDB, OrientDB, InfoGrid, HyperGraphDB. WAT?
    • ArangoDB
    • “In any investment, you expect to have fun and make profit.” –Michael Jordan
    • I’m the developer of a Python driver for ArangoDB
    • ArangoDB features ‣ NoSQL database storage ‣ Graph of documents ‣ AQL (Arango Query Language) to execute graph queries ‣ Edge data type to create edges between nodes (with properties) ‣ Multiple edge collections to keep different kinds of edges ‣ Support for the Gremlin graph query language
    • Small experiment with graphs and Twitter: I looked at my tweets and the people who added them to favorites. Then I looked at those people’s tweets and did the same thing with the people who favorited their tweets.
    • 1-level depth
    • 2-level depth
    • 3-level depth
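The 1-, 2- and 3-level depths can be read as a breadth-first crawl that stops after a fixed number of hops. Here is a minimal sketch of that idea. The `FAVORITED_BY` data and the `crawl` function are hypothetical illustrations, not the scraper used for the talk.

```python
from collections import deque

# Hypothetical data: author -> users who favorited that author's tweets.
FAVORITED_BY = {
    "me": ["ann", "bob"],
    "ann": ["carl"],
    "bob": ["carl", "dana"],
    "carl": ["eve"],
}


def crawl(start, max_depth, favorited_by):
    """Return {user: depth} for every user within max_depth hops of start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if depth[user] == max_depth:
            continue  # do not expand users sitting on the depth limit
        for fan in favorited_by.get(user, []):
            if fan not in depth:  # first visit gives the shortest depth
                depth[fan] = depth[user] + 1
                queue.append(fan)
    return depth


for level in (1, 2, 3):
    print(level, sorted(crawl("me", level, FAVORITED_BY)))
```

Each extra level only expands the users discovered at the previous level, which is why the graph grows quickly with depth.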
    • Code behind
      from arango import create

      arango = create(db="tweets_maxmaxmaxmax")
      arango.database.create()
      arango.tweets.create()
      arango.tweets_edges.create(
          type=arango.COLLECTION_EDGES)
    • Here we create an edge from from_doc to to_doc
      from_doc = arango.tweets.documents.create({})
      to_doc = arango.tweets.documents.create({})
      arango.tweets_edges.edges.create(from_doc, to_doc)

      # Getting edges for tweet 196297127
      from arango.aql import F, V

      query = arango.tweets_edges.query.over(
          F.EDGES(
              "tweets_edges",
              ~V("tweets/196297127"),
              ~V("outbound")))
    • Full example • Sample dataset with 10 users • Relations between users • Visualise within admin interface
    • Sample dataset
      from arango import create

      def dataset(a):
          a.database.create()
          a.users.create()
          a.knows.create(type=a.COLLECTION_EDGES)

          for u in range(10):
              a.users.documents.create({
                  "name": "user_{}".format(u),
                  "age": u + 20,
                  "gender": u % 2 == 0})

      a = create(db="experiments")
      dataset(a)
    • Relations between users
      def relations(a):
          rels = (
              (0, 1), (0, 2), (2, 3), (4, 3), (3, 5),
              (5, 1), (0, 5), (5, 6), (6, 7), (7, 8), (9, 8))

          get_user = lambda id: a.users.query.filter(
              "obj.name == 'user_{}'".format(id)).execute().first

          for f, t in rels:
              what = "user_{} knows user_{}".format(f, t)
              from_doc, to_doc = get_user(f), get_user(t)
              a.knows.edges.create(from_doc, to_doc, {"what": what})
              print("{}->{}: {}".format(from_doc.id, to_doc.id, what))

      a = create(db="experiments")
      relations(a)
    • Relations between users (output)
      users/2744664487->users/2744926631: user_0 knows user_1
      users/2744664487->users/2745123239: user_0 knows user_2
      users/2745123239->users/2745319847: user_2 knows user_3
      users/2745516455->users/2745319847: user_4 knows user_3
      users/2745319847->users/2745713063: user_3 knows user_5
      users/2745713063->users/2744926631: user_5 knows user_1
      users/2744664487->users/2745713063: user_0 knows user_5
      users/2745713063->users/2745909671: user_5 knows user_6
      users/2745909671->users/2746106279: user_6 knows user_7
      users/2746106279->users/2746302887: user_7 knows user_8
      users/2746499495->users/2746302887: user_9 knows user_8
    • AQL, getting paths
      FOR p IN PATHS(users, knows, 'outbound')
        FILTER p.source.name == 'user_5'
        RETURN p.vertices[*].name

      from arango import create
      from arango.aql import F, V

      def querying(a):
          # Parentheses let the chained calls span several lines.
          query = (a.knows.query.over(
                       F.PATHS("users", "knows", ~V("outbound")))
                   .filter("obj.source.name == '{}'".format("user_5"))
                   .result("obj.vertices[*].name"))
          for data in query.execute(wrapper=lambda c, i: i):
              print(data)

      a = create(db="experiments")
      querying(a)
    • Paths output
      ['user_5']
      ['user_5', 'user_1']
      ['user_5', 'user_6']
      ['user_5', 'user_6', 'user_7']
      ['user_5', 'user_6', 'user_7', 'user_8']
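To see where these five paths come from, the same edge list can be walked in plain Python. This is a sketch only: `outbound_paths` is a hypothetical helper that mirrors the output on the slide, and it makes no claim about how ArangoDB evaluates `PATHS` internally.

```python
# Same "knows" relations as in the sample dataset (from_user, to_user).
RELS = [(0, 1), (0, 2), (2, 3), (4, 3), (3, 5), (5, 1),
        (0, 5), (5, 6), (6, 7), (7, 8), (9, 8)]


def outbound_paths(start, rels):
    """List every path that starts at `start` and follows edges forward."""
    adjacency = {}
    for src, dst in rels:
        adjacency.setdefault(src, []).append(dst)

    paths = []

    def walk(path):
        paths.append(list(path))  # the path so far is itself a result
        for nxt in adjacency.get(path[-1], []):
            if nxt not in path:   # guard against cycles
                walk(path + [nxt])

    walk([start])
    return paths


names = [["user_{}".format(n) for n in p] for p in outbound_paths(5, RELS)]
for path in names:
    print(path)
```

Starting from user_5, only the edges 5→1, 5→6, 6→7 and 7→8 are reachable in the outbound direction, which gives the five paths listed on the slide.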
    • Links • Arango paths: http://goo.gl/n2L3SK • Neo4j: http://goo.gl/au5y9I • Scraper: http://goo.gl/nvMFGk • Visualiser: http://goo.gl/Rzdwci
    • Thanks. Q’s? @maxmaxmaxmax