Importing data quickly and easily
Michael Hunger @mesirii
Mark Needham @markhneedham
The data set
‣ Stack Exchange API
‣ Stack Exchange Data Dump
Stack Exchange API
{ "items": [{
"question_id": 24620768,
"link": "http://stackoverflow.com/questions/24620768/neo4j-cypher-query-get-last-n-elements",
"title": "Neo4j cypher query: get last N elements",
"answer_count": 1,
"score": 1,
.....
"creation_date": 1404771217,
"body_markdown": "I have a graph....How can I do that?",
"tags": ["neo4j", "cypher"],
"owner": {
"reputation": 815,
"user_id": 1212067,
....
"link": "http://stackoverflow.com/users/1212067/"
},
"answers": [{
"owner": {
"reputation": 488,
"user_id": 737080,
"display_name": "Chris Leishman",
....
},
"answer_id": 24620959,
"share_link": "http://stackoverflow.com/a/24620959",
....
"body_markdown": "The simplest would be to use an ... some discussion on this here:...",
"title": "Neo4j cypher query: get last N elements"
}]
}]
}
JSON to CSV
‣ JSON → ??? → CSV → LOAD CSV
Initial Model
jq: Converting JSON to CSV
jq: Converting questions to CSV
jq -r '.[] | .items[] |
[.question_id,
.title,
.up_vote_count,
.down_vote_count,
.creation_date,
.last_activity_date,
.owner.user_id,
.owner.display_name,
(.tags | join(";"))] |
@csv' so.json > questions.csv
$ head -n5 questions.csv
question_id,title,up_vote_count,down_vote_count,creation_date,last_activity_date,owner_user_id,owner_display_name,tags
33023306,"How to delete multiple nodes by specific ID using Cypher",0,0,1444328760,1444332194,260511,"rayman","jdbc;neo4j;cypher;spring-data-neo4j"
33020796,"How do a general search across string properties in my nodes?",1,0,1444320356,1444324015,1429542,"osazuwa","ruby-on-rails;neo4j;neo4j.rb"
33018818,"Neo4j match nodes related to all nodes in collection",0,0,1444314877,1444332779,1212463,"lmazgon","neo4j;cypher"
33018084,"Problems upgrading to Spring Data Neo4j 4.0.0",0,0,1444312993,1444312993,1528942,"Grégoire Colbert","neo4j;spring-data-neo4j"
jq: Converting answers to CSV
jq -r '.[] | .items[] |
{ question_id: .question_id, answer: .answers[]? } |
[.question_id,
.answer.answer_id,
.answer.title,
.answer.owner.user_id,
.answer.owner.display_name,
(.answer.tags | join(";")),
.answer.up_vote_count,
.answer.down_vote_count] |
@csv' so.json > answers.csv
$ head -n5 answers.csv
question_id,answer_id,answer_title,owner_id,owner_display_name,tags,up_vote_count,down_vote_count
33023306,33024189,"How to delete multiple nodes by specific ID using Cypher",3248864,"FylmTM","",0,0
33020796,33021958,"How do a general search across string properties in my nodes?",2920686,"FrobberOfBits","",0,0
33018818,33020068,"Neo4j match nodes related to all nodes in collection",158701,"Stefan Armbruster","",0,0
33018818,33024273,"Neo4j match nodes related to all nodes in collection",974731,"cybersam","",0,0
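The two jq filters can also be sketched in Python with only the standard library, which helps if you want to post-process fields along the way. This is an illustrative equivalent, not the authors' tooling; it assumes so.json is a list of API response pages, each with an items array (as the .[] | .items[] path implies), and the helper names question_rows, answer_rows and to_csv are hypothetical.

```python
import csv
import io


def question_rows(pages):
    """One row per question, like `.[] | .items[]` in the jq filter."""
    for page in pages:
        for q in page.get("items", []):
            owner = q.get("owner") or {}
            yield [q.get("question_id"), q.get("title"),
                   q.get("up_vote_count"), q.get("down_vote_count"),
                   q.get("creation_date"), q.get("last_activity_date"),
                   owner.get("user_id"), owner.get("display_name"),
                   ";".join(q.get("tags") or [])]


def answer_rows(pages):
    """One row per answer; questions without answers are skipped, like `.answers[]?`."""
    for page in pages:
        for q in page.get("items", []):
            for a in q.get("answers") or []:
                owner = a.get("owner") or {}
                yield [q.get("question_id"), a.get("answer_id"), a.get("title"),
                       owner.get("user_id"), owner.get("display_name"),
                       ";".join(a.get("tags") or []),
                       a.get("up_vote_count"), a.get("down_vote_count")]


def to_csv(header, rows):
    """Render a header line plus rows as CSV text; None becomes an empty field."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

Usage would be along the lines of to_csv(header, question_rows(json.load(f))) with f opened on so.json. Unlike jq's @csv, csv.writer quotes a string only when it contains a delimiter, quote or line break; both styles are valid CSV.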
Time to import into Neo4j...
Introducing Cypher
‣ The Graph Query Language
‣ Declarative language (think SQL) for graphs
‣ ASCII art based
Cypher primer
‣ CREATE: create a new pattern in the graph
CREATE (user:User {name:"Michael Hunger"})
CREATE (question:Question {title: "..."})
CREATE (answer:Answer {text: "..."})
CREATE (user)-[:PROVIDED]->(answer)
CREATE (answer)-[:ANSWERS]->(question);
‣ In CREATE (user:User {name:"Michael Hunger"}), the parentheses denote a node, :User is its label and {name: ...} is a property
‣ In CREATE (user)-[:PROVIDED]->(answer), -[:PROVIDED]-> is a relationship
‣ MATCH: find a pattern in the graph
MATCH (answer:Answer)<-[:PROVIDED]-(user:User),
(answer)-[:ANSWERS]->(question)
WHERE user.name = "Michael Hunger"
RETURN question, answer;
‣ MERGE: find a pattern if it exists, create it if it doesn’t
MERGE (user:User {name:"Mark Needham"})
MERGE (question:Question {title: "..."})
MERGE (answer:Answer {text: "..."})
MERGE (user)-[:PROVIDED]->(answer)
MERGE (answer)-[:ANSWERS]->(question);
Import using LOAD CSV
‣ LOAD CSV iterates over the rows of a CSV file, applying the provided query to each row
LOAD CSV [WITH HEADERS] FROM [URI/File path]
AS row
CREATE ...
MERGE ...
MATCH ...
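To make the row-by-row model concrete, here is a toy Python model of LOAD CSV WITH HEADERS ... AS row. It is not how Neo4j implements the clause: apply_query is a hypothetical callback standing in for the Cypher that follows AS row, and the optional limit mirrors the WITH row LIMIT 100 sampling tip. One real behaviour it does reflect is that every value in row is a string, so numeric columns need an explicit conversion such as toInt() in Cypher.

```python
import csv
import io
from itertools import islice


def load_csv_with_headers(text, apply_query, limit=None):
    """Call apply_query(row) once per data row of CSV text.

    Each row is a dict keyed by the header line and every value is a string.
    limit=None processes all rows; an integer processes only the first
    `limit` rows (like WITH row LIMIT 100). Returns the number of rows seen.
    """
    count = 0
    for row in islice(csv.DictReader(io.StringIO(text)), limit):
        apply_query(row)
        count += 1
    return count
```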
LOAD CSV: The naive version
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
MERGE (question:Question {
id:row.question_id,
title: row.title,
up_vote_count: row.up_vote_count,
creation_date: row.creation_date})
MERGE (owner:User {id:row.owner_user_id, display_name: row.owner_display_name})
MERGE (owner)-[:ASKED]->(question)
FOREACH (tagName IN split(row.tags, ";") |
MERGE (tag:Tag {name:tagName})
MERGE (question)-[:TAGGED]->(tag));
Tip: Start with a sample
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
WITH row LIMIT 100
MERGE (question:Question {
id:row.question_id,
title: row.title,
up_vote_count: row.up_vote_count,
creation_date: row.creation_date})
MERGE (owner:User {id:row.owner_user_id, display_name: row.owner_display_name})
MERGE (owner)-[:ASKED]->(question)
FOREACH (tagName IN split(row.tags, ";") |
MERGE (tag:Tag {name:tagName})
MERGE (question)-[:TAGGED]->(tag));
Tip: MERGE on a key
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
WITH row LIMIT 100
MERGE (question:Question {id:row.question_id})
ON CREATE SET question.title = row.title,
question.up_vote_count = row.up_vote_count,
question.creation_date = row.creation_date
MERGE (owner:User {id:row.owner_user_id})
ON CREATE SET owner.display_name = row.owner_display_name
MERGE (owner)-[:ASKED]->(question)
FOREACH (tagName IN split(row.tags, ";") |
MERGE (tag:Tag {name:tagName})
MERGE (question)-[:TAGGED]->(tag));
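Why merging on the key matters can be shown with a toy get-or-create in Python. merge_node is a hypothetical stand-in for MERGE, not Neo4j's implementation: with match_on_all=True it matches every property, like the naive version; otherwise it matches only the key and sets the remaining properties on creation, like MERGE ... ON CREATE SET. The renamed display name in the usage is a made-up input.

```python
def merge_node(nodes, key, props, match_on_all=False):
    """Return a node matching the pattern, creating it if none exists.

    nodes        -- list of dicts acting as the node store
    key          -- identifying properties, e.g. {"id": "260511"}
    props        -- other properties
    match_on_all -- True: the pattern is key + props (naive MERGE)
                    False: the pattern is just key; props are set only on create
    """
    pattern = {**key, **props} if match_on_all else dict(key)
    for node in nodes:
        if all(node.get(k) == v for k, v in pattern.items()):
            return node                      # pattern exists: reuse the node
    node = {**key, **props}                  # pattern missing: create it
    nodes.append(node)
    return node


naive, keyed = [], []
for name in ["rayman", "ray"]:               # same user id, display name changed
    merge_node(naive, {"id": "260511"}, {"display_name": name}, match_on_all=True)
    merge_node(keyed, {"id": "260511"}, {"display_name": name})
```

With the naive pattern the second row creates a second user with the same id, because display_name differs; matching on the key keeps exactly one node per id.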
Tip: Index/Constrain those keys
CREATE INDEX ON :Label(property);
CREATE CONSTRAINT ON (n:Label) ASSERT n.property IS UNIQUE;
CREATE CONSTRAINT ON (q:Question) ASSERT q.id IS UNIQUE;
CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE;
CREATE INDEX ON :Question(title);
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
WITH row LIMIT 100
MERGE (question:Question {id:row.question_id})
ON CREATE SET question.title = row.title,
question.up_vote_count = row.up_vote_count,
question.creation_date = row.creation_date;
Tip: One MERGE per statement
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
WITH row LIMIT 100
MERGE (owner:User {id:row.owner_user_id})
ON CREATE SET owner.display_name = row.owner_display_name;
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
WITH row LIMIT 100
MATCH (question:Question {id:row.question_id})
MATCH (owner:User {id:row.owner_user_id})
MERGE (owner)-[:ASKED]->(question);
Tip: Use DISTINCT
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
WITH row LIMIT 100
UNWIND split(row.tags, ";") AS tag
WITH DISTINCT tag
MERGE (:Tag {name: tag});
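The effect of DISTINCT can be checked on the four sample rows of questions.csv shown earlier: UNWIND yields one row per tag occurrence, while DISTINCT leaves one row per tag name, so MERGE runs fewer times. The Python below only counts those MERGE calls; tags_to_merge is a hypothetical helper.

```python
def tags_to_merge(tag_fields, distinct=True):
    """Tag names that MERGE (:Tag {name: tag}) would run for.

    Mirrors UNWIND split(row.tags, ";") AS tag, optionally followed by
    WITH DISTINCT tag (done here as order-preserving de-duplication).
    """
    tags = [tag for field in tag_fields for tag in field.split(";")]
    return list(dict.fromkeys(tags)) if distinct else tags


# The tags column of the four sample rows from questions.csv.
SAMPLE_TAGS = [
    "jdbc;neo4j;cypher;spring-data-neo4j",
    "ruby-on-rails;neo4j;neo4j.rb",
    "neo4j;cypher",
    "neo4j;spring-data-neo4j",
]
```

On these four rows that is 11 MERGE calls without DISTINCT and 6 with it.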
Tip: Use periodic commit
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
MERGE (question:Question {id:row.question_id})
ON CREATE SET question.title = row.title,
question.up_vote_count = row.up_vote_count,
question.creation_date = row.creation_date;
Periodic commit
‣ Neo4j keeps all transaction state in memory, which is problematic for large CSV files
‣ USING PERIODIC COMMIT commits the transaction after every N rows
‣ The default is 1000 rows, but it’s configurable (e.g. USING PERIODIC COMMIT 500)
‣ Currently it only works with LOAD CSV
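The batching idea behind USING PERIODIC COMMIT can be sketched in Python: process rows in fixed-size batches and commit after each one, so pending transaction state never exceeds one batch. apply_row and commit are hypothetical callbacks (for example wrappers around a database session); this models the behaviour and is not Neo4j's code.

```python
from itertools import islice


def periodic_commit(rows, apply_row, commit, batch_size=1000):
    """Apply rows in batches of batch_size, calling commit() after each batch.

    Returns the number of commits; the last batch may be smaller than batch_size.
    """
    it = iter(rows)
    commits = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return commits
        for row in batch:
            apply_row(row)
        commit()
        commits += 1
```

One consequence carries over to the real clause: if the import fails partway, batches already committed stay in the database, which is another reason to MERGE on keys so that a rerun is safe.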
Tip: Script your import commands
Tip: Use neo4j-shell to load script
$ ./neo4j-enterprise-2.3.0/bin/neo4j-shell -file import.cql [-path so.db]
LOAD CSV: Summary
‣ ETL power tool
‣ Built into Neo4j since version 2.1
‣ Can load data from any URL
‣ Good for medium-sized data (up to 10M rows)
Bulk loading an initial data set
‣ Introducing the Neo4j Import Tool
‣ Find it in the bin folder of your Neo4j download
‣ Used for large initial data sets
‣ Skips the transactional layer of Neo4j and writes store files directly, concurrently
Importing into Neo4j
export NEO=neo4j-enterprise-2.3.0
$NEO/bin/neo4j-import \
  --into stackoverflow.db \
  --id-type string \
  --nodes:Post extracted/Posts_header.csv,extracted/Posts.csv.gz \
  --nodes:User extracted/Users_header.csv,extracted/Users.csv.gz \
  --nodes:Tag extracted/Tags_header.csv,extracted/Tags.csv.gz \
  --relationships:PARENT_OF extracted/PostsRels_header.csv,extracted/PostsRels.csv.gz \
  --relationships:ANSWERS extracted/PostsAnswers_header.csv,extracted/PostsAnswers.csv.gz \
  --relationships:HAS_TAG extracted/TagsPosts_header.csv,extracted/TagsPosts.csv.gz \
  --relationships:POSTED extracted/UsersPosts_header.csv,extracted/UsersPosts.csv.gz
Expects files in a certain format
‣ Nodes, e.g. postId:ID(Post) title body and userId:ID(User) displayname views
‣ Relationships, e.g. :START_ID(User) :END_ID(Post)
What do we have?
<?xml version="1.0" encoding="utf-16"?>
<posts>
...
<row Id="4" PostTypeId="1" AcceptedAnswerId="7" CreationDate="2008-07-31T21:42:52.667"
  Score="358" ViewCount="24247" Body="..." OwnerUserId="8" LastEditorUserId="451518"
  LastEditorDisplayName="Rich B" LastEditDate="2014-07-28T10:02:50.557"
  LastActivityDate="2015-08-01T12:55:11.380"
  Title="When setting a form's opacity should I use a decimal or double?"
  Tags="&lt;c#&gt;&lt;winforms&gt;&lt;type-conversion&gt;&lt;opacity&gt;"
  AnswerCount="13" CommentCount="1" FavoriteCount="28"
  CommunityOwnedDate="2012-10-31T16:42:47.213" />
...
</posts>
XML to CSV
Java program
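The authors used a Java program for this step. As a rough alternative, here is a Python sketch that streams Posts.xml with xml.etree.ElementTree.iterparse, so the large dump is never held in memory at once, and maps each row element to the Posts_header.csv column order plus a list of tag names. The attribute-to-column mapping (for example LastEditDate → updatedAt) is an assumption inferred from the sample row and the generated file, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TAG_NAMES = re.compile(r"<([^>]+)>")  # "<c#><winforms>" -> ["c#", "winforms"]

# Posts.xml attribute for each Posts_header.csv column, in header order
# (assumed mapping, inferred from the sample row).
POST_COLUMNS = ["Id", "Title", "PostTypeId", "CreationDate", "Score",
                "ViewCount", "AnswerCount", "CommentCount", "FavoriteCount",
                "LastEditDate"]


def post_record(attrib):
    """Map one <row> element's attributes to (csv_row, tag_names)."""
    row = [attrib.get(name, "") for name in POST_COLUMNS]
    # The XML parser has already unescaped &lt;c#&gt; to <c#>.
    return row, TAG_NAMES.findall(attrib.get("Tags", ""))


def iter_posts(source):
    """Stream (csv_row, tag_names) pairs from a Posts.xml path or file object."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield post_record(elem.attrib)
            elem.clear()  # drop the element's data once it has been used
```

The tag lists are what a TagsPosts relationship file would be built from, one (post id, tag) pair per entry.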
The generated files
$ cat extracted/Posts_header.csv
"postId:ID(Post)","title","postType:INT","createdAt","score:INT","views:INT","answers:INT","comments:INT","favorites:INT","updatedAt"
$ gzcat extracted/Posts.csv.gz | head -n2
"4","When setting a forms opacity should I use a decimal or double?","1","2008-07-31T21:42:52.667","358","24247","13","1","28","2014-07-28T10:02:50.557"
"6","Why doesn’t the percentage width child in absolutely positioned parent work?","1","2008-07-31T22:08:08.620","156","11840","5","0","7","2015-04-26T14:37:49.673"
$ cat extracted/PostsRels_header.csv
":START_ID(Post)",":END_ID(Post)"
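Writing files in this shape can be sketched in Python: a one-line header file plus a gzip-compressed data file, which neo4j-import accepts together as a comma-separated pair such as extracted/Posts_header.csv,extracted/Posts.csv.gz. Keeping the header separate means it can be inspected or fixed without rewriting the large compressed file. write_import_files and its naming scheme are assumptions modelled on the file names above; every field is quoted, as in the generated files.

```python
import csv
import gzip


def write_import_files(prefix, header, rows):
    """Write <prefix>_header.csv (header only) and <prefix>.csv.gz (data only).

    All fields are quoted, every line ends with a single newline character,
    and csv.writer doubles any quote character embedded in a field.
    """
    with open(f"{prefix}_header.csv", "w", newline="", encoding="utf-8") as out:
        csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(header)
    with gzip.open(f"{prefix}.csv.gz", "wt", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerows(rows)
```

For the relationship file above, write_import_files("extracted/PostsRels", [":START_ID(Post)", ":END_ID(Post)"], rows) would produce the header shown, assuming the extracted directory exists.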
Importing into Neo4j
IMPORT DONE in 3m 10s 661ms. Imported:
31138574 nodes
77930024 relationships
218106346 properties
Tip: Make sure your data is clean
‣ Use a consistent line break style
‣ Ensure headers are consistent with data
‣ Quote special characters
‣ Escape stray quotes
‣ Remove non-text characters
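Several of these rules can be applied per field before writing, as in this Python sketch: clean_field converts all line breaks to a single newline style and drops ASCII control characters (keeping tab and newline), and csv_line relies on QUOTE_ALL quoting, which escapes stray quotes by doubling them. Both helper names are hypothetical, and the exact set of characters to strip is a judgment call for your data.

```python
import csv
import io
import re

# ASCII control characters except tab (\x09), newline (\x0a) and carriage return (\x0d).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean_field(value):
    """Use one line-break style (newline only) and remove control characters."""
    value = value.replace("\r\n", "\n").replace("\r", "\n")
    return CONTROL_CHARS.sub("", value)


def csv_line(fields):
    """Quote every cleaned field; embedded quotes are escaped by doubling them."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([clean_field(f) for f in fields])
    return buf.getvalue()
```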
Even more tips
‣ Get the fastest disk you can
‣ Use separate disk for input and output
‣ Compress your CSV files
‣ The more cores the better
‣ Separate headers from data
The End
‣ https://github.com/mdamien/stackoverflow-neo4j
‣ http://neo4j.com/blog/import-10m-stack-overflow-questions/
‣ http://neo4j.com/blog/cypher-load-json-from-url/
Michael Hunger @mesirii
Mark Needham @markhneedham

More Related Content

What's hot

Getting started with Elasticsearch and .NET
Getting started with Elasticsearch and .NETGetting started with Elasticsearch and .NET
Getting started with Elasticsearch and .NETTomas Jansson
 
MongoDB and Indexes - MUG Denver - 20160329
MongoDB and Indexes - MUG Denver - 20160329MongoDB and Indexes - MUG Denver - 20160329
MongoDB and Indexes - MUG Denver - 20160329Douglas Duncan
 
The Ring programming language version 1.5.2 book - Part 44 of 181
The Ring programming language version 1.5.2 book - Part 44 of 181The Ring programming language version 1.5.2 book - Part 44 of 181
The Ring programming language version 1.5.2 book - Part 44 of 181Mahmoud Samir Fayed
 
Using Scala Slick at FortyTwo
Using Scala Slick at FortyTwoUsing Scala Slick at FortyTwo
Using Scala Slick at FortyTwoEishay Smith
 
Patterns for slick database applications
Patterns for slick database applicationsPatterns for slick database applications
Patterns for slick database applicationsSkills Matter
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDBMongoDB
 
Web осень 2012 лекция 6
Web осень 2012 лекция 6Web осень 2012 лекция 6
Web осень 2012 лекция 6Technopark
 
The Ring programming language version 1.5.4 book - Part 43 of 185
The Ring programming language version 1.5.4 book - Part 43 of 185The Ring programming language version 1.5.4 book - Part 43 of 185
The Ring programming language version 1.5.4 book - Part 43 of 185Mahmoud Samir Fayed
 
NoSQL для PostgreSQL: Jsquery — язык запросов
NoSQL для PostgreSQL: Jsquery — язык запросовNoSQL для PostgreSQL: Jsquery — язык запросов
NoSQL для PostgreSQL: Jsquery — язык запросовCodeFest
 
Fast querying indexing for performance (4)
Fast querying   indexing for performance (4)Fast querying   indexing for performance (4)
Fast querying indexing for performance (4)MongoDB
 
SunshinePHP 2017 - Making the most out of MySQL
SunshinePHP 2017 - Making the most out of MySQLSunshinePHP 2017 - Making the most out of MySQL
SunshinePHP 2017 - Making the most out of MySQLGabriela Ferrara
 
Web весна 2013 лекция 6
Web весна 2013 лекция 6Web весна 2013 лекция 6
Web весна 2013 лекция 6Technopark
 
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander KorotkovPostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander KorotkovNikolay Samokhvalov
 
Getting Creative with WordPress Queries, Again
Getting Creative with WordPress Queries, AgainGetting Creative with WordPress Queries, Again
Getting Creative with WordPress Queries, AgainDrewAPicture
 
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...MongoDB
 
MySQL 5.7 NF – JSON Datatype 활용
MySQL 5.7 NF – JSON Datatype 활용MySQL 5.7 NF – JSON Datatype 활용
MySQL 5.7 NF – JSON Datatype 활용I Goo Lee
 
Ajax Performance Tuning and Best Practices
Ajax Performance Tuning and Best PracticesAjax Performance Tuning and Best Practices
Ajax Performance Tuning and Best PracticesDoris Chen
 

What's hot (20)

Getting started with Elasticsearch and .NET
Getting started with Elasticsearch and .NETGetting started with Elasticsearch and .NET
Getting started with Elasticsearch and .NET
 
Php forum2015 tomas_final
Php forum2015 tomas_finalPhp forum2015 tomas_final
Php forum2015 tomas_final
 
MongoDB and Indexes - MUG Denver - 20160329
MongoDB and Indexes - MUG Denver - 20160329MongoDB and Indexes - MUG Denver - 20160329
MongoDB and Indexes - MUG Denver - 20160329
 
The Ring programming language version 1.5.2 book - Part 44 of 181
The Ring programming language version 1.5.2 book - Part 44 of 181The Ring programming language version 1.5.2 book - Part 44 of 181
The Ring programming language version 1.5.2 book - Part 44 of 181
 
Using Scala Slick at FortyTwo
Using Scala Slick at FortyTwoUsing Scala Slick at FortyTwo
Using Scala Slick at FortyTwo
 
Patterns for slick database applications
Patterns for slick database applicationsPatterns for slick database applications
Patterns for slick database applications
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDB
 
Web осень 2012 лекция 6
Web осень 2012 лекция 6Web осень 2012 лекция 6
Web осень 2012 лекция 6
 
The Ring programming language version 1.5.4 book - Part 43 of 185
The Ring programming language version 1.5.4 book - Part 43 of 185The Ring programming language version 1.5.4 book - Part 43 of 185
The Ring programming language version 1.5.4 book - Part 43 of 185
 
NoSQL для PostgreSQL: Jsquery — язык запросов
NoSQL для PostgreSQL: Jsquery — язык запросовNoSQL для PostgreSQL: Jsquery — язык запросов
NoSQL для PostgreSQL: Jsquery — язык запросов
 
Fast querying indexing for performance (4)
Fast querying   indexing for performance (4)Fast querying   indexing for performance (4)
Fast querying indexing for performance (4)
 
SunshinePHP 2017 - Making the most out of MySQL
SunshinePHP 2017 - Making the most out of MySQLSunshinePHP 2017 - Making the most out of MySQL
SunshinePHP 2017 - Making the most out of MySQL
 
Web весна 2013 лекция 6
Web весна 2013 лекция 6Web весна 2013 лекция 6
Web весна 2013 лекция 6
 
How to Use JSON in MySQL Wrong
How to Use JSON in MySQL WrongHow to Use JSON in MySQL Wrong
How to Use JSON in MySQL Wrong
 
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander KorotkovPostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
 
Getting Creative with WordPress Queries, Again
Getting Creative with WordPress Queries, AgainGetting Creative with WordPress Queries, Again
Getting Creative with WordPress Queries, Again
 
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
 
Python database access
Python database accessPython database access
Python database access
 
MySQL 5.7 NF – JSON Datatype 활용
MySQL 5.7 NF – JSON Datatype 활용MySQL 5.7 NF – JSON Datatype 활용
MySQL 5.7 NF – JSON Datatype 활용
 
Ajax Performance Tuning and Best Practices
Ajax Performance Tuning and Best PracticesAjax Performance Tuning and Best Practices
Ajax Performance Tuning and Best Practices
 

Similar to Graph Connect: Importing data quickly and easily

Importing Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflowImporting Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflowNeo4j
 
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael HungerGraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael HungerNeo4j
 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDBrogerbodamer
 
Scalaで実装してみる簡易ブロックチェーン
Scalaで実装してみる簡易ブロックチェーンScalaで実装してみる簡易ブロックチェーン
Scalaで実装してみる簡易ブロックチェーンHiroshi Ito
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.GeeksLab Odessa
 
The Ring programming language version 1.5.3 book - Part 53 of 184
The Ring programming language version 1.5.3 book - Part 53 of 184The Ring programming language version 1.5.3 book - Part 53 of 184
The Ring programming language version 1.5.3 book - Part 53 of 184Mahmoud Samir Fayed
 
The Ring programming language version 1.5.3 book - Part 43 of 184
The Ring programming language version 1.5.3 book - Part 43 of 184The Ring programming language version 1.5.3 book - Part 43 of 184
The Ring programming language version 1.5.3 book - Part 43 of 184Mahmoud Samir Fayed
 
Rich Internet Applications con JavaFX e NetBeans
Rich Internet Applications  con JavaFX e NetBeans Rich Internet Applications  con JavaFX e NetBeans
Rich Internet Applications con JavaFX e NetBeans Fabrizio Giudici
 
Quickstartguidetojavascriptframeworksforsharepointapps spsbe-2015-15041903264...
Quickstartguidetojavascriptframeworksforsharepointapps spsbe-2015-15041903264...Quickstartguidetojavascriptframeworksforsharepointapps spsbe-2015-15041903264...
Quickstartguidetojavascriptframeworksforsharepointapps spsbe-2015-15041903264...BIWUG
 
Quick start guide to java script frameworks for sharepoint apps spsbe-2015
Quick start guide to java script frameworks for sharepoint apps spsbe-2015Quick start guide to java script frameworks for sharepoint apps spsbe-2015
Quick start guide to java script frameworks for sharepoint apps spsbe-2015Sonja Madsen
 
The Ring programming language version 1.9 book - Part 52 of 210
The Ring programming language version 1.9 book - Part 52 of 210The Ring programming language version 1.9 book - Part 52 of 210
The Ring programming language version 1.9 book - Part 52 of 210Mahmoud Samir Fayed
 
The Ring programming language version 1.8 book - Part 50 of 202
The Ring programming language version 1.8 book - Part 50 of 202The Ring programming language version 1.8 book - Part 50 of 202
The Ring programming language version 1.8 book - Part 50 of 202Mahmoud Samir Fayed
 
MongoDB World 2018: Keynote
MongoDB World 2018: KeynoteMongoDB World 2018: Keynote
MongoDB World 2018: KeynoteMongoDB
 
Boost delivery stream with code discipline engineering
Boost delivery stream with code discipline engineeringBoost delivery stream with code discipline engineering
Boost delivery stream with code discipline engineeringMiro Wengner
 
Introducing DataWave
Introducing DataWaveIntroducing DataWave
Introducing DataWaveData Works MD
 
MongoDB hearts Django? (Django NYC)
MongoDB hearts Django? (Django NYC)MongoDB hearts Django? (Django NYC)
MongoDB hearts Django? (Django NYC)Mike Dirolf
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBMongoDB
 

Similar to Graph Connect: Importing data quickly and easily (20)

Importing Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflowImporting Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflow
 
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael HungerGraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDB
 
Scalaで実装してみる簡易ブロックチェーン
Scalaで実装してみる簡易ブロックチェーンScalaで実装してみる簡易ブロックチェーン
Scalaで実装してみる簡易ブロックチェーン
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
Discovering Django - zekeLabs
Discovering Django - zekeLabsDiscovering Django - zekeLabs
Discovering Django - zekeLabs
 
The Ring programming language version 1.5.3 book - Part 53 of 184
The Ring programming language version 1.5.3 book - Part 53 of 184The Ring programming language version 1.5.3 book - Part 53 of 184
The Ring programming language version 1.5.3 book - Part 53 of 184
 
The Ring programming language version 1.5.3 book - Part 43 of 184
The Ring programming language version 1.5.3 book - Part 43 of 184The Ring programming language version 1.5.3 book - Part 43 of 184
The Ring programming language version 1.5.3 book - Part 43 of 184
 
Php summary
Php summaryPhp summary
Php summary
 
Rich Internet Applications con JavaFX e NetBeans
Rich Internet Applications  con JavaFX e NetBeans Rich Internet Applications  con JavaFX e NetBeans
Rich Internet Applications con JavaFX e NetBeans
 
DataMapper
DataMapperDataMapper
DataMapper
 
Quickstartguidetojavascriptframeworksforsharepointapps spsbe-2015-15041903264...
Quickstartguidetojavascriptframeworksforsharepointapps spsbe-2015-15041903264...Quickstartguidetojavascriptframeworksforsharepointapps spsbe-2015-15041903264...
Quickstartguidetojavascriptframeworksforsharepointapps spsbe-2015-15041903264...
 
Quick start guide to java script frameworks for sharepoint apps spsbe-2015
Quick start guide to java script frameworks for sharepoint apps spsbe-2015Quick start guide to java script frameworks for sharepoint apps spsbe-2015
Quick start guide to java script frameworks for sharepoint apps spsbe-2015
 
The Ring programming language version 1.9 book - Part 52 of 210
The Ring programming language version 1.9 book - Part 52 of 210The Ring programming language version 1.9 book - Part 52 of 210
The Ring programming language version 1.9 book - Part 52 of 210
 
The Ring programming language version 1.8 book - Part 50 of 202
The Ring programming language version 1.8 book - Part 50 of 202The Ring programming language version 1.8 book - Part 50 of 202
The Ring programming language version 1.8 book - Part 50 of 202
 
MongoDB World 2018: Keynote
MongoDB World 2018: KeynoteMongoDB World 2018: Keynote
MongoDB World 2018: Keynote
 
Boost delivery stream with code discipline engineering
Boost delivery stream with code discipline engineeringBoost delivery stream with code discipline engineering
Boost delivery stream with code discipline engineering
 
Introducing DataWave
Introducing DataWaveIntroducing DataWave
Introducing DataWave
 
MongoDB hearts Django? (Django NYC)
MongoDB hearts Django? (Django NYC)MongoDB hearts Django? (Django NYC)
MongoDB hearts Django? (Django NYC)
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDB
 

More from Mark Needham

Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and Operations
Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and OperationsNeo4j GraphTour: Utilizing Powerful Extensions for Analytics and Operations
Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and OperationsMark Needham
 
This week in Neo4j - 3rd February 2018
This week in Neo4j - 3rd February 2018This week in Neo4j - 3rd February 2018
This week in Neo4j - 3rd February 2018Mark Needham
 
Building a recommendation engine with python and neo4j
Building a recommendation engine with python and neo4jBuilding a recommendation engine with python and neo4j
Building a recommendation engine with python and neo4jMark Needham
 
Graph Connect: Tuning Cypher
Graph Connect: Tuning CypherGraph Connect: Tuning Cypher
Graph Connect: Tuning CypherMark Needham
 
Graph Connect Europe: From Zero To Import
Graph Connect Europe: From Zero To ImportGraph Connect Europe: From Zero To Import
Graph Connect Europe: From Zero To ImportMark Needham
 
Optimizing cypher queries in neo4j
Optimizing cypher queries in neo4jOptimizing cypher queries in neo4j
Optimizing cypher queries in neo4jMark Needham
 
Football graph - Neo4j and the Premier League
Football graph - Neo4j and the Premier LeagueFootball graph - Neo4j and the Premier League
Football graph - Neo4j and the Premier LeagueMark Needham
 
The Football Graph - Neo4j and the Premier League
The Football Graph - Neo4j and the Premier LeagueThe Football Graph - Neo4j and the Premier League
The Football Graph - Neo4j and the Premier LeagueMark Needham
 
Scala: An experience report
Scala: An experience reportScala: An experience report
Scala: An experience reportMark Needham
 
Mixing functional programming approaches in an object oriented language
Mixing functional programming approaches in an object oriented languageMixing functional programming approaches in an object oriented language
Mixing functional programming approaches in an object oriented languageMark Needham
 
Mixing functional and object oriented approaches to programming in C#
Mixing functional and object oriented approaches to programming in C#Mixing functional and object oriented approaches to programming in C#
Mixing functional and object oriented approaches to programming in C#Mark Needham
 
Mixing functional and object oriented approaches to programming in C#
Mixing functional and object oriented approaches to programming in C#Mixing functional and object oriented approaches to programming in C#
Mixing functional and object oriented approaches to programming in C#Mark Needham
 
F#: What I've learnt so far
F#: What I've learnt so farF#: What I've learnt so far
F#: What I've learnt so farMark Needham
 


Graph Connect: Importing data quickly and easily

  • 1. Importing data quickly and easily Michael Hunger @mesirii Mark Needham @markhneedham
  • 3. The data set ‣ Stack Exchange API ‣ Stack Exchange Data Dump
  • 4. Stack Exchange API { "items": [{ "question_id": 24620768, "link": "http://stackoverflow.com/questions/24620768/neo4j-cypher-query-get-last-n-elements", "title": "Neo4j cypher query: get last N elements", "answer_count": 1, "score": 1, ..... "creation_date": 1404771217, "body_markdown": "I have a graph....How can I do that?", "tags": ["neo4j", "cypher"], "owner": { "reputation": 815, "user_id": 1212067, .... "link": "http://stackoverflow.com/users/1212067/" }, "answers": [{ "owner": { "reputation": 488, "user_id": 737080, "display_name": "Chris Leishman", .... }, "answer_id": 24620959, "share_link": "http://stackoverflow.com/a/24620959", .... "body_markdown": "The simplest would be to use an ... some discussion on this here:...", "title": "Neo4j cypher query: get last N elements" }] }
  • 5. JSON to CSV: JSON → ??? → CSV → LOAD CSV
  • 6. Initial Model: derived from the same Stack Exchange API response shown on slide 4
  • 8. jq: Converting questions to CSV jq -r '.[] | .items[] | [.question_id, .title, .up_vote_count, .down_vote_count, .creation_date, .last_activity_date, .owner.user_id, .owner.display_name, (.tags | join(";"))] | @csv ' so.json
  • 9. jq: Converting questions to CSV
      $ head -n5 questions.csv
      question_id,title,up_vote_count,down_vote_count,creation_date,last_activity_date,owner_user_id,owner_display_name,tags
      33023306,"How to delete multiple nodes by specific ID using Cypher",0,0,1444328760,1444332194,260511,"rayman","jdbc;neo4j;cypher;spring-data-neo4j"
      33020796,"How do a general search across string properties in my nodes?",1,0,1444320356,1444324015,1429542,"osazuwa","ruby-on-rails;neo4j;neo4j.rb"
      33018818,"Neo4j match nodes related to all nodes in collection",0,0,1444314877,1444332779,1212463,"lmazgon","neo4j;cypher"
      33018084,"Problems upgrading to Spring Data Neo4j 4.0.0",0,0,1444312993,1444312993,1528942,"Gr&#233;goire Colbert","neo4j;spring-data-neo4j"
  • 10. jq: Converting answers to CSV jq -r '.[] | .items[] | { question_id: .question_id, answer: .answers[]? } | [.question_id, .answer.answer_id, .answer.title, .answer.owner.user_id, .answer.owner.display_name, (.answer.tags | join(";")), .answer.up_vote_count, .answer.down_vote_count] | @csv'
  • 11. jq: Converting answers to CSV
      $ head -n5 answers.csv
      question_id,answer_id,answer_title,owner_id,owner_display_name,tags,up_vote_count,down_vote_count
      33023306,33024189,"How to delete multiple nodes by specific ID using Cypher",3248864,"FylmTM","",0,0
      33020796,33021958,"How do a general search across string properties in my nodes?",2920686,"FrobberOfBits","",0,0
      33018818,33020068,"Neo4j match nodes related to all nodes in collection",158701,"Stefan Armbruster","",0,0
      33018818,33024273,"Neo4j match nodes related to all nodes in collection",974731,"cybersam","",0,0
  • 12. Time to import into Neo4j...
  • 13. Introducing Cypher ‣ The Graph Query Language ‣ Declarative language (think SQL) for graphs ‣ ASCII art based
  • 14. Cypher primer ‣ CREATE: create a new pattern in the graph CREATE (user:User {name:"Michael Hunger"}) CREATE (question:Question {title: "..."}) CREATE (answer:Answer {text: "..."}) CREATE (user)-[:PROVIDED]->(answer) CREATE (answer)-[:ANSWERS]->(question);
  • 15. Cypher primer ‣ In CREATE (user:User {name:"Michael Hunger"}), (user) is a node, :User is its label and {name:"Michael Hunger"} is a property
  • 16. Cypher primer ‣ In CREATE (user)-[:PROVIDED]->(answer), -[:PROVIDED]-> is a relationship of type PROVIDED from user to answer
  • 17. Cypher primer ‣ MATCH: find a pattern in the graph MATCH (answer:Answer)<-[:PROVIDED]-(user:User), (answer)-[:ANSWERS]->(question) WHERE user.name = "Michael Hunger" RETURN question, answer;
  • 18. Cypher primer ‣ MERGE: find a pattern if it exists, create it if it doesn’t MERGE (user:User {name:"Mark Needham"}) MERGE (question:Question {title: "..."}) MERGE (answer:Answer {text: "..."}) MERGE (user)-[:PROVIDED]->(answer) MERGE (answer)-[:ANSWERS]->(question);
  • 19. Import using LOAD CSV ‣ LOAD CSV iterates over a CSV file and applies the provided query to each row LOAD CSV [WITH HEADERS] FROM [URI/File path] AS row CREATE ... MERGE ... MATCH ...
  • 20. LOAD CSV: The naive version LOAD CSV WITH HEADERS FROM "questions.csv" AS row MERGE (question:Question { id:row.question_id, title: row.title, up_vote_count: row.up_vote_count, creation_date: row.creation_date}) MERGE (owner:User {id:row.owner_user_id, display_name: row.owner_display_name}) MERGE (owner)-[:ASKED]->(question) FOREACH (tagName IN split(row.tags, ";") | MERGE (tag:Tag {name:tagName}) MERGE (question)-[:TAGGED]->(tag));
  • 21. Tip: Start with a sample LOAD CSV WITH HEADERS FROM "questions.csv" AS row WITH row LIMIT 100 MERGE (question:Question { id:row.question_id, title: row.title, up_vote_count: row.up_vote_count, creation_date: row.creation_date}) MERGE (owner:User {id:row.owner_user_id, display_name: row.owner_display_name}) MERGE (owner)-[:ASKED]->(question) FOREACH (tagName IN split(row.tags, ";") | MERGE (tag:Tag {name:tagName}) MERGE (question)-[:TAGGED]->(tag));
  • 22. Tip: MERGE on a key LOAD CSV WITH HEADERS FROM "questions.csv" AS row WITH row LIMIT 100 MERGE (question:Question {id:row.question_id}) ON CREATE SET question.title = row.title, question.up_vote_count = row.up_vote_count, question.creation_date = row.creation_date MERGE (owner:User {id:row.owner_user_id}) ON CREATE SET owner.display_name = row.owner_display_name MERGE (owner)-[:ASKED]->(question) FOREACH (tagName IN split(row.tags, ";") | MERGE (tag:Tag {name:tagName}) MERGE (question)-[:TAGGED]->(tag));
  • 23. Tip: Index/Constrain those keys CREATE INDEX ON :Label(property); CREATE CONSTRAINT ON (n:Label) ASSERT n.property IS UNIQUE;
  • 24. Tip: Index those keys CREATE CONSTRAINT ON (q:Question) ASSERT q.id IS UNIQUE; CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE; CREATE INDEX ON :Question(title);
  • 25. Tip: One MERGE per statement LOAD CSV WITH HEADERS FROM "questions.csv" AS row WITH row LIMIT 100 MERGE (question:Question {id:row.question_id}) ON CREATE SET question.title = row.title, question.up_vote_count = row.up_vote_count, question.creation_date = row.creation_date;
  • 26. Tip: One MERGE per statement LOAD CSV WITH HEADERS FROM "questions.csv" AS row WITH row LIMIT 100 MERGE (owner:User {id:row.owner_user_id}) ON CREATE SET owner.display_name = row.owner_display_name;
  • 27. Tip: One MERGE per statement LOAD CSV WITH HEADERS FROM "questions.csv" AS row WITH row LIMIT 100 MATCH (question:Question {id:row.question_id}) MATCH (owner:User {id:row.owner_user_id}) MERGE (owner)-[:ASKED]->(question);
  • 28. Tip: Use DISTINCT LOAD CSV WITH HEADERS FROM "questions.csv" AS row WITH row LIMIT 100 UNWIND split(row.tags, ";") AS tag WITH distinct tag MERGE (:Tag {name: tag});
  • 29. Tip: Use periodic commit USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "questions.csv" AS row MERGE (question:Question {id:row.question_id}) ON CREATE SET question.title = row.title, question.up_vote_count = row.up_vote_count, question.creation_date = row.creation_date;
  • 30. Periodic commit ‣ Neo4j keeps all transaction state in memory, which is problematic for large CSV files ‣ USING PERIODIC COMMIT commits the transaction after every N rows, keeping that state bounded ‣ The default is 1000 rows, but it’s configurable (e.g. USING PERIODIC COMMIT 500) ‣ Currently only works with LOAD CSV
  • 31. Tip: Script your import commands
  • 32. Tip: Use neo4j-shell to load script $ ./neo4j-enterprise-2.3.0/bin/neo4j-shell -file import.cql [-path so.db]
  • 33. LOAD CSV: Summary ‣ ETL power tool ‣ Built into Neo4j since version 2.1 ‣ Can load data from any URL ‣ Good for medium-sized data (up to 10M rows)
  • 34. Bulk loading an initial data set ‣ Introducing the Neo4j Import Tool ‣ Find it in the bin folder of your Neo4j download ‣ Used for large initial data sets ‣ Skips the transactional layer of Neo4j and writes the store files directly and concurrently
  • 35. Importing into Neo4j
      export NEO=neo4j-enterprise-2.3.0
      $NEO/bin/neo4j-import --into stackoverflow.db --id-type string \
        --nodes:Post extracted/Posts_header.csv,extracted/Posts.csv.gz \
        --nodes:User extracted/Users_header.csv,extracted/Users.csv.gz \
        --nodes:Tag extracted/Tags_header.csv,extracted/Tags.csv.gz \
        --relationships:PARENT_OF extracted/PostsRels_header.csv,extracted/PostsRels.csv.gz \
        --relationships:ANSWERS extracted/PostsAnswers_header.csv,extracted/PostsAnswers.csv.gz \
        --relationships:HAS_TAG extracted/TagsPosts_header.csv,extracted/TagsPosts.csv.gz \
        --relationships:POSTED extracted/UsersPosts_header.csv,extracted/UsersPosts.csv.gz
  • 36. Expects files in a certain format ‣ Nodes: headers such as postId:ID(Post),title,body or userId:ID(User),displayname,views ‣ Rels: headers such as :START_ID(User),:END_ID(Post)
  • 37. What do we have? <?xml version="1.0" encoding="utf-16"?> <posts> ... <row Id="4" PostTypeId="1" AcceptedAnswerId="7" CreationDate="2008-07-31T21:42:52.667" Score="358" ViewCount="24247" Body="..." OwnerUserId="8" LastEditorUserId="451518" LastEditorDisplayName="Rich B" LastEditDate="2014-07-28T10:02:50.557" LastActivityDate="2015-08-01T12:55:11.380" Title="When setting a form's opacity should I use a decimal or double?" Tags="&lt;c#&gt;&lt;winforms&gt;&lt;type-conversion&gt;&lt;opacity&gt;" AnswerCount="13" CommentCount="1" FavoriteCount="28" CommunityOwnedDate="2012-10-31T16:42:47.213" /> ... </posts>
  • 38. XML to CSV ‣ A Java program converts the Posts XML shown on slide 37 into CSV files for the import tool
  • 39. The generated files
      $ cat extracted/Posts_header.csv
      "postId:ID(Post)","title","postType:INT","createdAt","score:INT","views:INT","answers:INT","comments:INT","favorites:INT","updatedAt"
  • 40. The generated files
      $ gzcat extracted/Posts.csv.gz | head -n2
      "4","When setting a forms opacity should I use a decimal or double?","1","2008-07-31T21:42:52.667","358","24247","13","1","28","2014-07-28T10:02:50.557"
      "6","Why doesn’t the percentage width child in absolutely positioned parent work?","1","2008-07-31T22:08:08.620","156","11840","5","0","7","2015-04-26T14:37:49.673"
  • 41. The generated files $ cat extracted/PostsRels_header.csv ":START_ID(Post)",":END_ID(Post)"
  • 42. Importing into Neo4j ‣ Running the neo4j-import command from slide 35 prints:
      IMPORT DONE in 3m 10s 661ms. Imported:
        31138574 nodes
        77930024 relationships
        218106346 properties
  • 43. Tip: Make sure your data is clean ‣ Use a consistent line break style ‣ Ensure headers are consistent with data ‣ Quote special characters ‣ Escape stray quotes ‣ Remove non-text characters
  • 44. Even more tips ‣ Get the fastest disk you can ‣ Use separate disks for input and output ‣ Compress your CSV files ‣ The more cores the better ‣ Separate headers from data
  • 45. The End ‣ https://github.com/mdamien/stackoverflow-neo4j ‣ http://neo4j.com/blog/import-10m-stack-overflow-questions/ ‣ http://neo4j.com/blog/cypher-load-json-from-url/ Michael Hunger @mesirii Mark Needham @markhneedham
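Slides 8 and 10 flatten the Stack Exchange API response with jq. For readers without jq, the same flattening can be sketched with Python's standard library. This is an illustrative equivalent, not the tooling used in the talk. It assumes, as the `.[] | .items[]` path implies, that `so.json` holds a JSON array of API response pages, and the function names are hypothetical. Questions without an `answers` key are skipped, mirroring jq's `.answers[]?`.

```python
import csv
import io
import json

def load_pages(path):
    """so.json is assumed to be a JSON array of API response pages."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def question_rows(pages):
    """One row per question, like the slide 8 jq filter."""
    for page in pages:                       # .[]
        for q in page["items"]:              # .items[]
            owner = q.get("owner", {})
            yield [q["question_id"], q["title"],
                   q.get("up_vote_count"), q.get("down_vote_count"),
                   q["creation_date"], q.get("last_activity_date"),
                   owner.get("user_id"), owner.get("display_name"),
                   ";".join(q.get("tags") or [])]      # (.tags | join(";"))

def answer_rows(pages):
    """One row per answer, like the slide 10 jq filter."""
    for page in pages:
        for q in page["items"]:
            for a in q.get("answers") or []:          # .answers[]? skips absent keys
                owner = a.get("owner", {})
                yield [q["question_id"], a["answer_id"], a.get("title"),
                       owner.get("user_id"), owner.get("display_name"),
                       ";".join(a.get("tags") or []),
                       a.get("up_vote_count"), a.get("down_vote_count")]

def to_csv(rows):
    """Quote strings and leave numbers bare, close to jq's @csv."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

For example, `to_csv(question_rows(load_pages("so.json")))` yields the body of `questions.csv` without its header line. Quoting of missing values may differ slightly from jq's `@csv`.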
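The `WITH HEADERS` option on slide 19 changes what `row` is. With headers, each row is a map keyed by the header line; without them, it is a list indexed by position and the header line comes back as an ordinary row. LOAD CSV does no type conversion, which is why the naive query on slide 20 stores `up_vote_count` as text unless it is converted, for example with Neo4j 2.3's `toInt(row.up_vote_count)`. Python's `csv` module behaves analogously, so it is a convenient way to preview a file before loading it. The `preview` helper below is hypothetical, and the sample input is copied from slide 9.

```python
import csv
import io

# Header and one data row of questions.csv as shown on slide 9.
SAMPLE = (
    "question_id,title,up_vote_count,down_vote_count,creation_date,"
    "last_activity_date,owner_user_id,owner_display_name,tags\n"
    '33020796,"How do a general search across string properties in my nodes?",'
    '1,0,1444320356,1444324015,1429542,"osazuwa","ruby-on-rails;neo4j;neo4j.rb"\n'
)

def preview(text, with_headers=True):
    """Parse CSV text roughly the way LOAD CSV exposes `row`."""
    source = io.StringIO(text)
    if with_headers:
        return list(csv.DictReader(source))  # WITH HEADERS: row is a map
    return list(csv.reader(source))          # no headers: row is a list, header included

rows = preview(SAMPLE)
print(rows[0]["owner_display_name"])         # osazuwa
print(repr(rows[0]["up_vote_count"]))        # '1' (a string, not a number)
print(int(rows[0]["up_vote_count"]) + 1)     # 2, after an explicit conversion
```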
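Slide 22's advice to MERGE on a key follows from the MERGE semantics on slide 18: MERGE matches the whole pattern, including every property written inside it. If the same user id shows up with two different display names, the slide 20 query creates a second `User` node, while MERGE on `id` plus `ON CREATE SET` reuses the first one. The toy model below is a hypothetical in-memory stand-in for the nodes of one label, not Neo4j code, and the rows are made up. It only illustrates those matching rules.

```python
def merge_full(nodes, props):
    """Toy MERGE (n:User {id: ..., display_name: ...}): every listed property must match."""
    for node in nodes:
        if all(node.get(k) == v for k, v in props.items()):
            return node
    node = dict(props)
    nodes.append(node)
    return node

def merge_on_key(nodes, key, value, on_create):
    """Toy MERGE (n:User {id: ...}) ON CREATE SET ...: only the key must match."""
    for node in nodes:
        if node.get(key) == value:
            return node
    node = {key: value, **on_create}   # ON CREATE SET applies to new nodes only
    nodes.append(node)
    return node

# Hypothetical rows: the same owner id seen twice with different display names.
rows = [{"owner_user_id": "42", "owner_display_name": "first name"},
        {"owner_user_id": "42", "owner_display_name": "renamed"}]

full, keyed = [], []
for row in rows:
    merge_full(full, {"id": row["owner_user_id"],
                      "display_name": row["owner_display_name"]})
    merge_on_key(keyed, "id", row["owner_user_id"],
                 {"display_name": row["owner_display_name"]})

print(len(full), len(keyed))   # 2 1
```

With the uniqueness constraint from slide 24 in place, the full-map MERGE would fail on the second row with a constraint violation instead of creating a duplicate.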
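Slide 28 deduplicates tags before merging them. Using the tags column of the four sample rows on slide 9, the sketch below counts how many `MERGE (:Tag ...)` calls the `UNWIND ... WITH distinct tag` step saves. Python's `dict.fromkeys` stands in for DISTINCT here; it also keeps first-seen order.

```python
# tags column of the four sample rows on slide 9
tag_columns = [
    "jdbc;neo4j;cypher;spring-data-neo4j",
    "ruby-on-rails;neo4j;neo4j.rb",
    "neo4j;cypher",
    "neo4j;spring-data-neo4j",
]

# UNWIND split(row.tags, ";") AS tag
unwound = [tag for column in tag_columns for tag in column.split(";")]

# WITH distinct tag (dict.fromkeys drops repeats and keeps first-seen order)
distinct = list(dict.fromkeys(unwound))

print(len(unwound), len(distinct))   # 11 6
print(distinct)   # ['jdbc', 'neo4j', 'cypher', 'spring-data-neo4j', 'ruby-on-rails', 'neo4j.rb']
```

Even on four rows the MERGE count drops from 11 to 6; on a full export, where a tag like `neo4j` appears on nearly every row, the saving is far larger.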
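Slide 30 describes USING PERIODIC COMMIT as committing every N rows (1000 by default) so that transaction state stays bounded. The loop below is a hypothetical Python model of that batching, not Neo4j internals: `apply` stands in for running the query against one row, and `commit` for committing the open transaction.

```python
from typing import Callable, Iterable, List

def periodic_commit(rows: Iterable[dict],
                    apply: Callable[[dict], None],
                    commit: Callable[[int], None],
                    batch_size: int = 1000) -> int:
    """Apply each row, committing after every batch_size rows; returns the commit count."""
    pending = 0
    commits = 0
    for row in rows:
        apply(row)
        pending += 1
        if pending == batch_size:   # at most batch_size rows of uncommitted state
            commit(pending)
            commits += 1
            pending = 0
    if pending:                     # final partial batch
        commit(pending)
        commits += 1
    return commits

batch_sizes: List[int] = []
n = periodic_commit(({"id": i} for i in range(2500)),
                    apply=lambda row: None,
                    commit=batch_sizes.append)
print(n, batch_sizes)   # 3 [1000, 1000, 500]
```

In Cypher the batch size goes in the clause itself, e.g. `USING PERIODIC COMMIT 500 LOAD CSV ...`.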
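Slide 38 converts the Posts XML dump to CSV with a Java program that is not shown. The sketch below is a swapped-in Python equivalent, not the authors' code. It streams `<row>` elements with `xml.etree.ElementTree.iterparse`, so the dump never has to fit in memory, and writes columns matching the slide 39 header. The mapping from XML attributes to columns is an assumption inferred from slides 37, 39 and 40, and the function name is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Output column -> XML attribute; inferred from slides 37, 39 and 40 (an assumption).
COLUMNS = [
    ("postId:ID(Post)", "Id"), ("title", "Title"), ("postType:INT", "PostTypeId"),
    ("createdAt", "CreationDate"), ("score:INT", "Score"), ("views:INT", "ViewCount"),
    ("answers:INT", "AnswerCount"), ("comments:INT", "CommentCount"),
    ("favorites:INT", "FavoriteCount"), ("updatedAt", "LastEditDate"),
]

def posts_xml_to_csv(xml_source, data_out):
    """Stream <row> elements and write one fully quoted CSV line per post (no header)."""
    writer = csv.writer(data_out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    count = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "row":
            writer.writerow([elem.get(attr, "") for _, attr in COLUMNS])
            count += 1
            elem.clear()   # drop the element's attributes and children once written
    return count
```

The header line is deliberately not written here, since slide 44 recommends keeping headers separate from data.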
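Slides 35 and 39 to 41 pass `neo4j-import` a separate header file followed by a gzipped data file for each node and relationship group, as slide 44 recommends. A minimal Python sketch of producing such a pair follows. The helper name and the ids in the demo are hypothetical; the header uses the `:START_ID(Post)` / `:END_ID(Post)` convention shown on slide 41.

```python
import csv
import gzip
import os
import tempfile

def write_import_pair(directory, name, header, rows):
    """Write <name>_header.csv (header only) and <name>.csv.gz (data only)."""
    os.makedirs(directory, exist_ok=True)
    header_path = os.path.join(directory, f"{name}_header.csv")
    data_path = os.path.join(directory, f"{name}.csv.gz")
    with open(header_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(header)
    with gzip.open(data_path, "wt", newline="", encoding="utf-8") as f:
        csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)
    return header_path, data_path

# Demo with made-up post ids in a throwaway directory.
out_dir = tempfile.mkdtemp()
header_file, data_file = write_import_pair(
    out_dir, "PostsRels", [":START_ID(Post)", ":END_ID(Post)"], [["1", "2"], ["1", "3"]])
```

A pair written to `extracted/` this way is what slide 35 passes as `--relationships:PARENT_OF extracted/PostsRels_header.csv,extracted/PostsRels.csv.gz`.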
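The import summary on slide 42 can be turned into rough throughput figures. The arithmetic below uses only the numbers printed there; the rates are averages over the whole run and exclude the time spent generating the CSV files.

```python
# Figures from the IMPORT DONE line on slide 42.
seconds = 3 * 60 + 10 + 0.661          # 3m 10s 661ms = 190.661 s
nodes, rels, props = 31_138_574, 77_930_024, 218_106_346

print(f"{nodes / seconds:,.0f} nodes/s")          # 163,319 nodes/s
print(f"{rels / seconds:,.0f} relationships/s")   # 408,736 relationships/s
print(f"{props / seconds:,.0f} properties/s")     # 1,143,948 properties/s
```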
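Part of the checklist on slide 43 can be automated before running `neo4j-import`. The sketch below is one possible cleaning pass, not the authors' tooling, and its function names are hypothetical. It checks that each row has as many fields as the header, decodes HTML entities (assuming fields are HTML-encoded, as the `Gr&#233;goire` in slide 9's output suggests), flattens line breaks inside fields, drops control characters, and lets Python's `csv` writer quote every field and double any stray quotes. Treating Unicode category `Cc` as "non-text" is also an assumption.

```python
import csv
import html
import io
import unicodedata

def clean_field(value: str) -> str:
    """Decode HTML entities, flatten line breaks, drop control characters (tabs included)."""
    value = html.unescape(value)                          # "Gr&#233;goire" -> "Grégoire"
    value = value.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")
    return "".join(ch for ch in value if unicodedata.category(ch) != "Cc")

def write_clean_rows(rows, out, width):
    """Quote every field; the csv module escapes embedded quotes by doubling them."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for n, row in enumerate(rows, start=1):
        if len(row) != width:                             # headers consistent with data
            raise ValueError(f"row {n} has {len(row)} fields, header has {width}")
        writer.writerow([clean_field(str(field)) for field in row])

buf = io.StringIO()
write_clean_rows([["33018084", "Gr&#233;goire Colbert",
                   'Problems with "quoted"\r\ntext\x07']], buf, width=3)
print(buf.getvalue(), end="")   # "33018084","Grégoire Colbert","Problems with ""quoted"" text"
```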