Presented on November 22, 2014 @ SQLSaturday #350 in Winnipeg, MB Canada
Graph databases are used to represent graph structures with nodes, edges and properties. Neo4j, an open-source graph database, is reliable and fast for managing and querying highly connected data. This session explores how to install and configure Neo4j, create nodes and relationships, query with the Cypher Query Language, import data, and use Neo4j in concert with SQL Server, providing answers and insight with visual diagrams about the connected data that you have in your SQL Server databases!
Session Level: Intermediate
2. Who am I?
My name is Stéphane Fréchette
SQL Server MVP | Consultant | Speaker | Database & BI Architect | NoSQL.
Drums, good food and fine wine. Founder @ukubu, @GatineauOuverte,
@TEDxGatineau
I have a passion for architecting, designing and building solutions that
matter.
Twitter: @sfrechette
Blog: stephanefrechette.com
Email: stephanefrechette@ukubu.com
2 | 11/25/2014 | SQLSaturday Winnipeg #350
4. Session Outline
What is a Graph?
What is Neo4j?
Data Modeling – The Property Graph
Cypher Query Language
Importing Data…
Use Cases
Demos
Resources
5. What is a Graph?
10. What is Neo4j?
An open-source graph database by Neo Technology. Neo4j stores data in nodes connected by directed, typed relationships with properties on both, also known as a Property Graph
Fully ACID compliant
Massively scalable, up to several billion nodes/relationships/properties
Highly available when distributed across multiple machines
Accessible through a convenient REST interface or an object-oriented Java API
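To make the property graph idea concrete, here is a minimal Python sketch of nodes and directed, typed relationships that both carry properties. It is only an illustration of the model, not how Neo4j stores data; the class names, the ORGANIZES type and the role property are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                      # e.g. {"Member"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                      # relationships are directed: start -> end
    end: Node
    type: str                        # every relationship has exactly one type
    properties: dict = field(default_factory=dict)


# Hypothetical data: the relationship type and its property are invented here.
stephane = Node({"Member"}, {"name": "Stephane"})
group = Node({"Meetup"}, {"name": "Ottawa SQL Server User Group"})
organizes = Relationship(stephane, group, "ORGANIZES", {"role": "organizer"})

print(f"({organizes.start.properties['name']})"
      f"-[:{organizes.type}]->"
      f"({organizes.end.properties['name']})")
```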
11. Data Modeling
From SQL Server to Graph
Property Graph
12. Example: Meetup Data In SQL Server
Tables: Member, MeetupOrganizer, MeetupMember, Meetup

Member
ID  Member
1   Daniel
2   Stephane
3   John
4   Randy

Meetup
ID  Name
1   Ottawa SQL Server User Group
2   Ottawa JavaScript
3   Ottawa Visio User Group
4   Ottawa Tableau User Group
5   Dirty Dancing Ottawa

MeetupOrganizer
MemberID  MeetupID
2         1
1         2
3         3
2         4
3         5

MeetupMember
MemberID  MeetupID
3         1
3         2
4         2
4         4
1         5
13. Example: Meetup Data In a Graph
Node types: Member, Meetup
Member nodes
name: ‘Stephane’
name: ‘John’
name: ‘Randy’
name: ‘Daniel’
Meetup nodes
name: ‘Ottawa Tableau User Group’
name: ‘Ottawa SQL Server User Group’
name: ‘Ottawa JavaScript’
name: ‘Ottawa Visio User Group’
name: ‘Dirty Dancing Ottawa’
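The move from tables to a graph can be sketched in a few lines of Python: each entity row becomes a node, and each row of a join table becomes a relationship. This sketch assumes the first MemberID/MeetupID listing on the SQL Server slide is MeetupOrganizer and the second is MeetupMember; the relationship type names ORGANIZES and MEMBER_OF are invented for illustration.

```python
# Rows from the SQL Server example (table assignment is an assumption).
member = [(1, "Daniel"), (2, "Stephane"), (3, "John"), (4, "Randy")]
meetup = [
    (1, "Ottawa SQL Server User Group"),
    (2, "Ottawa JavaScript"),
    (3, "Ottawa Visio User Group"),
    (4, "Ottawa Tableau User Group"),
    (5, "Dirty Dancing Ottawa"),
]
meetup_organizer = [(2, 1), (1, 2), (3, 3), (2, 4), (3, 5)]
meetup_member = [(3, 1), (3, 2), (4, 2), (4, 4), (1, 5)]

# Each entity row becomes a node, keyed by (label, id).
nodes = {("Member", i): {"name": n} for i, n in member}
nodes.update({("Meetup", i): {"name": n} for i, n in meetup})

# Each join-table row becomes a directed, typed relationship.
relationships = [
    (("Member", m), "ORGANIZES", ("Meetup", g)) for m, g in meetup_organizer
]
relationships += [
    (("Member", m), "MEMBER_OF", ("Meetup", g)) for m, g in meetup_member
]

for start, rel_type, end in relationships:
    print(f"({nodes[start]['name']})-[:{rel_type}]->({nodes[end]['name']})")
```

The join tables disappear as tables; the foreign key pairs they held become the connections themselves.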
14. Cypher Query Language
Cypher is a declarative graph query language that allows for expressive and efficient querying and updating of the graph store
Pattern matching
Declarative: what to retrieve, not how to retrieve it
Inspired by other well-known languages (SQL, SPARQL, Haskell, Python)
Aggregation, Ordering, Limit
Update the Graph
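As a rough illustration of "what, not how", the sketch below puts a Cypher pattern next to a hand-written Python traversal that answers the same question over the Meetup example data. The Cypher text is shown as a string and is not executed here. The labels and the relationship types ORGANIZES and MEMBER_OF are assumptions, as is the reading of which join table holds organizers.

```python
# Declarative form: describe the pattern and let the database find matches.
QUERY = """
MATCH (o:Member {name: 'Stephane'})-[:ORGANIZES]->(m:Meetup)<-[:MEMBER_OF]-(p:Member)
RETURN m.name AS meetup, p.name AS member
ORDER BY meetup, member
"""

# Imperative form: the same question, with every step spelled out by hand.
ORGANIZES = [
    ("Stephane", "Ottawa SQL Server User Group"),
    ("Daniel", "Ottawa JavaScript"),
    ("John", "Ottawa Visio User Group"),
    ("Stephane", "Ottawa Tableau User Group"),
    ("John", "Dirty Dancing Ottawa"),
]
MEMBER_OF = [
    ("John", "Ottawa SQL Server User Group"),
    ("John", "Ottawa JavaScript"),
    ("Randy", "Ottawa JavaScript"),
    ("Randy", "Ottawa Tableau User Group"),
    ("Daniel", "Dirty Dancing Ottawa"),
]


def members_of_meetups_organized_by(organizer):
    """Return sorted (meetup, member) pairs for meetups run by `organizer`."""
    matches = [
        (meetup, member)
        for org, meetup in ORGANIZES if org == organizer
        for member, joined in MEMBER_OF if joined == meetup
    ]
    return sorted(matches)


result = members_of_meetups_organized_by("Stephane")
print(result)
```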
15. Cypher and T-SQL
Cypher also has a number of keywords with direct equivalents in SQL, which makes it a curiously familiar language
WHERE
ORDER BY
LIMIT
SUM, COUNT, STDEVP, MIN, MAX etc…
LTRIM, UPPER, LOWER, REPLACE, LEFT, RIGHT, SUBSTRING
DISTINCT
CASE
(SQL Server Pros) -[:WILL_LOVE]-> (Cypher)
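To show how close the two languages can look, the following sketch runs a small aggregate query in SQL and places a Cypher counterpart beside it. Python's built-in sqlite3 stands in for SQL Server so the example can run anywhere, which means the SQL is a portable subset rather than T-SQL proper. The Cypher string is not executed, and its MEMBER_OF relationship type is an assumption.

```python
import sqlite3

# SQL: count members per meetup, filtered and ordered.
SQL = """
SELECT m.Name AS meetup_name, COUNT(*) AS members
FROM Meetup AS m
JOIN MeetupMember AS mm ON mm.MeetupID = m.ID
WHERE m.Name LIKE 'Ottawa%'
GROUP BY m.Name
ORDER BY members DESC, meetup_name
"""

# Cypher counterpart (shown for comparison only): WHERE, ORDER BY and the
# aggregate read much the same, while the JOIN becomes a pattern and the
# GROUP BY is implied by the non-aggregated return column.
CYPHER = """
MATCH (p:Member)-[:MEMBER_OF]->(m:Meetup)
WHERE m.name =~ 'Ottawa.*'
RETURN m.name AS meetup_name, count(p) AS members
ORDER BY members DESC, meetup_name
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Meetup (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE MeetupMember (MemberID INTEGER, MeetupID INTEGER);
""")
conn.executemany("INSERT INTO Meetup VALUES (?, ?)", [
    (1, "Ottawa SQL Server User Group"),
    (2, "Ottawa JavaScript"),
    (3, "Ottawa Visio User Group"),
    (4, "Ottawa Tableau User Group"),
    (5, "Dirty Dancing Ottawa"),
])
conn.executemany("INSERT INTO MeetupMember VALUES (?, ?)",
                 [(3, 1), (3, 2), (4, 2), (4, 4), (1, 5)])

rows = conn.execute(SQL).fetchall()
print(rows)
```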
20. Importing Data…
Some important considerations…
Different import scenarios
Dataset size: 1000s, 100000s, 10000000s
Dataset format (source): Database, File (CSV, Spreadsheet, GraphML, Geoff), Service, Other
Import type: Initial Bulk Load, Incremental Load, Initial Bulk Load + Incremental Load
Different import tools
Spreadsheet based
Neo4j-shell based: (Cypher, neo4j-shell-tools, Cypher LOAD CSV)
Command-line based: Batch Importer
Neo4j Browser based
ETL Tools: (Talend, Mulesoft, Pentaho Kettle)
Custom software: (Java API, REST API, Spring Data Neo4j)
21. Many different mappings
Import Scenarios
Import Tools
Not always clear what you should be using
Depends on your skillsets, dataset size… (lots of other stuff)
Choose wisely!
24. Importing using Spreadsheets
Very small size datasets < 1000, easy to use
Format data in spreadsheet
Generate Cypher statements with formulas
Copy and execute Cypher in Neo4j browser
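The "generate Cypher statements with formulas" step amounts to string concatenation over each row. A minimal Python sketch of the same idea follows, using the Member rows from the earlier example. The Member label, the property names and the cypher_string helper are all assumptions made for illustration.

```python
def cypher_string(value):
    """Quote a Python string as a single-quoted Cypher string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


# One tuple per spreadsheet row: (ID, Member).
rows = [(1, "Daniel"), (2, "Stephane"), (3, "John"), (4, "Randy")]

# Equivalent in spirit to a spreadsheet formula that concatenates cell values
# into one CREATE statement per row.
statements = [
    f"CREATE (:Member {{id: {member_id}, name: {cypher_string(name)}}});"
    for member_id, name in rows
]

for statement in statements:
    print(statement)
```

The escaping helper matters as soon as a name contains an apostrophe, which a naive spreadsheet formula would turn into a broken statement.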
26. Importing using neo4j-shell-tools
Small to medium size datasets
https://github.com/jexp/neo4j-shell-tools
Format data in CSV files
Create import-cypher commands for neo4j-shell-tools
Execute commands from neo4j-shell
28. Importing using LOAD CSV
Native Cypher
Format data in CSV files
Create “LOAD CSV” commands
Execute command from neo4j-shell or browser
Additional “cleanup” for Labels and RelTypes
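The first two steps can be sketched as follows: write the rows to a CSV file with a header line, then build a LOAD CSV statement that points at it. The file name, the Member label and the property names are assumptions, and the statement is only printed, not sent to a server. Note that the label is fixed in the statement text rather than read from a CSV column, so a file that mixes several labels or relationship types would likely need one statement per kind.

```python
import csv
import tempfile
from pathlib import Path

rows = [(1, "Daniel"), (2, "Stephane"), (3, "John"), (4, "Randy")]

# Step 1: format the data as a CSV file with a header row.
csv_path = Path(tempfile.mkdtemp()) / "members.csv"
with csv_path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name"])
    writer.writerows(rows)

# Step 2: build the LOAD CSV statement. Values read from a CSV file arrive
# as strings, so row.id is stored as text unless it is converted.
load_csv = (
    f"LOAD CSV WITH HEADERS FROM '{csv_path.as_uri()}' AS row\n"
    "MERGE (:Member {id: row.id, name: row.name});"
)

print(load_csv)
```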
30. Importing using Batch Importer
Non-transactional import, suited for very large datasets
Format data in TSV files
Execute Batch Import command
Copy store files to Neo4j Server directory
Start Neo4j Server with generated store files
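The "format data in TSV files" step can be sketched with Python's csv module and a tab delimiter, producing one file for nodes and one for relationships. The column layout below (key, label, name and start, end, type) is purely illustrative; the header conventions a given batch importer expects are defined by that tool's documentation and are not reproduced here.

```python
import csv
import tempfile
from pathlib import Path


def write_tsv(path, header, rows):
    """Write a header line and data rows as tab-separated values."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)


out_dir = Path(tempfile.mkdtemp())
nodes_path = out_dir / "nodes.tsv"
rels_path = out_dir / "rels.tsv"

# Hypothetical layout: a unique key per node, since Member and Meetup IDs overlap.
write_tsv(nodes_path, ["key", "label", "name"], [
    ("Member:2", "Member", "Stephane"),
    ("Member:3", "Member", "John"),
    ("Meetup:1", "Meetup", "Ottawa SQL Server User Group"),
])
write_tsv(rels_path, ["start", "end", "type"], [
    ("Member:2", "Meetup:1", "ORGANIZES"),
    ("Member:3", "Meetup:1", "MEMBER_OF"),
])

print(nodes_path.read_text(encoding="utf-8"))
print(rels_path.read_text(encoding="utf-8"))
```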
31. Use Cases
Principal uses of graph databases include:
Network and Data Center Management
(Queries: Impact Analysis, Root Cause Analysis, Quality-of-Service Mapping, Asset Management)
Authorization and Access
(Queries: Access Management, Interconnected Group Organization, Provenance)
Social
(Queries: Friend Recommendations, Sharing & Collaboration, Influencer Analysis)
Geo
(Queries: Routing, Logistics, Capacity Planning)
Recommendations
(Queries: Product, Social, Service, and Professional Recommendations)
Fraud Detection
http://www.neotechnology.com/neo4j-use-cases/