Social Graph Data Analysis
2.
• @kimuras
• G(2007 )
11 8 5
4. Agenda
• Introduction
• The past work
• Introduction to GraphDB
• Introduction to Neo4j
• Introduction to analysis sample
7. mixi
[Figure: number of member IDs by year, 2007 to 2011; y-axis "# of member id" from 0 to 30,000,000]
30. Relational Databases
Dump & Denormalization

Relationships:
from_id | to_id
1       | 2
1       | 3
2       | 3

Profiles:
id | name   | age
1  | Kimura | 18
2  | Kato   | 45
3  | Ito    | 21
31. Relational Databases
Dump & Denormalization

Relationships:
from_id | to_id
1       | 2
1       | 3
2       | 3

Profiles:
id | name   | age
1  | Kimura | 18
2  | Kato   | 45
3  | Ito    | 21

Key-value store:
Key    | value
From:1 | 2,3
From:2 | 3
Prof:1 | Kimura,18
Prof:2 | Kato,45
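
The dump-and-denormalize step in the table above can be sketched in plain Java. This is only an illustration, not the code used in the talk: the class name `Denormalizer` and its methods are hypothetical, and the `From:<id>` and `Prof:<id>` key layout is copied from the slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class Denormalizer {
    /** Groups (from_id, to_id) rows into "From:<id>" -> "to,to,..." entries. */
    static Map<String, String> adjacencyEntries(int[][] edges) {
        Map<String, StringJoiner> grouped = new LinkedHashMap<>();
        for (int[] edge : edges) {
            grouped.computeIfAbsent("From:" + edge[0], key -> new StringJoiner(","))
                   .add(Integer.toString(edge[1]));
        }
        Map<String, String> entries = new LinkedHashMap<>();
        grouped.forEach((key, targets) -> entries.put(key, targets.toString()));
        return entries;
    }

    /** Flattens one profile row into a "Prof:<id>" -> "name,age" entry. */
    static Map.Entry<String, String> profileEntry(int id, String name, int age) {
        return Map.entry("Prof:" + id, name + "," + age);
    }

    public static void main(String[] args) {
        int[][] edges = {{1, 2}, {1, 3}, {2, 3}};
        System.out.println(adjacencyEntries(edges));
        System.out.println(profileEntry(1, "Kimura", 18));
    }
}
```

Every new query shape needs another precomputed key of this kind, which is one way to read the costs the following slides attach to this approach.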
37. Relational Databases
Dump & Denormalization
• reimplementation
• maintenance cost
• scalability

Relationships:
from_id | to_id
1       | 2
1       | 3
2       | 3

Profiles:
id | name   | age
1  | Kimura | 18
2  | kato   | 45
3  | ito    | 21

Key-value store:
Key    | value
From:1 | 2,3
From:2 | 3
Prof:1 | Kimuras,18
Prof:2 | Kato,45
42. What is a graph
Undirected graph
Vertex (node)
Edge
46. What is a graph
Directed graph
Vertex (node)
Edge
48. What is GraphDB
Vertex (node)
  ID: 1
  NAME: kimura
  PROP: Male
  AGE: 18
Edge
49. What is GraphDB
Vertex (node)
  ID: 1
  NAME: kimura
  PROP: Male
  AGE: 18
Vertex (node)
  ID: 2
  NAME: ITO
  PROP: Female
  AGE: 21
Edge
52. What is GraphDB
Vertex (node)
  ID: 1
  NAME: kimura
  PROP: Male
  AGE: 18
Vertex (node)
  ID: 2
  NAME: ITO
  PROP: Female
  AGE: 21
Edge
  ID: 3
  LABEL: Like
  Since: 2011/08/06
  OutGoing: 2
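
The property-graph model on this slide (vertices and edges that both carry an ID and key/value properties) might be sketched as below. This is a minimal in-memory illustration and not how Neo4j stores data; the class `PropertyGraph` and its members are hypothetical names, and a single shared ID counter is an assumption made only so the IDs come out as 1, 2, 3 like on the slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PropertyGraph {
    static final class Vertex {
        final long id;
        final Map<String, Object> properties = new LinkedHashMap<>();
        final List<Edge> outgoing = new ArrayList<>();

        Vertex(long id) {
            this.id = id;
        }
    }

    static final class Edge {
        final long id;
        final String label;
        final Vertex from;
        final Vertex to;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Edge(long id, String label, Vertex from, Vertex to) {
            this.id = id;
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    // Assumption: one counter for vertices and edges, to mirror the slide's IDs.
    private long nextId = 1;

    Vertex addVertex() {
        return new Vertex(nextId++);
    }

    /** Creates a directed, labeled edge and registers it on its source vertex. */
    Edge addEdge(Vertex from, Vertex to, String label) {
        Edge edge = new Edge(nextId++, label, from, to);
        from.outgoing.add(edge);
        return edge;
    }

    public static void main(String[] args) {
        PropertyGraph graph = new PropertyGraph();
        Vertex kimura = graph.addVertex();
        kimura.properties.put("NAME", "kimura");
        kimura.properties.put("AGE", 18);
        Vertex ito = graph.addVertex();
        ito.properties.put("NAME", "ITO");
        ito.properties.put("AGE", 21);
        Edge like = graph.addEdge(kimura, ito, "Like");
        like.properties.put("Since", "2011/08/06");
        System.out.println(like.from.id + " -[" + like.label + "]-> " + like.to.id);
    }
}
```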
57. GraphDB Neo4j
• True ACID transactions
• High availability
• Scales to billions of nodes and relationships
• High speed querying through traversals

           | Single instance (GPLv3) | Multiple instance (AGPLv3)
Embedded   | EmbeddedGraphDatabase   | HighlyAvailableGraphDatabase
Standalone | Neo4j Server            | Neo4j Server high availability mode

http://neo4j.org/
63. My other favorite Neo4j features
• RESTful APIs
• Query language (Cypher)
• Full indexing
– Lucene
• Built-in graph algorithms
– A*, Dijkstra
– High-speed traversal
• Gremlin support
– Works like a query language
http://www.tinkerpop.com/post/4633229547/tinkerpop-graph-stack
67. Introduction to a simple Neo4j use case
[Diagram: analysis systems placed in a grid of Single node / Multi node (columns) by Embedded / Server (rows)]
72. Introduction to simple embedded Neo4j
• Insert Vertices & make Relationships
• Single node & Embedded
• Traversal sample
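73. Insert vertices, make relationship
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public final class InputVertex {
    public static void main(final String[] args) {
        // Open (or create) an embedded database stored under /tmp/neo4j.
        GraphDatabaseService graphDb = new EmbeddedGraphDatabase("/tmp/neo4j");
        // Every write has to happen inside a transaction.
        Transaction tx = graphDb.beginTx();
        try {
            Node firstNode = graphDb.createNode();
            firstNode.setProperty("Name", "Kimura");
            Node secondNode = graphDb.createNode();
            secondNode.setProperty("Name", "Kato");
            // Directed relationship: Kimura -[LIKE]-> Kato.
            firstNode.createRelationshipTo(secondNode,
                    DynamicRelationshipType.withName("LIKE"));
            tx.success();  // mark the transaction as successful
        } finally {
            tx.finish();   // commit, or roll back if success() was not reached
        }
        graphDb.shutdown();
    }
}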
79. Insert vertices, make relationship
Graph produced by the InputVertex code:
Vertex  ID: 1  NAME: Kimura
Vertex  ID: 2  NAME: Kato
Edge    ID: 3  Relation: Like
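80. Batch Insert
• Not thread-safe, non-transactional
• But very fast!
// Package names as in the Neo4j 1.x releases; they may differ in other versions.
import java.util.HashMap;
import java.util.Map;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public final class Batch {
    public static void main(final String[] args) {
        // The inserter writes straight to the store files under /tmp/neo4j.
        BatchInserter inserter = new BatchInserterImpl("/tmp/neo4j",
                BatchInserterImpl.loadProperties("/tmp/neo4j.props"));
        Map<String, Object> prop = new HashMap<String, Object>();
        prop.put("Name", "Kimura");
        prop.put("Age", 21);
        long node1 = inserter.createNode(prop);
        // The same map is reused; both keys are overwritten for the second node.
        prop.put("Name", "Kato");
        prop.put("Age", 21);
        long node2 = inserter.createNode(prop);
        // Relationship without properties (null property map).
        inserter.createRelationship(node1, node2,
                DynamicRelationshipType.withName("LIKE"), null);
        // shutdown() must be called, otherwise the store is left inconsistent.
        inserter.shutdown();
    }
}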
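81. Traversal sample
import org.neo4j.graphdb.*;
import org.neo4j.graphdb.Traverser.Order;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public final class TraversalSample {
    public static void main(final String[] args) {
        // args[0] is the path of an existing database directory.
        GraphDatabaseService graphDB = new EmbeddedGraphDatabase(args[0]);
        Node node = graphDB.getNodeById(1);
        // Follow outgoing LIKE relationships depth-first to the end of the graph.
        Traverser friends = node.traverse(
                Order.DEPTH_FIRST,
                StopEvaluator.END_OF_GRAPH,
                ReturnableEvaluator.ALL_BUT_START_NODE,
                DynamicRelationshipType.withName("LIKE"),
                Direction.OUTGOING);
        for (Node nodeBuf : friends) {
            TraversalPosition currentPosition = friends.currentPosition();
            System.out.println(nodeBuf.getId() + " at depth " + currentPosition.depth());
        }
        graphDB.shutdown();
    }
}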
86. Traversal sample
public static void main(final String[] args) {
    GraphDatabaseService graphDB = new EmbeddedGraphDatabase(args[0]);
    Node node = graphDB.getNodeById(1);
    Traverser friends = node.traverse(
            // Traversal order: DEPTH_FIRST or BREADTH_FIRST
            Order.DEPTH_FIRST,
            // Where to stop: END_OF_GRAPH or DEPTH_ONE
            StopEvaluator.END_OF_GRAPH,
            // Which nodes to return: ALL_BUT_START_NODE, ALL,
            // or a custom evaluator implementing isReturnableNode()
            ReturnableEvaluator.ALL_BUT_START_NODE,
            // Relationship type to follow
            DynamicRelationshipType.withName("LIKE"),
            // Relationship direction: OUTGOING, INCOMING, or BOTH
            Direction.OUTGOING);
    for (Node nodeBuf : friends) {
        TraversalPosition currentPosition = friends.currentPosition();
    }
}
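
To make the traversal parameters above more concrete, here is a small plain-Java sketch of what they control. It is not Neo4j's implementation: `TraversalSketch` and `traverse` are hypothetical names, the graph is just a map of outgoing adjacency lists, the `breadthFirst` flag stands in for `Order`, `maxDepth` stands in for the `StopEvaluator` (1 roughly corresponds to `DEPTH_ONE`, `Integer.MAX_VALUE` to `END_OF_GRAPH`), and skipping the start node mimics `ALL_BUT_START_NODE`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TraversalSketch {
    /** Returns every node reachable from start within maxDepth hops, each once. */
    static List<Integer> traverse(Map<Integer, List<Integer>> outgoing, int start,
                                  boolean breadthFirst, int maxDepth) {
        List<Integer> visited = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Deque<int[]> frontier = new ArrayDeque<>(); // entries are {node, depth}
        frontier.addLast(new int[] {start, 0});
        while (!frontier.isEmpty()) {
            // A queue gives breadth-first order, a stack gives depth-first order.
            int[] current = breadthFirst ? frontier.pollFirst() : frontier.pollLast();
            int node = current[0];
            int depth = current[1];
            if (!seen.add(node)) {
                continue; // already visited through another path
            }
            if (node != start) {
                visited.add(node); // the start node itself is not returned
            }
            if (depth >= maxDepth) {
                continue; // stop condition reached: do not expand further
            }
            List<Integer> neighbors = outgoing.getOrDefault(node, List.of());
            for (int i = 0; i < neighbors.size(); i++) {
                // Depth-first pushes in reverse so the first neighbor is popped first.
                int next = neighbors.get(breadthFirst ? i : neighbors.size() - 1 - i);
                if (!seen.contains(next)) {
                    frontier.addLast(new int[] {next, depth + 1});
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> likes = Map.of(1, List.of(2, 3), 2, List.of(4));
        System.out.println(traverse(likes, 1, true, Integer.MAX_VALUE));
        System.out.println(traverse(likes, 1, false, Integer.MAX_VALUE));
        System.out.println(traverse(likes, 1, true, 1));
    }
}
```

Only outgoing edges are followed here; an `INCOMING` or `BOTH` variant would need the reverse adjacency lists as well.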
102. Experiment
• mixi Neo4j
• Machine: 24-core CPU, 65 GB memory
• Neo4j: BatchInsert, community, embedded
• Data
• 1.5 60
513m17sec (about 8.6h)
103. Network Dataset
• Stanford Large Network Dataset Collection
• SNAP has a wide variety of graph data!
– Social networks
– Communication networks
– Citation networks
– Collaboration networks
– Web graphs
– Product co-purchasing networks
– Internet peer-to-peer networks
– Road networks
– Autonomous systems graphs
– Signed networks
– Wikipedia networks and metadata
– Memetracker and Twitter
http://snap.stanford.edu/data/index.html
105. Architecture
[Diagram with four components: Service, Database (Social Graph), Analyses, Visualization]
119. •
• = Vertex ( )
124. •
• = Vertex ( )
[Figure: example graph with a number drawn on each vertex: 2, 1, 1, 4, 2, 1, 2]
125. mixi
• 1000
• summary
Min. | 1st Qu. | Median | Mean  | 3rd Qu. | Max.
1.00 | 3.00    | 10.00  | 25.69 | 30.00   | 903.00
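
If the per-vertex numbers in the preceding figure are read as vertex degrees (an assumption, since the slide title did not survive extraction), a six-number summary like the one above could be computed roughly as follows. `DegreeSummary` and its methods are hypothetical names, the edge list in `main` is made-up sample data, and the quartiles use linear interpolation between order statistics, which is the default method of R's `quantile`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class DegreeSummary {
    /** Counts, for every vertex, the edges attached to it in an undirected edge list. */
    static Map<Integer, Integer> degrees(int[][] edges) {
        Map<Integer, Integer> degree = new HashMap<>();
        for (int[] edge : edges) {
            degree.merge(edge[0], 1, Integer::sum);
            degree.merge(edge[1], 1, Integer::sum);
        }
        return degree;
    }

    /** Quantile of an ascending array by linear interpolation (R's type 7). */
    static double quantile(double[] sorted, double p) {
        double h = (sorted.length - 1) * p;
        int lower = (int) Math.floor(h);
        int upper = Math.min(lower + 1, sorted.length - 1);
        return sorted[lower] + (h - lower) * (sorted[upper] - sorted[lower]);
    }

    /** Returns {min, 1st quartile, median, mean, 3rd quartile, max}; needs at least one vertex. */
    static double[] summary(Map<Integer, Integer> degree) {
        double[] values = degree.values().stream()
                .mapToDouble(Integer::doubleValue).sorted().toArray();
        double mean = Arrays.stream(values).average().orElse(Double.NaN);
        return new double[] {
            values[0], quantile(values, 0.25), quantile(values, 0.5),
            mean, quantile(values, 0.75), values[values.length - 1]
        };
    }

    public static void main(String[] args) {
        // Hypothetical sample graph, not data from the talk.
        int[][] edges = {{1, 2}, {1, 3}, {1, 4}, {2, 3}};
        System.out.println(Arrays.toString(summary(degrees(edges))));
    }
}
```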
131. •
• ≒
=0/3=0
=1/3
=2/3
=3/3=1
132. • 1000
• summary
Min. | 1st Qu. | Median | Mean   | 3rd Qu. | Max.
0.00 | 0.00    | 0.1157 | 0.2071 | 0.2667  | 1.000
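
The ratios 0/3, 1/3, 2/3 and 3/3 on the preceding slide match the local clustering coefficient of a vertex with three neighbors: the number of edges that exist among the neighbors divided by the number of possible neighbor pairs. Assuming that is the metric summarized here (the slide title is lost), it could be computed as in this sketch; `ClusteringCoefficient` and its methods are hypothetical names, and the graph is treated as undirected with no self-loops.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ClusteringCoefficient {
    /** Builds symmetric adjacency sets from an undirected edge list. */
    static Map<Integer, Set<Integer>> undirected(int[][] edges) {
        Map<Integer, Set<Integer>> adjacency = new HashMap<>();
        for (int[] edge : edges) {
            adjacency.computeIfAbsent(edge[0], k -> new HashSet<>()).add(edge[1]);
            adjacency.computeIfAbsent(edge[1], k -> new HashSet<>()).add(edge[0]);
        }
        return adjacency;
    }

    /** Edges among the neighbors of vertex, divided by the k*(k-1)/2 possible pairs. */
    static double local(Map<Integer, Set<Integer>> adjacency, int vertex) {
        List<Integer> neighbors = new ArrayList<>(adjacency.getOrDefault(vertex, Set.of()));
        int k = neighbors.size();
        if (k < 2) {
            return 0.0; // fewer than two neighbors: no pair can be connected
        }
        int links = 0;
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                if (adjacency.get(neighbors.get(i)).contains(neighbors.get(j))) {
                    links++;
                }
            }
        }
        return links / (k * (k - 1) / 2.0);
    }

    public static void main(String[] args) {
        // Vertex 0 has neighbors 1, 2, 3; one of the three possible pairs is linked.
        int[][] edges = {{0, 1}, {0, 2}, {0, 3}, {1, 2}};
        System.out.println(local(undirected(edges), 0));
    }
}
```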
140. 2-hop Social Graph
• Edge
• Vertex
• Gephi
http://gephi.org/
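
One possible way to get a 2-hop social graph into Gephi is to export its edges as a CSV edge list with `Source` and `Target` columns, which Gephi's spreadsheet import understands. The sketch below only illustrates that idea and is not the pipeline used in the talk; `TwoHopExport`, its methods, and the sample adjacency lists are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TwoHopExport {
    /** Collects the edges start -> friend and friend -> friend-of-friend. */
    static List<int[]> twoHopEdges(Map<Integer, List<Integer>> outgoing, int start) {
        List<int[]> edges = new ArrayList<>();
        for (int friend : outgoing.getOrDefault(start, List.of())) {
            edges.add(new int[] {start, friend});
            for (int friendOfFriend : outgoing.getOrDefault(friend, List.of())) {
                edges.add(new int[] {friend, friendOfFriend});
            }
        }
        return edges;
    }

    /** Renders the edges as a CSV edge list with a Source,Target header. */
    static String toEdgeCsv(List<int[]> edges) {
        StringBuilder csv = new StringBuilder("Source,Target\n");
        for (int[] edge : edges) {
            csv.append(edge[0]).append(',').append(edge[1]).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data: vertex 5 is three hops from vertex 1 and is left out.
        Map<Integer, List<Integer>> outgoing =
                Map.of(1, List.of(2, 3), 2, List.of(4), 4, List.of(5));
        System.out.print(toEdgeCsv(twoHopEdges(outgoing, 1)));
    }
}
```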
142.
• Social Graph
• GraphDB
• Neo4j
• R
• Visualization