Traversing Graph Databases with Gremlin
1. Traversing Graph Databases with Gremlin
Marko A. Rodriguez
Graph Systems Architect
http://markorodriguez.com
NoSQL New York City Meetup – May 16, 2011
Gremlin G = (V, E)
May 10, 2011
6. Blueprints –> Pipes –> Groovy –> Gremlin
To understand Gremlin, it's important to understand Groovy,
Blueprints, and Pipes.
[Figure labels: Blueprints, Pipes]
7. Gremlin is a Domain Specific Language
• Gremlin 0.7+ uses Groovy as its host language. [2]
• Gremlin 0.7+ takes advantage of Groovy’s meta-programming, dynamic
typing, and closure properties.
• Groovy is a superset of Java and as such, natively exposes the full JDK
to Gremlin.
[2] Groovy is available at http://groovy.codehaus.org/.
8. Gremlin is for Property Graphs
[Figure: example property graph]
Vertices:
1: name = "marko", age = 29
2: name = "vadas", age = 27
3: name = "lop", lang = "java"
4: name = "josh", age = 32
5: name = "ripple", lang = "java"
6: name = "peter", age = 35
Edges:
7: 1 -knows-> 2, weight = 0.5
8: 1 -knows-> 4, weight = 1.0
9: 1 -created-> 3, weight = 0.4
10: 4 -created-> 5, weight = 1.0
11: 4 -created-> 3, weight = 0.4
12: 6 -created-> 3, weight = 0.2
A graph is composed of vertices (nodes), edges (links), and properties (keys/values).
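The property graph model can be sketched as a small data structure. The following is a minimal, self-contained Java illustration and not the Blueprints API; the names PropertyGraphSketch, addVertex, addEdge, and example are assumptions made for this sketch. The sample data (marko knows vadas, edge id 7, weight 0.5) follows the standard TinkerPop example graph that the figure is drawn from.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy in-memory property graph, not the Blueprints API.
class PropertyGraphSketch {
    // A vertex has an id and key/value properties.
    static final class Vertex {
        final String id;
        final Map<String, Object> properties = new LinkedHashMap<>();
        Vertex(String id) { this.id = id; }
    }

    // An edge is directed (out -> in), labeled, and also carries properties.
    static final class Edge {
        final String id;
        final Vertex out;
        final Vertex in;
        final String label;
        final Map<String, Object> properties = new LinkedHashMap<>();
        Edge(String id, Vertex out, Vertex in, String label) {
            this.id = id;
            this.out = out;
            this.in = in;
            this.label = label;
        }
    }

    final Map<String, Vertex> vertices = new LinkedHashMap<>();
    final List<Edge> edges = new ArrayList<>();

    Vertex addVertex(String id) {
        Vertex v = new Vertex(id);
        vertices.put(id, v);
        return v;
    }

    Edge addEdge(String id, Vertex out, Vertex in, String label) {
        Edge e = new Edge(id, out, in, label);
        edges.add(e);
        return e;
    }

    // Builds a two-vertex fragment of the example graph: marko (1) knows vadas (2).
    static PropertyGraphSketch example() {
        PropertyGraphSketch g = new PropertyGraphSketch();
        Vertex marko = g.addVertex("1");
        marko.properties.put("name", "marko");
        marko.properties.put("age", 29);
        Vertex vadas = g.addVertex("2");
        vadas.properties.put("name", "vadas");
        vadas.properties.put("age", 27);
        g.addEdge("7", marko, vadas, "knows").properties.put("weight", 0.5);
        return g;
    }

    public static void main(String[] args) {
        PropertyGraphSketch g = example();
        for (Edge e : g.edges) {
            System.out.println(e.out.properties.get("name") + " -" + e.label + "-> "
                    + e.in.properties.get("name") + " " + e.properties);
        }
    }
}
```

Vertices and edges both carry their own property maps, which is what distinguishes a property graph from a plain labeled graph.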
9. Gremlin is for Blueprints-enabled Graph Databases
• Blueprints can be seen as the JDBC for property graph databases. [3]
• Provides a collection of interfaces for graph database providers to implement.
• Provides tests to ensure the operational semantics of any implementation are correct.
• Provides numerous “helper utilities” to make working with graph databases easy.
[3] Blueprints is available at http://blueprints.tinkerpop.com.
12. Gremlin Compiles Down to Pipes
• Pipes is a data flow framework for evaluating lazy graph traversals. [4]
• A Pipe<S,E> extends Iterator<E> and Iterable<E>, and pipes can be chained
together to create processing pipelines.
• A Pipe can be embedded into another Pipe to allow for nested
processing.
[4] Pipes is available at http://pipes.tinkerpop.com.
13. A Pipes Detour - Chained Iterators
• This Pipeline takes objects of type A and turns them into objects of
type D.
[Figure: objects of type A enter the Pipeline and pass through Pipe1 (A → B), Pipe2 (B → C), and Pipe3 (C → D), emerging as objects of type D]
Pipe<A,D> pipeline =
new Pipeline<A,D>(Pipe1<A,B>, Pipe2<B,C>, Pipe3<C,D>)
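The idea of a pipeline as a chain of lazy iterators can be sketched with plain java.util.Iterator wrappers. This is a minimal illustration and not the Pipes API; the class ChainedIteratorsSketch, the helper stage, and the concrete types chosen for A, B, C, and D are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

// Illustrative only: chained lazy iterators in the spirit of a pipeline; not the Pipes API.
class ChainedIteratorsSketch {
    // Wraps an upstream iterator and applies fn to each element on demand.
    static <S, E> Iterator<E> stage(Iterator<S> starts, Function<S, E> fn) {
        return new Iterator<E>() {
            @Override public boolean hasNext() { return starts.hasNext(); }
            @Override public E next() { return fn.apply(starts.next()); }
        };
    }

    // A -> B -> C -> D with A = String, B = Integer, C = Integer, D = String.
    static Iterator<String> pipeline(Iterator<String> starts) {
        Iterator<Integer> b = stage(starts, String::length);   // "Pipe1": A -> B
        Iterator<Integer> c = stage(b, n -> n * n);            // "Pipe2": B -> C
        return stage(c, n -> "squared length = " + n);         // "Pipe3": C -> D
    }

    public static void main(String[] args) {
        Iterator<String> out = pipeline(Arrays.asList("marko", "josh").iterator());
        // No stage does any work until next() is called on the last stage.
        while (out.hasNext()) {
            System.out.println(out.next());
        }
    }
}
```

Because each stage only pulls from the stage before it when asked, an element flows through all three stages before the next element is read from the source.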
14. A Pipes Detour - Simple Example
“What are the names of the people that marko knows?”
[Figure: vertex A (name=marko) has knows edges to B (name=peter) and C (name=pavel); two created edges lead to D (name=gremlin)]
15. A Pipes Detour - Simple Example
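// Follow outgoing "knows" edges from each start vertex.
Pipe<Vertex,Edge> pipe1 = new OutEdgesPipe("knows");
// Move from each edge to the vertex at its head.
Pipe<Edge,Vertex> pipe2 = new InVertexPipe();
// Read the "name" property of each vertex.
Pipe<Vertex,String> pipe3 = new PropertyPipe<String>("name");
Pipe<Vertex,String> pipeline = new Pipeline<Vertex,String>(pipe1,pipe2,pipe3);
pipeline.setStarts(new SingleIterator<Vertex>(graph.getVertex("A")));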
[Figure: the same graph annotated with the three pipeline stages, OutEdgesPipe("knows"), InVertexPipe(), and PropertyPipe("name")]
HINT: The benefit of Gremlin is that this Java verbosity is reduced to g.v('A').outE('knows').inV.name.
17. A Pipes Detour - Creating Pipes
public class NumCharsPipe extends AbstractPipe<String,Integer> {
    public Integer processNextStart() {
        // Pull the next incoming String and emit its length.
        String word = this.starts.next();
        return word.length();
    }
}
When extending the base class AbstractPipe<S,E> all that is required is
an implementation of processNextStart().
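To show why a single processNextStart() method is enough, here is a self-contained sketch of how a base class can build hasNext() and next() on top of it. SimplePipe is a simplified stand-in written for this example and is not the library's AbstractPipe source; its field and method layout are assumptions.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Illustrative only: a simplified stand-in for a pipe base class, not the library source.
class AbstractPipeSketch {
    abstract static class SimplePipe<S, E> implements Iterator<E> {
        protected Iterator<S> starts;
        private E nextEnd;
        private boolean available = false;

        void setStarts(Iterator<S> starts) { this.starts = starts; }

        // Subclasses pull from starts and return one result, or let
        // NoSuchElementException propagate when starts is exhausted.
        protected abstract E processNextStart() throws NoSuchElementException;

        @Override public boolean hasNext() {
            if (available) return true;
            try {
                nextEnd = processNextStart();   // compute one element ahead
                available = true;
                return true;
            } catch (NoSuchElementException e) {
                return false;
            }
        }

        @Override public E next() {
            if (!hasNext()) throw new NoSuchElementException();
            available = false;
            return nextEnd;
        }
    }

    // Same logic as the slide's NumCharsPipe, written against the stand-in base class.
    static class NumCharsPipe extends SimplePipe<String, Integer> {
        @Override protected Integer processNextStart() {
            String word = this.starts.next();
            return word.length();
        }
    }

    public static void main(String[] args) {
        NumCharsPipe pipe = new NumCharsPipe();
        pipe.setStarts(Arrays.asList("gremlin", "pipes").iterator());
        while (pipe.hasNext()) {
            System.out.println(pipe.next());
        }
    }
}
```

The base class owns the iterator bookkeeping, so a concrete pipe only has to describe how one input becomes one output.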
20. The Many Ways of Using Gremlin
• Gremlin has a REPL to be run from the shell.
• Gremlin can be natively integrated into any Groovy class.
• Gremlin can be interacted with indirectly through Java, via Groovy.
• Gremlin has a JSR 223 ScriptEngine as well.
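For the JSR 223 route, the lookup goes through the standard javax.script API. The sketch below only shows that lookup; the engine name passed in main ("gremlin") is a hypothetical placeholder, since the name a given Gremlin release registers is not stated here, and the class and method names are assumptions made for this example.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Illustrative only: looks up a JSR 223 engine by name; the name used in main is an assumption.
class ScriptEngineSketch {
    // Returns the engine registered under the given name, or empty if none is on the classpath.
    static Optional<ScriptEngine> findEngine(String name) {
        return Optional.ofNullable(new ScriptEngineManager().getEngineByName(name));
    }

    public static void main(String[] args) throws ScriptException {
        // "gremlin" is a hypothetical engine name; check the release you are using.
        Optional<ScriptEngine> engine = findEngine("gremlin");
        if (engine.isPresent()) {
            System.out.println(engine.get().eval("1 + 2"));
        } else {
            System.out.println("No engine registered under that name.");
        }
    }
}
```

Without a Gremlin jar on the classpath the lookup simply comes back empty, so the sketch runs either way.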
21. Pipe = Step: 3 Generic Steps
In general a Pipe is a “step” in Gremlin. Gremlin is founded on a collection
of atomic steps. The syntax is generally seen as step.step.step. There
are 3 categories of steps.
• transform: map the input to some output. S → T
outE, inV, paths, copySplit, fairMerge, . . .
• filter: either output the input or not. S → (S ∪ ∅)
back, except, uniqueObject, andFilter, orFilter, . . .
• sideEffect: output the input and yield a side-effect. S → S
aggregate, groupCount, . . .
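The three categories have close analogues in Java streams, which can serve as a rough mental model. This sketch is an analogy only and does not use Gremlin or Pipes; the class StepCategoriesSketch and the method run are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: Java streams used as an analogy for the three step categories.
class StepCategoriesSketch {
    static List<Integer> run(List<String> names, Map<Integer, Integer> counts) {
        return names.stream()
                .map(String::length)                          // transform: S -> T
                .filter(n -> n > 3)                           // filter: emit the input or nothing
                .peek(n -> counts.merge(n, 1, Integer::sum))  // sideEffect: emit the input, update counts
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts = new HashMap<>();
        List<Integer> out = run(Arrays.asList("marko", "josh", "lop", "peter"), counts);
        System.out.println(out + " " + counts);
    }
}
```

The side-effect stage passes every element through unchanged while filling the counts map, which is loosely the role groupCount plays in a traversal.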
22. Abstract/Derived/Inferred Adjacency
[Figure: a chain of low-level steps over V (outE, in, outE, a closure on it.salary, loop(2){..}, back(3)) collapsed into the single derived step below]
• outE, inV, etc. are low-level graph speak (the domain is the graph).
• codeveloper is high-level domain speak (the domain is software development). [5]
[5] In this way, Gremlin can be seen as a DSL (domain-specific language) for creating DSLs for your
graph applications. Gremlin’s domain is “the graph.” Build languages for your domain on top of Gremlin
(e.g. “software development domain”).
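A derived step such as codeveloper can be pictured as a named composition of low-level steps. The sketch below does this in plain Java over a tiny edge list; it is not Gremlin's step-definition mechanism, and the names DerivedStepSketch, out, in, and codevelopers are assumptions made for this example. The edges are taken from the standard TinkerPop example graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a named domain-level step composed from low-level graph steps.
class DerivedStepSketch {
    // Each edge is {out vertex, label, in vertex}.
    static final List<String[]> EDGES = Arrays.asList(
            new String[] {"marko", "created", "lop"},
            new String[] {"josh", "created", "lop"},
            new String[] {"peter", "created", "lop"},
            new String[] {"josh", "created", "ripple"},
            new String[] {"marko", "knows", "josh"});

    // Low-level step: follow outgoing edges with the given label.
    static List<String> out(String vertex, String label) {
        List<String> result = new ArrayList<>();
        for (String[] e : EDGES) {
            if (e[0].equals(vertex) && e[1].equals(label)) result.add(e[2]);
        }
        return result;
    }

    // Low-level step: follow incoming edges with the given label.
    static List<String> in(String vertex, String label) {
        List<String> result = new ArrayList<>();
        for (String[] e : EDGES) {
            if (e[2].equals(vertex) && e[1].equals(label)) result.add(e[0]);
        }
        return result;
    }

    // High-level step: people who created something this person also created.
    static Set<String> codevelopers(String person) {
        Set<String> result = new LinkedHashSet<>();
        for (String project : out(person, "created")) {
            result.addAll(in(project, "created"));
        }
        result.remove(person);   // a person is not their own codeveloper
        return result;
    }

    public static void main(String[] args) {
        System.out.println(codevelopers("marko"));
    }
}
```

Callers work with the domain word (codevelopers) and never see the created edges, and the derived relation is computed at query time rather than stored in the graph.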
23. Developer's FOAF Import Graph
[Figure: derived graphs and the explicit graph. Labels: Developer's FOAF Import Graph, Friend-Of-A-Friend Graph, Developer Imports Graph, Friends at Work Graph, Software Imports Graph, Friendship Graph, Developer Created Graph, Employment Graph, Explicit Graph]
You need not make derived graphs explicit. You can, at runtime, compute them. Moreover, generate them locally, not globally (e.g. "Marko's friends from work relations").
This concept is related to automated reasoning and whether reasoned relations are inserted into the explicit graph or computed at query time.
24. Integrating the JDK (Java API)
• Groovy is the host language for Gremlin. Thus, the JDK is natively
available in a path expression.
gremlin> v.out('friend'){it.name.matches('M.*')}.name
==>Mattias
==>Marko
==>Matt
...
gremlin> x.out('rated').transform{JSONParser.get(it.uri).stars}.mean()
...
gremlin> y.outE('rated').filter{TextAnalysis.isElated(it.review)}.inV
...
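The matches call in the first expression is the ordinary java.lang.String.matches method from the JDK. A plain Java equivalent of that one filter, over a list of names instead of a graph, might look like the sketch below; the class JdkFilterSketch, the method namesMatching, and the sample name list are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: the JDK call used inside the Gremlin closure, shown in plain Java.
class JdkFilterSketch {
    // Keeps the names that match the regular expression in full.
    static List<String> namesMatching(List<String> names, String regex) {
        return names.stream()
                .filter(name -> name.matches(regex))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> friends = Arrays.asList("Mattias", "Peter", "Marko", "Matt");
        System.out.println(namesMatching(friends, "M.*"));
    }
}
```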
25. Time for a demo
marko$ gremlin
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin>
Gremlin G = (V, E)
http://gremlin.tinkerpop.com
http://groups.google.com/group/gremlin-users
http://tinkerpop.com
26. Graph Bootcamp
http://markorodriguez.com/services/workshops/graph-bootcamp/
• June 23-24th in Chicago, Illinois.
• August X-Yth in Denver, Colorado.
• Book a private gig for your organization.