Graph Analytics For Fun and Profit

Hello!
I am David Bechberger
Sr. Architect for Data and Analytics at Gene by
Gene, a bioinformatics company specializing
in genetic genealogy.
You can find me at:
@bechbd
www.linkedin.com/in/davebechberger

What we do at Gene by Gene
Swab, Sequence, Analysis, Insight

What this talk isn’t
◎A thorough review of graph analytic
techniques
◎A review of all graph analytic frameworks
◎A deep dive into any of the techniques we
discuss

What this talk is
◎Where to start with Graph Analytics
◎OLTP and OLAP in Gremlin
◎Practical Examples using …..

Family
Trees
◎We all have them
◎I know them well
◎They are natural
graphs

Or more specifically this
[Diagram: graph schema. Vertex labels: tree, individual, family, name. Edge labels: owns, member_of, is_known_as, is_spouse, is_first_cousin.]

Example - Find the names of all family members in a tree
[Diagram: example graph. Tree T1 is linked by an owns edge to an individual; individuals I1 to I4 are linked by is_known_as edges to the names Bob, Steve, Joan, and Rick, and by member_of edges (with roles Husband, Wife, Son) to families F1 and F2.]

Gremlin Example - Finding the names of all family members
for tree owner
g.V().has('tree', 'unique_id', 'T1')
  .out('owns')
  .sideEffect(
    out('is_known_as').properties('full_name')
      .store('name')
  )
  .out('member_of').in('member_of')
  .sideEffect(
    out('is_known_as').properties('full_name')
      .store('name')
  )
  .cap('name')
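To make the flow of that traversal easier to follow, here is a minimal plain-Python sketch of the same walk over a toy graph. The dictionaries and the vertex-to-name mapping are assumptions made for illustration only; they are not the talk's data, and this is not how Gremlin executes the query.

```python
# Toy edge data; which individual carries which name is assumed here.
owns = {"T1": ["I1"]}
member_of = {"I1": ["F1"], "I2": ["F1"], "I3": ["F1"]}
is_known_as = {"I1": "Bob", "I2": "Joan", "I3": "Steve"}


def family_names(tree_id):
    """Collect names the way the two sideEffect/store steps would."""
    names = []
    for owner in owns.get(tree_id, []):            # out('owns')
        names.append(is_known_as[owner])           # first store('name')
        for family in member_of.get(owner, []):    # out('member_of')
            # in('member_of'): every individual attached to that family
            for person, families in member_of.items():
                if family in families:
                    names.append(is_known_as[person])  # second store('name')
    return names


print(family_names("T1"))
```

Note that the owner's name is collected twice in this sketch, once per store step, because walking back in over member_of reaches the owner again.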

Apache TinkerPop Gremlin OLTP and OLAP
◎TinkerPop supports both
◎Gremlin can be used to query in either
◎But there are differences...

Gremlin OLTP versus OLAP
OLTP
◎ Depth First
◎ Lazy Evaluation - Low memory usage
◎ Real-time (ms/sub-sec)
OLAP
◎ Breadth First
◎ Eager evaluation - High memory usage
◎ Long Running (min/hour)
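The depth-first/lazy versus breadth-first/eager contrast can be sketched in plain Python. This is only an analogy under assumed names (`graph`, `dfs_lazy`, `bfs_eager`): real Gremlin OLAP runs on a graph computer, which this toy code does not model.

```python
# Hypothetical four-vertex graph used only for this illustration.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def dfs_lazy(start):
    """Yield vertices depth first, one at a time, only when asked for."""
    stack, seen = [start], set()
    while stack:
        vertex = stack.pop()
        if vertex in seen:
            continue
        seen.add(vertex)
        yield vertex
        stack.extend(reversed(graph[vertex]))


def bfs_eager(start):
    """Build every frontier in full before moving to the next level."""
    levels, frontier, seen = [], [start], {start}
    while frontier:
        levels.append(frontier)
        next_frontier = []
        for vertex in frontier:
            for neighbor in graph[vertex]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return levels


first = next(dfs_lazy("a"))   # only one vertex is ever materialized
levels = bfs_eager("a")       # the whole graph is visited up front
```

The generator holds one stack in memory and stops as soon as the caller stops asking, while the eager version materializes each whole frontier, which is roughly why the OLAP column lists higher memory use.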

Limitations
OLTP
◎ Cannot run certain queries or steps (e.g. pageRank, bulk loading)
◎ Limited time a query can run
◎ Local operations
OLAP
◎ Some steps are prohibitive like path(), simplePath(), etc.
◎ Barrier Steps (count(), min(), max(), etc.)
◎ Global Operations

What insights are we going to gain
◎Who in this tree is the most important?
◎Who in this tree is 6 degrees from Kevin
Bacon?
◎Who in this tree married their first cousin?

1.
Centrality Analysis
Finding Importance

Degree
Centrality
Count the edges

Example - Who is a member of the most families?
g.V().hasLabel('individual')
  .project('person', 'degree')
    .by('full_name')
    .by(bothE('member_of').count())
  .order().by(select('degree'), decr).limit(5)
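Degree centrality is simply an edge count per vertex, which a few lines of plain Python can show. The edge list and names below are hypothetical placeholders, not data from the talk.

```python
from collections import Counter

# Hypothetical (individual, family) pairs standing in for member_of edges.
member_of_edges = [
    ("Henry", "F1"), ("Henry", "F2"), ("Henry", "F3"),
    ("Anne", "F2"), ("Anne", "F4"),
    ("Jane", "F3"),
]


def top_by_degree(edges, limit=5):
    """Count member_of edges per individual and return the highest counts."""
    degree = Counter(person for person, _family in edges)
    return degree.most_common(limit)


print(top_by_degree(member_of_edges))
```

As in the Gremlin query, the result is a list of (person, degree) pairs sorted from highest to lowest degree.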

Eigenvector
Centrality
Relative importance matters
[Diagram: example graph with vertices labeled by eigenvector centrality scores ranging from .2 to .6]

Example - Who is the most important individual?
g.V().hasLabel('individual')
  .repeat(
    groupCount('m').by('full_name')
      .timeLimit(100)
  ).times(5).cap('m')
  .order(local).by(values, decr)
  .limit(local, 5).next()
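As a point of comparison, eigenvector centrality is commonly computed by power iteration: each vertex repeatedly takes the sum of its neighbors' scores, and the scores are then rescaled. The sketch below assumes a small undirected graph with made-up vertex names; it illustrates the general technique and does not reproduce the Gremlin query above.

```python
def eigenvector_centrality(adjacency, iterations=50):
    """Power iteration: a vertex's score is the sum of its neighbors' scores."""
    score = {vertex: 1.0 for vertex in adjacency}
    for _ in range(iterations):
        updated = {
            vertex: sum(score[neighbor] for neighbor in adjacency[vertex])
            for vertex in adjacency
        }
        largest = max(updated.values()) or 1.0  # avoid dividing by zero
        score = {vertex: value / largest for vertex, value in updated.items()}
    return score


# Hypothetical undirected graph: a triangle (a, b, c) plus a leaf d attached to a.
adjacency = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}
scores = eigenvector_centrality(adjacency)
```

Here `d` and `b` both touch `a`, but `b` scores higher because it is also linked to the well-connected `c`: a vertex's importance depends on the importance of its neighbors.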

PageRank
Similar to the Eigenvector
Centrality but with scaling
[Diagram: example graph with vertices labeled by PageRank scores ranging from 1 to 25]

Example - Whose lineage exerts the most influence over this
family tree?
g.withComputer().V().hasLabel('individual')
  .pageRank()
    .by(bothE('member_of')).by('rank')
  .order().by('rank', decr)
  .valueMap('full_name', 'rank').limit(5)
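For readers who want to see what the pageRank() step computes conceptually, here is a minimal textbook-style PageRank in plain Python. The damping factor of 0.85, the iteration count, and the toy graph are assumptions; TinkerPop's own implementation may normalize scores differently.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Iteratively share each vertex's rank across its outgoing links."""
    count = len(out_links)
    rank = {vertex: 1.0 / count for vertex in out_links}
    for _ in range(iterations):
        updated = {vertex: (1.0 - damping) / count for vertex in out_links}
        for vertex, targets in out_links.items():
            if targets:
                share = damping * rank[vertex] / len(targets)
                for target in targets:
                    updated[target] += share
            else:
                # A vertex with no outgoing links spreads its rank evenly.
                for target in updated:
                    updated[target] += damping * rank[vertex] / count
        rank = updated
    return rank


# Hypothetical directed graph: a, b and d all point at c, which points back at a.
out_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(out_links)
```

The scaling mentioned on the slide shows up as the division by `len(targets)`: a vertex splits its rank among its outgoing links instead of passing the full score along each one.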

Answer

Degree
Name                     Value
Henry VIII               7
Charlemagne              6
Jan                      5
Ferdinand VII            5
Philip II                5

EigenVector
Name                     Value
Mary                     149950
Margret                  124221
Henry VIII               107539
Son                      90715
Daughter                 86961

PageRank
Name                     Value
Joan of the Tower        0.784
Edward III               0.774
Elenor                   0.774
John of Eltham           0.719
Frederick William III    0.681

And many
more...
Closeness Centrality
Betweenness Centrality
Katz Centrality
Freeman Centrality …...

Practical Examples
◎Who is the most important person in my
family's history?
◎Who in my family history has been the most
prolific?

2.
Path Analysis
Who in this tree is 6 degrees from
Kevin Bacon?

Simple
Path
Don’t Repeat yourself

Cyclic
Path
Ok then Repeat yourself

Example - How long is the lineage between Queen Victoria
and Henry VIII?
SimplePath
g.V('@I1@').repeat(timeLimit(60000)
  .simplePath()).until(hasId('@I828@'))
  .path().limit(1).count(local)
CyclicPath
g.V('@I1@').repeat(timeLimit(60000)
  .cyclicPath()).until(hasId('@I828@'))
  .path().limit(1).count(local)

Answer
SimplePath: 25 steps
CyclicPath: 27 steps
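The "don't repeat yourself" rule behind simple paths can be sketched with a breadth-first search that refuses to revisit any vertex already on the current path. The graph and vertex names below are hypothetical, and this is a plain-Python illustration of the idea, not a translation of the Gremlin queries.

```python
from collections import deque


def shortest_simple_path(adjacency, start, goal):
    """Breadth-first search that never revisits a vertex on the current path."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in adjacency.get(path[-1], []):
            if neighbor not in path:  # the simple-path rule
                queue.append(path + [neighbor])
    return None  # no simple path reaches the goal


# Hypothetical undirected graph containing a cycle (a-b-d-c-a) and a tail (d-e).
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
route = shortest_simple_path(adjacency, "a", "e")
```

Because every path on the queue is free of repeats, the search terminates even on cyclic graphs, and the first path that reaches the goal has the fewest edges.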

Practical Examples
◎How am I related to X in my family?
◎Does this family tree contain clusters of
people?

3.
Pattern Detection
Finding what is hidden

Pattern Detection in Gremlin
◎Gremlin has the ability to be imperative
○ g.V().in().out()......
◎Or declarative
○ g.V().match(
    __.as('a').....as('b'), //predicate 1
    __.as('b').....as('c'), //predicate 2
    __.as('c').where('c', eq('b')).as('c')
  ).select('b', 'c')

Example - Who is married to their first cousin?
g.V().match(
    __.as('e').has('individual','sex','M').as('husband'),
    __.as('husband').in('is_spouse').as('wifes'),
    __.as('husband').both('is_first_cousin').as('cousin'),
    __.as('cousin').where('cousin',eq('wifes')).as('wife')
  ).select('husband','wife')
  .by('full_name').fold().unfold()

Answer
   Husband                    Wife
1  Albert Augustus Charles    Victoria /Hanover/
2  Leopold_I                  Margaret Teresa
3  Alexander_I the_Fierce     Sybil
4  Philip_IV                  Mariana of_Austria
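The declarative pattern amounts to intersecting two relationships: pairs joined by is_spouse that are also joined by is_first_cousin. A minimal plain-Python sketch of that idea follows; the edge sets and placeholder names are assumptions, and the (wife, husband) edge direction is inferred from the in('is_spouse') step in the query.

```python
# Hypothetical edges; the names are placeholders, not data from the talk.
# is_spouse is stored as (wife, husband), matching in('is_spouse') from the husband.
is_spouse = {("wife_1", "husband_1"), ("wife_2", "husband_2")}
# is_first_cousin is treated as undirected, matching both('is_first_cousin').
is_first_cousin = {
    frozenset(("husband_1", "wife_1")),
    frozenset(("husband_2", "someone_else")),
}


def cousin_marriages(spouse_edges, cousin_edges):
    """Return (husband, wife) pairs who are also first cousins."""
    return sorted(
        (husband, wife)
        for wife, husband in spouse_edges
        if frozenset((husband, wife)) in cousin_edges
    )


matches = cousin_marriages(is_spouse, is_first_cousin)
```

Only the first couple matches, because the second husband's first cousin is not his spouse.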

Practical Examples
◎Merging trees together based on potential
common ancestors using pattern matching

Example - Which women who married their first cousin had
the greatest number of families?
g.V().match(
    __.as('e').has('individual','sex','M').as('husband'),
    __.as('husband').in('is_spouse').as('wifes'),
    __.as('husband').both('is_first_cousin').as('cousin'),
    __.as('cousin').where('cousin',eq('wifes')).as('wife')
  ).select('wife')
  .project('person','degree')
    .by('full_name')
    .by(bothE('member_of').count())
  .order().by(select('degree'), decr).limit(5)

Answer
   Wife                  Degree
1  Victoria /Hanover/    2
2  Margaret Teresa       3
3  Sybil                 4
4  Mariana of_Austria    2

Thanks!
Any questions?
You can find me at:
dave@bechberger.com
@bechbd
www.linkedin.com/in/davebechberger
