Graph Gurus Episode 32: Using Graph Algorithms for Advanced Analytics Part 5
2. © 2020 TigerGraph. All Rights Reserved
Today's Presenter
Victor Lee
Head of Product Strategy & Developer Relations
● BS in Electrical Engineering and Computer Science from UC Berkeley, MS in Electrical Engineering from Stanford University
● PhD in Computer Science from Kent State University, focused on graph data mining
● 20+ years in tech industry
3. © 2020 TigerGraph. All Rights Reserved
Some Housekeeping Items
● Although your phone is muted, we do want to answer your questions - submit your questions at any time using the Q&A tab in the menu
● The webinar is being recorded and will be uploaded to our website shortly (https://www.tigergraph.com/webinars/) and the URL will be emailed to you
● If you have issues with Zoom, please contact the panelists via chat
4. © 2020 TigerGraph. All Rights Reserved
Move Faster with TigerGraph Cloud
Built for agile teams who would rather build innovative applications than procure hardware or configure and manage databases
● Start for free
● Move to production with distributed data and HA replication
5. © 2020 TigerGraph. All Rights Reserved
Today's Outline
1 Recap of Parts 1 to 4:
Path, Centrality, Community, and Similarity Algorithms
2 Introduction to machine learning
Use Cases for Classification
3 Training a Classifier: K-Nearest Neighbors
4 Demo
GSQL Queries for training & classifying
6. © 2020 TigerGraph. All Rights Reserved
Review: Analytics with Graph Algorithms
● Graph algorithms answer fundamental questions about connected data
● Each algorithm in a library is a tool in an analytics toolkit
● Building blocks for more complex business questions
Specialized functions
Combine to make something better
7. © 2020 TigerGraph. All Rights Reserved
Example Questions/Analyses for Graph Algorithms
Which entity is most centrally located?
● For delivery logistics or greatest visibility
● Closeness Centrality, Betweenness Centrality algorithms
How much influence does this entity exert over the others?
● For market penetration & buyer influence
● PageRank algorithm
Which entity has similar relationships to this entity?
● For grouping customers, products, etc.
● Cosine Similarity, Jaccard Similarity, SimRank, RoleSim algorithms
What are the natural community groupings in the graph?
● For partitioning risk groups, workgroups, product offerings, etc.
● Community Detection, MinCut algorithms
8. © 2020 TigerGraph. All Rights Reserved
Summary for Shortest Path Algorithms
Graph Gurus 26
1 Graph Algorithms - tools and building blocks for analyzing graph data
2 Learning To Use Algorithms - know what problem they solve, pros and cons
3 Shortest Path Algorithms - different algorithms for weighted and unweighted graphs
4 GSQL Algorithm Library - runs in-database, high-performance, easy to read and modify
9. © 2020 TigerGraph. All Rights Reserved
Summary for Centrality Algorithms
Graph Gurus 27
1 Centrality Algorithms - abstract concepts of location and travel.
2 Closeness and Betweenness - use shortest paths. Betweenness is more complex.
3 PageRank - uses directed referral edges to find the most influential nodes. Personalized PageRank is localized.
4 Customizing GSQL Library algorithms - easy and familiar, like procedural SQL
10. © 2020 TigerGraph. All Rights Reserved
Summary for Community Detection Algorithms
Graph Gurus 29
1 Community Detection Algorithms
Use connectedness to decide boundaries
2 Communities are Clusters, not Partitions
Don't have to include everyone.
Can overlap?
3 Strict vs. Lenient Community Rules
Black & white rules are not always helpful.
Louvain uses relative density.
4 Pre- or Post-step with other algorithms
Many algorithms assume you start from just one connected community
11. © 2020 TigerGraph. All Rights Reserved
Summary for Similarity Algorithms
Graph Gurus 30
1 Similarity is in the Eye of the Beholder
What factors matter to you? How much?
2 Graph modeling helps with Similarity
Hub-and-spoke view
3 Jaccard and Cosine Similarity
Counting matches vs. measuring numerical alignment
4 Deeper Measures: SimRank and RoleSim
Define similarity recursively, look multiple hops deep (globally)
12. © 2020 TigerGraph. All Rights Reserved
Some Types of Graph Algorithms
● Search
● Path Finding & Analytics
● Centrality / Ranking
● Clustering / Community Detection
● Similarity
● Classification
13. © 2020 TigerGraph. All Rights Reserved
Classifying 4 types of good & bad telecom users, using basic and graph features
Users: Tim, Sarah, John, Fred
Types: Prankster, Regular Customer, Sales, Fraudster
Age of sim card | 2 weeks | 4 weeks | 3 weeks | 2 weeks
% of one directional calls | 50% | 10% | 55% | 60%
% rejected calls | 40% | 5% | 28% | 25%
Stable group | Yes | Yes | No | No
Many in-group connections | No | Yes | No | Yes
3-step friend relation | No | Yes | No | Yes
Prediction by machine learning with deep link graph features:
Likely Prankster | Regular Customer | Likely Fraudster | Likely Sales
Download the solution brief: https://info.tigergraph.com/MachineLearning
14. © 2020 TigerGraph. All Rights Reserved
Other Use Cases
Medical Diagnosis
● Cold, Flu, COVID-19 or other?
Understanding Natural Language
● Word recognition, part of speech
● Question or statement?
● Sarcasm or not?
● Sentiment analysis
● Expected response
15. © 2020 TigerGraph. All Rights Reserved
Common Classifiers
1. Rule-based:
"If it walks like a duck and talks like a duck, then it's a duck."
2. Decision Tree
Walk like duck?
  N: Not duck
  Y: Talk like duck?
    N: Not duck
    Y: DUCK
● The two methods are equivalent
● But how do you derive the rules / decisions?
● Not talking about classification by legislation: "Because I say so"
● Are there ways to make a "best" classifier?
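The claim that the rule and the decision tree are equivalent can be checked by enumerating every feature combination. The following is a minimal sketch in Python; the function names are hypothetical and chosen only for illustration.

```python
def classify_by_rule(walks_like_duck: bool, talks_like_duck: bool) -> str:
    """Rule-based: one conjunctive rule, with "Not duck" as the default."""
    if walks_like_duck and talks_like_duck:
        return "DUCK"
    return "Not duck"


def classify_by_tree(walks_like_duck: bool, talks_like_duck: bool) -> str:
    """Decision tree: test one feature per node, from the root down."""
    if not walks_like_duck:      # root node: Walk like duck? -> N
        return "Not duck"
    if not talks_like_duck:      # second node: Talk like duck? -> N
        return "Not duck"
    return "DUCK"                # both answers were Y


# Enumerate all four feature combinations and compare the two classifiers.
cases = [(w, t) for w in (False, True) for t in (False, True)]
agree = all(classify_by_rule(w, t) == classify_by_tree(w, t) for w, t in cases)
print(agree)
```

Both versions hard-code the decisions. Classifier induction replaces that hand-written logic with decisions derived from labeled examples.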
16. © 2020 TigerGraph. All Rights Reserved
Classifier Induction (i.e., Learning how to classify)
● Need a set of training instances, where you know both (1) the features and (2) the class (label) of each instance.
● Use some statistical method to correlate the features to the labels.
Item/Case  Feature1  Feature2  Feature3  Label (Classification)
1          red       2.3       yes       A
2          blue      4.1       yes       B
3          red       4.0       yes       A
Learning a Classifier is one type of Supervised Machine Learning
17. © 2020 TigerGraph. All Rights Reserved
k-Nearest Neighbor Classification (kNN)
● Concept: Predict an entity's class by looking at the classes of the "nearest" other entities.
● Question: What is distance?
  ○ Physical distance? → Clustering
  ○ Or, some concept of similarity?
● How many neighbors to consider?
  ○ Within a radius?
  ○ Up to a certain number?
18. © 2020 TigerGraph. All Rights Reserved
k-Nearest Neighbor Classification (kNN)
Consider the K closest neighbors, from nearest to farthest:
● Pick the class that is represented most often
● The prediction depends on the value of K:
k   red  yellow  unlabeled  Prediction
2   0    2       0          yellow
3   0    2       1          yellow
4   1    2       1          yellow
5   2    2       1          red/yellow
6   3    2       1          red
7   3    2       2          red
8   3    3       2          red/yellow
9   3    3       3          red/yellow
10  4    3       3          red
11  4    4       3          red/yellow
12  4    5       3          yellow
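The table can be reproduced with a few lines of counting. The sketch below is in Python and assumes a neighbor ordering inferred from the running counts in the table (the slide's figure is not available here). Unlabeled neighbors take up a slot among the k nearest but cast no vote, and a tie returns every tied class.

```python
from collections import Counter

# Neighbor labels ordered from nearest to farthest. This sequence is inferred
# from the running counts in the table above; None marks an unlabeled neighbor.
NEIGHBORS = ["yellow", "yellow", None, "red", "red", "red",
             None, "yellow", None, "red", "yellow", "yellow"]


def knn_predict(neighbors, k):
    """Return the set of most frequent labels among the k nearest neighbors."""
    votes = Counter(label for label in neighbors[:k] if label is not None)
    if not votes:
        return set()
    top = max(votes.values())
    return {label for label, count in votes.items() if count == top}


# Print one row per k, in the same red/yellow notation as the table.
for k in range(2, 13):
    print(k, "/".join(sorted(knn_predict(NEIGHBORS, k))))
```

How to break a tie (nearest neighbor wins, random choice, or no prediction) is a separate design decision that the table leaves open.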
19. © 2020 TigerGraph. All Rights Reserved
kNN as a Machine Learning Task
● As long as you have a distance/similarity function, you don't need any additional "intelligence", except…
● ML task: Find the best value of K
[Diagram: item you want to classify, distance calculation, simple counting, k (?), prediction]
Could also work on optimizing the distance
function, but that's out of scope of our
current discussion.
20. © 2020 TigerGraph. All Rights Reserved
kNN for Graph Data
kNN is a general-purpose classifier. How does it work with graphs?
● It's really about the distance/similarity function.
● For our kNN, we use cosine neighborhood similarity.
● It's easy to replace this with Jaccard neighborhood similarity.
[Figure: example graph with vertices A, B, W, X, Y, Z and weighted edges]
Similarity(A,B) = f(shared neighbors)
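To make "f(shared neighbors)" concrete, here is a minimal Python sketch of both neighborhood similarity measures. The edge weights are hypothetical, made up for illustration and not read from the figure, and the functions are standalone helpers, not the GSQL library's implementation.

```python
import math

# Hypothetical weighted neighborhoods: vertex -> {neighbor: edge weight}.
# These weights are made up for illustration, not read from the slide's figure.
NEIGHBORHOODS = {
    "A": {"W": 2.0, "X": 3.0, "Y": 1.0},
    "B": {"X": 2.0, "Y": 1.0, "Z": 4.0},
}


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse edge-weight vectors."""
    dot = sum(w * b[n] for n, w in a.items() if n in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Shared neighbors divided by all distinct neighbors; ignores weights."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    return len(set(a) & set(b)) / len(union)


print(cosine_similarity(NEIGHBORHOODS["A"], NEIGHBORHOODS["B"]))
print(jaccard_similarity(NEIGHBORHOODS["A"], NEIGHBORHOODS["B"]))
```

Only the shared neighbors (X and Y here) contribute to the numerator of either measure. Cosine uses the edge weights, while Jaccard only counts which neighbors match.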
21. © 2020 TigerGraph. All Rights Reserved
Cross Validation
● In Machine Learning, first train a model, and then validate (check) the accuracy of the model. Split your labeled data into subsets:
  ○ Training set (bigger part)
  ○ Validation set (smaller part): Use the trained model to see if it correctly predicts the actual labels in the validation set.
[Figure: labeled data divided into segments 1, 2, ..., n-1, n, with the training set marked]
n-Fold Cross Validation / Leave-One-Out Cross Validation
Repeat, leaving out a different one (segment) each time
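The splitting scheme can be sketched as a small Python generator. This is an illustration with hypothetical function names. It deals items round-robin into segments, which is one of several reasonable ways to form folds.

```python
def n_fold_splits(items, n):
    """Yield (training, validation) pairs, holding out one segment at a time."""
    if not 1 < n <= len(items):
        raise ValueError("n must be between 2 and len(items)")
    # Deal items round-robin into n segments of nearly equal size.
    folds = [items[i::n] for i in range(n)]
    for held_out in range(n):
        validation = folds[held_out]
        training = [x for i, fold in enumerate(folds) if i != held_out
                    for x in fold]
        yield training, validation


def leave_one_out_splits(items):
    """Leave-one-out is n-fold cross validation with one item per segment."""
    return n_fold_splits(items, len(items))


for training, validation in n_fold_splits(list(range(10)), 5):
    print(validation, training)
```

Every item lands in the validation set exactly once, so the accuracy averaged over all rounds uses every labeled item as a test case.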
22. © 2020 TigerGraph. All Rights Reserved
TigerGraph kNN_cosine_cv algorithm
Given
● a (fully or partially) labeled graph
● a similarity measure
● a range of k values chosen by the user
For each value of k:
● For each labeled item Q in the graph:
  ○ Predict Q's label by looking at the classes of the k closest entities
● Accuracy(k) = number (%) of correct predictions
Select the k which produced the highest Accuracy(k)
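The loop above can be sketched in plain Python. This is an illustration of the procedure, not the GSQL kNN_cosine_cv query itself. The data set is hypothetical, as are the helper names and two rules: a tied vote counts as a miss, and the smallest k wins when accuracies are equal.

```python
import math
from collections import Counter

# Hypothetical data: item -> ({neighbor: edge weight}, label or None).
ITEMS = {
    "a1": ({"p": 1.0, "q": 1.0}, "A"),
    "a2": ({"p": 1.0, "q": 2.0}, "A"),
    "a3": ({"q": 2.0, "r": 1.0}, "A"),
    "b1": ({"x": 1.0, "y": 1.0}, "B"),
    "b2": ({"x": 2.0, "y": 1.0}, "B"),
    "b3": ({"y": 2.0, "r": 1.0}, "B"),
    "u1": ({"p": 1.0, "x": 1.0}, None),   # unlabeled: votes for nobody
}


def cosine(a, b):
    """Cosine similarity of two sparse edge-weight vectors."""
    dot = sum(w * b[n] for n, w in a.items() if n in b)
    norms = math.sqrt(sum(w * w for w in a.values())) * \
        math.sqrt(sum(w * w for w in b.values()))
    return dot / norms if norms else 0.0


def predict(items, query, k):
    """Majority label among the k items most similar to `query`, else None."""
    q_vec = items[query][0]
    others = [name for name in items if name != query]
    others.sort(key=lambda name: cosine(q_vec, items[name][0]), reverse=True)
    votes = Counter(items[name][1] for name in others[:k]
                    if items[name][1] is not None)
    ranked = votes.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None   # no labeled neighbor, or a tie: counted as a miss
    return ranked[0][0]


def best_k(items, k_values):
    """Return (best k, accuracy per k), scoring every labeled item for each k."""
    labeled = [name for name, (_, label) in items.items() if label is not None]
    accuracy = {}
    for k in k_values:
        correct = sum(predict(items, q, k) == items[q][1] for q in labeled)
        accuracy[k] = correct / len(labeled)
    # max() keeps the first maximum, so the smallest k wins on equal accuracy.
    return max(accuracy, key=accuracy.get), accuracy


k, accuracy = best_k(ITEMS, range(1, 6))
print(k, accuracy)
```

Because each labeled item Q is predicted from all the other items, this is leave-one-out cross validation: Q's own label never takes part in its prediction.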
25. © 2020 TigerGraph. All Rights Reserved
GSQL Graph Algorithm Library
● Written in GSQL - high-level, parallelized
● Open-source, user-extensible
● Well-documented
docs.tigergraph.com/graph-algorithm-library
26. © 2020 TigerGraph. All Rights Reserved
TigerGraph GSQL Graph Algorithm Library
● Call each algorithm as a GSQL query or as a RESTful endpoint
● Run the algorithms in-database (don't export the data)
● Option to update the graph with the algorithm results
● Able to modify/customize the algorithms. Turing-complete language.
● Massively parallel processing to handle big graphs
27. © 2020 TigerGraph. All Rights Reserved
Summary
1 Graph Algorithms
Key tool for data scientists
2 Classification Algorithms
Bridge the gap to machine learning
3 k-Nearest Neighbors
Predicts the classification by looking at the classes of the similar/nearby items.
What's the right value for k?
4 Advanced Analytics with Graph Algorithms
https://docs.tigergraph.com/graph-algorithm-library
29. © 2020 TigerGraph. All Rights Reserved
More Questions?
Join our Developer Forum
https://groups.google.com/a/opengsql.org/forum/#!forum/gsql-users
Sign up for our Developer Office Hours (every Thursday at 11 AM PST)
https://info.tigergraph.com/officehours
30. © 2020 TigerGraph. All Rights Reserved
Additional Resources
Start Free at TigerGraph Cloud Today!
https://www.tigergraph.com/cloud/
Test Drive Online Demo
https://www.tigergraph.com/demo
Download the Developer Edition
https://www.tigergraph.com/download/
Guru Scripts
https://github.com/tigergraph/ecosys/tree/master/guru_scripts
31. © 2020 TigerGraph. All Rights Reserved
Upcoming Online Events
Graph Gurus 33: GSQL Writing Best Practices - Part 2
Wednesday, April 8, at 11am PDT
https://info.tigergraph.com/graph-gurus-33