Machine Learning in the Cloud with GraphLab
1. Machine Learning in the Cloud with GraphLab
Danny Bickson
Applied machine learning day, January 20, 2014, MS
2. Needless to Say, We Need Machine Learning for Big Data
6 billion Flickr photos. 28 million Wikipedia pages. 1 billion Facebook users. 72 hours of YouTube video uploaded every minute.
"…data, a new class of economic asset, like currency or gold."
3. Big Learning
How will we design and implement parallel learning systems?
4. A Shift Towards Parallelism
GPUs, multicore, clusters, clouds, supercomputers.
• ML experts and grad students repeatedly solve the same parallel design challenges: race conditions, distributed state, communication…
• The resulting code is difficult to maintain, extend, debug…
Avoid these problems by using high-level abstractions.
5. MapReduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-parallel (MapReduce): feature extraction, cross validation, computing sufficient statistics.
Graph-parallel: is there more to machine learning?
• Graphical models: Gibbs sampling, belief propagation, variational optimization
• Collaborative filtering: tensor factorization
• Semi-supervised learning: label propagation, CoEM
• Graph analysis: PageRank, triangle counting
6. The Power of Dependencies
Where the value is!
Carnegie Mellon University
11. Collaborative Filtering: Exploiting Dependencies
Women on the Verge of a Nervous Breakdown. The Celebration. City of God. Wild Strawberries. La Dolce Vita.
What do I recommend???
12. Machine Learning Pipeline
Data (images, docs, movie ratings) → Extract Features (faces, important words, side info) → Graph Formation (similar faces, shared words, rated movies) → Structured Machine Learning Algorithm (belief propagation, LDA, collaborative filtering) → Value from Data (face labels, doc topics, movie recommendations).
13. Parallelizing Machine Learning
Data → Extract Features → Graph Formation: graph ingress, mostly data-parallel.
Structured Machine Learning Algorithm: graph-structured computation, graph-parallel.
→ Value from Data.
15. Example of a Graph-Parallel Algorithm
16. PageRank
What's the rank of this user? It depends on the rank of who follows her, which in turn depends on the rank of who follows them…
Loops in the graph → must iterate!
17. PageRank Iteration
Iterate until convergence: "My rank is the weighted average of my friends' ranks."

R[i] = α + (1 − α) · Σ_{(j,i)∈E} w_ji · R[j]

• α is the random reset probability
• w_ji is the probability of transitioning (similarity) from j to i
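The iteration above can be sketched directly in code. This is a minimal sketch on a tiny hand-built graph, not GraphLab itself; the names `pagerank`, `in_edges`, and `w` are illustrative, and updates are applied in place until the largest change falls below a tolerance.

```python
def pagerank(vertices, in_edges, w, alpha=0.15, tol=1e-9):
    """in_edges[i] lists the vertices j with an edge (j, i);
    w[(j, i)] is the transition weight from j to i."""
    R = {i: 1.0 for i in vertices}
    while True:
        delta = 0.0
        for i in vertices:
            # R[i] = alpha + (1 - alpha) * sum over in-edges of w_ji * R[j]
            new = alpha + (1 - alpha) * sum(w[(j, i)] * R[j] for j in in_edges[i])
            delta = max(delta, abs(new - R[i]))
            R[i] = new
        if delta < tol:
            return R

# Tiny 3-vertex example: each vertex follows the next in a cycle,
# so by symmetry every rank converges to 1.0.
vertices = [0, 1, 2]
in_edges = {0: [2], 1: [0], 2: [1]}
w = {(2, 0): 1.0, (0, 1): 1.0, (1, 2): 1.0}
ranks = pagerank(vertices, in_edges, w)
```

Updating `R` in place (rather than from a frozen copy) mirrors the asynchronous flavor of execution the talk emphasizes later.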
18. Properties of Graph-Parallel Algorithms
• Dependency graph
• Local updates (my rank depends on my friends' ranks)
• Iterative computation
21. Data Graph
Data associated with vertices and edges.
Graph:
• Social network
Vertex data:
• User profile text
• Current interest estimates
Edge data:
• Similarity weights
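The data-graph idea above is just arbitrary user data attached to vertices and edges. A minimal sketch, assuming a toy in-memory structure; the class name and fields (`profile`, `interests`, `similarity`) are illustrative, not part of the GraphLab API.

```python
class DataGraph:
    """A toy data graph: user data on vertices and on edges."""

    def __init__(self):
        self.vertex_data = {}   # vertex id -> dict of user data
        self.edge_data = {}     # (src, dst) -> dict of user data
        self.neighbors = {}     # vertex id -> set of adjacent vertex ids

    def add_vertex(self, v, **data):
        self.vertex_data[v] = data
        self.neighbors.setdefault(v, set())

    def add_edge(self, u, v, **data):
        self.edge_data[(u, v)] = data
        self.neighbors.setdefault(u, set()).add(v)
        self.neighbors.setdefault(v, set()).add(u)

# Social-network example from the slide: profile text and interest
# estimates on vertices, similarity weights on edges.
g = DataGraph()
g.add_vertex("alice", profile="likes hiking", interests=[0.9, 0.1])
g.add_vertex("bob", profile="likes movies", interests=[0.2, 0.8])
g.add_edge("alice", "bob", similarity=0.3)
```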
22. How do we program graph computation?
"Think like a Vertex." - Malewicz et al. [SIGMOD'10]
23. Update Functions
A user-defined program, applied to a vertex, transforms the data in the scope of that vertex:

pagerank(i, scope) {
  // Get neighborhood data
  (R[i], w_ji, R[j]) ← scope;
  // Update the vertex data
  R[i] ← α + (1 − α) Σ_{j∈N[i]} w_ji × R[j];
  // Reschedule neighbors if needed
  if R[i] changes then
    reschedule_neighbors_of(i);
}

The update function is applied (asynchronously) in parallel until convergence. Many schedulers are available to prioritize computation. Rescheduling only when a value changes gives dynamic computation.
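The dynamic-scheduling loop sketched in the pseudocode can be made runnable. Below is a minimal single-threaded sketch, not the GraphLab API: a FIFO queue stands in for the scheduler, and a vertex's out-neighbors (the vertices that read its value) are rescheduled only when its rank actually changes.

```python
from collections import deque

def run_dynamic(vertices, out_edges, in_edges, w, alpha=0.15, tol=1e-8):
    """Pull vertices from the scheduler, apply the update function,
    and reschedule dependents only on change."""
    R = {i: 1.0 for i in vertices}
    queue = deque(vertices)
    queued = set(vertices)
    while queue:
        i = queue.popleft()
        queued.discard(i)
        # (R[i], w_ji, R[j]) <- scope : read the neighborhood data.
        new = alpha + (1 - alpha) * sum(w[(j, i)] * R[j] for j in in_edges[i])
        # Reschedule neighbors only if R[i] actually changed.
        if abs(new - R[i]) > tol:
            R[i] = new
            for k in out_edges[i]:
                if k not in queued:
                    queue.append(k)
                    queued.add(k)
    return R

# Small 3-vertex example with row-normalized edge weights.
vertices = [0, 1, 2]
out_edges = {0: [1, 2], 1: [2], 2: [0]}
in_edges = {0: [2], 1: [0], 2: [0, 1]}
w = {(0, 1): 0.5, (0, 2): 0.5, (1, 2): 1.0, (2, 0): 1.0}
R = run_dynamic(vertices, out_edges, in_edges, w)
```

When the queue drains, every vertex's residual is below the tolerance, which is exactly the "iterate until convergence" condition of the previous slide, reached without sweeping over unchanged vertices.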
24. The GraphLab Framework
• Graph-based data representation
• Update functions (user computation)
• Scheduler
• Consistency model
29. Problem: Existing distributed graph computation systems perform poorly on natural graphs.
30. Achilles Heel: Idealized Graph Assumption
Assumed: small degree → easy to partition.
But natural graphs have many high-degree vertices (power-law degree distribution) → very hard to partition.
32. High-Degree Vertices are Common
• Netflix: popular movies are rated by huge numbers of users
• Social networks: "social" people (e.g. Obama) have enormous numbers of connections
• LDA: hyper-parameters and common words are shared across all docs
(The slide's LDA plate diagram with α, θ, B, Z, and w over docs and words is omitted.)
33. Power-Law Graphs are Difficult to Partition
• Power-law graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]
• Traditional graph-partitioning algorithms perform poorly on power-law graphs [Abou-Rjeili et al. 06]
34. GraphLab 2 Solution
Program for this… run on this.
Split high-degree vertices across machines (Machine 1 / Machine 2).
The new abstraction leads to this split-vertex strategy.
35. GAS Decomposition
• Gather (Reduce): accumulate information about the neighborhood into Σ, a parallel "sum" of per-edge contributions (+ + … +)
• Apply: apply the accumulated value Σ to the center vertex
• Scatter: update adjacent edges and vertices
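The GAS decomposition can be sketched concretely for PageRank. A minimal sketch, not PowerGraph's actual API: the function names are illustrative, and the key point is that `gather` produces independent partial sums, so a split high-degree vertex can accumulate them on different machines before `apply`.

```python
def gather(R, w, j, i):
    # One partial contribution from in-neighbor j. Because the
    # combination is a commutative, associative sum, partials can be
    # computed independently and merged in any order.
    return w[(j, i)] * R[j]

def apply_update(R, i, acc, alpha=0.15):
    # Combine the accumulated neighborhood value into the center vertex.
    R[i] = alpha + (1 - alpha) * acc

def scatter(out_edges, i):
    # Signal adjacent vertices/edges; here we just name who to reschedule.
    return list(out_edges[i])

# One GAS step for vertex i = 2 on a toy graph.
R = {0: 1.0, 1: 0.5, 2: 0.0}
w = {(0, 2): 0.5, (1, 2): 1.0}
in_edges = {2: [0, 1]}
out_edges = {2: [0]}

acc = sum(gather(R, w, j, 2) for j in in_edges[2])   # Σ = + + … +
apply_update(R, 2, acc)
to_reschedule = scatter(out_edges, 2)
```

Here acc = 0.5·1.0 + 1.0·0.5 = 1.0, so the apply phase sets R[2] = 0.15 + 0.85·1.0 = 1.0.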
36. GraphChi: Going Small with GraphLab
Solve huge problems on small or embedded devices?
Key: exploit non-volatile memory (starting with SSDs and HDs).
37. GraphChi – Disk-Based GraphLab
Challenge: random accesses.
Novel GraphChi solution: the parallel sliding windows method → minimizes the number of random accesses.
38. GraphChi – Disk-Based GraphLab
• Novel Parallel Sliding Windows algorithm
• Fast! Solves tasks as large as current distributed systems
• Minimizes non-sequential disk accesses: efficient on both SSD and hard drive
• Parallel, asynchronous execution
39. Sample Results
• Triangle counting, Twitter graph (1.5B edges): GraphChi on 1 Mac Mini vs. Hadoop on 1600 nodes [1]. (Bar chart; x-axis 0–500 minutes.)
• Belief propagation, Altavista graph (6.7B edges): GraphChi on 1 Mac Mini vs. Hadoop on 100 machines [2]. (Bar chart; x-axis 0–30 minutes.)
[1] S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. WWW 2011.
[2] U. Kang, D. H. Chau, and C. Faloutsos. Inference of Beliefs on Billion-Scale Graphs. KDD-LDMTA'10, pages 1–7, June 2010.
41. Efficient Multicore Collaborative Filtering
LeBuSiShu team – 5th place in track 1, ACM KDD CUP 2011.
Yao Wu, Qiang Yan, Qing Yang (Institute of Automation, Chinese Academy of Sciences); Danny Bickson, Yucheng Low (Machine Learning Dept, Carnegie Mellon University).
ACM KDD CUP Workshop 2011.