Machine Learning in the Cloud with GraphLab
1. Machine Learning in the Cloud with GraphLab
Danny Bickson
Applied machine learning day, January 20, 2014, MS
2. Needless to Say, We Need Machine Learning for Big Data
6 billion Flickr photos. 28 million Wikipedia pages. 1 billion Facebook users. 72 hours of YouTube video uploaded every minute.
"…data, a new class of economic asset, like currency or gold."
3. Big Learning
How will we design and implement parallel learning systems?
4. A Shift Towards Parallelism
GPUs, multicore, clusters, clouds, supercomputers.
• ML experts and grad students repeatedly solve the same parallel design challenges: race conditions, distributed state, communication…
• The resulting code is difficult to maintain, extend, debug…
Avoid these problems by using high-level abstractions.
5. MapReduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-parallel (MapReduce): feature extraction, cross validation, computing sufficient statistics.
Graph-parallel: is there more to machine learning?
• Graphical models: Gibbs sampling, belief propagation, variational optimization
• Collaborative filtering: tensor factorization
• Semi-supervised learning: label propagation, CoEM
• Graph analysis: PageRank, triangle counting
6. The Power of Dependencies
Where the value is!
Carnegie Mellon University
11. Collaborative Filtering: Exploiting Dependencies
Women on the Verge of a Nervous Breakdown. The Celebration. City of God. Wild Strawberries. La Dolce Vita.
What do I recommend???
12. Machine Learning Pipeline
Data (images, docs, movie ratings) → Extract Features (faces, important words, side info) → Graph Formation (similar faces, shared words, rated movies) → Structured Machine Learning Algorithm (belief propagation, LDA, collaborative filtering) → Value from Data (face labels, doc topics, movie recommendations).
13. Parallelizing Machine Learning
Data → Extract Features → Graph Formation: graph ingress, mostly data-parallel.
Structured Machine Learning Algorithm: graph-structured computation, graph-parallel.
→ Value from Data.
15. Example of a Graph-Parallel Algorithm
16. PageRank
What's the rank of this user? It depends on the rank of who follows her, which in turn depends on the rank of who follows them…
Loops in the graph → must iterate!
17. PageRank Iteration
Iterate until convergence: "My rank is the weighted average of my friends' ranks."

R[i] = α + (1 − α) · Σ_{(j,i)∈E} w_ji · R[j]

• α is the random reset probability
• w_ji is the probability of transitioning (similarity) from j to i
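The iteration above can be sketched directly in code. This is a minimal sketch on a tiny hand-built graph, not GraphLab itself; the names `pagerank`, `in_edges`, and `w` are illustrative, and updates are applied in place until the largest change falls below a tolerance.

```python
def pagerank(vertices, in_edges, w, alpha=0.15, tol=1e-9):
    """in_edges[i] lists the vertices j with an edge (j, i);
    w[(j, i)] is the transition weight from j to i."""
    R = {i: 1.0 for i in vertices}
    while True:
        delta = 0.0
        for i in vertices:
            # R[i] = alpha + (1 - alpha) * sum over in-edges of w_ji * R[j]
            new = alpha + (1 - alpha) * sum(w[(j, i)] * R[j] for j in in_edges[i])
            delta = max(delta, abs(new - R[i]))
            R[i] = new
        if delta < tol:
            return R

# Tiny 3-vertex example: each vertex follows the next in a cycle,
# so by symmetry every rank converges to 1.0.
vertices = [0, 1, 2]
in_edges = {0: [2], 1: [0], 2: [1]}
w = {(2, 0): 1.0, (0, 1): 1.0, (1, 2): 1.0}
ranks = pagerank(vertices, in_edges, w)
```

Updating `R` in place (rather than from a frozen copy) mirrors the asynchronous flavor of execution the talk emphasizes later.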
18. Properties of Graph-Parallel Algorithms
• Dependency graph
• Local updates (my rank depends on my friends' ranks)
• Iterative computation
21. Data Graph
Data associated with vertices and edges.
Graph:
• Social network
Vertex data:
• User profile text
• Current interest estimates
Edge data:
• Similarity weights
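The data-graph idea above is just arbitrary user data attached to vertices and edges. A minimal sketch, assuming a toy in-memory structure; the class name and fields (`profile`, `interests`, `similarity`) are illustrative, not part of the GraphLab API.

```python
class DataGraph:
    """A toy data graph: user data on vertices and on edges."""

    def __init__(self):
        self.vertex_data = {}   # vertex id -> dict of user data
        self.edge_data = {}     # (src, dst) -> dict of user data
        self.neighbors = {}     # vertex id -> set of adjacent vertex ids

    def add_vertex(self, v, **data):
        self.vertex_data[v] = data
        self.neighbors.setdefault(v, set())

    def add_edge(self, u, v, **data):
        self.edge_data[(u, v)] = data
        self.neighbors.setdefault(u, set()).add(v)
        self.neighbors.setdefault(v, set()).add(u)

# Social-network example from the slide: profile text and interest
# estimates on vertices, similarity weights on edges.
g = DataGraph()
g.add_vertex("alice", profile="likes hiking", interests=[0.9, 0.1])
g.add_vertex("bob", profile="likes movies", interests=[0.2, 0.8])
g.add_edge("alice", "bob", similarity=0.3)
```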
22. How do we program graph computation?
"Think like a Vertex." - Malewicz et al. [SIGMOD'10]
23. Update Functions
A user-defined program, applied to a vertex, transforms the data in the scope of that vertex:

pagerank(i, scope) {
  // Get neighborhood data
  (R[i], w_ji, R[j]) ← scope;
  // Update the vertex data
  R[i] ← α + (1 − α) Σ_{j∈N[i]} w_ji × R[j];
  // Reschedule neighbors if needed
  if R[i] changes then
    reschedule_neighbors_of(i);
}

The update function is applied (asynchronously) in parallel until convergence. Many schedulers are available to prioritize computation. Rescheduling only when a value changes gives dynamic computation.
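The dynamic-scheduling loop sketched in the pseudocode can be made runnable. Below is a minimal single-threaded sketch, not the GraphLab API: a FIFO queue stands in for the scheduler, and a vertex's out-neighbors (the vertices that read its value) are rescheduled only when its rank actually changes.

```python
from collections import deque

def run_dynamic(vertices, out_edges, in_edges, w, alpha=0.15, tol=1e-8):
    """Pull vertices from the scheduler, apply the update function,
    and reschedule dependents only on change."""
    R = {i: 1.0 for i in vertices}
    queue = deque(vertices)
    queued = set(vertices)
    while queue:
        i = queue.popleft()
        queued.discard(i)
        # (R[i], w_ji, R[j]) <- scope : read the neighborhood data.
        new = alpha + (1 - alpha) * sum(w[(j, i)] * R[j] for j in in_edges[i])
        # Reschedule neighbors only if R[i] actually changed.
        if abs(new - R[i]) > tol:
            R[i] = new
            for k in out_edges[i]:
                if k not in queued:
                    queue.append(k)
                    queued.add(k)
    return R

# Small 3-vertex example with row-normalized edge weights.
vertices = [0, 1, 2]
out_edges = {0: [1, 2], 1: [2], 2: [0]}
in_edges = {0: [2], 1: [0], 2: [0, 1]}
w = {(0, 1): 0.5, (0, 2): 0.5, (1, 2): 1.0, (2, 0): 1.0}
R = run_dynamic(vertices, out_edges, in_edges, w)
```

When the queue drains, every vertex's residual is below the tolerance, which is exactly the "iterate until convergence" condition of the previous slide, reached without sweeping over unchanged vertices.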
24. The GraphLab Framework
• Graph-based data representation
• Update functions (user computation)
• Scheduler
• Consistency model
29. Problem: Existing distributed graph computation systems perform poorly on natural graphs.
30. Achilles Heel: Idealized Graph Assumption
Assumed: small degree → easy to partition.
But natural graphs have many high-degree vertices (power-law degree distribution) → very hard to partition.
32. High-Degree Vertices are Common
• Netflix: popular movies are rated by huge numbers of users
• Social networks: "social" people (e.g. Obama) have enormous numbers of connections
• LDA: hyper-parameters and common words are shared across all docs
(The slide's LDA plate diagram with α, θ, B, Z, and w over docs and words is omitted.)
33. Power-Law Graphs are Difficult to Partition
• Power-law graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]
• Traditional graph-partitioning algorithms perform poorly on power-law graphs [Abou-Rjeili et al. 06]
34. GraphLab 2 Solution
Program for this… run on this.
Split high-degree vertices across machines (Machine 1 / Machine 2).
The new abstraction leads to this split-vertex strategy.
35. GAS Decomposition
• Gather (Reduce): accumulate information about the neighborhood into Σ, a parallel "sum" of per-edge contributions (+ + … +)
• Apply: apply the accumulated value Σ to the center vertex
• Scatter: update adjacent edges and vertices
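The GAS decomposition can be sketched concretely for PageRank. A minimal sketch, not PowerGraph's actual API: the function names are illustrative, and the key point is that `gather` produces independent partial sums, so a split high-degree vertex can accumulate them on different machines before `apply`.

```python
def gather(R, w, j, i):
    # One partial contribution from in-neighbor j. Because the
    # combination is a commutative, associative sum, partials can be
    # computed independently and merged in any order.
    return w[(j, i)] * R[j]

def apply_update(R, i, acc, alpha=0.15):
    # Combine the accumulated neighborhood value into the center vertex.
    R[i] = alpha + (1 - alpha) * acc

def scatter(out_edges, i):
    # Signal adjacent vertices/edges; here we just name who to reschedule.
    return list(out_edges[i])

# One GAS step for vertex i = 2 on a toy graph.
R = {0: 1.0, 1: 0.5, 2: 0.0}
w = {(0, 2): 0.5, (1, 2): 1.0}
in_edges = {2: [0, 1]}
out_edges = {2: [0]}

acc = sum(gather(R, w, j, 2) for j in in_edges[2])   # Σ = + + … +
apply_update(R, 2, acc)
to_reschedule = scatter(out_edges, 2)
```

Here acc = 0.5·1.0 + 1.0·0.5 = 1.0, so the apply phase sets R[2] = 0.15 + 0.85·1.0 = 1.0.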
36. GraphChi: Going Small with GraphLab
Solve huge problems on small or embedded devices?
Key: exploit non-volatile memory (starting with SSDs and HDs).
37. GraphChi – Disk-Based GraphLab
Challenge: random accesses.
Novel GraphChi solution: the parallel sliding windows method → minimizes the number of random accesses.
38. GraphChi – Disk-Based GraphLab
• Novel Parallel Sliding Windows algorithm
• Fast! Solves tasks as large as current distributed systems
• Minimizes non-sequential disk accesses: efficient on both SSD and hard drive
• Parallel, asynchronous execution
39. Sample Results
• Triangle counting, Twitter graph (1.5B edges): GraphChi on 1 Mac Mini vs. Hadoop on 1600 nodes [1]. (Bar chart; x-axis 0–500 minutes.)
• Belief propagation, Altavista graph (6.7B edges): GraphChi on 1 Mac Mini vs. Hadoop on 100 machines [2]. (Bar chart; x-axis 0–30 minutes.)
[1] S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. WWW 2011.
[2] U. Kang, D. H. Chau, and C. Faloutsos. Inference of Beliefs on Billion-Scale Graphs. KDD-LDMTA'10, pages 1–7, June 2010.
41. Efficient Multicore Collaborative Filtering
LeBuSiShu team – 5th place in track 1, ACM KDD CUP 2011.
Yao Wu, Qiang Yan, Qing Yang (Institute of Automation, Chinese Academy of Sciences); Danny Bickson, Yucheng Low (Machine Learning Dept, Carnegie Mellon University).
ACM KDD CUP Workshop 2011.