Machine Learning in the Cloud with GraphLab

Danny Bickson

Applied machine learning day, January 20, 2014 MS

Needless to Say, We Need Machine Learning for Big Data

6 Billion Flickr Photos
28 Million Wikipedia Pages
1 Billion Facebook Users
72 Hours a Minute YouTube

“… data a new class of economic asset, like currency or gold.”

Big Learning

How will we design and implement parallel learning systems?

A Shift Towards Parallelism

GPUs, Multicore, Clusters, Clouds, Supercomputers

• ML experts (graduate students) repeatedly solve the same parallel design challenges:
  race conditions, distributed state, communication…
• The resulting code is difficult to maintain, extend, debug…
• Avoid these problems by using high-level abstractions

MapReduce for Data-Parallel ML

• Excellent for large data-parallel tasks!

Data-Parallel (MapReduce):
• Feature Extraction
• Cross Validation
• Computing Sufficient Statistics

Graph-Parallel (Is there more to Machine Learning?):
• Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
• Collaborative Filtering: Tensor Factorization
• Semi-Supervised Learning: Label Propagation, CoEM
• Graph Analysis: PageRank, Triangle Counting

The Power of Dependencies

where the value is!

Carnegie Mellon University

Label a Face and Propagate

Pairwise similarity not enough…
Not similar enough to be sure

Propagate Similarities & Co-occurrences for Accurate Predictions

[Figure labels: similarity edges, co-occurring faces, further evidence]

Collaborative Filtering: Independent Case

Lord of the Rings, Star Wars IV, Star Wars I, Harry Potter, Pirates of the Caribbean

Collaborative Filtering: Exploiting Dependencies

Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita

What do I recommend???

Machine Learning Pipeline

Data → Extract Features → Graph Formation → Structured Machine Learning Algorithm → Value from Data

Data             Extract Features    Graph Formation    Structured ML Algorithm    Value from Data
images           faces               similar faces      belief propagation         face labels
docs             important words     shared words       LDA                        doc topics
movie ratings    side info           rated movies       collaborative filtering    movie recommend.

Parallelizing Machine Learning

Data → Extract Features → Graph Formation → Structured Machine Learning Algorithm → Value from Data

• Extract Features, Graph Formation: Graph Ingress, mostly data-parallel
• Structured Machine Learning Algorithm: Graph-Structured Computation, graph-parallel

ML Tasks Beyond Data-Parallelism

Data-Parallel (Map Reduce):
• Feature Extraction
• Cross Validation
• Computing Sufficient Statistics

Graph-Parallel:
• Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
• Collaborative Filtering: Tensor Factorization
• Semi-Supervised Learning: Label Propagation, CoEM
• Graph Analysis: PageRank, Triangle Counting

Example of a Graph-Parallel Algorithm

PageRank

What’s the rank of this user?
• Depends on rank of who follows her
• Depends on rank of who follows them…

Loops in graph → Must iterate!

PageRank Iteration

Iterate until convergence:
“My rank is weighted average of my friends’ ranks”

$$R[i] = \alpha + (1 - \alpha) \sum_{(j,i) \in E} w_{ji} R[j]$$

• α is the random reset probability
• w_ji is the prob. transitioning (similarity) from j to i

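The iteration above can be sketched in a few lines of Python. This is a minimal sketch, not GraphLab code: the function name `pagerank`, the `(j, i) -> w_ji` dictionary format, the tolerance, and the toy graph are all assumptions made for illustration.

```python
def pagerank(edges, alpha=0.15, tol=1e-10, max_iters=1000):
    """Iterate R[i] = alpha + (1 - alpha) * sum over (j, i) in E of w_ji * R[j].

    edges maps (j, i) -> w_ji, the weight of the edge from j to i.
    """
    vertices = {v for edge in edges for v in edge}
    rank = {v: 1.0 for v in vertices}
    for _ in range(max_iters):
        # Every vertex starts from the random-reset term alpha.
        new_rank = {v: alpha for v in vertices}
        for (j, i), w_ji in edges.items():
            new_rank[i] += (1.0 - alpha) * w_ji * rank[j]
        delta = max(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if delta < tol:          # converged: no rank moved by more than tol
            break
    return rank


# Toy graph (hypothetical): each vertex splits its weight evenly over its out-edges.
toy_edges = {
    ("a", "b"): 0.5, ("a", "c"): 0.5,
    ("b", "c"): 1.0,
    ("c", "a"): 1.0,
}
ranks = pagerank(toy_edges)
```

Because the graph has a loop (a → b → c → a), no single pass gives the answer; the ranks only settle after repeated sweeps.
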
Properties of Graph Parallel Algorithms

• Dependency Graph
• Local Updates
• Iterative Computation

[Figure labels: My Rank, Friends Rank]

Addressing Graph-Parallel ML

Data-Parallel (Map Reduce):
• Feature Extraction
• Cross Validation
• Computing Sufficient Statistics

Graph-Parallel (Graph-Parallel Abstraction; Map Reduce?):
• Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
• Collaborative Filtering: Tensor Factorization
• Semi-Supervised Learning: Label Propagation, CoEM
• Data-Mining: PageRank, Triangle Counting

Data Graph

Data associated with vertices and edges

Graph:
• Social Network
Vertex Data:
• User profile text
• Current interests estimates
Edge Data:
• Similarity weights
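A data graph of this kind might be represented as follows. This is a rough sketch under assumptions: the class `DataGraph`, its methods, and the sample users are hypothetical and are not the GraphLab API. The `scope` method returns what a vertex program would see: the vertex's own data, its adjacent edges, and its neighbors.

```python
from dataclasses import dataclass, field


@dataclass
class DataGraph:
    """A graph whose vertices and edges both carry user-defined data."""
    vertex_data: dict = field(default_factory=dict)   # vertex id -> data
    edge_data: dict = field(default_factory=dict)     # (src, dst) -> data
    neighbors: dict = field(default_factory=dict)     # vertex id -> set of ids

    def add_vertex(self, vid, data):
        self.vertex_data[vid] = data
        self.neighbors.setdefault(vid, set())

    def add_edge(self, src, dst, data):
        if src not in self.vertex_data or dst not in self.vertex_data:
            raise KeyError("both endpoints must be added before the edge")
        self.edge_data[(src, dst)] = data
        self.neighbors[src].add(dst)
        self.neighbors[dst].add(src)

    def scope(self, vid):
        """Data visible from vid: the vertex, its adjacent edges, its neighbors."""
        adjacent = {e: d for e, d in self.edge_data.items() if vid in e}
        nearby = {n: self.vertex_data[n] for n in self.neighbors[vid]}
        return self.vertex_data[vid], adjacent, nearby


# Hypothetical social network: profile text and interest estimates on vertices,
# a similarity weight on the edge.
g = DataGraph()
g.add_vertex("alice", {"profile": "likes hiking", "interests": [0.7, 0.3]})
g.add_vertex("bob", {"profile": "likes films", "interests": [0.2, 0.8]})
g.add_edge("alice", "bob", {"similarity": 0.4})
own, adjacent, nearby = g.scope("alice")
```
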
  	
  
How do we program graph computation?

“Think like a Vertex.”
-Malewicz et al. [SIGMOD’10]

Update Functions

User-defined program: applied to vertex transforms data in scope of vertex

pagerank(i, scope){
  // Get Neighborhood data
  (R[i], w_ji, R[j]) ← scope;

  // Update the vertex data
  R[i] ← α + (1 − α) Σ_{j ∈ N[i]} w_ji × R[j];

  // Reschedule Neighbors if needed
  if R[i] changes then
    reschedule_neighbors_of(i);
}

• Update function applied (asynchronously) in parallel until convergence
• Many schedulers available to prioritize computation
• Dynamic computation
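The update-function idea can be imitated with a work queue. This is a single-threaded sketch under assumptions, whereas GraphLab applies updates in parallel: the FIFO queue stands in for one possible scheduler, and the names `run_pagerank`, `in_edges`, and `out_neighbors` are hypothetical. It shows the dynamic part: a vertex is rescheduled only when a rank it depends on has changed by more than `tol`.

```python
from collections import deque


def run_pagerank(in_edges, out_neighbors, alpha=0.15, tol=1e-6):
    """Apply a PageRank update to scheduled vertices until the queue is empty.

    in_edges[i] lists (j, w_ji) pairs; out_neighbors[i] lists the vertices
    whose rank depends on R[i].
    """
    rank = {v: 1.0 for v in in_edges}
    queue = deque(in_edges)          # FIFO scheduler: start with every vertex
    queued = set(in_edges)
    updates = 0
    while queue:
        i = queue.popleft()
        queued.discard(i)
        # Update the vertex data from the neighborhood in scope.
        new_rank = alpha + (1.0 - alpha) * sum(w * rank[j] for j, w in in_edges[i])
        changed = abs(new_rank - rank[i]) > tol
        rank[i] = new_rank
        updates += 1
        # Reschedule neighbors only if R[i] changed noticeably.
        if changed:
            for k in out_neighbors[i]:
                if k not in queued:
                    queue.append(k)
                    queued.add(k)
    return rank, updates


# Toy graph (hypothetical): a -> b, a -> c, b -> c, c -> a.
in_edges = {"a": [("c", 1.0)], "b": [("a", 0.5)], "c": [("a", 0.5), ("b", 1.0)]}
out_neighbors = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank, updates = run_pagerank(in_edges, out_neighbors)
```

Swapping the `deque` for a priority queue keyed on the size of the last change would give a prioritized scheduler instead of a FIFO one.
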
  
The GraphLab Framework

• Graph Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model

Alternating Least Squares, CoEM, Lasso, SVD, Belief Propagation, LDA, Splash Sampler, Bayesian Tensor Factorization, PageRank, SVM, Gibbs Sampling, Dynamic Block Gibbs Sampling, K-Means, Linear Solvers, Matrix Factorization, …Many others…

Never Ending Learner Project (CoEM)

Hadoop                  95 Cores           7.5 hrs
Distributed GraphLab    32 EC2 machines    80 secs

0.3% of Hadoop time
2 orders of mag faster → 2 orders of mag cheaper

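The "0.3%" and "2 orders of magnitude" figures follow from the two runtimes quoted above. A quick check, using only the numbers on the slide:

```python
hadoop_seconds = 7.5 * 3600      # 7.5 hours on 95 cores
graphlab_seconds = 80            # 80 seconds on 32 EC2 machines

fraction = graphlab_seconds / hadoop_seconds   # GraphLab time / Hadoop time
speedup = hadoop_seconds / graphlab_seconds    # how many times faster

print(f"GraphLab time as a share of Hadoop time: {fraction:.1%}")
print(f"Speedup factor: {speedup:.1f}")
```

The speedup lands between 100x and 1000x, which is what "2 orders of magnitude" refers to.
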
Thus far…

GraphLab 1 provided exciting scaling performance

But…

We couldn’t scale up to the Altavista Webgraph 2002: 1.4B vertices, 6.7B edges

Natural Graphs

[Image from WikiCommons]

Problem: Existing distributed graph computation systems perform poorly on Natural Graphs

Achilles Heel: Idealized Graph Assumption

Assumed…
Small degree → Easy to partition

But, Natural Graphs…
Many high degree vertices (power-law degree distribution) → Very hard to partition

Power-Law Degree Distribution

[Figure: log-log plot of Number of Vertices (count, 10^0 to 10^10) against Degree (10^0 to 10^8) for the AltaVista WebGraph, 1.4B Vertices, 6.6B Edges]

High-Degree Vertices: 1% of vertices adjacent to 50% of edges

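How much of a graph the top 1% of vertices touch can be measured directly from a degree sequence. The sketch below uses a synthetic Zipf-like sequence, which is an assumption made for illustration and is not the AltaVista data; the helper name `edge_share_of_top` is likewise hypothetical.

```python
def edge_share_of_top(degrees, top_fraction=0.01):
    """Share of edge endpoints that touch the highest-degree vertices."""
    ranked = sorted(degrees, reverse=True)
    top_count = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:top_count]) / sum(ranked)


# Synthetic skewed sequence: the k-th most connected vertex has degree ~ 10000 / k.
skewed = [max(1, 10000 // k) for k in range(1, 10001)]
# Idealized graph for comparison: every vertex has the same small degree.
uniform = [10] * 10000

skewed_share = edge_share_of_top(skewed)
uniform_share = edge_share_of_top(uniform)
```

In the uniform case the top 1% of vertices account for 1% of edge endpoints; in the skewed case they account for more than half.
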
High Degree Vertices are Common

• Popular Movies
• “Social” People
• Hyper Parameters
• Common Words

[Figure labels: Netflix (Users, Movies); LDA (Docs, Words; α, θ, B, Z, w); Obama]

Power-Law Graphs are Difficult to Partition

[Figure: graph split between CPU 1 and CPU 2]

• Power-Law graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]
• Traditional graph-partitioning algorithms perform poorly on Power-Law Graphs. [Abou-Rjeili et al. 06]

GraphLab 2 Solution

Program For This → Run on This (Machine 1, Machine 2)

• Split High-Degree vertices
• New Abstraction → Leads to this Split Vertex Strategy

GAS Decomposition

Gather (Reduce): Accumulate information about neighborhood
    Parallel “Sum”: Y + Y + … + Y → Σ
Apply: Apply the accumulated value to center vertex
    Y, Σ → Y’
Scatter: Update adjacent edges and vertices.
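The three phases can be written as three small functions. This is a single-machine sketch using PageRank as the example; the function names and signatures (`gather`, `apply`, `scatter`, `run_gas`) are assumptions for illustration and not the GraphLab 2 API. The point is that gather is a plain sum, so partial sums for a split vertex could be computed separately and then combined.

```python
def gather(rank_j, w_ji):
    """Gather: contribution of one in-neighbor j to the center vertex."""
    return w_ji * rank_j


def apply(total, alpha):
    """Apply: turn the accumulated sum into the new value of the center vertex."""
    return alpha + (1.0 - alpha) * total


def scatter(old_rank, new_rank, tol):
    """Scatter: decide whether the out-neighbors need to be activated."""
    return abs(new_rank - old_rank) > tol


def run_gas(in_edges, out_neighbors, alpha=0.15, tol=1e-8, max_sweeps=1000):
    rank = {v: 1.0 for v in in_edges}
    active = set(in_edges)
    for _ in range(max_sweeps):
        if not active:
            break
        # Gather phase: a commutative, associative sum over in-edges.
        totals = {i: sum(gather(rank[j], w) for j, w in in_edges[i]) for i in active}
        next_active = set()
        for i, total in totals.items():
            new_rank = apply(total, alpha)
            if scatter(rank[i], new_rank, tol):
                next_active.update(out_neighbors[i])
            rank[i] = new_rank
        active = next_active
    return rank


# Toy graph (hypothetical): a -> b, a -> c, b -> c, c -> a.
in_edges = {"a": [("c", 1.0)], "b": [("a", 0.5)], "c": [("a", 0.5), ("b", 1.0)]}
out_neighbors = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
gas_rank = run_gas(in_edges, out_neighbors)
```
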
  
GraphChi: Going small with GraphLab

Solve huge problems on small or embedded devices?

Key: Exploit non-volatile memory (starting with SSDs and HDs)

GraphChi – disk-based GraphLab

Challenge:
    Random Accesses

Novel GraphChi solution:
    Parallel sliding windows method → minimizes number of random accesses

GraphChi – disk-based GraphLab

• Novel Parallel Sliding Windows algorithm
• Fast!
• Solves tasks as large as current distributed systems
• Minimizes non-sequential disk accesses
• Efficient on both SSD and hard-drive
• Parallel, asynchronous execution

Sample Results

Triangle Counting, Twitter graph (1.5B edges): GraphChi - 1 Mac Mini vs. Hadoop - 1600 nodes [1]
[Figure: bar chart, axis in minutes, 0 to 500]

Belief Propagation, Altavista Graph (6.7B edges): GraphChi - 1 Mac Mini vs. Hadoop - 100 machines [2]
[Figure: bar chart, axis in minutes, 0 to 30]

[1] S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. WWW’ 2011
[2] U. Kang, D. H. Chau, and C. Faloutsos. Inference of Beliefs on Billion-Scale Graphs. KDD-LDMTA’10, pages 1–7, June 2010.

Triangle Counting on Twitter Graph

40M Users, 1.2B Edges
Total: 34.8 Billion Triangles

Hadoop       1636 Machines              423 Minutes
GraphChi     1 Mac Mini                 59 Minutes
GraphLab2    64 Machines, 1024 Cores    1.5 Minutes

Hadoop results from [Suri & Vassilvitskii WWW ‘11]
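For reference, triangle counting itself is a short computation on a small graph. The sketch below is a plain single-machine version and is not the implementation benchmarked above; the function name and the test graphs are illustrative assumptions. For every edge it counts the neighbors the two endpoints have in common.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    neighbors = {}
    unique_edges = set()
    for u, v in edges:
        if u == v:
            continue                      # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
        unique_edges.add((min(u, v), max(u, v)))
    # Each common neighbor of an edge's endpoints closes one triangle;
    # every triangle is found once per edge, hence the division by 3.
    closed = sum(len(neighbors[u] & neighbors[v]) for u, v in unique_edges)
    return closed // 3


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]   # complete graph on 4 vertices
path = [(0, 1), (1, 2), (2, 3)]                         # no triangles
```

The cost per edge grows with the degrees of its endpoints, which is why high-degree vertices dominate the runtime on graphs like Twitter.
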
  
Efficient Multicore Collaborative Filtering

LeBuSiShu team – 5th place in track1, ACM KDD CUP 2011

Yao Wu, Qiang Yan, Qing Yang
Institute of Automation, Chinese Academy of Sciences

Danny Bickson, Yucheng Low
Machine Learning Dept, Carnegie Mellon University

ACM KDD CUP Workshop 2011

Netflix Collaborative Filtering

• Alternating Least Squares Matrix Factorization

Model: 0.5 million nodes, 99 million edges

[Figure: Runtime(s), log scale from 10^1 to 10^4, against #Nodes (4 to 64) for MPI, Hadoop, and GraphLab]

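Alternating least squares can be illustrated with one factor per user and one per movie. This is a toy sketch under assumptions: it uses a rank-1 model, pure Python, and made-up ratings, whereas the benchmark above concerns a much larger factorization. The name `als_rank1` and the `reg` parameter are hypothetical. With one side fixed, each factor on the other side has a closed-form least-squares solution that depends only on its own ratings, which is what makes the updates local to a vertex of the user-movie graph.

```python
def als_rank1(ratings, num_sweeps=50, reg=0.0):
    """Fit rating(u, m) ~ x[u] * y[m] by alternating least squares.

    ratings maps (user, movie) -> observed rating; reg is an optional
    L2 penalty (left at 0 in this toy example).
    """
    x = {u: 1.0 for u, _ in ratings}
    y = {m: 1.0 for _, m in ratings}
    for _ in range(num_sweeps):
        # Movie factors fixed: each user has a closed-form least-squares solution.
        for u in x:
            seen = [(y[m], r) for (uu, m), r in ratings.items() if uu == u]
            x[u] = sum(f * r for f, r in seen) / (reg + sum(f * f for f, _ in seen))
        # User factors fixed: solve for each movie the same way.
        for m in y:
            seen = [(x[u], r) for (u, mm), r in ratings.items() if mm == m]
            y[m] = sum(f * r for f, r in seen) / (reg + sum(f * f for f, _ in seen))
    return x, y


# Made-up ratings; the pair (u2, m2) is unobserved.
observed = {("u1", "m1"): 3.0, ("u1", "m2"): 4.0, ("u2", "m1"): 6.0}
x, y = als_rank1(observed)
prediction = x["u2"] * y["m2"]            # estimate for the unobserved rating
```
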
Intel Labs Report on GraphLab

Data source: Nezih Yigitbasi, Intel Labs

ACM KDD CUP 2012

GraphLab team @ WSDM 13

Future Plans

Learn: GraphLab Notebook
Prototype: pip install graphlab → local prototyping
Production: Same code scales - execute on EC2 cluster

GraphLab Internship Plan

GraphLab Conferences

2012 → 2013

Rise of the Machines: Known As Drones...Rick Flair
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 

Kürzlich hochgeladen (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 

Machine Learning in the Cloud with GraphLab

  • 1. Machine Learning in the Cloud with GraphLab. Danny Bickson. Applied machine learning day, January 20, 2014, MS
  • 2. Needless to Say, We Need Machine Learning for Big Data: 6 billion Flickr photos; 28 million Wikipedia pages; 1 billion Facebook users; 72 hours a minute uploaded to YouTube. “… data a new class of economic asset, like currency or gold.”
  • 3. Big Learning: How will we design and implement parallel learning systems?
  • 4. A Shift Towards Parallelism: GPUs, multicore, clusters, clouds, supercomputers. ML experts (graduate students) repeatedly solve the same parallel design challenges: race conditions, distributed state, communication… The resulting code is difficult to maintain, extend, debug… Avoid these problems by using high-level abstractions.
  • 5. MapReduce for Data-Parallel ML. Excellent for large data-parallel tasks! Data-parallel (MapReduce): feature extraction, cross validation, computing sufficient statistics. Graph-parallel (is there more to machine learning?): graphical models (Gibbs sampling, belief propagation, variational opt.); collaborative filtering (tensor factorization); semi-supervised learning (label propagation, CoEM); graph analysis (PageRank, triangle counting).
  • 6. The Power of Dependencies: where the value is! Carnegie Mellon University
  • 7. Label a Face and Propagate
  • 8. Pairwise similarity not enough… Not similar enough to be sure.
  • 9. Propagate Similarities & Co-occurrences for Accurate Predictions: similarity edges, co-occurring faces, further evidence.
  • 10. Collaborative Filtering: Independent Case. Lord of the Rings, Star Wars IV, Star Wars I, Harry Potter, Pirates of the Caribbean.
  • 11. Collaborative Filtering: Exploiting Dependencies. Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita. What do I recommend???
  • 12. Machine Learning Pipeline: Data (images, docs, movie ratings) → Extract Features (faces, important words, side info) → Graph Formation (similar faces, shared words, rated movies) → Structured Machine Learning Algorithm (belief propagation, LDA, collaborative filtering) → Value from Data (face labels, doc topics, movie recommend.)
  • 13. Parallelizing Machine Learning: Data → Extract Features → Graph Formation (graph ingress: mostly data-parallel) → Structured Machine Learning Algorithm (graph-structured computation: graph-parallel) → Value from Data
  • 14. ML Tasks Beyond Data-Parallelism. Data-parallel (MapReduce): feature extraction, cross validation, computing sufficient statistics. Graph-parallel: graphical models (Gibbs sampling, belief propagation, variational opt.); collaborative filtering (tensor factorization); semi-supervised learning (label propagation, CoEM); graph analysis (PageRank, triangle counting).
  • 15. Example of a Graph-Parallel Algorithm
  • 16. PageRank. What's the rank of this user? It depends on the rank of who follows her, which depends on the rank of who follows them… Loops in graph → must iterate!
  • 17. PageRank Iteration. Iterate until convergence: “My rank is weighted average of my friends' ranks”: $R[i] = \alpha + (1 - \alpha) \sum_{(j,i) \in E} w_{ji} R[j]$, where $\alpha$ is the random reset probability and $w_{ji}$ is the probability of transitioning (similarity) from j to i.
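The iteration on slide 17 can be sketched in a few lines of Python. This is a minimal synchronous version under stated assumptions, not GraphLab code: the function name `pagerank`, the `in_edges` layout, the tolerance, and the toy graph are all illustrative choices that do not come from the slides.

```python
def pagerank(in_edges, alpha=0.15, tol=1e-9, max_iters=1000):
    """Iterate R[i] = alpha + (1 - alpha) * sum_j w_ji * R[j] until convergence.

    in_edges maps each vertex i to a list of (j, w_ji) pairs, one per edge (j, i).
    """
    ranks = {i: 1.0 for i in in_edges}
    for _ in range(max_iters):
        # Recompute every rank from the previous iteration's values.
        new_ranks = {
            i: alpha + (1 - alpha) * sum(w * ranks[j] for j, w in nbrs)
            for i, nbrs in in_edges.items()
        }
        delta = max(abs(new_ranks[i] - ranks[i]) for i in ranks)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks


# Hypothetical toy graph: vertex 0 links to 1 and 2 (weight 0.5 each),
# and vertices 1 and 2 each link back to 0 (weight 1.0).
in_edges = {
    0: [(1, 1.0), (2, 1.0)],
    1: [(0, 0.5)],
    2: [(0, 0.5)],
}
ranks = pagerank(in_edges)
```

Vertex 0 receives rank from both other vertices, so it ends up with the highest rank. The loop in the graph is what forces the repeated sweeps.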
  • 18. Properties of Graph-Parallel Algorithms: dependency graph; local updates (my rank, friends' rank); iterative computation.
  • 19. Addressing Graph-Parallel ML. Data-parallel (MapReduce): feature extraction, cross validation, computing sufficient statistics. Graph-parallel (MapReduce? Graph-parallel abstraction): graphical models (Gibbs sampling, belief propagation, variational opt.); collaborative filtering (tensor factorization); semi-supervised learning (label propagation, CoEM); data mining (PageRank, triangle counting).
  • 21. Data Graph: data associated with vertices and edges. Graph: social network. Vertex data: user profile text, current interests estimates. Edge data: similarity weights.
  • 22. How do we program graph computation? “Think like a Vertex.” (Malewicz et al. [SIGMOD'10])
  • 23. Update Functions. User-defined program: applied to a vertex, it transforms the data in the scope of that vertex. pagerank(i, scope) { // Get neighborhood data: (R[i], w_ji, R[j]) ← scope; // Update the vertex data: $R[i] \leftarrow \alpha + (1 - \alpha) \sum_{j \in N[i]} w_{ji} \times R[j]$; // Reschedule neighbors if needed: if R[i] changes then reschedule_neighbors_of(i); } The update function is applied (asynchronously) in parallel until convergence. Many schedulers are available to prioritize computation. Dynamic computation.
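The update-function pattern on slide 23 can be imitated on a single thread. The sketch below is an assumption-laden illustration, not the GraphLab API: it uses a plain FIFO queue as the scheduler, and the names `run_dynamic_pagerank`, `in_edges`, `out_nbrs`, and the threshold `tol` are hypothetical.

```python
from collections import deque


def run_dynamic_pagerank(in_edges, out_nbrs, alpha=0.15, tol=1e-6):
    """Apply the PageRank update function vertex by vertex, driven by a scheduler.

    in_edges[i] is a list of (j, w_ji) pairs; out_nbrs[i] lists the vertices
    whose rank depends on R[i]. Returns the ranks and the number of updates run.
    """
    ranks = {v: 1.0 for v in in_edges}
    queue = deque(in_edges)   # FIFO scheduler: every vertex starts scheduled
    queued = set(in_edges)    # avoids scheduling the same vertex twice
    updates = 0
    while queue:
        i = queue.popleft()
        queued.discard(i)
        # Update the vertex data from its scope (in-neighbor ranks and weights).
        new_rank = alpha + (1 - alpha) * sum(w * ranks[j] for j, w in in_edges[i])
        changed = abs(new_rank - ranks[i]) > tol
        ranks[i] = new_rank
        updates += 1
        # Reschedule neighbors only if R[i] changed noticeably.
        if changed:
            for k in out_nbrs[i]:
                if k not in queued:
                    queue.append(k)
                    queued.add(k)
    return ranks, updates


# Hypothetical toy graph: 0 -> 1, 0 -> 2 (weight 0.5 each); 1 -> 0, 2 -> 0 (weight 1.0).
in_edges = {0: [(1, 1.0), (2, 1.0)], 1: [(0, 0.5)], 2: [(0, 0.5)]}
out_nbrs = {0: [1, 2], 1: [0], 2: [0]}
ranks, updates = run_dynamic_pagerank(in_edges, out_nbrs)
```

Unlike a full sweep over all vertices, work here stops as soon as no vertex changes by more than `tol`, which is the sense in which the computation is dynamic. Swapping the deque for a priority queue would give a prioritized scheduler.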
  • 24. The GraphLab Framework: graph-based data representation; update functions (user computation); scheduler; consistency model.
  • 25. Alternating Least Squares, CoEM, Lasso, SVD, Belief Propagation, LDA, Splash Sampler, Bayesian Tensor Factorization, PageRank, SVM, Gibbs Sampling, Dynamic Block Gibbs Sampling, K-Means, Linear Solvers, Matrix Factorization, …many others…
  • 26. Never Ending Learner Project (CoEM). Hadoop: 95 cores, 7.5 hrs. Distributed GraphLab: 32 EC2 machines, 80 secs. 0.3% of Hadoop time: 2 orders of magnitude faster → 2 orders of magnitude cheaper.
  • 27. Thus far… GraphLab 1 provided exciting scaling performance. But… we couldn't scale up to the Altavista Webgraph 2002: 1.4B vertices, 6.7B edges.
  • 28. Natural Graphs [Image from WikiCommons]
  • 29. Problem: Existing distributed graph computation systems perform poorly on natural graphs.
  • 30. Achilles Heel: Idealized Graph Assumption. Assumed: small degree → easy to partition. But natural graphs have many high-degree vertices (power-law degree distribution) → very hard to partition.
  • 31. Power-Law Degree Distribution. [Figure: log-log plot of number of vertices (count) against degree for the AltaVista WebGraph, 1.4B vertices, 6.6B edges.] High-degree vertices: 1% of vertices are adjacent to 50% of edges.
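A rough way to check a claim like the one on slide 31 for your own graph is to measure how much of the total degree the top 1% of vertices hold. The sketch below is only a proxy under stated assumptions: it counts edge endpoints rather than distinct edges, and the function name and the synthetic degree list are hypothetical, not data from the talk.

```python
def degree_share_of_top_vertices(degrees, top_fraction=0.01):
    """Fraction of all edge endpoints that belong to the highest-degree vertices."""
    ranked = sorted(degrees, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))  # at least one vertex
    return sum(ranked[:k]) / sum(ranked)


# Synthetic example: one hub with degree 100 and 99 vertices with degree 1.
degrees = [100] + [1] * 99
share = degree_share_of_top_vertices(degrees)
```

In this synthetic case the single hub (1% of the vertices) holds roughly half of all edge endpoints, the kind of skew that makes balanced partitioning hard.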
  • 32. High-Degree Vertices are Common. Netflix (users, movies): popular movies. Social networks: “social” people (Obama). LDA (docs, words): hyperparameters and common words.
  • 33. Power-Law Graphs are Difficult to Partition (CPU 1, CPU 2). Power-law graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]. Traditional graph-partitioning algorithms perform poorly on power-law graphs [Abou-Rjeili et al. 06].
  • 34. GraphLab 2 Solution. Program for this → run on this (Machine 1, Machine 2). Split high-degree vertices. New abstraction → leads to this split-vertex strategy.
  • 35. GAS Decomposition. Gather (Reduce): accumulate information about the neighborhood with a parallel “sum”, Σ ← Y + Y + … + Y. Apply: apply the accumulated value to the center vertex (Y → Y'). Scatter: update adjacent edges and vertices.
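The gather, apply, and scatter phases on slide 35 can be sketched for PageRank as follows. This is a single-process illustration under stated assumptions, not the GraphLab 2 API: the function names, the `edge_partitions` layout (one list of in-edges per machine holding a piece of a split vertex), and the threshold are hypothetical.

```python
from functools import reduce


def gather(rank_j, w_ji):
    """Gather: contribution of one in-edge (j, i)."""
    return w_ji * rank_j


def combine(a, b):
    """Commutative, associative "sum" used to merge gathered values."""
    return a + b


def apply_value(acc, alpha=0.15):
    """Apply: turn the accumulated value into the new vertex value."""
    return alpha + (1 - alpha) * acc


def scatter(old, new, tol=1e-6):
    """Scatter: decide whether adjacent vertices need to be signalled."""
    return abs(new - old) > tol


def gas_update(i, edge_partitions, ranks):
    """Run one gather-apply-scatter step for vertex i.

    edge_partitions is a list of in-edge lists, one per machine that holds
    part of the vertex; each machine produces a partial accumulator.
    """
    partials = [
        reduce(combine, (gather(ranks[j], w) for j, w in part), 0.0)
        for part in edge_partitions
    ]
    acc = reduce(combine, partials, 0.0)  # merge the per-machine partial sums
    new = apply_value(acc)
    signal_neighbors = scatter(ranks[i], new)
    ranks[i] = new
    return signal_neighbors


# Vertex 0 has in-edges from 1 and 2, stored on two different "machines".
ranks = {0: 1.0, 1: 1.0, 2: 1.0}
signal = gas_update(0, [[(1, 1.0)], [(2, 1.0)]], ranks)
```

Because `combine` is commutative and associative, each machine can gather over its own share of a high-degree vertex's edges and only the partial accumulators need to be exchanged, which is what makes splitting the vertex workable.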
  • 36. GraphChi: Going small with GraphLab. Solve huge problems on small or embedded devices? Key: exploit non-volatile memory (starting with SSDs and HDs).
  • 37. GraphChi: disk-based GraphLab. Challenge: random accesses. Novel GraphChi solution: parallel sliding windows method → minimizes number of random accesses.
  • 38. GraphChi: disk-based GraphLab. Novel Parallel Sliding Windows algorithm. Fast! Solves tasks as large as current distributed systems. Minimizes non-sequential disk accesses. Efficient on both SSD and hard drive. Parallel, asynchronous execution.
  • 39. Sample Results. Triangle counting, Twitter graph (1.5B edges): GraphChi on 1 Mac Mini vs. Hadoop on 1600 nodes [1] (bar chart, minutes). Belief propagation, Altavista graph (6.7B edges): GraphChi on 1 Mac Mini vs. Hadoop on 100 machines [2] (bar chart, minutes). [1] S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. WWW 2011. [2] U. Kang, D. H. Chau, and C. Faloutsos. Inference of Beliefs on Billion-Scale Graphs. KDD-LDMTA'10, pages 1–7, June 2010.
  • 40. Triangle Counting on Twitter Graph: 40M users, 1.2B edges; total: 34.8 billion triangles. Hadoop: 1636 machines, 423 minutes. GraphChi: 59 minutes, 1 Mac Mini! GraphLab 2: 64 machines, 1024 cores, 1.5 minutes. Hadoop results from [Suri & Vassilvitskii, WWW '11].
  • 41. Efficient Multicore Collaborative Filtering. LeBuSiShu team, 5th place in track 1, ACM KDD CUP 2011. Yao Wu, Qiang Yan, Qing Yang (Institute of Automation, Chinese Academy of Sciences); Danny Bickson, Yucheng Low (Machine Learning Dept, Carnegie Mellon University). ACM KDD CUP Workshop 2011.
  • 42. Netflix Collaborative Filtering. Alternating Least Squares matrix factorization. Model: 0.5 million nodes, 99 million edges. [Figure: runtime in seconds (log scale) against number of nodes (4 to 64) for MPI, Hadoop, and GraphLab.]
  • 43. Intel Labs Report on GraphLab. Data source: Nezih Yigitbasi, Intel Labs.
  • 44. ACM KDD CUP 2012
  • 45. GraphLab team @ WSDM 13
  • 47. Future Plans. Learn: GraphLab Notebook. Prototype: pip install graphlab → local prototyping. Production: same code scales, execute on EC2 cluster.
  • 49. GraphLab Conferences: 2012 → 2013