DeepWalk: Online Learning of
Social Representations
Bryan Perozzi, Rami Al-Rfou, Steven Skiena
Stony Brook University
ACM SIG-KDD
August 26, 2014
Outline
Bryan Perozzi DeepWalk: Online Learning of Social Representations
● Introduction: Graphs as Features
● Language Modeling
● DeepWalk
● Evaluation: Network Classification
● Conclusions & Future Work
Features From Graphs
● Anomaly Detection
● Attribute Prediction
● Clustering
● Link Prediction
● ...
[Figure: adjacency matrix with |V| columns, used as the input for the tasks above]
A first step in machine learning for graphs is to
extract graph features:
● node: degree
● pairs: # of common neighbors
● groups: cluster assignments
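The node and pair features listed above can be sketched in a few lines. This is a minimal illustration on a hypothetical toy graph; the adjacency-list layout and the function names `degree` and `common_neighbors` are assumptions made here, not part of the talk.

```python
from itertools import combinations

# Hypothetical undirected toy graph stored as an adjacency list of sets.
graph = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2},
    4: {2},
}

def degree(graph, v):
    """Node feature: number of neighbors of v."""
    return len(graph[v])

def common_neighbors(graph, u, v):
    """Pair feature: number of neighbors shared by u and v."""
    return len(graph[u] & graph[v])

# One feature per node, and one feature per unordered pair of nodes.
node_features = {v: degree(graph, v) for v in graph}
pair_features = {
    (u, v): common_neighbors(graph, u, v)
    for u, v in combinations(sorted(graph), 2)
}
```

Hand-crafted features like these need one function per idea, which is the motivation for learning a representation instead.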
What is a Graph Representation?
[Figure: the adjacency matrix with |V| columns is reduced to d << |V| latent dimensions, which then serve the same tasks: anomaly detection, attribute prediction, clustering, link prediction, ...]
We can also create features by transforming the
graph into a lower dimensional latent
representation.
DeepWalk
[Figure: DeepWalk maps the adjacency matrix with |V| columns to d << |V| latent dimensions for downstream tasks such as anomaly detection, attribute prediction, clustering, and link prediction]
DeepWalk learns a latent representation of
adjacency matrices using deep learning
techniques developed for language modeling.
Visual Example
On Zachary’s Karate Graph:
[Figure: input graph and output representation]
Advantages of DeepWalk
● Scalable: an online algorithm that does not
use the entire graph at once
● Walks as sentences metaphor
● Works great!
● Implementation available: bit.ly/deepwalk
Language Modeling
Learning a representation means learning a mapping function from word co-occurrence
[Rumelhart+, 2003]
We hope that the learned representations capture inherent structure
[Baroni et al, 2009]
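The raw statistic behind such a mapping is how often two words appear near each other. The sketch below only collects those counts; it does not learn a representation. The function name `cooccurrence_counts`, the `window` parameter, and the tiny corpus are illustrative assumptions.

```python
from collections import Counter

def cooccurrence_counts(sentences, window=1):
    """Count ordered (word, neighbor) pairs that fall within `window` positions."""
    counts = Counter()
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, sentence[j])] += 1
    return counts

# Hypothetical two-sentence corpus, already tokenized.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
counts = cooccurrence_counts(corpus, window=1)
```

Here "cat" and "dog" never co-occur with each other, yet they share the same neighbors, which is the kind of structure a learned representation is hoped to capture.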
World of Word Embeddings
This is a very active research topic in NLP.
• Importance sampling and hierarchical classification were proposed to speed up training.
[F. Morin and Y. Bengio, AISTATS 2005] [Y. Bengio and J.-S. Senécal, IEEE TNN 2008] [A. Mnih and G. Hinton, NIPS 2008]
• NLP applications based on learned representations.
[Collobert et al., NLP (Almost) from Scratch, JMLR 2011]
• Recurrent networks were proposed to learn sequential representations.
[Tomas Mikolov et al., ICASSP 2011]
• Composed representations learned through recursive networks were used for parsing, paraphrase detection, and sentiment analysis.
[R. Socher, C. Manning, A. Ng, EMNLP (2011, 2012, 2013), NIPS (2011, 2012), ACL (2012, 2013)]
• Vector spaces of representations were developed to simplify compositionality.
[T. Mikolov, G. Corrado, K. Chen and J. Dean, ICLR 2013, NIPS 2013]
Word Frequency in Natural Language
■ Word frequency in a natural language corpus follows a power law.
[Figure: co-occurrence matrix]
Connection: Power Laws
Vertex frequency in random walks on scale-free graphs also follows a power law.
Vertex Frequency in SFG
■ Short truncated random walks are sentences in an artificial language!
■ Random walk distances are known to be good features for many problems.
[Figure: scale-free graph]
The Cool Idea
Short random walks =
sentences
Deep Learning for Networks
[Figure: pipeline. Input: Graph → Random Walks → Representation Mapping → Hierarchical Softmax → Output: Representation]
Random Walks
■ We generate random walks for each vertex in the graph.
■ Each short random walk has a fixed length.
■ Pick the next step uniformly from the vertex neighbors.
■ Example: [Figure: a sample random walk on the graph]
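The walk generation described on this slide can be sketched as follows. This is a minimal illustration, not the released implementation: the names `random_walk`, `build_corpus`, and `walks_per_vertex` are hypothetical, `length` is assumed to count vertices, and shuffling the vertex order on each pass is an assumption of this sketch.

```python
import random

def random_walk(graph, start, length, rng):
    """Truncated random walk of `length` vertices; each step is uniform over neighbors."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk

def build_corpus(graph, walks_per_vertex, length, seed=0):
    """Start `walks_per_vertex` walks from every vertex (assumed scheme)."""
    rng = random.Random(seed)
    vertices = list(graph)
    corpus = []
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)  # visit start vertices in a fresh random order
        for v in vertices:
            corpus.append(random_walk(graph, v, length, rng))
    return corpus

# Hypothetical toy graph: adjacency lists of an undirected graph.
graph = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2], 4: [2, 5], 5: [4]}
walks = build_corpus(graph, walks_per_vertex=2, length=4)
```

Each walk only touches the neighbors of the vertices it visits, so a corpus can be produced without holding more than local graph structure in view at any step.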
Representation Mapping
■ Map the vertex under focus to its representation.
■ Define a window of a given size around it.
■ Maximize the probability of the vertices inside the window, given the representation of the focus vertex.
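The symbols on this slide were lost, so the following uses the usual skip-gram notation as an assumption: Φ(v_i) is the representation of the focus vertex v_i and w is the window size, with the objective of maximizing Pr(v_j | Φ(v_i)) for every v_j within w positions of v_i in a walk. The sketch only enumerates the (focus, context) training pairs; `window_pairs` is a hypothetical helper name.

```python
def window_pairs(walk, w):
    """Yield (focus, context) pairs for every vertex within distance w of the focus."""
    for i, focus in enumerate(walk):
        lo = max(0, i - w)
        hi = min(len(walk), i + w + 1)
        for j in range(lo, hi):
            if j != i:
                yield focus, walk[j]

# Hypothetical walk over vertex ids; with w = 1 each focus sees its direct
# predecessor and successor in the walk.
walk = [4, 2, 1, 3]
pairs = list(window_pairs(walk, w=1))
```

For the focus vertex 2 in this walk, the pairs are (2, 4) and (2, 1), so the model would be trained to assign high probability to vertices 4 and 1 given the representation of vertex 2.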
Hierarchical Softmax
Calculating the probability of a context vertex involves O(V) operations for each update! Instead:
● Consider the graph vertices as leaves of a balanced binary tree.
● Maximizing the probability of a vertex is equivalent to maximizing the probability of the path from the root to its leaf. Specifically, maximizing the product of the decision probabilities at the tree nodes C1, C2, C3 on that path.
Each of {C1, C2, C3} is a logistic binary classifier.
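A minimal sketch of the path probability, under assumptions that are not from the slides: the tree is complete and stored heap-style (root id 0, children of node n at 2n+1 and 2n+2), a leaf is addressed by its index, and each internal node holds one weight vector. With depth 3 there are three classifiers on every path, matching C1, C2, C3, and a balanced tree keeps the cost per update at about log2 of the number of vertices instead of O(V).

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def leaf_path(leaf_index, depth):
    """Root-to-leaf path as (internal_node_id, go_right) pairs in a heap-style tree."""
    path = []
    node = 0
    for level in reversed(range(depth)):
        go_right = (leaf_index >> level) & 1  # bits of the leaf index, most significant first
        path.append((node, go_right))
        node = 2 * node + 1 + go_right
    return path

def path_probability(phi, leaf_index, depth, classifiers):
    """Product of the logistic decisions taken along the path to the leaf."""
    prob = 1.0
    for node, go_right in leaf_path(leaf_index, depth):
        p_right = sigmoid(dot(classifiers[node], phi))
        prob *= p_right if go_right else 1.0 - p_right
    return prob

# Hypothetical setup: 8 leaves (depth 3), 7 internal classifiers, 4-dimensional vectors.
rng = random.Random(0)
dim, depth = 4, 3
classifiers = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2 ** depth - 1)]
phi = [rng.uniform(-1, 1) for _ in range(dim)]
total = sum(path_probability(phi, leaf, depth, classifiers) for leaf in range(2 ** depth))
```

Because the two branches at every internal node have probabilities that sum to one, the leaf probabilities form a valid distribution without ever computing a normalization over all vertices.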
Learning
■ Learned parameters:
  ❑ Vertex representations
  ❑ Tree binary classifier weights
■ Randomly initialize the representations.
■ For each classifier in {C1, C2, C3}, calculate the loss function.
■ Use Stochastic Gradient Descent to update both the classifier weights and the vertex representation simultaneously.
[Mikolov+, 2013]
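The simultaneous update can be illustrated for a single tree classifier. This is a sketch under assumptions: the loss is taken to be the standard logistic (cross-entropy) loss, `target` is 1 when the path goes right at this node, and the learning rate and vectors are made-up values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_loss(weights, phi, target):
    """Negative log-likelihood of one binary decision."""
    p = sigmoid(sum(w * x for w, x in zip(weights, phi)))
    return -math.log(p if target == 1 else 1.0 - p)

def sgd_step(weights, phi, target, lr):
    """One SGD step that updates the classifier weights and the vertex representation."""
    p = sigmoid(sum(w * x for w, x in zip(weights, phi)))
    g = p - target  # derivative of the loss with respect to the score
    new_weights = [w - lr * g * x for w, x in zip(weights, phi)]
    new_phi = [x - lr * g * w for w, x in zip(weights, phi)]  # gradient uses the old weights
    return new_weights, new_phi

# Hypothetical starting values for one classifier and one vertex representation.
weights = [0.1, -0.2, 0.3]
phi = [0.5, 0.4, -0.1]
before = logistic_loss(weights, phi, target=1)
for _ in range(20):
    weights, phi = sgd_step(weights, phi, target=1, lr=0.5)
after = logistic_loss(weights, phi, target=1)
```

In a full training loop this step would run once per classifier on the path, for every (focus, context) pair drawn from the walks.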
Attribute Prediction
The Semi-Supervised Network Classification problem:
[Figure: example graph with six nodes, some labelled "Stony Brook students" or "Googlers"]
INPUT
A partially labelled graph with node attributes.
OUTPUT
Attributes for nodes which do not have them.
Baselines
■ Approximate Inference Techniques:
  ❑ weighted vote Relational Neighbor (wvRN) [Macskassy+, ‘03]
■ Latent Dimensions
  ❑ Spectral Methods
    ■ SpectralClustering [Tang+, ‘11]
    ■ MaxModularity [Tang+, ‘09]
  ❑ k-means
    ■ EdgeCluster [Tang+, ‘09]
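To make the first baseline concrete, here is a rough sketch of the voting idea behind wvRN: an unlabelled node takes the weighted share of each label among its neighbors. This is a simplified single pass that only counts neighbors with known labels; it leaves out the iterative collective inference a full relational-neighbor classifier would run. The graph, edge weights, and labels are invented for illustration.

```python
from collections import defaultdict

def wvrn_predict(graph, labels, node):
    """Weighted vote over the labelled neighbors of `node`.

    graph[node] maps each neighbor to an edge weight; labels maps
    labelled nodes to their class. Returns class -> estimated probability.
    """
    votes = defaultdict(float)
    for neighbor, weight in graph[node].items():
        if neighbor in labels:
            votes[labels[neighbor]] += weight
    total = sum(votes.values())
    if total == 0:
        return {}  # no labelled neighbors: nothing to vote with
    return {label: v / total for label, v in votes.items()}

# Hypothetical six-node graph with unit edge weights; nodes 3 and 4 are unlabelled.
graph = {
    1: {2: 1.0, 3: 1.0},
    2: {1: 1.0, 3: 1.0},
    3: {1: 1.0, 2: 1.0, 4: 1.0, 5: 1.0},
    4: {3: 1.0, 5: 1.0, 6: 1.0},
    5: {3: 1.0, 4: 1.0, 6: 1.0},
    6: {4: 1.0, 5: 1.0},
}
labels = {1: "Stony Brook", 2: "Stony Brook", 5: "Google", 6: "Google"}
prediction_3 = wvrn_predict(graph, labels, 3)
prediction_4 = wvrn_predict(graph, labels, 4)
```

A method like this depends on labelled neighbors being available, which is why sparse labels are the hard case for it.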
Results: BlogCatalog
DeepWalk performs well, especially when labels are
sparse.
Results: Flickr
Results: YouTube
Spectral Methods do not scale to large graphs.
Parallelization
● Parallelization doesn’t affect representation quality.
● The sparser the graph, the easier it is to achieve linear
scalability. (Feng+, NIPS ‘11)
Variants / Future Work
■ Streaming
❑ No need to ever store the entire graph
❑ Can build & update the representation as new data comes in
■ “Non-Random” Walks
❑ Many graphs occur as a by-product of interactions
❑ One could use outside processes (users, etc.) to feed the modeling phase
❑ [This is what language modeling is doing]
Take-away
Language Modeling techniques can be
used for online learning of network
representations.
Thanks!
Bryan Perozzi
@phanein
bperozzi@cs.stonybrook.edu
DeepWalk available at:
http://bit.ly/deepwalk
