Community analysis using graph representation learning on social networks

In an increasingly connected world, new and complex interaction
patterns can be extracted from the communication between people.
This is extremely valuable for brands that can better understand
the interests of users and the trends on social media to better target
their products. In this paper, we aim to analyze the communities
that arise around commercial brands on social networks to understand
the meaning of similarity, collaboration, and interaction
among users. We exploit the network that builds around the brands
by encoding it into a graph model. We build a social network graph,
considering user nodes and friendship relations; then we compare
it with a heterogeneous graph model, where posts and hashtags
are also considered as nodes and connected to the different node
types; finally, we build a reduced network, generated by inducing
direct user-to-user connections through the intermediate
nodes (posts and hashtags). These different variants are encoded
using graph representation learning, which generates a numerical
vector for each node. Machine learning techniques are applied to
these vectors to extract valuable insights for each user and for the
communities they belong to. In the paper, we report on our experiments
performed on an emerging fashion brand on Instagram, and
we show that our approach is able to discriminate potential customers
for the brand, and to highlight meaningful sub-communities
composed of users that share the same kind of content on social
networks.
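
The reduced network mentioned above can be illustrated with a minimal Python sketch that induces weighted user-to-user edges from shared hashtags. The function name `reduce_by_shared_items` and the toy data are hypothetical, and the sketch only illustrates the idea under these assumptions; it is not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def reduce_by_shared_items(user_items):
    """Induce weighted user-to-user edges from shared intermediate nodes.

    user_items maps each user to the set of hashtags (or mentioned users)
    reachable through that user's posts. The weight of edge (i, j) is the
    number of items the two users have in common; pairs that share nothing
    get no edge.
    """
    # Invert the mapping: item -> users that used it.
    users_by_item = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            users_by_item[item].add(user)

    # Each shared item adds 1 to the weight of every pair of its users.
    weights = defaultdict(int)
    for users in users_by_item.values():
        for i, j in combinations(sorted(users), 2):
            weights[(i, j)] += 1
    return dict(weights)


# Hypothetical toy data: users and the hashtags found in their posts.
toy = {
    "alice": {"#fashion", "#summer", "#italy"},
    "bob": {"#fashion", "#italy"},
    "carol": {"#food"},
}
reduced = reduce_by_shared_items(toy)
```

Here `alice` and `bob` share two hashtags and are connected with weight 2, while `carol` shares none and stays isolated. Iterating over items rather than over all user pairs avoids comparing users that have nothing in common.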

Published in: Data & Analytics

  1. Community Analysis Using Graph Representation Learning on Social Networks. Marco Brambilla and Mattia Gasparini, Politecnico di Milano
  2. Introduction
     • The growth of platforms such as Instagram and Facebook has increased the level of interaction among people
     • A variety of social network data is exploited to map user behavior
     • Graphs are a natural fit for modeling the interactions among these users
  3. Problem Statement
     • Analysis of communities on online social networks by applying machine learning to graphs
     • Representation learning is used to extract valuable information about users inside the community
     • Classification of consumer and business users
     • Grouping of similar users
  4. Representation Learning
     • Define a continuous representation (embedding) for each node of the graph, so that machine learning techniques can be applied to graphs easily
     • The embedding of a node $u$ is based on its neighbourhood nodes
  5. Node2vec
     • Embeddings are computed using the node2vec algorithm [1], included in the Stanford Network Analysis Platform (SNAP) library
     • The algorithm computes the embeddings by solving an optimization problem: $\max_f \sum_{u \in V} \log \Pr(N_S(u) \mid f(u))$
     [1] Grover and Leskovec. 2016. node2vec: Scalable Feature Learning for Networks.
  6. Node2vec
     [Figure: input graph and output embeddings] The node2vec algorithm computes embeddings such that the similarities between graph nodes are preserved between the corresponding vectors.
  7. Case Study
     • Emerging Italian fashion brand: Emporio Le Sirenuse
     • Products: luxury swimsuits and dresses
     • The case study focuses on the brand, its competitors, and their communities, defined as the sets of their followers on the social network
     http://www.fashiondatasensing.polimi.it/
  8. Related Work
     • User communities are defined using the structural properties of the graph [himelboim2017, deeb2017, guerrero2017]
     • Brand-related communities have a specific role, with business strategies as the final target [ramadan2018, kim2014, campbell2014]
     • Fashion brands gain major advantages from social media [brambilla2017, schmidt2017]
  9. Analysis Pipeline
     The proposed solution defines a method that handles all the steps of the analysis.
  10. 1 – Data Collection
     • Web scraping of data on 10 brands and their followers from Instagram
     • Time window: from 1 January 2017 to 1 November 2017
     • Final database: 400K users, 10M posts
  11. 2 – Graph Construction
     • Graphs are built from several entities: the users we want to analyze ($U_t$), their posts ($P$), the hashtags referenced in the posts ($T$), and the mentioned users ($U_m$)
     • Correspondingly, three types of edges are defined:
       o $E_{owner} = \{(e_1, e_2) \mid e_1 \in U_t,\ e_2 \in P\}$
       o $E_{tag} = \{(e_1, e_2) \mid e_1 \in P,\ e_2 \in T\}$
       o $E_{mention} = \{(e_1, e_2) \mid e_1 \in P,\ e_2 \in U_m\}$
  12. 2 – Graph Construction
     • Three graph models are used for the analysis:
       1. Mixed network: $G_M = (\{U, P, T\}, \{E_{owner}, E_{tag}, E_{mention}\})$
       2. Hashtags network: $G_h = (\{U_t, P, T\}, \{E_{owner}, E_{tag}\})$
       3. Mentions network: $G_m = (\{U_t, U_m, P\}, \{E_{owner}, E_{mention}\})$
     • $G_h$ and $G_m$ are subgraphs of $G_M$: they map the influence of specific social media aspects
  13. Example Hashtags Network
     The central part of the graph features the most connected nodes, which correspond to the users that have many hashtags in common.
  14. 3 – Graph Reduction
     • A reduction process is applied to $G_h$ and $G_m$ to obtain «classical» social networks, where the nodes are the users and the edges are weighted by the number of shared entities:
       $w_{ij} = \begin{cases} |t_i \cap t_j|, & \text{if } i, j \in G_h \\ |m_i \cap m_j|, & \text{if } i, j \in G_m \end{cases}$
       where $i, j \in U_t$, $t_i, t_j \subseteq T$, and $m_i, m_j \subseteq U_m$
     • The reduced hashtags network and the reduced mentions network are generated from $G_h$ and $G_m$, respectively
  15. Reduced Graph Example
     Reduced mentions network (derived from $G_m$): edges are weighted by the number of common mentioned users.
  16. 4 – Features Extraction
     • Both the heterogeneous networks ($G_h$, $G_m$) and their reduced counterparts are used to extract the embeddings
     • The feature vector dimension is fixed for each type of network: $d = 8$ for the heterogeneous networks and $d = 4$ for the reduced networks
     • Hyper-parameter tuning for $p$ and $q$ in a supervised setting
  17. 5 – Classification
     • Domain-specific task: «Discriminate between consumer and non-consumer users»
     • Ground truth of 351 labelled users, defined with domain experts
     • Three feature sets are tested:
       o Social media account data (#followers, #following, #posts, bio)
       o Complete network embeddings
       o Reduced network embeddings
  18. 5 – Classification Experiment
     The description of a user is valuable if a good fraction of its neighbourhood is exploited, which is not always feasible for complete networks.
  19. 5 – Classification Experiment on Reduced Networks
     Performance and the number of classified users increase with the number of user nodes included in the model, even if those nodes are not classified themselves: they enrich the neighbourhood and, as a consequence, the feature vector.
  20. 6 – Clustering
     • The reduced hashtags network (derived from $G_h$) is used as a proxy for content-based similarity
     • K-means is applied to the extracted feature vectors
     • Focus on the reduced hashtags network of the Emporio Le Sirenuse community
  21. 6 – Clustering Network Input
     Reduced hashtags network of the Emporio Le Sirenuse community.
  22. 6 – Clustering Features
     Embeddings extracted from the network. The first two feature components are used for visualization.
  23. 6 – Clustering Output
     K selection: plot of inertia against the number of clusters.
  24. 6 – Output Network
     Application of the clustering output to the reduced network.
  25. 6 – Cluster Validation: Domain Experts
     • Domain experts are provided with a subset of users for each cluster
     • They manually inspect the user profiles and provide feedback about the patterns present in each cluster
  26. 6 – Cluster Validation: Experts Feedback
     • Clusters 0, 1 and 2 are very well defined: professional users, such as showrooms and other brands
     • Cluster 3 contains regular users that share content about holidays in Italy
     • Clusters 3, 4, 5 and 6 are also composed mostly of regular users
  27. 6 – Cluster Labels
     Cluster labels are extracted from the set of hashtags shared by at least two users inside the cluster.
  28. 6 – Final Result
     Cluster labels shown in the figure: FOOD LUXURY HIPSTER INTERNATIONAL INTERIOR DESIGN VINTAGE ITALIAN HOLIDAYS
  29. Conclusion
     • Results:
       o Definition of an effective method to analyze communities in the social network domain
       o Modeling of user similarities through network features
       o Detection of content-driven sub-communities
     • Future work:
       o Inclusion of the time variable
  30. Questions?
     Contacts:
     Marco Brambilla: marco.brambilla@polimi.it
     Mattia Gasparini: mattia.gasparini@polimi.it
     @marcobrambi @datascience_mi
     http://www.fashiondatasensing.polimi.it/
     http://datascience.deib.polimi.it
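
The role of the hyper-parameters p and q tuned in the feature extraction step can be sketched with a small Python example. In node2vec, the neighbourhoods N_S(u) are sampled with second-order random walks: p controls how likely the walk is to return to the previous node, and q controls how likely it is to move away from it. The function `biased_walk` and the toy graph are hypothetical and cover only the walk sampling on an unweighted graph; this is not the SNAP implementation, and the step that learns embeddings from the walks is omitted.

```python
import random


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Simulate one second-order random walk in the style of node2vec.

    adj maps each node to the set of its neighbours (undirected,
    unweighted graph). Unnormalized transition weights from the current
    node, given the previous node, are:
      1/p  to go back to the previous node,
      1    to go to a node that is also a neighbour of the previous node,
      1/q  to go to any other neighbour.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(adj[cur])
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)
            elif nxt in adj[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Hypothetical toy graph: a triangle a-b-c with a pendant node d on b.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
walk = biased_walk(adj, "a", 10, p=0.5, q=2.0, rng=random.Random(0))
```

With a small p and a large q, as in this call, the walk tends to stay close to its starting region; the opposite setting makes it explore further.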
