Knowledge Discovery in Databases Group




Link Prediction in Social Networks
                           Friday, 07 May 2009


                Svitlana Volkova
      Fulbright Master Student in Computer Science
     Computing and Information Sciences Department
                 Kansas State University

             234 Nichols Hall room 218, Manhattan, KS 66506-2302
           E-mail: svitlana[AT]k-state.edu or svitlana.volkova[AT]gmail.com
             Phones: mob. +1(785) 320 0113 | work +1(785) 532 7853
Agenda

 Introduction
 Related studies
 Methodology
   Mathematical representation for link prediction task
   Similarity measures
 Experiment
   Crawling Facebook
   Facebook Database
   Visualization Tools
 Conclusions
Why am I interested in social networks?


Visible Reasons
   Young
   Curious
   Like challenging tasks




Invisible Reasons
   Links from the social network reflect social behaviors of individuals




   Quantitative and Qualitative assessment of human relationships
Phenomenon of social networks

 Stanley Milgram is widely credited with laying the groundwork for
  modern social network theory.




       [A social network] is a map of individuals
       and the ways they are related to each other.
www.trustmesecurity.com/.../case/socialnetwork
Why is it difficult to predict links in social networks?




 Collective structure


 Highly dynamic


 Sparse
Supervised vs. unsupervised methods in link prediction task

 Unsupervised methods use various similarity measures
 Supervised methods extract structural features to learn a mapping function

 Single Relational Table
    Data representation is “propositional” = “feature vector” or “attribute value”
 Relational Data Mining
    Inductive Logic Programming (ILP)

Learning a binary classifier that will predict whether a
      link exists between a given pair of nodes
J48          OneR         IB1          Logistic      NaiveBayes
                               OR
AdaBoost                   Bagging                    RandForest
                               OR
Support Vector Machines (SVM)           Genetic Programming (GP)
                               OR
Bayesian networks(BN) and Probabilistic Relational Models (PRMs)
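
As a minimal sketch of the "binary classifier over node pairs" idea, the following trains a hand-written logistic regression on a toy graph. The graph, the single structural feature (common-neighbor count), and all function names are assumptions made for illustration; they are not taken from the slides, and a real experiment would use one of the inducers listed above with a richer feature vector.

```python
import math
from itertools import combinations


def build_adjacency(edges):
    """Return {node: set(neighbors)} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pair_features(adj, u, v):
    """Feature vector [bias, common-neighbor count] for the pair (u, v)."""
    return [1.0, float(len(adj[u] & adj[v]))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, features):
    """Estimated probability that a link exists for this feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient ascent on the mean log-likelihood."""
    weights = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(weights, xi)
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        weights = [w + lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


# Toy undirected friendship graph (hypothetical data).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("e", "g"), ("f", "g")]
adj = build_adjacency(edges)
edge_set = {frozenset(e) for e in edges}

# Every node pair is one training example: label 1 if the link exists.
pairs = list(combinations(sorted(adj), 2))
X = [pair_features(adj, u, v) for u, v in pairs]
y = [1.0 if frozenset(p) in edge_set else 0.0 for p in pairs]
weights = train_logistic(X, y)

# Rank the currently missing links by predicted probability.
candidates = [p for p in pairs if frozenset(p) not in edge_set]
ranked = sorted(candidates,
                key=lambda p: predict_proba(weights, pair_features(adj, *p)),
                reverse=True)
print(ranked[0])
```

With only one feature the classifier reduces to a ranking by common neighbors; the value of the supervised setup shows once several features are combined.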
Classification based on features of entities

 Dr. William Hsu considered the problems of predicting,
 classifying, and annotating friend relations in social networks by
 applying a feature construction approach




 Tim Weninger proposed a genetic programming-based symbolic
 regression approach to constructing relational
 features for the link analysis task in social networks

         Entity Attributes             Graph-Based Features
      (user/pair dependent)                (relational)

  Number of neighbors             Length of shortest path
  Interests                       Neighborhood overlap
  Topic model                     Relative importance
  Geographical location


  Interest popularity             Node’s Indegree/Outdegree
  Friends/Friend’s age            Forward/Backward deleted distance
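
A few of the graph-based features in the table can be sketched in plain Python. This is an illustrative sketch only: the directed "friend" graph, the function names, and the choice to define neighborhood overlap as the count of shared out-neighbors are assumptions, and the forward/backward deleted distance is not covered.

```python
from collections import deque


def shortest_path_length(adj, source, target):
    """Hop count of the shortest directed path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def in_out_degree(adj, node):
    """Return (indegree, outdegree) of a node in a directed graph."""
    outdeg = len(adj.get(node, ()))
    indeg = sum(1 for nbrs in adj.values() if node in nbrs)
    return indeg, outdeg


def neighborhood_overlap(adj, u, v):
    """Number of nodes that both u and v list as friends."""
    return len(set(adj.get(u, ())) & set(adj.get(v, ())))


# Hypothetical directed friend lists: user -> users they list as friends.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"cat"},
    "cat": {"dan"},
    "dan": set(),
}

print(shortest_path_length(friends, "ann", "dan"))  # 2 (ann -> cat -> dan)
print(in_out_degree(friends, "cat"))                # (2, 1)
print(neighborhood_overlap(friends, "ann", "bob"))  # 1 (cat)
```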
Related Investigations in link prediction area


  exploring relational structure, clustering
                                                  [Jensen 2003, Getoor 2001]


  using links to predict classes/attributes of entities
                                       [Getoor,Taskar, Koller, Provost, Jensen]


  predicting link types based on known entity classes
                                                         [Taskar, Koller 2003]
  predicting links based on location in high-dimensional space
                                                            [Hoff et al., 2003]
  ranking potential links using a single graph-based feature
                                                             [Kleinberg 2004]
Mining tasks in network-structured data
Node-related Tasks
• Node-ranking
• Node-classification
• Node-clustering

Structure-related Tasks
• Link prediction
• Structured pattern mining

The identity of all objects is known + some link structure is known => predict unobserved links

New objects arrive with information about some of their links + info about some attributes => predict links among new objects

Link Prediction Tasks
• Link Existence
• Link Type
• Link Weight
• Link Cardinality
Mathematical representation for unobserved link prediction task in social networks

[Figure: diagram with a "Time" axis]
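
One common way to state the task over time, assumed here rather than recovered from the slide, follows the Liben-Nowell and Kleinberg link prediction paper. All symbols below (the timestamp function, the interval endpoints, the score function) are notation chosen for this sketch.

```latex
% Social graph with timestamped edges
G = (V, E), \qquad t(e) \text{ is the time at which edge } e = (u, v) \in E \text{ appears}

% Subgraph restricted to a time window
G[t, t'] = \bigl(V,\ \{\, e \in E : t \le t(e) \le t' \,\}\bigr)

% Task: choose t_0 < t_0' < t_1 < t_1'; given the training graph G[t_0, t_0'],
% assign each unlinked pair a score and rank the pairs, predicting the edges
% that are absent from G[t_0, t_0'] but present in G[t_1, t_1']
\operatorname{score} : \{\, (u, v) \in V \times V : (u, v) \notin E[t_0, t_0'] \,\} \to \mathbb{R}
```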
Classification of measures for link prediction approaches

Link Prediction Approaches
 Node-wise Similarity based Approaches
    Similarity measure in binary classifiers
    Pairwise kernel matrices
    Statistical relational learning
 Topological Pattern based Approaches
    Node based patterns
    Path based patterns
    Graph based patterns
 Probabilistic Model based Approaches
    Probabilistic relational models
    Bayesian relational models
    Stochastic relational models
Node-wise Similarity based Approaches
Node-wise Similarity based Approaches (cont.)
Topological pattern based Approaches
Topological pattern based Approaches (cont.)
Comparison of similarity measures
                       Common      Jaccard’s    Adamic/Adar   Preferential   Katz
                       Neighbors   Similarity   Measure       Measure        Measure
Common Neighbors       1           0.92         0.94          0.31           0.61
Jaccard’s Similarity   0.92        1            0.97          0.53           0.75
Adamic/Adar Measure    0.94        0.97         1             0.49           0.70
Preferential Measure   0.31        0.53         0.49          1              0.84
Katz Measure           0.61        0.75         0.70          0.84           1
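
The five measures compared in the table can be sketched for an undirected graph as follows. This is an illustrative sketch, not the deck's implementation: the toy graph, the function names, the damping factor `beta`, and the truncation of the Katz sum at `max_len` steps are all assumptions. "Preferential Measure" is taken to mean preferential attachment (the product of the two degrees).

```python
import math


def common_neighbors(adj, x, y):
    """Number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])


def jaccard(adj, x, y):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def adamic_adar(adj, x, y):
    """Shared neighbors weighted by 1 / log(degree), favoring rare ones."""
    return sum(1.0 / math.log(len(adj[z]))
               for z in adj[x] & adj[y] if len(adj[z]) > 1)


def preferential_attachment(adj, x, y):
    """Product of the two node degrees."""
    return len(adj[x]) * len(adj[y])


def katz(adj, x, y, beta=0.05, max_len=3):
    """Truncated Katz score: sum of beta**l times the number of
    length-l walks from x to y, for l = 1..max_len."""
    walks = {x: 1}  # walks[n] = number of walks of the current length ending at n
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {}
        for node, count in walks.items():
            for nb in adj[node]:
                nxt[nb] = nxt.get(nb, 0) + count
        walks = nxt
        score += beta ** length * walks.get(y, 0)
    return score


# Hypothetical undirected graph: a and d are not linked but share b and c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

for measure in (common_neighbors, jaccard, adamic_adar,
                preferential_attachment, katz):
    print(measure.__name__, measure(graph, "a", "d"))
```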


http://www.cs.cornell.edu/home/kleinber/link-pred.pdf

[Chart: Correlation among different similarity measures. x-axis: Level of similarity, x (0 to 8); y-axis: 0 to 1.2. Series: Common Neighbors, Jaccard’s Similarity, Adamic/Adar Measure, Preferential Measure, Katz Measure.]
Trendline labels on the chart, in extraction order:
   y = 0.0736x + 0.5443, R² = 0.9102
   y = 0.1173x + 0.2586, R² = 0.6507
   y = -0.0604x + 1.0273, R² = 0.3531
   y = -0.073x + 1.0535, R² = 0.4092
   y = -0.1038x + 1.0881, R² = 0.4681
Crawling Facebook Social Network

A crawler is an automatic program that explores the WWW, following
links and searching for information or building a database.
It is used to build automated indexes for the Web, allowing users to run
keyword searches over Web documents.
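
The follow-the-links loop described above can be sketched as a breadth-first traversal. To stay self-contained, the sketch crawls an in-memory dictionary instead of the real Web; `fetch_links`, `FAKE_WEB`, and the page names are hypothetical stand-ins for an HTTP fetch plus link extraction, and nothing here reflects how the Facebook crawl in the experiment was actually built.

```python
from collections import deque


def crawl(seed, fetch_links, max_pages=100):
    """Breadth-first crawl from a seed page.

    Returns {page: [outgoing links]}, a tiny link database in visit order.
    """
    visited = {}
    queue = deque([seed])
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        if page in visited:
            continue  # already stored; avoids loops between pages
        links = list(fetch_links(page))
        visited[page] = links
        for link in links:
            if link not in visited:
                queue.append(link)
    return visited


# Hypothetical stand-in for the Web: page -> pages it links to.
FAKE_WEB = {
    "profile/alice": ["profile/bob", "profile/carol"],
    "profile/bob": ["profile/alice", "profile/dave"],
    "profile/carol": ["profile/alice"],
    "profile/dave": [],
}

database = crawl("profile/alice", lambda page: FAKE_WEB.get(page, []))
for page, links in database.items():
    print(page, "->", links)
```

The stored page-to-links map is already an edge list for the social graph, which is why crawling doubles as data collection for link prediction. A real crawler would also need rate limiting and would have to respect the site's terms of use.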




                                        www2007.org/posters/poster1057.pdf
http://www.flickr.com/photos/ikhnaton2/533233247/sizes/o/
Why are we more interested in Facebook?

 200 million users
 Doubling in size once every six months
 Growth of 100,000 users per day

Network measures: Betweenness, Bridge, Centrality, Centralization, Closeness,
Path Length, Clustering coefficient, Cohesion, Degree, (Individual-level) Density,
Flow betweenness centrality, Eigenvector centrality, Local Bridge, Prestige,
Radiality, Reach, Structural cohesion, Structural equivalence
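
Several of the measures named on this slide have short standard definitions. The sketch below computes degree, density, local clustering coefficient, and closeness for an undirected graph. The toy graph and function names are assumptions for illustration, and closeness is normalized over the nodes reachable from the starting node.

```python
from collections import deque
from itertools import combinations


def degree(adj, node):
    """Number of ties a node has."""
    return len(adj[node])


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def closeness(adj, node):
    """Reachable-node count divided by the sum of shortest-path distances."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nb in adj[cur]:
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical undirected graph with 4 nodes and 5 edges.
network = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

print(degree(network, "b"))
print(density(network))
print(clustering_coefficient(network, "b"))
print(closeness(network, "a"))
```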
Conclusions
Prediction task for previously unobserved links in social networks


 Concept of social network + social graph representation + mining tasks in network-structured data
 Related studies + existing approaches
    supervised vs. unsupervised methods
    single-table data representation as feature vector vs. relational data mining
    link prediction task as classification with a range of inducers: J48, OneR, IB1, Logistic,
     NaiveBayes, etc.
    other available approaches for resolving the given task, e.g. SVM, GP, BN, PRMs
 Mathematical representation + classification and description of similarity measures
 The experiment was planned around a crawling technique, applying the free open-source
   SQL full-text search engine Sphinx to a Facebook corpus
 Visualization tools for social network graph representation


                     http://www.youtube.com/watch?v=neAAzVquaRU
Thank you for your attention!

Social Networks

  • 1. Knowledge Discovery in Databases Group Link Prediction in Social Networks Friday, 07 May 2009 Svitlana Volkova Fulbright Master Student in Computer Science Computing and Information Sciences Department Kansas State University 234 Nichols Hall room 218, Manhattan, KS 66506-2302 E-mail: svitlana[AT]k-state.edu or svitlana.volkova[AT]gmail.com Phones: mob. +1(785) 320 0113 | work +1(785) 532 7853
  • 2. Agenda: Introduction; Related studies; Methodology (mathematical representation for the link prediction task, similarity measures); Experiment (crawling Facebook, Facebook database, visualization tools); Conclusions
  • 3. Why am I interested in social networks? Visible reasons: young; curious; like challenging tasks. Invisible reasons: links from the social network reflect social behaviors of individuals; quantitative and qualitative assessment of human relationships.
  • 4. Phenomenon of social networks: The person who built modern social network theory was Stanley Milgram. A [social network] is a map of individuals and the ways in which they are related to each other.
  • 6. Why is it difficult to predict links in social networks? Collective structure; highly dynamic; sparse.
  • 7. Supervised vs. unsupervised methods in the link prediction task
      Unsupervised methods use various similarity measures.
      Supervised methods extract structural features to learn a mapping function.
      Single relational table: data representation is “propositional” = “feature vector” or “attribute value”.
      Relational data mining: Inductive Logic Programming (ILP).
      Learning a binary classifier that will predict whether a link exists between a given pair of nodes: J48, OneR, IB1, Logistic, NaiveBayes; OR AdaBoost, Bagging, RandForest; OR Support Vector Machines (SVM), Genetic Programming (GP); OR Bayesian networks (BN) and Probabilistic Relational Models (PRMs).
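The supervised setting on this slide, a binary classifier that decides whether a link exists between a pair of nodes, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy edge sets are invented, a single feature (the number of common neighbors) stands in for the structural features, and a one-feature threshold rule stands in for inducers such as OneR or J48. It is not the setup used in the talk.

```python
from itertools import combinations

# Hypothetical toy data: friendships observed now, and links that appear later.
train_edges = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")}
future_edges = {("a", "d"), ("c", "e")}


def build_adjacency(edges):
    """Turn an undirected edge set into a node -> neighbor-set mapping."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def make_examples(adj, known_edges, later_edges):
    """One example per currently unlinked pair: (pair, feature, label)."""
    examples = []
    for u, v in combinations(sorted(adj), 2):
        if (u, v) in known_edges:
            continue
        feature = len(adj[u] & adj[v])      # structural feature: common neighbors
        label = int((u, v) in later_edges)  # 1 if the link shows up later
        examples.append(((u, v), feature, label))
    return examples


def fit_threshold(examples):
    """Pick the threshold t that maximizes accuracy of 'predict link if feature >= t'."""
    def accuracy(t):
        hits = sum(int(x >= t) == y for _, x, y in examples)
        return hits / len(examples)

    candidates = sorted({x for _, x, _ in examples})
    best = max(candidates, key=accuracy)
    return best, accuracy(best)


adj = build_adjacency(train_edges)
examples = make_examples(adj, train_edges, future_edges)
threshold, train_accuracy = fit_threshold(examples)
```

The same `examples` list could be written out as a single "propositional" table (one row per node pair) and passed to any of the classifiers named on the slide.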
  • 8. Classification based on features of entities
      Dr. William Hsu considered the problems of predicting, classifying, and annotating friend relations in social networks by applying a feature construction approach.
      Tim Weninger proposed a genetic programming-based symbolic regression approach to the construction of relational features for the link analysis task in social networks.
      Entity attributes (user/pair dependent): interests; topic model; geographical location; interest popularity; friends/friend’s age.
      Graph-based features (relational): number of neighbors; length of shortest path; neighborhood overlap; relative importance; node’s indegree/outdegree; forward/backward deleted distance.
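A few of the graph-based features listed above can be computed for a candidate pair as in the following sketch. The directed `friends` graph and all function names are hypothetical, chosen only for illustration; the forward/backward deleted distance and the entity attributes are not covered.

```python
from collections import deque

# Hypothetical directed friend graph: u -> v means "u lists v as a friend".
friends = {
    "ann": {"bob", "cat"},
    "bob": {"cat"},
    "cat": {"dan"},
    "dan": set(),
}


def outdegree(graph, u):
    """Number of friends that u lists."""
    return len(graph[u])


def indegree(graph, u):
    """Number of users who list u as a friend."""
    return sum(u in targets for targets in graph.values())


def shortest_path_length(graph, src, dst):
    """Breadth-first search; returns None when dst is unreachable from src."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph[node]:
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def pair_features(graph, u, v):
    """Feature vector (as a dict) for the candidate link u -> v."""
    return {
        "outdegree_u": outdegree(graph, u),
        "indegree_v": indegree(graph, v),
        "mutual_friends": len(graph[u] & graph[v]),
        "shortest_path": shortest_path_length(graph, u, v),
    }


features = pair_features(friends, "ann", "dan")
```

Each dict produced by `pair_features` corresponds to one row of a feature-vector table, which is the representation a classifier would consume.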
  • 9. Related investigations in the link prediction area: exploring relational structure, clustering [Jensen 2003, Getoor 2001]; using links to predict classes/attributes of entities [Getoor, Taskar, Koller, Provost, Jensen]; predicting link types based on known entity classes [Taskar, Koller 2003]; predicting links based on location in high-dimensional space [Hoff et al., 2003]; ranking potential links using a single graph-based feature [Kleinberg 2004]
  • 10. Mining tasks in network-structured data
      Node-related tasks: node ranking; node classification; node clustering.
      Structure-related tasks: link prediction; structured pattern mining.
      Setting 1: the identity of all objects is known + some link structure is known => predict unobserved links.
      Setting 2: new objects arrive with information about some of their links + info about some attributes => predict links among new objects.
      Link prediction tasks: link existence; link type; link weight; link cardinality.
  • 11. Mathematical representation for the unobserved link prediction task in social networks (figure; axis label: Time)
  • 12. Classification of measures for link prediction approaches
      Node-wise similarity based approaches: similarity measure in binary classifiers; pairwise kernel matrices; statistical relational learning.
      Topological pattern based approaches: node based patterns; path based patterns; graph based patterns.
      Probabilistic model based approaches: probabilistic relational models; Bayesian relational models; stochastic relational models.
  • 14. Node-wise Similarity based Approaches (cont.)
  • 16. Topological pattern based Approaches (cont.)
  • 17. Comparison of similarity measures (http://www.cs.cornell.edu/home/kleinber/link-pred.pdf)

                             Common     Jaccard’s   Adamic/Adar  Preferential  Katz
                             Neighbors  Similarity  Measure      Measure       Measure
      Common Neighbors       1          0.92        0.94         0.31          0.61
      Jaccard’s Similarity   0.92       1           0.97         0.53          0.75
      Adamic/Adar Measure    0.94       0.97        1            0.49          0.70
      Preferential Measure   0.31       0.53        0.49         1             0.84
      Katz Measure           0.61       0.75        0.70         0.84          1

      Chart: Correlation among different similarity measures (x-axis: level of similarity, x), with one linear trend line per measure. Trend lines as extracted, in the order they appear: y = 0.0736x + 0.5443 (R² = 0.9102); y = 0.1173x + 0.2586 (R² = 0.6507); y = -0.0604x + 1.0273 (R² = 0.3531); y = -0.073x + 1.0535 (R² = 0.4092); y = -0.1038x + 1.0881 (R² = 0.4681).
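For reference, the five measures compared above can be written down in their commonly used textbook forms, as in the sketch below. The toy graph is invented, and the Katz score is truncated at walks of length `max_len` with an assumed damping factor `beta`; the standard definition sums over all lengths. None of this reproduces the correlation figures on the slide.

```python
import math

# Hypothetical undirected toy graph as node -> neighbor-set.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}


def common_neighbors(adj, x, y):
    """Number of nodes adjacent to both x and y."""
    return len(adj[x] & adj[y])


def jaccard(adj, x, y):
    """Shared neighbors divided by all neighbors of either node."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def adamic_adar(adj, x, y):
    """Shared neighbors weighted by 1 / log(degree), so rare neighbors count more."""
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def preferential_attachment(adj, x, y):
    """Product of the two node degrees."""
    return len(adj[x]) * len(adj[y])


def katz(adj, x, y, beta=0.1, max_len=3):
    """Truncated Katz score: sum over l of beta**l * (number of walks of length l)."""
    counts = {x: 1}  # walks of the current length that start at x, keyed by end node
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {}
        for node, c in counts.items():
            for nb in adj[node]:
                nxt[nb] = nxt.get(nb, 0) + c
        counts = nxt
        score += beta ** length * counts.get(y, 0)
    return score


scores = {
    "common_neighbors": common_neighbors(adj, "a", "d"),
    "jaccard": jaccard(adj, "a", "d"),
    "adamic_adar": adamic_adar(adj, "a", "d"),
    "preferential": preferential_attachment(adj, "a", "d"),
    "katz": katz(adj, "a", "d"),
}
```

In the unsupervised setting, every unlinked pair would be scored with one such measure and the pairs ranked by score.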
  • 18. Crawling the Facebook social network: A crawler is an automatic program that explores the WWW, following links and searching for information or building a database. It is used to build automated indexes for the Web, allowing users to run keyword searches over Web documents. www2007.org/posters/poster1057.pdf
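The crawler described above (follow links, collect pages, build an index for keyword search) can be sketched as a breadth-first traversal. To stay self-contained, the sketch below crawls a hypothetical in-memory `PAGES` dictionary instead of the real Web; `fetch` is a placeholder for an HTTP request plus HTML parsing, and nothing here reflects how the Facebook crawl in the talk was actually implemented.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "page/a": ("alice likes graphs", ["page/b", "page/c"]),
    "page/b": ("bob likes crawling", ["page/a", "page/d"]),
    "page/c": ("carol studies graphs", []),
    "page/d": ("dave studies links", ["page/c"]),
}


def fetch(url):
    """Placeholder for downloading and parsing a page; returns None if missing."""
    return PAGES.get(url)


def crawl(seed, max_pages=100):
    """Breadth-first crawl from seed; returns visit order and a keyword index."""
    index = {}          # word -> set of URLs containing it
    visited = []        # URLs in the order they were fetched
    seen = {seed}       # URLs already queued, to avoid revisiting
    queue = deque([seed])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:
            continue
        text, links = page
        visited.append(url)
        for word in text.split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, index


visited, index = crawl("page/a")
```

A keyword search is then a dictionary lookup, for example `index["graphs"]`; a full-text engine such as Sphinx plays that role at scale.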
  • 20. Why are we more interested in Facebook?
      200 million users; doubling in size once every six months (by 100,000 users per day).
      Social network metrics: betweenness; bridge; centrality; centralization; closeness; clustering coefficient; cohesion; degree; (individual-level) density; flow betweenness centrality; eigenvector centrality; local bridge; path length; prestige; radiality; reach; structural cohesion; structural equivalence.
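Three of the metrics named above (degree centrality, density, and the clustering coefficient) have short standard definitions for undirected graphs, sketched below on an invented toy graph. The function names are hypothetical; a graph library would normally be used for a crawl of realistic size.

```python
from itertools import combinations

# Hypothetical undirected toy graph as node -> neighbor-set.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}


def degree_centrality(adj, u):
    """Degree of u divided by the maximum possible degree (n - 1)."""
    return len(adj[u]) / (len(adj) - 1)


def density(adj):
    """Number of edges divided by the number of possible edges."""
    n = len(adj)
    edge_count = sum(len(nbrs) for nbrs in adj.values()) / 2
    return edge_count / (n * (n - 1) / 2)


def clustering_coefficient(adj, u):
    """Fraction of pairs of u's neighbors that are themselves linked."""
    nbrs = adj[u]
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(sorted(nbrs), 2))
    linked = sum(1 for x, y in pairs if y in adj[x])
    return linked / len(pairs)


graph_density = density(adj)
centrality_d = degree_centrality(adj, "d")
clustering_b = clustering_coefficient(adj, "b")
```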
  • 21.
  • 22.
  • 23.
  • 24. Conclusions: Prediction task for previously unobserved links in social networks
      Concept of a social network + social graph representation + mining tasks in network-structured data
      Related studies + existing approaches: supervised vs. unsupervised methods; single-table data representation as feature vector vs. relational data mining; link prediction task as classification with a range of inducers (J48, OneR, IB1, Logistic, NaiveBayes, etc.); other available approaches for the given task, e.g. SVM, GP, BN, PRMs
      Mathematical representation + classification and description of similarity measures
      The experiment was planned around a crawling technique, applying the free open-source SQL full-text search engine Sphinx to a Facebook corpus
      Visualization tools for social network graph representation
      http://www.youtube.com/watch?v=neAAzVquaRU
  • 25. Thank you for your attention!