DETECTION OF DISHONEST
BEHAVIORS IN ON-LINE
NETWORKS USING GRAPH-BASED
RANKING TECHNIQUES
Francisco Javier Ortega Rodríguez

Supervised by
Prof. Dr. José Antonio Troyano Jiménez
Motivation
2
Motivation
3


       WWW: Web Search
        A   new business model
          Advertisements  on the web pages
          More web traffic → More visits to (or views of) the ads



         Search   Engine Optimization (SEO) is born

          White   Hat SEO



          Black   Hat SEO                 Web Spam!
Motivation
4


       Social Networks
         Reputation   of users similar to relevance of web
         pages


         Higher   reputation can imply some benefits


         Malicious users manipulate the Trust and Reputation Systems (TRS’s)
           On-line marketplaces: money
           Social news sites: slant the contents of the web site
           Simply for “trolling” (for pleasure)
Motivation
5
Motivation
6


       Hypothesis

        The detection of dishonest behaviors in on-line
        networks can be carried out with graph-based
        techniques that are flexible enough to include
        in their schemes specific information (in the
        form of features of the elements in a graph)
        about the network to be processed and the
        concrete task to be solved.
Roadmap
7
Web Spam Detection
8



       Web spam mechanisms try to increase the
        web traffic to specific web sites

       Reach the top positions of a web search
        engine
         Relatedness:      similarity to the user query
           Changing    the content of the web page


         Visibility:   relevance in the collection
           Getting   a high number of references
Web Spam Detection
9



       Content-based methods: self promotion
         Hidden HTML code
         Keyword stuffing
Web Spam Detection
10



        Link-based methods: mutual promotion
          Link-farms

          PR-sculpting
Roadmap
11
Web Spam Detection
12


        Relevant web spam detection methods:
          Link-based     approaches
            PageRank-based




            Adaptations:
                 Truncated PageRank [Castillo et al. 2007]
                 TrustRank [Gyöngyi et al. 2004]
Web Spam Detection
13


        Relevant web spam detection methods:
          Link-based     approaches
            Pros:
                 Tackle the link-based spam methods
                 The ranking can be directly used as the result of a user
                  query


            Cons:
                 Do not take into account the content of the web pages
                 Need human intervention in some specific parts
Web Spam Detection
14


        Relevant web spam detection methods:
          Content-based   approaches


          Database of web pages (WP's) with content-based features:

            WP   Size   Compressibility   Avg. word length   ...
            1    …      …                 …                  …
            2    …      …                 …                  …
            …    …      …                 …                  …

          Database → Classifier → Spam / Not Spam
Web Spam Detection
15


        Relevant web spam detection methods:
          Content-based       approaches
            Pros:
                 Deal with content-based spam methods
                 Binary classification methods

            Cons:
                 Very slow in comparison to the link-based methods
                 Based on user-specified features
                 Do not take into account the topology of the web graph
Web Spam Detection
16


        Relevant web spam detection methods:
          Hybrid   approaches


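          Database of web pages (WP's) with content-based and link-based features:

            WP   Size   % In-links   Compressibility   Out-links / In-links   Avg. word length   ...
            1    …      …            …                 …                      …                  …
            2    …      …            …                 …                      …                  …
            …    …      …            …                 …                      …                  …

          (% In-links and Out-links / In-links are link-based metrics)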
Web Spam Detection
17


        Relevant web spam detection methods:
          Hybrid    approaches
            Pros:
                 Combine the pros of link and content-based methods.
                 Really effective in the classification of web pages


            Cons:
                 Need user-specified features for both the content and the
                  link-based heuristics.

            Opportunity:
                 Do not take advantage of the global topology of the web
                  graph
Roadmap
18
PolaritySpam
19


        Intuition
          Include   content-based knowledge in a link-based
           system.
           [Diagram of the system: Database, Content Evaluation, Selection of sources, Propagation algorithm, Ranking]
PolaritySpam
20


        Content Evaluation


           [Diagram: Database, Content Evaluation]
PolaritySpam
21


        Content Evaluation
          Acquire    useful knowledge from the textual content

          Content-based     heuristics
            Adequate  for spam detection
            Easy to compute
            Highest discriminative ability



          A-priori   spam likelihood of a web page
PolaritySpam
22


        Content Evaluation
          Small   set of heuristics [Ntoulas et al., 2006]
            Compressibility




            Average   length of words




         A high value of the metrics implies an a-priori
          high spam likelihood of a web page
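
A minimal sketch of how the two heuristics could be computed for the text of a page. It assumes that compressibility is the uncompressed size divided by the zlib-compressed size and that words are runs of alphanumeric characters; the function names and the sample strings are illustrative and do not come from the slides.

```python
import re
import zlib


def compressibility(text: str) -> float:
    """Uncompressed size divided by compressed size (higher = more redundant)."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(raw) / len(zlib.compress(raw))


def average_word_length(text: str) -> float:
    """Mean number of characters per word (assumed word pattern: \\w+)."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


# Hypothetical page contents, only for illustration.
stuffed = "cheap pills buy cheap pills online " * 50
normal = "The committee met on Tuesday to review the budget proposal for next year."

print(compressibility(stuffed), compressibility(normal))
print(average_word_length("buy cheap pills"))
```

Under these assumptions a page stuffed with repeated keywords compresses far better than ordinary prose, so it gets the higher compressibility value.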
PolaritySpam
23


        Selection of Sources




           [Diagram: Database, Selection of sources]
PolaritySpam
24


        Selection of Sources
          Automatically pick a set of a-priori spam and not-spam
          web pages, Sources- and Sources+, respectively

          Take into account the content-based heuristics
            Given a web page $wp_i$ with metrics: $M_i = \{m_{i1}, m_{i2}, \ldots, m_{ij}\}$
PolaritySpam
25


        Selection of Sources
          Most Spammy/Not-Spammy sources (S-NS)

          Content-based S-NS (CS-NS)

          Content-based Graph Sources (C-GS)
PolaritySpam
26


        Propagation algorithm


           [Diagram: Propagation algorithm, Ranking]
PolaritySpam
27


        Propagation algorithm:
          PageRank-based      algorithm



          Idea: propagate a-priori information from a
           specific set of web pages, Sources

          A-priori   scores for the Sources

                         $e_i \neq 0 \Leftrightarrow wp_i \in Sources$
PolaritySpam
28


        Propagation algorithm:
          Two scores for each web page, $v_i$:

            $e_i^+ \neq 0 \Leftrightarrow wp_i \in Sources^+$   (set of a-priori non-spam web pages)
            $e_i^- \neq 0 \Leftrightarrow wp_i \in Sources^-$   (set of a-priori spam web pages)
PolaritySpam
29




           [Diagram: Propagation algorithm, Ranking]
PolaritySpam
30


        Evaluation:
          Dataset



          Baseline



          Evaluation   Methods

          Results
PolaritySpam
31


        Evaluation:
          Dataset
            WEBSPAM-UK2006 (Università degli Studi di Milano)

            Metrics:
                 98 million pages
                 11,400 hosts manually labeled
                     7,423 hosts are labeled as spam

                     About 10 million web pages are labeled as spam



            Processed      with Terrier IR Platform
                 http://terrier.org
PolaritySpam
32


        Evaluation:
          Baseline: TrustRank [Gyöngyi et al., 2004]

            Link-based   web spam detection method

            Personalized   PageRank equation

            Propagation   from a set of hand-picked web pages
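
The baseline described above can be sketched as a personalized PageRank whose teleport mass goes only to a seed set. This is an illustrative sketch, not the evaluated implementation: the damping factor 0.85, the fixed number of iterations, the handling of dangling pages and the toy graph are all assumptions.

```python
def personalized_pagerank(out_links, seeds, d=0.85, iterations=50):
    """Power iteration where the (1 - d) teleport mass goes only to the seeds."""
    nodes = list(out_links)
    e = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    score = dict(e)
    for _ in range(iterations):
        nxt = {v: (1.0 - d) * e[v] for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if not targets:
                continue  # dangling page: its mass is dropped in this sketch
            share = d * score[v] / len(targets)
            for t in targets:
                nxt[t] += share
        score = nxt
    return score


# Hypothetical four-page web; every link target is also a key.
web = {
    "good1": ["good2"],
    "good2": ["good1"],
    "spam1": ["spam2", "good1"],
    "spam2": ["spam1"],
}
trust = personalized_pagerank(web, seeds={"good1"})
print(trust)
```

In this toy graph no trusted page links to the spam pages, so they never receive any of the propagated score.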
PolaritySpam
33


        Evaluation:
          Evaluation   methods: PR-Buckets




          [Diagram: ranking split into Bucket 1, Bucket 2, …, Bucket N]
PolaritySpam
34


        Evaluation:
          Evaluation   methods: PR-Buckets
             Evaluation   metric: number of spam web pages in each
              bucket




          [Diagram: ranking split into Bucket 1, Bucket 2, …, Bucket N]
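
A small sketch of the bucket-based metric. The slides do not state how the bucket boundaries are chosen; this version assumes that each bucket gathers roughly the same share of the total score, and the function name and sample data are hypothetical.

```python
def spam_per_bucket(scores, spam, n_buckets=10):
    """Split the ranking into buckets of roughly equal score mass
    (an assumption) and count the spam pages that fall in each one."""
    ranking = sorted(scores, key=scores.get, reverse=True)
    total = sum(scores.values())
    counts = [0] * n_buckets
    cumulative = 0.0
    bucket = 0
    for page in ranking:
        if page in spam:
            counts[bucket] += 1
        cumulative += scores[page]
        # Move on once this bucket holds its share of the total score.
        while bucket < n_buckets - 1 and cumulative >= total * (bucket + 1) / n_buckets:
            bucket += 1
    return counts


# Hypothetical scores and labels.
scores = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
counts = spam_per_bucket(scores, spam={"c", "d"}, n_buckets=2)
print(counts)
```

A good ranking should leave the first buckets with few spam pages and push them towards the last ones.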
PolaritySpam
36


        Evaluation:
          Normalized     Discounted Cumulative Gain (nDCG):
            Global   metric: measures the demotion of spam web
            pages




            Sum   the “relevance” scores of not-spam web pages
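
As an illustration, nDCG over a ranking can be sketched with binary gains. This assumes a gain of 1 for every not-spam page, 0 for every spam page, and a log2 position discount; the slides do not give the exact gain values used in the evaluation.

```python
import math


def ndcg(ranking, spam):
    """nDCG with binary gains: 1 for a not-spam page, 0 for a spam page."""
    gains = [0 if page in spam else 1 for page in ranking]
    dcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))
    ideal = sorted(gains, reverse=True)
    idcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(ideal))
    return dcg / idcg if idcg else 0.0


# Hypothetical rankings: "s" is the only spam page.
demoted = ndcg(["a", "b", "s"], spam={"s"})
promoted = ndcg(["s", "a", "b"], spam={"s"})
print(demoted, promoted)
```

The score is highest when every spam page sits below all not-spam pages, which is why it works as a global measure of spam demotion.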
PolaritySpam
39


        Evaluation:
          PR-Buckets evaluation

          [Figure: number of spam web pages (logarithmic axis, 1 to 1000) in each of buckets 1 to 10, for TrustRank, S-NS, CS-NS and C-GS]
PolaritySpam
40


        Evaluation:
          nDCG   evaluation


                   Method      nDCG
                   TrustRank   0.7381
                   S-NS        0.4230
                   CS-NS       0.8621
                   C-GS        0.8648
PolaritySpam
41


        Evaluation:
          Content-based heuristics

          [Figure: number of spam web pages (logarithmic axis, 1 to 1000) in each of buckets 1 to 10, for AverageLength, Compressibility, AllMetrics, TrustRank and PolaritySpam]
Roadmap
42
Trust & Reputation in Social
     Networks
43


        Trust and reputation are key concepts in
         social networks
          Similar   to the relevance of web pages in the
          WWW


        Reputation: assessment of the
         trustworthiness of a user in a social
         network, according to his behavior and the
         opinions of the other users.
Trust & Reputation in Social
     Networks
44


        Example: On-line marketplaces




          Trustworthiness is as decisive as the price
          Higher reputation implies more sales

          Positive and negative opinions
Trust & Reputation in Social
     Networks
45


        Main goal: gain high reputation
          Obtain positive feedback from the customers
            Sell some bargains
            Special offers

          Give negative opinions to sellers who may be
           competitors.
          Obtain false positive opinions from other accounts
           (not necessarily other users).


          Dishonest behaviors!
Roadmap
46
Trust & Reputation in Social
     Networks
47


        TRS’s in real world
          Moderators
              Special set of users with specific responsibilities

          Example:      Slashdot.org
            A hierarchy of moderators
            A special user, No_More_Trolls, maintains a list of known
             trolls


          Drawbacks:
            Scalability
            Subjectivity
Trust & Reputation in Social
     Networks
48


        TRS’s in real world
          Unsupervised        TRS’s
            Users rate the contents of the system (and also other
             users)
            Scalability problem: rely on the users
            Subjectivity problem: decentralized


          Examples:    Digg.com, eBay.com

          Drawbacks:
              Unsupervised!
Trust & Reputation in Social
         Networks
49


        Transitivity of Trust and Distrust [Guha et al.,
         2004]
            Multiplicative distrust
                The enemy of my enemy is my friend


            Additive distrust
                Don’t trust someone not trusted by someone you don’t trust


            Neutral distrust
                Don’t take into account your enemies’ opinions
Trust & Reputation in Social
         Networks
50


        Threats of TRS’s
                  Orchestrated attacks


                  Camouflage behind good behavior


                  Malicious Spies


                  Camouflage behind judgments
Trust & Reputation in Social
     Networks
51


        Threats of TRS’s
            Orchestrated attacks: obtaining positive opinions
             from other accounts (not necessarily other users)



             [Figure: example network of users 0 to 9]
Trust & Reputation in Social
     Networks
52


        Threats of TRS’s
            Camouflage behind good behavior: feigning good
             behavior in order to obtain positive feedback from
             others.

             [Figure: example network of users 0 to 9]
Trust & Reputation in Social
     Networks
53


        Threats of TRS’s
            Malicious spies: using an “honest” account to
             provide positive opinions to malicious users.


             [Figure: example network of users 0 to 9]
Trust & Reputation in Social
     Networks
54


        Threats of TRS’s
            Camouflage behind judgments: giving negative
             feedback to users who can be competitors.



             [Figure: example network of users 0 to 9]
Roadmap
55
PolarityTrust
56


        Intuition
          Compute  a ranking of the users in a social
           network according to their trustworthiness

          Take into account both positive and negative
           feedback

          Graph-based ranking algorithm to obtain two
           scores for each node:
              $PT^+(v_i)$: positive reputation of user i
              $PT^-(v_i)$: negative reputation of user i
PolarityTrust
57


        Intuition
          Propagation   algorithm for the opinions of the users
            Given a set of trustworthy users
            Their PT+ and PT- scores are propagated to their
             neighbors
            … and so on

             [Figure: example network of users 0 to 9]
PolarityTrust
58


        Algorithm
          Propagation       schema of the opinions of the users
            Different behavior depending on the type of relation
               between the users: positive or negative

               [Diagram: node a, with score PT⁺(a), raises PT⁺(b) and PT⁻(c) in its neighbors b and c; node d, with score PT⁻(d), raises PT⁻(e) and PT⁺(f) in its neighbors e and f]
PolarityTrust
59


        Algorithm
          The scores of the nodes influence the scores of
          their neighbors

          $$PT^+(v_i) = (1-d)\,e_i^+ + d\left[\sum_{j \in In^+(i)} \frac{p_{ij}}{\sum_{k \in Out(v_j)} |p_{jk}|}\,PT^+(v_j) + \sum_{j \in In^-(i)} \frac{-p_{ij}}{\sum_{k \in Out(v_j)} |p_{jk}|}\,PT^-(v_j)\right]$$

          $$PT^-(v_i) = (1-d)\,e_i^- + d\left[\sum_{j \in In^+(i)} \frac{p_{ij}}{\sum_{k \in Out(v_j)} |p_{jk}|}\,PT^-(v_j) + \sum_{j \in In^-(i)} \frac{-p_{ij}}{\sum_{k \in Out(v_j)} |p_{jk}|}\,PT^+(v_j)\right]$$

          $e_i^+$, $e_i^-$: set of sources
          $In^+(i)$, $In^-(i)$: users voting positively / negatively on user i
          Direct relation with the PT+ of positively voting users
          Inverse relation with the PT- of negatively voting users
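
The propagation scheme can be sketched as a simple fixed-point iteration. This is a hedged sketch rather than the thesis implementation: it assumes edges stored as `(voter, target) -> signed weight` with non-zero weights, a-priori scores of 1 for the sources, a damping factor of 0.85 and a fixed number of iterations, and it leaves out the Non-Negative and Action-Reaction strategies.

```python
def polarity_trust(edges, nodes, trust_sources, distrust_sources=(),
                   d=0.85, iterations=50):
    """Return (PT+, PT-) dictionaries for a signed graph.

    edges maps (voter, target) -> signed, non-zero weight (assumed layout).
    """
    out_total = {v: 0.0 for v in nodes}
    for (voter, _), w in edges.items():
        out_total[voter] += abs(w)
    e_pos = {v: (1.0 if v in trust_sources else 0.0) for v in nodes}
    e_neg = {v: (1.0 if v in distrust_sources else 0.0) for v in nodes}
    pos, neg = dict(e_pos), dict(e_neg)
    for _ in range(iterations):
        new_pos = {v: (1 - d) * e_pos[v] for v in nodes}
        new_neg = {v: (1 - d) * e_neg[v] for v in nodes}
        for (voter, target), w in edges.items():
            share = d * abs(w) / out_total[voter]
            if w > 0:   # positive vote: scores keep their polarity
                new_pos[target] += share * pos[voter]
                new_neg[target] += share * neg[voter]
            else:       # negative vote: scores swap polarity
                new_pos[target] += share * neg[voter]
                new_neg[target] += share * pos[voter]
        pos, neg = new_pos, new_neg
    return pos, neg


# Hypothetical graph: trusted user "a" votes for "b" and against "c".
nodes = ["a", "b", "c"]
edges = {("a", "b"): 1.0, ("a", "c"): -1.0}
pt_pos, pt_neg = polarity_trust(edges, nodes, trust_sources={"a"})
print(pt_pos, pt_neg)
```

In the toy graph the positive vote of the trusted user raises the positive score of "b", while its negative vote raises the negative score of "c".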
PolarityTrust
64


        Non-Negative Propagation
          Problems caused by negative opinions from
          malicious users

          Solution: dynamically avoid the propagation of
          these opinions from malicious users

                   [Diagram: node a, with score PR⁻(a), linked to nodes b (PR⁻(b) ↑) and c (PR⁺(c) ↑)]
PolarityTrust
65


        Action-Reaction Propagation
          Problems     caused by dishonest voting attacks
            Positive   votes to malicious users
                 Orchestrated attacks, malicious spies…
            Negative   votes to good users
                 Camouflage behind bad judgments


          React   against bad actions: dishonest voting
            Penalize users who perform these actions
            Proportional to the trustworthiness of the nodes
             affected
PolarityTrust
66


        Action-Reaction Propagation
          Computation:
            Relation between the number of dishonest votes and
            the total number of votes

            Applied   after each iteration of the ranking algorithm

             [Figure: example network with nodes a, b, c and d]
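
One possible reading of the ratio between dishonest votes and total votes is sketched below. The rule used to flag a vote as dishonest (a positive vote for a user whose negative score dominates, or a negative vote for a user whose positive score dominates) is an assumption, as are the names and the sample data; how the resulting ratio is turned into a penalty is not shown on the slides and is left out.

```python
def dishonest_vote_ratio(edges, pos, neg):
    """Per voter: fraction of votes that contradict the target's current scores.

    edges maps (voter, target) -> signed weight (assumed layout);
    pos and neg hold the current PT+ and PT- scores of the targets.
    """
    total, dishonest = {}, {}
    for (voter, target), w in edges.items():
        total[voter] = total.get(voter, 0) + 1
        target_looks_good = pos[target] > neg[target]
        target_looks_bad = neg[target] > pos[target]
        if (w > 0 and target_looks_bad) or (w < 0 and target_looks_good):
            dishonest[voter] = dishonest.get(voter, 0) + 1
    return {v: dishonest.get(v, 0) / total[v] for v in total}


# Hypothetical scores after one iteration of the ranking algorithm.
pos = {"good": 0.9, "bad": 0.1}
neg = {"good": 0.0, "bad": 0.8}
votes = {("x", "good"): -1, ("x", "bad"): 1, ("y", "good"): 1}
ratios = dishonest_vote_ratio(votes, pos, neg)
print(ratios)
```

Recomputing the ratio after each iteration lets the penalty follow the scores as they change.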
PolarityTrust
67


        Complete Formulation
PolarityTrust
68




        Evaluation
          Datasets



          Baselines



          Results
PolarityTrust
69


        Evaluation
          Datasets
            Barabasi   & Albert
                 Preferential attachment property

            Randomly     generated attacks

            Metrics   of the dataset
                 10^4 nodes per graph
                 10^3 malicious users
                 100 malicious spies
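
For illustration, a graph with the preferential attachment property can be grown with the standard library alone. The parameters `n`, `m` and `seed`, and the fully connected starting core, are assumptions of this sketch; the randomly generated attacks are not reproduced here.

```python
import random


def barabasi_albert_edges(n, m, seed=0):
    """Grow a graph of n nodes; each new node links to m existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = []
    # Start from a small fully connected core of m + 1 nodes.
    for u in range(m + 1):
        for v in range(u):
            edges.append((u, v))
    # Each node appears in this list once per incident edge, so a uniform
    # choice from it is a degree-proportional choice of node.
    endpoints = [x for edge in edges for x in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


graph_edges = barabasi_albert_edges(100, 2)
print(len(graph_edges))
```

Nodes that join early tend to accumulate many links, which mimics the hubs found in real social networks.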
PolarityTrust
70


        Evaluation
          Datasets
            Slashdot   Zoo
                 Graph of users in Slashdot.org
                 Friend and Foe relationships
                 Gold Standard = list of Foes of the special user
                  No_More_Trolls


            Metrics   of the dataset
                 71,500 users in total
                 24% negative edges
                 96 known trolls
                 Source set: CmdrTaco and his friends → 6 users in total
PolarityTrust
71


        Evaluation
          Baselines
            EigenTrust    [Kamvar et al. 2003]
                 It does not take into account negative opinions


            Fans   Minus Freaks
                 (Number of friends – Number of foes)

            Signed   Spectral Ranking [Kunegis et al. 2009]

            Negative   Ranking [Kunegis et al. 2009]
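
Among the baselines above, Fans Minus Freaks is simple enough to sketch in a few lines. The edge layout `(voter, target) -> signed weight` and the user names are assumptions made for the example.

```python
def fans_minus_freaks(edges):
    """Score of each user = number of friends minus number of foes.

    edges maps (voter, target) -> signed weight; a positive weight marks
    the voter as a fan of the target, any other weight as a freak.
    """
    score = {}
    for (_, target), w in edges.items():
        score[target] = score.get(target, 0) + (1 if w > 0 else -1)
    return score


# Hypothetical friend/foe declarations.
declared = {("a", "t"): 1, ("b", "t"): 1, ("c", "t"): -1, ("a", "u"): -1}
fmf = fans_minus_freaks(declared)
print(fmf)
```

The score only counts direct links, so it ignores who is casting each vote, unlike the propagation-based methods.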
PolarityTrust
72


           Evaluation
             Results: Randomly generated datasets (nDCG)

     Threats   ET      FmF     SR      NR      PTNN    PTAR    PT
     A         0.833   0.843   0.599   0.749   0.876   0.906   0.987
     AB        0.833   0.844   0.811   0.920   0.876   0.906   0.987
     ABC       0.842   0.719   0.816   0.920   0.877   0.903   0.984
     ABCD      0.823   0.723   0.818   0.937   0.879   0.903   0.984
     ABCDE     0.753   0.777   0.877   0.933   0.966   0.862   0.982

     Threats:
       A: No strategies
       B: Orchestrated attack
       C: Camouflage behind good behaviors
       D: Malicious spies
       E: Camouflage behind judgments

     Methods:
       ET: EigenTrust
       FmF: Fans Minus Freaks
       SR: Spectral Ranking
       NR: Negative Ranking
       PTNN: Non-Negative Propagation
       PTAR: Action-Reaction Propagation
       PT: PolarityTrust
PolarityTrust
73


        Evaluation
          Results: Slashdot Zoo dataset

              ET      FmF     SR      NR      PTNN    PTAR    PT
     nDCG     0.310   0.460   0.479   0.477   0.593   0.570   0.588

     Methods:
       ET: EigenTrust
       FmF: Fans Minus Freaks
       SR: Spectral Ranking
       NR: Negative Ranking
       PTNN: Non-Negative Propagation
       PTAR: Action-Reaction Propagation
       PT: PolarityTrust
PolarityTrust
74


         Evaluation
               Results: Trolling Slashdot (nDCG)

     Threats   ET      FmF     SR      NR      PTNN    PTAR    PT
     A         0.310   0.460   0.479   0.477   0.593   0.570   0.588
     AB        0.308   0.460   0.478   0.477   0.593   0.570   0.588
     ABC       0.311   0.460   0.474   0.484   0.593   0.570   0.588
     ABCD      0.370   0.476   0.501   0.501   0.580   0.570   0.586
     ABCDE     0.370   0.475   0.501   0.496   0.580   0.574   0.588

     Threats:
       A: No strategies
       B: Orchestrated attack
       C: Camouflage behind good behaviors
       D: Malicious spies
       E: Camouflage behind judgments

     Methods:
       ET: EigenTrust
       FmF: Fans Minus Freaks
       SR: Spectral Ranking
       NR: Negative Ranking
       PTNN: Non-Negative Propagation
       PTAR: Action-Reaction Propagation
       PT: PolarityTrust
PolarityTrust
75


        Evaluation
          Include   a set of sources of distrust

          In   Slashdot Zoo Dataset:
            Sources   of trust: CmdrTaco and friends

            Sources   of distrust: 5 random foes of No_More_Trolls


          Many  possible methods to choose the sources of
          distrust
PolarityTrust
76


          Evaluation
            Results: Sources of trust and distrust (nDCG)

                 Sources of Trust           Sources of Trust & Distrust
     Threats     PTNN    PTAR    PT         PTNN    PTAR    PT
     A           0.593   0.570   0.588      0.846   0.790   0.846
     AB          0.593   0.570   0.588      0.846   0.790   0.846
     ABC         0.593   0.570   0.588      0.846   0.790   0.846
     ABCD        0.580   0.570   0.586      0.775   0.739   0.782
     ABCDE       0.580   0.574   0.588      0.774   0.741   0.781

     Threats:
       A: No strategies
       B: Orchestrated attack
       C: Camouflage behind good behaviors
       D: Malicious spies
       E: Camouflage behind judgments

     Methods:
       PTNN: Non-Negative Propagation
       PTAR: Action-Reaction Propagation
       PT: PolarityTrust
Roadmap
77
Conclusions
78


        Final Remarks

          Development of two systems for the detection of
          dishonest behaviors in on-line networks
            Web  Spam Detection: PolaritySpam
            Trust and Reputation: PolarityTrust


          Propagation   of some a-priori information
            Web  Spam: Textual content of the web pages
            Trust and Reputation: Trust and distrust sources sets
Conclusions
79


        Final Remarks

          Web   Spam Detection
            Unlike existing approaches, include content-based
            knowledge into a link-based technique

            Unsupervised   methods for the selection of sources

            Propagate   information of the sources through the
            network

            Two   simple metrics improve state-of-the-art methods
Conclusions
80


        Final Remarks
          Trust   and Reputation in social networks
            Negative    links improve the discriminative ability of
            TRS’s
            Propagation strategies to deal with different attacks
            against a TRS
                 Non-Negative propagation
                 Action-Reaction propagation

            Interrelated   scores modeling the transitivity of trust
            and distrust
            Flexible   to be adapted to different situations and
Conclusions
81


        Future Work

          PolaritySpam
           Applicability   of more content-based metrics

           Additional methods for the selection of sources
                Propagation ability of each source


           Infer   negative relations between web pages
                According to their textual content
                Apply similar propagation schemas as in PolarityTrust
Conclusions
82


        Future Work
          PolarityTrust
            Study   other possible attacks
                 Playbook sequences (omniscience of the attackers)
                 Analyze the casuistry of the different social networks

            Selection   of sources of trust and distrust
                 Link-based methods


            Study other contexts with positive and negative
            relations:
                 Trending topics
                 Authorities in the blogosphere
Conclusions
83


        Future Work
          Both   techniques
            Study   of the parallelization of both algorithms
                 Many works on the parallelization of PageRank
                 Saving time and memory


            Detection   of Spam on the social networks
                 Spam messages and spam user accounts

            Recommender       Systems
                 NLP and Opinion Mining techniques in a link-based
                  system
                 Use the positive and negative information
Curriculum Vitae
84


        Academic and Research milestones
          2006:    Degree on Computer Science

          2006:    Funded Student in the Itálica research
          group

          2008:   Master of Advances Studies:
            “STR:   A graph-based tagger generator”


          2010:    Research stay at the University of Glasgow
            IR   Group (Dr. Iadh Ounis and Dr. Craig Macdonald)
Curriculum Vitae
85


        26 contributions to conferences and journals
         5  JCR
          10 International Conferences

          2 CORE B

          4 CORE C

          4 ISI Proceedings

          3 Lecture Notes in Computer Science

          3 CiteSeer Venue Impact Ratings



        Research projects
Curriculum Vitae
86


        Contributions related to the thesis
         Legend: National Conf., International Conf., JCR
         [Diagram: TextRank for Tagging, System Combination Methods,
          Improving a Tagger Generator in IE, STR, PolarityRank,
          Web Spam Detection, PolarityTrust, PolaritySpam]
Curriculum Vitae
87


        Contributions related to the thesis
         Legend: National Conf., International Conf., JCR
         [Diagram: TextRank for Tagging, System Combination Methods,
          Improving a Tagger Generator in IE]
         TextRank como motor de aprendizaje en tareas de etiquetado, SEPLN
         2006
         Bootstrapping Applied to a Corpus Generation Task, EUROCAST 2007
         Improving the Performance of a Tagger Generator in an Information Extraction
         Application, Journal of Universal Computer Science (2007)
Curriculum Vitae
88


        Contributions related to the thesis
         National Conf.
         International
         Conf.
         JCR

                                                 STR




         STR: A Graph-based Tagging Technique, International Journal on
         Artificial Intelligence Tools (2011)
Curriculum Vitae
89


        Contributions related to the thesis
                                                PolarityRank
         National Conf.
         International
         Conf.
         JCR




                                                               Web Spam
                                                               Detection



         A Knowledge-Rich Approach to Feature-based Opinion Extraction from
         Product Reviews, SMUC 2010 (CIKM 2010)

         Combining Textual Content and Hyperlinks in Web Spam Detection, NLDB
         2010
Curriculum Vitae
90


          Contributions related to the thesis
           Legend: National Conf., International Conf., JCR
           [Diagram: PolarityTrust, PolaritySpam]




     PolarityTrust: Measuring Trust and Reputation in Social Networks, ITA 2011

     PolaritySpam: Propagating Content-based Information Through a Web Graph to
     Detect Web Spam, International Journal of Innovative Computing, Information and
     Control (2012)
DETECTION OF DISHONEST
BEHAVIORS IN ON-LINE
NETWORKS USING GRAPH-BASED
RANKING TECHNIQUES
Francisco Javier Ortega Rodríguez

Supervised by
Prof. Dr. José Antonio Troyano Jiménez

PhD Thesis presentation

  • 1. DETECTION OF DISHONEST BEHAVIORS IN ON-LINE NETWORKS USING GRAPH-BASED RANKING TECHNIQUES Francisco Javier Ortega Rodríguez Supervised by Prof. Dr. José Antonio Troyano Jiménez
  • 3. Motivation 3  WWW: Web Search A new business model  Advertisements on the web pages  More web traffic → More visits to (or views of) the ads  Search Engine Optimization (SEO) is born  White Hat SEO  Black Hat SEO Web Spam!
  • 4. Motivation 4  Social Networks  Reputation of users similar to relevance of web pages  Higher reputation can imply some benefits  Malicious users manipulate the TRS’s  On-line marketplaces: money  Social news sites: slant the contents of the web site  Simply for “trolling” (for pleasure)
  • 6. Motivation 6  Hypothesis The detection of dishonest behaviors in on- line networks can be carried out with graph- based techniques, flexible enough to include in their schemes specific information (in the form of features of the elements in a graph) about the network to be processed and the concrete task to be solved.
  • 8. Web Spam Detection 8  Web spam mechanisms try to increase the web traffic to specific web sites  Reach the top positions of a web search engine  Relatedness: similarity to the user query  Changing the content of the web page  Visibility: relevance in the collection  Getting a high number of references
  • 9. Web Spam Detection 9  Content-based methods: self promotion  Hidden HTML code  Keyword stuffing
  • 10. Web Spam Detection 10  Link-based methods: mutual promotion  Link-farms  PR-sculpting
  • 12. Web Spam Detection 12  Relevant web spam detection methods:  Link-based approaches  PageRank-based  Adaptations:  Truncated PageRank [Castillo et al. 2007]  TrustRank [Gyongy et al. 2004]
  • 13. Web Spam Detection 13  Relevant web spam detection methods:  Link-based approaches  Pros:  Tackle the link-based spam methods  The ranking can be directly used as the result of a user query  Cons:  Do not take into account the content of the web pages  Need human intervention in some specific parts
  • 14. Web Spam Detection 14  Relevant web spam detection methods:  Content-based approaches [Diagram: a database of web pages (WP's) with per-page content features (size, compressibility, average word length, ...) feeds a classifier that outputs Spam or Not Spam]
  • 15. Web Spam Detection 15  Relevant web spam detection methods:  Content-based approaches  Pros:  Deal with content-based spam methods  Binary classification methods  Cons:  Very slow in comparison to the link-based methods  Based on user-specified features  Do not take into account the topology of the web graph
  • 16. Web Spam Detection 16  Relevant web spam detection methods:  Hybrid approaches [Table: database of web pages (WP's) by feature; content features: size, compressibility, average word length, ...; link-based metrics: % in-links, out-links / in-links]
  • 17. Web Spam Detection 17  Relevant web spam detection methods:  Hybrid approaches  Pros:  Combine the pros of link and content-based methods.  Really effective in the classification of web pages  Cons:  Need user-specified features for both the content and the link-based heuristics.  Opportunity:  Do not take advantage of the global topology of the web graph
  • 19. PolaritySpam 19  Intuition  Include content-based knowledge in a link-based system. [Diagram: Database, Content Evaluation, Selection of sources, Propagation algorithm, Ranking]
  • 20. PolaritySpam 20  Content Evaluation [Diagram: Database, Content Evaluation]
  • 21. PolaritySpam 21  Content Evaluation  Acquire useful knowledge from the textual content  Content-based heuristics  Adequate for spam detection  Easy to compute  Highest discriminative ability  A-priori spam likelihood of a web page
  • 22. PolaritySpam 22  Content Evaluation  Small set of heuristics [Ntoulas et al., 2006]  Compressibility  Average length of words A high value of the metrics implies an a-priori high spam likelihood of a web page
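The two heuristics named on this slide can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes compressibility is the ratio of raw size to zlib-compressed size and that words are whitespace-separated tokens. The function names and sample texts are hypothetical.

```python
import zlib


def compressibility(text: str) -> float:
    """Raw size divided by compressed size (assumed definition).

    Highly repetitive text, such as keyword stuffing, compresses well
    and therefore gets a high value.
    """
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(raw) / len(zlib.compress(raw))


def average_word_length(text: str) -> float:
    """Mean number of characters per whitespace-separated token."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


# Hypothetical sample pages: one keyword-stuffed, one ordinary sentence.
stuffed = "cheap pills " * 200
normal = ("The committee met on Tuesday to review the quarterly budget "
          "and agreed to postpone the vote.")

print(compressibility(stuffed) > compressibility(normal))
print(average_word_length("aa bbbb"))
```

Both values are cheap to compute per page, which matches the slide's requirement that the heuristics be easy to compute.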
  • 23. PolaritySpam 23  Selection of Sources [Diagram: Database, Selection of sources]
  • 24. PolaritySpam 24  Selection of Sources  Automatically pick a set of a-priori spam and not-spam web pages, Sources- and Sources+, respectively  Take into account the content-based heuristics  Given a web page $wp_i$ with metrics: $M_i = \{m_{i1}, m_{i2}, \ldots, m_{ij}\}$
  • 25. PolaritySpam 25  Selection of Sources  Most Spammy/Not-Spammy sources (S-NS) Sources  Content-based S-NS (CS-NS) Sources  Content-based Graph Sources (C-GS)
  • 26. PolaritySpam 26  Propagation algorithm [Diagram: Propagation algorithm, Ranking]
  • 27. PolaritySpam 27  Propagation algorithm:  PageRank-based algorithm  Idea: propagate a-priori information from a specific set of web pages, Sources  A-priori scores for the Sources: $e_i \neq 0 \Leftrightarrow wp_i \in Sources$
  • 28. PolaritySpam 28  Propagation algorithm:  Two scores for each web page, $v_i$:  $e_i^+ \neq 0 \Leftrightarrow wp_i \in Sources^+$ (set of a-priori non-spam web pages)  $e_i^- \neq 0 \Leftrightarrow wp_i \in Sources^-$ (set of a-priori spam web pages)
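A rough sketch of the two-score idea: each page receives one score propagated from the non-spam sources and one from the spam sources. The transcript does not include the actual PolaritySpam update rule, so this example simply runs two independent personalized PageRank computations and subtracts them. That is an assumption made for illustration, not the author's formula. The graph, page names and parameter values are hypothetical.

```python
def personalized_pagerank(out_links, sources, damping=0.85, iterations=100):
    """Power iteration whose teleport vector is non-zero only on the sources.

    out_links: dict mapping every page to the list of pages it links to.
    sources:   dict mapping source pages to positive a-priori scores.
    """
    nodes = list(out_links)
    total = sum(sources.values())
    if total <= 0:
        raise ValueError("at least one source needs a positive a-priori score")
    teleport = {n: sources.get(n, 0.0) / total for n in nodes}
    score = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            targets = out_links[n]
            if targets:
                share = damping * score[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling page: hand its mass back to the sources.
                for m in nodes:
                    nxt[m] += damping * score[n] * teleport[m]
        score = nxt
    return score


# Hypothetical toy web graph.
graph = {
    "good1": ["good2"],
    "good2": ["good1", "hub"],
    "spam1": ["spam2"],
    "spam2": ["spam1", "hub"],
    "hub": [],
}

pr_plus = personalized_pagerank(graph, {"good1": 1.0})   # from Sources+
pr_minus = personalized_pagerank(graph, {"spam1": 1.0})  # from Sources-

# Assumed combination: positive net score suggests non-spam, negative suggests spam.
net = {page: pr_plus[page] - pr_minus[page] for page in graph}
for page in sorted(net, key=net.get, reverse=True):
    print(page, round(net[page], 4))
```

Sorting by the net score gives a ranking in which pages reachable mainly from spam sources are demoted.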
  • 29. PolaritySpam 29 [Diagram: Propagation algorithm, Ranking]
  • 30. PolaritySpam 30  Evaluation:  Dataset  Baseline  Evaluation Methods  Results
  • 31. PolaritySpam 31  Evaluation:  Dataset  WEBSPAM-UK 2006 (Università degli Studi di Milano)  Metrics:  98 million pages  11,400 hosts manually labeled  7,423 hosts are labeled as spam  About 10 million web pages are labeled as spam  Processed with Terrier IR Platform  http://terrier.org
  • 32. PolaritySpam 32  Evaluation:  Baseline: TrustRank [Gyongy et al., 2004]  Link-based web spam detection method  Personalized PageRank equation  Propagation from a set of hand-picked web pages
  • 33. PolaritySpam 33  Evaluation:  Evaluation methods: PR-Buckets … Bucket 1 Bucket 2 Bucket N
  • 34. PolaritySpam 34  Evaluation:  Evaluation methods: PR-Buckets  Evaluation metric: number of spam web pages in each bucket … Bucket 1 Bucket 2 Bucket N
  • 35. PolaritySpam 35  Evaluation:  Evaluation methods: PR-Buckets  Evaluation metric: number of spam web pages in each bucket … Bucket 1 Bucket 2 Bucket N
  • 36. PolaritySpam 36  Evaluation:  Normalized Discounted Cumulative Gain (nDCG):  Global metric: measures the demotion of spam web pages  Sum the “relevance” scores of not-spam web pages
  • 37. PolaritySpam 37  Evaluation:  Normalized Discounted Cumulative Gain (nDCG):  Global metric: measures the demotion of spam web pages  Sum the “relevance” scores of not-spam web pages
  • 38. PolaritySpam 38  Evaluation:  Normalized Discounted Cumulative Gain (nDCG):  Global metric: measures the demotion of spam web pages  Sum the “relevance” scores of not-spam web pages
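As an illustration of how nDCG can measure spam demotion, the sketch below assigns a relevance of 1 to not-spam pages and 0 to spam pages, then compares the discounted gain of a ranking with that of the ideal ranking. The binary relevance values and the log2 discount are assumptions; the slides only state that the relevance scores of not-spam pages are summed.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(ranked_is_spam):
    """nDCG of a ranking given as a list of spam flags, best-ranked page first.

    Assumed relevance: 1.0 for not-spam, 0.0 for spam.
    """
    gains = [0.0 if spam else 1.0 for spam in ranked_is_spam]
    best = dcg(sorted(gains, reverse=True))
    return dcg(gains) / best if best > 0 else 0.0


# Spam at the bottom of the ranking scores higher than spam at the top.
print(ndcg([False, False, True]))
print(ndcg([True, False, False]))
```

A ranking that pushes every spam page below every not-spam page reaches the maximum value of 1.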
  • 39. PolaritySpam 39  Evaluation:  PR-Buckets evaluation [Chart: number of spam web pages (log scale, 1 to 1000) in each of buckets 1 to 10; series: TrustRank, S-NS, CS-NS, C-GS]
  • 40. PolaritySpam 40  Evaluation:  nDCG evaluation  nDCG per method: TrustRank 0.7381; S-NS 0.4230; CS-NS 0.8621; C-GS 0.8648
  • 41. PolaritySpam 41  Evaluation:  Content-based heuristics 1000 Number of Spam Web Pages 100 10 1 1 2 3 4 5 6 7 8 9 10 Buckets AverageLength Compressibility AllMetrics TrustRank PolaritySpam
  • 43. Trust & Reputation in Social Networks 43  Trust and reputation are key concepts in social networks  Similar to the relevance of web pages in the WWW  Reputation: assessment of the trustworthiness of a user in a social network, according to his behavior and the opinions of the other users.
  • 44. Trust & Reputation in Social Networks 44  Example: On-line marketplaces  Trustworthiness as determining as the price  Higher reputation implies more sales  Positive and negative opinions
  • 45. Trust & Reputation in Social Networks 45  Main goal: gain high reputation  Obtain positive feedback from the customers  Sell some bargains  Special offers  Give negative opinions to sellers who may be competitors.  Obtain false positive opinions from other accounts (not necessarily other users). Dishonest behaviors!
  • 47. Trust & Reputation in Social Networks 47  TRS’s in real world  Moderators  Special set of users with specific responsibilities  Example: Slashdot.org  A hierarchy of moderators  A special user, No_More_Trolls, maintains a list of known trolls  Drawbacks:  Scalability  Subjectivity
  • 48. Trust & Reputation in Social Networks 48  TRS’s in real world  Unsupervised TRS’s  Users rate the contents of the system (and also other users)  Scalability problem: rely on the users  Subjectivity problem: decentralized  Examples: Digg.com, eBay.com  Drawbacks:  Unsupervised!
  • 49. Trust & Reputation in Social Networks 49  Transitivity of Trust and Distrust [Guha et al., 2004]  Multiplicative distrust  The enemy of my enemy is my friend  Additive distrust  Don’t trust someone not trusted by someone you don’t trust  Neutral distrust  Don’t take into account your enemies’ opinions
  • 50. Trust & Reputation in Social Networks 50  Threats of TRS’s  Orchestrated attacks  Camouflage behind good behavior  Malicious spies  Camouflage behind judgments
  • 51. Trust & Reputation in Social Networks 51  Threats of TRS’s  Orchestrated attacks: obtaining positive opinions from other accounts (not necessarily other users)  [Figure: example graph with nodes 0 to 9]
  • 52. Trust & Reputation in Social Networks 52  Threats of TRS’s  Camouflage behind good behavior: feigning good behavior in order to obtain positive feedback from others.  [Figure: example graph with nodes 0 to 9]
  • 53. Trust & Reputation in Social Networks 53  Threats of TRS’s  Malicious spies: using an “honest” account to provide positive opinions to malicious users.  [Figure: example graph with nodes 0 to 9]
  • 54. Trust & Reputation in Social Networks 54  Threats of TRS’s  Camouflage behind judgments: giving negative feedback to users who can be competitors.  [Figure: example graph with nodes 0 to 9]
  • 56. PolarityTrust 56  Intuition  Compute a ranking of the users in a social network according to their trustworthiness  Take into account both positive and negative feedback  Graph-based ranking algorithm to obtain two scores for each node:  $PT^+(v_i)$: positive reputation of user i  $PT^-(v_i)$: negative reputation of user i
  • 57. PolarityTrust 57  Intuition  Propagation algorithm for the opinions of the users  Given a set of trustworthy users  Their PT+ and PT- scores are propagated to their neighbors  … and so on  [Figure: example graph with nodes 0 to 9]
  • 58. PolarityTrust 58  Algorithm  Propagation schema of the opinions of the users  Different behavior depending on the type of relation between the users: positive or negative  [Figure: two examples. Node a, with PT⁺(a), linked to b and c: PT⁺(b) ↑ and PT⁻(c) ↑. Node d, with PT⁻(d), linked to e and f: PT⁻(e) ↑ and PT⁺(f) ↑]
  • 59. PolarityTrust 59  Algorithm  The scores of the nodes influence the scores of their neighbors  $PT^+(v_i) = \dots$
  • 60. PolarityTrust 60  Algorithm  The scores of the nodes influence the scores of their neighbors  $PT^+(v_i) = (1-d)\,e_i^+ + d\,(\dots)$  $e_i^+$: set of sources
  • 61. PolarityTrust 61  Algorithm  The scores of the nodes influence the scores of their neighbors  $PT^+(v_i) = (1-d)\,e_i^+ + d \left( \sum_{j \in In^+(i)} \frac{p_{ij}}{\sum_{k \in Out(v_j)} |p_{jk}|}\, PT^+(v_j) + \dots \right)$  Direct relation with the PT+ of positively voting users
  • 62. PolarityTrust 62  Algorithm  The scores of the nodes influence the scores of their neighbors  $PT^+(v_i) = (1-d)\,e_i^+ + d \left( \sum_{j \in In^+(i)} \frac{p_{ij}}{\sum_{k \in Out(v_j)} |p_{jk}|}\, PT^+(v_j) + \sum_{j \in In^-(i)} \frac{-p_{ij}}{\sum_{k \in Out(v_j)} |p_{jk}|}\, PT^-(v_j) \right)$  Inverse relation with the PT- of negatively voting users
  • 63. PolarityTrust 63  Algorithm  The scores of the nodes influence the scores of their neighbors  $PT^+(v_i) = (1-d)\,e_i^+ + d \left( \sum_{j \in In^+(i)} \frac{p_{ij}}{\sum_{k \in Out(v_j)} |p_{jk}|}\, PT^+(v_j) + \sum_{j \in In^-(i)} \frac{-p_{ij}}{\sum_{k \in Out(v_j)} |p_{jk}|}\, PT^-(v_j) \right)$  $PT^-(v_i) = (1-d)\,e_i^- + d \left( \sum_{j \in In^+(i)} \frac{p_{ij}}{\sum_{k \in Out(v_j)} |p_{jk}|}\, PT^-(v_j) + \sum_{j \in In^-(i)} \frac{-p_{ij}}{\sum_{k \in Out(v_j)} |p_{jk}|}\, PT^+(v_j) \right)$
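The propagation described on these slides can be sketched in a few lines. This is an illustrative reading of the slides, not the author's implementation: it assumes that a positive vote passes on the score of the same polarity, that a negative vote swaps the polarity (the "inverse relation" noted on the slide), and that the source vectors are uniform over the source sets. The names `votes`, `trust_sources` and `distrust_sources`, the damping factor and the toy graph are hypothetical, and neither Non-Negative nor Action-Reaction propagation is included.

```python
def polarity_trust(votes, trust_sources, distrust_sources=(), d=0.85, iterations=50):
    """votes maps (voter, target) to a signed weight p; returns (PT+, PT-) dicts."""
    nodes = {user for edge in votes for user in edge}
    out_weight = {v: 0.0 for v in nodes}
    for (voter, _), p in votes.items():
        out_weight[voter] += abs(p)
    # Assumption: the source vectors e+ and e- are uniform over their source sets.
    e_pos = {v: (1.0 / len(trust_sources) if v in trust_sources else 0.0) for v in nodes}
    e_neg = {v: (1.0 / len(distrust_sources) if v in distrust_sources else 0.0) for v in nodes}
    pt_pos, pt_neg = dict(e_pos), dict(e_neg)
    for _ in range(iterations):
        new_pos = {v: (1 - d) * e_pos[v] for v in nodes}
        new_neg = {v: (1 - d) * e_neg[v] for v in nodes}
        for (voter, target), p in votes.items():
            if p == 0:
                continue  # a zero-weight vote carries no opinion
            w = d * abs(p) / out_weight[voter]
            if p > 0:   # positive vote: the same polarity is passed on
                new_pos[target] += w * pt_pos[voter]
                new_neg[target] += w * pt_neg[voter]
            else:       # negative vote: the polarity is swapped
                new_pos[target] += w * pt_neg[voter]
                new_neg[target] += w * pt_pos[voter]
        pt_pos, pt_neg = new_pos, new_neg
    return pt_pos, pt_neg

# Hypothetical toy network: a trusted source praises "good" and rejects "troll".
votes = {("src", "good"): 1, ("src", "troll"): -1, ("good", "src"): 1}
pos, neg = polarity_trust(votes, trust_sources={"src"})
```

With a single trusted source, the user it praises accumulates only positive reputation and the user it rejects accumulates only negative reputation, which is the behavior the propagation schema aims for.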
  • 64. PolarityTrust 64  Non-Negative Propagation  Problems caused by negative opinions from malicious users  Solution: dynamically avoid the propagation of these opinions from malicious users  [Figure: node a, with PR⁻(a), linked to b and c: PR⁻(b) ↑ and PR⁺(c) ↑]
  • 65. PolarityTrust 65  Action-Reaction Propagation  Problems caused by dishonest voting attacks  Positive votes to malicious users  Orchestrated attacks, malicious spies…  Negative votes to good users  Camouflage behind bad judgments  React against bad actions: dishonest voting  Penalize users who perform these actions  Proportional to the trustworthiness of the nodes being affected
  • 66. PolarityTrust 66  Action-Reaction Propagation  Computation:  Relation between the number of dishonest votes and the total number of votes  Applied after each iteration of the ranking algorithm  [Figure: example graph with nodes a, b, c and d]
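The ratio mentioned on this slide can be illustrated with a rough sketch. The slides do not define when a vote counts as dishonest or how the penalty is applied, so everything below is an assumption: a vote is treated as dishonest when it praises a user whose negative score currently exceeds the positive one or attacks a user for whom the opposite holds, and the penalty simply scales the voter's positive score. The weighting by the trustworthiness of the affected nodes is left out, and all names and values are hypothetical.

```python
def dishonest_ratio(votes, pt_pos, pt_neg):
    """Per voter: share of votes that contradict the current scores of the target."""
    total, dishonest = {}, {}
    for (voter, target), p in votes.items():
        total[voter] = total.get(voter, 0) + 1
        target_is_trusted = pt_pos[target] >= pt_neg[target]
        # Assumed rule: praising a distrusted user or attacking a trusted one is dishonest.
        if (p > 0) != target_is_trusted:
            dishonest[voter] = dishonest.get(voter, 0) + 1
    return {v: dishonest.get(v, 0) / total[v] for v in total}

def penalize(pt_pos, ratios):
    """Assumed penalty: scale each voter's positive score by its share of honest votes."""
    return {v: s * (1 - ratios.get(v, 0.0)) for v, s in pt_pos.items()}

# Hypothetical scores after one iteration of the ranking algorithm.
votes = {("spy", "troll"): 1, ("spy", "good"): 1, ("good", "troll"): -1}
pt_pos = {"spy": 0.4, "good": 0.5, "troll": 0.0}
pt_neg = {"spy": 0.0, "good": 0.0, "troll": 0.3}
ratios = dishonest_ratio(votes, pt_pos, pt_neg)
penalized = penalize(pt_pos, ratios)
```

In this toy case the spy loses half of its positive score because one of its two votes supports a distrusted user, while the honest voter is left untouched.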
  • 67. PolarityTrust 67  Complete Formulation
  • 68. PolarityTrust 68  Evaluation  Datasets  Baselines  Results
  • 69. PolarityTrust 69  Evaluation  Datasets  Barabási & Albert  Preferential attachment property  Randomly generated attacks  Metrics of the dataset  10^4 nodes per graph  10^3 malicious users  100 malicious spies
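The preferential attachment property named on this slide can be sketched with a small generator. This is a generic version of the Barabási & Albert growth model, not the generator used for the thesis datasets: the parameter `m` (links added per new node), the seed and the small graph size are assumptions, and the injection of attacks and of negative edges is not shown.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a graph in which each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))  # the first new node links to the m initial nodes
    repeated = []             # every node appears once per incident edge
    for new in range(m, n):
        for t in targets:
            edges.append((new, t))
        repeated.extend(targets)
        repeated.extend([new] * m)
        # Sampling from `repeated` favors high-degree nodes.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges

edges = barabasi_albert(n=1000, m=3, seed=42)
```

Each of the `n - m` new nodes contributes exactly `m` edges, and early nodes tend to end up with the highest degrees.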
  • 70. PolarityTrust 70  Evaluation  Datasets  Slashdot Zoo  Graph of users in Slashdot.org  Friend and Foe relationships  Gold Standard = list of Foes of the special user No_More_Trolls  Metrics of the dataset  71,500 users in total  24% negative edges  96 known trolls  Source set: CmdrTaco and his friends  6 users in total
  • 71. PolarityTrust 71  Evaluation  Baselines  EigenTrust [Kamvar et al. 2003]  It does not take into account negative opinions  Fans Minus Freaks  (Number of friends – Number of foes)  Signed Spectral Ranking [Kunegis et al. 2009]  Negative Ranking [Kunegis et al. 2009]
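Of the baselines listed on this slide, Fans Minus Freaks is simple enough to sketch directly from its definition (number of friends minus number of foes). The edge representation, a mapping from `(voter, target)` to a signed value, and the toy data are assumptions made for illustration.

```python
from collections import Counter

def fans_minus_freaks(votes):
    """Score each user as (incoming friend links) - (incoming foe links)."""
    score = Counter()
    for (voter, target), sign in votes.items():
        score[voter] += 0  # make sure users without incoming links are listed
        score[target] += 1 if sign > 0 else -1
    return dict(score)

# Hypothetical friend (+1) and foe (-1) links.
votes = {("a", "b"): 1, ("c", "b"): 1, ("a", "troll"): -1, ("b", "troll"): -1}
scores = fans_minus_freaks(votes)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because it only counts direct links, this baseline ignores who is voting, which is what the propagation-based methods try to improve on.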
  • 72. PolarityTrust 72  Evaluation  Results: Randomly generated datasets (nDCG)
      Threats   ET      FmF     SR      NR      PTNN    PTAR    PT
      A         0.833   0.843   0.599   0.749   0.876   0.906   0.987
      AB        0.833   0.844   0.811   0.920   0.876   0.906   0.987
      ABC       0.842   0.719   0.816   0.920   0.877   0.903   0.984
      ABCD      0.823   0.723   0.818   0.937   0.879   0.903   0.984
      ABCDE     0.753   0.777   0.877   0.933   0.966   0.862   0.982
    Threats: A: No strategies; B: Orchestrated attack; C: Camouflage behind good behaviors; D: Malicious spies; E: Camouflage behind judgments
    Methods: ET: EigenTrust; FmF: Fans Minus Freaks; SR: Spectral Ranking; NR: Negative Ranking; PTNN: Non-Negative Propagation; PTAR: Action-Reaction Propagation; PT: PolarityTrust
  • 73. PolarityTrust 73  Evaluation  Results: Slashdot Zoo dataset
                ET      FmF     SR      NR      PTNN    PTAR    PT
      nDCG      0.310   0.460   0.479   0.477   0.593   0.570   0.588
    Methods: ET: EigenTrust; FmF: Fans Minus Freaks; SR: Spectral Ranking; NR: Negative Ranking; PTNN: Non-Negative Propagation; PTAR: Action-Reaction Propagation; PT: PolarityTrust
  • 74. PolarityTrust 74  Evaluation  Results: Trolling Slashdot (nDCG)
      Threats   ET      FmF     SR      NR      PTNN    PTAR    PT
      A         0.310   0.460   0.479   0.477   0.593   0.570   0.588
      AB        0.308   0.460   0.478   0.477   0.593   0.570   0.588
      ABC       0.311   0.460   0.474   0.484   0.593   0.570   0.588
      ABCD      0.370   0.476   0.501   0.501   0.580   0.570   0.586
      ABCDE     0.370   0.475   0.501   0.496   0.580   0.574   0.588
    Threats: A: No strategies; B: Orchestrated attack; C: Camouflage behind good behaviors; D: Malicious spies; E: Camouflage behind judgments
    Methods: ET: EigenTrust; FmF: Fans Minus Freaks; SR: Spectral Ranking; NR: Negative Ranking; PTNN: Non-Negative Propagation; PTAR: Action-Reaction Propagation; PT: PolarityTrust
  • 75. PolarityTrust 75  Evaluation  Include a set of sources of distrust  In Slashdot Zoo Dataset:  Sources of trust: CmdrTaco and friends  Sources of distrust: 5 random foes of No_More_Trolls  Many possible methods to choose the sources of distrust
  • 76. PolarityTrust 76  Evaluation  Results: Sources of trust and distrust (nDCG)
                Sources of Trust          Sources of Trust & Distrust
      Threats   PTNN    PTAR    PT        PTNN    PTAR    PT
      A         0.593   0.570   0.588     0.846   0.790   0.846
      AB        0.593   0.570   0.588     0.846   0.790   0.846
      ABC       0.593   0.570   0.588     0.846   0.790   0.846
      ABCD      0.580   0.570   0.586     0.775   0.739   0.782
      ABCDE     0.580   0.574   0.588     0.774   0.741   0.781
    Threats: A: No strategies; B: Orchestrated attack; C: Camouflage behind good behaviors; D: Malicious spies; E: Camouflage behind judgments
    Methods: PTNN: Non-Negative Propagation; PTAR: Action-Reaction Propagation; PT: PolarityTrust
  • 78. Conclusions 78  Final Remarks  Development of two systems for the detection of dishonest behaviors in on-line networks  Web Spam Detection: PolaritySpam  Trust and Reputation: PolarityTrust  Propagation of some a-priori information  Web Spam: Textual content of the web pages  Trust and Reputation: Trust and distrust sources sets
  • 79. Conclusions 79  Final Remarks  Web Spam Detection  Unlike existing approaches, includes content-based knowledge in a link-based technique  Unsupervised methods for the selection of sources  Propagate information of the sources through the network  Two simple metrics improve state-of-the-art methods
  • 80. Conclusions 80  Final Remarks  Trust and Reputation in social networks  Negative links improve the discriminative ability of TRS’s  Propagation strategies to deal with different attacks against a TRS  Non-Negative propagation  Action-Reaction propagation  Interrelated scores modeling the transitivity of trust and distrust  Flexible to be adapted to different situations and
  • 81. Conclusions 81  Future Work  PolaritySpam  Applicability of more content-based metrics  Additional methods for the selection of sources  Propagation ability of each source  Infer negative relations between web pages  According to their textual content  Apply similar propagation schemas as in PolarityTrust
  • 82. Conclusions 82  Future Work  PolarityTrust  Study other possible attacks  Playbook sequences (omniscience of the attackers)  Analyze the casuistry of the different social networks  Selection of sources of trust and distrust  Link-based methods  Study other contexts with positive and negative relations:  Trending topics  Authorities in the blogosphere
  • 83. Conclusions 83  Future Work  Both techniques  Study of the parallelization of both algorithms  Many works on the parallelization of PageRank  Saving time and memory  Detection of Spam on the social networks  Spam messages and spam user accounts  Recommender Systems  NLP and Opinion Mining techniques in a link-based system  Use the positive and negative information
  • 84. Curriculum Vitae 84  Academic and Research milestones  2006: Degree in Computer Science  2006: Funded Student in the Itálica research group  2008: Master of Advanced Studies:  “STR: A graph-based tagger generator”  2010: Research stay at the University of Glasgow  IR Group (Dr. Iadh Ounis and Dr. Craig Macdonald)
  • 85. Curriculum Vitae 85  26 contributions to conferences and journals  5 JCR  10 International Conferences  2 CORE B  4 CORE C  4 ISI Proceedings  3 Lecture Notes in Computer Science  3 CiteSeer Venue Impact Ratings  Research projects
  • 86. Curriculum Vitae 86  Contributions related to the thesis  [Diagram, legend: National Conf., International Conf., JCR. Items: TextRank for Tagging, System Combination Methods, Improving a Tagger Generator in IE, STR, PolarityRank, Web Spam Detection, PolarityTrust, PolaritySpam]
  • 87. Curriculum Vitae 87  Contributions related to the thesis  [Diagram, legend: National Conf., International Conf., JCR. Items: TextRank for Tagging, System Combination Methods, Improving a Tagger Generator in IE]  TextRank como motor de aprendizaje en tareas de etiquetado, SEPLN 2006  Bootstrapping Applied to a Corpus Generation Task, EUROCAST 2007  Improving the Performance of a Tagger Generator in an Information Extraction Application, Journal of Universal Computer Science (2007)
  • 88. Curriculum Vitae 88  Contributions related to the thesis  [Diagram, legend: National Conf., International Conf., JCR. Items: STR]  STR: A Graph-based Tagging Technique, International Journal on Artificial Intelligence Tools (2011)
  • 89. Curriculum Vitae 89  Contributions related to the thesis  [Diagram, legend: National Conf., International Conf., JCR. Items: PolarityRank, Web Spam Detection]  A Knowledge-Rich Approach to Featured-based Opinion Extraction from Product Reviews, SMUC 2010 (CIKM 2010)  Combining Textual Content and Hyperlinks in Web Spam Detection, NLDB 2010
  • 90. Curriculum Vitae 90  Contributions related to the thesis  [Diagram, legend: National Conf., International Conf., JCR. Items: PolarityTrust, PolaritySpam]  PolarityTrust: Measuring Trust and Reputation in Social Networks, ITA 2011  PolaritySpam: Propagating Content-based Information Through a Web Graph to Detect Web Spam, International Journal of Innovative Computing, Information and Control (2012)
  • 91. DETECTION OF DISHONEST BEHAVIORS IN ON-LINE NETWORKS USING GRAPH-BASED RANKING TECHNIQUES Francisco Javier Ortega Rodríguez Supervised by Prof. Dr. José Antonio Troyano Jiménez