Web Science & Technologies
                           University of Koblenz ▪ Landau, Germany



          Managing Social Communities


                    Steffen Staab

Acknowledgements to ROBUST Project team & WEST Team,
                         in particular
         K. Dellschaft, J. Kunegis, F. Schwagereit
Institut WeST – Web Science & Technologies




Semantic Web     Web Retrieval        Interactive Web      Multimedia Web       Software Web




  eGovernment             eMedia            eScience         eOrganizations        ePerson


      Institute for Computer                 Institute for                Leibniz Institute for
              Science                   Information Systems             Social Sciences (GESIS)
                     Steffen Staab               Web Science Doctoral
                     staab@uni-koblenz.de         Summer School 2
Plan for this Talk




                             1 Web

                        2 Science

Social Communities

        …are everywhere




Risks                                Opportunities
            Bad content quality,                     Open innovation,
            social ill behavior,…                     improved user support,…
             jeopardize business value                increase business value




        Data Storage                                              Content, User &
      and Processing                                              Networks Analysis
Scalability, heterogeneity                                        Understanding,
                                                                         response time




                                  Business Value
Product support & innovation, CRM, Expertise management, Marketing, Advertising
            Online Communities                  Intranet, Extranet, Internet
Large-scale Testbeds

SAP (B2B): Community Network, Business Partner Network, CRM for IT
   2009: 1.5M users, 150K accesses/day
   2013: 5M users, 1200K accesses/day

Polecat (C2C): Online Marketing
   2009: …
   2013: millions of posts/day, 1TB data/day

IBM (E2E): Developer Network, Corporate Knowledge Management
   2009: 99K accounts
   2013: 800K accounts
SAP Business Partner Use Case

SAP Developer Network


        Posts per day            Size of user generated content (posts)     Number of users
        2007   2009   2013       2007   2009   2013                         2007   2009   2013

 SAP    5000   6000   7000       1M     4M     10.0M                        1M     1.7M   4.8M




ROBUST: IBM Employee Use Case
Business Data                    Created per day                           Number of users

                          2007        2009         2013            2007         2009         2013

IBM Activities Entry      700         2750         5000            53200       143600    200000

IBM Blogs Entries         120          30           60             34600       77750     100000

IBM Communities             3          23           50             3000        181950    250000

IBM Bookmarks             800         900          1000            8500        22400         50000

IBM Wikis                 NA           40          100              NA         35450     100000

IBM Files                 NA          290          1000             NA         45160     100000
IBM Overall               1623        4033         7210          500000*      500000*    500000*




Risks in Online Communities
Definition: Risk
       Probability (likelihood) of an event occurring
       Impact of the event occurring

Risk management
       Process for managing costs, benefits and likelihoods
       Detect high-impact risks in time, even if they generate expensive false alarms
       Ignore very low-impact risks, even if they can be reliably detected

Example, SAP: SCN Award Points Scamming
       • Experts' reputation decreases
       • Business users leave the forum

Types of risks
       Non-compliance with the community policies/polity
       Scamming or spamming behavior
       Lower involvement and productivity
       Decrease of user satisfaction
       Loss of community dynamics

Example, Web: Public communities
       • Death of TechCrunch forum due to spam and lack of management

Loss of 1% experts → loss of high revenue
Loss of 10% lurkers → low impact
Communities: dynamics and confidentiality

ROBUST supports decision making for users, hosts and service providers
Managing growth & decline
      Identify, encourage, safeguard core users
      Social matching
      Define/maintain etiquette and policies
      Manage negative behavior and conflicts
      Content matching
      Recognize, categorize decline and growth
      Redirect users to other communities
Merging communities
    Cross community topic detection to stimulate inter-community interactions
Splitting communities
    Identification of clusters/compartments of members that can be separated




Agenda

• Risks and Opportunities in Social Communities:
  the ROBUST project

• Many related Talks in this Summer School


Robust partners
Alani: Monitoring and analysis of social networks
Karnstedt: User churn

Closely related
Greene: Network Analysis
Bernstein: Scalable infrastructures


But here comes the biased account from work in our institute

Plan for this Talk




                             1 Web

                        2 Science

Image of a black hole




Image: Flickr, cc, Jan 7 2009, by thebadastronomer
Agenda

• Risks and Opportunities in Social Communities:
  the ROBUST project

• Web Science Methodology:
  An explanation by analogy with Physics
  and some initial (!) applications to online communities
   • Modeling dynamic system at micro level,
     Understanding collective effects (macro level) arising
     from individual behavior (micro level)
   • Predicting dynamic system behavior,
     recognizing behavior deviating from the model
   • Modeling dynamic system behavior at the macro level
   • Controlling dynamic system behavior by collective action
Better understanding of the tagging process
     Cooperative classification of resources
     Which factors influence the tagging process?
       • Background knowledge of the user?
       • Tag assignments of other users?




   Hypothesis: Tagging involves imitation of other users AND
    selection of tags from background knowledge of users.
Methodology
[Diagram: In the real world, the user interface, the user's own knowledge, shared terminology, and possibly something else shape the conceptualization that produces tagging behavior. In the model, a model of user interface influence, a model of own knowledge, and a model of sharing form a joint stochastic model that produces simulated tagging behavior. Real and simulated tagging behavior are linked by a comparison of statistics.]
Components of Analysis

Properties of Tag Streams (observations in the real world)
    Stream view of Folksonomies
    Co-occurrence streams
    Resource streams

Dynamic model for Tagging Systems (stochastic models of influence)
    Simulating background knowledge
    Simulating tag imitation

Simulation Results (which models best fit reality?)
    Co-occurrence streams
    Resource streams
Stream Views of a Folksonomy

Folksonomies:
    Vertices: Users, tags, resources
    Edges: Tag assignments
    Postings:
      • Tag assignments of a user to a single resource
      • Can be ordered according to their time-stamp




Co-occurrence Streams

Co-occurrence Streams:
   All tags co-occurring with a given tag in a posting
   Ordered by posting time
Co-occurrence stream for 'apple':
   {mackz, r1, {apple, tree}, 13:25}
   {klaasd, r2, {apple, mac, ibook}, 13:26}
   {mackz, r2, {apple, macintosh, stevejobs}, 13:27}
   → tree, mac, ibook, macintosh, stevejobs

           Tag          |Y|              |U|            |T|            |R|
           ajax     2,949,614          88,526         41,898         71,525
           blog     6,098,471         158,578        186,043        557,017
           xml        974,866          44,326         31,998         61,843
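
The 'apple' example above can be reproduced with a few lines of code. This is a minimal sketch, not the authors' implementation: the `Posting` type and the function name `cooccurrence_stream` are hypothetical, and tags are assumed to keep the order in which they appear in the posting.

```python
from typing import NamedTuple


class Posting(NamedTuple):
    """One posting: a user assigns a set of tags to a single resource."""
    user: str
    resource: str
    tags: tuple  # tags in the order they were entered (assumption)
    time: str    # "HH:MM"; sorts chronologically as a string within one day


def cooccurrence_stream(postings, tag):
    """Return all tags co-occurring with `tag`, ordered by posting time."""
    stream = []
    for posting in sorted(postings, key=lambda p: p.time):
        if tag in posting.tags:
            stream.extend(t for t in posting.tags if t != tag)
    return stream


postings = [
    Posting("mackz", "r1", ("apple", "tree"), "13:25"),
    Posting("klaasd", "r2", ("apple", "mac", "ibook"), "13:26"),
    Posting("mackz", "r2", ("apple", "macintosh", "stevejobs"), "13:27"),
]

print(cooccurrence_stream(postings, "apple"))
# → ['tree', 'mac', 'ibook', 'macintosh', 'stevejobs']
```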




Properties of Co-occurrence Streams – Tag Growth


[Figure: tag growth in co-occurrence streams; plot annotation: linear growth]
Properties of Co-occurrence Streams – Tag Frequencies

[Figure: tag frequencies in co-occurrence streams; plot annotation: power law]
Resource Streams

Resource Streams:
   All tags assigned to a resource
   Ordered by posting time
Resource stream for 'r2':
   {mackz, r1, {apple, tree}, 13:25}
   {klaasd, r2, {apple, mac, ibook}, 13:26}
   {mackz, r2, {apple, macintosh, stevejobs}, 13:27}
   → apple, mac, ibook, apple, macintosh, stevejobs
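
The resource stream for 'r2' can be derived in the same way. The sketch below is illustrative only: postings are plain tuples, the function name `resource_stream` is hypothetical, and tags are assumed to keep their order within a posting.

```python
def resource_stream(postings, resource):
    """Return all tags assigned to `resource`, ordered by posting time.

    Each posting is a tuple (user, resource, tags, time).
    Unlike a co-occurrence stream, repeated tags are kept.
    """
    stream = []
    for _user, res, tags, _time in sorted(postings, key=lambda p: p[3]):
        if res == resource:
            stream.extend(tags)
    return stream


postings = [
    ("mackz", "r1", ("apple", "tree"), "13:25"),
    ("klaasd", "r2", ("apple", "mac", "ibook"), "13:26"),
    ("mackz", "r2", ("apple", "macintosh", "stevejobs"), "13:27"),
]

print(resource_stream(postings, "r2"))
# → ['apple', 'mac', 'ibook', 'apple', 'macintosh', 'stevejobs']
```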




Properties of Resource Streams – Tag Frequencies




Properties of Resource Streams – Tag Frequencies








Simulating the Evolution of Tag Streams
Simulating tag streams

User's question: Which of my concepts represent this web page? How do I tag this web page?

Inspiration for conceptualization from:
   1. Most popular tags
   2. Most recently used tags
   3. Tags used for this resource
   4. Tags co-occurring with similar text documents
   5. Creating completely new tags
   6. …

Which combination of inspirations produces the same statistics as those observed for Delicious?
The Delicious User Interface




Imitating previous tag assignments:
    Recommended tags: Intersection of tags of a user and tags already
     assigned to the resource.
    Your tags: Tags of the user.
    Popular tags: 7 most popular tags assigned to the resource.
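
The three tag lists described above map onto simple set and counting operations. The following is a sketch of that description only, not of how Delicious actually computed its suggestions; the function names are hypothetical.

```python
from collections import Counter


def recommended_tags(user_tags, resource_assignments):
    """Tags the user has used before AND that are already on the resource."""
    return set(user_tags) & set(resource_assignments)


def popular_tags(resource_assignments, k=7):
    """The k most frequently assigned tags of the resource (k = 7 on the slide)."""
    return [tag for tag, _count in Counter(resource_assignments).most_common(k)]


# Tag assignments already made to one resource (with repetitions)
assignments = ["apple", "mac", "ibook", "apple", "macintosh", "stevejobs"]
# "Your tags": everything this user has used before
user_tags = {"apple", "fruit", "mac"}

print(sorted(recommended_tags(user_tags, assignments)))  # → ['apple', 'mac']
print(popular_tags(assignments)[0])                      # → apple
```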

Simulating a Tag Stream

Start with empty tag stream
Each simulation step appends a new tag assignment
Simulation of a single tag assignment:

                     p(w|t): Probability of selecting word w for topic t.
                     Modeled by word distributions in a topic centered
                     text corpus.




                    n: Number of visible previous tags.
                    h: Maximal number of previous tag assignments
                    used for determining ranking of the n distinct tags.


Modeling Background Knowledge




   Text Corpora                              Del.icio.us
                       Text Corpora



PBK: Probability of selecting from background knowledge
    p(w|t): Probability of selecting word w for topic t. Modeled by word
     distributions in a topic centered text corpus.
    p(w|r): Probability of selecting word w for resource r.
Modeling Tag Imitation




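[Diagram: with probability PBK, the new tag is drawn from background knowledge; with probability 1 - PBK, it is chosen among the n top-ranked tags (ranks 1, 2, 3, …, n) computed from the previous tag assignments t-1, t-2, …, t-h]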



PI = 1 – PBK: Probability of imitating a previous tag assignment
     n: Number of visible top-ranked tags
     h: Maximal number of previous tag assignments used for determining
       ranking of the n distinct tags
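
A single simulation run can be sketched from the parameters PBK, n and h defined above. This is a simplified illustration, not the published model: the function name is hypothetical, the background distribution p(w|t) is passed in as a plain dictionary, and imitation is assumed to pick among the n top-ranked tags with probability proportional to their frequency in the last h assignments (the slides do not state the weighting).

```python
import random
from collections import Counter


def simulate_tag_stream(steps, p_bk, background, n, h, seed=0):
    """Simulate a tag stream of length `steps`.

    p_bk:       probability of selecting from background knowledge (PBK)
    background: dict mapping word w to its probability p(w|t)
    n:          number of visible top-ranked tags
    h:          number of previous tag assignments used for the ranking
    """
    rng = random.Random(seed)
    words = list(background)
    weights = [background[w] for w in words]
    stream = []  # start with an empty tag stream
    for _ in range(steps):
        recent = stream[-h:] if h > 0 else []
        if not recent or rng.random() < p_bk:
            # Select a word from background knowledge according to p(w|t)
            tag = rng.choices(words, weights=weights, k=1)[0]
        else:
            # Imitate: rank distinct tags of the last h assignments, keep top n.
            # Frequency-proportional choice is an assumption of this sketch.
            top = Counter(recent).most_common(n)
            tags, counts = zip(*top)
            tag = rng.choices(tags, weights=counts, k=1)[0]
        stream.append(tag)
    return stream


background = {"apple": 0.4, "mac": 0.3, "ibook": 0.2, "tree": 0.1}
stream = simulate_tag_stream(steps=1000, p_bk=0.2, background=background, n=7, h=100)
print(Counter(stream).most_common(3))
```

With `p_bk=0.2`, about 80% of the steps imitate earlier assignments, which is inside the 70-90% range the talk reports later for real tag streams.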






Simulation Results
Overall Scheme
[Diagram, as on the Methodology slide: In the real world, the user interface, the user's own knowledge, shared terminology, and possibly something else shape the conceptualization that produces tagging behavior. In the model, a model of user interface influence, a model of own knowledge, and a model of sharing form a joint stochastic model that produces simulated tagging behavior. Real and simulated tagging behavior are linked by a comparison of statistics.]
Simulating Co-occurrence Streams

Tag growth:
   Influenced by PBK and p(w|t)

Tag Frequencies:
   Influenced by PBK, p(w|t), n, h
   n: Semantic breadth of a topic (blog: 100 tags,
     ajax: 50 tags, xml: 50 tags; Cattuto et al. 2007)
   h: No hint for realistic values. Good guesses may be 500
     and 1000.




Co-occ. Streams – Simulated Tag Growth




Co-occ. Stream – Simulated Tag Frequencies




Simulating Resource Streams

PI and PBK: Values comparable to co-occurrence streams
p(w|r): Approximated by p(w|t)
n: 7 tags are visible (cf. Delicious user interface)
h: Smaller value than for co-occurrence streams




Res. Streams – Simulated Tag Frequencies




Lessons learned                                   [Dellschaft+Staab, ACM Hypertext 2008]

Black holes do not only eat mass; they also dissolve by emitting radiation

Imitation AND background knowledge are needed for
  explaining properties of tag streams
Probability of imitating previous tag assignments: ~70-90%

                                 Frequency Rank         Frequency Rank
   Model                        (Co-occur. Streams)    (Resource Streams)    Tag Growth
   Polya Urn Model                      o                      o             fixed size
   Simon Model                          o                      o             linear
   YS Model w/ Memory                   +                      o             linear
   Halpin et al. Model                  o                      o             linear
   Our Model (Epistemic Model)          +                      +             power-law
Solar System




[Image: solar system with Jupiter, Saturn, Uranus and Neptune. Flickr, cc, Sep 1 2008, by Image Editor]
Agenda

• Risks and Opportunities in Social Communities:
  the ROBUST project

• Web Science Methodology:
  An explanation by analogy with Physics
  and some initial (!) applications to online communities
   • Modeling dynamic system at micro level,
     Understanding collective effects (macro level) arising from
     individual behavior (micro level)
   • Predicting dynamic system behavior,
     recognizing behavior deviating from the model
   • Modeling dynamic system behavior at the macro level
   • Controlling dynamic system behavior by collective action
Overall Scheme
[Diagram, as on the Methodology slide: In the real world, the user interface, the user's own knowledge, shared terminology, and possibly something else shape the conceptualization that produces tagging behavior. In the model, a model of user interface influence, a model of own knowledge, and a model of sharing form a joint stochastic model that produces simulated tagging behavior. Real and simulated tagging behavior are linked by a comparison of statistics.]
What is our Uranus?

                   What is this?




Uranus = Spam                                     [Dellschaft+Staab, WebSci 2010]
Effect of removing 257 spammers of 12,777 users from the ‘bookmark’ stream




Why care? The Bibsonomy Example

 Complete snapshot of Bibsonomy system
 Manually labeled ground truth of spammers in the data set



                  Users               Tags          Resources         TAS
Spammers              29,248          297,846           1,197,354   13,258,759
Non-Spammers           2,467           61,154             234,143     816,196




Why care? The Delicious Example

Crawled during the TAGora Project
       Users                 Tags         Resources              TAS
      532,938             2,482,850       18,778,566          140,305,446



Number of spammers not known exactly
Estimation based on random sample of 500 users:
   With 95% probability: Between 1,972 and 12,949 spammers
   Delicious most likely already applies spam detection
   Why care about ~ 1.5% spammers in Delicious?
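
An estimate of this kind can be sketched as a confidence interval for a proportion, scaled to the total number of users. The slide states neither how many spammers were found in the sample nor which interval method was used, so the count `K_HYPOTHETICAL` below is purely illustrative and the Wilson score interval is a swapped-in standard technique; the result is not expected to reproduce the slide's bounds.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion; z = 1.96 gives about 95%."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


TOTAL_USERS = 532_938   # Delicious users in the TAGora crawl (from the slide)
SAMPLE_SIZE = 500       # random sample size (from the slide)
K_HYPOTHETICAL = 5      # spammers found in the sample: NOT from the slide

low, high = wilson_interval(K_HYPOTHETICAL, SAMPLE_SIZE)
print(f"share of spammers: {low:.4f} to {high:.4f}")
print(f"number of spammers: {round(low * TOTAL_USERS)} to {round(high * TOTAL_USERS)}")
```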




Filtering Results (Users)

[Bar chart: Number of Spammers and Non-Spammers; y-axis from 0 to 16,000; x-axis from 1 to 20; series: Spammer, Non-Spammer]
Filtering Results (Tag Assignments)

[Bar chart: Filtered and unfiltered number of TAS; y-axis from 0 to 450,000; x-axis from 1 to 20; series: Spam, Non-Spam]
That’s why

Effect of removing 257 spammers of 12,777 users from the ‘bookmark’ stream
How statistically significant is the epistemic model for
normal users?




Lessons learned

Neptune was discovered because it perturbed the orbit of Uranus
The search that led to Pluto was prompted by apparent perturbations of Uranus!

Spammers can be discovered by their behavior,
  even if you do not know what kind of spam they are
  producing!




How do constellations in the sky evolve?




     http://www.flickr.com/photos/furious-angel/2142647358/sizes/o/in/photostream/
Agenda

• Risks and Opportunities in Social Communities:
  the ROBUST project

• Web Science Methodology:
  An explanation by analogy with Physics
  and some initial (!) applications to online communities
   • Modeling dynamic system at micro level,
     Understanding collective effects (macro level) arising from
     individual behavior (micro level)
   • Predicting dynamic system behavior,
     recognizing behavior deviating from the model
   • Modeling dynamic system behavior at the macro level
   • Controlling dynamic system behavior by collective action
Example: Network




            Person                 Friendship




SUGGESTING WHOM TO LINK TO NEXT
Use Networks for Recommendation


                                                            :-(


    me


   Goal: Predict who a person will add as friend

   Facebook's algorithm: find friends-of-friends

     → Problem: Rest of the network is ignored!

Algebraic Graph Theory


[Graph: nodes 1 to 6 with edges 1-2, 2-3, 2-4, 3-4, 4-5, 5-6]

Represent a network by an adjacency matrix A:

$A_{ij} = 1$ when i and j are connected
$A_{ij} = 0$ when i and j are not connected

            1   2   3   4   5   6
        1   0   1   0   0   0   0
        2   1   0   1   1   0   0
A  =    3   0   1   0   1   0   0
        4   0   1   1   0   1   0
        5   0   0   0   1   0   1
        6   0   0   0   0   1   0

A is square and symmetric.
Baseline: Friend of a Friend Model

Count the number of ways a person can be found as
the friend of a friend.

Consider the matrix product $AA = A^2$:

          $A$                          $A^2$
0   1   0   0   0   0          1   0   1   1   0   0
1   0   1   1   0   0          0   3   1   1   1   0
0   1   0   1   0   0          1   1   2   1   1   0
0   1   1   0   1   0          1   1   1   3   0   1
0   0   0   1   0   1          0   1   1   0   2   0
0   0   0   0   1   0          0   0   0   1   0   1
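
The friend-of-a-friend counts can be checked numerically. The sketch below uses NumPy and the six-node example graph; the helper `fof_candidates` is a hypothetical name introduced here for illustration.

```python
import numpy as np

# Adjacency matrix of the example graph (nodes 1 to 6)
A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
])

# Entry (i, j) of A @ A counts the paths of length 2 between i and j,
# i.e. the number of common friends; the diagonal holds the node degrees.
A2 = A @ A


def fof_candidates(A, person):
    """Non-friends of `person` (1-based) with at least one common friend,
    sorted by the number of common friends."""
    i = person - 1
    counts = (A @ A)[i]
    found = [(j + 1, int(counts[j])) for j in range(len(A))
             if j != i and A[i, j] == 0 and counts[j] > 0]
    return sorted(found, key=lambda c: -c[1])


print(A2[1])                  # → [0 3 1 1 1 0]
print(fof_candidates(A, 1))   # → [(3, 1), (4, 1)]
```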




Eigenvalue Decomposition



Write the matrix A as a product:


                           $A = U \Lambda U^T$

where
U contains the eigenvectors           $U^T U = I$
Λ contains the eigenvalues            $\Lambda_{ij} = 0$ when $i \neq j$
Computing $A^2$


Use the eigenvalue decomposition $A = U \Lambda U^T$:

              $A^2 = U \Lambda U^T U \Lambda U^T = U \Lambda^2 U^T$

Exploit U and Λ:
 $U^T U = I$                             because U contains eigenvectors
 $(\Lambda^2)_{ii} = (\Lambda_{ii})^2$   because Λ contains eigenvalues

Result: Just square all eigenvalues!
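
The identity can be verified numerically for the example graph. This is a small check using `numpy.linalg.eigh`, which is suited to symmetric matrices such as A; the variable names are chosen here for illustration.

```python
import numpy as np

A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
], dtype=float)

# A is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors
eigvals, U = np.linalg.eigh(A)

# U^T U = I
print(np.allclose(U.T @ U, np.eye(6)))               # → True

# Squaring the eigenvalues gives A^2 without multiplying A by itself
A2_spectral = U @ np.diag(eigvals ** 2) @ U.T
print(np.allclose(A2_spectral, A @ A))               # → True
```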



Friend of a Friend of a Friend
[Graph: nodes 1 to 6 with edges 1-2, 2-3, 2-4, 3-4, 4-5, 5-6]

Compute the number of friends-of-friends-of-friends:

          $A$                          $A^3$
0   1   0   0   0   0          0   3   1   1   1   0
1   0   1   1   0   0          3   2   4   5   1   1
0   1   0   1   0   0          1   4   2   4   1   1
0   1   1   0   1   0          1   5   4   2   4   0
0   0   0   1   0   1          1   1   1   4   0   2
0   0   0   0   1   0          0   1   1   0   2   0

$A^3 = U \Lambda U^T \, U \Lambda U^T \, U \Lambda U^T = U \Lambda^3 U^T$
Matrix Exponential
[Graph: nodes 1 to 7 with edges 1-2, 2-3, 2-4, 3-4, 4-5, 5-6, 6-7; scores relative to node 4: node 1: 0.98, node 6: 0.76, node 7: 0.22]

The matrix exponential can be written as a power
sum with decreasing coefficients:

       $\exp(A) = I + A + \frac{1}{2} A^2 + \frac{1}{6} A^3 + \dots$

A =
      0   1   0   0   0   0   0
      1   0   1   1   0   0   0
      0   1   0   1   0   0   0
      0   1   1   0   1   0   0
      0   0   0   1   0   1   0
      0   0   0   0   1   0   1
      0   0   0   0   0   1   0

exp(A) =
             1      2      3      4      5      6      7
      1    1.66   1.72   0.93   0.98   0.28   0.06   0.01
      2    1.72   3.57   2.70   2.93   1.04   0.29   0.06
      3    0.93   2.70   2.86   2.71   0.99   0.28   0.06
      4    0.98   2.93   2.71   3.63   1.95   0.76   0.22
      5    0.28   1.04   0.99   1.95   2.35   1.59   0.64
      6    0.06   0.29   0.28   0.76   1.59   2.23   1.38
      7    0.01   0.06   0.06   0.22   0.64   1.38   1.59

Recommendations for user ④: ① > ⑥ > ⑦
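
The recommendation for user 4 can be reproduced with a short script. The sketch below computes exp(A) through the eigenvalue decomposition, so it needs only NumPy; the function name `recommend` is hypothetical.

```python
import numpy as np

# Seven-node example graph
edges = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6), (6, 7)]
n = 7
A = np.zeros((n, n))
for i, j in edges:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

# exp(A) = U exp(Λ) U^T: apply exp to each eigenvalue
eigvals, U = np.linalg.eigh(A)
expA = U @ np.diag(np.exp(eigvals)) @ U.T


def recommend(user):
    """Rank all non-friends of `user` (1-based) by their exp(A) score."""
    i = user - 1
    candidates = [j for j in range(n) if j != i and A[i, j] == 0]
    return [j + 1 for j in sorted(candidates, key=lambda j: -expA[i, j])]


print(recommend(4))  # → [1, 6, 7]
```

Unlike the friend-of-a-friend baseline, node 7 receives a score although it shares no friend with node 4, because longer paths contribute with smaller weights.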
Why the Matrix Exponential


$A^n$
     = Number of paths of length n

$aA^2 + bA^3 + cA^4 + \dots$
  = Number of paths, weighted by path length

→ New edges more likely to appear when there are
many paths already

→ When a > b > c > . . . > 0, short paths are
weighted more

Computing Power Series

Let p(A) be a power series:

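$$
\begin{aligned}
p(A) &= aA^2 + bA^3 + cA^4 + \dots \\
     &= aU\Lambda^2U^T + bU\Lambda^3U^T + cU\Lambda^4U^T + \dots \\
     &= U(a\Lambda^2 + b\Lambda^3 + c\Lambda^4 + \dots)U^T \\
     &= U\,p(\Lambda)\,U^T
\end{aligned}
$$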


Therefore:

     Power series change only the eigenvalues!
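
The identity can be checked by hand on a tiny example. This is a minimal sketch, assuming the two-node graph with a single edge, whose eigendecomposition is known in closed form (eigenvalues +1 and -1); the coefficients a, b, c are arbitrary illustrative values with a > b > c > 0.

```python
import math

# Two nodes joined by one edge. Its eigenvectors are (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
A = [[0.0, 1.0], [1.0, 0.0]]
s = 1.0 / math.sqrt(2.0)
U = [[s, s], [s, -s]]          # columns are the eigenvectors
eigenvalues = [1.0, -1.0]

a, b, c = 0.5, 0.25, 0.125     # illustrative coefficients


def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def p_scalar(x):
    """The power series p applied to a single eigenvalue."""
    return a * x ** 2 + b * x ** 3 + c * x ** 4


# Direct evaluation: p(A) = a A^2 + b A^3 + c A^4.
A2 = matmul(A, A)
A3 = matmul(A2, A)
A4 = matmul(A3, A)
direct = [[a * A2[i][j] + b * A3[i][j] + c * A4[i][j] for j in range(2)]
          for i in range(2)]

# Spectral evaluation: U p(Lambda) U^T, where only the eigenvalues are transformed.
p_lambda = [[p_scalar(eigenvalues[0]), 0.0], [0.0, p_scalar(eigenvalues[1])]]
U_T = [[U[j][i] for j in range(2)] for i in range(2)]
spectral = matmul(matmul(U, p_lambda), U_T)

print(direct)
print(spectral)
```

Both routes give the same matrix up to floating-point rounding, while the eigenvectors U are never modified.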

TRACKING THE EVOLUTION OF THE NETWORK AS A WHOLE
Diversity
• Many, equally-sized subcommunities
• High entropy
• ‘Flat’ structure

Regularity
• Few large subcommunities
• Low entropy
• Many ‘hubs’




Network Evolution




 • How did a network look at time t?
 • Idea: Observe the change of diversity/regularity over time




Outline




 1.   Power-law exponent
 2.   Weighted spectral distribution
 3.   Network entropy
 4.   Network rank




1. Power-law Exponent

 Number of neighbors is unevenly distributed:

 [Figure: distribution of the number of neighbors in the Epinions trust network (Massa et al. 2005)]

 $$C(n) \sim n^{-\gamma}$$




 Results in a power-law (Newman 2006)
 Higher exponent γ denotes less regularity
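
One simple way to estimate γ from data is a straight-line fit in log-log space. This is a minimal sketch, assuming C(n) denotes the number of nodes with exactly n neighbors; the least-squares fit is a crude estimator chosen for brevity and is not necessarily the method used for the plots in this talk. The function names and the synthetic data are illustrative.

```python
import math
from collections import Counter


def degree_counts(edges):
    """Return C(n): how many nodes have exactly n neighbors in an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def fit_power_law_exponent(counts):
    """Estimate gamma in C(n) ~ n^(-gamma) by a least-squares line in log-log space."""
    points = [(math.log(n), math.log(c)) for n, c in counts.items() if n > 0 and c > 0]
    if len(points) < 2:
        raise ValueError("need at least two distinct degrees to fit a line")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return -slope  # the slope of the line is -gamma


# Synthetic counts that follow C(n) = 1000 * n^(-2) exactly (illustrative data only).
synthetic = {n: 1000.0 * n ** -2 for n in range(1, 51)}
gamma = fit_power_law_exponent(synthetic)
print(round(gamma, 3))
```

On real degree distributions the tail is noisy, so the fitted value depends on the range of n that is included.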
1. Power-law Exponent over Time

                 Epinions trust network (Massa et al. 2005)




         γ shrinks ⇒ Network becomes more regular
2. Weighted Spectral Distribution

 • Consider the n×n matrix N defined by

 $$
 N_{ij} =
 \begin{cases}
 1 / \sqrt{d(i)\,d(j)} & \text{when } (i,j) \text{ is an edge} \\
 0 & \text{otherwise}
 \end{cases}
 $$

 Then the distribution of the eigenvalues of N is called the
   weighted spectral distribution (WSD) (Fay et al. 2010)

 Eigenvalues nearer to ±1: diversity
 Eigenvalues nearer to 0: regularity
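
The matrix N can be built directly from an edge list. This is a minimal sketch, assuming d(i) is the number of neighbors of node i and using a six-node example graph; the function name `normalized_adjacency` is illustrative. Instead of computing the full spectrum, which needs an eigenvalue solver, the code only checks one known fact: the vector with entries sqrt(d(i)) is an eigenvector of N with eigenvalue 1, the upper end of the range.

```python
import math

# Six-node example graph, nodes labeled 1..6 (illustrative data).
edges = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]


def normalized_adjacency(edges, n):
    """Build N with N[i][j] = 1 / sqrt(d(i) d(j)) for each edge, and 0 otherwise."""
    degree = [0] * n
    for u, v in edges:
        degree[u - 1] += 1
        degree[v - 1] += 1
    N = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        weight = 1.0 / math.sqrt(degree[u - 1] * degree[v - 1])
        N[u - 1][v - 1] = weight
        N[v - 1][u - 1] = weight
    return N, degree


N, degree = normalized_adjacency(edges, 6)

# Check that v = (sqrt(d(1)), ..., sqrt(d(n))) satisfies N v = v.
v = [math.sqrt(d) for d in degree]
Nv = [sum(N[i][j] * v[j] for j in range(6)) for i in range(6)]
print(max(abs(Nv[i] - v[i]) for i in range(6)))
```

The printed residual is zero up to rounding. The WSD itself is then the histogram of all eigenvalues of N, which lie between -1 and 1.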



2. Weighted Spectral Distribution over Time

               CiteULike user–tag network (Emamy et al. 2007)




 • The WSD shifts towards zero ⇒ Regularization: the network becomes more regular
3. Network Entropy



 • Write the graph G as a sum of subgraphs $G_k$:

 $$G = G_1 \cup G_2 \cup \dots \cup G_r$$

 • Each $G_k$ has weighted edges, with total weight $\lambda_k$
 • When picking an edge from G at random, the probability of
   it being in community $G_k$ is

 $$\frac{\lambda_k}{\lambda_1 + \lambda_2 + \dots + \lambda_r} = \frac{\lambda_k}{L}$$

 • The entropy of this distribution is (Kunegis et al. 2011)

 $$H(G) = -\sum_k \frac{\lambda_k}{L} \log \frac{\lambda_k}{L}$$

 • Entropy: Effective number of subcommunities
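
The entropy formula is easy to evaluate once the subcommunity weights are known. This is a minimal sketch, assuming the natural logarithm (the slide does not fix the base) and two made-up weight lists; the function name `network_entropy` is illustrative.

```python
import math


def network_entropy(weights):
    """H(G) = -sum_k (w_k / L) * log(w_k / L), where L is the total weight."""
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)


# Four equally heavy subcommunities versus one dominant subcommunity (illustrative data).
equal = network_entropy([5, 5, 5, 5])
skewed = network_entropy([17, 1, 1, 1])

print(equal, math.exp(equal))    # exp(H) is 4 for four equal subcommunities
print(skewed, math.exp(skewed))  # fewer "effective" subcommunities
```

With the natural logarithm, exp(H(G)) can be read as the effective number of subcommunities: it equals r when all r weights are equal and drops towards 1 as one subcommunity dominates.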



3. Network Entropy over Time

 [Figure: entropy H(G) over time t for the Enron email network (Klimt et al. 2004), shown at absolute scale and zoomed in]

 Entropy is constant ⇒ Constant number of communities

4. Network Rank

 Decompose network into subcommunities:

 $$G = G_1 \cup G_2 \cup \dots \cup G_r$$

 The rank r is a measure of diversity:

 $$\operatorname{rank}(G) = r$$

 Weighted rank:

 $$\operatorname{rank}^*(G) = \sum_k \frac{|G_k|}{|G_1|}$$
 Robust measure of diversity (Kunegis et al. 2011)
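
Both rank measures can be computed from the subcommunity sizes. This is a minimal sketch, assuming that |G_k| is the size (weight) of subcommunity k and that G_1 is the largest one; these readings, the function names and the sample data are assumptions for illustration.

```python
def rank(sizes):
    """Plain rank: the number of non-empty subcommunities."""
    return sum(1 for size in sizes if size > 0)


def weighted_rank(sizes):
    """Weighted rank: sum of all sizes divided by the size of the largest subcommunity."""
    largest = max(sizes)
    return sum(sizes) / largest


flat = [5, 5, 5, 5]      # many equally sized subcommunities
hubby = [17, 1, 1, 1]    # one dominant subcommunity

print(rank(flat), weighted_rank(flat))
print(rank(hubby), weighted_rank(hubby))
```

Both examples have plain rank 4, but the weighted rank separates them: it stays at 4 for the flat structure and falls close to 1 when a single subcommunity dominates, which is why it is the more robust measure.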
4. Network Rank over Time



 [Figure: network rank rank*(G) over time t for the Enron email network (Klimt et al. 2004)]

 • Increasing network rank: increasing diversity
 • Shrinking network rank: shrinking diversity
More Network Rank Plots




 [Figure: network rank over time for six networks: Epinions trust network, hep-th citations, Wikipedia elections, frwikibooks edits, MIT conference contacts, YouTube social network]



               (biased towards good examples of convex evolution)
Conclusion

 • Power-law exponent shrinks
    – Connection diversity shrinking
 • Weighted spectral distribution shifts to zero
    – Emerging main components
 • Entropy is constant
    – Effective number of communities is constant
 • Network rank increases, then shrinks
    – Two-phase model of expansion




Watch out!

KONECT – Koblenz Network Collection

http://uni-koblenz.de/~kunegis/paper/kunegis-konect.poster.pdf

Coming soon!

Follow #ictrobust or @kunegis or @ststaab




Why does the sky have the density it has?




Photo: Flickr, cc, Oct 14, 2007, Michael Donough
Why do tagging systems have so little spam?

[Diagram: Community Policy, linked with Administrative Process, Content Quality, User Roles and Content Process]
Agenda

• Risks and Opportunities in Social Communities:
  the ROBUST project

• Web Science Methodology:
  An explanation by analogy with Physics
  and some initial (!) applications to online communities
   • Modeling dynamic system at micro level,
     Understanding collective effects (macro level) arising from
     individual behavior (micro level)
   • Predicting dynamic system behavior,
     recognizing behavior deviating from the model
   • Modeling dynamic system behavior at the macro level
   • Controlling dynamic system behavior by collective action
Yahoo Answers




 • Ensure quality of user generated content

 • Use of administrators and community moderators: how?

 • Policy influences community processes

SURVEY OF GOVERNANCE MODELS

Communities need Governance




 Steering and coordinating actions of community members [Benz2004]

Goal: Successful and flourishing community
    • High quality user-generated content
    • Active community members
               [ http://www.flickr.com/photos/61433480@N02/5593890914/, http://www.flickr.com/photos/boojee/3733902852/ ]

Motivation

Different types of
     • Web communities
     • User-generated content (video, photos, comment, article,
       questions, answers, posting, review text)


   What are the most successful means of
   governance for user-generated content?

    → Analyze successful platforms and compare
            their means of governance!


Means of Governance

1. Direct intervention of community owner
   Affecting content or users based on apparent properties
2. Functionality of the community platform


   [Diagram: platform functionality connecting user-generated content and community members. Assessment: ratings, text reviews, bookmarks, abuse reports. Content modification and complex user roles. Selection & ranking: by ratings, score, time, views, replies; hide low quality]


Method

• Selection of the 250 most prominent web sites with community
  functionality according to Alexa Page Rank
• Clustering of the web sites into four groups according to purpose:
   – Social Media
   – Editorial News
   – Social Networking
   – Social Reviewing
• Top-5 web sites of each group analyzed (*)
Key Results

(1) Abuse Reports are a successful means of governance.
   • 16 occurrences
   • Limited to filtering out unwanted content
   • Staff needed – expensive but efficient [Schwagereit2010]




(2) Simple ratings are dominant – but battle between
    “Like” and “Like/Dislike”
   • “Like”: 9 occurrences
   • “Like/Dislike”: 7 occurrences
   • Tradeoff between simplicity and improved ranking ability


Key Results

(3) Creation time is the most frequently implemented ranking criterion
    • 18 occurrences
    • Others: score: 8, ratings: 6
    • Important content is renewed; unimportant content will be
      forgotten

(4) Content modification and user roles are rarely
   implemented
    • 2 occurrences
    • Requires a complex role system and users
      who understand it



GOVERNANCE MODEL: DEEP DIVE - SIMULATION

Methodology Principle

  1. Define a Web Community model
     (Lycos IQ, Yahoo Answers…)
  2. Adapt this model to an existing community
  3. Estimate parameters

  4. Define quality measure

  5. Simulate community behaviour

  6. Compare simulation results with real data
  7. Analyze quality measures with respect to variations of CoSiMo
     parameters

Dataset Lycos IQ




    Time Period:                                909 days
    Users:                                      34,327
    Administrators:                             36
    Questions:                                  1,031,982
    Answers:                                    2,996,446
    Deleted non-compliant Answers:              21,139




Observed parameters (input to simulation)


  [Figure: 3D histogram of the number of users (log scale, 1 to 100000) by rate of compliant answers (0.0 to 1.0, in steps of 0.1) and answers per year (in bins of 1000)]
Example Behaviors and Example Policies

 Behaviors of Ordinary Users:
     • Create new postings
     • Read existing postings
         • Report non-compliant postings
           OR give bonus points to poster

 Moderator Users:
     • Create new postings
     • Read existing postings
         • Delete non-compliant posting
           OR give bonus points to poster

 Administrators:
     • Read existing postings
         • Delete non-compliant postings

 Reading Policies for Administrators:
     PA: random selection of postings
     PB: random selection of postings that no other
         administrator has examined so far
     PC: selection of postings that were most often reported
         by users for being non-compliant

 Promotion Policy:
     PM-X: ordinary users become moderators (who can
           delete postings) when having at least X bonus points
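
The reading policies and the promotion policy can be written down as small selection functions. This is a minimal sketch for illustration only: the posting fields (`reports`, `examined`), the function names and the sample data are hypothetical and are not taken from the CoSiMo implementation.

```python
import random


def select_pa(postings, rng):
    """PA: pick any posting uniformly at random."""
    return rng.choice(postings)


def select_pb(postings, rng):
    """PB: pick a random posting that no administrator has examined so far."""
    unexamined = [p for p in postings if not p["examined"]]
    return rng.choice(unexamined) if unexamined else None


def select_pc(postings):
    """PC: pick the posting most often reported by users as non-compliant."""
    return max(postings, key=lambda p: p["reports"])


def is_promoted(bonus_points, threshold):
    """PM-X: an ordinary user becomes a moderator with at least X bonus points."""
    return bonus_points >= threshold


# Hypothetical postings with user report counts and an "already examined" flag.
postings = [
    {"id": 1, "reports": 0, "examined": True},
    {"id": 2, "reports": 3, "examined": False},
    {"id": 3, "reports": 1, "examined": True},
]
rng = random.Random(0)  # fixed seed so that runs are repeatable

print(select_pa(postings, rng)["id"])
print(select_pb(postings, rng)["id"])
print(select_pc(postings)["id"])
print(is_promoted(bonus_points=12, threshold=12))
```

In a simulation, each administrator would call one of the selection functions per time step, and `is_promoted` would be checked whenever a user receives bonus points.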
How many administrators are needed?




 [Figure: recent posting quality (0.65 to 1.05) as a function of the number of administrators (1 to 1152) and the additional non-compliant postings per day (5 to 2560)]

Fighting spam with administrators…



 [Figure: recent posting quality (0.99 to 1) as a function of the applied policies and the number of administrators (1 to 576)]

 Variation of policies and number of administrators
     • Efficient policies result in high quality content
     • A minimum of 18 administrators is needed
     • Many moderators are needed to bring the quality to a high level
Fighting spam with user moderators…

 [Figure: recent posting quality (0.6 to 1) as a function of the applied policies (PA, PA+PB, PA+PB+PC, and PA+PB+PC combined with promotion policies such as PM12, PM25, PM50, PM100, PM200, PM400, PM800) and the additional non-compliant postings per day (5 to 2560)]

  Variation of policies and posting quality
      • A limited number of administrators has limited capacity for
      filtering a surge of non-compliant postings
      • Moderators help to increase quality
Lessons Learned


 • Strategy of selecting questionable postings is crucial

 • Reporting by normal users is the most effective strategy

 • Moderators are not as effective as expected if they hunt
   only incidentally for non-compliant content

 • Sufficiently strong requirements regarding moderator
   profiles lead to high quality of moderators

 • Policies for promoting users need to be based on a
   criterion that is time dependent

Are we satisfied here? No! Not by far!

How and why do users tag or tweet?
  -> What are people's limitations that affect the system?
  -> Psychology and Sociology!

What are their legal boundaries?
 -> How can you shape the systems?
 -> Law!

What are organizations' incentives?
 -> Why and how do organizations participate?
     -> Nice example: open source
 -> Economics!




Thank You!
References

J. Kunegis, A. Lommatzsch, C. Bauckhage. The Slashdot Zoo: Mining a social network with negative edges. In: Proc. World Wide Web Conf., pp. 741–750, 2009.

J. Kunegis, A. Lommatzsch. Learning spectral graph transformations for link prediction. In: Proc. Int. Conf. on Machine Learning, pp. 561–568, 2009.

J. Kunegis, S. Schmidt, A. Lommatzsch, J. Lerner. Spectral analysis of signed graphs for clustering, prediction and visualization. In: Proc. SIAM Int. Conf. on Data Mining, pp. 559–570, 2010.

J. Kunegis, D. Fay, C. Bauckhage. Network growth and the spectral evolution model. In: Proc. Conf. on Information and Knowledge Management, pp. 739–748, 2010.

B. Viswanath, A. Mislove, M. Cha, K. P. Gummadi. On the evolution of user interaction in Facebook. In: Proc. Workshop on Online Social Networks, pp. 37–42, 2009.

K. Dellschaft, S. Staab. An Epistemic Dynamic Model for Tagging Systems. In: Proc. of HYPERTEXT 2008, 19th ACM Conference on Hypertext and Hypermedia, Pittsburgh, Pennsylvania, USA, June 19-21, 2008.

K. Dellschaft, S. Staab. On Differences in the Tagging Behavior of Spammers and Regular Users. In: Proc. of WebSci-2010, Raleigh, US, April 2010.

F. Schwagereit, S. Sizov, S. Staab. Finding Optimal Policies for Online Communities with CoSiMo. In: Proc. of WebSci-2010, Raleigh, US, April 2010.

Weitere ähnliche Inhalte

Ähnlich wie Managing Social Communities

IBM Social Business Social Software for Business Discover Expertise. Deliver ...
IBM Social Business Social Software for Business Discover Expertise. Deliver ...IBM Social Business Social Software for Business Discover Expertise. Deliver ...
IBM Social Business Social Software for Business Discover Expertise. Deliver ...Fred von Graf
 
Ijcai nyc ai summit 20140224 v1
Ijcai nyc ai summit 20140224 v1Ijcai nyc ai summit 20140224 v1
Ijcai nyc ai summit 20140224 v1ISSIP
 
Artic Startup
Artic StartupArtic Startup
Artic StartupBobsNJ
 
NYC Chalk Talk
NYC Chalk TalkNYC Chalk Talk
NYC Chalk TalkBobsNJ
 
The business case for hybrid clouds and mini pods
The business case for hybrid clouds and mini podsThe business case for hybrid clouds and mini pods
The business case for hybrid clouds and mini podsJanet Brokerage
 
Acquia Business Mandate Deck Final
Acquia Business Mandate Deck FinalAcquia Business Mandate Deck Final
Acquia Business Mandate Deck FinalAcquia
 
Clearvale Hlo En 2010 01 18
Clearvale Hlo En 2010 01 18Clearvale Hlo En 2010 01 18
Clearvale Hlo En 2010 01 18rwang5688
 
Creating Cloud Communities
Creating Cloud CommunitiesCreating Cloud Communities
Creating Cloud CommunitiesPeter Coffee
 
Challenges of Building Web Observatories
Challenges of Building Web ObservatoriesChallenges of Building Web Observatories
Challenges of Building Web ObservatoriesSteffen Staab
 
B2B Social Media Summit, Philadelphia
B2B Social Media Summit, PhiladelphiaB2B Social Media Summit, Philadelphia
B2B Social Media Summit, PhiladelphiaChip Rodgers
 
Home, Work, Work, Home!? How Information Professionals Can Exploit Blurred Bo...
Home, Work, Work, Home!? How Information Professionals Can Exploit Blurred Bo...Home, Work, Work, Home!? How Information Professionals Can Exploit Blurred Bo...
Home, Work, Work, Home!? How Information Professionals Can Exploit Blurred Bo...Marieke Guy
 
Dagstuhl 2010 - Kalman Graffi - Alternative, more promising IT Paradigms for ...
Dagstuhl 2010 - Kalman Graffi - Alternative, more promising IT Paradigms for ...Dagstuhl 2010 - Kalman Graffi - Alternative, more promising IT Paradigms for ...
Dagstuhl 2010 - Kalman Graffi - Alternative, more promising IT Paradigms for ...Kalman Graffi
 
Comparing SOAs for the Internet of Things
Comparing SOAs for the Internet of ThingsComparing SOAs for the Internet of Things
Comparing SOAs for the Internet of ThingsDominique Guinard
 
hcid2011 - RED: a multi-disciplinary approach to experience design - Jarnail ...
hcid2011 - RED: a multi-disciplinary approach to experience design - Jarnail ...hcid2011 - RED: a multi-disciplinary approach to experience design - Jarnail ...
hcid2011 - RED: a multi-disciplinary approach to experience design - Jarnail ...City University London
 
IT Academy WorkShop 14/05/2012
IT Academy WorkShop 14/05/2012IT Academy WorkShop 14/05/2012
IT Academy WorkShop 14/05/2012Lee Stott
 
State of cloud computing v2
State of cloud computing v2State of cloud computing v2
State of cloud computing v2Md Aminul Hassan
 
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...Alex Liu
 
PDL Distinguished Alumni Talk
PDL Distinguished Alumni TalkPDL Distinguished Alumni Talk
PDL Distinguished Alumni TalkErik Riedel
 
061223_web_20_conference_sf_shan
061223_web_20_conference_sf_shan061223_web_20_conference_sf_shan
061223_web_20_conference_sf_shancjin cheng
 

Ähnlich wie Managing Social Communities (20)

IBM Social Business Social Software for Business Discover Expertise. Deliver ...
IBM Social Business Social Software for Business Discover Expertise. Deliver ...IBM Social Business Social Software for Business Discover Expertise. Deliver ...
IBM Social Business Social Software for Business Discover Expertise. Deliver ...
 
Ijcai nyc ai summit 20140224 v1
Ijcai nyc ai summit 20140224 v1Ijcai nyc ai summit 20140224 v1
Ijcai nyc ai summit 20140224 v1
 
Artic Startup
Artic StartupArtic Startup
Artic Startup
 
NYC Chalk Talk
NYC Chalk TalkNYC Chalk Talk
NYC Chalk Talk
 
The business case for hybrid clouds and mini pods
The business case for hybrid clouds and mini podsThe business case for hybrid clouds and mini pods
The business case for hybrid clouds and mini pods
 
Acquia Business Mandate Deck Final
Acquia Business Mandate Deck FinalAcquia Business Mandate Deck Final
Acquia Business Mandate Deck Final
 
Clearvale Hlo En 2010 01 18
Clearvale Hlo En 2010 01 18Clearvale Hlo En 2010 01 18
Clearvale Hlo En 2010 01 18
 
Creating Cloud Communities
Creating Cloud CommunitiesCreating Cloud Communities
Creating Cloud Communities
 
Challenges of Building Web Observatories
Challenges of Building Web ObservatoriesChallenges of Building Web Observatories
Challenges of Building Web Observatories
 
B2B Social Media Summit, Philadelphia
B2B Social Media Summit, PhiladelphiaB2B Social Media Summit, Philadelphia
B2B Social Media Summit, Philadelphia
 
Home, Work, Work, Home!? How Information Professionals Can Exploit Blurred Bo...
Home, Work, Work, Home!? How Information Professionals Can Exploit Blurred Bo...Home, Work, Work, Home!? How Information Professionals Can Exploit Blurred Bo...
Home, Work, Work, Home!? How Information Professionals Can Exploit Blurred Bo...
 
Dagstuhl 2010 - Kalman Graffi - Alternative, more promising IT Paradigms for ...
Dagstuhl 2010 - Kalman Graffi - Alternative, more promising IT Paradigms for ...Dagstuhl 2010 - Kalman Graffi - Alternative, more promising IT Paradigms for ...
Dagstuhl 2010 - Kalman Graffi - Alternative, more promising IT Paradigms for ...
 
Comparing SOAs for the Internet of Things
Comparing SOAs for the Internet of ThingsComparing SOAs for the Internet of Things
Comparing SOAs for the Internet of Things
 
hcid2011 - RED: a multi-disciplinary approach to experience design - Jarnail ...
hcid2011 - RED: a multi-disciplinary approach to experience design - Jarnail ...hcid2011 - RED: a multi-disciplinary approach to experience design - Jarnail ...
hcid2011 - RED: a multi-disciplinary approach to experience design - Jarnail ...
 
IT Academy WorkShop 14/05/2012
IT Academy WorkShop 14/05/2012IT Academy WorkShop 14/05/2012
IT Academy WorkShop 14/05/2012
 
State of cloud computing v2
State of cloud computing v2State of cloud computing v2
State of cloud computing v2
 
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...
BUILDING BETTER PREDICTIVE MODELS WITH COGNITIVE ASSISTANCE IN A DATA SCIENCE...
 
PDL Distinguished Alumni Talk
PDL Distinguished Alumni TalkPDL Distinguished Alumni Talk
PDL Distinguished Alumni Talk
 
Shakawath's Profile
Shakawath's ProfileShakawath's Profile
Shakawath's Profile
 
061223_web_20_conference_sf_shan
061223_web_20_conference_sf_shan061223_web_20_conference_sf_shan
061223_web_20_conference_sf_shan
 

Managing Social Communities

  • 1. Web Science & Technologies University of Koblenz ▪ Landau, Germany Managing Social Communities Steffen Staab Acknowledgements to ROBUST Project team & WEST Team, in particular K. Dellschaft, J. Kunegis, F. Schwagereit
  • 2. Institut WeST – Web Science & Technologies: Semantic Web, Web Retrieval, Interactive Web, Multimedia Web, Software Web; eGovernment, eMedia, eScience, eOrganizations, ePerson; Institute for Computer Science, Institute for Information Systems, Leibniz Institute for Social Sciences (GESIS). Steffen Staab (staab@uni-koblenz.de), Web Science Doctoral Summer School
  • 3. Plan for this Talk: 1 Web, 2 Science
  • 4. Social Communities …are everywhere
  • 5. Risks: Bad content quality, antisocial behavior, … → jeopardize business value. Opportunities: Open innovation, improved user support, … → increase business value. Data Storage and Processing: Scalability, heterogeneity. Content, User & Networks Analysis: Understanding, response time. Business Value: Product support & innovation, CRM, Expertise management, Marketing, Advertising. Online Communities: Intranet, Extranet, Internet.
  • 6. Large-scale Testbeds
      ▪ SAP (B2B): Community Network, Business Partner Network, Developer Network. 2009: 1.5M users, 150K accesses/day. 2013: 5M users, 1200K accesses/day.
      ▪ Polecat (C2C): Online Marketing, CRM for IT. 2009: … 2013: millions of posts/day, 1TB data/day.
      ▪ IBM (E2E): Corporate Knowledge Management. 2009: 99K accounts. 2013: 800K accounts.
  • 7. SAP Business Partner Use Case: SAP Developer Network
      Metric                                 | 2007 | 2009 | 2013
      Posts per day                          | 5000 | 6000 | 7000
      Size of user generated content (posts) | 1M   | 4M   | 10.0M
      Number of users                        | 1M   | 1.7M | 4.8M
  • 8. ROBUST: IBM Employee Use Case
      Business data created per day:
      Service              | 2007 | 2009 | 2013
      IBM Activities Entry | 700  | 2750 | 5000
      IBM Blogs Entries    | 120  | 30   | 60
      IBM Communities      | 3    | 23   | 50
      IBM Bookmarks        | 800  | 900  | 1000
      IBM Wikis            | NA   | 40   | 100
      IBM Files            | NA   | 290  | 1000
      IBM Overall          | 1623 | 4033 | 7210
      Number of users:
      Service              | 2007    | 2009    | 2013
      IBM Activities Entry | 53200   | 143600  | 200000
      IBM Blogs Entries    | 34600   | 77750   | 100000
      IBM Communities      | 3000    | 181950  | 250000
      IBM Bookmarks        | 8500    | 22400   | 50000
      IBM Wikis            | NA      | 35450   | 100000
      IBM Files            | NA      | 45160   | 100000
      IBM Overall          | 500000* | 500000* | 500000*
  • 9. Risks in Online Communities
      Definition: Risk
      ▪ Probability of an event occurring (likelihood)
      ▪ Impact of the event occurring
      Risk management (cost, benefit)
      ▪ Process for managing costs, benefits and likelihoods
      ▪ Detect high impact risks in time even if they generate expensive false alarms
      ▪ Ignore very low impact risks even if they can be reliably detected
      Types of risks
      ▪ Non-compliance with the community policies/polity
      ▪ Scamming or spamming behavior
      ▪ Lower involvement and productivity
      ▪ Decrease of user satisfaction
      ▪ Loss of community dynamics
      Examples
      ▪ SAP: SCN Award Points Scamming. Experts' reputation decreases; business users leave the forum.
      ▪ Web: Public communities. Death of the TechCrunch forum due to spam and lack of management.
      ▪ Loss of 1% experts → loss of high revenue. Loss of 10% lurkers → low impact.
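The definition above pairs a probability with an impact. One common way to combine the two, assumed here and not stated on the slide, is the expected loss (probability times impact), compared against the cost of monitoring. The sketch below uses hypothetical names (`Risk`, `worth_monitoring`) and made-up numbers purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # likelihood of the event occurring, in [0, 1]
    impact: float       # cost if the event occurs (hypothetical units)

    def expected_loss(self) -> float:
        # Assumption: risk is scored as probability times impact.
        return self.probability * self.impact


def worth_monitoring(risk: Risk, monitoring_cost: float) -> bool:
    # Monitor only if the expected loss exceeds what monitoring costs,
    # including the cost of handling false alarms.
    return risk.expected_loss() > monitoring_cost


# Illustrative values only; none of these numbers come from the talk.
risks = [
    Risk("loss of 1% of experts", probability=0.05, impact=100_000.0),
    Risk("loss of 10% of lurkers", probability=0.30, impact=500.0),
]
decisions = {r.name: worth_monitoring(r, monitoring_cost=1_000.0) for r in risks}
print(decisions)
```

With these illustrative values the rare but costly expert loss is monitored, while the frequent but cheap lurker loss is ignored, which mirrors the two rules on the slide.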
  • 10. Communities: dynamics and confidentiality
      ROBUST supports decision making for users, hosts and service providers
      Managing growth & decline
      ▪ Identify, encourage, safeguard core users
      ▪ Social matching
      ▪ Define/maintain etiquette and policies
      ▪ Manage negative behavior and conflicts
      ▪ Content matching
      ▪ Recognize, categorize decline and growth
      ▪ Redirect users to other communities
      Merging communities
      ▪ Cross-community topic detection to stimulate inter-community interactions
      Splitting communities
      ▪ Identification of clusters/compartments of members that can be separated
  • 11. Agenda
      • Risks and Opportunities in Social Communities: the ROBUST project
      • Many related talks in this Summer School
        ▪ ROBUST partners: Alani: Monitoring and analysis of social networks; Bernstein: Scalable infrastructures
        ▪ Closely related: Greene: Network Analysis; Karnstedt: User churn
      But here comes the biased account from work in our institute
  • 12. Plan for this Talk: 1 Web, 2 Science
  • 13. Image of a black hole (Flickr cc, Jan 7 2009, by thebadastronomer)
  • 14. Agenda
      • Risks and Opportunities in Social Communities: the ROBUST project
      • Web Science Methodology: An explanation by analogy with Physics and some initial (!) applications to online communities
      • Modeling a dynamic system at the micro level; understanding collective effects (macro level) arising from individual behavior (micro level)
      • Predicting dynamic system behavior, recognizing behavior deviating from the model
      • Modeling dynamic system behavior at the macro level
      • Controlling dynamic system behavior by collective action
  • 15. Better understanding of the tagging process
      ▪ Cooperative classification of resources
      ▪ Which factors influence the tagging process?
        • Background knowledge of the user?
        • Tag assignments of other users?
      ▪ Hypothesis: Tagging involves imitation of other users AND selection of tags from background knowledge of users.
  • 16. Methodology (diagram)
      ▪ Observed: User interface, Own knowledge, Shared terminology, Something else? → Conceptualization → Tagging Behavior
      ▪ Modeled: Model of User Interface Influence, Model of Own Knowledge, Model of Sharing → Joint Stochastic Model → Simulated Tagging Behavior
      ▪ Comparison of Statistics between observed and simulated tagging behavior
  • 17. Components of Analysis
      Properties of Tag Streams (observations in the real world)
      ▪ Stream view of Folksonomies
      ▪ Co-occurrence streams
      ▪ Resource streams
      Dynamic model for Tagging Systems (stochastic models of influence)
      ▪ Simulating background knowledge
      ▪ Simulating tag imitation
      Simulation Results (which models best fit the reality?)
      ▪ Co-occurrence streams
      ▪ Resource streams
  • 18. Stream Views of a Folksonomy
      Folksonomies:
      ▪ Vertices: Users, tags, resources
      ▪ Edges: Tag assignments
      ▪ Postings:
        • Tag assignments of a user to a single resource
        • Can be ordered according to their time-stamp
  • 19. Co-occurrence Streams
      Co-occurrence Streams:
      ▪ All tags co-occurring with a given tag in a posting
      ▪ Ordered by posting time
      Co-occurrence stream for 'apple':
      ▪ {mackz, r1, {apple, tree}, 13:25} {klaasd, r2, {apple, mac, ibook}, 13:26} {mackz, r2, {apple, macintosh, stevejobs}, 13:27}
      → tree, mac, ibook, macintosh, stevejobs
      Tag  | |Y|       | |U|     | |T|     | |R|
      ajax | 2,949,614 | 88,526  | 41,898  | 71,525
      blog | 6,098,471 | 158,578 | 186,043 | 557,017
      xml  | 974,866   | 44,326  | 31,998  | 61,843
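A minimal sketch of how the co-occurrence stream for 'apple' could be derived from the three example postings. The `Posting` type and the function name are hypothetical, and tags are kept as ordered tuples (rather than sets) so that the order within a posting is preserved.

```python
from typing import NamedTuple


class Posting(NamedTuple):
    user: str
    resource: str
    tags: tuple   # tags in the order they were entered
    time: str     # "HH:MM", sortable as a string within one day


POSTINGS = [
    Posting("mackz", "r1", ("apple", "tree"), "13:25"),
    Posting("klaasd", "r2", ("apple", "mac", "ibook"), "13:26"),
    Posting("mackz", "r2", ("apple", "macintosh", "stevejobs"), "13:27"),
]


def cooccurrence_stream(postings, tag):
    """All tags that co-occur with `tag` in a posting, ordered by posting time."""
    stream = []
    for posting in sorted(postings, key=lambda p: p.time):
        if tag in posting.tags:
            stream.extend(t for t in posting.tags if t != tag)
    return stream


print(cooccurrence_stream(POSTINGS, "apple"))
```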
  • 20. Properties of Co-occurrence Streams – Tag Growth: linear growth
  • 21. Properties of Co-occurrence Streams – Tag Frequencies: power law
  • 22. Resource Streams
      Resource Streams:
      ▪ All tags assigned to a resource
      ▪ Ordered by posting time
      Resource stream for 'r2':
      ▪ {mackz, r1, {apple, tree}, 13:25} {klaasd, r2, {apple, mac, ibook}, 13:26} {mackz, r2, {apple, macintosh, stevejobs}, 13:27}
      → apple, mac, ibook, apple, macintosh, stevejobs
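The resource stream for 'r2' can be sketched in the same spirit. Postings are modeled here as plain (user, resource, tags, time) tuples; this representation and the function name are assumptions made for illustration.

```python
# Example postings from the slide as (user, resource, tags, time) tuples.
EXAMPLE_POSTINGS = [
    ("mackz", "r1", ("apple", "tree"), "13:25"),
    ("klaasd", "r2", ("apple", "mac", "ibook"), "13:26"),
    ("mackz", "r2", ("apple", "macintosh", "stevejobs"), "13:27"),
]


def resource_stream(postings, resource):
    """All tags assigned to `resource`, ordered by posting time.

    Repeated tags are kept, because tag frequencies are measured on the stream.
    """
    ordered = sorted(postings, key=lambda posting: posting[3])
    return [
        tag
        for (_user, res, tags, _time) in ordered
        if res == resource
        for tag in tags
    ]


print(resource_stream(EXAMPLE_POSTINGS, "r2"))
```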
  • 23. Properties of Resource Streams – Tag Frequencies
  • 24. Properties of Resource Streams – Tag Frequencies
  • 25. Simulating the Evolution of Tag Streams
  • 26. Simulating tag streams
      Which of my concepts represent this web page? How do I tag this web page?
      Inspiration for conceptualization from:
      1. Most popular tags
      2. Most recently used tags
      3. Tags used for this resource
      4. Tags co-occurring with similar text documents
      5. Creating completely new tags
      6. …
      Which combination of inspirations produces the same statistics as those observed for Delicious?
  • 27. The Delicious User Interface
      Imitating previous tag assignments:
      ▪ Recommended tags: Intersection of tags of a user and tags already assigned to the resource.
      ▪ Your tags: Tags of the user.
      ▪ Popular tags: 7 most popular tags assigned to the resource.
  • 28. Simulating a Tag Stream
      Start with an empty tag stream. Each simulation step appends a new tag assignment.
      Simulation of a single tag assignment:
      ▪ p(w|t): Probability of selecting word w for topic t. Modeled by word distributions in a topic-centered text corpus.
      ▪ n: Number of visible previous tags.
      ▪ h: Maximal number of previous tag assignments used for determining the ranking of the n distinct tags.
  • 29. Modeling Background Knowledge (diagram: Del.icio.us, Text Corpora)
      PBK: Probability of selecting from background knowledge
      ▪ p(w|t): Probability of selecting word w for topic t. Modeled by word distributions in a topic-centered text corpus.
      ▪ p(w|r): Probability of selecting word w for resource r.
  • 30. Modeling Tag Imitation (diagram: with probability PBK a tag comes from background knowledge; with probability 1 – PBK it is one of the n top-ranked tags among the previous tag assignments t-1, t-2, …, t-h)
      PI = 1 – PBK: Probability of imitating a previous tag assignment
      ▪ n: Number of visible top-ranked tags
      ▪ h: Maximal number of previous tag assignments used for determining the ranking of the n distinct tags
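The two mechanisms (background knowledge with probability PBK, imitation with probability PI = 1 – PBK) can be sketched as a small simulation. This is a rough illustration under assumptions, not the authors' implementation: background knowledge is a hypothetical dictionary of word weights standing in for p(w|t), tags are ranked by frequency within the last h assignments, and the imitated tag is drawn uniformly from the n visible tags.

```python
import random
from collections import Counter


def simulate_tag_stream(background, p_bk, n, h, steps, seed=0):
    """Simulate a tag stream with background knowledge and imitation.

    background: dict mapping word -> weight, a stand-in for p(w|t)
    p_bk:       probability of selecting from background knowledge (PBK)
    n:          number of visible top-ranked tags
    h:          number of previous tag assignments used for the ranking
    """
    rng = random.Random(seed)
    words = list(background)
    weights = [background[w] for w in words]
    stream = []  # start with an empty tag stream
    for _ in range(steps):
        if not stream or rng.random() < p_bk:
            # Draw a word from background knowledge according to its weight.
            tag = rng.choices(words, weights=weights, k=1)[0]
        else:
            # Imitate: rank distinct tags among the last h assignments by
            # frequency and pick one of the n top-ranked tags.
            # Assumption: the pick among visible tags is uniform.
            recent = stream[-h:]
            visible = [t for t, _count in Counter(recent).most_common(n)]
            tag = rng.choice(visible)
        stream.append(tag)
    return stream


# Hypothetical word weights for one topic; not data from the talk.
BACKGROUND = {"ajax": 5.0, "javascript": 3.0, "web": 2.0, "xml": 1.0, "async": 0.5}
stream = simulate_tag_stream(BACKGROUND, p_bk=0.2, n=7, h=500, steps=1000)
print(Counter(stream).most_common(3))
```

Setting `p_bk` to 1.0 reduces the sketch to pure sampling from background knowledge, and setting it to 0.0 makes every step after the first an imitation, which is a quick way to see why the talk argues that both ingredients are needed.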
  • 31. Simulation Results
  • 32. Overall Scheme (diagram as on slide 16: observed tagging behavior is compared with the tagging behavior simulated by the joint stochastic model)
  • 33. Simulating Co-occurrence Streams
      Tag growth:
      ▪ Influenced by PBK and p(w|t)
      Tag frequencies:
      ▪ Influenced by PBK, p(w|t), n, h
      ▪ n: Semantic breadth of a topic (blog: 100 tags, ajax: 50 tags, xml: 50 tags; Cattuto et al. 2007)
      ▪ h: No hint for realistic values. Good guesses may be 500 and 1000.
  • 34. Co-occ. Streams – Simulated Tag Growth
  • 35. Co-occ. Stream – Simulated Tag Frequencies
  • 36. Simulating Resource Streams
      ▪ PI and PBK: Values comparable to co-occurrence streams
      ▪ p(w|r): Approximated by p(w|t)
      ▪ n: 7 tags are visible (cf. Delicious user interface)
      ▪ h: Smaller value than for co-occurrence streams
  • 37. Res. Streams – Simulated Tag Frequencies
  • 38. Lessons learned [Dellschaft+Staab, ACM Hypertext 2008]
      Black holes do not only eat mass, they also dissolve by emitting radiation.
      Imitation AND background knowledge are needed for explaining properties of tag streams.
      Probability of imitating previous tag assignments: ~70-90%
      Model                       | Frequency rank, co-occur. streams | Frequency rank, resource streams | Tag growth
      Polya Urn Model             | o                                 | o                                | fixed size
      Simon Model                 | o                                 | o                                | linear
      YS Model w/ Memory          | +                                 | o                                | linear
      Halpin et al. Model         | o                                 | o                                | linear
      Epistemic Model (our model) | +                                 | +                                | power-law
  • 39. Solar System: Neptune, Uranus, Jupiter, Saturn (Flickr cc, Sep 1 2008, by Image Editor)
  • 40. Agenda
      • Risks and Opportunities in Social Communities: the ROBUST project
      • Web Science Methodology: An explanation by analogy with Physics and some initial (!) applications to online communities
      • Modeling a dynamic system at the micro level; understanding collective effects (macro level) arising from individual behavior (micro level)
      • Predicting dynamic system behavior, recognizing behavior deviating from the model
      • Modeling dynamic system behavior at the macro level
      • Controlling dynamic system behavior by collective action
  • 41. Overall Scheme (diagram as on slide 16: observed tagging behavior is compared with the tagging behavior simulated by the joint stochastic model)
  • 42. What is our Uranus? What is this?
  • 43. Uranus = Spam [Dellschaft+Staab, WebSci 2010]: Effect of removing 257 spammers of 12,777 users from the ‘bookmark’ stream
  • 44. Why care? The Bibsonomy Example
      Complete snapshot of the Bibsonomy system. Manually labeled ground truth of spammers in the data set.
                   | Users  | Tags    | Resources | TAS
      Spammers     | 29,248 | 297,846 | 1,197,354 | 13,258,759
      Non-Spammers | 2,467  | 61,154  | 234,143   | 816,196
  • 45. Why care? The Delicious Example
      Crawled during the TAGora Project
      Users   | Tags      | Resources  | TAS
      532,938 | 2,482,850 | 18,778,566 | 140,305,446
      Amount of spammers not known exactly. Estimation based on a random sample of 500 users:
      ▪ With 95% probability: Between 1,972 and 12,949 spammers
      ▪ Delicious most likely already applies spam detection
      ▪ Why care about ~1.5% spammers in Delicious?
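The slide does not say how the 95% bounds were derived. As a hedged illustration: a normal-approximation (Wald) interval for a sample proportion reproduces the reported range if one assumes that 7 of the 500 sampled users were spammers. Both the count of 7 and the choice of interval are assumptions, not figures from the talk.

```python
import math


def wald_interval(successes, sample_size, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    return max(0.0, p - half_width), min(1.0, p + half_width)


TOTAL_USERS = 532_938          # Delicious users in the crawl (from the slide)
ASSUMED_SPAMMERS_IN_SAMPLE = 7  # assumption: not stated on the slide
SAMPLE_SIZE = 500              # from the slide

low, high = wald_interval(ASSUMED_SPAMMERS_IN_SAMPLE, SAMPLE_SIZE)
# Scale the proportion bounds to the whole user population.
print(int(low * TOTAL_USERS), int(high * TOTAL_USERS))
```

With such a small number of positives in the sample, an exact (Clopper-Pearson) or Wilson interval would usually be preferred; the Wald form is shown only because it matches the numbers on the slide under the stated assumption.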
  • 46. Filtering Results (Users). Chart: Number of Spammers and Non-Spammers (series: Spammer, Non-Spammer; x-axis 1 to 20, y-axis 0 to 16,000)
  • 47. Filtering Results (Tag Assignments). Chart: Filtered and unfiltered number of TAS (series: Spam, Non-Spam; x-axis 1 to 20, y-axis 0 to 450,000)
  • 48. That’s why
    Effect of removing 257 spammers out of 12,777 users from the ‘bookmark’ stream
  • 49. How statistically significant is the epistemic model for normal users?
  • 50. Lessons learned
    • Uranus was discovered because it affected Neptune
    • Pluto was discovered because it affected Uranus!
    • Spammers can be discovered by their behavior, even if you do not know what kind of spam they are producing!
  • 51. How do constellations in the sky evolve? http://www.flickr.com/photos/furious-angel/2142647358/sizes/o/in/photostream/
  • 52. Agenda
    • Risks and Opportunities in Social Communities: the ROBUST project
    • Web Science Methodology: An explanation by analogy with Physics and some initial (!) applications to online communities
    • Modeling dynamic systems at the micro level; understanding collective effects (macro level) arising from individual behavior (micro level)
    • Predicting dynamic system behavior; recognizing behavior deviating from the model
    • Modeling dynamic system behavior at the macro level
    • Controlling dynamic system behavior by collective action
  • 53. Example: Network
    [Diagram: nodes are persons, edges are friendships]
  • 54. SUGGESTING WHOM TO LINK TO NEXT
  • 55. Use Networks for Recommendation
    • Goal: Predict whom a person will add as a friend
    • Facebook's algorithm: find friends-of-friends
    → Problem: The rest of the network is ignored!
  • 56. Algebraic Graph Theory
    [Figure: example network with six nodes, numbered 1 to 6]
    Represent a network by an adjacency matrix A:
    • $A_{ij} = 1$ when i and j are connected
    • $A_{ij} = 0$ when i and j are not connected

    $$A = \begin{pmatrix}
    0 & 1 & 0 & 0 & 0 & 0 \\
    1 & 0 & 1 & 1 & 0 & 0 \\
    0 & 1 & 0 & 1 & 0 & 0 \\
    0 & 1 & 1 & 0 & 1 & 0 \\
    0 & 0 & 0 & 1 & 0 & 1 \\
    0 & 0 & 0 & 0 & 1 & 0
    \end{pmatrix}$$

    A is square and symmetric.
  • 57. Baseline: Friend of a Friend Model
    Count the number of ways a person can be found as the friend of a friend. Consider the matrix product $AA = A^2$:

    $$\begin{pmatrix}
    0 & 1 & 0 & 0 & 0 & 0 \\
    1 & 0 & 1 & 1 & 0 & 0 \\
    0 & 1 & 0 & 1 & 0 & 0 \\
    0 & 1 & 1 & 0 & 1 & 0 \\
    0 & 0 & 0 & 1 & 0 & 1 \\
    0 & 0 & 0 & 0 & 1 & 0
    \end{pmatrix}^2 =
    \begin{pmatrix}
    1 & 0 & 1 & 1 & 0 & 0 \\
    0 & 3 & 1 & 1 & 1 & 0 \\
    1 & 1 & 2 & 1 & 1 & 0 \\
    1 & 1 & 1 & 3 & 0 & 1 \\
    0 & 1 & 1 & 0 & 2 & 0 \\
    0 & 0 & 0 & 1 & 0 & 1
    \end{pmatrix}$$
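The friend-of-a-friend count can be sketched in a few lines of plain Python. The edge list below is read off the adjacency matrix of the six-node example; the helper names (`adjacency`, `matmul`, `recommend`) are illustrative and not from the talk.

```python
# Edges of the six-node example network, read off its adjacency matrix.
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
N = 6

def adjacency(edges, n):
    """Build a symmetric 0/1 adjacency matrix from 1-based edge pairs."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = 1
        a[j - 1][i - 1] = 1
    return a

def matmul(x, y):
    """Plain matrix product of two square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def recommend(a, scores, user):
    """Rank the non-neighbours of `user` (1-based) by score, best first."""
    u = user - 1
    candidates = [j for j in range(len(a))
                  if j != u and a[u][j] == 0 and scores[u][j] > 0]
    return [j + 1 for j in sorted(candidates, key=lambda j: -scores[u][j])]

A = adjacency(EDGES, N)
A2 = matmul(A, A)            # A2[i][j] = number of common friends of i and j
print(A2[1])                 # row of node 2
print(recommend(A, A2, 1))   # friend-of-friend candidates for node 1
```

Existing friends and the user itself are filtered out, since the diagonal of the squared matrix only counts each node's own degree.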
  • 58. Eigenvalue Decomposition
    Write the matrix A as a product: $A = U \Lambda U^T$, where
    • U are the eigenvectors: $U^T U = I$
    • Λ are the eigenvalues: $\Lambda_{ij} = 0$ when $i \neq j$
  • 59. Computing $A^2$
    Use the eigenvalue decomposition $A = U \Lambda U^T$:
    $$A^2 = U \Lambda U^T U \Lambda U^T = U \Lambda^2 U^T$$
    Exploit U and Λ:
    • $U^T U = I$ because U contains (orthonormal) eigenvectors
    • $(\Lambda^2)_{ii} = \Lambda_{ii}^2$ because Λ is diagonal and contains the eigenvalues
    Result: Just square all eigenvalues!
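A quick numerical check of this identity, as a sketch using NumPy's `numpy.linalg.eigh` (suitable here because A is symmetric) on the six-node example matrix:

```python
import numpy as np

# Adjacency matrix of the six-node example network.
A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
], dtype=float)

# eigh returns real eigenvalues and orthonormal eigenvectors (columns of U).
eigenvalues, U = np.linalg.eigh(A)

A2_spectral = U @ np.diag(eigenvalues ** 2) @ U.T   # square only the eigenvalues
A2_direct = A @ A                                   # ordinary matrix product

print(np.allclose(U.T @ U, np.eye(6)))       # U is orthogonal
print(np.allclose(A2_spectral, A2_direct))   # both routes agree
```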
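  • 60. Friend of a Friend of a Friend
    [Figure: example network with six nodes, numbered 1 to 6]
    Compute the number of friends-of-friends-of-friends:

    $$\begin{pmatrix}
    0 & 1 & 0 & 0 & 0 & 0 \\
    1 & 0 & 1 & 1 & 0 & 0 \\
    0 & 1 & 0 & 1 & 0 & 0 \\
    0 & 1 & 1 & 0 & 1 & 0 \\
    0 & 0 & 0 & 1 & 0 & 1 \\
    0 & 0 & 0 & 0 & 1 & 0
    \end{pmatrix}^3 =
    \begin{pmatrix}
    0 & 3 & 1 & 1 & 1 & 0 \\
    3 & 2 & 4 & 5 & 1 & 1 \\
    1 & 4 & 2 & 4 & 1 & 1 \\
    1 & 5 & 4 & 2 & 4 & 0 \\
    1 & 1 & 1 & 4 & 0 & 2 \\
    0 & 1 & 1 & 0 & 2 & 0
    \end{pmatrix}$$

    $$A^3 = U \Lambda U^T U \Lambda U^T U \Lambda U^T = U \Lambda^3 U^T$$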
  • 61. Matrix Exponential
    [Figure: example network with seven nodes, numbered 1 to 7; scores 0.98, 0.76 and 0.22 annotated for user ④]
    The matrix exponential can be written as a power sum with decreasing coefficients:
    $$\exp(A) = I + A + \tfrac{1}{2} A^2 + \tfrac{1}{6} A^3 + \dots$$

    $$\exp\begin{pmatrix}
    0 & 1 & 0 & 0 & 0 & 0 & 0 \\
    1 & 0 & 1 & 1 & 0 & 0 & 0 \\
    0 & 1 & 0 & 1 & 0 & 0 & 0 \\
    0 & 1 & 1 & 0 & 1 & 0 & 0 \\
    0 & 0 & 0 & 1 & 0 & 1 & 0 \\
    0 & 0 & 0 & 0 & 1 & 0 & 1 \\
    0 & 0 & 0 & 0 & 0 & 1 & 0
    \end{pmatrix} =
    \begin{pmatrix}
    1.66 & 1.72 & 0.93 & 0.98 & 0.28 & 0.06 & 0.01 \\
    1.72 & 3.57 & 2.70 & 2.93 & 1.04 & 0.29 & 0.06 \\
    0.93 & 2.70 & 2.86 & 2.71 & 0.99 & 0.28 & 0.06 \\
    0.98 & 2.93 & 2.71 & 3.63 & 1.95 & 0.76 & 0.22 \\
    0.28 & 1.04 & 0.99 & 1.95 & 2.35 & 1.59 & 0.64 \\
    0.06 & 0.29 & 0.28 & 0.76 & 1.59 & 2.23 & 1.38 \\
    0.01 & 0.06 & 0.06 & 0.22 & 0.64 & 1.38 & 1.59
    \end{pmatrix}$$

    Recommendations for user ④: ① > ⑥ > ⑦
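The ranking for user ④ can be reproduced with a short sketch that sums the power series directly. The edge list is read off the seven-node adjacency matrix; truncating after 30 terms is an arbitrary choice that is more than enough for a graph this small, and the helper names are illustrative.

```python
# Edges of the seven-node example network, read off its adjacency matrix.
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6), (6, 7)]
N = 7

def adjacency(edges, n):
    a = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = a[j - 1][i - 1] = 1.0
    return a

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_exponential(a, terms=30):
    """Truncated power series exp(A) = I + A + A^2/2! + A^3/3! + ..."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0: I
    term = [row[:] for row in result]
    for k in range(1, terms):
        # Turn A^(k-1)/(k-1)! into A^k/k! and add it to the running sum.
        term = [[v / k for v in row] for row in matmul(term, a)]
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result

def recommend(a, scores, user):
    """Rank the non-neighbours of `user` (1-based) by score, best first."""
    u = user - 1
    candidates = [j for j in range(len(a)) if j != u and a[u][j] == 0]
    return [j + 1 for j in sorted(candidates, key=lambda j: -scores[u][j])]

A = adjacency(EDGES, N)
E = matrix_exponential(A)
print([round(v, 2) for v in E[3]])   # scores of user 4 against all nodes
print(recommend(A, E, 4))            # ranking of user 4's non-neighbours
```

In practice a library routine such as `scipy.linalg.expm` would replace the hand-written series; the explicit loop is kept only to mirror the formula on the slide.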
  • 62. Why the Matrix Exponential
    • $A^n$ = number of paths of length n
    • $aA^2 + bA^3 + cA^4 + \dots$ = number of paths, weighted by path length
    → New edges are more likely to appear when there are many paths already
    → When $a > b > c > \dots > 0$, short paths are weighted more
  • 63. Computing Power Series
    Let p(A) be a power series:
    $$\begin{aligned}
    p(A) &= aA^2 + bA^3 + cA^4 + \dots \\
         &= aU\Lambda^2U^T + bU\Lambda^3U^T + cU\Lambda^4U^T + \dots \\
         &= U(a\Lambda^2 + b\Lambda^3 + c\Lambda^4 + \dots)U^T \\
         &= U\,p(\Lambda)\,U^T
    \end{aligned}$$
    Therefore: Power series change only the eigenvalues!
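As a sketch of this result, the function below applies an arbitrary function p to the eigenvalues only and compares the outcome with the directly computed polynomial. The coefficients a, b, c are hypothetical values chosen for illustration, and `spectral_transform` is an illustrative name, not one used in the talk.

```python
import numpy as np

# Adjacency matrix of the six-node example network.
A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
], dtype=float)

def spectral_transform(matrix, p):
    """Compute U p(Lambda) U^T for a symmetric matrix."""
    eigenvalues, U = np.linalg.eigh(matrix)
    return U @ np.diag(p(eigenvalues)) @ U.T

# Hypothetical decreasing weights a > b > c > 0.
a, b, c = 0.5, 0.25, 0.125

power = np.linalg.matrix_power
direct = a * power(A, 2) + b * power(A, 3) + c * power(A, 4)
spectral = spectral_transform(A, lambda lam: a * lam**2 + b * lam**3 + c * lam**4)

print(np.allclose(direct, spectral))
```

The same function covers the matrix exponential by passing `np.exp` as p.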
  • 64. TRACKING THE EVOLUTION OF THE NETWORK AS A WHOLE
  • 65. Diversity
    • Many, equally-sized subcommunities
    • High entropy
    • ‘Flat’ structure
    Regularity
    • Few large subcommunities
    • Low entropy
    • Many ‘hubs’
  • 66. ⇒ Network Evolution • How did a network look at time t? • Idea: Observe the change of diversity/regularity over time
  • 67. Outline 1. Power-law exponent 2. Weighted spectral distribution 3. Network entropy 4. Network rank
  • 68. 1. Power-law Exponent
    • The number of neighbors is unevenly distributed
    • This results in a power law (Newman 2006): $C(n) \sim n^{-\gamma}$
    • A higher exponent γ denotes less regularity
    [Figure: Epinions trust network (Massa et al. 2005)]
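The slide does not state how γ is estimated. As a minimal sketch, the code below assumes that C(n) is the number of nodes with n neighbors and fits a straight line in log-log space by least squares; the degree counts are synthetic and follow an exact power law with γ = 2.5. For real, noisy data a maximum-likelihood estimator is generally considered more reliable than this simple fit.

```python
import math

def fit_power_law_exponent(counts):
    """Least-squares slope of log C(n) against log n; returns gamma = -slope.

    `counts` maps a degree n >= 1 to the number of nodes C(n) with that degree.
    """
    xs = [math.log(n) for n in counts]
    ys = [math.log(c) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

# Synthetic degree distribution (hypothetical data): C(n) = 10000 * n^-2.5
synthetic_counts = {n: 10000 * n ** -2.5 for n in range(1, 51)}

gamma = fit_power_law_exponent(synthetic_counts)
print(round(gamma, 3))
```

Tracking γ over time then amounts to recomputing the degree counts on successive snapshots of the network and refitting.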
  • 69. 1. Power-law Exponent over Time
    [Figure: power-law exponent γ over time, Epinions trust network (Massa et al. 2005)]
    γ shrinks ⇒ Network becomes more regular
  • 70. 2. Weighted Spectral Distribution
    • Consider the n×n matrix N defined by $N_{ij} = 1 / \sqrt{d(i)\,d(j)}$ when (i,j) is an edge and $N_{ij} = 0$ otherwise, where d(i) is the degree of node i
    • The distribution of the eigenvalues of N is called the weighted spectral distribution (WSD) (Fay et al. 2010)
    • Eigenvalues nearer to ±1: diversity
    • Eigenvalues nearer to 0: regularity
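A small sketch of this definition with NumPy, using the six-node example network from the link-prediction part of the talk as input (the slides apply the measure to much larger networks). The five-bin histogram is an arbitrary choice for display.

```python
import numpy as np

# Adjacency matrix of the six-node example network.
A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
], dtype=float)

degrees = A.sum(axis=1)                            # d(i) for every node
N_matrix = A / np.sqrt(np.outer(degrees, degrees)) # N_ij = A_ij / sqrt(d(i) d(j))

# N is symmetric, so eigvalsh returns its real eigenvalues in ascending order.
wsd = np.linalg.eigvalsh(N_matrix)
histogram, bin_edges = np.histogram(wsd, bins=5, range=(-1.0, 1.0))

print(np.round(wsd, 3))
print(histogram)
```

This assumes every node has at least one neighbor; isolated nodes would cause a division by zero and need to be removed first.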
  • 71. 2. Weighted Spectral Distribution over Time
    [Figure: WSD over time, CiteULike user–tag network (Emamy et al. 2007)]
    The WSD shifts towards zero ⇒ The network becomes regular
  • 72. 3. Network Entropy
    • Write the graph G as a sum of subgraphs $G_k$: $G = G_1 \cup G_2 \cup \dots \cup G_r$. Each $G_k$ has weighted edges, with total weight $\lambda_k$
    • When picking an edge from G at random, the probability of it being in community $G_k$ is $\lambda_k / (\lambda_1 + \lambda_2 + \dots + \lambda_r) = \lambda_k / L$
    • The entropy of this distribution is (Kunegis et al. 2011) $H(G) = -\sum_k (\lambda_k / L) \log (\lambda_k / L)$
    • Entropy: Effective number of subcommunities
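The entropy formula can be sketched as follows. The slide does not say how the decomposition into subgraphs is obtained, so the weights λ_k are simply taken as given; both weight lists below are hypothetical. Taking the exponential of H(G) is a common way to read an entropy as an effective number of equally weighted communities.

```python
import math

def network_entropy(weights):
    """H(G) = -sum_k (lambda_k / L) * log(lambda_k / L), with L = sum of weights."""
    total = sum(weights)
    probabilities = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probabilities)

# Hypothetical community weights lambda_k.
diverse = [5.0, 5.0, 5.0, 5.0]     # four equally sized subcommunities
regular = [17.0, 1.0, 1.0, 1.0]    # one dominant subcommunity

for name, weights in [("diverse", diverse), ("regular", regular)]:
    h = network_entropy(weights)
    print(name, round(h, 3), "effective communities:", round(math.exp(h), 2))
```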
  • 73. 3. Network Entropy over Time
    [Figure: entropy H(G) over time t for the Enron email network (Klimt et al. 2004), shown on an absolute scale and zoomed in]
    Entropy is constant ⇒ Constant number of communities
  • 74. 4. Network Rank
    • Decompose the network into subcommunities: $G = G_1 \cup G_2 \cup \dots \cup G_r$
    • The rank r is a measure of diversity: $\mathrm{rank}(G) = r$
    • Weighted rank: $\mathrm{rank}^*(G) = \sum_k |G_k| / |G_1|$
    • Robust measure of diversity (Kunegis et al. 2011)
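A minimal sketch of the weighted rank, assuming that $G_1$ denotes the largest subcommunity (the slide does not say so explicitly) and that the sizes $|G_k|$ are already known; the size lists are hypothetical.

```python
def weighted_rank(sizes):
    """rank*(G) = sum_k |G_k| / |G_1|, taking G_1 as the largest subcommunity."""
    largest = max(sizes)   # ASSUMPTION: G_1 is the largest subcommunity
    return sum(sizes) / largest

# Hypothetical subcommunity sizes |G_k|.
print(weighted_rank([5, 5, 5, 5]))   # equally sized: equals the plain rank r
print(weighted_rank([10, 1, 1]))     # one dominant community: close to 1
```

Under this reading the weighted rank lies between 1 and r, and small subcommunities contribute little, which is one way to see why it is less sensitive to noise than the plain rank.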
  • 75. 4. Network Rank over Time
    [Figure: network rank rank∗(G) over time t, Enron email network (Klimt et al. 2004)]
    • Increasing network rank: increasing diversity
    • Shrinking network rank: shrinking diversity
  • 76. More Network Rank Plots
    [Figure panels: Epinions trust network; hep-th citations; Wikipedia elections; frwikibooks edits; MIT conference contacts; YouTube social network]
    (biased towards good examples of convex evolution)
  • 77. Conclusion
    • Power-law exponent shrinks: connection diversity is shrinking
    • Weighted spectral distribution shifts to zero: emerging main components
    • Entropy is constant: effective number of communities is constant
    • Network rank increases, then shrinks: two-phase model of expansion
  • 78. Watch out!
    KONECT – Koblenz Network Collection
    http://uni-koblenz.de/~kunegis/paper/kunegis-konect.poster.pdf
    Coming soon! Follow #ictrobust or @kunegis or @ststaab
  • 79. Why has the sky the density it has?
    [Photo: Michael Donough, Flickr, cc, 14 Oct 2007]
  • 80. Why do tagging systems have so little spam?
    [Diagram labels, as extracted: Administrative Process; Content; Community; User; Quality; Policy; Roles; Content Process]
  • 81. Agenda
    • Risks and Opportunities in Social Communities: the ROBUST project
    • Web Science Methodology: An explanation by analogy with Physics and some initial (!) applications to online communities
    • Modeling dynamic systems at the micro level; understanding collective effects (macro level) arising from individual behavior (micro level)
    • Predicting dynamic system behavior; recognizing behavior deviating from the model
    • Modeling dynamic system behavior at the macro level
    • Controlling dynamic system behavior by collective action
  • 82. Yahoo Answers
    • Ensure quality of user-generated content
    • Use of administrators and community moderators
    How?
    • Policy influences community processes
  • 83. SURVEY OF GOVERNANCE MODELS
  • 84. Communities need Governance
    • Steering and coordinating actions of community members [Benz2004]
    Goal: Successful and flourishing community
    • High quality user-generated content
    • Active community members
    [ http://www.flickr.com/photos/61433480@N02/5593890914/, http://www.flickr.com/photos/boojee/3733902852/ ]
  • 85. Motivation
    Different types of
    • Web communities
    • User-generated content (video, photos, comment, article, questions, answers, posting, review text)
    What are the most successful means of governance for user-generated content?
    Analyze successful platforms and compare their means of governance!
  • 86. Means of Governance
    1. Direct intervention of community owner: affecting content or users based on apparent properties
    2. Functionality of the community platform
    [Diagram: platform functions between Community Member and User-generated Content: Assessment (Text Reviews, Bookmarks, Ratings, Abuse Reports); Content Modification (Complex User Roles); Selection & Ranking (Ratings, Score, Time, Views, Replies, Hide Low Quality)]
  • 87. Method
    • Selection of the 250 most prominent web sites with community functionality according to Alexa Page Rank
    • Clustering of web sites into four groups according to purpose: Social Media, Editorial News, Social Networking, Social Reviewing
    • Top-5 web sites of each group analyzed (*)
  • 88. Key Results
    (1) Abuse Reports are a successful means of governance.
    • 16 occurrences
    • Restricted to filtering out unwanted content
    • Staff needed: expensive but efficient [Schwagereit2010]
    (2) Simple ratings are dominant, but there is a battle between “Like” and “Like/Dislike”
    • “Like”: 9 occurrences
    • “Like/Dislike”: 7 occurrences
    • Tradeoff between simplicity and improved ranking ability
  • 89. Key Results
    (3) Creation time is the most frequently implemented ranking criterion
    • 18 occurrences
    • Others: score: 8, ratings: 6
    • Important content is renewed; unimportant content will be forgotten
    (4) Content modification and user roles are rarely implemented
    • 2 occurrences
    • Requires a complex role system and users who understand it
  • 90. GOVERNANCE MODEL: DEEP DIVE - SIMULATION
  • 91. Methodology Principle
    1. Define a Web community model (Lycos IQ, Yahoo Answers…)
    2. Adapt this model to an existing community
    3. Estimate parameters
    4. Define a quality measure
    5. Simulate community behaviour
    6. Compare simulation results with real data
    7. Analyze quality measures with respect to variations of CoSiMo parameters
  • 92. Dataset Lycos IQ
    • Time period: 909 days
    • Users: 34,327
    • Administrators: 36
    • Questions: 1,031,982
    • Answers: 2,996,446
    • Deleted non-compliant answers: 21,139
  • 93. Observed parameters (input to simulation)
    [Figure: number of users (log scale, 1 to 100,000) by answers per year (bins of 1,000, starting at 0-999) and rate of compliant answers (bins of 0.1, from 0.0-0.09 to 0.9-1.0)]
  • 94. Example Behaviors and Example Policies
    Behaviors of Ordinary Users:
    • Create new postings
    • Read existing postings
    • Report non-compliant postings OR give bonus points to poster
    Moderator Users:
    • Create new postings
    • Read existing postings
    • Delete non-compliant postings OR give bonus points to poster
    Administrators:
    • Read existing postings
    • Delete non-compliant postings
    Reading Policies for Administrators:
    • PA: random selection of postings
    • PB: random selection of postings that no other administrator has examined so far
    • PC: selection of postings that were most often reported by users for being non-compliant
    Promotion Policy:
    • PM-X: ordinary users become moderators (who can delete postings) when having at least X bonus points
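To make the difference between reading policies PA and PC concrete, here is a toy simulation. It is not CoSiMo and does not use the Lycos IQ parameters: the community size, share of non-compliant postings, number of readers and inspection budget are all hypothetical, users are assumed to report only postings that really are non-compliant, and quality is taken to be the share of compliant postings among those remaining. Policy PB and moderator promotion are left out because they need several interacting administrators and bonus points.

```python
import random

def simulate(policy, n_postings=1000, bad_every=10, n_readers=20,
             reads_per_reader=200, inspections=50, seed=42):
    """Return the share of compliant postings left after one inspection round."""
    rng = random.Random(seed)
    # Every `bad_every`-th posting is non-compliant (hypothetical setup).
    compliant = [i % bad_every != 0 for i in range(n_postings)]

    # Ordinary users read random postings and report the non-compliant ones.
    reports = [0] * n_postings
    for _ in range(n_readers):
        for i in rng.sample(range(n_postings), reads_per_reader):
            if not compliant[i]:
                reports[i] += 1

    # The administrator chooses which postings to inspect.
    if policy == "PA":      # random selection of postings
        inspected = rng.sample(range(n_postings), inspections)
    elif policy == "PC":    # postings most often reported by users
        inspected = sorted(range(n_postings),
                           key=lambda i: -reports[i])[:inspections]
    else:
        raise ValueError(f"unknown policy: {policy}")

    # Inspected postings are deleted only if they are non-compliant.
    deleted = {i for i in inspected if not compliant[i]}
    remaining = [compliant[i] for i in range(n_postings) if i not in deleted]
    return sum(remaining) / len(remaining)

for policy in ("PA", "PC"):
    print(policy, round(simulate(policy), 4))
```

With the same inspection budget, the report-driven policy removes far more non-compliant postings in this toy setup, which is in line with the later lesson that reporting by normal users is the most effective strategy.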
  • 95. How many administrators are needed?
    [Figure: recent posting quality (bands from 0.65 to 1.05) by additional non-compliant postings per day (5 to 2560) and number of administrators (1 to 1152)]
  • 96. Fighting spam with administrators…
    [Figure: recent posting quality (bands from 0.99 to 1) by number of administrators (1 to 576) and applied policies]
    Variation of policies and number of administrators
    • Efficient policies result in high quality content
    • A minimum of 18 administrators are needed
    • Many moderators are needed to bring the quality to a high level
  • 97. Fighting spam with user moderators…
    [Figure: recent posting quality (bands from 0.6 to 1) by additional non-compliant postings per day (5 to 2560) and applied policies (PA; PA+PB; PA+PB+PC; PA+PB+PC+PM-X for several thresholds X, including 12, 25, 50, 100, 200, 400 and 800)]
    Variation of policies and posting quality
    • A limited number of administrators has a limited capacity for filtering a surge of non-compliant postings
    • Moderators help to increase quality
  • 98. Lessons Learned
    • The strategy for selecting questionable postings is crucial
    • Reporting by normal users is the most effective strategy
    • Moderators are not as effective as expected if they hunt only incidentally for non-compliant content
    • Sufficiently strong requirements regarding moderator profiles lead to high quality of moderators
    • Policies for promoting users need to be based on a criterion that is time dependent
  • 99. Agenda
    • Risks and Opportunities in Social Communities: the ROBUST project
    • Web Science Methodology: An explanation by analogy with Physics and some initial (!) applications to online communities
    • Modeling dynamic systems at the micro level; understanding collective effects (macro level) arising from individual behavior (micro level)
    • Predicting dynamic system behavior; recognizing behavior deviating from the model
    • Modeling dynamic system behavior at the macro level
    • Controlling dynamic system behavior by collective action
  • 100. Are we satisfied here? No! Not by far!
    • Understand how and why users tag or tweet: What are people's limitations that affect the system? → Psychology and Sociology!
    • What are their legal boundaries? How can you shape the systems? → Law!
    • What are organizations' incentives? Why and how do organizations participate? Nice example: open source → Economics!
  • 101. Web Science & Technologies University of Koblenz ▪ Landau, Germany Thank You!
  • 102. References
    • J. Kunegis, A. Lommatzsch, C. Bauckhage. The Slashdot Zoo: Mining a social network with negative edges. In Proc. World Wide Web Conf., pp. 741–750, 2009.
    • J. Kunegis, A. Lommatzsch. Learning spectral graph transformations for link prediction. In Proc. Int. Conf. on Machine Learning, pp. 561–568, 2009.
    • J. Kunegis, S. Schmidt, A. Lommatzsch, J. Lerner. Spectral analysis of signed graphs for clustering, prediction and visualization. In Proc. SIAM Int. Conf. on Data Mining, pp. 559–570, 2010.
    • J. Kunegis, D. Fay, C. Bauckhage. Network growth and the spectral evolution model. In Proc. Conf. on Information and Knowledge Management, pp. 739–748, 2010.
  • 103. References
    • B. Viswanath, A. Mislove, M. Cha, K. P. Gummadi. On the evolution of user interaction in Facebook. In Proc. Workshop on Online Social Networks, pp. 37–42, 2009.
  • 104. References
    • K. Dellschaft, S. Staab. An Epistemic Dynamic Model for Tagging Systems. In Proc. of the 19th ACM Conference on Hypertext and Hypermedia (HYPERTEXT 2008), Pittsburgh, Pennsylvania, USA, June 19-21, 2008.
    • K. Dellschaft, S. Staab. On Differences in the Tagging Behavior of Spammers and Regular Users. In Proc. of WebSci-2010, Raleigh, US, April 2010.
    • F. Schwagereit, S. Sizov, S. Staab. Finding Optimal Policies for Online Communities with CoSiMo. In Proc. of WebSci-2010, Raleigh, US, April 2010.