Trust in Recommender Systems
  a historical overview and recent 
            developments


                               Paolo Massa
               Università di Trento and ITC/iRST
                http://moloko.itc.it/paoloblog/

                  (adapted by Hassan Masum)
  Slides licensed under Creative Commons Attribution-ShareAlike (see last slide for more info)
Ask and ye shall receive...




There are no stupid questions.
   Interrupt me and ask!




Plan of the talk
■ Info Overload

■ Info Retrieval vs Info Filtering

■ Content-based Filtering
      Weaknesses
■ Collaborative Filtering (aka Recommender Systems)
      Weaknesses
■ Trust-aware Filtering
      What is trust? Reputation?
Info Overload
■ 5 seconds

■ 1 ... 2 ... 3 ... 4 ... 5 




Info Overload
■ 5 seconds:
         Scientific information written in these 5 seconds can keep you 
         busy reading for 40 minutes (based on 1985 data!)
         400KB of new text published on paper (24TB printed each 
         year, 2000, “How Much Information” project at Berkeley)
         You have received an email (probably spam) ;-)

■ Is this true? Who can tell... Take facts with a grain of salt.
■   “Technology reduces the amount of time it takes to do any one 
    task but also leads to the expansion of tasks that people are 
    expected to do.” (Juliet Schor)
Info Overload (IO)
■ IO refers to the state of having too much 
    information to make a decision or remain 
    informed about a topic.
■ The term was popularized in 1970 by Alvin Toffler 
    in his book “Future Shock.”
■   http://en.wikipedia.org/wiki/Information_overload
■   Too much information can be worse than too little: 
    it creates the illusion of being informed
Info Overload Stats
■   (NO NO – I'm not reading it, it is just a practical example of information overload!)

■   The daily New York Times now contains more information than the 17th century man or woman would have encountered in a 
    lifetime.  (Wurman, S.A. (1987)  Information Anxiety.  New York:  Doubleday, 32.)

■   “As we go from grade school to high school we learn only a billionth of what there is to learn.  There is enough scientific information written every 
    day to fill seven complete sets of Encyclopedia Britannica; there is enough scientific information written every year to keep a person busy 
    reading day and night for 460 years!”  (Siegel, B.L. (1984, April 15).  Knowledge with commitment:  Teaching is the central task of the 
    university. Vital Speeches of the Day, 50, 394.)

■   “In the last 30 years mankind has produced more information than in the previous 5,000.”  (Information Overload Causes Stress. (1997, 
    March/April).  Reuters Magazine. Available:  Lexis Nexis Universe [4/28/98].)

■   Gordon Moore, co-founder of Intel, formulated what became known as Moore's Law: the number of transistors on a chip doubles about every 
    two years (popularly quoted as processing power doubling about every 18 months).

■   “About 1,000 books are published internationally every day, and the total of all printed knowledge doubles every five years.” 
    (Information Overload Causes Stress. (1997, March/April). Reuters Magazine. Available:  Lexis Nexis Universe [4/28/98].)

■   “The average Fortune 1000 worker already is sending and receiving approximately 178 messages and documents each day, according to a recent 
    study, ‘Managing Corporate Communications in the Information Age.’”  (Boles, M. (1997)  Help! Information overload. Workforce, 76, 20.)

■   “Dr Dharma Singh Khalsa, in his book Brain Longevity,...says the average American sees 16,000 advertisements, logos, and labels in a day.”  (Gore, 
    A. (1998, January 18) . Stressed?  Maybe it's information overload.  Sun Herald,  27.)

■   University of California Berkeley has a “How Much Information” project which studies the amount of information produced each year.  “The world's 
    total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage.  This is the equivalent of 
    250 megabytes per person for each man, woman, and child on earth.”  Berkeley:  How Much Information? 
    (http://www.sims.berkeley.edu/research/projects/how-much-info/) 

■   http://library.humboldt.edu/~ccm/fingertips/ioverloadstats.html

■   http://www.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm#summary

■   Data Smog: Surviving the Information Glut, by David Shenk
Info overload
■ BLOGS!!!



■ Am I contributing? You bet :-)

      You now have to do some Information Retrieval / Filtering
Info Retrieval vs Info Filtering
■   Info Retrieval: deals with static information 
    (Reuters, a database, a book): you want to find 
    information that is “lying there”
■   Info Filtering: deals with dynamic information 
    (flows such as the Web or the media): you want to 
    prioritize important incoming information, and 
    block the rest
■   Relevance and Quality of items 
        On a paper repository like Citeseer: you don't want just papers 
        about “spam”, but good papers about “spam”
        Which “spam” papers are worth your while?
Recommender Systems
■ Algorithms/systems that suggest to a user items she might like.
■ Books, Songs, Restaurants, Food, ..., Jokes, ..., anything?

■ E-commerce sites (but not only!)
      For now, think of Amazon.com
Recommender Systems
Techniques:
■ Content-based

■ Collaborative Filtering (CF)

■ Trust-aware

[... always think of a way of “spamming” the 
   technique I describe. It is a safe assumption 
   nowadays...]
Content-Based RSs
●
    RSs find items similar to ones you liked in the past. How? 
    Analyse the “syntactic content” of all the items.
●
    Example: If you like papers containing the words “Info 
    Retrieval”, the RS recommends to you another paper with the 
    words “Info Retrieval” in it.
●
    If you read news containing the word “Darfur”, it recommends 
    to you other news with the word “Darfur”.
●
    If you like Kubrick's movies, you get one more Kubrick 
    movie.
●
    Techniques of Info Retrieval ...
.... What are the weaknesses?      STOP!
Content-Based RSs weaknesses
●
    Good for text: If you like papers containing the words 
    “Info Retrieval”, the RS recommends you another 
    paper with the words “Info Retrieval” in it.  (And there are 
    partially effective ways to find “similar” papers: 
    vector space, LSI.)
●
    For movies or songs, humans must tag the 
    content (genre, actors, year, ...) but this is 
    time-consuming, costly, error-prone and subjective.
    –   Can your employees “correctly” tag all the 
        podcasts? All the videos? All the photos?
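The “vector space” idea mentioned for text items can be sketched roughly as follows. This is a minimal illustration, not the method of any particular RS: it assumes plain word counts as features and cosine similarity as the measure, and the paper texts and the names `term_vector` and `cosine_similarity` are made up for the example.

```python
from collections import Counter
from math import sqrt


def term_vector(text):
    """Bag-of-words vector: word -> number of occurrences (assumed feature choice)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical item the user liked, and two hypothetical candidates.
liked = term_vector("info retrieval with vector space models")
candidates = {
    "paper_1": term_vector("evaluating info retrieval systems"),
    "paper_2": term_vector("a history of korean cinema"),
}

# Rank candidates by similarity to the liked item, most similar first.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(liked, candidates[name]),
    reverse=True,
)
```

Note how the sketch also shows the weakness: a paper that says “IR” instead of “info retrieval” shares no words and scores 0.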
Content-Based RSs weaknesses: summary
●
    Text Items (papers, news): Doable but RSs tend 
    to always propose the same soup (boring). 
    Difficult to recognize synonyms, concepts, or 
    newly emerging words (such as “folksonomy”).
●
    Movies or Songs: not parsable at the moment by 
    machines, so humans must tag them.
●
    Jokes (or subjective items such as political 
    ideas): What are the “right” features? Tagging 
    “objectively” is not possible!
Collaborative Filtering
●
    Users give ratings to items (implicit or 
    explicit)
    ●
        I rate “Titanic” 4/5
●
    RS finds users similar to you (User 
    similarity)
●
    Suggests to you items liked by similar users


Idea: out there, there is someone who is similar to 
  you, and you will like what they liked.
[Figure: ratings matrix with rows ME, User2, User3, User4 and columns Item1 to Item4; one of ME's ratings is unknown (?) and is the one to predict]

Ratings from 1 (min) to 5 (max)

    Sim(ME,User2) = -0.2
    Sim(ME,User3) = -0.4
    Sim(ME,User4) = +0.9

                It does not consider the content of the items, only 
                   the ratings given by users.
                It works independently of the domain (also jokes)
                                          BUT
                Overlapping of rated items required!
Let's collect a real example
■ We collect some ratings for “our” RS.

■ Movies: Matrix, LOTR, Hotel Rwanda, Titanic, La vita è bella.
CF formulas
Similarity measure: Pearson Correlation 
Coefficient of users a and u (in [-1,+1])

    w_{a,u} = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}
                   {\sqrt{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)^2 \, \sum_{i=1}^{m} (r_{u,i} - \bar{r}_u)^2}}

Prediction of rating given by user a to item i 

    p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{n} (r_{u,i} - \bar{r}_u) \, w_{a,u}}{\sum_{u=1}^{n} w_{a,u}}
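The two formulas can be tried out with a small sketch like the one below. It rests on assumptions that the slide does not state: the Pearson means are taken over co-rated items only, the prediction uses each user's mean over all of her ratings, and only positively correlated neighbours are kept so that the denominator stays positive. The ratings and the names `pearson` and `predict` are invented for illustration.

```python
from math import sqrt


def pearson(ratings_a, ratings_u):
    """Pearson correlation over co-rated items; None if it cannot be computed."""
    common = sorted(set(ratings_a) & set(ratings_u))
    if len(common) < 2:
        return None  # not enough overlap between the two users
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in common)
    den = sqrt(
        sum((ratings_a[i] - mean_a) ** 2 for i in common)
        * sum((ratings_u[i] - mean_u) ** 2 for i in common)
    )
    return num / den if den else None


def predict(active, others, item):
    """Predicted rating of `item` for `active`, or None if no neighbour helps."""
    mean_active = sum(active.values()) / len(active)
    num = den = 0.0
    for other in others:
        if item not in other:
            continue
        w = pearson(active, other)
        if w is None or w <= 0:  # assumption: ignore dissimilar users
            continue
        mean_other = sum(other.values()) / len(other)
        num += (other[item] - mean_other) * w
        den += w
    return mean_active + num / den if den else None


# Hypothetical ratings on a 1-5 scale.
me = {"Matrix": 5, "LOTR": 4, "Titanic": 1}
user2 = {"Matrix": 5, "LOTR": 4, "Titanic": 2, "Hotel Rwanda": 5}
user3 = {"Matrix": 1, "LOTR": 2, "Titanic": 5, "Hotel Rwanda": 1}

prediction = predict(me, [user2, user3], "Hotel Rwanda")
```

Here `user3` has the opposite taste (correlation close to -1) and is ignored, so the prediction is driven by `user2` alone.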
CF pros
CF is simple and effective.
It works for every kind of item, independently 
 of the domain (e.g. Jester recommends 
 jokes).
It allows serendipity (fortunate discoveries): 
  you get recommended “3-Iron” even if you 
  never saw a Korean movie. 


BUT ...
CF WEAKNESSES!!!
■    User Similarity often not computable
    – Ratings Matrix sparseness (95-99%) -> Low or No 
      overlapping
■    Cold start
    – New users have 0 ratings (-> not comparable)
    – At the beginning, your RS is not Amazon!
■    Easy Attacks by Malicious Users
    – Copy profile and become the most similar
    – Even easier on the Semantic Web
■    Hard to understand and control
    – Black box (bad recs -> user gives up)

                    Solution? Trust!
Trust-awareness
Trust: explicit rating given by one user to another user
●
    about the perceived quality of that user's 
    characteristics
●
    in RSs, you “trust” someone if you like her 
    tastes 


---------------------------------------------------------
We will now speak about trust and trust metrics
and then we will come back to “trust and RSs”
Trust usage for real
E-marketplaces: Ebay.com, Epinions.com, Amazon.com
News sites: Slashdot.org, Kuro5hin.org
P2P networks: eDonkey, Gnutella, JXTA
Job sites: LinkedIn, Ryze, ...
Open Source Developer communities: Advogato.org (Affero.org)
Hospitalityclub, couchsurfing: hosting unknown people in your house?
Bookcrossing and sites for lending stuff

Network of personal weblogs (millions of blogrolls!)
Semantic Web: FOAF (Friend-Of-A-Friend) is an RDF format that allows 
 you to express social relationships (~10 million files)
Google (and Yahoo!): TrustRank (patented)
Trust networks
■ Aggregate all the trust statements to produce a trust network.
      A node is a user.
      A directed edge is a trust statement.

[Figure: trust network with nodes ME, Mena, Ben, Doc, Cory and Mary; known edges carry weights 0, 0.2, 0.9, 0.6, 1 and 1, and two edges are marked “?” (trust to be predicted)]

Properties of Trust:
 - weighted (0=distrust, 1=max trust)
 - subjective
 - asymmetric
 - context-dependent?

Trust Metric (TM):
Uses existing edges for predicting values
of trust for non-existing edges. 
 Thanks to trust propagation, if you trust 
someone, then you have some degree of 
trust in anyone that person trusts.
PageRank: a trust metric?
Imagine the web as a trust network
[Figure: web pages linked to one another]
■   Nodes are web pages, Edges are links (not weighted).
■   PageRank (Google) computes the “importance” of every single 
    page based on number and quality of incoming edges...
■   So, YES: PageRank is a trust metric.
■   HITS as well.
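To make “importance based on number and quality of incoming edges” concrete, here is a minimal power-iteration sketch of the PageRank idea. It is a textbook-style simplification, not Google's actual implementation; the damping factor 0.85, the fixed number of iterations, the handling of pages without outgoing links and the tiny example web are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: page -> list of pages it links to. Returns page -> rank (ranks sum to 1)."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web: A, B and D all link to C; C links back to A.
web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
```

C ends up with the highest rank (many incoming links), and A outranks B and D because its single incoming link comes from the important page C.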
Spam a Trust Metric
■ Is it easy to spam a trust metric?

■ It depends: some are attack-resistant (Advogato, for example)

■ Identity is an issue

■ ... see “The Social Cost of Cheap Pseudonyms” by 
 Eric Friedman and Paul Resnick: unknown 
 peers should not be trusted.
TM perspective: Local or Global
[Figure: trust network with nodes ME, Mary, Mena, Doc and Bill; the edges shown carry trust values 1, 1, 1 and 0]

How much can Bill be trusted?
 On average (by the community)?
 By Mary?
 And by ME?
■       Global Trust Metrics: 
        “Reputation” of a user is based on number and quality of incoming edges. Bill has 
        just one predicted trust value (0.5). 
        PageRank, eBay, Slashdot, ... They work badly for controversial people (e.g. Bush)
■       Local Trust Metrics
        Trust is subjective --> consider personal views (trust “Bill”?)
        AppleSeed, Golbeck TM, Advogato, ...
        Local can be more effective if people are not standardized.
Local vs Global
■   Search engine: abortion, jew, scientology, ...
■   Who can define what is spam? Google? A site that opposes 
    Chinese Comm. Party should be removed?
■   Local vs global:
     
         Is gwbush.com a good page? Is johnkerry.com a good page? Is 
         sex.com a good page?
■   Maybe these questions are meaningless?
     
         It depends on YOUR LOCAL point of view!
     
         republican/democrats, child/parent, federal/newglobal, 
         catholic/atheist, pro/against abortion, ...
■   Tyranny of the majority / Daily Me (Sunstein)
Sociology and Trust
■   Is this Sociology?
■   Yes ... 
■   You have seen many graphs, but the first to model groups 
    in this way was Moreno, a sociologist (1934, sociogram).
■   Social network analysis 
    (faculty.ucr.edu/~hanneman/nettext/)
■   Degree, betweenness, centrality, ...


■   Is this Politics? Yes ... 
■   Read “Republic.com” and “Why Societies Need Dissent” by 
    Cass Sunstein
Economics and Trust
■   Is this Economics? Yes...
■   Reputation is an asset, for companies (marketing) but 
    also for people
■   Centrality in a network is money as well.
■   Open source movement: your peers know you and will 
    hire you when they need someone they trust and value. 
    The same goes for researchers (who gets the next Nobel in 
    Physics? The most “trusted” by physicists!)
■   Read “Down and Out in the Magic Kingdom”: SciFi in which 
    reputation (whuffie) is the only currency
Trust and Search Engines
■   3rd generation search engines: 

■   personalization of results based on trust networks 
    (LOCAL!), based on what your friends like/dislike.
■   Google and Yahoo! are moving in this direction (I'm 
    speculating). [TrustRank]
■   Problem: Scalability! You cannot recompute 
    PageRank of every site for every user!
■   But you can do it on your laptop/mobile for 
    yourself, aggregating only the information “close” 
    to you ...
Which Trust Metric works better?
■   And under which conditions?
■   Still an open question. [you can work on it ;-) ]
■   Few papers until now evaluate trust metrics:
    ●
        Input data not easily available
         ●
             (advogato.org (8K), FOAF, epinions.com (150K), ... but 
             not weighted)
■   No papers compare different TMs
        Leave-one-out technique
■   Is local better than global? Only for the few users who are 
    atypical? Computationally expensive? Attack-resistant 
    (googlebomb)?
Trust propagation
[Figure: trust network with edges ME -> Mena (0.6), Mena -> Bill (1), ME -> Doc (0.8), Doc -> Bill (0.2)]

■   Trust chains (propagation)
■   Combining different trust chains


■   0.6 * 1 = 0.6, and 0.8 * 0.2 = 0.16
■   Then average? Not that simple ...
■   And how far does trust propagate?
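The arithmetic of the two chains, ME -> Mena -> Bill (0.6, then 1) and ME -> Doc -> Bill (0.8, then 0.2), can be reproduced with a few lines. Multiplying along a chain is what the slide does; the three ways of combining the chains below are only illustrative candidates (they are assumptions, not a recommendation), which is exactly the “not that simple” point.

```python
def chain_trust(weights):
    """Trust along one chain: the product of the edge weights on the path."""
    result = 1.0
    for weight in weights:
        result *= weight
    return result


# The two chains from ME to Bill, as (first hop, second hop).
via_mena = chain_trust([0.6, 1.0])  # ME -> Mena -> Bill
via_doc = chain_trust([0.8, 0.2])   # ME -> Doc -> Bill

# Candidate combination rules (hypothetical, for comparison only):
simple_average = (via_mena + via_doc) / 2
# Weight each neighbour's opinion of Bill by how much ME trusts that neighbour.
weighted_by_first_hop = (0.6 * 1.0 + 0.8 * 0.2) / (0.6 + 0.8)
# Keep only the strongest chain.
best_chain = max(via_mena, via_doc)
```

The three rules give different answers (about 0.38, 0.54 and 0.6), so the choice of combination rule is itself a design decision of the trust metric.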
Trust metrics open issues
       (There are no comparative evaluations of TMs)
       ■   Cycles are a problem --> Order peers based on distance from source user
                Trust of users at level k is based only on trust of users at level k-1 (and k)
                Trust propagation horizon (computation)

Find all trust paths from source to target
Propagate trust along trust paths
Trust decay: every hop reduces trust (or certainty of trust).
   ●
       A user can't propagate more trust than she received.
   ●
       Distrust (trust=0) blocks the propagation.
Trust about quality vs Trust as a judge
     Tquality(A,C)=f(Tjudger(A,B),Tquality(B,C)) 
Combine different trust paths
– Unpredictable Trust = minimum trust value.
– There are no globally “bad” users.
– Warn about Paradoxes or inconsistencies.
[Figure: small example graph with edge values 1, 1, 1 and 0]
How to use distrust
■   Distrust? Opinions of distrusted peers should 
    simply be discarded, otherwise those peers could 
    manipulate their opinions to influence our recs
■   Example: suppose we distrust anyone who is 
    trusted by our enemy; then our enemy could 
    say “I trust A” and we come to distrust A (who 
    could be anyone ... from the Pope to Bush)
■   But, it is worth knowing about someone who is 
    trusted by many, even if distrusted by you...
RS evaluation: how?
■ Back to Recommender Systems:
      How do we evaluate the performance of RSs?
      Any ideas?...
RS evaluation: let us count the ways...
■   Many ways to evaluate Recommender Systems.
■   Leave-one-out: hide one rating and try to predict it
        Accuracy: are predictions correct?
        Coverage: how often are we able to predict?
■   Accuracy: differences between real value and predicted 
    value.
        MAE, MSE, Weighted MAE, MAUE, ...
■   Ability to identify some new items user will like 
    (unwatched movie), or bad items (spam, products).
■   Evaluation is still problematic
Trust in Recommender Systems
How do we exploit trust in RSs?


Instead of computing UserSim of other users, compute 
    Trust in other users.


Instead of items liked by similar users, 
   recommend items liked by “trustable” users.


(or combine both methods ...)
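One way to read “compute Trust instead of UserSim” is to keep the shape of the CF prediction and simply swap the similarity weights for trust values. The sketch below assumes exactly that; the function name, the tuple layout and the numbers are hypothetical, and other ways of using trust are possible.

```python
def trust_weighted_prediction(active_mean, neighbours):
    """Predict a rating from trusted users.

    neighbours: list of (trust_in_user, user_rating_for_item, user_mean_rating).
    Users with trust 0 contribute nothing. Returns None if nobody is trusted.
    """
    num = sum(t * (rating - mean) for t, rating, mean in neighbours if t > 0)
    den = sum(t for t, _, _ in neighbours if t > 0)
    return active_mean + num / den if den else None


# Hypothetical example: my mean rating is 3.0; a fully trusted user rated the
# item 5 (her mean is 4.0), a half-trusted user rated it 2 (his mean is 3.0),
# and a distrusted user (trust 0) rated it 1.
prediction = trust_weighted_prediction(
    3.0, [(1.0, 5, 4.0), (0.5, 2, 3.0), (0.0, 1, 4.0)]
)
```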
Trust Propagation




                           ME

6 degrees of separation “theorem” (Stanley Milgram, 1967)

With few trust steps it is possible to reach every person in 
the world! (but more steps needed for higher-trust actions)
--> Ideally, using trust metrics, no more unknown users.
Trust solves RS problems
■   User Similarity often not computable
    ➔
         trust propagation and “6 degrees” -> we are 
         now able to predict trust for many users
■   Cold start
     ➔
         “just add 1 friend”
■   Easy copy-profile Attacks
     ➔
         “you can be similar but if no trust path to 
         you ...”
■   Hard to understand and control
     ➔
         Showing Trust Networks supports 
         Explanation
Epinions.com Experiments
■   Some experiments to show that trust solves RSs problems...
■ Epinions.com users can
    
        Review and rate items (from 1 to 5)
    
        Keep web of trust (trust=1) and block list (trust=0). [Epinions 
        FAQ says to put in Web of Trust “Reviewers whose reviews and 
        ratings you have consistently found to be valuable”]
■ Dataset (collected by crawling site):
    
        ~50K users, ~140K items, ~660K ratings.
    
        ~500K trust statements. 
         ➔
             No block list (not shown on site, kept hidden)
Experimental Setup
Compare the performance of CF (1) and trust-aware (2) algorithms
 (1) - use CF on ratings and compute “similarity” of other users
 (2) - use Trust Metric and compute “trustworthiness” of other users
Then we can predict ratings based on similar OR trustable users.

Leave-one-out: hide one rating, predict it and compute the error (660,000 ratings!)
UserSimilarity and Trust computability

Mean number of comparable users

                           Propagating Trust                   Using
                    Dist 1    Dist 2    Dist 3    Dist 4     Pearson
All users             9.88       400      4386     16334         161
Cold Start users      2.14     94.54      1675      9121        2.74
Webs of Trust Grow Quickly...
Mean # Reachable Users (in k steps) for users expressing X trust 
 statements.
In few steps, you can predict trust in every user!  Even for Cold Start 
   Users!!!




User Trust Metric

Linear decay based on distance from ME: closer users are more trustable.

Parameter: max propagation distance (mpd), the distance in the social network beyond 
  which there is no trust.

TrustME(B) = (mpd - distME(B) + 1) / mpd
       If (distME(B) > mpd) then TrustME(B) = 0

Experiments with mpd=1, 2, 3, 4 called Trust-1, Trust-2, Trust-3, Trust-4
User Trust Metric
      Example: max propagation distance=4
DistME(B) = 1
TME(B) = (4-1+1)/4 = 1

DistME(B) = 2
TME(B) = 3/4

DistME(B) = 3
TME(B) = 2/4

DistME(B) = 4
TME(B) = 1/4

DistME(B) > 4
TME(B) = 0
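The linear-decay metric is easy to sketch in code. The formula is the one on the slide; the breadth-first search for distances, the function names and the small web of trust are assumptions added for illustration.

```python
from collections import deque


def distances_from(source, trust_edges, max_distance):
    """Shortest distance (in trust steps) from source to each reachable user.

    trust_edges: user -> list of users she trusts. Users further away than
    max_distance are not explored and do not appear in the result.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        if dist[user] == max_distance:
            continue  # propagation horizon reached
        for trusted in trust_edges.get(user, ()):
            if trusted not in dist:
                dist[trusted] = dist[user] + 1
                queue.append(trusted)
    del dist[source]
    return dist


def linear_decay_trust(distance, mpd):
    """TrustME(B) = (mpd - distME(B) + 1) / mpd, and 0 beyond mpd."""
    if distance > mpd:
        return 0.0
    return (mpd - distance + 1) / mpd


# Hypothetical web of trust.
web_of_trust = {
    "ME": ["Mena", "Doc"],
    "Mena": ["Bill"],
    "Bill": ["Cory"],
    "Cory": ["Mary"],
    "Mary": ["Zoe"],
}
mpd = 4
trust = {
    user: linear_decay_trust(d, mpd)
    for user, d in distances_from("ME", web_of_trust, mpd).items()
}
```

With mpd=4, Zoe sits at distance 5 and therefore gets no predicted trust at all.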
Experimental Results

#Expressed Ratings              ALL       2       3       4
User population size          40169    3937    2917    2317
Mean Web of Trust Size         9.88    2.54    3.15    3.64

Ratings Coverage   UserSim      51%     N/A      4%      8%
                   Trust-1      28%     10%     11%     12%
                   Trust-2      60%     23%     26%     31%
                   Trust-3      74%     39%     45%     51%
                   Trust-4      77%     45%     53%     59%

Users Coverage     UserSim      41%     N/A      6%     14%
                   Trust-1      45%     17%     25%     32%
                   Trust-2      56%     32%     43%     53%
                   Trust-3      61%     46%     57%     64%
                   Trust-4      62%     56%     59%     66%

Mean Absolute      UserSim    0.843     N/A   1.244   1.027
Error (MAE)        Trust-1    0.837   0.929   0.903   0.840
                   Trust-2    0.829   1.050   0.940   0.927
                   Trust-3    0.811   1.046   0.940   0.918
                   Trust-4    0.805   1.033   0.926   0.903

Rows:
UserSim = Collaborative Filtering
Trust-x = Trust propagation up to distance x
RatingsCoverage = how many hidden ratings are predictable.
UsersCoverage = how many users get at least a prediction
MAE = |real_rating - pred_rating| averaged over all the ratings.
MAUE = |real_rating - pred_rating| averaged over the ratings of one user, then averaged 
over all users.

Columns:
Views over users.
ALL = all the users (with at least 1 rating)
2 = only the subset of users that gave 2 ratings (there are 3937)
(similarly for 3 and 4)...
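The four measures in the legend can be computed from leave-one-out results along these lines. This is a sketch under assumptions: each result is a (user, real rating, predicted rating) triple with `None` when no prediction was possible, errors are averaged only over ratings that got a prediction, and the sample data is invented.

```python
from collections import defaultdict


def evaluate(results):
    """results: list of (user, real_rating, predicted_rating or None)."""
    predicted = [(u, real, pred) for u, real, pred in results if pred is not None]
    all_users = {u for u, _, _ in results}
    covered_users = {u for u, _, _ in predicted}

    # Collect the absolute errors of each user separately for MAUE.
    errors_by_user = defaultdict(list)
    for user, real, pred in predicted:
        errors_by_user[user].append(abs(real - pred))
    all_errors = [e for errors in errors_by_user.values() for e in errors]

    return {
        "ratings_coverage": len(predicted) / len(results),
        "users_coverage": len(covered_users) / len(all_users),
        # MAE: every predicted rating counts once.
        "mae": sum(all_errors) / len(all_errors) if all_errors else None,
        # MAUE: every user counts once, however many ratings she has.
        "maue": (
            sum(sum(e) / len(e) for e in errors_by_user.values()) / len(errors_by_user)
            if errors_by_user
            else None
        ),
    }


# Hypothetical leave-one-out output for three users.
sample = [
    ("ann", 5, 4.0),
    ("ann", 3, 3.5),
    ("ann", 4, None),
    ("bob", 2, 4.0),
    ("cat", 1, None),
]
metrics = evaluate(sample)
```

In this sample MAE (about 1.17) is lower than MAUE (1.375) because the heavy rater “ann” is predicted well and dominates MAE, while MAUE gives “bob” the same weight as her.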
Experimental Results
                                                          On average, Trust­x 
#Expressed Ratings       ALL       2       3       4
User population size     40169    3937    2917    2317    achieves better coverage 
                                                          without loss of accuracy.
Mean Web of Trust Size     9.88    2.54    3.15    3.64
Ratings    UserSim         51%      N/A      4%      8%
Coverage Trust-1           28%     10%     11%     12%
           Trust-2         60%     23%     26%     31%
           Trust-3         74%     39%     45%     51%
           Trust-4         77%     45%     53%     59%
Users      UserSim         41%      N/A      6%    14%
Coverage Trust-1           45%     17%     25%     32%
           Trust-2         56%     32%     43%     53%
           Trust-3         61%     46%     57%     64%
           Trust-4         62%     56%     59%     66%
Mean       UserSim       0.843      N/A   1.244   1.027
Absolute Trust-1          0.837   0.929   0.903   0.840
Error      Trust-2        0.829   1.050   0.940   0.927
(MAE)      Trust-3        0.811   1.046   0.940   0.918
           Trust-4       0.805    1.033   0.926   0.903




                                                                                      47
Experimental Results
                                                          On average, Trust­x 
#Expressed Ratings       ALL       2       3       4
User population size     40169    3937    2917    2317    achieves better coverage 
                                                          without loss of accuracy.
Mean Web of Trust Size     9.88    2.54    3.15    3.64
Ratings    UserSim         51%      N/A      4%      8%
Coverage Trust-1           28%     10%     11%     12%
           Trust-2         60%     23%     26%     31%    UserSim performs well 
           Trust-3
           Trust-4
                           74%
                           77%
                                   39%
                                   45%
                                           45%
                                           53%
                                                   51%
                                                   59%
                                                          with heavy raters and 
Users      UserSim         41%      N/A      6%    14%    poorly with cold start users.
Coverage Trust-1           45%     17%     25%     32%
           Trust-2         56%     32%     43%     53%
           Trust-3         61%     46%     57%     64%
           Trust-4         62%     56%     59%     66%
Mean       UserSim       0.843      N/A   1.244   1.027
Absolute   Trust-1       0.837    0.929   0.903   0.840
Error      Trust-2        0.829   1.050   0.940   0.927
(MAE)      Trust-3        0.811   1.046   0.940   0.918
           Trust-4        0.805   1.033   0.926   0.903




                                                                                          48
Experimental Results
#Expressed Ratings               ALL        2        3        4
User population size           40169     3937     2917     2317
Mean Web of Trust Size          9.88     2.54     3.15     3.64

Ratings    UserSim               51%      N/A       4%       8%
Coverage   Trust-1               28%      10%      11%      12%
           Trust-2               60%      23%      26%      31%
           Trust-3               74%      39%      45%      51%
           Trust-4               77%      45%      53%      59%

Users      UserSim               41%      N/A       6%      14%
Coverage   Trust-1               45%      17%      25%      32%
           Trust-2               56%      32%      43%      53%
           Trust-3               61%      46%      57%      64%
           Trust-4               62%      56%      59%      66%

Mean       UserSim             0.843      N/A    1.244    1.027
Absolute   Trust-1             0.837    0.929    0.903    0.840
Error      Trust-2             0.829    1.050    0.940    0.927
(MAE)      Trust-3             0.811    1.046    0.940    0.918
           Trust-4             0.805    1.033    0.926    0.903

On average, Trust-x achieves better coverage without loss of accuracy.
UserSim performs well with heavy raters and poorly with cold start users.
For cold start users (50% of the total!), Trust-x also achieves better accuracy.
For bootstrapping RSs, asking for one trust statement is better than asking for one 
rating.
(experiments on 660,000 ratings)   49
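The three measures in the table can be sketched in a few lines. This is only an illustration, assuming a leave-one-out style setup in which each hidden rating is either predicted or not (None); the function name `evaluate`, the data layout and the toy ratings are assumptions, not the code used in the experiments. Users coverage is taken here to mean the share of users who get at least one prediction.

```python
from typing import Dict, Optional, Tuple

Rating = Tuple[str, str]  # (user, item)


def evaluate(actual: Dict[Rating, float],
             predicted: Dict[Rating, Optional[float]]) -> Dict[str, float]:
    """Return MAE, ratings coverage and users coverage for a set of hidden ratings."""
    if not actual:
        raise ValueError("need at least one hidden rating")
    errors = []
    users_all = {user for (user, _item) in actual}
    users_covered = set()
    for key, true_value in actual.items():
        guess = predicted.get(key)
        if guess is None:  # the algorithm could not produce a prediction
            continue
        errors.append(abs(guess - true_value))
        users_covered.add(key[0])
    return {
        # MAE is averaged only over the ratings that could be predicted
        "mae": sum(errors) / len(errors) if errors else float("nan"),
        "ratings_coverage": len(errors) / len(actual),
        "users_coverage": len(users_covered) / len(users_all),
    }


# Toy data (made up): four hidden ratings, two of them predictable.
actual = {("alice", "m1"): 4.0, ("alice", "m2"): 2.0,
          ("bob", "m1"): 5.0, ("carol", "m3"): 3.0}
predicted = {("alice", "m1"): 3.5, ("alice", "m2"): None, ("bob", "m1"): 4.0}
result = evaluate(actual, predicted)
```

Note that MAE and coverage pull in different directions: an algorithm that only predicts the easy ratings can show a low error with poor coverage, which is why the table reports both.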
Centralized vs decentralized
■   Another problem with current RSs: centralization
■   Information is centralized on one server
    
        Your “profile” scattered in many RS (Amazon, B&B, ...)
    
        Profile not reusable (your profile in Amazon is NOT yours)
    
        Recommendation computation out of your control
■   Decentralized
    
        The Web is decentralized: anyone can write whatever she 
        wants, in whatever “language” she wants (spam is good)
    
        No censorship, innovation can happen on the edges, not a 
        single mind but many minds... you are in control of what you 
        produce
                                                                     50
Semantic Web
■   A Web of content designed for and understandable by machines. 
    (matrix?)
■   Promising Semantic Web formats 
     
         FOAF (Friend-Of-A-Friend): trust info        <---
     
         XFN (Xhtml Friend Network): social info
     
         VoteLink: vote-for, vote-against, vote-abstain links
     
         Blogroll: not semantic!
     
         hReview: review/rating info                           <---
     
         RSS of OutFoxed (http://getoutfoxed.com/rss)
     
         RVW, OpenReview, ...: review/rating info

■   Allow decentralized publishing of information that RSs 
    aggregate and exploit.                                                51
Note: Adoption of a Language
■   Suppose you can define the language we have to use 
    for communicating
    
        Which language is better? Chinese? Italiano? The 
        one you invent?
■   Interesting question, but as long as a language is “good enough”, 
    its quality matters little (or not at all) for its adoption.
■   Do you know why the keys on your keyboard are placed the 
    way they are? --> how standards get adopted ...
■   Who has the power to “propose” changes in the 
    language of the Web?
                                                               52
Format adoption
■   Adoption does not depend on quality of the format but 
    (mainly) on the authority of the proponent.
■   Google can push changes in HTML (example: 
    rel=nofollow)
■   Certainly Microsoft could (even without you noticing 
    or being told)
■   I can't.
■   ...But maybe if I create a format that's immediately 
    useful, it will be taken up by a user community and 
    spread “virally” (how HTML started)
                                                            53
FOAF (Friend-Of-A-Friend)
Every peer expresses who she knows (and trusts, with an 
 extension). Based on RDF (Resource Description Framework).
It is already used (some millions of files). Create a FOAF 
  file and put it on your page! Find more at 
  http://foaf-project.org
Decentralized publishing advantages: profile not scattered in 
 many different sites and under user control. A 
 scutter/crawler can follow “links” (seeAlso) and aggregate 
 the complete social network.


STOP!!!  Downsides to publishing your friendship network?

                                                            54
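The scutter idea can be sketched as a breadth-first walk over seeAlso links. This is a minimal sketch under stated assumptions: each FOAF file is reduced to a pair (people known, seeAlso URLs), the `fetch` callable stands in for downloading and parsing a file, and the example.org URLs and the names `scutter` and `fake_web` are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical summary of one FOAF file: (people it says the owner knows, seeAlso URLs)
FoafDoc = Tuple[List[str], List[str]]


def scutter(start_url: str,
            fetch: Callable[[str], Optional[FoafDoc]],
            max_docs: int = 100) -> Dict[str, Set[str]]:
    """Follow seeAlso links breadth-first and merge the 'knows' statements.

    The result maps the URL of each file read to the set of people it names.
    """
    graph: Dict[str, Set[str]] = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(graph) < max_docs:
        url = queue.popleft()
        doc = fetch(url)
        if doc is None:  # unreachable or unparsable file: skip it
            continue
        knows, see_also = doc
        graph[url] = set(knows)
        for link in see_also:
            if link not in seen:  # avoid loops between files that link to each other
                seen.add(link)
                queue.append(link)
    return graph


# An in-memory stand-in for the Web (made-up URLs and contents).
fake_web: Dict[str, FoafDoc] = {
    "http://example.org/paolo.rdf": (["Jennifer"], ["http://example.org/jen.rdf"]),
    "http://example.org/jen.rdf": (["Paolo", "Hassan"],
                                   ["http://example.org/paolo.rdf",
                                    "http://example.org/missing.rdf"]),
}
network = scutter("http://example.org/paolo.rdf", fake_web.get)
```

The `seen` set and the `max_docs` cap are the two design choices that matter: without them a crawler would loop forever on mutual links or wander across the whole network.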
FOAF example
<foaf:Person rdf:nodeID="me">
      <foaf:name>Paolo Massa</foaf:name>
      <foaf:mbox rdf:resource="mailto:massa@itc.it" />
        ...
      <foaf:knows rdf:nodeID="friend01" />
      <trust:trust9 rdf:nodeID="friend01" />
</foaf:Person>

<foaf:Person rdf:nodeID="friend01">
      <foaf:name>Jennifer Golbeck</foaf:name>
      <rdfs:seeAlso rdf:resource="http://cs.umd.edu/~golbeck/daml/golbeckFOAF.rdf"/>
</foaf:Person>




Find my FOAF file at http://sra.itc.it/people/massa/paolofoaf.rdf

(and check if you are one of my friends ;-)                                            55
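A small sketch of how an aggregator might read such a file, using only Python's standard XML parser. Assumptions: the fragment is wrapped in an rdf:RDF root with namespace declarations, node identifiers use the RDF/XML spelling `rdf:nodeID`, and the trust namespace URI below is a placeholder rather than the real URI of the trust extension. A real aggregator would more likely use a proper RDF library.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "trust": "http://example.org/trust#",  # placeholder URI (assumption)
}
RDF_NODE_ID = "{%s}nodeID" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]
TRUST_PREFIX = "{%s}trust" % NS["trust"]

FOAF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:trust="http://example.org/trust#">
  <foaf:Person rdf:nodeID="me">
    <foaf:name>Paolo Massa</foaf:name>
    <foaf:knows rdf:nodeID="friend01" />
    <trust:trust9 rdf:nodeID="friend01" />
  </foaf:Person>
  <foaf:Person rdf:nodeID="friend01">
    <foaf:name>Jennifer Golbeck</foaf:name>
    <rdfs:seeAlso rdf:resource="http://cs.umd.edu/~golbeck/daml/golbeckFOAF.rdf"/>
  </foaf:Person>
</rdf:RDF>"""


def parse_foaf(xml_text):
    """Return {node id: {"name", "knows", "trust", "see_also"}} for each foaf:Person."""
    people = {}
    for person in ET.fromstring(xml_text).findall("foaf:Person", NS):
        trust = {}
        for child in person:
            # Elements such as trust:trust9 carry the trust level in their name.
            if child.tag.startswith(TRUST_PREFIX):
                trust[child.get(RDF_NODE_ID)] = int(child.tag[len(TRUST_PREFIX):])
        people[person.get(RDF_NODE_ID)] = {
            "name": person.findtext("foaf:name", default="", namespaces=NS),
            "knows": [k.get(RDF_NODE_ID) for k in person.findall("foaf:knows", NS)],
            "trust": trust,
            "see_also": [s.get(RDF_RESOURCE)
                         for s in person.findall("rdfs:seeAlso", NS)],
        }
    return people


people = parse_foaf(FOAF_XML)
```

The `see_also` list is what a crawler would follow next, and the `trust` dictionary is the raw material for a trust-aware recommender.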
hReview
■   hReview is a simple, open, distributed reviews 
    format suitable for embedding in (X)HTML, Atom, 
    RSS, and arbitrary XML. 
■   In order to enable and encourage the sharing, 
    distribution, syndication, and aggregation of 
    reviews
■   Proposed by Technorati.com on a wiki page
■   http://developers.technorati.com/wiki/hReview
■   You are free to participate and give suggestions 
    and feedback!
                                                        56
hReview example
<div class="hreview">
 <span><span class="rating">5</span> out of 5 stars</span>
 <h4 class="summary"><span class="item fn">Crepes on Cole</span> is awesome</h4>
 <span>Reviewer: <span class="reviewer fn">Tantek</span> - 
 <abbr class="dtreview" title="20050418T2300-0700">April 18, 2005</abbr></span>
 <blockquote class="description"><p>Crepes on Cole is one of the best little creperies in 
   San Francisco. Excellent food and service. Plenty of tables in a variety of sizes for 
   parties large and small. Window seating makes for excellent people watching to/from 
   the N-Judah which stops right outside. I've had many fun social gatherings here, as 
   well as gotten plenty of work done thanks to neighborhood WiFi. </p></blockquote>
 <p>Visit date: <span>April 2005</span></p>
 <p>Food eaten: <span>Florentine crepe</span></p>
</div>


                                                                                        57
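Because the review data sits in ordinary class attributes, an aggregator can pull it out with a plain HTML parser. The sketch below uses Python's standard `html.parser` and reads only three fields (rating, item, reviewer); the class name `HReviewParser` and the helper `parse_hreview` are made up for illustration, and a real consumer would need to handle the full format.

```python
from html.parser import HTMLParser

VOID = {"br", "img", "hr", "input", "meta", "link"}  # tags with no closing tag


class HReviewParser(HTMLParser):
    WANTED = {"rating", "item", "reviewer"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._open = []  # one entry per open element: the field it captures, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        # An element like class="item fn" captures the first wanted class name.
        self._open.append(next((c for c in classes if c in self.WANTED), None))

    def handle_endtag(self, tag):
        if tag not in VOID and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every wanted field whose element is still open.
        for field in self._open:
            if field:
                self.fields[field] = self.fields.get(field, "") + data


def parse_hreview(html):
    parser = HReviewParser()
    parser.feed(html)
    return {name: text.strip() for name, text in parser.fields.items()}


SAMPLE = """
<div class="hreview">
 <span><span class="rating">5</span> out of 5 stars</span>
 <h4 class="summary"><span class="item fn">Crepes on Cole</span> is awesome</h4>
 <span>Reviewer: <span class="reviewer fn">Tantek</span></span>
</div>
"""
review = parse_hreview(SAMPLE)
```

The point of the format is visible here: nothing beyond the class names is needed to turn a human-readable page into a rating that a recommender can aggregate.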
Conclusions
■   Info Filtering:
■   From Content-based ... to Collaborative Filtering ... to 
    Trust-aware (?)
■   Trust is both a simple and a complicated concept
■   It is a rapidly evolving and increasingly important topic: 
    there is room for your contributions!
■   Forecast: In 3 years, anyone will publish her opinions 
    (about stuff, organizations...people, ideas?) in some 
    semantic format and trust-aware aggregators 
    (googletrust?) will help in coping with an increased 
    information overload.
                                                                  58
Licence of these slides
These slides are released under

Creative Commons

Attribution-ShareAlike 2.5
You are free:

    * to copy, distribute, display, and perform the work

    * to make derivative works

    * to make commercial use of the work

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor.

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license 
   identical to this one.

    * For any reuse or distribution, you must make clear to others the license terms of this work.

    * Any of these conditions can be waived if you get permission from the copyright holder.

Your fair use and other rights are in no way affected by the above.

More info at http://creativecommons.org/licenses/by-sa/2.5/
                                                                                                                             59
The END!




      THE END!
Thanks for your attention.

       Questions?



                                60

Weitere ähnliche Inhalte

Ähnlich wie Trust in Recommender Systems a historical overview and recent developments

Just the basics_strata_2013
Just the basics_strata_2013Just the basics_strata_2013
Just the basics_strata_2013Ken Mwai
 
SEO for the Semantic Web
SEO for the Semantic WebSEO for the Semantic Web
SEO for the Semantic WebMihai Gheza
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.pptRahulTr22
 
Data science programming .ppt
Data science programming .pptData science programming .ppt
Data science programming .pptGanesh E
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.pptkalai75
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.pptAravind Reddy
 
Can’t Find Your 404s?
Can’t Find Your 404s?Can’t Find Your 404s?
Can’t Find Your 404s?Michael Nelson
 
Semantic Web For Distributed Social Networks
Semantic Web For Distributed Social NetworksSemantic Web For Distributed Social Networks
Semantic Web For Distributed Social NetworksDavid Peterson
 
[INSIGHT OUT 2011] A21 why why is probably the right answer(tom kyte)
[INSIGHT OUT 2011] A21 why why is probably the right answer(tom kyte)[INSIGHT OUT 2011] A21 why why is probably the right answer(tom kyte)
[INSIGHT OUT 2011] A21 why why is probably the right answer(tom kyte)Insight Technology, Inc.
 
API's, Freebase, and the Collaborative Semantic web
API's, Freebase, and the Collaborative Semantic webAPI's, Freebase, and the Collaborative Semantic web
API's, Freebase, and the Collaborative Semantic webDan Delany
 
Tim Mackinnon Agile And Beyond
Tim Mackinnon Agile And BeyondTim Mackinnon Agile And Beyond
Tim Mackinnon Agile And Beyonddeimos
 
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...eswcsummerschool
 
Nlp and Neural Networks workshop
Nlp and Neural Networks workshopNlp and Neural Networks workshop
Nlp and Neural Networks workshopQuantUniversity
 
Web2.0: from "I know nothing" to "I know something" in 2 hours (what?!?)
Web2.0: from "I know nothing" to "I know something" in 2 hours (what?!?)Web2.0: from "I know nothing" to "I know something" in 2 hours (what?!?)
Web2.0: from "I know nothing" to "I know something" in 2 hours (what?!?)Paolo Massa
 
Crisis or Opportunity? Cataloging, Catalogers, RDA, and Change
Crisis or Opportunity? Cataloging, Catalogers, RDA, and ChangeCrisis or Opportunity? Cataloging, Catalogers, RDA, and Change
Crisis or Opportunity? Cataloging, Catalogers, RDA, and ChangeDiane Hillmann
 
A Family That Hacks Together, Interacts Together!
A Family That Hacks Together, Interacts Together!A Family That Hacks Together, Interacts Together!
A Family That Hacks Together, Interacts Together!Daniel Davis
 
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmxMoved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmxMilen Dyankov
 

Ähnlich wie Trust in Recommender Systems a historical overview and recent developments (20)

Just the basics_strata_2013
Just the basics_strata_2013Just the basics_strata_2013
Just the basics_strata_2013
 
SEO for the Semantic Web
SEO for the Semantic WebSEO for the Semantic Web
SEO for the Semantic Web
 
Data Science
Data Science Data Science
Data Science
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
Data science programming .ppt
Data science programming .pptData science programming .ppt
Data science programming .ppt
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
Lec1cgu13updated.ppt
Lec1cgu13updated.pptLec1cgu13updated.ppt
Lec1cgu13updated.ppt
 
Can’t Find Your 404s?
Can’t Find Your 404s?Can’t Find Your 404s?
Can’t Find Your 404s?
 
Semantic Web For Distributed Social Networks
Semantic Web For Distributed Social NetworksSemantic Web For Distributed Social Networks
Semantic Web For Distributed Social Networks
 
[INSIGHT OUT 2011] A21 why why is probably the right answer(tom kyte)
[INSIGHT OUT 2011] A21 why why is probably the right answer(tom kyte)[INSIGHT OUT 2011] A21 why why is probably the right answer(tom kyte)
[INSIGHT OUT 2011] A21 why why is probably the right answer(tom kyte)
 
Lecture09
Lecture09Lecture09
Lecture09
 
API's, Freebase, and the Collaborative Semantic web
API's, Freebase, and the Collaborative Semantic webAPI's, Freebase, and the Collaborative Semantic web
API's, Freebase, and the Collaborative Semantic web
 
Tim Mackinnon Agile And Beyond
Tim Mackinnon Agile And BeyondTim Mackinnon Agile And Beyond
Tim Mackinnon Agile And Beyond
 
Web2.0 and KM
Web2.0 and KMWeb2.0 and KM
Web2.0 and KM
 
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
 
Nlp and Neural Networks workshop
Nlp and Neural Networks workshopNlp and Neural Networks workshop
Nlp and Neural Networks workshop
 
Web2.0: from "I know nothing" to "I know something" in 2 hours (what?!?)
Web2.0: from "I know nothing" to "I know something" in 2 hours (what?!?)Web2.0: from "I know nothing" to "I know something" in 2 hours (what?!?)
Web2.0: from "I know nothing" to "I know something" in 2 hours (what?!?)
 
Crisis or Opportunity? Cataloging, Catalogers, RDA, and Change
Crisis or Opportunity? Cataloging, Catalogers, RDA, and ChangeCrisis or Opportunity? Cataloging, Catalogers, RDA, and Change
Crisis or Opportunity? Cataloging, Catalogers, RDA, and Change
 
A Family That Hacks Together, Interacts Together!
A Family That Hacks Together, Interacts Together!A Family That Hacks Together, Interacts Together!
A Family That Hacks Together, Interacts Together!
 
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmxMoved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
Moved to https://slidr.io/azzazzel/web-application-performance-tuning-beyond-xmx
 

Mehr von Paolo Massa

Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)
Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)
Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)Paolo Massa
 
Manypedia: Comparing Language Points of View of Wikipedia Communities
Manypedia: Comparing  Language Points of View  of Wikipedia CommunitiesManypedia: Comparing  Language Points of View  of Wikipedia Communities
Manypedia: Comparing Language Points of View of Wikipedia CommunitiesPaolo Massa
 
Gamification Features 4 Fitcity
Gamification Features 4 FitcityGamification Features 4 Fitcity
Gamification Features 4 FitcityPaolo Massa
 
Rete e Reti: Per-che' e per-chi?
Rete e Reti: Per-che' e per-chi?Rete e Reti: Per-che' e per-chi?
Rete e Reti: Per-che' e per-chi?Paolo Massa
 
Social fitness (fitcity project)
Social fitness (fitcity project)Social fitness (fitcity project)
Social fitness (fitcity project)Paolo Massa
 
DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES
DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES  DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES
DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES Paolo Massa
 
Reputation: local or global?
Reputation: local or global?Reputation: local or global?
Reputation: local or global?Paolo Massa
 
Collective Memory building in Wikipedia: the case of North African uprisings
Collective Memory building in Wikipedia: the case of North African uprisingsCollective Memory building in Wikipedia: the case of North African uprisings
Collective Memory building in Wikipedia: the case of North African uprisingsPaolo Massa
 
Social networks of Wikipedia - Paolo Massa - Presentation at (2011). ACM Hype...
Social networks of Wikipedia - Paolo Massa - Presentation at (2011). ACM Hype...Social networks of Wikipedia - Paolo Massa - Presentation at (2011). ACM Hype...
Social networks of Wikipedia - Paolo Massa - Presentation at (2011). ACM Hype...Paolo Massa
 
Social net-work 4 your business
Social net-work 4 your businessSocial net-work 4 your business
Social net-work 4 your businessPaolo Massa
 
An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...
An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...
An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...Paolo Massa
 
Supporting Collaborative Networks in Organizational Settings using an Enterpr...
Supporting Collaborative Networks in Organizational Settings using an Enterpr...Supporting Collaborative Networks in Organizational Settings using an Enterpr...
Supporting Collaborative Networks in Organizational Settings using an Enterpr...Paolo Massa
 
Combining Ridesharing& Social Networks
Combining Ridesharing& Social NetworksCombining Ridesharing& Social Networks
Combining Ridesharing& Social NetworksPaolo Massa
 
The Simplicity Cycle by Dan Ward
The Simplicity Cycle by Dan WardThe Simplicity Cycle by Dan Ward
The Simplicity Cycle by Dan WardPaolo Massa
 
Invited talk at Future Networked Technologies / FIT-IT research calls opening...
Invited talk at Future Networked Technologies / FIT-IT research calls opening...Invited talk at Future Networked Technologies / FIT-IT research calls opening...
Invited talk at Future Networked Technologies / FIT-IT research calls opening...Paolo Massa
 
The Future of Work, Fun, and Being Social: an introduction to the nascent adv...
The Future of Work, Fun, and Being Social: an introduction to the nascent adv...The Future of Work, Fun, and Being Social: an introduction to the nascent adv...
The Future of Work, Fun, and Being Social: an introduction to the nascent adv...Paolo Massa
 
Feedback Effects Between Similarity And Social Influence In Online Communities
Feedback Effects Between Similarity And Social Influence In Online CommunitiesFeedback Effects Between Similarity And Social Influence In Online Communities
Feedback Effects Between Similarity And Social Influence In Online CommunitiesPaolo Massa
 
Bowling Alone and Trust Decline in Social Network Sites
Bowling Alone and  Trust Decline in  Social Network SitesBowling Alone and  Trust Decline in  Social Network Sites
Bowling Alone and Trust Decline in Social Network SitesPaolo Massa
 
Social Networking 4 your business
Social Networking 4 your businessSocial Networking 4 your business
Social Networking 4 your businessPaolo Massa
 
OMG Girlz Don't Exist on teh Intarweb!!!!1
OMG Girlz Don't Exist on teh Intarweb!!!!1OMG Girlz Don't Exist on teh Intarweb!!!!1
OMG Girlz Don't Exist on teh Intarweb!!!!1Paolo Massa
 

Mehr von Paolo Massa (20)

Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)
Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)
Monitoraggio - Alternanza Scuola Lavoro - 2016 (Slides del Ministro)
 
Manypedia: Comparing Language Points of View of Wikipedia Communities
Manypedia: Comparing  Language Points of View  of Wikipedia CommunitiesManypedia: Comparing  Language Points of View  of Wikipedia Communities
Manypedia: Comparing Language Points of View of Wikipedia Communities
 
Gamification Features 4 Fitcity
Gamification Features 4 FitcityGamification Features 4 Fitcity
Gamification Features 4 Fitcity
 
Rete e Reti: Per-che' e per-chi?
Rete e Reti: Per-che' e per-chi?Rete e Reti: Per-che' e per-chi?
Rete e Reti: Per-che' e per-chi?
 
Social fitness (fitcity project)
Social fitness (fitcity project)Social fitness (fitcity project)
Social fitness (fitcity project)
 
DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES
DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES  DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES
DESIGN PRINCIPLES OF WIKIS AND THEIR IMPACT ON KNOWLEDGE EXCHANGE PROCESSES
 
Reputation: local or global?
Reputation: local or global?Reputation: local or global?
Reputation: local or global?
 
Collective Memory building in Wikipedia: the case of North African uprisings
Collective Memory building in Wikipedia: the case of North African uprisingsCollective Memory building in Wikipedia: the case of North African uprisings
Collective Memory building in Wikipedia: the case of North African uprisings
 
Social networks of Wikipedia - Paolo Massa - Presentation at (2011). ACM Hype...
Social networks of Wikipedia - Paolo Massa - Presentation at (2011). ACM Hype...Social networks of Wikipedia - Paolo Massa - Presentation at (2011). ACM Hype...
Social networks of Wikipedia - Paolo Massa - Presentation at (2011). ACM Hype...
 
Social net-work 4 your business
Social net-work 4 your businessSocial net-work 4 your business
Social net-work 4 your business
 
An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...
An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...
An Empirical Analysis on Social Capital and Enterprise 2.0 Participation in a...
 
Supporting Collaborative Networks in Organizational Settings using an Enterpr...
Supporting Collaborative Networks in Organizational Settings using an Enterpr...Supporting Collaborative Networks in Organizational Settings using an Enterpr...
Supporting Collaborative Networks in Organizational Settings using an Enterpr...
 
Combining Ridesharing& Social Networks
Combining Ridesharing& Social NetworksCombining Ridesharing& Social Networks
Combining Ridesharing& Social Networks
 
The Simplicity Cycle by Dan Ward
The Simplicity Cycle by Dan WardThe Simplicity Cycle by Dan Ward
The Simplicity Cycle by Dan Ward
 
Invited talk at Future Networked Technologies / FIT-IT research calls opening...
Invited talk at Future Networked Technologies / FIT-IT research calls opening...Invited talk at Future Networked Technologies / FIT-IT research calls opening...
Invited talk at Future Networked Technologies / FIT-IT research calls opening...
 
The Future of Work, Fun, and Being Social: an introduction to the nascent adv...
The Future of Work, Fun, and Being Social: an introduction to the nascent adv...The Future of Work, Fun, and Being Social: an introduction to the nascent adv...
The Future of Work, Fun, and Being Social: an introduction to the nascent adv...
 
Feedback Effects Between Similarity And Social Influence In Online Communities
Feedback Effects Between Similarity And Social Influence In Online CommunitiesFeedback Effects Between Similarity And Social Influence In Online Communities
Feedback Effects Between Similarity And Social Influence In Online Communities
 
Bowling Alone and Trust Decline in Social Network Sites
Bowling Alone and  Trust Decline in  Social Network SitesBowling Alone and  Trust Decline in  Social Network Sites
Bowling Alone and Trust Decline in Social Network Sites
 
Social Networking 4 your business
Social Networking 4 your businessSocial Networking 4 your business
Social Networking 4 your business
 
OMG Girlz Don't Exist on teh Intarweb!!!!1
OMG Girlz Don't Exist on teh Intarweb!!!!1OMG Girlz Don't Exist on teh Intarweb!!!!1
OMG Girlz Don't Exist on teh Intarweb!!!!1
 

Kürzlich hochgeladen

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Kürzlich hochgeladen (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko

Trust in Recommender Systems: a historical overview and recent developments

  • 1. Trust in Recommender Systems: a historical overview and recent developments. Paolo Massa, Università di Trento and ITC/iRST, http://moloko.itc.it/paoloblog/ (adapted by Hassan Masum). Slides licensed under Creative Commons Attribution-ShareAlike (see last slide for more info).
  • 3. Plan of the talk ■ Info Overload ■ Info Retrieval vs Info Filtering ■ Content-based Filtering (and its weaknesses) ■ Collaborative Filtering, aka Recommender Systems (and its weaknesses) ■ Trust-aware Filtering (What is trust? Reputation?)
  • 5. Info Overload ■ 5 seconds: the scientific information written in these 5 seconds can keep you busy reading for 40 minutes (based on 1985 data!); 400KB of new text is published on paper (24TB printed each year, 2000, “How Much Information” project at Berkeley); you have received an email (probably spam) ;-) ■ Is this true? Who can tell .... Take facts with a grain of salt. ■ "Technology reduces the amount of time it takes to do any one task but also leads to the expansion of tasks that people are expected to do." -- Juliet Schor
  • 6. Info Overload (IO) ■ IO refers to the state of having too much  information to make a decision or remain  informed about a topic. ■ The term was coined in 1970 by Alvin Toffler  in his book “Future Shock.” ■ http://en.wikipedia.org/wiki/Information_overload ■ Too much information can be worse than too little –  illusion of being informed 6
  • 7. Info Overload Stats ■ (NO NO, I'm not reading it, it is just a practical example of information overload!) ■ The daily New York Times now contains more information than the 17th century man or woman would have encountered in a lifetime. (Wurman, S.A. (1987) Information Anxiety. New York: Doubleday, 32.) ■ "As we go from grade school to high school we learn only a billionth of what there is to learn. There is enough scientific information written every day to fill seven complete sets of Encyclopedia Britannica; there is enough scientific information written every year to keep a person busy reading day and night for 460 years!" (Siegel, B.L. (1984, April 15). Knowledge with commitment: Teaching is the central task of the university. Vital Speeches of the Day, 50, 394.) ■ "In the last 30 years mankind has produced more information than in the previous 5,000." (Information Overload Causes Stress. (1997, March/April). Reuters Magazine. Available: Lexis Nexis Universe [4/28/98].) ■ Gordon Moore, co-founder of Intel, coined Moore's Law which states that the processing power of computer chips doubles about every 18 months. ■ "About 1,000 books are published internationally every day, and the total of all printed knowledge doubles every five years." (Information Overload Causes Stress. (1997, March/April). Reuters Magazine. Available: Lexis Nexis Universe [4/28/98].) ■ "The average Fortune 1000 worker already is sending and receiving approximately 178 messages and documents each day, according to a recent study, "Managing Corporate Communications in the Information Age."" (Boles, M. (1997) Help! Information overload. Workforce, 76, 20.) ■ "Dr Dharma Singh Khalsa, in his book Brain Longevity,...says the average American sees 16,000 advertisements, logos, and labels in a day." (Gore, A. (1998, January 18). Stressed? Maybe it's information overload. Sun Herald, 27.)
■ University of California Berkeley has a "How Much Information" project which studies the amount of information produced each year. "The world's total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage. This is the equivalent of 250 megabytes per person for each man, woman, and child on earth." Berkeley: How Much Information? (http://www.sims.berkeley.edu/research/projects/how-much-info/) ■ http://library.humboldt.edu/~ccm/fingertips/ioverloadstats.html ■ http://www.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm#summary ■ Data Smog: Surviving the Information Glut, by David Shenk
  • 8. Info overload ■ BLOGS!!! ■ Am I contributing? You bet :-) You now have to do some Information Retrieval / Filtering
  • 9. Info Retrieval vs Info Filtering ■ Info Retrieval: deals with static information (Reuters, a database, a book): you want to find information that is “lying there” ■ Info Filtering: deals with dynamic information (flows such as the Web or the media): you want to prioritize important incoming information, and block the rest ■ Relevance and Quality of items: on a paper repository like Citeseer, you do not want just any papers about “spam” but good papers about “spam”. Which “spam” papers are worth your while?
  • 10. Recommender Systems ■ Algorithms/systems that suggest to a user  items she might like. ■ Books, Songs, Restaurants, Food, ...,  Jokes, ..., anything? ■ E­commerce sites (but not only!)  For now, think of Amazon.com 10
  • 11. Recommender Systems Techniques: ■ Content-based ■ Collaborative Filtering (CF) ■ Trust-aware [... always think of a way of “spamming” the technique I describe. It is a safe assumption nowadays...]
  • 12. Content-Based RSs ● RSs find items similar to the ones you liked in the past. How? Analyse the “syntactic content” of all the items. ● Example: if you like papers containing the phrase “Info Retrieval”, the RS recommends to you another paper with the phrase “Info Retrieval” in it. ● If you read news containing the word “Darfur”, it recommends to you other news with the word “Darfur”. ● If you like movies by Kubrick, you get one more movie by Kubrick. ● Techniques of Info Retrieval ... .... What are the weaknesses? STOP!
  • 13. Content-Based RSs weaknesses ● Good for text: if you like papers containing the phrase “Info Retrieval”, the RS recommends you another paper with the phrase “Info Retrieval” in it. (And there are partially effective ways to find “similar” papers: vector space, LSI.) ● For movies or songs, humans must tag the content (genre, actors, year, ...) but this is time-consuming, costly, error-prone and subjective. – Can your employees “correctly” tag all the podcasts? All the videos? All the photos?
  • 14. Content-Based RSs weaknesses, summary: ● Text items (papers, news): doable, but RSs tend to always serve the same soup (boring). Difficult to recognize synonyms, concepts, or new emerging words (such as “folksonomy”). ● Movies or songs: not parsable at the moment by machines, so humans must tag them. ● Jokes (or subjective items such as political ideas): what are the “right” features? Tagging “objectively” is not possible!
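The vector space idea mentioned on this slide can be illustrated with a minimal sketch. It is not taken from the talk: the helper names (`bag_of_words`, `cosine_similarity`) and the toy paper texts are assumptions made for illustration, and real systems would add weighting such as TF-IDF.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lower-case the text and count whitespace-separated tokens."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical item the user liked, and two hypothetical candidates.
liked = bag_of_words("info retrieval with vector space models")
candidates = {
    "paper_a": bag_of_words("vector space models for info retrieval"),
    "paper_b": bag_of_words("a history of french cooking"),
}

# Rank candidates by similarity to the liked item, most similar first.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(liked, candidates[name]),
    reverse=True,
)
```

Because the comparison is purely word-based, a paper that uses a synonym for “retrieval” would score no better than the cooking paper, which is the weakness the slide points at.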
  • 15. Collaborative Filtering ● Users give ratings to items (implicit or explicit) ● I rate “Titanic” 4/5 ● The RS finds users similar to you (user similarity) ● It suggests to you items liked by similar users. Idea: out there, there is someone who is similar to you, and you will like what they liked.
  • 16. [Figure: example ratings matrix for users ME, User2, User3 and User4 over four items, ratings from 1 (min) to 5 (max), with one unknown rating (?) for ME. Sim(ME,User2) = -0.2, Sim(ME,User3) = -0.4, Sim(ME,User4) = +0.9] It does not consider the content of the items, only the ratings given by users. It works independently of the domain (also jokes) BUT overlap of rated items is required!
  • 18. CF formulas. Similarity measure: Pearson Correlation Coefficient of users a and u (in [-1,+1]):
  $$w_{a,u} = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{\sqrt{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)^2 \, \sum_{i=1}^{m} (r_{u,i} - \bar{r}_u)^2}}$$
  Prediction of the rating given by user a to item i:
  $$p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{n} (r_{u,i} - \bar{r}_u) \, w_{a,u}}{\sum_{u=1}^{n} w_{a,u}}$$
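The Pearson similarity and the prediction rule on this slide can be sketched in a few lines. This is a sketch under stated assumptions, not the author's implementation: the similarity is computed over co-rated items only, neighbours with non-positive weight are skipped so that the denominator stays positive, and the user names and ratings are invented.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson(ratings_a, ratings_u):
    """Similarity w_{a,u} over co-rated items; None when not computable."""
    common = ratings_a.keys() & ratings_u.keys()
    if len(common) < 2:
        return None  # no (or too little) overlap: similarity undefined
    mean_a = mean([ratings_a[i] for i in common])
    mean_u = mean([ratings_u[i] for i in common])
    num = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in common)
    den = math.sqrt(
        sum((ratings_a[i] - mean_a) ** 2 for i in common)
        * sum((ratings_u[i] - mean_u) ** 2 for i in common)
    )
    return num / den if den else None

def predict(active, item, all_ratings):
    """Predicted rating p_{a,i}; None when no usable neighbour rated the item."""
    ratings_a = all_ratings[active]
    num = den = 0.0
    for user, ratings_u in all_ratings.items():
        if user == active or item not in ratings_u:
            continue
        w = pearson(ratings_a, ratings_u)
        if w is None or w <= 0:
            continue  # assumption: ignore dissimilar or incomparable users
        num += (ratings_u[item] - mean(list(ratings_u.values()))) * w
        den += w
    if den == 0:
        return None
    return mean(list(ratings_a.values())) + num / den

# Invented ratings on a 1..5 scale.
ratings = {
    "me": {"i1": 2, "i2": 5, "i4": 4},
    "u2": {"i1": 5, "i2": 1, "i3": 2, "i4": 3},
    "u4": {"i1": 1, "i2": 4, "i3": 5, "i4": 3},
}
prediction = predict("me", "i3", ratings)
```

Note that the raw prediction is not clamped to the rating scale, and that `pearson` returns `None` as soon as two users share fewer than two rated items, which is the overlap requirement the previous slide warns about.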
  • 20. CF WEAKNESSES!!! ■ User similarity often not computable – Ratings matrix sparseness (95-99%) -> low or no overlap ■ Cold start – New users have 0 ratings (-> not comparable) – At the beginning, your RS is not Amazon! ■ Easy attacks by malicious users – Copy a profile and become the most similar – Even easier on the Semantic Web ■ Hard to understand and control – Black box (bad recs -> user gives up) Solution? Trust!
  • 21. Trust-awareness. Trust: explicit rating of a user on another user ● about the perceived quality of the user's characteristics ● in RSs, you “trust” someone if you like her tastes. We will now speak about trust and trust metrics and then we will come back to “trust and RSs”.
  • 23. Trust networks ■ Aggregate all the trust statements to produce a trust network. A node is a user. A directed edge is a trust statement. [Figure: trust network with nodes ME, Mena, Ben, Doc, Cory and Mary; known edges weighted 0, 0.2, 0.9, 0.6, 1 and 1, plus unknown edges marked “?”] Properties of trust: weighted (0 = distrust, 1 = max trust), subjective, asymmetric, context-dependent? Trust Metric (TM): uses existing edges for predicting values of trust for non-existing edges. Thanks to trust propagation, if you trust someone, then you have some degree of trust in anyone that person trusts.
  • 24. PageRank: a trust metric? Imagine the web as a trust network. ■ Nodes are web pages, edges are links (not weighted). ■ PageRank (Google) computes the “importance” of every single page based on the number and quality of incoming edges... ■ So, YES: PageRank is a trust metric. ■ HITS as well. [Figure: graph of web pages connected by links]
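To make “number and quality of incoming edges” concrete, here is a small power-iteration sketch of PageRank on a toy graph. The damping factor of 0.85, the iteration count and the four-page web are assumptions chosen for illustration; this is not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical web: a, b and d all link to c; c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

Page `a` has a single incoming link, but it comes from the highly ranked `c`, so `a` ends up well above `b` and `d`: quality of incoming edges matters, not only their number. Every user sees the same ranks, which is what makes this a global metric.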
  • 25. Spamming a Trust Metric ■ Is it easy to spam a trust metric? ■ It depends: some are attack-resistant (Advogato, for example) ■ Identity is an issue ■ ... see “The Social Cost of Cheap Pseudonyms” by Eric Friedman and Paul Resnick: unknown peers should not be trusted.
  • 26. TM perspective: Local or Global. [Figure: trust network with nodes ME, Mary, Mena, Doc and Bill; edge weights shown: 1, 1, 0 and 1] How much can Bill be trusted? On average (by the community)? By Mary? And by ME? ■ Global Trust Metrics: the “reputation” of a user is based on the number and quality of incoming edges. Bill has just one predicted trust value (0.5). PageRank (eBay, Slashdot, ...). They work badly for controversial people (Bush). ■ Local Trust Metrics: trust is subjective --> consider personal views (do I trust “Bill”?). AppleSeed, Golbeck TM, Advogato, ... Local can be more effective if people are not standardized.
  • 27. Local vs Global ■ Search engine: abortion, jew, scientology, ... ■ Who can define what is spam? Google? A site that opposes  Chinese Comm. Party should be removed? ■ Local vs global:  Is gwbush.com a good page? Is johnkerry.com a good page? Is  sex.com a good page? ■ Maybe these questions are meaningless?  It depends on YOUR LOCAL point of view!  republican/democrats, child/parent, federal/newglobal,  catholic/atheist, pro/against abortion, ... ■ Tyranny of the majority / Daily Me (Sunstein) 27
  • 28. Sociology and Trust ■ Is this Sociology? ■ Yes ... ■ You have seen many graphs, but the first to model groups in this way was Moreno, a sociologist (1934, sociogram). ■ Social network analysis (faculty.ucr.edu/~hanneman/nettext/) ■ Degree, betweenness, centrality, ... ■ Is this Politics? Yes ... ■ Read “Republic.com” and “Why Societies Need Dissent” by Cass Sunstein
  • 29. Economics and Trust ■ Is this Economics? Yes... ■ Reputation is an asset, for companies (marketing) but also for people ■ Centrality in the network is money as well. ■ Open source movement: your peers know you and will hire you when they need someone they trust and value. The same holds for researchers (who gets the next Nobel in Physics? The most “trusted” by physicists!) ■ Read “Down and Out in the Magic Kingdom”: SciFi in which reputation (whuffie) is the only currency
  • 30. Trust and Search Engines ■ 3rd-generation search engines: ■ personalization of results based on trust networks (LOCAL!), based on what your friends like/dislike. ■ Google and Yahoo! are moving in this direction (I'm speculating). [TrustRank] ■ Problem: scalability! You cannot recompute the PageRank of every site for every user! ■ But you can do it on your laptop/mobile for yourself, aggregating only the information “close” to you ...
  • 31. Which Trust Metric works better? ■ And under which conditions? ■ Still an open question. [you can work on it ;-) ] ■ Few papers until now evaluate trust metrics: ● Input data not easily available ● (advogato.org (8K), FOAF, epinions.com (150K), ... but not weighted) ■ No papers compare different TMs (leave-one-out technique) ■ Is local better than global? Only for the few users who are atypical? Computationally expensive? Attack-resistant (googlebomb)?
  • 32. Trust propagation. [Figure: trust network with nodes ME, Mena, Doc and Bill; edge weights shown: 1, 0.6, 0.2 and 0.8] ■ Trust chains (propagation) ■ Combining different trust chains ■ 0.6 * 1 = 0.6, and 0.8 * 0.2 = 0.16 ■ Then average? Not that simple ... ■ And how far does trust propagate?
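The two products on this slide can be reproduced with a short sketch that multiplies edge weights along every cycle-free chain. The assignment of weights to edges (ME→Mena 0.6, Mena→Bill 1, ME→Doc 0.8, Doc→Bill 0.2) is one reading of the figure that matches the arithmetic, and the function name `trust_paths` and the `max_hops` horizon are assumptions. How to combine the resulting chains is deliberately left open, as on the slide.

```python
def trust_paths(graph, source, target, max_hops=3):
    """Yield (path, trust) for every cycle-free chain from source to target."""
    def walk(node, path, trust):
        if node == target:
            yield path, trust
            return
        if len(path) - 1 >= max_hops:
            return  # propagation horizon reached
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in path:  # never revisit a user: no cycles
                yield from walk(neighbour, path + [neighbour], trust * weight)
    yield from walk(source, [source], 1.0)

# Assumed edge weights, chosen to match the products on the slide.
graph = {
    "ME": {"Mena": 0.6, "Doc": 0.8},
    "Mena": {"Bill": 1.0},
    "Doc": {"Bill": 0.2},
}
chains = {tuple(path): trust for path, trust in trust_paths(graph, "ME", "Bill")}
```

With these weights the chain through Mena carries trust 0.6 and the chain through Doc about 0.16. Averaging, taking the maximum or taking the minimum are all possible ways to merge them, and each choice behaves differently under attack.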
  • 33. Trust metrics open issues (there are no comparative evaluations of TMs) ■ Cycles are a problem --> order peers based on distance from the source user. The trust of users at level k is based only on the trust of users at level k-1 (and k). Trust propagation horizon (computation). ■ Find all trust paths from source to target, then propagate trust along the trust paths. ■ Trust decay: every hop reduces trust (or certainty of trust). ● A user can't propagate more trust than she received. ● Distrust (trust=0) blocks the propagation. ■ Trust in someone's quality vs trust in someone as a judge: Tquality(A,C)=f(Tjudger(A,B),Tquality(B,C)) ■ Combining different trust paths: – Unpredictable. Trust = minimum trust value. – There are no globally “bad” users. – Warn about paradoxes or inconsistencies. [Figure: small trust graph with edges weighted 1, 1, 1 and 0]
  • 34. How to use distrust ■ Distrust? Opinions of distrusted peers should simply be discarded, otherwise those peers could manipulate them to influence our recs ■ Example: suppose we distrust anyone who is trusted by our enemy; then our enemy could say “I trust A” and we would come to distrust A (who could be anyone ... from the Pope to Bush) ■ But it is worth knowing about someone who is trusted by many, even if distrusted by you...
  • 35. RS evaluation: how? ■ Back to Recommender Systems: how do we evaluate RS performance? Any ideas?...
  • 36. RS evaluation: let us count the ways... ■ Many ways to evaluate Recommender Systems. ■ Leave-one-out: hide one rating and try to predict it. Accuracy: are predictions correct? Coverage: how often are we able to predict? ■ Accuracy: differences between real value and predicted value. MAE, MSE, Weighted MAE, MAUE, ... ■ Ability to identify some new items the user will like (unwatched movie), or bad items (spam, products). ■ Evaluation is still problematic
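The accuracy and coverage measures named on this slide can be sketched as follows. The input format (one tuple per hidden rating, with `None` when the system could not predict) and the sample numbers are assumptions; MAUE is computed as defined later in the talk, by averaging the error per user first and then over users.

```python
def evaluate(predictions):
    """predictions: list of (user, real_rating, predicted_rating or None).

    Returns (ratings coverage, MAE, MAUE) for a leave-one-out run.
    """
    scored = [
        (user, abs(real - pred))
        for user, real, pred in predictions
        if pred is not None
    ]
    coverage = len(scored) / len(predictions)
    mae = sum(err for _, err in scored) / len(scored)
    per_user = {}
    for user, err in scored:
        per_user.setdefault(user, []).append(err)
    maue = sum(sum(errs) / len(errs) for errs in per_user.values()) / len(per_user)
    return coverage, mae, maue

# Invented leave-one-out results: a heavy rater and a cold start user.
hidden = [
    ("heavy", 5, 4), ("heavy", 4, 4), ("heavy", 3, 3), ("heavy", 2, 2),
    ("cold", 5, 2), ("cold", 1, None),
]
coverage, mae, maue = evaluate(hidden)
```

In this toy run MAE is 0.8 while MAUE is 1.625: MAE is dominated by the heavy rater's many easy predictions, whereas MAUE gives the cold start user the same weight as everyone else.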
  • 37. Trust in Recommender Systems How do we exploit trust in RSs? Instead of computing UserSim of other users, compute  Trust in other users. Instead of items liked by similar users,  recommend items liked by “trustable” users. (or combine both methods ...) 37
  • 38. Trust Propagation. 6 degrees of separation “theorem” (Stanley Milgram, 1967): with few trust steps it is possible to reach every person in the world! (but more steps are needed for higher-trust actions) --> Ideally, using trust metrics, no more unknown users.
  • 39. Trust solves RS problems ■ User similarity often not computable ➔ trust propagation and “6 degrees” -> we are now able to predict trust for many users ■ Cold start ➔ “just add 1 friend” ■ Easy copy-profile attacks ➔ “you can be similar, but if there is no trust path to you ...” ■ Hard to understand and control ➔ showing trust networks supports explanation
  • 40. Epinions.com Experiments ■ Some experiments to show that trust solves RSs problems... ■ Epinions.com users can  Review and rate items (from 1 to 5)  Keep web of trust (trust=1) and block list (trust=0). [Epinions  FAQ says to put in Web of Trust “Reviewers whose reviews and  ratings you have consistently found to be valuable”] ■ Dataset (collected by crawling site):  ~50K users, ~140K items, ~660K ratings.  ~500K trust statements.  ➔ No block list (not shown on site, kept hidden) 40
  • 42. UserSimilarity and Trust computability: mean number of comparable users
      Method                        All users   Cold start users
      Propagating Trust, Dist 1          9.88               2.14
      Propagating Trust, Dist 2           400              94.54
      Propagating Trust, Dist 3          4386               1675
      Propagating Trust, Dist 4         16334               9121
      Using Pearson                       161               2.74
  • 45. User Trust Metric. Example: max propagation distance = 4. DistME(B) = 1: TME(B) = (4-1+1)/4 = 1. DistME(B) = 2: TME(B) = 3/4. DistME(B) = 3: TME(B) = 2/4. DistME(B) = 4: TME(B) = 1/4. DistME(B) > 4: TME(B) = 0.
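The distance-based metric on this slide generalises to trust = (d - n + 1) / d for a user at distance n from the source, with maximum propagation distance d. A minimal sketch using breadth-first search follows; the function name `distance_trust` and the toy web of trust are assumptions, and edges are treated as unweighted trust statements.

```python
from collections import deque

def distance_trust(web_of_trust, source, max_distance=4):
    """Trust in every user reachable within max_distance hops of source.

    A user at distance n gets (max_distance - n + 1) / max_distance;
    users further away are omitted, i.e. their trust is 0.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        if distance[user] == max_distance:
            continue  # propagation horizon: do not expand further
        for friend in web_of_trust.get(user, ()):
            if friend not in distance:
                distance[friend] = distance[user] + 1
                queue.append(friend)
    return {
        user: (max_distance - d + 1) / max_distance
        for user, d in distance.items()
        if user != source
    }

# Hypothetical chain of trust statements: ME -> A -> B -> C -> D -> E.
web = {"ME": ["A"], "A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"]}
trust = distance_trust(web, "ME")
```

Here `A` to `D` receive 1, 3/4, 2/4 and 1/4 as on the slide, and `E`, five hops away, is absent from the result. Breadth-first search visits each user once at her shortest distance, so cycles in the network cause no trouble.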
  • 46. Experimental Results
      #Expressed Ratings               ALL       2       3       4
      User population size           40169    3937    2917    2317
      Mean Web of Trust Size          9.88    2.54    3.15    3.64
      Ratings Coverage   UserSim       51%     N/A      4%      8%
                         Trust-1       28%     10%     11%     12%
                         Trust-2       60%     23%     26%     31%
                         Trust-3       74%     39%     45%     51%
                         Trust-4       77%     45%     53%     59%
      Users Coverage     UserSim       41%     N/A      6%     14%
                         Trust-1       45%     17%     25%     32%
                         Trust-2       56%     32%     43%     53%
                         Trust-3       61%     46%     57%     64%
                         Trust-4       62%     56%     59%     66%
      Mean Absolute      UserSim     0.843     N/A   1.244   1.027
      Error (MAE)        Trust-1     0.837   0.929   0.903   0.840
                         Trust-2     0.829   1.050   0.940   0.927
                         Trust-3     0.811   1.046   0.940   0.918
                         Trust-4     0.805   1.033   0.926   0.903
    Rows: UserSim = Collaborative Filtering. Trust-x = trust propagation up to distance x. RatingsCoverage = how many hidden ratings are predictable. UsersCoverage = how many users get at least one prediction. MAE = |real_rating - pred_rating| averaged over all the ratings. MAUE = |real_rating - pred_rating| averaged over the ratings of one user, then averaged over all users.
    Columns: views over users. ALL = all the users (with at least 1 rating). 2 = only the subset of users that gave 2 ratings (there are 3937); similarly for 3 and 4.
  • 47. Experimental Results (same table as slide 46). On average, Trust-x achieves better coverage without loss of accuracy.
  • 48. Experimental Results (same table as slide 46). On average, Trust-x achieves better coverage without loss of accuracy. UserSim performs well with heavy raters and poorly with cold start users.
  • 49. Experimental Results (same table as slide 46). On average, Trust-x achieves better coverage without loss of accuracy. UserSim performs well with heavy raters and poorly with cold start users. For cold start users (50% of the total!), Trust-x also achieves better accuracy. For bootstrapping RSs, asking for one trust statement is better than asking for one rating. (experiments on 660,000 ratings)
  • 50. Centralized vs decentralized ■ Another problem with current RS: centralization ■ Information is Centralized in one server  Your “profile” scattered in many RS (Amazon, B&B, ...)  Profile not reusable (your profile in Amazon is NOT yours)  Recommendation computation out of your control ■ Decentralized  The Web is decentralized: anyone can write whatever she  wants, in whatever “language” she wants (spam is good)  No censorship, innovation can happen on the edges, not a  single mind but many minds... you are in control of what you  produce 50
  • 51. Semantic Web ■ A Web of content designed for and understandable by machines. (matrix?) ■ Promising Semantic Web formats: FOAF (Friend-Of-A-Friend): trust info [highlighted]; XFN (XHTML Friends Network): social info; VoteLink: vote-for, vote-against, vote-abstain links; Blogroll: not semantic!; hReview: review/rating info [highlighted]; RSS of OutFoxed (http://getoutfoxed.com/rss); RVW, OpenReview, ...: review/rating info ■ They allow decentralized publishing of information that RSs aggregate and exploit.
  • 52. Note: Adoption of a Language ■ Suppose you can define the language we have to use for communicating. Which language is better? Chinese? Italiano? The one you invent? ■ Interesting question, but as long as it is “good enough”, it matters little (or not at all) for the adoption of a language. ■ Do you know why the keys on your keyboard are placed in that way? --> how standards get adopted ... ■ Who has the power to “propose” changes in the language of the Web?
  • 53. Format adoption ■ Adoption does not depend on the quality of the format but (mainly) on the authority of the proponent. ■ Google can push changes in HTML (example: rel=nofollow) ■ Certainly Microsoft could (even without you noticing it or telling you) ■ I can't. ■ ...But maybe if I create a format that's immediately useful, it will be taken up by a user community and spread “virally” (how HTML started)
  • 54. FOAF (Friend-Of-A-Friend). Every peer expresses who she knows (and trusts, with an extension). Based on RDF (Resource Description Framework). It is already used (some millions of files). Create one FOAF file and put it on your page! Find more at http://foaf-project.org Decentralized publishing advantages: the profile is not scattered across many different sites and is under user control. A scutter/crawler can follow “links” (seeAlso) and aggregate the complete social network. STOP!!! Downsides to publishing your friendship network?
  • 56. hReview ■ hReview is a simple, open, distributed reviews format suitable for embedding in (X)HTML, Atom, RSS, and arbitrary XML. ■ It aims to enable and encourage the sharing, distribution, syndication, and aggregation of reviews ■ Proposed by Technorati.com on a wiki page ■ http://developers.technorati.com/wiki/hReview ■ You are free to participate and give suggestions and feedback!
  • 57. hReview example
      <div class="hreview">
        <span><span class="rating">5</span> out of 5 stars</span>
        <h4 class="summary"><span class="item fn">Crepes on Cole</span> is awesome</h4>
        <span>Reviewer: <span class="reviewer fn">Tantek</span> -
          <abbr class="dtreview" title="20050418T2300-0700">April 18, 2005</abbr></span>
        <blockquote class="description"><p>Crepes on Cole is one of the best little creperies in San Francisco. Excellent food and service. Plenty of tables in a variety of sizes for parties large and small. Window seating makes for excellent people watching to/from the N-Judah which stops right outside. I've had many fun social gatherings here, as well as gotten plenty of work done thanks to neighborhood WiFi.</p></blockquote>
        <p>Visit date: <span>April 2005</span></p>
        <p>Food eaten: <span>Florentine crepe</span></p>
      </div>
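To show how an aggregator might read such markup, here is a minimal sketch built on Python's standard `html.parser` module. The `HReviewParser` class and its small `FIELDS` set are assumptions for illustration: it only collects the text of elements whose class names one of four hReview properties and does not implement the full microformat.

```python
from html.parser import HTMLParser

class HReviewParser(HTMLParser):
    """Collects the text of elements whose class names a known hReview field."""

    FIELDS = {"rating", "summary", "reviewer", "description"}

    def __init__(self):
        super().__init__()
        self.review = {}
        self._open = []  # stack of (tag, fields declared on that tag)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._open.append((tag, [c for c in classes if c in self.FIELDS]))

    def handle_endtag(self, tag):
        # Pop back to the matching start tag (tolerates unclosed void tags).
        while self._open:
            open_tag, _ = self._open.pop()
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Text belongs to every field that is currently open, including
        # fields declared on ancestor elements.
        for _, fields in self._open:
            for field in fields:
                self.review[field] = self.review.get(field, "") + data

snippet = """
<div class="hreview">
  <span><span class="rating">5</span> out of 5 stars</span>
  <h4 class="summary"><span class="item fn">Crepes on Cole</span> is awesome</h4>
  <span>Reviewer: <span class="reviewer fn">Tantek</span></span>
</div>
"""

parser = HReviewParser()
parser.feed(snippet)
# Collapse runs of whitespace left over from the markup layout.
review = {field: " ".join(text.split()) for field, text in parser.review.items()}
```

Because the rating, reviewer and summary are plain class names on ordinary HTML elements, any crawler can extract them without a site-specific scraper, which is what makes decentralized publishing of reviews workable.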
  • 58. Conclusions ■ Info Filtering: ■ From Content-based ... to Collaborative Filtering ... to Trust-aware (?) ■ Trust is a simple and complicated concept ■ It is a rapidly evolving and increasingly important topic: there is room for your contributions! ■ Forecast: in 3 years, anyone will publish her opinions (about stuff, organizations...people, ideas?) in some semantic format and trust-aware aggregators (googletrust?) will help in coping with an increased information overload.
  • 59. Licence of these slides. These slides are released under Creative Commons Attribution-ShareAlike 2.5. You are free: * to copy, distribute, display, and perform the work * to make derivative works * to make commercial use of the work. Under the following conditions: Attribution. You must attribute the work in the manner specified by the author or licensor. Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one. * For any reuse or distribution, you must make clear to others the license terms of this work. * Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above. More info at http://creativecommons.org/licenses/by-sa/2.5/
  • 60. THE END! Thanks for your attention. Questions?