Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

DATA MINING AND MACHINE LEARNING
IN A NUTSHELL

LEARNING TO RECOGNIZE RELIABLE USERS AND CONTENT IN SOCIAL
MEDIA WITH COUPLED MUTUAL REINFORCEMENT

Mohammad-Ali Abbasi
http://www.public.asu.edu/~mabbasi2/

SCHOOL OF COMPUTING, INFORMATICS, AND DECISION SYSTEMS ENGINEERING
ARIZONA STATE UNIVERSITY

Data Mining and Machine Learning Lab
http://dmml.asu.edu/

About the paper

• Learning to Recognize Reliable Users and Content in
Social Media with Coupled Mutual Reinforcement
– Jiang Bian, Georgia Institute of Technology
– Yandong Liu, Emory University
– Ding Zhou, Facebook Inc.
– Eugene Agichtein, Emory University
– Hongyuan Zha, Georgia Institute of Technology

• WWW 2009, April 20–24, 2009, Madrid, Spain.


Community Question Answering (CQA)

• A popular kind of forum where users pose questions
for other users to answer
• Users can ask questions in natural language
• Comparable to regular web search


Sample: Yahoo! Answers

• Introduction


What is the problem?

• Retrieve answers from a social media archive
containing a large amount of information
– The quality, accuracy, and comprehensiveness of
the submitted questions and answers vary
widely
– A large fraction of the content is not useful for
answering queries
– Current approaches require large amounts of
manually labeled data


CQA environment

• Users
• Questions
• Answers


The goal

• Identify
– High-quality answers
– High-quality questions
– High-reputation users
• Simultaneously
• With minimal manual labeling


The contribution of this paper

• Develops a semi-supervised coupled mutual
reinforcement framework that simultaneously
calculates content quality and user
reputation and requires relatively few labeled
examples to initialize the training process
• Is more effective at finding high-quality
answers, questions, and users
• Improves the accuracy of search over CQA
archives


Current approaches

• Rely on user reputation,
• OR require a large amount of supervision,
• OR focus on the network properties of the
CQA, without considering the actual content of the
information exchanged


How to rank?

• Current approaches:
– Content Quality
OR
– User reputation
• This paper:
– Content Quality
AND
– User reputation


Definitions

• Question Quality
– A question's effectiveness at attracting high-quality
answers
• Answer Quality
– The responsiveness, accuracy, and comprehensiveness of
the answer to a question
• Question Reputation
– The expected quality of the questions posted by
a user
• Answer Reputation
– The expected quality of the answers posted by a user


Model the problem

• Solution


Mutual reinforcement Principle

• Solution


Feature Space: X(Q), X(A), X(U)

• Solution


Learning quality and reputation (Coupled Mutual Reinforcement)

• P(x): probability of x being "good"
• Model of P(x)

• B is the coefficient vector of the linear model and can be
found by maximizing:


Non-independent equations

• Conditional log-likelihood

• Objective function
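
The equations on these slides did not survive extraction. As a rough illustration only, the sketch below assumes that P(x) takes the usual logistic form over a linear model with coefficient vector B, and that B is scored by the conditional log-likelihood of binary "good" / "not good" labels. The function names, the clamping constant, and the toy data are assumptions made for this example, and the coupling between the question, answer, and user models is not shown.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def p_good(beta: Sequence[float], x: Sequence[float]) -> float:
    """P(x): probability that the item with feature vector x is "good".

    Assumption: logistic link over the linear score beta . x
    (beta plays the role of B on the slide).
    """
    return sigmoid(sum(b * xi for b, xi in zip(beta, x)))


def conditional_log_likelihood(
    beta: Sequence[float],
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> float:
    """Sum over examples of y*log(P(x)) + (1 - y)*log(1 - P(x))."""
    eps = 1e-12  # clamp to avoid log(0); an implementation detail of this sketch
    total = 0.0
    for x, y in zip(features, labels):
        p = min(max(p_good(beta, x), eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Toy data: first feature is a constant bias term, second is a made-up signal.
toy_features = [[1.0, 2.0], [1.0, -2.0]]
toy_labels = [1, 0]
ll_zero = conditional_log_likelihood([0.0, 0.0], toy_features, toy_labels)
ll_fit = conditional_log_likelihood([0.0, 1.0], toy_features, toy_labels)
```

On the toy data, the coefficient vector that separates the two examples gets a higher log-likelihood than the all-zero vector, which is the quantity a fitting procedure would try to increase.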


CQA-MR Algorithm

• Solution


Experimental Setup: Data Collection

• Data collected from Yahoo! Answers through its API
• The TREC QA benchmark archive is used to crawl the QA
archives (http://trec.nist.gov/data.html)
• All available answers are retrieved for each question
– 107,293 users
– 27,354 questions
– 224,617 answers


Evaluation Metrics

• Mean Reciprocal Rank (MRR)
– For each query, the reciprocal of the rank at which the first relevant
answer was returned, or 0 if none of the top N results
contained a relevant answer; MRR is the mean over all queries

• Precision at K
– For a given query, P(K) reports the fraction of answers
ranked in the top K results that are labeled as relevant

• Mean Average Precision (MAP)
– For each query, the mean of the precision at K values calculated after each
relevant answer was retrieved; MAP is the mean over all queries
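
The three metrics above can be sketched in a few lines. This is a minimal illustration, assuming each query's ranked result list is given as a list of booleans (True where the answer at that rank is labeled relevant); the function names and the toy rankings are made up for this example.

```python
from typing import Sequence


def reciprocal_rank(relevance: Sequence[bool], n: int) -> float:
    """1/rank of the first relevant result in the top n, or 0 if there is none."""
    for rank, is_relevant in enumerate(relevance[:n], start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top k results that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in relevance[:k] if r) / k


def average_precision(relevance: Sequence[bool]) -> float:
    """Mean of the precision values measured at each relevant result."""
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values) if values else 0.0


# Two toy queries with hand-made relevance labels.
rankings = [
    [False, True, False, True],
    [True, False, False],
]
mrr = mean([reciprocal_rank(r, n=10) for r in rankings])
map_score = mean([average_precision(r) for r in rankings])
p_at_2 = precision_at_k(rankings[0], k=2)
```

MRR and MAP are obtained by averaging the per-query values, which is why `mean` is applied over the list of queries.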


User reputation methods

• Baseline
– Users are ranked by "indegree" (number of answers
posted)
• HITS
– Users are ranked based on their authority scores
• CQA-Supervised
– Classifies users into those with "high" and "low"
reputation, using a classifier trained over the features
• CQA-MR
– Predicts user reputation based on the mutual reinforcement
algorithm
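
To make the first two methods concrete, here is a minimal sketch of in-degree counting and of the standard HITS iteration. It assumes a directed graph with one edge from the asker to the answerer for every posted answer, so a user's in-degree equals the number of answers they posted; this graph construction, the function names, and the toy edges are assumptions for illustration and are not taken from the slides.

```python
from math import sqrt
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (asker, answerer): assumed edge direction


def indegree(edges: List[Edge]) -> Dict[str, int]:
    """Baseline: number of incoming edges (answers posted) per user."""
    counts: Dict[str, int] = {}
    for src, dst in edges:
        counts.setdefault(src, 0)
        counts[dst] = counts.get(dst, 0) + 1
    return counts


def _normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to unit L2 norm (left unchanged if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()} if norm > 0 else dict(scores)


def hits(edges: List[Edge], iterations: int = 50):
    """Standard HITS power iteration; returns (hub, authority) scores."""
    nodes = {u for edge in edges for u in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the users pointing at this user.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub: sum of authority scores of the users this user points at.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth, hub = _normalize(new_auth), _normalize(new_hub)
    return hub, auth


# Toy graph: users "a" and "b" ask, users "c" and "d" answer.
toy_edges = [("a", "c"), ("b", "c"), ("a", "d")]
toy_indegree = indegree(toy_edges)
toy_hub, toy_auth = hits(toy_edges)
ranked_by_authority = sorted(toy_auth, key=toy_auth.get, reverse=True)
```

In the toy graph, user "c" answers two askers and therefore comes out first under both the in-degree baseline and the HITS authority score.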


CQA Retrieval methods

• Baseline
– Score computed as the difference between up votes and down
votes
• GBrank
– Does not include answer and question quality or user
reputation
• GBrank-HITS
– Extends GBrank by adding user reputation calculated by the
HITS algorithm
• GBrank-Supervised
– Extends GBrank by adding the quality estimates
obtained by supervised learning
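
As a small illustration of the baseline retrieval method only, the sketch below ranks answers by the difference between up votes and down votes. The `Answer` record, its field names, and the vote counts are hypothetical and chosen for this example; the GBrank variants are not shown.

```python
from typing import List, NamedTuple


class Answer(NamedTuple):
    """Hypothetical answer record for this example."""
    answer_id: str
    up_votes: int
    down_votes: int


def vote_score(answer: Answer) -> int:
    """Baseline score: up votes minus down votes."""
    return answer.up_votes - answer.down_votes


def baseline_rank(answers: List[Answer]) -> List[Answer]:
    """Order answers from highest to lowest vote score."""
    return sorted(answers, key=vote_score, reverse=True)


toy_answers = [
    Answer("a1", up_votes=3, down_votes=2),
    Answer("a2", up_votes=10, down_votes=1),
    Answer("a3", up_votes=0, down_votes=4),
]
ranked_ids = [a.answer_id for a in baseline_rank(toy_answers)]
```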

Precision at K for the top contributors

• Experiments


Precision at K

• Experiments


Accuracy

• Experiments


Training Labels

• Experiments


Mohammad-Ali Abbasi (Ali) is a Ph.D. student at the Data Mining
and Machine Learning Lab, Arizona
State University.
His research interests include data
mining, machine learning, social
computing, and social media behavior
analysis.

http://www.public.asu.edu/~mabbasi2/

