This slide deck summarizes a study that used crowdsourcing to evaluate recommender algorithms for technology-enhanced learning (TEL). Crowdworkers on commercial platforms judged whether an activity-based scoring algorithm (AScore) produced more relevant, novel, and diverse learning-resource recommendations than a folksonomy-based algorithm (FolkRank), using learning activities on the topic of climate change. AScore's recommendations were rated significantly more relevant, novel, and diverse than FolkRank's, supporting the hypotheses; in some cases, sub-activities also received more relevant, novel, and diverse recommendations than their super-activities. The study demonstrates that crowdsourcing can be applied effectively to evaluate TEL recommender algorithms.
Motivation
Learning on-the-job
§ To solve a particular problem
§ To learn about a new topic
§ Mostly web resources
Social Tagging Applications
§ Help to manage resources
§ Offer recommendations
TEL Recommender Systems
§ Recommend relevant, novel and diverse resources for a specific learning goal or activity
Evaluation Methods for TEL Recommender Systems

Offline Experiments (historical or synthetic datasets)
§ Advantages: fast; less effort; repeatable
§ Disadvantages: new, unknown resources cannot be evaluated; dependent on the dataset

User Experiments
§ Advantages: user's perspective
§ Disadvantages: a lot of effort and time; few users (ca. 40)

Real-life Testing
§ Advantages: real-life setting
§ Disadvantages: needs a substantial number of users

Crowdsourcing
§ Advantages: fast; less effort; repeatable; user's perspective; sufficient users
§ Disadvantages: unknown users; "artificial task"; spamming
Crowdsourcing Platforms

Microworkers
§ 500,000 crowdworkers worldwide
§ Flexible forwarding to other hosting platforms
§ Since 2009

CrowdFlower
§ 5 million crowdworkers in 208 countries
§ Gives access to other crowdsourcing platforms, e.g. Amazon MTurk
§ Since 2007

https://microworkers.com, http://www.crowdflower.com
Crowdsourcing Evaluation Concept
Preparation Step

[Workflow diagram] Set Goal → Formulate Hypotheses → Select Topic → Create Activity Hierarchy → Create Seed Dataset → Prepare Algorithms → Generate Recommendations → Filter Duplicates → Create Questionnaire (Create Questions, Add Control Questions)

DeLFI 2013. M. Migenda, M. Erdt, M. Gutjahr, and C. Rensing.
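The last two preparation steps lend themselves to a short sketch. The following minimal Python illustration of "Generate Recommendations" and "Filter Duplicates" is hypothetical: the `algorithm` object and its `score_resources` method are assumed stand-ins, not the authors' implementation.

```python
# Hypothetical sketch: generate top-n recommendations per activity,
# then drop resources already in the seed dataset or seen earlier in
# the list, so crowdworkers never rate the same resource twice.

def generate_recommendations(algorithm, activity, n=10):
    """Rank all candidate resources for an activity; return the top n."""
    scores = algorithm.score_resources(activity)  # assumed interface: {resource: score}
    return sorted(scores, key=scores.get, reverse=True)[:n]

def filter_duplicates(recommendations, seed_resources):
    """Keep only resources not seen before."""
    seen = set(seed_resources)
    unique = []
    for resource in recommendations:
        if resource not in seen:
            unique.append(resource)
            seen.add(resource)
    return unique
```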
Preparation Step
Set Goal

AScore is based on Activity Hierarchies
§ Extends FolkRank by considering activities, activity hierarchies and the current activity of the learner
ECTEL 2012. Anjorin et al.
[Figure: example activity hierarchy rooted at "Understanding Climate Change", with activities such as "Understanding the Carbon Footprint", "Calculating the Carbon Footprint", "Investigate the impact of Climate Change", "Analyze potential Catastrophes due to Climate Change", "Investigate causes of Climate Change", "Give an overview on the history of Global Warming" and "Determine future prognoses on Climate Change"]
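To make the relationship between the two algorithms concrete, here is a minimal sketch of the FolkRank-style weight spreading that both build on. The transition matrix `A` and preference vector `p` are illustrative assumptions; AScore is described as additionally putting activity nodes into the graph and boosting the learner's current activity in `p`.

```python
import numpy as np

def spread_weights(A, p, d=0.7, iters=50):
    """One FolkRank-style run: iterate w <- d*A*w + (1-d)*p.

    A: column-stochastic transition matrix over the folksonomy nodes
       (users, tags, resources; for AScore, also activity nodes).
    p: preference vector; FolkRank boosts e.g. a topic tag, while AScore
       is assumed to boost the learner's current activity.
    """
    w = np.full(len(p), 1.0 / len(p))
    for _ in range(iters):
        w = d * (A @ w) + (1 - d) * p
    return w

def folkrank_scores(A, p):
    """FolkRank's differential ranking: preference run minus baseline run."""
    uniform = np.full(len(p), 1.0 / len(p))
    return spread_weights(A, p) - spread_weights(A, uniform)
```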
Preparation Step
Set Goal and Formulate Hypotheses

Set Evaluation Goals:
§ Investigate if AScore recommends more relevant, novel and diverse learning resources for a specified topic than FolkRank.
§ Investigate if AScore recommends more relevant, novel and diverse learning resources for sub-activities (A_Sub) than for activities higher up in the hierarchy (A_Super).

Formulate Hypotheses:
1. Hypothesis: Relevance
§ AScore vs. FolkRank
§ A_Sub vs. A_Super
2. Hypothesis: Novelty
§ AScore vs. FolkRank
§ A_Sub vs. A_Super
3. Hypothesis: Diversity
§ AScore vs. FolkRank
§ A_Sub vs. A_Super
Preparation Step
Select Topic and Generate Recommendations

Generate a basis graph structure for recommendations
§ 5 experts researched the topic of climate change for one hour
§ CROKODIL was used to create an extended folksonomy (users, tags, resources, activities)
§ Ca. 70 resources were tagged and attached to 8 activities
[Figure: the activity hierarchy from above, with its activities split between Experiment Spring and Experiment Autumn]
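A seed dataset like this can be modeled as an extended folksonomy: classic (user, tag, resource) assignments plus an activity hierarchy to which resources are attached. The sketch below is hypothetical; the class and field names are illustrative, not CROKODIL's actual data model.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Activity:
    """One node of the activity hierarchy; resources are attached to it."""
    name: str
    parent: Optional["Activity"] = None
    resources: set = field(default_factory=set)

# Extended folksonomy = tag assignments + activity hierarchy.
root = Activity("Understanding Climate Change")
sub = Activity("Understanding the Carbon Footprint", parent=root)
sub.resources.add("https://example.org/footprint-intro")  # placeholder URL

tag_assignments = [
    # classic folksonomy triples: (user, tag, resource)
    ("expert1", "carbon footprint", "https://example.org/footprint-intro"),
]
```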
Preparation Step
Create Questionnaire

Conduct personal research on the topic
§ Level of knowledge on this topic
§ Request to find 5 online resources relevant to this topic

10 Questions per Recommendation
§ 3 questions for each hypothesis (relevance, novelty, diversity)
§ 1 control question to detect spammers, e.g. "Give 4 keywords to summarize the recommended resource" (a simple check for such answers is sketched below)

General Questions
§ Age, gender, level of education and nationality

Conditions (same design in Experiment Spring and Experiment Autumn):
§ AScore: A_Sub (sub-activity), A_Super (super-activity)
§ FolkRank: F_Sub (sub-activity), F_Super (super-activity)
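The control question above suggests a simple automatic plausibility check. The following sketch is a hypothetical heuristic (the parsing and thresholds are assumptions): it flags answers that do not contain four distinct, non-trivial keywords.

```python
def passes_control_question(answer: str, min_keywords: int = 4) -> bool:
    """Heuristic check of the 'give 4 keywords' control question."""
    keywords = {k.strip().lower() for k in answer.split(",") if k.strip()}
    # Reject answers with too few distinct keywords or one-character "words".
    return len(keywords) >= min_keywords and all(len(k) > 1 for k in keywords)

print(passes_control_question("climate, emissions, CO2, footprint"))  # True
print(passes_control_question("a, a, a, a"))                          # False
```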
Crowdsourcing Evaluation Concept
Execution Step

[Workflow diagram] Questionnaire → Release next iteration burst on the Crowdsourcing Platform → collect Results → Filter Spammers → Make Payments → release next burst

https://www.soscisurvey.de
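Putting the execution steps together, a hypothetical driver loop could look as follows. The `platform` object and its `release`/`collect`/`pay` methods are assumed stand-ins for a crowdsourcing platform's API; `passes_control_question` is the sketch from above.

```python
def run_evaluation(platform, questionnaire_url, bursts=5, workers_per_burst=20):
    """Release the questionnaire in bursts, keep honest answers, pay workers."""
    accepted = []
    for _ in range(bursts):
        job = platform.release(questionnaire_url, workers_per_burst)  # assumed API
        for result in platform.collect(job):                          # assumed API
            if passes_control_question(result.control_answer):
                accepted.append(result)
                platform.pay(result.worker_id)                        # assumed API
            # spammers' answers are filtered out and not paid
    return accepted
```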
Execution Step
Evaluation Results

Goals and hypotheses as formulated in the Preparation Step: AScore vs. FolkRank and A_Sub vs. A_Super, each tested for relevance, novelty and diversity.

[Results chart: checkmarks on this slide mark four of the six comparisons as confirmed]
Execution Step
Evaluation Results

Same goals and hypotheses as above.

[Results chart: checkmarks on this slide mark five of the six comparisons as confirmed]
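The slides do not name the statistical test behind the checkmarks. As an illustration, a one-sided Mann-Whitney U test on the crowdworkers' ratings would be one common way to compare the two algorithms; the ratings below are made up, and the test actually used in the study may differ.

```python
from scipy.stats import mannwhitneyu

# Made-up example ratings on a 5-point scale, for illustration only.
ascore_relevance   = [5, 4, 4, 5, 3, 4, 5, 4]
folkrank_relevance = [3, 2, 4, 3, 3, 2, 4, 3]

stat, p = mannwhitneyu(ascore_relevance, folkrank_relevance, alternative="greater")
print(f"U = {stat}, p = {p:.4f}")  # hypothesis supported if p < 0.05
```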
Conclusion and Future Work

Crowdsourcing can be successfully applied to evaluate TEL recommender algorithms
§ Integrate more user-centric evaluations already during the design and development of TEL recommender algorithms
§ Select the best-fitting evaluation approach

Future Work
§ Can crowdsourcing be used to evaluate other aspects of a recommender system, e.g. explanations or presentation?
§ Can more complex TEL evaluation tasks be evaluated with crowdsourcing?