This slide deck summarizes a study that used crowdsourcing to evaluate recommender algorithms for technology-enhanced learning (TEL). Crowdworkers on commercial platforms judged whether an activity-based scoring algorithm (AScore) produced more relevant, novel, and diverse learning-resource recommendations than a folksonomy-based algorithm (FolkRank), using learning activities on the topic of climate change. AScore's recommendations were rated significantly more relevant, novel, and diverse than FolkRank's, supporting the hypotheses; in some cases, sub-activities also received more relevant, novel, and diverse recommendations than their super-activities. The study demonstrates that crowdsourcing can be applied effectively to evaluate TEL recommender algorithms.
Motivation
Learning on-the-job
§ To solve a particular problem
§ To learn about a new topic
§ Mostly web resources
Social Tagging Applications
§ Help to manage resources
§ Offer recommendations
TEL Recommender Systems
§ Recommend relevant, novel and diverse resources for a specific learning goal or activity
Evaluation Methods for TEL Recommender Systems

Offline Experiments (historical or synthetic datasets)
§ Advantages: fast; less effort; repeatable
§ Disadvantages: new, unknown resources cannot be evaluated; dependent on the dataset

User Experiments
§ Advantages: user's perspective
§ Disadvantages: a lot of effort and time; few users (ca. 40)

Real-life Testing
§ Advantages: real-life setting
§ Disadvantages: needs a substantial number of users

Crowdsourcing
§ Advantages: fast; less effort; repeatable; user's perspective; sufficient users
§ Disadvantages: unknown users; "artificial task"; spamming
Crowdsourcing Platforms

Microworkers
§ 500,000 crowdworkers worldwide
§ Flexible forwarding to other hosting platforms
§ Since 2009

CrowdFlower
§ 5 million crowdworkers in 208 countries
§ Gives access to other crowdsourcing platforms, e.g. Amazon MTurk
§ Since 2007

https://microworkers.com, http://www.crowdflower.com
Crowdsourcing Evaluation Concept
Preparation Step

[Workflow diagram] Set Goal → Formulate Hypotheses → Select Topic → Create Activity Hierarchy → Create Seed Dataset → Prepare Algorithms → Generate Recommendations → Filter Duplicates → Create Questionnaire (Create Questions, Add Control Questions)

DeLFI 2013. M. Migenda, M. Erdt, M. Gutjahr, and C. Rensing.
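The last two preparation steps lend themselves to a short sketch. The following minimal Python illustration of "Generate Recommendations" and "Filter Duplicates" is hypothetical: the `algorithm` object and its `score_resources` method are assumed stand-ins, not the authors' implementation.

```python
# Hypothetical sketch: generate top-n recommendations per activity,
# then drop resources already in the seed dataset or seen earlier in
# the list, so crowdworkers never rate the same resource twice.

def generate_recommendations(algorithm, activity, n=10):
    """Rank all candidate resources for an activity; return the top n."""
    scores = algorithm.score_resources(activity)  # assumed interface: {resource: score}
    return sorted(scores, key=scores.get, reverse=True)[:n]

def filter_duplicates(recommendations, seed_resources):
    """Keep only resources not seen before."""
    seen = set(seed_resources)
    unique = []
    for resource in recommendations:
        if resource not in seen:
            unique.append(resource)
            seen.add(resource)
    return unique
```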
Preparation Step
Set Goal

AScore is based on Activity Hierarchies
§ Extends FolkRank by considering activities, activity hierarchies and the current activity of the learner
ECTEL 2012. Anjorin et al.
[Figure: example activity hierarchy rooted at "Understanding Climate Change", with activities such as "Understanding the Carbon Footprint", "Calculating the Carbon Footprint", "Investigate the impact of Climate Change", "Analyze potential Catastrophes due to Climate Change", "Investigate causes of Climate Change", "Give an overview on the history of Global Warming" and "Determine future prognoses on Climate Change"]
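To make the relationship between the two algorithms concrete, here is a minimal sketch of the FolkRank-style weight spreading that both build on. The transition matrix `A` and preference vector `p` are illustrative assumptions; AScore is described as additionally putting activity nodes into the graph and boosting the learner's current activity in `p`.

```python
import numpy as np

def spread_weights(A, p, d=0.7, iters=50):
    """One FolkRank-style run: iterate w <- d*A*w + (1-d)*p.

    A: column-stochastic transition matrix over the folksonomy nodes
       (users, tags, resources; for AScore, also activity nodes).
    p: preference vector; FolkRank boosts e.g. a topic tag, while AScore
       is assumed to boost the learner's current activity.
    """
    w = np.full(len(p), 1.0 / len(p))
    for _ in range(iters):
        w = d * (A @ w) + (1 - d) * p
    return w

def folkrank_scores(A, p):
    """FolkRank's differential ranking: preference run minus baseline run."""
    uniform = np.full(len(p), 1.0 / len(p))
    return spread_weights(A, p) - spread_weights(A, uniform)
```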
Preparation Step
Set Goal and Formulate Hypotheses

Set Evaluation Goals:
§ Investigate if AScore recommends more relevant, novel and diverse learning resources for a specified topic than FolkRank.
§ Investigate if AScore recommends more relevant, novel and diverse learning resources for sub-activities (A_Sub) than for activities higher up in the hierarchy (A_Super).

Formulate Hypotheses:
1. Hypothesis: Relevance
§ AScore vs. FolkRank
§ A_Sub vs. A_Super
2. Hypothesis: Novelty
§ AScore vs. FolkRank
§ A_Sub vs. A_Super
3. Hypothesis: Diversity
§ AScore vs. FolkRank
§ A_Sub vs. A_Super
Preparation Step
Select Topic and Generate Recommendations

Generate a basis graph structure for recommendations
§ 5 experts researched the topic of climate change for one hour
§ CROKODIL was used to create an extended folksonomy (users, tags, resources, activities)
§ Ca. 70 resources were tagged and attached to 8 activities
[Figure: the activity hierarchy from above, with its activities split between Experiment Spring and Experiment Autumn]
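A seed dataset like this can be modeled as an extended folksonomy: classic (user, tag, resource) assignments plus an activity hierarchy to which resources are attached. The sketch below is hypothetical; the class and field names are illustrative, not CROKODIL's actual data model.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Activity:
    """One node of the activity hierarchy; resources are attached to it."""
    name: str
    parent: Optional["Activity"] = None
    resources: set = field(default_factory=set)

# Extended folksonomy = tag assignments + activity hierarchy.
root = Activity("Understanding Climate Change")
sub = Activity("Understanding the Carbon Footprint", parent=root)
sub.resources.add("https://example.org/footprint-intro")  # placeholder URL

tag_assignments = [
    # classic folksonomy triples: (user, tag, resource)
    ("expert1", "carbon footprint", "https://example.org/footprint-intro"),
]
```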
Preparation Step
Create Questionnaire

Conduct personal research on the topic
§ Level of knowledge on this topic
§ Request to find 5 online resources relevant to this topic

10 Questions per Recommendation
§ 3 questions for each hypothesis (relevance, novelty, diversity)
§ 1 control question to detect spammers, e.g. "Give 4 keywords to summarize the recommended resource" (a simple check for such answers is sketched below)

General Questions
§ Age, gender, level of education and nationality

Conditions (same design in Experiment Spring and Experiment Autumn):
§ AScore: A_Sub (sub-activity), A_Super (super-activity)
§ FolkRank: F_Sub (sub-activity), F_Super (super-activity)
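The control question above suggests a simple automatic plausibility check. The following sketch is a hypothetical heuristic (the parsing and thresholds are assumptions): it flags answers that do not contain four distinct, non-trivial keywords.

```python
def passes_control_question(answer: str, min_keywords: int = 4) -> bool:
    """Heuristic check of the 'give 4 keywords' control question."""
    keywords = {k.strip().lower() for k in answer.split(",") if k.strip()}
    # Reject answers with too few distinct keywords or one-character "words".
    return len(keywords) >= min_keywords and all(len(k) > 1 for k in keywords)

print(passes_control_question("climate, emissions, CO2, footprint"))  # True
print(passes_control_question("a, a, a, a"))                          # False
```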
Crowdsourcing Evaluation Concept
Execution Step

[Workflow diagram] Questionnaire → Release next iteration burst on the Crowdsourcing Platform → collect Results → Filter Spammers → Make Payments → release next burst

https://www.soscisurvey.de
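Putting the execution steps together, a hypothetical driver loop could look as follows. The `platform` object and its `release`/`collect`/`pay` methods are assumed stand-ins for a crowdsourcing platform's API; `passes_control_question` is the sketch from above.

```python
def run_evaluation(platform, questionnaire_url, bursts=5, workers_per_burst=20):
    """Release the questionnaire in bursts, keep honest answers, pay workers."""
    accepted = []
    for _ in range(bursts):
        job = platform.release(questionnaire_url, workers_per_burst)  # assumed API
        for result in platform.collect(job):                          # assumed API
            if passes_control_question(result.control_answer):
                accepted.append(result)
                platform.pay(result.worker_id)                        # assumed API
            # spammers' answers are filtered out and not paid
    return accepted
```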
Execution Step
Evaluation Results

Goals and hypotheses as formulated in the Preparation Step: AScore vs. FolkRank and A_Sub vs. A_Super, each tested for relevance, novelty and diversity.

[Results chart: checkmarks on this slide mark four of the six comparisons as confirmed]
Execution Step
Evaluation Results

Same goals and hypotheses as above.

[Results chart: checkmarks on this slide mark five of the six comparisons as confirmed]
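The slides do not name the statistical test behind the checkmarks. As an illustration, a one-sided Mann-Whitney U test on the crowdworkers' ratings would be one common way to compare the two algorithms; the ratings below are made up, and the test actually used in the study may differ.

```python
from scipy.stats import mannwhitneyu

# Made-up example ratings on a 5-point scale, for illustration only.
ascore_relevance   = [5, 4, 4, 5, 3, 4, 5, 4]
folkrank_relevance = [3, 2, 4, 3, 3, 2, 4, 3]

stat, p = mannwhitneyu(ascore_relevance, folkrank_relevance, alternative="greater")
print(f"U = {stat}, p = {p:.4f}")  # hypothesis supported if p < 0.05
```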
Conclusion and Future Work

Crowdsourcing can be successfully applied to evaluate TEL recommender algorithms
§ Integrate more user-centric evaluations already during the design and development of TEL recommender algorithms
§ Select the best-fitting evaluation approach

Future Work
§ Can crowdsourcing be used to evaluate other aspects of a recommender system, e.g. explanations or presentation?
§ Can more complex TEL evaluation tasks be evaluated with crowdsourcing?