© author(s) of these slides including research results from the KOM research network and TU Darmstadt; otherwise it is specified at the respective slide
28-Dez-14
Prof. Dr.-Ing. Ralf Steinmetz
KOM - Multimedia Communications Lab
Eval_Rec_Algo_Crowdsourcing__ICALT_2014_MA.pptx
Evaluating Recommender Algorithms
for Learning using Crowdsourcing
Mojisola Erdt
Christoph Rensing
ICALT 2014, Athens
Source: http://www.digitalvisitor.com/cultural-differences-in-online-behaviour-and-customer-reviews/

Motivation
Learning on-the-job
§ To solve a particular problem
§ To learn about a new topic
§ Mostly web resources
Social Tagging Applications
§ Help to manage resources
§ Offer recommendations
TEL Recommender Systems
§ Recommend relevant, novel and diverse resources to a specific learning goal or activity

Evaluation Methods for TEL Recommender Systems
Evaluation Approach: Offline Experiments (historical or synthetic datasets)
Advantages:
§ Fast
§ Less effort
§ Repeatable
Disadvantages:
§ New, unknown resources cannot be evaluated
§ Dependent on dataset
Evaluation Approach: User Experiments
Advantages:
§ User’s perspective
Disadvantages:
§ A lot of effort and time
§ Few users (ca. 40)
Evaluation Approach: Real-life testing
Advantages:
§ Real-life setting
Disadvantages:
§ Needs a substantial amount of users
Evaluation Approach: Crowdsourcing
Advantages:
§ Fast
§ Less effort
§ Repeatable
§ User’s perspective
§ Sufficient users
Disadvantages:
§ Unknown users
§ “Artificial task”
§ Spamming

Crowdsourcing Platforms
microworkers
§ 500,000 crowdworkers worldwide
§ Flexible forwarding to other hosting platforms
§ Since 2009
CrowdFlower
§ 5 million crowdworkers in 208 countries
§ Gives access to other crowdsourcing platforms, e.g. Amazon MTurk
§ Since 2007
https://microworkers.com, http://www.crowdflower.com

Overview
§ Motivation
§ Crowdsourcing Evaluation Concept
   § Preparation Step
   § Execution Step
§ Crowdsourcing Evaluation Results
§ Conclusion & Future Work

Crowdsourcing Evaluation Concept
Preparation Step
[Diagram; boxes as labelled on the slide:]
§ Create Questionnaire
§ Set Goal
§ Formulate Hypotheses
§ Create Questions
§ Add Control Questions
§ Select Topic
§ Create Activity Hierarchy
§ Create Seed Dataset
§ Prepare Algorithms
§ Generate Recommendations
§ Filter Duplicates
DeLFI 2013. M. Migenda, M. Erdt, M. Gutjahr, and C. Rensing

Preparation Step
Set Goal
AScore is based on Activity Hierarchies
§ Extends FolkRank by considering activities, activity hierarchies and the current activity of the learner
ECTEL 2012. Anjorin et al.
[Diagram: example activity hierarchy with the following activities:]
§ Understanding the Carbon Footprint
§ Calculating the Carbon Footprint
§ Investigate the impact of Climate Change
§ Analyze potential Catastrophes due to Climate Change
§ Investigate causes of Climate Change
§ Give an overview on the history of Global Warming
§ Determine future prognoses on Climate Change
§ Understanding Climate Change
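AScore itself is not specified on these slides. For orientation only, the FolkRank baseline that it extends can be sketched as a differential weighted PageRank on the folksonomy graph: one run with a uniform preference vector, one run with extra preference on a chosen node, and the difference of the two as the score. The sketch below follows that general idea. The damping value, the boost, the way the graph is built, and the sample tag assignments are all illustrative assumptions, and the activity extension of AScore is not modelled.

```python
from collections import defaultdict


def build_graph(tag_assignments):
    """Undirected weighted graph over users, tags and resources (assumed layout)."""
    graph = defaultdict(lambda: defaultdict(float))
    for user, tag, resource in tag_assignments:
        nodes = (("u", user), ("t", tag), ("r", resource))
        for a in nodes:
            for b in nodes:
                if a != b:
                    graph[a][b] += 1.0  # co-occurrence count as edge weight
    return graph


def weighted_pagerank(graph, preference, damping=0.7, iterations=50):
    """Iterate w <- damping * (spread along edges) + (1 - damping) * preference."""
    nodes = list(graph)
    total_pref = sum(preference[n] for n in nodes)
    pref = {n: preference[n] / total_pref for n in nodes}
    weight = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        spread = {n: 0.0 for n in nodes}
        for node, neighbours in graph.items():
            out_total = sum(neighbours.values())
            for other, edge_weight in neighbours.items():
                spread[other] += weight[node] * edge_weight / out_total
        weight = {n: damping * spread[n] + (1 - damping) * pref[n] for n in nodes}
    return weight


def folkrank(graph, preferred_node):
    """Score = ranking with boosted preference minus ranking with uniform preference."""
    nodes = list(graph)
    uniform = {n: 1.0 for n in nodes}
    boosted = dict(uniform)
    boosted[preferred_node] += len(nodes)  # illustrative boost, an assumption
    baseline = weighted_pagerank(graph, uniform)
    biased = weighted_pagerank(graph, boosted)
    return {n: biased[n] - baseline[n] for n in nodes}


# Hypothetical tag assignments (user, tag, resource), invented for illustration.
assignments = [
    ("alice", "co2", "r1"),
    ("bob", "co2", "r2"),
    ("bob", "ice", "r3"),
]
scores = folkrank(build_graph(assignments), ("t", "co2"))
ranked_resources = sorted(
    (n for n in scores if n[0] == "r"), key=scores.get, reverse=True
)
```

Resources that share the preferred tag gain weight relative to the baseline, which is what makes the difference usable as a topic-specific ranking.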

Preparation Step
Set Goal and Formulate Hypotheses
Set Evaluation Goals:
§ Investigate whether AScore recommends more relevant, novel and diverse learning resources for a specified topic than FolkRank.
§ Investigate whether AScore recommends more relevant, novel and diverse learning resources for sub-activities (A_Sub) than for activities higher up in the hierarchy (A_Super).
Formulate Hypotheses:
1. Hypothesis: Relevance
   § AScore vs. FolkRank
   § A_Sub vs. A_Super
2. Hypothesis: Novelty
   § AScore vs. FolkRank
   § A_Sub vs. A_Super
3. Hypothesis: Diversity
   § AScore vs. FolkRank
   § A_Sub vs. A_Super

Preparation Step
Select Topic and Generate Recommendations
Generate a basis graph structure for recommendations
§ 5 experts researched the topic of climate change for one hour
§ They used CROKODIL to create an extended folksonomy (users, tags, resources, activities)
§ Ca. 70 resources were tagged and attached to 8 activities
[Diagram: the Climate Change activity hierarchy from the "Set Goal" slide, annotated with the labels "Experiment Spring" and "Experiment Autumn"]

Preparation Step
Create Questionnaire
Conduct personal research on the topic
§ Level of knowledge on this topic
§ Request to find 5 online resources relevant to this topic
10 Questions per Recommendation
§ 3 questions for each hypothesis (relevance, novelty, diversity)
§ 1 control question to detect spammers
   § E.g. Give 4 keywords to summarize the recommended resource
General Questions
§ Age, gender, level of education and nationality
Treatment conditions (same layout for Experiment Spring and Experiment Autumn):
              Sub-activity   Super-activity
AScore        A_Sub          A_Super
FolkRank      F_Sub          F_Super
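The slides do not say how answers to the control question were checked. One possible automated pre-check, sketched here under that caveat, is to require four distinct keywords and at least one overlap with the text of the recommended resource. The function names, the thresholds and the sample text are hypothetical, and a manual review would likely still be needed.

```python
import re


def normalise(word):
    """Lower-case a word and strip everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", word.lower())


def passes_control_question(answer, resource_text, required_keywords=4, min_matches=1):
    """Heuristic check of a free-text keyword answer (assumed rule, not from the slides)."""
    keywords = {normalise(k) for k in re.split(r"[,;\s]+", answer)}
    keywords.discard("")
    if len(keywords) < required_keywords:
        return False  # fewer distinct keywords than the question asked for
    resource_words = {normalise(w) for w in resource_text.split()}
    return len(keywords & resource_words) >= min_matches


# Invented example text standing in for a recommended resource.
sample_text = (
    "Carbon footprint calculators estimate greenhouse gas emissions "
    "from travel and energy use."
)
ok = passes_control_question("carbon, footprint, emissions, energy", sample_text)
too_few = passes_control_question("good nice ok", sample_text)
unrelated = passes_control_question("asdf qwer zxcv hjkl", sample_text)
```

Exact word overlap is a deliberately crude criterion: it would miss synonyms, so it is better suited to flagging answers for review than to rejecting them outright.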

Crowdsourcing Evaluation Concept
Execution Step
[Diagram; elements as labelled on the slide:]
§ Release next iteration burst
§ Crowdsourcing Platform
§ Results
§ Filter Spammers
§ Make Payments
§ Questionnaire
https://www.soscisurvey.de

Execution Step
Participants and Treatment Conditions
Experiment Spring
              Sub-activity   Super-activity
AScore        A_Sub: 45      A_Super: 39
FolkRank      F_Sub: 39      F_Super: 36
§ CrowdFlower (32)
§ Microworkers (35)
§ Volunteers (92)
§ Spammers (243)
Experiment Autumn
              Sub-activity   Super-activity
AScore        A_Sub: 80      A_Super: 73
FolkRank      F_Sub: 76      F_Super: 85
§ Crowdworkers (314)
§ Spammers (549)
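The participant counts above can be cross-checked with a few lines of arithmetic. The sketch below sums the four treatment conditions per experiment and compares them with the listed sources. The spam share additionally assumes that spammers were counted separately from the valid participants, which the slide does not state explicitly.

```python
# Counts copied from the slide; the variable names are illustrative.
spring = {"A_Sub": 45, "A_Super": 39, "F_Sub": 39, "F_Super": 36}
autumn = {"A_Sub": 80, "A_Super": 73, "F_Sub": 76, "F_Super": 85}
spring_sources = {"CrowdFlower": 32, "Microworkers": 35, "Volunteers": 92}
spammers = {"spring": 243, "autumn": 549}

spring_total = sum(spring.values())  # 159
autumn_total = sum(autumn.values())  # 314


def spam_share(valid, spam):
    """Fraction of all submissions flagged as spam (assumes disjoint counts)."""
    return spam / (valid + spam)


spring_spam_share = spam_share(spring_total, spammers["spring"])
autumn_spam_share = spam_share(autumn_total, spammers["autumn"])
```

Under that assumption, roughly six in ten submissions were filtered out as spam in both experiments.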

Overview
§ Motivation
§ Crowdsourcing Evaluation Concept
§ Crowdsourcing Evaluation Results
   § AScore and FolkRank
      § Experiment Spring
      § Experiment Autumn
   § A_Sub and A_Super
      § Experiment Spring
      § Experiment Autumn
§ Conclusion & Future Work

Crowdsourcing Evaluation Results
Experiment Spring
Significance Tests
Hypothesis      p-value
1: Relevance    0.000003578 < 0.05
2: Novelty      0.000001531 < 0.05
3: Diversity    0.0001618 < 0.05

Crowdsourcing Evaluation Results
Experiment Autumn
Significance Tests
Hypothesis      p-value
1: Relevance    0.000001362 < 0.05
2: Novelty      0.0000007654 < 0.05
3: Diversity    0.00000000015 < 0.05
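The slides report p-values but do not name the test behind them. Purely as an illustration, a two-sided permutation test on the difference of mean ratings is one way such a p-value could be obtained for two groups of questionnaire answers. This is a swapped-in technique, not necessarily the authors' method, and the ratings below are invented.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of group means."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Invented ratings for two hypothetical treatment groups.
ratings_a = [6, 5, 6, 7, 5, 6, 6, 7]
ratings_b = [3, 4, 2, 4, 3, 3, 4, 2]
p_value = permutation_p_value(ratings_a, ratings_b)
reject_null = p_value < 0.05
```

A permutation test makes no normality assumption, which suits ordinal questionnaire ratings. A rank-based test would be another reasonable choice.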

Execution Step
Evaluation Results
[Slide repeats the evaluation goals and hypotheses from "Set Goal and Formulate Hypotheses" and marks four of the comparisons with ✔; the transcript does not preserve which comparisons are marked.]

Crowdsourcing Evaluation Results
Experiment Spring
Significance Tests
Hypothesis      p-value
1: Relevance    0.0005654 < 0.05
2: Novelty      0.01666 < 0.05
3: Diversity    0.02176 < 0.05

Crowdsourcing Evaluation Results
Experiment Autumn
Significance Tests
Hypothesis      p-value
1: Relevance    0.0005306 < 0.05
2: Novelty      0.000001531 < 0.05
3: Diversity    0.0000001608 < 0.05

Crowdsourcing Evaluation Results
Experiment Spring
[Chart: Aggregated Mean Values for Hypotheses 1, 2 and 3; series F_Sub and F_Super; mean axis from 0 to 7; values as extracted: 3.95 4.05 3.97 3.91 3.96 3.83]
Significance Tests
Hypothesis      p-value
1: Relevance    0.3023 > 0.05
2: Novelty      0.5216 > 0.05
3: Diversity    0.2031 > 0.05

Crowdsourcing Evaluation Results
Experiment Autumn
[Chart: Aggregated Mean Values for Hypotheses 1, 2 and 3; series F_Sub and F_Super; mean axis from 0 to 7; values as extracted: 4.04 3.9 4.11 4.09 4.07 4.01]
Significance Tests
Hypothesis      p-value
1: Relevance    0.01481 < 0.05
2: Novelty      0.7064 > 0.05
3: Diversity    0.2881 > 0.05
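The decision rule used throughout these result slides is a comparison of each p-value with the 0.05 threshold. A minimal sketch of that rule, applied to the p-values on the two slides with the F_Sub and F_Super charts, is given below. The dictionary layout and names are illustrative, and reading these two slides as the F_Sub vs. F_Super comparison is an assumption based on the chart labels.

```python
ALPHA = 0.05  # significance threshold shown on the slides

# p-values copied from the two slides above.
reported = {
    "Spring": {"Relevance": 0.3023, "Novelty": 0.5216, "Diversity": 0.2031},
    "Autumn": {"Relevance": 0.01481, "Novelty": 0.7064, "Diversity": 0.2881},
}


def is_significant(p_value, alpha=ALPHA):
    """A result counts as significant when its p-value is below alpha."""
    return p_value < alpha


significant_hypotheses = {
    experiment: [name for name, p in p_values.items() if is_significant(p)]
    for experiment, p_values in reported.items()
}
```

With these numbers only the Relevance hypothesis in Experiment Autumn falls below the threshold. No correction for multiple comparisons is applied here, and the slides do not mention one either.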

Execution Step
Evaluation Results
[Slide repeats the evaluation goals and hypotheses from "Set Goal and Formulate Hypotheses" and marks five of the comparisons with ✔; the transcript does not preserve which comparisons are marked.]

Conclusion and Future Work
Crowdsourcing can be successfully applied to evaluate TEL recommender algorithms
§ Integrate more user-centric evaluations already during the design and development of TEL recommender algorithms
§ Select the best fitting evaluation approach
Future Work
§ Can crowdsourcing be used to evaluate other aspects of a recommender system? E.g. explanations, presentation…
§ Can more complex TEL evaluation tasks be evaluated with crowdsourcing?

Questions & Contact

Weitere ähnliche Inhalte

Andere mochten auch

Combination of resource based learning with instructional designed and collab...
Combination of resource based learning with instructional designed and collab...Combination of resource based learning with instructional designed and collab...
Combination of resource based learning with instructional designed and collab...CROKODIl consortium
 
Aufgabenprototypen zur Unterstützung Ressourcen-basierten Lernens
Aufgabenprototypen zur Unterstützung Ressourcen-basierten LernensAufgabenprototypen zur Unterstützung Ressourcen-basierten Lernens
Aufgabenprototypen zur Unterstützung Ressourcen-basierten LernensCROKODIl consortium
 
Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Sy...
Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Sy...Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Sy...
Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Sy...Christoph Rensing
 
Szenarien und Erfahrungen mobilen situierten Lernens an Hochschulen
Szenarien und Erfahrungen mobilen situierten Lernens an HochschulenSzenarien und Erfahrungen mobilen situierten Lernens an Hochschulen
Szenarien und Erfahrungen mobilen situierten Lernens an HochschulenChristoph Rensing
 
A Q&A system considering employees‘ willingness to help colleagues and to loo...
A Q&A system considering employees‘ willingness to help colleagues and to loo...A Q&A system considering employees‘ willingness to help colleagues and to loo...
A Q&A system considering employees‘ willingness to help colleagues and to loo...Multimedia Communications Lab
 
Lernen mit Web 2.0 Ressourcen in der betrieblichen Ausbildung Erfahrungen a...
Lernen mit Web 2.0 Ressourcen in der betrieblichen Ausbildung Erfahrungen a...Lernen mit Web 2.0 Ressourcen in der betrieblichen Ausbildung Erfahrungen a...
Lernen mit Web 2.0 Ressourcen in der betrieblichen Ausbildung Erfahrungen a...Christoph Rensing
 
Erster f vortrag_personalized_rec_sys_for_rbl__20110919_ma_v5.0
Erster f vortrag_personalized_rec_sys_for_rbl__20110919_ma_v5.0Erster f vortrag_personalized_rec_sys_for_rbl__20110919_ma_v5.0
Erster f vortrag_personalized_rec_sys_for_rbl__20110919_ma_v5.0Mojisola Erdt née Anjorin
 
CROKODIL - a Platform for Collaborative Resource-Based Learning
CROKODIL - a Platform for Collaborative Resource-Based LearningCROKODIL - a Platform for Collaborative Resource-Based Learning
CROKODIL - a Platform for Collaborative Resource-Based LearningCROKODIl consortium
 
Content Syndication zwischen Hessen und e-teaching.org
Content Syndication zwischen Hessen und e-teaching.orgContent Syndication zwischen Hessen und e-teaching.org
Content Syndication zwischen Hessen und e-teaching.orgChristoph Rensing
 
Bedarfsgetriebener situativer Wissenserwerb mit Webressourcen
Bedarfsgetriebener situativer Wissenserwerb mit WebressourcenBedarfsgetriebener situativer Wissenserwerb mit Webressourcen
Bedarfsgetriebener situativer Wissenserwerb mit WebressourcenCROKODIl consortium
 
Mobiles aktivierendes Lernen im Bauingenieurwesen: eine Semantic MediaWiki b...
Mobiles aktivierendes Lernen im Bauingenieurwesen: eine Semantic MediaWiki b...Mobiles aktivierendes Lernen im Bauingenieurwesen: eine Semantic MediaWiki b...
Mobiles aktivierendes Lernen im Bauingenieurwesen: eine Semantic MediaWiki b...Christoph Rensing
 
Anregung der Kooperation zwischen Lernenden mittels eines Feeds von Aktionen ...
Anregung der Kooperation zwischen Lernenden mittels eines Feeds von Aktionen ...Anregung der Kooperation zwischen Lernenden mittels eines Feeds von Aktionen ...
Anregung der Kooperation zwischen Lernenden mittels eines Feeds von Aktionen ...Christoph Rensing
 
Lernanwendungen im mobilen Web – technische Herausforderungen und Lösungen, v...
Lernanwendungen im mobilen Web – technische Herausforderungen und Lösungen, v...Lernanwendungen im mobilen Web – technische Herausforderungen und Lösungen, v...
Lernanwendungen im mobilen Web – technische Herausforderungen und Lösungen, v...Multimedia Communications Lab
 

Andere mochten auch (13)

Combination of resource based learning with instructional designed and collab...
Combination of resource based learning with instructional designed and collab...Combination of resource based learning with instructional designed and collab...
Combination of resource based learning with instructional designed and collab...
 
Aufgabenprototypen zur Unterstützung Ressourcen-basierten Lernens
Aufgabenprototypen zur Unterstützung Ressourcen-basierten LernensAufgabenprototypen zur Unterstützung Ressourcen-basierten Lernens
Aufgabenprototypen zur Unterstützung Ressourcen-basierten Lernens
 
Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Sy...
Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Sy...Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Sy...
Investigating Crowdsourcing as an Evaluation Method for (TEL) Recommender Sy...
 
Szenarien und Erfahrungen mobilen situierten Lernens an Hochschulen
Szenarien und Erfahrungen mobilen situierten Lernens an HochschulenSzenarien und Erfahrungen mobilen situierten Lernens an Hochschulen
Szenarien und Erfahrungen mobilen situierten Lernens an Hochschulen
 
A Q&A system considering employees‘ willingness to help colleagues and to loo...
A Q&A system considering employees‘ willingness to help colleagues and to loo...A Q&A system considering employees‘ willingness to help colleagues and to loo...
A Q&A system considering employees‘ willingness to help colleagues and to loo...
 
Lernen mit Web 2.0 Ressourcen in der betrieblichen Ausbildung Erfahrungen a...
Lernen mit Web 2.0 Ressourcen in der betrieblichen Ausbildung Erfahrungen a...Lernen mit Web 2.0 Ressourcen in der betrieblichen Ausbildung Erfahrungen a...
Lernen mit Web 2.0 Ressourcen in der betrieblichen Ausbildung Erfahrungen a...
 
Erster f vortrag_personalized_rec_sys_for_rbl__20110919_ma_v5.0
Erster f vortrag_personalized_rec_sys_for_rbl__20110919_ma_v5.0Erster f vortrag_personalized_rec_sys_for_rbl__20110919_ma_v5.0
Erster f vortrag_personalized_rec_sys_for_rbl__20110919_ma_v5.0
 
CROKODIL - a Platform for Collaborative Resource-Based Learning
CROKODIL - a Platform for Collaborative Resource-Based LearningCROKODIL - a Platform for Collaborative Resource-Based Learning
CROKODIL - a Platform for Collaborative Resource-Based Learning
 
Content Syndication zwischen Hessen und e-teaching.org
Content Syndication zwischen Hessen und e-teaching.orgContent Syndication zwischen Hessen und e-teaching.org
Content Syndication zwischen Hessen und e-teaching.org
 
Bedarfsgetriebener situativer Wissenserwerb mit Webressourcen
Bedarfsgetriebener situativer Wissenserwerb mit WebressourcenBedarfsgetriebener situativer Wissenserwerb mit Webressourcen
Bedarfsgetriebener situativer Wissenserwerb mit Webressourcen
 
Mobiles aktivierendes Lernen im Bauingenieurwesen: eine Semantic MediaWiki b...
Mobiles aktivierendes Lernen im Bauingenieurwesen: eine Semantic MediaWiki b...Mobiles aktivierendes Lernen im Bauingenieurwesen: eine Semantic MediaWiki b...
Mobiles aktivierendes Lernen im Bauingenieurwesen: eine Semantic MediaWiki b...
 
Anregung der Kooperation zwischen Lernenden mittels eines Feeds von Aktionen ...
Anregung der Kooperation zwischen Lernenden mittels eines Feeds von Aktionen ...Anregung der Kooperation zwischen Lernenden mittels eines Feeds von Aktionen ...
Anregung der Kooperation zwischen Lernenden mittels eines Feeds von Aktionen ...
 
Lernanwendungen im mobilen Web – technische Herausforderungen und Lösungen, v...
Lernanwendungen im mobilen Web – technische Herausforderungen und Lösungen, v...Lernanwendungen im mobilen Web – technische Herausforderungen und Lösungen, v...
Lernanwendungen im mobilen Web – technische Herausforderungen und Lösungen, v...
 

Ähnlich wie Eval rec algo_crowdsourcing__icalt_2014_ma

Exploiting Semantic Information for Graph-based Recommendations of Learning R...
Exploiting Semantic Information for Graph-based Recommendations of Learning R...Exploiting Semantic Information for Graph-based Recommendations of Learning R...
Exploiting Semantic Information for Graph-based Recommendations of Learning R...Mojisola Erdt née Anjorin
 
AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...
AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...
AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...Erin Robinson
 
Putting Data to Work: Moving science forward together beyond where we thought...
Putting Data to Work: Moving science forward together beyond where we thought...Putting Data to Work: Moving science forward together beyond where we thought...
Putting Data to Work: Moving science forward together beyond where we thought...Erin Robinson
 
Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014Maria Eskevich
 
10.MIL 9. Current and Future Trends in Media and Information.pptx
10.MIL 9. Current and Future Trends in Media and Information.pptx10.MIL 9. Current and Future Trends in Media and Information.pptx
10.MIL 9. Current and Future Trends in Media and Information.pptxEdelmarBenosa3
 
Robotics-Based Learning in the Context of Computer Programming
Robotics-Based Learning in the Context of Computer ProgrammingRobotics-Based Learning in the Context of Computer Programming
Robotics-Based Learning in the Context of Computer ProgrammingJacob Storer
 
Scaling Up Learning Analytics
Scaling Up Learning AnalyticsScaling Up Learning Analytics
Scaling Up Learning AnalyticsDoug Clow
 
insight-centre-galway-learning-analytics
insight-centre-galway-learning-analyticsinsight-centre-galway-learning-analytics
insight-centre-galway-learning-analyticsSimon Buckingham Shum
 
9 Current and Future Trends of Media and Information.pptx
9 Current and Future Trends of Media and Information.pptx9 Current and Future Trends of Media and Information.pptx
9 Current and Future Trends of Media and Information.pptxMagdaLo1
 
CURRENT AND FUTURE TRENDS IN MEDIA AND .pdf
CURRENT AND FUTURE TRENDS IN MEDIA AND .pdfCURRENT AND FUTURE TRENDS IN MEDIA AND .pdf
CURRENT AND FUTURE TRENDS IN MEDIA AND .pdfMagdaLo1
 
Media and Information Literacy (MIL) - 9. Current and Future Trends in Media ...
Media and Information Literacy (MIL) - 9. Current and Future Trends in Media ...Media and Information Literacy (MIL) - 9. Current and Future Trends in Media ...
Media and Information Literacy (MIL) - 9. Current and Future Trends in Media ...Arniel Ping
 
(lc26,27,28) 9-170209082212.pdf
(lc26,27,28) 9-170209082212.pdf(lc26,27,28) 9-170209082212.pdf
(lc26,27,28) 9-170209082212.pdfClaesTrinio
 
A Task-Centered Framework för Computationally Grounded Science Collaborations
A Task-Centered Framework för Computationally Grounded Science CollaborationsA Task-Centered Framework för Computationally Grounded Science Collaborations
A Task-Centered Framework för Computationally Grounded Science CollaborationsDr. Matheus Hauder
 
Software Sustainability Institute
Software Sustainability InstituteSoftware Sustainability Institute
Software Sustainability InstituteNeil Chue Hong
 
[UMAP 2016] User-Oriented Context Suggestion
[UMAP 2016] User-Oriented Context Suggestion[UMAP 2016] User-Oriented Context Suggestion
[UMAP 2016] User-Oriented Context SuggestionYONG ZHENG
 
Webinar series: Public engagement, education and outreach for carbon capture ...
Webinar series: Public engagement, education and outreach for carbon capture ...Webinar series: Public engagement, education and outreach for carbon capture ...
Webinar series: Public engagement, education and outreach for carbon capture ...Global CCS Institute
 
Presentation Literacy: Skills for Effective Communication
Presentation Literacy: Skills for Effective CommunicationPresentation Literacy: Skills for Effective Communication
Presentation Literacy: Skills for Effective Communicationoiisdp2010
 
LDT Future of Learning 2010
LDT Future of Learning 2010LDT Future of Learning 2010
LDT Future of Learning 2010Life Unexamined
 

Ähnlich wie Eval rec algo_crowdsourcing__icalt_2014_ma (20)

Exploiting Semantic Information for Graph-based Recommendations of Learning R...
Exploiting Semantic Information for Graph-based Recommendations of Learning R...Exploiting Semantic Information for Graph-based Recommendations of Learning R...
Exploiting Semantic Information for Graph-based Recommendations of Learning R...
 
AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...
AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...
AGU Leptoukh Lecture: Putting Data to Work: Moving science forward together b...
 
Putting Data to Work: Moving science forward together beyond where we thought...
Putting Data to Work: Moving science forward together beyond where we thought...Putting Data to Work: Moving science forward together beyond where we thought...
Putting Data to Work: Moving science forward together beyond where we thought...
 
Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014
 
10.MIL 9. Current and Future Trends in Media and Information.pptx
10.MIL 9. Current and Future Trends in Media and Information.pptx10.MIL 9. Current and Future Trends in Media and Information.pptx
10.MIL 9. Current and Future Trends in Media and Information.pptx
 
Robotics-Based Learning in the Context of Computer Programming
Robotics-Based Learning in the Context of Computer ProgrammingRobotics-Based Learning in the Context of Computer Programming
Robotics-Based Learning in the Context of Computer Programming
 
Scaling Up Learning Analytics
Scaling Up Learning AnalyticsScaling Up Learning Analytics
Scaling Up Learning Analytics
 
insight-centre-galway-learning-analytics
insight-centre-galway-learning-analyticsinsight-centre-galway-learning-analytics
insight-centre-galway-learning-analytics
 
9 Current and Future Trends of Media and Information.pptx
9 Current and Future Trends of Media and Information.pptx9 Current and Future Trends of Media and Information.pptx
9 Current and Future Trends of Media and Information.pptx
 
CURRENT AND FUTURE TRENDS IN MEDIA AND .pdf
CURRENT AND FUTURE TRENDS IN MEDIA AND .pdfCURRENT AND FUTURE TRENDS IN MEDIA AND .pdf
CURRENT AND FUTURE TRENDS IN MEDIA AND .pdf
 
Media and Information Literacy (MIL) - 9. Current and Future Trends in Media ...
Media and Information Literacy (MIL) - 9. Current and Future Trends in Media ...Media and Information Literacy (MIL) - 9. Current and Future Trends in Media ...
Media and Information Literacy (MIL) - 9. Current and Future Trends in Media ...
 
(lc26,27,28) 9-170209082212.pdf
(lc26,27,28) 9-170209082212.pdf(lc26,27,28) 9-170209082212.pdf
(lc26,27,28) 9-170209082212.pdf
 
Training Session on Using Nvivo and SPSS
Training Session on Using Nvivo and SPSS Training Session on Using Nvivo and SPSS
Training Session on Using Nvivo and SPSS
 
A Task-Centered Framework för Computationally Grounded Science Collaborations
A Task-Centered Framework för Computationally Grounded Science CollaborationsA Task-Centered Framework för Computationally Grounded Science Collaborations
A Task-Centered Framework för Computationally Grounded Science Collaborations
 
Software Sustainability Institute
Software Sustainability InstituteSoftware Sustainability Institute
Software Sustainability Institute
 
[UMAP 2016] User-Oriented Context Suggestion
[UMAP 2016] User-Oriented Context Suggestion[UMAP 2016] User-Oriented Context Suggestion
[UMAP 2016] User-Oriented Context Suggestion
 
Webinar series: Public engagement, education and outreach for carbon capture ...
Webinar series: Public engagement, education and outreach for carbon capture ...Webinar series: Public engagement, education and outreach for carbon capture ...
Webinar series: Public engagement, education and outreach for carbon capture ...
 
1st oper as userboard workshop report
1st oper as userboard workshop   report1st oper as userboard workshop   report
1st oper as userboard workshop report
 
Presentation Literacy: Skills for Effective Communication
Presentation Literacy: Skills for Effective CommunicationPresentation Literacy: Skills for Effective Communication
Presentation Literacy: Skills for Effective Communication
 
LDT Future of Learning 2010
LDT Future of Learning 2010LDT Future of Learning 2010
LDT Future of Learning 2010
 

Kürzlich hochgeladen

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 

Eval rec algo_crowdsourcing__icalt_2014_ma

  • 1. Evaluating Recommender Algorithms for Learning using Crowdsourcing. Mojisola Erdt, Christoph Rensing. ICALT 2014, Athens. Prof. Dr.-Ing. Ralf Steinmetz, KOM - Multimedia Communications Lab. 28-Dec-14. File: Eval_Rec_Algo_Crowdsourcing__ICALT_2014_MA.pptx. © author(s) of these slides including research results from the KOM research network and TU Darmstadt; otherwise it is specified at the respective slide. Source: http://www.digitalvisitor.com/cultural-differences-in-online-behaviour-and-customer-reviews/
  • 2. Motivation
      • Learning on-the-job: to solve a particular problem; to learn about a new topic; mostly web resources
      • Social Tagging Applications: help to manage resources; offer recommendations
      • TEL Recommender Systems: recommend relevant, novel and diverse resources for a specific learning goal or activity
  • 3. Evaluation Methods for TEL Recommender Systems
      • Offline experiments (historical or synthetic datasets). Advantages: fast; less effort; repeatable. Disadvantages: new, unknown resources cannot be evaluated; dependent on dataset.
      • User experiments. Advantages: user's perspective. Disadvantages: a lot of effort and time; few users (ca. 40).
      • Real-life testing. Advantages: real-life setting. Disadvantages: needs a substantial number of users.
      • Crowdsourcing. Advantages: fast; less effort; repeatable; user's perspective; sufficient users. Disadvantages: unknown users; "artificial task"; spamming.
  • 4. Crowdsourcing Platforms
      • microworkers (https://microworkers.com): 500,000 crowdworkers worldwide; flexible forwarding to other hosting platforms; since 2009
      • CrowdFlower (http://www.crowdflower.com): 5 million crowdworkers in 208 countries; gives access to other crowdsourcing platforms, e.g. Amazon MTurk; since 2007
  • 5. Overview
      • Motivation
      • Crowdsourcing Evaluation Concept: Preparation Step; Execution Step
      • Crowdsourcing Evaluation Results
      • Conclusion & Future Work
  • 6. Crowdsourcing Evaluation Concept: Preparation Step
      • Steps shown in the diagram: Create Questionnaire; Set Goal; Formulate Hypotheses; Create Questions; Add Control Questions; Select Topic; Create Activity Hierarchy; Create Seed Dataset; Prepare Algorithms; Generate Recommendations; Filter Duplicates
      • Reference: M. Migenda, M. Erdt, M. Gutjahr, and C. Rensing, DeLFI 2013
  • 7. Preparation Step: Set Goal
      • AScore is based on activity hierarchies: it extends FolkRank by considering activities, activity hierarchies and the current activity of the learner (Anjorin et al., ECTEL 2012)
      • Activities in the example hierarchy (diagram): Understanding the Carbon Footprint; Calculating the Carbon Footprint; Investigate the impact of Climate Change; Analyze potential Catastrophes due to Climate Change; Investigate causes of Climate Change; Give an overview on the history of Global Warming; Determine future prognoses on Climate Change; Understanding Climate Change
  • 8. Preparation Step: Set Goal and Formulate Hypotheses
      • Evaluation goals: (1) investigate whether AScore recommends more relevant, novel and diverse learning resources for a specified topic than FolkRank; (2) investigate whether AScore recommends more relevant, novel and diverse learning resources for sub-activities (A_Sub) than for activities higher up in the hierarchy (A_Super)
      • Hypotheses: 1. Relevance; 2. Novelty; 3. Diversity. Each is compared for AScore vs. FolkRank and for A_Sub vs. A_Super
  • 9. Preparation Step: Select Topic and Generate Recommendations
      • Generate a basis graph structure for recommendations: 5 experts researched the topic of climate change for one hour, using CROKODIL to create an extended folksonomy (users, tags, resources, activities); ca. 70 resources were tagged and attached to 8 activities
      • Diagram: the activity hierarchy from slide 7, with markers for Experiment Spring and Experiment Autumn
  • 10. Preparation Step: Create Questionnaire
      • Conduct personal research on the topic: level of knowledge on this topic; request to find 5 online resources relevant to this topic
      • 10 questions per recommendation: 3 questions for each hypothesis (relevance, novelty, diversity); 1 control question to detect spammers, e.g. "Give 4 keywords to summarize the recommended resource"
      • General questions: age, gender, level of education and nationality
      • Treatment conditions (same 2x2 design in Experiment Spring and Experiment Autumn): AScore on sub-activity (A_Sub); AScore on super-activity (A_Super); FolkRank on sub-activity (F_Sub); FolkRank on super-activity (F_Super)
  • 11. Crowdsourcing Evaluation Concept: Execution Step
      • Diagram elements: Questionnaire (https://www.soscisurvey.de); Crowdsourcing Platform; Results; Filter Spammers; Make Payments; Release next iteration burst
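The "Filter Spammers" step can be illustrated with a small sketch built around the control question from slide 10 ("Give 4 keywords to summarize the recommended resource"). The slides do not say how answers were actually judged, so the rule below (at least four distinct keywords, at least one matching a term known for the resource) is purely an assumption, and the function names, worker IDs and example answers are hypothetical.

```python
import re

def keywords(answer: str) -> set[str]:
    """Split a free-text answer into distinct lowercase keyword tokens."""
    return {w for w in re.split(r"[^a-z0-9]+", answer.lower()) if w}

def looks_like_spam(answer: str, resource_terms: set[str],
                    min_keywords: int = 4, min_overlap: int = 1) -> bool:
    """Hypothetical rule (not from the slides): flag answers with too few
    distinct keywords, or with no keyword matching the resource's terms."""
    given = keywords(answer)
    if len(given) < min_keywords:
        return True
    return len(given & resource_terms) < min_overlap

# Hypothetical terms describing one recommended resource.
terms = {"carbon", "footprint", "emissions", "co2", "climate"}

# Hypothetical control-question answers keyed by worker ID.
submissions = {
    "w1": "carbon footprint, emissions, household",
    "w2": "good good good good",
    "w3": "nice site thanks lol",
}

kept = {w: a for w, a in submissions.items() if not looks_like_spam(a, terms)}
print(sorted(kept))
```

In practice a manual review of borderline answers would probably still be needed, since a simple keyword overlap can reject honest answers that use synonyms.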
  • 12. Execution Step: Participants and Treatment Conditions
      • Experiment Spring: AScore: A_Sub 45, A_Super 39; FolkRank: F_Sub 39, F_Super 36. Participants: CrowdFlower (32), Microworker (35), Volunteers (92). Spammers (243)
      • Experiment Autumn: AScore: A_Sub 80, A_Super 73; FolkRank: F_Sub 76, F_Super 85. Participants: Crowdworkers (314). Spammers (549)
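As a quick consistency check on the participant numbers of slide 12, the following sketch sums the four treatment conditions per experiment and compares them with the participant sources listed on the same slide. Assigning the sources to Spring and Autumn is an inference from the matching totals, and the "filtered share" additionally assumes that the spammer counts are not included in the valid participant counts; neither is stated explicitly on the slide.

```python
# Valid participants per treatment condition, as listed on the slide.
spring = {"A_Sub": 45, "A_Super": 39, "F_Sub": 39, "F_Super": 36}
autumn = {"A_Sub": 80, "A_Super": 73, "F_Sub": 76, "F_Super": 85}

# Participant sources and spammer counts from the same slide. Mapping them to
# Spring/Autumn is an assumption, supported by the totals matching below.
spring_sources = {"CrowdFlower": 32, "Microworker": 35, "Volunteers": 92}
autumn_crowdworkers = 314
spring_spammers, autumn_spammers = 243, 549

spring_total = sum(spring.values())
autumn_total = sum(autumn.values())

# The condition totals agree with the listed participant sources.
assert spring_total == sum(spring_sources.values())
assert autumn_total == autumn_crowdworkers

def rejected_share(valid: int, spam: int) -> float:
    """Share of all submissions filtered as spam (assumes spam is not in `valid`)."""
    return spam / (valid + spam)

print(f"Spring: {spring_total} valid, "
      f"{rejected_share(spring_total, spring_spammers):.1%} filtered")
print(f"Autumn: {autumn_total} valid, "
      f"{rejected_share(autumn_total, autumn_spammers):.1%} filtered")
```

Under these assumptions, roughly 60% of Spring submissions and 64% of Autumn submissions were filtered out as spam.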
  • 13. Overview
      • Motivation
      • Crowdsourcing Evaluation Concept
      • Crowdsourcing Evaluation Results: AScore and FolkRank (Experiment Spring, Experiment Autumn); A_Sub and A_Super (Experiment Spring, Experiment Autumn)
      • Conclusion & Future Work
  • 14. Crowdsourcing Evaluation Results: Experiment Spring
      • Significance tests (p-values): Hypothesis 1 (Relevance): 0.000003578 < 0.05; Hypothesis 2 (Novelty): 0.000001531 < 0.05; Hypothesis 3 (Diversity): 0.0001618 < 0.05
  • 15. Crowdsourcing Evaluation Results: Experiment Autumn
      • Significance tests (p-values): Hypothesis 1 (Relevance): 0.000001362 < 0.05; Hypothesis 2 (Novelty): 0.0000007654 < 0.05; Hypothesis 3 (Diversity): 0.00000000015 < 0.05
  • 16. Execution Step: Evaluation Results
      • Evaluation goals: (1) investigate whether AScore recommends more relevant, novel and diverse learning resources for a specified topic than FolkRank; (2) investigate whether AScore recommends more relevant, novel and diverse learning resources for sub-activities (A_Sub) than for activities higher up in the hierarchy (A_Super)
      • Hypotheses: 1. Relevance; 2. Novelty; 3. Diversity. Each is compared for AScore vs. FolkRank and for A_Sub vs. A_Super
      • Check marks shown on the slide: ✔ ✔ ✔ ✔
  • 17. Crowdsourcing Evaluation Results: Experiment Spring
      • Significance tests (p-values): Hypothesis 1 (Relevance): 0.0005654 < 0.05; Hypothesis 2 (Novelty): 0.01666 < 0.05; Hypothesis 3 (Diversity): 0.02176 < 0.05
  • 18. Crowdsourcing Evaluation Results: Experiment Autumn
      • Significance tests (p-values): Hypothesis 1 (Relevance): 0.0005306 < 0.05; Hypothesis 2 (Novelty): 0.000001531 < 0.05; Hypothesis 3 (Diversity): 0.0000001608 < 0.05
  • 19. Crowdsourcing Evaluation Results: Experiment Spring
      • Chart: aggregated mean values for Hypotheses 1, 2 and 3, F_Sub vs. F_Super (axis 0 to 7); values as extracted: 3.95, 4.05, 3.97, 3.91, 3.96, 3.83
      • Significance tests (p-values): Hypothesis 1 (Relevance): 0.3023 > 0.05; Hypothesis 2 (Novelty): 0.5216 > 0.05; Hypothesis 3 (Diversity): 0.2031 > 0.05
  • 20. Crowdsourcing Evaluation Results: Experiment Autumn
      • Chart: aggregated mean values for Hypotheses 1, 2 and 3, F_Sub vs. F_Super (axis 0 to 7); values as extracted: 4.04, 3.9, 4.11, 4.09, 4.07, 4.01
      • Significance tests (p-values): Hypothesis 1 (Relevance): 0.01481 < 0.05; Hypothesis 2 (Novelty): 0.7064 > 0.05; Hypothesis 3 (Diversity): 0.2881 > 0.05
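The slides report p-values against a 0.05 threshold but do not name the significance test that produced them. To illustrate how such a comparison between two treatment conditions could be computed, here is a minimal sketch of a two-sided permutation test on the difference of mean ratings. This is a swapped-in technique, not necessarily the authors' method; the function name and the rating data are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Illustrative only: the slides do not state which test was used.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of the pooled ratings
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical ratings for two conditions (not data from the study).
ratings_x = [6, 5, 6, 7, 5, 6, 6, 7, 5, 6]
ratings_y = [3, 4, 3, 2, 4, 3, 4, 3, 2, 4]

p = permutation_p_value(ratings_x, ratings_y)
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

A permutation test makes few distributional assumptions, which suits ordinal questionnaire ratings; a rank-based test would be another reasonable choice.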
  • 21. Execution Step: Evaluation Results
      • Evaluation goals: (1) investigate whether AScore recommends more relevant, novel and diverse learning resources for a specified topic than FolkRank; (2) investigate whether AScore recommends more relevant, novel and diverse learning resources for sub-activities (A_Sub) than for activities higher up in the hierarchy (A_Super)
      • Hypotheses: 1. Relevance; 2. Novelty; 3. Diversity. Each is compared for AScore vs. FolkRank and for A_Sub vs. A_Super
      • Check marks shown on the slide: ✔ ✔ ✔ ✔ ✔
  • 22. Conclusion and Future Work
      • Crowdsourcing can be successfully applied to evaluate TEL recommender algorithms: integrate more user-centric evaluations already during the design and development of TEL recommender algorithms; select the best fitting evaluation approach
      • Future work: Can crowdsourcing be used to evaluate other aspects of a recommender system, e.g. explanations, presentation? Can more complex TEL evaluation tasks be evaluated with crowdsourcing?
  • 23. Questions & Contact