Professor for Data & Knowledge Engineering (HHU Düsseldorf) & Scientific Director of Knowledge Technologies for the Social Sciences (GESIS, Cologne) at Heinrich-Heine-University Düsseldorf & GESIS (Cologne)
Big Data in Learning Analytics - Analytics for Everyday Learning
Stefan Dietze, L3S Research Center, Hannover
24.01.2017
LearnTec 2017, Karlsruhe
23/02/17 Stefan Dietze
Research areas
Web science, Information Retrieval, Semantic Web, Social Web
Analytics, Knowledge Discovery, Human Computation
Interdisciplinary application areas: digital humanities, TEL/education, Web archiving, mobility
Some projects
L3S Research Center
http://l3s.de/
http://stefandietze.net/
Big Data in Learning Analytics? A simplistic perspective
Technology-enhanced Learning / Web-based Learning
Learning Analytics & Educational Data Mining
Application of data mining techniques to understand learning activities and performance
Traditionally confined to dedicated learning environments and platforms (e.g., Moodle)
Examples: JLA special issue on LA datasets, with data sizes ranging from a few MB to a maximum of 15 GB
Near-complete research corpus: LAK Dataset (http://lak.linkededucation.org)
Learning Analytics & Knowledge Dataset
Cooperation of
Near-complete Linked Data corpus of Learning Analytics research publications (~800, since 2009)
Dietze, S., Taibi, D., D’Aquin, M., Facilitating Scientometrics in Learning Analytics and Educational Data Mining - the LAK Dataset, Semantic Web Journal, 2017.
http://lak.linkededucation.org/
Big Data in Learning Analytics? A simplistic Perspective
Broader understanding: informal learning, micro-learning
Research often focused on resources: sharing, reusing, recommendation
Data examples:
“LinkedUp Catalog”: > 50 M resources, 300 M statements
“LRMI/schema.org”: > 45 M quads (Common Crawl 2015)
Big Data? Depends, but mostly not! (Volume?)
LinkedUp Catalog of learning resources
Dataset
Catalog/Registry
http://data.linkededucation.org/linkedup/catalog/
“LinkedUp” (FP7 project): L3S, OU, OKFN, Elsevier, Exact Learning Solutions
Publishing and curation of educational/learning resources according to Linked Data principles
Largest collection of Linked Data about learning resources (approx. 50 datasets, 50 M resources)
[Figure: # entities and # statements per provider, PLDs ranked on the x-axis, count (log scale) on the y-axis]
Learning resource annotations on the Web?
“Learning Resource Metadata Initiative (LRMI)”: schema.org vocabulary for annotating learning resources in Web documents
Approx. 5,000 PLDs in “Common Crawl” (2 bn Web documents)
LRMI adoption on the Web (WDC) [LILE16]:
2015: 44,108,511 quads, 6,243,721 resources
2014: 30,599,024 quads, 4,182,541 resources
2013: 10,636,873 quads, 1,461,093 resources
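The adoption counts above can be turned into year-over-year growth factors. This sketch only re-uses the WDC counts reported on this slide; the dictionary layout and the `growth_factors` helper are illustrative and not part of the cited analysis.

```python
# LRMI adoption counts in Web Data Commons, as reported on this slide [LILE16].
lrmi = {
    2013: {"quads": 10_636_873, "resources": 1_461_093},
    2014: {"quads": 30_599_024, "resources": 4_182_541},
    2015: {"quads": 44_108_511, "resources": 6_243_721},
}

def growth_factors(counts, key):
    """Return {year: counts[year][key] / counts[previous year][key]}."""
    years = sorted(counts)
    return {
        later: counts[later][key] / counts[earlier][key]
        for earlier, later in zip(years, years[1:])
    }

quad_growth = growth_factors(lrmi, "quads")
resource_growth = growth_factors(lrmi, "resources")
for year in sorted(quad_growth):
    print(f"{year}: quads x{quad_growth[year]:.2f}, "
          f"resources x{resource_growth[year]:.2f}")
```

With these counts, the number of quads grew roughly 2.9× from 2013 to 2014 and roughly 1.4× from 2014 to 2015: adoption kept growing, but more slowly.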
Power-law distribution across providers: 4,805 providers (PLDs)
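A quick way to check a power-law shape like the provider distribution on this slide is a straight-line fit of log(count) against log(rank). This sketch uses ordinary least squares on synthetic counts. The data and the `fit_power_law` name are hypothetical, and log-log regression is only a rough diagnostic; maximum-likelihood estimators are generally preferred for rigorous power-law fitting.

```python
import math

def fit_power_law(counts):
    """Fit log(count) = log(C) - alpha * log(rank) by ordinary least squares.

    Counts are ranked in descending order (rank 1 = largest provider).
    Returns (alpha, C). Zero counts are dropped because log(0) is undefined.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)

# Hypothetical counts (NOT the WDC data): 200 providers following
# count = 1e6 * rank**-1.5 exactly, so the fit should recover alpha = 1.5.
synthetic = [1e6 * r ** -1.5 for r in range(1, 201)]
alpha, c = fit_power_law(synthetic)
print(f"alpha = {alpha:.2f}, C = {c:.3g}")
```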
Taibi, D., Dietze, S., Towards embedded markup of learning resources on the Web: a quantitative Analysis of LRMI Terms Usage, in Companion Publication of the IW3C2 WWW 2016 Conference, IW3C2 2016, Montreal, Canada, April 11, 2016.
Stefan Dietze, Besnik Fetahu, Ujwal Gadiraju
http://lrmi.itd.cnr.it/
Big Data in Learning Analytics? A simplistic Perspective
Big Data? Depends, but mostly not! (Velocity?)
(Informal) Learning on the Web?
Anything can be a learning resource
The activity, not the resource, makes the difference: i.e., how a resource is being used
Learning analytics in online/non-learning environments?
o Activity streams,
o Social graphs (and their evolution),
o Behavioral traces (mouse movements, keystrokes)
o ...
Research challenges:
o How to detect “learning”?
o How to detect learning-specific notions such as “competences”, “learning performance”, etc.?
“AFEL – Analytics for Everyday (Online) Learning”
Examples of AFEL data sources:
• Activity streams and behavioral traces
• L3S Twitter Crawl: 6 bn tweets
• Common Crawl (2015): 2 bn documents
• Web Data Commons (2015): 1 TB = 24 bn quads
• “German Academic Web”: 6 TB Web crawl (re-crawled quarterly)
• Wikipedia edit history: 3 M edits/month (English)
• ...
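Some of these source sizes can be converted into rough per-unit rates, which helps with the volume and velocity questions raised earlier. This is back-of-the-envelope arithmetic under two assumptions not stated on the slide: 1 TB = 10**12 bytes, and a 30-day month.

```python
# Back-of-the-envelope rates derived from the AFEL source list.
# Assumptions (not stated on the slide): 1 TB = 10**12 bytes, 30-day month.
WDC_BYTES = 10**12                # Web Data Commons 2015: ~1 TB
WDC_QUADS = 24 * 10**9            # ~24 bn quads
WIKI_EDITS_PER_MONTH = 3 * 10**6  # English Wikipedia: ~3 M edits/month
DAYS_PER_MONTH = 30
SECONDS_PER_MONTH = DAYS_PER_MONTH * 24 * 60 * 60

bytes_per_quad = WDC_BYTES / WDC_QUADS
edits_per_day = WIKI_EDITS_PER_MONTH / DAYS_PER_MONTH
edits_per_second = WIKI_EDITS_PER_MONTH / SECONDS_PER_MONTH

print(f"~{bytes_per_quad:.0f} bytes per quad")
print(f"~{edits_per_day:,.0f} edits/day, ~{edits_per_second:.2f} edits/s")
```

This gives roughly 42 bytes per quad and about 1.2 Wikipedia edits per second. Treat the per-quad figure as an order of magnitude only, because the slide does not say whether 1 TB is the compressed or uncompressed size.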
H2020 project (since 12/2015) aimed at understanding/supporting learning in social Web environments
Big Data Challenges/Tasks in AFEL & beyond: some examples
I Efficient data capture
Crawling & extracting activity data
Crawling, extracting and indexing learning resources (e.g., Common Crawl)
II Efficient data analysis
Understanding learning resources: entity extraction & clustering on large Web crawls of resources
“Search as learning”: detecting learning in heterogeneous search query logs & click streams
Detecting learning activities: detection of learning patterns (e.g., competent behavior) in the absence of learning objectives & assessments (!)
o Obtaining performance indicators from behavioral traces?
o Quasi-experiments in crowdsourcing platforms to obtain training data
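The capture step above (extracting learning-resource metadata from crawled pages) can be illustrated at very small scale. This sketch counts LRMI property names used in schema.org microdata (`itemprop` attributes) using Python's standard `html.parser`. The property subset and the sample page are assumptions for illustration. The sketch ignores RDFa and JSON-LD and is not the extraction pipeline used for Web Data Commons.

```python
from collections import Counter
from html.parser import HTMLParser

# Illustrative subset of LRMI properties available in schema.org.
LRMI_PROPERTIES = {
    "learningResourceType", "educationalAlignment", "educationalUse",
    "typicalAgeRange", "interactivityType", "timeRequired",
}

class LRMIMicrodataCounter(HTMLParser):
    """Counts microdata itemprop attributes that name LRMI properties."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Valueless attributes (e.g. itemscope) arrive with value None.
            if name == "itemprop" and value:
                # An itemprop may list several space-separated property names.
                for prop in value.split():
                    if prop in LRMI_PROPERTIES:
                        self.counts[prop] += 1

def count_lrmi_terms(html):
    parser = LRMIMicrodataCounter()
    parser.feed(html)
    parser.close()
    return parser.counts

# Hypothetical sample page.
page = """
<div itemscope itemtype="https://schema.org/CreativeWork">
  <span itemprop="name">Fractions for beginners</span>
  <span itemprop="learningResourceType">video</span>
  <span itemprop="typicalAgeRange">10-12</span>
  <span itemprop="educationalUse">assignment</span>
</div>
"""
print(count_lrmi_terms(page))
```

Aggregating such counters per pay-level domain would give per-provider term usage, although at crawl scale this would run inside a distributed job rather than a single process.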
Gadiraju, U., Demartini, G., Kawase, R., Dietze, S., Human beyond the Machine: Challenges and Opportunities of Microtask Crowdsourcing. In: IEEE Intelligent Systems, Volume 30, Issue 4, Jul/Aug 2015.
Gadiraju, U., Kawase, R., Dietze, S., Demartini, G., Understanding Malicious Behavior in Crowdsourcing Platforms: The Case of Online Surveys. ACM CHI Conference on Human Factors in Computing Systems (CHI2015), April 18-23, 2015, Seoul, Korea.
Detecting competence in online users?
Capturing assessment data: microtasks in CrowdFlower
“Content Creation (CC)”: transcription of captchas
“Information Finding (IF)”: finding the middle names of famous people
1,800 assessments: 2 tasks × 3 durations × 3 difficulty levels × 100 users (per task configuration)
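The arithmetic behind the 1,800 assessments is a full factorial design. This sketch enumerates it. The duration labels are hypothetical placeholders, because the actual durations are not given here.

```python
from itertools import product

TASKS = ["CC", "IF"]            # Content Creation, Information Finding
DURATIONS = ["d1", "d2", "d3"]  # hypothetical labels; actual durations not given
LEVELS = [1, 2, 3]              # difficulty levels
USERS_PER_CONFIGURATION = 100

# 2 x 3 x 3 = 18 task configurations, each completed by 100 users.
configurations = list(product(TASKS, DURATIONS, LEVELS))
assessments = [
    (task, duration, level, user)
    for task, duration, level in configurations
    for user in range(USERS_PER_CONFIGURATION)
]
print(len(configurations), len(assessments))  # 18 1800
```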
Example IF task (“Find the middle name of:”):
Level 1: “Daniel Craig”
Level 2: “George Lucas” (profession: Archbishop)
Level 3: “Brian Smith” (profession: Ice Hockey, born: 1972)
Behavioral traces: keystrokes and mouse movements
timeBeforeInput, timeBeforeClick
tabSwitchFreq
windowToggleFreq
openNewTabFreq
WindowFocusFrequency
totalMouseMovements
scrollUpFreq, scrollDownFreq
…
Total number of events: 893,285 (CC tasks), 736,664 (IF tasks)
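The trace features listed above are aggregates over raw browser events. This is a minimal sketch that assumes a hypothetical event log of `(timestamp_ms, event_type)` pairs with made-up event type names. The slide does not define how each feature is computed, so the definitions here are assumptions. For example, the `...Freq` features are treated as plain event counts.

```python
from collections import Counter

def extract_features(events, task_start_ms):
    """Aggregate hypothetical (timestamp_ms, event_type) events into trace features.

    Event type names ("keydown", "click", ...) and feature definitions are
    illustrative assumptions; "...Freq" features are plain event counts here.
    """
    events = sorted(events)
    counts = Counter(event_type for _, event_type in events)

    def ms_until_first(event_type):
        # Time from task start to the first event of this type, or None if absent.
        for timestamp, kind in events:
            if kind == event_type:
                return timestamp - task_start_ms
        return None

    return {
        "timeBeforeInput": ms_until_first("keydown"),
        "timeBeforeClick": ms_until_first("click"),
        "tabSwitchFreq": counts["tab_switch"],
        "totalMouseMovements": counts["mousemove"],
        "scrollUpFreq": counts["scroll_up"],
        "scrollDownFreq": counts["scroll_down"],
        # Duration up to the last recorded event (0 for an empty trace).
        "TotalTime": events[-1][0] - task_start_ms if events else 0,
    }

trace = [
    (1200, "mousemove"), (1500, "mousemove"), (2100, "click"),
    (3000, "keydown"), (3400, "keydown"), (5000, "scroll_down"),
    (9000, "tab_switch"),
]
print(extract_features(trace, task_start_ms=0))
```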
Predicting competence from behavioral traces?
Training data
Manual annotation of 1,800 assessments
Performance types [CHI15]:
o “Competent Worker”
o “Diligent Worker”
o “Fast Deceiver”
o “Incompetent Worker”
o “Rule Breaker”
o “Smart Deceiver”
o “Sloppy Worker”
Prediction of performance types from behavioral traces?
Predicting learner types from behavioral traces
“Random Forest Classifier” (per task)
10-fold cross-validation
Prediction performance: accuracy, F-measure
Results
Longer assessments: more signals
Simpler assessments: more conclusive signals
“Competent workers” (CW, DW): accuracy of 91% and 87%, respectively
Most significant features: “TotalTime”, “TippingPoint”, “MouseMovementFrequency”, “WindowFocusFrequency”
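The evaluation protocol above (a classifier per task, 10-fold cross-validation, accuracy and F-measure) can be sketched with the standard library alone. To keep the example dependency-free, it swaps in a simple nearest-centroid classifier for the random forest used in the study. It reports macro-averaged F1 as an assumption, since the slide does not state which F-measure averaging was used. The feature vectors and labels are hypothetical toy data.

```python
import math
import random
from collections import defaultdict

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT a random forest): one mean vector per label."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        sums.setdefault(label, [0.0] * len(x))
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def nearest_centroid_predict(centroids, x):
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 = 2TP / (2TP + FP + FN)."""
    f1s = []
    for label in set(y_true) | set(y_pred):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

def cross_validate(X, y, k=10, seed=0):
    """Return (accuracy, macro-F1) pooled over the k held-out folds."""
    y_true, y_pred = [], []
    for test_idx in k_fold_indices(len(X), k, seed):
        test = set(test_idx)
        train = [i for i in range(len(X)) if i not in test]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            y_true.append(y[i])
            y_pred.append(nearest_centroid_predict(model, X[i]))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return acc, macro_f1(y_true, y_pred)

# Hypothetical, well-separated toy data: (TotalTime, WindowFocusFrequency).
# "CW" = Competent Worker, "FD" = Fast Deceiver.
X = [(10 + i % 3, 1) for i in range(20)] + [(2, 5 + i % 3) for i in range(20)]
y = ["CW"] * 20 + ["FD"] * 20
acc, f1 = cross_validate(X, y, k=10)
print(f"accuracy={acc:.2f}, macro-F1={f1:.2f}")
```

With scikit-learn available, the same protocol would more typically use `RandomForestClassifier` together with `cross_val_score(..., cv=10)`.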
Other features to predict competence in learning/assessments?
“Dunning-Kruger Effect”
Incompetence in a task/domain reduces the capacity to recognize/assess one's own incompetence
Research question
Self-assessment as indicator for competence?
Results
Self-assessment is a reliable indicator of competence (94% accuracy), superior to mere performance measurement
The tendency to overestimate one's own competence increases with the difficulty level
David Dunning. 2011. The Dunning-Kruger Effect: On Being Ignorant of One’s Own Ignorance. Advances in Experimental Social Psychology 44 (2011), 247.
Performance (“Accuracy”) of users classified as “competent”
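Both self-assessment results can be computed per difficulty level: how often the self-assessment matches the actual outcome, and how often users overestimate themselves. This is a sketch under assumptions. Each record is a hypothetical `(difficulty_level, claims_correct, is_correct)` triple. The toy data are invented only to exercise the code and are not results from the study.

```python
from collections import defaultdict

def self_assessment_report(records):
    """records: iterable of (difficulty_level, claims_correct, is_correct).

    Returns {level: (agreement, overestimation)}: agreement is the share of
    answers whose self-assessment matches the true outcome; overestimation is
    the share where the user claimed a correct answer that was in fact wrong.
    """
    by_level = defaultdict(list)
    for level, claims, actual in records:
        by_level[level].append((claims, actual))
    report = {}
    for level, pairs in sorted(by_level.items()):
        n = len(pairs)
        agreement = sum(c == a for c, a in pairs) / n
        overestimation = sum(c and not a for c, a in pairs) / n
        report[level] = (agreement, overestimation)
    return report

# Hypothetical toy records (10 per level), NOT data from the study.
toy = (
    [(1, True, True)] * 9 + [(1, True, False)] * 1
    + [(2, True, True)] * 7 + [(2, True, False)] * 2 + [(2, False, False)] * 1
    + [(3, True, True)] * 4 + [(3, True, False)] * 5 + [(3, False, False)] * 1
)
for level, (agree, over) in self_assessment_report(toy).items():
    print(f"level {level}: agreement {agree:.0%}, overestimation {over:.0%}")
```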
Summary & outlook
Learning analytics in online & Web-based settings
o Detection of learning & learning-related notions in the absence of assessment/performance indicators?
o Analysis of a range of data, including behavioral traces, activity streams, self-assessment, etc.
o Actual big data
Positive results from initial models and classifiers
Application of the developed models and classifiers in online (learning) environments (e.g., AFEL project)
o GNOSS/Didactalia (200,000 users)
o LearnWeb
o Deutsche Welle online
o …
Acknowledgements: Team
Pavlos Fafalios (L3S)
Besnik Fetahu (L3S)
Ujwal Gadiraju (L3S)
Eelco Herder (L3S)
Ivana Marenzi (L3S)
Ran Yu (L3S)
Pracheta Sahoo (L3S, IIT India)
Bernardo Pereira Nunes (L3S, PUC Rio de Janeiro)
Mathieu d’Aquin (The Open University, UK)
Davide Taibi (CNR, Italy)
...
http://stefandietze.net