Big Data in Learning Analytics - Analytics for Everyday Learning

Big Data in Learning Analytics –
Analytics for Everyday Learning
Stefan Dietze, L3S Research Center, Hannover
24.01.2017
LearnTec 2017, Karlsruhe
23/02/17 1Stefan Dietze

Research areas
• Web science, Information Retrieval, Semantic Web, Social Web Analytics, Knowledge Discovery, Human Computation
• Interdisciplinary application areas: digital humanities, TEL/education, Web archiving, mobility
Some projects
L3S Research Center
http://l3s.de/
http://stefandietze.net/

Big Data in Learning Analytics? A simplistic perspective
Technology-enhanced Learning / Web-based Learning
Learning Analytics & Educational Data Mining
• Application of data mining techniques to understand learning activities and performance
• Traditionally confined to dedicated learning environments and platforms (e.g., Moodle)
• Examples: JLA special issue on LA datasets, with data ranging from a few MB to at most 15 GB
• Near-complete research corpus: LAK Dataset (http://lak.linkededucation.org)

Learning Analytics & Knowledge Dataset
• Cooperation of
• Near-complete Linked Data corpus of Learning Analytics research publications (~800, since 2009)
Dietze, S., Taibi, D., D’Aquin, M., Facilitating Scientometrics in Learning Analytics and Educational Data Mining - the LAK Dataset, Semantic Web Journal, 2017.
http://lak.linkededucation.org/

Big Data in Learning Analytics? A simplistic perspective
Technology-enhanced Learning
Learning Analytics & Educational Data Mining
• Examples: JLA special issue on LA datasets, with data ranging from a few MB to at most 15 GB
• Near-complete research corpus: LAK Dataset
• Broader understanding: informal learning, micro-learning
• Research often focused on resources: sharing, reusing, recommendation
• Data examples:
o “LinkedUp Catalog”: > 50 M resources, 300 M statements
o “LRMI/schema.org”: approx. 44 M quads (Common Crawl 2015)
Big Data? (Volume?) Depends, but mostly not!

LinkedUp Catalog of learning resources
Dataset Catalog/Registry: http://data.linkededucation.org/linkedup/catalog/
• “LinkedUp” (FP7 project): L3S, OU, OKFN, Elsevier, Exact Learning Solutions
• Publishing and curation of educational/learning resources according to Linked Data principles
• Largest collection of Linked Data about learning resources (approx. 50 datasets, 50 M resources)

[Figure: number of entities and number of statements per PLD; x-axis: PLD (ranked), y-axis: count (log scale)]
Learning resource annotations on the Web?
• “Learning Resource Metadata Initiative (LRMI)”: schema.org vocabulary for the annotation of learning resources in Web documents (schema.org etc.)
• Approx. 5000 PLDs (pay-level domains) in “Common Crawl” (2 bn Web documents)
• LRMI adoption on the Web (WDC) [LILE16]:
o 2015: 44,108,511 quads, 6,243,721 resources
o 2014: 30,599,024 quads, 4,182,541 resources
o 2013: 10,636,873 quads, 1,461,093 resources
Power law distribution across providers
4805 Providers / PLDs
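A minimal sketch of the arithmetic behind the yearly LRMI figures listed on this slide: it computes year-over-year growth and the average number of quads per resource. The 2013 quad count is read as 10,636,873, on the assumption that the missing separator in the slide is a typo. The function and variable names are illustrative only.

```python
# Quad and resource counts per year, copied from the slide (Web Data Commons).
# Assumption: the 2013 quad count "10.636873" is read as 10,636,873.
LRMI_COUNTS = {
    2013: {"quads": 10_636_873, "resources": 1_461_093},
    2014: {"quads": 30_599_024, "resources": 4_182_541},
    2015: {"quads": 44_108_511, "resources": 6_243_721},
}


def relative_growth(previous: int, current: int) -> float:
    """Growth from one year to the next as a fraction (1.0 means +100%)."""
    return (current - previous) / previous


def summarize(counts: dict) -> list:
    """One row per year with quads per resource and growth vs. the prior year."""
    rows = []
    years = sorted(counts)
    for position, year in enumerate(years):
        row = {
            "year": year,
            "quads_per_resource": counts[year]["quads"] / counts[year]["resources"],
            "quad_growth": None,
            "resource_growth": None,
        }
        if position > 0:
            prior = counts[years[position - 1]]
            row["quad_growth"] = relative_growth(prior["quads"], counts[year]["quads"])
            row["resource_growth"] = relative_growth(
                prior["resources"], counts[year]["resources"]
            )
        rows.append(row)
    return rows


rows = summarize(LRMI_COUNTS)
for row in rows:
    print(row)
```

Under this reading, growth slows markedly from 2013/2014 to 2014/2015, while the number of quads per resource stays at roughly seven in all three years.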
Taibi, D., Dietze, S., Towards embedded markup of learning resources
on the Web: a quantitative Analysis of LRMI Terms Usage, in
Companion Publication of the IW3C2 WWW 2016 Conference, IW3C2
2016, Montreal, Canada, April 11, 2016
Stefan Dietze, Besnik Fetahu, Ujwal Gadiraju
http://lrmi.itd.cnr.it/

Big Data in Learning Analytics? A simplistic perspective
Technology-enhanced Learning
Big Data? (Volume?)
Big Data? (Velocity?)

(Informal) Learning on the Web?
• Anything can be a learning resource
• The activity makes the difference (not the resource), i.e. how a resource is being used
• Learning Analytics in online/non-learning environments?
o Activity streams
o Social graphs (and their evolution)
o Behavioural traces (mouse movements, keystrokes)
o ...
• Research challenges:
o How to detect “learning”?
o How to detect learning-specific notions such as “competences”, “learning performance” etc.?

“AFEL – Analytics for Everyday (Online) Learning”
• H2020 project (since 12/2015) aimed at understanding/supporting learning in social Web environments
Examples of AFEL data sources:
• Activity streams and behavioral traces
• L3S Twitter Crawl: 6 bn tweets
• Common Crawl (2015): 2 bn documents
• Web Data Commons (2015): 1 TB = 24 bn quads
• “German Academic Web”: 6 TB Web crawl (recrawled quarterly)
• Wikipedia edit history: 3 M edits/month (English)
• ...

Big Data Challenges/Tasks in AFEL & beyond: some examples
I. Efficient data capture
• Crawling & extracting activity data
• Crawling, extracting and indexing learning resources (e.g. Common Crawl)
II. Efficient data analysis
• Understanding learning resources: entity extraction & clustering on large Web crawls of resources
• “Search as learning”: detecting learning in heterogeneous search query logs & click streams
• Detecting learning activities: detection of learning patterns (e.g. competent behavior) in the absence of learning objectives & assessments (!)
o Obtaining performance indicators from behavioral traces?
o Quasi-experiments on crowdsourcing platforms to obtain training data
Gadiraju, U., Demartini, G., Kawase, R., Dietze, S., Human beyond the Machine: Challenges and Opportunities of Microtask Crowdsourcing. IEEE Intelligent Systems, Volume 30, Issue 4, Jul/Aug 2015.
Gadiraju, U., Kawase, R., Dietze, S., Demartini, G., Understanding Malicious Behavior in Crowdsourcing Platforms: The Case of Online Surveys. ACM CHI Conference on Human Factors in Computing Systems (CHI2015), April 18-23, Seoul, Korea.

Detecting competence in online users?
Capturing assessment data: microtasks in Crowdflower
• “Content Creation (CC)”: transcription of captchas
• “Information Finding (IF)”: finding the middle name of famous persons
• 1800 assessments: 2 tasks * 3 durations * 3 difficulty levels * 100 users (per assessment)
Information Finding prompt, “Find the middle name of:”, by difficulty level
o Level 1: “Daniel Craig”
o Level 2: “George Lucas” (profession: Archbishop)
o Level 3: “Brian Smith” (profession: Ice Hockey, born: 1972)
Behavioral traces: keystrokes and mouse movements
• timeBeforeInput, timeBeforeClick
• tabSwitchFreq
• windowToggleFreq
• openNewTabFreq
• WindowFocusFrequency
• totalMouseMovements
• scrollUpFreq, scrollDownFreq
• ...
• Total number of events: 893,285 (CC tasks), 736,664 (IF tasks)
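A minimal sketch of how trace features like those named on this slide could be derived from a raw event log. The event schema (`Event`, the `kind` strings) and the exact feature definitions are assumptions made for illustration; the slides do not specify how the features were computed.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Event(NamedTuple):
    """One logged browser event (hypothetical schema, not from the slides)."""
    t_ms: int   # milliseconds since the task page was loaded
    kind: str   # assumed names: keydown, click, mousemove, scroll_up, scroll_down, tab_switch


def first_time(events: list, kind: str) -> Optional[int]:
    """Timestamp of the first event of the given kind, or None if it never occurs."""
    return next((e.t_ms for e in events if e.kind == kind), None)


def extract_features(events: list) -> dict:
    """Aggregate one task's event log into a flat feature dictionary."""
    ordered = sorted(events, key=lambda e: e.t_ms)
    counts = Counter(e.kind for e in ordered)
    return {
        # Assumed definition: delay until the first keystroke / first click.
        "timeBeforeInput": first_time(ordered, "keydown"),
        "timeBeforeClick": first_time(ordered, "click"),
        # Assumed definition: plain event counts per task.
        "tabSwitchFreq": counts["tab_switch"],
        "totalMouseMovements": counts["mousemove"],
        "scrollUpFreq": counts["scroll_up"],
        "scrollDownFreq": counts["scroll_down"],
    }


# Toy trace, invented purely to exercise the function.
trace = [
    Event(120, "mousemove"),
    Event(450, "mousemove"),
    Event(900, "click"),
    Event(1500, "keydown"),
    Event(1600, "keydown"),
    Event(4000, "tab_switch"),
    Event(5200, "scroll_down"),
]
features = extract_features(trace)
print(features)
```

Each assessment then yields one feature vector, which is the input format a per-task classifier would need.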

Predicting competence from behavioural traces?
Training data
• Manual annotation of 1800 assessments
• Performance types [CHI15]:
o “Competent Worker”
o “Diligent Worker”
o “Fast Deceiver”
o “Incompetent Worker”
o “Rule Breaker”
o “Smart Deceiver”
o “Sloppy Worker”
• Prediction of performance types from behavioral traces?
Predicting learner types from behavioral traces
• “Random Forest Classifier” (per task)
• 10-fold cross-validation
• Prediction performance: Accuracy, F-Measure
Results
• Longer assessments → more signals
• Simpler assessments → more conclusive signals
• “Competent Workers” (CW, DW): accuracy of 91% and 87%, respectively
• Most significant features: “TotalTime”, “TippingPoint”, “MouseMovementFrequency”, “WindowFocusFrequency”
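A minimal, standard-library sketch of the evaluation protocol named on this slide: 10-fold cross-validation reporting accuracy and F-measure. To keep it dependency-free, a simple nearest-centroid classifier stands in for the Random Forest used in the study, so this shows the evaluation loop only, not the authors' model. The two feature columns and all data values are synthetic and invented for illustration.

```python
import random
from collections import defaultdict


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in classifier (NOT a random forest): mean feature vector per class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        previous = sums.get(label, [0.0] * len(row))
        sums[label] = [s + v for s, v in zip(previous, row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, row):
    """Return the class whose centroid has the smallest squared distance to row."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], row)),
    )


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validate(X, y, k=10, positive="CW", seed=0):
    """Train on k-1 folds, test on the held-out fold, and average both metrics."""
    accuracies, f_scores = [], []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_true = [y[i] for i in test_idx]
        y_pred = [predict(model, X[i]) for i in test_idx]
        accuracies.append(accuracy(y_true, y_pred))
        f_scores.append(f_measure(y_true, y_pred, positive))
    return sum(accuracies) / k, sum(f_scores) / k


# Synthetic two-feature data (think: total time, mouse movement frequency).
rng = random.Random(42)
X = [[rng.gauss(60, 5), rng.gauss(200, 20)] for _ in range(50)]
X += [[rng.gauss(20, 5), rng.gauss(50, 20)] for _ in range(50)]
y = ["CW"] * 50 + ["other"] * 50
mean_accuracy, mean_f = cross_validate(X, y)
print(mean_accuracy, mean_f)
```

With a real feature table, the stand-in `fit_centroids`/`predict` pair would be replaced by the classifier of choice; the fold handling and the metric averaging stay the same.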

Other features to predict competence in learning/assessments?
“Dunning-Kruger Effect”
• Incompetence in a task/domain reduces the capacity to recognise/assess one's own incompetence
Research question
• Self-assessment as an indicator of competence?
Results
• Self-assessment is a reliable indicator of competence (94% accuracy), superior to mere performance measurement
• The tendency to overestimate one's own competence increases with the difficulty level
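A minimal sketch of how the overestimation trend in the last result could be quantified: the mean gap between self-assessed and actual score, grouped by difficulty level. The record layout and all numbers are invented for illustration and are not data from the study.

```python
from collections import defaultdict
from statistics import mean


def overestimation_by_level(records):
    """Mean (self-assessed - actual) score per difficulty level.

    records: iterable of (level, self_assessed, actual), scores in [0, 1].
    A positive value means users rated themselves higher than they performed.
    """
    gaps = defaultdict(list)
    for level, self_assessed, actual in records:
        gaps[level].append(self_assessed - actual)
    return {level: mean(values) for level, values in sorted(gaps.items())}


# Toy records, made up purely to exercise the function.
toy_records = [
    (1, 0.9, 0.9), (1, 0.8, 0.7),
    (2, 0.8, 0.6), (2, 0.7, 0.5),
    (3, 0.8, 0.3), (3, 0.6, 0.2),
]
gap_per_level = overestimation_by_level(toy_records)
print(gap_per_level)
```

A gap that grows from level to level would correspond to the tendency described in the results.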
David Dunning. 2011. The Dunning-Kruger Effect: On Being Ignorant of One’s Own Ignorance. Advances in Experimental Social Psychology 44 (2011), 247.
[Figure: performance (“accuracy”) of users classified as “competent”]

Summary & outlook
• Learning analytics in online & Web-based settings
o Detection of learning & learning-related notions in the absence of assessment/performance indicators?
o Analysis of a range of data, including behavioral traces, activity streams, self-assessment etc.
o Actual big data
• Positive results from initial models and classifiers
• Application of the developed models and classifiers in online (learning) environments (e.g. the AFEL project)
o GNOSS/Didactalia (200,000 users)
o LearnWeb
o Deutsche Welle online
o ...

Acknowledgements: Team
• Pavlos Fafalios (L3S)
• Besnik Fetahu (L3S)
• Ujwal Gadiraju (L3S)
• Eelco Herder (L3S)
• Ivana Marenzi (L3S)
• Ran Yu (L3S)
• Pracheta Sahoo (L3S, IIT India)
• Bernardo Pereira Nunes (L3S, PUC Rio de Janeiro)
• Mathieu d’Aquin (The Open University, UK)
• Davide Taibi (CNR, Italy)
• ...

http://stefandietze.net
