Edm2014 investigating automated student modeling in a java mooc

Investigating Automated Student Modeling in a Java MOOC
Michael Yudelson (1), Roya Hosseini (2), Arto Vihavainen (3), & Peter Brusilovsky (2)
(1) Carnegie Learning, (2) University of Pittsburgh, (3) University of Helsinki

Everybody’s Coding
• Programming is no longer the trade of the few
– Wide penetration of computer science
– Challenge for educators
• Talent pool is different
• Abundance of learning materials doesn’t help
– Even if digital, there’s no persistent student model
– New languages appear and need to be taught
(e.g., R, Swift)
Michael V. Yudelson (C) 2014

Problem
• Programming course (MOOC or otherwise) at
University of Helsinki
– 100 closed-form/open-ended assignments over 6 weeks
(10^1 to 10^3 lines of code each)
– NetBeans plugin for testing/submitting/feedback
– Code snapshots are meticulously archived
– No provisions to account for student learning
(no student model)
• On top of black-box-style pass/fail code grading
– Build longitudinal student model automatically
– Non-trivial programming assignments

Data
• Every snapshot was compiled and run against tests
• JavaParser* extracted concepts/skills (programming
constructs)
• Incremental snapshots that did not change the set of
concepts were removed
Course                            Students      Age                   Code snapshots
                                  All (Male)    Min / Median / Max    All / Median
Intro to Programming, Fall 2012   185 (121)     18 / 22 / 65          204460 / 1131
Intro to Programming, Fall 2013   207 (147)     18 / 22 / 57          263574 / 1126
Programming MOOC, Spring 2013     683 (492)     13 / 23 / 75          842356 / 876
* Hosseini, R., & Brusilovsky, P. (2013). JavaParser: A Fine-Grain Concept Indexing Tool for Java Problems. In The First Workshop
on AI-supported Education for Computer Science (AIEDCS 2013) (pp. 60-63).
Code for assignment: automatically saved, run against tests, submitted
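
The snapshot filtering from the Data bullets can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each snapshot has already been reduced to a set of concept names (the names below are hypothetical, not actual JavaParser output), and the function name `drop_unchanged` is made up for this example.

```python
from typing import FrozenSet, Iterable, List


def drop_unchanged(snapshots: Iterable[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Keep a snapshot only if its concept set differs from the previous one.

    Assumption: a snapshot is represented by the *set* of concepts found in
    the code (presence only, no counts).
    """
    kept: List[FrozenSet[str]] = []
    for concepts in snapshots:
        # A removed snapshot equals the last kept one, so comparing against
        # kept[-1] is the same as comparing against the previous snapshot.
        if not kept or concepts != kept[-1]:
            kept.append(concepts)
    return kept


# Hypothetical sequence of four snapshots for one student and one assignment.
history = [
    frozenset({"ForStatement"}),
    frozenset({"ForStatement"}),                 # no concept change: dropped
    frozenset({"ForStatement", "IfStatement"}),
    frozenset({"ForStatement"}),
]
filtered = drop_unchanged(history)
print(len(filtered))  # → 3
```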

Questions
• Given the approach is fully automated
– Can we build accurate models of learning?
– Can we do that while using a fraction of the data?
• Only a fraction of the concepts are relevant in each
successive code snapshot
– Can the models be used beyond detecting student
progress?
• E.g. for building an intelligent [fully automated] hinting
component for struggling students

Methodology (1)
• Modeling student learning
– Additive Factors Model
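• $\mathrm{response}_{ilj} = \mathrm{student}_i + \mathrm{problem}_j + \sum_k \left(\mathrm{skill}_k + \mathrm{skill\_slope}_k \cdot \mathrm{attempts}_{ik}\right)$
• $\mathrm{response}_{ilj}$: student $i$'s code passing test $l$ for problem $j$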
• Selecting concepts (AFM A, AFM B, AFM C)
– A. all concepts available
– B. changes from the previous snapshot
– C. changes, distinguishing addition/deletion
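
A rough sketch of how the three concept-selection variants and the AFM formula above could fit together. Assumptions, none of them from the slides: snapshots are sets of concept names, variant C marks additions with a "+" prefix and deletions with a "-" prefix, and the linear predictor is passed through a logistic link (the usual AFM formulation; the slide shows only the linear part). All function and parameter names are hypothetical.

```python
import math
from typing import Dict, Set, Tuple


def select_concepts(prev: Set[str], curr: Set[str], variant: str) -> Set[str]:
    """Pick the concepts that enter the model for one snapshot."""
    if variant == "A":                       # all concepts available
        return set(curr)
    added, removed = curr - prev, prev - curr
    if variant == "B":                       # changes from the previous snapshot
        return added | removed
    if variant == "C":                       # changes, additions vs. deletions
        return {f"+{c}" for c in added} | {f"-{c}" for c in removed}
    raise ValueError(f"unknown variant: {variant}")


def afm_probability(
    student: float,
    problem: float,
    skills: Dict[str, Tuple[float, float]],  # concept -> (skill, skill_slope)
    attempts: Dict[str, float],              # concept -> attempt count
    concepts: Set[str],
) -> float:
    """Probability that the snapshot passes the test, under a logistic link."""
    logit = student + problem + sum(
        skills[k][0] + skills[k][1] * attempts.get(k, 0.0) for k in concepts
    )
    return 1.0 / (1.0 + math.exp(-logit))


changes = select_concepts({"a", "b"}, {"b", "c"}, "C")   # {"+c", "-a"}
p = afm_probability(
    student=0.5,
    problem=-0.5,
    skills={"+c": (0.0, 0.1), "-a": (0.0, 0.0)},
    attempts={"+c": 10},
    concepts=changes,
)
print(round(p, 3))  # → 0.731
```

In a real fit, the student, problem, skill, and slope parameters would be estimated from the snapshot data; here they are hand-picked to show the arithmetic.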

Methodology (2)
• Selecting relevant concepts (+PC)
– PC – conditional independence search algorithm
from Tetrad tool*
– What concepts are associated with [not] passing
the test
– PC data-mining task was set up for each problem
• Different snapshot submission speeds (+Ln)
– Smoothing attempt counts by taking a logarithm
* Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, Prediction, and Search (2nd ed.). MIT Press,
Cambridge, MA.
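
The +Ln smoothing can be illustrated in a couple of lines. The slides say only that a logarithm of the attempt counts is taken; the exact form used here, ln(1 + n), is an assumption chosen so that zero attempts map to zero, and the function name is hypothetical.

```python
import math


def smoothed_attempts(count: int) -> float:
    """Log-smoothed attempt count; ln(1 + n) is an assumed form."""
    return math.log(1 + count)


# A student who saves ten times as often ends up with roughly double,
# not ten times, the smoothed attempt count.
slow, fast = smoothed_attempts(10), smoothed_attempts(100)
print(fast / slow < 2)  # → True
```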

Methodology (3)
• Validating models
– Consecutive code snapshots and changes in
passing/failing the tests (YY, YN, NY, NN)
– Model support scores for adding, deleting
concepts: positive, negative, neutral (P,N,0)
• Support – sum of slopes for the concept changes
– NYP0 – from fail to pass, positive support for
addition, neutral for deletions
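
One way to compute codes such as NYP0 is sketched below. It assumes the fitted slopes are available as a dictionary keyed by concept change (with "+"/"-" prefixes as in AFM C), that a change absent from the dictionary contributes zero, and that a small tolerance decides what counts as neutral. These details, and all names, are illustrative rather than taken from the paper.

```python
from typing import Dict, Iterable


def support(slopes: Dict[str, float], changes: Iterable[str]) -> float:
    """Support = sum of slopes for the given concept changes."""
    return sum(slopes.get(c, 0.0) for c in changes)


def sign_label(value: float, eps: float = 1e-9) -> str:
    """Map a support score to P (positive), N (negative), or 0 (neutral)."""
    if value > eps:
        return "P"
    if value < -eps:
        return "N"
    return "0"


def validation_code(
    passed_before: bool,
    passed_after: bool,
    added: Iterable[str],
    deleted: Iterable[str],
    slopes: Dict[str, float],
) -> str:
    """Four-character code: test transition, addition support, deletion support."""
    transition = ("Y" if passed_before else "N") + ("Y" if passed_after else "N")
    return (
        transition
        + sign_label(support(slopes, added))
        + sign_label(support(slopes, deleted))
    )


# Fail-to-pass step that added one concept with a positive slope and deleted nothing.
code = validation_code(False, True, {"+ForStatement"}, set(), {"+ForStatement": 0.2})
print(code)  # → NYP0
```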

Methodology (4)
• Conditional probabilities – relative frequencies of
– A: pass-to-pass – no negative support for any changes
– B: pass-to-fail – negative support for any change
– C: fail-to-fail – no positive support for changes
– D: fail-to-pass – positive support for changes
• Grouped conditional probabilities
– Average of all A, B, C, D
– Average of B and D (arguably of primary interest)
• Last but not least – size of the data required to fit
models
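
The conditional probabilities A to D and their grouped averages could be computed from four-character validation codes (transition plus P/N/0 support for additions and deletions) roughly as below. The reading of "any change" as "additions or deletions" is an interpretation of the slide wording, and the function names and sample codes are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# metric -> (test transition, condition on the two support labels)
CHECKS: Dict[str, Tuple[str, Callable[[str], bool]]] = {
    "A": ("YY", lambda s: "N" not in s),  # pass-to-pass, no negative support
    "B": ("YN", lambda s: "N" in s),      # pass-to-fail, some negative support
    "C": ("NN", lambda s: "P" not in s),  # fail-to-fail, no positive support
    "D": ("NY", lambda s: "P" in s),      # fail-to-pass, some positive support
}


def conditional_probabilities(codes: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of the expected support pattern within each transition."""
    codes = list(codes)
    probs: Dict[str, float] = {}
    for name, (transition, matches) in CHECKS.items():
        group = [c[2:] for c in codes if c[:2] == transition]
        if group:  # skip transitions that never occur in the data
            probs[name] = sum(matches(s) for s in group) / len(group)
    return probs


def grouped(probs: Dict[str, float], names: List[str]) -> Optional[float]:
    """Average of the selected conditional probabilities."""
    values = [probs[n] for n in names if n in probs]
    return sum(values) / len(values) if values else None


sample = ["YY00", "YYN0", "YNN0", "YNP0", "NN00", "NYP0", "NY0P", "NY00"]
probs = conditional_probabilities(sample)
all_avg = grouped(probs, ["A", "B", "C", "D"])
bd_avg = grouped(probs, ["B", "D"])
print(probs["A"], probs["B"], probs["C"])  # → 0.5 0.5 1.0
```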

Results (1)
• Accuracy, Data size,
Validation values

Results (2)
Model         Acc.  Acc. rnk  File Sz rnk  Val. A-D rnk  Val. B,D rnk  Overall rnk
Rasch         .71   -         -            -             -             -
AFM A         .81
AFM B         .73
AFM C         .78
AFM A+PC      .84   1
AFM B+PC      .77
AFM C+PC      .83   2
AFM A+Ln*     .75                          2 (.62)       3 (.45)
AFM B+Ln      .71             1 (123Mb)    1 (.63)                     2 (4.75)
AFM C+Ln      .77             2 (139Mb)
AFM A+PC+Ln   .82   3         6 (284Mb)    8 (.59)       2 (.47)       3 (4.75)
AFM B+PC+Ln   .75             3 (141Mb)    3 (.62)       1 (.49)       1 (4.00)
AFM C+PC+Ln   .78
* Logarithm of opportunity counts slightly inflates log file size due to text format
See full table in the paper

Discussion
• It is possible to fully automate student
modeling (in programming domain) with a
fraction of rich data
• Models we built have potential to be used for
providing in-problem learning support
• The choice of best model has tradeoffs
– Accuracy vs. data requirement vs. validation*

Future Work
• Address concept counts in snapshots
• Make use of code structure (parse trees)
• Make use of student behaviors
– Builder, Massager, Reducer, Struggler
• Account for within-IDE actions (save, run, ask
for hint)
• Tie to student’s browsing of the support
material

Thank You!
