This document summarizes a study that investigated automatically modeling student learning in a MOOC for Java programming. The study analyzed code snapshots from programming assignments to build additive factors models of student skill development over time. The best models were able to accurately model skill development using only a fraction of the available data. These automated student models have the potential to be used to provide intelligent in-problem learning support for struggling students.
EDM 2014: Investigating Automated Student Modeling in a Java MOOC
1. Investigating Automated Student Modeling in a Java MOOC
Michael Yudelson¹, Roya Hosseini², Arto Vihavainen³, & Peter Brusilovsky²
¹Carnegie Learning, ²University of Pittsburgh, ³University of Helsinki
2. Everybody’s Coding
• Programming is no longer the trade of the few
– Wide penetration of computer science
– Challenge for educators
• Talent pool is different
• Abundance of learning materials doesn’t help
– Even if digital, there’s no persistent student model
– New languages appear and need to be taught (e.g., R, Swift)
Michael V. Yudelson (C) 2014 2
3. Problem
• Programming course (MOOC or otherwise) at the University of Helsinki
– 100 closed-form/open-ended assignments over 6 weeks (10¹ to 10³ lines of code each)
– NetBeans plugin for testing/submitting/feedback
– Code snapshots are meticulously archived
– No provisions to account for student learning (no student model)
• On top of black-box-style pass/fail code grading
– Build a longitudinal student model automatically
– Non-trivial programming assignments
4. Data
• Every snapshot compiled and run against tests
• JavaParser* extracted concepts/skills (programming constructs)
• Incremental snapshots that did not change the extracted concepts were removed
Course | Students: all (male) | Age: min / median / max | Code snapshots: all / median
Intro to Programming, Fall 2012 | 185 (121) | 18 / 22 / 65 | 204460 / 1131
Intro to Programming, Fall 2013 | 207 (147) | 18 / 22 / 57 | 263574 / 1126
Programming MOOC, Spring 2013 | 683 (492) | 13 / 23 / 75 | 842356 / 876
* Hosseini, R., & Brusilovsky, P. (2013). JavaParser: A Fine-Grain Concept Indexing Tool for Java Problems. In The First Workshop on AI-supported Education for Computer Science (AIEDCS 2013) (pp. 60-63).
Figure: Code for an assignment is automatically saved, run against tests, and submitted.
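The snapshot filtering described on this slide can be sketched in Python. This is a minimal sketch that assumes each snapshot has already been reduced to the set of concept names extracted from it (for example, by JavaParser). The function name `drop_unchanged` and the set-of-strings representation are hypothetical and are not the study's code.

```python
def drop_unchanged(snapshots):
    """Drop incremental snapshots whose concept set did not change.

    snapshots: ordered snapshots for one student on one assignment,
    each given as an iterable of concept names (hypothetical representation).
    """
    kept = []
    for concepts in snapshots:
        concepts = set(concepts)
        # The first snapshot is always kept; later ones only if concepts changed.
        if not kept or concepts != kept[-1]:
            kept.append(concepts)
    return kept


history = [{"for", "int"}, {"for", "int"}, {"for", "int", "if"}, {"for", "int", "if"}]
print(len(drop_unchanged(history)))  # 2
```

Comparing against the last kept snapshot is equivalent to comparing against the immediately preceding one, because any dropped snapshot has the same concepts as the last kept one.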
5. Questions
• Given that the approach is fully automated
– Can we build accurate models of learning?
– Can we do that while using only a fraction of the data?
• Only a fraction of the concepts is relevant in each successive code snapshot
– Can the models be used beyond detecting student progress?
• E.g., for building an intelligent [fully automated] hinting component for struggling students
6. Methodology (1)
• Modeling student learning
– Additive Factors Model
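• $\text{response}_{ilj} = \text{student}_i + \text{problem}_j + \sum_k \left(\text{skill}_k + \text{skill\_slope}_k \cdot \text{attempts}_{ik}\right)$
• $\text{response}_{ilj}$: whether the code of student $i$ passes test $l$ for problem $j$ (modeled on the logit scale)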
• Selecting concepts (AFM A, AFM B, AFM C)
– A. all concepts available
– B. changes from the previous snapshot
– C. changes, distinguishing addition/deletion
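The model and the three concept-selection schemes can be sketched as follows. This minimal sketch assumes that fitted parameters are already available as plain dictionaries. The function names (`afm_logit`, `p_pass`, `select_concepts`) and the `+`/`-` prefixes used to tag additions and deletions in scheme C are hypothetical. The logistic link is the standard one for AFM. How the study fit the parameters is not shown here.

```python
import math

def afm_logit(student, problem, skills, slopes, attempts, concepts):
    """Right-hand side of the AFM formula above (logit of passing).

    student, problem: intercepts for student i and problem j
    skills, slopes:   per-concept intercepts (skill_k) and slopes (skill_slope_k)
    attempts:         student i's prior attempt counts per concept (attempts_ik)
    concepts:         the concepts k that enter the sum for this snapshot
    """
    return student + problem + sum(
        skills[k] + slopes[k] * attempts.get(k, 0) for k in concepts
    )

def p_pass(logit):
    # Logistic link: convert the additive logit into a pass probability.
    return 1.0 / (1.0 + math.exp(-logit))

def select_concepts(prev, curr, variant):
    """Concepts entering the model under schemes A, B, C (hypothetical helper).

    A: all concepts in the current snapshot
    B: concepts that changed since the previous snapshot
    C: changed concepts, tagged as additions (+) or deletions (-)
    """
    if variant == "A":
        return set(curr)
    added, deleted = curr - prev, prev - curr
    if variant == "B":
        return added | deleted
    if variant == "C":
        return {f"+{k}" for k in added} | {f"-{k}" for k in deleted}
    raise ValueError(f"unknown variant: {variant}")


prev, curr = {"for", "int"}, {"for", "int", "if"}
print(select_concepts(prev, curr, "B"))  # {'if'}
print(select_concepts(prev, curr, "C"))  # {'+if'}
```

Under scheme C, an added and a deleted occurrence of the same construct become two separate skills, each with its own intercept and slope.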
7. Methodology (2)
• Selecting relevant concepts (+PC)
– PC: a conditional independence search algorithm from the Tetrad tool*
– What concepts are associated with [not] passing the test
– A PC data-mining task was set up for each problem
• Different snapshot submission speeds (+Ln)
– Smoothing attempt counts by taking a logarithm
* Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, Prediction, and Search (2nd ed.). MIT Press, Cambridge, MA.
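The +Ln smoothing can be illustrated as a transformation of attempt counts before they enter the model. In this minimal sketch, the slide only says that a logarithm is taken. Using log(1 + n), which keeps a zero count defined, is therefore an assumption, as is the name `ln_attempts`.

```python
import math

def ln_attempts(attempts):
    """Log-smooth per-concept attempt counts for the +Ln model variants.

    Assumption: log(1 + n) is used so that n = 0 (no prior attempts) maps to 0.
    """
    return {concept: math.log1p(n) for concept, n in attempts.items()}


fast = ln_attempts({"if": 99})  # a student who saves many rapid snapshots
slow = ln_attempts({"if": 9})   # a student who saves fewer snapshots
# A tenfold difference in raw counts becomes a twofold difference after smoothing.
print(round(fast["if"] / slow["if"], 6))  # 2.0
```

The compression reduces the influence of submission speed: a student who saves very often no longer accumulates "practice" ten times faster than a slower one.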
8. Methodology (3)
• Validating models
– Consecutive code snapshots and changes in passing/failing the tests (YY, YN, NY, NN)
– Model support scores for adding and deleting concepts: positive, negative, or neutral (P, N, 0)
• Support – sum of slopes for the concept changes
– NYP0 – from fail to pass, positive support for additions, neutral support for deletions
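The labeling scheme can be sketched as a function of two consecutive snapshots. This minimal sketch assumes that the fitted slopes are stored in a dictionary keyed by concept and that the support sign is taken against zero, with an optional tolerance. The names `sign_label` and `transition_label` are hypothetical.

```python
def sign_label(support, eps=0.0):
    # P = positive, N = negative, 0 = neutral (no changes, or slopes cancel out)
    if support > eps:
        return "P"
    if support < -eps:
        return "N"
    return "0"

def transition_label(prev_pass, curr_pass, added, deleted, slopes):
    """Label a pair of consecutive snapshots, e.g. 'NYP0'.

    Letters 1-2: previous and current test outcome (Y = pass, N = fail).
    Letters 3-4: support for additions and for deletions, each computed
    as the sum of the fitted slopes of the added (or deleted) concepts.
    """
    outcome = ("Y" if prev_pass else "N") + ("Y" if curr_pass else "N")
    add_support = sum(slopes[k] for k in added)
    del_support = sum(slopes[k] for k in deleted)
    return outcome + sign_label(add_support) + sign_label(del_support)


slopes = {"if": 0.2, "for": -0.1, "int": 0.05}
print(transition_label(False, True, {"if"}, set(), slopes))  # NYP0
```

An empty set of deletions sums to zero, which is why the example above is neutral (0) for deletions.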
9. Methodology (4)
• Conditional probabilities – relative frequencies of
– A: pass-to-pass – no negative support for any change
– B: pass-to-fail – negative support for any change
– C: fail-to-fail – no positive support for changes
– D: fail-to-pass – positive support for changes
• Grouped conditional probabilities
– Average of all A, B, C, D
– Average of B and D (arguably of primary interest)
• Last but not least – size of the data required to fit the models
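The four conditional probabilities and their grouped averages can be computed from transition labels of the form used on the previous slide (e.g., "NYP0"). This minimal sketch rests on an assumed reading of the slide: "any change" means at least one of the two supports (additions or deletions), and neutral support counts as neither positive nor negative. The name `validation_scores` is hypothetical.

```python
from statistics import mean

def validation_scores(labels):
    """Relative frequencies A-D and their grouped averages.

    labels: transition labels such as 'NYP0' (outcome pair + two support signs).
    """
    def freq(outcome, ok):
        group = [lab[2:] for lab in labels if lab[:2] == outcome]
        return sum(ok(s) for s in group) / len(group) if group else float("nan")

    a = freq("YY", lambda s: "N" not in s)  # pass-to-pass: no negative support
    b = freq("YN", lambda s: "N" in s)      # pass-to-fail: some negative support
    c = freq("NN", lambda s: "P" not in s)  # fail-to-fail: no positive support
    d = freq("NY", lambda s: "P" in s)      # fail-to-pass: some positive support
    return {"A": a, "B": b, "C": c, "D": d,
            "A-D": mean([a, b, c, d]), "B,D": mean([b, d])}


labels = ["YY00", "YYN0", "YNN0", "NNP0", "NN0N", "NYP0", "NY00"]
print(validation_scores(labels)["B,D"])  # 0.75
```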
11. Results (2)
Model | Acc. | Acc. rank | File size rank | Val. A-D rank | Val. B,D rank | Overall rank
Rasch | .71 | - | - | - | - | -
AFM A | .81 | | | | |
AFM B | .73 | | | | |
AFM C | .78 | | | | |
AFM A+PC | .84 | 1 | | | |
AFM B+PC | .77 | | | | |
AFM C+PC | .83 | 2 | | | |
AFM A+Ln* | .75 | | | 2 (.62) | 3 (.45) |
AFM B+Ln | .71 | | 1 (123Mb) | 1 (.63) | | 2 (4.75)
AFM C+Ln | .77 | | 2 (139Mb) | | |
AFM A+PC+Ln | .82 | 3 | 6 (284Mb) | 8 (.59) | 2 (.47) | 3 (4.75)
AFM B+PC+Ln | .75 | | 3 (141Mb) | 3 (.62) | 1 (.49) | 1 (4.00)
AFM C+PC+Ln | .78 | | | | |
* The logarithm of opportunity counts slightly inflates the log file size because of the text format
See the full table in the paper.
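The Overall column appears consistent with averaging the four per-criterion ranks. For AFM A+PC+Ln, for example, (3 + 6 + 8 + 2) / 4 = 4.75. The slide does not state this rule, so the sketch below treats it as an assumption. `competition_rank` and `overall_score` are hypothetical helpers. In the sketch, ties share the best rank.

```python
from statistics import mean

def competition_rank(values, higher_is_better=True):
    """Rank models on one criterion: 1 + number of strictly better values.

    Ties share the same (best) rank, as in standard competition ranking.
    """
    sign = 1 if higher_is_better else -1
    return [1 + sum(sign * w > sign * v for w in values) for v in values]

def overall_score(ranks):
    # Assumed aggregation: mean of the accuracy, file-size, A-D, and B,D ranks.
    return mean(ranks)


# Accuracy is ranked high-to-low; file size is ranked low-to-high.
print(competition_rank([0.84, 0.83, 0.82]))                       # [1, 2, 3]
print(competition_rank([284, 123, 141], higher_is_better=False))  # [3, 1, 2]
print(overall_score([3, 6, 8, 2]))                                # 4.75
```

A lower overall score is better under this reading, which matches AFM B+PC+Ln (4.00) being ranked first.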
12. Discussion
• It is possible to fully automate student modeling (in the programming domain) using a fraction of rich data
• The models we built have the potential to be used for providing in-problem learning support
• The choice of the best model involves trade-offs
– Accuracy vs. data requirements vs. validation*
13. Future Work
• Address concept counts in snapshots
• Make use of code structure (parse trees)
• Make use of student behaviors
– Builder, Massager, Reducer, Struggler
• Account for within-IDE actions (save, run, ask for a hint)
• Tie to students' browsing of the support material
We do not need to know whether the student is right or wrong: the programming language by definition takes care of that.
We knowingly violated the i.i.d. assumptions of the algorithm, but we are not drawing causal conclusions; we are only filtering concepts.
Logging would not solve the problem; otherwise, we would have had to account for submission speeds, but we are after streamlining and speed.