The development of Massive Open Online Courses (MOOCs) has stimulated educational research and challenged the existing models of distance and online learning. New questions have emerged that can be posed and answered across thousands of students. Although data on the engagement of these large numbers of online learners are readily accessible to researchers, each researcher is usually limited to a few sources. Very few researchers have data from different courses, institutions and/or platforms. Student privacy and data protection regulations tend to limit data sharing and suppress the collaborative research that is necessary for addressing some of the main challenges MOOC research currently faces. The MOOC Data Science Commons is a collaborative platform that aims to enable the next generation of MOOC research by allowing for generalized MOOC data organization and shared analytics. The aim of the project is to bring together educational researchers, computer science researchers, machine learning researchers, technologists, and database and big data experts to advance MOOC data science.
3. Data sources
Weekly data packages
auth_user-{site}-analytics.sql
auth_userprofile-{site}-analytics.sql
certificates_generatedcertificate-{site}-analytics.sql
Daily data packages
course_structure-{site}-analytics.json
courseware_studentmodule-{site}-analytics.sql
email_opt_in-{site}-analytics.csv
student_courseenrollment-{site}-analytics.sql
user_api_usercoursetag-{site}-analytics.sql
user_id_map-{site}-analytics.sql
{org}-{course}-{date}-{site}.mongo
wiki_article-{site}-analytics.sql
wiki_articlerevision-{site}-analytics.sql
{org}-{site}-events-{date}.log.gz.gpg
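The package names above follow a few templates. As a minimal sketch, assuming the SQL, JSON and CSV packages all follow the form `<table>-<site>-analytics.<ext>` (the function name and the site values used with it are hypothetical, chosen only for illustration), a file name could be split into its parts like this:

```python
import re

# Assumed template for the SQL/JSON/CSV packages: "<table>-<site>-analytics.<ext>".
# The .mongo and .log.gz.gpg packages use other templates and are not matched here.
PACKAGE_RE = re.compile(
    r"^(?P<table>[a-z_]+)-(?P<site>.+)-analytics\.(?P<ext>sql|json|csv)$"
)


def parse_package_name(filename):
    """Return (table, site, ext) for a package file name, or None if it does not match."""
    match = PACKAGE_RE.match(filename)
    if match is None:
        return None
    return match.group("table"), match.group("site"), match.group("ext")
```

Files that do not fit the assumed template, such as the event logs, return `None` so that a caller can route them to a separate handler.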
4. Challenges
• Analytics across several courses
• Analytics across different platforms
• Analytics across different institutions
• Sharing data
5. Solution?
• Collaborative data science platform
– Standardize data storage
– Generalizable across courses and data providers (currently OpenEdX, edX and Coursera)
– “Data being shared without data being exchanged”
– Sharing and reproducing the results
8. Collaborative platform and applications
[Diagram labels: edX, Coursera, MOOCdb doc, Github repo, Feature factory, LabelMe, Digital learner quantified, Problem analytics, My MOOCViz, Social network analysis, Forum analysis, Dropout prediction]
9. Current state
• Established network of institutions
– MIT, Stanford, University of Michigan, University of Edinburgh, University of Queensland, University of Texas (Austin)
• Release of open source software
• Development and release of the first data analytics framework
10. Next steps
Digital Learner Quantified
Discussion forum analysis
LabelMe
Problem analytics
Dropout prediction
Social network analysis
11. Collaboration
• If you are interested in…
– Development
– Feature modeling
– Translating your data
– Testing
kalyan@csail.mit.edu
s.joksimovic@ed.ac.uk
12. Q&A
MOOCdb:
Developing Data Standards for MOOCs
Srećko Joksimović
s.joksimovic@ed.ac.uk
@s_joksimovic
Kalyan Veeramachaneni
kalyan@csail.mit.edu
Dragan Gašević
dragan.gasevic@ed.ac.uk
FutureLearn Academic Network Conference
15 June 2015
Editor's notes
MOOCdb is our solution for centralizing and generalizing MOOC data organization and providing general-purpose analytics for MOOC education research.
“How does the amount of time spent on the videos during a certain week correlate to performance on the homework?”
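A question of this kind reduces to a correlation between two per-student measurements. The following is a minimal sketch, assuming that weekly video minutes and homework scores have already been aggregated per student. The function names and input shapes are hypothetical, and Pearson correlation is only one possible choice of measure:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def video_time_vs_homework(video_minutes, homework_scores):
    """Correlate two dicts keyed by student id, using only students present in both."""
    students = sorted(set(video_minutes) & set(homework_scores))
    return pearson(
        [video_minutes[s] for s in students],
        [homework_scores[s] for s in students],
    )
```

Restricting the calculation to students who appear in both inputs avoids silently treating missing homework as a score of zero. Whether that is the right treatment depends on the research question.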
CAN WE HAVE STANDARDIZED DATA STORAGE?
Sharing and reproducing the results: When they publish research, analysts can share the scripts by
depositing them into a public archive where they are retrievable and cross-referenced to their donor
and publication.
The MOOCdb project aims to bring together educational researchers, computer science researchers, machine learning researchers, technologists, and database and big data experts to advance MOOC data science. The project, founded at MIT, includes a platform-agnostic functional data model for data exhaust from MOOCs, a collaborative, open-source, open-access data visualization framework, a crowdsourced knowledge discovery framework and a privacy-preserving software framework. The team is currently working to release a number of these tools and frameworks as open source.
WHAT DOES MOOCdb PROVIDE?
Concise data storage: MOOCdb's proposed schema is “loss-less” with respect to research-relevant information, i.e. no information is lost in translating raw data to it.
Access Control Levels for Anonymized Data: The data schema offers an organized means of structuring anonymized user identities to safeguard them further.
Sharing of data extraction scripts: Scripts for data extraction and descriptive statistics extraction can
be open source and shared by everyone because they reference data organized according to the schema.
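To illustrate why a shared schema makes scripts portable, here is a minimal sketch of an extraction script that references only schema names. The table and column names are a hypothetical, simplified stand-in and not the actual MOOCdb schema. An in-memory SQLite database with invented rows stands in for a real course database:

```python
import sqlite3

# Hypothetical, simplified table for illustration; not the actual MOOCdb schema.
SCHEMA = """
CREATE TABLE observed_events (
    user_id TEXT NOT NULL,
    resource_type TEXT NOT NULL,
    duration_seconds INTEGER NOT NULL
);
"""

# The query depends only on schema names, so it can be shared unchanged.
QUERY = """
SELECT user_id, SUM(duration_seconds) AS total_seconds
FROM observed_events
WHERE resource_type = 'video'
GROUP BY user_id
ORDER BY user_id;
"""


def total_video_seconds(conn):
    """Return a list of (user_id, total video seconds) tuples."""
    return conn.execute(QUERY).fetchall()


def demo_connection():
    """Build an in-memory database with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO observed_events VALUES (?, ?, ?)",
        [
            ("u1", "video", 120),
            ("u1", "video", 180),
            ("u1", "problem", 40),
            ("u2", "video", 60),
        ],
    )
    return conn
```

Because `total_video_seconds` takes a connection rather than a file path, the same script could in principle be run by each data holder against a local database, returning only aggregate results.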
Crowd source potential: Machine learning frequently involves humans identifying explanatory variables
that could drive a response. Enabling the crowd to help propose variables could greatly scale the
community's progress in mining MOOC data.