The document discusses open data science research topics presented at a conference, including opportunities and challenges with learning analytics and adaptive learning using open data. It describes how learning analytics can help achieve large improvements in student outcomes through targeted feedback and personalized learning paths. An open analytics architecture is proposed to integrate different data sources and applications using common data standards.
Frontiers of Open Data Science Research
1. FRONTIERS OF OPEN DATA SCIENCE RESEARCH
Ani Aghababyan
OPEN DATA SCIENCE CONFERENCE
BOSTON 2015
@opendatasci
2. Ani Aghababyan, Ph.D.
Data Scientist
McGraw-Hill Education
Analytics
Frontiers of Open Data Science
Research
Data and Analytics
Saturday, May 30, 2015
8. EXCITING POSSIBILITIES
What if my Fitbit could tell me:
If I will fail my test?
Whether I am ready for the test?
Whether I truly have test anxiety?
Should I delay taking this take-home exam?
SOBERING QUESTIONS
Whose data is it?
Can I even access my data—all my data?
Who else can access my data?
Can the data be used against me?
Is the data even accurate?
How good is the science?
10. Research Studies
The 2-sigma problem
Group 2 – 1 sigma above Group 1
Group 3 – 2 sigmas above Group 1
The average tutored student outperformed 98% of traditional
students
BENJAMIN BLOOM
2𝞂
11. QUESTIONS + CONCLUSIONS
How do we achieve a 1- or 2-sigma improvement in outcomes?
How do we encourage self-regulation in the learner?
How do we provide targeted, real-time feedback (nudges)?
How do we create a personalized path for the learner?
HINT
Learning Analytics
Adaptive Learning
13. Stages of Analytics
Axes: Analytics Maturity, Competitive Advantage
Stages: Raw Data, Cleaned Data, Standard Reports, Ad hoc Reports & OLAP, Generic Predictive Analytics, Predictive Modeling
DESCRIPTION: What happened?
PREDICTION: What correlates to what happened? What might happen?
PRESCRIPTION: What is the best that could happen?
15. WHAT IS LEARNING ANALYTICS
The measurement, collection, analysis and reporting of data
about learners and their contexts, for purposes of
understanding and optimizing learning and the environments in
which it occurs.
How could we achieve that?
HINT
Open Architecture
17. Open Analytics Architecture
Data Source 1, Data Source 2, Data Source 3, Data Source 4
Input APIs
Caliper Data Capture Specification
Learning Analytics Store (Learning Events + Context)
Output API
Product 1, Product 2, Product 3
Frontiers of Open Data Science Research. Whenever I see a presentation title such as the one I am giving today, the words that come to my mind are something like this:
Big Data, Data Analytics, Data Science, Learning Science, Visualization, Reporting, Hadoop, Elastic MapReduce, Spark, Scala, NoSQL, etc.
Everyone seems to explain big data or data science in different words. So my goal for today is to bring clarity to these words in the context of education and learning. But first, why do we care? What is so important and noteworthy about data and data science anyway, and in particular as it applies to learning and education? I ask because I represent a learning sciences company and I am a learning scientist myself.
Nowadays our lives seem to be filled with gadgets and tools that spit out data, and most of them do some pretty cool analytics and reporting for their users. Here are some examples of these everyday gadgets. Some seem trivial, but in reality the questions we could ask and answer with these data can be very sophisticated and fascinating: things that we couldn’t do easily before. An example would be this Fitbit.
Fitbit provides a phone app through which you can see charts and graphs of various information. It could be very simple, such as your steps for the day, the mileage you covered, the elevation information, etc.
Some models can even provide the user with their heart rate information.
What brings it into data analytics is that you can build analytics on top of these seemingly trivial data. For example, you can compare your heart rate under different circumstances and see whether a pattern emerges: compare your resting heart rate on days when you are battling a cold with days when you are healthy and strong, and see if there is a difference. If there is (which was the case in this situation), you can try to analyze whether Fitbit could have predicted your illness before the day when you were unable to leave your bed. This is a simple case, but there are many more we could apply in a learning context. For example, you could identify whether your academic performance differs based on your physical condition.
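As a rough illustration of the heart rate comparison just described, here is a minimal Python sketch. The readings are made-up illustrative values, not real Fitbit data, and the function names and the "two standard deviations above baseline" rule are assumptions chosen for the example, not anything Fitbit provides.

```python
from statistics import mean, stdev

# Hypothetical daily resting heart rates in beats per minute (illustrative only).
healthy_days = [58, 60, 59, 61, 57]
sick_days = [66, 68, 65, 70]


def resting_hr_gap(baseline, other):
    """Difference between the mean of `other` and the mean of `baseline`."""
    return mean(other) - mean(baseline)


def is_elevated(baseline, reading, k=2.0):
    """Flag a reading more than k sample standard deviations above the baseline mean."""
    return reading > mean(baseline) + k * stdev(baseline)


gap = resting_hr_gap(healthy_days, sick_days)
print(f"Resting heart rate is {gap:.2f} bpm higher on sick days")
print("Would a reading of 66 bpm have been flagged?", is_elevated(healthy_days, 66))
```

A flag like this is only a nudge to look closer; whether it would really predict an illness depends on far more data than a handful of readings.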
So the exciting possibilities are that I could predict things like whether I am ready for my test or whether I have test anxiety. However, this excitement comes at the price of some sobering questions: Is my data safe? Who else can see my data? Will I be judged based on this data?
So let’s move closer to education. Let’s consider a research case that grounds my talk.
2-sigma Problem
Back in 1984, Benjamin Bloom looked at the performance of students learning in three different contexts. In the first group, students were taught in a traditional classroom setting. In the second group, students were taught using mastery-learning techniques and a formative feedback loop. In the third group, students were in one-on-one tutoring sessions.
Bloom discovered that the performance of students in the second group was 1 sigma (standard deviation) higher than that of students in the first group, and the performance of students in the third group was 2 sigmas higher than that of students in the first group. Put another way, the average tutored student outperformed 98% of the traditionally taught students!
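The step from "2 sigmas" to "98%" can be checked with a short calculation. The sketch below assumes scores in the traditional group are roughly normally distributed, which is an assumption made for illustration; the helper name `share_outperformed` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: the traditional group's scores, in sigma units.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_outperformed(sigmas):
    """Share of the traditional group scoring below a student `sigmas` above its mean."""
    return standard_normal.cdf(sigmas)


for group, shift in [("mastery learning", 1.0), ("one-on-one tutoring", 2.0)]:
    print(f"{group}: {share_outperformed(shift):.1%}")
# mastery learning: 84.1%
# one-on-one tutoring: 97.7%
```

Under that assumption a 2-sigma shift lands at about the 98th percentile, which matches the figure quoted above.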
This and other similar studies raise some very important questions for us:
How do we achieve a 1- or 2-sigma improvement in student outcomes?
How do we encourage self-regulation in the learner?
How do we provide targeted, real-time feedback (nudges)?
How do we create a personalized path for the learner?
The hint: learning analytics and adaptive learning.
So what is analytics? How does it differ from our reports? And how can we apply it to learning?
Data analytics and learning analytics, broadly put, are systems of analysis applied to data and to learning events. Yet a definition of that breadth is not eminently practical. So let’s look at the stages of analytics.
Descriptive. In this stage of analytics we are concerned with a presentation of the past. What has happened? What patterns of past behavior can be observed? This type of information, presented well, can be very powerful.
Predictive. In the predictive stage, we begin to shift our time horizon towards the future. What trends do we see? What events correlate with what happened? And even, what might happen? In this stage of analytics, we create predictive models, grounded in past data, of what might happen in the future.
Prescriptive. In the last stage of analytics, the holy grail of analytics, we move from predictions to prescriptions. Given where I am, and given where I want to go, what should I do? What is the optimal path for me to take? This is where adaptive learning comes in.
Analytics applied in a learning context allow us to make sure that assignments are aligned to curricula, and also allow students to follow their individual paths, avoiding disengagement or ceiling effects.
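To make the descriptive, predictive, and prescriptive stages concrete, here is a toy Python sketch for a single learner. Everything in it is an assumption for illustration: the quiz scores are invented, the prediction is a plain least-squares trend line, and the "assumed gain" attached to each activity is a hypothetical number rather than a measured effect. Real adaptive learning systems are far more involved.

```python
from statistics import mean

# Hypothetical quiz scores (percent) for one learner, oldest first.
past_scores = [62.0, 66.0, 71.0, 73.0]


def describe(scores):
    """Descriptive: summarize what already happened."""
    return {"mean": mean(scores), "latest": scores[-1]}


def predict_next(scores):
    """Predictive: extend a least-squares trend line one step into the future."""
    n = len(scores)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(scores)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept + slope * n


def prescribe(predicted, target, activities):
    """Prescriptive: if the prediction misses the target, pick the activity
    with the largest assumed gain; otherwise keep the current path."""
    if predicted >= target:
        return "stay on current path"
    return max(activities, key=activities.get)


# Hypothetical activities mapped to assumed score gains (percentage points).
activities = {"review flashcards": 2.0, "practice quiz": 4.0}

summary = describe(past_scores)
forecast = predict_next(past_scores)
advice = prescribe(forecast, target=80.0, activities=activities)
print(summary, round(forecast, 1), advice)
```

Each function answers one of the questions above: what happened, what might happen, and what should I do next.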
Finally, the last concept I will introduce is the open architecture.
Here at McGraw-Hill we have created an Open Analytics Architecture. What does this mean?
In an open system, data and learning events can be sourced from many data sources. Why is this important? Because I can guarantee you that no one system or product has a complete picture of a student’s learning. The content tools you use “know” about a certain set of learning interactions.
We follow a standard, maintained by a standards body, that sets requirements for how learning event data should be formatted and structured. This way different systems can reliably communicate with one another.
Here we transform and store the data collected from learning environments.
Finally, the last piece of our open architecture is the set of products and platforms that consume data from the analytics platform. These could be user-facing visualization products or any other system.
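The flow described above (many sources, one shared event format, a store, and an output API for products) can be sketched in a few lines of Python. This is not the actual McGraw-Hill implementation or the real Caliper schema: the `LearningEvent` fields, the `AnalyticsStore` class, and the source names are all hypothetical, chosen only to show the shape of the idea.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LearningEvent:
    """Hypothetical minimal event schema (not the actual Caliper format)."""
    source: str
    learner_id: str
    action: str
    context: dict = field(default_factory=dict)


REQUIRED_FIELDS = ("source", "learner_id", "action")


def to_event(raw):
    """Input side: reject records that do not follow the shared format."""
    missing = [name for name in REQUIRED_FIELDS if name not in raw]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return LearningEvent(raw["source"], raw["learner_id"], raw["action"],
                         dict(raw.get("context", {})))


class AnalyticsStore:
    """Stores events from any source and serves them to consuming products."""

    def __init__(self):
        self._by_learner = defaultdict(list)

    def ingest(self, raw):
        """Input API: validate, transform, and store one raw record."""
        event = to_event(raw)
        self._by_learner[event.learner_id].append(event)

    def events_for(self, learner_id):
        """Output API: all stored events for one learner, across sources."""
        return list(self._by_learner.get(learner_id, []))


store = AnalyticsStore()
store.ingest({"source": "ebook", "learner_id": "s-1", "action": "viewed",
              "context": {"chapter": 3}})
store.ingest({"source": "quiz-tool", "learner_id": "s-1", "action": "submitted"})

sources = sorted({event.source for event in store.events_for("s-1")})
print(sources)  # ['ebook', 'quiz-tool']
```

Because both tools emit the same format, a product reading from the output side sees one learner's activity across sources, which no single tool could show on its own.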