My workshop at the Learning Analytics Summer Institute (LASI) 2016: http://lasi16.snola.es/#!/schedule/113
Educational data continues to grow in volume, velocity and variety. Making sense of the educational data in such conditions requires deployment and usage of appropriate scalable, real-time processing tools supporting a flexible data schema. Elasticsearch is one of the popular open-source tools meeting the enlisted requirements. Initially envisioned as a search engine capable of operating at scale and in real time, Elasticsearch is used by organisations such as Wikimedia and Github, which deal with big data on daily basis. In addition, Elasticsearch is used increasingly often as analytics platform thanks to its scalable architecture and expressive query language. Until recently, the exploitation of Elasticsearch for (learning) analytical purposes by practitioners was hindered by a high entrance barrier due to the complexity of the query language and the query specificities. This is currently changing with the ongoing development of Kibana, an open-source tool that allows to conduct analysis and build visualisations of Elasticsearch data through a graphical user interface. Kibana does not require the user to dive into technical details of the queries (although it is still possible) and hence makes big educational data visualisations accessible to regular users. The additional value of Kibana comes in play whenever several visualisations are combined on a single dashboard, enabling to use multiple coordinated views for an interactive explorative analysis. Both Elasticsearch and Kibana, together with Logstash are part of an analytics stack often referred to as ELK. Logstash supports data acquisition from multiple sources (including twitter, RSS, event logs) thanks to its rich set of available connectors. Custom connectors can be developed for case-specific sources. In addition to the mentioned values, ELK enables building analytics infrastructures decoupled from the learning platform, i.e., it allows to host separately the learning environment (with the analytics functionalities) and the data storage without affecting the end-user experience.
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Kibana) - LASI 2016 - Andrii Vozniuk, Maria Rodriguez-Triana, Denis Gillet
1. Interactive Learning Analytics
with ELK
(Elasticsearch Logstash Kibana)
Andrii Vozniuk, María Jesús Rodríguez-Triana, Denis Gillet
The copyright of images belongs to their authors. I will remove them on demand. Contact me at andrii.vozniuk@epfl.ch
Bilbao, June 2016
Learning Analytics Summer Institute (LASI)
Workshop description: http://lasi16.snola.es/#!/schedule/113
3. Goals of the Workshop
1. Get understanding of modern, scalable
analytics tools: Elasticsearch & Kibana
2. Get hands-on experience of interactive
learning analytics with the tools
3. Elaborate on how the tools can be
useful in your specific cases
10. How the
tools are
being used
?We (researchers) want to know it
Teachers want to know it
Even students want to know it
Knowledge managers as well
for awareness & reflection
15. What is the purpose of xAPI?
xAPI = Experience API
It is a standard
to capture in a unified way
experiences of the user,
in our case user-tool interactions
17. How does xAPI Work?
Users interact with tools
These interactions are observed and
recorded by the tools as xAPI statements
The tools [store] and send the statements to a
central system (Learning Record Store or
LRS) for further usage, for instance, Analysis
18. xAPI Format Specification
• When? - timestamp
• Who? - actor
• Did what? - verb
• With what? - object
• With what result? - result
• In which context? - context
Today at 10:15 (time) Andrii (actor) answered (verb) the question five
(object) with the grade four (result), while it was raining outside (context)
23. The ELK Stack
Elasticsearch - search and analytics
database based on Lucene
Logstash - data ingestion and
transformation
Kibana - Interactive data visualisation
27. Logs are not the best format for analysis
Especially, when coming from multiple systems
28. Getting timestamp should
be easy, right?
• Apache [19/Feb/2015:19:00:00 +0000]
• Unix timestamp 1424372400
• log4j [2015-02-19 19:00:00,000]
• postfix.log Feb 19 19:00:00
• ISO 8601 2015-02-19T19:00:00+02:00
• …
Over 40 formats
Not sexy work to do
29. Logstash Solves
the Problem
It collects, transforms and
transmits logs (streams)
So they can be stored and analysed
in a centralised and unified way
36. Relational DBs vs Elasticsearch
Database
Table
Row
Column
Schema
Index
Type
Document
Field
Mapping
37. Default mapping is created automatically
Mapping
Differently from relational databases,
where schema is static,
in Elasticsearch mapping is flexible
Core types: string, integer/long, float/double, boolean, and null
Other types: Array, Object, Nested, IP, GeoPoint, GeoShape,
Attachment
Tells Elasticsearch how to treat the data
40. Query
Elasticsearch provides an expressive
Query Domain Specific Language
to query the data
Can be thought as SQL for non-
relational data.
The query itself is a JSON Object.
46. Kibana
• Enables near-real-time analysis and
visualisation of streaming data
• Allows interactive data exploration and
supports cross-filtering
• Multiple chart types: bar charts, line and
scatter plots, histograms, pie charts,
maps
• No need to know programming or query
language in most of the cases
• It’s open-source, there are extensions
55. Group Work
What to do?
1. Formulate a research question
2. Build a visualisation to answer it, save it
3. If you have time left, repeat 1 and 2 :)
Possible research question examples
• Who are the most active users? And per country?
• What are the most popular types of interaction in general?
• What are the content items (objects) used the most per country (per user)?
• What types of interaction are the most common? Per country? Per user?
• With whom the most active users communicate (message or mention)?
• From which countries most of the interactions happen?
go.epfl.ch/lasi2016
login: admin
pass: lasi2016
62. Possible Architecture
Platform provides a UI
to interact with the data
without storing the data
Data is stored on premises:
in a LRS of a school
or of a user
Push the traces
Get the analysis
66. Online
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
Go through it. It is really good.
Elasticsearch Reference
Multiple tutorials
http://www.elasticsearchtutorial.com/
https://www.elastic.co/guide/index.html
ELK Guides
YouTube Videos
https://www.youtube.com/watch?v=Kqs7UcCJquM
https://www.youtube.com/watch?v=wHWb1d_VGp8
https://www.youtube.com/watch?v=1gnpzL9jBqY
https://www.youtube.com/watch?v=SH5hLM2asB8