This document summarizes the story of the Refugee Datathon Munich. It details the organizers' efforts to obtain refugee data from the German government and create data pipelines to extract, clean, visualize and publish the information online. Their initial methods of extracting PDF tables and creating visualizations proved difficult. Later pipelines involving converting PDFs to CSV, loading the data into Elasticsearch and visualizing it in Kibana worked better. The datathon aims to inform the public and activists about refugee decisions and support refugees by giving them a voice. The IT community is well positioned to help with their skills and democratic values.
4. IT people also wanted to contribute
But how?
• Many ideas: apps, teaching, internet access, matching platforms …
• We knew some pro-refugee activists
• We saw them struggling to get basic data: asylum applications, decisions
• And struggling to create visualizations from it
• We worked in 4 "refugee datathons" and created a data pipeline for them
7. Get Data: Use the Freedom of Information Law
• There’s a law requiring ministries to publish data (like the Freedom of Information Act)
• There’s a web portal made by open data activists to make such inquiries simple: https://fragdenstaat.de (ask-the-government)
8. Get Data: Use the Freedom of Information Law
• We kept asking for the refugee data until the ministry started publishing it monthly on their webpage
• o/
13. Pipeline A: PDF to datawrapper
This works.
Step 1: Get PDF
Step 2: tabula: extract CSV
Step 3: Clean & process CSV data
Step 4: Create visualizations with datawrapper
Step 5: Publish on webpage
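Steps 2 and 3 of Pipeline A hand over a raw CSV that still needs cleaning. The following is only a minimal sketch of what such a cleaning step might look like, not the datathon's actual code. It assumes German number formatting ("1.234" for thousands, "12,5" for decimals) and "-" for empty cells; the column names and sample rows are hypothetical and do not contain real figures.

```python
import csv
import io


def parse_german_number(text):
    """Convert '1.234' or '12,5' (assumed German formatting) to a float.

    Returns None for blank cells or a lone dash.
    """
    cleaned = text.strip().replace(".", "").replace(",", ".")
    if cleaned in ("", "-"):
        return None
    return float(cleaned)


def clean_rows(raw_csv, numeric_columns):
    """Yield one dict per data row with whitespace trimmed and numbers parsed."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    for row in reader:
        cleaned = {key.strip(): (value or "").strip() for key, value in row.items()}
        # Skip rows that are entirely empty, which table extraction can produce.
        if not any(cleaned.values()):
            continue
        for column in numeric_columns:
            cleaned[column] = parse_german_number(cleaned[column])
        yield cleaned


# Hypothetical sample, not real figures.
SAMPLE = (
    "country,applications,decisions\n"
    'Exampleland ,"1.234",567\n'
    ",,\n"
    'Otherland,-,"2.000"\n'
)

rows = list(clean_rows(SAMPLE, ["applications", "decisions"]))
for row in rows:
    print(row)
```

Keeping the number parsing in one small function makes it easy to adjust when a new monthly PDF uses a slightly different layout.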
14. Pipeline A: PDF to datawrapper
But Steps 4 and 5 are so much work!
Step 1: Get PDF
Step 2: tabula: extract CSV
Step 3: Clean & process CSV data
Step 4: Create visualizations with datawrapper 😩
Step 5: Publish on webpage 😩
15. Pipeline B: eurostat to R
R is magic.
Step 1: Get data from API
Step 2: r-eurostat package
Step 3: dplyr, magrittr, tidyr
Step 4: Visualize with ggplot2
Step 5: Publish on webpage
16. Pipeline B: eurostat to R
But we aren’t magicians (yet)…
Step 1: Get data from API
Step 2: r-eurostat package
Step 3: dplyr, magrittr, tidyr
Step 4: Visualize with ggplot2
Step 5: Publish on webpage
😖 😖
17. Pipeline C: eurostat to Elastic Stack
Step 1: Get data from eurostat API
Step 2: Through node.js
Step 3: Into elasticsearch
Step 4: Visualize in Kibana
Step 5: Publish on webpage
18. Pipeline C: eurostat to Elastic Stack
This time, getting and pushing the data is error-prone, which leads to data errors
Step 1: Get data from eurostat API
Step 2: Through node.js
Step 3: Into elasticsearch
Step 4: Visualize in Kibana
Step 5: Publish on webpage
😵 😵
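One way to make the push into elasticsearch less error-prone is to validate records before they are sent. The sketch below is an illustration only: the slides used node.js for this step, while this example uses Python, and the index name, field names and sample records are hypothetical. It builds a request body in the newline-delimited JSON format that the Elasticsearch bulk API expects (one action line followed by one document line) and sets aside incomplete records instead of indexing them.

```python
import json


def build_bulk_body(records, index_name, required_fields):
    """Return (ndjson_body, rejected) for the Elasticsearch bulk API.

    Records missing a required field are collected in `rejected` rather than
    sent, so bad rows stay visible instead of ending up in the index.
    """
    lines = []
    rejected = []
    for record in records:
        missing = [field for field in required_fields if record.get(field) is None]
        if missing:
            rejected.append((record, missing))
            continue
        # Action line, then the document itself on the next line.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(record, ensure_ascii=False))
    # The bulk body is newline-delimited JSON and must end with a newline.
    body = "\n".join(lines) + "\n" if lines else ""
    return body, rejected


# Hypothetical records, not real figures.
records = [
    {"country": "Exampleland", "month": "2017-01", "decisions": 120},
    {"country": "Otherland", "month": "2017-01", "decisions": None},
]

body, rejected = build_bulk_body(
    records, "asylum-decisions", ["country", "month", "decisions"]
)
print(body, end="")
print("rejected:", rejected)
```

The returned body could then be sent to the cluster's `_bulk` endpoint with any HTTP client; reviewing the `rejected` list before sending is where data errors would surface.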
19. Pipeline D: CSV to Elastic Stack
Let’s combine the parts that work.
Step 1: Get PDF data
Step 2: Create CSV
Step 3: Into elasticsearch, using logstash
Step 4: Visualize in Kibana
Step 5: Publish on webpage
20. Pipeline D: CSV to Elastic Stack
Let’s combine the parts that work.
Step 1: Get PDF data
Step 2: Create CSV
Step 3: Into elasticsearch, using logstash
Step 4: Visualize in Kibana
Step 5: Publish on webpage
😀 🙂 😀 😀 😀
21. Wishlist
• Embedding Kibana dashboards and graphics is not where we’d like it to be
• If you want to make it more accessible for data journalism, we have some ideas
22. Why the Elastic Stack 😍
• Because it’s a stack!
• Because Kibana gives power to the user
• Because it’s free software
• If you need your data protected, you can get this, too
27. What is the Impact?
There’s a light in the darkness …
28. Before, there was Darkness
• There were only rumors:
"I have the impression that there have been very few decisions on Afghanistan lately"
29. A look at: Afghanistan
Activists always want to see the rate of positive decisions.
For Afghanistan:
30. A look at: Afghanistan
Look at the absolute numbers. What happened in early summer 2017?
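The two views on slides 29 and 30, the rate of positive decisions and the absolute numbers, can be derived from the same monthly counts. The sketch below shows the arithmetic under simple assumptions: the rate is taken as positive decisions divided by all decisions, and every number in it is made up for illustration, not taken from the ministry's data.

```python
def positive_rate(positive, total):
    """Share of positive decisions, or None when no decisions were made."""
    if total == 0:
        return None
    return positive / total


# Made-up monthly counts for illustration only, not real figures.
monthly = [
    {"month": "month 1", "positive": 40, "total": 100},
    {"month": "month 2", "positive": 5, "total": 10},
    {"month": "month 3", "positive": 0, "total": 0},
]

for entry in monthly:
    entry["rate"] = positive_rate(entry["positive"], entry["total"])
    print(entry["month"], entry["total"], entry["rate"])
```

In this made-up series the rate rises from month 1 to month 2 while the number of decisions collapses, which is why the rate alone can hide what the absolute numbers show.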
31. A Bomb Happened
Kabul, end of May 2017
• A terrorist attack killed nearly 100 people, injured hundreds, and damaged the German embassy
• For some time, the ministry wasn’t sure how to decide on asylum requests from Afghanistan
Photo: dpa/AP/Rahmat Gul
32. Plans for the Datathon
• Cooperate with more local organizations
• Make Kibana more accessible
• Eurostat data
• Connect Europe wide with activists
• Create R pipeline
• More datathons
34. The Refugee Situation
One of the great challenges of our time
• 65 million displaced persons: a lot, but not impossible to take care of in a dignified way
• Most of us live in democracies, which is good
• Those democracies are organized along national borders, which is bad for everyone with a passport from a failed state, or from a state that hates you
• Refugees have no voice
35. The IT community
Natural Allies
• International
• Progressive
• Optimistic, with an abundance mindset
• Agile
• We have a voice
36. We are Spoiled
• We expect great impact with minimal effort
• We are used to getting free pizza
• We want to scale
• We are impatient
Be humble, bring patience.
37. Why We Keep Going
After more than 2 years
• This is not about helping others, this is about our dignity
• Using our democratic power
• We give technology, we get relevance
• And there are all those wonderful people
39. Except where otherwise noted, this work is licensed under
http://creativecommons.org/licenses/by-nd/4.0/
Creative Commons and the double C in a circle are
registered trademarks of Creative Commons in the United States and other countries.
Third party marks and brands are the property of their respective holders.
Please attribute Elastic with a link to elastic.co