2. What is data literacy?
• Ability to read, use and communicate
data as information
• Think critically about data
– Understanding how to work with large
datasets, how they are produced, how to
connect various datasets, how to
interpret them
3. What is data?
• Merriam-Webster definition:
“facts or information used usually to
calculate, analyse, or plan something”
• Anything is data – text, image, numbers, …
• For a computer to understand it, data needs to
be in a structured, machine-readable form
4. Why data literacy?
• Slowly but steadily, data are forcing
their way into every nook and cranny
of every industry, company and job
11. Climate Change and Budget
Data
• We will be using these data for the
hands-on practice
12. 1. Prepare
• Ask questions
• Collect data
• Organise data
• Cleanse data
13. Prepare: Questions
• Is climate change a priority area for Nepal?
– How much do climate-related projects
contribute to the total budget?
– How much does each ministry contribute to
climate projects?
– Which ministry has the highest contribution to
climate projects?
• A look at the specific projects
14. Prepare: Data collection
• Where’s the data on Climate Change
and Budget?
– Budget: Red Book (mof.gov.np)
– Climate change project: NPC report
• PDF?
– Data extraction from PDF
http://goo.gl/oCzfaW
15. Data Extraction Tools
• CSV (most used open format)
• HTML (in websites)
– many programming tools, Google Chrome
scraper
• PDF – very difficult
– Pdf2text, Tabula
16. Tabula
• Tabula is a tool for liberating data
tables trapped inside PDF files.
• Tabula requires Java, and you have to
run the software yourself
http://tabula.nerdpower.org/
17. Tabula
• Upload the file
• Highlight the table
• Download the data in csv format
• Challenges
– Data not always in the correct format
– Need cleaning/organising
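The challenge above can be seen by loading a Tabula-style CSV export in a script. This is only a minimal sketch: the column names (`ministry`, `project`, `amount`) and the values are made up for illustration and do not come from the Red Book or the NPC report.

```python
import csv
import io

# Hypothetical stand-in for a CSV file exported from Tabula; the column
# names and values are made up for illustration.
SAMPLE_CSV = (
    "ministry,project,amount\n"
    'Ministry A,Project X,"1,200"\n'
    'Ministry B,Project Y,"35,000"\n'
)


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = read_rows(SAMPLE_CSV)
for row in rows:
    # repr() makes it visible that the amount is still a string with a comma.
    print(row["ministry"], row["project"], repr(row["amount"]))
```

Every value arrives as text, including the amounts, which is why the cleaning step is still needed before any calculation.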
19. Prepare: Organise data
• Data are not in a usable form
• Formulas cannot be applied to data in
such a format
• Use tools like Excel or Google Refine to
organise data
– Add columns, remove columns, edit
columns, add new fields
20. Prepare: Clean data
• Data might still have issues
• The numbers from the PDF contain commas,
so they are not stored as numbers
– They are text and can’t be summed
– Search for the commas and replace them
with nothing
http://goo.gl/4ZuoiC
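The same search-and-replace can be sketched in a few lines of Python, for readers who prefer a script to a spreadsheet. The helper name `parse_amount` and the sample values are illustrative assumptions, not figures from the source data.

```python
def parse_amount(text):
    """Convert a comma-formatted string such as '1,234,567' to an int."""
    cleaned = text.replace(",", "").strip()
    return int(cleaned)


# Made-up sample values, not figures from the Red Book.
raw_amounts = ["1,200", "35,000", "4,500,000"]

amounts = [parse_amount(value) for value in raw_amounts]
total = sum(amounts)

print(amounts)
print(total)
```

Once the commas are gone the values are real numbers, so they can be summed like any other column.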
23. Analyse: Answer
• How much do climate-related projects
contribute to the total budget?
– Add up the budgets of the climate-related projects
– Simple addition formula
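As a rough sketch of that calculation outside a spreadsheet: add up the climate-related project budgets and divide by the total budget. All numbers below are made up for illustration, and the variable names are assumptions.

```python
# Made-up budgets of climate-related projects (illustrative only).
climate_project_budgets = [400, 1000]

# Made-up total budget (illustrative only).
total_budget = 2000

climate_total = sum(climate_project_budgets)
share_percent = 100 * climate_total / total_budget

print(climate_total)
print(share_percent)
```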
24. Analyse: Answer
• How much does each ministry contribute
to climate change projects?
– Not so simple answer
– Try for 5 minutes
– Create suitable chart
25. Analyse: Answer
• How much does each ministry contribute
to climate change projects?
– Use of Pivot table
– 10 seconds
– Create suitable chart
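What the pivot table does here is group the rows by ministry and sum the amounts in each group. A minimal Python sketch of the same idea follows; the ministry names and amounts are placeholders, not real data.

```python
from collections import defaultdict

# Made-up (ministry, climate project budget) rows, for illustration only.
rows = [
    ("Ministry A", 400),
    ("Ministry B", 250),
    ("Ministry A", 100),
    ("Ministry C", 250),
]


def sum_by_ministry(records):
    """Group records by ministry and sum the amounts, like a pivot table."""
    totals = defaultdict(int)
    for ministry, amount in records:
        totals[ministry] += amount
    return dict(totals)


totals = sum_by_ministry(rows)
for ministry, amount in totals.items():
    print(ministry, amount)
```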
26. Analyse: Answer
• Which ministry has the highest
contribution in terms of its budget
share?
– Can you do it now?
– HINT: You will need to get data from two
sources and work in a third sheet
– Create suitable chart
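The hint can be sketched in Python as two lookups (one per source) combined into a third table of shares. The ministry names and figures are made up, and treating a ministry with no climate projects as a 0% share is an assumption of this sketch.

```python
# Source 1 (made-up): climate project budget per ministry.
climate_by_ministry = {"Ministry A": 500, "Ministry B": 250}

# Source 2 (made-up): total budget per ministry.
budget_by_ministry = {"Ministry A": 2000, "Ministry B": 500, "Ministry C": 1000}


def climate_shares(climate, budgets):
    """Return each ministry's climate budget as a percentage of its total."""
    shares = {}
    for ministry, total in budgets.items():
        # Assumption: a ministry missing from the climate data has 0 climate budget.
        shares[ministry] = 100 * climate.get(ministry, 0) / total
    return shares


# "Third sheet": the combined result.
shares = climate_shares(climate_by_ministry, budget_by_ministry)
highest = max(shares, key=shares.get)

print(shares)
print(highest)
```

In this made-up example, Ministry A has the larger climate budget in absolute terms, but Ministry B has the higher share of its own budget, which is why the two sources have to be combined.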
27. Apply
• Using results to communicate
answers
• Make decisions
• Visualisations
• Create stories from the data
28. Apply: Visualisations
• Putting patterns on the screen
• Identify the outliers
• Allows access to large amounts of data
• Makes data relevant
29. Visualisations: Datawrapper
• Created by journalists for journalists
• Go to http://datawrapper.de
• Create account and paste relevant
data for sharing
e.g. GDP data (data journalism) – a number that could be fabricated
TI data on most corrupt agencies – parties – 1000 respondents
Better tools will not lead to better journalism if not used with insight
Talk about data scientists, how they are exploiting data to make the best use of it
Top job in Silicon Valley is data scientist, 6000 companies hiring data scientists
Process: prepare, analyse, apply
School data
e.g. how to increase the graduation rate of my district schools?
Where to find the information – DOE, ministry of education, other district websites?
PDF: Red Book
Tabula running at s3.yipl.com.np:8080
Datawrapper
Similar tools: Google Fusion Tables? For sharing