An Introduction to data cleaning with spreadsheets
Anders Pedersen, @anpe
School of Data
Spreadsheets: The beginning of each and every data story
• Which were the top growth sectors this quarter?
• How did crime in the capital region in 2013 compare to 2012?
• Is there a housing bubble waiting around the corner?
It is time for journalists themselves to tame this beast called spreadsheets!
Spreadsheets: Excel or Google Docs
Some basic terminology
• Data is organized in rows and columns (rows run across the page, columns run from top to bottom)
• Each field holding data is called a cell
• Rows are numbered
• Columns are referred to by letters
• Each cell is identified by its column and row, written as a code (e.g. A1 is the top-left cell)
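The spreadsheet does this lookup for you, but the rule behind an A1 code is simple: the letters name the column and the digits name the row. A minimal Python sketch (not part of the original workshop) that turns such a code into zero-based row and column positions, assuming plain references without `$` signs or sheet names; the function name `a1_to_indices` is hypothetical:

```python
import re


def a1_to_indices(ref: str) -> tuple[int, int]:
    """Convert an A1-style cell code such as "H2" into zero-based
    (row, column) positions.

    Column letters are read like a base-26 number in which A=1 ... Z=26,
    so "AA" is the 27th column. The row is the number after the letters.
    """
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a plain A1 reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters.upper():
        column = column * 26 + (ord(ch) - ord("A") + 1)
    # Subtract 1 from both so the top-left cell A1 becomes (0, 0).
    return int(digits) - 1, column - 1


print(a1_to_indices("A1"))    # (0, 0)
print(a1_to_indices("H2"))    # (1, 7)
print(a1_to_indices("AA10"))  # (9, 26)
```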
Some key features to explore today
• Sorting and filtering
• Basic formulas
• Pivot tables
Tricky bits:
- Don’t include summary rows (such as totals) in the data you build a pivot table from
- Pivot tables may not update automatically when you change your data (in Excel you have to refresh them)
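Why summary rows cause trouble: a pivot table adds up every row it is given, so a "Total" line that already sums the rows above it gets counted a second time. A small Python illustration with made-up figures (the department names, amounts and helper names are all hypothetical):

```python
# Hypothetical budget rows as they might sit in a sheet, including a
# "Total" line someone typed in underneath the data.
rows = [
    ("Health", 120),
    ("Education", 200),
    ("Defence", 80),
    ("Total", 400),
]


def grand_total(records):
    """Add up the amount in every record, the way a pivot table's SUM would."""
    return sum(amount for _label, amount in records)


def drop_summary_rows(records, summary_labels=("Total",)):
    """Keep only raw data rows; anything labelled as a summary is removed."""
    return [rec for rec in records if rec[0] not in summary_labels]


print(grand_total(rows))                     # 800: the total is counted twice
print(grand_total(drop_summary_rows(rows)))  # 400
```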
Data sources for the exercise
• Education: Secondary school enrollment for 2012 from Data.gov.ph
http://data.gov.ph/catalogue/dataset/sy-2012-enrollment-data-secondary
Sorting - finding the best and the worst
• The 10 best-paid sectors
• The 10 oldest cities
• The 10 poorest countries
• …
• If Excel is a toolbox for journalists, sorting is the hammer!
How to sort
• 1) Select all your data
• 2) In the Data tab, choose Sort range
Sorting...
• 3) Tick the "Data has header row" checkbox
• 4) Select the column you want to sort by
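The sorting steps boil down to: keep the header row out of the sort, order the remaining rows by one column, and read off the first ten. A plain-Python sketch of that idea, assuming the sheet is a list of rows whose first row is the header; the sectors and wages below are invented for illustration:

```python
# Hypothetical sheet: the first row is the header, the rest is data.
table = [
    ["Sector", "Average wage"],
    ["Retail", 2100],
    ["Finance", 4800],
    ["Mining", 3900],
    ["Hospitality", 1700],
]


def sort_by_column(rows, column_name, descending=True):
    """Return the header followed by the data rows sorted on one column.

    Setting the header aside first mirrors ticking "Data has header row":
    otherwise the header text would be sorted along with the data.
    """
    header, body = rows[0], rows[1:]
    idx = header.index(column_name)
    ordered = sorted(body, key=lambda row: row[idx], reverse=descending)
    return [header] + ordered


def top_n(rows, column_name, n=10):
    """The "10 best-paid" pattern: sort descending, keep the first n data rows."""
    return sort_by_column(rows, column_name)[1 : n + 1]


print(top_n(table, "Average wage", n=2))  # [['Finance', 4800], ['Mining', 3900]]
```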
Filtering - getting a better sense of your data
• 1) Turn on filtering via the Data tab (Data → Filter)
Filtering...
• 2) Filter options now appear at the top of each column
Filtering...
• 3) Now click on the blue triangular arrow
Filtering...
• 4) Select the values you wish to filter by
Filtering...
• 5) A green arrow will now appear at the top of the filtered column
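What a filter does, stated as code: it shows only the rows whose value in the chosen column is one of the ticked values, and leaves the underlying data untouched. A minimal Python sketch, assuming each row is a dictionary keyed by column header; the region names and figures are hypothetical:

```python
# Hypothetical rows from an enrollment sheet, keyed by column header.
rows = [
    {"Region": "North", "School type": "Public", "Enrollment": 1200},
    {"Region": "South", "School type": "Private", "Enrollment": 300},
    {"Region": "North", "School type": "Private", "Enrollment": 450},
    {"Region": "East", "School type": "Public", "Enrollment": 800},
]


def filter_rows(records, column, keep):
    """Return the rows whose value in `column` is one of the `keep` values.

    A new list is built, so `records` itself is unchanged, much as a
    spreadsheet filter hides rows rather than deleting them.
    """
    wanted = set(keep)
    return [row for row in records if row[column] in wanted]


north = filter_rows(rows, "Region", ["North"])
print(len(north))                               # 2
print(sum(row["Enrollment"] for row in north))  # 1650
```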
Moving forward!
• Sorting and filtering - check!
• Basic formulas
• Pivot tables
Basic formulas
• Let us now try to sum up some of the values in the dataset…
• What it is good for: doing your own analysis, and checking whether your colleagues’ calculations are right
Basic formulas
• Go to column H: in the second row (cell H2), type “=SUM(F2:G2)” (or simply “=F2+G2”)
Basic formulas
• We now have a sum
• Now try calculating the average of the same two cells: “=AVERAGE(F2:G2)”
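How `SUM` and `AVERAGE` treat a range can be mimicked in a few lines. Both Excel and Google Sheets skip empty cells and text inside a referenced range but count cells that contain 0, which matters for averages. A Python sketch of that behaviour, with `None` standing in for an empty cell (the helper names and sample values are hypothetical):

```python
def numbers_only(cells):
    """Keep the numeric cell values; empty cells (None), text and
    booleans are skipped."""
    return [
        c for c in cells
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    ]


def sheet_sum(cells):
    """Rough equivalent of =SUM(range)."""
    return sum(numbers_only(cells))


def sheet_average(cells):
    """Rough equivalent of =AVERAGE(range): the mean of the numeric cells only."""
    nums = numbers_only(cells)
    if not nums:
        # A spreadsheet would show #DIV/0! here.
        raise ZeroDivisionError("AVERAGE of a range with no numbers")
    return sum(nums) / len(nums)


f2, g2 = 120, 80  # hypothetical values in cells F2 and G2
print(sheet_sum([f2, g2]))           # 200, like =SUM(F2:G2)
print(sheet_average([f2, g2]))       # 100.0, like =AVERAGE(F2:G2)
print(sheet_average([0, None, 10]))  # 5.0: the 0 counts, the empty cell does not
```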
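Basic formulas
• You can also copy your calculations to other cells: copy H2 down column H and the references adjust to each row (F2:G2 becomes F3:G3 in H3, and so on)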
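Copying works because references such as F2 are relative: when the formula in H2 is copied one row down, every plain row number goes up by one, while a row locked with a dollar sign (as in $H$1) stays fixed. A simplified Python sketch of that rewriting, handling plain cell references only (no sheet names or whole-column ranges); the regular expression and the `fill_down` name are our own illustration, not how any spreadsheet implements it:

```python
import re

# A cell reference such as F2, $F2, F$2 or $F$2. The look-arounds stop the
# pattern from matching inside longer words or function names like LOG10(.
CELL_REF = re.compile(r"(?<![A-Za-z$])(\$?)([A-Z]{1,3})(\$?)(\d+)(?![\d(])")


def fill_down(formula: str, offset: int) -> str:
    """Rewrite `formula` as it reads after being copied `offset` rows down.

    Relative row numbers move with the copy; rows marked with $ stay put.
    Copying straight down never changes the column letters.
    """

    def shift(m: re.Match) -> str:
        col_lock, col, row_lock, row = m.groups()
        new_row = int(row) if row_lock else int(row) + offset
        return f"{col_lock}{col}{row_lock}{new_row}"

    return CELL_REF.sub(shift, formula)


print(fill_down("=SUM(F2:G2)", 1))  # =SUM(F3:G3)
print(fill_down("=F2/$H$1", 3))     # =F5/$H$1
```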
Now only Pivot tables to go
• Sorting and filtering - check!
• Basic formulas - check!
• Pivot tables
Pivot tables
• Finding stories inside datasets
• Particularly well suited to organised datasets with clear categories and subcategories
Pivot tables
• Select the full area of the dataset
• Go to Data → Pivot table report
Pivot tables
• Pivot tables let you work with rows, columns, values and filters
• We start by dropping a column header into Rows
• Then we drop one of our value columns into Values
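Dropping one column into Rows and another into Values (summarised with SUM) amounts to a group-by: one output row per distinct label, with the values added up. A minimal Python sketch, assuming rows are dictionaries keyed by column header; the department names and amounts are hypothetical:

```python
# Hypothetical budget lines; several lines can belong to one department.
budget = [
    {"Department": "Health", "Amount": 50},
    {"Department": "Education", "Amount": 70},
    {"Department": "Health", "Amount": 30},
    {"Department": "Defence", "Amount": 40},
]


def pivot_sum(records, row_field, value_field):
    """Sum `value_field` for each distinct value of `row_field`.

    The result is sorted by label, which is how pivot tables usually
    list their row labels.
    """
    totals = {}
    for rec in records:
        key = rec[row_field]
        totals[key] = totals.get(key, 0) + rec[value_field]
    return dict(sorted(totals.items()))


print(pivot_sum(budget, "Department", "Amount"))
# {'Defence': 40, 'Education': 70, 'Health': 80}
```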
Pivot tables
• We now have a neat summary of the budget for each department
Filtering pivot tables
• We can now go ahead and filter the pivot table
• Add the column you wish to filter by
Filtering pivot tables
• Then select one or more categories within the column that you wish to keep
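A pivot-table filter is applied before the summary is built: rows whose value in the filter column is not ticked are left out, and the rest are grouped and summed. A Python sketch with hypothetical column names (Sector, Department, Amount) and invented figures:

```python
# Hypothetical budget lines with a sector column we want to filter on.
budget = [
    {"Sector": "Social", "Department": "Health", "Amount": 50},
    {"Sector": "Social", "Department": "Education", "Amount": 70},
    {"Sector": "Security", "Department": "Defence", "Amount": 40},
    {"Sector": "Social", "Department": "Health", "Amount": 30},
]


def filtered_pivot_sum(records, row_field, value_field, filter_field, keep):
    """Group and sum like a pivot table, using only rows whose
    `filter_field` value is among the ticked `keep` categories."""
    wanted = set(keep)
    totals = {}
    for rec in records:
        if rec[filter_field] not in wanted:
            continue  # row is filtered out before summarising
        key = rec[row_field]
        totals[key] = totals.get(key, 0) + rec[value_field]
    return dict(sorted(totals.items()))


print(filtered_pivot_sum(budget, "Department", "Amount", "Sector", ["Social"]))
# {'Education': 70, 'Health': 80}
```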
Pivot tables
• Finally, we can add several value columns to the pivot table
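With several value columns, each label gets one total per column, which is what makes side-by-side comparisons such as two budget years possible. A Python sketch, assuming hypothetical value columns named "2013" and "2014" and invented amounts:

```python
# Hypothetical budget lines with one amount column per year.
budget = [
    {"Department": "Health", "2013": 50, "2014": 60},
    {"Department": "Education", "2013": 70, "2014": 75},
    {"Department": "Health", "2013": 30, "2014": 35},
]


def pivot_multi(records, row_field, value_fields):
    """Return {label: {value column: total}} for every value column given."""
    out = {}
    for rec in records:
        sums = out.setdefault(rec[row_field], {f: 0 for f in value_fields})
        for f in value_fields:
            sums[f] += rec[f]
    return dict(sorted(out.items()))


print(pivot_multi(budget, "Department", ["2013", "2014"]))
# {'Education': {'2013': 70, '2014': 75}, 'Health': {'2013': 80, '2014': 95}}
```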
Exercises
• Find the sectors of the national budget that grew the most in percentage terms
• Identify the budget lines that had the biggest absolute increase
• Generate a pivot table based on the national budget, comparing 2014 and 2013 in specific sectors
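The first two exercises rank the same data in two different ways, and they can give different answers: percentage change is (new - old) / old × 100, while absolute change is simply new - old. A Python sketch with invented figures (not from any real budget) showing how the top-ranked sector can differ:

```python
# Hypothetical sector totals as (2013, 2014). Not real budget figures.
sectors = {
    "Health": (100, 130),
    "Education": (400, 460),
    "Defence": (50, 70),
}


def pct_change(old, new):
    """Percentage change from old to new; undefined when old is 0."""
    if old == 0:
        raise ZeroDivisionError("percentage change from a zero base")
    return (new - old) * 100 / old


def rank(data, measure):
    """Sector names ordered from largest to smallest value of `measure`."""
    return sorted(data, key=lambda name: measure(*data[name]), reverse=True)


print(rank(sectors, pct_change))                 # ['Defence', 'Health', 'Education']
print(rank(sectors, lambda old, new: new - old)) # ['Education', 'Health', 'Defence']
```

In a spreadsheet the same two measures would be one formula column each, which can then be sorted as in the sorting steps.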
