Data Journalism

Online Journalism - Magazines MA
City University
February 16 2012
What is data journalism?
The key thing here is to learn how to solve
your own problems. Asking a tutor should be
your last resort - they will not be there for the rest
of your life!
1. Coming up with a question
You need to find a data source. But where? Spend 15 minutes mapping out potential
data sources related to your field. They might be commercial or governmental; they
might need collecting or already be compiled somewhere. For example, if your field
were cycling, there would be:
   ● transport data
   ● crime data
   ● health data (encouraging people to cycle as part of a healthy lifestyle, for
     example)
   ● environmental data (pollution)
   ● community data (things being shared online by cyclists)

Also take a look at the examples at http://delicious.com/paulb/foieg
2. Use advanced search techniques to find data for a journalistic
question

There are lots of different ways to search, not just typing things
into Google.

You can limit by file type, domain or site, and use Boolean operators.
● Limit by filetype:
    ○ filetype:xls will restrict results to Excel spreadsheets;
    ○ filetype:csv to 'comma separated values' spreadsheets;
    ○ filetype:doc to Word documents - often used for internal documents;
    ○ filetype:pdf to PDFs - often used for official reports.
● Limit by domain:
    ○ site:gov.uk will restrict results to UK government websites
    ○ site:ac.uk to UK educational establishments (not all of them
      reputable) - the US equivalent is .edu
    ○ site:org.uk to (mostly) nonprofit organisations - again, this is not
      guaranteed. You can also try site:org, although this will include
      results from other countries.
    ○ site:mod.uk - the Ministry of Defence
    ○ site:nhs.uk - NHS sites
    ○ site:dh.gov.uk - Department of Health
    ○ site:police.uk - police websites, including British Transport Police
      and the Met
● Limit by website:
    ○ site:bolton.gov.uk will further limit results to just one website,
      rather than all local authority websites.
    ○ Likewise site:city.ac.uk would only return results from City
      University's website
● You can limit your search further by using quotation marks so that
  only pages containing the exact phrase are returned, e.g. "annual
  report"
● You can also expand it by using 'Boolean' operators like OR.
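Then put it all together, typing the search without surrounding quotation
marks (quoting the whole line would search for it as one exact phrase):

e.g. deaths in police custody filetype:xls site:gov.uk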
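
The way the operators combine can be sketched in a few lines of Python. This is only an illustration: the helper name build_query and its parameters are hypothetical, and the OR terms shown are an invented example, not one from the slides.

```python
from urllib.parse import urlencode


def build_query(terms, phrase=None, any_of=None, filetype=None, site=None):
    """Assemble a search string from the operators described above.

    build_query is a hypothetical helper written for this sketch.
    """
    parts = []
    if terms:
        parts.append(terms)
    if phrase:
        # Quotation marks ask for the exact phrase.
        parts.append(f'"{phrase}"')
    if any_of:
        # OR widens the search to pages containing any of the terms.
        parts.append(" OR ".join(any_of))
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


# The example from the slide: spreadsheets on UK government sites.
query = build_query("deaths in police custody", filetype="xls", site="gov.uk")
print(query)

# The same search as a URL you could paste into a browser.
url = "https://www.google.com/search?" + urlencode({"q": query})
print(url)

# An invented OR example (the slide's own example is missing).
or_query = build_query("", any_of=["cycling", "bicycle"], site="gov.uk")
print(or_query)
```

Typing the operators by hand works just as well; the point is only that each operator is a separate piece of the same search line.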




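Try other 'operators' such as:

  ● + before a search term to ensure it is in the pages
    themselves, e.g. +custody (Google has since retired this operator;
    putting the single word in quotes, e.g. "custody", does the same job)
  ● phrases in quotes, e.g. "deaths in custody"
  ● The * wildcard, e.g. "deaths in * custody"
  ● The ~ operator for synonyms, e.g. ~deaths (also since retired by
    Google)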
3. Making sense of the data
Chances are that the data you've found will raise further questions.
There may be:
  ● jargon that you need to understand,
  ● codes that need translating,
  ● holes in the data,
  ● contextual data needed: the populations of different regions; data
    for previous years; etc.
  ● questions about how it was gathered - the methodology

Use your journalistic skills to answer those questions.
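
One reason contextual data such as population matters is that raw counts favour bigger regions. A minimal sketch, using invented figures for two hypothetical regions (not real data), of turning counts into a rate per 100,000 people:

```python
# Invented figures for illustration only.
incidents = {"Region A": 120, "Region B": 45}
population = {"Region A": 1_500_000, "Region B": 300_000}

# Rate per 100,000 people = incidents * 100,000 / population
rates = {
    region: incidents[region] * 100_000 / population[region]
    for region in incidents
}

for region, rate in rates.items():
    print(f"{region}: {incidents[region]} incidents, {rate:.1f} per 100,000")
```

In this made-up case Region A has more incidents, but Region B has the higher rate once population is taken into account.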
Spreadsheet skills
You can also use some spreadsheet techniques to put the data into a
form that is going to be easier to interrogate - for example try the
following:

 ● Split addresses so that the postcode is in a separate column
   (Data > Text to Columns in Excel, or =SPLIT in Google Docs) -
   or separate forename and surname.
 ● Count how many times a value appears, or how many values are
   above a certain number (=COUNTIF).
 ● Work out the total using =SUM(D:D) if your numbers are in
   column D, for example.
 ● Work out the amount per day by using =SUM(D:D)/30 for a 30
   day month, etc.
 ● Work out a median average by using a formula like =MEDIAN(D:D).
   Compare that with other types of average like =AVERAGE(D:D)
   or =MODE(D:D).
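
For anyone curious what these formulas are doing, here is a rough Python equivalent. It is a sketch only: the numbers standing in for column D, the address and the name are all invented.

```python
import statistics

# Invented figures standing in for column D (not real data).
column_d = [12, 7, 7, 30, 15, 7, 22]

total = sum(column_d)                           # =SUM(D:D)
per_day = total / 30                            # =SUM(D:D)/30
median = statistics.median(column_d)            # =MEDIAN(D:D)
mean = statistics.mean(column_d)                # =AVERAGE(D:D)
mode = statistics.mode(column_d)                # =MODE(D:D)
count_sevens = column_d.count(7)                # =COUNTIF(D:D, 7)
above_ten = sum(1 for v in column_d if v > 10)  # =COUNTIF(D:D, ">10")

print(total, round(per_day, 2), median, round(mean, 2), mode)
print(count_sevens, above_ten)

# Splitting text into columns: postcode after the last comma,
# forename before the first space. Both values are invented.
address = "10 Example Street, London, EC1V 0HB"
street_and_city, postcode = address.rsplit(", ", 1)
forename, surname = "Jane Doe".split(" ", 1)
print(postcode, "|", forename, "|", surname)
```

Note how far apart the three averages are for the same numbers: one large value pulls the mean up, while the median and mode ignore it. That difference is often the story.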
4. Basic visualisations
Find a transcript of a politician's - or two politicians' - speeches and
visualise them using Wordle.com, Tagxedo or ManyEyes. (The
advanced search techniques mentioned above may help)

You can either compare one politician's speeches on a particular issue before
and after taking office - or one politician's speech with his or her replacement.

Spend some time tweaking the visualisation:

  ● Are similar words treated differently, e.g. "patient" and "patients" or
    "choice" and "options"? Should you combine the counts to clarify the
    emphases? What are the ethical issues of doing so?
  ● Should you reduce your sample to the top 10 or 20 words or phrases to
    make it clearer?
  ● Can you customise the words included (try copying into a text editor first),
    colour scheme, arrangement, fonts, etc. to greater effect?
  ● Is a word cloud best - or should you use a bar chart based on word
    counts?
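
The questions above about merging similar words and trimming to the top few can be tried out directly. This is a minimal sketch with an invented sentence; the merge map and the stopword list are editorial choices you would have to justify, not fixed rules.

```python
import re
from collections import Counter

# Invented text for illustration, not a real speech.
text = ("Patients want choice. Every patient deserves choice "
        "and more options for patients.")

# Lower-case the text and pull out the words.
words = re.findall(r"[a-z']+", text.lower())
counts = Counter(words)

# Editorial decision: treat the plural as the same word.
merge = {"patients": "patient"}
merged = Counter()
for word, n in counts.items():
    merged[merge.get(word, word)] += n

# Drop common filler words, then keep the top 10.
stopwords = {"and", "for", "every", "more"}
top = [(w, n) for w, n in merged.most_common() if w not in stopwords][:10]

# A plain-text bar chart based on the word counts.
for word, n in top:
    print(f"{word:<10} {'#' * n} {n}")
```

Adding "options" to the merge map so that it counts towards "choice" would change the picture again, which is exactly the ethical question raised above.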
Advanced tutorial 1 - GDoc webscraper

Follow the tutorials tagged 'importHTML' on Excel Notes:
http://excelnotes.posterous.com/tag/importhtml
...and 'importXML' on the Online Journalism Blog -
http://onlinejournalismblog.com/tag/importxml (start from the bottom)

For a really 'live' scraper, see instructions on how to grab XML from Backtweets or
RSS from a Twitter search in this tutorial:
http://www.brelson.com/2009/11/using-google-spreadsheets-to-extract-twitter-data/
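
In a Google spreadsheet, =IMPORTHTML(url, "table", 1) pulls the first table from a web page. As a rough idea of what happens behind the scenes, here is a sketch in Python using only the standard library. It is not how the tutorials above work, the class name TableParser is made up, the page content is invented, and it does not handle tables nested inside other tables.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the rows and cells of every table in a page (sketch only)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table found
        self._row = None   # the row currently being read, if any
        self._cell = None  # text pieces of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


# Invented page content standing in for a real web page.
html = """
<table>
  <tr><th>Area</th><th>Count</th></tr>
  <tr><td>North</td><td>12</td></tr>
  <tr><td>South</td><td>7</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)
first_table = parser.tables[0]
for row in first_table:
    print(row)
```

The spreadsheet formula is far less work for a one-off; a script like this only becomes worthwhile when you need to repeat the scrape or clean the result afterwards.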
Advanced tutorial 2 - interrogating data

Follow the tutorial at http://excelnotes.posterous.com/tag/filters
And the one at http://excelnotes.posterous.com/tag/sumifs

Or if you want to play with Google Refine, search for 'Getting Started
With Local Council Spending Data' or go to
http://blog.ouseful.info/2011/01/28/getting-started-with-local-council-spending-data/
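
Filters and SUMIFS both come down to the same idea: pick the rows that match your conditions, then look at them or add them up. A minimal sketch of that idea in Python, using an invented spending table (the suppliers, departments and amounts are not real data):

```python
import csv
import io

# Invented spending data for illustration only.
raw = """supplier,department,amount
Acme Ltd,Highways,1200
Bolt Co,Highways,300
Cargo plc,Highways,800
Acme Ltd,Parks,450
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Filter: keep only the Highways rows.
highways = [r for r in rows if r["department"] == "Highways"]

# Conditional sum, roughly what
# =SUMIFS(C:C, B:B, "Highways", C:C, ">500") does in a spreadsheet.
big_highways_total = sum(
    float(r["amount"])
    for r in rows
    if r["department"] == "Highways" and float(r["amount"]) > 500
)

# Total per supplier across all departments.
by_supplier = {}
for r in rows:
    by_supplier[r["supplier"]] = by_supplier.get(r["supplier"], 0) + float(r["amount"])

print(len(highways), big_highways_total)
print(by_supplier)
```

Grouping by supplier, as in the last step, is often where a spending story starts: it shows who received the most overall rather than which single payment was largest.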
Advanced tutorial 3 - Scraper tools

Data can come in all sorts of forms. Based on the data you found already, try
one or more of the following:

  ● Using a PDF conversion service to get to the data within - a list here:
    http://helpmeinvestigate.posterous.com/tag/pdfs - also:
    http://www.pdftoexcelonline.com/


  ● Grabbing tables from a database search: try the Firefox plugin Outwit Hub
    (free version stores 100 results; buy a licence for more)

City Journalism - Magazines MA - week 8 - Data journalism
