Data Journalism

Online Journalism - Magazines MA
City University
February 16 2012
What is data journalism?
The key thing here is to learn how to solve
your own problems. Asking a tutor should be
your last resort - they will not be there for the rest
of your life!
1. Coming up with a question
You need to find a data source. But where? Spend 15 minutes mapping out potential
data sources related to your field. They might be commercial or governmental; they
might need collecting or already be compiled somewhere. For example, if your field
were cycling, there would be:
   ● transport data
   ● crime data
   ● health data (encouraging people to cycle as part of a healthy lifestyle, for
     example)
   ● environmental data (pollution)
   ● community data (things being shared online by cyclists)

Also take a look at the examples at http://delicious.com/paulb/foieg
2. Use advanced search techniques to find data for a journalistic question

There are many ways to search beyond simply typing words into Google.

You can limit results by file type, domain or site, and use Boolean operators.
● Limit by filetype:
    ○ filetype:xls will restrict results to Excel spreadsheets;
    ○ filetype:csv to 'comma separated values' spreadsheets;
    ○ filetype:doc to Word documents - often used for internal documents;
    ○ filetype:pdf to PDFs - often used for official reports.
● Limit by domain:
    ○ site:gov.uk will restrict results to UK government websites
    ○ site:ac.uk to UK educational establishments (not all of them
      reputable) - the US equivalent is .edu
    ○ site:org.uk to (mostly) nonprofit organisations - again, this is not
      guaranteed. You can also try .org although this will include
      results from other countries.
    ○ site:mod.uk - the Ministry of Defence
    ○ site:nhs.uk - NHS sites
    ○ site:dh.gov.uk - Department of Health
    ○ site:police.uk - police websites, including British Transport Police
      and the Met
● Limit by website:
    ○ site:bolton.gov.uk will further limit results to just one website,
      rather than all local authority websites.
    ○ Likewise site:city.ac.uk would only return results from City
      University's website
● You can limit your search further by using quotation marks so that
  only pages containing the exact phrase are returned, e.g. "annual
  report"
● You can also expand it by using 'Boolean' operators like OR, e.g.
Then put it all together:

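e.g. deaths in police custody filetype:xls site:gov.uk

(Type the query without surrounding quotation marks; wrapping the whole line in quotes would turn it into a single exact-phrase search.)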




Try other 'operators' such as

  ● + before a search term to ensure it is in the pages
    themselves, e.g. +custody
  ● phrases in quotes, e.g. "deaths in custody"
  ● The * wildcard, e.g. "deaths in * custody"
  ● The ~ operator for synonyms, e.g. ~deaths
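
The "put it all together" step can be sketched in a few lines of Python. This is only an illustration: the helper name build_query and its parameters are made up for this example, and the search URL is shown simply as a way to paste the finished query into a browser.

```python
from urllib.parse import urlencode


def build_query(terms, phrase=None, filetype=None, site=None):
    """Join plain search terms with optional operators into one query string.

    build_query and its arguments are hypothetical names for this sketch.
    """
    parts = [terms]
    if phrase:
        # Quotation marks ask for the exact phrase.
        parts.append(f'"{phrase}"')
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


query = build_query("deaths in police custody", filetype="xls", site="gov.uk")
print(query)

# URL-encode the query so it can be pasted into a browser address bar.
search_url = "https://www.google.com/search?" + urlencode({"q": query})
print(search_url)
```

Changing one argument at a time (for example site="police.uk" or filetype="csv") is a quick way to see how each operator narrows or widens the results.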
3. Making sense of the data
Chances are that the data you've found will raise further questions.
There may be:
  ● jargon that you need to understand,
  ● codes that need translating,
  ● holes in the data,
  ● contextual data needed: the populations of different regions; data
    for previous years; etc.
  ● questions about how it was gathered - the methodology

Use your journalistic skills to answer those questions.
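
As one example of why contextual data such as population matters, raw counts can be turned into a rate per 100,000 people before regions are compared. The sketch below uses invented region names and figures purely for illustration; none of them come from a real dataset.

```python
def rate_per_100k(count, population):
    """Return incidents per 100,000 people."""
    return count * 100_000 / population


# Invented figures: (number of incidents, population) for each region.
regions = {
    "Region A": (120, 800_000),
    "Region B": (90, 300_000),
}

rates = {name: rate_per_100k(c, p) for name, (c, p) in regions.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 100,000")
```

In this made-up case Region A has more incidents in total, but Region B has the higher rate once population is taken into account.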
Spreadsheet skills
You can also use some spreadsheet techniques to put the data into a
form that is going to be easier to interrogate - for example try the
following:

 ● Split addresses so that the postcode is in a separate column
   (Data > Text to Columns in Excel, or =SPLIT in Google Docs),
   or separate forename and surname.
 ● Count how many times a value appears, or how many values are
   above a certain number (=COUNTIF).
 ● Work out the total using =SUM(D:D) if your numbers are in
   column D, for example.
 ● Work out the amount per day by using =SUM(D:D)/30 for a 30-day
   month, etc.
 ● Work out a median average by using a formula like =MEDIAN(D:D).
   Compare that with other types of average like =AVERAGE(D:D)
   or =MODE(D:D).
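
For anyone who prefers to check a spreadsheet's results in code, the same calculations can be sketched in Python. The numbers and the address below are invented stand-ins for "column D" and an address column; the comments show which spreadsheet formula each line roughly corresponds to.

```python
import statistics

# Invented figures standing in for the numbers in column D.
values = [12, 7, 7, 30, 4]

total = sum(values)                                 # =SUM(D:D)
per_day = total / 30                                # =SUM(D:D)/30
median = statistics.median(values)                  # =MEDIAN(D:D)
mean = statistics.mean(values)                      # =AVERAGE(D:D)
mode = statistics.mode(values)                      # =MODE(D:D)
count_sevens = values.count(7)                      # =COUNTIF(D:D, 7)
count_above_10 = sum(1 for v in values if v > 10)   # =COUNTIF(D:D, ">10")

# Split an (invented) address so the postcode sits in its own field,
# assuming the postcode is whatever follows the last comma.
address = "1 Example Street, London, EC1V 0HB"
street_and_town, postcode = address.rsplit(", ", 1)

print(total, per_day, median, mean, mode)
print(count_sevens, count_above_10)
print(postcode)
```

Note how the single large value (30) pulls the mean well above the median here, which is exactly the kind of difference worth checking before quoting "the average" in a story.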
4. Basic visualisations
Find a transcript of a politician's - or two politicians' - speeches and
visualise them using Wordle.com, Tagxedo or ManyEyes. (The
advanced search techniques mentioned above may help)

You can either compare one politician's speeches on a particular issue before
and after taking office - or one politician's speech with his or her replacement.

Spend some time tweaking the visualisation:

  ● Are similar words treated differently, e.g. "patient" and "patients" or
    "choice" and "options"? Should you combine the counts to clarify the
    emphases? What are the ethical issues of doing so?
  ● Should you reduce your sample to the top 10 or 20 words or phrases to
    make it clearer?
  ● Can you customise the words included (try copying into a text editor first),
    colour scheme, arrangement, fonts, etc. to greater effect?
  ● Is a word cloud best - or should you use a bar chart based on word
    counts?
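
The word counts behind a word cloud or bar chart can be sketched as follows. This is a minimal illustration, not how Wordle, Tagxedo or ManyEyes work internally: the function name word_counts, the sample sentence, the merge table and the stopword list are all invented for the example.

```python
import re
from collections import Counter


def word_counts(text, merge=None, stopwords=frozenset()):
    """Count words, optionally folding variants together (e.g. plural into singular)."""
    merge = merge or {}
    words = re.findall(r"[a-z']+", text.lower())
    kept = [merge.get(w, w) for w in words if w not in stopwords]
    return Counter(kept)


# Invented sample text standing in for a speech transcript.
speech = "Patients want choice. Every patient deserves choice, and patients deserve options."

counts = word_counts(
    speech,
    merge={"patients": "patient"},   # an editorial decision: record it and be ready to justify it
    stopwords={"and", "every"},
)

top = counts.most_common(2)

# A crude text bar chart of the most frequent words.
for word, n in top:
    print(f"{word:<10} {'#' * n}")
```

Because the merge table is explicit, anyone checking the work can see exactly which words were combined, which speaks to the ethical question raised above.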
Advanced tutorial 1 - GDoc webscraper

Follow the tutorials tagged 'importHTML' on Excel Notes:
http://excelnotes.posterous.com/tag/importhtml
...and 'importXML' on the Online Journalism Blog:
http://onlinejournalismblog.com/tag/importxml (start from the bottom)

For a really 'live' scraper, see instructions on how to grab XML from Backtweets or
RSS from a Twitter search in this tutorial:
http://www.brelson.com/2009/11/using-google-spreadsheets-to-extract-twitter-data/
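
To give a feel for what a table-grabbing function such as importHTML does, here is a rough Python sketch using only the standard library. It is not how Google's function is implemented: the class TableGrabber, the helper import_html_table and the sample page are invented for illustration, and the sketch does not handle nested tables.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collect the text of every cell in every table on a page."""

    def __init__(self):
        super().__init__()
        self.tables = []    # one list of rows per table found
        self._row = None    # row currently being read, if any
        self._cell = None   # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index=1):
    """Return the rows of the index-th table, counting from 1."""
    grabber = TableGrabber()
    grabber.feed(html)
    return grabber.tables[index - 1]


# Invented sample page; a real scraper would download the HTML first.
page = """
<table>
  <tr><th>Force</th><th>Deaths</th></tr>
  <tr><td>Force A</td><td>3</td></tr>
  <tr><td>Force B</td><td>1</td></tr>
</table>
"""

rows = import_html_table(page)
for row in rows:
    print(row)
```

The spreadsheet function does the downloading and parsing in one step, which is why it is the easier starting point; the sketch is only meant to show that a "scraper" is essentially reading the page's table markup row by row.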
Advanced tutorial 2 - interrogating data

Follow the tutorial at http://excelnotes.posterous.com/tag/filters
And the one at http://excelnotes.posterous.com/tag/sumifs

Or if you want to play with Google Refine, search for 'Getting Started
With Local Council Spending Data' or go to
http://blog.ouseful.info/2011/01/28/getting-started-with-local-council-spending-data/
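
The idea behind filters and SUMIFS (keep only the rows that match some conditions, then add up one column) can be sketched in Python. The spending rows below are invented, and the helper name sumifs is just a label for this example rather than a real library function.

```python
# Invented council spending rows for illustration only.
rows = [
    {"supplier": "Acme Ltd", "department": "Highways", "amount": 1200.0},
    {"supplier": "Acme Ltd", "department": "Parks", "amount": 300.0},
    {"supplier": "Bolt & Co", "department": "Highways", "amount": 500.0},
    {"supplier": "Acme Ltd", "department": "Highways", "amount": 800.0},
]


def sumifs(rows, value_key, **criteria):
    """Add up value_key for rows where every criterion matches exactly."""
    return sum(
        row[value_key]
        for row in rows
        if all(row[key] == wanted for key, wanted in criteria.items())
    )


# Equivalent of filtering the department column to 'Highways'.
highways = [row for row in rows if row["department"] == "Highways"]

# Equivalent of a SUMIFS with two conditions.
acme_highways_total = sumifs(rows, "amount", supplier="Acme Ltd", department="Highways")

print(len(highways), acme_highways_total)
```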
Advanced tutorial 3 - Scraper tools

Data can come in all sorts of forms. Based on the data you found already, try
one or more of the following:

  ● Using a PDF conversion service to get to the data within - a list here:
    http://helpmeinvestigate.posterous.com/tag/pdfs - also:
    http://www.pdftoexcelonline.com/


  ● Grabbing tables from a database search: try the Firefox plugin Outwit Hub
    (free version stores 100 results; buy a licence for more)

City Journalism - Magazines MA - week 8 - Data journalism
