SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Downloaden Sie, um offline zu lesen
Scraping the
      Olympics

Paul Bradshaw, author: Scraping for Journalists
                                 *
        Leanpub.com/scrapingforjournalists
?
Scraping basics
Combining data
Finding stories in data


                      *
*
Function (Parameters)
                  *
Function (Parameters)
=SUM(A2:A50)
=AVERAGE(B2:B300)
=COUNTIF(A10:A3000,”Smith”)


                       *
(“string”, index)
                *
Tip: search for
documentation     *
Tip: search for structure
      around data   *
*
//div[starts-with(@
class, ‘jobWrap’)]*
*
Combining data


          *
?
Question:
Which torchbearers are
from Dorset?


                    *
*
*
*
*
*
*
*
*
?
Finding leads:
Corporate torchbearers?



                   *
*
*
*
*
New entries - or
disappearing ones
               *
*
*
*
*
Leanpub.com/scrapingforjournalists
                      @paulbradshaw
             onlinejournalismblog.com
               helpmeinvestigate.com
        slideshare.net/onlinejournalist
                            *
      linkedin.com/in/onlinejournalist

Weitere ähnliche Inhalte

Mehr von Paul Bradshaw

How to generate a 100+ page website using parameterisation in R
How to generate a 100+ page website using parameterisation in RHow to generate a 100+ page website using parameterisation in R
How to generate a 100+ page website using parameterisation in RPaul Bradshaw
 
ChatGPT (and generative AI) in journalism
ChatGPT (and generative AI) in journalismChatGPT (and generative AI) in journalism
ChatGPT (and generative AI) in journalismPaul Bradshaw
 
Data journalism: history and roles
Data journalism: history and rolesData journalism: history and roles
Data journalism: history and rolesPaul Bradshaw
 
Working on data stories: different approaches
Working on data stories: different approachesWorking on data stories: different approaches
Working on data stories: different approachesPaul Bradshaw
 
Visual journalism: gifs, emoji, memes and other techniques
Visual journalism: gifs, emoji, memes and other techniquesVisual journalism: gifs, emoji, memes and other techniques
Visual journalism: gifs, emoji, memes and other techniquesPaul Bradshaw
 
Using narrative structures in shortform and longform journalism
Using narrative structures in shortform and longform journalismUsing narrative structures in shortform and longform journalism
Using narrative structures in shortform and longform journalismPaul Bradshaw
 
Narrative and multiplatform journalism (part 1)
Narrative and multiplatform journalism (part 1)Narrative and multiplatform journalism (part 1)
Narrative and multiplatform journalism (part 1)Paul Bradshaw
 
Teaching data journalism (Abraji 2021)
Teaching data journalism (Abraji 2021)Teaching data journalism (Abraji 2021)
Teaching data journalism (Abraji 2021)Paul Bradshaw
 
Data journalism on the air: 3 tips
Data journalism on the air: 3 tipsData journalism on the air: 3 tips
Data journalism on the air: 3 tipsPaul Bradshaw
 
7 angles for data stories
7 angles for data stories7 angles for data stories
7 angles for data storiesPaul Bradshaw
 
Uncertain times, stories of uncertainty
Uncertain times, stories of uncertaintyUncertain times, stories of uncertainty
Uncertain times, stories of uncertaintyPaul Bradshaw
 
Ergodic education (online teaching and interactivity)
Ergodic education (online teaching and interactivity)Ergodic education (online teaching and interactivity)
Ergodic education (online teaching and interactivity)Paul Bradshaw
 
Storytelling in the database era: uncertainty and science reporting
Storytelling in the database era: uncertainty and science reportingStorytelling in the database era: uncertainty and science reporting
Storytelling in the database era: uncertainty and science reportingPaul Bradshaw
 
Cognitive bias: a quick guide for journalists
Cognitive bias: a quick guide for journalistsCognitive bias: a quick guide for journalists
Cognitive bias: a quick guide for journalistsPaul Bradshaw
 
The 3 chords of data journalism
The 3 chords of data journalismThe 3 chords of data journalism
The 3 chords of data journalismPaul Bradshaw
 
Data journalism: what it is, how to use data for stories
Data journalism: what it is, how to use data for storiesData journalism: what it is, how to use data for stories
Data journalism: what it is, how to use data for storiesPaul Bradshaw
 
Teaching AI in data journalism
Teaching AI in data journalismTeaching AI in data journalism
Teaching AI in data journalismPaul Bradshaw
 
10 ways AI can be used for investigations
10 ways AI can be used for investigations10 ways AI can be used for investigations
10 ways AI can be used for investigationsPaul Bradshaw
 
Open Data Utopia? (SciCAR 19)
Open Data Utopia? (SciCAR 19)Open Data Utopia? (SciCAR 19)
Open Data Utopia? (SciCAR 19)Paul Bradshaw
 
Scraping for journalists - ideas, concepts and tips (CIJ Summer School 2019)
Scraping for journalists - ideas, concepts and tips (CIJ Summer School 2019)Scraping for journalists - ideas, concepts and tips (CIJ Summer School 2019)
Scraping for journalists - ideas, concepts and tips (CIJ Summer School 2019)Paul Bradshaw
 

Mehr von Paul Bradshaw (20)

How to generate a 100+ page website using parameterisation in R
How to generate a 100+ page website using parameterisation in RHow to generate a 100+ page website using parameterisation in R
How to generate a 100+ page website using parameterisation in R
 
ChatGPT (and generative AI) in journalism
ChatGPT (and generative AI) in journalismChatGPT (and generative AI) in journalism
ChatGPT (and generative AI) in journalism
 
Data journalism: history and roles
Data journalism: history and rolesData journalism: history and roles
Data journalism: history and roles
 
Working on data stories: different approaches
Working on data stories: different approachesWorking on data stories: different approaches
Working on data stories: different approaches
 
Visual journalism: gifs, emoji, memes and other techniques
Visual journalism: gifs, emoji, memes and other techniquesVisual journalism: gifs, emoji, memes and other techniques
Visual journalism: gifs, emoji, memes and other techniques
 
Using narrative structures in shortform and longform journalism
Using narrative structures in shortform and longform journalismUsing narrative structures in shortform and longform journalism
Using narrative structures in shortform and longform journalism
 
Narrative and multiplatform journalism (part 1)
Narrative and multiplatform journalism (part 1)Narrative and multiplatform journalism (part 1)
Narrative and multiplatform journalism (part 1)
 
Teaching data journalism (Abraji 2021)
Teaching data journalism (Abraji 2021)Teaching data journalism (Abraji 2021)
Teaching data journalism (Abraji 2021)
 
Data journalism on the air: 3 tips
Data journalism on the air: 3 tipsData journalism on the air: 3 tips
Data journalism on the air: 3 tips
 
7 angles for data stories
7 angles for data stories7 angles for data stories
7 angles for data stories
 
Uncertain times, stories of uncertainty
Uncertain times, stories of uncertaintyUncertain times, stories of uncertainty
Uncertain times, stories of uncertainty
 
Ergodic education (online teaching and interactivity)
Ergodic education (online teaching and interactivity)Ergodic education (online teaching and interactivity)
Ergodic education (online teaching and interactivity)
 
Storytelling in the database era: uncertainty and science reporting
Storytelling in the database era: uncertainty and science reportingStorytelling in the database era: uncertainty and science reporting
Storytelling in the database era: uncertainty and science reporting
 
Cognitive bias: a quick guide for journalists
Cognitive bias: a quick guide for journalistsCognitive bias: a quick guide for journalists
Cognitive bias: a quick guide for journalists
 
The 3 chords of data journalism
The 3 chords of data journalismThe 3 chords of data journalism
The 3 chords of data journalism
 
Data journalism: what it is, how to use data for stories
Data journalism: what it is, how to use data for storiesData journalism: what it is, how to use data for stories
Data journalism: what it is, how to use data for stories
 
Teaching AI in data journalism
Teaching AI in data journalismTeaching AI in data journalism
Teaching AI in data journalism
 
10 ways AI can be used for investigations
10 ways AI can be used for investigations10 ways AI can be used for investigations
10 ways AI can be used for investigations
 
Open Data Utopia? (SciCAR 19)
Open Data Utopia? (SciCAR 19)Open Data Utopia? (SciCAR 19)
Open Data Utopia? (SciCAR 19)
 
Scraping for journalists - ideas, concepts and tips (CIJ Summer School 2019)
Scraping for journalists - ideas, concepts and tips (CIJ Summer School 2019)Scraping for journalists - ideas, concepts and tips (CIJ Summer School 2019)
Scraping for journalists - ideas, concepts and tips (CIJ Summer School 2019)
 

Kürzlich hochgeladen

Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 

Kürzlich hochgeladen (20)

Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 

Scraping the Olympics