Scraping in 60 minutes (CIJ Summer School 2019)

Workshop at the Centre for Investigative Journalism Summer School, July 2019, introducing useful tools for scraping database search results and Twitter.

Published in: Education

  1. Paul Bradshaw (Leanpub.com/scrapingforjournalists*): Scraping in 60 mins
  2. How do you scrape? (Aron Pilhofer, News Rewired)
  3. Scraping tools. WYSIWYG tools: OutWit Hub, Apify; browser extensions: Web Scraper, Grepsr; Google Sheets’ =IMPORT functions; Workbench Data; IFTTT; Open Refine; Morph.io
  4. OutWit Hub
  5. Chrome extensions:
  6. Edit column > Add column by fetching URLs…
  7. https://ifttt.com/channels
  8. https://apify.com/apify/google-search-scraper
  9. https://app.workbenchdata.com/workflows/
  10. app.workbenchdata.com/workflows/22852, /22850, /25739
  11. https://onlinejournalismblog.com/2013/09/18/ethics-in-data-journalism-mass-data-gathering-scraping-foi-and-deception/
  12. Robots.txt: http://www.tcij.org/robots.txt
  13. Legal considerations: database rights, data copyright, terms & conditions
  14. https://moveplanner.zoopla.co.uk/terms-and-conditions
  15. Data is just a lead. Treat like any source: build in TGTBT ("too good to be true") checks; seek second sources; seek right of reply/confirmation
  16. http://www.storybench.org/to-scrape-or-not-to-scrape-the-technical-and-ethical-challenges-of-collecting-data-off-the-web/
  17. Does it have an API? https://www.mediawiki.org/wiki/API:Main_page
  18. https://github.com/BBC-Data-Unit/music-festivals
  19. Paul Bradshaw (Leanpub.com/scrapingforjournalists*): Thank you.
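
The robots.txt slide points at a site's crawler rules. As a rough illustration of how those rules could be checked before scraping, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The sample rules, the `example.org` URLs, the `my-newsroom-scraper` user-agent name and the `build_parser` helper are all made up for this example; they are not the contents of the tcij.org file and are not taken from the workshop.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "my-newsroom-scraper" is a made-up user-agent name.
allowed_public = parser.can_fetch(
    "my-newsroom-scraper", "https://example.org/reports/2019"
)
allowed_private = parser.can_fetch(
    "my-newsroom-scraper", "https://example.org/private/list"
)

# Seconds the site asks crawlers to wait between requests, if stated.
delay = parser.crawl_delay("my-newsroom-scraper")
```

To check a live site instead of a pasted string, `RobotFileParser` also offers `set_url()` followed by `read()`. A passing check only says what the site asks of crawlers; the legal and ethical questions on the later slides still apply.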

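The "Does it have an API?" slide links to the MediaWiki API documentation. The sketch below shows, under stated assumptions, what asking an API instead of scraping pages might look like: it builds a search request for the MediaWiki Action API (`action=query`, `list=search`) against the `www.mediawiki.org` endpoint. The helper names (`build_search_url`, `extract_titles`, `search_titles`) and the user-agent string are hypothetical and not from the workshop.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint: the Action API of the wiki that hosts the docs.
API_ENDPOINT = "https://www.mediawiki.org/w/api.php"


def build_search_url(term: str, limit: int = 5) -> str:
    """Build a full-text search URL for the MediaWiki Action API."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_titles(response: dict) -> list:
    """Pull page titles out of a decoded search response."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]


def search_titles(term: str, limit: int = 5) -> list:
    """Run the search over the network and return matching page titles."""
    request = Request(
        build_search_url(term, limit),
        # Made-up user-agent; identify your own scraper honestly.
        headers={"User-Agent": "scraping-workshop-example/0.1"},
    )
    with urlopen(request, timeout=10) as reply:
        return extract_titles(json.load(reply))
```

Calling `search_titles("scraping")` needs network access. `build_search_url` and `extract_titles` are kept separate so the request and the parsing can be inspected without contacting the server.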