DMI Workshop: Crawling and Scraping
1. CRAWLING AND SCRAPING
Noortje Marres (Goldsmiths, University of London)
Michael Stevenson (Digital Methods Initiative, University of Amsterdam)
Esther Weltevrede (Digital Methods Initiative, University of Amsterdam)
Digital Methods Summer School, 28 June 2011
Wednesday, June 29, 2011
2. CRAWLING AND SCRAPING
Techniques for online data capture and analysis:
• Issue Crawler
• Lippmannian Device
Implications for research methods:
• dynamic data sets
• formatted data
Real-time research?
3. MAPPING NETWORKS WITH ISSUE CRAWLER
A Web-based tool for the location and visualization of
hyperlink networks on the Web
4. Locating issue networks on the Web
How to demarcate networks that have configured around
specific affairs on the Web?
To do this, Issue Crawler relies on:
• well-chosen starting points or Web pages that disclose activity
around a particular issue on the Web by way of hyperlinks
• the ‘intelligence’ of aggregated, live hyperlinking
6. More about hyperlink analysis
Issue Crawler performs iterations of co-link analysis
• the critique of ‘absolute’ citation measures (as in: PageRank)
Compare this with co-citation analysis in the social studies of
science (Callon et al., 1983)
• topical relevance vs overall popularity
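A minimal sketch of the contrast on this slide, assuming a link list has already been collected. The domain names and the threshold of two linking starting points are illustrative assumptions, not Issue Crawler's actual code or settings. An absolute count rewards whatever receives the most links overall; the co-link filter keeps only targets that several starting points agree on.

```python
from collections import Counter, defaultdict

def absolute_inlinks(links):
    """Count every inlink, whoever it comes from (overall popularity)."""
    return Counter(target for _source, target in links)

def co_linked(links, starting_points, min_linkers=2):
    """Keep targets linked to by at least `min_linkers` distinct starting points."""
    linkers = defaultdict(set)
    for source, target in links:
        if source in starting_points and target != source:
            linkers[target].add(source)
    return sorted(t for t, who in linkers.items() if len(who) >= min_linkers)

# Hypothetical (source, target) links; none of these are real crawl results.
links = [
    ("ngo-a.example", "issue-hub.example"),
    ("ngo-b.example", "issue-hub.example"),
    ("ngo-a.example", "portal.example"),
    ("blog-1.example", "portal.example"),
    ("blog-2.example", "portal.example"),
    ("blog-3.example", "portal.example"),
]
starting_points = {"ngo-a.example", "ngo-b.example"}

most_popular = absolute_inlinks(links).most_common(1)[0][0]
print(most_popular)                        # the most linked-to target overall
print(co_linked(links, starting_points))   # targets the starting points share
```

In this toy data the portal wins on absolute inlinks, yet only the issue hub is co-linked by the starting points.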
7. Issue Crawler as a tool of online social research (1/2)
To perform immanent critique of the supposed ‘egalitarianism’
of the Internet:
to highlight specific asymmetries of relevance and/or authority among
organizations’ Web pages
To deploy hyperlink analysis for purposes of issue analysis
in the politics of issues, “experts and activists define issues by sharing
information about them” (Heclo, 1974)
14. Crawling and analysis
Crawling
to a certain depth
Analysis
snowball
inter-actor
co-link
Iterate (optional)
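Crawling "to a certain depth" can be sketched as a breadth-first traversal that stops following links once a page is too many steps from the starting points. This is an illustration under assumptions: `get_outlinks` is a hypothetical stand-in for fetching a page and extracting its links, and the graph below is invented.

```python
from collections import deque

def crawl(start_pages, get_outlinks, max_depth=2):
    """Breadth-first crawl; pages at max_depth are recorded but not expanded."""
    depth_of = {page: 0 for page in start_pages}
    queue = deque(start_pages)
    links = []
    while queue:
        page = queue.popleft()
        if depth_of[page] >= max_depth:
            continue  # deep enough: do not follow this page's links
        for target in get_outlinks(page):
            links.append((page, target))
            if target not in depth_of:
                depth_of[target] = depth_of[page] + 1
                queue.append(target)
    return depth_of, links

# Hypothetical link graph standing in for live Web pages.
GRAPH = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["e.example"],
    "d.example": ["f.example"],
}
depths, links = crawl(["a.example"], lambda page: GRAPH.get(page, []), max_depth=2)
print(depths)
print(len(links), "links captured")
```

The captured link list is what an analysis step (snowball, inter-actor or co-link) would then work on.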
15. Issue Crawler as a tool of online social research (2/2)
More generally, to adopt an empirical approach to the study of
public controversies:
• is there a network? (is there an issue?)
• who are the actors?
• how are they related?
• what are the issues?
• where are they happening?
16. Co-link settings
1 iteration ~ social or event network
2 iterations ~ issue network
3 iterations ~ establishment network
See http://www.govcom.org/scenarios_use.htm
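One way to read the iteration setting, sketched under the assumption that the co-linked results of one pass become the starting points of the next. The graph, the `get_outlinks` helper and the threshold of two are hypothetical; the scenarios page linked above remains the reference for how the tool itself behaves.

```python
from collections import defaultdict

def co_link_once(seeds, get_outlinks, min_linkers=2):
    """One pass: keep targets linked to by at least `min_linkers` seeds."""
    linkers = defaultdict(set)
    for seed in seeds:
        for target in get_outlinks(seed):
            if target != seed:
                linkers[target].add(seed)
    return {t for t, who in linkers.items() if len(who) >= min_linkers}

def co_link(seeds, get_outlinks, iterations=1):
    """Repeat the pass, feeding each result back in as the next seed set."""
    current = set(seeds)
    for _ in range(iterations):
        current = co_link_once(current, get_outlinks)
    return current

# Hypothetical outlinks per site.
GRAPH = {
    "s1.example": ["x.example", "y.example"],
    "s2.example": ["x.example", "y.example"],
    "x.example": ["z.example", "y.example"],
    "y.example": ["z.example", "x.example"],
    "z.example": ["x.example"],
}
outlinks = lambda site: GRAPH.get(site, [])
for n in (1, 2, 3):
    print(n, sorted(co_link(["s1.example", "s2.example"], outlinks, iterations=n)))
```

Each further iteration moves the result away from the starting points, which is why the choice of one, two or three iterations changes the kind of network that is returned.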
17. THE LIPPMANNIAN DEVICE*
Scraping and other digital methods skills
* a.k.a. The Google Scraper
18. WHEN SEARCH BECOMES RESEARCH
Turning Google into a research tool
19. WALTER LIPPMANN (1889-1974)
The Phantom Public, 1927
"The problem is to locate by clear
and coarse objective tests the
actor in a controversy who is most
worthy of public support" (p.120)
20. LIPPMANNIAN DEVICE - MODES OF ANALYSIS
Showing the partisanship of an actor.
Showing the issue agenda of an organization.
Issue Cloud: Issue agenda. Which issues are on the agenda of an organization or movement?
Source Cloud: Partisanship or commitment. Which sources mention the issue?
21. ISSUE CLOUD: GREENPEACE ISSUES
An organization’s issue agenda (or commitment)
Greenpeace has issues.
Which are they most committed to?
25. ISSUE CLOUD: GREENPEACE ISSUES
Greenpeace’s issue agenda (distribution of commitment)
Greenpeace's issue commitment. Greenpeace's campaign issue
list, ranked according to number of mentions of issues on
greenpeace.org, 11 October 2009.
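A sketch of how such a ranking could be turned into a cloud once the mention counts are in hand. The issue labels and counts below are placeholders, not the Greenpeace figures, and the linear font scaling is an assumption rather than the Tag Cloud Generator's actual rule.

```python
def issue_cloud(mentions, min_size=10, max_size=40):
    """Rank issues by mention count and give each a proportional font size."""
    ranked = sorted(mentions.items(), key=lambda item: item[1], reverse=True)
    top = ranked[0][1] if ranked else 0
    cloud = []
    for issue, count in ranked:
        if top == 0:
            size = min_size
        else:
            size = round(min_size + (max_size - min_size) * count / top)
        cloud.append((issue, count, size))
    return cloud

# Placeholder counts for illustration only.
mentions = {"issue A": 100, "issue B": 200, "issue C": 0}
cloud = issue_cloud(mentions)
for issue, count, size in cloud:
    print(f"{issue} ({count}) -> {size}pt")
```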
26. EXAMPLE: SOURCE CLOUD
Method for showing the partisanship or commitment of sources to names
Method
1. Gather source list (e.g. through Issue Crawler or top Google results)
2. Query source list for one or more experts
Digital Methods Initiative, 2007
27. SOURCE CLOUD: CLIMATE CHANGE SKEPTICS
Query design: What are the sources?
Climate Change Skeptics: Who recognizes them?
1. Top 100 results for the query “climate change”
http://www.google.com/search?q=%22climate+change%22&num=100
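The query URL on this slide can be assembled programmatically, which makes the role of the quotation marks explicit: they are percent-encoded as %22 and request an exact phrase. The sketch uses only the `q` and `num` parameters shown on the slide; whether Google still honours `num` is not something it checks, and the function name is an illustrative choice.

```python
from urllib.parse import urlencode

def google_query_url(phrase, num=100):
    """Build a search URL for an exact-phrase query, as on the slide."""
    params = {"q": f'"{phrase}"', "num": num}
    return "http://www.google.com/search?" + urlencode(params)

url = google_query_url("climate change")
print(url)
```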
28. SOURCE CLOUD: CLIMATE CHANGE SKEPTICS
Query design: What are the issues?
Derive list of climate change skeptics
Sources: motherjones.com, wikipedia.org, heartland.org
Compare the three lists and retain the skeptics that are
mentioned in at least two of the lists
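The "at least two of the lists" rule can be sketched as a count over the three lists. The names below are placeholders: which skeptic appears in which source list is not given on the slide, so no real assignment is implied.

```python
from collections import Counter

def in_at_least(lists, k=2):
    """Return names that appear in at least `k` of the given lists."""
    # set() ensures a name repeated within one list is counted once.
    counts = Counter(name for names in lists for name in set(names))
    return sorted(name for name, n in counts.items() if n >= k)

# Placeholder lists standing in for the three sources.
list_1 = ["Name A", "Name B", "Name C"]
list_2 = ["Name B", "Name C", "Name D"]
list_3 = ["Name C", "Name E"]
retained = in_at_least([list_1, list_2, list_3], k=2)
print(retained)
```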
29. SOURCE CLOUD: CLIMATE CHANGE SKEPTICS
Skeptics
S. Fred Singer
Robert Balling
Sallie Baliunas
Patrick Michaels
Richard Lindzen
Steven Milloy
Timothy Ball
Paul Driessen
Willie Soon
Sherwood B. Idso
Frederick Seitz
31. GOOGLE BLOCKING
Check query design before launching a scrape
Number of sources x number of issues = number of requests to Google
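A quick way to check a query design before launching is to enumerate the requests it implies. In this sketch the 100 source hosts are generated placeholders and the `site:` query form is an assumption about how a source is paired with a name; the eleven names are the skeptics listed earlier in the deck.

```python
from itertools import product

SKEPTICS = [
    "S. Fred Singer", "Robert Balling", "Sallie Baliunas", "Patrick Michaels",
    "Richard Lindzen", "Steven Milloy", "Timothy Ball", "Paul Driessen",
    "Willie Soon", "Sherwood B. Idso", "Frederick Seitz",
]

def plan_queries(sources, terms):
    """One request per (source, term) pair, with the term in exact quotes."""
    return [f'site:{source} "{term}"' for source, term in product(sources, terms)]

# Placeholder hosts standing in for a top-100 source list.
sources = [f"source{i}.example" for i in range(100)]
queries = plan_queries(sources, SKEPTICS)
print(len(queries), "requests")  # 100 sources x 11 names
print(queries[0])
```

With 100 sources and 11 names the design already implies 1,100 requests, which is the kind of volume the slide warns about.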
34. Climate Change Sceptics on the Web (Frederick Seitz)
Research Question_To what extent are climate change 'skeptics' present
in the climate change spaces on the Web?
Findings_There is distance between the skeptics and the top of the
search engine returns.
epa.gov (0) bbc.co.uk (0) defra.gov.uk (0) unep.org (0) bom.gov.au (0) ipcc.ch (0) pewclimate.org (0)
davidsuzuki.org (0) panda.org (0) mfe.govt.nz (0) ec.gc.ca (0) exploratorium.edu (0) climatechange.com.au (0)
greenpeace.org (0) climatechallenge.gov.uk (0) guardian.co.uk (0) iisd.org (0) g8.gov.uk (0) campaigncc.org (1)
foe.co.uk (0) state.gov (0) scidev.net (0) eea.europa.eu (0) whoi.edu (0) cbc.ca (0) energy.gov (0)
marshall.org (8) climateark.org (4) un.org (0) dar.csiro.au (0) theglobeandmail.com (0)
acfonline.org.au (0) gcrio.org (0) nature.com (0) grida.no (0) nature.org (0) ecokids.ca (0) royalsoc.ac.uk (0)
climatechangecentral.com (0) iea.org (0) ecn.ac.uk (0) ecy.wa.gov (0) worldwildlife.org (0)
realclimate.org (35)
metoffice.gov.uk (0) open2.net (0) scienceagogo.com (0) eldis.org (0) ft.com (0) who.int (0) climatecrisis.net (0)
faqs.org (0)
ltscotland.org.uk (0) abc.net.au (0) climatechange.ca.gov (0) envirolink.org (0) mofa.go.jp (0)
sourcewatch.org (21)
iucn.org (0) dfat.gov.au (0) ncdc.noaa.gov (0)
climatescience.gov (0) climatechangecollege.org (0) ciel.org (0) ucar.edu (0)
Source_google.com
Query_“Frederick Seitz”
Method_Search for query “Frederick Seitz” in top 100. Organized in order.
Tools_Google Scraper and Tag Cloud Generator
Date_30 July 2007
Product_of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis_by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design_Anne Helmond.
CLIMATE CHANGE SCEPTICS
CC_BY:NC:SA
35. Climate Change Sceptics on the Web (Steven Milloy)
Research Question_To what extent are climate change 'skeptics' present
in the climate change spaces on the Web?
Findings_There is distance between the skeptics and the top of the
search engine returns.
epa.gov (1) bbc.co.uk (0) defra.gov.uk (1) unep.org (1) bom.gov.au (0) ipcc.ch (1) pewclimate.org (1)
davidsuzuki.org (0) panda.org (0) mfe.govt.nz (0) ec.gc.ca (0) exploratorium.edu (0) climatechange.com.au (0)
greenpeace.org (1) climatechallenge.gov.uk (1) guardian.co.uk (0) iisd.org (0) g8.gov.uk (0)
campaigncc.org (0) foe.co.uk (0) state.gov (1) eea.europa.eu (1) whoi.edu (1) cbc.ca (0) energy.gov (1)
marshall.org (0) climateark.org (2) un.org (0) dar.csiro.au (1) theglobeandmail.com (0) acfonline.org.au (0)
gcrio.org (0) nature.com (0) grida.no (0) nature.org (1) ecokids.ca (0) climatechangecentral.com (0)
iea.org (0) ecn.ac.uk (1) ecy.wa.gov (1) worldwildlife.org (0)
realclimate.org (33)
open2.net (0) eldis.org (0) ft.com (0) who.int (1) climatecrisis.net (1)
faqs.org (0) metoffice.gov.uk (1)
ltscotland.org.uk (1) abc.net.au (0)
climatechange.ca.gov (1) envirolink.org (1) mofa.go.jp (1)
sourcewatch.org (27) iucn.org (0) dfat.gov.au (0)
ncdc.noaa.gov (1) climatescience.gov (0) climatechangecollege.org (1) ciel.org (0) ucar.edu (0)
Source_google.com
Query_“Stephen Milloy”
Method_Search for query “Stephen Milloy” in top 100. Organized in order.
Tools_Google Scraper and Tag Cloud Generator
Date_30 July 2007
Product_of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis_by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design_Anne Helmond.
CLIMATE CHANGE SCEPTICS
CC_BY:NC:SA
36. Climate Change Sceptics on the Web (S. Fred Singer)
Research Question_To what extent are climate change 'skeptics' present
in the climate change spaces on the Web?
Findings_There is distance between the skeptics and the top of the
search engine returns.
epa.gov (0) bbc.co.uk (0) defra.gov.uk (0) unep.org (0) bom.gov.au (0) ipcc.ch (0) pewclimate.org (0)
davidsuzuki.org (0) panda.org (0) mfe.govt.nz (0) ec.gc.ca (0) exploratorium.edu (0) climatechange.com.au (0)
greenpeace.org (1) climatechallenge.gov.uk (0) guardian.co.uk (0) iisd.org (0) g8.gov.uk (0) campaigncc.org (1)
foe.co.uk (0) state.gov (0) scidev.net (0) eea.europa.eu (0) whoi.edu (0) cbc.ca (0) energy.gov (0)
marshall.org (0) climateark.org (1) un.org (0) dar.csiro.au (0) theglobeandmail.com (0) acfonline.org.au (0)
gcrio.org (0) nature.com (0) grida.no (0) nature.org (0) ecokids.ca (0) royalsoc.ac.uk (0)
climatechangecentral.com (0) iea.org (0) ecn.ac.uk (0) ecy.wa.gov (0) worldwildlife.org (0)
realclimate.org (14) faqs.org (0) metoffice.gov.uk (0) open2.net (0) scienceagogo.com (0)
eldis.org (0) ft.com (0) who.int (0) climatecrisis.net (0) ltscotland.org.uk (0) abc.net.au (0) climatechange.ca.gov (0)
sourcewatch.org (64)
envirolink.org (0) mofa.go.jp (0)
iucn.org (0) dfat.gov.au (0) ncdc.noaa.gov (0) climatescience.gov (11)
climatechangecollege.org (0) ciel.org (0) ucar.edu (0)
Source_google.com
Query_“Fred Singer”
Method_Search for query “Fred Singer” in top 100. Organized in order.
Tools_Google Scraper and Tag Cloud Generator
Date_30 July 2007
Product_of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis_by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design_Anne Helmond.
CLIMATE CHANGE SCEPTICS
CC_BY:NC:SA
37. LIPPMANNIAN DEVICE
Modes of analysis
Issue agenda check. What are the current commitments of an
organization(s)?
Use the issue cloud
Partisanship check. Which side is an actor on?
Use the source cloud
38. Tools and references
http://tools.digitalmethods.net
http://digitalmethods.net
http://govcom.org
40. EXERCISE: SOURCING CLIMATE CHANGE SKEPTICS
Research Question:
Which climate change issue actors mention the skeptics, and
what kinds of actors are more likely to mention them?
Method:
Comparative: query the skeptics in two source sets (‘top’ sources
and climate change blogs), outputting a source cloud.
41. SOURCE SETS
(1) Top ten Google returns for “climate change” (mix of media
as well as governmental organizations)
42. SOURCE SETS
(2) Climate change blogs network (Issue Crawler results - mix of
‘establishment’ blogs, media and governmental and
non-governmental organizations)
43. EXERCISE: SOURCING CLIMATE CHANGE SKEPTICS
Steps:
- Acquire source sets and skeptics list from Michael.
- Launch the Lippmannian device (aka Google Scraper - see
tools.digitalmethods.net).
- Enter source sets and skeptics' names. Query the source sets
separately, and remember to put each name in quotation marks (“”) to get exact matches.
- Wait. Use this moment to discuss hypotheses.
- Explore the output, and present findings.
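When exploring the output, one simple comparison is the share of sources in each set that mention a skeptic at all. The sketch below assumes the scraper output has been reduced to a mention count per source; the hosts and counts are invented for illustration and are not findings.

```python
def share_mentioning(counts_by_source):
    """Fraction of sources with at least one mention of the queried name."""
    if not counts_by_source:
        return 0.0
    hits = sum(1 for count in counts_by_source.values() if count > 0)
    return hits / len(counts_by_source)

# Invented counts for one skeptic in each source set.
top_sources = {"gov.example": 0, "news.example": 1, "ngo.example": 0, "intl.example": 0}
blog_network = {"blog-a.example": 12, "blog-b.example": 0, "blog-c.example": 3, "blog-d.example": 7}

top_share = share_mentioning(top_sources)
blog_share = share_mentioning(blog_network)
print(f"top sources: {top_share:.0%}, blogs: {blog_share:.0%}")
```

Repeating this per skeptic gives a rough table to set beside the two source clouds when presenting findings.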