CRAWLING AND SCRAPING




            Noortje Marres (Goldsmiths, University of London)

            Michael Stevenson (Digital Methods Initiative, University of Amsterdam)

            Esther Weltevrede (Digital Methods Initiative, University of Amsterdam)


            Digital Methods Summer School, 28 June 2011




Wednesday, June 29, 2011
CRAWLING AND SCRAPING




            Techniques for online data capture and analysis:

                  • Issue Crawler
                  • Lippmannian Device

            Implications for research methods:

                  • dynamic data sets
                  • formatted data

            Real-time research?


MAPPING NETWORKS WITH ISSUE CRAWLER




                       A Web-based tool for the location and visualization of
                                 hyperlink networks on the Web




Locating issue networks on the Web

            How to demarcate networks that have configured around
            specific affairs on the Web?

            To do this, Issue Crawler relies on:

                  • well-chosen starting points or Web pages that disclose activity
                  around a particular issue on the Web by way of hyperlinks

                  • the ‘intelligence’ of aggregated, live hyperlinking



Extractive Industries Review network, 2004


More about hyperlink analysis


            Issue Crawler performs iterations of co-link analysis

                  • the critique of ‘absolute’ citation measures (as in: PageRank)

            Compare this with co-citation analysis in the social studies of
            science (Callon et al., 1983)

                  • topical relevance vs overall popularity




Issue Crawler as a tool of online social research (1/2)

            To perform immanent critique of the supposed ‘egalitarianism’
            of the Internet:

                  to highlight specific asymmetries of relevance and/or authority among
                  organizations’ Web pages


                  To deploy hyperlink analysis for purposes of issue analysis

                  in the politics of issues, “experts and activists define issues by sharing
                  information about them” (Heclo, 1974)




Issues in the Fergana Valley,
  Uzbekistan, according to the Web.
  Fall 2001

  Issues are on the Web, but which issues
  The characterisation of the Fergana Valley issues
  depends on the sites accessed.

  NGO addresses, phone numbers, and occasionally e-mail addresses available
  on the Web.

            Mumtozbegim
            Business Women's Association, Kokand
            Girl's Studio, Kokand
            Nozigim, Makhalla Women's Club, Kokand
            Manoviy Barkamollik Centre
            Malika Family Social and Legal Support Centre
            Iftizor Open Youth Club
            Soglom Avlod Uchun Charity Foundation*
            Abdulla Kadyri Foundation, Kokand
            Consumer Rights Protection Society, Fergana
            Jamila Charitable Foundation
            Iftizor Centre, Kokand
            MADAD Center for the lonely, aged and disabled, Andijan
            Red Crescent Society, Kokand*
            Mehr-Sahovat Charitable Centre
            ECOSAN International Foundation, Kokand*
            Mussafo Ecological Centre, Kokand

            * GONGO (government-organised NGO)

  No Internet, no issues from the ground
  NGOs in the Fergana Valley may not have Web sites,
  but their issues are on the Web.




Issue Crawler




            How to use it




Issue Crawler




            http://issuecrawler.net

                  Request account and log in




Issue Crawler lobby



            News
              workshops, software

            Queue
              time sharing

            Current
              three simultaneous crawlers



Issue Crawler harvester




            Enter text, URLs will be stripped out
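
            A minimal sketch of what "stripping out" URLs from pasted text
            could look like. The regular expression and the function name
            harvest_urls are illustrative assumptions, not the Issue Crawler
            harvester's actual code.

```python
import re

# Hypothetical pattern: an http(s) URL runs until whitespace, a quote or a bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def harvest_urls(text):
    """Return the unique URLs found in free text, in order of first appearance."""
    found = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in found:
            found.append(url)
    return found


sample = "See http://issuecrawler.net and http://govcom.org, then http://issuecrawler.net again."
print(harvest_urls(sample))
# ['http://issuecrawler.net', 'http://govcom.org']
```

            Duplicates are dropped so that each starting point enters the
            crawl only once.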




Crawling and analysis

            Crawling
              to a certain depth

            Analysis
              snowball
              inter-actor
              co-link

            Iterate (optional)
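
            A simplified sketch of co-link analysis on a toy link graph,
            assuming that a target is retained when at least two of the
            current starting points link to it, and that each iteration
            uses the retained set as the new starting points. Crawl depth
            is not modelled; the hosts, the threshold parameter and the
            function names are made up for illustration.

```python
from collections import defaultdict


def colink_step(seeds, outlinks, min_inlinks=2):
    """Keep targets that receive links from at least `min_inlinks` distinct seeds."""
    linkers = defaultdict(set)
    for source in seeds:
        for target in outlinks.get(source, ()):
            if target != source:  # ignore self-links
                linkers[target].add(source)
    return {t for t, sources in linkers.items() if len(sources) >= min_inlinks}


def colink_analysis(seeds, outlinks, iterations=1):
    """Repeat the co-link step, feeding each result in as the next seed set."""
    current = set(seeds)
    for _ in range(iterations):
        current = colink_step(current, outlinks)
    return current


# Toy link graph with invented hosts: host -> hosts it links to.
toy_links = {
    "a.example": ["c.example", "d.example"],
    "b.example": ["c.example", "d.example", "e.example"],
    "c.example": ["d.example", "f.example"],
    "d.example": ["c.example", "f.example"],
}

print(sorted(colink_analysis(["a.example", "b.example"], toy_links, iterations=1)))
# ['c.example', 'd.example']
print(sorted(colink_analysis(["a.example", "b.example"], toy_links, iterations=2)))
# ['f.example']
```

            In the toy graph e.example drops out because only one starting
            point links to it.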




Issue Crawler as a tool of online social research (2/2)


            More generally, to adopt an empirical approach to the study of
            public controversies:

                  • is there a network? (is there an issue?)
                  • who are the actors?
                  • how are they related?
                  • what are the issues?
                  • where are they happening?




Co-link settings


            1 iteration ~ social or event network

            2 iterations ~ issue network

            3 iterations ~ establishment network


            See http://www.govcom.org/scenarios_use.htm




THE LIPPMANNIAN DEVICE*

            Scraping and other digital methods skills




            * a.k.a. The Google Scraper




WHEN SEARCH BECOMES RESEARCH




                  Turning Google into a research tool




WALTER LIPPMANN (1889-1974)

            The Phantom Public, 1927




            "The problem is to locate by clear
            and coarse objective tests the
            actor in a controversy who is most
            worthy of public support" (p.120)




LIPPMANNIAN DEVICE - MODES OF ANALYSIS
            Issue cloud: issue agenda.
            Showing the issue agenda of an organization.
            Which issues are on the agenda of an organization or movement?

            Source cloud: partisanship or commitment.
            Showing the partisanship of an actor.
            Which sources mention the issue?




ISSUE CLOUD: GREENPEACE ISSUES

            An organization’s issue agenda (or commitment)




            Greenpeace has issues.
            Which are they most committed to?




ISSUE CLOUD: GREENPEACE ISSUES

            Greenpeace issues, http://www.greenpeace.org/international/campaigns.

            Stop climate change
            Protect ancient forests
            Defending our Oceans
            Say no to genetic engineering
            Eliminate toxic chemicals
            Demand Peace and Disarmament
            End the nuclear age
            Encourage sustainable trade

            Keep most significant issue language:
            "climate change"
            "ancient forests"
            "oceans"
            "genetic engineering"
            "toxic chemicals"
            "disarmament"
            "nuclear power"
            "sustainable trade"

            ---> Query Design workshop
ISSUE CLOUD: GREENPEACE ISSUES

            Greenpeace’s issue agenda (distribution of commitment)




            Greenpeace's issue commitment. Greenpeace's campaign issue
            list, ranked according to number of mentions of issues on
            greenpeace.org, 11 October 2009.
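
            A sketch of how the issue list could be turned into one query
            per issue, restricted to the organization's site. It assumes the
            scrape relies on Google's site: operator with exact-phrase
            quotes; the function name site_queries is hypothetical.

```python
def site_queries(site, issues):
    """Build one exact-phrase, site-restricted query string per issue."""
    return [f'site:{site} "{issue}"' for issue in issues]


# Issue language taken from the Greenpeace campaign list in this deck.
greenpeace_issues = [
    "climate change",
    "ancient forests",
    "oceans",
    "genetic engineering",
    "toxic chemicals",
    "disarmament",
    "nuclear power",
    "sustainable trade",
]

for query in site_queries("greenpeace.org", greenpeace_issues):
    print(query)
```

            Ranking the issues by the number of results each query returns
            gives the distribution of commitment shown in the issue cloud.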
EXAMPLE: SOURCE CLOUD

            Method for showing the partisanship or commitment of sources to names




            Method
            1. Gather source list (e.g. through Issue Crawler or top Google
            results)
            2. Query source list for one or more experts




            Digital Methods Initiative, 2007




SOURCE CLOUD: CLIMATE CHANGE SKEPTICS

            Query design: What are the sources?




            Climate Change Skeptics: Who recognizes them?

            1. Top 100 results for the query “climate change”

            http://www.google.com/search?q=%22climate+change%22&num=100
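
            The query URL above can be rebuilt with Python's standard
            library, which shows where %22 (the quotation mark) and + (the
            space) come from. The function name google_search_url is an
            assumption made for this sketch.

```python
from urllib.parse import urlencode


def google_search_url(phrase, num=100):
    """Return a search URL for an exact phrase, asking for `num` results."""
    params = {"q": f'"{phrase}"', "num": num}
    return "http://www.google.com/search?" + urlencode(params)


print(google_search_url("climate change"))
# http://www.google.com/search?q=%22climate+change%22&num=100
```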




SOURCE CLOUD: CLIMATE CHANGE SKEPTICS

            Query design: What are the issues?




            Derive list of climate change skeptics
               Sources: motherjones.com, wikipedia.org, heartland.org

                     Compare the three lists and retain the skeptics that are
                     mentioned in at least two of the lists
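
            A small sketch of the "at least two of three lists" rule. The
            three lists below are placeholders, not the actual contents of
            the motherjones.com, wikipedia.org and heartland.org lists, and
            the function name in_at_least is hypothetical.

```python
from collections import Counter


def in_at_least(lists, k=2):
    """Return the names that occur in at least `k` of the given lists."""
    counts = Counter(name for names in lists for name in set(names))
    return sorted(name for name, n in counts.items() if n >= k)


# Placeholder lists standing in for the three sources.
list_a = ["Skeptic A", "Skeptic B", "Skeptic C"]
list_b = ["Skeptic B", "Skeptic C", "Skeptic D"]
list_c = ["Skeptic C", "Skeptic E"]

print(in_at_least([list_a, list_b, list_c]))
# ['Skeptic B', 'Skeptic C']
```

            Each list is reduced to a set first, so a name repeated within
            one list still counts only once.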




SOURCE CLOUD: CLIMATE CHANGE SKEPTICS

            Skeptics




            S. Fred Singer
            Robert Balling
            Sallie Baliunas
            Patrick Michaels
            Richard Lindzen
            Steven Milloy
            Timothy Ball
            Paul Driessen
            Willie Soon
            Sherwood B. Idso
            Frederick Seitz



GOOGLE BLOCKING




            Check query design before launching a scrape

            Number of sources x number of issues = number of requests to
            Google
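
            A worked instance of the formula, using numbers from this deck:
            100 top sources and the 11 skeptic names. The source hosts below
            are stand-ins, and the function name planned_requests is an
            assumption.

```python
from itertools import product


def planned_requests(sources, queries):
    """List every (source, query) pair, one request per pair."""
    return list(product(sources, queries))


# Stand-ins for the top 100 sources; the real hosts come from the search results.
sources = [f"source-{i}.example" for i in range(1, 101)]

skeptics = [
    "S. Fred Singer", "Robert Balling", "Sallie Baliunas", "Patrick Michaels",
    "Richard Lindzen", "Steven Milloy", "Timothy Ball", "Paul Driessen",
    "Willie Soon", "Sherwood B. Idso", "Frederick Seitz",
]

requests = planned_requests(sources, skeptics)
print(len(requests))
# 1100
```

            Checking this number before launching helps judge whether a
            scrape is likely to run into blocking.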




----> data visualization: clouding workshop

Climate Change Sceptics on the Web (Frederick Seitz)
  Research Question_To what extent are climate change 'skeptics' present
  in the climate change spaces on the Web?
  Findings_There is distance between the skeptics and the top of the
  search engine returns.



  Source cloud (mentions per source):

    epa.gov (0)  bbc.co.uk (0)  defra.gov.uk (0)  unep.org (0)  bom.gov.au (0)  ipcc.ch (0)  pewclimate.org (0)
    davidsuzuki.org (0)  panda.org (0)  mfe.govt.nz (0)  ec.gc.ca (0)  exploratorium.edu (0)  climatechange.com.au (0)
    greenpeace.org (0)  climatechallenge.gov.uk (0)  guardian.co.uk (0)  iisd.org (0)  g8.gov.uk (0)  campaigncc.org (1)
    foe.co.uk (0)  state.gov (0)  scidev.net (0)  eea.europa.eu (0)  whoi.edu (0)  cbc.ca (0)  energy.gov (0)
    marshall.org (8)  climateark.org (4)  un.org (0)  dar.csiro.au (0)  theglobeandmail.com (0)
    acfonline.org.au (0)  gcrio.org (0)  nature.com (0)  grida.no (0)  nature.org (0)  ecokids.ca (0)  royalsoc.ac.uk (0)
    climatechangecentral.com (0)  iea.org (0)  ecn.ac.uk (0)  ecy.wa.gov (0)  worldwildlife.org (0)
    realclimate.org (35)
    metoffice.gov.uk (0)  open2.net (0)  scienceagogo.com (0)  eldis.org (0)  ft.com (0)  who.int (0)  climatecrisis.net (0)
    faqs.org (0)
    ltscotland.org.uk (0)  abc.net.au (0)  climatechange.ca.gov (0)  envirolink.org (0)  mofa.go.jp (0)
    sourcewatch.org (21)
    iucn.org (0)  dfat.gov.au (0)  ncdc.noaa.gov (0)
    climatescience.gov (0)  climatechangecollege.org (0)  ciel.org (0)  ucar.edu (0)

Source_google.com
Query_“Frederick Seitz”
Method_Search for query “Frederick Seitz” in top 100. Organized in order.
Tools_Google Scraper and Tag Cloud Generator
Date_30 July 2007
Product_of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis_by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design_Anne Helmond.
CC_BY:NC:SA
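
  A sketch for reading such a source cloud programmatically, assuming the
  output is plain text in which every source appears as "host (count)". The
  pattern and the function name parse_cloud are illustrative; the sample
  entries are copied from the Frederick Seitz cloud.

```python
import re

# Hypothetical pattern for one cloud entry: a host name followed by "(count)".
ENTRY = re.compile(r"([a-z0-9.-]+\.[a-z]{2,})\s*\((\d+)\)")


def parse_cloud(text):
    """Map each source host in the cloud text to its number of mentions."""
    return {host: int(count) for host, count in ENTRY.findall(text)}


cloud = (
    "epa.gov (0) campaigncc.org (1) marshall.org (8) climateark.org (4) "
    "realclimate.org (35) sourcewatch.org (21)"
)
counts = parse_cloud(cloud)

# Sources that mention the name at least once, most mentions first.
mentioning = sorted((h for h, n in counts.items() if n > 0), key=lambda h: -counts[h])
print(mentioning)
# ['realclimate.org', 'sourcewatch.org', 'marshall.org', 'climateark.org', 'campaigncc.org']
```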




Climate Change Sceptics on the Web (Steven Milloy)
  Research Question_To what extent are climate change 'skeptics' present
  in the climate change spaces on the Web?
  Findings_There is distance between the skeptics and the top of the
  search engine returns.



  Source cloud (mentions per source):

    epa.gov (1)  bbc.co.uk (0)  defra.gov.uk (1)  unep.org (1)  bom.gov.au (0)  ipcc.ch (1)  pewclimate.org (1)
    davidsuzuki.org (0)  panda.org (0)  mfe.govt.nz (0)  ec.gc.ca (0)  exploratorium.edu (0)  climatechange.com.au (0)
    greenpeace.org (1)  climatechallenge.gov.uk (1)  guardian.co.uk (0)  iisd.org (0)  g8.gov.uk (0)
    campaigncc.org (0)  foe.co.uk (0)  state.gov (1)  eea.europa.eu (1)  whoi.edu (1)  cbc.ca (0)  energy.gov (1)
    marshall.org (0)  climateark.org (2)  un.org (0)  dar.csiro.au (1)  theglobeandmail.com (0)  acfonline.org.au (0)
    gcrio.org (0)  nature.com (0)  grida.no (0)  nature.org (1)  ecokids.ca (0)  climatechangecentral.com (0)
    iea.org (0)  ecn.ac.uk (1)  ecy.wa.gov (1)  worldwildlife.org (0)
    realclimate.org (33)
    open2.net (0)  eldis.org (0)  ft.com (0)  who.int (1)  climatecrisis.net (1)
    faqs.org (0)  metoffice.gov.uk (1)
    ltscotland.org.uk (1)  abc.net.au (0)
    climatechange.ca.gov (1)  envirolink.org (1)  mofa.go.jp (1)
    sourcewatch.org (27)  iucn.org (0)  dfat.gov.au (0)
    ncdc.noaa.gov (1)  climatescience.gov (0)  climatechangecollege.org (1)  ciel.org (0)  ucar.edu (0)

Source_google.com
Query_“Stephen Milloy”
Method_Search for query “Stephen Milloy” in top 100. Organized in order.
Tools_Google Scraper and Tag Cloud Generator
Date_30 July 2007
Product_of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis_by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design_Anne Helmond.
CC_BY:NC:SA




Climate Change Sceptics on the Web (S. Fred Singer)
  Research Question_To what extent are climate change 'skeptics' present
  in the climate change spaces on the Web?
  Findings_There is distance between the skeptics and the top of the
  search engine returns.



  Source cloud (mentions per source):

    epa.gov (0)  bbc.co.uk (0)  defra.gov.uk (0)  unep.org (0)  bom.gov.au (0)  ipcc.ch (0)  pewclimate.org (0)
    davidsuzuki.org (0)  panda.org (0)  mfe.govt.nz (0)  ec.gc.ca (0)  exploratorium.edu (0)  climatechange.com.au (0)
    greenpeace.org (1)  climatechallenge.gov.uk (0)  guardian.co.uk (0)  iisd.org (0)  g8.gov.uk (0)  campaigncc.org (1)
    foe.co.uk (0)  state.gov (0)  scidev.net (0)  eea.europa.eu (0)  whoi.edu (0)  cbc.ca (0)  energy.gov (0)
    marshall.org (0)  climateark.org (1)  un.org (0)  dar.csiro.au (0)  theglobeandmail.com (0)  acfonline.org.au (0)
    gcrio.org (0)  nature.com (0)  grida.no (0)  nature.org (0)  ecokids.ca (0)  royalsoc.ac.uk (0)
    climatechangecentral.com (0)  iea.org (0)  ecn.ac.uk (0)  ecy.wa.gov (0)  worldwildlife.org (0)
    realclimate.org (14)  faqs.org (0)  metoffice.gov.uk (0)  open2.net (0)  scienceagogo.com (0)
    eldis.org (0)  ft.com (0)  who.int (0)  climatecrisis.net (0)  ltscotland.org.uk (0)  abc.net.au (0)  climatechange.ca.gov (0)
    sourcewatch.org (64)
    envirolink.org (0)  mofa.go.jp (0)
    iucn.org (0)  dfat.gov.au (0)  ncdc.noaa.gov (0)  climatescience.gov (11)
    climatechangecollege.org (0)  ciel.org (0)  ucar.edu (0)

Source_google.com
Query_“Fred Singer”
Method_Search for query “Fred Singer” in top 100. Organized in order.
Tools_Google Scraper and Tag Cloud Generator
Date_30 July 2007
Product_of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis_by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design_Anne Helmond.
CC_BY:NC:SA




LIPPMANNIAN DEVICE

            Modes of analysis




            Issue agenda check. What are the current commitments of one or
            more organizations?
            Use the issue cloud

            Partisanship check. Which side is an actor on?
            Use the source cloud




Tools and references




            http://tools.digitalmethods.net
            http://digitalmethods.net
            http://govcom.org




EXERCISE: SOURCING CLIMATE CHANGE SKEPTICS



Research Question:
Which climate change issue actors mention the skeptics, and
what kinds of actors are more likely to mention them?

Method:
Comparative: query the skeptics in two source sets (‘top’ sources
and climate change blogs), outputting a source cloud.
SOURCE SETS

(1) Top ten Google returns for “climate change” (mix of media
as well as governmental organizations)

(2) Climate change blogs network (IssueCrawler results - mix of
‘establishment’ blogs, media and governmental and non-
governmental organizations)
EXERCISE: SOURCING CLIMATE CHANGE SKEPTICS

Steps:
- Acquire source sets and skeptics list from Michael.
- Launch the Lippmannian Device (aka Google Scraper; see
tools.digitalmethods.net).
- Enter source sets and skeptics' names. Query the source sets
separately, and remember to put each name in quotation marks ("") to get exact matches.
- Wait. Use this moment to discuss hypotheses.
- Explore the output, and present findings.
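
One way to compare the two source sets once the output is in: compute, per
set, the share of sources that mention a skeptic at least once. This is a
sketch only; the counts below are invented placeholders rather than findings,
and the function name share_mentioning is hypothetical.

```python
def share_mentioning(counts):
    """Return the fraction of sources with at least one mention."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n > 0) / len(counts)


# Invented placeholder counts (source -> mentions of one skeptic), NOT real results.
top_sources = {"top-1.example": 0, "top-2.example": 0, "top-3.example": 2, "top-4.example": 0}
blog_network = {"blog-1.example": 5, "blog-2.example": 0, "blog-3.example": 1, "blog-4.example": 3}

print(share_mentioning(top_sources))
# 0.25
print(share_mentioning(blog_network))
# 0.75
```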

Richard Rogers, Otherwise Engaged: Critical Analytics and the New Meanings of...
 
Digital Methods Tool Medley
Digital Methods Tool MedleyDigital Methods Tool Medley
Digital Methods Tool Medley
 
Digital Methods Summer School 2015 Tool Medley
Digital Methods Summer School 2015 Tool MedleyDigital Methods Summer School 2015 Tool Medley
Digital Methods Summer School 2015 Tool Medley
 
Rogers data days_2014_slides_opti
Rogers data days_2014_slides_optiRogers data days_2014_slides_opti
Rogers data days_2014_slides_opti
 
Digital Methods Summer School 2014 Tool Medley
Digital Methods Summer School 2014 Tool MedleyDigital Methods Summer School 2014 Tool Medley
Digital Methods Summer School 2014 Tool Medley
 
Rogers studyingpoliticalissues mar2014_optimized_ii_
Rogers studyingpoliticalissues mar2014_optimized_ii_Rogers studyingpoliticalissues mar2014_optimized_ii_
Rogers studyingpoliticalissues mar2014_optimized_ii_
 
Rogers digitalmethodsaftersocialmedia nov2013_optimized_
Rogers digitalmethodsaftersocialmedia nov2013_optimized_Rogers digitalmethodsaftersocialmedia nov2013_optimized_
Rogers digitalmethodsaftersocialmedia nov2013_optimized_
 
The Birth of Social Media Methods
The Birth of Social Media MethodsThe Birth of Social Media Methods
The Birth of Social Media Methods
 
Interactive visualization and exploration of network data with Gephi
Interactive visualization and exploration of network data with GephiInteractive visualization and exploration of network data with Gephi
Interactive visualization and exploration of network data with Gephi
 
National Tracking Ecologies - Digital Methods Summer School 2013
National Tracking Ecologies - Digital Methods Summer School 2013National Tracking Ecologies - Digital Methods Summer School 2013
National Tracking Ecologies - Digital Methods Summer School 2013
 
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
 
Tracking the Trackers tutorial at the Digital Methods Summer School 2013
Tracking the Trackers tutorial at the Digital Methods Summer School 2013Tracking the Trackers tutorial at the Digital Methods Summer School 2013
Tracking the Trackers tutorial at the Digital Methods Summer School 2013
 
Repurposing Wikipedia: Wikipedia as data set and analytical device
Repurposing Wikipedia: Wikipedia as data set and analytical deviceRepurposing Wikipedia: Wikipedia as data set and analytical device
Repurposing Wikipedia: Wikipedia as data set and analytical device
 
Crawling and Scraping tutorial at the Digital Methods Summer School 2013
Crawling and Scraping tutorial at the Digital Methods Summer School 2013Crawling and Scraping tutorial at the Digital Methods Summer School 2013
Crawling and Scraping tutorial at the Digital Methods Summer School 2013
 
Studying Facebook via Data Extraction: a Netvizz tutorial at the Digital Meth...
Studying Facebook via Data Extraction: a Netvizz tutorial at the Digital Meth...Studying Facebook via Data Extraction: a Netvizz tutorial at the Digital Meth...
Studying Facebook via Data Extraction: a Netvizz tutorial at the Digital Meth...
 
Digital Methods Summer School 2013 Tool Medley
Digital Methods Summer School 2013 Tool MedleyDigital Methods Summer School 2013 Tool Medley
Digital Methods Summer School 2013 Tool Medley
 
Hashtag lifelines
Hashtag lifelinesHashtag lifelines
Hashtag lifelines
 
Traces of the Trackers. Tracking the Trackers: A historical analysis using th...
Traces of the Trackers. Tracking the Trackers: A historical analysis using th...Traces of the Trackers. Tracking the Trackers: A historical analysis using th...
Traces of the Trackers. Tracking the Trackers: A historical analysis using th...
 

Kürzlich hochgeladen

AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 

Kürzlich hochgeladen (20)

AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 

DMI Workshop: Crawling and Scraping

  • 1. CRAWLING AND SCRAPING Noortje Marres (Goldsmiths, University of London) Michael Stevenson (Digital Methods Initiative, University of Amsterdam) Esther Weltevrede (Digital Methods Initiative, University of Amsterdam) Digital Methods Summer School, 28 June 2011 Wednesday, June 29, 2011
  • 2. CRAWLING AND SCRAPING Techniques for online data capture and analysis: • Issuecrawler • Lippmannian Device Implications for research methods: • dynamic data sets • formatted data Real-time research?
  • 3. MAPPING NETWORKS WITH ISSUE CRAWLER A Web-based tool for the location and visualization of hyperlink networks on the Web
  • 4. Locating issue networks on the Web How to demarcate networks that have configured around specific affairs on the Web? To do this, Issue Crawler relies on: • well-chosen starting points or Web pages that disclose activity around a particular issue on the Web by way of hyperlinks • the ‘intelligence’ of aggregated, live hyperlinking
  • 5. Extractive Industries Review network, 2004
  • 6. More about hyperlink analysis Issue Crawler performs iterations of co-link analysis • the critique of ‘absolute’ citation measures (as in: PageRank) Compare this with co-citation analysis in the social studies of science (Callon et al., 1983) • topical relevance vs overall popularity
  • 7. Issue Crawler as a tool of online social research (1/2) To perform immanent critique of the supposed ‘egalitarianism’ of the Internet: to highlight specific asymmetries of relevance and/or authority among organizations’ Web pages. To deploy hyperlink analysis for purposes of issue analysis: in the politics of issues, “experts and activists define issues by sharing information about them” (Heclo, 1974)
  • 8. Issues in the Fergana Valley, Uzbekistan, according to the Web. Fall 2001. Issues are on the Web, but which issues? The characterisation of the Fergana Valley issues depends on the sites accessed. NGO addresses, phone numbers, and occasionally e-mail addresses available on the Web: Mumtozbegim Business Women's Association, Kokand; Girl's Studio, Kokand; Nozigim, Makhalla Women's Club, Kokand; Manoviy Barkamollik Centre; Malika Family Social and Legal Support Centre; Iftizor Open Youth Club; Soglom Avlod Uchun Charity Foundation*; Abdulla Kadyri Foundation, Kokand; Consumer Rights Protection Society, Fergana; Jamila Charitable Foundation; Iftizor Centre, Kokand; MADAD Center for the lonely, aged and disabled, Andijan; Red Crescent Society, Kokand*
  • 9. NGO addresses, phone numbers, and occasionally e-mail addresses available on the Web: Mumtozbegim Business Women's Association, Kokand; Girl's Studio, Kokand; Nozigim, Makhalla Women's Club, Kokand; Manoviy Barkamollik Centre; Malika Family Social and Legal Support Centre; Iftizor Open Youth Club; Soglom Avlod Uchun Charity Foundation*; Abdulla Kadyri Foundation, Kokand; Consumer Rights Protection Society, Fergana; Jamila Charitable Foundation; Iftizor Centre, Kokand; MADAD Center for the lonely, aged and disabled, Andijan; Red Crescent Society, Kokand*; Mehr-Sahovat Charitable Centre; ECOSAN International Foundation, Kokand*; Mussafo Ecological Centre, Kokand. *GONGO (government-organised NGO). No Internet, no issues from the ground. NGOs in the Fergana Valley may not have Web sites, but their issues are on the Web.
  • 10. Issue Crawler How to use it
  • 11. Issue Crawler http://issuecrawler.net Request account and log in
  • 12. Issue Crawler lobby News workshops, software Queue time sharing Current three simultaneous crawlers
  • 13. Issue Crawler harvester Enter text, URLs will be stripped out
  • 14. Crawling and analysis Crawling: to a certain depth. Analysis: snowball, inter-actor, co-link. Iterate (optional).
  • 15. Issue Crawler as a tool of online social research (2/2) More generally, to adopt an empirical approach to the study of public controversies: • is there a network? (is there an issue?) • who are the actors? • how are they related? • what are the issues? • where are they happening?
  • 16. Co-link settings 1 iteration ~ social or event network; 2 iterations ~ issue network; 3 iterations ~ establishment network. See http://www.govcom.org/scenarios_use.htm
  • 17. THE LIPPMANNIAN DEVICE* Scraping and other digital methods skills * a.k.a. The Google Scraper
  • 18. WHEN SEARCH BECOMES RESEARCH Turning Google into a research tool
  • 19. WALTER LIPPMANN (1889-1974) The Phantom Public, 1927 "The problem is to locate by clear and coarse objective tests the actor in a controversy who is most worthy of public support" (p.120)
  • 20. LIPPMANNIAN DEVICE - MODES OF ANALYSIS Issue Cloud: Showing the issue agenda of an organization. Issue agenda. Which issues are on the agenda of an organization or movement? Source cloud: Showing the partisanship of an actor. Partisanship or commitment. Which sources mention the issue?
  • 21. ISSUE CLOUD: GREENPEACE ISSUES An organization’s issue agenda (or commitment) Greenpeace has issues. Which are they most committed to?
  • 23. ISSUE CLOUD: GREENPEACE ISSUES Greenpeace issues, http://www.greenpeace.org/international/campaigns: Stop climate change; Protect ancient forests; Defending our Oceans; Say no to genetic engineering; Eliminate toxic chemicals; Demand Peace and Disarmament; End the nuclear age; Encourage sustainable trade. Keep most significant issue language: "climate change", "ancient forests", "oceans", "genetic engineering", "toxic chemicals", "disarmament", "nuclear power", "sustainable trade". ---> Query Design workshop
  • 25. ISSUE CLOUD: GREENPEACE ISSUES Greenpeace’s issue agenda (distribution of commitment) Greenpeace's issue commitment. Greenpeace's campaign issue list, ranked according to number of mentions of issues on greenpeace.org, 11 October 2009.
  • 26. EXAMPLE: SOURCE CLOUD Method for showing the partisanship or commitment of sources to names. Method: 1. Gather source list (e.g. through Issuecrawler or top Google results). 2. Query source list for one or more experts. Digital Methods Initiative, 2007
  • 27. SOURCE CLOUD: CLIMATE CHANGE SKEPTICS Query design: What are the sources? Climate Change Skeptics: Who recognizes them? 1. Top 100 results for the query “climate change” http://www.google.com/search?q=%22climate+change%22&num=100
  • 28. SOURCE CLOUD: CLIMATE CHANGE SKEPTICS Query design: What are the issues? Derive list of climate change skeptics. Sources: motherjones.com, wikipedia.org, heartland.org. Compare the three lists and retain the skeptics that are mentioned in at least two of the lists.
  • 29. SOURCE CLOUD: CLIMATE CHANGE SKEPTICS Skeptics: S. Fred Singer, Robert Balling, Sallie Baliunas, Patrick Michaels, Richard Lindzen, Steven Milloy, Timothy Ball, Paul Driessen, Willie Soon, Sherwood B. Idso, Frederick Seitz
  • 31. GOOGLE BLOCKING Check query design before launching a scrape. Number of sources x number of issues = number of requests to Google
  • 33. ----> data visualization: clouding workshop
  • 34. Climate Change Sceptics on the Web (Frederick Seitz) Research Question: To what extent are climate change 'skeptics' present in the climate change spaces on the Web? Findings: There is distance between the skeptics and the top of the search engine returns. epa.gov (0) bbc.co.uk (0) defra.gov.uk (0) unep.org (0) bom.gov.au (0) ipcc.ch (0) pewclimate.org (0) davidsuzuki.org (0) panda.org (0) mfe.govt.nz (0) ec.gc.ca (0) exploratorium.edu (0) climatechange.com.au (0) greenpeace.org (0) climatechallenge.gov.uk (0) guardian.co.uk (0) iisd.org (0) g8.gov.uk (0) campaigncc.org (1) foe.co.uk (0) state.gov (0) scidev.net (0) eea.europa.eu (0) whoi.edu (0) cbc.ca (0) energy.gov (0) marshall.org (8) climateark.org (4) un.org (0) dar.csiro.au (0) theglobeandmail.com (0) acfonline.org.au (0) gcrio.org (0) nature.com (0) grida.no (0) nature.org (0) ecokids.ca (0) royalsoc.ac.uk (0) climatechangecentral.com (0) iea.org (0) ecn.ac.uk (0) ecy.wa.gov (0) worldwildlife.org (0) realclimate.org (35) metoffice.gov.uk (0) open2.net (0) scienceagogo.com (0) eldis.org (0) ft.com (0) who.int (0) climatecrisis.net (0) faqs.org (0) ltscotland.org.uk (0) abc.net.au (0) climatechange.ca.gov (0) envirolink.org (0) mofa.go.jp (0) sourcewatch.org (21) iucn.org (0) dfat.gov.au (0) ncdc.noaa.gov (0) climatescience.gov (0) climatechangecollege.org (0) ciel.org (0) ucar.edu (0) Source: google.com. Query: “Frederick Seitz”. Method: Search for query “Frederick Seitz” in top 100. Organized in order. Tools: Google Scraper and Tag Cloud Generator. Date: 30 July 2007. Product of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design: Anne Helmond. CLIMATE CHANGE SCEPTICS. CC BY-NC-SA
  • 35. Climate Change Sceptics on the Web (Steven Milloy) Research Question: To what extent are climate change 'skeptics' present in the climate change spaces on the Web? Findings: There is distance between the skeptics and the top of the search engine returns. epa.gov (1) bbc.co.uk (0) defra.gov.uk (1) unep.org (1) bom.gov.au (0) ipcc.ch (1) pewclimate.org (1) davidsuzuki.org (0) panda.org (0) mfe.govt.nz (0) ec.gc.ca (0) exploratorium.edu (0) climatechange.com.au (0) greenpeace.org (1) climatechallenge.gov.uk (1) guardian.co.uk (0) iisd.org (0) g8.gov.uk (0) campaigncc.org (0) foe.co.uk (0) state.gov (1) eea.europa.eu (1) whoi.edu (1) cbc.ca (0) energy.gov (1) marshall.org (0) climateark.org (2) un.org (0) dar.csiro.au (1) theglobeandmail.com (0) acfonline.org.au (0) gcrio.org (0) nature.com (0) grida.no (0) nature.org (1) ecokids.ca (0) climatechangecentral.com (0) iea.org (0) ecn.ac.uk (1) ecy.wa.gov (1) worldwildlife.org (0) realclimate.org (33) open2.net (0) eldis.org (0) ft.com (0) who.int (1) climatecrisis.net (1) faqs.org (0) metoffice.gov.uk (1) ltscotland.org.uk (1) abc.net.au (0) climatechange.ca.gov (1) envirolink.org (1) mofa.go.jp (1) sourcewatch.org (27) iucn.org (0) dfat.gov.au (0) ncdc.noaa.gov (1) climatescience.gov (0) climatechangecollege.org (1) ciel.org (0) ucar.edu (0) Source: google.com. Query: “Stephen Milloy”. Method: Search for query “Stephen Milloy” in top 100. Organized in order. Tools: Google Scraper and Tag Cloud Generator. Date: 30 July 2007. Product of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design: Anne Helmond. CLIMATE CHANGE SCEPTICS. CC BY-NC-SA
  • 36. Climate Change Sceptics on the Web (S. Fred Singer) Research Question: To what extent are climate change 'skeptics' present in the climate change spaces on the Web? Findings: There is distance between the skeptics and the top of the search engine returns. epa.gov (0) bbc.co.uk (0) defra.gov.uk (0) unep.org (0) bom.gov.au (0) ipcc.ch (0) pewclimate.org (0) davidsuzuki.org (0) panda.org (0) mfe.govt.nz (0) ec.gc.ca (0) exploratorium.edu (0) climatechange.com.au (0) greenpeace.org (1) climatechallenge.gov.uk (0) guardian.co.uk (0) iisd.org (0) g8.gov.uk (0) campaigncc.org (1) foe.co.uk (0) state.gov (0) scidev.net (0) eea.europa.eu (0) whoi.edu (0) cbc.ca (0) energy.gov (0) marshall.org (0) climateark.org (1) un.org (0) dar.csiro.au (0) theglobeandmail.com (0) acfonline.org.au (0) gcrio.org (0) nature.com (0) grida.no (0) nature.org (0) ecokids.ca (0) royalsoc.ac.uk (0) climatechangecentral.com (0) iea.org (0) ecn.ac.uk (0) ecy.wa.gov (0) worldwildlife.org (0) realclimate.org (14) faqs.org (0) metoffice.gov.uk (0) open2.net (0) scienceagogo.com (0) eldis.org (0) ft.com (0) who.int (0) climatecrisis.net (0) ltscotland.org.uk (0) abc.net.au (0) climatechange.ca.gov (0) sourcewatch.org (64) envirolink.org (0) mofa.go.jp (0) iucn.org (0) dfat.gov.au (0) ncdc.noaa.gov (0) climatescience.gov (11) climatechangecollege.org (0) ciel.org (0) ucar.edu (0) Source: google.com. Query: “Fred Singer”. Method: Search for query “Fred Singer” in top 100. Organized in order. Tools: Google Scraper and Tag Cloud Generator. Date: 30 July 2007. Product of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design: Anne Helmond. CLIMATE CHANGE SCEPTICS. CC BY-NC-SA
  • 37. LIPPMANNIAN DEVICE Modes of analysis. Issue agenda check: What are the current commitments of the organization(s)? Use the issue cloud. Partisanship check: Which side is an actor on? Use the source cloud.
  • 38. Tools and references http://tools.digitalmethods.net http://digitalmethods.net http://govcom.org
  • 39. Climate Change Sceptics on the Web (Frederick Seitz) Research Question: To what extent are climate change 'skeptics' present in the climate change spaces on the Web? Findings: There is distance between the skeptics and the top of the search engine returns. epa.gov (0) bbc.co.uk (0) defra.gov.uk (0) unep.org (0) bom.gov.au (0) ipcc.ch (0) pewclimate.org (0) davidsuzuki.org (0) panda.org (0) mfe.govt.nz (0) ec.gc.ca (0) exploratorium.edu (0) climatechange.com.au (0) greenpeace.org (0) climatechallenge.gov.uk (0) guardian.co.uk (0) iisd.org (0) g8.gov.uk (0) campaigncc.org (1) foe.co.uk (0) state.gov (0) scidev.net (0) eea.europa.eu (0) whoi.edu (0) cbc.ca (0) energy.gov (0) marshall.org (8) climateark.org (4) un.org (0) dar.csiro.au (0) theglobeandmail.com (0) acfonline.org.au (0) gcrio.org (0) nature.com (0) grida.no (0) nature.org (0) ecokids.ca (0) royalsoc.ac.uk (0) climatechangecentral.com (0) iea.org (0) ecn.ac.uk (0) ecy.wa.gov (0) worldwildlife.org (0) realclimate.org (35) metoffice.gov.uk (0) open2.net (0) scienceagogo.com (0) eldis.org (0) ft.com (0) who.int (0) climatecrisis.net (0) faqs.org (0) ltscotland.org.uk (0) abc.net.au (0) climatechange.ca.gov (0) envirolink.org (0) mofa.go.jp (0) sourcewatch.org (21) iucn.org (0) dfat.gov.au (0) ncdc.noaa.gov (0) climatescience.gov (0) climatechangecollege.org (0) ciel.org (0) ucar.edu (0) Source: google.com. Query: “Frederick Seitz”. Method: Search for query “Frederick Seitz” in top 100. Organized in order. Tools: Google Scraper and Tag Cloud Generator. Date: 30 July 2007. Product of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design: Anne Helmond. CLIMATE CHANGE SCEPTICS. CC BY-NC-SA
  • 40. EXERCISE: SOURCING CLIMATE CHANGE SKEPTICS Research Question: Which climate change issue actors mention the skeptics, and what kinds of actors are more likely to mention them? Method: Comparative. Query skeptics in two source sets (‘top’ sources and climate change blogs), outputting source cloud.
  • 41. SOURCE SETS (1) Top ten Google returns for “climate change” (mix of media as well as governmental organizations)
  • 42. SOURCE SETS (2) Climate change blogs network (IssueCrawler results - mix of ‘establishment’ blogs, media and governmental and non-governmental organizations)
  • 43. EXERCISE: SOURCING CLIMATE CHANGE SKEPTICS Steps: - Acquire source sets and skeptics list from Michael. - Launch the Lippmannian device (aka Google Scraper - see tools.digitalmethods.net). - Enter source sets and skeptics names. Query the source sets separately, and remember to use quotation marks (“”) to get exact returns. - Wait. Use this moment to discuss hypotheses. - Explore the output, and present findings.
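
A minimal sketch of the query design behind the exercise, assuming the device issues one exact-phrase query per source and name pair. The `site:` restriction, the function names and the short source and name lists are illustrative assumptions rather than the Google Scraper's documented behaviour. The URL shape follows slide 27, and no request is sent.

```python
from itertools import product
from urllib.parse import urlencode


def build_queries(sources, names):
    """Return one site-restricted, exact-phrase query per (source, name) pair."""
    # Assumption: a `site:` operator restricts each query to one source.
    return [f'site:{source} "{name}"' for source, name in product(sources, names)]


def search_url(query, num=100):
    """Build a search URL with the q and num parameters shown on slide 27."""
    return "http://www.google.com/search?" + urlencode({"q": query, "num": num})


# Hypothetical, shortened inputs taken from names that appear in the slides.
sources = ["realclimate.org", "sourcewatch.org", "epa.gov"]
names = ["Frederick Seitz", "S. Fred Singer"]

queries = build_queries(sources, names)

# Slide 31: number of sources x number of issues = number of requests.
print(len(queries), "requests would be needed")
print(queries[0])
print(search_url('"climate change"'))
```

Counting the queries before launching anything is the check slide 31 asks for: the request total grows multiplicatively with the two lists, so trimming either list is the simplest way to stay under a blocking threshold.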
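
As a rough illustration of the co-link analysis named on slides 6, 14 and 16, the sketch below keeps only the pages that receive links from at least two of the starting points, then optionally repeats the step with the retained pages as the new starting points. The threshold of two, the iteration rule, the function names and the `.example` hosts are assumptions made for illustration; this is not Issue Crawler's own code.

```python
from collections import defaultdict


def colink(outlinks, seeds, min_inlinks=2):
    """Return targets linked to by at least `min_inlinks` distinct seeds."""
    linked_by = defaultdict(set)
    for seed in seeds:
        for target in outlinks.get(seed, ()):
            if target != seed:  # ignore self-links
                linked_by[target].add(seed)
    return {t for t, srcs in linked_by.items() if len(srcs) >= min_inlinks}


def iterate_colink(outlinks, seeds, iterations=1):
    """Repeat co-link analysis, feeding each result in as the next seed set."""
    current = set(seeds)
    for _ in range(iterations):
        current = colink(outlinks, current)
    return current


# Hypothetical link data: host -> hosts it links to.
outlinks = {
    "a.example": ["c.example", "d.example"],
    "b.example": ["c.example", "e.example"],
    "c.example": ["d.example"],
}

one_pass = iterate_colink(outlinks, ["a.example", "b.example"], iterations=1)
two_passes = iterate_colink(outlinks, ["a.example", "b.example"], iterations=2)
print(one_pass)
print(two_passes)
```

Only `c.example` is linked to by both seeds, so it alone survives the first pass; a relative measure of this kind rewards topical co-citation rather than overall link popularity, which is the contrast slide 6 draws.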