This document provides an agenda for a session on big data that will include readings and discussion. Four readings related to big data and geography will be discussed, looking at common themes, concepts, and questions. There will also be an exercise assessing big data characteristics. The session will conclude with an introduction to the Programmable City project, a research initiative studying how cities are programmed through data and algorithms.
2. Plan
1. 4 readings
2. Brainstorm and discuss commonalities and
outliers
3. Brainstorm & discuss each paper – definitions,
concepts, ideas, conclusions, concerns,
dislikes, new ideas...
4. Look at some maps & discuss
5. Do a big data assessment exercise based on
Kitchin’s big data definition
6. Introduction to the Programmable City Project
3. Readings:
Mark Graham and Taylor Shelton, 2013, Geography and the future of big data, big data
and the future of geography, Dialogues in Human Geography 3:255, available at
http://dhg.sagepub.com/content/3/3/255 (5 pages)
Rob Kitchin and Tracey P. Lauriault, 2014, Small data in the era of big data, GeoJournal,
available at http://link.springer.com/article/10.1007%2Fs10708-014-9601-7 (12 pages)
Harvey J. Miller and Michael F. Goodchild, 2014, Data-driven geography, GeoJournal,
available at http://link.springer.com/article/10.1007%2Fs10708-014-9602-6 (12 pages)
Emma Uprichard, Roger Burrows and Simon Parker, 2009, Geodemographic code and the
production of space, Environment and Planning A, Vol. 41:2823-2835, available at
http://www.envplan.com/abstract.cgi?id=a41116 (11 pages)
5. • Black boxed algorithms
• Predictive governance / predictive
categories / pre crime/ technological
agency / data dictatorships / anticipatory
governance / Post-hegemonic power –
algorithmic!
• Digital ghettoization or balkanization / Data
rich areas / samples / sorting
• Control & Power & humans matter
6. 3 of the 4 papers mentioned these documents
http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
“There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?” (2008)
http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf (2001)
7. All 4 papers include one or the other of these
http://dhg.sagepub.com/content/3/3/262.abstract (2013)
http://mitpress.mit.edu/books/codespace (2011)
8. 1. Graham & Shelton, 2013, Geography and the
future of big data, big data and the future of
geography
• Big Data Characteristics
• Volume
• Velocity
• Variety
• Transactional?
• Effects they engender?
• Computational paradigm
• Meme – establishment of truth
• Big Data View of Authors
• Discourses, objects, practices
• Views of the world
• Measuring, models, algorithms, info
systems...
• Scientistic, positivistic and quantitative
turn
• Data as facts, validity and objective truth
• End of theory?
• Actors
• Technologists
• Journalists
• Venture capitalists
• Private sector
• Geographers?
• Concepts
• Data shadows
• Data and algorithmic governance
• Computational approaches
• Augmented space
• Behavioural profiles
• Privacy
• Metadata
• Predictive categories
• Triangulation
• Neutrality of databases and algorithms
• Black box algorithms
• Obfuscation and refraction
• Amplified socio-spatial unevenness
• Data as depoliticizing tool
• Digital ghettoization or balkanization
• Openness, trust, transparency
10. Graham & Shelton, 2013
• Conclusion
• Exposed the promises and perils of big data
• demonstrated the discursive power of big data as a meme
• Opportunity to use big data for social justice, inequality,
and relationship with the environment
• But, unevenness of representation, limited opportunities
for participation, barriers to research, opaqueness,
governance issues and privacy are a concern
• Who is big data serving?
11. 2. Kitchin & Lauriault, 2014, Small data in the era
of big data
• Growth
• Development of technology, infrastructure, techniques, &
processes,
• embedded into everyday business, social practices &
spaces,
• embedded into mobile devices, objects, machines,
and systems that are networked,
• social media, online interactions, transactions, data
analytics
• Objects
• Traffic systems & web cams
• BIMS
• Surveillance & policing systems, biometrics
• Government databases
• Customer, production & logistic chains
• Data enabled & data producing infrastructures
• Finance & payment systems
• Locative & social media
• Algorithmically controlled cameras, sensors,
scanners,
• smart phones,
• clickstreams,
• by-products of networked systems
• Derived data products
• Infrastructure
• Catalogues, portals, directories and repositories,
archives
• Cyberinfrastructure – SDI
• standards, protocols and policies
• Assemblage
• Concepts
• Small data
• Data rich areas
• Big data analytics
• Ontological characteristics
• Data brokers
• Dataveillance
• Social sorting
• Control creep
• Anticipatory governance
• Augmented
• Monitored
• Regulated
• Assemblage
• Socio-technical systems
• Volunteered or crowdsourced
• Oligoptic view of the world vs God's-eye view
• Openly expressed data – swipe cards, sensors
• Exhaust – by-products
• Ecological fallacies
• Gamed data
• Curated image of the self
• Streams of data, garden hose, spritzers, white list
• Data storage vs archiving
• Abductive, deductive, inductive
• Geodemographic segmentation
• Black boxed algorithms
• Data determinism
13. Kitchin & Lauriault, 2014
• Issues
• Big data become more important than
small data
• “Small data mine gold from working on
a narrow seam, whereas big data
studies seek to extract nuggets through
open pit mining”
• Data quality, fidelity, lineage,
objective, authenticity, reliability – big
data are so large that these no longer
matter
• Inexactitude
• Open vs closed
• Replication & validation
• Combining big data with small data
• Data free from theory
• Lack of hypothesis
• Data driven science
• Weak surface analysis vs deep
penetrating insight
• Stigmatization and redlining
• Informed consent
• Big data are shaped by:
• Field of view/ sampling, location of
devices, settings/parameters, users
• Technology / platform used – produce
variance and bias
• Context w/in which generated
• Data ontology
• Regulatory environment
• They capture what is easy to ensnare
• Data Analytics
• Struggle with social & context
• Create bigger haystacks
• Do not address big issues well
• Favour memes over masterpieces
• Obscure values
14. Kitchin & Lauriault
• Conclusions
• Small data will continue to be vital, big and small data
will be complementary, small data are the baseline
• Data infrastructures store and disseminate small data
• Scaling, linking, joining, combining big and small data
• Small data are exposed to epistemologies of data science (e.g.,
digital humanities)
• Small data combined with big data are influencing the growth of
data brokers and profiling
• Pernicious effects of combining: dataveillance, social sorting,
control creep and anticipatory governance impinge on privacy,
social freedom and have structural consequences on people's
lives
15. Comparing Small & Big Data
Characteristics | Small Data | Big Data | Attributes of Big Data
Volume | Limited to large | Very large | Terabytes and petabytes
Exhaustivity | Samples | Entire population | In scope striving toward entire population and systems, n=all
Resolution & indexicality | Coarse & weak to tight & strong | Tight & strong | As detailed as possible and uniquely indexical in identification
Relationality | Weak to strong | Strong | Common fields to enable co-joining of datasets
Velocity | Slow, freeze-framed | Fast | Real & near-real time
Variety | Limited to wide | Wide | Diverse in type, structured and unstructured, maybe temporally and spatially referenced
Flexible & Scalable | Low to middling | High | Can easily add to and extend, can expand in size
Table compiled by Kitchin from:
Boyd & Crawford 2012, Dodge & Kitchin 2005, Marz & Warren 2012, Mayer-Schonberger & Cukier 2013
16. 3. Miller & Goodchild, 2014, Data-driven geography
• Big Data Characteristics
• Volume
• Velocity
• Veracity
• Data capturing technologies
• Sensors ground based
• Software
• Location aware tech
• GPS
• Mobile phones
• Surveillance cameras
• In situ sensors – cars, phones, in infrastructure
• Remote sensors – airborne and satellite platforms
• Radiofrequency
• RFID
• Georeference social media & crowdsourcing
• Def:
• Predictions are made by mining data for patterns
with correlation among new data sources and some
accurate predictions
4 paradigms in science
1. Empirical science
2. Theoretical science
3. Computational science
4. Data driven science – big data
Tensions
1. Theory driven vs data driven
2. Prediction vs discovery
3. Law seeking vs description seeking
4. Evolution vs revolution
5. From question to sample – from sample to
question
Issues:
1. Population not samples
2. Messy not clean
3. Correlation not causation
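The "correlation not causation" issue can be made concrete with a small simulation. This is only an illustrative sketch and is not taken from Miller & Goodchild: the function names, the number of observations and the seed are all assumptions. It draws one random target series and compares it with many unrelated random series, showing that the strongest chance correlation can only grow as more candidate series are searched.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_candidates, n_obs=20, seed=0):
    """Largest |r| between a random target and n_candidates unrelated series.

    All series are independent Gaussian noise, so any correlation found
    is spurious by construction. (Hypothetical helper for illustration.)
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(target, candidate)))
    return best


# Same seed, so the 5 series searched first are a subset of the 2000.
few = strongest_chance_correlation(5)
many = strongest_chance_correlation(2000)
print(f"strongest |r| among 5 unrelated series:    {few:.2f}")
print(f"strongest |r| among 2000 unrelated series: {many:.2f}")
```

Because both calls share a seed, the larger search includes the smaller one, so its strongest correlation is never weaker. None of these correlations reflects a real relationship, which is why pattern mining needs domain knowledge to screen out spurious results.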
4 capabilities of abductive reasoning
1. Ability to posit fragments of theory
2. Massive set of knowledge, common sense to
domain expertise
3. Means to search to find connections and
patterns and potential explanation
4. Complex problem solving – analogy,
approximation and guessing
5. Background knowledge and interesting measures,
formalized knowledge
18. Miller & Goodchild, 2014
• Big questions
• Are theory and
explanation archaic?
• Does data velocity matter?
• Can lack of QC & rigorous
sampling be overcome?
• Can we make valid
generalizations from
serendipitous data
collection?
• Can big data data-driven
methods lead to
significant discoveries?
• Or will we continue to rely
on scarce data (small
data)?
Sections
1. Theory in data-driven geography
• correlation supersedes causation, explanation
but not laws, mid-range theories, general
propositions, long term big space vs short term
small space, nomothetic vs idiographic
2. Approaches to data-driven geography
• knowledge discovery, data exploration and
hypothesis generating, abductive, deductive and
inductive reasoning
• Data-driven modelling – general to specific vs
specific to general, predictive performance
• Theory may not be possible, data drive the form
of the model, complexity, de-skilling
3. Caution with data driven
• Formalizing geographic knowledge, spuriousness, truth and
understanding, black boxed algorithms, privacy,
pre-crime, pre-punishment, data-driven
dictatorship
Benefits
• Spatial temporal dynamics vs snapshots at
multiple scales
• Mundane & unplanned phenomena captured
• Probable and inconsequential
• Improbable but consequential
19. Miller & Goodchild
• Conclusion
• Most fundamental changes are variety and velocity
in data
• Old issues in new clothes – volume, n, messy data,
idiographic vs nomothetic knowledge
• Big data can inform both geographic knowledge discovery
and spatial modelling – but need to formalize geographic
knowledge to clean data and ignore spurious patterns, and
to build true and understandable models
• Blackbox of closed systems
• Caution on social implications – predictive
governance, avoid data dictatorships and humans
need to be part of the decision making process
20. 4. Uprichard, Burrows & Parker, 2009,
Geodemographic code and the production of space
• Geodemographic classifications:
• the spaces people occupy say something
about the sort of people that live there
• Classes are sets of practices
• Inscriptions
• Embedded in social action and power
• Socially produced
• Have some social meaning about the
subjects, esp. name, useful
• Combines national censuses and other
data, admin & commercial
• Data used are already pre-classed –
contingent, historical, political and
cognitive
• Use of statistical knowledge
• credibility
• Tools:
• PRIZM
• Acorn
• Mosaic
• Concepts
• Social spatial vectors / forms
• Code/space
• Geodemographics as code
• Coded space
• Technological agency
• Algorithmic power
• Technological unconscious
• Automatic production of space
• Software sorted geographies
• Ground truth
• Urban ecology – socio spatial structure
• Ecological determinants
• Clusters, types of spaces, sorting
• Complexity, contingency, contrivance &
desirability
• Making hold and being held
• Coded classifications
• Mechanics of method
• Production of reality/space
• Ontological properties of the world
• Self-organizing, Fractal
• Dynamic interaction
• Post-hegemonic power – algorithmic!
• Translation and transduction of space
22. Uprichard, Burrows & Parker, 2009
• Big Questions:
• How code is instantiated,
materialised and constructed
via code/space
• Reiterative, transformative
or recursive practices of
technology
• How is the code that
constructs coded spaces
constructed?
• Problematize the contingency
in producing spaces on coded
classifications
• Who is constructing the code
for whom?
• Material outcomes of code
• Issues
• Making coded space
• Which one becomes useful?
• Who decides what is and is not
useful?
• Political, and ethical
concerns
• Social shaping
• Entrenchment of categories –
normalization
• Intrinsic or natural kinds?
• Circularity of measurement
23. Uprichard, Burrows & Parker
• Conclusion
• If posthegemonic power is algorithmic, and if
algorithms are fundamental to the transduction
of space, then we need to rethink the analysis of
the production of space so that the cultural,
social, political and technical construction of
code becomes a fundamental part of that
process
25. Exercise
Characteristics | Small Data | Big Data | Census | Sensors, Remote Sensing, Social Media, Other (marks in slide order)
Volume | Limited to large | Very large | Very large | • •
Exhaustivity | Samples | Entire pop. | all | • Crucial •
Resolution & indexicality | Coarse & weak - tight & strong | Tight & Strong | Individual ID | • ?
Relationality | Weak to strong | Strong | Name address | • ?
Velocity | Slow, Freeze-framed | Fast | Decennial quinquennial | X Crucial •
Variety | Limited to wide | Wide | Questions | X One stream X
Flexible & Scalable | Low to middling | High | Hard to change, fields fixed time | X ?
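One way to record the exercise is to tally a dataset against the seven characteristics in the comparison table. The sketch below is a hypothetical helper, not part of the session materials: the function name, the True/False/None encoding and the example census judgements are all assumptions made for illustration.

```python
# The seven characteristics from the small vs big data comparison table.
CHARACTERISTICS = [
    "volume",
    "exhaustivity",
    "resolution & indexicality",
    "relationality",
    "velocity",
    "variety",
    "flexible & scalable",
]


def assess(dataset_name, traits_met):
    """Summarise which big data characteristics a dataset meets.

    traits_met maps a characteristic name to True (met), False (not met)
    or None (unclear, like a "?" in the exercise grid). Characteristics
    missing from the mapping are also treated as unclear.
    """
    unknown = set(traits_met) - set(CHARACTERISTICS)
    if unknown:
        raise ValueError(f"unknown characteristics: {sorted(unknown)}")
    met = [c for c in CHARACTERISTICS if traits_met.get(c) is True]
    not_met = [c for c in CHARACTERISTICS if traits_met.get(c) is False]
    unclear = [c for c in CHARACTERISTICS if traits_met.get(c) is None]
    return {"dataset": dataset_name, "met": met, "not_met": not_met, "unclear": unclear}


# Illustrative judgements only; they are one possible reading of a census,
# not the answers recorded on the slide.
census = assess(
    "census",
    {
        "volume": True,
        "exhaustivity": True,
        "resolution & indexicality": True,
        "relationality": True,
        "velocity": False,           # decennial or quinquennial release
        "variety": None,             # left open for discussion
        "flexible & scalable": False,
    },
)
print(census["met"])
print(census["not_met"])
print(census["unclear"])
```

Keeping "unclear" separate from "not met" preserves the open questions for group discussion instead of forcing a yes or no answer.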
26. The Programmable City
• Funded by the European Research Council (ERC) and
Science Foundation Ireland (SFI)
• SH3: Environment and Society
• Led by Dr Rob Kitchin, the Principal Investigator
• Based at the National Institute for Regional and
Spatial Analysis (NIRSA)
• At the National University of Ireland Maynooth
(NUIM)
27. MIT Press 2011; Sage 2014
Aim of the ERC project is to build on and extend a decade of work that
culminated in the Code/Space book (MIT Press) with a set of detailed
empirical studies
28. Objective
• to provide:
• an interdisciplinary analysis of the two core
inter-related aspects of the emerging
programmable city:
• (a) Translation: how cities are translated
into code, and
• (b) Transduction: how code reshapes city
life” (Kitchin 2011).
29. Objectives
How is the city translated into software and data?
How do software and data reshape the city?
Translation: City into Code & Data
Transduction: Code & Data Reshape City
THE CITY | CODE & DATA
Discourses, Practices, Knowledge, Models
Mediation, Augmentation, Facilitation, Regulation
30. ProgCity Research Matrix
 | Translation: City into code | Transduction: Code reshapes city
Understanding the city (Knowledge) | How are digital data materially and discursively supported and processed about cities and their citizens? | How does software drive public policy development and implementation?
Managing the city (Governance) | How are discourses and practices of city governance translated into code? | How is software used to regulate and govern city life?
Working in the city (Production) | How is the geography and political economy of software production organised? | How does software alter the form and nature of work?
Living in the city (Social Politics) | How is software discursively produced and legitimated by vested interests? | How does software transform the spatiality and spatial behaviour of individuals?
31. Kitchin’s Data Assemblage
Attributes | Elements
Systems of thought | Modes of thinking, philosophies, theories, models, ideologies, rationalities, etc.
Forms of knowledge | Research texts, manuals, magazines, websites, experience, word of mouth, chat forums, etc.
Finance | Business models, investment, venture capital, grants, philanthropy, profit, etc.
Political economy | Policy, tax regimes, public and political opinion, ethical considerations, etc.
Governmentalities / Legalities | Data standards, file formats, system requirements, protocols, regulations, laws, licensing, intellectual property regimes, etc.
Materialities & infrastructures | Paper/pens, computers, digital devices, sensors, scanners, databases, networks, servers, etc.
Practices | Techniques, ways of doing, learned behaviours, scientific conventions, etc.
Organisations & institutions | Archives, corporations, consultants, manufacturers, retailers, government agencies, universities, conferences, clubs and societies, committees and boards, communities of practice, etc.
Subjectivities & communities | Of data producers, curators, managers, analysts, scientists, politicians, users, citizens, etc.
Places | Labs, offices, field sites, data centres, server farms, business parks, etc., and their agglomerations
Marketplace | For data, its derivatives (e.g., text, tables, graphs, maps), analysts, analytic software, interpretations, etc.
33. The Dublin Dashboard includes:
• real-time information
• time-series indicator data
• & interactive maps about all aspects of
the city
Benefits:
• detailed, up to date intelligence about
the city that aids everyday decision
making and fosters evidence-informed
analysis.
Freely available data sources:
• Dublin City Council
• Dublinked
• Central Statistics Office
• Eurostat
• government departments
• links to a variety of existing
applications
Produced by:
• The Programmable City project
• All-Island Research Observatory (AIRO)
at Maynooth University
• working with Dublin City Council
Funded by:
• the European Research Council (ERC)
• Science Foundation Ireland (SFI)