The document discusses how digital tools can be used to analyze reference cultures through a case study of cigarettes in Dutch newspapers from 1890 to 1940. It outlines the digital humanities cycle of heuristics, hermeneutics, corpus creation, source criticism, and tool criticism. Methods discussed include full-text search, n-gram analysis, topic modeling, and named entity recognition. The document concludes that these tools can surface new topics and entities for research while still requiring traditional close reading and theoretical frameworks.
2. Overview
• Consuming America: the role of the United States as a reference culture in Dutch consumer society between 1890 and 1940
• Digital Humanities Cycle: heuristics,
hermeneutics, corpus creation, source criticism,
and tool criticism
• Methods: Full-text search, N-gram analysis,
Topic modeling, Named entity recognition
3. What Is a Reference Culture?
• Reference culture is an analytical concept to study geopolitical
formations in a transnational context.
• Reference cultures serve as a model for other countries, e.g. the Byzantine Empire, nineteenth-century England, the Caliphate.
• Twentieth century: The American Century - Henry Luce
• Culture of references > imagined, symbolic, and metaphysical
‘America’
• Focus on the receiving end within a wider global context of
globalization, Americanization and modernization (cf. Rob Kroes,
John Muthyala)
4. How Do We Research Reference Cultures?
• Reference cultures emerge in collective discussions on
specific products, ideas, and practices
• Against a background of cultural, technological, and
economic developments
• In other words, a reference culture is an imagined,
symbolic ‘America’ grounded within actual material
conditions and practices
• The project aims to use digital technologies to analyze reference cultures in digitized Dutch newspapers between 1890 and 1990
5. Case Study: Cigarettes, 1890-1940
• Cultural icon of American entrepreneurialism
• “Product that defined America” (Allan Brandt)
• production, distribution, and consumption
• How was its symbolic connotation perceived outside the United States?
• Geographical connotation
• Debates on technological changes: taste and packaging
• Changing consumer behavior > consumerist abundance,
female smokers
6. Geographical Connotations of the Cigarette: Research Questions
• How have the geographical connotations of the cigarette shifted between 1890 and 1940?
• How has this informed the idea of America? In
other words, the performance of America as a
reference culture?
7. Is this Big Data Research?
“The change of scale has led to a change of state. The quantitative change has led to a qualitative one. […] [B]ig data refers to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value.”
Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think (Boston 2013) 13.
8. Distant reading
“‘Distant reading’, I have once called this type of approach; where distance is however not an obstacle, but a specific form of knowledge; fewer elements, hence a sharper sense of their overall interconnection. Shapes, relations, structures. Forms. Models.”
Franco Moretti, Graphs, Maps, Trees: Abstract Models for a Literary History (London and New York 2005) 1.
9. How Big Is Big Data?
• The Dutch newspaper archive is not really big data (“biggish” data?)
• Do we want to do big data research and look for big patterns? Or do we aim for more extensive searching and more complexity in our sources?
• “[D]ata does not always have to be used as evidence, but can be simply for discovering and framing research questions. […] [P]laying with data – in all its formats and forms – is more important than ever.” Frederick W. Gibbs and Trevor J. Owens, ‘The Hermeneutics of Data and Historical Writing’, in: Kristen Nawrotzki and Jack Dougherty (eds.), Writing History in the Digital Age (Ann Arbor, MI: University of Michigan Press, 2013).
• Exploratory searching as an advance corrective against the threat of essentialism and determinism [important in the case of history/Americanization]
10. Digital Humanities Cycle
• Heuristics
• Corpus selection
• Hermeneutics: full-text search, text analytics, topic modeling, named entity recognition, n-gram analysis
• Tool criticism
• Source criticism
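N-gram analysis, listed above among the hermeneutic methods, comes down to counting contiguous word sequences and comparing those counts across periods. A minimal Python sketch, assuming the texts are already grouped by period and tokenised with a naive lowercase whitespace split; the three snippets are invented for illustration and are not taken from the newspaper corpus:

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts_by_period(docs, n=2):
    """Count n-grams per period.

    docs: iterable of (period, text) pairs.
    Tokenisation is a naive lowercase whitespace split (an assumption).
    """
    counts = {}
    for period, text in docs:
        tokens = text.lower().split()
        counts.setdefault(period, Counter()).update(ngrams(tokens, n))
    return counts


# Invented example snippets, not corpus data.
docs = [
    ("1890s", "turksche sigaretten van de fijnste kwaliteit"),
    ("1920s", "virginia sigaretten van de beste kwaliteit"),
    ("1920s", "amerikaansche virginia sigaretten"),
]
counts = ngram_counts_by_period(docs, n=2)
print(counts["1920s"].most_common(1))  # [(('virginia', 'sigaretten'), 2)]
```

Comparing the most frequent bigrams per period is one way to see which qualifiers (Turkish, Virginia, American) travel with the word cigarette over time.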
11. Heuristics: Full-text search
• Large amounts of data
• Digital archives
• International data
• Ability to search full-text
Delpher.nl
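Full-text search over OCR'd newspapers has to cope with historical spelling variants. The Python sketch below is a local illustration, not how Delpher.nl itself works: it assumes articles are available as (year, text) pairs and uses a hand-written, hypothetical and non-exhaustive regular expression to count matching articles per year.

```python
import re
from collections import Counter

# Hypothetical pattern for some spelling variants: sigaret, sigaretten,
# cigaret, cigaretten, cigarette, cigarettes. Not exhaustive.
PATTERN = re.compile(r"\b(?:[sc]igaret(?:ten)?|cigarettes?)\b", re.IGNORECASE)


def hits_per_year(articles, pattern=PATTERN):
    """Count articles (not occurrences) per year whose text matches pattern.

    articles: iterable of (year, text) pairs.
    """
    counter = Counter()
    for year, text in articles:
        if pattern.search(text):
            counter[year] += 1
    return counter


# Invented example articles.
articles = [
    (1895, "Egyptische Cigarettes, per doos"),
    (1925, "Rookt Virginia sigaretten"),
    (1925, "Een nieuwe sigaar"),
    (1925, "De beste sigaret"),
]
print(hits_per_year(articles))  # Counter({1925: 2, 1895: 1})
```

Counting articles rather than raw occurrences keeps a single advertisement that repeats the word many times from dominating a year.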
12. Heuristics using metadata
“At least for research, digital history can be defined as the theory and practice of bringing technology to bear on the abundance we now confront.”
‘Interchange: The Promise of Digital History’, The Journal of American History 95 (2008) 452-491, 454.
13. New Way of Doing History
Bob Nicholson, ‘The Digital Turn’, Media History (2013).
14. Source Criticism
“[T]he problem is that while we think we are searching newspapers, we are actually searching markedly inaccurate representations of text, hidden behind a poor quality image. And even more damning, by citing a hard copy of the original we are then refusing to document our research path, making it difficult for others to critique the process.”
Tim Hitchcock, ‘Confronting the Digital: Or How Academic History Writing Lost the Plot’, Cultural and Social History 10 (2013) 9-23.
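One way to make Hitchcock's point measurable is to compare OCR output with a manual transcription of a small sample and compute the character error rate (edit distance divided by the length of the reference transcription). A minimal sketch, assuming such a hand transcription exists; the example strings are invented OCR errors, not real archive output:

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text, transcription):
    """CER = edit distance / length of the manual (reference) transcription."""
    return levenshtein(ocr_text, transcription) / len(transcription)


ocr = "Rookt Yirginia sigarettcn"   # invented OCR misreadings: Y for V, c for e
gold = "Rookt Virginia sigaretten"
print(round(character_error_rate(ocr, gold), 3))  # 0.08
```

Even a low CER can hide errors in exactly the words a historian searches for, so it is worth checking error rates on the search terms themselves as well.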
16. Corpus Selection
• Corpus Selection
• API (JSON)
• Texcavator
• Cleaning up the Corpus: Python/OpenRefine/NLTK
• Corpus analysis / Corpus Linguistics
• Topic modeling
• Named entity recognition (NER)
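The clean-up step can be sketched in plain Python. This assumes a JSON export with hypothetical `date` and `text` fields (Texcavator and other APIs use their own layout) and a tiny illustrative stopword list; it uses only the standard library rather than NLTK or OpenRefine:

```python
import json
import re

# Hypothetical export layout: a JSON array of records with "date" and "text".
RAW = """[
  {"date": "1925-03-14", "text": "Rookt  Virginia siga-\\nretten! De beste, 10 cent."},
  {"date": "1931-07-02", "text": "Een nieuwe sigaret van de fabriek."}
]"""

# Tiny illustrative stopword list; NLTK ships a much fuller Dutch list.
STOPWORDS = {"de", "het", "een", "van", "en"}


def clean(text):
    """Normalise one OCR'd text into a list of lowercase word tokens."""
    text = re.sub(r"-\n", "", text)                   # re-join words hyphenated across line breaks
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # letters only: drops digits and punctuation
    return [t for t in tokens if t not in STOPWORDS]


def load_corpus(raw_json):
    """Return (year, tokens) pairs, taking the year from an ISO-style date."""
    return [(int(rec["date"][:4]), clean(rec["text"])) for rec in json.loads(raw_json)]


corpus = load_corpus(RAW)
print(corpus[0])  # (1925, ['rookt', 'virginia', 'sigaretten', 'beste', 'cent'])
```

Every cleaning choice here (dropping numbers, the stopword list, re-joining hyphens) changes what later topic models and n-gram counts can see, which is why it belongs in the tool-criticism discussion too.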
17. Tool Criticism
• Tools as instrument (STS)
• Bruno Latour and Steve Woolgar, Laboratory Life: The Construction of Scientific Facts (1986)
• Steven Shapin, Never Pure: Historical Studies of Science as if It Was Produced by People with Bodies, Situated in Time, Space, Culture, and Society, and Struggling for Credibility and Authority (2010)
• Explain how the tool works
• How do we define whether the tool works?
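One common answer to “does the tool work?” is to score its output against a small hand-annotated sample using precision, recall, and F1. A sketch, assuming both the tool output and the annotations are available as sets of (document id, entity, label) triples; the entities below are invented, including a deliberately wrong “Camel” location:

```python
def precision_recall_f1(predicted, gold):
    """Compare tool output with a hand-annotated sample.

    predicted, gold: sets of (doc_id, entity_text, label) triples.
    Exact-match scoring: a partial or mislabelled match counts as an error.
    """
    tp = len(predicted & gold)  # true positives: found and correct
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented annotations and tool output.
gold = {(1, "Amsterdam", "LOC"), (1, "Virginia", "LOC"), (2, "New York", "LOC")}
predicted = {(1, "Amsterdam", "LOC"), (2, "New York", "LOC"), (2, "Camel", "LOC")}
p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Precision tells us how much of what the tool reports can be trusted; recall tells us how much it silently misses, which matters when absence of a place name is read as evidence.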
18. Topic Modeling
• Method to discover latent structures within a collection of texts (here using the MALLET toolkit)
• Words acquire meaning through context -> Topic
Modeling
• Contextual comparisons between different periods or
corpora
• Main goal: discover events, users, and objects > Topics >
Hidden debates
• In other words: not to prove stuff, but to find more
stuff
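MALLET's topic models are based on latent Dirichlet allocation (LDA) trained with Gibbs sampling. The Python sketch below is a much simplified collapsed Gibbs sampler for LDA, not MALLET's implementation; the toy documents, the number of topics `k`, and the `alpha`/`beta` values are assumptions chosen for illustration:

```python
import random
from collections import defaultdict


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Returns one word-count dict per topic (n_{t,w}).
    """
    rng = random.Random(seed)
    v = len({w for doc in docs for w in doc})          # vocabulary size
    doc_topic = [[0] * k for _ in docs]                # n_{d,t}
    topic_word = [defaultdict(int) for _ in range(k)]  # n_{t,w}
    topic_total = [0] * k                              # n_t
    z = []                                             # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                t = z[d][i]
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # p(z = j | rest) is proportional to
                # (n_{d,j} + alpha) * (n_{j,w} + beta) / (n_j + V * beta)
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j].get(w, 0) + beta)
                    / (topic_total[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return topic_word


def top_words(topic_word, n=4):
    """Most frequent words per topic (ties broken alphabetically)."""
    return [
        [w for w, c in sorted(tw.items(), key=lambda x: (-x[1], x[0])) if c > 0][:n]
        for tw in topic_word
    ]


# Toy documents built from advertisement-like words (invented, not corpus data).
docs = [
    "sigaret virginia camel kwaliteit prijs".split(),
    "sigaret virginia camel tabak cent".split(),
    "sigaret egyptische turksche fijnste coupon".split(),
    "sigaret turksche egyptische coupons gratis".split(),
]
topics = lda_gibbs(docs, k=2)
for i, words in enumerate(top_words(topics)):
    print(i, words)
```

On a real corpus, topic quality depends heavily on preprocessing, the number of topics, and the number of iterations, which makes these settings themselves objects of tool criticism.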
19. 1924-1929: Key Topics in Advertisements
• sigaret virginia whip chief ardath london goud cigarettes kwaliteit olympia kurk
nummer rook beste gezondheid zoo zulk vooraan punten
• sigaret sigaar pijp beter smakelijker wybert amersfoort virginia houbaer tabletten
rooken oudste prijs magnums hollands nasmaak cent nemen noch
• sigaret nieuwe onze tabakken vervaardigd doosje import vraagt rookt smaak betere
cents sigaretten turksche fijne kwaliteit edelste uwe proef
• sigaret club sigaretten gij army camel tabak cent sopla camels wereld kwaliteit prijs
gemaakt virginia sigaren eerst rookt keel
• sigaret adamas egyptische mildste tegenwoordig stuks coupon coupons mavrides
fijnste sigaretten cts geschenken gratis ste naam fijn slechts omar
20. Named Entity Recognition
• Stanford NER is a tool to automatically detect specific entities within texts, such as:
• Locations
• Persons
• Organizations
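Stanford NER is a statistical (CRF-based) tagger that needs Java and trained models. As a much simpler stand-in, the Python sketch below counts place names with a hypothetical gazetteer of surface forms, including historical spellings, and groups them into categories, roughly the kind of aggregation behind the location counts on the next slide. It is dictionary lookup, not named entity recognition in the statistical sense:

```python
import re
from collections import Counter

# Hypothetical gazetteer: surface form -> (normalised name, category).
GAZETTEER = {
    "amerika": ("America", "American"),
    "new york": ("New York", "American"),
    "vereenigde staten": ("United States", "American"),
    "londen": ("London", "Foreign"),
    "parijs": ("Paris", "Foreign"),
    "rotterdam": ("Rotterdam", "Dutch"),
    "amsterdam": ("Amsterdam", "Dutch"),
}


def count_locations(texts, gazetteer=GAZETTEER):
    """Return {category: Counter({place: count})} for gazetteer hits."""
    # Longest forms first, so multi-word names win over shorter overlapping ones.
    forms = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, forms)) + r")\b", re.IGNORECASE)
    counts = {}
    for text in texts:
        for match in pattern.finditer(text):
            name, category = gazetteer[match.group(1).lower()]
            counts.setdefault(category, Counter())[name] += 1
    return counts


# Invented example texts.
texts = ["Sigaretten uit Amerika en Londen", "Van Rotterdam naar New York", "Amsterdam, Rotterdam"]
counts = count_locations(texts)
print(counts["Dutch"])  # Counter({'Rotterdam': 2, 'Amsterdam': 1})
```

A gazetteer only finds what it already lists, so it cannot discover unexpected places the way a trained tagger can; that trade-off is one reason to prefer a statistical NER tool for exploration.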
21. Named Entity Recognition: Output 1890-1920
Foreign locations (N>20)
• The United Kingdom / London (151 / 84)
• Germany / Berlin / Hamburg (146 / 81 / 22)
• France / Paris (139 / 154)
• Russia (102)
• The United States / America / New York ( / 70 / 34)
• Belgium / Bruxelles / Antwerp ( / 57 / 46)
• Austria / Vienna (40 / 21)
• Turkey (39)
• Holland (39)
• Europe (36)
• Spain (33)
Dutch locations (N>20)
• Rotterdam (496)
• Amsterdam (177)
• Tilburg (107)
• Groningen (94)
• Breda (64)
• Haarlem (48)
• Utrecht (43)
• Arnhem (35)
• The Hague (26)
• Leeuwarden (24)
• Maastricht (26)
• Leiden (24)
• Friesland (21)
American locations
• America (70)
• New York (34)
• Washington (11)
• United States (Vereenigde Staten) (11)
• Chicago (3)
• Virginia (3)
• North America (Noord-Amerika) (3)
23. Good Ol’ Close Reading
• Don’t say goodbye to your traditional methods or theories
• Country-of-Origin effect (branding theory)
• Theories of modernization/globalization/Americanization
• Discourse analysis > Foucault
• Conceptual history > Braudel, Koselleck, Armitage [The History Manifesto]
• DH is too often about the tools or the methods, but it can be bridged with theoretical and analytical models into a critical digital humanities [cf. David Berry, Alan Liu]
28. Conclusion (I): Geographical Connotations
• Country-of-origin effect
• From actual locations to symbolic references
• Shift in the geographical connotation of the cigarette
• Oriental, British, European, American
• Detached from the United States: the United States as a floating signifier
29. Conclusion (II): Collateral Damage
• The output provided me with topics for further research in other chapters > data-driven
• These are provided by the source material and
not only by secondary literature
• Technologies of Taste
• Consumer Behavior