In this talk, I will discuss how we can use digital methods to generate sustainable knowledge in the humanities. I will give an overview of the data-intensive research methodology and discuss how methods, results, and data relate to each other and must be evaluated as parts of a whole: there is no such thing as a good method, nor is there a way to know whether the results are good, without considering the data. I will discuss results as a window through which we can see our data, and how we can reason about the results of digital methods. Finally, I will present the Change is Key! research program and describe our efforts to connect computational research with research questions from the humanities and social sciences.
1. The Strengths and Pitfalls of Large-Scale Text Mining for DH
Nina Tahmasebi, Associate Professor
University of Gothenburg
TÜ Digihum Talk
December 2022, Tartu
2. Centre for Digital Humanities (2018-2019)
Mathematics (B.Sc. & M.Sc.) 2003-2008
Computer/Data Science (PhD + Postdoc) 2008-2014
NLP / Language Technology (Researcher, Associate Professor) 2014→
Nina Tahmasebi, University of Gothenburg, TÜ Digihum 2022 2
3. Views on text
DH
Language
Data
4. Change is Key!
The study of contemporary and historical societies using methods for synchronic semantic variation and diachronic semantic change
https://www.changeiskey.org/
5. Some facts
6 years
6 partner universities
Members from 4 countries
6 countries, with advisors
13 people, including PM and SE
33.5 MSEK from Riksbankens Jubileumsfond + 5.5 MSEK from the University and Faculty
6. Our Research Questions
[Diagram: research questions 1-5, with the labels "Computational models of meaning and change" and "Gender Studies"]
7. Three axioms
1. There is no such thing as data-driven research
2. There is no such thing as a good computational method
3. If you do not evaluate your results, you might as well spend your time enjoying a hobby
8. From text to answers
text
9. A single physical piece can be studied in detail.
A few physical pieces can be studied and compared in detail.
Too many physical pieces cannot be treated manually.
10. From text to answers
text
text mining
method
13. From text to answers
text
text mining
method
research question
results
14. From text to answers
text
research question
text mining
method
results
15. Based on
• Tahmasebi, Nina, and Simon Hengchen. "The Strengths and Pitfalls of Large-Scale Text Mining for Literary Studies." Samlaren: tidskrift för svensk litteraturvetenskaplig forskning 140 (2019): 198-227.
• Tahmasebi, Nina, Niclas Hagen, Daniel Brodén, and Mats Malm. "A Convergence of Methodologies: Notes on a Data-intensive research methodology." DHN2019 (2019): 437-449.
16. Today’s outline
1. Research Questions
2. Digital Text
3. Data-intensive research methodology
4. Research results and interpretation
24. On the dangers of exploration I
Bible codes (Torah code):
Data: Hebrew Bible text (Torah)
Method: Equidistant Letter Sequence (ELS)
Results: names of famous rabbinic personalities and their respective birth and death dates
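To make the method concrete, here is a minimal sketch of an equidistant letter sequence search, assuming the simplest reading of ELS: strip everything but the letters, then check whether a word is spelled out by every skip-th letter from some starting position. The function names `letters_only` and `find_els` are hypothetical, and only forward skips are searched; ELS searches also commonly allow backward (negative) skips.

```python
import re


def letters_only(text: str) -> str:
    """Keep only the letters A-Z, upper-cased; spaces, digits and punctuation are dropped."""
    return re.sub(r"[^A-Za-z]", "", text).upper()


def find_els(text: str, word: str, max_skip: int) -> list[tuple[int, int]]:
    """Return (start, skip) pairs where `word` is spelled out by every skip-th letter.

    Simplification: only forward (positive) skips from 1 to max_skip are searched.
    """
    seq = letters_only(text)
    target = word.upper()
    k = len(target)
    hits = []
    for skip in range(1, max_skip + 1):
        span = (k - 1) * skip  # distance from the word's first letter to its last
        for start in range(len(seq) - span):
            # Take k letters: start, start + skip, ..., start + span
            if seq[start:start + span + 1:skip] == target:
                hits.append((start, skip))
    return hits


print(find_els("The coast", "cat", 3))
```

In "The coast" the letters are THECOAST, and CAT is spelled by the letters at positions 3, 5 and 7, so the search reports a single hit at start 3 with skip 2. Nothing about the phrase "hides" a cat; the hit is a by-product of how many start and skip combinations are tried.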
25. On the dangers of exploration II
President John F. Kennedy was shot in the head by an assassin who quietly waited in a concealed place. It was in Texas, November 1963, during a presidential motorcade.
Moby Dick
26. On the dangers of exploration III
“… you can find things like this anywhere. The reason it looks amazing is that the number of possible things to look for, and the number of places to look, is much greater than you imagine.”
Brendan McKay, Em. Professor at Australian National University
https://users.cecs.anu.edu.au/~bdm/codes/moby.html
29. A book:
• Empty pages in the beginning / end
• Large letter at the beginning of each chapter
• Images?
31. Too many physical pieces cannot be treated manually.
Digital Text
32. Too many digital texts cannot be studied in TOO LARGE DETAIL either!
We need to ignore a lot of formatting
• White pages
• White space
• Fonts
• Capitalization of letters
• Etc…
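A minimal sketch of what "ignoring formatting" can look like once a text is plain text, assuming that only white space and capitalization are left to normalize (fonts and page layout are already gone at that point). The function name `normalize` is hypothetical.

```python
import re


def normalize(text: str) -> str:
    # Ignore capitalization: "Room", "ROOM" and "room" become the same string
    text = text.lower()
    # Ignore white space and white pages: runs of spaces, tabs and blank lines become one space
    text = re.sub(r"\s+", " ", text)
    return text.strip()


print(normalize("  I like the ROOM\n\n\nbut not   the Sheets. "))
```

The output is "i like the room but not the sheets.". Each such step is a choice: lowercasing also merges, for example, a proper name with an ordinary word that is spelled the same.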
34. I like the room but not the sheets.
I like the room but not the sheets. (after stop word filtering)
I like the room but not the sheet. (after lemmatization)
I like the room but not the sheet. (only nouns)
I like the room but not the sheet. (frequency filtering)
I like the room but not the sheet. (only verbs)
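The filtering steps on this slide can be sketched on the example sentence. This is a toy illustration under stated assumptions: the stop word list, the lemma table and the part-of-speech table are tiny hypothetical dictionaries, whereas a real pipeline would use a proper stop word list, lemmatizer and tagger. Frequency filtering is left out because it needs a whole corpus.

```python
import re

# Hypothetical toy resources, just large enough for the example sentence
STOP_WORDS = {"i", "the", "but", "not"}
LEMMAS = {"sheets": "sheet"}
POS = {"like": "VERB", "room": "NOUN", "sheet": "NOUN"}


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters; punctuation is dropped
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens: list[str]) -> list[str]:
    # Map inflected forms to a base form; unknown words are kept as they are
    return [LEMMAS.get(t, t) for t in tokens]


def keep_pos(tokens: list[str], tag: str) -> list[str]:
    return [t for t in tokens if POS.get(t) == tag]


tokens = tokenize("I like the room but not the sheets.")
lemmas = lemmatize(remove_stop_words(tokens))
print(lemmas)                   # ['like', 'room', 'sheet']
print(keep_pos(lemmas, "NOUN"))  # ['room', 'sheet']
print(keep_pos(lemmas, "VERB"))  # ['like']
```

Because "not" is on the stop word list, the negation disappears: "I like the sheets but not the room" yields the same nouns, only in a different order. This is the trade-off between cleaning and keeping information.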
35. Clean much – keep much information
Matter of economy:
• We cannot afford to keep it all
• So we keep what gives us most value (= information)
[Figure: information plotted against frequency]
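The slide does not spell out how frequency and information relate. One standard way to make the trade-off concrete, which we assume here rather than take from the slide, is Shannon self-information: a word with relative frequency p(w) carries -log2 p(w) bits, so very frequent words carry little information each. The function name `self_information` is hypothetical.

```python
import math
from collections import Counter


def self_information(tokens: list[str]) -> dict[str, float]:
    """Shannon self-information, -log2 p(w), for each word type in a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: -math.log2(count / total) for word, count in counts.items()}


tokens = "the cat saw the dog and the bird".split()
info = self_information(tokens)
for word, bits in sorted(info.items(), key=lambda item: item[1]):
    print(f"{word:>5}: {bits:.2f} bits")
```

Here "the" makes up 3 of 8 tokens and gets about 1.42 bits, while each word seen once gets 3 bits. This is the intuition behind dropping frequent function words first when we cannot afford to keep everything.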
36. 3. Nouns. After a series of experiments, it was determined that the thematic information in this corpus could best be captured by modeling only the remaining nouns. Using the Stanford POS tagger, each word in each segment was marked up with a part of speech indicator and all but the nouns were removed.12
Jockers and Mimno, Significant Themes in 19th-Century Literature
37. When Mr. Bilbo Baggins of Bag End announced that he would shortly be celebrating his eleventy-first birthday
with a party of special magnificence, there was much talk and excitement in Hobbiton.
Bilbo was very rich and very peculiar, and had been the wonder of the Shire for sixty years, ever since his
remarkable disappearance and unexpected return. The riches he had brought back from his travels had now
become a local legend, and it was popularly believed, whatever the old folk might say, that the Hill at Bag End was
full of tunnels stuffed with treasure. And if that was not enough for fame, there was also his prolonged vigour to
marvel at. Time wore on, but it seemed to have little effect on Mr. Baggins. At ninety he was much the same as at
fifty. At ninety-nine they began to call him well-preserved, but unchanged would have been nearer the mark.
There were some that shook their heads and thought this was too much of a good thing; it seemed unfair that
anyone should possess (apparently) perpetual youth as well as (reputedly) inexhaustible wealth.
‘It will have to be paid for,’ they said. ‘It isn’t natural, and trouble will come of it!’
But so far trouble had not come; and as Mr. Baggins was generous with his money, most people were willing to
forgive him his oddities and his good fortune. He remained on visiting terms with his relatives (except, of course,
the Sackville-Bagginses), and he had many devoted admirers among the hobbits of poor and unimportant
families. But he had no close friends, until some of his younger cousins began to grow up.
The eldest of these, and Bilbo’s favourite, was young Frodo Baggins. When Bilbo was ninety-nine, he adopted
Frodo as his heir, and brought him to live at Bag End; and the hopes of the Sackville-Bagginses were finally
dashed. Bilbo and Frodo happened to have the same birthday, September 22nd. ‘You had better come and live
here, Frodo my lad,’ said Bilbo one day; ‘and then we can celebrate our birthday-parties comfortably together.’ At
that time Frodo was still in his tweens, as the hobbits called the irresponsible twenties between childhood and
coming of age at thirty-three.
41. Culturomics
Michel, Jean-Baptiste, et al. "Quantitative analysis of culture using millions of digitized books." Science 331.6014 (2011): 176-182.
42. Fig 13. Upton Sinclair wrote 11 Lanny Budd novels set during World War II.
Pechenick EA, Danforth CM, Dodds PS (2015) Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution. PLOS ONE 10(10): e0137041. https://doi.org/10.1371/journal.pone.0137041
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0137041
Lanny vs. Hitler
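A minimal sketch of a Culturomics-style measure, the relative frequency of a word per year, computed on a hypothetical toy corpus in which each entry stands for one volume. It illustrates the Lanny example: every volume counts, however widely it was read, so the volumes of a single prolific author can carry a word's whole curve. The corpus, the author labels and the `exclude_author` parameter are illustrative assumptions; this is not how the Google Books n-gram data is actually structured.

```python
from collections import Counter


def relative_frequency(corpus, term, exclude_author=None):
    """Per-year share of tokens equal to `term`.

    corpus: iterable of (year, author, text) triples, one per volume.
    exclude_author: optionally leave out every volume by this author.
    """
    hits, totals = Counter(), Counter()
    for year, author, text in corpus:
        if author == exclude_author:
            continue
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(term.lower())
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Hypothetical toy corpus
corpus = [
    (1941, "author_a", "Lanny spoke with Hitler"),
    (1941, "author_b", "the war went on"),
    (1942, "author_a", "Lanny travelled to Berlin"),
    (1942, "author_b", "Hitler gave a speech"),
]

print(relative_frequency(corpus, "lanny"))                           # {1941: 0.125, 1942: 0.125}
print(relative_frequency(corpus, "lanny", exclude_author="author_a"))  # {1941: 0.0, 1942: 0.0}
```

Removing one author's volumes erases the word entirely. In a real corpus the effect is smaller, but the mechanism is the same.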
43. When we have little data, the uncertainty is large:
• Is A larger than B?
But when we have large data, we are more certain about our observations; STILL, our errors can be much larger
• because our selection is biased.
[Figure: samples labelled "Sample 1" and "Sample 2"]
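The argument can be illustrated with a little arithmetic. The sketch computes a normal-approximation 95% confidence interval for a proportion at growing sample sizes, assuming a hypothetical population in which A makes up 50% of all texts, while our biased selection shows 55%. To isolate the effect of size, it also assumes that the observed share equals the biased rate exactly, with no sampling noise. The names `ci95`, `TRUE_SHARE` and `BIASED_SHARE` are hypothetical.

```python
import math


def ci95(share: float, n: int) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval for an observed proportion."""
    se = math.sqrt(share * (1 - share) / n)
    return share - 1.96 * se, share + 1.96 * se


TRUE_SHARE = 0.50    # hypothetical share of A among all texts
BIASED_SHARE = 0.55  # hypothetical share of A among the texts we happened to select

for n in (100, 10_000, 1_000_000):
    low, high = ci95(BIASED_SHARE, n)
    contains = low <= TRUE_SHARE <= high
    print(f"n={n:>9,}  95% CI = ({low:.4f}, {high:.4f})  contains true share: {contains}")
```

With 100 texts the interval is wide and still contains the true 50%. With a million texts it is very narrow, which makes us confident, but it sits firmly around the wrong value: more data shrinks the random error but does nothing about the selection bias.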
44. Three axioms
1. There should be no such thing as data-driven research, but the text we have is important!
52. Text-mining method
NLP pipeline: From text to result
Tokenization
Lemmatization
Part-of-speech tagging
Filtering: Stopwords
Filtering: Function words
Dimensions
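The last step of the pipeline, "Dimensions", is not explained on the slide. We assume here that it refers to turning the filtered tokens into vectors, with one dimension per word in the vocabulary. The sketch below builds simple bag-of-words count vectors under that assumption; `to_vectors` is a hypothetical name, and real systems often use weighted or dense representations instead.

```python
from collections import Counter


def to_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Bag-of-words count vectors: one dimension per word type in the vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


# Two hypothetical documents after filtering and lemmatization
vocab, vectors = to_vectors([["room", "sheet"], ["room", "room", "bed"]])
print(vocab)    # ['bed', 'room', 'sheet']
print(vectors)  # [[0, 1, 1], [1, 2, 0]]
```

Every earlier choice (which words were filtered, how they were lemmatized) decides which dimensions exist at all, which is one reason the method cannot be judged separately from the data.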
63. Digital research needs to be evaluated on the combination of data, method, and research question
64. Truths about data-intensive research
Not all methods fit all data
Not all data fit all questions
Not all methods can answer all questions
Nothing lives separately; it must be evaluated together:
Hypothesis
Text mining method
results
Text (digital large-scale text)
65. Three axioms
2. There is no such thing as a good computational method
71. Reject
1. Data
2. Method / Preprocessing
3. Hypothesis
[Diagram: result vs. hypothesis]
72. Accept
1. Method
2. Correct interpretation of the results
[Diagram: result vs. hypothesis]
73. Math results, average difference
[Chart: men vs. women]
Source: Factfulness
74. Math results, average difference
[Chart: men vs. women]
Source: Factfulness
75. Number of individuals with different math scores, 2016
[Chart: men vs. women across the range of math scores]
Source: Factfulness
76. Comparison of the same data
[Charts: number of individuals with different math scores, 2016, men vs. women]
Source: Factfulness
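The contrast between an "average difference" chart and a chart of how many individuals reach each score can be sketched with two normal distributions. The means and standard deviation below are hypothetical, not the data behind the Factfulness charts; the sketch uses `statistics.NormalDist.overlap` from the Python standard library, which returns the shared area under the two density curves.

```python
from statistics import NormalDist


def compare_groups(mean_a: float, mean_b: float, sd: float) -> tuple[float, float]:
    """Return (difference in means, overlapping coefficient of the two score distributions)."""
    a = NormalDist(mean_a, sd)
    b = NormalDist(mean_b, sd)
    return mean_b - mean_a, a.overlap(b)


# Hypothetical scores: the group means differ by a tenth of a standard deviation
diff, overlap = compare_groups(500, 510, 100)
print(f"difference in means: {diff}")
print(f"share of the distributions that overlap: {overlap:.0%}")
```

Plotted on its own, a 10-point gap in the averages can look like a clear divide, yet under these assumptions about 96% of the two distributions overlap. The number is correct in both views; what differs is what it is taken to mean.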
77. [Diagram: result vs. hypothesis]
1. Method
2. Correct interpretation of the results
3. Where do the results live?
78. Experimental design
Even when the math is right, we need to question the selection and the grounds on which our conclusions rest.
• What is the corresponding number elsewhere?
• What are we measuring?
• Why will this answer our questions?
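"What is the corresponding number elsewhere?" often means comparing a word's frequency in our corpus with its frequency in a reference corpus. A minimal sketch, assuming the common two-corpus log-likelihood (G2) statistic as the comparison; the counts and corpus sizes are hypothetical, and other keyness measures exist.

```python
import math


def per_million(count: int, size: int) -> float:
    """Occurrences per million tokens."""
    return count * 1_000_000 / size


def log_likelihood(count_a: int, size_a: int, count_b: int, size_b: int) -> float:
    """Log-likelihood (G2) for one word's frequency in corpus A versus a reference corpus B."""
    expected_a = size_a * (count_a + count_b) / (size_a + size_b)
    expected_b = size_b * (count_a + count_b) / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing (0 * ln 0 is taken as 0)
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical: 50 hits in a 10,000-token corpus vs 50 hits in a 100,000-token reference corpus
print(per_million(50, 10_000), per_million(50, 100_000))  # 5000.0 500.0
print(round(log_likelihood(50, 10_000, 50, 100_000), 2))   # 110.69
```

The same raw count of 50 means something very different once corpus size is taken into account. By convention, a G2 above about 3.84 corresponds to p < 0.05 with one degree of freedom, but a significant difference still says nothing about why the corpora differ.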
79. Three axioms
3. If you do not evaluate your results, you might as well spend your time enjoying a hobby
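One common way to evaluate text-mining results is to compare them against a manually annotated gold standard using precision, recall and F1. The sketch assumes a hypothetical setting in which a method flags words as having changed meaning and a human annotator provides the reference list; the word lists are illustrative, not results.

```python
def precision_recall_f1(predicted, gold) -> tuple[float, float, float]:
    """Set-based precision, recall and F1 of predicted items against gold-standard items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical: words flagged by a method vs. words an annotator judged to have changed
predicted = {"gay", "mouse", "cell", "table"}
gold = {"gay", "mouse", "broadcast"}
p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Here half of the flagged words are correct and two of the three annotated words are found. The scores are only as meaningful as the gold standard, which again ties the evaluation back to the data and the research question.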
82. Digital research needs to be evaluated on the combination of data, method, and research question
83. Experimental design
• What is the corresponding number elsewhere?
• What are we measuring?
• Why will this answer our questions?
84. Prof. Hans Rosling
You can’t understand the world without numbers…
… and you cannot understand it only with numbers.
Factfulness