R in the Humanities: Text Analysis (2022)
Dr Leah Henrickson
Lecturer in Digital Media
School of Media and Communication
University of Leeds
L.R.Henrickson@leeds.ac.uk
twitter.com/leahhenrickson
Who am I?
• Lecturer in Digital Media
• Programme Leader, MA New Media
• Book historian
• Digital humanist
• Canadian 🍁
Publication in the next issue of Victorian Review: ‘Tangling and Untangling the Trollopes’, with Eleanor Dumbill
Session 1:
Gettin’ to Grips with R
CC Image: https://www.pexels.com/photo/smiling-model-in-pirate-costume-with-smoking-pipe-7000092
Overview
This course is a gentle introduction to R for text analysis. Over two sessions, you will learn the basics of this powerful programming language and then gain hands-on experience analysing long-form text in the RStudio development environment.
By the end of the course, you will be able to:
• Navigate the RStudio development environment
• Prepare long-form prose texts for computational analysis using R
• Conduct basic computational analyses of long-form prose texts
• Construct and explain visualisations of computed results
• Critically apply computational text analysis to complement other analytical methods
To complete this course you will need to install:
• R version 3.6 or higher (download at https://www.r-project.org)
• RStudio Desktop: Open Source Edition 1.2 or higher (download at https://www.rstudio.com/products/rstudio)
Session 1 Agenda
1. What are R and RStudio?
2. What can R help you do?
3. A quick note about Computational Literary Studies
4. Getting started with R
5. Cleaning text
CC Image: https://www.pexels.com/photo/black-cat-holding-persons-arm-1049764
What are R and RStudio?
R is:
• a programming language
• a software environment
• a really fancy calculator
• free/open source
Download: https://cran.r-project.org/mirrors.html
RStudio is:
• an integrated development environment (IDE)
• a great way to make your coding experiences easier, more colourful, and more fun!
Download: https://www.rstudio.com/products/rstudio/download
What can R help you do?
• Count words
• Find linguistic patterns within and across texts
• Compare texts
• Make pretty pictures
But it’s still up to you to explain results.
Also, is R always the most appropriate tool?
CC Image: https://pixabay.com/photos/letters-tiles-word-game-crossword-4938486
A quick note about Computational Literary Studies (CLS)
CLS has a long history (for example, Father Robert Busa, ~1940s),
but has been criticised for:
• Misinterpretation of statistical data (Da)
• Unchecked enthusiasm for technological ‘hype’ (Kirsch)
• Turning literature into data and neglecting reception of works (Marche)
Da, Nan Z. “The Computational Case against Computational Literary Studies.” Critical Inquiry, vol. 45, 2019, pp. 601-639.
Kirsch, Adam. “Technology Is Taking Over English Departments.” The New Republic, 2014, https://newrepublic.com/article/117428/limits-digital-humanities-adam-kirsch. Accessed 21 December 2020.
Marche, Stephen. “Literature Is not Data: Against Digital Humanities.” The Los Angeles Review of Books, 2012, https://lareviewofbooks.org/article/literature-is-not-data-against-digital-humanities. Accessed 21 December 2020.
CC Image: https://melissaterras.org/2013/10/15/for-ada-lovelace-day-father-busas-female-punch-card-operatives
Let’s get started!
Double click ‘Terminal’.
Terminal (write your script)
Console (run your script)
Environment (your data)
Everything else!
The Basics (1/2)
Calculating
• 10 + 2 (spaces optional)
• 10 - 2
• 10 * 2
• 10 / 2
Strings and Things
• 1:50
• print("Hello world!")
• [variable name] <- c(1, 2, 3)
• [variable name][2]
Meme: https://knowyourmeme.com/memes/math-lady-confused-lady
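The commands on this slide can be typed one at a time into the console. The sketch below runs them together; the names one_to_fifty and my_numbers are made up for illustration and stand in for [variable name].

```r
# Arithmetic: R works like a calculator (spaces are optional)
10 + 2
10 - 2
10 * 2
10 / 2

# The colon builds a sequence of whole numbers, here 1 to 50
one_to_fifty <- 1:50

# Text goes inside straight quotation marks
print("Hello world!")

# c() combines values into a vector; <- stores it under a name
my_numbers <- c(1, 2, 3)

# Square brackets pick out an element; R counts from 1, so this is the second
my_numbers[2]
```

Curly quotation marks pasted from slides or word processors will cause an error, so retype the quotes if a copied command fails.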
The Basics (2/2)
• Data types: character, numeric, integer, logical, complex
• Data structures: vector, list, matrix, data frame, factors
• Keep notes using #
• Need help?
• ?[name of function]
• help([name of function])
• install.packages("[name of package]")
Meme: https://www.reddit.com/r/ProgrammerHumor/comments/8w54mx/code_comments_be_like
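A short sketch of the data types and structures listed above, using only base R. All values and names here are invented for illustration.

```r
# Data types: class() reports what kind of value you have
class("pirate")   # character
class(3.14)       # numeric
class(2L)         # integer (the L marks a whole number)
class(TRUE)       # logical
class(1 + 2i)     # complex

# Data structures
v <- c(1, 2, 3)                                          # vector: values of one type
l <- list("a", 2, TRUE)                                  # list: values of mixed types
m <- matrix(1:6, nrow = 2)                               # matrix: rows and columns of one type
d <- data.frame(word = c("ship", "mars"), n = c(5, 4))   # data frame: a table with named columns
f <- factor(c("jan", "feb", "jan"))                      # factor: repeated categories

# Anything after a # is a note to yourself; R ignores it.
# To open a help page, type one of these in the console:
#   ?print
#   help("print")
```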
Tools > Global Options > Appearance
(You will need to restart RStudio to apply these changes.)
Let’s clean some text!
CC Image: https://thenounproject.com/term/cleaning/199037
You can use whatever corpus you’d like for this course.
However, I have prepared a corpus of twelve texts for you. You may download the corpus at http://tinyurl.com/n8texts.
This corpus includes six public domain texts comprising the first months of Astounding Stories of Super-Science (1930). A full
corpus for the year is available at http://tinyurl.com/n8texts2, if you’d like to use it in your own time.
• astoundingjan1930: https://www.gutenberg.org/ebooks/41481
• astoundingfeb1930: https://www.gutenberg.org/ebooks/28617
• astoundingmar1930: https://www.gutenberg.org/ebooks/29607
• astoundingapr1930: https://www.gutenberg.org/ebooks/29390
• astoundingmay1930: https://www.gutenberg.org/ebooks/29809
• astoundingjun1930: https://www.gutenberg.org/ebooks/29848
• astoundingjul1930: https://www.gutenberg.org/ebooks/29198
• astoundingaug1930: https://www.gutenberg.org/ebooks/29768
• astoundingsep1930: https://www.gutenberg.org/ebooks/29255
• astoundingoct1930: https://www.gutenberg.org/ebooks/29882
• astoundingnov1930: https://www.gutenberg.org/ebooks/29919
• astoundingdec1930: https://www.gutenberg.org/ebooks/30691
First, set your working directory: Session > Set Working Directory > Choose Directory > [folder]
install.packages("tm")
library(tm)
getwd()
texts <- Corpus(DirSource("[path to working directory]"))
writeLines(as.character(texts[[4]]))
?tm_map
getTransformations()
texts1 <- tm_map(texts, removePunctuation)
texts2 <- tm_map(texts1, removeNumbers)
texts3 <- tm_map(texts2, content_transformer(tolower))
texts4 <- tm_map(texts3, removeWords, stopwords("english"))
texts_final <- tm_map(texts4, stripWhitespace)
writeLines(as.character(texts_final[[4]]))
dtm <- DocumentTermMatrix(texts_final)
inspect(dtm)  # take a look!
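To see what each cleaning step does, it can help to apply rough base R equivalents to a single sentence. This is only an approximation of the tm transformations, not how tm is implemented; the sentence and the three-word stop list are invented for illustration (the list returned by stopwords("english") is much longer).

```r
# A toy sentence standing in for one document
raw <- "In 1930, the Ship   landed on Mars! The crew cheered."

# A tiny hand-picked stop word list, for illustration only
stops <- c("in", "the", "on")

step1 <- gsub("[[:punct:]]", "", raw)       # roughly removePunctuation
step2 <- gsub("[[:digit:]]", "", step1)     # roughly removeNumbers
step3 <- tolower(step2)                     # roughly content_transformer(tolower)

# Match each stop word only as a whole word, then delete it
pattern <- paste0("\\b(", paste(stops, collapse = "|"), ")\\b")
step4 <- gsub(pattern, "", step3, perl = TRUE)   # roughly removeWords

# Collapse runs of spaces, then trim the ends (roughly stripWhitespace)
step5 <- trimws(gsub("\\s+", " ", step4))

print(step5)
```

Printing each intermediate step (step1, step2, and so on) shows exactly which characters disappear at each stage.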
Help me! (1/3)
R Communities
#rstats (Twitter): https://twitter.com/hashtag/rstats
Forwards: https://forwards.github.io
R-Bloggers: https://www.r-bloggers.com
R-Ladies: https://rladies.org
r/rstats: https://www.reddit.com/r/rstats
RStudio Community: https://community.rstudio.com
Stack Overflow: https://stackoverflow.com/questions/tagged/r
Help me! (2/3)
R Resources
Matthew Jockers, Text Analysis with R for Students of Literature (New York: Springer, 2014)
https://www.matthewjockers.net/text-analysis-with-r-for-students-of-literature/
LinkedIn Learning: R: https://www.linkedin.com/learning/topics/r
Emmanuel Paradis, R for Beginners (2005): https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
Emma Rand, ‘Reproducible Analyses in R’, N8 CIR (2020): https://n8cir.org.uk/events/event-resource/analyses-r
W. N. Venables, D. M. Smith, and the R Core Team, An Introduction to R (2021): https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
Help me! (3/3)
R Packages for Text Analysis
corpustools (tokenised text analysis): https://cran.r-project.org/web/packages/corpustools
gutenbergr (searching/downloading Project Gutenberg): https://cran.r-project.org/web/packages/gutenbergr
quanteda (quantitative text analysis): https://cran.r-project.org/web/packages/quanteda/index.html
stylo (stylometry): https://cran.r-project.org/web/packages/stylo
syuzhet (sentiment analysis): https://cran.r-project.org/web/packages/syuzhet/index.html
tidytext (a bit of everything!): https://cran.r-project.org/web/packages/tidytext
tm (text mining – what we’ve done here): https://cran.r-project.org/web/packages/tm/index.html
Session 2:
Charts, Clouds, and Confidence
Image: https://pixabay.com/illustrations/rainbow-cloud-sunset-colorful-sky-5389074/
Session 2 Agenda
1. Any questions from last week?
2. Review of last week’s session (i.e. cleaning text)
3. Counting words
4. Plotting results
5. Making word clouds
6. Wrapping up
CC Images: https://thenounproject.com/term/graph/21394; https://thenounproject.com/term/word-cloud/195993
First, set your working directory: Session > Set Working Directory > Choose Directory > [folder]
install.packages("tm")
library(tm)
getwd()
texts <- Corpus(DirSource("[path to working directory]"))
writeLines(as.character(texts[[4]]))
?tm_map
getTransformations()
texts1 <- tm_map(texts, removePunctuation)
texts2 <- tm_map(texts1, removeNumbers)
texts3 <- tm_map(texts2, content_transformer(tolower))
texts4 <- tm_map(texts3, removeWords, stopwords("english"))
texts_final <- tm_map(texts4, stripWhitespace)
writeLines(as.character(texts_final[[4]]))
dtm <- DocumentTermMatrix(texts_final)
inspect(dtm)  # take a look!
Getting word frequencies and associations:
freq <- colSums(as.matrix(dtm))
freq[1:10]
freq_d <- sort(freq, decreasing=TRUE)
freq_d[1:10]
findFreqTerms(dtm, lowfreq=100)
findAssocs(dtm, "man", 0.95)
?findAssocs
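The counting steps above can be tried on a small hand-made matrix before running them on a real corpus. The sketch below uses base R only; the matrix toy is a stand-in for as.matrix(dtm), and its document names, terms, and counts are invented.

```r
# Rows are documents, columns are terms, cells are counts
toy <- matrix(
  c(3, 0, 1,
    2, 4, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1.txt", "doc2.txt"), c("ship", "mars", "crew"))
)

# Add up each column to get the total count of each term across all documents
freq <- colSums(toy)

# Put the most frequent terms first
freq_d <- sort(freq, decreasing = TRUE)
print(freq_d)

# Terms that occur at least 4 times in total,
# similar in spirit to findFreqTerms(dtm, lowfreq = 4)
frequent <- names(freq)[freq >= 4]
print(frequent)
```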
Making a bar chart (and then making it look nice):
barplot(freq_d[1:10])
?barplot
install.packages("RColorBrewer")
library(RColorBrewer)
?RColorBrewer
display.brewer.all()
cols <- brewer.pal(8, "Paired")
barplot(freq_d[1:10], col=cols, main="My Cool Plot", xlab="Word", ylab="Instances")
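A minimal sketch of the same kind of chart on invented counts, using base R only. As an assumption made so that it runs without installing anything, it swaps RColorBrewer for hcl.colors(), which ships with R 3.6 or higher.

```r
# Invented counts standing in for the top of freq_d
freq_d <- c(ship = 5, mars = 4, crew = 1)

# One colour per bar, taken from base R's default palette
cols <- hcl.colors(length(freq_d))

# barplot() draws the chart and returns the x position of each bar's midpoint;
# ylim leaves a little room above the tallest bar for the labels
mids <- barplot(freq_d, col = cols, ylim = c(0, max(freq_d) + 1),
                main = "Toy Word Counts", xlab = "Word", ylab = "Instances")

# Use the midpoints to write each count just above its bar
text(x = mids, y = freq_d, labels = freq_d, pos = 3)
```

The names of the vector become the bar labels, which is why freq_d works directly as input to barplot().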
Making a word cloud (and then making it look nice):
install.packages("wordcloud")
library(wordcloud)
dtm_matrix <- as.matrix(dtm)  # avoid reusing the name of R's built-in matrix() function
wordbank <- sort(colSums(dtm_matrix), decreasing=TRUE)
df <- data.frame(words=names(wordbank), freq=wordbank)
?data.frame
?wordcloud
wordcloud(words=df$words, freq=df$freq, max.words=100, random.order=FALSE, colors=cols)
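The word cloud needs one column of words and one column of counts. This base R sketch builds that table from invented counts so you can inspect its shape before plotting; it does not call wordcloud() itself.

```r
# Invented counts, sorted so the most frequent word comes first
wordbank <- sort(c(crew = 1, ship = 5, mars = 4), decreasing = TRUE)

# names() supplies the words; the values supply the frequencies
df <- data.frame(words = names(wordbank), freq = wordbank)
print(df)

# The $ sign pulls out a single column
top_word <- as.character(df$words[1])
top_freq <- df$freq[1]
```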
Discussion:
What are the potentials?
What are the limitations?
Is R the best choice?
CC Image: https://www.pexels.com/photo/selective-focus-photography-of-traffic-light-1616781
Thank you!
Dr Leah Henrickson
Lecturer in Digital Media
School of Media and Communication
University of Leeds
L.R.Henrickson@leeds.ac.uk
twitter.com/leahhenrickson
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Kürzlich hochgeladen (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

R in the Humanities: Text Analysis and Visualization

  • 1. R in the Humanities: Text Analysis (2022) Dr Leah Henrickson Lecturer in Digital Media School of Media and Communication University of Leeds L.R.Henrickson@leeds.ac.uk twitter.com/leahhenrickson
  • 2. Who am I? • Lecturer in Digital Media • Programme Leader, MA New Media • Book historian • Digital humanist • Canadian 🍁 L.R.Henrickson@leeds.ac.uk twitter.com/leahhenrickson
  • 3. Publication in the next issue of Victorian Review: ‘Tangling and Untangling the Trollopes’, with Eleanor Dumbill
  • 4. Session 1: Gettin’ to Grips with R CC Image: https://www.pexels.com/photo/smiling-model-in-pirate-costume-with-smoking-pipe-7000092
  • 5. Overview This course is a gentle introduction to R for text analysis. Over the course of two sessions you will be taught the basics of the powerful programming language before being provided with hands-on experience analysing long-form text in the RStudio development environment. By the end of the course, you will be able to: • Navigate the RStudio development environment • Prepare long-form prose texts for computational analysis using R • Conduct basic computational analyses of long-form prose texts • Construct and explain visualisations of computed results • Critically apply computational text analysis to complement other analytical methods To complete this course you will need to install: • R version 3.6 or higher (download at https://www.r-project.org) • RStudio Desktop: Open Source Edition 1.2 or higher (download at https://www.rstudio.com/products/rstudio)
  • 6. Session 1 Agenda 1. What are R and RStudio? 2. What can R help you do? 3. A quick note about Computational Literary Studies 4. Getting started with R 5. Cleaning text CC Image: https://www.pexels.com/photo/black-cat-holding-persons-arm-1049764
  • 7. What are R and RStudio? R is: • a programming language • a software environment • a really fancy calculator • free/open source Download: https://cran.r-project.org/mirrors.html RStudio is: • an integrated development environment (IDE) • a great way to make your coding experiences easier, more colourful, and more fun! Download: https://www.rstudio.com/products/rstudio/download
  • 8. What can R help you do? • Count words • Find linguistic patterns within and across texts • Compare texts • Make pretty pictures But it’s still up to you to explain results. Also, is R always the most appropriate tool? CC Image: https://pixabay.com/photos/letters-tiles-word-game-crossword-4938486
  • 9. A quick note about Computational Literary Studies (CLS)
    CLS has a long history (for example, Father Roberto Busa, ~1940s), but has been criticised for:
      • Misinterpretation of statistical data (Da)
      • Unchecked enthusiasm for technological ‘hype’ (Kirsch)
      • Turning literature into data and neglecting reception of works (Marche)
    Da, Nan Z. “The Computational Case against Computational Literary Studies.” Critical Inquiry, vol. 45, 2019, pp. 601-639.
    Kirsch, Adam. “Technology Is Taking Over English Departments.” The New Republic, 2014, https://newrepublic.com/article/117428/limits-digital-humanities-adam-kirsch. Accessed 21 December 2020.
    Marche, Stephen. “Literature Is not Data: Against Digital Humanities.” The Los Angeles Review of Books, 2012, https://lareviewofbooks.org/article/literature-is-not-data-against-digital-humanities. Accessed 21 December 2020.
    CC Image: https://melissaterras.org/2013/10/15/for-ada-lovelace-day-father-busas-female-punch-card-operatives
  • 12. Source pane (write your script); Console (run your script); Environment (your data); Everything else!
  • 13. The Basics (1/2)
    Calculating
      • 10 + 2 (spaces optional)
      • 10 - 2
      • 10 * 2
      • 10 / 2
    Strings and Things
      • 1:50
      • print("Hello world!")
      • [variable name] <- c(1, 2, 3)
      • [variable name][2]
    Meme: https://knowyourmeme.com/memes/math-lady-confused-lady
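
The basics on this slide can be tried out as one short script. This is only a sketch: the names `my_numbers` and `second_item` are made up for illustration and can be anything you like.

```r
# Arithmetic: R evaluates each expression and hands back the result
sum_result  <- 10 + 2
diff_result <- 10 - 2   # use a plain hyphen-minus, not a typographic dash
prod_result <- 10 * 2
quot_result <- 10 / 2

# The colon operator builds a sequence of whole numbers
one_to_fifty <- 1:50

# Strings need straight quotation marks, not curly ones
print("Hello world!")

# c() combines values into a vector; my_numbers is a made-up name
my_numbers <- c(1, 2, 3)

# Square brackets pick out one element; R starts counting at 1
second_item <- my_numbers[2]
```

Curly quotation marks and long dashes pasted from slides or word processors are a common cause of "unexpected input" errors, so retype them if a line refuses to run.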
  • 14. The Basics (2/2)
      • Data types: character, numeric, integer, logical, complex
      • Data structures: vector, list, matrix, data frame, factor
      • Keep notes using #
      • Need help?
          • ?____________
          • help()
      • install.packages("[name of package]")
    Meme: https://www.reddit.com/r/ProgrammerHumor/comments/8w54mx/code_comments_be_like
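
A quick way to get a feel for the data types and structures listed above is to ask R what it thinks each value is. The values and variable names below are invented for illustration.

```r
# class() reports the data type of a value
type_text    <- class("Astounding Stories")  # "character"
type_number  <- class(3.14)                  # "numeric"
type_integer <- class(1930L)                 # "integer" (the L marks a whole number)
type_logical <- class(TRUE)                  # "logical"
type_complex <- class(2 + 3i)                # "complex"

# Data structures hold one or more values
months  <- c("jan", "feb", "mar")                      # vector: all one type
mixed   <- list("jan", 1930, TRUE)                     # list: types can differ
genres  <- factor(c("sf", "sf", "horror"))             # factor: categories
grid    <- matrix(1:6, nrow = 2)                       # matrix: rows and columns
records <- data.frame(month = months, issue = 1:3)     # data frame: a table

# Notes start with a hash sign, like every comment line in this script
```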
  • 15. Tools > Global Options > Appearance (You will need to restart RStudio to apply these changes).
  • 16. Let’s clean some text! CC Image: https://thenounproject.com/term/cleaning/199037
  • 17. You can use whatever corpus you’d like for this course. However, I have prepared a corpus of twelve texts for you. You may download the corpus at http://tinyurl.com/n8texts. This corpus includes six public domain texts comprising the first months of Astounding Stories of Super-Science (1930). A full corpus for the year is available at http://tinyurl.com/n8texts2, if you’d like to use it in your own time.
      • astoundingjan1930: https://www.gutenberg.org/ebooks/41481
      • astoundingfeb1930: https://www.gutenberg.org/ebooks/28617
      • astoundingmar1930: https://www.gutenberg.org/ebooks/29607
      • astoundingapr1930: https://www.gutenberg.org/ebooks/29390
      • astoundingmay1930: https://www.gutenberg.org/ebooks/29809
      • astoundingjun1930: https://www.gutenberg.org/ebooks/29848
      • astoundingjul1930: https://www.gutenberg.org/ebooks/29198
      • astoundingaug1930: https://www.gutenberg.org/ebooks/29768
      • astoundingsep1930: https://www.gutenberg.org/ebooks/29255
      • astoundingoct1930: https://www.gutenberg.org/ebooks/29882
      • astoundingnov1930: https://www.gutenberg.org/ebooks/29919
      • astoundingdec1930: https://www.gutenberg.org/ebooks/30691
  • 18. First, set your working directory: Session > Set Working Directory > Choose Directory > [folder]
      install.packages("tm")
      library(tm)
      getwd()
      texts <- Corpus(DirSource("[path to working directory]"))
      writeLines(as.character(texts[[4]]))
      ?tm_map
      getTransformations()
      texts1 <- tm_map(texts, removePunctuation)
      texts2 <- tm_map(texts1, removeNumbers)
      texts3 <- tm_map(texts2, content_transformer(tolower))
      texts4 <- tm_map(texts3, removeWords, stopwords("english"))
      texts_final <- tm_map(texts4, stripWhitespace)
      writeLines(as.character(texts_final[[4]]))
      dtm <- DocumentTermMatrix(texts_final)
    + use inspect() to take a look!
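
To see what each cleaning step does, it can help to apply rough equivalents to a single sentence using only base R. The sketch below is an approximation, not how tm implements its transformations: the sentence and the three-word stopword list are made up, and tm's own English stopword list is much longer.

```r
# A made-up sentence standing in for one text in the corpus
raw_text <- "In 1930, the Ray-Ship landed; 3 men ran to THE hatch!"

# Roughly removePunctuation: delete punctuation characters
step1 <- gsub("[[:punct:]]", "", raw_text)

# Roughly removeNumbers: delete digits
step2 <- gsub("[[:digit:]]", "", step1)

# tolower: convert everything to lower case
step3 <- tolower(step2)

# Roughly removeWords: delete whole words found in a stopword list
# (my_stopwords is a tiny hypothetical list for this example only)
my_stopwords <- c("in", "the", "to")
pattern <- paste0("\\b(", paste(my_stopwords, collapse = "|"), ")\\b")
step4 <- gsub(pattern, "", step3, perl = TRUE)

# Roughly stripWhitespace: collapse runs of spaces into one;
# trimws() additionally removes the spaces left at either end
cleaned <- trimws(gsub("\\s+", " ", step4))

print(cleaned)
```

The order matters: lowercasing before removing stopwords means "THE" is caught by a lowercase list, which is why the slide applies `tolower` before `removeWords`.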
  • 19. Help me! (1/3) R Communities
      • #rstats (Twitter): https://twitter.com/hashtag/rstats
      • Forwards: https://forwards.github.io
      • R-Bloggers: https://www.r-bloggers.com
      • R-Ladies: https://rladies.org
      • r/rstats: https://www.reddit.com/r/rstats
      • RStudio Community: https://community.rstudio.com
      • Stack Overflow: https://stackoverflow.com/questions/tagged/r
  • 20. Help me! (2/3) R Resources
      • Matthew Jockers, Text Analysis with R for Students of Literature (New York: Springer, 2014): https://www.matthewjockers.net/text-analysis-with-r-for-students-of-literature/
      • LinkedIn Learning: R: https://www.linkedin.com/learning/topics/r
      • Emmanuel Paradis, R for Beginners (2005): https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
      • Emma Rand, ‘Reproducible Analyses in R’, N8 CIR (2020): https://n8cir.org.uk/events/event-resource/analyses-r
      • W. N. Venables, D. M. Smith, and the R Core Team, An Introduction to R (2021): https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
  • 21. Help me! (3/3) R Packages for Text Analysis
      • corpustools (tokenised text analysis): https://cran.r-project.org/web/packages/corpustools
      • gutenbergr (searching/downloading Project Gutenberg): https://cran.r-project.org/web/packages/gutenbergr
      • quanteda (quantitative text analysis): https://cran.r-project.org/web/packages/quanteda/index.html
      • stylo (stylometry): https://cran.r-project.org/web/packages/stylo
      • syuzhet (sentiment analysis): https://cran.r-project.org/web/packages/syuzhet/index.html
      • tidytext (a bit of everything!): https://cran.r-project.org/web/packages/tidytext
      • tm (text mining – what we’ve done here): https://cran.r-project.org/web/packages/tm/index.html
  • 22. Session 2: Charts, Clouds, and Confidence Image: https://pixabay.com/illustrations/rainbow-cloud-sunset-colorful-sky-5389074/
  • 23. Session 2 Agenda 1. Any questions from last week? 2. Review of last week’s session (i.e. cleaning text) 3. Counting words 4. Plotting results 5. Making word clouds 6. Wrapping up CC Images: https://thenounproject.com/term/graph/21394; https://thenounproject.com/term/word-cloud/195993
  • 24. First, set your working directory: Session > Set Working Directory > Choose Directory > [folder]
      install.packages("tm")
      library(tm)
      getwd()
      texts <- Corpus(DirSource("[path to working directory]"))
      writeLines(as.character(texts[[4]]))
      ?tm_map
      getTransformations()
      texts1 <- tm_map(texts, removePunctuation)
      texts2 <- tm_map(texts1, removeNumbers)
      texts3 <- tm_map(texts2, content_transformer(tolower))
      texts4 <- tm_map(texts3, removeWords, stopwords("english"))
      texts_final <- tm_map(texts4, stripWhitespace)
      writeLines(as.character(texts_final[[4]]))
      dtm <- DocumentTermMatrix(texts_final)
    + use inspect() to take a look!
  • 25. Getting word frequencies and associations:
      freq <- colSums(as.matrix(dtm))
      freq[1:10]
      freq_d <- sort(freq, decreasing=TRUE)
      freq_d[1:10]
      findFreqTerms(dtm, lowfreq=100)
      findAssocs(dtm, "man", 0.95)
      ?findAssocs
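
The counting step is easier to follow on a tiny table. `as.matrix(dtm)` gives an ordinary matrix with one row per document and one column per word, so the sketch below uses a hand-made stand-in with invented documents, words, and counts. The last line only approximates what `findFreqTerms` reports.

```r
# Hand-made stand-in for as.matrix(dtm): rows are documents, columns are words
counts <- matrix(
  c(4, 0, 7,
    1, 3, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("man", "ray", "ship"))
)

# Add up each column to get the total count of each word across all documents
freq <- colSums(counts)

# Put the most frequent words first
freq_d <- sort(freq, decreasing = TRUE)

# Look at the top two (the slide looks at the top ten)
top_two <- freq_d[1:2]

# Words appearing at least 5 times overall, similar in spirit to
# findFreqTerms(dtm, lowfreq = 5)
frequent <- names(freq)[freq >= 5]

print(freq_d)
```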
  • 26. Making a bar chart (and then making it look nice):
      barplot(freq_d[1:10])
      ?barplot
      install.packages("RColorBrewer")
      library(RColorBrewer)
      ?RColorBrewer
      display.brewer.all()
      cols <- brewer.pal(8, "Paired")
      barplot(freq_d[1:10], col=cols, main="My Cool Plot", xlab="Word", ylab="Instances")
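
If you want to try the bar chart before your corpus is ready, a named vector of invented counts works just as well. This sketch assumes nothing about your data; it falls back to base R's `rainbow()` colours when RColorBrewer is not installed, which is a substitution made here for convenience and not part of the slide.

```r
# Made-up word counts, already sorted from most to least frequent
freq_d <- c(ship = 9, man = 5, ray = 3)

# Use the "Paired" palette if RColorBrewer is available, otherwise base colours
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  cols <- RColorBrewer::brewer.pal(8, "Paired")
} else {
  cols <- rainbow(8)
}

# barplot() draws one bar per named value; the names become the axis labels.
# It also returns the position of each bar, which we keep in bar_positions.
bar_positions <- barplot(freq_d, col = cols,
                         main = "My Cool Plot", xlab = "Word", ylab = "Instances")
```

Only as many colours as there are bars get used, so an eight-colour palette is fine for three or ten bars alike.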
  • 27. Making a word cloud (and then making it look nice):
      install.packages("wordcloud")
      library(wordcloud)
      matrix <- as.matrix(dtm)
      wordbank <- sort(colSums(matrix), decreasing=TRUE)
      df <- data.frame(words=names(wordbank), freq=wordbank)
      ?data.frame
      ?wordcloud
      wordcloud(words=df$words, freq=df$freq, max.words=100, random.order=FALSE, colors=cols)
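
The word cloud needs a table with one column of words and one column of counts. The sketch below builds that table from invented counts and only draws the cloud if the wordcloud package is installed, so the table-building part can be checked on its own.

```r
# Made-up word counts, sorted so the most frequent word comes first
wordbank <- sort(c(man = 5, ship = 9, ray = 3), decreasing = TRUE)

# One row per word: the word itself and how often it appears
df <- data.frame(words = names(wordbank), freq = wordbank,
                 stringsAsFactors = FALSE)

print(df)

# Draw the cloud only if the package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = df$words, freq = df$freq,
                       max.words = 100, random.order = FALSE)
}
```

With `random.order = FALSE` the most frequent words are placed first, which usually puts them near the centre of the cloud.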
  • 28. Discussion: What are the potentials? What are the limitations? Is R the best choice? CC Image: https://www.pexels.com/photo/selective-focus-photography-of-traffic-light-1616781
  • 32. Thank you! Dr Leah Henrickson Lecturer in Digital Media School of Media and Communication University of Leeds L.R.Henrickson@leeds.ac.uk twitter.com/leahhenrickson

Editor's notes

  1. Matrix = table. Data frame = table, with more flexibility about what can be included in that table.
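
The difference in the note above shows up as soon as you mix words and numbers. In this sketch (all values invented), the matrix quietly turns every cell into text, while the data frame keeps each column's own type.

```r
# A matrix can only hold one data type, so the numbers become text
m <- matrix(c("ship", 9, "man", 5), nrow = 2, byrow = TRUE)
matrix_cell_type <- class(m[1, 2])   # "character"

# A data frame lets each column keep its own type
df <- data.frame(word = c("ship", "man"), freq = c(9, 5),
                 stringsAsFactors = FALSE)
column_types <- sapply(df, class)    # word: "character", freq: "numeric"

# Because freq is still numeric, arithmetic works on it directly
total <- sum(df$freq)
```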