“MASS SURVEILLANCE”
THROUGH DISTANT READING
Shalin Hai-Jew
• Aesthesia
• March 2, 2017
• Marianna Kistler Beach Museum of Art
• Kansas State University
OVERVIEW
Distant reading refers to the use of computers to “read” texts by counting words,
identifying themes and subthemes (through topic modeling), extracting sentiment,
applying psychological analysis to the author(s), and otherwise surfacing latent or
hidden insights. This work draws on research into “mass surveillance” using five
text sets: academic articles, mainstream journalism, microblogging, Wikipedia articles,
and leaked government data. The purpose was to capture, indirectly, some insights
about the collective social discussion around this issue. This presentation uses a
variety of data visualizations (article network graphs, word trees, dendrograms,
treemaps, cluster diagrams, line graphs, bar charts, pie charts, and others) to show
how machines read and the types of summary data they enable (at computational speeds,
at machine scale, and in a reproducible way). Some computational linguistic analysis
tools also enable the creation of custom dictionaries for unique types of applied
research. The tools used in this presentation include NVivo 11 Plus and LIWC2015.
2
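Counting words, the most basic form of distant reading described above, can be sketched in a few lines of Python. This is a minimal illustration, not how NVivo 11 Plus or LIWC2015 work internally; the tokenizing pattern and the short stopword list are assumptions chosen for clarity.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; real tools ship much longer lists).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "on", "for", "by"}


def word_frequencies(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Lowercase the text, split it into word tokens, drop stopwords, and count."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(token for token in tokens if token not in STOPWORDS)
    # Ties keep the order in which words were first seen.
    return counts.most_common(top_n)


sample = ("Mass surveillance is debated in academic articles, in journalism, "
          "and on Twitter, where surveillance hashtags spread quickly.")
print(word_frequencies(sample, top_n=3))
# [('surveillance', 2), ('mass', 1), ('debated', 1)]
```

Even this toy version shows why results depend on choices such as the stopword list: removing or keeping function words changes which terms rise to the top.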
SOME COMMON TYPES OF “DISTANT READING”
AND APPLICATIONS
Linguistic analysis
Topic modeling
  - Theme and subtheme extraction
Sentiment analysis
  - Positive and negative
Text networks
  - Word relationships
Authorship analysis (based on latent features)
  - Stylometry “fingerprinting”
  - Author gender identification
Psychological analysis
Cultural analysis, culturomics
History-based applications
Literary analysis
  - Dialogue analysis
  - Geographical referencing and patterning
  - Character analysis
Predictive analytics
  - Classification
  - Trend
3
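The “text networks” and “word relationships” entries above rest on a simple idea: words that appear near each other are linked. A minimal sketch follows, assuming a sliding window of consecutive tokens as the definition of “near” (tools differ on this, and this is not NVivo's implementation). It builds a weighted, undirected edge list that a network tool such as NodeXL could then lay out as a graph.

```python
import re
from collections import Counter


def cooccurrence_edges(sentences: list[str], window: int = 3) -> Counter:
    """Count pairs of distinct words that fall inside a sliding window of `window` tokens.

    Each pair is stored in alphabetical order, so the result can be read as an
    undirected, weighted edge list (edge -> weight) for a text network.
    """
    edges: Counter = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, left in enumerate(tokens):
            # Pair the current token with the next (window - 1) tokens.
            for right in tokens[i + 1 : i + window]:
                if left != right:
                    edges[tuple(sorted((left, right)))] += 1
    return edges


corpus = ["Mass surveillance collects metadata.", "State surveillance collects metadata quietly."]
for (a, b), weight in cooccurrence_edges(corpus, window=2).most_common():
    print(f"{a} -- {b}: {weight}")
```

Widening the window links more distant words and produces denser graphs; the window size is a research decision worth documenting.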
STUDIED PHENOMENA IN THE COMPUTATIONAL
LINGUISTIC ANALYSIS RESEARCH LITERATURE
Political science, leader speech analysis (for profiling)
State-of-a-field research
Authorship identification
Plagiarism detection
Suicidality
Movie popularity, song popularity
Language studies
Law enforcement
Fraud detection
Threat detection, and others
4
WHY DISTANT READING?
Textual interpretation
  - At computational speeds
  - At computational scale
Reproducible, repeatable
Measures various analytical constructs in quantitative ways
Surfaces latent (hidden) ideas and data patterns that are not otherwise visible (for
example, through human “close reading”)
Results can be compared against large textual datasets of particular text types (for
example, comparing a Tweetstream against other social media texts or against
microblogging texts in general)
Complements and augments human “close reading”
5
COMMON ANALYTICAL TRAJECTORIES
Curation of text sets (corpora) -> distant reading data summaries -> zoomed-in
analysis (of concepts, names, dates, locations, symbols, numbers, and so on) -> human
close reading
  - General-to-specific trajectory
Baseline text set statistics based on curated text collections and text corpora
Comparisons across text sets
  - Relative data
6
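The zoomed-in step of this trajectory, moving from summary counts back to the passages where a term occurs, is what keyword-in-context (KWIC) views and word trees support. A minimal KWIC sketch follows; the whitespace tokenization, the punctuation stripped before matching, and the `width` parameter are illustrative assumptions.

```python
PUNCTUATION = '.,;:!?"“”()'


def keyword_in_context(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return each occurrence of `keyword` with up to `width` words of context per side."""
    tokens = text.split()
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        # Compare case-insensitively, ignoring punctuation attached to the token.
        if token.lower().strip(PUNCTUATION) == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


text = "Bulk data collection raised concerns. Data retention rules changed."
for line in keyword_in_context(text, "data", width=2):
    print(line)
# Bulk [data] collection raised
# raised concerns. [Data] retention rules
```

Each line is a pointer back into the corpus, which is where human close reading picks up.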
“MASS SURVEILLANCE” AS A
SEEDING TOPIC
7
WHY “MASS SURVEILLANCE”?
A timely construct
A point of global discussion
A mixed group of competing stakeholders on the issue
Wide public availability of five (somewhat) disparate text sets:
  - Academic
  - Mainstream journalism
  - Microblogging
  - Wikipedia articles
  - Leaked government data
8
Slides 9 to 30 (images only): a pie chart, line charts and line graphs, bar charts, stacked bar charts, a combined bar and line chart, and a spider (radar) chart made with LIWC2015 and Excel
Slide 31: Readability scores for the five text sets (data table)

Text set                                       | Gunning Fog | Coleman-Liau | Flesch-Kincaid Grade Level | ARI   | SMOG  | Flesch Reading Ease
Set 1: Academic articles (partial)             | 13.20       | 11.71        | 10.71                      | 9.29  | 12.80 | 43.26
Set 2: Mainstream journalism                   | 14.28       | 13.88        | 12.12                      | 12.40 | 13.75 | 39.25
Set 3: Twitter microblogging hashtag discourse | 28.88       | 32.36        | 24.40                      | 29.73 | 21.75 | -38.46
Set 4: Wikipedia article network (partial)     | 11.09       | 12.25        | 9.46                       | 8.31  | 11.07 | 44.39
Set 5: Leaked U.S. government (partial)        | 14.65       | 12.45        | 12.29                      | 10.89 | 13.97 | 36.44

ARI = Automated Readability Index; SMOG = SMOG Readability Formula; Flesch Reading Ease is on a 100-point scale.
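Several of the readability scores in the table are simple functions of words per sentence and syllables per word. The sketch below implements the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas; the vowel-group syllable counter and the punctuation-based sentence splitter are rough assumptions, and the tool that produced the table may count differently. It also suggests why a hashtag stream can score below zero: with few sentence-ending marks, words per sentence grows, and long hashtag and technical terms raise syllables per word, and both push Reading Ease down.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for `text`."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(len(words), 1)
    n_syllables = sum(count_syllables(word) for word in words)
    words_per_sentence = n_words / sentences
    syllables_per_word = n_syllables / n_words
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level


prose = "The cat sat on the mat. It was warm."
tweet = ("surveillance metadata intelligence agencies monitoring "
         "communications worldwide #privacy #nsa #security")
print(flesch_scores(prose))
print(flesch_scores(tweet))
```

The other indices in the table (Gunning Fog, Coleman-Liau, ARI, SMOG) follow the same pattern with different counts, such as characters per word or polysyllabic words per sentence.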
Slide 32: Final Full Set Academic Themes and Subthemes Treemap
Slide 33: Final Full Set Mainstream Journalist Themes and Subthemes Treemap
Slide 34: Final Full Set #surveillance Microblogging Themes and Subthemes Treemap
Slide 35: line graph
Slide 36: Manually Coded #surveillance Hashtag Network on Twitter (treemap diagram)
Slide 37: Final Full Set of Mass-surveillance Article Network from Wikipedia Themes and Subthemes Treemap
Slide 38: Final Full Set Leaked Government Documents Themes and Subthemes Treemap
Slide 39: 3D cluster diagram from the academic article dataset (interactive)
Slide 40: treemap diagram from the academic article dataset (interactive)
Slide 41: word cloud from the academic article dataset (interactive)
Slide 42: word tree from the journalism dataset (interactive)
Slide 43: horizontal dendrogram from the journalism dataset (interactive)
Slide 44: 2D cluster diagram from the journalism dataset (interactive)
Slide 45: treemap diagram from the microblogging dataset (interactive)
Slide 46: 3D bar chart from the microblogging dataset (interactive)
Slide 47: word cloud from the microblogging dataset (interactive)
Slide 48: 2D cluster chart from the microblogging dataset (interactive)
Slide 49: word tree from the microblogging dataset (interactive)
Slide 50: article-article network graph from Wikipedia, made with NodeXL (“Network Overview, Discovery and Exploration for Excel”)
Slide 51: word cloud from the crowd-sourced encyclopedia dataset (interactive)
Slide 52: treemap diagram from the crowd-sourced encyclopedia dataset (interactive)
Slide 53: 3D bar chart from the crowd-sourced encyclopedia dataset (interactive)
Slide 54: 2D cluster diagram from the leaked government dataset (interactive)
Slide 55: word tree from the leaked government dataset (interactive)
Slide 56: word cloud from the leaked government dataset (interactive)
Slide 57: long tail analysis of the leaked government dataset (data table)
Slide 58: 3D cluster diagram of coding nodes from the leaked government dataset (interactive)
Slide 59: 3D word tree from the leaked government dataset (interactive)
Slide 60: sunburst diagram
Slide 61: intensity matrix (interactive)
Slide 62: An Article Histogram of a Leaked Government Document (article histogram with main theme extractions), titled “Auto-extracted Top-Level Themes from a Government Document”; y-axis: Number of Mentions (0 to 8); themes: A: content, B: dissemination, C: front door, D: hidden service, E: information, F: jflftflvjff dissemination, G: node, H: onion, I: r dissemination
A Theme Histogram from a Government Document (article histogram with main theme extractions); x-axis: Counts of Mentions of Top-Level Themes (0 to 3.5); y-axis: Auto-extracted Top-Level Themes (A: event, B: facebook, C: msn, D: notification, E: sources, F: target)
ABOUT THE SEEDING TOPIC:
“MASS SURVEILLANCE”?
63
CONTRIBUTIONS TO THE “MASS SURVEILLANCE”
TOPIC
Academic writing: legal, philosophical, technological, and practical implications
Mainstream journalistic articles: domestic and foreign government engagement
with the issue (executive, legislative, judicial, and others)
Microblogging messages: global surveillance challenges, changing technologies (such as
drones)
Wikipedia (open-source and crowdsourced encyclopedia): summary details,
highlighted events, personages, URLs, and timely observations
Government documents: bureaucratese, technical capabilities
64
ABOUT THE RELATED TEXT SETS…FROM DISTANT
READING
Different genres of writing on a particular topic manifest differently on different
textual dimensions.
  - Some textual features seem to co-vary, possibly because they are general features of
    prose writing or because of other factors.
  - Analyzing different features of the text sets may help identify the source types most
    useful for certain types of research or questions.
  - Social media “netspeak” is not yet fully captured by the two commercial tools used for
    this analysis.
Average word counts per unit differed: academic articles (7,624 to 8,073 words per unit),
mainstream journalistic articles (1,460 to 1,488 words per unit), microblogging hashtag
discourse (44 to 61 words per user account), Wikipedia articles (6,710 to 7,216 words per
article), and leaked government documents (1,711 to 1,800 words per unit).
  - The ranges reflect the use of different software programs to do the counts and the
    natural ambiguity in identifying words.
65
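The note that word counts varied with the software used is easy to reproduce, because what counts as a “word” is a design decision. The three tokenizers below are illustrative assumptions, not the rules NVivo 11 Plus or LIWC2015 apply, yet they already disagree on a single tweet-like sentence.

```python
import re

# Three plausible definitions of a "word" (illustrative only).
TOKENIZERS = {
    # Anything between spaces: "U.S." and "#privacy" are single words.
    "whitespace": lambda text: text.split(),
    # Runs of letters only: "can't" becomes "can" + "t", URLs split into pieces.
    "letters_only": lambda text: re.findall(r"[A-Za-z]+", text),
    # Letters, allowing internal hyphens and apostrophes: "mass-surveillance" stays whole.
    "keep_hyphens_apostrophes": lambda text: re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text),
}


def word_counts(text: str) -> dict[str, int]:
    """Count words in `text` under each tokenizer definition."""
    return {name: len(tokenize(text)) for name, tokenize in TOKENIZERS.items()}


sample = "U.S. mass-surveillance programs can't be ignored #privacy http://example.com"
print(word_counts(sample))
# {'whitespace': 8, 'letters_only': 13, 'keep_hyphens_apostrophes': 11}
```

Short texts such as Tweets, which are dense with hashtags, handles, and URLs, are where these definitions diverge most.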
ABOUT THE RELATED TEXT SETS…FROM DISTANT
READING (CONT.)
Computational analysis showed a spike in one human-drive measure, “power,” across all
five text sets. Because this held across all five sets, “power” may be a driving concern
in discussions of “mass surveillance.”
According to NVivo 11 Plus, sentiment was most present in the following descending
order: Wikipedia articles, academic articles, leaked government documents, mainstream
journalism, and hashtag discourse. LIWC2015 produced a different descending order:
mainstream journalism, Wikipedia articles, academic articles, leaked government
documents, and hashtag discourse.
  - The tools agreed on only one rank position: hashtag discourse came last, with the least
    sentiment. This may be partly explained by the brevity of Tweets and by the expression
    of emotion through emoticons and punctuation marks.
66
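One way to quantify how far the two tools' sentiment orderings diverge is Spearman's rank correlation, which for two complete rankings without ties is rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is each item's difference in rank. The sketch below applies it to the two descending orders reported above; the short item labels are introduced here for convenience. With only five items, the result is a rough descriptive summary, not a significance test.

```python
def spearman_rho(order_a: list[str], order_b: list[str]) -> float:
    """Spearman's rank correlation for two complete rankings of the same items (no ties)."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both rankings must contain the same items")
    rank_b = {item: rank for rank, item in enumerate(order_b, start=1)}
    n = len(order_a)
    d_squared = sum((rank - rank_b[item]) ** 2 for rank, item in enumerate(order_a, start=1))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Descending sentiment orders as reported on this slide.
nvivo = ["wikipedia", "academic", "leaked_government", "journalism", "hashtag"]
liwc = ["journalism", "wikipedia", "academic", "leaked_government", "hashtag"]
print(spearman_rho(nvivo, liwc))  # 0.4
```

A coefficient of 0.4 indicates only moderate agreement, mostly because mainstream journalism moves from fourth place to first.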
ABOUT THE RELATED TEXT SETS…BASED IN PART
ON SELECTED CLOSE READING
All five text sets (academic, mainstream journalistic, microblogging messages, Wikipedia
articles, and government documents) were informed by the source government documents.
The journalistic articles, with a rights narrative of deep intrusions into privacy, seem to
have captured the readership’s attention, while academic and government documents were
not consumed as broadly.
  - Journalistic articles ranked high on sociality measures, which may indicate why people
    see them as connecting with their lives.
Twitter was used to promote writings from academia and mainstream journalism.
Some academic publications cited mainstream journalistic pieces, but fewer journalistic
pieces cited academic works.
67
ABOUT THE RELATED TEXT SETS…BASED IN PART
ON SELECTED CLOSE READING (CONT.)
Relatively few academic pieces on this issue appeared in the subscription databases and
other sources that were checked.
  - More time may need to pass before researchers can study the issue.
Because of their technological complexity, the government documents required technology,
legal, and policy experts to interpret.
  - These documents were generally handled in a non-consumptive way for computational
    linguistic analysis. Non-consumptiveness refers to extracting statistical features of a
    text set without direct access to the underlying texts. For this analysis, the focus
    was on computational reading of the documents, not on human interpretation of the text
    set or of the capabilities it describes.
68
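Non-consumptive analysis, as defined above, releases statistical features rather than the text itself. A minimal sketch follows, assuming that token counts, vocabulary size, mean word length, and frequently used terms are the features of interest (an illustrative choice, not the procedure used in this research). Word order is discarded, and rare words are withheld.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class TextFeatures:
    """Aggregate statistics that can be shared without sharing the text itself."""
    n_tokens: int
    n_types: int
    mean_word_length: float
    term_counts: dict[str, int]


def extract_features(text: str, min_count: int = 2) -> TextFeatures:
    """Reduce `text` to counts; `min_count` is a hypothetical disclosure threshold."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return TextFeatures(
        n_tokens=len(tokens),
        n_types=len(counts),
        mean_word_length=sum(map(len, tokens)) / len(tokens) if tokens else 0.0,
        # Terms seen fewer than `min_count` times are withheld, since rare words
        # are the ones most likely to identify a specific passage.
        term_counts={term: c for term, c in counts.items() if c >= min_count},
    )


features = extract_features("Data data collection. Collection of data.")
print(features)
```

The researcher can still compare text sets on these features while the documents themselves stay out of view.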
ABOUT USING COMPUTATIONAL LINGUISTIC
ANALYSIS TO “READ” UP ON AN ISSUE
Selected text sets should be as comprehensive as possible to represent the topic. The
text sets should be cleaned to eliminate irrelevant elements. There should be clear
documentation of how the data were collected, processed, and handled.
  - How the text sets are handled affects the results.
  - How particular text sets are bundled will affect the results as well.
Because only some people participate in social media, there can be large gaps in
informational coverage.
  - Social media platform APIs are often rate- and data-limited, so it is important to
    review the terms of access to such data.
Using multiple software tools makes sense because differences in tool design affect what
is and is not observed. The “validity” and “reliability” of software tools vary.
69
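Cleaning decisions should be documented because they change the results. A minimal sketch of tweet cleaning follows; which elements count as “irrelevant” (retweet markers, @-mentions, URLs, and optionally hashtags) is an assumption a researcher would set, and record, for each study.

```python
import re

RETWEET = re.compile(r"^RT\s+")    # leading retweet marker
MENTION = re.compile(r"@\w+:?")    # @-mentions, plus a trailing colon if present
URL = re.compile(r"https?://\S+")  # links


def clean_tweet(text: str, keep_hashtag_words: bool = True) -> str:
    """Strip retweet markers, mentions, and URLs; optionally keep hashtag words."""
    text = RETWEET.sub("", text)
    text = URL.sub("", text)
    text = MENTION.sub("", text)
    if keep_hashtag_words:
        text = text.replace("#", "")      # "#surveillance" -> "surveillance"
    else:
        text = re.sub(r"#\w+", "", text)  # drop hashtags entirely
    return " ".join(text.split())         # collapse leftover whitespace


raw = "RT @user: New report on #surveillance https://t.co/abc"
print(clean_tweet(raw))                            # New report on surveillance
print(clean_tweet(raw, keep_hashtag_words=False))  # New report on
```

Keeping or dropping hashtag words is exactly the kind of choice that shifts word counts, theme extraction, and sentiment scores, so it belongs in the methods documentation.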
ABOUT USING COMPUTATIONAL LINGUISTIC
ANALYSIS TO “READ” UP ON AN ISSUE (CONT.)
How the researcher asks questions and wields the technology will affect what can be seen
and what is seen. There is no “objective” reading machine; subjectivity and judgment play
a role.
  - External validation may be an important part of research using computational reading.
The data visualizations here are mostly interactive, and it is possible to link back to
the original underlying data. All of the data visualizations are informed by underlying
data, which should be accessed for deeper understanding.
  - These interactive features and the underlying data should be engaged to benefit fully
    from the computational analyses. (Data visualizations are not meant to be used
    independently of the underlying data.)
  - “Non-consumptive” text analysis can sometimes be helpful even without close reading
    and examination of the underlying text corpora used for the computational analysis.
70
ABOUT USING COMPUTATIONAL LINGUISTIC
ANALYSIS TO “READ” UP ON AN ISSUE (CONT.)
Close reading is always part of the work, even when distant reading is brought to bear.
Each enhances the other, and there are many rich sequences in which to combine them.
  - What a human reader “sees” differs from what a computer “sees.”
71
SOME POSSIBLE EFFECTS OF THE RESEARCH
Different genres of texts may reach different parts of a population. Those who limit
themselves to particular genres will capture only some aspects of the information about
a topic.
  - Those engaged in strategic communications would benefit from knowing which
    communication modes to use to reach their target audiences.
It helps to know what issues are trending at any particular time, and which collective
emotions are being expressed.
Observations from distant reading can help target limited human close-reading attention
strategically.
72
WHY “MASS SURVEILLANCE” AND “DISTANT
READING”?
This slideshow elides mass surveillance and distant reading, in part because
technological capabilities enable both “mass surveillance” and dataveillance (a
portmanteau of “data” and “surveillance”).
  - Practically speaking, human close reading would be wholly insufficient for engaging
    with mass data. There are not enough human-years to plough through the masses of
    structured and unstructured data being created today.
  - For complex data, human close reading requires close and slow attention (200 words
    per minute, or wpm).
  - Human close reading is not known for objective accuracy. Rather, it is informed by a
    trained, subjective lens, and it is valued for its unique perspective and voice.
73
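The 200-words-per-minute figure makes the scale problem easy to quantify. The sketch below assumes a hypothetical one-billion-word corpus and a reader working 8 hours a day for 250 days a year; everything except the reading rate is an illustrative assumption.

```python
def human_reading_years(n_words: int, words_per_minute: float = 200.0,
                        hours_per_day: float = 8.0, days_per_year: float = 250.0) -> float:
    """Working years one person would need to read `n_words` at a steady pace."""
    minutes = n_words / words_per_minute
    hours = minutes / 60
    return hours / (hours_per_day * days_per_year)


print(round(human_reading_years(1_000_000_000), 1))  # 41.7
```

That is more than four decades of full-time reading for one corpus, with no time left for interpretation.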
WHY “MASS SURVEILLANCE” AND “DISTANT
READING”? (CONT.)
Together, “distant” and “close” reading expand the human power to read, interpret, and
learn. Sometimes, these complementary efforts help solve very human challenges.
  - Computational distant reading does not “displace” people or what they bring to
    research and analysis. Often, the findings from each diverge, yielding different
    insights reached in different ways.
74
NVIVO 11 PLUS
75
ABOUT NVIVO 11 PLUS
Supports unstructured, semi-structured, and structured data (using SQL as the underlying
structure on Windows)
Enables analysis of any text that can be represented in UTF-8 (a Unicode encoding) but
requires setting a main base language
  - Enables exact matches, stemmed words, synonyms, specializations, and generalizations
  - Enables the use of special characters and Boolean operators
Enables the building of an exportable code dictionary
Enables topic modeling, sentiment analysis, and “coding by existing pattern”
Enables “distant reading” and interactive data visualizations including word trees,
dendrograms, treemaps, cluster diagrams, and others
76
LIWC2015
77
ABOUT LIWC2015
Has a built-in linguistic analysis dictionary which has been built up over decades of
refinement and empirical research
Summarizes datasets on four scores: Analytic, Clout, Authentic, and Tone
Includes psychological and socio-psychological elements
Includes sentiment and emotional analysis features
Includes gender reference counts
Includes human drives counts
Includes generic linguistic analysis counts (including for function words)
78
ABOUT LIWC2015 (CONT.)
Is backed by decades of solid research
Is a very well documented tool
Is set up as a processor and a dictionary
Enables the building of custom dictionaries to run against textual datasets to surface
more specialized insights
79
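A custom dictionary maps category names to word lists, and LIWC-style dictionaries allow a trailing asterisk to match any word beginning with a stem. The sketch below mimics that idea in plain Python and reports each category as a percentage of total words, similar in spirit to LIWC output. The category names and entries are hypothetical, and this is not LIWC2015's dictionary file format or matching engine.

```python
import re

# Hypothetical custom dictionary: category -> entries. A trailing "*" matches any
# word that starts with the stem (e.g., "surveil*" matches "surveillance").
CUSTOM_DICTIONARY = {
    "surveillance": ["surveil*", "monitor*", "wiretap*", "metadata"],
    "privacy": ["privacy", "private", "encrypt*", "anonym*"],
}


def matches(word: str, entry: str) -> bool:
    """True if `word` equals `entry`, or starts with its stem when `entry` ends in '*'."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text: str, dictionary: dict[str, list[str]]) -> dict[str, float]:
    """Percentage of all words in `text` that match each category (a word may count in several)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {category: 0.0 for category in dictionary}
    return {
        category: 100 * sum(any(matches(w, e) for e in entries) for w in words) / len(words)
        for category, entries in dictionary.items()
    }


text = "Agencies surveil metadata while users encrypt private messages."
print(category_percentages(text, CUSTOM_DICTIONARY))
# {'surveillance': 25.0, 'privacy': 25.0}
```

Reporting percentages rather than raw counts lets text sets of very different lengths, such as Tweets and academic articles, be compared on the same scale.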
ABOUT LIWC2015 (CONT.)
Requires in-depth reading of the related documentation
  - The Development and Psychometric Properties of LIWC2015
  - Linguistic Inquiry and Word Count: LIWC2015
Requires reading years of research for the smoothest research applications
Requires experience in Excel, since the data are output as .xls or .xlsx spreadsheets
  - There is no proprietary file format for saving an analysis in LIWC2015
80
CONTACT AND CONCLUSION
Dr. Shalin Hai-Jew
  - Instructional Designer
  - Kansas State University
  - 785-532-5262
  - shalin@k-state.edu
“Distant reading” is a term coined by Franco Moretti (co-founder of the Stanford
Literary Lab) in 2000.
This slideshow is based on a research-based chapter forthcoming in 2017.
81

Weitere ähnliche Inhalte

Was ist angesagt?

ICDMWorkshopProposal.doc
ICDMWorkshopProposal.docICDMWorkshopProposal.doc
ICDMWorkshopProposal.doc
butest
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)
9866825059
 
Semantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including AstrophysicsSemantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including Astrophysics
Artificial Intelligence Institute at UofSC
 
Tdm information retrieval
Tdm information retrievalTdm information retrieval
Tdm information retrieval
KU Leuven
 
INFORMATION RETRIEVAL Anandraj.L
INFORMATION RETRIEVAL Anandraj.LINFORMATION RETRIEVAL Anandraj.L
INFORMATION RETRIEVAL Anandraj.L
anujessy
 
Functions of information retrival system(1)
Functions of information retrival system(1)Functions of information retrival system(1)
Functions of information retrival system(1)
silambu111
 
香港六合彩
香港六合彩香港六合彩
香港六合彩
wejia
 

Was ist angesagt? (20)

Information retrieval system
Information retrieval systemInformation retrieval system
Information retrieval system
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information Retrieval
 
ICDMWorkshopProposal.doc
ICDMWorkshopProposal.docICDMWorkshopProposal.doc
ICDMWorkshopProposal.doc
 
Hci
HciHci
Hci
 
Ir 01
Ir   01Ir   01
Ir 01
 
An Introduction to Information Retrieval and Applications
 An Introduction to Information Retrieval and Applications An Introduction to Information Retrieval and Applications
An Introduction to Information Retrieval and Applications
 
Week12
Week12Week12
Week12
 
Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introduction
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information Retrieval
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)
 
INFORMATION RETRIEVAL ‎AND DISSEMINATION
INFORMATION RETRIEVAL ‎AND DISSEMINATIONINFORMATION RETRIEVAL ‎AND DISSEMINATION
INFORMATION RETRIEVAL ‎AND DISSEMINATION
 
Semantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including AstrophysicsSemantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including Astrophysics
 
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalKeystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
 
Tdm information retrieval
Tdm information retrievalTdm information retrieval
Tdm information retrieval
 
Research on ontology based information retrieval techniques
Research on ontology based information retrieval techniquesResearch on ontology based information retrieval techniques
Research on ontology based information retrieval techniques
 
INFORMATION RETRIEVAL Anandraj.L
INFORMATION RETRIEVAL Anandraj.LINFORMATION RETRIEVAL Anandraj.L
INFORMATION RETRIEVAL Anandraj.L
 
Functions of information retrival system(1)
Functions of information retrival system(1)Functions of information retrival system(1)
Functions of information retrival system(1)
 
香港六合彩
香港六合彩香港六合彩
香港六合彩
 

Andere mochten auch

Using Qualtrics to Create Automated Online Trainings
Using Qualtrics to Create Automated Online TrainingsUsing Qualtrics to Create Automated Online Trainings
Using Qualtrics to Create Automated Online Trainings
Shalin Hai-Jew
 
An introduction to inbound marketing analytics
An introduction to inbound marketing analyticsAn introduction to inbound marketing analytics
An introduction to inbound marketing analytics
Nuno Fraga Coelho
 

Andere mochten auch (20)

See Ya! Creating a Custom Spatial-Based Linguistic Analysis Dictionary from ...
See Ya!  Creating a Custom Spatial-Based Linguistic Analysis Dictionary from ...See Ya!  Creating a Custom Spatial-Based Linguistic Analysis Dictionary from ...
See Ya! Creating a Custom Spatial-Based Linguistic Analysis Dictionary from ...
 
Formations & Deformations of Social Network Graphs
Formations & Deformations of Social Network GraphsFormations & Deformations of Social Network Graphs
Formations & Deformations of Social Network Graphs
 
Designing Online Learning to Actual Human Capabilities
Designing Online Learning to Actual Human CapabilitiesDesigning Online Learning to Actual Human Capabilities
Designing Online Learning to Actual Human Capabilities
 
LIWC-ing at Texts for Insights from Linguistic Patterns
LIWC-ing at Texts for Insights from Linguistic PatternsLIWC-ing at Texts for Insights from Linguistic Patterns
LIWC-ing at Texts for Insights from Linguistic Patterns
 
Beauty as a Bridge to NodeXL
Beauty as a Bridge to NodeXLBeauty as a Bridge to NodeXL
Beauty as a Bridge to NodeXL
 
Sentiment Analysis with NVivo 11 Plus
Sentiment Analysis with NVivo 11 PlusSentiment Analysis with NVivo 11 Plus
Sentiment Analysis with NVivo 11 Plus
 
Eavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging SiteEavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging Site
 
Fully Exploiting Qualitative and Mixed Methods Data from Online Surveys
Fully Exploiting Qualitative and Mixed Methods Data from Online SurveysFully Exploiting Qualitative and Mixed Methods Data from Online Surveys
Fully Exploiting Qualitative and Mixed Methods Data from Online Surveys
 
Surveillance
SurveillanceSurveillance
Surveillance
 
Letting the Machine Code Qualitative and Mixed Methods Data in NVivo 10
Letting the Machine Code Qualitative and Mixed Methods Data in NVivo 10Letting the Machine Code Qualitative and Mixed Methods Data in NVivo 10
Letting the Machine Code Qualitative and Mixed Methods Data in NVivo 10
 
Understanding Public Sentiment: Conducting a Related-Tags Content Network Ext...
Understanding Public Sentiment: Conducting a Related-Tags Content Network Ext...Understanding Public Sentiment: Conducting a Related-Tags Content Network Ext...
Understanding Public Sentiment: Conducting a Related-Tags Content Network Ext...
 
Using Qualtrics to Create Automated Online Trainings
Using Qualtrics to Create Automated Online TrainingsUsing Qualtrics to Create Automated Online Trainings
Using Qualtrics to Create Automated Online Trainings
 
Hashtag Conversations,Eventgraphs, and User Ego Neighborhoods: Extracting So...
Hashtag Conversations,Eventgraphs, and User Ego Neighborhoods:  Extracting So...Hashtag Conversations,Eventgraphs, and User Ego Neighborhoods:  Extracting So...
Hashtag Conversations,Eventgraphs, and User Ego Neighborhoods: Extracting So...
 
Riff: A Social Network and Collaborative Platform for Public Health Disease S...
Riff: A Social Network and Collaborative Platform for Public Health Disease S...Riff: A Social Network and Collaborative Platform for Public Health Disease S...
Riff: A Social Network and Collaborative Platform for Public Health Disease S...
 
Am I being spied on: Low-tech ways of detecting high-tech surveillance (DEFCO...
Am I being spied on: Low-tech ways of detecting high-tech surveillance (DEFCO...Am I being spied on: Low-tech ways of detecting high-tech surveillance (DEFCO...
Am I being spied on: Low-tech ways of detecting high-tech surveillance (DEFCO...
 
Smart Wireless Surveillance Monitoring using RASPBERRY PI
Smart Wireless Surveillance Monitoring using RASPBERRY PISmart Wireless Surveillance Monitoring using RASPBERRY PI
Smart Wireless Surveillance Monitoring using RASPBERRY PI
 
An introduction to inbound marketing analytics
An introduction to inbound marketing analyticsAn introduction to inbound marketing analytics
An introduction to inbound marketing analytics
 
How to Build Your Inbound Marketing Game Plan - Paul Roetzer
How to Build Your Inbound Marketing Game Plan - Paul RoetzerHow to Build Your Inbound Marketing Game Plan - Paul Roetzer
How to Build Your Inbound Marketing Game Plan - Paul Roetzer
 
Expert Perceptions of the Feasibility of MOOCs
Expert Perceptions of the Feasibility of MOOCsExpert Perceptions of the Feasibility of MOOCs
Expert Perceptions of the Feasibility of MOOCs
 
Native Emigration from the U.S. and Renunciation of U.S. Citizenship
Native Emigration from the U.S. and Renunciation of U.S. Citizenship Native Emigration from the U.S. and Renunciation of U.S. Citizenship
Native Emigration from the U.S. and Renunciation of U.S. Citizenship
 

Ähnlich wie "Mass Surveillance" through Distant Reading

ESSIR 2013 - IR and Social Media
ESSIR 2013 - IR and Social MediaESSIR 2013 - IR and Social Media
ESSIR 2013 - IR and Social Media
Arjen de Vries
 
Data-Mining Twitter for Political Science -Hickman, Alfredo - Honors Thesis
Data-Mining Twitter for Political Science -Hickman, Alfredo - Honors ThesisData-Mining Twitter for Political Science -Hickman, Alfredo - Honors Thesis
Data-Mining Twitter for Political Science -Hickman, Alfredo - Honors Thesis
Alfredo Hickman
 

Ähnlich wie "Mass Surveillance" through Distant Reading (20)

Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
 
Toward a news data science
Toward a news data scienceToward a news data science
Toward a news data science
 
Research Interests : Their Dynamics, Structures and Applications in Personali...
Research Interests : Their Dynamics, Structures and Applications in Personali...Research Interests : Their Dynamics, Structures and Applications in Personali...
Research Interests : Their Dynamics, Structures and Applications in Personali...
 
Data Science & Analytics (light overview)
Data Science & Analytics (light overview) Data Science & Analytics (light overview)
Data Science & Analytics (light overview)
 
Topic models, vector semantics and applications
Topic models, vector semantics and applicationsTopic models, vector semantics and applications
Topic models, vector semantics and applications
 
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
 
Strategic perspectives 3
Strategic perspectives 3Strategic perspectives 3
Strategic perspectives 3
 
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
 
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
 
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
ANALYSIS OF TOPIC MODELING WITH UNPOOLED AND POOLED TWEETS AND EXPLORATION OF...
 
AI in between online and offline discourse - and what has ChatGPT to do with ...
AI in between online and offline discourse - and what has ChatGPT to do with ...AI in between online and offline discourse - and what has ChatGPT to do with ...
AI in between online and offline discourse - and what has ChatGPT to do with ...
 
s00146-014-0549-4.pdf
s00146-014-0549-4.pdfs00146-014-0549-4.pdf
s00146-014-0549-4.pdf
 
Argument Structures Of Political Debates
Argument Structures Of Political DebatesArgument Structures Of Political Debates
Argument Structures Of Political Debates
 
Enhancing Soft Power: using cyberspace to enhance Soft Power
Enhancing Soft Power: using cyberspace to enhance Soft PowerEnhancing Soft Power: using cyberspace to enhance Soft Power
Enhancing Soft Power: using cyberspace to enhance Soft Power
 
Visual Analysis of Concept Change and Information Diffusion
Visual Analysis of Concept Change and Information DiffusionVisual Analysis of Concept Change and Information Diffusion
Visual Analysis of Concept Change and Information Diffusion
 
Trend Analysis
Trend AnalysisTrend Analysis
Trend Analysis
 
#ICCSS2015 - Computational Human Security Analytics using "Big Data"
#ICCSS2015 - Computational Human Security Analytics using "Big Data"#ICCSS2015 - Computational Human Security Analytics using "Big Data"
#ICCSS2015 - Computational Human Security Analytics using "Big Data"
 
The End(s) of e-Research
The End(s) of e-ResearchThe End(s) of e-Research
The End(s) of e-Research
 
ESSIR 2013 - IR and Social Media
ESSIR 2013 - IR and Social MediaESSIR 2013 - IR and Social Media
ESSIR 2013 - IR and Social Media
 
Data-Mining Twitter for Political Science -Hickman, Alfredo - Honors Thesis
Data-Mining Twitter for Political Science -Hickman, Alfredo - Honors ThesisData-Mining Twitter for Political Science -Hickman, Alfredo - Honors Thesis
Data-Mining Twitter for Political Science -Hickman, Alfredo - Honors Thesis
 


"Mass Surveillance" through Distant Reading

  • 1. “MASS SURVEILLANCE” THROUGH DISTANT READING Shalin Hai-Jew • Aesthesia • March 2, 2017 • Marianna Kistler Beach Museum of Art • Kansas State University
  • 2. OVERVIEW Distant reading refers to the use of computers to “read” texts by counting words, identifying themes and subthemes (through topic modeling), extracting sentiment, applying psychological analysis to the author(s), and otherwise surfacing latent or hidden insights. This work draws on research into “mass surveillance” using five text sets: academic, mainstream journalism, microblogging, Wikipedia articles, and leaked government data. The purpose was to capture, indirectly, some insights about the collective social discussions occurring around this issue. This presentation uses a variety of data visualizations (article network graphs, word trees, dendrograms, treemaps, cluster diagrams, line graphs, bar charts, pie charts, and others) to show how machines read and the types of summary data they enable (at computational speeds, at machine scale, and in a reproducible way). Also, some computational linguistic analysis tools enable the creation of custom dictionaries for unique types of applied research. The tools used in this presentation include NVivo 11 Plus and LIWC2015.
  • 3. SOME COMMON TYPES OF “DISTANT READING” AND APPLICATIONS Linguistic analysis; Topic modeling (theme and subtheme extraction); Sentiment analysis (positive and negative); Text networks (word relationships); Authorship analysis based on latent features (stylometry “fingerprinting,” author gender identification); Psychological analysis; Cultural analysis, culturomics; History-based applications; Literary analysis (dialogue analysis, geographical referencing and patterning, character analysis); Predictive analytics (classification, trend)
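The sentiment-analysis entry above can be illustrated with a minimal lexicon-based sketch: count the words that appear in a positive word list and in a negative word list. The word lists, function names, and sample sentence are hypothetical; NVivo 11 Plus and LIWC2015 use their own, far larger dictionaries and scoring rules.

```python
import re

# Hypothetical toy lexicon for illustration only; it is NOT the dictionary
# used by NVivo 11 Plus or LIWC2015, which ship much larger curated lexicons.
POSITIVE = {"protect", "safe", "secure", "trust", "support"}
NEGATIVE = {"intrusion", "abuse", "fear", "threat", "violate"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_counts(text: str) -> dict[str, int]:
    """Count positive and negative lexicon hits and their difference."""
    tokens = tokenize(text)
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return {"positive": pos, "negative": neg, "net": pos - neg}


if __name__ == "__main__":
    sample = ("Critics fear the program will violate privacy; "
              "officials say it keeps citizens safe.")
    print(sentiment_counts(sample))  # {'positive': 1, 'negative': 2, 'net': -1}
```

Simple word-list counting ignores negation and context ("not safe" still counts as positive), which is one reason different tools can disagree on the same text.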
  • 4. STUDIED PHENOMENA IN THE COMPUTATIONAL LINGUISTIC ANALYSIS RESEARCH LITERATURE Political science, leader speech analysis (for profiling); State-of-a-field research; Authorship identification; Plagiarism detection; Suicidality; Movie popularity, song popularity; Language studies; Law enforcement; Fraud detection; Threat detection; and others
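For plagiarism detection, one widely used generic technique (not necessarily the one used in the studies this slide refers to) compares overlapping word n-grams, or "shingles," between two texts using Jaccard similarity. This sketch assumes whitespace tokenization and three-word shingles; both choices are illustrative.

```python
def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word n-grams ("shingles") in a lowercased text."""
    words = text.lower().split()
    # range() is empty when the text has fewer than n words.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    s1 = shingles("the cat sat on the mat")
    s2 = shingles("the cat sat on a mat")
    print(round(jaccard(s1, s2), 3))  # 0.333
```

Scores near 1.0 suggest heavy verbatim overlap; in practice a threshold would be tuned on known cases before flagging anything.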
  • 5. WHY DISTANT READING? Textual interpretation at computational speeds and at computational scale; Reproducible, repeatable; Measures various analytical constructs in quantified ways; Surfaces latent (hidden) ideas and data patterns not otherwise visible (for example, through human “close reading”); Results comparable against large textual datasets of particular types of text (such as comparing a Tweetstream against other social media texts or, more narrowly, against other microblogging texts); Complementary to and augmentative of human “close reading”
  • 6. COMMON ANALYTICAL TRAJECTORIES Curation of text sets (corpora) -> distant reading data summaries -> zoomed-in analysis (of concepts, names, dates, locations, symbols, numbers, etc.) -> human close reading (a general-to-specific trajectory); Baseline text set statistics based on curated text collections and text corpora; Comparisons across text sets (relative data)
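Comparisons across text sets of very different sizes (a short Tweetstream versus long academic articles, for instance) usually rely on relative rather than raw counts. A minimal sketch, assuming a simple letters-and-apostrophes tokenizer and a per-1,000-token normalization; the function name and sample text are hypothetical.

```python
import re
from collections import Counter


def per_thousand(text: str, terms: list[str]) -> dict[str, float]:
    """Occurrences of each term per 1,000 tokens, so text sets of different sizes compare."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)  # missing terms count as 0
    total = len(tokens)
    return {t: (1000 * counts[t] / total if total else 0.0) for t in terms}


if __name__ == "__main__":
    # Hypothetical snippet standing in for a text set.
    print(per_thousand("privacy privacy power state", ["privacy", "power", "court"]))
    # {'privacy': 500.0, 'power': 250.0, 'court': 0.0}
```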
  • 7. “MASS SURVEILLANCE” AS A SEEDING TOPIC
  • 8. WHY “MASS SURVEILLANCE”? A timely construct; A point of global discussion; A mixed group of competing stakeholders around the issue; Wide public availability of five (somewhat) disparate text sets: academic, mainstream journalism, microblogging, Wikipedia articles, and leaked government data
  • 9.
  • 23. Combined bar and line chart
  • 30. Readability data table
      Text set | Gunning Fog Index | Coleman-Liau Index | Flesch-Kincaid Grade Level | ARI (Automated Readability Index) | SMOG Readability Formula | Flesch Reading Ease (/100)
      Set 1: Academic article text set (partial) | 13.20 | 11.71 | 10.71 | 9.29 | 12.80 | 43.26
      Set 2: Mainstream journalistic text set | 14.28 | 13.88 | 12.12 | 12.40 | 13.75 | 39.25
      Set 3: Twitter microblogging hashtag discourse text set | 28.88 | 32.36 | 24.40 | 29.73 | 21.75 | -38.46 (on a 100-point scale)
      Set 4: Wikipedia article network text set (partial) | 11.09 | 12.25 | 9.46 | 8.31 | 11.07 | 44.39
      Set 5: Leaked U.S. government text set (partial) | 14.65 | 12.45 | 12.29 | 10.89 | 13.97 | 36.44
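The readability indices in this table are formula-based. This sketch implements three of the standard published formulas (Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog) from pre-computed counts. How words, sentences, syllables, and "complex" words are counted is left to the caller, because those counting rules differ between tools and are one likely source of score differences. The counts in the example are hypothetical.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease; higher is easier, and the score can fall below 0."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (approximate U.S. school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index; 'complex' words have three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


if __name__ == "__main__":
    # Hypothetical counts for a prose passage: 100 words, 5 sentences, 150 syllables.
    print(round(flesch_reading_ease(100, 5, 150), 3))   # 59.635
    print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
    # Hypothetical tweet-like unit: 60 "words" with no sentence break, 2 syllables/word.
    print(round(flesch_reading_ease(60, 1, 120), 3))    # -23.265
```

The tweet-like case shows how a long unit with no sentence break and many polysyllabic tokens (URLs, hashtags) can push Flesch Reading Ease below zero, which is one plausible reading of the -38.46 score for the hashtag discourse set.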
  • 31. Final Full Set Academic Themes and Subthemes Treemap (treemap diagram)
  • 32. Final Full Set Mainstream Journalistic Themes and Subthemes Treemap (treemap diagram)
  • 33. Final Full Set #surveillance Microblogging Themes and Subthemes Treemap (treemap diagram)
  • 35. Manually Coded #surveillance Hashtag Network on Twitter (treemap diagram)
  • 36. Final Full Set of Mass-Surveillance Article Network from Wikipedia Themes and Subthemes Treemap (treemap diagram)
  • 37. Final Full Set Leaked Government Documents Themes and Subthemes Treemap (treemap diagram)
  • 49. Article-article network from Wikipedia, built with NodeXL (“Network Overview, Discovery and Exploration for Excel”) (article network graph)
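An article-article network like the one on this slide can be summarized by node degree: how many other articles each article is linked with. This sketch treats links as undirected, counts duplicate links once, ignores self-links, and uses hypothetical links; it is not how NodeXL computes its metrics internally.

```python
from collections import Counter


def degree_counts(edges: list[tuple[str, str]]) -> Counter:
    """Degree of each article in an undirected link network.

    Duplicate links (in either direction) are counted once; self-links are ignored.
    """
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    degrees: Counter = Counter()
    for edge in unique:
        for node in edge:
            degrees[node] += 1
    return degrees


if __name__ == "__main__":
    # Hypothetical links for illustration; not the actual network from the slides.
    links = [
        ("Mass surveillance", "PRISM"),
        ("Mass surveillance", "Edward Snowden"),
        ("Mass surveillance", "Five Eyes"),
        ("PRISM", "Edward Snowden"),
        ("PRISM", "Mass surveillance"),  # reverse duplicate, counted once
    ]
    print(degree_counts(links)["Mass surveillance"])  # 3
```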
  • 56. From the leaked government dataset: long tail analysis (data table)
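The slide does not state how the long-tail analysis was computed. One common way to characterize a long tail, sketched here under that assumption, is to measure how much of a text set the most frequent terms cover and how many terms occur only once (hapax legomena). The function name and token list are hypothetical.

```python
from collections import Counter


def long_tail_summary(tokens: list[str], k: int) -> dict:
    """Share of tokens covered by the k most frequent terms, plus the number of
    hapax legomena (terms occurring exactly once), which populate the long tail."""
    counts = Counter(tokens)
    total = sum(counts.values())
    head = sum(c for _, c in counts.most_common(k))
    return {
        "head_share": head / total if total else 0.0,
        "hapax_terms": sum(1 for c in counts.values() if c == 1),
        "distinct_terms": len(counts),
    }


if __name__ == "__main__":
    tokens = ["node"] * 5 + ["onion"] * 3 + ["target", "event"]
    print(long_tail_summary(tokens, k=2))
    # {'head_share': 0.8, 'hapax_terms': 2, 'distinct_terms': 4}
```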
  • 57. From the leaked government dataset: coding nodes (interactive 3D cluster diagram)
  • 61. An Article Histogram of a Leaked Government Document (article histogram with main theme extractions): number of mentions of auto-extracted top-level themes, A: content, B: dissemination, C: front door, D: hidden service, E: information, F: jflftflvjff dissemination, G: node, H: onion, I: r dissemination
  • 62. A Theme Histogram from a Government Document (article histogram with main theme extractions): counts of mentions of auto-extracted top-level themes, A: event, B: facebook, C: msn, D: notification, E: sources, F: target
  • 63. ABOUT THE SEEDING TOPIC: “MASS SURVEILLANCE”
  • 64. CONTRIBUTIONS TO THE “MASS SURVEILLANCE” TOPIC Academic writing: legal, philosophical, technological, and practical implications; Mainstream journalistic articles: domestic and foreign government engagement with the issue (executive, legislative, judicial, and others); Microblogging messages: global surveillance challenges, changing technologies (drones); Wikipedia (open-source and crowdsourced encyclopedia): summary details, highlighted events, personages, URLs, and timely observations; Government documents: bureaucratese, technical capabilities
  • 65. ABOUT THE RELATED TEXT SETS…FROM DISTANT READING Different genres of writing on a particular topic manifest differently on different textual dimensions. Some textual features seem to co-vary, possibly because they are features of prose writing or because of other factors. Analyzing different features of the text sets may help identify the source types most useful for certain types of research or questions. Social media “netspeak” is not yet fully captured by the two commercial tools used for this analysis. Average word counts per unit differed: academic articles (7,624–8,073 words per unit), mainstream journalistic articles (1,460–1,488 words per unit), microblogging hashtag discourse (44–61 words per user account), Wikipedia articles (6,710–7,216 words per article), and leaked government documents (1,711–1,800 words per unit). The variance in word counts stems from the use of different software programs to do the counts…and from natural ambiguity in word identification.
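The point about word-count variance can be shown directly: the same sentence yields different counts under different, equally defensible tokenization rules. The three rules below are hypothetical stand-ins, not the rules used by NVivo 11 Plus or LIWC2015.

```python
import re

# Three plausible (hypothetical) word-counting rules; real tools each make their own choices.
RULES = {
    "whitespace": lambda s: s.split(),                        # "(2013)" and "new." count as words
    "letters_only": lambda s: re.findall(r"[A-Za-z]+", s),    # splits "isn't" into "isn", "t"
    "keep_apostrophes_hyphens": lambda s: re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", s),
}


def word_counts(text: str) -> dict[str, int]:
    """Count words in `text` under each tokenization rule."""
    return {name: len(rule(text)) for name, rule in RULES.items()}


if __name__ == "__main__":
    sample = "The NSA's e-mail metadata program (2013) isn't new."
    print(word_counts(sample))
    # {'whitespace': 8, 'letters_only': 10, 'keep_apostrophes_hyphens': 7}
```

Netspeak (hashtags, @-mentions, URLs, emoticons) widens these gaps further, which fits the observation that the microblogging set was the hardest for the tools to handle.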
  • 66. ABOUT THE RELATED TEXT SETS…FROM DISTANT READING (CONT.) Computational analysis of the five text sets showed a spike in one human drive across all sets: “power.” Because this held across all five text sets, “power” may be a driving concern regarding “mass surveillance.” According to analysis in NVivo 11 Plus, sentiment was most present in the following (in descending order): Wikipedia articles, academic articles, leaked government documents, mainstream journalism, and hashtag discourse. A different order was found using LIWC2015 (in descending order): mainstream journalism, Wikipedia articles, academic articles, leaked government documents, and hashtag discourse. The only rank position on which the two tools agreed was hashtag discourse in last place, with the least sentiment, which can be partially explained by the brevity of Tweets and the expression of emotion through emoticons and punctuation marks.
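The disagreement between the two sentiment rankings on this slide can be quantified. This sketch uses the orderings as reported above, finds the positions on which the tools agree, and computes Spearman's rank correlation with the no-ties formula rho = 1 - 6·Σd² / (n(n² - 1)). Function and variable names are illustrative.

```python
# Sentiment rankings (1 = most sentiment) as reported on this slide.
NVIVO_ORDER = ["Wikipedia", "academic", "leaked government",
               "mainstream journalism", "hashtag discourse"]
LIWC_ORDER = ["mainstream journalism", "Wikipedia", "academic",
              "leaked government", "hashtag discourse"]


def to_ranks(order: list[str]) -> dict[str, int]:
    """Map each item to its 1-based position in a descending ranking."""
    return {name: i + 1 for i, name in enumerate(order)}


def spearman_rho(a: dict[str, int], b: dict[str, int]) -> float:
    """Spearman's rank correlation for two untied rankings of the same items."""
    n = len(a)
    d_squared = sum((a[k] - b[k]) ** 2 for k in a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


if __name__ == "__main__":
    nvivo, liwc = to_ranks(NVIVO_ORDER), to_ranks(LIWC_ORDER)
    agree = [k for k in nvivo if nvivo[k] == liwc[k]]
    print(agree)                                # ['hashtag discourse']
    print(round(spearman_rho(nvivo, liwc), 2))  # 0.4
```

Here Σd² = 1 + 1 + 1 + 9 = 12, almost all of it from mainstream journalism moving from fourth to first, so the two tools agree only loosely overall.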
  • 67. ABOUT THE RELATED TEXT SETS…BASED IN PART ON SELECTED CLOSE READING All five text sets (academic, mainstream journalistic, microblogging messages, Wikipedia articles, and the government documents) were informed by the source government documents. The journalistic articles, with a rights narrative of deep intrusions into privacy, seem to have captured readers’ attention, while the academic and government documents were not consumed as broadly. Journalistic articles ranked high on sociality measures, which may indicate why people see them as connecting with their lives. Twitter was used to advertise writings from academia and mainstream journalism. Some academic publications cited mainstream journalistic pieces, but fewer journalistic pieces cited academic works.
  • 68. ABOUT THE RELATED TEXT SETS…BASED IN PART ON SELECTED CLOSE READING (CONT.) Academia did not have many pieces on this issue in the subscription databases and other sources that were checked. It may be that more time has to pass for researchers to study the issues. The technological complexity of the government documents required technology, legal, and policy experts to interpret them. These documents were generally handled in a non-consumptive way for computational linguistic analysis. Non-consumptiveness refers to the extraction of statistical features of a text set without direct access to the underlying texts. For this analysis, the focus was on computational reading of the related documents, not on a human interpretation of the text set or the related capabilities.
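Non-consumptive analysis, as defined on this slide, passes only statistical features onward rather than the texts themselves. A minimal sketch of that idea, assuming the chosen features (word count, distinct words, type-token ratio, top terms) are acceptable to release; the function name is hypothetical, and deciding which features are safe to expose (even top terms reveal some vocabulary) is a policy choice the code does not make.

```python
import re
from collections import Counter


def extract_features(text: str, top_n: int = 5) -> dict:
    """Return only aggregate statistics about `text`; the text itself is not returned."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {
        "word_count": len(tokens),
        "distinct_words": len(counts),
        "type_token_ratio": len(counts) / len(tokens) if tokens else 0.0,
        "top_terms": counts.most_common(top_n),
    }


if __name__ == "__main__":
    print(extract_features("Node node onion node service", top_n=1))
    # {'word_count': 5, 'distinct_words': 3, 'type_token_ratio': 0.6, 'top_terms': [('node', 3)]}
```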
  • 69. ABOUT USING COMPUTATIONAL LINGUISTIC ANALYSIS TO “READ” UP ON AN ISSUE Selected text sets should be as comprehensive as possible in order to represent the topic. The text sets should be cleaned so that irrelevant elements are eliminated. There should be clear documentation of how the data were collected, processed, and handled: how the text sets are handled affects the results, and the bundling of particular text sets affects results as well. Because social media attracts only some people to participate, there can be large gaps in informational coverage. Social media platform APIs are often rate- and data-limited, so it is important to review the terms of access to such data. Using multiple software tools to conduct analysis makes sense, because differences in tool design affect what is or is not observed. The “validity” and “reliability” of software tools vary…
  • 70. ABOUT USING COMPUTATIONAL LINGUISTIC ANALYSIS TO “READ” UP ON AN ISSUE (CONT.) How the researcher asks questions and wields the technology will affect what can be seen and what is seen. There is no “objective” reading machine… Subjectivity and judgment play a role. External validation may be an important part of research using computational reading. The data visualizations here are mostly interactive, and it is possible to link to the original underlying data. All the data visualizations are informed by underlying data, which should be accessed for deeper understanding. These interactive features and underlying data should be engaged to benefit fully from the computational analyses. (Data visualizations are not used independently of the underlying data.) “Non-consumptive” text analysis can sometimes be helpful even without the benefit of close reading and examination of the underlying text corpora used for the computational analysis.
  • 71. ABOUT USING COMPUTATIONAL LINGUISTIC ANALYSIS TO “READ” UP ON AN ISSUE (CONT.) Close reading is always a part of the work, even when distant reading is brought to bear. Each enhances the other, and there are many productive ways to sequence the two. What a human reader “sees” differs from what a computer “sees.”
  • 72. SOME POSSIBLE EFFECTS OF THE RESEARCH Different genres of texts may reach different parts of a population. Those who limit themselves to particular genres will capture only some aspects of the information about a topic. Those engaged in strategic communications would benefit from a sense of which communication modes to use to reach their target audience. It helps to know what issues are trending at any particular time…and the collective emotions being expressed. It helps to target limited human close reading attention strategically, based on observations from distant reading.
  • 73. WHY “MASS SURVEILLANCE” AND “DISTANT READING”? There is an elision of mass surveillance and distant reading…in this slideshow…in part because technologies enable both “mass surveillance” and dataveillance (a portmanteau of data + surveillance). Practically speaking, human close reading would be wholly insufficient for interacting with mass data. There are not enough human years to plough through the masses of structured and unstructured data being created today. For complex data, human close reading requires close and slow attention (200 wpm, or words per minute). Human close reading is not known for great objective accuracy; rather, it is informed by a trained, subjective lens. Human reading is known for a unique perspective and voice.
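The 200 wpm figure makes the scale argument easy to check. For a hypothetical corpus of one billion words: 1,000,000,000 / 200 = 5,000,000 minutes, or about 83,333 hours, which at an assumed 2,000 reading hours per year (roughly 40 hours a week for 50 weeks) is about 41.7 years of full-time reading for one person. The same arithmetic as a small function (the name and the hours-per-year default are assumptions):

```python
def human_reading_years(words: float, wpm: float = 200, hours_per_year: float = 2000) -> float:
    """Years of full-time reading needed to read `words` at `wpm` words per minute.

    `hours_per_year` (default 2,000, roughly 40 hours x 50 weeks) is an assumption.
    """
    minutes = words / wpm
    return minutes / 60 / hours_per_year


if __name__ == "__main__":
    # Hypothetical corpus size of one billion words.
    print(round(human_reading_years(1_000_000_000), 1))  # 41.7
```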
  • 74. WHY “MASS SURVEILLANCE” AND “DISTANT READING”? (CONT.) Together, “distant” and “close” reading expand the human power to read, interpret, and learn. Sometimes these complementary efforts help solve very human challenges. Computational distant reading does not “displace” people or what they bring to research and analysis. Often the findings from each diverge, yielding different insights attained in different ways.
  • 76. ABOUT NVIVO 11 PLUS Enables the building of projects from unstructured, semi-structured, and structured data (with SQL as the underlying structure on Windows); Enables analysis of any data represented in UTF-8 (a Unicode encoding) but requires a main base language; Enables exact matches, stemmed words, synonyms, specializations, and generalizations; Enables the use of special characters and Boolean terms; Enables the building of an exportable code dictionary; Enables topic modeling, sentiment analysis, and “coding by existing pattern”; Enables “distant reading” and interactive data visualizations including word trees, dendrograms, treemaps, cluster diagrams, and others
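To illustrate the difference between the exact and stemmed matching listed on this slide, here is a deliberately crude suffix-stripping sketch. The suffix list and function names are hypothetical, and this is not NVivo's stemming algorithm; production stemmers (such as Porter-style stemmers) use far more careful rules.

```python
# Hypothetical, deliberately crude suffix stripper for illustration; it is not
# NVivo's stemming algorithm. Longer suffixes are tried first.
SUFFIXES = ("ances", "ance", "ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def exact_matches(tokens: list[str], query: str) -> list[str]:
    """Tokens identical to the query (case-insensitive)."""
    return [t for t in tokens if t.lower() == query.lower()]


def stemmed_matches(tokens: list[str], query: str) -> list[str]:
    """Tokens that share the query's crude stem."""
    target = crude_stem(query)
    return [t for t in tokens if crude_stem(t) == target]


if __name__ == "__main__":
    words = ["surveillance", "surveilled", "surveilling", "privacy", "surveillances"]
    print(exact_matches(words, "surveillance"))         # ['surveillance']
    print(len(stemmed_matches(words, "surveillance")))  # 4
```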
  • 78. ABOUT LIWC2015 PLUS Has a built-in linguistic analysis dictionary refined over decades of empirical research; Summarizes datasets on four summary scores: Analytic, Clout, Authentic, and Tone; Includes psychological and socio-psychological elements; Includes sentiment and emotional analysis features; Includes gender reference counts; Includes human drives counts; Includes generic linguistic analysis counts (including for function words)
  • 79. ABOUT LIWC2015 PLUS (CONT.) Is backstopped by decades of solid research; Is a thoroughly and intelligently documented tool; Is set up as a processor and a dictionary; Enables the building of custom dictionaries to run against textual datasets to surface more unique insights
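A custom dictionary of the kind LIWC2015 supports maps category names to word lists, where a trailing asterisk matches any word beginning with that stem, and results are reported as the percentage of all words falling into each category. This sketch imitates that idea in plain Python; the categories and entries are hypothetical, and it neither reads nor writes LIWC's own dictionary file format.

```python
import re

# Hypothetical custom dictionary. A trailing "*" is a prefix wildcard, in the
# spirit of LIWC dictionary entries; categories and words here are made up.
CUSTOM_DICT = {
    "monitoring": ["surveil*", "monitor*", "track*", "watch"],
    "privacy": ["privacy", "private*", "encrypt*"],
}


def entry_matches(token: str, entry: str) -> bool:
    """True if `token` equals `entry`, or starts with it when `entry` ends in '*'."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_percentages(text: str, dictionary: dict[str, list[str]]) -> dict[str, float]:
    """Percentage of all tokens that match at least one entry in each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    result = {}
    for category, entries in dictionary.items():
        hits = sum(1 for t in tokens if any(entry_matches(t, e) for e in entries))
        result[category] = 100 * hits / total if total else 0.0
    return result


if __name__ == "__main__":
    sample = "Agencies surveil and track users who encrypt their private messages"
    print(category_percentages(sample, CUSTOM_DICT))
    # {'monitoring': 20.0, 'privacy': 20.0}
```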
  • 80. ABOUT LIWC2015 PLUS (CONT.) Requires some in-depth reading of the related documentation (The Development and Psychometric Properties of LIWC2015; Linguistic Inquiry and Word Count: LIWC2015); Requires reading years of research for the smoothest research applications; Requires experience in Excel, since data are output to .xls or .xlsx; There is no proprietary file format for saving an analysis in LIWC2015
  • 81. CONTACT AND CONCLUSION Dr. Shalin Hai-Jew, Instructional Designer, Kansas State University, 785-532-5262, shalin@k-state.edu. “Distant reading” is a term coined by Franco Moretti (co-founder of the Stanford Literary Lab) in 2000. This slideshow is based on a research-based chapter forthcoming in 2017.