1. Creating a “trading zone”
around Twitter archives
Case Study: Paris Attacks
FIAT/IFTA, 13 October 2016
Zeynep Pehlivan (Ina)
Valérie Schafer (ISCC, CNRS/Paris-Sorbonne/UPMC)
3. Main issues
● to document the archiving of the Web and of Twitter during the events
● to question the conditions and possibilities of elaborating corpora
● to bring out the first findings that can emerge from these massive datasets
Archive-It
Charlie Hebdo collection
https://archive-it.org/collections/5190
4. Legal Deposit Web @Ina
● The 2006 law shares Web legal deposit responsibilities between two mandated institutions:
Ina collects sites related to audiovisual communication
(radio, TV channels, blogs, etc.)
BnF collects all other sites
5. Twitter Archive @Ina
● Crawling by using Twitter API (data)
● Since February 2014
● 11 000 users (timelines)
● 400 hashtags
● 400 million tweets
● Recontextualization(s)
6. REST API: timelines
● Total: 50 million
● Average per day: 48 000
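As a rough consistency check on these two figures, the total divided by the daily average gives the number of collection days they imply. The sketch below assumes that collection began on 1 February 2014 (the slides give only the month) and that the total was counted around the date of the talk. Both dates are assumptions, not figures from the archive.

```python
from datetime import date

TOTAL_TWEETS = 50_000_000   # "Total" from the slide
AVG_PER_DAY = 48_000        # "Average per day" from the slide

# Assumed dates: the slides say "since February 2014"; the day is a guess.
ASSUMED_START = date(2014, 2, 1)
ASSUMED_END = date(2016, 10, 13)  # date of the talk

# Number of days the two slide figures imply, if both are exact.
implied_days = TOTAL_TWEETS / AVG_PER_DAY
# Number of days actually elapsed under the assumed dates.
elapsed_days = (ASSUMED_END - ASSUMED_START).days

print(f"days implied by the figures: {implied_days:.0f}")
print(f"days elapsed under the assumed dates: {elapsed_days}")
```

The two values are of the same order, which is what one would expect if both slide figures are rounded.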
8. Restrictions
● Streaming API: 400 hashtags, 5,000 users
● 1% of tweets published at time t
● REST API: at most 3,200 past tweets per user
● Search API: 180 requests per 15-minute window for a user and
450 for an app
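A crawler has to pace its requests to stay within the Search API quota listed above. The following is a minimal sketch of one way to do that, using the 180 and 450 requests per 15-minute window from the slide. The `WindowBudget` class is hypothetical and is not Ina's crawler. It uses a sliding window, a conservative assumption since the slide does not say how the API itself resets its windows.

```python
from collections import deque

WINDOW_SECONDS = 15 * 60   # 15-minute window, as on the slide
USER_LIMIT = 180           # requests per window for a user
APP_LIMIT = 450            # requests per window for an app


class WindowBudget:
    """Hypothetical helper: tells a crawler how long to wait before its next request."""

    def __init__(self, limit, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self.sent = deque()  # timestamps of requests still inside the window

    def wait_time(self, now):
        # Forget requests that have fallen out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        # Budget exhausted: wait until the oldest request leaves the window.
        return self.window - (now - self.sent[0])

    def record(self, now):
        # Call this each time a request is actually sent.
        self.sent.append(now)
```

In use, the crawler would call `wait_time(time.monotonic())` before each request, sleep for the returned number of seconds, then call `record(...)` once the request is sent. Passing the clock value in explicitly keeps the helper easy to test.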
10. Recontextualization
● Canonical version?
● Twitter page?
● Data
● Authenticity, integrity
● Second Screen
● TV Sync
● Indexing
● Search / data mining
● Data coverage
● Generic or specific tools
● Open data?
17. Opening the black boxes of Web archiving
3 interviews with
- Jefferson Bailey & Sylvie Rollason-Cass (Archive-It team) in March 2016
https://asap.hypotheses.org/125
- Annick Le Follic (BnF) on March 21, 2016
https://asap.hypotheses.org/168
- Thomas Drugeon (Ina) on March 21, 2016
https://asap.hypotheses.org/173
18. An emergency collection: why, when, what for…?
Which methods and tools?
Which issues and limits?
Differences between both events?
Openness and closure
Governance, human and material resources
How to collect and document “the now” and “the flow”
19. After observation, manipulation
• Some difficulties: discovering the tools before focusing on specific questions
• Data deluge and dispersion
• Methodological false moves as newbies
21. Some scientific issues
Long-term perspective / Non-French-centric overview (Arquivo.pt, etc.)
A third-generation phenomenon? (cf. Dominique Boullier,
http://shs3g.hypotheses.org/114)
What does a hashtag, a retweet, etc. mean? → boyd, danah, Scott Golder,
and Gilad Lotan. 2010. “Tweet, Tweet, Retweet: Conversational Aspects of
Retweeting on Twitter.” HICSS-43. IEEE: Kauai, HI, January 6.
Close and distant reading (→ F. Moretti)
22. “Although Moretti in the main uses the distant reading approach to the
study of large amounts of digital data, I will argue that none of these two
approaches are per se inscribed or inherent in the digital material. By this I
mean that simply because collections of digital material are in many cases
big data, which opens the possibility of asking and answering new types of
research questions, this does not necessarily mean that they have to be
approached as Big Data”. (p. 11)
“The question is not if scholarly studies within the Humanities want to “go
digital”, but rather how”. (p. 9)
Brügger N., Humanities, Digital Humanities, Media Studies, Internet Studies: An
inaugural Lecture, Aarhus, CFI, 2015.
23. Creating a trading zone: Conclusion
« The metaphor of a trading zone is being applied to collaborations in S &
T. The basis of the metaphor is anthropological studies of how different
cultures are able to exchange goods, despite differences in language and
culture »
https://en.wikipedia.org/wiki/Trading_zones