Referring is such an essential part of scholarly activity across disciplines that John Unsworth (2000) has regarded it as one of the scholarly primitives. There is, however, a kind of citation whose potential has not been fully exploited to date, despite the attention it has recently received within Digital Classics research (Romanello, Boschetti, and Crane 2009; Smith 2010; Romanello 2011). These are called “canonical citations” and are the references commonly used to refer to passages of ancient texts. Given their importance to classicists, Crane et al. (2009) have argued that services for extracting and exploiting them should be part of the Cyberinfrastructure for Classics.
In this paper I discuss the various aspects of making such citations, together with the network of links they create, computable. Firstly, I will present the characteristics of such citations by showing how their semantics can be modeled by means of a formal ontology. Once such an ontology is created and populated, a machine can use it as a surrogate for domain knowledge in order to make inferences about texts and citations.
Secondly, I will illustrate how an expert system that captures canonical citations and their meaning from modern journal papers can be implemented by using Natural Language Processing techniques that are well known in Computer Science. I will then present two resources that were developed for this task and made available under Open Source licenses: 1) a manually corrected, multilingual corpus of approximately 30,000 tokens drawn from L’Année Philologique with annotated Named Entities; 2) a machine learning-based classifier that can be trained with this corpus to extract canonical citations and mentions of ancient authors and works from texts.
Finally, I will show some examples of how the citation network so extracted, consisting of journal papers and the ancient texts they refer to, can be exploited to offer scholars new ways and tools for studying intertextuality.
References
Crane, Gregory, Brent Seales, and Melissa Terras. 2009. “Cyberinfrastructure for Classical Philology.” Digital Humanities Quarterly 3.
Romanello, Matteo. 2011. “New Value-Added Services for Electronic Journals in Classics.” JLIS.it 2. doi:10.4403/jlis.it-4603.
Romanello, Matteo, Federico Boschetti, and Gregory Crane. 2009. “Citations in the digital library of classics: extracting canonical references by using conditional random fields.” In , 80–87. Morristown, NJ, USA: Association for Computational Linguistics.
Smith, Neel. 2010. “Digital Infrastructure and the Homer Multitext Project.” In Digital Research in the Study of Classical Antiquity, ed. Gabriel Bodard and Simon Mahony, 121–137. Burlington, VT: Ashgate Publishing.
Unsworth, John. 2000. “Scholarly Primitives: what methods do humanities researchers have in common, and how might our tools reflect this?” http://www3.isrl.illinois.edu/~unsworth/Kings.5-00/primitives.html.
3. Cyberinfrastructure for Classics
(Crane, Seales, and Terras 2009, 26):
Scholarly disciplines such as classics need specialized named entity searches: we need to determine not only whether “Th. 1.38” is a citation to a primary source but also, if so, whether it designates Thucydides, book 1, chapter 38, Theocritus, Idyll 1, line 38 or some other text.
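The ambiguity described in the quotation can be illustrated with a minimal sketch. It assumes a toy lookup table (`ABBREVIATIONS`), a hypothetical `Candidate` record, and a hypothetical `interpret` function, none of which come from the slides. The work titles "Histories" and "Idylls" are supplied here for illustration only. The sketch lists every reading of a citation string that is compatible with the abbreviation and with the number of hierarchy levels given.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    author: str
    work: str
    levels: tuple  # names of the citable hierarchy levels, outermost first


# Toy lookup table (an assumption for illustration, not a real authority list).
ABBREVIATIONS = {
    "Th.": [
        Candidate("Thucydides", "Histories", ("book", "chapter")),
        Candidate("Theocritus", "Idylls", ("idyll", "line")),
    ],
}

# An abbreviation such as "Th." followed by numbers separated by "." or ",".
CITATION_RE = re.compile(r"^(?P<abbr>[A-Z][a-z]*\.)\s*(?P<scope>\d+(?:[.,]\s*\d+)*)$")


def interpret(citation):
    """Return every (author, work, scope) reading compatible with the string."""
    match = CITATION_RE.match(citation.strip())
    if match is None:
        return []
    numbers = [int(n) for n in re.findall(r"\d+", match.group("scope"))]
    readings = []
    for cand in ABBREVIATIONS.get(match.group("abbr"), []):
        # Keep only candidates whose hierarchy depth fits the numbers given.
        if len(numbers) == len(cand.levels):
            readings.append((cand.author, cand.work, dict(zip(cand.levels, numbers))))
    return readings


for reading in interpret("Th. 1.38"):
    print(reading)
```

Both readings survive in this sketch because the string alone does not disambiguate them; choosing between them would need context such as the surrounding text.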
Matteo Romanello (DAI, KCL) @mr56k Exploring Citation Networks to Study Intertextuality in Classics
4. PhD Research Project
Scope: modern (XIX–) publications in Classics
Goal: new/more effective means to find information for studying classical texts over (possibly) large text corpora
Methods: to capture and make computable canonical citations by applying Computer Science methods/tools (NLP, ontology modelling)
12. Corpora: JSTOR & APh
JSTOR: comprehensive (~71k papers); (sometimes noisy) OCR; API, license agreement
APh: high density of canonical citations; clean text
13. L’Année Philologique as corpus
APh volumes: 1 (1924)–80 (2009)
My corpus: 7.5–8% of vol. 75; ~30k tokens (clearly transcribed text); multilingual (de, fr, en, es, it)
Annotations: POS (automatic); NE (manually corrected)
License: CC BY-NC-SA
https://github.com/mromanello/APh_Corpus
See Romanello (2013)
14. Example: APh 75-06697
S. Braund & G. Gilbert, “An ABC of epic ira: anger, beasts, and cannibalism”, Yale Classical Studies 32:250-285
In Statius’ « Achilleid » (2, 96-102) Achilles describes his diet of wild animals in infancy, which rendered him fearless and may indicate another aspect of his character - a tendency toward aggression and anger. The portrayal of angry warriors in Roman epic is effected for the most part not by direct descriptions but indirectly, by similes of wild beasts (e.g. Vergil, Aen. 12, 101-109 ; Lucan 1, 204-212 ; Statius, Th. 12, 736-740 ; Silius 5, 306-315). These similes may be compared to two passages from Statius (Th. 1, 395-433 and 8, 383-394) that portray the onset of anger in direct narrative. Analysis of these passages demonstrates that the concept of « ira » in epic takes its moral aspect from the context.
19. Named Entity Features
Linguistic features: POS tags; neighboring words
Orthographic features: punctuation; brackets; case; number; pattern
Semantic features: matching vs dictionary of author names/abbreviations; matching vs dictionary of author titles/abbreviations
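As a rough illustration of the feature groups listed above, the following sketch computes neighboring-word, orthographic, and dictionary-lookup features for one token. The function names, the feature names, and the two tiny dictionaries are assumptions made for this example (the entries are taken from the APh 75-06697 abstract), not the feature set actually used by the classifier. POS tags are left out because they would require an external tagger.

```python
import re
import string

# Toy dictionaries (assumptions for illustration only).
KNOWN_AUTHORS = {"vergil", "lucan", "statius", "silius"}
KNOWN_ABBREVIATIONS = {"aen.", "th."}
BRACKETS = {"(", ")", "[", "]"}


def word_shape(token):
    """Map a token to a coarse pattern, e.g. 'Aen.' -> 'Aaa.'."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"\d", "0", shape)


def token_features(tokens, i):
    """Return a feature dictionary for the token at position i."""
    tok = tokens[i]
    return {
        # Neighboring words (linguistic context)
        "token": tok.lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<START>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
        # Orthographic features
        "is_punct": bool(tok) and all(ch in string.punctuation for ch in tok),
        "is_bracket": tok in BRACKETS,
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in tok),
        "shape": word_shape(tok),
        # Semantic features: dictionary lookups
        "in_author_dict": tok.lower() in KNOWN_AUTHORS,
        "in_abbrev_dict": tok.lower() in KNOWN_ABBREVIATIONS,
    }


tokens = ["(", "e.g.", "Vergil", ",", "Aen.", "12", ",", "101-109", ";"]
for i in range(len(tokens)):
    print(tokens[i], token_features(tokens, i)["shape"])
```

Feature dictionaries of this kind are a common input format for sequence labellers such as Conditional Random Fields, where each token in a sentence gets one dictionary.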
20. Active Annotation (AA)
based on the Active Learning paradigm (Ekbal et al. 2011)
tenet: training is more effective when the selection of training examples is supervised rather than random
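One common way to make the selection of training examples non-random is least-confidence sampling, sketched below. This is a generic Active Learning heuristic and an assumption for this example: the slides do not state which selection criterion was used. The function `select_for_annotation` and the probabilities in `TOY_PROBABILITIES` are invented for illustration.

```python
def select_for_annotation(pool, predict_proba, k):
    """Pick the k unlabeled items the current model is least sure about.

    pool          -- iterable of unlabeled candidates
    predict_proba -- callable returning a list of class probabilities for an item
    k             -- number of items to hand to the human annotator
    """
    # Uncertainty = 1 - probability of the most likely class.
    scored = [(1.0 - max(predict_proba(item)), item) for item in pool]
    # Most uncertain first; the sort is stable, so ties keep pool order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:k]]


# Invented class probabilities standing in for a trained model's output.
TOY_PROBABILITIES = {
    "Th. 1.38": [0.50, 0.50],
    "Vergil, Aen. 12, 101-109": [0.95, 0.05],
    "Lucan 1, 204-212": [0.70, 0.30],
}

to_annotate = select_for_annotation(list(TOY_PROBABILITIES), TOY_PROBABILITIES.get, 2)
print(to_annotate)
```

In an annotation loop, the selected items would be corrected by hand, added to the training set, and the model retrained before the next round of selection.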
21. NER Classifier Evaluation
classifier: Conditional Random Fields
10-fold cross-validation

Class      p              r              F
AAUTHOR    57.89 (62.75)  38.60 (40)     46.32 (48.85)
AWORK      68.11 (62.20)  78.85 (72.86)  73.09 (67.11)
REFAUWORK  71.58 (71.43)  78.16 (75)     74.73 (73.17)
REFSCOPE   72.37 (66.34)  86.14 (67.68)  78.66 (67)
Overall    69.64 (65.22)  79.73 (62.28)  74.34 (63.72)

Feature Set    r      p      F
POS            42.22  66.34  51.12
POS+ortho      63.83  78.09  69.49
POS+ortho+sem  69.07  79.85  73.44
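For readers less familiar with the metric, the F column can be read as the harmonic mean of precision (p) and recall (r), F = 2pr / (p + r). The short sketch below recomputes it for the unparenthesized values of the per-class table, which are consistent with this formula. The helper name `f_measure` is an assumption, and no claim is made here about how the parenthesized values or the feature-set table were computed.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (class, p, r) taken from the unparenthesized columns of the per-class table.
per_class = [
    ("AAUTHOR", 57.89, 38.60),
    ("AWORK", 68.11, 78.85),
    ("REFAUWORK", 71.58, 78.16),
    ("REFSCOPE", 72.37, 86.14),
    ("Overall", 69.64, 79.73),
]

for name, p, r in per_class:
    print(f"{name:<10} F = {f_measure(p, r):.2f}")
```

The low F-score for AAUTHOR follows from its low recall: the harmonic mean is pulled toward the smaller of the two values.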
22. Wrap Up
New perspectives: tool to search/browse large corpora of publications (by cited author/work/passage; by co-cited author/work/passage); tool to study the history of Classics; track interpretations in intertextuality studies; combine 2 kinds of citation networks
Further work: citation resolution; use cases, e.g. Pentecontaetia; offer as an open web-service
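Searching by cited and co-cited passage can be sketched with a simple inverted index. The example below is a toy under stated assumptions: the passage strings for APh 75-06697 follow the abstract shown on the example slide, while the second record (`hypothetical-pub-2`) and the function names are invented purely so that co-citation counts have something to count.

```python
from collections import Counter, defaultdict

# publication id -> set of cited passages
citations = {
    "APh 75-06697": {
        "Statius, Achilleid 2, 96-102",
        "Vergil, Aen. 12, 101-109",
        "Lucan 1, 204-212",
        "Statius, Th. 12, 736-740",
        "Silius 5, 306-315",
        "Statius, Th. 1, 395-433",
        "Statius, Th. 8, 383-394",
    },
    # Invented record, for illustration only.
    "hypothetical-pub-2": {
        "Vergil, Aen. 12, 101-109",
        "Lucan 1, 204-212",
    },
}


def build_index(citations):
    """Invert the mapping: passage -> set of publications citing it."""
    index = defaultdict(set)
    for publication, passages in citations.items():
        for passage in passages:
            index[passage].add(publication)
    return index


def cited_by(index, passage):
    """Publications that cite the given passage, in sorted order."""
    return sorted(index.get(passage, ()))


def co_cited_with(citations, index, passage):
    """Count how often other passages appear alongside the given one."""
    counts = Counter()
    for publication in index.get(passage, ()):
        counts.update(citations[publication] - {passage})
    return counts


index = build_index(citations)
print(cited_by(index, "Vergil, Aen. 12, 101-109"))
print(co_cited_with(citations, index, "Vergil, Aen. 12, 101-109").most_common(3))
```

The same index could be keyed by author or work instead of passage once citations have been resolved to a structured form, which is the step listed under further work.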
24. References
Crane, Gregory, Brent Seales, and Melissa Terras. 2009. “Cyberinfrastructure for Classical Philology.” Digital Humanities Quarterly 3. http://www.digitalhumanities.org/dhq/vol/3/1/000023/000023.html.
Ekbal, Asif, Francesca Bonin, Sriparna Saha, Egon Stemle, Eduard Barbu, Fabio Cavulli, Christian Girardi, and Massimo Poesio. 2011. “Rapid Adaptation of NE Resolvers for Humanities Domains using Active Annotation.” Journal for Language Technology and Computational Linguistics 26: 39–51.
Romanello, Matteo. 2013. “Creating an Annotated Corpus for Extracting Canonical Citations from Classics-Related Texts by Using Active Annotation.” In Computational Linguistics and Intelligent Text Processing. 14th International Conference, CICLing 2013, Samos, Greece, March 24-30, 2013, Proceedings, Part I, ed. Alexander Gelbukh, 1:60–76. Springer Berlin Heidelberg. doi:10.1007/978-3-642-37247-6_6.