Text is at the heart of many fields in the Humanities. These slides, from the introductory workshop strand of the Digital Humanities at Oxford Summer School (DHOxSS), provide an introduction to methods and technologies for remediating analogue text into digital forms.
1. Reborn Digital: Text Transmission & Technology
Bodleian Libraries
UNIVERSITY OF OXFORD
Pip Willcox
Centre for Digital Scholarship
@pipwillcox
Digital Humanities at Oxford Summer School 2016
Images taken from the Shakespeare Quartos Archive, http://quartos.org/, of Hamlet Q3 (1611), Bodleian Arch. G e.13
Image credit: Bodleian Libraries
4. Mediation and Remediation
Metadata: Early Modern Letters Online
Image: Early English Books Online (EEBO)
Optical Character Recognition (OCR): Google Books
Handwritten Character Recognition (HCR): Transcribe Bentham
Transcribed: EEBO Text Creation Partnership (EEBO-TCP)
Encoded: Shakespeare Quartos Archive
Edited: Digital Renaissance Editions
Digital print: Oxford Scholarly Editions Online
Born digital: UK Web Archive
5. Mediation and Remediation
Small
Large
Heavy
Unwieldy
Fragile
Rare
Old
Uncut
Foldouts
Haptic: paper quality, heft, fragility
Material: scale, layout, impression
Experiential: smell, sound, aura
Miniature library, Department of Rare Books, Bodleian Library. Photograph: Pip Willcox
Material Text Network, University of Oxford
Mak, Bonnie. How the Page Matters. Toronto: University of Toronto Press, 2011.
6. Here’s one I made earlier
Shakespeare’s First Folio (1623)
Bodleian Arch. G c.7 (“bodleian”)
Bodleian Arch. G c.8 (“malone”)
pipwillcox
BodleianLibraries
7. Optical Character Recognition
Automated conversion of printed text into
machine-encoded text (data entry)
Pattern recognition with machine learning
e.g. Eighteenth Century Collections Online
http://gdc.gale.com/products/eighteenth-century-collections-online/
8. Over to you #2
AS YOU LIKE IT. This present SATURDAY, June 27, 1789, A New
Entertainment offer'd to the Public will be continued every
Evenin 'till the first of July, By Mr. P A L M E R, Of DRURY LANE
THEATRE, And Mr. C A R T W R I G H T, Whose Performances on
the MUSICAL GLASSES >h Has been honqured with such
diffinguilhed Patronage. The Performance will be in the Grand
Saloon of that capacious and elegant Building, The L Y C E U M S
T R A N D. Mr. PALMER will deliyer his Whimsical, Satirical,
Serious, and Comic, '0 L I , IN T H REE PA R T S: In which he
flatters himself, there will be SOMETHING to PLEASE all
PALATES. The PERFORMANCE shall have to recommend it
VARIETY and N)VELTY, In the following Manner: P A R T I. An
Occasional Address, (Written by Mr. Bellamy) The Beauties of
the Drama, fele&ed in the following Characers; Brutus, .... Sir
John Falstaff, Profpero, Mercutip, And various Comic
Characers, from FOOTE. P' A R T' II. A Tragedy for Warm
Weather, called L I N D A M I R A. The Charaters of the King,
Confidante, Lindamira, and the Epilogue By Mr. P A L M E R,
WVho […]
9. Optical character recognition
Lyceum Theatre, London, As you like it (1789), ESTC T160699.
Image from JISC Historic Texts:
https://data.historicaltexts.jisc.ac.uk/view?pubId=ecco-0605102700&terms=AS%20YOU%20LIKE%20IT.%20This%20present%20SATURDAY,%20June%2027,%201789,%20A%20New%20Entertainment&pageId=ecco-0605102700-10
10. Over to you #2
The Accomp!inh'd I A, D 'S DE :I, I s l N PRESERVING, PIiYSICK, BE/
AUT'r ,-- i JC, COOKERx, and GAR1DEING. C 0 N 'F A I N 1 N G, I.
The iArt of PrsIr.viN , and Cr?;'0 - i i;,, Fruitsand Flowers, andl
mkinlg al l forts of Conserves, Syrups, Jelics, and Pic!l!s II. The
Plr!Fcal CiRn,'t: Or, cxccd- i lent Receipts in Phfic-' and C/
i?'arhnyr. ;ilo ; some New Receipts relating to tie i air SEX,
v.lihereby they may be richly irnililli'd w.ithi all manner of el,::
2l.t!fi/' lL %tt's tO a''l 1 Lovelinef; to the Face andl Lody. HII Tle
Coml'i C "7 G,: Or Direa ions for UDre.'!h:n ' ll !'ort;. : 1'le:,.!,'
'...:' .alid Filh, aitcr ' it iC'.( !: li- :. , !;0o i:l IU e 'It the Bi t:. (.:r' ;
(h . . t . I.!ing of S.uce;, P)' i, .il_ i'.u t., C!'i; r ;i,1, c. 5'I. The
' ;'o.:?: ..f;.- s .' , iCiul.'lidnC .adi-:: :,ni otl}:er., il: t1.
' .!!'m : ._ ;iio2- o,' ta.ing all na ir. o.r "Fi'i,,. i t i} l: i-lonl'd or
River. V. The I.y ?; Di- ',; :., :., C.;- ,| 'e.s: Or, tl', comnp!..'.t ri/v:,:,'/
7', .ith hlle I .-atirvc ,red ;1: .1 ' ;:>!i ! n-ts *,tf Pl::'' .Tad T Flon
er. ThLe iF.le:'enth E D L I O N 101! i , A n. s i: r. I 0 N I) C .V:
Prilltei t;r i j,/ /; I' -- )'/'/*,/,' B_ !'i':.:-v, at tle .1e^.~ , 1 (;2'cc!t-T(,:
r..tr;' ',!:
11. Optical character recognition
Hannah Woolley, The accomplish’d lady’s delight (1720).
Image from JISC Historic Texts:
https://data.historicaltexts.jisc.ac.uk/view?pubId=ecco-0465502300&terms=woolley&filter=author%7C%7CWoolley,%20Hannah&filter=service%7C%7Cecco&pageTerms=woolley&pageId=ecco-0465502300-10
12. Over to you #2
13. Optical character recognition
Raphael Holinshed, The firste volume of the Chronicles of England, Scotlande, and Irelande… (1577).
Image from Early English Books Online:
http://gateway.proquest.com/openurl?ctx_ver=Z39.88-2003&res_id=xri:eebo&rft_id=xri:eebo:image:20076
14. Things can only get better
http://emop.tamu.edu/
15. Affordances of digital text
Read it — dissemination, preservation
Free text search
Distant reading
At scale
Automated tagging, e.g. linguistic, geographic
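Free text search and simple counting are the most basic of these affordances, and a few lines of code are enough to see them at work. The sketch below is illustrative only: it uses three lines of Hamlet from the Shakespeare Quartos Archive (quoted later in these slides) as its sample, and the function names `tokens` and `search` are hypothetical, not part of any project tool.

```python
import re
from collections import Counter

# Sample text: three lines of Hamlet as transcribed for the Shakespeare Quartos Archive.
SAMPLE = (
    "With all my imperfections on my head. "
    "Oh horrible, O horrible, most horrible, "
    "If thou hast nature in thee beare it not,"
)


def tokens(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def search(text, term, width=20):
    """Free text search: return each hit with `width` characters of context either side."""
    hits = []
    for match in re.finditer(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append(f"{left}[{match.group(0)}]{right}")
    return hits


# Counting words is the simplest kind of distant reading.
counts = Counter(tokens(SAMPLE))
print(counts.most_common(2))

for hit in search(SAMPLE, "horrible"):
    print(hit)
```

The same two functions work unchanged on a whole corpus, which is where reading "at scale" starts to show patterns that no single page reveals.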
pipwillcox
16. Affordances of transcribed text
e.g. Early English Books Online Text Creation Partnership
99.995% accuracy = 1 character error in 20,000
Or thereabouts
http://eebo.chadwyck.com/
http://www.textcreationpartnership.org/
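An accuracy percentage and an "errors per so many characters" rate are two ways of stating the same ratio, and the conversion is easy to check. The sketch below shows the arithmetic in both directions, then measures how far one word of raw OCR is from its intended reading. It is a rough illustration only: the function names are hypothetical, and the reading "distinguished" for the OCR output "diffinguilhed" (from the playbill sample earlier in these slides) is an assumption, not a checked transcription.

```python
from fractions import Fraction


def accuracy_percent(errors, characters):
    """Character accuracy, as a percentage, for `errors` wrong characters out of `characters`."""
    return float(100 * (1 - Fraction(errors, characters)))


def characters_per_error(accuracy_pct):
    """Average number of characters per error at a given accuracy (passed as a string, e.g. "99.98")."""
    error_rate = 1 - Fraction(accuracy_pct) / 100
    return 1 / error_rate


def edit_distance(a, b):
    """Levenshtein distance: fewest single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


print(accuracy_percent(1, 20_000))    # accuracy when 1 character in 20,000 is wrong
print(characters_per_error("99.98"))  # characters per error at 99.98% accuracy

ocr_word = "diffinguilhed"      # raw OCR output from the 1789 playbill sample
assumed_word = "distinguished"  # ASSUMPTION: the intended reading
errors = edit_distance(ocr_word, assumed_word)
print(errors, "errors in", len(assumed_word), "characters")
```

Exact fractions are used rather than floating-point subtraction so that the conversion does not pick up rounding noise.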
17. Affordances of hand-encoded text
First pick your Extensible Markup Language (XML):
Resource Description Framework (RDF)
Encoded Archival Description (EAD)
Text Encoding Initiative (TEI)
…anything to separate your data from your interface
18. Affordances of XML
Machine-readable and human-readable(ish)
Interoperable open standard (W3C)
Extensible semantic markup
19. Affordances of XML
Machine-readable and human-readable(ish)
Interoperable open standard (W3C)
Extensible semantic markup
Not always the answer
Not an end in itself: a research/publication tool
Hierarchical structure
20. Affordances of the Text Encoding Initiative (TEI)
An XML international standard
A set of Guidelines
For encoding historical text
A community of practice: conference, mailing list, journal, wiki, SourceForge, toolchain, publication
Future-proof
22. Revolutionizing research
“But the sad truth is that much of what it has taken me a lifetime to build up by painful accumulation can now be achieved by a moderately diligent student in the course of a morning.”
—Keith Thomas
http://www.lrb.co.uk/v32/n11/keith-thomas/diary
23. Case Study: Shakespeare Quartos Archive (SQA)
Origins: pre-1642 quartos from
JISC/NEH Transatlantic Digitization Collaboration Grant
http://quartos.org/
26. Case Study: SQA
Folger, STC 22279, copy 5
http://quartos.org/
27. Case Study: SQA
Folger, STC 22279, copy 5
<l>With all my imperfections on my head.</l>
<l><add place="margin-left" hand="#af" type="intervention" resp="#fol">Ham</add>Oh horrible, O horrible, most horrible,</l>
<l>If thou hast nature in thee beare it not,</l>
http://quartos.org/
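Because the manuscript addition is marked up rather than silently merged into the line, software can separate what was printed from what a later hand added. The following is a minimal sketch of that idea, not the Shakespeare Quartos Archive's own tooling: it uses Python's standard `xml.etree.ElementTree`, wraps the three lines in a hypothetical `<sp>` element so the fragment has a single root, and assumes plain straight quotes around attribute values, as XML requires. The helper names `printed_text` and `additions` are illustrative.

```python
import xml.etree.ElementTree as ET

# The three encoded lines, wrapped in a hypothetical <sp> so the fragment is well-formed.
FRAGMENT = """<sp>
<l>With all my imperfections on my head.</l>
<l><add place="margin-left" hand="#af" type="intervention" resp="#fol">Ham</add>Oh horrible, O horrible, most horrible,</l>
<l>If thou hast nature in thee beare it not,</l>
</sp>"""


def printed_text(line):
    """Text of a verse line as printed, leaving out anything inside <add>."""
    parts = [line.text or ""]
    for child in line:
        if child.tag != "add":
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")  # text following the child still belongs to the line
    return "".join(parts)


def additions(line):
    """Each <add> in a line, as a dict of its attributes plus its text content."""
    return [dict(add.attrib, text="".join(add.itertext())) for add in line.iter("add")]


root = ET.fromstring(FRAGMENT)
for line in root.iter("l"):
    print("printed:", printed_text(line))
    for note in additions(line):
        print("  added:", note["text"], "| place:", note.get("place"), "| hand:", note.get("hand"))
```

A reading view could show the added speech prefix in the margin and a plain-text export could drop it, both from the same encoded source.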
28. Case Study: SQA
Folger, STC 22279, copy 5
<l>With all my imperfections on my head.</l>
<l><add place="margin-left" hand="#af" type="intervention" resp="#fol">Ham</add>Oh horrible, O horrible, most horrible,</l>
<l>If thou hast nature in thee beare it not,</l>
<delSpan> surrounding the original <l>
<anchor> (for the <delSpan>)
<addSpan>
closing </sp> (speech)
opening <sp>
opening <speaker> with its associated attributes
the line, in its entirety
second closing </sp>
<anchor> (for the <addSpan>)
opening <sp> (to reopen the printed speech)
opening <speaker> (to repeat the original speaker)
http://quartos.org/
29. Case Study: First Folio
Nobody has ever answered “yes” to “Let me show you my XML”
Heather Froehlich
...except a computer
David De Roure
http://firstfolio.bodleian.ox.ac.uk/
30. Affordances of the Text Encoding Initiative (TEI)
An XML international standard
A set of Guidelines
For encoding historical text
A community of practice: conference, mailing list, journal, wiki, SourceForge, toolchain
Future-proof
Time and funding
Expert editors
A learning curve
“An extended subset”
31. Over to you #3
How much do you know about the text resources you use?
32. Provenance and context
Who made the text?
From what?
How?
Why?
When?
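One way to make these five questions routine is to record the answers alongside the text itself, so that gaps are visible. The sketch below is a hypothetical record structure, not a published schema or anything used by the projects named in these slides. The field names simply mirror the questions, and the example values are illustrative, drawn loosely from the Shakespeare Quartos Archive case study.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Provenance:
    """Hypothetical record of a text resource's provenance; None means 'not yet known'."""
    who: Optional[str] = None        # Who made the text?
    from_what: Optional[str] = None  # From what?
    how: Optional[str] = None        # How?
    why: Optional[str] = None        # Why?
    when: Optional[str] = None       # When?

    def unanswered(self) -> List[str]:
        """Names of the questions that still have no answer."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Illustrative values only.
record = Provenance(
    who="Shakespeare Quartos Archive editors",
    from_what="Folger, STC 22279, copy 5",
    how="encoded in TEI XML",
)

print(record.unanswered())
```

Listing what is still unknown is often as useful as recording what is known, because it shows which claims about a resource cannot yet be checked.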
33. Case study: Provenance
http://whise.kmi.open.ac.uk/#proceedings
David De Roure, Alfie Abdul-Rahman, Pip Willcox
34. Editorial Principles
For what purpose was the text produced?
Was OCR/HWR software used? Which? Is the text available?
What were its transcription guidelines?
What were its encoding guidelines?
Has it been improved?
35. License for Re-use
What are you allowed to do with the text?
Unknown?
Open?
Licensed, e.g. Creative Commons?
(Which Creative Commons?)
37. License for Re-use
https://twitter.com/dder/status/455707549593919490/photo/1
David De Roure
38. Re-use
Distant reading: CREME, MorphAdorner, DocuScope, Duhaime and Zimmer, Linguistic DNA
Close reading: Verse Miscellanies Online, F21, FORM (Forms Online: Renaissance to Modern)
39. Caveat lector: a digital canon?
A universal good?
Religious, political, economic, pragmatic influences
Selective history, selected histories
More remediation: more distance
40. The Future, or, An Invitation to Hubris
More connections across: texts, programs, communities…
More integration between: semantic interoperability…
More tools, more animation
Co-constitution
Heterogeneous actors, human and machine
Performative and social
Susan Halford et al.
41. Acknowledgements
Bodleian First Folio: http://firstfolio.bodleian.ox.ac.uk/
De Roure, David, Abdul-Rahman, Alfie, and Willcox, Pip, ‘On the Description of Process in Digital Scholarship’, in the proceedings of WHISE 2016: http://whise.kmi.open.ac.uk/#proceedings
De Roure, David on the R Dimensions: https://twitter.com/dder/status/455707549593919490/photo/1
Early English Books Online Text Creation Partnership: http://eebo.chadwyck.com/
Early Modern OCR Project: http://emop.tamu.edu/
Eighteenth Century Collections Online: http://gdc.gale.com/products/eighteenth-century-collections-online/
foter.com: http://rack.2.mshcdn.com/media/ZgkyMDEyLzEyLzA4L2RkL0NDaW5mb2dyYXBoLmpJei5qcGc/f9f19a65/65e/CC-infographic.jpg
Halford, Susan, Pope, Catherine and Carr, Leslie (2010) A manifesto for Web Science. In, Proceedings of the WebSci10: Extending the Frontiers of Society On-Line, Raleigh, US, 26 - 27 Apr 2010, 1-6: http://eprints.soton.ac.uk/271033/
Mak, Bonnie. How the Page Matters. Toronto: University of Toronto Press, 2011.
Shakespeare Quartos Archive: http://quartos.org/
Siefring, Judith & Willcox, Pip, “More than was Dreamt of in Our Philosophy: Encoding Hamlet for the Shakespeare Quartos Archive”, in Nelson, B. & Terras, M., eds. Digitizing Medieval and Early Modern Material Culture. New Technologies in Renaissance Studies. Toronto: Iter; Tempe, AZ: Arizona Centre for Medieval and Renaissance Studies, 2012.
Text Creation Partnership: http://www.textcreationpartnership.org/
Text Encoding Initiative: http://www.tei-c.org/
Quoted: Heather Froehlich and David De Roure
Early English Books Online Text Creation Partnership editors: Jonathan Blaney, Simon Charles, Amanda Flynn, Emma Huber, Colm MacCrossan, Judith Siefring, David Tomkins, Pip Willcox
Shakespeare Quartos Archive editors: Emma Huber, Judith Siefring, Pip Willcox
Bodleian First Folio editors: Lucienne Cummings, Emma Stanford, Judith Siefring, Pip Willcox