
From Early Modern Printing to Post-Modern Indie Publishing: Using eMOP on AFP

A presentation on using the tools and workflow from the Early Modern OCR Project on the documents of the Austin Fanzine Project.

Published in: Education

  1. From Early Modern Printing to Post-Modern Indie Publishing: Using eMOP on AFP
     • Jennifer Hecker [@lasuprema], austinfanzineproject.org/
     • Matthew Christy [@matt_christy], emop.tamu.edu/
  2. Fanzine? Zine?
     “A magazine produced for love, not money.” (I didn’t make this up, but I have no idea who said it first.)
     From eMOP to AFP - Jennifer Hecker & Matt Christy - 4/10/15
  3. Background
  4. Original Concept: Austin Fanzine Digitization, Transcription & Indexing Project
     • Access-focused
     • DIY digitization & online submissions
     • Creator/community-sourced transcription
  5. Evolution into DH Sandbox
     • Kevin Powell, Spring 2013
     • Kristin Bongiovanni, Spring 2014
     • Kate Neptune, Summer 2014
  6. Transcription Issues
     • Inconsistent layout (columns, offset text, text wrapped around other text)
     • Inconsistent humans (style guides and subject knowledge help)
     • Images
  7. eMOP – Intro
     • The Early Modern OCR Project (eMOP) is an Andrew W. Mellon Foundation-funded grant project run out of the Initiative for Digital Humanities, Media, and Culture (IDHMC) at Texas A&M University. Its goal is to develop and test tools and techniques for applying Optical Character Recognition (OCR) to early modern English documents from the hand-press period, roughly 1475-1800.
     • eMOP aims to improve the visibility of early modern texts by making their contents fully searchable. The current paradigm of searching special collections for early modern materials by either metadata alone or “dirty” OCR is insufficient for scholarly research.
  8. eMOP – The Numbers
     • Page images:
       ◦ Early English Books Online (ProQuest), EEBO: ~125,000 documents, ~13 million page images (1475-1700)
       ◦ Eighteenth Century Collections Online (Gale Cengage), ECCO: ~182,000 documents, ~32 million page images (1700-1800)
       ◦ Total: >300,000 documents & 45 million page images
     • Ground truth:
       ◦ Text Creation Partnership (TCP): ~46,000 double-keyed, hand-transcribed documents (44,000 EEBO; 2,200 ECCO)
  9. eMOP – The Data
  10. eMOP – The Problems
     • Early modern printing: individual, hand-made typefaces; worn and broken type; poor-quality equipment/paper; inconsistent line bases; unusual page layouts and decorative page elements; special characters & ligatures; spelling variations; mixed typefaces and languages; over/under-inking
     • Digitization: old, low-quality, small TIFF files; noise, skew, warp, bleedthrough
  11. Page Images
  12. eMOP – Workflow
     • Page image pre-processing
     • Tesseract training
     • De-noising
  13. eMOP – Pre-processing
     [Figure: the same page image shown as Original, Binarized, and De-noised]
  14. AFP – Results
     • Geek Weekly #3
     • 9 pages of ground truth for typed pages
     • 63.9% correct on all 9 pages
     • 94.2% correct on 6 pages
     • Analysis of what didn’t work:
       ◦ Handwriting
       ◦ Page 10 was printed in an unusual italic typeface; we could create training for it (eMOP)
       ◦ Pages 24 & 25 had good text recognition, but the wrong reading order; they can be put into FromThePage
     [Figure: Page 10]
  15. eMOP – De-noising
     [Figure: a page before and after de-noising; Before: 35%, After: 58%]
  16. eMOP – De-noising
     [Figure: before and after comparison]
  17. Integrating eMOP
     • FromThePage: a new status designation will be added
     • Launch a refocused transcription effort this summer
  18. Possible Applications
     • Other collections of print ephemera with messy layouts, like posters, flyers, handbills, ticket stubs, track listings, liner notes, and other publications
     • DH coursework, public engagement
  19. More information:
     • eMOP: emop.tamu.edu/
     • Austin Fanzine Project: www.AustinFanzineProject.org, www.facebook.com/AustinFanzineProject, @ATXFanzineProj, AFDTIP@gmail.com
     • “Why We’re Not Digitizing Zines,” Kelly Wooten, 2009, http://blogs.library.duke.edu/digital-collections/2009/09/21/why-were-not-digitizing-zines/
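The eMOP pre-processing slide shows each page as Original, Binarized, and De-noised. The deck does not say which binarization algorithm eMOP uses. As a minimal sketch, assuming a single global threshold is good enough for a reasonably clean scan, Otsu's method is one common choice: it picks the gray level that best separates dark ink pixels from light paper pixels. The functions `otsu_threshold` and `binarize` are hypothetical names, and pages are modeled as plain lists of 0-255 gray values rather than loaded from TIFF files.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels with value <= t are treated as ink, pixels > t as paper.
    (Otsu's method: a stand-in, not necessarily what eMOP uses.)
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    weight_dark = 0  # number of pixels with value <= t
    sum_dark = 0     # sum of pixel values <= t
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue  # no dark pixels yet, nothing to split
        if weight_light == 0:
            break     # every pixel is dark, no split left to test
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map a grayscale page to pure black (0, ink) and white (255, paper)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in image]


page = [[12, 30, 230], [240, 25, 220]]
print(binarize(page))  # → [[0, 0, 255], [255, 0, 255]]
```

A global threshold struggles with uneven lighting, bleedthrough, or stained paper (all listed on the eMOP problems slide), which is why adaptive, per-region thresholds are often used on harder scans.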
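The de-noising slides show a page before and after cleanup (Before: 35%, After: 58%), but the deck does not describe how eMOP's de-noising works or exactly what those percentages measure. As a swapped-in illustration only, one simple way to de-noise a binarized page is to erase tiny isolated blobs of "ink" (specks, dust, bleedthrough dots) while keeping larger connected strokes. The function `remove_specks` and its `min_size` parameter are hypothetical; a real threshold would need tuning so that periods and i-dots survive.

```python
from collections import deque
from typing import List

INK, PAPER = 0, 255  # values produced by a black/white binarization


def remove_specks(image: List[List[int]], min_size: int = 3) -> List[List[int]]:
    """Erase 8-connected ink components smaller than min_size pixels."""
    if not image:
        return []
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if image[y][x] != INK or seen[y][x]:
                continue
            # Breadth-first search collects one connected blob of ink.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx] and image[ny][nx] == INK):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Blobs below the size cutoff are treated as noise and whitened.
            if len(blob) < min_size:
                for by, bx in blob:
                    out[by][bx] = PAPER
    return out


page = [
    [255, 255, 255, 255, 255],
    [0,   0,   0,   255, 255],  # a 3-pixel stroke: kept
    [255, 255, 255, 255, 0],    # an isolated 1-pixel speck: removed
    [255, 255, 255, 255, 255],
]
cleaned = remove_specks(page, min_size=3)
```

Size-based filtering is crude: it cannot tell a speck from punctuation of the same size, which is one reason more elaborate de-noising looks at position and context as well.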
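The AFP results slide scores OCR output against 9 pages of ground truth (63.9% correct on all 9 pages, 94.2% on 6), but the deck does not define "correct." A common measure, assumed here rather than taken from the slides, is character accuracy: 1 minus the edit distance between OCR output and ground truth, divided by the ground-truth length. The sketch below uses hypothetical names (`levenshtein`, `char_accuracy`, `pooled_accuracy`) and shows why an overall figure can sit well below the figure for clean pages: pooling errors across pages lets a few bad pages dominate.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def char_accuracy(ocr: str, truth: str) -> float:
    """1 - edit distance / ground-truth length, floored at 0."""
    if not truth:
        return 1.0 if not ocr else 0.0
    return max(0.0, 1 - levenshtein(ocr, truth) / len(truth))


def pooled_accuracy(pages: List[Tuple[str, str]]) -> float:
    """Accuracy over all pages at once: total errors / total ground-truth chars."""
    errors = sum(levenshtein(ocr, truth) for ocr, truth in pages)
    chars = sum(len(truth) for _, truth in pages)
    return max(0.0, 1 - errors / chars) if chars else 1.0


# Hypothetical (ocr, ground_truth) pairs, not AFP data.
pages = [("abcdefgh", "abcdefgh"), ("xx", "ab")]
print(pooled_accuracy(pages))                               # pooled: 0.8
print(sum(char_accuracy(o, t) for o, t in pages) / len(pages))  # per-page mean: 0.5
```

Whether to pool errors or average per-page scores is a reporting choice; stating which one was used makes numbers like 63.9% and 94.2% easier to compare.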
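The results slide notes that pages 24 & 25 had good text recognition but the wrong reading order, and the transcription-issues slide lists columns and offset text as layout problems. The deck's plan is to handle these in FromThePage rather than fix them automatically. As a minimal sketch of why column layouts go wrong, assuming the OCR engine reports a bounding box for each text block (the `Block` class and its fields are hypothetical), reading blocks column by column, instead of strictly top to bottom across the page, restores the intended order for a simple two-column page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A recognized text block and its bounding box (hypothetical structure)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def column_reading_order(blocks: List[Block]) -> List[Block]:
    """Group blocks into columns by horizontal overlap, then read
    each column top to bottom, columns left to right."""
    columns: List[List[Block]] = []
    col_right: List[int] = []  # rightmost edge seen so far in each column
    for b in sorted(blocks, key=lambda blk: blk.left):
        if columns and b.left < col_right[-1]:
            # Overlaps the current column horizontally: same column.
            columns[-1].append(b)
            col_right[-1] = max(col_right[-1], b.right)
        else:
            # Starts to the right of the current column: new column.
            columns.append([b])
            col_right.append(b.right)
    ordered: List[Block] = []
    for col in columns:
        ordered.extend(sorted(col, key=lambda blk: blk.top))
    return ordered


# Two columns: a strict top-then-left order would read A, C, B, D.
a = Block("A", 0, 0, 100, 50)
b = Block("B", 0, 60, 100, 120)
c = Block("C", 120, 0, 220, 50)
d = Block("D", 120, 60, 220, 120)
print([blk.text for blk in column_reading_order([d, a, c, b])])  # → ['A', 'B', 'C', 'D']
```

This breaks down on exactly the messy zine layouts the deck describes: a headline spanning both columns merges them into one, and text wrapped around other text defeats simple column grouping, which is a good argument for human correction in FromThePage.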
