18thConnect A Scholar-Directed Information Architecture ALA 11 July 2009

Editor's notes

  1. Today I will be arguing that it may hurt us in the long run to cordon off open access from proprietary digital resources. Next Wednesday I’ll be attending a meeting with 18th-century data owners, most notably Gale Cengage and the ESTC, one proprietary, one open access, and I need your advice and commentary on this presentation – I’ll be presenting to them what I present to you today, so feedback will help us in negotiations tremendously. The goal?
  2. We are allegedly in the middle of a data deluge, but my question is, deluge OF WHAT? Any text originally produced before about 1830, when modern printing practices firmly took hold, isn’t exactly data. We may have great metadata, we may have great page images, but if we don’t have clean plain text, this part of our cultural record will eventually be lost. REALLY? The image of me here on YouTube comes from a recent interview at the Digital Humanities Conference, and I’m sure 99% of the participants at that conference would think I was crazy to say “we don’t have any data.” But really, it will be the crunchable data, the texts that can be mined, that will be saved – I’m imagining myself a voice crying in the wilderness on this one.
  3. Dino has just described for us the Networked Infrastructure for Nineteenth-Century Electronic Scholarship (NINES), and I will be discussing 18thConnect, a similar scholarly community and research environment for
  4. the eighteenth century that has been spawned by NINES. Like NINES, 18thConnect will serve
  5. to guide and sustain the remastering of our cultural heritage in digital media and to insist that access to it is an intellectual right.
  6. 18thConnect will perform the same functions performed by NINES:
  7. it will provide peer review
  8. of digital projects created by scholars. Blake and Whitman were the first projects to be peer-reviewed by NINES.
  9. 18thConnect has a similarly stellar project with which to open its doors, the Old Bailey Online, and its sister project, Plebeian Lives, forthcoming.
  10. NINES puts a sign of peer review on every member site, at the same time providing immediate access to the NINES online finding aid from any member site.
  11. NINES is a federated publisher, aggregating electronic scholarship
  12. while leaving it all in the hands of its developers.
  13. In order to provide a comprehensive research environment for scholars, and in order to make these digital projects interoperable within the universe of scholarship and primary materials, NINES also takes in proprietary databases, some of which you can see listed here on the right.
  14. NINES and 18thConnect are aggregators of data, not possessors of it. JSTOR, Project MUSE, ProQuest, and Alexander Street Press give us metadata. Only users whose libraries subscribe to these proprietary resources get access to them.
  15. The 18thConnect/NINES interface sends the user directly to the live records, updated and controlled by the proprietor, according to a permanent identification number associated with all metadata and plain text files.
  16. We ask those who participate in NINES to give us plain text files of all their digital resources. The NINES Solr indexer crawls those files, tokenizes the words, and then, when a word is searched, returns a portion of its context, as you can see here. But no resource is reconstructable from that word index, so proprietary data is safe.
  17. The index contains all the uses of the word tiger, 300 of them, say, as tokens all in a row (this is how I’m picturing it) each linked to its own snippet. You’d have to sort the whole index of 400,000 items to recreate the texts in it.
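The word index described in notes 16 and 17 can be pictured with a toy sketch. It is not the NINES Solr configuration, and it says nothing about how Solr actually stores its postings; the function names, the 30-character context window, and the two sample sentences are all assumptions made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Return (lowercased token, start offset, end offset) for each word."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"[A-Za-z']+", text)]


def build_index(documents, context=30):
    """Map each token to a list of (document id, snippet) pairs.

    `context` is the number of characters kept on each side of the token
    (a hypothetical setting, not a value taken from NINES).
    """
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for token, start, end in tokenize(text):
            snippet = text[max(0, start - context):end + context]
            index[token].append((doc_id, snippet))
    return index


def search(index, word):
    """Return every (document id, snippet) pair recorded for `word`."""
    return index.get(word.lower(), [])


# Invented sample texts, standing in for contributed plain-text files.
documents = {
    "doc-1": "A tiger appeared at the fair; the tiger was admired by all.",
    "doc-2": "No one had seen a tiger in London before.",
}
index = build_index(documents)
hits = search(index, "tiger")
for doc_id, snippet in hits:
    print(doc_id, "->", snippet)
```

Each hit carries its document identifier and a short window of context, which is the shape of result the notes describe: every use of a word in a row, each linked to its own snippet.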
  18. But for certain resources that scholars would really like to use, getting plain text that has been keyed is impossible, and so a big debate among members of the Executive Council is whether to take OCR text, often called “dirty OCR.”
  19. Plain text files for eighteenth-century texts are very problematic.
  20. In fact, the OCR running behind these images is bad enough to have spawned a massive effort to actually transcribe – that is, type – these texts. The ECCO Text Creation Partnership
  21. has so far re-keyed 2418 texts. Unfortunately, ECCO has not yet re-incorporated these 2418 typed texts into its database of OCR so the corrections do not yet benefit all scholars, only those at participating institutions.
  22. As can be seen in a similar agreement made between ProQuest and the Text Creation Partnership, the typing and coding work done by the participating university libraries is not distributable except to other institutions who are similarly typing and encoding until a period of exclusivity expires.
  23. While that period is five years in the case of ECCO, it is not very clear when the clock starts, and libraries may be unable to openly distribute the expensively typed and coded texts for up to 10 years. They also cannot distribute ECCO’s page images – any images that they don’t own – ever.
  24. And there are currently 138,000 texts that have been digitized by Gale from that set of films and put online in ECCO, with no immediate plans to keep going – digitizing from the microfilms has stopped.
  25. based on information from the Eighteenth Century Short Title Catalogue. So the information flow goes roughly like this: there are now over 400,000 entries in the ESTC, which has now become the “English Short Title Catalogue”; there are 200,000 texts in the Microfilm series in roughly 11,000 rolls of film.
  26. One gets only a few more hits on the word “curiosity” in Clarissa using the medial “f” as opposed to the “s.”
  27. And there are a few more hits per page.
  28. This is nonetheless a great improvement over Google’s OCR which will tell you that the 1784 edition of Richardson’s _Clarissa_ never uses the word “Curiosity,”
  29. and, when you look at a page where you know the word is used,
  30. it shows you something quite different: Cmiofity.
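The search trick in notes 26 to 30, looking for “curiofity” as well as “curiosity,” can be sketched as a query-expansion step. This is an illustration under simplifying assumptions, not a feature of ECCO or Google Books: it assumes the long “s” appears everywhere except at the end of a word and that OCR always reads it as “f”. The function names and the sample sentence are invented.

```python
import re


def long_s_variant(word):
    """Replace every non-final lowercase 's' with 'f'.

    Simplified rule (an assumption): the long s was used at the start and
    in the middle of words, and OCR tends to misread it as 'f'.
    """
    if not word:
        return word
    return word[:-1].replace("s", "f") + word[-1]


def query_variants(word):
    """Return the word plus its long-s variant, without duplicates."""
    variants = [word]
    variant = long_s_variant(word)
    if variant != word:
        variants.append(variant)
    return variants


def count_hits(ocr_text, word):
    """Count whole-word, case-insensitive matches of any variant."""
    lowered = ocr_text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(v.lower()) + r"\b", lowered))
        for v in query_variants(word)
    )


# Invented OCR output mixing a correct and a misread spelling.
sample = "Her curiofity was great, and his curiosity greater."
print(query_variants("curiosity"))
print(count_hits(sample, "curiosity"))
```

A reading as garbled as “Cmiofity” would still slip past this expansion, which is why the notes turn to better OCR rather than cleverer searching.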
  31. Things look better in Google Books when you search an 1820 version of the text, but this is still not optimal. Anna Barbauld’s 1820 edition of the novel, having lost the long ‘s’, does much better, giving us 7 returns, but again, I’m looking at one page containing three instances. The problems specific to 18th-century type did not disappear until the 1830s, with the advent of what print historians call “modern type,” when the punch began to be situated in its matrix according to mathematical principles.
  32. 18thConnect has just been awarded a grant from the NCSA (National Center for Supercomputing Applications) and I-CHASS (Institute for Computing in Humanities, Arts, and Social Sciences)
  33. for supercomputer time which we will use to OCR 138,000 texts provided by Gale Cengage from the ECCO catalogue. We will meet with Gale on July 15, 2009, to discuss how this will work.
  34. 18thConnect is just at the beginning of a process that the Old Bailey has successfully completed. The Old Bailey Online project double-keyed everything before 1834, and only on proceedings published after that date did they work by correcting OCR.
  35. Part-of-speech tagging would enhance the capacity, but so will an improved OCR program that can handle letters moving up and down on a page. Gamera was of course designed to read musical notes, which move up and down a page, and it is for this reason that we are developing it instead of OCRopus, which grew out of Google’s open-source release of Tesseract, although we will try to incorporate some of the methods used in creating it.
  36. Now I will depict for you the future of 18thConnect, based on the reality of NINES. As Dino has shown, you log into NINES to get to your own “My NINES” page.
  37. Here is a mock-up of a future “My 18thConnect Page,” and every member of ASECS, BSECS, and ISECS – national and international societies for eighteenth-century studies – will be given one of these pages.
  38. I’m relying on some work by Brad Pasanek to imagine this future, so I wanted to cite his database The Mind is a Metaphor.
  39. Here are his results, but pretend they came from a search in 18thConnect. For the first return, one would click on the title and be sent to the document in Gale,
  40. unless one’s library did not subscribe.
  41. The second return, if we clicked on it,
  42. would take us to the ESTC record.
  43. Snippets of text would be returned, and scholars who don’t subscribe would be able to get the names of holding libraries from the ESTC should they need to see the text. Let me now conclude.
  44. There are many things to praise and to critique about ECCO, the most thorough analysis having occurred recently at an MLA panel, the talks of which are available on YouTube.
  45. I myself have attacked the ECCO catalogue in videos about Open Access resources.
  46. And we want to do all we can to encourage projects such as Benjamin Pauley’s attempt to connect editions in Google Books to ESTC numbers.
  47. But in the future we will only know the eighteenth century through the texts that can be culled from the avalanche of data, and only machine-readable, crunchable data will come to the top. We need to work with commercial providers in order to make the best data possible.
  48. We will re-OCR and then automatically tag as much of Gale’s ECCO data as possible, returning it to them better than we found it so that they can create their own tools.
  49. We’ll use the tags that we generate to populate our own finding aids, giving scholars the opportunity to know and find what’s there, all the while helping commercial firms add value to their corpuses. Only this, I believe, will preserve for us our precious cultural heritage.