Outbound Harvesting with Encore as a
Library Space-Saving Strategy: The
Case of HathiTrust Docs
Christopher C. Brown
University of Denver, Penrose Library
(303) 871-3404
cbrown@du.edu
Friday, April 15, 2011
[Diagram: DR, IR, and digital texts; inbound harvesting versus outbound harvesting]

This presentation will show how Encore
harvesting can be used to mitigate a space
problem in a library, substituting online access
for the need for physical access to the collection.
The government documents collection will be the
primary focus.
• Depository since 1909
• Historically a 70-75% selective
• Now a 4.8% selective, but receive 100% of online cataloging
• Adding URLs to historic documents


About University of Denver
• Currently 80% of our paper documents are in storage
• We will be remodelling our library (totally displaced for at least 18 months); 100% of documents will be in storage
• Government documents will remain in storage after renovation


The Problem
• Our users are accustomed to using electronic documents
• Need to divert attention away from physical collection holdings
• Encore harvesting of HathiTrust can do this
• OCLC report: 15% of HathiTrust public domain materials are government docs*

* Malpas, Constance. 2011. Cloud-sourcing Research Collections: Managing Print in the Mass-digitized Library Environment. Dublin, Ohio: OCLC Research. http://www.oclc.org/research/publications/library/2011/201101.pdf.

Partial Solution: Using Encore for
Outbound Harvesting
• Open Archives Initiative (http://www.openarchives.org/): promotes interoperability standards for dissemination of content
• HathiTrust allows harvesting of its records
• Innovative Interfaces' Encore catalog allows for records to be harvested (with the purchase of a harvester connection)


OAI-PMH Harvesting
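
The harvesting described on this slide follows the OAI-PMH request pattern: a harvester asks a repository for records with the ListRecords verb and keeps following the resumptionToken until none is returned. The sketch below is a minimal illustration of that pattern in Python, not Encore's harvester. The base URL https://example.org/oai, the sample response, and the function names are all hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build a ListRecords request URL for a (hypothetical) repository."""
    if resumption_token:
        # When continuing a list, resumptionToken is the only other argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Selective harvesting: only records changed since this date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return ids, (token.strip() or None)


# Made-up response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:0001</identifier></header></record>
    <record><header><identifier>oai:example.org:0002</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
```

A daily update, as on our schedule, could reuse `list_records_url` with `from_date` set to the previous day's date, so that only new or changed records are requested.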
[Diagram labels: Local Site; Classic OPAC; Traditional III Millennium ILS; Encore (III) (next-gen catalog outside the ILS box); Harvester; Remote Sites with Digital Content]

• Harvested records appear only in Encore, not in “classic” catalog
• Harvested records update on a periodic schedule; in our case, daily

Encore Model
Hathi Trust Attributes
From: http://www.hathitrust.org/rights_database

PD = where docs generally live

Mass identification of copyright status based on bibliographically-derived information:

a) As texts are loaded, a set query in Mirlyn identifies those texts that are:
• US federal government documents, or
• published in the US prior to 1923, or
• published outside of the US before 1870
These are treated as public domain (ATTRIBUTE name=pd) based on bibliographically-derived information (REASON name=bib). We do not restrict access to these materials.

b) Those texts that do not meet these criteria (e.g., US post-1923 and not a government document) are treated as in-copyright (i.e., ATTRIBUTE name=ic and REASON name=bib).

c) An additional attribute is used to represent works published outside the United States between 1870 and 1923 because copyright status for these works depends on the location of the user. Works published outside the US prior to 1923 are in the public domain; however, due to the variations in copyright law in countries outside the US, it is estimated that 1870 is the earliest date works published in these countries may still be under copyright. Therefore, users accessing the volume from US IP addresses will have access to the works published outside the US between 1870 through 1923; however, users with non-US IP addresses will not (ATTRIBUTE name=pdus and REASON name=bib).

PD vs. PDUS
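
The pd / pdus / ic rules quoted above can be read as a small decision procedure. The sketch below is one reading of those rules in Python, not HathiTrust's actual query; the function name and parameters are hypothetical, and treating 1923 itself as in-copyright is an assumption, since the quoted text is ambiguous at that boundary.

```python
def rights_attribute(year, published_in_us, us_federal_doc=False):
    """Return 'pd', 'pdus', or 'ic' following the bibliographic rules quoted above.

    Assumption: a work dated exactly 1923 is treated as in-copyright.
    """
    if us_federal_doc:
        return "pd"        # US federal government documents
    if published_in_us:
        return "pd" if year < 1923 else "ic"
    if year < 1870:
        return "pd"        # published outside the US before 1870
    if year < 1923:
        return "pdus"      # public domain only for users with US IP addresses
    return "ic"


examples = [
    rights_attribute(1975, True, us_federal_doc=True),   # government document
    rights_attribute(1910, True),                        # US, pre-1923
    rights_attribute(1900, False),                       # non-US, 1870-1922
    rights_attribute(1950, True),                        # US, post-1923
]
```

Because government documents fall under the first rule regardless of date, they land in pd, which is why a pd-only harvest still captures them.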
Public Domain Distribution
• I wanted to see how many government documents were in our HathiTrust harvest
• Limit to HathiTrust for a given year
• Examine first result on each page of 25 results (4% of results) [limitation: Encore only displays first 1,000 results]


Sampling Method
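
The sampling method above amounts to systematic sampling: one record from each page of 25, limited to the first 1,000 results that Encore displays. A minimal sketch of that procedure in Python follows; the function names and the made-up sample of 40 yes/no judgments are hypothetical and only illustrate the arithmetic.

```python
def sample_positions(total_hits, page_size=25, display_cap=1000):
    """Positions (1-based) of the first result on each visible results page."""
    visible = min(total_hits, display_cap)
    return list(range(1, visible + 1, page_size))


def estimate_docs(harvest_count, sampled_is_doc):
    """Scale the share of documents in the sample up to the full harvest count."""
    share = sum(sampled_is_doc) / len(sampled_is_doc)
    return round(harvest_count * share), share


positions = sample_positions(13369)   # a year with more hits than Encore displays
# Made-up sample: 30 of 40 examined records judged to be government documents.
estimate, share = estimate_docs(1000, [True] * 30 + [False] * 10)
```

With the 1,000-result display limit, a year never yields more than 40 sampled records, so the estimates for large years rest on the assumption that the first 1,000 results are representative.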
Date Range  Hathi Totals  Hathi All Pub Domain  Hathi pdus  DU pd Harvest  Docs Sampling  Docs Sampling
                                   (pdus + pd)                               (est. docs)       (est. %)
2000-2009        505,682                14,140         726         13,369         13,340         99.78%
1990-1999        709,214                29,163         880         28,164         26,662         94.67%
1980-1989        723,657                33,753       1,204         32,321         31,370         97.06%
1970-1979        631,110                28,633       2,046         26,189         25,607         97.78%
1960-1969        546,914                21,244       1,987         18,991          7,668         40.38%
1950-1959        281,615                20,861         863         19,893          3,888         19.54%
1940-1949        184,755                17,096         600         16,253          3,771         23.21%
1930-1939        175,103                16,237         654         15,317          2,600         16.97%
1920-1929        175,226                66,563      27,108         28,854          1,529          5.30%
1910-1919        175,148               169,923      75,955         61,230          4,124          6.73%
1900-1909        179,018               153,284      70,900         47,999          2,265          4.72%
1890-1899        112,295               110,605      50,502         34,742            596          1.72%
1880-1889         83,950                82,809      38,928         23,855            699          2.93%
1870-1879         58,624                57,826      27,202         17,751            319          1.80%
1860-1869         50,907                50,337       2,273         45,790            248          0.54%
Total          4,593,218               872,474     301,828        430,718        124,686         28.95%

Statistics as of mid-March, 2011
The Docs Sampling columns show the estimated numbers of docs per
year and the estimated percentage of docs per year from the Harvest
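
As a quick check on how the two Docs Sampling figures relate, the percentage is the estimated number of docs divided by the DU pd Harvest count for the same period. The sketch below recomputes it for a few rows in Python; the figures are copied from the table, while the function and variable names are hypothetical.

```python
# (DU pd Harvest, estimated docs) for selected rows of the table.
rows = {
    "2000-2009": (13369, 13340),
    "1960-1969": (18991, 7668),
    "1860-1869": (45790, 248),
    "Total": (430718, 124686),
}


def docs_share(harvest, docs):
    """Estimated docs as a percentage of the harvested pd records."""
    return round(100 * docs / harvest, 2)


shares = {period: docs_share(h, d) for period, (h, d) in rows.items()}
```

The sharp drop before 1970 shows up directly in these ratios: nearly all harvested records from 2000-2009 are estimated to be documents, against well under 1% for 1860-1869.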

Harvesting Hathi Docs: The Stats
[Chart: Total Docs and Hathi Docs per year, 1895-2009; vertical axis 0 to 30,000]

Sources: 1895-1976 data: Monthly Catalog, 1895-1976 (ProQuest); 1976 onward data: CGP

Hathi Docs Usage in Proportion to Docs Distribution
[Chart: Harvested Records and Harvested Docs over time; vertical axis 0 to 600,000]

Tracking of daily harvesting since harvesting began, April 16, 2010
through January 1, 2011

Hathi Harvest in Perspective
Although serial holdings do not
sort properly, users can figure
out what they need.

Inclusion of Serials
Access to Older Serials
And Very Old Serials
Multivolume Works
U. of Michigan and U. of
California holdings both
show in this record

Duplicate Holdings
Now, the Bad News:
Records are Stripped Down

“Lumber, Lumber, Lumber”
Notice the multiple
duplications of
subject headings

Harvested Record from our
Catalog
Same record, but
subject heading
subfields are present

Original Record in Hathi Trust
• 008 fixed field data
• 650 subfields other than “a”
• 500 notes
• 5xx shipping list info
• 300 subfields after “a”
• 086 SuDocs number

Stripped-Out Fields
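
The duplicated "Lumber" headings follow directly from dropping every 650 subfield except "a". The sketch below models that effect in Python on a made-up record; it is an illustration of the stripping described on these slides, not Encore's actual processing, and the record contents, tag sets, and function names are hypothetical.

```python
# Fields dropped entirely and fields cut back to subfield "a", per the slide.
DROPPED_TAGS = {"008", "086", "500"}
FIRST_SUBFIELD_ONLY = {"300", "650"}

# Made-up record: (tag, [(subfield code, value), ...]).
RECORD = [
    ("008", [("", "fixed field data")]),
    ("086", [("a", "EXAMPLE SuDocs number")]),
    ("245", [("a", "Example title on lumber")]),
    ("300", [("a", "12 p."), ("b", "ill."), ("c", "27 cm")]),
    ("500", [("a", "Example note")]),
    ("650", [("a", "Lumber"), ("x", "Drying")]),
    ("650", [("a", "Lumber"), ("x", "Grading")]),
    ("650", [("a", "Lumber"), ("x", "Preservation")]),
]


def strip_record(fields):
    """Return a copy of the record with the stripped-out data removed."""
    kept = []
    for tag, subfields in fields:
        if tag in DROPPED_TAGS:
            continue
        if tag in FIRST_SUBFIELD_ONLY:
            subfields = [(code, value) for code, value in subfields if code == "a"]
        kept.append((tag, subfields))
    return kept


def subject_headings(fields):
    """Render each 650 field as a display string."""
    return [" -- ".join(value for _, value in subs)
            for tag, subs in fields if tag == "650"]


before = subject_headings(RECORD)
after = subject_headings(strip_record(RECORD))
```

Three distinct headings in the full record collapse to three identical ones in the stripped record, which is the loss of access points noted in the conclusions.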
Represents clickthroughs from the catalog record to individual government
documents over 7+ years.

Use Stats for Regular Online Docs
Statistics from Google Analytics
• Statistics for all HathiTrust records accessed, not just documents
• Spikes in usage reflect testing by the docs librarian (me), not real users

Use Stats for Hathi Trust?

• Encore provides an easy way to add external content to a library catalog experience
• HathiTrust records are freely available and are easy to harvest
• The Encore-harvested records are stripped-down and inadequate, providing too few access points and inadequate descriptions
• The content is superb, containing monographic and serial document holdings over a span of about 150 years
• Overall the project is worth having in our Encore catalog, especially since our legacy documents are all in storage and will remain there
• We are considering adding other external collections using Encore, such as Center for Research Libraries digital holdings

Conclusions

Weitere ähnliche Inhalte

Andere mochten auch

Discovery strategies for Kuali OLE - VuFind at the University of London
Discovery strategies for Kuali OLE - VuFind at the University of LondonDiscovery strategies for Kuali OLE - VuFind at the University of London
Discovery strategies for Kuali OLE - VuFind at the University of LondonAndrew Preater
 
Iug2009 Discovery Delivery
Iug2009 Discovery DeliveryIug2009 Discovery Delivery
Iug2009 Discovery DeliveryAlicia Abramson
 
INSTG004 lecture for UCL DIS students - Discovery at the University of London
INSTG004 lecture for UCL DIS students - Discovery at the University of LondonINSTG004 lecture for UCL DIS students - Discovery at the University of London
INSTG004 lecture for UCL DIS students - Discovery at the University of LondonAndrew Preater
 
UTEP Library Online Catalog Focus Group
UTEP Library Online Catalog Focus Group UTEP Library Online Catalog Focus Group
UTEP Library Online Catalog Focus Group Jason Moore
 

Andere mochten auch (7)

Goggin: Encore Careers
Goggin: Encore CareersGoggin: Encore Careers
Goggin: Encore Careers
 
Discovery strategies for Kuali OLE - VuFind at the University of London
Discovery strategies for Kuali OLE - VuFind at the University of LondonDiscovery strategies for Kuali OLE - VuFind at the University of London
Discovery strategies for Kuali OLE - VuFind at the University of London
 
Iug2009 Discovery Delivery
Iug2009 Discovery DeliveryIug2009 Discovery Delivery
Iug2009 Discovery Delivery
 
Webpac 2.0
Webpac 2.0Webpac 2.0
Webpac 2.0
 
Iug2009 Lsl Encore
Iug2009 Lsl EncoreIug2009 Lsl Encore
Iug2009 Lsl Encore
 
INSTG004 lecture for UCL DIS students - Discovery at the University of London
INSTG004 lecture for UCL DIS students - Discovery at the University of LondonINSTG004 lecture for UCL DIS students - Discovery at the University of London
INSTG004 lecture for UCL DIS students - Discovery at the University of London
 
UTEP Library Online Catalog Focus Group
UTEP Library Online Catalog Focus Group UTEP Library Online Catalog Focus Group
UTEP Library Online Catalog Focus Group
 

Ähnlich wie Outbound Harvesting with Encore as a Library Space-Saving Strategy : The Case of HathiTrust Docs

Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...
Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...
Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...lljohnston
 
What Is Corporeal Archive
What Is Corporeal ArchiveWhat Is Corporeal Archive
What Is Corporeal ArchiveKimberly Haynes
 
Irondequoit NY Newspapers for Genealogy Sept 2013
Irondequoit NY Newspapers for Genealogy Sept 2013Irondequoit NY Newspapers for Genealogy Sept 2013
Irondequoit NY Newspapers for Genealogy Sept 2013Larry Naukam
 
Cultural Heritage Insitutions and Big Data Collections
Cultural Heritage Insitutions and Big Data CollectionsCultural Heritage Insitutions and Big Data Collections
Cultural Heritage Insitutions and Big Data Collectionslljohnston
 
Newsapers for Genealogy - Genesee Area (NY) Genealogists
Newsapers for Genealogy - Genesee Area (NY) GenealogistsNewsapers for Genealogy - Genesee Area (NY) Genealogists
Newsapers for Genealogy - Genesee Area (NY) GenealogistsLarry Naukam
 
U.S. Government Information: Changes in 2009
U.S. Government Information: Changes in 2009U.S. Government Information: Changes in 2009
U.S. Government Information: Changes in 2009infoscience
 
Datasets slidesrachel kotarski
Datasets slidesrachel kotarskiDatasets slidesrachel kotarski
Datasets slidesrachel kotarskiRobin Saklatvala
 
Leslie Johnston Keynote, Best Practices Exchange 2011
Leslie Johnston Keynote, Best Practices Exchange 2011Leslie Johnston Keynote, Best Practices Exchange 2011
Leslie Johnston Keynote, Best Practices Exchange 2011lljohnston
 
Downsizing Your Depository: Dealing with Mandates from Your Administration
Downsizing Your Depository: Dealing with Mandates from Your AdministrationDownsizing Your Depository: Dealing with Mandates from Your Administration
Downsizing Your Depository: Dealing with Mandates from Your AdministrationChristopher Brown
 
Newspapers for Genealogy for the JCC
Newspapers for Genealogy for the JCCNewspapers for Genealogy for the JCC
Newspapers for Genealogy for the JCCLarry Naukam
 
Aust lii ihcl_unsw_2017-2
Aust lii ihcl_unsw_2017-2Aust lii ihcl_unsw_2017-2
Aust lii ihcl_unsw_2017-2SusanMRob
 
Indigenous Digital Archive - IIIF at MoMA May 2016
Indigenous Digital Archive - IIIF at MoMA May 2016Indigenous Digital Archive - IIIF at MoMA May 2016
Indigenous Digital Archive - IIIF at MoMA May 2016Anna Naruta-Moya
 
Enhancing authority records to aid copyright review
Enhancing authority records to aid copyright reviewEnhancing authority records to aid copyright review
Enhancing authority records to aid copyright reviewJudith Ahronheim
 
Digitised Content in an API world
Digitised Content in an API worldDigitised Content in an API world
Digitised Content in an API worldAlastair Dunning
 
Digging Deeper: Uncovering the Hidden Potential of Historical State and Local...
Digging Deeper: Uncovering the Hidden Potential of Historical State and Local...Digging Deeper: Uncovering the Hidden Potential of Historical State and Local...
Digging Deeper: Uncovering the Hidden Potential of Historical State and Local...History Associates
 
Free Culture! Public Domain Photography on the Web
Free Culture! Public Domain Photography on the WebFree Culture! Public Domain Photography on the Web
Free Culture! Public Domain Photography on the WebIan McDermott
 
15 Arcola MHS Legacy Grant
15 Arcola MHS Legacy Grant15 Arcola MHS Legacy Grant
15 Arcola MHS Legacy GrantFitzie Heimdahl
 
The Internet Archive
The Internet ArchiveThe Internet Archive
The Internet Archiveguest620f31
 
Using historical open data for family history - and the value of GB1900 data
Using historical open data for family history - and the value of GB1900 dataUsing historical open data for family history - and the value of GB1900 data
Using historical open data for family history - and the value of GB1900 dataTom Pert
 

Ähnlich wie Outbound Harvesting with Encore as a Library Space-Saving Strategy : The Case of HathiTrust Docs (20)

Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...
Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...
Leslie Johnston: Big Data at Libraries, Georgetown University Law School Symp...
 
What Is Corporeal Archive
What Is Corporeal ArchiveWhat Is Corporeal Archive
What Is Corporeal Archive
 
Irondequoit NY Newspapers for Genealogy Sept 2013
Irondequoit NY Newspapers for Genealogy Sept 2013Irondequoit NY Newspapers for Genealogy Sept 2013
Irondequoit NY Newspapers for Genealogy Sept 2013
 
Cultural Heritage Insitutions and Big Data Collections
Cultural Heritage Insitutions and Big Data CollectionsCultural Heritage Insitutions and Big Data Collections
Cultural Heritage Insitutions and Big Data Collections
 
Newsapers for Genealogy - Genesee Area (NY) Genealogists
Newsapers for Genealogy - Genesee Area (NY) GenealogistsNewsapers for Genealogy - Genesee Area (NY) Genealogists
Newsapers for Genealogy - Genesee Area (NY) Genealogists
 
U.S. Government Information: Changes in 2009
U.S. Government Information: Changes in 2009U.S. Government Information: Changes in 2009
U.S. Government Information: Changes in 2009
 
Datasets slidesrachel kotarski
Datasets slidesrachel kotarskiDatasets slidesrachel kotarski
Datasets slidesrachel kotarski
 
Leslie Johnston Keynote, Best Practices Exchange 2011
Leslie Johnston Keynote, Best Practices Exchange 2011Leslie Johnston Keynote, Best Practices Exchange 2011
Leslie Johnston Keynote, Best Practices Exchange 2011
 
Downsizing Your Depository: Dealing with Mandates from Your Administration
Downsizing Your Depository: Dealing with Mandates from Your AdministrationDownsizing Your Depository: Dealing with Mandates from Your Administration
Downsizing Your Depository: Dealing with Mandates from Your Administration
 
Newspapers for Genealogy for the JCC
Newspapers for Genealogy for the JCCNewspapers for Genealogy for the JCC
Newspapers for Genealogy for the JCC
 
Aust lii ihcl_unsw_2017-2
Aust lii ihcl_unsw_2017-2Aust lii ihcl_unsw_2017-2
Aust lii ihcl_unsw_2017-2
 
Indigenous Digital Archive - IIIF at MoMA May 2016
Indigenous Digital Archive - IIIF at MoMA May 2016Indigenous Digital Archive - IIIF at MoMA May 2016
Indigenous Digital Archive - IIIF at MoMA May 2016
 
Enhancing authority records to aid copyright review
Enhancing authority records to aid copyright reviewEnhancing authority records to aid copyright review
Enhancing authority records to aid copyright review
 
Digitised Content in an API world
Digitised Content in an API worldDigitised Content in an API world
Digitised Content in an API world
 
Digging Deeper: Uncovering the Hidden Potential of Historical State and Local...
Digging Deeper: Uncovering the Hidden Potential of Historical State and Local...Digging Deeper: Uncovering the Hidden Potential of Historical State and Local...
Digging Deeper: Uncovering the Hidden Potential of Historical State and Local...
 
Free Culture! Public Domain Photography on the Web
Free Culture! Public Domain Photography on the WebFree Culture! Public Domain Photography on the Web
Free Culture! Public Domain Photography on the Web
 
Finding Gems: Unearthing Interesting and Valuable Reports from Federal Research
Finding Gems: Unearthing Interesting and Valuable Reports from Federal ResearchFinding Gems: Unearthing Interesting and Valuable Reports from Federal Research
Finding Gems: Unearthing Interesting and Valuable Reports from Federal Research
 
15 Arcola MHS Legacy Grant
15 Arcola MHS Legacy Grant15 Arcola MHS Legacy Grant
15 Arcola MHS Legacy Grant
 
The Internet Archive
The Internet ArchiveThe Internet Archive
The Internet Archive
 
Using historical open data for family history - and the value of GB1900 data
Using historical open data for family history - and the value of GB1900 dataUsing historical open data for family history - and the value of GB1900 data
Using historical open data for family history - and the value of GB1900 data
 

Mehr von Christopher Brown

Migrating Government Publications without Going South: Our Alma/Primo Experience
Migrating Government Publications without Going South: Our Alma/Primo ExperienceMigrating Government Publications without Going South: Our Alma/Primo Experience
Migrating Government Publications without Going South: Our Alma/Primo ExperienceChristopher Brown
 
Downsizing your Depository: Tools and Ideas
Downsizing your Depository: Tools and IdeasDownsizing your Depository: Tools and Ideas
Downsizing your Depository: Tools and IdeasChristopher Brown
 
Web-scale Discovery Tools and the Backgrounding of Government Information
Web-scale Discovery Tools and the Backgrounding of Government InformationWeb-scale Discovery Tools and the Backgrounding of Government Information
Web-scale Discovery Tools and the Backgrounding of Government InformationChristopher Brown
 
The Darkening of Government Information
The Darkening of Government InformationThe Darkening of Government Information
The Darkening of Government InformationChristopher Brown
 
Collecting Usage Statistics for E-Government Resources
Collecting Usage Statistics for E-Government ResourcesCollecting Usage Statistics for E-Government Resources
Collecting Usage Statistics for E-Government ResourcesChristopher Brown
 
Item Deselection on the Fast Track
Item Deselection on the Fast TrackItem Deselection on the Fast Track
Item Deselection on the Fast TrackChristopher Brown
 
Going All-Electronic and Keeping Track of It: Clickthrough Statistics for On...
Going All-Electronic and Keeping Track of It: Clickthrough  Statistics for On...Going All-Electronic and Keeping Track of It: Clickthrough  Statistics for On...
Going All-Electronic and Keeping Track of It: Clickthrough Statistics for On...Christopher Brown
 
Harvesting HathiTrust Documents: A New Model for Online Access
Harvesting HathiTrust Documents: A New Model for Online  AccessHarvesting HathiTrust Documents: A New Model for Online  Access
Harvesting HathiTrust Documents: A New Model for Online AccessChristopher Brown
 
The Three Googles: How I Teach Google in an Academic Setting
The Three Googles: How I Teach Google in an Academic SettingThe Three Googles: How I Teach Google in an Academic Setting
The Three Googles: How I Teach Google in an Academic SettingChristopher Brown
 
Planning the Six-State Virtual Government Information Conference
Planning the Six-State Virtual Government Information ConferencePlanning the Six-State Virtual Government Information Conference
Planning the Six-State Virtual Government Information ConferenceChristopher Brown
 
Fiche Online: A Vision for Digitizing All Documents Fiche
Fiche Online: A Vision for Digitizing All Documents FicheFiche Online: A Vision for Digitizing All Documents Fiche
Fiche Online: A Vision for Digitizing All Documents FicheChristopher Brown
 
Summon and the Art of Discovery
Summon and the Art of DiscoverySummon and the Art of Discovery
Summon and the Art of DiscoveryChristopher Brown
 
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...Christopher Brown
 

Mehr von Christopher Brown (14)

Migrating Government Publications without Going South: Our Alma/Primo Experience
Migrating Government Publications without Going South: Our Alma/Primo ExperienceMigrating Government Publications without Going South: Our Alma/Primo Experience
Migrating Government Publications without Going South: Our Alma/Primo Experience
 
Downsizing your Depository: Tools and Ideas
Downsizing your Depository: Tools and IdeasDownsizing your Depository: Tools and Ideas
Downsizing your Depository: Tools and Ideas
 
Web-scale Discovery Tools and the Backgrounding of Government Information
Web-scale Discovery Tools and the Backgrounding of Government InformationWeb-scale Discovery Tools and the Backgrounding of Government Information
Web-scale Discovery Tools and the Backgrounding of Government Information
 
The Darkening of Government Information
The Darkening of Government InformationThe Darkening of Government Information
The Darkening of Government Information
 
Collecting Usage Statistics for E-Government Resources
Collecting Usage Statistics for E-Government ResourcesCollecting Usage Statistics for E-Government Resources
Collecting Usage Statistics for E-Government Resources
 
Item Deselection on the Fast Track
Item Deselection on the Fast TrackItem Deselection on the Fast Track
Item Deselection on the Fast Track
 
Going All-Electronic and Keeping Track of It: Clickthrough Statistics for On...
Going All-Electronic and Keeping Track of It: Clickthrough  Statistics for On...Going All-Electronic and Keeping Track of It: Clickthrough  Statistics for On...
Going All-Electronic and Keeping Track of It: Clickthrough Statistics for On...
 
Harvesting HathiTrust Documents: A New Model for Online Access
Harvesting HathiTrust Documents: A New Model for Online  AccessHarvesting HathiTrust Documents: A New Model for Online  Access
Harvesting HathiTrust Documents: A New Model for Online Access
 
The Three Googles: How I Teach Google in an Academic Setting
The Three Googles: How I Teach Google in an Academic SettingThe Three Googles: How I Teach Google in an Academic Setting
The Three Googles: How I Teach Google in an Academic Setting
 
The Front Face of the ERM
The Front Face of the ERMThe Front Face of the ERM
The Front Face of the ERM
 
Planning the Six-State Virtual Government Information Conference
Planning the Six-State Virtual Government Information ConferencePlanning the Six-State Virtual Government Information Conference
Planning the Six-State Virtual Government Information Conference
 
Fiche Online: A Vision for Digitizing All Documents Fiche
Fiche Online: A Vision for Digitizing All Documents FicheFiche Online: A Vision for Digitizing All Documents Fiche
Fiche Online: A Vision for Digitizing All Documents Fiche
 
Summon and the Art of Discovery
Summon and the Art of DiscoverySummon and the Art of Discovery
Summon and the Art of Discovery
 
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
 

Kürzlich hochgeladen

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 

Kürzlich hochgeladen (20)

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 

Outbound Harvesting with Encore as a Library Space-Saving Strategy : The Case of HathiTrust Docs

  • 1. Outbound Harvesting with Encore as a Library Space-Saving Strategy: The Case of HathiTrust Docs. Christopher C. Brown, University of Denver, Penrose Library, (303) 871-3404, cbrown@du.edu. Friday, April 15, 2011.
  • 2. [Diagram labels: DR, IR, Digital Texts; Inbound Harvesting; Outbound Harvesting] This presentation shows how Encore harvesting can mitigate a space problem in a library by substituting online access for physical access to the collection. The government documents collection is the primary focus.
  • 3. About University of Denver
      • Depository since 1909
      • Historically a 70-75% selective
      • Now a 4.8% selective, but receive 100% of online cataloging
      • Adding URLs to historic documents
  • 4. The Problem
      • Currently 80% of our paper documents are in storage
      • We will be remodelling our library: totally displaced for at least 18 months; 100% of documents will be in storage
      • Government documents will remain in storage after renovation
  • 5. Partial Solution: Using Encore for Outbound Harvesting
      • Our users are accustomed to using electronic documents
      • Need to divert attention away from physical collection holdings
      • Encore harvesting of HathiTrust can do this
      • OCLC report: 15% of HathiTrust public domain materials are government docs*
      * Malpas, Constance. 2011. Cloud-sourcing Research Collections: Managing Print in the Mass-digitized Library Environment. Dublin, Ohio: OCLC Research. http://www.oclc.org/research/publications/library/2011/201101.pdf.
  • 6. OAI-PMH Harvesting
      • http://www.openarchives.org/
      • Promotes interoperability standards for dissemination of content
      • HathiTrust allows harvesting of its records
      • Innovative Interfaces' Encore catalog allows for records to be harvested (with the purchase of a harvester connection)
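As a rough illustration of what an OAI-PMH harvest involves, the sketch below builds a `ListRecords` request URL and reads titles and the resumption token out of a response. It is not how the Encore harvester is implemented, which the slides do not describe. The base URL, the sample response, and the function names are assumptions made for the example; only the `ListRecords` verb, the `metadataPrefix`, `set` and `resumptionToken` arguments, and the XML namespaces come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint: substitute the base URL published by the repository.
BASE_URL = "https://repository.example.org/oai"

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL.

    A resumptionToken is an exclusive argument in OAI-PMH, so when one is
    supplied it replaces metadataPrefix and set.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (titles, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.iter(OAI_NS + "record"):
        title = record.find(".//" + DC_NS + "title")
        if title is not None:
            titles.append(title.text)
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


# Invented sample response, shaped like an oai_dc ListRecords reply.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Lumber production statistics</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    titles, token = parse_list_records(SAMPLE)
    print(titles, token)
    print(list_records_url(BASE_URL, resumption_token=token))
```

A harvester would repeat the request with each returned token until the response carries no token, which is the kind of loop a daily scheduled harvest would run.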
  • 7. Encore Model
      [Diagram labels: Local Site; Classic OPAC; Traditional III Millennium ILS; Encore (III) (next-gen catalog outside the ILS box); Harvester; Remote Sites with Digital Content]
      • Harvested records appear only in Encore, not in the "classic" catalog
      • Harvested records update on a periodic schedule; in our case, daily
  • 8. HathiTrust Attributes. From: http://www.hathitrust.org/rights_database. PD = where docs generally live.
  • 9. PD vs. PDUS
      Mass identification of copyright status based on bibliographically-derived information:
      a) As texts are loaded, a set query in Mirlyn identifies those texts that are: US federal government documents, or published in the US prior to 1923, or published outside of the US before 1870. These are treated as public domain (ATTRIBUTE name=pd) based on bibliographically-derived information (REASON name=bib). We do not restrict access to these materials.
      b) Those texts that do not meet these criteria (e.g., US post-1923 and not a government document) are treated as in-copyright (i.e., ATTRIBUTE name=ic and REASON name=bib).
      c) An additional attribute is used to represent works published outside the United States between 1870 and 1923 because copyright status for these works depends on the location of the user. Works published outside the US prior to 1923 are in the public domain; however, due to the variations in copyright law in countries outside the US, it is estimated that 1870 is the earliest date works published in these countries may still be under copyright. Therefore, users accessing the volume from US IP addresses will have access to the works published outside the US from 1870 through 1923; however, users with non-US IP addresses will not (ATTRIBUTE name=pdus and REASON name=bib).
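The rules quoted on this slide can be read as a small decision procedure. The sketch below is one possible reading, not HathiTrust's actual query: the function names are hypothetical, the year 1923 itself is treated as in-copyright (the quoted text is ambiguous at that boundary), and `can_view` assumes that in-copyright items are simply not viewable, which the slide does not state.

```python
def rights_attribute(year, published_in_us, us_federal_document=False):
    """Return 'pd', 'pdus' or 'ic' from bibliographic data alone (REASON bib).

    Assumption: a publication year of exactly 1923 falls outside the
    'prior to 1923' rule and is treated as in-copyright.
    """
    if us_federal_document:
        return "pd"          # US federal documents are public domain
    if published_in_us:
        return "pd" if year < 1923 else "ic"
    if year < 1870:
        return "pd"          # non-US, before 1870
    if year < 1923:
        return "pdus"        # non-US, 1870 up to 1923: public domain in the US only
    return "ic"


def can_view(attribute, user_in_us):
    """Simplified access check: pd is open, pdus needs a US IP address."""
    if attribute == "pd":
        return True
    if attribute == "pdus":
        return user_in_us
    return False             # assumption: 'ic' items are not viewable


if __name__ == "__main__":
    print(rights_attribute(1995, True, us_federal_document=True))  # pd
    print(rights_attribute(1900, False))                           # pdus
    print(can_view("pdus", user_in_us=False))                      # False
```

Under this reading, government documents land in `pd` regardless of date, which is consistent with the earlier note that PD is where docs generally live.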
  • 11. Sampling Method
      • I wanted to see how many government documents were in our HathiTrust harvest
      • Limit to HathiTrust for a given year
      • Examine first result on each page of 25 results (4% of results) [limitation: Encore only displays first 1,000 results]
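The sampling step above amounts to a systematic sample: one record per page of 25, capped by the 1,000-result display limit, with the observed share of documents then scaled up to the full result count. The sketch below shows that arithmetic only. The function names and the example flags are hypothetical, and the slides do not say how the per-year samples were combined into the published figures.

```python
PAGE_SIZE = 25        # results per Encore page (from the slide)
DISPLAY_LIMIT = 1000  # Encore only displays the first 1,000 results


def sample_positions(result_count):
    """Zero-based positions of the first result on each visible page."""
    visible = min(result_count, DISPLAY_LIMIT)
    return list(range(0, visible, PAGE_SIZE))


def estimate_docs(sample_flags, harvest_total):
    """Scale the share of sampled records that are documents to the total.

    sample_flags holds one boolean per sampled record
    (True means the record is a government document).
    """
    share = sum(sample_flags) / len(sample_flags)
    return share, round(share * harvest_total)


if __name__ == "__main__":
    # Any result set larger than the display limit yields 40 sampled records.
    print(len(sample_positions(13369)))
    # Invented flags, for illustration only: 10 of 40 sampled records are docs.
    flags = [True] * 10 + [False] * 30
    print(estimate_docs(flags, 1000))
```

Because of the display limit, a large result set is sampled only from its first 1,000 records, so the estimate inherits whatever ordering Encore applies to results.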
  • 12. Harvesting Hathi Docs: The Stats

      Date Range   Hathi Totals   Hathi All Pub Domain   Hathi pdus   DU pd Harvest   Docs Sampling   Docs Sampling
                                  (pdus + pd)                                         (est. docs)     (est. %)
      2000-2009         505,682                 14,140          726          13,369          13,340          99.78%
      1990-1999         709,214                 29,163          880          28,164          26,662          94.67%
      1980-1989         723,657                 33,753        1,204          32,321          31,370          97.06%
      1970-1979         631,110                 28,633        2,046          26,189          25,607          97.78%
      1960-1969         546,914                 21,244        1,987          18,991           7,668          40.38%
      1950-1959         281,615                 20,861          863          19,893           3,888          19.54%
      1940-1949         184,755                 17,096          600          16,253           3,771          23.21%
      1930-1939         175,103                 16,237          654          15,317           2,600          16.97%
      1920-1929         175,226                 66,563       27,108          28,854           1,529           5.30%
      1910-1919         175,148                169,923       75,955          61,230           4,124           6.73%
      1900-1909         179,018                153,284       70,900          47,999           2,265           4.72%
      1890-1899         112,295                110,605       50,502          34,742             596           1.72%
      1880-1889          83,950                 82,809       38,928          23,855             699           2.93%
      1870-1879          58,624                 57,826       27,202          17,751             319           1.80%
      1860-1869          50,907                 50,337        2,273          45,790             248           0.54%
      Totals          4,593,218                872,474      301,828         430,718         124,686          28.95%

      Statistics as of mid-March, 2011. The Docs Sampling columns show the estimated numbers of docs per year and the estimated percentage of docs per year from the Harvest.
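The percentage column can be rechecked from the two count columns: each estimated percentage is the estimated number of docs divided by the DU pd Harvest figure for that decade. The short script below does this with the numbers from the slide. The variable and function names are made up for the example, and recomputed values may differ from the slide in the last digit because the published counts are themselves rounded estimates.

```python
# (decade, DU pd harvest, estimated docs), copied from the slide's table.
ROWS = [
    ("2000-2009", 13_369, 13_340),
    ("1990-1999", 28_164, 26_662),
    ("1980-1989", 32_321, 31_370),
    ("1970-1979", 26_189, 25_607),
    ("1960-1969", 18_991, 7_668),
    ("1950-1959", 19_893, 3_888),
    ("1940-1949", 16_253, 3_771),
    ("1930-1939", 15_317, 2_600),
    ("1920-1929", 28_854, 1_529),
    ("1910-1919", 61_230, 4_124),
    ("1900-1909", 47_999, 2_265),
    ("1890-1899", 34_742, 596),
    ("1880-1889", 23_855, 699),
    ("1870-1879", 17_751, 319),
    ("1860-1869", 45_790, 248),
]


def docs_share(harvest, docs):
    """Estimated docs as a percentage of harvested pd records."""
    return round(100 * docs / harvest, 2)


total_harvest = sum(harvest for _, harvest, _ in ROWS)
total_docs = sum(docs for _, _, docs in ROWS)

if __name__ == "__main__":
    for decade, harvest, docs in ROWS:
        print(f"{decade}: {docs_share(harvest, docs):.2f}%")
    print(f"Totals: {total_docs:,} of {total_harvest:,} "
          f"({docs_share(total_harvest, total_docs):.2f}%)")
```

The totals reproduce the slide's bottom row: 124,686 estimated docs out of 430,718 harvested records, or 28.95%.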
  • 14. Hathi Harvest in Perspective. [Chart: Harvested Records and Harvested Docs; vertical axis from 0 to 600,000] Tracking of daily harvesting since harvesting began, April 16, 2010 through January 1, 2011.
  • 15. Inclusion of Serials. Although serial holdings do not sort properly, users can figure out what they need.
  • 16. Access to Older Serials
  • 17. And Very Old Serials
  • 19. Duplicate Holdings. U. of Michigan and U. of California holdings both show in this record.
  • 20. Now, the Bad News: Records are Stripped Down “Lumber, Lumber, Lumber”
  • 21. Harvested Record from our Catalog. Notice the multiple duplications of subject headings.
  • 22. Original Record in HathiTrust. Same record, but subject heading subfields are present.
  • 23. Stripped-Out Fields
      • 008 fixed field data
      • 650 subfields other than "a"
      • 500 notes
      • 5xx shipping list info
      • 300 subfields after "a"
      • 086 SuDocs number
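To see why dropping 650 subfields other than "a" produces repeated headings such as "Lumber, Lumber, Lumber", the sketch below models a record as a list of (tag, subfields) pairs and applies the stripping listed on this slide. This is an illustration, not the harvester's code: the sample record and its subdivisions are invented, the function name is hypothetical, and treating every 5xx field as dropped is a simplification of "500 notes" and "5xx shipping list info".

```python
def strip_record(fields):
    """Apply the stripping listed on the slide to a simplified record.

    fields is a list of (tag, [(subfield_code, value), ...]) pairs.
    """
    kept = []
    for tag, subfields in fields:
        # 008 fixed field data, 086 SuDocs number and 5xx notes are dropped.
        if tag in {"008", "086"} or tag.startswith("5"):
            continue
        # 650 and 300 keep only subfield "a".
        if tag in {"650", "300"}:
            subfields = [(code, value) for code, value in subfields if code == "a"]
        kept.append((tag, subfields))
    return kept


# Invented record for illustration; subdivisions are not from the slides.
RECORD = [
    ("008", [(None, "fixed field data")]),
    ("086", [("a", "A 13.2:L 97")]),
    ("245", [("a", "Lumber production report")]),
    ("300", [("a", "24 p."), ("b", "ill."), ("c", "28 cm")]),
    ("500", [("a", "Shipping list note")]),
    ("650", [("a", "Lumber"), ("x", "Prices")]),
    ("650", [("a", "Lumber"), ("x", "Statistics")]),
    ("650", [("a", "Lumber"), ("z", "United States")]),
]

stripped = strip_record(RECORD)
headings = [subs[0][1] for tag, subs in stripped if tag == "650"]

if __name__ == "__main__":
    print(", ".join(headings))  # Lumber, Lumber, Lumber
```

Three distinct headings in the full record collapse to the same string once their subdivisions are gone, which matches the duplication visible in the harvested record.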
  • 24. Use Stats for Regular Online Docs. Represents clickthroughs from the catalog record to individual government documents over 7+ years.
  • 25. Use Stats for HathiTrust?
      Statistics from Google Analytics
      • Statistics for all HathiTrust records accessed, not just documents
      • Spikes in usage are docs librarian (my) testing, not real users
  • 26. Conclusions
      • Encore provides an easy way to add external content to a library catalog experience
      • HathiTrust records are freely available and are easy to harvest
      • The Encore-harvested records are stripped-down and inadequate, providing too few access points and inadequate descriptions
      • The content is superb, containing monographic and serial document holdings over a span of about 150 years
      • Overall the project is worth having in our Encore catalog, especially since our legacy documents are all in storage and will remain there
      • We are considering adding other external collections using Encore, such as Center for Research Libraries digital holdings