The Big Shift: Managing Research Collections in the Cloud
1. The Big Shift: Managing Research Collections in the Cloud
Annual Meeting
28 April 2011
Constance Malpas
Program Officer, OCLC Research
2. Roadmap
• Think Big – sourcing and scaling, mega regions
• Emerging infrastructure – managing collections ‘in the cloud’
• Shared print service provision - opportunities, challenges
• ASERL in perspective – regional and system-wide context
3. You are … where?
http://www.creativeclass.com/whos_your_city/maps/#Mega-Regions_of_North_America
4. A Master Plan for a mega region
“[Midwestern universities] work together on both regional and national agendas, merging library and research resources, and sharing curricula and instructional resources with faculty and students. Aggregating these spires of excellence by linking these institutions gives the Midwest region many of the world’s leading programs in a broad range of key knowledge areas.” (p. 37)
“Sharing of library and research facilities can augment scholarly production and assure fuller use of cultural assets without great extra cost to the state.” (p. 37)
5. Boundary work and the library ‘service bundle’
Shared print is a prime example:
a core operation that
is moving “outside”
institutional boundaries
University of California
Orbis Cascade
WEST
CIC
TRLN
Hathi Print
CAVAL, UKRR, JURA etc.
7. Shared Print: what’s the problem?
Shift in scholarly attention from print to electronic means low-use retrospective print collections are perceived to deliver less library value
Competing demands for library space: teaching, learning, collaborative research vs. “warehouse of books”
Among academic libraries, a shrinking pool of institutions with mandate, capacity to support print preservation
As transaction costs for managing legacy print collections decrease, libraries will seek to externalize print operations to shared repositories
8. Shared Print: OCLC Research
Active portfolio of work since 2007:
• North American library storage capacity (2007)
• ~70M volumes in storage; cooperative models in the minority
• Policy requirements for shared print repositories (2009)
• critical need: disclosure of print preservation commitments
• Leveraging infrastructure: MARC21 583 Action Note (2009/2011)
• copy-level retention, condition statements are required
• Cloud-sourcing research collections (2010)
• mass digitization of monographs accelerates shift to shared print
9. Shared Print value proposition(s)
1) Ensures long-term survivability of ‘last copies’ and low-use print journals and books
Extension of traditional repository function; limited
motivation to subsidize
2) Enables reduction in redundant inventory for moderately
and widely-held titles, facilitating redirection of library
resources toward more distinctive service portfolio
Strategic reserve provides a hedge against disruption in
the marketplace, rapid fluctuations in scholarly value &
function of print; provides tangible value to participant
10. Growth of US library storage infrastructure
Aggregate off-site capacity has increased exponentially
+ 70 million volumes in storage (2007)
[Chart: Built Capacity in Volume Equivalents (2007), 0 to 140,000,000, by Date of Original Construction, 1982 to 2008; growth from 2 high-density facilities to 68 high-density facilities]
Derived from L. Payne (OCLC, 2007)
11. Aggregate preservation resource: a black box?
Of 68 storage facilities identified in Payne (OCLC, 2007):
• 2 are visible in WorldCat today: UC NRLF & UC SRLF
• Proxies: CRL, LC?
Among 9 ASERL storage collections profiled in 2004:
• 80% of monographic titles held in a single storage facility
[Chart: Titles in ‘shared print’ collections less widely held? Share of titles (0% to 100%) held by <25 libraries, 25-99 libraries, 100-499 libraries and >499 libraries, ranging from less widely held to more widely held, for SRLF (ZAS), NRLF (ZAP), CRL, AZ State (AZS), UC Irvine (CUI) and Rutgers (NJR)]
12. Projected growth of HathiTrust Digital Library
June 2010 - June 2020
[Chart: linear projections of growth in volumes and growth in titles, 0 to 40,000,000; reference marks (*) for Library of Congress and Harvard University Library, each in constant 2008 volumes]
OCLC Research. June 2010
13. Premise of Cloud Library project (2009-2010)
Emergence of large scale shared print and digital
repositories creates an opportunity for strategic
externalization of traditional repository function
• Reduce total costs of preserving scholarly record
• Enable reallocation of institutional resources
• Support renovation of library service portfolio
• Create new business relationships among libraries
A bridge strategy to guarantee access and
preservation of long tail, low use collections
during ongoing p- to e- transition
14. Shared infrastructure: books & bits
Academic off-site storage: 25 years, +70M vols.
HathiTrust: 15 months, +5M vols.
Will this intersection create new operational efficiencies?
For which libraries?
Under what conditions?
How soon and with what impact?
15. A global change in the library environment
Academic print book collection already substantially duplicated in mass digitized book corpus (HathiTrust)
June 2009 Median duplication: 19%
June 2010 Median duplication: 31%
[Scatter chart: % of Titles in Local Collection (0% to 60%) by Rank in 2008 ARL Investment Index (0 to 120)]
OCLC Research. June 2010
16. A mirror of the academic print collection
Distribution of Titles in HathiTrust Digital Library by Subject and Copyright Status
(June 2010)
[Bar chart: Titles / Editions per subject (0 to 1,000,000), split into Public Domain and In Copyright. N = 3.64M titles. Subjects as listed on the axis: Communicable Diseases & Misc.; Health Facilities, Nursing; Physical Education & Recreation; Medicine By Body System; Preclinical Sciences; Chemistry; Computer Science; Psychology; Medicine By Discipline; Performing Arts; Anthropology; Mathematics; Health Professions & Public Health; Agriculture; Biological Sciences; Medicine; Geography & Earth Sciences; Physical Sciences; Law; Education; Music; Sociology; Library Science, Reference; Political Science; Government Documents; Engineering & Technology; Art & Architecture; Philosophy & Religion; Business & Economics; Unknown Classification; History & Auxiliary Sciences; Language, Linguistics & Literature]
A critical mass of retrospective literature in the humanities, social sciences
C. Malpas Cloud-sourcing Research Collections (OCLC, 2010)
17. An opportunity and a challenge
>50% of titles are ‘widely held’
An opportunity to rationalize holdings, but…
>80% of titles are in copyright
library print supply chain will be needed for some time
OCLC Research. June 2010
18. Mass-digitized books in print repositories
~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories
[Chart: Unique Titles (0 to 3,500,000) by month, Sep-09 to Jun-10. Mass digitized books in Hathi digital repository: ~3.5M titles. Mass digitized books in shared print repositories: ~2.5M]
19. Prediction
Within the next 5-10 years, focus of shared print archiving and
service provision will shift to monographic collections
• large scale service hubs will provide low-cost print
management on a subscription basis;
• reducing local expenditure on print operations, releasing
space for new uses and facilitating a redirection of library
resources;
• enabling rationalization of aggregate print collection and
renovation of library service portfolio
Mass digitization of retrospective print
collections will drive this transition
21. Shared Print provision: capacity varies
% of HathiTrust titles duplicated in print repository
OCLC Research. Analysis based on HathiTrust and WorldCat snapshot data. Data current as of February 2011.
22. Shared print marketplace: who has the edge?
C. Malpas Cloud-sourcing Research Collections (OCLC, 2010)
23. Or, reconfigure resource to maximize value
C. Malpas Cloud-sourcing Research Collections (OCLC, 2010)
24. Management Perspective: How Much is Enough?
Shared Print service must deliver
• Space recovery equal to “one floor” at outset
• Volume reduction equal to X years of print acquisitions
• Cost not to exceed current storage options
• Minimize (visible) disruption in operations
If management of mass-digitized monographs could be
externalized to large scale providers today:
average space recovery of 20,000 ASF per ARL library
cost avoidance of ~$1M for new storage module
cost avoidance of $1M per year for on-site management
25. Staff Perspective: What’s Good Enough
Shared Print service provision must equal or exceed
• Turnaround/delivery from local storage (<2 days)
• Local loan period
• Local access/availability guarantee, ability to recall etc
• Discoverability of local resource
Local retention mandated when title held by <10 libraries
No one mentioned . . .
• Home delivery option: direct to patron
• Acceptable loss rate: repository viability
• Penalties for late return: impact on other clients
26. Implications: Shared Print
A small number of repositories may suffice for ‘global’ shared
print provision of low-use monographs
Generic service offer is needed to achieve economies of
scale, build network; uniform T&C
Fuller disclosure of storage collections is needed to judge
capacity of current infrastructure, identify potential hubs
Service hubs will need to shape inventory to market needs;
more widely duplicated, moderately used titles
If extant providers aren’t motivated to change service model, a
new organization may be needed
28. ASERL in system-wide context
~880 academic libraries in ASERL region (2008)
• represents 23% of all academic libraries in the US
• 134 (15%) support institutions offering doctoral programs
38 ASERL libraries provide backbone for academic institutions
throughout the region
• Rich collections, robust infrastructure, reliable fulfillment
• ASERL holdings account for ~47% of regional academic collection
• Upholding print preservation mandate an increasing challenge
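The shares quoted on this slide can be checked against the counts given in the deck and its speaker notes. A minimal sketch, assuming the rounded counts as quoted (the variable names and the `percent` helper are illustrative and not part of the original analysis):

```python
# Counts quoted in the slides and speaker notes; names are illustrative only.
REGION_ACADEMIC_LIBRARIES = 880          # approximate, ASERL region, 2008
DOCTORAL_SUPPORTING_LIBRARIES = 134
ASERL_MEMBERS = 38
REGION_WORLDCAT_HOLDINGS = 119_013_523   # academic library holdings, 10 ASERL states
ASERL_WORLDCAT_HOLDINGS = 56_043_608


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


doctoral_share = percent(DOCTORAL_SUPPORTING_LIBRARIES, REGION_ACADEMIC_LIBRARIES)
member_share = percent(ASERL_MEMBERS, REGION_ACADEMIC_LIBRARIES)
holdings_share = percent(ASERL_WORLDCAT_HOLDINGS, REGION_WORLDCAT_HOLDINGS)

print(f"Doctoral-supporting libraries: {doctoral_share:.0f}% of regional libraries")
print(f"ASERL members: {member_share:.0f}% of regional libraries")
print(f"ASERL holdings: {holdings_share:.0f}% of regional holdings")
```

With these inputs the three shares round to 15%, 4% and 47%, which matches the figures cited for the region.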
29. Diversity of institutional mandates
Least reliant on
traditional library
infrastructure
OCLC Research. Derived from U.S. Department of Education, National Center for Education Statistics, Academic Libraries Survey, 2008.
30. Circulation per FTE student is on a decline
Declining ROA?
OCLC Research. Derived from NCES Academic Libraries Surveys, 1992-2000.
31. Same trend holds within ASERL
Median Circulation Transactions per FTE Student
in ASERL Member Libraries
[Chart: median circulation transactions per FTE student (0 to 25), 2002 to 2009; change over the period: -41%]
OCLC Research. Derived from ASERL Annual Statistics, 2002/2003 – 2009/2010.
32. A long term, system-wide trend
US Academic Library Expenditures
vs. Total Spending on Post-Secondary Education
[Chart: Aggregate US Spending on Post-Secondary Education (left axis, $0 to $400,000,000) and US Library Operating Exp. as % of Ed. Spending (right axis, 0.00% to 3.00%); callout: $6.8 billion in 2008]
OCLC Research. Derived from data reported in NCES Digest of Education Statistics: 2008.
34. Institutional autonomy varies
Modes of cooperation will vary
… as will motivation to share
OCLC Research. Derived from U.S. Department of Education, National Center for Education Statistics, Academic Libraries Survey, 2008.
35. Increasing privatization of Higher Education
OCLC Research. Derived from U.S. Department of Education, National Center for Education Statistics, Academic Libraries Surveys, 2000-2008.
36. Visible differences, hidden similarities
[Two charts, one point per ASERL member library (x-axis: OCLC symbol). Left: ASERL Member Holdings in WorldCat, Titles (0 to 4,000,000). Right: ASERL Member Holdings Duplicated in HathiTrust, Title Overlap (0% to 45%)]
>56M holdings in aggregate
~34% of collective ASERL coll’n duplicated
~2M unique (discrete) titles
OCLC Research. Analysis based on HathiTrust and WorldCat snapshot data. Data current as of April 2011.
37. Median ASERL duplication in HathiTrust: 33%
[Scatter chart: Titles Duplicated (0% to 45%) by Holdings in WorldCat (0 to 4,000,000)]
Tennessee: 41%
Florida: 27%
[Standard deviation: 3%]
OCLC Research. Analysis based on HathiTrust and WorldCat snapshot data. Data current as of April 2011.
Derived from U.S. Department of Education, National Center for Education Statistics, Academic Libraries Survey, 2008.
38. This edition held in print by more than 2,200 libraries . . . including all 38 ASERL members
A total of 3 ILL requests since 2007
0 from (or to) ASERL members
39. An example: the University of Miami
~1.2 million University of Miami (FQG) library holdings in WorldCat
30,472 titles Full View
363,405 titles Search Only
393,877 (33%) duplicated in HathiTrust Digital Library
OCLC Research. Analysis based on HathiTrust and WorldCat snapshot data. Data current as of April 2011.
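The figures on this slide are internally consistent, which can be confirmed with a few lines of arithmetic. A minimal sketch, assuming the rounded "~1.2 million" holdings figure as the denominator (the variable names are illustrative only):

```python
# Title counts from the slide; the holdings total is the rounded "~1.2 million".
FULL_VIEW_TITLES = 30_472
SEARCH_ONLY_TITLES = 363_405
MIAMI_HOLDINGS = 1_200_000  # assumption: rounded figure, not an exact count

# Titles duplicated in HathiTrust, in either access mode.
duplicated_titles = FULL_VIEW_TITLES + SEARCH_ONLY_TITLES

# Share of local holdings that is duplicated, and share of duplicates in full view.
duplication_rate = 100.0 * duplicated_titles / MIAMI_HOLDINGS
full_view_share = 100.0 * FULL_VIEW_TITLES / duplicated_titles

print(f"Duplicated titles: {duplicated_titles:,}")
print(f"Duplication rate: {duplication_rate:.0f}% of holdings")
print(f"Full View share of duplicated titles: {full_view_share:.1f}%")
```

The sum reproduces the 393,877 duplicated titles and the 33% duplication rate. It also shows that only a small fraction of the duplicated titles are available in full view, which is why a print supply chain is still needed.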
40. Weighing risks and benefits
System-wide Print Distribution of University of Miami Titles
Duplicated in HathiTrust Digital Library
96% of mass-digitized titles in Miami’s collection are held by >24 libraries
77% of mass-digitized titles in Miami’s collection are held by >99 libraries … low risk but print supply chain still needed
[Chart: Titles / Editions (0 to 70,000) by number of Holding Libraries, split into Search Only and Full View. N = 393,877 titles]
OCLC Research. Analysis based on HathiTrust and WorldCat snapshot data. Data current as of April 2011.
41. Sizing up a potential shared print supplier
~1.2 million Miami (FQG) holdings
[Chart: titles FUG could supply vs. titles FUG can't supply]
232,827 titles
Represents at least 2.75 miles of library shelving @ Miami
OCLC Research. Analysis based on HathiTrust and WorldCat snapshot data. Data current as of April 2011.
42. Risk and opportunity profiles differ
[Pie chart, N=370K titles: <10 libraries 0%; 10 to 24 libraries 1%; 25 to 99 libraries 9%; >99 libraries 90%]
[Pie chart, N=1.16M titles: <10 libraries 2%; 10 to 24 libraries 13%; 25 to 99 libraries 34%; >99 libraries 51%]
Locally held titles in mass-digitized corpus abundant in system-wide collection
HathiTrust undergirds stewardship mission, redistributes costs of curation
OCLC Research. Analysis based on HathiTrust and WorldCat snapshot data. Data current as of April 2011.
43. Stewardship & sustainability: a pragmatic view
Using recent life-cycle adjusted cost model* for library print collections,
$4.25 per volume per year, on campus
$0.86 per volume per year, in high-density storage
East Carolina University is spending, at minimum, between
$320K [= 373K titles * $0.86] and $1.6M [= 373K titles * $4.25] annually
to retain local copies of content preserved in the HathiTrust Digital Library
and widely-held in the ASERL community
The library is not financially accountable for these costs
but it is responsible for managing them
*Paul Courant and M. “Buzzy” Nielsen, “On the Cost of Keeping a Book” in The Idea of Order (CLIR, 2010)
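The cost range on this slide is a straightforward multiplication of the title count by the two per-volume rates. A minimal sketch, assuming (as the slide does) that each title corresponds to one volume; the function and variable names are illustrative only:

```python
# Per-volume annual costs quoted from the Courant and Nielsen model on the slide.
COST_ON_CAMPUS = 4.25       # dollars per volume per year
COST_HIGH_DENSITY = 0.86    # dollars per volume per year
TITLES_RETAINED = 373_000   # "373K titles"; assumes one volume per title


def annual_retention_cost(volumes: int, cost_per_volume: float) -> float:
    """Annual cost of retaining the given number of volumes at a flat rate."""
    return volumes * cost_per_volume


low_estimate = annual_retention_cost(TITLES_RETAINED, COST_HIGH_DENSITY)
high_estimate = annual_retention_cost(TITLES_RETAINED, COST_ON_CAMPUS)

print(f"High-density storage: ${low_estimate:,.0f} per year")
print(f"On campus: ${high_estimate:,.0f} per year")
```

The two results, about $320K and about $1.6M, bracket the range quoted above. Where a library falls within that range depends on how much of the duplicated material is already in high-density storage.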
44. Where to turn?
• Existing cooperative network: UNC system
• UNC, NCSU & Duke are HathiTrust partners, participate in
TRLN shared copy program – potential shared print suppliers?
~1.2 million ECU (ERE) holdings
373,370 (32%) in HathiTrust Digital Library
Represents at least 4 miles of library shelving @ East Carolina
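The shelving estimates for Miami (at least 2.75 miles for 232,827 titles) and East Carolina (at least 4 miles for 373,370 titles) can be compared by deriving an implied shelf width per title. The deck does not state the width it used, so the sketch below is an assumption-laden check rather than the original method, and all names are illustrative:

```python
FEET_PER_MILE = 5_280

# Figures quoted on the University of Miami slide.
MIAMI_TITLES = 232_827
MIAMI_SHELF_MILES = 2.75

# Title count quoted on the East Carolina slide.
ECU_TITLES = 373_370

# Implied average shelf width per title, derived from the Miami figures.
# Assumption: the same per-title width applies to both collections.
feet_per_title = MIAMI_SHELF_MILES * FEET_PER_MILE / MIAMI_TITLES
inches_per_title = feet_per_title * 12

# Apply the implied width to the East Carolina title count.
ecu_shelf_miles = ECU_TITLES * feet_per_title / FEET_PER_MILE

print(f"Implied width per title: {inches_per_title:.2f} inches")
print(f"Estimated ECU shelving: {ecu_shelf_miles:.1f} miles")
```

Under that assumption the Miami figures imply roughly three quarters of an inch per title, and the East Carolina estimate comes out somewhat above 4 miles, consistent with the "at least 4 miles" stated on the slide.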
45. ASERL libraries:
a common trajectory, different timelines
The next few years are critical
How can regional infrastructure be leveraged to support this change?
[Chart: % of titles duplicated in HathiTrust Digital Library (0% to 70%) for Private non-ARL, Public non-ARL, Public ARL and Private ARL libraries; dates marked on the chart: Sep 2012, Dec 2012, Jun 2013, Sep 2013]
OCLC Research. Analysis based on HathiTrust and WorldCat snapshot data. Data current as of April 2011.
46. A closing thought
If we don’t demonstrate a little backbone developing shared print solutions the future of legacy print could look like this
Guillotined books en route to recycling station.
47. Thanks for your attention.
Comments, Questions?
Constance Malpas
malpasc@oclc.org
@ConstanceM
48. For discussion
• What criteria matter most in assessing potential shared print
partners?
• Geographic proximity, institutional governance, scope of
collection, delivery guarantee, etc?
• Is the economic integration of Southeastern mega-region(s) a
factor to consider in shared print business planning?
• Are partnerships in zones of strong economic integration
likely to be more sustainable?
• How is the increasing privatization of higher education likely to
affect regional shared print planning?
• Do private and charter universities have greater flexibility in
externalizing print operations?
Editor's notes
An example of thinking big. Richard Florida: mega regions of economic integration. ASERL, ‘the largest regional consortium of research libraries’, actually encompasses two mega regions. Mega regions are on my mind for a couple of reasons. First, in the context of ‘cloud sourcing research collections’, there is the question of where and how regional service hubs are likely to emerge. For academic and research libraries, in particular, there is some question of whether the existing consortium and group purchasing cooperatives are situated at the right ‘scale’ to provide shared service solutions. We can think of mega regions as a factor that may create or constrain cooperative partnerships. For example, we have the interesting example of the Western Regional Storage Trust (WEST), which began as a cooperative print preservation effort among research libraries in CA, OR and AZ but now includes at least one institution in Illinois. That runs right up against another mega-region, which is served by different consortial organizations, notably the CIC, which encompasses institutions as far east as Penn State University. The CIC has a position of real prominence in the higher education community. It’s customary to joke that it represents the shared interests of the university football teams, but in fact I think the ‘mega regions’ framework suggests that it holds together for other reasons. Reasons that go far beyond academic peer groups, that are embedded in deeper social and economic relationships.
Mega regions instantiated in strategic plans for higher education, most recently in Jim Duderstadt’s report and recommendations on HE in the Midwest. [former president of the University of Michigan] Interestingly, he cites the CIC and HathiTrust as examples of core infrastructure that has enabled universities in the region to establish a world class reputation. This is a rare acknowledgment of the value of library cooperation in supporting the ‘business’ of education. There are echoes in Duderstadt’s master plan of an earlier master plan for HE in California. I was prompted to go back to this document recently and was interested to find that library cooperation was highlighted there too. So my exhortations to think big are really nothing new. The point here is that library cooperation is about more than improving local institutional efficiencies, it’s about supporting a whole ecology of knowledge production and economic growth. Differentiation: UC was to be the primary repository of scarce and unique resources. http://www.ucop.edu/acadinit/mastplan/MasterPlan1960.pdf
So, mega regions are one way to think about the way boundaries are established at the super institutional level. My colleagues Brian Lavoie and Lorcan Dempsey have proposed a framework for understanding the circumstances in which operations that used to be organized at the institutional scale (cataloging, for example) move outside the organizational boundaries of the academic institution. They characterize the library as a bundle of services that was for many years internalized by the university. Building a substantial local collection was one of these functions. In this framework, shared print is simply the latest expression of a shift in operational boundaries. Again, sourcing and scaling: it’s a question of where this work is optimally organized: across peer groups, regions, mega regions and so on.
Shared print does represent a Big Shift in thinking about library organizational models. It’s a shift that’s been getting quite a lot of attention of late, in the Chronicle, major library blogs and most recently Inside Higher Ed.
Shared print is not just a trend, it’s a response to a number of specific and increasingly urgent challenges facing academic libraries.
I want to distinguish between two different (but related) perspectives on shared print. Traditionally, shared print has been motivated by a desire to ensure the long-term preservation of scarce and unique resources. This is a noble goal, but it is not one that has succeeded in bringing many institutions to the table. There’s a second way in which cooperative print management delivers value, and that’s by enabling a reduction in redundant inventory and relieving library space pressures in an environment where scholarly communication is increasingly reliant on digital resources. My remarks today will focus on the second of these two value propositions.
This is the infrastructure we used to think about in the context of shared print provision. ASERL was a leader in promoting a vision of networked storage repositories, largely through the efforts of Paul Gherman here at Vanderbilt University.
ASERL storage study provided an early indication that cooperation between facilities is essential if they are to serve as a surrogate preservation resource.
Need to address a new kind of infrastructure – shared digital repositories. HathiTrust is a partnership of research libraries that have committed to joint curation of digitized library content. It includes many of the original Google Library partners (Michigan, UC, the CIC) and others including several (5) ASERL institutions. Triangle Research libraries plus Emory and UVa.
So this brings us to our recent cloud-sourcing project, which was a joint piece of work between OCLC Research, the HathiTrust and a number of academic research libraries – NYU, Columbia, Princeton and NYPL.
How big is this shift likely to be and on what timeline? Over the last year we have studied the mass digitized book corpus in the context of systemwide print holdings and have found that a substantial part of the average academic library collection is already duplicated. This scatter chart provides a simple but effective visualization of an important pattern that this project has revealed: that is, that the risks and opportunities associated with moving collection management ‘into the cloud’ are uniformly distributed across the research library community as a whole. This is a picture of the ARL membership (a microcosm of the larger research library community) that shows the level of duplication between individual library collections and the mass digitized book collection in Hathi. Over the course of this project, we have seen the rate of duplication between locally held print and mass digitized books increase steadily and significantly. In June of last year, an average of 20% of monographic titles in an academic library were duplicated in the Hathi repository; today that figure is about 30% (up to 40% for some institutions). [CLICK] In real terms, this means that the rate of digital replication is exceeding the pace of growth in monographic acquisitions in most academic institutions. We estimate that the rate of duplication has increased by about 8% per library in the past year. Monographic acquisitions typically grow at about 2% per year in research libraries. A very low standard deviation (variance of ~4%), and across the population very little movement outside this range: 2/3rds of ARL community falls within standard deviation.
We project that in a year’s time, many academic libraries are liable to find themselves “underwater,” holding a massive inventory of over-valued assets. Library directors will be called to account and expected to respond to questions about how an increasingly redundant local print collection is serving the educational and research mission of the parent institution. We need to be preparing for a world in which just-in-time, print on demand delivery is an option for a large share of the retrospective book collection.
This distribution has remained fairly stable, though in recent months we have seen a slight decline in the representation of history & auxiliary sciences.
Another major finding of our study is that the mass digitized book corpus is substantially ‘backed up’ in one or more large-scale storage collections. As I mentioned earlier, we have a very incomplete picture of what’s currently in storage, so this figure may actually be quite a bit higher. The figures here are based on just 5 major repositories. The important point is that we seem to have the beginnings of what I characterized earlier as a ‘strategic reserve’ of print that could significantly offset the costs of local operations. As you can see here, the proportion has remained relatively stable over the course of the past year. As of this month, about 2.5 million of the 3.5 million digitized books in Hathi are also held in one or more of 5 large scale shared print repositories.
27K feet = 5 miles. Remarkably, the largest shared print collections in the country appear inadequate to meet this goal.
10 copy threshold is probably excessive – reflects lack of confidence in potential shared print solutions. Also learned something about what isn’t ‘top of mind’ in thinking about shared print service agreements. To a certain degree, surprised that participants weren’t more demanding. I think this is a reflection of the fact that shared print is still in its infancy and is expected to run alongside of local print operations for some time to come.
Some general remarks about the ASERL community and the context in which it operates, both regionally and nationally.
Before we talk about ASERL as a community, worth saying something about the context in which ASERL operates. There are about 880 libraries serving post-secondary academic institutions in the 10 states in ASERL’s catchment area, representing nearly a quarter of the academic libraries in the United States. About 135 support institutions offering doctoral programs – the core requirement of ASERL membership. So right away we can see that ASERL itself, with its 38 members, represents a tiny fraction (about 4%) of the academic library community in Southeastern mega-region(s). Because academic research libraries tend to have very large collections, ASERL members hold a ‘disproportionate’ part of the aggregate library resource for this region. My back of the envelope calculation is something like 47%. But even for this cohort, which has traditionally embraced a stewardship role, the feasibility of continuing to acquire and retain comprehensive or even nearly comprehensive print collections is increasingly called into question. That is, of course, the reason that ‘shared print’ is the focus of today’s meeting. 38 ASERL libraries … 4% of total academic libraries in ASERL region. 119,013,523 acad lib holdings in WC for the 10 ASERL states. 56,043,608 ASERL library holdings. Upholding print preservation mandate on a local basis increasingly difficult.
This is an important feature of the organizational context in which ASERL operates. Majority of HE institutions in the region are not dependent on comprehensive print collections. Expectations are concentrated on relatively small, and arguably shrinking, population.
One reason local print preservation is an increasing challenge to justify is that collections are perceived to be delivering less value, as measured by use. This chart shows median annual per-student circulation rates for different segments of the academic library community. The actual numbers here are less important than the overall trend – which is on the decline in all sectors. The red line at the bottom represents libraries supporting doctoral programs, where circ is generally low (due in part to large collection size).
10 years ago, median circ per student at ASERL libraries was 20 transactions per year. Today, it’s about 12. This trend holds for the ASERL community in the aggregate, though there are variations from one institution to the next. It’s not clear if this overall trend can be changed – or even if it’s desirable to do so. After all, for the journal literature, the shift to digital provisioning has been widely embraced by the scholarly community and has resulted in operational efficiencies for academic libraries.
An optical illusion? This downward path looks very like the declining circulation rates in ARL libraries. This chart shows that while total institutional investment in higher education has increased dramatically in the past 30 years, proportional spending on academic libraries has been on a steady decline. If this trend continues, we can project that the university allocation to libraries will fall below 1% by about 2013. This has something to do with the increasing costs of educational infrastructure – spending on laboratories and technology has grown much more rapidly than spending on library infrastructure. So while library expenditures have increased each year, they represent a diminishing part of the university’s total spending in support of research, teaching and learning. This is a trend that is driving a certain amount of change in the academic library environment, encouraging a shift to collaborative sourcing of collections and services, increased attention to the return on library investment, and a stern focus on identifying and eliminating operational inefficiencies. I want to emphasize that the trend toward diminished support for academic libraries is not a new phenomenon and it is not merely a knock-on effect of regional or institutional economic pressures. It is a reflection of much broader changes in the higher education environment, including funding mandates that create incentives for increased institutional attention to science and engineering, a decline in the number of students pursuing advanced degrees in the humanities, and new models of educational provisioning, including distance learning, that are no longer reliant on locally-sourced collections or infrastructure.
That said, current circumstances do not tend toward additional largesse for academic libraries. “At least 43 states have implemented cuts to public colleges and universities and/or made large increases in college tuition to make up for insufficient state funding.” http://www.cbpp.org/cms/index.cfm?fa=view&id=1214 Ex: UNC is facing budget cuts of up to 20%.
Here’s another feature of the local context that is worth bearing in mind as we think about regional approaches to shared print. While the proportion of public and private HE institutions in the ASERL region looks much like the US as a whole, there is considerable variability from state to state. In Mississippi, Alabama, NC etc public institutions predominate. But in Virginia, Florida and here in Tennessee, a much greater part of the HE community is privately governed. As a result, I think we can anticipate that modes of cooperation will vary. This is not to say that private institutions are any more averse to sharing than public institutions, only that the incentives that will bring them to the table are likely to be somewhat different.
There’s another major trend that we should be paying attention to: the increasing privatization of higher education in the US. The greatest growth sector in education today is private, for-profit universities. As public spending on education is squeezed, more institutions are seeking to maximize their autonomy – witness the discussions in Wisconsin about separating the research intensive Madison campus from the rest of the system, and the proposal for charter universities in Ohio. Virginia embraced charter universities five years ago. And in the other ASERL states where the largest collections are located, there is a visible trend toward increasing privatization. It’s not clear what the impact of this trend on library cooperation and shared print service models is likely to be – but I believe this is something we need to be watching carefully.
Finally, the bit you’ve been waiting for – where we bring the cloud-sourcing model ‘down home’ to ASERL. ASERL represents a heterogeneous mix of research libraries, ranging in size from less than 500K titles to more than 3.5 million. In total, the collective collection of ASERL libraries amounts to more than 56 million holdings. Yet when we hold that collective collection up to the mirror that is the mass-digitized book corpus, we find remarkable similarities – echoing the pattern we observed for the ARL community as a whole. More than a third of the aggregate ASERL holdings are duplicated in the HathiTrust Digital Library. More than 2 million unique titles. That’s greater than the median individual holdings in ASERL member libraries. And as you can see, the line on the right is remarkably flat compared to the line on the left.
This is another view of ASERL holdings. Despite very wide differences in collection size, we find that all members fall within a narrow band of duplication when compared to the HathiTrust Digital Library. Median duplication is 33% as of April 2011. Tennessee has the highest level of duplication (41%) and Florida has the lowest. But between these two ‘extremes’ there is remarkably little variation in percentage duplication. This is important, since it means that the ASERL community as a whole is exposed to the same risk and conversely will benefit in equal measure from ‘cloud sourcing’ solutions. The fact that Florida is an outlier is also important, as we’ll see in just a moment.
Here’s a concrete example of the kind of content that we find duplicated in the HathiTrust library. This historical study of European politics is held by thousands of libraries. In the early days of the Google Library partnership it was scanned not once or twice but four times. In Google Books it is available as ‘snippet only’ but in Hathi it is presented as full-view. In April 2011 there were 328 titles like this (held by all 38 ASERL libraries), or 600 titles if Air University is excluded. It is hard to imagine circumstances in which this level of duplication within ASERL can be justified, except perhaps for popular fiction and coursebooks (which are not much represented in the Hathi collection).
Miami is an interesting example – it is not only representative of the median holdings in ASERL, and the median overlap between ASERL institutions and HathiTrust, but as a private institution it may have more latitude in the kinds of business agreements it can establish with external providers. (Why is the PD yield so low? Because PD titles tend to have fewer library holdings. The 20% ‘yield’ on HathiTrust doesn’t necessarily translate at the institutional level.)
Another way to look at the titles duplicated in HathiTrust – this time by the level of duplication in the systemwide print collection. A very large part of the U of Miami holdings duplicated in HathiTrust are also widely held in print. More than 70% are held by 100 or more libraries. More than 95% are held by 25 or more libraries. Of course, since virtually all of the titles are in copyright, it will be very important that Miami establish a viable print supply chain for these titles. Institutional risk tolerance will determine where libraries draw the line on how much print needs to be retained locally or regionally. In the case of Miami, I would say that the university library is in a position of enviable flexibility.
“Libraries of the University of Florida form the largest information resource system in the state of Florida.” Surprisingly, these data suggest that Florida may not be the optimal shared print supplier for mass-digitized titles. Of the almost 400K Miami-owned titles duplicated in the HathiTrust, about 60% could be supplied by the University of Florida. Yet the Univ of South Florida, which has a library collection a little more than half the size of U of F and is geographically closer to Miami, could supply almost 50%.
Both of these institutions have about a 33% duplication rate with Hathi. But the contours of that duplicated content are very different. At East Carolina, as at Miami, the overwhelming majority of content in the mass-digitized corpus is also very widely held in print. At Duke, however, a significant portion of the digitized content is relatively scarce. For an institution like Duke, the HathiTrust offers a different kind of value – it provides an additional layer of preservation for titles that Duke has purchased but may now want to manage differently than it has in the past. ECU is not a HathiTrust partner; Duke is (via TRLN).
This is where the rubber meets the road. I mentioned that there has been increased attention to the long-term costs of acquiring and retaining low-use print materials. This is especially true for retrospective print collections that have been digitized. One recent study by the Dean of Libraries at the University of Michigan suggests that it costs about $4.25 per volume per year to store a book on campus, and less than a third as much to manage it off-site. This means that ECU is currently spending between $320K and $1.6 million each year to retain copies of books that are preserved in the HathiTrust repository and also widely held by other ASERL members. The library is not accountable for these costs – they are not charged to the library budget – but is in some sense responsible for them.
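The arithmetic behind that range can be sketched roughly as follows. Only the $4.25 on-campus figure and the $320K and $1.6 million bounds come from the talk. The sketch assumes that the upper bound corresponds to keeping every duplicated volume on campus and the lower bound to managing every one off-site; the volume count and the off-site rate are back-calculated from those figures and are not reported ECU numbers.

```python
# Figures quoted in the talk.
ON_CAMPUS_COST_PER_VOLUME = 4.25   # dollars per volume per year
UPPER_ANNUAL_COST = 1_600_000      # dollars per year (assumed: all volumes on campus)
LOWER_ANNUAL_COST = 320_000        # dollars per year (assumed: all volumes off-site)


def implied_volumes(annual_cost: float, cost_per_volume: float) -> float:
    """Back-calculate how many volumes a given annual cost would cover."""
    return annual_cost / cost_per_volume


def implied_rate(annual_cost: float, volumes: float) -> float:
    """Back-calculate the per-volume rate implied by an annual cost."""
    return annual_cost / volumes


# Hypothetical derived values, not figures from the talk.
volumes = implied_volumes(UPPER_ANNUAL_COST, ON_CAMPUS_COST_PER_VOLUME)
offsite_rate = implied_rate(LOWER_ANNUAL_COST, volumes)

print(f"Implied duplicated volumes: {volumes:,.0f}")
print(f"Implied off-site cost per volume per year: ${offsite_rate:.2f}")
print(f"Off-site rate as a share of on-campus rate: "
      f"{offsite_rate / ON_CAMPUS_COST_PER_VOLUME:.0%}")
```

Under these assumptions the implied off-site rate comes out well under a third of the on-campus rate, which is consistent with the study as summarized above.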
As we look to the future, it is clear that the academic library environment as a whole is changing. Here I have plotted projections for the duplication of academic print collections in the HathiTrust Digital Library for a range of ASERL member libraries. The blue and violet lines at the top of the stack represent smaller academic institutions. We can predict that 50% or more of their library holdings will be duplicated within the next 18 months. In larger research institutions, that watershed moment will occur somewhat later. At very large ARL institutions, it may take another year or two before redundant print inventory begins to look less like an asset and more like a liability. But this change is coming, and we need to plan for it. (Institutions plotted: Wake Forest, UNC Charlotte, UNC Chapel Hill, Duke.)