Reshaping the Research Library: Some Observations on the Future of Academic Collections
1. Reorganizing the Research Library: a system-wide perspective
Carnegie Mellon University
26 January 2011
Constance Malpas
Program Officer, OCLC Research
2. OCLC Research: what we do
Supports global cooperative by providing internal data
and process analyses to inform enterprise service
development (R&D) and deploying collective research
capacity to deepen public understanding of the evolving
library system
Special focus on libraries in research institutions:
in US, libraries supporting doctoral-level education account for
<20% of academic libraries; >70% of library spending
changes in this sector impact library system as a whole;
collective preservation and access goals, shared infrastructure, &c.
3. OCLC Research: who we are
• ~45 FTE with offices in Ohio, California and the UK
• Sponsored by OCLC and a partnership of research libraries
around the world that share:
• A strong motivation to effect system-wide change
• A commitment to collaboration as a means of achieving collective gains
• A desire to engage internationally
• Senior management ready to provide leadership within the transnational
research library community
• Deep and rich collections and a mandate to make them accessible
• The capacity and the will to contribute
4. Our collaborators
Then:
• ARL set the tone; size matters
• Collections of distinction
• Doing the same, better
• Change is possible
Now:
• Nimble institutions, unburdened by legacy print mandate
• Distinctive purpose
• Transforming the portfolio
• Change is imperative
A new coalition is needed to advance the research library agenda
6. System-wide organization
Research theme addresses “big picture” questions about the
future of libraries in the network environment; implications
for collections, services, institutions embedded in complex
networks of collaboration, cooperation and exchange
• Characterization of the aggregate library resource
Collections, services, user behaviors, institutional profiles
• Re-organization of individual libraries in network context
Institutions adapting to changes in system-wide organization
• Re-organization of the library system in network context
‘Multi-institutional’ library framework, collective adaptation
7. Defining characteristics of SO activities
• Emphasis on analytic frameworks and heuristic models
that characterize (academic) library service environment
as a whole
• Identifying and interpreting patterns in distribution,
character, use and value of library resource; implications
for future organization of collections and services
• Provides context for decision-making, not prescriptive
judgments about a single, best course of action
• Shared understanding of how network environment is
transforming library organization on micro and macro level
8. Exemplar:
Re-organization of library system
• Externalization of print repository function facilitates
redirection of institutional resources; new scholarly record
• Cloud Library analysis (OCLC, Hathi, NYU, ReCAP)
• Case study in de-composition of library service bundle: “cloud
sourcing” research collections
• Data-mining Hathi and WorldCat to determine where cost-
effective reductions in print inventory can be achieved for
individual libraries (micro economic context)
• Characterizing optimal service profile for shared print/digital
service providers; collective market for service (macro
economic context)
• Exploring social and economic infrastructure requirements;
technical infrastructure a separate, secondary challenge
9. Prediction
Within the next 5-10 years, focus of shared print archiving
and service provision will shift to monographic collections
• large scale service hubs will provide low-cost print
management on a subscription basis;
• reducing local expenditure on print operations, releasing
space for new uses and facilitating a redirection of library
resources;
• enabling rationalization of aggregate print collection and
renovation of library service portfolio
Mass digitization of retrospective print
collections will drive this transition
10. A global change in the library environment
Academic print book collection already substantially duplicated in mass digitized book corpus
[Scatter chart: % of titles in local collection (0% to 60%) against rank in 2008 ARL Investment Index (0 to 120). Median duplication: 19% in June 2009; 31% in June 2010.]
11. Mass Digitized Books in Shared Repositories
~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories
[Chart: unique titles (0 to 3,500,000) by month, Sep-09 to Jun-10. Series: mass digitized books in Hathi digital repository (~3.5M titles); mass digitized books in shared print repositories (~2.5M).]
12. Shared Print Service Provision: Capacity Varies
[Chart: % of mass digitized corpus duplicated (0% to 80%) by month, Sep-09 to Jun-10. Series: union of 5 major shared print collections; Library of Congress; UC NRLF/SRLF; ReCAP; CRL.]
13. Carnegie Mellon University Library Collections
Optimizing print holdings . . .
• ~ 700,000 CMU holdings in WorldCat (PMC)
Cf. 1.2M vols.; are WorldCat holdings up to date?
• ~240,000 titles held by CMU (PMC) replicated in mass-
digitized book collection
~16,000 (6%) in the public domain
• >190,000 mass-digitized titles held by CMU also held by PSU
Shared print agreement feasible?
14. 35% of titles held in CMU Libraries are duplicated in the HathiTrust Digital Library
~700K Carnegie Mellon University (PMC) holdings in WorldCat
~243K duplicated in HathiTrust Digital Library:
15,785 titles Full View
227,729 titles Limited View
Represents ~$1M in annual operating costs
OCLC Research. Analysis based on HathiTrust and WorldCat snapshots. Data current as of December 2010.
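The headline percentages on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the title counts printed on the slide, treats the approximate "~700K" holdings figure as exactly 700,000 (an assumption), and the function and variable names are hypothetical.

```python
# Figures taken from the slide; the holdings count is only given as "~700K",
# so 700_000 is an assumed round number, not an exact value.
CMU_HOLDINGS = 700_000
FULL_VIEW_TITLES = 15_785       # public domain, Full View in HathiTrust
LIMITED_VIEW_TITLES = 227_729   # in copyright, Limited View in HathiTrust


def duplication_summary(holdings, full_view, limited_view):
    """Return the duplicated title count and two derived shares."""
    duplicated = full_view + limited_view
    return {
        "duplicated": duplicated,
        "share_of_holdings": duplicated / holdings,
        "full_view_share": full_view / duplicated,
    }


summary = duplication_summary(CMU_HOLDINGS, FULL_VIEW_TITLES, LIMITED_VIEW_TITLES)
print(f"Duplicated titles: {summary['duplicated']:,}")
print(f"Share of holdings duplicated: {summary['share_of_holdings']:.0%}")
print(f"Full View share of duplicated titles: {summary['full_view_share']:.0%}")
```

Under that assumption the duplicated share rounds to the 35% in the slide title, and the Full View share rounds to the 6% public-domain figure quoted on the previous slide.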
15. System-wide print distribution of CMU-owned
titles duplicated in HathiTrust Digital Library
89% of titles represent very low
preservation risk; suitable for
withdrawal, shared print agreement?
[Chart; axis annotation: decreasing preservation risk]
16. Subject distribution of CMU-owned titles duplicated in HathiTrust Digital Library
Represents 2.8 miles of library shelving; <1000 feet if limited to public domain
Public domain… low risk, limited return
[Bar chart: titles / editions (0 to 60,000) by subject, split into public domain and in copyright. Subjects, top to bottom: Communicable Diseases & Misc.; Unclassified; Health Facilities; Physical Education & Recreation; Medicine By Body System; Agriculture; Medicine; Anthropology; Preclinical Sciences; Medicine By Discipline; Geography & Earth Sciences; Biological Sciences; Law; Psychology; Health Professions & Public Health; Government Documents; Chemistry; Education; Performing Arts; Computer Science; Library Science; Mathematics; Philosophy & Religion; Political Science; Sociology; Physical Sciences; Music; Engineering & Technology; Business & Economics; Art & Architecture; History & Auxiliary Sciences; Language, Literature, Linguistics.]
17. Maximize benefit, minimize risk
Risk Level | Strategy (relegate based on …) | Titles PD | Titles IC | Linear Feet Min | Linear Feet Max | Offsite $ (p/a) Min | Offsite $ (p/a) Max
Highest | Hathi | 227,729 | 15,785 | 14,233 | 15,220 | $195,847 | $209,422
High | Hathi & total WC holdings >24 | 15,302 | 225,687 | 956 | 15,062 | $13,160 | $207,251
Moderate | Penn State without agreement | 9,101 | 182,142 | 569 | 11,953 | $7,827 | $164,469
Lower | Penn State without agreement & holdings >24 | 9,073 | 182,026 | 567 | 11,944 | $7,803 | $164,345
Low | Penn State with service agreement | 9,101 | 182,142 | 569 | 11,953 | $7,827 | $164,469
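The linear-feet and offsite-cost columns appear to follow directly from the title counts. The sketch below reproduces them under two assumptions that are back-calculated from the table rows and not stated on the slide: roughly 16 titles per linear foot of shelving, and roughly $0.86 per title per year for offsite storage. The names used are hypothetical.

```python
# Both constants are ASSUMPTIONS inferred from the table's own numbers;
# the slide does not state the conversion factors it used.
TITLES_PER_LINEAR_FOOT = 16
OFFSITE_COST_PER_TITLE = 0.86  # USD per title per year


def shelf_and_cost(titles):
    """Return (linear feet, annual offsite cost in USD), both rounded."""
    feet = titles / TITLES_PER_LINEAR_FOOT
    cost = titles * OFFSITE_COST_PER_TITLE
    return round(feet), round(cost)


# (first titles column, second titles column) as printed in the table.
rows = {
    "Highest": (227_729, 15_785),
    "High": (15_302, 225_687),
    "Moderate": (9_101, 182_142),
    "Lower": (9_073, 182_026),
    "Low": (9_101, 182_142),
}

for level, (first, second) in rows.items():
    # Min uses the first column alone; Max uses both columns together.
    min_feet, min_cost = shelf_and_cost(first)
    max_feet, max_cost = shelf_and_cost(first + second)
    print(f"{level:9s} feet {min_feet:>6,} to {max_feet:>6,}   "
          f"cost ${min_cost:>7,} to ${max_cost:>7,}")
```

With these two factors the Min values correspond to the first titles column alone and the Max values to both columns combined, which is one plausible reading of how the range was built.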
18. Academic libraries in the Keystone State:
a common trajectory, different timelines
The next few years are critical
[Chart: projected duplication over time; marked dates: Jul '11, Nov '11, Aug '12, Aug '13]
OCLC Research. Projection based on HathiTrust and WorldCat snapshot data, Jun 2009 – Dec 2010.
19. For discussion
• What is the function of local print collection in long-term
library strategy?
• Is selective externalization of print management functions
to Penn State or another potential provider an option?
• Can faculty be persuaded that shared print strategy is
sound?
• How soon does change need to happen?
Editor's notes
With that as background, I’d like to offer a prediction about the future of shared print: our attention will begin to shift to pooled management of the retrospective print book collection. With this shift, I think we will see the emergence of a relatively small number of larger service hubs providing just-in-time delivery and long-term preservation services on a subscription basis. Individual academic libraries will contract with those service providers because they offer a cost-efficient alternative to local operations and, more importantly, because they allow the library to redirect its attention and resources to renovating its service portfolio. As a result, I think we will see a progressive rationalization of the system-wide print book collection. I believe mass digitization of retrospective print collections will be a primary driver in this transition, preceding a broader shift to commercial provisioning of e-books.
How big is this shift likely to be and on what timeline? Over the last year we have studied the mass digitized book corpus in the context of system-wide print holdings and have found that a substantial part of the average academic library collection is already duplicated. This scatter chart provides a simple but effective visualization of an important pattern that this project has revealed: the risks and opportunities associated with moving collection management ‘into the cloud’ are uniformly distributed across the research library community as a whole. [CLICK] This is a picture of the ARL membership (a microcosm of the larger research library community) that shows the level of duplication between individual library collections and the mass digitized book collection in Hathi. Over the course of this project, we have seen the rate of duplication between locally held print and mass digitized books increase steadily and significantly. In June of last year, an average of 20% of monographic titles in an academic library were duplicated in the Hathi repository; today that figure is about 30% (up to 40% for some institutions). [CLICK] In real terms, this means that the rate of digital replication is exceeding the pace of growth in monographic acquisitions in most academic institutions. We estimate that the rate of duplication has increased by about 8% per library in the past year. Monographic acquisitions typically grow at about 2% per year in research libraries. The standard deviation is very low (variance of ~4%), with very little movement outside this range across the population: two-thirds of the ARL community falls within one standard deviation.
[CLICK] We project that in a year’s time, many academic libraries are liable to find themselves “underwater,” holding a massive inventory of over-valued assets. Library directors will be called to account and expected to respond to questions about how an increasingly redundant local print collection is serving the educational and research mission of the parent institution. We need to be preparing for a world in which just-in-time, print-on-demand delivery is an option for a large share of the retrospective book collection.
Another major finding of our study is that the mass digitized book corpus is substantially ‘backed up’ in one or more large-scale storage collections. As I mentioned earlier, we have a very incomplete picture of what’s currently in storage, so this figure may actually be quite a bit higher. The figures here are based on just 5 major repositories. The important point is that we seem to have the beginnings of what I characterized earlier as a ‘strategic reserve’ of print that could significantly offset the costs of local operations. As you can see here, the proportion has remained relatively stable over the course of the past year. As of this month, about 2.5 million of the 3.5 million digitized books in Hathi are also held in one or more of 5 large-scale shared print repositories.
This is a picture of how the potential value of individual print storage collections has evolved in the past year, as the mass digitized corpus in Hathi has grown. Currently, about 75% of the mass digitized book collection is ‘backed up’ in one or more of the 5 large print storage collections that we have examined. I want to say a little bit about what’s causing this increase in shared print coverage, from just over 60% a year ago to almost 75% today. Much of the change is associated with the increasing visibility of individual storage repositories (institutional disclosure) and with the increasing visibility of holdings in those repositories. A few key observations: collectively, a small number of shared print repositories provide substantial coverage of the mass digitized book collection. We don’t need many libraries to tuck away inventory to ‘back up’ the digitized resource, and it may be counterproductive to do so. If you had to pick a single surrogate print supplier, it would be LC, whose collections substantially duplicate the corpus of mass digitized books. But it is not obvious that LC can or should assume this role. Finally, and perhaps most importantly, the net increase in coverage that we have seen in the last 12 months is due in large part to the increasing visibility of storage holdings at these repositories. For example, the big bump we see in UC Regional Library Facilities holdings happened in October last year, when the NRLF holdings in Richmond, CA became visible under a distinctive library symbol in WorldCat. The visibility of ReCAP holdings increased when we enriched the holdings data (which are external to WorldCat) with OCLC numbers. I’m not sure what happened to increase the visibility of the LC holdings, but I’d guess it has to do with a batch process in WorldCat. I want to emphasize that without better and more comprehensive disclosure of storage holdings, it is very difficult to assess the carrying capacity of existing print preservation infrastructure.
As we look to the future, it is clear that the academic library environment as a whole is changing. Here I have plotted projections for the duplication of academic print collections in the HathiTrust Digital Library for a range of academic libraries in the state of Pennsylvania. The blue and violet lines at the top of the stack represent smaller academic institutions. We predict that 50% of their library holdings will be duplicated within the coming year. At research-intensive institutions, that watershed moment will occur somewhat later. At the largest research libraries, it may take another year or two before redundant print inventory begins to look less like an asset and more like a liability. But this change is coming, and we need to plan for it.