1. Smita Chandra
Librarian
Indian Institute of Geomagnetism
smitac@iigs.iigm.res.in
2. What is a Repository?
Open access digital archive on open source software
A managed, persistent way of making research, learning and teaching
content with continuing value both discoverable and accessible
Repositories can be subject or institutional in their focus
Putting content into an institutional repository enables staff and
institutions to manage and preserve it, and therefore derive maximum
value from it
Repositories can support research, learning, and administrative
processes. They are commonly used for open access research outputs
3. What is an institutional repository?
Clifford Lynch, Executive Director, Coalition for Networked Information,
stated
“In my view, a university-based institutional repository is a set of
services that a university offers to the members of
its community for the management and dissemination of digital
materials created by the institution and its community members. It is
most essentially an organizational commitment to the stewardship of
these digital materials, including long-term preservation where
appropriate, as well as organization and access or distribution.”
ARL: A Bimonthly Report, no. 226 (February 2003)
Institutional Repositories: Essential Infrastructure for Scholarship in the
Digital Age
http://www.arl.org/resources/pubs/br/br226/br226ir.shtml
4. Open Access Institutional Repositories
What is open access (OA)?
Many definitions – a report from the Joint Information Systems
Committee (JISC) in the UK of 2006 stated:
The Open Access research literature is composed of free, online
copies of peer-reviewed journal articles and conference papers as
well as technical reports, theses and working papers. In most
cases there are no licensing restrictions on their use by
readers. They can therefore be used freely for research, teaching
and other purposes.
(http://www.jisc.ac.uk/publications/publications/pub_openaccess_v2.aspx)
An open access institutional repository is a repository whose
contents are freely available for use.
5. What OA is not
There are various misunderstandings about Open
Access. It is not self-publishing, nor a way to bypass
peer-review and publication, nor is it a kind of second-
class, cut-price publishing route. It is simply a means
to make research results freely available online to
the whole research community.
http://www.jisc.ac.uk/publications/briefingpapers/2006/pub_openaccess_v2.aspx
6. Gold and Green OA publishing
Gold OA - uses a funding model that does not charge
readers or their institutions for access e.g. Ariadne, D-
Lib Magazine and First Monday
Green OA - authors publish papers in one of the 25,000
or so refereed journals in all disciplines and then self-
archive these papers in open
access/digital/institutional repositories.
7. Institutional Repositories are:
Centered around a university (or other academic institution) and
contain items which are the scholarly output of that
institution
A collection of (digital) objects, in a variety of formats
Include works of various degrees of scholarly authority and
from various stages in the process of scholarly inquiry. In
addition to published works, an IR may include preprints, theses
& dissertations, images, data sets, working papers, course
material, or anything else a contributor deposits
Typically motivated by a commitment to open access
8. Institutional Repositories
Institutions are logical implementers of repositories
because they can take responsibility for:
– Centralising a distributed activity
– Framework and Infrastructure
– Permanence that can sustain changes
– Stewardship of Digital assets
– Preservation policy for long term access
– Provide central digital showcase for the research,
teaching and scholarship of the institution
9. IRs & Digital Libraries
Institutional Repositories
Are organized around a particular institutional community
Often are dependent upon the voluntary contribution of materials by
scholars for the content in their collection
Are mainly repositories and therefore may only offer limited user
services
Digital Libraries
May be built around any number of organizing principles (often topic,
subject, or discipline)
Are the product of a deliberate collection development policy
Typically include an important service aspect (reference and research
assistance, interpretive content, or special resources)
10. How does IR content differ from
other digital collections?
Content is deposited in a repository – by content
creator, owner etc.
Repository architecture manages the content and
the metadata
Repository software offers a minimum set of basic
services – put, get, search
Repository must be sustainable, trusted, well-
supported and well-managed
Heery, R. and Anderson, S. (2005) Digital Repositories
Review. UKOLN and AHDS. Available at:
http://www.jisc.ac.uk/uploaded_documents/digital-repositories-review-2005.pdf
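The minimum service set named above (put, get, search) can be sketched in a few lines. This is only an illustration under stated assumptions: the class name, the identifier scheme and the in-memory storage are hypothetical and are not taken from any particular repository software.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A deposited object: the content plus its descriptive metadata."""
    content: bytes
    metadata: dict


@dataclass
class Repository:
    """Toy in-memory repository exposing put, get and search (hypothetical)."""
    _items: dict = field(default_factory=dict)
    _next_id: int = 1

    def put(self, content: bytes, metadata: dict) -> str:
        """Deposit content with its metadata and return a new identifier."""
        item_id = f"item-{self._next_id}"
        self._next_id += 1
        self._items[item_id] = Item(content, dict(metadata))
        return item_id

    def get(self, item_id: str) -> Item:
        """Retrieve a deposited item by identifier (KeyError if unknown)."""
        return self._items[item_id]

    def search(self, term: str) -> list:
        """Return identifiers of items whose metadata values mention the term."""
        needle = term.lower()
        return [
            item_id
            for item_id, item in self._items.items()
            if any(needle in str(value).lower() for value in item.metadata.values())
        ]


repo = Repository()
first = repo.put(b"%PDF-...", {"title": "Geomagnetic storms", "creator": "A. Author"})
second = repo.put(b"%PDF-...", {"title": "Ionospheric currents", "creator": "B. Author"})
print(repo.search("storms"))  # ['item-1']
```

A real repository adds the other requirements listed on this slide (sustainability, trust, support, management), which no code sketch can supply.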
11. Origins & Development
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
Digital Library
12. Why? – university view
An institutional repository is a tangible indicator
of research output of a university – thus increasing
its visibility, prestige and public value
Repository content is readily searchable – both
locally and globally
Can be used as a marketing tool for the institution
Allows an institution to manage its Intellectual
Property Rights appropriately
13. Why? – funder’s view
Funders see improved access to, and wider
dissemination of research
For example, in the UK the eight research
councils have adopted policies mandating that
results from their tax-payer funded research be
‘open’, available and accessible to all via IRs or
similar subject repositories
e.g. Economic and Social Research Council
http://www.esrc.ac.uk/_images/Full_text_decision_tree_tcm8-4138.pdf/
14. IRs can be used for:
Scholarly communication
Storing learning materials and coursework
Managing collections of research documents
Preserving digital materials for the long term
Knowledge management
Electronic publishing
Research assessment exercise
Collaboration tool
15. Benefits of setting up an institutional
repository
For researchers
Showcase your institute’s output
Increases citations for authors
24-hour access through any web-enabled device
Life’s work in one location
Satisfies funders’ mandates
Persistent URLs
For librarians
Provides new ways for archiving & preserving valuable work
Time-saving and cost-effective
Help to identify trends
Reduce duplication of records
16. More Benefits
For the university
An effective marketing tool
Increase the visibility, reputation and prestige
Greater interdisciplinary research
Enhanced funding
Facilitates gathering data such as publications for Assessments
For the global community
Free access to scholarly information
Taxpayers fund a large amount of scientific research
Developing countries
Increase public knowledge
Gain access to a wide variety of materials
23. Publication and Deposition
Author writes paper
Submits to journal and deposits in e-print repository
Paper is refereed
Revised by author
Author submits final version
25. What type of content can be deposited
in an Institutional Repository?
Faculty
Pre-prints, post-prints, research findings, working papers, technical
reports, conference papers
Multimedia, videos, teaching materials, learning objects
Data sets (scientific, demographic, etc.) and other ancillary research
material
Web-based presentations, exhibits, etc.
Students
Theses and dissertations
Projects and portfolios
Awarded research
Performances and recitals
26. Starting & Maintaining an IR
Steps to Building an IR
1. Justify the relevance to the institution and
contributors
2. Develop a policy framework. How will we find this
content and what will we do with it?
3. Build the infrastructure
Bonus: Get institutional support and a mandate.
27. Starting & Maintaining an IR
IR Technology
IR software (Open Source/Commercial)
OAI-PMH harvesting protocol/software (Free)
Intel/Pentium servers for IR
Linux/Red Hat OS, MySQL/PostgreSQL DBMS,
Apache/Tomcat web server, Perl/Java (Free)
28. Starting & Maintaining an IR
Core issues
• Policy Decisions
• Organizational Issues
• Cultural Issues
29. Starting & Maintaining an IR
Policy decisions
• Scope : Reinforce the repository’s active support for the
institution’s mission, values and goals
- Identify/build a context in which the repository is necessary
- Multidiscipline / single subject /Entire research output
/database for each functional unit
• Types of documents
- Single database for all document types / a separate one for each type
• Software: OSS such as DSpace or GNU EPrints, or develop own
• Research Deposit Types: Thesis, Journal articles,
Preprints, Reports, Conference papers, Book Chapter, etc
• Resources: Human, IT, Funding
• Stakeholders: Library, Each Department, Institute as a whole
• Services : Focus on building services not collections
30. Starting & Maintaining an IR
Management and Organizational Issues
• Deposit options
- Researcher self-deposit and/or assisted deposit
- Metadata quality
- Ensuring quality and rich metadata is labour intensive
• Digitization: Born digital / Scanning
• File formats: Accept all, Only PDF and/or other, Conversion
• Database scope: full text only and/or bibliographic records
• Copyright: SHERPA/RoMEO publisher copyright policies
• Quality assurance: Peer review, Editing
• Deposit Agreement and Use Agreement
- Depositor’s declaration: Non-exclusive license - Copyright/Patent/Trademarks
- Repository’s rights and responsibilities: Distribute, Store, Migrate, Copy, Rearrange, Remove
- Use Agreement: Copy, Distribute, Display, Share, Author credit
31. Starting & Maintaining an IR
Cultural Issues
• Advocacy
- Sensitive to organizational culture and background
- Community size
- Strategy: stakeholders, management committees
• Copyright
- Concern of researchers, Legal department
• Positioning
- Library/Institute Website
32. Starting & Maintaining an IR
Key Issues:
• Faculty buy-in
• Submission policies
• Copyright issues
• Deposit types
• Metadata
• OAI-PMH compliant systems
• Specialized staff
• Outreach and Liaison services
33. Obstacles to building a repository in-house
Open source institutional repository software is free to acquire but
expensive to implement
Delays due to slow response times from over-burdened IT services
Lack of personnel with the correct skills
Projects often go on for much longer than necessary
Other priorities can crop up unexpectedly and divert resources away
from the repository project
34. Four Widely Used Systems
Digital Commons: Produced by Berkeley Electronic Press (bepress), focused on
maintaining scholarly output. Not open source.
EPrints: Developed at the University of Southampton (UK). Widely considered to
be the least complex of the major repository software platforms.
Fedora: Developed at Cornell and University of Virginia. Takes its name from
the Flexible Extensible Digital Object Repository Architecture on which it is
based.
DSpace: Designed by MIT and Hewlett-Packard to manage the intellectual
output of research institutions and provide for long-term preservation.
35. Subject/Discipline Based Repositories
Definition: Subject repositories are archives
which collect and manage material
relating to one or more related subject
areas. A number currently exist, mainly
within science subjects.
Subject repositories are often managed by an individual
for a group
36. Subject/Discipline Based Repositories
Relies on peer interaction – no mandate
Individual agreements have to be struck
No definitive boundaries
Quality control issues
Sustainability issues
Transitory – collection at risk
Responsibility for preservation
Issues over the return on the money and effort
invested
37. Subject/Discipline Based Repositories
Significant subject repositories, many of which use EPrints or DSpace
software, include:
arXiv - http://www.arxiv.cornell.edu/ (physics, mathematics,
non-linear science and computer science)
Cogprints - http://cogprints.ecs.soton.ac.uk/ (Cognitive sciences
including psychology, neuroscience, linguistics and other related areas)
CiteSeer - http://citeseer.nj.nec.com/cs (computer science)
HTP Prints - http://htpprints.yorku.ca/ (History and theory of psychology)
PubMedCentral - http://www.pubmedcentral.nih.gov/ (US National
Library of Medicine's digital archive of life sciences journal literature)
PhilSci Archive - http://philsci-archive.pitt.edu/ (philosophy of science)
E-LIS - http://eprints.rclis.org/ (library and information science)
RePEc (Research Papers in Economics)
38. How Does an IR Work?
The Open Archival Information System (OAIS)
39. How Does an IR Work?
Submission and Ingestion
contributor metadata
formatting
Copyright
Post-Submission
quality metadata (DC)
Intellectual Property issues
User Query
Ongoing workflows
Preservation
Administration
Data Management
System customization
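The metadata check at submission time can be sketched as follows. The fifteen element names are those of simple Dublin Core; the list of required elements is a hypothetical local policy chosen only for illustration, and the function name is an assumption.

```python
# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical local policy: elements every deposit must supply.
REQUIRED = ("title", "creator", "date", "type", "rights")


def check_submission(metadata: dict) -> list:
    """Return a list of problems found in a submitted metadata record."""
    problems = []
    # Required elements must be present and non-blank.
    for name in REQUIRED:
        if not str(metadata.get(name, "")).strip():
            problems.append(f"missing required element: {name}")
    # Anything outside the fifteen elements is flagged for the reviewer.
    for name in metadata:
        if name not in DC_ELEMENTS:
            problems.append(f"not a simple Dublin Core element: {name}")
    return problems


record = {
    "title": "A sample deposited paper",
    "creator": "Author, A.",
    "date": "2010",
    "keywords": "geomagnetism",
}
for problem in check_submission(record):
    print(problem)
```

A check like this only catches missing or misnamed fields; judging whether the metadata is rich and accurate remains manual post-submission work.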
40. OpenDOAR – Directory of Open Access
Repositories
The OpenDOAR service provides a quality-assured
listing of open access repositories around the
world. OpenDOAR staff harvest and assign
metadata to allow categorisation and analysis to
assist the wider use and exploitation of
repositories. Each of the repositories has been
visited by OpenDOAR staff to ensure a high degree
of quality and consistency in the information
provided. OpenDOAR is maintained by SHERPA
consortium staff at the University of Nottingham,
UK
http://www.opendoar.org/about.html
43. Benefits in depositing material
Increase in citations, impact and usage (useful for
research evaluations such as the Research Excellence
Framework planned in the UK for 2013)
Increase in public research profile – both for the
individual as well as the institution
Preservation of research outputs from the institution
46. ROAR – Registry of Open Access
Repositories
Aims to monitor overall growth in the number of eprint
archives and to maintain a list of GNU EPrints sites
(http://roar.eprints.org)
Hosted at the University of Southampton, UK
Data gathered automatically via OAI-PMH
Also ROAR Material Archiving Policies (ROARMAP):
163 institutional repositories (including National Institute of
Technology, Rourkela and Bharathidasan University in India)
(http://roarmap.eprints.org)
48. Other ‘overviews’ of IRs
Repository66 – a mash-up by Stuart Lewis formerly of
Aberystwyth, now at Auckland University, New
Zealand based on OpenDOAR and ROAR
(http://maps.repository66.org/)
World ranking of institutional repositories
(http://repositories.webometrics.info/about_rank.html)
51. Repository architecture
Largely institutional focus though some exceptions –
arXiv, COGPRINTS, etc
Interoperability through centralized aggregators
(national and global)
Search services (OAIster, Intute, …)
Registries (DOAR, ROAR, …)
Harvesting metadata about content using OAI-PMH
(metadata = simple Dublin Core)
Content = PDF
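Harvesting simple Dublin Core over OAI-PMH can be sketched as below. The `ListRecords` verb, the `oai_dc` metadata prefix and the three XML namespaces come from the OAI-PMH and Dublin Core specifications; the base URL, the sample response and the function names are hypothetical. The sketch parses an inline sample rather than contacting a server, and it ignores resumption tokens and error responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str) -> str:
    """Build a ListRecords request asking for simple Dublin Core."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    return f"{base_url}?{query}"


def parse_records(xml_text: str) -> list:
    """Extract identifier, titles and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in record.iterfind(".//dc:creator", NS)],
        })
    return records


# Hypothetical response, trimmed to the parts this sketch reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:123</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample deposited paper</dc:title>
          <dc:creator>Author, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai"))
print(parse_records(SAMPLE))
```

Aggregators and search services run requests like this against many repositories and index the returned metadata, which is why a repository only needs to expose the protocol, not join a central database.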
52. Constraints of IR
Absence of a well defined institutional policy
Lack of IR expertise in India
Insufficient funds for IT Infrastructure and
manpower
Apathy of authors towards time consuming and
lengthy deposition procedure.
Ignorance of users in the absence of appropriate
literacy program
53. Constraints of IR (Contd…)
Publishers’ rigid attitude towards copyright policy
Customization of open source software is a bottleneck
Nature of content: Classified/restricted and
Unclassified/Open
Diversity of content and the language used in the
full texts
Relying on unproven methods for long term digital
preservation.
58. Digital Preservation in IRs
Importance of Digital Information
Preservation
1975 – Two Viking space probes sent to Mars by USA.
Data generated by unrepeatable mission cost $1 billion.
Recorded data on magnetic tapes was corrupted /
unidentifiable after 2 decades despite being kept in climate
controlled environment.
Scientists could not access data, unable to decode the
formats used.
59. Importance of Digital Information
Preservation
Original format developers not alive.
Finally old printouts tracked and retyped.
NASA therefore is the biggest supporter of Digital
Preservation Projects.
This illustrates the wide gap between information generation and its
management.
60. Threats
Media decay and failure
Massive storage failures, outdated media
Access component obsolescence
Outdated formats, applications & systems
Human and software errors & external events
61. Information Deluge
Present & Future Projections
Yawning gap between
Our ability to create digital information
Our infrastructure and capacity to manage and
preserve it over time
Cumulative effect foreseen as future “digital dark
ages”
62. Need for Digital Preservation
preserving natural/cultural heritages
for promoting academic research
enabling public access to legacy collections
63. IRs and Digital Preservation
An IR is a model for a preservation system
It requires “most essentially an organizational commitment to the
stewardship of … digital materials, including long-term
preservation where appropriate, as well as organization and access or
distribution”
Attributes of a “Trusted Digital Repository”
“…an organisation that has responsibility for the long-term
maintenance of digital resources, as well as making them available
[through time and across changing technologies] to communities
agreed on by the depositor and the repository.”
Research Libraries Group
http://www.rlg.org/longterm/attributes01.pdf
64. Definition: Digital Preservation
The maintenance of digital materials over the long-term
with a view to ensuring their continued accessibility. It
ensures that the digital resources are stored correctly
and maintained adequately in the online world, such
that they are available consistently for use over time.
“Long-term” includes timescales of decades or even centuries
65. Preservation Strategies
Technology preservation
Keep the hardware alive
Technology emulation
Create an environment to be able to run the existing
software
Data migration
Convert data to new formats to run in new applications
66. Open Archival Information System
(OAIS)
SIP = Submission Information Package
AIP = Archival Information Package
DIP = Dissemination Information Package
Published by the Consultative Committee for Space Data Systems
(CCSDS), 2002; ISO 14721:2003 standard
An archive consists of an organization of people and systems with
responsibility to preserve information and make it available to
users.
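The flow between the three package types can be sketched as below. OAIS is a reference model and does not prescribe any data structure, so everything here is an assumption made for illustration: the class fields, the function names, and the use of a SHA-256 checksum as the one piece of preservation information added at ingest.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SIP:
    """Submission Information Package: what the producer hands over."""
    content: bytes
    metadata: dict


@dataclass
class AIP:
    """Archival Information Package: what the archive stores."""
    content: bytes
    metadata: dict
    sha256: str  # fixity value recorded at ingest (illustrative choice)


@dataclass
class DIP:
    """Dissemination Information Package: what a consumer receives."""
    content: bytes
    metadata: dict


def ingest(sip: SIP) -> AIP:
    """Turn a submission into an archival package, recording a checksum."""
    digest = hashlib.sha256(sip.content).hexdigest()
    return AIP(sip.content, dict(sip.metadata), digest)


def disseminate(aip: AIP) -> DIP:
    """Verify fixity, then build the package delivered to the user."""
    if hashlib.sha256(aip.content).hexdigest() != aip.sha256:
        raise ValueError("fixity check failed: stored content has changed")
    return DIP(aip.content, dict(aip.metadata))


aip = ingest(SIP(b"observatory data", {"title": "Sample data set"}))
dip = disseminate(aip)
print(dip.metadata["title"])  # Sample data set
```

The separation matters because what is submitted, what is stored and what is delivered may each change independently over time, for example when the stored format is migrated.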
67. OAIS: Definitions
To define an Open Archival Information System
The term 'open' means that the document was developed in an
open way, and does not imply that access to any OAIS should be
unrestricted
An archive is defined as an "organization that intends to preserve
information for access and use by a designated community." (p. 1-8)
While an OAIS itself need not be permanent, the information
being maintained has been deemed to need "Long Term
Preservation"
Long term = long enough for there to be a concern about the impact
of changing technologies
68. OAIS: Purpose and Scope
Primary focus on digital information
Specific aims include:
A framework for the understanding and awareness of the archival
concepts needed for long term preservation (access)
Terminology and concepts for describing and comparing:
Architectures and operations
Preservation strategies and techniques
Data models
Consensus on elements and processes for long term preservation
A foundation for other standards
69. OAIS: Applicability
Applicability:
Applicable to any archive, but mainly focused on
organisations with responsibility for making information
available for the long term
Of interest to those who create information
Conformance
An OAIS must support the information model - but does not
specify any particular method of implementation
Mandatory responsibilities (section 3.1)
70. Implementing OAIS
Summing up the fundamentals :
OAIS is a reference model (conceptual framework), NOT a
blueprint for system design
It informs the design of system architectures, the development of
systems and components
It provides common definitions of terms, a common language and
means of making comparison
But it does NOT ensure consistency or interoperability between
implementations
71. Summing Up : OAIS
The OAIS model is a foundation stone for current and
future digital preservation efforts
It is already widely used to inform the development of
preservation tools and repositories
It could be used in the future as a basis for conformance
72. Research Objectives
1. To design an institutional repository using DSpace, that is both sustainable
and viable and can fulfill the long-term digital preservation of materials
deposited into it
2. To map the Open Archival Information System (OAIS) Reference Model
on the in situ institutional repository, weigh the benefits of OAIS features
against institutional repository usability and to identify the institutional
repository challenges to the relevant features of the OAIS
3. To assess the applicability of products developed by projects employing
the OAIS model on small and medium sized institutional repositories, using
the IIG institutional repository as a test bed
4. To ensure that the required policies, guidelines, strategies, procedures and
agreements exist while implementing the OAIS model, that will embed digital
preservation into IIG’s workflow
73. Conclusion from the study
This research was able to identify all the components
necessary for the implementation of the OAIS model
for a geoscience domain specific institutional
repository