DataCite and Campus Data Services
Paul Bracke, Associate Dean for Digital Programs and Information Services, Purdue University
Research libraries are increasingly interested in developing data services for their campuses. There are many perspectives, however, on how to develop services that are responsive to the many needs of scientists, sensitive to the concerns of scientists who are not always accustomed to sharing their data, and attractive to campus administrators. This presentation will discuss the development of campus-based data services programs, the centrality of data citation to these efforts, and the ways in which engagement with DataCite can enhance local programs.
4. Data and Libraries
• Role of libraries in data management has been a focus of discussion
• Academic libraries collect, preserve, and disseminate human knowledge, within the context of a particular institution (Research and Teaching)
5. Drivers
• Growing interest in computational, collaborative science has led to greater interest in data management and sharing
• Funder mandates have increased interest at the campus level
6. Data Services and Libraries
• Different views of library roles
• Curatorial Roles (Data Collection, Appraisal, Selection, Description, Preservation, etc.)
• Service Roles (Data Management Planning, Preservation Planning, Data Needs Assessment, Data Information Literacy, Intellectual Property and Governance)
8. Data Services at Purdue
• Assessment of Data Needs
• Development of Data Services
• Development of Data Repository
9. Looking Upstream
[Diagram: types of research output]
• “Published” data/datasets; unpublished research, traditional/non; “published” research, non-traditional; published research, traditional; secondary/tertiary resources
• Analyzed data/datasets: Analyzed data might need to be reviewed prior to publication, or in case of questions after publication
• Processed data/datasets: Quite often data must be scrubbed/anonymized, or processed to format prior to analysis; some disciplines share this data widely within their communities (e.g., astronomy, physics, etc.)
• “Raw” data/datasets: Some raw data are shared readily (e.g., genetics), but also quite often are discarded, depending on discipline
Modified from: Brandt, D.S. “Scholarly Communication” (in To Stand the Test of Time: Long-Term Stewardship of Digital Data Sets in Science and Engineering: Final Report of Workshop New Collaborative Relationships: Academic Libraries in the Digital Data Universe. ARL, Washington, DC, September 2006).
10. Data Needs Assessment
• Needed to understand campus needs before investing in solutions
• What are faculty needs, practices, attitudes, etc.?
• What is the appropriate infrastructure at a campus level?
• Where should we develop partnerships?
11. Data Curation Profiles
• An interview instrument that provides a guide for discussing data with researchers
• Analysis of profiles:
– Gives insight into faculty needs and attitudes related to data sharing
– Helps assess information needs related to data collections
– Gives insight into differences between data in various disciplines
– Helps identify possible data services
– Creates a starting point for curating a data set for archiving and preservation
http://www.datacurationprofiles.org
16. Specific Data Services
• Data reference
• Data mgmt planning
• Data consultation (may lead to collaborations/grants)
• Using PURR
• Promoting data DOIs
• Data mgmt education and information literacy
• Finding and using data
• Developing tools (DCP 2.0, DataBib, DMP-SAQ)
• Data visualization/GIS
• Developing data resources (LibGuides, tutorials)
• Linking data to articles and dissertations
• Promoting open access (authors' rights, IR deposit)*
• Leveraging publishing opportunities*
• Developing local collections*
• Collection mgmt of “e” (journals, data, archives)*
• Integrating systems* (i.e., finding data in Primo)
* As relates to data
17. Campus Data Services at Purdue
Data Services is one of many services in the Libraries
[Diagram: Purdue Data Services (DS), shown together with:]
• Liaison Librarians
• Data Services Specialists
• Other Libraries Specialists
• Other Campus Specialists
18. Collaborative Model within the Libraries
The current service model is a combination of interaction between researcher, subject liaison, and data services librarian. When a researcher approaches a subject librarian about a data-related question, the librarian can:
1. Refer the question to the data services team, who will engage the researcher and keep the librarian in the loop regarding resolution (Referral)
2. Ask a data services team member to accompany them in meeting with the researcher to determine the question or problem (“Buddy System”)
3. Meet with the researcher to understand and address the problem, using the data services team as a resource to consult with as needed (Consultation)
4. Work directly with the researcher (Solo)
20. PURR
• Based on HUBzero
• Collaboration between Libraries, IT, OVPR
• Subsidized by campus
• Grant-supported projects get 100 GB working space, 10 GB for published data
• Additional space can be purchased if needed
• Includes project space, “publishing” workflow including DOIs
• Preservation layers under investigation
21. Data Services & PURR
Research Collaboration, Data Discovery, Curating, Publishing & Archiving
[Diagram: partners in PURR]
• Researchers
• Libraries: Data Services (Reference & Consulting) & Preservation
• OVPR: Policy & Sponsored Programs & Awards
• ITaP: Infrastructure (HUBzero™)
24. Purdue University Research Repository (PURR)
1. Craft Data Management Plans
2. Consult on new projects
3. Collaborate and contribute to projects
4. Review datasets submitted for publication
5. Select / De-select published datasets from the collection
26. What is DataCite?
An international organization dedicated to:
• Establishing easier access to scientific research data
• Increasing acceptance of research data as legitimate, citable contributions to the scientific record
• Supporting data archiving that will permit results to be verified and re-purposed for future study
http://www.datacite.org
27. DataCite
• DOI Allocation
• 3 Full Members in US:
– Purdue University Libraries
– California Digital Library
– Office of Scientific and Technical Information (DOE)
• How to get involved?
– Work with a full member to assign DOIs to your data
– Attend DataCite workshops and conferences
http://datacite.org/DataCiteUS
30. Data Citation Services on Campus
There is a lack of resources, tools and standards to help researchers manage, share, or preserve research data
“In an ideal situation we would somehow have some sort of standard under which we named things and stored things and kept track of things and we would, you know, have a way to get this information to our students.” (U1E2J1)
31. Data Citation Services on Campus
• Researchers state a general willingness to share their data with others, but not without certain restrictions, and not without benefits for themselves.
– Embargo
– Attribution (Citation)
– “Trust”
• “I need the people who use my dataset to cite it so that I get credit for producing it.” This points to a focus on citation and identifier standards.
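The attribution need described on this slide is what a data citation is meant to satisfy. As a minimal sketch, assuming the basic citation form recommended in the DataCite metadata documentation (Creator (PublicationYear): Title. Publisher. Identifier), a citation can be assembled from a handful of metadata fields. The Dataset class, the format_citation function, and the example record, including its DOI, are hypothetical illustrations and are not taken from PURR or EZID.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    """Hypothetical minimal metadata record for a published dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str  # bare DOI, without a resolver prefix


def format_citation(ds: Dataset) -> str:
    """Build a citation in the form Creator (Year): Title. Publisher. Identifier."""
    creators = "; ".join(ds.creators)
    # Express the DOI as a resolvable link so readers can follow it.
    identifier = f"https://doi.org/{ds.doi}"
    return f"{creators} ({ds.year}): {ds.title}. {ds.publisher}. {identifier}"


# Made-up example record; the DOI below is a placeholder, not a real dataset.
example = Dataset(
    creators=["Doe, J.", "Roe, R."],
    year=2012,
    title="Example Soil Moisture Measurements",
    publisher="Example University Research Repository",
    doi="10.5072/FK2EXAMPLE",
)

citation = format_citation(example)
print(citation)
```

Keeping the DOI in the citation is what lets a reuse of the dataset be traced back to its producer, which is the credit researchers say they want.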
32. Availability of Identifier Services
• We use EZID as our platform
• DOIs are included in PURR, which is broadly available on campus
• Pricing models for other projects, both for DOIs and ARKs
33. Data Citation Services and Library Publishing Services
• Provides a connection between Data Citation Services and Library Publishing Services at Purdue
• Provides a selling point for both services. DOIs provide credibility
• Exploring emerging publishing models
– Open Access
– Connecting Textual and non-Textual Resources
– Publishing Data (Data Papers, etc.)