Presentation by Susan Reilly at Bibsys2013 on the opportunities for libraries and their role in the collaborative data infrastructure. Looks at data sharing, authentication, preservation and advocacy.
Where is the opportunity for libraries in the collaborative data infrastructure?
1. Where is the opportunity for
libraries in the collaborative
data infrastructure?
Susan Reilly
Project Manager
LIBER
susan.reilly@kb.nl
@skreilly
2. Contents
About LIBER
Some context
What is the collaborative data infrastructure?
Introducing the researcher to the CDI
Introducing the CDI to the researcher
Now and next?
3. LIBER: reinventing the library of the future
Largest network of European research libraries: 450 in over 40
countries
Mission:
To provide an information infrastructure to enable research
in LIBER institutions to be world class
4. Key performance areas
Scholarly communication and research infrastructures
Reshaping the research library
Advocacy
5. LIBER Projects
Reshaping the research library
Scholarly Communication & Research Infrastructure
Advocacy
6. So why am I here?
Collaborative data infrastructure
Reshaping the research library
Scholarly Communication & Research Infrastructure
Advocacy
7. What is the collaborative data infrastructure
(scientific data infrastructure)?
…it’s about data
8. Not just the 20+ petabytes that the LHC at CERN
produces every year
9. Libraries in the data deluge
Increasing amount of digitised and born digital content
in libraries
Increasing emphasis on open access publications and
data: mandates, institutional repositories
Demand for data management support
10.
11. What is the collaborative data infrastructure?
“a broad, conceptual framework for how different
companies, institutes, universities, governments and
individuals would interact with the system – what types of
data, privileges, authentication or performance metrics
should be planned. This framework would ensure the
trustworthiness of data, provide for its curation, and
permit an easy interchange among the generators and
users of data”
12. Now and Next
Authentication & authorisation
New skills
13. Introducing the researcher to the CDI
Current situation
ODE & linking data to publications
Demand for data management support
Advocacy
14.
15. Opportunities for data exchange (ODE)
Identify, collate, interpret and deliver evidence of emerging best practices in sharing, re-using, preserving and citing data, the drivers for these changes and barriers impeding progress, in forms suited to each audience: policy makers, funders, infrastructure operators, data centres, data providers and users, libraries and publishers.
16. Steps to creating the conditions for data
sharing
Understand data sharing today
Collection of “success stories”, “near misses” and “honourable failures” in data sharing, re-use and preservation
Data & scholarly communications
Integrating data and publications
Best practice in data citation
New roles
Identify drivers and barriers
Interviews with stakeholders to seek consensus
17.
18. Hypotheses
“Without the infrastructure
that helps scientists manage
their data in a convenient
and efficient way, no
culture of data sharing will
evolve.”
Stefan Winkler-Nees
(German Research Foundation, DFG)
20. The Data Publication Pyramid
(1) Data contained and explained within the article
(2) Further data explanations in any kind of supplementary files to articles
(3) Data referenced from the article and held in data centers and repositories
(4) Data publications, describing available datasets
(5) Data in drawers and on disks at the institute
21. The Pyramid’s likely short term reality:
(1) Top of the pyramid is stable but small
(2) Risk that supplements to articles turn into Data Dumping places
(3) Too many disciplines lack a community endorsed data archive
(4) Estimates are that at least 75 % of research data is never made openly available
22. The Ideal Pyramid
(1) More integration of text and data, viewers and seamless links to interactive datasets
(2) Only if data cannot be integrated in article, and only relevant extra explanations
(3) Seamless links (bi-directional) between publications and data, interactive viewers within the articles
(4) More Data Journals that describe datasets, data mgt plans and data methods
23. Issues for researchers
Researchers need somewhere to put data and make it safe for reuse
Researchers need to control its sharing and access
Researchers need the ability to integrate data and publication
Researchers need to get credit for data as a first class research object
Researchers need someone to pay for the costs of data availability and re-use
24. Library support for the researcher
Libraries and data centres must support…
data as first class research object:
Availability: publishing, persistent identification/citation of datasets
Findability: data description, metadata, standards documentation and retrieval
Interpretability: proper documentation of data
Re-usability: long-term data archiving including data curation and preservation
25. Implications for libraries
Level of integration | Implication for library
Data contained within the article | Prepare for adequate preservation strategies
Data published in supplementary files to articles | Presentation and preservation mechanisms; Persistent link
Datasets referenced from the articles | Citability of dataset; Persistent link; Perpetual access to dataset
Data published independently from written publications (“data publication”) | Support publication process; Curation of datasets; Metadata and documentation
Data in drawers and on disks at the institute | Engage in data management planning
27. Advocacy
“Many researchers do not appear to see the value and
benefits of data citation. There is a gap, which could be
filled by libraries, in advocacy for data sharing, the use of
subject specific repositories, and best practice in data
citation. These, if filled, would increase the number of
researchers sharing and reusing data.”
http://www.alliancepermanentaccess.org/wp-content/plugins/download-monitor/downlo
28. Introducing the CDI to the researcher
Scoping the researcher’s requirements
Collaboration & policy development
29. The AAA Study: a research passport
“evaluate the feasibility of delivering an integrated
Authentication and Authorisation Infrastructure, AAI, to
help the emergence of a robust platform for access to and
preservation of scientific information within a Scientific
Data Infrastructure (SDI)”
30. Now and Next
Authentication & authorisation
New skills
33. Collaboration
“Networked science is on the rise, the researcher is no
longer working alone in his office, he is working virtually
with other researchers from around the world. For them it
is important that they can use the same software and
share and reuse the same content related objects, in a
trusted environment.”
Heike Neuroth, Head of Innovation, Goettingen State &
University Library
34. Use Cases
1. Creating Data
2. Processing Data
3. Sharing Data
4. Preserving Data
5. Multi-disciplinary Data Services
6. Analysing Data
7. Accessing Data
8. Accessing Experiments and Data
35. Requirements…
Tracking of provenance, authenticity, integrity of the material
Integration of researcher ID with institutional credentials
Researchers’ self registration
Securely linking researcher and data identifiers for tracking provenance
Delegation of identity management to home institute
Attribute provisioning for users participating in specific research projects managed by the specific research groups (VOs)
Attribute aggregation
Unification and homogenisation of identity federations’ attributes and agreed levels of assurance in order to facilitate authorisation
Accreditation of trusted Identity Providers (IdPs), based on international standards, depending on the required level of assurance
Entitlement management to minimise the occurrence of events where license monies are being paid twice without necessity (e.g., for access to scientific journals).
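As a rough illustration of the attribute aggregation and level-of-assurance requirements above, the following Python sketch merges attributes asserted by several sources and ignores any source below an agreed assurance level. It is only a sketch under stated assumptions: the level names, the `AttributeAssertion` structure, the source names and the entitlement strings are all hypothetical and are not taken from the AAA Study or from any real identity federation.

```python
from dataclasses import dataclass

# Hypothetical assurance levels, ordered from weakest to strongest.
LOA_ORDER = {"low": 1, "substantial": 2, "high": 3}


@dataclass
class AttributeAssertion:
    source: str       # e.g. a home institute IdP or a research group (VO)
    loa: str          # level of assurance claimed for this source
    attributes: dict  # attribute name -> set of values


def aggregate_attributes(assertions, required_loa):
    """Merge attributes from every source that meets the required level."""
    minimum = LOA_ORDER[required_loa]
    merged = {}
    for assertion in assertions:
        if LOA_ORDER[assertion.loa] < minimum:
            continue  # skip sources below the agreed level of assurance
        for name, values in assertion.attributes.items():
            merged.setdefault(name, set()).update(values)
    return merged


def is_authorised(merged, required_entitlement):
    """Check whether the aggregated attributes carry a given entitlement."""
    return required_entitlement in merged.get("entitlement", set())


# Illustrative (made-up) sources for one researcher.
home_idp = AttributeAssertion(
    "home-institute", "substantial",
    {"affiliation": {"staff"}, "entitlement": {"journal-access"}},
)
research_vo = AttributeAssertion(
    "research-group-vo", "substantial",
    {"entitlement": {"dataset-42:read"}},
)
self_registered = AttributeAssertion(
    "self-registered", "low",
    {"entitlement": {"dataset-42:write"}},
)

merged = aggregate_attributes(
    [home_idp, research_vo, self_registered], "substantial"
)
```

In this sketch the self-registered source falls below the required level, so its write entitlement is not aggregated, while the home institute and the research group each contribute their own attributes.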
38. Collaboration & policy development
Policies for data sharing
Values & Ecosystems
Infrastructure & Technology
Legal & Ethical
Institutional Support
http://recodeproject.eu/
39. Now & next
What should our priorities be?
LIBER ten recommendations:
http://www.libereurope.eu/news/ten-recommendations-for-libraries-to-get-started-with-research-data
41. 2. Collaborate
Alliance for Permanent Access to the Record of Science
in Europe Network (APARSEN)
Aims to look across the excellent work in digital preservation which is
carried out in Europe and to bring it together under a
common vision
Trust! Sustainability! Usability! Access!
http://www.alliancepermanentaccess.org/
This figure suggests, in the broadest possible terms, how different actors, data types and services should interrelate in a global e-infrastructure for science. Data generators and users gather, capture, transfer and process data, often across the globe, in virtual research environments. They draw upon support services in their specific scientific communities: tools to help them find remote data, work with it, annotate it or interpret it. The support services, specific to each scientific domain and provided by institutes or companies, draw on a broad set of common data services that cut across the global system; these include systems to store and identify data, authenticate it, execute tasks, and mine it for unexpected insights. At every layer in the system, there are appropriate provisions to curate data and to ensure its trustworthiness.
Libraries and data centres must support data publishing as a prerequisite for data availability, including persistent identification/citation of datasets, and solutions for data description and retrieval, which together facilitate findability. They must also ensure that data is properly documented as a condition for data interpretability and re-usability and prepare for long-term data archiving including data curation and preservation.