2. NERC Environmental Data Centres
• Atmospheric science
• Earth sciences
• Earth observation
• Marine science
• Polar science
• Science-based archaeology
• Terrestrial & freshwater sciences and hydrology
Support & guidance
Long-term data curation
3. Background
NERC Environmental Data Centres
– For NERC funded research data
– provide support and guidance in data management
– responsible for the long-term management of data
‘It is essential that data generated through
NERC supported activities are properly
managed to ensure their long-term
availability.’
4. Expectations on researchers
• Funders: e.g. NERC Data Policy
• Legislation
• E.g. UK Location Strategy – INSPIRE, FOI/EIR
• Publishers: audit trail
EIDC's job is to make researchers' lives as easy as possible in meeting these.
5. NERC Data Policy
• covers environmental data acquired, assembled or created through
research, survey and monitoring activities that are either fully or
partially funded by NERC
• full data management plan: in conjunction with the relevant NERC
data centre
• environmental data of long-term value must be offered to NERC data
centres at end of project
• All environmental data held by the NERC data centres will be made
freely available without any restrictions on use
• 'right of first use': normally two years from the end of data collection
• all research publications arising from NERC funding must include a
statement on how the supporting data and any other relevant research
materials can be accessed
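
The 'right of first use' rule above is simple date arithmetic. A minimal sketch, assuming the period is counted in whole calendar years from the last day of data collection; the function names and the handling of 29 February are illustrative assumptions, not NERC's definition:

```python
from datetime import date


def first_use_end(collection_end: date, years: int = 2) -> date:
    """Return the date on which a 'right of first use' period would end."""
    try:
        return collection_end.replace(year=collection_end.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year: use 28 February.
        return collection_end.replace(year=collection_end.year + years, day=28)


def is_under_first_use(collection_end: date, today: date, years: int = 2) -> bool:
    """True while the assumed first-use period is still running."""
    return today < first_use_end(collection_end, years)
```

The policy says "normally" two years, so the length is left as a parameter rather than fixed.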
6. Benefits to researchers
• Credit for output datasets
• Long-term access to coordinated, well curated
data & documentation. Standards-based.
• Dealing with legal aspects, e.g. UK / EU, FOI/EIR
EIDC's job is to make your life as easy as possible in attaining these.
7. Support
• Programme
– Data Policy – NERC & BBSRC
– Keywords / vocabs
• Project
– Translate policy into practice
– Support & guidance in planning data tasks
– Cross-project comms / data search
– Not providing a data manager for the project or dictating how
data are stored & used in the project
8. Project & EIDC communications
Project to EIDC: Dataset/s, Metadata
EIDC services: storage, discovery, view, access, citation
EIDC and the project Data Manager will develop key documents:
• Data Management Plan: plan of activities needed so that project aims are met & data can be appropriately re-used
• Service Agreement: details of what EIDC will do in future
Other data centre, e.g. UKDA
Long-term / audit trail
9. Support Schedule Outline
Project start to project end:
• Develop & update Data Management Plan
• ODMP completed
• ID & plan main activities incl. datasets of long-term value
• Agree details for future data deposit
• Document / format datasets of long-term value
• Data deposit with EIDC (or agreed other)
• DOI issued
• Embargo in place
• Access
10. Transfer of governance
Project to EIDC: Dataset/s, Metadata
EIDC services: storage, discovery, view, access, citation
Handing over long-term responsibility from researcher to EIDC
Formal process for handover
Responsibilities of whoever holds the data:
• EIR requests
• Data integrity
• Secure storage
• Data citation
• Web discoverable
• Access to supporting docs
• Data web accessible
• Meeting funders' / legal requirements
11. Re-usable data: what’s needed
• What complete & correct data files in non-proprietary format
are being provided?
• Who should be permanently credited as authors?
• Who actually owns the IPR on the data?
• What licence is correct for the data?
• Who will describe the dataset so it can easily be discovered
(and meet international standards)?
• What document(s) will describe the dataset in detail so it
can be re-used?
• When should the data be made public?
12. Preparation, handover & services setup
Project to EIDC: Dataset/s, Metadata
EIDC services: storage, discovery, view, access, citation
EIDC SERVICE AGREEMENT
• Dataset Name:
• Format:
• Size:
• Filenames:
• How delivered:
• Authors for DOI:
• Type of licence:
• Date accessible:
• Support docs name:
• Web view service:
The Service Agreement:
• Sets out what the project will deposit
• Defines what EIDC will do in future
• Sets out how, when and what will be handed over
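
The Service Agreement fields listed above can be treated as a simple record that is filled in before handover. A minimal sketch, assuming every field is free text; the class, the field names and the `missing_fields` helper are hypothetical and do not represent EIDC's actual form or software:

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ServiceAgreement:
    # Field names mirror the slide headings; treating them all as text is an assumption.
    dataset_name: str = ""
    data_format: str = ""
    size: str = ""
    filenames: str = ""
    how_delivered: str = ""
    authors_for_doi: str = ""
    licence_type: str = ""
    date_accessible: str = ""
    support_docs_name: str = ""
    web_view_service: str = ""


def missing_fields(agreement: ServiceAgreement) -> List[str]:
    """List the fields that are still blank, in the order they appear on the form."""
    return [f.name for f in fields(agreement) if not getattr(agreement, f.name).strip()]


# Hypothetical draft: only two fields agreed so far.
draft = ServiceAgreement(dataset_name="Example dataset", data_format="csv")
print(missing_fields(draft))
```

A check like this makes it easy to see what still has to be agreed between the project Data Manager and EIDC before data are transferred.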
13. Data Services from EIDC
EIDC services: storage, discovery, view, access / embargo, citation
Secure data store / checksum (integrity)
Supporting docs
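
A checksum is a short fingerprint of a file: if the file changes, the fingerprint changes. A minimal sketch of an integrity check, assuming SHA-256 as the algorithm (the slides do not say which one EIDC uses); the function names are illustrative:

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Compute a hex digest of a file, reading it in chunks so large files fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_checksum: str, algorithm: str = "sha256") -> bool:
    """Compare the current checksum with the one recorded at deposit."""
    return file_checksum(path, algorithm) == recorded_checksum.lower()
```

Recording the checksum at deposit and recomputing it later shows whether the stored copy is still byte-for-byte identical.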
14. Benefits of EIDC Services
As a NERC Data Centre, EIDC guarantees:
• Secure long-term storage and retrieval
• Future usability of data (always current format)
• Web-based discovery, view and access of data based on international
standards and meeting legal and funders requirements (NERC,
GEMINI2, INSPIRE, UK Location)
• Licence and embargo management
• Persistence of web-accessible, linked contextual information
• Citation reference (if required)
• Dealing with data requests including those falling under Environmental
Information Regulations
15. Linking citation to data record
“.....the data have been allocated a digital object identifier (http://doi.org/10.5285/1a91c7d1-ec44-4858-9af2-98d80f169bbd).”
Guaranteed persistence
Important info: abstract, authors, embargo, T&Cs etc
Links to detailed description, data access etc
16. Discovery & Access
1) The DOI is guaranteed to dereference to
information about the dataset incl how to access.
2) EIDC uses international standards and protocols to make
metadata records widely accessible.
Searching the following will lead to the same reference record.....
• Google (and similar)
• NERC Data Discovery Service
• CEH Environmental Information Platform (new)
• EIDC Holdings
• Data.gov.uk
• Any EU INSPIRE portal
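
Dereferencing a DOI means turning the identifier into a web address via the doi.org resolver, which redirects to the dataset's landing page. A minimal sketch of building that address; the function name and the list of accepted prefixes are illustrative assumptions, and no network request is made:

```python
def doi_to_url(doi: str, resolver: str = "https://doi.org/") -> str:
    """Turn a DOI, with or without a resolver prefix, into a resolvable URL."""
    cleaned = doi.strip()
    known_prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in known_prefixes:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Every DOI starts with "10." and has a "/" between prefix and suffix.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    return resolver + cleaned


# The DOI quoted on the citation slide, written without the resolver prefix.
print(doi_to_url("10.5285/1a91c7d1-ec44-4858-9af2-98d80f169bbd"))
```

Storing the bare DOI and adding the resolver only when a link is needed keeps citations stable if the resolver address changes.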
19. When to Deposit Data
Data best deposited before project ends:
• Important information about the dataset will
be needed
• Licences can take ages to sort
• Data needs to be in non-proprietary format
= effort. Project staff move on to other things
21. Data Deposit
Why deposit before project ends?
• Dataset documentation (for re-use):
• Abstract describing dataset
• Details (where applicable):
• Experimental Design
• Generation/Transformation
• Fieldwork and/or Lab Instrumentation
• Calibration Steps and Values
• Units of Values
• Analytical Methods
• Quality Control
• Data structure
22. Next Steps
EIDC will assign named contact & work
with named project Data Manager
In first 3 months.....
• Data management approaches to be followed
during the lifetime of the grant
• List of existing datasets to be used
• List of datasets being generated
Guidance from EIDC
23. Next Steps
Data Management
Beyond 3 months...
• Schedules for data activities to be followed
during the lifetime of the grant
• Plans for documentation of key data activities
• Plans for deposit of datasets (of long-term value)
being generated to EIDC – rough dates fine
Guidance from EIDC
27. Example Timeline - deposit
Project start to project end:
• Identify dataset
• Define & agree details for future deposit*
• Develop supporting docs
• Format data
• Data deposit with EIDC
• Data Services: storage, DOI, embargo/access
(*DOI, licensing, supporting docs etc)
28. Q: What format should data be in?
A: generally for long-term management data are
best in non-proprietary formats e.g. csv rather
than MS Excel. However some are OK e.g.
ESRI ArcGIS.
EIDC and the project Data Manager will agree
what format each dataset needs to be in when
handed over – this does not dictate what the
project uses.
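
Writing a table out as csv needs only the Python standard library. A minimal sketch, assuming the data are already held as a list of rows; the function name and the example column names are hypothetical, and the format actually required for each dataset is whatever is agreed with EIDC:

```python
import csv
from pathlib import Path
from typing import Dict, List


def write_csv(rows: List[Dict[str, object]], path: Path) -> None:
    """Write rows to a UTF-8 csv file, using the keys of the first row as the header."""
    if not rows:
        raise ValueError("no rows to write")
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical example rows; units are carried in the column names.
example_rows = [
    {"site": "A", "depth_cm": 10, "moisture_pct": 23.5},
    {"site": "B", "depth_cm": 10, "moisture_pct": 19.0},
]
write_csv(example_rows, Path("example_dataset.csv"))
```

Putting units in the column names is one simple way to keep them with the data; they should still be described in the supporting documentation.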
29. Q: How will I find data handed to EIDC?
A: 1) The DOI is guaranteed to dereference to
information about the dataset incl how to access.
2) EIDC uses international standards and protocols to make
metadata records widely accessible.
Searching the following will lead to the same reference record.....
• Google (and similar)
• NERC Data Discovery Service
• CEH Information Gateway
• EIDC Hub Holdings
• Data.gov.uk
• Any EU INSPIRE portal
30. Q: Do I have to hand over all data?
A: No, only data of long-term value – that which
would be useful to start with in future (could be
raw data or could be processed if raw data too
large to keep).
All data that underpins a publication would be
deemed of long-term value.
31. Q: I don’t want my data accessible for a couple of
years, can I give it to EIDC then?
A: EIDC can impose an embargo period only for
data they hold. Better to hand to EIDC now &
ensure all correct info is in place (no additional
effort years down the line), get a DOI to include
in any publication and let EIDC worry about
embargo management.
Editor's notes
EIDC is hosted at the NERC Centre for Ecology & Hydrology.
This presentation is about the role that EIDC will play as a NERC Data Centre in overseeing data activities in a research programme and grants where NERC are funding either wholly or partly.
NERC funds data centres which service different science communities. However there are no hard and fast rules and the most appropriate repository for datasets will be assessed. Only datasets of long-term value need to be deposited.
NERC, in commissioning research from the public purse, clearly have a responsibility to ensure that maximum benefit comes from this. Data are a valuable resource and NERC has to ensure that maximum benefit to science can be derived – not restricted to a single project but made useful for future science.
NERC provides a mechanism for this through NERC Data Centres.
There are expectations on researchers, be it from funders or from legislation. EIDC's job is to make researchers' lives as easy as possible in meeting these.
EIDC will provide guidance, tools, services and support in order that NERC-funded researchers can meet expectations for data.
NERC Data Policy may not be the only policy affecting the research if co-funded. It is beneficial to clarify all aspects of datasets generated early in the project so that the long-term access to them is correctly managed and the correct people are credited.
EIDC work with the Programme coordinators to clarify or define data policy affecting the programme. This is often needed where joint-funded and multiple data policies exist. EIDC also help coordinate data approaches across the programme that can help e.g. use of coordinated keywords or vocabularies.
EIDC provide support to each project to help translate policy and funders' expectations into practice. This takes the form of a Data Management Plan, which sets out the activities that ensure long-term security and re-use of any datasets of long-term value. EIDC provide data centre repository services in the long term but can also recommend alternatives for some specific datasets.
High-level approach to ingestion of data into the custody of the data centre: projects and EIDC discuss what data will be deposited, how this will happen and what EIDC will provide in the future.
The Data Management Plan is the key mechanism for planning data activities that are required for the scientific outputs and to meet the needs of long-term management of data. EIDC will identify the most efficient process for deposit with the data centre.
An agreement is made for each dataset prior to actual transfer of the data which sets out exactly what will happen. This Service Agreement is appended to the DMP.
Data are then transferred in accordance with the Service Agreement. The project may stop at this point, and EIDC will take responsibility for the dataset in the future and deliver secure storage, access etc.
Some key information will have already been provided as part of the Case for Support. EIDC will use the ODMP info to start the 1st draft of the DMP.
The DMP is a live document so isn’t really complete until the end of the project.
After 3 months it is useful to identify the main datasets of long-term value that will be generated and the rough timetable for this.
Beyond that point it depends on the project timetable but activities to document the datasets should be planned in well in advance of depositing the data.
Data should be deposited at the earliest opportunity so that a DOI can be issued and the correct licence conditions can be put in place. The data can be embargoed to an agreed date.
Transferring data into the custody of the data centre so they can make it re-usable in the long-term is about transferring the responsibility to look after the data in the future. Whoever holds the data is responsible for certain things (see pink boxes). During the project it is the project staff. It can be transferred to EIDC but key information needs to be provided about the data by the originators. A process covers the handover to make sure everything is in place correctly.
The SA is the agreement about what needs to be done:
The depositor will know what format the data are to be in and what supporting docs are needed. They may need to add activities to their project so these happen.
EIDC know what will come to them, when and what needs to happen in future.
EIDC will set up various services once the data are received and checked against the Service Agreement.
Secure storage of datasets (guaranteed not to change)
DOI citation reference for the dataset
Metadata discoverable across the web.
Supporting documentation bound to metadata record and web-accessible
Web-accessible download (on agreed date)
EIDC is also at the forefront of implementing a citation mechanism for data. This gives the data creators a reward mechanism similar to that for scientific papers and recognises datasets as valued outputs. A digital object identifier can be cited that EIDC guarantee will take the user to a web-accessible ‘landing page’ where more information exists on how to access the data.
You may wish to obtain a DOI for the dataset which can be referenced in a scientific publication. A DOI can be issued only when a dataset is deposited at a NERC data centre.
Many journals now require datasets that underpin scientific papers to be accessible in a long-term repository.
Time is needed to ensure that any data being made accessible has the correct information along with it e.g. IPR, ownership and acknowledgements along with documentation that describes the data.