First Steps in Research Data Management Under Constraints of a National Security Laboratory
1. 1st Steps in RDM at LANL
@mart1nkle1n & @briancain101
CNI Fall Meeting 2018, 12/10/2018, Washington, DC
First Steps in
Research Data Management
Under Constraints of a
National Security Laboratory
Martin Klein
0000-0003-0130-2097
@mart1nkle1n
Brian Cain
0000-0001-7356-5860
@briancain101
Research Library
Los Alamos National Laboratory
Acknowledgements: Herbert Van de Sompel, Frances Knudson, Joshua Finnell, Wei Gu, Jason Keith
2. 1st Steps in RDM at LANL
2013 OSTP MEMO
• All federal agencies with more than $100M in annual R&D expenditures are required to
ensure that data are stored and publicly accessible to search, retrieve, and analyze.
• Scope: data necessary to validate research findings, including data
sets used to support scholarly publications.
3. 1st Steps in RDM at LANL
NATIONAL LABS
Ames
Argonne
Brookhaven
Fermi
Idaho
Los Alamos
Lawrence Berkeley
Lawrence Livermore
NETL
NREL
Oak Ridge
Pacific Northwest
Princeton
SLAC
Sandia
Savannah River
Thomas Jefferson
SCIENTIFIC & TECHNICAL INFORMATION
(STI/R&D Results)
Text
• Journal articles/accepted manuscripts
• Technical reports
• Conference papers
• Patents
Data
• Large and small datasets
• Images
• Visualizations
Software/Code
≥ 50,000 STI “products” annually
$12 Billion
R&D Funding
United States Department of Energy
4. 1st Steps in RDM at LANL
United States Department of Energy
5. 1st Steps in RDM at LANL
Environmental Scan
• Data Working Group identified 12 institutions and contacted data
services managers at each institution
• Questions covered:
• Budget
• Staffing
• Data Security/Platform
• Discovery Tools
• Interdepartmental cooperation
• Permanent Data Storage
Budget: $200,000 - $2,000,000
Staffing: 2-15 FTE
Platform: Hydra/Fedora; Dataverse; DSpace; Third Party
Data Discovery Tools: Individual silos and catalogs; APIs
Interdepartmental Cooperation: Library; IT + Library; Partnership between 3+
Permanent Storage: None; 10-year retention policy; Forever?
6. 1st Steps in RDM at LANL
Data interviews
• Data collection consisted of in-depth interviews with researchers from
across the Lab, completed during the summer of 2016
• Identify all LANL data sets deposited in external data repositories
by LANL researchers (Figshare, Zenodo, Dryad, Dataverse)
• Contact all researchers who have submitted a data set for review
• Contact all researchers who have created a Data Management
Plan using the Library’s DMPTool
• Identify data-intensive researchers from the Library Roadshows
• Recommendations from the Data Executive Team
7. 1st Steps in RDM at LANL
“Highlights” from Data Management Surveys, Working Groups
• “A centralized storage solution would be great. Wouldn’t have to keep all
my files on an office computer.”
• “It would be great if the Lab had a centralized repository where I could
collaborate, store, and share my data both internally and externally.
Something like Dropbox would even be helpful.”
• “Much of this data is hosted on old websites that we maintain (scripted in
Perl; accessible through FTP). In other words, old crusty crap.”
• “When my post-doc is gone, I don’t know where their stuff is.”
• “I know this experiment has been run before, but I can’t find the data, so
I’m running it again.”
• “There are many files named ‘data’ on my group’s share drive.”
8. 1st Steps in RDM at LANL
Results from Data Management Surveys, Working Groups
• Need for infrastructure to support internal and external
research collaboration
• Providing data storage and data sharing
• Integrating with frequently used research tools and workflows
• Supporting documentation and preservation
• Compliant with LANL policies on data management,
review and release, and security
9. 1st Steps in RDM at LANL
Taking Action at LANL
• Reality: LANL policies prevent the use of off-the-shelf, cloud-based
“open science” platforms that many other research institutions use.
• Ergo:
• Investigate the feasibility of a local solution
• Goal: Provide an internal collaboration platform as a path
toward structured data management
• Nucleus Project
• Pilot effort by the Research Library
• Since January 2017; 1 FTE hired; 4 part-time contributors
• Based on a local install of the Open Science Framework
software
10. 1st Steps in RDM at LANL
Some Anticipated Benefits for Researchers
• Optimize a researcher’s use of time by:
• Making it easier to accomplish collaborative goals
• Reducing number of steps to achieve goals
• Reducing potential for errors when accomplishing goals
• Improving project management, communication
• By deploying a platform that:
• Streamlines workflows
• Provides glue between systems/tools
• Provides an overview of assets involved in research
collaboration
11-17. 1st Steps in RDM at LANL
Submit Dataset to RASSTI - Before
[Figure sequence, not reproduced: the submission workflow before Nucleus, built up across seven slides as numbered steps 1 through 5]
18. 1st Steps in RDM at LANL
Some Anticipated Benefits for LANL
• Making it easier to accomplish collaborative goals
• Overview of assets involved in research (collaboration)
• Tracking of idea → funding → data → publication → patent
• “Single” point of (data) preservation
• Synergies with internal and external funding requirements
• Providing a seamless method for compliance with LANL review
and release and security policies
19. 1st Steps in RDM at LANL
Open Science Framework
• Open source software, developed and maintained by the
Center for Open Science
• Default use is through the cloud-based portal osf.io, which
supports multi-organizational open science and
collaborative scholarship
• Provides glue for many aspects of the research workflow
• Offers integrations with many existing productivity tools
20. 1st Steps in RDM at LANL
[Diagram, not reproduced. Labels: experiment, simulation, documentation; TOOLS (R, MATLAB, Echo, Granta, TeXmaker, ...); STORAGE (four separate Data Storage boxes); CODES and RASSTI (store); REVIEW, RELEASE; firewall; OUTSIDE LANL (DOE GitLab, DOE OSTI, arXiv, GitHub, ...)]
21. 1st Steps in RDM at LANL
[Diagram, not reproduced. Labels: experiment, simulation, documentation; TOOLS (R, MATLAB, Echo, Granta, TeXmaker, ...); STORAGE (four separate Data Storage boxes); LANL authenticate; LANL GitLab; CODES and RASSTI (store); REVIEW, RELEASE; firewall; OUTSIDE LANL (DOE GitLab, DOE OSTI, arXiv, GitHub, ...)]
22. 1st Steps in RDM at LANL
[Diagram, not reproduced. Labels: NUCLEUS (OSF, project collaboration) with a central Project Metadata & Access Control box and "connect" links to S3 storage, Google Drive, four Data Storage boxes, CODES, RASSTI, LANL auth, and GitLab; experiment, simulation, documentation; TOOLS (R, MATLAB, Echo, Granta, TeXmaker, ...); STORAGE; LANL authenticate; LANL GitLab; CODES and RASSTI (store); REVIEW, RELEASE; firewall; OUTSIDE LANL (DOE GitLab, DOE OSTI, arXiv, GitHub, ...)]
23. 1st Steps in RDM at LANL
[Diagram, not reproduced. Labels: NUCLEUS (OSF, project collaboration) with a central Project Metadata & Access Control box and "connect" links to Google Drive, four Data Storage boxes, CODES, RASSTI, LANL auth, and GitLab; experiment, simulation, documentation; TOOLS (R, MATLAB, Echo, Granta, TeXmaker, ...); STORAGE; LANL authenticate; LANL GitLab; CODES and RASSTI (store); REVIEW, RELEASE; firewall; OUTSIDE LANL (DOE GitLab, DOE OSTI, arXiv, GitHub, ...)]
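The hub role that the diagram gives to Nucleus (project metadata and access control in the middle, with existing systems connected rather than replaced) can be illustrated with a small sketch. This is not the OSF or Nucleus implementation: the class names, the three ordered permission levels, and the example identifiers below are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Permission(IntEnum):
    """Hypothetical ordered permission levels; a higher level includes the lower ones."""
    READ = 1
    WRITE = 2
    ADMIN = 3


@dataclass
class Connection:
    """A link from a project to an existing system; the data stays in that system."""
    service: str   # e.g. "GitLab", "RASSTI", "Google Drive"
    location: str  # hypothetical identifier inside that service


@dataclass
class Project:
    """Central record: project metadata, access control, and connected systems."""
    title: str
    description: str = ""
    contributors: dict = field(default_factory=dict)  # user name -> Permission
    connections: list = field(default_factory=list)

    def grant(self, user, level):
        self.contributors[user] = level

    def can(self, user, needed):
        # Unknown users default to level 0, which is below READ.
        return self.contributors.get(user, 0) >= needed

    def connect(self, actor, service, location):
        # Assumption for this sketch: only admins may attach a new system.
        if not self.can(actor, Permission.ADMIN):
            raise PermissionError(f"{actor} may not connect services")
        self.connections.append(Connection(service, location))

    def overview(self, actor):
        """List the systems that hold this project's assets."""
        if not self.can(actor, Permission.READ):
            raise PermissionError(f"{actor} may not view this project")
        return sorted(c.service for c in self.connections)


project = Project(title="Example simulation study")
project.grant("alice", Permission.ADMIN)
project.grant("bob", Permission.READ)
project.connect("alice", "GitLab", "group/example-repo")
project.connect("alice", "RASSTI", "draft-submission")
print(project.overview("bob"))
```

The point of the design is that the project record stores only pointers and permissions, so a read-only collaborator can still get an overview of where all assets live without the underlying systems being merged.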
27. 1st Steps in RDM at LANL
Submit Dataset to RASSTI - After
[Figure, not reproduced: the submission workflow with Nucleus, shown as numbered steps 1 and 2]
29. 1st Steps in RDM at LANL
Challenges & Questions
• (Technical) Challenges
• Ownership of the software environment
• Storage at institution, division, or group level?
• Contamination: who is responsible for clean-up?
• Cybersecurity review
• Questions
• Is this useful to researchers? Under what conditions?
• What other local/homegrown systems should be integrated?
• What does success look like?
• Active users?
• Structured data management?
30. 1st Steps in RDM at LANL
Outreach & Feedback
• Pilot Nucleus, Summer 2018
• Invited “friends and family” at LANL
• Identifying bugs and eliciting feedback before expanding pilot
• Generally positive response, some skepticism about yet another
tool
• Data Management Workshop, August 2018
• ~50 attendees introduced to Nucleus and “Data Management 101”
• Positive response, but constrained by “Pilot” status
• Some concerns about competition between collaboration tools
31. 1st Steps in RDM at LANL
Current Status
Nucleus: Fall/Winter 2018
• Using feedback to seek institutional support and ownership (i.e.,
funding and project resources)
• Possible integration with institutional Google Suite; impact on
adoption and sustainability?
• New laboratory management (since November 1, 2018)
• Developing relationships with new partners and hierarchy
32. 1st Steps in RDM at LANL
Lessons Learned
• Limitation of a pilot to assess utility
• Contradictions between survey results and a few pilot responses
• “LANL collaboration tools are terrible, we need something new!” versus
“LANL collaboration tools are amazing, why would you want to
compete with them?!”
• What might be going on here? Grass is greener? Novelty?
Endowment effect?
• Need to have a strong use case and value proposition prepared
for end users; they do not always see the utility