This document summarizes the findings of a research data census conducted at Montana State University. The census was a partnership between the university's Information Technology Center, Library, and Vice President for Research & Economic Development. It found that the amount of research data is growing significantly due to new instruments and technologies. Researchers are interested in data infrastructure and services to help store, share, and annotate their data. The census informed proposals to the National Science Foundation for new data network investments and a collaboration between the library and IT to provide data services to researchers.
Research Data Census
1. Towards a Data-Driven Understanding of Research Data
September 3, 2015
Montana State University, Research Council
Jerry Sheehan
Montana State University
Chief Information Officer
jsheehan@montana.edu
2. The “Consumerization” of Research Data
Trend 1: Costs and Capacity
• A “Consumer Effect” Has Pushed Prices Down While Increasing Performance.
• Users Can Easily Buy More Storage Than They Need.
• There Are No Enterprise Strategies for Research Data Discovery.
• No explicit way to inventory research data
• Instruments have “bursty” behavior when they move data on the network
Montana State University-Information Technology Center
3. “Commodity” Data Laboratory Equipment @ Montana State
Data generated per run, by device:
• Illumina Genomic Sequencer: 0.5 Tb to 1 Tb per run
• Confocal Microscope: 50-100 Gb per run
• Transmission Electron Microscope: 10-20 Gb per run
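To give a rough sense of why single runs of this size produce “bursty” network traffic, the idealized time to move one run can be estimated as its size in bits divided by the link rate. The sketch below is a minimal illustration, not part of the census: it assumes the per-run figures are bytes (1 TB = 10^12 bytes), and the helper name `transfer_seconds`, the optional `efficiency` factor, and the 1 Gb/s and 10 Gb/s link rates are all hypothetical choices.

```python
# Hypothetical helper: idealized transfer-time estimate, not a census tool.
def transfer_seconds(size_bytes: float, link_bits_per_s: float, efficiency: float = 1.0) -> float:
    """Seconds to move size_bytes over a link rated at link_bits_per_s.

    efficiency in (0, 1] is an assumed fraction of the nominal rate actually achieved;
    the default of 1.0 ignores protocol overhead and competing traffic.
    """
    if link_bits_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link rate must be positive and efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_s * efficiency)  # bytes -> bits, then divide by rate


TB = 10**12  # assumed decimal terabyte, in bytes

if __name__ == "__main__":
    # Compare one 1 TB run on two hypothetical campus link speeds.
    for label, rate in [("1 Gb/s", 1e9), ("10 Gb/s", 10e9)]:
        hours = transfer_seconds(1 * TB, rate) / 3600
        print(f"1 TB run over {label}: {hours:.2f} h")
```

At the nominal rate, a 1 TB run occupies a 1 Gb/s link for about 2.2 hours and a 10 Gb/s link for about 13 minutes; real transfers are slower, which is what the `efficiency` parameter is meant to approximate.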
4. The Research Data Census Was a Three-Way Institutional Partnership
• Information Technology Center
• University Library
• Vice President for Research & Economic Development
5. Response Rates and Demographics
6. What Types of Research Data Do You Have?
7. How Do You Store Your Data?
8. How Large is Your Research Data?
9. Who Do You Share Your Data With and When?
10. Statistically Significant Findings
•Researchers who share their data, regardless of who they share it with (colleagues, students, or non-MSU
researchers), also tend to download data from other sources or repositories (78 percent of people sharing their
data also download data, versus 37 percent of people not sharing their data; p-value: 1.67 × 10⁻⁷).
•Researchers with large research data sets tend to download data from other sources or repositories (90 percent of
people with data sets above one terabyte also download data, versus 42 percent of people with data sets
below 10 Gb; p-value: 1.58 × 10⁻⁵).
•Researchers who back up their data also tend to annotate it (55 percent of people who back up their data
also annotate it, versus 22 percent of people who don't back up their data; p-value: 5 × 10⁻³).
•Researchers with large research data sets tend to annotate them (62 percent of people with data sets above one
terabyte also annotate their data, versus 39 percent of people with data sets below 10 Gb; p-value: 0.024).
•Researchers interested in learning more about data infrastructure and services who do not back up their data
cite technical barriers as their main reason for not doing so (p-value: 0.014).
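The census summary reports percentages and p-values but not the underlying counts or the test used. As a hedged illustration of how a difference such as 78 versus 37 percent can be assessed, the sketch below implements a standard two-sided two-proportion z-test using only the Python standard library. The function name `two_proportion_z_test` and the counts in the usage line are hypothetical; the census may have used a different test (for example, a chi-square or Fisher exact test), so this code is not expected to reproduce the reported p-values.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test for the difference between proportions x1/n1 and x2/n2.

    Returns (z statistic, p-value) using the pooled-proportion standard error.
    """
    if min(n1, n2) <= 0 or not (0 <= x1 <= n1 and 0 <= x2 <= n2):
        raise ValueError("counts must satisfy 0 <= x <= n and n > 0")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under the null of no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero (all or no successes in both groups)")
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts chosen only to mirror the 78% vs 37% comparison.
    z, p = two_proportion_z_test(78, 100, 37, 100)
    print(f"z = {z:.2f}, two-sided p = {p:.2e}")
```

The pooled standard error is used because the null hypothesis assumes both groups share a single proportion; for small groups an exact test is usually the safer choice.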
11. Qualitative Interview Findings
•Researchers don’t usually describe their data by size, although many know the exact size of their data. Instead,
their standard practice is to describe how they transfer their files (via email, on hard drives, in cloud
services, etc.).
•Researchers' sense of when and how data is disseminated and shared varied widely.
•There is no common definition of “big data”: definitions change between disciplines, and researchers build “bigger
data” by aggregating many small research results.
•Without exception, interviewees described their research practices as involving collaboration with others, both
inside and outside the institution.
•All researchers responded positively when asked if they would engage MSU Library services that focus on data
set annotation and metadata markup, assistance with deposit in relevant data repositories, and educational
programs and training on campus IT resources.
12. Impacts of the RDC
• Creation of a multi-stakeholder proposal ($500K) to the National Science Foundation for investment
in a science network for the Bozeman campus. PI: Jerry Sheehan, Co-PIs: Kenning Arlitsch, Ben
Poulter, Phil Stewart, and Mark Young.
• Input from the Research Data Census and the NSF Proposal is Driving FY16 Capital Investments for
Campus.
• New Collaboration between ITC and the Library to Bundle A Set of Data Services and Infrastructure
for the Montana State University Research Community.
• Formal Publication of Survey Results in On-Line Educause Review (Sept/Oct 2015).
• Modification of the Survey Instrument, Adoption of Instrument by Other MSU Campuses, and
Sharing of Instrument with Higher Education Community.