CSU-ACADIS_dataManagement101-20120217
1. Data Literacy For the Arctic and Below:
Help your data help you
(and satisfy NSF requirements in the process!)
Lynn Yarmey and Liz Schlagel – National Snow and Ice Data Center
2. Where we are going today:
• Why care about data management?
• “What the heck is metadata?” and other jargon
Data Types
Data Stages
Storage
Versioning
Naming Conventions
Metadata and Standards
Data Sharing and Access
Archiving and Preservation
• Pulling this all together – Data Management Plans
• From the lab to ACADIS (and beyond)
EVERYTHING we talk about will be able to go into your Data Management Plan (DMP)
3. Why Care – Big Picture
Photo from: http://www.mediafuturist.com/2010/11/gues-ddb-blog-future-of-marketing-media-data-is-new-oil.html
4. Why Care – Big Picture
These days, Dr. Hodes said, “the old model in
which researchers jealously guarded their data is
no longer applicable.”
http://www.nytimes.com/2011/04/04/health/04alzheimer.html
Image courtesy of:http://www.sciencemag.org/content/331/6018.cover-expansion
5. Why Care – Your work
http://www.phdcomics.com/comics/archive.php?comicid=382
You are a Data Manager
6. Data Management is Important! Because……
… Reproducibility is the foundation of science
… Journals are starting to require data deposit
… You want to get credit for producing data (data citations)
… Others can use and build on your work (data reuse)
… Your new instruments collect a LOT more data than older ones
… Recreating a figure from a 2006 paper shouldn’t be painful
… Funders tell us so (See NSF, NIH, NOAA, etc)
… Students graduate!
9. Data Types
What types of data do
you collect or generate?
11. Data Stages
Raw
Organized
Standardized
Transformed
Processed
Quality Controlled
Analyzed
Summarized
Presented/Published
Photo courtesy of Zillow Database gurus:
http://www.zillow.com/blog/2007-11-02/we-know-how-to-celebrate-halloween/
13. Data Storage
http://chronicle.texterity.com/chronicle/20110318a?pg=16#pg16
14. Data Storage
Tips:
- 1 working copy on your computer
- 1 copy on infrastructure near you
- 1 copy on infrastructure far away
- ‘Final’ copy with a data center/archive
- Get help! (CSS, CSU Libraries, etc.)
(Note: These won’t work well in all cases, e.g. for very large data, but they are a good start for coming up with a storage plan)
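Keeping several copies only helps if the copies still match. As a minimal sketch (not from the original slides), the check below compares SHA-256 checksums of the working copy against the other copies. The function names `sha256_of` and `copies_match` are hypothetical, and the paths you pass in are your own.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(working_copy, *other_copies):
    """Return True if every other copy has the same checksum as the working copy."""
    reference = sha256_of(working_copy)
    return all(sha256_of(Path(copy)) == reference for copy in other_copies)
```

Running a check like this on a schedule is one way to notice a corrupted or stale copy before you need it.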
16. Versioning
Tips:
- communicate with your lab/research group and agree on
a versioning system (file names, what makes a new version)
- WRITE IT DOWN and post/save to a shared space.
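Once a group has agreed on a versioning system, a small helper can apply it consistently. The sketch below assumes one possible convention, a `_vNN` tag just before the file extension; that convention and the name `next_version` are illustrative assumptions, not something the slides prescribe.

```python
import re

# Assumed convention: "_v" plus digits, immediately before the file extension.
VERSION_PATTERN = re.compile(r"_v(\d+)(?=\.[^.]+$)")


def next_version(filename):
    """Return the file name with its _vNN counter increased by one."""
    match = VERSION_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"no _vNN version tag found in {filename!r}")
    digits = match.group(1)
    # Keep the zero padding used in the existing name (v01 becomes v02).
    bumped = str(int(digits) + 1).zfill(len(digits))
    return filename[:match.start(1)] + bumped + filename[match.end(1):]


print(next_version("ACADIS_temp_raw_v01.csv"))
```

What counts as a new version (a new processing run, a correction, a new QC pass) still has to be decided and written down by the group.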
19. File naming conventions – Better example
Make names unique!
Include (as appropriate):
- Project name or acronym
- Study title
- Location
- Data type
- Researcher initials
- Date
- Data stage
- Version number
- File type
DO – Use_underscores-or-dashes
DO NOT – Use spaces &/or special characters!
For more info - https://www.dataone.org/content/assign-descriptive-file-names
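The naming tips above can be sketched as a small function that assembles a file name from its parts and replaces spaces and special characters with dashes. The order of the parts, the `YYYYMMDD` date form, the `vNN` version tag, and the example values are all assumptions chosen for illustration; adapt them to whatever your group agrees on.

```python
import re
from datetime import date


def clean_part(text):
    """Replace runs of spaces and special characters with a single dash."""
    return re.sub(r"[^A-Za-z0-9]+", "-", text.strip()).strip("-")


def build_file_name(project, location, data_type, initials, day, stage, version, extension):
    """Join the name parts with underscores and append the file type."""
    parts = [
        project,
        location,
        data_type,
        initials,
        day.strftime("%Y%m%d"),  # assumed date form, sorts chronologically
        stage,
        f"v{version:02d}",       # assumed version tag, e.g. v01
    ]
    return "_".join(clean_part(part) for part in parts) + "." + extension.lstrip(".")


# Made-up example values, for illustration only.
print(build_file_name("ACADIS", "Toolik Lake", "soil temp", "LY",
                      date(2012, 2, 17), "raw", 1, "csv"))
# prints "ACADIS_Toolik-Lake_soil-temp_LY_20120217_raw_v01.csv"
```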
21. Metadata
“Data about Data”
But what does that MEAN?!
22. Metadata – The bottom line
What would someone* unfamiliar with your
data (and possibly your research) need in order
to find, evaluate, understand, and reuse them?
*How about someone:
- who works in your lab?
- from a different lab in your field?
- who is in a related interdisciplinary field?
- who researches a completely different area?
- who works for a newspaper? Congress?
24. Metadata – Example
Temperature
31.5
For what purpose?
Instrument precision/accuracy?
When was the sensor
last cleaned/calibrated?
AKA – T, Temp, degC, C, °F… lots of different names!
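To make the point concrete, here is a minimal sketch that checks which of the questions above a bare "Temperature 31.5" record leaves unanswered. The field names in `REQUIRED_FIELDS` are hypothetical labels for the questions on this slide, not part of any metadata standard.

```python
# Hypothetical field names standing in for the questions on the slide.
REQUIRED_FIELDS = ("unit", "purpose", "instrument_accuracy", "last_calibrated")


def missing_metadata(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# The bare value from the slide: a name and a number, nothing else.
bare = {"name": "Temperature", "value": 31.5}
print(missing_metadata(bare))
# prints ['unit', 'purpose', 'instrument_accuracy', 'last_calibrated']
```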
25. Metadata
Just like file names, metadata
does its job best when it is:
- consistent
- documented
- for people
- such that computers are happy
Enter Metadata Standards
26. Metadata Standards – Examples
Local (people -> people)
Naming Conventions
Standard Operating Procedures
Beyond (people -> computers -> people)
ISO 19115 (http://www.fgdc.gov/metadata/geospatial-metadata-standards#nap)
GCMD DIF (http://gcmd.nasa.gov/User/difguide/difman.html)
EML (http://knb.ecoinformatics.org/software/eml/)
27. Metadata Standards – Example
Scripps Institution Of Oceanography Pier Water Temperature - Station Dataset (CCELTER). [Online]. Scripps Institution
of Oceanography Shore Station Program [Producer]. Oceaninformatics Datazoo [Distributor]. (February 28, 2011).
http://oceaninformatics.ucsd.edu/datazoo/data/ccelter/datasets?action=summary&id=15
28. Metadata Standards – Example (XML)
<attributeName>Sea Surface Temperature</attributeName>
<attributeDefinition>temperature measurement</attributeDefinition>
<measurementScale>
  <unit>celsius</unit>
  <numericDomain><numberType>real</numberType></numericDomain>
</measurementScale>
<missingValueCode>
  <code>-99</code>
  <codeExplanation>missing value</codeExplanation>
</missingValueCode>
<missingValueCode>
  <code>-999</code>
  <codeExplanation>missing value</codeExplanation>
</missingValueCode>
<missingValueCode>
  <code>-99999</code>
  <codeExplanation>missing value</codeExplanation>
</missingValueCode>
<methods>
  <description> subject { seaSurface } </description>
  <description> calculationType { calculated }; calculationTypeDetail { average }; calculationInterval { day }; </description>
</methods>
Scripps Institution Of Oceanography Pier Water Temperature - Station Dataset (CCELTER). [Online]. Scripps Institution
of Oceanography Shore Station Program [Producer]. Oceaninformatics Datazoo [Distributor]. (February 28, 2011).
http://oceaninformatics.ucsd.edu/datazoo/data/ccelter/datasets?action=summary&id=15
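Machine-readable metadata like this lets a program handle the data without a person reading the documentation first. The sketch below is a rough illustration: it reads the missing-value codes from a shortened copy of the fragment above and masks them in a list of readings. The wrapping `<attribute>` element, the sample readings, and the function names are assumptions added so the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's fragment, wrapped in an assumed root element
# so that it parses as a single XML document.
ATTRIBUTE_XML = """
<attribute>
  <attributeName>Sea Surface Temperature</attributeName>
  <measurementScale><unit>celsius</unit></measurementScale>
  <missingValueCode><code>-99</code></missingValueCode>
  <missingValueCode><code>-999</code></missingValueCode>
  <missingValueCode><code>-99999</code></missingValueCode>
</attribute>
"""


def missing_codes(xml_text):
    """Collect every <code> value in the metadata as a float."""
    root = ET.fromstring(xml_text)
    return {float(code.text) for code in root.iter("code")}


def mask_missing(values, codes):
    """Replace readings that equal a missing-value code with None."""
    return [None if value in codes else value for value in values]


codes = missing_codes(ATTRIBUTE_XML)
# Made-up readings, for illustration only.
print(mask_missing([14.2, -99, 15.0, -99999], codes))
# prints [14.2, None, 15.0, None]
```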
30. Metadata – Standards
They exist!
If everyone used them, you could do very cool
science!
Compliance is often a lot of work
There are lots of them
HOWEVER, there are baby steps to get started
31. Metadata – Yikes and/or Yay!
Tips for the short-term:
- Get help!
  - support@aoncadis.org, librarians, standards groups, data centers, domain communities, tools
- Get your own house in order
  - use common date formats, codes, smart file names
  - WRITE EVERYTHING DOWN! (keep good readme files)
- Put in the time early on to implement a standard
  - most have minimum compliance levels with options to get more detailed
- Stay flexible
Tips for the long-term:
- Get help!
- Watch for Best Practices and standards in your field
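As one example of "common date formats", the sketch below rewrites dates into the ISO 8601 form `YYYY-MM-DD`. The list of accepted input formats is an assumption (note that it reads `02/17/2012` month-first, US style); replace it with the formats that actually occur in your files.

```python
from datetime import datetime

# Assumed input formats; "%m/%d/%Y" treats slashed dates as month/day/year.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def to_iso_date(text):
    """Return the date as YYYY-MM-DD, trying each known input format in turn."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


print(to_iso_date("02/17/2012"))
# prints "2012-02-17"
```

Writing the accepted formats down in a readme file matters as much as the conversion itself, since a date like 03/04/2012 is ambiguous without it.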
33. Sharing and Access
Levels, from Low to High “Funder Happiness”:
Low
- Not sharing your data (note: appropriate in a few cases)
- Emailing your data to a researcher who asks for it
- Posting your data on your project or lab website
- Posting your data AND METADATA on your website
- Submitting your metadata to an online catalog (ex. ACADIS)
- Submitting your data and metadata to an appropriate repository and getting a permanent ID (DOI, EZID, etc)
  - Data Repositories (ex. ACADIS, GenBank, Dryad)
  - CSU Digital Repository
High
35. Archiving
Terminology Fuzziness in the data world:
Archival = Preservation (close enough)
Archival ≠ Storage!
Tips for the short-term:
- Leave yourself time at the end of a project to clean up
- Choose open, non-proprietary formats when you can (ex. CSV > XLS)
Tips for the long-term:
- Work with NREL data experts: IBIS team, LTER, Computer
Systems Support
37. Data Management Plans (DMPs)
NSF Data Management Plan - General Requirement (as of 2011-10-10)
1. the types of data, samples, physical collections, software, curriculum materials, and
other materials to be produced in the course of the project;
2. the standards to be used for data and metadata format and content (where existing
standards are absent or deemed inadequate, this should be documented along with any
proposed solutions or remedies);
3. policies for access and sharing including provisions for appropriate protection of
privacy, confidentiality, security, intellectual property, or other rights or requirements;
4. policies and provisions for re-use, re-distribution, and the production of derivatives;
and
5. plans for archiving data, samples, and other research products, and for preservation of
access to them.
From http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_2.jsp#dmp
39. Data Management Plans (DMPs)
Tips for the short-term:
- Check your Directorate/Agency policy before every
proposal
- Keep it real(istic): you will need to report on your actions in
your project report and next proposal.
Tips for the long-term:
- Keep working on implementing metadata standards
- Watch out for emerging trends, repositories, tools
- Partner with data people (data centers, libraries, etc)
41. CADIS - Data Support for NSF-Arctic Program
• Cooperative Arctic Data and Information System
• The Mandate:
– Develop advanced data management system for the
Arctic Observing Network (AON)
– Preserve metadata and data
– Serve NSF-funded AON investigators
42. Transition to Advanced CADIS
(Image label: NSF Arctic Field Sites)
• A new mandate
– For all NSF programs that collect Arctic data
– Serve NSF-funded Arctic investigators by archiving
data from many field programs
• Other changes:
– An advisory group
– Value-added products
– Two full time Data Curators
– New data types – biological, social, terrestrial,
ecological
43. Transition to Advanced CADIS (ACADIS)
• A new mandate
– For all NSF/ARC programs that collect Arctic data
– Serve NSF-funded Arctic investigators by archiving
data from field programs and individual investigators
• Other changes:
– An advisory group
– Value-added products
– Two full time Data Curators
– New data types – biological, social, terrestrial,
ecological
– Expanded metadata tool for diverse disciplines
44. ACADIS – Metadata and Standards
Metadata Profile
• Supports established standards
• Based on IPY-DIS profile.
• Compatible with GCMD, FGDC, ISO…
• Profile driven interface validates fields
• NASA GCMD vocabulary used where possible
45. ACADIS Data Management Plan Template
Example guidance from the ACADIS DMP template:
• Assists investigators in developing the DMP now required for all NSF proposals
• Linked from aoncadis.org
46. Beyond ACADIS - Other Resources
Local: NREL
• IBIS
Local: CSU
• CSU Digital Repository
Remote: Centralized and Domain Specific
• Knowledge Network for Biocomplexity
• ESA Ecological Archives
• DAAC at ORNL
• Advanced Cooperative Arctic Data and Information Service
Federated and distributed
• Data Conservancy
• DataONE
47. Beyond ACADIS – Other Resources
General Info and help -
Earth Science Information Partners (ESIP): http://wiki.esipfed.org/
UVA Libraries: http://www2.lib.virginia.edu/brown/data/
Data Management Plan and other tools –
DMP Tool: https://dmp.cdlib.org/
DataOne: https://www.dataone.org/cattools/Data%20and%20Metadata%20Management
Metadata -
Excel Plug-in tool (in development):
http://www.cdlib.org/cdlinfo/2011/09/01/facilitating-data-management-dcxl/
Lists of Standards (not complete!)
for bio, climate, ecology, oceanography - http://marinemetadata.org/conventions
Stanford-based portal for medical/bio - http://bioportal.bioontology.org/resources
48. Questions?
Contact me: Lynn.yarmey@nsidc.org
For questions, help, or to submit Arctic data:
support@aoncadis.org
Visit ACADIS: www.aoncadis.org
Special thanks for pilfered slides and content approaches: Florence Fetterer, Carly Strasser, and Dorothea Salo
Editor's Notes
How about physical samples? Read your DMP guidelines carefully!
NSF OPP meeting – open to including programmer time in budget requests to help with this kind of work
CADIS was funded initially in 2007, and this is the 3rd AMS IIPS presentation I’ve given on it.
For the Arctic Observing Network:
- Mostly field observations
- Serve NSF-funded AON investigators by archiving AON data
- Not so much the wider community
Assumptions starting out:
- The AON data portal would support full integration of a diverse collection - scientists could archive their data AND find all data relevant to a location or process
- Informatics and cyberinfrastructure would play a large role
Implications were that:
- … all data have browse imagery and complete documentation;
- … time series or fields can be plotted online;
- … and all metadata are in a relational database
For all NSF programs that collect Arctic data:
- Office of Polar Programs (OPP) Division of Arctic Sciences (ARC)
- AON, Arctic System Sciences (ARCSS), Arctic Natural Sciences (ANS) and the Arctic Social Sciences Program (ASSP) within OPP/ARC
Serve NSF-funded Arctic investigators by archiving data from many field programs:
- Still little or no remote sensing data
- Emphasis is still on serving those contributing data first, but will begin to shift to making the ACADIS portal more useful for those who need to use the data held or cataloged by ACADIS