Data and Donuts: Data organization

How to organize your data using folders, file names and spreadsheets.

  1. Data Organization
     C. Tobin Magle, PhD
     Feb. 28, 2017, 10:00-11:30 a.m.
     Morgan Library Computer Classroom 175
     *Inspired by content from Data Carpentry
  2. The research cycle
     (diagram: Hypothesis, Experimental design, Data, Results, Article, with Data Management Plans)
  3. Main topics
     • Hierarchical organization
       • Folders in folders
       • Open Science Framework
     • File naming
       • Human readability
       • Machine readability
     • “Tidy” data in spreadsheets
  4. Folder systems
     • Organize your data hierarchically
     • Identify ways to divide your data into categories (attributes)
     • Top-level organization should follow the most important attribute
  5. Hierarchical organization: putting your files into a folder system
     (diagram: example folder tree rooted at my_project, with folders Data, Notes, protocols, manuscripts, Paper1, Paper2, Figures, Text, References)
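To make the idea concrete, here is a minimal Python sketch that creates a folder tree with `pathlib`. The nesting in `LAYOUT` is an assumption, one plausible reading of the slide's example tree rather than a copy of it, and `create_project` is a hypothetical helper name.

```python
from pathlib import Path
import tempfile

# Assumed nesting: one plausible reading of the slide's example tree.
LAYOUT = [
    "Data",
    "Notes/protocols",
    "manuscripts/Paper1/Figures",
    "manuscripts/Paper1/Text",
    "manuscripts/Paper1/References",
    "manuscripts/Paper2",
]


def create_project(root: Path, layout: list[str]) -> list[Path]:
    """Create each folder in `layout` under `root` and return the paths."""
    created = []
    for relative in layout:
        folder = root / relative
        # parents=True also creates intermediate folders such as "manuscripts".
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    # Build the tree in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "my_project"
        create_project(project, LAYOUT)
        for path in sorted(project.rglob("*")):
            print(path.relative_to(project).as_posix())
```

Writing the layout down as data also documents it: the same list can go into the project README.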
  6. Questions to ask
     • What kinds of files are there? (See your data inventory.)
     • How could you group them?
       • Project?
       • Time?
       • Location?
       • File type?
     • What are the most important attributes?
  10. Example: Lou, the first-year student
      Lou is a first-year graduate student working on a project in a biomedical research laboratory. He’s trying to decipher data left by a former postdoc as a starting point for his thesis project. For one year, the postdoc recorded weight daily and cytokine levels monthly from 16 mice. Half were infected with a parasite; half were treated with saline.
      • What are the attributes of his project?
      • How would you rank these attributes?
      Attributes
      • Time
      • Infection status
      • Data type
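Once the attributes are ranked, the folder for any file follows directly: the most important attribute becomes the top-level folder. This is a hedged Python sketch; the ranking in `RANKED_ATTRIBUTES`, the attribute keys, and the `folder_for` helper are hypothetical, since the slide leaves the ranking as an exercise.

```python
from pathlib import PurePosixPath

# Hypothetical answer to the exercise: data type first, then infection
# status, then time. Other rankings are equally valid.
RANKED_ATTRIBUTES = ["data_type", "infection_status", "time"]


def folder_for(record: dict[str, str], ranking: list[str]) -> PurePosixPath:
    """Return a relative folder path with the highest-ranked attribute on top."""
    return PurePosixPath(*(record[attribute] for attribute in ranking))


example = {"data_type": "weight", "infection_status": "infected", "time": "2016-01"}
print(folder_for(example, RANKED_ATTRIBUTES))  # weight/infected/2016-01
```

Reordering `RANKED_ATTRIBUTES` reorganizes the whole tree, which is why the top-level choice matters most.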
  11. Exercise: Organize files
      • Download Lou’s files (look in the README file for insight): http://tinyurl.com/hvna4mg
      • Create a hierarchical folder structure for Lou
      • Drag his files into the correct folders
      • Fix Lou’s README
      • Bonus: think about how you’d organize your data
  12. Tool: Open Science Framework
      • Components
      • Add-ons
      • Contributors
      • Wiki
      http://help.osf.io/m/collaborating/l/524109-using-the-wiki
      http://www.slideshare.net/DuraSpace/121014-slides-roadmap-to-the-future-of-share
  13. Organization tips
      • Be consistent
      • One directory per project
      • Separate components for
        • Raw data
        • Processed data
        • Code
        • Output
      • Make raw data read-only
      • Make README files
      http://help.osf.io/m/60347/l/611391-organizing-files
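"Make raw data read-only" can be done from a file manager, or scripted. A minimal Python sketch, assuming the raw files sit under one folder; `make_read_only` is a hypothetical helper that clears the write permission bits with `Path.chmod`. On Windows, Python's `chmod` only toggles the read-only flag, which clearing the owner write bit still sets.

```python
import stat
from pathlib import Path


def make_read_only(folder: Path) -> int:
    """Remove write permission from every file under `folder`; return the count."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    count = 0
    for path in folder.rglob("*"):
        if path.is_file():
            mode = stat.S_IMODE(path.stat().st_mode)
            # Clear only the write bits; read and execute bits are unchanged.
            path.chmod(mode & ~write_bits)
            count += 1
    return count
```

For example, `make_read_only(Path("my_project/raw_data"))`, where the folder name is hypothetical. Processed data and output stay writable because they live in separate components.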
  14. Components
      • “Subprojects”
      • Separate privacy settings, contributors, wiki, add-ons, and files
      • Examples:
        • Different projects: https://osf.io/82fba/
        • Clinical: https://osf.io/gq4mz/
        • Manuscript: https://osf.io/if7ug/
        • Collaboration: https://osf.io/ezcuj/
  15. Demo: Getting started with OSF
      1. Create a project
      2. Add components
      3. Add files
  16. Don’t panic!
      • Just try something
      • There’s no right answer
      • Be consistent
      • Write a README.txt file
      Image: http://4vector.com/i/free-vector-don-t-panic-clip-art_103946_Dont_Panic_clip_art_hight.png
  17. File naming conventions
      Make file names both human- and machine-readable.
  18. Use descriptive names
      • Bad name: file.txt
      • OK name: 05-07-2016-mouse-data.txt
      • Good name: 2016-05-07-mouse-weight.tsv
      • Human readability: the name contains information about the content
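Building names with a small function keeps them descriptive and consistent. A Python sketch; `descriptive_name` is a hypothetical helper, and the lowercase, dash-separated style is an assumption modeled on the slide's "good" example.

```python
from datetime import date


def descriptive_name(day: date, description: str, extension: str) -> str:
    """Return a name like YYYY-MM-DD-short-description.ext."""
    words = description.lower().split()  # "Mouse weight" -> ["mouse", "weight"]
    return f"{day.isoformat()}-{'-'.join(words)}.{extension}"


print(descriptive_name(date(2016, 5, 7), "Mouse weight", "tsv"))
# 2016-05-07-mouse-weight.tsv
```

`date.isoformat()` always zero-pads the month and day, so every name has the same fixed-width date prefix.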
  19. Go from general to specific
      • Bad name: rep1-5-7-2016-gene-expression.csv
      • Good name: 2016-05-07-gene-expression-rep1.csv
      • Machine readability: files can be sorted meaningfully
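Why year-first dates sort meaningfully: plain string sorting, as done by Python's `sorted` and many command-line tools, puts zero-padded ISO 8601 dates in chronological order, while month-first dates without padding do not. The dates below are made up for illustration.

```python
from datetime import date

# Made-up sampling dates, deliberately listed out of order.
days = [date(2016, 5, 7), date(2015, 12, 1), date(2016, 11, 3)]

# Year first, zero-padded (the slide's "good" pattern).
iso_names = [f"{d.isoformat()}-gene-expression-rep1.csv" for d in days]
# Month first, no padding (the date style of the slide's "bad" example).
us_names = [f"{d.month}-{d.day}-{d.year}-gene-expression-rep1.csv" for d in days]

# Prints 2015-12-01..., then 2016-05-07..., then 2016-11-03... (chronological).
for name in sorted(iso_names):
    print(name)

# Prints 11-3-2016..., then 12-1-2015..., then 5-7-2016... (not chronological).
for name in sorted(us_names):
    print(name)
```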
  20. Avoid abbreviations
      • Bad name: “sprlbgp1”
      • Good name: “spencer_lab_group_1”
      • Human readability: no one else understands your acronyms
  21. Avoid spaces
      • Alternatives:
        • Dashes-are-cool.txt
        • I_also_like_underscores.txt
        • CamelCaseIsNeatToo.txt
      • Machine readability: spaces act as delimiters on the command line and in many scripts
      • Human readability: these alternatives still delineate words
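To see why spaces cause trouble, Python's `shlex.split` splits a command line the way a POSIX shell would. The `wc -l` command and the file names here are only illustrative.

```python
import shlex

# Unquoted, a name with spaces turns into three separate arguments.
print(shlex.split("wc -l mouse weight data.txt"))
# ['wc', '-l', 'mouse', 'weight', 'data.txt']

# With underscores (or dashes, or CamelCase) the name stays one argument.
print(shlex.split("wc -l mouse_weight_data.txt"))
# ['wc', '-l', 'mouse_weight_data.txt']

# Quoting also works, but only if every command remembers to do it.
print(shlex.split("wc -l 'mouse weight data.txt'"))
# ['wc', '-l', 'mouse weight data.txt']
```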
  22. Avoid special characters
      • Bad characters: ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ' "
      • Machine readability: they can have special meanings in scripting languages
        • Example: in a Unix shell, ~ expands to your home directory
      • Alternatives: underscore (_), dash (-), dot (.)
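A hedged Python sketch for cleaning up existing names: `safe_name` (a hypothetical helper) replaces each run of characters other than ASCII letters, digits, underscore, dash, and dot with an underscore. Treating everything outside that set as unsafe is a conservative assumption; it also replaces spaces and non-ASCII letters.

```python
import re

# Anything that is not an ASCII letter, digit, "_", "-" or "." counts as unsafe.
UNSAFE_RUN = re.compile(r"[^A-Za-z0-9._-]+")


def safe_name(name: str, replacement: str = "_") -> str:
    """Replace each run of unsafe characters, then trim stray replacements."""
    cleaned = UNSAFE_RUN.sub(replacement, name.strip())
    return cleaned.strip(replacement)


print(safe_name("Mouse weight (Jan) #2.txt"))  # Mouse_weight_Jan_2.txt
print(safe_name("~/lou's data.csv"))  # lou_s_data.csv
```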
  23. Be consistent
      • Establishing standards makes data easier to find
      • Extending those standards to everyone who works on a project is even better
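Consistency is easier to keep when it can be checked automatically. A minimal Python sketch, assuming a hypothetical project convention of an ISO date, a lowercase dash-separated description, and a lowercase extension; `CONVENTION` and `nonconforming` are illustrative names.

```python
import re

# Hypothetical convention: YYYY-MM-DD, then lowercase words joined by dashes,
# then a lowercase extension, e.g. 2016-05-07-mouse-weight.tsv
CONVENTION = re.compile(r"\d{4}-\d{2}-\d{2}(-[a-z0-9]+)+\.[a-z0-9]+")


def nonconforming(names: list[str]) -> list[str]:
    """Return the names that do not match the convention exactly."""
    return [name for name in names if not CONVENTION.fullmatch(name)]


files = ["2016-05-07-mouse-weight.tsv", "05-07-2016-mouse-data.txt", "file.txt"]
print(nonconforming(files))  # ['05-07-2016-mouse-data.txt', 'file.txt']
```

Sharing the pattern with everyone on the project turns the naming standard into something each person can verify.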
  24. Renaming files
      • Tools to automate file renaming:
        • Bulk Rename Utility (Windows, free)
        • Renamer 5 (Mac)
        • PSRenamer (Linux, Mac, or Windows, free)
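The tools above rename files interactively; the same kind of bulk rename can also be scripted. This Python sketch converts a leading month-day-year date into a year-first ISO date. It assumes the dates really are month first (a name like 05-07-2016 is ambiguous otherwise), and `iso_rename_plan` and `apply_plan` are hypothetical helpers. Building a plan first lets you review the changes before anything is renamed.

```python
import re
from pathlib import Path

# A leading month-day-year date such as "5-7-2016-" or "05-07-2016-".
US_DATE = re.compile(r"^(\d{1,2})-(\d{1,2})-(\d{4})-")


def iso_rename_plan(folder: Path) -> list[tuple[Path, Path]]:
    """List (old, new) paths for M-D-YYYY-rest -> YYYY-MM-DD-rest; renames nothing."""
    plan = []
    for path in sorted(folder.iterdir()):
        match = US_DATE.match(path.name)
        if path.is_file() and match:
            month, day, year = (int(part) for part in match.groups())
            rest = path.name[match.end():]
            new_name = f"{year:04d}-{month:02d}-{day:02d}-{rest}"
            plan.append((path, path.with_name(new_name)))
    return plan


def apply_plan(plan: list[tuple[Path, Path]]) -> None:
    """Rename files, refusing to overwrite anything that already exists."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(new)
        old.rename(new)
```

Print `iso_rename_plan(folder)` first, and call `apply_plan` only once the list looks right.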
  25. Exercise: Rename Lou’s files
      • Use descriptive names
      • Go from general to specific
      • Avoid abbreviations, spaces, and special characters
      • Be consistent
  26. Tidy data
      How to organize your data efficiently in spreadsheets
  27. Spreadsheets as lab notebooks
      • Color coding
      • Formatting
      • Notes
      • Calculations
      • Graphs/tables
  28. Downsides
      • Computers don’t understand notes, formatting, or color coding
      • Calculations, graphs, and tables in spreadsheets are inefficient
      • “Tidy data” + automation = saved time
  29. Using spreadsheets wisely
      • Don’t put multiple tables in one sheet
      • Don’t use multiple sheets
      • Use descriptive field names
      • Don’t mix notes and data
  30. Tidy data
      1. Columns as variables
         • Don’t combine multiple pieces of information in one column
      2. Rows as observations
         • One measured value
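Both rules can be applied with a short reshaping step. The Python sketch below starts from an invented untidy table, with one column per day and the mouse ID and infection status packed into one cell, and turns it into one row per observation with one column per variable. The table layout, the values, and the `tidy` helper are all hypothetical.

```python
# Invented untidy table: one column per day, and ID plus status in one cell.
wide = [
    {"mouse": "M01-infected", "2016-01-01": 20.1, "2016-01-02": 20.4},
    {"mouse": "M02-saline", "2016-01-01": 19.8, "2016-01-02": 19.9},
]


def tidy(rows: list[dict]) -> list[dict]:
    """Return one row per (mouse, date) observation, one column per variable."""
    long_rows = []
    for row in rows:
        # Split the combined cell into two separate variables.
        mouse_id, infection_status = row["mouse"].split("-", 1)
        for column, value in row.items():
            if column == "mouse":
                continue
            # Every other column header is a date: one observation per cell.
            long_rows.append({
                "mouse_id": mouse_id,
                "infection_status": infection_status,
                "date": column,
                "weight_g": value,
            })
    return long_rows


for observation in tidy(wide):
    print(observation)
```

The tidy version has more rows but a fixed set of columns, which is the shape most analysis tools expect.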
  31. Exercise: Tidy Lou’s data
      • Open MouseInventory.xls
        • Is he using spreadsheets wisely?
        • Is each column a variable?
        • Is each row an observation?
      • Open the January files for both weight and cytokines
        • What variables are being measured? (i.e., what columns should we have?)
        • Can we combine some of these tables?
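Once the weight and cytokine tables are tidy, they can be combined by matching rows on shared variables. A minimal Python sketch of a left join on mouse ID and date; the column names, values, and `left_join` helper are hypothetical. Because cytokines were measured monthly and weight daily, most weight rows have no cytokine match and get `None`.

```python
# Hypothetical tidy January tables with invented values.
weights = [
    {"mouse_id": "M01", "date": "2016-01-01", "weight_g": 20.1},
    {"mouse_id": "M01", "date": "2016-01-02", "weight_g": 20.4},
]
cytokines = [
    {"mouse_id": "M01", "date": "2016-01-01", "cytokine_level": 12.0},
]


def left_join(left: list[dict], right: list[dict], keys: list[str]) -> list[dict]:
    """Keep every left row; add right-hand columns where the keys match.

    Assumes each key combination appears at most once in `right`.
    """
    index = {tuple(row[k] for k in keys): row for row in right}
    extra = [column for column in (right[0] if right else {}) if column not in keys]
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys), {})
        joined.append({**row, **{column: match.get(column) for column in extra}})
    return joined


for row in left_join(weights, cytokines, ["mouse_id", "date"]):
    print(row)
```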
  32. Exercise: Data Carpentry ecology
      • Lesson: http://www.datacarpentry.org/spreadsheet-ecology-lesson/
      • File: https://ndownloader.figshare.com/files/2252083
      • Goal: combine data from the first 2 tabs into one table
      • Make a new tab; don’t edit the raw data!
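Combining two tabs into one table amounts to stacking rows that share the same columns while recording where each row came from. A hedged Python sketch; the tab names, the `source` column, and the `stack` helper are hypothetical, and it assumes the tabs have already been given matching column names (it raises an error otherwise).

```python
def stack(tables: dict[str, list[dict]], source_column: str = "source") -> list[dict]:
    """Concatenate rows from several tables, tagging each with its table name."""
    combined = []
    columns = None
    for name, rows in tables.items():
        for row in rows:
            if columns is None:
                columns = set(row)
            elif set(row) != columns:
                raise ValueError(f"{name}: columns {sorted(row)} differ from {sorted(columns)}")
            # Build a new dict so the input rows (the raw data) are never modified.
            combined.append({source_column: name, **row})
    return combined


# Hypothetical tabs holding the same kind of records.
tabs = {
    "tab_1": [{"plot": 1, "count": 3}],
    "tab_2": [{"plot": 2, "count": 5}],
}
for row in stack(tabs):
    print(row)
```

Like making a new tab in the spreadsheet, `stack` returns a new table and leaves the originals untouched.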
  33. Example: Supplemental_data_1_xls
      • https://figshare.com/articles/Supplemental_data_1_xls/4055544
      • Description: “Table of the results given by HPLC analysis of the samples. Key: Rt, retention time; +, presence of peak; -, absence of peak.”
  34. Example: cck8_xls
      • https://figshare.com/articles/cck8_xls/3505772
      • Description: “This data are from CCK-8 assay and ELISA.”
  35. Need help?
      • Email: tobin.magle@colostate.edu
      • Data Management Services website: http://lib.colostate.edu/services/data-management
      • Data Carpentry: http://www.datacarpentry.org/
      • Software Carpentry: http://software-carpentry.org/
