Assistant Dean, Director Career & Professional Development Center | Helping Students Become Successful Scientists at OHSU | Oregon Health & Science University
1. DATA MANAGEMENT 101
Nicole Vasilevsky, Jackie Wirz and Melissa Haendel
PMCB New Student Orientation
20 September 2013
3. 1 | Data definitions
2 | Dealing with data
3 | How the OHSU Library can help
4. Nicole Vasilevsky, PhD
Project Manager, Ontology Development Group
Jackie Wirz, PhD
Assistant Professor, Bioinformation Specialist
Melissa Haendel, PhD
Assistant Professor, Lead, Ontology Development Group
19. Do you get frustrated with any of the following?
a. Storing data
b. Backing up data
c. Analyzing/manipulating data
d. Finding data produced by other researchers/clinicians
e. Ensuring data are secure
f. Making data accessible to other researchers
g. Controlling access to data
h. Tracking updates to data (i.e., versioning)
i. Creating metadata (i.e., describing the data so they are more useful at a later time or to others)
j. Protecting intellectual property rights
k. Ensuring appropriate professional credit/citation is given for generated data sets
31. Versioning
• Save a copy of every version of a file
• Follow a file naming convention
Data101_PMCB_Retreat_09-20-13_v1
Data101_PMCB_Retreat_09-20-13_v2
Data101_PMCB_Retreat_09-20-13_Final
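Following a naming convention is easier when a script picks the next version number for you. The sketch below is a minimal Python example, assuming the `<name>_v<N>` pattern shown above; the function name `next_version_name` and the `.pptx` extension are illustrative, not part of the slides. It does not replace version control software, which also records what changed and when.

```python
import re
from pathlib import Path


def next_version_name(folder: Path, stem: str, ext: str = "") -> str:
    """Return the next versioned file name in `folder`, e.g. <stem>_v3<ext>.

    Existing files named <stem>_v<N><ext> are scanned and the highest N
    is incremented; if none exist, the result is <stem>_v1<ext>.
    Hypothetical helper, written for this example.
    """
    pattern = re.compile(re.escape(stem) + r"_v(\d+)" + re.escape(ext) + r"$")
    versions = []
    for path in folder.iterdir():
        match = pattern.match(path.name)
        if path.is_file() and match:
            versions.append(int(match.group(1)))
    return f"{stem}_v{max(versions, default=0) + 1}{ext}"


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        stem = "Data101_PMCB_Retreat_09-20-13"
        for n in (1, 2):
            (folder / f"{stem}_v{n}.pptx").touch()
        # A "Final" copy is not numbered, so it does not affect the count.
        (folder / f"{stem}_Final.pptx").touch()
        print(next_version_name(folder, stem, ".pptx"))
        # prints Data101_PMCB_Retreat_09-20-13_v3.pptx
```

Because the "Final" copy is ignored by the pattern, numbering keeps working if the "final" version turns out not to be the last one.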
36. Which of the following do you do?
a. Save copies of data on a disk, USB drive, or computer hard drive
b. Save copies of data on a local server
c. Save copies of data on a central campus server
d. Save copies of data on a web-based or cloud server
e. Store data in a repository or archives
f. Automatically back up files
g. Manually generate backups
h. Restrict access to files
37. Where can you back up your data?
1 copy on your local workstation
1 copy on local/removable media, such as an external hard drive
1 copy on a central server
1 copy in a remote location, such as a cloud server*
*Depending on the type of data, since cloud servers are not always secure
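Copying to several locations only helps if each copy is intact. The following is a minimal Python sketch, assuming the removable drive and the central server are mounted as ordinary folders; the helper names `back_up` and `sha256_of` are hypothetical. It copies a file into each folder and compares SHA-256 checksums. Cloud copies and scheduled (automatic) backups are outside its scope.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 checksum of a file, read in 1 MiB chunks (hypothetical helper)."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `source` into every destination folder and verify each copy.

    Returns {destination folder: True if the copy's checksum matches}.
    """
    original = sha256_of(source)
    results: dict[Path, bool] = {}
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        # copy2 copies the file contents plus metadata such as timestamps
        copied = Path(shutil.copy2(source, dest_dir))
        results[dest_dir] = sha256_of(copied) == original
    return results


if __name__ == "__main__":
    import tempfile

    # Temporary folders stand in for real locations such as an external
    # drive or a mounted central server (hypothetical, not real paths).
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "workstation" / "Data101_results.csv"
        source.parent.mkdir()
        source.write_text("sample,value\nA,1\n")
        targets = [root / "external_drive", root / "central_server"]
        for folder, ok in back_up(source, targets).items():
            print(folder.name, "verified" if ok else "MISMATCH")
```

In practice, `targets` would point at your own mount points; the checksum comparison is what distinguishes a verified backup from a copy that merely exists.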
43. Data standards
Data standards are the rules by which data are described and recorded. In order to share, exchange, and understand data, we must standardize the format as well as the meaning.
http://www.usgs.gov/datamanagement/plan/datastandards.php
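The definition names two things to standardize: the format of a value and its meaning. A minimal Python sketch of both, assuming a lab whose spreadsheets mix several date styles and free-text species names; the format list and the term mapping are hypothetical, and a real project would adopt a published standard (for example ISO 8601 for dates) rather than a home-made table.

```python
from datetime import datetime

# Hypothetical date formats found in one lab's spreadsheets.
DATE_FORMATS = ("%m/%d/%y", "%m-%d-%y", "%Y-%m-%d", "%d %B %Y")

# Hypothetical mapping from free-text labels to one agreed term per species.
SPECIES_TERMS = {
    "mouse": "Mus musculus",
    "mice": "Mus musculus",
    "m. musculus": "Mus musculus",
    "human": "Homo sapiens",
    "h. sapiens": "Homo sapiens",
}


def standardize_date(text: str) -> str:
    """Standardize the format: return the date as ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def standardize_species(text: str) -> str:
    """Standardize the meaning: map a free-text label to the agreed term."""
    term = SPECIES_TERMS.get(text.strip().lower())
    if term is None:
        raise ValueError(f"no standard term for: {text!r}")
    return term


if __name__ == "__main__":
    raw_rows = [("09/20/13", "mice"), ("20 September 2013", "Human")]
    for date, species in raw_rows:
        print(standardize_date(date), standardize_species(species))
```

Raising an error on unknown values, instead of guessing, keeps nonstandard entries visible so someone can decide what they mean.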
48. Why are controlled vocabularies (CVs) and ontologies useful?
• Can be used to structure your metadata
• Are often used to structure information in databases
[Figure: Cell Ontology; Linnean Taxonomy (Kingdom, Phylum, Class, Order, Family, Genus, Species)]
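Both examples on this slide are hierarchies: a specific term sits beneath broader ones. The sketch below shows why that structure helps with metadata, using the Linnean ranks of humans as a stand-in. It assumes a simple one-parent tree; real ontologies such as the Cell Ontology have many more terms and several relationship types, so treat this as an illustration, not a model of any particular ontology. The dataset names are hypothetical.

```python
from __future__ import annotations

# Hypothetical single-parent hierarchy: each term maps to its broader term,
# illustrated with the Linnean classification of humans.
PARENT = {
    "Homo sapiens": "Homo",   # species -> genus
    "Homo": "Hominidae",      # genus -> family
    "Hominidae": "Primates",  # family -> order
    "Primates": "Mammalia",   # order -> class
    "Mammalia": "Chordata",   # class -> phylum
    "Chordata": "Animalia",   # phylum -> kingdom
}


def ancestors(term: str) -> list[str]:
    """Return every broader term above `term`, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def matches(term: str, query: str) -> bool:
    """True if `term` is the query term or falls anywhere beneath it."""
    return term == query or query in ancestors(term)


if __name__ == "__main__":
    # Hypothetical datasets, each annotated with the most specific term known.
    annotations = {"dataset_A": "Homo sapiens", "dataset_B": "Homo", "dataset_C": "Chordata"}
    hits = [name for name, term in annotations.items() if matches(term, "Mammalia")]
    print(hits)  # ['dataset_A', 'dataset_B']
```

A search for the broader term "Mammalia" finds datasets annotated only with narrower terms, which a plain keyword match on the annotation text would miss.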
51. Data Management tools and repositories
• Purpose: Software where you can organize, store and/or share data
• Often contain metadata to assist with data entry and to create structured data
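One way a tool can use metadata to guide data entry is a template of required fields plus lists of allowed values. A minimal Python sketch, assuming a hypothetical five-field template; the field names, allowed values, and the function `check_record` are illustrative only, not taken from any specific repository.

```python
# Hypothetical metadata template a lab might agree on for every dataset.
REQUIRED_FIELDS = ("title", "creator", "date_created", "organism", "description")

# Fields whose values must come from a short controlled list (also hypothetical).
ALLOWED_VALUES = {"organism": {"Homo sapiens", "Mus musculus"}}


def check_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record fits the template."""
    problems = [
        f"missing: {name}"
        for name in REQUIRED_FIELDS
        if not str(record.get(name, "")).strip()
    ]
    for name, allowed in ALLOWED_VALUES.items():
        value = record.get(name)
        if value and value not in allowed:
            problems.append(f"{name}: {value!r} is not an allowed value")
    return problems


if __name__ == "__main__":
    record = {
        "title": "T cell FACS run 3",
        "creator": "Example Researcher",
        "date_created": "2013-09-20",
        "organism": "mouse",
    }
    print(check_record(record))
    # prints ['missing: description', "organism: 'mouse' is not an allowed value"]
```

Checking records at entry time is what turns free-form notes into structured data that can be searched and compared later.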
53. Repositories use Unique IDs
• Digital Object Identifier (DOI)
• Example: DOIs for publications
– doi: 10.1371/journal.pbio.1001339
• Uniform Resource Identifier (URI)
• A resolvable URI points to a single location on the web
• URIs for people
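The DOI in the example becomes a resolvable URI when it is appended to the doi.org resolver. A small Python sketch that normalizes common ways of writing a DOI; the prefix list is an assumption covering typical forms, and the function name `doi_to_url` is hypothetical. It only builds the URL and does not contact the network.

```python
from urllib.parse import quote

# Common ways a DOI is written before the bare "10.xxxx/..." string (assumed list).
KNOWN_PREFIXES = (
    "doi:",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def doi_to_url(text: str) -> str:
    """Strip any known prefix from a DOI and return its doi.org resolver URL."""
    doi = text.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    registrant, slash, suffix = doi.partition("/")
    # Every DOI has the form 10.<registrant>/<suffix>.
    if not (registrant.startswith("10.") and slash and suffix):
        raise ValueError(f"not a DOI: {text!r}")
    # Percent-encode characters that are not safe in a URL path, keeping "/".
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi: 10.1371/journal.pbio.1001339"))
    # prints https://doi.org/10.1371/journal.pbio.1001339
```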
54. Example:
• John L Campbell, Research Ecologist, Oregon State University, Corvallis, OR
• John L Campbell, Research Ecologist, Center for Research on Ecosystem Change, Durham, NC
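The example shows why a name is not a reliable identifier: the name and affiliation alone do not tell you whether these entries describe one person or two. Registries of researcher identifiers address this; ORCID is one widely used example, though the slides do not name a specific registry, so treat ORCID here as an illustrative assumption. The last character of an ORCID iD is a check character computed with ISO 7064 MOD 11-2, which catches typing errors such as a single wrong digit. A minimal Python sketch of that check (the function names are hypothetical):

```python
def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if `orcid` has 16 characters (ignoring hyphens) and a correct check character."""
    chars = orcid.replace("-", "").upper()
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    return orcid_check_digit(chars[:15]) == chars[15]


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
    print(is_valid_orcid("0000-0002-1825-0097"))  # True
    print(is_valid_orcid("0000-0002-1825-0098"))  # False (mistyped last digit)
```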
59. FACS analysis of T cells from LNs and tumors
T cells were liberated from LNs by disruption between two frosted glass slides. Cells from LNs and tumors were stained with various combination of the following Abs: FITC-CD4, allophycocyanin-CD25, PE Cy7-CD8, APC-CD62L, PE-CD25, PE Cy7-CD25, and biotinylated-KJ-126 and in some experiments made permeable with fixation/permeablization buffers and stained with PE-FoxP3 (eBioscience). Harvested samples, isotype controls, and single stain controls were run on the FACSCalibur (BD Biosciences).
Ruby and Weinberg (2009) J Immunol. 182(3):1481-9.
69. Why share data?
• Data sharing mandates
• Further science and medicine
• Build collaborations
• Enable new discoveries with your data
• Can be required at time of publication
73. Beyond the PDF:
What can be published (and cited)?
Raw Science Nanopublications Self-publishing
Datasets
Code
Experimental design
Argument or passage
Blogging
Microblogging
Comments on existing work
Annotations on existing work
Single figure publications
74. How?
Data Journals and Repositories
• figshare
• Dryad
• Dataverse (social science)
• Institutional repositories
77. 1 | Large Lecture: Data Management 101
2 | 10–15 Small Groups: data playground
• 1 researcher paired with 2 or 3 library staff
• Tailored analysis of data reporting and instruction
Save the date: 10/09/13, 4–6 pm
1k challenge award recipients
If you work on the command line, you can see all the file paths
JW
Show examples of versions. You can go back if you make mistakes when changes are made. Share work with other people: both of you can work on things at the same time and merge the changes back together. Akin to a game of telephone: version control lets you see exactly when a change was made.
NEW SLIDES:
Examples of versions of data: Data101_NV_v1, Data101_NV_v2
Simple software solutions: some software keeps versions for you; show where to go get it.
Version control software: SVN, Git. Show an example of Google Code. You can write a commit message for each version you commit.
NICOLE
Central servers will have multiple redundancy: backups of backups. High-quality secure USBs with passwords and encryption, or burn to disk.
JW
Information science is a parent
Ontologies classify terms and the relationships between them.
JW
Software that can rename your files, if you already have them named
Goal is to solve the author/contributor name ambiguity problem in scholarly communications by creating a central registry of unique identifiers for individual researchers. Identifiers, and the relationships among them, can be linked to the researcher.