Data Stewardship for Researchers
Carly Strasser, PhD
California Digital Library
@carlystrasser
carly.strasser@ucop.edu
31 July 2013
CLIR Symposium
  
From Calisphere, courtesy of UC Riverside, California Museum of Photography
  
Tips, Tools, & Guidance
From Calisphere, courtesy of Thousand Oaks Library
  	
  	
  
Roadmap
1.  Background
2.  Why you should care
3.  Best practices
4.  Toolbox
  
NSF funded DataNet Project
Office of Cyberinfrastructure
Two main goals:
1.  Build a network for data repositories
2.  Build community around data
Focus on Earth | environmental | ecological | oceanographic data
  
	
  
Why don’t people share data?
Is data management being taught?
Do attitudes about sharing differ among disciplines?
How can we promote storing data in repositories?
What barriers to sharing can we eliminate?
What role can libraries play in data education?
Why is data management a hot topic?
From Flickr by Velo Steve
  
Back in the day…
Da Vinci
Curie
Newton
classicalschool.blogspot.com
Darwin
Digital data
From Flickr by Flickmor
From Flickr by US Army Environmental Command
From Flickr by DW0825
C. Strasser
Courtesy of WHOI
From Flickr by deltaMike
Digital data + Complex workflows
From Flickr by ~Minnea~
Data management
Documentation
Reproducibility
From Flickr by iowa_spirit_walker
  
•  Cost
•  Confusion about standards
•  Lack of training
•  Fear of lost rights or benefits
•  No incentives
  
THE TRUTH
From sandierpastures.com
RESEARCHERS NEED TO KNOW ABOUT
Data management
Metadata
Data repositories
Data sharing
From Flickr by johntrainor
  
Who cares?
From Flickr by hyperion327
From Flickr by Redden-McAllister
… “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”
  
Back in February:
1.  Maximize free public access
2.  Ensure researchers create data management plans
3.  Allow costs for data preservation and access in proposal budgets
4.  Ensure evaluation of data management plan merits
5.  Ensure researchers comply with their data management plans
6.  Promote data deposition into public repositories
7.  Develop approaches for identification and attribution of datasets
8.  Educate folks about data stewardship
  
From Flickr by Joe Crimmings Photography
From Flickr by twm1340
Culture Shift Ahead
science | source | notebook | content | access | data | government | knowledge
From Flickr by cdsessums
flowingdata.com
Map of Scientific Collaborations
From Flickr by ~shorts and longs
Publications & Their Citation
& data availability
Data are being recognized as first class products of research
From Flickr by Richard Moross
Data management plans
Data sharing mandates
Data publications
Data citation
From Flickr by torkildr
What should researchers be doing?
From Flickr by whatthefeed
  
NOT
V
C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
Stable Isotope Data Sheet
Wash Cresc Lake Peter's lab Don't use - old data
Algal Washed Rocks
Dec. 16
Tray 004
SD for delta 13C = 0.07; SD for delta 15N = 0.15
Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No.
A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354
A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356
A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358
A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con
A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22
A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32
A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c
A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368
A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370
A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372
B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c
B2 ALG02 3 4.51 -22.68 -22.22 0.34 4.31 3.66 25376
B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c
B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c
B5 ALG07 2.9 33.58 -29.44 -28.98 1.74 0.62 -0.03 25382
B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384
B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386
B8 Lk Outlet Alg 3.04 31.43 -29.69 -29.23 1.07 0.95 0.30 25388
B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390
B10 ALG02 3.05 5.52 -22.31 -21.85 0.45 4.72 4.07 25392
C1 ALG04 2.98 37.90 -27.42 -26.96 1.36 1.21 0.56 25394 c
C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396
C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398
23.78 1.17
Reference statistics:
Sampling Site / Identifier:
Sample Type:
Date:
Tray ID and Sequence:
From Stephanie Hampton (2010), ESA Workshop on Best Practices
2 tables
Random notes
  
  
Wash Cres Lake Dec 15 Dont_Use.xls
  
From	
  Stephanie	
  Hampton	
  
C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
Stable Isotope Data Sheet
Wash Cresc Lake Peter's lab Don't use - old data
Algal Washed Rocks
Dec. 16
Tray 004
SD for delta 13C = 0.07; SD for delta 15N = 0.15
Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No.
A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354
A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356
A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358
A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con
A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22
A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32
A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c
A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368
A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370
A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372
B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c SUMMARY OUTPUT
B2 ALG02 3 4.51 -22.68 -22.22 0.34 4.31 3.66 25376
B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c Regression Statistics
B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c Multiple R 0.283158
B5 ALG07 2.9 33.58 -29.44 -28.98 1.74 0.62 -0.03 25382 R Square 0.080178
B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384 Adjusted R Square-0.022024
B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386 Standard Error1.906378
B8 Lk Outlet Alg 3.04 31.43 -29.69 -29.23 1.07 0.95 0.30 25388 Observations 11
B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390
B10 ALG02 3.05 5.52 -22.31 -21.85 0.45 4.72 4.07 25392 ANOVA
C1 ALG04 2.98 37.90 -27.42 -26.96 1.36 1.21 0.56 25394 c df SS MS F Significance F
C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396 Regression 1 2.851116 2.851116 0.784507 0.398813
C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398 Residual 9 32.7085 3.634278
23.78 1.17 Total 10 35.55962
CoefficientsStandard Error t Stat P-value Lower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept -4.297428 4.671099 -0.920003 0.381568 -14.8642 6.269341 -14.8642 6.269341
X Variable 1-0.158022 0.17841 -0.885724 0.398813 -0.561612 0.245569 -0.561612 0.245569
Reference statistics:
Sampling Site / Identifier:
Sample Type:
Date:
Tray ID and Sequence:
Random stats output
  
From	
  Stephanie	
  Hampton	
  
SampleID ALG03 ALG05 ALG07 ALG06 ALG04 ALG02 ALG01 ALG03 ALG07
Weight (mg) 2.91 2.91 3.04 2.95 3.01 3 2.99 2.92 2.9
%C 6.85 35.56 33.49 41.17 43.74 4.51 1.59 4.37 33.58
delta 13C -21.11 -28.05 -29.56 -27.32 -27.50 -22.68 -24.58 -21.06 -29.44
delta 13C_ca -20.65 -27.59 -29.10 -26.86 -27.04 -22.22 -24.12 -20.60 -28.98
%N 0.48 2.30 1.68 1.97 1.36 0.34 0.15 0.34 1.74
delta 15N -0.97 0.59 0.79 2.71 0.99 4.31 -1.69 -1.52 0.62
delta 15N_ca -1.62 -0.06 0.14 2.06 0.34 3.66 -2.34 -2.17 -0.03
[Embedded scatter plot: x-axis -35.00 to 0.00, y-axis -3.00 to 4.00, series "Series1"]
From Stephanie Hampton
From Flickr by whatthefeed
What should researchers be doing?
  
Best Practices
data management
From Flickr by Big Swede Guy
1.  Planning
2.  Data collection & organization
3.  Quality control & assurance
4.  Metadata
5.  Workflows
6.  Data stewardship & reuse
  
2. Data collection & organization
Create unique identifiers
•  Decide on naming scheme early
•  Create a key
•  Different for each sample
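One way to follow the advice on unique identifiers is to generate them from a fixed scheme instead of typing them by hand. The sketch below is only an illustration: the scheme (site code, year, running number), the site code `NAN`, and the function name `make_sample_ids` are assumptions and do not come from the slides.

```python
def make_sample_ids(site_code, year, count):
    """Build `count` distinct IDs such as 'NAN-2010-001' (hypothetical scheme)."""
    return [f"{site_code}-{year}-{number:03d}" for number in range(1, count + 1)]


# The key records what each code in the scheme means (hypothetical entry).
key = {"NAN": "Nanaimo sampling site"}

sample_ids = make_sample_ids("NAN", 2010, 3)

# Every sample gets its own identifier.
assert len(sample_ids) == len(set(sample_ids))
print(sample_ids)
```

Deciding on the scheme before collection starts means the key only has to be written once and can travel with the data.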
  
From	
  Flickr	
  by	
  sjbresnahan	
  
From	
  Flickr	
  by	
  zebbie	
  
Standardize
•  Consistent within columns
   – only numbers, dates, or text
•  Consistent names, codes, formats
Modified from K. Vanderbilt
From Pink Floyd, The Wall, themurkyfringe.com
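Consistency within columns can be checked mechanically. The following is a minimal sketch, assuming rows have already been read into dictionaries of strings and that dates are written as ISO `YYYY-MM-DD`; the column names and values are made up for illustration, and the helper names `classify` and `mixed_columns` are hypothetical.

```python
from datetime import datetime


def classify(value):
    """Label a cell as 'number', 'date' (ISO YYYY-MM-DD), or 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "text"


def mixed_columns(rows):
    """Return the names of columns whose cells are not all the same kind."""
    kinds = {}
    for row in rows:
        for name, value in row.items():
            kinds.setdefault(name, set()).add(classify(value))
    return sorted(name for name, seen in kinds.items() if len(seen) > 1)


# Hypothetical rows: the third row mixes text into a numeric and a date column.
rows = [
    {"sample_id": "ALG01", "weight_mg": "3.05", "date": "2010-12-16"},
    {"sample_id": "ALG02", "weight_mg": "3", "date": "2010-12-16"},
    {"sample_id": "ALG03", "weight_mg": "c", "date": "Dec. 16"},
]
print(mixed_columns(rows))
```

Columns reported by `mixed_columns` are the ones to fix before analysis, for example by moving stray notes into a separate comments column.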
  
Standardize
•  Reduce possibility of manual error by constraining entry choices
Google Docs Forms
Excel lists
Data validation
Modified from K. Vanderbilt
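The slides name Google Docs Forms and Excel lists as ways to constrain entry choices. The same idea can be sketched in a script that checks each entry against a controlled list before it is accepted. The allowed values, field names, and the function `entry_problems` below are assumptions for illustration only.

```python
ALLOWED_SAMPLE_TYPES = {"algae", "reference"}  # hypothetical controlled list


def entry_problems(entry):
    """Return a list of human-readable problems; an empty list means the entry is acceptable."""
    problems = []
    if entry.get("sample_type") not in ALLOWED_SAMPLE_TYPES:
        allowed = ", ".join(sorted(ALLOWED_SAMPLE_TYPES))
        problems.append("sample_type must be one of: " + allowed)
    try:
        weight = float(entry.get("weight_mg"))
    except (TypeError, ValueError):
        problems.append("weight_mg must be a number")
    else:
        if weight <= 0:
            problems.append("weight_mg must be positive")
    return problems


good = {"sample_type": "algae", "weight_mg": "3.05"}
bad = {"sample_type": "Algal??", "weight_mg": "heavy"}
print(entry_problems(good))
print(entry_problems(bad))
```

Rejecting an entry at the moment it is typed is cheaper than finding the same error months later during analysis.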
  
	
  	
  
Create parameter table
Create a site table
From doi:10.3334/ORNLDAAC/777
From R Cook, ESA Best Practices Workshop 2010
Use descriptive file names
•  Unique
•  Reflect contents
From R Cook, ESA Best Practices Workshop 2010
  
Bad:    Mydata.xls
        2001_data.csv
        best version.txt
Better: Eaffinis_nanaimo_2010_counts.xls
        Study organism | Site name | Year | What was measured
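The "Better" file name follows a fixed pattern: study organism, site name, year, and what was measured. A small helper can build names in that pattern so every file follows it. This is a sketch; the function name `build_file_name` and the rule that parts may only contain letters and digits are assumptions, not something the slides prescribe.

```python
import re


def build_file_name(organism, site, year, measured, extension):
    """Join the four descriptive parts with underscores and append the extension."""
    parts = [organism, site, str(year), measured]
    for part in parts:
        # Assumed rule: letters and digits only, so names stay portable and easy to parse.
        if not re.fullmatch(r"[A-Za-z0-9]+", part):
            raise ValueError(f"unsuitable file name part: {part!r}")
    return "_".join(parts) + "." + extension


print(build_file_name("Eaffinis", "nanaimo", 2010, "counts", "xls"))
# prints Eaffinis_nanaimo_2010_counts.xls
```

A name such as `best version.txt` would be rejected by the same rule because of the space.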
  
Organize files logically*
*Not for everyone
Biodiversity
   Lake
      Experiments
         Biodiv_H20_heatExp_2005to2008.csv
         Biodiv_H20_predatorExp_2001to2003.csv
         …
      Field work
         Biodiv_H20_PlanktonCount_2001toActive.csv
         Biodiv_H20_ChlAprofiles_2003.csv
         …
   Grassland
From S. Hampton
  
Preserve information
•  Keep raw data raw
•  Use scripts to process data & save them with data
Raw data as .csv
R script for processing & analysis
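The slide pairs a raw .csv file with an R script. The sketch below shows the same pattern in Python, which is a swap made here for illustration only: the raw file is opened for reading and never modified, and the script writes its result to a separate file. The file names, column names, and the rule of dropping `ref` rows are assumptions.

```python
import csv
import tempfile
from pathlib import Path


def process(raw_path, out_path):
    """Copy all non-reference rows to a new file; the raw file is only read."""
    kept = 0
    with open(raw_path, newline="") as source, open(out_path, "w", newline="") as target:
        reader = csv.DictReader(source)
        writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["SampleID"] != "ref":
                writer.writerow(row)
                kept += 1
    return kept


# Demo in a temporary directory so nothing on disk is touched.
workdir = Path(tempfile.mkdtemp())
raw_file = workdir / "raw_isotopes.csv"
raw_file.write_text("SampleID,Weight_mg\nref,0.98\nALG01,3.05\nALG03,2.91\n")
raw_before = raw_file.read_text()

rows_kept = process(raw_file, workdir / "processed_isotopes.csv")

assert raw_file.read_text() == raw_before  # raw data stayed raw
print(rows_kept)
```

Saving the script next to the raw file means the processed file can always be regenerated, which is the point of keeping the two together.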
  
  
3. Quality control and quality assurance
Before data collection
•  Define & enforce standards
•  Assign responsibility for data quality
From Flickr by StacieBee
After data entry
•  Check for missing, impossible, anomalous values
•  Perform statistical summaries
•  Look for outliers
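The checks listed under "After data entry" can be sketched as one small function. The values below are hypothetical percentages with a deliberate typo (138.46) and one missing entry; the valid range of 0 to 100 and the rule of flagging values more than two standard deviations from the mean are assumptions chosen for illustration, not thresholds given in the slides.

```python
import statistics


def qc_report(values, low, high):
    """Report missing positions, out-of-range values, and values far from the mean."""
    missing = [index for index, value in enumerate(values) if value is None]
    present = [value for value in values if value is not None]
    impossible = [value for value in present if not (low <= value <= high)]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)
    # Assumed rule of thumb: more than 2 SD from the mean counts as an outlier.
    outliers = [value for value in present if abs(value - mean) > 2 * sd]
    return {"missing": missing, "impossible": impossible, "outliers": outliers,
            "mean": mean, "sd": sd}


percent_c = [38.27, 39.78, 40.37, None, 42.23, 138.46, 44.94, 42.28]
report = qc_report(percent_c, low=0, high=100)
print(report["missing"], report["impossible"], report["outliers"])
```

Flagged values should be checked against the original field or lab record, not silently deleted.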
  
  
4. Metadata basics
Why are you promoting Excel?
What is metadata?
•  Digital context
   •  Name of the data set
   •  The name(s) of the data file(s) in the data set
   •  Date the data set was last modified
   •  Example data file records for each data type file
   •  Pertinent companion files
   •  List of related or ancillary data sets
   •  Software (including version number) used to prepare/read the data set
   •  Data processing that was performed
•  Personnel & stakeholders
   •  Who collected
   •  Who to contact with questions
   •  Funders
•  Scientific context
   •  Scientific reason why the data were collected
   •  What data were collected
   •  What instruments (including model & serial number) were used
   •  Environmental conditions during collection
   •  Where collected & spatial resolution
   •  When collected & temporal resolution
   •  Standards or calibrations used
•  Information about parameters
   •  How each was measured or produced
   •  Units of measure
   •  Format used in the data set
   •  Precision & accuracy if known
•  Information about data
   •  Definitions of codes used
   •  Quality assurance & control measures
   •  Known problems that limit data use (e.g. uncertainty, sampling problems)
   •  How to cite the data set
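A few of the metadata items listed above can be captured in a plain record and checked for gaps before the data set is shared. This is a minimal sketch: the field names, the choice of which fields are required, and all values are hypothetical, and a real project would normally use one of the established metadata standards instead of an ad hoc format.

```python
import json

# Hypothetical subset of the checklist, chosen for illustration.
REQUIRED_FIELDS = [
    "dataset_name", "file_names", "last_modified", "collected_by",
    "contact", "units", "code_definitions", "citation",
]


def missing_fields(record):
    """Return required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "dataset_name": "Example lake isotope data",
    "file_names": ["example_lake_2010_isotopes.csv"],
    "last_modified": "2013-07-31",
    "collected_by": "A. Researcher",
    "contact": "a.researcher@example.org",
    "units": {"weight": "mg"},
    "code_definitions": {"NA": "not measured"},
    "citation": "",  # left empty on purpose to show the check
}

print(missing_fields(record))
print(json.dumps(record, indent=2))
```

Writing the record as JSON keeps it readable by both people and software, and it can be stored alongside the data files.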
  
•  Provides structure to describe data
   Common terms | definitions | language | structure
•  Lots of different standards
   EML, FGDC, ISO 19115, DarwinCore,…
•  Tools for creating metadata files
   Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
What is metadata?
Select the appropriate standard
  
  
5. Workflows
Workflow: how you get from the raw data to the final products of your research
Simple workflows: flow charts
Temperature data + Salinity data → Data import into R → Data in R format → Quality control & data cleaning → “Clean” T & S data → Analysis: mean, SD → Summary statistics, Graph production
Simple workflows: commented scripts
•  R, SAS, MATLAB
•  Well-documented code is…
   Easier to review
   Easier to share
   Easier to repeat analysis
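A commented script can mirror the temperature and salinity flow chart step by step: import, quality control and cleaning, then mean and SD. The slides name R, SAS, and MATLAB; Python is used here only for illustration. The readings and the plausible ranges are hypothetical, and graph production is left out to keep the sketch short.

```python
import statistics

# Step 1: data import (hypothetical readings; None marks a missing value).
raw = {
    "temperature_c": [12.1, 12.4, None, 11.9],
    "salinity_psu": [33.5, -1.0, 33.7, 33.6],
}


# Step 2: quality control & data cleaning.
def clean(values, low, high):
    """Drop missing values and values outside the plausible range."""
    return [value for value in values if value is not None and low <= value <= high]


cleaned = {
    "temperature_c": clean(raw["temperature_c"], low=-2, high=40),  # assumed range
    "salinity_psu": clean(raw["salinity_psu"], low=0, high=45),     # assumed range
}

# Step 3: analysis (mean, SD) producing the summary statistics.
summary = {
    name: {"n": len(values), "mean": statistics.mean(values), "sd": statistics.stdev(values)}
    for name, values in cleaned.items()
}

for name, stats in summary.items():
    print(name, stats["n"], round(stats["mean"], 2), round(stats["sd"], 2))
```

Because each step is labeled and the raw values are never overwritten, someone else can rerun the script and see where any number in the summary came from.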
  
Fancy Schmancy workflows: Kepler
Resulting output
https://kepler-project.org
  
Workflows enable…
Reproducibility: can someone independently validate findings?
Transparency: others can understand how you arrived at your results
Executability: others can re-run or re-use your analysis
From Flickr by merlinprincesse
Coming Soon: workflow sharing requirements!
  
  
6. Data stewardship & reuse
Use stable formats
   csv, txt, tiff
Create back-up copies
   original, near, far
Periodically test ability to restore information
Modified from R. Cook
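Testing the ability to restore can be as simple as comparing checksums of the original and the back-up copy. The sketch below uses SHA-256 from the Python standard library; the file names and the helper names `sha256_of` and `backup_matches` are hypothetical, and the demo runs in a temporary directory.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, copy):
    """True when the copy has exactly the same content as the original."""
    return sha256_of(original) == sha256_of(copy)


workdir = Path(tempfile.mkdtemp())
original = workdir / "counts.csv"
original.write_text("SampleID,Count\nA1,12\n")

near_copy = workdir / "near_copy.csv"
shutil.copy2(original, near_copy)
intact = backup_matches(original, near_copy)

near_copy.write_text("corrupted")  # simulate a damaged back-up
damaged = backup_matches(original, near_copy)

print(intact, damaged)
# prints True False
```

Running such a check on a schedule covers the "periodically" part of the advice; the same comparison works for the near and far copies.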
  
Store your data in a repository
Institutional archive
Discipline/specialty archive
From Flickr by torkildr
Ask a librarian
Repos of repos:
databib.org
re3data.org
  
Practice Data Citation
Allows readers to find data products
Get credit for data and publications
Promotes reproducibility
Better measure of research impact
Example:
Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Persistent Unique Identifier
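The Dryad example follows the pattern author, year, title, repository, persistent identifier. A small formatter can assemble citations in that pattern from separate fields. The function names are hypothetical and the layout simply copies the example on the slide; it is not an official citation style.

```python
def format_data_citation(author, year, title, repository, doi):
    """Assemble a data citation in the layout used by the slide's example."""
    return f"{author} {year}. {title}. {repository}. doi:{doi}"


def doi_url(doi):
    """Turn a DOI into a resolvable link through the doi.org resolver."""
    return "https://doi.org/" + doi


citation = format_data_citation(
    author="Sidlauskas, B.",
    year=2007,
    title=("Data from: Testing for unequal rates of morphological diversification "
           "in the absence of a detailed phylogeny: a case study from characiform fishes"),
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.20",
)
print(citation)
print(doi_url("10.5061/dryad.20"))
```

Keeping the DOI as its own field makes it easy to print the citation and the resolvable link from the same record.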
  
  
What is a data management plan?
A document that describes what you will do with your data throughout the research project
From Flickr by Barbies Land
DMP for funders:
A short plan submitted alongside grant applications
But they all have different requirements and express them in different ways
From Flickr by 401(K) 2013
An outline of
–  what will be collected
–  methods
–  standards
–  metadata
–  sharing/access
–  long-term storage
Includes how and why
  
NSF DMP Requirements
From Grant Proposal Guidelines:
DMP supplement may include:
1.  the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
2.  the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
3.  policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
4.  policies and provisions for re-use, re-distribution, and the production of derivatives
5.  plans for archiving data, samples, and other research products, and for preservation of access to them
  
1.  Types of data & other information
•  Types of data
•  Existing data
•  How/when/where created?
•  How processed?
•  Quality control
•  Security
•  Who is responsible
biology.kenyon.edu
C. Strasser
From Flickr by Lazurite
Wired.com
2.  Data & metadata standards
•  Metadata needed
•  How captured
•  Standards
3.  Policies for access & sharing
4.  Policies for re-use & re-distribution
•  Obligation to share
•  How/when/where available
•  Getting access
•  Copyright / IP
•  Permission restrictions
•  Embargo periods
•  Ethics/privacy
•  How cited
From Flickr by maryfrancesmain
5.  Plans for archiving & preservation
•  What & where
•  Metadata
•  Who’s responsible
From Flickr by theManWhoSurfedTooMuch
Don’t forget the budget
dorrvs.com
  
NSF’s Vision*
DMPs and their evaluation will grow & change over time
Peer review will determine next steps
Community-driven guidelines
Evaluation will vary with directorate, division, & program officer
*Unofficially
From Flickr by celikins
  
Where to start?
From Flickr by Andy Graulund
Make a resolution
• Triage on current projects
• Get advisor, lab mates, collaborators on board
• Do better next time
Start working online
From Flickr by karindalziel
E-notebooks
Online science
Reproducibility
http://datapub.cdlib.org/software-for-reproducibility-part-2-the-tools/
From Flickr by dipster1
  
Toolbox
Write a DMP
dmptool.org
Step-by-step wizard for generating DMP
create | edit | re-use | share
Free & open to community
Find a repository
Where should I put my data?
databib.org
Get help
From Flickr by thewmatt
Get help from your library
From Flickr by North Carolina Digital Heritage Center
From Flickr by Madison Guy
Get help
NSF funded DataNet Project
Office of Cyberinfrastructure
www.dataone.org
  
•  Data Education Tutorials
•  Database of best practices & software tools
•  Primer on data management
•  Investigator Toolkit
www.dataone.org
From Flickr by Skakerman
  
A word about Metrics…
Articles are the butterfly pinned on the wall. Pretty but not very useful. They are only the advertisements for scholarship.
– A. Levi, U. Maryland College of Information Studies
From Flickr by LisaW123
  
How to incentivize good data stewardship?
Data Citation
Altmetrics (Alternative Metrics)
From Flickr by chriscook04
From Flickr by dotpolka
  
Doing science is a privilege – not a right
There is a social contract of science: we have an obligation to ensure dissemination, validation, & advancement.
To not do so is science malpractice.
Who's responsible? Researchers, publishers, libraries, repositories…
– Brian Hole, Ubiquity Press at UCL
From Flickr by mikerosebery
From Flickr by Michael Tinkler
  
Data Pub Blog: datapub.cdlib.org
My website: carlystrasser.net
Email me: carlystrasser@gmail.com
Tweet me: @carlystrasser
My slides: slideshare.net/carlystrasser
  

Weitere ähnliche Inhalte

Ähnlich wie Data Stewardship Best Practices for Researchers

DataUp Presentation at Cal Poly
DataUp Presentation at Cal PolyDataUp Presentation at Cal Poly
DataUp Presentation at Cal PolyCarly Strasser
 
Gwi data management
Gwi data managementGwi data management
Gwi data managementsusan borda
 
DataUp Overview for UC Merced Research Week
DataUp Overview for UC Merced Research WeekDataUp Overview for UC Merced Research Week
DataUp Overview for UC Merced Research WeekCarly Strasser
 
M Resources Technical Marketing Sample Pack 2015
M Resources Technical Marketing Sample Pack 2015M Resources Technical Marketing Sample Pack 2015
M Resources Technical Marketing Sample Pack 2015Ross Stainlay
 
Flux optimization in air gap membrane distillation system for water desalina...
Flux optimization in air gap membrane distillation system  for water desalina...Flux optimization in air gap membrane distillation system  for water desalina...
Flux optimization in air gap membrane distillation system for water desalina...Dahiru Lawal
 
Google Analytics Reports
Google Analytics ReportsGoogle Analytics Reports
Google Analytics ReportsReportGarden
 
Data Center Architecture Trends
Data Center Architecture TrendsData Center Architecture Trends
Data Center Architecture TrendsPanduit
 
Paper2_figures_fire_modeling
Paper2_figures_fire_modelingPaper2_figures_fire_modeling
Paper2_figures_fire_modelingTony Randolph
 
Gravity water supply design illustration using SW software
Gravity water supply design illustration using SW softwareGravity water supply design illustration using SW software
Gravity water supply design illustration using SW softwarePratap Bikram Shahi
 
Synergizing mixture do e with cfd for ash slurry optimization
Synergizing mixture do e with cfd for ash slurry optimizationSynergizing mixture do e with cfd for ash slurry optimization
Synergizing mixture do e with cfd for ash slurry optimizationDr. Bikram Jit Singh
 
Interlocking safety-gratings
Interlocking safety-gratingsInterlocking safety-gratings
Interlocking safety-gratingsJack Cui
 
Study on baltim field,b.sc graduation project 2015, by atam team
Study on baltim field,b.sc graduation project 2015, by atam teamStudy on baltim field,b.sc graduation project 2015, by atam team
Study on baltim field,b.sc graduation project 2015, by atam teamPE Mahmoud Jad
 

Data Stewardship Best Practices for Researchers

  • 1. Data Stewardship for Researchers: Tips, Tools, & Guidance. Carly Strasser, PhD, California Digital Library. @carlystrasser, carly.strasser@ucop.edu. 31 July 2013, CLIR Symposium. Images: From Calisphere, Courtesy of UC Riverside, California Museum of Photography; From Calisphere, Courtesy of Thousand Oaks Library
  • 2. Roadmap: 1. Background 2. Why you should care 3. Best practices 4. Toolbox
  • 3. NSF-funded DataNet Project, Office of Cyberinfrastructure. Two main goals: 1. Build a network for data repositories 2. Build community around data. Focus on Earth | environmental | ecological | oceanographic data
  • 4. Why don’t people share data? Is data management being taught? Do attitudes about sharing differ among disciplines? How can we promote storing data in repositories? What barriers to sharing can we eliminate? What role can libraries play in data education?
  • 5.
  • 6. Why is data management a hot topic? From Flickr by Velo Steve
  • 7. Back in the day… Da Vinci, Curie, Newton, Darwin. classicalschool.blogspot.com
  • 8. Digital data. From Flickr by Flickmor; From Flickr by US Army Environmental Command; From Flickr by DW0825; C. Strasser; Courtesy of WHOI; From Flickr by deltaMike
  • 9. Digital data + Complex workflows
  • 10. Data management, Documentation, Reproducibility. From Flickr by ~Minnea~
  • 11. • Cost • Confusion about standards • Lack of training • Fear of lost rights or benefits • No incentives. From Flickr by iowa_spirit_walker
  • 12. THE TRUTH RESEARCHERS NEED TO KNOW ABOUT: Data management, Metadata, Data repositories, Data sharing. From sandierpastures.com
  • 13. Who cares? From Flickr by johntrainor
  • 14. From Flickr by hyperion327; From Flickr by Redden-McAllister
  • 15. Back in February: … “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”
  • 16. 1. Maximize free public access 2. Ensure researchers create data management plans 3. Allow costs for data preservation and access in proposal budgets 4. Ensure evaluation of data management plan merits 5. Ensure researchers comply with their data management plans 6. Promote data deposition into public repositories 7. Develop approaches for identification and attribution of datasets 8. Educate folks about data stewardship. From Flickr by Joe Crimmings Photography
  • 17. Culture Shift Ahead. From Flickr by twm1340
  • 18. science, source, notebook, content, access, data, government, knowledge. From Flickr by cdsessums
  • 20. Publications & Their Citation & data availability. From Flickr by ~shorts and longs
  • 21. Data are being recognized as first class products of research. From Flickr by Richard Moross
  • 22. Data management plans, Data sharing mandates, Data publications, Data citation. From Flickr by torkildr
  • 23. Data publications, Data citation, Data management plans, Data sharing mandates
  • 24. What should researchers NOT be doing? From Flickr by whatthefeed
  • 25. [Screenshot of a poorly organized spreadsheet: C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1. Sheet title: Stable Isotope Data Sheet, Wash Cresc Lake, Peter's lab, "Don't use - old data", Algal Washed Rocks, Dec. 16, Tray 004. Columns: Position, SampleID, Weight (mg), %C, delta 13C, delta 13C_ca, %N, delta 15N, delta 15N_ca, Spec. No.] Callouts: 2 tables; Random notes. From Stephanie Hampton (2010), ESA Workshop on Best Practices
  • 26. [Same spreadsheet screenshot.] Callout: the file name, Wash Cres Lake Dec 15 Dont_Use.xls. From Stephanie Hampton (2010), ESA Workshop on Best Practices
  • 27. [Same spreadsheet screenshot, with regression SUMMARY OUTPUT and ANOVA tables pasted in among the data rows.] Callout: Random stats output. From Stephanie Hampton
  • 28. [Same spreadsheet screenshot, with a transposed copy of the sample rows and an embedded chart (Series1) added.] From Stephanie Hampton
  • 29. What should researchers be doing? From Flickr by whatthefeed
  • 30. Data management Best Practices: 1. Planning 2. Data collection & organization 3. Quality control & assurance 4. Metadata 5. Workflows 6. Data stewardship & reuse. From Flickr by Big Swede Guy
  • 31. Create unique identifiers • Decide on naming scheme early • Create a key • Different for each sample (2. Data collection & organization). From Flickr by sjbresnahan; From Flickr by zebbie
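The unique-identifier advice on this slide can be sketched in a few lines of Python. This is only an illustration, not something shown in the talk: the `SITE-YEAR-NNN` naming scheme and the function names `make_sample_ids` and `build_key` are assumptions chosen for the example.

```python
def make_sample_ids(site_code, year, n_samples, start=1):
    """Return n_samples IDs such as 'NAN-2010-001' (hypothetical scheme)."""
    return [f"{site_code}-{year}-{i:03d}" for i in range(start, start + n_samples)]


def build_key(ids, descriptions):
    """Pair every ID with a description, refusing duplicates or gaps."""
    if len(ids) != len(descriptions):
        raise ValueError("each sample ID needs exactly one description")
    if len(set(ids)) != len(ids):
        raise ValueError("sample IDs must be unique")
    return dict(zip(ids, descriptions))


ids = make_sample_ids("NAN", 2010, 3)
key = build_key(ids, ["algae, shore", "algae, lake outlet", "reference"])
```

Deciding the scheme in code before fieldwork starts makes it harder for two samples to end up with the same label.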
  • 32. Standardize   •  Consistent  within  columns   – only  numbers,  dates,  or  text   •  Consistent  names,  codes,  formats   Modified  from  K.  Vanderbilt     From  Pink  Floyd,  The  Wall      themurkyfringe.com   2.  Data  collection  &  organization  
  • 33. Standardize • Reduce possibility of manual error by constraining entry choices Google Docs Forms | Excel lists | Data validation Modified from K. Vanderbilt 2. Data collection & organization
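The same idea, constraining what can be entered, can be sketched in Python for data that has already been typed in. The column names (`site`, `count`) and the allowed site codes are hypothetical.

```python
ALLOWED_SITE_CODES = {"LAKE", "GRASS"}  # hypothetical controlled vocabulary


def check_row(row):
    """Return a list of problems found in one data-entry row (a dict of strings)."""
    problems = []
    # Codes must come from the agreed list, spelled exactly the same way.
    if row["site"] not in ALLOWED_SITE_CODES:
        problems.append(f"unknown site code: {row['site']!r}")
    # A numeric column must hold only numbers.
    try:
        float(row["count"])
    except ValueError:
        problems.append(f"count is not a number: {row['count']!r}")
    return problems


print(check_row({"site": "LAKE", "count": "12"}))
print(check_row({"site": "lake ", "count": "twelve"}))
```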
  • 34. 2.  Data  collection  &  organization       Create  parameter  table   Create  a  site  table   From  doi:10.3334/ORNLDAAC/777   From  doi:10.3334/ORNLDAAC/777   From  R  Cook,  ESA  Best  Practices  Workshop  2010  
  • 35. Use descriptive file names • Unique • Reflect contents Bad: Mydata.xls, 2001_data.csv, best version.txt Better: Eaffinis_nanaimo_2010_counts.xls (study organism _ site name _ year _ what was measured) *Not for everyone From R Cook, ESA Best Practices Workshop 2010 2. Data collection & organization
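A small Python sketch of the "Better" pattern above: the file name is assembled from its parts, and can be split back into them. The function names and the rule that components may not contain underscores or spaces are assumptions made for the example.

```python
def build_file_name(organism, site, year, measurement, extension="csv"):
    """Join the four descriptive parts with underscores."""
    parts = [organism, site, str(year), measurement]
    for part in parts:
        # Underscores separate the parts, so they cannot appear inside one.
        if not part or "_" in part or " " in part:
            raise ValueError(f"bad file name component: {part!r}")
    return "_".join(parts) + "." + extension


def parse_file_name(file_name):
    """Recover the four parts from a name built by build_file_name."""
    stem, _, extension = file_name.rpartition(".")
    organism, site, year, measurement = stem.split("_")
    return {
        "organism": organism,
        "site": site,
        "year": int(year),
        "measurement": measurement,
        "extension": extension,
    }


name = build_file_name("Eaffinis", "nanaimo", 2010, "counts", "xls")
print(name)
print(parse_file_name(name))
```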
  • 36. Organize  files    logically   Biodiversity   Lake   Experiments   Field  work   Grassland   Biodiv_H20_heatExp_2005to2008.csv   Biodiv_H20_predatorExp_2001to2003.csv   …   Biodiv_H20_PlanktonCount_2001toActive.csv   Biodiv_H20_ChlAprofiles_2003.csv   …     From  S.  Hampton   2.  Data  collection  &  organization  
  • 37.  Preserve  information   •  Keep  raw  data  raw   •  Use  scripts  to  process  data      &  save  them  with  data   Raw  data  as  .csv   R  script  for  processing  &   analysis   2.  Data  collection  &  organization  
  • 38. data management From  Flickr  by  Big  Swede  Guy   1.  Planning   2.  Data  collection  &   organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   Best  Practices  
  • 39. Before  data  collection   •  Define  &  enforce  standards   •  Assign  responsibility  for  data  quality   3.  Quality  control  and  quality  assurance   From  Flickr  by  StacieBee  
  • 40. After data entry • Check for missing, impossible, anomalous values • Perform statistical summaries • Look for outliers [Figure: plot; only the axis ticks survive in this transcript] 3. Quality control and quality assurance
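The three checks on this slide can be sketched in a few lines of Python. The temperature values, the plausible range, and the z-score cutoff of 2 are made up for illustration; a z-score is only one simple way to look for outliers.

```python
import statistics


def qc_report(values, lower, upper, z_cutoff=2.0):
    """Flag missing, impossible, and outlying entries by list position."""
    missing = [i for i, v in enumerate(values) if v is None]
    present = [(i, v) for i, v in enumerate(values) if v is not None]
    # "Impossible" here means outside a range the researcher chose in advance.
    impossible = [i for i, v in present if not lower <= v <= upper]
    plausible = [(i, v) for i, v in present if lower <= v <= upper]

    outliers = []
    if len(plausible) >= 2:
        plausible_values = [v for _, v in plausible]
        mean = statistics.mean(plausible_values)
        sd = statistics.stdev(plausible_values)
        if sd > 0:
            outliers = [i for i, v in plausible if abs(v - mean) / sd > z_cutoff]
    return {"missing": missing, "impossible": impossible, "outliers": outliers}


# Made-up water temperatures in degrees C; None marks a blank cell.
temps_c = [12.1, 11.8, None, 12.4, 99.0, 12.0, 11.9, 12.2, 12.3, 11.7, 19.5]
report = qc_report(temps_c, lower=-5.0, upper=40.0)
print(report)
```

Flagged positions are reported, not deleted: whether a value is an error or a real event is still a judgment for the researcher.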
  • 41. data management From  Flickr  by  Big  Swede  Guy   1.  Planning   2.  Data  collection  &   organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   Best  Practices  
  • 42. 4.  Metadata  basics   Why  are  you   promoting   Excel?   What  is   metadata?  
  • 43. 4. Metadata basics • Digital context • Name of the data set • The name(s) of the data file(s) in the data set • Date the data set was last modified • Example data file records for each data type file • Pertinent companion files • List of related or ancillary data sets • Software (including version number) used to prepare/read the data set • Data processing that was performed • Personnel & stakeholders • Who collected • Who to contact with questions • Funders • Scientific context • Scientific reason why the data were collected • What data were collected • What instruments (including model & serial number) were used • Environmental conditions during collection • Where collected & spatial resolution • When collected & temporal resolution • Standards or calibrations used • Information about parameters • How each was measured or produced • Units of measure • Format used in the data set • Precision & accuracy if known • Information about data • Definitions of codes used • Quality assurance & control measures • Known problems that limit data use (e.g. uncertainty, sampling problems) • How to cite the data set
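As a rough illustration, a few of the items listed above can be kept in a simple record and checked for gaps. This Python sketch does not follow EML, FGDC, or any other metadata standard; the field names and all values are placeholders invented for the example.

```python
import json

# A hypothetical subset of the items on the slide.
REQUIRED_FIELDS = [
    "dataset_name",
    "file_names",
    "last_modified",
    "contact",
    "units",
    "how_to_cite",
]


def missing_fields(record):
    """List required fields that are absent or left empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "dataset_name": "Example plankton counts",  # placeholder value
    "file_names": ["Eaffinis_nanaimo_2010_counts.csv"],
    "last_modified": "2013-07-31",
    "contact": "",  # left blank on purpose to show the check
    "units": {"count": "individuals per litre"},
    "how_to_cite": "",
}

print(missing_fields(record))
print(json.dumps(record, indent=2))
```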
  • 44. What is metadata? • Provides structure to describe data Common terms | definitions | language | structure • Lots of different standards: EML, FGDC, ISO 19115, Darwin Core, … • Tools for creating metadata files: Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM) Select the appropriate standard 4. Metadata basics
  • 45. data management From  Flickr  by  Big  Swede  Guy   1.  Planning   2.  Data  collection  &   organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   Best  Practices  
  • 46. 5. Workflows Workflow: how you get from the raw data to the final products of your research Simple workflows: flow charts [Flow chart: Temperature data + Salinity data → Data import into R → Data in R format → Quality control & data cleaning → "Clean" T & S data → Analysis: mean, SD → Summary statistics → Graph production]
  • 47. 5. Workflows Workflow: how you get from the raw data to the final products of your research Simple workflows: commented scripts • R, SAS, MATLAB • Well-documented code is… Easier to review Easier to share Easier to repeat analysis
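A commented script for the temperature and salinity workflow might look like the sketch below. The slides name R, SAS, and MATLAB; Python is used here instead, and the column names and data values are invented so the example is self-contained.

```python
import csv
import io
import statistics

# Stand-in for a raw data file; in practice this would be read from disk
# and the raw file itself would never be edited.
RAW_CSV = """temperature_c,salinity_psu
12.1,33.5
11.8,33.7
,33.6
12.4,33.4
"""


def import_data(text):
    """Step 1: import the raw data as a list of rows (dicts of strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Step 2: quality control. Drop rows with a blank value; convert to float."""
    cleaned = []
    for row in rows:
        if row["temperature_c"] and row["salinity_psu"]:
            cleaned.append({key: float(value) for key, value in row.items()})
    return cleaned


def summarize(rows, column):
    """Step 3: analysis. Return the mean and sample SD of one column."""
    values = [row[column] for row in rows]
    return statistics.mean(values), statistics.stdev(values)


clean_rows = clean(import_data(RAW_CSV))
for column in ("temperature_c", "salinity_psu"):
    mean, sd = summarize(clean_rows, column)
    print(f"{column}: mean={mean:.2f}, sd={sd:.2f}")
```

Each function matches one box in the flow chart, so a reviewer can follow the path from raw data to summary statistics.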
  • 48. Fancy Schmancy workflows: Kepler Resulting output 5. Workflows https://kepler-project.org
  • 49. Workflows enable… Reproducibility: can someone independently validate findings? Transparency: others can understand how you arrived at your results Executability: others can re-run or re-use your analysis 5. Workflows From Flickr by merlinprincesse Coming Soon: workflow sharing requirements!
  • 50. data management From  Flickr  by  Big  Swede  Guy   1.  Planning   2.  Data  collection  &   organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6. Data  stewardship  &  reuse   Best  Practices  
  • 51. Use stable formats: csv, txt, tiff Create backup copies: original, near, far Periodically test ability to restore information 6. Data stewardship & reuse Modified from R. Cook
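One way to test that a backup can be restored is to compare checksums of the original and the copy. The Python sketch below uses temporary files as stand-ins for real data; the file names and contents are hypothetical, and SHA-256 is simply one common choice of checksum.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Checksum a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """True if the backup is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(backup)


with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "counts.csv"
    backup = Path(folder) / "counts_backup.csv"
    original.write_text("site,count\nLAKE,12\n")
    shutil.copy(original, backup)
    print(backup_matches(original, backup))  # identical copy
    backup.write_text("site,count\nLAKE,13\n")
    print(backup_matches(original, backup))  # altered copy
```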
  • 52. Store  your  data  in  a  repository   Institutional  archive   Discipline/specialty  archive         6.  Data  stewardship  &  reuse   From  Flickr  by  torkildr   Ask  a  librarian   Repos  of  repos:   databib.org   re3data.org  
  • 53. Allows  readers  to  find  data  products   Get  credit  for  data  and  publications   Promotes  reproducibility   Better  measure  of  research  impact   Example:   Sidlauskas,  B.  2007.  Data  from:  Testing  for  unequal  rates  of   morphological  diversification  in  the  absence  of  a  detailed   phylogeny:  a  case  study  from  characiform  fishes.  Dryad  Digital   Repository.  doi:10.5061/dryad.20   Persistent  Unique   Identifier   6.  Data  stewardship  &  reuse   Practice  Data  Citation  
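The example citation above has a regular shape (author, year, title, repository, persistent identifier), which a short Python helper can reproduce. The function name and argument order are assumptions for illustration; repositories and journals may require a different citation style.

```python
def format_data_citation(author, year, title, repository, doi):
    """Assemble a data citation in the same order as the slide's example."""
    return f"{author} {year}. {title}. {repository}. doi:{doi}"


citation = format_data_citation(
    author="Sidlauskas, B.",
    year=2007,
    title=(
        "Data from: Testing for unequal rates of morphological "
        "diversification in the absence of a detailed phylogeny: "
        "a case study from characiform fishes"
    ),
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.20",
)
print(citation)
```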
  • 54. data management From  Flickr  by  Big  Swede  Guy   1.  Planning   2.  Data  collection  &   organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   Best  Practices  
  • 55. A  document  that   describes  what  you  will   do  with  your  data   throughout     the  research  project   From Flickr by Barbies Land What  is  a  data   management  plan?  
  • 56. DMP for funders: A short plan submitted alongside grant applications An outline of – what will be collected – methods – standards – metadata – sharing/access – long-term storage Includes how and why But they all have different requirements and express them in different ways From Flickr by 401(K) 2013
  • 57. NSF DMP Requirements From Grant Proposal Guidelines: DMP supplement may include: 1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project 2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies) 3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements 4. policies and provisions for re-use, re-distribution, and the production of derivatives 5. plans for archiving data, samples, and other research products, and for preservation of access to them
  • 58. •  Types  of  data   •  Existing  data   •  How/when/where  created?   •  How  processed?   •  Quality  control     •  Security   •  Who  is  responsible     1.  Types  of  data  &  other  information   biology.kenyon.edu   C.  Strasser   From  Flickr  by  Lazurite  
  • 59. Wired.com   •  Metadata  needed   •  How  captured     •  Standards   2.  Data  &  metadata  standards  
  • 60. 3. Policies for access & sharing 4. Policies for re-use & re-distribution • Obligation to share • How/when/where available • Getting access • Copyright / IP • Permission restrictions • Embargo periods • Ethics/privacy • How cited From Flickr by maryfrancesmain
  • 61. •  What  &  where     •  Metadata   •  Who’s  responsible   5.  Plans  for  archiving  &  preservation   From  Flickr  by  theManWhoSurfedTooMuch  
  • 62. Don’t  forget  the  budget   dorrvs.com  
  • 63. NSF's Vision* DMPs and their evaluation will grow & change over time Peer review will determine next steps Community-driven guidelines Evaluation will vary with directorate, division, & program officer *Unofficially
  • 64. From  Flickr  by  celikins   Where  to  start?  
  • 65. From  Flickr  by  Andy  Graulund   Make  a   resolution   • Triage  on  current   projects   • Get    advisor,  lab  mates,   collaborators  on  board   • Do  better  next  time  
  • 66. Start  working   online   From  Flickr  by  karindalziel  
  • 67. From Flickr by karindalziel E-notebooks Online science Reproducibility http://datapub.cdlib.org/software-for-reproducibility-part-2-the-tools/
  • 68.
  • 69. From  Flickr  by  dipster1   Toolbox  
  • 70. Step-by-step wizard for generating DMP create | edit | re-use | share Free & open to community dmptool.org                     Write  a  DMP  
  • 71. databib.org   Where   should  I  put   my  data?   Find  a  repository  
  • 73. Get  help  from  your  library   From  Flickr  by  North  Carolina  Digital   Heritage  Center   From  Flickr  by  Madison  Guy  
  • 74. NSF  funded  DataNet  Project   Office  of  Cyberinfrastructure   www.dataone.org   Get  help  
  • 76. •  Data  Education  Tutorials   •  Database  of  best  practices    &   software  tools   •  Primer  on  data  management   •  Investigator  Toolkit   www.dataone.org  
  • 77. From  Flickr  by  Skakerman   A  word  about   Metrics…  
  • 78. Articles  are  the  butterfly  pinned  on   the  wall.  Pretty  but  not  very   useful.  They  are  only  the   advertisements  for  scholarship.       –  A.  Levi,  U.  Maryland  College  of  Information   Studies     From  Flickr  by  LisaW123  
  • 79. How to incentivize good data stewardship? Data  Citation   Altmetrics  (Alternative  Metrics)   From  Flickr  by  chriscook04  
  • 80. From  Flickr  by  dotpolka   Doing  science  is  a   privilege  –  not  a  right  
  • 81.  There  is  a  social  contract  of  science:  we   have  an  obligation  to  ensure  dissemination,   validation,  &  advancement.   To  not  do  so  is  science  malpractice.     Who's  responsible?  Researchers,   publishers,  libraries,  repositories…     –  Brian  Hole,  Ubiquity  Press  at  UCL   From  Flickr  by  mikerosebery  
  • 82. From  Flickr  by  Michael  Tinkler  
  • 83. Data  Pub  Blog:  datapub.cdlib.org  
  • 84. My  website   Email  me   Tweet  me   My  slides   carlystrasser.net   carlystrasser@gmail.com   @carlystrasser     slideshare.net/carlystrasser