Data Management for Scientists

Reduce your workload
Reuse your ideas
Recycle your data

www.oddee.com

Carly Strasser, PhD
California Digital Library, UC Office of the President
UC Riverside, February 2012
carly.strasser@ucop.edu
www.carlystrasser.net

Roadmap

1.  Background
2.  Mistakes we make
3.  How to improve
4.  Toolbox

What role can libraries play in data education?
What barriers to sharing can we eliminate?
Why don’t people share data?
Is data management being taught?
Do attitudes about sharing differ among disciplines?
How can we promote storing data in repositories?

Who we are

From Flickr by DW0825
From Flickr by Flickmor
From Flickr by deltaMike

Digital data

www.woodrow.org
C. Strasser
Courtesy of WHOI
From Flickr by US Army Environmental Command

Digital data + Complex analyses

Data
Models
Maximum Likelihood estimation
Matrix Models
Images
Tables
Paper

UGLY TRUTH

Many Earth | Environmental | Ecological scientists…
    are not taught data management
    don’t know what metadata are
    can’t name data centers or repositories
    don’t share data publicly or store it in an archive
    aren’t convinced they should share data

5shortessays.blogspot.com

2 tables
Random notes


C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
                   Stable Isotope Data Sheet
              Sampling Site / Identifier: Wash Cresc Lake                                                                                                               Peter's lab     Don't use - old data
                         Sample Type: Algal                                                                                                                             Washed Rocks
                                  Date: Dec. 16
                Tray ID and Sequence: Tray 004

                     Reference statistics: SD for delta 13C = 0.07                              SD for delta 15N = 0.15


          Position        SampleID         Weight (mg)           %C       delta 13C   delta 13C_ca         %N               delta 15N   delta 15N_ca   Spec. No.
         A1                            ref    0.98              38.27      -25.05         -24.59           1.96                4.12          3.47       25354
         A2                            ref    0.98              39.78      -25.00         -24.54           2.03                4.01          3.36       25356
         A3                            ref    0.98              40.37      -24.99         -24.53           2.04                4.09          3.44       25358
         A4                            ref    1.01              42.23      -25.06         -24.60           2.17                4.20          3.55       25360           Shore           Avg Con
         A5          ALG01                    3.05              1.88       -24.34         -23.88           0.17               -1.65         -2.30       25362      c        -1.26          -27.22
         A6          Lk Outlet Alg            3.06              31.55      -30.17         -29.71           0.92                0.87          0.22       25364                1.26            0.32
         A7          ALG03                    2.91              6.85       -21.11         -20.65           0.48               -0.97         -1.62       25366      c
         A8          ALG05                    2.91              35.56      -28.05         -27.59           2.30                0.59         -0.06       25368
         A9          ALG07                    3.04              33.49      -29.56         -29.10           1.68                0.79          0.14       25370
         A10         ALG06                    2.95              41.17      -27.32         -26.86           1.97                2.71          2.06       25372
         B1          ALG04                    3.01              43.74      -27.50         -27.04           1.36                0.99          0.34       25374      c
         B2          ALG02                      3               4.51       -22.68         -22.22           0.34                4.31          3.66       25376
         B3          ALG01                    2.99              1.59       -24.58         -24.12           0.15               -1.69         -2.34       25378      c
         B4          ALG03                    2.92              4.37       -21.06         -20.60           0.34               -1.52         -2.17       25380      c
         B5          ALG07                     2.9              33.58      -29.44         -28.98           1.74                0.62         -0.03       25382
         B6                            ref    1.01              44.94      -25.00         -24.54           2.59                3.96          3.31       25384
         B7                            ref    0.99              42.28      -24.87         -24.41           2.37                4.33          3.68       25386
         B8          Lk Outlet Alg            3.04              31.43      -29.69         -29.23           1.07                0.95          0.30       25388
         B9          ALG06                    3.09              35.57      -27.26         -26.80           1.96                2.79          2.14       25390
         B10         ALG02                    3.05              5.52       -22.31         -21.85           0.45                4.72          4.07       25392
         C1          ALG04                    2.98              37.90      -27.42         -26.96           1.36                1.21          0.56       25394      c
         C2          ALG05                    3.04              31.74      -27.93         -27.47           2.40                0.73          0.08       25396
         C3                            ref    0.99              38.46      -25.09         -24.63           2.40                4.37          3.72       25398
                                                                23.78                                      1.17




From Stephanie Hampton (2010)
ESA Workshop on Best Practices
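
A minimal sketch of why the layout matters, not taken from the slides: with the side table and free-text notes set aside, the sample rows of the sheet above form one rectangular table that standard tools can read directly. The CSV text copies four sample rows from the sheet; the lowercase column names and the helper `mean_by_sample` are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Four sample rows copied from the sheet above.
# Column names are assumed for this sketch; the sheet uses "delta 13C" etc.
TIDY_CSV = """position,sample_id,weight_mg,pct_c,delta_13c,pct_n,delta_15n
A5,ALG01,3.05,1.88,-24.34,0.17,-1.65
A7,ALG03,2.91,6.85,-21.11,0.48,-0.97
B3,ALG01,2.99,1.59,-24.58,0.15,-1.69
B4,ALG03,2.92,4.37,-21.06,0.34,-1.52
"""


def mean_by_sample(csv_text, column):
    """Average one numeric column per sample_id (hypothetical helper)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["sample_id"]].append(float(row[column]))
    return {sample: mean(values) for sample, values in groups.items()}


# Replicate means of delta 13C for each sample.
replicate_means = mean_by_sample(TIDY_CSV, "delta_13c")
print(replicate_means)
```

One header row and one record per line is what lets `csv.DictReader` work without any special handling; the second table and the margin notes in the original sheet would each need their own parsing rules.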
  
Wash Cres Lake Dec 15 Dont_Use.xls

[Slide: the same Stable Isotope Data Sheet (Wash Cres Lake Dec 15 Dont_Use.xls), now with a transposed copy of sample rows A7 to B5, Excel regression output, and a scatter chart pasted over the data area.]

Transposed copy of the sample rows:

          SampleID        ALG03    ALG05    ALG07    ALG06    ALG04    ALG02    ALG01    ALG03    ALG07
          Weight (mg)      2.91     2.91     3.04     2.95     3.01        3     2.99     2.92      2.9
          %C               6.85    35.56    33.49    41.17    43.74     4.51     1.59     4.37    33.58
          delta 13C      -21.11   -28.05   -29.56   -27.32   -27.50   -22.68   -24.58   -21.06   -29.44
          delta 13C_ca   -20.65   -27.59   -29.10   -26.86   -27.04   -22.22   -24.12   -20.60   -28.98
          %N               0.48     2.30     1.68     1.97     1.36     0.34     0.15     0.34     1.74
          delta 15N       -0.97     0.59     0.79     2.71     0.99     4.31    -1.69    -1.52     0.62
          delta 15N_ca    -1.62    -0.06     0.14     2.06     0.34     3.66    -2.34    -2.17    -0.03

SUMMARY OUTPUT

          Regression Statistics
          Multiple R           0.283158
          R Square             0.080178
          Adjusted R Square   -0.022024
          Standard Error       1.906378
          Observations         11

          ANOVA
                          df    SS          MS          F           Significance F
          Regression       1    2.851116    2.851116    0.784507    0.398813
          Residual         9    32.7085     3.634278
          Total           10    35.55962

                          Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
          Intercept       -4.297428      4.671099         -0.920003   0.381568    -14.8642    6.269341    -14.8642      6.269341
          X Variable 1    -0.158022      0.17841          -0.885724   0.398813    -0.561612   0.245569    -0.561612     0.245569

[Chart: scatter plot, Series1; x axis from -35.00 to 0.00, y axis from -3.00 to 4.00]
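
The regression output pasted into the sheet above carries no record of how it was produced, but its summary figures can at least be checked against one another. The sketch below is an illustration, not part of the original slides: it recomputes R Square, Adjusted R Square, F, and Standard Error from the ANOVA sums of squares and degrees of freedom reported in that output, using the standard ordinary least squares definitions. The variable names are assumptions.

```python
# Values copied from the ANOVA block of the pasted regression output.
ss_regression = 2.851116
ss_residual = 32.7085
df_regression = 1
df_residual = 9

# Totals implied by the two rows above.
ss_total = ss_regression + ss_residual
n_obs = df_regression + df_residual + 1

# Standard OLS summary statistics derived from the ANOVA table.
r_square = ss_regression / ss_total
adjusted_r_square = 1 - (1 - r_square) * (n_obs - 1) / df_residual
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
standard_error = (ss_residual / df_residual) ** 0.5

print(f"Observations      {n_obs}")
print(f"R Square          {r_square:.6f}")
print(f"Adjusted R Square {adjusted_r_square:.6f}")
print(f"F                 {f_stat:.6f}")
print(f"Standard Error    {standard_error:.6f}")
```

The recomputed values agree with the reported ones to the precision shown, so the block is internally consistent. What it cannot tell a later reader is which columns and rows went into the regression, which is the information a script would have preserved.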
  
Random stats output


C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
                   Stable Isotope Data Sheet
              Sampling Site / Identifier: Wash Cresc Lake                                                                                               Peter's lab              Don't use - old data
                         Sample Type: Algal                                                                                                             Washed Rocks
                                  Date: Dec. 16
                Tray ID and Sequence: Tray 004

                     Reference statistics: SD for delta 13C = 0.07                              SD for delta 15N = 0.15


          Position        SampleID        Weight (mg)      %C      delta 13C   delta 13C_ca        %N          delta 15N   delta 15N_ca Spec. No.
         A1                           ref    0.98         38.27     -25.05         -24.59          1.96           4.12          3.47     25354
         A2                           ref    0.98         39.78     -25.00         -24.54          2.03           4.01          3.36     25356
         A3                           ref    0.98         40.37     -24.99         -24.53          2.04           4.09          3.44     25358
         A4                           ref    1.01         42.23     -25.06         -24.60          2.17           4.20          3.55     25360          Shore                    Avg Con
         A5          ALG01                   3.05         1.88      -24.34         -23.88          0.17          -1.65         -2.30     25362      c       -1.26                   -27.22
         A6          Lk Outlet Alg           3.06         31.55     -30.17         -29.71          0.92           0.87          0.22     25364               1.26                     0.32
         A7          ALG03                   2.91         6.85      -21.11         -20.65          0.48          -0.97         -1.62     25366      c
         A8          ALG05                   2.91         35.56     -28.05         -27.59          2.30           0.59         -0.06     25368
         A9          ALG07                   3.04         33.49     -29.56         -29.10          1.68           0.79          0.14     25370
         A10         ALG06                   2.95         41.17     -27.32         -26.86          1.97           2.71          2.06     25372
         B1          ALG04                   3.01         43.74     -27.50         -27.04          1.36           0.99          0.34     25374      c               SUMMARY OUTPUT
         B2          ALG02                     3          4.51      -22.68         -22.22          0.34           4.31          3.66     25376
         B3          ALG01                   2.99         1.59      -24.58         -24.12          0.15          -1.69         -2.34     25378      c                Regression Statistics
         B4          ALG03                   2.92         4.37      -21.06         -20.60          0.34          -1.52         -2.17     25380      c               Multiple R 0.283158
         B5          ALG07                    2.9         33.58     -29.44         -28.98          1.74           0.62         -0.03     25382                      R Square 0.080178
         B6                           ref    1.01         44.94     -25.00         -24.54          2.59           3.96          3.31     25384                      Adjusted R Square
                                                                                                                                                                                -0.022024
         B7                           ref    0.99         42.28     -24.87         -24.41          2.37           4.33          3.68     25386                      Standard Error
                                                                                                                                                                                 1.906378
         B8          Lk Outlet Alg           3.04         31.43     -29.69         -29.23          1.07           0.95          0.30     25388                      Observations         11
         B9          ALG06                   3.09         35.57     -27.26         -26.80          1.96           2.79          2.14     25390
         B10         ALG02                   3.05         5.52      -22.31         -21.85          0.45           4.72          4.07     25392                      ANOVA
         C1          ALG04                   2.98         37.90     -27.42         -26.96          1.36           1.21          0.56     25394      c                                df         SS      MS        F Significance F
         C2          ALG05                   3.04         31.74     -27.93         -27.47          2.40           0.73          0.08     25396                      Regression             1 2.851116 2.851116 0.784507 0.398813
         C3                           ref    0.99         38.46     -25.09         -24.63          2.40           4.37          3.72     25398                      Residual               9 32.7085 3.634278
                                                          23.78                                    1.17                                                             Total                 10 35.55962

                                                                                                                                                                              Coefficients
                                                                                                                                                                                        Standard Error t Stat P-value Lower 95%Upper 95%Lower 95.0%
                                                                                                                                                                                                                                                  Upper 95.0%
                                                                                                                                                                    Intercept -4.297428 4.671099 -0.920003 0.381568 -14.8642 6.269341 -14.8642 6.269341
                                                                                                                                                                    X Variable 1-0.158022 0.17841 -0.885724 0.398813 -0.561612 0.245569 -0.561612 0.245569
Data Hangover

What happened?

From Flickr by SteveMcN
  
Where data end up

[Figure labels: www, Data, Metadata. Recreated from Klump et al. 2006]

From Flickr by diylibrarian
blog.order2disorder.com
From Flickr by csessums
  
Who cares?

From Flickr by Redden-McAllister
From Flickr by AJC1
www.rba.gov.au
  
Where data end up

[Figure labels: www, Data, Metadata. Recreated from Klump et al. 2006]

From Flickr by diylibrarian
From Flickr by torkildr
  
Data Reuse
Data Sharing
Data Management
  
Trends in Data Archiving

Journal publishers
     Joint Data Archiving Agreement
Data Papers etc.
     Ecological Archives, Beyond the PDF
  
Funders
     Data management requirements
  
	
  
Roadmap

1.  Background
2.  Mistakes we make
3.  How to improve
4.  Toolbox
  
	
  
Best Practices for Data Management

    1.  Planning
    2.  Data collection & organization
    3.  Quality control & assurance
    4.  Metadata
    5.  Workflows
    6.  Data stewardship & reuse
  
2. Data collection & organization

Create unique identifiers
     •  Decide on naming scheme early
     •  Create a key
     •  Different for each sample

From Flickr by zebbie
From Flickr by sjbresnahan
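
As a rough illustration of the points above, here is a minimal Python sketch. The ID scheme (site code, collection date, running number) and the names make_sample_id and build_key are assumptions made up for this example, not something the slides prescribe.

```python
from datetime import date


def make_sample_id(site: str, collected: date, seq: int) -> str:
    """Build an ID such as 'NAN-20100615-003' (hypothetical naming scheme)."""
    return f"{site.upper()}-{collected:%Y%m%d}-{seq:03d}"


def build_key(samples):
    """Return a key mapping each ID to its description; reject duplicate IDs."""
    key = {}
    for site, collected, seq, description in samples:
        sample_id = make_sample_id(site, collected, seq)
        if sample_id in key:
            raise ValueError(f"duplicate sample ID: {sample_id}")
        key[sample_id] = description
    return key


# Illustrative entries only; replace with your own samples.
samples = [
    ("nan", date(2010, 6, 15), 1, "first sample at site NAN"),
    ("nan", date(2010, 6, 15), 2, "second sample at site NAN"),
]
key = build_key(samples)
```

Deciding on the scheme once, in a function, is one way to keep it consistent; the key can then be saved alongside the data.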
  
2. Data collection & organization

Standardize
     •  Consistent within columns
          – only numbers, dates, or text
     •  Consistent names, codes, formats

Modified from K. Vanderbilt
From Pink Floyd, The Wall
themurkyfringe.com
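
One possible way to check "consistent within columns" automatically is sketched below in Python. The classification rules (anything float() accepts is a number, only YYYY-MM-DD counts as a date) and the column names are assumptions for illustration.

```python
import csv
import io
from datetime import datetime


def classify(value):
    """Return 'number', 'date' (YYYY-MM-DD only), or 'text' for one cell."""
    try:
        float(value)
        return "number"
    except (TypeError, ValueError):
        pass
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return "date"
    except (TypeError, ValueError):
        return "text"


def mixed_columns(csv_text):
    """Report columns whose cells fall into more than one kind."""
    kinds = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, value in row.items():
            kinds.setdefault(name, set()).add(classify(value))
    return {name: sorted(found) for name, found in kinds.items() if len(found) > 1}


# Hypothetical data: the date and weight columns each mix two kinds of value.
example = (
    "site,date,weight_mg\n"
    "A1,2010-12-16,0.98\n"
    "A2,Dec. 16,3.05\n"
    "A3,2010-12-16,about 3\n"
)
problems = mixed_columns(example)
```

Here problems names the date and weight_mg columns, which are the ones a person would need to fix by hand.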
  
2. Data collection & organization

Standardize
     •  Reduce possibility of manual error by constraining entry choices

     Excel lists, Data validation
     Google Docs Forms

Modified from K. Vanderbilt
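
The same idea, constraining entries to a fixed list, can also be applied when data pass through a script. This is a minimal Python sketch; the allowed values and the name normalize_choice are hypothetical.

```python
# Hypothetical list of permitted codes; in practice this comes from your own key.
ALLOWED_SAMPLE_TYPES = {"algal", "invertebrate", "sediment"}


def normalize_choice(raw, allowed):
    """Trim and lower-case an entry, then reject anything not in the allowed set."""
    cleaned = raw.strip().lower()
    if cleaned not in allowed:
        raise ValueError(f"{raw!r} is not one of {sorted(allowed)}")
    return cleaned


sample_type = normalize_choice(" Algal ", ALLOWED_SAMPLE_TYPES)
```

Rejecting an unknown value outright, rather than guessing, keeps typos such as "algae" from quietly becoming a new category.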
  	
  
2. Data collection & organization

Create parameter table
Create a site table

From doi:10.3334/ORNLDAAC/777
From R Cook, ESA Best Practices Workshop 2010
  
2. Data collection & organization

Use descriptive file names

PhDcomics.com
  
2. Data collection & organization

Use descriptive file names *
     •  Unique
     •  Reflect contents

Bad:     Mydata.xls
         2001_data.csv
         best version.txt

Better:  Eaffinis_nanaimo_2010_counts.xls
         (study organism _ site name _ year _ what was measured)

*Not for everyone

From R Cook, ESA Best Practices Workshop 2010
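
A naming pattern like the "Better" example can be built and checked by a short script. The sketch below is in Python; the regular expression and the function names are assumptions that simply mirror the organism_site_year_measurement pattern shown on the slide.

```python
import re

# Assumed pattern: organism_site_year_measured.extension
FILENAME_PATTERN = re.compile(
    r"^(?P<organism>[A-Za-z]+)_(?P<site>[A-Za-z]+)_"
    r"(?P<year>\d{4})_(?P<measured>[A-Za-z]+)\.(?P<ext>\w+)$"
)


def build_name(organism, site, year, measured, ext="csv"):
    """Assemble a descriptive file name from its parts."""
    return f"{organism}_{site}_{year}_{measured}.{ext}"


def parse_name(filename):
    """Return the parts of a conforming name, or None if it does not conform."""
    match = FILENAME_PATTERN.match(filename)
    return match.groupdict() if match else None


good = parse_name("Eaffinis_nanaimo_2010_counts.xls")
bad = [parse_name(name) for name in ("Mydata.xls", "2001_data.csv", "best version.txt")]
```

The three "Bad" names from the slide all fail to parse, while the "Better" name yields its organism, site, year, and measurement.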
  
2. Data collection & organization

Organize files logically

Biodiversity
     Lake
          Experiments
               Biodiv_H20_heatExp_2005to2008.csv
               Biodiv_H20_predatorExp_2001to2003.csv
               …
          Field work
               Biodiv_H20_PlanktonCount_2001toActive.csv
               Biodiv_H20_ChlAprofiles_2003.csv
               …
     Grassland

From S. Hampton
  
2. Data collection & organization

Preserve information
 •  Keep raw data raw
 •  Use scripts to process data & save them with data

[Figure: raw data as .csv; R script for processing & analysis]
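
The slide shows an R script. The sketch below makes the same point in Python purely for illustration: the raw file is only ever read, and the processed copy is written somewhere else. The column name weight_mg, the folder names, and the filtering rule are all assumptions.

```python
import csv
import tempfile
from pathlib import Path


def process_raw(raw_path: Path, out_dir: Path) -> Path:
    """Write rows that have a weight to a new file; the raw file is only read."""
    out_path = out_dir / f"{raw_path.stem}_processed.csv"
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["weight_mg"].strip():  # skip rows with no recorded weight
                writer.writerow(row)
    return out_path


# Demonstration in a temporary folder with made-up values.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "raw"
    processed_dir = Path(tmp) / "processed"
    raw_dir.mkdir()
    processed_dir.mkdir()
    raw_file = raw_dir / "samples.csv"
    raw_text = "sample_id,weight_mg\nS1,3.05\nS2,\nS3,2.91\n"
    raw_file.write_text(raw_text)
    processed = process_raw(raw_file, processed_dir)
    raw_unchanged = raw_file.read_text() == raw_text
    processed_lines = processed.read_text().splitlines()
```

Keeping the script next to the data means anyone can see exactly how the processed file was derived from the raw one.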
  
2. Data collection & organization

All of the things that make Excel great for data organization are bad for archiving! What to do?

1.    Create archive-ready raw data
2.    Put it somewhere special
3.    Have your fun with fancy Excel techniques
4.    Keep archiving in mind
  
3. Quality control and quality assurance

 Define & enforce standards
 Double data entry
 Document changes
 Minimize manual data entry
 No missing, impossible, or anomalous values

[Figure: scatter plot, y-axis 0 to 60, x-axis 0 to 35]
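
A check for missing or impossible values can be as small as the Python sketch below. The valid range (0 to 100, as for a percentage) and the function name qc_flags are assumptions; anomalous-but-possible values would still need a plot or a human eye.

```python
def qc_flags(values, valid_range):
    """Return (index, reason) pairs for missing, non-numeric, or out-of-range entries."""
    low, high = valid_range
    flags = []
    for index, value in enumerate(values):
        if value is None or str(value).strip() == "":
            flags.append((index, "missing"))
            continue
        try:
            number = float(value)
        except ValueError:
            flags.append((index, "not numeric"))
            continue
        if not low <= number <= high:
            flags.append((index, "out of range"))
    return flags


# Made-up percentage column with one blank, one impossible value, one stray code.
flags = qc_flags(["38.27", "", "140.0", "n/a", "1.88"], valid_range=(0, 100))
```

The function only reports problems; deciding what to do with a flagged value, and documenting that decision, stays with the researcher.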
  
4. Metadata basics

What is metadata?
Why are you promoting Excel?
  
4. Metadata basics

Metadata = Data reporting

      WHO created the data?
      WHAT is the content of the data set?
      WHEN was it created?
      WHERE was it collected?
      HOW was it developed?
      WHY was it developed?
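
The six questions can be captured in a small machine-readable record kept next to the data file. The Python sketch below is one informal way to do that; it is not a metadata standard, and the field names and placeholder values are assumptions.

```python
import json

REQUIRED_FIELDS = ("who", "what", "when", "where", "how", "why")


def build_metadata(**fields):
    """Return a JSON record, refusing to proceed if any question is unanswered."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    return json.dumps(fields, indent=2, sort_keys=True)


# Placeholder answers; fill in real ones for your own data set.
record = build_metadata(
    who="<name of data creator>",
    what="<content of the data set>",
    when="<date created>",
    where="<collection location>",
    how="<methods used>",
    why="<purpose of the study>",
)
```

A record like this is a starting point only; a formal standard adds agreed terms and structure on top of the same questions.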
  
4. Metadata basics

•    Digital context
      •     Name of the data set
      •     The name(s) of the data file(s) in the data set
      •     Date the data set was last modified
      •     Example data file records for each data type file
      •     Pertinent companion files
      •     List of related or ancillary data sets
      •     Software (including version number) used to prepare/read the data set
      •     Data processing that was performed
•    Personnel & stakeholders
      •     Who collected
      •     Who to contact with questions
      •     Funders
•    Scientific context
      •     Scientific reason why the data were collected
      •     What data were collected
      •     What instruments (including model & serial number) were used
      •     Environmental conditions during collection
      •     Where collected & spatial resolution
      •     When collected & temporal resolution
      •     Standards or calibrations used
•    Information about parameters
      •     How each was measured or produced
      •     Units of measure
      •     Format used in the data set
      •     Precision & accuracy if known
•    Information about data
      •     Definitions of codes used
      •     Quality assurance & control measures
      •     Known problems that limit data use (e.g. uncertainty, sampling problems)
•    How to cite the data set
  
4. Metadata basics

What is metadata?
What is a metadata standard?

Select the appropriate metadata standard
•  Provides structure to describe data
          Common terms | definitions | language | structure
•  Lots of different standards
          EML, FGDC, ISO19115, DarwinCore, …
•  Tools for creating metadata files
          Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
  	
  
4. Metadata basics
  
5. Workflows

 Simplest workflows: commented scripts, flow charts

 [Flow chart]
     Inputs: Temperature data, Salinity data
     Steps: Data import into R; Quality control & data cleaning; Analysis: mean, SD; Graph production
     Products: Data in R format; “Clean” T & S data; Summary statistics
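
The flow chart can be read as a commented script with one function per step. The slide uses R; this Python sketch follows the same import, clean, summarize sequence for illustration. The column names, the -999 missing-value code, and the numbers are all made up.

```python
import csv
import io
import statistics

# Made-up temperature and salinity readings; -999 is an assumed missing-value code.
RAW_TEXT = """temperature_c,salinity_psu
12.1,33.5
12.4,33.7
-999,33.6
12.9,
"""


def import_data(text):
    """Step 1: read the raw table into a list of rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows, sentinel="-999"):
    """Step 2: drop rows with blank or sentinel values and convert to numbers."""
    cleaned = []
    for row in rows:
        if any(value in ("", sentinel) for value in row.values()):
            continue
        cleaned.append({name: float(value) for name, value in row.items()})
    return cleaned


def summarize(rows):
    """Step 3: mean and sample standard deviation for each column."""
    summary = {}
    for name in rows[0]:
        column = [row[name] for row in rows]
        summary[name] = (statistics.mean(column), statistics.stdev(column))
    return summary


summary = summarize(clean(import_data(RAW_TEXT)))
```

Graph production would be a fourth step taking the cleaned rows; it is left out here to keep the sketch free of plotting libraries.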
  
5. Workflows

Fancy Schmancy: Kepler

Resulting output

https://kepler-project.org
  
5. Workflows

 Workflows enable

        Reproducibility
               can someone independently validate findings?
        Transparency
               others can understand how you arrived at your results
        Executability
               others can re-run or re-use your analysis

From Flickr by merlinprincesse
  
        	
  
6. Data stewardship & reuse

The 20-Year Rule
     The metadata accompanying a data set should be written for a user 20 years into the future
     (National Research Council 1991)

From Flickr by greensambaman
  
6. Data stewardship & reuse

Use stable formats
        csv, txt, tiff
Create back-up copies
        original, near, far
Periodically test ability to restore information

Modified from R. Cook
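
One simple way to test that a back-up can still be restored intact is to compare checksums of the original and the copy. The Python sketch below does this with SHA-256; the file names are hypothetical and the "corruption" is simulated.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """True when the back-up copy is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(backup)


# Demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "raw.csv"
    original.write_text("id,value\n1,2\n")
    backup = Path(tmp) / "raw_backup.csv"
    shutil.copy2(original, backup)
    ok_before = backup_matches(original, backup)
    backup.write_text("id,value\n1,3\n")  # simulate a damaged back-up
    ok_after = backup_matches(original, backup)
```

Running a comparison like this on a schedule, against the near and far copies, turns "periodically test" into something concrete.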
  
6. Data stewardship & reuse

Where do I put my data?

              Institutional archive
              Discipline/specialty archive
              DataCite list of repositories:
                 www.datacite.org/repolist

From Flickr by torkildr
  
6. Data stewardship & reuse

Data Citation: Why everyone should do it

                Allow readers to find data products
                Get credit for data and publications
                Promote reproducibility
                Better measure of research impact

     Example:
     Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Learn more at www.datacite.org
Modified from R. Cook
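
The example citation follows an author, year, title, repository, identifier pattern. The short Python sketch below assembles that pattern from its parts; the function name and the exact punctuation are assumptions modelled on the example, not an official citation format.

```python
def format_data_citation(author, year, title, repository, doi):
    """Assemble a data citation in the pattern used by the example above."""
    return f"{author} {year}. {title}. {repository}. doi:{doi}"


citation = format_data_citation(
    author="Sidlauskas, B.",
    year=2007,
    title=(
        "Data from: Testing for unequal rates of morphological diversification "
        "in the absence of a detailed phylogeny: a case study from characiform fishes"
    ),
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.20",
)
```

Storing the parts separately, as metadata, makes it easy to produce the citation in whatever style a journal asks for.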
  
Best Practices for Data Management

    1.  Planning
    2.  Data collection & organization
    3.  Quality control & assurance
    4.  Metadata
    5.  Workflows
    6.  Data stewardship & reuse
    7.  Planning
  
1. Planning

    What is a data management plan?
A document that describes what you will do with your data during your research and after you complete your research

Data Hangover
  
                                 	
  
1. Planning

Why should I prepare a DMP?

        Saves time
        Increases efficiency
        Easier to use data
        Others can understand & use data
        Credit for data products
        Funders require it
  
	
  
NSF DMP Requirements
 From Grant Proposal Guidelines:
 DMP supplement may include:
  1.  the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
  2.  the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
  3.  policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
  4.  policies and provisions for re-use, re-distribution, and the production of derivatives
  5.  plans for archiving data, samples, and other research products, and for preservation of access to them
  
1.  Types of data & other information

•  Types of data produced
•  Relationship to existing data
•  How/when/where will the data be captured or created?
•  How will the data be processed?
•  Quality assurance & quality control measures
•  Security: version control, backing up
•  Who will be responsible for data management during/after project?

C. Strasser
biology.kenyon.edu
From Flickr by Lazurite
  
2.  Data & metadata standards

•  What metadata are needed to make the data meaningful?
•  How will you create or capture these metadata?
•  Why have you chosen particular standards and approaches for metadata?

Wired.com
  
3.  Policies for access & sharing
4.  Policies for re-use & re-distribution

•  Are you under any obligation to share data?
•  How, when, & where will you make the data available?
•  What is the process for gaining access to the data?
•  Who owns the copyright and/or intellectual property?
•  Will you retain rights before opening data to wider use? How long?
•  Are permission restrictions necessary?
•  Embargo periods for political/commercial/patent reasons?
•  Ethical and privacy issues?
•  Who are the foreseeable data users?
•  How should your data be cited?
  
5.  Plans for archiving & preservation

•  What data will be preserved for the long term? For how long?
•  Where will data be preserved?
•  What data transformations need to occur before preservation?
•  What metadata will be submitted alongside the datasets?
•  Who will be responsible for preparing data for preservation? Who will be the main contact person for the archived data?

From Flickr by theManWhoSurfedTooMuch
  
Don’t forget: Budget

•  Costs of data preparation & documentation
           Hardware, software
           Personnel
           Archive fees
•  How costs will be paid
           Request funding!

dorrvs.com
  
NSF’s Vision*

    DMPs and their evaluation will grow & change over time (similar to broader impacts)
    Peer review will determine next steps
    Community-driven guidelines
           –  Different disciplines have different definitions of acceptable data sharing
           –  Flexibility at the directorate and division levels
           –  Tailor implementation of DMP requirement
    Evaluation will vary with directorate, division, & program officer

*Unofficially
Help from Jennifer Schopf, NSF
  
Roadmap

1.  Background
2.  Mistakes we make
3.  How to improve
4.  Toolbox
  
	
  
DMPTool: dmp.cdlib.org

Step-by-step wizard for generating DMP
Create | edit | re-use | share | save | generate
Open to community
Links to institutional resources
Directorate information & updates
  
E-notebooks

•    NoteBook
•    ORNL eNote
•    Evernote
•    Google Docs
•    Blogs
•    Wikis
•    TheLabNotebook.com
•    iPad ELN (the flexible electronic laboratory notebook)
•    NoteBookMaker

CDL Services for UC Community

Where should I put my data?

Data Repository
Deposit | Manage | Share | Preserve

www.cdlib.org/services/uc3
  
CDL Services for UC Community

Create & manage persistent identifiers
     •     Precise identification of a dataset
     •     Credit to data producers and data publishers
     •     A link from the traditional literature to the data
     •     Research metrics for datasets

Example:
Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

www.cdlib.org/services/uc3
  
Why are you promoting Excel?

•    Open source add-in
•    Facilitate data management, sharing, archiving for scientists
•    Focus on atmospheric, ecological, hydrological, and oceanographic data
•    Collecting requirements for add-in from scientists, data centers, libraries

Funders: Gordon and Betty Moore Foundation, Microsoft Research
  
Why are you promoting Excel?

Everyone uses it
Stopgap measure
  
	
  




	
  
  
www.dataone.org

•    Data Education Tutorials
•    Database of best practices & software tools
•    Links to DMPTool
•    Primer on data management

From Flickr by Robert Hruzek
  
Data Management 101

dcxl.cdlib.org
•    Data Education Tutorials
•    Other resources

From tripwow.tripadvisor.com, Travelpod member Sutiramisu
  
Process
1.  Assess needs
2.  Gather requirements
3.  Build requirements document
4.  Build community
  
Requirements

1.  Must work for Excel users without the add-in
2.  No additional software (other than add-in and Excel) necessary
3.  Can be used offline
4.  Perform CSV compatibility checks, reporting, and automated fixes
5.  Add Metadata to data file
     a.  Can use existing metadata as a template
     b.  Add-in can automatically generate some of the metadata where the info is available from the file
6.  Generate a citation for the data file
7.  Deposit data and metadata in a repository
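
Requirement 4 (CSV compatibility checks, reporting, and automated fixes) can be illustrated with a minimal Python sketch. The slides do not say which checks the add-in performs, so the checks below (missing or duplicate header names, rows with the wrong number of cells, line breaks inside cells) are assumptions chosen for illustration, and every function name is hypothetical.

```python
import csv
import io


def check_csv_compatibility(rows):
    """Return a list of (row_number, message) problems found in a table.

    `rows` is a list of lists of strings; row 1 is treated as the header.
    The specific checks are illustrative assumptions, not the DCXL rules.
    """
    problems = []
    if not rows:
        return [(0, "table is empty")]

    header = rows[0]
    seen = set()
    for col, name in enumerate(header, start=1):
        if not name.strip():
            problems.append((1, f"column {col} has no header name"))
        elif name in seen:
            problems.append((1, f"duplicate header name '{name}'"))
        seen.add(name)

    for row_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                (row_number, f"expected {len(header)} cells, found {len(row)}")
            )
        for cell in row:
            if "\n" in cell or "\r" in cell:
                problems.append((row_number, "cell contains a line break"))

    return problems


def fix_line_breaks(rows):
    """Automated fix: collapse whitespace runs (including line breaks)
    inside each cell into single spaces."""
    return [[" ".join(cell.split()) for cell in row] for row in rows]


def to_csv_text(rows):
    """Serialize the table as CSV text using the standard library writer."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Reporting and fixing are kept as separate functions so that a user can review the reported problems before any automated change is applied, in the spirit of the "user-directed error correction" idea.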
  
The Great Debate

Add-in
•  Little pieces of software
•  Download to extend the capabilities of Excel
•  Appear as “ribbon”

Web-based application
•  Require the web: www + wba
•  Do not require that you download a program
•  Websites that do something with info/files provided by user
•  Examples: Facebook, YouTube
  
Add-in

Download add-in → DCXL add-in → New & improved spreadsheet

Check Compatibility
     1.  Parse for compatibility
     2.  Report potential errors
     3.  Allow user-directed error correction

Create Metadata
     1.  Make template
     2.  Auto-fill
     3.  Parameter list selection
     4.  Citation generation
     5.  DOI connection

Connect to repository
     1.  Version control
     2.  Backing up
     3.  Retrieve info: Authentication, Keyword list, Metadata standard, Citation format, Acceptable file formats
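
The "Create Metadata" steps in the add-in workflow above (make template, auto-fill, citation generation) might look roughly like the following Python sketch. The slides do not define a metadata schema or a citation format, so the field names, the creator-year-title-publisher-identifier pattern, and all function names here are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Metadata:
    """Hypothetical metadata record; the field names are assumptions."""
    title: str = ""
    creators: List[str] = field(default_factory=list)
    year: Optional[int] = None
    publisher: str = ""
    identifier: str = ""  # e.g. a DOI, once one has been assigned
    parameters: List[str] = field(default_factory=list)


def auto_fill(template: Metadata, header: List[str]) -> Metadata:
    """Copy a template and fill the parameter list from column headers."""
    return Metadata(
        title=template.title,
        creators=list(template.creators),
        year=template.year,
        publisher=template.publisher,
        identifier=template.identifier,
        parameters=[name.strip() for name in header if name.strip()],
    )


def make_citation(meta: Metadata) -> str:
    """Build a citation string, skipping any fields that are still empty."""
    parts = []
    if meta.creators:
        authors = "; ".join(meta.creators)
        parts.append(f"{authors} ({meta.year})" if meta.year else authors)
    for value in (meta.title, meta.publisher, meta.identifier):
        if value:
            parts.append(value)
    return ". ".join(parts)
```

Reusing an existing record as the template means only the file-specific parts (here, the parameter list taken from the spreadsheet header) need to be generated for each new data file.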
  
                                                                                        	
  
Summary: Add-in

The Good
•    Integrated in workflow
•    Familiar UI, functionality
•    Smaller shift
•    Available offline

The Bad
•  Windows only
•  Install & updates required
•  Not as generalizable/extensible
•  Not as easy for community to get involved
  
Web application

Upload spreadsheet → Web-based application → New & improved spreadsheet

Check Compatibility
      1.  Parse for compatibility
      2.  Report potential errors
      3.  Allow user-directed error correction

Create Metadata
      1.  Make template
      2.  Auto-fill
      3.  Parameter list selection
      4.  Citation generation
      5.  DOI connection

Connect to repository
      1.  Version control
      2.  Backing up
      3.  Retrieve info: Authentication, Keyword list, Metadata standard, Citation format, Acceptable file formats
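
The "Retrieve info" items under "Connect to repository" (authentication, keyword list, metadata standard, citation format, acceptable file formats) could be held in a small record and checked before a deposit is attempted. The Python sketch below is purely illustrative: it does not call any real repository API, and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import FrozenSet, List


@dataclass(frozen=True)
class RepositoryInfo:
    """Hypothetical record of what a repository reports about itself."""
    name: str
    requires_authentication: bool
    keyword_list: FrozenSet[str]
    metadata_standard: str
    citation_format: str
    acceptable_formats: FrozenSet[str]  # lower-case extensions, e.g. ".csv"


def deposit_problems(repo: RepositoryInfo, filename: str,
                     keywords: List[str], logged_in: bool) -> List[str]:
    """List reasons a deposit would be rejected; an empty list means ready."""
    problems = []
    if repo.requires_authentication and not logged_in:
        problems.append("authentication required")
    extension = PurePath(filename).suffix.lower()
    if extension not in repo.acceptable_formats:
        problems.append(f"file format '{extension}' not accepted")
    for keyword in keywords:
        if keyword not in repo.keyword_list:
            problems.append(f"keyword '{keyword}' not in repository list")
    return problems
```

Running such a check locally, before any upload, would let a user correct the file format or keywords while still working in the spreadsheet.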
  
                                                                                         	
  
Summary: Web based

The Good
•    Easier to maintain, update
•    Can use with Mac
•    Generalizable/extensible
•    Community involvement possible

The Bad
•    Not familiar
•    Requires new UI
•    Not integrated in Excel
•    Offline use not guaranteed
  
Moving forward…

•  Simple, clean user interface
•  Connect to web application from within Excel
•  Offline use of web application, especially ability to create metadata offline
  
Send me feedback!

From Flickr by hashmil

Comment on the blog: dcxl.cdlib.org
Email me: carlystrasser@gmail.com
Tweet me: @carlystrasser
FB message me: DCXLatCDL
  
Diane Bisom
Ann Frenkel
Dr. Ruth Jackson

dcxl.cdlib.org
@dcxlCDL
www.facebook.com/DCXLatCDL

www.carlystrasser.net
carlystrasser@gmail.com
@carlystrasser
  

UC Riverside: Data Management for Scientists

  • 1. Data  Management  for  Scientists     Reduce  your  workload   Reuse  your  ideas   Recycle  your  data     www.oddee.com   Carly  Strasser,  PhD   UC  Riverside   California  Digital  Library,  UC  Office  of  the  President   February  2012   carly.strasser@ucop.edu   www.carlystrasser.net  
  • 2. Roadmap   4.  Toolbox     3.  How  to  improve   2.  Mistakes  we  make   1.  Background    
  • 3. What  role  can   libraries  play  in   data  education?   What  barriers  to  sharing   can  we  eliminate?   Why  don’t  people   share  data?   Is  data  management   Do  attitudes  about   being  taught?   sharing  differ   among  disciplines?   How  can  we  promote  storing   data  in  repositories?  
  • 5. Roadmap   4.  Toolbox     3.  How  to  improve   2.  Mistakes  we  make   1.  Background    
  • 6. From  Flickr  by    DW0825   From  Flickr  by  Flickmor   From  Flickr  by    deltaMike   Digital  data   www.woodrow.org   C.  Strasser   Courtesey  of  WHOI   From  Flickr  by  US  Army  Environmental  Command  
  • 7. Digital  data   +     Complex  analyses  
  • 8. Data   Models   Maximum   Likelihood   estimation   Matrix   Models   Images   Tables   Paper  
  • 9. UGLY TRUTH Many   Earth  |  Environmental  |  Ecological   scientists…       5shortessays.blogspot.com     are  not  taught  data  management   don’t  know  what  metadata  are   can’t  name  data  centers  or  repositories   don’t  share  data  publicly  or  store  it  in  an  archive   aren’t  convinced  they  should  share  data    
  • 10. 2  tables   Random  notes   C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1 Stable Isotope Data Sheet Sampling Site / Identifier: Wash Cresc Lake Peter's lab Don't use - old data Sample Type: Algal Washed Rocks Date: Dec. 16 Tray ID and Sequence: Tray 004 13 15 Reference statistics: SD for delta C = 0.07 SD for delta N = 0.15 Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No. A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354 A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356 A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358 A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22 A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32 A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368 A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370 A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372 B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c B2 ALG02 3 4.51 -22.68 -22.22 0.34 4.31 3.66 25376 B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c B5 ALG07 2.9 33.58 -29.44 -28.98 1.74 0.62 -0.03 25382 B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384 B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386 B8 Lk Outlet Alg 3.04 31.43 -29.69 -29.23 1.07 0.95 0.30 25388 B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390 B10 ALG02 3.05 5.52 -22.31 -21.85 0.45 4.72 4.07 25392 C1 ALG04 2.98 37.90 -27.42 -26.96 1.36 1.21 0.56 25394 c C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396 C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398 23.78 1.17 From  Stephanie  Hampton  (2010)       ESA  Workshop  on  Best  Practices  
  • 11. Wash Cres Lake Dec 15 Dont_Use.xls   [Spreadsheet screenshot repeated from slide 10]   From Stephanie Hampton (2010)   ESA Workshop on Best Practices
  • 12. [Spreadsheet screenshot repeated from slide 10, overlaid with a transposed copy of the sample data, a regression SUMMARY OUTPUT block, and a scatter chart (Series1)]
  • 13. Random stats output   [Spreadsheet screenshot repeated from slide 10, with regression output pasted beside the data]   SUMMARY OUTPUT   Regression Statistics: Multiple R 0.283158; R Square 0.080178; Adjusted R Square -0.022024; Standard Error 1.906378; Observations 11   ANOVA (df, SS, MS, F, Significance F): Regression 1, 2.851116, 2.851116, 0.784507, 0.398813; Residual 9, 32.7085, 3.634278; Total 10, 35.55962   Coefficients (Coefficient, Standard Error, t Stat, P-value, Lower 95%, Upper 95%): Intercept -4.297428, 4.671099, -0.920003, 0.381568, -14.8642, 6.269341; X Variable 1 -0.158022, 0.17841, -0.885724, 0.398813, -0.561612, 0.245569
  • 14. Data  Hangover     What  happened?   From  Flickr  by  SteveMcN  
  • 15. Where  data  end  up   From  Flickr  by  diylibrarian   www blog.order2disorder.com   From  Flickr  by  csessums   Data   Metadata   From  Flickr  by  csessums   Recreated  from  Klump  et  al.  2006  
  • 16. Who cares?   From Flickr by Redden-McAllister   From Flickr by AJC1   www.rba.gov.au
  • 17. Where  data  end  up   From  Flickr  by  diylibrarian   www Data   www Metadata   From  Flickr  by  torkildr   Recreated  from  Klump  et  al.  2006  
  • 18. Data   Reuse   Data   Sharing   Data   Management  
  • 19. Trends  in  Data  Archiving   Journal  publishers   Joint  Data  Archiving  Agreement     Data  Papers  etc.   Ecological  Archives,  Beyond  the  PDF  
  • 20. Trends  in  Data  Archiving   Journal  publishers   Joint  Data  Archiving  Agreement     Data  Papers  etc.   Ecological  Archives,  Beyond  the  PDF     Funders   Data  management  requirements    
  • 21. Roadmap   4.  Toolbox     3.  How  to  improve   2.  Mistakes  we  make   1.  Background    
  • 22. Best  Practices  for  Data  Management   1.  Planning   2.  Data  collection  &  organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse  
  • 23. 2.  Data  collection  &  organization   Create  unique  identifiers   •  Decide  on  naming  scheme  early   •  Create  a  key   •  Different  for  each  sample   From  Flickr  by  zebbie   From  Flickr  by  sjbresnahan  
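A minimal sketch of the naming-scheme idea on this slide, assuming a hypothetical scheme of site code, year, and a zero-padded running number. The site code `NAN` and the key fields are illustrative placeholders, not taken from the talk.

```python
def make_sample_ids(site_code, year, n):
    """Return n unique sample IDs such as 'NAN-2010-001' (hypothetical scheme)."""
    return [f"{site_code}-{year}-{i:03d}" for i in range(1, n + 1)]


# Decide on the scheme early, then generate the IDs once.
ids = make_sample_ids("NAN", 2010, 3)

# The "key" maps every ID back to what it stands for.
key = {sample_id: {"site": "NAN", "year": 2010} for sample_id in ids}
```

Generating the identifiers from one function, rather than typing them by hand, keeps them different for each sample and keeps the key in step with the scheme.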
  • 24. 2.  Data  collection  &  organization   Standardize   •  Consistent  within  columns   – only  numbers,  dates,  or  text   •  Consistent  names,  codes,  formats   Modified  from  K.  Vanderbilt     From  Pink  Floyd,  The  Wall      themurkyfringe.com  
  • 25. 2. Data collection & organization   Standardize   • Reduce possibility of manual error by constraining entry choices   Excel lists   Data validation   Google Docs Forms   Modified from K. Vanderbilt
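The same constraint can be applied after entry as well. The sketch below assumes a hypothetical row layout with a `site` column restricted to a fixed list and a `weight_mg` column that must be numeric; the column names and allowed values are illustrative only.

```python
ALLOWED_SITES = {"lake", "outlet", "shore"}  # hypothetical controlled vocabulary


def check_row(row):
    """Return a list of problems found in one data row (empty list = row is fine)."""
    problems = []
    if row["site"] not in ALLOWED_SITES:
        problems.append(f"site '{row['site']}' is not in the allowed list")
    try:
        float(row["weight_mg"])
    except ValueError:
        problems.append(f"weight_mg '{row['weight_mg']}' is not a number")
    return problems


good = check_row({"site": "lake", "weight_mg": "3.05"})
bad = check_row({"site": "Lk Outlet", "weight_mg": "3,06"})
```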
  • 26. 2.  Data  collection  &  organization       Create  parameter  table   Create  a  site  table   From  doi:10.3334/ORNLDAAC/777   From  doi:10.3334/ORNLDAAC/777   From  R  Cook,  ESA  Best  Practices  Workshop  2010  
  • 27. 2.  Data  collection  &  organization   Use  descriptive  file  names   PhDcomics.com  
  • 28. 2. Data collection & organization   Use descriptive file names *   • Unique   • Reflect contents   Bad: Mydata.xls, 2001_data.csv, best version.txt   Better: Eaffinis_nanaimo_2010_counts.xls (study organism, site name, year, what was measured)   *Not for everyone   From R Cook, ESA Best Practices Workshop 2010
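One way to keep file names consistent is to build them from their parts. This is a sketch only; the function name `build_file_name` and its argument order are assumptions chosen to reproduce the "Better" example on the slide.

```python
def build_file_name(organism, site, year, measured, ext="csv"):
    """Join the descriptive parts with underscores and append the extension."""
    parts = [organism, site, str(year), measured]
    cleaned = [p.strip().replace(" ", "") for p in parts]  # no spaces in file names
    return "_".join(cleaned) + "." + ext


name = build_file_name("Eaffinis", "nanaimo", 2010, "counts", ext="xls")
```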
  • 29. 2.  Data  collection  &  organization   Organize  files    logically   Biodiversity   Lake   Experiments   Biodiv_H20_heatExp_2005to2008.csv   Biodiv_H20_predatorExp_2001to2003.csv   …   Field  work   Biodiv_H20_PlanktonCount_2001toActive.csv   Biodiv_H20_ChlAprofiles_2003.csv   …     Grassland   From  S.  Hampton  
  • 30. 2.  Data  collection  &  organization    Preserve  information   R  script  for  processing  &   analysis   •  Keep  raw  data  raw   •  Use  scripts  to  process  data      &  save  them  with  data   Raw  data  as  .csv  
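The slide shows an R script; the sketch below expresses the same idea in Python, under assumptions. The raw file is only ever read, and every processing step lives in a script that writes a separate cleaned copy. The column names and the rule of dropping `ref` rows are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def process_raw(raw_path, out_path):
    """Read the raw CSV (never modified) and write a processed copy to out_path."""
    with open(raw_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["sample_id"].strip().lower() == "ref":
                continue  # hypothetical rule: leave reference standards out of the working copy
            writer.writerow(row)


# Small demonstration in a temporary directory.
workdir = Path(tempfile.mkdtemp())
raw = workdir / "raw.csv"
raw_text = "sample_id,weight_mg\nref,0.98\nALG01,3.05\n"
raw.write_text(raw_text)
clean = workdir / "clean.csv"
process_raw(raw, clean)
```

Saving this script next to the data documents exactly how the processed file was derived.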
  • 31. 2. Data collection & organization   All of the things that make Excel great for data organization are bad for archiving!   What to do?   1. Create archive-ready raw data   2. Put it somewhere special   3. Have your fun with fancy Excel techniques   4. Keep archiving in mind
  • 32. 3.  Quality  control  and  quality  assurance   Define  &  enforce  standards   Double  data  entry   Document  changes   Minimize  manual  data  entry   No  missing,  impossible,  or  anomalous  values     60   50   40   30   20   10   0   0   5   10   15   20   25   30   35  
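A simple automated check for missing and impossible values might look like the sketch below. It assumes a percentage column such as %C, for which anything outside 0 to 100 is impossible; the function name and flag labels are illustrative.

```python
def qc_flags(values, low, high):
    """Map the index of each problem value to 'missing' or 'out_of_range'."""
    flags = {}
    for i, v in enumerate(values):
        if v is None or v == "":
            flags[i] = "missing"
        elif not (low <= float(v) <= high):
            flags[i] = "out_of_range"
    return flags


percent_c = [38.27, "", 142.0, 6.85]
flags = qc_flags(percent_c, 0, 100)
```

Flagging values instead of silently changing them leaves a record of what was found, which supports the "document changes" point on the slide.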
  • 33. 4. Metadata basics   What is metadata?   Why are you promoting Excel?
  • 34. 4.  Metadata  basics      Metadata  =  Data  reporting     WHO  created  the  data?   WHAT  is  the  content  of  the  data  set?   WHEN  was  it  created?   WHERE  was  it  collected?   HOW  was  it  developed?   WHY  was  it  developed?  
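The six questions can be captured as a small machine-readable record. This is a sketch under assumptions: the field names follow the slide, the values are placeholders to be replaced with real information, and plain JSON is used for illustration rather than any formal metadata standard.

```python
import json

REQUIRED_FIELDS = ["who", "what", "when", "where", "how", "why"]


def missing_fields(record):
    """Return the required questions that are absent or left empty."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


# Placeholder values only; fill these in for a real data set.
record = {
    "who": "<name of data creator>",
    "what": "<content of the data set>",
    "when": "<collection dates>",
    "where": "<collection location>",
    "how": "<methods and instruments>",
    "why": "",
}

unanswered = missing_fields(record)
as_json = json.dumps(record, indent=2)
```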
  • 35. 4. Metadata basics   • Scientific context: scientific reason why the data were collected; what data were collected; what instruments (including model & serial number) were used; environmental conditions during collection; where collected & spatial resolution; when collected & temporal resolution; standards or calibrations used   • Information about parameters: how each was measured or produced; units of measure; format used in the data set; precision & accuracy if known   • Information about data: definitions of codes used; quality assurance & control measures; known problems that limit data use (e.g. uncertainty, sampling problems)   • Digital context: name of the data set; the name(s) of the data file(s) in the data set; date the data set was last modified; example data file records for each data type file; pertinent companion files; list of related or ancillary data sets; software (including version number) used to prepare/read the data set; data processing that was performed   • Personnel & stakeholders: who collected; who to contact with questions; funders   • How to cite the data set
  • 36. 4. Metadata basics   What is metadata?   What is a metadata standard?   Select the appropriate metadata standard   • Provides structure to describe data   Common terms | definitions | language | structure   • Lots of different standards: EML, FGDC, ISO19115, DarwinCore,…   • Tools for creating metadata files: Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
  • 38. 5. Workflows   Simplest workflows: commented scripts, flow charts   Temperature data + Salinity data → Data import into R → Data in R format → Quality control & data cleaning → “Clean” T & S data → Analysis: mean, SD → Summary statistics → Graph production
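The slide's flow chart describes an R workflow; the commented script below sketches the same steps in Python, with made-up temperature readings and a hypothetical `NA` code for missing values. Graph production is left out to keep the sketch dependency-free.

```python
import statistics


def clean(values):
    """Quality control: drop missing codes and convert the rest to numbers."""
    return [float(v) for v in values if v not in ("", None, "NA")]


def summarize(values):
    """Analysis: mean and sample standard deviation."""
    return {"mean": statistics.mean(values), "sd": statistics.stdev(values)}


# Step 1: data import (illustrative values standing in for a file read).
temperature_raw = ["12.0", "14.0", "NA", "16.0"]

# Step 2: quality control and cleaning.
temperature = clean(temperature_raw)

# Step 3: summary statistics.
summary = summarize(temperature)
```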
  • 39. 5. Workflows   Fancy Schmancy: Kepler   Resulting output   https://kepler-project.org
  • 40. 5. Workflows   Workflows enable   From Flickr by merlinprincesse   Reproducibility: can someone independently validate findings?   Transparency: others can understand how you arrived at your results   Executability: others can re-run or re-use your analysis
  • 41. 6.  Data  stewardship  &  reuse   From  Flickr  by  greensambaman   The 20-Year Rule The  metadata  accompanying  a   data  set  should  be  written  for  a   user  20  years  into  the  future   RULE       (National  Research  Council  1991)  
  • 42. 6. Data stewardship & reuse   Use stable formats: csv, txt, tiff   Create back-up copies: original, near, far   Periodically test ability to restore information   Modified from R. Cook
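One possible way to test that a back-up can be restored is to compare checksums of the original and the copy. The sketch below uses SHA-256 from the Python standard library; the file names are placeholders and the choice of checksum is an assumption, not something the slide prescribes.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """True when the back-up is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(backup)


# Demonstration in a temporary directory.
workdir = Path(tempfile.mkdtemp())
original = workdir / "data.csv"
original.write_text("sample_id,weight_mg\nALG01,3.05\n")
backup = workdir / "data_backup.csv"
shutil.copy2(original, backup)
intact = backup_matches(original, backup)
```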
  • 43. 6. Data stewardship & reuse   Where do I put my data?   Institutional archive   Discipline/specialty archive   DataCite list of repositories: www.datacite.org/repolist   From Flickr by torkildr
  • 44. 6.  Data  stewardship  &  reuse   Data  Citation:  Why  everyone  should  do  it   Allow  readers  to  find  data  products   Get  credit  for  data  and  publications   Promote  reproducibility   Better  measure  of  research  impact   Example:   Sidlauskas,  B.  2007.  Data  from:  Testing  for  unequal  rates  of  morphological   diversification  in  the  absence  of  a  detailed  phylogeny:  a  case  study  from   characiform  fishes.  Dryad  Digital  Repository.  doi:10.5061/dryad.20     Learn  more  at  www.datacite.org   Modified from R. Cook  
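The example citation on this slide follows an author, year, title, repository, identifier pattern. A sketch of a formatter for that pattern is shown below; the function name is hypothetical and other citation styles will differ.

```python
def format_data_citation(author, year, title, repository, doi):
    """Assemble a data citation in the order used by the slide's example."""
    return f"{author}. {year}. {title}. {repository}. doi:{doi}"


citation = format_data_citation(
    author="Sidlauskas, B",
    year=2007,
    title=(
        "Data from: Testing for unequal rates of morphological diversification "
        "in the absence of a detailed phylogeny: a case study from characiform fishes"
    ),
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.20",
)
```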
  • 45. Best  Practices  for  Data  Management   1.  Planning   2.  Data  collection  &  organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   7.  Planning  
  • 46. 1.  Planning   What  is  a  data  management  plan?   A  document  that  describes  what  you  will  do  with  your  data   during  your  research  and  after  you  complete  your  research   Data   Hangover    
  • 47. 1.  Planning   Why  should  I  prepare  a  DMP?       Saves  time   Increases  efficiency   Easier  to  use  data       Others  can  understand  &  use  data   Credit  for  data  products   Funders  require  it    
  • 48. NSF DMP Requirements   From Grant Proposal Guidelines: DMP supplement may include:   1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project   2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)   3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements   4. policies and provisions for re-use, re-distribution, and the production of derivatives   5. plans for archiving data, samples, and other research products, and for preservation of access to them
  • 49. 1.  Types  of  data  &  other  information   •  Types  of  data  produced   •  Relationship  to  existing  data   •  How/when/where  will  the  data  be  captured  or   created?   C.  Strasser   •  How  will  the  data  be  processed?   •  Quality  assurance  &  quality  control  measures   •  Security:  version  control,  backing  up   biology.kenyon.edu   •  Who  will  be  responsible  for  data  management   during/after  project?   From  Flickr  by  Lazurite  
  • 50. 2.  Data  &  metadata  standards   •  What  metadata  are  needed  to  make  the  data  meaningful?   •  How  will  you  create  or  capture  these  metadata?     Wired.com   •  Why  have  you  chosen  particular  standards  and  approaches   for  metadata?  
  • 51. 3. Policies for access & sharing   4. Policies for re-use & re-distribution   • Are you under any obligation to share data?   • How, when, & where will you make the data available?   • What is the process for gaining access to the data?   • Who owns the copyright and/or intellectual property?   • Will you retain rights before opening data to wider use? How long?   • Are permission restrictions necessary?   • Embargo periods for political/commercial/patent reasons?   • Ethical and privacy issues?   • Who are the foreseeable data users?   • How should your data be cited?
  • 52. 5.  Plans  for  archiving  &  preservation   •  What  data  will  be  preserved  for  the  long  term?  For  how  long?       •  Where  will  data  be  preserved?   •  What  data  transformations  need  to  occur  before   preservation?   •  What  metadata  will  be  submitted   alongside  the  datasets?   •  Who  will  be  responsible  for  preparing   data  for  preservation?  Who  will  be  the   main  contact  person  for  the  archived   data?   From  Flickr  by  theManWhoSurfedTooMuch  
  • 53. Don’t  forget:  Budget   •  Costs  of  data  preparation  &  documentation   Hardware,  software   Personnel   Archive  fees   •  How  costs  will  be  paid     Request  funding!   dorrvs.com  
  • 54. NSF’s Vision*   DMPs and their evaluation will grow & change over time (similar to broader impacts)   Peer review will determine next steps   Community-driven guidelines   – Different disciplines have different definitions of acceptable data sharing   – Flexibility at the directorate and division levels   – Tailor implementation of DMP requirement   Evaluation will vary with directorate, division, & program officer   *Unofficially   Help from Jennifer Schopf, NSF
  • 55. Roadmap   4.  Toolbox     3.  How  to  improve   2.  Mistakes  we  make   1.  Background    
  • 56. DMPTool: dmp.cdlib.org   Step-by-step wizard for generating DMP   Create | edit | re-use | share | save | generate   Open to community   Links to institutional resources   Directorate information & updates
  • 57. E-notebooks   • NoteBook   • ORNL eNote   • Evernote   • Google Docs   • Blogs   • wikis   • TheLabNotebook.com   • iPad ELN   • NoteBookMaker
  • 58. CDL  Services  for  UC  Community   Where   should  I  put   Data  Repository   my  data?   Deposit    |    Manage    |    Share    |    Preserve   www.cdlib.org/services/uc3  
  • 59. CDL  Services  for  UC  Community   Create  &  manage  persistent  identifiers   •  Precise  identification  of  a  dataset   •  Credit  to  data  producers  and  data  publishers   •  A  link  from  the  traditional  literature  to  the  data   •  Research  metrics  for  datasets   Example:   Sidlauskas,  B.  2007.  Data  from:  Testing  for  unequal  rates  of  morphological   diversification  in  the  absence  of  a  detailed  phylogeny:  a  case  study  from   characiform  fishes.  Dryad  Digital  Repository.  doi:10.5061/dryad.20     www.cdlib.org/services/uc3  
  • 60. Why are you promoting Excel?   • Open source add-in   • Facilitate data management, sharing, archiving for scientists   • Focus on atmospheric, ecological, hydrological, and oceanographic data   • Collecting requirements for add-in from scientists, data centers, libraries   Funders: Gordon and Betty Moore Foundation, Microsoft Research
  • 61. Why  are  you   promoting   Excel?   Everyone  uses  it   Stopgap  measure      
  • 62. B   A   C  
  • 63. www.dataone.org   •  Data  Education  Tutorials   •  Database  of  best  practices     &  software  tools   •  Links  to  DMPTool   •  Primer  on  data  management   From  Flickr  by  Robert  Hruzek  
  • 64. Data Management 101   dcxl.cdlib.org   • Data Education Tutorials   • Other resources
  • 66. Process   1.  Assess  needs   2.  Gather  requirements   3.  Build  requirements   document   4.  Build  community  
  • 67. Requirements   1. Must work for Excel users without the add-in   2. No additional software (other than add-in and Excel) necessary   3. Can be used offline   4. Perform CSV compatibility checks, reporting, and automated fixes   5. Add metadata to data file   a. Can use existing metadata as a template   b. Add-in can automatically generate some of the metadata where the info is available from the file   6. Generate a citation for the data file   7. Deposit data and metadata in a repository
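Requirement 4 mentions CSV compatibility checks and reporting. The sketch below illustrates what such a check could report, using only the Python standard library. It is not the DCXL add-in's implementation; the specific checks (blank or duplicate column names, rows whose field count differs from the header) are assumptions.

```python
import csv
import io


def csv_compat_report(text):
    """Return a list of human-readable problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("blank column name in header")
    if len(set(header)) != len(header):
        problems.append("duplicate column names in header")
    for n, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {n}: expected {len(header)} fields, found {len(row)}")
    return problems


report = csv_compat_report("sample_id,weight_mg\nALG01,3.05\nALG02\n")
```

A report like this supports the user-directed error correction step: the tool lists potential problems and the scientist decides how to fix them.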
  • 68. The Great Debate   Add-in   • Little pieces of software   • Download to extend the capabilities of Excel   • Appear as “ribbon”   Web-based application   • Require the web: www + wba   • Do not require that you download a program   • Websites that do something with info/files provided by user   • Examples: Facebook, YouTube
  • 69. Add-in   Download add-in   DCXL add-in   New & improved spreadsheet   Check compatibility: 1. Parse for compatibility 2. Report potential errors 3. Allow user-directed error correction   Create metadata: 1. Make template 2. Auto-fill 3. Parameter list selection 4. Citation generation 5. DOI connection   Connect to repository: 1. Version control 2. Backing up 3. Retrieve info: authentication, keyword list, metadata standard, citation format, acceptable file formats
  • 70. Summary: Add-in   The Good: • Integrated in workflow   • Familiar UI, functionality   • Smaller shift   • Available offline   The Bad: • Windows only   • Install & updates required   • Not as generalizable/extensible   • Not as easy for community to get involved
  • 71. Web application   Upload spreadsheet   Web-based application   New & improved spreadsheet   Check compatibility: 1. Parse for compatibility 2. Report potential errors 3. Allow user-directed error correction   Create metadata: 1. Make template 2. Auto-fill 3. Parameter list selection 4. Citation generation 5. DOI connection   Connect to repository: 1. Version control 2. Backing up 3. Retrieve info: authentication, keyword list, metadata standard, citation format, acceptable file formats
  • 72. Summary: Web based   The Good: • Easier to maintain, update   • Can use with Mac   • Generalizable/extensible   • Community involvement possible   The Bad: • Not familiar   • Requires new UI   • Not integrated in Excel   • Offline use not guaranteed
  • 73. Moving  forward…   •  Simple,  clean  user  interface   •  Connect  to  web  application  from  within  Excel   •  Offline  use  of  web  application,  especially  ability  to   create  metadata  offline  
  • 74. Send  me  feedback!   From  Flickr  by  hashmil   Comment  on  the  blog   dcxl.cdlib.org   Email  me   carlystrasser@gmail.com   Tweet  me   @carlystrasser   FB  message  me   DCXLatCDL  
  • 75. Diane  Bisom   Ann  Frenkel   Dr.  Ruth  Jackson   dcxl.cdlib.org   @dcxlCDL   www.facebook.com/DCXLatCDL   www.carlystrasser.net   carlystrasser@gmail.com   @carlystrasser