SlideShare ist ein Scribd-Unternehmen logo
1 von 49
Downloaden Sie, um offline zu lesen
DataONE:	
  An	
  interoperable	
  data	
  
repositories	
  case	
  study	
  
	
  
John	
  W.	
  Cobb	
  
	
  
R&D	
  Staff	
  and	
  DataONE	
  Leadership	
  Team	
  Member	
  
Oak	
  Ridge	
  Na;onal	
  Laboratory	
  
	
  
HUBbub	
  2012	
  ,	
  the	
  HUBzero	
  conference	
  
	
  
Indianapolis,	
  IN	
  
24	
  September	
  2012	
  
	
  
Acknowledgment:	
  
•    Authorship:	
  This	
  talk	
  
     represents	
  work	
  of	
  the	
  
     en;re	
  DataONE	
  
     extended	
  team.	
  
•    It	
  especially	
  draws	
  upon	
  
     slide	
  material	
  from	
  
      •     Bill	
  Michener,	
  UNM	
  
            (esp.	
  recent	
  DataONE	
  
            AHM	
  Sept.	
  18,	
  2012)	
  
      •     Amber	
  Budden	
  –	
  
            DataONE	
  Ass.	
  Dir.	
  For	
  
            CE	
  
•    DataONE	
  is	
  an	
  NSF	
  
     supported	
  project	
  
     (OCI-­‐0830944)	
  


                                                 2	
  
Hubs	
  and	
  data	
  repositories	
  
•    A	
  personal	
  view	
  	
  
     (apologies	
  for	
  a	
  possibly	
  mis-­‐informed	
  speaker)	
  
•    HUB-­‐roots	
  (history	
  and	
  pre-­‐history)	
  
      •     PUNCH:	
  	
  web	
  portal	
  for	
  running	
  tools	
  	
  
            (DOI:	
  10.1109/40.846308)	
  
      •     -­‐>	
  NanoHUB:	
  Applica;on	
  orchestra;on	
  environment	
  
      •     +	
  RAPPTURE:	
  Rapid	
  Applica;on	
  por;ng	
  and	
  development	
  
      •     +	
  Framelesss	
  VNC	
  windows	
  –	
  	
  
            seamless	
  hosted	
  environment	
  on	
  clients!	
  
      •     +	
  Rich	
  collabora;ve	
  environment	
  and	
  rich	
  user	
  experience	
  !!	
  (“wishlist”)	
  
      •     Repurpose:	
  Hubzero	
  -­‐>	
  hubs	
  explode	
  	
  
            (ex.	
  NEESHub	
  a	
  cri;cal	
  advantage	
  for	
  largest	
  research	
  award	
  in	
  Purdue	
  history)	
  
•    Now	
  (and	
  recent	
  past)	
  turn	
  to	
  Hub+Data	
  Integra;on.	
  Some	
  successes	
  
     already	
  
•    Opportunity:	
  Richer	
  interac;ons	
  between	
  HUB’s	
  and	
  mul;ple	
  data	
  
     repositories	
  
•    Perhaps	
  for	
  example:	
  Enable	
  mul;-­‐project	
  collabora;on	
  within	
  PURR?	
  
•    Or:	
  Integrate	
  NEES	
  DB’s	
  with	
  SCEC	
  simula;ons	
  and	
  IRIS	
  waveforms?	
  

                                                                                                                                  3	
  
Mul;ple	
  data	
  repository	
  access?	
  
•      HUB	
  +	
  Database	
  exists	
  
•      HUB	
  +	
  external	
  data	
  repository	
  access	
  use	
  case.	
  
•      But	
  …..	
  What	
  if?	
  
	
  
	
  
•      Access	
  mul;ple	
  (possibly	
  external)	
  repositories	
  from	
  within	
  a	
  HUB	
  
       environment?	
  
        •      Access	
  mul;ple	
  external	
  repositories	
  with	
  similar	
  data?	
  Say	
  aggregate	
  all	
  data	
  
               from	
  state	
  hydrologists?	
  C.f.	
  driNET	
  hip://drinet.hubzero.org	
  
        •      Integrate	
  disparate	
  data	
  sets	
  for	
  new	
  and	
  novel	
  analysis.	
  
                  Recall	
  	
  Noshir	
  Contractor’s	
  comments	
  this	
  morning:	
  teaming	
  and	
  	
  
                  interdisciplinary	
  work	
  has	
  increased	
  impact	
  (Wuchty,	
  Jones,	
  Uzzi)	
  
        •      Enable	
  reproducible	
  analysis	
  and	
  synthesis	
  via	
  a	
  automated	
  workflow	
  to	
  create	
  
               synthe;c	
  data	
  products	
  
        •      Programma;c	
  access	
  
        •      More	
  integra;on	
  (more	
  than	
  just	
  raw	
  search	
  terms	
  a	
  la	
  Google)	
  
        •      …	
  
        •      What	
  do	
  you	
  want	
  to	
  discover	
  today?	
  (to	
  paraphrase	
  Microsol)	
  


                                                                                                                                  4	
  
DataONE	
  mo;va;on	
  
•    DataONE	
  is	
  a	
  project	
  to	
  address	
                                   Plan	
  
     these	
  issues	
  
•    Build	
  (assemble/aggregate)	
  	
                               Analyze	
                    Collect	
  
     data	
  repository	
  interoperability	
  
•    Advance	
  state	
  of	
  the	
  prac;ce	
  
     data	
  lifecycle	
  management	
  
      •       Planning	
                                     Integrate	
                                     Assure	
  
      •       Deposi;on	
  
      •       Metadata	
  genera;on	
  
      •       Seman;c	
  integra;on	
  
                                                                      Discover	
                    Describe	
  
      •       Workflow	
  and	
  provenance	
  
      •       Analysis	
                                                             Preserve	
  
      •       Synthesis	
  
•    Focus	
  on	
  a	
  broad	
  science	
  area	
  
•    Deploy	
  a	
  working	
  CI	
  and	
  grow	
  it	
  
•    DataONE	
  –	
  Data	
  Observa;on	
  
     Network	
  Earth	
  


                                                                                                                          5	
  
6	
  
Pressing	
  issues	
  for	
  the	
  digital	
  data	
  lifecycle	
  

                                                Plan	
  


                             Analyze	
                     Collect	
  




             Integrate	
                                                  Assure	
  




                             Discover	
                    Describe	
  



                                            Preserve	
  


                                                                                       7	
  
Mul;ple	
  data	
  sources	
  –	
  mutually	
  reinforcing	
  
                                       Increasing	
  Process	
  Knowledge	
  
Decreasing	
  Spa;al	
  Coverage	
  




                                                                                                                     Intensive	
  science	
  sites	
  	
  
                                                                                                                     and	
  experiments	
  


                                                                                                                            Extensive	
  science	
  sites	
  


                                                                                                                                     Volunteer	
  &	
  	
  
                                                                                                                                     educa;on	
  networks	
  

                                                                                                                                              Remote	
  
                                                                                                                                              sensing	
  
                                                                                Adapted	
  from	
  CENR-­‐OSTP	
  

                                                                                                                                                                8	
  
Scaiered	
  data	
  sources	
  
                     “finding	
  the	
  needle	
  in	
  the	
  haystack”	
  


Data	
  are	
  massively	
  dispersed	
  
   •    Ecological	
  field	
  sta;ons	
  and	
  research	
  centers	
  (100s)	
  
   •    Natural	
  history	
  museums	
  and	
  biocollec;on	
  facili;es	
  (100s)	
  
   •    Agency	
  data	
  collec;ons	
  (100s	
  to	
  1000s)	
  
   •    Individual	
  scien;sts	
  (1000s	
  to	
  10,000s	
  to	
  100,000s)	
  




                                                                                          9	
  
Data	
  Preserva;on	
  and	
  Planning	
  




 ✔                                    ?	
     10	
  
Preserva;on:	
  Poor	
  data	
  prac;ce	
  
                                  “data	
  entropy”	
  

                                     Time	
  of	
  publica?on	
  
                                           Specific	
  details	
  

                                                   General	
  details	
  


                                                                            Re?rement	
  or	
  	
  
Informa?on	
  Content	
  




                                                                            career	
  change	
  

                            Accident	
  

                                                                                                   Death	
  



                                                               Time	
                  (Michener	
  et	
  al.	
  1997)	
  




                                                                                                                             11	
  
 Preserva;on:	
  Data	
  longevity	
  
                                                                              Resource
         Study                            Resource Type
                                                                               Half-life
    Rumsey (2002)                               Legal Citations                1.4 years

 Harter and Kim (1996)                 Scholarly Article Citations             1.5 years

Koehler (1999 and 2002)                    Random Web Pages                    2.0 years

    Spinellis (2003)                         Computer Science                  4.0 years
                                                Citations
 Markwell and Brooks                        Biological Science                 4.6 years
      (2002)                               Education Resources
Nelson and Allen (2002)                    Digital Library Object              24.5 years

    Koehler,	
  W.	
  (2004)	
  Informa(on	
  Research	
  9(2):	
  174.	
                   12	
  
The Long Tail of Orphan Data	



                  The	
  Ultra-­‐violet	
                Most of the bytes
                  divergence	
                         are at the high end,
            Specialized repositories	

                but most of the
            (e.g. GenBank, PDB)	

                     datasets are at the
Volume	





                                                       low end – Jim Gray	

                                                                 The	
  Infrared	
  
                                    Orphan data	

               Catastrophe	
  




                                                                        (B. Heidorn)	

                                 Rank frequency of datatype	


                                                                                          13	
  
Data	
  deluge	
  and	
  interoperability	
  
              “the	
  flood	
  of	
  increasingly	
  heterogeneous	
  data”	
  

Data	
  are	
  heterogeneous	
  
   •  Syntax	
  
      •  (format)	
  
   •  Schema	
  
      •  (model)	
  
   •  Seman;cs	
  
      •  (meaning)	
  




                                                                                 Jones	
  et	
  al.	
  2007	
  




                                                                                                                  14	
  
Metadata	
  universe	
  (mul;-­‐verse)	
  
•  There	
  are	
  a	
  mul;tude	
  of	
  metadata	
  standards	
  
•  Discipline	
  and	
  sub-­‐discipline	
  specific	
  
•  Each	
  with	
  different	
  terms	
  and	
  context	
  




Source:	
  Jenn	
  Riley,	
  Indiana	
  U.	
  Digital	
  Librarian	
  
hip://www.dlib.indiana.edu/~jenlrile/metadatamap/	
  	
  Via	
  John	
  Kunze,	
  Cal.	
  Dig.	
  Lib	
  	
  


                                                                                                                15	
  
Each	
  dot	
  is	
  its	
  own	
  standard	
  !	
  




“…billions	
  and	
  billions	
  of	
  worlds	
  …”	
  –	
  Carl	
  Sagan	
  
                                                                                16	
  
DataONE	
  	
  CI	
  architectural	
  Elements	
  
•    Hard-­‐core	
  cyberinfrastructure	
  (CI)	
  
      •      CI	
  Member	
  Node	
  (MN)	
  data	
  
             repositories	
  
      •      Coordina;ng	
  Node	
  (CN)	
  global	
  
             metadata	
  repo’s	
                                                                   Investigator Toolkit

             Simple,	
  but	
  powerful	
  REST	
  API/SPI	
  for	
  
                                                                            Web Interface          Analysis, Visualization      Data Management
      • 
             universal	
  access	
                                                                    Client Libraries

      •      Inves;gator	
  toolkit	
  (ITK)	
  solware	
  tools	
               Java                      Python                Command Line

             to	
  allow	
  access	
  to	
  the	
  data	
  repository	
  
             collec;ve	
  via	
  familiar	
  access	
  idioms	
                Member Nodes                                Coordinating Nodes


•    Cultural	
  and	
  wetware	
  issues	
                                                                             Service Interfaces
                                                                                                                     Resolution      Discovery
             Educa;onal	
  Materials	
  
                                                                              Service Interfaces
      •                                                                                                              Replication      Registration

      •      Best	
  prac;ces	
                                             Bridge to non-DataONE                        Coordination Layer
      •      Workshops	
  and	
  tutorials	
                                Member Node services                      Identifiers      Catalog

      •      Surveys	
  and	
  assessments	
                                                                        Preservation        Monitor
                                                                               Data Repository
      •      Scien;st,	
  policymaker,	
  ci;zen	
                                                                  Object Store          Index
             engagement	
  
      •      Collabora;on,	
  governance,	
  and	
  
             sustainability	
  

                                                         hip://mule1.dataone.org/ArchitectureDocs-­‐current/	
  	
  	
  


                                                                                                                                                     17	
  
A	
  User’s	
  View	
  




                          18	
  
Key	
  Cyberinfrastructure	
  Elements	
  

                          •  Unique	
  iden;fiers	
  
                          •  Search	
  and	
  deliver	
  
                          •  Replica;on	
  
                          •  Federated	
  iden;ty	
  

                          Usable	
  by	
  
                             People	
  and	
  their	
  Agents	
  




                                                                    19	
  
Suppor;ng	
  the	
  data	
  lifecycle	
  



                                                         ORC	
  
                                                         Node	
  
                                           UCSB	
  
                                           Node	
  

                                                      UNM	
  
                                                      Node	
  




                      1.    Deposi;on/acquisi;on/ingest	
  


                  }
                      2.    Cura;on	
  and	
  metadata	
  management	
  
The	
  data	
         3.    Protec;on,	
  including	
  privacy	
  
lifecycle	
           4.    Discovery,	
  access,	
  use,	
  and	
  dissemina;on	
  
	
                    5.    Interoperability,	
  standards,	
  and	
  integra;on	
  
                      6.    Evalua;on,	
  analysis,	
  and	
  visualiza;on"            20	
  
DataONE	
  Supports	
  Data	
  Preserva;on	
  
Three	
  major	
  components	
  for	
  a	
     Member	
  Nodes	
  
flexible,	
  scalable,	
  sustainable	
         •  diverse	
  ins;tu;ons	
  
                                               Coordina?ng	
  Nodes	
  
network	
                                      •  serve	
  local	
  community	
  
                                               •  retain	
  complete	
  metadata	
  
                                               Inves?gator	
  Toolkit	
  
                                               •  provide	
  resources	
  for	
  
                                                  catalog	
  	
  
                                                  managing	
  their	
  data	
  
                                               •  indexing	
  for	
  search	
  
                                               •  retain	
  copies	
  of	
  data	
  
                                               •  network-­‐wide	
  services	
  
                                               •  ensure	
  content	
  
                                                  availability	
  (preserva;on)	
  	
  	
  
                                               •  replica;on	
  services	
  




                                                                                              21	
  
DataONE	
  sa;sfies	
  arch	
  requirements	
  

•  Enables	
  integra;on	
  of	
  mul;ple	
  geographically	
  
   diverse	
  and	
  metadata	
  diverse	
  repositories	
  
•  Presents	
  collec;ve	
  search	
  results	
  across	
  mul;ple	
  
   repository	
  
•  Provides	
  a	
  unified	
  API/SPI	
  for	
  search	
  and	
  
   programma;c	
  interface	
  
     hip://mule1.dataone.org/ArchitectureDocs-­‐current/	
  	
  	
  
•    DataONE	
  content	
  has	
  unique	
  iden;fiers	
  (DOI’s)	
  for	
  
     referencable/citable	
  data	
  objects	
  
•    Supports	
  both	
  large	
  datasets	
  and	
  the	
  long-­‐tails	
  


                                                                               22	
  
DataONE	
  spurs	
  innova;on	
  

•  Enables	
  new	
  analysis	
  and	
  synthesis	
  efforts	
  by	
  
     integra;ng	
  tasks	
  across	
  repositories	
  	
  
•  Provides	
  means	
  for	
  data	
  replica;on	
  and	
  basis	
  for	
  
     repositories	
  to	
  build	
  “data	
  wills”	
  or	
  “data	
  trust”	
  
     plans	
  	
  
•  Provides	
  a	
  plavorm	
  to	
  develop	
  advanced	
  
     interoperable	
  workflow	
  tools	
  and	
  seman;c	
  
     integra;on	
  tools	
  
	
  


                                                                                   23	
  
DataONE:	
  current	
  state/recent	
  progress	
  




                                                      24	
  
DataONE:	
  Suppor;ng	
  Scien;fic	
  Data	
  
           Preserva;on,	
  Discovery,	
  and	
  Innova;on	
  	
  
                                                 Current	
  Member	
  Nodes:	
  
                                                 	
  	
  
                                                 	
  	
  
                                                 	
  
                                                 	
  
                                                 	
  
                                                 	
  
                                                 	
  
                                                 	
  
                                                 	
  
                                                 	
  
                                                 Coming	
  Soon:	
  
                                                 	
  	
  
Current	
  Tools:	
                              	
  
	
  	
  
	
  
	
  
Tools	
  Coming	
  Soon:	
  	
                              Queensland	
  University	
  of	
  Technology	
  



                                                                                                               25	
  
Data	
  Management	
  Planning	
  Tool	
  
                          hips://dmp.cdlib.org/	
  	
  




                                                          26	
  
27	
  
28	
  
Plans	
  per	
  template	
  (as	
  of	
  June	
  2012)	
  
                                                              400	
  
Approximate	
  number	
  of	
  plans	
  per	
  template	
  




                                                              350	
       339	
  

                                                              300	
                 287	
                                                              Templates	
  of	
  greatest	
  interest	
  to	
  
                                                                                                                                                           the	
  DataONE	
  community	
  in	
  red;	
  
                                                              250	
  
                                                                                                                                               	
  	
  	
  2,302	
  unique	
  users	
  to	
  date	
  
                                                                                              197	
  
                                                              200	
  
                                                                                                        159	
  
                                                              150	
                                               133	
   133	
   124	
  
                                                                                                                                            101	
  
                                                              100	
                                                                                   71	
     65	
      60	
  
                                                                                                                                                                                   46	
      37	
      36	
  
                                                                50	
                                                                                                                                             34	
  
                                                                                                                                                                                                                           17	
     15	
     6	
  
                                                                  0	
  




                                                                                                                                                                        hip://dmptool.org	
  	
  	
  	
  	
  	
  	
  	
  	
  	
                      29	
  
✔ Check	
  for	
  best	
  prac;ces	
  
                        ✔ Create	
  metadata	
  
                        ✔ Connect	
  to	
  ONEShare	
  




   Data	
  &	
  	
  
Metadata	
  (EML)	
  




                                                            30	
  
2.	
  Data	
  Discovery	
  




                              31	
  
The	
  DataONE	
  Federa;on	
  




                                  32	
  
ORNL	
  DAAC	
  	
  	
  
as	
  a	
  DataONE	
  
Member	
  Node	
  	
                NASA	
  collectors	
     DAAC	
  Users	
  	
  	
  (UWG)	
  




 Inves?gator	
  Toolkit	
  




             DataONE	
  Users	
  
                                                                                             33	
  
hips://cn.dataone.org/onemercury/	
  	
  




                                            34	
  
35	
  
36	
  
Inves;gator	
  Toolkit	
  Support	
  	
  

                                  Plan	
  
                                DMP-Tool	
                  Analyze	
                     Collect	
  
Kepler




         Integrate	
                                     Assure	
  




                 Discover	
                    Describe	
  

                                Preserve	
  
                                                                      37	
  
Explora;on,	
  Visualiza;on,	
  and	
  Analysis	
  

                      Diverse	
  bird	
  observa;ons	
  and	
                   Model	
  results	
  
                      environmental	
  data	
  from	
  
                      300,00	
  loca;ons	
  in	
  the	
  US	
      Occurrence	
  of	
  Indigo	
  Bun?ng	
  (2008)	
  
                      integrated	
  and	
  analyzed	
  using	
  
                      High	
  Performance	
  Compu;ng	
  
                      Resources	
  


Land	
  Cover	
  


                                                                      Jan	
     Apr	
     Jun	
     Sep	
     Dec	
  

Meteorology	
  
                                                                     •  Examine	
  paierns	
  of	
  
                                                                        migra;on	
  	
  
MODIS	
  –	
          Spa;o-­‐Temporal	
  Exploratory	
              •  Infer	
  how	
  climate	
  
Remote	
              Model	
  iden;fies	
  factors	
                    change	
  may	
  affect	
  
                      affec;ng	
  paierns	
  of	
  migra;on	
  
sensing	
  data	
                                                       bird	
  migra;on	
  
                      	
  

                                                                                                                        38	
  
Public	
  Par?cipa?on	
  in	
  Scien?fic	
  Research	
  Conference:	
  4-­‐5	
  August	
  2012	
  in	
  Portland,	
  
Oregon	
  USA	
  prior	
  to	
  Ecological	
  Society	
  of	
  America	
  mee;ng	
  (6-­‐10	
  Aug.):	
  
hip://www.birds.cornell.edu/citscitoolkit/conference/2012	
  




                                                                                                                  39	
  
User	
  Assessments	
  


	
  	
  Scien;sts:	
  BL	
                                  Scien;sts:	
  FU	
  


                           Library	
  Policies:	
  BL	
                                 Library	
  Policies:	
  FU	
  


                           Librarians:	
  BL	
                                          Librarians:	
  FU	
  


                                            Policy	
  Makers:	
  BL	
                                        Policy	
  Makers:	
  FU	
  


                                                                            Educators:	
  BL	
                                             Educators:	
  FU	
  




            Year	
  1	
                        Year	
  2	
                    Year	
  3	
                   Year	
  4	
                    Year	
  5	
  


                                                                                                                                                                  40	
  
What	
  standard	
  do	
  you	
  currently	
  use?	
  

                                                              676




                                                      266


                    95        95        96     97
  12    21    26

  DIF   DwC   DC    EML    FGDC       Open     ISO   My Lab   none
                                      GIS
                    Metadata	
  language	
  

                                                                     41	
  
Many	
  are	
  interested	
  in	
  sharing	
  data	
  

Willing	
  to	
  share	
  data	
  across	
  a	
  broad	
                                                     81%	
  
                     group	
  of	
  researchers	
  

Willing	
  to	
  place	
  at	
  least	
  some	
  of	
  my	
  
                                                                                                           78%	
  
 data	
  into	
  a	
  central	
  data	
  repository	
  
                          with	
  no	
  restric;ons	
  

Appropriate	
  to	
  create	
  new	
  datasets	
                                                          76%	
  
                        from	
  shared	
  data	
  

Willing	
  to	
  place	
  all	
  of	
  my	
  data	
  into	
  a	
  
                                                                                           41%	
  
     central	
  data	
  repository	
  with	
  no	
  
                                         restric;ons	
  

                                                                     0%	
     20%	
     40%	
   60%	
   80%	
   100%	
  
                                                                                        Percent	
  agree	
  
                                                                                                                           42	
  
Modeler
                                                   Scientist




         Manager
         Resource
                    Ecological
                                 Data Librarians
                                                               Data
                                                               Service
                                                                              User	
  Matrix	
  



                                                               Investigator
                                                               ToolKit

                                                               Data
                                                               Management
                                                               Planning

                                                               Best
                                                               Practices


                                                               Tools
                                                                Database


                                                               Training



                                                               Curricula
43	
  
Community	
  Engagement	
  




                              44	
  
Best	
  Prac;ces	
  and	
  Solware	
  Tools	
  




                                                  45	
  
46	
  
DataONE:	
  Next	
  steps	
  
•  Member	
  node	
  growth	
  
     •         Number	
  of	
  member	
  nodes	
  
     •         Increase	
  the	
  number	
  and	
  size	
  of	
  data	
  sets	
  
     •         Sustainably	
  
          •      In	
  terms	
  of	
  resource	
  needs	
  form	
  MN’s	
  
          •      In	
  terms	
  of	
  resource	
  demands	
  on	
  DataONE	
  
•  New	
  Inves;gator	
  toolkit	
  tools	
  (strategically)	
  
•  An	
  increasing	
  number	
  of	
  science	
  use	
  cases	
  with	
  
   more	
  breakthrough	
  science	
  
•  Also,	
  re-­‐purposing	
  DataONE	
  CI	
  outside	
  of	
  Bio/Eco/
   Env	
  areas	
  in	
  strategic	
  collabora;ve	
  partnerships	
  

                                                                                    47	
  
Ack:	
  DataONE	
  Team	
  and	
  Sponsors	
  
             • Amber	
  Budden,	
  Roger	
  Dahl,	
  Rebecca	
  Koskela,	
  	
  Bill	
     • Ewa	
  Deelman	
  
               Michener,	
  Robert	
  Nahf,	
  Skye	
  Roseboom,	
  Mark	
  
               Servilla	
  
                                                                                           • Deborah	
  McGuinness	
  
             • 	
  Dave	
  Vieglais	
  	
  
             • Suzie	
  Allard,	
  Nick	
  Dexter,	
  Kimberly	
  Douglass,	
              • Jeff	
  Horsburgh	
  
               Carol	
  Tenopir,	
  Robert	
  Waltz,	
  Bruce	
  Wilson	
  
             • John	
  Cobb,	
  Bob	
  Cook,	
  Ranjeet	
  Devarakonda,	
                  • Robert	
  Sandusky	
  
               Giri	
  Palanismy,	
  Line	
  Pouchard	
  	
  
             • Patricia	
  Cruse,	
  John	
  Kunze	
                                       • Bertram	
  Ludaescher	
  

             •  Sky	
  Bristol,	
  Mike	
  Frame,	
  Richard	
  Huffine,	
  Viv	
            • Peter	
  Honeyman	
  
                Hutchison,	
  Jeff	
  Moriseie,	
  Jake	
  Weltzin,	
  Lisa	
  Zolly	
  
             • Stephanie	
  Hampton,	
  Chris	
  Jones,	
  Mai	
                           • Cliff	
  Duke	
  
               Jones,	
  Ben	
  Leinfelder,	
  Andrew	
  Pippin	
  

             • Paul	
  Allen,	
  Rick	
  Bonney,	
  Steve	
  Kelling	
                     • Carole	
  Goble	
  

             • Ryan	
  Scherle,	
  Todd	
  Vision	
                                        • Donald	
  Hobern	
  

             • Randy	
  Butler	
                                                           • David	
  DeRoure	
  


                            LEON LEVY 
                            FOUNDATION
                                                                                  48	
  
Ques;ons?	
  	
  
                            Contact	
  Points	
  
                                      	
  
                       John	
  W.	
  Cobb,	
  Ph.D.	
  
                               Oak	
  Ridge	
  
                                      	
  
                       John	
  W.	
  Cobb,	
  Ph.D.	
  
                      Oak	
  Ridge	
  Na;onal	
  Lab	
  
                        cobbjw@ornl.gov	
  
                             865.576.5439	
  
                                      	
  
                     hip://www.dataone.org/	
  	
  
                     	
  hip://docs.dataone.org	
  	
  
                    	
  
                    	
  
                                                           49	
  

Weitere ähnliche Inhalte

Was ist angesagt?

Collaboration and Sharing
Collaboration and SharingCollaboration and Sharing
Collaboration and SharingJisc
 
Scientific data management (v2)
Scientific data management (v2)Scientific data management (v2)
Scientific data management (v2)Jian Qin
 
Tim Malthus_Towards standards for the exchange of field spectral datasets
Tim Malthus_Towards standards for the exchange of field spectral datasetsTim Malthus_Towards standards for the exchange of field spectral datasets
Tim Malthus_Towards standards for the exchange of field spectral datasetsTERN Australia
 
Dc sheridan dlf_2011_final
Dc sheridan dlf_2011_finalDc sheridan dlf_2011_final
Dc sheridan dlf_2011_finalSayeed Choudhury
 
Big Data and Advanced Data Intensive Computing
Big Data and Advanced Data Intensive ComputingBig Data and Advanced Data Intensive Computing
Big Data and Advanced Data Intensive ComputingJongwook Woo
 
Paper talk: Idcc 11
Paper talk: Idcc 11  Paper talk: Idcc 11
Paper talk: Idcc 11 Paolo Missier
 
Introduction to Research Data Management for postgraduate students
Introduction to Research Data Management for postgraduate studentsIntroduction to Research Data Management for postgraduate students
Introduction to Research Data Management for postgraduate studentsMarieke Guy
 
Cloud Economics in Training and Simulation
Cloud Economics in Training and SimulationCloud Economics in Training and Simulation
Cloud Economics in Training and SimulationNane Kratzke
 
White Paper: Hadoop in Life Sciences — An Introduction
White Paper: Hadoop in Life Sciences — An Introduction   White Paper: Hadoop in Life Sciences — An Introduction
White Paper: Hadoop in Life Sciences — An Introduction EMC
 

Was ist angesagt? (12)

Collaboration and Sharing
Collaboration and SharingCollaboration and Sharing
Collaboration and Sharing
 
Scientific data management (v2)
Scientific data management (v2)Scientific data management (v2)
Scientific data management (v2)
 
Tim Malthus_Towards standards for the exchange of field spectral datasets
Tim Malthus_Towards standards for the exchange of field spectral datasetsTim Malthus_Towards standards for the exchange of field spectral datasets
Tim Malthus_Towards standards for the exchange of field spectral datasets
 
NISO Webinar: Part 2: Managing Data for Scholarly Communications
NISO Webinar:  Part 2: Managing Data for Scholarly CommunicationsNISO Webinar:  Part 2: Managing Data for Scholarly Communications
NISO Webinar: Part 2: Managing Data for Scholarly Communications
 
Dc sheridan dlf_2011_final
Dc sheridan dlf_2011_finalDc sheridan dlf_2011_final
Dc sheridan dlf_2011_final
 
Andreas Hense: Climate data for our future – acquired, analysed, archived
Andreas Hense: Climate data for our future – acquired, analysed, archivedAndreas Hense: Climate data for our future – acquired, analysed, archived
Andreas Hense: Climate data for our future – acquired, analysed, archived
 
Big Data and Advanced Data Intensive Computing
Big Data and Advanced Data Intensive ComputingBig Data and Advanced Data Intensive Computing
Big Data and Advanced Data Intensive Computing
 
Paper talk: Idcc 11
Paper talk: Idcc 11  Paper talk: Idcc 11
Paper talk: Idcc 11
 
Introduction to Research Data Management for postgraduate students
Introduction to Research Data Management for postgraduate studentsIntroduction to Research Data Management for postgraduate students
Introduction to Research Data Management for postgraduate students
 
Cloud Economics in Training and Simulation
Cloud Economics in Training and SimulationCloud Economics in Training and Simulation
Cloud Economics in Training and Simulation
 
Deroure Repo3
Deroure Repo3Deroure Repo3
Deroure Repo3
 
White Paper: Hadoop in Life Sciences — An Introduction
White Paper: Hadoop in Life Sciences — An Introduction   White Paper: Hadoop in Life Sciences — An Introduction
White Paper: Hadoop in Life Sciences — An Introduction
 

Andere mochten auch

100 questions about finance
100 questions about finance100 questions about finance
100 questions about financeNYVegan
 
Trujillo N1
Trujillo N1Trujillo N1
Trujillo N1Luz Zas
 
Whittles Presentation - The Complete Managers Toolkit - 25 March 2011
Whittles Presentation - The Complete Managers Toolkit - 25 March 2011Whittles Presentation - The Complete Managers Toolkit - 25 March 2011
Whittles Presentation - The Complete Managers Toolkit - 25 March 2011francescoandreone
 
Administracion
AdministracionAdministracion
Administracionpantro756
 

Andere mochten auch (6)

Karm nu-vignan
Karm nu-vignanKarm nu-vignan
Karm nu-vignan
 
100 questions about finance
100 questions about finance100 questions about finance
100 questions about finance
 
Trujillo N1
Trujillo N1Trujillo N1
Trujillo N1
 
Whittles Presentation - The Complete Managers Toolkit - 25 March 2011
Whittles Presentation - The Complete Managers Toolkit - 25 March 2011Whittles Presentation - The Complete Managers Toolkit - 25 March 2011
Whittles Presentation - The Complete Managers Toolkit - 25 March 2011
 
Doing Business Civ
Doing Business CivDoing Business Civ
Doing Business Civ
 
Administracion
AdministracionAdministracion
Administracion
 

Ähnlich wie DataONE_cobb_hubbub2012_20120924_v05

Metadata in general and Dublin Core in specific; some experiences
Metadata in general and Dublin Core in specific; some experiencesMetadata in general and Dublin Core in specific; some experiences
Metadata in general and Dublin Core in specific; some experiencesKerstin Forsberg
 
ESI Supplemental Webinar 2 - DataONE presentation slides
ESI Supplemental Webinar 2 - DataONE presentation slides ESI Supplemental Webinar 2 - DataONE presentation slides
ESI Supplemental Webinar 2 - DataONE presentation slides DuraSpace
 
Presentation to EASE, Tallinn, June 2012
Presentation to EASE, Tallinn, June 2012Presentation to EASE, Tallinn, June 2012
Presentation to EASE, Tallinn, June 2012Sarah Callaghan
 
OAI7 Research Objects
OAI7 Research ObjectsOAI7 Research Objects
OAI7 Research Objectsseanb
 
e-Science, Research Data and Libaries
e-Science, Research Data and Libariese-Science, Research Data and Libaries
e-Science, Research Data and LibariesRob Grim
 
Wf4Ever: Work!ows for Methodology and Science Preservation
Wf4Ever: Work!ows for Methodology and Science PreservationWf4Ever: Work!ows for Methodology and Science Preservation
Wf4Ever: Work!ows for Methodology and Science PreservationJoint ALMA Observatory
 
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...TERN Australia
 
Libby Bishop, Ethics Of Data Sharing Ncess Jun 09 Final
Libby Bishop, Ethics Of Data Sharing Ncess Jun 09 FinalLibby Bishop, Ethics Of Data Sharing Ncess Jun 09 Final
Libby Bishop, Ethics Of Data Sharing Ncess Jun 09 Finala.carusi
 
Why manage research data?
Why manage research data?Why manage research data?
Why manage research data?Graham Pryor
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Anita de Waard
 
DataOne - Suzie Allard - RDAP12
DataOne - Suzie Allard - RDAP12DataOne - Suzie Allard - RDAP12
DataOne - Suzie Allard - RDAP12ASIS&T
 
On demand access to Big Data through Semantic Technologies
 On demand access to Big Data through Semantic Technologies On demand access to Big Data through Semantic Technologies
On demand access to Big Data through Semantic TechnologiesPeter Haase
 
Ml pluss ejan2013
Ml pluss ejan2013Ml pluss ejan2013
Ml pluss ejan2013CS, NcState
 
The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...Neuroscience Information Framework
 
Supporting Libraries in Leading the Way in Research Data Management
Supporting Libraries in Leading the Way in Research Data ManagementSupporting Libraries in Leading the Way in Research Data Management
Supporting Libraries in Leading the Way in Research Data ManagementMarieke Guy
 
Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...
Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...
Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...TERN Australia
 
Graham Pryor
Graham PryorGraham Pryor
Graham PryorEduserv
 
Curation of scientifica data: Challenges for repositories
Curation of scientifica data: Challenges for repositoriesCuration of scientifica data: Challenges for repositories
Curation of scientifica data: Challenges for repositoriesChris Rusbridge
 

Ähnlich wie DataONE_cobb_hubbub2012_20120924_v05 (20)

Metadata in general and Dublin Core in specific; some experiences
Metadata in general and Dublin Core in specific; some experiencesMetadata in general and Dublin Core in specific; some experiences
Metadata in general and Dublin Core in specific; some experiences
 
ESI Supplemental Webinar 2 - DataONE presentation slides
ESI Supplemental Webinar 2 - DataONE presentation slides ESI Supplemental Webinar 2 - DataONE presentation slides
ESI Supplemental Webinar 2 - DataONE presentation slides
 
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an...
 
Presentation to EASE, Tallinn, June 2012
Presentation to EASE, Tallinn, June 2012Presentation to EASE, Tallinn, June 2012
Presentation to EASE, Tallinn, June 2012
 
OAI7 Research Objects
OAI7 Research ObjectsOAI7 Research Objects
OAI7 Research Objects
 
e-Science, Research Data and Libaries
e-Science, Research Data and Libariese-Science, Research Data and Libaries
e-Science, Research Data and Libaries
 
Wf4Ever: Work!ows for Methodology and Science Preservation
Wf4Ever: Work!ows for Methodology and Science PreservationWf4Ever: Work!ows for Methodology and Science Preservation
Wf4Ever: Work!ows for Methodology and Science Preservation
 
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...
 
Libby Bishop, Ethics Of Data Sharing Ncess Jun 09 Final
Libby Bishop, Ethics Of Data Sharing Ncess Jun 09 FinalLibby Bishop, Ethics Of Data Sharing Ncess Jun 09 Final
Libby Bishop, Ethics Of Data Sharing Ncess Jun 09 Final
 
Why manage research data?
Why manage research data?Why manage research data?
Why manage research data?
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013
 
DataOne - Suzie Allard - RDAP12
DataOne - Suzie Allard - RDAP12DataOne - Suzie Allard - RDAP12
DataOne - Suzie Allard - RDAP12
 
On demand access to Big Data through Semantic Technologies
 On demand access to Big Data through Semantic Technologies On demand access to Big Data through Semantic Technologies
On demand access to Big Data through Semantic Technologies
 
Ml pluss ejan2013
Ml pluss ejan2013Ml pluss ejan2013
Ml pluss ejan2013
 
The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...
 
Supporting Libraries in Leading the Way in Research Data Management
Supporting Libraries in Leading the Way in Research Data ManagementSupporting Libraries in Leading the Way in Research Data Management
Supporting Libraries in Leading the Way in Research Data Management
 
Beyond the PDF 2, 2013
Beyond the PDF 2, 2013Beyond the PDF 2, 2013
Beyond the PDF 2, 2013
 
Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...
Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...
Stuart Phinn and Andy Lowe_TERN's national ecosystem data infrastructure is d...
 
Graham Pryor
Graham PryorGraham Pryor
Graham Pryor
 
Curation of scientifica data: Challenges for repositories
Curation of scientifica data: Challenges for repositoriesCuration of scientifica data: Challenges for repositories
Curation of scientifica data: Challenges for repositories
 

Kürzlich hochgeladen

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

DataONE_cobb_hubbub2012_20120924_v05

  • 1. DataONE:  An  interoperable  data   repositories  case  study     John  W.  Cobb     R&D  Staff  and  DataONE  Leadership  Team  Member   Oak  Ridge  Na;onal  Laboratory     HUBbub  2012  ,  the  HUBzero  conference     Indianapolis,  IN   24  September  2012    
  • 2. Acknowledgment:   •  Authorship:  This  talk   represents  work  of  the   en;re  DataONE   extended  team.   •  It  especially  draws  upon   slide  material  from   •  Bill  Michener,  UNM   (esp.  recent  DataONE   AHM  Sept.  18,  2012)   •  Amber  Budden  –   DataONE  Ass.  Dir.  For   CE   •  DataONE  is  an  NSF   supported  project   (OCI-­‐0830944)   2  
  • 3. Hubs  and  data  repositories   •  A  personal  view     (apologies  for  a  possibly  mis-­‐informed  speaker)   •  HUB-­‐roots  (history  and  pre-­‐history)   •  PUNCH:    web  portal  for  running  tools     (DOI:  10.1109/40.846308)   •  -­‐>  NanoHUB:  Applica;on  orchestra;on  environment   •  +  RAPPTURE:  Rapid  Applica;on  por;ng  and  development   •  +  Framelesss  VNC  windows  –     seamless  hosted  environment  on  clients!   •  +  Rich  collabora;ve  environment  and  rich  user  experience  !!  (“wishlist”)   •  Repurpose:  Hubzero  -­‐>  hubs  explode     (ex.  NEESHub  a  cri;cal  advantage  for  largest  research  award  in  Purdue  history)   •  Now  (and  recent  past)  turn  to  Hub+Data  Integra;on.  Some  successes   already   •  Opportunity:  Richer  interac;ons  between  HUB’s  and  mul;ple  data   repositories   •  Perhaps  for  example:  Enable  mul;-­‐project  collabora;on  within  PURR?   •  Or:  Integrate  NEES  DB’s  with  SCEC  simula;ons  and  IRIS  waveforms?   3  
  • 4. Mul;ple  data  repository  access?   •  HUB  +  Database  exists   •  HUB  +  external  data  repository  access  use  case.   •  But  …..  What  if?       •  Access  mul;ple  (possibly  external)  repositories  from  within  a  HUB   environment?   •  Access  mul;ple  external  repositories  with  similar  data?  Say  aggregate  all  data   from  state  hydrologists?  C.f.  driNET  hip://drinet.hubzero.org   •  Integrate  disparate  data  sets  for  new  and  novel  analysis.   Recall    Noshir  Contractor’s  comments  this  morning:  teaming  and     interdisciplinary  work  has  increased  impact  (Wuchty,  Jones,  Uzzi)   •  Enable  reproducible  analysis  and  synthesis  via  a  automated  workflow  to  create   synthe;c  data  products   •  Programma;c  access   •  More  integra;on  (more  than  just  raw  search  terms  a  la  Google)   •  …   •  What  do  you  want  to  discover  today?  (to  paraphrase  Microsol)   4  
  • 5. DataONE  mo;va;on   •  DataONE  is  a  project  to  address   Plan   these  issues   •  Build  (assemble/aggregate)     Analyze   Collect   data  repository  interoperability   •  Advance  state  of  the  prac;ce   data  lifecycle  management   •  Planning   Integrate   Assure   •  Deposi;on   •  Metadata  genera;on   •  Seman;c  integra;on   Discover   Describe   •  Workflow  and  provenance   •  Analysis   Preserve   •  Synthesis   •  Focus  on  a  broad  science  area   •  Deploy  a  working  CI  and  grow  it   •  DataONE  –  Data  Observa;on   Network  Earth   5  
  • 7. Pressing  issues  for  the  digital  data  lifecycle   Plan   Analyze   Collect   Integrate   Assure   Discover   Describe   Preserve   7  
  • 8. Mul;ple  data  sources  –  mutually  reinforcing   Increasing  Process  Knowledge   Decreasing  Spa;al  Coverage   Intensive  science  sites     and  experiments   Extensive  science  sites   Volunteer  &     educa;on  networks   Remote   sensing   Adapted  from  CENR-­‐OSTP   8  
  • 9. Scaiered  data  sources   “finding  the  needle  in  the  haystack”   Data  are  massively  dispersed   •  Ecological  field  sta;ons  and  research  centers  (100s)   •  Natural  history  museums  and  biocollec;on  facili;es  (100s)   •  Agency  data  collec;ons  (100s  to  1000s)   •  Individual  scien;sts  (1000s  to  10,000s  to  100,000s)   9  
  • 10. Data  Preserva;on  and  Planning   ✔ ?   10  
  • 11. Preserva;on:  Poor  data  prac;ce   “data  entropy”   Time  of  publica?on   Specific  details   General  details   Re?rement  or     Informa?on  Content   career  change   Accident   Death   Time   (Michener  et  al.  1997)   11  
  • 12.  Preserva;on:  Data  longevity   Resource Study Resource Type Half-life Rumsey (2002) Legal Citations 1.4 years Harter and Kim (1996) Scholarly Article Citations 1.5 years Koehler (1999 and 2002) Random Web Pages 2.0 years Spinellis (2003) Computer Science 4.0 years Citations Markwell and Brooks Biological Science 4.6 years (2002) Education Resources Nelson and Allen (2002) Digital Library Object 24.5 years Koehler,  W.  (2004)  Informa(on  Research  9(2):  174.   12  
  • 13. The Long Tail of Orphan Data The  Ultra-­‐violet   Most of the bytes divergence   are at the high end, Specialized repositories but most of the (e.g. GenBank, PDB) datasets are at the Volume low end – Jim Gray The  Infrared   Orphan data Catastrophe   (B. Heidorn) Rank frequency of datatype 13  
  • 14. Data  deluge  and  interoperability   “the  flood  of  increasingly  heterogeneous  data”   Data  are  heterogeneous   •  Syntax   •  (format)   •  Schema   •  (model)   •  Seman;cs   •  (meaning)   Jones  et  al.  2007   14  
  • 15. Metadata  universe  (mul;-­‐verse)   •  There  are  a  mul;tude  of  metadata  standards   •  Discipline  and  sub-­‐discipline  specific   •  Each  with  different  terms  and  context   Source:  Jenn  Riley,  Indiana  U.  Digital  Librarian   hip://www.dlib.indiana.edu/~jenlrile/metadatamap/    Via  John  Kunze,  Cal.  Dig.  Lib     15  
  • 16. Each  dot  is  its  own  standard  !   “…billions  and  billions  of  worlds  …”  –  Carl  Sagan   16  
  • 17. DataONE    CI  architectural  Elements   •  Hard-­‐core  cyberinfrastructure  (CI)   •  CI  Member  Node  (MN)  data   repositories   •  Coordina;ng  Node  (CN)  global   metadata  repo’s   Investigator Toolkit Simple,  but  powerful  REST  API/SPI  for   Web Interface Analysis, Visualization Data Management •  universal  access   Client Libraries •  Inves;gator  toolkit  (ITK)  solware  tools   Java Python Command Line to  allow  access  to  the  data  repository   collec;ve  via  familiar  access  idioms   Member Nodes Coordinating Nodes •  Cultural  and  wetware  issues   Service Interfaces Resolution Discovery Educa;onal  Materials   Service Interfaces •  Replication Registration •  Best  prac;ces   Bridge to non-DataONE Coordination Layer •  Workshops  and  tutorials   Member Node services Identifiers Catalog •  Surveys  and  assessments   Preservation Monitor Data Repository •  Scien;st,  policymaker,  ci;zen   Object Store Index engagement   •  Collabora;on,  governance,  and   sustainability   hip://mule1.dataone.org/ArchitectureDocs-­‐current/       17  
  • 19. Key  Cyberinfrastructure  Elements   •  Unique  iden;fiers   •  Search  and  deliver   •  Replica;on   •  Federated  iden;ty   Usable  by   People  and  their  Agents   19  
  • 20. Suppor;ng  the  data  lifecycle   ORC   Node   UCSB   Node   UNM   Node   1.  Deposi;on/acquisi;on/ingest   } 2.  Cura;on  and  metadata  management   The  data   3.  Protec;on,  including  privacy   lifecycle   4.  Discovery,  access,  use,  and  dissemina;on     5.  Interoperability,  standards,  and  integra;on   6.  Evalua;on,  analysis,  and  visualiza;on" 20  
  • 21. DataONE  Supports  Data  Preserva;on   Three  major  components  for  a   Member  Nodes   flexible,  scalable,  sustainable   •  diverse  ins;tu;ons   Coordina?ng  Nodes   network   •  serve  local  community   •  retain  complete  metadata   Inves?gator  Toolkit   •  provide  resources  for   catalog     managing  their  data   •  indexing  for  search   •  retain  copies  of  data   •  network-­‐wide  services   •  ensure  content   availability  (preserva;on)       •  replica;on  services   21  
  • 22. DataONE  sa;sfies  arch  requirements   •  Enables  integra;on  of  mul;ple  geographically   diverse  and  metadata  diverse  repositories   •  Presents  collec;ve  search  results  across  mul;ple   repository   •  Provides  a  unified  API/SPI  for  search  and   programma;c  interface   hip://mule1.dataone.org/ArchitectureDocs-­‐current/       •  DataONE  content  has  unique  iden;fiers  (DOI’s)  for   referencable/citable  data  objects   •  Supports  both  large  datasets  and  the  long-­‐tails   22  
  • 23. DataONE  spurs  innova;on   •  Enables  new  analysis  and  synthesis  efforts  by   integra;ng  tasks  across  repositories     •  Provides  means  for  data  replica;on  and  basis  for   repositories  to  build  “data  wills”  or  “data  trust”   plans     •  Provides  a  plavorm  to  develop  advanced   interoperable  workflow  tools  and  seman;c   integra;on  tools     23  
  • 24. DataONE:  current  state/recent  progress   24  
  • 25. DataONE:  Suppor;ng  Scien;fic  Data   Preserva;on,  Discovery,  and  Innova;on     Current  Member  Nodes:                           Coming  Soon:       Current  Tools:             Tools  Coming  Soon:     Queensland  University  of  Technology   25  
  • 26. Data  Management  Planning  Tool   hips://dmp.cdlib.org/     26  
  • 27. 27  
  • 28. 28  
  • 29. Plans  per  template  (as  of  June  2012)   400   Approximate  number  of  plans  per  template   350   339   300   287   Templates  of  greatest  interest  to   the  DataONE  community  in  red;   250        2,302  unique  users  to  date   197   200   159   150   133   133   124   101   100   71   65   60   46   37   36   50   34   17   15   6   0   hip://dmptool.org                     29  
  • 30. ✔ Check  for  best  prac;ces   ✔ Create  metadata   ✔ Connect  to  ONEShare   Data  &     Metadata  (EML)   30  
  • 33. ORNL  DAAC       as  a  DataONE   Member  Node     NASA  collectors   DAAC  Users      (UWG)   Inves?gator  Toolkit   DataONE  Users   33  
  • 35. 35  
  • 36. 36  
  • 37. Inves;gator  Toolkit  Support     Plan   DMP-Tool Analyze   Collect   Kepler Integrate   Assure   Discover   Describe   Preserve   37  
  • 38. Explora;on,  Visualiza;on,  and  Analysis   Diverse  bird  observa;ons  and   Model  results   environmental  data  from   300,00  loca;ons  in  the  US   Occurrence  of  Indigo  Bun?ng  (2008)   integrated  and  analyzed  using   High  Performance  Compu;ng   Resources   Land  Cover   Jan   Apr   Jun   Sep   Dec   Meteorology   •  Examine  paierns  of   migra;on     MODIS  –   Spa;o-­‐Temporal  Exploratory   •  Infer  how  climate   Remote   Model  iden;fies  factors   change  may  affect   affec;ng  paierns  of  migra;on   sensing  data   bird  migra;on     38  
  • 39. Public  Par?cipa?on  in  Scien?fic  Research  Conference:  4-­‐5  August  2012  in  Portland,   Oregon  USA  prior  to  Ecological  Society  of  America  mee;ng  (6-­‐10  Aug.):   hip://www.birds.cornell.edu/citscitoolkit/conference/2012   39  
  • 40. User  Assessments      Scien;sts:  BL   Scien;sts:  FU   Library  Policies:  BL   Library  Policies:  FU   Librarians:  BL   Librarians:  FU   Policy  Makers:  BL   Policy  Makers:  FU   Educators:  BL   Educators:  FU   Year  1   Year  2   Year  3   Year  4   Year  5   40  
  • 41. What  standard  do  you  currently  use?   676 266 95 95 96 97 12 21 26 DIF DwC DC EML FGDC Open ISO My Lab none GIS Metadata  language   41  
  • 42. Many  are  interested  in  sharing  data   Willing  to  share  data  across  a  broad   81%   group  of  researchers   Willing  to  place  at  least  some  of  my   78%   data  into  a  central  data  repository   with  no  restric;ons   Appropriate  to  create  new  datasets   76%   from  shared  data   Willing  to  place  all  of  my  data  into  a   41%   central  data  repository  with  no   restric;ons   0%   20%   40%   60%   80%   100%   Percent  agree   42  
  • 43. Modeler Scientist Manager Resource Ecological Data Librarians Data Service User  Matrix   Investigator ToolKit Data Management Planning Best Practices Tools Database Training Curricula 43  
  • 45. Best  Prac;ces  and  Solware  Tools   45  
  • 46. 46  
  • 47. DataONE:  Next  steps   •  Member  node  growth   •  Number  of  member  nodes   •  Increase  the  number  and  size  of  data  sets   •  Sustainably   •  In  terms  of  resource  needs  form  MN’s   •  In  terms  of  resource  demands  on  DataONE   •  New  Inves;gator  toolkit  tools  (strategically)   •  An  increasing  number  of  science  use  cases  with   more  breakthrough  science   •  Also,  re-­‐purposing  DataONE  CI  outside  of  Bio/Eco/ Env  areas  in  strategic  collabora;ve  partnerships   47  
  • 48. Ack:  DataONE  Team  and  Sponsors   • Amber  Budden,  Roger  Dahl,  Rebecca  Koskela,    Bill   • Ewa  Deelman   Michener,  Robert  Nahf,  Skye  Roseboom,  Mark   Servilla   • Deborah  McGuinness   •   Dave  Vieglais     • Suzie  Allard,  Nick  Dexter,  Kimberly  Douglass,   • Jeff  Horsburgh   Carol  Tenopir,  Robert  Waltz,  Bruce  Wilson   • John  Cobb,  Bob  Cook,  Ranjeet  Devarakonda,   • Robert  Sandusky   Giri  Palanismy,  Line  Pouchard     • Patricia  Cruse,  John  Kunze   • Bertram  Ludaescher   •  Sky  Bristol,  Mike  Frame,  Richard  Huffine,  Viv   • Peter  Honeyman   Hutchison,  Jeff  Moriseie,  Jake  Weltzin,  Lisa  Zolly   • Stephanie  Hampton,  Chris  Jones,  Mai   • Cliff  Duke   Jones,  Ben  Leinfelder,  Andrew  Pippin   • Paul  Allen,  Rick  Bonney,  Steve  Kelling   • Carole  Goble   • Ryan  Scherle,  Todd  Vision   • Donald  Hobern   • Randy  Butler   • David  DeRoure   LEON LEVY FOUNDATION 48  
  • 49. Ques;ons?     Contact  Points     John  W.  Cobb,  Ph.D.   Oak  Ridge     John  W.  Cobb,  Ph.D.   Oak  Ridge  Na;onal  Lab   cobbjw@ornl.gov   865.576.5439     hip://www.dataone.org/      hip://docs.dataone.org         49