Small Data, or: Bridging the Gap Between Specific and Generic Research Repositories
April 11, 2013
Anita de Waard
VP Research Data Collaborations
a.dewaard@elsevier.com
http://researchdata.elsevier.com/

There are many efforts to enhance data storing and sharing...
•  Many different research databases, both generic (Dryad, Dataverse, …) and specific (NIF, IEDA, PDB, …)
•  Many systems for creating/sharing workflows (Taverna, MyExperiment, Vistrails, Workflow4Ever etc)
•  Many e-lab notebooks (LabGuru, LabArchives, LaBlog, etc)
•  Scores of projects, committees, standards, bodies, grants, initiatives, conferences for discussing and connecting all of this (KEfED, Pegasus, PROV, RDA, Science Gateways, Codata, BRDI, Earthcube, etc. etc)
•  You can make a living out of this ;-)! (and many of us do…)

…but this is what scientists do:
Using antibodies and squishy bits
Grad Students experiment and enter details into their lab notebook.
The PI then tries to make sense of this, and writes a paper.
End of story.

Why save research data?
A.  Data Preservation:
       –  Preserve record of scientific process, provenance
       –  Enable reproducible research
B.  Data Use:
       –  Use results obtained by others
       –  Do better science!
       –  Improve interdisciplinary work

Where the data goes now:
> 50 My Papers; 2 M scientists; 2 M papers/year
•  Majority of data (90%?) is stored on local hard drives
•  Some data (8%?) stored in large, generic data repositories
       –  Dryad: 7,631 files
       –  Dataverse: 0.6 M
•  A small portion of data (1-2%?) stored in small, topic-focused data repositories
       –  PDB: 88.3 k
       –  PetDB: 1.5 k
       –  SedDB: 0.6 k
       –  MiRB: 25 k
       –  TAIR: 72.1 k
•  Datacite: 1.5 M

So this needs to happen:
INCREASE DATA PRESERVATION
(The diagram from “Where the data goes now” is repeated on this slide.)

Data Preservation Issues:
Objection: “Our lab notebooks are all on paper: it’s how we do things”
Response: Graft tools closely on scientists’ daily practice
Example: create tailored metadata collection tools on mini-tablets in labs to replace paper notebooks

Data Preservation Issues:
Objection: “I need to see a direct benefit of any effort I put in.”
Response: Create tools to allow better insight into one’s own and others’ results.
Example: ‘PI-Dashboard’: allow immediate access/analysis of shared data: new science!

Data Use Issues:
Objection: “I don’t really trust anyone else’s data, and don’t think they’ll trust mine”
Response: Create social networking context; allow data owner to provide granular access control.
Example:
•  In Urban Lab app, data stored by researcher name.
•  PI decides who gets to see which data
•  Match up with NIF and Eagle-I ontologies on back end so export of (part of) data is possible at any time.
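
The access rule in the example above can be sketched in a few lines. This is only an illustration, not the Urban Lab app itself: the class names, method names and user names are hypothetical, and letting researchers always see their own data is an added assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset stored under the name of the researcher who produced it."""
    name: str
    researcher: str
    # People the PI has explicitly allowed to view this dataset.
    viewers: set[str] = field(default_factory=set)


class LabDataStore:
    """Hypothetical per-lab store in which only the PI manages access."""

    def __init__(self, pi: str) -> None:
        self.pi = pi
        self._datasets: dict[str, Dataset] = {}

    def add(self, name: str, researcher: str) -> Dataset:
        dataset = Dataset(name=name, researcher=researcher)
        self._datasets[name] = dataset
        return dataset

    def grant(self, acting_user: str, dataset_name: str, viewer: str) -> None:
        # The PI decides who gets to see which data.
        if acting_user != self.pi:
            raise PermissionError("only the PI may grant access")
        self._datasets[dataset_name].viewers.add(viewer)

    def can_view(self, user: str, dataset_name: str) -> bool:
        dataset = self._datasets[dataset_name]
        # Assumption: the PI and the producing researcher always have access.
        return user in {self.pi, dataset.researcher} | dataset.viewers

    def visible_to(self, user: str) -> list[str]:
        return sorted(n for n in self._datasets if self.can_view(user, n))
```

Access is granted per dataset rather than per lab, which is one way to read "granular": a collaborator can be shown a single experiment without seeing the rest of the notebook.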
  	
  

Data Use Issues:
•  Objection: “I am afraid other people might scoop my discoveries”
•  Response: Reward system needs to move from direct competition to a ‘shared mission’ approach (cf. Mars)
•  Example: Data Rescue Challenge in the geosciences: collect and reward stories/practices of data preservation, enable cross-disciplinary access and use of all data.
The 2013 International Data Rescue Award in the Geosciences
Organised by IEDA and Elsevier Research Data Services
http://researchdata.elsevier.com/datachallenge

Data Preservation and Annotation:
Fine, I’ll do it, but where the hell do I put it?
•  Domain of study: Domain-Specific Data Repository
•  Collaborators: Local Data Repository
•  Funding Agency: Generic Data Repository
•  University: Institutional Data Repository
AND THEY ALL WANT DIFFERENT METADATA!!!!

Comparing Repository Types:
Repository                        Advantages                                      Disadvantages
Local data repository             Easy! No one steals your data.                  No one sees it. Not compliant with requirements
Institutional Repository          Not very difficult. Administrators are happy.   Data can’t easily be reused. Credit?
Generic data repository           Not very hard to do. Have complied!             Data can’t be easily reused. Credit…
Domain-specific data repository   Data can be reused. Credit!                     Lot of work, for curators
Margin labels on the slide: Effort, Reuse, Credit, Compliance; Habit, Ease, Privacy, Control; MORE ANNOTATION

Conclusions for data annotation:
“Instead of building newer and larger weapons of mass destruction, I think mankind should try to get more use out of the ones we have”
Deep Thoughts by Jack Handey
•  Let’s use the data standards we already have, and agree on using the same ones
•  Work with existing data repositories in a field to come to a lowest common denominator of metadata
•  Tailor the systems to be optimally easy to use for scientists in terms of metadata: add as little as you have to, as few times as you can.
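
One way to read "lowest common denominator of metadata" is as the set of fields that every repository in a field asks for. A minimal sketch under that assumption follows; the repository names and field lists are made up for illustration and do not describe any real schema.

```python
def common_metadata_fields(schemas: dict[str, set[str]]) -> set[str]:
    """Return the fields that every repository schema asks for."""
    if not schemas:
        return set()
    return set.intersection(*schemas.values())


def extra_fields(schemas: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per repository, the fields beyond the shared core."""
    shared = common_metadata_fields(schemas)
    return {name: fields - shared for name, fields in schemas.items()}


# Hypothetical field lists; real repositories define their own schemas.
schemas = {
    "domain_specific": {"title", "creator", "date", "organism", "antibody"},
    "institutional": {"title", "creator", "date", "department"},
    "generic": {"title", "creator", "date", "license"},
}

shared = common_metadata_fields(schemas)  # entered once by the scientist
extras = extra_fields(schemas)            # asked for only where needed
```

The shared core is what a scientist would enter once; the per-repository extras are the only fields that need to be asked for again.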
  	
  
Summary:
•  Data Preservation:
     –  Tailor tools to fit scientists’ workflow: follow the experiment!
     –  We are creating repositories of shared experiments: Enable demonstrably better science!
•  Data Use:
     –  Allow owner full control over who sees which data; create social networking context
     –  Collectively pioneer long-term funding options; support/develop ‘shared mission’ funding challenges
•  How annotation can help reuse:
     –  Collaborate between (generic/specific, institutional, cross-national) data facilities to integrate repositories, enable cross-repository usage and reuse existing metadata.
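
Reusing existing metadata across repositories usually comes down to a field-name crosswalk. The sketch below assumes flat key/value records; the field names and the crosswalk itself are hypothetical and are not taken from any of the repositories named in this talk.

```python
def translate_record(
    record: dict[str, str], crosswalk: dict[str, str]
) -> tuple[dict[str, str], dict[str, str]]:
    """Rename fields via the crosswalk; return (translated, unmapped)."""
    translated: dict[str, str] = {}
    unmapped: dict[str, str] = {}
    for source_field, value in record.items():
        target_field = crosswalk.get(source_field)
        if target_field is None:
            # Keep fields without a mapping so nothing is silently lost.
            unmapped[source_field] = value
        else:
            translated[target_field] = value
    return translated, unmapped


# Hypothetical mapping from one repository's field names to another's.
crosswalk = {"creator": "author", "date": "collection_date"}

record = {"creator": "A. Researcher", "date": "2013-04-11", "antibody": "anti-X"}
translated, unmapped = translate_record(record, crosswalk)
```

Returning the unmapped fields separately makes it visible where the two repositories still need to agree on metadata.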
  
Questions?
Anita de Waard
VP Research Data Collaborations
a.dewaard@elsevier.com
http://researchdata.elsevier.com/

Elsevier Research Data Services Goals:
1.  Increase Data Preservation:
    Help increase the amount and quality of data preserved and shared
2.  Improve Data Use:
    Help increase the value and usability of the data shared by increasing annotation, normalization, and provenance, enabling enhanced interoperability
3.  Develop Sustainable Models:
    Help measure and deliver credit for shared data to the researchers, the institute, and the funding body, enabling more sustainable platforms.

Guiding Principles of RDS:
•  In principle, all open data stays open and URLs, front end etc. stay where they are (i.e. with repository)
•  Collaboration is tailored to data repositories’ unique needs/interests, ‘service-model’ type:
    –  Aspects where collaboration is needed are discussed
    –  A collaboration plan is drawn up using a Service-Level Agreement: agree on time, conditions, etc.
•  Transparent business model
•  Very small (2/3 people) department; immediate communication; instant deployment of ideas.

“But aren’t you guys in it for the money?”
•  Yes, we are, like most businesses…
•  Is your real question perhaps: ‘Does no one want to work with you anymore because of the Open Access debate?’
•  The OA debate focuses on three issues:
    –  IPR and Access issues. E.g. BY-NC-SA? Github? ..?
    –  Opaque business models. E.g. Gold Open Access? Shared funding model? Commercial analytics with shared royalties?
    –  Lack of perceived added value. We offer a service: only use it if it’s any good!
Data-intensive profile for the VAMDCData-intensive profile for the VAMDC
Data-intensive profile for the VAMDC
 
Contributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library DataContributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library Data
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"
 
Scalable Data Mining and Archiving in the Era of the Square Kilometre Array
Scalable Data Mining and Archiving in the Era of the Square Kilometre ArrayScalable Data Mining and Archiving in the Era of the Square Kilometre Array
Scalable Data Mining and Archiving in the Era of the Square Kilometre Array
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
Data Monetization
Data MonetizationData Monetization
Data Monetization
 
Cloud Computing y Big Data, próxima frontera de la innovación
Cloud Computing y Big Data, próxima frontera de la innovaciónCloud Computing y Big Data, próxima frontera de la innovación
Cloud Computing y Big Data, próxima frontera de la innovación
 
Status Quo and (current) Limitations of Library Linked Data
Status Quo and (current) Limitations of Library Linked DataStatus Quo and (current) Limitations of Library Linked Data
Status Quo and (current) Limitations of Library Linked Data
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
 
제1회 Korea Community Day 발표자료 Bigdata
제1회 Korea Community Day 발표자료 Bigdata 제1회 Korea Community Day 발표자료 Bigdata
제1회 Korea Community Day 발표자료 Bigdata
 
An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...
An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...
An End-to-End Campus-Scale High Performance Cyberinfrastructure for Data-Inte...
 
SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science
 
A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Int...
A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Int...A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Int...
A Campus-Scale High Performance Cyberinfrastructure is Required for Data-Int...
 
Clouds, Grids and Data
Clouds, Grids and DataClouds, Grids and Data
Clouds, Grids and Data
 
FDS Module I 20.1.2022.ppt
FDS Module I 20.1.2022.pptFDS Module I 20.1.2022.ppt
FDS Module I 20.1.2022.ppt
 
"Some Reflections on Data in the Public Sector" : Communia: The European Them...
"Some Reflections on Data in the Public Sector" : Communia: The European Them..."Some Reflections on Data in the Public Sector" : Communia: The European Them...
"Some Reflections on Data in the Public Sector" : Communia: The European Them...
 

Small Data: Bridging the Gap Between Generic and Specific Repositories

  • 1. Small Data, or: Bridging the Gap Between Specific and Generic Research Repositories
       April 11, 2013
       Anita de Waard, VP Research Data Collaborations
       a.dewaard@elsevier.com
       http://researchdata.elsevier.com/
  • 2. There are many efforts to enhance data storing and sharing...
       • Many different research databases, both generic (Dryad, Dataverse, …) and specific (NIF, IEDA, PDB, …)
       • Many systems for creating/sharing workflows (Taverna, MyExperiment, Vistrails, Workflow4Ever, etc.)
       • Many e-lab notebooks (LabGuru, LabArchives, LaBlog, etc.)
       • Scores of projects, committees, standards, bodies, grants, initiatives, conferences for discussing and connecting all of this (KEfED, Pegasus, PROV, RDA, Science Gateways, Codata, BRDI, Earthcube, etc.)
       • You can make a living out of this ;-)! (and many of us do…)
  • 3. …but this is what scientists do:
       Using antibodies and squishy bits
       Grad students experiment and enter details into their lab notebook.
       The PI then tries to make sense of this, and writes a paper.
       End of story.
  • 4. Why save research data?
       A. Data Preservation:
          – Preserve record of scientific process, provenance
          – Enable reproducible research
       B. Data Use:
          – Use results obtained by others
          – Do better science!
          – Improve interdisciplinary work
  • 5. Where the data goes now:
       > 50 My Papers; 2 M scientists; 2 M papers/year
       • A small portion of data (1-2%?) stored in small, topic-focused data repositories:
          – PDB: 88,3 k
          – PetDB: 1,5 k
          – SedDB: 0.6 k
          – MiRB: 25k
          – TAIR: 72,1 k
       • Some data (8%?) stored in large, generic data repositories:
          – Dryad: 7,631 files
          – Dataverse: 0.6 M
          – Datacite: 1.5 M
       • Majority of data (90%?) is stored on local hard drives
  • 6. So this needs to happen: INCREASE DATA PRESERVATION
       > 50 My Papers; 2 M scientists; 2 M papers/year
       • A small portion of data (1-2%?) stored in small, topic-focused data repositories:
          – PDB: 88,3 k
          – PetDB: 1,5 k
          – SedDB: 0.6 k
          – MiRB: 25k
          – TAIR: 72,1 k
       • Some data (8%?) stored in large, generic data repositories:
          – Dryad: 7,631 files
          – Dataverse: 0.6 M
          – Datacite: 1.5 M
       • Majority of data (90%?) is stored on local hard drives
  • 7. Data Preservation Issues:
       Objection: “Our lab notebooks are all on paper – it’s how we do things”
       Response: Graft tools closely on scientists’ daily practice
       Example: create tailored metadata collection tools on mini-tablets in labs to replace paper notebooks
  • 8. Data Preservation Issues:
       Objection: “I need to see a direct benefit of any effort I put in.”
       Response: Create tools to allow better insight into one’s own and others’ results.
       Example: ‘PI-Dashboard’: allow immediate access/analysis of shared data: new science!
  • 9. Data Use Issues:
       Objection: “I don’t really trust anyone else’s data – and don’t think they’ll trust mine”
       Response: Create social networking context; allow data owner to provide granular access control.
       Example:
       • In Urban Lab app, data stored by researcher name.
       • PI decides who gets to see which data
       • Match up with NIF and Eagle-I ontologies on back end so export of (part of) data is possible at any time.
  • 10. Data Use Issues:
       • Objection: “I am afraid other people might scoop my discoveries”
       • Response: Reward system needs to move from direct competition to a ‘shared mission’ approach (cf. Mars)
       • Example: Data Rescue Challenge in the geosciences: collect and reward stories/practices of data preservation, enable cross-disciplinary access and use of all data.
       The 2013 International Data Rescue Award in the Geosciences
       Organised by IEDA and Elsevier Research Data Services
       http://researchdata.elsevier.com/datachallenge
  • 11. Data Preservation and Annotation:
       Fine, I’ll do it – but where the hell do I put it?
       • Domain of study: Domain-Specific Data Repository
       • Collaborators: Local Data Repository
       • Funding Agency: Generic Data Repository
       • University: Institutional Data Repository
       AND THEY ALL WANT DIFFERENT METADATA!!!!
  • 12. Comparing Repository Types:
       • Local data repository
          – Advantages: Easy! No one steals your data.
          – Disadvantages: No one sees it. Not compliant with requirements.
       • Institutional repository
          – Advantages: Not very difficult. Administrators are happy.
          – Disadvantages: Data can’t easily be reused. Credit?
       • Generic data repository
          – Advantages: Not very hard to do. Have complied!
          – Disadvantages: Data can’t be easily reused. Credit…
       • Domain-specific data repository
          – Advantages: Data can be reused. Credit!
          – Disadvantages: Lot of work – for curators
       Side labels on the slide: “Habit, Ease, Privacy, Control”; “Effort, Reuse, Credit, Compliance”; “MORE ANNOTATION”
  • 13. Conclusions for data annotation:
       “Instead of building newer and larger weapons of mass destruction, I think mankind should try to get more use out of the ones we have” (Deep Thoughts by Jack Handey)
       • Let’s use the data standards we already have – and agree on using the same ones
       • Work with existing data repositories in a field to come to a lowest common denominator of metadata
       • Tailor the systems to be optimally easy to use for scientists in terms of metadata: add as little as you have to, as few times as you can.
  • 14. Summary:
       • Data Preservation:
          – Tailor tools to fit scientists’ workflow – follow the experiment!
          – We are creating repositories of shared experiments: Enable demonstrably better science!
       • Data Use:
          – Allow owner full control over who sees which data - create social networking context
          – Collectively pioneer long-term funding options; support/develop ‘shared mission’ funding challenges
       • How annotation can help reuse:
          – Collaborate between (generic/specific, institutional, cross-national) data facilities to integrate repositories, enable cross-repository usage and reuse existing metadata.
  • 15. Questions?
       Anita de Waard, VP Research Data Collaborations
       a.dewaard@elsevier.com
       http://researchdata.elsevier.com/
  • 16. Elsevier Research Data Services Goals:
       1. Increase Data Preservation: Help increase the amount and quality of data preserved and shared
       2. Improve Data Use: Help increase the value and usability of the data shared by increasing annotation, normalization, provenance enabling enhanced interoperability
       3. Develop Sustainable Models: Help measure and deliver credit for shared data, the researchers, the institute, and the funding body, enabling more sustainable platforms.
  • 17. Guiding Principles of RDS:
       • In principle, all open data stays open and URLs, front end etc. stay where they are (i.e. with repository)
       • Collaboration is tailored to data repositories’ unique needs/interests - ‘service-model’ type:
          – Aspects where collaboration is needed are discussed
          – A collaboration plan is drawn up using a Service-Level Agreement: agree on time, conditions, etc.
       • Transparent business model
       • Very small (2/3 people) department; immediate communication; instant deployment of ideas.
  • 18. “But aren’t you guys in it for the money?”
       • Yes, we are - like most businesses…
       • Is your real question perhaps: ‘Does no one want to work with you anymore because of the Open Access debate?’
       • The OA debate focuses on three issues:
          – IPR and Access issues. E.g. BY-NC-SA? Github? ..?
          – Opaque business models. E.g. Gold Open Access? Shared funding model? Commercial analytics with shared royalties?
          – Lack of perceived added value. We offer a service: only use it if it’s any good!
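
Slide 13 suggests working with existing repositories toward a "lowest common denominator of metadata". One way to picture that is as the intersection of the fields each repository asks for. The sketch below is only an illustration under stated assumptions: the three repository types come from the slides, but every field name is invented here, and real repository schemas will differ.

```python
# Hypothetical metadata schemas. The repository types come from the slides;
# every field name below is an assumption made up for illustration.
SCHEMAS = {
    "domain_specific": {"title", "creator", "date", "species", "antibody", "protocol"},
    "institutional": {"title", "creator", "date", "department", "grant_id"},
    "generic": {"title", "creator", "date", "license", "keywords"},
}


def common_fields(schemas):
    """Return the fields that every schema asks for (the shared core)."""
    field_sets = list(schemas.values())
    if not field_sets:
        return set()
    return set.intersection(*field_sets)


def extra_fields(schemas):
    """Map each repository to the fields it asks for beyond the shared core."""
    core = common_fields(schemas)
    return {name: sorted(fields - core) for name, fields in schemas.items()}


if __name__ == "__main__":
    # Fields a scientist would enter once and reuse everywhere.
    print(sorted(common_fields(SCHEMAS)))
    # Fields that remain specific to each repository.
    for name, extras in extra_fields(SCHEMAS).items():
        print(name, extras)
```

In this toy setup the shared core is entered once, which matches the slide's advice to "add as little as you have to, as few times as you can"; the per-repository extras show where additional annotation effort would still be needed.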