ELAG Workshop
“Data repository challenges”
Wednesday, May 16th 2012
Session 1 & 2
Jeroen Rombouts & Egbert Gramsbergen
Programme
Session 1 (14:30 – 15:30): “meta - data - value - …”
1. Round of introduction: who-is-who and why this workshop?
2. Short intro 3TU.DC
3. Background information
4. Case: Traffic flow observations
5. Warming-up Graphs

Break

Session 2 (16:00 – 17:00): “producers - consumers - attitudes - …”
6. ‘Discipline’ differences (researchers & repositories)
7. Dotmocracy ‘Lite’
8. Conclusions
1. Who is who?
•   Who are you?

•   Why interested in this topic?
2. 3TU.Datacentrum = …
• 3 Dutch TU’s: Delft, Eindhoven, Twente
• Project 2008-2011, going concern 2012-
• Data archive
   –   2008 -
   –   “finished” data
   –   preserve but do not forget usability
   –   meta data harvestable (OAI-PMH)
   –   crawlable (OAI-ORE linked data)
   –   data citation information (incl. DataCite DOIs)
• Data labs
   – Just starting (hosting)
   – Unfinished data + software/scripts
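As a rough illustration of what "harvestable (OAI-PMH)" means in practice: a harvester sends plain HTTP GET requests that carry a verb and a metadata prefix. The sketch below only builds such a request URL. The base URL is a placeholder (the slides do not give the archive's endpoint), and the function name is made up for this example.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the slides do not state the archive's OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH 2.0 ListRecords request URL (illustrative helper)."""
    # 'verb' and 'metadataPrefix' are required for ListRecords;
    # 'set' and 'from' are optional selective-harvesting arguments.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    if from_date is not None:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


print(list_records_url(BASE_URL))
print(list_records_url(BASE_URL, from_date="2012-01-01"))
```

The `oai_dc` prefix (unqualified Dublin Core) is the one format every OAI-PMH repository has to support, which is why it is used as the default here.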
Website & Data-archive
•   http://datacentrum.3tu.nl
•   Information: news, announcements, publications, links and tutorials

•   http://data.3tu.nl
•   Data sets download and ‘management’
•   ‘Use’ data with Google Maps/Earth, OPeNDAP, …
Data archiving options

•   ‘Simple’ sets (Do It Yourself)
    Standard (self)upload form and descriptive information, single file
    per object (can be a ‘zipped’ collection), single DOI, …

    E.g.: Zandvliet, H.J.W. et al. (2010): Diffusion driven concerted
    motion of surface atoms: Ge on Ge(001). MESA+ Institute For
    Nanotechnology, University of Twente.
    doi:10.4121/uuid:3f71549c-6097-4bb8-bc00-6db77deb161d

•   Special collections (Do It Together)
    Negotiate: deposit procedure, description (xml, picture, preview),
    data model, level of DOI assignment, query online, …

    E.g.: Otto, T., Russchenberg, H.W.J. (2010): IDRA weather radar
    measurements - all data. TU Delft - Delft University of
    Technology.
    doi:10.4121/uuid:5f3bcaa2-a456-4a66-a67b-1eec928cae6d
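Both examples above follow the pattern "Creator (Year): Title. Publisher. doi:…". A minimal sketch of assembling such a citation string and turning the DOI into a resolvable link is shown below. The function names are illustrative only, and the resolver address https://doi.org/ is the general DOI proxy, not something stated on the slides.

```python
from urllib.parse import quote


def format_citation(creators, year, title, publisher, doi):
    """Assemble a citation in the pattern used by the two examples above."""
    return f"{creators} ({year}): {title}. {publisher}. doi:{doi}"


def doi_url(doi):
    """Turn a DOI (with or without a leading 'doi:') into a resolver URL."""
    name = doi[4:] if doi.lower().startswith("doi:") else doi
    # Keep '/' and ':' readable; other special characters are percent-encoded.
    return "https://doi.org/" + quote(name, safe="/:")


citation = format_citation(
    "Otto, T., Russchenberg, H.W.J.",
    2010,
    "IDRA weather radar measurements - all data",
    "TU Delft - Delft University of Technology",
    "10.4121/uuid:5f3bcaa2-a456-4a66-a67b-1eec928cae6d",
)
print(citation)
print(doi_url("doi:10.4121/uuid:5f3bcaa2-a456-4a66-a67b-1eec928cae6d"))
```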
Training & Data-labs
•   http://dataintelligence.3tu.nl
•   Reference, News & Events for training library staff.

•   OpenEarth, SHARE, …?
Questions
3. Background information
• Workshop scope
  – Need for change ?/!
  – Questions (for now)

• Report inputs
  – NSF/NSB: Definitions
  – RIN: Discipline/Data Differences
  – DANS/3TU.DC: Value/selection/DSA/…???
Data Deluge
   •   Data in 2015 approx. 18 million
       times Library of Congress (in size).

   •   Video data in 2005 half of all digital
       data.

   •   According to Eric Sieverts:
       At the current growth rate, by 2210 the
       number of bytes will equal the number of
       atoms on planet Earth
       (he predicts that something will change
       before that happens ;-)).

   •   CERN-LHC: 10-15PB/yr.
Workshop scope
Preconditions
• Challenge: Too much data (to keep).
  Technology (storage capacity, cooling, energy), organizations (strategies, budgets) and
  people (awareness, training) can’t keep (this) up!
• Upside: Not all data is valuable in the future.
  Some relevant (de)selection experience in archiving, some efficiency improvements,
  ‘some’ increase in storage capacity, …


Questions
A. Which research output to share and preserve?
B. Who are the players involved?
C. How to collect and preserve the research output?

 Roles of University Libraries…


Conclusions on differences between documents and research data?
NSF/NSB - 1/3
• Data.
For the purposes of this document, data are any and all complex data
   entities from observations, experiments, simulations, models, and
   higher order assemblies, along with the associated documentation
   needed to describe and interpret the data.


• Metadata.
Metadata are a subset of data, and are data about data. Metadata
  summarize data content, context, structure, interrelationships, and
  provenance (information on history and origins). They add
  relevance and purpose to data, and enable the identification of
  similar data in different data collections.
NSF/NSB - 2/3
3 functional types of data collections:

• Research Collections
Authors are individual investigators and investigator teams.
Research collections are usually maintained to serve immediate group
participants only for the life of a project, and are typically subjected to limited
processing or curation. Data may not conform to any data standards.



• Resource Collections
Resource collections are authored by a community of investigators, often within
a domain of science or engineering, and are often developed with community
level standards. Budgets are often intermediate in size.
Lifetime is between the mid- and long-term.
NSF/NSB - 3/3
• Reference Collections
Reference collections are authored by and serve large segments of the
  science and engineering community and conform to robust, well-
  established and comprehensive standards, which often lead to a
  universal standard. Budgets are large and are often derived from
  diverse sources with a view to indefinite support.

[NSF, Originally: National Science Board report on
Long-Lived Digital Data Collections, …]


Differences:
• Community size
• Collection lifetime
• Level of standardization
• Amount of processing
• Budget size & sources
• …
RIN
•   Many different kinds and categories of data:
     – scientific experiments;
     – models or simulations; and
     – observations of specific phenomena at a specific time or location.…
•   Datasets are generated for different purposes and through different
    processes.
•   Data may undergo various stages of transformation.
•   The quality of metadata provided for research datasets is very
    variable.
•   Varying degrees of data management, efforts, resources and
    expertise.
•   There are significant variations – as well as commonalities - in
    researchers’ attitudes, behaviors and needs, in the available
    infrastructure, and in the nature and effect of policy initiatives, in
    different disciplines and subject areas
•   …
DANS/3TU.DC
Key findings
• No solid definition of “research data” found
• A lot of literature on the selection process, but…
• Not a single case of a selection policy for digital data found.
  Apparently a lot of implicit selection is going on, considering the available
  digital research data

Reasons for preserving research data:
a) Obligation to enable re-use (by funder, publisher)
b) Other arguments: inter- or intradisciplinary value, hard to repeat, value
for historic research
c) Obligation for verification (by code of conduct, employer, publisher)
d) Non-scientific arguments (heritage, responsibility to society)
Docs vs. Data (Differences)
•   Object sizes (capacity)
•   Collection sizes/granularity (number of objects)
•   Meta data (type, standards and distinction from object)
•   Heterogeneity of collections (not discipline differences)
     – Data category (experiment, model/simulation, observation)
     – Data generation process (man made vs. machine made or …)
     – File formats
•   Attitudes to ‘publishing’
•   Resources, expertise, efforts on
    data management
•   Selection inevitable
•   Value?
•   …
•   … Anything to add?

(list to be expanded in workshop)
Questions, suggestions, …
4. Case: Traffic flow observations
•   Case
    Researchers needed to clear disk space and offered data which
    were “expensive to gather and had required quite a lot of
    computation to process.”
    Project was already finished.

•   Content
    Pictures of highway stretches shot from helicopter.
    Shoulder open/closed, several flights, raw/stabilized, several dates,
    calibration image, calibration software and settings.
Questions for case
•   Which data to ingest?
    raw pictures, stabilized pictures, movies or … vectors and type of cars?
    GPS logs
    calibration image
    stabilisation software/data

•   Who are involved?
    data-producer (researcher)
    research funder (owner)
    data repository

•   How to preserve?
    GPS logs: as data or meta data, all flight data or only when recording?
    the software (code or executable?)
    picture formats (tiff, png, jpeg2000, …)?
    granularity (per flight, per location, per recording, ...)?
The data
Collection
Top level dataset
Low level dataset (stabilized data)
…
•   …
Citation information
Docs vs. Data (Differences)
•   Object sizes (capacity)
•   Collection sizes/granularity (number of files)
•   Meta data (type, standards and distinction from object)
•   Heterogeneity of collections
     – Data category (experiment, model/simulation, observation)
     – Data generation process (man made vs. machine made or …)
     – File formats
•   Attitudes to ‘publishing’
•   Resources, expertise, efforts on
    data management
•   Selection inevitable
•   Value?
•   Citation practice
•   …
•   … Anything to add?
Questions, suggestions, …
5. To the graphs…
Break
Session 2

Session 2 (16:00 – 17:00): “producers - consumers - attitudes - …”
6. ‘Discipline’ differences (researchers & repositories)
7. Dotmocracy ‘Lite’
8. Preliminary conclusions?

Back to plenary presentations
What our account managers ‘sell’…

The benefits for data producers and data consumers
 • Increased visibility of research output
   (metadata in repository networks, assigning DOIs, facilitating
   increased citation rates for ‘enhanced publications’, ...);
 • Improved quality of datasets (quality assurance for
   multi-user setup, checks on ingest, …);
 • (Long-term) preservation of, and accessibility to,
   valuable research data;
 • Distribution of research data for reuse, including
   administration and usage statistics;
 • Advice on data management, rights, formats,
   metadata, etc.
Value
 Secure research data
 Cite/Claim (DOIs)
 Quality Assurance (support)
 Data exchange
 Data visibility

 Support EU projects, Communities
 Extra show window
 Relation with non-academic research, society
 Prepare for paradigm shift
 Enable verification
What do data producers say? 1/2
•   “No time!”
•   “Only for long term continuous data”
•   “Datasets are stored by publisher”
•   “Our research is once only”
•   “Interesting but not for me”
•   “Nobody needs my data”
•   “Data transfer not needed, every PhD does own project”
•   “Our datasets are confidential”
What do data producers say? 2/2
•   “When can I store my datasets?”
•   “Very useful, essential metadata often missing”
•   “Much to improve in reuse of data”
•   “Good opportunity to share datasets we bought”
•   “Would like to publish data”
•   “Surprising our university had no facility for data preservation”
•   “Transfer of data between PhDs can be improved”
Workshop with researchers
Data should only become available after publication
Workshop results
• Confirmed:
  – Different domains have commonalities
  – Need for support on research data management
    exists


• There are strong differences depending on
  – Research type
  – Data types
  – Individual attitudes
‘Conclusions’ on valuable data
Which data to preserve? And why?
• Data of ‘enhanced publications’ (underlying data and visualisations
  linked to publications).
  Increase publication value (stronger basis, more citations, …);

• Data generated by ‘hard to repeat’ processes.
  E.g. high cost, (environmental) observations, complex or
  continuous experiments, …;

• Data collected with public funding.
  Conditions by funding organisations or publishers like Nature
  Publishing Group, NWO, governmental organisations, universities,
  …;

• Preferably open access data with potential for reuse (verification,
  new research, …).
  Increase visibility, efficiency and quality of research efforts.

• … Anything to add?
Docs vs. Data (Differences)
•   Object sizes (capacity)
•   Collection sizes/granularity (number of files)
•   Meta data (type, standards and distinction from object)
•   Heterogeneity of collections
     – Data category (experiment, model/simulation, observation)
     – Data generation process (man made vs. machine made or …)
     – File formats
•   Attitudes to ‘publishing’
•   Resources, expertise, efforts on
    data management
•   Selection inevitable (due to size)
•   Value of research data higher
•   Readability of research data is lower (zero without metadata)
•   Citation practice
•   …
•   … Anything to add?
The End
In one line:

“Challenge is to find the ready, able and willing
(researchers)”
To Dotmocracy…
•   15 min. to select or define new propositions
    (approx. 3) and write them on a sheet.

•   15 min. to ‘vote’ on every sheet.

•   15 min. for plenary discussion on opposing
    opinions.
Responsibility Propositions 1/4
• All research data should be stored in disciplinary
  archives.

• Research institutes must register data produced
  by their researchers.

• Libraries are the best departments at universities
  to take on research data archiving.
Obligation Propositions 2/4
• Data-producers should be obliged to publish their
  (anonymous) research data as open data.

• High cost research facilities should be obliged to
  share (and preserve) their data.

• Users should login to download data

• Data-repositories should never accept data in
  proprietary file formats
Value Propositions 3/4
• Only datasets which are linked to publications
  need to be preserved for the long term.

• Not simulation results but algorithms and
  boundary conditions should be stored.

• Each dataset should also include the data in its
  rawest form.
Misc. Propositions 4/4
• University libraries have a harder job to attract
  datasets from exact sciences than from
  humanities.

• Researchers are sloppy (they regard
  documentation as irrelevant and annoying).

• Session #4 should be on the beach with lots of
  beer.
Docs vs. Data (Differences)
•   Object sizes (capacity)
•   Collection sizes/granularity (number of files)
•   Meta data (type, standards and distinction from object)
•   Heterogeneity of collections
     – Data category (experiment, model/simulation, observation)
     – Data generation process (man made vs. machine made or …)
     – File formats
•   Attitudes to ‘publishing’
•   Resources, expertise, efforts on data management
•   Selection inevitable (due to size)
•   Value of research data higher
•   Readability of research data is lower (zero without metadata)
•   Citation practice
•   (A document is data)
•   Boundaries of data (sets) are less clear than for documents
•   Assigned responsibilities and tasks
•   Legal status
•   …
All Propositions 1/1
•   All research data should be stored in disciplinary archives.
•   Research institutes must register data produced by their researchers.
•   Libraries are the best departments at universities to take on research data
    archiving.
•   Data-producers should be obliged to publish their (anonymous) research data
    as open data.
•   High cost research facilities should be obliged to share (and preserve) their
    data.
•   Users should login to download data
•   Data-repositories should never accept data in proprietary file formats
•   Only datasets which are linked to publications need to be preserved for the long
    term.
•   Not simulation results but algorithms and boundary conditions should be stored.
•   Each dataset should also include the data in its rawest form.
•   University libraries have a harder job to attract datasets from exact sciences
    than from humanities.
•   Researchers are sloppy (they regard documentation as irrelevant and
    annoying).
•   Session #4 should be on the beach with lots of beer.
Dotmocracy results 1/3
“Users should login to download data”
Str. Agree   Agree   Neutral   Disagree   Str. Disagree
             xx      xx        x

+ Should be for some data types (sensitive)
+ It helps to get an idea of usage
+ Anonymity(?) on the net is a ‘2000’ thought
anyway
+ Accept license
+ Trace of use for data-producers
- Raise threshold for re-use
Dotmocracy results 2/3
“Data repositories should never accept files in
proprietary formats”
Str. Agree   Agree       Neutral      Disagree       Str. Disagree
             xxxxxx      xxxxxx       xxxxxx         xx

+ Easy to reuse data in open formats
- Better to have proprietary data than none at all
- May preclude data if we insist on open formats
- Can be migrated to open formats (sometimes)
Dotmocracy results 3/3
“Libraries are the best departments at
universities to take on research data archiving”
Str. Agree   Agree          Neutral   Disagree   Str. Disagree
xx           xxxxxxxxxxxx                        xx
             xxxxxx

+ Co-operation already with researchers
+ Librarians have good meta data skills
o The library’s vendor should deliver the service(?)
+ Full control and close to researcher(?)
- Challenge too big: long term sustainability
+ Builds on metadata knowledge of libraries
- Must have IT in co-operation
- Archiving skills
Responsibility
• All research data should be / is best stored in disciplinary
  archives.
   –   Bigger bodies of (mono)disciplinary data for consumers
   –   Discipline specific meta data, guidelines and support
   –   Sustainability of data-archive organisations
   –   Research data ownership at research institutes

• Research institutes should register data
   – …

• Libraries are the best departments at universities to take
  on research data archiving.
   –   Accessibility
   –   Archiving
   –   IT knowledge
   –   Infrastructure
Obligations
• Data producers / High cost facilities should be obliged
  to publish their (anonymised) research data.
   – Risk: “Garbage in …”
   – Funding consequences WOULD make a difference

• Login/registration of data-consumers
   –   Accept license
   –   User statistics for archive funding
   –   Trace of use for data-producers
   –   Raise threshold for re-use

• Data-repositories should refuse proprietary formats
   – …
Value
•   Only data linked to publications
     –   Data can be measured faster than it can be analyzed
     –   Accepted article proof of value AND documentation
     –   Possible future value without present publication?
     –   …

•   Not simulation results but algorithms
     –   Software more difficult to authentically reproduce
     –   Data calculation can be very time/resource consuming
     –   Simulation datasets can be very large
     –   Ability to calculate higher resolutions, faster, is increasing
     –   …

•   At least data in its rawest form
     –   Interpretation (processing) might be done wrong
     –   Interpretation (processing) only for super-experts and generally accepted
     –   Raw data can be very large (PIV, IDRA, …)
     –   …

Weitere ähnliche Inhalte

Was ist angesagt?

Research Data in the Arts and Humanities: A Few Tricky Questions
Research Data in the Arts and Humanities: A Few Tricky QuestionsResearch Data in the Arts and Humanities: A Few Tricky Questions
Research Data in the Arts and Humanities: A Few Tricky QuestionsMartin Donnelly
 
Course Research Data Management
Course Research Data ManagementCourse Research Data Management
Course Research Data ManagementMaarten Van Bentum
 
University of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchersUniversity of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchersJez Cope
 
Data curation issues for repositories
Data curation issues for repositoriesData curation issues for repositories
Data curation issues for repositoriesChris Rusbridge
 
What's all the data about? - Linking and Profiling of Linked Datasets
What's all the data about? - Linking and Profiling of Linked DatasetsWhat's all the data about? - Linking and Profiling of Linked Datasets
What's all the data about? - Linking and Profiling of Linked DatasetsStefan Dietze
 
Open Data and the Panton Principles in the Humanities
Open Data and the Panton Principles in the HumanitiesOpen Data and the Panton Principles in the Humanities
Open Data and the Panton Principles in the HumanitiesOpen Knowledge Maps
 
The Experimental Project of DOI Registration for Research Data at Japan Link...
The Experimental Project of DOI Registration for Research Data at Japan Link...The Experimental Project of DOI Registration for Research Data at Japan Link...
The Experimental Project of DOI Registration for Research Data at Japan Link...National Institute of Informatics (NII)
 
e-Science, Research Data and Libaries
e-Science, Research Data and Libariese-Science, Research Data and Libaries
e-Science, Research Data and LibariesRob Grim
 
Research Data in the Arts and Humanities: A Few Difficulties
Research Data in the Arts and Humanities: A Few DifficultiesResearch Data in the Arts and Humanities: A Few Difficulties
Research Data in the Arts and Humanities: A Few DifficultiesMartin Donnelly
 
Research data management free online courses, publisher policies
Research data management   free online courses, publisher policiesResearch data management   free online courses, publisher policies
Research data management free online courses, publisher policiesNikesh Narayanan
 
Open Access and Open Data: what do I need to know (and do)?
Open Access and Open Data: what do I need to know (and do)?Open Access and Open Data: what do I need to know (and do)?
Open Access and Open Data: what do I need to know (and do)?Martin Donnelly
 
UK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceUK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceLizLyon
 
Introduction to Data Management Planning at Alien Challenge COST workshop
Introduction to Data Management Planning at Alien Challenge COST workshopIntroduction to Data Management Planning at Alien Challenge COST workshop
Introduction to Data Management Planning at Alien Challenge COST workshopAaike De Wever
 
Vision for an academic research library as partner in campus-wide data manage...
Vision for an academic research library as partner in campus-wide data manage...Vision for an academic research library as partner in campus-wide data manage...
Vision for an academic research library as partner in campus-wide data manage...Plato L. Smith II
 
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...Stefan Schmunk
 
Digital Data Sharing: Opportunities and Challenges of Opening Research
Digital Data Sharing: Opportunities and Challenges of Opening ResearchDigital Data Sharing: Opportunities and Challenges of Opening Research
Digital Data Sharing: Opportunities and Challenges of Opening ResearchMartin Donnelly
 

Was ist angesagt? (20)

Research Data in the Arts and Humanities: A Few Tricky Questions
Research Data in the Arts and Humanities: A Few Tricky QuestionsResearch Data in the Arts and Humanities: A Few Tricky Questions
Research Data in the Arts and Humanities: A Few Tricky Questions
 
Course Research Data Management
Course Research Data ManagementCourse Research Data Management
Course Research Data Management
 
Rdm slides march 2014
Rdm slides march 2014Rdm slides march 2014
Rdm slides march 2014
 
University of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchersUniversity of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchers
 
Data curation issues for repositories
Data curation issues for repositoriesData curation issues for repositories
Data curation issues for repositories
 
What's all the data about? - Linking and Profiling of Linked Datasets
What's all the data about? - Linking and Profiling of Linked DatasetsWhat's all the data about? - Linking and Profiling of Linked Datasets
What's all the data about? - Linking and Profiling of Linked Datasets
 
Open Data and the Panton Principles in the Humanities
Open Data and the Panton Principles in the HumanitiesOpen Data and the Panton Principles in the Humanities
Open Data and the Panton Principles in the Humanities
 
The Experimental Project of DOI Registration for Research Data at Japan Link...
The Experimental Project of DOI Registration for Research Data at Japan Link...The Experimental Project of DOI Registration for Research Data at Japan Link...
The Experimental Project of DOI Registration for Research Data at Japan Link...
 
Researh data management
Researh data managementResearh data management
Researh data management
 
NISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data Services
NISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data ServicesNISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data Services
NISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data Services
 
e-Science, Research Data and Libaries
e-Science, Research Data and Libariese-Science, Research Data and Libaries
e-Science, Research Data and Libaries
 
Open Science and Open Data for Librarians
Open Science and Open Data for LibrariansOpen Science and Open Data for Librarians
Open Science and Open Data for Librarians
 
Research Data in the Arts and Humanities: A Few Difficulties
Research Data in the Arts and Humanities: A Few DifficultiesResearch Data in the Arts and Humanities: A Few Difficulties
Research Data in the Arts and Humanities: A Few Difficulties
 
Research data management free online courses, publisher policies
Research data management   free online courses, publisher policiesResearch data management   free online courses, publisher policies
Research data management free online courses, publisher policies
 
Open Access and Open Data: what do I need to know (and do)?
Open Access and Open Data: what do I need to know (and do)?Open Access and Open Data: what do I need to know (and do)?
Open Access and Open Data: what do I need to know (and do)?
 
UK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceUK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalface
 
Introduction to Data Management Planning at Alien Challenge COST workshop
Introduction to Data Management Planning at Alien Challenge COST workshopIntroduction to Data Management Planning at Alien Challenge COST workshop
Introduction to Data Management Planning at Alien Challenge COST workshop
 
Vision for an academic research library as partner in campus-wide data manage...
Vision for an academic research library as partner in campus-wide data manage...Vision for an academic research library as partner in campus-wide data manage...
Vision for an academic research library as partner in campus-wide data manage...
 
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
 
Digital Data Sharing: Opportunities and Challenges of Opening Research
Digital Data Sharing: Opportunities and Challenges of Opening ResearchDigital Data Sharing: Opportunities and Challenges of Opening Research
Digital Data Sharing: Opportunities and Challenges of Opening Research
 

Andere mochten auch

Themamiddag ukb wg rdm introductie jr v06
Themamiddag ukb wg rdm introductie jr v06Themamiddag ukb wg rdm introductie jr v06
Themamiddag ukb wg rdm introductie jr v06Jeroen Rombouts
 
3 tu.dc 5min nordbib jp rombouts
3 tu.dc 5min nordbib jp rombouts3 tu.dc 5min nordbib jp rombouts
3 tu.dc 5min nordbib jp romboutsJeroen Rombouts
 
Chcaod511 b session three
Chcaod511 b session threeChcaod511 b session three
Chcaod511 b session threelmabbott
 
Themamiddag ukb wg rdm quick scan beleid jr v05
Themamiddag ukb wg rdm quick scan beleid jr v05Themamiddag ukb wg rdm quick scan beleid jr v05
Themamiddag ukb wg rdm quick scan beleid jr v05Jeroen Rombouts
 
Session one 080411
Session one 080411Session one 080411
Session one 080411lmabbott
 
Chcpol501 a session five 250311
Chcpol501 a session five 250311Chcpol501 a session five 250311
Chcpol501 a session five 250311lmabbott
 
Elag workshop sessie 3 v4
Elag workshop sessie 3 v4Elag workshop sessie 3 v4
Elag workshop sessie 3 v4Jeroen Rombouts
 

Andere mochten auch (7)

Themamiddag ukb wg rdm introductie jr v06
Themamiddag ukb wg rdm introductie jr v06Themamiddag ukb wg rdm introductie jr v06
Themamiddag ukb wg rdm introductie jr v06
 
3 tu.dc 5min nordbib jp rombouts
3 tu.dc 5min nordbib jp rombouts3 tu.dc 5min nordbib jp rombouts
3 tu.dc 5min nordbib jp rombouts
 
Chcaod511 b session three
Chcaod511 b session threeChcaod511 b session three
Chcaod511 b session three
 
Themamiddag ukb wg rdm quick scan beleid jr v05
Themamiddag ukb wg rdm quick scan beleid jr v05Themamiddag ukb wg rdm quick scan beleid jr v05
Themamiddag ukb wg rdm quick scan beleid jr v05
 
Session one 080411
Session one 080411Session one 080411
Session one 080411
 
Chcpol501 a session five 250311
Chcpol501 a session five 250311Chcpol501 a session five 250311
Chcpol501 a session five 250311
 
Elag workshop sessie 3 v4
Elag workshop sessie 3 v4Elag workshop sessie 3 v4
Elag workshop sessie 3 v4
 

Elag workshop sessie 1 en 2 v10

  • 1. ELAG Workshop “Data repository challenges” Wednesday, May 16th 2012 Session 1 & 2 Jeroen Rombouts & Egbert Gramsbergen
  • 2. Programme Session 1 (14:30 – 15:30): “meta - data - value - …” 2.Round of introduction: who-is-who and why this workshop? 3.Short intro 3TU.DC 4.Background information 5.Case: Traffic flow observations 6.Warming-up Graphs Break Session 2 (16:00 – 17:00): “producers - consumers - attitudes - …” 11.‘Discipline’ differences (researchers & repositories) 12.Dotmocracy ‘Lite’ 13.Conclusions
  • 3. 1. Who is who? • Who are you? • Why interested in this topic?
  • 4. 2. 3TU.Datacentrum = … • 3 Dutch TU’s: Delft, Eindhoven, Twente • Project 2008-2011, going concern 2012- • Data archive – 2008 - – “finished” data – preserve but do not forget usability – meta data harvestable (OAI-PMH) – crawlable (OAI-ORE linked data) – data citation information (incl. DataCite DOI’s) • Data labs – Just starting (hosting) – Unfinished data + software/scripts
  • 5. Website & Data-archive • http://datacentrum.3tu.nl • Information News, announcements Publications, links and tutorials • http://data.3tu.nl • Data sets download and ‘management’ • ‘Use’ data with Google Maps/Earth, OPeNDAP, …
  • 6. Data archiving options • ‘Simple’ sets (Do It Yourself) Standard (self)upload form and descriptive information, single file per object (can be a ‘zipped’ collection), single DOI, … E.g.: Zandvliet, H.J.W. et al. (2010): Diffusion driven concerted motion of surface atoms: Ge on Ge(001). MESA+ Institute For Nanotechnology, University of Twente. doi:10.4121/uuid:3f71549c-6097-4bb8-bc00-6db77deb161d • Special collections (Do It Together) Negotiate: deposit procedure, description (xml, picture, preview), data model, level of DOI assignment, query online, … E.g.: Otto, T., Russchenberg, H.W.J. (2010): IDRA weather radar measurements - all data. TU Delft - Delft University of Technology. doi:10.4121/uuid:5f3bcaa2-a456-4a66-a67b-1eec928cae6d
  • 7. Training & Data-labs • http://dataintelligence.3tu.nl • Reference, News & Events for training library staff. • OpenEarth, SHARE, …?
  • 9. 3. Background information • Workshop scope – Need for change ?/! – Questions (for now) • Report inputs – NSF/NSB: Definitions – RIN: Discipline/Data Differences – DANS/3TU.DC: Value/selection/DSA/…???
  • 10. Data Deluge • Data in 2015 approx. 18 million times Library of Congress (in size). • Video data in 2005 half of all digital data. • According to Eric Sieverts: At current growth rate in 2210 number of bytes equal to number of atoms on planet earth. (predicts that before that happens something will change ;-)) • CERN-LHC: 10-15PB/yr.
  • 11. Workshop scope Preconditions • Challenge: Too much data (to keep). Technology (storage capacity, cooling, energy), organizations (strategies, budgets) and people (awareness, training) can’t keep (this) up! • Upside: Not all data is valuable in the future; there is some relevant (de)selection experience in archiving, some efficiency improvements, ‘some’ increase in storage capacity, … Questions F. Which research output to share and preserve? G. Who are the players involved? H. How to collect and preserve the research output? → Roles of University Libraries… Conclusions on differences between documents and research data?
  • 12. NSF/NSB - 1/3 • Data. For the purposes of this document, data are any and all complex data entities from observations, experiments, simulations, models, and higher order assemblies, along with the associated documentation needed to describe and interpret the data. • Metadata. Metadata are a subset of data, and are data about data. Metadata summarize data content, context, structure, interrelationships, and provenance (information on history and origins). They add relevance and purpose to data, and enable the identification of similar data in different data collections.
  • 13. NSF/NSB - 2/3 3 functional types of data collections: •Research Collections Authors are individual investigators and investigator teams. Research collections are usually maintained to serve immediate group participants only for the life of a project, and are typically subjected to limited processing or curation. Data may not conform to any data standards. •Resource Collections Resource collections are authored by a community of investigators, often within a domain of science or engineering, and are often developed with community level standards. Budgets are often intermediate in size. Lifetime is between the mid- and long-term.
  • 14. NSF/NSB - 3/3 • Reference Collections Reference collections are authored by and serve large segments of the science and engineering community and conform to robust, well- established and comprehensive standards, which often lead to a universal standard. Budgets are large and are often derived from diverse sources with a view to indefinite support. [NSF, Originally: National Science Board report on Long-Lived Digital Data Collections, …] Differences: • Community size • Collection lifetime • Level of standardization • Amount of processing • Budget size & sources • …
  • 15. RIN • Many different kinds and categories of data: – scientific experiments; – models or simulations; and – observations of specific phenomena at a specific time or location.… • Datasets are generated for different purposes and through different processes. • Data may undergo various stages of transformation. • The quality of metadata provided for research datasets is very variable. • Varying degrees of data management, efforts, resources and expertise. • There are significant variations – as well as commonalities - in researchers’ attitudes, behaviors and needs, in the available infrastructure, and in the nature and effect of policy initiatives, in different disciplines and subject areas • …
  • 16. DANS/3TU.DC Key findings • No solid definition of “research data” found • Lot of literature on selection process, but… • Not a single case of selection policy of digital data found → Apparently a lot of implicit selection going on considering the available digital research data Reasons for preserving research data: h) Obligation to enable re-use (by funder, publisher) i) Other arguments: inter- or intra-disciplinary value, hard to repeat, value for historic research j) Obligation for verification (by code of conduct, employer, publisher) k) Non-scientific arguments (heritage, responsibility to society)
  • 17. Docs vs. Data (Differences) • Object sizes (capacity) • Collection sizes/granularity (number of objects) • Meta data (type, standards and distinction from object) • Heterogeneity of collections (not discipline differences) – Data category (experiment, model/simulation, observation) – Data generation process (man made vs. machine made or …) – File formats • Attitudes to ‘publishing’ • Resources, expertise, efforts on data management • Selection inevitable • Value? • … • … Anything to add? (list to be expanded in workshop)
  • 19. 4. Case: Traffic flow observations • Case Researchers needed to clear the disk space and offered data that were “expensive to gather and had required quite a lot of computation to process.” Project was already finished. • Content Pictures of highway stretches shot from helicopter. Shoulder open/closed, several flights, raw/stabilized, several dates, calibration image, calibration software and settings.
  • 20. Questions for case • Which data to ingest? raw pictures, stabilized pictures, movies or … vectors and type of cars? GPS logs, calibration image, stabilisation software/data • Who are involved? data-producer (researcher), research funder (owner), data repository • How to preserve? GPS logs: as data or meta data, all flight data or only when recording? the software (code or executable?) picture formats (tiff, png, jpeg2000, …)? granularity (per flight, per location, per recording, ...)?
  • 24. Low level dataset (stabilized data)
  • 27. Docs vs. Data (Differences) • Object sizes (capacity) • Collection sizes/granularity (number of files) • Meta data (type, standards and distinction from object) • Heterogeneity of collections – Data category (experiment, model/simulation, observation) – Data generation process (man made vs. machine made or …) – File formats • Attitudes to ‘publishing’ • Resources, expertise, efforts on data management • Selection inevitable • Value? • Citation practice • … • … Anything to add?
  • 29. 5. To the graphs…
  • 30. Break
  • 35. Session 2 Session 2 (16:00 – 17:00): “producers - consumers - attitudes - …” 3.‘Discipline’ differences (researchers & repositories) 4.Dotmocracy ‘Lite’ 5.Preliminary conclusions? Back to plenary presentations
  • 36. What our account managers ‘sell’… The benefits for data producers and data consumers • Increased visibility of research output (metadata in repository networks, assigning DOIs, facilitating increased citation rates for ‘enhanced publications’, ...); • Improved quality of datasets (quality assurance for multi-user setup, checks on ingest, …); • (Long-term) preservation of, and accessibility to, valuable research data; • Distribution of research data for reuse, including administration and usage statistics; • Advice on data management, rights, formats, metadata, etc.
  • 37. Value • Secure research data • Cite/Claim (DOIs) • Quality Assurance (support) • Data exchange • Data visibility • Support EU projects, Communities • Extra show window • Relation with non-academic research, society • Prepare for paradigm shift • Enable verification
  • 38. What do data producers say? 1/2 • “Only for long term continuous data” • “Datasets are stored by publisher” • “No time!” • “Our research is once only” • “Interesting but not for me” • “Nobody needs my data” • “Our datasets are confidential” • “Data transfer not needed, every PhD does own project”
  • 39. What do data producers say? 2/2 • “Very useful, essential metadata often missing” • “When can I store my datasets?” • “Much to improve in reuse of data” • “Good opportunity to share datasets we bought” • “Would like to publish data” • “Surprising our university had no facility for data preservation” • “Transfer of data between PhD’s can be improved”
  • 40. Workshop with researchers Data should only become available after publication
  • 41. Workshop results • Confirmed: – Different domains have commonalities – Need for support on research data management exists • There are strong differences depending on – Research type – Data types – Individual attitudes
  • 42. ‘Conclusions’ on valuable data Which data to preserve? And why? • Data of ‘enhanced publications’ (underlying data and visualisations linked to publications). Increase publication value (stronger basis, more citations, …); • Data generated by ‘hard to repeat’ processes. E.g. high cost, (environmental) observations, complex or continuous experiments, …; • Data collected with public funding. Conditions by funding organisations or publishers like Nature Publishing Group, NWO, governmental organisations, universities, …; • Preferably open access data with potential for reuse (verification, new research, …). Increase visibility, efficiency and quality of research efforts. • … Anything to add?
  • 43. Docs vs. Data (Differences) • Object sizes (capacity) • Collection sizes/granularity (number of files) • Meta data (type, standards and distinction from object) • Heterogeneity of collections – Data category (experiment, model/simulation, observation) – Data generation process (man made vs. machine made or …) – File formats • Attitudes to ‘publishing’ • Resources, expertise, efforts on data management • Selection inevitable (due to size) • Value of research data is higher • Readability of research data is lower (zero without metadata) • Citation practice • … • … Anything to add?
  • 44. The End In one line: “Challenge is to find the ready, able and willing (researchers)”
  • 45. To Dotmocracy… • 15 min. to select or define new propositions (approx. 3) and write them on a sheet. • 15 min. to ‘vote’ on every sheet. • 15 min. for plenary discussion on opposing opinions.
  • 46. Responsibility Propositions 1/4 • All research data should be stored in disciplinary archives. • Research institutes must register data produced by their researchers. • Libraries are the best departments at universities to take on research data archiving.
  • 47. Obligation Propositions 2/4 • Data-producers should be obliged to publish their (anonymous) research data as open data. • High cost research facilities should be obliged to share (and preserve) their data. • Users should login to download data • Data-repositories should never accept data in proprietary file formats
  • 48. Value Propositions 3/4 • Only datasets which are linked to publications need to be preserved for the long term. • Not simulation results but algorithms and boundary conditions should be stored. • Each dataset should also include the data in its rawest form.
  • 49. Misc. Propositions 4/4 • University libraries have a harder job to attract datasets from exact sciences than from humanities. • Researchers are sloppy (they regard documentation as irrelevant and annoying). • Session #4 should be on the beach with lots of beer.
  • 50. Docs vs. Data (Differences) • Object sizes (capacity) • Collection sizes/granularity (number of files) • Meta data (type, standards and distinction from object) • Heterogeneity of collections – Data category (experiment, model/simulation, observation) – Data generation process (man made vs. machine made or …) – File formats • Attitudes to ‘publishing’ • Resources, expertise, efforts on data management • Selection inevitable (due to size) • Value of research data is higher • Readability of research data is lower (zero without metadata) • Citation practice • (A document is data) • Boundaries of data (sets) are less clear than for documents • Assigned responsibilities and tasks • Legal status • …
  • 51. All Propositions 1/1 • All research data should be stored in disciplinary archives. • Research institutes must register data produced by their researchers. • Libraries are the best departments at universities to take on research data archiving. • Data-producers should be obliged to publish their (anonymous) research data as open data. • High cost research facilities should be obliged to share (and preserve) their data. • Users should login to download data • Data-repositories should never accept data in proprietary file formats • Only datasets which are linked to publications need to be preserved for the long term. • Not simulation results but algorithms and boundary conditions should be stored. • Each dataset should also include the data in its rawest form. • University libraries have a harder job to attract datasets from exact sciences than from humanities. • Researchers are sloppy (they regard documentation as irrelevant and annoying). • Session #4 should be on the beach with lots of beer.
  • 52. Dotmocracy results 1/3 “Users should login to download data” Str. Agree Agree Neutral Disagree Str. Disagree xx xx x + Should be for some data types (sensitive) + It helps to get an idea of usage + Anonymity(?) on the net is a ‘2000’ thought anyway + Accept license + Trace of use for data-producers - Raise threshold for re-use
  • 53. Dotmocracy results 2/3 “Data repositories should never accept files in proprietary formats” Str. Agree Agree Neutral Disagree Str. Disagree xxxxxx xxxxxx xxxxxx xx + Easy to reuse data in open formats - Better to have proprietary data than none at all - May preclude data if insisting on open formats - Can be migrated to open formats (sometimes)
  • 54. Dotmocracy results 3/3 “Libraries are the best departments at universities to take on research data archiving” Str. Agree Agree Neutral Disagree Str. Disagree xx xxxxxxxxxxxx xx xxxxxx + Co-operation already with researchers + Librarians have good meta data skills o The library’s vendor should deliver the service(?) + Full control and close to researcher(?) - Challenge too big: long term sustainability + Builds on metadata knowledge of libraries - Must have IT in co-operation - Archiving skills
  • 55. Responsibility • All research data should be / is best stored in disciplinary archives. – Bigger bodies of (mono)disciplinary data for consumers – Discipline specific meta data, guidelines and support – Sustainability of data-archive organisations – Research data ownership at research institutes • Research institutes should register data – … • Libraries are the best departments at universities to take on research data archiving. – Accessibility – Archiving – IT knowledge – Infrastructure
  • 56. Obligations • Data producers / High cost facilities should be obliged to publish their (anonymised) research data. – Risk: “Garbage in …” – Funding consequences WOULD make a difference • Login/registration of data-consumers – Accept license – User statistics for archive funding – Trace of use for data-producers – Raise threshold for re-use • Data-repositories should refuse proprietary formats – …
  • 57. Value • Only data linked to publications – Data can be measured faster than it can be analyzed – Accepted article proof of value AND documentation – Possible future value without present publication? – … • Not simulation results but algorithms – Software more difficult to authentically reproduce – Data calculation can be very time/resource consuming – Simulation datasets can be very large – Ability to calculate higher resolutions, faster is increasing – … • At least data in its rawest form – Interpretation (processing) might be done wrong – Interpretation (processing) only for super-experts and generally accepted – Raw data can be very large (PIV, IDRA, …) – …

Editor's notes

  1. Meta data - Not ‘just’ bibliographic: very domain specific, distinction more fuzzy
  2. Meta data - Not ‘just’ bibliographic: very domain specific, distinction more fuzzy
  3. Group interaction - 3 groups Draw 3 graphs (multiple lines per graph are allowed but explain what they are) in next … min. Select 3 more or define your own graphs Present & discuss most interesting one from every group - …
  4. Explain form 3 groups .. Min. to select or define new propositions, at least 3 (not from same proposition group) .. Min. to walk around and ‘vote’ on every proposition, please write comments and ‘sign-off’ .. Min. To discuss the opposing opinions
  5. Explain form 3 groups .. Min. to select or define new propositions, at least 3 (not from same proposition group) .. Min. to walk around and ‘vote’ on every proposition, please write comments and ‘sign-off’ .. Min. To discuss the opposing opinions
  6. Meta data - Not ‘just’ bibliographic: very domain specific, distinction more fuzzy