a centre of expertise in data curation and preservation




Curating data for integrated
          science
          Chris Rusbridge
   NERC Data Management Workshop
           February 2009
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5
UK: Scotland License. To view a copy of this license, (a) visit
http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or, (b) send a letter to Creative Commons, 543 Howard
Street, 5th Floor, San Francisco, California, 94105, USA.




             Contents
•   Curation
•   Integrated science
•   Poetry & Philosophy of D H Rumsfeld
•   Designated Community & Knowledge Base
•   Curation and integration
•   Data and Texts








                   Curation
• Wikipedia
   • Curator: a content specialist responsible for an institution's
     collections and, together with a publications specialist, their
     associated collections catalogs.
   • Digital Curation: the curation, preservation, maintenance,
     collection and archiving of digital assets
   • Sheer curation: an approach to digital curation where
     curation activities are quietly integrated into the normal work
     flow of those creating and managing data and other digital
     assets.
• DCC: Digital curation is maintaining and adding value
  to a trusted body of digital information for current and
  future use.






       Integrated Science?
•   Mostly educational: easy-to-swallow science
•   Some strange things
•   One nice essay
•   Lots of environmental science








University of Integrated Science,
            California
  • Degree Programs:
    •   Vertical reality
    •   Tachyon Holistic Wellness
    •   Tantra (including Sexual Alchemy for Singles 101)
    •   Vegan and Live Food Nutrition Masters Program

    • …and that’s it!







      Edward O Wilson (1998)
• “Science: organized systematic enterprise that gathers
  knowledge about the world and condenses the knowledge
  into testable laws and principles. Defining traits are
   • 1st, confirmation of discoveries & support of hypotheses through repetition by
     independent investigators, preferably with different tests & analyses;
   • 2nd, mensuration, the quantitative description of the phenomena on
     universally accepted scales;
   • 3rd, economy, by which the largest amount of information is abstracted into a
     simple and precise form, which can be unpacked to re-create detail;
   • 4th, heuristics, the opening of avenues to new discovery and interpretation.
   • And 5th, and finally, is consilience, the interlocking of causal explanations
     across disciplines.”

   • Consilience: “the concurrence of multiple inductions drawn from different
     data sets”
                                            Wilson, E. O. (1998, 27 March). Integrated Science and the
                                            Coming Century of the Environment. Science Magazine, 279, 2048-2049.





       Wilson concluding
• “Arguably the foremost of global problems
  grounded in the idiosyncrasies of human
  nature is overpopulation and the destruction
  of the environment. The crisis is not long-term
  but here and now; it is upon us. Like it or not,
  we are entering the century of the
  environment, when science and polities will
  give the highest priority to settling humanity
  down before we wreck the planet.”






      NCAR: January 2009
• The Integrated Science Program will promote scientific
  frontiers that are dependent on an integrated approach,
  across NCAR laboratories and across disciplines. ISP will
  focus on thematic areas where the mission and
  expertise at NCAR, and in the university atmospheric
  and related sciences community, can be advanced by
  contributions from the social and environmental
  sciences beyond those that typically occur within single
  programs or departments. These areas include, but are
  not limited to, Earth system-society interactions,
  building societal resilience to weather and climate
  hazards, hydrologic sciences, and biogeochemistry.







Fisheries & Oceans Canada
• Integrated Science Data Management
  (ISDM) Providing Access to Ocean Data
  • “ISDM's mandate is to manage and archive
    ocean data collected by DFO, or acquired
    through national and international
    programmes conducted in ocean areas
    adjacent to Canada, and to disseminate
    data, data products, and services to the
    marine community in accordance with the
    policies of the Department.”






       Integrated Science
• We need a definition that works better;
  something like:

“The application of multiple scientific disciplines
  to one or more core scientific challenges”

• Examples of integrated sciences?
   • Archaeology
   • Environmental sciences






Integrated Science implications
 • Scientists will be using unfamiliar data,
   therefore
 • Data curators and managers must make their
   data available for unfamiliar users!



   • And now for something unfamiliar?








Poetry & Philosophy of D H
        Rumsfeld
Hart Seely, April 2, 2003,
SLATE http://www.slate.com/id/2081042/








           A Confession
‘Once in a while,
I'm standing here, doing something.
And I think,
"What in the world am I doing here?"
It's a big surprise.’
—May 16, 2001, interview with the New York Times








                    Clarity
‘I think what you'll find,
I think what you'll find is,
Whatever it is we do substantively,
There will be near-perfect clarity
As to what it is.

‘And it will be known,
And it will be known to the Congress,
And it will be known to you,
Probably before we decide it,
But it will be known.’
—Feb. 28, 2003, Department of Defense briefing






             The Unknown
‘As we know,
There are known knowns.
There are things we know we know.
We also know
There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don't know
We don't know.’
—Feb. 12, 2002, Department of Defense news briefing






      The 4th Rumsfeld?
• 3 epistemological classes (???)
  • Known knowns
  • Known unknowns
  • Unknown unknowns
• 4th class?
  • Unknown knowns?
  • Critical issue for integrated sciences








   Some OAIS Concepts?
• Knowledge Base: allows a consumer to understand
  something
• Designated Community: the set of consumers for
  whom the archive curates something
• Representation Information: helps you interpret a
  data object, yielding an information object
   • The amount and nature of RepInfo required is dependent on
     the Knowledge Base of the Designated Community
   • If you curate for project colleagues in the short term, little if
     any RepInfo is required
   • If you curate for those unfamiliar with the data, more RepInfo
     is needed
   • (All broadly interpreted!)
                                CCSDS (2002). Reference Model for an Open Archival Information System (OAIS).
                                Retrieved from http://public.ccsds.org/publications/archive/650x0b1.pdf.





                     Time
• KB is f1(DC, t)
• DC is f2(t)
• RepInfo needed is f3(f1(DC, t), f2(t))
   • (but none of these concepts can be precisely defined!)

• If DC is small and t is short (months to a year or so),
  then both may be ignored, and RepInfo can be assumed to be
  part of the KB
• If DC is extensive (eg cross-discipline) and t is long (5
  years to 25 plus), then RepInfo must be articulated
• If t is very long, most bets are off (post-hoc
  reconstruction likely to be needed)
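The three cases above can be sketched as a small decision function. This is only an illustration: the name `repinfo_strategy`, the boolean `dc_is_broad`, and both numeric thresholds are assumptions, since the slide itself notes that none of these concepts can be precisely defined. The slide does not cover mixed cases (for example a small DC over a long time), so the sketch conservatively treats them as needing articulated RepInfo.

```python
def repinfo_strategy(dc_is_broad: bool, t_years: float,
                     short_years: float = 1.0,
                     very_long_years: float = 50.0) -> str:
    """Suggest a rough RepInfo strategy for a Designated Community (DC)
    and a time horizon t, measured in years.

    The thresholds are hypothetical defaults, not values from the talk.
    """
    if t_years > very_long_years:
        # "Most bets are off": plan for reconstruction after the fact.
        return "expect post-hoc reconstruction"
    if not dc_is_broad and t_years <= short_years:
        # Small DC, short horizon: RepInfo is effectively part of the KB.
        return "RepInfo implicit in KB"
    # Broad DC and/or longer horizon (assumption for the mixed cases).
    return "articulate RepInfo"


print(repinfo_strategy(dc_is_broad=False, t_years=0.5))
print(repinfo_strategy(dc_is_broad=True, t_years=10))
```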





What might RepInfo include?
• Structure information: file format definitions, etc
• Semantic information: data dictionaries, code books etc
• Robust methods (working code?)
• Not to mention many kinds of metadata, provenance,
  documentation of hidden assumptions, etc
• Cross-domain schemas: one approach to articulating
  RepInfo?
    • (Never perfect, of course)
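One way to picture the list above is as a record kept alongside each data object. The sketch below is a hypothetical container, not an OAIS-defined structure: the class name `RepInfo`, its field names, and the `undocumented` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepInfo:
    """Hypothetical bundle of Representation Information for one data object."""
    structure: str                      # e.g. a file format definition
    semantics: Dict[str, str]           # data dictionary: column -> meaning and units
    methods: List[str] = field(default_factory=list)      # pointers to working code
    provenance: str = ""                # where the data came from
    assumptions: List[str] = field(default_factory=list)  # documented hidden assumptions

    def undocumented(self, columns: List[str]) -> List[str]:
        """Return the columns that have no entry in the data dictionary."""
        return [c for c in columns if c not in self.semantics]


info = RepInfo(
    structure="CSV, UTF-8, comma-separated, one header row",
    semantics={"temp": "sea surface temperature, degrees Celsius"},
)
print(info.undocumented(["temp", "sal"]))  # columns an unfamiliar user could not interpret
```

A check like `undocumented` only catches gaps in what has been written down; it cannot detect assumptions nobody thought to record.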








  What about Rumsfeld 4?
• Biggest concern with unfamiliar users is
  clashing concepts, e.g. different baselines,
  units, geographies, granularity
  • Especially where terms are ambiguous or
    differently interpreted
  • The KBs of two DCs conflict, potentially silently
  • Happens all the time, of course
• The unspoken: tacit knowledge, unknown
  knowns!
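A minimal sketch of how such a clash can stay silent, and how declaring units and baselines makes it visible. The `Series` class, the `conflicts` function, the site names and all values are made up for illustration; nothing here comes from a real dataset.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Series:
    """Values plus the unit and baseline they are declared against."""
    name: str
    values: Tuple[float, ...]
    unit: str
    baseline: str


def conflicts(a: Series, b: Series) -> List[str]:
    """List the declared properties on which two series disagree."""
    problems = []
    if a.unit != b.unit:
        problems.append(f"unit: {a.unit} vs {b.unit}")
    if a.baseline != b.baseline:
        problems.append(f"baseline: {a.baseline} vs {b.baseline}")
    return problems


# Both series are just numbers; without the declarations, merging them
# would raise no error at all.
sea_level_a = Series("site A", (0.12, 0.15), unit="m", baseline="local chart datum")
sea_level_b = Series("site B", (120.0, 150.0), unit="mm", baseline="mean sea level")

print(conflicts(sea_level_a, sea_level_b))
```

The comparison only works for knowledge that has been made explicit; tacit conventions that neither community writes down remain invisible to it.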






                Timing
• Curation starts before creation
  • Before project proposal!
• Data acquisition should not happen at the end
  • Continuous acquisition much better?
• Enforcement… or credit for data?








Other curation issues of concern
  •   Sustainability (work on your survival)
  •   Succession (what happens to your data if you don’t)
  •   Data audit (know what you’ve got)
  •   Data risk assessment (assess your chances of loss)
  •   Repository external audit???
  •   Provenance & computational lineage
  •   Archiving database changes
  •   Community proxy roles: help your communities
      develop data standards & data practices

  • DCC has tools & support for some of these…





… and what is the role of

        RDF?





                 RDF
• Anchors data to (well?) defined ontology or
  schema
  • Reduces 4th Rumsfeld risk?
• Allows processing by an increasing class of tools
• More suited to comparatively isolated “facts”
  or claims than substantial data arrays?
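As a rough illustration of the first point, an isolated "fact" can be written as subject-predicate-object triples whose predicates and units point at a shared vocabulary. The sketch uses plain Python tuples rather than an RDF library, and every URI under `http://example.org/` is a made-up placeholder, not a real ontology.

```python
EX = "http://example.org/vocab/"      # hypothetical vocabulary namespace
OBS = "http://example.org/obs/42"     # hypothetical identifier for one observation

# Each triple is (subject, predicate, object).
triples = [
    (OBS, EX + "observedProperty", EX + "seaSurfaceTemperature"),
    (OBS, EX + "unit", EX + "degreeCelsius"),
    (OBS, EX + "value", "11.3"),
]


def objects(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# The unit is an explicit, resolvable term rather than a tacit convention.
print(objects(OBS, EX + "unit"))
```

Three triples for one number also hints at why the approach may suit isolated claims better than large data arrays.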








 … and Research Outputs?
• Need more semantically aware texts to
  support cross-community understanding
• Coded up (cf microformats, RDFa)
  •   People
  •   Citations & references
  •   Science features (eg chemicals, reactions)
  •   Graphs, spectra, tables linking to
  •   Supplementary data
• PDF is pretty bad at this
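To give a feel for what "coded up" text might offer, the sketch below pulls labelled items out of an HTML fragment that carries RDFa-style `property` attributes. It is a deliberate simplification and not a conforming RDFa processor; the `ex:` prefix, the property names and the sample sentence are all invented for illustration.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements with a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # property name awaiting its text content

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self._current = prop

    def handle_data(self, data):
        # Pair the first non-empty text after a marked-up tag with its property.
        if self._current and data.strip():
            self.pairs.append((self._current, data.strip()))
            self._current = None


snippet = (
    '<p>Measured by <span property="ex:author">A. Researcher</span> '
    'using <span property="ex:chemical">NaCl</span>.</p>'
)

parser = PropertyExtractor()
parser.feed(snippet)
parser.close()
print(parser.pairs)
```

The same sentence rendered to PDF would keep the words but lose the labels, which is the difficulty the last bullet points at.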






Thanks… and now for the experts!




 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Curating data for integrated science

  • 1. a centre of expertise in data curation and preservation Curating data for integrated science Chris Rusbridge NERC Data Management Workshop February 2009 Funded by: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
  • 2. Contents • Curation • Integrated science • Poetry & Philosophy of D H Rumsfeld • Designated Community & Knowledge Base • Curation and integration • Data and Texts
  • 3. Curation • Wikipedia • Curator: a content specialist responsible for an institution's collections and, together with a publications specialist, their associated collections catalogs. • Digital Curation: the curation, preservation, maintenance, collection and archiving of digital assets • Sheer curation: an approach to digital curation where curation activities are quietly integrated into the normal work flow of those creating and managing data and other digital assets. • DCC: Digital curation is maintaining and adding value to a trusted body of digital information for current and future use.
  • 4. Integrated Science? • Mostly educational: easy-to-swallow science • Some strange things • One nice essay • Lots of environmental science
  • 5.
  • 6. University of Integrated Science, California • Degree Programs: • Vertical reality • Tachyon Holistic Wellness • Tantra (including Sexual Alchemy for Singles 101) • Vegan and Live Food Nutrition Masters Program • …and that’s it!
  • 7. Edward O Wilson (1998) • “Science: organized systematic enterprise that gathers knowledge about the world and condenses the knowledge into testable laws and principles. Defining traits are • 1st, confirmation of discoveries & support of hypotheses through repetition by independent investigators, preferably with different tests & analyses; • 2nd, mensuration, the quantitative description of the phenomena on universally accepted scales; • 3rd, economy, by which the largest amount of information is abstracted into a simple and precise form, which can be unpacked to re-create detail; • 4th, heuristics, the opening of avenues to new discovery and interpretation. • And 5th, and finally, is consilience, the interlocking of causal explanations across disciplines.” • Consilience: “the concurrence of multiple inductions drawn from different data sets” • Wilson, E. O. (1998, 27 March). Integrated Science and the Coming Century of the Environment. Science, 279, 2048-2049.
  • 8. Wilson concluding • “Arguably the foremost of global problems grounded in the idiosyncrasies of human nature is overpopulation and the destruction of the environment. The crisis is not long-term but here and now; it is upon us. Like it or not, we are entering the century of the environment, when science and polities will give the highest priority to settling humanity down before we wreck the planet.”
  • 9. NCAR: January 2009 • The Integrated Science Program will promote scientific frontiers that are dependent on an integrated approach, across NCAR laboratories and across disciplines. ISP will focus on thematic areas where the mission and expertise at NCAR, and in the university atmospheric and related sciences community, can be advanced by contributions from the social and environmental sciences beyond those that typically occur within single programs or departments. These areas include, but are not limited to, Earth system-society interactions, building societal resilience to weather and climate hazards, hydrologic sciences, and biogeochemistry.
  • 10. Fisheries & Oceans Canada • Integrated Science Data Management (ISDM) Providing Access to Ocean Data • “ISDM's mandate is to manage and archive ocean data collected by DFO, or acquired through national and international programmes conducted in ocean areas adjacent to Canada, and to disseminate data, data products, and services to the marine community in accordance with the policies of the Department.”
  • 11. Integrated Science • We need a definition that works better; something like: “The application of multiple scientific disciplines to one or more core scientific challenges” • Examples of integrated sciences? • Archaeology • Environmental sciences
  • 12. Integrated Science implications • Scientists will be using unfamiliar data, therefore • Data curators and managers must make their data available for unfamiliar users! • And now for something unfamiliar?
  • 13. Poetry & Philosophy of D H Rumsfeld Hart Seely, April 2, 2003, SLATE http://www.slate.com/id/2081042/
  • 14. A Confession ‘Once in a while, I'm standing here, doing something. And I think, "What in the world am I doing here?" It's a big surprise.’ —May 16, 2001, interview with the New York Times
  • 15. Clarity ‘I think what you'll find, I think what you'll find is, Whatever it is we do substantively, There will be near-perfect clarity As to what it is. ‘And it will be known, And it will be known to the Congress, And it will be known to you, Probably before we decide it, But it will be known.’ —Feb. 28, 2003, Department of Defense briefing
  • 16. The Unknown ‘As we know, There are known knowns. There are things we know we know. We also know There are known unknowns. That is to say We know there are some things We do not know. But there are also unknown unknowns, The ones we don't know We don't know.’ —Feb. 12, 2002, Department of Defense news briefing
  • 17. The 4th Rumsfeld? • 3 epistemological classes (???) • Known knowns • Known unknowns • Unknown unknowns • 4th class? • Unknown knowns? • Critical issue for integrated sciences
  • 18. Some OAIS Concepts? • Knowledge Base: allows a consumer to understand something • Designated Community: the set of consumers for whom the archive curates something • Representation Information: helps you interpret a data object yielding an information object • The amount and nature of RepInfo required is dependent on the Knowledge Base of the Designated Community • If you curate for project colleagues in the short term, little if any RepInfo required • If you curate for those unfamiliar with the data, more RepInfo is needed • (All broadly interpreted!) • CCSDS (2002). Reference Model for an Open Archival Information System (OAIS). Retrieved from http://public.ccsds.org/publications/archive/650x0b1.pdf.
  • 19. Time • KB is f1(DC, t) • DC is f2(t) • RepInfo needed is f3(f1(DC, t), f2(t)) • (but none of these concepts can be precisely defined!) • If DC is small and t is short (months to a year or so), then both may be ignored, and RepInfo be assumed part of the KB • If DC is extensive (e.g. cross-discipline) and t is long (5 years to 25 plus), then RepInfo must be articulated • If t is very long, most bets are off (post-hoc reconstruction likely to be needed)
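
The three cases on the Time slide can be read as a rough decision rule. The sketch below is only an illustration of that reading: the function name, the enum, and both numeric thresholds are assumptions (the slide itself stresses that none of these concepts can be precisely defined), and anything falling between the stated cases is conservatively treated as needing explicit RepInfo.

```python
from enum import Enum


class RepInfoNeed(Enum):
    IMPLICIT = "RepInfo can be assumed part of the Knowledge Base"
    EXPLICIT = "RepInfo must be articulated"
    RECONSTRUCT = "post-hoc reconstruction likely to be needed"


def repinfo_need(cross_discipline: bool, years: float,
                 short_years: float = 1.0,
                 very_long_years: float = 50.0) -> RepInfoNeed:
    """Rough reading of the three cases on the slide.

    short_years and very_long_years are illustrative assumptions,
    not values given in the talk.
    """
    if years >= very_long_years:
        # "If t is very long, most bets are off"
        return RepInfoNeed.RECONSTRUCT
    if not cross_discipline and years <= short_years:
        # Small DC, short t: RepInfo is effectively part of the KB
        return RepInfoNeed.IMPLICIT
    # Extensive DC and/or longer t; in-between cases are an assumption
    return RepInfoNeed.EXPLICIT
```

For example, `repinfo_need(cross_discipline=True, years=10)` falls in the "must be articulated" case.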
  • 20. What might RepInfo include • Structure information: file format definitions, etc • Semantic information: data dictionaries, code books etc • Robust methods (working code?) • Not to mention many kinds of metadata, provenance, documentation of hidden assumptions, etc • Cross-domain schemas one approach to articulating RepInfo? • (Never perfect, of course)
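
One way to picture the categories listed above is as a simple record kept alongside a dataset. This is a minimal sketch, not an OAIS-defined structure: the class name, field names, and the `missing` helper are all hypothetical, chosen only to mirror the bullets on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepInfo:
    """Hypothetical container mirroring the slide's RepInfo categories."""
    structure: List[str] = field(default_factory=list)  # file format definitions
    semantics: List[str] = field(default_factory=list)  # data dictionaries, code books
    methods: List[str] = field(default_factory=list)    # robust methods, working code
    other: List[str] = field(default_factory=list)      # metadata, provenance, hidden assumptions

    def missing(self) -> List[str]:
        """Names of the categories for which nothing has been recorded."""
        return [name for name, items in vars(self).items() if not items]


info = RepInfo(structure=["CSV column layout description"])
gaps = info.missing()
```

Here `gaps` lists the categories a curator has not yet documented, which is one crude way of making "know what you've got" checkable.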
  • 21. What about Rumsfeld 4? • Biggest concern with unfamiliar user is clashing concepts, eg different baselines, units, geographies, granularity • Especially where terms are ambiguous or differently interpreted • The KBs of two DCs conflict, potentially silently • Happens all the time, of course • The unspoken: tacit knowledge, unknown knowns!
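
The silent-conflict problem above can be illustrated with values that carry their unit and baseline explicitly, so that a mismatch fails loudly instead of producing a plausible-looking number. This is a sketch under assumed names (`Measurement`, `combine_mean`) and made-up example values; it is not drawn from any particular data standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Measurement:
    value: float
    unit: str      # e.g. "degC"
    baseline: str  # e.g. "1961-1990 mean"


def combine_mean(measurements: List[Measurement]) -> Measurement:
    """Average measurements, refusing to mix units or baselines."""
    first = measurements[0]
    for m in measurements[1:]:
        if (m.unit, m.baseline) != (first.unit, first.baseline):
            raise ValueError(
                f"cannot combine {m.unit}/{m.baseline} "
                f"with {first.unit}/{first.baseline}"
            )
    mean = sum(m.value for m in measurements) / len(measurements)
    return Measurement(mean, first.unit, first.baseline)
```

Making the tacit part of the value explicit is the design choice here: two communities that assume different baselines get an error at the point of integration rather than a quietly wrong result.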
  • 22. Timing • Curation starts before creation • Before project proposal! • Data acquisition should not happen at the end • Continuous acquisition much better? • Enforcement… or credit for data?
  • 23. Other curation issues of concern • Sustainability (work on your survival) • Succession (what happens to your data if you don’t) • Data audit (know what you’ve got) • Data risk assessment (assess your chances of loss) • Repository external audit??? • Provenance & computational lineage • Archiving database changes • Community proxy roles: help your communities develop data standards & data practices • DCC has tools & support for some of these…
  • 24. … and what is the role of RDF?
  • 25. RDF • Anchors data to (well?) defined ontology or schema • Reduces 4th Rumsfeld risk? • Allows processing by increasing class of tools • More suited to comparatively isolated “facts” or claims than substantial data arrays?
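
To make the "anchoring" point concrete, RDF statements are subject-predicate-object triples whose terms are URIs from a shared vocabulary. The sketch below models triples as plain Python tuples rather than using an RDF library. The `http://example.org/` names (dataset, unit vocabulary) are hypothetical; the Dublin Core terms namespace is real.

```python
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace
EX = "http://example.org/"             # hypothetical namespace for illustration

# Each statement is (subject, predicate, object); the dataset and
# its unit are made-up examples.
triples = [
    (EX + "dataset/42", DCTERMS + "title", "Sea surface temperature"),
    (EX + "dataset/42", EX + "vocab/unit", EX + "unit/degC"),
]


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


unit = objects(triples, EX + "dataset/42", EX + "vocab/unit")
```

Because the unit is a URI rather than a free-text label, an unfamiliar user can look up what it means, which is the sense in which RDF might reduce the "unknown knowns" risk. As the slide suggests, this fits individual facts better than large data arrays.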
  • 26. … and Research Outputs? • Need more semantically aware texts to support cross-community understanding • Coded up (cf microformats, RDFa) • People • Citations & references • Science features (eg chemicals, reactions) • Graphs, spectra, tables linking to • Supplementary data • PDF is pretty bad at this
  • 27. Thanks… and now for the experts!