Metadata for Managing Scientific Research Data
NISO/DCMI Webinar: August 22, 2012

Jane Greenberg, Professor and Director of
the SILS Metadata Research Center
janeg@email.unc.edu
Overview
▪   Why should we care?
▪   What is data?
▪   What is metadata's role w.r.t. data?
▪   Selected metadata standards
▪   Challenges, opportunities, and jumping in
▪   Concluding comments
▪   Q&A
Why should we care?
BIG stuff
▪ Digital data deluge (Hey & Trefethen, 2003)
▪ Big data (New York Times)
                                                2008
▪ The fourth paradigm (Jim Gray, 2007)

Just as important
▪ The long tail (Heidorn, 2008)
▪ CODATA/Data-at-Risk Task Group
▪ Scholarly communications, data citation

      Technological affordances for improving and
      advancing science
Cultural shift toward data sharing
▪ National and international policies
  – US NSF and NIH [1, 2]
  – OECD (Organisation for Economic Co-operation and
    Development) [3]
  – INSPIRE (Infrastructure for Spatial Information in the European
    Community), EU Commission [4]
  – UK Medical Research Council [5]

             Dryad "enables scientists to validate
             published findings, explore new analysis
             methodologies, repurpose data for research
             questions unanticipated by the original
             authors, and perform synthetic studies."
             (http://datadryad.org/)
Data
▪ No single agreed-upon definition
▪ One person's data is another person's
  information
▪ Data often implies the "raw" stuff lacking
  context
   – Scholarly context, written assessment
▪ "Essence of science" (Greenberg, et al., 2009)
▪ What is science?
   – The Archaeology Data Service (ADS)
     archaeologydataservice.ac.uk
Data

I know it when I see it

By example: Traditional observations, numbers, and measures stored in
spreadsheets and databases, fossils, phylogenetic trees, and herbarium
samples (White, 2008)

Other disciplines
▪ Bioinformatics: Gene expressions, DNA transcription to RNA translation
▪ Geology, agriculture, surveillance, and historical manuscript research:
  Hyperspectral remote sensing

The Dryad Repository (email w/R. Scherle, July 2012)

quantity   type
3162       Plain Text
476        Microsoft Excel
308        Adobe Portable Document Format
302        Comma-separated values
252        Nexus
153        Microsoft Excel OpenXML
108        Microsoft Word
80         Zip file
62         JPEG image
45         Microsoft Word OpenXML
40         Extensible Markup Language
35         Hypertext Markup Language
21         Rich Text Format
16         FASTA sequence file
15         Tag Image File Format
14         Postscript Files
2          Video Quicktime
2          Mathematica Notebook
1          Microsoft Powerpoint
Metadata defined
… data about data
… information about data

▪ "Metadata or 'data about data' describes the
content, quality, condition, and other
characteristics of data." (FGDC Metadata WG,
1998)

▪ Structured information about an object (data)
that facilitates functions associated with the
object. (Greenberg, 2002, 2003, 2009)
Typical functions
▪ Discover
▪ Manage
▪ Control rights
▪ Identify versions
▪ Certify authenticity
▪ Indicate status
▪ Mark content structure
▪ Situate geospatially
▪ Describe processes
It gets messy really quickly
Metadata for Scientific Research Data


▪ Descriptive
  – General to granular
▪ Value (addressing a topic, "aboutness")
  – Topical (ontologies, subject heading lists/thesauri,
    taxonomies)
▪ Named entities
  – Name authority files (people, organizations,
    geographical jurisdictions, structures, and events)
▪ Geo-spatial (coordinates)
▪ Temporal data (ISO 8601/W3CDTF, or …)
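
As a small illustration of the temporal bullet, the sketch below formats and parses dates in the ISO 8601 profile that W3CDTF describes, using only the Python standard library. The helper names and the observation time are invented for this example and come from no particular standard or tool.

```python
from datetime import date, datetime, timezone


def w3cdtf_date(d: date) -> str:
    """Return a complete date in the W3CDTF form YYYY-MM-DD."""
    return d.isoformat()


def w3cdtf_timestamp(dt: datetime) -> str:
    """Return a UTC timestamp in the W3CDTF form YYYY-MM-DDThh:mm:ssZ."""
    if dt.tzinfo is None:
        # A naive datetime is ambiguous, so refuse to guess its zone.
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# The webinar date from the title slide.
webinar_day = w3cdtf_date(date(2012, 8, 22))

# Hypothetical observation time, invented for illustration.
observed_at = w3cdtf_timestamp(datetime(2012, 8, 22, 13, 0, tzinfo=timezone.utc))

# Parsing goes the other way; this string has no zone designator,
# so the result is a naive datetime.
modified = datetime.fromisoformat("2001-03-01T00:00:00")

print(webinar_day, observed_at, modified.year)
```

Storing dates in one fixed, sortable form like this is what makes temporal metadata comparable across repositories; which profile to adopt is still a local decision.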
Given the messiness…

"I cannot tell you exactly what metadata
standards, vocabularies, etc. to use…"
Examining metadata schemes
Objectives and principles   Domains        Architectural layout
• Objectives                • Discipline   • Structural design
• Principles                • Genre        • Extent
                            • Format       • Granularity

Metadata Objectives and principles, Domain, and
Architectural Layout (MODAL) framework

(Greenberg, 2005; Willis, et al., JASIST 2012)
Simple schemes [6]

Scheme                  Objectives and principles    Domains                 Architectural layout
(in general)            • Interoperability           • Multi-disciplinary    • Primarily flat
                        • Easy to generate, lower    • Any genre or format   • Minimal with means to extend
                          barrier to produce                                 • General (not granular)
Dublin Core Metadata
Element Set (DCMES)
ver. 1.1
US MARC bibliographic   • Need training                                      • Primarily flat
format                                                                       • Extensible
DataCite                                                                     • Primarily flat
Dublin Core Application Profile: Dryad [7]
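
To make "primarily flat" concrete, here is a minimal sketch that builds a simple Dublin Core record as XML with Python's standard library. It is not a reproduction of the Dryad record or of Dryad's application profile. The title and DOI are taken from footnote [7]; the wrapper element name `metadata` and the function name are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
DCMES_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build a flat record: one dc:* child per value, no nesting."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DCMES_ELEMENTS:
            raise ValueError(f"{name!r} is not a DCMES 1.1 element")
        for value in values:
            # Every element is optional and repeatable in simple Dublin Core.
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


record = build_dc_record({
    "title": ["Data from: Divergence time estimation using fossils as "
              "terminal taxa and the origins of Lissamphibia"],
    "identifier": ["doi:10.5061/dryad.8120"],
})
print(ET.tostring(record, encoding="unicode"))
```

Every statement sits at the same level under the record, which is why such schemes are easy to generate and to exchange, and also why they cannot express much detail.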





DataCite example, ver. 2.2 [8]: National Institute for
Environmental Studies and Center for Climate System
Research Japan

US MARC bibliographic format: World Ocean
Circulation Experiment global data (Moss Landing Marine
Labs and the Monterey Bay Aquarium Research Institute
Library) [9]
Simple/moderate schemes

Scheme                         Objectives and principles     Domains                  Architectural layout
(in general)                   • Interoperability balanced   • Greater domain focus   • Primarily flat
                                 w/specific needs            • Genre diversity        • Extensibility via connecting
                               • Generation requires           within a domain          to other schemes
                                 more expertise                                       • Slightly more granular
Darwin Core
Access to Biological                                                                  • Not as flat
Collections Data (ABCD)
Ecological Metadata Language
DCMI Terms                                                                            • Graph approach
Wieczorek, et al. (2012). Darwin Core: An Evolving Community-Developed
Biodiversity Data Standard. PLoS One 7(1): e29715.
doi:10.1371/journal.pone.0029715.
Access to Biological Collections Data (ABCD) (A minimum record)

<?xml version='1.0' encoding='UTF-8'?>
<DataSets xmlns='http://www.tdwg.org/schemas/abcd/2.06'>
  <DataSet>
    <TechnicalContacts>
      <TechnicalContact>
        <Name>Gerd Müller</Name> <Email>gerd@dfb.de</Email>
      </TechnicalContact>
    </TechnicalContacts>
    <ContentContacts>
      <ContentContact>
        <Name>A Another</Name> <Email>a.another@fake.org</Email>
      </ContentContact>
    </ContentContacts>
    <Metadata>
      <Description>
        <Representation language='en'>
          <Title>PonTaurus collection</Title>
        </Representation>
      </Description>
      <RevisionData>
        <DateModified>2001-03-01T00:00:00</DateModified>
      </RevisionData>
    </Metadata>
    <Units>
      <Unit>
        <SourceInstitutionID>BGBM</SourceInstitutionID>
        <SourceID>PonTaurus</SourceID> <UnitID>1136</UnitID>
      </Unit>
    </Units>
  </DataSet>
</DataSets>
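
The nesting in the record above is what "not as flat" means in practice: a consumer has to walk a namespaced hierarchy to reach the values. The sketch below reads a trimmed copy of that record with Python's standard library. The XML declaration and contact blocks are left out for brevity, and the function name and the shape of the returned dictionary are choices made for this example only.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map for path lookups; the URI is the one in the record.
ABCD_NS = {"abcd": "http://www.tdwg.org/schemas/abcd/2.06"}

RECORD = """<DataSets xmlns='http://www.tdwg.org/schemas/abcd/2.06'>
  <DataSet>
    <Metadata>
      <Description>
        <Representation language='en'>
          <Title>PonTaurus collection</Title>
        </Representation>
      </Description>
      <RevisionData>
        <DateModified>2001-03-01T00:00:00</DateModified>
      </RevisionData>
    </Metadata>
    <Units>
      <Unit>
        <SourceInstitutionID>BGBM</SourceInstitutionID>
        <SourceID>PonTaurus</SourceID>
        <UnitID>1136</UnitID>
      </Unit>
    </Units>
  </DataSet>
</DataSets>"""


def summarize_abcd(xml_text):
    """Return one summary dict per DataSet element in the document."""
    root = ET.fromstring(xml_text)
    summaries = []
    for dataset in root.findall("abcd:DataSet", ABCD_NS):
        units = [
            (
                unit.findtext("abcd:SourceInstitutionID", namespaces=ABCD_NS),
                unit.findtext("abcd:SourceID", namespaces=ABCD_NS),
                unit.findtext("abcd:UnitID", namespaces=ABCD_NS),
            )
            for unit in dataset.findall("abcd:Units/abcd:Unit", ABCD_NS)
        ]
        summaries.append({
            "title": dataset.findtext(
                "abcd:Metadata/abcd:Description/abcd:Representation/abcd:Title",
                namespaces=ABCD_NS,
            ),
            "modified": dataset.findtext(
                "abcd:Metadata/abcd:RevisionData/abcd:DateModified",
                namespaces=ABCD_NS,
            ),
            "units": units,
        })
    return summaries


print(summarize_abcd(RECORD))
```

The triple of institution, source, and unit identifiers is what pins down a single specimen record here, so the sketch keeps the three values together.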
Properties in the /terms/ namespace

abstract                educationLevel        modified
accessRights            extent                provenance
accrualMethod           format                publisher
accrualPeriodicity      hasFormat             references
accrualPolicy           hasPart               relation
alternative             hasVersion            replaces
audience                identifier            requires
available               instructionalMethod   rights
bibliographicCitation   isFormatOf            rightsHolder
conformsTo              isPartOf              source
contributor             isReferencedBy        spatial
coverage                isReplacedBy          subject
created                 isRequiredBy          tableOfContents
creator                 issued                temporal
date                    isVersionOf           title
dateAccepted            language              type
dateCopyrighted         license               valid
dateSubmitted           mediator
description             medium
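
The "graph approach" noted for DCMI Terms can be pictured as subject, property, value statements rather than a fixed record. The sketch below models a few such statements as plain Python tuples, without an RDF library. The dataset identifier and title come from footnote [7]; the article URI is a hypothetical placeholder, and the helper function is an assumption made for this example.

```python
# Namespace that the property names listed above belong to.
DCTERMS = "http://purl.org/dc/terms/"

DATASET = "doi:10.5061/dryad.8120"  # identifier from footnote [7]
ARTICLE = "urn:example:article"     # hypothetical placeholder URI

# Each statement is a (subject, property, value) triple.
triples = [
    (DATASET, DCTERMS + "title",
     "Data from: Divergence time estimation using fossils as terminal "
     "taxa and the origins of Lissamphibia"),
    (DATASET, DCTERMS + "isReferencedBy", ARTICLE),
    (ARTICLE, DCTERMS + "references", DATASET),
]


def values_of(statements, subject, term):
    """Return every value stated for one subject and one DCMI term."""
    return [
        value
        for s, prop, value in statements
        if s == subject and prop == DCTERMS + term
    ]


print(values_of(triples, DATASET, "isReferencedBy"))
print(values_of(triples, ARTICLE, "references"))
```

Because each statement stands alone, statements about the dataset and about the article can be merged into one list and followed in either direction, something a flat record does not offer.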
Complex schemes

Scheme         Objectives and principles       Domains              Architectural layout
(in general)   • Interoperability level        • Genre focus        • Hierarchical
               • Generation requires greater   • Format variation   • Extensive
                 expertise                                          • Granular
FGDC
DDI

Content Standard for Digital Geospatial Metadata (CSDGM)/FGDC
1. Identification Information (M)
2. Data Quality Information
3. Spatial Data Organization Information
4. Spatial Reference Information
5. Entity and Attribute Information
6. Distribution Information
7. Metadata Reference Information (M)

Data Documentation Initiative (DDI)
1. Concept
2. Collecting
3. Processing  Archiving
4. Distribution  Archiving
5. Discovery
6. Analysis
7. Repurposing
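
A complex scheme usually comes with rules that can be checked mechanically. As a minimal sketch, the code below encodes the seven CSDGM sections from the slide and reports which mandatory ones a record lacks, reading the slide's (M) marker as "mandatory". The function name and the idea of describing a record by its set of section numbers are assumptions made for this example, not part of the standard.

```python
# Section number -> (section name, mandatory?), following the slide's (M) markers.
CSDGM_SECTIONS = {
    1: ("Identification Information", True),
    2: ("Data Quality Information", False),
    3: ("Spatial Data Organization Information", False),
    4: ("Spatial Reference Information", False),
    5: ("Entity and Attribute Information", False),
    6: ("Distribution Information", False),
    7: ("Metadata Reference Information", True),
}


def missing_mandatory(present_sections):
    """Return names of mandatory sections absent from the given numbers."""
    return [
        name
        for number, (name, mandatory) in sorted(CSDGM_SECTIONS.items())
        if mandatory and number not in present_sections
    ]


# Hypothetical record that has sections 1, 2, and 6 filled in.
print(missing_mandatory({1, 2, 6}))
```

A real validator would also check the elements inside each section; this sketch stops at the top level.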
Summary for descriptive schemes
▪ Simple: Interoperable, easy to generate/low barrier,
  generally multidisciplinary, genre/format agnostic,
  primarily flat, general (not granular), 15-25 properties

▪ Simple/moderate: Interoperability balanced
  w/specific needs, generation requires more expertise,
  greater domain focus, extensible via connecting to
  other schemes, more granular, more properties

▪ Complex: Interoperability level, generation requires
  expertise, genre focus/format variation, hierarchical,
  granular, and extensive (100+ properties)
Challenges and opportunities

Challenge: Workflow. When to generate the metadata?
Opportunities: Educate scientists early (Qin, 2009). Integrate into social
setting w/Center for Embedded Networked Sensing (CENS) (Borgman, Mayernik,
etc., 2009-current; Mayernik's dissertation, 2011).

Challenge: Methods for generating metadata (labor intensive)
Opportunities: Use automatic techniques as much as possible, leverage human
expertise (Dryad, DataONE Excel project).

Challenge: Too many standards. Which one do I use?
Opportunities: Don't panic, join communities, look for examples. (If you
can't find them?)

Challenge: Do I need to implement my metadata as linked data?
Opportunities: No. Explore and develop a best practice. Pursue a 2-pronged
approach (Greenberg, et al., 2009).
Jumping in…
1. DCMI/NISO Seminars !!
2. DCMI Science and Metadata Community
  (http://wiki.dublincore.org/index.php/DCMI_Science_And_Metadata)

3. Digital Curation Centre (DCC)
  (http://www.dcc.ac.uk/)

4. The Research Data Management
   Training, or MANTRA project
  (http://datalib.edina.ac.uk/mantra/)

5. DataONE workshops and tutorials
  (www.dataone.org/)
Concluding comments
▪ Standards are guidelines; no police
  – Aim for reasonable quality

▪ KISS: Keep it simple, stupid
  – What’s vital; what will aid reuse?
▪ Help to move the practice forward
  – Share what you learn

▪ Nothing new/it's all new
  – Data documentation since ancient times
  – SILOS; let's break them down (Willis, et al., 2012)
  – Greater connectivity than ever
  – Cross-disciplinary approaches for problem solving
Footnotes
[1] NSF Data Sharing Policy: http://www.nsf.gov/bfa/dias/policy/dmp.jsp.
[2] NIH Data Sharing Policy: http://grants.nih.gov/grants/policy/data_sharing/.
[3] Organisation for Economic Co-operation and Development (OECD), Data and
Metadata Reporting and Presentation Handbook: http://www.oecd.org/std/37671574.pdf.
[4] INSPIRE (Infrastructure for Spatial Information in the European Community):
http://inspire.ec.europa.eu/index.cfm/pageid/48. The directive was released 15 May 2007
and will be implemented in various stages, with full implementation required by 2019. It
aims to create a European Union (EU) spatial data infrastructure.
[5] UK Medical Research Council:
http://www.mrc.ac.uk/Ourresearch/Ethicsresearchguidance/datasharing/index.html.
[6] The DCMI Glossary (scroll down for "schema" entry):
http://dublincore.org/documents/usageguide/glossary.shtml#schema.
[7] Dublin Core Example: Data from: Divergence time estimation using fossils as terminal taxa
and the origins of Lissamphibia (Dryad repository):
http://datadryad.org/resource/doi:10.5061/dryad.8120?show=full.
[8] National Institute for Environmental Studies and Center for Climate System Research
Japan, animation data (DataCite):
http://schema.datacite.org/meta/kernel-2.2/example/datacite-metadata-sample-v2.2.xml.
[9] US MARC bibliographic format: World Ocean Circulation Experiment global data (Moss
Landing Marine Labs and the Monterey Bay Aquarium Research Institute Library):
http://mlml.kohalibrary.com/cgi-bin/koha/opac-detail.pl?biblionumber=9282.

Application orientated numerical on hev.ppt
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 

NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

  • 1. Metadata for Managing Scientific Research Data NISO/DCMI Webinar: August 22, 2012 Jane Greenberg, Professor and Director of the SILS Metadata Research Center janeg@email.unc.edu
  • 2. Overview ▪ Why should we care? ▪ What is data? ▪ What is metadata's role w.r.t. data? ▪ Selected metadata standards ▪ Challenges, opportunities, and jumping in ▪ Concluding comments ▪ Q&A
  • 3. Why should we care? BIG stuff ▪ Digital data deluge (Hey & Trefethen, 2003) ▪ Big data (New York Times) 2008 ▪ The fourth paradigm (Jim Gray, 2007) Just as important ▪ The long tail (Heidorn, 2008) ▪ CODATA/Data-at-Risk Task Group ▪ Scholarly communications, data citation Technological affordances for improving and advancing science
  • 4. Cultural shift toward data sharing ▪ National and international policies – US NSF and NIH [1, 2] – OECD (Organisation for Economic Co-operation and Development) [3] – INSPIRE (Infrastructure for Spatial Information in the European Community), EU Commission [4] – UK Medical Research Council [5] Dryad "enables scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies." (http://datadryad.org/)
  • 5. Overview ▪ Why should we care? ▪ What is data? ▪ What is metadata's role w.r.t. data? ▪ Selected metadata standards ▪ Challenges, opportunities, and jumping in ▪ Concluding comments ▪ Q&A
  • 6. Data ▪ No single agreed-upon definition ▪ One person's data is another person's information ▪ Data often implies the "raw" stuff lacking context – Scholarly context, written assessment ▪ "Essence of science" (Greenberg, et al, 2009) ▪ What is science? – The Archaeology Data Service (ADS) archaeologydataservice.ac.uk
  • 7. Data
      I know it when I see it
      By example: Traditional observations, numbers, and measures stored in spreadsheets and databases, fossils, phylogenetic trees, and herbarium samples (White, 2008)
      Other disciplines
      ▪ Bioinformatics: Gene expressions, DNA transcription to RNA translation
      ▪ Geology, agriculture, surveillance, and historical manuscript research: Hyperspectral remote sensing
      The Dryad Repository (email w/R. Scherle, July 2012)
      Quantity  Type
      3162      Plain Text
      476       Microsoft Excel
      308       Adobe Portable Document Format
      302       Comma-separated values
      252       Nexus
      153       Microsoft Excel OpenXML
      108       Microsoft Word
      80        Zip file
      62        JPEG image
      45        Microsoft Word OpenXML
      40        Extensible Markup Language
      35        Hypertext Markup Language
      21        Rich Text Format
      16        FASTA sequence file
      15        Tag Image File Format
      14        Postscript Files
      2         Video Quicktime
      2         Mathematica Notebook
      1         Microsoft Powerpoint
  • 8. Overview ▪ Why should we care? ▪ What is data? ▪ What is metadata's role w.r.t. data? ▪ Selected metadata standards ▪ Challenges, opportunities, and jumping in ▪ Concluding comments ▪ Q&A
  • 9. Metadata defined … data about data … information about data ▪ "Metadata or 'data about data' describes the content, quality, condition, and other characteristics of data." (FGDC Metadata WG, 1998) ▪ Structured information about an object (data) that facilitates functions associated with the object. (Greenberg, 2002, 2003, 2009)
  • 10. Typical functions ▪ Discover ▪ Manage ▪ Control rights ▪ Identify versions ▪ Certify authenticity ▪ Indicate status ▪ Mark content structure ▪ Situate geospatially ▪ Describe processes
  • 11. Overview ▪ Why should we care? ▪ What is data? ▪ What is metadata's role w.r.t. data? ▪ Selected metadata standards ▪ Challenges, opportunities, and jumping in ▪ Concluding comments ▪ Q&A
  • 12. It gets messy really quickly
  • 13. Metadata for Scientific Research Data Descriptive – General to granular ▪ Value (addressing a topic, "aboutness") – Topical (ontologies, subject heading lists/thesauri, taxonomies) ▪ Named entities – Name authority files (people, organizations, geographical jurisdictions, structures, and events) ▪ Geo-spatial (coordinates) ▪ Temporal data (ISO 8601/W3CDTF, or …)
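The temporal bullet above names ISO 8601/W3CDTF as a date encoding. As a rough illustration, the Python sketch below checks whether a string follows the W3CDTF lexical pattern: a year, a year and month, a full date, or a date with a time and a required time zone designator. The function name `is_w3cdtf` is hypothetical, and the check is syntactic only; it does not verify that the day exists in the given month.

```python
import re

# W3CDTF is a profile of ISO 8601 with a small set of allowed granularities.
# When a time is present, a time zone designator (Z or +hh:mm / -hh:mm) is required.
W3CDTF = re.compile(
    r"^\d{4}"                                          # YYYY
    r"(-(0[1-9]|1[0-2])"                               # -MM
    r"(-(0[1-9]|[12]\d|3[01])"                         # -DD
    r"(T([01]\d|2[0-3]):[0-5]\d(:[0-5]\d(\.\d+)?)?"    # Thh:mm[:ss[.s]]
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d))?"               # time zone designator
    r")?)?$"
)


def is_w3cdtf(value: str) -> bool:
    """Return True if value matches the W3CDTF lexical pattern (syntax only)."""
    return W3CDTF.match(value) is not None


if __name__ == "__main__":
    for candidate in ["2012", "2012-08-22", "2012-08-22T13:00:00-04:00", "22/08/2012"]:
        print(candidate, is_w3cdtf(candidate))
```

A check like this is cheap to run at deposit time, before a date value is stored in a metadata record.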
  • 14. Given the messiness… "I cannot tell you exactly what metadata standards, vocabularies, etc. to use…"
  • 15. Examining metadata schemes
      Objectives and principles: • Objectives • Principles
      Domains: • Discipline • Genre • Format
      Architectural layout: • Structural design • Extent • Granularity
      Metadata Objectives and principles, Domain, and Architectural Layout (MODAL) framework (Greenberg, 2005; Willis, et al, JASIST 2012)
  • 16. Simple schemes [6]
      Objectives and principles: • Interoperability • Easy to generate, lower barrier to produce
      Domains: • Multi-disciplinary • Any genre or format
      Architectural layout: • Primarily flat • Minimal with means to extend • General (not granular)
      Examples:
      Dublin Core Metadata Element Set (DCMES) ver. 1.1
      US MARC bibliographic format • Need training • Primarily flat • Extensible
      DataCite • Primarily flat
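To make "primarily flat" concrete, here is a minimal Python sketch of a record that uses only elements from DCMES 1.1. The `record` wrapper element and the `build_dc_record` helper are assumptions made for illustration and are not part of the standard; the values come from this presentation's title slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # DCMES 1.1 element namespace
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build a flat record with one dc:* child per (element, value) pair.

    The <record> wrapper is a hypothetical container, not defined by DCMES.
    """
    record = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


fields = [
    ("title", "Metadata for Managing Scientific Research Data"),
    ("creator", "Greenberg, Jane"),
    ("date", "2012-08-22"),
]
record = build_dc_record(fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Every element sits at the same level and none is nested inside another, which is what keeps the barrier to producing such a record low.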
  • 17. Dublin Core Application Profile- Dryad [7] 
  • 18. DataCite example, ver.2.2 [8] National Institute for Environmental Studies and Center for Climate System Research Japan
  • 19. US MARC bibliographic format: World Ocean Circulation Experiment global data (Moss Landing Marine Labs and the Monterey Bay Aquarium Research Institute Library) [9]
  • 20. Simple/moderate schemes
      Objectives and principles: • Interoperability balanced w/specific needs • Generation requires more expertise
      Domains: • Greater domain focus • Genre diversity within a domain
      Architectural layout: • Primarily flat • Extensibility via connecting • Slightly more granular
      Examples:
      Darwin Core
      Access to Biological Collections Data (ABCD) • Not as flat
      Ecological Metadata Language
      DCMI Terms • Graph approach
  • 21. Wieczorek, et al. (2012). Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS One. 2012; 7(1): e29715. doi: 10.1371/journal.pone.0029715.
  • 22. Access to Biological Collections Data (ABCD) (A minimum record)
      <?xml version='1.0' encoding='UTF-8'?>
      <DataSets xmlns='http://www.tdwg.org/schemas/abcd/2.06'>
        <DataSet>
          <TechnicalContacts>
            <TechnicalContact>
              <Name>Gerd Müller</Name>
              <Email>gerd@dfb.de</Email>
            </TechnicalContact>
          </TechnicalContacts>
          <ContentContacts>
            <ContentContact>
              <Name>A Another</Name>
              <Email>a.another@fake.org</Email>
            </ContentContact>
          </ContentContacts>
          <Metadata>
            <Description>
              <Representation language='en'>
                <Title>PonTaurus collection</Title>
              </Representation>
            </Description>
            <RevisionData>
              <DateModified>2001-03-01T00:00:00</DateModified>
            </RevisionData>
          </Metadata>
          <Units>
            <Unit>
              <SourceInstitutionID>BGBM</SourceInstitutionID>
              <SourceID>PonTaurus</SourceID>
              <UnitID>1136</UnitID>
            </Unit>
          </Units>
        </DataSet>
      </DataSets>
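Unlike a flat Dublin Core record, the ABCD record above nests its values several levels deep. The Python sketch below shows one way to read such a record with the standard library; it uses an abridged copy of the record (contacts and the XML declaration omitted), and the `summarize` function and the shape of its return value are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the record's xmlns attribute.
NS = {"abcd": "http://www.tdwg.org/schemas/abcd/2.06"}

# Abridged copy of the minimum record (contacts omitted).
RECORD = """\
<DataSets xmlns='http://www.tdwg.org/schemas/abcd/2.06'>
  <DataSet>
    <Metadata>
      <Description>
        <Representation language='en'>
          <Title>PonTaurus collection</Title>
        </Representation>
      </Description>
      <RevisionData>
        <DateModified>2001-03-01T00:00:00</DateModified>
      </RevisionData>
    </Metadata>
    <Units>
      <Unit>
        <SourceInstitutionID>BGBM</SourceInstitutionID>
        <SourceID>PonTaurus</SourceID>
        <UnitID>1136</UnitID>
      </Unit>
    </Units>
  </DataSet>
</DataSets>
"""


def summarize(xml_text):
    """Return one dict per DataSet with its title and unit identifier triples."""
    root = ET.fromstring(xml_text)
    summaries = []
    for dataset in root.findall("abcd:DataSet", NS):
        title = dataset.findtext(
            "abcd:Metadata/abcd:Description/abcd:Representation/abcd:Title",
            default="",
            namespaces=NS,
        )
        units = [
            (
                unit.findtext("abcd:SourceInstitutionID", default="", namespaces=NS),
                unit.findtext("abcd:SourceID", default="", namespaces=NS),
                unit.findtext("abcd:UnitID", default="", namespaces=NS),
            )
            for unit in dataset.findall("abcd:Units/abcd:Unit", NS)
        ]
        summaries.append({"title": title, "units": units})
    return summaries


print(summarize(RECORD))
```

The four-step path needed to reach the title is a small example of why hierarchical schemes ask more of the people and tools that generate them.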
  • 23. Properties in the /terms/ namespace: abstract, accessRights, accrualMethod, accrualPeriodicity, accrualPolicy, alternative, audience, available, bibliographicCitation, conformsTo, contributor, coverage, created, creator, date, dateAccepted, dateCopyrighted, dateSubmitted, description, educationLevel, extent, format, hasFormat, hasPart, hasVersion, identifier, instructionalMethod, isFormatOf, isPartOf, isReferencedBy, isReplacedBy, isRequiredBy, issued, isVersionOf, language, license, mediator, medium, modified, provenance, publisher, references, relation, replaces, requires, rights, rightsHolder, source, spatial, subject, tableOfContents, temporal, title, type, valid
  • 24. Complex schemes
      Objectives and principles: • Interoperability level • Generation requires greater expertise
      Domains: • Genre focus • Format variation
      Architectural layout: • Hierarchical • Extensive • Granular
      FGDC: Content Standard for Digital Geospatial Metadata (CSDGM)/FGDC
      1. Identification Information (M) 2. Data Quality Information 3. Spatial Data Organization Information 4. Spatial Reference Information 5. Entity and Attribute Information 6. Distribution Information 7. Metadata Reference Information (M)
      DDI: Data Documentation Initiative (DDI)
      1. Concept 2. Collecting 3. Processing / Archiving 4. Distribution / Archiving 5. Discovery 6. Analysis 7. Repurposing
  • 25. Summary for descriptive schemes ▪ Simple: Interoperable, easy to generate/low barrier, generally multidisciplinary, genre/format agnostic, primarily flat, general (not granular), 15-25 properties ▪ Simple/moderate: Interoperability balanced w/specific needs, generation requires more expertise, greater domain focus, extensible via connecting to other schemes, more granular, more properties ▪ Complex: Interoperable level, generation requires expertise, genre focus/format variation, hierarchical, granular, and extensive (100+ properties)
  • 26.
  • 27. Overview ▪ Why should we care? ▪ What is data? ▪ What is metadata's role w.r.t. data? ▪ Selected metadata standards ▪ Challenges, opportunities, and jumping in ▪ Concluding comments ▪ Q&A
  • 28. Challenges and opportunities
      ▪ Stop here
      Challenge: Workflow/When to generate the metadata?
      Opportunity: Educate scientists early (Qin, 2009). Integrate into social setting w/Center for Embedded Networked Sensing (CENS) (Borgman, Mayernik, etc., 2009-current; Mayernik's dissertation, 2011)
      Challenge: Methods for generating metadata (labor intensive)
      Opportunity: Use automatic techniques as much as possible, leverage human expertise (Dryad, DataONE Excel project)
      Challenge: Too many standards. Which one do I use?
      Opportunity: Don't panic, join communities, look for examples. (If you can't find them?)
      Challenge: Do I need to implement my metadata as linked data?
      Opportunity: No. Explore and develop a best practice. Pursue a 2-pronged approach (Greenberg, et al, 2009)
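One reading of "use automatic techniques as much as possible, leverage human expertise" is to let software fill in the fields it can derive from a data file and leave the descriptive fields for a person. The Python sketch below illustrates that split under stated assumptions: the `technical_metadata` function is hypothetical, the keys `format`, `extent`, and `modified` borrow DCMI Terms property names, and the `sha256` key is an extra invented for this example. It is not how Dryad or DataONE generate metadata.

```python
import hashlib
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def technical_metadata(path):
    """Derive machine-readable fields from a file.

    Descriptive fields are returned as None so a person can supply them.
    """
    p = Path(path)
    stat = p.stat()
    media_type, _ = mimetypes.guess_type(p.name)  # guess from the file extension
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        # Filled in automatically:
        "format": media_type or "application/octet-stream",
        "extent": f"{stat.st_size} bytes",
        "modified": modified.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),  # hypothetical extra field
        # Left for human expertise:
        "title": None,
        "creator": None,
        "description": None,
    }
```

Calling `technical_metadata("observations.csv")` on an existing file (the file name here is only an example) returns a dictionary whose technical fields are complete and whose descriptive fields still need an author's input.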
  • 29. Jumping in… 1. DCMI/NISO Seminars!! 2. DCMI Science and Metadata Community (http://wiki.dublincore.org/index.php/DCMI_Science_And_Metadata) 3. Digital Curation Centre (DCC) (http://www.dcc.ac.uk/) 4. The Research Data Management Training, or MANTRA, project (http://datalib.edina.ac.uk/mantra/) 5. DataONE workshops and tutorials (www.dataone.org/)
  • 30. Overview ▪ Why should we care? ▪ What is data? ▪ What is metadata's role w.r.t. data? ▪ Selected metadata standards ▪ Challenges, opportunities, and jumping in ▪ Concluding comments ▪ Q&A
  • 31. Concluding comments ▪ Standards are guidelines; no police – Aim for reasonable quality ▪ KISS: Keep it simple, stupid – What's vital; what will aid reuse? ▪ Help to move the practice forward – Share what you learn ▪ Nothing new/it's all new – Data documentation since ancient times – SILOS; let's break them down (Willis, et al, 2012) – Greater connectivity than ever – Cross-disciplinary approaches for problem solving
  • 32. Overview ▪ Why should we care? ▪ What is data? ▪ What is metadata's role w.r.t. data? ▪ Selected metadata standards ▪ Challenges, opportunities, and jumping in ▪ Concluding comments ▪ Q&A
  • 33. Footnotes
      [1] NSF Data Sharing Policy: http://www.nsf.gov/bfa/dias/policy/dmp.jsp.
      [2] NIH Data Sharing Policy: http://grants.nih.gov/grants/policy/data_sharing/.
      [3] Organisation for Economic Co-operation and Development, Data and Metadata Reporting and Presentation Handbook: http://www.oecd.org/std/37671574.pdf.
      [4] INSPIRE (Infrastructure for Spatial Information in the European Community): http://inspire.ec.europa.eu/index.cfm/pageid/48. The directive was released 15 May 2007 and will be implemented in various stages, with full implementation required by 2019; it aims to create a European Union (EU) spatial data infrastructure.
      [5] UK Medical Research Council: http://www.mrc.ac.uk/Ourresearch/Ethicsresearchguidance/datasharing/index.html.
      [6] The DCMI Glossary (scroll down for "schema" entry): http://dublincore.org/documents/usageguide/glossary.shtml#schema.
      [7] Dublin Core example: Data from: Divergence time estimation using fossils as terminal taxa and the origins of Lissamphibia (Dryad repository): http://datadryad.org/resource/doi:10.5061/dryad.8120?show=full.
      [8] National Institute for Environmental Studies and Center for Climate System Research Japan, animation data (DataCite): http://schema.datacite.org/meta/kernel-2.2/example/datacite-metadata-sample-v2.2.xml.
      [9] US MARC bibliographic format: World Ocean Circulation Experiment global data (Moss Landing Marine Labs and the Monterey Bay Aquarium Research Institute Library): http://mlml.kohalibrary.com/cgi-bin/koha/opac-detail.pl?biblionumber=9282.