Setting up a data repository, what does it entail?
Experience sharing with the Ethiopian Nutrition Information Platform for nutrition
Indira Yerramareddy and Nilam Prasai
International Food Policy Research Institute (IFPRI)
Webinar, April 8, 2019
Data Repository: The What
 IT infrastructure (cloud-based/online) set up to manage, share, access, maintain, and archive datasets
 An application database specialized in storing metadata of data files/datasets/databases
 Differs from a publication repository mainly in its ability
o to store metadata at different levels of a hierarchy
o to store and ingest data files in various formats for long-term preservation
Data Repository: The Why
 Easy information discovery
 Easy and efficient access
 More exposure and amplified impact
 Persistent access (through a persistent URL)
 Long-term storage and preservation
Additionally,
 Enables unprecedented use, analysis, and discovery through interoperability and interlinking with other repositories
 Enhances open science scholarship
Data Repository: The Software
 Data specific – Dataverse, Comprehensive Knowledge Archive Network (CKAN), HUBzero
 Expanded from text-based institutional repository – DSpace, Fedora, Hydra
 Open source – Dataverse, HUBzero, DSpace, Fedora, CKAN
 Proprietary/commercial – Figshare, Inter-university Consortium for Political and Social Research (ICPSR), Digital Commons
Data Repository Software: Limitations
Source: Amorim R.C., Castro J.A., da Silva J.R., Ribeiro C. (2015) A Comparative Study of Platforms for Research Data
Management: Interoperability, Metadata Capabilities and Integration Potential. https://doi.org/10.1007/978-3-319-16486-1_10
Data Repository: Hosting??
Data Repository: Hosting or Partnering
 Cost
o Free?
o Direct and ongoing cost of purchasing and keeping server or cloud space
o Ongoing cost of maintenance
 In-house technology infrastructure and expertise
 Number of datasets
 Size/volume of data
 Sustainability
Data Repository: Selecting partners
 Reputation
o Is the repository service provider reputable?
o Does the repository demonstrate acceptance and usage in its field of scholarship?
 Visibility
o How will others discover the datasets?
o Is the repository indexed by Google and other databases?
 Sustainability
o Does the repository have a long-term data management plan?
o Does the repository have contingency plans?
 Cost
o Free
o Direct cost for depositing datasets
o Indirect cost for maintaining and preserving datasets
 Persistent identifier
o Does the repository assign a persistent identifier (PID) to datasets/data files?
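A minimal sketch of why a PID gives persistent access: a DOI or Handle resolves through a public resolver, so the link keeps working even if the repository changes its own URLs. The function name `pid_to_url` and the `doi:` / `hdl:` prefix convention are assumptions for illustration; the DOI used is the one cited on the limitations slide.

```python
def pid_to_url(pid: str) -> str:
    """Turn a DOI or Handle string into a resolvable https URL."""
    pid = pid.strip()
    lowered = pid.lower()
    if lowered.startswith("doi:"):
        # DOIs resolve through the doi.org proxy
        return "https://doi.org/" + pid[4:]
    if lowered.startswith("hdl:"):
        # Handles resolve through the hdl.handle.net proxy
        return "https://hdl.handle.net/" + pid[4:]
    if lowered.startswith(("http://", "https://")):
        return pid  # already a URL, leave untouched
    raise ValueError(f"Unrecognized persistent identifier: {pid!r}")


print(pid_to_url("doi:10.1007/978-3-319-16486-1_10"))
```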
Data Repository: Selecting partners cont…
 Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
o Is the repository OAI-PMH compliant?
 Policies and licenses
o Does the repository support data use agreements and licensing that state explicitly what uses the data owners will allow?
o Does the repository allow open, closed, or restricted access to be set?
o Does the repository allow Creative Commons licensing?
 Scholarly impact
o Does the repository track data citation and impact?
o How does the repository track downloads and usage?
o Does the repository allow integration with third-party impact tracking software (Altmetrics, Plum, etc.)?
 Support services
o Is the repository's support service well established and quick?
Preferred:
 Recognized by FAIRsharing.org
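To make the OAI-PMH criterion concrete, here is a rough sketch of what a harvester does with a compliant repository: it sends a `ListRecords` request and reads record identifiers from the XML reply. The base URL, set name, and sample response are made up for illustration; only the `verb`, `metadataPrefix`, and `set` parameters and the OAI-PMH 2.0 namespace come from the protocol itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def record_identifiers(xml_text):
    """Extract record identifiers from a ListRecords response."""
    root = ET.fromstring(xml_text)
    path = ".//oai:record/oai:header/oai:identifier"
    return [el.text for el in root.findall(path, OAI_NS)]


# Hypothetical, heavily trimmed response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.org/oai", set_spec="nutrition"))
print(record_identifiers(SAMPLE))
```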
Data Curation: A Crucial Function for a Successful Repository
 What is data curation?
o active and ongoing management of data through its lifecycle of interest and usefulness
o processes and activities related to organizing, describing, cleaning, annotating, enhancing, and preserving data for public use
o centered on maintaining and managing the metadata rather than the data itself [documenting]
o directly linked with preservation and maintenance, interoperability, and reuse
 Begins with the research lifecycle
 Requires a change in culture/practice
Data Curation Mountain: Understanding the concept
Dataverse: The What
 Harvard’s open source research data repository software
 40 installations around the world
 Examples: Harvard Dataverse, Odum Dataverse, Abacus, INRA, Botswana Harvard Data
 CGIAR Centers with local installation: CIFOR, CIMMYT, ICRISAT
Dataverse: Pros and Cons
 Pros
o Open source software
o Evolving and addresses users’ concerns
o DOI minting; Handles
o OAI-PMH compliant
o API calls
 Cons
o Metadata uniformity across domains (e.g., social sciences, geology, computer sciences)
o Not effective in handling sensitive data
o Can’t delete datasets; can only deaccession them
o Custom version control
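As an illustration of the "API calls" point, the sketch below only builds request URLs for two read endpoints of the Dataverse API (dataset lookup by persistent ID, and search); it does not send anything. The server address and DOI are placeholders, and the helper names are invented here. Endpoint paths should be checked against the API guide for the installed Dataverse version.

```python
from urllib.parse import urlencode


def dataset_url(server, persistent_id):
    """URL for fetching one dataset's JSON record by its persistent ID."""
    query = urlencode({"persistentId": persistent_id})
    return f"{server}/api/datasets/:persistentId/?{query}"


def search_url(server, query, item_type="dataset"):
    """URL for a keyword search restricted to one item type."""
    return f"{server}/api/search?" + urlencode({"q": query, "type": item_type})


# Placeholder server and DOI, for illustration only.
SERVER = "https://demo.example.org"
print(dataset_url(SERVER, "doi:10.5072/FK2/EXAMPLE"))
print(search_url(SERVER, "nutrition Ethiopia"))
```

Restricted or unpublished datasets would additionally need an API token sent with the request; public metadata can usually be read without one.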
Harvard Dataverse
One installation of Dataverse software hosted and managed by Harvard University
[Diagram: Harvard Dataverse (3166 Dataverses) > IFPRI Dataverse > 10 collections (sub-Dataverses) > studies, each with metadata and data files]
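The nesting in the diagram (Dataverse, collections, studies, metadata plus data files) can be pictured as a small tree. The class and field names below are invented for illustration and are not Dataverse's internal model; the sample entries are placeholders, not real IFPRI records.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataFile:
    filename: str
    file_format: str


@dataclass
class Study:
    metadata: Dict[str, str]  # study-level description
    files: List[DataFile] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    studies: List[Study] = field(default_factory=list)
    subcollections: List["Collection"] = field(default_factory=list)

    def count_studies(self) -> int:
        """Count studies here and in every nested collection."""
        nested = sum(c.count_studies() for c in self.subcollections)
        return len(self.studies) + nested


# Placeholder content, for illustration only.
survey = Study(
    metadata={"title": "Example household survey"},
    files=[DataFile("household.csv", "csv"), DataFile("codebook.pdf", "pdf")],
)
sub = Collection(name="Example sub-Dataverse", studies=[survey])
top = Collection(name="Example institutional Dataverse", subcollections=[sub])
print(top.count_studies())
```

Metadata can be attached at each level of this tree, which is the "different levels of a hierarchy" capability that separates a data repository from a publication repository.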
Data citation and Metadata Example from IFPRI Dataverse
Data Sharing: IFPRI’s History
 CD-ROMs/IFPRI.org
o Began in 1999
o Shared through IFPRI.org as a zipped file
o Data files in proprietary formats
o No guarantee of long-term preservation
o No unique identifier for datasets/data files
o Cart-style checkout system imposed a barrier to access
o No licensing mechanism for datasets/data files
 Harvard Dataverse
o Started migrating datasets to Dataverse in 2008
o All datasets have been in Dataverse since 2013
Harvard Dataverse Benefits: IFPRI’s Perspectives
 Essentially free of cost, other than staff time for curation
 No stress of maintaining servers and backups
 Always evolving and open to customer feedback
 Quick customer support service
 More sustainable for IFPRI in a resource-constrained environment
 Complies with data sharing standards and the FAIR principles
 Easy access and data discovery
 Recognized by FAIRsharing.org, open data registry, and many journal publishers
 Indexed by Google Dataset Search
Harvard Dataverse Hiccups: IFPRI’s Perspectives
 Difficult to get IFPRI-specific feature requests accepted unless they serve the wider community
 Slow rendering because of growing data volume
 Not designed for handling sensitive data
 Batch modification is a challenge without in-house programming/API expertise
 Limited options for customizing the interface
 Free service: cannot demand that new features be delivered quickly
Data Services at IFPRI: What and How
 Data policy awareness
o IFPRI Data Policy
o CGIAR Open Access and Open Data Policy
o Donor Policies (USA, BMGF, EU)
 Data management plan
o Assist in developing data management plans
o Review data management plans
 Data publishing
o Review datasets (direct/indirect personally identifiable information)
o Create metadata records (using controlled vocabulary and data publishing standards)
o Create codebooks
o Develop guidelines for researchers (preparing data for publishing, data anonymization, etc.)
o Register datasets in donor-specific libraries (e.g., USAID Development Data Library)
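One small part of the dataset review step can be automated: flagging variable names that hint at direct identifiers so a curator looks at them first. The keyword list and function name below are illustrative assumptions, not IFPRI's actual procedure, and a name-based check cannot replace manual review of indirect identifiers.

```python
# Hypothetical hints; a real review list would be agreed with researchers.
DIRECT_IDENTIFIER_HINTS = (
    "name", "phone", "address", "email",
    "gps", "latitude", "longitude", "national_id",
)


def flag_columns(columns):
    """Return the column names that contain any identifier hint."""
    flagged = []
    for col in columns:
        lowered = col.lower()
        if any(hint in lowered for hint in DIRECT_IDENTIFIER_HINTS):
            flagged.append(col)
    return flagged


print(flag_columns(["hh_id", "respondent_name", "GPS_lat", "age", "village"]))
```

Substring matching is deliberately broad, so it will also flag harmless columns such as a file name field; false positives are cheaper here than a missed identifier.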
Questions and Comments?

URLs and Routing in the Odoo 17 Website App
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 

Setting up a data repository, what does it entail?

  • 1. Setting up a data repository, what does it entail? Experience sharing with the Ethiopian Nutrition Information Platform for nutrition. Indira Yerramareddy and Nilam Prasai, International Food Policy Research Institute (IFPRI). Webinar, April 8, 2019
  • 2. Data Repository: The What
      • IT infrastructure (cloud based/online) set up to manage, share, access, maintain, and archive datasets
      • An application database specialized in storing metadata of data files/datasets/databases
      • Differs from publication repository mainly in its ability
          o to store metadata at different level/hierarchy
          o to store and ingest data files in various formats for long-term preservation
  • 3. Data Repository: The Why
      • Easy information discovery
      • Easy and efficient access
      • More exposure and amplified impact
      • Persistent access (through persistent URL)
      • Long-term storage and preservation
    Additionally,
      • Enables unprecedented use, analysis, and discovery through interoperability and interlinking with other repositories
      • Enhances open science scholarship
  • 4. Data Repository: The Software
      • Data specific: Dataverse, Comprehensive Knowledge Archive Network (CKAN), HUBzero
      • Expanded from text-based institutional repository: DSpace, Fedora, Hydra
      • Open source: Dataverse, HUBzero, DSpace, Fedora, CKAN
      • Proprietary/commercial: Figshare, Inter-university Consortium for Political and Social Research (ICPSR), Digital Commons
  • 5. Data Repository Software: Limitations
    Source: Amorim R.C., Castro J.A., da Silva J.R., Ribeiro C. (2015) A Comparative Study of Platforms for Research Data Management: Interoperability, Metadata Capabilities and Integration Potential. https://doi.org/10.1007/978-3-319-16486-1_10
  • 7. Data Repository: Hosting or Partnering
      • Cost
          o Free ??
          o Direct and ongoing cost of purchasing and keeping server or cloud space
          o Ongoing cost for maintenance
      • In-house technology infrastructure and expertise
      • Number of datasets
      • Size/volume of data
      • Sustainability
  • 8. Data Repository: Selecting partners
      • Reputation
          o Is the repository service provider reputable?
          o Does the repository demonstrate acceptance and usage in its field of scholarship?
      • Visibility
          o How will others discover datasets?
          o Is the repository indexed by Google and other databases?
      • Sustainability
          o Does the repository have a long-term data management plan?
          o Does the repository have contingency plans?
      • Cost
          o Free
          o Direct cost for depositing datasets
          o Indirect cost for maintaining and preserving datasets
      • Persistent identifier
          o Does the repository assign a persistent identifier (PID) to datasets/datafiles?
  • 9. Data Repository: Selecting partners cont…
      • Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
          o Is the repository OAI-PMH compliant?
      • Policies and licenses
          o Does the repository allow data use agreements and licensing that state explicitly which uses the data owners will allow?
          o Does the repository allow access to be set to open, closed, or restricted?
          o Does the repository allow Creative Commons licensing?
      • Scholarly impact
          o Does the repository track data citation and impact?
          o How does the repository track downloads and usage?
          o Does the repository allow integration with third-party impact tracking software (Altmetrics, Plum, etc.)?
      • Support services
          o Is the repository's support service well established and quick?
    Preferred:
      • Recognized by FAIRsharing.org
  • 10. Data Curation: A Crucial Function for Successful Repository
      • What is data curation
          o active and ongoing management of data through lifecycle of interest and usefulness
          o processes and activities related to organizing, describing, cleaning, annotating, enhancing and preserving data for public use
          o centered around maintaining and managing the metadata rather than the data itself [documenting]
          o directly linked with preservation and maintenance, interoperability, and reuse
      • Begins with the research lifecycle
      • Requires change in culture/practice
  • 11. Data Curation Mountain: Understanding the concept
  • 12. Dataverse: The What
      • Harvard’s open source research data repository software
  • 13. Dataverse: The What
      • 40 installations around the world
      • Harvard Dataverse, Odum Dataverse, Abacus, INRA, Botswana Harvard Data
      • CGIAR Centers with local installation: CIFOR, CIMMYT, ICRISAT
  • 14. Dataverse: Pros and Cons
      • Pros
          o Open source software
          o Evolving and addresses users’ concerns
          o DOI minting; Handles
          o OAI-PMH compliant
          o Custom version control
          o API calls
      • Cons
          o Metadata uniformity across domains (e.g. social sciences, geology, computer sciences, etc.)
          o Not effective in handling sensitive data
          o Can’t delete datasets; can only deaccession them
  • 15. Harvard Dataverse One installation of Dataverse software hosted and managed by Harvard University Harvard Dataverse 3166 Dataverses IFPRI Dataverse 10 collections (sub-Dataverse) Study Metadata Data Files Study Metadata Data Files Study Metadata Data Files
  • 16.
  • 17. Data citation and Metadata Example from IFPRI Dataverse
  • 18. Data Sharing: IFPRI’s History
      • CD-ROMs/IFPRI.Org
          o Began in 1999
          o Shared through IFPRI.org as a zipped file
          o Data files in proprietary format
          o No guarantee for long term preservation
          o No unique identifier for dataset/datafile
          o Cart system check out imposed barrier to access
          o No licensing mechanism for dataset/datafiles
      • Harvard Dataverse
          o Started migrating datasets to Dataverse in 2008
          o All datasets are in Dataverse since 2013
  • 19. Harvard Dataverse Benefits: IFPRI’s Perspectives
      • No apparent cost other than staff time for curation
      • No stress of maintaining server and backups
      • Always evolving and open to customers’ feedback
      • Quick customer support service
      • More sustainable for IFPRI in a resource-constrained environment
      • Complies with the standards of data sharing and FAIR principles
      • Easy access and data discovery
      • Recognized by FAIRsharing.org, open data registry, and many journal publishers
      • Indexed by Google Dataset Search
  • 20. Harvard Dataverse Hiccups: IFPRI’s Perspectives
      • Difficult to fit in IFPRI-specific feature requests unless they serve the wider community
      • Slow rendering because of growing data volume
      • Not designed for handling sensitive data
      • Batch modification is a challenge without in-house programming/API expertise
      • Limited options for customizing the interface
      • Free service: users cannot insist that new features be implemented quickly
  • 21. Data Services at IFPRI: What and How
      • Data policy awareness
          o IFPRI Data Policy
          o CGIAR Open Access and Open Data Policy
          o Donor Policies (USA, BMGF, EU)
      • Data management plan
          o Assist in developing data management plan
          o Review data management plan
      • Data publishing
          o Review datasets (direct/indirect personally identifiable information)
          o Create metadata records (using controlled vocabulary and data publishing standards)
          o Create codebooks
          o Develop guidelines for researchers (preparing data for publishing, data anonymization, etc.)
          o Register datasets in donor-specific libraries (USAID Development Data Library)
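
The OAI-PMH compliance question on slide 9 can be checked in practice by sending the repository a request with one of the protocol's six verbs. The sketch below only builds such request URLs; it does not contact any server. The function name `oai_pmh_url` and the base URL `https://repo.example.org/oai` are hypothetical placeholders, not taken from the slides. The verb names and the `metadataPrefix` argument come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_pmh_url(base_url, verb="Identify", params=None):
    """Build an OAI-PMH request URL (hypothetical helper, builds a string only)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    query = {"verb": verb}
    query.update(params or {})
    return f"{base_url}?{urlencode(query)}"


# Placeholder endpoint; substitute the repository's real OAI-PMH base URL.
BASE = "https://repo.example.org/oai"
identify_url = oai_pmh_url(BASE)
records_url = oai_pmh_url(BASE, "ListRecords", {"metadataPrefix": "oai_dc"})
```

A compliant repository should answer the `Identify` request with an XML description of itself; `ListRecords` with `oai_dc` asks for its records in unqualified Dublin Core, the one metadata format every OAI-PMH repository is required to support.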

Editor's notes

  1. Each software package offers a different range of management and collaboration options and ingests various data types.
  2. Software framework that enables institutions to host research data repositories. Allows data sharing, control, persistent data citation, data publishing, and version management. Rich data support: tabular data (Stata, SPSS, CSV, tab-delimited); social network data (GraphML); other data or complementary files (all formats are accepted, but only tabular files have full data support).
  3. Harvard Dataverse: a centralized software installation and data repository. Dataverse: an individual distributed data archive with its own branding.
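
Slide 20 notes that batch work in Dataverse is hard without API expertise. As a rough illustration of what that involves, the sketch below prepares read-only metadata requests for a list of datasets using the Dataverse native API's persistent-identifier endpoint and the `X-Dataverse-key` token header. It builds the requests but does not send them. The helper name `dataset_request`, the DOIs, and the token value are placeholders invented for this example, not IFPRI identifiers.

```python
from urllib.parse import urlencode

# Harvard Dataverse installation; other installations use their own host.
SERVER = "https://dataverse.harvard.edu"


def dataset_request(pid, api_token=None, server=SERVER):
    """Describe a GET request for one dataset's metadata (hypothetical helper)."""
    url = f"{server}/api/datasets/:persistentId/?{urlencode({'persistentId': pid})}"
    # The token is only needed for unpublished or restricted datasets.
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    return {"method": "GET", "url": url, "headers": headers}


# Placeholder DOIs for illustration only.
pids = ["doi:10.5072/FK2/EXAMPLE1", "doi:10.5072/FK2/EXAMPLE2"]
batch = [dataset_request(pid, api_token="PLACEHOLDER-TOKEN") for pid in pids]
```

Each entry in `batch` can be handed to any HTTP client. Actual batch modification would use the API's write endpoints with the same pattern and should be tried on a test installation first.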