1. DIRISA for Open Data and Open Science
Anwar Vahed
SA-EU Open Science Dialogue Project
15 May 2018
2. National Integrated Cyberinfrastructure System (NICIS)
• Advanced integrated cyber platform offering services for:
  • HPC
  • Data
  • Networking
  • Priority science and education
• Overarching coordination implemented by CSIR
© CSIR, 2018
Diagram: NICIS components
• Core services; networked resources; skills & expertise
• Computing Services (CHPC+), Networking Services (SANReN), Data Services (DIRISA)
• e-Research environments (Cloud)
• Domains: Materials & Manufacturing; Energy; Earth & Environment; Physical Sciences & Engineering; Humans & Society; Health, Bio & Food
3. DIRISA Objectives
Build national data infrastructure
• Build and maintain Tier 1 nodes and services
• Start Tier 2 domain nodes
Develop human capital and skills
• e-Science postgraduate programs
• Conferences and training workshops
Research data management
• DMP (data management planning) tool
• PID (persistent identifier) service
• User policies and practices
Advocate and coordinate
• R&D initiatives
• Stakeholder workshops
4. National Data Infrastructure
• “I just want to store/preserve my data (reliably)”
• “I just want to share my data (in a controlled way)”
• “I just want to process my data”
• NICIS-DIRISA role:
  • Link into Tier 0
  • Build and maintain Tier 1
  • Support starting up Tier 2
  • Link into Tier 3
One-stop shop: federated access to research data
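The one-stop-shop idea can be illustrated with a toy federated search. This is a minimal sketch, assuming each node exposes a searchable list of datasets keyed by persistent identifier (PID). `Node`, `FederatedCatalogue` and the example identifiers are hypothetical, and the real DIRISA catalogue interface is not described in these slides. Nodes are queried in tier order, and a dataset held at several nodes is reported once, with its PID as the de-duplication key.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A data node at a given tier (numbering as on the slide)."""
    name: str
    tier: int
    datasets: dict[str, str] = field(default_factory=dict)  # PID -> title

    def search(self, term: str) -> list[tuple[str, str]]:
        """Case-insensitive substring match on dataset titles."""
        needle = term.lower()
        return [(pid, title) for pid, title in self.datasets.items()
                if needle in title.lower()]


@dataclass
class FederatedCatalogue:
    """One-stop shop: a single search over many nodes."""
    nodes: list[Node] = field(default_factory=list)

    def search(self, term: str) -> list[tuple[str, str, str]]:
        """Return (node name, PID, title), one entry per distinct PID."""
        results: list[tuple[str, str, str]] = []
        seen: set[str] = set()
        # Query nodes in tier order; a PID already seen (e.g. a dataset
        # replicated on two nodes) is reported only once.
        for node in sorted(self.nodes, key=lambda n: n.tier):
            for pid, title in node.search(term):
                if pid not in seen:
                    seen.add(pid)
                    results.append((node.name, pid, title))
        return results


if __name__ == "__main__":
    # Hypothetical nodes and identifiers, for illustration only.
    tier1 = Node("DIRISA Tier 1", 1, {"hdl:0000/rain-01": "Rainfall records"})
    tier2 = Node("Domain node", 2, {"hdl:0000/rain-01": "Rainfall records",
                                    "hdl:0000/gen-07": "Genomic survey"})
    print(FederatedCatalogue([tier2, tier1]).search("rain"))
```

Using the PID rather than the title as the de-duplication key is the design point. Two nodes may describe the same dataset differently, but a persistent identifier stays the same.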
5. Underpinning Open Data & Open Science
1. National infrastructure and services for Open data
• DIRISA Tier 1 store (8 PB) and Research Data Management (RDM) services
• Regional Tier 2 Node
2. Human capital development
• National e-Science Masters
• Data Science training
3. Data management
• PID Allocation: Handle and DOI registries
• SA_DMP: SA Data Management Planning tool
• Policies across data life cycle
4. Outreach and coordination
• Conferences and workshops: SA Data Conference 19-21 June
• African Open Science Platform
• Big Data strategy
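PID allocation rests on the Handle and DOI systems. The following minimal sketch assumes only the public identifier syntax: `prefix/suffix`, where a DOI is a Handle whose prefix starts with `10.`. It shows how a client could split an identifier and build its resolver link. `PID` and `parse_pid` are hypothetical helpers and are not part of DIRISA or any registry API. The resolver hosts `doi.org` and `hdl.handle.net` are the public ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PID:
    prefix: str  # naming authority, e.g. "10.1234" for a DOI (hypothetical)
    suffix: str  # local name chosen by the repository; may contain "/"

    @property
    def is_doi(self) -> bool:
        # DOIs are Handles whose prefix starts with the "10." directory code.
        return self.prefix.startswith("10.")

    def resolver_url(self) -> str:
        """Build a link to the public resolver for this identifier type."""
        base = "https://doi.org/" if self.is_doi else "https://hdl.handle.net/"
        return f"{base}{self.prefix}/{self.suffix}"


def parse_pid(text: str) -> PID:
    """Split 'prefix/suffix', tolerating a leading 'doi:' or 'hdl:' label."""
    text = text.strip()
    for scheme in ("doi:", "hdl:"):
        if text.lower().startswith(scheme):
            text = text[len(scheme):]
            break
    prefix, sep, suffix = text.partition("/")  # split at the first "/" only
    if not sep or not prefix or not suffix:
        raise ValueError(f"expected prefix/suffix, got {text!r}")
    return PID(prefix, suffix)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(parse_pid("doi:10.1234/abc-42").resolver_url())
    print(parse_pid("20.500.12345/xyz").resolver_url())
```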
6. South African National Data Infrastructure (SANDI)
• DSubscribe: subscribe as a DIRISA user
• DataDrop: deposit and store data reliably
• FindGet: discover and download data sets
• SafeShare: safely share data with users
• DataStage: prepare data for HPC
User documentation; Help & support; Core services (DMP, PID)
Phase 1: Research Data Management
• My data management plans
• My workflows
• My data sets and outputs
• My communities
Phase 2: Collaborative Research Environments
(References: EUDAT, ANDS, JISC, Data.gov)
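A toy in-memory repository can illustrate the SANDI service flow: subscribe, deposit, discover, share. This is a minimal sketch under two assumptions. First, "store data reliably" is taken to mean recording a SHA-256 checksum at deposit and verifying it on download. Second, sharing is restricted to subscribed users. The `Repository` class and its method names are hypothetical stand-ins for DSubscribe, DataDrop, FindGet and SafeShare, not a DIRISA API. DataStage (preparing data for HPC) is left out.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Deposit:
    owner: str
    payload: bytes
    sha256: str                                   # checksum recorded at deposit
    shared_with: set[str] = field(default_factory=set)


class Repository:
    """Toy in-memory stand-in for the SANDI services (hypothetical API)."""

    def __init__(self) -> None:
        self.users: set[str] = set()
        self.deposits: dict[str, Deposit] = {}

    def subscribe(self, user: str) -> None:                      # DSubscribe
        self.users.add(user)

    def drop(self, user: str, name: str, data: bytes) -> str:    # DataDrop
        if user not in self.users:
            raise PermissionError(f"{user} is not a subscribed user")
        digest = hashlib.sha256(data).hexdigest()
        self.deposits[name] = Deposit(user, data, digest)
        return digest

    def share(self, owner: str, name: str, other: str) -> None:  # SafeShare
        dep = self.deposits[name]
        if dep.owner != owner:
            raise PermissionError("only the depositor may share")
        if other not in self.users:
            raise PermissionError(f"{other} is not a subscribed user")
        dep.shared_with.add(other)

    def _visible(self, user: str, dep: Deposit) -> bool:
        return user == dep.owner or user in dep.shared_with

    def find(self, user: str, term: str) -> list[str]:          # FindGet: discover
        return sorted(n for n, d in self.deposits.items()
                      if term in n and self._visible(user, d))

    def get(self, user: str, name: str) -> bytes:               # FindGet: download
        dep = self.deposits[name]
        if not self._visible(user, dep):
            raise PermissionError(f"{user} may not read {name}")
        # Fixity check: refuse to serve bytes whose checksum has changed.
        if hashlib.sha256(dep.payload).hexdigest() != dep.sha256:
            raise OSError(f"checksum mismatch for {name}")
        return dep.payload
```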
7. Data Access Spectrum: Open by default
Small, medium and big data; personal, business and government data
Closed
• Internal access: private; confidential; sensitive (e.g. surveillance data)
Shared
• Named access: assigned by contract; authorised by regulation (e.g. driver's licences)
• Group-based access: project assigned; selected membership (e.g. genomic data)
• Public access: licence that limits use; terms and conditions (e.g. geospatial data)
Open
• Anyone: open to the public; no limits on use (e.g. weather data)
(Source: Open Data Institute, ODI)
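The spectrum can be encoded as ordered access levels, each with its own read check. This is a minimal sketch that makes two assumptions: the custodian's own (internal) users can always read, and "open by default" means a new dataset starts at the open end. `Access`, `Dataset`, `Requester` and `can_read` are hypothetical names. Real access control would also involve authentication and licence records.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Access(IntEnum):
    """Levels on the data spectrum, ordered from closed to open."""
    INTERNAL = 0  # Closed: private, confidential, sensitive
    NAMED = 1     # Shared: assigned by contract or authorised by regulation
    GROUP = 2     # Shared: project-assigned, selected membership
    PUBLIC = 3    # Shared: public, but a licence limits use
    ANYONE = 4    # Open: no limits on use


@dataclass
class Dataset:
    title: str
    access: Access = Access.ANYONE       # "open by default"
    named: frozenset[str] = frozenset()  # parties named in a contract
    group: frozenset[str] = frozenset()  # members of the project group


@dataclass
class Requester:
    name: str
    internal: bool = False               # inside the custodian organisation
    accepted_licence: bool = False       # agreed to the terms and conditions


def band(level: Access) -> str:
    """Map a level to the Closed / Shared / Open bands of the spectrum."""
    if level is Access.INTERNAL:
        return "closed"
    return "open" if level is Access.ANYONE else "shared"


def can_read(ds: Dataset, who: Requester) -> bool:
    if who.internal:                     # assumption: internal users always read
        return True
    if ds.access is Access.ANYONE:
        return True
    if ds.access is Access.PUBLIC:
        return who.accepted_licence
    if ds.access is Access.GROUP:
        return who.name in ds.group
    if ds.access is Access.NAMED:
        return who.name in ds.named
    return False                         # INTERNAL: custodian only
```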
8. Actions
Data custodians/stewards: individuals; institutions; groups/consortia; government; business
• Advocate and promote: increase the visibility and benefits of Open data
• Clarify Open data and related concepts: governance, stewardship, custodianship, IP, copyright, …
• Change the accreditation model: data citation recognition, altmetrics, etc.
• Develop policies: institutional strategies, standards, protocols, principles and recommended practices
• Change training: include Open data concepts in (data) science curricula
• Funding model: incentives/requirements for Open data principles
• Harmonise privacy and openness regulation
9. Beyond FAIR
• FAIR: Findable, Accessible, Interoperable, Reusable
• FAIReR: FAIR + Reproducible
• FAIReST: FAIR + Stewardship + Transparency (truth and trust)
[Liz Lyon, University of Pittsburgh]
Open Science => New roles…
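Some FAIR properties can be approximated by checks on a metadata record. This is a rough sketch only. The field names (`pid`, `access_url`, `format`, `licence`, `provenance`, `workflow`) and the accepted formats are hypothetical proxies, not the FAIR principles themselves. The stewardship and transparency added by FAIReST cannot be checked by machine in this simple way, so they are omitted.

```python
from __future__ import annotations

from typing import Callable

Record = dict  # a metadata record: field name -> value

# Hypothetical proxy checks; each is a crude stand-in for one principle.
CHECKS: dict[str, Callable[[Record], bool]] = {
    "Findable": lambda m: bool(m.get("pid")) and bool(m.get("title")),
    "Accessible": lambda m: str(m.get("access_url", "")).startswith("https://"),
    "Interoperable": lambda m: m.get("format") in {"csv", "json", "netcdf"},
    "Reusable": lambda m: bool(m.get("licence")) and bool(m.get("provenance")),
    "Reproducible": lambda m: bool(m.get("workflow")),  # the extra R in FAIReR
}

FAIR = ("Findable", "Accessible", "Interoperable", "Reusable")
FAIRER = FAIR + ("Reproducible",)


def assess(meta: Record, principles: tuple[str, ...] = FAIR) -> dict[str, bool]:
    """Run the proxy check for each requested principle."""
    return {p: CHECKS[p](meta) for p in principles}


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    record = {"pid": "hdl:0000/x", "title": "Rainfall records",
              "access_url": "https://example.org/x", "format": "csv",
              "licence": "CC-BY-4.0", "provenance": "station network"}
    print(assess(record, FAIRER))
```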
14. Architecture
• Open (FAIR) data & Open Science
• Federated locally and globally (“one-stop-shop” catalogue)
• Certified as a trusted repository
• Linked to funder systems
• Suite of services for RDM and data-intensive analytics
Diagram: software-defined storage hierarchy, from small and fast to big and slow
• Services & staging between DIRISA and CHPC storage systems: 0.5 PB
• Active data, near-real-time interactive access: 8 PB
• Archival data & staging, VM access (capacity labels of 40 PB and 2 PB also appear)
• Storage virtualisation server linking to CHPC Lustre or POSIX storage systems (* PB) and CHPC compute systems
• Software layers: iRODS; DIRISA cloud portal
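The hierarchy trades capacity for speed, which suggests a placement policy of the kind used in hierarchical storage management. The following minimal sketch assumes a recency rule: data staged for CHPC jobs sits in the small, fast staging tier, recently used data sits in the active tier, and everything else goes to the archive. The tier names paraphrase the diagram. The 90-day window and the `staged_for_hpc` flag are hypothetical. The sketch does not use iRODS.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    """Tiers ordered from small and fast to big and slow."""
    STAGING = "services & staging (DIRISA <-> CHPC)"
    ACTIVE = "active data, interactive access"
    ARCHIVE = "archival data"


@dataclass
class DataSet:
    name: str
    days_since_access: int
    staged_for_hpc: bool = False         # hypothetical flag: queued for a CHPC job


def place(ds: DataSet, active_window_days: int = 90) -> Tier:
    """Pick a tier by recency; the 90-day window is hypothetical."""
    if ds.staged_for_hpc:
        return Tier.STAGING
    if ds.days_since_access <= active_window_days:
        return Tier.ACTIVE
    return Tier.ARCHIVE


def migrations(current: dict[str, Tier],
               datasets: list[DataSet]) -> list[tuple[str, Optional[Tier], Tier]]:
    """List (name, from, to) moves; 'from' is None for data not yet placed."""
    moves = []
    for ds in datasets:
        target = place(ds)
        if current.get(ds.name) is not target:
            moves.append((ds.name, current.get(ds.name), target))
    return moves
```

Keeping `place` separate from `migrations` means a policy change, such as a different window, only alters the targets. The move list is then recomputed from the current layout.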