1. The document describes how to use EOSC-hub services like Onedata, DataHub, Notebooks, Check-in, B2FIND, and B2HANDLE to enable open science workflows including data analysis, management, discovery, and publishing of reusable results.
2. A demonstration is proposed showing how a user can authenticate to DataHub and Notebooks using Check-in, access and analyze data on Onedata, publish notebooks and data with PIDs, and have metadata harvested and content discovered through B2FIND.
3. Integrating these services allows users to perform data analysis on large volumes of data through a server-side parallel approach while producing reusable results following FAIR guidelines.
Open Data analysis with EOSC-hub services
1. EOSC-hub receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 777536.
eosc-hub.eu
@EOSC_eu
Baptiste Grenier / Enol Fernández
EGI Foundation
Dissemination level: Public
2. 2
Thanks to the EOSC-hub distributed team!
Onedata and DataHub: Lukasz Dutka, Lukasz Opiola, Bartosz Kryza, Michal Orzechowski
EGI FedCloud provider: Boris Parak, Miroslav Ruda, Zdenek Sustr
EGI Check-in: Nicolas Liampotis
B2HANDLE: Kyriakos Ginis
B2FIND: Tobias Weigel, Claudia Martens
3. 3
• Several of the use cases in EOSC-hub will enable scientific end-users to perform data analysis experiments on large volumes of data by exploiting a PID-enabled, server-side, parallel approach.
• Users expect easy-to-use interfaces, such as Jupyter Notebooks, for interacting with the system.
• Results should be reusable and follow the FAIR guidelines: Findability, Accessibility, Interoperability, and Reusability.
What do we want to do?
5. 5
● Integrating multiple services from the EOSC-hub catalogue to build a new solution is worth the effort
○ Self-service APIs let you combine services with little overhead, although some steps still cannot be automated
○ Support channels with providers are lifesavers while prototyping
● Need to validate the setup for production with a real research community
● Aim at a completely integrated solution that people can reuse
○ Provide Python modules for easy interaction with services
○ Expand the EGI Notebooks service
○ Ensure that all required operations can be done using API calls
Lessons Learned
6. 6
Enabling reproducibility with Notebooks
[Diagram]
Components: your laptop, your GitHub repository, EGI Notebooks services, MyBinder.org, Zenodo, data repository, journal paper (DOI), fellow researchers
Steps: create repository; upload ipynb file; add requirements.txt; execute; obtain GitHub project reference; provide GitHub project reference; discover Notebook (use DOI); re-execute
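The re-execution step in this workflow can be sketched in Python. The following is a minimal illustration, assuming the notebook lives in a public GitHub repository and that the public mybinder.org launch URL pattern (`/v2/gh/<owner>/<repo>/<ref>`) is used; the function name `binder_url`, the repository names, and the choice of the `labpath` query parameter are assumptions made for this example, not part of the original slides.

```python
from typing import Optional
from urllib.parse import quote


def binder_url(owner: str, repo: str, ref: str = "HEAD",
               notebook: Optional[str] = None) -> str:
    """Build a mybinder.org launch URL for a GitHub repository.

    owner/repo/ref identify the GitHub project reference; notebook is an
    optional path to an ipynb file inside the repository (hypothetical here).
    """
    # A ref containing "/" (for example a branch name) must be fully encoded.
    url = "https://mybinder.org/v2/gh/{}/{}/{}".format(
        quote(owner, safe=""), quote(repo, safe=""), quote(ref, safe="")
    )
    if notebook:
        # Assumption: open the notebook in JupyterLab via the labpath parameter.
        url += "?labpath=" + quote(notebook)
    return url


if __name__ == "__main__":
    # Hypothetical repository and notebook names, for illustration only.
    print(binder_url("example-user", "example-repo", notebook="analysis.ipynb"))
```

A link built this way could be shared alongside the DOI so that fellow researchers can re-execute the notebook, provided the repository also ships a requirements.txt describing its dependencies.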
7. 7
An Open Science story we aim for…
[Diagram]
Components: your laptop, your GitHub repository, EGI Notebooks and Binder service, Zenodo, data repository, journal paper (DOI), fellow researchers, distributed big data (DataHub, B2DROP, etc.)
Steps: create repository; upload ipynb file; add requirements.txt; execute; obtain GitHub project reference; provide GitHub project reference; discover Notebook (use DOI)
9. eosc-hub.eu @EOSC_eu
Thank you for your attention!
Questions?
Contact
This material by Parties of the EOSC-hub Consortium is licensed under a Creative Commons Attribution 4.0 International License.
Enol Fernández - enol.fernandez@egi.eu
Baptiste Grenier - baptiste.grenier@egi.eu
10. 10
1. Authenticating to DataHub using Check-in: https://datahub.egi.eu
a. Showing content of space
2. Authenticating to Notebooks using Check-in: https://cs3.fedcloud-tf.fedcloud.eu
a. Showing content of mounted space
b. Running Wind cast analysis notebook
c. Running PID registration notebook to share and publish notebooks directory
3. B2FIND cataloguing (data collected on a regular basis): http://eudat7-ingest.dkrz.de/dataset?groups=egidatahub
4. OAI-PMH metadata in DataHub: http://datahub.egi.eu/oai_pmh?verb=ListRecords&metadataPrefix=oai_dc
5. PID in Handle.net registry: http://hdl.handle.net/
6. PID pointing to shared data publicly accessible in Onedata
Demonstration flow
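The OAI-PMH step of the demonstration flow, which a harvester such as B2FIND relies on, can be illustrated with a short Python sketch. It assumes the endpoint returns standard OAI-PMH 2.0 `ListRecords` responses with Dublin Core (`oai_dc`) metadata; the helper names, the sample record, and the placeholder handle are hypothetical and only serve the example. No network request is made: the parser runs on an inline sample response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard OAI-PMH 2.0 and Dublin Core XML namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract the OAI identifier, titles and dc:identifier values per record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "oai_identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS) if e.text],
            "identifiers": [
                e.text for e in rec.iterfind(".//dc:identifier", NS) if e.text],
        })
    return records


# Hypothetical sample response; the identifier and handle are placeholders.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:record-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example notebook collection</dc:title>
          <dc:identifier>http://hdl.handle.net/PREFIX/SUFFIX</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://datahub.egi.eu/oai_pmh"))
    for record in parse_records(SAMPLE):
        print(record)
```

In a real harvest, the response body fetched from the URL would be passed to `parse_records`, and a `resumptionToken` in the response, if present, would need to be followed to retrieve further pages; that handling is left out of this sketch.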