The document discusses the development of the SeaDataCloud Virtual Research Environment (VRE) which aims to provide researchers seamless access to all the services needed for their work and collaboration through a web-based workspace. This includes finding, accessing, processing, visualizing data and sharing results. The VRE will integrate various EUDAT services like JupyterHub, B2DROP and B2ACCESS to allow researchers to perform resource-intensive processing and analysis of large marine datasets in the cloud without needing to download and upload data. The project is still in its early stages with the first applications focusing on a use case involving quality control and interpolation of ocean temperature and salinity products.
2. sdn-userdesk@seadatanet.org – www.seadatanet.org
Marine research now…
• Data is getting bigger
• Processing is getting more resource-intensive
• Cumbersome: Download – Process – Upload
• (Potential) use of outdated data & software
• LBNT: Research is getting more cooperative
3. sdn-userdesk@seadatanet.org – www.seadatanet.org
Virtual Research Environment
… a web-based workspace providing seamless access to
all services a researcher needs to do her work and
collaborate with her community.
• Finding data
• (Centralized) access to data
• Processing of the data
• Visualisation of data/results
• Sharing of results with colleagues and/or with a
wider public
8. sdn-userdesk@seadatanet.org – www.seadatanet.org
Use cases
• SeaDataNet, T/S quality control and optimal interpolation
(climatologies), biology statistical control
• EMODNET-Chemistry, same for bio-geo-chemistry
• EMODNET-Bathymetry, DTM processing
• EMODNET-Ingestion, converting files
• Marinet2, Marine Renewable Energies prototype test
analysis
• And much more…
9. sdn-userdesk@seadatanet.org – www.seadatanet.org
Requirements (excerpt)
Processing:
• Functionality that is required to cover the use cases
• Access to JupyterHub incl. data and services
• Visualisation services for results
Data:
• Upload custom input data
• Access to CDI data + reference datasets
• View, store, download, re-use results
• Sharing of outcomes to public/custom/private
• Respect privacy and different data policies! (e.g. restricted data)
Other:
• Different Virtual Labs (data pools, tool compositions, custom entry pages…)
• Single Sign On with Marine-ID
• User permission management
• Communication to facilitate collaborative research (forum, tools, …)
• Performance + capacity (etc.)
12. sdn-userdesk@seadatanet.org – www.seadatanet.org
Processing layer
Workflows
Data layer
GUI
CDI data
Custom data Bob
Reference
datasets
API
Jupyter
Note-
book
Data access
Frontend Frontend
VL1 VL2
Custom data Alice
API
GUI
Bla? Bla!
(+Single Sign On)
Logs, Acc
Bing!
Bing!
14. sdn-userdesk@seadatanet.org – www.seadatanet.org
Quite a challenge!
• Complex requirements
• High ambition
• Team members/developers from very varied
backgrounds
• Very diverse experiences and skills to build on
• Gaining more and more insights into each other‘s
domains and concerns
• Use of existing EUDAT services
+ plus customization and addition of features
15. sdn-userdesk@seadatanet.org – www.seadatanet.org
Processing layer
Data layer
GUI
CDI data
Custom data Bob
Reference
datasets
API
Jupyter
Note-
book
Data access
Frontend Frontend
VL1 VL2
Custom data Alice
API
GUI
(+Single Sign On)
B2DROP
B2DROP
B2ACCESS
B2SAFE
B2HOST
17. sdn-userdesk@seadatanet.org – www.seadatanet.org
Development Status & Outlook
Very early state!
• First start with few applications (first use case)
• Integrate more and more
• Later: Add features (such as communication means,
version update notifications, complex workflows, …)
Technical Kick-Off next week in Delft!
19. sdn-userdesk@seadatanet.org – www.seadatanet.org
Requirements (excerpt)
Processing:
• Functionality that is required to cover the use cases
• Access to JupyterHub incl. data and services
• Visualisation services for results
Data:
• Upload custom input data
• Access to CDI data + reference datasets
• View, store, download, re-use results
• Sharing of outcomes to public/custom/private
• Respect privacy and different data policies! (e.g. restricted data)
Other:
• Different Virtual Labs (data pools, tool compositions, custom entry pages…)
• Single Sign On with Marine-ID
• User permission management
• Communication to facilitate collaborative research (forum, tools, …)
• Performance + capacity (etc.)
Hinweis der Redaktion
Q: What leads to the need of developing a VRE? Motivation? Why / what for?
REMEMBER:
IN research
AMOUNT of data
ÜBERLEITUNG:
As we saw, data moving to the cloud
Processing also moving to the cloud
Would be cool to connect them
- Not just: Data to cloud, processing to cloud, voila. More integrated, more seamless, more „aus einem Guss“
Provide researchers with means to do their processing online
Q: What is a VRE?
Stichworte:
The solution to everything! :-D
Many definitions. Rubber word. Buzzword.
Really…
Provide researchers with means to do their processing online
Various applications run in the cloud
Basically:
Accessing data in the cloud
Allows sharing
Allows chaining of these processes
From standalone apps from one portal, to highly integrated chainable micro-services…
Data and processing are centralized not outdated
Not on local machine
Integrated
From Peter (other VREs):
Mostly the same expectations with respect to community building, data sharing, processing and analysis tools
Authorisation/Authentication layer both in portal layer as well as on top of service layer
API’s for each (processing) service
Communication and (meta)data standards are key to success
Front end applications are various: From self created workflows, to VRE virtual labs, to dedicated user interfaces. But all run on same set of services and data.
We need to distinguish well between typical VRE modules (communication, GUI, etc) and the fundamental architecture
Q: Which use cases should be covered?
Q: What’s the expectations for such a VRE? (On top of the desktop stuff)
Q: What’s the benefits of moving to the cloud?
From Peter:
facilitate collaborative and individual research: Using, handling, analysing and processing ocean and marine data into value-added data products which can be integrated, visualised and published using high level visualisation services.
combine data with subsets from other data resources, such as the ingested collections
Have a high capacity and performance for big data processing and state-of-the-art web visualisation service
Respect privacy of users and differences in data policies. Differentiated users, different access to data and data products.
Be possible to configure virtual work spaces for individuals or groups to work on specific projects, including setting up of dedicated pools of data.
Allow producers to decide whether their outcomes will be shared in the public domain or stay private
Be based and hosted on EUDAT’s infrastructure based on it B2-… service platforms
Example use case
ADD:
Processing Individual Auth
Data Important to have auth/user permissions
These are the services (some GUI, some m2m API/ scripting API
accessible from Jupyter)
Note that we need Auth Auth everywhere, every service is individual.
(3) These are the datasets
(4) Services need uniform data access (formats, accessing, …)
(5) They need to store results someway, and upload own inputs
(6) Bob too
(7) Shared
Finally
(8) They need to be integrated by a common frontend, which handles AAI.
(9) Services still have Auth+Auth, but now Single Sign On.
(10) Virtual Labs
Add to this some communication features…
And some workflow features…
And some logging…
Happy user, happy developer, happy Dick Schaap! ;)
Clear interfaces
Allows to add more services easily!
Q: Which technologies are used?
ADD:
Logs by ELK stack
(details still under discussion/testing)
Q: How do we work? (Arbeitsweise)
Progressive, iterative
Mixed teams SDN, EUDAT, …
Q: How is the development going on? Where are we at? Status? Ausblick
Q: What’s the expectations for such a VRE? (On top of the desktop stuff)
Q: What’s the benefits of moving to the cloud?
From Peter:
facilitate collaborative and individual research: Using, handling, analysing and processing ocean and marine data into value-added data products which can be integrated, visualised and published using high level visualisation services.
combine data with subsets from other data resources, such as the ingested collections
Have a high capacity and performance for big data processing and state-of-the-art web visualisation service
Respect privacy of users and differences in data policies. Differentiated users, different access to data and data products.
Be possible to configure virtual work spaces for individuals or groups to work on specific projects, including setting up of dedicated pools of data.
Allow producers to decide whether their outcomes will be shared in the public domain or stay private
Be based and hosted on EUDAT’s infrastructure based on it B2-… service platforms