EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065 www.eudat.eu
Data Processing and Analysis
EUDAT WP5 Service Building
Tom Kirkham
STFC
DATA PROCESSING AND ANALYSIS
- GEF
- Big Data Tools
- B2NOTE
- Data Distribution
OVERVIEW
B2STAGE provides API services to manage data transfers between B2SAFE, B2HANDLE and B2ACCESS (eudat.eu/b2stage).
The service allows users to:
- Transfer large data collections from EUDAT storage facilities to external HPC facilities for processing
- In conjunction with B2SAFE, replicate community data sets, ingesting them onto EUDAT storage resources for long-term preservation
- Ingest computation results into the EUDAT infrastructure
EUDAT 6M EC Review, 28th October 2015, Brussels
BACKGROUND
• Access layer to the B2SAFE & B2FIND services, allowing users to store, preserve and find data
• Enables upload and download transfers of data objects to create collections
[Diagram: data access via FTP or HTTP-API]
PROGRESS
Achievements
- Integration between B2HANDLE, B2ACCESS and B2SAFE
- Enablement of data movement into the CDI
- HTTP API as a method for common access:
  - Developed and released
  - Integration with the Data Discovery Service and support for standards such as PIDs
  - Integration of community repositories with B2SAFE via the HTTP API (work done by Charles University)
  - Proof of concept of the HTTP API on plain filesystems, for workspaces
Future Status
- Development continues
- Application to specific tools and filestores
THE GENERIC EXECUTION FRAMEWORK (GEF)
Goal: Enable execution of containerised software within the CDI, thus reducing data transfer and increasing customisation for user communities.
Technology objectives
- Utilise EUDAT services such as B2SHARE, B2DROP and B2SAFE (planned)
- Support a GEF rules engine (e.g. Drools)
- Integrate services from user communities into the CDI
GEF services/Docker containers
The GEF relies on so-called GEF services that are customized by the user to perform the required tasks:
- GEF services are Docker images that are specifically annotated in order to allow handling by the GEF.
- GEF service instances are Docker containers that are spun up for execution close to the data.
- User communities are solely responsible for the contents of their images. During the pilot phase, communities will receive support for creating their own images, but in the long run scientists will have to become proficient at it.
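As a rough illustration of what "specifically annotated" could mean in practice, the Python sketch below models images as sets of key/value labels and picks out those carrying GEF metadata. The label prefix `eudat.gef.service.`, the label keys and the image names are assumptions made for illustration only; the slides do not specify the annotation scheme.

```python
from dataclasses import dataclass, field

# Hypothetical label prefix; the slides do not state how images are annotated.
GEF_LABEL_PREFIX = "eudat.gef.service."


@dataclass
class Image:
    """A container image with its name and key/value labels."""
    name: str
    labels: dict = field(default_factory=dict)


def gef_metadata(image: Image) -> dict:
    """Return the GEF-specific labels of an image, with the prefix stripped."""
    return {
        key[len(GEF_LABEL_PREFIX):]: value
        for key, value in image.labels.items()
        if key.startswith(GEF_LABEL_PREFIX)
    }


def is_gef_service(image: Image) -> bool:
    """An image counts as a GEF service here if it carries a service name label."""
    return "name" in gef_metadata(image)


# Hypothetical images: one annotated for the GEF, one plain base image.
images = [
    Image("community/word-count", {
        "eudat.gef.service.name": "word-count",
        "eudat.gef.service.input": "/data/in",
        "maintainer": "community",
    }),
    Image("library/ubuntu", {"maintainer": "someone"}),
]

gef_services = [img.name for img in images if is_gef_service(img)]
print(gef_services)
```

Only the annotated image is selected; an unannotated image is simply ignored, which mirrors the idea that the framework handles only images that declare themselves as GEF services.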
A GEF INSTANCE
The container/GEF service invocations on the hosts are
controlled by a Docker Machine integrated with a GEF
instance.
THE GENERIC EXECUTION FRAMEWORK (GEF)
Achievements
- Generic Execution Framework (GEF) first release in September
- Integrated services from the Earth System Grid Federation (ESGF) and European Grid Infrastructure (EGI) e-infrastructures
Future Work
- Integration into other communities such as the IS-ENES Climate4Impact platform
B2NOTE FEATURES
• Creates RDF triples
• Harvests information from ontology repositories
• Supports semi-automatic annotation using text mining
• Supports manual data annotation
• Easy-to-use user interface
• Writes data to the triple store
• Integrates with the different EUDAT B2 services
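To make the idea of storing annotations as RDF triples concrete, here is a minimal Python sketch that builds three subject-predicate-object statements for one annotation and serialises them as N-Triples. It assumes the W3C Web Annotation vocabulary (`oa:Annotation`, `oa:hasTarget`, `oa:hasBody`), which the slides do not name, and all `example.org` identifiers are hypothetical.

```python
# Namespaces: RDF and the W3C Web Annotation vocabulary (an assumed choice).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OA = "http://www.w3.org/ns/oa#"


def annotation_triples(annotation_iri, target_iri, concept_iri):
    """Build the triples linking an annotation to its target and its body."""
    return [
        (annotation_iri, RDF_TYPE, OA + "Annotation"),
        (annotation_iri, OA + "hasTarget", target_iri),
        (annotation_iri, OA + "hasBody", concept_iri),
    ]


def to_ntriples(triples):
    """Serialise IRI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical identifiers, for illustration only.
triples = annotation_triples(
    "https://example.org/annotations/1",
    "https://example.org/records/42",
    "https://example.org/ontology/Salinity",
)
print(to_ntriples(triples))
```

The body here is a concept IRI, as would be the case when a term harvested from an ontology repository is attached to a data record; a real triple store would receive these statements through its own loading interface.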
Achievements
- B2NOTE module created to support the creation of annotations
- Standards-based and integrated with B2SHARE
- B2ACCESS integration gives users federated access to resources
- Software released in January, with over 100 active users
Future Work
- Integration into communities such as OpenAIRE
- Future development in the EOSC project
- Easy integration into community services and within OpenAIRE and EOSC-hub services
BIG DATA ANALYSIS
Goal: To open up data deposited in the EUDAT CDI to ‘Big Data’ processing
Objectives:
- Integrate a ‘Big Data’ stack into the CDI
- Handle data from EUDAT components
- Enable ‘Big Data’ analysis in user communities
BIG DATA ANALYSIS
Achievements
- Apache Spark and Hadoop enabled in EUDAT
- Data subscription service created to link analysis results with user communities
- Integrated within the EUROARGO use case
Future Work
- Further development and integration of the data subscription service into other projects such as EOSC
DATA DISTRIBUTION SERVICE
Data distribution, in terms of discovery, transfer and integration, has been a core focus in this cluster:
- Federated integration of data
- Data annotation layer aiding discovery
- Integration with services via a common API
- Event-based subscription to data
Beyond EUDAT, this technology is reaching out into other projects, raising the possibility of a wider view on Data Distribution as a Service.
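The event-based subscription idea can be pictured with a small publish/subscribe sketch in Python. This is only an in-memory illustration of the pattern, not the Data Subscription Service itself: the class name, method names and dataset names are hypothetical.

```python
from collections import defaultdict


class SubscriptionService:
    """Minimal in-memory publish/subscribe registry keyed by dataset name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, dataset, callback):
        """Register a callback to be invoked when `dataset` changes."""
        self._subscribers[dataset].append(callback)

    def publish(self, dataset, event):
        """Notify every subscriber of `dataset`; return how many were notified."""
        callbacks = self._subscribers.get(dataset, [])
        for callback in callbacks:
            callback(dataset, event)
        return len(callbacks)


received = []
service = SubscriptionService()
service.subscribe("argo-floats", lambda ds, ev: received.append((ds, ev)))

# An event on a subscribed dataset reaches the subscriber; others are ignored.
notified = service.publish("argo-floats", "new analysis result")
ignored = service.publish("other-dataset", "update")
print(received, notified, ignored)
```

Subscribers register interest once and are then notified as events occur, rather than polling for changes, which is the property that makes the approach attractive for linking analysis results to user communities.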
SOME INITIAL THOUGHTS …
SUMMARY
Software released:
- B2STAGE HTTP API
- B2NOTE
- Generic Execution Framework
- Data Subscription Service
Community use to go beyond the project:
- Projects actively working on the software beyond the project include EOSC-hub and SeaDataCloud
Questions
EUDAT Final Review, 21st May 2015, Brussels
