SlideShare ist ein Scribd-Unternehmen logo
1 von 17
1
e-Infrastructural needs to support
informatics*
David Wallom
What is e-Infrastructure?
The integration of digitally-based technology, resources, facilities, and
services combined with people and organizational structures needed to
support modern, collaborative research (and teaching).
1.Data and Storage
2.Software (and Algorithms)
3.Hardware (Compute)
4.Networks
5.Security and authentication
6.People (Collaboration, Skills, Capacity)
7.The Digital Library
Bioinformatics software challenges
• This brings a onslaught of new
challenges for bioinformatics:
– projects that used to require teams
of 500 are now accessible to small
teams
– but biology curricula (i.e.
biologists) still lack computational
skills.
– thus biologists are overwhelmed
by large amounts of data
– furthermore data types are young -
so software is young, thus
• software may be badly built (by
biologists with no formal software dev
training/xp).
• software needs to be frequently
updated (bugfixes, algorithmic
improvements (sensitivity/specificity),
new data type support).
changes everything for biology
ARCHER
• UK National Supercomputing
Service
• Replacement for HECToR
• LINPACK = 1.359 Pflop/s
• EPSRC is the managing partner
on behalf of RCUK
• NERC are the other partner
research council
• Cray XC30 Hardware
• Nodes based on 2× Intel Ivy Bridge
12-core processors
• 64GB (or 128GB) memory per
node
• 3008 nodes in total (72162 cores)
• Linked by Cray Aries interconnect
(dragonfly topology)
External Network inside JASMIN
Unmanaged Cloud – IaaS, PaaS, SaaS
JASMIN Internal Network
Panasas
storage
Lotus Batch
Compute
JASMIN Cloud Architecture
Standard Remote Access Protocols –
ftp, http, …
Managed Cloud - PaaS, SaaS
JASMIN
Analysis
Platform
VM
Project1-org
Science
Analysis
VM 0
Science
Analysis
VM 0
Science
Analysis
VM
JASMIN Cloud Management Interfaces
Direct File
System
Access
Direct access to
batch processing
cluster
Appliance
Catalogue
Firewall + NAT
Firewall
optirad-org
Science
Analysis
VM 0
Science
Analysis
VM 0
IPython
Slave VM
File Server
VM
IPython
JupyterHub VM
eos-cloud-org
Science
Analysis
VM 0
Science
Analysis VM
0
EOSCloud VM
File Server
VM
EOSCloud
Fat Node
IPython Notebook VM with access
cluster through IPython.parallel EOSCoud Desktop as a Service
with dynamic RAM boost
Appliance
Catalogue
Appliance
Catalogue
Firewall + NAT Firewall + NAT
Firewall
Thanks to Phil Kershaw
OTHER PUBLIC CLOUD PROVIDERS ARE AVAILABLE
Bio-Linux: A scalable solution
• Comprehensive, free bioinformatics workstation based on Ubuntu
Linux and Debian Med
• 11 years & 8 major releases
• Around 8000 users from 1600 locations
• 200+ bioinf packages including big integrative tools :- QIIME, Galaxy
Server, PredictProtein, EMBOSS, ...Incorporates all software
Dual BootLinux Live Local Servers Cloud
Docker, simplifying the portability
of applications and services
EOS Cloud
• A tenancy in the JASMIN Unmanaged Cloud (& QMUL RCC)
• Reusing JASMIN web interfaces and user management to
provide custom IaaS software platform
• Each receives two VMs
– Bio-Linux
– Ubuntu Docker hosting environment
• Users have total responsibility for instantiated system
• Accessible though standard remote desktop tools
• Scalability limited by support available
Why Cloud?
• Data sets can be too big or restricted to easily move
– move the compute to the data
– Researcher work patterns are maintained
• Tools such as Bio-Linux/Docker etc are community enablers
• More efficient use of shared resources
• Central maintenance of infrastructure
• Central Management of data sharing agreements possible
• Lower barrier to entry (Compared to traditional HPC and
Grid)
• What type of cloud?
• What role for traditional HPC?
TRAINING IS KEY TO MAKING INFORMED CHOICES
EOS Cloud next?
• Expand currently available resource beyond
current limitations?
• Create deployable machine image for other
cloud marketplaces
• EOS/institutional badging to give users
confidence in quality
Pilot Users
• CEH Bioinformaticians using the EOS Cloud to
study patterns in microbial
biodiversity
• Genomic and transcriptomic data from fish
toxicogenomics studies at Exeter
Pilot Users
• Creating compute pipelines and containers for each OSD in silico
analysis
– HPC, Cloud (IaaS & PaaS)
– Portable
• Run same analysis on different laptops/grids/clouds
– Repeatable/Reproducible
• Same input gives same output given that reference databases did not change
– Preservation
• All analysis tools and dependencies are in one image
• Images are simple tar.gz
• Preserving Docker and base images is preserving all analysis
Software Sustainability Institute
www.software.ac.uk
The Software Sustainability
Institute
A national facility for cultivating better, more
sustainable, research software to enable world-
class research
• Software reaches boundaries in its
development cycle that prevent
improvement, growth and adoption
• Providing the expertise and services
needed to negotiate to the next stage
• Developing the policy and tools to
support the community developing and
using research software
Supported by RCUK
Communication
Website & blog
Campaigns
Advice
Guides
Courses
Workshops
Fellowship
Research
Software
Policy
Training
Community
Consultancy
41 projects
92 evaluations
4 surgeries
33 UK SWC
workshops
1000+ learners
50,000 readers
41 domain
ambassadors
20+ workshops organised
740 researchers
50,000 grants
analysed
150+ contributed articles
19,000 unique visitors per month
272 RSEs engaged 1700 signatures 13 issues highlighted
The end of the beginning, not the beginning of the end!
• A holistic approach is required with all parts of e-infrastructure supported from the Hard to
Soft to Wet!
• Good start up investments need continuity to ensure impact
– Certain tools are foundations upon which large swathes of community depend
– Putting tools next to immovable data ensures value!
• Integrating with larger activities ensure benefits of scaling
– you can’t steer something you’re not involved with…
• Abstract underpinning e-infrastructure services from the users, as they’re not interested!
– Run something on one resource should be able to be moved to others throug hthe use of
standards etc!
– I have ignored the institutional resources…
****WARNING****
Institute for Environmental Analytics Summer School on e-infrastructure for the environment
19th – 22nd Sept ’16, Oxford.

Weitere ähnliche Inhalte

Ähnlich wie e-infrastructural needs to support informatics

Desktop as a Service supporting Environmental 'Omics
Desktop as a Service supporting Environmental 'OmicsDesktop as a Service supporting Environmental 'Omics
Desktop as a Service supporting Environmental 'OmicsDavid Wallom
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...David Wallom
 
Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...David Wallom
 
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...Ola Spjuth
 
Desktop as a Service supporting Environmental ‘omics
Desktop as a Service supporting Environmental ‘omicsDesktop as a Service supporting Environmental ‘omics
Desktop as a Service supporting Environmental ‘omicsDavid Wallom
 
CLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchCLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchTom Connor
 
Packaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reusePackaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reuseMatthew Vaughn
 
Cloud and Desktop aaS for Teaching
Cloud and Desktop aaS for TeachingCloud and Desktop aaS for Teaching
Cloud and Desktop aaS for TeachingDavid Wallom
 
e-Infrastructure available for research, using the right tool for the right job
e-Infrastructure available for research, using the right tool for the right jobe-Infrastructure available for research, using the right tool for the right job
e-Infrastructure available for research, using the right tool for the right jobDavid Wallom
 
Open Source and Science at the National Science Foundation (NSF)
Open Source and Science at the National Science Foundation (NSF)Open Source and Science at the National Science Foundation (NSF)
Open Source and Science at the National Science Foundation (NSF)Daniel S. Katz
 
Interoperability and scalability with microservices in science
Interoperability and scalability with microservices in scienceInteroperability and scalability with microservices in science
Interoperability and scalability with microservices in scienceOla Spjuth
 
EOSC-hub service portfolio
EOSC-hub service portfolioEOSC-hub service portfolio
EOSC-hub service portfolioEOSC-hub project
 
Software Ecosystems = Big Data
Software Ecosystems = Big DataSoftware Ecosystems = Big Data
Software Ecosystems = Big DataTom Mens
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudOla Spjuth
 
2016 05 sanger
2016 05 sanger2016 05 sanger
2016 05 sangerChris Dwan
 
Synergy 2014 - Syn122 Moving Australian National Research into the Cloud
Synergy 2014 - Syn122 Moving Australian National Research into the CloudSynergy 2014 - Syn122 Moving Australian National Research into the Cloud
Synergy 2014 - Syn122 Moving Australian National Research into the CloudCitrix
 
An Introduction to Red Hat Enterprise Linux OpenStack Platform
An Introduction to Red Hat Enterprise Linux OpenStack PlatformAn Introduction to Red Hat Enterprise Linux OpenStack Platform
An Introduction to Red Hat Enterprise Linux OpenStack PlatformYandex
 

Ähnlich wie e-infrastructural needs to support informatics (20)

Desktop as a Service supporting Environmental 'Omics
Desktop as a Service supporting Environmental 'OmicsDesktop as a Service supporting Environmental 'Omics
Desktop as a Service supporting Environmental 'Omics
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
 
Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...
 
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
 
Desktop as a Service supporting Environmental ‘omics
Desktop as a Service supporting Environmental ‘omicsDesktop as a Service supporting Environmental ‘omics
Desktop as a Service supporting Environmental ‘omics
 
CLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchCLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB Launch
 
Packaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reusePackaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reuse
 
Cloud and Desktop aaS for Teaching
Cloud and Desktop aaS for TeachingCloud and Desktop aaS for Teaching
Cloud and Desktop aaS for Teaching
 
e-Infrastructure available for research, using the right tool for the right job
e-Infrastructure available for research, using the right tool for the right jobe-Infrastructure available for research, using the right tool for the right job
e-Infrastructure available for research, using the right tool for the right job
 
Cyverse: Extensible Cyberinfrastructure for Life Science
Cyverse: Extensible Cyberinfrastructure for Life ScienceCyverse: Extensible Cyberinfrastructure for Life Science
Cyverse: Extensible Cyberinfrastructure for Life Science
 
Open Source and Science at the National Science Foundation (NSF)
Open Source and Science at the National Science Foundation (NSF)Open Source and Science at the National Science Foundation (NSF)
Open Source and Science at the National Science Foundation (NSF)
 
Interoperability and scalability with microservices in science
Interoperability and scalability with microservices in scienceInteroperability and scalability with microservices in science
Interoperability and scalability with microservices in science
 
EOSC-hub service portfolio
EOSC-hub service portfolioEOSC-hub service portfolio
EOSC-hub service portfolio
 
EGI Services
EGI Services EGI Services
EGI Services
 
Software Ecosystems = Big Data
Software Ecosystems = Big DataSoftware Ecosystems = Big Data
Software Ecosystems = Big Data
 
Climb bath
Climb bathClimb bath
Climb bath
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and Cloud
 
2016 05 sanger
2016 05 sanger2016 05 sanger
2016 05 sanger
 
Synergy 2014 - Syn122 Moving Australian National Research into the Cloud
Synergy 2014 - Syn122 Moving Australian National Research into the CloudSynergy 2014 - Syn122 Moving Australian National Research into the Cloud
Synergy 2014 - Syn122 Moving Australian National Research into the Cloud
 
An Introduction to Red Hat Enterprise Linux OpenStack Platform
An Introduction to Red Hat Enterprise Linux OpenStack PlatformAn Introduction to Red Hat Enterprise Linux OpenStack Platform
An Introduction to Red Hat Enterprise Linux OpenStack Platform
 

Mehr von David Wallom

Quantifying the impact of green leasing on energy use in a retail portfolio: ...
Quantifying the impact of green leasing on energy use in a retail portfolio: ...Quantifying the impact of green leasing on energy use in a retail portfolio: ...
Quantifying the impact of green leasing on energy use in a retail portfolio: ...David Wallom
 
Trust and Cloud computing, removing the need for the consumer to trust their ...
Trust and Cloud computing, removing the need for the consumer to trust their ...Trust and Cloud computing, removing the need for the consumer to trust their ...
Trust and Cloud computing, removing the need for the consumer to trust their ...David Wallom
 
Trust and Cloud computing, removing the need for the consumer to trust their ...
Trust and Cloud computing, removing the need for the consumer to trust their ...Trust and Cloud computing, removing the need for the consumer to trust their ...
Trust and Cloud computing, removing the need for the consumer to trust their ...David Wallom
 
The University of Oxford e-Research Centre
The University of Oxford e-Research CentreThe University of Oxford e-Research Centre
The University of Oxford e-Research CentreDavid Wallom
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud ComputingDavid Wallom
 
Benefits of big data analytics in Smart Metering, ADEPT, WICKED and beyond
Benefits of big data analytics in Smart Metering,  ADEPT, WICKED and beyondBenefits of big data analytics in Smart Metering,  ADEPT, WICKED and beyond
Benefits of big data analytics in Smart Metering, ADEPT, WICKED and beyondDavid Wallom
 
Smarter Energy, Infrastruture service, consumtion analytics and applications
Smarter Energy, Infrastruture service, consumtion analytics and applicationsSmarter Energy, Infrastruture service, consumtion analytics and applications
Smarter Energy, Infrastruture service, consumtion analytics and applicationsDavid Wallom
 
The Climateprediction.net programme, big data climate modelling
The Climateprediction.net programme, big data climate modellingThe Climateprediction.net programme, big data climate modelling
The Climateprediction.net programme, big data climate modellingDavid Wallom
 
1990-2050 sulphur dioxide emissions data from ECLIPSE v5a for use in Met Offi...
1990-2050 sulphur dioxide emissions data from ECLIPSE v5a for use in Met Offi...1990-2050 sulphur dioxide emissions data from ECLIPSE v5a for use in Met Offi...
1990-2050 sulphur dioxide emissions data from ECLIPSE v5a for use in Met Offi...David Wallom
 
e-Research & the art of linking Astrophysics to Deforestation
e-Research & the art of linking Astrophysics to Deforestatione-Research & the art of linking Astrophysics to Deforestation
e-Research & the art of linking Astrophysics to DeforestationDavid Wallom
 
Privacy and Security policies in the cloud
Privacy and Security policies in the cloudPrivacy and Security policies in the cloud
Privacy and Security policies in the cloudDavid Wallom
 
Working with Earth Observation Data, INFORM and the IEA
Working with Earth Observation Data, INFORM and the IEAWorking with Earth Observation Data, INFORM and the IEA
Working with Earth Observation Data, INFORM and the IEADavid Wallom
 
WICKED - Working with the data rich
WICKED - Working with the data richWICKED - Working with the data rich
WICKED - Working with the data richDavid Wallom
 
Mapping Priorities and Future Collaborations for you Projects
Mapping Priorities and Future Collaborations for you ProjectsMapping Priorities and Future Collaborations for you Projects
Mapping Priorities and Future Collaborations for you ProjectsDavid Wallom
 
CloudWatch: Mapping priorities and future collaboration for your project
CloudWatch: Mapping priorities and future collaboration for your projectCloudWatch: Mapping priorities and future collaboration for your project
CloudWatch: Mapping priorities and future collaboration for your projectDavid Wallom
 
Trust and Cloud Computing, removing the need to trust your cloud provider
Trust and Cloud Computing, removing the need to trust your cloud providerTrust and Cloud Computing, removing the need to trust your cloud provider
Trust and Cloud Computing, removing the need to trust your cloud providerDavid Wallom
 
Generating Insight from Big Data
Generating Insight from Big DataGenerating Insight from Big Data
Generating Insight from Big DataDavid Wallom
 
International Forest Risk Model
International Forest Risk ModelInternational Forest Risk Model
International Forest Risk ModelDavid Wallom
 
Generating Insight from Big Data in Energy and the Environment
Generating Insight from Big Data in Energy and the EnvironmentGenerating Insight from Big Data in Energy and the Environment
Generating Insight from Big Data in Energy and the EnvironmentDavid Wallom
 
Smart Grid, Smart Metering and Cybersecurity
Smart Grid, Smart Metering and CybersecuritySmart Grid, Smart Metering and Cybersecurity
Smart Grid, Smart Metering and CybersecurityDavid Wallom
 

Mehr von David Wallom (20)

Quantifying the impact of green leasing on energy use in a retail portfolio: ...
Quantifying the impact of green leasing on energy use in a retail portfolio: ...Quantifying the impact of green leasing on energy use in a retail portfolio: ...
Quantifying the impact of green leasing on energy use in a retail portfolio: ...
 
Trust and Cloud computing, removing the need for the consumer to trust their ...
Trust and Cloud computing, removing the need for the consumer to trust their ...Trust and Cloud computing, removing the need for the consumer to trust their ...
Trust and Cloud computing, removing the need for the consumer to trust their ...
 
Trust and Cloud computing, removing the need for the consumer to trust their ...
Trust and Cloud computing, removing the need for the consumer to trust their ...Trust and Cloud computing, removing the need for the consumer to trust their ...
Trust and Cloud computing, removing the need for the consumer to trust their ...
 
The University of Oxford e-Research Centre
The University of Oxford e-Research CentreThe University of Oxford e-Research Centre
The University of Oxford e-Research Centre
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud Computing
 
Benefits of big data analytics in Smart Metering, ADEPT, WICKED and beyond
Benefits of big data analytics in Smart Metering,  ADEPT, WICKED and beyondBenefits of big data analytics in Smart Metering,  ADEPT, WICKED and beyond
Benefits of big data analytics in Smart Metering, ADEPT, WICKED and beyond
 
Smarter Energy, Infrastruture service, consumtion analytics and applications
Smarter Energy, Infrastruture service, consumtion analytics and applicationsSmarter Energy, Infrastruture service, consumtion analytics and applications
Smarter Energy, Infrastruture service, consumtion analytics and applications
 
The Climateprediction.net programme, big data climate modelling
The Climateprediction.net programme, big data climate modellingThe Climateprediction.net programme, big data climate modelling
The Climateprediction.net programme, big data climate modelling
 
1990-2050 sulphur dioxide emissions data from ECLIPSE v5a for use in Met Offi...
1990-2050 sulphur dioxide emissions data from ECLIPSE v5a for use in Met Offi...1990-2050 sulphur dioxide emissions data from ECLIPSE v5a for use in Met Offi...
1990-2050 sulphur dioxide emissions data from ECLIPSE v5a for use in Met Offi...
 
e-Research & the art of linking Astrophysics to Deforestation
e-Research & the art of linking Astrophysics to Deforestatione-Research & the art of linking Astrophysics to Deforestation
e-Research & the art of linking Astrophysics to Deforestation
 
Privacy and Security policies in the cloud
Privacy and Security policies in the cloudPrivacy and Security policies in the cloud
Privacy and Security policies in the cloud
 
Working with Earth Observation Data, INFORM and the IEA
Working with Earth Observation Data, INFORM and the IEAWorking with Earth Observation Data, INFORM and the IEA
Working with Earth Observation Data, INFORM and the IEA
 
WICKED - Working with the data rich
WICKED - Working with the data richWICKED - Working with the data rich
WICKED - Working with the data rich
 
Mapping Priorities and Future Collaborations for you Projects
Mapping Priorities and Future Collaborations for you ProjectsMapping Priorities and Future Collaborations for you Projects
Mapping Priorities and Future Collaborations for you Projects
 
CloudWatch: Mapping priorities and future collaboration for your project
CloudWatch: Mapping priorities and future collaboration for your projectCloudWatch: Mapping priorities and future collaboration for your project
CloudWatch: Mapping priorities and future collaboration for your project
 
Trust and Cloud Computing, removing the need to trust your cloud provider
Trust and Cloud Computing, removing the need to trust your cloud providerTrust and Cloud Computing, removing the need to trust your cloud provider
Trust and Cloud Computing, removing the need to trust your cloud provider
 
Generating Insight from Big Data
Generating Insight from Big DataGenerating Insight from Big Data
Generating Insight from Big Data
 
International Forest Risk Model
International Forest Risk ModelInternational Forest Risk Model
International Forest Risk Model
 
Generating Insight from Big Data in Energy and the Environment
Generating Insight from Big Data in Energy and the EnvironmentGenerating Insight from Big Data in Energy and the Environment
Generating Insight from Big Data in Energy and the Environment
 
Smart Grid, Smart Metering and Cybersecurity
Smart Grid, Smart Metering and CybersecuritySmart Grid, Smart Metering and Cybersecurity
Smart Grid, Smart Metering and Cybersecurity
 

Kürzlich hochgeladen

Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 

Kürzlich hochgeladen (20)

Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 

e-infrastructural needs to support informatics

  • 1. 1 e-Infrastructural needs to support informatics* David Wallom
  • 2. What is e-Infrastructure? The integration of digitally-based technology, resources, facilities, and services combined with people and organizational structures needed to support modern, collaborative research (and teaching). 1.Data and Storage 2.Software (and Algorithms) 3.Hardware (Compute) 4.Networks 5.Security and authentication 6.People (Collaboration, Skills, Capacity) 7.The Digital Library
  • 3. Bioinformatics software challenges • This brings a onslaught of new challenges for bioinformatics: – projects that used to require teams of 500 are now accessible to small teams – but biology curricula (i.e. biologists) still lack computational skills. – thus biologists are overwhelmed by large amounts of data – furthermore data types are young - so software is young, thus • software may be badly built (by biologists with no formal software dev training/xp). • software needs to be frequently updated (bugfixes, algorithmic improvements (sensitivity/specificity), new data type support). changes everything for biology
  • 4. ARCHER • UK National Supercomputing Service • Replacement for HECToR • LINPACK = 1.359 Pflop/s • EPSRC is the managing partner on behalf of RCUK • NERC are the other partner research council • Cray XC30 Hardware • Nodes based on 2× Intel Ivy Bridge 12-core processors • 64GB (or 128GB) memory per node • 3008 nodes in total (72162 cores) • Linked by Cray Aries interconnect (dragonfly topology)
  • 5. External Network inside JASMIN Unmanaged Cloud – IaaS, PaaS, SaaS JASMIN Internal Network Panasas storage Lotus Batch Compute JASMIN Cloud Architecture Standard Remote Access Protocols – ftp, http, … Managed Cloud - PaaS, SaaS JASMIN Analysis Platform VM Project1-org Science Analysis VM 0 Science Analysis VM 0 Science Analysis VM JASMIN Cloud Management Interfaces Direct File System Access Direct access to batch processing cluster Appliance Catalogue Firewall + NAT Firewall optirad-org Science Analysis VM 0 Science Analysis VM 0 IPython Slave VM File Server VM IPython JupyterHub VM eos-cloud-org Science Analysis VM 0 Science Analysis VM 0 EOSCloud VM File Server VM EOSCloud Fat Node IPython Notebook VM with access cluster through IPython.parallel EOSCoud Desktop as a Service with dynamic RAM boost Appliance Catalogue Appliance Catalogue Firewall + NAT Firewall + NAT Firewall Thanks to Phil Kershaw
  • 6.
  • 7. OTHER PUBLIC CLOUD PROVIDERS ARE AVAILABLE
  • 8. Bio-Linux: A scalable solution • Comprehensive, free bioinformatics workstation based on Ubuntu Linux and Debian Med • 11 years & 8 major releases • Around 8000 users from 1600 locations • 200+ bioinf packages including big integrative tools :- QIIME, Galaxy Server, PredictProtein, EMBOSS, ...Incorporates all software Dual BootLinux Live Local Servers Cloud
  • 9. Docker, simplifying the portability of applications and services
  • 10. EOS Cloud • A tenancy in the JASMIN Unmanaged Cloud (& QMUL RCC) • Reusing JASMIN web interfaces and user management to provide custom IaaS software platform • Each receives two VMs – Bio-Linux – Ubuntu Docker hosting environment • Users have total responsibility for instantiated system • Accessible though standard remote desktop tools • Scalability limited by support available
  • 11. Why Cloud? • Data sets can be too big or restricted to easily move – move the compute to the data – Researcher work patterns are maintained • Tools such as Bio-Linux/Docker etc are community enablers • More efficient use of shared resources • Central maintenance of infrastructure • Central Management of data sharing agreements possible • Lower barrier to entry (Compared to traditional HPC and Grid) • What type of cloud? • What role for traditional HPC? TRAINING IS KEY TO MAKING INFORMED CHOICES
  • 12. EOS Cloud next? • Expand currently available resource beyond current limitations? • Create deployable machine image for other cloud marketplaces • EOS/institutional badging to give users confidence in quality
  • 13. Pilot Users • CEH Bioinformaticians using the EOS Cloud to study patterns in microbial biodiversity • Genomic and transcriptomic data from fish toxicogenomics studies at Exeter
  • 14. Pilot Users • Creating compute pipelines and containers for each OSD in silico analysis – HPC, Cloud (IaaS & PaaS) – Portable • Run same analysis on different laptops/grids/clouds – Repeatable/Reproducible • Same input gives same output given that reference databases did not change – Preservation • All analysis tools and dependencies are in one image • Images are simple tar.gz • Preserving Docker and base images is preserving all analysis
  • 15. Software Sustainability Institute www.software.ac.uk The Software Sustainability Institute A national facility for cultivating better, more sustainable, research software to enable world- class research • Software reaches boundaries in its development cycle that prevent improvement, growth and adoption • Providing the expertise and services needed to negotiate to the next stage • Developing the policy and tools to support the community developing and using research software Supported by RCUK
  • 16. Communication Website & blog Campaigns Advice Guides Courses Workshops Fellowship Research Software Policy Training Community Consultancy 41 projects 92 evaluations 4 surgeries 33 UK SWC workshops 1000+ learners 50,000 readers 41 domain ambassadors 20+ workshops organised 740 researchers 50,000 grants analysed 150+ contributed articles 19,000 unique visitors per month 272 RSEs engaged 1700 signatures 13 issues highlighted
  • 17. The end of the beginning, not the beginning of the end! • A holistic approach is required with all parts of e-infrastructure supported from the Hard to Soft to Wet! • Good start up investments need continuity to ensure impact – Certain tools are foundations upon which large swathes of community depend – Putting tools next to immovable data ensures value! • Integrating with larger activities ensure benefits of scaling – you can’t steer something you’re not involved with… • Abstract underpinning e-infrastructure services from the users, as they’re not interested! – Run something on one resource should be able to be moved to others throug hthe use of standards etc! – I have ignored the institutional resources… ****WARNING**** Institute for Environmental Analytics Summer School on e-infrastructure for the environment 19th – 22nd Sept ’16, Oxford.

Hinweis der Redaktion

  1. Hosted through the Centre for Environmental Data Analysis(CEDA) at STFC the JASMIN service[13] is intended to be a multi user and multi use-case facility that is able to support a wide range of user communities. The system is built in a modular fashion with a hardware layer that is deliberately designed to support multiple different distributed computing paradigms, from HPC type workloads to cloud hosting. Utilising commercial off-the-shelf private cloud software, which supports API level access, the cloud service operated in two modes: firstly a highly controlled Platform as a Service system in the managed cloud, and secondly a more flexible community or user controlled Infrastructure as a Service system, the unmanaged cloud. Users may opt to run in the managed mode, where VMs are standardised to JASMIN approved templates and configured by Puppet[14] scripts that apply centralised access controls, or in the unmanaged mode which we use for the EOS Cloud. Here each user can have full root access rights on their VM once it is deployed and handed over to them, and we as clients of JASMIN can make use of the full vCloud Director API[15] to assign resources to VMs according to our own policy. A further advantage of working on JASMIN is that, as a community cloud, the service already hosts, and synchronises data from relevant providers. These include large reference datasets which our target user community finds useful. Users can rely on this resource and only need to concern themselves with the transfer of their own data into and out of the cloud.
  2. Bio-Linux[1] is a major success with widespread adoption across multiple communities. It provides a method by which personal bioinformatics workstations can be set up and maintained by novice users, removing the need for individuals to compile and install packages themselves. The project has an active user and developer community and engages with other open-source community efforts like Debian Med[2], Galaxy[3], Open Bioinformatics Foundation[4], etc. Workstations running Bio-Linux connect to a package repository where hundreds of tools have been built into Debian (.deb) packages and are tested and continually updated by the community of maintainers. Installation or removal of even complex tools is trivial for the end user, and updates are automatic. Bio-Linux has traditionally been installed onto bare-metal, possibly on a dual-boot workstation, or run directly from a USB stick for training and demo purposes. In recent times, use on local Virtual Machines(VM) and Cloud systems has increased, with a community CloudBioLinux[5] effort building and releasing machine images on the Amazon EC2[6] public cloud. Also, partly through this current project, support for local VM installations (on VirtualBox[7], Parallels[8], VMWare[9]) has been made a core part of each Bio-Linux release. Users can obtain an Open Virtualization Archive (OVA)[10] file which loads directly into these products to instantly provide a functional desktop system. It is this same pre-packaged virtualised workstation that users now access on the EOS Cloud.
  3. Docker[11] allows a uniform platform-independent application to be built and encapsulated in a format know as a container. As the research community, and people in general, make use of a large number of different operating systems the complexity of installing applications on new systems and can be high. Sometimes there may be incompatibilities between the libraries and helper tools needed by an application and those on the target platform. The Docker approach aims to get round this by providing single application hosting platform irrespective of hosting system/operating system etc. We have recently seen within the bioinformatics community an increase in sharing of applications and tools in this manner. One problem faced when using the Docker hosting environment is that the applications that are installed and running within it look and feel like remotely hosted applications, they do not behave as if they have been installed on the users machine. Therefore a tool that bridges this usability gap would greatly increase the acceptability of tools and services that are installed and distributed using Docker.
  4. Hyun Soon Gweon and Katja Lehmann, two bioinformaticians as CEH, are familiar with using Bio-Linux on the departmental server and personal VirtualBox VMs. They are trialling the system as part of their regular work and to provide a shared computing environment with project collaborators in microbial community profiling. Ranny van Aerle at University of Exeter is unable to run a Bio-Linux workstation at his home institute due to local IT policies and budget constraints. He needs a basic workstation plus occasionally some extra computing power. While he is not at present being charged for use of the pilot system, he represents exactly the type of user for whom a paid subscription to the service would make economic sense. Since being given access to a Bio-Linux instance on EOS Cloud in October 2014 he has used it to assemble bacterial genomes, annotate large metagenomics datasets using Blast and Diamond, visualise Blast/Diamond results in MEGAN and generally to try out new software and scripts.
  5. The Software Sustainability Institute can help with: software reviews and refactoring, collaborations to develop your project, guidance and best practice on software development, project management, community building, publicity and more… Drawing on pool of specialists to drive the continued improvement and impact of research software developed by and for researchers Providing services for research software users and developers Developing research community interactions and capacity Promoting research software best practice and capability
  6. KEY POINT: Get across what the Institute does This is a conceptual breakdown to make the Institute’s activities easier to manage. In reality the Institute staff work across themes, contributing to many activities, as software sustainability cannot be addressed just through individual activities working alone. Specific activities: Software Carpentry courses teach researchers key computational skills Helping to spin up ELIXIR-UK training activities DIRAC Drivers License to get best use out of HPC resources Advising Wellcome Trust on software policy Guides on wide range of subjects We understand the linkage between all of these requirements, issues and activities because we’ve investigated the research communities reliance on software in depth.