The case for cloud computing in Life Sciences

The case for cloud computing in the life sciences
Ola Spjuth <ola.spjuth@farmbio.uu.se>
Department of Pharmaceutical Biosciences
and Science for Life Laboratory
Uppsala University

About me
• Ola Spjuth, Docent
• Associate Professor at Uppsala University
– Data-intensive and translational bioinformatics (http://pharmb.io)
• Head of Bioinformatics Compute and Storage facility at SciLifeLab
– Responsible for managing resources
– Strategic e-infra planning and procurement for SciLifeLab
• Deputy Director at SNIC-UPPMAX HPC center
• Guest Researcher at Karolinska Institutet
– e-Science for Cancer Prevention and Control (eCPC), flagship project at SeRC

Molecular biology is a field in transition…
From conventional microscopes…
…to digital video-microscopes and image analysis

From manual operations…
…to automated robotized laboratories

Today: We have access to high-throughput technologies to study biological phenomena

Science for Life Laboratory
An internationally leading center that develops and applies large-scale technologies for molecular biosciences with a focus on health and environment.
Became a national platform in 2013
Stockholm node
Uppsala node

Massively parallel sequencing…
2017: Human whole genome sequenced in 3 days for ~$1100
…requires supercomputers for analysis and storage
2017: Illumina HiSeq X systems: 15K whole human genomes per year
2016: NGI data velocity 950 Mbp/minute ≈ 16 Mbp/s
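The throughput figures above can be sanity-checked with simple unit arithmetic. The sketch below converts a rate in Mbp per hour or per minute into Mbp per second, and a yearly genome count into a daily one; it uses only the numbers quoted on the slide, and the helper names are illustrative.

```python
# Unit conversions for sequencing throughput (base pairs per unit of time).
SECONDS_PER_MINUTE = 60
SECONDS_PER_HOUR = 3600
DAYS_PER_YEAR = 365


def mbp_per_second(rate_mbp: float, period_seconds: float) -> float:
    """Convert a rate given in Mbp per period into Mbp per second."""
    return rate_mbp / period_seconds


def genomes_per_day(genomes_per_year: float) -> float:
    """Average daily throughput implied by a yearly genome count."""
    return genomes_per_year / DAYS_PER_YEAR


if __name__ == "__main__":
    # 950 Mbp per hour is well below 1 Mbp/s...
    print(f"950 Mbp/hour   = {mbp_per_second(950, SECONDS_PER_HOUR):.2f} Mbp/s")
    # ...while 950 Mbp per minute is roughly 16 Mbp/s.
    print(f"950 Mbp/minute = {mbp_per_second(950, SECONDS_PER_MINUTE):.1f} Mbp/s")
    # 15K genomes per year, spread evenly over the year.
    print(f"15K genomes/year = {genomes_per_day(15_000):.0f} genomes/day")
```

Running it shows that 950 Mbp per hour is only about 0.26 Mbp/s, whereas 950 Mbp per minute is about 16 Mbp/s, and that 15K genomes per year means roughly 41 genomes every day.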

Mode of operation
[Diagram: scientists send samples to the platforms, which do pre-processing (NGI) and deliver data for research and analysis (SNIC)]

UPPMAX: A national e-infrastructure
• Compute resources
• Storage resources
• Software + reference data
• Support
• Education
• Efficiency + automation

Some statistics
[Charts: storage usage, projects at SNIC-UPPMAX, and support tickets, comparing data-intensive bioinformatics with other disciplines]

New challenges: Data management and analysis
• Storage
• Analysis methods, pipelines
• Scaling
• Automation
• Data integration, security
• Predictions
• …

Why cloud in the life sciences?
• Access to resources
– Flexible configurations
– On-demand
– Cost-efficient?
• Collaborate on international level
– Publish/federate data
– E.g. large sequencing initiatives, “move compute to the data”
• New types of analysis environments
– Hadoop/Spark/Flink etc.
– Microservices, Docker, Kubernetes, Mesos
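Why “move compute to the data” matters becomes clearer with a back-of-the-envelope transfer-time estimate. The sketch below assumes a hypothetical 100 TB dataset and a 10 Gbit/s link; the `efficiency` parameter is an illustrative assumption standing in for protocol overhead and network contention, not a measured value.

```python
def transfer_hours(dataset_tb: float, link_gbit_s: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move a dataset over a network link.

    dataset_tb: dataset size in terabytes (10**12 bytes).
    link_gbit_s: nominal link speed in gigabits per second.
    efficiency: fraction of the nominal speed actually achieved (0 < efficiency <= 1).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = dataset_tb * 1e12 * 8                      # bytes -> bits
    seconds = bits / (link_gbit_s * 1e9 * efficiency)  # effective bits per second
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical example: a 100 TB cohort over a 10 Gbit/s link.
    print(f"{transfer_hours(100, 10):.1f} h at full line rate")
    print(f"{transfer_hours(100, 10, efficiency=0.5):.1f} h at 50% efficiency")
```

Even at full line rate the copy takes almost a day, and it has to be repeated for every site that wants its own copy; shipping a container with the analysis to where the data already lives avoids that cost.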

Challenges with cloud
• Tradition: Strong HPC tradition in academia
– Existing resources and personnel at 6 centers in Sweden (SNIC), funded by the Swedish Research Council
• Economy: Cost model is new
– Difficult to assess the costs
• Legal: Working with sensitive data
• Educational: New technology for many
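One way to make the cost question concrete is a break-even utilization: the fraction of time an owned cluster must be busy before it becomes cheaper than renting the same capacity on demand. This is a simplified model for illustration, not a cost model from the source; all prices below are hypothetical, and it ignores data egress fees, storage costs and queue waiting time.

```python
HOURS_PER_YEAR = 365 * 24


def breakeven_utilization(owned_cost_per_year: float,
                          cloud_price_per_core_hour: float,
                          cores: int) -> float:
    """Fraction of the year an owned cluster must be busy to beat on-demand cloud.

    owned_cost_per_year: yearly cost of owning the cluster (depreciation, power,
        cooling, personnel), paid whether or not the cluster is used.
    cloud_price_per_core_hour: on-demand price for one core for one hour.
    cores: number of cores in the owned cluster.
    """
    # Cost of renting the same capacity around the clock for a full year.
    cloud_cost_full_year = cloud_price_per_core_hour * cores * HOURS_PER_YEAR
    # Owning wins once the cloud bill for the hours actually used exceeds the fixed cost.
    return owned_cost_per_year / cloud_cost_full_year


if __name__ == "__main__":
    # Hypothetical figures: 1,000 cores costing 200,000 per year to own,
    # versus 0.05 per core-hour on demand (same currency).
    u = breakeven_utilization(200_000, 0.05, 1_000)
    print(f"Owning is cheaper above about {u:.0%} utilization")
```

A result of 1.0 or more means that, at those prices, renting on demand is cheaper even for a cluster that is busy all year.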

Needs in bioinformatics
• Primarily resources with a lot of RAM and storage (high I/O)
• Preferably a transparent system; users don’t want to deal with e-infrastructure at all
• How to work with storage (tiered?)
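The slide leaves open how storage should be organized. A common approach is to tier data by how recently it was used; a minimal sketch, assuming a simple last-access-age policy with hypothetical 30- and 180-day thresholds and tier names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable


@dataclass
class StoredFile:
    path: str
    size_gb: float
    last_access: datetime


def assign_tier(f: StoredFile, now: datetime,
                hot_days: int = 30, warm_days: int = 180) -> str:
    """Pick a storage tier from the time since the file was last accessed."""
    age = now - f.last_access
    if age <= timedelta(days=hot_days):
        return "hot"   # fast, high-I/O storage for active analyses
    if age <= timedelta(days=warm_days):
        return "warm"  # cheaper disk or object storage
    return "cold"      # archive tier for rarely read data


def plan(files: Iterable[StoredFile], now: datetime) -> Dict[str, float]:
    """Total gigabytes that would end up on each tier."""
    totals = {"hot": 0.0, "warm": 0.0, "cold": 0.0}
    for f in files:
        totals[assign_tier(f, now)] += f.size_gb
    return totals


if __name__ == "__main__":
    now = datetime(2017, 6, 1)
    files = [  # hypothetical project files
        StoredFile("project1/sample01.bam", 120.0, now - timedelta(days=3)),
        StoredFile("project1/sample02.bam", 115.0, now - timedelta(days=90)),
        StoredFile("project0/raw.fastq.gz", 300.0, now - timedelta(days=400)),
    ]
    print(plan(files, now))
```

Real systems also weigh file size, project status and recall cost, but even this simple rule keeps the expensive high-I/O tier reserved for data that is actually being analyzed.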

My research focus
• e-infrastructure development: automation, Big Data
• e-Science methods development: prediction models, machine learning
• Applied e-Science research: drug discovery and individualized diagnostics

Selected research in my group
• Data management and predictive modeling
• Privacy preservation
• Workflows
• Big Data frameworks
• Data federation
• Compute federation

Reactive/continuous modeling
[Diagram: data sources are coordinated, integrated, versioned and monitored; models are trained and assessed, then published and archived for the user, e.g. in Bioclipse]
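The reactive/continuous modeling loop (monitor the data sources, integrate and version the data, retrain, publish, archive) can be sketched in a few lines. This is a hypothetical, in-memory illustration: a content hash stands in for monitoring, and `train` is any user-supplied function. It is not the group's actual implementation or a Bioclipse API.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ModelVersion:
    version: int
    data_fingerprint: str  # hash of the training data this model was built from
    model: object


@dataclass
class ContinuousModeler:
    # Hypothetical user-supplied training function: records -> model.
    train: Callable[[List[float]], object]
    published: Optional[ModelVersion] = None
    archive: List[ModelVersion] = field(default_factory=list)

    @staticmethod
    def fingerprint(records: List[float]) -> str:
        """Hash the integrated data so that any change in a source is detected."""
        return hashlib.sha256(repr(records).encode("utf-8")).hexdigest()

    def update(self, sources: Dict[str, List[float]]) -> bool:
        """Integrate sources, retrain if the data changed, publish a new version.

        Returns True if a new model version was published.
        """
        # Integrate: concatenate records from all sources in a deterministic order.
        records = [x for name in sorted(sources) for x in sources[name]]
        fp = self.fingerprint(records)
        # Monitor: skip retraining when the data is unchanged.
        if self.published is not None and self.published.data_fingerprint == fp:
            return False
        # Train, then archive the previous model and publish the new version.
        model = self.train(records)
        next_version = 1 if self.published is None else self.published.version + 1
        if self.published is not None:
            self.archive.append(self.published)
        self.published = ModelVersion(next_version, fp, model)
        return True


if __name__ == "__main__":
    # Toy "model": the mean of all measurements (purely illustrative).
    modeler = ContinuousModeler(train=lambda xs: sum(xs) / len(xs))
    print(modeler.update({"assay_a": [1.0, 2.0], "assay_b": [3.0]}))       # first data
    print(modeler.update({"assay_a": [1.0, 2.0], "assay_b": [3.0]}))       # unchanged
    print(modeler.update({"assay_a": [1.0, 2.0], "assay_b": [3.0, 6.0]}))  # source updated
    print(modeler.published.version, modeler.published.model, len(modeler.archive))
```

Keying each model version to a fingerprint of its training data is what makes published models traceable: any archived model can be matched to exactly the data it was trained on.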

Virtual Research Environments
[Diagram: a researcher and other researchers, each with their own separate tools and data]
VREs aim to bridge this gap!
Virtual Research Environments
[Diagram: a Virtual Research Environment connects the researcher and other researchers with shared tools, data, and compute and storage resources]

PhenoMeNal approach and stack: KubeNow
• Enable users to deploy their own virtual infrastructure on an IaaS provider
• Containerize tools, orchestrate with workflow systems on top of Kubernetes
[Diagram: the KubeNow stack, built with Packer, Terraform, kubeadm, kubectl and Cloudflare]
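Running a containerized tool on Kubernetes usually comes down to submitting a workload object such as a batch Job. A minimal sketch that builds a `batch/v1` Job manifest in Python and prints it as JSON, which `kubectl apply -f` accepts; the job name, image, command and resource requests are hypothetical, and in the setup described above a workflow system would typically submit such workloads on the user's behalf.

```python
import json
from typing import Dict, List


def tool_job_manifest(name: str, image: str, command: List[str],
                      cpu: str = "1", memory: str = "2Gi") -> Dict:
    """Build a Kubernetes batch/v1 Job manifest that runs one containerized tool once."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 2,  # retry a failed pod at most twice
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # Jobs require Never or OnFailure
                    "containers": [{
                        "name": name,
                        "image": image,
                        "command": command,
                        "resources": {"requests": {"cpu": cpu, "memory": memory}},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = tool_job_manifest(
        name="peak-picking-sample-01",                       # hypothetical job name
        image="registry.example.org/tools/peak-picker:1.0",  # hypothetical image
        command=["peak-picker", "--input", "/data/sample01.mzML"],  # hypothetical CLI
    )
    # Save the output to job.json and submit it with: kubectl apply -f job.json
    print(json.dumps(manifest, indent=2))
```

Because each tool runs as its own container with declared resource requests, the Kubernetes scheduler can place it on any node of the user's virtual cluster, regardless of which IaaS provider hosts that cluster.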

Hierarchical Analysis of Temporal and Spatial Image Data
Carolina Wählby
PI, PhD, Professor in Quantitative Microscopy
Andreas Hellander
Co-PI, Associate Professor
Ola Spjuth
Co-PI, Associate Professor
www.cb.uu.se/~carolina/HATSID.html

Presenting at Spark Summit 2017:
“EasyMapReduce: Leverage the power of Spark And Docker To scale scientific tools in MapReduce fashion”
https://spark-summit.org/east-2017/events/easymapreduce-leverage-the-power-of-spark-and-docker-to-scale-scientific-tools-in-mapreduce-fashion/
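The MapReduce idea behind the talk can be illustrated without Spark or Docker: split the input, run an unmodified tool on each chunk in parallel (map), then merge the partial outputs (reduce). This plain-Python sketch is an analogy, not EasyMapReduce's API; `tool` is a hypothetical stand-in that counts nucleotides in DNA reads.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List


def split_into_chunks(records: List[str], n_chunks: int) -> List[List[str]]:
    """Split records into at most n_chunks contiguous, roughly equal chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def tool(chunk: List[str]) -> Counter:
    """Hypothetical stand-in for a scientific tool: counts nucleotides in reads."""
    counts: Counter = Counter()
    for read in chunk:
        counts.update(read)
    return counts


def map_reduce(records: List[str], n_chunks: int = 4) -> Counter:
    chunks = split_into_chunks(records, n_chunks)
    # Map: run the tool on every chunk in parallel.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partial_results = list(pool.map(tool, chunks))
    # Reduce: merge the partial outputs into one result.
    return reduce(lambda a, b: a + b, partial_results, Counter())


if __name__ == "__main__":
    reads = ["ACGT", "AAGG", "TTCA", "GGGC", "ACAC"]
    print(map_reduce(reads, n_chunks=2))
```

Threads are enough here because a real tool would run as a separate process (for example via `subprocess` or a container runtime), so the Python threads only wait on it; Spark applies the same split, map and reduce steps across the nodes of a cluster.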

Our most recent scientific publication
http://jcheminf.springeropen.com/articles/10.1186/s13321-017-0204-4

European Open Science Cloud (EOSC)
• The vast majority of all data in the world (in fact up to 90%) has been generated in the last two years.
• Scientific data is in direct need of openness, better handling, careful management, machine actionability and sheer re-use.
• European Open Science Cloud: A vision of a future infrastructure to support Open Research Data and Open Science in Europe
– It should enable trusted access to services, systems and the re-use of shared scientific data across disciplinary, social and geographical borders
– research data should be findable, accessible, interoperable and re-usable (FAIR)
– provide the means to analyze datasets of huge sizes
http://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud
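A crude way to make the FAIR principles operational is to check that a dataset's metadata record carries the fields each principle depends on. The field names and the mapping to principles below are a simplified, hypothetical checklist for illustration, not an official FAIR metric or an EOSC specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    identifier: Optional[str] = None   # persistent identifier, e.g. a DOI
    title: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    access_url: Optional[str] = None   # where and how the data can be retrieved
    data_format: Optional[str] = None  # ideally an open, community-standard format
    license: Optional[str] = None
    provenance: Optional[str] = None   # how and by whom the data was produced


def fair_gaps(record: DatasetRecord) -> Dict[str, List[str]]:
    """List the metadata fields missing for each FAIR principle (simplified)."""
    checks = {
        "findable": [("identifier", record.identifier), ("title", record.title),
                     ("keywords", record.keywords)],
        "accessible": [("access_url", record.access_url)],
        "interoperable": [("data_format", record.data_format)],
        "reusable": [("license", record.license), ("provenance", record.provenance)],
    }
    # A field counts as missing if it is None or empty.
    return {principle: [name for name, value in fields if not value]
            for principle, fields in checks.items()}


if __name__ == "__main__":
    record = DatasetRecord(
        identifier="https://doi.org/10.1234/example",  # hypothetical DOI
        title="Example cohort variant calls",          # hypothetical dataset
        access_url="https://data.example.org/cohort",  # hypothetical endpoint
        license="CC-BY-4.0",
    )
    print(fair_gaps(record))
```

Machine-checkable metadata like this is part of what “machine actionability” means in practice: a catalog can flag incomplete records automatically instead of relying on manual review.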
