While cost is a primary "c" driving the adoption of object-based cloud solutions in the life sciences, compute, capacity, and collaboration may all be bigger incentives. In this webinar, we'll examine how to use an Avere Hybrid Cloud NAS infrastructure to gain big benefits in areas like genomics research, personalized medicine, drug discovery, imaging, and other data analysis applications.
• Compute - Building production environments in the compute cloud without rewriting existing applications
• Capacity - Modernizing storage archives and disaster recovery by adding object storage for durability while leveraging existing on-premises NAS
• Collaboration - Using the cloud to safely and securely share data globally
• Cost - Using cloud to lower overall costs to keep pace with fast-growing demands of research initiatives
1. 4 C’s for Using Cloud to Support Scientific Research
2. Jeff Tabor
Sr. Director of Product Management & Marketing
Avere Systems
Jeff Tabor has worked at Avere Systems for six years, leading product definition and marketing efforts. Prior to Avere, Jeff managed clustered NAS solutions at NetApp and Spinnaker Networks. Jeff holds Bachelor’s and Master’s degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology.
3. Scott Jeschonek
Director of Product Management
Avere Systems
Scott brings more than twenty years of enterprise, telecommunications, and vendor experience, which gives him a unique perspective on the implications of the cloud phenomenon. After working with several technology companies, Scott joined Avere in early 2014, where he is responsible for the software roadmap.
4. Agenda
• Cloud overview
– Opportunities & challenges
• Cloud benefits for scientific research – “The 4 C’s”
– Compute scaling
– Capacity scaling
– Collaboration across global enterprise
– Cost savings
• Cloud demo
– Running scientific apps
– Storing scientific data
5. Poll Question #1
Which one of the 4 “Cs” do you think would
be the best use for cloud in your organization?
A. Compute
B. Capacity
C. Collaboration
D. Cost
8-10. Hybrid Cloud
[Diagram: four quadrants — On-Prem Compute and On-Prem Storage (NAS, Object) beside Compute Cloud (app servers) and Storage Cloud (Buckets 1-n)]
• On-Prem Compute – app servers, compute farm, desktops
• On-Prem Storage – NAS and object; multiple tiers of storage
• Compute Cloud – near-infinite compute; cloud bursting; permanent infrastructure
• Storage Cloud – near-infinite capacity; mostly backup and archive today
11-15. Cloud is Attractive BUT has Challenges
[Diagram: the same hybrid cloud quadrants, with S3 interfaces on the storage cloud and single-node gateways in front of it]
Cloud challenges:
1. Unfamiliar object-based interface (S3)
2. High latency to remote storage (10-100 ms or more)
3. No easy on-ramp to cloud storage
4. Cloud gateways do NOT scale (single-node gateways)
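To make the latency challenge concrete, here is a rough back-of-the-envelope sketch of what the 10-100 ms WAN round trip cited above does to a serial, metadata-heavy workload. The op count and per-op latencies below are illustrative assumptions, not measurements:

```shell
# Illustrative only: a serial workload issuing one NFS op at a time.
ops=10000      # assumed number of sequential ops (stat/open/read)
lan_ms=1       # assumed on-prem round trip
wan_ms=50      # mid-range of the 10-100 ms WAN latency cited above

wall_time_s() {
  # total wall time in seconds for $1 ops at $2 ms per round trip
  awk -v n="$1" -v ms="$2" 'BEGIN { printf "%d", n * ms / 1000 }'
}

echo "LAN: $(wall_time_s "$ops" "$lan_ms") s"   # 10 s
echo "WAN: $(wall_time_s "$ops" "$wan_ms") s"   # 500 s
```

Because each op waits for the previous one, wall time scales linearly with round-trip latency, which is why hiding that latency matters more than raw bandwidth for these workloads.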
16. Compute
• Cloud benefits
– Unlimited compute capacity (time to market)
– Cost effective (easy to turn on and turn off; zero footprint)
• Cloud challenges
– High latency to data (performance impact; the compute cloud is NOT intended for storage)
– Apps need a familiar file system interface (no change to apps)
17-18. Cloud Computing with Avere
[Diagram: a physical Edge Filer on-prem and a virtual Edge Filer in the compute cloud cache data near compute and hide latency to NAS, object, and cloud storage]
Customer Need
• Performance for tier-1 applications
• Avoid latency to storage
• Don’t replicate all data to compute cloud
Recommended Solution
• Automatic caching of active data only
• Handle reads, writes, and metadata ops
• Hide latency (50:1 or more offload)
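The 50:1 offload figure can be translated into an effective-latency estimate. A sketch, where the 1 ms cache latency and 50 ms WAN latency are assumed values for illustration:

```shell
# If the edge cache answers 49 of every 50 ops locally and sends 1 over
# the WAN, the average per-op latency collapses toward the cache latency.
effective_latency_ms() {
  # $1 = offload ratio, $2 = cache latency (ms), $3 = WAN latency (ms)
  awk -v r="$1" -v c="$2" -v w="$3" \
    'BEGIN { printf "%.1f", ((r - 1) * c + w) / r }'
}
echo "$(effective_latency_ms 50 1 50) ms"   # 2.0 ms average vs 50 ms uncached
```

The point of the arithmetic: at high offload ratios, WAN latency is amortized across many locally serviced operations, so clients see near-local response times.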
19-20. Cloud Computing with Avere
[Diagram: physical and virtual Edge Filers, clustered from 3 to 50 nodes]
Customer Need
• Keep pace with growing demand
• Avoid disruptive upgrades
• Start small, grow huge at “click of a button”
Recommended Solution
• Scale-out NAS provided via clustering (3 to 50 nodes)
• Performance scaling (more CPUs, DRAM)
• Capacity scaling (more SSD for better hit rate)
21. Capacity
• Cloud benefits
– Unlimited storage capacity
– Simple to manage (no maintenance, easy scaling)
– Pay only for what you use
• Cloud challenges
– Protect investment in on-prem storage
– Unfamiliar access protocol (e.g. S3 API)
– Missing features
• Security
• Compression
• Snapshots
23. Use Case – Inova Health System
Enabling Personalized Medicine with a Hybrid Cloud
Customer Challenges
– Scale to petabytes of capacity
– Scale performance for genomic analysis
– Contain cost (opex, not capex)
– Security & compliance
Avere + AWS Benefits
– 6,300 whole genomes, 1.3PB, and 7M files in AWS
– Genomic analysis results in hours, not days
– Saved more than $10 million
– Avere encryption & AWS HIPAA compliance
24. Use Case – Large Pharmaceutical Company
[Diagram: a GNS spans on-prem storage (NAS and object, physical FXT) and AWS (virtual FXT and virtual compute farm on EC2, Buckets 1-n on S3); FlashMove migrates existing apps’ data to object storage and new apps’ data to the public cloud]
Public Cloud Services
• New apps, built for cloud
• FlashMove to public cloud – lowest cost, simplest management
• Centralized resources – collaboration; scalable, most efficient
On-Prem Resources
• Existing apps, no changes
• Data that cannot move to public cloud (security)
• FlashMove to object storage – lower cost, better scaling
• Geo-dispersal, no replication – simpler to manage, more efficient
25. Use Case – Next Gen Sequencing at CDC
[Diagram: Datacenter A (centralized, scalable storage: 11-node FXT cluster, EMC Isilon (510PB)); Datacenter B (legacy: 3-node FXT cluster, older EMC Isilon); remote labs generate (SMB), process (NFS), and interpret (SMB) data against central storage through a GNS, with caching at the edge to simplify management]
Customer Challenges
– Isilon performance at scale
– Isilon cost
– Legacy Isilon
– Distributed users and storage
– Network saturation and latency
Avere Benefits
– Scalable performance for SMB and NFS
– Reduced Isilon spend by 33%
– GNS provides central mount point for all storage
– Hide network latency to remote labs
– Add public or private cloud in future
26. Collaboration
• Cloud benefits
– Centralized data
– Accessible from many geographic regions
• Cloud challenges
– Hide latency to centralized data
– Read and write access coordination
• One writer, many readers
• Many writers, minimal sharing
• Many writers to shared files
29. Hybrid Cloud NAS for Global Enterprises
[Diagram: a primary datacenter linked to cloud computing (public/private), a remote office, and a secondary/partner/colo datacenter, each running virtual or physical Avere nodes]
32. Cost
• Cloud benefits
– Economy of scale
– Zero footprint, power, and cooling
– Pay only for what you use (capacity & compute)
– Simplified management
• Cloud challenge/change
– Opex, not capex
33. Avere + Amazon TCO Example
Customer Reqs: 200k IOPS; 200TB primary; 1PB Amazon S3
Old Way (NetApp or EMC)
• T1 – 200k IOPS + 200TB primary: NetApp 6000+SAS or EMC Isilon S200
• T2 – 1PB archive: NetApp 3000+SATA or EMC Isilon NL400 (populated by manual copy)
New Way (Avere + Amazon)
• Avere 3800 (4x) – 200k IOPS
• T2 – 200TB primary: private object or legacy NAS
• AWS – 1PB storage: AWS S3, 1,000TB (populated by FlashMove)

TCO Saving with Avere + Amazon (3 years):
Cost                    NetApp or Isilon    Avere + Amazon S3
Storage Acquisition     $2,067,000          $298,000
Service                 $930,000            $134,000
Amazon S3 (capacity)    $0                  $1,032,000
Amazon S3 (data out)    $0                  $298,000
Storage Admin           $1,080,000          $196,000
Facilities & Power      $264,000            $43,000
Data Migration          $360,000            $0
3-year TCO              $4,701,000          $2,000,000

Avere + AWS TCO savings: $2,701,000 (57%)
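The column totals above can be sanity-checked directly. Note that the Avere column actually sums to $2,001,000, which the slide rounds to $2,000,000 (hence its quoted savings of $2,701,000 versus the exact $2,700,000):

```shell
# Re-add the 3-year TCO columns from the table above.
netapp=$((2067000 + 930000 + 0 + 0 + 1080000 + 264000 + 360000))
avere=$((298000 + 134000 + 1032000 + 298000 + 196000 + 43000 + 0))
echo "NetApp/Isilon 3-year TCO: \$$netapp"   # $4701000, as listed
echo "Avere + AWS 3-year TCO:   \$$avere"    # $2001000 (slide rounds down)
awk -v a="$netapp" -v b="$avere" \
  'BEGIN { printf "Savings: $%d (%.0f%%)\n", a - b, 100 * (a - b) / a }'
```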
34. Comparing 1,000,000 IOPS Solutions*
[Chart: throughput (IOPS) vs latency/ORT (ms) — Avere $2.3/IOPS, NetApp $5.1/IOPS, EMC Isilon $10.7/IOPS]

Product          Throughput (IOPS)  Latency/ORT (ms)  List Price   $/IOPS  Disks  Rack Units  Cabinets  Config
Avere FXT 3800   1,592,334          1.24              $3,637,500   $2.3    549    76          1.8       32-node cluster, cloud storage config
NetApp FAS 6240  1,512,784          1.53              $7,666,000   $5.1    1,728  436         12        24-node cluster
EMC Isilon S200  1,112,705          2.54              $11,903,540  $10.7   3,360  288         7         140-node cluster

*Comparing top SPEC SFS results for a single NFS file system/namespace. See www.spec.org/sfs2008 for more information.
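The $/IOPS column follows directly from list price divided by measured SPEC SFS throughput; a quick check of the table’s figures:

```shell
# $/IOPS = list price / measured SPEC SFS throughput.
price_per_iops() {
  awk -v p="$1" -v i="$2" 'BEGIN { printf "%.1f\n", p / i }'
}
price_per_iops 3637500  1592334   # Avere FXT 3800  -> 2.3
price_per_iops 7666000  1512784   # NetApp FAS 6240 -> 5.1
price_per_iops 11903540 1112705   # EMC Isilon S200 -> 10.7
```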
35-36. Comparing 1,000,000 IOPS Solutions (continued)
[Same chart and table as slide 34, highlighting the Avere 32-node FXT cluster and its core filer options: NAS, public object, or private object]
37. Poll Question #2
What applications are you looking to deploy in
the cloud?
A. Business apps (e.g. email, DB, knowledge
base)
B. File services (e.g. file serving, home dirs,
document management, patient
management)
C. High-performance apps (e.g. genomic sequencing, imaging, bioinformatics)
D. Backup and archival only
E. Other (please specify in chat)
39. Cloud Bursting Use Case – Life Sciences
Proprietary & Confidential
[Diagram: on-prem compute and NAS fronted by a physical FXT; a virtual FXT and virtual compute farm in cloud compute]
Customer Challenges
• Add compute resources at peak times
• Need for 3-6 months, no long-term commitment
• Do NOT want to rewrite applications
• May move data to the cloud for capacity scaling (later)
Avere Benefits
• Virtual FXT provides scalable NAS in Cloud Compute
• Hide latency to on-prem NAS and object storage
• Easy setup, easy teardown
• Pay only for what is used
• Future: move data to the cloud for better economics
40. Cloud Bursting + Archive Use Case – Life Sciences
[Diagram: the cloud-bursting setup plus cloud storage Buckets 1-n]
Customer Challenges
• Add compute resources at peak times
• Need for 3-6 months, no long-term commitment
• Do NOT want to rewrite applications
• Wishes to post results for general access in lower-cost cloud storage
Avere Benefits
• Provide file system access to virtually unlimited cloud storage
• Provide a Global NameSpace (GNS) directory structure between on-prem and cloud for application continuity
41. Hybrid Galaxy-in-Cloud – Life Sciences
[Diagram: a sequencer feeds on-prem compute and NAS (physical FXT); a virtual FXT and virtual compute farm run in cloud compute, backed by cloud storage Buckets 1-n]
Customer Challenges
• Never enough compute nodes on-prem
• Linear OPEX costs for new on-prem capacity; seeking to reduce
• Requires a file system and performance for continuity
• Storage flexibility must remain for compliance, etc.
Avere Benefits
• Run the application entirely in the cloud and eliminate on-prem compute, no matter the size of the organization
42. Galaxy-in-Cloud – Life Sciences
[Diagram: a small company or research lab runs entirely in the cloud with a virtual FXT, a virtual compute farm, and storage Buckets 1-n]
Customer Challenges
• Company or project too small to justify on-prem costs
• But needs a file system with sufficient performance characteristics
• Duration of project may be short
Avere Benefits
• No on-prem requirement
• Running nodes only when you need them helps reduce overall costs
• The ultimate flexibility
43. Galaxy in Cloud
[Architecture: a compute instance (8 CPUs, 8GB memory, 500GB disk or less) running Debian Linux with the Galaxy server and all tools installed; it NFS-mounts a 3-node Avere vFXT cluster at /galaxy-data; the vFXT maps to a single cloud storage bucket (/gcs) via the S3 protocol, with the file system maintained by Avere on top of the object storage]
44. Galaxy in Cloud
/etc/fstab maps the NFS export into the vFXT environment as /mnt/galaxy-data.
The Galaxy server is locally installed, though it could also be located in the cloud storage and mounted via the vFXT.
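A minimal sketch of what that /etc/fstab mapping might look like. The vFXT vserver address (10.0.0.10), export name (/galaxy-data), and mount options here are assumptions for illustration, not the exact values from the demo:

```shell
# Hypothetical: mount the vFXT cluster's NFS export at /mnt/galaxy-data.
sudo mkdir -p /mnt/galaxy-data

# /etc/fstab entry (address, export path, and options are assumed examples):
echo '10.0.0.10:/galaxy-data  /mnt/galaxy-data  nfs  rw,hard,proto=tcp,vers=3  0 0' \
  | sudo tee -a /etc/fstab

sudo mount /mnt/galaxy-data   # Galaxy then reads/writes under /mnt/galaxy-data
```

With the mount in place, Galaxy sees an ordinary POSIX directory while the vFXT handles the S3 traffic to the bucket behind it.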
55. The Power of Caching
[Same architecture: the 3-node Avere vFXT cluster maps a single cloud bucket (S3 protocol) into a file system and exports /galaxy-data over NFS]
Multiple Galaxy clients calling for:
• Reference genomes
• The same output files
Read / read-ahead caching ensures faster response and minimizes cloud bucket “hits”.
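The effect described above can be sketched with a toy cache: many clients asking for the same reference genome generate only one fetch from the bucket. The object names and client count are made up for illustration:

```shell
# Toy read cache: the first request for an object is a simulated S3 GET;
# repeat requests are served from the cache.
cached=""
bucket_fetches=0
get() {
  case " $cached " in
    *" $1 "*) : ;;                              # cache hit, served locally
    *) bucket_fetches=$((bucket_fetches + 1))   # cache miss: one bucket "hit"
       cached="$cached $1" ;;
  esac
}
for client in 1 2 3 4 5; do get hg19.fa; done   # five clients, same genome
get sample1_output.bam
echo "bucket fetches: $bucket_fetches"          # 2, not 6
```

Six requests produce only two bucket fetches, one per unique object, which is the mechanism behind both the faster response and the reduced S3 request traffic.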
56. Thank you!
Questions?
For more information, visit:
averesystems.com
Contact Us at:
888.88.AVERE
askavere@averesystems.com
Editor’s notes
Gretchen does speaker intro and hands off to Jeff.
Thanks Gretchen.
As Gretchen said, our topic today is Best Practices for Architecting Enterprise NAS in the Hybrid Cloud.
If you have questions, please send them via chat and we will get to them at the end.
I will attack this topic in three steps. (This overview slide shows the path I will take to get there.)
First, we define the Hybrid Cloud to make sure we are all on the same page. The Hybrid Cloud brings lots of benefits but it brings challenges as well, and we will discuss these.
Next, there’s an early generation of products called Cloud Gateways helping customers use the cloud and we will look at the pros and cons of these.
Last, we will talk about deploying Enterprise NAS apps in the cloud, how the requirements go beyond what typical Cloud Gateways offer today and the best practices for building robust and scalable Enterprise NAS apps in the cloud.
Let’s talk about the Hybrid Cloud. This diagram shows the way I am defining it.
I hope you like this slide since you will see a lot of it during this presentation.
The Hybrid Cloud encompasses all the IT resources that an admin or architect has available to him or her.
They have an on-prem compute + storage infrastructure and a parallel compute + storage infrastructure in the cloud. The point of this presentation is to learn how we can use all this available infrastructure, both on-prem and in-cloud, in a holistic, single-system way.
I’ve organized this into four quadrants. Let’s talk about the individual quadrants.
In the lower left, we have on-prem compute. This includes the app servers, compute farms, and desktops. For an Enterprise NAS app, this is where most NAS clients are running today.
Next we have the on-prem storage. Historically this has been all NAS for file-based applications. Recently people have begun to deploy object storage here. This typically includes multiple tiers of storage including tier-1 for performance, tier-2 for nearline, and tier-3 for deep archive/backup.
Next we have the compute cloud. This is a service offered by a service provider like Amazon or Google. There is near-infinite compute available in the compute cloud, which is very attractive for temporary bursting of applications into the cloud or permanently moving your IT infrastructure to the cloud.
Last we have the storage cloud. Again, the storage cloud is a service offered by a service provider such as Amazon or Google. And again, the capacity in the storage cloud is nearly infinite. Today, the storage cloud is used mainly for backup and archive. In this presentation I will show you how you can use it for more than just backup and archive.
Which brings me to a larger point, and that’s the main point of this presentation. The Hybrid Cloud presents all the IT resources available to me. The question we are trying to answer in this presentation is how can I use all these resources for my Enterprise NAS or file-based applications.
Cloud is attractive in many ways but does pose some challenges. Let’s examine these challenges.
First, object storage presents an unfamiliar, object-based interface. The most popular such interface is Amazon’s S3 API. Most apps running on-prem or in the cloud do not support S3. Most file-based apps support NAS protocols such as NFS and SMB/CIFS. This is a challenge since users don’t want to re-write their apps for the cloud.
Second, in a Hybrid Cloud environment, there is a lot of latency between the storage clients and the storage itself. For on-prem clients, this latency is suffered when going to remote on-prem storage or the storage cloud. Similarly, for app servers running in the compute cloud, this latency is suffered when going to the storage cloud or on-prem storage. This is a challenge since it makes your apps run slow.
Third, there is no easy way to get data into the cloud. You may be moving data to the storage cloud for archival purposes or because you want to process the data in the cloud. But there is no easy way to get it there without disrupting the users of the data.
Last, there are Cloud Gateway products offered to solve some of these challenges, but the gateways are all implemented as single-node products and do not scale, and therefore are not useful for your Enterprise NAS applications. Let’s dig into this gateway topic a little deeper…
Building production environments in the compute cloud without rewriting existing applications
On each of the next five slides, I will first discuss the need in a little more depth and then recommend the solution or best practice for achieving this need.
Best practice #1 is that you need to architect for high performance.
The goal here is to run tier-1 apps both on-prem and in the compute cloud.
Remember, there is a lot of latency between the compute and the storage and you need a way to hide this latency.
Last, you don’t want to replicate large amounts of data to the compute in order to avoid this latency. This is a wasteful approach. It is also complex since it is difficult to know which data to replicate. The compute cloud, for instance, contains SSDs where you can store data while it is being processed, but this is expensive and you want to use this capacity intelligently and not just blindly replicate large amounts of data.
The best practice is to be working with a Cloud Gateway type device that does automatic caching of active data to be near the compute. The right device should be available in both virtual/software-only and physical hardware models. The virtual device runs in the compute cloud and the physical device runs on-prem. Both the virtual device and the physical device must be able to access storage both on-prem and in the cloud. Hence, the four lines in my diagram.
In my diagram, I am calling this device an Edge Filer since it needs to do more than a typical gateway. For example, the Edge Filer needs to handle read, write, and metadata operations locally near the compute. Enterprise NAS apps issue all these operations and you don’t want to suffer the latency on any of them.
By handling all these operations out at the edge, the Edge Filer hides the latency of the WAN to the storage. Doing this effectively means the Edge Filer can provide 50:1 offload or more. This means for every 50 operations the Edge Filer services to the compute clients, only 1 operation goes back to the storage.
Best practice #2 is that you need a solution that is Highly Scalable.
Enterprise NAS/file applications are always growing and you need to keep pace with demand. As you add more app servers or nodes in the compute farm, you need to be able to scale the access to the storage as well. Scaling must be non-disruptive since you cannot stop the application. And, you want to be able to start small for a good entry price and grow to be huge over time at the “click of a button.”
The best practice is to be working with a Scale-out NAS solution that supports clustering. There are solutions on the market that allow scaling from 3 to 50 nodes. On-prem, you bring in additional physical nodes and they can be simply added to the cluster. In the compute cloud it’s even simpler since from 3 to 50 nodes can be added in a software-only manner with the click of a button on a mgmt GUI.
Adding nodes to the Edge Filer increases performance since the load is automatically spread over more CPUs, DRAM, and network ports.
Adding nodes to the Edge Filer increases SSD capacity which enables caching a larger working set of data and yields a better cache hit rate.
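As a rough illustration of why more nodes mean a better hit rate, assume some fixed amount of usable SSD cache per node; the 0.4 TB per node used below is an invented figure for the example, not an FXT specification:

```shell
# Cacheable working set grows linearly with cluster size.
cacheable_tb() {
  # $1 = node count; 0.4 TB of assumed usable SSD per node
  awk -v n="$1" 'BEGIN { printf "%.1f", n * 0.4 }'
}
for nodes in 3 6 12 50; do
  echo "$nodes nodes -> $(cacheable_tb "$nodes") TB of cache"
done
```

Once the cache exceeds the application’s active working set, most reads never leave the edge, which is exactly the hit-rate improvement the slide describes.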
Modernizing storage archives and disaster recovery by adding object storage for durability while leveraging existing on-premises NAS
Why is there so much data?
One source is that the cost of genomic sequencing has come down dramatically in the past decade.
In the tech sector we always brag about Moore’s Law and that technology equipment is getting twice as dense/fast/cheap every two years.
The cost of genomic sequencing is dropping much faster than this. So the equipment used for storing the data is not keeping up with the data being generated.
Data: sequencing a genome cost about $100M in 2001 and about $3M in 2011. Over those 10 years the cost halved 5 times, making it 32x less (2x, 4x, 8x, 16x, 32x).
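The arithmetic behind "halved 5 times" in the note above, spelled out:

```shell
# $100M halved once every two years for a decade (2001 -> 2011).
awk 'BEGIN {
  cost = 100                      # $100M in 2001
  for (year = 2003; year <= 2011; year += 2) cost /= 2
  printf "2011 cost: $%.3fM (%dx reduction)\n", cost, 100 / cost
}'
```

Five halvings of $100M lands at $3.125M, consistent with the roughly $3M figure quoted for 2011 and the 32x (2^5) reduction.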
Inova use case
BMS (Bristol-Myers Squibb) use case.
CDC use case
Using the cloud to safely and securely share data globally
This slide shows four scenarios where Avere can be used to implement a cloud infrastructure. Some enterprises may use just one scenario while others may use multiple.
The slide starts with the scenario where Avere is used to accelerate data access at remote offices. Avere has many customers doing this type of thing. One example is a company that is headquartered in San Jose but has software engineers in offices all over the world. In their Boulder office they replaced their NetApp storage with an Avere cluster. The software engineers use the Avere cluster to access their homedirs, run their software builds, etc. Meanwhile, their data is actually stored, managed, and protected back in the San Jose data center. The Avere cluster automatically caches the active data at the Boulder site and guarantees the data is pushed back to the San Jose site for long-term data retention. While this “data center to remote office” may be the most common cloud scenario today, there are other needs across the enterprise.
(click)
A second example is when there are two data centers that the enterprise wants to run as a single “mega” data center. These exist for disaster recovery reasons, due to acquisitions, due to partnerships, due to growing out of space in the primary data center which requires leasing space at a co-lo facility, etc. In these cases, the goal is to run the two data centers as one large datacenter where each individual data center can borrow resources of the other data center. Avere can help make this happen. An example of this is Pixar and Disney. Pixar is in N. Cal and Disney is in S. Cal. Pixar and Disney are Avere customers and we allow them to run these two data centers as one. For example, if Pixar is working on a movie with a near-term release date and they need to crank up the volume on the renders, rather than going out and buying new render nodes, they can borrow them from Disney. They can do this because they have placed Avere nodes by Disney’s render farm. When Pixar fires up some renders on the Disney farm, the Avere nodes are automatically populated with the data for the renders. Hence, the renders see the WAN latency once on the first read, but subsequent reads are very low latency. Until recently, when studios like this would borrow render nodes they would literally pull them out of the rack, put them in a truck, and drive them to the other facility. This is because the render nodes need to be near the data. With Avere, these studios can quickly and easily move the data so it is near the render farm.
(click)
The next example is cloud computing. Today, Avere supports private cloud computing. Technically this is similar to public cloud computing, which we will support in the future. In the private cloud case, a customer moves their compute infrastructure somewhere else because it’s cheaper or they are out of space in their data center. Avere makes this possible since an Avere cluster can be co-located with the compute gear to hold the data that is actively being processed. We have a number of customers doing this today. In Las Vegas, there is a co-lo facility called the SuperNAP. This facility was originally built by ENRON, the failed energy company. You may recall that ENRON was getting into all sorts of strange businesses and one of them was “bandwidth trading.” So they built a giant co-lo facility in LV with tons of b/w coming in and out. Well, after ENRON failed, a company called Switch Communications bought the facility and turned it into an outsource co-lo facility. We have a number of customers who are using the SuperNAP. Digital Domain is one of these customers; they use the SuperNAP because: 1) it’s cheaper than using their own CA-based data centers and 2) it’s centrally located between the 3 sites (LA, SF, Vancouver) that use the compute resources at the SuperNAP. Today, when D2 fires up their renders (whether from LA, SF, or Vanc.) they fire them up on the LV farm and the Avere nodes in LV automatically populate the data needed for the renders.
(click)
The fourth example is cloud storage. With release 4.0, Avere supports both public and private cloud storage. For public cloud, we support Amazon S3. For private cloud, we support Cleversafe and any 3rd-party NAS solution. One of the primary use cases is using cloud storage as a lower-cost alternative to on-premise storage for archival data. This use case is discussed in more detail in the next two slides.
(click)
Over time, much of the storage in data centers today will move to the cloud. Avere enables this by providing a complete set of NAS functionality on the enterprise premise (e.g. performance scaling, HA, NFS, SMB/CIFS) and the ability to hide the latency of the WAN to the public cloud.
This slide shows four scenarios where Avere can be used to implement a cloud infrastructure. Some enterprises may use just one scenario whiles other may use multiple.
The slide starts with the scenario where Avere is used to accelerate data access at remote offices. Avere has many customers doing this type on thing. One example is a company that is headquartered in San Jose but has software engineers in offices all over the world. In their Boulder office they replaced their NetApp storage with an Avere cluster. The software engineers use the Avere cluster to access their homedirs, run their software builds, etc. Meanwhile, their data is actually stored, managed, and protected back in the San Jose data center. The Avere cluster automatically caches the active data at the Boulder site and guarantees the data is pushed back to the San Jose site for long-term data retention. While this “data center to remote office” may be the most common cloud scenario today, there are other needs across the enterprise.
(click)
This slide shows four scenarios where Avere can be used to implement a cloud infrastructure. Some enterprises may use just one scenario while others may use multiple.
The slide starts with the scenario where Avere is used to accelerate data access at remote offices. Avere has many customers doing this type of thing. One example is a company that is headquartered in San Jose but has software engineers in offices all over the world. In their Boulder office they replaced their NetApp storage with an Avere cluster. The software engineers use the Avere cluster to access their home directories, run their software builds, etc. Meanwhile, their data is actually stored, managed, and protected back in the San Jose data center. The Avere cluster automatically caches the active data at the Boulder site and guarantees the data is pushed back to the San Jose site for long-term data retention. While this "data center to remote office" scenario may be the most common cloud scenario today, there are other needs across the enterprise.
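The caching behavior described above can be sketched in a few lines. This is an illustrative model only, not Avere's implementation: an edge cache serves reads locally after the first WAN fetch, absorbs writes, and flushes dirty data back to the core filer, which remains the system of record. The class and path names are hypothetical.

```python
class CoreFiler:
    """Stands in for the data-center NAS; every read is a WAN round-trip."""
    def __init__(self):
        self.store = {}
        self.reads = 0  # count WAN round-trips for illustration

    def read(self, path):
        self.reads += 1
        return self.store.get(path)

    def write(self, path, data):
        self.store[path] = data


class EdgeCache:
    """Caches active data at the remote office; writes back dirty data."""
    def __init__(self, core):
        self.core = core
        self.cache = {}
        self.dirty = set()

    def read(self, path):
        if path not in self.cache:      # miss: one fetch over the WAN
            self.cache[path] = self.core.read(path)
        return self.cache[path]         # hit: served at local latency

    def write(self, path, data):
        self.cache[path] = data         # absorb the write locally
        self.dirty.add(path)

    def flush(self):
        for path in self.dirty:         # push modified data back home
            self.core.write(path, self.cache[path])
        self.dirty.clear()
```

The point of the sketch is the access pattern: only the first read of a file pays WAN latency, and long-term retention stays at the core site because every write is eventually flushed back.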
(click)
A second example is when there are two data centers that the enterprise wants to run as a single "mega" data center. These exist for disaster recovery reasons, due to acquisitions, due to partnerships, due to growing out of space in the primary data center which requires leasing space at a co-lo facility, etc. In these cases, the goal is to run the two data centers as one large data center where each individual data center can borrow resources from the other. Avere can help make this happen. An example of this is Pixar and Disney. Pixar is in Northern California and Disney is in Southern California. Pixar and Disney are Avere customers, and we allow them to run these two data centers as one. For example, if Pixar is working on a movie with a near-term release date and they need to crank up the volume on the renders, rather than going out and buying new render nodes, they can borrow them from Disney. They can do this because they have placed Avere nodes by Disney's render farm. When Pixar fires up some renders on the Disney farm, the Avere nodes are automatically populated with the data for the renders. Hence, the renders see the WAN latency once on the first read, but subsequent reads are very low latency. Until recently, when studios like this would borrow render nodes they would literally pull them out of the rack, put them in a truck, and drive them to the other facility. This is because the render nodes need to be near the data. With Avere, these studios can quickly and easily move the data so it is near the render farm.
(click)
The next example is cloud computing. Today, Avere supports private cloud computing. Technically this is similar to public cloud computing, which we will support in the future. In the private cloud case, a customer moves their compute infrastructure somewhere else because it's cheaper or they are out of space in their data center. Avere makes this possible since an Avere cluster can be co-located with the compute gear to hold the data that is actively being processed. We have a number of customers doing this today. In Las Vegas, there is a co-lo facility called the SuperNAP. This facility was originally built by Enron, the failed energy company. You may recall that Enron was getting into all sorts of strange businesses, and one of them was "bandwidth trading." So they built a giant co-lo facility in Las Vegas with tons of bandwidth coming in and out. After Enron failed, a company called Switch Communications bought the facility and turned it into an outsourced co-lo facility. We have a number of customers who are using the SuperNAP. Digital Domain is one of them, and they use the SuperNAP because: 1) it is cheaper than using their own California-based data centers, and 2) it is centrally located between the three sites (LA, SF, and Vancouver) that use the compute resources at the SuperNAP. Today, when Digital Domain fires up its renders (whether from LA, SF, or Vancouver), they run on the Las Vegas farm and the Avere nodes in Las Vegas automatically populate the data needed for the renders.
(click)
The fourth example is cloud storage. With release 4.0, Avere supports both public and private cloud storage. For public cloud, we support Amazon S3. For private cloud, we support Cleversafe and any third-party NAS solution. One of the primary use cases is using cloud storage as a lower-cost alternative to on-premises storage for archival data. This use case is discussed in more detail in the next two slides.
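The archival use case above boils down to a tiering policy: data that has gone cold on the on-premises NAS becomes a candidate to move to cheaper object storage. A minimal sketch of such a policy, assuming an age-based cutoff (the 180-day threshold and file paths are illustrative, not an Avere default):

```python
import time

ARCHIVE_AFTER_DAYS = 180  # example policy threshold, not a product default

def archive_candidates(files, now=None):
    """files: iterable of (path, last_access_epoch_seconds) pairs.
    Returns the paths cold enough to tier to object storage."""
    now = time.time() if now is None else now
    cutoff = now - ARCHIVE_AFTER_DAYS * 86400
    return [path for path, atime in files if atime < cutoff]
```

In a real deployment the selected objects would then be written to the archive tier (e.g. an S3 bucket) while hot data stays on the local NAS; the sketch only shows the selection step.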
(click)
Over time, much of the storage in data centers today will move to the cloud. Avere enables this by providing a complete set of NAS functionality on the enterprise premises (e.g., performance scaling, HA, NFS, SMB/CIFS) and the ability to hide the latency of the WAN to the public cloud.
With that, I’ve reached the end of my presentation.
Thank you for your time today.
I am happy to take any questions you may have.
(Answer questions.)
Thanks again. We appreciate your feedback; please use the "rating" button to send it to us.