In this presentation from the DDN User Meeting at SC13, Tim Cutts from The Sanger Institute describes how the institute wrangles genomics data with DDN storage.
Watch the video presentation: http://insidehpc.com/2013/11/13/ddn-user-meeting-coming-sc13-nov-18/
8. Typical data flow
[Diagram: raw data from the sequencers lands on staging storage and is staged to Lustre for QC, alignment, and research analysis; results are archived in iRODS (archival storage), which also feeds the website.]
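The flow above can be sketched as a chain of stages that each hand their output to the next. This is a toy illustration only; the stage names and the `run_pipeline` helper are invented here and are not Sanger's actual pipeline code.

```python
# Toy sketch of the data flow on this slide: each stage hands its output
# to the next. Stage names and run_pipeline are invented for illustration;
# this is not Sanger's actual pipeline code.

def stage_to_lustre(raw):
    return {"run": raw["run"], "location": "lustre"}

def qc_and_align(staged):
    return {**staged, "qc": "pass", "format": "CRAM"}

def archive_to_irods(aligned):
    return {**aligned, "location": "irods"}

PIPELINE = [stage_to_lustre, qc_and_align, archive_to_irods]

def run_pipeline(raw):
    data = raw
    for stage in PIPELINE:
        data = stage(data)
    return data

result = run_pipeline({"run": "run42", "location": "staging"})
print(result)
```

Each stage only sees the previous stage's output, which mirrors how the real flow moves data between distinct storage tiers.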
10. Staging storage
Simple scale-out architecture
– Server with ~50TB direct-attached block storage
– One per sequencer
– Running SAMBA for upload from sequencer
Maximum data from all sequencers is currently 1.7 TB/day
1000-core cluster reads data from staging servers over NFS
– Quality checks
– Alignment to reference genome
– Store aligned BAM and/or CRAM files in iRODS
[Diagram: each of 27 next-gen sequencers sends sequence data over CIFS to its own CIFS/NFS staging server (50TB each); the production sequencing cluster for QC and alignment (1000 cores) reads the staged data over NFS and stores aligned BAM files in iRODS (4PB).]
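As a back-of-the-envelope sanity check on those figures (the 1.7 TB/day, 27 sequencers, and 50TB per server come from the slide; the derived rates are computed here, not stated in the presentation):

```python
# Back-of-the-envelope throughput for the staging tier. Input figures are
# from the slide; the derived numbers are illustrative arithmetic only.

TB = 1e12    # decimal terabyte, bytes
DAY = 86400  # seconds

total_rate = 1.7 * TB / DAY       # aggregate ingest, bytes/s
per_sequencer = total_rate / 27   # each sequencer's share

print(f"aggregate ingest: {total_rate / 1e6:.1f} MB/s")
print(f"per sequencer:    {per_sequencer / 1e6:.2f} MB/s")

# How long one 50TB staging server could buffer its sequencer's output
# at the current maximum rate.
days_to_fill = 50 * TB / (per_sequencer * DAY)
print(f"days to fill 50TB: {days_to_fill:.0f}")
```

The point of the exercise: the sustained ingest rate is modest (tens of MB/s in aggregate), so commodity direct-attached servers and SAMBA are entirely adequate for this tier.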
11. iRODS
Object store with arbitrary metadata
Rules to automate mirroring and other tasks as required
Vendor-agnostic
– Mostly DDN SFA 10K
– Some other vendors' storage also
Oracle RAC cluster holds metadata
Two active-active iRES resource servers in different rooms
– 8Gb FC to storage
– 10Gb IP
– Series of 43TB LVM volumes from 2x SFA 10K in each room
[Diagram: the iCAT metadata catalogue (Oracle RAC) fronts two iRES resource servers; each serves a series of 43TB LVM volumes carved from 2x SFA10K arrays in its room, alongside some other vendors' storage.]
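The two iRODS ideas on this slide — arbitrary metadata on every object, and rules that fire automatically to mirror data — can be modelled in a few lines. This is a toy model only: `ObjectStore`, `on_put`, and the sample metadata values are invented here and are not the real iRODS API.

```python
# Toy model of the iRODS idea on this slide: objects carry arbitrary
# metadata, and rules run automatically on events such as ingest.
# Everything here (ObjectStore, on_put, the sample values) is
# illustrative shorthand, not the real iRODS API.

class ObjectStore:
    def __init__(self):
        self.objects = {}   # logical path -> (data, metadata dict)
        self.rules = []     # callbacks fired after every put

    def on_put(self, rule):
        self.rules.append(rule)

    def put(self, path, data, **metadata):
        self.objects[path] = (data, dict(metadata))
        for rule in self.rules:
            rule(self, path)

store = ObjectStore()   # primary resource server
mirror = ObjectStore()  # stands in for the second machine room

# A rule automating mirroring, as the slide describes.
def mirror_rule(src, path):
    data, meta = src.objects[path]
    mirror.put(path, data, **meta)

store.on_put(mirror_rule)

store.put("/seq/run42/sample.cram", b"...", study="demo", aligner="bwa")
print(mirror.objects["/seq/run42/sample.cram"][1])
```

The design point this captures: because replication is a rule attached to the store rather than logic in every client, new policies (checksumming, tiering to archive) can be added without touching the pipelines that write data.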
12. Downstream analysis
[Diagram: aligned sequences move from iRODS (4PB) to Lustre scratch space (13 filesystems); analysis clusters (~14000 cores) run research analysis against the scratch space, with completed work kept on NFS storage.]
13. Lustre setup
11 filesystems, 500TB/1PB each
– Large projects have their own
Exascaler hardware … but our own Lustre install
Aim to deliver 5MB/sec per core of compute
– IB connected OSS-OST
– 10G ethernet to clients
[Diagram: MGS and MDS on 1/2U servers with MDTs on an EF3015; OSS servers connect over IB to OSTs on SFA10K/12K arrays; clients reach the OSSes over the 10G/40G network.]
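It is worth working out what the 5 MB/sec-per-core target implies in aggregate. The core counts and the per-core figure are from the slides; the totals are derived here, not stated in the presentation:

```python
# What "5 MB/sec per core" implies in aggregate. Core counts and the
# per-core target are from the slides; the totals are derived here.

per_core = 5e6  # bytes/s per compute core

for name, cores in [("QC/alignment cluster", 1000),
                    ("analysis clusters", 14000)]:
    total = cores * per_core
    print(f"{name}: {cores} cores -> {total / 1e9:.1f} GB/s aggregate")

# Spread evenly over the 11 Lustre filesystems, the analysis-cluster
# target works out per filesystem to roughly:
print(f"per filesystem: {14000 * per_core / 11 / 1e9:.2f} GB/s")
```

That tens-of-GB/s aggregate target is what drives the IB-connected OSS-OST backend; 10G ethernet to the clients only has to carry each node's share of it.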
14. Future challenges and directions
iRODS
• Object storage instead of filesystems (WOS?)
• File systems take a long time to fsck
• Integration with WOS
Clinical use and personalised medicine
• Security implications
• How can we do this in a small laboratory in Africa with terrible power and minimal IT skills?
Lustre
• Upgrade to 2.5 (HSM features)
• Exascaler needs to be more current
Sequencing technology
• Nanopore sequencing
• Use outside the datacentre
Vendor support
• Integrated support platforms for production systems
15. Thank you
The team
– Phil Butcher, IT Director
– Tim Cutts, Acting Head of Scientific Computing
– Guy Coates, Informatics Systems Group Team Leader
– Peter Clapham
– James Beal
– Helen Brimmer
– Jon Nicholson, Network Team Leader
– Shanthi Sivadasan, DBA Team Leader
– Numerous bioinformaticians