IBM Systems and Technology Education
Case Study

Swiss National Supercomputing Center
Gains low-latency, high-bandwidth storage with IBM General Parallel File System
Overview

The need
With data volumes doubling each year, researchers at the Swiss National Supercomputing Center (CSCS) needed a centralized storage solution offering low latency, high bandwidth and extreme scalability.

The solution
CSCS engaged IBM to build a centralized storage solution based on IBM® General Parallel File System (GPFS), IBM System x® and IBM System Storage® hardware, and InfiniBand networking technology.

The benefit
The solution supports massively parallel read/write operations and provides a single namespace for all systems. It offers extremely high availability and nondisruptive online scaling of the file system.

Founded in 1991, CSCS, the Swiss National Supercomputing Center, develops and promotes technical and scientific services for the Swiss research community in the field of high-performance computing (HPC). CSCS enables world-class scientific research by pioneering, operating and supporting leading-edge supercomputing technologies. Located near Lugano, in the south of Switzerland, CSCS is an autonomous unit of the Swiss Federal Institute of Technology in Zurich (ETH Zurich).

CSCS serves dozens of different research institutions, supporting a broad range of computational projects across theoretical chemistry, material sciences, biological sciences and climate science. Simulations and other computational projects running on the organization's compute clusters process many terabytes of data and generate large sets of intermediate results ready for further computation. During the actual simulation run time, all of this data resides on the "scratch" storage systems that are directly attached to each cluster.

In the past, research teams would store intermediate simulation results on tape, but the limited bandwidth of the tape library made this impractical as data volumes grew rapidly. To analyze their results fully, users would have needed to transfer the data back to their own institutions, which was not feasible because of the low transfer speeds.
"Even over high-speed leased lines, copying data back to a university network could take weeks, so we wanted to give our users the possibility of storing their data locally at CSCS for the duration of their projects," comments Dominik Ulmer, CSCS general manager. "Equally, the typical HPC workflow has become more sophisticated: Instead of simply running a simulation on an input data set, we now often run
multiple simulations in series, using the output data from one as the input data for the next. This tendency to reuse data was another reason for creating a permanent, centralized data storage solution at CSCS."

"We selected IBM GPFS as it offered the best combination of high scalability, compatibility with our distributed operating systems, parallelism of access, and failover between nodes."
Hussein Harake, HPC systems engineer, CSCS

Choosing the best solution
The amount of data handled in the HPC environment at CSCS roughly doubles each year, making it imperative to select a highly scalable architecture for the proposed centralized storage solution. It was also critical to choose a file system that could be mounted on multiple different HPC systems simultaneously, and that would offer both performance and reliability.
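As a rough illustration of what yearly doubling implies for capacity planning, the short Python sketch below projects demand over a few years. The 250 TB starting figure is a hypothetical example, not a number from CSCS:

    # Illustrative only: project storage demand under the "roughly doubles
    # each year" growth rate described above. The starting capacity is an
    # assumed example figure, not an actual CSCS number.
    start_tb = 250.0  # assumed initial working set, in terabytes
    for year in range(6):
        demand_tb = start_tb * (2 ** year)
        print(f"year {year}: ~{demand_tb:,.0f} TB (~{demand_tb / 1024:.1f} PB)")
    # At this rate, a 250 TB working set exceeds 2 PB within four years,
    # which is why only a scale-out architecture was considered viable.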
"We tested a number of file systems and narrowed our choice down to Oracle Lustre and IBM General Parallel File System [GPFS]," says Hussein Harake, CSCS HPC systems engineer. "We selected IBM GPFS, as it offered the best combination of high scalability, compatibility with our distributed operating systems, parallelism of access and failover between nodes. Our data can be very long-lived, so it was also important to choose a solution that would offer longevity, both in terms of the reliability of long-term data storage and in terms of the vendor support and roadmap. Selecting GPFS from IBM enabled us to meet these requirements."
Managing rapid growth
GPFS supports single cluster file systems of multiple petabytes and runs at I/O rates of more than 100 gigabytes per second. Individual clusters may be cross-connected to provide parallel access to data even across large geographic distances. At CSCS, GPFS offers both low latency (needed for high-speed access to small files) and high bandwidth (vital for delivering very large files to compute clusters).
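Because GPFS presents a standard POSIX file system mounted identically on every node, parallel jobs can simply read and write under one shared path. The Python sketch below is a generic illustration of that single-namespace, parallel-access pattern; the mount point /gpfs/scratch and the file layout are assumptions, not CSCS specifics:

    import os
    from multiprocessing import Pool

    # Hypothetical shared GPFS mount point; every compute node would see
    # the same namespace at this path.
    SCRATCH = "/gpfs/scratch/demo_project"

    def write_rank_output(rank: int) -> str:
        """Each worker writes its own result file into the shared namespace,
        so many processes (or nodes) can stream data out in parallel."""
        path = os.path.join(SCRATCH, f"result_rank{rank:04d}.dat")
        with open(path, "wb") as f:
            f.write(os.urandom(1024 * 1024))  # stand-in for simulation output
        return path

    if __name__ == "__main__":
        os.makedirs(SCRATCH, exist_ok=True)
        with Pool(processes=8) as pool:
            paths = pool.map(write_rank_output, range(8))
        print(f"wrote {len(paths)} files under {SCRATCH}")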
"Our GPFS-based central file store is becoming a really important resource for us," says Harake. "Users really appreciate the option to store their data locally rather than needing to copy it back to their own institution. They are requesting more capacity than we originally anticipated, so the environment is growing faster than expected."
Ulmer adds, "The rapid rate of growth in data volumes is partly a consequence of researchers being able to run more complex simulations on the newer HPC clusters. So, to an extent, they are catching up on
projects that couldn't be done before. GPFS gave us an infrastructure that would grow with user demand but in a way that was predictable in budgetary terms."

Solution components:

Hardware
• IBM® System x® 3650 M2
• IBM System Storage® DS5100
• IBM System Storage DS5300
• IBM System Storage EXP5000
• IBM System Storage EXP5060

Software
• IBM General Parallel File System

Services
• IBM Global Technology Services

A key decision factor for GPFS was its support for nondisruptive migrations and upgrades. Since first implementing the IBM file system, CSCS has upgraded through three phases of different storage arrays and network switches, all without loss of data or service interruption. Today, the centralized storage solution is based around three IBM System Storage DS5100 controllers with eight IBM System Storage EXP5060 Storage Expansion Enclosures (containing high-capacity SATA disks) and four IBM System Storage EXP5000 Storage Expansion Enclosures (containing high-performance Fibre Channel disks). IBM System x 3650 M2 servers running GPFS act as the file servers; Mellanox gateways and switches provide high-speed InfiniBand networking.
Says Harake, "We have upgraded the file system several times, changed the disk controllers and even changed the disks themselves, all without taking the solution down. We will soon upgrade the controllers from DS5100 to DS5300 and add four more expansion enclosures, which will expand our total capacity to 2 PB without any interruption to service."
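A sketch of how such an online disk migration is typically driven with standard GPFS administration commands (mmadddisk, mmdeldisk, mmrestripefs), wrapped here in Python. This is an illustrative outline only; the file system name, stanza file and disk names are assumptions and are not taken from the CSCS configuration:

    import subprocess

    FS = "scratchfs"                         # hypothetical GPFS file system name
    NEW_DISKS = "new_disks.stanza"           # hypothetical file describing new LUNs
    OLD_DISKS = ["nsd_old01", "nsd_old02"]   # hypothetical disks on retiring arrays

    def run(cmd):
        """Echo and run a GPFS admin command; each step works while the
        file system stays mounted and in use."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. Add the disks from the new storage arrays to the mounted file system.
    run(["mmadddisk", FS, "-F", NEW_DISKS])
    # 2. Remove the old disks; GPFS migrates their data while users keep working.
    run(["mmdeldisk", FS, ";".join(OLD_DISKS)])
    # 3. Rebalance data evenly across the remaining (new) disks.
    run(["mmrestripefs", FS, "-b"])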
Holistic approach
IBM is responsible for supplying and supporting every element in the
centralized storage environment, from the disks up to the network
infrastructure. "We wanted IBM to take ownership of the core network, so that we have a single point of support for the whole environment," says Ulmer. "This holistic approach helps us minimize risk and delays in support."
He adds, "We consider HPC technology know-how to be our core competence, and we want to find external partners that are willing to tackle the really cutting-edge stuff and learn alongside us. Our relationship with IBM is very good, and we see a lot of value in our shared workshops. With the GPFS-based centralized storage solution, we feel that we have the ideal building block for the coming years. The IBM solution will enable us to expand our capacity enormously without disruption and without loss of performance."