SlideShare ist ein Scribd-Unternehmen logo
1 von 12
0
©2011 Quest Software, Inc. All rights reserved.
Case Study:
Accomplishing redundancy
on Lustre based PFS
with DRBD
1
©2011 Quest Software, Inc. All rights reserved.
Customer Name:
India’s premier Scientific
Research Institute
Industry:
Education and Research
Corporate Office:
Bangalore, India .
2
©2011 Quest Software, Inc. All rights reserved.
Sought a solution that could make them future-
ready and provide flexibility in terms of technology
options
The PFS being on Lustre file system was a
challenge to build as Lustre didn’t
supported redundancy
3
©2011 Quest Software, Inc. All rights reserved.
High Performance Computing setup installed.
Object Storage Servers used for setting up Parallel file
System.
4
©2011 Quest Software, Inc. All rights reserved.
By incorporating DRBD and High Availability, the PFS based
on a Lustre file system delivered by Netweb Technologies
provides full redundancy.
The 48 node cluster can simultaneously access the storage
(PFS) with 1.5 GB/s throughput
5
©2011 Quest Software, Inc. All rights reserved.
Tyrone storage array.
Six object storage servers of 12 TB each and each having two volumes of
6TB configured on RAID 5.
Infiniband switch between Meta data servers and object storage servers.
Being one of India’s premier research institutes it is well-known across the
world. The research institute provides a vibrant academic ambience hosting
more than 200 Researchers. The research institute is funded by the
Department of Science and Technology, Government of India and is a deemed
university.
6
©2011 Quest Software, Inc. All rights reserved.
Being one of the renowned research institutes in India; the institute wanted to deploy a
High Performance Computing (HPC) for their research and training purposes. Netweb
Technologies was deputed to make this deployment of HPC at the research institute.
The research institute had put forth a requirement of 48 node HPC that would have a
theoretical peak performance of 4.5 Tera flops and should have a Parallel File System
(PFS) of 30 TB storage which would have 100% redundancy and a throughput of 1.5 GB/s.
The setting up of the HPC 48 node cluster was straight forward for Netweb Technologies,
but to provide the corresponding PFS storage posed a challenge.
Lustre by default was the choice of file system for the PFS deployment at the research
institute. The only hindrance was that Lustre doesn’t have any native support for
redundancy.
The only hindrance was that Lustre doesn’t have any native support for redundancy.
And since, redundancy was the primary requirement laid for PFS by the research
institute, it posed the biggest challenge for Netweb Technologies to provide the solution
with redundancy.
7
©2011 Quest Software, Inc. All rights reserved.
A Parallel File System mainly constitutes of Meta Data servers and the
connected storage nodes called as Object Storage servers. Meta Data servers
store the location of the data which is actually residing on the physical storage
nodes i.e. Object Storage servers.
If suppose, a disk fails in any of the Object storage server, then not only the
portion of data in that disk is lost, but also across all the nodes the complete data
file becomes corrupted.
The setup planned by Netweb Technologies was to have their “twin servers” to
act as Meta data servers.
And likewise for Object Storage servers, Netweb Technologies planned to have
each node having two volumes; whereby one volume will be active and other in
passive state.
8
©2011 Quest Software, Inc. All rights reserved.
The passive state volume would be the mirrored image of active volume of the
corresponding node.
Thus, in a round-robin fashion each active volume in one node of the PFS will
have a passive mirrored volume residing in another node of the PFS.
But it was easier to plan than to accomplish, as Lustre didn’t had any support
for redundancy. To accomplish redundancy through RAID within a
server/machine is easy, but across two different machines it is not possible with
RAID.
9
©2011 Quest Software, Inc. All rights reserved.
To achieve redundancy, the technicians of Netweb Technologies used
Distributed Replicated Block Devices (DRBD) which is normally used in High
Availability HA) clusters.
DRBD ensures that the writes happening on block device on primary node are
propagated simultaneously to the block device on secondary node.
Netweb Technologies was building 6 node PFS solution for the research
institute, with each node of 12 TB, where each node had two volumes of 6 TB
configured on RAID 5; where one volume was active while other was to be the
mirrored image of another volume in the cluster node.
By using DRBD, they were able to ascertain that, the active volume of first node
is getting mirrored in the passive volume on second node and so on in a round
robin fashion across all the 6 cluster nodes.
Likewise, redundancy was provided on Meta Data servers, so that if one fails
the other Meta data server which had the mirrored image could take over the
functioning of the PFS solution.
10
©2011 Quest Software, Inc. All rights reserved.
The non-traditional approach to accomplish redundancy on PFS solution has helped
the research institute to have a High Performance Computing cluster environment
where they can achieve 4.5 Tera flops of performance and can access the storage
system simultaneously from 48 nodes at 1.5 GB/s throughput.
100% Redundancy: By using DRBD along with High Availability concepts,
the PFS solution is designed to provide full redundancy which could occur
either from disk failure or from a complete node failure. Even if a cluster node
fails, its backup is in another node which gets promoted to the active state,
without hindering any performance of the PFS and maintaining the data
consistency. Even the hardware is designed to provide redundancy feature.
High Performance: With Infiniband as the backbone, high throughputs
can be achieved. So that even if all 48 nodes of the HPC access the
storage system the performance doesn’t get disrupted and a sustained
1.5 GB/s throughput is maintained.
11
©2011 Quest Software, Inc. All rights reserved.

Weitere ähnliche Inhalte

Was ist angesagt?

Google Megastore
Google MegastoreGoogle Megastore
Google Megastore
bergwolf
 
Red Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFSRed Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFS
GlusterFS
 

Was ist angesagt? (20)

Disperse xlator ramon_datalab
Disperse xlator ramon_datalabDisperse xlator ramon_datalab
Disperse xlator ramon_datalab
 
Geographically Distributed PostgreSQL
Geographically Distributed PostgreSQLGeographically Distributed PostgreSQL
Geographically Distributed PostgreSQL
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation
 
Sdc challenges-2012
Sdc challenges-2012Sdc challenges-2012
Sdc challenges-2012
 
Gluster 3.3 deep dive
Gluster 3.3 deep diveGluster 3.3 deep dive
Gluster 3.3 deep dive
 
librados
libradoslibrados
librados
 
Ceph - High Performance Without High Costs
Ceph - High Performance Without High CostsCeph - High Performance Without High Costs
Ceph - High Performance Without High Costs
 
PostgreSQL HA
PostgreSQL   HAPostgreSQL   HA
PostgreSQL HA
 
PostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldPostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized World
 
High Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of ViewHigh Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of View
 
On demand file-caching_-_gustavo_brand
On demand file-caching_-_gustavo_brandOn demand file-caching_-_gustavo_brand
On demand file-caching_-_gustavo_brand
 
Overview of some popular distributed databases
Overview of some popular distributed databasesOverview of some popular distributed databases
Overview of some popular distributed databases
 
DHT2 - O Brother, Where Art Thou with Shyam Ranganathan
DHT2 - O Brother, Where Art Thou with 	Shyam RanganathanDHT2 - O Brother, Where Art Thou with 	Shyam Ranganathan
DHT2 - O Brother, Where Art Thou with Shyam Ranganathan
 
Database backed coherence cache
Database backed coherence cacheDatabase backed coherence cache
Database backed coherence cache
 
Lessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On DockerLessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On Docker
 
YDAL Barcelona
YDAL BarcelonaYDAL Barcelona
YDAL Barcelona
 
Interactive Hadoop via Flash and Memory
Interactive Hadoop via Flash and MemoryInteractive Hadoop via Flash and Memory
Interactive Hadoop via Flash and Memory
 
Google Megastore
Google MegastoreGoogle Megastore
Google Megastore
 
Accelerating Ceph Performance with High Speed Networks and Protocols - Qingch...
Accelerating Ceph Performance with High Speed Networks and Protocols - Qingch...Accelerating Ceph Performance with High Speed Networks and Protocols - Qingch...
Accelerating Ceph Performance with High Speed Networks and Protocols - Qingch...
 
Red Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFSRed Hat Storage - Introduction to GlusterFS
Red Hat Storage - Introduction to GlusterFS
 

Andere mochten auch

أساسيات لغة Php بالعربي
أساسيات لغة Php بالعربيأساسيات لغة Php بالعربي
أساسيات لغة Php بالعربي
tahsal99
 
drbd9_and_drbdmanage_may_2015
drbd9_and_drbdmanage_may_2015drbd9_and_drbdmanage_may_2015
drbd9_and_drbdmanage_may_2015
Alexandre Huynh
 
Ceph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross TurkCeph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross Turk
buildacloud
 

Andere mochten auch (15)

brief introduction of drbd in SLE12SP2
brief introduction of drbd in SLE12SP2brief introduction of drbd in SLE12SP2
brief introduction of drbd in SLE12SP2
 
What is new in Leap42.2 and SLE12SP2
What is new in Leap42.2 and SLE12SP2What is new in Leap42.2 and SLE12SP2
What is new in Leap42.2 and SLE12SP2
 
BlackStor - World's fastest & most reliable Cloud Native Software Defined Sto...
BlackStor - World's fastest & most reliable Cloud Native Software Defined Sto...BlackStor - World's fastest & most reliable Cloud Native Software Defined Sto...
BlackStor - World's fastest & most reliable Cloud Native Software Defined Sto...
 
DRBD + OpenStack (Openstack Live Prague 2016)
DRBD + OpenStack (Openstack Live Prague 2016)DRBD + OpenStack (Openstack Live Prague 2016)
DRBD + OpenStack (Openstack Live Prague 2016)
 
High Availability With DRBD & Heartbeat
High Availability With DRBD & HeartbeatHigh Availability With DRBD & Heartbeat
High Availability With DRBD & Heartbeat
 
أساسيات لغة Php بالعربي
أساسيات لغة Php بالعربيأساسيات لغة Php بالعربي
أساسيات لغة Php بالعربي
 
Drbd
DrbdDrbd
Drbd
 
MySQL with DRBD/Pacemaker/Corosync on Linux
 MySQL with DRBD/Pacemaker/Corosync on Linux MySQL with DRBD/Pacemaker/Corosync on Linux
MySQL with DRBD/Pacemaker/Corosync on Linux
 
What you need to know about ceph
What you need to know about cephWhat you need to know about ceph
What you need to know about ceph
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017
 
Drbd9 and drbdmanage_june_2016
Drbd9 and drbdmanage_june_2016Drbd9 and drbdmanage_june_2016
Drbd9 and drbdmanage_june_2016
 
drbd9_and_drbdmanage_may_2015
drbd9_and_drbdmanage_may_2015drbd9_and_drbdmanage_may_2015
drbd9_and_drbdmanage_may_2015
 
Ceph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross TurkCeph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross Turk
 

Ähnlich wie Accomplishing redundancy on Lustre based PFS with DRBD

EOUG95 - Client Server Very Large Databases - Paper
EOUG95 - Client Server Very Large Databases - PaperEOUG95 - Client Server Very Large Databases - Paper
EOUG95 - Client Server Very Large Databases - Paper
David Walker
 
[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage Reduction[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage Reduction
Perforce
 
Datastage parallell jobs vs datastage server jobs
Datastage parallell jobs vs datastage server jobsDatastage parallell jobs vs datastage server jobs
Datastage parallell jobs vs datastage server jobs
shanker_uma
 
Wheeler w 0450_linux_file_systems1
Wheeler w 0450_linux_file_systems1Wheeler w 0450_linux_file_systems1
Wheeler w 0450_linux_file_systems1
sprdd
 
Wheeler w 0450_linux_file_systems1
Wheeler w 0450_linux_file_systems1Wheeler w 0450_linux_file_systems1
Wheeler w 0450_linux_file_systems1
sprdd
 

Ähnlich wie Accomplishing redundancy on Lustre based PFS with DRBD (20)

White Paper: Using Perforce 'Attributes' for Managing Game Asset Metadata
White Paper: Using Perforce 'Attributes' for Managing Game Asset MetadataWhite Paper: Using Perforce 'Attributes' for Managing Game Asset Metadata
White Paper: Using Perforce 'Attributes' for Managing Game Asset Metadata
 
Ceph as software define storage
Ceph as software define storageCeph as software define storage
Ceph as software define storage
 
Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016
 
PowerAlluxio
PowerAlluxioPowerAlluxio
PowerAlluxio
 
EOUG95 - Client Server Very Large Databases - Paper
EOUG95 - Client Server Very Large Databases - PaperEOUG95 - Client Server Very Large Databases - Paper
EOUG95 - Client Server Very Large Databases - Paper
 
Oracle Solaris 11 as a BIG Data Platform Apache Hadoop Use Case
Oracle Solaris 11 as a BIG Data Platform Apache Hadoop Use CaseOracle Solaris 11 as a BIG Data Platform Apache Hadoop Use Case
Oracle Solaris 11 as a BIG Data Platform Apache Hadoop Use Case
 
[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage Reduction[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage Reduction
 
Oracle exalytics deployment for high availability
Oracle exalytics deployment for high availabilityOracle exalytics deployment for high availability
Oracle exalytics deployment for high availability
 
2015 open storage workshop ceph software defined storage
2015 open storage workshop   ceph software defined storage2015 open storage workshop   ceph software defined storage
2015 open storage workshop ceph software defined storage
 
4 026
4 0264 026
4 026
 
Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010Hadoop for Scientific Workloads__HadoopSummit2010
Hadoop for Scientific Workloads__HadoopSummit2010
 
LCNA14: Why Use Xen for Large Scale Enterprise Deployments? - Konrad Rzeszute...
LCNA14: Why Use Xen for Large Scale Enterprise Deployments? - Konrad Rzeszute...LCNA14: Why Use Xen for Large Scale Enterprise Deployments? - Konrad Rzeszute...
LCNA14: Why Use Xen for Large Scale Enterprise Deployments? - Konrad Rzeszute...
 
Datastage parallell jobs vs datastage server jobs
Datastage parallell jobs vs datastage server jobsDatastage parallell jobs vs datastage server jobs
Datastage parallell jobs vs datastage server jobs
 
os
osos
os
 
HDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed FilesystemHDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed Filesystem
 
LogicalDOC Clustering
LogicalDOC ClusteringLogicalDOC Clustering
LogicalDOC Clustering
 
Wheeler w 0450_linux_file_systems1
Wheeler w 0450_linux_file_systems1Wheeler w 0450_linux_file_systems1
Wheeler w 0450_linux_file_systems1
 
Wheeler w 0450_linux_file_systems1
Wheeler w 0450_linux_file_systems1Wheeler w 0450_linux_file_systems1
Wheeler w 0450_linux_file_systems1
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04
 
How swift is your Swift - SD.pptx
How swift is your Swift - SD.pptxHow swift is your Swift - SD.pptx
How swift is your Swift - SD.pptx
 

Mehr von Tyrone Systems

Mehr von Tyrone Systems (20)

Kubernetes in The Enterprise
Kubernetes in The EnterpriseKubernetes in The Enterprise
Kubernetes in The Enterprise
 
Why minio wins the hybrid cloud?
Why minio wins the hybrid cloud?Why minio wins the hybrid cloud?
Why minio wins the hybrid cloud?
 
why min io wins the hybrid cloud
why min io wins the hybrid cloudwhy min io wins the hybrid cloud
why min io wins the hybrid cloud
 
5 ways hci (hyper-converged infrastructure) powering today’s modern learning ...
5 ways hci (hyper-converged infrastructure) powering today’s modern learning ...5 ways hci (hyper-converged infrastructure) powering today’s modern learning ...
5 ways hci (hyper-converged infrastructure) powering today’s modern learning ...
 
5 current and near-future use cases of ai in broadcast and media.
5 current and near-future use cases of ai in broadcast and media.5 current and near-future use cases of ai in broadcast and media.
5 current and near-future use cases of ai in broadcast and media.
 
How hci is driving digital transformation in the insurance firms to enable pr...
How hci is driving digital transformation in the insurance firms to enable pr...How hci is driving digital transformation in the insurance firms to enable pr...
How hci is driving digital transformation in the insurance firms to enable pr...
 
How blockchain is revolutionising healthcare industry’s challenges of genomic...
How blockchain is revolutionising healthcare industry’s challenges of genomic...How blockchain is revolutionising healthcare industry’s challenges of genomic...
How blockchain is revolutionising healthcare industry’s challenges of genomic...
 
5 ways hpc can provides cost savings and flexibility to meet the technology i...
5 ways hpc can provides cost savings and flexibility to meet the technology i...5 ways hpc can provides cost savings and flexibility to meet the technology i...
5 ways hpc can provides cost savings and flexibility to meet the technology i...
 
How Emerging Technologies are Enabling The Banking Industry
How Emerging Technologies are Enabling The Banking IndustryHow Emerging Technologies are Enabling The Banking Industry
How Emerging Technologies are Enabling The Banking Industry
 
Five Exciting Ways HCI can accelerates digital transformation for Media and E...
Five Exciting Ways HCI can accelerates digital transformation for Media and E...Five Exciting Ways HCI can accelerates digital transformation for Media and E...
Five Exciting Ways HCI can accelerates digital transformation for Media and E...
 
Design and Optimize your code for high-performance with Intel® Advisor and I...
Design and Optimize your code for high-performance with Intel®  Advisor and I...Design and Optimize your code for high-performance with Intel®  Advisor and I...
Design and Optimize your code for high-performance with Intel® Advisor and I...
 
Fast-Track Your Digital Transformation with Intelligent Automation
Fast-Track Your Digital Transformation with Intelligent AutomationFast-Track Your Digital Transformation with Intelligent Automation
Fast-Track Your Digital Transformation with Intelligent Automation
 
Top Five benefits of Hyper-Converged Infrastructure
Top Five benefits of Hyper-Converged InfrastructureTop Five benefits of Hyper-Converged Infrastructure
Top Five benefits of Hyper-Converged Infrastructure
 
An Effective Approach to Cloud Migration for Small and Medium Enterprises (SMEs)
An Effective Approach to Cloud Migration for Small and Medium Enterprises (SMEs)An Effective Approach to Cloud Migration for Small and Medium Enterprises (SMEs)
An Effective Approach to Cloud Migration for Small and Medium Enterprises (SMEs)
 
How can Artificial Intelligence improve software development process?
How can Artificial Intelligence improve software development process?How can Artificial Intelligence improve software development process?
How can Artificial Intelligence improve software development process?
 
3 Ways Machine Learning Facilitates Fraud Detection
3 Ways Machine Learning Facilitates Fraud Detection3 Ways Machine Learning Facilitates Fraud Detection
3 Ways Machine Learning Facilitates Fraud Detection
 
Four ways to digitally transform with HPC in the cloud
Four ways to digitally transform with HPC in the cloudFour ways to digitally transform with HPC in the cloud
Four ways to digitally transform with HPC in the cloud
 
How to Secure Containerized Environments?
How to Secure Containerized Environments?How to Secure Containerized Environments?
How to Secure Containerized Environments?
 
OneAPI Series 2 Webinar - 9th, Dec-20
OneAPI Series 2 Webinar - 9th, Dec-20OneAPI Series 2 Webinar - 9th, Dec-20
OneAPI Series 2 Webinar - 9th, Dec-20
 
OneAPI dpc++ Virtual Workshop 9th Dec-20
OneAPI dpc++ Virtual Workshop 9th Dec-20OneAPI dpc++ Virtual Workshop 9th Dec-20
OneAPI dpc++ Virtual Workshop 9th Dec-20
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Kürzlich hochgeladen (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 

Accomplishing redundancy on Lustre based PFS with DRBD

  • 1. 0 ©2011 Quest Software, Inc. All rights reserved. Case Study: Accomplishing redundancy on Lustre based PFS with DRBD
  • 2. 1 ©2011 Quest Software, Inc. All rights reserved. Customer Name: India’s premier Scientific Research Institute Industry: Education and Research Corporate Office: Bangalore, India .
  • 3. 2 ©2011 Quest Software, Inc. All rights reserved. Sought a solution that could make them future- ready and provide flexibility in terms of technology options The PFS being on Lustre file system was a challenge to build as Lustre didn’t supported redundancy
  • 4. 3 ©2011 Quest Software, Inc. All rights reserved. High Performance Computing setup installed. Object Storage Servers used for setting up Parallel file System.
  • 5. 4 ©2011 Quest Software, Inc. All rights reserved. By incorporating DRBD and High Availability, the PFS based on a Lustre file system delivered by Netweb Technologies provides full redundancy. The 48 node cluster can simultaneously access the storage (PFS) with 1.5 GB/s throughput
  • 6. 5 ©2011 Quest Software, Inc. All rights reserved. Tyrone storage array. Six object storage servers of 12 TB each and each having two volumes of 6TB configured on RAID 5. Infiniband switch between Meta data servers and object storage servers. Being one of India’s premier research institutes it is well-known across the world. The research institute provides a vibrant academic ambience hosting more than 200 Researchers. The research institute is funded by the Department of Science and Technology, Government of India and is a deemed university.
  • 7. 6 ©2011 Quest Software, Inc. All rights reserved. Being one of the renowned research institutes in India; the institute wanted to deploy a High Performance Computing (HPC) for their research and training purposes. Netweb Technologies was deputed to make this deployment of HPC at the research institute. The research institute had put forth a requirement of 48 node HPC that would have a theoretical peak performance of 4.5 Tera flops and should have a Parallel File System (PFS) of 30 TB storage which would have 100% redundancy and a throughput of 1.5 GB/s. The setting up of the HPC 48 node cluster was straight forward for Netweb Technologies, but to provide the corresponding PFS storage posed a challenge. Lustre by default was the choice of file system for the PFS deployment at the research institute. The only hindrance was that Lustre doesn’t have any native support for redundancy. The only hindrance was that Lustre doesn’t have any native support for redundancy. And since, redundancy was the primary requirement laid for PFS by the research institute, it posed the biggest challenge for Netweb Technologies to provide the solution with redundancy.
  • 8. 7 ©2011 Quest Software, Inc. All rights reserved. A Parallel File System mainly constitutes of Meta Data servers and the connected storage nodes called as Object Storage servers. Meta Data servers store the location of the data which is actually residing on the physical storage nodes i.e. Object Storage servers. If suppose, a disk fails in any of the Object storage server, then not only the portion of data in that disk is lost, but also across all the nodes the complete data file becomes corrupted. The setup planned by Netweb Technologies was to have their “twin servers” to act as Meta data servers. And likewise for Object Storage servers, Netweb Technologies planned to have each node having two volumes; whereby one volume will be active and other in passive state.
  • 9. 8 ©2011 Quest Software, Inc. All rights reserved. The passive state volume would be the mirrored image of active volume of the corresponding node. Thus, in a round-robin fashion each active volume in one node of the PFS will have a passive mirrored volume residing in another node of the PFS. But it was easier to plan than to accomplish, as Lustre didn’t had any support for redundancy. To accomplish redundancy through RAID within a server/machine is easy, but across two different machines it is not possible with RAID.
  • 10. 9 ©2011 Quest Software, Inc. All rights reserved. To achieve redundancy, the technicians of Netweb Technologies used Distributed Replicated Block Devices (DRBD) which is normally used in High Availability HA) clusters. DRBD ensures that the writes happening on block device on primary node are propagated simultaneously to the block device on secondary node. Netweb Technologies was building 6 node PFS solution for the research institute, with each node of 12 TB, where each node had two volumes of 6 TB configured on RAID 5; where one volume was active while other was to be the mirrored image of another volume in the cluster node. By using DRBD, they were able to ascertain that, the active volume of first node is getting mirrored in the passive volume on second node and so on in a round robin fashion across all the 6 cluster nodes. Likewise, redundancy was provided on Meta Data servers, so that if one fails the other Meta data server which had the mirrored image could take over the functioning of the PFS solution.
  • 11. 10 ©2011 Quest Software, Inc. All rights reserved. The non-traditional approach to accomplish redundancy on PFS solution has helped the research institute to have a High Performance Computing cluster environment where they can achieve 4.5 Tera flops of performance and can access the storage system simultaneously from 48 nodes at 1.5 GB/s throughput. 100% Redundancy: By using DRBD along with High Availability concepts, the PFS solution is designed to provide full redundancy which could occur either from disk failure or from a complete node failure. Even if a cluster node fails, its backup is in another node which gets promoted to the active state, without hindering any performance of the PFS and maintaining the data consistency. Even the hardware is designed to provide redundancy feature. High Performance: With Infiniband as the backbone, high throughputs can be achieved. So that even if all 48 nodes of the HPC access the storage system the performance doesn’t get disrupted and a sustained 1.5 GB/s throughput is maintained.
  • 12. 11 ©2011 Quest Software, Inc. All rights reserved.