Hadoop Workflow Accelerator
•	 Enables Hadoop environments to
scale compute and data storage
independently
•	 Supports centralized data
repositories up to 100s of PB storage
capacity
•	 Increases flexibility to dynamically
adjust analysis resources
Accelerate Time to Insight
Organizations are struggling to manage the tremendous volume of data they
collect from a wide variety of sources. While big data helps derive new
insights that enable actionable intelligence and improve operational efficiency,
organizations are also searching for solutions that improve their ability to
manage and rapidly process this explosion of data and accelerate time to insight.
Seagate’s award-winning ClusterStor scale-out High Performance Computing
(HPC) solutions enable organizations to optimize big data workflows and centralize
data storage for High Performance Data Analytics solutions.
Superior Hadoop Performance
Seagate’s ClusterStor scale-out HPC solutions are optimized to take advantage
of HPC environments, including high-speed Ethernet or InfiniBand networks,
to efficiently deliver data to either HPC or Hadoop clusters. For Hadoop, this
allows organizations to optimize analytics workflows by enabling direct access to
centralized data repositories, eliminating the need to bulk-copy large amounts of
data into HDFS-based direct-attached storage before beginning Hadoop workflows.
The Hadoop Workflow Accelerator provides seamless compatibility with Hadoop
environments, streamlines Hadoop workflow efficiency and delivers superior
Hadoop performance. It consists of Seagate’s Hadoop on Lustre® Connector,
professional services, and an array of ClusterStor performance optimization
best practices, system tuning methods, and installation and configuration
management tools.
The Hadoop on Lustre Connector bypasses the HDFS software layer and directs
I/O to ClusterStor’s internal Lustre parallel file system; as a result, the Hadoop
Workflow Accelerator and HDFS-based workflows operate side by side. This
provides investment protection and compatibility across your entire HPC and
Hadoop environment. Users may seamlessly migrate Hadoop workflows to
ClusterStor using the Hadoop Workflow Accelerator, and may continue to use
their Hadoop cluster’s direct-attached storage to hold legacy data and
intermediate results while gaining all the benefits of the Hadoop Workflow
Accelerator.
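In practice, a Lustre-backed connector of this kind is wired in through Hadoop’s configuration rather than application code. The snippet below is an illustrative sketch only: because Lustre is POSIX-compliant, the simplest wiring points Hadoop at a shared Lustre mount point visible to every compute node. The mount path and comments are assumptions; the actual property names and any connector-specific file system class depend on the shipped connector release.

```xml
<!-- core-site.xml (illustrative sketch; not the shipped connector configuration) -->
<configuration>
  <!-- Point Hadoop at the shared Lustre mount instead of HDFS.
       /mnt/lustre is a hypothetical mount point that must be
       visible at the same path on every compute node. -->
  <property>
    <name>fs.defaultFS</name>
    <value>file:///mnt/lustre</value>
  </property>
</configuration>
```

With HDFS out of the data path, jobs read input and write results directly on the parallel file system, which is what removes the bulk-copy step described above.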
ClusterStor’s innovative scale-out HPC architecture enables a centrally managed big
data repository that accelerates Hadoop applications, so users can execute parallel
Hadoop jobs on any selected set of available data. The Hadoop Workflow Accelerator
significantly reduces time to results by enabling immediate data processing from the
start of each Hadoop job, eliminating the unnecessary and time-consuming step of
copying large amounts of data from a separate data repository.
All this is due to ClusterStor’s industry-leading architecture, proven at some of
the highest-performance, highest-capacity supercomputing sites in the world.
ClusterStor supports low-latency, POSIX-compliant, massively parallel and concurrent
data access, along with support for intensive Hadoop data analytics processing
workloads. ClusterStor scales to support hundreds to tens of thousands of
client compute nodes, hundreds to tens of thousands of terabytes of data, and
from several GB/s to over 1 TB/s of throughput, all within a single globally
coherent namespace.
Overall, the Hadoop Workflow Accelerator streamlines and accelerates Hadoop
application efficiency and results, and significantly enhances the flexibility and
productivity of analytics workload processing and centralized big data repository
management.
Improved Total Cost of Ownership
Standard Hadoop implementations using HDFS depend on triple replication of data to
ensure data availability. HDFS therefore forces 66% of your storage capacity to remain
idle as second- and third-level copies, an enormous waste of storage resources:
rack space, data center floor space, power and cooling.
In contrast, the Hadoop Workflow Accelerator uses RAID to significantly lower the
storage overhead for data protection, and enables disaggregation of compute from
storage, allowing independent scaling and optimization. ClusterStor scales file system
performance and capacity linearly, the same way a Hadoop compute cluster scales
processing power.
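The overhead difference is simple arithmetic. The sketch below compares usable capacity under HDFS triple replication with a parity-based layout; the 8+2 parity geometry is an illustrative assumption for a declustered-RAID-style scheme, not a published GridRAID parameter.

```python
def usable_replicated(raw_tb: float, copies: int = 3) -> float:
    """Usable capacity when every block is stored `copies` times (HDFS default: 3)."""
    return raw_tb / copies

def usable_parity(raw_tb: float, data: int = 8, parity: int = 2) -> float:
    """Usable capacity under a data+parity layout (8+2 is an assumed geometry)."""
    return raw_tb * data / (data + parity)

raw = 300.0  # raw terabytes
print(usable_replicated(raw))  # 100.0 -> two thirds of raw capacity idle as copies
print(usable_parity(raw))      # 240.0 -> only 20% overhead for protection
```

On the same raw capacity, the parity layout in this sketch yields well over twice the usable storage, which is the source of the rack-space, power and cooling savings described above.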
By using the Hadoop Workflow Accelerator to join Hadoop compute nodes with
Seagate’s ClusterStor storage systems, administrators gain the ability to separately
manage allocation of Hadoop compute and storage resources, to more efficiently
satisfy dynamic and continually expanding data analytics workload requirements.
The Hadoop Workflow Accelerator, when used in conjunction with ClusterStor, reduces
core operating costs that often represent the largest share of Total Cost of Ownership
(TCO), including rack space, data center floor space, weight, power, cooling
and administrative costs. Most importantly, the Hadoop Workflow Accelerator saves
you time, your most precious resource.
Key solution attributes that generate TCO savings include:
•	 Enabling immediate data processing at the start of each Hadoop job and eliminating
bulk copying of large amounts of data
•	 Accelerating Hadoop applications by leveraging the Lustre parallel file system and
high performance networks
•	 Reducing data protection overhead using RAID and advanced parity declustered
RAID capabilities called GridRAID
•	 Increasing flexibility to scale and optimize compute and storage resources
ClusterStor Manager
ClusterStor Manager consolidates management of the storage infrastructure, RAID data
protection layer, scale-out operating system and the Lustre file system into a single,
easy-to-use administrator graphical user interface (GUI).
Scalable Storage
ClusterStor’s architecture removes complexity and is naturally more efficient by design.
The ClusterStor Scalable Storage Unit integrates the operating system, data protection,
the Lustre® file system and management into a single high-availability building block
that consolidates storage, network and server processing. The result is a solution that
is easy to deploy, use and manage. There is no need to guess at how to scale:
each Scalable Storage Unit is a balanced performance building block delivering a
predictable level of performance and storage capacity.
Overall system performance is directly proportional to the number of Scalable Storage
Units, thanks to ClusterStor’s efficient internal optimization that yields industry-leading
linear performance scalability. End users simply add Scalable Storage Units and/or
Expansion Storage Units to satisfy their performance and/or data capacity needs.
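Linear scalability means aggregate throughput is just the per-unit figure times the unit count. The per-SSU bandwidth below is a hypothetical placeholder chosen for illustration, not a published specification; with that assumption, seven units would match the 63 GB/s per-rack figure quoted in the specification table.

```python
def aggregate_throughput_gbps(ssu_count: int, per_ssu_gbps: float) -> float:
    """Linear scaling model: total throughput grows proportionally with SSU count."""
    return ssu_count * per_ssu_gbps

# Hypothetical per-SSU bandwidth of 9 GB/s (assumed, not a spec value).
print(aggregate_throughput_gbps(7, 9.0))   # 63.0
print(aggregate_throughput_gbps(14, 9.0))  # 126.0 -> add units, double throughput
```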
	 AMERICAS	 Seagate Technology LLC 10200 South De Anza Boulevard, Cupertino, California 95014, United States, 408-658-1000
	 ASIA/PACIFIC	 Seagate Singapore International Headquarters Pte. Ltd. 7000 Ang Mo Kio Avenue 5, Singapore 569877, 65-6485-3888
	 EUROPE, MIDDLE EAST AND AFRICA	 Seagate Technology SAS 16–18, rue du Dôme, 92100 Boulogne-Billancourt, France, 33 1-4186 10 00
© 2014 Seagate Technology LLC. All rights reserved. Printed in USA. Seagate, Seagate Technology and the Wave logo are registered trademarks of Seagate Technology LLC in the United States and/or other countries.
ClusterStor is either a trademark or registered trademark of Seagate Technology LLC or one of its affiliated companies in the United States and/or other countries. All other trademarks or registered
trademarks are the property of their respective owners. When referring to drive capacity, one gigabyte, or GB, equals one billion bytes. Your computer’s operating system may use a different standard of measurement and
report a lower capacity. In addition, some of the listed capacity is used for formatting and other functions, and thus will not be available for data storage. Actual data rates may vary depending on operating environment and
other factors. The export or re-export of hardware or software containing encryption may be regulated by the U.S. Department of Commerce, Bureau of Industry and Security (for more information, visit www.bis.doc.gov),
and controlled for import and use outside of the U.S. Seagate reserves the right to change, without notice, product offerings or specifications. DS Hadoop with ClusterStor, November 2014.
ClusterStor Hadoop Workflow Accelerator
Supported Hadoop Distributions¹	Apache Hadoop; Hadoop 1.x & 2.x
Supported ClusterStor Solutions	ClusterStor™ 1500, ClusterStor™ 6000, ClusterStor™ 9000, ClusterStor™ Secure Data Appliance, ClusterStor™ Manager
Client Access	InfiniBand QDR or FDR, or Ethernet 10GE or 40GE
File System Performance²	Up to 63 GB/s bandwidth per 42RU rack
Usable File System Capacity²	Up to 3,444 TB per rack using 6TB SAS HDDs
File System	Lustre 2.1, 2.5² plus Seagate-supported enhancements
ClusterStor GridRAID²	Delivers up to 400% faster rebuild time to repair; mandatory to effectively implement high-capacity drives and solutions; consolidates Object Storage Targets with a 4-to-1 reduction
¹ Specific ecosystem distributions will be supported by request
² ClusterStor 9000
Efficiently Manage Very Large Data Sets
When using Seagate’s Hadoop Workflow Accelerator in conjunction with the
ClusterStor family of storage solutions, you gain an effective tool for efficiently
managing very large data sets.
The Hadoop Workflow Accelerator allows you to scale storage and compute separately
while leveraging your current high performance infrastructure investment, including
continued use of your Hadoop cluster’s direct attached storage to hold legacy data
and the intermediate results of MapReduce jobs.
Professional Services
Our highly trained ClusterStor Professional Services (CPS) team ensures efficient,
skillful and knowledgeable delivery in each phase of your adoption and use of the
Hadoop Workflow Accelerator and ClusterStor scale-out storage solutions. CPS
enables rapid capture, execution and tailoring of advanced system parameters
optimally matched to your unique system environment and application
characteristics. CPS offers:
•	 Best-practice service methodologies and a consistent implementation framework
•	 Rapid deployment of the highest-performance, best-in-class scale-out data storage
solutions and seamless integration into your application environment
•	 Deep and pragmatic expertise in the Lustre® open source parallel file system
•	 Highly trained architects, service consultants and field engineers with the experience
to execute quality-controlled and personalized solutions
www.seagate.com

More Related Content

What's hot

NetApp’s Open Solution for Hadoop
NetApp’s Open Solution for HadoopNetApp’s Open Solution for Hadoop
NetApp’s Open Solution for HadoopNetApp
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Managementrightsize
 
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics MeetupIntroduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetupiwrigley
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecasesudhakara st
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystemnallagangus
 
Solution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorSolution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorBlueData, Inc.
 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsDataWorks Summit/Hadoop Summit
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesDataWorks Summit
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuningVitthal Gogate
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceBlueData, Inc.
 
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...DataWorks Summit
 
Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Wei-Chiu Chuang
 
Introduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsIntroduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsFadi Yousuf
 
Dealing with Changed Data in Hadoop
Dealing with Changed Data in HadoopDealing with Changed Data in Hadoop
Dealing with Changed Data in HadoopDataWorks Summit
 

What's hot (20)

Future of-hadoop-analytics
Future of-hadoop-analyticsFuture of-hadoop-analytics
Future of-hadoop-analytics
 
NetApp’s Open Solution for Hadoop
NetApp’s Open Solution for HadoopNetApp’s Open Solution for Hadoop
NetApp’s Open Solution for Hadoop
 
Hadoop info
Hadoop infoHadoop info
Hadoop info
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Management
 
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics MeetupIntroduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
 
Keep your Hadoop Cluster at its Best
Keep your Hadoop Cluster at its BestKeep your Hadoop Cluster at its Best
Keep your Hadoop Cluster at its Best
 
Hadoop Platform at Yahoo
Hadoop Platform at YahooHadoop Platform at Yahoo
Hadoop Platform at Yahoo
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
Solution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorSolution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab Accelerator
 
Hadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the ExpertsHadoop in the Cloud – The What, Why and How from the Experts
Hadoop in the Cloud – The What, Why and How from the Experts
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual Machines
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-Service
 
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
How to use flash drives with Apache Hadoop 3.x: Real world use cases and proo...
 
Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)
 
Introduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsIntroduction to Hadoop - The Essentials
Introduction to Hadoop - The Essentials
 
Hadoop
HadoopHadoop
Hadoop
 
Dealing with Changed Data in Hadoop
Dealing with Changed Data in HadoopDealing with Changed Data in Hadoop
Dealing with Changed Data in Hadoop
 
Hybrid Data Platform
Hybrid Data Platform Hybrid Data Platform
Hybrid Data Platform
 

Viewers also liked

Ntino Cloud BioLinux Barcelona Spain 2012
Ntino Cloud BioLinux Barcelona Spain 2012Ntino Cloud BioLinux Barcelona Spain 2012
Ntino Cloud BioLinux Barcelona Spain 2012Ntino Krampis
 
Shubhanken Presentation
Shubhanken PresentationShubhanken Presentation
Shubhanken Presentationguest7300c4
 
Wm Credentials
Wm CredentialsWm Credentials
Wm Credentialspikwik
 
"mettiamoci sempre dove si prende"
"mettiamoci sempre dove si prende""mettiamoci sempre dove si prende"
"mettiamoci sempre dove si prende"Denis Ferraretti
 
Reputation snapshot for the banking industry, 2012, final
Reputation snapshot for the banking industry, 2012, finalReputation snapshot for the banking industry, 2012, final
Reputation snapshot for the banking industry, 2012, finalDamjana Kocjanc
 
Accurate biochemical knowledge starting with precise structure-based criteria...
Accurate biochemical knowledge starting with precise structure-based criteria...Accurate biochemical knowledge starting with precise structure-based criteria...
Accurate biochemical knowledge starting with precise structure-based criteria...Michel Dumontier
 
20090511 Manchester Biochemistry
20090511 Manchester Biochemistry20090511 Manchester Biochemistry
20090511 Manchester BiochemistryMichel Dumontier
 
Geekmeet Iasi Intro
Geekmeet Iasi IntroGeekmeet Iasi Intro
Geekmeet Iasi IntroGeekMeet
 
The Global Competition For Capital
The Global Competition For CapitalThe Global Competition For Capital
The Global Competition For CapitalThomas Nastas
 
Tactical Formalization of Linked Open Data (Ontology Summit 2014)
Tactical Formalization of Linked Open Data (Ontology Summit 2014)Tactical Formalization of Linked Open Data (Ontology Summit 2014)
Tactical Formalization of Linked Open Data (Ontology Summit 2014)Michel Dumontier
 
University Students Fighting World Hunger - Slacktivism vs. Activism. Does a ...
University Students Fighting World Hunger - Slacktivism vs. Activism. Does a ...University Students Fighting World Hunger - Slacktivism vs. Activism. Does a ...
University Students Fighting World Hunger - Slacktivism vs. Activism. Does a ...Sascha Funk
 
Web Configurator
Web ConfiguratorWeb Configurator
Web Configuratormikuzz
 

Viewers also liked (20)

Wwf Intro
Wwf IntroWwf Intro
Wwf Intro
 
IPCC2010-1
IPCC2010-1IPCC2010-1
IPCC2010-1
 
Free Software
Free SoftwareFree Software
Free Software
 
Ntino Cloud BioLinux Barcelona Spain 2012
Ntino Cloud BioLinux Barcelona Spain 2012Ntino Cloud BioLinux Barcelona Spain 2012
Ntino Cloud BioLinux Barcelona Spain 2012
 
Shubhanken Presentation
Shubhanken PresentationShubhanken Presentation
Shubhanken Presentation
 
Riskopecredience 10 05 Final_Version
Riskopecredience 10 05 Final_VersionRiskopecredience 10 05 Final_Version
Riskopecredience 10 05 Final_Version
 
Wm Credentials
Wm CredentialsWm Credentials
Wm Credentials
 
"mettiamoci sempre dove si prende"
"mettiamoci sempre dove si prende""mettiamoci sempre dove si prende"
"mettiamoci sempre dove si prende"
 
Reputation snapshot for the banking industry, 2012, final
Reputation snapshot for the banking industry, 2012, finalReputation snapshot for the banking industry, 2012, final
Reputation snapshot for the banking industry, 2012, final
 
Accurate biochemical knowledge starting with precise structure-based criteria...
Accurate biochemical knowledge starting with precise structure-based criteria...Accurate biochemical knowledge starting with precise structure-based criteria...
Accurate biochemical knowledge starting with precise structure-based criteria...
 
20090511 Manchester Biochemistry
20090511 Manchester Biochemistry20090511 Manchester Biochemistry
20090511 Manchester Biochemistry
 
Tamk Conference Finished 2008
Tamk Conference Finished 2008Tamk Conference Finished 2008
Tamk Conference Finished 2008
 
Geekmeet Iasi Intro
Geekmeet Iasi IntroGeekmeet Iasi Intro
Geekmeet Iasi Intro
 
The Global Competition For Capital
The Global Competition For CapitalThe Global Competition For Capital
The Global Competition For Capital
 
Zas
ZasZas
Zas
 
Tactical Formalization of Linked Open Data (Ontology Summit 2014)
Tactical Formalization of Linked Open Data (Ontology Summit 2014)Tactical Formalization of Linked Open Data (Ontology Summit 2014)
Tactical Formalization of Linked Open Data (Ontology Summit 2014)
 
University Students Fighting World Hunger - Slacktivism vs. Activism. Does a ...
University Students Fighting World Hunger - Slacktivism vs. Activism. Does a ...University Students Fighting World Hunger - Slacktivism vs. Activism. Does a ...
University Students Fighting World Hunger - Slacktivism vs. Activism. Does a ...
 
Burlata
BurlataBurlata
Burlata
 
Yoshida thesis
Yoshida thesisYoshida thesis
Yoshida thesis
 
Web Configurator
Web ConfiguratorWeb Configurator
Web Configurator
 

Similar to clusterstor-hadoop-data-sheet

Cisco Big Data Warehouse Expansion Featuring MapR Distribution
Cisco Big Data Warehouse Expansion Featuring MapR DistributionCisco Big Data Warehouse Expansion Featuring MapR Distribution
Cisco Big Data Warehouse Expansion Featuring MapR DistributionAppfluent Technology
 
Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...
Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...
Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...Hitachi Vantara
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsCognizant
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Ontico
 
Building a scalable analytics environment to support diverse workloads
Building a scalable analytics environment to support diverse workloadsBuilding a scalable analytics environment to support diverse workloads
Building a scalable analytics environment to support diverse workloadsAlluxio, Inc.
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016StampedeCon
 
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataBig Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataPentaho
 
Hitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Vantara
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKRajesh Jayarman
 
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFSUSE Italy
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data AnalyticsAttunity
 
Webinar: The Performance Challenge: Providing an Amazing Customer Experience ...
Webinar: The Performance Challenge: Providing an Amazing Customer Experience ...Webinar: The Performance Challenge: Providing an Amazing Customer Experience ...
Webinar: The Performance Challenge: Providing an Amazing Customer Experience ...DataStax
 
Bigdataappliance datasheet-1883358
Bigdataappliance datasheet-1883358Bigdataappliance datasheet-1883358
Bigdataappliance datasheet-1883358Ory Chhean
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
Hitachi overview-brochure-hus-hnas-family
Hitachi overview-brochure-hus-hnas-familyHitachi overview-brochure-hus-hnas-family
Hitachi overview-brochure-hus-hnas-familyHitachi Vantara
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio, Inc.
 
Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016Dell Lustre Storage Architecture Presentation - MBUG 2016
Dell Lustre Storage Architecture Presentation - MBUG 2016Andrew Underwood
 

Similar to clusterstor-hadoop-data-sheet (20)

Cisco Big Data Warehouse Expansion Featuring MapR Distribution
Cisco Big Data Warehouse Expansion Featuring MapR DistributionCisco Big Data Warehouse Expansion Featuring MapR Distribution
Cisco Big Data Warehouse Expansion Featuring MapR Distribution
 
Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...
Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...
Hitachi solution-profile-advanced-project-version-management-in-schlumberger-...
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...
 
Building a scalable analytics environment to support diverse workloads
Building a scalable analytics environment to support diverse workloadsBuilding a scalable analytics environment to support diverse workloads
Building a scalable analytics environment to support diverse workloads
 
Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataBig Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big Data
 
Hitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop Solution
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 
BlueData DataSheet
BlueData DataSheetBlueData DataSheet
BlueData DataSheet
 
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Webinar: The Performance Challenge: Providing an Amazing Customer Experience ...
Webinar: The Performance Challenge: Providing an Amazing Customer Experience ...Webinar: The Performance Challenge: Providing an Amazing Customer Experience ...
Webinar: The Performance Challenge: Providing an Amazing Customer Experience ...
 
Bigdataappliance datasheet-1883358
Bigdataappliance datasheet-1883358Bigdataappliance datasheet-1883358
Bigdataappliance datasheet-1883358
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Hitachi overview-brochure-hus-hnas-family
Hitachi overview-brochure-hus-hnas-familyHitachi overview-brochure-hus-hnas-family
Hitachi overview-brochure-hus-hnas-family
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
 
Dell Lustre Storage Architecture Presentation - MBUG 2016
clusterstor-hadoop-data-sheet

Hadoop Workflow Accelerator
•	 Enables Hadoop environments to scale compute and data storage independently
•	 Supports centralized data repositories of up to hundreds of petabytes of storage capacity
•	 Increases flexibility to dynamically adjust analysis resources

Accelerate Time to Insight
Organizations are struggling to manage the tremendous volume of data they collect from a wide variety of sources. While big data helps derive new insights that enable actionable intelligence and improve operational efficiency, organizations are also searching for solutions that improve their ability to manage and rapidly process this explosion of data and accelerate time to insight. Seagate’s award-winning ClusterStor scale-out High Performance Computing (HPC) solutions enable organizations to optimize big data workflows and centralize data storage for High Performance Data Analytics solutions.

Superior Hadoop Performance
Seagate’s ClusterStor scale-out HPC solutions are optimized to take advantage of HPC environments, including high-speed Ethernet or InfiniBand networks, to efficiently deliver data to either HPC or Hadoop clusters. For Hadoop, this allows organizations to optimize analytics workflows by enabling direct access to centralized data repositories, eliminating the need to bulk-copy large amounts of data into HDFS-based direct-attached storage before beginning Hadoop workflows.

The Hadoop Workflow Accelerator provides seamless compatibility with Hadoop environments, streamlined Hadoop workflow efficiency and superior Hadoop performance. It consists of Seagate’s Hadoop on Lustre® Connector and professional services, as well as an array of ClusterStor performance-optimization best practices, system-tuning methods, and installation and configuration management tools.
The Hadoop on Lustre Connector bypasses the HDFS software layer and directs I/O to ClusterStor’s internal Lustre parallel file system, so the Hadoop Workflow Accelerator and HDFS-based workflows operate side by side. This provides investment protection and compatibility across your entire HPC and Hadoop environment. Users may seamlessly migrate Hadoop workflows to ClusterStor using the Hadoop Workflow Accelerator, and may continue to use their Hadoop cluster’s direct-attached storage to hold legacy data and intermediate results while gaining all the benefits of the Hadoop Workflow Accelerator.
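Deployments of this kind typically redirect Hadoop to the mounted parallel file system through `core-site.xml`. The fragment below is an illustrative sketch only: the URI scheme, mount point and implementation class are assumptions, as the actual property values depend on the Hadoop on Lustre Connector release and site configuration.

```xml
<!-- core-site.xml: illustrative sketch; property values are assumptions,
     not the connector's documented configuration -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- hypothetical Lustre mount point exposed through the connector -->
    <value>lustre:///mnt/lustre</value>
  </property>
  <property>
    <name>fs.lustre.impl</name>
    <!-- hypothetical FileSystem implementation class -->
    <value>org.apache.hadoop.fs.LustreFileSystem</value>
  </property>
</configuration>
```

With a configuration of this shape, MapReduce jobs read input splits and write output directly against the shared Lustre namespace instead of replicated HDFS blocks.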
ClusterStor’s innovative scale-out HPC architecture enables a centrally managed big data repository that accelerates Hadoop applications, so users can execute parallel Hadoop jobs on any selected set of available data. The Hadoop Workflow Accelerator significantly reduces time to results by enabling immediate data processing from the start of each Hadoop job, eliminating the unnecessary and time-consuming step of copying large amounts of data from a separate data repository. All this is due to ClusterStor’s industry-leading architecture, proven at some of the highest-performance, highest-capacity supercomputing sites in the world.

ClusterStor supports low-latency, POSIX-compliant, massively parallel and concurrent data access, along with support for intensive Hadoop data analytics workloads. ClusterStor scales from hundreds to tens of thousands of client compute nodes, from hundreds to tens of thousands of terabytes of data, and from several GB/s to over 1 TB/s of throughput, all within a single globally coherent namespace. Overall, the Hadoop Workflow Accelerator streamlines and accelerates Hadoop application efficiency and results, and significantly enhances the flexibility and productivity of analytics workload processing and centralized big data repository management.

Improved Total Cost of Ownership
Standard Hadoop implementations using HDFS depend on triple replication of data to ensure data availability. HDFS forces 66% of your storage capacity to remain idle as second- and third-level copies. This represents an enormous waste of storage resources: rack space, data center floor space, power and cooling. In contrast, the Hadoop Workflow Accelerator uses RAID, which significantly lowers the storage overhead for data protection and enables disaggregation of compute from storage, allowing independent scaling and optimization.
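The capacity arithmetic behind that comparison is easy to check. The sketch below contrasts HDFS 3x replication with a parity layout; the 8+2 stripe geometry is an illustrative assumption for a RAID 6-style scheme, not a stated ClusterStor specification.

```python
def usable_fraction_replication(copies: int) -> float:
    """Fraction of raw capacity usable when every block is stored `copies` times."""
    return 1.0 / copies

def usable_fraction_parity(data_disks: int, parity_disks: int) -> float:
    """Fraction of raw capacity usable in a data+parity stripe (e.g. RAID 6 8+2)."""
    return data_disks / (data_disks + parity_disks)

hdfs = usable_fraction_replication(3)   # ~33% usable: the ~66% overhead cited above
raid = usable_fraction_parity(8, 2)     # 80% usable: only 20% overhead

print(f"HDFS 3x replication: {hdfs:.0%} usable, {1 - hdfs:.0%} overhead")
print(f"RAID 8+2 parity:     {raid:.0%} usable, {1 - raid:.0%} overhead")
```

For the same usable capacity, the replicated layout therefore needs roughly 2.4x the raw disk of the 8+2 parity layout, which is where the rack-space, power and cooling savings come from.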
ClusterStor scales file system performance and capacity linearly, the same way a Hadoop compute cluster scales processing power. By using the Hadoop Workflow Accelerator to join Hadoop compute nodes with Seagate’s ClusterStor storage systems, administrators gain the ability to manage the allocation of Hadoop compute and storage resources separately, to more efficiently satisfy dynamic and continually expanding data analytics workload requirements.
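Linear scaling makes capacity planning a simple proportion. The sizing sketch below uses the per-rack figures from this data sheet’s specification table (ClusterStor 9000: up to 63 GB/s and 3,444 TB usable per rack); the sizing function itself is an illustrative assumption, not a Seagate tool.

```python
import math

# Per-rack figures quoted in the specification table (ClusterStor 9000).
PER_RACK_BANDWIDTH_GBS = 63
PER_RACK_CAPACITY_TB = 3444

def racks_needed(target_gbs: float, target_tb: float) -> int:
    """Racks required to meet both a throughput and a capacity target,
    assuming the linear scaling described above."""
    by_bandwidth = math.ceil(target_gbs / PER_RACK_BANDWIDTH_GBS)
    by_capacity = math.ceil(target_tb / PER_RACK_CAPACITY_TB)
    return max(by_bandwidth, by_capacity)

# Example: a 500 GB/s, 20 PB target is bandwidth-bound at 8 racks.
print(racks_needed(500, 20000))  # → 8
```

Because compute and storage scale independently in this architecture, the same calculation can be rerun against either dimension alone as workloads grow.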
The Hadoop Workflow Accelerator, when used in conjunction with ClusterStor, reduces the core operating costs that often represent the largest share of Total Cost of Ownership (TCO), including rack space, data center floor space, weight, power, cooling and administrative costs. Most importantly, the Hadoop Workflow Accelerator saves you time, your most precious resource. Key solution attributes that generate TCO savings include:
•	 Enabling immediate data processing at the start of each Hadoop job and eliminating bulk copying of large amounts of data
•	 Accelerating Hadoop applications by leveraging the Lustre parallel file system and high-performance networks
•	 Reducing data protection overhead using RAID and an advanced parity declustered RAID capability called GridRAID
•	 Increasing flexibility to scale and optimize compute and storage resources

ClusterStor Manager
ClusterStor Manager consolidates management of the storage infrastructure, the RAID data protection layer, the scale-out operating system and the Lustre file system into a single, easy-to-use administrator graphical user interface (GUI).

Scalable Storage
ClusterStor’s architecture removes complexity and is naturally more efficient by design. The ClusterStor Scalable Storage Unit integrates the operating system, data protection, the Lustre® file system and management into a single high-availability building block that consolidates storage, network and server processing. The result is a solution that is easy to deploy, easy to use and easy to manage. There is no need to guess at how to scale: each Scalable Storage Unit is a balanced building block delivering a predictable level of performance and storage capacity. Overall system performance is directly proportional to the number of Scalable Storage Units, due to ClusterStor’s efficient internal optimization, which yields industry-leading linear performance scalability.
End users simply add Scalable Storage Units and/or Expansion Storage Units to satisfy their performance and/or data capacity needs.
AMERICAS Seagate Technology LLC 10200 South De Anza Boulevard, Cupertino, California 95014, United States, 408-658-1000
ASIA/PACIFIC Seagate Singapore International Headquarters Pte. Ltd. 7000 Ang Mo Kio Avenue 5, Singapore 569877, 65-6485-3888
EUROPE, MIDDLE EAST AND AFRICA Seagate Technology SAS 16–18, rue du Dôme, 92100 Boulogne-Billancourt, France, 33 1-4186 10 00

© 2014 Seagate Technology LLC. All rights reserved. Printed in USA. Seagate, Seagate Technology and the Wave logo are registered trademarks of Seagate Technology LLC in the United States and/or other countries. ClusterStor is either a trademark or registered trademark of Seagate Technology LLC or one of its affiliated companies in the United States and/or other countries. All other trademarks or registered trademarks are the property of their respective owners. When referring to drive capacity, one gigabyte, or GB, equals one billion bytes. Your computer’s operating system may use a different standard of measurement and report a lower capacity. In addition, some of the listed capacity is used for formatting and other functions, and thus will not be available for data storage. Actual data rates may vary depending on operating environment and other factors. The export or re-export of hardware or software containing encryption may be regulated by the U.S. Department of Commerce, Bureau of Industry and Security (for more information, visit www.bis.doc.gov), and controlled for import and use outside of the U.S. Seagate reserves the right to change, without notice, product offerings or specifications. DS Hadoop with ClusterStor, November 2014.
ClusterStor Hadoop Workflow Accelerator
Supported Hadoop Distributions¹: Apache Hadoop 1.x & 2.x
Supported ClusterStor Solutions: ClusterStor™ 1500, ClusterStor™ 6000, ClusterStor™ 9000, ClusterStor™ Secure Data Appliance, ClusterStor™ Manager
Client Access: InfiniBand QDR or FDR, or Ethernet 10GE or 40GE
File System Performance²: Up to 63 GB/s bandwidth per 42RU rack
Usable File System Capacity²: Up to 3,444 TB per rack using 6TB SAS HDDs
File System: Lustre 2.1, 2.5² plus Seagate-supported enhancements
ClusterStor GridRAID²: Delivers up to 400% faster rebuild time to repair; mandatory to effectively implement high-capacity drives and solutions; consolidates Object Storage Targets 4 to 1
¹ Specific ecosystem distributions will be supported by request
² ClusterStor 9000

Efficiently Manage Very Large Data Sets
When using Seagate’s Hadoop Workflow Accelerator in conjunction with the ClusterStor family of storage solutions, you gain an effective tool for efficiently managing very large data sets. The Hadoop Workflow Accelerator allows you to scale storage and compute separately while leveraging your current high-performance infrastructure investment, including continued use of your Hadoop cluster’s direct-attached storage to hold legacy data and the intermediate results of MapReduce jobs.

Professional Services
Our highly trained ClusterStor Professional Services (CPS) team ensures efficient, skillful and knowledgeable delivery at each phase of your adoption and use of the Hadoop Workflow Accelerator and ClusterStor scale-out storage solutions. CPS enables rapid capture, execution and tailoring of advanced system parameters optimally matched to your unique system environment and application characteristics.
CPS offers:
•	 Best-practice service methodologies and a consistent implementation framework
•	 Rapid deployment of the highest-performance, best-in-class scale-out data storage solutions and seamless integration into your application environment
•	 Deep and pragmatic expertise in the Lustre® open source parallel file system
•	 Highly trained architects, service consultants and field engineers with the experience to execute quality-controlled and personalized solutions

www.seagate.com