White Paper




Cisco and Greenplum
Partner to Deliver
High-Performance
Hadoop Reference
Configurations




February 2012



Contents

Next-Generation Hadoop Solution
Greenplum MR: Hadoop Reengineered
Cisco UCS: The Exclusive Platform
Reference Configuration
Excellence from Cisco and Greenplum
Complete Big Data Analysis Solution
Designed for High Availability and Reliability
High Performance and Exceptional Scalability
Simplified Management
Coexistence with Enterprise Applications
Rapid Deployment
Enterprise Service and Support
For More Information
Acknowledgments




© 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information.




Greenplum MR, together with the Cisco Unified Computing System, provides companies with an integrated Hadoop solution that delivers advanced performance, full data protection, no single point of failure, and improved data access features that can expedite the implementation of big data analytics environments.

Highlights

Optimized for Performance
• Greenplum and Cisco deliver an integrated Hadoop solution specifically engineered to handle the most demanding Hadoop workloads.

Cisco UCS Creates a Flexible Appliance Platform
• The Cisco Unified Computing System™ (Cisco UCS™) provides a flexible, high-performance platform that can be optimized and easily scaled for Hadoop clusters of any size.

Ease of Management
• Cisco UCS Manager provides unified, embedded management of all computing, networking, and storage access resources.

Enterprise-Class Support and Services
• The Cisco® Greenplum MR solution combines the support and services of two of the world’s largest technology companies.

Next-Generation Hadoop Solution

The worldwide leader in data center networking, and now a leading competitor in the server market, Cisco is partnering with Greenplum to provide a best-in-class big data solution. The Cisco® Greenplum MR Reference Configuration delivers an integrated, end-to-end software and hardware infrastructure that accelerates big data initiatives. The combination of world-leading Cisco Unified Computing System™ (Cisco UCS™) and Greenplum MR enables companies to significantly reduce time-to-value and the operating expenses associated with Apache Hadoop implementations.

Greenplum MR: Hadoop Reengineered

Greenplum MR, based on the MapR M5 Distribution, is an implementation of the Apache Hadoop stack that enables near real-time collection and organization of high volumes of structured, semi-structured, and unstructured data distributed across a cluster of servers. With 100 percent interface compatibility with Apache Hadoop, Greenplum MR provides transparent application portability while delivering performance that is at least twice as fast as standard Apache Hadoop.

Greenplum MR provides direct data input and output to the cluster through MapR Direct Access Network File System (NFS), supports real-time analytics, and is the first distribution to provide true high availability at all levels. Greenplum MR introduces the concept of logical volumes to Hadoop: a means of grouping data and applying policy consistently across an entire data set. Greenplum MR also provides hardware status and control through the MapR Control System, a comprehensive user interface that includes a heatmap displaying the health of the entire cluster at a glance.
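Because Direct Access NFS presents the cluster as an ordinary file system mount, an application can in principle read and write cluster data with standard file operations instead of a Hadoop client library. The following is a minimal sketch of that idea, not a documented MapR interface: the function names and the file layout are hypothetical, and a temporary directory stands in for the real NFS mount point.

```python
import tempfile
from pathlib import Path


def append_events(mount_root: str, relative_path: str, events: list[str]) -> Path:
    """Append one event per line to a file under an NFS-mounted cluster path."""
    target = Path(mount_root) / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    # Plain file I/O only: no Hadoop client library is involved.
    with target.open("a", encoding="utf-8") as handle:
        for event in events:
            handle.write(event + "\n")
    return target


def read_events(mount_root: str, relative_path: str) -> list[str]:
    """Read the events back through the same mount."""
    target = Path(mount_root) / relative_path
    return target.read_text(encoding="utf-8").splitlines()


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for the cluster's NFS
    # mount point, whose real path depends on the site configuration.
    with tempfile.TemporaryDirectory() as fake_mount:
        append_events(fake_mount, "logs/web/clicks.log", ["user=1 page=/home"])
        print(read_events(fake_mount, "logs/web/clicks.log"))
```

On a real deployment, the same two functions would be pointed at the directory where the cluster is mounted, so existing tools and scripts could feed data into Hadoop without modification.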

                                                     Cisco UCS and Greenplum MR can help businesses manage many different big data
                                                     scenarios. Table 1 provides examples of how the Cisco Greenplum MR Reference
                                                     Configurations can accelerate big data initiatives.



Table 1. Sample Use Cases for Cisco UCS and Greenplum MR

  Scenario                                  Cisco Greenplum MR Reference Configuration Capabilities


  Content management                        Collect and store unstructured and semi-structured data in a fault-resilient, scalable data store that can be
                                            organized and sorted for indexing and analysis.


  Batch processing unstructured data        Batch-process large quantities of unstructured and semi-structured data: for example, data warehousing extract,
                                            transform, and load (ETL) processing.


  Medium-term data archive                  Medium-term (12 to 36 months) archival of data from an enterprise data warehouse (EDW) database management
                                            system (DBMS) to increase the length of time that data is retained or to meet data retention and compliance
                                            policies.


  Integration with data warehouse           Transfer data stored in Hadoop to and from a separate DBMS for advanced analytics.


  Customer risk analysis                    Build a comprehensive data picture of customer-side risk, based on activity and behavior across products and
                                            accounts.


  Personalization and asset management      Create and model investor strategy and goals based on market data, individual asset characteristics, and reports
                                            fed into an online recommendation system.


  Trade analytics                           Analyze historical volume and trading data for individual stock symbols, variable cost of trades, and allocation of
                                            expenses.


  Credit scoring                            Update credit-scoring models using cross-functional transaction data and recent outcomes, to respond to changes
                                            such as the collapse of bubble markets. Sweep recent credit history to build transactional and temporal models.


  Retailer compromise                       Prevent or detect fraud resulting from a breach of retailer cards or accounts by monitoring, modeling, and analyzing
                                            high volumes of transaction data and extracting features and patterns.


  Miscategorized credit card fraud          Reduce false positives and prevent miscategorization of legitimate transactions as fraud, using high volumes of data
                                            to build good models.


  Next-generation credit card fraud         Daily cross-sectional analysis of portfolio using transaction similarities to find accounts that are being cultivated for
                                            eventual fraud, using common application elements, temporal patterns, vendors, and transaction amounts to detect
                                            similar accounts before the fraud is perpetrated.


  Customer retention                        Combine transactional customer contact information and social network data to perform attrition modeling to learn
                                            social and transaction markers for attrition and retention.


  Sentiment analysis (opinion mining) and   Find better indicators to predict bankruptcy among existing customers using sentiment analysis from social
  bankruptcy                                networking, responding quickly before the warning horizon.






Greenplum MR
• Complete enterprise-class solution
• 100 percent compatibility with Apache Hadoop
• Elimination of the common problems experienced with HDFS
• Direct access through NFS
• High availability through JobTracker enhancements and a no-NameNode architecture
• From two to five times faster performance than other Hadoop distributions
• Advanced management for clusters of all sizes
• Robust data protection features, including snapshots and intercluster mirroring
• A comprehensive network of enterprise business intelligence tools

As shown in Figure 1, the major components of Greenplum MR include:
• Lockless storage services: A replacement for the Hadoop Distributed File System (HDFS), the MapR lockless storage services provide multidimensional scalability and accelerate MapReduce performance. The services allow random read and write operations while automatically compressing data in real time.
• MapReduce: Part of the Apache Hadoop framework, MapReduce simplifies the creation of applications that process large amounts of unstructured and structured data in parallel. Underlying hardware failures are handled transparently for user applications, providing reliable, fault-tolerant execution.
• Hive: Hive is a data warehouse system for Hadoop that facilitates data summarization, ad hoc queries, and analysis of large data sets. This SQL-like interface increases the compression of stored data for improved storage resource utilization without affecting access speed.
• Pig: Pig is a high-level procedural language for processing data sets in parallel using the Hadoop MapReduce platform. Its intuitive syntax simplifies the development of MapReduce jobs, providing an alternative programming language to Java.
• HBase: HBase is a distributed, versioned, column-oriented database that delivers random, real-time, read-write access to big data.
• ZooKeeper: ZooKeeper is a highly available system for coordinating distributed processes. Applications use ZooKeeper to store and mediate updates to important configuration information.
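To make the MapReduce programming model listed above more concrete, here is a single-process word-count sketch of the map, shuffle, and reduce steps. It is an illustration only: it does not use the Hadoop API, all function names are hypothetical, and a real job would run the same logic in parallel across the cluster with the framework handling data movement and failures.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three steps in sequence in a single process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

The map and reduce functions are independent per line and per key, which is what allows a framework to distribute them across many servers and rerun any piece that fails.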

Figure 1. Greenplum MR Architecture and Components
[Diagram: Greenplum MR for Apache Hadoop, shown as layers from top to bottom.
  MapR Control System: MapR Heatmap; LDAP and NIS Integration; Quotas, Alerts, and Alarms; CLI and REST API
  Components: Hive, Pig, Oozie, Sqoop, HBase, Whirr, Mahout, Cascading, Nagios Integration, Ganglia Integration, Flume, ZooKeeper
  Easy: Direct Access NFS, Realtime Dataflows, MapR Volumes
  Dependable: Distributed NameNode HA, JobTracker HA, Direct Access NFS, Mirroring and Snapshots
  Fast: MapR’s High Performance MapReduce, Direct Shuffle, Data Placement Control, Local Mirroring
  MapR Lockless Storage Services]






• MapR Heatmap: MapR Heatmap provides visibility, access, and tools that give insight into the state of the cluster. Graphical and programmatic interfaces are designed to scale with the largest clusters.
• MapR Control System: MapR Control System provides real-time monitoring of the cluster health, including alarms to notify you of conditions that need to be corrected. Alarms can also be configured to trigger email notifications.

Cisco UCS: The Exclusive Platform

Cisco UCS is the exclusive hardware platform for Greenplum MR. It is the outcome of a thorough testing and development process between Greenplum and Cisco. Cisco UCS innovations combine industry-standard, x86-architecture servers with networking and storage access into a single converged system (Figure 2). The system is entirely programmable using unified, model-based management to simplify and accelerate the deployment of enterprise-class applications and services running in bare-metal, virtualized, and cloud-computing environments.

Cisco UCS helps organizations gain more than efficiency: it helps them become more effective through technologies that foster simplicity rather than complexity. The result is a flexible, agile, high-performance platform that reduces operating costs, increases uptime through automation, and enables a more rapid return on investment (ROI).

Cisco UCS

The Cisco Unified Computing System delivers a radical simplification of traditional architecture with the first self-aware, self-integrating, converged system that automates system configuration in a reproducible, scalable manner.
• More than 50 world records on critical benchmarks
• The benefits of centralized computing, through a single point of management, delivered to massive scale-out applications
• Self-aware and self-integrating system
• Automatic server provisioning through association of models with system resources
• Standards-based, high-bandwidth, low-latency, lossless Ethernet network




Cisco Unified Computing System Components

Cisco UCS Fabric Interconnects (pictured: Cisco UCS 6100 Series and Cisco UCS 6248UP, with embedded Cisco UCS Manager)
• Integrates all components into a single management domain
• Up to 2 matching interconnects per system
• Low-latency, lossless, 10 GE and FCoE connectivity
• Uplinks to data center networks: 10 GE, native Fibre Channel, or FCoE
• Embedded unified, model-based management

Cisco Fabric Extenders (pictured: Cisco UCS 2208XP Fabric Extender, Cisco UCS 2104XP Fabric Extender, and Cisco Nexus 2232PP 10GE Fabric Extender, carrying the data and management planes)
• Scales data and management planes without complexity
• Up to 2 matched models per system
• Cisco UCS 2208XP: Up to 160 Gbps per blade chassis (with Cisco UCS 6200 Series Fabric Interconnects)
• Cisco UCS 2104XP: Up to 80 Gbps per blade chassis
• Cisco Nexus 2232PP: Integrates rack-mount servers into the system

Cisco UCS 5108 Blade Chassis (pictured: Cisco UCS 5108 Blade Server Chassis)
• Accommodates up to 8 half-width blade servers or 4 full-width blade servers
• Accommodates up to two fabric extenders for connectivity and management
• Straight-through airflow, 92% efficient power supplies, N+1, and grid redundant




                       Cisco UCS B-Series Blade Servers                                                                                                                                                                                                     Cisco UCS C-Series Rack-Mount Servers

                                                                                                                                                                                                                                                                                                              Cisco UCS Servers
                                                                                                                                                                                                                                                                                                              Powered exclusively by Intel Xeon processors
                   1
                          2
                                                                                                                                                                                                                                                                                                              World-record-setting performance
                                                                                                                                                                                                                                                                                                              Comprehensive product line for ease of matching
                                                                                                                                                                                                                                                                                                                 servers to workloads
                                                                                                                                                                                                                 •                                                                                            Every aspect of identity, personality, and connectivity
                                                                                                                                                                                                                                                                                                                 configured by Cisco UCS Manager




          Figure 2. Cisco UCS Is a Single Unified System







                                                    Reference Configuration
                                                    The Cisco Greenplum MR Reference Configuration is built using the following
                                                    components:

                                                    •	 Cisco UCS 6200 Series Fabric Interconnects: The Cisco UCS 6200 Series
                                                       Fabric Interconnects are a core part of Cisco UCS, providing both network
                                                       connectivity and management capabilities across Cisco UCS 5100 Series Blade
                                                       Server Chassis as well as Cisco UCS C-Series Rack-Mount Servers. Typically
                                                       deployed in redundant pairs, the fabric interconnects offer line-rate, low-latency,
                                                       lossless 10 Gigabit Ethernet connectivity and unified management with Cisco
                                                       UCS Manager in a highly available management domain.
                                                    •	 Cisco UCS 2200 Series Fabric Extenders: Cisco UCS 2200 Series Fabric
                                                       Extenders behave as remote line cards for a parent switch and provide a highly
                                                       scalable and extremely cost-effective unified server-access platform.
                                                    •	 Cisco UCS C210 M2 Rack-Mount Servers: Cisco UCS C210 M2 Rack-Mount
                                                       Servers are general-purpose 2-socket platforms based on Intel® Xeon® 5600
                                                       series processors. These servers support up to 192 GB of main memory and 16
                                                       internal front-accessible, hot-swappable, Small Form Factor (SFF) disk drives,
                                                       with a choice of one or two RAID controllers, to provide data performance and
                                                       protection.
                                                    •	 Cisco UCS P81E Virtual Interface Card (VIC): Unique to Cisco UCS, the Cisco UCS
                                                       P81E VIC is a dual-port PCI Express (PCIe) 2.0 x8 10-Gbps adapter designed for
                                                       use with Cisco UCS C-Series Rack-Mount Servers.
                                                    •	 Cisco UCS Manager: Cisco UCS Manager resides within the Cisco UCS 6200
                                                       Series Fabric Interconnects. It makes the system self-aware and self-integrating,
                                                       managing all of the system components as a single logical entity. Cisco UCS
                                                       Manager can be accessed through an intuitive GUI, a command-line interface
                                                       (CLI), or an XML API. Cisco UCS Manager uses service profiles to define the
                                                       personality, configuration, and connectivity of all resources within Cisco UCS,
                                                       radically simplifying provisioning of resources so that the process takes minutes
                                                       instead of days. This simplification allows IT departments to shift their focus
                                                       from constant maintenance to strategic business initiatives. It also provides the
                                                       most streamlined, simplified approach commercially available today to firmware
                                                       updating of all server components.
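The service-profile idea described in the Cisco UCS Manager entry can be pictured with a small model. The sketch below is illustrative only: the `ServiceProfile` class, its fields, and the `instantiate` helper are hypothetical names invented for this example and are not the Cisco UCS Manager API or its object model. It shows one template being applied to sixteen servers so that every node receives identical settings and differs only by name.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class ServiceProfile:
    """Hypothetical stand-in for a service profile (not the UCS Manager schema)."""
    name: str
    firmware_version: str        # illustrative field
    vnic_count: int              # illustrative field
    boot_order: Tuple[str, ...]  # illustrative field


def instantiate(template: ServiceProfile, count: int) -> List[ServiceProfile]:
    """Create `count` profiles that differ from the template only by name."""
    return [
        replace(template, name=f"{template.name}-{i:02d}")
        for i in range(1, count + 1)
    ]


template = ServiceProfile(
    name="hadoop-node",
    firmware_version="example-1.0",  # placeholder value, not a real release
    vnic_count=2,
    boot_order=("local-disk", "pxe"),
)
profiles = instantiate(template, 16)
print(len(profiles), profiles[0].name, profiles[-1].name)
```

Because every profile is derived from one immutable template, a change to the template is the only place a configuration difference can be introduced, which is the property the text attributes to model-based provisioning.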

                                                    The Cisco Greenplum MR Reference Configurations come in single- and multi-
                                                    rack form factors. The single-rack configuration has 192 processor cores and a
                                                    typical user data capacity of 320 terabytes (TB). The multi-rack configuration adds
                                                    192 processor cores and 320 TB of capacity for every additional rack, as shown in
                                                    Table 2.






                                                    Table 2. Medium-Sized Enterprise and Large Enterprise Reference Configurations

                                                       Category         Component                              Single Rack      Multi-Rack

                                                       Network fabric   Fabric interconnects                   2                2 per cluster
                                                                        Fabric extenders                       2                2 per rack

                                                       Computing        Servers                                16               16 per rack
                                                                        Processor cores                        192              192 per rack
                                                                        Memory                                 Up to 1536 GB    Up to 1536 GB per rack
                                                                        Unformatted storage capacity           256 TB           256 TB per rack
                                                                        Typical user capacity (compressed,     350 TB           350 TB per rack
                                                                          with 3-way replication)
                                                                        I/O bandwidth                          20 GBps          20 GBps per rack
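The per-rack scaling in Table 2 can be expressed as a short calculation. The following is a minimal sketch, assuming that every rack is fully populated and that totals grow linearly with the rack count; the names `ClusterTotals` and `cluster_totals` are invented for this example. It does not model any upper limit on rack count, such as the number of ports available on the fabric interconnects.

```python
from dataclasses import dataclass

# Per-rack figures from the single-rack column of Table 2.
SERVERS_PER_RACK = 16
CORES_PER_RACK = 192
MEMORY_GB_PER_RACK = 1536
RAW_STORAGE_TB_PER_RACK = 256
FABRIC_EXTENDERS_PER_RACK = 2
# Fabric interconnects are shared by the whole cluster, not added per rack.
FABRIC_INTERCONNECTS_PER_CLUSTER = 2


@dataclass(frozen=True)
class ClusterTotals:
    racks: int
    fabric_interconnects: int
    fabric_extenders: int
    servers: int
    cores: int
    memory_gb: int
    raw_storage_tb: int


def cluster_totals(racks: int) -> ClusterTotals:
    """Scale the per-rack figures to a cluster of `racks` fully populated racks."""
    if racks < 1:
        raise ValueError("a cluster needs at least one rack")
    return ClusterTotals(
        racks=racks,
        fabric_interconnects=FABRIC_INTERCONNECTS_PER_CLUSTER,
        fabric_extenders=FABRIC_EXTENDERS_PER_RACK * racks,
        servers=SERVERS_PER_RACK * racks,
        cores=CORES_PER_RACK * racks,
        memory_gb=MEMORY_GB_PER_RACK * racks,
        raw_storage_tb=RAW_STORAGE_TB_PER_RACK * racks,
    )


print(cluster_totals(3))
```

For three racks this yields 48 servers, 576 cores, 4608 GB of memory, and 768 TB of unformatted storage, served by 6 fabric extenders and the same 2 fabric interconnects.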




                                                    The single-rack configuration consists of two fully redundant Cisco UCS 6248UP
                                                    48-Port Fabric Interconnects and two Cisco Nexus® 2232PP 10GE Fabric
                                                    Extenders, as depicted in Figure 3. Each node in the configuration connects to the
                                                    unified fabric through two active-active 10-Gbps links using a Cisco UCS P81E
                                                    VIC (data traffic) and Cisco Integrated Management Controller (IMC; management
                                                    traffic). Multi-rack configurations include components from a single rack and two
                                                    Cisco Nexus 2232PP fabric extenders for every additional rack.


                                                       Figure labels: Cisco UCS 6248UP Fabric Interconnects; Cisco Nexus 2232PP 10GE
                                                       Fabric Extenders; Cisco UCS C210 M2 General-Purpose Rack-Mount Server
                                                       Legend: Combined Data and Management Traffic; Management Traffic; Data Traffic




                                                       Figure 3. Cisco UCS Fabric Architecture
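The link counts above allow a rough bandwidth estimate for one rack. The sketch below is an assumption-laden illustration, not the authors' calculation: it assumes that all eight fabric uplinks on each Cisco Nexus 2232PP are cabled to the fabric interconnects (`UPLINKS_PER_FEX` is that assumption), and it counts only the two 10-Gbps data links per node. Under those assumptions the uplink capacity works out to the same 20 GBps listed in Table 2, although the source does not state how that figure was derived.

```python
GBPS_PER_LINK = 10    # 10 Gigabit Ethernet links, per the text
LINKS_PER_NODE = 2    # two active-active links per node, per the text
NODES_PER_RACK = 16   # per Table 2
FEX_PER_RACK = 2      # per Table 2
UPLINKS_PER_FEX = 8   # assumption: all eight 2232PP fabric uplinks are cabled


def gbits_to_gbytes(gbits_per_second: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbits_per_second / 8


# Capacity between the nodes and the fabric extenders.
node_facing_gbps = NODES_PER_RACK * LINKS_PER_NODE * GBPS_PER_LINK
# Capacity between the fabric extenders and the fabric interconnects.
uplink_gbps = FEX_PER_RACK * UPLINKS_PER_FEX * GBPS_PER_LINK
# Ratio of node-facing capacity to uplink capacity.
oversubscription = node_facing_gbps / uplink_gbps

print(node_facing_gbps, uplink_gbps, gbits_to_gbytes(uplink_gbps), oversubscription)
```

With these inputs the node-facing capacity is 320 Gbps and the uplink capacity is 160 Gbps (20 GBps), a 2:1 ratio. Cabling fewer uplinks would lower the uplink figure proportionally.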






                                                    The base cluster node is a Cisco UCS C210 M2 Rack-Mount Server with two Intel
                                                    Xeon X5670 processors (2.93 GHz), 96 GB of industry-standard double-data-rate 3
                                                    (DDR3) memory, a Cisco UCS P81E VIC, an LSI 6G MegaRAID 9261-8i card, and
                                                    sixteen 1-TB SATA SFF internal disk drives, for a total of 16 TB of storage (Figure 4).




                                                      • 2 Intel Xeon X5670 (2.93 GHz) 95W processors
                                                      • 96 GB memory
                                                      • 1 LSI 6G MegaRAID 9261-8i card
                                                      • 16 1-TB 6-Gb SATA 7200 RPM SFF disk drives
                                                      • Redundant hot-swappable power supplies and fans
                                                      • 1 Cisco UCS P81E VIC (2x 10 Gbps)
                                                      • Embedded Cisco IMC (2x 1 Gbps)
                                                      • 2 integrated Gigabit Ethernet ports
                                                      • Red Hat Enterprise Linux Server Standard




                                                       Figure 4. Cluster Node
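The per-rack figures in Table 2 follow from this node specification. The sketch below is a minimal cross-check, assuming six cores per Intel Xeon X5670 processor (a value taken from Intel's published specification, not from this document) and sixteen nodes per rack; `NodeSpec` and `rack_totals` are names invented for this example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NodeSpec:
    sockets: int
    cores_per_socket: int
    memory_gb: int
    disks: int
    disk_tb: int


# Base cluster node as described in the text.
# cores_per_socket=6 is an assumption based on the Intel Xeon X5670 specification.
BASE_NODE = NodeSpec(sockets=2, cores_per_socket=6, memory_gb=96, disks=16, disk_tb=1)
NODES_PER_RACK = 16  # per Table 2


def rack_totals(node: NodeSpec, nodes: int) -> Dict[str, int]:
    """Sum one node's resources over `nodes` identical nodes."""
    return {
        "cores": node.sockets * node.cores_per_socket * nodes,
        "memory_gb": node.memory_gb * nodes,
        "raw_storage_tb": node.disks * node.disk_tb * nodes,
    }


print(rack_totals(BASE_NODE, NODES_PER_RACK))
```

The result is 192 cores, 1536 GB of memory, and 256 TB of unformatted storage per rack, matching the single-rack column of Table 2.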



                                                    The components of the single-rack and multi-rack configurations are depicted in
                                                    Figure 5. Table 3 provides general guidelines for the number of instances of each
                                                    service to run in a solution.



                                                       Single-Rack Configuration
                                                       • 2x Cisco UCS 6248UP Fabric Interconnects
                                                       • 2x Cisco Nexus 2232PP 10 GE Fabric Extenders
                                                       • 16x Cisco UCS C210 M2 Compute Nodes

                                                       Multiple-Rack Configuration (for each additional rack)
                                                       • 2x Cisco Nexus 2232PP 10 GE Fabric Extenders
                                                       • 16x Cisco UCS C210 M2 Compute Nodes



                                                       Figure 5. Cisco Single- and Multiple-Rack Configurations






                                                    Table 3. Node Recommendations for Greenplum MR Services

                                                       Service                                   Number of Nodes (per Rack)

                                                       Container location database               1 to 3
                                                       FileServer                                Most or all nodes
                                                       HBase Master                              1 to 3
                                                       HBase RegionServer                        Varies
                                                       JobTracker                                1 to 3
                                                       NFS                                       Varies
                                                       TaskTracker                               Most or all nodes
                                                       WebServer                                 1 or more
                                                       Zookeeper                                 1, 3, 5, or a higher odd number
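The guidelines in Table 3 lend themselves to a simple pre-deployment check. The following is a hedged sketch rather than a Greenplum MR tool: the service keys, the `GUIDELINES` mapping, and `check_layout` are hypothetical names, and only the services with a numeric guideline are checked. Services that are absent from the planned layout are skipped rather than reported.

```python
from typing import Dict, List, Optional, Tuple

# (minimum, maximum) instances per rack; None means Table 3 gives no upper bound.
GUIDELINES: Dict[str, Tuple[int, Optional[int]]] = {
    "cldb": (1, 3),          # container location database
    "hbase_master": (1, 3),
    "jobtracker": (1, 3),
    "webserver": (1, None),
}


def check_layout(counts: Dict[str, int]) -> List[str]:
    """Return a list of guideline violations for the planned instance counts."""
    problems = []
    for service, (low, high) in GUIDELINES.items():
        if service not in counts:
            continue  # service not planned; nothing to check
        n = counts[service]
        if n < low or (high is not None and n > high):
            upper = "more" if high is None else str(high)
            problems.append(f"{service}: {n} instances; expected {low} to {upper}")
    if "zookeeper" in counts:
        zk = counts["zookeeper"]
        # Table 3 recommends 1, 3, 5, or a higher odd number.
        if zk < 1 or zk % 2 == 0:
            problems.append(f"zookeeper: {zk} instances; use an odd number")
    return problems


print(check_layout({"cldb": 3, "jobtracker": 2, "webserver": 1, "zookeeper": 4}))
```

In this example only the Zookeeper count is flagged, because four is an even number.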




                                                    Excellence from Cisco and Greenplum
                                                    Hadoop implementations can present a number of challenges to enterprise
                                                    environments. Many of these challenges arise from the dichotomy between the
                                                    introduction of innovative new technology and the enterprise-class performance,
                                                    reliability, and support demanded by mission-critical systems. The collaboration
                                                    between Cisco and Greenplum is specifically designed to provide a solution to these
                                                    challenges. The joint Cisco and Greenplum solution delivers all the characteristics
                                                    expected of a fully integrated solution, including radically simplified deployment and
                                                    management, high availability, excellent performance, exceptional scalability, and
                                                    enterprise-class service and support.

                                                    Complete Big Data Analysis Solution
                                                    The comprehensive solution from Cisco and Greenplum helps organizations
                                                    deploy big data solutions quickly, with validated configurations that scale easily
                                                    and predictably, as demand dictates. The Cisco Greenplum MR Reference
                                                    Configurations provide an end-to-end solution that has been tested and validated
                                                    and that enables enterprise customers to accelerate big data initiatives.

                                                    Designed for High Availability and Reliability
                                                    Every component in the Cisco and Greenplum MR Hadoop solution is fully
                                                    redundant. The Greenplum MR architecture provides JobTracker High Availability
                                                    (HA) and Distributed NameNode HA to prevent lost jobs from causing time-consuming
                                                    restarts or failover incidents. Combined with the core networking capabilities
                                                    of Cisco, this solution can be extended to include remote mirroring, which helps ensure
                                                    data reliability by synchronizing a copy of the cluster’s data at a remote site so that
                                                    data analysis can continue in the event of a disaster. Locally, snapshots protect data
                                                    from application errors or accidental deletion. Snapshots also enable easy data
                                                    recovery to a specific point in time by simply copying a file or directory from the
                                                    snapshot location to the desired destination directory.
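The copy-based recovery described above can be sketched with ordinary file operations. This is a minimal illustration, assuming the snapshot is reachable as a normal directory tree (for example over an NFS mount); the function name `restore_from_snapshot` and any paths passed to it are hypothetical, and this section does not state where Greenplum MR exposes its snapshots.

```python
import shutil
from pathlib import Path


def restore_from_snapshot(snapshot_root: Path, relative_path: str,
                          destination_dir: Path) -> Path:
    """Copy a file or directory out of a snapshot into destination_dir.

    Returns the path of the restored copy. A directory restore fails if the
    target directory already exists, so existing data is never merged silently.
    """
    source = snapshot_root / relative_path
    if not source.exists():
        raise FileNotFoundError(f"not found in snapshot: {source}")
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / source.name
    if source.is_dir():
        shutil.copytree(source, target)
    else:
        shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target
```

A call such as `restore_from_snapshot(Path("/mnt/cluster/.snapshot/nightly"), "reports/q1.csv", Path("/mnt/cluster/restored"))` would place the point-in-time copy of the file under the destination directory; both paths here are placeholders.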

                                                    High Performance and Exceptional Scalability
                                                    The Cisco UCS unified fabric architecture provides fully redundant, highly scalable,
                                                    lossless 10-Gbps connectivity for big data traffic. Powered by Intel Xeon 5600 series
                                                    processors, the Cisco Greenplum MR solution delivers best-in-class performance and
                                                    internal storage capacity, with at least twice the performance of Apache
                                                    Hadoop. The Cisco Greenplum MR Reference
                                                    Configurations can easily scale to support a large number of nodes when required
                                                    by business demands. The advanced management capabilities of Cisco UCS
                                                    radically simplify this process with a single point of management that spans all
                                                    nodes in the cluster.

                                                    Simplified Management
                                                    Hadoop implementations tend to involve very large numbers of servers. In traditional
                                                    environments, it can be challenging to manage these large numbers of servers
                                                    effectively. Cisco UCS Manager delivers unified, model-based management
                                                    that applies personality and configures server, network, and storage connectivity
                                                    resources, making it as easy to deploy hundreds of servers as it is to deploy a single
                                                    server. Additionally, Cisco UCS Manager can perform system maintenance activities
                                                    such as firmware updates across the entire cluster as a single operation. To ease
                                                    the monitoring of these large clusters, the Greenplum MR control system raises
                                                    alarms and sends notifications to alert IT personnel about cluster health and the
                                                    status of services.

                                                    Coexistence with Enterprise Applications
                                                    Beyond introducing a Hadoop deployment, organizations need ways to transfer data
                                                    transparently between their enterprise applications and Hadoop. This solution can
                                                    connect, across the same management plane, to other Cisco UCS deployments
                                                    running enterprise applications, thereby radically simplifying data center
                                                    management and connectivity.

                                                    Rapid Deployment
                                                    Deployment of large numbers of servers can take time. Systems need to be racked,
                                                    networked, configured, and provisioned before they can be put into use. Cisco UCS
                                                    Manager uses a model-based approach to provision servers by applying a desired
                                                    configuration to physical infrastructure quickly, accurately, and automatically. The
                                                    ability to create consistent configurations improves business agility and eliminates a
                                                    major source of errors that can cause downtime.

                                                    Enterprise Service and Support
                                                    Enterprises want to know that the vendors providing a solution have the expertise to
                                                    help them quickly proceed through the design, deployment, and testing of strategic
                                                    big data initiatives. Businesses also need to have confidence that if a critical system
                                                    fails, they will be able to get timely and competent support. The Cisco Greenplum
                                                    MR Reference Configurations bring together world-class service and support from
                                                    long-time collaborators Cisco and Greenplum.







                                                             For More Information
                                                             For complete details about Cisco UCS, visit http://www.cisco.com/go/ucs.


                                                             Acknowledgments
                                                             This document was authored by Raghunath Nambiar and William Davis with
                                                             contributions from Robert Demartino, Theresa Chen, Ashok Rajagopalan and Girish
                                                             Kulkarni.




Americas Headquarters                                        Asia Pacific Headquarters                                    Europe Headquarters
Cisco Systems, Inc.                                          Cisco Systems (USA) Pte. Ltd.                                Cisco Systems International BV Amsterdam,
San Jose, CA                                                 Singapore                                                    The Netherlands
Cisco has more than 200 offices worldwide. Addresses, phone numbers, and fax numbers are listed on the Cisco Website at www.cisco.com/go/offices.

Cisco and the Cisco logo are trademarks or registered trademarks of Cisco and/or its affiliates in the U.S. and other countries. To view a list of Cisco trademarks, go to this
URL: www.cisco.com/go/trademarks. Third party trademarks mentioned are the property of their respective owners. The use of the word partner does not imply a partnership
relationship between Cisco and any other company. (1110R)                                                                                                       LE-35101-02 02/12

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop by kamran khan
Hadoop by kamran khanHadoop by kamran khan
Hadoop by kamran khanKamranKhan587
 
Integrating dbm ss as a read only execution layer into hadoop
Integrating dbm ss as a read only execution layer into hadoopIntegrating dbm ss as a read only execution layer into hadoop
Integrating dbm ss as a read only execution layer into hadoopJoão Gabriel Lima
 
Building a data warehouse of call data records
Building a data warehouse of call data recordsBuilding a data warehouse of call data records
Building a data warehouse of call data recordsDavid Walker
 
Use Big Data Technologies to Modernize Your Enterprise Data Warehouse
Use Big Data Technologies to Modernize Your Enterprise Data Warehouse Use Big Data Technologies to Modernize Your Enterprise Data Warehouse
Use Big Data Technologies to Modernize Your Enterprise Data Warehouse EMC
 
Managed Data Services
Managed Data ServicesManaged Data Services
Managed Data ServicesSimon Dale
 
Managed Data Services
Managed Data ServicesManaged Data Services
Managed Data ServicesMark Halpin
 
HCLT Whitepaper: Thermal Design and Management of Servers
HCLT Whitepaper: Thermal Design and Management of ServersHCLT Whitepaper: Thermal Design and Management of Servers
HCLT Whitepaper: Thermal Design and Management of ServersHCL Technologies
 
Design architecture based on web
Design architecture based on webDesign architecture based on web
Design architecture based on webcsandit
 
Juniper: Data Center Evolution
Juniper: Data Center EvolutionJuniper: Data Center Evolution
Juniper: Data Center EvolutionTechnologyBIZ
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsDavid Portnoy
 
Cloud batch a batch job queuing system on clouds with hadoop and h-base
Cloud batch  a batch job queuing system on clouds with hadoop and h-baseCloud batch  a batch job queuing system on clouds with hadoop and h-base
Cloud batch a batch job queuing system on clouds with hadoop and h-baseJoão Gabriel Lima
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methodspaperpublications3
 
clusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheetclusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheetAndrei Khurshudov
 
White Paper: Hadoop on EMC Isilon Scale-out NAS
White Paper: Hadoop on EMC Isilon Scale-out NAS   White Paper: Hadoop on EMC Isilon Scale-out NAS
White Paper: Hadoop on EMC Isilon Scale-out NAS EMC
 

Cisco and Greenplum Partner to Deliver High-Performance Hadoop Reference Configurations

  • 1. White Paper Cisco and Greenplum Partner to Deliver High-Performance Hadoop Reference Configurations February 2012
  • 2. Contents
    Next-Generation Hadoop Solution .......... 3
    Greenplum MR: Hadoop Reengineered .......... 3
    Cisco UCS: The Exclusive Platform .......... 6
    Reference Configuration .......... 7
    Excellence from Cisco and Greenplum .......... 10
    Complete Big Data Analysis Solution .......... 10
    Designed for High Availability and Reliability .......... 10
    High Performance and Exceptional Scalability .......... 11
    Simplified Management .......... 11
    Coexistence with Enterprise Applications .......... 11
    Rapid Deployment .......... 11
    Enterprise Service and Support .......... 11
    For More Information .......... 12
    Acknowledgments .......... 12

    © 2012 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public information. Page 2
  • 3. Greenplum MR, together with the Cisco Unified Computing System, provides companies with an integrated Hadoop solution that delivers advanced performance, full data protection, no single point of failure, and improved data access features that can expedite the implementation of big data analytics environments.

    Highlights
    Optimized for Performance
    • Greenplum and Cisco deliver an integrated Hadoop solution specifically engineered to handle the most demanding Hadoop workloads.
    Cisco UCS Creates a Flexible Appliance Platform
    • The Cisco Unified Computing System™ (Cisco UCS™) provides a flexible, high-performance platform that can be optimized and easily scaled for a Hadoop cluster of any size.
    Ease of Management
    • Cisco UCS Manager provides unified, embedded management of all computing, networking, and storage access resources.
    Enterprise-Class Support and Services
    • The Cisco® Greenplum MR solution combines the support and services of two of the world’s largest technology companies.

    Next-Generation Hadoop Solution
    The worldwide leader in data center networking, and now a leading competitor in the server market, Cisco is partnering with Greenplum to provide a best-in-class big data solution. The Cisco® Greenplum MR Reference Configuration delivers an integrated, end-to-end software and hardware infrastructure that accelerates big data initiatives. The combination of world-leading Cisco Unified Computing System™ (Cisco UCS™) and Greenplum MR enables companies to significantly reduce time-to-value and the operating expenses associated with Apache Hadoop implementations.

    Greenplum MR: Hadoop Reengineered
    Greenplum MR, based on the MapR M5 Distribution, is an implementation of the Apache Hadoop stack that enables near real-time collection and organization of high volumes of structured, semi-structured, and unstructured data distributed across a cluster of servers. With 100 percent interface compatibility with Apache Hadoop, Greenplum MR provides transparent application portability while delivering performance that is at least twice as fast as standard Apache Hadoop. Greenplum MR provides direct data input and output to the cluster with MapR Direct Access Network File System (NFS), supports real-time analytics, and is the first distribution to provide true high availability at all levels. Greenplum MR introduces the concept of logical volumes to Hadoop: a means of grouping data and applying policy consistently across an entire data set. Greenplum MR provides hardware status and control with the MapR Control System, a comprehensive user interface including a heatmap that displays the health of the entire cluster at a glance.

  • 4. Cisco UCS and Greenplum MR can help businesses manage many different big data scenarios. Table 1 provides examples of how the Cisco Greenplum MR Reference Configurations can accelerate big data initiatives.

    Table 1. Sample Use Cases for Cisco UCS and Greenplum MR
    Scenario | Cisco Greenplum MR Reference Configuration Capabilities
    Content management | Collect and store unstructured and semi-structured data in a fault-resilient, scalable data store that can be organized and sorted for indexing and analysis.
    Batch processing unstructured data | Batch-process large quantities of unstructured and semi-structured data: for example, data warehousing extract, transform, and load (ETL) processing.
    Medium-term data archive | Medium-term (12 to 36 months) archival of data from an enterprise data warehouse (EDW) database management system (DBMS) to increase the length of time that data is retained or to meet data retention and compliance policies.
    Integration with data warehouse | Transfer data stored in Hadoop to and from a separate DBMS for advanced analytics.
    Customer risk analysis | Build a comprehensive data picture of customer-side risk, based on activity and behavior across products and accounts.
    Personalization and asset management | Create and model investor strategy and goals based on market data, individual asset characteristics, and reports fed into an online recommendation system.
    Trade analytics | Analyze historical volume and trading data for individual stock symbols, variable cost of trades, and allocation of expenses.
    Credit scoring | Update credit-scoring models using cross-functional transaction data and recent outcomes, to respond to changes such as the collapse of bubble markets. Sweep recent credit history to build transactional and temporal models.
    Retailer compromise | Prevent or catch fraud resulting from a breach of retailer cards or accounts by monitoring, modeling, and analyzing high volumes of transaction data and extracting features and patterns.
    Miscategorized credit card fraud | Reduce false positives and prevent miscategorization of legitimate transactions as fraud, using high volumes of data to build good models.
    Next-generation credit card fraud | Perform daily cross-sectional analysis of the portfolio using transaction similarities to find accounts that are being cultivated for eventual fraud, using common application elements, temporal patterns, vendors, and transaction amounts to detect similar accounts before the fraud is perpetrated.
    Customer retention | Combine transactional customer contact information and social network data to perform attrition modeling to learn social and transaction markers for attrition and retention.
    Sentiment analysis (opinion mining) and bankruptcy | Find better indicators to predict bankruptcy among existing customers using sentiment analysis from social networking, responding quickly before the warning horizon.
  • 5. As shown in Figure 1, the major components of Greenplum MR include:
    • Lockless storage services: A replacement for the Hadoop Distributed File System (HDFS), the MapR lockless storage services provide multidimensional scalability and accelerate MapReduce performance. The services allow random read and write operations while automatically compressing data in real time.
    • MapReduce: Part of the Apache Hadoop framework, MapReduce simplifies the creation of applications that process large amounts of unstructured and structured data in parallel. Underlying hardware failures are handled transparently for user applications, providing a reliable and fault-tolerant capability.
    • Hive: Hive is a data warehouse system for Hadoop that facilitates data summarization, impromptu queries, and analysis of large data sets. This SQL-like interface increases the compression of stored data for improved storage resource utilization without affecting access speed.
    • Pig: Pig is a high-level procedural language for processing data sets in parallel using the Hadoop MapReduce platform. Its intuitive syntax simplifies the development of MapReduce jobs, providing an alternative programming language to Java.
    • HBase: HBase is the distributed, versioned, column-oriented database that delivers random, real-time, read-write access to big data.
    • ZooKeeper: ZooKeeper is a highly available system for coordinating distributed processes. Applications use ZooKeeper to store and mediate updates to important configuration information.

    Greenplum MR (sidebar)
    • Complete enterprise-class solution
    • 100 percent compatibility with Apache Hadoop
    • Elimination of the common problems experienced with HDFS
    • Direct access through NFS
    • High availability through JobTracker enhancements and a no-NameNode architecture
    • From two to five times faster performance than other Hadoop distributions
    • Advanced management for clusters of all sizes
    • Robust data protection features, including snapshots and intercluster mirroring
    • A comprehensive network of enterprise business intelligence tools

    Figure 1. Greenplum MR Architecture and Components (diagram: MapR Control System with heatmap, LDAP and NIS integration, quotas, alerts and alarms, CLI and REST API, and Nagios and Ganglia integration; Hive, Pig, Oozie, Sqoop, HBase, Whirr, Mahout, Cascading, Flume, and ZooKeeper; Distributed NameNode HA, JobTracker HA, Direct Access NFS, Direct Shuffle, real-time dataflows, mirroring, data placement control, and MapR volumes and snapshots; all layered on MapR Lockless Storage Services)
  • 6. • MapR Heatmap: MapR Heatmap provides visibility, access, and tools that give insight into the state of the cluster. Graphical and programmatic interfaces are designed to scale with the largest clusters.
    • MapR Control System: MapR Control System provides real-time monitoring of the cluster health, including alarms to notify you of conditions that need to be corrected. Alarms can also be configured to trigger email notifications.

    Cisco UCS: The Exclusive Platform
    Cisco UCS is the exclusive hardware platform for Greenplum MR. It is the outcome of a thorough testing and development process between Greenplum and Cisco. Cisco UCS innovations combine industry-standard, x86-architecture servers with networking and storage access into a single converged system (Figure 2). The system is entirely programmable using unified, model-based management to simplify and accelerate the deployment of enterprise-class applications and services running in bare-metal, virtualized, and cloud-computing environments.

    Cisco UCS helps organizations gain more than efficiency: it helps them become more effective through technologies that foster simplicity rather than complexity. The result is a flexible, agile, high-performance platform that reduces operating costs with increased uptime through automation and enables more rapid return on investment (ROI).

    Cisco UCS (sidebar)
    The Cisco Unified Computing System delivers a radical simplification of traditional architecture with the first self-aware, self-integrating, converged system that automates system configuration in a reproducible, scalable manner.
    • More than 50 world records on critical benchmarks
    • The benefits of centralized computing, through a single point of management, delivered to massive scale-out applications
    • Self-aware and self-integrating system
    • Automatic server provisioning through association of models with system resources
    • Standards-based, high-bandwidth, low-latency, lossless Ethernet network

    Figure 2. Cisco UCS Is a Single Unified System (diagram of Cisco Unified Computing System components, annotated as follows)
    Cisco UCS Fabric Interconnects (Cisco UCS 6100 Series, Cisco UCS 6248UP; Cisco UCS Manager embedded)
    • Integrates all components into a single management domain
    • Up to 2 matching interconnects per system
    • Low-latency, lossless, 10 GE and FCoE connectivity
    • Uplinks to data center networks: 10 GE, native Fibre Channel, or FCoE
    • Embedded unified, model-based management
    Cisco Fabric Extenders
    • Scales data and management planes without complexity
    • Up to 2 matched models per system
    • Cisco UCS 2208XP: Up to 160 Gbps per blade chassis (with Cisco UCS 6200 Series Fabric Interconnects)
    • Cisco UCS 2104XP: Up to 80 Gbps per blade chassis
    • Cisco Nexus 2232PP: Integrates rack-mount servers into the system
    Cisco UCS 5108 Blade Chassis
    • Accommodates up to 8 half-width blade servers or 4 full-width blade servers
    • Accommodates up to two fabric extenders for connectivity and management
    • Straight-through airflow, 92% efficient power supplies, N+1, and grid redundant
    Cisco UCS Servers (Cisco UCS B-Series Blade Servers, Cisco UCS C-Series Rack-Mount Servers)
    • Powered exclusively by Intel Xeon processors
    • World-record-setting performance
    • Comprehensive product line for ease of matching servers to workloads
    • Every aspect of identity, personality, and connectivity configured by Cisco UCS Manager
  • 7. Reference Configuration
    The Cisco Greenplum MR Reference Configuration is built using the following components:
    • Cisco UCS 6200 Series Fabric Interconnects: The Cisco UCS 6200 Series Fabric Interconnects are a core part of Cisco UCS, providing both network connectivity and management capabilities across Cisco UCS 5100 Series Blade Server Chassis as well as Cisco UCS C-Series Rack-Mount Servers. Typically deployed in redundant pairs, the fabric interconnects offer line-rate, low-latency, lossless 10 Gigabit Ethernet connectivity and unified management with Cisco UCS Manager in a highly available management domain.
    • Cisco UCS 2200 Series Fabric Extenders: Cisco UCS 2200 Series Fabric Extenders behave as remote line cards for a parent switch and provide a highly scalable and extremely cost-effective unified server-access platform.
    • Cisco UCS C210 M2 Rack-Mount Servers: Cisco UCS C210 M2 Rack-Mount Servers are general-purpose 2-socket platforms based on Intel® Xeon® 5600 series processors. These servers support up to 192 GB of main memory and 16 internal front-accessible, hot-swappable, Small Form Factor (SFF) disk drives, with a choice of one or two RAID controllers, to provide data performance and protection.
    • Cisco UCS P81E Virtual Interface Card (VIC): Unique to Cisco UCS, the Cisco UCS P81E VIC is a dual-port PCI Express (PCIe) 2.0 x8 10-Gbps adapter designed for use with Cisco UCS C-Series Rack-Mount Servers.
    • Cisco UCS Manager: Cisco UCS Manager resides within the Cisco UCS 6200 Series Fabric Interconnects. It makes the system self-aware and self-integrating, managing all of the system components as a single logical entity. Cisco UCS Manager can be accessed through an intuitive GUI, a command-line interface (CLI), or an XML API. Cisco UCS Manager uses service profiles to define the personality, configuration, and connectivity of all resources within Cisco UCS, radically simplifying provisioning of resources so that the process takes minutes instead of days. This simplification allows IT departments to shift their focus from constant maintenance to strategic business initiatives. It also provides the most streamlined, simplified approach commercially available today to firmware updating of all server components.

    The Cisco Greenplum MR Reference Configurations come in single- and multi-rack form factors. The single-rack configuration has 192 processor cores and a typical user data capacity of 320 terabytes (TB). The multi-rack configuration adds 192 processor cores and 320 TB of capacity for every additional rack, as shown in Table 2.
  • 8. Table 2. Medium-Sized Enterprise and Large Enterprise Reference Configurations
    Component | Single Rack | Multi-Rack
    Network fabric: fabric interconnects | 2 | 2 per cluster
    Network fabric: fabric extenders | 2 | 2 per rack
    Computing: servers | 16 | 16 per rack
    Computing: processor cores | 192 | 192 per rack
    Memory | Up to 1536 GB | Up to 1536 GB per rack
    Unformatted storage capacity | 256 TB | 256 TB per rack
    Typical user capacity (compressed, with 3-way replication) | 320 TB | 320 TB per rack
    I/O bandwidth | 20 GBps | 20 GBps per rack

    The single-rack configuration consists of two fully redundant Cisco UCS 6248UP 48-Port Fabric Interconnects and two Cisco Nexus® 2232PP 10GE Fabric Extenders, as depicted in Figure 3. Each node in the configuration connects to the unified fabric through two active-active 10-Gbps links using a Cisco UCS P81E VIC (data traffic) and Cisco Integrated Management Controller (IMC; management traffic). Multi-rack configurations include components from a single rack and two Cisco Nexus 2232PP fabric extenders for every additional rack.

    Figure 3. Cisco UCS Fabric Architecture (diagram: Cisco UCS 6248UP Fabric Interconnects, Cisco Nexus 2232PP 10GE Fabric Extenders, and a Cisco UCS C210 M2 General-Purpose Rack-Mount Server, with links labeled as data traffic, management traffic, and combined data and management traffic)
  • 9. The base cluster node is a Cisco UCS C210 M2 Rack-Mount Server with two Intel Xeon X5670 processors (2.93 GHz), 96 GB of industry-standard double-data-rate (DDR3) memory, a Cisco UCS P81E VIC, an LSI 6G MegaRAID 9261-8i card, and sixteen 1-TB SATA SFF internal disk drives, for a total of 16 TB of storage (Figure 4).

    Figure 4. Cluster Node
    • 2 Intel Xeon X5670 (2.93 GHz) 95W processors
    • 96 GB memory
    • 1 LSI 6G MegaRAID 9261-8i card
    • 16 1-TB 6-Gb SATA 7200 RPM SFF disk drives
    • Redundant hot-swappable power supplies and fans
    • 1 Cisco UCS P81E VIC (2x 10 Gbps)
    • Embedded Cisco IMC (2x 1 Gbps)
    • 2 integrated Gigabit Ethernet ports
    • Red Hat Enterprise Linux Server Standard

    The components of the single-rack and multi-rack configurations are depicted in Figure 5. Table 3 provides general guidelines for the number of instances of each service to run in a solution.

    Figure 5. Cisco Single- and Multiple-Rack Configurations (diagram: the single-rack configuration has 2x Cisco UCS 6248UP Fabric Interconnects, 2x Cisco Nexus 2232PP 10 GE Fabric Extenders, and 16x Cisco UCS C210 M2 compute nodes; the multiple-rack configuration adds 2x Cisco Nexus 2232PP 10 GE Fabric Extenders and 16x Cisco UCS C210 M2 compute nodes for each additional rack)
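The per-rack figures quoted for the reference configuration follow from the per-node specification, and the arithmetic can be sketched in a few lines of Python. This is only an illustration: the names `NodeSpec` and `cluster_totals` are hypothetical, and the value of 6 cores per processor is an assumption derived from the stated 192 cores across 16 two-socket servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node figures from the base cluster node description."""
    sockets: int = 2            # two Intel Xeon X5670 processors per node
    cores_per_socket: int = 6   # assumption: 192 cores / 16 servers / 2 sockets
    memory_gb: int = 96         # 96 GB of DDR3 memory per node
    disks: int = 16             # sixteen internal SFF drives per node
    disk_tb: int = 1            # 1 TB per drive


def cluster_totals(racks: int, nodes_per_rack: int = 16,
                   node: NodeSpec = NodeSpec()) -> dict:
    """Return aggregate hardware totals for a cluster with the given rack count."""
    if racks < 1:
        raise ValueError("a cluster needs at least one rack")
    nodes = racks * nodes_per_rack
    return {
        "servers": nodes,
        "fabric_interconnects": 2,        # 2 per cluster, regardless of rack count
        "fabric_extenders": 2 * racks,    # 2 per rack
        "processor_cores": nodes * node.sockets * node.cores_per_socket,
        "memory_gb": nodes * node.memory_gb,
        "raw_storage_tb": nodes * node.disks * node.disk_tb,
    }


if __name__ == "__main__":
    for rack_count in (1, 3):
        print(rack_count, cluster_totals(rack_count))
```

For a single rack the sketch gives 192 cores, 1536 GB of memory, and 256 TB of unformatted storage. Typical user capacity is left out because it depends on compression ratios that the paper does not specify.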
  • 10. Table 3. Node Recommendations for Greenplum MR Services
    Service | Number of Nodes (per Rack)
    Container location database | 1 to 3
    FileServer | Most or all nodes
    HBase Master | 1 to 3
    HBase RegionServer | Varies
    JobTracker | 1 to 3
    NFS | Varies
    TaskTracker | Most or all nodes
    WebServer | 1 or more
    ZooKeeper | 1, 3, 5, or a higher odd number

    Excellence from Cisco and Greenplum
    Hadoop implementations can present a number of challenges to enterprise environments. Many of these challenges arise from the dichotomy between the introduction of innovative new technology and the enterprise-class performance, reliability, and support demanded by mission-critical systems. The collaboration between Cisco and Greenplum is specifically designed to provide a solution to these challenges. The joint Cisco and Greenplum solution delivers all the characteristics expected of a fully integrated solution, including radically simplified deployment and management, high availability, excellent performance, exceptional scalability, and enterprise-class service and support.

    Complete Big Data Analysis Solution
    The comprehensive solution from Cisco and Greenplum helps organizations deploy big data solutions quickly, with validated configurations that scale easily and predictably, as demand dictates. The Cisco Greenplum MR Reference Configurations provide an end-to-end solution that has been tested and validated and that enables enterprise customers to accelerate big data initiatives.

    Designed for High Availability and Reliability
    Every component in the Cisco and Greenplum MR Hadoop solution is fully redundant. The Greenplum MR architecture provides JobTracker High Availability (HA) and Distributed NameNode HA to prevent lost jobs from causing time-consuming restarts or failover incidents.
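Table 3 recommends an odd number of ZooKeeper nodes, and the reason is availability. The sketch below assumes the standard majority-quorum rule used by Apache ZooKeeper; the paper itself does not spell this rule out, and the function names are hypothetical. It shows that adding a node to an odd-sized ensemble does not increase the number of failures the ensemble can survive.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble must contain at least one node")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of nodes that can fail while a majority quorum remains available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Under this rule, three nodes and four nodes both tolerate one failure, and five and six both tolerate two, so an even-sized ensemble adds hardware without adding resilience.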
    Building on the core networking capabilities of Cisco, this solution can be extended to include remote mirroring to help ensure data reliability by synchronizing a copy of the cluster’s data at a remote site so that data analysis can continue in the event of a disaster. Locally, snapshots protect data from application errors or accidental deletion. Snapshots also enable easy data recovery to a specific point in time by simply copying a file or directory from the snapshot location to the desired destination directory.

  • 11. High Performance and Exceptional Scalability
    The Cisco UCS unified fabric architecture provides fully redundant, highly scalable, lossless 10-Gbps unified fabric connectivity for big data traffic. Powered by the latest Intel Xeon processor, the Cisco Greenplum MR solution delivers best-in-class performance and internal storage capacity, achieving at least twice the performance of standard Apache Hadoop. The Cisco Greenplum MR Reference Configurations can easily scale to support a large number of nodes when required by business demands. The advanced management capabilities of Cisco UCS radically simplify this process with a single point of management that spans all nodes in the cluster.

    Simplified Management
    Hadoop implementations tend to involve very large numbers of servers. In traditional environments, it can be challenging to manage these large numbers of servers effectively. Cisco UCS Manager delivers unified, model-based management that applies personality and configures server, network, and storage connectivity resources, making it as easy to deploy hundreds of servers as it is to deploy a single server. Additionally, Cisco UCS Manager can perform system maintenance activities such as firmware updates across the entire cluster as a single operation. To ease the monitoring of these large clusters, the Greenplum MR control system raises alarms and sends notifications to alert IT personnel about cluster health and the status of services.

    Coexistence with Enterprise Applications
    Beyond introducing a Hadoop deployment, organizations need ways to transfer data transparently between their enterprise applications and Hadoop. This solution can connect, across the same management plane, to other Cisco UCS deployments running enterprise applications, thereby radically simplifying data center management and connectivity.

    Rapid Deployment
    Deployment of large numbers of servers can take time. Systems need to be racked, networked, configured, and provisioned before they can be put into use. Cisco UCS Manager uses a model-based approach to provision servers by applying a desired configuration to physical infrastructure quickly, accurately, and automatically. The ability to create consistent configurations improves business agility and eliminates a major source of errors that can cause downtime.

    Enterprise Service and Support
    Enterprises want to know that the vendors providing a solution have the expertise to help them quickly proceed through the design, deployment, and testing of strategic big data initiatives. Businesses also need to have confidence that if a critical system fails, they will be able to get timely and competent support. The Cisco Greenplum MR Reference Configurations bring together world-class service and support from long-time collaborators Cisco and Greenplum.
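The snapshot recovery described under "Designed for High Availability and Reliability" amounts to an ordinary file copy out of the snapshot location. The Python sketch below illustrates that idea, assuming the cluster is reachable as a regular file system path (for example through the Direct Access NFS feature mentioned earlier). The function name and all paths are hypothetical placeholders, not locations documented in the paper.

```python
import shutil
from pathlib import Path


def restore_from_snapshot(snapshot_root: Path, relative_path: str,
                          destination: Path) -> Path:
    """Copy a file or directory out of a snapshot into a destination directory.

    snapshot_root: directory where the point-in-time snapshot is exposed
                   (hypothetical location; depends on the deployment).
    relative_path: path of the item to recover, relative to snapshot_root.
    destination:   directory that receives the recovered copy.
    """
    source = snapshot_root / relative_path
    if not source.exists():
        raise FileNotFoundError(f"not present in snapshot: {source}")

    target = destination / source.name
    if source.is_dir():
        # copytree creates the target (and any missing parents) and
        # fails if the target already exists, which avoids silent overwrites.
        shutil.copytree(source, target)
    else:
        destination.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the call shape.
    snapshot = Path("/mnt/cluster/projects/.snapshot/nightly")
    try:
        print(restore_from_snapshot(snapshot, "reports/q4.csv",
                                    Path("/mnt/cluster/projects/restored")))
    except FileNotFoundError as err:
        print(err)
```

Copying into a separate destination directory rather than over the live data keeps the current state intact until the recovered copy has been checked.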
  • 12. For More Information
    For complete details about Cisco UCS, visit http://www.cisco.com/go/ucs.

    Acknowledgments
    This document was authored by Raghunath Nambiar and William Davis with contributions from Robert Demartino, Theresa Chen, Ashok Rajagopalan, and Girish Kulkarni.

    Americas Headquarters: Cisco Systems, Inc., San Jose, CA
    Asia Pacific Headquarters: Cisco Systems (USA) Pte. Ltd., Singapore
    Europe Headquarters: Cisco Systems International BV Amsterdam, The Netherlands

    Cisco has more than 200 offices worldwide. Addresses, phone numbers, and fax numbers are listed on the Cisco Website at www.cisco.com/go/offices.

    Cisco and the Cisco logo are trademarks or registered trademarks of Cisco and/or its affiliates in the U.S. and other countries. To view a list of Cisco trademarks, go to this URL: www.cisco.com/go/trademarks. Third party trademarks mentioned are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (1110R) LE-35101-02 02/12