SlideShare ist ein Scribd-Unternehmen logo
1 von 34
Downloaden Sie, um offline zu lesen
White Paper




EMC Greenplum Data Computing Appliance
Enhances EMC IT’s Global Data Warehouse
Accelerating Big Data and Analytics




                      Abstract
                      This white paper illustrates a synergistic model for
                      deploying EMC’s Greenplum Data Computing Appliance
                      (DCA) with EMC IT’s incumbent Global Data Warehouse
                      infrastructure. It illustrates the steps to migrate from EMC’s
                      IT Global Data Warehouse, and the progress Greenplum DCA
                      enabled EMC to make in addressing its business challenges
                      of growing data (“Big Data”) and accelerated analytics.

                      August 2011
Copyright © 2011 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as
of its publication date. The information is subject to change
without notice.

The information in this publication is provided “as is.” EMC
Corporation makes no representations or warranties of any kind
with respect to the information in this publication, and
specifically disclaims implied warranties of merchantability or
fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in
this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC
Corporation Trademarks on EMC.com.

VMware and ESX are registered trademarks or trademarks of
VMware, Inc. in the United States and/or other jurisdictions. All
other trademarks used herein are the property of their respective
owners.

Part Number H8869.1




                           EMC IT Accelerating Big Data and Analytics   2
Contents
Executive Summary ................................................................................................. 4
   Problem Statement............................................................................................................. 4
   Value Statement ................................................................................................................. 4
   Audience ............................................................................................................................ 5
EMC IT’s Big Data Approach ..................................................................................... 6
   EMC IT’s Information Reference Architecture....................................................................... 6
     EMC IT’s Global Data Warehouse Success ...................................................................... 6
     EMC IT’s Big Data and Analytics Business Drivers ........................................................... 7
   Greenplum Offerings .......................................................................................................... 8
     EMC Greenplum Database Software ............................................................................... 8
     EMC Greenplum Data Computing Appliance (DCA) Product Family .................................. 9
     Greenplum Enablers ..................................................................................................... 10
Business POC ........................................................................................................ 11
      Business Challenge ...................................................................................................... 11
      Business Success......................................................................................................... 12
EMC IT’s Greenplum POC Highlights ....................................................................... 13
   Greenplum POC Platform Migration .................................................................................. 13
     High-Level Migration POC Milestones ........................................................................... 13
     How It Was Done: A POC ............................................................................................... 13
     Greenplum DCA POC Deployment ................................................................................. 13
     EMC IT Global Data Warehouse to Greenplum POC Migration........................................ 15
     Migration Steps ............................................................................................................ 15
Business POC Benefits .......................................................................................... 25
   Query Performance ........................................................................................................... 25
   New Analytics Abilities ..................................................................................................... 28
   EMC IT Global Data Warehouse to Greenplum POC: Lessons Learned ............................... 31
Conclusion ............................................................................................................ 33
   Benefits............................................................................................................................ 33
   References ....................................................................................................................... 34




                                                                             EMC IT Accelerating Big Data and Analytics                      3
Executive Summary
This white paper illustrates a synergistic model for deploying the Greenplum Data Computing
Appliance (DCA) with EMC IT’s existing Global Data Warehouse (GDW) deployed on EMC IT’s
Business Intelligence (BI) Grid infrastructure. It outlines the migration steps of moving the GDW
to the DCA to solve business problems driven by growing data (“Big Data”), and the need for
better, accelerated, and advanced analytics that could not be addressed by the previous Global
Data Warehouse.

Problem Statement

As EMC has grown, so has our data. It is not unusual for us to analyze hundreds of millions of
rows of data to gather insight into the health of our business.

As EMC has become more complex, so has the analysis that must be executed on this Big Data.
Our business is transitioning from backward looking metrics (lagging indicators) to both
backward and forward looking indicators (leading indicators), and is using statistical modeling
to predict the impact that EMC’s current decisions will have on our future business model.

Until recently, EMC’s Global Data Warehouse had been struggling to meet the demand for
Operational BI, Big Data, and advanced analytics. This resulted in shadow initiatives and labor-
intensive processes to generate the information required by EMC’s business owners. For
example, an EMC business unit data load on the legacy infrastructure took more than 6 days
and created more than 650 batch reports that had to be reviewed by a business unit subject
matter expert to create the needed information to do an analysis. (See Business POC section for
details)

Greenplum is perfectly designed to satisfy this need for predictive analytics on Big Data. The
introduction of the Greenplum DCA will reduce the need for these shadow activities, and will
create a platform for EMC IT to develop and deploy an advanced analytics appliance that will
serve as a platform to develop Business Intelligence as a Service (BIaaS).

Value Statement
This document illustrates the specific deployment steps and resultant benefits of the
integration of the Greenplum DCA with EMC’s Global Data Warehouse, including:

The high-level deployment model of the Greenplum DCA into the existing EMC Global Data
Warehouse infrastructure. This includes:

           – The phased approach for moving EMC IT’s BI functions to the Greenplum DCA via
             the business proof of concept (POC).




                                                 EMC IT Accelerating Big Data and Analytics      4
– Improved data loading, performance, and functionality via the use of Greenplum
             DCA. Example:

                  •   Ultra-fast loading -- The legacy infrastructure data load that took the
                      business unit more than 6 days and 650 batch reports can be loaded in
                      less than 29 minutes on the Greenplum DCA.

                  •   A query running in 427 seconds on the legacy platform decreased to just
                      28.85 seconds on the Greenplum DCA for 99 million rows of data.

                  •   A process that took 45 minutes on the legacy platform can now be
                      completed in 3.46 minutes on the Greenplum DCA.

           – EMC IT increased its ability to use a tool set to create a dashboard and execute
             “on-the-fly” analytics instead of delayed batch reports.

           – The EMC / EMC IT “lessons learned” from the Greenplum DCA deployment.

           – The co-existence of both the EMC IT Global Data Warehouse (BI Grid) and the
             Greenplum DCA to address EMC’s data warehouse and BI challenges with Big
             Data and analytics to create a platform for Business Intelligence (BI) as a Service
             (BIaaS).

Audience
This white paper is intended for CIOs, data warehouse architects, Oracle architects, storage
architects, Oracle DBAs, and server and network administrators.




                                                EMC IT Accelerating Big Data and Analytics      5
EMC IT’s Big Data Approach
This section identifies the EMC Data Reference Architecture and EMC IT’s current infrastructure
of the EMC IT Global Data Warehouse deployment.

EMC IT’s Information Reference Architecture

The diagram below illustrates EMC IT’s vision of the components and process for its Information
Reference Architecture:




                 Figure 1: Information Reference Architecture model
This logical architecture shows the need to have both the current Oracle Data Warehouse and
the DCA be part of EMC IT’s Global Data Warehouse.

EMC IT’s Global Data Warehouse Success

This consolidated Oracle Global Data Warehouse deployed on EMC IT’s BI-Grid infrastructure
(Oracle RAC) has delivered the following benefits for EMC and EMC IT (please review the Global
Data Warehouse white paper in the References section for details):
   •   Consolidation of five data warehouses to one unified Global Data Warehouse
       infrastructure



                                                EMC IT Accelerating Big Data and Analytics        6
•   10 times improvement in query performance from Day 1
             •   180 percent improvement in batch job performance
             •   50 percent reduction in data mart update times
             •   Two to three times performance improvement in reporting cube build times
             •   200 percent improvement in dashboard rendering and drill-down performance

          EMC IT’s Big Data and Analytics Business Drivers

          Even with the greatly improved performance provided by the enterprise Oracle Data
          Warehouse, EMC IT was still not keeping up with:
             •   Data explosion
             •   Shadow systems sprawl
             •   Shrinking load windows
             •   Increasing complexity of the analytics required by the business
             •   Volumes of data required for predictive analytics and data mining
             •   Explosion of point solutions and shadow marts in an attempt to remediate the issues
Figure 2 is a high-level illustration of the challenges the data warehouse did not address:




                                  Figure 2: EMC’s Big Data business drivers




                                                           EMC IT Accelerating Big Data and Analytics   7
Figure 2 illustrates the need to integrate the Greenplum DCA into the Global Data Warehouse
infrastructure to address the business drivers that the Global Data Warehouse could not solve.
These business drivers caused EMC IT to look at a new platform to deal with Big Data
and analytics. EMC IT selected the Greenplum DCA for its ability to:
    •   Analyze data and information in real time
    •   Apply analytics and discern critical information
    •   Unlock the value of information from this Big Data

Greenplum Offerings

The following section gives a high-level overview of the Greenplum Database and the
Greenplum DCA, and their ability to accelerate business value for EMC IT Big Data and analytics
drivers.

EMC Greenplum Database Software
Built to support the next generation of Big Data warehousing and large-scale analytics
processing, the EMC Greenplum Database manages, stores, and analyzes terabytes to
petabytes of data. Users experience 10 to 100 times better performance over traditional RDBMS
products – a result of the DCA’s shared-nothing MPP architecture, high-performance parallel
dataflow engine, and advanced gNet software interconnect technology.

The Greenplum Database was conceived, designed, and engineered to enable customers to
take advantage of large clusters of increasingly powerful, increasingly inexpensive commodity
servers, storage, and Ethernet switches. EMC Greenplum customers can gain immediate benefit
from deploying the latest commodity hardware innovations.

Greenplum’s shared-nothing architecture is optimal for fast queries and loads because
processors are placed as close as possible to the data itself for faster operations with the
maximum degree of parallelism possible.




                                                 EMC IT Accelerating Big Data and Analytics      8
Figure 3: Greenplum DCA high-level architecture
EMC Greenplum Data Computing Appliance (DCA) Product Family
The Greenplum Data Computing Appliance family is a group of purpose-built, highly scalable,
parallel data warehousing appliances that architecturally integrate database, compute,
storage, and network into a single, easy-to-manage enterprise-class system. Designed for rapid
analysis of data volumes scaling into petabytes, the Greenplum DCA is a powerful platform for
unifying business intelligence and advanced analytics enterprise-wide. The Greenplum DCA has
been proven to provide the industry’s fastest data loading capabilities, and, in turn, to rapidly
deliver real business value to customers by reducing investments of money, time, and effort.

The Greenplum DCA product line is available in two models: the Standard DCA, which supports
up to 144TB of compressed data; and the High Capacity DCA, which supports up to 496TB of
compressed data. Regardless of your data computing requirements, there is a model in the DCA
family to meet your needs. The Greenplum DCA family is offered in configurations ranging from
a quarter-rack to multiple-rack appliances to achieve the maximum flexibility and scalability for
organizations faced with terabyte to petabyte scale data opportunities. Because these
appliances support the same state-of-the-art EMC Greenplum Database as their core, switching
between the models can be effortless when a business grows and its requirements change.

The Greenplum DCA family provides a rapidly deployable, scalable, and cost-effective
infrastructure so you can manage and analyze exploding data volumes while increasing
performance and achieving greater business agility.




                                                EMC IT Accelerating Big Data and Analytics      9
Greenplum Enablers
Greenplum is a strategic platform for EMC IT. At a high level, it provides the following benefits
for EMC IT’s vision of Big Data and analytics:
•   Ultra-fast loads and linear scalability
•   Data aggregation with common taxonomy
•   In database ad-hoc analytics
•   Predictive analysis and proactive identification




                                                  EMC IT Accelerating Big Data and Analytics        10
Business POC
          The EMC Customer Quality business unit focuses on EMC's customers, partners, and internal
          stakeholders, looking at quality in the broadest sense, including products, services,
          innovations, interactions, and processes. The following section highlights the EMC Corporate
          Quality business unit’s challenge using the current Oracle infrastructure.

         Business Challenge

          Prior to deploying the Greenplum DCA, the analytics process would begin with a 31-hour data
          load preparation process followed by a 4-to-5-day batch report run. At the conclusion of this
          process, Customer Quality would begin looking at product quality.

          Customer Quality would then spend a week or more creating various reliability analyses. They
          would use other offline toolsets and spreadsheets, as well as many hours of labor-intensive
          analysis, to generate the root cause analysis of any identified issues and to predict expected
          future product quality.

The diagram below illustrates the process used on the EMC IT Global Data Warehouse and stand-alone
reporting instance:




                           Figure 4: Previous Customer Quality process




                                                         EMC IT Accelerating Big Data and Analytics        11
Business Success
With hundreds of millions of records to sort through, obtaining the insight from the data using
the existing platform was previously a time-consuming process. Now, with the Greenplum DCA,
the Customer Quality business unit processes hundreds of millions of records in minutes. The
diagram below illustrates the process used on the Greenplum DCA:




                    Figure 5: New Customer Quality process with Greenplum
With the Greenplum DCA, Customer Quality can perform quality analysis of all the products,
versus only one product at a time in the Oracle environment.

Customer Quality can now drive business and customer value by using Greenplum’s speed.




                                                EMC IT Accelerating Big Data and Analytics    12
EMC IT’s Greenplum POC Highlights

The following section outlines the highlights of the Greenplum POC platform
migration and Greenplum POC architecture.

Greenplum POC Platform Migration
These sub-sections detail the high-level migration milestones and POC approach to
migrate to the Greenplum DCA. They highlight the discovery and phased migration of
a POC project to gain experience and knowledge on Greenplum, and describe how it
could coexist with the existing Global Data Warehouse infrastructure.

High-Level Migration POC Milestones
The high-level migration milestones were:
•   October 2010 – January 2011: Proof-of-concept (POC) Greenplum DCA equipment
    procured and configured.
•   February – March 2011: Initial load tests run on the POC environment.
•   April – May 2011: POC Greenplum DCA and initial procurement used to build a
    development (dev) and test POC environments.
Introducing the Greenplum DCA in EMC IT's Global Data Warehouse infrastructure
required planning, a knowledgeable IT staff, and business owners.

How It Was Done: A POC
EMC IT first got its knowledge and experience of deploying and migrating EMC
Customer Quality data by migrating a few legacy data marts to the Global Data
Warehouse infrastructures. The Customer Quality data was then migrated over to the
Greenplum DCA for the POC to prove that the Greenplum DCA could address the real-
world advanced analytics problem of the Customer Quality organization.

Greenplum DCA POC Deployment
The deployment of EMC IT’s Greenplum DCA for this POC included the introduction of the
following hardware into the EMC IT Global Data Warehouse infrastructure:




                                                EMC IT Accelerating Big Data and Analytics   13
Figure 6: EMC IT’s Greenplum DCA POC hardware layout
Table 1 shows the components of the separately deployed Greenplum appliances for
the POC infrastructure:
    Components             GP1000 (Full Rack) DCA       GP100 (Half Rack) DCA
    Operating system       Red Hat Enterprise Linux 5   Red Hat Enterprise Linux
                           update 5 (RHEL 5.5).         5 update 5 (RHEL 5.5).
    RDBMS Software         Greenplum 4.0.3              Greenplum 4.0.3

    Segment Servers        16                           8
    Processors (per        (2) six-core (2.93GHz)       (2) six-core (2.93GHz)
    segment server)        192 total cores              96 total cores
    Memory (per segment    48GB (768GB total)           48GB (384GB total)
    server)
    Storage (hard drives   (12) 600GB 15K SAS           (12) 600GB 15K SAS
    per segment server)
    Usable Capacity        36TB                         18TB
    (uncompressed)
    Usable Capacity        144TB                        73TB
    (compressed)
    Master Servers         2                            2
    Processors (per        (2) six-core X5680 (3.33     (2) six-core X5680 (3.33
    master server)         GHz) 12 total cores          GHz) 12 total cores




                                              EMC IT Accelerating Big Data and Analytics   14
Memory (per master      48GB                           48GB
    server)
    Storage (hard drives    (6) 600GB 10K SAS              (6) 600GB 10K SAS
    per master server)
    Scan Rate               24GB/s                         12GB/s


    Data Load Rate          10TB/Hr                        5TB/Hr

                  Table 1: Greenplum DCA POC components
EMC IT Global Data Warehouse to Greenplum POC Migration
The following section details the method EMC IT used to migrate the EMC Global Data
Warehouse to the Greenplum DCA platform.

Migration Steps
Here are the steps for migrating EMC IT’s Oracle Global Data Warehouse to the EMC IT
POC Greenplum DCA:

Step 1. Assess the current Oracle environment.
           1. Prepare a list of objects of the Oracle Global Data Warehouse (GDW)

              The total database size in Oracle is 15TB.

              Below are all application-related objects (counts) in the current Oracle
              environment.


                                                       Number of
                  Object Types                         Objects
                  TABLES                               3,634
                  TABLE PARTITIONS                     485
                  TABLE SUBPARTITIONS                  732
                  INDEXES                              4,424
                  INDEX PARTITIONS                     5,823
                  INDEX SUBPARTITIONS                  1,924
                  MATERIALIZED VIEWS                   34
                  LOB SEGMENTS                         27
                  TRIGGERS                             1,492
                  PACKAGES                             70
                  PACKAGE BODY                         55
                  PROCEDURES                           194




                                                EMC IT Accelerating Big Data and Analytics   15
SEQUENCES                          720
                  Constraints                        16,140
                  Packages                           61
                  Number of rows                     ~20 billion rows
                  Procedures                         191
                  Tables related to CQ               394
                  Table 2: Oracle Global Data Warehouse (GDW) objects
           2. Find the objects that do not need to be migrated to Greenplum:
                 a. Materialized views
                 b. Indexes – due to MPP architecture performance gains, indexes will not be
                     needed
           Test to see if it is necessary to migrate certain objects, such as
           materialized views and indexes.
           3. Prepare a list of schemas, with the size of the objects to be migrated.
           4. Prepare the list of objects to be migrated to Greenplum.

            Object type         Count         Comments
            TABLES and data     3,634         Tables and its data to be migrated
            TABLE                             Created two versions: non-partitioned and
            PARTITIONS          485           then partitioned
            TABLE
            SUBPARTITIONS       732           Tested all the options
            INDEXES             4,424         Not migrated
            INDEX
            PARTITIONS          5,823         Not migrated
            INDEX
            SUBPARTITIONS       1,924         Not migrated
            MATERIALIZED
            VIEWS               34       Not migrated
            TRIGGERS            1,492    Not part of the POC
            PACKAGES            70       Not part of the POC
            PACKAGE BODY        55       Not part of the POC
            PROCEDURES          194      Not part of the POC
                                         Required for incremental loading. Not part of
            SEQUENCES       720          this POC
                      Table 3: Oracle Migrate/No Migrate object list



Step 2. Assess data types in the Oracle database.
    Below are the data types and the number of columns for each.




                                                 EMC IT Accelerating Big Data and Analytics   16
Data types                           Counts
                  VARCHAR2                             74,296
                  NUMBER                               38,393
                  DATE                                 16,999
                  CHAR                                 845
                  FLOAT                                564
                   TIMESTAMP                           46
                  NCHAR                                1
                  NVARCHAR                             25

                         Table 4: Oracle Data Types
           1. Find the right equivalent data type available in Greenplum. Below are the
              different data types that are part of the tables to be migrated and their
              equivalent Greenplum (postgres) data types.

                  Data types        Greenplum equivalent data type
                  VARCHAR2,
                  NVARCHAR2        VARCHAR
                                   NUMERIC, NUMERIC (p,s), SMALLINT (2
                  NUMBER           bytes), INTEGER (4 bytes), BIGINT (8 bytes),
                  DATE             DATE or TIMESTAMP
                  CHAR, NCHAR      CHAR
                                   NUMERIC (p,s), DECIMAL (p,s), REAL (4
                  FLOAT            bytes), DOUBLE (8 bytes),
                  TIMESTAMP        TIMESTAMP
                        Table 5: Greenplum data type
           The following select statement assists in finding the precision and scale used for
           the NUMBER data types. It assumes statistics are up to date in Oracle:

                  select length(utl_raw.cast_to_number(HIGH_VALUE)) from all_tab_columns
                  where table_name='TABLE_NAME' and column_name='COLUMN_NAME'


Step 3. Identify and assess migration tools and their requirements.
            Migration tools considered:
                • Informatica
                • Migration scripts
                • Ora2pg freeware tool
                • Web external tables
                • Third-party utilities




                                                EMC IT Accelerating Big Data and Analytics      17
Step 4. Prepare and deploy Greenplum DCA.
           1. Site-specific requirements for DCA deployment.
                     a. IP address on network for master, standby server, and network
                         access.
                     b. Choose passwords for root and gpadmin, and password for remote
                         console access (iDRAC).
                     c. Preferred locale, time zone for DCA servers.
                     d. Preferred database character set encoding.

           2. Installing Greenplum Data Computing Appliance (DCA).
                       a. Appliance: Shipping state from Manufacturing.
                               i. Appliance comes with racked, stacked, and cabled.
                              ii. Software pre-installed, minimum on-site configuration
                                   required.
           3. Initial deployment.
                       a. Supply power to rack; inspect and validate rack hardware and
                          cabling.
                       b. Configure access to network (master and standby only).
                       c. Refresh software if needed; run validation check.
                       d. Configure locale, timezone, and NTP on all DCA servers.
                       e. Run gpinitsystem to initialize database.
                       f. Initialize standby database.
                       g. Enable Greenplum performance monitor.
                       h. Configure and connect EMC as needed.


The installation time was only 8 hours for each DCA deployed (half and full rack) for a
total of 16 hours.




                                                 EMC IT Accelerating Big Data and Analytics   18
Step 5. Prepare Greenplum Database Environment in DCA.

          1. Create a database in Greenplum DCA. Database properties will be inherited from
             template1 that was created during the DCA environment setup. Creating a
             database in Greenplum is as simple as shown in this screen shot:




                       Figure 7: Greenplum database creation
          2. Preparing schemas.

                    a. List all the schemas to be created, and create schemas in Greenplum
                       using the below syntax.




                        Figure 8: Greenplum database schemas




                                             EMC IT Accelerating Big Data and Analytics   19
Step 6. Convert DDL from Oracle to Greenplum.

             1. Converting DDL from Oracle to Greenplum involves:
                      a. Reading the table properties from the current Oracle environment.
                      b. Converting the create table syntax from Oracle to Greenplum.
                      c. Converting each data type in table definition from Oracle to
                          Greenplum.
                      d. Choosing the right distribution key.

             2. Identify the distribution keys. EMC IT chose the distribution keys based on these
                criteria:
                        a. If a table contains a primary key in Oracle, consider it a distribution
                           key in Greenplum.
                        b. If a table in Oracle contains no primary key but a unique key exists,
                           then consider a unique key with fewer columns for a distribution key
                           in Greenplum.
                        c. Choose Random distribution when the table does not have a
                           candidate column for the distribution key and the table is small.
             3. No tool is perfect when it comes to converting the DDL from Oracle to any other
                MPP database.
                        a. EMC IT leveraged a DDL conversion utility, which connects to Oracle
                           and converts table DDLs for one schema, multiple schemas, or a
                           single table.

                           The EMC IT DDL conversion utility usage is as follows:

                           Usage: generate_gp_tables.sh -s <schema names separated by
                           comma> -r <Optional:remap schema> -p <Optional:Path>

                           There are three switches in this script.
                            -s is mandatory and EMC IT needs to specify the schema name.

                           The EMC IT script can specify multiple schemas separated by comma.

                           The other two switches are optional.

                           -p switch is path for generating ddl scripts.
                            If this is not specified, ddl scripts will be generated in the current
                           directory.

                           -r is for remap schema option.

                           E.g.: #:/apps/oradump2/dump2/gpdump/scripts:oraedw1>
                           ./generate_gp_tables.sh -s extract,masterdata -r
                           extract_data,mstr_data




                                                   EMC IT Accelerating Big Data and Analytics        20
Directory created.
                           PL/SQL procedure successfully completed.
                           PL/SQL procedure successfully completed.
                           Directory dropped.

                           #:/apps/oradump2/dump2/gpdump/scripts:oraedw1> ls -ltr
                           total 496
                           -rwxr-x--- 1 oracle dba 9861 Jun 6 01:50 generate_gp_tables.sh
                           -rw-r--r-- 1 oracle dba 434555 Jun 6 02:04
                           create_gptables_EXTRACT_DATA.sql
                           -rw-r--r-- 1 oracle dba 39587 Jun 6 02:04
                           create_gptables_MSTR_DATA.sql

                    The above process illustrates converting DDLs from Oracle to Greenplum for
                    two schemas (Extract and MasterData).

  Step 7. Create table structures in Greenplum.

             1. Copy converted DDL files to the Greenplum master server script location.
             2. Execute the script in Greenplum to create the DDL of all tables in the Greenplum
                database.

                 E.g.: psql -d gpcqdb -f create_gptables_EXTRACT_DATA.sql -L
                 create_gptables_EXTRACT_DATA.log -o create_gptables_EXTRACT_DATA.out

  Step 8. Create a gold environment as a data migration source.

             1. Create a gold environment with migration scripts to perform the clone from the
                production Oracle Global Data Warehouse. This option will have no impact on
                the production database during migration of the data. The script:

                               a. Shuts down the gold Oracle database and unmounts the
                                  Oracle asm diskgroups.
                               b. Recreates the clone using EMC Timefinder technology.
                               c. Keeps the Production Oracle instance in backup mode.
                               d. Activates the created Timefinder clone.
                               e. Takes the Oracle production instance out of backup mode.
                               f. Mounts asm diskgroups on the gold environment and brings
                                  up the Oracle database on the gold environment to use as the
                                  Data Migration source.

Step 9. Leverage unloading and loading scripts from Oracle to Greenplum.

      The challenge involved in EMC IT’s environment is to be able to migrate data from more
      than 3,600 tables, with approximately 132,000 columns. Each column of data should be
      able to be identified uniquely.



                                                  EMC IT Accelerating Big Data and Analytics     21
1. Unloading involves:
      a. Identifying the schemas and tables to be unloaded.
      b. Establishing the connection to the Oracle database and ASM.
      c. Choosing the type of unload, such as direct or conventional.
      d. Choosing the column separator (E.g.: ;).
      e. Choosing the Record separator (E.g.: n).
      f. Choosing the Encloser (E.g. “) .
      g. Choosing the Escape Enclose (E.g. ).
2. Loading to Greenplum involves:
      a. A YAML file for each table. An example YAML file is below. YAML is a
         human-friendly data serialization standard for all programming
         languages. In this example, this YAML file had the information about
         the target Greenplum environment (database name, user, hostname,
         port, input source information, column identifications in the files, and
         the delimiters used).
                        VERSION: 1.0.0.1
                        DATABASE: gpcqdb
                        USER: gpadmin
                        HOST: gpserver
                        PORT: 5432
                        GPLOAD:
                         INPUT:
                         - SOURCE:
                           FILE:
                            - /tmp/gp/W_TASK_NOTE_D.dat.
                         - COLUMNS:
                            - ROW_WID: numeric
                            - W_SESSION_NUMBER: numeric
                            - TASK_WID: numeric
                            - NOTE_ID: numeric
                            - NOTE_CRTE_BY_NM: text
                            - NOTE_CRTE_BY_RSRC_ID: numeric
                            - NOTE_CRTE_DT: timestamp
                            - TASK_ID: numeric
                            - NOTE_SUBJ: text
                            - NOTE_TYPE_WID: numeric
                            - NOTE_VISABILITY_FLG_WID: numeric
                            - W_LAST_UPDATED_BY: text
                            - W_INSERT_DT: timestamp
                            - W_UPDATE_DT: timestamp
                            - W_ROW_HASH: text
                            - W_REPROCESSED_DT: timestamp
                         - FORMAT: text
                         - DELIMITER: ';'
                         - NULL_AS: ''



                                 EMC IT Accelerating Big Data and Analytics     22
- ESCAPE: ''
                                          - ERROR_LIMIT: 10000
                                          - ERROR_TABLE: GS_ERR.w_task_note_d_err
                                          OUTPUT:
                                          - TABLE: GS.w_task_note_d
                                          - MODE: INSERT
                        b. An input data source; a data file, pipe, or socket.
                        c. Gpload commands for all the tables involved to load the data from
                           the input source to the Greenplum target database.

                            Here is an example gpload command:

                                   gpload.py -f "/tmp/W_TASK_NOTE_D.ctl" -h gpserver -p 5432 -
                                   d gdcqdb -U gpadmin -l "/tmp/W_TASK_NOTE_D.log"

Step 10. Migrate data from Oracle to Greenplum.

                        1. Create sockets; unload the data using unload scripts, which unloads
                           the data directly from Oracle ASM in parallel into PIPLE/SOCKET;
                           compress the data; transfer to Greenplum servers; uncompress, and
                           load data to Greenplum using the gpload utility.

                            Below are example commands to perform the data migration from
                            Oracle to Greenplum using pipe/socket.

                            1. On Oracle server:

                                   mknod /tmp/TABLE.dat p

                            2. On Oracle server:
                                  gzip < /tmp/TABLE.dat | ssh -q gpadmin@gpmasterhost
                               "gunzip >/tmp/TABLE.dat " &

                            3. On Greenplum DCA:
                                  mknod /tmp/TABLE.dat p
                                  gpload.py -f "/tmp/TABLE.ctl" -h gpmasterhost -p 5432 -d
                               gpdb -U gpadmin -l "/tmp/TABLE.log"

                            4. On the Oracle server, start the unload process:
                                             a. Using this approach, it starts loading data into
                                                Greenplum as it unloads data from Oracle
                                                without the need for any intermediary storage.

             You can also unload the data to an ASCII text file, read the data from this text file,
             and load into the Greenplum database. This approach requires the intermediary
             storage to hold the ASCII text files.



                                                   EMC IT Accelerating Big Data and Analytics         23
After the data has been migrated, the storage space needed on Greenplum is only
50 percent of the storage space needed in Oracle because it eliminates the need for
most of the indexes and materialized views.

Here is a high-level illustration of steps 8 – 10:




         Figure 9: Oracle to Greenplum high-level migration steps




                                       EMC IT Accelerating Big Data and Analytics   24
Business POC Benefits
The following are illustrative examples of the benefits of using the Greenplum DCA.

Query Performance

Here are a few examples of queries that were executed in the Oracle environment, and the
queries and reports on the Greenplum DCA.




           Figure 10: Oracle versus Greenplum Query 1 example




                                                EMC IT Accelerating Big Data and Analytics   25
As the diagram illustrates, a query that took 7 minutes, 7 seconds to execute in the Oracle
environment now executes in 28.85 seconds (28855 ms) on more than 99 million rows.



  Oracle sample Query 2:




                                                   EMC IT Accelerating Big Data and Analytics   26
Greenplum sample
Query2:




                   Figure 11: Oracle versus Greenplum Query 2 example




                                                     EMC IT Accelerating Big Data and Analytics   27
Figure 12: Oracle versus Greenplum Query 1 and Query 2 example


The query graph illustrates that Query 1 was run in 427 seconds in Oracle and in 28 seconds on
Greenplum DCA. Query 2 on Oracle executed in 144 seconds and 13 seconds on Greenplum.



  New Analytics Abilities

  Prior to the introduction of the Greenplum DCA, all analysis was done via batch reports. In the
  POC, the performance of the Greenplum DCA gave Customer Quality the ability to use a tool set
  to create a dashboard and execute on-the-fly analytics that was not available in the Oracle
  environment.

  With the Greenplum DCA in place, Customer Quality can now process large data sets in record
  time.

  Here are examples that illustrate the ability to view Big Data in a dashboard and do analytics
  on-the-fly via dashboard tools such as SAS or other third-party BI tools.




                                                   EMC IT Accelerating Big Data and Analytics      28
Figure 13: Big Data analysis dashboard


Greenplum now enables Customer Quality to provide management with a dashboard environment
depicting many views of a product analysis, and, most importantly, to proactively address an
issue before it is seen by the customer.




                                                EMC IT Accelerating Big Data and Analytics   29
Figure 14: Dashboard view of Customer Quality’s Big Data
Customer Quality can now quickly analyze across a broad range of data to enable proactive
remediation of quality concerns and ensure continued customer satisfaction.

The time saved using Greenplum has allowed enhanced predictive analytics to be performed, which
will provide even more value internally and for EMC’s customers.




                                                     EMC IT Accelerating Big Data and Analytics   30
Figure 15: Greenplum enablers


As the diagram illustrates, Greenplum gives accelerated insight into:

                 •   Predictive models
                 •   Data mining
                 •   Dashboards
                 •   Dynamic query reports


  EMC IT Global Data Warehouse to Greenplum POC: Lessons Learned
  Here are highlights of the lessons learned about Oracle from this migration.
         Challenge                             Resolution

  Greenplum expertise               Greenplum reduces the complexity of Big Data analytics,
                                    enabling a simple retraining of the current EMC IT Oracle Data
                                    Warehouse team.
  Data type conversions             There is no perfect tool for converting the data types from
                                    Oracle to any MPP database.
                                    The majority of the numeric values in Oracle were defined as
                                    NUMBER data type without precision and scale. To better
                                    manage storage space, we experimented with a few
                                    conversion options. Our testing showed the following as the
                                    best option:

                                       1. Converted all the Oracle NUMBER columns to




                                                   EMC IT Accelerating Big Data and Analytics    31
Greenplum NUMERIC data type.
                                           a. NUMERIC data type consumes a minimum of 7
                                              bytes in Greenplum and can expand if the data
                                              is larger. This has been a very good resolution
                                              so far. We had to remediate the load failures.
                                           b. There is still a chance to save some space
                                              where there are columns that do not need 7
                                              bytes of storage space. Greenplum native INT
                                              data types are more storage-efficient and
                                              perform better. This requires expensive 100
                                              percent stats collection on Oracle or a lot of
                                              expertise on thousands of columns of space
                                              requirement.

Data distribution                The distribution policy determines how to divide the rows of a
                                 table across the Greenplum segments.
                                            1. Used hash distribution and a primary key as the
                                                distribution key where a primary key exists for a
                                                table in Oracle.
                                            2. Used hash distribution and unique index as a
                                                distribution key where a primary key does not
                                                exist and a unique index exists.
                                            3. Used random distribution for tables with no
                                                primary key and no unique index.
Staging space for                For accelerated data movement, we leveraged a mounted NAS
unload files                     device between Oracle and Greenplum.
(for 10TB database)
                                 Rather than using the compression algorithm during unload
                                 and the un-compression algorithm for load, we used PIPE to
                                 unload data with no compression and loaded to Greenplum
                                 via PIPE.
OBIEE integration with           EMC IT used the pgsqlODBC driver to successfully connect
Greenplum                        OBIEE with the Greenplum Database; it generates the SQL we
                                 wanted and expected.

                                 We also leveraged UDFs (user defined functions) in
                                 Greenplum to replicate some Oracle built-in functions.

Generating a DDL from            There is no perfect tool out there. The big challenge is
Oracle to Greenplum              generating the DDL with the right distribution policy, right
                                 distribution key, and correct data type. Working with the
                                 experts at Greenplum, we leveraged a set of methodologies
                                 and scripts to convert the DDL from Oracle to Greenplum.
    Table 6: Lessons learned in EMC IT’s migration from Global Data Warehouse to
    Greenplum




                                                EMC IT Accelerating Big Data and Analytics      32
Conclusion
This white paper illustrated a synergistic model for deploying the Greenplum DCA with
the existing EMC IT Global Data Warehouse (BI Grid infrastructure), showing the high-
level steps to migrate, business challenges the EMC’s Customer Quality business unit
experienced in the migration from the Global Data Warehouse to Greenplum, and the
improvements seen in Big Data and analytics processing with the Greenplum DCA for
EMC IT.

Benefits
           – Greenplum DCA reduces the need for labor-intensive offline analysis with
             dramatic results:

                  •   Customer Quality has seen a dramatic reduction in processing time.

                         •   Query examples that show a dramatic reduction in query
                             execution time:

                                •   Query 1 example from 427 seconds executing in the Oracle
                                    environment down to 28.85 seconds for 99 million rows
                                    with the DCA.

                                •   Query 2 example from 144 seconds executing in the Oracle
                                    environment down to 13 seconds with the DCA.

                         •   Data load example from 6 days on the legacy platform to 29
                             minutes on the Greenplum DCA.

                         •   Introduction of third-party BI tools to accelerate visualization of
                             Big Data and analytics.

           – EMC IT created a platform to develop and deploy a Big Data and analytics
             appliance, and to use as a future platform for developing Business Intelligence
             as a Service (BIaaS).




                                                 EMC IT Accelerating Big Data and Analytics        33
References
The links below will direct you to detailed information on the following subjects:

EMC IT’s Global Data Warehouse

EMC IT's Migration to the Open, Expandable Oracle BI Grid - Applied Technology

Greenplum Database and Data Computing Appliance

www.emc.com/bigdata

Greenplum DCA Data Sheet
Greenplum DCA Data Sheet

EMC Greenplum Data Computing Appliance architecture
EMC Greenplum Data Computing Appliance architecture

SAS interoperability with EMC Greenplum database
SAS interoperability with EMC Greenplum database

Acknowledgments
The author thanks EMC IT’s Global Data Warehouse, Greenplum, and the Customer
Quality teams for assistance in creating this white paper.




                                                   EMC IT Accelerating Big Data and Analytics   34

Weitere ähnliche Inhalte

Was ist angesagt?

EMC Academic Associate Exam Description - Cloud Infrastructure and Services
EMC Academic Associate Exam Description  - Cloud Infrastructure and ServicesEMC Academic Associate Exam Description  - Cloud Infrastructure and Services
EMC Academic Associate Exam Description - Cloud Infrastructure and ServicesEMC
 
How windows server 2008 r2 helps optimize it and save you money
How windows server 2008 r2 helps optimize it and save you moneyHow windows server 2008 r2 helps optimize it and save you money
How windows server 2008 r2 helps optimize it and save you moneyupendhra90
 
Emc vspex customer_presentation_private_cloud_microsoft_smb
Emc vspex customer_presentation_private_cloud_microsoft_smbEmc vspex customer_presentation_private_cloud_microsoft_smb
Emc vspex customer_presentation_private_cloud_microsoft_smbxKinAnx
 
AngliaRuskinCaseStudy-web
AngliaRuskinCaseStudy-webAngliaRuskinCaseStudy-web
AngliaRuskinCaseStudy-webTim Kitchener
 
Data Center Management: Where Brain Meet Braun
Data Center Management: Where Brain Meet BraunData Center Management: Where Brain Meet Braun
Data Center Management: Where Brain Meet BraunAFCOM
 
Dell PowerEdge M820 blades: Balancing performance, density, and high availabi...
Dell PowerEdge M820 blades: Balancing performance, density, and high availabi...Dell PowerEdge M820 blades: Balancing performance, density, and high availabi...
Dell PowerEdge M820 blades: Balancing performance, density, and high availabi...Principled Technologies
 
Defining the Value of a Modular, Scale out Storage Architecture
Defining the Value of a Modular, Scale out Storage ArchitectureDefining the Value of a Modular, Scale out Storage Architecture
Defining the Value of a Modular, Scale out Storage ArchitectureNetApp
 
The business value o f la ge scal e serve r consol i d a t i o n
The business value o f la ge  scal e serve r consol i d a t i o nThe business value o f la ge  scal e serve r consol i d a t i o n
The business value o f la ge scal e serve r consol i d a t i o nIBM India Smarter Computing
 
Data Center Optimization
Data Center OptimizationData Center Optimization
Data Center OptimizationBihag Karnani
 
Media and entertainment workload comparison: HP Z8 vs. Apple Mac Pro
Media and entertainment workload comparison: HP Z8 vs. Apple Mac ProMedia and entertainment workload comparison: HP Z8 vs. Apple Mac Pro
Media and entertainment workload comparison: HP Z8 vs. Apple Mac ProPrincipled Technologies
 
Informatica Capabilities As An ETL Tool
Informatica Capabilities As An ETL ToolInformatica Capabilities As An ETL Tool
Informatica Capabilities As An ETL ToolEdureka!
 
Magic quadrant for data warehouse database management systems
Magic quadrant for data warehouse database management systems Magic quadrant for data warehouse database management systems
Magic quadrant for data warehouse database management systems divjeev
 
Case Studies in Improving Application Performance With Solix Database Archivi...
Case Studies in Improving Application Performance With Solix Database Archivi...Case Studies in Improving Application Performance With Solix Database Archivi...
Case Studies in Improving Application Performance With Solix Database Archivi...LindaWatson19
 
Net App Scores 100% For Midrange Storage Market Solutions
Net App Scores 100% For Midrange Storage Market SolutionsNet App Scores 100% For Midrange Storage Market Solutions
Net App Scores 100% For Midrange Storage Market SolutionsMichael Hudak
 
Presidio Data Center Practice Overview
Presidio Data Center Practice OverviewPresidio Data Center Practice Overview
Presidio Data Center Practice Overviewrmoquete
 

Was ist angesagt? (18)

EMC Academic Associate Exam Description - Cloud Infrastructure and Services
EMC Academic Associate Exam Description  - Cloud Infrastructure and ServicesEMC Academic Associate Exam Description  - Cloud Infrastructure and Services
EMC Academic Associate Exam Description - Cloud Infrastructure and Services
 
How windows server 2008 r2 helps optimize it and save you money
How windows server 2008 r2 helps optimize it and save you moneyHow windows server 2008 r2 helps optimize it and save you money
How windows server 2008 r2 helps optimize it and save you money
 
Emc vspex customer_presentation_private_cloud_microsoft_smb
Emc vspex customer_presentation_private_cloud_microsoft_smbEmc vspex customer_presentation_private_cloud_microsoft_smb
Emc vspex customer_presentation_private_cloud_microsoft_smb
 
AngliaRuskinCaseStudy-web
AngliaRuskinCaseStudy-webAngliaRuskinCaseStudy-web
AngliaRuskinCaseStudy-web
 
Data Center Management: Where Brain Meet Braun
Data Center Management: Where Brain Meet BraunData Center Management: Where Brain Meet Braun
Data Center Management: Where Brain Meet Braun
 
Dell PowerEdge M820 blades: Balancing performance, density, and high availabi...
Dell PowerEdge M820 blades: Balancing performance, density, and high availabi...Dell PowerEdge M820 blades: Balancing performance, density, and high availabi...
Dell PowerEdge M820 blades: Balancing performance, density, and high availabi...
 
Defining the Value of a Modular, Scale out Storage Architecture
Defining the Value of a Modular, Scale out Storage ArchitectureDefining the Value of a Modular, Scale out Storage Architecture
Defining the Value of a Modular, Scale out Storage Architecture
 
The business value o f la ge scal e serve r consol i d a t i o n
The business value o f la ge  scal e serve r consol i d a t i o nThe business value o f la ge  scal e serve r consol i d a t i o n
The business value o f la ge scal e serve r consol i d a t i o n
 
Data Center Optimization
Data Center OptimizationData Center Optimization
Data Center Optimization
 
Media and entertainment workload comparison: HP Z8 vs. Apple Mac Pro
Media and entertainment workload comparison: HP Z8 vs. Apple Mac ProMedia and entertainment workload comparison: HP Z8 vs. Apple Mac Pro
Media and entertainment workload comparison: HP Z8 vs. Apple Mac Pro
 
Informatica Capabilities As An ETL Tool
Informatica Capabilities As An ETL ToolInformatica Capabilities As An ETL Tool
Informatica Capabilities As An ETL Tool
 
Data center terminology photostory
Data center terminology photostoryData center terminology photostory
Data center terminology photostory
 
EMC DR Case Study
EMC DR Case StudyEMC DR Case Study
EMC DR Case Study
 
Magic quadrant for data warehouse database management systems
Magic quadrant for data warehouse database management systems Magic quadrant for data warehouse database management systems
Magic quadrant for data warehouse database management systems
 
Case Studies in Improving Application Performance With Solix Database Archivi...
Case Studies in Improving Application Performance With Solix Database Archivi...Case Studies in Improving Application Performance With Solix Database Archivi...
Case Studies in Improving Application Performance With Solix Database Archivi...
 
Pos03154 usen
Pos03154 usenPos03154 usen
Pos03154 usen
 
Net App Scores 100% For Midrange Storage Market Solutions
Net App Scores 100% For Midrange Storage Market SolutionsNet App Scores 100% For Midrange Storage Market Solutions
Net App Scores 100% For Midrange Storage Market Solutions
 
Presidio Data Center Practice Overview
Presidio Data Center Practice OverviewPresidio Data Center Practice Overview
Presidio Data Center Practice Overview
 

Andere mochten auch

Fotonovel.la informatica ionut_roger_elyas
Fotonovel.la informatica ionut_roger_elyasFotonovel.la informatica ionut_roger_elyas
Fotonovel.la informatica ionut_roger_elyasmgonellgomez
 
Battle of the Scales: Examining Respondent Scale Usage across 10 countries
Battle of the Scales: Examining Respondent Scale Usage across 10 countriesBattle of the Scales: Examining Respondent Scale Usage across 10 countries
Battle of the Scales: Examining Respondent Scale Usage across 10 countriesResearch Now
 
IT-as-a-Service Solutions for Healthcare Providers
IT-as-a-Service Solutions for Healthcare ProvidersIT-as-a-Service Solutions for Healthcare Providers
IT-as-a-Service Solutions for Healthcare ProvidersEMC
 
Fed fiscal monetary policy
Fed fiscal monetary policyFed fiscal monetary policy
Fed fiscal monetary policyTravis Klein
 
Contractor to client_sample
Contractor to client_sampleContractor to client_sample
Contractor to client_sampleARSEN ROCK
 
Other elasticities
Other elasticitiesOther elasticities
Other elasticitiesTravis Klein
 
Mit2 092 f09_lec04
Mit2 092 f09_lec04Mit2 092 f09_lec04
Mit2 092 f09_lec04Rahman Hakim
 
Mit2 092 f09_lec16
Mit2 092 f09_lec16Mit2 092 f09_lec16
Mit2 092 f09_lec16Rahman Hakim
 
Catching the moving targets
Catching the moving targetsCatching the moving targets
Catching the moving targetsResearch Now
 
Gestão de stocks lingua inglesa 1
Gestão de stocks lingua inglesa 1Gestão de stocks lingua inglesa 1
Gestão de stocks lingua inglesa 1Isabel Miguel
 
EMC IT's Virtual Oracle Deployment Framework
EMC IT's Virtual Oracle Deployment FrameworkEMC IT's Virtual Oracle Deployment Framework
EMC IT's Virtual Oracle Deployment FrameworkEMC
 
Flash Implications in Enterprise Storage Array Designs
Flash Implications in Enterprise Storage Array DesignsFlash Implications in Enterprise Storage Array Designs
Flash Implications in Enterprise Storage Array DesignsEMC
 
RSA MONTHLY FRAUD REPORT - September 2014
RSA MONTHLY FRAUD REPORT - September 2014RSA MONTHLY FRAUD REPORT - September 2014
RSA MONTHLY FRAUD REPORT - September 2014EMC
 
Day 8 economic issues
Day 8 economic issuesDay 8 economic issues
Day 8 economic issuesTravis Klein
 
RSA-Pivotal Security Big Data Reference Architecture
RSA-Pivotal Security Big Data Reference ArchitectureRSA-Pivotal Security Big Data Reference Architecture
RSA-Pivotal Security Big Data Reference ArchitectureEMC
 
Wed demand consumer surplus
Wed demand consumer surplusWed demand consumer surplus
Wed demand consumer surplusTravis Klein
 

Andere mochten auch (20)

Fotonovel.la informatica ionut_roger_elyas
Fotonovel.la informatica ionut_roger_elyasFotonovel.la informatica ionut_roger_elyas
Fotonovel.la informatica ionut_roger_elyas
 
Battle of the Scales: Examining Respondent Scale Usage across 10 countries
Battle of the Scales: Examining Respondent Scale Usage across 10 countriesBattle of the Scales: Examining Respondent Scale Usage across 10 countries
Battle of the Scales: Examining Respondent Scale Usage across 10 countries
 
Citophobia
CitophobiaCitophobia
Citophobia
 
IT-as-a-Service Solutions for Healthcare Providers
IT-as-a-Service Solutions for Healthcare ProvidersIT-as-a-Service Solutions for Healthcare Providers
IT-as-a-Service Solutions for Healthcare Providers
 
Fed fiscal monetary policy
Fed fiscal monetary policyFed fiscal monetary policy
Fed fiscal monetary policy
 
Contractor to client_sample
Contractor to client_sampleContractor to client_sample
Contractor to client_sample
 
Hyper-V Dynamic Memory in Depth
Hyper-V Dynamic Memory in Depth Hyper-V Dynamic Memory in Depth
Hyper-V Dynamic Memory in Depth
 
Other elasticities
Other elasticitiesOther elasticities
Other elasticities
 
Mit2 092 f09_lec04
Mit2 092 f09_lec04Mit2 092 f09_lec04
Mit2 092 f09_lec04
 
Mit2 092 f09_lec16
Mit2 092 f09_lec16Mit2 092 f09_lec16
Mit2 092 f09_lec16
 
Catching the moving targets
Catching the moving targetsCatching the moving targets
Catching the moving targets
 
Gestão de stocks lingua inglesa 1
Gestão de stocks lingua inglesa 1Gestão de stocks lingua inglesa 1
Gestão de stocks lingua inglesa 1
 
EMC IT's Virtual Oracle Deployment Framework
EMC IT's Virtual Oracle Deployment FrameworkEMC IT's Virtual Oracle Deployment Framework
EMC IT's Virtual Oracle Deployment Framework
 
Flash Implications in Enterprise Storage Array Designs
Flash Implications in Enterprise Storage Array DesignsFlash Implications in Enterprise Storage Array Designs
Flash Implications in Enterprise Storage Array Designs
 
Ppf productivity
Ppf productivityPpf productivity
Ppf productivity
 
RSA MONTHLY FRAUD REPORT - September 2014
RSA MONTHLY FRAUD REPORT - September 2014RSA MONTHLY FRAUD REPORT - September 2014
RSA MONTHLY FRAUD REPORT - September 2014
 
Day 8 economic issues
Day 8 economic issuesDay 8 economic issues
Day 8 economic issues
 
RSA-Pivotal Security Big Data Reference Architecture
RSA-Pivotal Security Big Data Reference ArchitectureRSA-Pivotal Security Big Data Reference Architecture
RSA-Pivotal Security Big Data Reference Architecture
 
Wed demand consumer surplus
Wed demand consumer surplusWed demand consumer surplus
Wed demand consumer surplus
 
2015 day 5
2015 day 52015 day 5
2015 day 5
 

Ähnlich wie White Paper: EMC Greenplum Data Computing Appliance Enhances EMC IT's Global Data Warehouse

White Paper: EMC Accelerates Journey to Big Data with Business Analytics as a...
White Paper: EMC Accelerates Journey to Big Data with Business Analytics as a...White Paper: EMC Accelerates Journey to Big Data with Business Analytics as a...
White Paper: EMC Accelerates Journey to Big Data with Business Analytics as a...EMC
 
BlueData Isilon Validation Brief
BlueData Isilon Validation BriefBlueData Isilon Validation Brief
BlueData Isilon Validation BriefBoni Bruno
 
EMC IT's Journey to Cloud : BUSINESS PRODUCTION BACKUP & RECOVERY SYSTEMS
EMC IT's Journey to Cloud : BUSINESS PRODUCTION BACKUP & RECOVERY SYSTEMSEMC IT's Journey to Cloud : BUSINESS PRODUCTION BACKUP & RECOVERY SYSTEMS
EMC IT's Journey to Cloud : BUSINESS PRODUCTION BACKUP & RECOVERY SYSTEMSEMC
 
EMC IT's Journey to Cloud : IT-AS-A-SERVICE APPLICATIONS & CLOUD EXPERIENCE
EMC IT's Journey to Cloud : IT-AS-A-SERVICE APPLICATIONS & CLOUD EXPERIENCEEMC IT's Journey to Cloud : IT-AS-A-SERVICE APPLICATIONS & CLOUD EXPERIENCE
EMC IT's Journey to Cloud : IT-AS-A-SERVICE APPLICATIONS & CLOUD EXPERIENCEEMC
 
Configuration Compliance For Storage, Network & Server
Configuration Compliance For Storage, Network & Server Configuration Compliance For Storage, Network & Server
Configuration Compliance For Storage, Network & Server EMC
 
EMC IT's Journey to Cloud : VIRTUAL DESKTOP
EMC IT's Journey to Cloud : VIRTUAL DESKTOPEMC IT's Journey to Cloud : VIRTUAL DESKTOP
EMC IT's Journey to Cloud : VIRTUAL DESKTOPEMC
 
Dell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataDell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataBlueData, Inc.
 
EMC IT's Journey to the Private Cloud: A Practitioner's Guide
EMC IT's Journey to the Private Cloud: A Practitioner's Guide EMC IT's Journey to the Private Cloud: A Practitioner's Guide
EMC IT's Journey to the Private Cloud: A Practitioner's Guide EMC
 
Dell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonDell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonGreg Kirchoff
 
Brochure : The EMC Big Data Solution
Brochure : The EMC Big Data Solution Brochure : The EMC Big Data Solution
Brochure : The EMC Big Data Solution EMC
 
Beakbane safeguards future with ERP - ready infrastructure upgrade.
Beakbane safeguards future with ERP - ready infrastructure upgrade.Beakbane safeguards future with ERP - ready infrastructure upgrade.
Beakbane safeguards future with ERP - ready infrastructure upgrade.Icomm Technologies
 
White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applianc...
White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applianc...White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applianc...
White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applianc...EMC
 
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...EMC
 
EMC IT's Journey to Cloud : IT PRODUCTION SERVER VIRTUALIZATION
EMC IT's Journey to Cloud : IT PRODUCTION SERVER VIRTUALIZATION EMC IT's Journey to Cloud : IT PRODUCTION SERVER VIRTUALIZATION
EMC IT's Journey to Cloud : IT PRODUCTION SERVER VIRTUALIZATION EMC
 
Brighttalk converged infrastructure and it operations management - final
Brighttalk   converged infrastructure and it operations management - finalBrighttalk   converged infrastructure and it operations management - final
Brighttalk converged infrastructure and it operations management - finalAndrew White
 
A Journey to Power Intelligent IT - Big Data Employed
A Journey to Power Intelligent IT - Big Data EmployedA Journey to Power Intelligent IT - Big Data Employed
A Journey to Power Intelligent IT - Big Data EmployedMohamed Sohail
 
ETL Using Informatica Power Center
ETL Using Informatica Power CenterETL Using Informatica Power Center
ETL Using Informatica Power CenterEdureka!
 
Andmekeskuse hüperkonvergents
Andmekeskuse hüperkonvergentsAndmekeskuse hüperkonvergents
Andmekeskuse hüperkonvergentsPrimend
 
Software Defined Data Center: The Intersection of Networking and Storage
Software Defined Data Center: The Intersection of Networking and StorageSoftware Defined Data Center: The Intersection of Networking and Storage
Software Defined Data Center: The Intersection of Networking and StorageEMC
 

Ähnlich wie White Paper: EMC Greenplum Data Computing Appliance Enhances EMC IT's Global Data Warehouse (20)

White Paper: EMC Accelerates Journey to Big Data with Business Analytics as a...
White Paper: EMC Accelerates Journey to Big Data with Business Analytics as a...White Paper: EMC Accelerates Journey to Big Data with Business Analytics as a...
White Paper: EMC Accelerates Journey to Big Data with Business Analytics as a...
 
BlueData Isilon Validation Brief
BlueData Isilon Validation BriefBlueData Isilon Validation Brief
BlueData Isilon Validation Brief
 
EMC IT's Journey to Cloud : BUSINESS PRODUCTION BACKUP & RECOVERY SYSTEMS
EMC IT's Journey to Cloud : BUSINESS PRODUCTION BACKUP & RECOVERY SYSTEMSEMC IT's Journey to Cloud : BUSINESS PRODUCTION BACKUP & RECOVERY SYSTEMS
EMC IT's Journey to Cloud : BUSINESS PRODUCTION BACKUP & RECOVERY SYSTEMS
 
EMC IT's Journey to Cloud : IT-AS-A-SERVICE APPLICATIONS & CLOUD EXPERIENCE
EMC IT's Journey to Cloud : IT-AS-A-SERVICE APPLICATIONS & CLOUD EXPERIENCEEMC IT's Journey to Cloud : IT-AS-A-SERVICE APPLICATIONS & CLOUD EXPERIENCE
EMC IT's Journey to Cloud : IT-AS-A-SERVICE APPLICATIONS & CLOUD EXPERIENCE
 
Configuration Compliance For Storage, Network & Server
Configuration Compliance For Storage, Network & Server Configuration Compliance For Storage, Network & Server
Configuration Compliance For Storage, Network & Server
 
EMC IT's Journey to Cloud : VIRTUAL DESKTOP
EMC IT's Journey to Cloud : VIRTUAL DESKTOPEMC IT's Journey to Cloud : VIRTUAL DESKTOP
EMC IT's Journey to Cloud : VIRTUAL DESKTOP
 
Dell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataDell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big Data
 
EMC IT's Journey to the Private Cloud: A Practitioner's Guide
EMC IT's Journey to the Private Cloud: A Practitioner's Guide EMC IT's Journey to the Private Cloud: A Practitioner's Guide
EMC IT's Journey to the Private Cloud: A Practitioner's Guide
 
Dell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonDell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with Isilon
 
Brochure : The EMC Big Data Solution
Brochure : The EMC Big Data Solution Brochure : The EMC Big Data Solution
Brochure : The EMC Big Data Solution
 
Beakbane safeguards future with ERP - ready infrastructure upgrade.
Beakbane safeguards future with ERP - ready infrastructure upgrade.Beakbane safeguards future with ERP - ready infrastructure upgrade.
Beakbane safeguards future with ERP - ready infrastructure upgrade.
 
efh-casestudy
efh-casestudyefh-casestudy
efh-casestudy
 
White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applianc...
White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applianc...White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applianc...
White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applianc...
 
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
 
EMC IT's Journey to Cloud : IT PRODUCTION SERVER VIRTUALIZATION
EMC IT's Journey to Cloud : IT PRODUCTION SERVER VIRTUALIZATION EMC IT's Journey to Cloud : IT PRODUCTION SERVER VIRTUALIZATION
EMC IT's Journey to Cloud : IT PRODUCTION SERVER VIRTUALIZATION
 
Brighttalk converged infrastructure and it operations management - final
Brighttalk   converged infrastructure and it operations management - finalBrighttalk   converged infrastructure and it operations management - final
Brighttalk converged infrastructure and it operations management - final
 
A Journey to Power Intelligent IT - Big Data Employed
A Journey to Power Intelligent IT - Big Data EmployedA Journey to Power Intelligent IT - Big Data Employed
A Journey to Power Intelligent IT - Big Data Employed
 
ETL Using Informatica Power Center
ETL Using Informatica Power CenterETL Using Informatica Power Center
ETL Using Informatica Power Center
 
Andmekeskuse hüperkonvergents
Andmekeskuse hüperkonvergentsAndmekeskuse hüperkonvergents
Andmekeskuse hüperkonvergents
 
Software Defined Data Center: The Intersection of Networking and Storage
Software Defined Data Center: The Intersection of Networking and StorageSoftware Defined Data Center: The Intersection of Networking and Storage
Software Defined Data Center: The Intersection of Networking and Storage
 

Mehr von EMC

INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDINDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDEMC
 
Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote EMC
 
EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOTransforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOEMC
 
Citrix ready-webinar-xtremio
Citrix ready-webinar-xtremioCitrix ready-webinar-xtremio
Citrix ready-webinar-xtremioEMC
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC
 
EMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lakeEMC
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereEMC
 
Pivotal : Moments in Container History
Pivotal : Moments in Container History Pivotal : Moments in Container History
Pivotal : Moments in Container History EMC
 
Mobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeMobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeEMC
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic EMC
 
Intelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityIntelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityEMC
 
The Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeThe Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeEMC
 
EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC
 
EMC Academic Summit 2015
EMC Academic Summit 2015EMC Academic Summit 2015
EMC Academic Summit 2015EMC
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesEMC
 
Using EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsUsing EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsEMC
 
Using EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookUsing EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookEMC
 
2014 Cybercrime Roundup: The Year of the POS Breach
2014 Cybercrime Roundup: The Year of the POS Breach2014 Cybercrime Roundup: The Year of the POS Breach
2014 Cybercrime Roundup: The Year of the POS BreachEMC
 

Mehr von EMC (20)

INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDINDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote
 
EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOTransforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
 
Citrix ready-webinar-xtremio
Citrix ready-webinar-xtremioCitrix ready-webinar-xtremio
Citrix ready-webinar-xtremio
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
 
EMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC with Mirantis Openstack
EMC with Mirantis Openstack
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lake
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop Elsewhere
 
Pivotal : Moments in Container History
Pivotal : Moments in Container History Pivotal : Moments in Container History
Pivotal : Moments in Container History
 
Mobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeMobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or Foe
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic
 
Intelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityIntelligence-Driven GRC for Security
Intelligence-Driven GRC for Security
 
The Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeThe Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure Age
 
EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015
 
EMC Academic Summit 2015
EMC Academic Summit 2015EMC Academic Summit 2015
EMC Academic Summit 2015
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education Services
 
Using EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsUsing EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere Environments
 
Using EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookUsing EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBook
 
2014 Cybercrime Roundup: The Year of the POS Breach
2014 Cybercrime Roundup: The Year of the POS Breach2014 Cybercrime Roundup: The Year of the POS Breach
2014 Cybercrime Roundup: The Year of the POS Breach
 

Kürzlich hochgeladen

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Kürzlich hochgeladen (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

White Paper: EMC Greenplum Data Computing Appliance Enhances EMC IT's Global Data Warehouse

  • 1. White Paper EMC Greenplum Data Computing Appliance Enhances EMC IT’s Global Data Warehouse Accelerating Big Data and Analytics Abstract This white paper illustrates a synergistic model for deploying EMC’s Greenplum Data Computing Appliance (DCA) with EMC IT’s incumbent Global Data Warehouse infrastructure. It illustrates the steps to migrate from EMC’s IT Global Data Warehouse, and the progress Greenplum DCA enabled EMC to make in addressing its business challenges of growing data (“Big Data”) and accelerated analytics. August 2011
  • 2. Copyright © 2011 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. VMware and ESX are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions. All other trademarks used herein are the property of their respective owners. Part Number H8869.1 EMC IT Accelerating Big Data and Analytics 2
  • 3. Contents Executive Summary ................................................................................................. 4 Problem Statement............................................................................................................. 4 Value Statement ................................................................................................................. 4 Audience ............................................................................................................................ 5 EMC IT’s Big Data Approach ..................................................................................... 6 EMC IT’s Information Reference Architecture....................................................................... 6 EMC IT’s Global Data Warehouse Success ...................................................................... 6 EMC IT’s Big Data and Analytics Business Drivers ........................................................... 7 Greenplum Offerings .......................................................................................................... 8 EMC Greenplum Database Software ............................................................................... 8 EMC Greenplum Data Computing Appliance (DCA) Product Family .................................. 9 Greenplum Enablers ..................................................................................................... 10 Business POC ........................................................................................................ 11 Business Challenge ...................................................................................................... 11 Business Success......................................................................................................... 12 EMC IT’s Greenplum POC Highlights ....................................................................... 13 Greenplum POC Platform Migration .................................................................................. 13 High-Level Migration POC Milestones ........................................................................... 13 How It Was Done: A POC ............................................................................................... 13 Greenplum DCA POC Deployment ................................................................................. 13 EMC IT Global Data Warehouse to Greenplum POC Migration........................................ 15 Migration Steps ............................................................................................................ 15 Business POC Benefits .......................................................................................... 25 Query Performance ........................................................................................................... 25 New Analytics Abilities ..................................................................................................... 28 EMC IT Global Data Warehouse to Greenplum POC: Lessons Learned ............................... 31 Conclusion ............................................................................................................ 33 Benefits............................................................................................................................ 33 References ....................................................................................................................... 34 EMC IT Accelerating Big Data and Analytics 3
  • 4. Executive Summary This white paper illustrates a synergistic model for deploying the Greenplum Data Computing Appliance (DCA) with EMC IT’s existing Global Data Warehouse (GDW) deployed on EMC IT’s Business Intelligence (BI) Grid infrastructure. It outlines the migration steps of moving the GDW to the DCA to solve business problems driven by growing data (“Big Data”), and the need for better, accelerated, and advanced analytics that could not be addressed by the previous Global Data Warehouse. Problem Statement As EMC has grown, so has our data. It is not unusual for us to analyze hundreds of millions of rows of data to gather insight into the health of our business. As EMC has become more complex, so has the analysis that must be executed on this Big Data. Our business is transitioning from backward looking metrics (lagging indicators) to both backward and forward looking indicators (leading indicators), and is using statistical modeling to predict the impact that EMC’s current decisions will have on our future business model. Until recently, EMC’s Global Data Warehouse had been struggling to meet the demand for Operational BI, Big Data, and advanced analytics. This resulted in shadow initiatives and labor- intensive processes to generate the information required by EMC’s business owners. For example, an EMC business unit data load on the legacy infrastructure took more than 6 days and created more than 650 batch reports that had to be reviewed by a business unit subject matter expert to create the needed information to do an analysis. (See Business POC section for details) Greenplum is perfectly designed to satisfy this need for predictive analytics on Big Data. The introduction of the Greenplum DCA will reduce the need for these shadow activities, and will create a platform for EMC IT to develop and deploy an advanced analytics appliance that will serve as a platform to develop Business Intelligence as a Service (BIaaS). Value Statement This document illustrates the specific deployment steps and resultant benefits of the integration of the Greenplum DCA with EMC’s Global Data Warehouse, including: The high-level deployment model of the Greenplum DCA into the existing EMC Global Data Warehouse infrastructure. This includes: – The phased approach for moving EMC IT’s BI functions to the Greenplum DCA via the business proof of concept (POC). EMC IT Accelerating Big Data and Analytics 4
  • 5. – Improved data loading, performance, and functionality via the use of Greenplum DCA. Example: • Ultra-fast loading -- The legacy infrastructure data load that took the business unit more than 6 days and 650 batch reports can be loaded in less than 29 minutes on the Greenplum DCA. • A query running in 427 seconds on the legacy platform decreased to just 28.85 seconds on the Greenplum DCA for 99 million rows of data. • A process that took 45 minutes on the legacy platform can now be completed in 3.46 minutes on the Greenplum DCA. – EMC IT increased its ability to use a tool set to create a dashboard and execute “on-the-fly” analytics instead of delayed batch reports. – The EMC / EMC IT “lessons learned” from the Greenplum DCA deployment. – The co-existence of both the EMC IT Global Data Warehouse (BI Grid) and the Greenplum DCA to address EMC’s data warehouse and BI challenges with Big Data and analytics to create a platform for Business Intelligence (BI) as a Service (BIaaS). Audience This white paper is intended for CIOs, data warehouse architects, Oracle architects, storage architects, Oracle DBAs, and server and network administrators. EMC IT Accelerating Big Data and Analytics 5
  • 6. EMC IT’s Big Data Approach This section identifies the EMC Data Reference Architecture and EMC IT’s current infrastructure of the EMC IT Global Data Warehouse deployment. EMC IT’s Information Reference Architecture The diagram below illustrates EMC IT’s vision of the components and process for its Information Reference Architecture: Figure 1: Information Reference Architecture model This logical architecture shows the need to have both the current Oracle Data Warehouse and the DCA be part of EMC IT’s Global Data Warehouse. EMC IT’s Global Data Warehouse Success This consolidated Oracle Global Data Warehouse deployed on EMC IT’s BI-Grid infrastructure (Oracle RAC) has delivered the following benefits for EMC and EMC IT (please review the Global Data Warehouse white paper in the References section for details): • Consolidation of five data warehouses to one unified Global Data Warehouse infrastructure EMC IT Accelerating Big Data and Analytics 6
  • 7. 10 times improvement in query performance from Day 1 • 180 percent improvement in batch job performance • 50 percent reduction in data mart update times • Two to three times performance improvement in reporting cube build times • 200 percent improvement in dashboard rendering and drill-down performance EMC IT’s Big Data and Analytics Business Drivers Even with the greatly improved performance provided by the enterprise Oracle Data Warehouse, EMC IT was still not keeping up with: • Data explosion • Shadow systems sprawl • Shrinking load windows • Increasing complexity of the analytics required by the business • Volumes of data required for predictive analytics and data mining • Explosion of point solutions and shadow marts in an attempt to remediate the issues Figure 2 is a high-level illustration of the challenges the data warehouse did not address: Figure 2: EMC’s Big Data business drivers EMC IT Accelerating Big Data and Analytics 7
  • 8. Figure 2 illustrates the need to integrate the Greenplum DCA into the Global Data Warehouse infrastructure to address the business drivers that the Global Data Warehouse could not solve. These business drivers caused EMC IT to look at a new platform to deal with Big Data and analytics. EMC IT selected the Greenplum DCA for its ability to: • Analyze data and information in real time • Apply analytics and discern critical information • Unlock the value of information from this Big Data Greenplum Offerings The following section gives a high-level overview of the Greenplum Database and the Greenplum DCA, and their ability to accelerate business value for EMC IT Big Data and analytics drivers. EMC Greenplum Database Software Built to support the next generation of Big Data warehousing and large-scale analytics processing, the EMC Greenplum Database manages, stores, and analyzes terabytes to petabytes of data. Users experience 10 to 100 times better performance over traditional RDBMS products – a result of the DCA’s shared-nothing MPP architecture, high-performance parallel dataflow engine, and advanced gNet software interconnect technology. The Greenplum Database was conceived, designed, and engineered to enable customers to take advantage of large clusters of increasingly powerful, increasingly inexpensive commodity servers, storage, and Ethernet switches. EMC Greenplum customers can gain immediate benefit from deploying the latest commodity hardware innovations. Greenplum’s shared-nothing architecture is optimal for fast queries and loads because processors are placed as close as possible to the data itself for faster operations with the maximum degree of parallelism possible. EMC IT Accelerating Big Data and Analytics 8
  • 9. Figure 3: Greenplum DCA high-level architecture EMC Greenplum Data Computing Appliance (DCA) Product Family The Greenplum Data Computing Appliance family is a group of purpose-built, highly scalable, parallel data warehousing appliances that architecturally integrate database, compute, storage, and network into a single, easy-to-manage enterprise-class system. Designed for rapid analysis of data volumes scaling into petabytes, the Greenplum DCA is a powerful platform for unifying business intelligence and advanced analytics enterprise-wide. The Greenplum DCA has been proven to provide the industry’s fastest data loading capabilities, and, in turn, to rapidly deliver real business value to customers by reducing investments of money, time, and effort. The Greenplum DCA product line is available in two models: the Standard DCA, which supports up to 144TB of compressed data; and the High Capacity DCA, which supports up to 496TB of compressed data. Regardless of your data computing requirements, there is a model in the DCA family to meet your needs. The Greenplum DCA family is offered in configurations ranging from a quarter-rack to multiple-rack appliances to achieve the maximum flexibility and scalability for organizations faced with terabyte to petabyte scale data opportunities. Because these appliances support the same state-of-the-art EMC Greenplum Database as their core, switching between the models can be effortless when a business grows and its requirements change. The Greenplum DCA family provides a rapidly deployable, scalable, and cost-effective infrastructure so you can manage and analyze exploding data volumes while increasing performance and achieving greater business agility. EMC IT Accelerating Big Data and Analytics 9
  • 10. Greenplum Enablers Greenplum is a strategic platform for EMC IT. At a high level, it provides the following benefits for EMC IT’s vision of Big Data and analytics: • Ultra-fast loads and linear scalability • Data aggregation with common taxonomy • In database ad-hoc analytics • Predictive analysis and proactive identification EMC IT Accelerating Big Data and Analytics 10
  • 11. Business POC The EMC Customer Quality business unit focuses on EMC's customers, partners, and internal stakeholders, looking at quality in the broadest sense, including products, services, innovations, interactions, and processes. The following section highlights the EMC Corporate Quality business unit’s challenge using the current Oracle infrastructure. Business Challenge Prior to deploying the Greenplum DCA, the analytics process would begin with a 31-hour data load preparation process followed by a 4-to-5-day batch report run. At the conclusion of this process, Customer Quality would begin looking at product quality. Customer Quality would then spend a week or more creating various reliability analyses. They would use other offline toolsets and spreadsheets, as well as many hours of labor-intensive analysis, to generate the root cause analysis of any identified issues and to predict expected future product quality. The diagram below illustrates the process used on the EMC IT Global Data Warehouse and stand-alone reporting instance: Figure 4: Previous Customer Quality process EMC IT Accelerating Big Data and Analytics 11
  • 12. Business Success With hundreds of millions of records to sort through, obtaining the insight from the data using the existing platform was previously a time-consuming process. Now, with the Greenplum DCA, the Customer Quality business unit processes hundreds of millions of records in minutes. The diagram below illustrates the process used on the Greenplum DCA: Figure 5: New Customer Quality process with Greenplum With the Greenplum DCA, Customer Quality can perform quality analysis of all the products, versus only one product at a time in the Oracle environment. Customer Quality can now drive business and customer value by using Greenplum’s speed. EMC IT Accelerating Big Data and Analytics 12
  • 13. EMC IT’s Greenplum POC Highlights The following section outlines the highlights of the Greenplum POC platform migration and Greenplum POC architecture. Greenplum POC Platform Migration These sub-sections detail the high-level migration milestones and POC approach to migrate to the Greenplum DCA. They highlight the discovery and phased migration of a POC project to gain experience and knowledge on Greenplum, and describe how it could coexist with the existing Global Data Warehouse infrastructure. High-Level Migration POC Milestones The high-level migration milestones were: • October 2010 – January 2011: Proof-of-concept (POC) Greenplum DCA equipment procured and configured. • February – March 2011: Initial load tests run on the POC environment. • April – May 2011: POC Greenplum DCA and initial procurement used to build a development (dev) and test POC environments. Introducing the Greenplum DCA in EMC IT's Global Data Warehouse infrastructure required planning, a knowledgeable IT staff, and business owners. How It Was Done: A POC EMC IT first got its knowledge and experience of deploying and migrating EMC Customer Quality data by migrating a few legacy data marts to the Global Data Warehouse infrastructures. The Customer Quality data was then migrated over to the Greenplum DCA for the POC to prove that the Greenplum DCA could address the real- world advanced analytics problem of the Customer Quality organization. Greenplum DCA POC Deployment The deployment of EMC IT’s Greenplum DCA for this POC included the introduction of the following hardware into the EMC IT Global Data Warehouse infrastructure: EMC IT Accelerating Big Data and Analytics 13
  • 14. Figure 6: EMC IT’s Greenplum DCA POC hardware layout Table 1 shows the components of the separately deployed Greenplum appliances for the POC infrastructure: Components GP1000 (Full Rack) DCA GP100 (Half Rack) DCA Operating system Red Hat Enterprise Linux 5 Red Hat Enterprise Linux update 5 (RHEL 5.5). 5 update 5 (RHEL 5.5). RDBMS Software Greenplum 4.0.3 Greenplum 4.0.3 Segment Servers 16 8 Processors (per (2) six-core (2.93GHz) (2) six-core (2.93GHz) segment server) 192 total cores 96 total cores Memory (per segment 48GB (768GB total) 48GB (384GB total) server) Storage (hard drives (12) 600GB 15K SAS (12) 600GB 15K SAS per segment server) Usable Capacity 36TB 18TB (uncompressed) Usable Capacity 144TB 73TB (compressed) Master Servers 2 2 Processors (per (2) six-core X5680 (3.33 (2) six-core X5680 (3.33 master server) GHz) 12 total cores GHz) 12 total cores EMC IT Accelerating Big Data and Analytics 14
  • 15. Memory (per master 48GB 48GB server) Storage (hard drives (6) 600GB 10K SAS (6) 600GB 10K SAS per master server) Scan Rate 24GB/s 12GB/s Data Load Rate 10TB/Hr 5TB/Hr Table 1: Greenplum DCA POC components EMC IT Global Data Warehouse to Greenplum POC Migration The following section details the method EMC IT used to migrate the EMC Global Data Warehouse to the Greenplum DCA platform. Migration Steps Here are the steps for migrating EMC IT’s Oracle Global Data Warehouse to the EMC IT POC Greenplum DCA: Step 1. Assess the current Oracle environment. 1. Prepare a list of objects of the Oracle Global Data Warehouse (GDW) The total database size in Oracle is 15TB. Below are all application-related objects (counts) in the current Oracle environment. Number of Object Types Objects TABLES 3,634 TABLE PARTITIONS 485 TABLE SUBPARTITIONS 732 INDEXES 4,424 INDEX PARTITIONS 5,823 INDEX SUBPARTITIONS 1,924 MATERIALIZED VIEWS 34 LOB SEGMENTS 27 TRIGGERS 1,492 PACKAGES 70 PACKAGE BODY 55 PROCEDURES 194 EMC IT Accelerating Big Data and Analytics 15
  • 16. SEQUENCES 720 Constraints 16,140 Packages 61 Number of rows ~20 billion rows Procedures 191 Tables related to CQ 394 Table 2: Oracle Global Data Warehouse (GDW) objects 2. Find the objects that do not need to be migrated to Greenplum: a. Materialized views b. Indexes – due to MPP architecture performance gains, indexes will not be needed Test to see if it is necessary to migrate certain objects, such as materialized views and indexes. 3. Prepare a list of schemas, with the size of the objects to be migrated. 4. Prepare the list of objects to be migrated to Greenplum. Object type Count Comments TABLES and data 3,634 Tables and its data to be migrated TABLE Created two versions: non-partitioned and PARTITIONS 485 then partitioned TABLE SUBPARTITIONS 732 Tested all the options INDEXES 4,424 Not migrated INDEX PARTITIONS 5,823 Not migrated INDEX SUBPARTITIONS 1,924 Not migrated MATERIALIZED VIEWS 34 Not migrated TRIGGERS 1,492 Not part of the POC PACKAGES 70 Not part of the POC PACKAGE BODY 55 Not part of the POC PROCEDURES 194 Not part of the POC Required for incremental loading. Not part of SEQUENCES 720 this POC Table 3: Oracle Migrate/No Migrate object list Step 2. Assess data types in the Oracle database. Below are the data types and the number of columns for each. EMC IT Accelerating Big Data and Analytics 16
  • 17. Data types Counts VARCHAR2 74,296 NUMBER 38,393 DATE 16,999 CHAR 845 FLOAT 564 TIMESTAMP 46 NCHAR 1 NVARCHAR 25 Table 4: Oracle Data Types 1. Find the right equivalent data type available in Greenplum. Below are the different data types that are part of the tables to be migrated and their equivalent Greenplum (postgres) data types. Data types Greenplum equivalent data type VARCHAR2, NVARCHAR2 VARCHAR NUMERIC, NUMERIC (p,s), SMALLINT (2 NUMBER bytes), INTEGER (4 bytes), BIGINT (8 bytes), DATE DATE or TIMESTAMP CHAR, NCHAR CHAR NUMERIC (p,s), DECIMAL (p,s), REAL (4 FLOAT bytes), DOUBLE (8 bytes), TIMESTAMP TIMESTAMP Table 5: Greenplum data type The following select statement assists in finding the precision and scale used for the NUMBER data types. It assumes statistics are up to date in Oracle: select length(utl_raw.cast_to_number(HIGH_VALUE)) from all_tab_columns where table_name='TABLE_NAME' and column_name='COLUMN_NAME' Step 3. Identify and assess migration tools and their requirements. Migration tools considered: • Informatica • Migration scripts • Ora2pg freeware tool • Web external tables • Third-party utilities EMC IT Accelerating Big Data and Analytics 17
  • 18. Step 4. Prepare and deploy Greenplum DCA. 1. Site-specific requirements for DCA deployment. a. IP address on network for master, standby server, and network access. b. Choose passwords for root and gpadmin, and password for remote console access (iDRAC). c. Preferred locale, time zone for DCA servers. d. Preferred database character set encoding. 2. Installing Greenplum Data Computing Appliance (DCA). a. Appliance: Shipping state from Manufacturing. i. Appliance comes with racked, stacked, and cabled. ii. Software pre-installed, minimum on-site configuration required. 3. Initial deployment. a. Supply power to rack; inspect and validate rack hardware and cabling. b. Configure access to network (master and standby only). c. Refresh software if needed; run validation check. d. Configure locale, timezone, and NTP on all DCA servers. e. Run gpinitsystem to initialize database. f. Initialize standby database. g. Enable Greenplum performance monitor. h. Configure and connect EMC as needed. The installation time was only 8 hours for each DCA deployed (half and full rack) for a total of 16 hours. EMC IT Accelerating Big Data and Analytics 18
  • 19. Step 5. Prepare Greenplum Database Environment in DCA. 1. Create a database in Greenplum DCA. Database properties will be inherited from template1 that was created during the DCA environment setup. Creating a database in Greenplum is as simple as shown in this screen shot: Figure 7: Greenplum database creation 2. Preparing schemas. a. List all the schemas to be created, and create schemas in Greenplum using the below syntax. Figure 8: Greenplum database schemas EMC IT Accelerating Big Data and Analytics 19
  • 20. Step 6. Convert DDL from Oracle to Greenplum. 1. Converting DDL from Oracle to Greenplum involves: a. Reading the table properties from the current Oracle environment. b. Converting the create table syntax from Oracle to Greenplum. c. Converting each data type in table definition from Oracle to Greenplum. d. Choosing the right distribution key. 2. Identify the distribution keys. EMC IT chose the distribution keys based on these criteria: a. If a table contains a primary key in Oracle, consider it a distribution key in Greenplum. b. If a table in Oracle contains no primary key but a unique key exists, then consider a unique key with fewer columns for a distribution key in Greenplum. c. Choose Random distribution when the table does not have a candidate column for the distribution key and the table is small. 3. No tool is perfect when it comes to converting the DDL from Oracle to any other MPP database. a. EMC IT leveraged a DDL conversion utility, which connects to Oracle and converts table DDLs for one schema, multiple schemas, or a single table. The EMC IT DDL conversion utility usage is as follows: Usage: generate_gp_tables.sh -s <schema names separated by comma> -r <Optional:remap schema> -p <Optional:Path> There are three switches in this script. -s is mandatory and EMC IT needs to specify the schema name. The EMC IT script can specify multiple schemas separated by comma. The other two switches are optional. -p switch is path for generating ddl scripts. If this is not specified, ddl scripts will be generated in the current directory. -r is for remap schema option. E.g.: #:/apps/oradump2/dump2/gpdump/scripts:oraedw1> ./generate_gp_tables.sh -s extract,masterdata -r extract_data,mstr_data EMC IT Accelerating Big Data and Analytics 20
  • 21. Directory created. PL/SQL procedure successfully completed. PL/SQL procedure successfully completed. Directory dropped. #:/apps/oradump2/dump2/gpdump/scripts:oraedw1> ls -ltr total 496 -rwxr-x--- 1 oracle dba 9861 Jun 6 01:50 generate_gp_tables.sh -rw-r--r-- 1 oracle dba 434555 Jun 6 02:04 create_gptables_EXTRACT_DATA.sql -rw-r--r-- 1 oracle dba 39587 Jun 6 02:04 create_gptables_MSTR_DATA.sql The above process illustrates converting DDLs from Oracle to Greenplum for two schemas (Extract and MasterData). Step 7. Create table structures in Greenplum. 1. Copy converted DDL files to the Greenplum master server script location. 2. Execute the script in Greenplum to create the DDL of all tables in the Greenplum database. E.g.: psql -d gpcqdb -f create_gptables_EXTRACT_DATA.sql -L create_gptables_EXTRACT_DATA.log -o create_gptables_EXTRACT_DATA.out Step 8. Create a gold environment as a data migration source. 1. Create a gold environment with migration scripts to perform the clone from the production Oracle Global Data Warehouse. This option will have no impact on the production database during migration of the data. The script: a. Shuts down the gold Oracle database and unmounts the Oracle asm diskgroups. b. Recreates the clone using EMC Timefinder technology. c. Keeps the Production Oracle instance in backup mode. d. Activates the created Timefinder clone. e. Takes the Oracle production instance out of backup mode. f. Mounts asm diskgroups on the gold environment and brings up the Oracle database on the gold environment to use as the Data Migration source. Step 9. Leverage unloading and loading scripts from Oracle to Greenplum. The challenge involved in EMC IT’s environment is to be able to migrate data from more than 3,600 tables, with approximately 132,000 columns. Each column of data should be able to be identified uniquely. EMC IT Accelerating Big Data and Analytics 21
  • 22. 1. Unloading involves: a. Identifying the schemas and tables to be unloaded. b. Establishing the connection to the Oracle database and ASM. c. Choosing the type of unload, such as direct or conventional. d. Choosing the column separator (E.g.: ;). e. Choosing the Record separator (E.g.: n). f. Choosing the Encloser (E.g. “) . g. Choosing the Escape Enclose (E.g. ). 2. Loading to Greenplum involves: a. A YAML file for each table. An example YAML file is below. YAML is a human-friendly data serialization standard for all programming languages. In this example, this YAML file had the information about the target Greenplum environment (database name, user, hostname, port, input source information, column identifications in the files, and the delimiters used). VERSION: 1.0.0.1 DATABASE: gpcqdb USER: gpadmin HOST: gpserver PORT: 5432 GPLOAD: INPUT: - SOURCE: FILE: - /tmp/gp/W_TASK_NOTE_D.dat. - COLUMNS: - ROW_WID: numeric - W_SESSION_NUMBER: numeric - TASK_WID: numeric - NOTE_ID: numeric - NOTE_CRTE_BY_NM: text - NOTE_CRTE_BY_RSRC_ID: numeric - NOTE_CRTE_DT: timestamp - TASK_ID: numeric - NOTE_SUBJ: text - NOTE_TYPE_WID: numeric - NOTE_VISABILITY_FLG_WID: numeric - W_LAST_UPDATED_BY: text - W_INSERT_DT: timestamp - W_UPDATE_DT: timestamp - W_ROW_HASH: text - W_REPROCESSED_DT: timestamp - FORMAT: text - DELIMITER: ';' - NULL_AS: '' EMC IT Accelerating Big Data and Analytics 22
  • 23. - ESCAPE: '' - ERROR_LIMIT: 10000 - ERROR_TABLE: GS_ERR.w_task_note_d_err OUTPUT: - TABLE: GS.w_task_note_d - MODE: INSERT b. An input data source; a data file, pipe, or socket. c. Gpload commands for all the tables involved to load the data from the input source to the Greenplum target database. Here is an example gpload command: gpload.py -f "/tmp/W_TASK_NOTE_D.ctl" -h gpserver -p 5432 - d gdcqdb -U gpadmin -l "/tmp/W_TASK_NOTE_D.log" Step 10. Migrate data from Oracle to Greenplum. 1. Create sockets; unload the data using unload scripts, which unloads the data directly from Oracle ASM in parallel into PIPLE/SOCKET; compress the data; transfer to Greenplum servers; uncompress, and load data to Greenplum using the gpload utility. Below are example commands to perform the data migration from Oracle to Greenplum using pipe/socket. 1. On Oracle server: mknod /tmp/TABLE.dat p 2. On Oracle server: gzip < /tmp/TABLE.dat | ssh -q gpadmin@gpmasterhost "gunzip >/tmp/TABLE.dat " & 3. On Greenplum DCA: mknod /tmp/TABLE.dat p gpload.py -f "/tmp/TABLE.ctl" -h gpmasterhost -p 5432 -d gpdb -U gpadmin -l "/tmp/TABLE.log" 4. On the Oracle server, start the unload process: a. Using this approach, it starts loading data into Greenplum as it unloads data from Oracle without the need for any intermediary storage. You can also unload the data to an ASCII text file, read the data from this text file, and load into the Greenplum database. This approach requires the intermediary storage to hold the ASCII text files. EMC IT Accelerating Big Data and Analytics 23
  • 24. After the data has been migrated, the storage space needed on Greenplum is only 50 percent of the storage space needed in Oracle because it eliminates the need for most of the indexes and materialized views. Here is a high-level illustration of steps 8 – 10: Figure 9: Oracle to Greenplum high-level migration steps EMC IT Accelerating Big Data and Analytics 24
  • 25. Business POC Benefits The following are illustrative examples of the benefits of using the Greenplum DCA. Query Performance Here are a few examples of queries that were executed in the Oracle environment, and the queries and reports on the Greenplum DCA. Figure 10: Oracle versus Greenplum Query 1 example EMC IT Accelerating Big Data and Analytics 25
  • 26. As the diagram illustrates, a query that took 7 minutes, 7 seconds to execute in the Oracle environment now executes in 28.85 seconds (28855 ms) on more than 99 million rows. Oracle sample Query 2: EMC IT Accelerating Big Data and Analytics 26
  • 27. Greenplum sample Query2: Figure 11: Oracle versus Greenplum Query 2 example EMC IT Accelerating Big Data and Analytics 27
  • 28. Figure 12: Oracle versus Greenplum Query 1 and Query 2 example The query graph illustrates that Query 1 was run in 427 seconds in Oracle and in 28 seconds on Greenplum DCA. Query 2 on Oracle executed in 144 seconds and 13 seconds on Greenplum. New Analytics Abilities Prior to the introduction of the Greenplum DCA, all analysis was done via batch reports. In the POC, the performance of the Greenplum DCA gave Customer Quality the ability to use a tool set to create a dashboard and execute on-the-fly analytics that was not available in the Oracle environment. With the Greenplum DCA in place, Customer Quality can now process large data sets in record time. Here are examples that illustrate the ability to view Big Data in a dashboard and do analytics on-the-fly via dashboard tools such as SAS or other third-party BI tools. EMC IT Accelerating Big Data and Analytics 28
  • 29. Figure 13: Big Data analysis dashboard Greenplum now enables Customer Quality to provide management with a dashboard environment depicting many views of a product analysis, and, most importantly, to proactively address an issue before it is seen by the customer. EMC IT Accelerating Big Data and Analytics 29
  • 30. Figure 14: Dashboard view of Customer Quality’s Big Data Customer Quality can now quickly analyze across a broad range of data to enable proactive remediation of quality concerns and ensure continued customer satisfaction. The time saved using Greenplum has allowed enhanced predictive analytics to be performed, which will provide even more value internally and for EMC’s customers. EMC IT Accelerating Big Data and Analytics 30
  • 31. Figure 15: Greenplum enablers As the diagram illustrates, Greenplum gives accelerated insight into: • Predictive models • Data mining • Dashboards • Dynamic query reports EMC IT Global Data Warehouse to Greenplum POC: Lessons Learned Here are highlights of the lessons learned about Oracle from this migration. Challenge Resolution Greenplum expertise Greenplum reduces the complexity of Big Data analytics, enabling a simple retraining of the current EMC IT Oracle Data Warehouse team. Data type conversions There is no perfect tool for converting the data types from Oracle to any MPP database. The majority of the numeric values in Oracle were defined as NUMBER data type without precision and scale. To better manage storage space, we experimented with a few conversion options. Our testing showed the following as the best option: 1. Converted all the Oracle NUMBER columns to EMC IT Accelerating Big Data and Analytics 31
  • 32. Greenplum NUMERIC data type. a. NUMERIC data type consumes a minimum of 7 bytes in Greenplum and can expand if the data is larger. This has been a very good resolution so far. We had to remediate the load failures. b. There is still a chance to save some space where there are columns that do not need 7 bytes of storage space. Greenplum native INT data types are more storage-efficient and perform better. This requires expensive 100 percent stats collection on Oracle or a lot of expertise on thousands of columns of space requirement. Data distribution The distribution policy determines how to divide the rows of a table across the Greenplum segments. 1. Used hash distribution and a primary key as the distribution key where a primary key exists for a table in Oracle. 2. Used hash distribution and unique index as a distribution key where a primary key does not exist and a unique index exists. 3. Used random distribution for tables with no primary key and no unique index. Staging space for For accelerated data movement, we leveraged a mounted NAS unload files device between Oracle and Greenplum. (for 10TB database) Rather than using the compression algorithm during unload and the un-compression algorithm for load, we used PIPE to unload data with no compression and loaded to Greenplum via PIPE. OBIEE integration with EMC IT used the pgsqlODBC driver to successfully connect Greenplum OBIEE with the Greenplum Database; it generates the SQL we wanted and expected. We also leveraged UDFs (user defined functions) in Greenplum to replicate some Oracle built-in functions. Generating a DDL from There is no perfect tool out there. The big challenge is Oracle to Greenplum generating the DDL with the right distribution policy, right distribution key, and correct data type. Working with the experts at Greenplum, we leveraged a set of methodologies and scripts to convert the DDL from Oracle to Greenplum. Table 6: Lessons learned in EMC IT’s migration from Global Data Warehouse to Greenplum EMC IT Accelerating Big Data and Analytics 32
  • 33. Conclusion This white paper illustrated a synergistic model for deploying the Greenplum DCA with the existing EMC IT Global Data Warehouse (BI Grid infrastructure), showing the high- level steps to migrate, business challenges the EMC’s Customer Quality business unit experienced in the migration from the Global Data Warehouse to Greenplum, and the improvements seen in Big Data and analytics processing with the Greenplum DCA for EMC IT. Benefits – Greenplum DCA reduces the need for labor-intensive offline analysis with dramatic results: • Customer Quality has seen a dramatic reduction in processing time. • Query examples that show a dramatic reduction in query execution time: • Query 1 example from 427 seconds executing in the Oracle environment down to 28.85 seconds for 99 million rows with the DCA. • Query 2 example from 144 seconds executing in the Oracle environment down to 13 seconds with the DCA. • Data load example from 6 days on the legacy platform to 29 minutes on the Greenplum DCA. • Introduction of third-party BI tools to accelerate visualization of Big Data and analytics. – EMC IT created a platform to develop and deploy a Big Data and analytics appliance, and to use as a future platform for developing Business Intelligence as a Service (BIaaS). EMC IT Accelerating Big Data and Analytics 33
  • 34. References The links below will direct you to detailed information on the following subjects: EMC IT’s Global Data Warehouse EMC IT's Migration to the Open, Expandable Oracle BI Grid - Applied Technology Greenplum Database and Data Computing Appliance www.emc.com/bigdata Greenplum DCA Data Sheet Greenplum DCA Data Sheet EMC Greenplum Data Computing Appliance architecture EMC Greenplum Data Computing Appliance architecture SAS interoperability with EMC Greenplum database SAS interoperability with EMC Greenplum database Acknowledgments The author thanks EMC IT’s Global Data Warehouse, Greenplum, and the Customer Quality teams for assistance in creating this white paper. EMC IT Accelerating Big Data and Analytics 34