© Copyright 2012 EMC Corporation. All rights reserved.   1
Integrated Analysis of Structured and Unstructured Data, with Application Cases
Greenplum
Enable Big Data Analytics

邱垂吉 Jimmy Chiu
Technical Consultant, EMC Greenplum Taiwan




Volume, Variety, Velocity, Value + Complexity

[Figure: "Big Data" at the center, surrounded by Velocity, Volume, Variety, and Complexity. Data types shown: Documents, Transactional Data, Smart Grid, Images, Audio, Text, Video. Outcomes shown: new insights on customers, products, and operations; contextual and location-aware delivery to any device.]

      • Volume: data volumes approaching multiple petabytes
      • Velocity: data being generated and ingested for analysis in real time
      • Variety: tabular, documents, e-mail, metering, network, video, image, audio
      • Complexity: different standards, domain rules, and storage formats per data type

Source: Gartner, March 2011



Sample Big Data Scenarios
  • Loan processing in banking
  • Auto insurance in P&C insurance
  • Smart grid analytics in utilities/energy
  • Proactive emergency response in healthcare
  • Video analytics in retail
  • Real-time statistical process control in manufacturing




Big Data Analytics For Competitive Advantage

[Figure: two value chains side by side, with three questions between them.]

Today's Business Model: Suppliers, Manufacturing, Inventory, Physical Assets, Distribution, Services, Mass Marketing, Customers

Big Data Analytics Business Model: Suppliers, Manufacturing, Inventory, Physical Assets, Distribution, Services, Personal Marketing, Additional Profits, Customers

Questions shown between the two models:
  • Who are my most valuable customers?
  • What are my most important products?
  • What are my most successful campaigns?



Big Data meets Fast Data

Social and Personal – Every Minute:
  • Google gets more than 2 million search queries
  • About 47,000 people download an app
  • Some 100,000 tweets hit Twitter
  • Almost 300,000 people log on to Facebook

Business and Transactional:
  • CERN (European Organization for Nuclear Research) generates 40 TB/sec of scientific data
  • Wal-Mart: 1 million transactions per hour
  • The world's top trading systems currently trade in under 50 microseconds
  • New York Stock Exchange generates 1 TB of new trading data daily



Working together, they enable entirely
 New Business Models
                                                          Big Data allows you to find
                                                          opportunities you didn’t know
                                                          you had.
                                                          Fast Data allows you to respond
                                                          to opportunities before they are
                                                          gone.

                                                         In the Financial Services
                                                         Industry, large quantities of
                                                         historical data need to be
                                                         processed against a growing number
                                                         of fast-moving data feeds.

                                                         Batch processing is no longer a
                                                         suitable solution!



Effective Customer Segmentation is all
 about blending Structured and
 Unstructured Data




       – Transaction data (structured data) tells you what the customer
         did.
       – Unstructured data can tell you why they did it, why some others
         did not, what else they need or want, and what problems they may
         have.
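
The blending idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample transactions, the feedback texts, the `COMPLAINT_WORDS` keyword list, and the 100.0 spending threshold are all hypothetical, and simple keyword matching stands in for real text analytics.

```python
from collections import defaultdict

# Hypothetical structured data: (customer_id, purchase_amount)
transactions = [(1, 120.0), (2, 15.0), (1, 80.0), (3, 200.0)]

# Hypothetical unstructured data: free-text feedback per customer
feedback = {
    1: "Love the product, but delivery was slow",
    2: "Too expensive, considering a cheaper brand",
    3: "Great service and fast delivery",
}

# Illustrative keyword list; a real system would use proper text analytics
COMPLAINT_WORDS = {"slow", "expensive", "broken"}


def spend_by_customer(rows):
    """Structured side: what the customer did (total spend)."""
    totals = defaultdict(float)
    for customer_id, amount in rows:
        totals[customer_id] += amount
    return dict(totals)


def has_complaint(text):
    """Unstructured side: a crude signal for why a customer may leave."""
    words = {word.strip(",.!?").lower() for word in text.split()}
    return bool(words & COMPLAINT_WORDS)


def segment_customers(rows, notes, high_value=100.0):
    """Combine both signals into one segment label per customer."""
    segments = {}
    for customer_id, total in spend_by_customer(rows).items():
        value = "high-value" if total >= high_value else "low-value"
        mood = "at-risk" if has_complaint(notes.get(customer_id, "")) else "satisfied"
        segments[customer_id] = f"{value}/{mood}"
    return segments


segments = segment_customers(transactions, feedback)
```

Neither source alone separates customer 1 from customer 3: both spend the same amount, but only the free text shows that customer 1 has a complaint.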

Big Data Architecture Requirements

"Solving Big Data challenge involves more than just managing volumes of data." (Gartner)

  • Multiple data types: structured, semi-structured,
    unstructured
  • Integrated data stores: real-time, traditional,
    data warehouse
  • Modern development tools: Java, lightweight
    messages, mobile-enabled
  • Cloud-enabled: elastic scale, self-healing



          Beware point solutions – integration is critical!

Greenplum Overview




Greenplum Product Line




Architecture of Greenplum
Flexible framework for processing large datasets

  • SQL and MapReduce: process large datasets with support for both SQL and MapReduce
  • Master servers: optimize queries for the most efficient query execution
  • Interconnect: continuous pipelining of data processing
  • Segment servers: process queries close to the data, in parallel
  • MPP Scatter/Gather streaming: fast loading of data
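
The master/segment split described on this slide can be illustrated with a small Python sketch of the general scatter/gather pattern. It is not Greenplum's implementation: the segment count, the CRC32-based routing, the thread pool standing in for segment servers, and all function names are assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_SEGMENTS = 4  # hypothetical cluster size


def segment_for(key):
    """Pick a segment by hashing the distribution key (stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SEGMENTS


def scatter(rows):
    """Route each row to the segment that owns its key."""
    segments = [[] for _ in range(NUM_SEGMENTS)]
    for customer, amount in rows:
        segments[segment_for(customer)].append((customer, amount))
    return segments


def local_sum(segment_rows):
    """Segment-side work: aggregate only the rows stored locally."""
    totals = Counter()
    for customer, amount in segment_rows:
        totals[customer] += amount
    return totals


def gather(partials):
    """Master-side work: merge the partial results from every segment."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values per key
    return dict(merged)


def total_by_customer(rows):
    segments = scatter(rows)
    # Threads stand in for independent segment servers working in parallel.
    with ThreadPoolExecutor(max_workers=NUM_SEGMENTS) as pool:
        partials = list(pool.map(local_sum, segments))
    return gather(partials)


rows = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 1)]
result = total_by_customer(rows)
```

Because every row for a given key lands on the same segment, each segment can aggregate without talking to the others, and the master only merges small partial results.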




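Greenplum MPP Shared-Nothing Architecture

[Figure: three database architectures compared.]
  • Share everything (e.g., Unix server): one DB instance on one set of disks
  • Share disk (e.g., Oracle RAC): several DB instances on an intranet, sharing one SAN over SAN/FC
  • Share nothing, MPP (e.g., Greenplum): a master connected over an intranet to several DB instances, each with its own disk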




Benefits of the Greenplum Database Architecture
  • Simplicity
           – Parallelism is automatic – no manual partitioning required
           – No complex tuning required – just load and query
           – High availability (HA)
           – Best-of-breed x86 and Ethernet networking technologies

  • Scalability
           – Linear scalability
           – Each node adds storage, query performance, and loading performance

  • Flexibility
           – Full parallelism for SQL92, SQL99, SQL2003 OLAP, and MapReduce
           – Any schema (star, snowflake, 3NF, hybrid, etc.)
           – Rich extensibility and language support (Perl, Python, R, C, etc.)
           – Structured, semi-structured, and unstructured data
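
As a rough illustration of what "linear scalability" means for a full table scan, the sketch below uses an idealized model in which every node scans its own share of the data at the same rate. The function name and the sample numbers are hypothetical, and the model ignores data skew, interconnect overhead, and query planning time.

```python
def ideal_scan_seconds(data_gb, nodes, gb_per_sec_per_node):
    """Idealized scan time when data is spread evenly across all nodes."""
    if nodes <= 0 or gb_per_sec_per_node <= 0:
        raise ValueError("nodes and per-node rate must be positive")
    return data_gb / (nodes * gb_per_sec_per_node)


# Hypothetical numbers: 1,000 GB of data, 1 GB/s scan rate per node
ten_nodes = ideal_scan_seconds(1000, 10, 1.0)
twenty_nodes = ideal_scan_seconds(1000, 20, 1.0)
```

Under these assumptions, doubling the node count halves the scan time, which is the behavior the slide describes as each node adding query performance.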




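Greenplum and Hadoop

[Figure: analytics across three kinds of data.]
  • Structured: ERP/CRM
  • Semi-structured: machine data, logs
  • Unstructured: images/sound

Workloads shown at the two ends of the figure:
  • Structured side: ad-hoc analysis, dynamic data
  • Unstructured side: batch reporting on static data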


Big Data Analytics
The Power of Data Co-Processing

[Figure: layered platform stack.]
  • Greenplum Chorus: Analytic Productivity & Tool Integration
  • Data Access And Query: SQL, MapReduce, SAS, MADlib, Mahout, R, and others
  • Greenplum Database: SQL Engine For Structured Data
           – In-database Advanced Analytics
           – Extreme performance on commodity hardware
  • Greenplum Hadoop: MapReduce Engine For Unstructured Data
           – Enterprise-ready Apache Hadoop
           – Faster, more dependable, and easier to use
  • Parallel data exchange between Greenplum Database and Greenplum Hadoop
  • Network: Parallel Loading Of All Data Types
  • Greenplum Commander: End-to-end Platform Management & Control




Greenplum Hadoop

• Greenplum HD
     – Enterprise-ready Apache Hadoop
     – Proven at Scale in 1,000 node Analytics
       Workbench
     – Single product with 2 storage options (Isilon &
       HDFS)


• Enterprise Edition becomes
  Greenplum MR:
     – Advanced features
     – 100% API compatible
     – Software-only product



AWB Update

Analytics Workbench Operational!
•1025 nodes operational
•1011 nodes with GPHD installed
•8 projects in total have been onboarded, ranging from university collaboration to partner technology evaluation

Proposals accepted by customer engagement team – info@analyticsworkbench.com
•Engagement team will learn project objectives
•JEDI council approves or rejects each project based on technical feasibility and alignment with company goals
•Projects informed of decisions and timelines
Cluster access via http://portal.analyticsworkbench.com/



Apache Hadoop Pain Points
  • Monitoring
           – Poor Job and Application Monitoring Solution
           – Non-existent Performance Monitoring
  • Operability and Manageability
           – Complex System Configuration and Manageability
           – No Data Format Interoperability & Storage Abstractions
  • Performance
           – Poor Dimensional Lookup Performance
           – Very poor Random Access and Serving Performance



Greenplum MR:
Enterprise Edition Stack

[Figure: stack diagram labeled "100% Apache interface", top to bottom.]
  • Enhanced Monitoring
  • Hive, Pig, HBase, Zookeeper
  • MapReduce Framework (MapRed)
  • Distributed File System




Greenplum MR: Enterprise Edition
Enterprise-Ready Hadoop Platform for Unstructured Data

  • Faster
           – 2x to 5x faster than Apache Hadoop
  • Reliable
           – High Availability
           – Mirroring
  • Easier to Use
           – NFS mountable
           – Graphical System Management




Greenplum MR
Simple Management

• Health Monitoring
• Cluster Administration
• Application Provisioning




Rack Level Monitoring




Greenplum MR Delivers True Return on Investment

  • NFS direct access makes it simple to load and access data directly in a Hadoop cluster
  • Enables standard tools and utilities to work directly on data contained in Hadoop
  • Heatmap user interface provides full cluster visibility and control

  • Eliminates all single points of failure
  • High availability for JobTracker, NameNode, and NFS
  • Snapshots allow point-in-time data protection and recovery
  • Mirroring for business continuity includes wide-area replication support

  • Speeds jobs by 2X to 5X
  • Provides faster performance with ½ the hardware
  • Substantial capital and operating expense savings
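
The point about standard tools working directly on NFS-mounted cluster data can be sketched with ordinary file I/O. The example below uses only the Python standard library and no Greenplum MR API; a temporary directory stands in for the mount point, whose real path is deployment-specific, and the log file names and contents are made up.

```python
from pathlib import Path
import tempfile


def count_error_lines(root):
    """Count lines containing 'ERROR' in every *.log file under root."""
    total = 0
    for path in sorted(Path(root).rglob("*.log")):
        with path.open(encoding="utf-8") as handle:
            total += sum(1 for line in handle if "ERROR" in line)
    return total


# A temporary directory stands in for an NFS mount of the cluster.
with tempfile.TemporaryDirectory() as mount_point:
    logs = Path(mount_point) / "logs"
    logs.mkdir()
    (logs / "a.log").write_text("ok\nERROR disk full\n", encoding="utf-8")
    (logs / "b.log").write_text("ERROR timeout\nok\n", encoding="utf-8")
    demo_count = count_error_lines(mount_point)
```

Once the cluster is mounted as a file system, the same function could be pointed at the mount path without any Hadoop-specific client code.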

EMC Greenplum

DATA IN: Fastest data loading
Scatter/Gather Streaming technology for the world's fastest data loading
  • Eliminate data load bottlenecks
  • Clean and integrate new data
  • Several loading options, ranging from bulk load updates to micro-batching for near real-time processing

IN-DATABASE ANALYTICS
Optimized for fast query execution and linear scalability
  • Move processing closer to data
  • Shared-nothing, massively parallel processing (MPP) scale-out architecture
  • Computing is automatically optimized and distributed across resources
  • Provides the best concurrent multi-workload performance

DECISIONS OUT: Advanced analytics
Unified data access for greater insight and value from data
  • Enable parallel analysis across the enterprise
  • Open platform with broad language support
  • Certified enterprise connectivity and integration with most business intelligence; extract, transform, and load (ETL); and management products
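
Micro-batching, mentioned above as one of the loading options, can be sketched as grouping an incoming stream into small fixed-size batches before each load. This is a generic illustration rather than Greenplum's loader: the batch size, the in-memory `sink` list standing in for a target table, and the function names are assumptions.

```python
from itertools import islice


def micro_batches(records, batch_size):
    """Yield consecutive lists of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load(batches, sink):
    """Append each batch to the sink in one step and count loaded records."""
    loaded = 0
    for batch in batches:
        sink.extend(batch)  # one bulk write per batch, not one per record
        loaded += len(batch)
    return loaded


sink = []  # hypothetical stand-in for a target table
batches = list(micro_batches(range(7), 3))
loaded = load(batches, sink)
```

Smaller batches shorten the delay before new data becomes queryable, while larger batches reduce per-load overhead; the right size depends on the workload.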


EMC Big Data Analytics Reference Architecture

[Figure: data flows left to right through five stages.]
  • Data Input (Data Sources): Documents, Mobile, Machine, Multimedia, Web/Social, LOB data, ERP, CRM, POS
  • Integration: Data Quality, MDM, ETL
  • Data Stores and Access
           – Hadoop: Ecosystem*, Map-Reduce, HDFS
           – NoSQL Stores: Key Values, Documents, Other NoSQL
           – SQL Stores: Enterprise Data Warehouse, Federated Data Warehouse, Data Marts (BU 1, BU 2, BU 3), BI as a Service
           – Parallel data exchange between the stores
  • Data Analysis: Statistics, Genetic Algorithms, Data Mining, OLAP, Operations Research, Neural Nets
  • Presentation & Delivery: Alerts, Dashboards, Reports, Spreadsheets, Mobile, Data Visualization

Legend: Structured data sources; Traditional data integration; Traditional data warehousing; Big data analytics ramifications

*Hadoop Ecosystem includes: Hive, Pig, Mahout, HBase, ZooKeeper, Oozie, Sqoop, Avro

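Architecture for Business Value

[Figure: two analytics stacks under "Business Value", linked by Chorus for Collaboration.]
  • Hadoop side: Files on MapRFS (GPMR), with HBase on top
           – Access: Java API, used by self-developed apps and analytics tools (Mahout)
           – MapRFS: C++; MR: C++
           – Load Performance: 2~5X
           – High Availability
           – Stable
  • Database side: GPDB, fed from other DBs via ETL
           – Access: JDBC and ODBC, used by self-developed apps and analytics tools (SAS, R, MADlib and more)
           – SAS & MADlib: in GPDB, in memory
  • Between the two sides: .csv and .txt files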




Big Data And EMC

  4. New Analytic Applications
  3. Data Science
  2. Unified Analytics Platform
  1. Petabyte Scale Data Storage
SAS / Greenplum Product Overview
SAS High Performance Computing

SAS Access for Integration
  • Provides integration capability to a number of databases
  • Allows for increased performance of Base SAS Procs
  • Products: SAS Access for Greenplum

SAS In-Database Processing
  • Requires SAS Enterprise Miner in order to be of value
  • Will lead to significant improvement in performance
  • Products: SAS Access for Greenplum, SAS Grid Manager, SAS Enterprise Miner, SAS Scoring Accelerator for Greenplum

SAS In-Memory Analytics
  • New functionality from SAS that requires a dedicated database appliance
  • Very high performance for business users that can significantly increase revenues or decrease costs as a result of improved performance
  • Products: SAS Access for Greenplum, SAS Grid Manager, SAS High Performance Analytics




SAS and Greenplum UAP Integrated Architecture

[Figure: layered stack, top to bottom.]
  • Data science team: Data Scientist, Data Engineer, Data Analyst, BI Analyst, LOB User, Data Platform Admin
  • SAS Business Intelligence
  • Greenplum Chorus - Analytic Productivity Layer
  • SAS Analytics
  • Data Access & Query Layer (SAS ACCESS, SQL, MapReduce)
  • Greenplum Database and Greenplum Hadoop
  • Private/Hybrid Cloud Infrastructure or Appliance
  • SAS Information Management


In A Single Unified Analytics Platform


Self-Service
Iterative, Agile
Transparent, Real-time Collaboration




Structured & Unstructured Data
Analyze Petabytes Of Current Data
Virtual, Scale Out Architecture




Greenplum hadoop

  • 1. © Copyright 2012 EMC Corporation. All rights reserved. 1
  • 2. 整合分析結構與非結構 性資料暨應用案例 Greenplum Enable Big Data Analytics 邱垂吉 Jimmy Chiu 技術顧問/EMC Greenplum Taiwan © Copyright 2012 EMC Corporation. All rights reserved. 2
  • 3. Volume, Variety, Velocity, Value + Complexity New insights on Contextual and customers, products, Velocity Volume location-aware and operations delivery to any Big Data device Variety Complexity Documents Transactional Smart Grid Images Audio Text Video Data • Volume: data volumes approaching multiple petabytes • Velocity: data being generated and ingested for analysis in real-time • Variety: tabular, documents, e-mail, metering, network, video, image, audio • Complexity: different standards, domain rules, and storage formats per data type Gartner March 2011 © Copyright 2010 EMC Corporation. All rights reserved. 3
  • 4. Sample Big Data Scenarios LOAN PROCESSING AUTO INSURANCE SMART GRID ANALYTICS IN BANKING IN P&C INSURANCE IN UTILITIES/ENERGY REAL-TIME STATISTICAL PROACTIVE EMERGENCY RESPONSE VIDEO ANALYTICS IN HEALTHCARE IN RETAIL PROCESS CONTROL IN MANUFACTURING © Copyright 2010 EMC Corporation. All rights reserved. 4
  • 5. Big Data Analytics For Competitive Advantage Suppliers Suppliers Who are my most valuable Manufacturing customers? Manufacturing Inventory Inventory Physical Assets Physical Assets What are my most Distribution important Services Distribution products? Personal Marketing Services Mass Additional Marketing Profits What are my most successful campaigns? Customers Customers Today’s Business Model Big Data Analytics Business Model © Copyright 2010 EMC Corporation. All rights reserved. 5
  • 6. Big Data meets Fast Data Social and Personal – Every Minutes: •Google gets more than 2 million search queries •About 47,000 people download an App •Some 100,000 tweets hit Twitter •Almost 300,000 people log on to Facebook Business and Transactional: •CERN (European Organization for Nuclear Research) generates 40TB/sec of scientific data •Wal-Mart – 1 million transactions per hour •World’s top systems currently trade at faster than 50 microseconds •New York Stock Exchange generates 1TB of new trading data daily © Copyright 2010 EMC Corporation. All rights reserved. 6
  • 7. Working together, they enable entirely New Business Models Big Data allows you to find opportunities you didn’t know you had. Fast Data allows you to respond to opportunities before they are gone. In the Financial Services Industry, large quantities of historical data need to be processed against a growing number of fast-moving data feeds. Batch processing is no longer a suitable solution! © Copyright 2010 EMC Corporation. All rights reserved. 7
  • 8. Effective Customer Segmentation is all about blending Structured and Unstructured Data – Transaction data (structured data) tells you what the customer did. – Unstructured data can tell you why they did it, why some others did not, what else they need or want, and what problems they may have. © Copyright 2010 EMC Corporation. All rights reserved. 8
  • 9. Big Data Architecture Solving Big Data challenge involves more than just Requirements managing volumes of data. ― Gartner • Multiple data types: structured, semi-structured, unstructured • Integrated data stores: real-time, traditional, data warehouse • Modern development tools: Java, lightweight messages, mobile-enabled • Cloud-enabled: elastic scale, self-healing Beware point solutions – integration is critical! © Copyright 2010 EMC Corporation. All rights reserved. 9
  • 10. Greenplum Overview © Copyright 2010 EMC Corporation. All rights reserved. 10
  • 11. Greenplum Product Line © Copyright 2010 EMC Corporation. All rights reserved. 11
  • 12. Architecture of Greenplum Flexible framework for processing large datasets Process large datasets with support for SQL both SQL and MapReduce MapReduce Master Master Master servers optimize queries for the most efficient query execution Interconnect for continuous pipelining of data processing Segment servers process queries close to the data in parallel MPP Scatter/Gather streaming for fast loading of data © Copyright 2010 EMC Corporation. All rights reserved. 12
  • 13. Greenplum MPP Share-Nothing Arch. MPP Share Share Disk Share nothing everything eg: eg: eg: Oracle RAC Greenplum Unix server Intranet Master Intranet DB DB DB DB DB DB DB DB DB SAN/FC Disk SAN Disk Disk Disk Disk Share disk © Copyright 2010 EMC Corporation. All rights reserved. 13
  • 14. Benefits of the Greenplum Database Architecture • Simplicity – Parallelism is automatic – no manual partitioning required – No complex tuning required – just load and query – HA – Best of breed x86 and Ethernet networking technologies • Scalability – Linear scalability – Each node adds storage, query performance, loading performance • Flexibility – Fully parallelism for SQL92, SQL99, SQL2003 OLAP, MapReduce – Any schema (star, snowflake, 3NF, hybrid, etc) – Rich extensibility and language support (Perl, Python, R, C, etc) – Structure, semi-structure and unstructure © Copyright 2010 EMC Corporation. All rights reserved. 14
  • 15. Greenplum and Hadoop Analytics Semi-Structured Structured Machine Data UnStructured ERP/CRM Logs Images/Sound Ad-hoc Analysis batch reporting on static data Dynamic Data © Copyright 2010 EMC Corporation. All rights reserved. 15
  • 16. Big Data Analytics The Power of Data Co-Processing Greenplum Chorus Analytic Productivity & Tool Integration End-to-end Platform Management & Control Data Access And Query Greenplum Commander SQL, MapReduce, SAS, MADLib, Mahout, R, and others SQL Engine MapReduce Engine parallel For Unstructured Data For Structured Data data exchange •Enterprise ready Apache • In-database Advanced Analytics Hadoop • Extreme performance on •Faster, more dependable, and commodity hardware parallel easier to use data exchange Greenplum Database Greenplum Hadoop Network Parallel Loading Of All Data Types © Copyright 2010 EMC Corporation. All rights reserved. 16
  • 17. Greenplum Hadoop • Greenplum HD – Enterprise-ready Apache Hadoop – Proven at scale on the 1,000-node Analytics Workbench – Single product with 2 storage options (Isilon & HDFS) • Enterprise Edition becomes Greenplum MR: – Advanced features – 100% API compatible – Software-only product
  • 18. AWB Update: Analytics Workbench Operational! • 1025 nodes operational • 1011 nodes with GPHD installed • 8 total projects have been onboarded, ranging from university collaboration to partner technology evaluation. Proposals accepted by customer engagement team – info@analyticsworkbench.com • Engagement team will learn project objectives • JEDI council approves or disapproves each project based on technical feasibility and alignment with company goals • Projects are informed of decisions and timelines. Cluster access via http://portal.analyticsworkbench.com/
  • 19. Apache Hadoop Pain Points. Monitoring • Poor Job and Application Monitoring Solution • Non-existent Performance Monitoring. Operability and Manageability • Complex System Configuration and Manageability • No Data Format Interoperability & Storage Abstractions. Performance • Poor Dimensional Lookup Performance • Very poor Random Access and Serving Performance
  • 20. Greenplum MR: Enterprise Edition Stack. [Stack diagram layers: Enhanced Monitoring; 100% Apache Interface; Hive, Pig, HBase, ZooKeeper; MapReduce Framework (MapRed); Distributed File System.]
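For readers unfamiliar with the MapReduce framework layer named on slide 20, the following is a minimal single-process sketch of the map, shuffle, and reduce phases using a word count. It does not use Hadoop or Greenplum MR APIs; all function names are illustrative, and a real framework would run the map and reduce steps across many nodes over a distributed file system.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data analytics", "Big Data platforms"]))
```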
  • 21. Greenplum MR: Enterprise Edition. Enterprise-Ready Hadoop Platform for Unstructured Data. Faster • 2x to 5x faster than Apache Hadoop. Reliable • High Availability • Mirroring. Easier to Use • NFS mountable • Graphical System Management
  • 22. Greenplum MR Simple Management • Health Monitoring • Cluster Administration • Application Provisioning
  • 23. Rack Level Monitoring
  • 24. Greenplum MR Delivers True Return on Investment • NFS direct access makes it simple to load and access data directly in a Hadoop cluster • Enables standard tools and utilities to work directly on data contained in Hadoop • Heatmap user interface provides full cluster visibility and control • Eliminates all single points of failure • High availability for JobTracker, NameNode, and NFS • Snapshots allow point-in-time data protection and recovery • Mirroring for business continuity includes wide-area replication support • Speeds jobs by 2X to 5X • Provides faster performance with half the hardware • Substantial capital and operating expense savings
  • 25. EMC Greenplum: Fastest data loading, Advanced analytics. DATA IN: Scatter/Gather Streaming technology for the world’s fastest data loading • Eliminate data load bottlenecks • Clean and integrate new data • Several loading options, ranging from bulk load updates to micro-batching for near real-time processing. IN-DATABASE ANALYTICS: Optimized for fast query execution and linear scalability • Move processing closer to data • Shared-nothing, massively parallel processing (MPP) scale-out architecture • Computing is automatically optimized and distributed across resources • Provides the best concurrent multi-workload performance. DECISIONS OUT: Unified data access for greater insight and value from data • Enable parallel analysis across the enterprise • Open platform with broad language support • Certified enterprise connectivity and integration with most business intelligence; extract, transform, and load (ETL); and management products
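The micro-batching loading option mentioned on slide 25 can be sketched in general terms as follows. This is an assumption-laden illustration, not Greenplum's loader: `micro_batches`, `load`, and the `load_batch` callback are hypothetical names, and the callback stands in for whatever call would actually write a batch to the database.

```python
from itertools import islice


def micro_batches(records, batch_size):
    """Yield successive lists of at most batch_size records from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load(records, batch_size, load_batch):
    """Feed records to load_batch one small batch at a time; return rows loaded."""
    loaded = 0
    for batch in micro_batches(records, batch_size):
        load_batch(batch)  # hypothetical sink, e.g. a bulk insert into a table
        loaded += len(batch)
    return loaded


if __name__ == "__main__":
    sink = []
    total = load(range(7), batch_size=3, load_batch=sink.append)
    print(total, sink)
```

Small batches trade some per-load overhead for lower latency between data arrival and availability for queries, which is why the slide pairs micro-batching with near real-time processing.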
  • 26. EMC Big Data Analytics Reference Architecture. [Diagram labels. Data sources: Documents, Mobile, Machine, Multimedia, Spreadsheets, Web/Social, LOB data, ERP, CRM, POS. Data stores: Hadoop Ecosystem* (HDFS, MapReduce); NoSQL stores (Key Values, Documents, Other NoSQL); SQL stores (Enterprise Data Warehouse, Data Marts, Federated Data Warehouse); parallel data exchange between stores. Integration: ETL, Data Quality, MDM. Analysis: Statistics, Genetic Algorithms, Data Mining, OLAP, Operations Research, Neural Nets, Data Visualization. Presentation & delivery: Alerts, Dashboards, Mobile, Reports, BI as a Service, to BU 1, BU 2, BU 3. Legend: structured data sources, traditional data integration, traditional data warehousing, big data analytics ramifications.] *Hadoop Ecosystem includes: Hive, Pig, Mahout, HBase, ZooKeeper, Oozie, Sqoop, Avro
  • 27. Architecture for Business Value. [Diagram labels: Business Value; Chorus for Collaboration; Analytics via self-developed apps (Java API) and analytics tools (Mahout); Analytics via self-developed apps (JDBC, ODBC) and analytics tools (SAS, R, MADlib and more); HBase; SAS & MADlib; GPDB - In Memory; MapRFS (GPMR), with MapRFS: C++ and MR: C++; Performance: 2~5X; High Availability; Stable; ETL / Load from DBs and files (.csv, .txt).]
  • 28. Big Data And EMC: 1. Petabyte Scale Data Storage; 2. Unified Analytics Platform; 3. Data Science; 4. New Analytic Applications
  • 29. SAS / Greenplum Product Overview: SAS High Performance Computing. SAS Access for Integration: Provides integration capability to a number of databases; allows for increased performance of Base SAS Procs. Products: SAS Access for Greenplum. SAS In-Database Processing: Requires SAS Enterprise Miner in order to be of value; will lead to significant improvement in performance. Products: SAS Access for Greenplum, SAS Grid Manager, SAS Enterprise Miner, SAS Scoring Accelerator for Greenplum. SAS In-Memory Analytics: New functionality from SAS that requires a dedicated database appliance; very high performance for business users that can significantly increase revenues or decrease costs as a result of improved performance. Products: SAS Access for Greenplum, SAS Grid Manager, SAS High Performance Analytics
  • 30. SAS and Greenplum UAP Integrated Architecture. [Diagram labels. Users: Data Scientist, Data Engineer, Data Analyst, BI Analyst, LOB User (Data Science Team); Data Platform Admin. Layers: SAS Business Intelligence; Greenplum Chorus - Analytic Productivity Layer; SAS Analytics; Data Access & Query Layer (SAS ACCESS, SQL, MapReduce); Greenplum Database and Greenplum Hadoop; Private/Hybrid Cloud Infrastructure or Appliance; SAS Information Management.]
  • 31. In A Single Unified Analytics Platform: Self-Service; Iterative, Agile; Transparent, Real-time Collaboration; Structured & Unstructured Data; Analyze Petabytes Of Current Data; Virtual, Scale Out Architecture