SlideShare ist ein Scribd-Unternehmen logo
1 von 55
Downloaden Sie, um offline zu lesen
IMEX
                                                                                                                    RESEARCH.COM




                       NextGen
                  Infrastructure
            Are SSDs Ready for Enterprise Storage Systems
                                                                    Anil Vasudeva, President & Chief Analyst, IMEX Research

                    for Big Data
                                                                                                                  © 2007-12 IMEX Research
                                                                                                                       All Rights Reserved
                                                                                                                   Copying Prohibited
                                                                                                                 Contact IMEX for authorization




        Anil Vasudeva
        President & Chief Analyst
        imex@imexresearch.com
                                                                                                        IMEX
                                                                                                        RESEARCH.COM
        408-268-0800
© 2010-12 IMEX Research, Copying prohibited. All rights reserved.
Abstract                                                                                      IMEX
                                                                                               RESEARCH.COM



• NextGen Infrastructure for Big Data
    • This session will appeal to Business Planning, Marketing, Technology System Integrators and Data
      Center Managers seeking to understand the drivers behind the demand for and rise of Big Data.
    • Abstract
    • The internet has spawned an explosion in data growth in the form of data sets, called Big Data, that
      are so large they are difficult to store, manage and analyze using traditional RDBMS which are
      tuned for Online Transaction Processing (OLTP) only. Not only is this new data heavily
      unstructured, voluminous and streams rapidly and difficult to harness but even more importantly, the
      infrastructure cost of HW and SW required to crunch it using traditional RDBMS, to derive any
      analytics or business intelligence online (OLAP) from it, is prohibitive.
    • To capitalize on the Big Data trend, a new breed of Big Data technologies (such as Hadoop and
      others) many companies have emerged which are leveraging new parallelized processing,
      commodity hardware, open source software and tools to capture and analyze these new data sets
      and provide a price/performance that is 10 times better than existing Database/Data
      Warehousing/Business Intelligence Systems.
    • Learning Objectives
    • The presentation will illustrate the existing operational challenges businesses face today using
      RDBMS systems despite using fast access in-memory and solid state storage technologies. It
      details how IT is harnessing the emergent Big Data to manage massive amounts of data and new
      techniques such as parallelization and virtualization to solve complex problems in order to empower
      businesses with knowledgeable decision-making.
    • It lays out the rapidly evolving big data technology ecosystem - different big data technologies from
      Hadoop, Distributed File Systems, emerging NoSQL derivatives for implementation in private and
      hybrid cloud-based environments, Storage Infrastructure Requirements to Store, Access, Secure,
      Prepare for analytics and visualization of data while manipulating it rapidly to derive business
      intelligence online, to run businesses smartly.

2                                                                                                      2
Big Data in IT Industry Roadmap                                                                                                  IMEX
                                                                                                                                    RESEARCH.COM




                                                                                        Analytics – BI
           IT Industry                                             Predictive Analytics - Unstructured Data
            Roadmap                                                From Dashboards Visualization to Prediction Engines using Big Data.

                                                                      Cloudization
                                                          On-Premises > Private Clouds > Public Clouds
                                                          DC to Cloud-Aware Infrast. & Apps. Cascade migration to SPs/Public Clouds.

                                                             Automation
                                           Automatically Maintains Application SLAs
                                           (Self-Configuration, Self-Healing©IMEX, Self-Acctg. Charges etc.)

                                          Virtualization
                        Pools Resources. Provisions, Optimizes, Monitors
                        Shuffles Resources to optimize Delivery of various Business Services

                       Integration/Consolidation
            Integrate Physical Infrast./Blades to meet CAPSIMS                               ®IMEX


            Cost, Availability, Performance, Scalability, Inter-operability, Manageability & Security

              Standardization
Standard IT Infrastructure- Volume Economics HW/Syst SW
(Servers, Storage, Networking Devices, System Software (OS, MW & Data Mgmt. SW)

                                                                                                                                           3
Harnessing Big Data for Business Insights                                                        IMEX
                                                                                                  RESEARCH.COM



                       Data                     Big Data                    Business
                      Sources                Infrastructure                 Insights




  Information is at the center of   Majority of data growth is being   80% of world’s data is unstructured
  New Wave of opportunity           driven by unstructured data        driven by rise in Mobility devices,
                                    and billions of large objects      collaboration machine generated data.




                                                                                                           4
Corporate Need: Business Perf... OptimizationIMEX        RESEARCH.COM




                          Unstructured Big Data can provide Next Gen
                          Analytics to help businesses make informed,
                          better decision in:
                          ▪ Product Strategy
                          ▪ Targeting Sales
                          ▪ Just-In-Time Supply-Chain Economics
                          ▪ Business Performance Optimization
                          ▪ Predictive Analytics & Recommendations
                          ▪ Country Resources Management




                                                                5
Corporate Need: Real Time Analytics   IMEX
                                      RESEARCH.COM




                                             6
Corporate Need: Business Insights                                                               IMEX
                                                                                                 RESEARCH.COM



    Item                           Issue                        .             Solution
                Information Exploding                         A unified information/content
                Volume: Digital Content doubling every 18     storage methodology that enables
    Store       months. Velocity: >80% growth driven from     users to manage the volume, velocity and
                unstructured data. Variety: sources of data   variety of information from multiple sources
                changing

                Complexity in "managing"                      A solution portfolio of tools and
                information. - Need to classify,              services to manage all types of
   Manage       synchronize, aggregate, integrate, share,     information in a hybrid storage environment
                transform, profile, move, cleanse, protect,
                retire

                Current solutions limited to BI tools         Build/buy packaged Real-Time
   Analyze      focused on structured and lagging             Predictive Analytical Solutions for
                information                                   unstructured analytics tools


                Multiple access methods needed to             Centralized share, collaborate and
 Collaborate    meet needs of a diverse audience.             act on insights anytime, anywhere on any
                                                              device.

                Ability to understand how the                 Model Information on current
 Model/ Adapt   information impacts the business.             operations w/potential strategy
                How to transfer to action.                    impact. Leverage Tech. to adapt.

                                                                                                             7
Opportunity: Converting Big Data Deluge
into Predictive Analytics & Insights
                                                                     IMEX
                                                                     RESEARCH.COM


                     Big Data Predictive Analytics




            Personal Location Services Data Generated by   TB/Year
            Navigation Devices                                 600
            Navigation Apps on Phones                           20
            Smart Phone Opt-In Tracking                       1000
            Geo Targeted Ads                                    20
            People Locator (Emergency Calls/Search..)           10
            Location based Services (e.g.Games)                  5
            Other                                               45
            Total (Est.)                                      1700

                                                                            8
Issues with Existing RDBMS                                                                        IMEX
                                                                                                  RESEARCH.COM



                                      Key Issues with RDBMS Technologies
                                      Handling Mixed Unstructured Data
     Unstructured   Scalability       - RDBMs don’t handle non-tabular data
         Data
                                          (Notorious for doing a poor job on recursive data structure)
                                      Legacy Archaic Architecture
                                      - RDBMS don’t parallelize well to accommodate
                           Fault      commodity HW clusters
 Cost                    Tolerance
                                      Speed
             Big Data                 - Seek time of physical Storage has not kept pace with
             Analytics                network speed improvements
EU Adhoc                    Latency   Scale
Analytics
                                      - Difficult to scale-out RDBMS efficiently – Clustering
                                      beyond few servers notoriously hard
                                      Integration
                      Deep            - Data processing tasks need to combine data from non-
   Availability      Insights         related sources, over a network
                                      Volume
                                      - Data volumes have grown from 10s GB >100s TB >
                                      PBs in recent years. Existing Tabular RDBMS can’t
                                      handle such large DBs
                                                                                                         9
Issues with Existing RDBMS                             IMEX
                                                       RESEARCH.COM



Present RDBMS struggling to Store & Analyze Big Data




                                                             10
Big Data - Database Solutions   IMEX
                                RESEARCH.COM




                                      11
Big Data – The New Face of DBs                                 IMEX
                                                               RESEARCH.COM




Big Data Paradigm - The New face of DB Systems
• Adopts Schema-Free Architecture
• Can do away with Legacy Relational DB Systems
    Some data have sparse attributes, do not need relational property
• Key Oriented Queries
   Some data stored/retrieved mainly by primary key, w/o complex joins
• Trade-off of Consistency, Availability & Partition Tolerance
• Scale Out, not up, - Online Load balancing cluster growth




                                                                     12
Analytics – The Next Frontier in IT   IMEX
                                      RESEARCH.COM




                                            13
Key Innovations: HW Technologies IMEX
                                RESEARCH.COM




                                      14
Key Innovations – Solid State Storage                                                                                                      IMEX
                                                                                                                                             RESEARCH.COM



                    Storage - IOPS/GB & Price Erosion - HDD vs. SSDs
            600                                                                                                                       8

                                              HDD




                                                                                                                                              Units (Millions)
  IOPS/GB




            300


                                                    SSD


                                                                                IOPS/GB HDDs
               0                                                                                                                      0
                         2009                 2010           2011               2012               2013               2014
                      Note: 2U storage rack, • 2.5” HDD max cap = 400GB / 24 HDDs, de-stroked to 20%, • 2.5” SSD max cap = 800GB / 36 SSDs
            Key to Database performance are random IOPS. SSDs outshine HDD in IO price/performance
                    – a major reason, besides better space and power, for their explosive growth.

                                            ©2011
                                                                                                                                                                 1515
Source: IMEX Research SSD Industry Report
Innovations – DB SW Technologies                                                                                                                               IMEX
                                                                                                                                                                               RESEARCH.COM



              Tech Innovation                1985             1990                1995                                   2000               2005               2010                2015

              OLTP Transactions                                                                                                                                             Open Source /
                                    Rows Locking        Optimizer        Parallel Query                           Clustering          XML                  Grid
              DB SW                                                                                                                                                               Hadoop

              OLAP- Analytics                                                                                     Materialized        Bit Mapped
                                    Indexing            Partitioning     Columnar                                                                          In-Memory        Query Binding
              DB SW                                                                                                      View                Index

                                                                                                                                      Multi-core/
              Hardware              32 bit              SMP              NUMA                                     64 bit                                   Flash            MPP
                                                                                                                                      Blades

                                                                                                                                                           Columnar         MPP
              Big Data                                                                                                                Multi-core
                                                                                                                                                           In-Memory        Visualization




                         OLTP Database Innovation Progress                                                           Big Data: Analytics DB Technology Impact
 100000                                                                       10000
                                                                                      100,000                                                                                               100000

     10000                                                                                                                                                                                  10000
                                                                              1000       10,000
                                                                                                                           $/GB                                                             1000

                                                                                         TPMc / Processor
                                                                                           1,000                                                                      DW Size TB
$ / TPMc




           1000                                                               100
                                                                                                                                                                                            100
                     $/TPMc
                                                                                                     100                                                                                    10
            100                                     TPMc/Processor            10
                                                                                                            10                                                                              1
             10                                                               1                                                                                                             0.1
                                                                                                             1
                                                                                                                                                                                            0.01
              1                                                               0.1
                                                                                                            .1
                                                                                                             0
                                                                                                                                                                                            0.001
            0.1                                                               0.01                          .01
                                                                                                              0                                                                             0.0001
                  1985      1990   1995      2000    2005     2010     2015                                       1985         1990    1995         2000     2005       2010       2015




                                                                                                                                                                                            16
Big Data - Key Requirements                                                                          IMEX
                                                                                                     RESEARCH.COM




                      Types of Data Organizations Analyze
                                                                                      59%
                Customer/member data                                                          68%
                                                                         44%
    Transactional data from applications                                                      68%
                                                                                               69%
                       Application Logs                            37%
                                                                                         64%
              Other Types of Event Data                    23%
                                                                        41%
     Network Monitoring/Network Traffic                          33%
                                                                 33%
              Online Retail Transactions                          34%
                                                                               51%
                        Other Log Files                     26%
                                                             28%
                      Call Data Records                         32%
                                                                          46%
                             Web Logs                     21%
                                                                  36%
  Text data from social media and online            15%
                                                                  36%
                            Search logs           11%
                                                   18%
                       Trade/quote data          15%                                 Hadoop
                                                   18%
               Intelligence/defense data       11%                                   Non Hadoop
                                                     21%
        Multimedia (audio/video/images)       9%
                                             8%
                               Weather     3%
                                           3%
                       Smartmeter data       6%
                                           3%
                  Other (please specify)    5%

                                                                                                           17
Big Data – Architectural Goals                                        IMEX
                                                                       RESEARCH.COM




 Meet Requirements of V3                         Meet Enterprise Criterion


                                  Big Data
                                  Platform




                      Analyze Data in Native Format



                                                                             18
Big Data - Market Requirements                                               IMEX
                                                                             RESEARCH.COM




 Unified system: Pre-integrated for Ease of Installation and Management
 • Platform – Large Scale Indexing Pre-integrated using Hadoop Foundation,
 • Integrated Text Analytics - Address Unstructured Data
 • Usability - User Friendly Admin Console including HDFS Explorer, Query
   Languages
 • Enterprise Class Features – Provisioning, Storage, Scheduler, Advance Security
 • Supports search-centric, document-based XML data model
     o store documents within a transactional repository.
 • Schema-Free:
     o No advance knowledge of the document structure (its "schema") needed
     o Index words and values from each of the loaded documents together with its
       document structure.
 • Standard commodity hardware leveraged



                                                                                    19
Big Data – Market Requirements                                                IMEX
                                                                               RESEARCH.COM




Architectural
• Shared-nothing clustered DB architecture
      o programmable and extensible application servers.
•   Support massive scalability to petabytes of source data
•   Support open-source XQuery- and XSLT-driven architecture
•   Simple to Deploy, Develop and Manage (UI & Restful Interface)
•   Support extreme mixed workloads - a wide variety of data types including arbitrarily
  hierarchical data structures, images, waveforms, data logs etc.
• Support thousands of geographically dispersed on--line users and programs
  executing variety of requests from ad hoc queries to strategic analysis
• Loading data before declaring or discovering its structure
• Load data in batch and streaming fashion
• Integrate data from multiple sources during load process at very high rates Spread
  I/O and data across instances
• Provide consistent performance with linear cost
• Leverage Open Source SW Lo Costs, Multiple Sources, Hadoop Foundation Tools
• Connectivity with Oracle DB, Teradata Warehouse, JDBC Connectivity,


                                                                                     20
Big Data - Market Requirements                                                      IMEX
                                                                                    RESEARCH.COM




Real Time Analytics Execution
•   Execute “streaming” analytic queries in real time on incoming load data
•   Updating data in place at full load speeds
•   Scheduling and execution of complex multi-hundred node workflows
•   Join a billion row dimension table to a trillion row fact table without pre-clustering
    the dimension table with the fact table
Performance
• Analyze data, at very high rates >GB/sec
• Predictable Sub-ms response time for highly constrained standard SQL queries
Availability
• Ability to configure without any single point of failure
• Auto-Failover Extreme High Availability
     o Automated failover and process continuation without operational interruption
         when processing nodes fail



                                                                                             21
Big Data – Product Metrics Choices IMEX       RESEARCH.COM



                               PB
              Data Set Size    TB
                               GB

                               Transaction

                               Machine
              Data Structure   Unstructured

                               Other
                               Transaction

              Access/Use       Search
    Big Data                   Analytics
       -                       Appliance

    Product Parallel           Cluster < 1K
             Processing
    Metrics                    Cluster > 1K
                               In-Memory
              Memory           Flash

                               Columnar

              DB Technique     Zero Sharing

                               No SQL
                               Text
              Data             Image
              Cataloging       Audio
              SW               Video


                                                    22
Advantage: Big Data Products                                               IMEX
                                                                           RESEARCH.COM



Characteristic   Legacy Paradigm                Big Data Paradigm
Structure        •Transactional/Corporate       •Unstructured/Derivative/Internet
Mode             •Data Collection               •Data Analysis
Focus            •Find Answers                  •Find Questions
Facility         •Reportive / What Happened?    •Analytic / Why did it Happen?
                                                Predictive / What will Happen Next?
Opportunity      •Very Small Growth             •Massive Growth
Players          •Legacy Players                •Agile Start Ups, well funded
Impact           •Analyze Existing Businesses   •Create New Businesses




                                                                                    23
Advantage: Big Data Products                                                             IMEX
                                                                                         RESEARCH.COM


Characteristic             Traditional RDBMS                     Big Data/MapReduce

Data Size                  •GB                                   •PB
Access                     •Interactive                          •Batch/Near Real-Time
Latency                    •Low                                  •High
Data Updates               •Read & Write Many Times              •Write Once Read Many Times
Schema/Structure           •Static Schema                        •Dynamic Schema
Language                   •SQL                                  •UQL/Procedural (Java,C++..)
Integrity                  •High                                 •Not 100%
Works Well for             •Process Intensive Jobs               •Data Intensive Jobs
Works Well w Data Size     •Gigabytes                            •Petabytes
Data/Processing            •Low Latency/High BW – precursor to   •Sends Code to Data, instead of
Interactions               success. Ntwk. BW can be a            Sending Data to other Nodes
                           bottleneck causing nodes to be idle   (Requiring Lower BW in Cluster)
Fault Tolerance            •Coordinating Processes with Node     •Fault Tolerant for HW/SW Failures
                           Failures – a challenge
Access                     •Interactive                          •Batch/Near Real-time
Scaling                    •Non-linear                           •Linear
Pgm-Distribution of Jobs   •Difficult                            •Simple & Effective

                                                                                                24
Big Data Ecosystem   IMEX
                     RESEARCH.COM




                           25
Big Data Stack                                                                        IMEX
                                                                                      RESEARCH.COM



Merging Hadoop innovations into Nextgen DBMS

     Data
                           Analytics Platform                          Applications
    Sources

                                                                      Advanced
                               RHive                                  Analytics

     Oracle
                                                                            DBA
                                             PerfMon, Query
      IBM
   Teradata…
                 Data       Enterprise            Plan
               Importer
                                             Hive Workflow
                                                                           ETL
 Databases                  Oracle-to-Hive                             Ad-hoc query


                 Data
     Data
    Sources
               Importer

                                                                        OLAP
                                                            Data        Server
                               Data Store                 exporter
  Streaming                     (Hadoop)                                 OLTP
     Data                                                               Server
               Collector
   Devices
                                                              Rest/
                                                              JSON
                                    Search                     API


                                                                      Real-time
                             Admin Center                              Queries

                                                                                            26
Hadoop’s Fit in Enterprise Stack   IMEX
                                   RESEARCH.COM




                                         27
Big Data - Hadoop Architecture                                                                 IMEXRESEARCH.COM




                                                    Data Flow                    SQL       Programming
                                                        (Pig)                    (Hive)     Languages
                      Coordination
 Management




                                                   Distributed Computing Framework
                                     (Zookeeper)




                                                                                           Computations
                                                                   (MapReduce)
              (HMS)




                                                    Metadata               Column-Stg        Tabular
                                                      (HCatalog)                 (HBase)     Storage

                                                   Hadoop Distributed File        System      Object
                                                                 (HDFS)                      Storage


                                                                                              28
Big Data Connectors to EDW/BI                                                                  IMEX
                                                                                                RESEARCH.COM


BI Framework - Interoperable with Enterprise Data Warehousing



      EDW                       BI APPLICATIONS
                      (Query, Analytics, Reporting, Statistics)


    Big Data                 Big Data Orchestration Framework
   (Connectors)      HBase         Avro          Flume     ZooKeeper

                      Big Data Access Framework




                                                                          Backup & Recovery
                                   Hive                   Sqoop




                                                                                                         Management
             Pig




                                                                                              Security
                                Network
     Big Data Storage Framework           Big Data Processing Framework
                (HDFS)                             (MapReduce)
                          Hypervisor/VMs
                         Operating System
                                Servers
                                                                                                                  29
Big Data Infrastructure – Map                                                IMEX
                                                                             RESEARCH.COM
Reduce
                             Data Flow                          SQL
                                 (Pig)                         (Hive)




                             MapReduce
Coordination
Management




                             Computing
               (Zookeeper)
   (HMS)




                             Framework)
                                Metadata                MetadataColumn-Stg
                              (HCatalog)CV                       (HBase)
                                                      (HCatalog)CV




                                 Hadoop
                                Distributed
                                File System
                                             (HDFS)




          Map Reduce
           A Distributed Computing Model
           Typical Pipeline:
            Input>Map>Shuffle/Sort>Reduce>Output
           Easy to Use , Developer writes few functions,
            Moves compute to Data
           Schedules work on HDFS node with data
           Scans through data, reducing seeks
           Automatic Reliability and re-execution on failure


                                                                                   30
Big Data Infrastructure – HDFS                                                                                                                           IMEX
                                                                                                                                                         RESEARCH.COM



                                                             HDFS Architecture
                                                        Actively Maintaining High Availability
                   Persistent Namespace Metadata & Journal                                         Persistent Namespace Metadata & Journal




                                                                                 Namespace
                                                                                                   NFS
                            Namespace
  Namespace




                   NFS                                                                                        Namespace
                              State                      Block                                                  State                      Block
                   NFS
                                                         Map                                   NFS                                         Map

                          Name Node                                                                        Name Node
                              Hierarchical Namespace
                    File name >> Block IDs >> Block Locations                     0. Bad/                                  3. Block
                                                                                Lost Block               1. Replicate                              Periodically Check
                            Heartbeats & Block Reports                                                                     Received
                                                                                                                                                   Block Checksums
  b1          b2          b1    b2              b1      b2       b1    b2        b1           b2            b1    b2              b1      b2        b1    b2
  b3 Data Node            b3 Data Node          b3 Data Node     b3 Data Node    b3 Data Node               b3 Data Node
                                                                                                                  2. Copy         b3 Data Node      b3 Data Node

          JBOD                 JBOD                  JBOD             JBOD                   JBOD                JBOD                  JBOD              JBOD
                                     Block ID >> Data                                                                  Block ID >> Data
                          Horizontally Scale I/O & Storage                                                  Horizontally Scale I/O & Storage


                      HDFS
                       Immutable File System – Read, Write, Sync/Flush – No random writes
                       Storage Server used for Computation – Move Computation to Data
                       Fault Tolerant & Easy Management – Built In Redundancy, Tolerates Disk & Node Failure, Auto-Managing
                             addition/removal of nodes, One operator/8K nodes
                       Not a SAN but high bandwidth network access to data via Ethernet
                       Used typically to Solve problems not feasible with traditional systems: Large Storage Capacity >100PB raw,
                            Large IO/computational BW >4K node/cluster, scale by adding commodity HW, Cost ~$1.5/GB incl. MR cluster
Hadoop Distributed File System                     IMEX
                                                    RESEARCH.COM




HDFS Architecture

                       HDFS Characteristics
                       •   Based on Google GFS (Google File
                           System)
                       •   Redundant Storage for massive
                           amounts of data
                       •   Data is distributed across all
                           nodes at load time – efficient
                           MapReduce processing
                       •   Runs on commodity hardware –
                           assumes high failure rate for
                           components
                       •   Works well with lots of large files
                       •   Built around Write once – Read
                           many times”
                       •   Large Streaming Reads – Not
                           random access
                       •   High Throughtput more important
                           than low latency



                                                           32
Hadoop Architecture - Overview        IMEX
                                      RESEARCH.COM

Hadoop Data Processing Architecture




                                            33
Key Technologies Required for Big Data                       IMEX
                                                             RESEARCH.COM




  Key Technologies Required for Big Data
  • Cloud Infrastructure
  • Virtualization
  • Networking
  • Storage
     o   In-Memory Data Base (Solid State Memory)
     o   Tiered Storage Software (Performance Enhancement)
     o   Deduplication (Cost Reduction)
     o   Data Protection (Back Up, Archive & Recovery)




                                                                   34
Cloud Infrastructure for Big Data                                                           IMEX
                                                                                            RESEARCH.COM




     Service     Cloud Services Providers            Examples
                                                     Public - BT, Telstra, T-Systems France Telecom
                 Public – Mutitenancy,OnDemand
     Providers   Private - On Premises, Enterprise
                                                     Private –
                                                     Hybrid – IBM/Cloudburst,
                 Hybrid – Interoperable P2P



                                                     Examples
      SaaS       Software-as-a-Service
                 - Servers, Network, Storage
                                                     eMail - Yahoo!,Google…
                                                     Collaboration - Facebook,Twitter …
                 - Management, Reporting             Bus.Apps - SalesForce, GoogleApps, Intuit…




                                                     Examples

      PaaS       Platform Tools & Services
                 - Deploy developed platforms
                                                     Amazon EC2
                                                     Force.com
                 ready for Application SW on         Navitaire
                 Cloud Aware Infrastructure



                                                     Examples
                 Infrastructure HW &
      IaaS
                                                     Amazon S3
                   Services                          Nirvanix
                 - Servers, Network, Storage
                 - Management, Reporting
35
Cloud Infrastructure for Big Data                                                                 IMEX
                                                                                                  RESEARCH.COM




                                   Cloud Computing
                  Private Cloud                     Hybrid             Public Cloud Service
                    Enterprise                      Cloud                   Providers

                                          SaaS Applications
     SaaS
                 App            App                  App              App          App




                                                               SLA




                                                                            SLA




                                                                                            SLA
                          SLA




                                         SLA
                                       Platform Tools & Services




                                                                                                   Management
                                           Python
                                Ruby




                                                        .Net
                                                                             …..      …..
                    EJB




                                                                     PHP
     PaaS

                                         Operating Systems

                                               Virtualization
      IaaS
                       Resources (Servers, Storage, Networks)

    Application’s SLA dictates the Resources Required to meet specific
 requirements of Availability, Performance, Cost, Security, Manageability etc.
Private Cloud Requirements for Big Data                                                                             IMEX
                                                                                                                     RESEARCH.COM



                                                  Public
Operational Flexibility




                                                  Cloud
                                                  Storage
                                        Private
                                        Cloud
                                        Storage
                                                                             Service Catalog
                                  VZ
                                                                   Choose pre-defined IT Services/user/dept.
                          App                                      Define SLA to efficiently meet services
                          Silos
                             Costs

                                                             Self-                                        Service
                                                                                   Private
                                                            Service                 Cloud                Analytics
                                        Access Resources on demand to             Storage             Monitor & Analyze Usage for
                                          speed deployment and delivery                                 Charge back
                                               Scale Resource Up/down                                 Interactively auto-tune
                                                 to optimize their usage,                              performance
                                              Release when not needed                                Availability with SLA
                                                                             Automation                 requirements
                                                                     Automated Provisioning
                                                                    - Moving Data & Processes Seamlessly
                                                                     Allocation, Self Tuning of Resources to
                                                                      meet Workload Requirements
Virtualization: Workloads Consolidation                           IMEX
                                                                  RESEARCH.COM




                                          • A single server 1.5x larger than
                                          standard 2-way server will handle
                                          consolidated load of 6 servers.
                                          • VZ manages the workloads +
                                          important apps get the compute
                                          resources they need automatically
                                          w/o operator intervention.
                                          • Physical consolidation of 15-20:1
                                          is easily possible
                                          • Reasonable goal for VZ x86
                                          servers – 40-50% utilization on
                                          large systems (>4way), rising as
                                          dual/quad core processors
                                          becomes available
                                          • Savings result in Real Estate,
                                          Power & Cooling, High Availability,
                                          Hardware, Management




  Source: Dan Olds & IMEX Research 2009
Storage Architecture - Impact from VZ                                                         IMEX
                                                                                               RESEARCH.COM




        Data Protection                             Storage Efficiency   Virtualization (VZ)
                                                                         requires Shared Storage for
           Back Up/Archive/DR                       Virtualization
                                                                         - VMotion
               RAID – 0,1,5,6,10                    Thin Provisioning    - Storage VMotion
                       Virtual Tape                 Deduplication        - HA/DRS
                                                    Auto Tiering         - Fault Tolerance
                        Replication
                                                    MAID                 Additional Capacity
                                                                         Consumed for
                                                                         - VZ snapshots,
                                                                         - VM Kernel etc.




Source: IMEX Research SSD Industry Report   ©2011




                                                                                                       39
Storage – Issues & Solutions                                               IMEX
                                                                           RESEARCH.COM




                                    Technologies Reducing Storage Costs
Storage Costs Reduction




                                Traditional
                                                  Thin Replication ~ 95%
                               Data Growth
                                                  Snapshots ~ 75%
                                              Virtual Clones ~80%
                                              Thin Provisioning ~30%
                                              DeDuplication ~ 25-95%
                                  CAPACITY
                                              Auto-Tiering 65-95%
                                  SAVINGS
                                    ~ xx %    RAID*DP ~ 40% vs. R10

                          Capacity Requirements
Storage Architecture Impacting Big Data                                                                           IMEX
                                                                                                                   RESEARCH.COM




             Data Protection                        Storage Efficiency
                Back Up/Archive/DR
                                                                                 Auto-Tiering System
                                                    Storage Virtualization
                                                                                  using Flash SSDs
                    RAID – 0,1,5,6,10               Thin Provisioning
                                                                                Data Class
                            Virtual Tape            Deduplication               (Tiers 0,1,2,3)
                                                    Auto-Tiering                Storage Media Type
                             Replication
                                                    MAID                        (Flash/Disk/Tape)
                                                                                Policy Engines
                                                                                (Workload Mgmt.)
                                                                     DRAM
                                                                                Transparent Migration
                                                                    Flash       (Data Placement)
                                                                                File Virtualization
                                                                     SSD
                                                                 Performance
                                                                     Disk       (Uninterrupted App.Opns.in Migration)

                                                                Capacity Disk


                                                                    Tape




Source: IMEX Research SSD Industry Report   ©2011




                                                                                                                         41
Data Storage: Hierarchical Usage                                                                  IMEX
                                                                                                       RESEARCH.COM




                                   I/O Access Frequency vs. Percent of Corporate Data

               95%

               75%
                                                                                 Cloud
               65%                                                FCoE/         Storage
                                                                  SAS              SATA
                                                                  Arrays
               % of I/O Accesses




                                                                                 • Back Up Data
                                      SSD                           • Tables     • Archived Data
                                      • Logs                       • Indices   • Offsite DataVault
                                      • Journals                  • Hot Data
                                      • Temp Tables
                                      • Hot Tables




                                    1% 2%                          10%          50%                         100%
                                                                                      % of Corporate Data
Source:: IMEX Research - Cloud Infrastructure Report   ©2009-12
                                                                                                               42
SSD Storage: Filling Price/Perf.Gaps IMEX                                                                    RESEARCH.COM




  Price
  $/GB                                                                          CPU
                                                                               SDRAM


                                                                                       DRAM getting
                                                                        DRAM
                                                       NOR                             Faster (to feed faster CPUs) &
                                                                                       Larger (to feed Multi-cores &
                                    NAND                                               Multi-VMs from Virtualization)
                                                                               PCIe
                                                                 SCM
                                                                               SSD

                                    HDD
                                                                                SSD segmenting into
                                                                 SATA            PCIe SSD Cache
                     Tape                                        SSD              - as backend to DRAM &
                                                                                 SATA SSD
                                        HDD becoming                               - as front end to HDD
                                        Cheaper, not faster
          Source: IMEX Research SSD Industry Report   ©2010-12

                                                                                                  Performance
                                                                                                  I/O Access Latency

                                                                                                                       43
SSD Storage - Performance & TCO                                                                                               IMEX
                                                                                                                              RESEARCH.COM




                 SAN TCO using HDD vs. Hybrid Storage                                             SAN Performance
           250
                                                                                                  Improvements using SSD
                                                                                        300                                                10
                                                                                                                      IOPS                 9
           200                                                                                   $/IOPS
                                                                                        250                        Improvement
                                                                                              Improvement                                  8
                                                                                                                      475%
                          145                                                                     800%                                     7
                                                                                        200
           150
                                    HDD-FC                                                                                                 6
 Cost $K




                                                             0




                                                                                 IOPS




                                                                                                                                                $/IOP
                                             HDD-            36                         150                                                5
           100                               SATA
                                                                                                                                           4
                           0
                                                                                        100
                                                                                                                                           3
                                              SSD            64
           50             75                                                                                                               2
                                                                                        50
                                   RackSpace                                                                                               1
                                                             28
                          14.2      Pwr/Cool                 5.2
                                                                                         0                                                 0
            0
                        HDD Only                           HDD/SSD                             FC-HDD Only             SSD/SATA-HDD
             Power & Cooling     RackSpace          SSDs     HDD SATA   HDD FC                    Performance (IOPS)       $/IOP
Source: IMEX Research SSD Industry Report   ©2011




                                                                                                                                      44
Workloads Characterization                                                                                             IMEX
                                                                                                                         RESEARCH.COM




                   1000 K
                                                                        OLTP
                                           Transaction
                                           Processing                   eCommerce
                             100 K
                                                                                          Business
        IOPS* (*Latency-1)




                                                (RAID - 1, 5, 6)           Data           Intelligence
                                                                                                                  (RAID - 0, 3)


                                                                        Warehousing
                              10K
                                                                                        OLAP


                               1K
                                                                                     Scientific Computing       HPC
                                                                                                 Imaging
                                              TP
                              100
                                                                                                       Audio             Web 2.0
                                                                         HPC
                                                                                                               Video
                               10
                                       1                        5               10                50            100               500
                                     *IOPS for a required response time ( ms)                                         MB/sec
                                     *=(#Channels*Latency-1)
Source:: IMEX Research - Cloud Infrastructure Report         ©2009-12
Workloads Characterization                                                              IMEX
                                                                                                 RESEARCH.COM




          Storage performance, management and costs
               are big issues in running Databases

           Data Warehousing Workloads are I/O intensive
                 • Predominantly read based with low hit ratios on buffer pools
                 • High concurrent sequential and random read levels
                     Sequential Reads requires high level of I/O Bandwidth (MB/sec)
                     Random Reads require high IOPS)
                 • Write rates driven by life cycle management and sort operations
           OLTP Workloads are strongly random I/O intensive
                 • Random I/O is more dominant
                     Read/write ratios of 80/20 are most common but can be 50/50
                     Can be difficult to build out test systems with sufficient I/O characteristics
           Batch Workloads are more write intensive
                 • Sequential Writes requires high level of I/O Bandwidth (MB/sec)
           Backup & Recovery times are critical for these workloads
                 • Backup operations drive high level of sequential IO
                 • Recovery operation drives high levels of random I/O

Source: IMEX Research SSD Industry Report   ©2011                                                      46
Best Practices – Storage in Big Data Apps                                      IMEX
                                                                               RESEARCH.COM


 Goals & Implementation
    • Establish Goals for SLAs (Performance/Cost/Availability), BC/DR (RPO/RTO)
      & Compliance
    • Increase Performance for DB, OLTP and OLAP Apps:
         −Random I/O > 20x , Sequential I/O Bandwidth > 5x
         −Remove Stale data from Production Resources to improve performance
     • Use Partitioning Software to Classify Data
         −By Frequency of Access (Recent Usage) and
         −Capacity (by percent of total Data) using general guidelines as:
         −Hyperactive (1%), Active (5%), Less Active (20%), Historical (74%)
 Implementation
     • Optimize Tiering by Classifying Hot & Cold Data
         −Improve Query Performance by reducing number of I/Os
         −Reduce number of Disks Needed by 25-50% using advance compression software
          achieving 2-4x compression
     • Match Data Classification vs.Tiered Devices accordingly
         −Flash, High Perf Disk, Low Cost Capacity Disk, Online Lowest Cost Archival
          Disk/Tape
     • Balance Cost vs. Performance of Flash
         −More Data in Flash > Higher Cache Hit Ratio > Improved Data Performance
     • Create and Auto-Manage Tiering (Monitoring, Migrations, Placements)
       without manual intervention
                                                                                       47
Best Practices: I/O Forensics in Storage-Tiering                                                           IMEX
                                                                                                           RESEARCH.COM




                                                                              Storage-Tiered Virtualization
                                                                              Storage-Tiering at LBA/Sub-LUN Level

                                                                       Physical Storage            Logical Volume




                                                                      SSDs                     Hot Data
                                                                     Arrays


                                                                                               Cold Data
                                                                      HDDs
                                                                     Arrays




 LBA Monitoring and Tiered Placement
 •     Every workload has unique I/O access signature
 •     Historical performance data for a LUN can
       identify performance skews & hot data regions                                    Automatic
       by LBAs                                                                          Migration
                                                                                       (Policy Based)




Source: IBM & IMEX Research SSD Industry Report 2011 ©IMEX 2010-12
                                                                                                                     48
Best Practices: Cached Storage                                                                                        IMEX
                                                                                                                      RESEARCH.COM




  App/DB                                                           Application                          Improvement over Cached
   Server                                                                                               vs.HDD only
                                                                   Oracle OLTP Benchmarks               681%
                                                                   SQL Server OLTP                      1251%
                                                                   Benchmark
                                                                   Neoload (Web Server                  533%
                                                                   Simulation
                                                                   SysBench (MySQL OLTP                 150%
            LSI MegaRAID CacheCade Pro 2.0
                                                                   Server)




                               Response Time                                               TPS
                                                         660          700                                  655
                       700
                       600                                            600
                       500                                            500
                       400                   330                                              373
                                                                      400
                       300
                                                                      300
                       200
                                                                      200
                       100           0
                         0                                            100
                                                                                   0
                                 All HDD Smart Flash    Persist          0
                                           Cache        Data on                  All HDD   Smart Flash Persist Data
                                                       Warpdrive                             Cache     on Warpdrive
                                                                                                                              49
Big Data Targets: Analytics                                               IMEX
                                                                          RESEARCH.COM


Key Industries Benefitting from Big Data Analytics

                    Global IT Spending by Industry Verticals 2010-15 $B
 CAGR % (2010-15)




                          5 Year Cum Global IT Spending 2010-15 ($B)



                                                                                50
Big Data Targets – Storage Infrastructure                                                                   IMEX
                                                                                                            RESEARCH.COM



  Value Potential of Using Big Data by Data Intensive Verticals


                    Data Intensity by Industry Vertical

             Banking & Financial Services
                   Media & Entrertainment
                      Healthcare Providers
                     Professional Services
                      Telecommunications
   Pharma, Life Sciences & Medical Pdcts
                      Retail & Wholesales
                                    Utilities
         Ind'l Elex & Electrical Equipment
           SW Publishing & Internet Srvcs
                       Consumer Products
                                  Insurance
                             Transportation
                                     Energy
                                                0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9

                                                        Installed Terabytes/$M Rev 2010

                                                                                                                  51
Big Data Targets: Storage Infrastructure                                                                                      IMEX
                                                                                                                              RESEARCH.COM



  Data Stored by Large US Enterprises
                                       Big Data Storage Potential
                                      Data Stored by Large US Enterprises
                  Big Data Storage Potential
                   Data Stored by Large US Enterprise                             Stored Data TB/Firm
                                        Stored Data by Industry
                                             (in US 2009 PB)                           (>1K Employees US)
            Discrete Manafacturing                                         14%   231

                       Government                                         12%    150

         Communications and Media                                   10%                      825

            Process manufacturing                               10%                                   1,507

                           Banking                             9%                      536

              Health Care Providers                     6%                                801
      Securities & Investment Srvcs                     6%                                   870
             Professional Services                      6%                        319
                             Retail                    5%                                697
                         Education                4%                              278
                         Insurance               4%                                                                   3,866
                     Transportation              3%                                370
                         Wholesale               3%                                                           1,931
                            Utilities           3%                                           831
              Resource Industries           2%                                                              1,792
        Consumer and Recreational
                                            2%                                                      1,312
               Services
                     Construction          1%                                                 967

                                                                                                                                    52
Big Data Targets: Savings w Open Source         IMEX
                                                RESEARCH.COM



 Legacy BI vs. Open Source Big Data Analytics




                                                      53
Rise of Big Data Adoption   IMEX
                            RESEARCH.COM




54
Key Takeaways                                                                               IMEX
                                                                                                   RESEARCH.COM




            Big Data creating paradigm shift in IT Industry
                Leverage the opportunity to optimize your computing infrastructure with Big Data
                Infrastructure after making a due diligence in selection of vendors/products, industry testing
                and interoperability.
                Apply best storage technologies listed in this presentation and elsewhere
            Optimize Big Data Analytics for Query Response Time vs. # of Users
                Improving Query Response time for a given number of users (IOPs) or Serving more users
                (IOPS) for a given query response time
            Select Automated Storage Management Software
               Data Forensics and Tiered Placement
             • Every workload has unique I/O access signature
             • Historical performance data for a LUN can identify performance skews & hot data regions by
               LBAs.Non-disruptively migrate hot data using auto-tiering Software
            Optimize Infrastructure to meet needs of Applications/SLA
             • Performance Economics/Benefits
             • Typically 4-8% of data becomes a candidate and when migrated for higher performance
               tiering can provide response time reduction of ~65% at peak loads. Many industry Verticals
               and Applications will benefit using Big Data

Source: IMEX Research SSD Industry Report   ©2011                                                          55

Weitere ähnliche Inhalte

Was ist angesagt?

Telco Big Data Workshop Sample
Telco Big Data Workshop SampleTelco Big Data Workshop Sample
Telco Big Data Workshop SampleAlan Quayle
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research reportJULIO GONZALEZ SANZ
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewSivashankar Ganapathy
 
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012Gigaom
 
Big Data & Future - Big Data, Analytics, Cloud, SDN, Internet of things
Big Data & Future - Big Data, Analytics, Cloud, SDN, Internet of thingsBig Data & Future - Big Data, Analytics, Cloud, SDN, Internet of things
Big Data & Future - Big Data, Analytics, Cloud, SDN, Internet of thingsRamakant Gawande
 
Big Data & the Cloud
Big Data & the CloudBig Data & the Cloud
Big Data & the CloudDATAVERSITY
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabatinabati
 
Structuring Big Data
Structuring Big DataStructuring Big Data
Structuring Big DataFujitsu UK
 
Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)SiamAhmed16
 
Overview - IBM Big Data Platform
Overview - IBM Big Data PlatformOverview - IBM Big Data Platform
Overview - IBM Big Data PlatformVikas Manoria
 
big data analytics in mobile cellular network
big data analytics in mobile cellular networkbig data analytics in mobile cellular network
big data analytics in mobile cellular networkshubham patil
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxPankajkumar496281
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadhMithlesh Sadh
 

Was ist angesagt? (20)

Telco Big Data Workshop Sample
Telco Big Data Workshop SampleTelco Big Data Workshop Sample
Telco Big Data Workshop Sample
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research report
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies Overview
 
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
THE 3V's OF BIG DATA: VARIETY, VELOCITY, AND VOLUME from Structure:Data 2012
 
Big Data & Future - Big Data, Analytics, Cloud, SDN, Internet of things
Big Data & Future - Big Data, Analytics, Cloud, SDN, Internet of thingsBig Data & Future - Big Data, Analytics, Cloud, SDN, Internet of things
Big Data & Future - Big Data, Analytics, Cloud, SDN, Internet of things
 
Big Data & the Cloud
Big Data & the CloudBig Data & the Cloud
Big Data & the Cloud
 
Big Data Hadoop Training by Easylearning Guru
Big Data Hadoop Training by Easylearning GuruBig Data Hadoop Training by Easylearning Guru
Big Data Hadoop Training by Easylearning Guru
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
 
Big data storage
Big data storageBig data storage
Big data storage
 
Structuring Big Data
Structuring Big DataStructuring Big Data
Structuring Big Data
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)
 
Big Data Overview
Big Data OverviewBig Data Overview
Big Data Overview
 
Overview - IBM Big Data Platform
Overview - IBM Big Data PlatformOverview - IBM Big Data Platform
Overview - IBM Big Data Platform
 
big data analytics in mobile cellular network
big data analytics in mobile cellular networkbig data analytics in mobile cellular network
big data analytics in mobile cellular network
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptx
 
Big_data_ppt
Big_data_ppt Big_data_ppt
Big_data_ppt
 
Motivation for big data
Motivation for big dataMotivation for big data
Motivation for big data
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 

Ähnlich wie NextGen Infrastructure for Big Data

IMEX Research - Is Solid State Storage Ready for Enterprise & Cloud Computing...
IMEX Research - Is Solid State Storage Ready for Enterprise & Cloud Computing...IMEX Research - Is Solid State Storage Ready for Enterprise & Cloud Computing...
IMEX Research - Is Solid State Storage Ready for Enterprise & Cloud Computing...Anil Vasudeva
 
Solving the IO Bottleneck
Solving the IO BottleneckSolving the IO Bottleneck
Solving the IO BottleneckIMEX Research
 
Solving io bottleneck
Solving io bottleneckSolving io bottleneck
Solving io bottleneckAnil Vasudeva
 
Key to Efficient Tiered Storage Infrastructure
Key to Efficient Tiered  Storage InfrastructureKey to Efficient Tiered  Storage Infrastructure
Key to Efficient Tiered Storage InfrastructureIMEX Research
 
SSD: Ready for Enterprise and Cloud?
SSD: Ready for Enterprise and Cloud?SSD: Ready for Enterprise and Cloud?
SSD: Ready for Enterprise and Cloud?IMEX Research
 
Big Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureBig Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureOdinot Stanislas
 
Martin Wildberger Presentation
Martin Wildberger PresentationMartin Wildberger Presentation
Martin Wildberger PresentationMauricio Godoy
 
Micro Strategies Overview
Micro Strategies OverviewMicro Strategies Overview
Micro Strategies Overviewjvbrennan
 
Ibm big dataandanalytics_28433_archposter_wht_mar_2014_v4
Ibm big dataandanalytics_28433_archposter_wht_mar_2014_v4Ibm big dataandanalytics_28433_archposter_wht_mar_2014_v4
Ibm big dataandanalytics_28433_archposter_wht_mar_2014_v4Friedel Jonker
 
Storage for cloud
Storage for cloudStorage for cloud
Storage for cloudAccenture
 
Fosec2011 keynote address
Fosec2011 keynote addressFosec2011 keynote address
Fosec2011 keynote addressthreesixty
 
01 im overview high level
01 im overview high level01 im overview high level
01 im overview high levelJames Findlay
 
IBM Software Day 2013. Turning opportunities into outcomes
IBM Software Day 2013. Turning opportunities into outcomesIBM Software Day 2013. Turning opportunities into outcomes
IBM Software Day 2013. Turning opportunities into outcomesIBM (Middle East and Africa)
 
Move your desktop to the cloud for $1 day
Move your desktop to the cloud for $1 day Move your desktop to the cloud for $1 day
Move your desktop to the cloud for $1 day Desktone
 
Asigra Product Marketing Strategy
Asigra Product Marketing StrategyAsigra Product Marketing Strategy
Asigra Product Marketing StrategyJas Mann
 
Konsolider, optimer og automatiser dit servermiljø med IBM PureApplications S...
Konsolider, optimer og automatiser dit servermiljø med IBM PureApplications S...Konsolider, optimer og automatiser dit servermiljø med IBM PureApplications S...
Konsolider, optimer og automatiser dit servermiljø med IBM PureApplications S...IBM Danmark
 

Ähnlich wie NextGen Infrastructure for Big Data (20)

IMEX Research - Is Solid State Storage Ready for Enterprise & Cloud Computing...
IMEX Research - Is Solid State Storage Ready for Enterprise & Cloud Computing...IMEX Research - Is Solid State Storage Ready for Enterprise & Cloud Computing...
IMEX Research - Is Solid State Storage Ready for Enterprise & Cloud Computing...
 
Solving the IO Bottleneck
Solving the IO BottleneckSolving the IO Bottleneck
Solving the IO Bottleneck
 
Solving io bottleneck
Solving io bottleneckSolving io bottleneck
Solving io bottleneck
 
Key to Efficient Tiered Storage Infrastructure
Key to Efficient Tiered  Storage InfrastructureKey to Efficient Tiered  Storage Infrastructure
Key to Efficient Tiered Storage Infrastructure
 
SSD: Ready for Enterprise and Cloud?
SSD: Ready for Enterprise and Cloud?SSD: Ready for Enterprise and Cloud?
SSD: Ready for Enterprise and Cloud?
 
Big Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureBig Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the Future
 
The New Enterprise Data Platform
The New Enterprise Data PlatformThe New Enterprise Data Platform
The New Enterprise Data Platform
 
Martin Wildberger Presentation
Martin Wildberger PresentationMartin Wildberger Presentation
Martin Wildberger Presentation
 
101 ab 1445-1515
101 ab 1445-1515101 ab 1445-1515
101 ab 1445-1515
 
101 ab 1445-1515
101 ab 1445-1515101 ab 1445-1515
101 ab 1445-1515
 
Micro Strategies Overview
Micro Strategies OverviewMicro Strategies Overview
Micro Strategies Overview
 
Ibm big dataandanalytics_28433_archposter_wht_mar_2014_v4
Ibm big dataandanalytics_28433_archposter_wht_mar_2014_v4Ibm big dataandanalytics_28433_archposter_wht_mar_2014_v4
Ibm big dataandanalytics_28433_archposter_wht_mar_2014_v4
 
Arc view convergence of ai and io t report
Arc view convergence of ai and io t reportArc view convergence of ai and io t report
Arc view convergence of ai and io t report
 
Storage for cloud
Storage for cloudStorage for cloud
Storage for cloud
 
Fosec2011 keynote address
Fosec2011 keynote addressFosec2011 keynote address
Fosec2011 keynote address
 
01 im overview high level
01 im overview high level01 im overview high level
01 im overview high level
 
IBM Software Day 2013. Turning opportunities into outcomes
IBM Software Day 2013. Turning opportunities into outcomesIBM Software Day 2013. Turning opportunities into outcomes
IBM Software Day 2013. Turning opportunities into outcomes
 
Move your desktop to the cloud for $1 day
Move your desktop to the cloud for $1 day Move your desktop to the cloud for $1 day
Move your desktop to the cloud for $1 day
 
Asigra Product Marketing Strategy
Asigra Product Marketing StrategyAsigra Product Marketing Strategy
Asigra Product Marketing Strategy
 
Konsolider, optimer og automatiser dit servermiljø med IBM PureApplications S...
Konsolider, optimer og automatiser dit servermiljø med IBM PureApplications S...Konsolider, optimer og automatiser dit servermiljø med IBM PureApplications S...
Konsolider, optimer og automatiser dit servermiljø med IBM PureApplications S...
 

Mehr von Ed Dodds

Updated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural America
Updated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural AmericaUpdated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural America
Updated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural AmericaEd Dodds
 
ILSR 2019 12 rural coop policy brief update page 8
ILSR 2019 12 rural coop policy brief update page 8ILSR 2019 12 rural coop policy brief update page 8
ILSR 2019 12 rural coop policy brief update page 8Ed Dodds
 
Maximizing information and communications technologies for development in fai...
Maximizing information and communications technologies for development in fai...Maximizing information and communications technologies for development in fai...
Maximizing information and communications technologies for development in fai...Ed Dodds
 
Iris Ritter interconnection map
Iris Ritter interconnection mapIris Ritter interconnection map
Iris Ritter interconnection mapEd Dodds
 
Inoversity - Bob Metcalfe
Inoversity - Bob MetcalfeInoversity - Bob Metcalfe
Inoversity - Bob MetcalfeEd Dodds
 
Distributed Ledger Technology
Distributed Ledger TechnologyDistributed Ledger Technology
Distributed Ledger TechnologyEd Dodds
 
UCX: An Open Source Framework for HPC Network APIs and Beyond
UCX: An Open Source Framework for HPC Network APIs and BeyondUCX: An Open Source Framework for HPC Network APIs and Beyond
UCX: An Open Source Framework for HPC Network APIs and BeyondEd Dodds
 
Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...
Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...
Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...Ed Dodds
 
Innovation Accelerators Report
Innovation Accelerators ReportInnovation Accelerators Report
Innovation Accelerators ReportEd Dodds
 
Strategy for American Innovation
Strategy for American InnovationStrategy for American Innovation
Strategy for American InnovationEd Dodds
 
Collaboration with NSFCloud
Collaboration with NSFCloudCollaboration with NSFCloud
Collaboration with NSFCloudEd Dodds
 
AppImpact: A Framework for Mobile Technology in Behavioral Healthcare
AppImpact: A Framework for Mobile Technology in Behavioral HealthcareAppImpact: A Framework for Mobile Technology in Behavioral Healthcare
AppImpact: A Framework for Mobile Technology in Behavioral HealthcareEd Dodds
 
Report to the President and Congress Ensuring Leadership in Federally Funded ...
Report to the President and Congress Ensuring Leadership in Federally Funded ...Report to the President and Congress Ensuring Leadership in Federally Funded ...
Report to the President and Congress Ensuring Leadership in Federally Funded ...Ed Dodds
 
Data Act Federal Register Notice Public Summary of Responses
Data Act Federal Register Notice Public Summary of ResponsesData Act Federal Register Notice Public Summary of Responses
Data Act Federal Register Notice Public Summary of ResponsesEd Dodds
 
Gloriad.flo con.2014.01
Gloriad.flo con.2014.01Gloriad.flo con.2014.01
Gloriad.flo con.2014.01Ed Dodds
 
2014 COMPENDIUM Edition of National Research and Education Networks in Europe
2014 COMPENDIUM Edition of National Research and  Education Networks in Europe2014 COMPENDIUM Edition of National Research and  Education Networks in Europe
2014 COMPENDIUM Edition of National Research and Education Networks in EuropeEd Dodds
 
New Westminster Keynote - Norman Jacknis
New Westminster Keynote - Norman JacknisNew Westminster Keynote - Norman Jacknis
New Westminster Keynote - Norman JacknisEd Dodds
 
HIMSS Innovation Pathways Summary
HIMSS Innovation Pathways SummaryHIMSS Innovation Pathways Summary
HIMSS Innovation Pathways SummaryEd Dodds
 

Mehr von Ed Dodds (20)

Updated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural America
Updated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural AmericaUpdated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural America
Updated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural America
 
ILSR 2019 12 rural coop policy brief update page 8
ILSR 2019 12 rural coop policy brief update page 8ILSR 2019 12 rural coop policy brief update page 8
ILSR 2019 12 rural coop policy brief update page 8
 
Maximizing information and communications technologies for development in fai...
Maximizing information and communications technologies for development in fai...Maximizing information and communications technologies for development in fai...
Maximizing information and communications technologies for development in fai...
 
Iris Ritter interconnection map
Iris Ritter interconnection mapIris Ritter interconnection map
Iris Ritter interconnection map
 
Inoversity - Bob Metcalfe
Inoversity - Bob MetcalfeInoversity - Bob Metcalfe
Inoversity - Bob Metcalfe
 
Distributed Ledger Technology
Distributed Ledger TechnologyDistributed Ledger Technology
Distributed Ledger Technology
 
UCX: An Open Source Framework for HPC Network APIs and Beyond
UCX: An Open Source Framework for HPC Network APIs and BeyondUCX: An Open Source Framework for HPC Network APIs and Beyond
UCX: An Open Source Framework for HPC Network APIs and Beyond
 
Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...
Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...
Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...
 
Jetstream
JetstreamJetstream
Jetstream
 
Innovation Accelerators Report
Innovation Accelerators ReportInnovation Accelerators Report
Innovation Accelerators Report
 
Work.
Work.Work.
Work.
 
Strategy for American Innovation
Strategy for American InnovationStrategy for American Innovation
Strategy for American Innovation
 
Collaboration with NSFCloud
Collaboration with NSFCloudCollaboration with NSFCloud
Collaboration with NSFCloud
 
AppImpact: A Framework for Mobile Technology in Behavioral Healthcare
AppImpact: A Framework for Mobile Technology in Behavioral HealthcareAppImpact: A Framework for Mobile Technology in Behavioral Healthcare
AppImpact: A Framework for Mobile Technology in Behavioral Healthcare
 
Report to the President and Congress Ensuring Leadership in Federally Funded ...
Report to the President and Congress Ensuring Leadership in Federally Funded ...Report to the President and Congress Ensuring Leadership in Federally Funded ...
Report to the President and Congress Ensuring Leadership in Federally Funded ...
 
Data Act Federal Register Notice Public Summary of Responses
Data Act Federal Register Notice Public Summary of ResponsesData Act Federal Register Notice Public Summary of Responses
Data Act Federal Register Notice Public Summary of Responses
 
Gloriad.flo con.2014.01
Gloriad.flo con.2014.01Gloriad.flo con.2014.01
Gloriad.flo con.2014.01
 
2014 COMPENDIUM Edition of National Research and Education Networks in Europe
2014 COMPENDIUM Edition of National Research and  Education Networks in Europe2014 COMPENDIUM Edition of National Research and  Education Networks in Europe
2014 COMPENDIUM Edition of National Research and Education Networks in Europe
 
New Westminster Keynote - Norman Jacknis
New Westminster Keynote - Norman JacknisNew Westminster Keynote - Norman Jacknis
New Westminster Keynote - Norman Jacknis
 
HIMSS Innovation Pathways Summary
HIMSS Innovation Pathways SummaryHIMSS Innovation Pathways Summary
HIMSS Innovation Pathways Summary
 

Kürzlich hochgeladen

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 

Kürzlich hochgeladen (20)

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 

NextGen Infrastructure for Big Data

  • 1. IMEX RESEARCH.COM NextGen Infrastructure Are SSDs Ready for Enterprise Storage Systems Anil Vasudeva, President & Chief Analyst, IMEX Research for Big Data © 2007-12 IMEX Research All Rights Reserved Copying Prohibited Contact IMEX for authorization Anil Vasudeva President & Chief Analyst imex@imexresearch.com IMEX RESEARCH.COM 408-268-0800 © 2010-12 IMEX Research, Copying prohibited. All rights reserved.
  • 2. Abstract IMEX RESEARCH.COM • NextGen Infrastructure for Big Data • This session will appeal to Business Planning, Marketing, Technology System Integrators and Data Center Managers seeking to understand the drivers behind the demand for and rise of Big Data. • Abstract • The internet has spawned an explosion in data growth in the form of data sets, called Big Data, that are so large they are difficult to store, manage and analyze using traditional RDBMS which are tuned for Online Transaction Processing (OLTP) only. Not only is this new data heavily unstructured, voluminous and streams rapidly and difficult to harness but even more importantly, the infrastructure cost of HW and SW required to crunch it using traditional RDBMS, to derive any analytics or business intelligence online (OLAP) from it, is prohibitive. • To capitalize on the Big Data trend, a new breed of Big Data technologies (such as Hadoop and others) many companies have emerged which are leveraging new parallelized processing, commodity hardware, open source software and tools to capture and analyze these new data sets and provide a price/performance that is 10 times better than existing Database/Data Warehousing/Business Intelligence Systems. • Learning Objectives • The presentation will illustrate the existing operational challenges businesses face today using RDBMS systems despite using fast access in-memory and solid state storage technologies. It details how IT is harnessing the emergent Big Data to manage massive amounts of data and new techniques such as parallelization and virtualization to solve complex problems in order to empower businesses with knowledgeable decision-making. • It lays out the rapidly evolving big data technology ecosystem - different big data technologies from Hadoop, Distributed File Systems, emerging NoSQL derivatives for implementation in private and hybrid cloud-based environments, Storage Infrastructure Requirements to Store, Access, Secure, Prepare for analytics and visualization of data while manipulating it rapidly to derive business intelligence online, to run businesses smartly. 2 2
  • 3. Big Data in IT Industry Roadmap IMEX RESEARCH.COM Analytics – BI IT Industry Predictive Analytics - Unstructured Data Roadmap From Dashboards Visualization to Prediction Engines using Big Data. Cloudization On-Premises > Private Clouds > Public Clouds DC to Cloud-Aware Infrast. & Apps. Cascade migration to SPs/Public Clouds. Automation Automatically Maintains Application SLAs (Self-Configuration, Self-Healing©IMEX, Self-Acctg. Charges etc.) Virtualization Pools Resources. Provisions, Optimizes, Monitors Shuffles Resources to optimize Delivery of various Business Services Integration/Consolidation Integrate Physical Infrast./Blades to meet CAPSIMS ®IMEX Cost, Availability, Performance, Scalability, Inter-operability, Manageability & Security Standardization Standard IT Infrastructure- Volume Economics HW/Syst SW (Servers, Storage, Networking Devices, System Software (OS, MW & Data Mgmt. SW) 3
  • 4. Harnessing Big Data for Business Insights IMEX RESEARCH.COM Data Big Data Business Sources Infrastructure Insights Information is at the center of Majority of data growth is being 80% of world’s data is unstructured New Wave of opportunity driven by unstructured data driven by rise in Mobility devices, and billions of large objects collaboration machine generated data. 4
  • 5. Corporate Need: Business Perf... OptimizationIMEX RESEARCH.COM Unstructured Big Data can provide Next Gen Analytics to help businesses make informed, better decision in: ▪ Product Strategy ▪ Targeting Sales ▪ Just-In-Time Supply-Chain Economics ▪ Business Performance Optimization ▪ Predictive Analytics & Recommendations ▪ Country Resources Management 5
  • 6. Corporate Need: Real Time Analytics IMEX RESEARCH.COM 6
  • 7. Corporate Need: Business Insights IMEX RESEARCH.COM Item Issue . Solution Information Exploding A unified information/content Volume: Digital Content doubling every 18 storage methodology that enables Store months. Velocity: >80% growth driven from users to manage the volume, velocity and unstructured data. Variety: sources of data variety of information from multiple sources changing Complexity in "managing" A solution portfolio of tools and information. - Need to classify, services to manage all types of Manage synchronize, aggregate, integrate, share, information in a hybrid storage environment transform, profile, move, cleanse, protect, retire Current solutions limited to BI tools Build/buy packaged Real-Time Analyze focused on structured and lagging Predictive Analytical Solutions for information unstructured analytics tools Multiple access methods needed to Centralized share, collaborate and Collaborate meet needs of a diverse audience. act on insights anytime, anywhere on any device. Ability to understand how the Model Information on current Model/ Adapt information impacts the business. operations w/potential strategy How to transfer to action. impact. Leverage Tech. to adapt. 7
  • 8. Opportunity: Converting Big Data Deluge into Predictive Analytics & Insights IMEX RESEARCH.COM Big Data Predictive Analytics Personal Location Services Data Generated by TB/Year Navigation Devices 600 Navigation Apps on Phones 20 Smart Phone Opt-In Tracking 1000 Geo Targeted Ads 20 People Locator (Emergency Calls/Search..) 10 Location based Services (e.g.Games) 5 Other 45 Total (Est.) 1700 8
  • 9. Issues with Existing RDBMS IMEX RESEARCH.COM Key Issues with RDBMS Technologies Handling Mixed Unstructured Data Unstructured Scalability - RDBMs don’t handle non-tabular data Data (Notorious for doing a poor job on recursive data structure) Legacy Archaic Architecture - RDBMS don’t parallelize well to accommodate Fault commodity HW clusters Cost Tolerance Speed Big Data - Seek time of physical Storage has not kept pace with Analytics network speed improvements EU Adhoc Latency Scale Analytics - Difficult to scale-out RDBMS efficiently – Clustering beyond few servers notoriously hard Integration Deep - Data processing tasks need to combine data from non- Availability Insights related sources, over a network Volume - Data volumes have grown from 10s GB >100s TB > PBs in recent years. Existing Tabular RDBMS can’t handle such large DBs 9
  • 10. Issues with Existing RDBMS IMEX RESEARCH.COM Present RDBMS struggling to Store & Analyze Big Data 10
  • 11. Big Data - Database Solutions IMEX RESEARCH.COM 11
  • 12. Big Data – The New Face of DBs IMEX RESEARCH.COM Big Data Paradigm - The New face of DB Systems • Adopts Schema-Free Architecture • Can do away with Legacy Relational DB Systems Some data have sparse attributes, do not need relational property • Key Oriented Queries Some data stored/retrieved mainly by primary key, w/o complex joins • Trade-off of Consistency, Availability & Partition Tolerance • Scale Out, not up, - Online Load balancing cluster growth 12
  • 13. Analytics – The Next Frontier in IT IMEX RESEARCH.COM 13
  • 14. Key Innovations: HW Technologies IMEX RESEARCH.COM 14
  • 15. Key Innovations – Solid State Storage IMEX RESEARCH.COM Storage - IOPS/GB & Price Erosion - HDD vs. SSDs 600 8 HDD Units (Millions) IOPS/GB 300 SSD IOPS/GB HDDs 0 0 2009 2010 2011 2012 2013 2014 Note: 2U storage rack, • 2.5” HDD max cap = 400GB / 24 HDDs, de-stroked to 20%, • 2.5” SSD max cap = 800GB / 36 SSDs Key to Database performance are random IOPS. SSDs outshine HDD in IO price/performance – a major reason, besides better space and power, for their explosive growth. ©2011 1515 Source: IMEX Research SSD Industry Report
  • 16. Innovations – DB SW Technologies IMEX RESEARCH.COM Tech Innovation 1985 1990 1995 2000 2005 2010 2015 OLTP Transactions Open Source / Rows Locking Optimizer Parallel Query Clustering XML Grid DB SW Hadoop OLAP- Analytics Materialized Bit Mapped Indexing Partitioning Columnar In-Memory Query Binding DB SW View Index Multi-core/ Hardware 32 bit SMP NUMA 64 bit Flash MPP Blades Columnar MPP Big Data Multi-core In-Memory Visualization OLTP Database Innovation Progress Big Data: Analytics DB Technology Impact 100000 10000 100,000 100000 10000 10000 1000 10,000 $/GB 1000 TPMc / Processor 1,000 DW Size TB $ / TPMc 1000 100 100 $/TPMc 100 10 100 TPMc/Processor 10 10 1 10 1 0.1 1 0.01 1 0.1 .1 0 0.001 0.1 0.01 .01 0 0.0001 1985 1990 1995 2000 2005 2010 2015 1985 1990 1995 2000 2005 2010 2015 16
  • 17. Big Data - Key Requirements IMEX RESEARCH.COM Types of Data Organizations Analyze 59% Customer/member data 68% 44% Transactional data from applications 68% 69% Application Logs 37% 64% Other Types of Event Data 23% 41% Network Monitoring/Network Traffic 33% 33% Online Retail Transactions 34% 51% Other Log Files 26% 28% Call Data Records 32% 46% Web Logs 21% 36% Text data from social media and online 15% 36% Search logs 11% 18% Trade/quote data 15% Hadoop 18% Intelligence/defense data 11% Non Hadoop 21% Multimedia (audio/video/images) 9% 8% Weather 3% 3% Smartmeter data 6% 3% Other (please specify) 5% 17
  • 18. Big Data – Architectural Goals IMEX RESEARCH.COM Meet Requirements of V3 Meet Enterprise Criterion Big Data Platform Analyze Data in Native Format 18
  • 19. Big Data - Market Requirements IMEX RESEARCH.COM Unified system: Pre-integrated for Ease of Installation and Management • Platform – Large Scale Indexing Pre-integrated using Hadoop Foundation, • Integrated Text Analytics - Address Unstructured Data • Usability - User Friendly Admin Console including HDFS Explorer, Query Languages • Enterprise Class Features – Provisioning, Storage, Scheduler, Advance Security • Supports search-centric, document-based XML data model o store documents within a transactional repository. • Schema-Free: o No advance knowledge of the document structure (its "schema") needed o Index words and values from each of the loaded documents together with its document structure. • Standard commodity hardware leveraged 19
  • 20. Big Data – Market Requirements IMEX RESEARCH.COM Architectural • Shared-nothing clustered DB architecture o programmable and extensible application servers. • Support massive scalability to petabytes of source data • Support open-source XQuery- and XSLT-driven architecture • Simple to Deploy, Develop and Manage (UI & Restful Interface) • Support extreme mixed workloads - a wide variety of data types including arbitrarily hierarchical data structures, images, waveforms, data logs etc. • Support thousands of geographically dispersed on--line users and programs executing variety of requests from ad hoc queries to strategic analysis • Loading data before declaring or discovering its structure • Load data in batch and streaming fashion • Integrate data from multiple sources during load process at very high rates Spread I/O and data across instances • Provide consistent performance with linear cost • Leverage Open Source SW Lo Costs, Multiple Sources, Hadoop Foundation Tools • Connectivity with Oracle DB, Teradata Warehouse, JDBC Connectivity, 20
  • 21. Big Data - Market Requirements IMEX RESEARCH.COM Real Time Analytics Execution • Execute “streaming” analytic queries in real time on incoming load data • Updating data in place at full load speeds • Scheduling and execution of complex multi-hundred node workflows • Join a billion row dimension table to a trillion row fact table without pre-clustering the dimension table with the fact table Performance • Analyze data, at very high rates >GB/sec • Predictable Sub-ms response time for highly constrained standard SQL queries Availability • Ability to configure without any single point of failure • Auto-Failover Extreme High Availability o Automated failover and process continuation without operational interruption when processing nodes fail 21
  • 22. Big Data – Product Metrics Choices IMEX RESEARCH.COM PB Data Set Size TB GB Transaction Machine Data Structure Unstructured Other Transaction Access/Use Search Big Data Analytics - Appliance Product Parallel Cluster < 1K Processing Metrics Cluster > 1K In-Memory Memory Flash Columnar DB Technique Zero Sharing No SQL Text Data Image Cataloging Audio SW Video 22
  • 23. Advantage: Big Data Products IMEX RESEARCH.COM Characteristic Legacy Paradigm Big Data Paradigm Structure •Transactional/Corporate •Unstructured/Derivative/Internet Mode •Data Collection •Data Analysis Focus •Find Answers •Find Questions Facility •Reportive / What Happened? •Analytic / Why did it Happen? Predictive / What will Happen Next? Opportunity •Very Small Growth •Massive Growth Players •Legacy Players •Agile Start Ups, well funded Impact •Analyze Existing Businesses •Create New Businesses 23
  • 24. Advantage: Big Data Products IMEX RESEARCH.COM Characteristic Traditional RDBMS Big Data/MapReduce Data Size •GB •PB Access •Interactive •Batch/Near Real-Time Latency •Low •High Data Updates •Read & Write Many Times •Write Once Read Many Times Schema/Structure •Static Schema •Dynamic Schema Language •SQL •UQL/Procedural (Java,C++..) Integrity •High •Not 100% Works Well for •Process Intensive Jobs •Data Intensive Jobs Works Well w Data Size •Gigabytes •Petabytes Data/Processing •Low Latency/High BW – precursor to •Sends Code to Data, instead of Interactions success. Ntwk. BW can be a Sending Data to other Nodes bottleneck causing nodes to be idle (Requiring Lower BW in Cluster) Fault Tolerance •Coordinating Processes with Node •Fault Tolerant for HW/SW Failures Failures – a challenge Access •Interactive •Batch/Near Real-time Scaling •Non-linear •Linear Pgm-Distribution of Jobs •Difficult •Simple & Effective 24
  • 25. Big Data Ecosystem IMEX RESEARCH.COM 25
  • 26. Big Data Stack IMEX RESEARCH.COM Merging Hadoop innovations into Nextgen DBMS Data Analytics Platform Applications Sources Advanced RHive Analytics Oracle DBA PerfMon, Query IBM Teradata… Data Enterprise Plan Importer Hive Workflow ETL Databases Oracle-to-Hive Ad-hoc query Data Data Sources Importer OLAP Data Server Data Store exporter Streaming (Hadoop) OLTP Data Server Collector Devices Rest/ JSON Search API Real-time Admin Center Queries 26
  • 27. Hadoop’s Fit in Enterprise Stack IMEX RESEARCH.COM 27
  • 28. Big Data - Hadoop Architecture IMEXRESEARCH.COM Data Flow SQL Programming (Pig) (Hive) Languages Coordination Management Distributed Computing Framework (Zookeeper) Computations (MapReduce) (HMS) Metadata Column-Stg Tabular (HCatalog) (HBase) Storage Hadoop Distributed File System Object (HDFS) Storage 28
  • 29. Big Data Connectors to EDW/BI IMEX RESEARCH.COM BI Framework - Interoperable with Enterprise Data Warehousing EDW BI APPLICATIONS (Query, Analytics, Reporting, Statistics) Big Data Big Data Orchestration Framework (Connectors) HBase Avro Flume ZooKeeper Big Data Access Framework Backup & Recovery Hive Sqoop Management Pig Security Network Big Data Storage Framework Big Data Processing Framework (HDFS) (MapReduce) Hypervisor/VMs Operating System Servers 29
  • 30. Big Data Infrastructure – Map IMEX RESEARCH.COM Reduce Data Flow SQL (Pig) (Hive) MapReduce Coordination Management Computing (Zookeeper) (HMS) Framework) Metadata MetadataColumn-Stg (HCatalog)CV (HBase) (HCatalog)CV Hadoop Distributed File System (HDFS) Map Reduce  A Distributed Computing Model  Typical Pipeline: Input>Map>Shuffle/Sort>Reduce>Output  Easy to Use , Developer writes few functions, Moves compute to Data  Schedules work on HDFS node with data  Scans through data, reducing seeks  Automatic Reliability and re-execution on failure 30
  • 31. Big Data Infrastructure – HDFS IMEX RESEARCH.COM HDFS Architecture Actively Maintaining High Availability Persistent Namespace Metadata & Journal Persistent Namespace Metadata & Journal Namespace NFS Namespace Namespace NFS Namespace State Block State Block NFS Map NFS Map Name Node Name Node Hierarchical Namespace File name >> Block IDs >> Block Locations 0. Bad/ 3. Block Lost Block 1. Replicate Periodically Check Heartbeats & Block Reports Received Block Checksums b1 b2 b1 b2 b1 b2 b1 b2 b1 b2 b1 b2 b1 b2 b1 b2 b3 Data Node b3 Data Node b3 Data Node b3 Data Node b3 Data Node b3 Data Node 2. Copy b3 Data Node b3 Data Node JBOD JBOD JBOD JBOD JBOD JBOD JBOD JBOD Block ID >> Data Block ID >> Data Horizontally Scale I/O & Storage Horizontally Scale I/O & Storage HDFS  Immutable File System – Read, Write, Sync/Flush – No random writes  Storage Server used for Computation – Move Computation to Data  Fault Tolerant & Easy Management – Built In Redundancy, Tolerates Disk & Node Failure, Auto-Managing addition/removal of nodes, One operator/8K nodes  Not a SAN but high bandwidth network access to data via Ethernet  Used typically to Solve problems not feasible with traditional systems: Large Storage Capacity >100PB raw, Large IO/computational BW >4K node/cluster, scale by adding commodity HW, Cost ~$1.5/GB incl. MR cluster
  • 32. Hadoop Distributed File System IMEX RESEARCH.COM HDFS Architecture HDFS Characteristics • Based on Google GFS (Google File System) • Redundant Storage for massive amounts of data • Data is distributed across all nodes at load time – efficient MapReduce processing • Runs on commodity hardware – assumes high failure rate for components • Works well with lots of large files • Built around Write once – Read many times” • Large Streaming Reads – Not random access • High Throughtput more important than low latency 32
  • 33. Hadoop Architecture - Overview IMEX RESEARCH.COM Hadoop Data Processing Architecture 33
  • 34. Key Technologies Required for Big Data IMEX RESEARCH.COM Key Technologies Required for Big Data • Cloud Infrastructure • Virtualization • Networking • Storage o In-Memory Data Base (Solid State Memory) o Tiered Storage Software (Performance Enhancement) o Deduplication (Cost Reduction) o Data Protection (Back Up, Archive & Recovery) 34
  • 35. Cloud Infrastructure for Big Data IMEX RESEARCH.COM Service Cloud Services Providers Examples Public - BT, Telstra, T-Systems France Telecom Public – Mutitenancy,OnDemand Providers Private - On Premises, Enterprise Private – Hybrid – IBM/Cloudburst, Hybrid – Interoperable P2P Examples SaaS Software-as-a-Service - Servers, Network, Storage eMail - Yahoo!,Google… Collaboration - Facebook,Twitter … - Management, Reporting Bus.Apps - SalesForce, GoogleApps, Intuit… Examples PaaS Platform Tools & Services - Deploy developed platforms Amazon EC2 Force.com ready for Application SW on Navitaire Cloud Aware Infrastructure Examples Infrastructure HW & IaaS Amazon S3 Services Nirvanix - Servers, Network, Storage - Management, Reporting 35
  • 36. Cloud Infrastructure for Big Data IMEX RESEARCH.COM Cloud Computing Private Cloud Hybrid Public Cloud Service Enterprise Cloud Providers SaaS Applications SaaS App App App App App SLA SLA SLA SLA SLA Platform Tools & Services Management Python Ruby .Net ….. ….. EJB PHP PaaS Operating Systems Virtualization IaaS Resources (Servers, Storage, Networks) Application’s SLA dictates the Resources Required to meet specific requirements of Availability, Performance, Cost, Security, Manageability etc.
  • 37. Private Cloud Requirements for Big Data IMEX RESEARCH.COM Public Operational Flexibility Cloud Storage Private Cloud Storage Service Catalog VZ  Choose pre-defined IT Services/user/dept. App  Define SLA to efficiently meet services Silos Costs Self- Service Private Service Cloud Analytics  Access Resources on demand to Storage  Monitor & Analyze Usage for speed deployment and delivery Charge back  Scale Resource Up/down  Interactively auto-tune  to optimize their usage, performance  Release when not needed Availability with SLA Automation requirements  Automated Provisioning - Moving Data & Processes Seamlessly  Allocation, Self Tuning of Resources to meet Workload Requirements
  • 38. Virtualization: Workloads Consolidation IMEX RESEARCH.COM • A single server 1.5x larger than standard 2-way server will handle consolidated load of 6 servers. • VZ manages the workloads + important apps get the compute resources they need automatically w/o operator intervention. • Physical consolidation of 15-20:1 is easily possible • Reasonable goal for VZ x86 servers – 40-50% utilization on large systems (>4way), rising as dual/quad core processors becomes available • Savings result in Real Estate, Power & Cooling, High Availability, Hardware, Management Source: Dan Olds & IMEX Research 2009
  • 39. Storage Architecture - Impact from VZ IMEX RESEARCH.COM Data Protection Storage Efficiency Virtualization (VZ) requires Shared Storage for Back Up/Archive/DR Virtualization - VMotion RAID – 0,1,5,6,10 Thin Provisioning - Storage VMotion Virtual Tape Deduplication - HA/DRS Auto Tiering - Fault Tolerance Replication MAID Additional Capacity Consumed for - VZ snapshots, - VM Kernel etc. Source: IMEX Research SSD Industry Report ©2011 39
  • 40. Storage – Issues & Solutions IMEX RESEARCH.COM Technologies Reducing Storage Costs Storage Costs Reduction Traditional Thin Replication ~ 95% Data Growth Snapshots ~ 75% Virtual Clones ~80% Thin Provisioning ~30% DeDuplication ~ 25-95% CAPACITY Auto-Tiering 65-95% SAVINGS ~ xx % RAID*DP ~ 40% vs. R10 Capacity Requirements
  • 41. Storage Architecture Impacting Big Data IMEX RESEARCH.COM Data Protection Storage Efficiency Back Up/Archive/DR Auto-Tiering System Storage Virtualization using Flash SSDs RAID – 0,1,5,6,10 Thin Provisioning Data Class Virtual Tape Deduplication (Tiers 0,1,2,3) Auto-Tiering Storage Media Type Replication MAID (Flash/Disk/Tape) Policy Engines (Workload Mgmt.) DRAM Transparent Migration Flash (Data Placement) File Virtualization SSD Performance Disk (Uninterrupted App.Opns.in Migration) Capacity Disk Tape Source: IMEX Research SSD Industry Report ©2011 41
  • 42. Data Storage: Hierarchical Usage IMEX RESEARCH.COM I/O Access Frequency vs. Percent of Corporate Data 95% 75% Cloud 65% FCoE/ Storage SAS SATA Arrays % of I/O Accesses • Back Up Data SSD • Tables • Archived Data • Logs • Indices • Offsite DataVault • Journals • Hot Data • Temp Tables • Hot Tables 1% 2% 10% 50% 100% % of Corporate Data Source:: IMEX Research - Cloud Infrastructure Report ©2009-12 42
  • 43. SSD Storage: Filling Price/Perf.Gaps IMEX RESEARCH.COM Price $/GB CPU SDRAM DRAM getting DRAM NOR Faster (to feed faster CPUs) & Larger (to feed Multi-cores & NAND Multi-VMs from Virtualization) PCIe SCM SSD HDD SSD segmenting into SATA  PCIe SSD Cache Tape SSD - as backend to DRAM &  SATA SSD HDD becoming - as front end to HDD Cheaper, not faster Source: IMEX Research SSD Industry Report ©2010-12 Performance I/O Access Latency 43
  • 44. SSD Storage - Performance & TCO IMEX RESEARCH.COM SAN TCO using HDD vs. Hybrid Storage SAN Performance 250 Improvements using SSD 300 10 IOPS 9 200 $/IOPS 250 Improvement Improvement 8 475% 145 800% 7 200 150 HDD-FC 6 Cost $K 0 IOPS $/IOP HDD- 36 150 5 100 SATA 4 0 100 3 SSD 64 50 75 2 50 RackSpace 1 28 14.2 Pwr/Cool 5.2 0 0 0 HDD Only HDD/SSD FC-HDD Only SSD/SATA-HDD Power & Cooling RackSpace SSDs HDD SATA HDD FC Performance (IOPS) $/IOP Source: IMEX Research SSD Industry Report ©2011 44
  • 45. Workloads Characterization IMEX RESEARCH.COM 1000 K OLTP Transaction Processing eCommerce 100 K Business IOPS* (*Latency-1) (RAID - 1, 5, 6) Data Intelligence (RAID - 0, 3) Warehousing 10K OLAP 1K Scientific Computing HPC Imaging TP 100 Audio Web 2.0 HPC Video 10 1 5 10 50 100 500 *IOPS for a required response time ( ms) MB/sec *=(#Channels*Latency-1) Source:: IMEX Research - Cloud Infrastructure Report ©2009-12
  • 46. Workloads Characterization IMEX RESEARCH.COM Storage performance, management and costs are big issues in running Databases  Data Warehousing Workloads are I/O intensive • Predominantly read based with low hit ratios on buffer pools • High concurrent sequential and random read levels  Sequential Reads requires high level of I/O Bandwidth (MB/sec)  Random Reads require high IOPS) • Write rates driven by life cycle management and sort operations  OLTP Workloads are strongly random I/O intensive • Random I/O is more dominant  Read/write ratios of 80/20 are most common but can be 50/50  Can be difficult to build out test systems with sufficient I/O characteristics  Batch Workloads are more write intensive • Sequential Writes requires high level of I/O Bandwidth (MB/sec)  Backup & Recovery times are critical for these workloads • Backup operations drive high level of sequential IO • Recovery operation drives high levels of random I/O Source: IMEX Research SSD Industry Report ©2011 46
  • 47. Best Practices – Storage in Big Data Apps IMEX RESEARCH.COM Goals & Implementation • Establish Goals for SLAs (Performance/Cost/Availability), BC/DR (RPO/RTO) & Compliance • Increase Performance for DB, OLTP and OLAP Apps: −Random I/O > 20x , Sequential I/O Bandwidth > 5x −Remove Stale data from Production Resources to improve performance • Use Partitioning Software to Classify Data −By Frequency of Access (Recent Usage) and −Capacity (by percent of total Data) using general guidelines as: −Hyperactive (1%), Active (5%), Less Active (20%), Historical (74%) Implementation • Optimize Tiering by Classifying Hot & Cold Data −Improve Query Performance by reducing number of I/Os −Reduce number of Disks Needed by 25-50% using advance compression software achieving 2-4x compression • Match Data Classification vs.Tiered Devices accordingly −Flash, High Perf Disk, Low Cost Capacity Disk, Online Lowest Cost Archival Disk/Tape • Balance Cost vs. Performance of Flash −More Data in Flash > Higher Cache Hit Ratio > Improved Data Performance • Create and Auto-Manage Tiering (Monitoring, Migrations, Placements) without manual intervention 47
  • 48. Best Practices: I/O Forensics in Storage-Tiering IMEX RESEARCH.COM Storage-Tiered Virtualization Storage-Tiering at LBA/Sub-LUN Level Physical Storage Logical Volume SSDs Hot Data Arrays Cold Data HDDs Arrays LBA Monitoring and Tiered Placement • Every workload has unique I/O access signature • Historical performance data for a LUN can identify performance skews & hot data regions Automatic by LBAs Migration (Policy Based) Source: IBM & IMEX Research SSD Industry Report 2011 ©IMEX 2010-12 48
  • 49. Best Practices: Cached Storage IMEX RESEARCH.COM App/DB Application Improvement over Cached Server vs.HDD only Oracle OLTP Benchmarks 681% SQL Server OLTP 1251% Benchmark Neoload (Web Server 533% Simulation SysBench (MySQL OLTP 150% LSI MegaRAID CacheCade Pro 2.0 Server) Response Time TPS 660 700 655 700 600 600 500 500 400 330 373 400 300 300 200 200 100 0 0 100 0 All HDD Smart Flash Persist 0 Cache Data on All HDD Smart Flash Persist Data Warpdrive Cache on Warpdrive 49
  • 50. Big Data Targets: Analytics IMEX RESEARCH.COM Key Industries Benefitting from Big Data Analytics Global IT Spending by Industry Verticals 2010-15 $B CAGR % (2010-15) 5 Year Cum Global IT Spending 2010-15 ($B) 50
  • 51. Big Data Targets – Storage Infrastructure IMEX RESEARCH.COM Value Potential of Using Big Data by Data Intensive Verticals Data Intensity by Industry Vertical Banking & Financial Services Media & Entrertainment Healthcare Providers Professional Services Telecommunications Pharma, Life Sciences & Medical Pdcts Retail & Wholesales Utilities Ind'l Elex & Electrical Equipment SW Publishing & Internet Srvcs Consumer Products Insurance Transportation Energy 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Installed Terabytes/$M Rev 2010 51
  • 52. Big Data Targets: Storage Infrastructure IMEX RESEARCH.COM Data Stored by Large US Enterprises Big Data Storage Potential Data Stored by Large US Enterprises Big Data Storage Potential Data Stored by Large US Enterprise Stored Data TB/Firm Stored Data by Industry (in US 2009 PB) (>1K Employees US) Discrete Manafacturing 14% 231 Government 12% 150 Communications and Media 10% 825 Process manufacturing 10% 1,507 Banking 9% 536 Health Care Providers 6% 801 Securities & Investment Srvcs 6% 870 Professional Services 6% 319 Retail 5% 697 Education 4% 278 Insurance 4% 3,866 Transportation 3% 370 Wholesale 3% 1,931 Utilities 3% 831 Resource Industries 2% 1,792 Consumer and Recreational 2% 1,312 Services Construction 1% 967 52
  • 53. Big Data Targets: Savings w Open Source IMEX RESEARCH.COM Legacy BI vs. Open Source Big Data Analytics 53
  • 54. Rise of Big Data Adoption IMEX RESEARCH.COM 54
  • 55. Key Takeaways IMEX RESEARCH.COM Big Data creating paradigm shift in IT Industry Leverage the opportunity to optimize your computing infrastructure with Big Data Infrastructure after making a due diligence in selection of vendors/products, industry testing and interoperability. Apply best storage technologies listed in this presentation and elsewhere Optimize Big Data Analytics for Query Response Time vs. # of Users Improving Query Response time for a given number of users (IOPs) or Serving more users (IOPS) for a given query response time Select Automated Storage Management Software Data Forensics and Tiered Placement • Every workload has unique I/O access signature • Historical performance data for a LUN can identify performance skews & hot data regions by LBAs.Non-disruptively migrate hot data using auto-tiering Software Optimize Infrastructure to meet needs of Applications/SLA • Performance Economics/Benefits • Typically 4-8% of data becomes a candidate and when migrated for higher performance tiering can provide response time reduction of ~65% at peak loads. Many industry Verticals and Applications will benefit using Big Data Source: IMEX Research SSD Industry Report ©2011 55