Architecting Virtualized Infrastructure for Big Data

Richard McDougall
@richardmcdougll
CTO, Application Infrastructure, Big Data Lead, VMware, Inc




                                                              © 2009 VMware Inc. All rights reserved
Cloud: Big Shifts in Simplification and Optimization


1. Reduce the Complexity to simplify operations and maintenance
2. Dramatically Lower Costs to redirect investment into value-add opportunities
3. Enable Flexible, Agile IT Service Delivery to meet and anticipate the needs of the business




 2
Infrastructure, Apps and now Data…




1. Simplify Infrastructure With Cloud
2. Simplify App Platform Through PaaS
3. Simplify Data

[Diagram labels: Private, Public, Build, Run, Manage]




 3
Trend 1/3: New Data Growing at 60% Y/Y

Exabytes of information stored
 •  20 zettabytes by 2015
 •  1 yottabyte by 2030
 •  Yes, you are part of the yotta generation…

Sources of new data: audio, digital TV, digital photos, camera phones, RFID, medical imaging, sensors, satellite images, logs, scanners, Twitter, CAD/CAM, appliances, machine data, digital movies



                                                        Source: The Information Explosion, 2009
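
To make the 60% Y/Y figure concrete, here is a minimal sketch of compound growth. It assumes the rate stays constant from year to year, which the slide does not claim. The function names are illustrative only.

```python
import math


def project(size, annual_growth, years):
    """Size after `years` of compound growth (annual_growth=0.60 means 60% Y/Y)."""
    return size * (1.0 + annual_growth) ** years


def doubling_time(annual_growth):
    """Years needed for the data volume to double at a constant growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # Assumption: a constant 60% yearly growth rate.
    print(f"Doubling time at 60% Y/Y: {doubling_time(0.60):.2f} years")
    print(f"Growth factor over 5 years: {project(1.0, 0.60, 5):.1f}x")
```

Under that assumption the volume doubles roughly every year and a half and grows about tenfold in five years.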


4
Data Growth in the Enterprise




5
Trend 2/3: Big Data – Driven by Real-World Benefit




6
Trend 3/3: Value from Data Exceeds Hardware Cost

!  Value from the intelligence of data analytics now outstrips the cost of hardware
     •  Hadoop enables the use of 10x lower cost hardware
     •  Hardware cost halving every 18 months

[Chart labels: Value, Cost. Data points: Big Iron, $40k/CPU; Commodity Cluster, $1k/CPU]
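
The "halving every 18 months" claim can be written as a simple exponential decay. The sketch below is illustrative arithmetic only: it assumes the halving period holds steadily, and the helper names are not from the deck.

```python
import math


def hardware_cost(initial_cost, months, halving_period=18.0):
    """Cost after `months`, assuming cost halves every `halving_period` months."""
    return initial_cost * 0.5 ** (months / halving_period)


def months_to_reach(initial_cost, target_cost, halving_period=18.0):
    """Months of steady halving needed to fall from initial_cost to target_cost."""
    return halving_period * math.log2(initial_cost / target_cost)


if __name__ == "__main__":
    # Per-CPU prices taken from the slide: $40k (Big Iron) and $1k (commodity).
    print(f"$1k/CPU after 3 years: ${hardware_cost(1000.0, 36):.0f}")
    print(f"Months for $40k to decay to $1k: {months_to_reach(40000.0, 1000.0):.1f}")
```

The second function shows how long price decay alone would take to close the 40x per-CPU gap shown on the slide, which is why moving to commodity clusters matters more than waiting.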




 7
A Holistic View of a Big Data System:


 •  Real-Time Streams
 •  Real-Time Processing (S4, Storm)
 •  ETL
 •  Real-Time Structured Database (HBase, GemFire, Cassandra)
 •  Big SQL (Greenplum, Aster Data, etc.)
 •  Batch Processing
 •  Analytics
 •  Unstructured Data (HDFS)



8
Big Data Frameworks and Characteristics

Framework                                   Scale of data      Scale of cluster  Computable data?  Local disks?
File system: Gluster, Isilon, etc.          10s of PB          100s              Some              Yes, for cost
Map-reduce: Hadoop                          100s of PB         1,000s            Yes               Yes, for cost, bandwidth, and availability
Big SQL: Greenplum, Aster Data, Netezza, …  PBs                100s              Some              Yes, for cost and bandwidth
NoSQL: Cassandra, HBase, …                  Trillions of rows  100s              Some              Yes, for cost and availability
In-memory: Redis, GemFire, Membase, …       Billions of rows   10s-100s          Yes               Primarily memory
  9
The Unified Analytics Cloud Platform



 •  Analytics Tools: MADlib, Karmasphere, Datameer, Tableau
 •  Developer Frameworks (PaaS): Hadoop, Python, Spring, Cloud Foundry
 •  Database/DataStore: Cassandra, HBase, HDFS, Greenplum, Voldemort
 •  Data Platform (Data PaaS): Data Director, EMC Chorus
 •  Cloud Infrastructure (Private, Public): vSphere




10
Unifying the Big Data Platform using Virtualization

!  Goals
 •  Make it fast and easy to provision new data clusters on demand
 •  Allow mixing of workloads
 •  Leverage virtual machines to provide isolation (esp. for multi-tenant use)
 •  Optimize data performance based on virtual topologies
 •  Make the system reliable based on virtual topologies
!  Leveraging Virtualization
 •  Elastic scale
 •  Use high availability to protect key services, e.g., Hadoop’s NameNode/JobTracker
 •  Resource controls and sharing: re-use underutilized memory, CPU
 •  Prioritize workloads: limit or guarantee resource usage in a mixed environment


Cloud Infrastructure: Private, Public
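
The "limit or guarantee resource usage" bullet can be illustrated with a toy allocator. This is a minimal sketch and not vSphere's scheduler. It borrows the reservation, limit, and shares vocabulary, but the `Workload` class, the `allocate` function, and the numbers are all hypothetical. Each workload first receives its guaranteed reservation. Spare capacity is then split in proportion to shares, never exceeding a workload's limit.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    reservation: float  # guaranteed minimum (e.g. GHz of CPU or GB of memory)
    limit: float        # hard upper bound
    shares: int         # relative weight for spare capacity (must be > 0)


def allocate(capacity, workloads):
    """Hypothetical model: reservations first, then share-weighted spare capacity."""
    alloc = {w.name: w.reservation for w in workloads}
    spare = capacity - sum(alloc.values())
    if spare < 0:
        raise ValueError("reservations exceed capacity")
    active = [w for w in workloads if alloc[w.name] < w.limit]
    while spare > 1e-9 and active:
        total_shares = sum(w.shares for w in active)
        handed_out = 0.0
        for w in active:
            offer = spare * w.shares / total_shares
            grant = min(offer, w.limit - alloc[w.name])  # respect the limit
            alloc[w.name] += grant
            handed_out += grant
        spare -= handed_out
        # Workloads that hit their limit drop out; leftovers are redistributed.
        active = [w for w in active if w.limit - alloc[w.name] > 1e-9]
    return alloc


if __name__ == "__main__":
    mixed = [
        Workload("hadoop", reservation=4, limit=16, shares=2),
        Workload("database", reservation=4, limit=6, shares=1),
        Workload("batch", reservation=0, limit=16, shares=1),
    ]
    print(allocate(16, mixed))
```

In this example the database keeps its guarantee but is capped at its limit, while Hadoop receives twice the spare capacity of the batch workload because it holds twice the shares.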

11
A Unified Analytics Cloud Significantly Simplifies

!  Simplify
 •  Single hardware infrastructure
 •  Faster/easier provisioning
!  Optimize
 •  Shared resources = higher utilization
 •  Elastic resources = faster on-demand access

[Diagram: separate SQL, NoSQL, Hadoop, and Decision Support clusters consolidated onto one Unified Analytics Infrastructure (Private, Public) running Big SQL, NoSQL, and Hadoop]
 12
Use Local Disk where it’s Needed




                      SAN Storage          NAS Filers           Local Storage
Price                 $2 - $10/Gigabyte    $1 - $5/Gigabyte     $0.05/Gigabyte
$1M gets (capacity)   0.5 Petabytes        1 Petabyte           20 Petabytes
$1M gets (IOPS)       200,000 IOPS         400,000 IOPS         10,000,000 IOPS
$1M gets (bandwidth)  1 Gbyte/sec          2 Gbytes/sec         800 Gbytes/sec
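
The capacity row follows directly from the price per gigabyte. A minimal sketch of that arithmetic is below. It assumes decimal units (1 PB = 1,000,000 GB) and uses the low end of each quoted price range, which is what reproduces the slide's capacity figures. The names are illustrative.

```python
GB_PER_PB = 1_000_000  # decimal units (assumption)

# Low end of each price range from the slide, in USD per gigabyte.
PRICE_PER_GB = {
    "SAN Storage": 2.00,
    "NAS Filers": 1.00,
    "Local Storage": 0.05,
}


def capacity_pb(budget_usd, usd_per_gb):
    """Petabytes of raw capacity a budget buys at a given price per gigabyte."""
    return budget_usd / usd_per_gb / GB_PER_PB


if __name__ == "__main__":
    for tier, price in PRICE_PER_GB.items():
        print(f"{tier}: {capacity_pb(1_000_000, price):.1f} PB per $1M")
```

At these prices, local storage buys 40 times the SAN capacity for the same budget.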

13
VMware is Committed to be the Best Virtual Platform for Hadoop
!  Performance Studies and Best Practices
 •  Studies through 2010-2011 of Hadoop 0.20 on vSphere 5
 •  White paper, including detailed configurations and recommendations
!  Making Hadoop run well on vSphere
 •  Performance optimizations in vSphere releases
 •  VMware engagement in Hadoop Community effort
 •  Supporting key partners with their distributions on vSphere
 •  Contributing enhancements to Hadoop
!  Hadoop Framework Integration
 •  Spring Hadoop: Enabling Spring to simplify Map-Reduce Jobs
 •  Spring Batch: Sophisticated batch management (Oozie on steroids)




14
Extend Virtual Storage Architecture to Include Local Disk

!  Shared Storage: SAN or NAS
 •  Easy to provision
 •  Automated cluster rebalancing
!  Hybrid Storage
 •  SAN for boot images, VMs, other workloads
 •  Local disk for Hadoop & HDFS
 •  Scalable bandwidth, lower cost/GB

[Diagram: six hosts running a mix of Hadoop VMs and other VMs]




     15
Performance Analysis of Big Data (Hadoop) on Virtualization

[Chart: ratio of time taken relative to native (lower is better), with series for 1 VM and 2 VMs; y-axis "Ratio to Native" from 0 to 1.2. Tested on vSphere 5.0]

16
Simplify Heterogeneous Data Management via Data PaaS



Data systems: File system, Large-Scale NoSQL, In-Memory, Big SQL

Data PaaS – Common Data Management Layer
 •  Provisioning
 •  Multi-tenancy
 •  Import/Export
 •  Management
 •  Data Discovery

Cloud Infrastructure

[Stack shown alongside: Analytics Tools, Developer, Databases, Data Platform, Cloud Infrastructure]



 17
vFabric Data Director Powers Database-as-a-Service



Existing Applications, New Applications

vFabric Data Director
 •  DBA, App Dev: Automation Self-Service, Provisioning, Backup/Restore, Clone, One-click HA
 •  DBA, IT Admin: Policy Based Control, Resource Mgmt, Security Mgmt, Database Templates, Monitor

VMware vSphere




18
Data Systems: Databases, file systems




From unstructured to structured: File system, Large-Scale NoSQL, In-Memory, Big SQL

[Stack shown alongside: Analytics Tools, Developer, Databases, Data Platform, Cloud Infrastructure]




19
Technology: Databases and Data Stores for Big Data

From unstructured to structured: File system, Large-Scale NoSQL, In-Memory, Big SQL

!  File system
 •  Types of data: log files, machine-generated data, documents, device data, etc.
 •  Technologies: NAS, HDFS, Blob (S3, Atmos, etc.)
 •  Values: store any data, easy to scale out, can optimize for cost
!  Large-Scale NoSQL
 •  Types of data: loosely typed device data, records, events, statistics, complex relations/graphs
 •  Technologies: Cassandra, HBase, Voldemort
 •  Values: easy to scale out, flexible and dynamic schemas
!  In-Memory
 •  Types of data: structured, partitionable data
 •  Technologies: GemFire, Redis, Membase
 •  Values: high throughput, low latency
!  Big SQL
 •  Types of data: structured data
 •  Technologies: Greenplum, Sybase IQ, Aster Data, etc.
 •  Values: high performance for repetitive queries, ease of query language

20
Simplified Developer Experience through PaaS




Platform as a Service

[Stack shown alongside: Analytics Tools, Developer, Databases, Data Platform, Cloud Infrastructure]




21
Spring Big Data Integrations

!  NoSQL Integration
 •  Spring Data for MongoDB, GemFire, Riak, Neo4j, Blob, Cassandra
!  Spring Hadoop
 •  Announced this week at Strata!
 •  Provides support for developing applications based on Hadoop technologies by
     leveraging the capabilities of the Spring ecosystem.

!  Spring Batch
 •  Integration allows Hadoop jobs and HDFS operations as part of workflow




22
The Unified Analytics Cloud Platform



 •  Analytics Tools: MADlib, Karmasphere, Datameer, Tableau
 •  Developer Frameworks (PaaS): Hadoop, Python, Spring, Cloud Foundry
 •  Database/DataStore: Cassandra, HBase, HDFS, Greenplum, Voldemort
 •  Data Platform (Data PaaS): Data Director, EMC Chorus
 •  Cloud Infrastructure (Private, Public): vSphere




23
Summary

!  Revolution in Big Data is under way
 •  Data centric applications are now critical
!  Hadoop on Virtualization
 •  Proven performance
 •  Cloud/Virtualization values apparent for Hadoop use
!  Simplify through a Unified Analytics Cloud
 •  One Platform for today’s and future big-data systems
 •  Better Utilization
 •  Faster deployment, elastic resources
 •  Secure, Isolated, Multi-tenant capability for Analytics




24
References

!  Twitter
  •  @richardmcdougll
!  My CTO Blog
  •  http://communities.vmware.com/community/vmtn/cto/cloud

!  Hadoop on vSphere
  •  Talk @ Hadoop World
  •  Performance Paper – http://www.vmware.com/files/.../VMW-Hadoop-Performance-vSphere5.pdf
!  Spring Hadoop
  •  http://blog.springsource.org/2012/02/29/introducing-spring-hadoop




25

Weitere ähnliche Inhalte

Was ist angesagt?

Cloud Storage Adoption, Practice, and Deployment
Cloud Storage Adoption, Practice, and DeploymentCloud Storage Adoption, Practice, and Deployment
Cloud Storage Adoption, Practice, and DeploymentGlusterFS
 
Lug best practice_hpc_workflow
Lug best practice_hpc_workflowLug best practice_hpc_workflow
Lug best practice_hpc_workflowrjmurphyslideshare
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopBrock Noland
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesDataWorks Summit
 
Gluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFSGluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFSGlusterFS
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastuctureDataWorks Summit
 
Dynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 PresentationDynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 PresentationShanley Kane
 
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and ScalabilityHDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and ScalabilityHortonworks
 
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Cloudera, Inc.
 
Postgres Plus Cloud Database
Postgres Plus Cloud DatabasePostgres Plus Cloud Database
Postgres Plus Cloud DatabaseGary Carter
 
Gluster Blog 11.15.2010
Gluster Blog 11.15.2010Gluster Blog 11.15.2010
Gluster Blog 11.15.2010GlusterFS
 
Cloumon enterprise
Cloumon enterpriseCloumon enterprise
Cloumon enterpriseGruter
 
Gluster open stack dev summit 042011
Gluster open stack dev summit 042011Gluster open stack dev summit 042011
Gluster open stack dev summit 042011Open Stack
 
Arsys at hp discovery emea 2011
Arsys at hp discovery emea 2011Arsys at hp discovery emea 2011
Arsys at hp discovery emea 2011Arsys
 
AWS Summit 2011: Architecting in the cloud
AWS Summit 2011: Architecting in the cloudAWS Summit 2011: Architecting in the cloud
AWS Summit 2011: Architecting in the cloudAmazon Web Services
 
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...lucenerevolution
 
Hadoop World 2011: Mike Olson Keynote Presentation
Hadoop World 2011: Mike Olson Keynote PresentationHadoop World 2011: Mike Olson Keynote Presentation
Hadoop World 2011: Mike Olson Keynote PresentationCloudera, Inc.
 
20120524 cern data centre evolution v2
20120524 cern data centre evolution v220120524 cern data centre evolution v2
20120524 cern data centre evolution v2Tim Bell
 

Was ist angesagt? (19)

Cloud Storage Adoption, Practice, and Deployment
Cloud Storage Adoption, Practice, and DeploymentCloud Storage Adoption, Practice, and Deployment
Cloud Storage Adoption, Practice, and Deployment
 
Lug best practice_hpc_workflow
Lug best practice_hpc_workflowLug best practice_hpc_workflow
Lug best practice_hpc_workflow
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache Hadoop
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual Machines
 
Google Compute and MapR
Google Compute and MapRGoogle Compute and MapR
Google Compute and MapR
 
Gluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFSGluster Webinar: Introduction to GlusterFS
Gluster Webinar: Introduction to GlusterFS
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastucture
 
Dynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 PresentationDynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 Presentation
 
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and ScalabilityHDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
 
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
 
Postgres Plus Cloud Database
Postgres Plus Cloud DatabasePostgres Plus Cloud Database
Postgres Plus Cloud Database
 
Gluster Blog 11.15.2010
Gluster Blog 11.15.2010Gluster Blog 11.15.2010
Gluster Blog 11.15.2010
 
Cloumon enterprise
Cloumon enterpriseCloumon enterprise
Cloumon enterprise
 
Gluster open stack dev summit 042011
Gluster open stack dev summit 042011Gluster open stack dev summit 042011
Gluster open stack dev summit 042011
 
Arsys at hp discovery emea 2011
Arsys at hp discovery emea 2011Arsys at hp discovery emea 2011
Arsys at hp discovery emea 2011
 
AWS Summit 2011: Architecting in the cloud
AWS Summit 2011: Architecting in the cloudAWS Summit 2011: Architecting in the cloud
AWS Summit 2011: Architecting in the cloud
 
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
 
Hadoop World 2011: Mike Olson Keynote Presentation
Hadoop World 2011: Mike Olson Keynote PresentationHadoop World 2011: Mike Olson Keynote Presentation
Hadoop World 2011: Mike Olson Keynote Presentation
 
20120524 cern data centre evolution v2
20120524 cern data centre evolution v220120524 cern data centre evolution v2
20120524 cern data centre evolution v2
 

Andere mochten auch

Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsRichard McDougall
 
Virtualization Primer for Java Developers
Virtualization Primer for Java DevelopersVirtualization Primer for Java Developers
Virtualization Primer for Java DevelopersRichard McDougall
 
Solaris Internals Preso circa 2009
Solaris Internals Preso circa 2009Solaris Internals Preso circa 2009
Solaris Internals Preso circa 2009Richard McDougall
 
Building Big Data Applications
Building Big Data ApplicationsBuilding Big Data Applications
Building Big Data ApplicationsRichard McDougall
 
Virtualizing Oracle Databases with VMware
Virtualizing Oracle Databases with VMwareVirtualizing Oracle Databases with VMware
Virtualizing Oracle Databases with VMwareRichard McDougall
 
VMware Performance Troubleshooting
VMware Performance TroubleshootingVMware Performance Troubleshooting
VMware Performance Troubleshootingglbsolutions
 
Denver VMUG nov 2011
Denver VMUG nov 2011Denver VMUG nov 2011
Denver VMUG nov 2011Dan Brinkmann
 
Citrix Remote Access Solution Soup
Citrix Remote Access Solution SoupCitrix Remote Access Solution Soup
Citrix Remote Access Solution SoupDan Brinkmann
 
VMware vSphere Performance Troubleshooting
VMware vSphere Performance TroubleshootingVMware vSphere Performance Troubleshooting
VMware vSphere Performance TroubleshootingDan Brinkmann
 
VMware Advance Troubleshooting Workshop - Day 5
VMware Advance Troubleshooting Workshop - Day 5VMware Advance Troubleshooting Workshop - Day 5
VMware Advance Troubleshooting Workshop - Day 5Vepsun Technologies
 
VMware Advance Troubleshooting Workshop - Day 2
VMware Advance Troubleshooting Workshop - Day 2VMware Advance Troubleshooting Workshop - Day 2
VMware Advance Troubleshooting Workshop - Day 2Vepsun Technologies
 
VMware Advance Troubleshooting Workshop - Day 4
VMware Advance Troubleshooting Workshop - Day 4VMware Advance Troubleshooting Workshop - Day 4
VMware Advance Troubleshooting Workshop - Day 4Vepsun Technologies
 
VMware Advance Troubleshooting Workshop - Day 3
VMware Advance Troubleshooting Workshop - Day 3VMware Advance Troubleshooting Workshop - Day 3
VMware Advance Troubleshooting Workshop - Day 3Vepsun Technologies
 
VMware Advance Troubleshooting Workshop - Day 6
VMware Advance Troubleshooting Workshop - Day 6VMware Advance Troubleshooting Workshop - Day 6
VMware Advance Troubleshooting Workshop - Day 6Vepsun Technologies
 
VMware Performance for Gurus - A Tutorial
VMware Performance for Gurus - A TutorialVMware Performance for Gurus - A Tutorial
VMware Performance for Gurus - A TutorialRichard McDougall
 

Andere mochten auch (18)

Making of the Burner Board
Making of the Burner BoardMaking of the Burner Board
Making of the Burner Board
 
Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure Considerations
 
Virtualization Primer for Java Developers
Virtualization Primer for Java DevelopersVirtualization Primer for Java Developers
Virtualization Primer for Java Developers
 
Solaris Internals Preso circa 2009
Solaris Internals Preso circa 2009Solaris Internals Preso circa 2009
Solaris Internals Preso circa 2009
 
Building Big Data Applications
Building Big Data ApplicationsBuilding Big Data Applications
Building Big Data Applications
 
Virtualizing Oracle Databases with VMware
Virtualizing Oracle Databases with VMwareVirtualizing Oracle Databases with VMware
Virtualizing Oracle Databases with VMware
 
Hadoop I/O Analysis
Hadoop I/O AnalysisHadoop I/O Analysis
Hadoop I/O Analysis
 
VMware Performance Troubleshooting
VMware Performance TroubleshootingVMware Performance Troubleshooting
VMware Performance Troubleshooting
 
Denver VMUG nov 2011
Denver VMUG nov 2011Denver VMUG nov 2011
Denver VMUG nov 2011
 
Citrix Remote Access Solution Soup
Citrix Remote Access Solution SoupCitrix Remote Access Solution Soup
Citrix Remote Access Solution Soup
 
VMware vSphere Performance Troubleshooting
VMware vSphere Performance TroubleshootingVMware vSphere Performance Troubleshooting
VMware vSphere Performance Troubleshooting
 
VMware Advance Troubleshooting Workshop - Day 5
VMware Advance Troubleshooting Workshop - Day 5VMware Advance Troubleshooting Workshop - Day 5
VMware Advance Troubleshooting Workshop - Day 5
 
VMware Advance Troubleshooting Workshop - Day 2
VMware Advance Troubleshooting Workshop - Day 2VMware Advance Troubleshooting Workshop - Day 2
VMware Advance Troubleshooting Workshop - Day 2
 
VMware Advance Troubleshooting Workshop - Day 4
VMware Advance Troubleshooting Workshop - Day 4VMware Advance Troubleshooting Workshop - Day 4
VMware Advance Troubleshooting Workshop - Day 4
 
VMware Advance Troubleshooting Workshop - Day 3
VMware Advance Troubleshooting Workshop - Day 3VMware Advance Troubleshooting Workshop - Day 3
VMware Advance Troubleshooting Workshop - Day 3
 
VMware Advance Troubleshooting Workshop - Day 6
VMware Advance Troubleshooting Workshop - Day 6VMware Advance Troubleshooting Workshop - Day 6
VMware Advance Troubleshooting Workshop - Day 6
 
IdP, SAML, OAuth
IdP, SAML, OAuthIdP, SAML, OAuth
IdP, SAML, OAuth
 
VMware Performance for Gurus - A Tutorial
VMware Performance for Gurus - A TutorialVMware Performance for Gurus - A Tutorial
VMware Performance for Gurus - A Tutorial
 

Ähnlich wie Architecting Virtualized Infrastructure for Big Data

Cetas Analytics as a Service for Predictive Analytics
Cetas Analytics as a Service for Predictive AnalyticsCetas Analytics as a Service for Predictive Analytics
Cetas Analytics as a Service for Predictive AnalyticsJ. David Morris
 
Presentation architecting virtualized infrastructure for big data
Presentation   architecting virtualized infrastructure for big dataPresentation   architecting virtualized infrastructure for big data
Architecting Virtualized Infrastructure for Big Data

  • 1. Architecting Virtualized Infrastructure for Big Data Richard McDougall @richardmcdougll CTO, Application Infrastructure, Big Data Lead, VMware, Inc © 2009 VMware Inc. All rights reserved
  • 2. Cloud: Big Shifts in Simplification and Optimization
      - 1. Reduce the Complexity to simplify operations and maintenance
      - 2. Dramatically Lower Costs to redirect investment into value-add opportunities
      - 3. Enable Flexible, Agile IT Service Delivery to meet and anticipate the needs of the business
  • 3. Infrastructure, Apps and now Data…
      - Simplify Infrastructure With Cloud (diagram labels: Private, Public)
      - Simplify App Platform Through PaaS (diagram labels: Build, Run, Manage)
      - Simplify Data
  • 4. Trend 1/3: New Data Growing at 60% Y/Y
      - Exabytes of information stored: 20 Zetta by 2015, 1 Yotta by 2030
      - Yes, you are part of the yotta generation…
      - Chart labels: audio; digital TV; digital photos; camera phones, RFID; medical imaging, sensors; satellite images, logs, scanners, Twitter; CAD/CAM, appliances, machine data, digital movies
      - Source: The Information Explosion, 2009
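As a rough illustration of what the 60% Y/Y figure on slide 4 implies, the sketch below computes the doubling time and a multi-year growth factor under constant compound growth. The function names are hypothetical, and a constant growth rate is an assumption made for illustration, not something the slide states.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_factor(annual_growth: float, years: float) -> float:
    """Multiplier applied to the starting volume after `years` of growth."""
    return (1 + annual_growth) ** years


if __name__ == "__main__":
    # 0.60 is the slide's 60% Y/Y figure; the 5-year horizon is illustrative.
    print(f"Doubling time: {doubling_time_years(0.60):.2f} years")
    print(f"Growth over 5 years: {growth_factor(0.60, 5):.1f}x")
```

At 60% per year the stored volume doubles roughly every year and a half, which is the scale of growth the later slides design for.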
  • 5. Data Growth in the Enterprise
  • 6. Trend 2/3: Big Data – Driven by Real-World Benefit
  • 7. Trend 3/3: Value from Data Exceeds Hardware Cost
      - Value from the intelligence of data analytics now outstrips the cost of hardware
          - Hadoop enables the use of 10x lower cost hardware
          - Hardware cost halving every 18 months
      - Chart (value vs. cost): Big Iron: $40k/CPU; Commodity Cluster: $1k/CPU
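Slide 7's statement that hardware cost halves every 18 months can be written as an exponential decay. This is a minimal sketch, assuming a smooth halving curve; the function name `cost_after` and the 36-month horizon are illustrative, while the $1k/CPU starting point is the slide's commodity-cluster figure.

```python
def cost_after(initial_cost: float, months: float,
               halving_period_months: float = 18.0) -> float:
    """Cost after `months`, assuming it halves every `halving_period_months`."""
    return initial_cost * 0.5 ** (months / halving_period_months)


if __name__ == "__main__":
    # Start from the slide's $1k/CPU commodity figure; 36 months is two halvings.
    for months in (0, 18, 36):
        print(f"{months:>2} months: ${cost_after(1000.0, months):,.0f}/CPU")
```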
  • 8. A Holistic View of a Big Data System
      - Real Time Streams
      - Real-Time Processing (s4, storm)
      - Analytics
      - ETL
      - Real Time Structured Database (hBase, Gemfire, Cassandra)
      - Big SQL (Greenplum, AsterData, Etc…)
      - Batch Processing
      - Unstructured Data (HDFS)
  • 9. Big Data Frameworks and Characteristics
      Framework | Scale of Data | Scale of Cluster | Computable Data? | Local Disks?
      File System: Gluster, Isilon, etc. | 10s PB | 100s | Some | Yes, for cost
      Map-reduce: Hadoop | 100s PB | 1,000s | Yes | Yes, for cost, bandwidth and availability
      Big-SQL: Greenplum, Aster Data, Netezza, … | PBs | 100s | Some | Yes, for cost and bandwidth
      No-SQL: Cassandra, hBase, … | Trillions of rows | 100s | Some | Yes, for cost and availability
      In-Memory: Redis, Gemfire, Membase, … | Billions of rows | 10s-100s | Yes | Primarily Memory
  • 10. The Unified Analytics Cloud Platform
      - Analytics Tools: Madlib, Karmasphere, Data Meer, Tableau
      - Developer Frameworks: Hadoop, Spring, PaaS, Python, Cloudfoundry
      - Database/DataStore: Cassandra, hBase, HDFS, Greenplum, Voldemort
      - Data Platform: Data-Director, Data PaaS, EMC Chorus
      - Cloud Infrastructure: vSphere (Private, Public)
  • 11. Unifying the Big Data Platform using Virtualization
      - Goals
          - Make it fast and easy to provision new data Clusters on Demand
          - Allow Mixing of Workloads
          - Leverage virtual machines to provide isolation (esp. for Multi-tenant)
          - Optimize data performance based on virtual topologies
          - Make the system reliable based on virtual topologies
      - Leveraging Virtualization
          - Elastic scale
          - Use high-availability to protect key services, e.g., Hadoop's namenode/job tracker
          - Resource controls and sharing: re-use underutilized memory, CPU
          - Prioritize Workloads: limit or guarantee resource usage in a mixed environment
      - Diagram label: Cloud Infrastructure (Private, Public)
  • 12. A Unified Analytics Cloud Significantly Simplifies
      - Simplify
          - Single Hardware Infrastructure
          - Faster/Easier provisioning
      - Optimize
          - Shared Resources = higher utilization
          - Elastic resources = faster on-demand access
      - Diagram labels: SQL Cluster, NoSQL Cluster, Hadoop Cluster, Decision Support Cluster; Unified Analytics Infrastructure (Big SQL, NoSQL, Hadoop; Private, Public)
  • 13. Use Local Disk where it's Needed
      Metric | SAN Storage | NAS Filers | Local Storage
      Cost | $2 - $10/Gigabyte | $1 - $5/Gigabyte | $0.05/Gigabyte
      $1M gets (capacity) | 0.5 Petabytes | 1 Petabyte | 20 Petabytes
      $1M gets (IOPS) | 200,000 IOPS | 400,000 IOPS | 10,000,000 IOPS
      $1M gets (bandwidth) | 1 Gbyte/sec | 2 Gbyte/sec | 800 Gbytes/sec
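The capacity figures on slide 13 follow from dividing the $1M budget by the price per gigabyte. The sketch below reproduces that arithmetic; it assumes the low end of each quoted price range and decimal units (1 PB = 1,000,000 GB), which is what the slide's numbers appear to use. The function and dictionary names are hypothetical.

```python
def capacity_pb(budget_usd: float, price_per_gb_usd: float) -> float:
    """Petabytes purchasable for a budget at a given price per gigabyte."""
    gigabytes = budget_usd / price_per_gb_usd
    return gigabytes / 1_000_000  # decimal units: 1 PB = 1,000,000 GB


# Low end of each price range quoted on the slide (an assumption).
PRICE_PER_GB = {"SAN Storage": 2.00, "NAS Filers": 1.00, "Local Storage": 0.05}

if __name__ == "__main__":
    for tier, price in PRICE_PER_GB.items():
        print(f"{tier}: {capacity_pb(1_000_000, price):g} PB per $1M")
```

The same budget buys roughly 40 times more raw capacity on local disk than on SAN at these prices, which is the cost argument behind the slide's title.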
  • 14. VMware is Committed to be the Best Virtual Platform for Hadoop
      - Performance Studies and Best Practices
          - Studies through 2010-2011 of Hadoop 0.20 on vSphere 5
          - White paper, including detailed configurations and recommendations
      - Making Hadoop run well on vSphere
          - Performance optimizations in vSphere releases
          - VMware engagement in Hadoop Community effort
          - Supporting key partners with their distributions on vSphere
          - Contributing enhancements to Hadoop
      - Hadoop Framework Integration
          - Spring Hadoop: Enabling Spring to simplify Map-Reduce Jobs
          - Spring Batch: Sophisticated batch management (Oozie on steroids)
  • 15. Extend Virtual Storage Architecture to Include Local Disk
      - Shared Storage: SAN or NAS
          - Easy to provision
          - Automated cluster rebalancing
      - Hybrid Storage
          - SAN for boot images, VMs, other workloads
          - Local disk for Hadoop & HDFS
          - Scalable Bandwidth, Lower Cost/GB
      - Diagram: hosts running Hadoop VMs alongside other VMs
  • 16. Performance Analysis of Big Data (Hadoop) on Virtualization
      - Chart: ratio of time taken relative to native (lower is better); series: 1 VM, 2 VMs
      - Tested on vSphere 5.0
  • 17. Simplify Heterogeneous Data Management via Data PaaS
      - Data systems: File-system, Large-Scale NoSQL, In-Memory, Big SQL
      - Data PaaS, a Common Data Management Layer: Provisioning, Multi-tenancy, Import/Export, Management, Data Discovery
      - Layers: Analytics Tools, Developer, Databases, Data Platform, Cloud Infrastructure
  • 18. vFabric Data Director Powers Database-as-a-Service
      - Serves Existing Applications and New Applications
      - Self-Service Automation (DBA, App Dev): Provisioning, Backup/Restore, Clone, One click HA
      - Policy Based Control (DBA, IT Admin): Resource Mgmt, Security Mgmt, Database Templates, Monitor
      - Runs on VMware vSphere
  • 19. Data Systems: Databases, file systems
      - Unstructured to Structured: File-system, Large-Scale NoSQL, In-Memory, Big SQL
      - Layers: Analytics Tools, Developer, Databases, Data Platform, Cloud Infrastructure
  • 20. Technology: Databases and Data Stores for Big Data
      Columns run from Unstructured (left) to Structured (right).
      Attribute | File-system | Large-Scale NoSQL | In-Memory | Big SQL
      Types of Data | Log files, machine generated data, documents, device data, etc… | Loosely typed device data, records, events, statistics, complex relations/graphs | Structured, partitionable data | Structured data
      Technologies | NAS, HDFS, Blob (S3, Atmos, etc.) | Cassandra, hBase, Voldemort | Gemfire, Redis, Membase | Greenplum, Sybase IQ, Aster Data, etc.
      Values | Store any data, easy to scale-out, can optimize for cost | Easy to scale-out, flexible and dynamic schemas | High Throughput, low latency | High performance for repetitive queries. Ease of query language.
  • 21. Simplified Developer Experience through PaaS
      - Layers: Analytics Tools, Developer, Databases, Data Platform, Cloud Infrastructure
      - Highlighted: Platform as a Service
  • 22. Spring Big Data Integrations
      - NoSQL Integration
          - Spring Data for MongoDB, Gemfire, Riak, Neo4j, Blob, Cassandra
      - Spring Hadoop
          - Announced this week at Strata!
          - Provides support for developing applications based on Hadoop technologies by leveraging the capabilities of the Spring ecosystem.
      - Spring Batch
          - Integration allows Hadoop jobs and HDFS operations as part of workflow
  • 23. The Unified Analytics Cloud Platform
      - Analytics Tools: Madlib, Karmasphere, Data Meer, Tableau
      - Developer Frameworks: Hadoop, Spring, PaaS, Python, Cloudfoundry
      - Database/DataStore: Cassandra, hBase, HDFS, Greenplum, Voldemort
      - Data Platform: Data-Director, Data PaaS, EMC Chorus
      - Cloud Infrastructure: vSphere (Private, Public)
  • 24. Summary
      - Revolution in Big Data is under way
          - Data centric applications are now critical
      - Hadoop on Virtualization
          - Proven performance
          - Cloud/Virtualization values apparent for Hadoop use
      - Simplify through a Unified Analytics Cloud
          - One Platform for today's and future big-data systems
          - Better Utilization
          - Faster deployment, elastic resources
          - Secure, Isolated, Multi-tenant capability for Analytics
  • 25. References
      - Twitter
          - @richardmcdougll
      - My CTO Blog
          - http://communities.vmware.com/community/vmtn/cto/cloud
      - Hadoop on vSphere
          - Talk @ Hadoop World
          - Performance Paper: http://www.vmware.com/files/.../VMW-Hadoop-Performance-vSphere5.pdf
      - Spring Hadoop
          - http://blog.springsource.org/2012/02/29/introducing-spring-hadoop