TRACK 2S
                              S7 - Big Data
                  Performance and Capacity Management

                                            Paul Seaton-Smith
                                        RightSize Solutions Limited

                                       www.rightsizesolutions.co.uk




Copyright © 2012 RightSize Solutions Limited. All rights reserved.
Agenda

➢ What is Big Data?

➢ Big Data technologies

➢ Implications for capacity management




What is Big Data?

➢ Exceeds the processing capacity of legacy databases

➢ The data is too big, arrives too fast, or can’t be stored in existing database architectures

➢ To leverage this data, we need new ways to store and analyse it




Volume, Velocity, Variety

➢ Big data spans three dimensions

    ▪ Volume

    ▪ Velocity

    ▪ Variety




Volume

➢ Every day, 2.5 exabytes of data are created (1 exabyte = 10^18 bytes)

➢ 90% of the world’s data has been created in the last two years

➢ Historically, much of that data could not be processed




Velocity

➢ Streaming data

➢ Less sampling or aggregation (leading to greater Volume)

➢ Big data may be used as soon as it is collected, as results are often time-sensitive

➢ Real-time or near-real-time feedback

➢ Maximise the value to the business

    ▪ E.g. sales data that used to be collated and analysed monthly might now be available every hour


Variety

➢ Unstructured data of all varieties that is hard to store in databases

    ▪ Text

    ▪ Audio (e.g. MP3, radio)

    ▪ Video (e.g. CCTV, YouTube)

    ▪ Click streams

    ▪ Log files (e.g. web logs)

    ▪ Images (e.g. satellite images)

    ▪ Social network feeds

    ▪ Vehicle telemetry

    ▪ Financial market data

    ▪ Web page content

    ▪ GPS trails

    ▪ etc.




Why do we need Big Data?

➢ Previously 80% of corporate information was stored on paper

➢ 20% was kept in electronic form

    ▪ At least 80% of that was held in databases

➢ Now 80% of corporate information is in electronic form (Volume)

    ▪ At least 80% of that is not in a database (Variety)




What is driving Big Data?

➢ More data is stored as storage prices drop

    ▪ Over the last 30 years, storage space per unit cost has doubled roughly every 14 months

➢ Commodity hardware, cloud architectures and open source software

➢ Marketing from large vendors (EMC, IBM, Oracle)

➢ Change of attitude towards leveraging data

    ▪ Tackle complex problems that previously could not be solved

    ▪ Monetise your business data


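The 14-month doubling figure on the slide above lends itself to a quick back-of-the-envelope check. The following is a minimal sketch in Python, assuming smooth exponential growth with a fixed doubling period; the function name `growth_factor` is illustrative and not taken from the original deck.

```python
def growth_factor(years: float, doubling_months: float = 14.0) -> float:
    """Return how many times more storage one unit of cost buys after `years`.

    Assumes smooth exponential growth with a fixed doubling period
    (14 months is the figure quoted on the slide).
    """
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


if __name__ == "__main__":
    # Print the multiplier for a few planning horizons.
    for years in (1, 5, 10, 30):
        print(f"{years:>2} years: x{growth_factor(years):,.1f}")
```

Over 30 years this compounds to roughly 26 doublings, which is why data that was once too expensive to keep is now routinely stored.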
Examples

➢ Financial services could examine data sources to identify likely sources of financial fraud

    ▪ Credit and transaction history

    ▪ Social networking behaviour

    ▪ Demographics

    ▪ Voice recordings




Examples

➢ Analyse Tweets to understand public sentiment for a new product

➢ Monitor power meter readings to better predict consumption

➢ Examine service desk call detail records in real time to predict customer churn faster

➢ Analyse real-time traffic information to change traffic lights and ease congestion




Technology

➢ Apache Hadoop

    ▪ MapReduce

    ▪ HDFS

➢ Further Apache projects to support Hadoop

➢ Third-party enterprise-class distributions built on Hadoop




Apache Hadoop

➢ Distributed processing of large data sets across clusters of computers

➢ Scales to thousands of machines, each offering local computation and storage

➢ Designed to detect and handle failures at the application layer

➢ Delivers a highly available service on top of a cluster of computers, each of which may be prone to failures

➢ Based on MapReduce and HDFS


MapReduce

➢ MapReduce is a system for distributing computation
➢ Created by Google in response to the problem of creating web search indexes
➢ Takes a query over a dataset, divides it, and runs it in parallel over multiple nodes
➢ Processes and analyses any data type across clusters of commodity servers
➢ Distributing the computation solves the problem of having data too large to fit onto a single node


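The divide, run in parallel, and combine pattern described on the MapReduce slide can be illustrated with the classic word count. This is a single-process sketch in plain Python, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative, and the "nodes" are simply chunks of input processed one after another.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values by key so each key reaches one reducer."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    # On a real cluster each chunk would be mapped on a different node;
    # here the map calls run sequentially in a single process.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data arrives fast"]))
```

Because each map call only sees its own chunk and each reduce call only sees one key, both phases can be spread across many machines without the whole dataset ever fitting on one node.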
HDFS

➢ A file system that spans all the nodes in a cluster for data storage
➢ Data in a cluster is broken down into smaller blocks and distributed throughout the cluster
➢ The map and reduce functions can be executed on smaller subsets of the original data sets
➢ HDFS links together the file systems on many local nodes to make them into one big file system
➢ HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes


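The block splitting and replication described on the HDFS slide can be sketched as a toy model. The Python below is an illustration under stated assumptions, not HDFS code: the 64 MB block size and replication factor of 3 are assumed example values, the node names are made up, and the round-robin placement is a deliberate simplification rather than the policy HDFS itself uses.

```python
from typing import Dict, List


def split_into_blocks(file_size: int, block_size: int) -> List[int]:
    """Return the size of each block a file of `file_size` bytes is cut into."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Round-robin is a simplification for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed: str) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(any(n != failed for n in replicas) for replicas in placement.values())


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = split_into_blocks(200 * MB, 64 * MB)  # assumed 64 MB block size
    placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], 3)
    print(len(blocks), placement[0], readable_after_failure(placement, "n2"))
```

For capacity planning, note that replication multiplies raw disk consumption: with three replicas, a 200 MB file occupies about 600 MB of cluster storage.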
Apache Hadoop

➢ Apache Hadoop is an open source MapReduce implementation with HDFS

➢ Originally funded by Yahoo

➢ Hadoop is supplemented by further Apache components to enhance its usability and functionality

➢ Third-party distributions have additional functionality and management tools




NoSQL Database Management Systems

➢ Do not have to use SQL

➢ May not guarantee atomicity, consistency, isolation and durability (ACID)

➢ Distributed, fault-tolerant architecture

➢ Various types

    ▪ Document store databases

    ▪ Graph databases

    ▪ Key-value stores

    ▪ BigTable implementations


Apache HBase

➢ HBase is the NoSQL Hadoop database

➢ Runs on top of HDFS

➢ Provides BigTable-like support for Hadoop

➢ Provides random, real-time read/write access to data in Hadoop




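To give a feel for what "BigTable-like" means on the HBase slide, the sketch below models a table as a sorted map from row key to column to timestamped versions. It is a toy in-memory model in Python, not the HBase client API; the names `put`, `get` and `scan` mirror common terminology, but their signatures here are illustrative assumptions.

```python
import time
from typing import Dict, Iterator, List, Optional, Tuple

# row key -> "family:qualifier" -> list of (timestamp, value), newest first
Table = Dict[str, Dict[str, List[Tuple[float, bytes]]]]


def put(table: Table, row: str, column: str, value: bytes, ts: Optional[float] = None) -> None:
    """Write one cell version; older versions of the cell are kept."""
    ts = time.time() if ts is None else ts
    versions = table.setdefault(row, {}).setdefault(column, [])
    versions.append((ts, value))
    versions.sort(key=lambda v: v[0], reverse=True)


def get(table: Table, row: str, column: str) -> Optional[bytes]:
    """Random read: return the newest value of one cell, or None if absent."""
    versions = table.get(row, {}).get(column)
    return versions[0][1] if versions else None


def scan(table: Table, start: str, stop: str) -> Iterator[str]:
    """Range scan: yield row keys in sorted order with start <= key < stop."""
    for key in sorted(table):
        if start <= key < stop:
            yield key


if __name__ == "__main__":
    t: Table = {}
    put(t, "user#001", "info:name", b"Ann", ts=1.0)
    put(t, "user#001", "info:name", b"Anne", ts=2.0)
    put(t, "user#002", "info:name", b"Bob", ts=1.0)
    print(get(t, "user#001", "info:name"), list(scan(t, "user#", "user#~")))
```

The model is sparse (rows only store the columns they actually have) and keyed lookups are cheap, which is the access pattern the slide contrasts with batch-oriented MapReduce jobs.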
Apache Hive

➢ Built on top of Hadoop, enabling it to operate as a data warehouse

➢ Superimposes structure on data held in HDFS in a variety of formats

➢ Enables ad-hoc analytical queries over the data using a SQL-like syntax

➢ Provides tools for easy data extract/transform/load (ETL)

➢ Best used for batch jobs over large sets of append-only data (such as web logs)


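The idea of superimposing structure on raw files, as the Hive slide puts it, can be sketched as "schema on read": the data stays as plain text and column names are applied only when a query runs. The Python below is an illustration of that idea, not Hive itself; the log format, the `SCHEMA` column names and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

# Hypothetical schema: column names applied to space-delimited raw lines.
SCHEMA: List[str] = ["ip", "timestamp", "method", "path", "status"]


def read_with_schema(lines: Iterable[str], schema: List[str]) -> Iterator[Dict[str, str]]:
    """Apply the schema while reading; malformed lines are skipped at query time."""
    for line in lines:
        fields = line.split()
        if len(fields) == len(schema):
            yield dict(zip(schema, fields))


def count_by(rows: Iterable[Dict[str, str]], column: str) -> Dict[str, int]:
    """Similar in spirit to: SELECT column, COUNT(*) ... GROUP BY column."""
    return dict(Counter(row[column] for row in rows))


if __name__ == "__main__":
    raw = [
        "10.0.0.1 2012-01-16T10:00:00 GET /index.html 200",
        "10.0.0.2 2012-01-16T10:00:01 GET /missing 404",
        "garbled line",
        "10.0.0.1 2012-01-16T10:00:02 GET /about.html 200",
    ]
    print(count_by(read_with_schema(raw, SCHEMA), "status"))
```

Since nothing is validated at load time, append-only data such as web logs can be landed quickly and interpreted later, which fits the batch use case named on the slide.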
Further Apache projects

➢ ZooKeeper is a centralised service for maintaining configuration information, naming, distributed synchronisation and group services

➢ Pig is a programming language that simplifies the common tasks of working with Hadoop

➢ Sqoop is a tool designed to import data from relational databases into Hadoop, either directly into HDFS or into Hive

➢ Flume is designed to import streaming flows of log data directly into HDFS



Benefits of Hadoop

➢ Scalability
    ▪ Add new nodes as needed
    ▪ Add nodes without needing to change data formats, how data is loaded, how jobs are written, or the applications on top
➢ Value for money
    ▪ Massively parallel computing on commodity servers




Benefits of Hadoop

➢ Flexibility

    ▪ Schema-less

    ▪ Can absorb any type of data, structured or not, from any number of sources

➢ Fault tolerance

    ▪ If you lose a node, the system redirects work to another location of the data and continues processing




Commercial Hadoop Distributions

➢ EMC Greenplum HD

    ▪ http://www.greenplum.com/products/greenplum-hd/

➢ Microsoft Big Data solution

    ▪ http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data-solution.aspx

➢ Oracle Big Data Appliance

    ▪ http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html


Commercial Hadoop Distributions

➢ Cloudera CDH

    ▪ http://www.cloudera.com/hadoop/

➢ IBM InfoSphere BigInsights

    ▪ http://www-01.ibm.com/software/data/infosphere/biginsights/

➢ Amazon Elastic MapReduce (Amazon EMR)

    ▪ http://aws.amazon.com/elasticmapreduce/




Capacity Management Challenges

➢ Hard to forecast what new data may be available in the future

    ▪ Social networks

    ▪ Social video

    ▪ Social media

    ▪ Social location

    ▪ Augmented reality

➢ What next?




Capacity Management Challenges

➢ Forecasting

    ▪ Will data growth of existing types continue exponentially?

    ▪ What new data types will emerge?

➢ Monitoring

    ▪ Cost of agent-based licensing model for monitoring tools

    ▪ Lack of standard Hadoop management tools

➢ Response times

    ▪ Velocity means results are time-critical


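The forecasting question on the slide above (will growth continue exponentially?) can be explored with a simple trend fit. The following Python sketch assumes storage usage is sampled at regular intervals and grows at a roughly constant compound rate; the function names and the sample figures are illustrative, and a real forecast would also need to allow for new data types that have no history yet.

```python
import math
from typing import List


def fit_exponential(usage: List[float]) -> float:
    """Least-squares fit of log(usage) against period index.

    Returns the estimated compound growth rate per period (0.10 means 10%).
    """
    n = len(usage)
    if n < 2 or any(u <= 0 for u in usage):
        raise ValueError("need at least two positive samples")
    xs = range(n)
    ys = [math.log(u) for u in usage]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return math.exp(covariance / variance) - 1.0


def periods_until_full(current: float, capacity: float, growth: float) -> float:
    """Periods until `current` reaches `capacity` at compound `growth` per period."""
    if current >= capacity:
        return 0.0
    if growth <= 0:
        return math.inf
    return math.log(capacity / current) / math.log(1.0 + growth)


if __name__ == "__main__":
    monthly_tb = [100.0, 110.0, 121.0, 133.1]  # made-up sample data, in TB
    rate = fit_exponential(monthly_tb)
    months = periods_until_full(monthly_tb[-1], 500.0, rate)
    print(f"growth per month: {rate:.1%}, months until 500 TB: {months:.1f}")
```

When sizing a Hadoop cluster from such a forecast, the raw disk requirement is the forecast data volume multiplied by the replication factor.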
Capacity Management Challenges

➢ Technology

    ▪ Understanding new technologies, which are still emerging

    ▪ Experienced staff principally come from US companies (Google, Yahoo, Facebook etc.)

    ▪ A crowded marketplace makes it harder to build expertise




Contact

➢ Email

    ▪ paul.seatonsmith@rightsizesolutions.co.uk

➢ Website

    ▪ www.rightsizesolutions.co.uk

➢ LinkedIn

    ▪ http://linkd.in/seatonsmith

➢ LinkedIn Group

    ▪ http://linkd.in/capacitymanagement



Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop Overview
Hadoop Overview Hadoop Overview
Hadoop Overview EMC
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Sumeet Singh
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBaseHortonworks
 
February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesYahoo Developer Network
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranMapR Technologies
 
Hadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceHadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceUwe Printz
 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101EMC
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production SuccessAllen Day, PhD
 
Introduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsIntroduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsFadi Yousuf
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing:  Herb Cunitz, HortonworksDemystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing: Herb Cunitz, HortonworksHortonworks
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introductionXuan-Chao Huang
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14John Sing
 
Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalabilityWANdisco Plc
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataWANdisco Plc
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh
 

Was ist angesagt? (20)

February 2014 HUG : Pig On Tez
February 2014 HUG : Pig On TezFebruary 2014 HUG : Pig On Tez
February 2014 HUG : Pig On Tez
 
Hadoop Overview
Hadoop Overview Hadoop Overview
Hadoop Overview
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBase
 
February 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and InsidesFebruary 2014 HUG : Tez Details and Insides
February 2014 HUG : Tez Details and Insides
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
 
Hadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduceHadoop 2 - More than MapReduce
Hadoop 2 - More than MapReduce
 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
 
Introduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsIntroduction to Hadoop - The Essentials
Introduction to Hadoop - The Essentials
 
Hadoop Fundamentals I
Hadoop Fundamentals IHadoop Fundamentals I
Hadoop Fundamentals I
 
10c introduction
10c introduction10c introduction
10c introduction
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing:  Herb Cunitz, HortonworksDemystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
 
Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalability
 
Big Data Journey
Big Data JourneyBig Data Journey
Big Data Journey
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 

Andere mochten auch

Cost savings and expert system advice with athene ES/1
Cost savings and expert system advice with athene ES/1 Cost savings and expert system advice with athene ES/1
Cost savings and expert system advice with athene ES/1 Metron
 
The Care + Feeding of a Mongodb Cluster
The Care + Feeding of a Mongodb ClusterThe Care + Feeding of a Mongodb Cluster
The Care + Feeding of a Mongodb ClusterChris Henry
 
Big Data - Outcomes Performance Measured
Big Data - Outcomes Performance MeasuredBig Data - Outcomes Performance Measured
Big Data - Outcomes Performance MeasuredGreenway Health
 
Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala
Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala
Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala Desing Pathshala
 
Performance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastPerformance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastImpetus Technologies
 
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Mac Moore
 
Towards a Systematic Study of Big Data Performance and Benchmarking
Towards a Systematic Study of Big Data Performance and BenchmarkingTowards a Systematic Study of Big Data Performance and Benchmarking
Towards a Systematic Study of Big Data Performance and BenchmarkingSaliya Ekanayake
 
Handling Big Data in Ship Performance & Navigation Monitoring.
Handling Big Data in Ship Performance & Navigation Monitoring.Handling Big Data in Ship Performance & Navigation Monitoring.
Handling Big Data in Ship Performance & Navigation Monitoring.Lokukaluge Prasad Perera
 
Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Brian Brazil
 
How to Test Big Data Systems | QualiTest Group
How to Test Big Data Systems | QualiTest GroupHow to Test Big Data Systems | QualiTest Group
How to Test Big Data Systems | QualiTest GroupQualitest
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesDenodo
 
Big Data Testing Approach - Rohit Kharabe
Big Data Testing Approach - Rohit KharabeBig Data Testing Approach - Rohit Kharabe
Big Data Testing Approach - Rohit KharabeROHIT KHARABE
 
ATAGTR2017 Performance Testing of Big Data Application
ATAGTR2017 Performance Testing of Big Data ApplicationATAGTR2017 Performance Testing of Big Data Application
ATAGTR2017 Performance Testing of Big Data ApplicationAgile Testing Alliance
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 

Andere mochten auch (16)

Cost savings and expert system advice with athene ES/1
Cost savings and expert system advice with athene ES/1 Cost savings and expert system advice with athene ES/1
Cost savings and expert system advice with athene ES/1
 
The Care + Feeding of a Mongodb Cluster
The Care + Feeding of a Mongodb ClusterThe Care + Feeding of a Mongodb Cluster
The Care + Feeding of a Mongodb Cluster
 
Big Data - Outcomes Performance Measured
Big Data - Outcomes Performance MeasuredBig Data - Outcomes Performance Measured
Big Data - Outcomes Performance Measured
 
Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala
Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala
Hadoop Basics - Apache hadoop Bigdata training by Design Pathshala
 
Performance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastPerformance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus Webcast
 
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
 
Towards a Systematic Study of Big Data Performance and Benchmarking
Towards a Systematic Study of Big Data Performance and BenchmarkingTowards a Systematic Study of Big Data Performance and Benchmarking
Towards a Systematic Study of Big Data Performance and Benchmarking
 
Handling Big Data in Ship Performance & Navigation Monitoring.
Handling Big Data in Ship Performance & Navigation Monitoring.Handling Big Data in Ship Performance & Navigation Monitoring.
Handling Big Data in Ship Performance & Navigation Monitoring.
 
ATAGTR2017 Appium
ATAGTR2017 AppiumATAGTR2017 Appium
ATAGTR2017 Appium
 
Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)
 
How to Test Big Data Systems | QualiTest Group
How to Test Big Data Systems | QualiTest GroupHow to Test Big Data Systems | QualiTest Group
How to Test Big Data Systems | QualiTest Group
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
 
Big Data Testing Approach - Rohit Kharabe
Big Data Testing Approach - Rohit KharabeBig Data Testing Approach - Rohit Kharabe
Big Data Testing Approach - Rohit Kharabe
 
ATAGTR2017 Performance Testing of Big Data Application
ATAGTR2017 Performance Testing of Big Data ApplicationATAGTR2017 Performance Testing of Big Data Application
ATAGTR2017 Performance Testing of Big Data Application
 
Modern Data Architecture
Modern Data ArchitectureModern Data Architecture
Modern Data Architecture
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 

Ähnlich wie Big Data Performance and Capacity Management

Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big DataDataWorks Summit
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringIRJET Journal
 
Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoopAnusha sweety
 
Big data tim
Big data timBig data tim
Big data timT Weir
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Barijaxconf
 
Introduction to Big Data An analogy between Sugar Cane & Big Data
Introduction to Big Data An analogy  between Sugar Cane & Big DataIntroduction to Big Data An analogy  between Sugar Cane & Big Data
Introduction to Big Data An analogy between Sugar Cane & Big DataJean-Marc Desvaux
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopGhassan Al-Yafie
 
Big data data lake and beyond
Big data data lake and beyond Big data data lake and beyond
Big data data lake and beyond Rajesh Kumar
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHortonworks
 
Hadoop India Summit, Feb 2011 - Informatica
Hadoop India Summit, Feb 2011 - InformaticaHadoop India Summit, Feb 2011 - Informatica
Hadoop India Summit, Feb 2011 - InformaticaSanjeev Kumar
 
Cloud computing Introductory Session
Cloud computing Introductory SessionCloud computing Introductory Session
Cloud computing Introductory SessionAbhinav Parmar
 
Sycamore Quantum Computer 2019 developed.pptx
Sycamore Quantum Computer 2019 developed.pptxSycamore Quantum Computer 2019 developed.pptx
Sycamore Quantum Computer 2019 developed.pptxshujee381
 
Designing the Next Generation Data Lake
Designing the Next Generation Data LakeDesigning the Next Generation Data Lake
Designing the Next Generation Data LakeRobert Chong
 
Cloud Computing & Big Data
Cloud Computing & Big DataCloud Computing & Big Data
Cloud Computing & Big DataMrinal Kumar
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
Big data an elephant business opportunities
Big data an elephant   business opportunitiesBig data an elephant   business opportunities
Big data an elephant business opportunitiesBigdata Meetup Kochi
 

Ähnlich wie Big Data Performance and Capacity Management (20)

Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big Data
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
 
Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoop
 
Big data tim
Big data timBig data tim
Big data tim
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
 
Introduction to Big Data An analogy between Sugar Cane & Big Data
Introduction to Big Data An analogy  between Sugar Cane & Big DataIntroduction to Big Data An analogy  between Sugar Cane & Big Data
Introduction to Big Data An analogy between Sugar Cane & Big Data
 
Rob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoopRob peglar introduction_analytics _big data_hadoop
Rob peglar introduction_analytics _big data_hadoop
 
Big data data lake and beyond
Big data data lake and beyond Big data data lake and beyond
Big data data lake and beyond
 
Big Data
Big DataBig Data
Big Data
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - Jaspersoft
 
IJARCCE_49
IJARCCE_49IJARCCE_49
IJARCCE_49
 
Hadoop India Summit, Feb 2011 - Informatica
Hadoop India Summit, Feb 2011 - InformaticaHadoop India Summit, Feb 2011 - Informatica
Hadoop India Summit, Feb 2011 - Informatica
 
Cloud computing Introductory Session
Cloud computing Introductory SessionCloud computing Introductory Session
Cloud computing Introductory Session
 
Sycamore Quantum Computer 2019 developed.pptx
Sycamore Quantum Computer 2019 developed.pptxSycamore Quantum Computer 2019 developed.pptx
Sycamore Quantum Computer 2019 developed.pptx
 
Hadoop
HadoopHadoop
Hadoop
 
Haven 2 0
Haven 2 0 Haven 2 0
Haven 2 0
 
Designing the Next Generation Data Lake
Designing the Next Generation Data LakeDesigning the Next Generation Data Lake
Designing the Next Generation Data Lake
 
Cloud Computing & Big Data
Cloud Computing & Big DataCloud Computing & Big Data
Cloud Computing & Big Data
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Big data an elephant business opportunities
Big data an elephant   business opportunitiesBig data an elephant   business opportunities
Big data an elephant business opportunities
 

Big Data Performance and Capacity Management

  • 1. TRACK 2S, S7 - Big Data Performance and Capacity Management
      Paul Seaton-Smith, RightSize Solutions Limited
      www.rightsizesolutions.co.uk
      Copyright © 2012 RightSize Solutions Limited. All rights reserved.
  • 2. Agenda
      • What is Big Data?
      • Big Data technologies
      • Implications for capacity management
  • 3. What is Big Data?
      • Exceeds the processing capacity of legacy databases
      • The data is too big, arrives too fast, or can’t be stored in existing database architectures
      • To leverage this data, we need new ways to store and analyse it
  • 4. Volume, Velocity, Variety
      • Big data spans three dimensions
          • Volume
          • Velocity
          • Variety
  • 5. Volume
      • Every day 2.5 Exabytes (1 Exabyte = 10^18 bytes) of data are created
      • 90% of the world’s data has been created in the last two years
      • Historically, much of that data could not be processed
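To put the daily figure on the Volume slide into more familiar units, the arithmetic can be sketched as follows. This is only a unit conversion of the slide's own number (2.5 Exabytes per day, with 1 Exabyte taken as 10^18 bytes); the variable names are illustrative and the result says nothing about how the original figure was measured.

```python
# Convert the slide's daily data volume into per-second and per-year rates.
EXABYTE = 10**18                 # bytes, as defined on the slide
DAILY_BYTES = 2.5 * EXABYTE      # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_second = DAILY_BYTES / SECONDS_PER_DAY
terabytes_per_second = bytes_per_second / 10**12      # decimal terabytes
zettabytes_per_year = DAILY_BYTES * 365 / 10**21      # decimal zettabytes

print(f"{terabytes_per_second:.1f} TB/s, {zettabytes_per_year:.2f} ZB/year")
```

On these assumptions the daily volume works out to a little under 30 TB every second, or a little under one zettabyte per year.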
  • 6. Velocity
      • Streaming data
      • Less sampling or aggregation (leading to greater Volume)
      • Big data may be used as soon as it is collected as results are often time-sensitive
      • Real-time or near-time feedback
      • Maximize the value to the business
          • E.g. Sales data that may have been collated and analysed on a monthly basis might now be available every hour
  • 7. Variety
      • Unstructured data of all varieties that is hard to store in databases
          • Text
          • Audio (e.g. MP3, radio)
          • Video (e.g. CCTV, YouTube)
          • Click streams
          • Log files (e.g. web logs)
          • Images (e.g. satellite images)
          • Social network feeds
          • Vehicle telemetry
          • Financial market data
          • Web page content
          • GPS trails
          • etc.
  • 8. Why do we need Big Data?
      • Previously 80% of corporate information was stored on paper
      • 20% was kept in electronic form
      • At least 80% of that was held in databases
      • Now 80% of corporate information is in electronic form (Volume)
      • At least 80% of that is not in a database (Variety)
  • 9. What is driving Big Data?
      • More data is stored as storage prices drop
          • Over the last 30 years, storage space per unit cost has doubled roughly every 14 months
      • Commodity hardware, cloud architectures and open source software
      • Marketing from large vendors (EMC, IBM, Oracle)
      • Change of attitude towards leveraging data
          • Tackle complex problems that previously could not be solved
          • Monetise your business data
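The doubling claim on this slide compounds quickly. A minimal sketch of the arithmetic, assuming a constant 14-month doubling period throughout (the slide says "roughly"); the function name `growth_factor` is illustrative only:

```python
def growth_factor(years: float, doubling_months: float = 14.0) -> float:
    """Factor by which storage space per unit cost grows over `years`,
    assuming it doubles every `doubling_months` months."""
    doublings = years * 12 / doubling_months
    return 2.0 ** doublings


for years in (1, 5, 10, 30):
    print(f"{years:>2} years: x{growth_factor(years):,.1f}")
```

Under that assumption, 30 years gives about 25.7 doublings, which is a factor in the tens of millions. This helps explain why data that was once discarded is now kept.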
  • 10. Examples
      • Financial services could examine data sources to determine likely potential sources of financial fraud
          • Credit and transaction history
          • Social networking behaviour
          • Demographics
          • Voice recordings
  • 11. Examples
      • Analyse Tweets to understand public sentiment for a new product
      • Monitor power meter readings to better predict consumption
      • Examine service desk call detail records in real-time to predict customer churn faster
      • Analyse real time traffic information to change traffic lights and ease congestion
  • 12. Technology
      • Apache Hadoop
          • MapReduce
          • HDFS
      • Further Apache projects to support Hadoop
      • 3rd party enterprise class distributions built on Hadoop
  • 13. Apache Hadoop
      • Distributed processing of large data sets across clusters of computers
      • Scale to thousands of machines, each offering local computation and storage
      • Designed to detect and handle failures at the application layer
      • Delivers a highly-available service on top of a cluster of computers, each of which may be prone to failures
      • Based on MapReduce and HDFS
  • 14. MapReduce
      • MapReduce is a system for distributing computation
      • Created by Google in response to the problem of creating web search indexes
      • Take a query over a dataset, divide it, and run it in parallel over multiple nodes
      • Processes and analyses any data type across clusters of commodity servers
      • Distributing the computation solves the problem of having data too large to fit onto a single node
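The divide-and-run-in-parallel idea on the MapReduce slide can be illustrated with a toy word count. This is a single-process sketch of the programming model only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical, and a real cluster would run the map and reduce steps on different nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    # Each chunk stands in for a block of data held on a different node.
    mapped = (pair for chunk in chunks for pair in map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


chunks = ["big data big clusters", "data arrives fast", "big data"]
counts = word_count(chunks)
print(counts)
```

Because each call to `map_phase` sees only its own chunk, the chunks could in principle be processed on separate machines. Only the grouped intermediate pairs need to be brought together for the reduce step.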
  • 15. HDFS
      • A file system that spans all the nodes in a cluster for data storage
      • Data in a cluster is broken down into smaller blocks and distributed throughout the cluster
      • The map and reduce functions can be executed on smaller subsets of the original data sets
      • HDFS links together the file systems on many local nodes to make them into one big file system
      • HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes
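The block-and-replica idea on the HDFS slide has a direct capacity consequence: raw disk consumed is roughly the data size multiplied by the number of copies. The sketch below is a simplified model, not HDFS code. The 64 MiB block size and three replicas are illustrative assumptions (both are configurable in a real cluster), the round-robin placement is a stand-in for HDFS's own placement policy, and all names are hypothetical.

```python
import math
from typing import Dict, List

BLOCK_SIZE = 64 * 1024**2   # assumed block size in bytes (illustrative)
REPLICATION = 3             # assumed copies kept of each block (illustrative)


def place_blocks(file_size: int, nodes: List[str],
                 block_size: int = BLOCK_SIZE,
                 replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Split a file into blocks and put each block's replicas on distinct
    nodes, using simple round-robin placement."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size / block_size)
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


def survives_failure(placement: Dict[int, List[str]], failed_node: str) -> bool:
    """True if every block still has a replica on some other node."""
    return all(any(node != failed_node for node in replicas)
               for replicas in placement.values())


nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(1 * 1024**3, nodes)            # a 1 GiB file
raw_bytes = len(placement) * BLOCK_SIZE * REPLICATION   # upper bound on disk used

print(len(placement), "blocks,", raw_bytes / 1024**3, "GiB raw")
print("survives loss of node2:", survives_failure(placement, "node2"))
```

For capacity planning the point is that, under these assumptions, 1 GiB of data occupies about 3 GiB of raw cluster storage, so disk forecasts need to include the replication factor.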
  • 16. Apache Hadoop
      • Apache Hadoop is an open source MapReduce implementation with HDFS
      • Originally funded by Yahoo
      • Hadoop is supplemented by further Apache components to enhance its usability and functionality
      • 3rd party distributions have additional functionality and management tools
  • 17. NoSQL Database Management Systems
      • Do not have to use SQL
      • May not guarantee atomicity, consistency, isolation, durability
      • Distributed, fault-tolerant architecture
      • Various types
          • Document store databases
          • Graph databases
          • Key-value stores
          • BigTable implementations
  • 18. Apache HBase
      • HBase is the NoSQL Hadoop database
      • Runs on top of HDFS
      • Provides BigTable-like support for Hadoop
      • Provides random, real-time read/write access to data in Hadoop
  • 19. Apache Hive
      • Built on top of Hadoop enabling it to operate as a data warehouse
      • Superimposes structure on data in HDFS on a variety of data formats
      • Enables ad-hoc analytical queries over the data using a SQL-like syntax
      • Tools to enable easy data extract/transform/load
      • Best used for batch jobs over large sets of append-only data (such as web logs)
  • 20. Further Apache projects
      • ZooKeeper is a centralised service for maintaining configuration information, naming, distributed synchronisation and group services
      • Pig is a programming language that simplifies the common tasks of working with Hadoop
      • Sqoop is a tool designed to import data from relational databases into Hadoop, either directly into HDFS or into Hive
      • Flume is designed to import streaming flows of log data directly into HDFS
  • 21. Benefits of Hadoop
      • Scalability
          • Add new nodes as needed
          • Add nodes without needing to change data formats, how data is loaded, how jobs are written, or the applications on top
      • Value for money
          • Massively parallel computing on commodity servers
  • 22. Benefits of Hadoop
      • Flexibility
          • Schema-less
          • Can absorb any type of data, structured or not, from any number of sources
      • Fault tolerance
          • If you lose a node, the system redirects work to another location of the data and continues processing
  • 23. Commercial Hadoop Distributions
      • EMC Greenplum HD
          • http://www.greenplum.com/products/greenplum-hd/
      • Microsoft Big Data solution
          • http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data-solution.aspx
      • Oracle Big Data Appliance
          • http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html
  • 24. Commercial Hadoop Distributions
      • Cloudera CDH
          • http://www.cloudera.com/hadoop/
      • IBM InfoSphere BigInsights
          • http://www-01.ibm.com/software/data/infosphere/biginsights/
      • Amazon Elastic MapReduce (Amazon EMR)
          • http://aws.amazon.com/elasticmapreduce/
  • 25. Capacity Management Challenges
      • Hard to forecast what new data may be available in the future
          • Social networks
          • Social video
          • Social media
          • Social location
          • Augmented reality
          • What next?
  • 26. Capacity Management Challenges
      • Forecasting
          • Will data growth of existing types continue exponentially?
          • What new data types will emerge?
      • Monitoring
          • Cost of agent-based licensing model for monitoring tools
          • Lack of standard Hadoop management tools
      • Response times
          • Velocity means results are time-critical
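The forecasting question on this slide (will existing data keep growing exponentially?) can be explored with a simple trend fit. The following is a sketch under stated assumptions, not a method from the talk: it fits a straight line to the logarithm of past monthly usage and extrapolates to the point where a fixed capacity is reached. The function names, the sample usage figures and the 400 TB capacity are all hypothetical.

```python
import math
from typing import List, Tuple


def fit_exponential(usage: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(usage) = a + b * t for t = 0, 1, 2, ...
    Returns (a, b); b is the growth rate per period in log terms."""
    n = len(usage)
    ts = range(n)
    logs = [math.log(u) for u in usage]
    t_mean = sum(ts) / n
    y_mean = sum(logs) / n
    b = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, logs))
         / sum((t - t_mean) ** 2 for t in ts))
    a = y_mean - b * t_mean
    return a, b


def months_until_full(usage: List[float], capacity: float) -> float:
    """Months after the last observation until the fitted curve hits capacity."""
    a, b = fit_exponential(usage)
    if b <= 0:
        return math.inf          # no growth in the fitted trend
    t_full = (math.log(capacity) - a) / b
    return max(0.0, t_full - (len(usage) - 1))


# Hypothetical history: 100 TB growing 10% per month for six months.
usage_tb = [100 * 1.1 ** month for month in range(6)]
a, b = fit_exponential(usage_tb)
remaining = months_until_full(usage_tb, capacity=400.0)
print(f"monthly growth {math.exp(b) - 1:.1%}, full in {remaining:.1f} months")
```

A fit like this only extrapolates existing data types. It cannot anticipate the new data sources that the slides identify as the harder forecasting problem.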
  • 27. Capacity Management Challenges
      • Technology
          • Understanding new technologies, which are still emerging
          • Experienced staff principally come from US companies (Google, Yahoo, Facebook etc.)
          • Crowded marketplace means harder to build expertise
  • 28. Contact
      • Email
          • paul.seatonsmith@rightsizesolutions.co.uk
      • Website
          • www.rightsizesolutions.co.uk
      • LinkedIn
          • http://linkd.in/seatonsmith
      • LinkedIn Group
          • http://linkd.in/capacitymanagement
  • 29. References
      • http://radar.oreilly.com/2012/01/what-is-big-data.html
      • http://radar.oreilly.com/2012/02/what-is-apache-hadoop.html
      • http://radar.oreilly.com/2012/01/big-data-ecosystem.html
      • http://www.theregister.co.uk/2012/01/16/big_data_study/
      • http://www.mkomo.com/cost-per-gigabyte
  • 30. References
      • http://www-01.ibm.com/software/data/bigdata/
      • http://en.wikipedia.org/wiki/NoSQL
      • http://www.techrepublic.com/blog/cio-insights/big-data-cheat-sheet/39748353
      • http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable