Capturing Big Value in Big Data –
How Use Case Segmentation Drives
Solution Design and Technology
Selection at Deutsche Telekom
Jürgen Urbanski
Vice President Cloud & Big Data Architectures & Technologies, T-Systems
Cloud Leadership Team, Deutsche Telekom
Board Member, BITKOM Big Data & Analytics Working Group
Inserting Hadoop in your organization – value
proposition by buying center / stakeholder

Chart: potential value (vertical axis, lower to higher) vs. time to value (horizontal axis, shorter to longer), by stakeholder:
• IT Infrastructure – lower storage cost (lowest potential value, shortest time to value)
• IT Applications – lower enterprise data warehouse cost
• LOB – faster customer acquisition, better product development, better quality, lower churn, lower fraud, etc.
• CXO – new business models (highest potential value, longest time to value)
Waves of adoption – crossing the chasm

Wave 1 – Batch Orientation
• Adoption today: mainstream, 70% of organizations
• Example use cases: enterprise log file analysis, ETL offload, active archive, fraud detection, clickstream analytics
• Response time: hour(s)
• Data characteristic: volume
• Architectural characteristic: EDW / RDBMS talk to Hadoop

Wave 2 – Interactive Orientation
• Adoption today: early adopters, 20% of organizations
• Example use cases: forensic analysis, analytic modeling, BI user focus
• Response time: minutes
• Architectural characteristic: analytic apps talk directly to Hadoop

Wave 3 – Real-Time Orientation
• Adoption today: bleeding edge, 10% of organizations
• Example use cases: sensor analysis, "Twitterscraping", telematics, process optimization
• Response time: seconds
• Data characteristic: velocity
• Architectural characteristic: derived data also stored in Hadoop
Data warehouse and ETL offload are promising
use cases with immediate ROI

• Data warehouse offload
  – A legacy data warehouse is costly, so it can often hold only one year of data
  – Older data is stored but "dark": you cannot swim around in it and explore it
  – With HDFS you can explore it – an active archive
  – "Data refinery" for cases where the massively parallel processing (MPP)
    solution is saturated performance-wise

• ETL offload
  – An ETL chain may have more than a dozen steps
  – Many of them can be offloaded to a Hadoop cluster (see the sketch following this list)

• Mainframe offload
  – May have potential
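As a minimal sketch of what one offloaded ETL step can look like, the following map-only Hadoop Streaming job (Python) parses raw web-server log lines, drops malformed records and emits a clean, tab-separated extract for downstream loading. The log format, paths and field layout are assumptions for illustration, not taken from the slides.

```python
#!/usr/bin/env python
# clean_logs.py - one offloaded ETL step as a map-only Hadoop Streaming job, e.g.:
#   hadoop jar hadoop-streaming.jar \
#     -D mapred.reduce.tasks=0 \
#     -input /raw/weblogs -output /staging/weblogs_clean \
#     -mapper clean_logs.py -file clean_logs.py
# Assumed input: Apache-style access log lines.
# Output: tab-separated (ip, timestamp, url, http_status).
import re
import sys

LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|POST|HEAD) (\S+)[^"]*" (\d{3})')

for line in sys.stdin:
    match = LOG_RE.match(line)
    if not match:
        continue                          # drop malformed records (data cleansing)
    ip, timestamp, url, status = match.groups()
    print("\t".join([ip, timestamp, url, status]))
```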
Big Data is about new application landscapes

• New apps taking advantage of Big Data
  – Rapid app development
  – Bridges back to legacy systems (wrapping with APIs, or data integration
    via federation or data transport)

• New data fabrics for a new IT
  – More data, more sources, more types
  – In ONE place
  – NoSQL databases

• Fast data
  – In real time
  – In context (what, when, who, where)
  – Telemetry / sensor based (serving humans or machines, where you need
    to reason over data as it comes in, in real time)

• These three areas need to come together in a platform
  – Cloud abstraction (so it can run on any private or public cloud, no lock-in)
  – Automated deployment and monitoring (rolling upgrades, no patching)
  – Various deployment form factors (on-premise as software, on-premise as appliance, in the cloud)
Example application landscape

Diagram (source: VMware) – layered stack:
• Ingest: real-time streams (social, sensors); ETL (Informatica, Talend, Spring Integration)
• Real-time processing (S4, Storm, Spark) feeding machine learning (Mahout, etc.)
• Real-time database (Shark, GemFire, HBase, Cassandra)
• Interactive analytics (Impala, Greenplum, Aster Data, Netezza, …)
• Batch processing (MapReduce), Hive
• Data visualization (Excel, Tableau)
• Structured and unstructured data (HDFS, MapR)
• Cloud infrastructure: compute, storage, networking
Reference architecture – high-level view

Layered view: Presentation on top of Application, Data Processing, Data Management
and Infrastructure, with Data Integration, Security and Operations as cross-cutting
concerns spanning all layers.
Reference architecture – component view

• Data Integration: real-time ingestion, batch ingestion, data connectors, metadata services
• Presentation: data visualization and reporting, clients
• Application: analytics apps, transactional apps, analytics middleware
• Data Processing: batch processing, real-time / stream processing, search and indexing
• Data Management: distributed storage (HDFS), distributed processing, non-relational DB,
  structured in-memory
• Infrastructure: virtualization, compute / storage / network
• Security (cross-cutting): data isolation, access management, data encryption
• Operations (cross-cutting): workflow and scheduling, management and monitoring
Questions to ask in designing a solution
for a particular business use case

• What physical infrastructure best fits your needs?
• What are your data placement requirements (service provider
  data centers or on-premise, jurisdiction)?

Innovation: cheaper storage – but not just storage…

Illustrative acquisition cost:
• SAN storage: 3–5 €/GB (based on HDS SAN storage)
• NAS filers: 1–3 €/GB (based on NetApp FAS series)
• Enterprise-class Hadoop storage: ??? €/GB (based on NetApp E-Series, NOSH)
• White-box DAS: 0.50–1.00 €/GB (hardware can be self-assembled)
• Data cloud: 0.10–0.30 €/GB (based on large-scale object storage interfaces)

Note: Hadoop offers storage + compute (incl. search); the data cloud offers Amazon S3
and native storage functions. A worked cost comparison follows below.
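To make the deltas concrete (simple arithmetic on the per-GB figures above, assuming roughly 100,000 GB for a 100 TB working set): SAN storage at 3–5 €/GB comes to about €300,000–500,000, NAS filers at 1–3 €/GB to about €100,000–300,000, white-box DAS at 0.50–1.00 €/GB to about €50,000–100,000, and a data cloud at 0.10–0.30 €/GB to about €10,000–30,000 – an order-of-magnitude spread in acquisition cost alone.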
Questions to ask in designing a solution
for a particular business use case

Quadrant chart (source: NetApp), compute power vs. storage capacity – four deployment profiles:

• Enterprise-class Hadoop: packaged, ready-to-deploy modular compute / memory-intensive Hadoop cluster
  – Compute-intensive applications
  – Tick data analysis
  – Extremely tight service-level expectations
  – Severe financial consequences if the analytic run is late

• Enterprise-class Hadoop: packaged, ready-to-deploy modular Hadoop cluster
  – The data has intrinsic value $$$
  – Usable capacity must expand faster than compute
  – Higher storage performance
  – Real human consequences if the system fails (threats, treatments, financial losses)
  – System has to allow for asymmetric growth

• Enterprise-class Hadoop: bounded-compute algorithm / memory-intensive Hadoop cluster
  – Compute-intensive applications
  – Additional CPUs do not improve run time
  – Extremely tight service-level expectations
  – Severe financial consequences if the analytic run is late
  – Need for deeper storage per datanode

• White-box Hadoop: values associated with early adopters of Hadoop
  – Social media space
  – Contributors to Apache
  – Strong bias to JBOD
  – Skeptical of ALL vendors
Questions to ask in designing a solution
for a particular business use case

• Do you run your Hadoop cluster bare-metal or virtualized? Most organizations
  run bare-metal today, but virtualization helps with…
  – Different failure domains
  – Different hardware pools
  – Development vs. production

• Three big types of isolation are required for mixing workloads:
  – Resource isolation: control the greedy neighbor; reserve resources to meet needs
  – Version isolation: allow concurrent OS, app and distro versions; for instance
    test/dev vs. production, high performance vs. low cost
  – Security isolation: provide privacy between users/groups; runtime and data
    privacy required

Adapted from: VMware, see Apache Hadoop on vSphere http://www.vmware.com/de/hadoop/serengeti.html
Questions to ask in designing a solution
for a particular business use case

• Which distribution is right for your needs today vs. tomorrow?
• Which distribution will ensure you stay on the main path of
  open source innovation, vs. trap you in proprietary forks?

Four distributions are compared:
• Widely adopted, mature distribution; GTM partners include Oracle, HP, Dell and IBM
• Fully open-source distribution (incl. management tools); reputation for cost-effective
  licensing; strong developer ecosystem momentum; GTM partners include Microsoft,
  Teradata, Informatica and Talend
• More proprietary distribution with features that appeal to some business-critical
  use cases; GTM partner AWS (M3 and M5 versions only)
• Just announced by EMC, very early stage; differentiator is HAWQ, which claims a 600x
  query speed improvement and a full SQL instruction set

Note: Distributions include more than just the Data Management layer but are discussed
at this point in the presentation. Not shown: Intel, Fujitsu and other distributions.
Questions to ask in designing a solution
for a particular business use case

• What data sources could be of value (internal vs. external,
  people- vs. machine-generated)? Follow data privacy rules for
  people-generated data.
• How much data volume do you have (entry-barrier discussion)
  and of what type (structured, semi-structured, unstructured)?
• What are your data latency requirements (measured in minutes)?

Access paths into the cluster:
• Hadoop APIs for Hadoop applications
• NFS for file-based applications
• REST APIs for internet access (see the sketch after this list)
• ODBC (JDBC) for SQL-based applications
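As one hedged illustration of the REST access path, the snippet below lists a directory and reads a file over WebHDFS with the Python requests library. The host, port, paths and user name are placeholders, and WebHDFS must be enabled on the cluster for this to work.

```python
# webhdfs_read.py - minimal WebHDFS example (assumes WebHDFS is enabled on the NameNode).
# Host, port, paths and user below are illustrative placeholders.
import requests

BASE = "http://namenode.example.com:50070/webhdfs/v1"
USER = "analyst"

# List the contents of a directory.
resp = requests.get(BASE + "/data/weblogs",
                    params={"op": "LISTSTATUS", "user.name": USER})
resp.raise_for_status()
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["pathSuffix"], entry["length"])

# Read a file; WebHDFS redirects the OPEN call to a datanode, which requests follows.
resp = requests.get(BASE + "/data/weblogs/part-00000",
                    params={"op": "OPEN", "user.name": USER})
resp.raise_for_status()
print(resp.content[:200])
```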
Questions to ask in designing a solution
for a particular business use case

• What type of analytics is required (machine learning,
  statistical analysis)?
• How fast do decisions need to be made (decision latency)?
• Is multi-stage data processing a requirement (before data
  gets stored)?
• Do you need stream computing and complex event
  processing (CEP)? If so, do you have strict time-based SLAs?
  Is data loss acceptable?
• How often does data get updated and queried (real time vs.
  batch)?
• How tightly coupled is your Hadoop data with existing
  relational data sets?
• Which non-relational DB suits your needs? HBase and
  Cassandra work natively on HDFS, while Couchbase and
  MongoDB work on copies of the data (see the sketch after this list)

Stay focused on what is possible quickly
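If HBase is the non-relational DB chosen, low-latency key-based access from Python can look like the sketch below, using the happybase client against the HBase Thrift gateway. The host, table name, column family and row key scheme are assumptions for illustration, and the table is presumed to already exist.

```python
# hbase_access.py - low-latency put/get against HBase via the Thrift gateway.
# Requires the happybase package; host, table and column family are placeholders.
import happybase

connection = happybase.Connection("hbase-thrift.example.com")  # default Thrift port 9090
table = connection.table("customer_events")

# Write one event keyed by customer id + timestamp (HBase keys and values are bytes).
table.put(b"cust42#20130601T120000",
          {b"event:type": b"churn_signal", b"event:score": b"0.87"})

# Read it back by key - single-row lookups return in milliseconds.
row = table.row(b"cust42#20130601T120000")
print(row[b"event:type"], row[b"event:score"])

connection.close()
```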
Innovations: Store first, ask questions later

Parallel processing (scale out) shifts what is possible:

Legacy BI
• Business problem: backward-looking analysis, using data out of business applications
• Selected vendors: SAP BusinessObjects, IBM Cognos, MicroStrategy
• Data type / scalability: structured; limited (2–3 TB in RAM)

High-performance BI
• Business problem: quasi-real-time analysis, using data out of business applications
• Selected vendors: Oracle Exadata, SAP HANA
• Data type / scalability: structured; limited (2–8 TB in RAM)

(Legacy BI and high-performance BI together represent the legacy vendor definition of big data.)

"Hadoop" ecosystem
• Business problem: forward-looking predictive analysis; questions defined in the moment,
  using data from many sources
• Selected vendors: Hadoop distributions
• Data type / scalability: structured or unstructured; unlimited (20–30 PB); "true" big data
Questions to ask in designing a solution
for a particular business use case

• Is backup and recovery critical (number of copies in the
  HDFS cluster)?
• Do you need disaster recovery for the raw data?
• How do you optimize TCO over the lifetime of a cluster?
• How do you ensure the cluster remains balanced and performs
  well as the underlying hardware pool becomes heterogeneous?
• What are the implications of a migration between different
  distributions or versions of one distribution? Can you do
  rolling upgrades to minimize disruption?
• What level of multi-tenancy do you implement? Even within
  the enterprise, one general-purpose Hadoop cluster might
  serve different legal entities / BUs.
• How do you bring along existing talent? E.g., train developers
  on Pig, database admins on Hive, IT operations on the platform.
Navigating the broader BI and big data vendor
ecosystem can be confusing

Do you really need Hadoop?
• Is your data structured and less than 10 TB?
• Is your data structured, less than 100 TB, but tightly integrated with
  your existing data?
• Is your data structured and more than 100 TB, but processing has to
  occur in real time with less than a minute of latency?*

If you answer yes to one of these questions, you could stay with legacy BI landscapes
including RDBMS, MPP DB and EDW.

Otherwise: come and join us on a journey into Hadoop-based solutions!
The decision logic is sketched below.

* Hadoop is making rapid progress in the real-time arena
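As a toy encoding of that decision logic (an illustrative sketch only, with the thresholds taken from the three questions above, not a Deutsche Telekom decision tool), the following function returns whether a legacy BI landscape is likely sufficient:

```python
# hadoop_or_not.py - toy encoding of the "Do you really need Hadoop?" checklist above.
def legacy_bi_is_enough(structured, volume_tb, tightly_integrated, needs_subminute_latency):
    """Return True if the workload matches one of the three 'stay with legacy BI' cases."""
    if not structured:
        return False                    # multi-structured data points towards Hadoop
    if volume_tb < 10:
        return True                     # small, structured: classic RDBMS territory
    if volume_tb < 100 and tightly_integrated:
        return True                     # medium, structured, tightly coupled to existing data
    if volume_tb >= 100 and needs_subminute_latency:
        return True                     # very large but hard real-time: MPP DB / EDW (for now)
    return False

# Example: 40 TB of structured data, loosely coupled, batch reporting -> consider Hadoop.
print(legacy_bi_is_enough(structured=True, volume_tb=40,
                          tightly_integrated=False, needs_subminute_latency=False))
```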
ILLUSTRATIVE – NOT EXHAUSTIVE

Use Hadoop for VOLUME

• You require parallel / complex data processing power
  and you can live with minutes or more of latency to derive reports
• You need data storage and indexing for analytic applications

Component diagram: platform; data transformation; MapReduce.
A minimal batch aggregation job is sketched below.
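As a hedged illustration of such a volume-oriented batch job, the mrjob sketch below counts page views per URL across an arbitrarily large input set. mrjob is just one way to write MapReduce in Python and is not named on the slide; the input format and paths are assumptions.

```python
# pageviews_by_url.py - batch aggregation over large volumes with MapReduce (via mrjob).
# Run locally:   python pageviews_by_url.py weblogs.txt
# Run on Hadoop: python pageviews_by_url.py -r hadoop hdfs:///raw/weblogs
from mrjob.job import MRJob


class PageViewsByURL(MRJob):
    """Counts hits per URL; assumes space-separated log lines with the URL in field 7."""

    def mapper(self, _, line):
        fields = line.split()
        if len(fields) > 6:
            yield fields[6], 1          # key = URL, value = one hit

    def reducer(self, url, counts):
        yield url, sum(counts)          # total hits per URL


if __name__ == "__main__":
    PageViewsByURL.run()
```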
ILLUSTRATIVE – NOT EXHAUSTIVE

Use Hadoop for VARIETY

• Your data is multi-structured
• You want to derive reports in batch on full data sets
• You have complex data flows or multi-stage data pipelines
  (a workflow sketch follows below)

Component diagram: workflow management; data transformation; MapReduce;
data visualization and reporting; low-latency data access*.

* HBase and Cassandra work natively on HDFS, while Couchbase and MongoDB work on copies of the data
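For the multi-stage pipeline point, a small workflow sketch with Luigi (one possible workflow manager, not one named on the slide) shows how two dependent stages can be chained so that the aggregation only runs once its input exists. Paths, the date parameter and the stage contents are illustrative placeholders.

```python
# pipeline.py - two-stage data pipeline expressed as Luigi tasks (illustrative paths).
# Run with: python pipeline.py SummarizeLogs --date 2013-06-01 --local-scheduler
import luigi


class CleanLogs(luigi.Task):
    """Stage 1: produce a cleansed, tab-separated extract for one day."""
    date = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget("staging/clean-%s.tsv" % self.date)

    def run(self):
        with self.output().open("w") as out:
            # placeholder for the real cleansing step (e.g. the Hadoop Streaming job above)
            out.write("127.0.0.1\t%s\t/index.html\t200\n" % self.date)


class SummarizeLogs(luigi.Task):
    """Stage 2: aggregate the cleansed extract; only runs after CleanLogs has finished."""
    date = luigi.Parameter()

    def requires(self):
        return CleanLogs(self.date)

    def output(self):
        return luigi.LocalTarget("reports/summary-%s.tsv" % self.date)

    def run(self):
        hits = sum(1 for _ in self.input().open("r"))
        with self.output().open("w") as out:
            out.write("total_hits\t%d\n" % hits)


if __name__ == "__main__":
    luigi.run()
```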
ILLUSTRATIVE – NOT EXHAUSTIVE

Use Hadoop for VELOCITY

• You are inundated with a flood of real-time data: numerous live
  feeds from multiple data sources like machines, business systems
  or Internet sources
  – Data ingestion: Apache Kafka (see the ingestion sketch below)
• You want to derive reports in (near) real time on a sample or full
  data sets
  – Fast analytics*: Shark

* May also use an MPP database
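To illustrate the ingestion side, a minimal producer using the kafka-python client pushes sensor-style readings onto a Kafka topic. The broker address, topic name and message format are assumptions for illustration, not taken from the deck.

```python
# sensor_producer.py - push sensor-style readings into Apache Kafka (via kafka-python).
# Broker address and topic name are illustrative placeholders.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka.example.com:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # serialize dicts as JSON
)

for i in range(10):
    reading = {"sensor_id": "cell-tower-0042", "ts": time.time(), "signal_dbm": -70 - i}
    producer.send("sensor-readings", reading)   # asynchronous send
    time.sleep(0.1)

producer.flush()   # block until all buffered messages are delivered
producer.close()
```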
Where to start inserting Hadoop in your
company? A call to action…

IT Infrastructure / IT Applications – accelerating implementation:
• Solution design driven by target use cases
• Reference architecture
• Technology selection and POC
• Implementation lessons learnt

LOB / CXO – understanding Big Data:
• Definition
• Benefits over adjacent and legacy technologies
• Current mode vs. future mode for analytics
LOB / CXO – assessing the economic potential:
• Target use cases by function and industry
• Best approach to adoption

Puddles, pools – AVOID: systems separated by workload type due to contention
Lakes, oceans – GOAL: a platform that natively supports mixed workloads as a shared service

 

Kürzlich hochgeladen

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Kürzlich hochgeladen (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 


  • 1. Capturing Big Value in Big Data – How Use Case Segmentation Drives Solution Design and Technology Selection at Deutsche Telekom Jürgen Urbanski Vice President Cloud & Big Data Architectures & Technologies, T-Systems Cloud Leadership Team, Deutsche Telekom Board Member, BITKOM Big Data & Analytics Working Group
  • 2. Inserting Hadoop in your organization – value proposition by buying center / stakeholder IT Infrastructure IT Applications LOB CXO Higher  New business models  Faster customer acquisition Potential  Better value  Lower product enterprise development data  Better quality warehouse  Lower churn  Lower cost storage cost  Lower fraud  Etc. Lower Shorter Longer Time to value 1
  • 3. Waves of adoption – crossing the chasm Wave 3 Wave 2 Real-Time Orientation Interactive Orientation Wave 1 Batch Orientation Adoption  Mainstream,  Early adopters,  Bleeding edge, today 70% of organizations 20% of organizations 10% of organizations Example use  Enterprise log file  Forensic analysis  Sensor analysis cases analysis  Analytic modeling  “Twitterscraping”  ETL offload  BI user focus  Telematics  Active archive  Process optimization  Fraud detection  Clickstream analytics Response time  Hour(s)  Minutes  Seconds Data  Volume  Velocity characteristic Architectural  EDW / RDBMS talk  Analytic apps talk  Derived data also characteristic to Hadoop directly to Hadoop stored in Hadoop 2
  • 4. Data warehouse and ETL offload are promising use cases with immediate ROI  Data Warehouse Offload – Legacy data warehouse costly so can only keep one year of data – Older data is stored but “dark,” cannot swim around and explore it – With HDFS you could explore it, active archive – “Data refinery" where massively parallel processing (MPP) solution is saturated performance wise  ETL Offload – ETL may have more than a dozen steps – Many can be offloaded to a Hadoop cluster  Mainframe Offload – May have potential 3
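To make the active-archive idea on this slide concrete, the sketch below queries "dark" warehouse data after it has been landed in HDFS, using Hive. It is only an illustration: the table layout, HDFS path, edge-node host and the use of the PyHive client are assumptions, not details from the deck.

```python
# Sketch: exploring archived EDW data in HDFS via Hive (illustrative names).
from pyhive import hive

conn = hive.connect(host="hadoop-edge-node", port=10000, username="etl")
cur = conn.cursor()

# Expose the offloaded files as an external Hive table (active archive).
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS cdr_archive (
        msisdn STRING, call_start TIMESTAMP, duration_sec INT, cell_id STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    STORED AS TEXTFILE
    LOCATION '/data/archive/cdr/'
""")

# Exploratory query over data older than the one year the EDW can hold.
cur.execute("""
    SELECT cell_id, COUNT(*) AS calls
    FROM cdr_archive
    WHERE call_start < '2012-01-01'
    GROUP BY cell_id
    ORDER BY calls DESC
    LIMIT 20
""")
for row in cur.fetchall():
    print(row)
```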
  • 5. Big Data is about new application landscapes New apps taking advantage of Big Data  Rapid app development  Bridges back to legacy systems (wrapping with API, or data integration via federation or data transport) New data fabrics for a new IT Fast data  More data  In real-time  More sources  In context (what, when,  More types who, where)  In ONE place  Telemetry / sensor based  NOSQL databases (serving humans or machines, where you need to reason over data as it comes in RT) These 3 areas need to come together in a platform  Cloud abstraction (so it can run on any private or public cloud, no lock-in)  Automated deployment and monitoring (rolling upgrades, no patching)  Various deployment form factors (on premise as software, on premise as appliance, in the cloud) 4
  • 6. Example application landscape Machine Learning Real Time (Mahout, etc…) Streams (Social, sensors) Real-Time Processing (s4, storm, spark) Data Visualization (Excel, Tableau) ETL Real Time Interactive HIVE Database Analytics (Impala, (Shark, Batch (Informatica, Talend, Greenplum, Spring Integration) Gemfire, hBase, AsterData, Processing Cassandra) (Map-Reduce) Netezza…) Structured and Unstructured Data (HDFS, MAPR) Cloud Infrastructure Compute Storage Networking Source: Vmware
  • 7. Reference architecture – high-level view Presentation Application Data Operations Security Inte- gration Data Processing Data Management Infrastructure 6
  • 8. Reference architecture – component view Data Presentation Integration Workflow and Scheduling Data Isolation Data Visualization and Reporting Clients Real Time Ingestion Application Analytics Apps Transactional Apps Analytics Middleware Batch Access Management Ingestion Operations Security Data Processing Data Real Time/Stream Batch Processing Search and Indexing Management and Monitoring Connectors Processing Data Management Metadata Distributed Data Encryption Services Distributed Non-relational Structured Storage Processing DB In-Memory (HDFS) Infrastructure Virtualization Compute / Storage / Network 7
  • 9. Questions to ask in designing a solution for a particular business use case Presentation  What physical infrastructure best fits your needs?  What are your data placement requirements (service provider Data Application Operations Inte- Security gra- tion Data Processing data centers or on-premise, jurisdiction)? Data Management Infrastructure Innovation: Cheaper storage but not just storage… Illustrative acquisition cost ? ! SAN Storage NAS Filers Enterprise Class White Box DAS1) Data Cloud1) 3-5€/GB 1-3€/GB Hadoop Storage 0.50-1.00€/GB 0.10-0.30€ /GB ???€/GB Based on HDS Based on Netapp Based on Netapp Hardware can be Based on large SAN Storage FAS-Series E-Series (NOSH) self-assembled scale object storage interfaces 1) Hadoop offers Storage + Compute (incl. search). Data Cloud offers Amazon S3 and native storage functions 8
  • 10. Dat Presentation a Operations Application Security Inte Questions to ask in designing a solution - gra- tion Data Processing Data Management for a particular business use case Infrastructure Enterprise Class Hadoop Enterprise Class Hadoop Packaged ready-to-deploy modular Packaged ready-to-deploy modular Hadoop Compute / Memory intensive Hadoop cluster cluster  Compute intensive applications  The Data has intrinsic value $$$  Tic Data Analysis  Usable capacity must expand faster than  Extremely tight Service Level compute expectations  Higher storage performance  Severe financial consequences if the  Real human consequences if the system fails analytic run is late (Threats, treatments, financial losses)  System has to allow for asymmetric growth Compute Power Enterprise Class Hadoop White Box Hadoop Bounded Compute algorithm / Memory Values associated with early adopters of intensive Hadoop cluster Hadoop  Compute intensive applications  Additional CPUs do not improve run time  Social Media Space  Extremely tight Service Level  Contributors to Apache expectations  Strong bias to JBOD  Severe financial consequences if the  Skeptical of ALL vendors analytic run is late  Need for deeper storage per datanode Storage Capacity Source: NetApp 9
  • 11. Questions to ask in designing a solution for a particular business use case Presentation  Do you run your Hadoop cluster bare-metal or virtual? Most Data Application run bare-metal today but virtualization helps with… Operations Inte- Security gra- tion Data Processing – Different failure domains Data Management – Different hardware pools Infrastructure – Development vs. production Three big types of isolation are required for mixing workloads:  Resource Isolation – Control the greedy neighbor Nosy – Reserve resources to meet needs  Version Isolation – Allow concurrent OS, App, Distro versions Reckless – For instance, test/dev vs. production, high performance vs. low cost  Security Isolation – Provide privacy between users/groups – Runtime and data privacy required Adapted from: Vmware, see Apache Hadoop on vSphere http://www.vmware.com/de/hadoop/serengeti.html 10
  • 12. Questions to ask in designing a solution for a particular business use case Presentation  Which distribution is right for your needs today vs. tomorrow?  Which distribution will ensure you stay on the main path of Data Application Operations Inte- Security gra- tion Data Processing open source innovation, vs. trap you in proprietary forks? Data Management Infrastructure  Widely adopted, mature distribution  GTM partners include Oracle, HP, Dell, IBM  Fully open source distribution (incl. management tools)  Reputation for cost-effective licensing  Strong developer ecosystem momentum  GTM partners include Microsoft, Teradata, Informatica, Talend  More proprietary distribution with features that appeal to some business critical use cases  GTM partner AWS (M3 and M5 versions only)  Just announced by EMC, very early stage  Differentiator is HAWQ – claims 600x query speed improvement, full SQL instruction set Note: Distributions include more than just the Data Management layer but are discussed at this point in the presentation. 11 Not shown: Intel, Fujitsu and other distributions
  • 13. Questions to ask in designing a solution for a particular business use case Presentation  What data sources could be of value (internal vs. external, Data Inte- Application Operations people vs. machine generated)? Follow data privacy for Security gra- tion Data Processing people-generated data. Data Management  How much data volume do you have (entry barrier discussion) Infrastructure and of what type (structured, semi, unstructured)?  Data latency requirements (measured in minutes)? Hadoop APIs NFS for file- REST APIs ODBC (JDBC) for Hadoop based for internet for SQL-based Applications applications access applications 12
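Of the four access paths listed on this slide, the REST path is the easiest to demonstrate. The sketch below talks to the standard WebHDFS REST API; the namenode host, port and file paths are made up for illustration.

```python
# Sketch of the REST access path to Hadoop using the WebHDFS API.
import requests

NAMENODE = "http://namenode.example.com:50070/webhdfs/v1"

# List a directory (LISTSTATUS is a standard WebHDFS operation).
resp = requests.get(f"{NAMENODE}/data/archive/cdr", params={"op": "LISTSTATUS"})
resp.raise_for_status()
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["pathSuffix"], entry["length"])

# Read the first kilobyte of a file (OPEN redirects to a datanode).
resp = requests.get(
    f"{NAMENODE}/data/archive/cdr/part-00000",
    params={"op": "OPEN", "length": 1024},
    allow_redirects=True,
)
print(resp.content[:200])
```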
  • 14. Questions to ask in designing a solution for a particular business use case Presentation  What type of analytics is required (machine learning, Data Application statistical analysis)? Operations Inte- Security  How fast do decisions need to be made (decision latency)? gra- tion Data Processing Data Management  Is multi-stage data processing a requirement (before data Infrastructure gets stored)?  Do you need stream computing and complex event processing (CEP)? If so do you have strict time-based SLAs? Is data loss acceptable?  How often does data get updated and queried (real time vs. batch)?  How tightly coupled are your Hadoop data with existing relational data sets?  Which non-relational DB suits your needs? Hbase and Cassandra work natively on HDFS, while Couchbase and MongoDB work on copies of the data Stay focused on what is possible quickly 13
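For the HDFS-native option mentioned on this slide, here is a minimal HBase read/write sketch via the Thrift gateway. The happybase client, the host and the table layout are assumptions chosen purely for illustration.

```python
# Illustrative HBase access through the Thrift gateway using happybase.
import happybase

connection = happybase.Connection("hbase-thrift.example.com")
table = connection.table("subscriber_events")

# Write one event keyed by subscriber id plus timestamp.
table.put(b"4915112345678#20130401T120000",
          {b"e:type": b"topup", b"e:amount_eur": b"15"})

# Scan a single subscriber's recent events.
for key, data in table.scan(row_prefix=b"4915112345678#"):
    print(key, data)
```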
  • 15. Innovations: Store first, ask questions later Data Parallel processing (scale out) Presentation Application Operations Inte- Security gra- tion Data Processing Data Management “Hadoop” Infrastructure High Performance Ecosystem BI  Forward-looking Legacy BI predictive analysis  Quasi-real-time analysis  Questions defined in  Backward-looking the moment, using analysis  Using data out of Business business applications data from many  Using data out of sources problem business applications Selected Vendors  SAP Business Objects  Oracle Exadata  Hadoop distributions  IBM Cognos  SAP HANA Technology  MicroStrategy Solution Data Type/Scalability  Structured  Structured  Structured or  Limited (2 – 3 TB in  Limited (2 – 8 TB in unstructured RAM) RAM)  Unlimited (20 – 30 PB) „True“ big data Legacy vendor definition of big data
  • 16. Questions to ask in designing a solution for a particular business use case Presentation  Is backup and recovery critical (number of copies in the Data Application HDFS cluster)? Operations Inte- Security  Do you need disaster recovery on the raw data? gra- tion Data Processing Data Management  How do you optimize TCO over the life time of a cluster? Infrastructure  How to ensure the cluster remains balanced and performing well as the underlying hardware pool becomes heterogeneous?  What are the implications of a migration between different distributions or versions of one distribution? Can you do rolling upgrades to minimize disruption?  What level of multi-tenancy do you implement? Even within the enterprise, one general purpose Hadoop cluster might serve different legal entities / BUs.  How do you bring along existing talent? E.g., train developers on Pig, database admins on Hive, IT operations on the platform 15
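The "number of copies in the HDFS cluster" question maps directly to the HDFS replication factor, which can be changed per file at runtime. A small sketch against the standard WebHDFS SETREPLICATION operation follows; as in the earlier REST example, host and path are assumptions.

```python
# Sketch: raising the number of HDFS block copies for one archived file.
import requests

NAMENODE = "http://namenode.example.com:50070/webhdfs/v1"

resp = requests.put(
    f"{NAMENODE}/data/archive/cdr/part-00000",
    params={"op": "SETREPLICATION", "replication": 5},
)
resp.raise_for_status()
print(resp.json())  # {"boolean": true} on success
```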
  • 17. Navigating the broader BI and big data vendor ecosystem can be confusing
  • 18. Do you really need Hadoop?  Is your data structured and less than 10 TB?  Is your data structured, less than 100 TB but tightly integrated with your existing data?  Is your data structured, more than 100 TB but processing has to occur real-time with less than a minute of latency?* Then you could stay with legacy BI landscapes including RDBMS, MPP DB and EDW Otherwise Come and join us on a journey into Hadoop based solutions! * Hadoop is making rapid progress in the real-time arena 17
  • 19. ILLUSTRATIVE Use Hadoop for VOLUME NOT EXHAUSTIVE  You require parallel / complex data processing power and you can live with minutes or more of latency to derive reports  You need data storage and indexing for analytic applications Platform Data MapReduce Transformation
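As a concrete instance of the parallel transformation step shown on this slide, here is a minimal MapReduce job. The mrjob framework and the tab-separated log layout are assumptions for illustration only, not tooling named in the deck.

```python
# Minimal MapReduce-style transformation sketch with mrjob.
from mrjob.job import MRJob


class SessionsPerCell(MRJob):
    """Count log lines per cell id from tab-separated access logs."""

    def mapper(self, _, line):
        fields = line.split("\t")
        if len(fields) >= 4:          # msisdn, timestamp, cell_id, bytes
            yield fields[2], 1

    def reducer(self, cell_id, counts):
        yield cell_id, sum(counts)


if __name__ == "__main__":
    SessionsPerCell.run()
```

Run locally for testing or, with an illustrative invocation such as `python sessions_per_cell.py -r hadoop hdfs:///logs/access/`, against the cluster.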
  • 20. ILLUSTRATIVE Use Hadoop for VARIETY NOT EXHAUSTIVE  Your data is multi-structured  You want to derive reports in batch on full data sets  You have complex data flows or multi-stage data pipelines Workflow Mgt. Data MapReduce Transformation Data Visualization and Reporting Low Latency Data Access* * Hbase and Cassandra work natively on HDFS, while Couchbase and MongoDB work on copies of the data 19
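A sketch of the multi-stage pipeline idea on this slide: stage one parses semi-structured JSON records and counts topics, stage two ranks them. The input format and the use of mrjob's MRStep are illustrative assumptions.

```python
# Two-stage pipeline over multi-structured input, sketched with mrjob.
import json
from mrjob.job import MRJob
from mrjob.step import MRStep


class TopComplaintTopics(MRJob):
    """Stage 1 counts topics in JSON complaint records; stage 2 ranks them."""

    def steps(self):
        return [
            MRStep(mapper=self.mapper_parse, reducer=self.reducer_count),
            MRStep(reducer=self.reducer_rank),
        ]

    def mapper_parse(self, _, line):
        try:
            record = json.loads(line)           # semi-structured input
            yield record.get("topic", "unknown"), 1
        except ValueError:
            pass                                # skip malformed lines

    def reducer_count(self, topic, counts):
        yield None, (sum(counts), topic)

    def reducer_rank(self, _, counted_topics):
        for total, topic in sorted(counted_topics, reverse=True)[:10]:
            yield topic, total


if __name__ == "__main__":
    TopComplaintTopics.run()
```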
  • 21. ILLUSTRATIVE Use Hadoop for VELOCITY NOT EXHAUSTIVE  You are inundated with a flood of real-time data: Numerous live feeds from multiple data sources like machines, business systems or Internet sources Data Apache Kafka Ingestion  You want to derive reports in (near) real time on a sample or full data sets Data Visualization and Reporting Shark Fast Analytics* 20 * May also use MPP database
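The ingestion side of the velocity pattern can be sketched with a simple Kafka consumer; the broker address, topic name and message schema are assumptions rather than details from the deck.

```python
# Near-real-time consumption of a sensor feed using kafka-python.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "sensor-readings",
    bootstrap_servers=["kafka-broker.example.com:9092"],
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

# Simple alerting on a single metric as events arrive.
for message in consumer:
    reading = message.value
    if reading.get("temperature_c", 0) > 80:
        print("alert:", reading["device_id"], reading["temperature_c"])
```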
  • 22. Where to start inserting Hadoop in your company? A call to action… IT Infrastructure IT Applications LOB CXO  Accelerating implementation  Understanding Big Data – Solution design driven by – Definition target use cases – Benefits over adjacent and – Reference architecture legacy technologies – Technology selection and – Current mode vs. future POC mode for analytics – Implementation lessons  Assessing the Economic learnt Potential – Target use cases by function and industry – Best approach to adoption Puddles, pools Lakes, oceans AVOID: Systems separated by GOAL: Platform that natively workload type due to contention supports mixed workloads, shared service 21

Editor's Notes

  1. Automated deployment and monitoring: the cloud infrastructure has to provide roughly ten "verbs" so that applications do not need to know anything about the underlying infrastructure. The operating philosophy is no patching and rolling upgrades, with the platform continuously reconciling what the application needs against what the cloud provides.
  2. Layers of the reference architecture: Presentation, Application, Data Processing, Infrastructure, Data Ingestion, Security, and Management & Monitoring. Key components:
     Ambari: Apache Ambari is a monitoring, administration and lifecycle management project for Apache Hadoop clusters. Hadoop clusters require many inter-related components that must be installed, configured, and managed across the entire cluster.
     ZooKeeper: ZooKeeper is a centralized service for maintaining configuration information, naming, and providing distributed synchronization and group services. ZooKeeper is used heavily by many distributed applications such as HBase.
     HBase: HBase is the distributed Hadoop database, scalable and able to collect and store big data volumes on HDFS. This class of database is often categorized as NoSQL ("not only SQL").
     Pig: Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
     Hive: Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query it using a SQL-like language called HiveQL, while still allowing traditional MapReduce programmers to plug in custom mappers and reducers when it is inconvenient or inefficient to express the logic in HiveQL.
     HCatalog: Apache HCatalog is a table and storage management service for data created using Apache Hadoop; it provides deep integration with enterprise data warehouses (e.g. Teradata) and with data integration tools such as Talend.
     MapReduce: Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes.
     HDFS: The Hadoop Distributed File System is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them across compute nodes throughout a cluster to enable reliable, extremely rapid parallel computation.
     Talend Open Studio for Big Data: a 100% open source, GUI-based code generator for ETL (extract, transform, load) and ELT (extract, load, transform) data movement and cleansing into and out of Hadoop.
     Data Integration Services: HDP integrates Talend Open Studio for Big Data, the leading open source data integration platform for Apache Hadoop, including a visual development environment and hundreds of pre-built connectors to leading applications that allow you to connect to any data source without writing code.
     Centralized Metadata Services: HDP includes HCatalog, a metadata and table management system that simplifies data sharing both between Hadoop applications running on the platform and between Hadoop and other enterprise data systems. HDP's open metadata infrastructure also enables deep integration with third-party tools.