Co-Creating COMPetitive intelliGENCE through

   Process, Data and Domain driven Information Excellence


                    Hadoop and No SQL
                     PRESENTED in TDWI India, Hyderabad (2011 July)



                                                     Nagaraj Kulkarni
Hadoop and No SQL                    Slide 1                          2011 Jul
Process, Data and Domain Integrated Approach

  [Diagram labels: Information; Process, Domain, Data; Market Actions, Systemic Changes,
   Business Landscape, Business Intent, Business Usage, Systems, Cost, Effort,
   Skills & Competency; Actionable, Usable, Timely, Flexible, Scalable, Sustainable]

  Decision Excellence
  Competitive Advantage lies in the exploitation of:
     – More detailed and specific information
     – More comprehensive external data & dependencies
     – Fuller integration
     – More in-depth analysis
     – More insightful plans and strategies
     – More rapid response to business events
     – More precise and apt response to customer events



Ram Charan’s Book: What The CEO Wants You To Know: How Your Company Really Works
Touching Upon

      Context For Big Data Challenges

      Data Base Systems – Pre Hadoop Strengths and Limitations

      What is Scale, Why No SQL

      Think Hadoop, Hadoop Eco system

      Think Map Reduce

      Nail Down Map Reduce

      Think GRID (Distributed Architecture)

      Deployment Options

      Map Reduce Not and Map Reduce Usages

      Nail Down HDFS and GRID Architecture


Big Data Context




Systemic Changes


       Connected Globe
           Boundarylessness
           Best Sourcing
           Interlinked Culture

       Customer Centric
           Demand Side Focus
           Bottom Up Innovation
           Empowered Employees

       Agility and Response Time
           Leading Trends
           Responsiveness
           Speed, Agility, Flexibility


Landscape To Address


       Data Explosion
           Manageability
           Scalability
           Performance

       Information Overload
           Agility
           Decision Making
           Time to Action

       Interlinked Processes & Systems
           Boundaryless
           Systemic Understanding
           Collaborate and Synergize
           Simplify and Scale
Information Overload




                                                A wealth of
                                                information
                                                creates
                                                a poverty of
                                                attention.

                                                Herbert Simon,
                                                Nobel Laureate Economist
More Touch points, More Channels




                    BACKUP




                                              Source: JupiterResearch (7/08)
                                              © 2008 JupiterResearch, LLC

Scale – What is it?




How do we scale


   Traditional Systems: How They Achieve Scalability
       Multi Threading

       Multiple CPU – Parallel Processing

       Distributed Programming – SMP & MPP

       ETL Load Distribution – Assigning jobs to different nodes

       Improved Throughput




Scale – What is it about?


   1.73 Billion Internet Users
   247 Billion emails per day
   126 Million Blogs
   5 Billion Facebook Content per week
   50 Million Tweets per day
   80% of this data is unstructured
   Estimated 800 GB of data per user (million Petabyte!)

   Facebook
      500 Million Active Users per Month
      500 Billion+ Page Views per month
      25 Billion+ Content per month
      15 TB New Data / day
      1200 m/cs, 21 PB Cluster

   eBay
      90 Million Active Users
      10 Billion Requests per day
      220 million+ items on sale
      40 TB+ / day
      40 PB of Data

   Yahoo
      82 PB of Data
      25000+ nodes

   Twitter
      1 TB plus / day
      80+ nodes


How do we scale – Think Numbers
     Thinking of Scale - Need for Grid

     Think Numbers
       [Diagram labels: data centers (dc 1, dc 2, ...) with lb, bp and nodes n1 - nn producing
        Web server Log; Data Highway, 100 mps/pipe; Log storage & processing; Datamart]

       1000 Nodes / DC
       10 DC
       1K byte webserver log record
       1 second / row

     In one day
       1000 * 10 * 1K * 60 * 60 * 24 = 864 GB

     Storage for a year
       864 GB * 365 = 315 TB

     To store 1 PB: 40K * 1000 = Millions $
     To process 1 TB: 1000 minutes ~ 17 hrs


                                                                                Think Agility and Flexibility
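
The sizing arithmetic on this slide can be checked with a few lines of Python. This is only a sketch under stated assumptions: decimal units (1 KB = 1000 bytes), one log row per second per node, and the slide's "40K" read as a cost of $40,000 per TB. All variable names are illustrative.

```python
# Back-of-the-envelope sizing from the slide (decimal units assumed).
NODES_PER_DC = 1000        # web server nodes per data center
DATA_CENTERS = 10
RECORD_BYTES = 1000        # assumption: "1K byte" log record = 1000 bytes
ROWS_PER_SECOND = 1        # one log row per second per node
SECONDS_PER_DAY = 60 * 60 * 24

bytes_per_day = (NODES_PER_DC * DATA_CENTERS * RECORD_BYTES
                 * ROWS_PER_SECOND * SECONDS_PER_DAY)
gb_per_day = bytes_per_day / 1e9
tb_per_year = gb_per_day * 365 / 1e3

COST_PER_TB_USD = 40_000   # assumption: the slide's "40K" means dollars per TB
cost_per_pb_usd = COST_PER_TB_USD * 1000

hours_per_tb = 1000 / 60   # the slide's 1000 minutes to process 1 TB

print(f"{gb_per_day:.0f} GB per day, {tb_per_year:.0f} TB per year")
print(f"${cost_per_pb_usd:,} per PB, {hours_per_tb:.1f} hours per TB")
```

Under these assumptions the script reproduces the slide's 864 GB per day and roughly 315 TB per year; the cost and time figures depend entirely on the assumed unit prices and throughput.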
Scale – What is it about?




              Volume
              Speed
              Integration level
              more…




                                             Does it scale linearly
                                             with data size and
                                             analysis complexity?

We would have no issues…

         If the following assumptions Hold Good:

         The network is reliable.
         Latency is zero.
         Bandwidth is infinite.
         The network is secure.
         Topology doesn't change.
         There is one administrator.
         Transport cost is zero.
         The network is homogeneous.



Think Hadoop




New Paradigm: Go Back to Basics
    Divide and Conquer (Divide and Delegate and Get Done)
    Move Work or Workers ?
    Relax Constraints (Pre defined data models)
    Expect and Plan for Failures (avoid and address failures)
    Community backup
    Assembly Line Processing
         (Scale, Speed, Efficiency, Commodity Worker)
    The “For loop”
    Parallelization (trivially parallelizable)
    Infrastructure and Supervision (Grid Architecture)
    Manage Dependencies
    Ignore the Trivia (Trivia is relative!)

    References:
      Charlie Munger’s Mental Models
      Joel Spolsky: http://www.joelonsoftware.com/items/2006/08/01.html
New Paradigm: Go Back to Basics

    Map Reduce Paradigm
        Divide and Conquer
        The “for loop”
        Sort and Shuffle
        Parallelization (trivially parallelizable)
        Relax Data Constraints
        Assembly Line Processing (Scale, Speed, Efficiency, Commodity Worker)

    Map Reduce History
        Lisp
        Unix
        Google FS

    Grid Architecture
        Split and Delegate
        Move Work or Workers
        Expect and Plan for Failures
        Assembly Line Processing (Scale, Speed, Efficiency, Commodity Worker)
        Manage Dependencies and Failures
        Ignore the Trivia (Trivia is relative!)
        Replication, Redundancy,
        Heart Beat Check, Cluster rebalancing,
        Fault Tolerance, Task Restart,
        Chaining of jobs (Dependencies),
        Graceful Restart,
        Look Ahead or Speculative execution

No SQL Options


      HBase / Cassandra for huge data volumes (PBs)
      • HBase fits in well where Hadoop is already being used.
      • Cassandra is less cumbersome to install and manage.

      MongoDB / CouchDB
      Document-oriented databases for ease of use at GB to TB
      volumes. Might be problematic at PB scale.

      Graph databases such as Neo4j
      for managing relationship-oriented applications (nodes and
      edges)

      Simple key-value databases such as Riak, Redis, Membase
      for huge distributed in-memory hash maps
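
To make the idea of a distributed in-memory hash map concrete, here is a minimal sketch in Python. It is not the API of Riak, Redis or Membase: the class `PartitionedStore` and its methods are hypothetical, the "nodes" are plain dictionaries in one process, and keys are placed by a simple hash-modulo rule. Real stores add replication, rebalancing and more careful partitioning.

```python
import hashlib
from typing import Any, Dict, List


class PartitionedStore:
    """Hypothetical key-value store that spreads keys over several nodes."""

    def __init__(self, node_count: int) -> None:
        # Each dict stands in for the in-memory hash map of one node.
        self.nodes: List[Dict[str, Any]] = [dict() for _ in range(node_count)]

    def _node_for(self, key: str) -> int:
        # A stable hash of the key decides which node owns it.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key: str, value: Any) -> None:
        self.nodes[self._node_for(key)][key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self.nodes[self._node_for(key)].get(key, default)


store = PartitionedStore(node_count=4)
store.put("user:42", "profile data")
print(store.get("user:42"))
```

Every lookup touches exactly one node, which is why this access pattern scales out easily; the price is that there are no joins or multi-key queries.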

Let us Think Hadoop




RDBMS and Hadoop


                    RDBMS                   MapReduce
        Data size   Gigabytes               Petabytes
        Access      Interactive and batch   Batch
        Structure   Fixed schema            Unstructured schema
        Language    SQL                     Procedural (Java, C++, Ruby, etc.)
        Integrity   High                    Low
        Scaling     Nonlinear               Linear
        Updates     Read and write          Write once, read many times
        Latency     Low                     High
Apache Hadoop Ecosystem

    Hadoop Common: The common utilities that support the other Hadoop
    subprojects.
    HDFS: A distributed file system that provides high-throughput access to
    application data.
    MapReduce: A software framework for distributed processing of large data sets
    on compute clusters.
    Pig: A high-level data-flow language and execution framework for parallel
    computation.
    HBase: A scalable, distributed database that supports structured data storage
    for large tables.
    Hive: A data warehouse infrastructure that provides data summarization and ad
    hoc querying.
    ZooKeeper: A high-performance coordination service for distributed
    applications.
    Flume / Scribe: Distributed log collection services.
    Mahout: Scalable machine learning algorithms using Hadoop.
    Chukwa: A data collection system for managing large distributed systems.
Apache Hadoop Ecosystem



    [Stack diagram, top to bottom]

       ETL Tools  |  BI Reporting  |  RDBMS
       Pig (Data Flow)  |  Hive (SQL)  |  Sqoop
       MapReduce (Job Scheduling/Execution System)
       HBase (Key-Value store)
       HDFS (Hadoop Distributed File System)

    Alongside the stack: ZooKeeper (Coordination), Avro (Serialization)




HDFS – The BackBone

                    Hadoop Distributed File System




Map Reduce – The New Paradigm
                                 Transforming Large Data




        MapReduce Basics

             •Functional Programming
                                                     Mappers
             •List Processing

             •Mapping Lists



                                                    Reducers
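
The functional roots of the model (list processing, mapping lists, then folding them) can be seen in miniature with Python's built-in `map` and `functools.reduce`. This is a single-process illustration of the idea only, not Hadoop code; the sample list is made up.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Map: apply the same function to every element independently.
# Because no element depends on another, this step is trivially parallelizable.
squares = list(map(lambda x: x * x, numbers))

# Reduce: fold the mapped list into a single value.
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares, total)
```

Mappers play the role of the function passed to `map`, and reducers play the role of the folding function, except that Hadoop runs many of each across a cluster.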




PIG – Help the Business User Query
Pig: Data-aggregation functions over semi-structured data (log files).



 Pig Latin Programs pass through these stages (stage -> output):

   1. Query Parser                     -> Logical Plan
   2. Semantic Checking                -> Logical Plan
   3. Logical Optimizer                -> Optimized Logical Plan
   4. Logical to Physical Translator   -> Physical Plan
   5. Physical to M/R Translator       -> MapReduce Plan
   6. Map Reduce Launcher              -> Create a job jar to be submitted to the Hadoop cluster
PIG Latin Example




HBASE – Scalable Columnar


     •   Scalable, Reliable, Distributed DB
     •   Columnar Structure
     •   Built on top of HDFS
     •   Map Reduceable

     • Not a SQL Database!
        – No joins
        – No sophisticated query engine
        – No transactions
        – No column typing
        – No SQL, no ODBC/JDBC, etc.

     • Not a replacement for your RDBMS...


HIVE – SQL Like


  • A high level interface on Hadoop for managing and
    querying structured data
     • Interpreted as Map-Reduce jobs for execution
     • Uses HDFS for storage
     • Uses a metadata representation over HDFS files



  • Key Building Principles:
     • Familiarity with SQL
     • Performance with help of built-in optimizers
     • Enable Extensibility – Types, Functions, Formats,
       Scripts


FLUME – Distributed Data Collection


         • Distributed Data / Log Collection Service
         • Scalable, Configurable, Extensible
         • Centrally Manageable


         • Agents fetch data from apps, Collectors save it
         • Abstractions: Source -> Decorator(s) -> Sink
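
The Source -> Decorator(s) -> Sink abstraction can be pictured as a chain of small stages. The sketch below is plain Python and does not use Flume's API or configuration syntax; `source`, `tag_host`, `ListSink` and `run_flow` are hypothetical names chosen for illustration, and the sink collects events in memory rather than writing to HDFS.

```python
from typing import Callable, Iterable, Iterator, List

Event = str
Decorator = Callable[[Iterator[Event]], Iterator[Event]]


def source(lines: Iterable[str]) -> Iterator[Event]:
    """Hypothetical source: emits one event per non-empty input line."""
    for line in lines:
        if line.strip():
            yield line.strip()


def tag_host(host: str) -> Decorator:
    """Hypothetical decorator: prefixes each event with the originating host."""
    def decorate(events: Iterator[Event]) -> Iterator[Event]:
        for event in events:
            yield f"{host} {event}"
    return decorate


class ListSink:
    """Hypothetical sink: keeps events in a list instead of durable storage."""

    def __init__(self) -> None:
        self.events: List[Event] = []

    def write(self, events: Iterator[Event]) -> None:
        self.events.extend(events)


def run_flow(lines: Iterable[str], decorators: List[Decorator], sink: ListSink) -> None:
    events = source(lines)
    for decorate in decorators:   # decorators are applied in order
        events = decorate(events)
    sink.write(events)


sink = ListSink()
run_flow(["GET /index", "", "GET /about"], [tag_host("web01")], sink)
print(sink.events)
```

Because each stage only consumes and produces a stream of events, stages can be swapped or stacked without touching the others, which is the point of the abstraction.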




Oozie – Workflow Management

     An Oozie Workflow
        [Workflow diagram. Nodes: start, SSH HOD Alloc, fork, M/R streaming job,
         Pig job, join, M/R job, decision (MORE / ENOUGH), Java Main, FS job,
         end, kill. Transitions between nodes are labelled OK or ERROR.]




Think Map n Reduce




Understanding Map Reduce Paradigm

       Logical Architecture




Understanding Map Reduce Paradigm




Map Reduce Paradigm


   Job
            Configure the Hadoop Job to run.

   Mapper
            map(LongWritable key, Text value, Context context)

   Reducer
            reduce(Text key, Iterable<IntWritable> values, Context context)




Programming model
       Map-Reduce Definition

       MapReduce is a functional programming model and an associated
       implementation model for processing and generating large data sets.

       Users specify a map function that processes a key/value pair
       to generate a set of intermediate key/value pairs, and
       a reduce function that merges all intermediate values associated
       with the same intermediate key.

       Many real world tasks are expressible in this model.
                                                                          CONCEPTS

Programming model


    Input & Output: each a set of key/value pairs

    Programmer specifies two functions:

    map (in_key, in_value) -> list(out_key, intermediate_value)
       •Processes input key/value pair
       •Produces set of intermediate pairs

    reduce (out_key, list(intermediate_value)) -> list(out_value)
       •Combines all intermediate values for a particular key
       •Produces a set of merged output values (usually just one)

    Inspired by similar primitives in LISP and other languages
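
The two signatures above can be exercised with a small single-process driver. This is a sketch of the programming model only: the function `map_reduce` and the temperature example are hypothetical, everything runs in memory, and none of Hadoop's distribution or fault tolerance is modelled.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def map_reduce(inputs: Iterable[Pair],
               map_fn: Callable[[Any, Any], Iterable[Pair]],
               reduce_fn: Callable[[Any, List[Any]], Iterable[Any]]) -> Dict[Any, List[Any]]:
    # Map phase: every input pair yields zero or more intermediate pairs,
    # which are grouped by their intermediate key.
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for in_key, in_value in inputs:
        for out_key, intermediate_value in map_fn(in_key, in_value):
            groups[out_key].append(intermediate_value)
    # Reduce phase: each key is reduced together with all of its values.
    return {key: list(reduce_fn(key, groups[key])) for key in sorted(groups)}


def max_temp_map(line_no: int, line: str) -> Iterable[Pair]:
    city, temp = line.split(",")
    yield city, int(temp)


def max_temp_reduce(city: str, temps: List[int]) -> Iterable[int]:
    yield max(temps)   # "usually just one" output value per key


records = [(0, "Hyderabad,41"), (1, "Pune,36"), (2, "Hyderabad,39")]
result = map_reduce(records, max_temp_map, max_temp_reduce)
print(result)
```

The user supplies only the two small functions; the grouping by intermediate key in between is the framework's job.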




Map Reduce Paradigm


  Word Count Example

  A simple MapReduce program can be written to determine how many times different words
  appear in a set of files.

  What do the Mapper and Reducer do?

  Pseudo Code:

   mapper (filename, file-contents):
    for each word in file-contents:
     emit (word, 1)



   reducer (word, values):
    sum = 0
    for each value in values:
     sum = sum + value
    emit (word, sum)
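
A runnable counterpart to the pseudo code, written as a minimal single-process Python sketch. The `mapper` and `reducer` functions mirror the pseudo code; the `word_count` driver, which stands in for Hadoop's sort-and-shuffle step, and the sample file contents are assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter


def mapper(filename, file_contents):
    # Emit (word, 1) for every word in the file.
    for word in file_contents.split():
        yield word, 1


def reducer(word, values):
    # Sum all the counts emitted for one word.
    total = 0
    for value in values:
        total += value
    yield word, total


def word_count(files):
    # Map phase: run the mapper over every file.
    pairs = [pair
             for filename, contents in files.items()
             for pair in mapper(filename, contents)]
    # Sort and shuffle: bring all pairs with the same key together.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct word.
    counts = {}
    for word, group in groupby(pairs, key=itemgetter(0)):
        for key, total in reducer(word, (count for _, count in group)):
            counts[key] = total
    return counts


files = {"a.txt": "big data big cluster", "b.txt": "data grid"}
counts = word_count(files)
print(counts)
```

On a real cluster the map calls run on the nodes that hold the file blocks and the sorted groups are shipped to reducers over the network; the logic of the two user functions stays the same.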




Programming model


    Example: Count word occurrences

    map(String input_key, String input_value):
        // input_key: document name
        // input_value: document contents
        for each word w in input_value:
            EmitIntermediate(w, "1");

    reduce(String output_key, Iterator intermediate_values):
        // output_key: a word
        // intermediate_values: a list of counts
        int result = 0;
        for each v in intermediate_values:
            result += ParseInt(v);
        Emit(AsString(result));


    Pseudocode: See appendix in paper for real code

Understanding Map Reduce Paradigm

        Map – Reduce Execution Recap

    •   Master-Slave architecture

    •   Master: JobTracker
         – Accepts MR jobs submitted by users

         – Assigns Map and Reduce tasks to TaskTrackers (slaves)

         – Monitors task and TaskTracker status, re-executes tasks upon failure

    •   Worker: TaskTrackers
         – Run Map and Reduce tasks upon instruction from the JobTracker

         – Manage storage and transmission of intermediate output
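
The master's behaviour of assigning tasks and re-executing them upon failure can be sketched as follows. This is a toy, single-process model and not Hadoop's JobTracker implementation: `job_tracker`, `TaskFailed`, the round-robin assignment rule and the worker names are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


class TaskFailed(Exception):
    """Raised by a worker that cannot complete its task."""


def job_tracker(tasks: List[str],
                workers: List[str],
                run_task: Callable[[str, str], str],
                max_attempts: int = 3) -> Dict[str, str]:
    """Assign each task to a worker, retrying on another worker after a failure."""
    results: Dict[str, str] = {}
    next_worker = 0
    for task in tasks:
        for _ in range(max_attempts):
            worker = workers[next_worker % len(workers)]  # round-robin assignment
            next_worker += 1
            try:
                results[task] = run_task(task, worker)
                break                      # task succeeded, move on
            except TaskFailed:
                continue                   # re-execute on the next worker
        else:
            raise RuntimeError(f"{task} failed {max_attempts} times")
    return results


def run_task(task: str, worker: str) -> str:
    # Simulated worker: "tracker-2" always fails; the others report their name.
    if worker == "tracker-2":
        raise TaskFailed(worker)
    return worker


assignments = job_tracker(["map-0", "map-1", "reduce-0"],
                          ["tracker-1", "tracker-2", "tracker-3"],
                          run_task)
print(assignments)
```

The job as a whole still completes even though one worker never succeeds, which is the "expect and plan for failures" principle in action.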



Understanding Map Reduce Paradigm

       Map – Reduce Paradigm Recap

         Examples of map functions:
                    Individual Count, Filter, Transformation, Sort, Pig load

         Examples of reduce functions:
                    Group Count, Sum, Aggregator

         A job can have many map and reduce functions.




How are we doing on the Objective




Process, Data and Domain driven Information Excellence




                    ABOUT COMPEGENCE




Process, Data and Domain Integrated Approach

  [Diagram labels: Information; Process, Domain, Data; Market Actions, Systemic Changes,
   Business Landscape, Business Intent, Business Usage, Systems, Cost, Effort,
   Skills & Competency; Actionable, Usable, Timely, Flexible, Scalable, Sustainable]

  Decision Excellence
  Competitive Advantage lies in the exploitation of:
     – More detailed and specific information
     – More comprehensive external data & dependencies
     – Fuller integration
     – More in-depth analysis
     – More insightful plans and strategies
     – More rapid response to business events
     – More precise and apt response to customer events



                 We complement your “COMPETING WITH ANALYTICS JOURNEY”
Value Proposition
  [Diagram; recoverable labels only]

  Left column: Tools, Technologies, Trends, Platforms, People, Processes, Partners, Cost, Time

  Second column: Constraints, Alternatives, Assumptions, Dependencies, Concerns / Risks,
  Cost of Ownership, Technology Evolution, Trade Offs, Ease of Use: Drill Down, Up, Across

  Centre (data warehouse flow):
     Source Data: Customer Data, Assets, Liabilities, Investment, Cards,
        Reference Data (Branch, Product), Partner Data, CRM / Marketing Programs
     Extract -> Staging -> Transform -> Load -> Applications
     Business Rules: Integrate, Translate, Segment, Derive, Profiling, Summarize
     Trusted Data Foundation with DW Platform: Repeatable, Reusable, Leverage
     Applications: Analysis, Dashboards, Reports, Excel Interface
     Meta data Layer for Consistent Business Understanding
     Data Quality and Process Audit

  Right side: TeraBytes, Reports, Dashboards; Decisions? Actions? Results? Returns?;
  Data, Processes, People, Current State; Decisions, Actions, Results, Returns




          Jump-start the “Process and Information Excellence” journey
          Focus on your business goals and the “Competing with Analytics” journey
          Overcome shortages of diverse expertise and skill sets
          Preserve current investments in people and technology
          Manage data complexities and the resulting challenges
          Manage scalability to address data explosion with terabytes of data
          Focus on the business and business processes
          Harvest the benefits of your data investments faster
          Mature your team through consultative work-through workshops
Our Expertise and Focus Areas

            Process + Data + Domain => Decision


           Analytics; Data Mining; Big Data; DWH & BI


           Architecture and Methodology


           Partnered Product Development


           Consulting, Competency Building, Advisory, Mentoring


           Executive Briefing Sessions and Deep Dive Workshops



Partners in Co-Creating Success


          Process, Data and Domain driven Information Excellence


      Process, Data and Domain driven Business Decision Life Cycle



                                              www.compegence.com
                                             info@compegence.com

Compegence: Nagaraj Kulkarni - Hadoop and No SQL_TDWI_2011Jul23_Preso

  • 1. Co-Creating COMPetitive intelliGENCE through Process, Data and Domain driven Information Excellence Hadoop and No SQL PRESENTED in TDWI India, Hyderabad (2011 July) Nagaraj Kulkarni Hadoop and No SQL Slide 1 2011 Jul
  • 2. Process, Data and Domain Integrated Approach Market Actions Decision Excellence Actionable Systemic Competitive Advantage lies in Changes the exploitation of: Usable Business Process Landscape –More detailed and specific information –More comprehensive Business external data & dependencies Timely Infor Systems Intent –Fuller integration mation –More in depth analysis Business –More insightful plans and Flexible strategies Usage Domain Data –More rapid response to Scalable business events Cost –More precise and apt response to customer events Sustainable Effort Skills & Competency Ram Charan’s Book: What The CEO Wants You To Know: How Your Company Really Works COMPEGENCE Hadoop and No SQL Slide 2 2 Information Excellence Foundation 2011 Jul
  • 3. Touching Upon Context For Big Data Challenges Data Base Systems – Pre Hadoop Strengths and Limitations What is Scale, Why No SQL Think Hadoop, Hadoop Eco system Think Map Reduce Nail Down Map Reduce Think GRID (Distributed Architecture) Deployment Options Map Reduce Not and Map Reduce Usages Nail Down HDFS and GRID Architecture Hadoop and No SQL Slide 3 2011 Jul
  • 4. Big Data Context Hadoop and No SQL Slide 4 2011 Jul
  • 5. Systemic Changes Boundary less ness Connected Best Sourcing Globe Interlinked Culture Demand Side Focus Customer Centric Bottom Up Innovation Empowered employees Leading Trends Agility and Responsiveness Response Time Speed, Agility, Flexibility Hadoop and No SQL Slide 5 2011 Jul
  • 6. Landscape To Address Manageability Data Scalability Explosion Performance Agility Information Overload Decision Making Time to Action Interlinked Boundaryless Processes Systemic Understanding Collaborate and Synergize & Systems Simplify and Scale Hadoop and No SQL Slide 6 2011 Jul
  • 7. Information Overload A wealth of information creates a poverty of attention. Herbert Simon, Nobel Laureate Economist Hadoop and No SQL Confidential COMPEGENCE Slide 7 7 2011 Jul
  • 8. More Touch points, More Channels BACKUP Source: JupiterResearch (7/08) © 2008 JupiterResearch, LLC Hadoop and No SQL Slide 8 2011 Jul
  • 9. Scale – What is it? Hadoop and No SQL Slide 9 2011 Jul
  • 10. How do we scale Traditional systems - how they achieve scalability: multi-threading; multiple CPUs (parallel processing); distributed programming (SMP and MPP); ETL load distribution (assigning jobs to different nodes); improved throughput Hadoop and No SQL Slide 10 2011 Jul
  • 11. Scale – What is it about? Facebook 1.73 Billion Internet Users 500 Million Active eBay Users per Month 90 Million Active 247 Billion emails per day Users 500 Billion+ Page 126 Million Blogs Views per month 10 Billion Requests per day 5 Billion Facebook Content 25 Billion+ Content per week per month 220 million+ items on sale 50 Million Tweets per day 15 TB New Data / day 1200 m/cs, 21 PB 40 TB + / day 80% of this data is Cluster 40 PB of Data unstructured Yahoo Twitter Estimated 800 GB of data 82 PB of Data 1 TB plus / day per user (million Petabyte!) 25000+ nodes 80 + nodes Hadoop and No SQL Slide 11 2011 Jul
  • 12. How do we scale – Think Numbers Thinking of Scale - Need for Grid Think Numbers Data Highway 1000 Nodes / DC Datamart lb n1 -nn 100 mps/pipe 10 DC Log storage bp & processing dc 2 1K byte webserver log record dc 1 Web server Log Datamart 1 second / row ………. ………. In one day 1000 * 10 * 1K * 60 * 60 * 24 = 864 GB Storage for a year 864 GB * 365 = 315 TB To store 1 PB – 40K * 1000 = Millions $ To process 1 TB = 1000 minutes ~ 17 hrs Think Agility and Flexibility Hadoop and No SQL Slide 12 2011 Jul
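The "Think Numbers" arithmetic on slide 12 can be reproduced as a short back-of-envelope script. This is a minimal sketch, assuming a "1K byte" record means 1000 bytes and that each web server writes one log row per second; the constant and function names are illustrative and do not come from the deck.

```python
# Back-of-envelope log volume estimate using the figures quoted on slide 12.
NODES_PER_DC = 1000              # web servers per data center
DATA_CENTERS = 10
RECORD_BYTES = 1000              # "1K byte" log record, taken as 1000 bytes (assumption)
RECORDS_PER_SECOND = 1           # one log row per second per node (assumption)
SECONDS_PER_DAY = 60 * 60 * 24

GB = 10**9
TB = 10**12


def daily_bytes() -> int:
    """Raw log bytes produced across all data centers in one day."""
    return (NODES_PER_DC * DATA_CENTERS * RECORD_BYTES
            * RECORDS_PER_SECOND * SECONDS_PER_DAY)


def yearly_bytes() -> int:
    """Raw log bytes accumulated over 365 days, before any replication."""
    return daily_bytes() * 365


if __name__ == "__main__":
    print(f"per day:  {daily_bytes() / GB:.0f} GB")
    print(f"per year: {yearly_bytes() / TB:.0f} TB")
```

The slide's "1 TB = 1000 minutes" figure corresponds to a single pipeline reading about 1 GB per minute, which is the motivation for splitting the work across a grid.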
  • 13. Scale – What is it about? Volume Speed Integration level more… Does it scale linearly with data size and analysis complexity Hadoop and No SQL Slide 13 2011 Jul
  • 14. We would have no issues… if the following assumptions held: The network is reliable. Latency is zero. Bandwidth is infinite. The network is secure. Topology doesn't change. There is one administrator. Transport cost is zero. The network is homogeneous. Hadoop and No SQL Slide 14 2011 Jul
  • 15. Think Hadoop Hadoop and No SQL Slide 15 2011 Jul
  • 16. New Paradigm: Go Back to Basics Divide and Conquer (Divide and Delegate and Get Done) Move Work or Workers? Relax Constraints (Pre-defined data models) Expect and Plan for Failures (avoid and address failures) Community backup Assembly Line Processing (Scale, Speed, Efficiency, Commodity Worker) The “For loop” Parallelization (trivially parallelizable) Infrastructure and Supervision (Grid Architecture) Manage Dependencies Ignore the Trivia (Trivia is relative!) Joel Spolsky Charlie Munger’s Mental Models http://www.joelonsoftware.com/items/2006/08/01.html Hadoop and No SQL Slide 16 2011 Jul
  • 17. New Paradigm: Go Back to Basics Map Reduce Paradigm: Divide and Conquer; The “for loop”; Sort and Shuffle; Parallelization (trivially parallelizable); Relax Data Constraints; Assembly Line Processing (Scale, Speed, Efficiency, Commodity Worker). Grid Architecture: Split and Delegate; Move Work or Workers; Expect and Plan for Failures; Assembly Line Processing (Scale, Speed, Efficiency, Commodity Worker); Manage Dependencies and Failures; Ignore the Trivia (Trivia is relative!); Replication, Redundancy, Heart Beat Check, Cluster rebalancing, Fault Tolerance, Task Restart, Chaining of jobs (Dependencies), Graceful Restart, Look Ahead or Speculative execution. Map Reduce History: Lisp, Unix, Google FS. Hadoop and No SQL Slide 17 2011 Jul
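The divide, sort-and-shuffle, and reduce steps named on slide 17 can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not Hadoop's API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Sort and shuffle: group all emitted values by key, in key order."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "big grid"]))
```

Each line is mapped independently of every other line, which is what makes the "for loop" trivially parallelizable: on a grid, the lines would be split across workers and only the shuffle step would move data between them.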
  • 18. No SQL Options
      – HBase / Cassandra: for huge data volumes (PBs). HBase fits in well where Hadoop is already being used; Cassandra is less cumbersome to install and manage.
      – MongoDB / CouchDB: document-oriented databases for ease of use and GB-TB volumes. Might be problematic at PB scale.
      – Neo4j and similar graph databases: for managing relationship-oriented applications (nodes and edges).
      – Riak, Redis, Membase and similar simple key-value databases: for huge distributed in-memory hash maps.
  • 19. Let us Think Hadoop
  • 20. RDBMS and Hadoop (each row: RDBMS vs. MapReduce)
      – Data size: Gigabytes vs. Petabytes
      – Access: Interactive and batch vs. Batch
      – Structure: Fixed schema vs. Unstructured schema
      – Language: SQL vs. Procedural (Java, C++, Ruby, etc.)
      – Integrity: High vs. Low
      – Scaling: Nonlinear vs. Linear
      – Updates: Read and write many times vs. Write once, read many times
      – Latency: Low vs. High
  • 21. Apache Hadoop Ecosystem
      – Hadoop Common: The common utilities that support the other Hadoop subprojects.
      – HDFS: A distributed file system that provides high throughput access to application data.
      – MapReduce: A software framework for distributed processing of large data sets on compute clusters.
      – Pig: A high-level data-flow language and execution framework for parallel computation.
      – HBase: A scalable, distributed database that supports structured data storage for large tables.
      – Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
      – ZooKeeper: A high-performance coordination service for distributed applications.
      – Flume / Scribe: Message queue processing.
      – Mahout: Scalable machine learning algorithms using Hadoop.
      – Chukwa: A data collection system for managing large distributed systems.
  • 22. Apache Hadoop Ecosystem. [Stack diagram labels: ETL Tools, BI Reporting, RDBMS; Pig (Data Flow), Hive (SQL), Sqoop; MapReduce (Job Scheduling/Execution System); HBase (Key-Value store); HDFS (Hadoop Distributed File System); ZooKeeper (Coordination); Avro (Serialization).]
  • 23. HDFS – The BackBone Hadoop Distributed File System
  • 24. Map Reduce – The New Paradigm: Transforming Large Data. MapReduce Basics •Functional Programming •List Processing •Mapping Lists. [Diagram labels: Mappers, Reducers.]
  • 25. PIG – Help the Business User Query. Pig: Data-aggregation functions over semi-structured data (log files). Compilation pipeline: Pig Latin Programs -> Query Parser -> Logical Plan -> Semantic Checking -> Logical Plan -> Logical Optimizer -> Optimized Logical Plan -> Logical to Physical Translator -> Physical Plan -> Physical to M/R Translator -> MapReduce Plan -> Map Reduce Launcher (creates a job jar to be submitted to the Hadoop cluster).
  • 26. PIG Latin Example
  • 27. HBASE – Scalable Columnar • Scalable, Reliable, Distributed DB • Columnar Structure • Built on top of HDFS • Map Reduceable • Not a SQL Database! – No joins – No sophisticated query engine – No transactions – No column typing – No SQL, no ODBC/JDBC, etc. • Not a replacement for your RDBMS...
  • 28. HIVE – SQL Like • A high level interface on Hadoop for managing and querying structured data • Interpreted as Map-Reduce jobs for execution • Uses HDFS for storage • Uses Metadata representation over HDFS files • Key Building Principles: – Familiarity with SQL – Performance with help of built-in optimizers – Enable Extensibility: Types, Functions, Formats, Scripts
  • 29. FLUME – Distributed Data Collection • Distributed Data / Log Collection Service • Scalable, Configurable, Extensible • Centrally Manageable • Agents fetch data from apps, Collectors save it • Abstractions: Source -> Decorator(s) -> Sink
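The Source -> Decorator(s) -> Sink abstraction can be sketched as a chain of generators. This is an illustration of the idea only, not Flume's API or configuration syntax: the `Event` shape and every function name below are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

Event = dict  # assumption: an event is modeled as a plain dict with a "body"
Decorator = Callable[[Iterator[Event]], Iterator[Event]]


def list_source(lines: Iterable[str]) -> Iterator[Event]:
    """Source: turns raw lines fetched from an application into events."""
    for line in lines:
        yield {"body": line}


def drop_empty(events: Iterator[Event]) -> Iterator[Event]:
    """Decorator: filters out events whose body is blank."""
    return (event for event in events if event["body"].strip())


def add_host(host: str) -> Decorator:
    """Decorator factory: annotates each passing event with a host name."""
    def decorate(events: Iterator[Event]) -> Iterator[Event]:
        for event in events:
            yield {**event, "host": host}
    return decorate


def collecting_sink(events: Iterator[Event], store: List[Event]) -> int:
    """Sink: saves events (here, into a list) and returns how many it saved."""
    count = 0
    for event in events:
        store.append(event)
        count += 1
    return count


def run_pipeline(source: Iterator[Event],
                 decorators: List[Decorator],
                 store: List[Event]) -> int:
    """Wire source -> decorator(s) -> sink and drain the stream."""
    stream = source
    for decorator in decorators:
        stream = decorator(stream)
    return collecting_sink(stream, store)


if __name__ == "__main__":
    saved: List[Event] = []
    n = run_pipeline(list_source(["a", "", "b"]),
                     [drop_empty, add_host("web-01")], saved)
    print(n, saved)
```

Because each stage only consumes and produces a stream of events, decorators can be added, removed, or reordered without touching the source or the sink.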
  • 30. Oozie – Workflow Management. [Figure: An Oozie Workflow. Control nodes: start, fork, join, decision (MORE / ENOUGH), kill, end. Actions: M/R streaming job, SSH, HOD Alloc, Pig job, M/R job, Java Main, FS job. Transitions are labeled OK or ERROR.]
  • 31. Think Map n Reduce
  • 32. Understanding Map Reduce Paradigm Logical Architecture
  • 33. Understanding Map Reduce Paradigm
  • 34. Map Reduce Paradigm. Job: configures the Hadoop job to run. Mapper: map(LongWritable key, Text value, Context context). Reducer: reduce(Text key, Iterable<IntWritable> values, Context context).
  • 35. Programming model: Map-Reduce Definition. MapReduce is a functional programming model and an associated implementation model for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model.
  • 36. Programming model. Input & Output: each a set of key/value pairs. Programmer specifies two functions:
      map (in_key, in_value) -> list(out_key, intermediate_value)
        •Processes input key/value pair
        •Produces set of intermediate pairs
      reduce (out_key, list(intermediate_value)) -> list(out_value)
        •Combines all intermediate values for a particular key
        •Produces a set of merged output values (usually just one)
      Inspired by similar primitives in LISP and other languages
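The two signatures can be made concrete with a tiny in-memory driver. This is a single-process sketch of the programming model, not Hadoop's implementation: `map_reduce`, `map_status_bytes`, `reduce_sum`, and the sample log lines (host, status, bytes) are all hypothetical.

```python
from collections import defaultdict


def map_reduce(inputs, map_fn, reduce_fn):
    """Run map_fn over (in_key, in_value) pairs, group the intermediate
    values by out_key, then run reduce_fn once per key."""
    intermediate = defaultdict(list)
    for in_key, in_value in inputs:
        # map (in_key, in_value) -> list(out_key, intermediate_value)
        for out_key, value in map_fn(in_key, in_value):
            intermediate[out_key].append(value)  # grouping by key
    output = {}
    for out_key in sorted(intermediate):
        # reduce (out_key, list(intermediate_value)) -> list(out_value)
        output[out_key] = reduce_fn(out_key, intermediate[out_key])
    return output


def map_status_bytes(line_no, line):
    """Emit (status, bytes) for one log line of the form 'host status bytes'."""
    _host, status, size = line.split()
    yield status, int(size)


def reduce_sum(status, sizes):
    """Merge all byte counts for one status code into a single total."""
    return [sum(sizes)]


if __name__ == "__main__":
    log = ["web1 200 512", "web2 404 128", "web1 200 256"]
    print(map_reduce(enumerate(log), map_status_bytes, reduce_sum))
```

The driver never inspects the keys or values, so any pair of functions with these shapes plugs into it; that separation is what lets a framework distribute the map and reduce calls across machines.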
  • 37. Map Reduce Paradigm: Word Count Example. A simple MapReduce program can be written to determine how many times different words appear in a set of files. What do the Mapper and Reducer do? Pseudo code:
      mapper (filename, file-contents):
        for each word in file-contents:
          emit (word, 1)
      reducer (word, values):
        sum = 0
        for each value in values:
          sum = sum + value
        emit (word, sum)
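The word count pseudocode translates almost line for line into Python. The sketch below runs in one process and makes the sort-and-shuffle step between map and reduce explicit with `sorted` and `itertools.groupby`; the `word_count` driver and the sample file names are hypothetical, and words are split on whitespace only, with no case or punctuation handling.

```python
from itertools import groupby
from operator import itemgetter


def mapper(filename, file_contents):
    """Emit (word, 1) for every word in the file."""
    for word in file_contents.split():
        yield word, 1


def reducer(word, values):
    """Sum all the counts emitted for one word."""
    total = 0
    for value in values:
        total += value
    yield word, total


def word_count(files):
    """files maps a file name to its text; returns {word: count}."""
    # Map: run the mapper over every (filename, contents) pair.
    pairs = [pair for name, text in files.items() for pair in mapper(name, text)]
    # Sort and shuffle: sorting brings equal words next to each other so
    # that groupby can hand each word's counts to a single reducer call.
    pairs.sort(key=itemgetter(0))
    counts = {}
    for word, group in groupby(pairs, key=itemgetter(0)):
        for key, total in reducer(word, (count for _, count in group)):
            counts[key] = total
    return counts


if __name__ == "__main__":
    print(word_count({"a.txt": "the cat the dog", "b.txt": "the end"}))
```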
  • 38. Programming model. Example: Count word occurrences
      map(String input_key, String input_value):
        // input_key: document name
        // input_value: document contents
        for each word w in input_value:
          EmitIntermediate(w, "1");
      reduce(String output_key, Iterator intermediate_values):
        // output_key: a word
        // intermediate_values: a list of counts
        int result = 0;
        for each v in intermediate_values:
          result += ParseInt(v);
        Emit(AsString(result));
      Pseudocode: See appendix in paper for real code
  • 39. Understanding Map Reduce Paradigm: Map-Reduce Execution Recap • Master-Slave architecture • Master: JobTracker – Accepts MR jobs submitted by users – Assigns Map and Reduce tasks to TaskTrackers (slaves) – Monitors task and TaskTracker status, re-executes tasks upon failure • Worker: TaskTrackers – Run Map and Reduce tasks upon instruction from the JobTracker – Manage storage and transmission of intermediate output
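The "re-executes tasks upon failure" behavior can be sketched with a toy master and workers. This is a simplified, single-process illustration and not Hadoop's JobTracker or TaskTracker code: the classes, the round-robin assignment, and the injected failures are all assumptions made for the example.

```python
from collections import deque


class TaskFailed(Exception):
    """Raised by a worker to signal that a task attempt did not complete."""


class Worker:
    """Stands in for a TaskTracker: runs a task when the master says so."""

    def __init__(self, name, fail_first=()):
        self.name = name
        # Task ids that fail on their first attempt here (injected failures).
        self._fail_once = set(fail_first)

    def run(self, task_id, func, arg):
        if task_id in self._fail_once:
            self._fail_once.discard(task_id)
            raise TaskFailed(f"{self.name} failed task {task_id}")
        return func(arg)


class Master:
    """Stands in for the JobTracker: assigns tasks, re-executes failures."""

    def __init__(self, workers, max_attempts=3):
        self.workers = workers
        self.max_attempts = max_attempts

    def run_job(self, func, args):
        args = list(args)
        pending = deque((task_id, arg, 1) for task_id, arg in enumerate(args))
        results = {}
        turn = 0
        while pending:
            task_id, arg, attempt = pending.popleft()
            worker = self.workers[turn % len(self.workers)]  # round-robin
            turn += 1
            try:
                results[task_id] = worker.run(task_id, func, arg)
            except TaskFailed:
                if attempt >= self.max_attempts:
                    raise  # give up on the whole job
                pending.append((task_id, arg, attempt + 1))  # re-execute
        return [results[i] for i in range(len(args))]


if __name__ == "__main__":
    master = Master([Worker("tt1", fail_first={0}), Worker("tt2")])
    print(master.run_job(len, ["aa", "b", "ccc"]))
```

Re-execution is safe here because each task is a pure function of its input, which is the same property that lets a MapReduce framework restart map and reduce tasks freely.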
  • 40. Understanding Map Reduce Paradigm: Map-Reduce Paradigm Recap. Examples of map functions: Individual Count, Filter, Transformation, Sort, Pig load. Examples of reduce functions: Group Count, Sum, Aggregator. A job can have many map and reduce functions.
  • 41. How are we doing on the Objective
  • 42. Process, Data and Domain driven Information Excellence ABOUT COMPEGENCE
  • 43. Process, Data and Domain Integrated Approach. Decision Excellence: Competitive Advantage lies in the exploitation of: –More detailed and specific information –More comprehensive external data & dependencies –Fuller integration –More in depth analysis –More insightful plans and strategies –More rapid response to business events –More precise and apt response to customer events. [Diagram labels: Market Actions, Systemic Changes, Business Landscape, Business Intent, Business Usage, Process, Systems, Information, Domain, Data; Actionable, Usable, Timely, Flexible, Scalable, Sustainable; Cost, Effort, Skills & Competency.] We complement your “COMPETING WITH ANALYTICS JOURNEY”
  • 44. Value Proposition. [Diagram residue; legible labels include: Constraints, Alternatives, Assumptions, Dependencies, Concerns / Risks, Cost of Ownership, Technology Evolution, Trade Offs; Decisions?, Actions?, Results?, Returns?; Tools, Technologies, Trends, Data, TeraBytes, Processes, Platforms, Partners, People, Reports, Dashboards, Cost, Time; Ease of Use: Drill Down, Up, Across; Current State; Metadata Layer for Consistent Business Understanding; Data Quality and Process Audit; COMPEGENCE Assets: Trusted, Repeatable, Reusable, Leverage.] • Jump Start the “Process and Information Excellence” journey • Focus on your business goals and “Competing with Analytics Journey” • Overcome multiple and diverse expertise / skill-set paucity • Preserve current investments in people and technology • Manage Data complexities and the resultant challenges • Manage Scalability to address data explosion with Terabytes of Data • Helps you focus on the business and business processes • Helps you harvest the benefits of your data investments faster • Consultative Work-thru Workshops that help and mature your team
  • 45. Our Expertise and Focus Areas • Process + Data + Domain => Decision • Analytics; Data Mining; Big Data; DWH & BI • Architecture and Methodology • Partnered Product Development • Consulting, Competency Building, Advisory, Mentoring • Executive Briefing Sessions and Deep Dive Workshops
  • 46. Partners in Co-Creating Success. Process, Data and Domain driven Information Excellence. Process, Data and Domain driven Business Decision Life Cycle. www.compegence.com, info@compegence.com