Architecting business
critical enterprise
application:
Automated Support


Kumar Palaniappan
Enterprise Architect, NetApp
Agenda

•  NetApp’s Business Challenge
•  Solution Architecture
•  Best Practices
•  Performance Benchmarks
•  Questions
The AutoSupport Family
The Foundation of NetApp Support Strategies

•  Catch issues before they become critical
•  Secure automated “call-home” service
•  System monitoring and nonintrusive alerting
•  RMA requests without customer action
•  Enables faster incident management

“My AutoSupport Upgrade Advisor tool does all the hard work for me, saving me 4 to 5 hours of work per storage system and providing an upgrade plan that’s complete and easy to follow.”
AutoSupport – Why Does it Matter?
Customers | Partners | NetApp

Product Planning & Development
  •  Product Adoption & Usage
  •  Install Base Mgmt
  •  Data Mining
Pre Sales
  •  Lead Generation
  •  Stickiness Measurements
  •  “What If” Scenarios & Capacity Planning
Deployment
  •  Establish Initial Call Home
  •  Measure Implementation Effectiveness
  •  Storage Usage Monitoring & Billing (NAFS)
Technical Support
  •  Event-Based Triggers & Alerts
  •  Automated Case Creation
  •  Automated… …Parts & Support Dispatch
  •  Automated E2E Case Handling
Proactive Planning & Optimization
  •  SAM Services: 1) Proactive Health Checks 2) Upgrade Planning
  •  Storage Efficiency Measurements & Recommendations
  •  PS Consulting: 1) Perf Analysis & Opt. Recommendations 2) Storage Capacity Planning
Product Feedback
  •  Critical to Quality Metrics
  •  Adoption & Usage Metrics
  •  Quality & Reliability Metrics

NetApp Confidential – Limited Use
Business Challenges

Gateways
•  600K ASUPs every week
•  40% coming over the weekend
•  0.5% growth week over week

ETL
•  Data needs to be parsed and loaded in 15 mins

Data Warehouse
•  Only 5% of data goes into the data warehouse, rest unstructured. It’s growing 6-8 TB per month
•  Oracle DBMS struggling to scale, maintenance and backups challenging
•  No easy way to access this unstructured content

Reporting
•  Numerous mining requests are not satisfied currently
•  Huge untapped potential of valuable information for lead generation, supportability, and BI

Finally, the incoming load doubles every 16 months!
Incoming AutoSupport Volumes and TB Consumption
[Chart: TB consumption (y-axis, 0 to 6,000) by year (x-axis, Jan-00 to Jan-17). Legend labels: Actual (tb), Projected, Double, High Count & Size, Low Count & Size.]
•  At projected current rate of growth, total storage requirements continue doubling every 16 months
•  Cost Model:
    > $15M per year Ecosystem costs
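
The doubling claim can be turned into a simple projection. The sketch below assumes steady exponential growth with the 16-month doubling period from the slide; the function names and the starting footprint are hypothetical and are not figures from the deck.

```python
import math

DOUBLING_MONTHS = 16  # from the slide: requirements double every 16 months


def projected_size(current_tb: float, months_ahead: float) -> float:
    """Size after months_ahead months under steady exponential growth."""
    return current_tb * 2 ** (months_ahead / DOUBLING_MONTHS)


def months_until(current_tb: float, target_tb: float) -> float:
    """Months until current_tb grows to target_tb at the same doubling rate."""
    return DOUBLING_MONTHS * math.log2(target_tb / current_tb)


if __name__ == "__main__":
    start_tb = 1000.0  # hypothetical starting footprint, not a figure from the deck
    for months in (16, 32, 48):
        print(months, "months:", round(projected_size(start_tb, months)), "TB")
```

Under this model, any fixed capacity budget is exhausted a predictable number of months out, which is the planning pressure the slide describes.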

New Functionality Needed
[Chart: use cases plotted on a y-axis running from Seconds to Weeks and an x-axis running from Gigabytes to Petabytes. Use cases shown: Product Analysis, Service, Cross Sell & Up Sell, Performance Planning, Customer Intelligence, Sales, License Management, Proactive Support, Customer Self Service, Product Development.]
Solution Architecture
Hadoop Architecture
[Diagram labels:
•  Ingest: Flume, HDFS (logs, performance and raw config)
•  Lookup: ASUP config data, REST, tools
•  Analyze: MapReduce, Pig, Subscribe
•  Metrics, Analytics, EBI]
Data Ingestion
•  Use of Flume (v1) to consume large XML objects, up to 20 MB compressed each
•  4 agents feed 2 collectors in production
•  Basic Process Control using supervisord (ZK in R2?)
•  Reliability Mode: Disk Failover (Store on Failure)
•  Separate sinks for Text and Binary sections
•  Arrival time bucketing by minute
•  Snappy Sequence Files with JSON values
•  Evaluating Flume NG
•  Ingesting 4.5 TB uncompressed per week, 80% of it in an 8-hour window
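
Arrival-time bucketing by minute can be sketched as follows. This is a minimal illustration, not the Flume configuration used in production: the `YYYY/MM/DD/HH/MM` bucket layout and both function names are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def minute_bucket(arrival: datetime) -> str:
    """Return a directory-style bucket name truncated to the arrival minute (UTC)."""
    return arrival.astimezone(timezone.utc).strftime("%Y/%m/%d/%H/%M")


def group_by_minute(events):
    """Group (arrival_time, payload) pairs by their minute bucket."""
    buckets = defaultdict(list)
    for arrival, payload in events:
        buckets[minute_bucket(arrival)].append(payload)
    return dict(buckets)


if __name__ == "__main__":
    # Hypothetical arrivals; timestamps must be timezone-aware.
    sample = [
        (datetime(2011, 11, 8, 14, 5, 3, tzinfo=timezone.utc), "asup-a"),
        (datetime(2011, 11, 8, 14, 5, 59, tzinfo=timezone.utc), "asup-b"),
        (datetime(2011, 11, 8, 14, 6, 0, tzinfo=timezone.utc), "asup-c"),
    ]
    for bucket, payloads in sorted(group_by_minute(sample).items()):
        print(bucket, payloads)
```

Bucketing on arrival time rather than on a timestamp inside the payload means a bucket is complete once its minute has passed, so downstream jobs can pick it up without parsing the XML first.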
Data Transformation
•  Ingested data processed every 1 min. (with 5 min. lag)
  –  Relies on Fair Scheduler to meet SLA
  –  Oozie (R0) -> Pentaho PDI (R1) for scheduling
•  Configuration data written to HBase using Avro
•  Duplicate data written to HDFS as Hive / JSON for ad hoc queries
•  User scans of HBase for ad hoc queries avoided to meet SLA
•  Also simplifies data access
    –  query tools don’t yet have support for Avro serialization in HBase
    –  they all assume String keys and values (evolving to support Avro)
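
One way to read "every 1 min. with 5 min. lag" is that each run picks up minute buckets that started at least five minutes ago. The sketch below shows that selection logic only. It is not the Oozie or Pentaho job definition, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

LAG = timedelta(minutes=5)       # lag stated on the slide
INTERVAL = timedelta(minutes=1)  # processing interval stated on the slide


def bucket_to_process(now: datetime) -> datetime:
    """Start of the newest one-minute bucket whose start is at least LAG behind now."""
    lagged = now - LAG
    return lagged.replace(second=0, microsecond=0)


def pending_buckets(last_done: datetime, now: datetime):
    """Minute buckets after last_done, up to and including bucket_to_process(now)."""
    newest = bucket_to_process(now)
    out = []
    current = last_done + INTERVAL
    while current <= newest:
        out.append(current)
        current += INTERVAL
    return out


if __name__ == "__main__":
    now = datetime(2011, 11, 8, 14, 10, 30, tzinfo=timezone.utc)
    last = datetime(2011, 11, 8, 14, 2, tzinfo=timezone.utc)
    for bucket in pending_buckets(last, now):
        print(bucket.isoformat())
```

Tracking the last completed bucket lets a delayed run catch up on every bucket it missed instead of skipping ahead.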
Low Latency Application Data Access
•  High performance REST lookups
•  Data stored as Avro serialized objects for performance and versioning
•  Solr used to search for objects (one core per region)
•  Then details pulled from HBase
•  Large objects (logs) indexed and pulled from HDFS
•  ~100 HBase regions (500 GB each)
  –  no splitting
  –  Snappy compressed tables
•  Future: HBase coprocessors to keep Solr indexes up to date
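
The search-then-fetch pattern described above can be sketched with in-memory stand-ins. The two dictionaries below are hypothetical placeholders for the Solr index and the HBase table, and the query strings, row keys, and field names are invented for illustration. No real Solr or HBase client API is shown.

```python
from typing import Dict, List

# Hypothetical stand-in for the search index: query -> matching row keys.
SEARCH_INDEX: Dict[str, List[str]] = {
    "hostname:filer01": ["row-001", "row-007"],
    "hostname:filer02": ["row-002"],
}

# Hypothetical stand-in for the detail store: row key -> stored object.
DETAIL_STORE: Dict[str, dict] = {
    "row-001": {"system": "filer01", "received": "2011-11-08T14:05"},
    "row-002": {"system": "filer02", "received": "2011-11-08T14:06"},
    "row-007": {"system": "filer01", "received": "2011-11-08T15:10"},
}


def search(query: str) -> List[str]:
    """Step 1: the index answers the query with row keys only."""
    return SEARCH_INDEX.get(query, [])


def fetch_details(row_keys: List[str]) -> List[dict]:
    """Step 2: point lookups by key; keys missing from the store are skipped."""
    return [DETAIL_STORE[key] for key in row_keys if key in DETAIL_STORE]


def lookup(query: str) -> List[dict]:
    """Search first, then pull full objects by key."""
    return fetch_details(search(query))


if __name__ == "__main__":
    for record in lookup("hostname:filer01"):
        print(record)
```

Keeping the index small (keys only) and serving full objects through keyed reads avoids table scans on the lookup path.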
Export to Oracle DSS

•  Pentaho pulls data from HBase and HDFS
•  Pushes into Oracle star schema
•  Daily export
 –  530 million rows and 350 GB on peak days
•  Runs on 2 VMs
 –  64 GB RAM, 12 cores
•  Enables existing BI tools (OBIEE) to query DSS database
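
The peak-day figures imply a sustained rate the export has to hold. The sketch below works that out under stated assumptions: 350 GB is read as decimal gigabytes, the export window length is a parameter (the deck does not give one), and load is assumed to split evenly across the 2 VMs.

```python
PEAK_ROWS = 530_000_000    # rows on a peak day (from the slide)
PEAK_BYTES = 350 * 10**9   # 350 GB, read as decimal gigabytes (assumption)
NUM_VMS = 2                # from the slide


def required_rates(window_hours: float):
    """Return (rows/s, MB/s, rows/s per VM) needed to finish within the window."""
    seconds = window_hours * 3600
    rows_per_sec = PEAK_ROWS / seconds
    mb_per_sec = PEAK_BYTES / seconds / 10**6
    # Even split across VMs is an assumption, not a statement from the deck.
    return rows_per_sec, mb_per_sec, rows_per_sec / NUM_VMS


if __name__ == "__main__":
    print("average bytes per row:", round(PEAK_BYTES / PEAK_ROWS))
    for hours in (24, 8, 4):  # hypothetical export windows
        rows, mb, per_vm = required_rates(hours)
        print(f"{hours:>2} h window: {rows:,.0f} rows/s, {mb:.1f} MB/s, {per_vm:,.0f} rows/s per VM")
```

Shorter windows scale the required rate proportionally, so the window length is the main sizing variable for the export VMs.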
Disaster Recovery
•  DR cluster with 75% of production capacity
    –  in Release 2
•  Active/active from Flume back
    –  Primary cluster is the one HTTP/SMTP responder
•  SLA: cannot lose >1 hour of data
  –  can be lost in front-end switchover
•  HBase incremental backups
•  Staging used frequently for engineering test, operationally expensive so not used for DR
NetApp Open Solution for Hadoop (NOSH)
HDFS Storage: Key Needs
Attribute     Key Drivers                                 Requirement

Performance   •  Fast response time for                   •  Minimize Network bottlenecks
                 search, ad-hoc, and real-                •  Optimize server workload
                 time queries                             •  Leverage storage HW to
              •  High replication counts                     increase cluster performance
                 impact throughput

Opex          •  Lower operational costs for              •  Optimize usable storage
                 managing huge amounts of                    capacity
                 data                                     •  Decouple storage from
              •  Controlling staff costs and                 compute nodes to decrease
                 cluster management costs                    the need to add more
                 as clusters scale                           compute nodes

Enterprise    •  Protect SPOF at the                      •  Protect cluster metadata from
Robustness       Hadoop name node                            SPOF
              •  Minimize cluster rebuild                 •  Minimize risks where
                                                             equipment tends to fail

NetApp Open Solution for Hadoop
•  Easy to Deploy, Manage and Scale
•  Uses High Performance storage
    –  Resilient and Compact
    –  RAID Protection of Data
    –  Less Network Congestion
•  Raw Capacity and density
    –  120 TB or 180 TB in 4U
    –  Fully serviceable storage system
•  Reliability
    –  Hardware RAID & hot swap prevent job restarts caused by a node going off-line after a media failure
    –  Reliable metadata (Name Node)
Enterprise Class Hadoop
[Diagram labels: HDFS NameNode and Secondary NameNode with FAS2040 (NFS over 1GbE); MapReduce JobTracker; DataNodes / TaskTracker with E2660 (6Gb/s SAS Direct Connect, 1 per DataNode; 4 separate shared-nothing partitions per DataNode); 10GbE links (1 per node).]
Performance and Scaling
Linear Throughput Scaling as DataNode Count Increases
[Chart: Read/Write Throughput. Y-axis: Throughput, 0 to 6000; x-axis: DataNodes per Configuration Tested (4, 8, 12, 24). Series: Tot Read Throughput (MB/s), Tot Write Throughput (MB/s).]
Summary
Takeaways
•  Hadoop-based Big Data architecture enables
  –  Cost effective scaling
  –  Low latency access to data
  –  Ad hoc issues & pattern detection
  –  Predictive modeling in future
•  Using our own innovative Hadoop storage technology NOSH
•  An enterprise transformation

Kumar Palaniappan
@megamda


© 2011 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without
prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp,
the NetApp logo, and Go further, faster, are trademarks or registered trademarks of NetApp, Inc.
in the United States and/or other countries. All other brands or products are trademarks or
registered trademarks of their respective holders and should be treated as such.

Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Kürzlich hochgeladen (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

Architecting Business Critical Enterprise Apps-NetApp

  • 1. Architecting business critical enterprise application: Automated Support
      Kumar Palaniappan, Enterprise Architect, NetApp
  • 2. Agenda
      - NetApp’s Business Challenge
      - Solution Architecture
      - Best Practices
      - Performance Benchmarks
      - Questions
  • 3. The AutoSupport Family: The Foundation of NetApp Support Strategies
      - Catch issues before they become critical
      - Secure automated “call-home” service
      - System monitoring and nonintrusive alerting
      - RMA requests without customer action
      - Enables faster incident management
      “My AutoSupport Upgrade Advisor tool does all the hard work for me, saving me 4 to 5 hours of work per storage system and providing an upgrade plan that’s complete and easy to follow.”
  • 4. AutoSupport – Why Does it Matter?
      Audiences: Customers, Partners, NetApp
      - Product Planning & Development: Product Adoption & Usage; Install Base Mgmt; Data Mining
      - Pre Sales: Lead Generation; Stickiness Measurements; “What If” Scenarios & Capacity Planning
      - Deployment: Establish Initial Call Home; Measure Implementation Effectiveness; Storage Usage Monitoring & Billing (NAFS)
      - Technical Support (Automated E2E Case Handling): Event-Based Triggers & Alerts; Automated Case Creation; Automated Parts & Support Dispatch
      - Proactive Planning & Optimization:
          - SAM Services: 1) Proactive Health Checks 2) Upgrade Planning
          - Storage Efficiency Measurements & Recommendations
          - PS Consulting: 1) Perf Analysis & Opt. Recommendations 2) Storage Capacity Planning
      - Feedback: Critical to Quality Metrics; Product Adoption & Usage Metrics; Quality & Reliability Metrics
      NetApp Confidential – Limited Use
  • 5. Business Challenges
      - Gateways
          - 600K ASUPs every week
          - 40% coming over the weekend
          - 0.5% growth week over week
      - ETL
          - Data needs to be parsed and loaded in 15 mins
      - Data Warehouse
          - Only 5% of data goes into the data warehouse; the rest is unstructured and growing 6-8 TB per month
          - Oracle DBMS struggling to scale; maintenance and backups challenging
          - No easy way to access this unstructured content
      - Reporting
          - Numerous mining requests are not satisfied currently
          - Huge untapped potential of valuable information for lead generation, supportability, and BI
      Finally, the incoming load doubles every 16 months!
  • 6. Incoming AutoSupport Volumes and TB Consumption
      [Chart: TB consumption from Jan-00 to Jan-17 on a 0 to 6,000 scale; legend: Actual (TB), Projected, Double, High Count & Size, Low Count & Size]
      - At the projected current rate of growth, total storage requirements continue doubling every 16 months
      - Cost model: > $15M per year in ecosystem costs
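The doubling claim on this slide can be turned into a small projection. The following is a minimal sketch, assuming simple exponential growth with a 16-month doubling period; the function names and the starting figure are hypothetical and are not taken from the deck.

```python
def projected_tb(start_tb: float, months_elapsed: float,
                 doubling_months: float = 16.0) -> float:
    """Storage after `months_elapsed`, doubling every `doubling_months`."""
    return start_tb * 2 ** (months_elapsed / doubling_months)


def annual_growth_factor(doubling_months: float = 16.0) -> float:
    """Year-over-year multiplier implied by the doubling period."""
    return 2 ** (12.0 / doubling_months)


if __name__ == "__main__":
    start = 1000.0  # hypothetical starting footprint in TB, for illustration only
    for years in range(5):
        print(f"year {years}: {projected_tb(start, 12 * years):,.0f} TB")
    print(f"implied annual growth: {annual_growth_factor():.2f}x")
```

Under this assumption, a 16-month doubling period corresponds to roughly 1.68x growth per year.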
  • 7. New Functionality Needed
      [Diagram: functions placed by response time (weeks to seconds) and data volume (gigabytes to petabytes). Labels: Product Analysis, Service, Cross Sell & Up Sell, Performance Planning, Customer Intelligence, Sales, License Management, Proactive Support, Customer Self Service, Product Development]
  • 9. Hadoop Architecture
      [Diagram: ASUP data is ingested through Flume into HDFS (logs, performance data, and raw config) and a config store; tools look up and subscribe to data through REST; MapReduce and Pig analyze the data to produce metrics, analytics, and EBI]
  • 11. Data Ingestion
      - Use of Flume (v1) to consume large XML objects, up to 20 MB compressed each
      - 4 agents feed 2 collectors in production
      - Basic process control using supervisord (ZK in R2?)
      - Reliability mode: Disk Failover (Store on Failure)
      - Separate sinks for text and binary sections
      - Arrival time bucketing by minute
      - Snappy sequence files with JSON values
      - Evaluating Flume NG
      - Ingesting 4.5 TB uncompressed/week, 80% in an 8 hour window
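Arrival-time bucketing by minute can be illustrated with a short sketch. This is not the deck's Flume configuration; the `/asup/incoming` base path, the directory layout, and both function names are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def minute_bucket(arrival: datetime, base: str = "/asup/incoming") -> str:
    """Map an arrival timestamp to a per-minute directory (hypothetical layout)."""
    utc = arrival.astimezone(timezone.utc)
    return f"{base}/{utc:%Y/%m/%d/%H/%M}"


def group_by_bucket(arrivals):
    """Group (message_id, arrival_time) pairs by their minute bucket."""
    buckets = defaultdict(list)
    for message_id, arrival in arrivals:
        buckets[minute_bucket(arrival)].append(message_id)
    return dict(buckets)


if __name__ == "__main__":
    t = datetime(2011, 6, 1, 13, 45, 59, tzinfo=timezone.utc)
    print(minute_bucket(t))
```

Bucketing on arrival time rather than on a timestamp inside the payload means a downstream job can treat a bucket as closed once its minute has passed.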
  • 12. Data Transformation
      - Ingested data processed every 1 min. (with 5 min. lag)
          - Relies on Fair Scheduler to meet SLA
          - Oozie (R0) -> Pentaho PDI (R1) for scheduling
      - Configuration data written to HBase using Avro
      - Duplicate data written to HDFS as Hive / JSON for ad hoc queries
      - User scans of HBase for ad hoc queries avoided to meet SLA
      - Also simplifies data access
          - query tools don’t yet have support for Avro serialization in HBase
          - they all assume String keys and values (evolving to support Avro)
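One way to read "every 1 min. with 5 min. lag" is that each scheduler tick processes the minute of data that closed five minutes earlier. The sketch below encodes that interpretation only; the function name and the exact lag semantics are assumptions, not details confirmed by the slides.

```python
from datetime import datetime, timedelta, timezone


def bucket_to_process(now: datetime, lag_minutes: int = 5) -> datetime:
    """Start of the minute that is due for processing at `now` (assumed semantics)."""
    current_minute = now.replace(second=0, microsecond=0)
    return current_minute - timedelta(minutes=lag_minutes)


if __name__ == "__main__":
    tick = datetime(2011, 6, 1, 13, 45, 30, tzinfo=timezone.utc)
    print(bucket_to_process(tick).isoformat())
```

A fixed lag like this gives late-arriving files time to land before the job that reads their minute starts.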
  • 13. Low Latency Application Data Access
      - High performance REST lookups
      - Data stored as Avro serialized objects for performance and versioning
      - Solr used to search for objects (one core per region)
      - Then details pulled from HBase
      - Large objects (logs) indexed and pulled from HDFS
      - ~100 HBase regions (500 GB each)
          - no splitting
          - Snappy compressed tables
      - Future: HBase coprocessors to keep Solr indexes up to date
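The search-then-fetch pattern on this slide (find matching objects in a search index, then pull details from a key-value store) can be sketched with in-memory stand-ins. Nothing below uses the Solr or HBase client APIs; `InMemoryIndex`, `lookup`, the row keys, and the sample records are all hypothetical.

```python
from typing import Dict, List, Set


class InMemoryIndex:
    """Toy inverted index standing in for the search tier (not Solr)."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = {}

    def add(self, row_key: str, text: str) -> None:
        for term in text.lower().split():
            self._postings.setdefault(term, set()).add(row_key)

    def search(self, query: str) -> List[str]:
        """Return row keys whose indexed text contains every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)


def lookup(index: InMemoryIndex, store: Dict[str, dict], query: str) -> List[dict]:
    """Step 1: search for row keys. Step 2: fetch full records by key."""
    return [store[key] for key in index.search(query) if key in store]


if __name__ == "__main__":
    index, store = InMemoryIndex(), {}  # the dict stands in for the record store
    store["sys-001"] = {"host": "filer-a", "model": "FAS2040"}
    index.add("sys-001", "filer-a FAS2040")
    print(lookup(index, store, "fas2040"))
```

Keeping only row keys in the index and the full records in the store lets each tier be sized for its own job: the index for query flexibility, the store for fast point reads.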
  • 14. Export to Oracle DSS
      - Pentaho pulls data from HBase and HDFS
      - Pushes into Oracle star schema
      - Daily export
          - 530 million rows and 350 GB on peak days
      - Runs on 2 VMs
          - 64 GB RAM, 12 cores
      - Enables existing BI tools (OBIE) to query DSS database
  • 15. Disaster Recovery
      - DR cluster with 75% of production capacity
          - in Release 2
      - Active/active from Flume back
          - Primary cluster the one HTTP/SMTP responder
      - SLA: cannot lose >1 hour of data
          - can be lost in front-end switchover
      - HBase incremental backups
      - Staging used frequently for engineering test; operationally expensive, so not used for DR
  • 16. NetApp Open Solution for Hadoop (NOSH)
  • 17. HDFS Storage: Key Needs
      - Performance
          - Key drivers: fast response time for search, ad-hoc, and real-time queries; high replication counts impact throughput
          - Requirements: minimize network bottlenecks; optimize server workload; leverage storage HW to increase cluster performance
      - Opex
          - Key drivers: lower operational costs for managing huge amounts of data; controlling staff costs and cluster management costs as clusters scale
          - Requirements: optimize usable storage capacity; decouple storage from compute nodes to decrease the need to add more compute nodes
      - Enterprise Robustness
          - Key drivers: protect SPOF at the Hadoop name node; minimize cluster rebuild
          - Requirements: protect cluster metadata from SPOF; minimize risks where equipment tends to fail
  • 18. NetApp Open Solution for Hadoop: Enterprise Class Hadoop
      - Easy to deploy, manage and scale
      - Uses high performance storage
          - Resilient and compact
          - RAID protection of data
          - Less network congestion
      - Raw capacity and density
          - 120 TB or 180 TB in 4U
          - Fully serviceable storage system
      - Reliability
          - Hardware RAID & hot swap prevent job restart due to a node going off-line in case of media failure
          - Reliable metadata (Name Node)
      [Diagram: NameNode and Secondary NameNode on FAS2040 (NFS over 1GbE); JobTracker; MapReduce DataNodes / TaskTrackers on E2660 with 4 separate shared-nothing partitions per DataNode; 6Gb/s SAS Direct Connect (1 per DataNode); 10GbE links (1 per node)]
  • 20. Linear Throughput Scaling as DataNode Count Increases
      [Chart: Read/Write Throughput; total read and total write throughput (MB/s, 0 to 6,000 scale) against DataNodes per configuration tested: 4, 8, 12, 24]
  • 21. Summary
  • 22. Takeaways
      - Hadoop-based Big Data architecture enables
          - Cost effective scaling
          - Low latency access to data
          - Ad hoc issues & pattern detection
          - Predictive modeling in future
      - Using our own innovative Hadoop storage technology, NOSH
      - An enterprise transformation
  • 23. Kumar Palaniappan, @megamda
      © 2011 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, and Go further, faster, are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.