2. Agenda
• NetApp’s Business Challenge
• Solution Architecture
• Best Practices
• Performance Benchmarks
• Questions
3. The AutoSupport Family
The Foundation of NetApp Support Strategies
• Catch issues before they become critical
• Secure, automated “call-home” service
• System monitoring and nonintrusive alerting
• RMA requests without customer action
• Enables faster incident management
“My AutoSupport Upgrade Advisor tool does all the hard work for me, saving me 4 to 5 hours of work per storage system and providing an upgrade plan that’s complete and easy to follow.”
4. AutoSupport – Why Does it Matter?
[Diagram: the value of AutoSupport data to Customers, Partners, and NetApp across the product lifecycle]
Pre-Sales: Product Adoption & Usage; Product Planning & Development; Install Base Mgmt; Data Mining; Lead Generation; Stickiness Measurements; “What If” Scenarios & Capacity Planning
Deployment: Establish Initial Call Home; Measure Implementation Effectiveness; Storage Usage Monitoring & Billing (NAFS)
Technical Support: Event-Based Triggers & Alerts; Automated Case Creation; Automated E2E Case Handling; Automated Parts & Support Dispatch
Proactive Planning & Optimization: SAM Services (1. Proactive Health Checks, 2. Upgrade Planning); Storage Efficiency Measurements & Recommendations; PS Consulting (1. Perf Analysis & Opt. Recommendations, 2. Storage Capacity Planning)
Feedback: Critical-to-Quality Metrics; Product Adoption & Usage Metrics; Quality & Reliability Metrics
5. Business Challenges
Gateways
• 600K ASUPs every week
• 40% arriving over the weekend
• 0.5% growth week over week
ETL
• Data must be parsed and loaded within 15 minutes
Data Warehouse
• Only 5% of the data goes into the data warehouse; the rest is unstructured and growing 6–8 TB per month
• The Oracle DBMS is struggling to scale; maintenance and backups are challenging
Reporting
• Numerous mining requests currently go unsatisfied
• Huge untapped potential of valuable information for lead generation, supportability, and BI
• No easy way to access this unstructured content
Finally, the incoming load doubles every 16 months!
6. Incoming AutoSupport Volumes and TB Consumption
[Chart: actual vs. projected AutoSupport storage consumption in TB, Jan-00 through Jan-17, approaching 6,000 TB, with “double high count & size” and “low count & size” projection scenarios]
• At the current projected rate of growth, total storage requirements continue doubling every 16 months (see the arithmetic sketch below)
• Cost model: > $15M per year in ecosystem costs
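A quick sketch of the arithmetic behind the doubling claim, with S_0 as today’s footprint; tying ecosystem cost to capacity is our extrapolation, not stated on the slide:

```latex
% Storage footprint after t months, given a 16-month doubling period:
S(t) = S_0 \cdot 2^{t/16}
% e.g. after four years: S(48) = 2^{48/16} \, S_0 = 8 \, S_0
% If ecosystem cost scales with capacity, the >$15M/year figure compounds similarly.
```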
7. New Functionality Needed
[Diagram: required capabilities plotted by response time (weeks down to seconds) against data volume (gigabytes up to petabytes): Product Analysis, Service Performance Planning, Cross Sell & Up Sell, Customer Intelligence, Sales, License Management, Proactive Support, Customer Self Service, Product Development]
9. Hadoop Architecture
[Architecture diagram: ASUP logs, config, and performance data (including raw config) are ingested via Flume into HDFS; REST services provide lookup for tools and data consumers; MapReduce and Pig jobs analyze the stored data; downstream metrics, analytics, and EBI systems subscribe to the results]
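To make the Analyze stage concrete, here is a skeleton of a MapReduce job in the shape this pipeline implies: it reads ingested records (Text ASUP id, Text JSON payload) and counts ASUPs per system model. The "model" JSON field and the string-scrape parsing are illustrative assumptions, not the actual NetApp job:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/** Skeleton of an analysis job over the ingested SequenceFiles
 *  (Text ASUP id -> Text JSON payload). Driver setup omitted. */
public class AsupModelCount {

    public static class ModelMapper extends Mapper<Text, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(Text asupId, Text json, Context ctx)
                throws IOException, InterruptedException {
            // Toy extraction of a hypothetical "model" field from the JSON value;
            // a real job would use a proper JSON parser.
            String s = json.toString();
            int i = s.indexOf("\"model\":\"");
            if (i >= 0) {
                int start = i + 9;
                ctx.write(new Text(s.substring(start, s.indexOf('"', start))), ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text model, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            ctx.write(model, new IntWritable(sum));
        }
    }
}
```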
11. Data Ingestion
• Uses Flume (v1) to consume large XML objects, up to 20 MB compressed each
• 4 agents feed 2 collectors in production
• Basic process control using supervisord (ZK in R2?)
• Reliability mode: disk failover (store on failure)
• Separate sinks for text and binary sections
• Arrival-time bucketing by minute
• Snappy SequenceFiles with JSON values (see the sketch after this list)
• Evaluating Flume NG
• Ingesting 4.5 TB uncompressed per week, 80% of it in an 8-hour window
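A minimal sketch of what such a bucketed sink does, using the standard Hadoop SequenceFile API. The HDFS path layout and the flush-per-minute batching are illustrative assumptions, not the production Flume plugin:

```java
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.SnappyCodec;

public class MinuteBucketedSink {
    /** Writes one arrival-minute's events (ASUP id -> JSON payload) into a
     *  Snappy block-compressed SequenceFile under a per-minute HDFS bucket. */
    public static void flushMinute(Configuration conf, Map<String, String> events)
            throws IOException {
        // Bucket by arrival time, truncated to the minute (assumed layout).
        String minute = new SimpleDateFormat("yyyyMMdd/HHmm").format(new Date());
        Path bucket = new Path("/asup/ingest/" + minute + "/events.seq");

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(bucket),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(Text.class),
                SequenceFile.Writer.compression(
                        SequenceFile.CompressionType.BLOCK, new SnappyCodec()))) {
            for (Map.Entry<String, String> e : events.entrySet()) {
                writer.append(new Text(e.getKey()), new Text(e.getValue()));
            }
        }
    }
}
```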
12. Data Transformation
• Ingested data processed every 1 min. (with a 5 min. lag)
– Relies on the Fair Scheduler to meet the SLA
– Oozie (R0) -> Pentaho PDI (R1) for scheduling
• Configuration data written to HBase using Avro (see the sketch below)
• Duplicate data written to HDFS as Hive / JSON for ad hoc queries
• User scans of HBase for ad hoc queries are avoided to meet the SLA
• This also simplifies data access:
– Query tools don’t yet support Avro serialization in HBase
– They all assume String keys and values (evolving to support Avro)
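A minimal sketch of the Avro-to-HBase write path described above, shown against the modern HBase client API; the table handle, column family, and row-key scheme are assumptions:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ConfigWriter {
    /** Avro-serializes one configuration record and writes it to HBase. */
    public static void put(Table table, Schema schema, GenericRecord record,
                           String rowKey) throws IOException {
        // Serialize the record with Avro's binary encoding.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();

        // Store the serialized bytes in a single cell (assumed family/qualifier).
        Put p = new Put(Bytes.toBytes(rowKey));
        p.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("config"), out.toByteArray());
        table.put(p);
    }
}
```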
13. Low Latency Application Data Access
• High-performance REST lookups
• Data stored as Avro-serialized objects for performance and versioning
• Solr used to search for objects (one core per region); details are then pulled from HBase (see the sketch below)
• Large objects (logs) indexed and pulled from HDFS
• ~100 HBase regions (500 GB each)
– No splitting
– Snappy-compressed tables
• Future: HBase coprocessors to keep Solr indexes up to date
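The two-step lookup (Solr for the row key, HBase for the Avro-serialized object) might look like this SolrJ sketch; the index fields, table layout, and query shape are assumptions:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrDocument;

public class ObjectLookup {
    /** Finds an object via Solr, then fetches its Avro bytes from HBase. */
    public static byte[] fetch(SolrClient solr, Table table, String systemId)
            throws SolrServerException, IOException {
        // 1. Ask Solr (one core per region) for the matching row key.
        SolrQuery q = new SolrQuery("system_id:" + systemId);
        q.setRows(1);
        for (SolrDocument doc : solr.query(q).getResults()) {
            String rowKey = (String) doc.getFieldValue("row_key");
            // 2. Pull the Avro-serialized details from HBase.
            Result r = table.get(new Get(Bytes.toBytes(rowKey)));
            return r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("object"));
        }
        return null; // no match in the index
    }
}
```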
14. Export to Oracle DSS
• Pentaho pulls data from HBase and HDFS and pushes it into an Oracle star schema (see the sketch below)
• Daily export: 530 million rows and 350 GB on peak days
• Runs on 2 VMs (64 GB RAM, 12 cores)
• Enables existing BI tools (OBIEE) to query the DSS database
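Pentaho performs the export itself; the following only sketches the pattern it automates, batched JDBC inserts into a star-schema fact table. The connection string, credentials, and asup_fact schema are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class DssExport {
    /** Batched inserts into a hypothetical Oracle fact table. */
    public static void export(List<String[]> rows) throws SQLException {
        String url = "jdbc:oracle:thin:@dss-host:1521/DSS"; // assumed DSN
        try (Connection conn = DriverManager.getConnection(url, "etl", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO asup_fact (system_id, metric, value) VALUES (?, ?, ?)")) {
            conn.setAutoCommit(false);
            int n = 0;
            for (String[] r : rows) {
                ps.setString(1, r[0]);
                ps.setString(2, r[1]);
                ps.setString(3, r[2]);
                ps.addBatch();
                if (++n % 10_000 == 0) ps.executeBatch(); // flush in chunks
            }
            ps.executeBatch(); // flush the remainder
            conn.commit();
        }
    }
}
```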
15. Disaster Recovery
• DR cluster with 75% of production capacity (in Release 2)
• Active/active from Flume back; the primary cluster is the sole HTTP/SMTP responder
• SLA: cannot lose more than 1 hour of data; that much can be lost in a front-end switchover
• HBase incremental backups
• Staging cluster is used frequently for engineering test and is operationally expensive, so it is not used for DR
17. HDFS Storage: Key Needs
Performance
• Key drivers: fast response time for search, ad hoc, and real-time queries; high replication counts impact throughput
• Requirements: minimize network bottlenecks; optimize server workload; leverage storage HW to increase cluster performance
Opex
• Key drivers: lower operational costs for managing huge amounts of data; control staff costs and cluster-management costs as clusters scale
• Requirements: optimize usable storage capacity; decouple storage from compute nodes to decrease the need to add more compute nodes
Enterprise Robustness
• Key drivers: protect against the SPOF at the Hadoop NameNode; minimize cluster rebuilds
• Requirements: protect cluster metadata from a SPOF; minimize risks where equipment tends to fail
18. NetApp Open Solution for Hadoop
[Diagram: NameNode and Secondary NameNode served by a FAS2040 over NFS on 1GbE; DataNode/TaskTracker nodes and the JobTracker on a 10GbE network (one 10GbE link per node); each DataNode direct-connected via 6 Gb/s SAS to an E2660 array exposing 4 separate shared-nothing partitions per DataNode. “Enterprise Class Hadoop”]
• Easy to deploy, manage, and scale
• Uses high-performance storage
– Resilient and compact
– RAID protection of data
– Less network congestion
• Raw capacity and density
– 120 TB or 180 TB in 4U
– Fully serviceable storage system
• Reliability
– Hardware RAID and hot swap prevent job restarts due to a node going offline on media failure
– Reliable metadata (NameNode)
22. Takeaways
• A Hadoop-based Big Data architecture enables
– Cost-effective scaling
– Low-latency access to data
– Ad hoc issue & pattern detection
– Predictive modeling in the future
• Using our own innovative Hadoop storage technology, NOSH
• An enterprise transformation