2. Agenda
• NetApp’s Business Challenge
• Solution Architecture
• Best Practices
• Performance Benchmarks
• Questions
3. The AutoSupport Family
The Foundation of NetApp Support Strategies
• Catch issues before they become critical
• Secure, automated “call-home” service
• System monitoring and nonintrusive alerting
• RMA requests without customer action
• Enables faster incident management
“My AutoSupport Upgrade Advisor tool does all the hard work for me, saving me 4 to 5 hours of work per storage system and providing an upgrade plan that’s complete and easy to follow.”
4. AutoSupport – Why Does it Matter?
[Diagram: the value of AutoSupport data to Customers, Partners, and NetApp across the product lifecycle]
Pre-Sales: Product Adoption & Usage; Product Planning & Development; Install Base Mgmt; Data Mining; Lead Generation; Stickiness Measurements; “What If” Scenarios & Capacity Planning
Deployment: Establish Initial Call Home; Measure Implementation Effectiveness; Storage Usage Monitoring & Billing (NAFS)
Technical Support: Event-Based Triggers & Alerts; Automated Case Creation; Automated E2E Case Handling; Automated Parts & Support Dispatch
Proactive Planning & Optimization: SAM Services (1. Proactive Health Checks, 2. Upgrade Planning); Storage Efficiency Measurements & Recommendations; PS Consulting (1. Perf Analysis & Opt. Recommendations, 2. Storage Capacity Planning)
Feedback: Critical-to-Quality Metrics; Product Adoption & Usage Metrics; Quality & Reliability Metrics
5. Business Challenges
Gateways
• 600K ASUPs every week
• 40% arriving over the weekend
• 0.5% growth week over week
ETL
• Data must be parsed and loaded within 15 minutes
Data Warehouse
• Only 5% of the data goes into the data warehouse; the rest is unstructured and growing 6–8 TB per month
• The Oracle DBMS is struggling to scale; maintenance and backups are challenging
Reporting
• Numerous mining requests currently go unsatisfied
• Huge untapped potential of valuable information for lead generation, supportability, and BI
• No easy way to access this unstructured content
Finally, the incoming load doubles every 16 months!
6. Incoming AutoSupport Volumes and TB Consumption
[Chart: actual vs. projected AutoSupport storage consumption in TB, Jan-00 through Jan-17, approaching 6,000 TB, with “double high count & size” and “low count & size” projection scenarios]
• At the current projected rate of growth, total storage requirements continue doubling every 16 months (see the arithmetic sketch below)
• Cost model: > $15M per year in ecosystem costs
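A quick sketch of the arithmetic behind the doubling claim, with S_0 as today’s footprint; tying ecosystem cost to capacity is our extrapolation, not stated on the slide:

```latex
% Storage footprint after t months, given a 16-month doubling period:
S(t) = S_0 \cdot 2^{t/16}
% e.g. after four years: S(48) = 2^{48/16} \, S_0 = 8 \, S_0
% If ecosystem cost scales with capacity, the >$15M/year figure compounds similarly.
```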
7. New Functionality Needed
[Diagram: required capabilities plotted by response time (weeks down to seconds) against data volume (gigabytes up to petabytes): Product Analysis, Service Performance Planning, Cross Sell & Up Sell, Customer Intelligence, Sales, License Management, Proactive Support, Customer Self Service, Product Development]
9. Hadoop Architecture
[Architecture diagram: ASUP logs, config, and performance data (including raw config) are ingested via Flume into HDFS; REST services provide lookup for tools and data consumers; MapReduce and Pig jobs analyze the stored data; downstream metrics, analytics, and EBI systems subscribe to the results]
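To make the Analyze stage concrete, here is a skeleton of a MapReduce job in the shape this pipeline implies: it reads ingested records (Text ASUP id, Text JSON payload) and counts ASUPs per system model. The "model" JSON field and the string-scrape parsing are illustrative assumptions, not the actual NetApp job:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/** Skeleton of an analysis job over the ingested SequenceFiles
 *  (Text ASUP id -> Text JSON payload). Driver setup omitted. */
public class AsupModelCount {

    public static class ModelMapper extends Mapper<Text, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(Text asupId, Text json, Context ctx)
                throws IOException, InterruptedException {
            // Toy extraction of a hypothetical "model" field from the JSON value;
            // a real job would use a proper JSON parser.
            String s = json.toString();
            int i = s.indexOf("\"model\":\"");
            if (i >= 0) {
                int start = i + 9;
                ctx.write(new Text(s.substring(start, s.indexOf('"', start))), ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text model, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            ctx.write(model, new IntWritable(sum));
        }
    }
}
```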
11. Data Ingestion
• Uses Flume (v1) to consume large XML objects, up to 20 MB compressed each
• 4 agents feed 2 collectors in production
• Basic process control using supervisord (ZK in R2?)
• Reliability mode: disk failover (store on failure)
• Separate sinks for text and binary sections
• Arrival-time bucketing by minute
• Snappy SequenceFiles with JSON values (see the sketch after this list)
• Evaluating Flume NG
• Ingesting 4.5 TB uncompressed per week, 80% of it in an 8-hour window
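A minimal sketch of what such a bucketed sink does, using the standard Hadoop SequenceFile API. The HDFS path layout and the flush-per-minute batching are illustrative assumptions, not the production Flume plugin:

```java
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.SnappyCodec;

public class MinuteBucketedSink {
    /** Writes one arrival-minute's events (ASUP id -> JSON payload) into a
     *  Snappy block-compressed SequenceFile under a per-minute HDFS bucket. */
    public static void flushMinute(Configuration conf, Map<String, String> events)
            throws IOException {
        // Bucket by arrival time, truncated to the minute (assumed layout).
        String minute = new SimpleDateFormat("yyyyMMdd/HHmm").format(new Date());
        Path bucket = new Path("/asup/ingest/" + minute + "/events.seq");

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(bucket),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(Text.class),
                SequenceFile.Writer.compression(
                        SequenceFile.CompressionType.BLOCK, new SnappyCodec()))) {
            for (Map.Entry<String, String> e : events.entrySet()) {
                writer.append(new Text(e.getKey()), new Text(e.getValue()));
            }
        }
    }
}
```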
12. Data Transformation
• Ingested data processed every 1 min. (with a 5 min. lag)
– Relies on the Fair Scheduler to meet the SLA
– Oozie (R0) -> Pentaho PDI (R1) for scheduling
• Configuration data written to HBase using Avro (see the sketch below)
• Duplicate data written to HDFS as Hive / JSON for ad hoc queries
• User scans of HBase for ad hoc queries are avoided to meet the SLA
• This also simplifies data access:
– Query tools don’t yet support Avro serialization in HBase
– They all assume String keys and values (evolving to support Avro)
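A minimal sketch of the Avro-to-HBase write path described above, shown against the modern HBase client API; the table handle, column family, and row-key scheme are assumptions:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ConfigWriter {
    /** Avro-serializes one configuration record and writes it to HBase. */
    public static void put(Table table, Schema schema, GenericRecord record,
                           String rowKey) throws IOException {
        // Serialize the record with Avro's binary encoding.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();

        // Store the serialized bytes in a single cell (assumed family/qualifier).
        Put p = new Put(Bytes.toBytes(rowKey));
        p.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("config"), out.toByteArray());
        table.put(p);
    }
}
```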
13. Low Latency Application Data Access
• High-performance REST lookups
• Data stored as Avro-serialized objects for performance and versioning
• Solr used to search for objects (one core per region); details are then pulled from HBase (see the sketch below)
• Large objects (logs) indexed and pulled from HDFS
• ~100 HBase regions (500 GB each)
– No splitting
– Snappy-compressed tables
• Future: HBase coprocessors to keep Solr indexes up to date
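The two-step lookup (Solr for the row key, HBase for the Avro-serialized object) might look like this SolrJ sketch; the index fields, table layout, and query shape are assumptions:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrDocument;

public class ObjectLookup {
    /** Finds an object via Solr, then fetches its Avro bytes from HBase. */
    public static byte[] fetch(SolrClient solr, Table table, String systemId)
            throws SolrServerException, IOException {
        // 1. Ask Solr (one core per region) for the matching row key.
        SolrQuery q = new SolrQuery("system_id:" + systemId);
        q.setRows(1);
        for (SolrDocument doc : solr.query(q).getResults()) {
            String rowKey = (String) doc.getFieldValue("row_key");
            // 2. Pull the Avro-serialized details from HBase.
            Result r = table.get(new Get(Bytes.toBytes(rowKey)));
            return r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("object"));
        }
        return null; // no match in the index
    }
}
```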
14. Export to Oracle DSS
• Pentaho pulls data from HBase and HDFS and pushes it into an Oracle star schema (see the sketch below)
• Daily export: 530 million rows and 350 GB on peak days
• Runs on 2 VMs (64 GB RAM, 12 cores)
• Enables existing BI tools (OBIEE) to query the DSS database
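Pentaho performs the export itself; the following only sketches the pattern it automates, batched JDBC inserts into a star-schema fact table. The connection string, credentials, and asup_fact schema are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class DssExport {
    /** Batched inserts into a hypothetical Oracle fact table. */
    public static void export(List<String[]> rows) throws SQLException {
        String url = "jdbc:oracle:thin:@dss-host:1521/DSS"; // assumed DSN
        try (Connection conn = DriverManager.getConnection(url, "etl", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO asup_fact (system_id, metric, value) VALUES (?, ?, ?)")) {
            conn.setAutoCommit(false);
            int n = 0;
            for (String[] r : rows) {
                ps.setString(1, r[0]);
                ps.setString(2, r[1]);
                ps.setString(3, r[2]);
                ps.addBatch();
                if (++n % 10_000 == 0) ps.executeBatch(); // flush in chunks
            }
            ps.executeBatch(); // flush the remainder
            conn.commit();
        }
    }
}
```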
15. Disaster Recovery
• DR cluster with 75% of production capacity (in Release 2)
• Active/active from Flume back; the primary cluster is the sole HTTP/SMTP responder
• SLA: cannot lose more than 1 hour of data; that much can be lost in a front-end switchover
• HBase incremental backups
• Staging cluster is used frequently for engineering test and is operationally expensive, so it is not used for DR
17. HDFS Storage: Key Needs
Performance
• Key drivers: fast response time for search, ad hoc, and real-time queries; high replication counts impact throughput
• Requirements: minimize network bottlenecks; optimize server workload; leverage storage HW to increase cluster performance
Opex
• Key drivers: lower operational costs for managing huge amounts of data; control staff costs and cluster-management costs as clusters scale
• Requirements: optimize usable storage capacity; decouple storage from compute nodes to decrease the need to add more compute nodes
Enterprise Robustness
• Key drivers: protect against the SPOF at the Hadoop NameNode; minimize cluster rebuilds
• Requirements: protect cluster metadata from a SPOF; minimize risks where equipment tends to fail
18. NetApp Open Solution for Hadoop
[Diagram: NameNode and Secondary NameNode served by a FAS2040 over NFS on 1GbE; DataNode/TaskTracker nodes and the JobTracker on a 10GbE network (one 10GbE link per node); each DataNode direct-connected via 6 Gb/s SAS to an E2660 array exposing 4 separate shared-nothing partitions per DataNode. “Enterprise Class Hadoop”]
• Easy to deploy, manage, and scale
• Uses high-performance storage
– Resilient and compact
– RAID protection of data
– Less network congestion
• Raw capacity and density
– 120 TB or 180 TB in 4U
– Fully serviceable storage system
• Reliability
– Hardware RAID and hot swap prevent job restarts due to a node going offline on media failure
– Reliable metadata (NameNode)
22. Takeaways
• A Hadoop-based Big Data architecture enables
– Cost-effective scaling
– Low-latency access to data
– Ad hoc issue & pattern detection
– Predictive modeling in the future
• Using our own innovative Hadoop storage technology, NOSH
• An enterprise transformation