HP Converged Systems and Hortonworks - Webinar Slides
1. Delivering Apache Hadoop for the Modern Data Architecture
Page 1 © Hortonworks Inc. 2014
HP & Hortonworks. We do Hadoop Together
2. Your speakers…
Raghu Thiagarajan, Director, Partner Product Management, Hortonworks
Chris Daly, Chief Outbound Engineer, CSS and Big Data Systems, HP
3. Why Hadoop: Traditional Data Architecture Pressured
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020
Data source: IDC
SOURCES: OLTP, ERP, CRM; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
4. What: Business Applications of Hadoop
Data types (table columns): Sensor, Server Logs, Text, Social, Geographic, Machine, Clickstream, Structured, Unstructured
Financial Services
New Account Risk Screens ✔ ✔
Trading Risk ✔
Insurance Underwriting ✔ ✔ ✔
Telecom
Call Detail Records (CDR) ✔ ✔
Infrastructure Investment ✔ ✔
Real-time Bandwidth Allocation ✔ ✔ ✔
Retail
360° View of the Customer ✔ ✔
Localized, Personalized Promotions ✔
Website Optimization ✔
5. What: Business Applications of Hadoop
Data types (table columns): Sensor, Server Logs, Text, Social, Geographic, Machine, Clickstream, Structured, Unstructured
Manufacturing
Supply Chain and Logistics ✔
Preventive Maintenance ✔
Crowd-sourced Quality Assurance ✔
Healthcare
Use Genomic Data in Medical Trials ✔ ✔
Monitor Patient Vitals in Real-Time
Pharmaceuticals
Recruit & Retain Patients for Drug Trials ✔ ✔
Improve Prescription Adherence ✔ ✔ ✔
Oil & Gas
Unify Exploration & Production Data ✔ ✔ ✔
Monitor Rig Safety in Real-Time ✔ ✔
Government
ETL Offload in Response to Budgetary Pressures ✔
Sentiment Analysis for Gov’t Programs ✔
6. How: Modern Data Architecture with Hadoop
[Diagram: Modern Data Architecture with Hadoop]
SOURCES: OLTP, ERP, CRM; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
DATA SYSTEMS: Repositories (RDBMS, EDW, MPP); ENTERPRISE HADOOP (Governance & Integration, Security, Operations, Data Access, Data Management)
APPLICATIONS: Statistical Analysis; BI / Reporting, Ad Hoc Analysis; Interactive Web & Mobile Apps; Enterprise Applications
DEV & DATA TOOLS: Build & Test
OPERATIONS TOOLS: Provision, Manage & Monitor
7. YARN Transforms Hadoop’s Architecture
YARN: Data Operating System
Multi-Use Data Platform
Store all data in one place, process in many ways: Batch, Interactive, Iterative, Streaming
Enables deep insight across a large, broad, diverse set of data at efficient scale
Store any/all raw data sources and processed data over extended periods of time.
[Diagram: cluster nodes 1 ... n]
8. Designing Hadoop Cluster
§ Cluster Storage Capacity
§ Server Specification
§ Cluster Size
§ Factoring Performance
Key Considerations
§ Any piece of hardware can and will fail
§ More nodes means less impact on failure
§ Resiliency and fault tolerance improve with scale
§ Build resiliency through scale
§ Still use modern hardware
§ Software handles hardware failures
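The point that more nodes means less impact on failure can be illustrated with a minimal Python sketch. It assumes identical worker nodes, so each one holds an equal share of cluster capacity; the function name and the cluster sizes are hypothetical and chosen only for illustration.

```python
def capacity_lost_fraction(total_nodes: int, failed_nodes: int = 1) -> float:
    """Return the fraction of worker capacity lost when nodes fail.

    Assumption: all worker nodes are identical, so each node
    contributes 1/total_nodes of the cluster's capacity.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= failed_nodes <= total_nodes:
        raise ValueError("failed_nodes must be between 0 and total_nodes")
    return failed_nodes / total_nodes


if __name__ == "__main__":
    # Hypothetical cluster sizes, for illustration only.
    for nodes in (5, 20, 100):
        lost = capacity_lost_fraction(nodes)
        print(f"{nodes:>3} nodes: one failure removes {lost:.1%} of capacity")
```

Under this assumption, a single failure costs a fifth of a 5-node cluster but only a hundredth of a 100-node cluster.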
9. Storage Capacity
§ Key Input
§ Initial Data Size
§ 3 year YOY growth
§ Compression ratio
§ Intermediate and materialized views
§ Replication Factor
§ Note
§ Hard to accurately predict the size of intermediate & materialized views at the start of a project
§ Be conservative with compression ratio. Mileage varies by data type
§ Hadoop needs temp space to store intermediate files
[Diagram: Hadoop Cluster holding Raw Data, Master Data, Work In Process Data, Materialized Views]
10. Storage Capacity
Total Storage Required = ((Initial Size + YOY Growth + Intermediate Data Size) x Replication Count x 1.2) / Compression Ratio
Good Rule of Thumb
Replication Count = 3
Compression Ratio = 4-5
Intermediate Data Size = 50%-100% of Raw Data Size
Note
The 1.2 factor is included in the sizing estimator to account for the temp space requirement of Hadoop
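The sizing rule on this slide can be sketched in Python as follows. The function and parameter names are hypothetical, and the input sizes in the example are invented for illustration. The defaults follow the rule of thumb on the slide (replication count 3, and the conservative end of the 4-5 compression range).

```python
def total_storage_required(
    initial_size_tb: float,
    yoy_growth_tb: float,
    intermediate_size_tb: float,
    replication_count: int = 3,
    compression_ratio: float = 4.0,
    temp_space_factor: float = 1.2,
) -> float:
    """Estimate total cluster storage in TB using the slide's formula.

    (initial + growth + intermediate) x replication x 1.2 / compression
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    logical_tb = initial_size_tb + yoy_growth_tb + intermediate_size_tb
    return logical_tb * replication_count * temp_space_factor / compression_ratio


if __name__ == "__main__":
    # Hypothetical inputs: 100 TB of raw data today, 50 TB of growth over
    # the planning period, intermediate data at 100% of today's raw size.
    estimate = total_storage_required(100, 50, 100)
    print(f"Estimated total storage: {estimate:.0f} TB")
```

Because compression mileage varies by data type, it may be worth running the estimate at both ends of the compression range before committing to hardware.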
11. Server Specification
§ Master Nodes – NameNode, Resource Manager, HBase Master
§ Dual Intel Xeon E5-26xx series processors
§ 128GB or 256GB RAM per chassis
§ 4+ – 1TB NL-SAS/SATA Drives RAID10+ Spares
§ Worker Nodes – DataNode, Node Manager and Region Server
§ Dual Intel Xeon E5-26xx series processors
§ 128GB RAM or 256GB RAM
§ 12 – 1-4 TB NL-SAS/SATA Drives
§ Gateway Nodes / Edge Nodes
§ Mirror of Master Nodes configuration
12. Cluster Size
Number of Data Nodes = Total Storage Required / Storage Per Server
Number of Master Nodes
§ Name Node, Zookeeper
§ Resource Manager, Zookeeper
§ Failover Name Node, HBase Master, Hive Server, Zookeeper
§ In a half-rack cluster, this would be combined with Resource Manager
§ Management Node (Ambari, Ganglia, Nagios)
§ In a half-rack cluster, this would be combined with the Name Node
Note
§ Large clusters may need more than 4 master nodes
§ Start at 2/4 and grow based on usage
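The data node count on this slide is total storage required divided by storage per server. A minimal Python sketch follows. Rounding up to a whole server is an assumption added here, and the function name and example figures are hypothetical.

```python
import math


def data_nodes_needed(total_storage_tb: float, storage_per_server_tb: float) -> int:
    """Return the number of data nodes needed to hold total_storage_tb.

    Assumption: partial servers are rounded up, since a cluster cannot
    contain a fraction of a node.
    """
    if total_storage_tb < 0:
        raise ValueError("total_storage_tb must not be negative")
    if storage_per_server_tb <= 0:
        raise ValueError("storage_per_server_tb must be positive")
    return math.ceil(total_storage_tb / storage_per_server_tb)


if __name__ == "__main__":
    # Hypothetical figures: 225 TB required, 22 TB of data storage per server.
    print(data_nodes_needed(225, 22))
```

Master nodes are counted separately, following the 2/4 starting point given on the slide.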
13. Factoring Performance
§ Data Nodes
§ 1 TB drives for performance clusters
§ 4 TB drives for archive clusters
§ Meeting SLA Requirements
§ Hadoop workloads are varied
§ Difficult to assess cluster size based on SLAs without actual testing
§ Good News: Hadoop performs linearly with scale
§ Enables one to design experiments using a fraction of data
§ Best Practice Guidance
§ Create a test configuration with a rack of servers
§ Load a slice of data
§ Run tests with real-life queries to measure performance & fine tune the system
§ Scale cluster size based on observed performance
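The best-practice steps above can be turned into a rough extrapolation. The Python sketch below assumes perfectly linear scaling, as the slide states: runtime grows in proportion to data volume and shrinks in proportion to node count. Real workloads may deviate, so this is a starting estimate to confirm by testing, not a guarantee. The function name and all figures are hypothetical.

```python
import math


def nodes_for_sla(
    test_nodes: int,
    slice_size_tb: float,
    test_runtime_min: float,
    full_size_tb: float,
    sla_runtime_min: float,
) -> int:
    """Extrapolate the node count needed to meet an SLA from a test run.

    Assumption: runtime is proportional to data size / node count.
    """
    if min(test_nodes, slice_size_tb, test_runtime_min,
           full_size_tb, sla_runtime_min) <= 0:
        raise ValueError("all inputs must be positive")
    data_scale = full_size_tb / slice_size_tb        # how much more data
    time_scale = test_runtime_min / sla_runtime_min  # how much faster or slower
    return math.ceil(test_nodes * data_scale * time_scale)


if __name__ == "__main__":
    # Hypothetical test: an 18-node rack processes a 10 TB slice in 30 minutes.
    # Target: process the full 100 TB within a 60-minute SLA.
    print(nodes_for_sla(18, 10, 30, 100, 60))
```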
14. HDP and HP are deeply integrated in the data center
[Diagram: HDP 2.1 in the data center]
SOURCES: Existing Systems; Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured
DATA SYSTEM: RDBMS, EDW, MPP, HANA; HDP 2.1 on YARN (Governance & Integration, Security, Operations, Data Access, Data Management)
APPLICATIONS: BusinessObjects BI
DEV & DATA TOOLS
OPERATIONAL TOOLS
INFRASTRUCTURE
Deep Partnerships
Hortonworks and HP are engaged in deep engineering relationships with the leaders in the data center, such as Microsoft, Teradata, Red Hat, & SAP
Broad Partnerships
Over 600 partners work with Hortonworks to certify their applications to work with Hadoop so they can extend big data to their users
15. Delivering Apache Hadoop for the Modern Data Architecture
HP + Hortonworks Validated Design
Christopher Daly
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
16. The HP Approach to Apache Hadoop
Why a Reference Architecture?
• Provides a starting point or
baseline
• Maximum flexibility
• Customizable to fit YOUR needs
• Adopt the parts you want
• Replace the parts you don’t
17. Solution components
18. Pre-deployment considerations / system selection
• Operating system
• Computation
• Memory
• Storage
• Network
19. High-availability considerations
• Hadoop NameNode HA
• ResourceManager HA
• OS availability and
reliability
• Network reliability
• Power supply
20. Server selection
Management nodes – The HP ProLiant DL360p Gen8
The Management node and head nodes, as tested in the Reference Architecture, contain the following base configuration:
• 2 x Eight-Core Intel E5-2650 v2 Processors
• Smart Array P420i Controller with 512MB FBWC
• 3.6 TB – 4 x 900GB SFF SAS 10K RPM disks
• 128 GB DDR3 Memory – 8 x 16GB 2Rx4 PC3-14900R-13
• 10GbE 2P NIC 561FLR-T card
21. Server selection
Worker nodes – ProLiant DL380p Gen8
The ProLiant DL380p Gen8 (2U), as configured for the Reference Architecture as a worker node, has the following configuration:
• Dual 10-Core Intel Xeon E5-2670 v2 Processors with Hyper-Threading
• Twelve 2TB 3.5” 7.2K LFF SATA MDL disks (22 TB for Data)
• 128 GB DDR3 Memory (8 x HP 16GB), 4 channels per socket
• 1 x 10GbE 2 Port NIC FlexibleLOM (Bonded)
• 1 x Smart Array P420i Controller with 512MB FBWC
22. Switch selection
Top of Rack (ToR) switches
The 5900AF-48XGT-4QSFP+ 10GbE switch is an ideal ToR switch, with forty-eight 10GbE ports and four 40GbE uplinks providing resiliency, high availability and scalability support. In addition, this model comes with support for CAT6 cables (copper wires) and software-defined networking (SDN).
Aggregation switches
The FlexFabric 5930-32QSFP+ 40GbE switch is an ideal aggregation switch, as it is well suited to handle very large volumes of inter-rack traffic such as can occur during shuffle and sort operations, or large-scale block replication to recreate a failed node.
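The ToR port counts above give a quick way to reason about inter-rack bandwidth. The Python sketch below computes an oversubscription ratio (total server-facing bandwidth divided by total uplink bandwidth). It assumes every port is in use at line rate, which is a simplification, and the function name is hypothetical.

```python
def oversubscription_ratio(
    downlink_ports: int,
    downlink_gbps: float,
    uplink_ports: int,
    uplink_gbps: float,
) -> float:
    """Return server-facing bandwidth divided by uplink bandwidth.

    Assumption: all ports are populated and run at their nominal rate.
    """
    if uplink_ports <= 0 or uplink_gbps <= 0:
        raise ValueError("uplink capacity must be positive")
    downlink_total = downlink_ports * downlink_gbps
    uplink_total = uplink_ports * uplink_gbps
    return downlink_total / uplink_total


if __name__ == "__main__":
    # Port counts from the ToR switch description: 48 x 10GbE down, 4 x 40GbE up.
    ratio = oversubscription_ratio(48, 10, 4, 40)
    print(f"Oversubscription ratio: {ratio:.1f}:1")
```

A lower ratio leaves more headroom for shuffle and re-replication traffic that crosses racks.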
23. HP Insight CMU – push-button scale-out management
Provision, monitor, and control
Thousands of nodes instantly
Push-button roll out
Provisioning via cloning for seamless scaling
Rest easy
Battle-tested at Top 500 sites for over a decade
24. HP Insight CMU – GUI Monitoring at a Cluster level
Historical analysis and job recording
• Designed for Big Data customers
• Multi-petal aggregated, 3D RT, and time series views of cluster metrics
• “Click & zoom” analysis at both solution and component levels
• Proactively identify and isolate performance issues
25. Single Rack Reference Architecture
26. Multi-Rack Reference Architecture
27. Capacity and sizing
Here is a general guideline on data inventory:
• Sources of data
• Frequency of data
• Raw storage
• Processed HDFS storage
• Replication factor
• Default compression turned on
• Space for intermediate files
28. System configuration guidance
Machine Type | Workload Pattern/Cluster Type | Storage | Processor (# of Cores) | Memory (GB)
Slaves | Balanced workload | Four to six 1-2 TB disks | Dual 6/8/10 cores | 48-96
Slaves | Compute intensive workload | Four to six 1-2 TB disks | Dual 8/10/12 cores | 48-128
Slaves | IO intensive workload | Twelve 1-2 TB disks | Dual 8/10/12 cores | 48-96
Slaves | HBase clusters | Twelve 1-2 TB disks | Dual 8/10/12 cores | 48-128
Masters | All workload patterns/HBase clusters | Four to six 1-2 TB disks | Dual 6/8/10 cores | Depends on number of file system objects to be created by NameNode.
Network (all machine types): Dual 10 GB links for all nodes in a 20 node rack and min 2 x 10 / 2 x 40 GB interconnect links per rack going to a pair of central switches
29. For More Information
Get the Reference Architecture at
http://h20195.www2.hp.com/V2/GetDocument.aspx?docname=4AA5-4975ENW
Hortonworks www.hortonworks.com
HP Solutions for Apache Hadoop hp.com/go/Hadoop
HP ProLiant servers hp.com/go/proliant
HP Insight Cluster Management Utility (CMU) hp.com/go/cmu
HP Networking hp.com/go/networking
Or Contact Me: Christopher.Daly@hp.com
30. Next Steps...
More about HP & Hortonworks
http://hortonworks.com/partner/HP
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
Contact us: events@hortonworks.com