1. Page 1 © Hortonworks Inc. 2014
Delivering Apache Hadoop for the Modern Data Architecture
Cisco & Hortonworks. We do Hadoop together.
2.
Our speakers…
• Ajay Singh, Director, Technical Channels, Hortonworks
• Sean McKeown, Solutions Architect, Data Center, Cisco
3.
Why Hadoop: Traditional Data Architecture Under Pressure
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020
Data source: IDC
SOURCES
• OLTP, ERP, CRM
• Documents, Emails
• Web Logs, Click Streams
• Social Networks
• Machine Generated
• Sensor Data
• Geolocation Data
4.
What: Business Applications of Hadoop
Data types: Sensor, Server Logs, Text, Social, Geographic, Machine, Clickstream, Structured, Unstructured
Financial Services
• New Account Risk Screens
• Trading Risk
• Insurance Underwriting
Telecom
• Call Detail Records (CDR)
• Infrastructure Investment
• Real-time Bandwidth Allocation
Retail
• 360° View of the Customer
• Localized, Personalized Promotions
• Website Optimization
5.
What: Business Applications of Hadoop
Data types: Sensor, Server Logs, Text, Social, Geographic, Machine, Clickstream, Structured, Unstructured
Manufacturing
• Supply Chain and Logistics
• Preventive Maintenance
• Crowd-sourced Quality Assurance
Healthcare
• Use Genomic Data in Medical Trials
• Monitor Patient Vitals in Real Time
Pharmaceuticals
• Recruit & Retain Patients for Drug Trials
• Improve Prescription Adherence
Oil & Gas
• Unify Exploration & Production Data
• Monitor Rig Safety in Real Time
Government
• ETL Offload in Response to Budgetary Pressures
• Sentiment Analysis for Gov’t Programs
6.
How: Modern Data Architecture with Hadoop
SOURCES: OLTP, ERP, CRM • Documents, Emails • Web Logs, Click Streams • Social Networks • Machine Generated • Sensor Data • Geolocation Data
ENTERPRISE HADOOP: Data Management • Data Access • Governance & Integration • Security • Operations
DATA SYSTEMS: RDBMS • EDW • MPP • Repositories • ROOMS
APPLICATIONS: Statistical Analysis • BI / Reporting, Ad Hoc Analysis • Interactive Web & Mobile Apps • Enterprise Applications
DEV & DATA TOOLS: Build & Test
OPERATIONS TOOLS: Provision, Manage & Monitor
7.
YARN Transforms Hadoop’s Architecture
Enables deep insight across a large, broad, diverse set of data at efficient scale
Multi-Use Data Platform
Store all data in one place, process in many ways
• Batch • Interactive • Iterative • Streaming
[Diagram: cluster of nodes 1 through n]
Store any/all raw data sources and processed data over extended periods of time.
YARN: Data Operating System
8.
Designing a Hadoop Cluster
§ Cluster Storage Capacity
§ Server Specification
§ Cluster Size
§ Factoring Performance
Key Considerations
§ Any piece of hardware can and will fail
§ More nodes mean less impact from any single failure
§ Resiliency and fault tolerance improve with scale
§ Build resiliency through scale
§ Still use modern hardware
§ Software handles hardware failures
9.
Storage Capacity
§ Key Inputs
  § Initial Data Size
  § 3-year year-over-year (YOY) growth
  § Compression ratio
  § Intermediate and materialized views
  § Replication Factor
§ Notes
  § It is hard to accurately predict the size of intermediate data and materialized views at the start of a project
  § Be conservative with the compression ratio; mileage varies by data type
  § Hadoop needs temp space to store intermediate files
[Diagram: a Hadoop cluster holding Raw Data, Work-in-Process Data, Master Data, and Materialized Views]
10.
Storage Capacity
$$\text{Total Storage Required} = \frac{(\text{Initial Size} + \text{YOY Growth} + \text{Intermediate Data Size}) \times \text{Replication Count} \times 1.2}{\text{Compression Ratio}}$$
Good Rule of Thumb
§ Replication Count = 3
§ Compression Ratio = 4-5
§ Intermediate Data Size = 50%-100% of Raw Data Size
Note
The 1.2 factor is included in the sizing estimator to account for the temp space requirement of Hadoop.
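The sizing formula above can be turned into a small calculator. This is a minimal sketch, assuming all sizes are in terabytes, that "YOY growth" means the total new data expected over the three-year planning horizon, and that intermediate data is given as a fraction (0.5 to 1.0) of the raw data; the function and parameter names are illustrative, not part of any Hortonworks tool.

```python
def total_storage_required_tb(
    initial_tb: float,
    growth_tb: float,
    intermediate_fraction: float = 0.5,
    replication: int = 3,
    compression_ratio: float = 4.0,
    temp_factor: float = 1.2,
) -> float:
    """Estimate raw cluster capacity (TB) with the slide's rule-of-thumb formula.

    Assumptions (not from the slide): growth_tb is the total data added over
    the planning horizon, and intermediate data is a fraction of the raw data
    (initial + growth).
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    raw_tb = initial_tb + growth_tb
    intermediate_tb = intermediate_fraction * raw_tb
    # (Initial + Growth + Intermediate) / Compression x Replication x 1.2 temp space
    return (raw_tb + intermediate_tb) / compression_ratio * replication * temp_factor


# Hypothetical inputs: 100 TB today, 100 TB of growth over three years,
# intermediate data at 50% of raw, conservative 4:1 compression.
print(total_storage_required_tb(100, 100))  # 270.0
```

Using the slide's upper bound for intermediate data and 5:1 compression (`intermediate_fraction=1.0, compression_ratio=5.0`) gives 400 / 5 × 3 × 1.2 = 288 TB for the same inputs, which shows how sensitive the estimate is to the two least predictable inputs.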
11.
Server Specification
§ Master Nodes: NameNode, Resource Manager, HBase Master
  § Dual Intel Xeon E5-26xx series processors
  § 128 GB or 256 GB RAM per chassis
  § 4+ × 1 TB NL-SAS/SATA drives, RAID 10 + spares
§ Worker Nodes: DataNode, Node Manager, and Region Server
  § Dual Intel Xeon E5-26xx series processors
  § 128 GB or 256 GB RAM
  § 12 × 1-4 TB NL-SAS/SATA drives
§ Gateway Nodes / Edge Nodes
  § Mirror of the master node configuration
12.
Cluster Size
$$\text{Number of Data Nodes} = \frac{\text{Total Storage Required}}{\text{Storage Per Server}}$$
Number of Master Nodes
§ Name Node, ZooKeeper
§ Resource Manager, ZooKeeper
§ Failover Name Node, HBase Master, Hive Server, ZooKeeper
  § In a half-rack cluster, this would be combined with the Resource Manager node
§ Management Node (Ambari, Ganglia, Nagios)
  § In a half-rack cluster, this would be combined with the Name Node
Note
§ Large clusters may need more than 4 master nodes
§ Start with 2 or 4 master nodes and grow based on usage
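A minimal sketch of the cluster-size step, assuming "storage per server" is the raw drive capacity of one worker node (for example 12 × 4 TB from the worker specification) and that the node count is rounded up to a whole server. The half-rack role merging follows the master-node list above; the function names are illustrative.

```python
import math


def data_nodes_needed(total_storage_tb: float, storage_per_server_tb: float) -> int:
    """Number of Data Nodes = Total Storage Required / Storage Per Server, rounded up."""
    if storage_per_server_tb <= 0:
        raise ValueError("storage_per_server_tb must be positive")
    return math.ceil(total_storage_tb / storage_per_server_tb)


def master_node_layout(half_rack: bool) -> list[set[str]]:
    """Role placement per master node, following the slide's four-node layout."""
    name_node = {"NameNode", "ZooKeeper"}
    resource_manager = {"ResourceManager", "ZooKeeper"}
    failover = {"Failover NameNode", "HBase Master", "Hive Server", "ZooKeeper"}
    management = {"Ambari", "Ganglia", "Nagios"}
    if half_rack:
        # Half rack: the failover node merges into the ResourceManager node,
        # and the management node merges into the NameNode node.
        return [name_node | management, resource_manager | failover]
    return [name_node, resource_manager, failover, management]


# Hypothetical input: 270 TB required, worker nodes with 12 x 4 TB drives.
print(data_nodes_needed(270, 12 * 4))  # 6
print(len(master_node_layout(half_rack=True)))  # 2
```

Merging roles in the half-rack case changes how many machines host the master services, not which services run, so both layouts cover the same set of roles.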
13.
Factoring Performance
§ Data Nodes
  § 1 TB drives for performance clusters
  § 4 TB drives for archive clusters
§ Meeting SLA Requirements
  § Hadoop workloads are varied
  § It is difficult to assess cluster size based on SLAs without actual testing
  § Good news: Hadoop performance scales linearly with cluster size
  § This makes it possible to design experiments using a fraction of the data
§ Best Practice Guidance
  § Create a test configuration with a rack of servers
  § Load a slice of data
  § Run tests with real-life queries to measure performance and fine-tune the system
  § Scale cluster size based on observed performance
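Because the guidance relies on roughly linear scaling, measurements from a test rack can be extrapolated to a production node count. A minimal sketch, assuming per-node throughput stays constant as nodes are added and that the SLA is expressed as a time budget for processing a given data volume; the function name and all numbers are hypothetical, and the estimate should be re-checked against real measurements as the cluster grows.

```python
import math


def nodes_for_sla(
    test_nodes: int,
    test_data_tb: float,
    test_runtime_h: float,
    target_data_tb: float,
    sla_h: float,
) -> int:
    """Extrapolate the node count needed to meet an SLA, assuming linear scaling.

    Per-node throughput (TB per hour per node) is measured on the test rack
    and assumed to stay constant at larger cluster sizes.
    """
    per_node_tb_per_h = test_data_tb / (test_runtime_h * test_nodes)
    required_tb_per_h = target_data_tb / sla_h
    return math.ceil(required_tb_per_h / per_node_tb_per_h)


# Hypothetical test run: 16 nodes processed a 2 TB slice in 0.5 h.
# Target: 40 TB within a 2 h SLA.
print(nodes_for_sla(16, 2.0, 0.5, 40.0, 2.0))  # 80
```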
14.
HDP and Cisco Are Deeply Integrated in the Data Center
[Diagram: HDP 2.1 (Governance & Integration, Security, Operations, Data Access, Data Management, YARN) on the INFRASTRUCTURE layer, flanked by OPERATIONAL TOOLS and DEV & DATA TOOLS]
SOURCES: Existing Systems • Clickstream • Web & Social • Geolocation • Sensor & Machine • Server Logs • Unstructured
DATA SYSTEMS: RDBMS • EDW • MPP • HANA
APPLICATIONS: BusinessObjects BI
Deep Partnerships
Hortonworks and Cisco engage in deeply engineered relationships with the leaders in the data center, such as Microsoft, Teradata, Red Hat, and SAP.
Broad Partnerships
Over 600 partners work with Hortonworks to certify their applications to work with Hadoop so they can extend big data to their users.
15.
Cisco + Hortonworks Validated Design
Sean McKeown
Solutions Architect, Data Center, Cisco
16.
Cisco + Hortonworks Validated Design
17.
Cisco UCS Common Platform Architecture (CPA)
Building Blocks for Big Data
• UCS 6200 Series Fabric Interconnects
• Nexus 2232 Fabric Extenders
• UCS Manager
• UCS C240 M3 Servers
• LAN, SAN, Management
18.
UCS + Hortonworks Reference Configurations
The Capacity Optimized configurations provide 768 TB of unformatted storage per rack, for a total of 7.68 petabytes (PB) when scaled to a 10-rack domain; the flash option adds 31.25 TB of flash memory per domain.
Table 1. Cisco CPA v2 for Big Data Includes Four Optimized Configurations
Performance Optimized (UCS-SL-CPA2-P)
• Connectivity: 2 Cisco UCS 6248UP 48-Port Fabric Interconnects; 2 Cisco Nexus® 2232PP 10GE Fabric Extenders
• Management: Cisco UCS Manager
• Servers: 8 Cisco UCS C240 M3 Rack Servers, each with 2 Intel Xeon E5-2680 v2 processors, 256 GB of memory, an LSI MegaRAID 9271CV 8i card, and 24 900-GB 10K SFF SAS drives (168 TB total)
Performance and Capacity Balanced (UCS-SL-CPA2-PC)
• Connectivity: 2 Cisco UCS 6296UP 96-Port Fabric Interconnects; 2 Cisco Nexus 2232PP 10GE Fabric Extenders
• Management: Cisco UCS Manager
• Servers: 16 Cisco UCS C240 M3 Rack Servers, each with 2 Intel Xeon E5-2660 v2 processors, 256 GB of memory, an LSI MegaRAID 9271CV 8i card, and 24 1-TB 7.2K SFF SAS drives (384 TB total)
Capacity Optimized (UCS-SL-CPA2-C)
• Connectivity: 2 Cisco UCS 6296UP 96-Port Fabric Interconnects; 2 Cisco Nexus 2232PP 10GE Fabric Extenders
• Management: Cisco UCS Manager
• Servers: 16 Cisco UCS C240 M3 Rack Servers, each with 2 Intel Xeon E5-2640 v2 processors, 128 GB of memory, an LSI MegaRAID 9271CV 8i card, and 12 4-TB 7.2K LFF SAS drives (768 TB total)
Capacity Optimized with Flash Memory (UCS-SL-CPA2-CF)
• Connectivity: 2 Cisco UCS 6296UP 96-Port Fabric Interconnects; 2 Cisco Nexus 2232PP 10GE Fabric Extenders
• Management: Cisco UCS Manager
• Servers: 16 Cisco UCS C240 M3 Rack Servers, each with 2 Intel Xeon E5-2660 v2 processors, 128 GB of memory, a Cisco UCS Nytro MegaRAID 200-GB Controller, and 12 4-TB 7.2K LFF SAS drives (768 TB total)
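The raw capacity figures in Table 1 follow from servers × drives per server × drive size. A minimal sketch that recomputes the three 16-server configurations and scales the capacity-optimized rack to a 10-rack domain (the domain limit given on the Simple Scalability slide), assuming decimal units (1 PB = 1000 TB); the function name is illustrative.

```python
def raw_capacity_tb(servers: int, drives_per_server: int, drive_tb: float) -> float:
    """Unformatted capacity in TB, before replication, compression, or filesystem overhead."""
    return servers * drives_per_server * drive_tb


configs = {
    "UCS-SL-CPA2-PC": raw_capacity_tb(16, 24, 1),
    "UCS-SL-CPA2-C": raw_capacity_tb(16, 12, 4),
    "UCS-SL-CPA2-CF": raw_capacity_tb(16, 12, 4),
}
for name, tb in configs.items():
    print(f"{name}: {tb:g} TB per rack")

RACKS_PER_DOMAIN = 10
domain_pb = configs["UCS-SL-CPA2-C"] * RACKS_PER_DOMAIN / 1000
print(f"Capacity Optimized, full domain: {domain_pb:g} PB")  # 7.68 PB
```

Note that this is raw capacity; dividing by the replication count and temp-space factor from the storage-capacity formula gives the usable figure.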
19.
Installing Servers Today
[Diagram: a server with LAN and SAN connections]
• RAID settings
• Disk scrub actions
• Number of vHBAs
• HBA WWN assignments
• FC Boot Parameters
• HBA firmware
• FC Fabric assignments for HBAs
• QoS settings
• Border port assignment per vNIC
• NIC Transmit/Receive Rate Limiting
• VLAN assignments for NICs
• VLAN tagging config for NICs
• Number of vNICs
• PXE settings
• NIC firmware
• Advanced feature settings
• Remote KVM IP settings
• Call Home behaviour
• Remote KVM firmware
• Server UUID
• Serial over LAN settings
• Boot order
• IPMI settings
• BIOS scrub actions
• BIOS firmware
• BIOS Settings
20.
UCS Service Profiles
[Diagram: a Service Profile spanning the server, LAN, and SAN settings]
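The idea behind service profiles is that a server's identity and settings live in a software object rather than in the hardware. The sketch below models that with a hypothetical Python data class holding a few of the settings listed on the Installing Servers Today slide, and shows a profile being moved from one physical server to another. It is not the UCS Manager API; the class, field, and function names and the example values (UUID, VLAN IDs, slot names) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceProfile:
    """Hypothetical bundle of server identity and settings (subset of the slide's list)."""

    name: str
    server_uuid: str
    boot_order: tuple[str, ...]
    vnic_vlans: tuple[tuple[str, int], ...]  # (vNIC name, VLAN ID) pairs


@dataclass
class PhysicalServer:
    """Hardware with no identity of its own until a profile is associated."""

    slot: str
    profile: Optional[ServiceProfile] = None


def associate(profile: ServiceProfile, server: PhysicalServer) -> None:
    """Apply a profile to an empty server."""
    if server.profile is not None:
        raise RuntimeError(f"{server.slot} already has profile {server.profile.name}")
    server.profile = profile


def move_profile(src: PhysicalServer, dst: PhysicalServer) -> None:
    """Location independence: the same identity and settings move to new hardware."""
    if src.profile is None:
        raise RuntimeError(f"{src.slot} has no profile to move")
    associate(src.profile, dst)  # raises before any change if dst is occupied
    src.profile = None


profile = ServiceProfile(
    name="hadoop-worker-01",
    server_uuid="hypothetical-uuid-0001",
    boot_order=("local-disk", "pxe"),
    vnic_vlans=(("vNIC0", 10), ("vNIC1", 20)),
)
server_a = PhysicalServer("rack1-slot3")
server_b = PhysicalServer("rack2-slot7")
associate(profile, server_a)
move_profile(server_a, server_b)  # e.g., after a hardware failure on server_a
print(server_b.profile.server_uuid, server_a.profile)
```

Because the profile is immutable and only its association changes, the replacement server comes up with exactly the same UUID, boot order, and VLANs, which is the consistency the manual per-server checklist cannot guarantee.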
21.
Abstracting the Logical Architecture
What you see (physical): the server's adapter connects over a 10GE physical cable through FEX A to port Eth 1/1 on Fabric Interconnect 6200-A.
What you get (virtual): the Service Profile presents vNIC 1 and vHBA 1, which connect over virtual cables (VN-Tag) to vEth 1 and vFC 1 on 6200-A.
✓ Dynamic, rapid provisioning
✓ State abstraction
✓ Location independence
✓ Blade or rack
22.
Cisco UCS: Physical Architecture
[Diagram: 6200 Fabric A and 6200 Fabric B; FEX A and FEX B; Chassis 1 with B200 blades and VICs; a C240 rack-mount server with a VIC; uplinks to SAN A, SAN B, ETH 1, ETH 2, and MGMT]
Labeled components: Fabric Switch, Fabric Extenders, Uplink Ports, Server Ports, Compute Blades (half/full width), Rack Mount C240, Virtualized Adapters, OOB Mgmt, Cluster
23.
Simple Scalability
• Single Rack: 16 servers
• Single Domain: up to 10 racks, 160 servers, 7 PB
• Multiple Domains: L2/L3 switching
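The scaling limits above translate directly into rack and domain counts. A minimal sketch, assuming 16 servers per rack and at most 10 racks (160 servers) per UCS domain as stated on the slide; the function name is illustrative.

```python
import math

SERVERS_PER_RACK = 16
RACKS_PER_DOMAIN = 10  # single domain: up to 10 racks / 160 servers


def racks_and_domains(servers: int) -> tuple[int, int]:
    """Return (racks, domains) needed to house the given number of servers."""
    if servers < 0:
        raise ValueError("servers must be non-negative")
    racks = math.ceil(servers / SERVERS_PER_RACK)
    domains = math.ceil(racks / RACKS_PER_DOMAIN)
    return racks, domains


print(racks_and_domains(80))   # (5, 1): fits in a single domain
print(racks_and_domains(200))  # (13, 2): needs L2/L3 switching between domains
```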
24.
Proven Performance and Linear Scalability
25.
Simplified Management Throughout the Cluster Lifecycle
Lifecycle stages: Provisioning, Monitoring, Maintenance, Growth
UCS Manager (UCSM) provides:
• Speed
• Ease of experimentation
• Consistency
• Simplicity
• Visibility
26.
Complete Network Flexibility
Example:
• vNIC0 for management
• vNIC1 for internal traffic
• vNIC2 for external traffic
• No OS bonding needed with Fabric Failover
Configure as many vNICs and VLANs as you need with the click of a mouse.
[Diagram: Data Node 1 and Data Node 2, each with vNIC 0, vNIC 1, and vNIC 2, connected to Fabric Interconnects 6200 A and 6200 B, with L2/L3 switching for data ingress/egress]
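With fabric failover, each vNIC is pinned to one fabric interconnect, and when that fabric's path fails the traffic moves to the peer fabric below the operating system, which is why no OS bonding is needed. The sketch models that behavior for the three vNICs in the example. The fabric assignments (vNIC0 and vNIC1 on A, vNIC2 on B) and all names are assumptions for illustration, not values from the slide or the UCS API.

```python
from dataclasses import dataclass
from typing import Optional

PEER_FABRIC = {"A": "B", "B": "A"}


@dataclass(frozen=True)
class VNic:
    name: str
    role: str
    primary_fabric: str  # "A" or "B"
    fabric_failover: bool = True


def active_fabric(vnic: VNic, fabrics_up: set[str]) -> Optional[str]:
    """Fabric carrying this vNIC's traffic, or None if no usable path remains."""
    if vnic.primary_fabric in fabrics_up:
        return vnic.primary_fabric
    peer = PEER_FABRIC[vnic.primary_fabric]
    if vnic.fabric_failover and peer in fabrics_up:
        return peer  # failover happens below the OS, so no bonding driver is involved
    return None


# Hypothetical pinning of the slide's three vNICs.
vnics = [
    VNic("vNIC0", "management", "A"),
    VNic("vNIC1", "internal", "A"),
    VNic("vNIC2", "external", "B"),
]
for v in vnics:
    print(v.name, v.role, active_fabric(v, fabrics_up={"B"}))  # fabric A has failed
```

Without failover enabled, a vNIC whose primary fabric goes down simply loses connectivity, and redundancy would have to come from bonding two vNICs inside the OS.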
27.
Creating QoS Policies and Enabling Jumbo Frames
• Best Effort policy for management VLAN
• Platinum policy for cluster VLAN
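The two policies on this slide amount to a mapping from VLAN to QoS class and MTU. A minimal sketch of that mapping as data, assuming an MTU of 9000 bytes for the jumbo-frame cluster class, the standard 1500 bytes for management, and a Best Effort fallback for unlisted VLANs; the VLAN IDs and helper names are hypothetical, and the real settings are made in UCS Manager.

```python
from dataclasses import dataclass

STANDARD_MTU = 1500
JUMBO_MTU = 9000  # assumed jumbo-frame size


@dataclass(frozen=True)
class QosPolicy:
    qos_class: str
    mtu: int


DEFAULT_POLICY = QosPolicy("Best Effort", STANDARD_MTU)  # assumed fallback

# Hypothetical VLAN IDs; the class per VLAN follows the slide.
POLICY_BY_VLAN = {
    10: DEFAULT_POLICY,                    # management VLAN: Best Effort
    20: QosPolicy("Platinum", JUMBO_MTU),  # cluster VLAN: Platinum + jumbo frames
}


def policy_for(vlan_id: int) -> QosPolicy:
    return POLICY_BY_VLAN.get(vlan_id, DEFAULT_POLICY)


def tcp_payload_per_frame(vlan_id: int) -> int:
    """MTU minus 20-byte IPv4 and 20-byte TCP headers (no options)."""
    return policy_for(vlan_id).mtu - 40


print(policy_for(20).qos_class, tcp_payload_per_frame(20))  # Platinum 8960
print(policy_for(10).qos_class, tcp_payload_per_frame(10))  # Best Effort 1460
```

Each jumbo frame carries roughly six times the payload of a standard frame (8960 vs. 1460 bytes), so bulk transfers between data nodes need far fewer frames.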
28.
Switch Buffer Usage with a Network QoS Policy to Prioritize HBase Read Operations
[Chart: HBase average READ latency over time (µs, axis 0 to 40,000), non-QoS vs. QoS policy, with switch buffer usage over the same timeline for Hadoop TeraSort and HBase]
HBase Read Latency Comparison of Non-QoS vs. QoS Policy: ~60% read improvement
Workload: HBase + Hadoop MapReduce (TeraSort)
29.
Single Platform for Traditional and Big Data Applications
[Diagram: UCS Rack-Mount Servers, UCS Blade Servers, UCS Common Platform Architecture with Hortonworks, SAN/NAS Arrays, Enterprise Applications]
30.
THANK YOU
ajaysingh@hortonworks.com
semckeow@cisco.com