Building the Right Platform Architecture for Hadoop
1. Building the Right Platform Architecture for Hadoop
ROMMEL GARCIA
HORTONWORKS
2. #whoami
▸ Sr. Solutions Engineer & Platform Architect @hortonworks
▸ Global Security SME Lead @hortonworks
▸ Author of “Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture”
▸ Runs Atlanta Hadoop User Group
5. THE TWO FACES OF HADOOP
Hadoop Cluster
Master Nodes
▸ Manage cluster state
▸ Manage data & job state
Worker Nodes (run with YARN)
▸ Manage CPU resources
▸ Manage RAM resources
▸ Manage I/O resources
▸ Manage Network resources
14. HIVE/SPARK SQL WORKLOAD BEHAVIOR
▸ Hive/Spark SQL workloads depend on YARN
▸ Near-realtime to batch SLAs
▸ Large data sets; consume lots of memory with moderate CPU usage
▸ Hundreds to hundreds of thousands of analytical jobs daily
▸ Typically memory-bound first, then I/O-bound
15. HIVE/SPARK SQL WORKLOAD PARALLELISM
▸ Hive
  ▸ Hive auto-parallelism ENABLED
  ▸ Hive on Tez ENABLED
  ▸ Reuse Tez sessions ENABLED
  ▸ ORC for Hive ENFORCED
▸ Spark
  ▸ Repartition Spark SQL RDDs ENFORCED
▸ 2 GB to 4 GB YARN container size (example settings below)
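As a rough sketch only (the property names are standard Hive/Tez and Spark settings; the values are illustrative, not recommendations from this talk), the toggles above usually come down to a few lines set in hive-site.xml or a Beeline session, plus Spark-side sizing:

  SET hive.execution.engine=tez;              -- Hive on Tez
  SET hive.exec.parallel=true;                -- run independent stages in parallel
  SET hive.tez.auto.reducer.parallelism=true; -- auto-parallelism for reducers
  SET hive.default.fileformat=ORC;            -- make ORC the default table format
  SET hive.tez.container.size=4096;           -- MB; keep within the 2-4 GB guidance above
  -- HiveServer2 (hive-site.xml): hive.server2.tez.initialize.default.sessions=true to reuse Tez sessions

  # spark-defaults.conf: size shuffle partitions and executors to the data volume
  spark.sql.shuffle.partitions=400
  spark.executor.memory=3g

On the Spark SQL side, explicit repartitioning (DataFrame.repartition(n)) is done in the job itself; the shuffle-partition setting only covers the shuffles Spark introduces.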
16. HIVE/SPARK SQL WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 6 CPU cores
▸ 128 GB RAM
▸ 4 x 1 TB SSD, RAID 10 plus 1 hot-spare
▸ 2 x 10GbE NIC, bonded
Worker Node
▸ 2 x 8 CPU cores
▸ 256 GB RAM
▸ 2 x 1 TB SSD, RAID 1
▸ 12 x 4 TB SATA/SAS/NL-SAS
▸ 2 x 10GbE NIC, bonded
Cloud
All nodes are the same
▸ Typically more nodes vs. bare metal
▸ Storage (AWS/Azure)
  ▸ Batch - S3 / Blob / ADLS
  ▸ Interactive - EBS / Premium
  ▸ Near-realtime - Local / Local
▸ VM types (AWS/Azure)
  ▸ >= m4.4xlarge (EBS) / >= A9 (Blob/Premium)
  ▸ >= i2.4xlarge (Local) / >= DS14 (Premium/Local)
  ▸ >= d2.2xlarge (Local) / >= DS14_v2 (Premium/Local)
Virtualized On-prem
Master Node (typically more than bare metal)
▸ 2 x 4 vCPU cores
▸ 48 GB vRAM
▸ 1 TB SAN/NAS
▸ 2 x 10GbE vNIC, bonded
Worker Node
▸ 6 vCPU cores
▸ 32 GB vRAM
▸ 1 TB SAN/NAS
▸ 2 x 10GbE NIC, bonded
▸ Storage (data)
  ▸ SAN/NAS
  ▸ Appliance
    ▸ NetApp
    ▸ Isilon
    ▸ vBlock
18. HBASE/SOLR WORKLOADS BEHAVIOR
▸ HBase & Solr workloads
▸ Real-time, sub-second SLAs
▸ Large data sets
▸ Millions to billions of hits per day
▸ HBase
  ▸ HBase for GET/PUT/SCAN
  ▸ Typically I/O-bound first, then memory-bound
▸ Solr
  ▸ Solr for search
  ▸ Typically CPU-bound first, then memory-bound
19. HBASE WORKLOAD PARALLELISM
▸ Use the HBase Java API/Phoenix, not the Thrift server
▸ HBase “short-circuit” reads ENABLED
▸ HBase BucketCache ENABLED
▸ Manage HBase compactions for block locality
▸ HBase Read HA ENABLED
▸ Properly design the rowkey and column families
▸ Properly size RPC handlers
▸ Minimize GC pauses (example settings below)
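A minimal sketch of how these toggles map to site configuration (the values are illustrative; read HA additionally requires region replication to be enabled per table):

  <!-- hdfs-site.xml: short-circuit reads for blocks local to the RegionServer -->
  <property><name>dfs.client.read.shortcircuit</name><value>true</value></property>
  <property><name>dfs.domain.socket.path</name><value>/var/lib/hadoop-hdfs/dn_socket</value></property>

  <!-- hbase-site.xml -->
  <property><name>hbase.bucketcache.ioengine</name><value>offheap</value></property>
  <property><name>hbase.bucketcache.size</name><value>8192</value></property>                    <!-- MB, illustrative -->
  <property><name>hbase.region.replica.replication.enabled</name><value>true</value></property>  <!-- part of read HA -->
  <property><name>hbase.regionserver.handler.count</name><value>60</value></property>            <!-- RPC handlers; tune to workload -->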
20. HBASE WORKLOAD DEPLOYMENT MODEL
Bare Metal
Master Node (ZK, HBase Master)
▸ 2 x 6 CPU cores
▸ 128 GB RAM
▸ 4 x 1 TB SSD, RAID 10 plus 1 hot-spare
▸ 2 x 10GbE NIC, bonded
Worker Node
▸ 2 x 8-12 CPU cores
▸ 256 GB RAM
▸ 2 x 1 TB SSD, RAID 1
▸ 12-24 x 4 TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10GbE NIC, bonded
Cloud
All nodes are the same
▸ Typically more nodes vs. bare metal
▸ Storage (AWS/Azure)
  ▸ Local
▸ VM types (AWS/Azure)
  ▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
Master Node (typically more than bare metal)
▸ 2 x 4 vCPU cores
▸ 48 GB vRAM
▸ 1 TB SAN/NAS
▸ 2 x 10GbE vNIC, bonded
Worker Node w/ DAS
▸ 2 x 8-12 CPU cores
▸ 256 GB RAM
▸ 2 x 1 TB SSD, RAID 1
▸ 12-24 x 4 TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10GbE NIC, bonded
21. SOLR WORKLOAD PARALLELISM
▸ # of shards drives I/O operations; size it to the speed of the disks
▸ # of indexing threads drives indexing speed
▸ Properly manage the RAM buffer size
▸ Merge factor = 25 (indexing), 2 (search)
▸ Use a large disk cache
▸ Minimize GC pauses (example settings below)
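A solrconfig.xml sketch of the indexing-side knobs above, assuming the legacy mergeFactor element (newer Solr releases express the same policy through maxMergeAtOnce/segmentsPerTier); the values are illustrative:

  <indexConfig>
    <ramBufferSizeMB>512</ramBufferSizeMB>      <!-- size to the available heap -->
    <mergeFactor>25</mergeFactor>               <!-- heavy indexing; drop toward 2 for search-optimized indexes -->
    <maxIndexingThreads>8</maxIndexingThreads>  <!-- more threads -> faster indexing, more CPU -->
  </indexConfig>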
22. SOLR WORKLOAD DEPLOYMENT MODEL
Bare Metal
Master Node (ZK)
▸ 2 x 4 CPU cores
▸ 32 GB RAM
▸ 2 x 1 TB SSD (OS), RAID 1
▸ 2 x 10GbE NIC, bonded
Worker Node
▸ 2 x 10 CPU cores
▸ 256 GB RAM
▸ 2 x 1 TB SSD (OS), RAID 1
▸ 12 x 2-4 TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10GbE NIC, bonded
Cloud
All nodes are the same
▸ Typically more nodes vs. bare metal
▸ Storage (AWS/Azure)
  ▸ Local
▸ VM types (AWS/Azure)
  ▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
Master Node (typically more than bare metal)
▸ 2 x 4 vCPU cores
▸ 48 GB vRAM
▸ 1 TB SAN/NAS
▸ 2 x 10GbE vNIC, bonded
Worker Node w/ DAS
▸ 2 x 10 CPU cores
▸ 256 GB RAM
▸ 2 x 1 TB SSD, RAID 1
▸ 12 x 2-4 TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10GbE NIC, bonded
24. DATA SCIENCE WORKLOAD BEHAVIOR
▸ Spark ML workloads depend on YARN
▸ Near-realtime to batch SLAs
▸ Runs “Monte Carlo”-style processing
▸ Hundreds to hundreds of thousands of analytical jobs daily
▸ CPU-bound first, then memory-bound
25. SPARK ML WORKLOAD PARALLELISM
▸ YARN ENABLED
▸ Cache data
  ▸ Serialized/raw for fast access and processing
  ▸ Off-heap, slower processing
▸ Use checkpointing
▸ Use broadcast variables to minimize network traffic
▸ Minimize GC by limiting object creation; use builders
▸ executor-memory = between 8 GB and 64 GB
▸ executor-cores = between 2 and 4
▸ num-executors = with caching, size total application memory at roughly 2x the data size (see the sketch below)
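As a sketch under these sizing rules (the job name, path, counts and variable names are illustrative, not from the talk), the submission and in-driver choices might look like:

  spark-submit --master yarn --deploy-mode cluster \
    --num-executors 50 --executor-cores 3 --executor-memory 16G \
    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
    ml-job.jar

  // inside the driver (Java API; names are illustrative)
  jsc.setCheckpointDir("hdfs:///tmp/ml-checkpoints");                    // checkpointing target
  Broadcast<Map<String, double[]>> model = jsc.broadcast(lookupTable);   // ship once per executor, not per task
  features.persist(StorageLevel.MEMORY_ONLY_SER());                      // serialized cache: compact, pays CPU to deserialize
  features.checkpoint();                                                 // cut long lineage chains

With caching in play, num-executors x executor-memory should land at roughly twice the cached data size, per the last bullet above.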
26. SPARK ML WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 6 CPU cores
▸ 64 GB RAM
▸ 4 x 1 TB SSD, RAID 10 plus 1 hot-spare
▸ 2 x 10GbE NIC, bonded
Worker Node
▸ 2 x 12 CPU cores
▸ 256 - 512 GB RAM
▸ 2 x 1 TB SSD, RAID 1
▸ 12 x 4 TB SATA/SAS/NL-SAS
▸ 2 x 10GbE NIC, bonded
Cloud
All nodes are the same
▸ Typically more nodes vs. bare metal
▸ Storage (AWS/Azure)
  ▸ Local
▸ VM types (AWS/Azure)
  ▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
Master Node (typically more than bare metal)
▸ 2 x 4 vCPU cores
▸ 48 GB vRAM
▸ 1 TB SAN/NAS
▸ 2 x 10GbE vNIC, bonded
Worker Node w/ DAS
▸ 2 x 8-12 CPU cores
▸ 256 GB RAM
▸ 2 x 1 TB SSD, RAID 1
▸ 12-24 x 4 TB SATA/SAS/NL-SAS
▸ 2 x 10GbE NIC, bonded
29. NIFI WORKLOAD PARALLELISM
▸ Network-bound first, then memory- or CPU-bound
▸ Go granular on NiFi processors
▸ nifi.bored.yield.duration=10
▸ RAID 10 for the repositories: Content, FlowFile and Provenance
▸ nifi.queue.swap.threshold=20000
▸ Do not share disks across repositories; use RAID 10 for each (see the sketch below)
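In nifi.properties terms (the directory paths are illustrative), keeping each repository on its own RAID 10 volume looks roughly like this:

  # one dedicated RAID 10 volume per repository
  nifi.flowfile.repository.directory=/data1/nifi/flowfile_repository
  nifi.content.repository.directory.default=/data2/nifi/content_repository
  nifi.provenance.repository.directory.default=/data3/nifi/provenance_repository
  nifi.bored.yield.duration=10 millis    # the property takes a time unit
  nifi.queue.swap.threshold=20000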
30. NIFI WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 4 CPU cores
▸ 32 GB RAM
▸ 2 x 1 TB SSD, RAID 1
▸ 2 x 10GbE NIC, bonded
Worker Node
▸ 2 x 8 CPU cores
▸ 128 GB RAM
▸ 2 x 1 TB SSD (OS), RAID 1
▸ 6 x 1 TB SATA/SAS/NL-SAS, RAID 10
▸ 2 x 10GbE NIC, bonded
Cloud
All nodes are the same
▸ Typically more nodes vs. bare metal
▸ Storage (AWS/Azure)
  ▸ Local/EBS/Premium
▸ VM types (AWS/Azure)
  ▸ >= m4.4xlarge / >= A9
Virtualized On-prem
Master Node (typically more than bare metal)
▸ 2 x 4 vCPU cores
▸ 16 GB vRAM
▸ 1 TB SAN/NAS
▸ 2 x 10GbE vNIC, bonded
Worker Node w/ DAS
▸ 2 x 8 CPU cores
▸ 128 GB RAM
▸ 2 x 1 TB SSD (OS), RAID 1
▸ 6 x 1 TB SATA/SAS/NL-SAS, RAID 10
▸ 2 x 10GbE NIC, bonded
31. STORM WORKLOAD PARALLELISM
▸ CPU-bound first, then memory-bound
▸ Keep tuple processing light so execute() returns quickly
▸ Use Google’s Guava for bolt-local caches (< 1 GB of memory)
▸ Shared caches across bolts: HBase, Phoenix, Redis or Memcached
▸ # of workers drives the memory footprint (see the sketch below)
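A minimal sketch of a bolt-local Guava cache (class, field and stream names are illustrative, not from the talk); the cache is built once per worker in prepare(), and execute() stays short and non-blocking:

  import java.util.Map;
  import java.util.concurrent.TimeUnit;
  import com.google.common.cache.Cache;
  import com.google.common.cache.CacheBuilder;
  import org.apache.storm.task.OutputCollector;
  import org.apache.storm.task.TopologyContext;
  import org.apache.storm.topology.OutputFieldsDeclarer;
  import org.apache.storm.topology.base.BaseRichBolt;
  import org.apache.storm.tuple.Fields;
  import org.apache.storm.tuple.Tuple;
  import org.apache.storm.tuple.Values;

  public class EnrichBolt extends BaseRichBolt {
      private transient Cache<String, String> localCache;   // bolt-local, kept well under 1 GB
      private OutputCollector collector;

      @Override
      public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
          this.collector = collector;
          this.localCache = CacheBuilder.newBuilder()
                  .maximumSize(100_000)                      // bound entries to cap memory
                  .expireAfterWrite(10, TimeUnit.MINUTES)
                  .build();
      }

      @Override
      public void execute(Tuple tuple) {
          String key = tuple.getStringByField("key");
          String cached = localCache.getIfPresent(key);      // check the local cache first
          // on a miss, fall through to a shared store (HBase/Phoenix/Redis/Memcached)
          // and localCache.put(key, value) the result; omitted here to keep the sketch short
          collector.emit(tuple, new Values(key, cached));
          collector.ack(tuple);                              // no blocking calls in execute()
      }

      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
          declarer.declare(new Fields("key", "enriched"));
      }
  }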
32. STORM WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 4 CPU cores
▸ 32 GB RAM
▸ 4 x 1 TB SSD, RAID 10 plus 1 hot-spare
▸ 2 x 10GbE NIC, bonded
Worker Node
▸ 2 x 12 CPU cores
▸ 256 GB RAM
▸ 2 x 1 TB SSD, RAID 1
▸ 2 x 10GbE NIC, bonded
Cloud
All nodes are the same
▸ Typically more nodes vs. bare metal
▸ Storage (AWS/Azure)
  ▸ Local/EBS/Premium
▸ VM types (AWS/Azure)
  ▸ >= r3.4xlarge / >= A9
Virtualized On-prem
Master Node (typically more than bare metal)
▸ 2 x 4 vCPU cores
▸ 24 GB vRAM
▸ 1 TB SAN/NAS
▸ 2 x 10GbE vNIC, bonded
Worker Node w/ DAS
▸ 2 x 12 CPU cores
▸ 256 GB RAM
▸ 1 TB SAN/NAS
▸ 2 x 10GbE NIC, bonded