Building the Right Platform Architecture for Hadoop
ROMMEL GARCIA
HORTONWORKS
#whoami
▸ Sr. Solutions Engineer & Platform Architect @hortonworks
▸ Global Security SME Lead @hortonworks
▸ Author of “Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture”
▸ Runs Atlanta Hadoop User Group
Overview
The Elephant keeps getting bigger
THE HADOOP ECOSYSTEM
THE TWO FACES OF HADOOP
Hadoop Cluster
Master Nodes Worker Nodes
Manage cluster state
Manage data & job state
Manage CPU resources
Manage RAM resources
Manage I/O resources
Manage Network resources
RUN W/ YARN
PLATFORM FOCUS
▸ Performance (SLA)
▸ Scale (Bursts)
▸ Speed (RT)
▸ Throughput (data/time, compute/time)
▸ Resiliency (HA)
▸ Graceful Degradation (Throttling/Failure Mgt.)
Network
It’s not Pyramidal
NETWORK IS EVERYTHING!
[Figure: compass labels N, E, S, W]
Workloads
Your workload no longer affects me
HADOOP AND THE WORLD OF WORKLOADS
Workloads (YARN)
Hadoop Cluster
Deployment
OLAP, OLTP, DATA SCIENCE, STREAMING
Data (HDFS)
STRUCTURED, UNSTRUCTURED
HADOOP WORKLOAD MANAGEMENT
Hadoop Cluster
YARN
PHYSICAL MEMORY/CPU
queues
containers (C1, C2, C3 ... CN in each queue)
DEFAULT GRID SETTINGS FOR ALL WORKLOADS
▸ ext4 or XFS for Worker Nodes ENFORCED
▸ Transparent Huge Pages (THP) Compaction OFF
▸ Masters’ Swappiness = 1, Workers’ Swap turned OFF
▸ Jumbo Frames ENABLED
▸ IO Scheduling “deadline” ENABLED
▸ Limiting “processes” and “files” ENFORCED
▸ Name Service Cache ENABLED
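The checklist above can be turned into a simple audit. What follows is a minimal sketch, not a real tool: the dictionary keys (`data_fs`, `thp_defrag`, `nofile_limit` and so on) are hypothetical names chosen for illustration, and the values are assumed to have been collected from each node by some other means. It treats an MTU of 9000 as "jumbo frames enabled" and a THP defrag value of `never` as "compaction off".

```python
from typing import Any, Dict, List


def check_node(node: Dict[str, Any]) -> List[str]:
    """Return the deviations of one node from the default grid settings."""
    problems = []
    role = node.get("role")
    if role == "worker":
        # ext4 or XFS enforced on worker nodes
        if node.get("data_fs") not in ("ext4", "xfs"):
            problems.append("worker data disks should use ext4 or XFS")
        # swap turned off on workers
        if node.get("swap_enabled", True):
            problems.append("swap should be turned off on workers")
    elif role == "master":
        # swappiness = 1 on masters
        if node.get("swappiness") != 1:
            problems.append("swappiness should be 1 on masters")
    if node.get("thp_defrag") != "never":
        problems.append("THP compaction should be off")
    if node.get("mtu", 1500) < 9000:
        problems.append("jumbo frames should be enabled")
    if node.get("io_scheduler") != "deadline":
        problems.append("I/O scheduler should be deadline")
    if not node.get("nofile_limit") or not node.get("nproc_limit"):
        problems.append("process and file limits should be set")
    if not node.get("nscd_running", False):
        problems.append("name service cache should be running")
    return problems
```

An empty list means the node matches the defaults; anything else names the setting to fix.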
OLAP
Give back my precious SQL!
HIVE/SPARK SQL WORKLOAD BEHAVIOR
▸ Hive/Spark SQL workload, depends on YARN
▸ Near-realtime to Batch SLA
▸ Large data sets; consumes lots of memory, moderate CPU usage
▸ Hundreds to hundreds of thousands of analytical jobs daily
▸ Typically Memory bound first, then I/O
HIVE/SPARK SQL WORKLOAD PARALLELISM
▸ Hive
▸ Hive auto-parallelism ENABLED
▸ Hive on Tez ENABLED
▸ Reuse Tez Session ENABLED
▸ ORC for Hive ENFORCED
▸ Spark
▸ Repartition Spark SQL RDD ENFORCED
▸ 2GB to 4GB YARN container size
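To see what the 2GB to 4GB container size means for a single worker, here is a rough sizing sketch. The 32 GB held back for the OS and Hadoop daemons is an assumption made for this example, not a figure from the slides, and the function name is hypothetical.

```python
def containers_per_worker(node_ram_gb: int, reserved_gb: int, container_gb: int) -> int:
    """Estimate how many YARN containers of a given size fit on one worker.

    reserved_gb is memory kept for the OS and daemons (an assumed value).
    """
    if not 2 <= container_gb <= 4:
        raise ValueError("the slide recommends 2GB to 4GB containers")
    usable_gb = node_ram_gb - reserved_gb
    if usable_gb < container_gb:
        raise ValueError("not enough memory left for a single container")
    return usable_gb // container_gb
```

For a 256 GB worker with 32 GB reserved, `containers_per_worker(256, 32, 4)` gives 56 and `containers_per_worker(256, 32, 2)` gives 112.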
HIVE/SPARK SQL WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 6 CPU Cores
▸ 128 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 8 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12 x 4TB SATA/SAS/NL-SAS
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Batch - S3/Blob/ADLS
▸ Interactive - EBS/Premium
▸ Near-realtime - Local/Local
▸ VM types (AWS/Azure)
▸ >= m4.4xlarge (EBS) / >= A9 (Blob/Premium)
▸ >= i2.4xlarge (Local) / >= DS14 (Premium/Local)
▸ >= d2.2xlarge (Local) / >= DS14_v2 (Premium/Local)
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node
▸ 6 vCPU Cores
▸ 32 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe NIC Bonded
▸ Storage (data)
▸ SAN/NAS
▸ Appliance
▸ NetApp
▸ Isilon
▸ vBlock
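The cloud storage guidance in the deployment models above (batch, interactive, near-realtime) reads naturally as a lookup table. The sketch below only encodes what the slide states; the function and dictionary names are hypothetical.

```python
# SLA tier -> storage choice per cloud, as listed on the slide.
STORAGE_BY_SLA = {
    "batch": {"aws": "S3", "azure": "Blob/ADLS"},
    "interactive": {"aws": "EBS", "azure": "Premium"},
    "near-realtime": {"aws": "Local", "azure": "Local"},
}


def storage_for(sla: str, cloud: str) -> str:
    """Look up the suggested storage for a Hive/Spark SQL SLA tier."""
    return STORAGE_BY_SLA[sla.lower()][cloud.lower()]
```

For example, `storage_for("Batch", "AWS")` returns `"S3"`, while an unknown tier or cloud raises `KeyError`.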
OLTP
Come on, OLTP is fun!
HBASE/SOLR WORKLOADS BEHAVIOR
▸ HBase & Solr workload
▸ Realtime, sub-second SLA
▸ Large data set
▸ Millions to Billions of hits per day
▸ HBase
▸ HBase for GET/PUT/SCAN
▸ Typically IO bound first, then memory for HBase
▸ Solr
▸ Solr for Search
▸ Typically CPU bound first, then memory for Solr
HBASE WORKLOAD PARALLELISM
▸ Use HBase Java API/Phoenix, not ThriftServer
▸ HBase “short circuit” reads ENABLED
▸ HBase BucketCache ENABLED
▸ Manage HBase Compaction for block locality
▸ HBase Read HA ENABLED
▸ Properly design “rowkey” and ColumnFamily
▸ Properly design RPC Handlers
▸ Minimize GC pauses
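The slide does not say what a properly designed rowkey looks like. One common technique, shown here as an illustration rather than as the author's recommendation, is to prefix the natural key with a hash-derived salt so that sequential keys spread across regions instead of hitting one RegionServer. The bucket count of 16 and the `|` separator are assumptions for this sketch.

```python
import hashlib


def salted_rowkey(natural_key: str, buckets: int = 16) -> bytes:
    """Prefix a key with a stable two-digit salt in the range [0, buckets)."""
    if not 1 <= buckets <= 100:
        raise ValueError("buckets must be between 1 and 100 for a two-digit salt")
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    salt = digest[0] % buckets  # same key always maps to the same bucket
    return f"{salt:02d}|{natural_key}".encode("utf-8")
```

The trade-off is that a range SCAN over the natural key now needs one scan per bucket, so salting suits GET/PUT-heavy tables better than scan-heavy ones.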
HBASE WORKLOAD DEPLOYMENT MODEL
Bare Metal
Master Node (ZK, HBase Master)
▸ 2 x 6 CPU Cores
▸ 128 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 8-12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12-24 x 4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local
▸ VM types (AWS/Azure)
▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 8-12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12-24 x 4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
SOLR WORKLOAD PARALLELISM
▸ # of Shards I/O operations Speed of disk
▸ # of threads Indexing Speed
▸ Properly manage RAM Buffer Size
▸ Merge Factor = 25 (indexing), 2 (search)
▸ Use large disk cache
▸ Minimize GC pauses
SOLR WORKLOAD DEPLOYMENT MODEL
Bare Metal
Master Node (ZK)
▸ 2 x 4 CPU Cores
▸ 32 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 10 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 12 x 2-4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local
▸ VM types (AWS/Azure)
▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 10 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12 x 2-4TB SATA/SAS/NL-SAS/SSD
▸ 2 x 10Gbe NIC Bonded
Data Science
But Captain, one does not simply warp into Mordor
DATA SCIENCE WORKLOAD BEHAVIOR
▸ Spark ML workload, depends on YARN
▸ Near-realtime to Batch SLA
▸ Runs “Monte Carlo” type of processing
▸ Hundreds to hundreds of thousands of analytical jobs daily
▸ CPU bound first, then memory
SPARK ML WORKLOAD PARALLELISM
▸ YARN ENABLED
▸ Cache data
▸ Serialized/Raw for fast access/processing
▸ Off-heap, slower processing
▸ Use Checkpointing
▸ Use Broadcast Variables to minimize network traffic
▸ Minimize GC by limiting object creation, use Builders
▸ executor-memory = between 8GB and 64GB
▸ executor-cores = between 2 and 4
▸ num-executors = with caching, size for datasize * 2 as total app memory
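The three executor settings above combine into simple arithmetic. The sketch below assumes the cached case from the slide (total application memory of twice the data size) and divides that by the executor memory to get an executor count; the function name and the default values are illustrative assumptions. The flags it emits are the standard `spark-submit` options.

```python
import math
from typing import List


def plan_executors(dataset_gb: float, executor_memory_gb: int = 16,
                   executor_cores: int = 4) -> List[str]:
    """Build spark-submit sizing flags for a cached dataset."""
    if not 8 <= executor_memory_gb <= 64:
        raise ValueError("executor memory should be between 8GB and 64GB")
    if not 2 <= executor_cores <= 4:
        raise ValueError("executor cores should be between 2 and 4")
    total_app_memory_gb = dataset_gb * 2  # slide's rule of thumb with caching
    num_executors = max(1, math.ceil(total_app_memory_gb / executor_memory_gb))
    return [
        "--master", "yarn",
        "--executor-memory", f"{executor_memory_gb}g",
        "--executor-cores", str(executor_cores),
        "--num-executors", str(num_executors),
    ]
```

A 100 GB cached dataset with 16 GB executors needs 200 GB in total, which rounds up to 13 executors.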
SPARK ML WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 6 CPU Cores
▸ 64 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 12 CPU Cores
▸ 256 - 512 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12 x 4TB SATA/SAS/NL-SAS
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local
▸ VM types (AWS/Azure)
▸ >= d2.4xlarge / >= D14_v2
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 48 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 8-12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 12-24 x 4TB SATA/SAS/NL-SAS
▸ 2 x 10Gbe NIC Bonded
Streaming
Stream me in, Scotty!
NIFI/STORM WORKLOAD BEHAVIOR
▸ NiFi & Storm Streaming workloads
▸ Always “ON” data ingest and processing
▸ Guarantees data delivery
▸ Highly distributed
▸ Simple event processing -> complex event processing
NIFI WORKLOAD PARALLELISM
▸ Network bound first, then memory or cpu
▸ Go granular on NiFi processors
▸ nifi.bored.yield.duration=10
▸ RAID 10 for repositories: Content, FlowFile and Provenance
▸ nifi.queue.swap.threshold=20000
▸ No sharing of disk across repositories, use RAID 10
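The NiFi settings above live in `nifi.properties`. The following sketch parses such a file and checks that the three repositories are not configured under the same or nested paths. This is only a proxy for "no shared disk": it assumes each repository sits under its own mount point, and the `/repo_*` paths in the sample are hypothetical. The sample writes the yield duration with a unit (`10 millis`), which is how NiFi's stock file expresses that value.

```python
from typing import Dict

REPO_KEYS = (
    "nifi.flowfile.repository.directory",
    "nifi.content.repository.directory.default",
    "nifi.provenance.repository.directory.default",
)

SAMPLE = """\
# Hypothetical mount points, one per repository
nifi.bored.yield.duration=10 millis
nifi.queue.swap.threshold=20000
nifi.flowfile.repository.directory=/repo_flowfile/flowfile_repository
nifi.content.repository.directory.default=/repo_content/content_repository
nifi.provenance.repository.directory.default=/repo_provenance/provenance_repository
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def repositories_overlap(props: Dict[str, str]) -> bool:
    """True if any two repository directories are equal or nested."""
    dirs = [props[key].rstrip("/") for key in REPO_KEYS]
    for i, first in enumerate(dirs):
        for second in dirs[i + 1:]:
            if (first == second
                    or first.startswith(second + "/")
                    or second.startswith(first + "/")):
                return True
    return False
```

`repositories_overlap(parse_properties(SAMPLE))` is `False` for the sample; pointing two repositories at the same tree makes it `True`.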
NIFI WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 4 CPU Cores
▸ 32 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 8 CPU Cores
▸ 128 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 6 x 1TB SATA/SAS/NL-SAS RAID 10
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local/EBS/Premium
▸ VM types (AWS/Azure)
▸ >= m4.4xlarge / >= A9
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 16 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 8 CPU Cores
▸ 128 GB RAM
▸ 2 x 1TB SSD (OS) RAID 1
▸ 6 x 1TB SATA/SAS/NL-SAS RAID 10
▸ 2 x 10Gbe NIC Bonded
STORM WORKLOAD PARALLELISM
▸ CPU bound first, then memory
▸ Keep Tuple processing light, quick execution of execute()
▸ Use Google’s Guava for bolt-local caches (mem <1GB)
▸ Shared caches across bolts: HBase, Phoenix, Redis or Memcached
▸ # of workers memory
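The bolt-local cache idea can be shown in a few lines. Storm bolts are usually written in Java and the slide recommends Guava for this; the Python class below is only an illustration of the same pattern (a small, size-bounded, least-recently-used cache owned by one bolt instance), and its name and default size are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class BoltLocalCache:
    """Size-bounded LRU cache meant to live inside a single bolt instance."""

    def __init__(self, max_entries: int = 10000) -> None:
        self._max_entries = max_entries
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        """Return the cached value, calling loader(key) only on a miss."""
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = loader(key)
        self._data[key] = value
        if len(self._data) > self._max_entries:
            self._data.popitem(last=False)  # evict least recently used
        return value
```

Keeping the lookup in memory keeps `execute()` short, which is the point of the first two bullets; data that must be visible to several bolts belongs in one of the shared stores listed above instead.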
STORM WORKLOAD DEPLOYMENT MODELS
Bare Metal
Master Node
▸ 2 x 4 CPU Cores
▸ 32 GB RAM
▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
▸ 2 x 10Gbe NIC Bonded
Worker Node
▸ 2 x 12 CPU Cores
▸ 256 GB RAM
▸ 2 x 1TB SSD RAID 1
▸ 2 x 10Gbe NIC Bonded
Cloud
All Nodes are the same
▸ Typically more nodes vs bare metal
▸ Storage (AWS/Azure)
▸ Local/EBS/Premium
▸ VM types (AWS/Azure)
▸ >= r3.4xlarge / >= A9
Virtualized On-prem
More Master Nodes vs Bare Metal
▸ 2 x 4 vCPU Cores
▸ 24 GB vRAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe vNIC Bonded
Worker Node w/ DAS
▸ 2 x 12 CPU Cores
▸ 256 GB RAM
▸ 1TB SAN/NAS
▸ 2 x 10Gbe NIC Bonded
http://hortonworks.com/products/sandbox/
?
THANK YOU!
@rommelgarcia
/in/rommelgarcia
#hortonworks

Weitere ähnliche Inhalte

Was ist angesagt?

Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
DataStax
 
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLES
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLESQuick-and-Easy Deployment of a Ceph Storage Cluster with SLES
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLES
Jan Kalcic
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with Kubernetes
Jonathan Katz
 

Was ist angesagt? (20)

Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionCeph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
 
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSCassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
 
How Prometheus Store the Data
How Prometheus Store the DataHow Prometheus Store the Data
How Prometheus Store the Data
 
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
 
Jumbo Mumbo in OpenStack
Jumbo Mumbo in OpenStackJumbo Mumbo in OpenStack
Jumbo Mumbo in OpenStack
 
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLES
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLESQuick-and-Easy Deployment of a Ceph Storage Cluster with SLES
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLES
 
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
 
Containers > VMs
Containers > VMsContainers > VMs
Containers > VMs
 
Red Hat Enterprise Linux OpenStack Platform on Inktank Ceph Enterprise
Red Hat Enterprise Linux OpenStack Platform on Inktank Ceph EnterpriseRed Hat Enterprise Linux OpenStack Platform on Inktank Ceph Enterprise
Red Hat Enterprise Linux OpenStack Platform on Inktank Ceph Enterprise
 
Nova: Openstack Compute-as-a-service
Nova: Openstack Compute-as-a-serviceNova: Openstack Compute-as-a-service
Nova: Openstack Compute-as-a-service
 
How to Run Solr on Docker and Why
How to Run Solr on Docker and WhyHow to Run Solr on Docker and Why
How to Run Solr on Docker and Why
 
DataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The Sequel
 
Managing 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops Team
Managing 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops TeamManaging 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops Team
Managing 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops Team
 
Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with Kubernetes
 
Redis - for duplicate detection on real time stream
Redis - for duplicate detection on real time streamRedis - for duplicate detection on real time stream
Redis - for duplicate detection on real time stream
 
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
Building ClickHouse and Making Your First Contribution: A Tutorial_06.10.2021
 

Andere mochten auch

Andere mochten auch (20)

Student Pipeline to Open Source Communities using HFOSS
Student Pipeline to Open Source Communities using HFOSSStudent Pipeline to Open Source Communities using HFOSS
Student Pipeline to Open Source Communities using HFOSS
 
Contribution & Confidence
Contribution & ConfidenceContribution & Confidence
Contribution & Confidence
 
Civic Hacking 201: Successful techniques for civic tech
Civic Hacking 201: Successful techniques for civic techCivic Hacking 201: Successful techniques for civic tech
Civic Hacking 201: Successful techniques for civic tech
 
Modern Container Orchestration (Without Breaking the Bank)
Modern Container Orchestration (Without Breaking the Bank)Modern Container Orchestration (Without Breaking the Bank)
Modern Container Orchestration (Without Breaking the Bank)
 
Scaling Your Logging Infrastructure With Syslog-NG
Scaling Your Logging Infrastructure With Syslog-NGScaling Your Logging Infrastructure With Syslog-NG
Scaling Your Logging Infrastructure With Syslog-NG
 
DevOps for Managers
DevOps for ManagersDevOps for Managers
DevOps for Managers
 
Data Encryption at Rest
Data Encryption at RestData Encryption at Rest
Data Encryption at Rest
 
The Many Ways to Test Your React App
The Many Ways to Test Your React AppThe Many Ways to Test Your React App
The Many Ways to Test Your React App
 
Cross-platform Mobile Development on Open Source
Cross-platform Mobile Development on Open SourceCross-platform Mobile Development on Open Source
Cross-platform Mobile Development on Open Source
 
You Don't Have to Moodle: Ways to leverage the power of Wordpress for online ...
You Don't Have to Moodle: Ways to leverage the power of Wordpress for online ...You Don't Have to Moodle: Ways to leverage the power of Wordpress for online ...
You Don't Have to Moodle: Ways to leverage the power of Wordpress for online ...
 
Graphs are Eating the World
Graphs are Eating the WorldGraphs are Eating the World
Graphs are Eating the World
 
Understanding Open Source Licenses
Understanding Open Source LicensesUnderstanding Open Source Licenses
Understanding Open Source Licenses
 
How Companies can Effectively Work with Open Source Communities
How Companies can Effectively Work with Open Source CommunitiesHow Companies can Effectively Work with Open Source Communities
How Companies can Effectively Work with Open Source Communities
 
The Power of Openness
The Power of OpennessThe Power of Openness
The Power of Openness
 
How To Get Your Next Job as a Developer
How To Get Your Next Job as a DeveloperHow To Get Your Next Job as a Developer
How To Get Your Next Job as a Developer
 
BFFs: UX & SEO Partnering to Design Successful Products
BFFs: UX & SEO Partnering to Design Successful ProductsBFFs: UX & SEO Partnering to Design Successful Products
BFFs: UX & SEO Partnering to Design Successful Products
 
Building a Distributed & Automated Open Source Program at Netflix
Building a Distributed & Automated Open Source Program at NetflixBuilding a Distributed & Automated Open Source Program at Netflix
Building a Distributed & Automated Open Source Program at Netflix
 
Marketing is not all fluff; engineering is not all math
Marketing is not all fluff; engineering is not all mathMarketing is not all fluff; engineering is not all math
Marketing is not all fluff; engineering is not all math
 
The New Era of Community
The New Era of CommunityThe New Era of Community
The New Era of Community
 
CSS Grid Layout
CSS Grid LayoutCSS Grid Layout
CSS Grid Layout
 

Ähnlich wie Building the Right Platform Architecture for Hadoop

Developing with Cassandra
Developing with CassandraDeveloping with Cassandra
Developing with Cassandra
Sperasoft
 
Evaluating NoSQL Performance: Time for Benchmarking
Evaluating NoSQL Performance: Time for BenchmarkingEvaluating NoSQL Performance: Time for Benchmarking
Evaluating NoSQL Performance: Time for Benchmarking
Sergey Bushik
 
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
JAXLondon2014
 
Scaling Cassandra for Big Data
Scaling Cassandra for Big DataScaling Cassandra for Big Data
Scaling Cassandra for Big Data
DataStax Academy
 

Ähnlich wie Building the Right Platform Architecture for Hadoop (20)

Open Source Data Deduplication
Open Source Data DeduplicationOpen Source Data Deduplication
Open Source Data Deduplication
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
 
MySQL HA
MySQL HAMySQL HA
MySQL HA
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_Plan
 
Developing with Cassandra
Developing with CassandraDeveloping with Cassandra
Developing with Cassandra
 
Evaluating NoSQL Performance: Time for Benchmarking
Evaluating NoSQL Performance: Time for BenchmarkingEvaluating NoSQL Performance: Time for Benchmarking
Evaluating NoSQL Performance: Time for Benchmarking
 
Database performance tuning for SSD based storage
Database  performance tuning for SSD based storageDatabase  performance tuning for SSD based storage
Database performance tuning for SSD based storage
 
SSD based storage tuning for databases
SSD based storage tuning for databasesSSD based storage tuning for databases
SSD based storage tuning for databases
 
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Inven...
 
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
How to randomly access data in close-to-RAM speeds but a lower cost with SSD’...
 
SSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax LondonSSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax London
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
 
ceph-barcelona-v-1.2
ceph-barcelona-v-1.2ceph-barcelona-v-1.2
ceph-barcelona-v-1.2
 
Ceph barcelona-v-1.2
Ceph barcelona-v-1.2Ceph barcelona-v-1.2
Ceph barcelona-v-1.2
 
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash TechnologyCeph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
 
Storage spaces direct webinar
Storage spaces direct webinarStorage spaces direct webinar
Storage spaces direct webinar
 
Webinar NETGEAR - ReadyNAS, le novità hardware e software
Webinar NETGEAR - ReadyNAS, le novità hardware e softwareWebinar NETGEAR - ReadyNAS, le novità hardware e software
Webinar NETGEAR - ReadyNAS, le novità hardware e software
 
Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster RedundancyPilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy
 
Scaling Cassandra for Big Data
Scaling Cassandra for Big DataScaling Cassandra for Big Data
Scaling Cassandra for Big Data
 

Mehr von All Things Open

Open Source and Public Policy
Open Source and Public PolicyOpen Source and Public Policy
Open Source and Public Policy
All Things Open
 
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
All Things Open
 
How to Write & Deploy a Smart Contract
How to Write & Deploy a Smart ContractHow to Write & Deploy a Smart Contract
How to Write & Deploy a Smart Contract
All Things Open
 
Scaling Web Applications with Background
Scaling Web Applications with BackgroundScaling Web Applications with Background
Scaling Web Applications with Background
All Things Open
 
Build Developer Experience Teams for Open Source
Build Developer Experience Teams for Open SourceBuild Developer Experience Teams for Open Source
Build Developer Experience Teams for Open Source
All Things Open
 
Sudo – Giving access while staying in control
Sudo – Giving access while staying in controlSudo – Giving access while staying in control
Sudo – Giving access while staying in control
All Things Open
 
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML ApplicationsFortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
All Things Open
 
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
All Things Open
 

Mehr von All Things Open (20)

Building Reliability - The Realities of Observability
Building Reliability - The Realities of ObservabilityBuilding Reliability - The Realities of Observability
Building Reliability - The Realities of Observability
 
Modern Database Best Practices
Modern Database Best PracticesModern Database Best Practices
Modern Database Best Practices
 
Open Source and Public Policy
Open Source and Public PolicyOpen Source and Public Policy
Open Source and Public Policy
 
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
 
The State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil NashThe State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil Nash
 
Total ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScriptTotal ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScript
 
What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?
 
How to Write & Deploy a Smart Contract
How to Write & Deploy a Smart ContractHow to Write & Deploy a Smart Contract
How to Write & Deploy a Smart Contract
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 
DEI Challenges and Success
DEI Challenges and SuccessDEI Challenges and Success
DEI Challenges and Success
 
Scaling Web Applications with Background
Scaling Web Applications with BackgroundScaling Web Applications with Background
Scaling Web Applications with Background
 
Supercharging tutorials with WebAssembly
Supercharging tutorials with WebAssemblySupercharging tutorials with WebAssembly
Supercharging tutorials with WebAssembly
 
Using SQL to Find Needles in Haystacks
Using SQL to Find Needles in HaystacksUsing SQL to Find Needles in Haystacks
Using SQL to Find Needles in Haystacks
 
Configuration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit InterceptConfiguration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit Intercept
 
Scaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship ProgramScaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship Program
 
Build Developer Experience Teams for Open Source
Build Developer Experience Teams for Open SourceBuild Developer Experience Teams for Open Source
Build Developer Experience Teams for Open Source
 
Deploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache BeamDeploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache Beam
 
Sudo – Giving access while staying in control
Sudo – Giving access while staying in controlSudo – Giving access while staying in control
Sudo – Giving access while staying in control
 
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML ApplicationsFortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
 
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Kürzlich hochgeladen (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

Building the Right Platform Architecture for Hadoop

  • 1. Building the right Platform Architecture for Hadoop ROMMEL GARCIA HORTONWORKS
  • 2. #whoami ▸ Sr. Solutions Engineer & Platform Architect @hortonworks ▸ Global Security SME Lead @hortonworks ▸ Author of “Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture” ▸ Runs Atlanta Hadoop User Group
  • 5. THE TWO FACES OF HADOOP Hadoop Cluster Master Nodes Worker Nodes Manage cluster state Manage data & job state Manage CPU resources Manage RAM resources Manage I/O resources Manage Network resources RUN W/ YARN
  • 6. PLATFORM FOCUS ▸ Performance (SLA) ▸ Scale (Bursts) ▸ Speed (RT) ▸ Throughput (data/time, compute/time) ▸ Resiliency (HA) ▸ Graceful Degradation (Throttling/Failure Mgt.)
  • 9. Workloads Your workload no longer affects me
  • 10. HADOOP AND THE WORLD OF WORKLOADS ▸ Workloads (YARN): OLAP, OLTP, DATA SCIENCE, STREAMING ▸ Data (HDFS): STRUCTURED, UNSTRUCTURED ▸ Hadoop Cluster Deployment
  • 11. HADOOP WORKLOAD MANAGEMENT [Diagram: a Hadoop Cluster in which YARN divides PHYSICAL MEMORY/CPU into queues, each queue holding containers C1, C2, C3 ... CN]
  • 12. DEFAULT GRID SETTINGS FOR ALL WORKLOADS ▸ ext4 or XFS for Worker Nodes ENFORCED ▸ Transparent Huge Pages (THP) Compaction OFF ▸ Masters’ Swappiness = 1, Workers’ Swap turned OFF ▸ Jumbo Frames ENABLED ▸ IO Scheduling “deadline” ENABLED ▸ Limiting “processes” and “files” ENFORCED ▸ Name Service Cache ENABLED
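
A few of the grid settings on slide 12 can be read from standard Linux files (`/sys/kernel/mm/transparent_hugepage/defrag`, `/proc/sys/vm/swappiness`, `SwapTotal` in `/proc/meminfo`, `/sys/block/<dev>/queue/scheduler`, `/sys/class/net/<iface>/mtu`). The sketch below is only an illustration of how such a check might look: the function names are hypothetical, the 9000-byte MTU threshold for jumbo frames is an assumption not stated on the slide, and newer kernels may report `mq-deadline` rather than `deadline`.

```python
import re


def active_option(sysfs_text):
    """Return the bracketed (active) choice from a sysfs line such as
    'always madvise [never]' or 'noop [deadline] cfq'."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else sysfs_text.strip()


def check_node(role, thp_defrag, swappiness, swap_total_kb, io_scheduler, mtu):
    """Compare one node's raw settings with the slide's defaults.

    role is 'master' or 'worker'; the other arguments are the raw file
    contents (or parsed numbers) read from the node. Returns a list of
    findings; an empty list means the checked settings match.
    """
    findings = []
    if active_option(thp_defrag) != "never":
        findings.append("THP compaction is not off")
    if role == "master" and int(swappiness) != 1:
        findings.append("master swappiness should be 1")
    if role == "worker" and int(swap_total_kb) != 0:
        findings.append("worker swap should be turned off")
    if active_option(io_scheduler) != "deadline":
        findings.append("I/O scheduler is not deadline")
    # Assumption: jumbo frames means an MTU of at least 9000 bytes.
    if int(mtu) < 9000:
        findings.append("jumbo frames are not enabled")
    return findings
```

The function takes the file contents as arguments rather than reading them itself, so the same check can run against values collected remotely from many nodes.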
  • 13. OLAP Give back my precious SQL !
  • 14. HIVE/SPARK SQL WORKLOAD BEHAVIOR ▸ Hive/Spark SQL workload, depends on YARN ▸ Near-realtime to Batch SLA ▸ Large data set, consumes lots of memory, fair cpu usage ▸ Hundreds to hundreds of thousands of analytical jobs daily ▸ Typically Memory bound first, then I/O
  • 15. HIVE/SPARK SQL WORKLOAD PARALLELISM
      ▸ Hive
          ▸ Hive auto-parallelism ENABLED
          ▸ Hive on Tez ENABLED
          ▸ Reuse Tez Session ENABLED
          ▸ ORC for Hive ENFORCED
      ▸ Spark
          ▸ Repartition Spark SQL RDD ENFORCED
          ▸ 2GB to 4GB YARN container size
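
To see what the 2GB to 4GB container size on slide 15 means for parallelism, it can help to work out how many containers fit on one worker. The sketch below is plain arithmetic, not a YARN API: the function name is hypothetical, and the amount of RAM held back for the OS and daemons (32 GB here) is an assumed figure, not one from the slides. The 256 GB worker comes from the bare metal model on slide 16.

```python
def containers_per_node(node_ram_gb, reserved_gb, container_gb):
    """Number of equally sized YARN containers that fit on one worker.

    node_ram_gb  - physical RAM of the worker
    reserved_gb  - RAM held back for the OS and Hadoop daemons (assumption)
    container_gb - container size, 2 to 4 GB per the slide
    """
    if container_gb <= 0:
        raise ValueError("container_gb must be positive")
    usable_gb = max(node_ram_gb - reserved_gb, 0)
    return usable_gb // container_gb


# 256 GB worker with an assumed 32 GB reserved
small = containers_per_node(256, 32, 2)   # 2 GB containers
large = containers_per_node(256, 32, 4)   # 4 GB containers
print(small, large)  # prints "112 56"
```

Halving the container size doubles the number of tasks a node can run at once, which is the trade-off behind the 2GB to 4GB range for a memory-bound workload.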
  • 16. HIVE/SPARK SQL WORKLOAD DEPLOYMENT MODELS
      Bare Metal
        Master Node
          ▸ 2 x 6 CPU Cores
          ▸ 128 GB RAM
          ▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
          ▸ 2 x 10GbE NIC Bonded
        Worker Node
          ▸ 2 x 8 CPU Cores
          ▸ 256 GB RAM
          ▸ 2 x 1TB SSD RAID 1
          ▸ 12 x 4TB SATA/SAS/NL-SAS
          ▸ 2 x 10GbE NIC Bonded
      Cloud
        All Nodes are the same
          ▸ Typically more nodes vs bare metal
          ▸ Storage (AWS/Azure)
              ▸ Batch - S3/Blob/ADLS
              ▸ Interactive - EBS/Premium
              ▸ Near-realtime - Local/Local
          ▸ VM types (AWS/Azure)
              ▸ >= m4.4xlarge (EBS) / >= A9 (Blob/Premium)
              ▸ >= i2.4xlarge (Local) / >= DS14 (Premium/Local)
              ▸ >= d2.2xlarge (Local) / >= DS14_v2 (Premium/Local)
      Virtualized On-prem
        More Master Node vs Bare Metal
          ▸ 2 x 4 vCPU Cores
          ▸ 48 GB vRAM
          ▸ 1TB SAN/NAS
          ▸ 2 x 10GbE vNIC Bonded
        Worker Node
          ▸ 6 vCPU Cores
          ▸ 32 GB vRAM
          ▸ 1TB SAN/NAS
          ▸ 2 x 10GbE NIC Bonded
        Storage (data)
          ▸ SAN/NAS
          ▸ Appliance: NetApp, Isilon, vBlock
  • 18. HBASE/SOLR WORKLOADS BEHAVIOR
      ▸ HBase & Solr workload
      ▸ Realtime, sub-second SLA
      ▸ Large data set
      ▸ Millions to billions of hits per day
      ▸ HBase
          ▸ HBase for GET/PUT/SCAN
          ▸ Typically I/O bound first, then memory
      ▸ Solr
          ▸ Solr for Search
          ▸ Typically CPU bound first, then memory
  • 19. HBASE WORKLOAD PARALLELISM ▸ Use HBase Java API/Phoenix, not ThriftServer ▸ HBase “short circuit” reads ENABLED ▸ HBase BucketCache ENABLED ▸ Manage HBase Compaction for block locality ▸ HBase Read HA ENABLED ▸ Properly design “rowkey” and ColumnFamily ▸ Properly design RPC Handlers ▸ Minimize GC pauses
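
Slide 19 asks for a properly designed rowkey without saying how. One widely used technique, shown here only as an illustration and not as the speaker's method, is salting: prefixing the natural key with a small hash-derived bucket so that sequential keys (timestamps, counters) spread across regions instead of hammering one. The function name, the bucket count and the `|` separator are all hypothetical choices.

```python
import hashlib


def salted_rowkey(natural_key, buckets=16):
    """Prefix a natural key with a stable bucket number.

    The bucket is derived from a hash of the key, so the same key always
    maps to the same prefix and point GETs stay cheap. SCANs over a key
    range must fan out over every bucket.
    """
    if not 1 <= buckets <= 100:
        raise ValueError("buckets must be between 1 and 100")
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    salt = digest[0] % buckets
    # Two-digit, zero-padded prefix keeps keys sorted by bucket first.
    return f"{salt:02d}|{natural_key}"


for key in ("2016-05-01T10:00:00", "2016-05-01T10:00:01"):
    print(salted_rowkey(key))
```

The cost of salting is on the read side: a range scan becomes one scan per bucket, so the bucket count is usually kept close to the number of region servers.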
  • 20. HBASE WORKLOAD DEPLOYMENT MODEL
      Bare Metal
        Master Node (ZK, HBase Master)
          ▸ 2 x 6 CPU Cores
          ▸ 128 GB RAM
          ▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
          ▸ 2 x 10GbE NIC Bonded
        Worker Node
          ▸ 2 x 8-12 CPU Cores
          ▸ 256 GB RAM
          ▸ 2 x 1TB SSD RAID 1
          ▸ 12-24 x 4TB SATA/SAS/NL-SAS/SSD
          ▸ 2 x 10GbE NIC Bonded
      Cloud
        All Nodes are the same
          ▸ Typically more nodes vs bare metal
          ▸ Storage (AWS/Azure)
              ▸ Local
          ▸ VM types (AWS/Azure)
              ▸ >= d2.4xlarge / >= D14_v2
      Virtualized On-prem
        More Master Node vs Bare Metal
          ▸ 2 x 4 vCPU Cores
          ▸ 48 GB vRAM
          ▸ 1TB SAN/NAS
          ▸ 2 x 10GbE vNIC Bonded
        Worker Node w/ DAS
          ▸ 2 x 8-12 CPU Cores
          ▸ 256 GB RAM
          ▸ 2 x 1TB SSD RAID 1
          ▸ 12-24 x 4TB SATA/SAS/NL-SAS/SSD
          ▸ 2 x 10GbE NIC Bonded
  • 21. SOLR WORKLOAD PARALLELISM ▸ # of Shards I/O operations Speed of disk ▸ # of threads Indexing Speed ▸ Properly manage RAM Buffer Size ▸ Merge Factor = 25 (indexing), 2 (search) ▸ Use large disk cache ▸ Minimize GC pauses
  • 22. SOLR WORKLOAD DEPLOYMENT MODEL
      Bare Metal
        Master Node (ZK)
          ▸ 2 x 4 CPU Cores
          ▸ 32 GB RAM
          ▸ 2 x 1TB SSD (OS) RAID 1
          ▸ 2 x 10GbE NIC Bonded
        Worker Node
          ▸ 2 x 10 CPU Cores
          ▸ 256 GB RAM
          ▸ 2 x 1TB SSD (OS) RAID 1
          ▸ 12 x 2 - 4TB SATA/SAS/NL-SAS/SSD
          ▸ 2 x 10GbE NIC Bonded
      Cloud
        All Nodes are the same
          ▸ Typically more nodes vs bare metal
          ▸ Storage (AWS/Azure)
              ▸ Local
          ▸ VM types (AWS/Azure)
              ▸ >= d2.4xlarge / >= D14_v2
      Virtualized On-prem
        More Master Node vs Bare Metal
          ▸ 2 x 4 vCPU Cores
          ▸ 48 GB vRAM
          ▸ 1TB SAN/NAS
          ▸ 2 x 10GbE vNIC Bonded
        Worker Node w/ DAS
          ▸ 2 x 10 CPU Cores
          ▸ 256 GB RAM
          ▸ 2 x 1TB SSD RAID 1
          ▸ 12 x 2 - 4TB SATA/SAS/NL-SAS/SSD
          ▸ 2 x 10GbE NIC Bonded
  • 23. Data Science But Captain, one does not simply warp into Mordor
  • 24. DATA SCIENCE WORKLOAD BEHAVIOR ▸ Spark ML workload, depends on YARN ▸ Near-realtime to Batch SLA ▸ Runs “Monte Carlo” type of processing ▸ Hundreds to hundreds of thousands of analytical jobs daily ▸ CPU bound first, then memory
  • 25. SPARK ML WORKLOAD PARALLELISM
      ▸ YARN ENABLED
      ▸ Cache data
          ▸ Serialized/Raw for fast access/processing
          ▸ Off-heap, slower processing
      ▸ Use Checkpointing
      ▸ Use Broadcast Variables to minimize network traffic
      ▸ Minimize GC by limiting object creation, use Builders
      ▸ executor-memory = between 8GB and 64GB
      ▸ executor-cores = between 2 and 4
      ▸ num-executors = w/ caching, datasize * 2 as total app memory
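
The sizing rule on slide 25 (with caching, budget twice the data size as total application memory) can be turned into an executor count by dividing by the chosen executor memory. The sketch below is one reading of that rule, not a Spark API: the function name is hypothetical, and rounding up to a whole executor is an assumption.

```python
import math


def num_executors(data_size_gb, executor_memory_gb):
    """Executor count for a cached workload.

    Total app memory is taken as data size * 2 (the slide's rule), then
    divided by the per-executor memory, which the slide keeps between
    8 GB and 64 GB.
    """
    if not 8 <= executor_memory_gb <= 64:
        raise ValueError("executor memory should be between 8 and 64 GB")
    total_app_memory_gb = data_size_gb * 2
    return math.ceil(total_app_memory_gb / executor_memory_gb)


# 500 GB of cached data with 16 GB executors
print(num_executors(500, 16))  # prints 63
```

With these inputs the result would map onto `spark-submit --executor-memory 16g --executor-cores 4 --num-executors 63`, where the core count is picked from the slide's 2 to 4 range.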
  • 26. SPARK ML WORKLOAD DEPLOYMENT MODELS
      Bare Metal
        Master Node
          ▸ 2 x 6 CPU Cores
          ▸ 64 GB RAM
          ▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
          ▸ 2 x 10GbE NIC Bonded
        Worker Node
          ▸ 2 x 12 CPU Cores
          ▸ 256 - 512 GB RAM
          ▸ 2 x 1TB SSD RAID 1
          ▸ 12 x 4TB SATA/SAS/NL-SAS
          ▸ 2 x 10GbE NIC Bonded
      Cloud
        All Nodes are the same
          ▸ Typically more nodes vs bare metal
          ▸ Storage (AWS/Azure)
              ▸ Local
          ▸ VM types (AWS/Azure)
              ▸ >= d2.4xlarge / >= D14_v2
      Virtualized On-prem
        More Master Node vs Bare Metal
          ▸ 2 x 4 vCPU Cores
          ▸ 48 GB vRAM
          ▸ 1TB SAN/NAS
          ▸ 2 x 10GbE vNIC Bonded
        Worker Node w/ DAS
          ▸ 2 x 8-12 CPU Cores
          ▸ 256 GB RAM
          ▸ 2 x 1TB SSD RAID 1
          ▸ 12-24 x 4TB SATA/SAS/NL-SAS
          ▸ 2 x 10GbE NIC Bonded
  • 28. NIFI/STORM WORKLOAD BEHAVIOR ▸ NiFi & Storm Streaming workloads ▸ Always “ON” data ingest and processing ▸ Guarantees data delivery ▸ Highly distributed ▸ simple event processing -> complex event processing
  • 29. NIFI WORKLOAD PARALLELISM ▸ Network bound first, then memory or cpu ▸ Go granular on NiFi processors ▸ nifi.bored.yield.duration=10 ▸ RAID 10 for repositories: Content, FlowFile and Provenance ▸ nifi.queue.swap.threshold=20000 ▸ No sharing of disk across repositories, use RAID 10
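
The rule on slide 29 that the Content, FlowFile and Provenance repositories should not share a disk can be checked against `nifi.properties`. The sketch below is a minimal illustration under stated assumptions: the helper names, sample paths and mount list are hypothetical, and the `millis` unit on `nifi.bored.yield.duration` is assumed because the slide gives only `10`. The three repository property names are the ones NiFi uses for its default repositories.

```python
REPO_KEYS = (
    "nifi.flowfile.repository.directory",
    "nifi.content.repository.directory.default",
    "nifi.provenance.repository.directory.default",
)

# Hypothetical sample; the paths are made up for illustration.
SAMPLE = """
nifi.bored.yield.duration=10 millis
nifi.queue.swap.threshold=20000
nifi.flowfile.repository.directory=/data1/flowfile_repository
nifi.content.repository.directory.default=/data2/content_repository
nifi.provenance.repository.directory.default=/data2/provenance_repository
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def mount_for(path, mounts):
    """Return the longest mount point that contains path, or None."""
    candidates = [
        m for m in mounts
        if path == m or path.startswith(m.rstrip("/") + "/")
    ]
    return max(candidates, key=len) if candidates else None


def shared_repositories(props, mounts):
    """Map each mount point to the repositories on it, keeping only
    mount points that hold more than one repository."""
    by_mount = {}
    for key in REPO_KEYS:
        if key in props:
            by_mount.setdefault(mount_for(props[key], mounts), []).append(key)
    return {m: keys for m, keys in by_mount.items() if len(keys) > 1}


print(shared_repositories(parse_properties(SAMPLE), ["/", "/data1", "/data2"]))
```

In the sample, the content and provenance repositories both sit under `/data2`, so that mount point is reported as shared.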
  • 30. NIFI WORKLOAD DEPLOYMENT MODELS
      Bare Metal
        Master Node
          ▸ 2 x 4 CPU Cores
          ▸ 32 GB RAM
          ▸ 2 x 1TB SSD RAID 1
          ▸ 2 x 10GbE NIC Bonded
        Worker Node
          ▸ 2 x 8 CPU Cores
          ▸ 128 GB RAM
          ▸ 2 x 1TB SSD (OS) RAID 1
          ▸ 6 x 1TB SATA/SAS/NL-SAS RAID 10
          ▸ 2 x 10GbE NIC Bonded
      Cloud
        All Nodes are the same
          ▸ Typically more nodes vs bare metal
          ▸ Storage (AWS/Azure)
              ▸ Local/EBS/Premium
          ▸ VM types (AWS/Azure)
              ▸ >= m4.4xlarge / >= A9
      Virtualized On-prem
        More Master Node vs Bare Metal
          ▸ 2 x 4 vCPU Cores
          ▸ 16 GB vRAM
          ▸ 1TB SAN/NAS
          ▸ 2 x 10GbE vNIC Bonded
        Worker Node w/ DAS
          ▸ 2 x 8 CPU Cores
          ▸ 128 GB RAM
          ▸ 2 x 1TB SSD (OS) RAID 1
          ▸ 6 x 1TB SATA/SAS/NL-SAS RAID 10
          ▸ 2 x 10GbE NIC Bonded
  • 31. STORM WORKLOAD PARALLELISM ▸ CPU bound first, then memory ▸ Keep Tuple processing light, quick execution of execute() ▸ Use Google’s Guava for bolt-local caches (mem <1GB) ▸ Shared caches across bolts: HBase, Phoenix, Redis or MemcacheD ▸ # of workers memory
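
Slide 31 recommends Google's Guava for small bolt-local caches. Guava is a Java library; the sketch below is a Python stand-in, not Guava's API, that shows the same idea: a size-bounded, least-recently-used cache kept inside one bolt so that `execute()` stays quick. The class name, the entry-count bound and the loader callback are all hypothetical.

```python
from collections import OrderedDict


class BoltLocalCache:
    """Size-bounded LRU cache meant to live inside a single bolt.

    loader is called on a miss (for example a lookup against a shared
    store); the least recently used entry is evicted once max_entries
    is exceeded, which keeps the cache's memory footprint small.
    """

    def __init__(self, max_entries, loader):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._loader = loader
        self._entries = OrderedDict()

    def get(self, key):
        if key in self._entries:
            # Hit: mark as most recently used and return.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Miss: load, store, then evict the oldest entry if over the bound.
        value = self._loader(key)
        self._entries[key] = value
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)
        return value


cache = BoltLocalCache(max_entries=2, loader=str.upper)
print(cache.get("a"), cache.get("b"), cache.get("a"))  # prints "A B A"
```

Data that several bolts need to see consistently belongs in the shared stores the slide lists (HBase, Phoenix, Redis or MemcacheD) rather than in a per-bolt cache like this one.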
  • 32. STORM WORKLOAD DEPLOYMENT MODELS
      Bare Metal
        Master Node
          ▸ 2 x 4 CPU Cores
          ▸ 32 GB RAM
          ▸ 4 x 1TB SSD RAID 10 plus 1 Hot-spare
          ▸ 2 x 10GbE NIC Bonded
        Worker Node
          ▸ 2 x 12 CPU Cores
          ▸ 256 GB RAM
          ▸ 2 x 1TB SSD RAID 1
          ▸ 2 x 10GbE NIC Bonded
      Cloud
        All Nodes are the same
          ▸ Typically more nodes vs bare metal
          ▸ Storage (AWS/Azure)
              ▸ Local/EBS/Premium
          ▸ VM types (AWS/Azure)
              ▸ >= r3.4xlarge / >= A9
      Virtualized On-prem
        More Master Node vs Bare Metal
          ▸ 2 x 4 vCPU Cores
          ▸ 24 GB vRAM
          ▸ 1TB SAN/NAS
          ▸ 2 x 10GbE vNIC Bonded
        Worker Node w/ DAS
          ▸ 2 x 12 CPU Cores
          ▸ 256 GB RAM
          ▸ 1TB SAN/NAS
          ▸ 2 x 10GbE NIC Bonded
  • 34. ?