© 2009 VMware Inc. All rights reserved
vSphere Big Data Extensions:
Hadoop Reference Architecture and Performance Best Practices
李欣慧
Senior Big Data R&D Engineer
VMware China R&D Center
2
Agenda
▪ Recommended Deployment Topology
▪ Plan Your Cluster
3
Standard Deployment Configuration on Single Worker
[Diagram: a virtualization host running one worker VM (Hadoop Virtual Node 2) with a Datanode and a Tasktracker. The OS image VMDK is on shared storage (SAN/NAS); the remaining VMDKs are on local disks and are formatted as Ext4 for the Datanode and the Tasktracker (mapred.local.dir).]
4
Standard Deployment Configuration on Single Worker
[Diagram: a virtualization host running one worker VM (Hadoop Virtual Node 2) with a Datanode and a Tasktracker. The OS image VMDK and the data VMDKs are all on local disks; the data VMDKs are formatted as Ext4 for the Datanode and the Tasktracker (mapred.local.dir).]
5
Standard Deployment Configuration
[Diagram: a virtualization host running two worker VMs (Hadoop Virtual Node 1 and Node 2), each with a Datanode and a Tasktracker. OS image VMDKs are on shared storage (SAN/NAS); the data VMDKs are on local disks and are formatted as Ext4 for the Datanodes and the Tasktrackers (mapred.local.dir).]
6
Standard Deployment Configuration
[Diagram: a virtualization host running two worker VMs (Hadoop Virtual Node 1 and Node 2), each with a Datanode and a Tasktracker. OS image VMDKs and data VMDKs are all on local disks; the data VMDKs are formatted as Ext4 for the Datanodes and the Tasktrackers (mapred.local.dir).]
7
Standard Deployment Configuration for D/C Separation
[Diagram: a virtualization host running a compute VM (Hadoop Virtual Node 1, Tasktracker only) and a data VM (Hadoop Virtual Node 2, Datanode). OS image VMDKs are on shared storage (SAN/NAS); the data VMDKs are on local disks and are formatted as Ext4.]
8
Data Path for Combined vs. Data/Compute Separation
[Diagram: two virtualization hosts side by side. On each host, Hadoop virtual nodes running TaskTrackers are connected through a virtual switch.]
▪ Serengeti provides local-storage-based temp space for D/C separation.
• Each compute VM needs its own temp space
• The required temp space differs from one application to another
• This can result in wasted space
9
Recommended Topology of Data/Compute Separation
[Diagram: a virtualization host running a compute VM (Hadoop Virtual Node 1, Tasktracker with an Ext4 volume) and a data VM (Hadoop Virtual Node 2, Datanode). OS image VMDKs are on shared storage (SAN/NAS); the data VMDKs are on local disks and are formatted as Ext4.]
10
Data Path for Local TT Storage vs. NFS Temp
[Diagram: two virtualization hosts side by side. On each host, Hadoop Virtual Node 1 and Node 2 run TaskTrackers and are connected through a virtual switch.]
▪ Serengeti provides NFS-based temp space for D/C separation
• Improves local storage space utilization
• Trades bandwidth efficiency against the overhead of NFS
11
Consolidated Storage on Single DN VM
[Diagram: a virtualization host running a compute VM (Hadoop Virtual Node 1, Tasktracker) and a data VM (Hadoop Virtual Node 2, Datanode), linked by an NFS client and an NFS server. OS image VMDKs are on shared storage (SAN/NAS); the VMDKs on local disks are formatted as Ext4 and hold both the Datanode storage and the temp directories (dir).]
12
Recommended Topology of Computing Only Cluster
[Diagram: a virtualization host running a compute VM (Hadoop Virtual Node 1, Tasktracker) and Hadoop Virtual Node 2 (Datanode, Ext4). OS image VMDKs and the Ext4-formatted VMDKs are shown with shared storage (SAN/NAS).]
13
Plan Your Cluster
▪ Start with a small cluster and grow it as required
• Initially just four or six nodes
• Increase the amount of computation, data, and memory as required
• Available HDFS space = (DFS Remaining value × 95%) / dfs.replication value
▪ Choose the right hardware – master node
• The NameNode and JobTracker often run on the same machine in smaller clusters
• Consider HA/FT settings
• Keep the NameNode and JobTracker on separate hosts from the slave nodes
• Dual power supplies
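The capacity formula above can be sketched as a small helper. This is a minimal illustration, not part of the original deck: the function name and the example figures are hypothetical, and the default of 3 is simply Hadoop's stock `dfs.replication` value.

```python
def hdfs_available_space(dfs_remaining: float, replication: int = 3,
                         usable_fraction: float = 0.95) -> float:
    """Estimate how much user data HDFS can still hold.

    dfs_remaining   -- the "DFS Remaining" value, in any unit (bytes, TB, ...)
    replication     -- the dfs.replication value (3 is the Hadoop default)
    usable_fraction -- the 95% factor from the formula above
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return dfs_remaining * usable_fraction / replication


# Hypothetical cluster: 60 TB reported as DFS Remaining, replication factor 3.
print(f"{hdfs_available_space(60, replication=3):.1f} TB of user data")
```

The result is in the same unit as the input, so the helper works equally well with bytes or terabytes.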
14
Plan Your Cluster
▪ Choose the right hardware – slave node
• At least 2 quad-core CPUs, with HT enabled
• RAM
    • Allow for 6% virtualization overhead
    • 4-8 GB of memory per core is recommended
• Storage
    • At least 8 disks per host; 12 disks per host may be ideal for absolute performance but probably not for price-performance
    • 1-1.5 disks per core is recommended
    • JBOD with 7,200 RPM SATA disks is fine
    • A good practical maximum is 24 TB or 36 TB per slave node. More than that results in massive network traffic if a node dies and block re-replication must take place.
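The sizing rules on this slide can be turned into a rough sanity check. The sketch below is only an illustration: the `SlaveNode` class and `check_slave_node` function are hypothetical names, and applying the 6% overhead by subtracting it from host RAM before computing memory per core is an assumption about how the rule is meant to be used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SlaveNode:
    cores: int       # physical cores on the host
    ram_gb: float    # physical RAM on the host
    disks: int       # number of data disks
    disk_tb: float   # capacity of each disk


def check_slave_node(node: SlaveNode) -> List[str]:
    """Return a list of warnings; an empty list means all rules are met."""
    warnings = []
    if node.cores < 8:  # 2 quad-core CPUs
        warnings.append("fewer than 8 cores")
    # Assumption: 6% of host RAM is set aside for virtualization overhead.
    ram_per_core = node.ram_gb * (1 - 0.06) / node.cores
    if not 4 <= ram_per_core <= 8:
        warnings.append(f"{ram_per_core:.1f} GB per core, outside 4-8 GB")
    if node.disks < 8:
        warnings.append("fewer than 8 disks")
    disks_per_core = node.disks / node.cores
    if not 1 <= disks_per_core <= 1.5:
        warnings.append(f"{disks_per_core:.2f} disks per core, outside 1-1.5")
    if node.disks * node.disk_tb > 36:
        warnings.append("more than 36 TB of storage on one node")
    return warnings


# Hypothetical host: 8 cores, 64 GB RAM, twelve 2 TB disks.
print(check_slave_node(SlaveNode(cores=8, ram_gb=64, disks=12, disk_tb=2)))
```

Treat the thresholds as starting points taken from the slide rather than hard limits.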
15
Plan Your Cluster
▪ Networking
• Use dedicated switches for your Hadoop cluster, and connect nodes to a top-of-rack switch
• Connect nodes at a minimum of 1 Gb/s; consider 10 Gb/s for clusters with large volumes of intermediate data
• Racks are interconnected via core switches
• Core switches should connect to top-of-rack switches by dual 10 Gb/s links
• Use redundant top-of-rack and core switches
• Separate the management network from the VM network
• Adopt vDS and dvPort groups that span hosts to ensure consistent configuration of VMs and virtual ports for functions such as vMotion and network storage
• Leave the management port out of your vDS
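To get a feel for why link speed and per-node capacity matter together, the time needed to re-replicate a failed node's blocks can be estimated with simple arithmetic. This is a back-of-envelope sketch under stated assumptions: the function name is hypothetical, terabytes are decimal, and `aggregate_gbps` stands for whatever total bandwidth the surviving nodes can devote to re-replication, which in a real cluster depends on topology and load.

```python
def rereplication_hours(node_data_tb: float, aggregate_gbps: float) -> float:
    """Hours needed to copy node_data_tb of blocks at aggregate_gbps.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Gb/s = 1e9 bits/s) and
    ignores protocol overhead and competing job traffic.
    """
    if aggregate_gbps <= 0:
        raise ValueError("bandwidth must be positive")
    bits = node_data_tb * 1e12 * 8
    seconds = bits / (aggregate_gbps * 1e9)
    return seconds / 3600


# Hypothetical comparison for a node holding 24 TB of blocks.
for gbps in (1, 10):
    print(f"{gbps:>2} Gb/s: {rereplication_hours(24, gbps):.1f} hours")
```

Under these assumptions, a tenfold increase in available bandwidth cuts the recovery window by the same factor, which is one reason to consider 10 Gb/s links for dense nodes.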
16
Networking Configurations – Four 1G NICs
[Diagram: a virtualization host with four 1 Gb/s uplinks (vmnic0 to vmnic3) connected to pSwitch 1 and pSwitch 2. Virtual Switch 0 carries MGMT (192.168.1.100), VMKERNEL (192.168.2.100), VMOTION (192.168.3.100), and FT (192.168.4.100); Virtual Switch 1 carries the Hadoop cluster VM portgroup.]
▪ Hadoop VM traffic goes through vSwitch1 (vmnic2 and vmnic3, both active)
▪ On vSwitch0, MGMT and VMkernel traffic goes through vmnic0 (active; vmnic1 on standby)
▪ vMotion and FT run on vmnic1 (active; vmnic0 on standby)
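The four-NIC teaming layout described on this slide can be written down as plain data and checked for obvious mistakes. The sketch below is illustrative only: it does not call any vSphere API, and the dictionary shape, `switch_uplinks`, and `validate` are hypothetical names chosen for this example.

```python
from typing import Dict, List, Set

Team = Dict[str, List[str]]           # {"active": [...], "standby": [...]}
Layout = Dict[str, Dict[str, Team]]   # switch -> port group -> team

# Teaming as described on the slide.
LAYOUT: Layout = {
    "vSwitch0": {
        "MGMT": {"active": ["vmnic0"], "standby": ["vmnic1"]},
        "VMKERNEL": {"active": ["vmnic0"], "standby": ["vmnic1"]},
        "VMOTION": {"active": ["vmnic1"], "standby": ["vmnic0"]},
        "FT": {"active": ["vmnic1"], "standby": ["vmnic0"]},
    },
    "vSwitch1": {
        "Hadoop cluster VM portgroup": {
            "active": ["vmnic2", "vmnic3"], "standby": []},
    },
}


def switch_uplinks(port_groups: Dict[str, Team]) -> Set[str]:
    """All uplinks used by any port group on one virtual switch."""
    nics: Set[str] = set()
    for team in port_groups.values():
        nics.update(team["active"])
        nics.update(team["standby"])
    return nics


def validate(layout: Layout) -> List[str]:
    """Return a list of problems; an empty list means the layout is sane."""
    problems: List[str] = []
    owner: Dict[str, str] = {}
    for switch, port_groups in layout.items():
        for name, team in port_groups.items():
            if not team["active"]:
                problems.append(f"{switch}/{name}: no active uplink")
            if set(team["active"]) & set(team["standby"]):
                problems.append(f"{switch}/{name}: uplink is active and standby")
        for nic in sorted(switch_uplinks(port_groups)):
            # An uplink may belong to only one virtual switch.
            if owner.setdefault(nic, switch) != switch:
                problems.append(f"{nic}: attached to {owner[nic]} and {switch}")
    return problems


print(validate(LAYOUT))
```

Keeping the layout as data makes it easy to review the active/standby pairing per port group before applying it to hosts.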
17
Networking Configurations – 10G for Hadoop VMs
[Diagram: a virtualization host whose Virtual Switch 0 carries MGMT (192.168.1.100), VMKERNEL (192.168.2.100), VMOTION (192.168.3.100), and FT (192.168.4.100) over 1 Gb/s uplinks, and whose Virtual Switch 1 carries the Hadoop cluster VM portgroup over a 10 GbE uplink. Labels show vmnic 0, vmnic 1, vmnic 2 and pSwitch 1, pSwitch 2, pSwitch 3.]
▪ Hadoop VM traffic goes through vSwitch1 (vmnic3)
▪ 10G for Hadoop cluster VMs
• Better performance
• If needed, keep redundancy with another set of vmnic/pSwitch
▪ Keep redundancy for the management network
18
vSphere Configurations
▪ Configure hosts with the NTP service to ensure that the time on all nodes is synchronized
▪ Virtual disk settings
• One datastore per physical disk
• The provisioned cluster needs a warm-up
▪ The NUMA scheduler is important for virtualized Hadoop performance
• Poor configuration can result in a 12% performance degradation (1)
• Data VMs should preferably be distributed across NUMA nodes
▪ Provision the right VM size
• Reserve 6% of memory for vSphere usage
• Avoid over-commitment
• Enable NUMA and keep the VM size within a NUMA node
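The VM sizing rules above can be checked with a few lines of arithmetic. This is a minimal sketch, not a vSphere tool: the function names are hypothetical, and it assumes cores and memory are split evenly across NUMA nodes and that the 6% reserve is taken from total host memory.

```python
from typing import Tuple

VSPHERE_MEMORY_RESERVE = 0.06  # 6% reserved for vSphere, per the slide


def numa_node_budget(host_cores: int, host_ram_gb: float,
                     numa_nodes: int) -> Tuple[int, float]:
    """Return (cores, memory in GB) available to VMs on one NUMA node.

    Assumes an even split of cores and memory across NUMA nodes.
    """
    if numa_nodes < 1:
        raise ValueError("numa_nodes must be at least 1")
    cores = host_cores // numa_nodes
    memory = host_ram_gb * (1 - VSPHERE_MEMORY_RESERVE) / numa_nodes
    return cores, memory


def fits_in_numa_node(vm_vcpus: int, vm_ram_gb: float, host_cores: int,
                      host_ram_gb: float, numa_nodes: int) -> bool:
    """True if the VM's vCPUs and memory both fit inside one NUMA node."""
    cores, memory = numa_node_budget(host_cores, host_ram_gb, numa_nodes)
    return vm_vcpus <= cores and vm_ram_gb <= memory


# Hypothetical host: 16 cores, 128 GB RAM, 2 NUMA nodes.
print(numa_node_budget(16, 128, 2))
print(fits_in_numa_node(8, 60, 16, 128, 2))
```

A VM that fails this check would span NUMA nodes, which is the situation the slide advises against.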
19
For Existing Devices
▪ Roughly fit the existing resource capacity to Hadoop
• CPU : RAM : Throughput = 4 × 1333 MHz : 32 GB : 800M/s
▪ Use powerful machines for master nodes and compute nodes
▪ Use high-throughput machines for slave nodes and data nodes
20
Q&A

More Related Content

What's hot

OpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBIT
OpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBITOpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBIT
OpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBITOpenNebula Project
 
VMworld 2017 vSAN Network Design
VMworld 2017 vSAN Network Design VMworld 2017 vSAN Network Design
VMworld 2017 vSAN Network Design Cormac Hogan
 
GlusterFS CTDB Integration
GlusterFS CTDB IntegrationGlusterFS CTDB Integration
GlusterFS CTDB IntegrationEtsuji Nakai
 
2017 VMUG Storage Policy Based Management
2017 VMUG Storage Policy Based Management2017 VMUG Storage Policy Based Management
2017 VMUG Storage Policy Based ManagementCormac Hogan
 
Disk Performance Comparison Xen v.s. KVM
Disk Performance Comparison Xen v.s. KVMDisk Performance Comparison Xen v.s. KVM
Disk Performance Comparison Xen v.s. KVMnknytk
 
XPDS14 - Scaling Xen's Aggregate Storage Performance - Felipe Franciosi, Citrix
XPDS14 - Scaling Xen's Aggregate Storage Performance - Felipe Franciosi, CitrixXPDS14 - Scaling Xen's Aggregate Storage Performance - Felipe Franciosi, Citrix
XPDS14 - Scaling Xen's Aggregate Storage Performance - Felipe Franciosi, CitrixThe Linux Foundation
 
VMware Performance Troubleshooting
VMware Performance TroubleshootingVMware Performance Troubleshooting
VMware Performance Troubleshootingglbsolutions
 
The dark side of stretched cluster
The dark side of stretched clusterThe dark side of stretched cluster
The dark side of stretched clusterAndrea Mauro
 
XPDS14: Xen 4.5 Roadmap - Konrad Wilk, Oracle
XPDS14: Xen 4.5 Roadmap - Konrad Wilk, OracleXPDS14: Xen 4.5 Roadmap - Konrad Wilk, Oracle
XPDS14: Xen 4.5 Roadmap - Konrad Wilk, OracleThe Linux Foundation
 
XPDS14 - Intel(r) Virtualization Technology for Directed I/O (VT-d) Posted In...
XPDS14 - Intel(r) Virtualization Technology for Directed I/O (VT-d) Posted In...XPDS14 - Intel(r) Virtualization Technology for Directed I/O (VT-d) Posted In...
XPDS14 - Intel(r) Virtualization Technology for Directed I/O (VT-d) Posted In...The Linux Foundation
 
Approaching hyperconvergedopenstack
Approaching hyperconvergedopenstackApproaching hyperconvergedopenstack
Approaching hyperconvergedopenstackIkuo Kumagai
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific DashboardCeph Community
 
VMworld 2013: VMware Virtual SAN Technical Best Practices
VMworld 2013: VMware Virtual SAN Technical Best Practices VMworld 2013: VMware Virtual SAN Technical Best Practices
VMworld 2013: VMware Virtual SAN Technical Best Practices VMworld
 
VMware vSphere Networking deep dive
VMware vSphere Networking deep diveVMware vSphere Networking deep dive
VMware vSphere Networking deep diveSanjeev Kumar
 
Enterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual ControllerEnterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual ControllerFernando Barrientos
 
How Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver ClusterHow Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver ClusterAaron Joue
 

What's hot (20)

OpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBIT
OpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBITOpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBIT
OpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBIT
 
(Free and Net) BSD Xen Roadmap
(Free and Net) BSD Xen Roadmap(Free and Net) BSD Xen Roadmap
(Free and Net) BSD Xen Roadmap
 
VMworld 2017 vSAN Network Design
VMworld 2017 vSAN Network Design VMworld 2017 vSAN Network Design
VMworld 2017 vSAN Network Design
 
GlusterFS CTDB Integration
GlusterFS CTDB IntegrationGlusterFS CTDB Integration
GlusterFS CTDB Integration
 
2017 VMUG Storage Policy Based Management
2017 VMUG Storage Policy Based Management2017 VMUG Storage Policy Based Management
2017 VMUG Storage Policy Based Management
 
kdump: usage and_internals
kdump: usage and_internalskdump: usage and_internals
kdump: usage and_internals
 
Disk Performance Comparison Xen v.s. KVM
Disk Performance Comparison Xen v.s. KVMDisk Performance Comparison Xen v.s. KVM
Disk Performance Comparison Xen v.s. KVM
 
XPDS14 - Scaling Xen's Aggregate Storage Performance - Felipe Franciosi, Citrix
XPDS14 - Scaling Xen's Aggregate Storage Performance - Felipe Franciosi, CitrixXPDS14 - Scaling Xen's Aggregate Storage Performance - Felipe Franciosi, Citrix
XPDS14 - Scaling Xen's Aggregate Storage Performance - Felipe Franciosi, Citrix
 
VMware Performance Troubleshooting
VMware Performance TroubleshootingVMware Performance Troubleshooting
VMware Performance Troubleshooting
 
Kvm optimizations
Kvm optimizationsKvm optimizations
Kvm optimizations
 
The dark side of stretched cluster
The dark side of stretched clusterThe dark side of stretched cluster
The dark side of stretched cluster
 
XPDS14: Xen 4.5 Roadmap - Konrad Wilk, Oracle
XPDS14: Xen 4.5 Roadmap - Konrad Wilk, OracleXPDS14: Xen 4.5 Roadmap - Konrad Wilk, Oracle
XPDS14: Xen 4.5 Roadmap - Konrad Wilk, Oracle
 
XPDS14 - Intel(r) Virtualization Technology for Directed I/O (VT-d) Posted In...
XPDS14 - Intel(r) Virtualization Technology for Directed I/O (VT-d) Posted In...XPDS14 - Intel(r) Virtualization Technology for Directed I/O (VT-d) Posted In...
XPDS14 - Intel(r) Virtualization Technology for Directed I/O (VT-d) Posted In...
 
Approaching hyperconvergedopenstack
Approaching hyperconvergedopenstackApproaching hyperconvergedopenstack
Approaching hyperconvergedopenstack
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
 
VMworld 2013: VMware Virtual SAN Technical Best Practices
VMworld 2013: VMware Virtual SAN Technical Best Practices VMworld 2013: VMware Virtual SAN Technical Best Practices
VMworld 2013: VMware Virtual SAN Technical Best Practices
 
VMware vSphere Networking deep dive
VMware vSphere Networking deep diveVMware vSphere Networking deep dive
VMware vSphere Networking deep dive
 
Enterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual ControllerEnterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual Controller
 
TDS-16489U - Dual Processor
TDS-16489U - Dual ProcessorTDS-16489U - Dual Processor
TDS-16489U - Dual Processor
 
How Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver ClusterHow Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver Cluster
 

Similar to 4. v sphere big data extensions hadoop

Postgres the hardway
Postgres the hardwayPostgres the hardway
Postgres the hardwayDave Pitts
 
Virtualizing Apache Spark and Machine Learning with Justin Murray
Virtualizing Apache Spark and Machine Learning with Justin MurrayVirtualizing Apache Spark and Machine Learning with Justin Murray
Virtualizing Apache Spark and Machine Learning with Justin MurrayDatabricks
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosHeiko Loewe
 
Storage user cases
Storage user casesStorage user cases
Storage user casesAndrea Mauro
 
VMware - Virtual SAN - IT Changes Everything
VMware - Virtual SAN - IT Changes EverythingVMware - Virtual SAN - IT Changes Everything
VMware - Virtual SAN - IT Changes EverythingVMUG IT
 
Windsor: Domain 0 Disaggregation for XenServer and XCP
	Windsor: Domain 0 Disaggregation for XenServer and XCP	Windsor: Domain 0 Disaggregation for XenServer and XCP
Windsor: Domain 0 Disaggregation for XenServer and XCPThe Linux Foundation
 
VDCF Overview
VDCF OverviewVDCF Overview
VDCF OverviewJomaSoft
 
Building a Stretched Cluster using Virtual SAN 6.1
Building a Stretched Cluster using Virtual SAN 6.1Building a Stretched Cluster using Virtual SAN 6.1
Building a Stretched Cluster using Virtual SAN 6.1Duncan Epping
 
20150531 virtualizatino station 2.0 partner's day
20150531 virtualizatino station 2.0 partner's day20150531 virtualizatino station 2.0 partner's day
20150531 virtualizatino station 2.0 partner's dayqnapivan
 
Devconf2017 - Can VMs networking benefit from DPDK
Devconf2017 - Can VMs networking benefit from DPDKDevconf2017 - Can VMs networking benefit from DPDK
Devconf2017 - Can VMs networking benefit from DPDKMaxime Coquelin
 
Automating Your CloudStack Cloud with Puppet
Automating Your CloudStack Cloud with PuppetAutomating Your CloudStack Cloud with Puppet
Automating Your CloudStack Cloud with Puppetbuildacloud
 
Road show 2015 triangle meetup
Road show 2015 triangle meetupRoad show 2015 triangle meetup
Road show 2015 triangle meetupwim_provoost
 
Virtualizing Apache Spark with Justin Murray
Virtualizing Apache Spark with Justin MurrayVirtualizing Apache Spark with Justin Murray
Virtualizing Apache Spark with Justin MurrayDatabricks
 
Xen Virtualization 2008
Xen Virtualization 2008Xen Virtualization 2008
Xen Virtualization 2008mwlang88
 
SUSE Expert Days Paris 2018 - SUSE HA Cluster Multi-Device
SUSE Expert Days Paris 2018 - SUSE HA Cluster Multi-DeviceSUSE Expert Days Paris 2018 - SUSE HA Cluster Multi-Device
SUSE Expert Days Paris 2018 - SUSE HA Cluster Multi-DeviceSUSE
 
Presentation oracle rac on vsphere 5
Presentation   oracle rac on vsphere 5Presentation   oracle rac on vsphere 5
Presentation oracle rac on vsphere 5solarisyourep
 
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solutionVirtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solutionFlavio Bertini
 
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014Philippe Fierens
 
MongoDB - Sharded Cluster Tutorial
MongoDB - Sharded Cluster TutorialMongoDB - Sharded Cluster Tutorial
MongoDB - Sharded Cluster TutorialJason Terpko
 

Similar to 4. v sphere big data extensions hadoop (20)

Postgres the hardway
Postgres the hardwayPostgres the hardway
Postgres the hardway
 
Virtualizing Apache Spark and Machine Learning with Justin Murray
Virtualizing Apache Spark and Machine Learning with Justin MurrayVirtualizing Apache Spark and Machine Learning with Justin Murray
Virtualizing Apache Spark and Machine Learning with Justin Murray
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and Mesos
 
Storage user cases
Storage user casesStorage user cases
Storage user cases
 
VMware - Virtual SAN - IT Changes Everything
VMware - Virtual SAN - IT Changes EverythingVMware - Virtual SAN - IT Changes Everything
VMware - Virtual SAN - IT Changes Everything
 
Windsor: Domain 0 Disaggregation for XenServer and XCP
	Windsor: Domain 0 Disaggregation for XenServer and XCP	Windsor: Domain 0 Disaggregation for XenServer and XCP
Windsor: Domain 0 Disaggregation for XenServer and XCP
 
VDCF Overview
VDCF OverviewVDCF Overview
VDCF Overview
 
Building a Stretched Cluster using Virtual SAN 6.1
Building a Stretched Cluster using Virtual SAN 6.1Building a Stretched Cluster using Virtual SAN 6.1
Building a Stretched Cluster using Virtual SAN 6.1
 
20150531 virtualizatino station 2.0 partner's day
20150531 virtualizatino station 2.0 partner's day20150531 virtualizatino station 2.0 partner's day
20150531 virtualizatino station 2.0 partner's day
 
Devconf2017 - Can VMs networking benefit from DPDK
Devconf2017 - Can VMs networking benefit from DPDKDevconf2017 - Can VMs networking benefit from DPDK
Devconf2017 - Can VMs networking benefit from DPDK
 
Automating Your CloudStack Cloud with Puppet
Automating Your CloudStack Cloud with PuppetAutomating Your CloudStack Cloud with Puppet
Automating Your CloudStack Cloud with Puppet
 
Road show 2015 triangle meetup
Road show 2015 triangle meetupRoad show 2015 triangle meetup
Road show 2015 triangle meetup
 
Virtualizing Apache Spark with Justin Murray
Virtualizing Apache Spark with Justin MurrayVirtualizing Apache Spark with Justin Murray
Virtualizing Apache Spark with Justin Murray
 
Xen Virtualization 2008
Xen Virtualization 2008Xen Virtualization 2008
Xen Virtualization 2008
 
SUSE Expert Days Paris 2018 - SUSE HA Cluster Multi-Device
SUSE Expert Days Paris 2018 - SUSE HA Cluster Multi-DeviceSUSE Expert Days Paris 2018 - SUSE HA Cluster Multi-Device
SUSE Expert Days Paris 2018 - SUSE HA Cluster Multi-Device
 
Presentation oracle rac on vsphere 5
Presentation   oracle rac on vsphere 5Presentation   oracle rac on vsphere 5
Presentation oracle rac on vsphere 5
 
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solutionVirtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
 
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
What we unlearned_and_learned_by_moving_from_m9000_to_ssc_ukoug2014
 
Sharded cluster tutorial
Sharded cluster tutorialSharded cluster tutorial
Sharded cluster tutorial
 
MongoDB - Sharded Cluster Tutorial
MongoDB - Sharded Cluster TutorialMongoDB - Sharded Cluster Tutorial
MongoDB - Sharded Cluster Tutorial
 

More from Chiou-Nan Chen

More from Chiou-Nan Chen (20)

Moving NEON to 64 bits
Moving NEON to 64 bitsMoving NEON to 64 bits
Moving NEON to 64 bits
 
64-bit Android
64-bit Android64-bit Android
64-bit Android
 
Intelligent Power Allocation
Intelligent Power AllocationIntelligent Power Allocation
Intelligent Power Allocation
 
3. v sphere big data extensions
3. v sphere big data extensions3. v sphere big data extensions
3. v sphere big data extensions
 
2. hadoop
2. hadoop2. hadoop
2. hadoop
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
 
5. pivotal hd 2013
5. pivotal hd 20135. pivotal hd 2013
5. pivotal hd 2013
 
Emc keynote 1130 1200
Emc keynote 1130 1200Emc keynote 1130 1200
Emc keynote 1130 1200
 
Emc keynote 1030 1130
Emc keynote 1030 1130Emc keynote 1030 1130
Emc keynote 1030 1130
 
Emc keynote 0945 1030
Emc keynote 0945 1030Emc keynote 0945 1030
Emc keynote 0945 1030
 
Emc keynote 0930 0945
Emc keynote 0930 0945Emc keynote 0930 0945
Emc keynote 0930 0945
 
102 1600-1630
102 1600-1630102 1600-1630
102 1600-1630
 
102 1530-1600
102 1530-1600102 1530-1600
102 1530-1600
 
102 1430-1445
102 1430-1445102 1430-1445
102 1430-1445
 
102 1315-1345
102 1315-1345102 1315-1345
102 1315-1345
 
102 1630 1700
102 1630 1700102 1630 1700
102 1630 1700
 
102 1445 1515
102 1445 1515102 1445 1515
102 1445 1515
 
101 cd 1630-1700
101 cd 1630-1700101 cd 1630-1700
101 cd 1630-1700
 
101 cd 1600-1630
101 cd 1600-1630101 cd 1600-1630
101 cd 1600-1630
 
101 cd 1445-1515
101 cd 1445-1515101 cd 1445-1515
101 cd 1445-1515
 

Recently uploaded

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation

4. vSphere Big Data Extensions Hadoop

  • 1. © 2009 VMware Inc. All rights reserved. vSphere Big Data Extensions: Hadoop Reference Architecture and Performance Best Practices. 李欣慧, Senior Big Data R&D Engineer, VMware China R&D Center
  • 3. Standard Deployment Configuration on Single Worker [Diagram. Labels: Virtualization Host; Hadoop Virtual Node 2 (Datanode, TaskTracker); Ext4 on each VMDK; mapred.local.dir; OS Image VMDK; Shared storage (SAN/NAS); Local disks.]
  • 4. Standard Deployment Configuration on Single Worker [Diagram. Labels: Virtualization Host; Hadoop Virtual Node 2 (Datanode, TaskTracker); Ext4 on each VMDK; mapred.local.dir; OS Image VMDK; Local disks.]
  • 5. Standard Deployment Configuration [Diagram. Labels: Virtualization Host; Hadoop Virtual Node 1 and Hadoop Virtual Node 2 (each with Datanode and TaskTracker); Ext4 on each VMDK; mapred.local.dir; OS Image VMDK; Shared storage (SAN/NAS); Local disks.]
  • 6. Standard Deployment Configuration [Diagram. Labels: Virtualization Host; Hadoop Virtual Node 1 and Hadoop Virtual Node 2 (each with Datanode and TaskTracker); Ext4 on each VMDK; mapred.local.dir; OS Image VMDK; Local disks.]
  • 7. Standard Deployment Configuration for D/C Separation [Diagram. Labels: Virtualization Host; Hadoop Virtual Node 1 (TaskTracker); Hadoop Virtual Node 2 (Datanode); Ext4 on each VMDK; OS Image VMDK; Shared storage (SAN/NAS); Local disks.]
  • 8. Data Path for Combined vs. Data/Compute Separation [Diagram. Labels: two Virtualization Hosts, each with a Virtual Switch; Hadoop Virtual Node 1 and Hadoop Virtual Node 2; TaskTrackers.]
      • Serengeti provides local-storage-based temp space for D/C separation.
          • Each compute VM needs its own temp space
          • The required temp space differs from one application to another
          • This can result in wasted space
  • 9. Recommended Topology of Data/Compute Separation [Diagram. Labels: Virtualization Host; Hadoop Virtual Node 1 (TaskTracker, Ext4); Hadoop Virtual Node 2 (Datanode); Ext4 on each VMDK; OS Image VMDK; Shared storage (SAN/NAS); Local disks.]
  • 10. Data Path for Local TT Storage vs. NFS Temp [Diagram. Labels: two Virtualization Hosts, each with a Virtual Switch; Hadoop Virtual Node 1 and Hadoop Virtual Node 2; TaskTrackers.]
      • Serengeti provides NFS-based temp space for D/C separation.
          • Improves local storage space utilization
          • Trade-off between bandwidth efficiency and the overhead of NFS
  • 11. Consolidated Storage on Single DN VM [Diagram. Labels: Virtualization Host; Hadoop Virtual Node 1 (TaskTracker); Hadoop Virtual Node 2 (Datanode); NFS Client; NFS Server; Ext4 file systems and dir entries on VMDKs; OS Image VMDK; Shared storage (SAN/NAS); Local disks.]
  • 12. Recommended Topology of Computing Only Cluster [Diagram. Labels: Virtualization Host; Hadoop Virtual Node 1 (TaskTracker); Hadoop Virtual Node 2 (Datanode); Ext4 on each VMDK; OS Image VMDK; Shared storage (SAN/NAS).]
  • 13. Plan Your Cluster
      • Start with a small cluster and grow it as required
          • Initially just four or six nodes
          • Increase the amount of computation, data, and memory as required
          • Available HDFS space = (DFS Remaining value × 95%) / dfs.replication value
      • Choose the right hardware: master node
          • The NameNode and JobTracker often run on the same machine in smaller clusters
          • Consider HA/FT settings
          • Keep the NameNode and JobTracker on a separate host from the slave nodes
          • Use dual power supplies
  • 14. Plan Your Cluster
      • Choose the right hardware: slave node
          • At least 2 quad-core CPUs, with HT enabled
          • RAM
              • Allow for 6% virtualization overhead
              • 4-8 GB of memory per core is recommended
          • Storage
              • At least 8 disks per host; 12 disks per host may be ideal for absolute performance but probably not for price-performance
              • 1-1.5 disks per core is recommended
              • JBOD with 7,200 RPM SATA disks is fine
              • A good practical maximum is 24 TB or 36 TB per slave node. More than that results in massive network traffic when a node dies and its blocks must be re-replicated.
  • 15. Plan Your Cluster
      • Networking
          • Use dedicated switches for the Hadoop cluster, and connect the nodes to a top-of-rack switch
          • Connect nodes at a minimum of 1 Gb/s; consider 10 Gb/s for clusters with large volumes of intermediate data
          • Interconnect racks via core switches
          • Connect core switches to top-of-rack switches with dual 10 Gb/s links
          • Use redundant top-of-rack switches and core switches
          • Separate the management network from the VM network
          • Adopt a vSphere Distributed Switch (vDS) with dvPort groups that span hosts, to keep configuration consistent for the VMs and for the virtual ports used by vMotion and network storage
          • Leave the management port out of the vDS
  • 16. Networking Configurations: Four 1G NICs [Diagram. Labels: Virtualization Host; Virtual Switch 0 with MGMT 192.168.1.100, VMKERNEL 192.168.2.100, VMOTION 192.168.3.100, FT 192.168.4.100; Virtual Switch 1 with the Hadoop cluster VM portgroup; vmnic 0 to vmnic 3, each at 1 Gb/s; pSwitch 1 and pSwitch 2.]
      • Hadoop VM traffic goes through vSwitch1 (vmnic2 and vmnic3, both active)
      • On vSwitch0, MGMT and VMkernel traffic use vmnic0 (active; vmnic1 on standby)
      • vMotion and FT use vmnic1 (active; vmnic0 on standby)
  • 17. Networking Configurations: 10G for Hadoop VMs [Diagram. Labels: Virtualization Host; Virtual Switch 0 with MGMT 192.168.1.100, VMKERNEL 192.168.2.100, VMOTION 192.168.3.100, FT 192.168.4.100; Virtual Switch 1 with the Hadoop cluster VM portgroup; vmnic 0, vmnic 1, vmnic 2; pSwitch 1, pSwitch 2, pSwitch 3; two 1 Gb/s links and one 10 GbE link.]
      • Hadoop VM traffic goes through vSwitch1 (vmnic3)
      • 10G for the Hadoop cluster VMs
          • Better performance
          • If needed, keep redundancy with a second vmnic/pSwitch pair
      • Keep redundancy for the management network
  • 18. vSphere Configurations
      • Configure the NTP service on the hosts so that time is synchronized across all nodes
      • Virtual disk settings
          • One datastore per physical disk
          • The provisioned cluster needs a warm-up
      • The NUMA scheduler is important for virtualized Hadoop performance
          • Poor configuration can result in 12% (1) performance degradation
          • Data VMs should preferably be distributed across NUMA nodes
      • Provision the right VM size
          • Reserve 6% of memory for vSphere usage
          • Avoid over-commitment
          • Enable NUMA and keep the VM size within a NUMA node
  • 19. For Existing Devices
      • Roughly fit existing resource capacity to Hadoop
          • CPU : RAM : Throughput = 4 × 1333 MHz : 32 GB : 800M/s
      • Use a powerful machine to run the master node/computing node
      • Use a high-throughput machine for the slave node/data node
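
The HDFS capacity rule on slide 13 can be sketched in Python as follows. This is a minimal illustration and not part of the original deck: the function name `usable_hdfs_capacity`, the `headroom` parameter, and the sample figures are hypothetical, and the DFS Remaining value is assumed to be read by hand from the cluster report.

```python
def usable_hdfs_capacity(dfs_remaining, replication, headroom=0.95):
    """Estimate how much new user data HDFS can still accept.

    dfs_remaining: the "DFS Remaining" figure reported for the cluster,
                   in any unit (the result is in the same unit).
    replication:   the dfs.replication value (copies kept of each block).
    headroom:      fraction of the remaining space to count on; the slide
                   uses 95%.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    if dfs_remaining < 0:
        raise ValueError("dfs_remaining must not be negative")
    # Every block is stored `replication` times, so the raw space is
    # divided by the replication factor after applying the headroom.
    return dfs_remaining * headroom / replication


if __name__ == "__main__":
    # Hypothetical cluster: 96 TB reported as DFS Remaining, 3 replicas.
    print(f"{usable_hdfs_capacity(96, 3):.1f} TB usable")
```

With three replicas, roughly a third of the remaining raw space is available for new data, which is why the replication factor matters as much as the disk count when planning growth.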
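
The slave-node and VM sizing rules on slides 14 and 18 can be checked with a short script. The sketch below is illustrative only: the `Host` class, `max_vm_size`, and `check_host` are hypothetical names, `cores` is assumed to mean physical cores, and the 6% vSphere reserve is assumed to be spread evenly across NUMA nodes, which the slides do not state.

```python
from dataclasses import dataclass
from typing import List, Tuple

VSPHERE_RESERVE = 0.06  # slide 18: reserve 6% of memory for vSphere


@dataclass
class Host:
    cores: int        # physical cores (assumption: HT threads not counted)
    ram_gb: float     # installed memory
    numa_nodes: int   # number of NUMA nodes on the host
    data_disks: int   # local disks available for Hadoop data


def max_vm_size(host: Host) -> Tuple[int, float]:
    """Largest VM (vCPUs, RAM in GB) that still fits in one NUMA node."""
    usable_ram = host.ram_gb * (1 - VSPHERE_RESERVE)
    return host.cores // host.numa_nodes, usable_ram / host.numa_nodes


def check_host(host: Host) -> List[str]:
    """Return warnings where the host departs from the slide guidance."""
    warnings = []
    if host.cores < 8:
        warnings.append("fewer than 2 quad-core CPUs")
    gb_per_core = host.ram_gb / host.cores
    if not 4 <= gb_per_core <= 8:
        warnings.append(f"{gb_per_core:.1f} GB per core, outside 4-8 GB")
    disks_per_core = host.data_disks / host.cores
    if not 1 <= disks_per_core <= 1.5:
        warnings.append(f"{disks_per_core:.2f} disks per core, outside 1-1.5")
    if host.data_disks < 8:
        warnings.append("fewer than 8 disks per host")
    return warnings


if __name__ == "__main__":
    host = Host(cores=8, ram_gb=64, numa_nodes=2, data_disks=8)
    print(max_vm_size(host))
    print(check_host(host))
```

Keeping each VM at or below the size returned by `max_vm_size` follows the advice to avoid over-commitment and to stay within a NUMA node; the warnings are hints for hardware planning, not hard limits.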

Editor's Notes

  1. In the combined model, the TaskTracker runs on the same node as the DataNode. System disks are placed on shared storage (or local storage) without being split, to leverage extensibility, HA/FT, and vMotion features. Data disks should be placed on local storage, which is split across the available datastores and partitioned among the peer VMs for performance. Use at least 8 disks per host.
  2. SAN/NAS-storage-based temp space for a D/C separation deployment. Pro: extensibility of temp space. Con: performance degradation.
  3. Shared-storage-based Datanode; local-storage-based computing-only cluster.
  4. Please refer to the networking part of the “VMware vSphere Design” document.