Building highly efficient cloud infrastructure, and lessons learned from real deployments: The session will cover how to build a converged cloud solution based on industry-standard components and open-source software, delivering the best cost/performance, the lowest $/GB for storage, the lowest $/VM, and the right balance of compute, network, and storage resources. It is based on the speaker's experience working with multiple OpenStack-based cloud providers and integrators, and on the internal implementation of an OpenStack private cloud at Mellanox. The session will also discuss various software-defined storage (SDS) and commercial options, the benefits of one versus the other, how to efficiently combine SSDs and HDDs, and experience with Big Data and Hadoop applications. It will cover the latest innovations in high-performance networking and storage (VXLAN in hardware, DPDK/NFV, Cinder acceleration, Ceph over RDMA, ...), and will go over a concrete high-density, high-performance deployment example.
2. Agenda
• OpenStack Overview, Benefits, and Challenges
• Overall topology and setup
• OpenStack Networking
• Options, Overlays, Challenges
• Storage in OpenStack
• Options (Ceph, iSCSI, iSER, Swift, HDD/SSD), Pros and Cons
• Building cost-effective solutions with the application in mind
• Summary
3. What Does OpenStack Buy You?
Cost-Effective Cloud
• Based on commodity hardware
• Reduced software costs
• Automation
Flexibility & Features
• Endless amount of services
• Open source and extensible
• Many configuration and scripting options
Integration & Interoperability
• Compute, storage, networks, and apps under one system
• Modular approach with pluggable services
• Testing, integration, and certification by multiple vendors and communities
4. Deploying OpenStack: A Practical View
• Requires a strong IT group with DevOps professionals; not (yet) suitable for small or non-technical organizations
• Must think cloud to take advantage of it, and not tie yourself to Enterprise paradigms
• Start with a solid distribution (Red Hat RHOS, Ubuntu, Mirantis, ...) or use professional services to ramp up
• It's a community: if you want to get help, you must also contribute
• Test on a small scale and validate assumptions before going to production setups
5. Typical OpenStack Deployment
• Firewall, L3 routing, NAT, and DHCP services (can have multiple)
• Cinder: iSCSI- or Ceph-based storage
• Nova: compute (VM) nodes
• Management services can also be installed on the same server
Source: stackops.org
6. OpenStack in a Converged Network Environment
• Native integration of Mellanox products with Neutron
• Hardware-based support for security and isolation functions
• Accelerating storage access by up to 5x
• DPDK support
Source: Mellanox
7. Types of Networks in OpenStack
• Hypervisor networks
  • Console
  • Storage
  • Messaging
• VM (tenant) networks
  • VLAN based
  • Overlay based (VXLAN, GRE)
• Connection/port types
  • Para-virtualized, e.g. using Open vSwitch
  • Direct attached, using SR-IOV, for highest performance & native NIC features (RDMA, OS bypass, ...); see the sketch below
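As an illustration of the tenant network options above, here is a minimal sketch using the openstacksdk client (which postdates this deck but drives the same Neutron APIs); the cloud name, physical network name, and segmentation ID are assumptions, not values from the slides.

```python
import openstack

# Connect using credentials from clouds.yaml; the cloud name is hypothetical.
conn = openstack.connect(cloud="demo-cloud")

# A VLAN-backed provider network (admin-only attributes; physnet name assumed).
vlan_net = conn.network.create_network(
    name="tenant-vlan-100",
    provider_network_type="vlan",
    provider_physical_network="physnet1",
    provider_segmentation_id=100,
)

# An overlay (VXLAN/GRE) tenant network; Neutron allocates the tunnel ID itself.
vxlan_net = conn.network.create_network(name="tenant-overlay")

conn.network.create_subnet(
    network_id=vxlan_net.id, ip_version=4, cidr="10.10.0.0/24", name="overlay-subnet"
)
```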
8. Logical View of an OpenStack Multi-Tenant Cloud
• Routing and gateway functions implemented via software or hardware appliances
• Tenant separation using VLANs or tunnels (VXLAN)
9. Mellanox Network Virtualization (Neutron) Plug-in
[Diagram: VMs attach either through para-virtual tap devices or via SR-IOV directly to the VM; the Mellanox Nova and Neutron plug-ins create/delete and configure policy per VM vNIC in the adapter's embedded switch (eSwitch), driven by the OpenStack manager.]
• Provision VM & fabric policy in hardware, through standard APIs
• Benefits: isolation, functionality, performance & offload, simpler SDN
• Benchmark: eSwitch vs. OVS, qperf (TCP) latency
Source: Mellanox
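To make the "standard APIs" concrete: an SR-IOV attachment is requested through Neutron's binding:vnic_type port attribute, which hardware plug-ins such as the Mellanox one act on. A minimal sketch with openstacksdk (names and IDs are placeholders, not taken from the slides):

```python
import openstack

conn = openstack.connect(cloud="demo-cloud")
net = conn.network.find_network("tenant-overlay")  # a network created earlier

# Ask Neutron for a direct (SR-IOV) attachment instead of a para-virtual tap.
sriov_port = conn.network.create_port(
    network_id=net.id,
    name="vm1-sriov-port",
    binding_vnic_type="direct",   # maps to binding:vnic_type in the Neutron API
)

# Boot a VM on that port; image and flavor IDs are placeholders.
server = conn.compute.create_server(
    name="vm1",
    image_id="<image-id>",
    flavor_id="<flavor-id>",
    networks=[{"port": sriov_port.id}],
)
```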
10. Typical VXLAN Overlay Network Deployment
[Diagram: VMs on each hypervisor attach through vTaps to Open vSwitch (OVS) bridges (BR0, BR1, BR2); each bridge maps to a VXLAN Network Identifier (VNI 100/200/300), and tenant traffic is encapsulated in UDP/IP over the underlay network (Layer 2 or Layer 3). Gateway (network) nodes run OVS with software router/NAT gateways toward the Internet, and an SDN manager (e.g. OpenStack Neutron, ODL, VMware NSX) provisions the VXLAN overlay (tenant) networks.]
What is VXLAN?
• "Virtual Extensible LAN (VXLAN) is a network virtualization technology that attempts to ameliorate the scalability problems associated with large cloud computing deployments. It uses a VLAN-like encapsulation technique to encapsulate MAC-based OSI layer 2 Ethernet frames within layer 3 UDP packets." (Wikipedia)
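Two numbers worth keeping in mind about the encapsulation described above; a quick back-of-the-envelope check (IPv4 underlay, no outer VLAN tag assumed):

```python
# VXLAN encapsulation overhead per frame (IPv4 underlay, untagged outer frame).
OUTER_ETH, OUTER_IP, OUTER_UDP, VXLAN_HDR = 14, 20, 8, 8
overhead = OUTER_ETH + OUTER_IP + OUTER_UDP + VXLAN_HDR          # 50 bytes

# Segment scale: 12-bit VLAN ID vs. 24-bit VXLAN Network Identifier (VNI).
usable_vlans = 2**12 - 2          # 4094
vxlan_vnis = 2**24                # ~16.7 million

print(f"Per-frame overhead: {overhead} bytes")
print(f"Underlay MTU for 1500-byte guest frames: >= {1500 + overhead}")
print(f"Segments: {usable_vlans} VLANs vs. {vxlan_vnis} VNIs")
```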
11. Performance & CPU Advantage Using VXLAN Offload

Total VM bandwidth when using VXLAN [Gbit/s]:
                       1 VM    2 VMs   3 VMs
  No VXLAN              11      19      21
  VXLAN in software      2       3       3.5
  VXLAN HW offload      10      19      21

CPU usage per Gbit/s with VXLAN [CPU% / (Gbit/s)]:
                       1 VM    2 VMs   3 VMs
  No VXLAN              0.55    0.68    0.67
  VXLAN in software     3.50    3.33    4.29
  VXLAN HW offload      0.90    0.89    1.19

5x more bandwidth with VXLAN offload; 4x less CPU overhead with VXLAN offload
Test details:
- Test command: netperf -t TCP_STREAM -H
- 1-3 VMs talking to 1-3 VMs on a second server
- Open vSwitch (OVS) with VXLAN support
- Servers:
  - HP ProLiant DL380p Gen8
  - 2 x Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
  - 32GB RAM
- Hypervisor OS: Linux upstream kernel 3.14-rc1 + KVM
- Guest VM OS: RHEL 6.5, 2.6.32-431.el6.x86_64
- NIC: ConnectX-3 Pro, FW: 2.30.8000
- CPU% and bandwidth measured on the hypervisor (aggregate of 1-3 VMs)
Source: Mellanox
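The headline ratios follow directly from the single-VM measurements in the tables above; a quick check:

```python
# Single-VM results from the tables above.
bw_sw, bw_hw = 2.0, 10.0        # Gbit/s: VXLAN in software vs. HW offload
cpu_sw, cpu_hw = 3.50, 0.90     # CPU% per Gbit/s: software vs. HW offload

print(f"Bandwidth gain with offload: {bw_hw / bw_sw:.0f}x")        # 5x
print(f"CPU per Gbit/s reduction:    {cpu_sw / cpu_hw:.1f}x")      # ~3.9x (~4x)
```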
13. OpenStack Storage Options

Ceph
  Pros: Free; multiple front-ends; built-in HA, snapshots, DR, ...
  Cons: Slow & CPU intensive; complexity & stability; replication (with apps like Hadoop)
  Performance: Low-Med | $/GB*: 0.40

iSCSI-LVM or iSER-LVM
  Pros: Free; fast; HA features
  Cons: Cinder only
  Performance: Med-High | $/GB*: 0.20

Commercial
  Pros: Performance; stability & support; features
  Cons: Cost; flexibility
  Performance: Med-High | $/GB*: 0.50-2

Swift
  Pros: Free; low cost
  Cons: Performance; object only
  Performance: Low | $/GB*: 0.30

Reference: Amazon AWS
  Pros: No OpEx overhead
  Cons: Slow; data transfer costs
  Performance: Low | $/GB*: 0.30/yr

* All inclusive, using 4TB HDDs, with redundancy
15. Cinder iSCSI Deployment Example
• iSCSI or iSER storage servers/heads
• JBODs or mid-range RAID enclosures (HA, SAS/SATA attached)
• Open-source software (Cinder LVM) or commercial software/system
• Initiators/hosts access the storage through iSCSI or iSER (iSCSI over RDMA)
Note: use the right ratio of disks per head to optimize cost/performance; with 40GbE and RDMA offload, each head can support 5x more disks, lowering overall costs (see the sizing sketch below).
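A rough way to pick the disks-per-head ratio mentioned in the note; the per-disk throughput figure is an assumption for illustration, not a number from the slides:

```python
# Back-of-the-envelope disks-per-head sizing (illustrative numbers only).
hdd_mb_s = 120                       # assumed sequential MB/s per 7.2K HDD

def disks_per_head(nic_gbps):
    wire_mb_s = nic_gbps * 1000 / 8  # NIC wire bandwidth in MB/s
    return wire_mb_s / hdd_mb_s

print(f"10GbE head: ~{disks_per_head(10):.0f} HDDs to saturate the wire")
print(f"40GbE head: ~{disks_per_head(40):.0f} HDDs to saturate the wire")
# With iSER (RDMA offload) the head is far less likely to become CPU-bound,
# so the wire, not the CPU, is the practical limit at 40GbE.
```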
16. RDMA Provides the Fastest OpenStack Storage Access
• Uses OpenStack's built-in components and management (Open-iSCSI, tgt target, Cinder); no additional software is required. RDMA support is already inbox and used by our OpenStack customers!
[Diagram: compute servers run Open-iSCSI with iSER in the hypervisor (KVM); storage servers run an iSCSI/iSER target (tgt) over local disks with an RDMA-accessed cache, provisioned by OpenStack (Cinder) and connected through the switching fabric, using RDMA to accelerate iSCSI storage.
Chart: bandwidth (MB/s) vs. I/O size (1-256 KB) for iSER writes with 4/8/16 VMs and iSCSI writes with 8/16 VMs; iSER writes reach the PCIe limit (~6000 MB/s), about 6x the iSCSI bandwidth.]
RDMA enables 6x more bandwidth, 5x lower I/O latency, and lower CPU utilization
Source: Mellanox
17. Storage Tiers

Tier: HDD
  Workload: Big Data workloads (e.g. Hadoop, video)
  Comment: Very slow IOPs, VM cross-interference
  $/GB (raw): 0.04

Tier: HDD with SSD as cache (e.g. LSI FlashCache, bcache, commercial)
  Workload: Localized disk access (e.g. VM images)
  $/GB (raw): 0.20

Tier: SSD
  Workload: Storage IOPs/latency-sensitive apps (e.g. databases, VDI)
  $/GB (raw): 1-2

Tier: SSD/TLC
  Workload: Read-mostly, random fast reads
  Comment: Low-endurance SSDs
  $/GB (raw): 0.40

Challenge: how to pre-allocate storage to different tiers and different usage models (block, object, files, ...); see the sketch below.
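One common way to expose the tiers above to tenants is a Cinder volume type per tier; a sketch using openstacksdk, where the type and backend names are hypothetical and would have to match the backends actually enabled in cinder.conf:

```python
import openstack

conn = openstack.connect(cloud="demo-cloud")

# One volume type per storage tier; backend names are placeholders that must
# match volume_backend_name in the corresponding cinder.conf sections.
tiers = {
    "capacity-hdd": "hdd-backend",
    "cached-hdd": "flashcache-backend",
    "performance-ssd": "ssd-backend",
}
for type_name, backend in tiers.items():
    conn.block_storage.create_type(
        name=type_name,
        extra_specs={"volume_backend_name": backend},
    )

# Tenants (or Nova) then pick a tier per volume.
vol = conn.block_storage.create_volume(
    size=100, volume_type="performance-ssd", name="db-vol"
)
```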
18. Food for Thought: Should I Use 4TB or 2TB Disks?
$/GB cost with a 60 x 3.5" HDD JBOD (disk + enclosure costs divided by total capacity):
  4TB disks: 0.08 $/GB
  2TB disks: 0.11 $/GB
For roughly 30% extra cost you can double the IOPs & bandwidth; a worked example follows.
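The same arithmetic, spelled out with hypothetical prices chosen only so the result lands near the figures above:

```python
# Hypothetical prices chosen to roughly reproduce the slide's $/GB figures.
ENCLOSURE = 5400.0                 # assumed cost of a 60-bay JBOD, $
BAYS = 60

def dollars_per_gb(disk_price, tb_per_disk):
    total_cost = ENCLOSURE + BAYS * disk_price
    total_gb = BAYS * tb_per_disk * 1000
    return total_cost / total_gb

print(f"4TB @ $230: {dollars_per_gb(230, 4):.2f} $/GB")   # ~0.08
print(f"2TB @ $130: {dollars_per_gb(130, 2):.2f} $/GB")   # ~0.11
# Roughly 30-40% higher $/GB with 2TB disks, but twice the spindles per TB,
# i.e. about double the IOPs and bandwidth for the same capacity.
```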
19. Highly Efficient Rack Example
[Rack layout: 28 twin 0.5U compute servers occupy the upper part of the rack (U23-36, two per U), the storage servers and JBODs fill the lower part, and 1-2 switches (36 x 40/56Gb IB/Eth ports each; the second switch is optional) connect them.]
28 x 0.5U (twin) servers
• Each with a Mellanox ConnectX-3 Pro 40/56Gb adapter
• Optionally up to 6 disks per server
2 x high-performance storage servers, each with:
• 2U 24-bay server + 2 x 45-bay JBODs
• 24 x 2.5" 500-1000GB SSDs
• 90 x 3.5" 2-4TB SAS/SATA HDDs
• 1-2 x ConnectX-3 Pro 40/56Gb adapters
• Running Cinder/LVM (iSCSI/iSER) and/or Ceph
1-2 x switches (e.g. Mellanox SX1036/6036)
• 28 x 40/56Gb copper links to servers
• 4 x 56Gb copper links to storage
• 4 x 56Gb optical uplinks to the core
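A quick sanity check of the capacity and port budget of this rack, using only the bill of materials above (4TB HDDs and 1TB SSDs assumed for the raw-capacity numbers):

```python
# All counts come from the rack description; drive sizes are the upper end
# of the listed ranges (4TB HDD, 1TB SSD).
compute_nodes = 28
storage_servers = 2
raw_hdd_tb = storage_servers * 90 * 4       # 720 TB
raw_ssd_tb = storage_servers * 24 * 1       # 48 TB

compute_ports = 28          # 40/56Gb copper links to servers
storage_ports = 4           # 56Gb copper links to storage
uplink_ports = 4            # 56Gb optical uplinks to the core
ports_used = compute_ports + storage_ports + uplink_ports   # 36 = full switch

edge_gbps = compute_nodes * 40
uplink_gbps = uplink_ports * 56
print(f"Raw capacity: {raw_hdd_tb} TB HDD + {raw_ssd_tb} TB SSD")
print(f"Switch ports used: {ports_used} of 36")
print(f"Core oversubscription: ~{edge_gbps / uplink_gbps:.1f}:1")   # ~5:1
```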
21. Manila – File Shares as a Service
What is Manila?
– Multi-tenant, secure file shares as a service
– "Cinder for shared file systems"
– NFS & CIFS protocols supported today; more to come
– Will be available in the Juno release