3. What is a Private Cloud?
Generally considered to be smaller than a “public” cloud
Fewer than 100 physical servers (for this talk)
API endpoints may not be publicly accessible
Limited inbound connectivity. Use floating IPs to allow for inbound connectivity
Can be customized for specific workloads (hardware/network/etc)
Company may leverage multiple private clouds
RACKSPACE® HOSTING | WWW.RACKSPACE.COM
6. Build with the End in Mind
What are you building for?
A. Are you building for 10 servers? 20? 100?
B. Or are you building 500 instances? 1000? 2000?
C. Or are you building 400 CPUs? 3TB RAM? 100TB disk?
D. ALL OF THE ABOVE
8. Build with the End in Mind
Example hardware
12 Physical Cores - 24 w/ Hyperthreading - 48 vCPUs w/ 2:1 overcommit ratio
128GB of RAM - 1:1 overcommit ratio
8 x 300GB drives RAID 10 - ~1.2TB usable disk space
How many instances can I run on this physical host?
(total vCPUs / smallest flavor # of vCPUs) = maximum # of instances
Double or quadruple this to account for growth; use the result to size the fixed network range
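The sizing rule above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive capacity planner: the `Host` and `Flavor` classes, the two flavors, and the extra RAM and disk checks are hypothetical additions for illustration. The slide's own formula only divides total vCPUs by the smallest flavor's vCPU count.

```python
from dataclasses import dataclass


@dataclass
class Host:
    physical_cores: int
    threads_per_core: int
    cpu_overcommit: float
    ram_gb: float
    ram_overcommit: float
    disk_gb: float


@dataclass
class Flavor:
    name: str
    vcpus: int
    ram_gb: float
    disk_gb: float


def max_instances(host: Host, flavor: Flavor) -> int:
    """Instances of a single flavor that fit on one host.

    The vCPU term is the formula from the slide; the RAM and disk
    terms are an added assumption so the scarcest resource wins.
    """
    total_vcpus = host.physical_cores * host.threads_per_core * host.cpu_overcommit
    by_cpu = total_vcpus // flavor.vcpus
    by_ram = (host.ram_gb * host.ram_overcommit) // flavor.ram_gb
    by_disk = host.disk_gb // flavor.disk_gb
    return int(min(by_cpu, by_ram, by_disk))


# Example hardware from the slide: 12 cores, hyperthreading, 2:1 CPU
# overcommit, 128GB RAM at 1:1, ~1.2TB usable disk.
host = Host(physical_cores=12, threads_per_core=2, cpu_overcommit=2.0,
            ram_gb=128, ram_overcommit=1.0, disk_gb=1200)

# Hypothetical flavors, for illustration only.
small = Flavor("small", vcpus=1, ram_gb=2, disk_gb=20)
large = Flavor("large", vcpus=4, ram_gb=8, disk_gb=80)

small_count = max_instances(host, small)
large_count = max_instances(host, large)
print(small_count, large_count)
```

With the example host, the hypothetical one-vCPU flavor is capped by CPU at 48 instances, while the larger flavor fits 12. A real mix of flavors lands somewhere in between, which is why the per-host count is hard to pin down in advance.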
17. Build with the End in Mind
Networking
Networking is the important part, get it right!
We can build a cloud with 2 networks (3 if using floating IPs)
Host Network (physical machine access, OpenStack services): Easy to add physical nodes and/or networks
Fixed Network (instance network): Don’t try to change the fixed network once in production
Floating network: Easy to add additional floating networks
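Because the fixed network is hard to change once in production, it helps to size it up front from the expected instance count, doubled or quadrupled for growth. The following is a rough sketch, assuming a single flat IPv4 range. The function name, the 20-host cluster, the 48 instances per host, and the 10.0.0.0 base address are illustrative assumptions rather than values prescribed by the talk.

```python
import ipaddress


def fixed_network_prefix(hosts: int, instances_per_host: int, growth_factor: int = 4) -> int:
    """Longest IPv4 prefix (smallest network) that still covers the planned instances."""
    needed = hosts * instances_per_host * growth_factor
    # Reserve two addresses (network and broadcast). The smallest b with
    # 2**b >= needed + 2 equals (needed + 1).bit_length().
    host_bits = (needed + 1).bit_length()
    return 32 - host_bits


# Hypothetical cluster: 20 hosts, up to 48 instances each, 4x growth headroom.
prefix = fixed_network_prefix(hosts=20, instances_per_host=48, growth_factor=4)

# 10.0.0.0 is an arbitrary example base address.
network = ipaddress.ip_network(f"10.0.0.0/{prefix}")
usable = network.num_addresses - 2
print(network, usable)
```

Rounding up to the next power of two usually leaves some extra headroom on top of the growth factor, which is cheap compared with renumbering a fixed network later.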
18. Build with the End in Mind
Glance
Disk space on the server acting as the Glance backend (file based) will be a limiting factor.
Good alternatives: Swift, CloudFiles, NFS (locally mounted)
Local disk is considerably faster than the alternatives
Will you be leveraging snapshots? If so, disk space will need to be a serious consideration
If using qcow2, set “snapshot_image_format=qcow2” to help limit disk usage
21. Build with the End in Mind
Glance Performance
Network throughput is a limitation
1000Mb/s = 125MB/s max (expect ~112MB/s realistically)
Large sequential reads/writes - RAID5 may be preferred
Lean towards disk bandwidth over raw IOPS
Reduce # of images to allow for more efficient local caches on compute nodes (dramatically increasing performance of instance creation)
Image Size    Not Cached     Cached
1.4GB         20secs         1sec
16.4GB        2min 21secs    1sec
*times from “creating image” to “qemu-img create”
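The throughput arithmetic on this slide can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, it assumes 1GB = 1000MB, and it ignores disk speed and any processing on either end of the transfer.

```python
def link_mb_per_s(link_mbit_per_s: float) -> float:
    """Convert a link speed in megabits/s to megabytes/s (8 bits per byte)."""
    return link_mbit_per_s / 8


def transfer_seconds(image_gb: float, throughput_mb_per_s: float = 112.0) -> float:
    """Rough time to copy an image over the network, assuming 1GB = 1000MB."""
    return image_gb * 1000 / throughput_mb_per_s


theoretical = link_mb_per_s(1000)     # gigabit Ethernet ceiling in MB/s
small_image = transfer_seconds(1.4)   # 1.4GB image at ~112MB/s
large_image = transfer_seconds(16.4)  # 16.4GB image at ~112MB/s
print(theoretical, round(small_image, 1), round(large_image, 1))
```

The estimates come out in the same range as the uncached timings in the table, which suggests the network transfer accounts for most of the uncached instance creation time on a 1Gbps link.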
24. To Swift or not to Swift?
Pros
Scalable object storage that works great as a backend for Glance
Can be leveraged as object storage for other parts of the business
Ability to quickly increase the amount of storage available
Extremely stable if designed correctly
Cons
Additional expertise needed to run Swift
Architecture (network/swift components) design is important to get right
Depending on initial usage, there may be high up front costs to populate 5 zones
25. Architecture Examples and Thoughts
1-20 physical servers
Single controller (single API endpoint, single scheduler, etc) should suffice
Single network (1Gbps) for instance connectivity and OpenStack services is sufficient
Rackspace “Alamo” installer
20-50 physical servers
Single controller (single API endpoint, single scheduler, etc) should suffice
Investigate Swift as a Glance backend.
Start looking into ways to break apart various controller services
28. Architecture Examples and Thoughts
50-100 servers
Keep an eye on the scheduler to make sure it’s not a bottleneck
Strongly consider Swift, especially for snapshots
Consider Availability Zones/Cells (didn’t make it into Folsom)
Consider “frontend” and “backend” networks for
Two or more instance networks? Set “use_single_default_gateway” in nova.conf
30. Performance Considerations and Bottlenecks
IO
20-40 instances per physical server causes high random IO
Reduce IO as much as possible - e.g. centralized logging
Can be further mitigated with Cinder
[Chart: Async Random IO - randR, randW, randR (direct), randW (direct); series: compute/host, compute/host (no ht), compute/host (deadline), and rs/speed/test12 and rs/speed/test13 with cfq, noop, or deadline and cache=none or cache=writeback; axis 0 to 1600]
[Chart: Host vs. Instance - compute/host vs. rs/speed/test12 (cfq, cache=none); randR, randW, seqR, seqW, each with and without direct; axis 0 to 14000]
31. Final Thoughts
Lessons learned
Standardize on a design that works for your organization
Find the right questions to ask
Important to understand OpenStack as a whole
OpenStack is still changing often; keep up to date with the current state of the projects
32. But....
But this is a design summit also
Open to discussions/thoughts/questions
Goal: To give ideas on how to build true private clouds powered by OpenStack software. These slides are meant to serve as a guide, not an end-all-be-all. Ultimately you’ll have to find the right solution for your company. Ask to hold questions till the end.
In a private cloud, you need to understand all of these services and how they interact. Each box on here could be, and probably is, at least one talk here this week. This isn’t plug and play yet, but a number of companies are trying to get it there; lots of companies have released installers.
Multiple private clouds: seems counterintuitive to the concept of the cloud, but drastically different hardware requirements may drive this. Can be mitigated by AZs or Cells. Diagram is overly simplified for this talk; it doesn’t take HA, out-of-band management, backup network, etc. into account.
Don’t paint yourself into an architectural corner by answering just one of these. Answer all of these when looking at design. Private clouds generally don’t have the luxury of building massive capacity like a public cloud.
The problem is that you’ll have no idea how many virtual machines can run on here due to flavors. It is hard to determine the size of your environment. You have a head start if you’re already running workloads on a public cloud.
Fixed network is the one thing you need to get right. Tell why! Assigning a floating IP completely changes the way inbound and outbound connectivity happens. Could be an entire talk on networking in OpenStack. In fact, there probably is; everyone should go to it.
Explain what Glance is. It’s hard to realistically guess how many images, or what size of images, you’ll be using. It’s simpler to standardize on base images and use automation tools to configure services within the instances.
I’d love to see a tool that could detect frequently used images and pre-cache those on remote hosts.
Example of an “object”. Call back to “build with the end in mind”: extremely important to build Swift partitions correctly. Expand on “zones”: could be a drive, a server, a cab.
Everyone’s network utilization will be different. Understand your current usage and plan accordingly. If worried about NIC saturation, break out your nova services (glance, nova services) to a separate network. Single controller: MySQL, rabbitmq-server, keystone, glance registry/api, nova-scheduler, nova-os-api-compute, nova-cert, nova-vncproxy, horizon.
Convey why to consider Swift: 100 nodes, 2000 instances (20 per node), any of them could be snapshotting. Will be a bottleneck. Frontend/backend networks: frontend for external connectivity; backend for instance to instance, instance to non-OpenStack server (dedicated DB,
If you’re not using another system (Cinder, SAN, NetApp, etc.) for additional storage, IO will need to be a top consideration.
Right questions: (not taking hardware into account) I can ask 5 questions and build your environment. As I mentioned when starting, every major slide on here could be an entire talk, and most of these slides are “lessons learned”.
Specifically: thoughts on nova-volumes/Cinder, thoughts on pre-caching, additional thoughts on deployments.