In this session, we outline the development of a policy-driven scheduler that allows the Nova administrator (as well as Nova tenant administrators) to control placement decisions by writing policy statements that dictate which VMs or containers can be placed on which hosts. The DevOps-friendly, policy-driven scheduler plugs into the existing scheduler framework so that people can quickly express custom scheduling rules and constraints while still providing assurances related to performance, security, and data integrity.
2. 2
Dell - Restricted - Confidential
Team
• Core Team (besides presenters)
– Arun Yerra, Dell
– Dilip Krishnaswamy, IBM Research
– Joseph Gasparakis, Intel
– Ruby Krishnaswamy, Orange
• Contributor Acknowledgement
– Anoop Ghanwani, Dell
– Diego Lopez, Telefonica (Operator)
– Francisco Javier, Telefonica (Operator)
– Frank Zdarsky, Red Hat
– Jim Hao Chen, Northwestern University
– Norival Figueira, Brocade
– Peter Willis, BT (Operator)
– Sridhar Ramaswamy, Brocade
– Steve Gordon, Red Hat
– Sylvain Bauza, Red Hat
– Uri Elzur, Intel
3. 3
OpenStack Nova Scheduler Challenges
• Platform Features Beyond Compute
– SDS use case: High perf storage and compute isolation
– Wait for next OpenStack Release?
• Ease of Use
– Gen use case: Determine highly loaded or unusable hosts
– Build use case specific analysis tools?
• Initial Placement vs other Functions
– NFV use case: Dynamic monitoring and violation detection
– Design one-off monitoring framework?
[Figure labels: Admin, User]
4. 4
Application Performance-aware Workload Placement (1)
Delivering “Low-latency, reliable delivery” workloads, e.g. Broadcast Video, Distance Learning, Augmented Reality in the Telco Cloud
• NFV Orchestrator - End-to-end - Intra-dc, Inter-dc WAN etc.
• Exemplary VNFs - Stateful firewall, Wireless Video Proxy, Crypto
• Compute: Fine grained resource partitioning for VM
– Dedicated core(s) AND NUMA awareness AND L3 cache partitioning [1] AND SR-IOV *** ELSE ***
– Dedicated core(s) AND NUMA awareness AND L3 cache partitioning AND DPDK vSwitch *** ELSE ***
– Dedicated physical server
• Network: Overlay/Underlay QoS
– High QoS AND Minimum buffer depth in switches
• Storage: High Performance Logging
– NVMe SSD based storage *** ELSE *** SSD based storage
Ref. [1] - Intel RDT - http://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html
[Figure labels: 3G, 4G, 5G; Premium Quality Video; Poor Quality Video; Infrastructure Issues]
5. 5
Application Performance-aware Workload Placement (2)
Delivering “Classic enterprise” workloads, e.g. Email, CRM in the Telco Cloud
• Exemplary data plane VNFs - Stateful firewall, IDS/IPS, WAN Opt and IPSEC crypto
• Compute: Deterministic performance by avoiding memory contention
– NUMA awareness AND SR-IOV *** ELSE ***
– NUMA awareness
• Network: No HA requirement
• Storage: SSD for High performance logging
Delivering “Residential broadband” workloads, e.g. cost-effective Internet in the Telco Cloud
• Exemplary data plane VNFs - NAT
• Compute/Network: Max capacity limit
• Storage: HDD for Low cost
6. 6
Policy-driven Scheduler Approach (1)
Minimize Vendor Lock-in and Dependency
Maximize feature velocity
• Extensibility
– Admin/User can add new compute (Nova), networking (Neutron), storage (Cinder) constraints on the fly
– Benefit: Minimize additional code
• Understandability
– Admin/User uses human-readable scheduling policies and builds analysis tools as needed
– Benefit: No custom analysis tools
• Monitoring
– Admin/User benefits from a single representation for handling variation in resource utilization and initial placement
– Benefit: No delay in monitoring feature availability
7. 7
Policy-driven Scheduler Approach (2)
Best of Breed
• Imperative Interface Choices
– Extensions to current JSON filter: JSON Weight
– Enable user to customize specific applications
• Declarative Interface Choices
– JSON Filter extensions to current Nova Flavors
– Datalog embedded in YAML for flexible constraint specification and database manipulation
– Address User understandability, Admin extensibility
8. 8
Imperative Example
User Describes Desired Hardware
• User Request (to the Policy-driven Scheduler)
– NUMA and SR-IOV
– else NUMA and more cores
• Hosts
– Host1: SRIOV
– Host2: NUMA, SRIOV
– Host3: NUMA, more cores
– Host4: L3 partitioning
• Output
– Host 2 (weight 2)
– Host 3 (weight 1)
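The imperative example above can be sketched in a few lines of Python. This is only an illustration, not the proposed scheduler code: the capability names (`numa`, `sriov`, `more_cores`, `l3_partitioning`), the `place` function, and the representation of the request as an ordered list of (required capabilities, weight) pairs are all assumptions made for this sketch.

```python
# Hypothetical capability sets mirroring the four hosts on the slide.
HOSTS = {
    "Host 1": {"sriov"},
    "Host 2": {"numa", "sriov"},
    "Host 3": {"numa", "more_cores"},
    "Host 4": {"l3_partitioning"},
}

# The user's request as ordered alternatives: (required capabilities, weight).
# "NUMA and SR-IOV" is preferred (weight 2), "NUMA and more cores" is the fallback (weight 1).
REQUEST = [
    ({"numa", "sriov"}, 2),
    ({"numa", "more_cores"}, 1),
]


def place(hosts, request):
    """Return (host, weight) pairs for hosts that satisfy any alternative, best first."""
    ranked = []
    for name, capabilities in hosts.items():
        for required, weight in request:
            # A host matches an alternative when it offers every required capability.
            if required <= capabilities:
                ranked.append((name, weight))
                break  # keep only the first (most preferred) matching alternative
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


result = place(HOSTS, REQUEST)
print(result)
```

Hosts 1 and 4 match neither alternative and are dropped, which corresponds to the filtering step; the weights then order the remaining hosts.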
9. 9
Declarative Example
User Describes Workload
• User Request (to the Policy-driven Scheduler)
– affinity: ["vm123", "vm456"]
– memory: 10GB
– type: "low-latency, reliable-delivery"
• Policy (from the Policy Store)
– This type requires local ephemeral SSD-backed storage
• Hosts: Host 1, Host 2, Host 3, Host 4
• Host2 data
– memory: 20GB
– storage: ssd
• Output: Host 2
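A minimal sketch of the declarative flow above, in Python. Only the request fields and the Host 2 data come from the slide; the `POLICY_STORE` layout, the `eligible` function, the data for Hosts 1, 3 and 4, and the assumption that vm123 and vm456 already run on Host 2 are hypothetical and exist only to make the example runnable.

```python
# Hypothetical policy store: the admin maps a workload type to hardware requirements.
POLICY_STORE = {
    "low-latency, reliable-delivery": {"storage": "ssd"},
}

# Host 2 follows the slide (20GB memory, SSD storage); the other hosts are invented.
HOSTS = {
    "Host 1": {"free_memory_gb": 8, "storage": "ssd", "vms": set()},
    "Host 2": {"free_memory_gb": 20, "storage": "ssd", "vms": {"vm123", "vm456"}},
    "Host 3": {"free_memory_gb": 32, "storage": "hdd", "vms": set()},
    "Host 4": {"free_memory_gb": 16, "storage": "hdd", "vms": {"vm789"}},
}

REQUEST = {
    "affinity": ["vm123", "vm456"],
    "memory_gb": 10,
    "type": "low-latency, reliable-delivery",
}


def eligible(host, request, policy_store):
    """Check the user's request plus whatever the admin policy demands for its type."""
    if host["free_memory_gb"] < request["memory_gb"]:
        return False
    # Affinity is read here as "place on the host that already runs these VMs".
    if not set(request["affinity"]) <= host["vms"]:
        return False
    required = policy_store.get(request["type"], {})
    return all(host.get(key) == value for key, value in required.items())


selected = [name for name, host in HOSTS.items() if eligible(host, REQUEST, POLICY_STORE)]
print(selected)
```

The user never mentions storage; the SSD requirement is pulled from the policy store by workload type, which is the point of the declarative approach.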
10. 10
OpenStack Nova Scheduler
• Input: Host 1, Host 2, Host 3, ... Host N
• Filters
– 30+ types of filters.
– Find the subset of suitable hosts.
– Result: Host 1, Host 3, Host 8, Host 9
• Weighting
– Order suitable hosts.
– Result: Host 8, Host 1, Host 9, Host 3, ...
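The filter-then-weigh flow described above can be sketched as a two-stage pipeline. This is a simplified illustration and not Nova's actual classes or API: the `schedule` function, the host dictionaries, and the two lambda filters are assumptions chosen so the result matches the host order shown on the slide.

```python
def schedule(hosts, filters, weigher):
    """Keep hosts that pass every filter, then order them by descending weight."""
    suitable = [host for host in hosts if all(check(host) for check in filters)]
    return sorted(suitable, key=weigher, reverse=True)


# Hypothetical host data; only the names and ordering follow the slide.
hosts = [
    {"name": "Host 1", "free_ram_gb": 7, "zone": "az1"},
    {"name": "Host 2", "free_ram_gb": 0, "zone": "az1"},
    {"name": "Host 3", "free_ram_gb": 1, "zone": "az1"},
    {"name": "Host 8", "free_ram_gb": 10, "zone": "az1"},
    {"name": "Host 9", "free_ram_gb": 3, "zone": "az1"},
    {"name": "Host 10", "free_ram_gb": 64, "zone": "az2"},
]

# Two illustrative filters: enough free RAM, and the requested availability zone.
filters = [
    lambda host: host["free_ram_gb"] >= 1,
    lambda host: host["zone"] == "az1",
]

ordered = [host["name"] for host in schedule(hosts, filters, lambda host: host["free_ram_gb"])]
print(ordered)
```

Filters are hard pass/fail checks, so Host 10 is excluded despite having the most free RAM; weighting only reorders the hosts that survive filtering.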
11. 11
Nova Scheduler Filter
• Administrator configures the filter list (30+ options)
• scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
• Admin configures various filter input data sets such as the
flavor definition with extra_specs
Host 1, Host 3, Host 8, Host 9: Each host complies with an imperative request based on user and admin input, e.g. 4GB for VM, huge pages, AES-NI, same availability zone, PCIe accelerators, can meet image property requirements, etc.
12. 12
Nova Scheduler Weight
• Configured by the administrator.
• RAM
– Spread across hosts evenly based on RAM utilisation.
• Metrics
– Weigh hosts based on a combination of the weight associated with the specified host_state metrics.
• IO Ops
– Weigh hosts based on I/O operations.
• Affinity
– Weigh hosts based on the number of instances from a given server group.
– Affinity and Anti-Affinity options available.
RAM-centric weighting policy (example ordering)
– Host 8: 10GB Free
– Host 1: 7GB Free
– Host 9: 3GB Free
– Host 3: 1GB Free
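To illustrate how several weighers might be combined, the sketch below normalizes each metric to the range 0 to 1, scales it by a multiplier, and sums the results per host. The free RAM figures come from the slide; the `io_ops` values, the `normalize` and `rank` helpers, and the multiplier values are assumptions for this example and do not reproduce Nova's weigher code.

```python
# Free RAM follows the slide; the io_ops figures are invented for illustration.
HOSTS = {
    "Host 8": {"free_ram_gb": 10, "io_ops": 4},
    "Host 1": {"free_ram_gb": 7, "io_ops": 0},
    "Host 9": {"free_ram_gb": 3, "io_ops": 1},
    "Host 3": {"free_ram_gb": 1, "io_ops": 2},
}


def normalize(values):
    """Min-max scale a list of numbers to [0, 1]; all-equal inputs map to 0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def rank(hosts, multipliers):
    """Order host names by the sum of multiplier * normalized metric, best first."""
    names = list(hosts)
    totals = {name: 0.0 for name in names}
    for metric, multiplier in multipliers.items():
        scaled = normalize([hosts[name][metric] for name in names])
        for name, value in zip(names, scaled):
            totals[name] += multiplier * value
    return sorted(names, key=totals.get, reverse=True)


# RAM-centric policy: only free RAM counts.
ram_only = rank(HOSTS, {"free_ram_gb": 1.0})
# Mixed policy: a negative multiplier penalizes hosts with more I/O operations.
ram_and_io = rank(HOSTS, {"free_ram_gb": 1.0, "io_ops": -1.0})
print(ram_only)
print(ram_and_io)
```

With the RAM-only multipliers the order matches the slide; adding the I/O penalty moves the busy Host 8 behind Host 1, showing how a different weighting policy changes placement without touching the filters.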
13. 13
Problem Statement(s) – Nova Placement
• Administrator input to the filter scheduler is largely static and Nova-centric
– E.g. flavor and extra_specs definitions, host aggregate definitions, etc.
• Not possible to deploy to a given service level with different infrastructure resource allocations (in the same request) under policy governance.
• Not possible to modify the weighting configuration/policy for different parts of the environment such as per availability zone or host aggregates.
14. 14
Empower User: JsonFilter + JsonWeight
• Filter Scheduler: JsonFilter, JsonWeight
• Host Data (Nova’s HostState)
• User Request
– NUMA and SR-IOV weighted 2
– NUMA and more cores weighted 1
15. 15
Empower Admin 1: New Filter
• Filter Scheduler: AdminJsonFilter, AdminJsonWeight
• Policy Store (Config, File)
• Host Data (Nova’s HostState)
• User Request
– workload: “low-latency, reliable-delivery”
– tenant-id: “pepsi”
• Pro: Extensible by admin to external data sources like Cinder and Neutron.
• Con: New filter on already long list.
16. 16
Empower Admin 2: Modify Existing Filters
Flavor fields
Field             Description
vCPUs             Number of virtual CPUs
Memory_MB         VM memory in megabytes
Disk              Virtual root disk size in GB
…
Extra_specs       Key-value pairs
Policy            AND/OR/NOT of tests

Host Aggregate Fields
Field             Description
ID                Aggregate identifier
Name              Aggregate name
AvailabilityZone  Availability zone of the aggregate
Hosts             List of hosts in group
Metadata          Key-value pairs
Policy            AND/OR/NOT of tests

• Pro: Extensible by admin. Already part of workflow.
• Con: Adds complexity to established filters
17. 17
Status
• Concept stage with early drafts of several specs
– Imperative: json-weight
– Declarative:
– New scheduler: policy-based-scheduler
– New filter+weight: admin-json-filter
– Modify existing flavor: flavor-policy
– New Host aggregate field: host-aggregate-policy
• 3 sessions at this summit
– Wednesday, 9-10:30 (Nova scheduler working session)
– Wednesday, 11-11:40 (Congress Integrations session)
– Wednesday, 11:45-12:30 (NFV Orchestration BoF)
18. 18
Key Takeaways
• Contributors: 10+ companies
• Goal: Policy-driven scheduling, Service-assured resource-allocation
• Approach:
– Imperative: User describes desired hardware in policy language (JSON Weight) OR
– Declarative: User describes application; admin maps application to hardware (Admin JSON Filter, Admin JSON Weight; enhance Flavor and Host Aggregates)
• Weekly meeting: 8am Pacific = 1300 UTC
– Please join us!
21. 21
Policy Language: JsonFilter and JsonWeight
For "low-latency" workloads:
• At least 8GB of free ram
• At least 8 free vCPUs
• NUMA awareness
['or', ['and', ['=', '$user.type', 'low-latency'],
               ['>=', '$host.free_ram_mb', 8*1024],
               ['>=', '$host.vcpus_total' - '$host.vcpus_used', 8],
               ['not', ['=', '$host.numa_topology', 'None']]]]
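A prefix-notation policy like the one above can be evaluated with a small recursive function. The sketch below is a hypothetical evaluator written for illustration, not the JsonFilter implementation: the `evaluate` function, the `$scope.key` lookup convention, the prefix form `['-', a, b]` used for the free vCPU subtraction, and storing `numa_topology` as a string are all assumptions.

```python
import operator

# Comparison and arithmetic operators supported by this illustrative evaluator.
COMPARE = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
           "<=": operator.le, ">=": operator.ge}
ARITHMETIC = {"+": operator.add, "-": operator.sub}


def evaluate(expr, env):
    """Evaluate a nested-list policy expression against env = {'user': {...}, 'host': {...}}."""
    if isinstance(expr, str) and expr.startswith("$"):
        scope, _, key = expr[1:].partition(".")  # "$host.free_ram_mb" -> ("host", "free_ram_mb")
        return env[scope][key]
    if not isinstance(expr, list):
        return expr  # literal number or string
    op, *args = expr
    if op == "and":
        return all(evaluate(arg, env) for arg in args)
    if op == "or":
        return any(evaluate(arg, env) for arg in args)
    if op == "not":
        return not evaluate(args[0], env)
    left, right = (evaluate(arg, env) for arg in args)
    if op in COMPARE:
        return COMPARE[op](left, right)
    if op in ARITHMETIC:
        return ARITHMETIC[op](left, right)
    raise ValueError(f"unknown operator: {op!r}")


# The low-latency policy, with the subtraction written in prefix form (an assumption).
POLICY = ["or", ["and", ["=", "$user.type", "low-latency"],
                 [">=", "$host.free_ram_mb", 8 * 1024],
                 [">=", ["-", "$host.vcpus_total", "$host.vcpus_used"], 8],
                 ["not", ["=", "$host.numa_topology", "None"]]]]

user = {"type": "low-latency"}
roomy_host = {"free_ram_mb": 16384, "vcpus_total": 32, "vcpus_used": 8, "numa_topology": "2 cells"}
busy_host = {"free_ram_mb": 16384, "vcpus_total": 32, "vcpus_used": 28, "numa_topology": "2 cells"}

roomy_ok = evaluate(POLICY, {"user": user, "host": roomy_host})
busy_ok = evaluate(POLICY, {"user": user, "host": busy_host})
print(roomy_ok, busy_ok)
```

The busy host fails only the free vCPU test (32 - 28 = 4), so the whole `and` clause, and therefore the policy, evaluates to false for it.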
22. 22
Policy Language: YAML based policy
parameters:
  availability_zone:
    type: String
    label: availability zone number
    description: Name of the availability zone server should be hosted on.
  affinity:
    type: String
    label: Affinity
    description: Affinity Group Id
  ram:
    type: integer
    label: RAM
    description: Minimum RAM size required by server instance in GB.
hard_constraints:
  ram_constraint:
    operation_type: min
    value: { get_param: ram }
  affinity_constraint:
    operation_type: equals
    value: { get_param: affinity }
  availability_zone_constraint:
    operation_type: equals
    value: { get_param: availability_zone }
soft_constraints:
  ram_factor:
    operation_type: multiplication
    value: { get_param: ram-weight }
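One possible reading of the YAML policy above is that hard constraints act as filters and soft constraints contribute to a host's weight. The Python sketch below follows that reading under several assumptions that the slide does not state: the policy is represented as an already-parsed dictionary, each constraint applies to the host attribute named by stripping the `_constraint` or `_factor` suffix, and the parameter values and host data are invented.

```python
# The policy as it might look after parsing the YAML (structure taken from the slide).
POLICY = {
    "hard_constraints": {
        "ram_constraint": {"operation_type": "min", "value": {"get_param": "ram"}},
        "affinity_constraint": {"operation_type": "equals", "value": {"get_param": "affinity"}},
        "availability_zone_constraint": {"operation_type": "equals",
                                         "value": {"get_param": "availability_zone"}},
    },
    "soft_constraints": {
        "ram_factor": {"operation_type": "multiplication", "value": {"get_param": "ram-weight"}},
    },
}


def resolve(value, params):
    """Replace a {get_param: name} reference with the supplied parameter value."""
    if isinstance(value, dict) and "get_param" in value:
        return params[value["get_param"]]
    return value


def attribute_of(constraint_name):
    """Assumed naming convention: 'ram_constraint' and 'ram_factor' both refer to 'ram'."""
    return constraint_name.rsplit("_", 1)[0]


def passes(host, policy, params):
    """Hard constraints: every one must hold or the host is rejected."""
    for name, constraint in policy["hard_constraints"].items():
        actual = host[attribute_of(name)]
        wanted = resolve(constraint["value"], params)
        if constraint["operation_type"] == "min" and actual < wanted:
            return False
        if constraint["operation_type"] == "equals" and actual != wanted:
            return False
    return True


def score(host, policy, params):
    """Soft constraints: each 'multiplication' adds attribute * factor to the weight."""
    total = 0.0
    for name, constraint in policy["soft_constraints"].items():
        if constraint["operation_type"] == "multiplication":
            total += host[attribute_of(name)] * resolve(constraint["value"], params)
    return total


# Hypothetical request parameters and host data.
params = {"ram": 8, "affinity": "group-1", "availability_zone": "az1", "ram-weight": 1.0}
hosts = {
    "Host 1": {"ram": 16, "affinity": "group-1", "availability_zone": "az1"},
    "Host 2": {"ram": 4, "affinity": "group-1", "availability_zone": "az1"},
    "Host 3": {"ram": 32, "affinity": "group-1", "availability_zone": "az1"},
    "Host 4": {"ram": 64, "affinity": "group-2", "availability_zone": "az1"},
}

ranked = sorted(
    (name for name, host in hosts.items() if passes(host, POLICY, params)),
    key=lambda name: score(hosts[name], POLICY, params),
    reverse=True,
)
print(ranked)
```

Host 2 fails the minimum RAM constraint and Host 4 fails the affinity constraint, so only Hosts 1 and 3 are scored and ordered by the RAM factor.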