Hwee Ming Ng, Red Hat, Sadique Puthen, Red Hat
Many service providers and communities such as OPNFV see OpenStack as the preferred cloud IaaS platform for NFV. However, OpenStack was not designed with NFV in mind from day one, and adapting it to telco environments brings many challenges. These challenges range from product design and development to solution design and architecture, deployment, and support that matches telco expectations.
Red Hat has been working with a number of early adopters to roll out NFV solutions. We have had many successes, but also our fair share of challenges. With a solution architect and a support engineer presenting together, this session recounts those challenges from the solution design, architecture and support perspectives. They include distributed NFV, high availability everywhere, fault tolerance, predictive recovery, network performance, interoperability with multiple vendors, accommodating different types of VNFs with different operating systems, troubleshooting, and feature availability.
Throughout this session we will touch on these challenges, the possible solutions and how we overcame them, and open a discussion on challenges that do not yet have an acceptable solution. We will also discuss in detail some of the challenges of troubleshooting issues specific to NFV deployments.
3. Architectural Style
Telco-style (e.g. 3GPP SA5) vs Cloud IT-style
● "standards first" (committee-designed system, (re-)implementation to specs) vs "implementations first" (open source reference implementations, standardization to stabilize best practices)
● Model-Driven Architecture (comprehensive info model plus bindings to interface + data models) vs bottom-up, increasingly microservice
● monolithic, additive vs modular, iterative
● domain-specific vs general-purpose
● full-stack transparency vs abstractions, separation of concerns
● workflow-oriented, active vs event-driven, reactive
4. What is NFV…. really?
NFV is the Ultimate Transformation! And one of the largest inflection points ever in the telco industry.
Network Transformation
• From Physical Network Functions (PNFs) to Virtual Network Functions (VNFs)
• Will allow telcos to implement network and IT convergence leveraging common underlying infrastructure
Business Transformation
• Impacts how telcos will procure solutions from vendors
• Impacts how telcos will bring new services to market (e.g. eliminating the need for truck rolls)
Organizational Transformation
• A move to a more Agile and DevOps approach will impact how telcos design, deploy and operate new technologies and services
• A need for more software expertise. The major concern of telco C-level executives is talent: finding and retaining top developers!
7. NFV vs Cloud use cases
● Carrier grade vs Enterprise grade
○ Extensive testing
○ Detect and correct errors
○ Service SLA
● VNF and infrastructure high availability
● VNFs are often not built for the cloud model, which provides fault tolerance in the application layer.
○ Sometimes the OS and applications from the hardware appliance are simply moved and deployed.
● A VNF may come with an operating system not supported on a hypervisor like KVM.
○ FreeBSD, WindRiver, CentOS, custom OS
● Resource scheduling and reservation.
○ E.g. CPU cycles and memory for critical workloads.
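The reservation point can be illustrated with a minimal sketch that splits host CPUs into a set kept for the host and a set dedicated to critical guests. The helper names and the example CPU ranges are hypothetical; only the range syntax follows the common Linux CPU-list form (for example `0-3,8`).

```python
from typing import Set, Tuple


def parse_cpu_list(spec: str) -> Set[int]:
    """Parse a Linux-style CPU list such as "0-3,8" into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def partition_host_cpus(total_cpus: int, reserved_spec: str) -> Tuple[Set[int], Set[int]]:
    """Return (CPUs reserved for the host, CPUs left for critical guests)."""
    all_cpus = set(range(total_cpus))
    reserved = parse_cpu_list(reserved_spec)
    unknown = reserved - all_cpus
    if unknown:
        raise ValueError(f"reserved CPUs not present on host: {sorted(unknown)}")
    return reserved, all_cpus - reserved


# Hypothetical 16-CPU host: keep CPUs 0-1 and 8-9 for the host itself.
host_cpus, guest_cpus = partition_host_cpus(16, "0-1,8-9")
print("host :", sorted(host_cpus))
print("guest:", sorted(guest_cpus))
```

The same split would typically feed both the host-level isolation settings and the compute service configuration, so deriving both from one definition avoids the two drifting apart.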
8. ADDRESSING THE NEED FOR NETWORK FUNCTIONS VIRTUALIZATION
OpenStack support for Network Functions Virtualization ("NFV") is evolving to meet the carrier-grade workload requirements of service providers.
NFV Readiness: performance, availability, security, manageability, lifecycle
9. Operation needs
● Patching and Upgrades without service impact
● Service Assurance
● Monitoring
● Performance
● Logging
● Security and Compliance Check
● Reporting
● Portal
创业难守业更难: to start a business is difficult, but to keep it going is more difficult.
10. CHALLENGES WITH POSITIONING OPENSTACK FOR NFV
Security
Interoperability
Troubleshooting and Support
Life Cycle Management
Distributed NFV
Network Performance
HA, FT and Recovery
14. DISTRIBUTED CLOUD vs DISTRIBUTED NFV
● A cloud is distributed either for disaster recovery or to place deployments (regions) closer to the end user.
○ Mainly achieved through independent deployments or some level of stretched deployments.
○ Some level of storage replication is expected.
○ Most of the time every deployment uses an identical layout, e.g. the same number of controller, storage and compute nodes.
● Distributed NFV must provide multiple types of deployments.
○ Full-fledged deployment for the core.
○ One controller and a hyperconverged deployment for the edge use case.
○ Compute-node-only deployment for CPE.
● Centralized authentication: Keystone federation and Fernet tokens
● Latency of the infrastructure network
● Storage replication
● Hyperconvergence of compute and storage
16. PERFORMANCE
● Telcos need low latency and the best possible performance for processing network packets.
● The NFV solution should provide performance with an SLA attached to it.
● Delivering that performance requires a set of features, and getting them all to work together is a challenge. E.g.:
○ SR-IOV
○ CPU Pinning
○ Hugepages
○ DPDK
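As a rough illustration of how two of these features are requested per workload, the sketch below builds Nova flavor extra specs for CPU pinning and hugepages and renders the matching `openstack flavor set` command. The flavor name `vnf.dataplane` and the helper functions are hypothetical; SR-IOV and DPDK are configured at the port and host level and are not covered here.

```python
import shlex
from typing import Dict


def nfv_flavor_properties(numa_nodes: int = 1, page_size: str = "large") -> Dict[str, str]:
    """Flavor extra specs asking for pinned CPUs and hugepage-backed memory."""
    return {
        "hw:cpu_policy": "dedicated",    # pin each vCPU to a host CPU
        "hw:mem_page_size": page_size,   # back guest RAM with hugepages
        "hw:numa_nodes": str(numa_nodes),
    }


def flavor_set_command(flavor: str, props: Dict[str, str]) -> str:
    """Render the CLI call that would apply the properties to a flavor."""
    args = ["openstack", "flavor", "set"]
    for key, value in sorted(props.items()):
        args += ["--property", f"{key}={value}"]
    args.append(flavor)
    return " ".join(shlex.quote(arg) for arg in args)


# "vnf.dataplane" is a made-up flavor name used only for illustration.
command = flavor_set_command("vnf.dataplane", nfv_flavor_properties())
print(command)
```

Generating the command from one dictionary keeps the performance settings reviewable in a single place, which helps when an SLA has to be traced back to concrete configuration.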
18. HIGH AVAILABILITY, FT AND RECOVERY
● Telcos need end-to-end high availability with predictable failover and recovery.
● Who should do what?
○ HA of NFV Infrastructure
■ Control plane HA
■ Network and storage high availability
■ Instance HA through nova evacuation.
■ Predictable failover time.
● High availability of VNFs
○ Instance HA
○ MANO or external monitoring services?
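One way to reason about "predictable failover time" is to add up the stages between a compute node failing and the VNF serving traffic again. The sketch below does that arithmetic; the stage names and all the numbers are assumptions chosen for illustration, not measurements from any deployment.

```python
from dataclasses import dataclass


@dataclass
class FailoverBudget:
    """Worst-case time budget for recovering an instance after a host failure."""
    probe_interval_s: float   # how often the monitor checks the host
    failed_probes: int        # consecutive failures before declaring the host dead
    fencing_s: float          # time to fence (power off) the failed host
    evacuation_s: float       # time to rebuild the instance on another host
    guest_boot_s: float       # time for the VNF itself to boot and serve traffic

    def detection_s(self) -> float:
        # Worst case: the host dies right after a successful probe.
        return self.probe_interval_s * self.failed_probes

    def worst_case_s(self) -> float:
        return self.detection_s() + self.fencing_s + self.evacuation_s + self.guest_boot_s

    def meets(self, target_s: float) -> bool:
        return self.worst_case_s() <= target_s


# Illustrative numbers only.
budget = FailoverBudget(probe_interval_s=5, failed_probes=3,
                        fencing_s=20, evacuation_s=30, guest_boot_s=45)
print(budget.worst_case_s())
print(budget.meets(target_s=60))
```

Writing the budget down per stage also makes the "who should do what" question concrete: infrastructure HA owns detection, fencing and evacuation, while the VNF or MANO layer owns whatever remains of the target.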
20. LIFECYCLE MANAGEMENT
● Deployment and Life Cycle Management of the NFV Infrastructure.
● Heavy customization requirements to integrate with third party components.
○ OSPd integration with third party SDN
○ Automating security related changes
● Getting all NFV features out of the box. Most of them are available, but some still are not.
● Lack of monitoring, log aggregation and performance metrics out of the box.
● Life cycle management of VNFs from multiple vendors (MANO).
22. INTEROPERABILITY WITH MULTIPLE VENDORS
● An NFV solution is not just OSP. It involves
○ MANO
○ Different types of VNFs
○ SDN
○ NFVI
○ VNF Manager
● We position OSP for NFVI and rely on partners to integrate the other pieces well with OSP for NFVI.
● Certification is important and helps to some extent.
● Everyone needs a basic understanding of how this works in order to design/architect, deploy and operate effectively. E.g.:
○ Orchestration from MANO fails because instance boot fails, volume attachment fails, etc.
○ VNF fails to boot.
○ VNF crashes randomly.
○ Network performance issue to and from VNF.
○ Configuration related options.
24. TROUBLESHOOTING AND SUPPORT
● Troubleshooting configuration issues.
● Troubleshooting performance issues.
● It takes time to isolate an issue to a specific OSP component within the whole NFV stack.
● A VNF may come with an operating system not supported on RHEL KVM, which makes troubleshooting very hard.
○ FreeBSD, WindRiver, CentOS, customized OS
○ Requires close collaboration with the VNF vendor to troubleshoot.
● E.g.:
○ VNF panics randomly while using an unsupported OS.
○ VNF boots up slowly (an issue with a KVM isolcpus bug).
27. SECURITY
● Telco deployments have more stringent security requirements.
○ Audit and compliance to meet certain standards, e.g. ANSSI.
● While OpenStack was built with security in mind, it does not have all the security features baked into the product to meet telco requirements.
○ E.g. end-to-end encryption of Cinder volumes with key management.
■ Barbican
○ SSL everywhere.
■ Available, but configuring it properly and automating it via deployment tools is a challenge.
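A small audit script is one hedged way to check the "SSL everywhere" goal after deployment. The sketch below flags any service endpoint whose URL is not HTTPS; the endpoint names and URLs are invented for illustration, and in practice the list would come from the service catalog.

```python
from typing import Dict, List
from urllib.parse import urlparse


def non_tls_endpoints(endpoints: Dict[str, str]) -> List[str]:
    """Return the names of endpoints whose URL scheme is not https."""
    return sorted(
        name for name, url in endpoints.items()
        if urlparse(url).scheme.lower() != "https"
    )


# Hypothetical endpoint list; hostnames and ports are placeholders.
endpoints = {
    "identity": "https://cloud.example.com:13000/v3",
    "volume": "http://cloud.example.com:8776/v3",
    "key-manager": "https://cloud.example.com:13311",
}
print(non_tls_endpoints(endpoints))
```

A check like this only confirms the URL scheme. It does not validate certificates or internal service-to-service traffic, which would need separate verification.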
28. SUMMARY
● We have made great progress as a community to close the gaps.
● Collaborate with the community. Avoid forking.
● Augment OpenStack with other components, e.g. a cloud management solution, FCAPS.