Enabling Carrier-Grade Availability
Within a Cloud Infrastructure
Aaron Smith, Red Hat
Pasi Vaananen, Red Hat
Agenda
• Introduction
• Problem and goals
• Fault management cycle and timeline
• Relative impact to Service Availability
• Proof of concept
• PoC results
• What's next?
Problem
• The move to NFV and a cloud infrastructure complicates the
delivery of highly available services
  • No longer a vertically integrated hardware / software stack
  • Stack components provided by different vendors
• The same timing requirements apply (50 ms … 1000 ms, increasing by “layer”)
• In a cloud infrastructure, the network affects availability more
than individual compute hosts do, so detection / protection strategies
must adjust accordingly
Goals
• Produce a monitoring and event detection framework that
distributes fault information to various listeners with low latency
(within tens of milliseconds)
• Provide a hierarchy of remediation controllers that can react
quickly (within tens of milliseconds) to faults
• Provide FM mechanisms for both current virtualization environments
and future containerized environments orchestrated by
Kubernetes and similar systems
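The first goal can be pictured with a minimal in-process publish/subscribe sketch. This is an illustration only, not the framework described in the talk: the class names `FaultEvent` and `EventBus`, the topic name `link_down`, and the source string are all hypothetical, and a real deployment would deliver events over a message bus rather than direct function calls.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, List


@dataclass
class FaultEvent:
    source: str  # hypothetical identifier, e.g. "compute-12/eth0"
    kind: str    # hypothetical topic name, e.g. "link_down"
    detected_at: float = field(default_factory=time.monotonic)


Listener = Callable[[FaultEvent, float], None]


class EventBus:
    """Fans a fault event out to every listener subscribed to its kind."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, kind: str, listener: Listener) -> None:
        self._listeners[kind].append(listener)

    def publish(self, event: FaultEvent) -> int:
        """Deliver the event; each listener also receives the elapsed
        time since detection, in milliseconds. Returns the delivery count."""
        delivered = 0
        for listener in self._listeners.get(event.kind, []):
            latency_ms = (time.monotonic() - event.detected_at) * 1000.0
            listener(event, latency_ms)
            delivered += 1
        return delivered


if __name__ == "__main__":
    bus = EventBus()
    bus.subscribe("link_down", lambda e, ms: print(f"{e.source}: {ms:.3f} ms"))
    bus.publish(FaultEvent(source="compute-12/eth0", kind="link_down"))
```

Passing the detection-to-delivery latency to each listener makes it easy to record whether the tens-of-milliseconds goal is being met per event.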
Fault Management Cycle
[Figure: fault management cycle diagram. Labels: Detection (Prediction), Localization, Isolation, Remediation, Recovery, Diagnosis, Re-pool, Repair, Suspect HW, Bad, Good.]
Fault Management Cycle Phases
• Detection – Requires low-latency, low-overhead mechanisms
• Localization – Physical/Virtualized resources to resource
consumer(s) mapping within the context of fault trees
• Isolation – Remove the ability of the failed component to
affect service state
• Remediation – Service restoration through failover to
redundant resource / component, or component restart
• Recovery – Restoration of service redundancy configuration
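The localization step can be sketched as a walk over a dependency map from a failed resource to everything that consumes it. This is a minimal sketch under stated assumptions: the `DEPENDENTS` topology and all resource names are made up for illustration, and a plain dependency graph stands in for the fault trees mentioned above.

```python
from typing import Dict, List, Set

# Hypothetical topology: each resource maps to the resources or
# services that directly depend on it.
DEPENDENTS: Dict[str, List[str]] = {
    "leaf-1": ["compute-1", "compute-2"],
    "compute-1": ["vm-a", "vm-b"],
    "compute-2": ["vm-c"],
    "vm-a": ["svc-voice"],
    "vm-b": ["svc-voice"],
    "vm-c": ["svc-data"],
}


def affected_consumers(failed: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Return every resource or service reachable from the failed one."""
    seen: Set[str] = set()
    stack = [failed]
    while stack:
        node = stack.pop()
        for child in dependents.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


if __name__ == "__main__":
    print(sorted(affected_consumers("compute-1", DEPENDENTS)))
    print(sorted(affected_consumers("leaf-1", DEPENDENTS)))
```

In this toy topology a compute node fault reaches three consumers, while a fault on the leaf switch above it reaches all seven.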
FM Cycle Timeline
[Figure: FM cycle timeline. Service states: Up, Redundant; Down, Remediation; Up, Recovering; Up, Repair Pending; Up, Redundant. Milestones: 1st failure (potential outage or degradation); 1st indication (FM cycle start); failure event; service recovered; redundancy restored (pooled); repair completed (non-pooled); redundancy restored (non-pooled).]
• TUA = TDET + TREM; the goal is to minimize TUA
• TREC, pooled: 2nd failure exposure, typ. ~2 mins (MTTREC)
• TREP / TREC, non-pooled: 2nd failure exposure, typ. 4+ hrs (MTTREP)
• For non-pooled resources: coupled, critical repair
• For pooled resources: uncoupled, deferred repairs
Fault Management Cycle Timeline
• TDET + TNOT + TREM < 50 ms (lowest “layers”, typ. network)
• TDET – Detection time
• TNOT – Notification time
• TREM – Remediation time; remediation is often the longest step, so TDET
+ TNOT should be made as small as possible
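The budget arithmetic can be checked with a short sketch. The 50 ms figure comes from the slide; the class name `FaultTimings`, its field names, and the sample timings are assumptions made for illustration, not measurements from the PoC.

```python
from dataclasses import dataclass

BUDGET_MS = 50.0  # lowest-"layer" target quoted on the slide


@dataclass(frozen=True)
class FaultTimings:
    t_det_ms: float  # detection time
    t_not_ms: float  # notification time
    t_rem_ms: float  # remediation time

    @property
    def total_ms(self) -> float:
        return self.t_det_ms + self.t_not_ms + self.t_rem_ms

    def within_budget(self, budget_ms: float = BUDGET_MS) -> bool:
        """True when TDET + TNOT + TREM is strictly below the budget."""
        return self.total_ms < budget_ms

    def remediation_headroom_ms(self, budget_ms: float = BUDGET_MS) -> float:
        """Time left for remediation once detection and notification are paid for."""
        return budget_ms - self.t_det_ms - self.t_not_ms


if __name__ == "__main__":
    # Illustrative numbers only.
    fast = FaultTimings(t_det_ms=5.0, t_not_ms=5.0, t_rem_ms=35.0)
    slow = FaultTimings(t_det_ms=10.0, t_not_ms=10.0, t_rem_ms=35.0)
    print(fast.total_ms, fast.within_budget())  # 45.0 True
    print(slow.total_ms, slow.within_budget())  # 55.0 False
```

With the same 35 ms remediation time, halving detection and notification is what brings the total under 50 ms, which is why those two terms are the ones to minimize.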
Automated Service Recovery Survey
[Figure: bar chart of required automated service recovery time, x-axis 0% to 45%. Categories as listed: Within 1 second; Within 50 ms; Within 5 seconds; Automated recovery not important. Bar values as listed: 40%, 39%, 20%, 1%.]
Source: Heavy Reading NFV operator survey of 128 service providers, “Telco Requirements for NFVI”, November 2016
Relative Impact to Service Availability
• Different infrastructure components have different potential impact
on application-level Service Availability (SA), e.g.:
  • Network switch faults have a very high impact potential on SA
(can affect all associated nodes / services)
  • Compute node faults can only affect the VMs / containers running on
them
• Spine > Leaf > Network Nodes > Storage Nodes > Control Nodes >
Control Node (Specific Service) > Compute Nodes > Compute Nodes
(Critical Services) > Compute Node (Specific VM/Container)
Service Relative Criticality (cont’d)
• Focus monitoring / remediation efforts according to
relative impact potential, e.g.:
  • A switch failure affects 10s of hosts (100s of services)
  • Switch failures therefore need fast detection and remediation
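One way to apply the impact ordering (Spine > Leaf > ... > Compute Node (Specific VM/Container)) is to sort pending faults by it, so the highest-impact component is handled first. This is a sketch only: the identifier strings in `IMPACT_ORDER` are hypothetical labels for the classes named in the ordering, and the fault tuples are made-up examples.

```python
from typing import List, Tuple

# Component classes, highest impact potential first, following the
# ordering given in the slides. The identifiers are illustrative.
IMPACT_ORDER: List[str] = [
    "spine",
    "leaf",
    "network_node",
    "storage_node",
    "control_node",
    "control_node_service",
    "compute_node",
    "compute_node_critical_services",
    "compute_node_instance",
]


def impact_rank(kind: str) -> int:
    """Lower rank means higher impact; unknown kinds sort last."""
    try:
        return IMPACT_ORDER.index(kind)
    except ValueError:
        return len(IMPACT_ORDER)


def prioritize(faults: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Order (name, kind) faults so the highest-impact class comes first."""
    return sorted(faults, key=lambda fault: impact_rank(fault[1]))


if __name__ == "__main__":
    pending = [
        ("compute-7", "compute_node"),
        ("leaf-2", "leaf"),
        ("ctl-1", "control_node"),
    ]
    for name, kind in prioritize(pending):
        print(name, kind)
```

Because Python's sort is stable, faults of the same class keep their arrival order.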
Proof of Concept
• Demonstrate that events can be detected in < 10 ms
  • Node network interfaces
  • Kernel fault conditions
  • Complete node failure (and differentiate host vs. switch)
• Demonstrate that event messages can be delivered to
subscribed components with consistently low latency
(99.999% of the latency values < 10 ms)
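A percentile target like this one is evaluated against the tail of the latency distribution, not its mean. The sketch below uses a nearest-rank percentile; the function names and the synthetic samples are assumptions for illustration and are not PoC data.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100.0)
    return ordered[rank - 1]


def meets_target(samples_ms: Sequence[float],
                 pct: float = 99.999,
                 limit_ms: float = 10.0) -> bool:
    """True when the pct-th percentile latency is strictly below limit_ms."""
    return percentile(samples_ms, pct) < limit_ms


if __name__ == "__main__":
    # Synthetic samples: mostly fast deliveries plus a few slow outliers.
    samples = [1.0] * 1000 + [200.0] * 10
    mean_ms = sum(samples) / len(samples)
    print(f"mean = {mean_ms:.2f} ms, target met = {meets_target(samples)}")
```

In this synthetic set the mean is under 10 ms, yet the 99.999th percentile is 200 ms, so the target is missed. An average alone cannot confirm the goal.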
Proof of Concept (cont’d)
• Applications can be enhanced to include the subscription and
reception of events
• Ensure that the collectd framework is suitable for event
monitoring (detection latency & overhead)
• Prototype integration with OpenStack services
• Prototype a node/switch monitoring system that provides quick
detection without adding significant overhead
Node Monitoring (PoC)
[Figure: node monitoring architecture. Labels: rules / action engine; policies / topology; collectd core with ingress plugins and egress plugins; Kafka/AMQP; local agent; config; collectd config; monitored sources (Kubelet, process, kernel, syslogd, libvirt, network, cpu, cAdvisor, MCE); plugin inputs (kernel, net, cpu, mem, hardware, syslog, /proc, pid, interface); event and telemetry flows; policy, topology, events; local corrective actions; Gnocchi; Aodh; Keystone; Ceilometer; G-VNFM; NFVO/E2EO; RTMD; services; visualization.]
Proof of Concept Results
• Demonstrate that events can be detected in < 10 ms
  • Node network interfaces – Dependent upon driver, but achievable
  • Kernel fault conditions – Verified monitoring of syslog output
  • Complete node failure (and differentiate host vs. switch) – IEEE 802.1ag (Connectivity Fault Management)
Proof of Concept Results (cont'd)
• Demonstrate that event messages can be delivered to
subscribed services with consistently low latency (99.999% of
the latency values < 10 ms) – Mixed results with Kafka. With
simulated metrics from 700 nodes, average latency was
below 10 ms. However, the cumulative latency distribution
had a long tail, with values out to 200 ms.
• Applications can be enhanced to include the subscription and
reception of events
Proof of Concept Results (cont'd)
• Telco and enterprise applications can be enhanced to include
the subscription and reception of events – In progress.
Low-latency delivery of messages is achievable; however,
issues of scale and multi-tenancy / security need to be
addressed.
What’s Next?
• Common Object Model for Events and Telemetry
• Inclusion of Object and Event model in TOSCA
• Event interfaces towards G-VNFM and other MANO subsystems

Weitere ähnliche Inhalte

Was ist angesagt?

Openstack Tacker - Moving into Pike
Openstack Tacker - Moving into PikeOpenstack Tacker - Moving into Pike
Openstack Tacker - Moving into PikeOPNFV
 
Software-defined migration how to migrate bunch of v-ms and volumes within a...
Software-defined migration  how to migrate bunch of v-ms and volumes within a...Software-defined migration  how to migrate bunch of v-ms and volumes within a...
Software-defined migration how to migrate bunch of v-ms and volumes within a...OPNFV
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioOPNFV
 
MEF's inter-domain orchestration delivering dynamic third networks [presente...
MEF's  inter-domain orchestration delivering dynamic third networks [presente...MEF's  inter-domain orchestration delivering dynamic third networks [presente...
MEF's inter-domain orchestration delivering dynamic third networks [presente...OPNFV
 
Connection points between opnfv and etsi nfv tst working group
Connection points between opnfv and etsi nfv tst working groupConnection points between opnfv and etsi nfv tst working group
Connection points between opnfv and etsi nfv tst working groupOPNFV
 
Summit 16: Providing Root Cause Analysis to OPNFV Using Pinpoint -the A-CORD ...
Summit 16: Providing Root Cause Analysis to OPNFV Using Pinpoint -the A-CORD ...Summit 16: Providing Root Cause Analysis to OPNFV Using Pinpoint -the A-CORD ...
Summit 16: Providing Root Cause Analysis to OPNFV Using Pinpoint -the A-CORD ...OPNFV
 
Crossing the river by feeling the stones from legacy to cloud native applica...
Crossing the river by feeling the stones  from legacy to cloud native applica...Crossing the river by feeling the stones  from legacy to cloud native applica...
Crossing the river by feeling the stones from legacy to cloud native applica...OPNFV
 
Test and perspectives on nfvi from china unicom sdn nfv lab
Test and perspectives on nfvi from china unicom sdn nfv labTest and perspectives on nfvi from china unicom sdn nfv lab
Test and perspectives on nfvi from china unicom sdn nfv labOPNFV
 
Summit 16: Deploying Virtualized Mobile Infrastructures on Openstack
Summit 16: Deploying Virtualized Mobile Infrastructures on OpenstackSummit 16: Deploying Virtualized Mobile Infrastructures on Openstack
Summit 16: Deploying Virtualized Mobile Infrastructures on OpenstackOPNFV
 
Requirement analysis of vim platform reliability in a three-layer decoupling ...
Requirement analysis of vim platform reliability in a three-layer decoupling ...Requirement analysis of vim platform reliability in a three-layer decoupling ...
Requirement analysis of vim platform reliability in a three-layer decoupling ...OPNFV
 
Challenges in testing for composite vim platforms
Challenges in testing for composite vim platformsChallenges in testing for composite vim platforms
Challenges in testing for composite vim platformsOPNFV
 
Building the carrier grade nfv infrastructure
Building the carrier grade nfv infrastructureBuilding the carrier grade nfv infrastructure
Building the carrier grade nfv infrastructureOPNFV
 
KVM Enhancements for OPNFV
KVM Enhancements for OPNFVKVM Enhancements for OPNFV
KVM Enhancements for OPNFVOPNFV
 
Summit 16: The Hitchhiker/Hacker's Guide to NFV Benchmarking
Summit 16: The Hitchhiker/Hacker's Guide to NFV BenchmarkingSummit 16: The Hitchhiker/Hacker's Guide to NFV Benchmarking
Summit 16: The Hitchhiker/Hacker's Guide to NFV BenchmarkingOPNFV
 
Challenges in positioning open stack for nf-vi_ are we biting off more than w...
Challenges in positioning open stack for nf-vi_ are we biting off more than w...Challenges in positioning open stack for nf-vi_ are we biting off more than w...
Challenges in positioning open stack for nf-vi_ are we biting off more than w...OPNFV
 
OPNFV scenarios challenges and opportunities
OPNFV scenarios  challenges and opportunitiesOPNFV scenarios  challenges and opportunities
OPNFV scenarios challenges and opportunitiesOPNFV
 
Upstream Testing Collaboration
Upstream Testing Collaboration Upstream Testing Collaboration
Upstream Testing Collaboration OPNFV
 
Summit 16: How to Do a Pre-deployment NFVI Validation Quickly and Efficiently?
Summit 16: How to Do a Pre-deployment NFVI Validation Quickly and Efficiently?Summit 16: How to Do a Pre-deployment NFVI Validation Quickly and Efficiently?
Summit 16: How to Do a Pre-deployment NFVI Validation Quickly and Efficiently?OPNFV
 
The Third Network: LSO, SDN and NFV
The Third Network: LSO, SDN and NFVThe Third Network: LSO, SDN and NFV
The Third Network: LSO, SDN and NFVOPNFV
 
Faster, Higher, Stronger – Accelerating Fault Management to the Next Level
Faster, Higher, Stronger – Accelerating Fault Management to the Next LevelFaster, Higher, Stronger – Accelerating Fault Management to the Next Level
Faster, Higher, Stronger – Accelerating Fault Management to the Next LevelOPNFV
 

Was ist angesagt? (20)

Openstack Tacker - Moving into Pike
Openstack Tacker - Moving into PikeOpenstack Tacker - Moving into Pike
Openstack Tacker - Moving into Pike
 
Software-defined migration how to migrate bunch of v-ms and volumes within a...
Software-defined migration  how to migrate bunch of v-ms and volumes within a...Software-defined migration  how to migrate bunch of v-ms and volumes within a...
Software-defined migration how to migrate bunch of v-ms and volumes within a...
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
 
MEF's inter-domain orchestration delivering dynamic third networks [presente...
MEF's  inter-domain orchestration delivering dynamic third networks [presente...MEF's  inter-domain orchestration delivering dynamic third networks [presente...
MEF's inter-domain orchestration delivering dynamic third networks [presente...
 
Connection points between opnfv and etsi nfv tst working group
Connection points between opnfv and etsi nfv tst working groupConnection points between opnfv and etsi nfv tst working group
Connection points between opnfv and etsi nfv tst working group
 
Summit 16: Providing Root Cause Analysis to OPNFV Using Pinpoint -the A-CORD ...
Summit 16: Providing Root Cause Analysis to OPNFV Using Pinpoint -the A-CORD ...Summit 16: Providing Root Cause Analysis to OPNFV Using Pinpoint -the A-CORD ...
Summit 16: Providing Root Cause Analysis to OPNFV Using Pinpoint -the A-CORD ...
 
Crossing the river by feeling the stones from legacy to cloud native applica...
Crossing the river by feeling the stones  from legacy to cloud native applica...Crossing the river by feeling the stones  from legacy to cloud native applica...
Crossing the river by feeling the stones from legacy to cloud native applica...
 
Test and perspectives on nfvi from china unicom sdn nfv lab
Test and perspectives on nfvi from china unicom sdn nfv labTest and perspectives on nfvi from china unicom sdn nfv lab
Test and perspectives on nfvi from china unicom sdn nfv lab
 
Summit 16: Deploying Virtualized Mobile Infrastructures on Openstack
Summit 16: Deploying Virtualized Mobile Infrastructures on OpenstackSummit 16: Deploying Virtualized Mobile Infrastructures on Openstack
Summit 16: Deploying Virtualized Mobile Infrastructures on Openstack
 
Requirement analysis of vim platform reliability in a three-layer decoupling ...
Requirement analysis of vim platform reliability in a three-layer decoupling ...Requirement analysis of vim platform reliability in a three-layer decoupling ...
Requirement analysis of vim platform reliability in a three-layer decoupling ...
 
Challenges in testing for composite vim platforms
Challenges in testing for composite vim platformsChallenges in testing for composite vim platforms
Challenges in testing for composite vim platforms
 
Building the carrier grade nfv infrastructure
Building the carrier grade nfv infrastructureBuilding the carrier grade nfv infrastructure
Building the carrier grade nfv infrastructure
 
KVM Enhancements for OPNFV
KVM Enhancements for OPNFVKVM Enhancements for OPNFV
KVM Enhancements for OPNFV
 
Summit 16: The Hitchhiker/Hacker's Guide to NFV Benchmarking
Summit 16: The Hitchhiker/Hacker's Guide to NFV BenchmarkingSummit 16: The Hitchhiker/Hacker's Guide to NFV Benchmarking
Summit 16: The Hitchhiker/Hacker's Guide to NFV Benchmarking
 
Challenges in positioning open stack for nf-vi_ are we biting off more than w...
Challenges in positioning open stack for nf-vi_ are we biting off more than w...Challenges in positioning open stack for nf-vi_ are we biting off more than w...
Challenges in positioning open stack for nf-vi_ are we biting off more than w...
 
OPNFV scenarios challenges and opportunities
OPNFV scenarios  challenges and opportunitiesOPNFV scenarios  challenges and opportunities
OPNFV scenarios challenges and opportunities
 
Upstream Testing Collaboration
Upstream Testing Collaboration Upstream Testing Collaboration
Upstream Testing Collaboration
 
Summit 16: How to Do a Pre-deployment NFVI Validation Quickly and Efficiently?
Summit 16: How to Do a Pre-deployment NFVI Validation Quickly and Efficiently?Summit 16: How to Do a Pre-deployment NFVI Validation Quickly and Efficiently?
Summit 16: How to Do a Pre-deployment NFVI Validation Quickly and Efficiently?
 
The Third Network: LSO, SDN and NFV
The Third Network: LSO, SDN and NFVThe Third Network: LSO, SDN and NFV
The Third Network: LSO, SDN and NFV
 
Faster, Higher, Stronger – Accelerating Fault Management to the Next Level
Faster, Higher, Stronger – Accelerating Fault Management to the Next LevelFaster, Higher, Stronger – Accelerating Fault Management to the Next Level
Faster, Higher, Stronger – Accelerating Fault Management to the Next Level
 

Ähnlich wie Enabling Carrier-Grade Availability Within a Cloud Infrastructure

IOT model to Unified Communication Events in SDN
IOT model to Unified Communication  Events in SDNIOT model to Unified Communication  Events in SDN
IOT model to Unified Communication Events in SDNChandrashekhar Rao
 
QoS-Predictable SOA on TSN: Insights from a Case-Study
QoS-Predictable SOA on TSN: Insights from a Case-StudyQoS-Predictable SOA on TSN: Insights from a Case-Study
QoS-Predictable SOA on TSN: Insights from a Case-StudyRealTime-at-Work (RTaW)
 
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERSROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERSDeepak Shankar
 
Introduction to SDN
Introduction to SDNIntroduction to SDN
Introduction to SDNNetCraftsmen
 
Lessons learned so far in operationalizing NFV
Lessons learned so far in operationalizing NFVLessons learned so far in operationalizing NFV
Lessons learned so far in operationalizing NFVJames Crawshaw
 
Software Defined Networking - Huawei, June 2017
Software Defined Networking - Huawei, June 2017Software Defined Networking - Huawei, June 2017
Software Defined Networking - Huawei, June 2017Novosco
 
2017_IMC_QUIC.pptx
2017_IMC_QUIC.pptx2017_IMC_QUIC.pptx
2017_IMC_QUIC.pptxBrian Zein
 
In-service synchronization monitoring and assurance
In-service synchronization monitoring and assuranceIn-service synchronization monitoring and assurance
In-service synchronization monitoring and assuranceADVA
 
Real Time Operating Systems
Real Time Operating SystemsReal Time Operating Systems
Real Time Operating SystemsPawandeep Kaur
 
The Show Must Go On! Using Kafka to Assure TV Signals Reach the Transmitters
The Show Must Go On! Using Kafka to Assure TV Signals Reach the TransmittersThe Show Must Go On! Using Kafka to Assure TV Signals Reach the Transmitters
The Show Must Go On! Using Kafka to Assure TV Signals Reach the TransmittersHostedbyConfluent
 
Service assurance for NFV
Service assurance for NFVService assurance for NFV
Service assurance for NFVJames Crawshaw
 
Enabling Active Flow Manipulation (AFM) in Silicon-based Network Forwarding E...
Enabling Active Flow Manipulation (AFM) in Silicon-based Network Forwarding E...Enabling Active Flow Manipulation (AFM) in Silicon-based Network Forwarding E...
Enabling Active Flow Manipulation (AFM) in Silicon-based Network Forwarding E...Tal Lavian Ph.D.
 
performanceandtrafficmanagement-160328180107.pdf
performanceandtrafficmanagement-160328180107.pdfperformanceandtrafficmanagement-160328180107.pdf
performanceandtrafficmanagement-160328180107.pdfABYTHOMAS46
 
Network time protocol
Network time protocolNetwork time protocol
Network time protocolMohd Amir
 
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...Tal Lavian Ph.D.
 
VMworld 2013: Network Function Virtualization in the Cloud: Case for Enterpri...
VMworld 2013: Network Function Virtualization in the Cloud: Case for Enterpri...VMworld 2013: Network Function Virtualization in the Cloud: Case for Enterpri...
VMworld 2013: Network Function Virtualization in the Cloud: Case for Enterpri...VMworld
 
5G and Open Reference Platforms
5G and Open Reference Platforms5G and Open Reference Platforms
5G and Open Reference PlatformsMichelle Holley
 

Ähnlich wie Enabling Carrier-Grade Availability Within a Cloud Infrastructure (20)

IOT model to Unified Communication Events in SDN
IOT model to Unified Communication  Events in SDNIOT model to Unified Communication  Events in SDN
IOT model to Unified Communication Events in SDN
 
QoS-Predictable SOA on TSN: Insights from a Case-Study
QoS-Predictable SOA on TSN: Insights from a Case-StudyQoS-Predictable SOA on TSN: Insights from a Case-Study
QoS-Predictable SOA on TSN: Insights from a Case-Study
 
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERSROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
 
Introduction to SDN
Introduction to SDNIntroduction to SDN
Introduction to SDN
 
Lessons learned so far in operationalizing NFV
Lessons learned so far in operationalizing NFVLessons learned so far in operationalizing NFV
Lessons learned so far in operationalizing NFV
 
Software Defined Networking - Huawei, June 2017
Software Defined Networking - Huawei, June 2017Software Defined Networking - Huawei, June 2017
Software Defined Networking - Huawei, June 2017
 
2017_IMC_QUIC.pptx
2017_IMC_QUIC.pptx2017_IMC_QUIC.pptx
2017_IMC_QUIC.pptx
 
In-service synchronization monitoring and assurance
In-service synchronization monitoring and assuranceIn-service synchronization monitoring and assurance
In-service synchronization monitoring and assurance
 
Real Time Operating Systems
Real Time Operating SystemsReal Time Operating Systems
Real Time Operating Systems
 
The Show Must Go On! Using Kafka to Assure TV Signals Reach the Transmitters
The Show Must Go On! Using Kafka to Assure TV Signals Reach the TransmittersThe Show Must Go On! Using Kafka to Assure TV Signals Reach the Transmitters
The Show Must Go On! Using Kafka to Assure TV Signals Reach the Transmitters
 
SDN and NFV
SDN and NFVSDN and NFV
SDN and NFV
 
Service assurance for NFV
Service assurance for NFVService assurance for NFV
Service assurance for NFV
 
Enabling Active Flow Manipulation (AFM) in Silicon-based Network Forwarding E...
Enabling Active Flow Manipulation (AFM) in Silicon-based Network Forwarding E...Enabling Active Flow Manipulation (AFM) in Silicon-based Network Forwarding E...
Enabling Active Flow Manipulation (AFM) in Silicon-based Network Forwarding E...
 
Multilin™ Intelligent Line Monitoring System
Multilin™ Intelligent Line Monitoring SystemMultilin™ Intelligent Line Monitoring System
Multilin™ Intelligent Line Monitoring System
 
performanceandtrafficmanagement-160328180107.pdf
performanceandtrafficmanagement-160328180107.pdfperformanceandtrafficmanagement-160328180107.pdf
performanceandtrafficmanagement-160328180107.pdf
 
Performance and traffic management for WSNs
Performance and traffic management for WSNsPerformance and traffic management for WSNs
Performance and traffic management for WSNs
 
Network time protocol
Network time protocolNetwork time protocol
Network time protocol
 
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
A Platform for Data Intensive Services Enabled by Next Generation Dynamic Opt...
 
VMworld 2013: Network Function Virtualization in the Cloud: Case for Enterpri...
VMworld 2013: Network Function Virtualization in the Cloud: Case for Enterpri...VMworld 2013: Network Function Virtualization in the Cloud: Case for Enterpri...
VMworld 2013: Network Function Virtualization in the Cloud: Case for Enterpri...
 
5G and Open Reference Platforms
5G and Open Reference Platforms5G and Open Reference Platforms
5G and Open Reference Platforms
 

Mehr von OPNFV

Energy Audit aaS with OPNFV
Energy Audit aaS with OPNFVEnergy Audit aaS with OPNFV
Energy Audit aaS with OPNFVOPNFV
 
Hands-On Testing: How to Integrate Tests in OPNFV
Hands-On Testing: How to Integrate Tests in OPNFVHands-On Testing: How to Integrate Tests in OPNFV
Hands-On Testing: How to Integrate Tests in OPNFVOPNFV
 
Storage Performance Indicators - Powered by StorPerf and QTIP
Storage Performance Indicators - Powered by StorPerf and QTIPStorage Performance Indicators - Powered by StorPerf and QTIP
Storage Performance Indicators - Powered by StorPerf and QTIPOPNFV
 
Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...
Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...
Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...OPNFV
 
How Many Ohs? (An Integration Guide to Apex & Triple-o)
How Many Ohs? (An Integration Guide to Apex & Triple-o)How Many Ohs? (An Integration Guide to Apex & Triple-o)
How Many Ohs? (An Integration Guide to Apex & Triple-o)OPNFV
 
Being Brave: Deploying OpenStack from Master
Being Brave: Deploying OpenStack from MasterBeing Brave: Deploying OpenStack from Master
Being Brave: Deploying OpenStack from MasterOPNFV
 
Learnings From the First Year of the OPNFV Internship Program
Learnings From the First Year of the OPNFV Internship ProgramLearnings From the First Year of the OPNFV Internship Program
Learnings From the First Year of the OPNFV Internship ProgramOPNFV
 
OPNFV and OCP: Perfect Together
OPNFV and OCP: Perfect TogetherOPNFV and OCP: Perfect Together
OPNFV and OCP: Perfect TogetherOPNFV
 
The Return of QTIP, from Brahmaputra to Danube
The Return of QTIP, from Brahmaputra to DanubeThe Return of QTIP, from Brahmaputra to Danube
The Return of QTIP, from Brahmaputra to DanubeOPNFV
 
Improving POD Usage in Labs, CI and Testing
Improving POD Usage in Labs, CI and TestingImproving POD Usage in Labs, CI and Testing
Improving POD Usage in Labs, CI and TestingOPNFV
 
Distributed vnf management architecture and use-cases
Distributed vnf management  architecture and use-casesDistributed vnf management  architecture and use-cases
Distributed vnf management architecture and use-casesOPNFV
 
Securing your nfv and sdn integrated open stack cloud- challenges, use-cases ...
Securing your nfv and sdn integrated open stack cloud- challenges, use-cases ...Securing your nfv and sdn integrated open stack cloud- challenges, use-cases ...
Securing your nfv and sdn integrated open stack cloud- challenges, use-cases ...OPNFV
 
Challenge in asia region connecting each testbed and poc of distributed nfv ...
Challenge in asia region  connecting each testbed and poc of distributed nfv ...Challenge in asia region  connecting each testbed and poc of distributed nfv ...
Challenge in asia region connecting each testbed and poc of distributed nfv ...OPNFV
 
Accelerated dataplanes integration and deployment
Accelerated dataplanes integration and deploymentAccelerated dataplanes integration and deployment
Accelerated dataplanes integration and deploymentOPNFV
 
Demo how to efficiently evaluate nf-vi performance by leveraging opnfv testi...
Demo  how to efficiently evaluate nf-vi performance by leveraging opnfv testi...Demo  how to efficiently evaluate nf-vi performance by leveraging opnfv testi...
Demo how to efficiently evaluate nf-vi performance by leveraging opnfv testi...OPNFV
 
OPNFV with 5G Applications
OPNFV with 5G ApplicationsOPNFV with 5G Applications
OPNFV with 5G ApplicationsOPNFV
 
NFV interoperability, for the success of commercial deployments
NFV interoperability, for the success of commercial deploymentsNFV interoperability, for the success of commercial deployments
NFV interoperability, for the success of commercial deploymentsOPNFV
 

Mehr von OPNFV (17)

Energy Audit aaS with OPNFV
Energy Audit aaS with OPNFVEnergy Audit aaS with OPNFV
Energy Audit aaS with OPNFV
 
Hands-On Testing: How to Integrate Tests in OPNFV
Hands-On Testing: How to Integrate Tests in OPNFVHands-On Testing: How to Integrate Tests in OPNFV
Hands-On Testing: How to Integrate Tests in OPNFV
 
Storage Performance Indicators - Powered by StorPerf and QTIP
Storage Performance Indicators - Powered by StorPerf and QTIPStorage Performance Indicators - Powered by StorPerf and QTIP
Storage Performance Indicators - Powered by StorPerf and QTIP
 
Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...
Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...
Testing, CI Gating & Community Fast Feedback: The Challenge of Integration Pr...
 
How Many Ohs? (An Integration Guide to Apex & Triple-o)
How Many Ohs? (An Integration Guide to Apex & Triple-o)How Many Ohs? (An Integration Guide to Apex & Triple-o)
How Many Ohs? (An Integration Guide to Apex & Triple-o)
 
Being Brave: Deploying OpenStack from Master
Being Brave: Deploying OpenStack from MasterBeing Brave: Deploying OpenStack from Master
Being Brave: Deploying OpenStack from Master
 
Learnings From the First Year of the OPNFV Internship Program
Learnings From the First Year of the OPNFV Internship ProgramLearnings From the First Year of the OPNFV Internship Program
Learnings From the First Year of the OPNFV Internship Program
 
OPNFV and OCP: Perfect Together
OPNFV and OCP: Perfect TogetherOPNFV and OCP: Perfect Together
OPNFV and OCP: Perfect Together
 
The Return of QTIP, from Brahmaputra to Danube
The Return of QTIP, from Brahmaputra to DanubeThe Return of QTIP, from Brahmaputra to Danube
The Return of QTIP, from Brahmaputra to Danube
 
Improving POD Usage in Labs, CI and Testing
Improving POD Usage in Labs, CI and TestingImproving POD Usage in Labs, CI and Testing
Improving POD Usage in Labs, CI and Testing
 
Distributed vnf management architecture and use-cases
Distributed vnf management  architecture and use-casesDistributed vnf management  architecture and use-cases
Distributed vnf management architecture and use-cases
 
Securing your nfv and sdn integrated open stack cloud- challenges, use-cases ...
Securing your nfv and sdn integrated open stack cloud- challenges, use-cases ...Securing your nfv and sdn integrated open stack cloud- challenges, use-cases ...
Securing your nfv and sdn integrated open stack cloud- challenges, use-cases ...
 
Challenge in asia region connecting each testbed and poc of distributed nfv ...
Challenge in asia region  connecting each testbed and poc of distributed nfv ...Challenge in asia region  connecting each testbed and poc of distributed nfv ...
Challenge in asia region connecting each testbed and poc of distributed nfv ...
 
Accelerated dataplanes integration and deployment
Accelerated dataplanes integration and deploymentAccelerated dataplanes integration and deployment
Accelerated dataplanes integration and deployment
 
Demo how to efficiently evaluate nf-vi performance by leveraging opnfv testi...
Demo  how to efficiently evaluate nf-vi performance by leveraging opnfv testi...Demo  how to efficiently evaluate nf-vi performance by leveraging opnfv testi...
Demo how to efficiently evaluate nf-vi performance by leveraging opnfv testi...
 
OPNFV with 5G Applications
OPNFV with 5G ApplicationsOPNFV with 5G Applications
OPNFV with 5G Applications
 
NFV interoperability, for the success of commercial deployments
NFV interoperability, for the success of commercial deploymentsNFV interoperability, for the success of commercial deployments
NFV interoperability, for the success of commercial deployments
 

Enabling Carrier-Grade Availability Within a Cloud Infrastructure

  • 2. Enabling Carrier-Grade Availability Within a Cloud Infrastructure • Aaron Smith, Red Hat • Pasi Vaananen, Red Hat
  • 3. Agenda • Introduction • Problem and goals? • Fault management cycle and timeline • Relative impact to Service Availability • Proof of concept • PoC results • What's next?
  • 4. Problem • The move to NFV and a cloud infrastructure complicates the delivery of highly available services: there is no longer a vertically integrated hardware / software stack, and stack components are provided by different vendors • The same requirements apply (50 ms … 1000 ms, increasing by “layer”) • For a cloud infrastructure, the network impacts availability more than individual compute hosts, and detection / protection strategies must adjust accordingly
  • 5. Goals • Produce a monitoring and event detection framework that distributes fault information to various listeners with low latency (less than tens of milliseconds) • Provide a hierarchy of remediation controllers that can react quickly (less than tens of milliseconds) to faults • Provide FM mechanisms for both current virtualization environments and future containerization environments orchestrated by Kubernetes, etc.
  • 7. Fault Management Cycle Phases • Detection – Requires low-latency, low-overhead mechanisms • Localization – Physical/Virtualized resources to resource consumer(s) mapping within the context of fault trees • Isolation – Remove the ability of the failed component to affect service state • Remediation – Service restoration through failover to redundant resource / component, or component restart • Recovery – Restoration of service redundancy configuration
  • 8. FM Cycle Timeline [Figure: timeline of service state following a first failure] • Service states shown: Up, Redundant; Down, Remediation; Up, Recovering; Up, Repair Pending; Up, Redundant • Events shown: Failure Event (1st failure, potential outage or degradation); 1st Indication (FM cycle start); Service Recovered; Redundancy Restored (pooled); Repair Completed (non-pooled); Redundancy Restored (non-pooled) • TUA = TDET + TREM; minimize TUA • TREC, pooled: 2nd failure exposure, typ. ~2 mins (MTTREC) • TREP, TREC, non-pooled: 2nd failure exposure, typ. 4+ hrs (MTTREP) • For non-pooled resources: coupled, critical repair • For pooled resources: uncoupled, deferred repairs
  • 9. Fault Management Cycle Timeline • TDET + TNOT + TREM < 50 ms (lowest “layers”, typ. network) • TDET: detection time • TNOT: notification time • TREM: remediation time • Remediation is often the longest process, so TDET + TNOT should be made as small as possible
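The budget on slide 9 can be checked with simple arithmetic. The following is a minimal sketch, not something from the talk: the `FaultTimings` class, its method names, and the sample timings are all hypothetical, and only the 50 ms bound and the three terms come from the slide.

```python
from dataclasses import dataclass

BUDGET_MS = 50.0  # bound quoted on the slide for the lowest "layers"


@dataclass(frozen=True)
class FaultTimings:
    """Per-fault timings in milliseconds (all sample values are hypothetical)."""
    t_det: float  # TDET: detection time
    t_not: float  # TNOT: notification time
    t_rem: float  # TREM: remediation time

    @property
    def total(self) -> float:
        """Sum of the three terms on the slide."""
        return self.t_det + self.t_not + self.t_rem

    def within_budget(self, budget_ms: float = BUDGET_MS) -> bool:
        # Strict inequality, matching "TDET + TNOT + TREM < 50 ms".
        return self.total < budget_ms

    def detect_notify_allowance(self, budget_ms: float = BUDGET_MS) -> float:
        """Time left for TDET + TNOT once remediation has taken its share."""
        return budget_ms - self.t_rem


if __name__ == "__main__":
    sample = FaultTimings(t_det=5.0, t_not=3.0, t_rem=30.0)
    print(sample.total, sample.within_budget(), sample.detect_notify_allowance())
```

With a hypothetical 30 ms remediation, only 20 ms remains for detection and notification combined, which is why the slide asks for TDET + TNOT to be as small as possible.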
  • 10. Automated Service Recovery Survey [Figure: bar chart, 0% to 45%] • Within 1 second: 40% • Within 50 ms: 39% • Within 5 seconds: 20% • Automated recovery not important: 1% • Source: Heavy Reading NFV operator survey of 128 service providers, “Telco Requirements for NFVI”, November 2016
  • 11. Relative Impact to Service Availability • Different infrastructure components have different impact potential on application-level Service Availability (SA), e.g.: • Network switch faults have a very high impact potential on the SA (they can affect all associated nodes / services) • Compute node faults can only affect the VMs / containers running on them • Relative impact ordering: Spine > Leaf > Network Nodes > Storage Nodes > Control Nodes > Control Node (Specific Service) > Compute Nodes > Compute Nodes (Critical Services) > Compute Node (Specific VM/Container)
  • 12. Service Relative Criticality (cont’d) • Focus monitoring/remediation efforts with respect to the relative impact potential, e.g.: • A switch failure affects 10s of hosts (100s of services) • Switch failures therefore need fast detection and remediation
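One way to make the "relative impact potential" of slides 11 and 12 concrete is to count the services that sit behind each component. The sketch below is illustrative only: the topology, the component names, and the functions `blast_radius` and `rank_by_impact` are hypothetical, and the dependency map is assumed to be acyclic.

```python
from typing import Dict, List, Set

# Hypothetical topology: which components sit directly behind each component.
DOWNSTREAM: Dict[str, List[str]] = {
    "spine-1": ["leaf-1", "leaf-2"],
    "leaf-1": ["compute-1", "compute-2"],
    "leaf-2": ["compute-3"],
}

# Hypothetical placement: which services run directly on each component.
SERVICES: Dict[str, List[str]] = {
    "compute-1": ["vnf-a", "vnf-b"],
    "compute-2": ["vnf-c"],
    "compute-3": ["vnf-d"],
}


def blast_radius(component: str,
                 downstream: Dict[str, List[str]],
                 services: Dict[str, List[str]]) -> Set[str]:
    """Return every service that a failure of `component` could affect.

    Assumes the dependency map is acyclic (a tree or DAG).
    """
    affected = set(services.get(component, []))
    for child in downstream.get(component, []):
        affected |= blast_radius(child, downstream, services)
    return affected


def rank_by_impact(downstream: Dict[str, List[str]],
                   services: Dict[str, List[str]]) -> List[str]:
    """Order components by number of affected services, largest first."""
    components = set(downstream) | set(services)
    for children in downstream.values():
        components |= set(children)
    return sorted(
        components,
        key=lambda c: (-len(blast_radius(c, downstream, services)), c),
    )


if __name__ == "__main__":
    for name in rank_by_impact(DOWNSTREAM, SERVICES):
        print(name, len(blast_radius(name, DOWNSTREAM, SERVICES)))
```

In this toy topology the spine and leaf switches rank ahead of any single compute node, which matches the ordering the slides argue for when deciding where to spend monitoring effort first.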
  • 13. Proof of Concept • Demonstrate that the following events can be detected in < 10 ms: • Node network interface faults • Kernel fault conditions • Complete node failure (and differentiate host vs. switch) • Demonstrate that event messages can be delivered to subscribed components with consistently low latency (99.999% of the latency values < 10 ms)
  • 14. Proof of Concept (cont’d) • Applications can be enhanced to include the subscription and reception of events • Ensure that the collectd framework is suitable for event monitoring (detection latency & overhead) • Prototype integration with OpenStack services • Prototype a node/switch monitoring system that provides quick detection without adding significant overhead
  • 15. Node Monitoring (PoC) [Figure: architecture diagram] • Labels in the diagram: collectd core with ingress plugins and egress plugins; collectd config; local agent with rules / action engine, policies / topology, and local agent config; local corrective actions • Monitored sources: kubelet, process, kernel, syslogd, libvirt, network, cpu, mem, hardware, cAdvisor, MCE, /proc, pid, interface • Transport: Kafka/AMQP carrying events and telemetry • Other labeled services: Gnocchi, Ceilometer, Aodh, Keystone, G-VNFM, NFVO/E2EO, RTMD, visualization • Flows labeled: policy, topology, events
  • 16. Proof of Concept Results • Demonstrate that events can be detected < 10ms • Node network interfaces – Dependent upon driver but achievable • Kernel fault conditions – Verified monitoring of syslog output • Complete node failure (and differentiate host vs. switch) – 802.1ag
  • 17. Proof of Concept Results (cont'd) • Demonstrate that event messages can be delivered to subscribed services with consistently low latency. (99.999% of the latency values < 10ms) – Mixed results with Kafka. With simulated metrics from 700 nodes, average latency is below 10ms. However, the cumulative latency distribution had a long tail with values out to 200ms. • Applications can be enhanced to include the subscription and reception of events
  • 18. Proof of Concept Results (cont'd) • Telco and enterprise applications can be enhanced to include the subscription and reception of events – In progress. Low-latency delivery of messages is achievable; however, issues of scale and multi-tenancy/security need to be addressed.
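The Kafka result on slide 17 (average latency below 10 ms, yet a tail out to 200 ms) shows why the PoC target is stated as a percentile rather than a mean. The sketch below illustrates the difference using synthetic data; the sample values and the function names are hypothetical and are not measurements from the PoC.

```python
from statistics import mean
from typing import List, Sequence


def fraction_below(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Fraction of samples strictly below the threshold."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    return sum(1 for x in latencies_ms if x < threshold_ms) / len(latencies_ms)


def meets_target(latencies_ms: Sequence[float],
                 threshold_ms: float = 10.0,
                 target_fraction: float = 0.99999) -> bool:
    """True if at least `target_fraction` of samples are below the threshold."""
    return fraction_below(latencies_ms, threshold_ms) >= target_fraction


def synthetic_samples() -> List[float]:
    """Hypothetical data: mostly 2 ms deliveries plus a few 200 ms outliers."""
    return [2.0] * 99_990 + [200.0] * 10


if __name__ == "__main__":
    samples = synthetic_samples()
    print("mean latency (ms):", mean(samples))
    print("fraction below 10 ms:", fraction_below(samples, 10.0))
    print("meets 99.999% target:", meets_target(samples))
```

In this synthetic set the mean is about 2 ms, well under the 10 ms threshold, but only 99.99% of samples are below it, so the 99.999% target is missed. A handful of slow deliveries is enough to fail a five-nines latency requirement even when the average looks comfortable.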
  • 19. What’s Next? • Common Object Model for Events and Telemetry • Inclusion of Object and Event model in TOSCA • Event interfaces towards G-VNFM and other MANO subsystems