This document summarizes a presentation on machine learning and fluid network planes. It begins with an agenda and introduction to fluid network planes and instances. It then discusses the role of machine learning in fluid network planes, including applications such as optimization, virtual network embedding problems, run-time operations, and intent-based closed-loop automation. Recent research is presented on machine learning-based YouTube QoE estimation using real 4G/5G network traces to predict video quality and inform control actions. Results are shown comparing 4G and 5G networks in terms of radio parameters, stalling events, handovers, and video resolutions under different mobility conditions.
The Role of Machine Learning in Fluid Network Control and Data Planes.pdf
1. The Role of Machine Learning in Fluid
Network Control and Data Planes
Prof. Dr. Christian Esteve Rothenberg
University of Campinas, Brazil
chesteve@dca.fee.unicamp.br
https://intrig.dca.fee.unicamp.br/christian/
https://www.dca.fee.unicamp.br/~chesteve/
TEWI-Kolloquium, 13. October 2022
2. Agenda
Disclaimer
“Fluid Network Planes” was first presented as a
Keynote of IEEE NetSoft'19, Paris, Jun .2019.
● Fluid Network Planes
○ The ‘Concept’
○ Instances
○ The Role of ML
● ML-based Youtube QoE
estimation in real 5G networks
3. 3
SDN in 2015 – 2020 → Network Softwarization
(i.e. NFV + SDN + IBN + xyz)
Old / Existing
• CLIs & Manual labour
• Closed Source
• Vendor Lead
• Classic Network Appliances (HW)
New / Softwarized
• APIs & Automation
• Open Source
• Customer Lead
• Virtual Network Functions (NFV/SW)
Source: Adapted from Kyle Mestery, Next Generation Network Developer Skills
4. 4
Different Network Softwarization Models Control plane component(s)
Canonical/Open
Traditional
Hybrid/Broker Overlay
Compiler
Source: C. Rothenberg (INTRIG/UNICAMP)
Whitebox / Baremetal
+
PISA / P4
Data plane component(s)
5. 5
Legacy
Models & Approaches to Program / Refactor the Netsoft Stack
Data Plane
Mgm.APIs
Distributed
L2/L3
Control Plane
Managemt
Software
Southbound
Agent
(e.g. OF)
Network Controller / OS
Southbound
Protocol (e.g. OF)
Business / Control Apps
Northbound APIs
Mgm.
HAL APIs / Drivers
Orchestrator (SO/RO/LCM)
APIs
Compiler
Auto-Generated
Target Binary
SDN
VNF
GP-CPU
(x86, ARM)
HW Resources
Virtualization
DP
CP
M
g
m
NFV
VNFM
(Manager)
VIM
(Infra-M)
OSS/BSS
APIs
Southbound
APIs/Plugins
Mgm.
Apps
Network OS / Bare Metal Switches
Source: C. Rothenberg (INTRIG/UNICAMP)
7. The Fluid Networking landscape
Customer
Premises
BS MEC | Access Cloud | PoP DC
Edge
Cloud DCs
Core
HW
SW
+
-
8. The Fluid Networking landscape
Customer
Premises
BS MEC | Access Cloud | PoP DC
Edge
Cloud DCs
Core
HW
SW
+
-
9. Fluid Networking: HW-SW Continuum
Performance
Portability Programmability
HW
SW
Source: D. Meyer (Courtesy by J. Doyle) Source: G. Pongracz. "Cheap silicon". HotSDN13
Source: C. Rothenberg. P3 Trade-offs. 2017
10. HW
SW
• General-purpose CPU
• HW-accelerated features**
• FPGA
• GPU, TPU,
• Programmable NIC, ASIC
• Domain Specific
Architectures (DSAS)
e.g., P4 + PISA
• Containers
• User space
• Kernel space
• Drivers, I/O SDKs
Flexibility*
(programmability + portability)
Performance***
* M. He et al. Flexibility in Softwarized
Networks: Classifications and Research
Challenges. IEEE Survey & Tutorials, 2019
*** G. Bianchi. Back to the Future:
Hardware-specialized Cloud Networking.
2019
** Linguaglossa et al. Survey of Performance
Acceleration Techniques for Network
Function Virtualization. Proc. of IEEE, 2019
Fluid Networking: HW-SW Continuum
11. Customer
Premises
BS MEC | Access Cloud | PoP DC
Edge
Cloud DCs
Core
+
-
Fluid Networking: Quest for Latency
Source: Google Cloud Infrastructure
12. Customer
Premises
BS MEC | Access Cloud | PoP DC
Edge
Cloud DCs
Core
+
-
Source: EU FP7 UNIFY
Fluid Networking: Optimizing the E2E Compute Pool
13. Fluid Networking: Decoupling functionality / location
Customer
Premises
BS MEC - Access Cloud - PoP DC
Edge
Cloud DCs
Core
+
-
Latency
Capacity
Cost
Source: EU
Superfluidity
14. The Fluid Networking landscape
Customer
Premises
BS MEC - Access Cloud - PoP DC
Edge
Cloud DCs
Core
HW
SW
Optimize for Latency
(Latency-sensitive Source to Function)
Optimize for
Performance/Cost
Control plane component(s)
Data plane component(s)
15. The Fluid Networking landscape
Customer
Premises
BS MEC - Access Cloud - PoP DC
Edge
Cloud DCs
Core
HW
SW
?
?
?
?
16. The Fluid Networking landscape
Customer
Premises
BS MEC - Access Cloud - PoP DC
Edge
Cloud DCs
Core
HW
SW
?
?
?
?
17. The Fluid Networking landscape
Customer
Premises
BS MEC - Access Cloud - PoP DC
Edge
Cloud DCs
Core
HW
SW
18. The Fluid Networking landscape
Customer
Premises
BS MEC - Access Cloud - PoP DC
Edge
Cloud DCs
Core
HW
SW
19. The Fluid Networking landscape
Customer
Premises
BS MEC - Access Cloud - PoP DC
Edge
Cloud DCs
Core
HW
SW
ML
22. NFV layers of SW, Virtualization and HW platforms
HW
SW
Source: https://www.dpdk.org/wp-content/uploads/sites/35/2018/12/Kalimani-and-Barak-Accelerating-NFV-with-DPDK-and-SmartNICs.pdf
23. VNF offloading to Hardware
HW
SW
Source: https://www.dpdk.org/wp-content/uploads/sites/35/2018/12/Kalimani-and-Barak-Accelerating-NFV-with-DPDK-and-SmartNICs.pdf
24. HW
SW
VNF offloading to Hardware
Source: https://www.dpdk.org/wp-content/uploads/sites/35/2018/12/Kalimani-and-Barak-Accelerating-NFV-with-DPDK-and-SmartNICs.pdf
25. VNF offloading on multi-vendor P4 fabric
controlled by ONOS via P4Runtime
HW
SW
Source:
https://p4.org/assets/P4WS_2018/7_Carmelo_Cascone_VNF.pdf
26. Slicing IoT Analytics
Access Cloud
uRLLC GW eMBB
GW
vCDN
Edge Cloud Central
Cloud
5G CP
mMTC
vCDN
MEC/vCDN
Internet Internet
Midhaul
5G
AAU
5G
DU
5G CU
Fronthaul
5G
AAU
Transport
Network Slicing
Slice 1: 100M,
eMBB
Slice 2: 1G,
Video
Slice 3: 90G
Other Slices
Slice 1: 100M,
eMBB
Slice 2: 1G,
Video
Slice 3: 90G
Other Slices
Transport
Network Slicing
Cloud – Slice
Orchestr &
Management
System
Network – Slice
Orchestr &
Management System
DoJoT
DoJoT DoJoT
IoT
Monitoring & Control
Slice 5: mIoT – computing intensive
Slice 4: mIoT – latency
sensitive
Source: http://www.h2020-necos.eu/
27. In-Network Computing
* D. Ports and J. Nelson. When Should The Network Be
The Computer?. HotOS'19
IRTF Computation in the Network (COIN)
28. J. Vestin et al. In-Network Control and Caching for Industrial
Control Networks using Programmable Data Planes. 2018
29. X. Jin et al. Netcache: Balancing key-value
stores with fast in-network caching. SOSP'17
30. H. Tu Dang et al. P4xos: Consensus
as a Network Service. 2018
31. SwitchML: the network is the ML accelerator
A. Sapio et al. Scaling Distributed Machine
Learning with In-Network Aggregation. 2019
32. E. Costa Molero et al. Hardware-Accelerated
Network Control Planes. HotNets'18
* T. Holterbach et al. Blink: Fast Connectivity
Recovery Entirely in the Data Plane. NSDI'19
Control plane functions offloaded to Data plane / HW
33. S Chinchali et al. Network Offloading Policies for
Cloud Robotics: a Learning-based Approach.
Auton Robot 45, 997–1012 (2021).
Robot Offloading Problem
34. Robotic Arm Controller Offloading
Low Latency Industrial Robot Control in Programmable Data Planes
• P4 data plane
• parses robot 3D coordinates
• detects “stop region”
• accelerates stop command
by crafting a stop message
inside the TCP connection
F. Rodriguez et al. ."Towards Low Latency Industrial
Robot Control in Programmable Data Planes". In
IEEE NetSoft 2020, Ghent, Belgium, June 2020
35. Robotic Arm Controller Offloading
In-Network Centralized Done Collision Avoidance
Algorithm in Programmable Data Planes
• P4 Workshop 2021
(Award Winner of most novel non-networking use of P4)
37. ML for Networking
A myriad of recent works (as in all areas, not just networking)
• Boutaba, R., Salahuddin, M.A., Limam, N. et al. “A comprehensive survey on machine
learning for networking: evolution, applications and research opportunities”. JISA, 2018.
• M. Usama et al., "Unsupervised Machine Learning for Networking: Techniques,
Applications and Research Challenges," in IEEE Access, vol. 7, pp. 65579-65615, 2019,.
• J. Xie et al., "A Survey of Machine Learning Techniques Applied to Software Defined
Networking (SDN): Research Issues and Challenges," in IEEE Communications Surveys &
Tutorials, vol. 21, no. 1, pp. 393-430, 2019.
• M. A. Ridwan, N. A. M. Radzi, F. Abdullah and Y. E. Jalil, "Applications of Machine
Learning in Networking: A Survey of Current Issues and Future Challenges," in IEEE
Access, vol. 9, pp. 52523-52556, 2021
• ...
38. ML for Networking
Source: Boutaba, R., Salahuddin, M.A., Limam, N. et al. “A comprehensive survey on machine learning for networking: evolution, applications and research opportunities”. JISA, 2018.
40. ML applications
Fluid Network Planes
In-network Machine Learning moving to the edge
Machine Learning offloading strategies:
Design space for offloading machine learning inference tasks from the end-host’s CPU.
Siracusano, Giuseppe, et al.
"Running neural networks on the NIC."
https://arxiv.org/abs/2009.02353
42. ML applications
Fluid Network Planes
ETSI ZSM Closed Loop Automation
•
N. Sousa et al.,"Machine Learning-Assisted Closed-Control Loops for Beyond 5G Multi-Domain
Zero-Touch Networks". In Journal of Network and Systems Management (JNSM), Special Issue on
Network Management in beyond 5G Systems, 2022
43. ML applications
Fluid Network Planes
ETSI ZSM Closed Loop Automation
•
Nathan F. Saraiva, Christian E. Rothenberg. "CLARA: Closed Loop-based Zero-touch Network Management
Framework". In IEEE NFV-SDN Doctoral Symposium, Nov. 2021. (Best Doctoral Symposium Paper Award)
44. Intent-based CCL for DASH Video SLA using ML-based Edge QoE Estimation
Recent and ongoing research
C. Rothenberg et al. "Intent-based Control Loop for DASH Video Service Assurance using
ML-based Edge QoE Estimation". In IEEE NetSoft 2020 Demo Session, Ghent, Belgium, June 2020.
45. Intent-based CCL for DASH Video SLA using ML-based Edge QoE Estimation
Recent and ongoing research
R. Mustafa et al. . "DASH QoE performance evaluation framework with 5G
datasets". In IEEE CNSM AnServApp. Nov. 2020.
46. Recent and ongoing research
https://github.com/razaulmustafa852/EFFECTOR
DASH QoE estimation from QoS-level metrics in 4G/5G scenarios
47. Recent and ongoing research
https://github.com/razaulmustafa852/EFFECTOR
DASH QoE estimation from QoS-level metrics in 4G/5G scenarios
49. Recent and ongoing research
Real 4G/5G trace collection in Youtube streaming
• Data sets to become public for open & reproducible research
Research Questions
• Evaluation of Youtube QoE KPIs for different mobile networks.
ie. how does Youtube perform in 5G (DSS vs NSA vs SA) compared to 4G?
• Can we predict future (e.g. 5 sec) Youtube QoE KPIs based on network-level
metrics (radio parameters and flow-level?)
– What to do after predicting a QoE downgrade (e.g., stalls, shift down)?
50. Recent and ongoing research
Real 4G/5G trace collection in Youtube streaming
• completed in Nice, France (4G vs 5G DSS)
• starting in Campinas, Brazil (4G vs 5G DSS) and Sao Paulo (5G NSA vs SA)
51. Detail of the solution (3/4)
Data Collection Use Cases
1) Pedestrian – Low Mobility
2) Driving – High Mobility
3) Static – Bus and Railway Terminals
4) Static Outdoor – High Crowd
52. Dataset Statistics
Mobility – Total Kilometers 300 +
Pedestrian – Total Kilometers 100 +
Number of Videos 10
Total Video Sessions 300 +, 1500 + Minutes Streaming
4G and 5G Data Consumed 300 + GB
5G Smartphone Samsung Galaxy S21 5G
4G Smartphone Samsung Galaxy S8
54. Results (2/4): 4G vs. 5G - CQI, RSRP, RSRQ, SNR
• We see a nice CQI (CQI is a feedback provided by UE to eNodeB), RSRP (RSRP is used
for measuring cell signal strength/coverage and therefore cell selection (dBm)) at Bus and
Railway Terminals, however, a massive difference RSRQ for Mobility, Pedestrian in 4G.
• 5G shows a similar pattern, but we see an opposite effect of RSRQ for Mobility and
Pedestrian
• Moreover, we experience better CQI in 5G as compared to 4G.
55. Results (3/4): 4G vs. 5G – Case Mobility Stalling Events - Handover
o 5G in mobility suffers
more stalling events as
compared to 4G.
o 5G also experience
more handover events.
o In 5G, it continues
streaming at higher
resolutions, even there
are stalling events as
compared to 4G, where
we see multiple quality
shifts.
o Most handover occurs
at CQI value 6.
4G vs. 5G QoE
56. Results (4/4): 4G vs. 5G – Resolutions (Mobility Cases)
o Mobility - High: 4G
experiences multiple quality
shifts during a video
session, thus less stalling
events, but 5G it is more
greedy with maximum
resolutions, thus cause
stalls.
o Mobility – Low: 4G suffers
with quality shifting as
compared to 5G. We see
95.3% hd2160 resolution
in 5G.
o Static cases experience
more stability in both
technology, i.e., hd1440
(10%) and hd2160 (90 %) in
4G and in 5G hd2160
(100%)
4G vs. 5G Resolutions
Mobility
-
High
Mobility
-
Pedestrian
57. Prev n-seconds Experience
● Look at the previous n-seconds
experience
● Look at the application QoE, and when
interruption came, see the previous
n-second channel metrics
○ Quality Shifts
○ Stalls
60. References
● Kaljic, Enio, et al. "A Survey on Data Plane Flexibility and Programmability in Software-Defined
Networking." arXiv preprint arXiv:1903.04678 (2019).
● L. Linguaglossa et al., "Survey of Performance Acceleration Techniques for Network Function
Virtualization," in Proceedings of the IEEE, vol. 107, no. 4, pp. 746-764, April 2019.
● Edgar Costa Molero, Stefano Vissicchio, and Laurent Vanbever. 2018. Hardware-Accelerated
Network Control Planes. In Proceedings of the 17th ACM Workshop on Hot Topics in Networks
(HotNets '18). ACM, New York, NY, USA, 120-126.
● Huynh Tu Dang, Marco Canini, Fernando Pedone, and Robert Soulé. “Paxos Made Switch-y.” In
ACM SIGCOMM Computer Communication Review (CCR). April 2016.
● JIN, Xin et al. Netcache: Balancing key-value stores with fast in-network caching. In: Proceedings of
the 26th Symposium on Operating Systems Principles. ACM, 2017
● Yuta Tokusashi, Huynh Tu Dang, Fernando Pedone, Robert Soulé, and Noa Zilberman. “The Case
For In-Network Computing On Demand.” In European Conference on Computer Systems
(EuroSYS). March 2019.
61. References
● D. Ports and J. Nelson. When Should The Network Be The Computer?. In Proceedings of the
Workshop on Hot Topics in Operating Systems (HotOS '19)
● Atul Adya, Robert Grandl, Daniel Myers, and Henry Qin. 2019. Fast key-value stores: An idea whose
time has come and gone. In Proceedings of the Workshop on Hot Topics in Operating Systems
(HotOS '19)
● Theophilus A. Benson. 2019. In-Network Compute: Considered Armed and Dangerous. In
Proceedings of the Workshop on Hot Topics in Operating Systems (HotOS '19)
● Theo Jepsen, Daniel Alvarez, Nate Foster, Changhoon Kim, Jeongkeun Lee, Masoud Moshref, and
Robert Soulé. 2019. Fast String Searching on PISA. In Proceedings of the 2019 ACM Symposium
on SDN Research (SOSR '19)
● Thomas Holterbach, Edgar Costa Molero, Maria Apostolaki, Alberto Dainotti, Stefano Vissicchio,
Laurent Vanbever. Blink: Fast Connectivity Recovery Entirely in the Data Plane. NSDI 2019.
● A. Sapio, M. Canini, C.-Y. Ho, J. Nelson, P. Kalnis, C. Kim, A. Krishnamurthy, M. Moshref, D. R. K.
Ports, P. Richtarik. Scaling Distributed Machine Learning with In-Network Aggregation. KAUST
62. References
● A. Sapio et al. Scaling Distributed Machine Learning with In-Network Aggregation. 2019.
● Huynh Tu Dang, Pietro Bressana, Han Wang, Ki Suh Lee, Hakim Weatherspoon, Marco Canini,
Fernando Pedone, Noa Zilberman, Robert Soulé, "P4xos: Consensus as a Network Service", Tech
Report, University of Lugano 2018/01, May 2018
● H. Tu Dang et al. P4xos: Consensus as a Network Service. 2018
● Raphael Rosa and Christian Esteve Rothenberg. "The Pandora of Network Slicing: A Multi-Criteria
Analysis". ETT. 2019
● J. Vestin, A. Kassler, J. Åkerberg, FastReact: In-Network Control and Caching for Industrial Control
Networks using Programmable Data Planes. In 2018 IEEE 23rd International Conference on
Emerging Technologies and Factory Automation September 4th - 7th, 2018, Torino, Italy.
66. E2E CCL
Recent and ongoing research
• Technique for Simplifying Management of a Service in a Cloud Computing Environment.
PCT/EP2019/058274. 2019
● Service-specific metrics:
○ Per domain (e.g., cloud and
transport)
○ E2E view (cloud + transport)
○ Represented by Graphs
○ Stored into Graph-based
Database (Neo4J)
67. Detail of the solution (1/4)
4G and 5G dataset collection using the most popular video streaming
website YouTube with 1-second granularity.
YOUTUBE FEATURES:
❑ Stalls, Resolution, Video Bytes Downloaded that can further
provide per-session Objective QoE (i) Total Stalling Event, ii)
Stalling Ratio, iii) Stalling Time, iv) Quality Shifts or Percentage of
Time in a single Resolution, v) Dominant Resolution, etc.
68. Detail of the solution (2/4)
CHANNEL LEVEL METRICS
1-second granularity, few features below
❑ Timestamp, Longitude, Latitude, Velocity, Operator Name, Cellid,
Network Mode, Download bitrate, Upload bitrate, RSRQ, RSRP,
SNR, RSSI, CQI, RSRQ and RSRP values for the neighbouring
cell, among 100 + other channel metrics *.
* https://gyokovsolutions.com/manual-g-nettrack/
69. Detail of the solution (4/4)
• YouTube IFRAME API for YouTube QoE Logs
• G-NetTrack Pro - Wireless network monitor and drive test tool