Speaker: Daniel Towner, System Architect for Wireless Access, Intel Corporation
5G brings many new capabilities over 4G including higher bandwidths, lower latencies, and more efficient use of radio spectrum. However, these improvements require a large increase in computing power in the base station. Fortunately the Xeon Scalable Processor series (Skylake-SP) recently introduced by Intel has a new high-performance instruction set called Intel® Advanced Vector Extensions 512 (Intel® AVX-512) which is capable of delivering the compute needed to support the exciting new world of 5G.
In his talk, Daniel will give an overview of the new capabilities of the Intel AVX-512 instruction set and show why they are so beneficial to supporting 5G efficiently. The most obvious difference is that Intel AVX-512 has double the compute performance of previous generations of instruction sets. Perhaps surprisingly, though, it is the addition of brand new instructions that can make the biggest improvements. The new instructions make software algorithms more efficient, which enables even more effective use of the added computing performance and leads to very high performance 5G NR software implementations.
2. Introduction
In these slides we will be talking about:
Building Physical Layer software for Virtual Radio Access Network (vRAN)
What 5G means for Physical Layer compute requirements
Introducing Intel® Xeon® Scalable processors and Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
3. Why must general purpose processors be good at signal processing?
4. Virtual Radio Access Networks (vRAN)
vRAN is being used to create a new style of adaptable RAN:
Efficient scaling of resources
Overall compute capacity reduces through pooling
Improved load balancing
New types of services
Diagram: Radio Access Technology, Access Network, Core Network.
5. Custom Systems on a Chip (SoCs) for RAN
The benefits of vRAN come from being able to build the entire radio access
network on virtualised general purpose processors.
Previous generation RANs had custom
processing (e.g., DSP, accelerators) in some
parts of the system.
Difficult to scale to different
deployment sizes.
Limited co-location makes pooling hard.
Custom hardware requires significant
expense early in the design cycle.
6. Network Stack Layers
Application
Presentation
Session
Transport
Network (L3)
Data Link (L2)
Physical (L1)
Application down to Data Link (L2): high software complexity, low/modest
processing requirements. L2+ is easily handled by a general purpose processor.
Physical (L1): low software complexity, huge data processing requirements.
L1 often requires hardware accelerators or special DSPs.
For vRAN the general purpose processor must be capable of handling signal processing.
7. The Promise of 5G
5G is promising many new features
and improvements over previous
standards:
More bandwidth
Greater capacity
Lower latency
Internet of Things
Ultra reliability
What do these features mean for the
Physical Layer signal processing?
8. The Effect of 5G on the Physical Layer
Capacity
Improve RF utilisation:
• Beamforming
• NOMA
• Massive MIMO
Requires sophisticated floating-point signal processing functions to make
better use of spectrum.
Increased bandwidth
• More bands
• mmWave
More data to process at any given time.
Lower latency
Faster turn-around, resulting in the need to compute results faster than
previously required.
9. Case Study: Beamforming
Beamforming is used to improve capacity.
Beamforming is a sophisticated algorithm, and requires floating-point to work
well. The virtualized general purpose processor must be able to run signal
processing in floating-point on large data sets.
We will revisit this later to see how Intel’s processors enable it to be done.
Omni-directional: frequency/time multiplexing to target different UEs.
Beamforming: spatial multiplexing as well as frequency and time multiplexing
improves spectral efficiency.
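To give a feel for the floating-point work involved, here is a minimal sketch of forming a single receive beam. It assumes a uniform linear array with half-wavelength element spacing and a simple conjugate-weight beamformer; the function names `steering_vector` and `beam_output` are illustrative and do not come from the talk.

```python
import cmath
import math

def steering_vector(num_antennas, angle_deg):
    """Per-antenna phase for a plane wave arriving at angle_deg.

    Assumes a uniform linear array with half-wavelength spacing.
    """
    phase_step = math.pi * math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * phase_step * k) for k in range(num_antennas)]

def beam_output(weights, samples):
    """Combine one sample per antenna: y = (1/N) * sum(conj(w_k) * x_k)."""
    n = len(weights)
    return sum(w.conjugate() * x for w, x in zip(weights, samples)) / n

# Steer an 8-antenna array towards 20 degrees.
weights = steering_vector(8, 20.0)

# A unit-amplitude signal from the steered direction adds coherently,
# while the same signal arriving from -40 degrees is attenuated.
on_target = abs(beam_output(weights, steering_vector(8, 20.0)))
off_target = abs(beam_output(weights, steering_vector(8, -40.0)))
print(f"on target: {on_target:.3f}, off target: {off_target:.3f}")
```

Every output sample costs one complex multiply-accumulate per antenna, so the work grows with the number of antennas, beams, and samples.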
10. Delivering General Purpose 5G vRAN Compute
We have:
More complex algorithms
Processing more data
With less time to do it!
We have to compete against accelerators and DSPs
for Physical Layer processing, and do so in the more
demanding world of 5G.
The general purpose processor needs to deliver
dramatically better compute capabilities than
previously possible.
This is what Intel® Xeon® Scalable processors (formerly Skylake) can deliver.
12. Intel® Xeon® Scalable Processors
The Xeon Scalable Processor has many
improved features over the previous generation
Xeon processors:
Intel® AVX-512: A comprehensive extension
to the existing vector instruction set.
Improvements to the cache hierarchy.
Improved microarchitecture to deliver more
instructions per cycle.
We need to look at some of these in more
detail to understand how and why they help
with building 5G vRAN.
13. Compute: 2x Data Throughput Compared to Previous Generation
Twice as many floating-point units.
Basic instructions are now 512b, instead of 256b.
L1 cache bandwidth has doubled.
L2 bandwidth has doubled.
Diagram: processor core with a 32K instruction cache, 32K data cache, 1MB L2
cache, L3 cache, two load units, one store unit, two float units, an ALU, a
shuffle unit, an instruction scheduler, and a 32 x 512b register file.
https://www.intel.com/content/www/us/en/architecture-and-technology/avx-512-overview.html
14. Data: 4x as Much Storage as Previous Generation
Non-inclusive caching improves the overall efficiency of the memory
hierarchy (i.e., data is not replicated in different cache levels), and
gives each core more storage.
Each register is twice as big and there are twice as many.
L2 cache up to 1MB from 256KB.
The processor’s working set increases in size,
thereby improving efficiency.
5G wireless algorithms tend to fit neatly into the
new register sizes, making them more efficient.
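The 4x storage figure can be checked with simple arithmetic. The sketch below assumes the previous generation means AVX2 in 64-bit mode with sixteen 256-bit vector registers, a number the slide does not state. The helper name `register_file_bytes` is illustrative.

```python
def register_file_bytes(num_registers, register_bits):
    """Total architectural vector register storage in bytes."""
    return num_registers * register_bits // 8

# Assumed previous generation: AVX2 with 16 registers of 256 bits each.
avx2_bytes = register_file_bytes(16, 256)
# Intel AVX-512: 32 registers of 512 bits each (the "32 x 512b" register file).
avx512_bytes = register_file_bytes(32, 512)
register_ratio = avx512_bytes // avx2_bytes

# Single-precision (32-bit) elements held by one register.
avx2_lanes = 256 // 32
avx512_lanes = 512 // 32

# L2 cache per core, in KB, as quoted on the slide.
l2_ratio = 1024 // 256

print(avx2_bytes, avx512_bytes, register_ratio)
print(avx2_lanes, avx512_lanes, l2_ratio)
```

Twice as many registers, each twice as wide, gives four times the register storage, which matches the 4x growth in L2 capacity.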
15. New Instructions: Intel® AVX-512
Many new instructions, ranging from bit manipulation to sophisticated
floating-point operations.
Instructions aren’t just wider but do more too. What took several
instructions in previous processors can now be done in one instruction.
In combination the compute efficiency has improved. There is 2x as much
compute resource, and more than 2x as much processing can be done.
Intel® Xeon® processor families (formerly Haswell and Broadwell): SSE*, AVX, AVX2
Intel® Xeon® processor Scalable family (formerly code-named Skylake-SP): SSE*, AVX, AVX2, AVX512CD, AVX512F, AVX512DQ, AVX512BW, AVX512VL
16. New Instructions Continued…
Some of the important new instructions:
Masking – operate on selected SIMD elements only.
Ternary logic – evaluate any three-input boolean function in one instruction.
Bigger set of conversions possible.
Extended floating-point operations.
How does this help 5G vRAN?
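The ternary logic idea can be modelled in a few lines. This sketch emulates the per-bit truth-table behaviour of instructions such as VPTERNLOGD in plain Python; the function name `ternary_logic` and the 32-bit default width are illustrative choices, and the code shows the concept rather than replacing the hardware instruction.

```python
def ternary_logic(a, b, c, imm8, width=32):
    """Bitwise three-input function selected by an 8-bit truth table.

    For every bit position, the bits of a, b and c form a 3-bit index
    (a is the most significant), and the result bit is that bit of imm8.
    """
    result = 0
    for bit in range(width):
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm8 >> index) & 1) << bit
    return result

XOR3 = 0x96    # truth table for a ^ b ^ c
SELECT = 0xCA  # truth table for (a & b) | (~a & c): a chooses between b and c

a, b, c = 0xF0F0F0F0, 0xCCCCCCCC, 0xAAAAAAAA
xor_result = ternary_logic(a, b, c, XOR3)
select_result = ternary_logic(a, b, c, SELECT)
print(hex(xor_result), hex(select_result))
```

A three-way XOR or a bitwise select would otherwise take two or three separate logic instructions, so changing only the 8-bit table swaps the function at no extra cost.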
17. How Does Intel® AVX-512 Help Beamforming?
Diagram: sequential beamforming, with multiple inputs feeding the algorithm
and multiple outputs leaving it.
Input from multiple data sources: wider loads, gather instructions,
multi-input permute.
Heavy-duty floating-point algorithm: more floating-point units available.
Algorithm requires special instructions to handle edge cases: ternary logic,
NaN/Inf handling, and so on.
Output to multiple data sources: wider stores, scatter instructions,
multi-input permute.
Run beamforming on multiple data sets (external data parallelism).
Number of data sets is governed by the number of SIMD lanes: Intel® AVX-512
provides more lanes.
Some beams will be marked as invalid and won’t generate a usable answer:
mask register switches off individual lanes.
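A small model of lane masking shows how an invalid beam can be switched off. It assumes one beam per lane and merge-style masking, where a disabled lane keeps its previous value. The data values and the function name `masked_divide` are hypothetical.

```python
def masked_divide(numerators, denominators, mask, fallback):
    """Lane-wise division that only executes where the mask bit is set.

    Lanes whose mask bit is 0 keep the value from `fallback`, which mirrors
    merge-masking: the disabled lane is left untouched.
    """
    return [
        n / d if (mask >> lane) & 1 else keep
        for lane, (n, d, keep) in enumerate(zip(numerators, denominators, fallback))
    ]

# Hypothetical per-beam values, one beam per lane. Beam 2 has zero power
# and is marked invalid, so its division is never attempted.
signal = [4.0, 9.0, 1.0, 8.0]
power = [2.0, 3.0, 0.0, 4.0]
valid = sum(1 << lane for lane, p in enumerate(power) if p != 0.0)

normalised = masked_divide(signal, power, valid, fallback=[0.0] * 4)
print(bin(valid), normalised)
```

All lanes run the same instruction stream, and the mask alone decides which results are written, so no branch is needed for the invalid beam.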
18. How Does Intel® AVX-512 Help Modulation Mapping?
Diagram: digital data streaming in (…1101010100010101) passes through the
mapper, and radio modulation data streams out (S0 S1 S2 S3 S4 S5 S6…).
High-throughput streaming of input bits and output symbols: wider load/store,
higher cache bandwidth.
QPSK mapping is a direct bit to symbol conversion: mask registers allow
direct lookup.
QAM mapping is a table lookup of bit groups to symbols (for example, 0101
selects S5). The large table size requires regions of memory/cache to be set
aside to store the table. Throughput is governed by number of loads per cycle
from L1, not the raw compute throughput.
Intel® AVX-512 allows the tables to be stored in the register file itself:
more and wider registers, multi-input permutes, masked blends.
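The two mapping styles can be sketched as follows. The QPSK and 16QAM constellations are assumed to follow the usual 3GPP-style Gray mapping, which the slides do not spell out, and all function names are illustrative. In a vectorised version the table would sit in registers and be indexed with a permute; here an ordinary list stands in for it.

```python
import math

def qpsk_map(bits):
    """Map bit pairs to QPSK symbols: bit 0 gives +1, bit 1 gives -1 per axis."""
    scale = 1 / math.sqrt(2)
    return [
        complex(1 - 2 * bits[i], 1 - 2 * bits[i + 1]) * scale
        for i in range(0, len(bits) - 1, 2)
    ]

def qam16_symbol(group):
    """Gray-mapped 16QAM point for four bits (assumed 3GPP-style mapping)."""
    b0, b1, b2, b3 = group
    re = (1 - 2 * b0) * (2 - (1 - 2 * b2))
    im = (1 - 2 * b1) * (2 - (1 - 2 * b3))
    return complex(re, im) / math.sqrt(10)

def build_table(bits_per_symbol, mapper):
    """Precompute one symbol for every possible group of bits."""
    table = []
    for value in range(1 << bits_per_symbol):
        group = [(value >> (bits_per_symbol - 1 - k)) & 1 for k in range(bits_per_symbol)]
        table.append(mapper(group))
    return table

def table_map(bits, table, bits_per_symbol):
    """Map a bit stream by indexing the table with each group of bits."""
    symbols = []
    for i in range(0, len(bits) - bits_per_symbol + 1, bits_per_symbol):
        index = 0
        for bit in bits[i:i + bits_per_symbol]:
            index = (index << 1) | bit
        symbols.append(table[index])
    return symbols

QAM16_TABLE = build_table(4, qam16_symbol)
bits = [1, 1, 0, 1, 0, 1, 0, 1]
print(qpsk_map(bits))
print(table_map(bits, QAM16_TABLE, 4))
```

QPSK needs no table because each bit directly sets the sign of one axis, whereas the 16QAM path performs one lookup per four input bits.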
19. How Does Intel® AVX-512 Help Polar List Decoding?
Diagram: noisy digital data in, through the Polar List Decoder, to error
corrected clean digital data out. The decoding sequence runs left-to-right.
ListN decoding in SIMD lanes. Each small square is an integer value (LLR).
Polar operates on blocks of data: 2x compute, wider registers, more
registers.
Each element has one of two different operations performed on it at any
given point: mask registers.
Final LLR integer values converted to raw bits: threshold to mask
instructions.
Reorder elements across or within lanes: multi-source permute, wider
registers, finer element granularity.
New instructions can
allow faster processing:
range instruction (e.g.,
prior min product’)
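The per-element choice between two operations can be sketched with the two update rules commonly used in successive-cancellation polar decoding. The min-sum form of `f_op`, the sign convention (a negative LLR decodes as bit 1), and all names here are assumptions for illustration, not details taken from the talk.

```python
def f_op(a, b):
    """Min-sum check-node update: sign(a) * sign(b) * min(|a|, |b|)."""
    sign = -1 if (a < 0) != (b < 0) else 1
    return sign * min(abs(a), abs(b))

def g_op(a, b, u):
    """Variable-node update: subtract a when the decided bit u is 1, else add."""
    return b - a if u else b + a

def masked_step(a, b, u, mask):
    """Apply g where the mask bit is set and f elsewhere, one LLR per lane."""
    return [
        g_op(x, y, bit) if (mask >> lane) & 1 else f_op(x, y)
        for lane, (x, y, bit) in enumerate(zip(a, b, u))
    ]

def hard_decision(llrs):
    """Threshold LLRs into a bit mask: a negative LLR sets that lane's bit."""
    mask = 0
    for lane, llr in enumerate(llrs):
        if llr < 0:
            mask |= 1 << lane
    return mask

a = [5, -3, 7, -2]
b = [-4, -6, 1, 8]
u = [0, 1, 0, 0]
out = masked_step(a, b, u, mask=0b0110)
decided = hard_decision(out)
print(out, bin(decided))
```

One mask chooses the operation for each lane and a second mask falls out of the final threshold, which is the pattern the slide attributes to mask registers and threshold-to-mask instructions.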
20. Intel® AVX-512 for Signal Processing
This presentation has shown three different signal processing kernels:
Beamforming: heavy duty floating point
Modulation mapping: high-throughput bit bashing
Polar decoding: per-element conditional processing
All three benefit from Intel’s AVX-512 instruction set, and the same is true
of other important signal processing kernels.
Intel provides an SDK called FlexRAN, a comprehensive set of software
building blocks that the RAN ecosystem can use to build virtualized LTE and
5G NR VNFs.
22. Conclusion
To realize the benefits of vRAN, general purpose processors need to handle
signal processing effectively.
Intel® Xeon® Scalable Processors deliver many new improvements which
allow them to take on the considerable challenge of 5G Physical Layer
processing.
In vRAN one compute platform can take on everything from edge to cloud,
being placed where needed.
Diagram: Smart Devices, Radio Access Technology (mmWave/LTE/NB-IoT/WiFi),
Access and Edge Network, Core Network, Cloud, NFV/SDN.