Standards-Based NPU/Switch Fabric Devices for
Next-Generation Multi-Service Platforms
Richard Borgioli and Raffaele Noro, Vitesse Semiconductor
Ho Wang, Intel Corp
ABSTRACT
The need to merge voice and data onto managed IP networks has spawned the
development of new multi-service platforms for telecom applications. Packet switches
and routers used in the next generation of multi-service platforms can now be built from
commercially available integrated devices featuring standards-based interconnection
interfaces, providing a low-cost alternative to a custom ASIC solution. These devices
include network processors and switch fabrics that interconnect via the CSIX-L1 standard.
Presented here are simulation results for a CSIX-based NPU/switch fabric configuration as
might be used in establishing a service level agreement for a differentiated services
(DiffServ) node. The results not only profile the combined capability of these devices in
an OC-48 multi-service application, but also illustrate the utility of a standards-based
approach.
1. INTRODUCTION
The convergence of voice, data, and video onto packet switched networks is being
realized through new network deployments and the transformation of traditional singular
communication networks into multi-service architectures [1]. The desire to support all of
the traffic types with packet-based protocols such as ATM, Ethernet, and IP has resulted
in the need for telecom equipment with increased versatility and complexity to provide a
new generation of applications and services. Advances in network processor unit (NPU)
and switch fabric devices have now made standards-based components available to
telecom equipment manufacturers with the processing speeds, QoS features, and
programmability necessary to offer a viable alternative to an ASIC implementation [2].
Important areas of application for new standards-based NPUs and switch fabrics
include multi-service media gateways designed to transfer VoIP and other packetized
media flows between PSTN, Mobile, Core and IP networks, as depicted in Figure 1. As
the access points of a converged network use different protocols for transporting data and
voice (i.e. ATM, IP, PPP, and SONET), the task of the media gateway is to seamlessly
reformat the media streams at each network interface while supporting QoS guarantees.
[Figure: a multi-service platform built around an NPU/switch fabric, with network processors (NPs) on each interface, interconnecting the PSTN, Mobile Network, Core Network, and Internet over links such as T1/E1, T3, OC-3/STM-1, OC-3/12, OC-48, and GbE.]
Figure 1 NPU-Switch Fabrics in Multi-Service Platforms
Current generation multi-service media gateway platforms provide aggregation at
medium densities scalable to OC-12 (622 Mbps) as well as 10/100 Base-T Ethernet.
Next generation platforms are aimed at handling OC-48 (2.5 Gbps) and 1 GbE flows
scalable to OC-192 and 10 GbE. Along with higher densities, these platforms must
provide carrier grade voice quality and programmable bandwidth management.
Historically, platforms of this complexity have required custom ASIC solutions.
However, standards-based processor and switch fabric chip sets are now available that
promise to deliver such capability without the up-front cost of ASIC development. The
task then becomes one of investigating the capabilities of a standards-based system for
the application at hand, as described and illustrated in this paper.
2. DESIGN OF MULTI-SERVICE PLATFORMS
Architectural Features
Multi-service platforms require a highly scalable architecture that includes
integrated packet switches varying in size from a few ports in access networks to
hundreds of ports in enterprise networks. A multi-service platform will normally contain
a number of NPU-based line cards (typically one per interface port), a switch fabric, and
a management and control unit for control plane operations [3], as shown in Figure 2.
Physical layer devices connect to the NPU at both the source and destination ports. On
ingress, the NPU formats incoming data streams of various types (e.g. GbE, OC-48 POS)
into fabric compatible frames, or cells. The switch fabric transfers these cells to the
appropriate egress NPU(s), which in turn reformats the data before passing it to the
physically connected destination port.
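As a concrete illustration of the ingress segmentation step described above, the following sketch splits a packet into fixed-size fabric cells tagged with a destination port and a service class. The 64-byte cell payload, the field names, and the Python representation are assumptions for illustration only, not the actual device interface.

# Illustrative sketch of ingress segmentation: an IP packet is split into
# fixed-size fabric cells, each tagged with the destination port and class.
# The cell size and field names are assumptions, not the NPU/fabric API.

from dataclasses import dataclass
from typing import List

CELL_PAYLOAD_BYTES = 64  # assumed fabric cell payload size

@dataclass
class Cell:
    dest_port: int      # egress port the fabric should deliver to
    traffic_class: int  # service class carried in the cell header
    last: bool          # marks the final cell of a packet for reassembly
    payload: bytes

def segment(packet: bytes, dest_port: int, traffic_class: int) -> List[Cell]:
    """Split a packet into fabric cells of at most CELL_PAYLOAD_BYTES."""
    cells = []
    for offset in range(0, len(packet), CELL_PAYLOAD_BYTES):
        chunk = packet[offset:offset + CELL_PAYLOAD_BYTES]
        cells.append(Cell(dest_port, traffic_class,
                          last=(offset + CELL_PAYLOAD_BYTES >= len(packet)),
                          payload=chunk))
    return cells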
[Figure: line cards built from PHY or MAC devices and NPUs handling OC-48, GbE, and similar interfaces connect through NPU-fabric interfaces to a central switch fabric, under a management and control unit.]
Figure 2 Multi-Service Platform containing a NPU - Switch fabric assembly
Presently available NPU/switch fabric chip sets feature OC-48 POS (2.5 Gbps) port
processing capability and switching capacities of 16-32 ports (40-80 Gbps) [2], with
roadmaps to OC-192 ports and 160-320 Gbps switch capacity. In addition,
switch fabric devices of this type now contain built-in queue management and standards-
based interfaces to NPUs to provide a fully integrated packet switching system with
designed-in quality of service (QoS) features for next generation multi-service platforms.
These quality of service categories include Guaranteed Delay (GD) for time-sensitive
traffic such as voice, audio and video, Guaranteed Bandwidth (GB) for loss-sensitive
traffic such as VPN and file transfers, and Best Effort (BE) for the least time-critical traffic
such as Web access and e-mail. These service categories are indicative of a wide range
of QoS architectures [4] (e.g. DiffServ, IntServ and ATM TM 4.1) targeted in multi-
service platform applications.
Functionality
The multi-service platform performs traffic management, along with switching
and system control. Traffic management functions are largely carried out by the NPUs,
while the switch fabric allows traffic to be transferred at wire speed from one NPU to
another. The traffic management functions include classification, marking, metering,
policing and shaping [5, 6]. Packets received at the ingress NPU are classified based on
some packet label (DSCP, ATM VP/VC, MPLS) or on some other criteria (origin,
destination, protocol, etc). Packets may be conforming or non-conforming to the existing
traffic contract for the particular flow, with non-conforming packets marked and/or
discarded. Packets are segmented into cells and stored in memory for subsequent
switching. A scheduler determines the transmission order of the cells to the switch fabric,
essentially as a function of the bandwidth assigned to each output port and each class.
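The metering and policing step can be pictured as a token bucket that tracks whether a flow stays within its contracted rate and burst size. The sketch below is a generic single-rate token-bucket meter under that assumption; the class name and parameters are illustrative and are not taken from the NPU's programming model.

# Generic single-rate token-bucket meter, as one way the ingress NPU might
# decide whether a packet conforms to its traffic contract. The rate and
# bucket depth are illustrative values, not parameters from the paper.

class TokenBucket:
    def __init__(self, rate_bps: float, burst_bytes: int):
        self.rate = rate_bps / 8.0        # token fill rate in bytes/second
        self.depth = burst_bytes          # maximum burst the bucket absorbs
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last_time = 0.0

    def conforms(self, pkt_bytes: int, now: float) -> bool:
        """Return True if the packet conforms; non-conforming packets are marked or dropped."""
        self.tokens = min(self.depth, self.tokens + (now - self.last_time) * self.rate)
        self.last_time = now
        if pkt_bytes <= self.tokens:
            self.tokens -= pkt_bytes
            return True
        return False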
The switch fabric must maintain the integrity of data and the classification
performed by the NPU, as well as implement class-based handling of the different traffic
flows. This is achieved by either allocating a minimum bandwidth to each class or by
serving classes with a pre-determined priority, or in some instances by a combination of
both methods. Cells received at the egress NPU from the switch fabric are re-assembled
in packets and stored in memory. A scheduler determines the transmission order of
packets to the output port, and in certain applications will re-shape the traffic by
rescheduling certain flows ahead of others.
System management and control of the platform allows tailoring the features of the
system to the needs of the application, of the traffic, and of the network protocol. The
parameters that control the allocation of bandwidth, the policing, and the classification
should be programmable and optimized for each specific case.
3. CSIX STANDARD-BASED IMPLEMENTATION
The interface between the NPU and the switch fabric provides for the transfer of
cells based on destination and service classification. Each cell consists of a header and a
payload, with the header containing system information (i.e., destination, class, and
congestion control), while the payload contains the data to be transferred across the
fabric. The physical interface is a parallel data bus that allows for the data payload to be
transferred as a continuous series of words. Standard interfaces are defined by the
transmission protocol contained in the cell header and by bus characteristics, such as
word size and clock frequency. These interface standards, which began with the CSIX-
L1 standard (described here), now include SPI4, NPSI, and emerging standards for
advanced switching employing PCI-Express.
The CSIX-L1 Standard
The CSIX-L1 standard [7] specifies a format and a protocol for the exchange of
information between NPUs and switch fabrics at the physical layer level. The cells
exchanged across the CSIX interface are called CFrames. A CFrame consists of a base
header, an optional extension header, a payload, and a vertical parity word, as
diagrammed in Figure 3. The headers carry system information (type, class, destination,
payload length, and flow control bits), while the payload contains the actual data to be
transferred. The maximum length of the payload is generally determined by the cell size
of the fabric (e.g. 64, 96, 128 bytes), not to exceed an absolute maximum of 256 bytes.
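A simplified view of a data CFrame, grouping the fields named above (frame type, ready state, payload length, class, destination, payload, and vertical parity), is sketched below. The exact bit widths and ordering are defined by the CSIX-L1 specification [7]; this representation is illustrative only and does not reproduce the wire format.

# Simplified, illustrative grouping of the fields of a data CFrame. It does
# not reproduce the CSIX-L1 bit layout; see the specification [7] for that.

from dataclasses import dataclass

@dataclass
class CFrame:
    frame_type: int       # base header: data or control frame
    ready_bits: int       # base header: link-level flow control state
    payload_length: int   # base header: number of payload bytes
    traffic_class: int    # extension header (data frames only)
    destination: int      # extension header (data frames only)
    payload: bytes        # bounded by the fabric cell size, never more than 256 bytes
    vertical_parity: int  # parity word computed over the frame

def payload_fits(frame: CFrame, fabric_cell_size: int = 64) -> bool:
    """The payload may not exceed the fabric cell size or the 256-byte absolute cap."""
    return len(frame.payload) <= min(fabric_cell_size, 256)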
The class field of a CFrame can be used to implement differentiated per-class
treatment of data flows through the fabric by means of bandwidth allocation and flow
control mechanisms built into the fabric. The CSIX standard provides both an in-band
and an out-of-band method for the transmitter to avoid congesting the receiving
component. The in-band mechanism, called link-level flow control, is defined by two
bits in the header (i.e., “Ready Bits”) that enable/disable transmission of CFrames to
prevent fabric memory overflow. The out-of-band method, referred to as CSIX flow
control, is defined by a control CFrame that enables and disables the transmission of
CFrames of a selected type, class, or destination.
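The two mechanisms can be summarized as follows: the ready bits gate all transmission on the link, while flow-control CFrames can enable or disable individual classes. The sketch below models that behavior under those assumptions; the structure and method names are illustrative, not the actual device interface.

# Minimal sketch of the two CSIX flow-control mechanisms described above:
# link-level "ready bits" gate all transmission, while out-of-band control
# CFrames can enable or disable individual classes. Names and structure are
# illustrative assumptions, not the actual device interface.

class CsixTransmitter:
    def __init__(self, num_classes: int):
        self.data_ready = True                     # link-level ready bit from the receiver
        self.class_enabled = [True] * num_classes  # per-class state from flow-control CFrames

    def on_flow_control_cframe(self, traffic_class: int, enable: bool) -> None:
        """Out-of-band flow control: enable or disable one class."""
        self.class_enabled[traffic_class] = enable

    def may_send(self, traffic_class: int) -> bool:
        """A data CFrame is sent only if the link is ready and its class is enabled."""
        return self.data_ready and self.class_enabled[traffic_class]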
[Figure: a CFrame passing over the CSIX bus from a CSIX transmitter device to a CSIX receiver device; the CFrame header carries Frame Type, Ready State, and Payload Length, the extension header (data frames only) carries Class and Destination, and the body holds either the CFrame payload (data frames only) or flow control fields.]
Figure 3 Format of CSIX frames (CFrames) across a CSIX interface
Mapping of QoS categories into CSIX classes
The Guaranteed Delay (GD) service category is characterized by a traffic contract
in which the traffic source commits to an average bitrate and a maximum burst size. In
return, the network element guarantees a minimum throughput and a maximum latency.
In a CSIX-based multi-service platform, a sufficient amount of fabric bandwidth and
buffer space must be allocated to GD traffic. In this case GD traffic would be mapped to
a CSIX class that is served with strict priority over other classes. The Guaranteed
Bandwidth (GB) service category is characterized by a traffic contract in which the traffic
source commits to an average bitrate, but no maximum burst size. In return, the network
element guarantees a minimum throughput, but does not guarantee a maximum latency.
In this case the GB traffic would be mapped to a CSIX class that is served at a minimum
rate, for example with Weighted Fair Queuing (WFQ) or Weighted Round Robin (WRR)
scheduling [8]. The Best Effort (BE) service category is characterized by no traffic
contract and therefore no commitment is made by the network, and so BE traffic would
be mapped to a CSIX class that is served with lowest priority.
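One way the mapping above could be realized in a scheduler is sketched below: GD traffic is served with strict priority, GB classes share the remaining bandwidth through weighted round robin, and BE is served only when the other queues are empty. The queue structure and weights are illustrative assumptions, not the scheduler implemented in the devices described here.

# Illustrative class scheduler realizing the GD/GB/BE mapping described
# above: strict priority for GD, WRR among GB queues, BE served last.
# Queue structure and (positive) weights are assumptions for illustration.

from collections import deque

class ClassScheduler:
    def __init__(self, gb_weights):
        self.gd = deque()                         # Guaranteed Delay queue
        self.gb = [deque() for _ in gb_weights]   # Guaranteed Bandwidth queues
        self.be = deque()                         # Best Effort queue
        self.weights = list(gb_weights)           # positive WRR weights
        self.credits = list(gb_weights)           # WRR credit counters

    def enqueue(self, cell, category, gb_index=0):
        if category == "GD":
            self.gd.append(cell)
        elif category == "GB":
            self.gb[gb_index].append(cell)
        else:
            self.be.append(cell)

    def next_cell(self):
        if self.gd:                               # strict priority for GD
            return self.gd.popleft()
        if any(self.gb):                          # WRR among GB classes
            while True:
                for i, q in enumerate(self.gb):
                    if q and self.credits[i] > 0:
                        self.credits[i] -= 1
                        return q.popleft()
                self.credits = list(self.weights)  # refresh credits and retry
        return self.be.popleft() if self.be else None  # BE served last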
Enforcement of the traffic contract
In order to enforce a GD traffic contract, both the NPU and the switch fabric must be
considered in meeting the throughput and delay guarantees. Both ingress and egress
NPUs must police/shape the traffic and serve it at a rate that is equal to or greater than the
negotiated throughput [9], using a scheduling mechanism, such as WFQ or WRR. This
guarantees the minimum throughput and the maximum delay in the NPU, where the
maximum latency is determined from the arrival and service curve of the traffic [10] for
both the NPU and the switch fabric, as illustrated in Figure 4.
[Figure, panel A (ingress/egress NP): cumulative bits of the traffic flow vs. time t, with arrival curve Burst_size + arrival_rate*t and service curve service_rate*(t - transfer_time); the horizontal distance between the curves gives the maximum delay and the vertical distance the backlog. Panel B (switch fabric): the same construction with service curve service_rate*(t - contention_time - transfer_time).]
Figure 4 Maximum delay and backlog in the NPU and switch fabric
The maximum delay for the NPU is given by:
NPU Delay = Transfer_time + Burst_size / Service_rate
where the transfer time is the time required by a maximum size packet to traverse the
NPU and the burst delay is the time to transfer the additional packets accumulated during
the delay period in the transmitting component (i.e., the previous stage). The traffic
received from the ingress NPU is deterministic in average bitrate and maximum
burstiness. The switch fabric will serve it either with absolute priority (i.e., full
bandwidth) or by assigning a minimum rate (i.e., WFQ or WRR), while allocating
sufficient buffer space to absorb the burst. This guarantees the minimum throughput and
establishes the maximum latency in the switch fabric, as:
Fabric Delay = Transfer_time + Burst_size / Service_rate + Contention_time
where the contention time represents a full fabric arbitration cycle for all inputs and
outputs.
The global QoS level for GD traffic in the platform is determined by the combination of
individual QoS levels in each NPU and the switch fabric. The guaranteed throughput is
the minimum throughput of the three devices, and is typically determined by the ingress
NPU where policing/shaping occurs. The guaranteed latency is the sum of the maximum
delays for each device as they accumulate on the specific flow of interest.
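As a worked example of how these bounds combine, the sketch below applies the two delay formulas to an ingress NPU, the switch fabric, and an egress NPU in sequence and sums the results. All numeric values (packet size, burst size, service rate, and contention time) are assumptions chosen for illustration, not parameters measured in the experiment.

# Worked example of the delay bounds above, summing the ingress NPU, switch
# fabric, and egress NPU contributions on the flow of interest. The numeric
# values are illustrative assumptions, not measured parameters.

def npu_delay(transfer_time_s, burst_bits, service_rate_bps):
    return transfer_time_s + burst_bits / service_rate_bps

def fabric_delay(transfer_time_s, burst_bits, service_rate_bps, contention_time_s):
    return transfer_time_s + burst_bits / service_rate_bps + contention_time_s

# Assumed figures: 1500-byte maximum packet over a 2.5 Gbps link, a 16 kB
# burst, and a 2 microsecond fabric arbitration (contention) cycle.
transfer = (1500 * 8) / 2.5e9   # time for a maximum-size packet to traverse a stage
burst = 16_000 * 8              # burst absorbed by the buffers, in bits
rate = 2.5e9                    # guaranteed service rate in bits/s
contention = 2e-6               # one full fabric arbitration cycle

total = (npu_delay(transfer, burst, rate)
         + fabric_delay(transfer, burst, rate, contention)
         + npu_delay(transfer, burst, rate))
print(f"end-to-end latency bound: {total * 1e6:.1f} microseconds")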
Since the interaction of all three units must be considered in determining QoS
capabilities, it becomes important to have software development tools available that
enable the three devices to be programmed and studied as an integral unit, so that packet
flow studies can be performed, such as the simulation experiment described and
illustrated in the next section.
4. EXPERIMENTAL RESULTS
Simulation Environment
As an important example of QoS capabilities in multi-service platforms, we
simulate a DiffServ [11] node. The QoS mechanism of DiffServ, which is based on
‘local’ or per-node guarantees on throughput and latency (defined as a Per-Hop Behavior
(PHB)), consists of an Expedited Forwarding (EF) PHB, an Assured Forwarding (AF) PHB, and a
Default PHB, similar to the GD, GB, and BE categories previously outlined. The object
of the simulation is to quantify the latency of the node on a packet flow to be classified in
the time-critical EF PHB category and to ascertain over what levels of competing
background traffic the latency guarantee can be maintained.
In producing the experimental results presented here, we used the simulation tools
provided in the Intel® IXA SDK 3.0 [12]. A pre-built software model of the Vitesse
GigaStream® switch fabric was “attached” to ingress and egress models of the Intel®
IXP2400 network processor. A simulated packet stream representing the flow under test
was run through this “integrated simulator”, with timing information recorded on the
arrival and departure of each packet through the system. The simulation was used to
determine both the throughput and maximum latency of the system on various flows, as
would be needed in establishing a QoS service level agreement (SLA).
System Configuration
The DiffServ node simulated here is shown in Figure 5. It consists of 16 bi-
directional OC-48 ports, each connected to a line card containing an ingress and egress
IXP2400 network processor interfaced to a GigaStream VSC872 transceiver through a
32-bit CSIX-L1 bus clocked at 125 MHz, to provide up to 4 Gbps of CSIX bandwidth
per port. The line card for each port is connected to a switch card containing 4
GigaStream VSC882 crossbar switches, with each switch connection made through high
speed serial links (HSSLs). The internal clock speed of the IXP2400 network processor
is set at 600 MHz and the internal core clock of the GigaStream switch fabric is set at
155.52 MHz, establishing the switch fabric connection speed at 2.5 Gbps per serial link,
for a raw connection speed of 10 Gbps per port.
The IXP2400 network processor has fully programmable processing units, called
micro-engines. We program them to perform classification, metering, policing, and
per-class scheduling at both ingress and egress, while packets are buffered in three
independent high bandwidth RDRAM channels. The GigaStream VSC872 contains
programmable buffer space for absorbing traffic bursts in order to avoid flow control
activity that can reduce user bandwidth across the CSIX interface. Two levels of priority
are provided (HP, LP) with HP served first at all stages. In the experiment, we allocate 20
CFrames of buffer space in the switch fabric for both HP and LP traffic. Round-Robin
scheduling is used in both ingress and egress NPUs.
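The link rates quoted above follow directly from the stated clocking and can be checked with a short calculation: a 32-bit CSIX-L1 bus at 125 MHz yields 4 Gbps per port, and four HSSLs at 2.5 Gbps yield 10 Gbps of raw fabric bandwidth per port.

# Sanity check of the bandwidth figures quoted in the configuration above.
csix_bus_width_bits = 32
csix_clock_hz = 125e6
csix_bandwidth_bps = csix_bus_width_bits * csix_clock_hz   # 4 Gbps of CSIX bandwidth per port

hssl_rate_bps = 2.5e9
hssls_per_port = 4
raw_fabric_bw_bps = hssl_rate_bps * hssls_per_port         # 10 Gbps raw fabric bandwidth per port

assert csix_bandwidth_bps == 4e9
assert raw_fabric_bw_bps == 10e9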
[Figure: one of 16 full-duplex NPU line cards, with ingress and egress Intel® IXP2400 network processors connected to a GigaStream® VSC872 transceiver over a CSIX interface at 4 Gbps and to the line over an OC-48 POS interface at 2.5 Gbps; the traffic flow under test (1 Gbps IP POS) and background traffic (up to 1.5 Gbps) enter through the line card, background traffic is injected through the remaining ports, and each VSC872 connects to the switch card of four GigaStream® VSC882 crossbar switches over 4 HSSLs at 2.5 Gbps (10 Gbps per VSC872).]
Figure 5 Simulated DiffServ node designed with Intel® IXP2400 network processors and a Vitesse GigaStream® switch fabric
Simulation data
A 1 Gbps flow of 64-byte IP packets was used as the traffic under test on one of
the ports, with a variable amount of competing background traffic run on the 15
remaining ports. We collect latency statistics for 1000 IP packets and plot the end-to-end
latency vs. the background traffic level, as shown in Figure 6.
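The plotted points can be thought of as a simple reduction of the per-packet timing records: for each background load, the maximum and average latency over the 1000 recorded packets, expressed as a percentage of the guaranteed (SLA) latency. The sketch below shows one way such a reduction could be computed; the data structures and the guaranteed value are assumptions for illustration.

# Illustrative reduction of per-packet timing records to the quantities
# plotted in Figure 6: maximum and average latency as a percentage of the
# guaranteed (SLA) latency. Inputs are assumed lists of timestamps.

def latency_stats(arrival_times, departure_times, guaranteed_latency_s):
    """Return (max, average) latency as a percentage of the guaranteed value."""
    latencies = [d - a for a, d in zip(arrival_times, departure_times)]
    max_pct = 100.0 * max(latencies) / guaranteed_latency_s
    avg_pct = 100.0 * (sum(latencies) / len(latencies)) / guaranteed_latency_s
    return max_pct, avg_pct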
[Plot: latency of a 64-byte packet EF DiffServ flow at 1 Gbps in a 16x16 switch for various levels of background traffic. X-axis: aggregate background traffic (Gbps), 0 to 40; y-axis: latency as a percentage of the guaranteed value, 0 to 110. Curves: observed maximum latency, observed average latency, and the established maximum latency (the guaranteed latency level for the SLA). Note: 37.5 Gbps = 100% usage of the 15 background SPI-3 POS input ports; extra traffic is for simulation purposes only.]
Figure 6 Latency of the 1 Gbps flow for varying background traffic
The simulation data plotted in the above figure shows maximum latencies for the
1 Gbps EF PHB, ranging from 56% to 80% of the guaranteed latency level for an EF
PHB SLA against aggregate background traffic levels up to 37.5 Gbps, or 100% of the
available bandwidth for competing traffic. These simulation results indicate that the
traffic handling capability of the integrated NPU/switch fabric is fairly robust, with the
maximum latency rising by only 24 percentage points of the guaranteed level over the full
range of traffic loading.
Conclusion
We have described here a multi-service packet switch in which the CSIX-L1 standard
is used to interface the NPUs and the switch fabric. The standard allowed us to treat the
NPU-Switch fabric assembly as an integrated unit in terms of traffic management. The
availability of a standard framework environment enabled us to simulate a DiffServ node
based on IXP2400 network processors and a GigaStream switch fabric. As proof of
concept, an EF PHB SLA was investigated and seen to provide a stable level of latency
over the entire range of traffic loads. Further investigation of the proposed DiffServ
implementation will be performed to verify that each type of PHB can be efficiently
negotiated under general traffic profiles.
REFERENCES
[1] Tech Guide, “The Converged Network Infrastructure: An Introductory View”, White
Paper, Jan. 2001, available at: http://www.itpapers.com/
[2] Intel® IXP2400 Network Processor / Vitesse GigaStream® Switch Fabric Solution, White
Paper, available at: http://www.intel.com/design/network/papers/252142.htm
[3] S. Keshav and R. Sharma, “Issues and Trends in Router Design”, IEEE
Communications Magazine, May 1998
[4] F. Chiussi and A. Francini, “A Distributed Scheduling Architecture for Scalable
Packet Switches”, IEEE JSAC, Dec. 2000
[5] H. J. Chao and X. Guo, “Quality of Service Control in High-Speed Networks”, John
Wiley & Sons, Sep. 2001
[6] T. Chu, “WAN Multiprotocol Traffic Management: Theory and Practice”,
Communications Design Conference, Sep. 2002
[7] Common Switch Interface Consortium, “CSIX-L1: Common Switch Interface
Specification-L1”, Aug. 2000
[8] H. Zhang, “Service Disciplines For Guaranteed Performance Service in Packet-
Switching Networks”, Proceedings of the IEEE, Oct. 1995
[9] N. Christin and J. Liebeherr, “A QoS Architecture for Quantitative Service
Differentiation”, IEEE Communications Magazine, June 2003
[10] J. Y. Le Boudec, “Application of Network Calculus To Guaranteed Service
Networks”, IEEE Transactions on Information Theory, May 1998
[11] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang and W. Weiss, “An architecture
for differentiated services”, IETF RFC 2475, Dec. 1998
[12] Intel® Internet Exchange Architecture, White Paper, Feb. 2002, available at:
http://www.intel.com/design/network/papers/intelixa.htm