This document describes a 4.9-6.4 Gb/s transceiver designed for backplane communications. It uses an adaptive decision feedback equalizer and achieves bit error rates below 10^-15 over a 34-inch legacy FR4 backplane. Key components include a phase-locked loop for clock synthesis, a transmitter with single-tap pre-emphasis, and a receiver with adaptive equalization and digital clock recovery. Measurement results demonstrate error-free performance over various backplane lengths.
A 4.9-Gb/s to 6.4-Gb/s Transceiver with Adaptive
Equalizer for Backplane Communications
Chris Siu, Jurgen Hissen, Guillaume Fortin, Jatinder Chana, Bernard Guay, Graeme Boyd, Tony Zortea, John Plasterer,
Matthew McAdam, Charles Roy, Mike Venditti, Geoff Allbutt, Gershom Birk, Kevin Betts, and Hormoz Djahanshahi
PMC-Sierra, 8555 Baxter Place, Burnaby, BC, V5A 4V7, Canada
Abstract- A 4.9-Gb/s to 6.4-Gb/s transceiver using NRZ coding is
presented in this paper. To achieve low bit error rates (BER)
over legacy FR4 backplanes, adaptive equalization with a multi-
tap Decision Feedback Equalizer (DFE) is used. This CEI-6-LR
transceiver has been implemented with production quality in a
standard 0.13µm CMOS process and consumes 285 mW per
link. A BER of less than 10^-15 has been demonstrated over a
34-inch (86-cm) FR4 backplane.
I. INTRODUCTION
High-performance telecommunication, computing and
storage systems require ever-increasing data throughput and
reliability. In addition, there is a need for platforms to offer
performance upgrades while preserving existing backplane
designs. This results in the need for advanced serial
transceiver technology with higher data rates and low Bit
Error Rate (BER) performance [1]-[7]. The same trend has
led the Optical Internetworking Forum (OIF) to create the
CEI-6 and CEI-11 standards [8]. These standards
offer system architects the desired improvement in data
throughput while ensuring the interoperability of transceivers
from multiple vendors. Such standardization efforts must
also consider the implementation cost of the resulting
technology.
For serial links in the multi-gigabit per second regime,
signal integrity imposes fundamental performance limits.
The severity of frequency dependent attenuation, signal
reflections, and crosstalk noise dictates the achievable BER
performance for a given architecture. As these effects begin
to dominate, improvements can only be achieved by
increasing the complexity of the transceiver. It is therefore
crucial to achieve the best architectural cost vs. performance
trade-offs while achieving the desired BER performance.
This paper is organized as follows. First, an overview of
the transceiver architecture is provided. Next, we discuss the
major sub-blocks, namely clock synthesis PLL, transmitter
and receiver. Finally, measurement results, including BER
performance over different backplanes, are provided.
II. ARCHITECTURE OVERVIEW
In recent years, there has been much interest in the use of
alternate coding schemes to operate at high data rates over
legacy backplanes. For example, the use of PAM-4 [9] can
cut the required bandwidth in half, but with a tradeoff on
Signal-to-Noise Ratio (SNR). The choice of whether NRZ or
PAM-4 is more advantageous would depend on the bit rate
and the backplane’s frequency response [10]. We have
studied the characteristics of a number of backplanes, and
concluded that for legacy backplanes, designed for 3.125-
Gb/s operation, NRZ coding is more advantageous than
PAM-4 for transmission in the 6-Gb/s range.
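The NRZ versus PAM-4 trade-off described above can be illustrated with a common first-order rule of thumb. The sketch below is not the authors' methodology: it only compares the extra channel loss NRZ sees at its higher Nyquist frequency against the level-spacing penalty of PAM-4, and it ignores crosstalk, jitter, and equalization. The function names and any loss figures passed in are hypothetical.

```python
import math

def nyquist_ghz(bit_rate_gbps, bits_per_symbol):
    """Nyquist frequency in GHz: half the symbol rate."""
    return bit_rate_gbps / bits_per_symbol / 2.0

# PAM-4 stacks three eyes in the same swing, so each eye is 1/3 the NRZ height.
PAM4_LEVEL_PENALTY_DB = 20.0 * math.log10(3.0)  # about 9.5 dB

def nrz_preferred(loss_at_nrz_nyquist_db, loss_at_pam4_nyquist_db):
    """True when the extra loss NRZ suffers is smaller than the PAM-4 penalty.

    Both loss values are channel attenuation magnitudes in dB (assumed inputs,
    e.g. read from a measured backplane insertion-loss curve).
    """
    extra_loss_db = loss_at_nrz_nyquist_db - loss_at_pam4_nyquist_db
    return extra_loss_db < PAM4_LEVEL_PENALTY_DB

# At 6.25 Gb/s, NRZ has its Nyquist frequency at 3.125 GHz, PAM-4 at 1.5625 GHz.
print(nyquist_ghz(6.25, 1), nyquist_ghz(6.25, 2))
```

Under this simplified view, NRZ remains attractive as long as the channel loses less than roughly 9.5 dB between the two Nyquist frequencies.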
The theoretical basis for this conclusion is a BER
estimation methodology, which incorporates the
characteristics of the main channels and crosstalk aggressors,
as well as jitter sources. A similar methodology is
implemented in StatEye (www.stateye.org) developed within
OIF. Given the extremely low BER demanded by equipment
vendors, it is not practical to evaluate an architecture by
using transient simulation. Rather, a statistical technique
such as StatEye can efficiently provide insight into the factors
dominating BER performance and serve as a guide to
architectural decisions.
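To make the idea of statistical BER estimation concrete, the following is a minimal sketch of the general technique, not the StatEye tool or the authors' own methodology. It assumes independent, equiprobable NRZ symbols, builds the probability distribution of the residual ISI by repeated convolution, and averages a Gaussian-noise error probability over that distribution. Crosstalk and jitter are omitted, and all names and numeric values are hypothetical.

```python
import math
from collections import defaultdict

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def isi_pdf(isi_taps, resolution=1e-4):
    """Discrete PDF of the ISI sum, assuming each cursor contributes +/-h
    with probability 1/2. Returns {isi_value: probability}."""
    pdf = {0: 1.0}  # keys are integer multiples of `resolution`
    for h in isi_taps:
        step = round(h / resolution)
        nxt = defaultdict(float)
        for level, p in pdf.items():
            nxt[level + step] += 0.5 * p
            nxt[level - step] += 0.5 * p
        pdf = dict(nxt)
    return {level * resolution: p for level, p in pdf.items()}

def statistical_ber(main_cursor, isi_taps, noise_sigma):
    """BER for a transmitted '1': average Q((h0 + isi) / sigma) over the ISI PDF."""
    return sum(p * q_function((main_cursor + isi) / noise_sigma)
               for isi, p in isi_pdf(isi_taps).items())

# Hypothetical channel: unit main cursor, two residual ISI cursors.
print(statistical_ber(1.0, [0.2, 0.1], 0.125))
```

The point of the approach is that a BER near 10^-15 is computed directly from probabilities rather than by simulating on the order of 10^15 bits in the time domain.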
A block diagram of the resulting 6.25-Gb/s transceiver is
shown in Fig. 1. The clock synthesis phase-locked loop
(PLL) generates a line rate clock in the range of 4.9 GHz to
6.4 GHz. The PLL output is fed to the Multi-Clock
Generator unit, which creates a number of lower rate clocks
for other blocks in the transceiver. The parallel-to-serial
converter (PISO) converts an N-bit word from the digital
core into a line rate serial stream. A transmitter (TX) with a
single pre-emphasis tap is used in the architecture to drive the
backplane. While other published architectures have used
extensive linear equalization techniques [1]-[5] (whether
implemented in the RX as filtering or in the TX as pre-
emphasis), this development avoids putting too much weight
on linear filtering techniques. Linear filtering amplifies
crosstalk. Due to the differences in frequency content of the
forward and crosstalk channels, overall SNR is degraded by
linear filtering such as pre-emphasis. The role of pre-
emphasis in this architecture is therefore reduced to the
minimum required to overcome other implementation
challenges such as the dynamic range of the nonlinear RX
equalizer, and the ability to recover clock from an
unequalized signal.
Fig. 1. Block Diagram of the 6.25-Gb/s transceiver
[Figure labels: Parallel-to-Serial Converter (N bits @ rate/N in), TX (4.9 Gb/s to 6.4 Gb/s out), RX (4.9 Gb/s to 6.4 Gb/s in), Decision Feedback Equalizer, LMS Adaptation Controller, Digital Clock Recovery Unit, Serial-to-Parallel Converter (N bits @ rate/N out), 4.9-GHz to 6.4-GHz PLL, Multi-Clock Generator (rate/2), Serial Diagnostic Loopback, Metallic Loopback]
The receiver (RX) buffers and amplifies the incoming
signal, which is often attenuated and distorted after traveling
through a long backplane. To compensate for distortion due
to Inter-Symbol Interference (ISI) and to maximize SNR in
the presence of crosstalk, a Decision Feedback Equalizer
(DFE) is used. A DFE is a non-linear equalizer and therefore
does not amplify crosstalk. Accordingly, it provides superior
performance in crosstalk-heavy environments such as those
that use high-density backplane connectors. The DFE
coefficients are set using a least-mean-square (LMS)
adaptation controller. After equalization, the received signal
is presented to a data slicer. The clock recovery unit
regenerates a bit clock from the received signal, which is
used as a sampling clock for the data slicer. The clock is
recovered from the unequalized signal to prevent complicated
interaction with DFE adaptation.
III. CLOCK SYNTHESIS AND GENERATION
Clock synthesis is performed using a traditional PLL
multiplier architecture. To achieve low random jitter, an LC
Voltage Controlled Oscillator (VCO) is used. The VCO uses
an integrated inductor with patterned ground shield and two
accumulation-mode MOS varactors as shown in Fig. 2(a).
Cross-coupled NMOS and PMOS pairs are employed in the
VCO to compensate the losses in the LC tank.
Fig. 2. (a) LC VCO circuit, (b) Two VCOs with overlapping tuning range
used in clock synthesis PLL
[Figure labels: (a) L, Cvar, VC, M1-M4, IBIAS; (b) highVCO, lowVCO, SELVCOH, VC, IBIAS (250 uA), VCO Buffers, cmlAL implementation]
Although LC VCOs provide excellent phase noise and
jitter performance, one disadvantage is the limited tuning
range. A simple LC VCO cannot tune over 4.9 GHz to 6.4
GHz reliably over process, voltage, and temperature (PVT).
In this design, we have elected to use two LC VCOs with
overlapping tuning ranges. One VCO is tuned to cover the
low band (4.9 GHz to 5.8 GHz), while the other is tuned for
the high band (5.7 GHz to 6.5 GHz). The difference between
the two VCOs is the number of varactors in their LC tank.
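As an illustration of the two-VCO arrangement, the sketch below picks a VCO for a requested line-rate clock using the band edges quoted above, and shows the standard LC resonance formula that explains why adding varactor capacitance lowers the oscillation frequency. The selection policy inside the overlap region (choose the band whose centre is closer) and the inductance and capacitance values are assumptions for illustration, not details from the design.

```python
import math

LOW_BAND_GHZ = (4.9, 5.8)   # low-band VCO tuning range from the text
HIGH_BAND_GHZ = (5.7, 6.5)  # high-band VCO tuning range from the text

def tank_frequency_ghz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC tank: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f)) / 1e9

def select_vco(target_ghz):
    """Return 'low' or 'high' for the VCO that should serve target_ghz."""
    in_low = LOW_BAND_GHZ[0] <= target_ghz <= LOW_BAND_GHZ[1]
    in_high = HIGH_BAND_GHZ[0] <= target_ghz <= HIGH_BAND_GHZ[1]
    if not (in_low or in_high):
        raise ValueError("target frequency outside both tuning ranges")
    if in_low and in_high:
        # Assumed policy: prefer the band whose centre is nearer the target.
        low_mid = sum(LOW_BAND_GHZ) / 2.0
        high_mid = sum(HIGH_BAND_GHZ) / 2.0
        return "low" if abs(target_ghz - low_mid) <= abs(target_ghz - high_mid) else "high"
    return "low" if in_low else "high"

# Hypothetical tank values: more capacitance gives a lower frequency.
print(tank_frequency_ghz(1e-9, 1.0e-12), tank_frequency_ghz(1e-9, 1.2e-12))
print(select_vco(4.9), select_vco(6.25))
```

The 100-MHz overlap between the bands gives margin so that every frequency from 4.9 GHz to 6.4 GHz is covered by at least one VCO across PVT.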
The PLL output goes to the Multi-Clock Generator,
which divides the 6-GHz range clock into a number of lower
rate clock phases. The design is based on current-mode logic
with active peaked loading (cmlAL) for increased bandwidth.
The differential pair and active load transistors are all low-Vt
nMOS with pwell in deep nwell. The clocks are buffered and
distributed to multiple TX/RX links on the device. These
buffers can support up to 8 TX/RX pairs while operating at
the highest line rate of 6.4 Gb/s.
IV. HIGH-SPEED TRANSMITTER
The transmit block consists of a Parallel-to-Serial
Converter (PISO) and a high-speed transmitter. The PISO
accepts an N-bit word from the digital core and converts it
into an up to 6.4-Gb/s serial data stream. It requires several
clocks from the Multi-Clock Generator. A word-rate clock is
used to latch in the digital word. Subsequent higher rate
clocks are then used to serialize the data. The PISO drives a
Current Mode Logic (CML) driver, shown in Fig. 3.
Fig. 3. The 6.25-Gb/s CML Transmitter
[Figure labels: PISO, Clocks, TPD[0], TPD[1], ..., TPD[N], Main Path, Pre-emphasis Path, TX CML Driver, TXOP, TXON]
The transmitter incorporates an internal termination and
can be programmed to provide variable swing from 0.4 to 1
Vpp differential (under worst-case conditions) with
programmable one-tap pre-emphasis.
One-tap pre-emphasis is a first-order compensation for
channel attenuation. Even though this is not sufficient on its
own to reliably open the channel eye at 6.4 Gb/s, it reduces
the work for the RX equalizer and also makes clock recovery
easier. An adaptive biasing approach maximizes usable
output swing by ensuring saturation of output devices.
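One-tap pre-emphasis can be modeled behaviourally as a two-coefficient FIR filter acting on the NRZ symbol stream. The sketch below assumes a peak-constrained normalization, y[n] = (1 - alpha) * x[n] - alpha * x[n-1], which is a common textbook form and not necessarily how the actual driver weights its main and pre-emphasis paths. The function name and the value of alpha are hypothetical.

```python
def pre_emphasize(bits, alpha):
    """Apply one-tap pre-emphasis to a bit sequence.

    bits  : iterable of 0/1 values
    alpha : pre-emphasis weight, 0 <= alpha < 0.5 (assumed range)
    Returns normalized output levels with peak magnitude 1.0.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    # Assume the line was idling at the first symbol's level before the burst.
    prev = symbols[0] if symbols else 0.0
    for s in symbols:
        out.append((1.0 - alpha) * s - alpha * prev)
        prev = s
    return out

# Transitions get full swing; repeated bits are de-emphasized.
print(pre_emphasize([0, 1, 1, 1, 0], 0.25))  # [-0.5, 1.0, 0.5, 0.5, -1.0]
```

Boosting transition bits relative to repeated bits counteracts the low-pass behaviour of the backplane, which is why even a single tap eases the job of the RX equalizer and clock recovery.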
V. RECEIVER WITH ADAPTIVE EQUALIZER
The receive section comprises several blocks:
• Linear receiver with gain control (RX)
• Decision Feedback Equalizer (DFE)
• LMS Adaptation Controller
• Digital Clock Recovery Unit (DCRU)
In order for the DFE and LMS adaptation to work, the
receiver must amplify the incoming signal to a target level,
while not limiting or clipping it. Hence, the receiver must be
linear for the signal range of interest. In addition, one should
expect this transceiver to operate over a variety of backplane
lengths. As a result, the receiver may see the full transmitter
swing (for short traces) or a small signal with lots of ISI (for
long backplanes). In order for the receiver to cope with this
range of signals, a variable gain control is implemented to set
the target internal swing at around 500 mVpp differential.
Clock recovery is performed on the unequalized input
signal. This decouples the clock recovery and DFE
coefficient adaptation feedback loops, making for a robust
and more deterministic system. Clock recovery can be
accomplished on the unequalized signal despite the fact that
the eye is closed to probabilities of 10^-4 in some channels.
In this implementation, a digital bang-bang architecture is
used for clock recovery. A phase interpolator modulates
quadrature half-rate clocks to track the incoming data stream.
Digital filters are used to maximize controllability and
observability during test. An optimal sampling point in the
eye is chosen to minimize pre-cursor ISI while recovering the
data. This sampling point can have drastic effects on the
BER.
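A digital bang-bang clock recovery loop can be sketched in a few lines. The code below is a generic Alexander-style early/late detector followed by a simple vote filter that steps a phase-interpolator code. It is not the authors' implementation: the number of interpolator steps, the vote threshold, and the sign convention for "early" are all assumptions.

```python
def bang_bang_vote(prev_bit, edge_sample, curr_bit):
    """Early/late decision from two data samples and the edge sample between them.

    Returns +1 if the clock is sampling early, -1 if late, 0 if there was
    no data transition (no timing information).
    """
    if prev_bit == curr_bit:
        return 0
    # Edge sample still equals the old bit: it was taken before the transition.
    return 1 if edge_sample == prev_bit else -1

def update_phase_code(phase_code, votes, steps=64, threshold=4):
    """Digital loop filter: move the interpolator one step when the summed
    votes reach the threshold. `steps` and `threshold` are hypothetical."""
    total = sum(votes)
    if total >= threshold:
        return (phase_code + 1) % steps   # clock early: delay it
    if total <= -threshold:
        return (phase_code - 1) % steps   # clock late: advance it
    return phase_code

votes = [bang_bang_vote(0, 0, 1), bang_bang_vote(1, 1, 0),
         bang_bang_vote(0, 0, 1), bang_bang_vote(1, 1, 0)]
print(update_phase_code(63, votes))  # wraps around to 0
```

Because the phase code wraps modulo the number of steps, the interpolator can rotate continuously, which lets the loop track a small frequency offset between the local clock and the incoming data.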
Although the clock is recovered and the signal is amplified
to the desired level, it is typically still severely distorted due
to ISI. The DFE is used to remove most post-cursor ISI
components. Fig. 4 shows a block diagram of the DFE.
Fig. 4. Decision Feedback Equalizer (DFE)
[Figure labels: RX, summing node (Σ), Z^-1 delay elements, coefficients b0, b1, ..., b9, DFE (analog), LMS Coefficient Adaptation (digital), RDAT, RCLK]
Note that this DFE can remove up to 10 post-cursor ISI
components, which was found to be sufficient for a great
majority of backplanes. Since the impulse response of the
backplane is not known a priori, the DFE coefficients must be
set automatically by some mechanism. This is accomplished
using the LMS adaptation controller. The controller uses the
sign-sign Least Mean Squares (LMS) algorithm, described by
Eq. (1) below:
$$C_k[t+1] = C_k[t] + \mu_{step} \cdot \mathrm{sgn}(e[t]) \cdot \mathrm{sgn}(d[t-k-1]) \qquad (1)$$
for k = 0 to (N-1), where
C_k[t] is equalizer coefficient number k at time t,
µ_step is the adaptation step size,
e[t] is the equalizer output error at time t,
d[t] is the equalizer output value at time t,
N is the number of equalizer taps.
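The sign-sign LMS update of Eq. (1) can be exercised with a small behavioural simulation. The sketch below is an illustration under stated assumptions, not the authors' hardware: the channel taps are hypothetical, the feedback subtracts C_k times the sliced past decision, the error is taken relative to a unit target level, and noise, crosstalk, and the half-rate structure are ignored.

```python
import random

def sgn(x):
    """Sign function with sgn(0) taken as +1 (assumed convention)."""
    return 1.0 if x >= 0 else -1.0

def adapt_dfe(channel, n_taps, mu_step, n_symbols, seed=1):
    """Adapt DFE coefficients with sign-sign LMS on random NRZ data.

    channel : [h0, h1, ...] main cursor followed by post-cursor taps
    Returns the final coefficient list C_0 .. C_{N-1}.
    """
    rng = random.Random(seed)
    coeffs = [0.0] * n_taps
    tx_hist = [1.0] * len(channel)  # s[t], s[t-1], ... (assumed idle level)
    d_hist = [1.0] * n_taps         # d[t-1], ..., d[t-N]
    for _ in range(n_symbols):
        tx_hist = [rng.choice((-1.0, 1.0))] + tx_hist[:-1]
        received = sum(h * s for h, s in zip(channel, tx_hist))
        # Equalizer output: subtract ISI estimated from past decisions.
        d = received - sum(c * sgn(p) for c, p in zip(coeffs, d_hist))
        e = d - sgn(d)              # error against an assumed unit target
        # Eq. (1): C_k[t+1] = C_k[t] + mu * sgn(e[t]) * sgn(d[t-k-1])
        coeffs = [c + mu_step * sgn(e) * sgn(p) for c, p in zip(coeffs, d_hist)]
        d_hist = [d] + d_hist[:-1]
    return coeffs

# Hypothetical channel with two post-cursor ISI components.
print(adapt_dfe([1.0, 0.5, 0.3], n_taps=2, mu_step=0.001, n_symbols=20000))
```

In this noise-free setting the coefficients settle close to the post-cursor taps of the channel and then dither by a few step sizes, which is the characteristic steady-state behaviour of a sign-sign update.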
For pseudo-random input data, the LMS controller does
not require a training sequence. This means it can adapt
continuously to changing environments, such as temperature
fluctuations, while passing system traffic. As with clock
recovery, using a digital LMS controller provides great
flexibility for test.
To save area and power, a single LMS engine is used in a
patent-pending way for adapting all receive-side parameters
including DFE coefficients, RX gain, and offset
compensation [11]. Both the clock recovery and DFE are
implemented using a half-rate architecture to ease timing.
VI. MEASUREMENT RESULTS
A 4.9-Gb/s to 6.4-Gb/s backplane serializer/deserializer
(SERDES) incorporating this transceiver was fabricated in
early 2004 in a 0.13µm CMOS process with 1.2V and 2.5V
power supplies, standard- and low-Vt transistors and a deep-
nwell option, and housed in a 320-pin flip-chip package. Fig. 5
shows the block diagram (top) and die micrograph (bottom)
of the SERDES. In a typical application it multiplexes eight
3.125-Gb/s streams onto four 6.25-Gb/s channels. The chip
power is 3 W total, which includes 258 mW per link for the
6.25-Gb/s interface and 123 mW per link for the 3.125-Gb/s
interface, with the remaining nearly 1 W consumed by the
digital core. Extensive lab characterization was performed,
including BER measurements over different backplanes.
Fig. 6 shows the measured eye diagram and intrinsic
jitter, taken right at the transmitter output. In this
measurement, the transmitter is generating a PRBS-31
pattern at 6.25 Gb/s. The peak-to-peak jitter is shown to be
22 ps, or 0.14 UI.
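The conversion from picoseconds to unit intervals quoted above is simple arithmetic: one UI is the bit period, so at 6.25 Gb/s it is 160 ps. A short check follows; the function name is hypothetical.

```python
def jitter_in_ui(jitter_ps, bit_rate_gbps):
    """Express jitter as a fraction of the unit interval (bit period)."""
    ui_ps = 1000.0 / bit_rate_gbps  # 1 / (Gb/s) in ns, times 1000 for ps
    return jitter_ps / ui_ps

print(jitter_in_ui(22.0, 6.25))  # 0.1375, i.e. about 0.14 UI
```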
BER testing over backplanes of various lengths was
performed. Backplanes made from FR4 material, available
from Tyco International, were used for these tests. These
Tyco “legacy” backplanes are intended to emulate the
existing 3.125-Gb/s XAUI systems deployed on the market
today, which were not originally designed for such high rates.
Three different lengths of backplane were used in these
tests: 5”, 20”, and 34”. The 34” backplane, together with
connectors and daughter cards, provides the most severe
attenuation out of the three and also contained several NEXT
and FEXT (near- and far-end crosstalk) aggressors. The 6.4-
Gb/s SERDES operated error-free over all three lengths. In
particular, over 200 hours of continuous error-free operation
was achieved on the 34” backplane, which translates into a
BER better than 1E-15. The part also demonstrated error-
free operation on numerous external 3.125-Gb/s backplanes.
Furthermore, precise digital internal margining techniques
have allowed the extrapolation of the BER beyond 1E-18,
which is not practical to measure in a lab setting. These
internal margining techniques can be used to depict the
effective post-sampler eye (a more accurate and stringent
figure of merit than RX eye since it includes sampler non-
idealities such as recovered clock jitter and receiver offsets).
This post-sampler eye is depicted in Fig. 7. The spot near the
middle indicates the sampling location. It can be seen that,
while the voltage offset compensation appears to be working
properly, a slight internal phase mismatch is present,
although its effect on the overall performance is very small.
This illustrates how this powerful feature can be used to
diagnose subtle circuit and system problems.
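The link between the 200-hour error-free run reported above and the BER claim of 1E-15 can be checked with the standard zero-error confidence bound, BER < -ln(1 - CL) / N for N error-free bits. This is a generic statistical formula rather than something stated in the text, and the 6.4-Gb/s rate and 95% confidence level used below are assumptions.

```python
import math

def ber_upper_bound(error_free_seconds, bit_rate_bps, confidence=0.95):
    """Upper bound on BER after observing zero errors, at the given confidence."""
    bits = error_free_seconds * bit_rate_bps
    return -math.log(1.0 - confidence) / bits

# 200 hours at an assumed 6.4 Gb/s is about 4.6e15 bits.
bound = ber_upper_bound(200 * 3600, 6.4e9)
print(bound)  # roughly 6.5e-16, below 1e-15
```

Demonstrating 1E-18 the same way would take on the order of 1000 times longer, which is why the text relies on internal margining to extrapolate beyond what can be measured directly.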
Fig. 5. Block diagram (top) and micrograph (bottom) of the 4.9-Gb/s to
6.4-Gb/s SERDES chip in 0.13µm CMOS
[Block diagram labels: JTAG Test Access Port, Management Interface, Clock Synthesizer (REFCLK, XTAL), Common Control Logic; 3G RX (RX3G0_P/N ... RX3G7_P/N), 3G Data & Clock Recovery, Tx FIFO/Mux, Parallel to Serial Conversion (PISO), 6G TX (TX6G0_P/N ... TX6G3_P/N); 6G RX (RX6G0_P/N ... RX6G3_P/N), 6G Data & Clock Recovery, Rx FIFO/Demux, Parallel to Serial Conversion (PISO), 3G TX (TX3G0_P/N ... TX3G7_P/N)]
Fig. 6. Measured Transmit Eye and Jitter at 6.25 Gb/s
Fig. 7. Internal post-sampler eye at 6.4 Gb/s over 34” legacy backplane.
VII. CONCLUSION
A 4.9- to 6.4-Gb/s SERDES using conventional NRZ coding
is implemented in a standard 0.13µm CMOS process. Using
an adaptive Decision Feedback Equalizer technique, the chip
is able to demonstrate a BER better than 1E-15 over a 34”
(86-cm) legacy FR4 backplane. Each 6.4-Gb/s link
consumes 285 mW. The product is compliant to the OIF
CEI-6G LR (long reach) standard.
ACKNOWLEDGMENT
The authors would like to acknowledge the hard work of
many additional PMC-Sierra staff, in particular Ognjen Katic
and Dr. Yuriy Greshishchev who made significant
contributions to make this device possible.
REFERENCES
[1] V. Balan et al., “A 4.8–6.4 Gbps Serial Link for Backplane
Applications using Decision Feedback Equalization,” Proc. IEEE
CICC, Oct. 2004, pp. 31-34.
[2] M. Sorna, T. Beukema, et al., “A 6.4Gb/s CMOS SerDes core with
Feedforward and Decision-Feedback Equalization,” ISSCC Dig. Tech.
Papers, Feb. 2005, pp. 62-63.
[3] P. Landman et al., “A Transmit Architecture with 4-Tap Feedforward
Equalization for 6.25/12.5Gb/s Serial Backplane Communications,”
ISSCC Dig. Tech. Papers, Feb. 2005, pp. 66-67.
[4] N. Krishnapura et al., “A 5Gb/s NRZ Transceiver with Adaptive
Equalization for Backplane Transmission,” ISSCC Dig. Tech. Papers,
Feb. 2005, pp. 60-61.
[5] K. Krishna et al., “A 0.6 to 9.6Gb/s Binary Backplane Transceiver
Core in 0.13µm CMOS,” ISSCC Dig. Tech. Papers, Feb. 2005, pp.
64-65.
[6] R. Payne et al., “A 6.25Gb/s Binary Adaptive DFE with First Post-
Cursor Tap Cancellation for Serial Backplane Communications,”
ISSCC Dig. Tech. Papers, Feb. 2005, pp. 68-69.
[7] S. Wu et al., “Design of a 6.25Gb/s Backplane SerDes with Top-
down Design Methodology,” DesignCon 2004, Feb. 2004.
[8] OIF CEI 02.0, “Common Electrical I/O (CEI) – Electrical and Jitter
Interoperability Agreements for 6G+ bps and 11G+ bps
I/O,” February 28, 2005.
[9] Sonntag et al., “An Adaptive PAM-4 5Gb/s backplane transceiver in
0.25µm CMOS,” Proc. IEEE CICC, 2002, pp. 363-366.
[10] H. Johnson, High-Speed Digital Design: A Handbook of Black
Magic, Prentice Hall, NJ, 1993.
[11] US patent application 10/725,183, “A Flexible Adaptation Engine
for Adaptive Transversal Filters”.