Guidelines for the performance of fusion procedures for degenerative disease of the lumbar spine

Introduction to the Lumbar Fusion Guidelines
As scientific understanding of the pathophysiology of
degenerative disease of the lumbar spine has increased, the
possibilities for correcting the underlying problem and the
resulting improvement in clinical function have expanded
exponentially. Fueled by advances in material technology
and surgical technique, treatment of greater numbers of
individuals suffering from lumbar spinal disease has prolif-
erated. Using data from the National Hospital Discharge
Survey, Deyo and colleagues4 described a 200% increase in
the frequency of lumbar fusion procedures in the 1980s.
Davis3 observed that the age-adjusted rates of hospitalization
for lumbar surgery and lumbar fusion increased by more than 33%
and more than 60%, respectively, from 1979
to 1990. Lumbar fusion has been described as a treatment
of symptomatic degenerative disc disease, spinal stenosis,
spondylolisthesis, and degenerative scoliosis. Lumbar fu-
sion has been performed to treat acute and chronic low-
back pain, radiculopathy, and spinal instability.
As practitioners have become caught up in the excite-
ment of what can be accomplished, there are increasing
questions regarding what should be done and how. These
questions are being addressed in this current document,
Guidelines for the Performance of Fusion Procedures for
Degenerative Disease of the Lumbar Spine.
In January 2003, at the request of the leadership of the CNS,
the executive committee of the American Association of
Neurological Surgeons/CNS Joint Section on Disorders of the
Spine and Peripheral Nerves formed a group to perform an
evidence-based review of the literature on lumbar fusion
procedures for degenerative disease of the lumbar spine and to
formulate treatment recommendations based on this review. In March 2003, this
group was convened. Invitations were extended to approx-
imately 12 orthopedic and neurosurgical spine surgeons
active in the Joint Section or in the North American Spine
Society to ensure participation of nonneurosurgical spine
surgeons. The 50 recommendations that follow this intro-
duction represent the product of the work of the group,
with input from the Guidelines Committee of the American
Association of Neurological Surgeons/CNS and the Clinical
Guidelines Committee of the North American Spine Society.
The first few papers in this series deal with the meth-
odology of guideline formation and the assessment of
outcomes following lumbar fusion. The next series of
recommendations involves the diagnostic modalities helpful
for the pre- and postoperative evaluation of patients
considered candidates for or treated with lumbar fusion,
followed by recommendations dealing with specific patient
populations. Finally, several surgical adjuncts, including
pedicle screws, intraoperative monitoring, and bone graft
substitutes, are discussed, and recommendations are made
for their use.
J. Neurosurg: Spine / Volume 2 / June, 2005
J Neurosurg: Spine 2:637–638, 2005
Guidelines for the performance of fusion procedures
for degenerative disease of the lumbar spine.
Part 1: introduction and methodology
DANIEL K. RESNICK, M.D., TANVIR F. CHOUDHRI, M.D., ANDREW T. DAILEY, M.D.,
MICHAEL W. GROFF, M.D., LARRY KHOO, M.D., PAUL G. MATZ, M.D.,
PRAVEEN MUMMANENI, M.D., WILLIAM C. WATTERS III, M.D., JEFFREY WANG, M.D.,
BEVERLY C. WALTERS, M.D., M.P.H., AND MARK N. HADLEY, M.D.
Department of Neurosurgery, University of Wisconsin, Madison, Wisconsin; Department of
Neurosurgery, Mount Sinai Medical School, New York, New York; Department of Neurosurgery,
University of Washington, Seattle, Washington; Department of Neurosurgery, Indiana University,
Indianapolis, Indiana; Departments of Orthopedic Surgery and Neurosurgery, University of
California at Los Angeles, California; Department of Neurosurgery, University of Alabama at
Birmingham, Alabama; Department of Neurosurgery, Emory University, Atlanta, Georgia;
Bone and Joint Clinic of Houston, Texas; and Department of Neurosurgery, Brown University,
Providence, Rhode Island
KEY WORDS • fusion • lumbar spine • practice guidelines • treatment outcome
Abbreviation used in this paper: CNS = Congress of Neurological Surgeons.

Methodology
The development of practice parameters, guidelines, or
recommendations is an onerous and time-consuming process.
It consists of literature gathering (primarily through
computerized literature searches), evaluation and
classification of the quality of evidence provided by the literature,
interpretation of this evidence to draw meaningful conclusions,
and formulation of recommendations based on this
process. The process is meant to be clear, and the reader is
encouraged to read the entire document as opposed to the
recommendations alone.
Guideline development within the specialty of neurosurgery
has followed a rigorous process delineated early on
in the advent of specialty-specific guidelines.5 Following
recommendations proposed by other specialty societies,
the process used in neurosurgical guideline development
divides the types of literature into classes depending on the
scientific strength of the study design.6 Since the
publication of the ground-breaking and exemplary Guidelines
for the Management of Severe Head Injury,1,2 an effort has
been made to adhere to these strict criteria for practice
recommendations. The definitions of classes of evidence for
therapeutic effectiveness are as follows: Class I, evidence
from one or more well-designed, randomized controlled
clinical trials, including overviews of such trials; Class II,
evidence from one or more well-designed comparative cli-
nical studies, such as nonrandomized cohort studies, case-
control studies, and other comparable studies, including
less well-designed randomized controlled trials; and Class
III, evidence from case series, comparative studies with
historical controls, case reports, and expert opinion as well
as significantly flawed randomized controlled trials. For
diagnostic tests and clinical assessment, other study
designs are used, and therefore the classification systems are
slightly different, but they still result in Class I, II, and III
evidence. This is reviewed in detail elsewhere.6
Class I evidence is used to support treatment recom-
mendations of the strongest type, called practice standards,
reflecting a high degree of clinical certainty. Class II evi-
dence is used to support recommendations called guidelines,
reflecting a moderate degree of clinical certainty. Other
sources of information, including observational studies such
as case series and expert opinion, as well as fatally flawed
randomized controlled trials (Class III evidence), support
practice options reflecting unclear clinical certainty.
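The correspondence between class of evidence and strength of recommendation described above can be sketched in a few lines of Python. This is only an illustration: the names EvidenceClass, RECOMMENDATION, and strongest_recommendation are hypothetical and are not part of the guidelines, and the rule that the best available class determines the recommendation is a simplifying assumption. It presumes each study has already been assigned its final class, after any downgrading.

```python
from enum import IntEnum


class EvidenceClass(IntEnum):
    """Classes of evidence as defined in the text (lower value = stronger)."""
    I = 1    # well-designed randomized controlled trials
    II = 2   # well-designed comparative studies, weaker randomized trials
    III = 3  # case series, expert opinion, flawed randomized trials


# Recommendation type and degree of clinical certainty for each class.
RECOMMENDATION = {
    EvidenceClass.I: ("standard", "high"),
    EvidenceClass.II: ("guideline", "moderate"),
    EvidenceClass.III: ("option", "unclear"),
}


def strongest_recommendation(classes):
    """Return (recommendation, certainty) supported by the best class present.

    Assumption for illustration: every study in `classes` has already been
    graded, including any downgrade for methodological flaws.
    """
    if not classes:
        raise ValueError("at least one graded study is required")
    return RECOMMENDATION[min(classes)]


if __name__ == "__main__":
    # Hypothetical topic supported by one Class II and two Class III studies.
    graded = [EvidenceClass.III, EvidenceClass.II, EvidenceClass.III]
    print(strongest_recommendation(graded))
```

In practice the grading step itself is the hard part, as the next paragraph explains; the lookup is trivial once each study's class is settled.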
On the surface, this appears to be a fairly straightfor-
ward task, but within the process the most difficult aspect
is evaluating the quality of the evidence in each type.
Disappointingly, studies in which evidence should be con-
sidered Class I or II because of study type have to be
downgraded to a lower class of evidence due to method-
ological flaws that could cause false conclusions to be
drawn from the evidence. This is discussed extensively
within each topic, and all cited evidence is listed in outline
form in the evidentiary tables, so as to ensure transparen-
cy of the development process.
The group culled through literally thousands of refer-
ences to identify the most scientifically robust citations
available concerning each individual topic. Not every ref-
erence identified is cited. In general, if high-quality (Class
I or II) medical evidence was available on a particular
topic, poorer-quality evidence was only briefly summa-
rized and rarely included in the evidentiary tables. If no
high-quality evidence existed, or if there was significant
disagreement between similarly classified evidence sourc-
es, then the Class III and supporting medical evidence
were discussed in greater detail. If multiple reports were
available that provided similar information, a few were
chosen as illustrative examples.
A consistent finding during the exploration of many of
these topics was that many investigators reported studies in
which the designs were unsophisticated. The use of invalid
outcome measures, the lack of an appropriate power analy-
sis, and the failure to identify distinct patient populations for
study inhibited our ability to draw meaningful conclusions
from many reports. Specific examples are provided in the
text of each topic. Suggestions for future research are made
at the conclusion of each paper. We, as spine surgeons, must
improve the quality of our research practices to provide con-
vincing evidence that the therapies we strongly believe in
are safe, effective, and make economic sense.
During the development of these guidelines, the authors
often found that their preconceived ideas regarding the
proper treatment of patients with chronic low-back pain
were founded on poor-quality or controversial medical evi-
dence. Some recommendations have resulted in changes in
the authors’ practice patterns after every effort was made to
classify the evidence and to interpret the results of the vari-
ous studies in a scientifically rigorous fashion. Many rec-
ommendations are made at the lowest level, meaning that
definitive evidence is lacking to support the recommenda-
tion but that evidence exists at some level. Some readers
will undoubtedly disagree with one or more of our recom-
mendations or with the level of a given recommendation.
The justification for all of the recommendations is included
in the scientific foundation portion and the summary section
of each guideline. If the job has been done correctly, the rea-
soning behind the recommendation should be clear.
It is our hope, as well as that of the participating orga-
nizations, that these guidelines will help to elucidate the
current knowledge on the topic of lumbar fusion and will
stimulate the development of more rigorous scientific evi-
dence justifying or refining—or, if appropriate, eliminat-
ing—aspects of this form of treatment.
References
1. Bullock R, Chesnut RM, Clifton G, et al: Guidelines for the
management of severe head injury. Brain Trauma Foundation.
J Neurotrauma 13:639–734, 1996
2. Bullock R, Chesnut RM, Clifton G, et al: Guidelines for the
management of severe traumatic brain injury. J Neurotrauma
17:451–627, 2000
3. Davis H: Increasing rates of cervical and lumbar spine surgery
in the United States, 1979–1990. Spine 19:1117–1124, 1994
4. Deyo RA, Cherkin D, Conrad D, et al: Cost, controversy, crisis:
low back pain and the health of the public. Annu Rev Public
Health 12:141–156, 1991
5. Rosenberg J, Greenberg MK: Practice parameters: strategies for
survival into the nineties. Neurology 42:1110–1115, 1992
6. Walters BC: Clinical practice parameter development, in Bean
JR (ed): Neurosurgery in Transition. Baltimore: Williams &
Wilkins, 1998, pp 99–111
Manuscript received December 7, 2004.
Accepted in final form February 18, 2005.
Address reprint requests to: Daniel K. Resnick, M.D., Depart-
ment of Neurological Surgery, University of Wisconsin Medical
School, K4/834 Clinical Science Center, 600 Highland Avenue,
Madison, Wisconsin 53792. email: Resnick@neurosurg.wisc.edu.

Part 2: assessment of functional outcome
KEY WORDS • fusion • lumbar spine • practice guidelines • treatment outcome
Abbreviations used in this paper: CBSQ = Curtin Back
Screening Questionnaire; DPQ = Dallas Pain Questionnaire; DRI =
Disability Rating Index; FSQ = Functional Status Questionnaire;
JOA = Japanese Orthopaedic Association; LBPR = Low Back Pain
Rating; ODI = Oswestry Disability Index; QOL = quality of life;
QPDS = Quebec Pain Disability Scale; RMDQ = Roland–Morris
Disability Questionnaire; SF-36 = Short-Form–36; SIP = Sickness
Impact Profile; VAS = visual analog scale.

Recommendations
Standards. It is recommended that functional outcome
be measured in patients treated for low-back pain due to
degenerative disease of the lumbar spine by using reliable,
valid, and responsive scales. Examples of these scales in
the low-back pain population include the following: the
Spinal Stenosis Survey of Stucki, Waddell–Main Questionnaire,
RMDQ, DPQ, QPDS, SIP, Million Scale, LBPR
Scale, ODI, the Short Form–12, the JOA system, the
CBSQ, and the North American Spine Society Lumbar
Spine Outcome Assessment Instrument.
Guidelines. There is insufficient evidence to recommend
a guideline for assessment of functional outcome
following fusion for lumbar degenerative disease.
Options. Patient satisfaction scales are recommended
for use as outcome measures in retrospective case series,
where better alternatives are not available. Patient
satisfaction scales are not reliable for the assessment of
outcome following intervention for low-back pain.
Rationale
Lumbar spinal fusion is an increasingly common procedure
performed as an adjunct in the surgical management
of patients with degenerative lumbar disease and
instability. As the frequency and complexity of lumbar
fusion surgery increase, there is a tendency for costs and
complication rates to increase as well.20 With fewer hospital
resources available, the ability to assess objectively the
functional outcome following lumbar fusion and to correlate
patient outcome with the economic consequences of
treatment is important.
Various assessment tools are available for measuring
functional outcomes in patients who have undergone lumbar
fusion. These outcomes may vary widely in the same
population depending on whether subjective or objective
measures have been used.17 Examples of objective outcome
measures include physiological, anatomical, economic,
health-related QOL, and mortality measurements.10 Subjective
outcome measures may be classified into functional
questionnaires, global ratings (satisfaction), economic factors
(employment, disability, and cost), and physical factors
(activities).21 The purpose of this review was to identify
valid, reliable, and responsive measures of functional
outcomes after lumbar fusion for degenerative disease.
Search Criteria
A computerized search of the National Library of Medicine
database of the literature published between 1966
and 2003 was performed. A search using the subject heading
“lumbar fusion” yielded 3708 citations. The following
subject headings were combined: “lumbar fusion and out-
comes.” Approximately 204 citations were acquired. Only
citations in English were selected. A search of this set of
publications with the key words “functional outcome” and
“satisfaction” resulted in 107 matches. Alternative search-
es included each disability index by name. Titles and ab-
stracts of the articles were reviewed and clinical series
dealing with adult patients treated with lumbar fusion for
degenerative lumbar disease were selected for detailed an-
alysis. Additional references were culled from the ref-
erence lists of remaining articles. Among the articles
reviewed, 30 studies were included that dealt with lum-
bar fusion, functional outcomes, and satisfaction surveys.
Nineteen of these articles were studies in which the au-
thors examined the reliability of functional outcome mea-
sures. In another seven articles investigators examined the
utility of these functional outcome measures in the setting
of lumbar fusion. Two articles were overviews on func-
tional outcome and lumbar degenerative disease. All pa-
pers providing Class I medical evidence are summarized
in the evidentiary table (Table 1).
Scientific Foundation
Assessment of Functional Outcome
To assess outcome following treatment properly, a
functional instrument must fulfill three criteria.11,21 First, it
must be reliable.10,11 Repetition of the functional assessment
should be consistent within (internal reliability) and
between (external reliability) observers. If a functional
instrument contains multiple domains, each should correlate
with the final outcome (internal consistency). Second, a
functional instrument must be valid.21 It should measure the
property intended. For example, an instrument assessing
dysfunction due to leg pain would be expected to correlate
with a reduction in the ability to walk a given distance.
Finally, the instrument should be responsive.21 The instrument
should be able to detect differences in severity among
populations. If an instrument measures low-back pain and
this pain improves with physical therapy, the instrument
should reflect that improvement quantitatively. When
evaluating the utility of a functional tool, the initial assessment
should emphasize reliability. If a functional instrument
does not produce reliable results, its validity and
responsiveness are irrelevant.
In terms of grading the quality of outcomes instruments,
κ and α values are used. The κ value refers to the degree
of correlation of interrater observations (reliability). In
patient-based assessments, it indicates consistency in
response at a given time point. The α value, often calculated
using the Cronbach α test, reflects the degree to which
each domain of a multidomain outcome measure correlates
with the final result.7 For example, an assessment
tool for pain may contain physical, psychological, and social
domains. Each domain score should correlate with the
final score. For a study to provide Class I medical evidence
regarding functional outcomes, the outcomes tool
used must have a κ value greater than 0.8. Class II medical
evidence requires an outcomes tool to have a κ greater
than 0.6. Any outcome scale with a κ value less than 0.6
is considered to provide Class III medical evidence for the
assessment of outcomes following an intervention.18
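As a rough illustration of these two statistics, the sketch below computes Cronbach's α from item scores and applies the κ thresholds quoted above. The function names and the sample scores are hypothetical and do not come from any of the cited studies. The handling of a κ of exactly 0.6 or 0.8 is an assumption, because the text specifies only "greater than" and "less than".

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item (or multidomain) instrument.

    `items` is a list of k lists; each inner list holds one item's scores
    for the same n respondents, in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent total
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def evidence_class_from_kappa(kappa):
    """Class of evidence an outcome tool can support, per the text's thresholds.

    Assumption: values of exactly 0.8 or 0.6 fall into the lower class.
    """
    if kappa > 0.8:
        return "I"
    if kappa > 0.6:
        return "II"
    return "III"


if __name__ == "__main__":
    # Hypothetical scores: three domains, five respondents each.
    physical = [4, 3, 5, 2, 4]
    psychological = [3, 3, 4, 2, 5]
    social = [4, 2, 5, 1, 4]
    print(round(cronbach_alpha([physical, psychological, social]), 2))
    print(evidence_class_from_kappa(0.95))
```

The α computed here uses sample variances; an instrument whose domains all move together yields a value near 1, whereas unrelated domains drive it toward 0.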
Roland and Morris30,31 followed 230 patients of whom
193 were studied up to 4 weeks after their initial presentation.
Functional disability was assessed using a 24-item
disability questionnaire (the RMDQ) with statements derived
from the SIP and relating to the lower back. Reliability
was ascertained in 20 patients with an external
reliability greater than 0.91. Internal consistency appeared
to be greater than 0.8. Validity was confirmed after comparisons
to a six-point pain rating scale and physical signs
ascertained by an examining physician.31 In this group,
60% of patients appeared to improve over the 4-week
period, whereas 20% worsened. Absence from work appeared
to correlate less well with disability, as only 8%
of the employed were unable to work.30 Using the ODI,
Fairbank and colleagues12 followed 25 patients with acute
low-back pain in whom a reasonable prognosis was expected.
The questionnaire has 10 categories with six gradations
each (graded 0–5), for a maximum total score of 50. It was
completed at weekly intervals over a period of 3 weeks. Reliability
(κ > 0.95) was confirmed in 22 patients who repeated the
questionnaire over 2 days. Validity was demonstrated as
patients improved over 3 weeks. Paired t-tests revealed a
significant improvement in ODI scores during this time
period (p < 0.005).
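The scoring arithmetic and the paired comparison described in this paragraph can be illustrated as follows. The patient scores below are invented for the example and are not data from Fairbank and colleagues; the function names are likewise hypothetical. Only the t statistic is computed here. Converting it to a p value requires a t distribution with n − 1 degrees of freedom, which is left out of this sketch.

```python
from math import sqrt
from statistics import mean, stdev


def odi_total(responses):
    """Sum the 10 ODI categories, each graded 0-5, for a total out of 50."""
    if len(responses) != 10:
        raise ValueError("the ODI has 10 categories")
    if any(r not in range(6) for r in responses):
        raise ValueError("each category is graded 0 to 5")
    return sum(responses)


def paired_t_statistic(before, after):
    """Paired t statistic for the mean within-patient change (before - after)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [b - a for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


if __name__ == "__main__":
    # Hypothetical ODI totals for four patients at presentation and at 3 weeks.
    week0 = [30, 28, 35, 40]
    week3 = [20, 20, 25, 28]
    print(odi_total([3, 2, 4, 3, 2, 3, 4, 3, 3, 3]))
    print(round(paired_t_statistic(week0, week3), 2))
```

A paired test is the natural choice in this design because each patient serves as his or her own control across the weekly assessments.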
Leclaire and colleagues24 observed patients who presented
with acute low-back pain alone (100 cases) or
accompanied by radiculopathy (100 cases). The cohort was
followed using the RMDQ and ODI questionnaires. In the
radiculopathy group, ODI and RMDQ scores were significantly
more severe (higher) than in the low-back pain–alone
group (p < 0.0001). The two scales had a moderate
correlation to each other in each subgroup (r = 0.72
[radiculopathy]; r = 0.66 [lumbago]; p < 0.0001). In a cohort
of patients with low-back pain, the JOA score was used as
a psychometric measure. External reliability was strong
(κ > 0.90) when 15 patients reassessed their status with
no change in their symptomatology. Interobserver external
reliability among physicians was also sound (κ > 0.90) in
30 patients reassessed using the JOA. Validity was established
by a strong correlation to the RMDQ, ODI, and the
SF-36.15 In several different groups with lumbar degenerative
disease, the North American Spine Society Lumbar
Spine Outcome Assessment tool was used to assess
patients who had undergone conservative or decompressive
therapy.8 In this study, 136 of 206 questionnaires were
successfully completed. External reliability was assessed
in 64 patients. Both internal and external reliability were
strong (κ > 0.90). The test was determined to be a valid
measure compared with existing instruments.
The SIP is a traditional general functional outcome
measure, with 136 items in 12 categories, that has been
evaluated in the general populace for a variety of condi-
tions. It has been applied to patients with low-back pain
and degenerative lumbar disease. Bergner, et al.,1
exam-
ined the use of this general health instrument in 1108 pa-
tients with multiple medical problems including rheuma-
toid arthritis and hip osteoarthritis.1
Simultaneous with
this questionnaire were a clinician’s assessment of physi-
cal function and patients’ self-assessment of the severity
of sickness and dysfunction. In this setting, the test–retest
(external) reliability of SIP was greater than 0.90, and its

Functional Outcome
641
TABLE1
EvidentiarytablesummarizingpublishedstudiesinvolvingClassImedicaldata*
Authors&YearClassDescriptionResultsConclusions
Fairbank,etal.,1980I25patientsw/acuteLBP&reasonableprognosiswereTest–retestreliabilitywas␬Ͼ0.95(pϽ0.001)inTheODIisareliable&validmeasureindetecting
studiedatwklyintervalsfor3wksw/afunctional22patients.Overthe3-wkinterval,significantim-changesintheLBP&itsfunctionalseverity.
disabilitysurvey.TheODIhas10categorieseachprovementwasnotedclinically&wasdetected
w/6responsesgraded0–5.Atotalof50pointsareusingtheODI.Apairedt-testrevealedasignificant
possible.improvementontheODIover3wks(pϽ0.05).
Bergner,etal.,1981I1108patientsinageneralpopulacew/multipleprob-Externalreliabilityw/in&btwnobserverswasSIPmeasuresindependentfunction,physicalwellness,
lemsincludingRA&hiposteoarthritis.Patients␬Ͼ0.90.Internalconsistencywas␣Ͼ0.90.&psychosocialwellness.Itisreliable&valid.Rea-
wereevaluatedusingtheSIP.AssessmentwasdoneSelf-assessmentofsickness&dysfunctionhadsonablemeasurestouseforoutcomeareSIP&self-
byaclinicianforphysicalmeasures.Self-assess-areliabilityof␬Ͼ0.60.TheSIPappearedtocor-assessmentofsickness&dysfunction.
mentwascompletedforseverityofsickness&relatew/theself-assessmentofsickness&dys-
dysfunction.function(correlationϾ0.50).
Million,etal.,1982I19patientsw/chronicLBP.TheirfunctionaldisabilityExternalreliabilitywasstrongbtwn&w/inobserversTheMillionScaleisareliableindicatoroftheseverity
wasstudiedusingtheMillionScalewhichwasa␬Ͼ0.90.Asavaliditymeasure,theMillionoflumbago&isresponsiveintheearlyphaseof
VASexamining15subjectivevariablesreflectingScaleappearedtoreflectchangesinphysicalmea-treatment.Itsresponsivenessappearsbetterthanthat
theseverityoflumbago.Asoftcorsetw/&w/osurements.At4&8wksafterrigidbracing,pa-ofobjectivemeasurementsincludinglumbarmotion
supportwasusedtotesttheresponsivenessofthetientsimprovedclinically,&thisresponsiveness&straightlegraising.
MillionScale.wasdetectedbytheMillionScale(pϽ0.05at
4wks&pϽ0.01at8wks).
Roland&Morris,1983I230patientsw/acutelumbago;193werestudiedat0,Externalreliabilitywas␬Ͼ0.90&internalconsist-TheRMDQisreliableforassessmentofacuteLBP.
1,&4wksaftertheepisode.Test–testreliabilityency␣Ͼ0.80.Constructvaliditydemonstrated
wasdoneon20/230patients.TheconstructvaliditythattheRoland–Morrisquestionnairewasableto
wasqualitativelyassessedbycomparingthisfunc-detectqualitativelypatientsw/pooreroutcomes
tionalquestionnairetothepainratingscale.fromacutelumbago;however,nospecificanaly-
siswasdone.
Roland&Morris,1983I230patientsw/acutelumbagowhowerestudiedat0,Ͼ60%ofpatientshadimprovementoverthe4-wkNospecificstatisticstestedthecorrelationinthisstudy.
1,&4wks.Thedisabilityquestionnairewasad-period,whereas20%hadanincreaseindisability.TheRMDQisreliablebutthismanuscriptdidnot
ministered&completedatalltimeintervalsin193Thesechangesappearedtobereflectedinthedis-assessitsresponsivenesstoastandardmeasureinsta-
patients.Correlationwasqualitativelydonew/abilityquestionnaire.Absencefromworkappear-tisticalfashion.
back-to-workstatus.edtocorrelatelesswellasonly8%ofemployed
wereunabletowork4wksafteracutelumbago.
Waddell&Main,1984I160patientsw/12wksoflumbago(chronic)w/se-DisabilityasdeterminedbyfunctionaloutcomeonWaddellScaledescribesfunctionaldisabilityw/chron-
veritystudiedbya9-categorydisabilityindex&questionnairehadareliabilityϾ0.80&correlatedicLBP.All9scalescorrelatew/finalscore(content
physicalcharacteristics.Reliabilitydeterminedus-w/theODI(r=0.70).Forphysicalcharacteristicsvalidity)&thescaleisreliable.Italsohasconstruct
ingasubgroupof30patients.(lumbarflexion,straightlegraising,rootcompres-validityasitcorrelatesw/ODI.
sionsigns)reliabilitywasϾ0.90.
Deyo,1986I136patientswhowereexaminedinaclinicforachiefReliabilityforbothscaleswas␬Ͼ0.80inpatientsTheSIP&themodifiedRMDQ(shorter)arereliable
complaintoflumbago.Evaluationwasdoneusing(10)whohadnochangeinpain.Forpatientswhoscalesfortheassessmentoflumbago,whichseemto
SIP&themodifiedRMDQScale(shortenedver-didnotresumefullactivity(47),thereliabilityfollowthephysicaldimensionoffunctionaldisability.
sionofSIP)initially&3wkslater.was␣Ͼ0.60.AstrongcorrelationexistedbtwnThemodifiedRMDQislesswellsuitedtofollowthe
thescales(r=0.85)&betweenthephysicaldi-psychosocialdimensionoffunctionaldisability.
mensionoftheSIP&themodifiedRMDQ(r=
0.89).ThemodifiedRMDQcorrelatedlesswell
w/thepsychosocialdimensionoftheSIP(r=0.56).
Lawlis,etal.,1989I143patientsoverall(24normal,15chroniclumbagoExternalreliabilitywas␬Ͼ0.90.ConstructvalidityTheDPQisareliabletestinassessingchronicLBP&
butworking,104chroniclumbagoundergoingin-wasshownbycorrelationofthe1st2categoriesappearsresponsiveindefiningdifferencesbtwn
patienttherapy).FunctionalassessmentperformedofDPQw/functionalcapacityscoresrelatingtopatientsw/chroniclumbago&thosew/o.
usingtheDPQwhichassessesdailyactivities,workthephysicaldemandsofwork.Responsivenesswas
&leisureactivities,anxiety/depression,&socialassessedbycomparingDPQscoresinthe104
interest.Reliabilitytestedon15chronicpainpatientschroniclumbagopatientstothe24normalpatients.
&13normalpatients.DPQscoresweresignificantlyhigherintheformer.
continued

TABLE1Continued
Manniche,etal.,1994I58patientswhounderwentlumbardiscopweresur-TheLBPRscalecomprised60pointsforpain,30forTheLBPRScalecombineselementsofphysicalfunc-
veyed14–60mospostop.Theassessmentwasanleveloffunction,&40forphysicalimpairment.tion,painintensity,&overalldisability.Itisareliable
LBPRscalethatexaminedphysicalimpairment,dis-Interraterreliabilitywas␬Ͼ0.95.Usingcontin-indicatorofdysfunction&appearsvalidcomparedw/
ability,&painintensity.Comparisonwasdonegencytables,thescalecorrelatedwiththedoctor’sobjectivemeasures(doctor’sassessment)&subjec-
againstadoctor’sglobalassessment&apatient’sassessmentandpatient’sassessment(pϽ0.00005).tive/satisfactionmeasures(patient’sassessment).
globalassessment.
Ruta,etal.,1994I354patientsw/lumbagoinitiallyexaminedinclinic&183patientshadnoclinicalchanges&underwentThisLBPscaleisareliable&validindicatorofthe
surveyedshortlythereaftertoassessfunctionaldis-externalreliabilitytesting(␬Ͼ0.90).Thequestion-functionaldisabilityrelatingtolumbago.Nousage
ability.273patientswereretestedforreliabilityofnairecorrelatedwellw/all8domainsoftheSF-36describedinthesettingoflumbarfusion.Noacuity
whom183reportednochangeinclinicalseverity.usinglinearregression(pϽ0.001)&w/percep-givenforthelumbago.
CorrelationtotheSF-36generalhealthprofilewastionsofdiseaseseverity.
doneforconstructvalidity.
Salen,etal.,1994I1445patientsweredividedinto3groups:1092vol-ExternalreliabilityfortheDRIwas␬Ͼ0.80.ThereTheDRIisareliable,valid,&responsivemeasurein
unteercontrols,306w/axialskeletalpain,&47w/wasacorrelationtotheFSQ.TheDRIwasres-patientsw/axialskeletalpain.
jointpain.PatientswereevaluatedusingtheDRIponsiveindetectingimprovementafterjointre-
&anFSQ.placement.
Harper,etal.,1995I150patientsweredividedinto3groups(GroupI:Externalreliabilityinall3groupswas␬Ͼ0.90.In-TheCBSQisareliable&validmeasurefordetermining
chroniclumbagoϾ4wks/disabled;GroupII:acuteternalreliabilitywas␣Ͼ0.80.TherewasastrongthefunctionaldisabilityassociatedwithLBP.No
lumbago/working;GroupIII:normal).EvaluationofcorrelationbtwneachcategoryinCBSQ&itssim-testingofresponsivenesswasundertaken.
functionaldisabilitywasdoneusingtheCBSQ&ilarcategoryintheSIP(r=0.56–0.72).Finally,
theSIP.TheCBSQtests11categoriesoffunctionalCBSQscoresappearedresponsivew/higherscores
disability.Test–retestcorrelation&correlationbtwninthemoreseverelyaffectedgroups.
tCBSQandSIPwasdoneusingthePearsoncor-
relationtest.
Kopec,etal.,1995I242patientswithahistoryoflumbagoinQuebec.80%Externalreliabilitywas␬Ͼ0.90w/internalconsis-TheQPDSissuitableforthereliablefunctionalmea-
hadpriorlumbagow/29%receivingcompensation.tencyof␣Ͼ0.90.ConstructvaliditywasshownsurementofLBP.
Patientswereassessedforfunctionaldisabilityus-byastrongcorrelationinthisfunctionalindexw/
ingtheQPDS.Reliabilitywasexaminedina98-theODI(r=0.80),RMDQ(r=0.77),&SF-36
patientsamplew/in1–14daysafterinitialsurvey.(r=0.72).
Constructvaliditywasdonebycomparingresults
tofunctionalscalesofODI,RMDQ,&SF-36.
Daltroy,etal.,1996I206patientsin6orthopedicpracticeswereevaluated.External&internalreliabilitywerestrong(␬Ͼ0.90)TheNASSLSOAisavalid&reliableoutcomemeasure
Patientswereinseveralcategoriesincludingthosewhenassessedin64patients.Themeasurewasforfunctionalevaluationofthelumbarspine.
w/LBP&sciatica.Alsoincludedwerepatientswhovalidcomparedw/knowninstruments.
underwentlumbardecompressionbutnotfusion.
Stucki,etal.,1996I193patientsw/lumbardegenerativestenosisundergo-23/193studiedforreliabilityw/␬Ͼ0.80.InternalThisoutcomequestionnairewasreliableinlumbarsten-
ingdecompression.Prospectivemulticenterstudyofconsistency␣Ͼ0.80;130/193studiedforrespon-osispatientswhounderwentop&hadconstructvalid-
self-administeredoutcomemeasureassessedw/insiveness.Responsive&validover6mostodetectitycomparedw/establishedscale&wasresponsive
6mos.Likertresponsescalesusedindomainsofimprovementpostop.indetectionofdifferencesw/in6mosforfunctional
physicaldysfunction,symptomseverity,&satisfac-improvement.
tion.Resultscomparedw/SIP&VAS.
Fujiwara,etal.,2003I97patientsobservedclinicallyw/LBP&followedTest–retestreliabilitywas␬Ͼ0.90whenpatientsTheJOAisareliable&validindicatorofLBP.
usingJOA,ODI,andRMDQ.Correlationwascal-(15)orphysicians(30)didrepeatmeasurements.
culatedbtwnthesemeasures&externalreliabilityStrongcorrelationwasobservedbtwnJOA&
wasassessedbyrepeatedphysician&patientob-ODI&RMDQ.
servation.
Luo,etal.,2003I2520patientsw/LBP;506patientsassessedover3–6ExternalreliabilityoftheSF-12wasperformedbyTheSF-12iscapableofassessing&followingLBP
mos.SF-12surveywasused&comparedw/subjec-Ware,etal.,inadifferentpatientgroup;however,reliably.
tivequantificationofLBPintensity.internalreliability&responsivenesswasfoundin
thisstudy.
*LBP=low-backpain;NASSLSOA=NorthAmericanSpineSocietyLumbarSpineOutcomeAssessment;RA=rheumatoidarthritis.

Functional Outcome
643
internal consistency was greater than 0.90. Self-assess-
ment of sickness and dysfunction had a reliability greater
than 0.60. The SIP appeared to correlate (Ͼ 0.50) with the
self-assessment of sickness and dysfunction. Deyo9
used
the SIP and a modified RMDQ when evaluating 136 pa-
tients with a chief complaint of low-back pain at an initial
index visit and 3 weeks later. Reliability was examined in
10 patients who claimed no interval improvement in pain
and in 47 patients who did not resume full activity. For
patients with no change in pain, the correlation was great-
er than 0.80. In those patients who may have improved but
did not resume normal activity, reliability was greater than
0.60. A strong correlation was observed between the SIP
and the modified RMDQ (r = 0.85). The physical dimen-
sion of the SIP (r = 0.89) correlated more strongly with the
RMDQ than the psychosocial dimension (r = 0.56). The
SIP appears to be a reliable and valid measure of the se-
verity of low-back pain in the acute phase.
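The reliability figures quoted in this review fall into two families: internal consistency, usually reported as Cronbach's coefficient alpha, and test–retest (external) reliability, reported as a correlation between repeated administrations. The sketch below shows one way both could be computed. It assumes that the cited studies used Cronbach's alpha and a Pearson correlation, which is not stated for every instrument; the function names and the input layout are hypothetical, and no data from any cited study are reproduced.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(scores):
    """Internal consistency of a questionnaire (Cronbach's alpha).

    scores: one row per respondent, each row holding that respondent's
    score on every item (hypothetical input layout).
    """
    k = len(scores[0])                     # number of items
    item_columns = list(zip(*scores))      # transpose to per-item columns
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def pearson_r(first, second):
    """Correlation between two administrations of the same instrument."""
    mean_1, mean_2 = mean(first), mean(second)
    covariance = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    spread_1 = sqrt(sum((a - mean_1) ** 2 for a in first))
    spread_2 = sqrt(sum((b - mean_2) ** 2 for b in second))
    return covariance / (spread_1 * spread_2)
```

Under these assumptions, an instrument whose items all move together yields an alpha near 1, and a group of patients with no interval change in symptoms should yield a test–retest correlation near 1. The thresholds of 0.60, 0.80, and 0.90 cited in this review are read against that scale.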
Million and colleagues²⁷ assessed 19 patients with chronic
low-back pain by using a VAS examining 15 subjective
variables reflecting its severity. External reliability among
and within observers was greater than 0.90. To determine
validity, they compared their results with physical mea-
surements of spinal movements and straight leg raising.
These objective assessments had a reliability greater than
0.90 and correlated with the Million Scale. After bracing
with a rigid support, low-back pain improved clinically
and this responsiveness was detected by the Million Scale.
The Waddell–Main Disability Index was used to evaluate
chronic low-back pain (duration > 12 weeks) in a 160-patient
cohort.³⁷ Reliability in this study was evaluated in a
random subgroup of 30 patients. Measures were also ob-
tained of objective physical characteristics including lum-
bar flexion, straight leg raising, and root compression signs.
The external reliability on the Waddell–Main Disability In-
dex was greater than 0.80, and its validity was established
by a strong correlation with the ODI (r = 0.70). The physi-
cal characteristics, when evaluated for objective reliability,
had a correlation greater than 0.80.
Using the DPQ, Lawlis and colleagues²³ studied 143 patients
of whom 119 had chronic low-back pain. Fifteen
patients in this group were working, whereas the remain-
ing 104 were undergoing inpatient therapy. Twenty-four
healthy volunteers served as controls. The DPQ was used
to assess daily activities, work/leisure activities, anxiety/
depression, and social interest. Reliability was tested in 15
patients with chronic back pain and 13 controls. External
reliability was greater than 0.90. Construct validity was
shown through a positive correlation to other assessments
of functional capacity relating to the physical demands of
work. The DPQ was responsive to differences between
patients with chronic low-back pain and controls.
Ruta, et al.,³² devised an outcome measure based on
questions commonly used in the clinical assessment of
patients with low-back pain. A total of 354 patients with
low-back pain seen by primary and specialty practitioners
were studied. Within this group, 273 patients were tested
for reliability. One hundred eighty-three reported no clin-
ical changes over a 2-week interval. External reliability
was tested in these 183 patients with correlations greater
than 0.90. Validity was demonstrated by a strong correlation
(p < 0.001 on regression) with the SF-36 general
health assessment. Harper and colleagues¹⁹ examined 150
patients in three subgroups (chronic low-back pain [50
cases], acute lumbago [49 cases], and control [51 cases]).
They employed the CBSQ, which evaluated 11 categories
of functional disability and compared results with those of
the SIP. External reliability for the CBSQ was greater than
0.90, with internal reliability greater than 0.80. A strong
correlation was observed between each category in the
CBSQ and its similar category in the SIP (r = 0.56–0.72), and
the CBSQ appeared responsive in distinguishing the sever-
ity of dysfunction among the three groups of patients.
Several other groups undertook studies on the functional
assessment of chronic low-back pain. Using the QPDS,
Kopec, et al.,²² analyzed 242 patients with a history of
chronic low-back pain. Twenty-nine percent of this group
were disabled and receiving compensation. This scale con-
tains 48 items assessing the difficulty in simple daily activ-
ities pertaining to domains relevant to low-back pain. Re-
liability was gauged using a random sample (98 cases) who
were retested after 14 days. External reliability was greater
than 0.90, with an internal consistency coefficient greater
than 0.90. Construct validity was determined by a strong
correlation with the ODI (r = 0.80), RMDQ (r = 0.77), and
SF-36 (r = 0.72) Scales. Using the LBPR scale, Manniche
and colleagues²⁶ surveyed 58 patients 14 to 60 months
after they underwent lumbar disc surgery. This scale com-
prises 60 points for back and leg pain, 30 points for level
of function, and 40 points for physical impairment. Exter-
nal reliability had a coefficient greater than 0.95. Validity
was determined by dichotomizing the scale into good and
bad outcomes. The mean score of the study population was
39, and therefore a value greater than 39 implied greater
dysfunction than the mean. The results on the LBPR Scale
correlated (p < 0.00005) with a Global Assessment Scale
(a graded evaluation tool) performed by both patient and
physician.
Stucki, et al.,³⁵ evaluated 193 patients with degenerative
lumbar stenosis from multiple centers who were to under-
go lumbar decompression. A functional survey was under-
taken preoperatively and 6 months after surgery. Interob-
server reliability was studied in a random sample of 23
patients. Correlation (κ) was greater than 0.80 in this group.
Internal consistency was greater than 0.80. This lumbar
outcome scale was responsive to functional improvement
in this cohort of patients when reassessed 6 months follow-
ing surgery. Comparison to the SIP and the VAS for pain
confirmed the validity of this instrument in detecting over-
all dysfunction associated with lumbar stenosis.
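Interobserver agreement on a categorical rating is commonly summarized with Cohen's kappa, which discounts the agreement expected by chance alone. The following is a minimal sketch under the assumption that the κ values quoted in this review denote Cohen's kappa for two raters; the cited studies may have used a different coefficient. The function name and the rating labels in the usage note are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same cases."""
    n = len(rater_a)
    # Proportion of cases on which the two raters gave the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Undefined when expected == 1 (both raters used a single category).
    return (observed - expected) / (1 - expected)
```

For example, two raters who each label half of the cases "good" and half "bad" but agree only as often as chance predicts would obtain a kappa of 0, whereas identical ratings would obtain 1.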
Bernstein and colleagues² followed 291 patients with
chronic low-back pain by using the 90-item Symptom
Checklist, which measures psychological dysfunction. It
has nine major scales with one common factor—general
psychological discomfort. The somatization scale covers
general physical discomfort. The reliability of this check-
list was not reported in this study, but validity was ascer-
tained by comparison with the Minnesota Multiphasic In-
ventory and the McGill Pain Inventory. In this group of
patients, the scale had a high correlation with the Minne-
sota Multiphasic Inventory and McGill Inventory scales
for detecting general discomfort; however, external relia-
bility was not reported. In a 5-year period, Greenough and
Fraser¹⁶ studied 300 patients with low-back pain by using
a Low-Back Outcome Score that examined 13 functional
factors related to pain. Comparison was made to the ODI
and Waddell–Main scales. Despite a statement that external
reliability was studied, no mention was made of the
statistical analysis in their study. This scale had a high correlation
with the ODI (−0.87; p < 0.001) and Waddell–Main
scale (−0.74; p < 0.001). Moffroid and colleagues²⁸
assessed 115 patients referred for low-back pain and
undergoing physical therapy; 112 asymptomatic volunteers were
used as a control group. The physical capabilities of both
groups were quantified using the National Institute for Oc-
cupational Safety and Health Low Back Atlas score. Al-
though external reliability was described, it was not spe-
cifically reported in this study. The authors did find clusters
of patients with imbalances in muscle strength and symme-
try. Those patients were more apt to suffer from low-back
pain.
General health may be measured in addition to low-
back pain. In addition to the use of the SIP as a general
health measure, Brazier and colleagues³ studied the SF-36
Scale in 1582 patients in a general medical practice. The
SF-36 Scale focuses on functional status, general well-
ness, and an overall assessment of health in eight domains
by asking 36 questions. Results were compared for valid-
ity with the Nottingham Scale. In the general population,
the external reliability coefficient was greater than 0.60.
Construct validity was determined through a correlation
with the Nottingham Scale (r > 0.50). Ware and colleagues³⁸
used regression methods to shorten the SF-36 to
a 12-item format (SF-12) focusing on physical and mental
aspects. Reliability in an initial evaluation of two different
sets of patients was strong (κ > 0.80). Luo and colleagues²⁵
used the SF-12 in 2520 patients with low-back
pain. Although no external reliability was performed in
this setting, internal consistency was sound, and the SF-12
appeared valid and responsive to changes in patients with
low-back pain.
Salen, et al.,³³ assessed 1092 healthy volunteers and
compared the observations with those in 306 patients with axial
skeletal pain or 47 with joint pain by using a DRI. External
reliability for this group was greater than 0.80. The DRI was
valid with correlation to the FSQ. The DRI was respon-
sive in detecting improvement after joint replacement.
Examples of the Application of Functional Assessments to
Lumbar Fusion
The appropriateness of an outcome instrument designed
to assess low-back pain does not necessarily generalize
to the assessment of patients treated with lumbar spinal
fusion procedures. Despite this fact, these same outcome
measures have been used to assess outcome following
lumbar fusion procedures. In an attempt to correct this
apparent deficiency, many investigators have used multi-
ple outcome instruments for correlation.
Several groups have used more formalized methods of
assessing patient outcome. Moller and Hedlund²⁹ studied
111 patients with isthmic spondylolisthesis and a 1-year
history of back or leg pain. Patients were randomized to
surgery (80 cases) or exercise (34 cases). Evaluation was
completed at 1 and 2 years by using the DRI and a patient
assessment survey involving broad categories (much bet-
ter, better, unchanged, or worse). In this patient popula-
tion, the DRI appeared responsive with improvement in
the surgical group at 12 and 24 months (p < 0.0001,
Mann–Whitney U-test). Similarly, the broad patient assess-
ment survey revealed that a higher proportion of “good”
responses occurred in the surgery group (p < 0.01). In a
similar cohort study, Christensen and colleagues⁶ followed
129 patients with chronic low-back pain and either isthmic
spondylolisthesis, primary lumbar degeneration, or second-
ary lumbar degeneration. Comparison was made between
posterior fusion with and without instrumentation by using
the DPQ and LBPR Scale in a 5-year period. Patients in
both groups improved significantly from their preopera-
tive status on the DPQ during this period. With the excep-
tion of patients with isthmic spondylolisthesis, no differ-
ences were observed between groups when using the DPQ
or LBPR Scale. For patients with isthmic spondylolisthe-
sis, fusion without instrumentation resulted in significant-
ly better results as measured by the DPQ.
In a different cohort study, Fritzell and colleagues¹⁴ studied
294 patients with L4–S1 disc degeneration and
low-back pain who underwent surgical (222) or expectant
(72) management during a 6-year period. Evaluation was
completed at 6, 12, and 24 months by using the ODI,
Million, and General Function Score Scales. Disability
significantly decreased in the surgical group over a 2-year
period when assessed using all of these scales (p < 0.02).
Using a general, subjective assessment, 63% in the surgical
group indicated they were better or much better compared
with 29% in the nonsurgical group (p < 0.0001).
Burkus, et al.,⁴ reported on 46 patients randomized to
anterior interbody fusion with or without bone morpho-
genetic protein–2. Outcome was recorded over a 24-month
period by using the ODI, SF-36, and satisfaction scales.
Neurological function, satisfaction, and general health
measures were no different between groups. The ODI
score indicated an improvement in the bone morphogen-
etic protein–2 group as early as 3 months after surgery.
These outcome measures were responsive to low-back
pain after lumbar fusion, and the use of multiple outcome
measures conferred apparent validity.
Other Outcome Measures
Turner and colleagues³⁶ undertook a metaanalysis of all
lumbar fusion Medline literature published between 1966
and 1991. Studies were required to have a minimum 1-
year follow-up period and classification of clinical out-
come as satisfactory or unsatisfactory in at least 30 patients.
Forty-seven articles met their inclusion criteria. No ran-
domized trials were identified at that time. A mean of 68%
of the patients had a satisfactory outcome (range 16–95%).
Substratification revealed outcomes of excellent/good in
66% (range 16–93%), fair in 22% (range 5–68%), and poor
in 13% (range 2–54%). No defined criteria were reported
for external reliability. Their analysis demonstrates that
outcomes may be dichotomized into broad categories to
assess overall outcome following lumbar fusion.
Patient satisfaction has been used as an outcome mea-
sure for patients undergoing lumbar fusion. Patient satis-
faction surveys are frequently used in the setting of retro-
spective series because preintervention data may not be
available. Patient satisfaction is easily surveyed but is
dependent on multiple external factors independent of the
surgical procedure. Furthermore, satisfaction outcome
measures are hampered by the inherent inability to measure
responsiveness. The validity of satisfaction measures has
been examined but their external reliability has not.
Slosar and colleagues³⁴ followed 141 patients who underwent
circumferential lumbar fusion. A satisfaction survey
was used as a follow-up instrument, as was return to
employment. Patients were asked if: 1) surgery met their
expectations; 2) surgery improved their condition; 3) sur-
gery improved their condition but they would not repeat it;
and 4) surgery worsened their condition. One hundred thir-
ty-three patients were followed for more than 37 months.
The outcomes were classified as follows: 10.5% in Cate-
gory 1, 51.1% in Category 2, 19.5% in Category 3, and
18.8% in Category 4. Christensen and colleagues⁵ followed
148 patients who underwent posterior lumbar fusion with
or without supplemental anterior interbody fusion. Satis-
faction surveys and the DPQ and LBPR Scale were used.
In addition to improvements on the LBPR Scale and DPQ,
satisfaction was high in both groups, with 77% of patients
in the posterior fusion group and 79% of patients in the cir-
cumferential fusion group stating they would undergo sur-
gery again if indicated.
In a study of 388 Workers’ Compensation patients in
Washington state, Franklin and colleagues¹³ undertook an
assessment of broad satisfaction surveys. Simple surveys
examined back and leg pain, QOL, and the decision to un-
dergo surgery at 2 years following lumbar fusion. Patients
were dichotomized into two outcome groups: poor (re-
ceiving Workers’ Compensation) and good (not receiving
Workers’ Compensation) at 2 years. There was a higher in-
cidence of poor outcomes among those who stated that
back or leg pain was worse than expected (76% compared
with 54%; p < 0.0003) and in those whose QOL was no
better or worse than expected (69% compared with 34%;
p < 0.0001). There was a lower incidence of poor outcomes
in patients who would undergo surgery again for the
same indications (52% compared with 80%; p < 0.0001).
Although patient satisfaction surveys are easy and are
intuitively valuable, they have never been validated and
the responsiveness of such measures cannot be measured.
Furthermore, wide discrepancies exist when results of
patient satisfaction surveys are compared with validated
outcome measures. These inadequacies limit their ability
to provide high-quality medical evidence for or against
any treatment modality.
Summary
Functional disability secondary to acute low-back pain,
chronic low-back pain, lumbar stenosis, and lumbar disc
disease may be assessed using functional
outcome surveys that are valid, reliable, and responsive.
Outcome instruments supported by Class I and Class
II medical evidence for the evaluation of low-back pain
include the Spinal Stenosis Survey of Stucki, Waddell–
Main, RMDQ, DPQ, QPDS, SIP, Million Scale, LBPR
Scale, ODI, and CBSQ. Many of these outcome measures
have been applied to patients who have been treated with
lumbar fusion for degenerative lumbar disease and have
proven to be valid and responsive; however, the reliability
of these instruments has never been specifically assessed in
the lumbar fusion patient population. Patient satisfaction
surveys have been used to measure outcome following
lumbar fusion. Their usefulness resides in their insight in-
to patient attitudes toward the treatment experience but is
limited because of their inability to measure responsive-
ness and the lack of information on their reliability.
Key Issues for Future Investigation
Although the functional outcome instruments discussed
in this review appear valid and responsive in the low-back
pain patient population, their external reliability has not
been confirmed in the clinical setting of lumbar fusion.
This may be important for the comparison of different
lumbar fusion techniques. Another key issue appears to be
the timing of administration of the outcomes instruments.
The aforementioned functional outcome measures appear
to be responsive both initially and over a few years. Whet-
her the benefits associated with any sort of intervention
for low-back pain are durable beyond this period has not
been established.
References
1. Bergner M, Bobbitt RA, Carter WB, et al: The Sickness Impact
Profile: development and final revision of a health status mea-
sure. Med Care 19:787–805, 1981
2. Bernstein IH, Jaremko ME, Hinkley BS: On the utility of the
SCL-90-R with low-back pain patients. Spine 19:42–48, 1994
3. Brazier JE, Harper R, Jones NM, et al: Validating the SF-36
health survey questionnaire: new outcome measure for primary
care. BMJ 305:160–164, 1992
4. Burkus JK, Transfeldt EE, Kitchel SH, et al: Clinical and radio-
graphic outcomes of anterior lumbar interbody fusion using re-
combinant human bone morphogenetic protein-2. Spine 27:
2396–2408, 2002
5. Christensen FB, Hansen E, Eiskjaer SP, et al: Circumferential
lumbar spinal fusion with Brantigan cage versus posterolateral
fusion with titanium Cotrel-Dubousset instrumentation: a pros-
pective, randomized clinical study of 146 patients. Spine 27:
2674–2683, 2002
6. Christensen FB, Hansen ES, Laursen M, et al: Long-term func-
tional outcome of pedicle screw instrumentation as a support
for posterolateral spinal fusion. Spine 27:1269–1277, 2002
7. Cronbach LJ: Coefficient alpha and the internal structure of
tests. Psychometrika 16:297–334, 1951
8. Daltroy LH, Cats-Baril W, Katz JN, et al: The North American
Spine Society Lumbar Spine Outcome Assessment Instrument:
Reliability and Validity Tests. Spine 21:741–748, 1996
9. Deyo RA: Comparative validity of the sickness impact profile
and shorter scales for functional assessment in low-back pain.
Spine 11:951–954, 1986
10. Deyo RA, Andersson G, Bombardier C, et al: Outcome mea-
sures for studying patients with low back pain. Spine 19 (18
Suppl):S2032–S2036, 1994
11. Deyo RA, Diehr P, Patrick DL: Reproducibility and responsive-
ness of health status measures. Statistics and strategies for evalu-
ation. Control Clin Trials 12 (4 Suppl):S142–S158, 1991
12. Fairbank JC, Couper J, Davies JB, et al: The Oswestry low back
pain disability questionnaire. Physiotherapy 66:271–273, 1980
13. Franklin GM, Haug J, Heyer NJ, et al: Outcome of lumbar fu-
sion in Washington State workers’ compensation. Spine 19:
1897–1904, 1994
14. Fritzell P, Hagg O, Wessberg P, et al: 2001 Volvo Award Win-
ner in Clinical Studies: Lumbar fusion versus nonsurgical treat-
ment for chronic low back pain: a multicenter randomized
controlled trial from the Swedish Lumbar Spine Study Group.
Spine 26:2521–2534, 2001
15. Fujiwara A, Kobayashi N, Saiki K, et al: Association of the Jap-
anese Orthopaedic Association score with the Oswestry Dis-
ability Index, Roland-Morris Disability Questionnaire, and
Short-Form 36. Spine 28:1601–1607, 2003
16. Greenough CG, Fraser RD: Assessment of outcome in patients
with low-back pain. Spine 17:36–41, 1992
17. Greenough CG, Peterson MD, Hadlow S, et al: Instrumented
posterolateral lumbar fusion. Results and comparison with ante-
rior interbody fusion. Spine 23:479–486, 1998
18. Hadley MN, Walters BC, Grabb PA: Guidelines for the man-
agement of acute cervical spine and spinal cord injuries. Neu-
rosurgery 50 (Suppl):S2–S6, 2002
19. Harper AC, Harper DA, Lambert LJ, et al: Development and
validation of the Curtin Back Screening Questionnaire (CBSQ):
a discriminative disability measure. Pain 60:73–81, 1995
20. Katz JN: Lumbar spinal fusion. Surgical rates, costs, and complications.
Spine 20 (24 Suppl):S78–S83, 1995
21. Kopec JA, Esdaile J: Functional disability scales for back pain.
Spine 20:1943–1949, 1995
22. Kopec JA, Esdaile J, Abrahamowicz M, et al: The Quebec Back
Pain Disability Scale. Measurement properties. Spine 20:341–352,
1995
23. Lawlis GF, Cuencas R, Selby D, et al: The development of the
Dallas Pain Questionnaire. An assessment of the impact of
spinal pain on behavior. Spine 14:511–516, 1989
24. Leclaire R, Blier F, Fortin L, et al: A cross-sectional study com-
paring the Oswestry and Roland-Morris Functional Disability
scales in two populations of patients with low back pain of dif-
ferent levels of severity. Spine 22:68–71, 1997
25. Luo X, Lynn George M, Kakouras I, et al: Reliability, validity,
and responsiveness of the short form 12-item survey (SF-12) in
patients with back pain. Spine 28:1739–1745, 2003
26. Manniche C, Asmussen K, Lauritsen B, et al: Low Back Pain
Rating scale: validation of a tool for assessment of low back
pain. Pain 57:317–326, 1994
27. Million R, Hall W, Nilsen KH, et al: Assessment of the progress
of the back-pain patient. 1981 Volvo Award in Clinical Scien-
ces. Spine 7:204–208, 1982
28. Moffroid MT, Haugh LD, Henry SM, et al: Distinguishable
groups of musculoskeletal low back pain patients and asymptom-
atic control subjects based on physical measures of the NIOSH
Low Back Atlas. Spine 19:1350–1358, 1994 (Erratum in Spine
19:2137, 1994)
29. Moller H, Hedlund R: Surgery versus conservative management
in adult isthmic spondylolisthesis—a prospective, randomized
study: part 1. Spine 25:1711–1715, 2000
30. Roland M, Morris R: A study of the natural history of back pain.
Part I: development of a reliable and sensitive measure of dis-
ability in low-back pain. Spine 8:141–144, 1983
31. Roland M, Morris R: A study of the natural history of low-back
pain. Part II: development of guidelines for trials of treatment
in primary care. Spine 8:145–150, 1983
32. Ruta DA, Garratt AM, Wardlaw D, Russell IT: Developing a
valid and reliable measure of health outcome for patients with
low back pain. Spine 19:1887–1896, 1994
33. Salen BA, Spangfort EV, Nygren AL, et al: The Disability
Rating Index: an instrument for the assessment of disability in
clinical settings. J Clin Epidemiol 47:1423–1435, 1994
34. Slosar PJ, Reynolds JB, Schofferman J, et al: Patient satisfaction
after circumferential lumbar fusion. Spine 25:722–726, 2000
35. Stucki G, Daltroy L, Liang MH, et al: Measurement properties
of a self-administered outcome measure in lumbar spinal steno-
sis. Spine 21:796–803, 1996
36. Turner JA, Ersek M, Herron L, et al: Patient outcomes after
lumbar spinal fusions. JAMA 268:907–911, 1992
37. Waddell G, Main CJ: Assessment of severity in low-back dis-
orders. Spine 9:204–208, 1984
38. Ware JE, Kosinski M, Keller SD: A 12-Item Short-Form Health
Survey: construction of scales and preliminary tests of reliabil-
ity and validity. Med Care 34:220–233, 1996
Accepted in final form March 22, 2005.

Part 3: assessment of economic outcome
KEY WORDS • fusion • lumbar spine • practice guidelines • treatment outcome • economic outcome
Abbreviations used in this paper: CI = confidence interval; LOS = length of stay; QOL = quality of life; RR = relative risk.

Recommendations
Standards. There is insufficient evidence to recommend
a standard for assessment of economic outcome following
lumbar fusion for degenerative disease.
Guidelines. There is insufficient evidence to recommend
a guideline for assessment of economic outcome
following lumbar fusion for degenerative disease.
Options. It is recommended that valid and responsive
economic outcome measures be included in the assessment
of outcomes following lumbar fusion surgery for
degenerative disease. Return-to-work rates and termination
of disability compensation are two such measures. It
is recommended that cost analyses related to lumbar
spinal fusion include perioperative expenses as well as expenses
associated with long-term care, including those
incurred in both the operative and nonoperative settings.
Rationale
Lumbar fusion is commonly performed as an adjunct to
the surgical treatment of patients with low-back pain due
to degenerative lumbar disease. Using data from the National
Hospital Discharge Survey, both Deyo, et al.,⁴ and
Davis³ observed a dramatic increase in the frequency of
lumbar fusion procedures in the 1980s. Lumbar fusion has
been undertaken in the setting of degenerative disc disease,
spinal stenosis, spondylolisthesis, and degenerative scoliosis
and is commonly supplemented with internal fixation
involving a variety of devices. As the frequency and complexity
of lumbar fusion surgery increase, there is a tendency
for costs and complication rates to follow.⁹ In a time
of contracting hospital resources, it is important to understand
the economic impact of lumbar fusion. The purpose
of this review is to examine the economic impact of lumbar
fusion for degenerative lumbar spine disease as
assessed by cost, complication rates, and rates of reoperation.
These expenses of lumbar fusion must be contrasted
with the return-to-work rate and the potential for improved
productivity following treatment. These end points
will be examined as economic outcome measures following
lumbar fusion.
Search Criteria
A computerized search of the National Library of Medicine
database of the literature published between 1966
and 2001 was performed. A search using the subject heading
“lumbar fusion” yielded 3708 citations. The following
subject headings were combined: “lumbar fusion and outcomes.”
Approximately 204 citations were acquired. Only
citations in English were selected. A search of this set
of publications with the key words “employment status,”
“mortality,” “medical care costs,” “cost containment/comparison,”
or “cost effectiveness” resulted in 58 matches.
Titles and abstracts of the articles were reviewed. Clinical
series dealing with adult patients who had lumbar fusion
for degenerative disease were selected. Additional refer-
ences were culled from the reference lists of remaining
articles.
Among the articles reviewed, 13 studies were included
that dealt with lumbar fusion, complication rates, reopera-
tion rates, and costs. Six of these articles were cohort stud-
ies that examined the economic impact of lumbar fusion
compared with surgery for degenerative lumbar disease
that did not involve fusion. One article was a cohort study
investigating fusion with and without fixation. Two stud-
ies examined the cost benefit or cost effectiveness of lum-
bar fusion compared with decompression alone. The re-
maining study examined the responsiveness of returning
to work as an economic indicator in a large series. These
articles are summarized in Table 1.
One of the more difficult results to ascertain following
a medical or surgical treatment is economic outcome. Typ-
ical medical economic analyses seek to ascertain whether
a given treatment-related benefit accrues in light of the
expenditures required to provide that treatment. With re-
gard to lumbar fusion procedures, benefits from treatment
may include an overall improvement in low-back pain and
function, an increased return-to-work rate, and/or improv-
ed patient satisfaction. The expenses of the procedures are
the measured costs of the surgery, the devices implanted,
and operative time. Other measurable outlays include the
cost of complications and time and expenses associated
with reoperation. Deyo and colleagues⁴ examined data
concerning lumbar spinal disease and lumbar spinal fu-
sion from the National Hospital Discharge Survey be-
tween 1979 and 1987. In addition to a 200% increase in
spinal fusion procedures performed during this period,
the authors reported significant regional variations in the
performance of lumbar fusion procedures as reflected by
a ninefold regional variation in frequency between the
northeastern US (4/100,000) and the western US
(35/100,000). Because of the increasing incidence of lumbar
fusion procedures in the treatment of degenerative spine
disease, it is important to examine the economic impact of
lumbar fusion as a specific outcome measure.
Costs, Complications, Hospitalizations, and Reoperations
Malter and colleagues¹² performed a population-based
study of patients who underwent lumbar surgery for de-
generative disease in Washington state in 1988. The study
was not prospective, nor was it clear that all patients were
eligible for all therapies. Using diagnosis and procedure
codes from the Washington State Department of Health’s
computerized system, the authors obtained data on 6376
patients of whom 1041 underwent lumbar fusion. Rates of
reoperation, complications, and associated costs (in 1988
US dollars) were examined through the next 5 years. The
complication rate associated with lumbar arthrodesis procedures
was 18% compared with a 7% complication rate
following lumbar surgery without arthrodesis (chi-square
test, p < 0.001). The LOS was significantly longer for
fusion-treated patients (7 days compared with 5.1 days;
p < 0.001). In 1988 dollars, hospital costs averaged $7101
per patient treated with fusion and $4161 per patient treated
without fusion (p < 0.001). These authors examined
reoperation rates to determine if fusion reduced the need
for repeated lumbar surgery within 5 years. Reoperation
rates were similar between those treated with fusion (RR
1.1; 95% CI 0.9–1.3) and those not. Because the indica-
tions for surgery were not examined, the only conclusions
that could be drawn from this study were that lumbar
fusion procedures are associated with increased costs and
complications.
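The comparison of complication rates in this study can be restated as a crude relative risk. The sketch below is an illustration only. The event counts are back-calculated from the rounded percentages and group sizes given in the text (18% of 1041 patients and 7% of the remaining 5335), so they are approximations and not figures reported by Malter and colleagues. The confidence interval uses the standard large-sample approximation on the log scale, not any method described in that study, and the function name is hypothetical.

```python
from math import exp, log, sqrt


def relative_risk(events_a, total_a, events_b, total_b, z=1.96):
    """Crude relative risk of group A versus group B with an approximate 95% CI."""
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    rr = risk_a / risk_b
    # Standard error of log(RR) under the usual large-sample approximation.
    se_log_rr = sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


# Approximate counts reconstructed from rounded percentages (assumption).
fusion_total = 1041
no_fusion_total = 6376 - 1041
fusion_events = round(0.18 * fusion_total)
no_fusion_events = round(0.07 * no_fusion_total)

rr, lower, upper = relative_risk(
    fusion_events, fusion_total, no_fusion_events, no_fusion_total
)
print(f"RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With these reconstructed counts the crude relative risk of a complication is roughly 2.6, consistent with the ratio of the two reported rates. It is unadjusted and therefore carries the same limitation noted above regarding unexamined surgical indications.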
Using similar methods and a hospital discharge registry
in Washington state, Deyo and colleagues⁵ examined
18,122 hospitalizations for lumbar surgery between 1986
and 1988. The majority (84%) of cases requiring surgery
involved spinal stenosis or disc displacement. Excluded
were cases involving malignant lesions, infection, or frac-
tures. Approximately 15% of patients in this cohort under-
went arthrodesis in addition to decompression. The report-
ed mortality rate was less than 1%. The complication rate
was 17.4% among patients treated with fusion compared
with a 7.6% rate for those with lumbar disease treated surgically
without fusion (chi-square test, p < 0.0005). The
LOS among patients who were treated with fusion was
approximately 7.6 days compared with 5.4 days for those
who did not undergo fusion (p < 0.0005). In 1986 to 1988
dollars, the cost of hospitalization was $6491 for fusion-
treated patients compared with $3793 for patients treated
surgically without fusion (p = 0.0005). Logistic-regres-
sion models were used to examine the risk of complica-
tions or prolonged hospitalization and indicated that the
RR for a complication or prolonged hospitalization with
any type of lumbar fusion procedure was 2.7 (95% CI
1.5–4.9). The lack of information regarding the indications
for surgery and the clinical outcome following
surgery limits the usefulness of this information.
Deyo, et al.,⁶ examined lumbar surgery data for 1985
obtained from the Health Care Financing Administration
for all Medicare recipients, excluding those on Medicare
for chronic renal failure or Social Security Disability.
Using ICD-9-CM diagnosis and procedure codes, data
were accrued on the frequency of lumbar surgery per-
formed with or without fusion and the incidence of asso-
ciated complications. The study was not undertaken pros-
pectively nor was it certain that all patients were eligible
for all therapies. Specific data were obtained for 6-week
mortality rates, requirements for assisted living, and the
need for blood transfusion. An economic analysis was
completed for LOS and cost. These data were compared
with similar data from 1 year prior to 4 years after the
study date. A study population of 27,111 patients was
obtained of whom 1524 (5.6%) underwent lumbar fusion.
For patients treated surgically with fusion, the mean hos-
pital costs (1985 US dollars) were $10,091 compared with
$6754 for patients treated without fusion (chi-square test,
p < 0.0005). A logistic regression was completed to determine
RR and (95%) CIs for several variables. In the fusion
group, the RR was 1.9 (95% CI 1.6–2.2) for the presence
of complications, 5.8 for blood transfusion (5.2–6.6),
2.0 for 6-week mortality (1.2–3.4), and 2.2 for discharge
to a nursing home (1.7–3.0). This cohort study revealed
that lumbar surgery with fusion was more expensive and
associated with higher risk for complications than lumbar
surgery without fusion.

TABLE 1
Summary of studies involving assessment of economic outcome after lumbar spinal surgery*

Authors & Year: Tunturi, et al., 1979
Class: III
Description: 133 patients underwent lumbosacral fusion w/ 118 FUs including 2 deaths. Costs were calculated in 1976 US dollars based on periop hospitalization & FUs. Benefit was defined as the time over the mean FU (4.8 yrs) for which the patient was employed & was calculated based on mean salary during this period.
Results: The cost/benefit ratio for lumbosacral fusion was 1:2.9 w/ the cost in 1976 US dollars as $5569 & benefit as $16,075.
Conclusions: Lumbosacral fusion in a selected population has a positive cost/benefit ratio.

Authors & Year: Deyo, et al., 1991
Class: III
Description: All Medicare recipients undergoing lumbar op in 1985. Data provided from HCFA on these patients from 1 yr preop to 4 yrs postop. 27,111 patients were studied of whom 5.6% (1524) received lumbar fusion. Hospitalizations examined for complications, mortality at 6 wks, need for blood transfusion, & requirements for assisted living. Economic analysis was complete for LOS & costs.
Results: For the fusion group, RR w/ 95% CI for complications was 1.9 (1.6–2.2), blood transfusion 5.8 (5.2–6.6), 6-wk mortality 2.0 (1.2–3.4), assisted living 2.2 (1.7–3.0) (p < 0.05). These results were consistent btwn spinal stenosis & spondylolisthesis w/ hospital costs of $10,091 (fusion) vs $6754 (w/o) in 1985 dollars & a significantly shorter LOS (p < 0.05 in each category).
Conclusions: A greater economic cost of fusion in the Medicare population. Lumbar fusion is associated w/ greater morbidity, mortality, & use of hospital resources in older adults. No clear cohort of lumbar degenerative population defined for cost comparison.

Authors & Year: Deyo, et al., 1992
Class: III
Description: 18,122 hospitalizations for lumbar spine op (84% involved spinal stenosis or disc displacement) from 1986–1988. 15,280 surgeries w/o arthrodesis & 2785 included arthrodesis. Hospitalizations examined for complications. Economic analysis was complete for LOS & costs.
Results: ~15% of patients underwent arthrodesis. The complication rate was 17.4% w/ fusion & 7.6% w/o (p < 0.0005). The LOS was 7.6 days w/ fusion & 5.4 w/o (p < 0.0005). The cost in 1986–1988 dollars was $6491 w/ fusion & $3793 w/o (p < 0.0005). No details were given for mortality.
Conclusions: Patients who undergo fusion in a broad population are more apt to have longer LOSs w/ greater complication rates & utilization of health care resources. No clear cohort of lumbar fusion population defined for cost comparison.

Authors & Year: Franklin, et al., 1994
Class: III
Description: 388 patients in Workers’ Compensation system in Washington state (1986–1987) who underwent fusion. Patient satisfaction studied along w/ economic recovery by patient. Simple satisfaction survey examined back/leg pain, QOL, decision to undergo op, & employment at 2 yrs.
Results: Employment was 16, 32, & 49% over 1, 2, & 3 yrs. It was less likely to occur in this cohort than historical controls (RR = 0.66, 0.88, & 0.93) at 1, 2, & 3 yrs; 23% required reop & instrumentation doubled this risk.
Conclusions: Employment as an economic indicator may be used as an outcome measure but other control groups should be considered.

Authors & Year: Katz, et al., 1997
Class: III
Description: 272 patients w/ degenerative lumbar stenosis. Surgery: decompression (194), decompression w/ arthrodesis (37), & decompression w/ arthrodesis/fixation (41). Outcomes assessed w/ respect to walking capacity, back/leg pain, satisfaction, health status (SF-36), & hospital cost.
Results: Individual surgeon was predictor for arthrodesis. Hospital costs were $12,615 (no arthrodesis), $18,495 (arthrodesis), $25,914 (arthrodesis/fixation) (p < 0.0001). No reliability given for walking, satisfaction, or health status.
Conclusions: Hospital costs of arthrodesis/fixation are highest w/ no clear defined benefit. Arthrodesis alone showed improved relief of lumbago at 6 & 24 mos w/o reliability. Significant variability introduced by surgeon choice for arthrodesis.

Authors & Year: Malter, et al., 1998
Class: III
Description: 6376 patients had op for lumbar degenerative disease (1041 for op including arthrodesis, 5335 for op w/o arthrodesis). Economic analysis of hospitalization.
Results: Complication rate: 18% (arthrodesis) to 7% (none) (p < 0.001). Hospital costs greater w/ fusion ($7101 & $4161 in 1988 dollars) (p < 0.001). Reop rate similar btwn groups, RR 1.1 (95% CI 0.9–1.3).
Conclusions: The economic costs of lumbar arthrodesis in the setting of stenosis, disc displacement, spondylolisthesis, & degeneration are greater. No clear cohort of lumbar degenerative patients used for comparison.

Authors & Year: Kuntz, et al., 2000
Class: III
Description: A cost-effectiveness study of laminectomy, laminectomy w/ noninstrumented fusion, & laminectomy w/ instrumented fusion. Outcome was assessed at 6 mos & long term & based on prior reports. Periop complications & costs & reop rates were all based on prior reports.
Results: The QALYs & costs were calculated & found to be $56,500 for laminectomy w/ noninstrumented fusion compared w/ laminectomy alone. Instrumented fusion was substantially higher ($3,112,800). Improved outcome w/ instrumentation (90 vs 80%) reduced the relative cost of fixation.
Conclusions: Lumbar laminectomy w/ noninstrumented fusion compared favorably w/ decompression. Not enough data existed on outcome w/ fixation to present it positively on an economic scale.

Authors & Year: Moller & Hedlund, 2000
Class: II
Description: 111 patients w/ spondylolisthesis who underwent fusion (77) or exercise (34). Patients were randomized to these groups if they had ≥ 1 yr of pain/sciatica. Evaluation was completed at 1 & 2 yrs using the DRI, a satisfaction survey (much better, better, unchanged, worse; would you repeat op?), & RTE.
Results: The fusion & exercise groups had similar numbers of patients on disability at 2 yrs (46 vs 45%); however, the overall reduction was greater for fusion (p < 0.0001) compared w/ exercise (p = 0.23). The satisfaction survey showed good responses to be significantly higher in the op group (p < 0.01).
Conclusions: RTE appears to be an indicator of improvement. A satisfaction survey was not reliably studied but did appear to be a responsive indicator for outcome & satisfaction had improved more after fusion.

Continued

Katz and colleagues10 completed a prospective observational
study of 272 patients with radiographically and clinically
documented lumbar stenosis. Patients were treated at
four centers by eight surgeons over a 4-year period. Sur-
gical treatment included decompression (194 cases), de-
compression with fusion (37 cases), or decompression with
fusion and internal fixation (41 cases). Patients were fol-
lowed for 24 months and assessed for walking capacity,
back and leg pain, satisfaction, and health status based on
the Sickness Impact Profile. Internal reliability was calcu-
lated for the walking, pain, and satisfaction scales. Hospital
costs were analyzed for each group. The individual sur-
geon, in this study, was the greatest predictor for the per-
formance of a fusion, with an RR of more than 10 based on
logistic regression. At 6 and 24 months, decompression
and fusion without internal fixation resulted in better relief
of back pain (p < 0.004 at 6 months; p < 0.01 at 24
months) compared with the other treatment groups. With
multivariate analysis, a trend was evident but did not reach
statistical significance. The reoperation rates were similar
in all three groups (p = 0.15). Mean hospital costs were
$12,615 per patient for decompression without fusion,
$18,495 per patient for decompression with fusion, and
$25,914 per patient for decompression with fusion and internal
fixation (p < 0.0001). This study indicated that increased
costs for lumbar fusion may be offset by significant
functional gains in the patients who undergo fusion without
instrumentation. The medical evidence cited in this report is
considered Class III because of the retrospective nature of
the study and selection bias by the operating surgeons as to
which patients were treated with internal fixation.
Return to Employment
Two studies described resumption of employment
among patients with low-back pain and compared those
treated with lumbar fusion with those treated nonoperatively.
Moller and Hedlund13 examined 111 patients over
a 5-year period who had chronic low-back pain or sciati-
ca for a minimum of 1 year as a result of isthmic spondy-
lolisthesis. Treatments included arthrodesis with internal
fixation (37 cases), arthrodesis without internal fixation
(40 cases), or exercise (34 cases). Evaluation was per-
formed at 1 and 2 years by using a Disability Rating In-
dex, a satisfaction survey, and return-to-work rates. In the
fusion and exercise groups there were similar numbers of
patients receiving disability payments at 2 years (46 and
45%, respectively); however, the surgical group had a
greater degree of improvement (75 and 46%, respectively;
p < 0.0001) compared with the exercise group (61 and
45%, respectively; p = 0.23). Fritzell, et al.,8 reported a
study of 294 patients with lumbar degenerative disease
involving chronic lumbago of at least 2 years’ duration
due to L4–5 and/or L5–S1 disc degeneration. Patients
were randomized to surgical or nonsurgical groups. Pa-
tients in the surgical group underwent posterolateral fu-
sion (73 cases), posterolateral fusion with internal fixation
(74 cases), or interbody fusion with internal fixation (75
cases). Seventy-two patients received medical management
including physical therapy. Evaluation was accomplished
at 6, 12, and 24 months by using functional outcome
questionnaires and return-to-work status. In an overall
assessment, 63% in the surgical group indicated they
were better or much better following treatment compared
with 29% in the nonsurgical group (p < 0.0001). The net
return-to-work rate was 39% in the surgical group and
only 23% in the nonsurgical group (p < 0.05). These two
studies suggest that the resumption of employment is a
responsive economic outcome measure for patients with
low-back pain who may be considered surgical candidates.

TABLE 1 Continued

Authors & Year: Slosar, et al., 2000
Class: III
Description: 141 patients underwent circumferential lumbar instrumented fusion, either primary (31%) or secondary (69%). FU averaged 37 mos & was done by a basic satisfaction survey: 1) op met expectations; 2) op improved my condition; 3) op improved but would not redo; 4) op worsened condition.
Results: 133 FU patients (10.5% = 1, 51.1% = 2, 19.5% = 3, & 18.8% = 4). RTE occurred in 38% of patients; it was more likely in those not involved w/ Workers’ Compensation (57 vs 22%; p < 0.001). There was a 20% complication rate, which included transient weakness, infection, & graft extrusion.
Conclusions: Satisfaction appears to be a responsive outcome measure at 37 mos; however, its reproducibility was not tested. RTE is a responsive measure & improves in the noncompensation patients. A complication rate of 20% was seen, suggesting a negative economic impact.

Authors & Year: Fritzell, et al., 2001
Class: II
Description: 294 patients w/ L4–S1 disc degeneration & LBP who underwent op (222) or expectant (72) management over a 6-yr period. Evaluation was completed at 6, 12, & 24 mos using the ODI, Million, & general function score along w/ patient assessment. RTE was also monitored.
Results: W/ overall assessment, 63% in the op group indicated they were better or much better compared w/ 29% in the nonop group (p < 0.0001). The net RTE rate was higher in the op group (36%) than the nonop group (13%, p < 0.002).
Conclusions: Satisfaction surveys seem to be responsive over the same time interval, & RTE indicates responsiveness at 24 mos.

Authors & Year: Christensen, et al., 2002
Class: III
Description: 148 patients underwent lumbar fusion w/ 73 in PLF group and 75 in ALIF/PLF group. Three patients were lost to FU. RTE was also followed at 2 years. FU performed at 0, 1, & 2 yrs.
Results: RTE was similar in both groups but improved from 24% to 36% (no statistics used to analyze) w/ no difference btwn subgroups.
Conclusions: RTE did not appear to work as a responsive indicator of improvement, & RTE seemed to correlate w/ an improvement in DPQ score. No controls were used.

Authors & Year: Christensen, et al., 2002
Class: II
Description: 129 patients w/ chronic LBP & isthmic spondylolisthesis, primary degeneration, or secondary degeneration who underwent instrumented or noninstrumented fusion. Outcome at 5 yrs was done using functional questionnaires along with rates of RTE & reop.
Results: The instrumented group had a 28% reop rate compared w/ 14% for the noninstrumented group (p < 0.03). Op time 212 vs 127 min (p < 0.0001) w/ greater periop blood loss (p < 0.01).
Conclusions: Medical outcome by reop & op time may be a responsive indicator w/in 5 yrs of lumbar fusions w/ instrumentation.

* ALIF = anterior lumbar interbody fusion; DPQ = Dallas Pain Questionnaire; FU = follow up; HCFA = Health Care Financing Administration; LBP = low-back pain; ODI = Oswestry Disability Index; PLF = posterolateral fusion; QALY = quality-adjusted life year; QOL = quality of life; RTE = return to employment; SF-36 = Short Form–36.

In the study by Franklin and colleagues7 of Workers’
Compensation patients, the end of total disability was
monitored in patients who underwent lumbar fusion be-
tween 1986 and 1987. The termination of total disability as
an end point occurred in 16% of treated patients at 1 year,
32% at 2 years, and 49% at 3 years; however, compared
with historical controls drawn from Workers’ Compensation
patients as a whole, patients treated with lumbar fusion
were less likely to end total disability (RR 0.66 at 1 year,
0.88 at 2 years, and 0.93 at 3 years). In contrast, Christensen,
et al.,1 examined 148 patients who underwent lumbar
fusion over a 3-year period: posterolateral fusion (73
cases) or a combination of posterolateral and anterior inter-
body fusion procedures (75 cases). Outcome was assessed
over 2 years. Overall improvement was greater in the circumferential
treatment group, which also had a lower reoperation rate
(7% compared with 22%, p < 0.009). The return-to-work
rate improved in both groups from 24 to 36% with no dif-
ference between subgroups. No statistical analyses were
used to assess the overall return-to-work rate. Slosar, et
al.,14 reported on 133 patients who underwent circumferential
fusion during a 2-year follow-up period. In this group,
50 patients (38%) returned to work; 16 (22%) of the 73
injured workers resumed work compared with 34 (57%) of
the 60 patients who were not receiving Workers’ Compensation
(chi-square test, p < 0.001). These studies indicated
that return to work and/or termination of disability
payment are responsive measures for economic outcome
after lumbar fusion procedures. They further indicate that
the presence of a compensable injury is associated with a
lower rate of return to work.
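
The comparison reported by Slosar, et al., can be checked from the counts given above (16 of 73 Workers’ Compensation patients versus 34 of 60 other patients returning to work). The following is a minimal sketch in Python, assuming an uncorrected Pearson chi-square test on the 2 × 2 table; the source does not state whether a continuity correction was applied, and the function name is illustrative only.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Counts taken from the text: patients who returned to work / group size.
comp_rtw, comp_total = 16, 73        # receiving Workers' Compensation
noncomp_rtw, noncomp_total = 34, 60  # not receiving Workers' Compensation

stat, p = chi_square_2x2(comp_rtw, comp_total - comp_rtw,
                         noncomp_rtw, noncomp_total - noncomp_rtw)
print(f"return to work: {comp_rtw / comp_total:.0%} vs {noncomp_rtw / noncomp_total:.0%}")
print(f"chi-square = {stat:.2f}, p = {p:.5f}")
```

Under these assumptions the statistic is well above the 1-degree-of-freedom threshold for p = 0.001, which is consistent with the reported significance level.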
Cost–Benefit Analysis
Kuntz and colleagues11 undertook a cost-effectiveness
analysis of lumbar fusion by constructing a hypothetical
model based on historical reports in prior clinical studies.
They examined lumbar laminectomy, laminectomy with
noninstrumented fusion, and laminectomy with instru-
mented fusion. Rates of clinical improvement and return
to employment were culled from series reported in the lit-
erature as were costs, complication rates, fusion rates, re-
operation rates, and the incidence of clinical worsening.
Each negative and positive outcome was assigned a rela-
tive value pertaining to quality of life, which the authors
adjusted according to hypothetical outcomes.
The authors determined that laminectomy with noninstrumented
fusion cost $56,500 per quality-adjusted
life year compared with laminectomy alone.11 The addition
of instrumentation to the lumbar fusion procedure cost
$3,112,800 per quality-adjusted life year. The authors
concluded that laminectomy with noninstrumented fusion
compared favorably with decompression alone; however,
the improvement in outcome associated with instrumentation
was not defined well enough to establish a benefit. The
authors noted that a hypothetical rate of 90% symptom
relief for patients treated with instrumented fusion compared
with 80% for noninstrumented patients would reduce
the cost per quality-adjusted life year to $82,400.
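
A cost per quality-adjusted life year of this kind is an incremental cost-effectiveness ratio: the difference in cost between two strategies divided by the difference in quality-adjusted life years. The sketch below illustrates the arithmetic only. The input figures are hypothetical placeholders, not values from Kuntz and colleagues, and the function and variable names are assumptions made for illustration.

```python
def incremental_cost_per_qaly(cost_new, qaly_new, cost_ref, qaly_ref):
    """Incremental cost per quality-adjusted life year of 'new' versus 'ref'."""
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly <= 0:
        raise ValueError("no QALY gain over the reference; the ratio is not meaningful")
    return (cost_new - cost_ref) / delta_qaly

# Hypothetical per-patient inputs (NOT taken from the study).
laminectomy = {"cost": 20_000.0, "qaly": 6.0}
noninstrumented_fusion = {"cost": 25_000.0, "qaly": 6.5}

ratio = incremental_cost_per_qaly(
    noninstrumented_fusion["cost"], noninstrumented_fusion["qaly"],
    laminectomy["cost"], laminectomy["qaly"],
)
print(f"${ratio:,.0f} per quality-adjusted life year gained")
```

Because the gain in quality-adjusted life years sits in the denominator, a strategy that adds cost while adding very little benefit produces a very large ratio.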
Tunturi and colleagues15 analyzed 133 consecutive patients
who underwent lumbosacral fusion between 1968
and 1975. Results were reported for 116 patients in whom
the mean follow-up period was 4.8 years. These authors
calculated the mean expense of the hospital stay and post-
operative visits in 1976 dollars. Economic benefits were
calculated based on return-to-employment rates compared
with the costs of continued disability. The rate of return to
employment was approximately 31%. The mean cost in
1976 US dollars for a lumbosacral fusion was $5569. The
mean economic benefit in 1976 US dollars for the same
period was $16,075. The calculated cost/benefit ratio was
therefore 1:2.9 for lumbosacral arthrodesis. The authors
concluded that lumbosacral fusion had a positive cost–be-
nefit ratio when return-to-employment status and the ter-
mination of disability payment were considered as indices
of economic outcome.
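
The ratio reported by Tunturi and colleagues follows directly from the two mean figures quoted above. A brief sketch of the arithmetic, using only the values given in the text (the variable names are illustrative):

```python
mean_cost_usd_1976 = 5569      # mean cost of a lumbosacral fusion
mean_benefit_usd_1976 = 16075  # mean economic benefit over the same period

# Benefit obtained per dollar spent, and the net benefit per patient.
benefit_per_dollar = mean_benefit_usd_1976 / mean_cost_usd_1976
net_benefit = mean_benefit_usd_1976 - mean_cost_usd_1976

print(f"cost/benefit ratio = 1:{benefit_per_dollar:.1f}")
print(f"net benefit per patient = ${net_benefit:,} (1976 US dollars)")
```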
Summary
Lumbar fusion may be associated with a high short-
term cost, especially if instrumentation is placed; how-
ever, there appear to be long-term economic benefits as-
sociated with lumbar fusion including resumption of
employment. To describe the economic impact of lumbar
fusion for degenerative disease adequately, it is important
to define the patient population treated with fusion and to
compare efficacy as well as the costs of other treatment
alternatives. Any such analysis should include both short-
and long-term costs and benefits.
Key Issues for Future Investigation
The application of valid and reliable outcome measures
in conjunction with a complete short- and long-term eco-
nomic analysis will be necessary to assess fully the
economic impact of lumbar fusion. To reach meaningful
conclusions, it is imperative to compare the economic out-
comes of patients treated with lumbar fusion with those in
patients with similar disease treated without fusion and to
include all relevant costs. This analysis should include sub-
sequent operative and nonoperative medical care, ongoing
disability costs, and the costs of loss of productivity. Measures
such as return-to-work status and quality-adjusted life
years must be included to allow the development of
meaningful data.
References
1. Christensen FB, Hansen ES, Eiskjaer SP, et al: Circumferential
lumbar spinal fusion with Brantigan cage versus posterolateral
fusion with titanium Cotrel-Dubousset instrumentation: a pros-
pective, randomized clinical study of 146 patients. Spine 27:
2674–2683, 2002
2. Christensen FB, Hansen ES, Laursen M, et al: Long-term func-
tional outcome of pedicle screw instrumentation as a support
for posterolateral spinal fusion: randomized clinical study with
a 5-year follow-up. Spine 27:1269–1277, 2002
3. Davis H: Increasing rates of cervical and lumbar spine surgery
in the United States, 1979–1990. Spine 19:1117–1124, 1994
4. Deyo RA, Cherkin D, Conrad D, et al: Cost, controversy, crisis:
low back pain and the health of the public. Annu Rev Public
Health 12:141–156, 1991
5. Deyo RA, Cherkin D, Loeser JD, et al: Morbidity and mortali-
ty in association with operations on the lumbar spine. The influ-
ence of age, diagnosis, and procedure. J Bone Joint Surg Am
74:536–543, 1992
6. Deyo RA, Ciol MA, Cherkin DC, et al: Lumbar spinal fusion.
A cohort study of complications, reoperations, and resource use
in the Medicare population. Spine 18:1463–1470, 1993
7. Franklin GM, Haug J, Heyer NJ, et al: Outcome of lumbar
fusion in Washington State workers’ compensation. Spine 19:
1897–1904, 1994
8. Fritzell P, Hagg O, Wessberg P, et al: 2001 Volvo Award
Winner in Clinical Studies: Lumbar fusion versus nonsurgical
treatment for chronic low back pain: a multicenter randomized
controlled trial from the Swedish Lumbar Spine Study Group.
Spine 26:2521–2534, 2001
9. Katz JN: Lumbar spinal fusion. Surgical rates, costs, and com-
plications. Spine 20 (24 Suppl):S78–S83, 1995
10. Katz JN, Lipson SJ, Lew RA, et al: Lumbar laminectomy alone
or with instrumented or noninstrumented arthrodesis in degen-
erative lumbar spinal stenosis. Patient selection, costs, and sur-
gical outcomes. Spine 22:1123–1131, 1997
11. Kuntz KM, Snider RK, Weinstein JN, et al: Cost-effectiveness
of fusion with and without instrumentation for patients with de-
generative spondylolisthesis and spinal stenosis. Spine 25:
1132–1139, 2000
12. Malter AD, McNeney B, Loeser JD, et al: 5-year reoperation
rates after different types of lumbar spine surgery. Spine 23:
814–820, 1998
13. Moller H, Hedlund R: Surgery versus conservative manage-
ment in adult isthmic spondylolisthesis—a prospective, ran-
domized study: part 1. Spine 25:1711–1715, 2000
14. Slosar PJ, Reynolds JB, Schofferman J, et al: Patient satisfac-
tion after circumferential lumbar fusion. Spine 25:722–726,
2000
15. Tunturi T, Niemela P, Laurinkari J, et al: Cost-benefit analysis
of posterior fusion of the lumbosacral spine. Acta Orthop
Scand 50:427–432, 1979
Accepted in final form March 22, 2005.

Part 4: radiographic assessment of fusion
KEY WORDS • lumbar spine • fusion • radiography • treatment outcome • practice guidelines
Abbreviations used in this paper: CT = computerized tomography; NPV = negative predictive value; PPV = positive predictive value; RSA = roentgen stereophotogrammetric analysis.

Recommendations
Standards. Static lumbar radiographs are not recommended
as a stand-alone means to assess fusion status following
lumbar arthrodesis surgery.
Guidelines. 1) Lateral flexion and extension radiography
is recommended as an adjunct to determine the presence
of lumbar fusion postoperatively. The lack of motion
between vertebrae, in the absence of rigid instrumentation,
is highly suggestive of successful fusion. 2) Technetium-99
bone scanning is not recommended as a means
to assess lumbar fusion.
Options. Several radiographic techniques, including static
radiography, lateral flexion–extension radiography, and/or
CT scanning, often in combination, are recommended
as assessment modality options for the noninvasive evaluation
of symptomatic patients in whom failed lumbar fusion
is suspected.
Rationale
Lumbar fusion is performed in patients with pain due to
lumbar degenerative disease. An outcome measure frequently
cited in studies evaluating lumbar fusion techniques
is the “radiographic fusion rate;” however, radiographic
fusion is not consistently defined throughout the
literature. The purpose of this review is to examine the literature
regarding the ability of various diagnostic techniques
to assess fusion status after lumbar fusion is performed
to treat degenerative disease.
Search Criteria
A computerized search of the database of the National
Library of Medicine between 1966 and July 2003 was conducted
using the search terms “lumbar spine fusion assessment,”
“lumbar spine pseudoarthrosis,” or “lumbar spine
fusion outcome.” The search was restricted to references in
the English language involving humans. This yielded a total
of 1076 references. The titles and abstracts of each of
these references were reviewed. Only papers concerned
with the assessment of fusion status following arthrodesis
procedures for degenerative lumbar disease were included.
Additional articles were obtained from the bibliographies
of the selected articles. Forty-five references were identified
that provided either direct or supporting evidence relevant
to the radiographic assessment of lumbar fusion status.
Reports involving Class III or better medical evidence are
listed in Table 1. Supportive data are provided by additional
references listed in the bibliography.
Open surgical exploration is the only method that allows
direct inspection of fusion integrity. This procedure
is considered the gold standard of lumbar fusion assessment.6,7
It is, therefore, an appropriate benchmark to use in
establishing the accuracy and predictive value of noninvasive
radiographic studies for the assessment of fusion
status following attempted lumbar fusion surgery.
Plain Radiographs (static)
Anteroposterior and lateral radiographs can demon-
strate a continuous bone mass between adjacent vertebral
segments following lumbar fusion. Because of their rela-
tively low cost, widespread availability, and long history
as a means of assessing fusion, plain spinal radiography
remains a common method of assessment of lumbar fusion;6
however, the limitations of static plain radiography
as a reliable test for determining the presence or absence
of a solid fusion have been well documented. Brodsky,
et al.,3 reported a 64% correlation between preoperative
plain radiographs and surgical exploration in a retrospec-
tive study of 214 lumbar fusion exploration procedures in
patients who had undergone prior posterolateral fusion.
Plain radiography had an 89% sensitivity and 60% speci-
ficity for predicting solid fusion. Radiographs interpreted
as demonstrating fusion had a PPV of 76%. Those predict-
ing pseudarthrosis had an NPV of 78%. These data indicate
a 0.18 likelihood ratio for a false-positive result (chance of
a pseudarthrosis discovered at exploration when radiography
indicates fusion), and a 2.25 likelihood ratio for a negative
test result (chance of a fusion discovered at exploration
when the radiography suggests pseudarthrosis).3 The
medical evidence provided by this review is considered
Class II for the use of plain lumbar radiography compared
with open surgical exploration to assess fusion because of
the authors’ selection bias for open exploration.
Similarly, in a retrospective study of 75 patients, Kant and
coworkers11 found a positive correlation between static radiography
and surgical exploration of lumbar fusion in 68% of
their patients (sensitivity 85%, specificity 62%, PPV 76%,
and NPV 54%). The likelihood ratio for a positive result was
0.81, and the likelihood ratio for a negative result was 2.24.
Finally, in a study of 49 patients treated with posterolateral
and posterior interbody fusion with internal fixation, Blumenthal
and Gill1 compared findings on anteroposterior and
lateral radiographs (interpreted by two surgeons and two ra-
diologists) with surgical exploration of the fusion mass at the
time of reoperation for hardware removal. They reported a
69% agreement between the radiographic diagnosis and sur-
gical findings. The accuracy among the four physicians in-
terpreting the radiographs ranged from 57 to 77% (false-pos-
itive rate 42%, false-negative rate 29%). These authors
concluded that plain radiography has limited accuracy and
validity for the assessment of lumbar fusion. Furthermore,
they noted significant intra- and interobserver variation, indicating
a lack of reliability (κ 0.4–0.7). Their study provides
Class I medical evidence indicating that static radiography is
only accurate in determining fusion status in roughly two
thirds of cases. Therefore, static anteroposterior and lateral
radiographs are not recommended as a stand-alone assess-
ment of the presence of an arthrodesis after lumbar fusion
surgery for degenerative disease.
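
For readers who wish to check likelihood ratios of the kind quoted in this section, the sketch below derives them from sensitivity and specificity using the standard definitions, LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. It assumes that a "positive" test is a radiograph read as showing fusion and uses the sensitivity and specificity reported above for the static radiography series of Brodsky, et al., and Kant and coworkers. The function name is illustrative, and results may differ from the quoted figures if the original authors worked from raw counts or used other definitions.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a diagnostic test, given sensitivity and specificity."""
    if not 0 < specificity < 1:
        raise ValueError("specificity must lie strictly between 0 and 1")
    if not 0 <= sensitivity <= 1:
        raise ValueError("sensitivity must lie between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

# Sensitivity and specificity as reported in the text for static radiography.
reported = {
    "Brodsky, et al.": (0.89, 0.60),
    "Kant and coworkers": (0.85, 0.62),
}

for study, (sens, spec) in reported.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{study}: LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

With these inputs the standard formulas give roughly 2.2 and 0.18 for the Brodsky series and roughly 2.2 and 0.24 for the Kant series.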
Flexion–Extension Radiography
In 1948 Cleveland, et al.,6 advocated the use of dynamic
lumbar spinal radiography rather than static radiography,
for the diagnosis of pseudarthrosis following attempted
lumbar fusion surgery. Other authors have also suggested
that lateral lumbar flexion–extension radiography allows
for appropriate assessment of fusion status.4 There has been
disagreement, however, on the number of allowable de-
grees of motion at the treated (fused) levels for determining
the presence or absence of successful bone fusion.16
Brodsky, et al.,3 compared the findings of lumbar flexion–extension
radiography to surgical exploration in a
series of 175 patients who underwent reoperation for var-
ious indications following instrumented and noninstru-
mented lumbar fusion. They found a 62% correlation be-
tween preoperative flexion–extension radiography and
intraoperative findings at exploration (specificity 37%,
sensitivity 96%, PPV 70%, and NPV 86%). Their study
provides Class II medical evidence that the absence of
motion on flexion–extension x-ray films is highly sugges-
tive of a solid fusion. The occurrence of some degree of
motion at the treated levels, however, does not necessari-
ly indicate a pseudarthrosis.
Computerized Tomography Scanning
Since the introduction of CT scanning in the 1970s, this
modality has been used to assess lumbar fusion. Early studies
involved axial sequences alone. Brodsky, et al.,3 used
6-mm axial slice CT scans and demonstrated a 57% corre-
lation between fusion assessment based on these scans
compared with direct surgical exploration in a series of 214
operations on 175 patients. Computerized tomography
scanning had a sensitivity of 63%, specificity of 86%, PPV
of 72%, and an NPV of 81%. Laasonen and Soini12 conducted
a retrospective review of 20 patients who underwent
CT scanning prior to surgical exploration and found an
approximate 80% correlation between the CT study–based
diagnosis of fusion and intraoperative diagnosis of fusion.
Since the publication of these earlier studies, CT imaging
technology has advanced. The use of thin-section axial se-
quences, improved resolution, and multiplanar imaging ca-
pability has enhanced the ability of CT scanning to assess
lumbar fusion status. There have been no studies compar-
ing these more advanced CT scanning capabilities with
direct surgical exploration. Lang and colleagues14 found
that the addition of thin-slice and multiplanar CT scanning
resulted in a higher rate of detection of pseudarthrosis compared
with plain radiography. Similarly, Chafetz, et al.,5 demonstrated
that direct coronal CT scanning may be more
sensitive than two-dimensional reconstructed coronal CT
images for the detection of pseudarthrosis. Zinreich and colleagues21
reported that three-dimensional CT reconstruction
may be more sensitive than two-dimensional CT reconstruction
for the detection of pseudarthrosis. Siambanes and
Mather20 demonstrated that multiplanar CT imaging detected
pseudarthrosis in patients who had undergone posterior
lumbar interbody fusion and in whom plain radiography
had suggested a solid fusion. Santos and colleagues18
examined 32 patients who underwent anterior lumbar interbody
fusion with carbon fiber cages. Plain static radiographs
were interpreted to demonstrate fusion at 86% of the as-
sessed levels. Flexion–extension lumbar radiography sug-
gested fusion rates ranging from 74 to 96% in this same
group of patients, depending on the method used to analyze
