PDOE

Research Paper j
Information Retrieval Performance of Probabilistically
Generated, Problem-Specific Computerized Provider
Order Entry Pick-lists: A Pilot Study
ADAM S. ROTHSCHILD, MD, HAROLD P. LEHMANN, MD, PHD
A b s t r a c t Objective: The aim of this study was to preliminarily determine the feasibility of probabilistically
generating problem-specific computerized provider order entry (CPOE) pick-lists from a database of explicitly linked
orders and problems from actual clinical cases.
Design: In a pilot retrospective validation, physicians reviewed internal medicine cases consisting of the admission
history and physical examination and orders placed using CPOE during the first 24 hours after admission. They created
coded problem lists and linked orders from individual cases to the problem for which they were most indicated.
Problem-specific order pick-lists were generated by including a given order in a pick-list if the probability of linkage of
order and problem (PLOP) equaled or exceeded a specified threshold. PLOP for a given linked order-problem pair was
computed as its prevalence among the other cases in the experiment with the given problem. The orders that the
reviewer linked to a given problem instance served as the reference standard to evaluate its system-generated pick-list.
Measurements: Recall, precision, and length of the pick-lists.
Results: Average recall reached a maximum of .67 with a precision of .17 and pick-list length of 31.22 at a PLOP
threshold of 0. Average precision reached a maximum of .73 with a recall of .09 and pick-list length of .42 at a PLOP
threshold of .9. Recall varied inversely with precision in classic information retrieval behavior.
Conclusion: We preliminarily conclude that it is feasible to generate problem-specific CPOE pick-lists probabilistically
from a database of explicitly linked orders and problems. Further research is necessary to determine the usefulness of
this approach in real-world settings.
j J Am Med Inform Assoc. 2005;12:322–330. DOI 10.1197/jamia.M1670.
Two of the main technologies driving current interest in health
care information technology are computerized provider order
entry (CPOE) and electronic health records (EHR).1,2
Both
technologies have been touted by influential groups as critical
to improving patient safety and reducing medical errors,3,4
al-
though good reasons exist to implement such systems even in-
dependently of these issues.5
Several investigators have begun to show that integrating
CPOE with clinical documentation functionality in EHR sys-
tems may create synergies. Rosenbloom et al.6
have reported
improving levels of computerized physician documentation
by integrating clinical documentation functionality into
CPOE work flow. Jao et al.7
reported preliminary results
suggesting that integrating problem list maintenance func-
tionality into CPOE work flow may improve problem list
maintenance. We suggest that coupling CPOE with problem
list maintenance even more tightly may lead to further syner-
gies. In this paper, we explore one potential synergy. We inves-
tigated how explicitly linking orders to the problems that
prompted them could enable generation of problem-specific
CPOE pick-lists probabilistically, without the need for prede-
fined order sets.
Background
Several studies in the outpatient setting have shown that
CPOE may reduce clinician order entry time8
or at least be
time neutral.9,10
Nonetheless, an impediment to physician ac-
ceptance of CPOE systems in the inpatient setting is the (real
or perceived) increased time burden of order entry compared
with paper-based ordering.11–14
A major solution that has
emerged is the use of predefined, problem/situation-specific
order sets. An order set is a static collection of orderable items
Affiliation of the authors: Division of Health Sciences Informatics,
Johns Hopkins University School of Medicine, Baltimore, MD.
Dr. Rothschild is currently with the Department of Biomedical
Informatics, Columbia University, New York, NY.
Presented in part at the 2004 National Library of Medicine Training
Program Directors’ Meeting in Indianapolis, IN in a plenary session
and MEDINFO 2004 in San Francisco in poster form.
Supported by the National Library of Medicine training grant
5T15LM007452 (ASR).
The authors thank the following physicians for serving as reviewers:
David Camitta, Richard Dressler, Cupid Gascon, Mark Laflamme,
Arun Mathews, Zeba Mathews, Greg Prokopowicz, Danny Rosen-
thal, Shannon Sims, and Chuck Tuchinda. The authors thank Bill
Hersh, George Hripcsak, and Gil Kuperman for their expert advice
on analysis and manuscript preparation.
Correspondence and reprints: Adam S. Rothschild, MD, Department
of Biomedical Informatics, Columbia University, Vanderbilt Clinic
5th Floor, 622 West 168th Street, New York, NY 10032; e-mail:
<adam.rothschild@dbmi.columbia.edu>.
Received for publication: 08/11/04; accepted for publication:
01/24/05.
322 ROTHSCHILD, LEHMANN, Probabilistic CPOE Pick-lists

that are appropriate for a given clinical situation. Order sets
are usually presented to the end user as pick-lists. The liberal
use of order sets has been named as a key factor in achieving
physician acceptance of CPOE implementations.15,16
Other
solutions for decreasing the time burden of CPOE have also
emerged. Lovis et al.17
have shown promise in the use of a nat-
ural language–based command-line CPOE interface, but inci-
dental findings from the same study nonetheless suggest the
speed superiority of order sets. The use of preconfigured or-
ders (e.g., medication orders whose dose, frequency, and
route of administration have already been specified) and hier-
archical order menus have also emerged as time-saving tech-
niques for placing common orders.18
Centrally defined order sets are believed to provide an advan-
tage in increasing clinical practice consistency and adherence
to evidence-based practice guidelines3
; however, an important
advantage of order sets from the end user’s perspective is that
they save time when entering orders via CPOE.15,19
Indeed,
personal order sets, which are order sets that individual users
can create, seem to exhibit much variability from user to user,
but they are nonetheless popular among end users,19,20
osten-
sibly because of their time-saving advantages. Few if any re-
ports have described the extent to which currently deployed
order sets, whether centrally defined or individually defined,
are based on scientific evidence.
In many currently available CPOE systems, problem-specific
order pick-list functionality is present in the form of static or-
der sets or collections of orders on menus. Centrally defined
order sets frequently have the benefit of institutional back-
ing, yet static order sets have several inherent problems.
First, Payne et al.18
have shown that centralized order set
configuration is time-consuming and that 87% of their cen-
trally defined order sets were never even used, although
their study did show higher proportional use of preconfig-
ured orders. This suggests that much time and many re-
sources are being spent on constructing unused order sets
and other order configuration entities. Second, newly emerg-
ing best practices may be slow to be incorporated into cen-
tralized order sets.21
Third, centralized order sets are often
defined only for frequently occurring problems, leaving rarer
problems unsupported. Finally, as the number of order sets
in a CPOE system increases, locating the appropriate order
set among all the available order sets can be difficult.
We believe that explicitly linking patients’ problems with the
orders used to manage those problems may be a source of syn-
ergies between order entry and other processes in the health
care information flow, including order entry itself, results re-
trieval, clinical documentation, reimbursement, public health
reporting, clinical research, quality assurance, and many
others. We believe that we can harness these synergies by em-
ploying the problem-driven order entry (PDOE) process.
PDOE requires the clinician to perform two activities with
the EHR/CPOE system, which together constitute a necessary
and sufficient embodiment of the process: (1) enter the pa-
tient’s problems using a controlled terminology and (2) enter
every order in the context of the one problem for which it is
most indicated. Orders may be entered via a pick-list or an-
other mechanism such as a command-line interface, but
each order must be entered in the context of a problem.
Through this process, PDOE produces a database of linked or-
der-problem pairs (LOPPs). We hypothesize that it is feasible
to use a database of LOPPs to generate useful problem-spe-
cific order pick-lists for CPOE by employing a simple proba-
bilistic pick-list generation algorithm. This is an example of
case-based reasoning.
With respect to the orders that a pick-list contains, we sug-
gest that the ideal problem-specific pick-list embodies the
following criteria: (1) all orders that the user wants to place
for a given problem are in the pick-list (i.e., perfect recall),
(2) all orders in the pick-list are orders that the user wants
to place for the given problem (i.e., perfect precision), and
(3) the orders in the pick-list reflect clinical best practice
and are backed by scientific evidence. Based on these crite-
ria, the information retrieval metrics of recall and precision
are necessary, albeit not sufficient, measures of the useful-
ness of problem-specific order pick-lists. Acknowledging
this limitation, we believe that they are reasonable for this
pilot study.
Methods
Case Selection
After obtaining an institutional review board exemption, the
legacy CPOE system22
at Johns Hopkins Hospital (JHH) was
queried for orders and identifying data for all patients admit-
ted between June 19 and October 16, 2003. Records from this
query were joined with case mix data to identify those cases
in which the patient was admitted to the Internal Medicine
service (one of several general internal medicine services at
JHH) and the length of stay was less than seven days.
The International Classification of Diseases, Ninth Revision
(ICD-9)–coded admission diagnosis was also selected. We ar-
bitrarily chose the seven-day length of stay cutoff to try to limit
the complexity of the cases, reasoning that complex cases tend
to have longer lengths of stay. We wanted to limit case com-
plexity to simplify the core study tasks for our reviewers.
We selected cases for inclusion in the study in which the fre-
quency of the ICD-9–coded admission diagnosis within the
above set was greater than six. We arbitrarily set the cutoff
at six to yield what we believed would be a manageably small
number of cases given our resource constraints but perhaps
sufficient to demonstrate the ability of our approach to gener-
ate useful pick-lists. Although the utility of ICD-9–coded di-
agnoses for clinical purposes has been questioned,23,24
we
used them to help us identify subsets of cases that would
be at least moderately similar to one another. We aimed to
achieve a moderate level of similarity among cases in our
sample because LOPP similarity across cases is the basis for
our process of probabilistic pick-list generation, and we
hoped to demonstrate its feasibility using our sample.
We eliminated cases in which insufficient data were present to
carry out the requirements of this study based on two pro-
spective criteria: (1) unavailability of the paper medical
record or (2) an order count less than ten. We arbitrarily de-
cided that cases with so few orders would not contribute suf-
ficient data to be cost-effective in this experiment.
Case Preparation
The paper medical records for our study set were requisi-
tioned and retrieved, and the handwritten history and phys-
ical document (H & P) that was written by the physician who
entered the admission orders was photocopied, digitally
scanned, and manually de-identified. This physician was
323Journal of the American Medical Informatics Association Volume 12 Number 3 May / Jun 2005

usually an intern, although many patients were admitted by
a moonlighting resident trained at Hopkins. Also, in a few
cases, the admitting physician’s H & P was not available in
the chart, so the attending physician’s or medical student’s
H & P was used. In these few cases, the person who entered
the orders (i.e., an intern) was different from the person who
wrote the H & P. We chose to use the H & P because it is an
information-rich narrative document25
that tends to explicitly
or implicitly offer reasons for orders placed during the early
part of a patient’s hospital stay. This experiment required
third-party physicians to enumerate patients’ problems and
to link orders to the problem for which they were indicated,
so we needed such documentation that correlated temporally
with orders. As we were limited by time and manpower, we
chose to focus on the period of the first 24 hours after admis-
sion since a disproportionately large number of orders are
placed during this time period.
For cases that met our inclusion criteria, we selected the or-
der records for all orders placed within the 24-hour period
following the time of the first order placed for a given
case. As the legacy CPOE system allowed end-user modifica-
tion of orders’ text descriptions as well as free-text order
entry, these order records required standardization before
being used in this experiment. These standardized orders
would essentially serve as our controlled terminology for
describing orders. To standardize the records, we first
manually deleted order records that we considered to be
meaningless or advantageous to enter in ways other than
as orders, as might be implemented in future clinical infor-
mation technology infrastructures. We considered an order
record meaningless if it did not describe an executable order
(e.g., ‘‘NURSING MISCELLANEOUS DISCHARGE ORDER’’).
We considered a record possibly advantageous to enter
in a way other than as an order if that order was adminis-
trative in nature and did not deal specifically with patient
care (e.g., ‘‘DISCHARGE NEW PATIENT’’ or ‘‘CHANGE
INTERN TO’’) or if the concept that the order described
might be more usefully classified as a problem (e.g.,
‘‘ALLERGY’’).
Next, we manually standardized the syntax of the remaining
order records by eliminating order attribute information. This
involved removing text referring to medication dose, timing,
and route of administration. Terms for medications that re-
ferred to qualitatively distinct preparations were left as sepa-
rate terms (e.g., the sustained release form of a drug was
considered to be different from the regular form). This stan-
dardization also involved standardizing highly similar free-
text orders to a single term (e.g., ‘‘EKG’’ standardized to
‘‘12-LEAD EKG’’).
Next, we eliminated duplicate order records within individ-
ual cases. This occurred frequently, for example, where se-
rial measurements of cardiac enzymes were ordered for
the purpose of assessing a patient for the presence of a myo-
cardial infarction. JHH’s CPOE system recorded serial or-
ders by placing multiple instances of the same order, all
of which were retrieved in our query. We chose to eliminate
such duplicates because, regardless of how serial orders
may be stored in a CPOE system, many CPOE systems al-
low ordering of serial tests through a single order with the
order’s serial nature expressed as an attribute of the order
rather than having to explicitly enter a separate order for
each instance of the test, and in this study, we simulated
the order entry process. Finally, the display sequence of or-
ders within each case was randomized at the storage level
to control for possible confounding on order-problem link-
ing from order sequence.
Reviewers
Ten physicians from seven different institutions agreed to be
reviewers for this study, serving as proxies for the actual
physicians who authored the H & Ps and entered the orders
for our study cases.
Study Tasks and Data Collection
Each of the ten reviewers was assigned 17 cases. Five of these
17 cases, randomly selected as a stratified sample from each
of the five most frequent ICD-9 admission diagnoses, were re-
dundantly assigned to all reviewers for a substudy looking at
problem list agreement; results are not discussed in this re-
port. Of the remaining 120 cases, 12 were randomly and non-
redundantly assigned to each reviewer. For each assigned
case, the reviewers reviewed the H & P and orders, entered
a coded problem list, and linked each order to the problem
for which it was most indicated (i.e., created a LOPP in-
stance). The reviewers could not add or delete orders. They
entered all data directly using PDOExplorer, a Web-based in-
terface to a relational database, which we designed specifi-
cally for this study.
The reviewers created coded problem lists by entering
problems using free-text and mapping them to a Unified
Medical Language System (UMLS) concept.26
UMLS look-up
functionality was implemented using the UMLSKS Java
API findCUI method of the KSSRetriever class, which was
called using the English-language 2003AC UMLS release27
and the normalized string index as arguments. Because this
was an internal research application, we did not limit source
vocabularies against which problem list entries could be map-
ped. After selecting a UMLS concept, reviewers could add
one of the following qualifiers: ‘‘history of,’’ ‘‘rule out,’’ ‘‘sta-
tus-post,’’ and ‘‘prevention of/prophylaxis.’’ We actively dis-
couraged use of qualifiers in the study instructions and by
visual prompting in the user interface. If the UMLSKS did
not return any potential matches for a given input term or
if the users thought that none of the returned UMLS concepts
were appropriate matches, they were given the option of
searching UMLS with a new term. They were also given the
option of adding the free-text term to the problem list, which
we also actively discouraged in the study instructions and by
visual prompting in the user interface. Every problem list au-
tomatically contained the universal problem ‘‘Hospital ad-
mission, NOS.’’ Although the reviewers were free to choose
not to link any orders to this problem, we suggested in the in-
structions that it might be appropriate for orders relating to
the patient’s simply being admitted to the hospital (e.g.,
‘‘notify house officer if . ’’). Reviewers were not permitted
to delete this problem.
Data collection took place between mid-February and mid-
April 2004. Reviewers were trained either in person or via
telephone. During this training, they familiarized them-
selves with PDOExplorer using a practice case that was
not included for data analysis. The complete written in-
structions were displayed before beginning their first real
case and were available in PDOExplorer for review at any

point during the experiment. Although we briefed the re-
viewers on the experiment’s hypothesis, we did not give
them guidance on how to make case-specific decisions.
The reviewers were not shown their actual cases before
data collection.
‘‘Problem instance’’ refers to a single assertion of a problem
by a reviewer for a specific patient. Similarly, ‘‘order instance’’
refers to a single clinical order of a specific patient, and
‘‘LOPP instance’’ refers to the linking of an order instance
to a problem instance.
Analysis
The main unit of analysis in this experiment was the simu-
lated order pick-list that was generated for each problem in-
stance that participated in at least one LOPP instance (i.e.,
each ‘‘linked problem’’ instance). The pick-list generation al-
gorithm that we employed caused a given order to be in-
cluded in the pick-list of a given problem instance if the
probability of linkage of order and problem (PLOP) equaled
or exceeded a specified threshold. We refer to this threshold
in the remainder of this paper as the ‘‘PLOP threshold.’’ We
computed the PLOP as the proportion of cases containing
the linked problem that also contained the LOPP. The
LOPPs of the problem instance whose pick-list was being
evaluated were not included in the pool for calculating its re-
spective PLOP.
We calculated recall, precision, and pick-list length for each
pick-list for the range of PLOP thresholds from 0 to 1 in incre-
ments of 0.1. Problems that had qualifiers were considered as
distinct entities from their base problem and from each other
(e.g., ‘‘rule out myocardial infarction’’ was considered to be
different than both ‘‘myocardial infarction’’ and ‘‘history of
myocardial infarction’’). For a given pick-list and its reference
standard, recall is defined as the proportion of orders in the
reference standard that are also in the pick-list. Precision is
defined as the proportion of orders in the pick-list that are
also in the reference standard. Every problem instance had
its own independent reference standard, namely, the set of or-
ders that the reviewer linked to that problem instance. Just as
a clinician in real time can enter inappropriate orders for
a problem, so could a reviewer have linked inappropriate or-
ders to a problem. We used the reference standards in lieu of
the orders that the clinician actually caring for a given patient
might have thought were indicated for a given problem
instance.
To summarize pick-list performance, we averaged recall, pre-
cision, and length of all pick-lists in the experiment at each
PLOP threshold. Precision for pick-lists with a recall of
0 was incalculable, so these pick-lists were not included in
the average for precision. We also computed cumulative fre-
quency distributions for recall, precision, and pick-list length
at each PLOP threshold.
Results
Demographics of Physician Reviewers
All except two of our reviewers had formal training
in internal medicine: an emergency medicine physician
with eight years of post-residency clinical experience
and a family practitioner with two years of post-residency
clinical experience. Of those trained in internal medicine,
one had one year of post-residency clinical experience, one
had two years of post-residency clinical experience, one
had seven years of post-residency clinical experience, two
were in the third year of their residency, one was in the sec-
ond year of her residency, one had completed 1½ years of
residency, and one had postponed his clinical training after
his internship. Five of these physicians were engaged in
formal medical informatics training at the time of data
collection.
Description of the Sample
A total of 137 cases met our inclusion criteria. The medical rec-
ords department was unable to provide us with the paper
medical records for nine cases, which were either missing or
in use for other purposes such as reimbursement coding.
We eliminated three cases for containing fewer than ten or-
ders. These cases had only two orders each. Table 1 shows
the ICD-9 admission diagnoses of the cases that were used
in this study.
Table 2 shows descriptive statistics about orders and re-
viewer-entered problems in the study. Where distributions
are normal, the means and 95% confidence intervals are
given; where distributions are skewed, the medians and
ranges are given.
Pick-list Information Retrieval Performance
Figure 1 shows average recall and precision through the
range of PLOP thresholds from 0 (right-most point) to 1
(left-most point). A PLOP threshold of 0 causes a given order
to be included in the pick-list if it is linked to any other in-
stance of the given problem in the experiment; a PLOP thresh-
old of 1 causes a given order to be included in the pick-list
only if it is linked to all other instances of the given problem
in the experiment. A point on the curve summarizes the re-
sults at a given PLOP threshold. It represents the estimated
precision of a randomly chosen pick-list that contained at
least one order and the estimated recall of a randomly chosen
pick-list regardless of whether it contained any orders.
Figure 2 shows average pick-list length as a function of
PLOP threshold. The cumulative frequency distributions of
recall, precision, and pick-list length as a function of PLOP
threshold are shown in Tables 3, 4, and 5, respectively.
Our experiment achieved a maximum average recall of .67 at
a PLOP threshold of 0 with a corresponding average precision
of .17 and average pick-list length of 31.22. This level of recall
means that on average, the pick-list contained 67% of the
Table 1 j Frequency Distribution of ICD-9
Admission Diagnoses in Sample
ICD-9 Diagnosis Frequency
Chest pain NOS 22
Syncope and collapse 19
Chest pain NEC 14
Congestive heart failure 10
Shortness of breath 9
Gastrointestinal hemorrhage NOS 9
Pneumonia, organism NOS 8
Asthma with acute exacerbation 8
Intermediate coronary syndrome 7
Cellulitis of leg 7
Abdominal pain, unspecified site 7
ICD-9 = International Classification of Diseases, Ninth Revision;
NEC = not elsewhere classified; NOS = not otherwise specified.

orders that the reviewing physician linked to a given problem
instance. This level of precision means that, on average, 17%
of the orders in the pick-list were among those that the re-
viewing physician linked to that problem in that case. As
the PLOP threshold was increased, the average precision in-
creased in an approximately linear fashion to a maximum
of .73 at a PLOP threshold of .9 with a corresponding average
recall of .09 and average pick-list length of .42.
Discussion
Order sets (or, more generally, order pick-lists) for CPOE
serve two purposes: (1) to increase clinician adherence to es-
tablished best practices and (2) to increase order entry speed.
The basic task of entering an order into a CPOE system is
composed of two subtasks: (1) retrieving a given order from
the set of all available orders in the system and (2) populating
that order’s attributes with the appropriate values. How effi-
ciently these subtasks can be performed may contribute to the
speed of order entry. Our probabilistic pick-list generation
process serves the second of the above-mentioned purposes
by hastening the first of the above-mentioned subtasks, and
it does this without the need for predefined order sets. We
preliminarily evaluated the usefulness of these probabilisti-
cally generated pick-lists in a laboratory setting by computing
the information retrieval measures of recall and precision.
With our pilot study, we have shown that our approach can
produce pick-lists with fair to good recall and precision.
The pick-list generation algorithm that we used in this exper-
iment produces pick-lists of orderable items but does not ad-
dress how to populate orders’ attributes. We realize that order
sets and order menus in conventional CPOE systems fre-
quently contain statically preconfigured orders and that
they can save clinicians time when entering orders. We re-
moved order attribute information in our experiment because
we did not want to conflate retrieval of an order class with
specification of its attributes. We can imagine methods for
CPOE systems to configure orders dynamically. One such
method might use a rule-based decision support system to
suggest appropriate dosing parameters for a given medica-
tion based on patient-specific data available in an EHR and
a medication knowledge base. Another such method might
suggest attributes for a given medication order based on
how those attributes have been populated when that medica-
tion has been ordered in the past, either for a population or for
a specific patient. This latter method is analogous to the pro-
cess of pick-list generation that we describe in this paper; the
method is simply applied to the attribute population subtask
of order entry rather than the order retrieval subtask.
Our results generally exhibited classic information retrieval
behavior,28
with precision increasing and recall decreasing
as the item inclusion criterion stringency (i.e., the PLOP
Table 2 j Descriptive Statistics about Sample
Item Value
No. of distinct orders in sample 433
Median no. of instances per order 2 (range: 1–120)
Average no. of orders per case 30.46 (95% CI: 28.79–32.12)
Average no. of linked order-
problem pairs per case
30.34 (95% CI: 28.68–32.00)
Median no. of instances per linked
order-problem pair
1 (range: 1–116)
Median no. of linked-to orders per
problem instance
2 (range: 1–25)
No. of distinct problems in
experiment
291
No. of distinct UMLS-coded
problems in experiment
288
% distinct problems in experiment
coded with UMLS
99.0%
No. of distinct linked-to problems
in experiment
213
Median no. of instances per linked-
to problem
1 (range: 1–120)
Average no. of linked problem
instances per case
7.35 (95% CI: 6.89–7.81)
CI = confidence interval; UMLS = Unified Medical Language
System.
F i g u r e 1. Average recall and precision of probabilistically
generated, problem-specific order pick-lists as a function of
probability of linkage of order and problem (PLOP) thresh-
old. Each data point represents the average recall and
precision at a different PLOP threshold, ranging from
0 (right-most point) to 1 (left-most point).
F i g u r e 2. Average length of probabilistically generated,
problem-specific order pick-lists as a function of probability
of linkage of order and problem threshold.

threshold) was increased. As expected, average recall reached
its maximum at the PLOP threshold of 0 and its minimum at
the PLOP threshold of 1. Although average precision reached
its minimum at the PLOP threshold of 0 as expected, it did not
reach its maximum at a PLOP threshold of 1. There was a flat-
tening of the curve at PLOP thresholds from 0.7 to 0.9 and
a precipitous drop in average precision at the highest PLOP
threshold value (indicated by the left-most segment in
Figure 1), where classic information retrieval behavior would
have predicted it to be at its maximum. We believe that these
artifacts were caused primarily by the presence of numerous
very low-frequency problems in our sample, but a thorough
discussion of this issue is beyond the scope of this paper.
Numerous controlled terminologies have been created spe-
cifically for coding problem lists,29–32
several of which are
UMLS source terminologies.31,33
SNOMED CTwas not yet in-
cluded in UMLS when we performed this experiment, but it
has recently been selected to be the standard problem list ter-
minology by the National Committee on Vital and Health
Statistics.34,35
We chose to use UMLS as a controlled terminol-
ogy in its own right to maximize concept coverage36
; how-
ever, any terminology with sufficient coverage of our
domain would have been suitable. Although others have re-
ported using UMLS as a problem list terminology,37–39
we
were aware that UMLS was not designed to be used as a con-
trolled terminology in its own right.
Among the terminology-related issues that we encountered,
problem coding was at times inconsistent across reviewers
and cases. For example, the reviewers used three different
UMLS concepts to represent type 2 diabetes: diabetes melli-
tus; diabetes, non–insulin-dependent; and diabetes. By choos-
ing to use UMLS as a terminology, we knowingly sacrificed
concept redundancy and ambiguity in exchange for concept
coverage. In the interest of user interface simplicity, we chose
not to display additional concept-specific information avail-
able through UMLS, which might have improved problem
coding consistency.
Even though problem coding inconsistency among reviewers,
possibly caused in part by using UMLS as a terminology, may
have led to decreased pick-list information retrieval perfor-
mance, the empiric nature of our probabilistic pick-list gener-
ation process makes it permissive of inconsistent problem
coding. So long as the system has previously ‘‘seen’’ a prob-
lem coded in a particular way, it would be able to generate
a pick-list, possibly allowing clinicians to locate an appropri-
ate set of problem-specific orders more easily than via con-
ventional CPOE systems’ approaches. While this feature
could be achieved by storing problems as free text, mapping
Table 3 j Cumulative Frequency Distribution of Recall by PLOP Threshold
Recall
PLOP Threshold
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Cumulative Frequency
0 0.24 0.27 0.35 0.39 0.41 0.48 0.55 0.60 0.67 0.76 0.92
0.1 0.24 0.27 0.35 0.39 0.42 0.49 0.56 0.61 0.68 0.88 0.92
0.2 0.25 0.29 0.38 0.42 0.46 0.53 0.61 0.64 0.72 0.90 0.93
0.3 0.26 0.30 0.40 0.46 0.50 0.56 0.63 0.67 0.74 0.91 0.94
0.4 0.28 0.33 0.46 0.52 0.56 0.62 0.68 0.72 0.78 0.92 0.95
0.5 0.34 0.43 0.56 0.61 0.65 0.69 0.76 0.78 0.87 0.94 0.96
0.6 0.35 0.45 0.58 0.64 0.67 0.71 0.80 0.82 0.91 0.94 0.96
0.7 0.38 0.51 0.63 0.69 0.71 0.76 0.85 0.87 0.91 0.94 0.96
0.8 0.42 0.58 0.68 0.73 0.76 0.80 0.86 0.88 0.92 0.94 0.96
0.9 0.45 0.62 0.72 0.76 0.78 0.81 0.87 0.89 0.92 0.94 0.96
1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
N 882 882 882 882 882 882 882 882 882 882 882
PLOP = probability of linkage of order and problem.
Table 4 j Cumulative Frequency Distribution of Precision by PLOP Threshold
Precision
PLOP Threshold
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0 0.12 0.15 0.24 0.26 0.26 0.23 0.24 0.21 0.23 0.21 0.41
0.1 0.45 0.25 0.25 0.26 0.26 0.24 0.24 0.21 0.23 0.21 0.41
0.2 0.73 0.43 0.33 0.33 0.29 0.26 0.26 0.22 0.25 0.23 0.44
0.3 0.84 0.55 0.42 0.35 0.31 0.28 0.28 0.25 0.28 0.24 0.47
0.4 0.90 0.69 0.50 0.42 0.36 0.31 0.31 0.26 0.30 0.27 0.53
0.5 0.97 0.86 0.68 0.52 0.44 0.40 0.33 0.29 0.32 0.29 0.59
0.6 0.97 0.93 0.71 0.53 0.45 0.41 0.35 0.30 0.32 0.29 0.59
0.7 0.98 0.97 0.75 0.58 0.48 0.44 0.37 0.33 0.33 0.30 0.59
0.8 0.98 0.98 0.83 0.64 0.50 0.46 0.38 0.34 0.36 0.30 0.59
0.9 0.98 0.98 0.84 0.68 0.55 0.48 0.41 0.36 0.40 0.30 0.59
1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
N 754 754 752 728 702 596 523 447 376 266 116

problems to a controlled terminology may lead to more con-
sistent referencing of problems and would also enable addi-
tional CPOE system features (e.g., problem-specific decision
support) that free-text problems do not directly support.
Our approach to probabilistic pick-lists circumvents some
of the cognitive difficulties that currently available CPOE sys-
tems impose through their dependence on users’ internaliza-
tion of a rigid system-embedded conceptual model.40
Implications of Implementing Probabilistic Pick-
lists in Real-World Computerized Provider Order
Entry Systems
Our probabilistic pick-list generation process requires a data-
base of LOPP instances, which PDOE provides. This is why
we consider probabilistic pick-list generation in lockstep
with PDOE, even though PDOE could exist without probabi-
listic pick-lists. We have heard anecdotal reports of health
care institutions that have begun requiring clinicians to ex-
plicitly link orders to problems in some manner. We believe
that this paper is the first to discuss the use of dynamic
CPOE pick-lists in general and probabilistic CPOE pick-lists
in particular.
We suggest that probabilistic CPOE pick-lists may comple-
ment rather than replace deterministic pick-lists such as order
sets. Our problem-driven approach to generating probabilis-
tic pick-lists offers certain advantages over deterministic pick-
lists, but it also introduces some potential disadvantages. The
main advantage of probabilistic pick-lists is that they can pro-
vide problem-specific pick-list functionality without the need
to predefine order sets. This might be especially useful for
rare problems that do not have order sets defined or in the
early stages of a CPOE system roll-out before many (or any)
order sets have been created. Probabilistic pick-lists may be
able to serve as ‘‘rough drafts’’ to hasten the construction
of deterministic order sets. We can also imagine hybrid
approaches whereby both deterministically and probabilisti-
cally selected orders are included in the same problem-
specific pick-list. A hybrid approach could simultaneously
leverage deterministic pick-lists’ encouragement of adher-
ence to established best practices and probabilistic pick-lists’
dynamic adaptation to changes in clinical practice.21
Independent of pick-list generation, PDOE itself may offer
benefits. One immediate potential benefit of PDOE is strong
encouragement of clinicians to maintain accurate and up-
to-date coded problem lists, which themselves have many
benefits.41,42
Explicitly linking orders to problems provides
a logical scaffold for organizing an EHR that would allow
automatic display of order-related information (e.g., active
orders, test results, order status, and medication administra-
tion record information) in the context of its relevant problem
and that tightly integrates problem-specific order entry with
problem-specific clinical documentation.43
The database of
LOPP instances created by PDOE would enable dynamic gen-
eration of order pick-lists that take into account factors such
as problem-specific ordering history by clinician, problem-
specific ordering history by patient, and interactions between
pick-lists for a patient’s problems.
A potential disadvantage of probabilistic pick-lists is that
they may encourage popular but inappropriate ordering
behavior. Such behavior includes errors of both commis-
sion and omission of primary orders and corollary orders.
Probabilistic pick-lists are only as good as the aggregate
thoughts and actions of the clinicians who previously en-
tered orders for a given problem. The orders on probabilistic
pick-lists might be popular but not adhere to established
best practices. This issue may be especially salient when
the most current best practice for a given problem departs
from historic best practice. For example, best-practice anti-
biotic selection for nosocomial infections changes as new
microbial antibiotic resistance patterns emerge. If pick-lists
are based solely on past LOPP frequency, as they are when
using the simple pick-list generation algorithm that we de-
scribe in this paper, then new best practice orders for a given
problem might take a long time to get incorporated into its
pick-list (i.e., the system might have to encounter many
LOPP instances with the new order), depending on the given
PLOP threshold and the frequency of the given problem.
Further, probabilistically included pick-list items in potential
‘‘hybrid’’ pick-lists will need to be reconciled with determin-
istically included pick-list items. We will need to develop
more sophisticated pick-list generation algorithms to address
these and other issues.
As aggregate clinician ordering behavior changes, probabilis-
tic pick-lists would also change. The cognitive burden associ-
ated with dynamically generated and evolving pick-lists is
another disadvantage.44
Building an intuitive user interface
for probabilistic CPOE pick-lists will thus require especially
careful design.
Problem-driven order entry will cost clinicians time to enter
coded problem lists. Even though our anecdotal experience
Table 5 j Cumulative Frequency Distribution of Pick-list Length by PLOP Threshold
Pick-list Length
PLOP Threshold
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0 0.15 0.15 0.15 0.17 0.20 0.32 0.41 0.49 0.57 0.70 0.87
2 0.24 0.28 0.52 0.63 0.72 0.78 0.82 0.84 0.84 0.97 0.97
4 0.35 0.42 0.66 0.73 0.83 0.84 0.85 0.86 0.86 0.99 0.99
8 0.46 0.55 0.82 0.85 0.86 0.86 0.86 0.86 1.00 1.00 1.00
16 0.59 0.79 0.85 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00
32 0.70 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
64 0.82 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
128 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
N 882 882 882 882 882 882 882 882 882 882 882

with the reviewers in this study did not suggest that this step
is highly burdensome, mapping free-text problems to a con-
trolled terminology may be able to be performed automati-
cally45
to save time. The added step of placing an order in
the context of a problem may also add time to the order
entry process. This might be especially burdensome for
‘‘one-offs’’ or very short ordering sessions,13
yet it is also pos-
sible that PDOE may decrease time spent on non-order entry
tasks such as searching for patient-specific clinical infor-
mation. Such effects have been described for conventional
CPOE and EHR systems.8,9,12,13
Study Limitations
Our study is limited by both sample and methodology. Our
sample included only 120 cases. All were inpatient cases
from the domain of general internal medicine and only cov-
ered the time period of the first 24 hours after admission.
We selected the cases in our sample by frequency of ICD-9 ad-
mission diagnosis. While our sample included only 11 com-
mon ICD-9 admission diagnoses, the diversity of problems
represented in our sample was substantially greater, as indi-
cated by the 291 distinct problems that our reviewers entered
in this experiment. Because our sample was small and non-
random, however, our results may not be generalizable to
problems and orders associated with less common internal
medicine ICD-9 admission diagnoses or to problems and or-
ders more commonly encountered in other medical and sur-
gical domains (e.g., pre- and post-procedure orders).
Our sample cannot be used to study how well our probabilis-
tic pick-list generation process might perform over patients’
entire hospital stays or in the outpatient setting because it en-
compassed only the 24-hour post-admission time period. Our
sample also cannot be used for comprehensively studying
how probabilistic pick-lists’ information retrieval perfor-
mance changes over time as the database of LOPP instances
grows because our set of cases was not chronologically
continuous.
We designed our study as a retrospective validation. For lo-
gistical reasons, the critical steps of problem list creation
and order-problem linking were performed by reviewers
rather than the clinicians who actually cared for the patients
whose cases we studied. In a few cases, the H & P was written
by a clinician other than the clinician who entered the orders;
the H & P by the ordering clinician might have contained
greater or less detail and different interpretations of clinical
data. Both of these issues may have introduced noise in iden-
tifying problems and linking orders to problems. We believe
that a retrospective validation was a reasonable study design
given the youth of the process that we sought to study, but we
acknowledge that pick-list performance in the real world
might vary from what we measured.
We averaged recall, precision, and pick-list length at each
PLOP threshold to generate the data points plotted in
Figures 1 and 2. Averaging allowed us to summarize the over-
all experimental results in a single graph, but in doing so, it
necessarily obscured information about the distribution of
data at each PLOP threshold. Since our data were not nor-
mally distributed, we could not conveniently estimate param-
eters by calculating confidence intervals or the like. This
problem is well-known in the information retrieval commu-
nity,46
who routinely average recall and precision47
despite
it. We present cumulative frequency distributions in lieu of
confidence intervals.46
Our experimental design limited the metrics that we could
employ to study the usefulness of probabilistic pick-lists.
We described above our reasoning for choosing the informa-
tion retrieval metrics of recall and precision. Before defini-
tively concluding the usefulness of PDOE and probabilistic
pick-list generation, other factors must be evaluated. One im-
portant such factor is the actual time required to enter orders
using probabilistic pick-lists with PDOE. Another such factor
is how well orders placed via probabilistic pick-lists reflect es-
tablished best practices. Other important-to-evaluate factors
are the external benefits of PDOE such as its effect on problem
list maintenance and its effect on the efficiency of performing
other clinical tasks.
Our analysis aggregated results only at the level of the indi-
vidual pick-lists. Pick-list performance may vary by patient,
ordering session, clinician type, individual clinician, problem,
problem type, etc. Future studies, which will require larger,
more diverse, and ideally prospective samples, would benefit
from analysis at other levels of aggregation.
Conclusions
We conclude that it is preliminarily feasible to use a simple
probabilistic algorithm to generate problem-specific CPOE
pick-lists from a database of explicitly linked orders and
problems. Further research is necessary to determine the use-
fulness of this approach in real-world settings.
References j
1. 14th Annual HIMSS Leadership Survey sponsored by Superior
Consultant Company: Health care CIO Survey Final Report.
[cited 2004 October 27]. Available from: http://www.himss.
org/2003survey/docs/Healthcare_CIO_final_report.pdf.
2. Goldsmith J, Blumenthal D, Rishel W. Federal health informa-
tion policy: a case of arrested development. Health Aff. 2003;
22:44–55.
3. Kohn LT, Corrigan JM, Donaldson MS (eds). To err is human:
building a safer health system. Washington, DC: National
Academy Press, 2000.
4. Metzger J, Turisco F. Computerized physician order entry: a look
at the vendor marketplace and getting started. [cited 2004
October 27]. Available from: http://www.leapfroggroup.org/
media/file/Leapfrog-CPO_Guide.pdf.
5. McDonald CJ, Overhage JM, Mamlin BW, Dexter PD, Tierney
WM. Physicians, information technology, and health care sys-
tems: a journey, not a destination. J Am Med Inform Assoc.
2004;11:121–4.
6. Rosenbloom ST, Grande J, Geissbuhler A, Miller RA. Experience
in implementing inpatient clinical note capture via a provider or-
der entry system. J Am Med Inform Assoc. 2004;11:310–5.
7. Jao C, Hier DB, Wei W. Simulating a problem list decision sup-
port system: can CPOE help maintain the problem list? Medinfo.
2004;2004(CD):1666.
8. Overhage JM, Perkins S, Tierney WM, McDonald CJ. Controlled
trial of direct physician order entry: effects on physicians’ time
utilization in ambulatory primary care internal medicine prac-
tices. J Am Med Inform Assoc. 2001;8:361–71.
9. Rodriguez NJ, Murillo V, Borges JA, Ortiz J, Sands DZ. A usabil-
ity study of physicians interaction with a paper-based patient rec-
ord system and a graphical-based electronic patient record
system. Proc AMIA Symp. 2002;667–71.
10. Pizziferri L, Kittler A, Volk L, Honour M, Gupta S, Wang S, et al.
Primary care physician time utilization before and after

implementation of an electronic health record: a time-motion
study. J Biomed Inform. In press 2005. Epub 2005 Jan 31.
11. Massaro T. Introducing physician order entry at a major aca-
demic medical center: I. Impact on organizational culture and
behavior. Acad Med. 1993;68:20–5.
12. Shu K, Boyle D, Spurr C, Horsky J, Heiman H, O’Connor P, et al.
Comparison of time spent writing orders on paper with comput-
erized physician order entry. Medinfo. 2001;10:1207–11.
13. Bates DW, Boyle DL, Teich JM. Impact of computerized physi-
cian order entry on physician time. Proc Annu Symp Comput
Appl Med Care. 1994;996.
14. Tierney WM, Miller ME, Overhage JM, McDonald CJ. Physician
inpatient order writing on microcomputer workstations. Effects
on resource utilization. JAMA. 1993;269:379–83.
15. Ahmad A, Teater P, Bentley TD, Kuehn L, Kumar RR, Thomas A,
et al. Key attributes of a successful physician order entry system
implementation in a multi-hospital environment. J Am Med
Inform Assoc. 2002;9:16–24.
16. Ash JS, Stavri PZ, Kuperman GJ. A consensus statement on con-
siderations for a successful CPOE implementation. J Am Med
Inform Assoc. 2003;10:229–34.
17. Lovis C, Chapko MK, Martin DP, Payne TH, Baud RH, Hoey PJ,
et al. Evaluation of a command-line parser-based order entry
pathway for the Department of Veterans Affairs electronic pa-
tient record. J Am Med Inform Assoc. 2001;8:486–98.
18. Payne TH, Hoey PJ, Nichol P, Lovis C. Preparation and use of
preconstructed orders, order sets, and order menus in a comput-
erized provider order entry system. J Am Med Inform Assoc.
2003;10:322–9.
19. Ash JS, Gorman PN, Lavelle M, Payne TH, Massaro TA, Frantz
GL, et al. A cross-site qualitative study of physician order entry.
J Am Med Inform Assoc. 2003;10:188–200.
20. Thomas SM, Davis DC. The characteristics of personal order sets
in a computerized physician order entry system at a community
hospital. Proc AMIA Symp. 2003;1031.
21. Bates DW, Kuperman GJ, Wang S, Gandhi T, Kittler A, Volk L,
et al. Ten commandments for effective clinical decision support:
making the practice of evidence-based medicine a reality. J Am
Med Inform Assoc. 2003;10:523–30.
22. Weiner M, Gress T, Thiemann DR, Jenckes M, Reel SL, Mandell
SF, et al. Contrasting views of physicians and nurses about an
inpatient computer-based provider order-entry system. J Am
Med Inform Assoc. 1999;6:234–44.
23. Campbell JR, Carpenter P, Sneiderman C, Cohn S, Chute CG, War-
ren J. Phase II evaluation of clinical coding schemes: completeness,
taxonomy, mapping, definitions, and clarity. CPRI Work Group on
Codes and Structures. J Am Med Inform Assoc. 1997;4:238–51.
24. Chute CG, Cohn SP, Campbell KE, Oliver DE, Campbell JR.
The content coverage of clinical classifications. For The Com-
puter-Based Patient Record Institute’s Work Group on Codes &
Structures. J Am Med Inform Assoc. 1996;3:224–33.
25. Wagner M, Bankowitz R, McNeil M, Challinor S, Jankosky J,
Miller R. The diagnostic importance of the history and physical
examination as determined by the use of a medical decision sup-
port system. Symp Comput Appl Med Care. 1989;139–44.
26. Humphreys BL, Lindberg DA, Schoolman HM, Barnett GO. The
Unified Medical Language System: an informatics research col-
laboration. J Am Med Inform Assoc. 1998;5:1–11.
27. UMLS Knowledge Sources. January Release 2003AA Documenta-
tion. 14th ed. Bethesda (MD): National Library of Medicine; 2003.
28. Hersh WR. Information retrieval: a health and biomedical per-
spective. 2nd ed. New York: Springer-Verlag; 2003.
29. Elkin PL, Mohr DN, Tuttle MS, Cole WG, Atkin GE, Keck K,
et al. Standardized problem list generation, utilizing the
Mayo canonical vocabulary embedded within the Unified
Medical Language System. Proc AMIA Annu Fall Symp.
1997;500–4.
30. Chute CG, Elkin PL, Fenton SH, Atkin GE. A clinical terminol-
ogy in the post modern era: pragmatic problem list develop-
ment. Proc AMIA Symp. 1998;795–9.
31. Brown SH, Miller RA, Camp HN, Guise DA, Walker HK. Empir-
ical derivation of an electronic clinically useful problem state-
ment system. Ann Intern Med. 1999;131:117–26.
32. Hales JW, Schoeffler KM, Kessler DP. Extracting medical knowl-
edge for a coded problem list vocabulary from the UMLS
Knowledge Sources. Proc AMIA Symp. 1998;275–9.
33. UMLS Metathesaurus Fact Sheet. [cited 2005 January 5]. Avail-
able from: http://www.nlm.nih.gov/research/umls/sources_
by_categories.html.
34. Wasserman H, Wang J. An applied evaluation of SNOMED CT
as a clinical vocabulary for the computerized diagnosis and
problem list. AMIA Annu Symp Proc. 2003;699–703.
35. Consolidated Health Informatics Standards Adoption Recom-
mendation: Diagnosis and Problem List. [cited 2005 January
5]. Available from: http://www.whitehouse.gov/omb/egov/
downloads/dxandprob_full_public.doc.
36. Campbell JR, Payne TH. A comparison of four schemes for cod-
ification of problem lists. Proc Annu Symp Comput Appl Med
Care. 1994;201–5.
37. Johnson KB, George EB. The rubber meets the road: integrating
the Unified Medical Language System Knowledge Source Server
into the computer-based patient record. Proc AMIA Annu Fall
Symp. 1997;17–21.
38. Elkin PL, Tuttle M, Keck K, Campbell K, Atkin G, Chute CG. The
role of compositionality in standardized problem list generation.
Medinfo. 1998;9:660–4.
39. Goldberg H, Goldsmith D, Law V, Keck K, Tuttle M, Safran C.
An evaluation of UMLS as a controlled terminology for the
Problem List Toolkit. Medinfo. 1998;9:609–12.
40. Horsky J, Kaufman DR, Oppenheim MI, Patel VL. A framework
for analyzing the cognitive complexity of computer-assisted clin-
ical ordering. J Biomed Inform. 2003;36:4–22.
41. Weed LL. Medical records that guide and teach. N Engl J Med.
1968;12:593–600, 652–7.
42. Safran C. Searching for answers on a clinical information-
system. Methods Inf Med. 1995;34:79–84.
43. Rothschild AS, Lehmann HP. Problem-driven order entry: an in-
troduction. Medinfo. 2004;2004(CD):1838.
44. Poon AD, Fagan LM, Shortliffe EH. The PEN-Ivory project: ex-
ploring user-interface design for the selection of items from large
controlled vocabularies of medicine. J Am Med Inform Assoc.
1996;3:168–83.
45. Aronson AR. Effective mapping of biomedical text to the UMLS
Metathesaurus: the MetaMap program. Proc AMIA Symp.
2001;17–21.
46. van Rijsbergen CJ. Information retrieval. London: Butterworths;
1979.
47. Voorhees E. Overview of TREC 2003. [cited 2005 January 7].
Available from: http://trec.nist.gov/pubs/trec12/papers/
OVERVIEW.12.pdf.

PDOE

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (8)

Similar to PDOE

Similar to PDOE (20)

PDOE