1. Journal of Experimental Psychology: General Copyright 1998 by the American Psychological Association, Inc.
1998. IfcL 127. No. 4, 398-415 0096-3445/9№ 00
Does Consistent Scene Context Facilitate Object Perception?
Andrew Hollingworth and John M. Henderson
Michigan State University
The conclusion that scene knowledge interacts with object perception depends on evidence
that object detection is facilitated by consistent scene context. Experiment 1 replicated the
I. Biederman, R. J. Mezzanotte, and J. C. Rabinowitz (1982) object-detection paradigm.
Detection performance was higher for semantically consistent versus inconsistent objects.
However, when the paradigm was modified to control for response bias (Experiments 2 and 3)
or when response bias was eliminated by means of a forced-choice procedure (Experiment 4),
no such advantage obtained. When an additional source of biasing information was eliminated
by presenting the object label after the scene (Experiments 3 and 4), there was either no effect
of consistency (Experiment 4) or an inconsistent object advantage (Experiment 3). These
results suggest that object perception is not facilitated by consistent scene context.
To what degree is perception affected by our knowledge directly addresses the influence of our knowledge and
of the world? This question has historically been central in beliefs about meaningful relationships in the world on our
theories of perception and cognition. For example, the perception of the visual environment.
apparent role of semantic constraint in visual recognition led For the purposes of this study, we define object identifica-
to the emergence of so-called New Look psychology tion from the perspective of current computational theories
(Bruner, 1957,1973). In cognitive psychology, the effects of (Biederman, 1987; Bulthoff, Edelman, & Tarr, 1995; Marr,
contextual expectations on perception have been couched in 1982; Marr & Nishihara, 1978; Ullman, 1996). At a general
terms of debates between bottom-up versus top-down pat- level of description, these theories assume two processing
tern recognition (Neisser, 1967) and modular versus interac- stages in object identification. First, the retinal image is
tive perception (e.g., Fodor, 1983; Pylyshyn, 1980; Rumel- transformed into a perceptual description that is compatible
hart, McClelland, & the PDF Research Group, 1986). More with a set of memory descriptions. This first stage can be
recently, the effect of higher-level knowledge on perception further broken down into two sub-stages, an early stage of
has become critical in discussions of the role of re-entrant visual analysis that translates the current pattern of retinal
neural pathways in the early cortical processing of visual stimulation into perceptual primitives, and an additional
stimulation (Barlow, 1994; Churchland, Ramachandran, & stage that uses these primitives to produce descriptions of
Sejnowski, 1994; Kosslyn, 1994; Mumford, 1994). In the
the object tokens in the scene. Second, object descriptions
present study, we explored the following version of the
are matched to stored long-term memory descriptions of
context question: How is the identification of a visual object
object types, leading to entry-level recognition. When a
affected by the meaning of the real-world scene in which that
match is found, identification has occurred, and information
object appears? This question is important because it
stored hi memory about that object type, such as its identity,
whether it is good to eat, and so on becomes available. In the
Andrew Hollingworth submitted this work as part of the following experiments, we will consider the activation of an
requirements for his master of arts degree in psychology at entry-level label for an object stimulus as evidence of the
Michigan State University. This research was supported by a successful completion of the matching stage of object
National Science Foundation Graduate Fellowship to Andrew identification.
Hollingworth, by U.S. Army Research Office Grant DAAH04-94- The hypothesis we set out to test was that the identifica-
G-0404, and by National Science Foundation Grant SBR 96-
tion of a real-world object is facilitated when that object is
17274. We are solely responsible for the contents of this article,
semantically consistent rather than inconsistent with the
which should not be construed as an official U.S. Department of the
Army position, policy, or decision. scene in which it appears (Biederman, 1981; Biederman,
We would like to thank the master's committee of Tom Carr, Mezzanotte, & Rabinowitz, 1982; Boyce & Pollatsek, 1992;
Fernanda Ferreira, and Rose Zacks for their helpful discussions of Boyce, Pollatsek, & Rayner, 1989; Friedman, 1979; Koss-
the research and for their comments on a draft of this article. We lyn, 1994; Metzger & Antes, 1983; Palmer, 1975a; Ullman,
would also like to thank Peter De Graef for his discussions of the 1996; see Henderson & Hollingworth, in press, for a
research, Sandy Pollatsek and Johan Lauwereyns for their com- review). The strongest evidence supporting the view that
ments on a draft of the article, and Gary Schrock for his technical consistent scene context facilitates object identification
assistance.
comes from the object-detection paradigm introduced by
Correspondence concerning this article should be addressed to
Biederman and colleagues (Biederman, 1981; Biederman et
Andrew Hollingworth or John M. Henderson, Department of
Psychology, 129 Psychology Research Building, Michigan State al., 1982). In this paradigm, participants were asked to
University, East Lansing, Michigan 48824—1117. Electronic mail determine whether & pre-specified object appeared within a
may be sent to andrew@eyelab.msu.edu orjohn@eyelab.msu.edu. briefly presented scene at a particular location. During each
398
2. OBJECT PERCEPTION IN SCENES 399
trial, a label naming an object was presented until the on the finding that detection sensitivity (d1) was higher hi
participant was ready to continue, followed by a line- base conditions than in violation conditions. This sensitivity
drawing of a natural scene presented for 150 ms, followed result has been replicated by Boyce et al. (1989) using a
by a pattern mask with an embedded location cue. The similar object-detection paradigm.
pattern mask remained on the screen until the participant The results of the Biederman et al. (1982) study have led
pressed one of two buttons to indicate whether the object to two general conclusions about the perception of objects in
named by the label had or had not appeared in the scene at natural scenes. First, scene meaning, including information
the cued location.1 In target-present trials, the target label about the semantic relationship between scene and object
named the cued object. In catch trials, the target label named types, can be accessed very quickly. Such early activation is
an object that did not appear in the scene. We will refer to a necessary condition for context effects, as contextual
this paradigm as the original object-detection paradigm. constraints must be active early enough to influence the
The primary contextual manipulation in the original perception of object stimuli. This conclusion is supported by
object-detection paradigm was the relationship between the a number of studies demonstrating that the information
object presented at the cued location (the cued object) and necessary to identify natural scenes can be obtained in less
the scene in which that object appeared. Base scenes than 150 ms (Antes, Penland, & Metzger, 1981; Biederman,
1972; Biederman, Glass, & Stacy, 1973; Loftus, Nelson, &
contained a cued object that was consistent with the scene,
ICallman, 1983; Potter, 1976; Schyns & Oliva, 1994).
and the scene contained no other objects that violated scene
Second, stored knowledge about scenes and the objects
context. Violation scenes contained a cued object that was
likely to appear in them can be used to facilitate the
inconsistent with the scene along one or more dimensions,
construction of perceptual descriptions of consistent objects.
including episodic probability, position, size, support, and
Taken together, these two hypotheses combine to form the
interposition (whether the object occluded objects behind it
perceptual schema model of scene context effects (Bieder-
or was transparent). For the purposes of this article, we will
man, 1981; Biederman et al., 1982; Palmer, 1975b; see
focus on cases of probability violation (i.e., the semantic
Henderson, 1992, for further discussion). The perceptual
consistency between object and scene). The most conserva-
schema model proposes that the stored representation of a
tive hypothesis regarding the information contained in a
scene type contains information about the objects that form
scene concept is that it specifies the object types typically
that type. The early activation of this information can be
found in the scene (Mandler & Johnson, 1976; Mandler & used to facilitate the perceptual analysis (e.g., the encoding
Parker, 1976). Therefore, manipulations of object probabil- of features or generation of a perceptual description) of
ity provide the most direct means to investigate the influence objects that are consistent with the semantic constraints
of scene meaning on object perception. imposed by the scene.
Biederman et al. (1982) found that detection performance In addition to the perceptual schema model, two other
was best when the cued object did not violate any of the models of scene and object processing can account for the
constraints imposed by scene meaning. They reported poorer facilitated detection of consistent objects in scenes. First, a
performance across all violation dimensions, with com- priming model of scene context effects (Friedman, 1979;
pound violations (e.g., probability and support) producing Kosslyn, 1994; Palmer, 1975a) places the locus of contex-
even greater decrements. Importantly, violations of semantic tual influence at the matching stage of object identification,
relationships were found to be as disruptive as violations of when the perceptual description of an object token is
structural relationships (e.g., support or interposition), sug- compared to long-term memory representations of object
gesting that semantic relationships can be accessed very types. According to the priming model, the recognition of a
rapidly and can then interact with the initial perceptual scene serves to prime the stored representations of object
analysis of an object. The poorer performance in violation types consistent with that scene (i.e., the activation levels of
conditions held across percentage correct performance, stored, consistent object representations are raised closer to
sensitivity (d1), and reaction time. These measures, however, a threshold value). As a result, relatively less perceptual
are not equally valid for investigating the influence of scene information needs to be encoded to bring a stored, consistent
context on object perception. Reaction time in that study
cannot be taken as strong evidence for the facilitated
detection of consistent objects, as there were a significant 1
The following terminology has been used. The object named by
number of errors, causing more than 30% of the trials to be the label appearing before the scene has been referred to as the
excluded from the analysis. In general, it is not clear that target object, the label itself as the target label, and the object
reaction time is a good measure of object perception presented at the cued location as the cued object.
2
processes, as it may be influenced by post-identification Boyce and PoIIatsek (1992) used a paradigm in which, after a
factors, such as response generation (Henderson, 1992).2 In scene had appeared on the screen, a single object wiggled, and
participants were required to make an eye movement to the object
addition, percentage correct performance in target-present
and name it as quickly as possible. They found shorter naming
trials does not provide a reliable measure of object-detection
latencies for consistent versus inconsistent objects, which they
performance, as participants demonstrated significant re- interpreted as support for the view that consistent scene context
sponse biases that varied with the consistency of the target facilitates object perception. As with the Biederman et al. (1982)
label with the scene. Thus, the conclusion that consistent experiment, however, it is not clear that naming latency is an
scene context facilitates object perception rests most firmly appropriate measure of ease of object identification.
3. 400 HOLLINGWORTH AND HENDERSON
object representation to the threshold value indicating that a Table 1
match has been found.3 Summary of the Target-Present and Catch Trial Design in
Second, facilitated detection of consistent objects in Biederman et al. (1982, Experiment 1) and in Experiments
scenes can be accounted for by an interactive activation 1—3 for a Sample Trial Presenting a Farmyard Scene
model similar to that proposed by McClelland and Rumel-
Semantic consistency manipulation
hart (1981) for word and letter recognition (Boyce &
Pollatsek, 1992; Boyce etal., 1989; Metzger& Antes, 1983). Inconsistent
Consistent (probability
In this model, scenes correspond to the word level in the violation)
Trial (base)
network and objects to the letter level. These two levels
mutually constrain each other, facilitating the perception of Biederman et al. (1982)
objects consistent with scene meaning and inhibiting the Target-present
perception of inconsistent objects. In addition, partial activa- Cued object chicken mixer
tion at the object level could act to constrain the encoding of Target label "chicken" "mixer"
Catch
perceptual features consistent with that object type, though
Cued object Pig mixer
no interactive activation model of scene context effects has,
Target label 70% consistent 70% consistent
as yet, specifically included that level of interaction. ("horse"), 30% ("horse"), 30%
Each of these models of the influence of scene knowledge inconsistent inconsistent
on object perception predicts that perception of an object ("television") ("television")
should be facilitated when that object is consistent with the Experiment 1
scene in which it appears. We will therefore refer to them as
Target-present
contextual facilitation models of object perception in scenes.
Cued object chicken mixer
Target label "chicken" "mixer"
Concerns With the Object-Detection Paradigm Catch
Cued object chicken mixer
Support for contextual facilitation models rests almost Target label 50% consistent 50% consistent
("horse"), 50% ("horse"), 50%
entirely on results from object-detection experiments that
inconsistent inconsistent
show detection benefits for consistent objects versus incon- ("television") ("television")
sistent objects under brief presentation conditions (Bieder-
Experiments 2 and 3
man et al., 1982; Boyce et al., 1989). However, a number of
general concerns have been raised regarding the original Target-present
object-detection paradigm (De Graef, Christiaens, & Cued object chicken mixer
d'Ydewalle, 1990; De Graef & d'Ydewalle, 1995; Hender- Target label "chicken" "mixer"
Catch
son, 1992). These concerns revolve around two central
Cued object mixer chicken
issues. First, the original object-detection paradigm may not "mixer"
Target label "chicken"
have adequately controlled participant response bias. Sec-
ond, the presentation of the target label prior to the scene and
the location cue following the scene may have provided
additional sources of information that influenced detection
should lead to higher false-alarm rates in consistent target
performance.
label catch trials and lower false-alarm rates in inconsistent
target label catch trials. Biederman et al. found precisely this
Catch Trial Design and the Calculation of Sensitivity effect: False-alarm rates were higher when the target label
In the original object-detection paradigm (Biederman et
al., 1982), detection sensitivity (d') was calculated using the 3
Friedman (1979) recorded eye movements while participants
percentage correct rate in target-present trials (the hit rate) viewed scenes in preparation for a difficult memory test. The
and the error rate in catch trials (the false-alarm rate). Table duration of the first fixation (i.e., the total duration the eyes were
1 summarizes the design of the target-present and catch trials fixated on the object the first time it was entered, now referred to as
for Biederman et al. (1982, Experiment 1). One concern with first pass gaze duration) was shorter for consistent versus inconsis-
this design is that the method of calculating detection tent objects, a result that Friedman (1979) interpreted as support for
sensitivity did not adequately control for participants' bias to a priming model of scene context effects. The difference in fixation
respond "yes" more often when the catch trial label was duration was more than 300 ms, however, which is unlikely to be
semantically consistent versus semantically inconsistent explained by perceptual factors alone (Biederman et al., 1982;
Henderson, 1992). For example, participants may have looked
with the subsequent scene. If participants attempt to detect
longer at a semantically inconsistent object to integrate the already
an absent horse in a farmyard, for example, they should be
identified object into a conceptual representation in which it was
biased to respond "yes," as a horse is generally likely to incongruous. In addition, the instructions to prepare for a difficult
appear there. If participants attempt to detect an absent memory test may have caused participants to consciously create an
television in the farmyard, however, they should show less association between the inconsistent object and the scene, leading
bias to respond "yes," as there is contextual information to longer fixation durations (Biederman et al., 1982; Henderson,
arguing against a television's presence. This pattern of bias 1992).
4. OBJECT PERCEPTION IN SCENES 401
was semantically consistent versus inconsistent with the 1989) have not met this criterion (see De Graef & d'Ydewalle,
subsequent scene. 1995), and thus the sensitivity measures in those experi-
To eliminate this sort of bias from measures of object- ments must be interpreted with caution.
detection performance, detection sensitivity in the base
condition should be calculated using the hit rate in trials Target Label Preview
when the target label was consistent with the scene and the
false-alarm rate in trials when the target label was also The second concern with the original object-detection
consistent with the scene. Similarly, detection sensitivity for paradigm involves the presentation of the target label before
target objects in the probability violation condition should scene viewing. There are two potential problems with such a
be calculated using the hit rate in trials when the target label design. First, participants may have used the identity of the
was inconsistent with the scene and the false-alarm rate in target object to guide their search in the subsequent scene
trials when the target label was also inconsistent with the (Henderson, 1992). This strategy could have lead to a
scene. The Biederman et al. (1982) study, however, did not consistent object-detection advantage if the spatial positions
calculate sensitivity in this manner: The false-alarm rates in of consistent objects were more predictable than the posi-
both base and violation conditions averaged across catch tions of inconsistent objects. For example, participants may
trials on which the target label was consistent and inconsis- have known where to find a horse in a farmyard but would
tent with the subsequent scene (see Table 1). Thus, differ- not necessarily have known where to find a television in a
ences in response bias as a function of target label consis- farmyard. Supporting this intuition, we have recently demon-
tency were not necessarily controlled in the d' measure. strated that viewers can more quickly locate semantically
It is important to note that this method of computing d' consistent versus inconsistent objects in a free-viewing,
may have overestimated the sensitivity rate in base condi- visual search task (Henderson, Weeks, & Hollingworth, in
tions and underestimated the sensitivity rate in violation press). The information provided by the target label preview
conditions. As discussed previously, the false-alarm rate was may have been particularly helpful in the Boyce et al. (1989)
experiments. The scenes used in that study contained only
higher when the catch trial label was consistent versus
five discrete objects presented against a very simple back-
inconsistent with the scene. By averaging across these two
ground. Thus, participants had to search through only a
catch trial conditions for the purpose of calculating d',
small number of objects during scene presentation. If the
sensitivity in the base condition may have been artificially
positions of consistent objects were more predictable than
raised because the averaged false-alarm rate was lower than
the positions of inconsistent objects, an advantage for the
the false-alarm rate for consistent target label catch trials
detection of consistent objects would be expected.
alone. Similarly, sensitivity in the probability violation
Second, the preview of the target label (e.g., "mixer")
condition may have been artificially lowered because the
may have led participants to expect the subsequent presenta-
averaged false-alarm rate was higher than the false-alarm
tion of a certain scene type (in this case, a kitchen). The
rate for inconsistent target label catch trials alone. Thus, the
Biederman et al. (1982) method of calculating d' very likely generation of such expectations would have been particu-
larly likely in the Biederman et al. (1982) study, because
produced an exaggerated sensitivity advantage for the
detection of consistent objects.4 70% of the target labels were consistent with the subsequent
scene. When the target label specified an object inconsistent
A second concern with the design of the original object-
detection paradigm is that participants did not attempt to with the subsequent scene, however, the discontinuity
between the expected and presented scene may have inter-
detect the same object in the catch trials as in the correspond-
ing target-present trials. Thus, detection sensitivity for a fered with perceptual processing, to the detriment of incon-
sistent object detection.
particular object was based on the correct detection of that
object in the scene and the false detection of an entirely
different object that was not in the scene. Signal detection Location Cue
theory, however, requires that sensitivity measures be calcu-
The final concern regarding the original object-detection
lated using the correct detection of a particular signal when it
paradigm is that the location cue may have provided
is present and the false detection of the same signal when it
information useful to post-perceptual guessing. The position
is not present (Green & Swets, 1966). For example, suppose
of the cue relative to the scene could have provided evidence
that the consistent cued object appearing in a kitchen scene
concerning the types of objects likely to be found at that
were a stove, and the consistent target label in the catch trial
position (Henderson, 1992). A cue marking a position high
were "bread box." The hit rate would reflect the correct
in a living room scene, for example, would constrain the
detection of a stove. The false-alarm rate, however, would
objects that could have appeared there to clocks, pictures,
reflect the false detection of a bread box. Because a bread
curtains, etc. Such information would be useful when
box is less likely to appear in a kitchen scene than a stove,
attempting to detect a consistent object but not as useful
the false-alarm rate would be artificially low, and the
resulting sensitivity estimate would be artificially high. This
example demonstrates the importance of requiring partici- 4
It is important to note that the Boyce et al. (1989) replication of
pants to detect the same object on corresponding target- this paradigm is not subject to this criticism, as the consistency of
present and catch trials. The object-detection experiments the catch trial labels was controlled. However, the Boyce et al.
conducted to date (Biederman et al., 1982; Boyce et al., study is subject to the remaining criticisms.
5. 402 HOLLINGWORTH AND HENDERSON
when attempting to detect an inconsistent object, be-
cause the spatial position of an inconsistent object is less
predictable.
response
The Present Study
Results from object-detection experiments are the pri-
mary support for the widely held view that consistent scene
context facilitates object identification. It is therefore impor-
tant to assess the previous concerns experimentally. To do
this, we conducted four experiments. Experiment 1 at-
tempted to replicate the original Biederman et al. (1982)
paradigm. Experiment 2 tested whether the consistent object
advantage in the original object-detection paradigm was due
to the inadequate control of response bias. The original
paradigm was modified so that participants attempted to
detect the same object on corresponding target-present and
catch trials. Experiment 3 investigated the influence of
presenting the target label before the scene by manipulating
whether the label appeared before or after the scene. In
addition, the location cue was eliminated from the third
experiment to investigate whether its presence may have
affected performance. Experiment 4 employed a forced-
choice procedure to investigate the influence of scene
context on object perception independently of response bias.
Figure I. Schematic illustration of a trial in Experiment 1.
In all four experiments, contextual facilitation models
predict better detection of objects that are consistent with a
scene versus objects that are inconsistent, because they cued object. In the catch trials, the target label named an
propose that stored scene knowledge of the types of objects object that did not appear in the scene. As in the original
likely to be found in a scene facilitates the identification of object-detection paradigm, there was no relationship be-
those objects. tween the semantic consistency of the target label on
5
We made a number of modifications to the original Biederman
Experiment 1 et al. (1982, Experiment 1) paradigm. First, we limited the cued
object consistency manipulations to base (semantically consistent)
The purpose of Experiment 1 was to replicate the original and probability violation (semantically inconsistent). Second, we
Biederman et al. (1982) paradigm. We felt that replication did not include a bystander condition. Third, we used a paired-
was necessary as a baseline against which to compare the scene design to control for scene-specific factors such as cued
results from subsequent experiments in which we modified object distance from fixation and lateral masking. Fourth, we
the original paradigm. Figure 1 illustrates the main aspects presented the target label for 1500 ms rather than for a participant-
of the paradigm. The basic design was the same as that of determined duration. Fifth, we used a scene presentation duration
Biederman et al. (1982, Experiment I).5 A target label was of 200 ms rather than 150 ms. The 200 ms presentation duration
presented for 1500 ms, followed by a line drawing of a real- prevents eye movements during scene presentation, limiting view-
ing to a single glance of the scene, but reduces the possibility of
world scene for 200 ms, followed by a pattern mask with an floor effects. Sixth, in the Biederman et al. experiment, 30% of the
embedded location cue. The participant's task was to catch trial target labels were inconsistent with the subsequent
determine whether the object named by the target label had scene, because 30% of the labels in the target-present trials were
or had not appeared in the scene at the cued location. inconsistent. In Experiment 1, 50% of the catch trial target labels
The key manipulation in Experiment 1 was the semantic were inconsistent with the subsequent scene, because 50% of the
consistency between the cued object and me scene in which labels in the target-present trials were inconsistent. Finally, the
it appeared. Semantically consistent cued objects were likely probability violation catch trials in the Biederman et al. paradigm
to appear in the scene (e.g., a chicken in a farmyard); did not have the same design as the catch trials in the base
semantically inconsistent cued objects were unlikely to condition. For each scene in the probability violation condition, the
appear in the scene (e.g., a mixer in a farmyard). In half of same object was cued in the catch trials as in the target-present
trials. For each scene in the base condition, however, a different
the trials, the target label named a semantically consistent object was cued in the catch trials than in the target-present trials
object, and in the other half the target label named a (see Table 1). Thus, we chose to make the two consistency
semantically inconsistent object. Half of the trials presented conditions in Experiment 1 equivalent by always cueing the same
a scene that contained the target object (target-present trials), object in target-present and catch trials for each scene in each
and half of the trials presented a scene that did not (catch consistency condition. This cueing manipulation is the same as that
trials). In the target-present trials, the target label named the employed by Boyce et al. (1989).
6. OBJECT PERCEPTION IN SCENES 403
corresponding target-present and catch trials: For each cued
object consistency condition, half of the catch trials pre-
sented a consistent target label and half an inconsistent target
label. Note that, as in the original object-detection paradigm,
the catch trial semantic consistency manipulation is denned
by the cued object appearing in the scene and not by the
consistency of the object the participant is attempting to
detect. Table 1 summarizes the design of the target-present
and catch trials for Experiment 1.
Method
Participants. Twenty-four Michigan State University under-
graduate students participated in the experiment for course credit.
All participants had normal or corrected-to-normal vision. The
participants were naive with respect to the hypotheses under
investigation.
Stimuli. Twenty scenes and 20 cued objects were used as
stimuli. The stimuli were generated from photographs of natural
scenes. Fourteen scenes were generated from those used by van
Diepen and De Graef (1994), and the other 6 scenes were generated
from photographs taken in the East Lansing, Michigan, area. In
both cases, the main contours of the scenes were traced using
commercial software to create gray-scale line drawings. The
images generated from the two sources were not distinguishable.
Semantically consistent cued objects for each scene were also
created by digitally tracing scanned images. These objects were
created separately from the scenes. The 20 scenes were paired, and
the semantically inconsistent conditions were created by swapping
objects across scenes. For example, a mixer was the semantically
consistent cued object in a kitchen scene, and a live chicken was the
consistent cued object in a farmyard scene. These objects were
swapped across scenes so that the mixer was the semantically
Figure 2. An example of the type of scene used and the cued
inconsistent object in the farmyard scene, and the chicken was the
object semantic consistency manipulation. The top scene contains a
inconsistent object in the kitchen scene. Figure 2 shows an example semantically consistent cued object (chicken), and the bottom
of a stimulus scene and the semantic consistency manipulation.
scene contains a semantically inconsistent cued object (mixer).
Because each scene was to be presented a total of eight times
This farmyard scene was paired with a kitchen scene in which the
during the experiment, two cued object positions were chosen mixer was consistent and the chicken inconsistent.
within each scene to minimize participants' ability to predict the
object's location. Each position was chosen as a place within the
scene where the consistent cued object might reasonably appear.
Both the consistent and inconsistent cued objects appeared in the The pattern mask presented after the scene consisted of overlap-
same two positions. In a number of scenes, the two object positions ping line segments, curves, and angles, and was slightly larger than
required using two different sizes of each object (e.g., a fire hydrant the scene stimuli. The scenes were completely obliterated when
placed in two positions on a receding sidewalk). In such a case, the presented simultaneously with the pattern mask. The location cue
paired scene also employed two different sizes of each object. The appearing within the pattern mask was a thick circle containing a
percentage change in the size of each object was equated and was dot, subtending 1.4 degrees.
the same in both of the paired scenes. As a consequence of this Apparatus. The stimuli were displayed on a flat-screen SVGA
paired-scene design, each scene served as a control for its partner, monitor with a 100-Hz refresh rate. Responses were collected with
reducing the influence of such factors as object size, eccentricity, a button box connected to a dedicated input-output (I-O) board.
and lateral masking. Depression of a button stopped a millisecond clock on the I-O
All scene and object manipulations were conducted using board. The display and I-O systems were interfaced with a
commercially available software. The scenes subtended a visual 486-based microcomputer that controlled the experiment.
angle of 23 degrees (width) by 15 degrees (height) at a viewing Procedure. Participants were tested individually. The experi-
distance of 64 cm. Cued objects subtended about 2.75 degrees on menter first explained that the task would be to determine whether
average (range = 1.25 to 4.92 degrees). All images were displayed the object named by a label was present at a marked location in a
as gray-scale contours on a white background at a resolution of briefly displayed scene. The participant was then seated in front of
800 X 600 pixels X 16 levels of gray. Gray-scale was used for a computer monitor, with one hand resting on a button labeled
anti-aliasing so that the contours appeared smooth and sharp. "yes" and the other on a button labeled "no." Viewing distance
Target labels were created using lower-case, 24-point, anti- was maintained by a forehead rest.
aliased Arial font. Labels for target-present trials named the cued During each experimental trial, participants saw a fixation cross
object. For catch trials, one consistent and one inconsistent label and a prompt instructing them to press a pacing button to begin the
were chosen for each scene. Each of these labels named an object trial. Once the participant pressed the button, the fixation cross
that never appeared in the scene. remained on the screen for an additional 500 ms, followed by a
7. 404 HOLLINGWORTH AND HENDERSON
blank screen containing a target label for 1500 ms, followed by the target object presence, F(l, 23) = 19.42, MSE = .0149,p <
presentation of the scene for 200 ms, followed by a pattern mask .001. Simple effects tests indicated that the hit rate for scenes
containing an embedded location cue. There was no delay (i.e., the containing a consistent cued object (76.6%) was higher than
inter-stimulus interval was zero) between each display. The pattern
that for scenes containing an inconsistent cued object
mask remained in view until the participant pressed the left (yes)
(59.2%), F(l, 23) = 31.71, MSE = .0229, p < .001. This
button to indicate that the object named by the target label had
pattern was not present in the catch trials; the correct
appeared in the scene at the cued location or the right (no) button to
indicate that the target object had not appeared at the cued location. rejection rate was the same when the cued object was
After the response, there was a 4-s delay while the stimuli for the consistent (80.4%) versus inconsistent (78.5%), F(l, 23) =
next trial were loaded into video memory, and then the prompt for 1.38, MSE = .0061,p > .25. To summarize, although the hit
the next trial appeared. rate was higher for consistent versus inconsistent cued
Participants took part in a practice block of 16 trials (2 cued objects, the false-alarm rates were not different, 19.6% and
object consistency conditions X 2 target object presence conditions 21.5% respectively.
x 2 cued object positions x 2 scenes). The two scenes used in the A' analysis. Mean A' for each cued object consistency
practice block were not used in the experimental trials. After the condition is shown in Table 2. Detection sensitivity was
practice trials, the experimenter answered any questions the
reliably higher in scenes containing a consistent cued object
participant had about the procedure, and the participant proceeded
(.861) than in scenes containing an inconsistent cued object
to the experimental trials.
(.775), F(l, 23) = 20.50, MSE = .0043,p < .001.
Each participant saw 160 experimental trials that were produced
by a within-participant factorial combination of 2 cued object
consistency conditions x 2 target object presence conditions X 2 Discussion
cued object positions X 20 scenes. The position of the cued object
and the consistency of the catch trial label were counterbalanced Experiment 1 replicated the principal result of the original
between a pair of scenes and completely counterbalanced as a object-detection paradigm. We found that object-detection
between-participants factor. Because the cued object position performance was higher for scenes containing a consistent
factor was not of theoretical interest, the two levels of that factor cued object than for scenes containing an inconsistent cued
were combined in the statistical analyses. Each participant saw all object. This experiment demonstrates that our scenes and
160 trials in a different random order. The entire session lasted semantic consistency manipulation are sufficient to replicate
approximately 45 min. the basic consistent object contextual facilitation effect. The
results of this experiment provide a baseline that can be used
to assess the results of Experiments 2-4, in which the
Results original paradigm will be modified to address the concerns
specified in the Introduction.
In the following analyses, two measures were used to
In Biederman et al. (1982), false-alarm rates were higher
assess object-detection performance. First, we report percent-
when the catch trial label was consistent versus inconsistent
age correct hits for the target-present trials and percentage
with the scene. To investigate whether such biases were
correct rejections for the catch trials. Second, because
present in the current experiment, we conducted a full
reliable response biases were present, we report A', a
analysis of the Experiment 1 catch trial data. The two
nonparametric measure of sensitivity (Grier, 1971). A' can
within-participants variables of cued object semantic consis-
be interpreted as equivalent to percentage correct in a
tency and target label semantic consistency were entered
forced-choice procedure. A' was computed using the rate of
into an ANOVA. There was a reliable effect of target label
correct responses in target-present trials (the hit rate) and the
semantic consistency, F(l, 23) = 4.97, MSE = .0303, p <
rate of errors in catch trials (the false-alarm rate). For the
.05, with a higher correct rejection rate for inconsistent
purpose of replicating the Biederraan et al. (1982) paradigm,
target labels (83.4%) than consistent target labels (75.5%).
we did not separate the catch trial data as a function of the
As found in the Biederman et al. study, participants were
semantic consistency of the target label. Thus, the A' results
more likely to falsely respond that an absent consistent target
reported in this section are based on the mean false-alarm
object was present than to respond that an absent inconsis-
rate in each cued object consistency condition, averaging
tent target object was present. This response bias was likely
across catch trials in which the target label was consistent
caused by participants adopting a higher standard of evi-
and inconsistent with the subsequent scene.
dence to accept that an inconsistent object was present in the
Percentage correct analysis. Mean percentage correct
scene than that a consistent object was present.
as a function of cued object consistency and target object
presence is presented in Table 2. First, there was a reliable
main effect of target object presence, F(l, 23) = 12.79,
MSB = .0506, p < .005. Participants responded correctly Table 2
67.9% of the time when the target object was present and Mean Percentage Hits, Percentage Correct Rejections
79.5% of the time when the target object was absent. There (Percentage False Alarms), and A' for Experiment 1
was also a main effect of cued object consistency, F(l, 23) =
Cued object % correct rejections
31.57, MSE = .0141, p < .001, with better performance consistency %hits (% false alarms) A'
when the cued object was consistent (78.5%) than when it
Consistent 76.6 80.4 (19.6) .861
was inconsistent (68.9%) with the scene. Finally, there was a
Inconsistent 59.2 78.5(21.5) .775
reliable interaction between cued object consistency and
8. OBJECT PERCEPTION IN SCENES 405
As reported previously, the effect of cued object semantic design controls for the potential bias for participants to
consistency on catch trial performance did not approach respond "yes" more often when the catch trial target label is
reliability, F(l, 23) = 1.38, MSE = .0061, p > .25, nor was consistent versus inconsistent with the scene.
the interaction between cued object consistency and target To control for the general complexity of the scene in
label consistency reliable, F < 1. The absence of a cued target-present and catch trials, the catch trial scenes con-
object consistency effect on catch trial performance is tained the cued object from the paired scene at the cued
intriguing. According to contextual facilitation models, location. For example, if a participant viewed the label
consistent cued objects should be easier to identify than "chicken" in a catch trial, the subsequent scene would
inconsistent cued objects. Thus, contextual facilitation mod- contain the paired cued object (a mixer). Thus, in the catch
els predict lower false-alarm rates when the cued object is trials, the semantic consistency manipulation was based
consistent with the scene, because participants will be better on the relationship between the target label and the scene
rather than the relationship between the cued object and the
able to determine that the cued object does not match the
target label. No such effect of cued object consistency was
found. It is important to note that a similar lack of an effect
of cued object consistency on catch trial performance was Method
reported by Biederman et al. (1982, Experiment 1). In that
experiment, false-alarm rates were 16% in the base condi- Participants. Twenty-four Michigan State University under-
tion and 15% in the probability violation condition. graduate students participated in the experiment for course credit.
In summary, analysis of the catch trial data revealed that All participants had normal or corrected-to-normal vision. The
participants were naive with respect to the hypotheses under
the semantic consistency of the target label reliably influ-
investigation. None had participated in Experiment 1.
enced performance, but the semantic consistency of the cued
Stimuli. The stimuli were the same as in Experiment 1 with the
object had little influence on performance. Because the following modifications. First, for each scene, target labels in catch
original object-detection paradigm (and our replication of trials named the same object as in corresponding target-present
that paradigm in Experiment 1) averaged across consistent trials. Second, the catch trial scenes contained the paired cued
and inconsistent target label catch trials when calculating object (e.g., a mixer when the target object was a chicken).
false-alarm rates, it is likely that these experiments underes- Apparatus and procedure. The apparatus and procedure were
timated the false-alarm rate for base conditions and overesti- the same as in Experiment 1. Each participant saw 160 experimen-
tal trials that were produced by a within-paiticipam factorial
mated the false-alarm rate for violation conditions. This
combination of 2 target label semantic consistency conditions X 2
would result in artificially high sensitivity estimates in base target object presence conditions x 2 cued object positions x 20
conditions and artificially low sensitivity estimates in scenes. Because the cued object position factor was not of
violation conditions. Overall, these data suggest that theoretical interest, the two levels of that factor were combined in
the consistent object facilitation effect observed in prior the statistical analyses. Each participant saw all 160 trials in a
object-detection experiments may have been due, at least in different random order. The entire session lasted approximately
part, to the fact that sensitivity measures did not control for 45min.
response biases induced by the consistency of the target
label.
Results
Percentage correct analysis. Mean percentage correct
Experiment 2 performance as a function of target label semantic consis-
tency and target object presence is shown in Table 3. First,
In Experiment 2 we modified the original object-detection
there was a reliable main effect of target object presence,
paradigm to address our concerns about catch trial design
F(l, 23) = 8.61, MSE = .0739, p < .01. Participants
and the method of calculating sensitivity. In this experiment,
responded correctly 65.6% of the time when the target object
the target label in a catch trial named the same object as in a
was present and 77.1% of the time when the target object
corresponding target-present trial (see Table 1). This design
was absent. There was no main effect of target label
has two advantages over the original paradigm. First,
semantic consistency, F < , with 71.9% correct perfor-
measures of sensitivity in this experiment were based on the mance when the target label was consistent with the scene
correct detection of a particular signal when it was present and 70.9% when the target label was inconsistent. There
and the false detection of the same signal when it was not was, however, a reliable interaction between target label
present. Second, the semantic consistency of the target label semantic consistency and target object presence, F(l, 23) =
with the scene on corresponding target-present and catch 65.85, MSE = .0269, p < .001. The hit rate in the consistent
trials was equivalent, because both labels named the same target label condition (75.7%) was higher than that in the
object. As a result, false-alarm rates in the semantically inconsistent target label condition (55.5%), but the reverse
consistent condition were based entirely on the false detec- pattern obtained in the catch trials, with a higher correct
tion of consistent target objects, and false-alarm rates in the rejection rate in the inconsistent target label condition
semantically inconsistent condition were based entirely on (86.3%) than in the consistent target label condition (68.0%).
the false detection of inconsistent target objects. In contrast In other words, both the hit and false-alarm rates were higher
to the original object-detection paradigm, this catch trial for consistent versus inconsistent target object-detection,
9. 406 HOLLINGWORTH AND HENDERSON
suggesting that participants were more biased to respond suggest that the semantic consistency of the cued object has
"yes" when attempting to detect a consistent target object. little if any effect on performance in catch trials, because
Given this pattern, sensitivity measures provide a better false-alarm rates did not differ as a function of cued object
indication of detection accuracy than percentage correct consistency.
performance. A second explanation for the difference in results between
A' analysis. Mean A' for each target object consistency Experiments 1 and 2 is that the facilitated detection of
condition is presented in Table 3. No effect of target label consistent objects in Experiment 1 and in the original
semantic consistency was obtained, F < 1. Participants were object-detection paradigm was an artifact produced by the
equally accurate at detecting semantically consistent target method of calculating sensitivity. In these experiments,
objects (.803) as semantically inconsistent target objects measures of sensitivity did not control response biases
(.810). caused by the semantic consistency of the target label, as
those paradigms averaged across catch trials on which the
target object was consistent and inconsistent with the scene.
Discussion
In Experiment 2, when the catch trials were modified so that
In Experiment 2, the catch trial design of the original response biases were controlled in A', no such consistent
object-detection paradigm was modified so that participants object advantage was obtained. This suggests that results
attempted to detect the same target object on corresponding from the original object-detection paradigm reflected re-
target-present and catch trials. The main result was that no sponse bias rather than the influence of scene context on
advantage was found for the detection of semantically object perception, and thus cannot be taken as strong
consistent versus inconsistent target objects. In addition, evidence for contextual facilitation of consistent object
there were reliable response biases caused by target label perception.
semantic consistency. As in Experiment 1, participants were A final explanation for the absence of a consistent object
biased to respond "yes" more often when the catch trial facilitation effect in Experiment 2 hinges on the use of a
label was consistent versus inconsistent with the scene. This target label preview. The priming model of scene context
bias suggests that information sufficient to access scene effects proposes that the source of contextual facilitation
meaning was available within the 200 ms scene presentation effects is the spread of activation from the activated represen-
duration. tation of a scene to stored descriptions of object types likely
Why might the consistent object contextual facilitation to be found in the scene (Friedman, 1979; Kosslyn, 1994;
effect, present in Experiment 1, be absent in Experiment 2? Palmer, 1975a). In Experiment 2, the presentation of the
One potential explanation is that, in the catch trials, the target label prior to scene viewing may have served to prime
presence of the paired cued object at the cued location may the stored description of the target object, regardless of its
have biased performance. For example, in an inconsistent semantic consistency. Such priming could mask potential
condition catch trial, the target label was "chicken"; this influences of scene context. In Experiment 3, the target label
label was followed by a kitchen scene containing the paired was presented either before or after the scene. If the priming
cued object (a mixer). If consistent objects are easier to model is correct, contextual facilitation of consistent object
detect than inconsistent objects, as suggested by contextual detection should be observed when the target label is
facilitation models, false-alarm rates may have been artifi- presented after the scene, but not necessarily when the target
cially low when the target label was inconsistent with the label is presented before the scene.
scene, because on those trials, the cued object was always
consistent with the scene. This lower false-alarm rate in the
inconsistent condition might have masked consistent object Experiment 3
facilitation in our sensitivity measure. We have no direct
Experiment 3 sought to provide further evidence concern-
way to determine whether cued object semantic consistency
ing the influence of scene context on object perception. In
influenced performance in this experiment, as that factor
this experiment, the target label was presented either before
was confounded with the semantic consistency of the target
or after scene presentation. In addition, the location cue was
label. The results from Experiment 1, as well as those
eliminated. Otherwise, Experiment 3 was identical to Experi-
reported by Biederman et al. (1982), however, provide no
ment 2. The presentation of the target label after the scene
support for this hypothesis. Both experiments strongly
further refines the object-detection paradigm by eliminating
two potential problems. First, participants may have used the
target label preview to constrain the spatial extent of then-
Table 3 search in the subsequent scene. This strategy would benefit
Mean Percentage Hits, Percentage Correct Rejections the detection of consistent objects, because the position of a
(Percentage False Alarms), and A' for Experiment 2 consistent object in a scene is easier to predict than the
position of an inconsistent object. Second, the presentation
Target label % correct rejections
consistency %hits (% false alarms) A' of a semantically inconsistent target label before the scene
may have interfered with perceptual processing when the
Consistent 75.7 68.0 (32.0) .803 scene implied by the label was not presented. In addition,
Inconsistent 55.5 86.3 (13.7) .810
elimination of the location cue improves the object-
10. OBJECT PERCEPTION IN SCENES 407
Preview
Postview
Figure 3. Schematic illustration of a trial in Experiment 3.
detection paradigm by undermining the potential strategy of location cue changed the participants' task. In Experiments 1
using cue position to assist post-presentation guessing. and 2, the task was to determine whether the target object
Figure 3 illustrates the main aspects of the paradigm. In appeared at the cued location. In Experiment 3, the task was
the target label preview condition, participants saw a target to determine whether the target object appeared anywhere in
label for 1500 ms, followed by presentation of the scene the scene. As in Experiment 2, the semantic consistency
for 200 ms, followed by a pattern mask containing a series manipulation in the catch trials was based on the relationship
of lowercase Xs, which remained on the screen until between the target label and the scene rather than the
response. In the target label postview condition, participants relationship between the presented object and the scene.6
saw a series of Xs in a blank field for 1500 ms, followed by Table 1 presents a summary of the target-present and catch
presentation of the scene for 200 ms, followed by a pattern trial design in Experiment 3.
mask containing the target label, which remained on the
screen until response. The series of X& was employed to
roughly equate stimulus presentation in the two conditions. 6
In Experiments 3 and 4, the object presented in the scene was
In addition to the manipulation of target label (preview or not cued by a location dot. Therefore, we will refer to this object as
postview), we omitted the location cue. The absence of a the presented object rather than the cued object.
11. 408 HOLLINGWORTH AND HENDERSON
Method obtained: Correct rejections for consistent target objects
dropped significantly between preview and postview trials
Participants. Twenty-four members of the Michigan State (11.0%), but correct rejections for inconsistent target objects
University community were paid $5 each for their participation. All
showed almost no decline (1.2%). As in Experiments 1 and
participants had normal or corrected-to-normal vision. The partici-
2, both the hit rate and false-alarm rate were higher for
pants were naive with respect to the hypotheses under investiga-
tion. None had participated in previous experiments. consistent versus inconsistent target objects, suggesting that
Stimuli. The stimuli were the same as in Experiment 2, except participants were biased to respond "yes" more often when
that a blank space, subtending 6.3 X 1.5 degrees, was created in the attempting to detect a consistent versus an inconsistent
center of the pattern mask to accommodate the object label or series object. Given this pattern, sensitivity measures again pro-
ofATs. vide a better indication of detection accuracy than percent-
Apparatus. The apparatus was the same as that used for age correct performance.
Experiment 1, except that the stimuli were displayed on a A' analysis. Mean A' as a function of target label
flat-screen monitor with a 72-Hz refresh rate. presentation and target label semantic consistency is re-
Procedure. The procedure was the same as that in Experiment
ported in Table 4. There was a reliable main effect of target
1, except that the participant was informed that the target label
label presentation, F(l, 23) = 23.52, MSE = .0067, p <
could appear either before or after the scene. In either case, the
participant was to press the button marked "yes" if the object .001, with better detection in the preview conditions (.836)
named by the label had appeared anywhere in the scene or the than in the postview conditions (.755). In addition, there was
button marked "no" if the object had not appeared in the scene. a reliable main effect of semantic consistency, F(l, 23) =
Each participant saw 160 experimental trials that were produced 5.87, MSE = .0034, p < .05, with better detection of
by a within-participant factorial combination of 2 target label semantically inconsistent target objects (.810) than consis-
presentation conditions (preview, postview) X 2 target label tent target objects (.782). These effects were mediated by a
consistency conditions X 2 target object presence conditions X 20 marginally reliable interaction between label presentation
scenes. Object position was manipulated between participants and and target object semantic consistency, F(l, 23) = 3.55,
was tied to the preview-postview manipulation. In one group, if
MSE = .0036, p = .07. Planned simple effects tests showed
position A was employed for the preview trials of a particular
that there was no difference in performance for consistent
scene, position B was used for all postview trials employing that
versus inconsistent target objects in the preview condition,
scene. These assignments were reversed in the second group.
Because the object position factor was not of theoretical interest, F < 1, but there was a reliable advantage for inconsistent
the two levels of that factor were combined in the statistical target object detection in the postview condition, F(l, 23) =
analyses. Each participant saw all 160 trials in a different random 5.35, MSE= .0060, p < . 05.
order. The entire session lasted approximately 45 min.
Discussion
Results The results from the preview condition of Experiment 3
replicated those of Experiment 2 and provided no support
Percentage correct analysis. Mean percentage correct
for contextual facilitation models of object perception in
performance as a function of target label presentation, target
scenes. The pattern of performance as a function of target
label semantic consistency, and target object presence is
label consistency was the same across the two experiments:
shown in Table 4. First, there was a reliable main effect of
There was no advantage for the detection of semantically
target label presentation (preview, postview), F(l, 23) =
consistent versus inconsistent objects, and participants dem-
36.00, MSB = .0082, p < .001. Participants responded
onstrated a bias to respond that consistent target objects were
correctly 73.8% of the time in the preview condition and
present in the scene compared to inconsistent objects. In
65.9% of the time in the postview condition. There was also
addition, overall performance in the preview condition of
a main effect of target object presence, F(l, 23) = 10.89,
Experiment 3 (A1 = .836) was higher than that in Experi-
MSE = .0765, p < .005, with better performance in the
ment 2 (A' = .807). These experiments differed only in the
catch trials (76.4%) than in the target-present trials (63.2%).
presence of the location cue, suggesting that the location cue
There was no main effect of target label semantic consis-
provided little if any information to aid detection perfor-
tency, F < 1, but there was a reliable interaction between
target label semantic consistency and target object presence,
F(l, 23) = 107.67, MSE = .0313, p < .001. The hit rate for
Table 4
consistent target objects (76.4%) was higher than that for
Mean Percentage Hits, Percentage Correct Rejections
inconsistent target objects (50.1%), but the reverse pattern
(Percentage False Alarms), and A' for Experiment 3
obtained in the catch trials, with a higher correct rejection
rate for inconsistent target objects (89.8%) than for consis- Target label % correct rejections
tent target objects (63.0%). Finally, there was a reliable consistency %hits (% false alarms) A'
3-way interaction between all factors considered, F(l, 23) = Preview
10.63, MSE = .0080, p < .005. In the target-present trials, Consistent 79.4 68.5(31.5) .834
performance for consistent target objects dropped slightly Inconsistent 56.7 90.4 (9.6) .839
between preview and postview conditions (6.1%), but Postview
Consistent 73.3 57.5 (42.5) .729
performance for inconsistent target objects dropped more
Inconsistent 43.5 89.2(10.8) .781
significantly (13.2%). In the catch trials, the reverse pattern
12. OBJECT PERCEPTION IN SCENES 409
mance. The results of the postview condition of Experiment consistency between the scene and the presented object. For
3 also failed to support contextual facilitation models of this experiment we chose one additional consistent and one
object perception in scenes. Contrary to the prediction of additional inconsistent object to be presented in each scene.
those models, an advantage for the detection of inconsistent Thus, each scene could contain one of four presented
target objects was obtained. This result is in particular objects, two of which were consistent and two of which were
contrast to the prediction derived from the priming model inconsistent. In the forced-choice response screen, one
that consistent object facilitation may only be observed object label named the object presented in the scene, and the
when the target label is presented after the scene. second label named the other object of the equivalent
In addition, the results from Experiment 3 provide further semantic consistency. For example, when a consistent object
evidence that the semantic consistency between the cued (a chicken or a pig) was presented Ln the farmyard scene, the
object and the scene has little or no influence on catch trial forced choice screen presented the labels "chicken" and
performance. In Experiment 3, the location cue was elimi- "pig." As in previous experiments, contextual facilitation
nated, and the task was to determine whether the target models predict that percentage correct discrimination perfor-
object appeared anywhere in the scene. Participants were mance should be better when the presented object is
therefore unaware, at least initially, that the identity of the consistent versus inconsistent with the scene in which it
presented object had any bearing on whether the target appears.
object was present or absent. If the lower false-alarm rate for
inconsistent versus consistent target objects in Experiment 2
Method
was caused by the semantic consistency of the cued object,
that difference should have disappeared, or at least have Participants. Twenty-four Michigan State University under-
been attenuated, in Experiment 3. Contrary to this predic- graduate students participated in the experiment for course credit.
tion, the false-alarm rates for inconsistent target label catch One of these original participants had to be replaced because he had
trials in both the preview and postview conditions of difficulty understanding the instructions. All participants had
Experiment 3 were lower than that in Experiment 2. In normal or corrected-to-normal vision. The participants were naive
addition, the difference in the false-alarm rate as a function with respect to the hypotheses under investigation. None had
participated in previous experiments.
of target label semantic consistency was actually larger in
Stimuli. The stimuli were the same as in Experiments 1—3 with
both the preview and postview conditions of Experiment 3
the following modifications. For each scene, a second consistent
than in Experiment 2, suggesting that participants were more
and a second inconsistent object were chosen. One scene from the
biased to respond that consistent objects were present in original set was replaced because of the difficulty of finding a
Experiment 3 than in Experiment 2. Thus, the absence of a second object that would be consistent in the scene at the same
consistent object advantage in Experiment 2 and the preview location as the original consistent object. Finally, only one object
condition of Experiment 3, and the presence of the inconsis- location was used for each scene. The object labels appearing after
tent object advantage in the postview condition of Experi- scene presentation were centered vertically and positioned to the
ment 3, do not appear to have been caused by the semantic left and right of fixation. The labels were created using lower-case,
consistency of the object presented in the scene during catch 24-point, anti-aliased Arial font.
Apparatus and procedure. The apparatus was the same as in
trials.
Experiments 1 and 2. Participants were presented a scene for 250
ms, followed by a pattern mask for 30 ms, followed by a
Experiment 4 forced-choice screen containing two object labels. There was no
delay (i.e., the inter-stimulus interval was zero) between each
The purpose of Experiment 4 was to provide converging display. The forced-choice screen remained in view until the
evidence bearing on the general hypothesis that consistent participant pressed the left button to indicate that the object named
scene context facilitates object perception. In Experiments by the left-hand label had appeared or the right button to indicate
1-3, A' was employed to control for participant response that the object named by the right-hand label had appeared in the
scene. Each participant saw 160 experimental trials that were
biases. In Experiment 4, we introduced a forced-choice
produced by a within-participant factorial combination of 2 pre-
procedure, similar to that developed by Reicher in the word
sented object consistency conditions X 2 presented objects X 2
recognition literature (Reicher, 1969), to eliminate response
label positions in the forced-choice display X 20 scenes. Because
bias entirely. This paradigm eliminates response biases the presented object factor and the label position factor were not of
caused by the semantic consistency of the object label theoretical interest, the two levels of each factor were combined in
because participants must discriminate between two object the statistical analyses. Each participant saw all 160 trials in a
labels, both of which are either semantically consistent or different random order. The entire session lasted approximately 45
inconsistent with the scene. Figure 4 illustrates the main
aspects of the paradigm.
A scene was presented for 250 ms, followed by a pattern
Results
mask for 30 ms, followed by a screen displaying two object
labels. One label named an object presented in the scene, and The influence of object consistency on percentage correct
the other label named an object that had not appeared in the discrimination performance was analyzed via a simple
scene. The participants' task was to indicate which of the effects test. There was no effect of the consistency of the
two labels named an object that had been presented in the presented object, F < 1. Participants responded correctly
scene. The main contextual manipulation was the semantic 70.7% of the time when the presented object was consistent
13. 410 HOLLINGWORTH AND HENDERSON
response
Figure 4. Schematic illustration of a trial in Experiment 4.
with the scene and 71.6% of the time when the presented In Experiment 4, only one object position was used for
object was inconsistent with the scene. The 95% confidence each scene. It is possible that the absence of a consistent
interval around these means was ± 1.69%. Thus, the experi- object advantage was due to participants learning the
ment had enough power to detect a 2.39% effect (see Loftus position at which the presented object appeared in each
& Masson, 1994). scene. Such knowledge could allow participants to direct
their attention quickly to the object position regardless of
Discussion whether the presented object was semantically consistent or
inconsistent with the scene. To investigate this possibility,
In Experiment 4, we introduced a forced-choice proce-
dure to investigate the influence of scene context on object we calculated percentage correct discrimination perfor-
perception independently of response bias. Contrary to the mance as a function of object consistency and first half
prediction of contextual facilitation models, no advantage versus second half of the trials. If the learning of object
for the detection of consistent objects was found. In fact, the positions masked a consistent object advantage, that advan-
non-reliable trend was in the direction of better inconsistent tage would more likely be found in the first half of the
object detection. Thus, using the same general set of scene experiment than in the second. However, there was no
stimuli, the original object-detection paradigm (Experiment evidence of a consistent object advantage in the first half of
1) produced a consistent object advantage, but Experiment the experiment. Percentage correct discrimination was 67.3%
4, in which response bias was eliminated, showed no such when the presented object was consistent and 68.8% when
advantage. This suggests, again, that the consistent object the presented object was inconsistent. A second potential
advantage in the original object-detection paradigm resulted, concern with Experiment 4 is that the scene was presented
at least in part, from the inadequate control of response bias for 250 ms, 100 ms longer than studies finding contextual
rather than from the influence of scene context on object facilitation of object detection (Biederman et al., 1982;
perception. Boyce et al., 1989). However, we have replicated Experi-
14. OBJECT PERCEPTION IN SCENES 411
ment 4 with a scene presentation duration of 150 ms and versus inconsistent with the scene. Because of the design of
obtained no advantage for the discrimination of semantically the catch trials in this paradigm, sensitivity measures were
consistent objects (Hollingworth & Henderson, in press-a). unable to adequately control for this response bias.
In fact, discrimination performance was reliably higher In Experiment 2, we modified the original object-
when the target object was inconsistent versus consistent detection paradigm so that participants attempted to detect
with the scene. the same object on trials in which it was and was not present
in the scene. This modification assured that the detection
performance measure (A1) was based on the correct detec-
General Discussion
tion of a particular signal when it was present in the scene
The purpose of this study was to investigate whether and the false detection of the same signal when it was not
object identification is influenced by the semantic relation- present. In addition, response biases caused by the semantic
ship between an object and the scene in which it appears. consistency of the target label were controlled in A',
One view of the influence of scene context on object providing a valid measure of object-detection performance.
identification proposes that consistent scene context facili- Contrary to the prediction of contextual facilitation models,
tates the perception of objects (Biederman, 1972; Biederman no advantage was found for the detection of consistent
et al., 1973; Kosslyn, 1994; Ullman, 1996). This view has versus inconsistent objects.
been instantiated in contextual facilitation models of object In Experiment 3, we manipulated whether the target label
perception in scenes, which propose that knowledge about appeared before or after the scene, and we eliminated the
the objects found in a given scene type facilitates the location cue. The purpose of the label manipulation was to
perception of object stimuli consistent with that scene investigate whether presenting the label before the scene
(Biederman, 1981; Biederman et al., 1982; Boyce & Pollat- may have interfered with scene processing when it was
sek, 1992; Boyce et al., 1989; Friedman, 1979; Palmer, inconsistent with the subsequent scene, and may have
1975a). The primary evidence supporting contextual facilita- allowed participants to constrain their subsequent search,
tion models comes from experiments demonstrating that the biasing detection performance. The location cue was elimi-
detection of an object in a scene is facilitated when the nated to see if it provided information useful to post-
object is semantically consistent compared with when it is presentation guessing strategies. The results from the target
inconsistent with the scene (Biederman et al., 1982; Boyce label preview condition replicated those of Experiment 2
et al., 1989). However, as discussed in the Introduction, and indicated that the location cue provided little or no
there are several potential problems with the original information to aid performance. The main result from the
object-detection paradigm that make interpretation of these target label postview condition was superior detection of
data difficult (De Graef et al., 1990; De Graef & d'Ydewalle, inconsistent versus consistent objects.
1995; Henderson, 1992). Specifically, the original paradigm In Experiment 4, we employed a forced-choice procedure
may not have adequately controlled for participant response to assess object-detection performance. This procedure
biases, and may have provided additional sources of informa- improved the object-detection paradigm by eliminating the
tion that influenced detection performance. In this study, we possibility of response bias caused by the semantic consis-
modified the original object-detection paradigm to address tency of the target label. Participants were asked to discrimi-
these concerns. nate between two consistent object labels (when the pre-
In Experiment 1, we replicated the original object- sented object was consistent with the scene) or between two
detection paradigm contextual facilitation effect (Biederman inconsistent object labels (when the presented object was
et al., 1982; Boyce et al., 1989). Participants saw a target inconsistent with the scene). Contrary to the prediction of
label naming an object, followed by a briefly presented contextual facilitation models, no advantage was found for
scene, followed by a pattern mask with an embedded the detection of consistent objects.
location cue. The object marked by the location cue could be In order to conclude from these data that consistent scene
either semantically consistent or inconsistent with the scene context does not facilitate object identification, it is neces-
in which it appeared. Detection performance was based on sary that these experiments meet two criteria. First, scene
percentage correct for trials in which the target label named meaning must have been available early enough to influence
the cued object and percentage of errors for trials in which object identification, if such influences exist. Second, the
the target label named a different object that did not appear contextual manipulation must have been strong enough to
in the scene. In these latter trials, the semantic consistency of interact with object identification, if such interaction is
the target label with the subsequent scene was not related to possible. In this study, scene meaning was available early
the consistency of the cued object with the scene. Using this enough and was strong enough to produce reliable response
paradigm, we replicated the basic consistent object contex- biases based on the consistency of the target label. These
tual facilitation effect: Detection performance (A1) was response biases suggest that scene meaning and its relation-
better when the cued object was semantically consistent ship to the identity of the target object was available from
versus inconsistent with the scene. However, a more detailed information obtained within the brief presentation duration
analysis of the catch trial data indicated that reliable of the scene stimulus. In addition, the contextual manipula-
response biases were caused by the semantic consistency of tion was strong enough to replicate the Biederman et al.
the target label: Participants responded "yes" more often (1982) results. Given that we found no evidence for
when the catch trial target label was semantically consistent facilitated detection of semantically consistent objects in the