Research in Nursing & Health, 1995, 18, 179-183
Focus on Qualitative Methods
Sample Size in Qualitative
Research
Margarete Sandelowski
A common misconception about sampling in qualitative research is that numbers are unimportant in ensuring the adequacy of a sampling strategy. Yet, sample sizes may be too small to
support claims of having achieved either informational redundancy or theoretical saturation, or
too large to permit the deep, case-oriented analysis that is the raison d’être of qualitative inquiry.
Determining adequate sample size in qualitative research is ultimately a matter of judgment and
experience in evaluating the quality of the information collected against the uses to which it will
be put, the particular research method and purposeful sampling strategy employed, and the
research product intended. © 1995 John Wiley & Sons, Inc.
A common misconception about sampling in
qualitative research is that numbers are unimportant in ensuring the adequacy of a sampling strategy. The “logic and power” (Patton, 1990,
p. 169) of the various kinds of purposeful sampling used in qualitative research lie primarily in
the quality of information obtained per sampling
unit, as opposed to their number per se. Moreover, an aesthetic thrust of sampling in qualitative research is that small is beautiful.¹ Yet, inadequate sample sizes can undermine the credibility
of research findings. There are no computations
or power analyses that can be done in qualitative
research to determine a priori the minimum number and kinds of sampling units required, but
there are factors, including the aim of sampling
and the type of purposeful sampling and research
method employed, which researchers can consider to help them decide whether they have collected enough data. These factors are the subject
of this article.

¹ I am indebted to one of the anonymous reviewers of this article for the phrasing “small is beautiful.”
NEITHER SMALL NOR LARGE, BUT TOO
SMALL OR TOO LARGE
Adequacy of sample size in qualitative research is
relative, a matter of judging a sample neither
small nor large per se, but rather too small or too
large for the intended purposes of sampling and
for the intended qualitative product. A sample
size of 10 may be judged adequate for certain
kinds of homogeneous or critical case sampling,
too small to achieve maximum variation of a
complex phenomenon or to develop theory, or too
large for certain kinds of narrative analyses.
Reported sample sizes are often too small to
support claims of having achieved either informational redundancy (Lincoln & Guba, 1985) or
theoretical saturation (Strauss & Corbin, 1990).
Margarete Sandelowski, PhD, RN, is a professor, Department of Women’s and Children’s
Health, School of Nursing, University of North Carolina at Chapel Hill.
This article is part of the ongoing series, Focus on Qualitative Methods, edited or contributed
by Dr. Sandelowski.
This article was received on September 7, 1994, revised, and accepted for publication November 28, 1994.
Requests for reprints should be addressed to Dr. Sandelowski, University of North Carolina at
Chapel Hill, #7460 Carrington Hall, Chapel Hill, NC 27599-7460.
© 1995 John Wiley & Sons, Inc. CCC 0160-6891/95/020179-05
Impatience, an a priori commitment to what will
be seen, or a disinclination to see any more may
incline researchers to stop sampling prematurely.
Seeing nothing new in newly sampled units or
feeling comfortable that a theoretical category
has been saturated are functions involving the
recognition of what is there and what can be
made out of the data already collected, and then
deciding whether it is sufficient to create an intended product. These functions are acquired
through experience. For example, I have noticed
in my own development and that of students with
whom I have worked that beginning qualitative
researchers often require more sampling units
than more experienced researchers to “see” and
to “make.” One expert qualitative researcher (P.
Stern, personal communication, 1989) intimated
that we often have all the data we will need in the
very first pieces of data we collect, but that we do
not (or cannot) know that until we collect more.
Ultimately, information can be deemed redundant
or theoretical lines deemed saturated only for
now (Morse, 1989).
Conversely, sample sizes may be too large to
support claims to having completed detailed analyses of data, especially the microanalysis demanded by certain kinds of narrative and observational studies. Even in qualitative projects
aimed at explicating regularities across pieces of
data, a high premium is still placed on discerning
the particularities or idiosyncrasies presented by
each piece of data. While qualitative studies may
involve what are considered large sample sizes
(over 50), qualitative analysis is generically
about maximizing understanding of the one in all
of its diversity; it is case-oriented, not variable-oriented (Ragin & Becker, 1989). Any sample
size interfering with the case-oriented thrust of
qualitative work can, accordingly, be judged too
large.
ISSUES IN PURPOSEFUL SAMPLING
One of the major differences between qualitative
and quantitative research approaches is that qualitative approaches typically involve purposeful
sampling, while quantitative approaches usually
involve probability sampling (Kuzel, 1992; Morse, 1986, 1989; Patton, 1990). Patton (1990) described 14 different types of purposeful sampling, involving the selection for in-depth study
of typical, atypical, or, in some way, exemplary
“information-rich cases” (p. 169). Researchers in
both domains of inquiry often have to resort to
sampling they know is less than ideal for their
purposes, but qualitative researchers value the
deep understanding permitted by information-rich cases and quantitative researchers value the
generalizations to larger populations permitted by
random and statistically representative samples.
Although a sample of one will never be sufficient
to permit generalization of findings to populations, it may be sufficient to permit the valuable
kind of generalizations that can be made from
and about cases, variously referred to as idiographic, holographic, naturalistic, or analytic
generalizations (Firestone, 1993; Lincoln &
Guba, 1985; Ragin & Becker, 1992; Simons,
1980; Stake & Trumbull, 1982).
In qualitative research, events, incidents, and
experiences, not people per se, are typically the
objects of purposeful sampling (Miles & Huberman, 1994; Strauss & Corbin, 1990). People, in
addition to sites, artifacts, documents, and even
data that have already been collected are sampled
for the information they are likely to yield about a
particular phenomenon. Sample size in qualitative research may refer to numbers of persons,
but also to numbers of interviews and observations conducted or numbers of events sampled.
People are certainly central in all kinds of inquiry
approaches in the health sciences, but they enter
qualitative studies primarily by virtue of having
direct and personal knowledge of some event
(e.g., illness, pregnancy, life transition) that they
are able and willing to communicate to others and
only secondarily by virtue of demographic characteristics (e.g., age, race, sex).
People Versus Purpose
When qualitative researchers decide to seek
people out because of their age or sex or race, it
is because they consider them good sources of
information that will advance them toward an analytic goal and not because they wish to generalize to other persons of similar age, sex, or race.
That is, a demographic variable, such as sex,
becomes an analytic variable; persons of one or
the other sex are selected for a study because, by
virtue of their sex, they can provide certain kinds
of information. Accordingly, only as many persons of a particular sex are included in a study as
is necessary to obtain that information. There is
no mandate to have equivalent numbers of women or men or numbers of persons of each sex in
the proportions in which they appear in a certain
population.
Sampling on the basis of demographic characteristics presents something of a problem in
achieving both informational and size adequacy
in qualitative studies. There is currently a strong
impulse (and federal mandate) to eliminate gender, race/ethnicity, and class bias in research by
including members of minority or traditionally
disempowered groups typically underrepresented
in research, and by including women and men
typically underrepresented in certain domains of
research, such as men in family studies and women in studies of heart disease. Trost (1986) described a “statistically nonrepresentative stratified” sampling strategy whereby researchers can
select persons varying in demographic characteristics to achieve representative coverage and
inclusion. That is, while the sample is statistically nonrepresentative, it is informationally
representative in that data will be obtained from
persons who can stand for other persons with
similar characteristics. In Trost’s illustration involving a study of families with teenagers, five sets of
naturally and artificially dichotomized variables
(one- or two-parent family, one or two or more
children, housed in an apartment or home, with a
high or low income, and with a male or female
teenager) were combined to yield 32 kinds of
families to be sampled. A similar kind of sampling plan can be used to ensure inclusion of
females and males, and persons varying in social
class, race, cultural affiliation, religion, or other
dimension.
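The arithmetic behind the 32 kinds of families can be sketched as follows. This is only an illustration of how five two-valued variables multiply out (2 × 2 × 2 × 2 × 2 = 32); the dictionary keys, value labels, and the function name `sampling_cells` are hypothetical and do not come from Trost (1986).

```python
from itertools import product

# Five dichotomized variables from the families-with-teenagers illustration.
# The labels below are illustrative assumptions, not Trost's own wording.
variables = {
    "parents": ("one-parent", "two-parent"),
    "children": ("one child", "two or more children"),
    "housing": ("apartment", "home"),
    "income": ("high income", "low income"),
    "teenager": ("male teenager", "female teenager"),
}


def sampling_cells(variables):
    """Return every combination of one value per variable as a dict."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


cells = sampling_cells(variables)
print(len(cells))  # 2 ** 5 = 32 kinds of families
print(cells[0])    # the first cell in the enumeration
```

Each added dichotomy doubles the number of cells, which is one reason a plan of this kind can quickly produce a sample too large for deep, case-oriented analysis.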
Although this kind of sampling accommodates
a new, laudable, and necessary moral consciousness concerning underrepresented and, therefore,
often misrepresented groups by partially accommodating the logic of probability sampling, it
may wholly contravene the logic of purposeful
sampling. Strictly speaking, sampling for variation in race, class, gender, or other such background or person-related characteristics ought to
be done in qualitative studies when they are
deemed analytically important and where the failure to sample for such variation would impede
understanding or invalidate findings (Cannon,
Higginbotham, & Leung, 1988). Deciding a priori
that a sample will include a certain number or
percentage of individuals in various demographic
groups may meet federal and other mandates for
inclusion of traditionally excluded persons, but it
may also result in a sample with a kind of variation that has little analytic significance or detracts
from analysis goals (Morse, 1989). More importantly, such a sample may be too small adequately to address the analytic importance of
such factors as gender or race, or, alternatively,
too large to favor the deep analysis that qualitative projects mandate.
One way to resolve this dilemma is to design
studies in which a phenomenon is investigated in
one group at a time (either simultaneously or sequentially). The design for such studies will include more than one purposeful sampling strategy: for example, homogeneous and maximum
variation sampling, where person-related homogeneity is maintained while variation in the target
phenomenon is sought. After a series of such
studies has been completed, a larger synthesis of
findings can be undertaken in which the researcher can more adequately address the question of whether and how a variable such as gender is important in understanding a phenomenon.
SAMPLE SIZE IN DIFFERENT KINDS OF
PURPOSEFUL SAMPLING
Different kinds of purposeful sampling require
different minimum sample sizes. For example, in
deviant case sampling, where the intention is to
understand a very unusual or atypical manifestation of some phenomenon, one case may be sufficient. Yet, even a sample of one requires within-case sampling (Miles & Huberman, 1994). The
researcher must decide which of the varieties of
data concerning the case to sample to explicate its
atypicality. This is especially evident in cases involving aggregates of one, such as a family, community, or organization. Even when an individual
is the focal one, the researcher must sample from
the wealth of data obtainable from and about that
individual. In short, any one case offers a variety
of data that must be sampled in sufficient quantity
to make the case.
Maximum variation is one of the most frequently employed kinds of purposeful sampling
in qualitative nursing research and typically requires the largest minimum sample size of any of
the purposeful sampling strategies. As in any
kind of sampling, the more variability there is
within the confines of a qualitative project, the
greater the number of sampling units the researcher
will require to reach informational redundancy or
theoretical saturation. Researchers wanting maximum variation in their sample must decide what
kind(s) of variation they want to maximize and
when to maximize each kind. One kind of variation already described is demographic variation,
where variation is sought on generally people-related characteristics.
A second kind of variation is phenomenal variation, or variation on the target phenomenon under study. For example, the target phenomenon in
a study of couples who have obtained positive
fetal diagnoses is diagnosis, which varies on such
dimensions as type and time of diagnosis, and the
instrumentation used to make it. Like the decision to seek demographic variation, the decision
to seek phenomenal variation is often made a
priori in order to have representative coverage of
variables likely to be important in understanding
how diverse factors configure a whole. This kind
of sampling is also referred to as selective or
criterion sampling, where sampling decisions are
made going into a study on “reasonable”
grounds, rather than on analytic grounds after
some data have already been collected (Glaser,
1978, p. 37; Schatzman & Strauss, 1973).
A third kind of variation is theoretical variation,
or variation on a theoretical construct that is associated with theoretical sampling, or the sampling
on analytic grounds characteristic of grounded
theory studies. A theoretical sampling strategy is
employed to fully elaborate and validate theoretically derived variations discerned in the data. Initial sampling for phenomenal variation permits
these theoretical variations to be identified. A
program of research employing grounded theory
typically begins with a selective or criterion sampling strategy aimed at phenomenal variation and
then proceeds to theoretical sampling (Sandelowski, Holditch-Davis, & Harris, 1992).
Researchers control the number of sampling
units required to achieve informational redundancy or theoretical saturation by deciding which
categories of variation to maximize and which to minimize.
This decision is a matter of fitting the sampling
strategy to the purpose of and method chosen for
a particular study and appraising the resources
(including number of investigators and financial
support) available to conduct the study. For example, purposeful sampling for demographic homogeneity and selected phenomenal variation is a
way a researcher working alone with limited resources can reduce the minimum number of sampling units required within the confines of a single
research project, but still produce credible and
analytically and/or clinically significant findings.
SAMPLE SIZES FOR DIFFERENT
QUALITATIVE METHODS
Just as different purposeful sampling strategies
require different minimum sample sizes, different
qualitative methods require different minimum
sample sizes. Morse (1994) has recommended
that phenomenologies directed toward discerning
the essence of experiences include about six participants, ethnographies and grounded theory
studies, about 30 to 50 interviews and/or observations, and qualitative ethological studies, about
100 to 200 units of observation.
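As a rough aid, the figures just cited can be held in a small lookup table and compared with a planned sample size. This is a minimal sketch, assuming the figures are used only as reference points. The names `CITED_GUIDELINES` and `compare_to_guideline` are hypothetical, and the single figure for phenomenology ("about six") is stored as a point rather than a range. Nothing here substitutes for the judgment this article argues is ultimately required.

```python
# Reference figures as cited in the text from Morse (1994).
# Each entry is (low, high, unit); names and structure are illustrative assumptions.
CITED_GUIDELINES = {
    "phenomenology": (6, 6, "participants"),
    "ethnography": (30, 50, "interviews and/or observations"),
    "grounded theory": (30, 50, "interviews and/or observations"),
    "qualitative ethology": (100, 200, "units of observation"),
}


def compare_to_guideline(method, planned_units):
    """Report whether a planned size is below, within, or above the cited figures."""
    low, high, _unit = CITED_GUIDELINES[method]
    if planned_units < low:
        return "below"
    if planned_units > high:
        return "above"
    return "within"


for method, planned in [("grounded theory", 40), ("qualitative ethology", 50)]:
    low, high, unit = CITED_GUIDELINES[method]
    print(method, planned, compare_to_guideline(method, planned), f"(cited: {low} to {high} {unit})")
```

A result of "below" or "above" is a prompt to revisit the purpose and sampling strategy of the study, not a verdict on adequacy.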
Additional considerations in matching sample
size to method are within-method diversity and
the multiple uses of a method. Phenomenology
offers a good illustration of how within-method
diversity and the particular use to which a method
is put can alter the requirements for sample size.
In a phenomenological case study, one case can
be sufficient to show something about an experience that a researcher deems significant for special display (e.g., Wertz, 1983). One case will
not be sufficient, however, if the researcher’s intention is to describe invariant or essential features of an experience. For example, a phenomenological study, as interpreted by Van Kaam
(1959), will likely require 10 to 50 descriptions
of a target experience in order to discern its necessary and sufficient constituents. When phenomenological techniques are used in the service
of a goal other than to produce a phenomenology,
such as generating items for an instrument, at
least 25 descriptions of an experience will likely
be required.
SAMPLE SIZES IN COMBINED
QUALITATIVE AND QUANTITATIVE
STUDIES
Studies combining qualitative and quantitative
approaches involve additional considerations in
determining sufficient sample size. Indeed, so-called methodologically triangulated studies present researchers with many dilemmas (beyond
the scope of this article), the resolution of which
depends on the researcher’s stance concerning the
compatibility of the philosophies and practices of
qualitative and quantitative inquiry.
With respect to sampling, the logics of probability and purposeful sampling are arguably sufficiently irreconcilable in most cases to preclude
using the same subjects for both quantitative and
qualitative purposes (Morse, 1991). Subjects selected for the purposes of statistical representativeness may not fulfill the informational needs
of the study, while participants selected for information purposes do not meet the requirement of
statistical representativeness.
Accordingly,
whether primarily quantitative or qualitative, or
whether designed for purposes of completeness
or confirmation (Breitmayer, Ayres, & Knafl,
1993), such combination studies would require
two samples drawn simultaneously or sequentially according to the two logics of sampling.
Yet, it can also be argued that among persons
chosen according to the logic of probability sampling, there will likely be articulate informants
whose selection for the qualitative portion of a
combined study can be justified as purposeful.
The purposeful sample would have to be expanded
only if the data obtainable from the participants
already sampled were deemed informationally insufficient. Similarly, no additional sampling may
be necessary in studies where further information
obtainable from standardized instruments is desired about a purposefully drawn sample. The
caveat here is that the researcher use the data
from these instruments for purposes of fuller description, rather than to draw statistical inferences.
CONCLUSION
Determining an adequate sample size in qualitative research is ultimately a matter of judgment
and experience in evaluating the quality of the
information collected against the uses to which it
will be put, the particular research method and
sampling strategy employed, and the research
product intended. Numbers have a place in ensuring that a sample is fully adequate to support
particular qualitative enterprises. A good principle to follow is: An adequate sample size in qualitative research is one that permits (by virtue of not being too large) the deep, case-oriented
analysis that is a hallmark of all qualitative inquiry, and that results in (by virtue of not being too
small) a new and richly textured understanding
of experience.
REFERENCES
Breitmayer, B. J., Ayres, L., & Knafl, K. A. (1993).
Triangulation in qualitative research: Evaluation of
completeness and confirmation purposes. Image:
Journal of Nursing Scholarship, 25, 237-243.
Cannon, L. W., Higginbotham, E., & Leung, M. L.
(1988). Race and class bias in qualitative research
on women. Gender & Society, 2, 449-462.
Firestone, W. A. (1993). Alternative arguments for
generalizing from data as applied to qualitative research. Educational Researcher, 22, 16-23.
Glaser, B. G. (1978). Theoretical sensitivity: Advances
in the methodology of grounded theory. Mill Valley,
CA: Sociology Press.
Kuzel, A. J. (1992). Sampling in qualitative inquiry. In
B. F. Crabtree & W. L. Miller (Eds.), Doing qualitative research (pp. 31-44). Newbury Park, CA: Sage.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic
inquiry. Beverly Hills, CA: Sage.
Miles, M. B., & Huberman, A. M. (1994). Qualitative
data analysis: An expanded sourcebook (2nd ed.).
Thousand Oaks, CA: Sage.
Morse, J. M. (1986). Quantitative and qualitative research: Issues in sampling. In P. L. Chinn (Ed.),
Nursing research methodology: Issues and implementation (pp. 181-193). Rockville, MD: Aspen.
Morse, J. M. (1989). Strategies for sampling. In J. M.
Morse (Ed.), Qualitative nursing research: A contemporary dialogue (pp. 117-131). Rockville, MD:
Aspen.
Morse, J. (1991). Approaches to qualitative-quantitative methodological triangulation. Nursing
Research, 40, 120-123.
Morse, J. M. (1994). Designing funded qualitative research. In N. K. Denzin & Y. S. Lincoln (Eds.),
Handbook of qualitative research (pp. 220-235).
Thousand Oaks, CA: Sage.
Patton, M. Q. (1990). Qualitative evaluation and research methods (2nd ed.). Newbury Park, CA: Sage.
Ragin, C. C., & Becker, H. S. (1989). How the microcomputer is changing our analytic habits. In G.
Blank, J. L. McCartney, & E. Brent (Eds.), New
technology in society: Practical applications in research and work (pp. 47-55). New Brunswick, NJ:
Transaction.
Ragin, C. C., & Becker, H. S. (1992). What is a case?
Exploring the foundations of social inquiry. Cambridge: Cambridge University Press.
Sandelowski, M., Holditch-Davis, D., & Harris, B.
G. (1992). Using qualitative and quantitative methods: The transition to parenthood of infertile
couples. In J. F. Gilgun, K. Daly, & G. Handel
(Eds.), Qualitative methods in family research
(pp. 301-322). Newbury Park, CA: Sage.
Schatzman, L., & Strauss, A. (1973). Field research:
Strategies for a natural sociology. Englewood
Cliffs, NJ: Prentice-Hall.
Simons, H. (Ed.). (1980). Towards a science of the
singular: Essays about case study in educational
research and evaluation. Norwich: University of East
Anglia, Center for Applied Research in Education.
Stake, R. E., & Trumbull, D. J. (1982). Naturalistic
generalizations. Review Journal of Philosophy and
Social Science, 7, 1-12.
Strauss, A., & Corbin, J. (1990). Basics of qualitative
research: Grounded theory procedures and techniques. Newbury Park, CA: Sage.
Trost, J. E. (1986). Statistically nonrepresentative
stratified sampling: A sampling technique for qualitative studies. Qualitative Sociology, 9, 54-57.
Van Kaam, A. L. (1959). Phenomenal analysis: Exemplified by a study of the experience of “really feeling
understood.” Journal of Individual Psychology, 15,
66-72.
Wertz, F. J. (1983). From everyday to psychological
description: Analyzing the moments of a qualitative
data analysis. Journal of Phenomenological Psychology, 14, 197-241.