Statistics Teaching Unit

International University

Isabel I de Castilla

Basic Statistical Concepts
Population
The specific group of individuals or individual objects or events to
be studied; the total group to which we make projections and
inferences (hence, the term “inferential statistics”)
Sample
A subset of individual elements from a population which is
examined and from which we draw conclusions about the
population as a whole
Random Sample
A sample selected in such a way that every individual member of
the population has an equal chance of being selected
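A simple random sample can be sketched in a few lines of Python. This is only an illustration: the population of ID numbers, the sample size of 50, and the fixed seed are assumptions made for the example, not part of the original material.

```python
import random

# Hypothetical population: ID numbers 1..1000 (illustrative only).
population = list(range(1, 1001))

# A fixed seed makes the draw repeatable; omit it for a fresh draw each run.
rng = random.Random(42)

# random.sample draws without replacement, and every member of the
# population has the same chance of being selected.
sample = rng.sample(population, k=50)

print(len(sample), "members selected, for example:", sample[:5])
```

Because the draw is made without replacement, no member can appear in the sample twice.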

Bias
In statistics, bias is bad, nasty, evil. In research, it is simply an
obstacle. Bias is the systematic favoritism that is present in the
data collection process which may result in skewed or misleading
results.
Sources of Bias
Sample selection: Non-random samples may be biased.
Data collection: The way questions are asked, as well as the
processing and handling of data may create bias.

Bias is often referred to as systematic error.

Data
The actual measurements obtained through a study or procedure
(“data” is plural; the singular is “datum”)
Types of Data
Numerical: Measurements for which the numbers have value such
as height and weight. Something which has quantity (hence,
“quantitative data”)
Categorical: Observations of categories such as gender or race.
Numbers may be used to label categories but there is no
relationship between the number and its value (e.g., 1 = male and 2
= female)

Statistic
A number that summarizes the data collected from a sample.
Some examples include frequencies, percentages, percentiles, and
averages.
Parameter
Statistics are based on sample data. If the summary number is
from the entire population then it is a parameter. A study that
obtains data from an entire population and is summarized by using
parameters is a census.

Mean
The average of a data set, obtained by summing all the
values in the data set and dividing by the total number of values.
Also called the arithmetic mean; different calculations are used to
create a geometric mean or a harmonic mean; these are not
generally used in social and market research.
Median
When the data values are lined up in order from smallest to largest,
the median is the middle value, the point where half of the values
are above the median and half are below.
Mode
When data are grouped into categories, the mode is the largest
category, based on the number of individuals in the category.

Mean vs Median (vs Mode)
The mean is calculated with the formula \( \bar{x} = \frac{\sum x}{n} \).
The median is the middle value in an ordered distribution.
Consider these data. What’s “average”?

Car Prices   N of Models
$15K         38
$20K         16
$25K         11
$30K          6
$35K          3
$40K          2
$45K          1
$65K          9
$70K         14

[Figure: bar chart of the number of models at each car price from $15K to $70K; there are no models at $50K, $55K, or $60K.]

The mean is $31,400 (n = 100) and the
median is $20,000. The mode is the
most frequent category, $15,000.
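The three figures above can be checked with Python's standard statistics module. This is a minimal sketch: the variable names are made up for the illustration, and the frequency table is expanded into one price per model before the calculation.

```python
from statistics import mean, median, mode

# Frequency table from the example: price in dollars -> number of models.
price_counts = {
    15_000: 38, 20_000: 16, 25_000: 11, 30_000: 6, 35_000: 3,
    40_000: 2, 45_000: 1, 65_000: 9, 70_000: 14,
}

# Expand the table into one price per model (100 values in total).
prices = [price for price, count in price_counts.items() for _ in range(count)]

n = len(prices)
car_mean = mean(prices)      # sum of all prices divided by n
car_median = median(prices)  # middle of the ordered prices
car_mode = mode(prices)      # most frequent price

print(n, car_mean, car_median, car_mode)
```

The mean sits well above the median here because the cluster of $65K and $70K models pulls the sum upward without moving the middle of the ordered list.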

Variation
Not every score is the same. There are different prices for different
cars. Some people pay different prices for the same car. Prices
change over time. As the previous exhibit shows, measures of
central tendency are not sufficient to describe the distribution (and
variability) of scores.
Standard Deviation
The formula for standard deviation is \( s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \).
The standard deviation tells you whether the scores are tightly
grouped or widely distributed. Two data sets can have the same
mean but have very different distributions. In the previous bi-modal
example, the standard deviation is $21,119, indicating a high degree
of variation. Note that \( s^2 \) is the variance.
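As a rough check, the sample standard deviation of the car price example can be computed with Python's statistics module, which divides by n − 1 as in the formula. The variable names are illustrative assumptions.

```python
from statistics import stdev, variance

# Car price example: price in dollars -> number of models.
price_counts = {
    15_000: 38, 20_000: 16, 25_000: 11, 30_000: 6, 35_000: 3,
    40_000: 2, 45_000: 1, 65_000: 9, 70_000: 14,
}
prices = [price for price, count in price_counts.items() for _ in range(count)]

s_squared = variance(prices)  # sample variance, divides by n - 1
s = stdev(prices)             # sample standard deviation, square root of the variance

print(f"variance = {s_squared:,.0f}, standard deviation = {s:,.0f}")
```

Rounded to the nearest dollar, this gives the $21,119 quoted for the example.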

Normal Distribution
The normal distribution is described graphically by the bell-shaped
curve. As the number of values in a distribution grows large, there
is a tendency in many situations for the largest group of individuals
to cluster in the middle of the distribution with successively fewer
individuals as the values move out to the tails or ends of the
distribution. Due to symmetry, the mean and median are equal and
in the middle of the distribution.

Normal Distribution (continued)
The normal distribution is the starting point for understanding
variability. With a normal distribution, standard deviation has
special significance. It is the distance from the mean to the
inflection point, the point where the curvature changes between
concave down (near the mean) and concave up (in the tails). About
68% of the values lie within one standard deviation of the mean
(this is known as the empirical rule). About 95% of the values will
fall within two standard deviations and 99.7% will fall within three
standard deviations.
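The percentages in the empirical rule can be reproduced from the standard normal distribution. The sketch below uses NormalDist from Python's statistics module; the helper name share_within is an assumption made for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of values within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The same shares apply to any normal distribution, whatever its mean and standard deviation, because they depend only on the distance from the mean measured in standard deviations.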

Normal Distribution
The difference in variability is clear when two normal distributions
with the same mean are shown on the same scale.

[Figure: two bell-shaped histograms, both with \( \bar{x} = 50 \). One has s = 1.6, with values running from 45 to 55. The other has s = 16, with values running from 0 to 100. The s = 1.6 distribution is also redrawn on the 0 to 100 scale for comparison.]
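The contrast between the two distributions can also be put in numbers. The sketch below assumes both are exactly normal with mean 50 and the two standard deviations shown (1.6 and 16), and asks what share of each falls between 45 and 55; the interval is chosen only for illustration.

```python
from statistics import NormalDist

narrow = NormalDist(mu=50, sigma=1.6)
wide = NormalDist(mu=50, sigma=16)

def share_between(dist, low, high):
    """Share of a distribution's values that fall between low and high."""
    return dist.cdf(high) - dist.cdf(low)

narrow_share = share_between(narrow, 45, 55)
wide_share = share_between(wide, 45, 55)

print(f"s = 1.6: {narrow_share:.1%} of values lie between 45 and 55")
print(f"s = 16:  {wide_share:.1%} of values lie between 45 and 55")
```

Nearly all of the narrow distribution falls inside the interval, while only about a quarter of the wide one does, even though both have the same mean.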

Standard Scores
The formula for a standard score is \( \frac{x - \bar{x}}{s} \), where x is the original
score and s is the standard deviation.

Among other things, a standard score allows for comparisons when
means and distributions may be different for the scores being
compared. The standard score gives the relative standing of the
original score taking into account the mean and the variation in the
distribution. Standard scores are used in statements like, “Sales at
the Troy store are +2 standard deviations (above the mean).”
Knowing that a score is above or below the mean and that it is 2, 3,
or more standard deviations identifies the score's position relative
to all other scores both in terms of direction (from the mean) and
how extreme the score is given how other scores are distributed.
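A standard score can be computed directly from the formula. In the sketch below the function name standard_score and the store sales figures are hypothetical, invented only to show the arithmetic.

```python
from statistics import mean, stdev

def standard_score(x, values):
    """Distance of x from the mean of values, in standard deviations."""
    return (x - mean(values)) / stdev(values)

# Hypothetical weekly sales (in thousands of dollars) for five stores.
store_sales = [10, 12, 14, 16, 18]

top_store = standard_score(18, store_sales)      # above the mean: positive
average_store = standard_score(14, store_sales)  # exactly at the mean: zero

print(f"{top_store:+.2f} and {average_store:+.2f} standard deviations")
```

The sign gives the direction from the mean and the size gives how extreme the score is, which is what allows comparisons across distributions with different means and spreads.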

Standard Error
Standard error is the same basic concept as standard deviation;
both represent a typical distance from the mean. The difference is
that the original population values will deviate from each other due
to natural phenomena (different height, different ideas, different
characteristics). But standard error is the deviation of the sample
means (from multiple samples of the population).
Sample means vary due to the error that occurs from not doing a
census (hence, “standard error”). According to the central limit
theorem, if the samples are large enough, the distribution of all
possible sample means will have a bell-shaped or normal
distribution. Error above and below the mean cancels out and the
distribution is symmetrical.
\( \frac{\sigma}{\sqrt{n}} \) is the standard error, where σ is the population standard
deviation and n is the sample size.
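A small simulation can illustrate the idea that the standard error is the standard deviation of the sample means. Everything in this sketch is an assumption made for illustration: the simulated population (mean 50, standard deviation about 16), the sample size of 25, the 2,000 repeated samples, and the fixed seed.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 values.
population = [rng.gauss(50, 16) for _ in range(10_000)]
sigma = pstdev(population)  # population standard deviation

n = 25
theoretical_se = sigma / sqrt(n)  # standard error from the formula

# Draw many samples (with replacement) and record each sample mean.
sample_means = [mean(rng.choices(population, k=n)) for _ in range(2_000)]
observed_se = pstdev(sample_means)  # spread of the sample means

print(f"formula: {theoretical_se:.2f}, simulation: {observed_se:.2f}")
```

The two numbers should be close but not identical, since the simulation draws only a finite number of samples.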
