Difference between Normal and 
Skewed Distributions
This presentation will help you determine whether the data set from the problem you are asked to solve has a normal or a skewed distribution.
Knowing whether your data's distribution is skewed or normal is the second way of deciding whether you will use what is called a parametric or a nonparametric test.
The first way (as you may recall from the last decision point) is to determine whether the data is scaled, ordinal, or nominal.
But first,
What is a distribution?
We will illustrate what a distribution is with a data set that describes how many hours students study.
Here is the data set:
Data Set

Student    Hours of Study
Bart       1
Basheba    2
Bella      2
Bob        3
Boston     3
Bunter     3
Buxby      4
Bybee      4
Bwinda     5
From this data set we will create a distribution:
The X axis will be the number of hours of study (1, 2, 3, 4, 5).
The Y axis indicates the number of times the same number occurs (Number of Occurrences: 1, 2, 3).
[Figure: each student's value stacked above its place on the X axis (Hours of Study). One value sits at 1, two at 2, three at 3, two at 4, and one at 5.]
This is a distribution.
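The counting step described above can be sketched in a few lines of Python. This is only an illustration, not part of the original presentation; the variable names `hours` and `occurrences` are made up for the example.

```python
from collections import Counter

# Hours of study for the nine students in the data set above.
hours = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Count how many times each value occurs (the Y axis of the distribution).
occurrences = Counter(hours)

# Print one row per X-axis value, with one "x" per occurrence.
for value in sorted(occurrences):
    print(f"{value} hours: {'x' * occurrences[value]}")
```

Each printed row corresponds to one column of the stacked picture: the longest row is at 3 hours, and the rows get shorter by equal amounts on either side.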
One way to represent a distribution like this is with a second picture of the same data.
[Figure: the same distribution shown two ways; the images are not preserved in this transcript.]
Normal distributions have the majority of the data in the middle, with decreasing but equal amounts on each side toward the tails.
The mean, or average, works really well with normal distributions.
Another way to say it is that the mean describes the center point of a normal distribution well.
[Figure: a normal distribution with the mean marked at its center.]
Here is how you calculate the mean:
Let's put the data into the distribution:
1, 2, 2, 3, 3, 3, 4, 4, 5
Add up all the values, then divide by the number of total values:
Mean = (1+2+2+3+3+3+4+4+5) / 9 = 27 / 9 = 3
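The same arithmetic can be checked with a short Python sketch. The names `total`, `count`, and `mean` are illustrative only.

```python
# The nine hours-of-study values from the distribution.
hours = [1, 2, 2, 3, 3, 3, 4, 4, 5]

total = sum(hours)    # add up all the values
count = len(hours)    # the number of total values
mean = total / count  # divide the sum by the count

print(f"Mean = {total} / {count} = {mean}")
```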
The mean is a good estimate of the center of a 
distribution when the distribution is normal
But, the mean is not a good estimate of the 
center when the distribution is not normal
This is because of what we call OUTLIERS
What is an outlier?
An outlier is a data point that falls outside the 
overall pattern of the distribution
As an example, here is the overall pattern:
1, 2, 2, 3, 3, 3, 4, 4, 5
But what if we changed the 5 to a 50?
1, 2, 2, 3, 3, 3, 4, 4, 50
The 50 falls far outside the overall pattern of the other values, so it is an outlier.
To illustrate what happens to the mean when an outlier is present, let's go back to this distribution:
1, 2, 2, 3, 3, 3, 4, 4, 5
Let's say one student, instead of studying five hours, studies 23 hours a day!
Watch what happens to the mean:
BEFORE
1, 2, 2, 3, 3, 3, 4, 4, 5
Mean = (1+2+2+3+3+3+4+4+5) / 9 = 27 / 9 = 3
AFTER
1, 2, 2, 3, 3, 3, 4, 4, 23
Mean = (1+2+2+3+3+3+4+4+23) / 9 = 45 / 9 = 5
Just by changing one value from "5" to "23", the mean shifted by two hours (from "3" to "5").
Thus, the mean is very sensitive to outliers
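A small Python sketch makes the before-and-after comparison concrete. It uses the standard library's `statistics.mean`; the variable names are illustrative.

```python
from statistics import mean

before = [1, 2, 2, 3, 3, 3, 4, 4, 5]
# Replace only the last value (5) with the outlier (23).
after = before[:-1] + [23]

mean_before = mean(before)
mean_after = mean(after)
shift = mean_after - mean_before

print(f"Before: {mean_before}, after: {mean_after}, shift: {shift}")
```

Eight of the nine values are identical in both lists, yet the single changed value is enough to pull the mean above every one of those eight.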
Therefore, the mean is not a good estimate of 
the center of a distribution when the 
distribution is NOT NORMAL
Here is a guiding principle:
1. If your data set is normally distributed, then you will use a parametric test.
[Figure: a symmetric distribution. X axis: Hours of Study (1 to 5). Y axis: Number of Occurrences (1 to 3).]
2. If your data set is skewed either to the right or to the left, then you will use a nonparametric test.
[Figures: a right-skewed and a left-skewed distribution on the same axes.]
In summary,
A parametric test is used when the problem's data set is normally distributed.
A non-parametric test is used when the problem's data set is very skewed to the right or the left, or very non-normal.
So, how do you know if your data is normally distributed?
Go to the Learning Module entitled: Assessing Skew. You will find it next to the link for this presentation.
After you have viewed that learning module, use SPSS to assess the skew of your data.
Is your data normally distributed or skewed?
If your data is skewed, with a critical ratio greater than 2.0 or less than -2.0, select Skewed.
Otherwise, select Normal.
It is important to note that if you choose Skewed, your data will be analyzed using what are called non-parametric tests.
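The presentation leaves the definition of the critical ratio to the Assessing Skew module. As a rough sketch only, the code below assumes the critical ratio is the sample skewness divided by its standard error, using the adjusted skewness formula and standard error that statistics packages commonly report. The function names `skew_critical_ratio` and `classify` are hypothetical, and the 2.0 cutoff is the one stated above.

```python
import math

def skew_critical_ratio(values):
    """Assumed definition: sample skewness divided by its standard error."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    variance = sum(d ** 2 for d in deviations) / (n - 1)
    if variance == 0:
        raise ValueError("all values are identical")
    std_dev = math.sqrt(variance)
    # Adjusted sample skewness.
    skewness = (n / ((n - 1) * (n - 2))) * sum((d / std_dev) ** 3 for d in deviations)
    # Standard error of the skewness, which depends only on the sample size.
    std_error = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skewness / std_error

def classify(values, cutoff=2.0):
    """Apply the rule above: beyond +/- cutoff is Skewed, otherwise Normal."""
    return "Skewed" if abs(skew_critical_ratio(values)) > cutoff else "Normal"

print(classify([1, 2, 2, 3, 3, 3, 4, 4, 5]))   # the symmetric data set
print(classify([1, 2, 2, 3, 3, 3, 4, 4, 23]))  # the same data with one outlier
```

If your SPSS output reports a skewness statistic and its standard error, dividing the first by the second should correspond to the ratio computed here, under the assumption stated above.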
Non-parametric tests differ from parametric tests in one simple way:
Parametric tests use the mean in their calculations.
Non-parametric tests use the median.
What is the median?
The median is simply the middle score of a data set, where
• 50% of the scores fall below it and
• 50% of the scores are above it
To illustrate, let's go back to this distribution:
1, 2, 2, 3, 3, 3, 4, 4, 5
With the median we simply determine the midpoint. Line the values up in order and find the one with the same number of values on each side:
1  2  2  3  [3]  3  4  4  5
There are 4 units to the left of the middle value and 4 units to the right.
Median = 3
Notice that the Median is unaffected by outliers
To illustrate this, we’ll change the value “5” to a 
“10”:
1, 2, 2, 3, 3, 3, 4, 4, 10
Watch what happens to the median:
1  2  2  3  [3]  3  4  4  10
There are still 4 units to the left of the middle value and 4 units to the right.
Hmm, it's still 3.
But what if we change the value 10 to 1,000?
Watch again what happens to the median:
1  2  2  3  [3]  3  4  4  1,000
There are still 4 units to the left of the middle value and 4 units to the right.
What do you know: it's still 3.
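The three cases can be checked with the standard library's `statistics.median`. This is a minimal sketch; the list names are illustrative.

```python
from statistics import median

original = [1, 2, 2, 3, 3, 3, 4, 4, 5]
# Swap only the largest value, first for 10 and then for 1,000.
with_10 = original[:-1] + [10]
with_1000 = original[:-1] + [1000]

medians = [median(data) for data in (original, with_10, with_1000)]
print(medians)
```

The median depends only on which value sits in the middle position once the data is sorted, so changing the size of the largest value does not move it.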
Here is the key takeaway:
The mean is affected by outliers.
The median is not affected by outliers.
Therefore the mean is used with more or less NORMAL DISTRIBUTIONS,
and the median is used with SKEWED OR NON-NORMAL DISTRIBUTIONS.
So, why doesn’t everyone use non-parametric 
methods since they are unaffected by outliers?
Because parametric methods provide more 
meaningful information about the population 
than do non-parametric methods
So, if your data is skewed, it's better to get what information you can from a non-parametric test, even though a parametric test would have provided more information (if your data had been normally distributed).
So, based on your analysis, which distribution best reflects your data set: Normal or Skewed?

Is the distribution normal or skewed?

  • 1. Difference between Normal and Skewed Distributions
  • 2. This presentation will help you determine if the data set from the problem you are asked to solve has a normal or skewed distribution
  • 3. This presentation will help you determine if the data set from the problem you are asked to solve has a normal or skewed distribution Normal Skewed
  • 4. Knowing if your data’s distribution is skewed or normal is the second way of knowing if you will use what is called a parametric or a nonparametric test
  • 5. The first way (as you may recall from the last decision point) is to determine if the data is scaled, ordinal, or nominal
  • 7. What is a distribution?
  • 8. We will illustrate what a distribution is with a data set that describes the hours students’ study
  • 9. Here is the data set:
  • 11. Student Hours of Study Bart 1
  • 12. Student Hours of Study Bart 1 Basheba 2
  • 13. Student Hours of Study Bart 1 Basheba 2 Bella 2
  • 14. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3
  • 15. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3
  • 16. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3
  • 17. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4
  • 18. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4
  • 19. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5
  • 20. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 Data
  • 21. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 Data Set
  • 22. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 From this data set we will create a distribution:
  • 23. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5
  • 24. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 The X Axis, will be the number of hours of study
  • 25. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 Hours of Study
  • 26. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 Hours of Study 1
  • 27. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 1 2 Hours of Study
  • 28. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 1 2 3 Hours of Study
  • 29. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 1 2 3 4 Hours of Study
  • 30. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 1 2 3 4 5 Hours of Study
  • 31. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 1 2 3 4 5 Hours of Study
  • 32. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 The Y Axis, indicates the number of times the same number occurs 1 2 3 4 5 Hours of Study
  • 33. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 The Y Axis, indicates the number of times the same number occurs 1 2 3 4 5 Hours of Study
  • 34. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 The Y Axis, indicates the number of times the same number occurs 1 2 3 4 5 Hours of Study
  • 35. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 The Y Axis, indicates the number of times the same number occurs 1 2 3 4 5 Hours of Study
  • 36. Student Hours of Study Bart 1 Basheba 2 Bella 2 Bob 3 Boston 3 Bunter 3 Buxby 4 Bybee 4 Bwinda 5 The Y Axis, indicates the number of times the same number occurs 1 2 3 4 5 Hours of Study
  • 37. Data set (Student: Hours of Study): Bart 1, Basheba 2, Bella 2, Bob 3, Boston 3, Bunter 3, Buxby 4, Bybee 4, Bwinda 5. The Y axis indicates the number of times the same number occurs; the X axis shows Hours of Study (1 to 5)
  • 54. [Figure: histogram with Hours of Study (1 to 5) on the X axis and Number of Occurrences (1 to 3) on the Y axis] This is a distribution
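The counting step behind the chart can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the original slides; the variable names `hours_of_study` and `occurrences` are made up for the example.

```python
from collections import Counter

# Hours of study for each student, taken from the data set above
hours_of_study = {
    "Bart": 1, "Basheba": 2, "Bella": 2,
    "Bob": 3, "Boston": 3, "Bunter": 3,
    "Buxby": 4, "Bybee": 4, "Bwinda": 5,
}

# Count how many times each value occurs (this is the Y axis of the chart)
occurrences = Counter(hours_of_study.values())

# Print a simple text histogram, one row per value on the X axis
for hours in sorted(occurrences):
    bar = "#" * occurrences[hours]
    print(f"{hours} hours | {bar} ({occurrences[hours]})")
```

The tallest bar sits at 3 hours, with the counts falling off evenly on both sides.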
  • 58. One way to represent a distribution like this is like this:
  • 60. Normal distributions have the majority of the data in the middle
  • 62. With the amounts decreasing equally on both sides toward the tails
  • 63. The mean or average works really well with normal distributions
  • 64. Another way to say it is that the mean describes the center point of a normal distribution well
  • 68. Here is how you calculate the mean:
  • 70. Let’s put the data into the distribution
  • 88. Data: 1, 2, 2, 3, 3, 3, 4, 4, 5. Add up all the values and divide by the total number of values: Mean = (1+2+2+3+3+3+4+4+5) / 9 = 27 / 9 = 3
  • 90. Mean = 3
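The same arithmetic can be checked with a short Python sketch. The variable names are illustrative only.

```python
# Hours of study from the data set above
hours = [1, 2, 2, 3, 3, 3, 4, 4, 5]

total = sum(hours)    # add up all the values
count = len(hours)    # the number of total values
mean = total / count  # divide the sum by the count

print(f"Mean = {total} / {count} = {mean}")
```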
  • 91. The mean is a good estimate of the center of a distribution when the distribution is normal
  • 92. But, the mean is not a good estimate of the center when the distribution is not normal
  • 93. This is because of what we call OUTLIERS
  • 94. What is an outlier?
  • 95. An outlier is a data point that falls outside the overall pattern of the distribution
  • 97. As an example, here is the overall pattern: 1, 2, 2, 3, 3, 3, 4, 4, 5
  • 102. But what if we changed the 5 to a 50: 1, 2, 2, 3, 3, 3, 4, 4, 50
  • 104. To illustrate what happens to the mean when an outlier is present, let’s go back to this distribution: 1, 2, 2, 3, 3, 3, 4, 4, 5
  • 105. Let’s say one student, instead of studying 5 hours, studies 23 hours a day!
  • 106. Watch what happens to the mean:
  • 107. Before
  • 108. Data: 1, 2, 2, 3, 3, 3, 4, 4, 5. Mean = (1+2+2+3+3+3+4+4+5) / 9 = 27 / 9 = 3
  • 109. After
  • 115. Data: 1, 2, 2, 3, 3, 3, 4, 4, 23. Mean = (1+2+2+3+3+3+4+4+23) / 9 = 45 / 9 = 5
  • 117. Once again, BEFORE: 1, 2, 2, 3, 3, 3, 4, 4, 5. Mean = (1+2+2+3+3+3+4+4+5) / 9 = 27 / 9 = 3
  • 118. AFTER: 1, 2, 2, 3, 3, 3, 4, 4, 23. Mean = (1+2+2+3+3+3+4+4+23) / 9 = 45 / 9 = 5
  • 119. Just by changing one value from “5” to “23”, the mean rose by two (from “3” to “5”)
  • 120. Thus, the mean is very sensitive to outliers
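A small Python sketch of the before and after comparison makes the shift visible. The helper `mean` and the names `before`, `after` and `shift` are made up for this illustration.

```python
def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

before = [1, 2, 2, 3, 3, 3, 4, 4, 5]
after = before[:-1] + [23]  # same data, but the 5 is replaced by 23

shift = mean(after) - mean(before)

print(f"Before: mean = {mean(before)}")
print(f"After:  mean = {mean(after)}")
print(f"One changed value moved the mean by {shift}")
```

Eight of the nine values are identical in both lists, yet the mean no longer sits where most of the data is.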
  • 121. Therefore, the mean is not a good estimate of the center of a distribution when the distribution is NOT NORMAL
  • 125. Here is a guiding principle
  • 128. 1. If your data set is normally distributed like this [figure: histogram with Hours of Study (1 to 5) on the X axis and Number of Occurrences (1 to 3) on the Y axis], then you will use a parametric test
  • 136. 2. If your data set is skewed either to the right or to the left, then you will use a non-parametric test
  • 144. In summary: a parametric test is used when the problem’s data set is normally distributed. A non-parametric test is used when the problem’s data set is very skewed to the right or the left, or very non-normal.
  • 147. So, how do you know if your data is normally distributed? Go to the Learning Module entitled: Assessing Skew. You will find it next to the link for this presentation. After you have viewed that learning module, use SPSS to assess the skew of your data.
  • 148. Is your data normally distributed or skewed?
  • 149. If your data was skewed with a critical ratio greater than 2.0 or less than -2.0 then select Skewed
  • 151. It is important to note that if you choose Skewed, your data will be analyzed using what are called non-parametric tests
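For readers without SPSS at hand, the check can be sketched in Python. The slides do not define the critical ratio, so this sketch assumes it is the skewness statistic divided by its standard error, using the adjusted sample skewness formula and the standard-error formula commonly reported by statistics packages. The function names are hypothetical, and the result should be compared with SPSS output before relying on it.

```python
import math
import statistics

def sample_skewness(values):
    """Adjusted sample skewness (assumed to match what SPSS reports)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    cubed = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

def skewness_standard_error(n):
    """Standard error of the skewness statistic for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def critical_ratio(values):
    """Assumption: critical ratio = skewness / standard error of skewness."""
    return sample_skewness(values) / skewness_standard_error(len(values))

def classify(values, cutoff=2.0):
    """Apply the rule from the slide: beyond +/- 2.0 means Skewed."""
    return "Skewed" if abs(critical_ratio(values)) > cutoff else "Normal"

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
with_outlier = [1, 2, 2, 3, 3, 3, 4, 4, 23]

for data in (symmetric, with_outlier):
    print(data, round(critical_ratio(data), 2), classify(data))
```

With only nine values this is a toy illustration; the original data set comes out as Normal and the version with the 23-hour outlier comes out as Skewed.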
  • 152. Non-parametric tests differ from parametric tests in one simple way:
  • 154. Parametric tests use the mean in their calculations. Non-parametric tests use the median
  • 155. What is the median?
  • 158. The median is simply the middle score of a data set, where 50% of the scores fall below it and 50% of the scores are above it
  • 160. To illustrate, let’s go back to this distribution: 1, 2, 2, 3, 3, 3, 4, 4, 5
  • 161. With the median we simply determine the midpoint of the values once they are in order: 1, 2, 2, 3, [3], 3, 4, 4, 5
  • 167. There are 4 values to the left of the middle value and 4 values to the right of it, so the median is 3
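Finding the midpoint can be sketched as a small Python function. The name `median` is chosen for this illustration. The even-count branch, which averages the two middle values, is the usual convention and is not covered in the slides.

```python
def median(values):
    """Return the middle value of the data once it is sorted."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value
        return ordered[middle]
    # Even count (not shown in the slides): average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

hours = [2, 1, 2, 3, 3, 3, 4, 4, 5]
print(f"Sorted: {sorted(hours)}")
print(f"Median: {median(hours)}")
```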
  • 168. Notice that the Median is unaffected by outliers
  • 169. To illustrate this, we’ll change the value “5” to a “10”:
  • 171. 1, 2, 2, 3, 3, 3, 4, 4, 5 becomes 1, 2, 2, 3, 3, 3, 4, 4, 10
  • 177. Watch what happens to the median: 1, 2, 2, 3, [3], 3, 4, 4, 10. There are still 4 values on each side of the middle value. Hmm, it’s still 3
  • 178. But what if we change the value 10 to 1,000?
  • 184. Watch again what happens to the median: 1, 2, 2, 3, [3], 3, 4, 4, 1000. There are still 4 values on each side of the middle value. What do you know, it’s still 3
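The three cases (5, 10 and 1,000) can be run side by side with Python's standard `statistics` module. This is a sketch for illustration; the names `base` and `results` are made up.

```python
import statistics

# The first eight values stay the same; only the last value changes
base = [1, 2, 2, 3, 3, 3, 4, 4]

results = {}
for last_value in (5, 10, 1000):
    data = base + [last_value]
    results[last_value] = (statistics.mean(data), statistics.median(data))
    mean, median = results[last_value]
    print(f"last value {last_value:>4}: mean = {mean:.2f}, median = {median}")
```

The mean climbs as the last value grows, while the median stays at 3 in every case.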
  • 185. Here is the key takeaway:
  • 187. The mean is affected by outliers. The median is not affected by outliers
  • 188. Therefore the mean is used with more or less NORMAL DISTRIBUTIONS
  • 190. And the median is used with SKEWED OR NON-NORMAL DISTRIBUTIONS
  • 194. So, why doesn’t everyone use non-parametric methods since they are unaffected by outliers?
  • 195. Because parametric methods provide more meaningful information about the population than do non-parametric methods
  • 197. So, if your data is skewed, it’s better to get what information you can from a non-parametric test, even though a parametric test would have provided more information had your data been normally distributed
  • 199. So, based on your analysis, which distribution best reflects your data set: Normal or Skewed?