Mathematics in the Modern World
Chapter 4: Data Management
4.1 Descriptive Statistics
Statistics is a branch of mathematics that deals
with data collection, organization, analysis,
interpretation and presentation.
Data collection is defined as the procedure of
collecting, measuring and analyzing accurate
insights for research using standard validated
techniques.
Data organization refers to the method of
classifying and organizing data sets to make
them more useful. It can be applied to physical
or digital records.
Data analysis is a process of inspecting, cleansing,
transforming, and modeling data with the goal of
discovering useful information, informing
conclusions, and supporting decision-making.
Interpretation of data is the process of assigning
meaning to the collected information and
determining the conclusions, significance, and
implications of the findings.
Presentation of data refers to the organization of
data into tables, graphs or charts, so that logical
and statistical conclusions can be derived from the
collected measurements.
Descriptive statistics describes the characteristics
of a specific data set by giving short summaries
of the sample and measures of the data.
Basic Statistical Concepts
A population consists of the totality of the
observations of interest, and a sample is a part
of the population. A variable is any characteristic,
number, or quantity that can be measured or
counted.
Two kinds of variables:
1. Qualitative variables, also called categorical
variables, are variables that are not numerical.
They describe data that fit into categories.
2. Quantitative variables are numerical. They can
be ranked and have order.
Quantitative variables can be classified further into
discrete variables and continuous variables.
A discrete variable is a variable whose value
is obtained by counting.
Continuous variables can assume an infinite
number of values between any two specific
values. They are obtained by measuring. They
often include fractions and decimals.
Examples
Discrete
number of students present
number of red marbles in a jar
number of heads when flipping three coins
students’ grade level
Continuous
height of students in class
weight of students in class
time it takes to get to school
distance traveled between classes
Types of Statistical Data
1. Numerical data. These data have meaning as a
measurement, such as a person’s height, weight, IQ,
or blood pressure, or the shares of stock a person owns.
2. Categorical data. Categorical data represent
characteristics such as a person’s gender, marital
status, hometown, or the types of movies they like.
Categorical data can take on numerical values (such
as “1” indicating male and “2” indicating female), but
those numbers don’t have mathematical meaning.
Four Levels of Measurement
1. Nominal – the lowest of the four levels of measurement. It deals with
names, categories, or labels (e.g., eye color, yes or no responses to a
survey, favorite breakfast cereal, and the number on the back of a football
jersey).
2. Ordinal – data at this level can be ordered, but the differences between
values are not meaningful (e.g., ten cities ranked from one to ten, where the
differences between ranks don't make much sense; or letter grades, where A is
higher than B but nothing more is known about how much higher).
3. Interval – data at this level can be ordered, and differences between
values are meaningful, but there is no true zero point
(e.g., the Fahrenheit and Celsius temperature scales).
4. Ratio – the highest level of measurement. Data possess all of the features of
the interval level, in addition to an absolute zero. Because of the true zero, it
makes sense to compare the ratios of measurements.
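As a short worked illustration of the difference between the interval and ratio levels (the temperatures are chosen for illustration and are not from the original slides): 20 °C is not "twice as hot" as 10 °C, because the zero of the Celsius scale is arbitrary. Converting both values to kelvins, a ratio scale with an absolute zero, gives

$$\frac{293.15\ \text{K}}{283.15\ \text{K}} \approx 1.035$$

so the warmer temperature is only about 3.5% higher on the absolute scale, while the difference of 10 degrees is the same on both scales.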
4.2 Data Collection Method
Methods of Collecting Data
1. In-Person Interviews
Pros: In-depth, with a high degree of confidence in the data
Cons: Time-consuming, expensive, and can be dismissed as anecdotal
2. Mail Surveys
Pros: Can reach anyone and everyone – no barrier
Cons: Expensive, data collection errors, lag time
3. Phone Surveys
Pros: High degree of confidence in the data collected, can reach almost
anyone
Cons: Expensive, cannot be self-administered, need to hire an agency
4. Web/Online Surveys
Pros: Cheap, can be self-administered, very low probability of data errors
Cons: Not all your customers might have an email address or be on the
internet; customers may be wary of divulging information online
Three Ways of Presenting Data
1. Textual – this method presents data
in a paragraph or a
number of paragraphs.
2. Tabular – the method of presenting data using
a statistical table, a systematic organization of
data in columns and rows.
3. Graphical – a chart representing the quantitative
variations or changes of variables in pictorial or
diagrammatic form.
4.3 Frequency Distribution
Frequency is the rate that measures how often
something occurs.
Example 1
Jack joins football practice every Wednesday morning,
Sunday morning and afternoon.
The frequency of Jack’s football practice every week is 3 (2 on
Sunday and 1 on Wednesday).
By counting frequencies, we can build a frequency
distribution table.
Example 2
Jack’s team has scored the following numbers of goals in their games,
3, 1, 2, 1, 3, 2, 4, 2, 3, 2, 5, 4, 3, 2.
Jack put the numbers in order, then added up:
how often 1 occurs (2 times),
how often 2 occurs (5 times),
how often 3 occurs (4 times),
how often 4 occurs (2 times),
how often 5 occurs (1 time).
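The tally in Example 2 can be sketched in Python. This is a minimal illustration rather than part of the original slides: it uses `collections.Counter` from the standard library, and the variable names `goals` and `frequency` are chosen here for the example.

```python
from collections import Counter

# Goals scored by Jack's team, as listed in Example 2.
goals = [3, 1, 2, 1, 3, 2, 4, 2, 3, 2, 5, 4, 3, 2]

# Counter tallies how often each distinct value occurs.
frequency = Counter(goals)

# Print the frequency distribution table in increasing order of goals.
print("Goals  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")
```

The frequencies add up to 14, the number of games listed, which is a quick check that no observation was missed.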
Graphical Representation of Frequency Distribution
A. Bar Graph is a pictorial representation of statistical data in such a way
that length of the rectangles in the graph represents the proportional value
of the variable. Bar graphs are generally used to compare the values of
several variables at a time to analyze data. The length of the bars
(horizontal or vertical) represents the frequency of the variable and is
applicable to discrete categories only.
B. Line graph or Line chart is a graphical display of information that
changes continuously over time. Within a line graph, there are points
connecting the data to show a continuous change. The lines in a line graph
can descend and ascend based on the data. We can also compare different
events, situations, and information.
C. Pie Chart is a type of graph that displays data in a circular graph. The
pieces of the graph are proportional to the fraction of the whole in each
category. Each slice of the pie is relative to the size of that category in the
group as a whole. The entire “pie” represents 100 percent of a whole, while
the pie “slices” represent portions of the whole.
4.4 Measures of Central Tendency
A. Mean
It is the most common measure of central location. It is
obtained by dividing the sum of all observed values by
the number of observations. The mean is computed as
$$\bar{x} = \frac{\sum x}{n}$$
where x is the value of each observation in the sample and
n is the total number of observations in the sample.
It is worth noting that the mean has the following characteristics:
1. The mean is affected by the presence of extreme values.
2. The sum of the deviations of the observations from the mean is zero.
3. The sum of the squared deviations of the observations from the
mean is minimum.
4. It is a good measure for interval and ratio type of data.
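Characteristics 2 and 3 can be checked numerically. The sketch below reuses the goal counts from Example 2 of Section 4.3; the helper name `sum_sq_dev` and the alternative centers tried are assumptions made for illustration.

```python
import math

# Goal counts from Example 2 of Section 4.3.
goals = [3, 1, 2, 1, 3, 2, 4, 2, 3, 2, 5, 4, 3, 2]
mean = sum(goals) / len(goals)

def sum_sq_dev(center):
    """Sum of squared deviations of the observations from `center`."""
    return sum((x - center) ** 2 for x in goals)

# Characteristic 2: deviations from the mean sum to zero
# (up to floating-point rounding).
sum_dev = sum(x - mean for x in goals)
print(math.isclose(sum_dev, 0.0, abs_tol=1e-9))

# Characteristic 3: any other center gives a larger sum of squared deviations.
other_centers = [mean - 1, mean - 0.1, mean + 0.1, mean + 1]
print(all(sum_sq_dev(mean) < sum_sq_dev(c) for c in other_centers))
```

Trying a handful of centers does not prove the property; it only illustrates it for this data set.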
B. Median
It is the middle value of a set of observations arranged in
increasing or decreasing order. This measure divides the
data into two groups with an equal number of observations.
The median has the following characteristics:
1. It is not affected by the presence of extreme observations.
2. The sum of absolute deviations of the observation from
the median is minimum.
3. It is an appropriate measure for an ordinal type of data.
C. Mode
It is the value that occurs most often in the data.
Note that a data set can have two modes. In such a case,
the distribution of the data set is bimodal (with two modes).
When a data set has more than two modes, the
distribution is called multimodal.
The mode has the following characteristics:
1. Mode is determined by frequency.
2. It is an appropriate measure for nominal data.
Example 1 (for ungrouped data)
The following are the 3rd year math grades of an applied math student:
1.6 1.2 1.9 1.5 1.5 1.5 1.0 1.3 1.0
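Mean:
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_9}{9} = \frac{1.6 + 1.2 + 1.9 + 1.5 + 1.5 + 1.5 + 1.0 + 1.3 + 1.0}{9} = \frac{12.5}{9} \approx 1.39$$
Median:
1.0 1.0 1.2 1.3 1.5 1.5 1.5 1.6 1.9
With nine ordered values, the median is the fifth value, 1.5.
Mode: 1.5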
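The three measures in Example 1 can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen for illustration, and `statistics.multimode` (available from Python 3.8) is used so that bimodal or multimodal data would return every mode.

```python
import statistics

# Third-year math grades from Example 1.
grades = [1.6, 1.2, 1.9, 1.5, 1.5, 1.5, 1.0, 1.3, 1.0]

mean = statistics.mean(grades)        # sum of the values divided by their count
median = statistics.median(grades)    # middle value of the sorted data
modes = statistics.multimode(grades)  # list of all most frequent values

print(f"mean   = {mean:.2f}")
print(f"median = {median}")
print(f"modes  = {modes}")
```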
Example 2 (for grouped data)
The mean for grouped data is given by
$$\bar{x} = \frac{\sum f_i x_i}{n}$$
where f_i is the frequency of the ith class interval,
x_i is the class mark of the ith interval, and
n is the total number of observations.
Solving for the mean (classes are listed from lowest to highest, and <cf is the less-than cumulative frequency):
Class limits   f    x      fx      <cf   Class boundaries
20 – 27        3    23.5   70.5    3     19.5 – 27.5
28 – 35        7    31.5   220.5   10    27.5 – 35.5
36 – 43        10   39.5   395     20    35.5 – 43.5
44 – 51        6    47.5   285     26    43.5 – 51.5
52 – 59        2    55.5   111     28    51.5 – 59.5
60 – 67        2    63.5   127     30    59.5 – 67.5
Total          30          1209
$$\bar{x} = \frac{70.5 + 220.5 + 395 + 285 + 111 + 127}{30} = \frac{1209}{30} = 40.3$$
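The grouped mean can be sketched as follows. The representation of each class as a `(lower limit, upper limit, frequency)` tuple and the variable names are assumptions made for this illustration; the class mark is taken as the midpoint of the two class limits.

```python
# Each class: (lower limit, upper limit, frequency), lowest class first.
classes = [
    (20, 27, 3),
    (28, 35, 7),
    (36, 43, 10),
    (44, 51, 6),
    (52, 59, 2),
    (60, 67, 2),
]

# n is the total number of observations (the sum of the frequencies).
n = sum(f for _, _, f in classes)

# Sum of f * x, where the class mark x is the midpoint of the class limits.
total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)

grouped_mean = total_fx / n
print(f"n = {n}, sum of fx = {total_fx}, mean = {grouped_mean:.1f}")
```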
The median for grouped data is given by
$$Md = LCB + \left(\frac{\frac{n}{2} - cf_p}{f_m}\right) i$$
where LCB is the lower boundary of the median class,
i is the size of the class interval,
cf_p is the cumulative frequency of the interval preceding the median class, and
f_m is the frequency of the median class.
Median Class – is the class whose cumulative frequency is equal to n/2 or the next
higher value.
Solving for the median:
$$\frac{n}{2} = \frac{30}{2} = 15$$
Lower boundary of the median class: LCB = 35.5
Cumulative frequency before the median class: cf_p = 10
Frequency of the median class: f_m = 10
Class size: i = 8
$$Md = LCB + \left(\frac{\frac{n}{2} - cf_p}{f_m}\right) i = 35.5 + \left(\frac{15 - 10}{10}\right)(8) = 39.5$$
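The steps above can be sketched in code. The function name `grouped_median` and the tuple layout are assumptions for illustration, and the class boundaries are assumed to lie 0.5 below the lower limit and 0.5 above the upper limit, as in the table's class boundaries.

```python
def grouped_median(classes):
    """Median for grouped data.

    classes: (lower limit, upper limit, frequency) tuples, lowest class first.
    Assumes integer class limits, so boundaries are limit - 0.5 and limit + 0.5.
    """
    n = sum(f for _, _, f in classes)
    half = n / 2
    cf_p = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f_m in classes:
        # The median class is the first class whose cumulative frequency
        # reaches n/2.
        if cf_p + f_m >= half:
            lcb = lower - 0.5        # lower boundary of the median class
            i = (upper + 0.5) - lcb  # class size
            return lcb + (half - cf_p) / f_m * i
        cf_p += f_m
    raise ValueError("classes must contain at least one observation")

classes = [(20, 27, 3), (28, 35, 7), (36, 43, 10),
           (44, 51, 6), (52, 59, 2), (60, 67, 2)]
median = grouped_median(classes)
print(median)
```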
The mode for grouped data is given by
$$Mo = LCB + \left(\frac{f_m - f_1}{2f_m - f_1 - f_2}\right) i$$
where LCB is the lower boundary of the modal class,
i is the size of the class interval,
f_m is the frequency of the modal class,
f_1 is the frequency of the class preceding the modal class, and
f_2 is the frequency of the class following the modal class.
Modal Class – is the class with the highest frequency.
Solving for the mode:
$$Mo = LCB + \left(\frac{f_m - f_1}{2f_m - f_1 - f_2}\right) i = 35.5 + \left(\frac{10 - 7}{20 - 7 - 6}\right)(8) \approx 38.9$$
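A minimal sketch of the grouped mode computation follows. The function name `grouped_mode` and the tuple layout are assumptions for illustration. Two further assumptions, not stated in the slides, are made explicit: if several classes tie for the highest frequency the first one is used, and a missing neighboring class is treated as having frequency 0.

```python
def grouped_mode(classes):
    """Mode for grouped data.

    classes: (lower limit, upper limit, frequency) tuples, lowest class first.
    Assumes integer class limits, so the lower boundary is lower limit - 0.5.
    """
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))  # index of the (first) modal class
    f_m = freqs[m]
    # Assumption: a neighbor that does not exist counts as frequency 0.
    f_1 = freqs[m - 1] if m > 0 else 0
    f_2 = freqs[m + 1] if m < len(freqs) - 1 else 0
    lower, upper, _ = classes[m]
    lcb = lower - 0.5      # lower boundary of the modal class
    i = upper - lower + 1  # class size
    # Note: the denominator is zero when f_m equals both neighbors.
    return lcb + (f_m - f_1) / (2 * f_m - f_1 - f_2) * i

classes = [(20, 27, 3), (28, 35, 7), (36, 43, 10),
           (44, 51, 6), (52, 59, 2), (60, 67, 2)]
mode = grouped_mode(classes)
print(f"{mode:.1f}")
```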
4.5 Measures of Variability
Variability for Ungrouped Data
• Range - The range (R) is defined as the difference between the
highest value (HV) and the lowest value (LV) in the data. That is,
$$R = HV - LV$$
• Variance
It is defined as the average of the squared deviations from the mean.
It is the measure that considers the position of each observation
relative to the mean.
$$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$$
or
$$s^2 = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$$
• Standard Deviation (the most widely encountered) - It is
the measure of the spread or dispersion of scores from the
mean of the distribution. It is the square root of the variance.
$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$$
or
$$s = \sqrt{\frac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)}}$$
Variability for Grouped Data
Range:
$$R = \text{Highest class mark} - \text{Lowest class mark}$$
Variance:
$$s^2 = \frac{n\sum f x^2 - \left(\sum f x\right)^2}{n(n - 1)}$$
Standard Deviation:
$$s = \sqrt{\frac{n\sum f x^2 - \left(\sum f x\right)^2}{n(n - 1)}}$$
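As an illustration, the grouped formulas can be applied to the frequency table from Example 2 of Section 4.4. The slides do not work this example, so the sketch and its variable names are an addition; each class is represented by its class mark x and frequency f.

```python
import math

# (class mark x, frequency f) for the classes 20-27 through 60-67
# from the grouped-data example in Section 4.4.
table = [(23.5, 3), (31.5, 7), (39.5, 10), (47.5, 6), (55.5, 2), (63.5, 2)]

n = sum(f for _, f in table)
sum_fx = sum(f * x for x, f in table)
sum_fx2 = sum(f * x * x for x, f in table)

# Range for grouped data: highest class mark minus lowest class mark.
marks = [x for x, _ in table]
data_range = max(marks) - min(marks)

# Variance and standard deviation from the grouped computational formula.
variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
std_dev = math.sqrt(variance)

print(f"range = {data_range}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
```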
4.6 Testing a Statistical Hypothesis
Hypothesis testing is the most significant area of statistical
inference. It is a step-by-step process in making inferences
(conclusions) about a population.
The truth of a statistical hypothesis can be known with certainty only
by examining the entire population. In practice, we take a portion of the
population of interest and use the information obtained from this portion
to decide whether the statistical hypothesis is likely to be true or false.
We “reject” the statistical hypothesis when the sample is inconsistent
with it, and “do not reject” it otherwise. Note that the rejection of a
statistical hypothesis means that the sample gives strong evidence that it
is false, but its acceptance does not necessarily mean it is true.
Acceptance of the stated hypothesis implies only that there is not
enough evidence to reject it.
Types of Statistical Hypothesis
We use the term null hypothesis, denoted by H0, for the hypothesis
we want to test, that is, to either reject or accept.
If the null hypothesis is rejected, the alternative hypothesis,
denoted by H1, is then accepted. The null hypothesis
H0 is stated so that it specifies an exact value, while the
alternative hypothesis H1 is stated so that it allows for a
range of values. For example, if the null
hypothesis H0 is x = 8, the alternative hypothesis H1 might
be x < 8, x > 8, or x ≠ 8.
Types of Statistical Tests
If the alternative hypothesis of a statistical test is
one-sided, for example, H1: x < 8 or H1: x > 8, it is said to be a
one-tailed test. On the other hand, if the alternative
hypothesis is two-sided, for example, H1: x ≠ 8, the test is
said to be two-tailed.
Types of Error
However, a decision to accept or reject a statistical
hypothesis about a population parameter may lead
to a wrong conclusion. For instance, a researcher could reject H0
when, in fact, it is true. This is called a type I error. One
might also accept H0 when it is false. In this case, a type II error
has occurred.
Constructing the Null and Alternative Hypothesis
A.Testing for Means
In hypothesis testing, means, variances, or proportions may
be compared to justify rejecting or accepting the null
hypothesis. In many studies, the sample means of an
experimental group and a control group are compared.
Example 1
1. A researcher wants to know if the average test score of the students taking a
particular examination is 80.
H0: 𝜇 = 80 (the average test score of the students taking a
particular examination is 80)
H1: 𝜇 ≠ 80 (the average test score of the students taking a
particular examination is not 80)
2. A small group of researchers is conducting a study to show if the average
number of hours a student spends on social media sites per day is greater than
10.
H0: 𝜇 = 10 (average number of hours a student spends on social
media sites per day is 10)
H1: 𝜇 > 10 (average number of hours a student spends on social
media sites per day is greater than 10)
3. A teacher wants to know if there is a difference in the performance of his
two classes based on their average grades.
H0: 𝜇1 = 𝜇2 (there is no difference in the performance of his two
classes based on their average grades)
H1: 𝜇1 ≠ 𝜇2 (there is a difference in the performance of his two
classes based on their average grades)
4. A researcher wants to study if the customer satisfaction level of a cable
television company A is greater than a cable television company B.
H0: 𝜇1 = 𝜇2 (the customer satisfaction levels of two competing
cable television companies are the same)
H1: 𝜇1 > 𝜇2 (the customer satisfaction levels of a cable television
company A is greater than a cable television company B)
5. A clinical trial is conducted to compare three different weight
loss programs based on the average weight measured among three
groups at the end of the program.
H0: 𝜇1 = 𝜇2 = 𝜇3 (there is no difference among the three weight
loss programs)
H1: at least two of the means are not equal
(there is a difference among the three weight loss
programs)
B. Testing for Independence
The chi-square (χ²) test is used to test the independence of two
variables. In other words, this test is used to determine whether the
two variables are related or not, based on sample data collected on
both variables.
Example 2
1. A survey is conducted to test if the grades of the students are associated with the number of
hours they spend on social media sites.
H0: The grades of the students are not associated with the number of hours they spend
on social media sites.
H1: The grades of the students are associated with the number of hours they spend on
social media sites.
2. A study is conducted to determine whether the daily consumption depends on the age level of a person.
H0: The daily consumption does not depend on the age level of a person.
H1: The daily consumption depends on the age level of a person.
C. Correlation
Correlation is the statistical method used to determine whether
two variables (usually x and y) are linearly related.
In this method, the data collected on two numerical variables
are tested to determine the strength of their relationship,
estimated by the sample correlation coefficient r given by
$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$$
where −1 ≤ r ≤ 1 and
n = number of data pairs
If the value of r is close to positive 1, then there is a strong positive linear
relationship between the two variables. If r is close to negative 1, there is a
strong negative linear relationship between them. However, if the two
variables have a weak or no linear relationship, r is close to 0.
Example 3
1. A study is conducted to show how strong the relationship is between the sleeping habits of
employees and their level of performance at work.
H0: The sleeping habits of employees are not related to their level of performance at work.
H1: The sleeping habits of employees are related to their level of performance at work.
2. A student wants to know if his grade in Mathematics is associated with his grade in English.
H0: His grade in Mathematics is not associated with his grade in English.
H1: His grade in Mathematics is associated with his grade in English.
3. A researcher wishes to see whether there is a relationship
between number of hours of study and test scores on an exam.
The following data were obtained.
Student   Hours of Study   Grade
A         7                83
B         3                63
C         2                60
D         6                88
E         3                68
F         4                75
Solution:
To solve for the correlation coefficient r, we must first find the
values of xy, x², and y².
Student   Hours of Study (x)   Grade (y)   xy     x²    y²
A         7                    83          581    49    6889
B         3                    63          189    9     3969
C         2                    60          120    4     3600
D         6                    88          528    36    7744
E         3                    68          204    9     4624
F         4                    75          300    16    5625
Total     Σx = 25              Σy = 437    Σxy = 1922   Σx² = 123   Σy² = 32451
Substituting the values into the formula,
$$r = \frac{(6)(1922) - (25)(437)}{\sqrt{\left[6(123) - (25)^2\right]\left[6(32451) - (437)^2\right]}}$$
$$r = 0.934$$
Since the correlation coefficient is close to +1, it indicates
a strong positive linear relationship between the number of hours
of study and the students' test scores on an exam.
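The computation of r can be checked with a short script. This is a minimal sketch using only the standard library; the variable names are chosen for illustration, and the data are the six (hours, grade) pairs from the example.

```python
import math

# Hours of study (x) and grades (y) for students A to F.
hours = [7, 3, 2, 6, 3, 4]
grades = [83, 63, 60, 88, 68, 75]
n = len(hours)

# Column totals needed by the formula.
sum_x = sum(hours)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(hours, grades))
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in grades)

# Sample correlation coefficient r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"sums: x={sum_x}, y={sum_y}, xy={sum_xy}, x2={sum_x2}, y2={sum_y2}")
print(f"r = {r:.3f}")
```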
D. Regression
Computing the correlation coefficient means determining the
strength of the relationship between two numerical variables. When
the resulting correlation coefficient is significant, then regression
analysis can be done. Regression is used to understand the movement
or trend of the given data so predictions can be made.
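The regression equation is given by
$$y' = a + bx$$
where
$$a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$
$$b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$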
Example 4
Let us take the example in the correlation section, since a strong linear relationship exists
between the number of hours of study and the students' test scores on an exam.
Solution:
Since xy, x², and y² are necessary to solve for a and b, we must compute them first.
Student   Hours of Study (x)   Grade (y)   xy     x²    y²
A         7                    83          581    49    6889
B         3                    63          189    9     3969
C         2                    60          120    4     3600
D         6                    88          528    36    7744
E         3                    68          204    9     4624
F         4                    75          300    16    5625
Total     Σx = 25              Σy = 437    Σxy = 1922   Σx² = 123   Σy² = 32451
Then we have,
$$a = \frac{(437)(123) - (25)(1922)}{6(123) - (25)^2} = 50.451$$
$$b = \frac{(6)(1922) - (25)(437)}{6(123) - (25)^2} = 5.372$$
Hence, the equation of the regression line is
$$y' = 50.451 + 5.372x$$
Suppose we want to predict the grade (y') of a student who studies for x
hours. For example, let x = 9. Then,
$$y' = 50.451 + 5.372(9) = 98.80$$
Let x = 5. Then,
$$y' = 50.451 + 5.372(5) = 77.31$$
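The regression coefficients and the two predictions can be reproduced with a short script. The function names `fit_line` and `predict` are hypothetical helpers introduced for this sketch; the formulas for a and b are the ones given in this section.

```python
def fit_line(xs, ys):
    """Return (a, b) for the regression line y' = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2  # shared by a and b
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

# Hours of study (x) and grades (y) for students A to F.
hours = [7, 3, 2, 6, 3, 4]
grades = [83, 63, 60, 88, 68, 75]

a, b = fit_line(hours, grades)

def predict(x):
    """Predicted grade y' for x hours of study."""
    return a + b * x

print(f"y' = {a:.3f} + {b:.3f}x")
print(f"x = 9 -> y' = {predict(9):.2f}")
print(f"x = 5 -> y' = {predict(5):.2f}")
```

The script keeps a and b at full precision rather than rounding them to three decimals first, so later digits of a prediction can differ slightly from a hand computation with the rounded coefficients.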
ICT Role in 21st Century Education & its Challenges.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 

Chapter 4 MMW.pdf

  • 4. Statistics is a branch of mathematics that deals with data collection, organization, analysis, interpretation, and presentation. Data collection is defined as the procedure of collecting, measuring, and analyzing accurate insights for research using standard validated techniques. Data organization refers to the method of classifying and organizing data sets to make them more useful; it can be applied to physical or digital records.
  • 5. Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Interpretation of data is the process of assigning meaning to the collected information and determining the conclusions, significance, and implications of the findings. Presentation of data refers to the organization of data into tables, graphs or charts, so that logical and statistical conclusions can be derived from the collected measurements.
  • 6. Descriptive statistics helps describe the characteristics of a specific data set by giving short summaries about the sample and measures of the data. Basic Statistical Concepts: A population consists of the totality of the observations, and a sample is a part of the population. A variable is any characteristic, number, or quantity that can be measured or counted.
  • 7. Two kinds of variables: 1. Qualitative variables, also called categorical variables, are variables that are not numerical. They describe data that fit into categories. 2. Quantitative variables are numerical. They can be ranked and have order.
  • 8. Quantitative variables can be classified further into discrete variables and continuous variables. A discrete variable is a variable whose value is obtained by counting. Continuous variables can assume an infinite number of values between any two specific values. They are obtained by measuring. They often include fractions and decimals.
  • 9. Examples
      Discrete: number of students present; number of red marbles in a jar; number of heads when flipping three coins; students’ grade level.
      Continuous: height of students in class; weight of students in class; time it takes to get to school; distance traveled between classes.
  • 10. Types of Statistical Data 1. Numerical data: These data have meaning as a measurement, such as a person’s height, weight, IQ, or blood pressure, or the number of shares of stock a person owns. 2. Categorical data: Categorical data represent characteristics such as a person’s gender, marital status, hometown, or the types of movies they like. Categorical data can take on numerical values (such as “1” indicating male and “2” indicating female), but those numbers don’t have mathematical meaning.
  • 11. Four Levels of Measurement 1. Nominal – the lowest of the four ways to characterize data. It deals with names, categories, or labels (e.g., eye color, yes or no responses to a survey, favorite breakfast cereal, and the number on the back of a football jersey). 2. Ordinal – the data at this level can be ordered, but differences between the data values are not meaningful (e.g., ten cities are ranked from one to ten, but differences between the ranks don't make much sense; letter grades can be ordered so that A is higher than B, but without any other information). 3. Interval – deals with data that can be ordered and in which differences between the data values do make sense. However, data at this level have no true starting point (zero) (e.g., the Fahrenheit and Celsius temperature scales). 4. Ratio – the highest level of measurement. Data possess all of the features of the interval level, in addition to an absolute zero. Due to the presence of an absolute zero, it makes sense to compare the ratios of measurements.
  • 13. Methods of Collecting Data
      1. In-Person Interviews. Pros: in-depth, with a high degree of confidence in the data. Cons: time-consuming, expensive, and can be dismissed as anecdotal.
      2. Mail Surveys. Pros: can reach anyone and everyone, with no barrier. Cons: expensive, data collection errors, lag time.
      3. Phone Surveys. Pros: high degree of confidence in the data collected, can reach almost anyone. Cons: expensive, cannot be self-administered, need to hire an agency.
      4. Web/Online Surveys. Pros: cheap, can be self-administered, very low probability of data errors. Cons: not all your customers might have an email address or be on the internet, and customers may be wary of divulging information online.
  • 14. Three Ways of Presenting Data
      1. Textual – this method presents data with the help of a paragraph or a number of paragraphs.
      2. Tabular – the method of presenting data using a statistical table, that is, a systematic organization of data in columns and rows.
      3. Graphical – a chart representing the quantitative variations or changes of variables in pictorial or diagrammatic form.
  • 16. Frequency measures how often something occurs. Example 1: Jack joins football practice every Wednesday morning and every Sunday morning and afternoon. The frequency of Jack’s football practice every week is 3 (2 on Sunday and 1 on Wednesday). By counting frequencies, we can make a frequency distribution table.
  • 17. Example 2: Jack’s team has scored the following numbers of goals in their games: 3, 1, 2, 1, 3, 2, 4, 2, 3, 2, 5, 4, 3, 2. Jack put the numbers in order, then counted how often each value occurs:
      Goals   Frequency
      1       2
      2       5
      3       4
      4       2
      5       1
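The counting in Example 2 can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library's `collections.Counter`; the variable names `goals` and `frequency` are illustrative and do not come from the slides.

```python
from collections import Counter

# Goals scored by Jack's team, as listed in Example 2.
goals = [3, 1, 2, 1, 3, 2, 4, 2, 3, 2, 5, 4, 3, 2]

# Counter tallies how many times each distinct value occurs.
frequency = Counter(goals)

# Print the frequency distribution table in increasing order of goals.
print("Goals  Frequency")
for value in sorted(frequency):
    print(f"{value:<6} {frequency[value]}")
```

The frequencies sum to 14, the number of games, which is a quick check that no observation was missed.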
  • 18. Graphical Representation of Frequency Distribution A. A bar graph is a pictorial representation of statistical data in which the length of each rectangle is proportional to the value of the variable. Bar graphs are generally used to compare the values of several variables at a time to analyze data. The length of the bars (horizontal or vertical) represents the frequency of the variable, and the graph is applicable to discrete categories only.
  • 19. B. Line graph or Line chart is a graphical display of information that changes continuously over time. Within a line graph, there are points connecting the data to show a continuous change. The lines in a line graph can descend and ascend based on the data. We can also compare different events, situations, and information.
  • 20. C. Pie Chart is a type of graph that displays data in a circular graph. The pieces of the graph are proportional to the fraction of the whole in each category. Each slice of the pie is relative to the size of that category in the group as a whole. The entire “pie” represents 100 percent of a whole, while the pie “slices” represent portions of the whole.
  • 21. 4.4 Measures of Central Tendency
  • 22. A. Mean: It is the most common measure of central location. It is obtained by dividing the sum of all values of the observations by the number of observations. In computing the mean, we use $\bar{x} = \frac{\sum x}{n}$, where x is the value of each observation in the sample and n is the total number of observations in the sample. It is worth noting that the mean has the following characteristics: 1. The mean is affected by the presence of extreme values. 2. The sum of the deviations of the observations from the mean is zero. 3. The sum of the squared deviations of the observations from the mean is minimum. 4. It is a good measure for interval and ratio types of data.
  • 23. B. Median: It is the middle value of a set of observations arranged in increasing or decreasing order. This measure divides the data into two groups with an equal number of observations. The median has the following characteristics: 1. It is not affected by the presence of extreme observations. 2. The sum of the absolute deviations of the observations from the median is minimum. 3. It is an appropriate measure for ordinal data.
  • 24. C. Mode: It is the most repeated value, that is, the value that occurs the greatest number of times. Note that it is possible for a data set to have two modes. In such a case, the distribution of the data set is bimodal (with two modes). When a data set has more than two modes, the distribution is called multimodal. The mode has the following characteristics: 1. The mode is determined by frequency. 2. It is an appropriate measure for nominal data.
  • 25. Example 1 (for ungrouped data): The following are the 3rd year math grades of an applied math student: 1.6, 1.2, 1.9, 1.5, 1.5, 1.5, 1.0, 1.3, 1.0.
      Mean: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_9}{9} = \frac{1.6 + 1.2 + 1.9 + 1.5 + 1.5 + 1.5 + 1.0 + 1.3 + 1.0}{9} = 1.39$
      Median: arranging the grades in increasing order gives 1.0, 1.0, 1.2, 1.3, 1.5, 1.5, 1.5, 1.6, 1.9; the middle (5th) value is 1.5.
      Mode: 1.5
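As a quick check of Example 1, the three measures can be computed with Python's standard `statistics` module. This is a sketch for illustration only; the variable names are not part of the slides.

```python
import statistics

# 3rd year math grades from Example 1.
grades = [1.6, 1.2, 1.9, 1.5, 1.5, 1.5, 1.0, 1.3, 1.0]

mean = statistics.mean(grades)      # sum of the values divided by n
median = statistics.median(grades)  # middle value of the sorted data
mode = statistics.mode(grades)      # most frequently occurring value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

With nine observations the median is simply the 5th value in sorted order; for an even number of observations `statistics.median` averages the two middle values.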
  • 26. Example 2 (for grouped data): The mean for grouped data is given by $\bar{x} = \frac{\sum f_i x_i}{n}$, where $f_i$ is the frequency of the ith class interval, $x_i$ is the class mark of the ith interval, and n is the total number of observations. Solving for the mean:
      Class limits   f    x      fx      <cf   Class boundaries
      60 – 67        2    63.5   127     30    59.5 – 67.5
      52 – 59        2    55.5   111     28    51.5 – 59.5
      44 – 51        6    47.5   285     26    43.5 – 51.5
      36 – 43        10   39.5   395     20    35.5 – 43.5
      28 – 35        7    31.5   220.5   10    27.5 – 35.5
      20 – 27        3    23.5   70.5    3     19.5 – 27.5
      $\bar{x} = \frac{127 + 111 + 285 + 395 + 220.5 + 70.5}{30} = \frac{1209}{30} = 40.3$
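The grouped mean can be sketched in Python as follows. The list-of-tuples layout `(lower limit, upper limit, frequency)` is an assumption made for this illustration, not a structure given in the slides; the class mark is taken as the midpoint of the class limits.

```python
# Each class is (lower limit, upper limit, frequency), copied from the table.
classes = [
    (60, 67, 2),
    (52, 59, 2),
    (44, 51, 6),
    (36, 43, 10),
    (28, 35, 7),
    (20, 27, 3),
]

# n is the total frequency.
n = sum(f for _, _, f in classes)

# Sum of f * x, where x is the class mark (midpoint of the class limits).
total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)

grouped_mean = total_fx / n
print(f"n = {n}, sum of fx = {total_fx}, mean = {grouped_mean:.1f}")
```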
  • 27. The median for grouped data is given by $Md = LCB + \left(\frac{\frac{n}{2} - cf_p}{f_m}\right) i$, where LCB is the lower boundary of the median class, $i$ is the size of the class interval, $cf_p$ is the cumulative frequency of the interval preceding the median class, and $f_m$ is the frequency of the median class. Median class – the class containing the cumulative frequency equal to $\frac{n}{2}$ or the next higher value.
  • 28. Solving for the median: $\frac{n}{2} = \frac{30}{2} = 15$. Lower class boundary of the median class: LCB = 35.5. Cumulative frequency before the median class: $cf_p = 10$. Frequency of the median class: $f_m = 10$. Class size: $i = 8$. Median $= LCB + \left(\frac{\frac{n}{2} - cf_p}{f_m}\right) i = 35.5 + \left(\frac{15 - 10}{10}\right) 8 = 39.5$
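The median computation above can be sketched as a small Python function. The function name `grouped_median` and the tuple layout are hypothetical. The sketch assumes the classes are listed in increasing order and have integer limits, so that each class boundary lies 0.5 below the lower limit and the class size is `upper - lower + 1`.

```python
# Classes as (lower limit, upper limit, frequency), in increasing order.
classes = [
    (20, 27, 3),
    (28, 35, 7),
    (36, 43, 10),
    (44, 51, 6),
    (52, 59, 2),
    (60, 67, 2),
]


def grouped_median(classes):
    """Median for grouped data: LCB + ((n/2 - cf_p) / f_m) * i."""
    n = sum(f for _, _, f in classes)
    cf_before = 0  # cumulative frequency of the classes below the current one
    for lower, upper, f in classes:
        # The median class is the first whose cumulative frequency reaches n/2.
        if cf_before + f >= n / 2:
            lcb = lower - 0.5          # lower class boundary (assumed gap of 1)
            size = upper - lower + 1   # class size i
            return lcb + (n / 2 - cf_before) / f * size
        cf_before += f
    raise ValueError("frequency table is empty")


median = grouped_median(classes)
print(median)
```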
  • 29. The mode for grouped data is given by $Mo = LCB + \left(\frac{f_m - f_1}{2f_m - f_1 - f_2}\right) i$, where LCB is the lower boundary of the modal class, $i$ is the size of the class interval, $f_m$ is the frequency of the modal class, $f_1$ is the frequency of the class preceding the modal class, and $f_2$ is the frequency of the class following the modal class. Modal class – the class with the highest frequency.
  • 30. Solving for the mode: Mode $= LCB + \left(\frac{f_m - f_1}{2f_m - f_1 - f_2}\right) i = 35.5 + \left(\frac{10 - 7}{20 - 7 - 6}\right) 8 = 38.9$
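A similar sketch applies to the grouped mode. The name `grouped_mode` is hypothetical, the classes are assumed to be in increasing order with integer limits, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch rather than something stated in the slides.

```python
# Classes as (lower limit, upper limit, frequency), in increasing order.
classes = [
    (20, 27, 3),
    (28, 35, 7),
    (36, 43, 10),
    (44, 51, 6),
    (52, 59, 2),
    (60, 67, 2),
]


def grouped_mode(classes):
    """Mode for grouped data: LCB + ((f_m - f_1) / (2 f_m - f_1 - f_2)) * i."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))  # index of the modal class (first if tied)
    f_m = freqs[m]
    # Assumption: a missing neighbouring class counts as frequency 0.
    f_1 = freqs[m - 1] if m > 0 else 0               # class preceding the modal class
    f_2 = freqs[m + 1] if m < len(freqs) - 1 else 0  # class following the modal class
    lower, upper, _ = classes[m]
    lcb = lower - 0.5          # lower class boundary (assumed gap of 1)
    size = upper - lower + 1   # class size i
    return lcb + (f_m - f_1) / (2 * f_m - f_1 - f_2) * size


mode = grouped_mode(classes)
print(round(mode, 1))
```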
  • 31. 4.5 Measures of Variability
  • 32. Variability for Ungrouped Data
      • Range: The range (R) is defined as the difference between the highest value (HV) and the lowest value (LV) in the data. That is, $R = HV - LV$.
      • Variance: It is defined as the average of the squared deviations from the mean. It is the measure that considers the position of each observation relative to the mean. $s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$ or $s^2 = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$
  • 33. • Standard Deviation (the most widely encountered): It is the measure of the spread or dispersion of scores from the mean of the distribution. It is the square root of the variance. $s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$ or $s = \sqrt{\frac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)}}$
      Variability for Grouped Data
      Range: $R = \text{highest class mark} - \text{lowest class mark}$
      Variance: $s^2 = \frac{n\sum fx^2 - \left(\sum fx\right)^2}{n(n - 1)}$
      Standard Deviation: $s = \sqrt{\frac{n\sum fx^2 - \left(\sum fx\right)^2}{n(n - 1)}}$
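To illustrate that the two ungrouped variance formulas agree, the sketch below applies both to the grades from the earlier ungrouped example (reusing that data set is a choice made here, not a worked example from the slides). The variable names are illustrative.

```python
import math
import statistics

# Grades from the ungrouped-data example on measures of central tendency.
grades = [1.6, 1.2, 1.9, 1.5, 1.5, 1.5, 1.0, 1.3, 1.0]
n = len(grades)
mean = sum(grades) / n

# Range: highest value minus lowest value.
value_range = max(grades) - min(grades)

# Definition: sum of squared deviations from the mean, divided by n - 1.
var_definition = sum((x - mean) ** 2 for x in grades) / (n - 1)

# Computational form: (n * sum(x^2) - (sum x)^2) / (n * (n - 1)).
var_shortcut = (n * sum(x * x for x in grades) - sum(grades) ** 2) / (n * (n - 1))

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(var_definition)

print(f"range = {value_range:.1f}")
print(f"variance = {var_definition:.4f} (shortcut form: {var_shortcut:.4f})")
print(f"standard deviation = {std_dev:.3f}")
```

Both forms also match `statistics.variance`, which uses the same n − 1 divisor for sample data.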
  • 34. 4.6 Testing a Statistical Hypothesis
  • 35. Hypothesis testing is the most significant area of statistical inference. It is a step-by-step process for making inferences (conclusions) about a population. The truth of a statistical hypothesis can only be assessed by taking a portion of the population of interest and using the information obtained from this portion to decide whether the statistical hypothesis is likely to be true or false. We either “reject” the statistical hypothesis when the sample is inconsistent with it, or “do not reject” it otherwise. Note that rejecting a statistical hypothesis means that the sample provides sufficient evidence against it, whereas accepting it does not necessarily mean that it is true. Acceptance of the stated hypothesis only implies that there is not enough evidence to reject it.
  • 36. Types of Statistical Hypothesis: We use the term null hypothesis, denoted by H0, for the hypothesis we want to test, that is, to either reject or accept. If the null hypothesis is rejected, the alternative hypothesis, denoted by H1, is then accepted. The null hypothesis H0 is stated so that it specifies an exact value, while the alternative hypothesis H1 is stated so that it allows for a range of possible values. For example, if the null hypothesis H0 is 𝑥 = 8, the alternative hypothesis H1 might be 𝑥 < 8, 𝑥 > 8, or 𝑥 ≠ 8.
  • 37. Types of Statistical Tests: If the alternative hypothesis of a statistical test is one-sided, for example, H1: 𝑥 < 8 or H1: 𝑥 > 8, it is said to be a one-tailed test. On the other hand, if the alternative hypothesis is two-sided, for example, H1: 𝑥 ≠ 8, the test is said to be two-tailed. Types of Error: Deciding whether to accept or reject a statistical hypothesis about a population parameter is critical because a wrong decision leads to wrong conclusions. For instance, a researcher could reject H0 when in fact it is true; this is called a type I error. Alternatively, one might accept H0 even when it is false; in this case, a type II error has occurred.
  • 38. Constructing the Null and Alternative Hypothesis A. Testing for Means: In hypothesis testing, means, variances, or proportions may be compared to justify rejecting or accepting the null hypothesis. In many instances, sample means are compared using experimental and control groups.
  • 39. Example 1 1. A researcher wants to know if the average test score of the students taking a particular examination is 80. H0: 𝜇 = 80 (the average test score of the students taking a particular examination is 80) H1: 𝜇 ≠ 80 (the average test score of the students taking a particular examination is not 80) 2. A small group of researchers is conducting a study to show if the average number of hours a student spends on social media sites per day is greater than 10. H0: 𝜇 = 10 (average number of hours a student spends on social media sites per day is 10) H1: 𝜇 > 10 (average number of hours a student spends on social media sites per day is greater than 10)
  • 40. 3. A teacher wants to know if there is a difference in the performance of his two classes based on their average grades. H0: 𝜇1 = 𝜇2 (there is no difference in the performance of his two classes based on their average grades) H1: 𝜇1 ≠ 𝜇2 (there is a difference in the performance of his two classes based on their average grades) 4. A researcher wants to study if the customer satisfaction level of cable television company A is greater than that of cable television company B. H0: 𝜇1 = 𝜇2 (the customer satisfaction levels of the two competing cable television companies are the same) H1: 𝜇1 > 𝜇2 (the customer satisfaction level of cable television company A is greater than that of cable television company B)
  • 41. 5. A clinical trial is conducted to compare three different weight loss programs based on the average weight measured among three groups at the end of the program. H0: 𝜇1 = 𝜇2 = 𝜇3 (there is no difference among the three weight loss programs) H1: at least two of the means are not equal (there is a difference among the three weight loss programs)
  • 42. B. Testing for Independence: The chi-square (χ²) test is used to test the independence of two variables. In other words, this test is used to determine whether two variables are related, based on sample data collected on both variables. Example 2: 1. A survey is conducted to test if the grades of the students are associated with the number of hours they spend on social media sites. H0: The grades of the students are not associated with the number of hours they spend on social media sites. H1: The grades of the students are associated with the number of hours they spend on social media sites. 2. A study is conducted to show whether daily consumption depends on the age level of a person. H0: The daily consumption does not depend on the age level of a person. H1: The daily consumption depends on the age level of a person.
  • 43. C. Correlation: Correlation is the statistical method used to determine whether two variables (usually x and y) are linearly related. In this method, the data collected on two numerical variables are tested to determine the strength of their relationship, estimated by the sample correlation coefficient r given by $r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$, where $-1 \le r \le 1$ and n is the number of data pairs.
  • 44. If the value of r is close to positive 1, then there is a strong positive linear relationship between the two variables. If r is close to negative 1, there is a strong negative linear relationship between them. However, if the two variables have a weak or no linear relationship, r is close to 0. Example 3: 1. A study is conducted to show how strong the relationship is between the sleeping habits of employees and their level of performance at work. H0: The sleeping habits of employees are not related to their level of performance at work. H1: The sleeping habits of employees are related to their level of performance at work. 2. A student wants to know if his grade in Mathematics is associated with his grade in English. H0: His grade in Mathematics is not associated with his grade in English. H1: His grade in Mathematics is associated with his grade in English.
  • 45. 3. A researcher wishes to see whether there is a relationship between the number of hours of study and test scores on an exam. The following data were obtained.
      Student   Hours of Study   Grade
      A         7                83
      B         3                63
      C         2                60
      D         6                88
      E         3                68
      F         4                75
  • 46. Solution: To solve for the correlation coefficient r, we must first find the values of $xy$, $x^2$, and $y^2$.
      Student   Hours of Study (x)   Grade (y)   xy     x²    y²
      A         7                    83          581    49    6889
      B         3                    63          189    9     3969
      C         2                    60          120    4     3600
      D         6                    88          528    36    7744
      E         3                    68          204    9     4624
      F         4                    75          300    16    5625
      Total     Σx = 25              Σy = 437    Σxy = 1922   Σx² = 123   Σy² = 32451
  • 47. Substituting the values into the formula, $r = \frac{(6)(1922) - (25)(437)}{\sqrt{\left[6(123) - (25)^2\right]\left[6(32451) - (437)^2\right]}} = \frac{607}{\sqrt{(113)(3737)}} = 0.934$. Since the correlation coefficient is close to +1, it indicates a strong positive linear relationship between the number of hours of study and the students' test scores on an exam.
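The substitution above can be verified with a short Python sketch that builds the sums directly from the six data pairs. The variable names are illustrative only.

```python
import math

# Hours of study (x) and grades (y) for students A to F.
hours = [7, 3, 2, 6, 3, 4]
grades = [83, 63, 60, 88, 68, 75]
n = len(hours)

# Column totals used in the correlation formula.
sum_x = sum(hours)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(hours, grades))
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in grades)

# Sample correlation coefficient r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"r = {r:.3f}")
```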
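  • 48. D. Regression: Computing the correlation coefficient determines the strength of the relationship between two numerical variables. When the resulting correlation coefficient is significant, regression analysis can be done. Regression is used to understand the movement or trend of the given data so that predictions can be made. The regression equation is given by $y' = a + bx$, where $a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$ is the y-intercept and $b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$ is the slope of the line.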
  • 49. Example 4: Let us take the example in the correlation section, since a strong linear relationship exists between the number of hours of study and the students' test scores on an exam. Solution: Since $xy$, $x^2$, and $y^2$ are necessary to solve for a and b, we must solve for them first.
      Student   Hours of Study (x)   Grade (y)   xy     x²    y²
      A         7                    83          581    49    6889
      B         3                    63          189    9     3969
      C         2                    60          120    4     3600
      D         6                    88          528    36    7744
      E         3                    68          204    9     4624
      F         4                    75          300    16    5625
      Total     Σx = 25              Σy = 437    Σxy = 1922   Σx² = 123   Σy² = 32451
  • 50. Then we have $a = \frac{(437)(123) - (25)(1922)}{6(123) - (25)^2} = 50.451$ and $b = \frac{(6)(1922) - (25)(437)}{6(123) - (25)^2} = 5.372$. Hence, the equation of the regression line is $y' = 50.451 + 5.372x$. Suppose we want to know the grade ($y'$) of a student who studies for x hours. For example, let $x = 9$. Then $y' = 50.451 + 5.372(9) = 98.80$. Let $x = 5$. Then $y' = 50.451 + 5.372(5) = 77.31$.
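The regression coefficients and the two predictions can be reproduced with the sketch below. The helper name `predict` is hypothetical. The code keeps a and b unrounded, which here gives the same predictions to two decimal places as the rounded coefficients used above.

```python
# Hours of study (x) and grades (y) for students A to F.
hours = [7, 3, 2, 6, 3, 4]
grades = [83, 63, 60, 88, 68, 75]
n = len(hours)

# Column totals used in the formulas for a and b.
sum_x = sum(hours)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(hours, grades))
sum_x2 = sum(x * x for x in hours)

# Both coefficients share the same denominator.
denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # y-intercept
b = (n * sum_xy - sum_x * sum_y) / denominator       # slope


def predict(x):
    """Predicted grade y' = a + b * x for x hours of study."""
    return a + b * x


print(f"a = {a:.3f}, b = {b:.3f}")
print(f"y'(9) = {predict(9):.2f}, y'(5) = {predict(5):.2f}")
```

Note that x = 9 lies outside the observed range of 2 to 7 hours, so that prediction is an extrapolation and should be read with more caution than the one for x = 5.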