Excel Files Assingments/Copy of Student_Assignment_File.11.01.2016.xlsx
Data
Columns: ID, Salary, Compa-ratio, Midpoint, Age, Performance Rating, Service, Gender, Raise, Degree, Gender1, Grade
Copy Employee Data set to this page.
The ongoing question that the weekly assignments will focus on is: Are males and females paid the same for equal work (under the Equal Pay Act)?
Note: to simplify the analysis, we will assume that jobs within each grade comprise equal work.
The column labels in the table mean:
ID – Employee sample number
Salary – Salary in thousands
Age – Age in years
Performance Rating – Appraisal rating (employee evaluation score)
Service – Years of service
Gender – 0 = male, 1 = female
Midpoint – Salary grade midpoint
Raise – Percent of last raise
Grade – Job/pay grade
Degree – 0 = BSBA, 1 = MS
Gender1 – Male or Female
Compa-ratio – Salary divided by midpoint
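You can cross-check the compa-ratio definition above outside Excel, along with the mean, standard deviation, and range summaries that these assignments ask for. Below is a minimal Python sketch, not part of the course materials. It assumes each row has already been reduced to a (salary, midpoint, gender1) tuple. The four sample rows are IDs 1, 2, 3, and 7 of the Randomized BUS308 Data file and serve only as an illustration. Python's statistics.stdev is the sample standard deviation, which is what Excel's =STDEV returns.

```python
from statistics import mean, stdev


def compa_ratio(salary, midpoint):
    """Compa-ratio: salary divided by the salary grade midpoint (both in thousands)."""
    return salary / midpoint


def summarize(values):
    """Mean, sample standard deviation (like Excel's =STDEV), and range (=MAX minus =MIN)."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),
        "range": max(values) - min(values),
    }


# Small illustrative subset: (salary, midpoint, gender1) for IDs 1, 2, 3 and 7.
# A real analysis would load all 50 rows.
rows = [(63.3, 57, "M"), (28.3, 31, "M"), (35.5, 31, "F"), (41.5, 40, "F")]

# Sort compa-ratios into Overall, Male and Female groups (like sorting on Gender1).
groups = {"Overall": [], "M": [], "F": []}
for salary, midpoint, gender1 in rows:
    c = compa_ratio(salary, midpoint)
    groups["Overall"].append(c)
    groups[gender1].append(c)

for name, values in groups.items():
    stats = summarize(values)
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

With all 50 rows loaded, the printed values should be close to what the Descriptive Statistics tool reports, apart from display rounding.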
Week 2
This assignment covers the material presented in weeks 1 and 2.
Six Questions
Before starting this assignment, make sure the assignment data from the Employee Salary Data Set file is copied over to this Assignment file. You can do this either by copying and pasting all the columns or by opening the data file, right-clicking on the Data tab, selecting Move or Copy, and copying the entire sheet to this file (Weekly Assignment Sheet or whatever you are calling your master assignment file).
It is highly recommended that you copy the data columns (with labels) and paste them to the right so that whatever you do will not disrupt the original data values and relationships.
To ensure full credit for each question, you need to show how you got your results. For example, Question 1 asks for several data values. If you obtain them using Descriptive Statistics, then the cells should contain an "=XX" formula, where XX is the column and row of the value in the Descriptive Statistics table. If you choose to generate each value using fx functions, then each function should be located in the cell and the location of the data values should be shown. So, as an example, cell D31 should contain something like "=T6" or "=average(T2:T26)". Having only a numerical value will not earn full credit. This allows instructors to provide feedback on Excel tools if the answers are not correct; we need to see how the results were obtained.
In starting the analysis on a research question, we focus on overall descriptive statistics and on seeing whether differences exist. Probing into reasons and mitigating factors is a follow-up activity.
1. The first step in analyzing data sets is to find some summary descriptive statistics for key variables. Since the assignment problems will focus mostly on the compa-ratios, we need to find the mean, standard deviation, and range for our groups: Males, Females, and Overall. To sort the compa-ratios into males and females, copy and paste the Compa-ratio and Gender1 columns, and then sort on Gender1.
The values for age, performance rating, and service are provided for future use. If desired, you can also use them to test your approach to the compa-ratio answers (see if you can replicate the values).
You can use either the Data Analysis Descriptive Statistics tool or the fx =average and =stdev functions. The range can be found as the difference between the =max and =min functions, or from Descriptive Statistics.
Suggestion: Copy and paste the compa-ratio data to the right (column T) and the gender data to column U. If you use Descriptive Statistics, place the output table in row 1 of a column to the right. If you do not use Descriptive Statistics, make sure your cells show the location of the data (example: =average(T2:T51)).
Group     Statistic            Compa-ratio   Age      Perf. Rat.   Service
Overall   Mean                               35.7     85.9         9.0
          Standard Deviation                 8.2513   11.4147      5.7177
          Range                              30       45           21
Female    Mean                               32.5     84.2         7.9
          Standard Deviation                 6.9      13.6         4.9
          Range                              26.0     45.0         18.0
Male      Mean                               38.9     87.6         10.0
          Standard Deviation                 8.4      8.7          6.4
          Range                              28.0     30.0         21.0
Note: remember the data is a sample from the larger company population.
A key issue in comparing data sets is to see whether they are distributed (shaped) the same. At this point, we can do this by looking at the probabilities that males and females are distributed in the same way across grade levels.
2. Empirical Probability: What is the probability for a:    Probability
a. Randomly selected person being in grade E or above?
b. Randomly selected person being a male in grade E or above?
c. Randomly selected male being in grade E or above?
d. Why are the results different?
3. Normal curve based probability: For each group (overall, females, males), what are the values for each question below? Make sure your answer cells show the Excel function and the cell location of the data used.
A. The probability of being in the top 1/3 of the compa-ratio distribution.
Note: we can find the cutoff value for the top 1/3 using the fx LARGE function: =large(range, value). Value is the number k that identifies the kth largest value. For the top 1/3, this is the value that starts the top 1/3 of the range. For the overall group, this is the 50/3, or 17th (rounded), value; for the gender groups, it is the 25/3 = 8th (rounded) value.
Columns: Overall, Female, Male. All of the functions below are in the fx statistical list.
i. How many salaries are in the top 1/3 (rounded to the nearest whole number) for each group? Use the "=ROUND" function (found in the Math or All list).
ii. What compa-ratio value starts the top 1/3 of the range for each group? Use the "=LARGE" function.
iii. What is the z-score for this value? Use Excel's STANDARDIZE function.
iv. What is the normal curve probability of exceeding this score? Use the "=1-NORM.S.DIST" function.
B. How do you interpret the relationship between the data sets? What does this suggest about our equal pay for equal work question?
4. Based on our sample data set, can the male and female compa-ratios in the population be equal to each other?
A. First, we need to determine whether these two groups have equal variances, in order to decide which t-test to use.
What is the data input range used for this question:
Step 1: Ho:
Ha:
Step 2: Decision Rule:
Step 3: Statistical test:
Why?
Step 4: Conduct the test; place cell B77 in the output location box.
Step 5: Conclusion and Interpretation
What is the p-value:
Is the p-value < 0.05 (for a one-tail test) or 0.025 (for a two-tail test)?
What is your decision: REJ or NOT reject the null?
What does this result say about our question of variance equality?
B. Are male and female average compa-ratios equal? (Regardless of the outcome of the above F-test, assume equal variances for this test.)
What is the data input range used for this question:
Step 1: Ho:
Ha:
Step 2: Decision Rule:
Step 3: Statistical test:
Why?
Step 4: Conduct the test; place cell B109 in the output location box.
Step 5: Conclusion and Interpretation
What is the p-value:
Is the p-value < 0.05 (for a one-tail test) or 0.025 (for a two-tail test)?
What is your decision: REJ or NOT reject the null?
What does your decision on rejecting the null hypothesis mean?
If the null hypothesis was rejected, calculate the effect size value:
If the effect size was calculated, what does the result mean in terms of why the null hypothesis was rejected?
What does the result of this test tell us about our question on salary equality?
5. Is the female average compa-ratio equal to or less than the midpoint value of 1.00? This question is the same as: Does the company pay its females, on average, at or below the grade midpoint (which is considered the market rate)?
Suggestion: Use the data column T to the right for your null hypothesis value.
What is the data input range used for this question:
Step 1: Ho:
Ha:
Step 2: Decision Rule:
Step 3: Statistical test:
Why?
Step 4: Conduct the test; place cell B162 in the output location box.
Step 5: Conclusion and Interpretation
What is the p-value:
Is the p-value < 0.05 (for a one-tail test) or 0.025 (for a two-tail test)?
What, besides the p-value, needs to be considered with a one-tail test?
Decision: Reject or do not reject Ho?
What does your decision on rejecting the null hypothesis mean?
If the null hypothesis was rejected, calculate the effect size value:
If the effect size was calculated, what does the result mean in terms of why the null hypothesis was rejected?
What does the result of this test tell us about our question on salary equality?
6. Considering both the salary information in the lectures and your compa-ratio information, what conclusions can you reach about equal pay for equal work?
Why? What statistical results support this conclusion?
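Questions 3 and 5 above each chain a few Excel functions together. The Python sketch below mirrors those chains so you can check worksheet results. It illustrates the steps under stated assumptions and is not the course's required method.

- The sample values are hypothetical.
- Comments note the ROUND, LARGE, STANDARDIZE, and NORM.S.DIST correspondences.
- The effect size shown is Cohen's d (mean difference divided by the sample standard deviation). This is one common choice and may differ from the definition used in the lectures.

```python
import math
from statistics import NormalDist, mean, stdev


def top_third_probability(values):
    """Question 3 steps for one group: count, cutoff value, z-score, and P(Z > z)."""
    k = round(len(values) / 3)                      # like =ROUND(n/3, 0); n/3 never ends in .5
    cutoff = sorted(values, reverse=True)[k - 1]    # like =LARGE(range, k)
    z = (cutoff - mean(values)) / stdev(values)     # like =STANDARDIZE(x, mean, stdev)
    p_exceed = 1 - NormalDist().cdf(z)              # like =1-NORM.S.DIST(z, TRUE)
    return k, cutoff, z, p_exceed


def one_sample_t(values, mu0=1.00):
    """Question 5: t statistic against a hypothesized mean, its df, and Cohen's d."""
    n = len(values)
    m, s = mean(values), stdev(values)
    t = (m - mu0) / (s / math.sqrt(n))
    d = (m - mu0) / s                               # one common effect size choice
    return t, n - 1, d


# Hypothetical compa-ratios, for illustration only.
sample = [0.95, 0.98, 1.01, 1.03, 1.05, 1.07, 1.10, 1.12, 1.15]
print(top_third_probability(sample))
print(one_sample_t(sample))
```

Python's round() rounds halves to even, unlike Excel's ROUND. This never matters here because n/3 always has a fractional part of 0, 1/3, or 2/3.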
Week 3
ANOVA
Three Questions
Remember to show how you got your results in the appropriate cells. For questions using functions, show the input range when asked.
Group name:           G1      G2      G3      G4      G5      G6
Salary Intervals:     22-29   30-39   40-49   50-59   60-69   70-79
Compa-ratio values:
1. One interesting question is whether the average compa-ratios are equal across salary ranges of 10K each. While compa-ratios remove the impact of grade on salaries, are they different for different pay levels? That is, are people at different levels paid differently relative to the midpoint? (Put data values at right.)
What is the data input range used for this question:
Step 1: Ho:
Ha:
Step 2: Decision Rule:
Step 3: Statistical test:
Why?
Step 4: Conduct the test; place cell B16 in the output location box.
Step 5: Conclusions and Interpretation
What is the p-value?
Is the p-value < 0.05?
What is your decision: REJ or NOT reject the null?
If the null hypothesis was rejected, what is the effect size value (eta squared)?
If calculated, what does the effect size value tell us about why the null hypothesis was rejected?
What does that decision mean in terms of our equal pay question?
2. If the null hypothesis in question 1 was rejected, which pairs of means differ? Why?
Groups Compared | Diff | T +/- Term | Low to High | Difference Significant? | Why?
G1 G2
G1 G3
G1 G4
G1 G5
G1 G6
G2 G3
G2 G4
G2 G5
G2 G6
G3 G4
G3 G5
G3 G6
G4 G5
G4 G6
G5 G6
3. Since compa is already a measure of pay for equal work, do these results impact your conclusion on equal pay for equal
work? Why or why not?
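You can sketch the Week 3 ANOVA quantities the same way. This Python illustration computes two things:

- The single-factor F statistic and eta squared (SS between divided by SS total).
- A t-based interval for one pair of group means, which is one way to fill the Diff, T +/- Term, and Low to High columns.

The interval formula matches the lectures' method only by assumption. You must supply t_crit yourself, for example from Excel's =T.INV.2T(0.05, df_within). The p-value still comes from Excel's ANOVA output or =F.DIST.RT.

```python
import math
from statistics import mean


def one_way_anova(groups):
    """Single-factor ANOVA: F, df between, df within, MS within, and eta squared."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_within = ss_within / df_within
    f = (ss_between / df_between) / ms_within
    eta_squared = ss_between / (ss_between + ss_within)   # SS between / SS total
    return f, df_between, df_within, ms_within, eta_squared


def pairwise_interval(g1, g2, ms_within, t_crit):
    """Mean difference +/- t * sqrt(MS within * (1/n1 + 1/n2)); significant if 0 lies outside."""
    diff = mean(g1) - mean(g2)
    term = t_crit * math.sqrt(ms_within * (1 / len(g1) + 1 / len(g2)))
    low, high = diff - term, diff + term
    return diff, term, low, high, not (low <= 0 <= high)


# Hypothetical compa-ratios for three salary groups, for illustration only.
g1, g2, g3 = [1.00, 1.02, 0.98], [1.05, 1.07, 1.06], [1.10, 1.12, 1.11]
f, df_b, df_w, ms_w, eta_sq = one_way_anova([g1, g2, g3])
print(round(f, 2), df_b, df_w, round(eta_sq, 3))
# 2.447 is the two-tail 0.05 critical t for df_within = 6.
print(pairwise_interval(g1, g3, ms_w, t_crit=2.447))
```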
Week 4
Regression and Correlation
Five Questions
Columns: Compa-ratio, Midpoint, Age, Performance Rating, Service, Raise, Degree, Gender
Remember to show how you got your results in the appropriate cells. For questions using functions, show the input range when asked.
1. Create a correlation table using Compa-ratio and the other interval level variables, except for Salary.
Suggestion: place the data in columns T - Y.
What range was placed in the Correlation input range box:
Place C9 in the output box.
b. What are the statistically significant correlations related to Compa-ratio?
T =          Significant r =
c. Are there any surprises, such as correlations you thought would be significant and are not, or non-significant correlations you thought would be significant?
d. Why does or does not this information help answer our equal pay question?
2. Perform a regression analysis using compa as the dependent variable and the variables used in Q1, including the dummy variables. Show the result, and interpret your findings by answering the following questions.
Suggestion: Place the dummy variable values to the right of column Y.
What range was placed in the Regression input range box:
Note: be sure to include the appropriate hypothesis statements.
Regression hypotheses
Ho:
Ha:
Coefficient hypotheses (one to stand for all the separate variables)
Ho:
Ha:
Place B36 in the output box.
Interpretation:
For the regression as a whole:
What is the value of the F statistic:
What is the p-value associated with this value:
Is the p-value < 0.05?
What is your decision: REJ or NOT reject the null?
What does this decision mean?
For each of the coefficients: Midpoint, Age, Perf. Rat., Service, Gender, Degree
What is the coefficient's p-value for each of the variables:
Is the p-value < 0.05?
Do you reject or not reject each null hypothesis:
What are the coefficients for the significant variables?
Using the intercept coefficient and only the significant variables, what is the equation?
Compa-ratio =
Is gender a significant factor in compa-ratio?
Regardless of statistical significance, who gets paid more with all other things being equal?
How do we know?
3. What does regression analysis show us about analyzing complex measures?
4. Between the lecture results and your results, what else would you like to know before answering our question on equal pay? Why?
5. Between the lecture results and your results, what is your answer to the question of equal pay for equal work for males and females? Why?
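Question 1b of Week 4 asks which correlations are significant. At alpha = 0.05, a correlation is significant when its t statistic exceeds the two-tail critical t with df = n - 2. The t statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2), and solving this for r gives the smallest significant r. The Python sketch below assumes this standard conversion and a sample of 50 employees; the lecture may use a table of critical r values instead. The critical t of 2.0106347576 is the df = 48 value that appears as "t Critical two-tail" in the t-test output in the Randomized BUS308 Data file.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient (what Excel's =CORREL returns)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def critical_r(t_crit, n):
    """Smallest |r| that is significant, given the two-tail critical t for df = n - 2."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


print(round(critical_r(2.0106347576, 50), 3))  # prints 0.279
```

So with 50 observations, any correlation with compa-ratio whose absolute value is above about 0.279 would count as significant at the 0.05 level.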
Excel Files Assingments/Randomized BUS308 Data - 08.01.2017.xlsm
Data
The ongoing question that the weekly assignments will focus on is: Are males and females paid the same for equal work (under the Equal Pay Act)?
Note: to simplify the analysis, we will assume that jobs within each grade comprise equal work.
The column labels in the table mean:
ID – Employee sample number
Salary – Salary in thousands
Age – Age in years
Performance Rating – Appraisal rating (employee evaluation score)
Service – Years of service (rounded)
Gender – 0 = male, 1 = female
Midpoint – Salary grade midpoint
Raise – Percent of last raise
Grade – Job/pay grade
Degree – 0 = BSBA, 1 = MS
Gender1 – Male or Female
Compa – Salary divided by midpoint
ID  Salary  Compa  Midpoint  Age  Perf. Rating  Service  Gender  Raise  Degree  Gender1  Grade
1   63.3    1.110  57        34   85            8        0       5.7    0       M        E
2   28.3    0.914  31        52   80            7        0       3.9    0       M        B
3   35.5    1.144  31        30   75            5        1       3.6    1       F        B
4   64.8    1.137  57        42   100           16       0       5.5    1       M        E
5   48.4    1.008  48        36   90            16       0       5.7    1       M        D
6   78.4    1.170  67        36   70            12       0       4.5    1       M        F
7   41.5    1.037  40        32   100           8        1       5.7    1       F        C
8   24.7    1.073  23        32   90            9        1       5.8    1       F        A
9   77.2    1.152  67        49   100           10       0       4      1       M        F
10  23.4    1.019  23        30   80            7        1       4.7    1       F        A
11  23.8    1.035  23        41   100           19       1       4.8    1       F        A
12  59.7    1.048  57        52   95            22       0       4.5    0       M        E
13  42.5    1.062  40        30   100           2        1       4.7    0       F        C
14  24      1.042  23        32   90            12       1       6      1       F        A
15  23.4    1.018  23        32   80            8        1       4.9    1       F        A
16  44.2    1.106  40        44   90            4        0       5.7    0       M        C
17  68.4    1.200  57        27   55            3        1       3      1       F        E
18  34.9    1.126  31        31   80            11       1       5.6    0       F        B
19  24.7    1.072  23        32   85            1        0       4.6    1       M        A
20  33.9    1.095  31        44   70            16       1       4.8    0       F        B
21  75.1    1.121  67        43   95            13       0       6.3    1       M        F
22  50.2    1.046  48        48   65            6        1       3.8    1       F        D
23  23.9    1.038  23        36   65            6        1       3.3    0       F        A
24  59.5    1.239  48        30   75            9        1       3.8    0       F        D
25  23.7    1.029  23        41   70            4        0       4      0       M        A
26  25.7    1.116  23        22   95            2        1       6.2    0       F        A
27  36.2    0.906  40        35   80            7        0       3.9    1       M        C
28  78.1    1.166  67        44   95            9        1       4.4    0       F        F
29  68.7    1.026  67        52   95            5        0       5.4    0       M        F
30  48.4    1.009  48        45   90            18       0       4.3    0       M        D
31  23.2    1.008  23        29   60            4        1       3.9    1       F        A
32  29.2    0.942  31        25   95            4        0       5.6    0       M        B
33  65.9    1.156  57        35   90            9        0       5.5    1       M        E
34  27.8    0.898  31        26   80            2        0       4.9    1       M        B
35  24.4    1.062  23        23   90            4        1       5.3    0       F        A
36  24.1    1.049  23        27   75            3        1       4.3    0       F        A
37  23.6    1.024  23        22   95            2        1       6.2    0       F        A
38  57.9    1.015  57        45   95            11       0       4.5    0       M        E
39  34.9    1.125  31        27   90            6        1       5.5    0       F        B
40  24.1    1.048  23        24   90            2        0       6.3    0       M        A
41  45.4    1.134  40        25   80            5        0       4.3    0       M        C
42  23.9    1.037  23        32   100           8        1       5.7    1       F        A
43  76.4    1.140  67        42   95            20       1       5.5    0       F        F
44  64.1    1.124  57        45   90            16       0       5.2    1       M        E
45  49.6    1.034  48        36   95            8        1       5.2    1       F        D
46  65.7    1.153  57        39   75            20       0       3.9    1       M        E
47  57.3    1.006  57        37   95            5        0       5.5    1       M        E
48  67.4    1.183  57        34   90            11       1       5.3    1       F        E
49  61.7    1.082  57        41   95            21       0       6.6    0       M        E
50  63.8    1.119  57        38   80            12       0       4.6    0       M        E
Unlabeled cells in the sheet: 1.08164, 0.0702512633, 43.896, 19.2867718052 (first set); 1.06248, 0.0789066537, 46.7, 19.4289303188 (second set).
Compa     A             B             C             D             E             F
F mean    1.0141666667  1.1205        1.0375        1.1243333333  1.175         1.134
m mean    1.0573333333  0.892         1.0833333333  0.9995        1.0818        1.12275
F Stdev   0.0337876614  0.0311608729  0.0176776695  0.0751620472  0.0494974747  0.0212132034
m Stdev   0.0248260616  0.0190525589  0.0877971146  0.028991378   0.0607029196  0.0332603367
Sheet1
Sal   Compa  G  Mid  Age  EES  SR  G  Raise  Deg
24    1.045  1  23   32   90   9   1  5.8    1
24.2  1.053  1  23   30   80   7   1  4.7    1
23.4  1.018  1  23   41   100  19  1  4.8    1
23.4  1.017  1  23   32   90   12  1  6      1
22.6  0.983  1  23   32   80   8   1  4.9    1
22.9  0.995  1  23   36   65   6   1  3.3    0
23.1  1.003  1  23   22   95   2   1  6.2    0
23.3  1.011  1  23   29   60   4   1  3.9    1
22.7  0.985  1  23   23   90   4   1  5.3    0
23.5  1.023  1  23   27   75   3   1  4.3    0
23    1.002  1  23   22   95   2   1  6.2    0
24    1.042  1  23   32   100  8   1  5.7    1
35.5  1.145  1  31   30   75   5   1  3.6    1
34.7  1.119  1  31   31   80   11  1  5.6    0
35.5  1.146  1  31   44   70   16  1  4.8    0
35.2  1.136  1  31   27   90   6   1  5.5    0
40.4  1.01   1  40   32   100  8   1  5.7    1
42.7  1.068  1  40   30   100  2   1  4.7    0
53.4  1.112  1  48   48   65   6   1  3.8    1
51.5  1.072  1  48   30   75   9   1  3.8    0
49.8  1.037  1  48   36   95   8   1  5.2    1
68.3  1.198  1  57   27   55   3   1  3      1
65.4  1.148  1  57   34   90   11  1  5.3    1
78.4  1.17   1  67   44   95   9   1  4.4    0
75.9  1.133  1  67   42   95   20  1  5.5    0
24    1.044  0  23   32   85   1   0  4.6    1
23.3  1.012  0  23   41   70   4   0  4      0
24.1  1.049  0  23   24   90   2   0  6.3    0
27.5  0.887  0  31   52   80   7   0  3.9    0
27.1  0.875  0  31   25   95   4   0  5.6    0
27.7  0.895  0  31   26   80   2   0  4.9    1
40.8  1.019  0  40   44   90   4   0  5.7    0
43.9  1.097  0  40   35   80   7   0  3.9    1
41    1.025  0  40   25   80   5   0  4.3    0
48.7  1.014  0  48   36   90   16  0  5.7    1
49.4  1.029  0  48   45   90   18  0  4.3    0
64.4  1.13   0  57   34   85   8   0  5.7    0
64.5  1.132  0  57   42   100  16  0  5.5    1
58.9  1.033  0  57   52   95   22  0  4.5    0
57.9  1.016  0  57   35   90   9   0  5.5    1
59    1.035  0  57   45   95   11  0  4.5    0
63.3  1.111  0  57   45   90   16  0  5.2    1
56.8  0.996  0  57   39   75   20  0  3.9    1
58    1.017  0  57   37   95   5   0  5.5    1
62.4  1.094  0  57   41   95   21  0  6.6    0
63.8  1.12   0  57   38   80   12  0  4.6    0
79    1.179  0  67   36   70   12  0  4.5    1
77    1.149  0  67   49   100  10  0  4      1
74.8  1.116  0  67   43   95   13  0  6.3    1
76    1.135  0  67   52   95   5   0  5.4    0
SUMMARY OUTPUT (first of two side-by-side outputs)
Regression Statistics
Multiple R          0.7050179484
R Square            0.4970503076
Adjusted R Square   0.4132253589
Standard Error      0.0561252686
Observations        50
ANOVA
             df   SS            MS            F             Significance F
Regression   7    0.1307500775  0.0186785825  5.9296225662  0.0000782906
Residual     42   0.1323019225  0.0031500458
Total        49   0.263052
             Coefficients   Standard Error  t Stat         P-value        Lower 95%      Upper 95%
Intercept    0.9486238772   0.0817167716    11.6086803119  0              0.7837127557   1.1135349987
Mid          0.0034995027   0.0006492568    5.3900133356   0.0000029767   0.0021892495   0.0048097559
Age          0.0005527738   0.0014459446    0.3822925256   0.7041721007   -0.0023652605  0.0034708081
EES          -0.0018462553  0.0010252155    -1.8008461371  0.0789105539   -0.0039152239  0.0002227133
SR           -0.0004182288  0.0018278101    -0.2288141345  0.820123898    -0.004106899   0.0032704414
G            0.0646649961   0.0183396697    3.5259629624   0.001034866    0.0276540443   0.101675948
Raise        0.0146549564   0.0139088976    1.0536389608   0.2980722322   -0.0134143354  0.0427242483
Deg          0.0014675988   0.0161098249    0.0910996125   0.9278465471   -0.0310433441  0.0339785418
(The Lower 95.0% and Upper 95.0% columns repeat the Lower 95% and Upper 95% values.)
SUMMARY OUTPUT (second of two side-by-side outputs)
Regression Statistics
Multiple R          0.9931286935
R Square            0.9863046018
Adjusted R Square   0.9840220355
Standard Error      2.4352822665
Observations        50
ANOVA
             df   SS                MS              F               Significance F
Regression   7    17938.4246118632  2562.632087409  432.1033638177  5.29906273684337E-37
Residual     42   249.085188137     5.9305997175
Total        49   18187.5098
             Coefficients   Standard Error  t Stat         P-value               Lower 95%       Upper 95%
Intercept    -4.8714544587  3.54570071      -1.3739045839  0.1767599037          -12.0269681853  2.2840592678
Mid          1.2284155048   0.0281713308    43.6051641629  1.32019333894083E-36  1.1715634576    1.2852675521
Age          0.0368279425   0.0627397124    0.5869957178   0.5603489282          -0.0897859231   0.1634418081
EES          -0.0821579785  0.0444842245    -1.8469014451  0.0718147225          -0.171930778    0.007614821
SR           -0.0778484529  0.079308905     -0.9815852701  0.3319249969          -0.2379003029   0.0822033971
G            2.9145083112   0.7957605113    3.6625445343   0.000693549           1.3085985836    4.5204180389
Raise        0.6763294824   0.6035087689    1.1206622295   0.2687988764          -0.5416005215   1.8942594864
Deg          0.0345044482   0.6990072742    0.0493620731   0.9608647532          -1.3761493419   1.4451582383
(The Lower 95.0% and Upper 95.0% columns repeat the Lower 95% and Upper 95% values.)
t-Test: Two-Sample Assuming Equal Variances
                               Variable 1     Variable 2
Mean                           1.06684        1.04836
Variance                       0.00430164     0.00648099
Observations                   25             25
Pooled Variance                0.005391315
Hypothesized Mean Difference   0
df                             48
t Stat                         0.8898352784
P(T<=t) one-tail               0.188996287
t Critical one-tail            1.6772241961
P(T<=t) two-tail               0.3779925741
t Critical two-tail            2.0106347576
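You can partly re-derive the t-test and regression outputs above from their own summary numbers. The Python sketch below does three things:

- It recomputes the pooled variance and t Stat from the reported means, variances, and counts.
- It forms the larger-over-smaller variance ratio, a common convention for the F-test of equal variances.
- It writes a prediction equation from the first SUMMARY OUTPUT, using the intercept and the two variables with p < 0.05 (Mid and G).

The sketch makes two assumptions. First, that output's dependent variable is Compa, since its coefficients are on the compa-ratio scale. Second, G is coded 1 = female and 0 = male, as in the data sheets. It only checks the printed numbers and does not replace the Excel analysis.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, sp2, df


# Summary values copied from the t-test output above.
t, sp2, df = pooled_t(1.06684, 0.00430164, 25, 1.04836, 0.00648099, 25)
print(round(sp2, 9), df, round(t, 4))  # should print 0.005391315 48 0.8898

# Variance ratio for the equal-variance check, larger variance on top.
f_ratio = max(0.00430164, 0.00648099) / min(0.00430164, 0.00648099)


def predicted_compa(mid, g, intercept=0.9486238772, b_mid=0.0034995027, b_g=0.0646649961):
    """Compa-ratio from the intercept plus the coefficients with p < 0.05 (Mid and G)."""
    return intercept + b_mid * mid + b_g * g


# Same grade midpoint (57), differing only in the G code.
print(predicted_compa(57, 1), predicted_compa(57, 0))
```

Holding Mid fixed, the two predictions differ by exactly the G coefficient. This is why the sign of that coefficient shows which group the model predicts higher.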
Week1/Discusion 1.docx
Part One – Analysis Toolpak
Add the “Analysis Toolpak” to Excel. Be sure you are able
to copy, sort, and find averages and sums in Excel. Use
the Load the Analysis ToolPak article for information on how to load this in
Excel. (This should be completed on Day 1.)
Part Two – Data Characteristics
Read Lecture One on descriptive data and review the Employee
Data. Be sure to familiarize yourself with the different
variables shown on the Data tab. In this course, we will be
using the Employee Data and statistical tools to answer a single
research question: In our BUS308 company, are the males and
females paid equally for equal work?
Lecture One discusses different ways data values can be
classified. In our data set for the equal pay for equal work
assignment, students in the past have correctly identified the
variable gender1 (coded M and F for male and female
respectively) as nominal level data, but they often see gender
(coded 0 and 1 for male and female respectively) as interval or
ratio level data. Why? What could cause this wrong
classification? What data do you use in your personal or
professional lives that might suffer from not being correctly
labeled/understood? (This should be started on Day 1.)
Part Three –Descriptive Statistics
Read Lecture Two on describing data sets and view The Role of
Data & Analytics Today video. Lecture Two discusses several different
ways of summarizing a data set--central location, variability,
etc. Often, business reports provide a mean or average value for
some measure (such as average number of defects per
production run). Why is the average alone not enough
information to make informed judgements about the result?
What other descriptive statistic should be included? Why? Can
you illustrate this with an example from your personal or
professional lives? (This should be started on Day 3.)
Part Four – Probability
Read Lecture Three on probability. Lecture Three introduces the
idea of probability—a measure of how likely it is to get a
particular outcome. Looking at outcomes as resulting from
probabilities (somewhat random outcomes/selections) rather
than fixed constants often changes the way we see things. How
does considering the salary outcomes in our sample as the result of
a probabilistic sample, rather than as a completely accurate and
precise reflection of the population, change how we interpret the
sample statistic outcomes? What results in your personal or
professional lives could be viewed this way? What differences
would this cause? Why? (This should be completed by Day 5.)
Week1/Discussion 2.docx
Post one question that you had related to the material this week.
Conduct research and provide the answer to the question you
posted. Be sure to provide the source.
Week1/excel tool pak needed.txt
Load the Analysis ToolPak
https://support.office.com/en-us/article/Load-the-Analysis-ToolPak-6a63e598-cd6d-42e3-9317-6b40ba1a66b4?CorrelationId=b44046dd-0bbf-472c-aaaf-1c7fd6858b56&ui=en-US&rs=en-US&ad=US&ocmsassetID=HP010021569
Microsoft. (2007). Copy Excel data or charts to Word. Retrieved from
http://office.microsoft.com/en-us/word-help/copy-excel-data-or-charts-to-word-HP010198874.aspx
Multimedia
AnalystSoft Inc. StatPlus:mac LE. Retrieved from
http://www.analystsoft.com/en/products/statplusmacle
Week1/Week 1 Lecture 1.pdf
Week 1 Lecture 1
Class Approach to Statistics
Statistics is basically a set of tools that allow us to get
information out of data sets (we
will get to the more formal definition below). As such, it can
be taught as a math class (focusing
on formulas), a logic class (If this, then that), or as a case study
(here is the problem, what are we
going to do). We have chosen the latter – we will be examining
statistical tools and approaches
as they help us answer a business question.
The question we will focus on involves the Equal Pay Act,
specifically the requirement
that males and females be paid the same if they are performing
equal or equivalent work. So, our
business research question is: are males and females paid the
same for equal work?
In starting out with our case, we will have a data set that
provides a number of variables
(measures that can assume different values with different
subjects) for each of 50 employees
selected randomly from our company. (The company and
employee data are fictitious, of
course).
For each employee (labeled 1 thru 50 in the ID column), we will
have:
• Salary – the annual salary in thousands, rounded to the nearest hundred
dollars; for example, a
salary of 32,650 would be rounded to 32.7.
• Compa (short for compa-ratio or Comparative ratio) – a
measure of how a salary
relates to the midpoint of a pay range, found by dividing the
salary by the pay range
midpoint.
• Midpoint – the middle of the salary range assigned to each
grade.
• Age – the employee’s age (rounded to the nearest birthday)
• Performance rating – a value between 1 and 100 showing the
manager’s rating of how
well the employee performs their job
• Service – the years the employee has been with the company
(rounded to the nearest
hiring anniversary)
• Gender – a numerical code indicating the employee’s gender
(1 = female, 0 = male)
• Raise – the percent increase in pay from the last performance-based
increase in salary
• Degree – the educational achievement of the employee (0 =
BA/BS, 1 = Master’s or
more)
• Gender1 – a letter code indicating the employee’s gender (F =
female, M = male)
• Grade – the employee’s pay level – grade A is the lowest
(entry level) and grade F is
the highest.
During each week, we will examine some of these variables to
see if they help us
answer the question of males and females receiving equal pay
for equal work. In the
weekly lectures, we will work with the variable salary. In the
homework assignments for
weeks 2, 3, and 4, you will have the same questions but work
with the variable compa,
which – by definition – is an alternate method of looking at pay.
If you have any questions about this description of our course
case, please ask
them in either Ask Your Instructor or in one of the class posts.
Introduction to Statistics
Formally, we can define statistics as “the science of collecting,
organizing, presenting,
analyzing, and interpreting data to assist in making more
effective decisions” (Lind, Marchal, &
Wathen, 2008, p. 4). This makes statistics and statistical
analysis a subset of both critical
thinking and quantitative thinking, both skills that Ashford
University has identified as critical
abilities for any student graduating with a degree. H. G. Wells,
the author, once said that “one
day quantitative reasoning will be as necessary for effective
citizenship as the ability to read.”
In this class, we will focus mostly on the analyzing and
interpreting of data that we will
assume has been correctly collected to allow us to use it to
make decisions with. In doing this,
there is a fairly well-agreed-upon approach to understanding
what the data is trying to tell us.
This approach will be followed in this class, and involves:
• Identifying what kinds of data we are working with, then
• Developing summary statistics for the data
• Developing appropriate statistical tests to make decisions
about the population the
data came from.
• Drawing conclusions from the test results to answer the initial
research question(s).
Data Characteristics
We all recognize that not all data is the same. Saying we
“like” something is quite a bit
different from saying the part weighs 3.7 ounces. We treat
these two kinds of data in very
different ways.
The first distinction we make in data types involves identifying
our data as either
qualitative or quantitative. Qualitative data identifies
characteristics or attributes of something
being studied. They are non-numeric and can often be used for
grouping purposes. Some
examples include nationality, gender, type of car, etc.
Quantitative data, on the other hand, tend to measure how much
of what is being
examined exists. Examples of these kinds of variables include,
money, temperature, number of
drawers in a desk, etc.
Within quantitative data, we can identify continuous and
discrete data types. Continuous
data variables can assume any value within limits. For example,
depending upon how accurate our
measuring instrument is, the temperature, in degrees Fahrenheit,
could be 75, 75.3, 75.32,
75.3287468…. There are no natural “breaks” in temperature
even though we typically only
report it in whole numbers and ignore the decimal portion.
Height would be another continuous
data variable. Discrete data, on the other hand, has only certain
values, and shows breaks
between these values. The number of drawers in a desk could
be 3 or 4, but not 3.56, for
example.
The second important approach in defining data is the “level” of
the data. There exist
four distinct levels:
• Nominal – these serve as names or labels, and could be
considered qualitative. The
basic use for this level is to identify distinctions between and
among subjects, such as
ID numbers, gender identification (Male or Female), car type
(Ford, Nissan, etc.).
We can basically only count how many exist within each group
of a nominal data
variable.
• Ordinal – these data have the same characteristics as nominal
with the addition of
being rankable – that is, we can place them in a descending or
ascending order. One
example is rating something using good, better, best (even if
coded 1 = good, 2 =
better, and 3 = best). We can rank this preference, but cannot
say the difference
between each data point is the same for everyone.
• Interval – this level of data adds the element of constant
differences between
sequential data points – while we do not know the difference
between good and
better or better and best; we do know the difference between 57
degrees and 58
degrees – and it is the same as the difference between 67 and 68
degrees.
• Ratio – this level adds a “meaningful” 0 – which means the
absence of any
characteristic. Temperature (at least for the Celsius and
Fahrenheit scales) does not
have a 0 point meaning no heat at all. A scale with a
meaningful 0, such as length,
has equal ratios – the ratio of 4 feet to 2 feet has the same value
as that of 8 feet to 4
feet – both are 2. This cannot be said of temperatures, for
example (Tanner &
Youssef-Morgan, 2013).
These are often recalled by the acronym NOIR.
Knowing what kinds of data we have is important, as it
identifies what kinds of statistical
analysis we can do.
Equal Pay Question
At the end of each lecture, we will apply the topics discussed to
our research question of
do males and females receive equal pay for equal work. In this
section, we will look at
identifying the data characteristics for each of our data
variables.
In looking at our first classification of qualitative versus
quantitative, we have
Qualitative Quantitative
Continuous Discrete
ID Compa Salary
Gender Age Midpoint
Gender1 Raise
Performance
Rating
Degree Service
Grade
Most of these are fairly clear – the variables in the qualitative
column merely identify
different groups. The continuous variable lists can all –
theoretically – be carried out to many
decimal points, while those in the discrete list all have distinct
values within their range of
available values.
The NOIR classifications for our variables are shown below.
Nominal    Ordinal    Interval              Ratio
ID         Degree     Performance Rating    Salary
Gender     Grade                            Midpoint
Gender1                                     Service
                                            Compa
                                            Age
                                            Raise
While an argument can be made that Performance Ratings,
being basically opinions, are really
ordinal data; for this class let us assume that they are interval
level as many organizations treat
them as such.
An important reason for always knowing the data level for each
variable is that we are
limited to what can be done with different levels. With nominal
scales, we can count the
differences. With ordinal scales, we can do some limited
analysis of differences using certain
tests that are not covered in this course. Both interval and ratio
scales allow us to do both
inferential and descriptive analysis (Tanner & Youssef-Morgan,
2013). Most of the statistical
tools we will cover in this class require data scales that are at
least interval in nature. During our
last two weeks, we will look at some techniques for nominal and
ordinal data measures.
In Lecture 2, we will start to see what kinds of things we can do
with each level of the
NOIR characteristics.
If you have any questions about this material, please ask
questions in either Ask Your
Instructor or in the discussion area.
References
Lind, D. A., Marchal, W. G., & Wathen, S. A. (2008).
Statistical Techniques in Business &
Economics (13th ed.). Boston: McGraw-Hill Irwin.
Tanner, D. E., & Youssef-Morgan, C. M. (2013). Statistics for
Managers. San Diego, CA:
Bridgepoint Education.
Week1/Week 1 Lecture 2.pdf
Week 1 Lecture 2
In Lecture 1, we focused on identifying the characteristics –
quantitative, qualitative,
discrete, continuous, NOIR – of the data. In this section, we
will take a look at how we can
summarize a data set with descriptive statistics, and how we can
ensure that these descriptive
statistics can be used as inferential statistics to make inferences
and judgments about a larger
population. We are moving into the second step of the analysis
approach mentioned in Lecture
1.
Descriptive Statistics
Once we understand the kinds of data we have, the natural reaction is to want to summarize it: reduce what may be a lot of data into a few measures to make sense of what we have. We start with summary descriptions; the principal types focus on location, variability, and likelihood. (Note: we will deal with likelihood, AKA probability, in Lecture 3 for this week.)
For nominal data, our analysis is limited to counting how many items exist in each group, such as how many cars from each car company (Ford, Nissan, etc.) are in the company parking lot. However, we can also use a nominal variable as a group name to form different groups to examine. In this case we do nothing with the actual data label, but we analyze the data within each group. An example related to our class case: we can split the salary data values into two groups using the nominal variable gender (or gender1).
With ordinal scales, we can do some limited analysis of differences using certain tests, most of which are not covered in this course. We can also use ordinal data as grouping labels; for example, we could analyze salary by educational degree.
Both interval and ratio scales allow us to do both inferential and
descriptive analysis
(Tanner & Youssef-Morgan, 2013). Most of the statistical tools
we will cover in this class
require data scales that are at least interval in nature.
Location measures. When working with interval or ratio level variables, the first measures most researchers look at are indications of location: mean, median, and mode. The mean is the numerical average of the data; simply add the values and divide by the total count. The median is the middle of the data set; rank order the values from low to high or high to low, and pick the value that is in the middle. This is easy if we have an odd number of values, since we can find the middle exactly. If we have an even number of values, the median is the average of the middle two values. For example, in this data set: 2, 3, 4, 5, 6, we have five values and the median is 4. However, in this data set: 2, 3, 4, 5, we have only four values and the median is the average of the middle 2 numbers = (3 + 4)/2 = 7/2 = 3.5. Finally, the mode is the most frequently occurring value; as such, it may or may not exist (if every value occurs only once, there is no mode). And there may be more than one mode in any data set.
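The location measures can be checked with Python's standard `statistics` module. This sketch reproduces the two median examples above; the two small data sets used for the mode are made-up illustrations, not lecture data.

```python
import statistics

odd = [2, 3, 4, 5, 6]
even = [2, 3, 4, 5]

print(statistics.mean(odd))     # 4: sum 20 divided by count 5
print(statistics.median(odd))   # 4: the middle value of five
print(statistics.median(even))  # 3.5: average of the middle two values (3 + 4)/2

# multimode returns every value tied for the highest count.
print(statistics.multimode([1, 2, 2, 3, 3]))  # [2, 3]: two modes
print(statistics.multimode([1, 2, 3]))        # [1, 2, 3]: all tie, so no meaningful mode
```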
Generally, the mean is the most useful measure for a data set, as
it contains information
regarding all the values. It is the location measure that is used
in many statistical tests. The
symbol for the mean of a population is μ – called mu – while we
use �� – sometimes typed as x-
bar – for the sample mean.
Variation measures. After finding our mean (or other center measure), we generally want to know how consistent the data is; that is, is the data bunched around the center, or is it spread out? The more spread out the data is, the less any single measure accurately describes all of the data. Looking at the consistency (or lack of consistency) in a data set will often give us a different understanding of what is going on. As a simple example, if two departments in a company each averaged 3.0 on a question in a company morale survey, we might be tempted to say they were the same. However, if we looked at the actual scores and saw that one department had individual scores of 3, 3, 3, 3, 3, and 3 while the other department's scores were 5, 5, 5, 1, 1, and 1, we can now see that the groups are quite different. The mean alone did not provide enough information to interpret what was going on in each group.
We have 3 general measures of variation – range, standard
deviation, and
variance. Range is simply the difference between the largest
and smallest value (largest –
smallest = range).
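A short Python sketch of the morale-survey example: both departments have the same mean, but the range immediately shows the difference in spread. The scores are the ones given above.

```python
import statistics

# Morale-survey scores for the two departments described above.
departments = {
    "A": [3, 3, 3, 3, 3, 3],
    "B": [5, 5, 5, 1, 1, 1],
}

summary = {}
for name, scores in departments.items():
    mean = statistics.mean(scores)
    value_range = max(scores) - min(scores)  # largest - smallest
    summary[name] = (mean, value_range)

print(summary)  # {'A': (3, 0), 'B': (3, 4)}
```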
Standard deviation and variance are related values. The variance is a somewhat awkward measure to initially understand. To calculate it, we first take the difference between each value and the mean of the entire group. These differences will have both positive and negative values, and if we add them together we would get a result of 0. So, to eliminate the negative values, we square each difference. Then we sum these squared values and divide by the count. (Note: this is the same as the mean of the squared differences.) For example, the variance of this data set (2, 3, 4) would be:
• Mean = (2 + 3 + 4)/3 = 9/3 = 3
• Variance = ((2 - 3)^2 + (3 - 3)^2 + (4 - 3)^2)/3 = ((-1)^2 + (0)^2 + (1)^2)/3 = (1 + 0 + 1)/3 = 2/3 = 0.667.
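The variance steps can be traced one at a time. This Python sketch follows the same arithmetic for the data set (2, 3, 4) using the divide-by-count formula described above:

```python
data = [2, 3, 4]
n = len(data)

mean = sum(data) / n                    # (2 + 3 + 4) / 3 = 3.0
deviations = [x - mean for x in data]   # [-1.0, 0.0, 1.0]; they always sum to 0
squared = [d ** 2 for d in deviations]  # squaring removes the negative signs
variance = sum(squared) / n             # mean of the squared differences

print(deviations, round(variance, 3))   # [-1.0, 0.0, 1.0] 0.667
```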
This gives us an awkward measure; the variance of something measured in inches, for example, would be measured in inches squared, not a measure we use on a daily basis.
The standard deviation changes this awkward measure to one that makes more intuitive sense. It does so by taking the positive square root of the variance. For our inches measure, this gives a result expressed in inches. The standard deviation is always expressed in the same units as the initial measure. For our example above with the variance of 0.667, the standard deviation would be the square root of 0.667, or 0.817. Both the variance and standard deviation require data that is at least interval in nature. As a rough rule of thumb for bell-shaped data, the standard deviation is about 1/6 of the range, and it can be thought of as the typical (roughly average) distance of the data values from the mean (Tanner & Youssef-Morgan, 2013).
Technical point: both the variance and the standard deviation have two different formulas, one for populations and one for samples. The difference is that with the sample formula, the sum of the squared differences is divided by (count - 1) rather than the full count. This serves to increase the estimate, since the data in a sample will not be as spread out as in the population (a sample is unlikely to include the extreme largest and smallest values). The symbol for the population standard deviation is σ, while the sample standard deviation symbol is s. In statistics, since we deal with samples, we use the sample formulas.
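Python's `statistics` module implements both versions, which makes the (count - 1) adjustment easy to see. In Excel the corresponding functions are VAR.P/STDEV.P (population) and VAR.S/STDEV.S (sample). A sketch for the (2, 3, 4) example:

```python
import math
import statistics

data = [2, 3, 4]

pop_var = statistics.pvariance(data)    # divides by n:     2/3, about 0.667
samp_var = statistics.variance(data)    # divides by n - 1: 2/2 = 1
pop_sd = statistics.pstdev(data)        # sqrt(2/3), about 0.816
samp_sd = statistics.stdev(data)        # sqrt(1) = 1.0

print(round(pop_var, 3), samp_var, round(pop_sd, 3), samp_sd)

# The standard deviation is always the square root of the matching variance.
assert math.isclose(pop_sd, math.sqrt(pop_var))
```

The unrounded population standard deviation is sqrt(2/3), about 0.8165; the 0.817 quoted above comes from taking the square root of the already-rounded 0.667.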
The nice thing about descriptive statistics is that Excel will do all of the math calculations for us; we just need to know how to interpret our results.
For a video discussion of descriptive statistics, take a look at Descriptive statistics from the Khan Academy: https://www.khanacademy.org/math/probability/descriptive-statistics.
Research Question Example
Now that we have identified the data types for each of our
variables, we need to develop
some descriptive statistics – particularly for those at the
interval and ratio level. In our
discussion and example of salary, we will be using a salary
sample of 50 that does not exactly
match the data that is available in your data set. It is not
significantly different, and should be
considered to come from a different sample of the same
population. The results will be accurate
enough to consider them in answering our equal pay for equal
work question for the sample
results provided to the class.
Equal Pay Question. The obvious first question to ask is: what is the overall average salary, and what is the average for the males and females separately? This descriptive statistic should also be accompanied by the standard deviation of each group to examine group diversity. (Reminder: the salary results presented each week will not exactly match the results from this class's data set if you choose to duplicate the results presented in this lecture. The results are statistically close enough to use to answer our assignment question on equal pay.)
The related question concerns the standard deviation of each of the three groups (entire sample, males, and females).
In setting up the data for this, copy the salary data column (B1:B51) and paste it on a new sheet. This is a recommended practice: never do analysis on the raw data set, so that relationships between various columns are not compromised. Then copy the Gender column (M and F) and paste it beside the salary data. Using Excel's sort function, sort the two columns (at the same time) using Gender as the sort key. This will give you the salary data grouped by males and females.
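The copy, sort, and group steps can also be sketched in Python. The salary/gender pairs below are made up for illustration (they are not the class data or the lecture's sample), and `statistics.stdev` uses the sample formula, matching Excel's STDEV.S:

```python
import statistics
from collections import defaultdict

# Made-up (salary in thousands, gender) pairs for illustration only.
rows = [(22, "F"), (35, "M"), (41, "F"), (58, "M"), (30, "F"), (66, "M")]

# Sort on gender, as with Excel's sort of the two columns together.
rows.sort(key=lambda row: row[1])

groups = defaultdict(list)
for salary, gender in rows:
    groups[gender].append(salary)
groups["Overall"] = [salary for salary, _ in rows]

summary = {}
for name, salaries in groups.items():
    # mean ~ =AVERAGE(...), stdev (n - 1 formula) ~ =STDEV.S(...)
    summary[name] = (statistics.mean(salaries), statistics.stdev(salaries))
    print(name, round(summary[name][0], 1), round(summary[name][1], 1))
```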
The screen shot below displays the results using both the Descriptive Statistics option found in the Data Analysis list and the =AVERAGE and =STDEV.S functions found in the fx or Formulas (Statistical) section.
Note a couple of things about the Descriptive Statistics output.
First, since for both the
overall and female groups, the input range included the label
Sal, this was shown at the top. The
male range did not have a label, so Column 1 was automatically
used. We can use Descriptive
Statistics for any number of contiguous columns in the input
range box. For reporting purposes,
we should change the Sal and Column 1 labels to Overall,
Female, and Male.
The second issue about the descriptive statistics output is that it
contains much more
information than we were looking for. This is a good tool for
an overall look at a data set.
Looking at the fx values and those from the Descriptive
Statistics output, we can see that
the means and standard deviations are identical for each group –
so, it does not matter which
approach you use.
Now, looking at the actual statistical values, we see that the overall salary mean (45) lies in the middle between the lower female mean (38) and the upper male mean (52). An overall mean will always be flanked by the sub-group means, but it will not always be equidistant from them; here it is exactly halfway because the two groups are the same size (25 each).
The standard deviations, on the other hand, are much closer together, with the overall (19.2) being somewhat larger than either the female (18.3) or male (17.8). This is also somewhat common; the variation in the entire group is generally a bit larger than for the sub-groups.
While we did not specifically ask for it, we can also note that the range in each group is very close: 22 to 77 for overall and females, and 24 to 77 for males.
So, what can we say at this point? It appears that males and females have about the same range and standard deviations for salaries, but that females appear to average less than the males. However, we cannot yet say anything about our equal pay for equal work question, as the salaries have not been divided into equal work groups. So, at this point we have some interesting information, but no conclusive results.
References
Lind, D. A., Marchal, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Economics (13th ed.). Boston: McGraw-Hill Irwin.
Tanner, D. E., & Youssef-Morgan, C. M. (2013). Statistics for Managers. San Diego, CA: Bridgepoint Education.
Week1/Week 1 Lecture 3.pdf
Week 1 Lecture 3
A second way of looking at data differences or similarities is to consider how likely a given outcome is. In looking at our data set, we could ask questions such as: what is the probability (likelihood) of a male or female salary exceeding 60K, what is the probability that a person's salary is within the range of 38K to 52K, etc.? Probability questions about a data set begin to help us look at distributions, a topic we will delve into in more detail in the upcoming weeks.
Probability is the likelihood that a specific outcome will occur; it is never negative and ranges from 0 (will never occur) to 1.00 (will always occur). Generally speaking, we have 3 kinds of probability: empirical (counting actual outcomes), theoretical (using theory/logic to determine what should occur), and subjective (our individual guesses and feelings). Obviously the theoretical and empirical are the best approaches for business research questions, but at times the best we can get is an expert's guess.
Theoretical probability is just as it sounds: the theory of what the probability should be. For example, if we flip a fair coin, our theory says we should get heads 50% of the time, one outcome out of the two possible. If we flip the coin a number of times, we will get the empirical probability: the number of actual heads divided by the number of flips. While this is generally close to .5, getting close to .5 usually takes a lot of flips rather than just a few (even 100 flips may not be enough) (Lind, Marchal, & Wathen, 2008).
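A quick simulation illustrates the gap between theoretical and empirical probability. This Python sketch uses a pseudo-random generator as a stand-in for a fair coin; the seed and the flip counts are arbitrary choices.

```python
import random

random.seed(1)  # arbitrary seed so the run is repeatable

THEORETICAL = 1 / 2  # one favorable outcome (heads) out of two possible


def empirical_heads(flips: int) -> float:
    """Empirical probability: actual heads divided by the number of flips."""
    heads = sum(random.random() < 0.5 for _ in range(flips))
    return heads / flips


results = {n: empirical_heads(n) for n in (10, 100, 10_000)}
for n, p in results.items():
    print(f"{n:>6} flips: empirical {p:.3f} vs theoretical {THEORETICAL}")
```

Small runs can land noticeably away from .5; the larger runs tend to settle near it.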
While many approaches to theoretical probability exist (binomial, hypergeometric, Poisson, etc.) (Lind, Marchal, & Wathen, 2008), we will look at just two particular types: the binomial and the normal curve based probabilities. The binomial requires that we have only two outcomes, such as heads and tails when flipping a coin. This is not as restrictive as it might seem, as we can always create 2 groups out of what we have. For example, if we have a single die (one of a pair of dice), we could form several two-group situations: evens versus odds, 1 to 3 versus 4 to 6, etc. We will use the binomial to discuss several basic probability rules.
Several general probability (P) concerns exist. Typically, we want to know one or more of the following probabilities:
• of something happening, called P(event),
• of two things happening together, called joint probability: P(A and B),
• of either one or the other (or both) events occurring: P(A or B),
• of something occurring given that something else has occurred, called conditional probability: P(A|B) (read as probability of A given B),
• of something not happening, found with the complement rule: P(not A) = 1 - P(A) (Lind, Marchal, & Wathen, 2008).
Two other terms are needed. Mutually exclusive means that the elements of one data set do not belong to another; for example, males and pregnant are mutually exclusive data sets. The other term we frequently hear with probability is collectively exhaustive, which simply means that the groups listed together cover all members of the data set (every possible outcome) (Lind, Marchal, & Wathen, 2008).
Some rules, which apply for both theoretical and empirical based probabilities, for dealing with these different probability situations include:
• P(event) = (number of successes)/(number of attempts or possible outcomes)
• P(A and B) = P(A)*P(B) for independent events, or P(A)*P(B|A) for dependent events (P(B|A) is a conditional probability: the probability of B occurring given that A has occurred).
• P(A or B) = P(A) + P(B) - P(A and B); if A and B cannot occur together (such as the example of male and pregnant), then P(A and B) = 0
• P(A|B) = P(A and B)/P(B) (Lind, Marchal, & Wathen, 2008).
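The rules above can be checked by counting outcomes for a single die, using two of the two-group splits mentioned earlier (pairing "even" with "1 to 3" is an assumed choice made for illustration). This is a Python sketch; `Fraction` keeps the results exact.

```python
from fractions import Fraction

OUTCOMES = set(range(1, 7))  # one fair six-sided die
A = {2, 4, 6}                # event A: an even number
B = {1, 2, 3}                # event B: a 1, 2, or 3


def P(event: set) -> Fraction:
    """P(event) = favorable outcomes / possible outcomes."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))


p_a_and_b = P(A & B)                 # joint: only the 2 qualifies, 1/6
p_a_or_b = P(A) + P(B) - p_a_and_b   # addition rule, 5/6
p_a_given_b = p_a_and_b / P(B)       # conditional, 1/3
p_b_given_a = p_a_and_b / P(A)       # conditional, 1/3
p_not_a = 1 - P(A)                   # complement rule, 1/2

assert p_a_or_b == P(A | B)                  # agrees with counting the union directly
assert P(A) * p_b_given_a == p_a_and_b       # multiplication rule for dependent events
print(p_a_and_b, p_a_or_b, p_a_given_b, p_not_a)  # 1/6 5/6 1/3 1/2
print(P(A) * P(B))  # 1/4, not 1/6, so A and B are not independent
```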
Binomial Probability
Binomial probabilities deal with dichotomous outcomes, those that have only 2 possible outcomes. A typical example is flipping a coin: the result can only be a head or a tail. Another common example is gender; we are born as either male or female. The interesting element about binomial outcomes is that while every single trial (such as the flip of a coin) has the same probability, the outcome of a group of trials will not necessarily match that probability. For example, the probability of getting exactly 5 heads out of 10 flips of a fair coin is not .5, but rather 24.6%! This is due to the number of ways the 10 outcomes can be distributed (Lind, Marchal, & Wathen, 2008).
We can turn almost any outcome into a dichotomous outcome by
creating groups. For
example, we can say that when we toss a six-sided die (half of a
pair of dice), we have two
outcomes: getting a 1 or 2 versus getting anything else. Now
we have two outcomes of interest
instead of the original 6 possible outcomes.
Tables exist to determine the likelihood, but the easier way is to use the Excel functions found in the fx or Formulas lists. For example, Excel's BINOM.DIST function can quickly provide us with the correct probability of getting a certain number of outcomes within a given number of attempts.
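For readers who want to see what BINOM.DIST computes, here is a Python sketch of the binomial formula, C(n, k) p^k (1 - p)^(n - k). The function names are hypothetical; they mirror BINOM.DIST(number_s, trials, probability_s, cumulative) with cumulative set to FALSE or TRUE.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n trials); compare BINOM.DIST(k, n, p, FALSE)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(k or fewer successes in n trials); compare BINOM.DIST(k, n, p, TRUE)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


print(round(binom_pmf(5, 10, 0.5), 3))   # 0.246: exactly 5 heads in 10 fair flips
print(round(binom_pmf(2, 6, 1 / 3), 3))  # 0.329: exactly two "1 or 2" results in 6 tosses
print(round(binom_cdf(5, 10, 0.5), 3))   # 0.623: 5 or fewer heads in 10 flips
```

The first call reproduces the 24.6% for exactly 5 heads in 10 flips; the second applies the "1 or 2 versus anything else" die grouping from the previous paragraph (six tosses is an arbitrary choice).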
Research Question Example
Understanding the distribution of the data is an important
element of understanding what
the data is trying to tell us. Probabilities can give us a sense of
the data set and allow us to
compare results across groups.
Equal Pay Example. In thinking about equal pay, we might be interested in whether males and females are grouped across salary ranges in similar ways as the overall group. This would be an example of an empirical probability, as we would be counting how many of each group fall into each of the ranges we would set up.
We noted that the overall salary mean was 45, with the female mean equaling 38 and the male mean equaling 52. This suggests one range to look at: what is the probability of someone having a salary between 38 and 52 in each group (overall, females, and males)?
Translating this into “probability” terms, we want to know:
• What is p(38 <= salary <=52)? What is the probability that
salaries are between
38 and 52 inclusive?
• What is p(38 <= salary <=52|Female)? What is the probability
that salaries are
between 38 and 52 inclusive, given a female salary? Or, if a
female, what is the
probability that salaries are between 38 and 52 inclusive?
• What is p(38 <= salary <=52|Male)? What is the probability
that salaries are
between 38 and 52 inclusive, given a Male salary? Or, if a
Male, what is the
probability that salaries are between 38 and 52 inclusive?
We know a couple of things right off. First, the entire sample has 50 members, and we have 25 males and 25 females. These become the denominators in the respective probabilities. Since you do not have the exact data set we are working with, here are the counts for salaries in this range: Overall: 8, Females: 3, and Males: 5.
So, we have:
P(38 <= salary <=52) = 8/50 = .16.
P(38 <= salary <=52|Female) = 3/25 = .12.
P(38 <= salary <=52|Male) = 5/25 = .20.
We can see if gender influences being within this range by checking whether the rule for independent events holds. Above we stated that P(A and B) = P(A)*P(B) for independent events. Here, P(within salary range AND Female) is the number of in-range females divided by the entire sample: 3/50 = .06. (This is not the same as P(within salary range|Female) = .12, which divides by the 25 females only.) Since P(within salary range) = .16 and P(Female) = .5, independence would require:
P(within salary range and Female) = P(within salary range) * P(Female)
Replacing these with the associated values, we would have:
.06 = .16 * .5 (= .08), an expression that is clearly not true.
(Equivalently, if the events were independent, P(within salary range|Female) would equal P(within salary range), but .12 is not .16.)
Since the two sides of the equation are not the same, we can say that gender and being within this salary range are not independent elements. Doing this for other ranges produces similar results, so we have a clue that gender and salary interact in some ways that suggest males and females are not paid equally.
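The empirical probabilities and the independence check can be reproduced from the lecture's counts. In this Python sketch the joint probability divides the 3 in-range females by all 50 employees, while the conditional probability divides by the 25 females only; either comparison (.06 versus .08, or .12 versus .16) indicates that the two events are not independent.

```python
from fractions import Fraction

n_total, n_female, n_male = 50, 25, 25
in_range = {"Overall": 8, "Female": 3, "Male": 5}  # counts given in the lecture

p_range = Fraction(in_range["Overall"], n_total)          # 8/50 = .16
p_range_given_f = Fraction(in_range["Female"], n_female)  # 3/25 = .12
p_range_given_m = Fraction(in_range["Male"], n_male)      # 5/25 = .20
p_female = Fraction(n_female, n_total)                    # 25/50 = .5

p_range_and_f = Fraction(in_range["Female"], n_total)     # joint: 3/50 = .06
independent = p_range_and_f == p_range * p_female         # .06 vs .08

print(float(p_range_and_f), float(p_range * p_female), independent)  # 0.06 0.08 False
```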
What we still do not know is how to consider equal work in our examination.
The Normal Curve
The normal curve is a data distribution that is often called the
bell curve, as when you
plot the likelihood of outcomes occurring, the resulting graph
looks like a bell – the most
outcomes in the middle (where the mean = median = mode), and
then smoothly decreasing on
each side.
As a probability distribution, the normal has some interesting
characteristics. First, the
probability of any outcome equals the area under the curve for
that range of outcomes. (Tables
and Excel give us these values.) Second, the curve technically
extends from - infinity to +
infinity (although this range is rarely actually used). Third, since the normal curve describes continuous data, the probability of any single outcome (for example, getting a 76 on a test) is 0. To overcome this, we use a range of values: the 76 score outcome would be the area from 75.5 to 76.5. Adding +/- half a unit to a value allows us to translate discrete data into a continuous range (Lind, Marchal, & Wathen, 2008).
The normal curve is important due to its widespread appearance in everyday
situations. Some examples of data that follow the normal curve
are height, weight, IQ,
standardized test scores such as the college boards, many
manufacturing measures (above and
below the average result), etc.
To make working with different normal curves (having different means and standard deviations) easier, we can convert them all into the Standard Normal Curve, which has a mean of 0 and a standard deviation of 1.0.
We do this using a z-score – subtract the mean from the data
value, and divide the result
by the standard deviation. Doing this for every value in a data
set would change the mean of the
new distribution to 0 (due to the subtraction), while the division
changes the standard deviation
to 1. The resulting data values are now z-scores, and the area
between z-scores is the probability
of an outcome within that range of values. One characteristic of
the z-score is that it tells us, in
standard deviations, how close or far from the mean any
individual score is; so in some ways this
is another location measure but one that focuses on individual
values.
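A minimal z-score sketch in Python, reusing the (2, 3, 4) data from Lecture 2. It uses the population standard deviation so that the check at the end uses the same formula; Excel's STANDARDIZE(x, mean, standard_dev) performs the same per-value calculation.

```python
import statistics


def z_scores(values):
    """Standardize: subtract the mean, then divide by the standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


z = z_scores([2, 3, 4])
print([round(v, 3) for v in z])                  # [-1.225, 0.0, 1.225]
print(statistics.mean(z), statistics.pstdev(z))  # mean 0, standard deviation about 1
```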
Here is an example using the normal distribution and related Excel functions found in the fx list. (See the Excel Week 1 lecture for guidance on using this function if you are unclear about it.) To find the probability of an outcome between a z-score of 1.63 and 2.0, we would need to find the area between these two scores. To do this, we would subtract the area under the curve up to the z-score of 1.63 from the area under the curve up to the z-score of 2.0. In Excel, we use the fx function NORM.S.DIST(z, cumulative) this way:
=NORM.S.DIST(2.0,1)-NORM.S.DIST(1.63,1) = 0.0288.
This tells us that the probability of finding a sample value within this range is about 2.9%.
A second example with values that are above and below the
mean would be done the
same way. Looking to find the probability that we would find a
sample value between the z
scores of -1.63 and 2.0 would be:
=NORM.S.DIST(2.0,1)-NORM.S.DIST(-1.63,1) = 0.9257 or
92.6%.
The final example of finding a normal curve based probability is determining the probability of being greater than some value; for example, what is the probability of exceeding a z score of 2.0? =1 - NORM.S.DIST(2.0,1) = 0.02275 or 2.3%
The area below a negative z score is found simply by using the NORM.S.DIST function: =NORM.S.DIST(-2.0,1) = 0.02275 or 2.3%
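The same four areas can be computed with Python's `statistics.NormalDist`, whose `cdf` method plays the role of NORM.S.DIST(z, 1) for the standard normal curve. A sketch:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
Phi = std_normal.cdf  # area to the left of z, like NORM.S.DIST(z, 1)

between_pos = Phi(2.0) - Phi(1.63)     # area between 1.63 and 2.0
between_mixed = Phi(2.0) - Phi(-1.63)  # area between -1.63 and 2.0
above = 1 - Phi(2.0)                   # area above 2.0
below = Phi(-2.0)                      # area below -2.0

print(round(between_pos, 4), round(between_mixed, 4), round(above, 5), round(below, 5))
# 0.0288 0.9257 0.02275 0.02275
```

The last two results match because the curve is symmetric around 0.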
A hint on doing these kinds of problems is to draw a picture of
the normal curve, and
draw a vertical line at each of the z score values you are
working with. Then shade the area you
are interested in. There are three cases – the area below a
certain value, the area above a certain
value, and the area between two values. This visual guide helps
in determining what we subtract
from what.
Side note – the probability of exceeding a particular outcome by
pure chance alone is called the
p-value. We will start using this idea next week.
Research Question Example
We will be assuming that the variables we are using for our
equal pay question come
from a normally distributed population. This allows us to use
normal curve based probabilities
and statistical tests to examine the data we are using to answer
our question.
Equal Pay Example. Earlier we found that the likelihood of males and females having a salary between 38 and 52K was not the same, suggesting that gender and salary interacted in some way. Let us ask if the probability of having a salary greater than the overall mean of 45 is the same for both genders. Since we are assuming that salary is normally distributed as a whole and for each gender, we can use a normal curve probability to examine this.
The first step is to find the z scores for each gender for the data value of 45. We found earlier that the female mean is 38 and the sample standard deviation is 18.3, and that the male mean and standard deviation are 52 and 17.8, respectively. This gives us the information needed to determine our z scores:
Female z = (45 - 38)/18.3 = 7/18.3 = 0.38.
Male z = (45 - 52)/17.8 = -7/17.8 = -0.39.
The second step is to find the probability of exceeding each of these values using the NORM.S.DIST function.
Female: =1-NORM.S.DIST(0.38,1) = 0.352
Male: =1-NORM.S.DIST(-0.39,1) = 0.652
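The two steps can be combined in a short Python sketch using the means and sample standard deviations quoted above; the z scores are rounded to two places to match the hand calculation.

```python
from statistics import NormalDist

OVERALL_MEAN = 45
# (mean, sample standard deviation) for each group, from Lecture 2
groups = {"Female": (38, 18.3), "Male": (52, 17.8)}

std_normal = NormalDist()  # mean 0, standard deviation 1
results = {}
for name, (mean, sd) in groups.items():
    z = round((OVERALL_MEAN - mean) / sd, 2)  # step 1: z score, 2 decimals
    p_above = 1 - std_normal.cdf(z)           # step 2: like =1-NORM.S.DIST(z,1)
    results[name] = (z, round(p_above, 3))

print(results)  # {'Female': (0.38, 0.352), 'Male': (-0.39, 0.652)}
```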
So, it again appears that males and females have a different
salary distribution as males are
almost twice as likely to be above the overall average of 45 as
females are. Again, we have not
yet considered the equal work element.
References
Lind, D. A., Marchal, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Economics (13th ed.). Boston: McGraw-Hill Irwin.
Week2/Assingment.docx
Problem Set Week Two
The assignment for this week involves developing an
understanding of the problem and the data that we will be
analyzing during the class. We will be using a data set of 50
employees sampled from an imaginary company to answer the
question of whether males and females receive equal pay for
performing equal work.
The questions in the assignment follow the examples provided
in the weekly guidance lectures.
The first question this week focuses on the kind of data we
have. Different levels of data allow us to do different kinds of
analysis, so we need to understand what we have to work with.
Question two involves finding the probability of randomly picking an employee who has certain characteristics from the sample.
Question three involves finding the probability of randomly picked employees falling within the top one-third of different groups using Excel functions. Questions four and five involve using statistical tests to examine the compa-ratio (an alternate measure of pay).
The final question asks for your opinion on the question of equal pay for equal work, based on the work done this week.
Both the assignment file and the data file are located in
the Course Materials section at the bottom in the Multi-
Media section. The assignment file contains all of the weekly
assignments (for Weeks 2, 3, and 4). See the labeled tabs at the
bottom of the Excel assignment file. The data in the data file
needs to be copied over into the assignment file, and you will be
set for the entire class. *Ask questions if it is not clear how to
move the data from one file to the other.
Week2/discu1.docx
Hypothesis Testing / T-tests / F-test
Although the initial post is due on Day 5, you are encouraged to
start working on it early, as it is a three-part discussion that
should be completed in sequential order.
Part One – Hypothesis Testing
Read Lecture Four. Lecture Four starts out with the five-step
procedure for hypothesis testing. What is this? What does it do
for us? Why do we need to follow these steps in making a
judgement about the populations our samples came from? What
are the “tricky” parts of developing appropriate hypotheses to
test? What examples can you suggest where this process might
be appropriate in your personal or professional lives? (This
should be started on Day 1.)
Part Two – T-tests
Read Lecture Five. Lecture Five illustrates several t-tests on the
data set. What conclusions can you draw from these tests about
our research question on equal pay for equal work? What is
missing from these results to give us a complete answer to the
question? Why? (This should be started on Day 3.)
Part Three – F-test
Read Lecture Six. Lecture Six introduces you to the F-test for
variance equality. Last week, we discussed how adding a
variation measure to reports of means was a smart thing to do.
Why does variation make our analysis of the equal pay for equal
work question more complicated? What causes of variation
impact salary that we have not discussed yet? How can you
relate this issue to measures used in your personal or
professional lives? (This should be completed by Day 5.)
Your responses should be separated in the initial post,
addressing each part individually, similar to what you see here.
Week2/discu2.docx
Post a question that you had related to the material this week.
Conduct research to provide the answer to the question and
provide the source.
Week2/Week 2 Lecture 4-1.pdf
Lecture 4
(Sampling basics and Hypothesis test)
This week we turn from descriptive statistics to inferential
statistics and making decisions
about our populations based on the samples we have. For example, our class case research question is really asking whether, in the entire company population of employees, males and females receive the same pay for doing equal work. However, we are not analyzing the entire population; instead, we have a sample of 25 males and 25 females to work with.
This brings us to the idea of sampling – taking a small
group/sample from a larger
population. To paraphrase, not all samples are created equal.
For example, if you wanted to
study religious feelings in the United States, would you only
sample those leaving a
fundamentalist church on a Wednesday? While this is a
legitimate element of US religions, it
does not represent the entire range of religious views – it is
representative of only a portion of the
US population, and not the entire population.
The key to ensuring that sample descriptive statistics can be used as inferential statistics (sample results that can be used to infer the characteristics, AKA parameters, of a population) is to have a random sample of the entire population. A random sample is one where, at the start, everyone in the population has the same chance of being selected. There are numerous ways to design a random sampling process, but these are more of a research class concern than a statistics class issue. For now, we just need to make certain that the samples we use are randomly selected rather than selected with an intent of ensuring desired outcomes are achieved.
One issue with samples that students new to statistics often overlook is that the sample statistic values/outcomes will rarely be exactly the same as the population parameters we are trying to estimate. We will have, for each sample, some sampling error: the difference between the actual population value and the sample result. Researchers feel that this sampling error is generally small enough to use the data to make decisions about the population (Lind, Marchal, & Wathen, 2008).
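Sampling error can be made concrete with a simulation. In this Python sketch the population of 1,000 salaries is entirely hypothetical (uniform between 22 and 77, loosely echoing the ranges seen in Lecture 2); `random.sample` gives every member the same chance of selection, and the difference between the sample mean and the population mean is the sampling error for that one sample.

```python
import random
import statistics

random.seed(42)  # arbitrary seed so the illustration is repeatable

# Hypothetical population of 1,000 salaries (in thousands); NOT the class data.
population = [random.uniform(22, 77) for _ in range(1000)]
mu = statistics.mean(population)  # population parameter

# Simple random sample: every member has the same chance of being chosen.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)   # sample statistic

sampling_error = x_bar - mu
print(round(mu, 2), round(x_bar, 2), round(sampling_error, 2))
```

Rerunning with a different seed gives a different sample and a different (usually small) error, which is why we cannot know the exact error for any single sample.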
While we cannot tell for any given sample exactly what this
difference is, we can
estimate the maximum amount of the error. Later, we will look
at doing this; for now, we just
need to know that this error is incorporated into the statistical
test outcomes that we will be
studying.
Once we have our random sample (and we will assume that our
class equal pay case
sample was selected randomly), we can start with our analysis.
After developing the descriptive
statistics, we start to ask questions about them. In examining a
data set, we need to not only
identify if important differences exist or not but also to identify
reasons differences might exist.
For our equal pay question, it would be legal to pay males and females different salaries if, for example, one gender performed the duties better, had more required education, or had more seniority, etc. Equal pay for equal work, as we are beginning to see, is more complex than a simple single question about salary equality. As we go through the class, we will be able to answer increasingly complex questions. For this week, we will stay with questions involving ways to sort our salary results, looking for differences that might exist.
Some of these questions for this week with our equal pay case
could include:
• Could the means for both males and females be the same, and
the observed difference
be due to sampling error only?
• Could the variances for the males and females be the same (AKA statistically equal)?
• Could salaries per grade be statistically equal?
• Could salaries per degree (undergraduate and graduate) be the
same?
• Etc.
Hypothesis Testing
As we might expect, research and statistics have a set
procedure/process on how to go
about answering these questions. The hypothesis testing
procedure is designed to ensure that
data is analyzed in a consistent and recognized fashion so
everyone can accept the outcome.
Statistical tests focus on differences: is this difference large enough to be significant, that is, not simply a sampling error? If so, we say the difference is statistically significant; if not, the difference is not considered statistically significant. This phrasing is important because, while it is easy to measure a difference from some point, it is much harder to measure “things are different.” It is that pesky sampling error that interferes with assessing differences directly.
Before starting the hypothesis test, we need to have a clear
research question. The
questions above are good examples, as each clearly asks if some
comparison is statistically equal
or not. Once we have a clear question – and a randomly drawn
sample – we can start the
hypothesis testing procedure. The procedure itself has five
steps:
• Step 1: State the null and alternate hypothesis
• Step 2: Form the decision rule
• Step 3: Select the appropriate statistical test
• Step 4: Perform the analysis
• Step 5: Make the decision, and translate the outcome into an
answer to the initial
research question.
Step 1. The null hypothesis is the “testable” claim about the
relationship between the
variables. It always claims that no difference exists in
the populations. For the question
of male and female salary equality, it would be: Ho: Male mean
salary = Female mean salary.
If the evidence leads us to reject this claim, then we would accept
the alternate hypothesis claim: Ha:
Male mean salary =/= (not equal) Female mean salary. (Note:
some alternate ways of phrasing
these exist, and we will cover them shortly. For now, let’s just
go with this format.)
Step 2. This step involves selecting the decision rule for
rejecting the null hypothesis
claim. This will be constant for our class: we will reject the
null hypothesis when the p-value is
equal to or less than 0.05 (this probability is called alpha).
Other common values are .10 and .01;
the more severe the consequences of being wrong if we reject
the null, the smaller the value of
alpha we select. Recall that we defined the p-value last week as
the probability of exceeding a
value; here, that value is the statistical outcome (the test statistic)
from our test.
Step 3. Selecting the appropriate statistical test is the next step.
We start with a question
about mean equality, so we will be using the T-test – the most
appropriate test to determine if
two population means are equal based upon sample results.
Step 4. Performing the analysis comes next. Fortunately for us,
we can do all the
arithmetic involved with Excel. We will go over how to select
and run the appropriate T-test
below.
Step 5. Interpret the test results, making a decision on rejecting
or not rejecting the null
hypothesis, and using this outcome to answer the research
question is the final step. Excel output
tables provide all the information we need to make our decision
in this step.
Step 1: Setting up the hypothesis statements
In setting up a hypothesis test for looking at the male and
female means, there are
actually three questions we could ask and associated hypothesis
statements in step 1.
1. Are male and female mean salaries equal?
a. Ho: Male mean salary = Female mean salary
b. Ha: Male mean salary =/= Female mean salary
2. Is the male mean salary equal to or greater than the Female
mean salary?
a. Ho: Male mean salary => Female mean salary
b. Ha: Male mean salary < Female mean salary
3. Is the male mean salary equal to or less than the female mean
salary?
a. Ho: Male mean salary <= Female mean salary
b. Ha: Male mean salary > Female mean salary
While they appear similar, each answers a different question.
We cannot, for example,
take the first question, determine the means are not equal and
then say that, for example, the
male mean is greater than the female mean because the sample
results show this. Our statistical
test did not test for this condition. If we are interested in a
directional difference, we need to use
a directional set of hypothesis statements as shown in
statements 2 and 3 above.
Rules. There are several rules or guidelines in developing the
hypothesis statements for
any statistical test.
1. The variables must be listed in the same order in both claims.
2. The null hypothesis must always contain the equal (=) sign.
3. The null can contain an equal (=), equal to or less than (<=)
or equal to or greater than
(=>) claim.
4. The null and alternate hypothesis statements must, between
them, account for all
possible comparison outcomes. So, if the null has the
equal (=) claim, the
alternate must contain the not equal (=/= or ≠) statement. If the
null has the equal or
less than (<= or ≤) claim, the alternate must contain the greater
than (>) claim.
Finally, if the null has the equal to or greater than (=> or ≥) claim,
the alternate must contain
the less than (<) claim.
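Rules 1, 2 and 4 amount to a small lookup: each allowed null operator has exactly one partner for the alternate. The Python sketch below is only an illustration and is not part of the course materials; the function name `hypothesis_pair` is hypothetical, and `>=` is written where the lecture uses `=>`.

```python
# Each allowed null operator maps to the only alternate operator that,
# together with it, covers every possible outcome (Rule 4).
COMPLEMENT = {
    "=": "=/=",   # equality null -> two-tail (not equal) alternate
    "<=": ">",    # equal to or less than -> greater than (right tail)
    ">=": "<",    # equal to or greater than -> less than (left tail)
}


def hypothesis_pair(var1, var2, null_op):
    """Return (Ho, Ha) strings with the variables in the same order (Rule 1)."""
    if null_op not in COMPLEMENT:
        # Rule 2: the null must always contain an equal sign.
        raise ValueError("the null must use '=', '<=' or '>='")
    ho = f"Ho: {var1} {null_op} {var2}"
    ha = f"Ha: {var1} {COMPLEMENT[null_op]} {var2}"
    return ho, ha


ho, ha = hypothesis_pair("Male mean salary", "Female mean salary", "<=")
print(ho)  # Ho: Male mean salary <= Female mean salary
print(ha)  # Ha: Male mean salary > Female mean salary
```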
Deciding which pair of statements to use depends on the
research question being asked –
which is why we always start with the question. Look at the
research question being asked: does
it contain words indicating a simple equality (means are equal,
the same, etc.) or inequality (not
equal, different, etc.)? If so, we have the first example: Ho:
variable 1 mean = variable 2 mean, Ha:
variable 1 mean =/= variable 2 mean.
If the research question implies a directional difference (larger,
greater, exceeds,
increased, etc. or smaller, less than, reduced, etc.), then it is
often easier to use the question to
frame the alternate hypothesis and back into the null. For
example, the question “Is the male
mean salary greater than the female mean salary?” would lead to
an alternate of exactly what was
said (Ha: Male mean salary > Female mean salary) and the
opposite null (Ho: Male mean salary <=
Female mean salary).
Step 2: Decision Rule
Once we have our hypothesis statements, we move on to
deciding the level of evidence
that will cause us to reject the null hypothesis. Note that we always
test the null hypothesis, since
that is where our claim of equality lies. Our decision is
either to reject the null or to fail to reject
the null. If we reject it, we are saying that the alternate
hypothesis statement is the more accurate
description of the relationship between the two variable
population means. If we fail to reject it, we have
not proven the null true; we simply lack enough evidence to reject it.
When we perform a statistical test, we are in essence asking whether,
based on the evidence we
have, the difference we observe is large enough to have been
caused by something other than
chance, or whether it is simply due to sampling error.
A statistical test gives us a statistic as a result. We know the
shape of the statistical
distribution for each type of test, so we can easily find
the probability of exceeding this
test value. Remember, we called this the p-value.
Now all we need to decide is what is an acceptable level of
chance – that is, when would
the outcome be so rare that we would not expect to see it purely
by chance sampling error alone?
Most researchers agree that if the p-value is 5% (.05) or less,
then chance is unlikely to be the cause of
the observed difference; something else is probably responsible.
This decision point is called alpha.
Other values of alpha frequently used are 10% (often used in
marketing tests) and 1% (frequently
used in medical studies). The more serious the error of
rejecting the null when we should not have,
the smaller the alpha we choose.
For our analysis, we will use an alpha of .05 for all our tests.
Final Point
You may have noticed that we have two basic types of
hypothesis statements – those
testing equality and those testing directional differences. This
leads to two different types of
statistical tests – the two-tail and the one-tail. In the one-tail
test, the entire value of alpha is
focused on the distribution tail – either the right or left tail
depending upon the phrasing of the
alternate hypothesis. A neat hint: the arrowhead in the
alternate hypothesis shows which tail the
result needs to be in to reject the null.
In the case of the two-tail test (equality), we do not care if one
variable is bigger or
smaller than the other, only that they differ. This means that
the rejection statistic could be in
either tail, the right or the left. Since the rejection region is split into
two areas, we need to split alpha
between them, so each tail holds alpha/2 (e.g., 0.05/2 = 0.025).
A one-tail p-value is therefore compared with alpha/2, while a
two-tail p-value, which already counts both tails, is compared
with the full alpha; both comparisons lead to the same decision.
The example in Lecture 5 will review this in more detail.
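Excel reports these tail probabilities for us, but the link between one-tail and two-tail p-values can be checked directly. The Python sketch below is an illustration under stated assumptions: the t statistic and degrees of freedom are taken as known (2.74 is from the Lecture 5 example, and 48 = 25 + 25 - 2 is assumed from its sample sizes), and the t distribution's tail area is approximated with Simpson's rule instead of a statistics library. All function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0. The curve is symmetric about 0, so this is
    0.5 minus the area from 0 to t, found with Simpson's rule (steps is even)."""
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


def tail_p_values(t_stat, df):
    """Return (one-tail p, two-tail p) for a calculated t statistic."""
    one_tail = t_upper_tail(abs(t_stat), df)  # area beyond |t| in one tail
    return one_tail, 2 * one_tail             # two-tail p counts both tails


alpha = 0.05
one, two = tail_p_values(2.74, 48)
print(round(one, 4), round(two, 4))
print(one < alpha / 2, two < alpha)  # True True: the two comparisons agree
```

Comparing the one-tail p with alpha/2 and the two-tail p with alpha are the same test, because the two-tail p is exactly twice the one-tail p.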
References
Lind, D. A., Marchal, W. G., & Wathen, S. A. (2008).
Statistical Techniques in Business &
Economics (13th ed.). Boston: McGraw-Hill Irwin.
Lecture 5
The T-Test
In the previous lecture, we introduced the hypothesis testing
procedure, and developed
the first two steps of a statistical test to determine if male and
female mean salaries could be
equal in the population – where our differences were caused
simply by sampling errors. This
lecture continues with this example by completing the final
three steps. It also introduces our
first statistical test, the t-test for mean equality.
Last week we looked at the normal curve and noted several of
its characteristics, such as
mean = median = mode, symmetry around the mean, and a curve
height that drops off the further a
score gets from the mean (meaning scores further from the mean
are less likely to occur). Our
first statistical test, the t-test, assumes the population is
normally distributed. The t-test is
used when we do not know the population variance, which
is the situation almost every time we use
a sample to make decisions about its related population.
While the t-test has several different versions, we will focus on
the most commonly used
form: the two-sample test for mean equality assuming equal
variance. When we are testing
measures for mean equality, it is fairly rare for the variances to
be very different, and the
observed difference is often merely sampling error. (In Lecture 6,
we will revisit this assumption.)
The logic of the test is that the difference between mean values
divided by a measure of
this difference’s variation (its standard error) provides a t statistic.
When the null hypothesis is true, this statistic follows a t distribution
centered on 0. The t distribution looks like the standard normal curve
(mean 0, standard deviation 1) but has fatter tails, and it gets closer to
the normal curve as the sample size grows. This outcome
can then be tested to see how
likely it is that we would get a value this large or larger
purely by chance: our old friend
the p-value. If this p-value is less than our decision criterion, alpha,
then we reject the null
hypothesis claim of no difference (Lind, Marchal, & Wathen,
2008).
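The ratio described above can be written out directly. Below is a minimal Python sketch of the equal-variance (pooled) version using only the standard library; the function name and the small salary lists are made up for illustration and are not the class data. Excel's output table shows the same pooled variance in its "Pooled Variance" row.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic assuming equal variances.

    Returns (t, degrees of freedom, pooled variance).
    """
    n1, n2 = len(x), len(y)
    # Pooled variance: the two sample variances weighted by their n - 1.
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, n1 + n2 - 2, pooled_var


males = [10.0, 12.0, 14.0]    # hypothetical salaries (thousands)
females = [8.0, 10.0, 12.0]
print(pooled_t(males, females))
```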
Setting up the t-test
Before selecting any test from Excel, the data needs to be set
up. For the t-test, this takes
a few steps. In our question
about male and female salaries, copy the gender variable
column from the data page to a new
worksheet page (the Week 2 tab is recommended) and
paste it to the right of the
questions (such as in column T). Then copy the salary
values and paste them next to the
gender data. Next, sort both columns by the gender column;
this will give you the salary data
sorted by gender. Then, in column V place the label
Males, and in column W place the
label Females. Now copy the male salaries and paste them
under the Males label, and do the
same for the female salaries and the Females label. The data is
now set up for easy entry into the
t-test data entry section.
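If you want to cross-check the spreadsheet steps, the same split can be sketched in Python. The rows below are hypothetical (gender code, salary in thousands) pairs that use the data set's coding of 0 = male and 1 = female. Filtering on the code does the same job as sorting and then cutting the columns.

```python
# Hypothetical (gender, salary) rows; 0 = male, 1 = female as in the Employee data set.
rows = [(1, 34.0), (0, 58.0), (0, 23.0), (1, 40.0), (0, 77.0), (1, 22.0)]


def split_by_gender(rows):
    """Return (males, females) salary lists, like the Males and Females columns."""
    males = [salary for gender, salary in rows if gender == 0]
    females = [salary for gender, salary in rows if gender == 1]
    return males, females


males, females = split_by_gender(rows)
print(males)    # [58.0, 23.0, 77.0]
print(females)  # [34.0, 40.0, 22.0]
```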
The t-test is found in the Analysis Toolpak that was loaded into
your Excel program last
week. To find it, click on the Data button in the top ribbon,
then on the Data Analysis link in the
Analyze box at the right, then scroll down to the T-test: Two-
Sample Assuming Equal Variances.
For assistance in setting up the t-test, please see the discussion
in the Week 2 Excel Help lecture.
Interpreting the T-test Output
The t-test output contains a lot of information, and not all of it
is needed to interpret the
result. The important elements of the t-test outcome will be
shown with an example for our
research case question.
Equal Pay Example - continued
In Lecture 4 we set up the first couple of steps for our testing of
the research question: Do
males and females receive equal pay for equal work? Our first
examination of the data we have
for answering this question involves determining if the average
salaries are the same.
Here is the completed hypothesis test for the question: Is the
male average salary equal to
the female average salary?
Step 1. Ho: Male mean salary = female mean salary
Ha: Male mean salary ≠ female mean salary
Step 2. Reject the null if the p-value is < (less than) alpha =
.05.
Step 3. The selected test is the Two-Sample T-test assuming
equal variances.
Step 4. The test results are below. The screen shot shows
the output table.
Step 5. Interpretation and conclusions.
The first step is to ensure we have all of the correct data. We
see that we have 25 males
and 25 females in the Observations row, and that the respective
means are equal to what we earlier
calculated.
The calculated t statistic is 2.74 (rounded). We have two ways
to determine if our result
rejects or fails to reject the null hypothesis; both involve the
two-tail rows, as we have a two-tail
test (equal or not equal hypothesis statements). The first is a
comparison of the t-values: if the
absolute value of the calculated t, 2.74 (rounded), is greater than the
t Critical two-tail value of 2.01, we reject the null
hypothesis. The second way is to compare the p-value with our
criterion of alpha = .05. The two-tail p-value, shown as
P(T<=t) two-tail, is 0.0085. Since it already accounts for both
tails, we compare it with the full alpha of .05 (equivalently, the
one-tail p-value could be compared with alpha/2 = .025). Because
0.0085 is less than .05, we reject the null hypothesis. Note: at
times Excel will report the p-value in an E format,
such as 3.45E-04. This is called an exponent (scientific) format, and is the
same as 3.45 × 10^-4. This
means move the decimal point 4 places to the left, making
3.45E-04 = 0.000345. Any
p-value reported as E-03 or a smaller exponent (E-04, E-05, and so on)
is less than our alpha of 0.05 (which would be 5E-02); for E-02 values,
check the leading digits (for example, 6E-02 = 0.06 exceeds alpha).
Since we rejected the null hypothesis in both approaches (and
both will always provide
the same outcome), we can answer our question with: No - the
male and female mean salaries are
not equal.
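The two decision approaches, and the reading of Excel's E format, can be sketched in a few lines of Python. This assumes the Excel output values are copied in by hand. `two_tail_decision` is a hypothetical helper, not an Excel function. Python's built-in `float()` reads strings such as `"3.45E-04"` directly.

```python
def two_tail_decision(t_stat, t_crit_two_tail, p_two_tail, alpha=0.05):
    """Return (reject_by_t, reject_by_p) from Excel t-test output values.

    p_two_tail may be a number or a string copied from Excel, including
    E-format values such as "3.45E-04".
    """
    reject_by_t = abs(t_stat) > t_crit_two_tail   # calculated t vs t Critical two-tail
    reject_by_p = float(p_two_tail) < alpha        # two-tail p already covers both tails
    return reject_by_t, reject_by_p


print(float("3.45E-04"))                        # 0.000345
print(float("5E-02"))                           # 0.05
print(two_tail_decision(2.74, 2.01, "0.0085"))  # (True, True)
```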
Note that for this set of data, we would have rejected the null
for a one-tail test if and
only if the null hypothesis had been: Male mean salary is <=
Female mean salary and the
alternate was Male mean salary is > Female mean salary. The
arrow in the alternate points to the
positive/right tail and that is where the calculated t-statistic is.
So, even if the p-value is smaller
than alpha in a one tail test, we need to ensure the t-statistic is
in the correct tail for rejection.
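The tail check in the last sentence can be made explicit. Below is a minimal sketch that assumes the one-tail p-value is read from Excel's P(T<=t) one-tail row. The 0.0043 used below is an assumed value of roughly half the reported two-tail 0.0085, not a figure from the output table, and the helper name is hypothetical.

```python
def one_tail_reject(t_stat, p_one_tail, alternate, alpha=0.05):
    """Reject Ho only if p < alpha AND t lies in the tail Ha's arrow points to."""
    if alternate == ">":
        correct_tail = t_stat > 0      # right (positive) tail
    elif alternate == "<":
        correct_tail = t_stat < 0      # left (negative) tail
    else:
        raise ValueError("alternate must be '>' or '<'")
    return correct_tail and p_one_tail < alpha


print(one_tail_reject(2.74, 0.0043, ">"))  # True
print(one_tail_reject(2.74, 0.0043, "<"))  # False: t is in the wrong tail
```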
References
Lind, D. A., Marchal, W. G., & Wathen, S. A. (2008).
Statistical Techniques in Business &
Economics (13th ed.). Boston: McGraw-Hill Irwin.
Lecture 6
(Additional information on t-tests and hypothesis testing)
Lecture 5 focused on perhaps the most common of the t-tests,
the two sample assuming
equal variance. There are other versions as well; Excel lists
two others, the two sample assuming
unequal variance and the paired t-test. We will end with some
comments about rejecting the null
hypothesis.
Choosing between the t-test options
As the names imply, each of the three forms of the t-test deals
with different types of data
sets. The simplest distinction is between the equal and unequal
variance tests. Both require that
the data be at least interval in nature, come from a normally
distributed population, and be
independent of each other – that is, collected from different
subjects.
The F-test for variance.
To determine if the population variances of two groups are
statistically equal – in order to
correctly choose the equal variance version of the t-test – we
use the F statistic, which is
calculated by dividing one variance by the other variance. If
the outcome is less than 1.0, the
rejection region is in the left tail; if the value is greater than
1.0, the rejection region is in the
right tail. In either case, Excel provides the information we
need.
To perform a hypothesis test for variance equality we use
Excel’s F-Test Two-Sample for
Variances found in the Data Analysis section under the Data
tab. The test set-up is very similar
to that of the t-test, entering data ranges, checking Labels box if
they are included in the data
ranges, and identifying the start of the output range. The only
unique element in this test is the
identification of our alpha level.
Since we are testing for equality of variances, we have a two-tail
test, and the rejection
region is again in both tails. This means that our rejection
region in each tail is 0.025. The F-test
identifies the p-value for the tail the result is in, but gives
only this one-tail value, not a two-tail
value. So, compare the calculated p-value
against .025 to make the rejection
decision. If the p-value is greater than this, we fail to reject the
null; if smaller, we reject the null
of equal variances.
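The F statistic and its tail can be reproduced in a few lines of Python. Below is a minimal sketch, assuming (as in Excel's F-Test Two-Sample for Variances) that F is the first range's sample variance divided by the second's. The helper names and the small data lists are hypothetical. The decision helper encodes the rule above: the one-tail p-value is compared with alpha/2.

```python
from statistics import variance


def f_ratio(first, second):
    """F = variance(first) / variance(second), plus the tail it falls in."""
    f = variance(first) / variance(second)
    return f, ("left" if f < 1 else "right")


def reject_equal_variances(p_one_tail, alpha=0.05):
    """Two-tail question with a one-tail p-value: compare it against alpha / 2."""
    return p_one_tail < alpha / 2


print(f_ratio([10, 12, 14], [8, 12, 16]))  # (0.25, 'left')
print(f_ratio([8, 12, 16], [10, 12, 14]))  # (4.0, 'right')
print(reject_equal_variances(0.44))        # False, as in the Excel Example that follows
```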
Excel Example. To test for equality between the male and
female salary variances in the
population, we set up the following hypothesis test.
Research question: Are the male and female population
variances for salary equal?
Step 1: Ho: Male salary variance = Female salary variance
Ha: Male salary variance ≠ Female salary variance
Step 2: Reject Ho if p-value is less than Alpha = 0.025 for one
tail.
Step 3: Selected test is the F-test for variance
Step 4: Conduct the test
Step 5: Conclusion and interpretation. The test resulted in an
F-value less than 1.0, so the
statistic is in the left tail. Had we put Females as the first
variable, we would have gotten a right-tail
F-value greater than 1.0. This has no bearing on the
decision. The F value is larger than the
critical F (which is the value for a one-tail probability of 0.025,
as that was entered for the alpha
value).
So, since our p-value (.44 rounded) is > .025 and/or our F (0.94
rounded) is greater than
our F Critical, we fail to reject the null hypothesis of no
differences in variance. The correct t-
test would be the two-sample T-test assuming equal variances.
Other T-tests.
We mentioned that Excel has three versions of the t-test. The
equal and unequal variance
versions are set up in the same way and produce very similar
output tables. The only difference
is that the equal variance version provides an estimate of the
common variation called pooled
variance while this row is missing in the unequal variance
version.
A third form of the t-test is the T-Test: Paired Two Sample for
Means. A key
requirement for the other versions of the t-test is that the data
are independent – that means the
data are collected on different groups. In the paired t-test, we
generally collect two measures on
each subject. An example of paired data would be a pre- and
post-test given to students in a
statistics class. Another example, using our class case study,
would be comparing the salary and
midpoint for each employee: both are measured in dollars and
taken from each person. An
example of NON-paired data would be the grades of males and
females at the end of a statistics
class. The paired t-test is set up in the same way as the other
two versions. It provides the
correlation (a measure of how closely one variable changes
when another does – to be covered
later in the class) coefficient as part of its output.
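Because each subject supplies both measures, the paired test works on the per-subject differences. Below is a minimal Python sketch of that idea, which is a one-sample t statistic computed on the differences. The salary and midpoint values (in thousands) and the function name are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for matched measurements."""
    if len(first) != len(second):
        raise ValueError("each subject needs exactly one value in each column")
    diffs = [a - b for a, b in zip(first, second)]    # one difference per subject
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))   # mean difference / its standard error
    return t, n - 1


# Hypothetical salary and midpoint pairs for four employees (thousands).
salary = [24.0, 59.0, 35.0, 77.0]
midpoint = [23.0, 57.0, 33.0, 74.0]
print(paired_t(salary, midpoint))
```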
An Excel Trick. You may have noticed that all of the Excel t-
tests are for two samples,
yet at times we might want to perform a one-sample test, for
example quality control might want
to test a sample against a quality standard to see if things have
changed or not. Excel does not
expressly allow this. BUT, we can do a one-sample test using
Excel.
The reason is a bit technical, but boils down to the fact that the
two-sample unequal
variance formula will reduce to the one-sample formula when
one of the variables has a variance
equal to 0. So using the unequal variance t-test, we enter the
variable we are interested in, such as
salary, as variable one and the hypothesized value we are
testing against, such as 45 for our
case, as variable two, repeating that value so that we have the
same number of values in each column.
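The trick can be verified numerically. The Python sketch below computes the unequal-variance (Welch) t statistic and its Welch-Satterthwaite degrees of freedom with a constant second column, then compares the result with the textbook one-sample formula. The two agree because the constant column's variance is 0. The salary values and function names are hypothetical, and the df formula is the standard Welch-Satterthwaite one. It is not taken from the lecture.

```python
import math
from statistics import mean, stdev, variance


def welch_t(x, y):
    """Unequal-variance (Welch) two-sample t statistic and its degrees of freedom."""
    vx = variance(x) / len(x)    # squared standard error of mean(x)
    vy = variance(y) / len(y)    # squared standard error of mean(y); 0 for a constant column
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def one_sample_t(x, mu0):
    """Textbook one-sample t statistic and degrees of freedom (n - 1)."""
    return (mean(x) - mu0) / (stdev(x) / math.sqrt(len(x))), len(x) - 1


salaries = [34.0, 40.0, 22.0, 47.0, 51.0, 38.0]   # hypothetical female salaries (thousands)
constant = [45.0] * len(salaries)                 # the hypothesized value, repeated
print(welch_t(salaries, constant))
print(one_sample_t(salaries, 45.0))
```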
Here is an example of this outcome.
Research question: Is the female population salary mean = 45?
Step 1: Ho: Female salary mean = 45
Ha: Female salary mean ≠ 45
Step 2: Reject the null hypothesis if the p-value is less than Alpha = 0.05
Step 3: Selected test is the two sample unequal variance t-test
Step 4: Conduct the test
Step 5: Conclusions and Interpretation. Since the two tail p-
value is greater than (>) .025
and/or the absolute value of the t-statistic is less than the
critical two tail t value, we fail to reject
the null hypothesis. Our research question answer is that, based
upon this sample, the overall
female salary average could equal 45.
Miscellaneous Issues on Hypothesis Testing
Errors. Because statistical tests are based on probabilities, there is a
possibility that we could
make the wrong decision in either rejecting or failing to reject
the null hypothesis. Rejecting the
null hypothesis when it is true is called a Type I error.
Accepting (failing to reject) the null when
it is false is called a Type II error.
With alpha held fixed, increasing the sample size we work with
mainly reduces the chance of a Type II error. A Type
I error is generally considered the more severe of the two
(imagine saying a new medicine works
when it does not), and is managed by the selection of our alpha
value: the smaller the alpha, the
harder it is to reject the null hypothesis (or, put another way,
the more evidence is needed to
convince us to reject the null). Managing the Type II error
probability is slightly more
complicated and is dealt with in more advanced statistics classes.
Choosing an alpha of .05 for
most test situations has been found to provide a good balance
between these two errors.
Reason for Rejection. While we are not spending time on the
formulas behind our
statistical outcomes, there is one general issue with virtually all
statistical tests. A larger sample
size makes it easier to reject the null hypothesis. An outcome
that is not statistically significant
with a sample size of 25 could very easily be found
significant with a sample size of, for
example, 25,000. This is one reason to be cautious of very
large sample studies: far from
meaning the results are better, it could mean the rejection of the
null was due to the sample size
and not the variables that were being tested.
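A quick way to see the sample-size effect is to hold the means and spread fixed and change only n. The Python sketch below uses the pooled t formula for two equal-sized groups with the same standard deviation. The 1 (thousand) mean difference and standard deviation of 10 are hypothetical numbers chosen only to illustrate the point.

```python
import math


def equal_groups_t(mean_diff, sd, n_per_group):
    """Pooled t for two equal-sized groups that share the same standard deviation."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return mean_diff / standard_error


# Hypothetical: a 1 (thousand) difference in means, standard deviation of 10 in both groups.
for n in (25, 25_000):
    print(n, round(equal_groups_t(1.0, 10.0, n), 2))
```

The same small difference gives a t of about 0.35 with 25 per group but about 11.18 with 25,000 per group, far beyond any usual critical value.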
The effect size measure helps us investigate the cause of
rejecting the null. The name is
somewhat misleading to those just learning about it; it does
NOT mean the size of the difference
being tested. The significance of that difference is tested with
our statistical test. What it does
measure is the effect the variables had on the rejection (that is,
is the outcome practically
significant and one we should make decisions using) versus the
impact of the sample size on the
rejection (meaning the result is not particularly meaningful in
the real world).
For the two-sample t-test, either equal or unequal variance, the
effect size is measured by
Cohen’s D. Unfortunately, Excel does not provide this
calculation automatically; however, it
is fairly easy to generate.
Cohen’s D = (absolute value of the difference between the
means)/the standard deviation of both
samples combined.
Note: the total standard deviation is not given in the t-test
outputs, and is not the same as the
square root of the pooled variance estimate. To get this value,
use the fx function stdev.s on the
entire data set – both samples at the same time.
Interpreting the effect size outcome is fairly simple. Effect
sizes are generally between 0
and 1. A large effect (a value around .8 or larger) means the
variables and their interactions
caused the rejection of the null, and the result has a lot of
practical significance for decision
making. A small effect (a value around .2 or less) means the
sample size was more responsible
for the rejection decision than the variable outcomes.
Medium effects (values around .5) are
harder to interpret and would suggest additional study (Tanner
& Youssef-Morgan, 2013).
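Below is a minimal Python sketch of the effect size as this lecture defines it: the absolute mean difference divided by STDEV.S of both samples combined into one column. Many textbooks instead divide by the pooled standard deviation, which gives a somewhat different value. The labelling rule used here (nearest of the .2, .5 and .8 guide points) is an assumption for illustration, as are the function names and the data.

```python
from statistics import mean, stdev


def cohens_d_lecture(x, y):
    """|mean(x) - mean(y)| / sample standard deviation of x and y combined."""
    return abs(mean(x) - mean(y)) / stdev(list(x) + list(y))


def effect_label(d):
    """Label d by the nearest of the lecture's guide points (.2, .5, .8)."""
    guides = {"small": 0.2, "medium": 0.5, "large": 0.8}
    return min(guides, key=lambda name: abs(d - guides[name]))


males = [10.0, 12.0, 14.0]    # hypothetical salaries (thousands)
females = [8.0, 10.0, 12.0]
d = cohens_d_lecture(males, females)
print(round(d, 2), effect_label(d))  # 0.95 large
```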
References
Lind, D. A., Marchal, W. G., & Wathen, S. A. (2008).
Statistical Techniques in Business &
Economics (13th ed.). Boston: McGraw-Hill Irwin.
Tanner, D. E., & Youssef-Morgan, C. M. (2013). Statistics for
Managers. San Diego, CA:
Bridgepoint Education.
Problem Set Week Two
The assignment for this week involves developing an
understanding of the problem and the data that we will be
analyzing during the class. We will be using a data set of 50
employees sampled from an imaginary company to answer the
question of whether males and females receive equal pay for
performing equal work.
The questions in the assignment follow the examples provided
in the weekly guidance lectures.
The first question this week focuses on the kind of data we
have. Different levels of data allow us to do different kinds of
analysis, so we need to understand what we have to work with.
Question two involves developing the probability of randomly
picking an employee who has certain characteristics from the
sample.
Question three involves finding the probability of randomly
picked employees falling within the top one-third of different
groups using Excel functions. Questions four and five involve
using statistical tests on the compa-ratio (an
alternate measure of pay).
The final question asks for your opinion on
the question of equal pay for equal work, based on the work
done this week.
Both the assignment file and the data file are located in
the Course Materials section at the bottom in the Multi-
Media section. The assignment file contains all of the weekly
assignments (for Weeks 2, 3, and 4). See the labeled tabs at the
bottom of the Excel assignment file. The data in the data file
needs to be copied over into the assignment file, and you will be
set for the entire class. *Ask questions if it is not clear how to
move the data from one file to the other.
Multiple Testing / ANOVA / Effect Size
Although the initial post is due on Day 5, you are encouraged to
start working on it early, as it is a three-part discussion that
should be completed in sequential order.
Part One – Multiple Testing
Read Lecture Seven. The lectures from last week and Lecture
Seven discuss issues around using a single test versus multiple
uses of the same tests to answer questions about mean equality
between groups. This suggests that we need to master—or at
least understand—a number of statistical tests. Why can’t we
just master a single statistical test—such as the t-test—and use
it in situations calling for mean equality decisions? (This should
be started on Day 1.)
Part Two – ANOVA
Read Lecture Eight. Lecture Eight provides an ANOVA test
showing that the mean salary for each job grade significantly
differed. It then shows a technique to allow us to determine
which pair or pairs of means actually differ. What other factors
would you be interested in knowing if means differed by grade
level? Why? Can you provide an ANOVA table showing these
results? (Do not bother with which means differ.) How does this
help answer our research question of equal pay for equal work?
What kinds of results in your personal or professional lives
could use the ANOVA test? Why? (This should be started on
Day 3.)
Part Three – Effect Size
Read Lecture Nine. Lecture Nine introduces you to the effect size
measure. There are two reasons we reject a null hypothesis.
One is that the interaction of the variables causes significant
differences to occur, which is our typical understanding of a rejected
null hypothesis. The other is having a large sample size:
virtually any difference can be made to appear significant if the
sample is large enough. What is the effect size measure? How
does it help us decide what caused us to reject the null hypothesis?
(This should be completed by Day 5.)
Week 3 Lecture 7
We have so far seen how we can summarize data sets using
descriptive statistics,
showing several characteristics including mean and standard
deviation. We also found that if our
data comes from a random sample of a larger population, these
descriptive statistics become
inferential statistics, and can be used to make inferences about
the population. These inferences
can then be used in statistical tests to see if things have changed
or not (equal to known standards
or other data sets or not).
We have looked at one and two sample mean tests (with the t-
test) and two sample
comparisons of variance equality (with the F test). This week
we will look at the Analysis of
Variance (ANOVA) test for mean equality between three or
more groups.
ANOVA
The first question often asked is why not just do multiple t-tests
comparing three or more
different group means? One answer involves efficiency.
Conducting multiple t-tests can
become somewhat tedious. Comparing just three groups (A, B,
and C) requires us to compare A
and B, B and C, and A and C (3 tests). With 4 groups (A, B, C,
and D) we have A and B, A and
C, A and D, B and C, B and D, C and D (6 tests)! So a single
test can save us a lot of time and is
much more efficient.
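The count of pairwise tests grows as k(k - 1)/2 for k groups, which is the number of ways to choose 2 groups from k. Below is a short Python check using the standard library's `math.comb`, offered only as an illustration of the counting above.

```python
from math import comb

# Number of distinct pairs (two-sample t-tests) needed to compare k group means.
for k in (3, 4, 5):
    print(k, "groups:", comb(k, 2), "t-tests")
```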
A second, and much more important, reason is that we
lose confidence in our results
when multiple tests are performed on the same data. With an
alpha of 0.05, we are 95% certain
we are right with each test, but being certain we are right for all
the tests involves multiplying the
results together (treating the tests as independent), so for three
tests we would be .95*.95*.95, or about
86% certain; with six tests, our
confidence drops to .95^6 = .74, a long way from our desired
95% confidence. So, a single test
maintains our desired level of confidence in the outcome (Lind,
Marchal, & Wathen, 2008).
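The multiplication above can be reproduced in Python. Like the lecture's arithmetic, this sketch treats the tests as independent and reads "confidence" as the chance that none of the tests makes a Type I error when every null is true. The function name is hypothetical.

```python
def all_correct_confidence(num_tests, alpha=0.05):
    """Chance that none of num_tests independent tests makes a Type I error."""
    return (1 - alpha) ** num_tests


for m in (1, 3, 6):
    print(m, round(all_correct_confidence(m), 3))
```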
Logic
A second question comes from the name itself: how can
analyzing variance tell us
anything about mean differences? The answer lies in how
ANOVA works. The key
assumptions for an ANOVA analysis are that each of the groups
is normally distributed AND
has equal variance. This means that the distributions have the
same shape, which allows for
an easy comparison. Take a look at the following two sets of
normal curves.
Exhibit A
Exhibit B
The means of the three sample groups in Exhibit A could clearly
come from three
populations that have the same mean, and the differences seen
are merely sampling errors.
However, we cannot say the same thing about the sample groups
in Exhibit B.
ANOVA takes the variation of all of the data in the groups
being tested (three in this
case) and compares it with the average variation within each of the
groups, using the F-test
(discussed last week). For the Exhibit A groups, the
overall variation will be only slightly
larger than the average variation of the three groups (which are
assumed to be equal). Since the resulting F value
will not be statistically significant, we can say that the groups
are closely distributed and the
means are statistically equal.
In Exhibit B, however, the variation of the entire group would
be around three times the
average variation of the individual groups. Just by looking at the average
variance for the individual groups and
comparing it to the variance for the entire group, we can make a
judgement on how close the
distributions are, and with that a judgement on mean equality.
However, as with the t-test, we cannot just “eyeball” it; ANOVA
will let us know exactly how much difference in the
population locations is enough to say the means differ or not.
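The comparison described above is usually computed as a ratio of two mean squares. The Python sketch below uses the standard one-way ANOVA formulas: the between-group mean square divided by the within-group mean square, which is how textbooks formalize the idea. The function name and the two sets of made-up groups are assumptions for illustration. One set overlaps like Exhibit A and the other is spread out like Exhibit B. In practice, the F value would be compared with the F critical value in Excel's output.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, (df_between, df_within)) for a one-way ANOVA."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)


close_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]       # overlapping, like Exhibit A
spread_groups = [[1, 2, 3], [6, 7, 8], [11, 12, 13]]   # far apart, like Exhibit B
print(one_way_anova_f(close_groups))   # (3.0, (2, 6))
print(one_way_anova_f(spread_groups))  # (75.0, (2, 6))
```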
Hypothesis
Stating the null and alternate hypothesis for an ANOVA test is
simple, as they are always
the same:
Ho: All means equal.
Ha: At least one mean differs (Tanner & Youssef-Morgan,
2013).
You might recall from last week that we said the alternate
always states the opposite of
the null statement. If so, why isn’t our alternate “all means
differ,” which seems like the opposite?
The reason is that the ANOVA test will reject the null
hypothesis if even one mean from the
groups being examined differs significantly. So,
the actual opposite of “all means are equal”
is “at least one mean differs.”
Data Set-up
Setting up the data for an ANOVA analysis is just a bit more complicated than for a t-test. With the t-test, we simply highlighted a column or part of a column of data (sometimes after sorting it by a variable such as gender); for an ANOVA test, we need to create a table. For example, if we wanted to look at average salaries per grade (shown in the Week 3 Lecture 8 example), we would need a table with a separate column of salaries for each grade, looking like this.
Doing this is fairly simple. Copy the grade and salary columns (separately) and paste them onto a new Excel sheet (probably the Week 3 sheet, to the right of the questions). Then highlight both columns, from the labels to the last value, and select Data > Sort. Choose to sort on the grade variable and click OK. Both columns are now in grade order, so you can highlight and cut the salaries for each grade and paste them into a new table you create, with the grade letter as each column's heading. When finished, you will have the input table used in setting up an Excel ANOVA test.
References
Lind, D. A., Marchal, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Economics (13th ed.). Boston: McGraw-Hill Irwin.
Tanner, D. E., & Youssef-Morgan, C. M. (2013). Statistics for Managers. San Diego, CA: Bridgepoint Education.
Week 3 Lecture 8
Excel ANOVA Example
In our ongoing investigation of whether males and females are paid equally for equal work, we have so far come up with contradictory results: average salaries are clearly different, but average compa-ratios are not. We need to examine factors that might explain these differences to see what is going on. For each possible factor influencing individual salaries, we need to be able, to paraphrase TV cop shows, to “rule it out as a suspect” in causing the differences or to keep it in as a possible cause of the differences in pay between genders.
One key issue in our question that has not yet been clearly examined is the impact of grades on salaries. Grade differences have the potential to complicate the issue, as the work done differs by grade. One question to ask here is, “are average salaries equal across grade
Excel Files AssingmentsCopy of Student_Assignment_File.11.01..docx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 

Excel Files AssingmentsCopy of Student_Assignment_File.11.01..docx

Excel Files Assingments/Copy of Student_Assignment_File.11.01.2016.xlsx

Data tab columns: ID, Salary, Compa-ratio, Midpoint, Age, Performance Rating, Service, Gender, Raise, Degree, Gender1, Grade

Copy the Employee Data set to this page.

The ongoing question that the weekly assignments will focus on is: Are males and females paid the same for equal work (under the Equal Pay Act)? Note: to simplify the analysis, we will assume that jobs within each grade comprise equal work.

The column labels in the table mean:
• ID – Employee sample number
• Salary – Salary in thousands
• Age – Age in years
• Performance Rating – Appraisal rating (employee evaluation score)
• Service – Years of service
• Gender – 0 = male, 1 = female
• Midpoint – Salary grade midpoint
• Raise – Percent of last raise
• Grade – Job/pay grade
• Degree – 0 = BSBA, 1 = MS
• Gender1 – Male or Female
• Compa-ratio – Salary divided by midpoint

Week 2
This assignment covers the material presented in weeks 1 and 2. Six questions.

Before starting this assignment, make sure the assignment data from the Employee Salary Data Set file is copied over to this Assignment file. You can do this either by copying and pasting all the columns or by opening the data file, right-clicking on the Data tab, selecting Move or Copy, and copying the entire sheet to this file (Weekly Assignment Sheet or whatever you are calling your master assignment file). It is highly recommended that you copy the data columns (with labels) and paste them to the right so that whatever you do will not disrupt the original data values and relationships.

To ensure full credit for each question, you need to show how you got your results. For example, Question 1 asks for several data values. If you obtain them using Descriptive Statistics, then the cells should have an "=XX" formula in them, where XX is the column and row of the value in the Descriptive Statistics table. If you choose to generate each value using fx functions, then each function should be located in the cell and the location of the data values should be shown. So cell D31, as an example, should contain something like "=T6" or "=average(T2:T26)". Having only a numerical value will not earn full credit. The reason for this is to allow instructors to provide feedback on Excel tools if the answers are not correct: we need to see how the results were obtained.

In starting the analysis on a research question, we focus on overall descriptive statistics and on seeing whether differences exist. Probing into reasons and mitigating factors is a follow-up activity.

1. The first step in analyzing data sets is to find some summary descriptive statistics for key variables. Since the assignment problems will focus mostly on the compa-ratios, we need to find the mean, standard deviation, and range for our groups: Males, Females, and Overall. Sorting the compa-ratios into males and females will require you to copy and paste the Compa-ratio and Gender1 columns, and then sort on Gender1. The values for age, performance rating, and service are provided for you for future use and, if desired, to test your approach to the compa-ratio answers (see if you can replicate the values).

You can use either the Data Analysis Descriptive Statistics tool or the fx =average and =stdev functions. The range can be found as the difference between the =max and =min functions, or from Descriptive Statistics.

Suggestion: Copy and paste the compa-ratio data to the right (column T) and the gender data into column U. If you use Descriptive Statistics, place the output table in row 1 of a column to the right. If you did not use Descriptive Statistics, make sure your cells show the location of the data (example: =average(T2:T51)).

| Group | Statistic | Compa-ratio | Age | Perf. Rat. | Service |
|---|---|---|---|---|---|
| Overall | Mean | | 35.7 | 85.9 | 9.0 |
| Overall | Standard Deviation | | 8.2513 | 11.4147 | 5.7177 |
| Overall | Range | | 30 | 45 | 21 |
| Female | Mean | | 32.5 | 84.2 | 7.9 |
| Female | Standard Deviation | | 6.9 | 13.6 | 4.9 |
| Female | Range | | 26.0 | 45.0 | 18.0 |
| Male | Mean | | 38.9 | 87.6 | 10.0 |
| Male | Standard Deviation | | 8.4 | 8.7 | 6.4 |
| Male | Range | | 28.0 | 30.0 | 21.0 |

Note: remember the data is a sample from the larger company population.

A key issue in comparing data sets is to see whether they are distributed (shaped) the same. At this point we can do this by looking at the probabilities that males and females are distributed in the same way across the grade levels.

2. Empirical Probability: What is the probability for a:
a. Randomly selected person being in grade E or above?
b. Randomly selected person being a male in grade E or above?
c. Randomly selected male being in grade E or above?
d. Why are the results different?

3. Normal Curve based probability: For each group (overall, females, males), what are the values for each question below? Make sure your answer cells show the Excel function and cell location of the data used.

A. The probability of being in the top 1/3 of the compa-ratio distribution.
Note: we can find the cutoff value for the top 1/3 using the fx Large function: =large(range, value). Value is the number that identifies the x-largest value. For the top 1/3, this is the value that starts the top 1/3 of the range. For the overall group, this would be the 50/3 or 17th (rounded) value; for the gender groups, it would be the 25/3 = 8th (rounded) value.

All of the functions below are in the fx statistical list.

| Question | Function | Overall | Female | Male |
|---|---|---|---|---|
| i. How many salaries are in the top 1/3 (rounded to nearest whole number) for each group? | "=ROUND" (found in Math or All list) | | | |
| ii. What compa-ratio value starts the top 1/3 of the range for each group? | "=LARGE" | | | |
| iii. What is the z-score for this value? | Excel's STANDARDIZE function | | | |
| iv. What is the normal curve probability of exceeding this score? | "=1-NORM.S.DIST" | | | |

B. How do you interpret the relationship between the data sets?
What does this suggest about our equal pay for equal work question?

4. Based on our sample data set, can the male and female compa-ratios in the population be equal to each other?

A. First, we need to determine whether these two groups have equal variances, in order to decide which t-test to use.
What is the data input range used for this question:
Step 1: Ho:
Ha:
Step 2: Decision Rule:
Step 3: Statistical test:
Why?
Step 4: Conduct the test – place cell B77 in the output location box.
Step 5: Conclusion and Interpretation
What is the p-value:
Is the P-value < 0.05 (for a one tail test) or 0.025 (for a two tail test)?
What is your decision: REJ or NOT reject the null?
What does this result say about our question of variance equality?

B. Are male and female average compa-ratios equal? (Regardless of the outcome of the above F-test, assume equal variances for this test.)
What is the data input range used for this question:
Step 1: Ho:
Ha:
Step 2: Decision Rule:
Step 3: Statistical test:
Why?
Step 4: Conduct the test – place cell B109 in the output location box.
Step 5: Conclusion and Interpretation
What is the p-value:
Is the P-value < 0.05 (for a one tail test) or 0.025 (for a two tail test)?
What is your decision: REJ or NOT reject the null?
What does your decision on rejecting the null hypothesis mean?
If the null hypothesis was rejected, calculate the effect size value:
If the effect size was calculated, what does the result mean in terms of why the null hypothesis was rejected?
What does the result of this test tell us about our question on salary equality?

5. Is the female average compa-ratio equal to or less than the midpoint value of 1.00? This question is the same as: Does the company pay its females, on average, at or below the grade midpoint (which is considered the market rate)?
Suggestion: Use the data column T to the right for your null hypothesis value.
What is the data input range used for this question:
Step 1: Ho:
Ha:
Step 2: Decision Rule:
Step 3: Statistical test:
Why?
Step 4: Conduct the test – place cell B162 in the output location box.
Step 5: Conclusion and Interpretation
What is the p-value:
Is the P-value < 0.05 (for a one tail test) or 0.025 (for a two tail test)?
What, besides the p-value, needs to be considered with a one tail test?
Decision: Reject or do not reject Ho?
What does your decision on rejecting the null hypothesis mean?
If the null hypothesis was rejected, calculate the effect size value:
If the effect size was calculated, what does the result mean in terms of why the null hypothesis was rejected?
What does the result of this test tell us about our question on salary equality?

6. Considering both the salary information in the lectures and your compa-ratio information, what conclusions can you reach about equal pay for equal work? Why – what statistical results support this conclusion?

Week 3
ANOVA – Three questions
Remember to show how you got your results in the appropriate cells. For questions using functions, show the input range when asked.

1. One interesting question is whether the average compa-ratios are equal across salary ranges of 10K each. While compa-ratios remove the impact of grade on salaries, are they different for different pay levels; that is, are people at different levels paid differently relative to the midpoint? (Put data values at right.)

| Group name | G1 | G2 | G3 | G4 | G5 | G6 |
|---|---|---|---|---|---|---|
| Salary intervals | 22-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70-79 |
| Compa-ratio values | | | | | | |

What is the data input range used for this question:
Step 1: Ho:
Ha:
Step 2: Decision Rule:
Step 3: Statistical test:
Why?
Step 4: Conduct the test – place cell B16 in the output location box.
Step 5: Conclusions and Interpretation
What is the p-value?
Is P-value < 0.05?
What is your decision: REJ or NOT reject the null?
If the null hypothesis was rejected, what is the effect size value (eta squared)?
If calculated, what does the effect size value tell us about why the null hypothesis was rejected?
What does that decision mean in terms of our equal pay question?

2. If the null hypothesis in question 1 was rejected, which pairs of means differ? Why?

| Groups Compared | Diff | T +/- Term | Low to High | Difference Significant? | Why? |
|---|---|---|---|---|---|
| G1 G2 | | | | | |
| G1 G3 | | | | | |
| G1 G4 | | | | | |
| G1 G5 | | | | | |
| G1 G6 | | | | | |
| G2 G3 | | | | | |
| G2 G4 | | | | | |
| G2 G5 | | | | | |
| G2 G6 | | | | | |
| G3 G4 | | | | | |
| G3 G5 | | | | | |
| G3 G6 | | | | | |
| G4 G5 | | | | | |
| G4 G6 | | | | | |
| G5 G6 | | | | | |

3. Since compa is already a measure of pay for equal work, do these results impact your conclusion on equal pay for equal work? Why or why not?

Week 4
Regression and Correlation – Five questions
Variables: Compa-ratio, Midpoint, Age, Performance Rating, Service, Raise, Degree, Gender
Remember to show how you got your results in the appropriate cells.
For questions using functions, show the input range when asked.

1. Create a correlation table using Compa-ratio and the other interval level variables, except for Salary.
Suggestion: place data in columns T–Y.
What range was placed in the Correlation input range box:
Place C9 in output box.
b. What are the statistically significant correlations related to Compa-ratio?
T =
Significant r =
c. Are there any surprises – correlations you thought would be significant and are not, or non-significant correlations you thought would be?
d. Why does, or does not, this information help answer our equal pay question?

2. Perform a regression analysis using compa as the dependent variable and the variables used in Q1 along with the dummy variables. Show the result, and interpret your findings by answering the following questions.
Suggestion: Place the dummy variable values to the right of column Y.
What range was placed in the Regression input range box:
Note: be sure to include the appropriate hypothesis statements.
Regression hypotheses
Ho:
Ha:
Coefficient hypotheses (one to stand for all the separate variables)
Ho:
Ha:
Place B36 in output box.

Interpretation:
For the Regression as a whole:
What is the value of the F statistic:
What is the p-value associated with this value:
Is the p-value < 0.05?
What is your decision: REJ or NOT reject the null?
What does this decision mean?

For each of the coefficients:

| | Midpoint | Age | Perf. Rat. | Service | Gender | Degree |
|---|---|---|---|---|---|---|
| What is the coefficient's p-value for each of the variables? | | | | | | |
| Is the p-value < 0.05? | | | | | | |
| Do you reject or not reject each null hypothesis? | | | | | | |
| What are the coefficients for the significant variables? | | | | | | |

Using the intercept coefficient and only the significant variables, what is the equation?
Compa-ratio =
Is gender a significant factor in compa-ratio?
Regardless of statistical significance, who gets paid more with all other things being equal?
How do we know?

3. What does regression analysis show us about analyzing complex measures?
4. Between the lecture results and your results, what else would you like to know before answering our question on equal pay? Why?
5. Between the lecture results and your results, what is your answer to the question of equal pay for equal work for males and females? Why?
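As a cross-check on the Week 2 Excel work, here is a minimal Python sketch, assuming Python 3.8 or later for `statistics.NormalDist`. The function names are hypothetical, and each step is commented with the Excel function it stands in for (=STDEV for the sample standard deviation, =ROUND, =LARGE, =STANDARDIZE, =1-NORM.S.DIST). The Gender1 and Grade strings are transcribed in ID order from the employee table in the Randomized BUS308 Data file. The short sample list is made up only to show the calls. The t-test inputs are the rounded means and variances printed in that file's t-Test output, so the result only approximates the reported t Stat.

```python
import math
from statistics import NormalDist, mean, stdev

# Gender1 and Grade columns for employees 1-50, transcribed in ID order
# from the Randomized BUS308 Data table (grades run A through F).
GENDER1 = "MMFMMMFFMF" "FMFFFMFFMF" "MFFFMFMFMM" "FMMMFFFMFM" "MFFMFMMFMM"
GRADE = "EBBEDFCAFA" "AECAACEBAB" "FDADAACFFD" "ABEBAAAEBA" "CAFEDEEEEE"


def describe(values):
    """Question 1: mean, sample standard deviation (Excel =STDEV) and range (=MAX - =MIN)."""
    return mean(values), stdev(values), max(values) - min(values)


def excel_round(x):
    """Round half away from zero, like Excel's =ROUND(x, 0).

    Python's built-in round() uses round-half-to-even, which can differ on exact halves.
    """
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def empirical_probabilities(gender1, grade):
    """Question 2 a-c: empirical probabilities for being in grade E or above."""
    n = len(grade)
    high = [g in ("E", "F") for g in grade]
    male = [s == "M" for s in gender1]
    male_and_high = sum(h and m for h, m in zip(high, male))
    p_high = sum(high) / n                          # a. any randomly selected person
    p_male_and_high = male_and_high / n             # b. joint: male AND grade E or above
    p_high_given_male = male_and_high / sum(male)   # c. conditional: among males only
    return p_high, p_male_and_high, p_high_given_male


def top_third(values):
    """Question 3A: count, cutoff, z-score and normal-curve tail probability for the top 1/3."""
    k = excel_round(len(values) / 3)                # =ROUND(n/3, 0)
    if k < 1:
        raise ValueError("need at least 2 values")
    cutoff = sorted(values, reverse=True)[k - 1]    # =LARGE(range, k)
    z = (cutoff - mean(values)) / stdev(values)     # =STANDARDIZE(cutoff, mean, stdev)
    p_exceed = 1 - NormalDist().cdf(z)              # =1-NORM.S.DIST(z, TRUE)
    return k, cutoff, z, p_exceed


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Question 4B: two-sample t statistic assuming equal variances."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


if __name__ == "__main__":
    print(empirical_probabilities(GENDER1, GRADE))
    sample = [1, 2, 3, 4, 5, 6]  # hypothetical values, only to show the call pattern
    print(describe(sample), top_third(sample))
    # Means, variances and counts from the t-Test output in the data file.
    print(pooled_t(1.06684, 0.00430164, 25, 1.04836, 0.00648099, 25))
```

Parts b and c of Question 2 share the same numerator (males in grade E or above); only the denominator changes, from all 50 employees to the 25 males.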
Excel Files Assingments/Randomized BUS308 Data - 08.01.2017.xlsm

Data

The ongoing question that the weekly assignments will focus on is: Are males and females paid the same for equal work (under the Equal Pay Act)?
Note: to simplify the analysis, we will assume that jobs within each grade comprise equal work.

The column labels in the table mean:
• ID – Employee sample number
• Salary – Salary in thousands
• Age – Age in years
• Performance Rating – Appraisal rating (employee evaluation score)
• Service – Years of service (rounded)
• Gender – 0 = male, 1 = female
• Midpoint – Salary grade midpoint
• Raise – Percent of last raise
• Grade – Job/pay grade
• Degree – 0 = BSBA, 1 = MS
• Gender1 – Male or Female
• Compa – Salary divided by midpoint

| ID | Salary | Compa | Midpoint | Age | Performance Rating | Service | Gender | Raise | Degree | Gender1 | Grade |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 63.3 | 1.110 | 57 | 34 | 85 | 8 | 0 | 5.7 | 0 | M | E |
| 2 | 28.3 | 0.914 | 31 | 52 | 80 | 7 | 0 | 3.9 | 0 | M | B |
| 3 | 35.5 | 1.144 | 31 | 30 | 75 | 5 | 1 | 3.6 | 1 | F | B |
| 4 | 64.8 | 1.137 | 57 | 42 | 100 | 16 | 0 | 5.5 | 1 | M | E |
| 5 | 48.4 | 1.008 | 48 | 36 | 90 | 16 | 0 | 5.7 | 1 | M | D |
| 6 | 78.4 | 1.170 | 67 | 36 | 70 | 12 | 0 | 4.5 | 1 | M | F |
| 7 | 41.5 | 1.037 | 40 | 32 | 100 | 8 | 1 | 5.7 | 1 | F | C |
| 8 | 24.7 | 1.073 | 23 | 32 | 90 | 9 | 1 | 5.8 | 1 | F | A |
| 9 | 77.2 | 1.152 | 67 | 49 | 100 | 10 | 0 | 4 | 1 | M | F |
| 10 | 23.4 | 1.019 | 23 | 30 | 80 | 7 | 1 | 4.7 | 1 | F | A |
| 11 | 23.8 | 1.035 | 23 | 41 | 100 | 19 | 1 | 4.8 | 1 | F | A |
| 12 | 59.7 | 1.048 | 57 | 52 | 95 | 22 | 0 | 4.5 | 0 | M | E |
| 13 | 42.5 | 1.062 | 40 | 30 | 100 | 2 | 1 | 4.7 | 0 | F | C |
| 14 | 24 | 1.042 | 23 | 32 | 90 | 12 | 1 | 6 | 1 | F | A |
| 15 | 23.4 | 1.018 | 23 | 32 | 80 | 8 | 1 | 4.9 | 1 | F | A |
| 16 | 44.2 | 1.106 | 40 | 44 | 90 | 4 | 0 | 5.7 | 0 | M | C |
| 17 | 68.4 | 1.200 | 57 | 27 | 55 | 3 | 1 | 3 | 1 | F | E |
| 18 | 34.9 | 1.126 | 31 | 31 | 80 | 11 | 1 | 5.6 | 0 | F | B |
| 19 | 24.7 | 1.072 | 23 | 32 | 85 | 1 | 0 | 4.6 | 1 | M | A |
| 20 | 33.9 | 1.095 | 31 | 44 | 70 | 16 | 1 | 4.8 | 0 | F | B |
| 21 | 75.1 | 1.121 | 67 | 43 | 95 | 13 | 0 | 6.3 | 1 | M | F |
| 22 | 50.2 | 1.046 | 48 | 48 | 65 | 6 | 1 | 3.8 | 1 | F | D |
| 23 | 23.9 | 1.038 | 23 | 36 | 65 | 6 | 1 | 3.3 | 0 | F | A |
| 24 | 59.5 | 1.239 | 48 | 30 | 75 | 9 | 1 | 3.8 | 0 | F | D |
| 25 | 23.7 | 1.029 | 23 | 41 | 70 | 4 | 0 | 4 | 0 | M | A |
| 26 | 25.7 | 1.116 | 23 | 22 | 95 | 2 | 1 | 6.2 | 0 | F | A |
| 27 | 36.2 | 0.906 | 40 | 35 | 80 | 7 | 0 | 3.9 | 1 | M | C |
| 28 | 78.1 | 1.166 | 67 | 44 | 95 | 9 | 1 | 4.4 | 0 | F | F |
| 29 | 68.7 | 1.026 | 67 | 52 | 95 | 5 | 0 | 5.4 | 0 | M | F |
| 30 | 48.4 | 1.009 | 48 | 45 | 90 | 18 | 0 | 4.3 | 0 | M | D |
| 31 | 23.2 | 1.008 | 23 | 29 | 60 | 4 | 1 | 3.9 | 1 | F | A |
| 32 | 29.2 | 0.942 | 31 | 25 | 95 | 4 | 0 | 5.6 | 0 | M | B |
| 33 | 65.9 | 1.156 | 57 | 35 | 90 | 9 | 0 | 5.5 | 1 | M | E |
| 34 | 27.8 | 0.898 | 31 | 26 | 80 | 2 | 0 | 4.9 | 1 | M | B |
| 35 | 24.4 | 1.062 | 23 | 23 | 90 | 4 | 1 | 5.3 | 0 | F | A |
| 36 | 24.1 | 1.049 | 23 | 27 | 75 | 3 | 1 | 4.3 | 0 | F | A |
| 37 | 23.6 | 1.024 | 23 | 22 | 95 | 2 | 1 | 6.2 | 0 | F | A |
| 38 | 57.9 | 1.015 | 57 | 45 | 95 | 11 | 0 | 4.5 | 0 | M | E |
| 39 | 34.9 | 1.125 | 31 | 27 | 90 | 6 | 1 | 5.5 | 0 | F | B |
| 40 | 24.1 | 1.048 | 23 | 24 | 90 | 2 | 0 | 6.3 | 0 | M | A |
| 41 | 45.4 | 1.134 | 40 | 25 | 80 | 5 | 0 | 4.3 | 0 | M | C |
| 42 | 23.9 | 1.037 | 23 | 32 | 100 | 8 | 1 | 5.7 | 1 | F | A |
| 43 | 76.4 | 1.140 | 67 | 42 | 95 | 20 | 1 | 5.5 | 0 | F | F |
| 44 | 64.1 | 1.124 | 57 | 45 | 90 | 16 | 0 | 5.2 | 1 | M | E |
| 45 | 49.6 | 1.034 | 48 | 36 | 95 | 8 | 1 | 5.2 | 1 | F | D |
| 46 | 65.7 | 1.153 | 57 | 39 | 75 | 20 | 0 | 3.9 | 1 | M | E |
| 47 | 57.3 | 1.006 | 57 | 37 | 95 | 5 | 0 | 5.5 | 1 | M | E |
| 48 | 67.4 | 1.183 | 57 | 34 | 90 | 11 | 1 | 5.3 | 1 | F | E |
| 49 | 61.7 | 1.082 | 57 | 41 | 95 | 21 | 0 | 6.6 | 0 | M | E |
| 50 | 63.8 | 1.119 | 57 | 38 | 80 | 12 | 0 | 4.6 | 0 | M | E |

| Compa | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| F mean | 1.0141666667 | 1.1205 | 1.0375 | 1.1243333333 | 1.175 | 1.134 |
| M mean | 1.0573333333 | 0.892 | 1.0833333333 | 0.9995 | 1.0818 | 1.12275 |
| F Stdev | 0.0337876614 | 0.0311608729 | 0.0176776695 | 0.0751620472 | 0.0494974747 | 0.0212132034 |
| M Stdev | 0.0248260616 | 0.0190525589 | 0.0877971146 | 0.028991378 | 0.0607029196 | 0.0332603367 |

Sheet1: working copy of the data with columns Sal, Compa, G, Mid, Age, EES, SR, G, Raise, Deg, used for the following output.

SUMMARY OUTPUT (two regressions, side by side in the sheet)

| Regression Statistics | Output 1 | Output 2 |
|---|---|---|
| Multiple R | 0.7050179484 | 0.9931286935 |
| R Square | 0.4970503076 | 0.9863046018 |
| Adjusted R Square | 0.4132253589 | 0.9840220355 |
| Standard Error | 0.0561252686 | 2.4352822665 |
| Observations | 50 | 50 |

ANOVA, Output 1

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 7 | 0.1307500775 | 0.0186785825 | 5.9296225662 | 0.0000782906 |
| Residual | 42 | 0.1323019225 | 0.0031500458 | | |
| Total | 49 | 0.263052 | | | |

ANOVA, Output 2

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 7 | 17938.4246118632 | 2562.632087409 | 432.1033638177 | 5.29906273684337E-37 |
| Residual | 42 | 249.0851881375 | 5.9305997175 | | |
| Total | 49 | 18187.5098 | | | |

Coefficients, Output 1

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 0.9486238772 | 0.0817167716 | 11.6086803119 | 0 | 0.7837127557 | 1.1135349987 |
| Mid | 0.0034995027 | 0.0006492568 | 5.3900133356 | 0.0000029767 | 0.0021892495 | 0.0048097559 |
| Age | 0.0005527738 | 0.0014459446 | 0.3822925256 | 0.7041721007 | -0.0023652605 | 0.0034708081 |
| EES | -0.0018462553 | 0.0010252155 | -1.8008461371 | 0.0789105539 | -0.0039152239 | 0.0002227133 |
| SR | -0.0004182288 | 0.0018278101 | -0.2288141345 | 0.820123898 | -0.004106899 | 0.0032704414 |
| G | 0.0646649961 | 0.0183396697 | 3.5259629624 | 0.001034866 | 0.0276540443 | 0.101675948 |
| Raise | 0.0146549564 | 0.0139088976 | 1.0536389608 | 0.2980722322 | -0.0134143354 | 0.0427242483 |
| Deg | 0.0014675988 | 0.0161098249 | 0.0910996125 | 0.9278465471 | -0.0310433441 | 0.0339785418 |

Coefficients, Output 2

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | -4.8714544587 | 3.54570071 | -1.3739045839 | 0.1767599037 | -12.0269681853 | 2.2840592678 |
| Mid | 1.2284155048 | 0.0281713308 | 43.6051641629 | 1.32019333894083E-36 | 1.1715634576 | 1.2852675521 |
| Age | 0.0368279425 | 0.0627397124 | 0.5869957178 | 0.5603489282 | -0.0897859231 | 0.1634418081 |
| EES | -0.0821579785 | 0.0444842245 | -1.8469014451 | 0.0718147225 | -0.171930778 | 0.007614821 |
| SR | -0.0778484529 | 0.079308905 | -0.9815852701 | 0.3319249969 | -0.2379003029 | 0.0822033971 |
| G | 2.9145083112 | 0.7957605113 | 3.6625445343 | 0.000693549 | 1.3085985836 | 4.5204180389 |
| Raise | 0.6763294824 | 0.6035087689 | 1.1206622295 | 0.2687988764 | -0.5416005215 | 1.8942594864 |
| Deg | 0.0345044482 | 0.6990072742 | 0.0493620731 | 0.9608647532 | -1.3761493419 | 1.4451582383 |

t-Test: Two-Sample Assuming Equal Variances

| | Variable 1 | Variable 2 |
|---|---|---|
| Mean | 1.06684 | 1.04836 |
| Variance | 0.00430164 | 0.00648099 |
| Observations | 25 | 25 |
| Pooled Variance | 0.005391315 | |
| Hypothesized Mean Difference | 0 | |
| df | 48 | |
| t Stat | 0.8898352784 | |
| P(T<=t) one-tail | 0.188996287 | |
| t Critical one-tail | 1.6772241961 | |
| P(T<=t) two-tail | 0.3779925741 | |
| t Critical two-tail | 2.0106347576 | |

Week1/Discusion 1.docx

Part One – Analysis ToolPak
Add the “Analysis ToolPak” to Excel. Be sure you are able to copy, sort, and find averages and sums in Excel. Use the Load the Analysis ToolPak article for information on how to load this in Excel. (This should be completed on Day 1.)

Part Two – Data Characteristics
Read Lecture One on descriptive data and review the Employee Data. Be sure to familiarize yourself with the different variables shown on the Data tab. In this course, we will be using the Employee Data and statistical tools to answer a single research question: In our BUS308 company, are the males and females paid equally for equal work?
Lecture One discusses different ways data values can be classified. In our data set for the equal pay for equal work assignment, students in the past have correctly identified the variable gender1 (coded M and F for male and female respectively) as nominal level data, but they often see gender (coded 0 and 1 for male and female respectively) as interval or ratio level data. Why? What could cause this wrong classification? What data do you use in your personal or professional lives that might suffer from not being correctly labeled/understood? (This should be started on Day 1.)

Part Three – Descriptive Statistics
Read Lecture Two on describing data sets and view The Role of Data & Analytics Today video. Lecture Two discusses several different ways of summarizing a data set: central location, variability, etc. Often, business reports provide a mean or average value for some measure (such as average number of defects per production run). Why is the average alone not enough information to make informed judgments about the result? What other descriptive statistic should be included? Why? Can you illustrate this with an example from your personal or professional lives? (This should be started on Day 3.)

Part Four – Probability
Read Lecture Three on probability. Lecture Three introduces the idea of probability, a measure of how likely it is to get a particular outcome. Looking at outcomes as resulting from probabilities (somewhat random outcomes/selections) rather than fixed constants often changes the way we see things. How does treating the salary outcomes in our sample as the result of a probabilistic sample, rather than as a completely accurate and precise reflection of the population, change how we interpret the sample statistic outcomes? What results in your personal or professional lives could be viewed this way? What differences would this cause? Why? (This should be completed by Day 5.)
Week1/Discussion 2.docx

Post one question that you had related to the material this week. Conduct research and provide the answer to the question you posted. Be sure to provide the source.

Week1/excel tool pak needed.txt

Load the Analysis ToolPak: https://support.office.com/en-us/article/Load-the-Analysis-ToolPak-6a63e598-cd6d-42e3-9317-6b40ba1a66b4?CorrelationId=b44046dd-0bbf-472c-aaaf-1c7fd6858b56&ui=en-US&rs=en-US&ad=US&ocmsassetID=HP010021569

Microsoft. (2007). Copy Excel data or charts to Word. Retrieved from http://office.microsoft.com/en-us/word-help/copy-excel-data-or-charts-to-word-HP010198874.aspx

Multimedia
AnalystSoft Inc. StatPlus:mac LE. Retrieved from http://www.analystsoft.com/en/products/statplusmacle
Week1/Week 1 Lecture 1.pdf

Week 1 Lecture 1

Class Approach to Statistics
Statistics is basically a set of tools that allow us to get information out of data sets (we will get to the more formal definition below). As such, it can be taught as a math class (focusing on formulas), a logic class (if this, then that), or as a case study (here is the problem, what are we going to do). We have chosen the latter: we will be examining statistical tools and approaches as they help us answer a business question.

The question we will focus on involves the Equal Pay Act, specifically the requirement that males and females be paid the same if they are performing equal or equivalent work. So, our business research question is: are males and females paid the same for equal work?

In starting out with our case, we will have a data set that provides a number of variables (measures that can assume different values with different subjects) for each of 50 employees selected randomly from our company. (The company and employee data are fictitious, of course.) For each employee (labeled 1 through 50 in the ID column), we will have:
• Salary – the annual salary, rounded to the nearest hundred dollars and expressed in thousands; for example, a salary of 32,650 would be rounded to 32.7.
• Compa (short for compa-ratio or comparative ratio) – a measure of how a salary relates to the midpoint of a pay range, found by dividing the salary by the pay range midpoint.
• Midpoint – the middle of the salary range assigned to each grade.
• Age – the employee’s age (rounded to the nearest birthday).
• Performance rating – a value between 1 and 100 showing the manager’s rating of how well the employee performs their job.
• Service – the years the employee has been with the company (rounded to the nearest hiring anniversary).
• Gender – a numerical code indicating the employee’s gender (1 = female, 0 = male).
• Raise – the percent increase in pay of the last performance-based increase in salary.
• Degree – the educational achievement of the employee (0 = BA/BS, 1 = Master’s or more).
• Gender1 – a letter code indicating the employee’s gender (F = female, M = male).
• Grade – the employee’s pay level; grade A is the lowest (entry level) and grade F is the highest.
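The Salary and Compa definitions above reduce to two small calculations. Below is a minimal Python sketch, assuming the grade midpoints that appear in the employee data table (A = 23 through F = 67, in thousands of dollars); the function names are hypothetical. A compa-ratio recomputed from a rounded salary can differ in the last decimal from the listed Compa value (employee 1: 63.3 / 57 is about 1.1105, listed as 1.110), presumably because the listed ratios were computed before the salary was rounded.

```python
import math

# Grade midpoints in thousands of dollars, as they appear in the employee data table.
MIDPOINT_BY_GRADE = {"A": 23, "B": 31, "C": 40, "D": 48, "E": 57, "F": 67}


def salary_in_thousands(annual_salary):
    """Round an annual salary to the nearest hundred dollars, expressed in thousands.

    Exact halves round up, so 32,650 becomes 32.7 as in the lecture's example;
    Python's built-in round(326.5) would give 326 (round-half-to-even).
    """
    return math.floor(annual_salary / 100 + 0.5) / 10


def compa_ratio(salary_k, grade):
    """Compa-ratio: salary divided by the midpoint of the employee's pay grade."""
    return salary_k / MIDPOINT_BY_GRADE[grade]


if __name__ == "__main__":
    print(salary_in_thousands(32650))   # 32.7
    print(compa_ratio(63.3, "E"))       # employee 1: 63.3 / 57
```

A compa-ratio of exactly 1.0 means the employee is paid at the grade midpoint; values above 1.0 mean pay above the midpoint.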
  • 15. During each week, we will examine some of these variables to see if they help us answer the question of males and females receiving equal pay for equal work. In the weekly lectures, we will work with the variable salary. In the homework assignments for weeks 2, 3, and 4; you will have the same questions but work with the variable compa, which – by definition – is an alternate method of looking at pay. If you have any questions about this description of our course case, please ask them in either Ask Your Instructor or in one of the class posts. Introduction to Statistics Formally, we can define statistics as “the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions” (Lind, Marchel, & Wathen, 2008, p. 4). This makes statistics and statistical analysis a subset of both critical thinking and quantitative thinking, both skills that Ashford University has identified as critical abilities for any student graduating with a degree. H. G. Wells, the author, once said that “one day quantitative reasoning will be as necessary for effective citizenship as the ability to read.” In this class, we will focus mostly on the analyzing and interpreting of data that we will assume has been correctly collected to allow us to use it to make decisions with. In doing this, there is a fairly well agreed upon approach to understanding
  • 16. what the data is trying to tell us. This approach will be followed in this class, and involves: • Identifying what kinds of data we are working with, then • Developing summary statistics for the data • Developing appropriate statistical tests to make decisions about the population the data came from. • Drawing conclusions from the test results to answer the initial research question(s). Data Characteristics We all recognized that not all data is the same. Saying we “like” something is quite a bit different than saying, the part weighs 3.7 ounces. We treat these two kinds of data in very different ways. The first distinction we make in data types involves identifying our data as either qualitative or quantitative. Qualitative data identifies characteristics or attributes of something being studied. They are non-numeric and can often be used for grouping purposes. Some examples include nationality, gender, type of car, etc. Quantitative data, on the other hand, tend to measure how much of what is being examined exists. Examples of these kinds of variables include, money, temperature, number of drawers in a desk, etc. Within quantitative data, we can identify continuous and discrete data types. Continuous
data variables can assume any value within a range. For example, depending upon how accurate our measuring instrument is, the temperature in degrees Fahrenheit could be 75, 75.3, 75.32, 75.3287468…. There are no natural “breaks” in temperature, even though we typically report it only in whole numbers and ignore the decimal portion. Height would be another continuous data variable. Discrete data, on the other hand, can take only certain values and shows breaks between these values. The number of drawers in a desk could be 3 or 4, but not 3.56, for example. The second important approach in defining data is the “level” of the data. There are four distinct levels:
• Nominal – these serve as names or labels and could be considered qualitative. The basic use for this level is to identify distinctions between and among subjects, such as ID numbers, gender identification (Male or Female), or car type (Ford, Nissan, etc.). We can basically only count how many exist within each group of a nominal data variable.
• Ordinal – these data have the same characteristics as nominal data, with the addition of being rankable; that is, we can place them in descending or ascending order. One example is rating something as good, better, or best (even if coded 1 = good, 2 =
better, and 3 = best). We can rank these preferences, but we cannot say that the difference between adjacent ratings is the same, or that it means the same thing to everyone.
• Interval – this level of data adds the element of constant differences between sequential data points. While we do not know the difference between good and better or between better and best, we do know the difference between 57 degrees and 58 degrees, and it is the same as the difference between 67 and 68 degrees.
• Ratio – this level adds a “meaningful” 0, which means the absence of the characteristic being measured. Temperature (at least on the Celsius and Fahrenheit scales) does not have a 0 point meaning no heat at all. A scale with a meaningful 0, such as length, has equal ratios: the ratio of 4 feet to 2 feet has the same value as that of 8 feet to 4 feet (both are 2). This cannot be said of temperatures, for example (Tanner & Youssef-Morgan, 2013).
These levels are often recalled by the acronym NOIR. Knowing what kinds of data we have is important, as it identifies what kinds of statistical analysis we can do.
Equal Pay Question
At the end of each lecture, we will apply the topics discussed to our research question: do males and females receive equal pay for equal work? In this
section, we will look at identifying the data characteristics for each of our data variables. For our first classification, qualitative versus quantitative (with quantitative split into continuous and discrete), we have:
• Qualitative: ID, Gender, Gender1, Degree
• Quantitative, continuous: Compa, Age, Raise, Service
• Quantitative, discrete: Salary, Midpoint, Performance Rating, Grade
Most of these are fairly clear: the variables in the qualitative list merely identify different groups. The continuous variables can all, theoretically, be carried out to many decimal places, while those in the discrete list all have distinct values within their range of available values. The NOIR classifications are shown below.
• Nominal: ID, Gender, Gender1
• Ordinal: Degree, Grade
• Interval: Performance Rating
• Ratio: Salary, Midpoint, Service, Compa, Age, Raise
While an argument can be made that Performance Ratings, being basically opinions, are really ordinal data, for this class let us assume that they are interval level, as many organizations treat them as such. An important reason for always knowing the data level for each variable is that the level limits what we can do with the data. With nominal scales, we can only count how many items fall into each category. With ordinal scales, we can do some limited analysis of differences using certain tests that are not covered in this course. Interval and ratio scales allow us to do both inferential and descriptive analysis (Tanner & Youssef-Morgan, 2013). Most of the statistical tools we will cover in this class require data scales that are at least interval in nature. During our last two weeks, we will look at some techniques for nominal and ordinal data measures. In Lecture 2, we will start to see what kinds of things we can do with each level of the NOIR characteristics. If you have any questions about this material, please ask
questions in either Ask Your Instructor or the discussion area.
References
Lind, D. A., Marchal, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Economics (13th ed.). Boston, MA: McGraw-Hill Irwin.
Tanner, D. E., & Youssef-Morgan, C. M. (2013). Statistics for Managers. San Diego, CA: Bridgepoint Education.
Week 1 Lecture 2
In Lecture 1, we focused on identifying the characteristics of the data (quantitative, qualitative, discrete, continuous, NOIR). In this section, we will look at how we can summarize a data set with descriptive statistics, and how we can ensure that these descriptive statistics can be used as inferential statistics to make inferences and judgments about a larger population. We are moving into the second step of the analysis approach mentioned in Lecture 1.
Descriptive Statistics
Once we understand the kinds of data we have, the natural reaction is to want to summarize it, reducing what may be a lot of data into a few measures that help us make sense of what we have. The principal types of summary description focus on location, variability, and likelihood. (Note: we will deal with likelihood, AKA probability, in Lecture 3 for this week.) For nominal data, our analysis is limited to counting how many items exist in each group, such as how many cars from each car company (Ford, Nissan, etc.) are in the company parking lot. However, we can also use a nominal variable as a group name to form different groups to examine; in this case we do nothing with the actual data label, but we do some analysis with the data in each group. An example related to our class case: we can split the salary data values into two groups using the nominal variable gender (or gender1). With ordinal scales, we can do some limited analysis of differences using certain tests, most of which are not covered in this course. We can also use ordinal data as grouping labels; for example, we could do some analysis of salary by educational degree. Interval and ratio scales allow us to do both inferential and descriptive analysis (Tanner & Youssef-Morgan, 2013). Most of the statistical tools we will cover in this class require data scales that are at least interval in nature.
Location measures. When working with interval or ratio level variables, the first measures most researchers look at are indications of location: the mean, median, and mode. The mean is the numerical average of the data: simply add the values and divide by the total count. The median is the middle of the data set: rank order the values from low to high (or high to low) and pick the value that is in the middle. If we have an odd number of values, we can find the middle exactly. If we have an even number of values, the median is the average of the middle two values. For example, in the data set 2, 3, 4, 5, 6, we have five values and the median is 4. However, in the data set 2, 3, 4, 5, we have only four values and the median is the average of the middle two numbers: (3 + 4)/2 = 7/2 = 3.5. Finally, the mode is the most frequently occurring value; as such, it may or may not exist (if no value repeats, there is no single most frequent value), and there may be more than one mode in a data set. Generally, the mean is the most useful measure for a data set, as it contains information from all of the values, and it is the location measure used in many statistical tests. The symbol for the mean of a population is μ (called mu), while we use x̄ (sometimes typed as x-bar) for the sample mean.
Variation measures. After finding our mean (or other center measure), we generally want to know how consistent the data is; that is, is the data
bunched around the center, or is it spread out? The more spread out a data set is, the less any single measure accurately describes all of the data. Looking at the consistency (or lack of consistency) in a data set will often give us a different understanding of what is going on. A simple example: if two departments in a company each averaged 3.0 on a question in a company morale survey, we might be tempted to say they were the same. However, if we looked at the actual scores and saw that one department had individual scores of 3, 3, 3, 3, 3, and 3 while the other department’s scores were 5, 5, 5, 1, 1, and 1, we could see that the groups are quite different. The mean alone did not provide enough information to interpret what was going on in each group. We have three general measures of variation: range, standard deviation, and variance. The range is simply the difference between the largest and smallest values (largest - smallest = range). The standard deviation and variance are related values. The variance is a somewhat awkward measure to initially understand. To calculate it, we first take the difference between each value and the mean of the entire group. These differences will have both positive and negative values, and if we added them together we would get a result of 0. So, to eliminate the negative values, we square each difference. Then we sum these squared values and divide the sum by the count. (Note: this is the same as the mean of the squared differences.) For example, the
variance of the data set (2, 3, 4) would be:
• Mean = (2 + 3 + 4)/3 = 9/3 = 3
• Variance = ((2 - 3)^2 + (3 - 3)^2 + (4 - 3)^2)/3 = ((-1)^2 + (0)^2 + (1)^2)/3 = (1 + 0 + 1)/3 = 2/3 = 0.667
This gives us an awkward measure: the variance of something measured in inches, for example, would be measured in inches squared, not a measure we use on a daily basis. The standard deviation changes this awkward measure to one that makes more intuitive sense by taking the positive square root of the variance. For our inches measure, this gives a result expressed in inches; the standard deviation is always expressed in the same units as the initial measure. For our example above, with a variance of 2/3 (0.667), the standard deviation would be the square root of 2/3, or about 0.816. Both the variance and standard deviation require data that is at least interval in nature. For large, roughly normal data sets, the standard deviation is about 1/6 of the range, and it can be loosely thought of as the typical difference from the mean for the data values in the set (Tanner & Youssef-Morgan, 2013). Technical point: both the variance and the standard deviation have two different formulas, one for populations and one for samples. The difference is that with the sample formula, the sum of the squared differences is divided by (count - 1) rather than by the full count. This serves to increase the estimate, since the data in a sample will not be as
spread out as in the population (a sample is unlikely to contain the population’s most extreme largest and smallest values). The symbol for the population standard deviation is σ, while the sample standard deviation symbol is s. In statistics, since we usually deal with samples, we use the sample formulas (to be discussed below). The nice thing about descriptive statistics is that Excel will do all of the math for us; we just need to know how to interpret the results. For a video discussion of descriptive statistics, take a look at Descriptive statistics from the Khan Academy: https://www.khanacademy.org/math/probability/descriptive-statistics.
Research Question Example
Now that we have identified the data types for each of our variables, we need to develop some descriptive statistics, particularly for those at the interval and ratio levels. In our discussion and example of salary, we will be using a salary sample of 50 that does not exactly match the data available in your data set. It is not significantly different, and should be considered a different sample from the same population. The results will be accurate enough to consider in answering our equal pay for equal work question for the sample results provided to the class.
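Excel handles these calculations for us, but the definitions above are easy to cross-check. The sketch below is a minimal illustration in Python (our choice of language, not part of the course materials) using the standard `statistics` module; `describe` is a hypothetical helper name. It reproduces the small examples from this lecture, including the population (divide by N) versus sample (divide by N - 1) formulas; `statistics.variance` and `statistics.stdev` correspond to Excel's VAR.S and STDEV.S.

```python
import statistics


def describe(values):
    """Location and variation measures for a list of interval/ratio values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),            # every most-frequent value
        "range": max(values) - min(values),               # largest - smallest
        "pop_variance": statistics.pvariance(values),     # divides by N
        "pop_std": statistics.pstdev(values),
        "sample_variance": statistics.variance(values),   # divides by N - 1 (like VAR.S)
        "sample_std": statistics.stdev(values),           # like STDEV.S
    }


# Medians for the odd- and even-count examples
print(statistics.median([2, 3, 4, 5, 6]))  # 4
print(statistics.median([2, 3, 4, 5]))     # 3.5

# Variance example (2, 3, 4): population vs. sample formulas
small = describe([2, 3, 4])
print(round(small["pop_variance"], 3), round(small["pop_std"], 3))  # 0.667 0.816
print(float(small["sample_variance"]))  # 1.0 (divide by 2 instead of 3)

# Morale survey: same mean, very different spread
dept_a = describe([3, 3, 3, 3, 3, 3])
dept_b = describe([5, 5, 5, 1, 1, 1])
print(dept_a["mean"], dept_a["pop_std"])  # 3 0.0
print(dept_b["mean"], dept_b["pop_std"])  # 3 2.0
```

The sample variance (1.0) is larger than the population variance (0.667) for the same three values, which is the "increase the estimate" effect described above.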
Equal Pay Question. The obvious first question to ask is: what is the overall average salary, and what is the average for the males and females separately? This descriptive statistic should be accompanied by the standard deviation of each group to examine group diversity. (Reminder: the salary results presented each week will not exactly match the results from this class’s data set if you choose to duplicate the results presented in this lecture. The results are statistically close enough to use in answering our assignment question on equal pay.) The related question, then, is what the standard deviation is for each of the three groups (entire sample, males, and females). In setting up the data for this, copy the salary data column (B1:B51) and paste it on a new sheet. This is a recommended practice: never do analysis on the raw data set, so that relationships between the various columns are not compromised. Then copy the Gender column (M and F) and paste it beside the salary data. Using Excel’s sort function, sort the two columns (at the same time) using Gender as the sort key. This will give you the salary data grouped by males and females. The screen shot below displays the results using both the Descriptive Statistics option found in the Data Analysis list and the =AVERAGE and =STDEV.S functions found in the fx or Formulas – statistics section.
Note a couple of things about the Descriptive Statistics output. First, since the input ranges for both the overall and female groups included the label Sal, this label was shown at the top. The male range did not have a label, so Column 1 was automatically used. We can use Descriptive Statistics for any number of contiguous columns in the input range box. For reporting purposes, we should change the Sal and Column 1 labels to Overall, Female, and Male. The second issue about the Descriptive Statistics output is that it contains much more information than we were looking for; it is a good tool for an overall look at a data set. Comparing the fx values with those from the Descriptive Statistics output, we can see that the means and standard deviations are identical for each group, so it does not matter which approach you use. Now, looking at the actual statistical values, we see that the overall salary mean (45) lies between the lower female mean (38) and the higher male mean (52). An overall mean will always be flanked by its sub-group means, but the differences will not always be
equidistant. The standard deviations, on the other hand, are much closer together, with the overall value (19.2) being somewhat larger than either the female (18.3) or male (17.8) value. This is also fairly common: the variation in the entire group is generally a bit larger than for the sub-groups. While we did not specifically ask for it, we can also note that the ranges in each group are very close: 22 to 77 for overall and females, and 24 to 77 for males. So, what can we say at this point? It appears that males and females have about the same range and standard deviation for salaries, but that females appear to average less than males. However, we cannot yet say anything about our equal pay for equal work question, as the salaries have not been divided into equal work groups. So, at this point we have some interesting information, but no conclusive results yet.
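The sort-then-summarize workflow above (group the salaries by gender, then apply =AVERAGE and =STDEV.S to each group) can be sketched in Python as follows. This is an illustration under stated assumptions: `summarize_by_group` is a hypothetical helper, and the six values in the usage lines are made-up toy numbers, not the course salary data.

```python
import statistics
from collections import defaultdict


def summarize_by_group(values, labels):
    """Return {group: (mean, sample standard deviation)} overall and for each label.

    statistics.stdev divides by (n - 1), matching Excel's =STDEV.S.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)  # same effect as sorting on the label column
    summary = {"Overall": (statistics.mean(values), statistics.stdev(values))}
    for label in sorted(groups):
        group = groups[label]
        summary[label] = (statistics.mean(group), statistics.stdev(group))
    return summary


# Hypothetical toy values (NOT the course data), labeled F or M
toy_salaries = [10, 20, 30, 40, 50, 60]
toy_genders = ["F", "M", "F", "M", "F", "M"]

for name, (mean, sd) in summarize_by_group(toy_salaries, toy_genders).items():
    print(f"{name}: mean = {mean:.1f}, s = {sd:.1f}")
# Overall: mean = 35.0, s = 18.7
# F: mean = 30.0, s = 20.0
# M: mean = 40.0, s = 20.0
```

As in the lecture, the overall mean (35) falls between the two group means (30 and 40).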
References
Lind, D. A., Marchal, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Economics (13th ed.). Boston, MA: McGraw-Hill Irwin.
Tanner, D. E., & Youssef-Morgan, C. M. (2013). Statistics for Managers. San Diego, CA: Bridgepoint Education.
Week 1 Lecture 3
A second way of looking at data differences or similarities is to consider how likely a given outcome is. In looking at our data set, we could ask questions such as: what is the
probability (likelihood) of a male or female salary exceeding 60K, what is the probability that a person’s salary is within the range of 38K to 52K, etc. Probability questions about a data set begin to help us look at distributions, a topic we will delve into in more detail in the upcoming weeks. Probability is the likelihood that a specific outcome will occur; it is never negative and ranges from 0 (will never occur) to 1.00 (will always occur). Generally speaking, we have three kinds of probability: empirical (counting actual outcomes), theoretical (using theory/logic to determine what should occur), and subjective (our individual guesses and feelings). Obviously the theoretical and empirical are the best approaches for business research questions, but at times the best we can get is an expert’s guess. Theoretical probability is just as it sounds: the theory of what the probability should be. For example, if we flip a fair coin, our theory says we should get heads 50% of the time, one outcome out of the two possible. If we flip the coin a number of times, we will get the empirical probability: the number of actual heads divided by the number of flips. While this is generally close to .5, getting very close to .5 usually takes a lot of flips rather than just a few (and even 100 flips counts as only a few) (Lind, Marchal, & Wathen, 2008). While many approaches to theoretical probability exist (binomial, hypergeometric, Poisson, etc.) (Lind, Marchal, & Wathen, 2008), we will look at just two particular types: binomial and normal curve based probabilities. The
binomial requires that we have only two outcomes, such as heads and tails when flipping a coin. This is not as restrictive as it might seem, as we can always create two groups out of what we have. For example, with a single die (one of a pair of dice), we could form several two-group situations: evens versus odds, 1–3 versus 4–6, etc. We will use the binomial to discuss several basic probability rules. Four general probability (P) concerns exist. Typically, we want to know one or more of the following probabilities:
• of something happening, called P(event),
• of two things happening together, called joint probability: P(A and B),
• of either one or the other (or both) events occurring: P(A or B),
• of something occurring given that something else has occurred, called conditional probability: P(A|B) (read as the probability of A given B).
A related rule is the complement rule: P(not A) = 1 - P(A) (Lind, Marchal, & Wathen, 2008). Two other ideas are needed. Mutually exclusive means that the elements of one data set do not belong to another; for example, males and pregnant are mutually exclusive data sets. The other term we frequently hear with probability is collectively exhaustive, which means that the events together include every possible outcome, so at least one of them must occur (Lind, Marchal, & Wathen, 2008).
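Because a single die has six equally likely faces, the probability ideas listed above can be checked by simply counting faces. The following Python sketch is an illustration, not course material; `prob`, `is_even`, and `is_low` are hypothetical names. It uses exact fractions for the evens and 1–3 groupings, and confirms that P(A or B) counts outcomes where either or both events occur.

```python
from fractions import Fraction

FACES = range(1, 7)  # the six equally likely faces of a single die


def prob(event):
    """Theoretical probability = favorable faces / possible faces."""
    favorable = sum(1 for face in FACES if event(face))
    return Fraction(favorable, len(FACES))


def is_even(face):
    return face % 2 == 0


def is_low(face):  # the "1-3" group
    return face <= 3


p_even = prob(is_even)
p_low = prob(is_low)
p_both = prob(lambda f: is_even(f) and is_low(f))   # joint: only face 2
p_either = prob(lambda f: is_even(f) or is_low(f))  # faces 1, 2, 3, 4, 6
p_not_even = prob(lambda f: not is_even(f))         # complement: faces 1, 3, 5

print(p_even, p_low, p_both, p_either, p_not_even)  # 1/2 1/2 1/6 5/6 1/2
```

Evens and odds are both mutually exclusive (no face is both) and collectively exhaustive (every face is one or the other), so their probabilities add to exactly 1.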
Some rules for dealing with these different probability situations, which apply to both theoretical and empirical probabilities, include:
• P(event) = (number of successes)/(number of attempts or possible outcomes)
• P(A and B) = P(A)*P(B) for independent events, or P(A)*P(B|A) for dependent events (P(B|A) is the conditional probability of B occurring given that A has occurred).
• P(A or B) = P(A) + P(B) - P(A and B); if A and B cannot occur together (such as the example of male and pregnant), then P(A and B) = 0.
• P(A|B) = P(A and B)/P(B) (Lind, Marchal, & Wathen, 2008).
Binomial Probability
Binomial probabilities deal with dichotomous outcomes, those that have only two possible outcomes. A typical example is flipping a coin: the result can only be a head or a tail. Another common example is gender, where we are born as either male or female. The interesting element about binomial outcomes is that while every single trial (such as the flip of a coin) has the same probability, the outcome of a group of trials will not necessarily match that probability. For example, the probability of getting exactly 5 heads out of 10 flips of a fair coin is not .5, but rather 24.6%! This is due to the number of ways the 10
outcomes can be distributed (Lind, Marchal, & Wathen, 2008). We can turn almost any outcome into a dichotomous outcome by creating groups. For example, when we toss a six-sided die (half of a pair of dice), we can say we have two outcomes: getting a 1 or 2 versus getting anything else. Now we have two outcomes of interest instead of the original six possible outcomes. Tables exist to determine the likelihood, but the easier way is to use the Excel functions found in the fx or Formulas lists. For example, Excel’s BINOM.DIST function can quickly provide us with the correct probability of getting a certain number of outcomes within a given number of attempts.
Research Question Example
Understanding the distribution of the data is an important element of understanding what the data is trying to tell us. Probabilities can give us a sense of the data set and allow us to compare results across groups.
Equal Pay Example. In thinking about equal pay, we might be interested in the probability that males and females are grouped in ways similar to the overall group. This would be an example of an empirical probability, as we would be counting how many of each group fall into each of the ranges we set up.
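Returning briefly to the BINOM.DIST function mentioned above: Excel's BINOM.DIST(number_s, trials, probability_s, cumulative) returns binomial probabilities. As a hedged cross-check, the sketch below computes the same quantities in Python from the binomial formula C(n, k) p^k (1 - p)^(n - k) using `math.comb` (Python 3.8+). `binom_pmf` and `binom_cdf` are hypothetical helper names, not Excel functions.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(exactly k successes in n trials); like =BINOM.DIST(k, n, p, FALSE)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(at most k successes in n trials); like =BINOM.DIST(k, n, p, TRUE)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


# Exactly 5 heads in 10 flips of a fair coin
print(comb(10, 5))                      # 252 ways to place 5 heads among 10 flips
print(round(binom_pmf(5, 10, 0.5), 3))  # 0.246, i.e. 252/1024

# Die grouped as "1 or 2" (p = 1/3) versus anything else:
# chance of at least one "1 or 2" in 3 rolls = 1 - P(none)
print(round(1 - binom_cdf(0, 3, 1 / 3), 3))  # 0.704
```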
We noted that the overall salary mean was 45, with the female mean equaling 38 and the male mean equaling 52. This suggests one range to look at: what is the probability of someone having a salary between 38 and 52 in each group (overall, females, and males)? Translating this into probability terms, we want to know:
• What is P(38 <= salary <= 52)? That is, what is the probability that a salary is between 38 and 52 inclusive?
• What is P(38 <= salary <= 52|Female)? That is, given a female employee, what is the probability that her salary is between 38 and 52 inclusive?
• What is P(38 <= salary <= 52|Male)? That is, given a male employee, what is the probability that his salary is between 38 and 52 inclusive?
We know a couple of things right off. First, the entire sample has 50 members, and we have 25 males and 25 females. These become the denominators in the respective probabilities. Since you do not have the exact data set we are working with, here are the counts for salaries in this range: Overall: 8, Females: 3, and Males: 5. So, we have:
P(38 <= salary <= 52) = 8/50 = .16
P(38 <= salary <= 52|Female) = 3/25 = .12
P(38 <= salary <= 52|Male) = 5/25 = .20
We can see whether gender influences being within this range by checking whether the formula for independent events holds. Above we stated that P(A and B) = P(A)*P(B) for independent events. Note that the joint probability P(within salary range AND Female) is not the same as the conditional probability P(within salary range|Female): the joint probability uses all 50 employees as its denominator, so it is 3/50 = .06. Since P(within salary range) = .16 and P(Female) = .5, independence would require:
P(within salary range and Female) = P(within salary range) * P(Female)
Replacing these with the associated values, we would have .06 = .16 * .5 = .08, an expression that is clearly not true. (Equivalently, for independent events P(within salary range|Female) would equal P(within salary range), but .12 is not .16.) Since the two sides of the equation are not the same, we can say that gender and being within this salary range are not independent. Doing this for other ranges produces similar results, so we have a clue that gender and salary interact in ways that suggest males and females are not paid equally. What we still do not know is how to consider equal work in our examination.
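The independence check can be written out with exact fractions. This Python sketch uses only the counts quoted above (8 of the 50 employees in the 38–52 range, 3 of them among the 25 females); the variable names are our own. It highlights that the joint probability divides by all 50 employees, while the conditional probability divides by the 25 females.

```python
from fractions import Fraction

# Counts quoted in the lecture's salary example
total, females = 50, 25
in_range, females_in_range = 8, 3

p_range = Fraction(in_range, total)                     # P(range) = 8/50 = .16
p_female = Fraction(females, total)                     # P(Female) = 25/50 = .5
p_range_and_female = Fraction(females_in_range, total)  # joint: 3/50 = .06
p_range_given_female = p_range_and_female / p_female    # P(A and B)/P(B) = 3/25 = .12

print(float(p_range_and_female), float(p_range * p_female))  # 0.06 0.08
print(float(p_range_given_female))                           # 0.12

# Independent events would satisfy P(A and B) = P(A) * P(B)
independent = p_range_and_female == p_range * p_female
print(independent)  # False
```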
The Normal Curve
The normal curve is a data distribution that is often called the bell curve because, when you plot the likelihood of outcomes occurring, the resulting graph looks like a bell: the most outcomes are in the middle (where the mean = median = mode), with the likelihood smoothly decreasing on each side. As a probability distribution, the normal has some interesting characteristics. First, the probability of any range of outcomes equals the area under the curve for that range. (Tables and Excel give us these values.) Second, the curve technically extends from negative infinity to positive infinity (although this full range is rarely actually used). Third, since the normal curve describes continuous data, the probability of any single exact outcome (for example, getting a 76 on a test) is 0, so to overcome this we develop a range of values: the 76 score outcome would be the area from 75.5 to 76.5. Adding +/- half a unit to a value allows us to translate discrete data into a continuous range (Lind, Marchal, & Wathen, 2008). The normal curve is important due to its widespread appearance in everyday situations. Some examples of data that follow the normal curve are height, weight, IQ, standardized test scores such as the college boards, many manufacturing measures (above and below the average result), etc.
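To make the half-unit adjustment concrete, the sketch below uses Python's `statistics.NormalDist` (Python 3.8+). The test-score distribution here, a mean of 70 and a standard deviation of 8, is purely a hypothetical assumption chosen for illustration; it does not come from the course. The probability of a score of exactly 76 is approximated by the area from 75.5 to 76.5.

```python
from statistics import NormalDist

# Hypothetical test-score distribution (assumed for illustration only)
scores = NormalDist(mu=70, sigma=8)


def prob_of_score(score, dist):
    """Approximate P(score) for whole-number scores by the area from score - 0.5 to score + 0.5."""
    return dist.cdf(score + 0.5) - dist.cdf(score - 0.5)


exact_point = scores.cdf(76) - scores.cdf(76)  # a single point has zero area
p_76 = prob_of_score(76, scores)
print(exact_point, round(p_76, 4))
```

The first printed value is 0.0, matching the point that a single exact outcome has zero probability on a continuous curve; the second is the small but nonzero area of the one-unit-wide band around 76.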
To make working with different normal curves (having different means and standard deviations) easier, we can convert them all into the Standard Normal Curve, which has a mean of 0 and a standard deviation of 1.0. We do this using a z-score: subtract the mean from the data value, and divide the result by the standard deviation. Doing this for every value in a data set would change the mean of the new distribution to 0 (due to the subtraction), while the division changes the standard deviation to 1. The resulting data values are now z-scores, and the area between z-scores is the probability of an outcome within that range of values. One characteristic of the z-score is that it tells us, in standard deviations, how close to or far from the mean any individual score is; so in some ways this is another location measure, but one that focuses on individual values. Here is an example using the normal distribution and related Excel functions found in the fx list. (See the Excel Week 1 lecture for guidance on using this function if you are unclear about it.) To find the probability of an outcome between a z-score of 1.63 and 2.0, we need to find the area between these two scores. To do this, we subtract the area under the curve up to the z-score of 1.63 from the area under the curve up to the z-score of 2.0. In Excel we use the fx function NORM.S.DIST(z, cumulative) this way: =NORM.S.DIST(2.0,1)-NORM.S.DIST(1.63,1) = 0.0288. This tells us that the probability of finding a sample value
within this range is about 2.9%. A second example, with values above and below the mean, is done the same way. The probability of finding a sample value between the z-scores of -1.63 and 2.0 would be: =NORM.S.DIST(2.0,1)-NORM.S.DIST(-1.63,1) = 0.9257, or 92.6%. The final example of finding a normal curve based probability is determining the probability of being greater than some value; for example, what is the probability of exceeding a z-score of 2.0? =1-NORM.S.DIST(2.0,1) = 0.02275, or 2.3%. The area below a negative z-score is found by simply using the NORM.S.DIST function: =NORM.S.DIST(-2.0,1) = 0.02275, or 2.3%. A hint on doing these kinds of problems is to draw a picture of the normal curve and draw a vertical line at each of the z-score values you are working with. Then shade the area you are interested in. There are three cases: the area below a certain value, the area above a certain value, and the area between two values. This visual guide helps in determining what we subtract from what.
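As a cross-check of the Excel results, the sketch below reproduces the three shading cases with Python's `statistics.NormalDist` (Python 3.8+), whose `cdf` gives the area to the left of a value, the same quantity as =NORM.S.DIST(z, 1). `z_score` and `norm_s_dist` are hypothetical helper names. The last lines apply the same functions to the equal pay figures used in the example that follows (means of 38 and 52, standard deviations of 18.3 and 17.8), rounding z to two places as the lecture does.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def norm_s_dist(z):
    """Area to the left of z, like Excel's =NORM.S.DIST(z, 1)."""
    return STANDARD_NORMAL.cdf(z)


between = norm_s_dist(2.0) - norm_s_dist(1.63)      # between z = 1.63 and z = 2.0
straddling = norm_s_dist(2.0) - norm_s_dist(-1.63)  # between z = -1.63 and z = 2.0
above = 1 - norm_s_dist(2.0)                        # above z = 2.0
below = norm_s_dist(-2.0)                           # below z = -2.0
print(round(between, 4), round(straddling, 4), round(above, 5), round(below, 5))
# 0.0288 0.9257 0.02275 0.02275

# Equal pay example: z-scores for the overall mean of 45
female_z = round(z_score(45, 38, 18.3), 2)
male_z = round(z_score(45, 52, 17.8), 2)
print(female_z, male_z)  # 0.38 -0.39
print(round(1 - norm_s_dist(female_z), 3), round(1 - norm_s_dist(male_z), 3))  # 0.352 0.652
```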
Side note: the probability of exceeding a particular outcome by pure chance alone is called the p-value. We will start using this idea next week.
Research Question Example
We will assume that the variables we are using for our equal pay question come from a normally distributed population. This allows us to use normal curve based probabilities and statistical tests to examine the data we are using to answer our question.
Equal Pay Example. Earlier we found that the likelihood of males and females having a salary between 38 and 52K was not the same, suggesting that gender and salary interact in some way. Let us ask whether the probability of having a salary greater than the overall mean of 45 is the same for both genders. Since we are assuming that salary is normally distributed as a whole and for each gender, we can use a normal curve probability to examine this. The first step is to find the z-score of the data value 45 for each gender. We found earlier that the female mean is 38 with a sample standard deviation of 18.3, and that the male mean and standard deviation are 52 and 17.8, respectively. This gives
us the information needed to determine our z-scores:
Female z = (45 - 38)/18.3 = 7/18.3 = 0.38
Male z = (45 - 52)/17.8 = -7/17.8 = -0.39
The second step is to find the probability of exceeding each of these values using the NORM.S.DIST function:
Female: =1-NORM.S.DIST(0.38,1) = 0.352
Male: =1-NORM.S.DIST(-0.39,1) = 0.652
So, it again appears that males and females have different salary distributions, as males are almost twice as likely as females to be above the overall average of 45. Again, we have not yet considered the equal work element.
References
Lind, D. A., Marchal, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Economics (13th ed.). Boston, MA: McGraw-Hill Irwin.
Problem Set Week Two
The assignment for this week involves developing an understanding of the problem and the data that we will be analyzing during the class. We will be using a data set of 50 employees sampled from an imaginary company to answer the question of whether males and females receive equal pay for performing equal work. The questions in the assignment follow the examples provided in the weekly guidance lectures. The first question this week focuses on the kind of data we have. Different levels of data allow us to do different kinds of analysis, so we need to understand what we have to work with. Question two involves developing the probability of randomly picking an employee who has certain characteristics from the sample. Question three involves finding the probability of randomly picked employees falling within the top one-third of different groups using Excel functions. Questions four and five involve using statistical tests on the compa-ratio (an alternate measure of pay). The final question asks for your interpretation of, and opinion on, the question of equal pay for equal work based on the work done this week. Both the assignment file and the data file are located in the Course Materials section, at the bottom in the Multimedia section. The assignment file contains all of the weekly assignments (for Weeks 2, 3, and 4); see the labeled tabs at the bottom of the Excel assignment file. The data in the data file needs to be copied over into the assignment file, and you will be set for the entire class. *Ask questions if it is not clear how to move the data from one file to the other.
Hypothesis Testing / T-tests / F-test
Although the initial post is due on Day 5, you are encouraged to start working on it early, as it is a three-part discussion that should be completed in sequential order.
Part One – Hypothesis Testing
Read Lecture Four. Lecture Four starts out with the five-step procedure for hypothesis testing. What is this? What does it do for us? Why do we need to follow these steps in making a judgment about the populations our samples came from? What are the “tricky” parts of developing appropriate hypotheses to test? What examples can you suggest where this process might be appropriate in your personal or professional lives? (This should be started on Day 1.)
Part Two – T-tests
Read Lecture Five. Lecture Five illustrates several t-tests on the data set. What conclusions can you draw from these tests about our research question on equal pay for equal work? What is missing from these results to give us a complete answer to the question? Why? (This should be started on Day 3.)
Part Three – F-test
Read Lecture Six. Lecture Six introduces you to the F-test for variance equality. Last week, we discussed how adding a variation measure to reports of means was a smart thing to do. Why does variation make our analysis of the equal pay for equal work question more complicated? What causes of variation impact salary that we have not discussed yet? How can you relate this issue to measures used in your personal or professional lives? (This should be completed by Day 5.)
Your responses should be separated in the initial post, addressing each part individually, similar to what you see here.
Post a question that you had related to the material this week. Conduct research to provide the answer to the question and provide the source.
Week2/Week 2 Lecture 4-1.pdf
Lecture 4 (Sampling basics and Hypothesis test)
This week we turn from descriptive statistics to inferential statistics and making decisions about our populations based on the samples we have. For example, our class case research question is really asking whether, in the entire company population of employees, males and females receive the same pay for doing equal work. However, we are not analyzing the entire population; instead we have a sample of 25 males and 25 females to work with. This brings us to the idea of sampling – taking a small group/sample from a larger population. To paraphrase, not all samples are created equal. For example, if you wanted to study religious feelings in the United States, would you sample only those leaving a fundamentalist church on a Wednesday? While this is a legitimate element of US religion, it does not represent the entire range of religious views; it is representative of only a portion of the US population, not the entire population. The key to ensuring that sample descriptive statistics can be
used as inferential statistics – sample results that can be used to infer the characteristics (AKA parameters) of a population – is to have a random sample of the entire population. A random sample is one where, at the start, everyone in the population has the same chance of being selected. There are numerous ways to design a random sampling process, but these are more of a research class concern than a statistics class issue. For now, we just need to make certain that the samples we use are randomly selected rather than selected with the intent of ensuring desired outcomes are achieved. The issue with samples that often surprises students new to statistics is that the sample statistic values/outcomes will rarely be exactly the same as the population parameters we are trying to estimate. Each sample will have some sampling error: the difference between the actual population value and the sample result. Researchers feel that this sampling error is generally small enough to use the data to make decisions about the population (Lind, Marchel, & Wathen, 2008). While we cannot tell for any given sample exactly what this difference is, we can estimate how large the error is likely to be. Later, we will look at doing this; for now, we just need to know that this error is incorporated into the statistical test outcomes that we will be studying. Once we have our random sample (and we will assume that our class equal pay case
sample was selected randomly), we can start our analysis. After developing the descriptive statistics, we start to ask questions about them. In examining a data set, we need not only to identify whether important differences exist but also to identify reasons differences might exist. For our equal pay question, it would be legal to pay males and females different salaries if, for example, one gender performed the duties better, had more required education, or had more seniority. Equal pay for equal work, as we are beginning to see, is more complex than a simple single question about salary equality. As we go through the class, we will be able to answer increasingly complex questions. For this week, we will stay with questions involving ways to sort our salary results, looking for where differences might exist. Some of these questions for this week with our equal pay case could include:
• Could the means for both males and females be the same, and the observed difference be due to sampling error only?
• Could the variances for the males and females be the same (AKA statistically equal)?
• Could salaries per grade be statistically equal?
• Could salaries per degree (undergraduate and graduate) be the same?
• Etc.
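The first question on this list asks whether an observed difference could be sampling error alone. The short Python simulation below is a sketch of that idea, not part of the course materials: it builds a hypothetical population of 1,000 salaries (randomly generated, not the class data set), draws two samples of 25 from that same population, and shows that their means still differ. The seed values are arbitrary assumptions chosen only so the run is repeatable.

```python
import random
import statistics

# Hypothetical population of 1,000 salaries (in thousands). These values are
# randomly generated for illustration; they are NOT the class data set.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(45, 10) for _ in range(1000)]
population_mean = statistics.mean(population)


def sample_mean(pop, n, rng):
    """Mean of a simple random sample: every member has the same chance of selection."""
    return statistics.mean(rng.sample(pop, n))


# Two groups of 25 drawn from the SAME population, so no real difference exists.
mean_a = sample_mean(population, 25, rng)
mean_b = sample_mean(population, 25, rng)
print(f"population mean: {population_mean:.2f}")
print(f"sample A mean:   {mean_a:.2f} (sampling error {mean_a - population_mean:+.2f})")
print(f"sample B mean:   {mean_b:.2f} (sampling error {mean_b - population_mean:+.2f})")

# Repeat the experiment many times: the differences scatter around 0.
diffs = [sample_mean(population, 25, rng) - sample_mean(population, 25, rng)
         for _ in range(1000)]
print(f"average difference over 1,000 repeats: {statistics.mean(diffs):+.2f}")
print(f"typical size (standard deviation) of a difference: {statistics.stdev(diffs):.2f}")
```

With real data we would not know the population mean; the point is only that two samples from one population rarely have identical means, and that gap is exactly what a statistical test has to account for.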
Hypothesis Testing
As we might expect, research and statistics have a set procedure/process for answering these questions. The hypothesis testing procedure is designed to ensure that data is analyzed in a consistent and recognized fashion so everyone can accept the outcome. Statistical tests focus on differences: is this difference large enough to be significant, that is, not simply sampling error? If so, we say the difference is statistically significant; if not, the difference is not considered statistically significant. This phrasing is important: while it is easy to measure a difference from some point, it is much harder to measure that "things are different." It is that pesky sampling error that interferes with assessing differences directly. Before starting the hypothesis test, we need to have a clear research question. The questions above are good examples, as each clearly asks if some comparison is statistically equal or not. Once we have a clear question – and a randomly drawn sample – we can start the hypothesis testing procedure. The procedure itself has five steps:
• Step 1: State the null and alternate hypothesis
• Step 2: Form the decision rule
• Step 3: Select the appropriate statistical test
• Step 4: Perform the analysis
• Step 5: Make the decision, and translate the outcome into an answer to the initial
research question.
Step 1. The null hypothesis is the "testable" claim about the relationship between the variables. It always claims that no difference exists in the populations. For the question of male and female salary equality, it would be: Ho: Male mean salary = Female mean salary. If the evidence leads us to reject this claim, we would accept the alternate hypothesis claim: Ha: Male mean salary =/= (not equal) Female mean salary. (Note: some alternate ways of phrasing these exist, and we will cover them shortly. For now, let's just go with this format.)
Step 2. This step involves selecting the decision rule for rejecting the null hypothesis claim. This will be constant for our class: we will reject the null hypothesis when the p-value is equal to or less than 0.05 (this probability is called alpha). Other common values are .10 and .01; the more severe the consequences of being wrong if we reject the null, the smaller the value of alpha we select. Recall that we defined the p-value last week as the probability of exceeding a value; in this case, that value is the statistic produced by our test.
Step 3. Selecting the appropriate statistical test is the next step. We start with a question about mean equality, so we will be using the t-test, the most appropriate test to determine if two population means are equal based upon sample results.
Step 4. Performing the analysis comes next. Fortunately for us, we can do all the arithmetic involved with Excel. We will go over how to select and run the appropriate t-test in Lecture 5.
Step 5. Interpreting the test results, making a decision on rejecting or not rejecting the null hypothesis, and using this outcome to answer the research question is the final step. Excel output tables provide all the information we need to make our decision in this step.
Step 1: Setting up the hypothesis statements
In setting up a hypothesis test for looking at the male and female means, there are actually three questions we could ask, each with its own hypothesis statements in step 1.
1. Are male and female mean salaries equal?
   a. Ho: Male mean salary = Female mean salary
   b. Ha: Male mean salary =/= Female mean salary
2. Is the male mean salary equal to or greater than the female mean salary?
   a. Ho: Male mean salary => Female mean salary
   b. Ha: Male mean salary < Female mean salary
3. Is the male mean salary equal to or less than the female mean salary?
   a. Ho: Male mean salary <= Female mean salary
   b. Ha: Male mean salary > Female mean salary
While they appear similar, each answers a different question.
We cannot, for example, take the first question, determine the means are not equal, and then say that the male mean is greater than the female mean because the sample results show this. Our statistical test did not test for this condition. If we are interested in a directional difference, we need to use a directional set of hypothesis statements, as shown in statements 2 and 3 above.
Rules. There are several rules or guidelines for developing the hypothesis statements for any statistical test.
1. The variables must be listed in the same order in both claims.
2. The null hypothesis must always contain the equal (=) sign.
3. The null can contain an equal (=), equal to or less than (<=), or equal to or greater than (=>) claim.
4. The null and alternate hypothesis statements must, between them, account for all possible comparison outcomes. So, if the null has the equal (=) claim, the alternate must contain the not equal (=/= or ≠) statement. If the null has the equal to or less than (<= or ≤) claim, the alternate must contain the greater than (>) claim. Finally, if the null has the equal to or greater than (=> or ≥) claim, the alternate must contain the less than (<) claim.
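The four rules are mechanical enough to encode. The Python sketch below (the function name `hypothesis_pair` is our own, not from the course) builds a null/alternate pair from the sign in the null and refuses a null that lacks an equality. It writes `>=` where the lecture writes `=>`, and `=/=` for not equal.

```python
from typing import Tuple

# Rule 4: the alternate is the exact complement of the null, so that between
# them the two statements cover every possible outcome.
ALTERNATE_FOR = {"=": "=/=", "<=": ">", ">=": "<"}


def hypothesis_pair(var1: str, var2: str, null_sign: str) -> Tuple[str, str]:
    """Build (Ho, Ha) statements comparing var1 with var2.

    Rule 1: the variables appear in the same order in both statements.
    Rules 2-3: the null must carry an equality: "=", "<=" or ">=".
    Rule 4: the alternate is the complement of the null.
    """
    if null_sign not in ALTERNATE_FOR:
        raise ValueError("the null hypothesis must contain '=', '<=' or '>='")
    ho = f"Ho: {var1} {null_sign} {var2}"
    ha = f"Ha: {var1} {ALTERNATE_FOR[null_sign]} {var2}"
    return ho, ha


# The three question types from Step 1, in the same order.
for sign in ("=", ">=", "<="):
    ho, ha = hypothesis_pair("Male mean salary", "Female mean salary", sign)
    print(ho)
    print(ha)
```

Passing a sign such as `<` for the null raises an error, which is rule 2 in action: the strict inequality can only ever appear in the alternate.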
Deciding which pair of statements to use depends on the research question being asked, which is why we always start with the question. Look at the research question being asked: does it contain words indicating a simple equality (means are equal, the same, etc.) or inequality (not equal, different, etc.)? If so, we have the first example: Ho: variable 1 mean = variable 2 mean, Ha: variable 1 mean =/= variable 2 mean. If the research question implies a directional difference (larger, greater, exceeds, increased, etc. or smaller, less than, reduced, etc.), then it is often easier to use the question to frame the alternate hypothesis and back into the null. For example, the question "Is the male mean salary greater than the female mean salary?" would lead to an alternate of exactly what was said (Ha: Male mean salary > Female mean salary) and the opposite null (Ho: Male mean salary <= Female mean salary).
Step 2: Decision Rule
Once we have our hypothesis statements, we move on to deciding the level of evidence that will cause us to reject the null hypothesis. Note that we always test the null hypothesis, since that is where our claim of equality lies. Our decision is either to reject the null or to fail to reject the null. If we reject the null, we are saying that the alternate hypothesis statement is the more accurate description of the relationship between the two variable population means. If we fail to reject the null, we have not proven it true; we only lack enough evidence against it, which is why we say "fail to reject" rather than "accept" the null.
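The keyword reasoning above can be sketched in Python. This is only an illustration under stated assumptions: the keyword lists and the `frame_hypotheses` name are hypothetical, real questions may need human judgment, and a question worded "equal to or greater/less than" is treated as putting the direction in the null, matching questions 2 and 3 in Step 1.

```python
import re
from typing import Tuple

# Hypothetical keyword lists; real questions may need human judgment.
GREATER_WORDS = {"greater", "larger", "more", "exceeds", "exceed", "higher", "increased"}
LESS_WORDS = {"less", "smaller", "lower", "fewer", "reduced"}
# Each sign mapped to its complement, so Ho and Ha cover every outcome.
COMPLEMENT = {"=": "=/=", "<=": ">", ">=": "<", ">": "<=", "<": ">="}


def frame_hypotheses(question: str, var1: str, var2: str) -> Tuple[str, str]:
    """Return (Ho, Ha) suggested by the wording of a research question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    if words & GREATER_WORDS:
        direction = ">"
    elif words & LESS_WORDS:
        direction = "<"
    else:
        direction = None

    if direction is None:
        null = "="                    # equality wording: two-tail test
    elif "equal" in words:
        null = direction + "="        # "equal to or greater/less": belongs in the null
    else:
        null = COMPLEMENT[direction]  # directional wording frames Ha; back into Ho
    return f"Ho: {var1} {null} {var2}", f"Ha: {var1} {COMPLEMENT[null]} {var2}"


q = "Is the male mean salary greater than the female mean salary?"
for line in frame_hypotheses(q, "Male mean salary", "Female mean salary"):
    print(line)
```

For the example question this prints the same pair the lecture derives: a null of `<=` and an alternate of `>`.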
When we perform a statistical test, we are in essence asking whether, based on the evidence we have, the difference we observe is large enough to have been caused by something other than chance, or whether it is due to sampling error. A statistical test gives us a statistic as a result. We know the shape of the statistical distribution for each type of test; therefore, we can easily find the probability of exceeding this test value. Remember, we called this the p-value. Now all we need to decide is what an acceptable level of chance is: that is, when would the outcome be so rare that we would not expect to see it from chance sampling error alone? Most researchers agree that if the p-value is 5% (.05) or less, then chance is unlikely to be the cause of the observed difference, and something else is likely responsible. This decision point is called alpha. Other values of alpha frequently used are 10% (often used in marketing tests) and 1% (frequently used in medical studies). The more serious the consequences of rejecting the null when we should not have, the smaller the alpha we choose. For our analysis, we will use an alpha of .05 for all our tests.
Final Point
You may have noticed that we have two basic types of hypothesis statements: those testing equality and those testing directional differences. This leads to two different types of statistical tests: the two-tail and the one-tail. In the one-tail test, the entire value of alpha is
focused on one distribution tail, either the right or left tail depending upon the phrasing of the alternate hypothesis. A handy hint: the arrowhead in the alternate hypothesis points to the tail the result needs to be in to reject the null. In the case of the two-tail test (equality), we do not care if one variable is bigger or smaller than the other, only that they differ. This means that the rejection statistic could be in either tail, the right or left. Since the rejection region is split into two areas, we need to split alpha between them, so each tail holds alpha/2 (e.g., 0.05/2 = 0.025). A one-tail p-value is therefore compared with alpha/2, while a two-tail p-value, which already accounts for both tails, is compared with the full alpha. The example in Lecture 5 will review this in more detail.
References
Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Economics. (13th Ed.) Boston: McGraw-Hill Irwin.
Week2/Week 2 Lecture 5-1.pdf
Lecture 5
The T-Test
In the previous lecture, we introduced the hypothesis testing procedure and developed the first two steps of a statistical test to determine if male and female mean salaries could be equal in the population, with our observed differences caused simply by sampling error. This lecture continues with this example by completing the final three steps. It also introduces our first statistical test, the t-test for mean equality. Last week we looked at the normal curve and noted several of its characteristics, such as mean = median = mode, symmetry around the mean, and curve height dropping off the further a score gets from the mean (meaning scores further from the mean are less likely to occur). Our first statistical test, the t-test, is based on a population that is distributed normally. The t-test is used when we do not know the population variance, which is the situation nearly every time we use a sample to make decisions about its related population. While the t-test has several different versions, we will focus on the most commonly used form: the two-sample test for mean equality assuming equal variances. When we are testing measures for mean equality, it is fairly rare for the variances to be very different, and the observed difference is often merely sampling error. (In Lecture 6, we will revisit this assumption.)
The logic of the test is that the difference between mean values, divided by a measure of this difference's variation, provides a t statistic. This statistic follows the t distribution, a bell-shaped curve centered at 0 that closely resembles the standard normal curve (mean 0, standard deviation 1) as the sample size grows. This outcome can then be tested to see what the likelihood is that we would get a value this large or larger purely by chance: our old friend the p-value. If this p-value is smaller than our decision criterion, alpha, then we reject the null hypothesis claim of no difference (Lind, Marchel, & Wathen, 2008).
Setting up the t-test
Before selecting any test from Excel, the data needs to be set up. For the t-test, a couple of steps are needed. In our question about male and female salaries, copy the gender variable column from the data page to a new worksheet page (the recommendation is the Week 2 tab) and paste it to the right of the questions (such as in column T), then copy the salary values and paste them next to the gender data. Next, sort both columns by the gender column; this will give you the salary data sorted by gender. Then, in column V place the label Males, and in column W place the label Females. Now copy the male salaries and paste them under the Males label, and do the same for the female salaries and the Females label. The data is now set up for easy entry into the t-test data entry section.
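The logic above (mean difference divided by a measure of its variation) can be written out directly. The Python sketch below is not how Excel computes its output internally; it is a standard-library-only illustration that (1) splits salaries by the gender code (0 = male, 1 = female, as in the data set's column definitions), mirroring the sort-and-copy setup, (2) computes the pooled-variance t statistic used by t-Test: Two-Sample Assuming Equal Variances, and (3) approximates the two-tail p-value by numerically integrating the t density with Simpson's rule. The salary values are made up for illustration.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tail_p(t, df, steps=2000):
    """P(|T| >= |t|): 1 minus twice the area from 0 to |t| (Simpson's rule, even steps)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    area = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3  # area under the curve between 0 and |t|
    return max(0.0, 1 - 2 * area)


def pooled_t_test(x, y):
    """Two-sample t-test assuming equal variances: returns (t, df, two-tail p)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(x) - statistics.mean(y)) / std_error
    return t, df, two_tail_p(t, df)


# Hypothetical (gender, salary) rows; gender 0 = male, 1 = female.
rows = [(0, 58), (1, 34), (0, 66), (1, 47), (0, 23),
        (1, 24), (0, 77), (1, 41), (0, 40), (1, 22)]
males = [salary for gender, salary in rows if gender == 0]    # the "Males" column
females = [salary for gender, salary in rows if gender == 1]  # the "Females" column

t, df, p = pooled_t_test(males, females)
print(f"t = {t:.3f}, df = {df}, two-tail p = {p:.4f}")
```

In the class case the two lists would hold the 25 male and 25 female salaries, and the results should agree with Excel's t Stat and P(T<=t) two-tail rows up to rounding and the small error of the numerical integration.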
The t-test is found in the Analysis ToolPak that was loaded into your Excel program last week. To find it, click on the Data tab in the top ribbon, then on the Data Analysis link in the Analyze group at the right, then scroll down to t-Test: Two-Sample Assuming Equal Variances. For assistance in setting up the t-test, please see the discussion in the Week 2 Excel Help lecture.
Interpreting the T-test Output
The t-test output contains a lot of information, and not all of it is needed to interpret the result. The important elements of the t-test outcome will be shown with an example for our research case question.
Equal Pay Example – continued
In Lecture 4 we set up the first couple of steps for testing our research question: Do males and females receive equal pay for equal work? Our first examination of the data we have for answering this question involves determining if the average salaries are the same. Here is the completed hypothesis test for the question: Is the male average salary equal to the female average salary?
Step 1. Ho: Male mean salary = female mean salary
Ha: Male mean salary ≠ female mean salary
Step 2. Reject the null if the p-value is < (less than) alpha = .05.
Step 3. The selected test is the Two-Sample t-test assuming equal variances.
Step 4. The test results are below. The screenshot shows the output table.
Step 5. Interpretation and conclusions. The first step is to ensure we have all of the correct data. We see that we have 25 males and 25 females in the Observations row, and that the respective means are equal to what we calculated earlier. The calculated t statistic is 2.74 (rounded). We have two ways to determine if our result rejects or fails to reject the null hypothesis; both involve the two-tail rows, as we have a two-tail test (equal or not equal hypothesis statements). The first is a comparison of the t-values: if the calculated t of 2.74 (rounded) is greater than the t Critical two-tail value of 2.01, we reject the null hypothesis. The second way is to compare the p-value with our criterion of alpha = .05. Each tail holds half of alpha (.025), but Excel's P(T<=t) two-tail value already accounts for both tails, so we compare it with the full alpha of .05 (equivalently, the one-tail p-value would be compared with .025). The two-tail p-value of 0.0085 is less than .05, so we reject the null hypothesis.
Note: at times Excel will report the p-value in E format, such as 3.45E-04. This is scientific (exponential) notation and is the same as 3.45 × 10^-4, which means moving the decimal point 4 places to the left: 3.45E-04 = 0.000345. Any p-value reported as E-03, E-04, and so on is less than our alpha of 0.05 (which would be written 5E-02).
Since we rejected the null hypothesis with both approaches (and both will always provide the same outcome), we can answer our question with: No, the male and female mean salaries are not equal. Note that for this set of data, we would have rejected the null for a one-tail test if and only if the null hypothesis had been Male mean salary <= Female mean salary, with the alternate Male mean salary > Female mean salary. The arrow in the alternate points to the positive/right tail, and that is where the calculated t-statistic is. So, even if the p-value is smaller than alpha in a one-tail test, we need to ensure the t-statistic is in the correct tail for rejection.
References
Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Economics. (13th Ed.) Boston: McGraw-Hill Irwin.
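The two equivalent decision routes, the one-tail direction check, and the E-notation reading fit in a few lines. This Python sketch uses the numbers quoted from the lecture's Excel output (t = 2.74, t Critical two-tail = 2.01, two-tail p = 0.0085); the one-tail p of 0.00425 is an assumption, taken as half the two-tail value, which is how Excel's one-tail row relates to its two-tail row for this test. The function names are hypothetical.

```python
def two_tail_decision(t_stat, t_crit_two_tail, p_two_tail, alpha=0.05):
    """Return (reject by comparing t values, reject by comparing p with alpha)."""
    by_t = abs(t_stat) > t_crit_two_tail
    by_p = p_two_tail < alpha  # Excel's two-tail p already covers both tails
    return by_t, by_p


def one_tail_decision(t_stat, p_one_tail, alternate, alpha=0.05):
    """Reject only if p < alpha AND t lies in the tail the alternate's arrow points to.

    alternate is ">" (right tail) or "<" (left tail).
    """
    in_correct_tail = t_stat > 0 if alternate == ">" else t_stat < 0
    return in_correct_tail and p_one_tail < alpha


print(two_tail_decision(2.74, 2.01, 0.0085))  # (True, True): both routes reject
print(one_tail_decision(2.74, 0.00425, ">"))   # True: Ha: male mean > female mean
print(one_tail_decision(2.74, 0.00425, "<"))   # False: t is in the wrong tail

# Excel's E format is ordinary scientific notation.
print(float("3.45E-04"))                       # 0.000345
```

Because the critical t and the p-value are both set at the same alpha, the two routes in `two_tail_decision` always agree; the one-tail function shows why a small p-value alone is not enough when the statistic sits in the opposite tail.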
Week2/Week 2 Lecture 6-1.pdf
Lecture 6 (Additional information on t-tests and hypothesis testing)
Lecture 5 focused on perhaps the most common of the t-tests, the two-sample test assuming equal variances. There are other versions as well; Excel lists two others, the two-sample test assuming unequal variances and the paired t-test. We will end with some comments about rejecting the null hypothesis.
Choosing between the t-test options
As the names imply, each of the three forms of the t-test deals with a different type of data set. The simplest distinction is between the equal and unequal variance tests. Both require that the data be at least interval in nature, come from a normally distributed population, and be independent of each other – that is, collected from different subjects.
The F-test for variance. To determine if the population variances of two groups are statistically equal – in order to
correctly choose the equal variance version of the t-test – we use the F statistic, which is calculated by dividing one variance by the other. If the outcome is less than 1.0, the rejection region is in the left tail; if the value is greater than 1.0, the rejection region is in the right tail. In either case, Excel provides the information we need. To perform a hypothesis test for variance equality we use Excel's F-Test Two-Sample for Variances, found in the Data Analysis section under the Data tab. The test set-up is very similar to that of the t-test: entering data ranges, checking the Labels box if labels are included in the data ranges, and identifying the start of the output range. The only unique element in this test is the identification of our alpha level. Since we are testing for equality of variances, we have a two-tail test and the rejection region is again in both tails. This means that our rejection region in each tail is 0.025. The F-test gives the p-value only for the tail the result is in (a one-tail value), not both one- and two-tail values. So, compare the calculated p-value against .025 to make the rejection decision. If the p-value is greater than this, we fail to reject the null; if smaller, we reject the null of equal variances.
Excel Example. To test for equality between the male and female salary variances in the population, we set up the following hypothesis test. Research question: Are the male and female population
variances for salary equal?
Step 1: Ho: Male salary variance = Female salary variance
Ha: Male salary variance ≠ Female salary variance
Step 2: Reject Ho if the p-value is less than alpha = 0.025 for one tail.
Step 3: The selected test is the F-test for variance.
Step 4: Conduct the test.
Step 5: Conclusion and interpretation. The test resulted in an F-value less than 1.0, so the statistic is in the left tail. Had we put Females as the first variable, we would have gotten a right-tail F-value greater than 1.0. This has no bearing on the decision. The F value is larger than the critical F (which is the value for a one-tail probability of 0.025, as that was entered for the alpha value), so it does not fall in the left-tail rejection region. Since our p-value (.44 rounded) is > .025 and our F (0.94 rounded) is greater than our F Critical, we fail to reject the null hypothesis of no difference in variance. The correct t-test would be the two-sample t-test assuming equal variances.
Other T-tests. We mentioned that Excel has three versions of the t-test. The
equal and unequal variance versions are set up in the same way and produce very similar output tables. The only difference in the output is that the equal variance version provides an estimate of the common variation, called the pooled variance, while this row is missing in the unequal variance version. A third form of the t-test is the t-Test: Paired Two Sample for Means. A key requirement for the other versions of the t-test is that the data are independent, meaning the data are collected on different groups. In the paired t-test, we generally collect two measures on each subject. An example of paired data would be a pre- and post-test given to students in a statistics class. Another example, using our class case study, would be comparing the salary and midpoint for each employee; both are measured in dollars and taken from each person. An example of non-paired data would be the grades of males and females at the end of a statistics class. The paired t-test is set up in the same way as the other two versions. It provides the correlation coefficient (a measure of how closely one variable changes when another does, to be covered later in the class) as part of its output.
An Excel Trick. You may have noticed that all of the Excel t-tests are for two samples, yet at times we might want to perform a one-sample test; for example, quality control might want to test a sample against a quality standard to see if things have
changed or not. Excel does not expressly allow this. BUT, we can do a one-sample test using Excel. The reason is a bit technical, but boils down to the fact that the two-sample unequal variance formula reduces to the one-sample formula when one of the variables has a variance equal to 0. So, using the unequal variance t-test, we enter the variable we are interested in (such as salary) as variable one and the hypothesized value we are testing against (such as 45 for our case) as variable two, ensuring that we have the same number of values in each column. Here is an example of this outcome.
Research question: Is the female population salary mean = 45?
Step 1: Ho: Female salary mean = 45
Ha: Female salary mean ≠ 45
Step 2: Reject the null hypothesis if the p-value is less than alpha = 0.05.
Step 3: The selected test is the two-sample unequal variance t-test.
Step 4: Conduct the test.
Step 5: Conclusions and interpretation. Since the two-tail p-value is greater than (>) alpha (.05) and/or the absolute value of the t-statistic is less than the critical two-tail t value, we fail to reject
the null hypothesis. Our research question answer is that, based upon this sample, the overall female salary average could equal 45.
Miscellaneous Issues on Hypothesis Testing
Errors. Because statistical tests are based on probabilities, there is a possibility that we could make the wrong decision in either rejecting or failing to reject the null hypothesis. Rejecting the null hypothesis when it is true is called a Type I error. Failing to reject the null when it is false is called a Type II error. Increasing the sample size mainly lowers the chance of a Type II error, since the chance of a Type I error is set by our choice of alpha. A Type I error is generally considered the more severe of the two (imagine saying a new medicine works when it does not), and is managed by the selection of our alpha value: the smaller the alpha, the harder it is to reject the null hypothesis (or, put another way, the more evidence is needed to convince us to reject the null). Managing the Type II error probability is slightly more complicated and is dealt with in more advanced statistics classes. Choosing an alpha of .05 for most test situations has been found to provide a good balance between these two errors.
Reason for Rejection. While we are not spending time on the formulas behind our statistical outcomes, there is one general issue with virtually all statistical tests: a larger sample size makes it easier to reject the null hypothesis. What is a
non-statistically significant outcome based upon a sample size of 25 could very easily be found significant with a sample size of, for example, 25,000. This is one reason to be cautious with very large sample studies: far from meaning the results are better, it could mean the rejection of the null was due to the sample size and not the variables that were being tested. The effect size measure helps us investigate the cause of rejecting the null. The name is somewhat misleading to those just learning about it; it does NOT mean the size of the difference being tested. The significance of that difference is tested with our statistical test. What it does measure is the effect the variables had on the rejection (that is, whether the outcome is practically significant and one we should base decisions on) versus the impact of the sample size on the rejection (meaning the result is not particularly meaningful in the real world). For the two-sample t-test, either equal or unequal variance, the effect size is measured by Cohen's d. Unfortunately, Excel does not yet provide this calculation automatically; however, it is fairly easy to generate. Cohen's d = (absolute value of the difference between the means) / (the standard deviation of both samples combined). Note: the total standard deviation is not given in the t-test outputs, and is not the same as the square root of the pooled variance estimate. To get this value, use the fx function STDEV.S on the
entire data set, both samples at the same time. Interpreting the effect size outcome is fairly simple. Effect sizes are generally between 0 and 1. A large effect (a value around .8 or larger) means the variables and their interactions caused the rejection of the null, and the result has a lot of practical significance for decision making. A small effect (a value around .2 or less) means the sample size was more responsible for the rejection decision than the variable outcomes. A medium effect (values around .5) is harder to interpret and would suggest additional study (Tanner & Youssef-Morgan, 2013).
References
Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Economics. (13th Ed.) Boston: McGraw-Hill Irwin.
Tanner, D. E. & Youssef-Morgan, C. M. (2013). Statistics for Managers. San Diego, CA: Bridgepoint Education.
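Three of the calculations in Lecture 6 are simple enough to check outside Excel. The Python sketch below, using hypothetical salary lists rather than the class data, computes (1) the F ratio and which tail it falls in, (2) the unequal-variance (Welch) t statistic, showing that pairing a sample with a constant column of 45s reproduces the one-sample t statistic and its n - 1 degrees of freedom, which is the idea behind the Excel trick, and (3) Cohen's d as this lecture defines it, dividing by STDEV.S of both samples together. Many textbooks instead divide by the pooled standard deviation (the square root of the pooled variance), which gives a somewhat different value. The function names are our own.

```python
import math
import statistics


def f_ratio(x, y):
    """F = variance of x / variance of y (sample variances), plus its tail."""
    f = statistics.variance(x) / statistics.variance(y)
    tail = "left" if f < 1 else "right"
    return f, tail


def welch_t(x, y):
    """t statistic and degrees of freedom for the unequal-variance t-test."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def one_sample_t(x, mu):
    """Classic one-sample t statistic and degrees of freedom."""
    n = len(x)
    return (statistics.mean(x) - mu) / (statistics.stdev(x) / math.sqrt(n)), n - 1


def cohens_d_lecture(x, y):
    """|mean difference| / STDEV.S of both samples combined, per this lecture."""
    return abs(statistics.mean(x) - statistics.mean(y)) / statistics.stdev(x + y)


# Hypothetical salaries (in thousands), NOT the class data set.
males = [58, 66, 23, 77, 40, 52, 61, 35]
females = [34, 47, 24, 41, 22, 57, 30, 48]

print(f_ratio(males, females))
print(welch_t(females, [45] * len(females)))  # the Excel one-sample trick
print(one_sample_t(females, 45))              # same t, df = n - 1
print(round(cohens_d_lecture(males, females), 3))
```

The constant column has a variance of 0, so its term drops out of both the standard error and the degrees-of-freedom formula, leaving exactly the one-sample calculation.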
Week3/assignemtn.docx
Week3/Discu1.docx
Multiple Testing / ANOVA / Effect Size
Although the initial post is due on Day 5, you are encouraged to start working on it early, as it is a three-part discussion that should be completed in sequential order.
Part One – Multiple Testing
Read Lecture Seven. The lectures from last week and Lecture Seven discuss issues around using a single test versus multiple uses of the same test to answer questions about mean equality between groups. This suggests that we need to master (or at least understand) a number of statistical tests. Why can't we just master a single statistical test, such as the t-test, and use it in situations calling for mean equality decisions? (This should be started on Day 1.)
Part Two – ANOVA
Read Lecture Eight. Lecture Eight provides an ANOVA test showing that the mean salary for each job grade significantly differed. It then shows a technique to allow us to determine which pair or pairs of means actually differ. For what other factors would you be interested in knowing whether means differed by grade level? Why? Can you provide an ANOVA table showing these results? (Do not bother with which means differ.) How does this help answer our research question of equal pay for equal work? What kinds of results in your personal or professional lives could use the ANOVA test? Why? (This should be started on Day 3.)
Part Three – Effect Size
Read Lecture Nine. Lecture Nine introduces you to the effect size measure. There are two reasons we reject a null hypothesis. One is that the interaction of the variables causes significant differences to occur, our typical understanding of a rejected null hypothesis. The other is having a large sample size: virtually any difference can be made to appear significant if the
sample is large enough. What is the effect size measure? How does it help us decide what caused us to reject the null hypothesis? (This should be completed by Day 5.)
Week3/Week 3 lecture 7-1.pdf
Week 3 Lecture 7
We have so far seen how we can summarize data sets using descriptive statistics, showing several characteristics including mean and standard deviation. We also found that if our data comes from a random sample of a larger population, these descriptive statistics become inferential statistics and can be used to make inferences about the population. These inferences can then be used in statistical tests to see whether things have changed (whether they equal known standards or other data sets). We have looked at one- and two-sample mean tests (with the t-test) and two-sample comparisons of variance equality (with the F-test). This week we will look at the Analysis of Variance (ANOVA) test for mean equality between three or more groups.
ANOVA
The first question often asked is: why not just run multiple t-tests comparing three or more group means? One answer involves efficiency. Conducting multiple t-tests can become tedious. Comparing just three groups (A, B,
and C) requires us to compare A and B, B and C, and A and C (3 tests). With 4 groups (A, B, C, and D) we have A and B, A and C, A and D, B and C, B and D, and C and D (6 tests)! So a single test can save us a lot of time and is much more efficient. A second, and much more important, reason is that we lose confidence in our results when multiple tests are performed on the same data. With an alpha of 0.05, we are 95% confident in each individual test, but our confidence that all of the tests are right is found by multiplying these values together (treating the tests as independent). For three tests that is .95*.95*.95, or about 86%; with six tests, our confidence drops to .95^6 = .74, a long way from our desired 95%. A single test maintains our desired level of confidence in the outcome (Lind, Marchel, & Wathen, 2008).
Logic
A second question comes from the name itself: how can analyzing variance tell us anything about mean differences? The answer lies in how ANOVA works. The key assumptions for an ANOVA analysis are that each of the groups is normally distributed AND that the groups have equal variances. This means the distributions have the same shape, which allows for an easy comparison. Take a look at the following two sets of normal curves.
[Figure: Exhibit A and Exhibit B, each showing three normal curves]
The means of the three sample groups in Exhibit A could clearly come from three populations that have the same mean, and the differences seen are merely sampling error. However, we cannot say the same thing about the sample groups in Exhibit B. ANOVA takes the variation of all of the data in the groups being tested (three in this case) and compares it with the average variation within each of the groups, using the F-test (discussed last week). For the Exhibit A groups, the overall variation will be only slightly larger than the average variation of the three groups (which are assumed to be equal). Since the resulting F value will not be statistically significant, we can say that the groups are closely distributed and the means are statistically equal.
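The variance comparison described above can be sketched numerically. The standard one-way ANOVA F statistic divides a between-group variance estimate (how far the group means sit from the overall mean) by the pooled within-group variance; the lecture's "overall variation versus average group variation" description is an informal version of the same idea. The minimal Python sketch below assumes two small made-up data sets (hypothetical, not the course's employee data): `close_groups` mimics the overlapping curves of Exhibit A, and `spread_groups` mimics the separated curves of Exhibit B. The function name `anova_f` is likewise just an illustrative choice.

```python
# Sketch of the one-way ANOVA F ratio: between-group variance / within-group variance.
# The data sets are hypothetical illustrations, not the course's employee data.

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups (lists of numbers)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of groups
    n = len(all_values)      # total number of observations

    # Between-group sum of squares: distance of each group mean from the grand mean,
    # weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)   # between-group variance estimate
    ms_within = ss_within / (n - k)     # pooled within-group variance estimate
    return ms_between / ms_within


# Exhibit A-like: group means close together relative to the spread inside each group.
close_groups = [[9, 10, 11, 10], [10, 11, 12, 11], [9, 10, 11, 12]]
# Exhibit B-like: group means far apart relative to the spread inside each group.
spread_groups = [[4, 5, 6, 5], [10, 11, 12, 11], [16, 17, 18, 17]]

print(round(anova_f(close_groups), 3))   # 1.0
print(round(anova_f(spread_groups), 3))  # 216.0
```

With 3 groups of 4 values, the F-test has 2 and 9 degrees of freedom, and the 5% critical value is about 4.26 (in Excel, `=F.INV.RT(0.05, 2, 9)`). The first data set (F = 1.0) falls far below that, so its means would be treated as equal; the second (F = 216) is far above it, so at least one mean differs. In practice, the Excel Analysis ToolPak "Anova: Single Factor" tool reports this F value along with its p-value.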
In Exhibit B, however, the variation of the combined data would be around three times the average variation of the individual groups. Just by comparing the average variance of the individual groups with the variance of the entire data set, we can judge how close the distributions are, and with that, whether the means are equal. As with the t-test, ANOVA tells us exactly how much difference in population locations is enough to say the means differ; we cannot just “eyeball” it.
Hypothesis
Stating the null and alternate hypotheses for an ANOVA test is simple, as they are always the same:
Ho: All means are equal.
Ha: At least one mean differs (Tanner & Youssef-Morgan, 2013).
You might recall from last week that the alternate always states the opposite of the null. If so, why isn’t our alternate “all means differ,” which seems like the opposite? The reason is that the ANOVA test will reject the null hypothesis if even one of the group means is significantly different from the others. So the true opposite of “all means are equal” is “at least one mean differs.”
Data Set-up
Setting up the data for an ANOVA analysis is a bit more complicated than for a t-test. With the t-test we just highlighted a column or part of a column of data (sometimes after sorting it by a variable such as gender); for an ANOVA test, we need to create a table. For example, if we wanted to look at average salaries per grade (shown in the Week 3 Lecture 8 example), we would need a table with a separate column of salaries for each grade, headed by the grade letter. Doing this is fairly simple. Copy the grade and salary columns (separately) and paste them onto a new Excel sheet (probably in Week 3, to the right of the questions). Then highlight both columns, from labels to last value, and select Data > Sort. Choose to sort on the grade variable and click OK. Both columns are now in grade order, and you can highlight and cut the salaries for each grade and paste them into a new table you create with the grade letter as the column head. When finished, you will have the input table used in setting up an Excel ANOVA test.
References
Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Economics (13th ed.). Boston: McGraw-Hill Irwin.
Tanner, D. E., & Youssef-Morgan, C. M. (2013). Statistics for Managers. San Diego, CA: Bridgepoint Education.
Week3/Week 3 Lecture 8-1.pdf
Week 3 Lecture 8
Excel ANOVA Example
In our ongoing investigation of whether or not males and females are paid equally for equal work, we have so far come up with contradictory results: average salaries clearly differ, but average compa-ratios do not. We need to examine factors that might affect these differences to see if we can explain what is going on. For each possible factor influencing individual salaries, we need to be able, to paraphrase TV cop shows, to “rule it out as a suspect” in causing differences, or to keep it in as a cause of differences in pay practices between the genders. One key issue in our question that has not clearly been examined yet is the impact of grades on salaries. Clearly, grade differences have the potential to complicate the issue, as the work done differs by grade. One question to ask here is, “are average salaries equal across grade