Practical Statistical Testing

Adrian Cuyugan
18 August 2014

Agenda
I. T-test and Z-test
II. Chi-square Test of Independence
III. I-MR Control Charts
IV. Binary Logistic Regression
V. Data Sources

T-Test and Z-test
Testing the Difference of Means with a Two-Tailed Test
Problem Statement
Is there a significant difference in means between Forecasted Calls and Calls Offered?
Data Overview
The daily call volume forecast is produced automatically in Blue Pumpkin by the Global Workforce Management Team. Staffing, two years of
historical data and other factors are used to produce this forecast.
Calls offered are the actual calls that came into the IVR as initiated by the user.
The data sample is from April to July 2014.
The data has a bimodal shape because weekends have fewer calls. If the
underlying problem is highly sensitive, it is wiser to perform two separate
analyses, one for weekdays and one for weekends.
For the purpose of comparing the two groups, forecasted and offered,
weekdays and weekends are combined here. Removing the weekends would
also leave the data extremely skewed to the left.
The samples are collected from two different populations over the same
time period; each sample is less than 10% of its population and has more
than 30 observations, which is enough to perform inference.
Assuming that the data is normally distributed, we can start exploring
the data further.
Hypothesis
H0 – Mean Forecasted Calls = Mean Calls Offered, i.e. μ(Forecasted) − μ(Offered) = 0.
HA – Mean Forecasted Calls ≠ Mean Calls Offered, i.e. μ(Forecasted) − μ(Offered) ≠ 0.

Exploratory
The differences between the forecasted calls and the
calls offered vary from month to month.
Even when the whole period is looked at together, the
mean difference is 6.4 calls in favor of forecasted
calls (155.1 vs 148.7). The standard deviations differ
by just 0.25 calls, in favor of calls offered (89.95 vs 89.7).
Summary statistics of daily calls
Group              Min.  1st Qu.  Median  Mean   3rd Qu.  Max.  SD     n
Calls Forecasted   13    23       188     155.1  221      278   89.7   122
Calls Offered      7     23.25    177.5   148.7  210.5    287   89.95  122

Variance test
A non-significant p-value is not interpreted as meaning that the
variances are equal, only that there is insufficient evidence to reject
the null hypothesis that the variances are equal.
F = 0.9945 (ratio of the two sample variances)
Numerator DF = 121, Denominator DF = 121
95% confidence interval for the variance ratio = (0.695185, 1.422628)
p-value = 0.9758
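As a cross-check, the variance test can be approximately reproduced from the summary statistics alone. The sketch below is a minimal, standard-library Python version (the original analysis appears to have been done in R); the function names are illustrative, the F density is integrated numerically with Simpson's rule instead of using a statistics library, and because it uses the rounded SDs (89.7 and 89.95) its output will differ slightly from the reported figures in the last decimal places.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0  # no mass below 0; returning 0 at x == 0 is exact when d1 > 2, as here
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = ((d1 / 2) * math.log(d1 / d2)
               + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
               - log_beta)
    return math.exp(log_pdf)


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def variance_ratio_test(sd1, n1, sd2, n2):
    """Two-sided F-test of equal variances from summary statistics."""
    d1, d2 = n1 - 1, n2 - 1
    f = sd1 ** 2 / sd2 ** 2
    cdf = simpson(lambda x: f_pdf(x, d1, d2), 0.0, f)  # P(F <= f) under H0
    p = 2 * min(cdf, 1 - cdf)
    return f, d1, d2, p


# Forecasted vs offered, using the rounded SDs from the summary table
f, d1, d2, p = variance_ratio_test(89.7, 122, 89.95, 122)
print(f"F = {f:.4f}, df = ({d1}, {d2}), p-value = {p:.4f}")
```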
T-test (Welch two-sample, unequal variances)
t = 0.5588
df = 241.998
95% CI for the difference in means = (-16.22862, 29.08108)
Mean of Forecasted Calls = 155.1230
Mean of Calls Offered = 148.6967
p-value = 0.5768
Z-test
z = 0.5588
95% CI for the difference in means = (-16.11532, 28.96778)
Mean of Forecasted Calls = 155.1230
Mean of Calls Offered = 148.6967
p-value = 0.5763
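The t and z results can likewise be approximately reproduced from the group means, the rounded SDs and n = 122. In this sketch (function names are hypothetical), the non-integer df comes from the Welch-Satterthwaite formula, the t p-value is obtained by numerically integrating the t density, and the z-test reuses the same statistic against the standard normal from Python's `statistics.NormalDist`. The t-based confidence interval is left out because it would need a t quantile function.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution (df may be non-integer)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def two_sample_tests(m1, s1, n1, m2, s2, n2, conf=0.95):
    """Welch t-test and large-sample z-test for a difference in means."""
    diff = m1 - m2
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    stat = diff / se
    # Welch-Satterthwaite degrees of freedom (non-integer in general)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Two-sided t p-value: 1 - P(-|t| < T < |t|), using the symmetry of the density
    p_t = 1 - 2 * simpson(lambda x: t_pdf(x, df), 0.0, abs(stat))
    # z-test: same statistic, compared with the standard normal
    normal = NormalDist()
    p_z = 2 * (1 - normal.cdf(abs(stat)))
    z_crit = normal.inv_cdf(0.5 + conf / 2)
    ci_z = (diff - z_crit * se, diff + z_crit * se)
    return stat, df, p_t, p_z, ci_z


stat, df, p_t, p_z, ci_z = two_sample_tests(155.1230, 89.7, 122, 148.6967, 89.95, 122)
print(f"t = z = {stat:.4f}, Welch df = {df:.3f}")
print(f"t-test p-value = {p_t:.4f}, z-test p-value = {p_z:.4f}")
print(f"z-based 95% CI = ({ci_z[0]:.3f}, {ci_z[1]:.3f})")
```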
Results
Statistics Language
Under the null hypothesis, the probability of observing a t-statistic at
least as extreme as 0.5588 (in either direction) with 241.998 degrees of
freedom is 0.5768. Since this p-value is well above 0.05, the null
hypothesis is not rejected.
The 95% confidence interval for the difference in means (-16.23 to 29.08
calls) contains 0, which further supports not rejecting the null hypothesis.
Business Language
There is no statistically significant difference between the daily number
of calls forecasted by the Global Workforce Management Team and the
number of calls offered to the Voice Center. Without any unusual events,
the number of agents forecasted to answer the calls is therefore expected
to be sufficient to pass the Abandoned % target. This test supports a
passing Abandoned % rate, although what is actually measured are calls
abandoned after more than 30 seconds.

Chi-square Test of Independence
Finding association between two categorical variables
Problem Statement
Is there a significant relationship between CSAT Survey Result and
Reported Source?
Hypothesis
H0 – CSAT Survey Result and Source are independent.
HA – CSAT Survey Result and Source are dependent.
Data Overview
A survey is sent to the user after the logged ticket has been marked as
resolved. The survey includes 8 questions, each scored from 1 to 5,
where 5 is the highest.
Since this test can only be done on categorical variables, CSAT Survey
Result is used as a dichotomous response variable that codes the survey
result as 1 = Positive (success) and 0 = Negative (failure).
The explanatory variable is Reported Source. The Service Desk only
creates tickets from two sources: Phone and Email.
The sample is less than 10% of the population, and every expected
frequency is at least 5. Since the observed counts are more than enough
to perform inference, a bootstrap (simulation-based) calculation is not
needed.

Contingency Table (columns: CSAT Result)
Reported Source  Contents   Pos     Neg     Row Total
Email            Observed   241     50      291
                 Expected   252.5   38.5
                 Row %      82.8%   17.2%   28.9%
                 Col %      27.6%   37.6%
Phone            Observed   632     83      715
                 Expected   620.5   94.5
                 Row %      88.4%   11.6%   71.1%
                 Col %      72.4%   62.4%
Col Total                   873     133     1006
                            86.8%   13.2%
Results
Statistics Language
Chi-square = 5.6
DF = 1
P-value = 0.0180
Under the null hypothesis of independence, the probability that a chi-square
statistic with one degree of freedom is at least as extreme as 5.6 is 0.0180.
Since this p-value is less than 0.05, the null hypothesis is rejected.
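A minimal sketch of the test of independence on the observed table, with hypothetical function names. It computes the expected counts, checks the at-least-5 condition, and uses the fact that for one degree of freedom the chi-square tail probability equals erfc(sqrt(x/2)). No Yates continuity correction is applied, which appears to match the reported 5.6. The last step adds Cramér's V, an effect size that is not part of the original analysis, to show that a significant association is not necessarily a strong one.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected, n


observed = [[241, 50],   # Email: Pos, Neg
            [632, 83]]   # Phone: Pos, Neg
stat, df, expected, n = chi_square_independence(observed)
assert all(e >= 5 for row in expected for e in row), "expected counts too small"

# For df == 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p = math.erfc(math.sqrt(stat / 2))
# Cramér's V (equal to |phi| for a 2x2 table): effect size, not in the original analysis
cramers_v = math.sqrt(stat / (n * (min(len(observed), len(observed[0])) - 1)))
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p:.4f}, Cramér's V = {cramers_v:.3f}")
```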
Business Language
There might be other confounding factors that affect the relationship
between the CSAT result and the reported source (where the ticket
originated from), such as the resolution time, how the service desk or
the resolver responded to the ticket, or the problem itself. However, these
cannot be assumed from this test, which only tests the relationship
between the given variables (CSAT ~ Reported Source).
It is concluded that there is a statistically significant relationship between
CSAT and Reported Source; the two are not independent. Tickets logged by
phone received a positive survey more often (88.4%) than tickets logged
by email (82.8%).

Individuals-Moving Range Control Charts
Is the process in control?
Problem Statement
Are there any highly unusual events that spiked the number of calls
received by the Voice Center, or any systematic pattern in it?
Data Overview
When the daily calls are plotted on an individuals chart, a weekly
seasonal pattern appears: every five data points are followed by a drop,
and these low points are the weekends. To produce a more sensible
observation, this analysis only covers weekdays; a separate analysis
covering weekends can be done if needed.
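The limits of an I-MR chart can be sketched as follows, assuming the usual constants for moving ranges of two consecutive points (d2 = 1.128 and D4 = 3.267). The weekday filter mirrors the decision above to exclude weekends. The function names and the short series are made up for illustration; they are not the Voice Center data.

```python
from datetime import date


def weekdays_only(daily):
    """Keep (date, calls) pairs that fall on Monday to Friday."""
    return [(d, calls) for d, calls in daily if d.weekday() < 5]


def imr_limits(values):
    """Center lines and 3-sigma limits for Individuals and Moving Range charts."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    sigma = mr_bar / 1.128  # d2 for moving ranges of two points
    i_chart = (center - 3 * sigma, center, center + 3 * sigma)  # (LCL, CL, UCL)
    mr_chart = (0.0, mr_bar, 3.267 * mr_bar)  # D3 = 0, D4 = 3.267
    return i_chart, mr_chart, moving_ranges


# Hypothetical data: a Saturday, a Sunday and five weekdays
daily = [(date(2014, 8, 16), 20), (date(2014, 8, 17), 25),
         (date(2014, 8, 18), 200), (date(2014, 8, 19), 210),
         (date(2014, 8, 20), 190), (date(2014, 8, 21), 205),
         (date(2014, 8, 22), 195)]
weekday_calls = [calls for _, calls in weekdays_only(daily)]
i_chart, mr_chart, moving_ranges = imr_limits(weekday_calls)
print("I chart (LCL, CL, UCL):", [round(x, 2) for x in i_chart])
print("MR chart (LCL, CL, UCL):", [round(x, 2) for x in mr_chart])
```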

Observations – Nelson Rules
Special Causes
Rule #1 - Two data points fall below the lower control limit, which is
highly unusual for a normal working day. Both are caused by holidays in
the United States: Memorial Day and the Fourth of July.
Common Causes
• Rule #2 - No more than 8 consecutive points fall on the same side of
the center line.
• Rule #3 - No 6 consecutive points show a steadily increasing or
decreasing trend.
• Rule #4 - The points come close to oscillating (alternating up and
down) because they are very random, but not enough to signal a special cause.
• Rule #5 - No points fall very close to the control limits.
Trend
Looking at the whole dataset as a time series, rather than only checking
whether individual data points are within control, we can further check
for an overall trend.
Based on the trend component of the decomposition, there is no obvious pattern.
An additive model is used because we do not assume that the seasonal swings
grow as the number of calls increases over time.
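A classical additive decomposition (the kind R's `decompose()` produces with `type = "additive"`) can be sketched in plain Python. This version assumes an odd period, here a 5-day weekday cycle as in the control chart analysis, so that a simple centered moving average gives the trend; the function name and the synthetic series are illustrative only.

```python
def additive_decompose(values, period=5):
    """Classical additive decomposition: value = trend + seasonal + residual."""
    if period % 2 == 0:
        raise ValueError("this sketch assumes an odd period")
    n = len(values)
    if n < 2 * period:
        raise ValueError("need at least two full periods")
    half = period // 2
    # Trend: centered moving average over one full period (undefined at the ends)
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(values[i - half:i + half + 1]) / period
    # Seasonal effect: average detrended value at each position in the cycle
    buckets = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            buckets[i % period].append(values[i] - t)
    raw = [sum(b) / len(b) for b in buckets]
    mean_raw = sum(raw) / period
    seasonal_index = [s - mean_raw for s in raw]  # effects sum to zero
    seasonal = [seasonal_index[i % period] for i in range(n)]
    residual = [None if t is None else v - t - s
                for v, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, residual


# Synthetic example: flat level of 100 plus a repeating 5-day pattern
pattern = [10, 5, 0, -5, -10]
series = [100 + p for _ in range(4) for p in pattern]
trend, seasonal, residual = additive_decompose(series)
print(trend[2:8])
print(seasonal[:5])
```

On this synthetic series the trend comes out flat, which is the situation the slide describes: no obvious pattern once the weekly cycle is removed.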
A five-day forecast using exponential smoothing can be produced if needed.
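Simple exponential smoothing, the most basic member of the exponential smoothing family, can be sketched as below. The smoothing constant alpha = 0.3 is an arbitrary assumption rather than a fitted value, and the function name is hypothetical. A simple model like this produces a flat forecast; a seasonal method such as Holt-Winters would be needed to reproduce a weekly cycle.

```python
def ses_forecast(values, alpha=0.3, horizon=5):
    """Simple exponential smoothing; returns a flat forecast `horizon` steps ahead."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # initialise the level with the first observation
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level  # blend new data with the old level
    return [level] * horizon


# Hypothetical weekday call counts; alpha is an assumption, not a fitted value
print(ses_forecast([200, 210, 190, 205, 195], alpha=0.3))
```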

Binary Logistic Regression
Predicting the outcome of a binary categorical dependent variable
Problem Statement
What are the odds and probabilities of a positive CSAT survey result, given the ticket age (resolution time), reported source, VIP user
status, status reason of the ticket and resolution method?
Data Overview
Each case in the dataset is a survey result submitted by a user. The sample is from August 2013 to July 2014. Only tickets that have been
resolved by groups are included in this analysis.
The CSAT survey consists of 9 questions; the first 8 can be rated by the respondent from 1 to 5, where 5 is the highest. The last question is
free-form text in which the respondent can leave comments expressing how they feel. The result of the survey is the sum of the eight rated
questions. Regressing the outcome on the individual questions would be trivial, since the total score is computed directly from them, so this
analysis uses different variables to predict the outcome of the survey.
As previously tested, the survey result is dependent on the reported source. This test determines whether that predictor is still significant.
Some data munging was done to convert continuous variables (such as ticket age) into categorical variables, which are then dummy coded.
Continuous variables could also be used directly as predictors.
Response
CSAT Result: 1 – Positive (Success), 0 – Negative (Failure)
Predictors
Ticket Age: 1 – < 3 Days, 2 – < 7 Days, 3 – < 15 Days, 4 – < 30 Days, 5 – 30+ Days
Reported Source: 1 – Phone, 2 – Email
VIP: 1 – Yes, 2 – No
Status Reason: 1 – First Call Resolved, 2 – Status Call, 3 – Others
Resolution Method: 1 – Service Desk Assisted, 2 – Remote Control, 3 – On-site Support, 4 – Self-service
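Dummy (treatment) coding turns each categorical predictor into one 0/1 column per non-base category. The sketch below uses the codings listed above, with the first level as the base (R's default treatment contrasts likewise use the first factor level); the helper name is hypothetical. Counting the columns gives 4 + 1 + 1 + 2 + 3 = 11, which matches the 11 degrees of freedom reported for Model 1.

```python
LEVELS = {
    "Ticket Age": ["< 3 Days", "< 7 Days", "< 15 Days", "< 30 Days", "30+ Days"],
    "Reported Source": ["Phone", "Email"],
    "VIP": ["Yes", "No"],
    "Status Reason": ["First Call Resolved", "Status Call", "Others"],
    "Resolution Method": ["Service Desk Assisted", "Remote Control",
                          "On-site Support", "Self-service"],
}


def dummy_code(value, levels):
    """Treatment coding: one 0/1 indicator per non-base level (the first level is the base)."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels[1:]]


# A base-category ticket gets all zeros; other categories switch on one indicator
print(dummy_code("< 3 Days", LEVELS["Ticket Age"]))
print(dummy_code("< 15 Days", LEVELS["Ticket Age"]))

# Number of coefficients besides the intercept when all five predictors are used
n_dummies = sum(len(levels) - 1 for levels in LEVELS.values())
print(n_dummies)
```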

Model 1
Assessing the fit of the model
The drop in deviance from the empty (intercept-only) model to Model 1 is 31.56 on 11 degrees of freedom. The probability that a
chi-square statistic with 11 degrees of freedom is at least as extreme as 31.56 is 0.0008970423, which is less than 0.05, so Model 1
fits significantly better than the empty model.
However, only the following predictors are individually significant to the response
variable:
• Ticket Age
• Status Reason
Since the model has many insignificant predictors, a better model can be built even though this one is a
good fit compared with an empty model.

Model 2
Assessing the fit of the model
Model 2 keeps only Ticket Age and Status Reason. The probability that a chi-square statistic with 6 degrees of freedom is at
least as extreme as 26.71 is 0.0001640877, which is less than 0.05, so Model 2 also fits significantly better than the empty model.
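Both model-fit p-values can be reproduced from the reported chi-square statistics and degrees of freedom. This standard-library sketch computes the chi-square upper tail through the series for the regularized lower incomplete gamma function; the function name is illustrative, and the series is adequate for moderate statistics like these rather than a general-purpose implementation. With 31.56 on 11 df and 26.71 on 6 df it returns values very close to the reported 0.0008970423 and 0.0001640877.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, y = df / 2, x / 2
    # Series: gamma(a, y) = y^a e^-y * sum_k y^k / (a (a+1) ... (a+k))
    term = 1.0 / a
    total = term
    k = 1
    while term > total * 1e-16:
        term *= y / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(y) - y - math.lgamma(a))  # regularized P(a, y)
    return 1.0 - lower


# Likelihood-ratio (deviance) tests against the empty model, as reported
for name, stat, df in [("Model 1", 31.56, 11), ("Model 2", 26.71, 6)]:
    print(f"{name}: chi-square = {stat} on {df} df, p-value = {chi2_sf(stat, df):.10f}")
```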
Overall effect
Since the model is a good fit compared with the null model, we can proceed to test the overall effect of each of the two
categorical predictors with a Wald test:
Ticket Age (5 categories): Wald test Chi-square = 10.2, df = 4, p-value = 0.037
Status Reason (3 categories): Wald test Chi-square = 7.6, df = 2, p-value = 0.022
Both categorical predictors are significant overall, meaning that not all of their categories have the same effect on the
odds of a positive survey. The differences highlighted here are between < 7 Days and < 15 Days, and between Status Call
and Others.
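For even degrees of freedom, as in both Wald tests, the chi-square tail probability has a closed form: exp(-x/2) times the sum of (x/2)^k / k! for k from 0 to df/2 - 1. The sketch below (hypothetical function name) applies it to the reported Wald statistics; because those statistics are rounded to one decimal, the results agree with the reported p-values only to about three decimal places.

```python
import math


def chi2_sf_even(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


# Reported Wald tests for the overall effect of each predictor in Model 2
for name, stat, df in [("Ticket Age", 10.2, 4), ("Status Reason", 7.6, 2)]:
    print(f"{name}: Wald chi-square = {stat}, df = {df}, p-value = {chi2_sf_even(stat, df):.3f}")
```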
Dummy coding and base categories
You might notice that < 3 Days and First Call Resolved are both missing from the
generalized linear model summary. This is because R uses these categories as the
base when computing the coefficients: a ticket resolved in less than 3 days contributes
a Ticket Age coefficient of 0, and the same applies to First Call Resolved for Status Reason.
Unlike linear regression coefficients, these coefficients are hard to interpret directly
because they are on the logit (log-odds) scale. Exponentiating a coefficient gives the odds
ratio of that category relative to its base, and applying the inverse logit to the sum of
the intercept and the relevant coefficients gives the predicted probability.
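The step from coefficients to odds ratios and probabilities can be sketched as follows. The intercept and coefficients are hypothetical placeholders, because the fitted estimates are not listed in this deck; only the two transformations (exponentiation for odds ratios, inverse logit for probabilities) are the point. Function names are illustrative.

```python
import math


def odds_ratio(coef):
    """Odds ratio of a category relative to its base: exp of its logit coefficient."""
    return math.exp(coef)


def predicted_probability(intercept, coefs):
    """Inverse logit of the linear predictor (intercept plus the active dummy coefficients)."""
    eta = intercept + sum(coefs)
    return 1 / (1 + math.exp(-eta))


# HYPOTHETICAL estimates for illustration only; not the fitted values of Model 2
intercept = 2.0
coef_30_plus_days = -2.5
coef_status_call = -0.3

print(odds_ratio(coef_30_plus_days))         # odds multiplier vs. < 3 Days
print(predicted_probability(intercept, []))  # base ticket: < 3 Days, First Call Resolved
print(predicted_probability(intercept, [coef_30_plus_days, coef_status_call]))
```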

Odds Ratio
For each category of a predictor, the value in the
OddsRatio table is the factor by which the odds of a
positive survey change compared with the base category,
holding the other predictor constant. This is more
interpretable than the logit coefficients from the
previous slide.
Predicted probability of a positive CSAT survey
Ticket Age   Status Reason         Probability
< 3 Days     First Call Resolved   100.00%
< 3 Days     Status Call           100.00%
< 3 Days     Others                100.00%
30+ Days     First Call Resolved   37.50%
30+ Days     Status Call           37.50%
30+ Days     Others                37.50%
Since the base categories are < 3
Days and First Call Resolved, a
positive CSAT survey is about 10
times more likely, in terms of odds,
for tickets resolved in less than
3 days with status reason First
Call Resolved than for tickets
that took longer to resolve and
have a different status reason.
Probability
The probability of a success, or
positive CSAT survey, may vary
because of the variation in the
data. Even so, the significant
predictors retained in the final
model, Ticket Age and Status
Reason, give an idea of what the
outcome of the survey is likely
to be.

Thank You