Probability Models Project
PROBABILITY MODELS : PROJECT REPORT
Srikanth Popuri M12388241
Poorvi Deshpande M12388313
CONTENTS
PROBABILITY MODELS : PROJECT REPORT
INTRODUCTION
OVERVIEW OF DATASET
PROBLEM STATEMENT
ANALYSIS
ECDF OF SAMPLE
HYPOTHESIS TESTING FOR DIFFERENCE IN MEANS : WALD TEST
CONCLUSION
APPENDIX
INTRODUCTION
The objective of this project is to put our theoretical knowledge into practice on
real data. We apply several of the non-parametric methods we have learnt to our
data and study their accuracy.
OVERVIEW OF DATASET
The data set used in this study is the flights data available in R through the
nycflights13 package. It contains details about the flights that departed from NYC,
such as the time of each flight, the times of departure and arrival, the distance of
the flight, etc. It has 336776 observations of 19 variables, covering the flights
recorded on each day. We study the variable ‘air_time’, which describes the time
spent in the air by a given flight.
PROBLEM STATEMENT
The air_time variable is studied to determine how close our sample mean is to
the true population mean.
ANALYSIS
Distribution of sample of airtime:
Figure 2 : Histogram of the sample of air_time (Day 1 in January)
Figure 3 : Summary statistics of sample of air_time (Day 1 in January)
The histogram shows that the sample does not follow a normal distribution. We therefore
plot the empirical cumulative distribution function (ECDF) of the sample.
ECDF OF SAMPLE
Figure 3 : ECDF of air_time sample with 95% confidence interval
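To illustrate how an ECDF with a 95% confidence band can be computed, the following is a minimal sketch in base R. It assumes the band is derived from the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, which this report does not state, and it uses simulated right-skewed data as a stand-in for the air_time sample. The names x, eps, lower and upper are illustrative only.

```r
# Simulated stand-in for the air_time sample (assumption: not the real data)
set.seed(1)
x <- rgamma(800, shape = 2, scale = 75)

n     <- length(x)
Fn    <- ecdf(x)                          # empirical CDF as a step function
alpha <- 0.05
eps   <- sqrt(log(2 / alpha) / (2 * n))   # DKW half-width of the 95% band

grid  <- sort(x)
lower <- pmax(Fn(grid) - eps, 0)          # clip the band to [0, 1]
upper <- pmin(Fn(grid) + eps, 1)

# ECDF with the lower and upper band drawn as dashed step lines
plot(Fn, main = "ECDF with 95% DKW band", xlab = "air_time (minutes)")
lines(grid, lower, type = "s", lty = 2)
lines(grid, upper, type = "s", lty = 2)
```

The DKW band is distribution-free, so it fits the non-parametric setting: its width depends only on the sample size and the chosen level, not on the shape of the data.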
We do not know the distribution of the sample. Thus, we proceed with the
non-parametric approach to compare the population mean and the sample mean.
Using the non-parametric bootstrap, we estimated the mean, which comes out
to be 169.6914.
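The non-parametric bootstrap of the mean can be sketched as follows. This is a base R illustration under stated assumptions, not the exact code behind the value reported above: it uses simulated data in place of the air_time sample, an arbitrary number of resamples B, and sample() rather than the bootstrap package loaded in the appendix.

```r
# Simulated stand-in for the air_time sample (assumption: not the real data)
set.seed(2)
x <- rgamma(800, shape = 2, scale = 75)

B <- 2000                                 # number of bootstrap resamples (arbitrary choice)

# Resample the data with replacement B times and record the mean each time
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

boot_mean <- mean(boot_means)             # bootstrap estimate of the mean
boot_se   <- sd(boot_means)               # bootstrap standard error of the mean
boot_ci   <- quantile(boot_means, c(0.025, 0.975))  # 95% percentile interval

hist(boot_means, main = "Bootstrap means", xlab = "mean air_time")
```

The spread of boot_means approximates the sampling variability of the sample mean without assuming any particular distribution for the data.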
Distribution of the bootstrap means :
Figure 4 : Histogram of the means obtained by bootstrap
HYPOTHESIS TESTING FOR DIFFERENCE IN MEANS : WALD TEST
Hypothesis testing is performed to see whether there is any difference between the mean
air_time for the sample and the mean air_time for the population. Since the data do not
appear to be normally distributed, the Wald test is used. The hypotheses are:
H0 : μ1 – μ2 = 0
Ha : μ1 – μ2 ≠ 0
where μ1 is the mean air_time from the bootstrap and μ2 is the mean air_time from the
population.
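For reference, a Wald test for a difference in means can be sketched as below. This is a minimal base R illustration, assuming a two-sample form in which the standard error is estimated as sqrt(s1^2/n1 + s2^2/n2); the report does not state the exact standard error used, and the data here are simulated stand-ins, so the numbers will not match the reported results.

```r
# Simulated stand-ins (assumption: not the real air_time data)
set.seed(3)
population <- rgamma(50000, shape = 2, scale = 75)  # plays the role of all air_time values
smpl       <- rgamma(800,   shape = 2, scale = 85)  # plays the role of the Day 1 sample

# Estimated difference in means and its estimated standard error
diff_hat <- mean(smpl) - mean(population)
se_hat   <- sqrt(var(smpl) / length(smpl) + var(population) / length(population))

# Wald statistic: approximately standard normal under H0 for large samples
W       <- diff_hat / se_hat
p_value <- 2 * pnorm(-abs(W))             # two-sided p-value
reject  <- p_value < 0.05                 # decision at the 5% level
```

The Wald test relies only on the asymptotic normality of the estimator, which is why it remains usable when the data themselves are not normal.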
The results of the test conducted in R are shown in the figure:
Figure 1: Results from Wald test for hypothesis testing on difference of means
As observed, the p-value is much smaller than 0.05. Thus, we have enough evidence
to reject the null hypothesis. Hence, there is a difference between the sample mean
and the population mean of air_time.
BAYESIAN APPROACH
The frequentist approach says that the means are different. We check this conclusion
using a different approach, namely Bayesian analysis.
We test the data for a difference between the means of the sample and the population.
The “Jeffreys” method was used to test the equality of means. We assume that the
variance of the population and of the sample is the same (a reasonable assumption,
since the sample comes from the population itself).
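One way to see what the Jeffreys approach with equal variances implies is the base R sketch below. It assumes a normal model for both groups with the prior p(μ1, μ2, σ²) ∝ 1/σ², under which the posterior of μ1 – μ2 is a shifted and scaled t distribution with n1 + n2 – 2 degrees of freedom. It is an illustration on simulated data, not the Bolstad routine used for the reported result.

```r
# Simulated stand-ins (assumption: not the real air_time data)
set.seed(4)
population <- rgamma(50000, shape = 2, scale = 75)
smpl       <- rgamma(800,   shape = 2, scale = 85)

n1 <- length(smpl)
n2 <- length(population)
df <- n1 + n2 - 2

# Pooled variance under the equal-variance assumption
sp2 <- ((n1 - 1) * var(smpl) + (n2 - 1) * var(population)) / df

# Posterior of (mu1 - mu2): center + scale * t_df
post_center <- mean(smpl) - mean(population)
post_scale  <- sqrt(sp2 * (1 / n1 + 1 / n2))

# 95% credible interval and posterior probability that mu1 > mu2
cred_int      <- post_center + qt(c(0.025, 0.975), df) * post_scale
prob_positive <- 1 - pt((0 - post_center) / post_scale, df)
```

If the 95% credible interval excludes 0, the data favour a difference in means. Under this prior the interval coincides numerically with the classical pooled-variance t interval, although its interpretation differs.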
The result of Bayesian Analysis was as follows:
According to the t-statistic observed above, there was a significant difference between
the population mean and the sample mean.
CONCLUSION
Since the full population of air_time values in the flights data is known, we can look
at the population summary statistics, which are as follows :
Comparing this to the sample statistics :
We can conclude that the sample mean and population mean are significantly different.
We confirm this with tests based on two approaches, namely the frequentist and the
Bayesian approach. In the Wald test, we reject our null hypothesis that the means
are the same. This is confirmed by the Bayesian test statistics.
APPENDIX
R Code :
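# nycflights13 provides the flights data; the other packages support the later analysis
library(nycflights13)
library(ACSWR)
library(bootstrap)
library(Bolstad)
# Population: air_time of every flight, with missing values dropped
airtime <- flights$air_time
airtime <- na.omit(airtime)
# Sample: air_time of the flights on day 1 of January
airtime_sample <- flights$air_time[flights$day == 1 & flights$month == 1]
airtime_sample <- na.omit(airtime_sample)
# Histogram and summary statistics of the population
at <- hist(airtime)
summary(airtime)
sd(airtime)
# Histogram and summary statistics of the sample
hist(airtime_sample)
summary(airtime_sample)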
####ecdf