Higher Diploma in Data Analytics
Programming for Big Data
Semester 2
Advanced Business Data Analysis
Continuous Assessment
Assignment 2
Sue Ryo Kim
1. Abstract
Dublin County consists of four local authorities: Dublin City, Dún Laoghaire-Rathdown, Fingal
and South Dublin. Dublin County has a total area of 922 km² (356 sq mi) and a population of
1,273,069 (CSO 2011 [1]).
The purpose of this analysis is to find out whether there are any differences between the local
authorities in the Dublin region. Dublin is the capital of Ireland; its local authorities have active
public transportation, and travel times are expected to be influenced by traffic.
Rush hour (between 8 and 9 am) is expected to have longer travel times. Travel times are expected
to be shorter early in the morning, before 07:30, and after 09:30, because most schools and
workplaces start at 09:00. Among the four authorities, Dublin City was initially expected to have
the longest travel times.
However, considering the distance travelled from outside Dublin into Dublin City and the availability
of transportation and roads, Dublin City might have shorter travel times than the other authorities.
The time of leaving home to travel to work was compared between the four authorities of Dublin City,
Dún Laoghaire-Rathdown, Fingal and South Dublin, all in the Eastern and Midlands planning region.
The nine time variables are: before 06:30, 06:30 to 07:00, 07:01 to 07:30, 07:31 to 08:00, 08:01 to
08:30, 08:31 to 09:00, 09:01 to 09:30, after 09:30 and not stated.
2. Analysis
2.1. Descriptive statistics
In this section, the four Dublin authorities and the Dublin County area as a whole are analysed.
Figure 1 shows the average travel times for the time bands above in the four Dublin authorities,
together with the average for the whole Dublin County area. Overall, travel to school or work takes
the least time in Dublin City, almost 20 minutes less than the average of the four Dublin authorities.
Dublin City, South Dublin and Fingal show similar patterns in average travel time. Dún
Laoghaire-Rathdown, however, has the shortest travel time before 06:30 (excluding "not stated")
and the longest travel time at 08:01 to 08:30. The other areas show a gradual decrease in travel
time after the peak at 08:01 to 08:30, whereas in Dún Laoghaire-Rathdown the travel time drops
suddenly after the peak.
Figure 1. Average travel time to school or work in Dublin County areas.
For all authorities, the busiest period was 08:01 to 08:30, and travel time dropped rapidly at
09:01 to 09:30. There is also a sudden increase in travel time after 09:30. South Dublin and Fingal
have very similar travel time patterns; their graphs almost overlap from 07:01 onwards.
Fingal has the longest travel time overall, over 23 minutes longer than the Dublin County average.
The difference between Fingal and Dublin City is approximately 44 minutes.
2.2. Graphical display of distributions
Histograms and scatter plots of travel time for Dublin City are shown in Figure 2. All the
histograms are noticeably skewed to the right: the median is smaller than the mean, and the mode is
smaller than the median. In the Q-Q plots, the upper tails lie above the reference line, and the
distance between successive data points increases towards the upper end of the range.
[Figure 1 chart. Y-axis: average travel time to school/work in minutes (0 to 50). Series: Dublin City, Dún Laoghaire-Rathdown, South Dublin, Fingal, Dublin County average.]
Figure 2. Histogram and scatterplot of Travel Time in Dublin City
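The right-skew argument above (mode below median below mean) can be checked numerically. The following is a minimal sketch only: the travel times are hypothetical values chosen for illustration, not the census data used in this report, and `sample_skewness` is a helper name introduced here.

```python
import statistics

# Hypothetical travel times in minutes (NOT the data analysed in the report).
travel_times = [5, 5, 5, 10, 10, 15, 20, 30, 45, 60]

mean = statistics.mean(travel_times)
median = statistics.median(travel_times)
mode = statistics.mode(travel_times)


def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5, using population moments."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


skew = sample_skewness(travel_times)

# A right-skewed sample typically has mode < median < mean and positive skewness.
print(f"mode={mode}, median={median}, mean={mean}, skewness={skew:.3f}")
```

A positive skewness together with the ordering mode < median < mean is consistent with the long upper tail described for the histograms.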
Figures 3 and 4 compare the average travel time to school or work in Dún Laoghaire-Rathdown
before 06:30 and at 08:01 to 08:30.
The average travel time in Dún Laoghaire-Rathdown is the closest to the average travel time
across all four Dublin County areas. Dún Laoghaire-Rathdown also has both the shortest and the
longest travel times among the Dublin County areas. In Figure 3, a binomial distribution is observed.
In the box plot, outliers are observed above the upper quartile.
Most of the travel times fall between 0 and 10 minutes. The average travel time
before 06:30 is around 4 minutes.
Figure 3. Graphical display of distributions in Dún Laoghaire-Rathdown Travel Time before 0630
Figure 4. Graphical display of distributions in Dún Laoghaire-Rathdown Travel Time at 0801 to 0830
Travel time at 08:01 to 08:30 has a higher median than travel time before 06:30, and its range
is wider.
In the box plot in Figure 4, most observations fall in the upper part of the quartile range, which
suggests that the distribution is skewed to the right.
Figure 5. A multiple scatter plot of the four Dublin County areas.
In Figure 5, a multiple scatter plot is used to look for relationships between the four Dublin County
areas. As the plot shows, no patterns appear. This suggests that there is no relationship between
Dublin City, Dún Laoghaire-Rathdown, Fingal and South Dublin. Thus, the four Dublin County areas are
treated as independent.
3. Kruskal-Wallis H Test
The travel times to school or work in the four Dublin County areas are compared with a
Kruskal-Wallis H test in R.
Null hypothesis: H0: μ1 = μ2 = μ3 = μ4 (There is no difference in the average travel time between
the 4 Dublin County areas.)
Alternative hypothesis: H1: not all of μ1, μ2, μ3, μ4 are equal (The average travel time differs
for at least one of the 4 Dublin County areas.)
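For readers unfamiliar with the test, the H statistic can be sketched from pooled ranks. This is an illustrative sketch, not the R code used for this report: the four samples are hypothetical, the function names are introduced here, and no tie correction is applied (R's `kruskal.test` does adjust for ties, so results can differ when ties are present).

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_j**2 / n_j) - 3 (N + 1), no tie correction."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical travel times (minutes) for four areas; NOT the report's data.
area_a = [12, 25, 31]
area_b = [18, 22, 40]
area_c = [15, 28, 35]
area_d = [20, 27, 33]

h_value = kruskal_wallis_h([area_a, area_b, area_c, area_d])
print(f"H = {h_value:.4f}")
```

Under the null hypothesis, H is approximately chi-square distributed with k - 1 degrees of freedom, where k is the number of groups (here 4, so 3 degrees of freedom).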
The value of the test statistic is 1.2503. The tabulated chi-square critical value is 7.814728
(3 degrees of freedom, 5% significance level). Thus, the test statistic does not fall in the
critical region.
We fail to reject the null hypothesis at the 5% significance level (95% confidence). Therefore, we
conclude that there is no statistically significant difference in the average travel time between
the four Dublin County areas.
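The critical value and the decision above can be reproduced with a short sketch. It assumes 3 degrees of freedom (four groups minus one) and a 5% significance level, uses the closed-form chi-square CDF for 3 degrees of freedom, and finds the quantile by bisection; the function names are introduced here for illustration. With SciPy installed, `scipy.stats.chi2.ppf(0.95, 3)` should give the same quantile.

```python
import math


def chi2_cdf_df3(x):
    """CDF of the chi-square distribution with 3 degrees of freedom."""
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def chi2_critical_df3(alpha, lo=0.0, hi=100.0):
    """Bisection for the x at which the CDF equals 1 - alpha."""
    target = 1 - alpha
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_df3(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


h_statistic = 1.2503  # test statistic reported in the text
critical = chi2_critical_df3(0.05)
p_value = 1 - chi2_cdf_df3(h_statistic)
reject_h0 = h_statistic > critical

print(f"critical value = {critical:.6f}")
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Because the statistic is well below the critical value, the sketch reaches the same decision as the text: the null hypothesis is not rejected.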
4. One-way ANOVA
Three planning regions (Eastern and Midlands, North and West, and Southern) are analysed to find
out whether the number of people using public transportation to school or work differs between them.
Null hypothesis: H0: μ1 = μ2 = μ3 (The average number of people using public transportation to school
or work is the same in the three planning regions.)
Alternative hypothesis: H1: not all of μ1, μ2, μ3 are equal (The average number of people using
public transportation differs for at least one of the regions.)
The test result is obtained by ANOVA with SPSS. From Figure 6, the value of the test statistic is
3207.3. The critical value of 6.96 is obtained from an F-distribution table [2] with df1 = 2 and
df2 = 18485 (the largest df2 in the F-table is 1000, so that row is used). The test statistic falls
in the critical region.
We have enough evidence to reject the null hypothesis, with a reported p-value of 0.000 (p < 0.001).
Therefore, we infer that there is at least one difference in the average number of people using
public transportation to school or work among the Eastern and Midlands, North and West, and Southern
planning regions.
Figure 6. ANOVA analysis of the mean number of people using public transportation to school or work.
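The F statistic that SPSS reports is the ratio of the between-group mean square to the within-group mean square. The sketch below illustrates that calculation only; the three samples are hypothetical counts, not the planning-region data, and `one_way_anova_f` is a name introduced here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical numbers of public transport users per small area (NOT real data).
region_a = [40, 50, 60]
region_b = [70, 80, 90]
region_c = [10, 20, 30]

f_stat, df1, df2 = one_way_anova_f([region_a, region_b, region_c])
print(f"F = {f_stat:.2f} with df1 = {df1}, df2 = {df2}")
```

With three groups, df1 is always 2, which matches the df1 used in the text; df2 is the total number of observations minus the number of groups.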
The difference in the number of people using private transportation to school or work among the three
planning regions is also analysed by ANOVA.
Null hypothesis: H0: μ1 = μ2 = μ3 (The average number of people using private transportation to school
or work is the same in the Eastern and Midlands, North and West, and Southern planning regions.)
Alternative hypothesis: H1: not all of μ1, μ2, μ3 are equal (The average number of people using
private transportation differs for at least one of the regions.)
From Figure 7, the value of the test statistic, 50.19, is greater than the critical value of
2.996218.
We have enough evidence to reject the null hypothesis, with a reported p-value of 0.000 (p < 0.001).
Therefore, we conclude that there is at least one difference in the average number of people using
private transportation to school or work among the three planning regions.
Figure 7. ANOVA analysis of the mean number of people using private transportation to school or work.
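The two critical values quoted in this section (6.96 and 2.996218) can be examined with a small sketch. For df1 = 2 the upper-tail probability of the F distribution has the closed form (1 + 2x/df2)^(-df2/2), so the critical value can be written directly. The significance levels below are assumptions, since the text does not state them: 6.96 appears to match α = 0.001 at the table's df2 = 1000 row, while 2.996218 appears to match α = 0.05 at df2 = 18485. The function names are introduced here; with SciPy installed, `scipy.stats.f.ppf` gives the same quantiles.

```python
def f_survival_df1_2(x, df2):
    """Upper-tail probability P(F > x) for an F(2, df2) distribution."""
    return (1 + 2 * x / df2) ** (-df2 / 2)


def f_critical_df1_2(alpha, df2):
    """Critical value x with P(F > x) = alpha for an F(2, df2) distribution."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


# Assumed significance levels; the report does not state them explicitly.
critical_table = f_critical_df1_2(0.001, 1000)    # compare with 6.96 in the text
critical_exact = f_critical_df1_2(0.05, 18485)    # compare with 2.996218 in the text

f_public = 3207.3   # test statistic reported for public transportation
f_private = 50.19   # test statistic reported for private transportation

print(f"F(0.001; 2, 1000)  = {critical_table:.4f}")
print(f"F(0.05; 2, 18485) = {critical_exact:.6f}")
print("reject H0 (public): ", f_public > max(critical_table, critical_exact))
print("reject H0 (private):", f_private > max(critical_table, critical_exact))
```

Both reported test statistics exceed either critical value, so the decision to reject the null hypothesis does not depend on which of the two thresholds is used.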
5. Conclusion
In this study, statistical tests for non-normal data are carried out with Excel, R and SPSS. Travel
times to school or work in the four Dublin County areas of Dublin City, Dún Laoghaire-Rathdown,
Fingal and South Dublin are filtered from the provided "Small Area Population Statistics" data.
Descriptive statistics are used to summarize the data. Histograms, Q-Q plots and a multiple scatter
plot are drawn to visualize the data for the four Dublin County areas.
With the Kruskal-Wallis H test, no statistically significant difference in the average travel time
between the four Dublin County areas is observed.
ANOVA is applied to look for differences in the average number of people using public or private
transportation to school or work between the Eastern and Midlands, North and West, and Southern
planning regions. The test results show that the average number of people using public transportation
to school or work differs in at least one region. The same result is found for private
transportation users.
References
[1] http://www.cso.ie/en/statistics/population/populationofeachprovincecountyandcity2011/
[2] http://www.sjsu.edu/faculty/gerstman/StatPrimer/F-table.pdf (F-distribution table, open resource)
