American Voter Study - by Oojwal Manglik
15/04/2015
Introduction:
The Presidential Elections in the United States are interesting not just for Americans but
also for the rest of the world, owing to the status of USA as a major player in global events.
Since these are characterized by high marketing spends by candidates and scrutiny by the
general public, it is interesting to observe how American voters' perception of a
presidential candidate changes during a President's term in office.
For this study I estimate the proportion of American voters who voted the same way in
two consecutive US presidential elections.
Data:
About the data set
I have used data from the 2012 American National Election Studies (ANES) survey. The
extract used here provides a sample of selected indicators from the 2012 study. The
complete citation of the data used is available in the Citation section.
Data collection methodology
Data for the study was collected over a five-month period (September 2012 to January 2013)
through face-to-face and internet-based interviews. Pre-election interviews were conducted
during the two months before election day, and post-election interviews during the two
months after the election result was declared.
Cases
Each case in the study represents a survey respondent who has reported his voting pattern
in the 2008 and 2012 elections.
Nature of Study
During the collection of the data set, the surveyors only recorded observations and took
no measures that would introduce bias or otherwise alter the response of the voter. Data
was also collected at more than one point during the study. Hence, this project is a
prospective observational study.
Scope of inference - Generalizability
This study takes data only for 2008 and 2012 elections and cannot be generalized for all US
Presidential elections. A similar analysis conducted for multiple pairs of elections can give
greater insight on how the proportion changes between different pairs of elections.
Scope of inference - Bias
The current study may not be generalizable to the complete population of the United States
of America, mainly because the survey over-represents African American and Hispanic
respondents, a demographic mix which is not representative of the complete population.
Scope of inference - Causality
For the current study, there may be some confounding variables which can make it look as
if there is causal relationship between voting patterns of a respondent during multiple
elections. For example, a voter may simply vote for Barack Obama because the voter is a
lifelong Democrat. Such confounding variables may be difficult to identify exhaustively.
As a result, causality cannot be established with certainty.
Variables
The data fields used for this study are as follows:
1. interest_whovote2008: Did R vote for President in 2008
The first variable used in the study is the vote cast by the respondent in the 2008
election. This is a non-ordinal categorical variable with 3 levels - "Barack Obama",
"John McCain" and Others. Additionally, 1394 values are recorded as NA (23.6% of all
values) for various reasons.
#first few records of whovote2008
head(anes$interest_whovote2008)
## [1] Barack Obama Barack Obama Barack Obama <NA> Barack Obama
## [6] <NA>
## Levels: Barack Obama John Mccain Other {Specify}
#Summary of interest_whovote2008 field
VotePat2008<-anes$interest_whovote2008
VotePat2008Sumry<-summary(VotePat2008)
VotePat2008Sumry
##    Barack Obama     John Mccain Other {Specify}            NA's
##            2704            1702             114            1394
2. presvote2012_x: SUMMARY: For whom did R vote for President in 2012
The second variable used in this study is the vote cast by the respondent in 2012. This is
a non-ordinal categorical variable with 3 levels - "Barack Obama", "Mitt Romney" and
Others. Additionally, 1608 values are recorded as NA (27.2% of all values) for various
reasons.
#first few records of presvote2012_x field
head(anes$presvote2012_x)
## [1] <NA> Barack Obama Barack Obama Barack Obama Barack Obama
## [6] <NA>
## Levels: Barack Obama Mitt Romney Other
#Summary of presvote2012_x field
VotePat2012<-anes$presvote2012_x
VotePat2012Sumry<-summary(VotePat2012)
VotePat2012Sumry
## Barack Obama  Mitt Romney        Other         NA's
##         2496         1692          118         1608
Note: The Republican candidates in 2008 and 2012 were not the same (John McCain in
2008 and Mitt Romney in 2012). For this study, the two are treated as the same value.
A respondent who voted for both represents a class of voters who did not change their
voting preference (to become pro-President) between elections as a result of the
President's work during his term.
3. sample_state: SAMPLE - State of Respondent address (used for exploratory analysis)
The third variable used in this study is the state in which the respondent lives. This is a
non-ordinal categorical variable with 51 levels: the 50 states and the District of
Columbia (DC).
#first few records of sample_state field
head(anes$sample_state)
## [1] AL AL AL AL AL AL
## 51 Levels: AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY LA ... WY
The total data set has responses from 5914 American voters. Responses which have value
of NA recorded in any of the fields of interest have been ignored for the exploratory and
statistical analysis in this study.
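The NA shares can be checked directly from the counts in the summaries above. The sketch below also shows what dropping incomplete records looks like. The data frame `toy` is a made-up stand-in, not the ANES data, and all variable names are illustrative assumptions.

```r
# Counts reported in the summaries above
total_responses <- 5914
na_counts <- c(vote2008 = 1394, vote2012 = 1608)

# Share of NA values per variable, in percent
na_share <- round(100 * na_counts / total_responses, 1)
print(na_share)

# Hypothetical toy data frame (NOT the ANES data) to illustrate
# dropping every record that has NA in any field of interest
toy <- data.frame(
  vote2008 = c("Barack Obama", NA, "John Mccain", "Barack Obama"),
  vote2012 = c("Barack Obama", "Mitt Romney", NA, "Mitt Romney")
)
toy_complete <- toy[complete.cases(toy), ]
print(nrow(toy_complete))
```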
Exploratory data analysis:
The following bar plot depicts the number of respondents in the study who voted for the
different Presidential candidates in 2008. It shows that most respondents (2704) voted for
Barack Obama. In terms of percentage, 59.82% of valid responses were for Barack Obama,
37.65% were for John McCain and 2.52% were for Others. Here valid responses are those
which have not been recorded as NA for this variable.
This response pattern is in line with the outcome of the 2008 Presidential election, in
which Barack Obama emerged as the winner. Of the 61.6% of eligible Americans who cast
their vote, 52.9% voted for Barack Obama, 45.7% for John McCain and 1.4% for others
(source of data is Wikipedia page -
http://en.wikipedia.org/wiki/United_States_presidential_election,_2008).
There is, however, a noticeable gap between the sample proportions and the actual
reported proportions: Barack Obama's share in the sample is about 7 percentage points
higher than his actual vote share.
Note : All "NA" values have been omitted for each variable under consideration for the
purpose of this exploratory analysis.
#Bar plot distribution of interest_whovote2008 field
barplot(VotePat2008Sumry[1:3],
        main = "Voting Pattern of Respondents in 2008",
        xlab="Name of Candidates",ylab="Number of Respondents")
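The 2008 percentages quoted above can be reproduced from the summary counts. This is a minimal sketch. The official shares are the ones quoted in the text, and the variable names are illustrative assumptions.

```r
# Valid (non-NA) 2008 responses from the summary above
votes_2008 <- c(Obama = 2704, McCain = 1702, Other = 114)

# Sample shares in percent
sample_pct_2008 <- round(100 * prop.table(votes_2008), 2)

# Official 2008 vote shares as quoted in the text (percent)
official_pct_2008 <- c(Obama = 52.9, McCain = 45.7, Other = 1.4)

# Gap between sample and official shares, in percentage points
gap_2008 <- sample_pct_2008 - official_pct_2008
print(rbind(sample = sample_pct_2008,
            official = official_pct_2008,
            gap = gap_2008))
```

A positive gap means the candidate is over-represented in the sample relative to the actual result.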
The following bar plot depicts the number of respondents in the study who voted for the
different Presidential candidates in 2012. It shows that most respondents (2496) voted for
Barack Obama. In terms of percentage, 57.97% of valid responses were for Barack Obama,
39.29% were for Mitt Romney and 2.74% were for Others. Here valid responses are those
which have not been recorded as NA for this variable.
This response pattern is in line with the outcome of the 2012 Presidential election, in
which Barack Obama emerged as the winner. Of the 58.2% of eligible Americans who cast
their vote, 51.1% voted for Barack Obama, 47.2% for Mitt Romney and 1.7% for others
(source of data is Wikipedia page -
http://en.wikipedia.org/wiki/United_States_presidential_election,_2012).
It is also interesting to observe that the share of each category among respondents
changed by less than 2 percentage points between 2008 and 2012, which is a small shift.
As in 2008, there is a noticeable gap between the sample proportions and the actual
reported proportions: Barack Obama's share in the sample is about 7 percentage points
higher than his actual vote share.
#Bar plot distribution of presvote2012_x field
barplot(VotePat2012Sumry[1:3],
        main = "Voting Pattern of Respondents in 2012",
        xlab="Name of Candidates",ylab="Number of Respondents")
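The shift in respondents' voting patterns between 2008 and 2012 can be worked out from the two summaries. This is a minimal sketch using the reported counts. The labels President, NotPresident and Other are illustrative names for the incumbent (Obama), the Republican candidate and all others.

```r
# Valid responses from the 2008 and 2012 summaries above
votes_2008 <- c(President = 2704, NotPresident = 1702, Other = 114)
votes_2012 <- c(President = 2496, NotPresident = 1692, Other = 118)

# Shares in percent for each year
pct_2008 <- 100 * prop.table(votes_2008)
pct_2012 <- 100 * prop.table(votes_2012)

# Change from 2008 to 2012 in percentage points
shift <- round(pct_2012 - pct_2008, 2)
print(shift)
```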
Now I move on to the comparison of the sample voting patterns of 2008 and 2012 by
looking at the contingency table below. The rows depict the sample voting patterns for
the 2008 election and the columns depict those for the 2012 election. In this table, the
diagonal values represent respondents whose voting behaviour did not change between the
2008 and 2012 elections. Just by looking at the data I can see that a very high proportion
of respondents had the same voting preference in both 2008 and 2012. This suggests that
voting preferences do not change much amongst voters between elections.
#Contingency table to compare voting patterns between 2008 and 2012
ContingencyTabVote<-table(VotePat2008,VotePat2012)
ContingencyTabVote
##                  VotePat2012
## VotePat2008       Barack Obama Mitt Romney Other
##   Barack Obama            2077         184    34
##   John Mccain               95        1325    35
##   Other {Specify}           22          32    37
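The share of respondents on the diagonal can be computed directly from this table. The sketch below rebuilds the table as a matrix from the printed counts and also reports, per 2008 choice, the fraction who made the corresponding choice in 2012. The names `tab`, `share_same` and `retention` are illustrative.

```r
# Contingency table rebuilt from the printed counts
tab <- matrix(
  c(2077,  184, 34,
      95, 1325, 35,
      22,   32, 37),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    VotePat2008 = c("Barack Obama", "John Mccain", "Other {Specify}"),
    VotePat2012 = c("Barack Obama", "Mitt Romney", "Other")
  )
)

# Respondents on the diagonal voted the same way in both elections
same_vote <- sum(diag(tab))
share_same <- same_vote / sum(tab)

# Per-row retention: diagonal count divided by the 2008 row total
retention <- diag(tab) / rowSums(tab)

print(same_vote)
print(round(share_same, 4))
print(round(retention, 3))
```

The row-wise view shows that consistency is high for the two major-party rows and much lower for the Other row, which the overall proportion alone does not reveal.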
The following bar plot depicts the state-wise voting pattern of the respondents for the
2008 election. Visually it is evident that voting patterns vary from state to state. One
limitation of this inference is that the number of respondents is not the same for each
state, and for certain states it is very small. However, the inference is in line with the
traditional view that certain states in the US lean towards a particular political party
(Democratic or Republican).
#Compare statewise voting patterns in 2008
#RespState holds the state of each respondent
RespState<-anes$sample_state
ContingencyTabPat2008<-table(VotePat2008,RespState)
barplot(ContingencyTabPat2008,legend=rownames(ContingencyTabPat2008),
        main = "State Wise Voting Pattern of Respondents in 2008",
        xlab="States",ylab="Number of Respondents")
The following bar plot depicts the state-wise voting pattern of the respondents for the
2012 election. This plot also shows that voting patterns vary from state to state. There
is also a certain consistency in the voting patterns between this plot and the previous
one. This could possibly be attributed to the fact that the same respondents reported
their votes for both 2008 and 2012. It is also in line with the traditional view that
certain states in the US lean towards a particular political party (Democratic or
Republican).
#Compare statewise voting patterns in 2012
ContingencyTabPat2012<-table(VotePat2012,RespState)
barplot(ContingencyTabPat2012,legend=rownames(ContingencyTabPat2012),
        main = "State Wise Voting Pattern of Respondents in 2012",
        xlab="States",ylab="Number of Respondents")
Inference:
For the statistical analysis, my objective is to compare 2 paired categorical variables which
depict the voting behaviour of respondents in the 2008 and 2012 elections. Let us first
proceed with the 95% confidence interval analysis.
Confidence Interval
The statistical parameter of interest chosen for this purpose is the proportion. The
objective of this analysis is to find the 95% confidence interval for the proportion of voters
who have voted for either the president (voted Barack Obama in both elections) or not the
president (voted John McCain - 2008 and voted Mitt Romney - 2012, voted other in both
elections) in the two elections. In other words, this is the proportion of voters whose
voting pattern remained consistent across the two elections.
I begin this analysis by converting the available sample data from 2008 and 2012 to the
same levels for comparison. The levels chosen for the analysis are President, Not President
and Other. Even though the category Other also constitutes a vote not for Barack Obama, for
now I have classified it separately.
#Converting voter data into the same levels for 2008 and 2012
#revalue() comes from the plyr package
library(plyr)
Vote2008<-revalue(VotePat2008, c("Barack Obama"="President",
  "John Mccain"="Not President","Other {Specify}"="Other"))
Vote2012<-revalue(VotePat2012, c("Barack Obama"="President",
  "Mitt Romney"="Not President"))
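The same recoding can also be done in base R without an extra package. This is a sketch on a small made-up factor that uses the 2008 level names shown earlier. Assigning a named list to `levels()` maps each new level name to the old one.

```r
# Toy factor (NOT the ANES data) with the 2008 level names shown earlier
vote2008 <- factor(c("Barack Obama", "John Mccain", "Other {Specify}",
                     NA, "Barack Obama"))

# Named list: new level name = old level name
levels(vote2008) <- list(
  "President"     = "Barack Obama",
  "Not President" = "John Mccain",
  "Other"         = "Other {Specify}"
)

# NA entries stay NA after recoding
print(table(vote2008, useNA = "ifany"))
```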
Now I consolidate the two vectors into a single vector that records the comparative voting
pattern in 2008 and 2012. If the voting pattern in the two years is the same, a value of
1 is recorded. If the voting pattern in the two years is not the same, a value of 0 is
recorded. All responses with an NA value are omitted from the final vector. The
distribution of the final comparison is shown in the table output for VoteSamp below. This
vector is now a binary variable (success/failure or 1/0), with 1 meaning that the
respondent voted similarly in 2008 and 2012 and 0 meaning that the respondent did not.
#Initializing the vector to compare voting between 2008 and 2012
VoteSamp<-rep(NA,length(Vote2008))
#Populating the comparison vector VoteSamp
#If the 2008 or the 2012 value is NA then assign NA. These records are removed later
#If the 2008 and 2012 values are equal (ex. voter cast his vote for Barack Obama
#in both elections) then assign 1
#If the 2008 and 2012 values are not equal (ex. voter cast his vote for different
#candidates in the two elections) then assign 0
for (i in seq_along(Vote2008)) {
  if (is.na(Vote2008[i])||is.na(Vote2012[i])) {
    VoteSamp[i]<-NA
  } else if (Vote2008[i]!=Vote2012[i]) {
    VoteSamp[i]<- 0
  } else {
    VoteSamp[i]<- 1
  }
}
#Removing all samples for which response NA has been recorded
VoteSamp<-na.omit(VoteSamp)
#VoteSamp has now been converted into a binary distribution
table(VoteSamp)
## VoteSamp
## 0 1
## 402 3439
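The element-by-element loop above can also be written as a single vectorised comparison. This is a sketch on made-up factors, not the ANES data. Comparing the character values gives NA wherever either vote is missing, which matches the loop's first branch.

```r
# Toy recoded factors (NOT the ANES data) sharing the same levels
lv <- c("President", "Not President", "Other")
v2008 <- factor(c("President", "Not President", "Other", NA, "President"),
                levels = lv)
v2012 <- factor(c("President", "President", "Other", "President", NA),
                levels = lv)

# 1 = same vote, 0 = different vote, NA = either vote missing
same <- as.integer(as.character(v2008) == as.character(v2012))

# Drop the records where either vote is missing
same_complete <- same[!is.na(same)]

print(same)
print(table(same_complete))
```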
This table shows that 89.53% of the respondents (3439 of 3841) voted similarly in 2008
and 2012. This is the sample proportion (p-hat) of our study.
#Calculating sample proportions
SampProportionSame = table(VoteSamp)[2]/length(VoteSamp)
SampProportionNotSame = table(VoteSamp)[1]/length(VoteSamp)
SampProportionSame
## 1
## 0.8953398
Next I construct the sampling distribution of sample proportions based on the available
VoteSamp vector. But before I do that, I calculate the sample size needed for a 1%
margin of error at 95% confidence. For this calculation I have assumed the worst case
scenario, in which the proportions of success and failure are 50% each, mainly because no
reference proportion from a reliable past study is currently available. Based on this, the
sample size required is 9604.
#Calculating the sample size required for 1% margin of error for a 95%
#confidence interval assuming equal probability of success & failure
zvalue<-qnorm(0.975)
n=(zvalue^2)*(0.5)*(0.5)/(0.01^2)
n
## [1] 9603.647
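The calculation above applies n = z^2 * p * (1 - p) / m^2 with p = 0.5 and m = 0.01. The sketch below wraps it in a small helper to show how the required size depends on the assumed proportion. The function name `required_n` and its arguments are illustrative, not part of the original analysis.

```r
# Sample size needed for a given margin of error on a proportion
# p          assumed proportion of successes
# margin     desired margin of error (e.g. 0.01 for 1%)
# conf_level confidence level of the interval
required_n <- function(p, margin, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  ceiling(z^2 * p * (1 - p) / margin^2)
}

# Worst case used in the text: p = 0.5
n_worst <- required_n(0.5, 0.01)

# Using the observed sample proportion (3439 of 3841) instead
n_observed <- required_n(3439 / 3841, 0.01)

print(c(worst_case = n_worst, observed_p = n_observed))
```

Because p * (1 - p) is largest at p = 0.5, any other assumed proportion gives a smaller required size. With the observed proportion, the requirement falls below the 3841 complete responses available.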
Now I create the sampling distribution. It consists of 500 resamples, each containing
9604 observations drawn from VoteSamp. Sampling is done with replacement so that each
draw is independent. A summary and histogram of the resulting sampling distribution are
given below.
#Creating the sampling distribution
SamplingDistribution_Proportion<-rep(NA,500)
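for (i in seq_along(SamplingDistribution_Proportion)) {
  #ceiling() turns the non-integer n (9603.647) into a draw size of 9604
  Samp<-sample(VoteSamp,ceiling(n),replace=TRUE)
  #VoteSamp is coded 0/1, so the mean is the proportion of 1s
  SamplingDistribution_Proportion[i] = mean(Samp)
}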
#Summary of sampling distribution
summary(SamplingDistribution_Proportion)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8864 0.8931 0.8953 0.8952 0.8971 0.9042
#Histogram of sampling distribution
hist(SamplingDistribution_Proportion)
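The resampling step can be reproduced without the survey file by rebuilding the 0/1 vector from the counts in the VoteSamp table. This is a sketch. The seed is an arbitrary choice made only for repeatability, so the summary will not match the figures above digit for digit.

```r
set.seed(2012)  # arbitrary seed (an assumption), only for repeatability

# Rebuild the binary vector from the reported counts: 402 zeros, 3439 ones
vote_samp <- rep(c(0, 1), times = c(402, 3439))

n_draw <- 9604  # observations per resample, as in the text
n_rep  <- 500   # number of resamples, as in the text

# Each replicate draws with replacement and records the proportion of 1s
boot_props <- replicate(n_rep,
                        mean(sample(vote_samp, n_draw, replace = TRUE)))

print(summary(boot_props))
print(sd(boot_props))
```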
From the histogram of the sampling distribution, I can see that it is nearly normal
and centred around the sample proportion that I calculated earlier. There is also very
little skew to the left or right. But before I apply the central limit theorem, let us
evaluate each required condition:
1. Independence - The available sample vector VoteSamp consists of 3841 respondents.
This is <10% of the American voter population. While constructing the sampling
distribution of the proportion, I have sampled with replacement. This ensures that the
condition of independence is met for our sampling distribution.
2. Skewness - I have 3439 successes and 402 failures in our sample. That means I have at
least 10 successes and 10 failures, which satisfies the success-failure requirement.
With this I can conclude that the sampling distribution of the proportion is not skewed
and is approximately normal, as required by the central limit theorem. The histogram
supports the same conclusion.
It is evident that we have an adequate number of observations for our analysis.
Consequently we go ahead and use the normal (z) distribution for our analysis (no need
for the t-distribution here). Now that the conditions have been met, I calculate the mean
of the sampling distribution that has been created.
#Mean of sampling distribution
SamplingDistribution_Proportion_Mean<-mean(SamplingDistribution_Proportion)
I also calculate the margin of error for a 95% confidence interval, that is, the critical
z value multiplied by the standard error of the proportion. The code stores it in the
variable SE. The sample size 'n' was chosen so that this margin of error is at most 1%.
#Margin of error (z value times standard error) for the 95% confidence interval, stored as SE
SE<-(zvalue)*(SamplingDistribution_Proportion_Mean^0.5)*
  ((1-SamplingDistribution_Proportion_Mean)^0.5)/(n^0.5)
SE
## [1] 0.006125668
The confidence interval is calculated as the sampling distribution mean +/- the margin of
error computed above (the SE value).
LowerConfidenceLimit <- SamplingDistribution_Proportion_Mean - SE
UpperConfidenceLimit <- SamplingDistribution_Proportion_Mean + SE
ConfidenceIntervalP <- c(LowerConfidenceLimit,UpperConfidenceLimit)
ConfidenceIntervalP
## [1] 0.8890837 0.9013351
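As a cross-check under a different assumption, the interval can also be computed from the 3841 observed responses directly, using p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n). This is a sketch, not the method used above. It takes n as the observed count rather than the 9604 resampled size, so it gives a somewhat wider interval. `prop.test()` from base R's stats package is included for comparison and returns a score-type interval.

```r
# Observed counts from the VoteSamp table
successes <- 3439
n_obs <- 3841
p_hat <- successes / n_obs

# Wald interval: p_hat +/- z * standard error
z_star <- qnorm(0.975)
se_hat <- sqrt(p_hat * (1 - p_hat) / n_obs)
wald_ci <- p_hat + c(-1, 1) * z_star * se_hat

# Score interval from prop.test, without continuity correction
score_ci <- prop.test(successes, n_obs, correct = FALSE)$conf.int

print(round(wald_ci, 4))
print(round(as.numeric(score_ci), 4))
```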
Hypothesis Testing
Now I move on to the hypothesis testing. I want to check whether the population
proportion of people who voted similarly in the 2008 and 2012 elections could be 91%.
For this I construct the following hypotheses:
H0: PopulationProportion = 91%
Ha: PopulationProportion != 91%
For these hypotheses, we perform a two-sided test using the normal distribution. We
reuse the sampling distribution of the proportion created earlier, which has already been
established to be nearly normal. First we calculate the z value for the null value of
91%:
NullVal<-0.91
zvalue<-(SamplingDistribution_Proportion_Mean-NullVal)/SE
zvalue
## [1] -2.414526
Next I calculate the p value associated with this z score. Since this is a two-tailed
test, the area of interest is the area under the curve in both tails, for which
abs(zscore) > abs(zvalue)
Below I calculate the area in the lower tail and multiply it by 2 for the two-tailed
test.
pval<-2*pnorm(-abs(zvalue))
pval
## [1] 0.01575569
Since the obtained p value is small (about 1.6%, below the 5% significance level), we can
reject the null hypothesis that the population proportion of Americans who voted
similarly in the 2008 and 2012 US Presidential elections is 91%.
The result of the hypothesis test is in line with the 95% confidence interval identified
earlier: the null value of 91% lies outside the interval of 88.9% to 90.1%. Hence we can
say that the two findings are consistent.
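The SE variable in the code above already includes the 1.96 multiplier, so the z value obtained there is smaller in magnitude than the usual one-proportion z statistic. The sketch below is a cross-check under different assumptions, not the calculation used above. It divides by the unscaled standard error under the null, sqrt(p0 * (1 - p0) / n), and takes n as the 3841 observed responses. `prop.test()` is included because its chi-squared statistic without continuity correction equals the square of this z.

```r
# Observed counts from the VoteSamp table and the null value from the text
successes <- 3439
n_obs <- 3841
p_hat <- successes / n_obs
p_null <- 0.91

# Standard error under the null hypothesis
se_null <- sqrt(p_null * (1 - p_null) / n_obs)

# z statistic and two-sided p value
z_stat <- (p_hat - p_null) / se_null
p_two_sided <- 2 * pnorm(-abs(z_stat))

# Equivalent score test from base R (statistic equals z_stat^2)
score_test <- prop.test(successes, n_obs, p = p_null, correct = FALSE)

print(z_stat)
print(p_two_sided)
print(score_test$p.value)
```

Under these assumptions the null hypothesis of 91% is also rejected at the 5% level, so the conclusion does not change.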
Conclusion:
Based on this analysis, I can conclude with 95% confidence that 88.9% to 90.1% of
American voters voted consistently in the 2008 and 2012 presidential elections, the
elections for the first and second term of the same President. The margin of error is
within the 1% target.
For the hypothesis testing, we can reject the null hypothesis that the population proportion
of Americans who voted similarly in the 2008 and 2012 US Presidential elections is 91%.
In the future, this methodology can be repeated for multiple pairs of US Presidential
elections to see if there is any statistical consistency in the findings over the years.
The main learning from this exercise has been a practical insight into how statistical
techniques can be used to strengthen our ability to draw conclusions and inferences.
Citation:
Full details of this data set are available at the following links:
Information on the study http://www.electionstudies.org/
Study Codebook
https://d396qusza40orc.cloudfront.net/statistics%2Fproject%2Fanes1.html
Data Set Used http://bit.ly/dasi_anes_data
Additionally, the following Wikipedia pages were referenced for the actual results of
the presidential elections in 2008 and 2012:
http://en.wikipedia.org/wiki/United_States_presidential_election,_2008
http://en.wikipedia.org/wiki/United_States_presidential_election,_2012

2013 Rockefeller Center NH State of the State Poll
 
Gross_Nedler_Ukani_Final Paper
Gross_Nedler_Ukani_Final PaperGross_Nedler_Ukani_Final Paper
Gross_Nedler_Ukani_Final Paper
 
Zak Baker Capstone paper
Zak Baker Capstone paperZak Baker Capstone paper
Zak Baker Capstone paper
 
Election 2012
Election 2012Election 2012
Election 2012
 
1 Quant Tools for Bus & Econ – ECON 220-02 (11273) .docx
1  Quant Tools for Bus & Econ – ECON 220-02 (11273) .docx1  Quant Tools for Bus & Econ – ECON 220-02 (11273) .docx
1 Quant Tools for Bus & Econ – ECON 220-02 (11273) .docx
 
Giving You the Edge - The Science of Winning Elections
Giving You the Edge - The Science of Winning Elections Giving You the Edge - The Science of Winning Elections
Giving You the Edge - The Science of Winning Elections
 
Nov 2012 tracking charts in new template
Nov 2012 tracking charts in new templateNov 2012 tracking charts in new template
Nov 2012 tracking charts in new template
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 

Complete Study

American Voter Study - by Oojwal Manglik
15/04/2015
Introduction:
The Presidential Elections in the United States are interesting not just for Americans but also for the rest of the world, owing to the status of the USA as a major player in global events. Since these elections are characterized by high marketing spends by candidates and scrutiny by the general public, it is interesting to observe how the American voter's perception of the presidential candidate changes during a President's term in office.
For this study I would like to analyse the proportion of American voters who have a similar voting pattern between two consecutive US presidential elections.
Data:
About the data set
I have used data from the American National Election Studies (ANES) survey conducted in 2012. The extract used provides a sample of selected indicators from the 2012 study. The complete citation of the data used is available in the Citation section.
Data collection methodology
Data for the study was collected over a 5 month period (September 2012-January 2013) through face to face and internet based interviews. For the pre-election data, interviews were conducted during the 2 months before election day. For the post-election data, interviews were conducted during the 2 months after the election result was declared.
Cases
Each case in the study represents a survey respondent who has reported his or her voting pattern in the 2008 and 2012 elections.
Nature of Study
During the collection of the data set, surveyors only recorded the responses given to them and took no measures to introduce any bias or influence that would alter the response of the voter. Data was also collected multiple times during the study. Hence, the proposed project is a prospective observational study.
Scope of inference - Generalizability
This study takes data only for the 2008 and 2012 elections and cannot be generalized to all US Presidential elections.
A similar analysis conducted for multiple pairs of elections can give greater insight on how the proportion changes between different pairs of elections.
Scope of inference - Bias
The current study may not be generalizable to the complete population of the United States of America, mainly because the survey has a majority of African American and Hispanic respondents, a demographic mix which is not representative of the complete population.
Scope of inference - Causality
For the current study, there may be some confounding variables which can make it look as if there is a causal relationship between the voting patterns of a respondent across multiple elections. For example, a voter may simply vote for Barack Obama because the voter is a lifelong Democrat. Such confounding variables may be difficult to identify exhaustively. As a result, causality cannot be established with certainty.
Variables
The data fields used for this study are as follows:
1. interest_whovote2008: Whom R voted for in the 2008 Presidential election
The first variable used in the study is the vote cast by the respondent in the 2008 election. This is a non-ordinal categorical variable with 3 levels - "Barack Obama", "John McCain" and Others. Additionally 1394 values are recorded as NA (23.6% of all values) for various reasons.
#first few records of interest_whovote2008 field
head(anes$interest_whovote2008)
## [1] Barack Obama Barack Obama Barack Obama <NA>         Barack Obama
## [6] <NA>
## Levels: Barack Obama John Mccain Other {Specify}
#Summary of interest_whovote2008 field
VotePat2008<-anes$interest_whovote2008
VotePat2008Sumry<-summary(VotePat2008)
VotePat2008Sumry
##    Barack Obama     John Mccain Other {Specify}            NA's
##            2704            1702             114            1394
2. presvote2012_x: SUMMARY: For whom did R vote for President in 2012
The second variable used in this study is the vote cast by the respondent in the 2012 election. This is a non-ordinal categorical variable with 3 levels - "Barack Obama", "Mitt Romney" and Others. Additionally 1608 values are recorded as NA (27.2% of all values) for various reasons.
#first few records of presvote2012_x field
head(anes$presvote2012_x)
## [1] <NA>         Barack Obama Barack Obama Barack Obama Barack Obama
## [6] <NA>
## Levels: Barack Obama Mitt Romney Other
#Summary of presvote2012_x field
VotePat2012<-anes$presvote2012_x
VotePat2012Sumry<-summary(VotePat2012)
VotePat2012Sumry
## Barack Obama  Mitt Romney        Other         NA's
##         2496         1692          118         1608
Note: The Republican candidates in 2008 and 2012 were not the same (John McCain in 2008 and Mitt Romney in 2012). For this study, the two are assumed to be the same value. A respondent who voted for John McCain and then for Mitt Romney represents a class of voters who did not change their voting preference (to become pro-president) between elections as a result of the President's work during his term.
3. sample_state: SAMPLE- State of Respondent address (used for exploratory analysis)
The third variable used in this study is the state from which the respondent comes. This is a non-ordinal categorical variable with 51 levels, one for each of the 50 states plus the District of Columbia (DC).
#first few records of sample_state field
head(anes$sample_state)
## [1] AL AL AL AL AL AL
## 51 Levels: AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY LA ... WY
The total data set has responses from 5914 American voters. Responses which have a value of NA recorded in any of the fields of interest have been ignored for the exploratory and statistical analysis in this study.
Exploratory data analysis:
The following bar plot depicts the number of respondents in the study who voted for the different Presidential candidates in 2008. It shows that most respondents (2704 to be precise) voted for Barack Obama. In terms of percentage, 59.82% of valid responses were for Barack Obama, 37.65% were for John McCain and 2.52% were for Others. Here valid responses are those responses which have not been categorized as NA for this variable. This response pattern is in line with the outcome of the 2008 Presidential election, in which Barack Obama emerged as the winner.
Of the 61.6% of eligible Americans who cast their vote, Barack Obama secured 52.9% of the votes, John McCain 45.7% and others 1.4% (source of data is Wikipedia page - http://en.wikipedia.org/wiki/United_States_presidential_election,_2008). There is a noticeable difference between the observed sample proportions and the actual reported proportions (about 7 percentage points for Barack Obama).
Note: All "NA" values have been omitted for each variable under consideration for the purpose of this exploratory analysis.
#Bar plot distribution of interest_whovote2008 field
barplot(VotePat2008Sumry[1:3],main = "Voting Pattern of Respondents in 2008",xlab="Name of Candidates",ylab="Number of Respondents")
The following bar plot depicts the number of respondents in the study who voted for the different Presidential candidates in 2012. It shows that most respondents (2496 to be precise) voted for Barack Obama. In terms of percentage, 57.97% of valid responses were for Barack Obama, 39.29% were for Mitt Romney and 2.74% were for Others. Here valid responses are those responses which have not been categorized as NA for this variable. This response pattern is in line with the outcome of the 2012 Presidential election, in which Barack Obama emerged as the winner. Of the 58.2% of eligible Americans who cast their vote, Barack Obama secured 51.1% of the votes, Mitt Romney 47.2% and others 1.7% (source of data is Wikipedia page - http://en.wikipedia.org/wiki/United_States_presidential_election,_2012). It is also interesting to observe that the respondents' voting patterns in 2008 and 2012 differ by less than 2 percentage points in each category, which is a small difference. As in 2008, there is a noticeable difference between the observed sample proportions and the actual reported proportions (about 7 percentage points for Barack Obama).
#Bar plot distribution of presvote2012_x field
barplot(VotePat2012Sumry[1:3],main = "Voting Pattern of Respondents in 2012",xlab="Name of Candidates",ylab="Number of Respondents")
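The percentages quoted for both elections can be reproduced from the summary counts alone. The sketch below is only an illustration: the vector names (votes_2008, official_2008 and so on) are hypothetical and not part of the original analysis, and the official shares are simply the Wikipedia figures quoted in the text.

```r
# Valid (non-NA) response counts copied from the two summaries in the text
votes_2008 <- c("Barack Obama" = 2704, "John McCain" = 1702, "Other" = 114)
votes_2012 <- c("Barack Obama" = 2496, "Mitt Romney" = 1692, "Other" = 118)

# Share of valid responses per candidate, in percent
pct_2008 <- round(100 * prop.table(votes_2008), 2)
pct_2012 <- round(100 * prop.table(votes_2012), 2)

# Official vote shares quoted in the text, in percent (hypothetical names)
official_2008 <- c(52.9, 45.7, 1.4)
official_2012 <- c(51.1, 47.2, 1.7)

# Gap between sample share and official share, in percentage points
gap_2008 <- pct_2008 - official_2008
gap_2012 <- pct_2012 - official_2012

print(rbind(sample = pct_2008, official = official_2008, gap = gap_2008))
print(rbind(sample = pct_2012, official = official_2012, gap = gap_2012))
```

Working in percentage points makes the gap between the sample and the official result directly comparable across the two elections.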
Now I move on to the comparison of the sample voting patterns of 2008 and 2012 by looking at the contingency table below. The rows depict the sample voting patterns for the 2008 election and the columns depict the sample voting patterns for the 2012 election. In this table, the diagonal values represent respondents whose voting behaviour did not change between the 2008 and 2012 elections. Just by looking at the data I can see that a very high proportion of respondents (3439 of 3841) had similar voting preferences in both 2008 and 2012. This suggests that voting preferences do not change much amongst voters between elections.
#Contingency table to compare voting patterns between 2008 and 2012
ContingencyTabVote<-table(VotePat2008,VotePat2012)
ContingencyTabVote
##                  VotePat2012
## VotePat2008       Barack Obama Mitt Romney Other
##   Barack Obama            2077         184    34
##   John Mccain               95        1325    35
##   Other {Specify}           22          32    37
The following bar plot depicts the state wise voting pattern of the respondents for the 2008 election. Visually it is evident that voting patterns vary from state to state. One limitation of this inference is that the number of respondents available for each state is not the same, and for certain states the number of respondents is very small. However, this inference is in line with the traditional view that certain states in the US are affiliated to certain political parties (Democrat/Republican).
#Compare statewise voting patterns in 2008
#RespState is not defined in the code shown; it is assumed here to be the sample_state field
RespState<-anes$sample_state
ContingencyTabPat2008<-table(VotePat2008,RespState)
barplot(ContingencyTabPat2008,legend=rownames(ContingencyTabPat2008),main = "State Wise Voting Pattern of Respondents in 2008",xlab="States",ylab="Number of Respondents")
The following bar plot depicts the state wise voting pattern of the respondents for the 2012 election. This plot also shows that voting patterns vary from state to state. There is also a certain consistency in the voting patterns between this plot and the previous plot. This could possibly be attributed to the fact that the same respondents reported their votes for both 2008 and 2012. However, this inference is in line with the traditional view that certain states in the US are affiliated to certain political parties (Democrat/Republican).
#Compare statewise voting patterns in 2012
ContingencyTabPat2012<-table(VotePat2012,RespState)
barplot(ContingencyTabPat2012,legend=rownames(ContingencyTabPat2012),main = "State Wise Voting Pattern of Respondents in 2012",xlab="States",ylab="Number of Respondents")
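Because the number of respondents differs from state to state, raw counts make small states hard to read. One possible way to address this, sketched below, is to plot within-state shares instead of counts. The counts and the state labels S1 to S3 are made up for illustration and are not ANES figures.

```r
# Made-up counts for three hypothetical states (not ANES figures).
# matrix() fills column by column, so each group of three is one state.
counts <- matrix(c(120, 60, 5,
                    15, 30, 1,
                    40, 38, 2),
                 nrow = 3,
                 dimnames = list(Vote  = c("President", "Not President", "Other"),
                                 State = c("S1", "S2", "S3")))

# margin = 2 divides every column by its own total, giving within-state shares
shares <- prop.table(counts, margin = 2)
print(round(shares, 3))

# Every bar now has height 1, so states with few respondents stay readable
barplot(shares, legend.text = rownames(shares),
        main = "Within-State Share of Votes (illustrative data)",
        xlab = "States", ylab = "Share of Respondents")
```

The trade-off is that the plot no longer shows how many respondents each state contributed, so it complements the count plots rather than replacing them.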
Inference:
For the statistical analysis, my objective is to compare 2 paired categorical variables which depict the voting behaviour of respondents in the 2008 and 2012 elections. Let us first proceed with the 95% confidence interval analysis.
Confidence Interval
The statistical parameter of interest chosen for this purpose is the proportion. The objective of this analysis is to find the 95% confidence interval for the proportion of voters who have voted for either the president (voted Barack Obama in both elections) or not the president (voted John McCain in 2008 and Mitt Romney in 2012, or voted other in both elections) in the two elections. In other words, this is the proportion of voters whose voting pattern has remained consistent in the two elections.
I begin this analysis by converting the available sample data from 2008 and 2012 to the same levels for comparison. The levels chosen for our analysis are President, Not President and Other. Even though the category Other also constitutes a vote not for Barack Obama, for now I have classified it separately.
#revalue() comes from the plyr package
library(plyr)
#Converting voter data into the same levels for 2008 and 2012
Vote2008<-revalue(VotePat2008, c("Barack Obama"="President","John Mccain"="Not President","Other {Specify}"="Other"))
Vote2012<-revalue(VotePat2012, c("Barack Obama"="President","Mitt Romney"="Not President"))
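The same recoding can also be done in base R, without the plyr package, by assigning a named list to levels(). The sketch below uses a small toy factor rather than the ANES data, and the variable name x is hypothetical.

```r
# Toy factor carrying the 2008 level names used in the text (not the ANES data)
x <- factor(c("Barack Obama", "John Mccain", "Other {Specify}", NA, "Barack Obama"))

# A named list maps each new level (the name) to the old level(s) it replaces
levels(x) <- list("President"     = "Barack Obama",
                  "Not President" = "John Mccain",
                  "Other"         = "Other {Specify}")

print(levels(x))
print(table(x, useNA = "ifany"))
```

Whichever approach is used, both factors need to end up with the same set of levels, because R refuses to compare two factors with == or != when their level sets differ.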
Now I consolidate the two vectors into a single vector that records the comparative voting pattern in 2008 and 2012. If the voting pattern in the two years is the same, a value of 1 is recorded. If the voting pattern in the two years is not the same, a value of 0 is recorded. All responses with an NA value are omitted from the final vector. The distribution of the final comparison is shown in the table output for VoteSamp below. This vector is now a binary variable with only success/failure (1/0) outcomes, 1 meaning that the respondent voted similarly in 2008 and 2012 and 0 meaning that the respondent did not.
#Initializing the vector to compare voting between 2008 and 2012
VoteSamp<-rep(NA,length(Vote2008))
#Populating the comparison vector VoteSamp
#If the vote in 2008 or 2012 is NA then assign NA. These records are removed later
#If the votes in 2008 and 2012 are equal (ex. Voter cast his vote for Barack Obama in both elections) then assign 1
#If the votes in 2008 and 2012 are not equal (ex. Voter cast his vote for different candidates in the two elections) then assign 0
for (i in 1:length(Vote2008)) {
  if (is.na(Vote2008[i])|is.na(Vote2012[i])) {
    VoteSamp[i]<-NA
  } else if (Vote2008[i]!=Vote2012[i]) {
    VoteSamp[i]<-0
  } else {
    VoteSamp[i]<-1
  }
}
#Removing all samples for which response NA has been recorded
VoteSamp<-na.omit(VoteSamp)
#VoteSamp has now been converted into a binary distribution
table(VoteSamp)
## VoteSamp
##    0    1
##  402 3439
This table shows that 89.53% of the respondents voted similarly in 2008 and 2012. This is the sample proportion (p-hat) of our study.
#Calculating sample proportions
SampProportionSame = table(VoteSamp)[2]/length(VoteSamp)
SampProportionNotSame = table(VoteSamp)[1]/length(VoteSamp)
SampProportionSame
##         1
## 0.8953398
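As a cross-check, the same sample proportion can be read straight off the 2008 versus 2012 contingency table, since the diagonal cells are the respondents who voted the same way twice. The sketch below re-enters the table counts by hand. The names tab, v08 and v12 are hypothetical, and the second part uses toy data only to show a vectorized equivalent of the loop.

```r
# Contingency table counts copied from the text (rows: 2008, columns: 2012)
lv <- c("President", "Not President", "Other")
tab <- matrix(c(2077,  184, 34,
                  95, 1325, 35,
                  22,   32, 37),
              nrow = 3, byrow = TRUE,
              dimnames = list(Vote2008 = lv, Vote2012 = lv))

n_same  <- sum(diag(tab))   # same choice in both elections
n_total <- sum(tab)         # respondents with a valid vote in both years
p_hat   <- n_same / n_total
print(c(n_same = n_same, n_total = n_total, p_hat = p_hat))

# Vectorized equivalent of the loop, shown on toy factors
v08 <- factor(c("President", "Other", NA, "Not President"), levels = lv)
v12 <- factor(c("President", "President", "Other", "Not President"), levels = lv)
same <- as.integer(v08 == v12)   # NA wherever either vote is missing
same <- same[!is.na(same)]
print(same)
```

The diagonal sum and the table total match the 3439 and 3841 obtained from VoteSamp, which confirms that the loop and the contingency table describe the same respondents.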
Next I construct the sampling distribution of sample proportions based on the available VoteSamp vector. But before I do that, I calculate the sample size needed for a 1% margin of error. For this calculation, I have assumed the worst case scenario, with the proportions of success and failure at 50% each. This is mainly because no reference proportion from a reliable past study is currently available. Based on this, the sample size required is 9604.
#Calculating the sample size required for 1% margin of error for a 95% confidence interval assuming equal probability of success & failure
zvalue<-qnorm(0.975)
n=(zvalue^2)*(0.5)*(0.5)/(0.01^2)
n
## [1] 9603.647
Now I create the sampling distribution. The sampling distribution is built from 500 samples, each consisting of 9604 observations. Sampling is done with replacement, which is necessary because each sample is larger than the 3841 available responses, and which keeps the draws independent. The summary and histogram of the sampling distribution constructed are given below.
#Creating the sampling distribution
SamplingDistribution_Proportion<-rep(NA,500)
for (i in 1:length(SamplingDistribution_Proportion)) {
  #ceiling(n) rounds 9603.647 up to the 9604 observations described above
  Samp<-sample(VoteSamp,ceiling(n),replace=TRUE)
  SamplingDistribution_Proportion[i] = table(Samp)[2]/length(Samp)
}
#Summary of sampling distribution
summary(SamplingDistribution_Proportion)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.8864  0.8931  0.8953  0.8952  0.8971  0.9042
#Histogram of sampling distribution
hist(SamplingDistribution_Proportion)
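The sample-size step above uses the usual relation n = z^2 * p * (1 - p) / ME^2, where ME is the target margin of error. A minimal sketch of that relation as a reusable function follows. The helper name required_n and its arguments are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: smallest n that gives the target margin of error
# for a proportion p at the given confidence level (normal approximation)
required_n <- function(margin, p = 0.5, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)          # two-sided critical value
  ceiling(z^2 * p * (1 - p) / margin^2)   # round up to a whole respondent
}

print(required_n(0.01))            # worst case p = 0.5, as assumed in the text
print(required_n(0.01, p = 0.9))   # smaller n if the proportion is near 0.9
```

The product p * (1 - p) is largest at p = 0.5, which is why 50% is the conservative choice when no prior estimate of the proportion is available.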
From the histogram of the sampling distribution, I can see that it is nearly normal and centred around the sample proportion that I had calculated earlier. There is also very little skew to the left or right. But before I apply the central limit theorem, let us evaluate each required condition:
1. Independence - The available sample vector VoteSamp consists of 3841 respondents. This is less than 10% of the American voter population. While constructing the sampling distribution of the proportion, I have sampled with replacement. This ensures that the condition of independence is met for our sampling distribution.
2. Skewness - I have 3439 successes and 402 failures in our sample. That means I have at least 10 successes and 10 failures in our sample, which satisfies the success-failure requirement. With this I can conclude that the sampling distribution of the proportion is not skewed and is approximately normal, as required by the central limit theorem. The histogram also supports the conclusion that this condition is met.
It is evident that we have an adequate number of samples available for our analysis. Consequently we go ahead and use the normal (z) distribution for our analysis (no need for the t-distribution here).
Now that the conditions have been met, I calculate the mean of the sampling distribution that has been created.
#Mean of sampling distribution
SamplingDistribution_Proportion_Mean<-mean(SamplingDistribution_Proportion)
I also calculate the margin of error for a 95% confidence interval, that is, the z value multiplied by the standard error of the sampling distribution (stored in the variable SE in the code). Please note that the sample size 'n' was chosen so that this margin of error is at most 1%.
#Margin of error for the 95% confidence interval (z value times the standard error), stored in SE
SE<-(zvalue)*(SamplingDistribution_Proportion_Mean^0.5)*((1-SamplingDistribution_Proportion_Mean)^0.5)/(n^0.5)
SE
## [1] 0.006125668
The confidence interval is calculated as the sampling distribution mean +/- the margin of error stored in SE.
LowerConfidenceLimit <- SamplingDistribution_Proportion_Mean - SE
UpperConfidenceLimit <- SamplingDistribution_Proportion_Mean + SE
ConfidenceIntervalP <- c(LowerConfidenceLimit,UpperConfidenceLimit)
ConfidenceIntervalP
## [1] 0.8890837 0.9013351
Hypothesis Testing
Now I move on to the hypothesis testing. I want to check the possibility that the population proportion of people who voted similarly in the 2008 and 2012 elections is 91%. For this I construct the following hypotheses:
Ho: PopulationProportion = 91%
Ha: PopulationProportion != 91%
For these hypotheses, we will perform a two sided test using the normal distribution. We will reuse the sampling distribution of the proportion created earlier, which has already been established to be nearly normal.
First we calculate the z value for the null value of 91%:
NullVal<-0.91
zvalue<-(SamplingDistribution_Proportion_Mean-NullVal)/SE
zvalue
## [1] -2.414526
Next I calculate the associated p value for this z score. Since this is a two tailed test, the area of interest is the area under the curve in both tails beyond abs(zvalue). Below I calculate the area of one tail and multiply it by 2 for the two tailed test.
#-abs() keeps the calculation in the lower tail whatever the sign of zvalue
pval<-2*pnorm(-abs(zvalue))
pval
## [1] 0.01575569
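For comparison, the interval and the test can also be computed directly from the observed counts, without resampling. This sketch is the textbook one-sample proportion calculation and is not the method used above. It assumes the 3841 valid respondents behave like a simple random sample, uses that observed sample size rather than the resample size of 9604, and divides by the standard error under the null value when forming the z score. All variable names are hypothetical.

```r
# Observed counts from the VoteSamp table in the text
x  <- 3439    # respondents who voted the same way in 2008 and 2012
n  <- 3841    # respondents with a valid vote in both years
p0 <- 0.91    # null value being tested

p_hat  <- x / n
z_crit <- qnorm(0.975)                     # critical value for 95% confidence

# Confidence interval: standard error based on the sample proportion
se_hat <- sqrt(p_hat * (1 - p_hat) / n)
ci     <- p_hat + c(-1, 1) * z_crit * se_hat

# Two-sided z test: standard error based on the null value
se_null <- sqrt(p0 * (1 - p0) / n)
z       <- (p_hat - p0) / se_null
p_value <- 2 * pnorm(-abs(z))

print(round(ci, 4))
print(c(z = z, p_value = p_value))
```

Under these assumptions the interval is somewhat wider than the one above, the null value of 91% still falls outside it, and the null hypothesis is again rejected at the 5% level. Base R's prop.test() offers a related calculation, although it reports a score interval rather than this Wald interval.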
Since the obtained p value is small (about 1.6%, below the 5% significance level), we can reject the null hypothesis that the population proportion of Americans who voted similarly in the 2008 and 2012 US Presidential elections is 91%.
The result of the hypothesis test is in line with the 95% confidence interval identified earlier, since the null value of 91% lies outside that interval. Hence we can say that the two findings are consistent.
Conclusion:
Based on this analysis, I can conclude with 95% confidence that 88.9% to 90.1% of American voters vote consistently between the first-term and second-term elections of a President (here 2008 and 2012), with a margin of error within the 1% target.
For the hypothesis testing, we can reject the null hypothesis that the population proportion of Americans who voted similarly in the 2008 and 2012 US Presidential elections is 91%.
In the future, this methodology can be repeated for multiple pairs of US Presidential elections to see if there is any statistical consistency in the findings over the years.
The main learning from this exercise has been a practical insight into how statistical techniques can be used to strengthen our ability to draw conclusions and inferences.
Citation:
Full details of this data set are available at the following links:
Information on the study: http://www.electionstudies.org/
Study Codebook: https://d396qusza40orc.cloudfront.net/statistics%2Fproject%2Fanes1.html
Data Set Used: http://bit.ly/dasi_anes_data
Additionally, the following Wikipedia links have been referenced for checking the actual results of the presidential elections in 2008 and 2012:
http://en.wikipedia.org/wiki/United_States_presidential_election,_2008
http://en.wikipedia.org/wiki/United_States_presidential_election,_2012