This document discusses ways that pollsters could improve their election predictions in the future based on failures in the 2016 US presidential election. It identifies several issues with polls in 2016, including relying on small sample sizes, outdated polling methods, and neglecting online data sources. It then provides eight fixes for pollsters, such as applying robust statistical models, incorporating modern technology and online data, accurately assessing the impact of scandals, and utilizing digital advertising data from Facebook. The overall message is that pollsters need to embrace new data sources and statistical methods to make more accurate predictions.
2. As data scientists, we feel that pollsters misled us this year.
We spoke with others about this and they tend to feel misled
in much the same way. In our research, we found multiple data
sources and valid reasons to take issue with a majority of the
mainstream election polls, as much of what we found pointed
to a Trump victory.
If professional election pollsters seek to stay relevant and at
the same time provide trustworthy predictions, major work
must be done.
This work is designed to help explain some of the pitfalls of this
year’s polling efforts and provide a few key fixes that may assist
the pollsters when making future election predictions.
3. Professionals can avoid sounding smug, eating bugs, and disappointing
those who still have faith in polling predictions by taking into account some
of our findings for the next election season.
Data Sources: http://www.cnn.com/videos/politics/2016/11/12/pollster-eats-bug-after-trump-win-smerconish.cnn
4. HOW COULD DONALD TRUMP WIN THE 2016
PRESIDENTIAL ELECTION, DESPITE MOST NATIONWIDE
POLLS PREDICTING A HILLARY CLINTON WIN?
5. The Problem with Pollsters in 2016
Predictions were typically wrong by significant margins
Predictive Models were proven untrustworthy
Implemented Quasi-scientific and/or antiquated polling methods
Sampled small populations and generalized the data nationwide
Neglected big data resources like search engines and social networks
Failed to recognize the full impact that Social and Mobile technology
would have on local voting patterns
Encountered a shift in information sharing from TV/Radio/Print to the web
6. Many Polls Predicted Hillary Clinton Winning
Data Sources: https://www.yahoo.com/news/clinton-narrowly-leading-trump-on-election-eve-polls-164855002.html
https://www.monmouth.edu/polling-institute/reports/ www.foxnews.com www.theupshot.com www.usatoday.com
7. Excerpt from “3 Unlikely Ways Trump Could Win” by Fortune
Source: http://fortune.com/2016/10/17/donald-trump-polls-win-lose/
10. We found 3 Pollsters that had Accurate Predictions
(1) Investor’s Business Daily (IBD)/TechnoMetrica Market Intelligence (TIPP)
(2) USC/Los Angeles Times
(3) The Primary Model by Prof. Helmut Norpoth
Comparing these three, Dr. Norpoth was the most
certain, with an 87-99% degree of confidence in a
Trump victory, and he claims to have a statistical model
for gauging presidential elections.
A few unnamed pollsters played with a Trump Win scenario but changed to Hillary for election day.
11. WHAT IS THE PRIMARY MODEL?
The forecast issued in March
2016, of a near-certain Trump
victory at a moment when he was
trailing both Clinton and Bernie
Sanders in every poll, some by
double-digits, was greeted with a
heavy mix of shock, cheers,
amazement, and derision, much
of it on social media but also
regular media outlets. Many
offered bets against the forecast,
gleeful that it would turn out
wrong. There is nothing to add to
or subtract from the March
forecast here. It was
unconditional, final, and not
subject to updating. Just in case
Hillary Clinton would not be the
Democratic nominee, the Primary
Model gave the nod to Trump
over Bernie Sanders with 99-
percent certainty; forecasts for
Republican nominees other than
Trump were also issued.
THE ODDS IN GOP FAVOR!
THE PRIMARY MODEL IS
a statistical model that relies on
presidential primaries and an
election cycle as predictors of the
vote in the general election.
Model predicted: Democrats not favored
PREDICTIVE POWER 6X
For the record, the PRIMARY MODEL,
with slight modifications, has correctly
predicted the winner of the popular vote
in all SIX presidential elections since it
was introduced in 1996. In recent
elections the forecast has been issued as
early as January of the election year.
Source: primarymodel.com
12. Two Key Principles for the Primary Model
As a rule, the candidate with the stronger
primary performance wins against the
candidate with the weaker primary
performance. For elections from 1912 to
2012 the PRIMARY MODEL picks the
winner, albeit retroactively, every time
except in 1960.
Besides primaries the forecast model relies on a swing of the
electoral pendulum, which generates cycles in the vote for
President. Since 1960, the party controlling the White House
has won six of the seven elections after one term while losing
five of six after two terms. During that span of time the
presidential party succeeded but once to win a third term—
with George H. W. Bush in 1988, following two Reagan terms.
After two terms of Democrat Barack Obama in the White House, the
electoral pendulum was poised to swing back to the
Republicans in 2016.
Source: primarymodel.com
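The pendulum counts quoted above can be turned into simple retention rates. The following is a minimal sketch of that arithmetic only, not the Primary Model itself; the variable and function names are hypothetical, and the counts are taken as stated on the slide.

```python
from fractions import Fraction

# Counts as quoted on the slide (elections since 1960); not independently verified here.
ONE_TERM_WINS, ONE_TERM_TOTAL = 6, 7      # White House party won after one term
TWO_TERM_LOSSES, TWO_TERM_TOTAL = 5, 6    # White House party lost after two terms


def retention_rate(wins: int, total: int) -> Fraction:
    """Share of elections in which the White House party kept the presidency."""
    return Fraction(wins, total)


after_one_term = retention_rate(ONE_TERM_WINS, ONE_TERM_TOTAL)
after_two_terms = retention_rate(TWO_TERM_TOTAL - TWO_TERM_LOSSES, TWO_TERM_TOTAL)

print(f"Kept the White House after one term:  {float(after_one_term):.1%}")
print(f"Kept the White House after two terms: {float(after_two_terms):.1%}")
```

With so few elections in each group, these rates are rough historical tendencies rather than precise probabilities.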
13. Professor Norpoth claims that his model can be
applied retroactively all the way back to the first
Presidential Primary, which took place in 1912. Yielding
near-perfect results, Norpoth’s model has only one
incorrect prediction when applied retroactively.
Figure 1. The Democratic Percentage of the Two-
Party Vote in Presidential Elections, 1828-2012.
Figure 2: The Prediction Formula for 2016
Source: primarymodel.com
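As a quick check on what "only one incorrect prediction" means in percentage terms, the sketch below counts the presidential elections from 1912 to 2012 and treats 1960 as the single miss, as the slides state. It assumes one election every four years across that span; the variable names are illustrative.

```python
# Presidential election years covered by the retroactive claim (1912 through 2012).
election_years = list(range(1912, 2013, 4))

# The only miss reported in the slides is 1960.
misses = {1960}

hits = [year for year in election_years if year not in misses]
hit_rate = len(hits) / len(election_years)

print(f"Elections covered: {len(election_years)}")
print(f"Correct retroactive picks: {len(hits)}")
print(f"Retroactive hit rate: {hit_rate:.1%}")
```

Because the model is applied retroactively, this figure describes how well it fits past elections, which is not the same as a record of forecasts issued in advance.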
14. Fix One
Apply robust data-driven statistical models
built upon historical facts.
Though the Primary Model is just one method
for pollsters to consider, the value of knowing
historical trends when predicting future
election outcomes cannot be overstated.
15. The Primary Model has predictive power.
Yet, not every prediction has proven correct.
The major inaccuracy occurs when applying the
Primary Model retroactively to the 1960 election of
John F. Kennedy.
This incorrect prediction gives us the ability to explore
a few other areas to enhance future polling predictions.
16. Fix Two
The most accurate and precise statistical models
can only offer probabilities.
Never take probabilities for complete certainty and
always be open to tracking data sets even through
election day.
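To make the point concrete, here is a minimal sketch of how quickly certainty erodes over repeated forecasts. It assumes, purely for illustration, that each election forecast is independent and carries the same confidence. The 87% and 99% figures are the confidence range quoted earlier for the Primary Model; the function name and the six-election horizon are hypothetical choices.

```python
def chance_of_at_least_one_miss(confidence: float, elections: int) -> float:
    """Probability that at least one of `elections` independent forecasts is wrong."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if elections < 0:
        raise ValueError("elections must be non-negative")
    return 1.0 - confidence ** elections


# Illustrative assumption: six independent forecasts at the same confidence level.
for confidence in (0.87, 0.99):
    miss = chance_of_at_least_one_miss(confidence, elections=6)
    print(f"{confidence:.0%} per-election confidence -> "
          f"{miss:.1%} chance of at least one miss in 6 elections")
```

Even a highly confident model should be expected to miss occasionally, which is why tracking data through election day still matters.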
17. When comparing the 1960 election to the 2016
election, we found many commonalities and
one major difference.
18. Voting by the People (via electoral college)
Party Affiliations
Political Platforms
Primaries
Political Swag/Gear
Campaigning State to State
Debates
Rhetoric
Voter Registration
Fund Raising
Endorsements
Paid Advertising
Newspapers & Radio Broadcasts
TV Coverage (since the 1950s)
Pollsters using Phone Surveys
Election Commonalities for 100+ Years
Outdoor billboards
Television
Radio
Public relations
Corporate Raising
Political Swag
Rally Events
Public Opinion Surveys
19. Today’s disruptive TECHNOLOGY,
built upon advancements in
telecommunications and the
internet, has reshaped modern
political discourse. The last three
election cycles have been impacted
in ways that were pure science
fiction during JFK’s election in
1960.
The Major Difference:
Social media
Email
Video
Website / online
Mobile Phones
20. There is little doubt that the medium for political content
distribution, consumption and engagement was
maturing at a time that boosted Obama’s messaging and
campaign performance in 2008 and again in 2012.
In 2016, Trump was able to leverage social and mobile
technology in a way that was key to propelling a
political outsider to the presidency.
21. Fix Four
Pay attention to the dialogues that drive
political discourse no matter where they may
occur: online, offline, TV, radio, or print.
The internet is rapidly creating new
influential mediums for communications that
reach vast audiences of voters, more
frequently and at a lower cost than ever
before. This technology is more powerful
than other forms of dialogue and will not go
away.
22. Pollsters primarily blame recent failures on two factors: "the
growth of cellphones and the decline in people willing to answer
surveys," says political scientist Cliff Zukin, former president of
the American Association for Public Opinion Research. Ten
years ago, about 6 percent of Americans relied primarily on
cellphones; by 2014 that figure had jumped to 60 percent. That
caused problems for opinion researchers, who typically polled
by making automated "robocalls" to random landline exchanges
and then, when people picked up, passing them to a live
interviewer. "To complete a 1,000-person survey, it's not unusual
to have to dial more than 20,000 random numbers," Zukin says.
Federal law, however, prohibits autodialing cellphones — which
means paid interviewers have to make calls manually, which can
be prohibitively time-consuming and expensive. As a result,
some organizations make compromises, such as leaning too
heavily on landline surveys, which can skew results.
Source: "The Problem with Polls" http://theweek.com/articles/617109/problem-polls
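The dialing figures in the quote above imply a very low completion rate. The sketch below works through that arithmetic. Since the quote says "more than 20,000" numbers, the rate computed here is an upper bound. The function names and the 2,500-person target are hypothetical and used only for illustration.

```python
def completion_rate(completed: int, dialed: int) -> float:
    """Completed interviews per number dialed."""
    if dialed <= 0:
        raise ValueError("dialed must be positive")
    return completed / dialed


def dials_needed(target_completes: int, completed: int, dialed: int) -> int:
    """Numbers to dial to expect `target_completes` interviews at the observed rate.

    Uses integer ceiling division to avoid floating-point rounding surprises.
    """
    if completed <= 0 or dialed <= 0:
        raise ValueError("completed and dialed must be positive")
    return -(-target_completes * dialed // completed)


# Figures from the quote: a 1,000-person survey can take 20,000+ random numbers.
rate = completion_rate(completed=1_000, dialed=20_000)
print(f"Completion rate (at most): {rate:.1%}")

# Hypothetical larger survey, assuming the same completion rate holds.
print(f"Dials for a 2,500-person survey: {dials_needed(2_500, 1_000, 20_000):,}")
```

When each of those calls to a cellphone has to be placed by hand, the cost of a larger sample grows in direct proportion to the number of dials.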
23. Traditional Polling is DEAD!
Sampling populations continues to be a problematic issue
that pollsters face.
When pollsters generalize results from small sample
populations they simply cannot accurately predict
presidential elections across the entire voting population.
Small samples do not work.
When you can reach larger populations faster, cheaper, and
arguably more in line with voting behavior using social media,
traditional polling is dead.
24. Fix Five
Professionals must incorporate modern
technology into statistical polling predictions
and no longer rely on surveys of small
populations.
25. In 2016, we learned that Bernie was wrong about the emails.
26. Fix Six
Don’t turn a blind eye to the facts and
information outlets, no matter the source.
In the modern era, distribution, social
dialogue, transparency and authenticity will
win elections over denial of misdeeds.
27. In recent years, Anthony Weiner has taught us so much about Sex Scandals. How
many headlines, hashtags, jokes and memes can be influenced by one person?
28. Judging the full impact that Sex Scandals played on public opinion remains difficult for this election.
Both parties were plagued with bitter public Sex Scandals that dominated headlines.
29. Fix Seven
In future elections, pollsters must accurately
assess the full impact that scandals have on
campaigns.
Since this election had multiple sex
scandals from both parties, the scandals
that typically end political careers appear to
be negated.
30. First, let’s take a quick look at how
relevant election data can be garnered
from Website Traffic on Search Engines.
The biggest area of opportunity is based on harnessing
Big Data from Search Engines & Social Networks
Polling predictions can be greatly enhanced by
factoring in key performance indicators such as reach,
traffic, views, likeability, follower count, sentiment,
engagement and search trends.
32. The surge of donaldjtrump.com into the top 300 websites two
days prior to the election should have been a key indicator
for predictions. By comparison, hillaryclinton.com was never
ranked better than 1,100 in the week leading up to the
election, which is a substantial gap.
ALL INFORMATION ABOVE IS REPORTED FROM ALEXA.com
Website Ranking
Web Rankings can provide Key Indicators for Pollsters to Follow
Data gathered Nov 10-12, 2016
33. Since mainstream news websites
are typically able to cover all news
without strong bias, we decided to
drop the TV related stations from
the comparison and reviewed Right
Wing vs Left Wing websites.
We are making the assumption
that the typical readers of these
websites would most likely identify
with the strong political messaging
and their votes should align with
that website’s political identity.
Though the Right Leaning websites compared are few,
the high concentration of traffic, incredible month-over-month
growth (wikileaks.org 85% in 30 days), more pages
read per user on average, and higher Alexa rankings of these
sites over Left Leaning sites are all solid indicators that
the Right Leaning websites were reaching large
audiences on a daily basis.
Right Wing vs Left Wing Websites
Left Leaning
Right Leaning
34. Online Video & Live Streaming
When comparing YouTube video
subscriptions, CNN trails
independent video channels like
infowars.com, and subscribers
prefer the hacker group
Anonymous’s channel over many
mainstream media outlets.
Independent Live Streaming Competed with
Mainstream Outlets and typically won out as well
YouTube News Subscribers
35. Twitter, we feel, has made some poor choices by actively
deleting accounts in the last six months due to terms of
use violations. This will make sampling from Twitter less
accurate in the future. Even so, Twitter remains a great
resource for candidates to amass followers and distribute
messaging. Though we did not score tonality or
sentiment, we wanted to show the follower numbers of the
candidates:
What about Twitter?
Trump has 14.6 Million Followers
Clinton has 10.9 Million Followers
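One simple way to compare the two follower counts is as shares of the combined total. This is a minimal sketch using only the figures on this slide; the dictionary and variable names are illustrative. Follower counts are treated here as a raw reach indicator, and the calculation assumes nothing about sentiment, overlap between the two audiences, or whether followers are eligible voters.

```python
# Follower counts (in millions) as shown on the slide.
followers_millions = {"Trump": 14.6, "Clinton": 10.9}

total = sum(followers_millions.values())
shares = {name: count / total for name, count in followers_millions.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of combined followers")
```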
36. Trump’s Reddit threads were significantly more active
than Hillary Clinton’s. The tonality and
sentiment also seemed to favor Trump, but a more accurate
analysis is needed to provide statistical proof.
We found that user forums like Reddit opened up
investigations into questionable “conspiracy theories” like
“Pizzagate,” which went from obscure forum to
mainstream news when an active shooter entered the
dining establishment that was a key focus of the forum and
thread.
Though Reddit is neither a mainstream news outlet nor a
professionally fact-checked resource for news, the
unsubstantiated claims made there had a long-lasting
impact on much of the voting population.
What about Reddit?
37. Is Klout Biased? Probably.
Trump has 14.6 Million Followers
Clinton has 10.9 Million Followers
Klout should be an accurate
source for information gathering
but considering that Trump has a
substantially larger social
footprint in comparison to
Clinton, it is problematic that
they have HRC leading.
http://www.forbes.com/sites/williamarruda/2016/08/07/donald-trump-vs-hillary-clinton-the-social-media-report/#31f9fd7e4f0b
Source: Forbes
38. The Hidden Gem for Pollster Data is: FACEBOOK!!!
Social Proof is readily
available in our Newsfeed
However, that data can be
difficult to measure and can be
anecdotal. The real power
behind Facebook is found in
the ad network!
39. 49%
Facebook’s ad inventory is the most
powerful data that pollsters could use this
election season. As data analysts, we were
able to gather population data and social
affinity overlaps with candidates. Using
Facebook’s ad network, we could find
accurate numbers concerning political
affiliations and even real data on political
contributions. This data can be broken
down nationwide, state-by-state, even by
city and ZIP code, at a level with which
survey-based pollsters simply cannot compete.
We found multiple areas in which polls were
not in alignment with the Facebook data we
were able to pull, which gave us many
indications that the election was much
closer than polls foretold.
40. Fix Eight
Some of the smartest people on earth work in digital
ad tech. Next time, find these people and hire
them: their ability to filter the ad networks is a
far superior indicator of political voting patterns
than self-reported phone surveys and traditional
polling methods.
41. Though predicting outcomes before they happen
can be difficult, we hope you have found a few key
areas of opportunity in this presentation to apply
when seeking to make election polls more accurate
in the future.
Thank you.
Christopher Brock Founder of Primary.Hosting