A white paper sharing the TrueSample team's perspective on the next wave of issues impacting research quality and how the online market research industry must evolve to address them.
Online Research Quality: The Next Frontier
A TrueSample Perspective
Yes, technology has already improved online research quality dramatically. But
data quality issues persist—and new challenges are emerging. The advent of
real-time sampling techniques, the proliferation of mobile devices, and the
soaring popularity of social media have created new, more nuanced questions
about data quality—and increased demand for new approaches and solutions.
How will your company—and our industry—respond? Here are key questions
every buyer of online sample should be asking today, along with the
TrueSample approach to delivering next-generation data quality solutions.
A TrueSample White Paper
Is Online Data Quality Still a Problem?
Since the beginning of the new millennium there have been nagging suspicions that the data generated from online panels can’t be completely trusted. The risk of fake or duplicate respondents, habitually unengaged panelists, professional survey takers, “gamers,” “straightliners,” and “satisficers” (those who provide unusually positive responses) has raised concerns. While some of these problems have been largely tamed by technology or controlled by panel management best practices, some issues persist and others are becoming more pressing. For example:
• Under- or over-representation of certain groups in online panels introduces bias that can impact data quality.
• Declining use of email creates new challenges for reaching viable survey respondents and delivering engaging survey design.
• Biases due to panel tenure or membership in multiple panels cast doubt on the ability to achieve consistent survey results over time.
The key question today is whether, and to what extent, these issues impact research results and business decisions. This paper addresses the question in two ways. First, to provide context, it briefly summarizes how the industry has responded to data quality issues over the past few years. Then it examines emerging issues and describes the approach the TrueSample team is taking to continue to maximize data quality in the years ahead.
How Has the Industry Addressed Data Quality Concerns?
Since the advent of the online era, market research firms and sample providers have responded—
though not always in a coordinated way—to stamp out the threat of unreliable data. The first step
was to quantify how much bad data and “bad actors” were influencing research results.
In a detailed analysis of a typical online panel, the Advertising Research Foundation (ARF) identified 20% of the sample as corrupt, meaning respondents who exhibited “bad behaviors.”1 The TrueSample team’s own research showed that relying on data from “bad” respondents, those found to be fake, duplicate or unengaged, increased the risk of making the wrong decision by as much as 50%. These and other findings prompted action by the market research industry, and a number of efforts emerged:
• Research vendors developed their own manual data cleaning and data weighting techniques.
• Machine fingerprinting and identity validation technologies used within other industries were applied to online panels.
• TrueSample was introduced to eliminate fake, duplicate and unengaged survey respondents (see Figure 1).
• The Advertising Research Foundation developed the Quality Enhancement Process to help clients and research vendors engage in structured conversations about online data quality.
• The TrueSample Quality Council issued Online Research Quality Guidelines for all research buyers to follow when choosing vendors (for details see the TrueSample Quality Council’s Online Consumer Research Quality Guidelines).2
1 Source: “The Online Panel Quality Wars,” by Brad Bortner, Forrester Research, November 20, 2009, footnote 7.
2 URL: http://www.slideshare.net/TrueSample/online-consumer-research-quality-guidelines
Clearly, the market research industry has raised the bar on what is considered acceptable quality for
online research sample. In the process it has made sample buyers more confident in the business
decisions that derive from market research.
Unfortunately, neither time nor technology stands still. While the largest issues have been tamed,
more nuanced questions about data quality continue to emerge—and must be addressed.
How Does TrueSample Solve or Control Data Quality Issues?
Introduced in 2008, TrueSample is now used by more than 100 research groups and
panel companies to ensure data quality across multiple sample sources and survey
platforms. It uses a combination of real-time technologies to provide:
• Elimination of fakes. TrueSample uses third-party databases to validate all
prospective panelists and survey respondents to guarantee that they are who
they say they are.
• Prevention of duplicates. Sophisticated digital fingerprinting eliminates duplicate
respondents from panels and surveys, ensuring that no individual can take the same
survey twice.
• Assurance of true engagement. Survey engagement technology eliminates speeders and straightliners in real time, and SurveyScore quantifies the panelist experience by providing benchmarks of perception and engagement behavior (for details see www.truesample.com).
Figure 1 data: Not real 24.3%; Not unique 2.8%; Not engaged 1.65%; TrueSample (passed) 71.25%.
Figure 1: An average of 28.8% of panelists are rejected by TrueSample.
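The speeder and straightliner checks mentioned under “Assurance of true engagement” can be illustrated with a minimal sketch. It assumes a speeder is anyone who finishes in under one third of the median completion time, and a straightliner is anyone who gives the identical answer to every item of a rating grid with at least five items. Both thresholds, and the names `Response` and `flag_unengaged`, are hypothetical and are not TrueSample’s actual rules. The sketch also works on a batch of completed responses, whereas the paper describes TrueSample applying its checks in real time.

```python
from dataclasses import dataclass
from statistics import median

# Hypothetical thresholds for illustration only (not TrueSample's actual rules).
SPEED_RATIO = 1 / 3   # flag completes faster than one third of the median time
MIN_GRID_ITEMS = 5    # judge straightlining only on grids with at least 5 items


@dataclass
class Response:
    respondent_id: str
    seconds: float           # total time taken to complete the survey
    grid_answers: list[int]  # answers to one rating grid, e.g. on a 1-5 scale


def flag_unengaged(responses: list[Response]) -> dict[str, list[str]]:
    """Map each flagged respondent_id to the reasons it was flagged."""
    typical = median(r.seconds for r in responses)
    flags: dict[str, list[str]] = {}
    for r in responses:
        reasons = []
        if r.seconds < SPEED_RATIO * typical:
            reasons.append("speeder")
        if len(r.grid_answers) >= MIN_GRID_ITEMS and len(set(r.grid_answers)) == 1:
            reasons.append("straightliner")
        if reasons:
            flags[r.respondent_id] = reasons
    return flags


sample = [
    Response("a", 600, [4, 2, 5, 3, 4]),
    Response("b", 120, [3, 4, 2, 5, 1]),  # far faster than everyone else
    Response("c", 580, [5, 5, 5, 5, 5]),  # same answer to every grid item
    Response("d", 640, [2, 3, 4, 2, 3]),
]
print(flag_unengaged(sample))  # {'b': ['speeder'], 'c': ['straightliner']}
```

Basing the speed cutoff on the survey’s own median keeps the rule meaningful across surveys of different lengths; a real-time version would need the median from earlier completes rather than from the full batch.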
Where Should the Research Industry Focus Now?
To continue to improve the quality of online research—and to fully exploit the new opportunities
the online era holds for market research—the industry should turn its attention to three specific
areas:
1. Real-time and social media sampling methods
In response to declining online panel membership and email usage, and as a means to solicit feedback from hard-to-reach groups such as 18- to 24-year-olds, many researchers are turning to websites and social networks as an active recruitment source for surveys. Soliciting potential survey takers while they are visiting websites (real-time sampling, also known as river sampling) and sampling from web-based social networking sites may provide fast access to very specific groups, users, and demographics, but these methods also introduce new questions about how to ensure data quality. For example:
• Do real-time survey respondents answer surveys differently than respondents sourced from online panels?
• Will survey takers sourced from real-time or social media sample provide their names and addresses for address verification, or are alternative means of identity verification required?
• Will real-time respondents take the 20- to 30-minute surveys market researchers typically design, or should surveys be redesigned in shorter formats for real-time participants?
In answer to the first question, preliminary research by the TrueSample team shows that real-time survey takers exhibit the same satisficing behavior as newer panelists, meaning that they provide unusually positive responses, even when they also belong to many other panels and have long panel tenures. This satisficing behavior can introduce bias into research results unless the correct data quality measures and tenure balances are put in place.
Equally important, researchers suspect that survey takers on the web or social media networks may be less tolerant of long, complicated surveys that interrupt their online experience. Therefore, respondent engagement for this segment needs to be measured and benchmarked to determine whether survey design must be recalibrated.
There are also concerns that real-time survey takers may not be willing to provide name and
address information before taking surveys because it feels like a privacy violation—thereby
eliminating the ability to validate the respondents’ identities.
So what does all of this mean in terms of data quality solution requirements? The TrueSample
team anticipates that sample buyers will demand solutions that deliver the following:
Consistent quality assurance when blending sampling methods. To address some of the tenure- and panel-membership-related biases present in real-time samples, researchers will likely need to blend online or offline panelists with real-time respondents to reach a more balanced and representative sample. This blended sample will need to be “cleansed” using a data quality solution that can be applied consistently across all sampling methods and that ensures all respondents are real, unique and engaged.
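One way to picture this requirement: the same real/unique/engaged filter is applied to every source before the cleaned respondents are blended in chosen proportions. The sketch below assumes each respondent record already carries the outcome of those three checks as boolean fields. The field names, the `blend` function, the 60/40 mix, and the synthetic records are all hypothetical.

```python
import random


def passes_quality(r: dict) -> bool:
    # The identical rule is applied to every source, panel or real-time.
    return r["real"] and r["unique"] and r["engaged"]


def blend(sources: dict[str, list[dict]], mix: dict[str, float],
          n: int, seed: int = 0) -> list[dict]:
    """Cleanse each source with the same filter, then draw round(share * n)
    respondents from each source named in `mix` (shares should sum to 1)."""
    rng = random.Random(seed)
    blended: list[dict] = []
    for name, share in mix.items():
        clean = [r for r in sources[name] if passes_quality(r)]
        k = round(share * n)
        if k > len(clean):
            raise ValueError(f"{name!r}: only {len(clean)} clean respondents, need {k}")
        blended.extend(rng.sample(clean, k))
    return blended


# Synthetic example: 10% of panelists unengaged, 25% of river respondents not validated.
panel = [{"id": f"p{i}", "source": "panel", "real": True, "unique": True,
          "engaged": i % 10 != 0} for i in range(100)]
river = [{"id": f"r{i}", "source": "river", "real": i % 4 != 0, "unique": True,
          "engaged": True} for i in range(100)]
sample = blend({"panel": panel, "river": river}, {"panel": 0.6, "river": 0.4}, n=50)
print(len(sample), sum(r["source"] == "panel" for r in sample))  # 50 30
```

Because each source quota is rounded independently, some mixes can produce a total that differs from `n` by one; a production version would need an explicit rule for allocating the remainder.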
Mechanisms for measuring and improving respondent engagement. Surveys will need to be optimized to effectively engage specific types of survey takers. Research has already led to the development of TrueSample SurveyScore® and SurveyScore® Predictor, which help to optimize online survey design to achieve the highest engagement levels among respondents, but these tools must now be extended to real-time and social media sampling so that measurement and benchmarking are specific to the respondent audiences of these sampling methods.
Creative use of profile data for identity validation. A great deal of identity verification data already exists online (see Figure 2). The industry will need to get creative about using “social sign-on” and other existing profile information to validate respondents’ identities, rather than asking for name and address information during a survey.
[Table: Profile Data Available on Social Networks, May 2010. Networks compared: Facebook, Twitter, Yahoo!, Google, MySpace, LinkedIn, AOL. Profile fields compared: name, email, nickname, photo, profile URL, birthday, gender, location, social graph, additional profile information.]
Source: Gigya, Multiple Identities, July 7, 2010.
Figure 2: A variety of identity verification data exists in online profiles.
2. Mobile survey modalities
The gizmos people use to access the Internet and communicate with each other—and with
market researchers—are evolving at a jaw-dropping rate. iPads, Android phones, Nook readers,
Kindles, Netbooks, and whatever’s next on the horizon all point to the development of a new set
of survey modalities that will impact the quality of market research data.
Early adopters of these devices tend to be younger and more af[…] pre-teens (the emerging generation of survey-takers), online chats and text messaging have supplanted email as the preferred communication vehicle. Additionally, adoption of mobile communications is moving faster in hard-to-reach European and Asian markets. For these reasons, market researchers are starting to pay attention to mobile devices as a mechanism for collecting quantitative survey feedback.
The key questions that need to be addressed:
• How do we get people to take surveys on mobile phones and other instant-access platforms when their attention will inevitably be fragmented by the other activities they pursue on these devices?
• How is representivity affected when they do respond to our surveys?
• How do we optimize survey design to maximize engagement on these devices?
What capabilities should next-generation data quality solutions provide to achieve reliable quality in mobile surveys?
Mode-based sample balancing. If respondents using mobile devices and tablets differ
from those respondents using computers, we will need to account for those demographic
differences to prevent biased results. Next-generation data quality solutions will need to
help researchers blend and balance sample using different modalities to achieve
representativeness.
Mechanisms for measuring and improving respondent engagement. Surveys will need to be optimized to engage respondents on each survey modality. SurveyScore and SurveyScore Predictor, two features of TrueSample, help to optimize online survey design to maximize engagement levels among online respondents, but new norms and predictive models must be built using mobile survey data to bring these same measurement and benchmarking capabilities to mobile survey-taking.
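Mode-based sample balancing could be implemented in several ways. One common, generic technique is post-stratification weighting: each respondent gets the weight (target population share of their cell) / (sample share of their cell), so the weighted sample matches the targets. In the minimal sketch below the cells are age groups in a mixed-mode sample in which mobile respondents are assumed to skew younger. The counts, the 30/70 age targets, and the function name `cell_weights` are invented for illustration. This is not a description of how TrueSample balances sample.

```python
from collections import Counter


def cell_weights(cells: list[str], targets: dict[str, float]) -> dict[str, float]:
    """Post-stratification: weight = target share / observed sample share."""
    counts = Counter(cells)
    n = len(cells)
    weights = {}
    for cell, target in targets.items():
        if counts[cell] == 0:
            raise ValueError(f"empty cell {cell!r}: collapse cells or recruit more")
        weights[cell] = target / (counts[cell] / n)
    return weights


# Hypothetical blended sample of (mode, age group); mobile skews younger.
respondents = ([("mobile", "18-34")] * 40 + [("mobile", "35+")] * 20
               + [("computer", "18-34")] * 10 + [("computer", "35+")] * 30)
weights = cell_weights([age for _, age in respondents], {"18-34": 0.30, "35+": 0.70})

mobile_weight = sum(weights[age] for mode, age in respondents if mode == "mobile")
total_weight = sum(weights[age] for _, age in respondents)
print(weights)                                 # {'18-34': 0.6, '35+': 1.4}
print(round(mobile_weight / total_weight, 2))  # 0.52
```

Here 18-34s make up half the raw sample but only 30% of the target, so each gets weight 0.6. As a side effect, the weighted mobile share falls from 60% to 52%. The total weight still equals the sample size, which keeps weighted counts interpretable.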
3. Ongoing concerns over representivity
According to Forrester Research, online panel-based research is now the dominant mode for quantitative
research. But questions linger about how representative online panelists really are and whether or not
we’ve exacerbated the problem with data quality solutions that verify identities using consumer
databases.
For example:
• Is there something inherently different about the types of people who join certain online panels?
• Do these differences impact their survey responses?
• Do data quality solutions increase these biases by rejecting particular types of respondents in greater numbers?
The TrueSample team has identified three key issues related to the representivity of online panelists that can be addressed by a next-generation data quality solution. First, the length of time panelists have belonged to an online panel, or their “panel tenure,” may impact those panelists’ survey responses. Specifically, the more recently panelists have joined a panel, the more likely they are to “satisfice,” or provide unusually positive responses.
Second, the number of online panels that panelists belong to, or their “panel membership,” can impact their responses and can increase their likelihood of survey-taking hyperactivity. TrueSample research has shown that multi-panel members show a higher score bias, meaning that they provide more positive responses than single-panel members, which may reduce the reliability of research results.
Third, there is clear evidence of underrepresentation of certain demographic groups within online panels. For example, 18- to 24-year-olds and Hispanics are historically hard to find in online panels. This underrepresentation is aggravated by traditional identity validation techniques because these are “high-velocity” segments; in other words, both groups tend to move and change their address more frequently than other segments. So using name and mailing address to validate identity may not be a good test for panel inclusion in these groups, because it causes them to fail the “real” test in disproportionate numbers.
[Chart: two bar charts of TrueSample validation results (Real, Not Real, Duplicate, Overall Real %), shown as percentages. Left panel by age group: 18-24, 25-34, 35-44, 45-54, 55-64, 65+. Right panel by race/ethnicity: White, Black, Native American, Asian & Pacific Islander, Other, Decline to answer, Hispanic.]
Figure 3: TrueSample pass-through rates by age (left) and race (right) indicate that 18- to 24-year-olds and Hispanics fail identity validation more frequently than other segments.
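The comparison behind Figure 3 amounts to computing a pass-through rate for each segment and checking which segments fall well below the overall rate. Below is a minimal sketch, assuming validation outcomes are available as (segment, passed) pairs. The 10-point cutoff, the function names, and the example counts are hypothetical, and they are not the data plotted in Figure 3.

```python
from collections import defaultdict


def pass_through_rates(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of respondents in each segment who passed identity validation."""
    totals: dict[str, int] = defaultdict(int)
    passed: dict[str, int] = defaultdict(int)
    for segment, ok in results:
        totals[segment] += 1
        passed[segment] += int(ok)
    return {s: passed[s] / totals[s] for s in totals}


def over_rejected(results: list[tuple[str, bool]], margin: float = 0.10) -> list[str]:
    """Segments whose pass-through rate trails the overall rate by more than `margin`."""
    overall = sum(ok for _, ok in results) / len(results)
    rates = pass_through_rates(results)
    return sorted(s for s, r in rates.items() if overall - r > margin)


# Invented outcomes: a "high-velocity" segment fails address matching more often.
results = ([("18-24", True)] * 6 + [("18-24", False)] * 4
           + [("35-44", True)] * 9 + [("35-44", False)] * 1)
print(pass_through_rates(results))  # {'18-24': 0.6, '35-44': 0.9}
print(over_rejected(results))       # ['18-24']
```

A flag like this only shows that a segment is rejected more often. Whether those rejections are false (real people with stale addresses) requires the additional validation sources discussed below.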
The interrelationships between these three issues are complex, and quantification of their impact on market research quality, individually or collectively, remains incomplete. However, it is clear that next-generation data quality solutions will need to evolve to address inconsistencies in sample representivity. Specifically, they will require the following attributes:
New data sources for identity validation. To reduce the likelihood of falsely rejecting survey respondents who may be “real” but can’t be validated due to frequently changing addresses or a lack of offline identity information, data quality solutions must employ additional data sources for identity validation using attributes such as email addresses and social networking profile data. TrueSample has begun to use additional data sources to validate offline and online identities and to reduce over-rejection, particularly in high-velocity demographics such as 18- to 24-year-olds and Hispanics.
Balancing on panelist tenure and behavior. Data quality solutions such as TrueSample already
allow users to evaluate the blend of panelists by their tenure and panel membership, and can allow
users to filter by sample source to break out validation results for respondents from each individual
sample source; however, additional advancements are needed. The next step toward mitigating the
potential impact of “high-velocity” segments will be to provide sophisticated panelist behavior
modeling so that sample can be proactively balanced on tenure, memberships, and survey taking
frequency for consistent research results.
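Proactive balancing on more than one dimension at a time is often done by raking (iterative proportional fitting). Raking rescales weights to match each target margin in turn and repeats until all margins hold at once. The sketch below rakes a sample to hypothetical tenure and panel-membership targets. The attribute names, categories, counts, and target shares are invented for illustration, and the paper does not say that TrueSample uses raking.

```python
def rake(rows: list[dict], margins: dict[str, dict[str, float]],
         max_iter: int = 100, tol: float = 1e-10) -> list[float]:
    """Iterative proportional fitting: return one weight per row so that the
    weighted share of every category matches its target in `margins`."""
    n = len(rows)
    w = [1.0] * n
    for _ in range(max_iter):
        biggest_step = 0.0
        for attr, targets in margins.items():
            # Current weighted total of each category of this attribute.
            totals: dict[str, float] = {}
            for row, wi in zip(rows, w):
                totals[row[attr]] = totals.get(row[attr], 0.0) + wi
            # Scale each row so this attribute's margin matches its target.
            for i, row in enumerate(rows):
                factor = targets[row[attr]] * n / totals[row[attr]]
                w[i] *= factor
                biggest_step = max(biggest_step, abs(factor - 1.0))
        if biggest_step < tol:
            break
    return w


def weighted_share(rows: list[dict], w: list[float], attr: str, category: str) -> float:
    return sum(wi for row, wi in zip(rows, w) if row[attr] == category) / sum(w)


# Hypothetical sample that over-represents new panelists and multi-panel members.
rows = ([{"tenure": "new", "panels": "1"}] * 10 + [{"tenure": "new", "panels": "2+"}] * 30
        + [{"tenure": "established", "panels": "1"}] * 40
        + [{"tenure": "established", "panels": "2+"}] * 20)
margins = {"tenure": {"new": 0.30, "established": 0.70},
           "panels": {"1": 0.60, "2+": 0.40}}
w = rake(rows, margins)
print(round(weighted_share(rows, w, "tenure", "new"), 3),
      round(weighted_share(rows, w, "panels", "2+"), 3))  # 0.3 0.4
```

Raking needs only the target margins, not the full joint distribution of tenure and membership, which is often all a researcher has. It does require every category to be present in the sample, and very uneven samples can produce extreme weights that are usually trimmed in practice.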
The Questions Will Keep Coming. So Will the Answers.
The latest wave of technological innovation presents exciting new opportunities for quantitative market
research, but the industry needs better quality control mechanisms to fully exploit those opportunities.
Questions and concerns about data quality will continue to evolve. No one can claim to have all the answers,
but our goal is to ask the right questions and explore the right avenues as we continue to guide the industry in
assuring the highest possible data quality.