Session sponsored by ChoiceStream: How to keep noise from drowning your data, Digiday Brand Summit, December 2016

My name is Matt Rosenberg and I’m CMO at an ad tech company called ChoiceStream. I will not shock the room by saying that when brands spend money to advertise, it’s crucial to place the ad in front of the audience it was created for. Right? Duh? And in this world of media fragmentation, it’s an onerous task to build an audience at scale from individual publishers. Programmatic is increasingly the answer to that, finding your audience wherever they happen to be. It’s data that enables that. Most of the time, it’s data you rent rather than data you own. If you’re doing all your ad targeting based on your own first-party data, you can tune out for the next twelve minutes. But if, like most brands, you depend on your vendors to use really good data to target your desired audience, stay with me for a few minutes. Because the data your vendors are providing isn’t very good.
Most of that data comes from inferences based on people’s web browsing: the kind of content you’re seeing, the kinds of actions you’re taking, and much more. You go to a site that helps you pick baby names, you’re probably an expectant mother. Or are you? Could be dad, could be grandma, could be that your best friend is thinking of naming her kid Emma and you want to prove to her that there are just too damn many Emmas already. But if your cookie shows up later at a site about infant nutrition, score one more for the idea that you’re an expectant mom. So it’s inferences, but theoretically inferences that are validated to a level of certainty by multiple indicators. As you add in more data points, you get a sharper picture of your user… but as you do this, you may also reduce the size of your audience.
Advertisers need scale, and as a data vendor, if you can’t provide that, no one will buy your segment. Companies in the media business can’t make money if there’s no scale. But too much scale is like playing music too loud – you introduce distortion. One key issue in our industry is that there’s been a willingness to sacrifice fidelity for scale in the name of generating revenue. Think about it from the data provider’s perspective. If you can get 300 thousand people in a group with 95% confidence that they belong there, or 30 million people in a group with 60% confidence, well, it might not be such a hard decision to relax your model a bit, especially when no one is set up to audit you.
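To make that trade-off concrete, here is a minimal Python sketch of the arithmetic. It assumes, as a simplification, that "confidence" can be read as the share of the segment that truly belongs there; the function name `segment_breakdown` is hypothetical and used only for illustration.

```python
def segment_breakdown(size, confidence):
    """Split a segment into expected correct and incorrect members.

    Assumption: 'confidence' is the fraction of members who truly belong.
    """
    correct = round(size * confidence)
    return correct, size - correct


# The two options described in the talk.
tight = segment_breakdown(300_000, 0.95)
loose = segment_breakdown(30_000_000, 0.60)

print("tight segment (right, wrong):", tight)
print("loose segment (right, wrong):", loose)
```

Under that reading, the tight segment holds 285 thousand of the right people and 15 thousand of the wrong ones, while the loose segment holds 18 million of the right people alongside 12 million of the wrong ones. The loose segment reaches far more of the target audience, but two of every five impressions bought against it would be wasted.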
Now, lots of folks here at a Digiday conference have probably used data for targeting. So let me ask you: how would you grade the quality of the data you have access to, A through F? I’d give it about a C, because it’s a combination of As and Fs. The right audiences are in there – it’s just hard to find them among the noisy data. How noisy? Well, as a programmatic media company, we have access to a lot of different sources of data, the same companies that virtually any DSP or ad network works with. So we did a couple of tests.
We looked at several very commonly used vendors of audience data that are available to all programmatic companies. We took a random sample of 10 thousand users who were tagged as males. And we found [CLICK] that as many as 84 percent of those same users were also tagged as female by the same vendor. So this data is very inconsistent even internally. So the answer must be simply to find the most consistent data source and just use that, so you don’t run into a company that can’t tell the difference between boys and girls. Well… we tried to do that.
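A check like this one is simple to run against any vendor's tags. The Python sketch below is an illustration only, not ChoiceStream's actual tooling: the function name `conflicting_share`, the cookie IDs, and the sample tags are all made up.

```python
def conflicting_share(tags_by_user, label="male", opposite="female"):
    """Share of users tagged `label` who also carry the `opposite` tag."""
    tagged = [user for user, tags in tags_by_user.items() if label in tags]
    if not tagged:
        return 0.0
    both = [user for user in tagged if opposite in tags_by_user[user]]
    return len(both) / len(tagged)


# Hypothetical sample: cookie ID -> set of gender tags from one vendor.
sample = {
    "cookie_1": {"male"},
    "cookie_2": {"male", "female"},
    "cookie_3": {"female"},
    "cookie_4": {"male", "female"},
}

share = conflicting_share(sample)
print(f"{share:.0%} of users tagged male are also tagged female")
```

The same function works for a cross-vendor comparison if each user's set combines one vendor's "male" tag with another vendor's "female" tag.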
We took the two most consistent vendors, the folks who had the fewest males also tagged as females. We took over 100 thousand random males from Vendor A who also had any gender data in Vendor B. [CLICK] And when we looked at that data, we found that fully one third of Vendor A males were tagged as females by Vendor B. Now, this is 2016 and we now know that gender exists along a continuum... But seriously... What we’re seeing here is that different inference-based methodologies can yield very different conclusions. And that is one of the big issues with audience data. The data is drowning in noise. The right people ARE in there – but a lot of the wrong people are in there too, and you can’t tell who is who. And it gets even trickier when you get past the supposedly simple things like demographics and into more interesting segmentation around interests and psychographics. So what do you do?
You need to interrogate the data. What do I mean when I say that? First, don’t trust that it is what it says it is. A segment called Fashionistas should not be assumed to contain only people interested in fashion. I’ll tell you one of the things ChoiceStream does to interrogate the data, and I’m not going to go deeply into it because I’m not standing up here as an ad for ChoiceStream. Though salespeople are standing by. Part of our practice is that we have a consumer-facing website called Pollshare.com where we ask people questions, many of them segmentation questions. How interested in fashion are you: love it, like it, neutral, not interested? From the love-its and like-its, we build a model: what do those cookies have in common? When we see a cookie from some third-party fashionista segment come through, we compare it with our model, and any that don’t look like fashionistas to us get blocked. We actually did an experiment last month. We asked that question about fashion interest, then partnered with GfK to put it to a third-party segment, and again to that same segment with our model as a filter.
So, you start off buying a fashionista segment that you assume is all people interested in fashion. These third-party segments might look at users who visit Refinery29 and Vogue and Chanel.com and say look, these are users interested in fashion. [CLICK] But when you run a survey into that segment, you find that only about 45% of it says they like or love fashion. 55% of that group is neutral or uninterested. [CLICK] Now apply the filter built from asking people, and the audience is 60% top two boxes: a 15-point improvement across the total audience and about a 30% improvement in segment accuracy. Now, that doesn’t get you all the way to a hundred... And just for the record, asking the question is only one thing we do at ChoiceStream to drive that accuracy up. But it’s not nothing either, and all it takes is checking against an actual source of people who self-identify.
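The two ways of stating that gain are easy to mix up, so here is a small Python sketch that computes both from the rounded survey figures quoted in the talk (about 45% before the filter, 60% after). The function name `lift` is hypothetical.

```python
def lift(before_pct, after_pct):
    """Return (percentage-point gain, relative gain) between two rates."""
    point_gain = after_pct - before_pct
    relative_gain = point_gain / before_pct
    return point_gain, relative_gain


# Top-two-box share before and after filtering, in percent.
points, relative = lift(45, 60)
print(f"{points} points, {relative:.0%} relative")
```

Taken at face value, the rounded inputs give a 15-point gain and a relative gain of about one third. The survey figures are approximate, so the relative figure quoted in the talk may reflect unrounded results.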
My point is that when you just trust the data and don’t take steps to filter out the noise, you will drown in the noise. Asking people is not the only way to go, and it’s not even the only thing we do. But without interrogating the data, without doing something, you are in a world of waste. So what do you do, really? If you do your media planning in-house, you can run direct tests to figure this out. Pin your vendors to the wall and really get into what they are doing to validate the data they’re selling you. You’ll be shocked. Most of you probably use agencies. Do not assume that your agencies are doing this work for you. The big dirty secret is that no one has any incentive to question the data. Agencies summarize your target audience in an RFP, and their vendors will often respond with a targeting story whose beginning, middle, and end are the name of a third-party segment. Don’t let them do that, and don’t let their vendors do that. It takes time and effort, but go to vendor meetings with them. Get into the nitty-gritty on where the targeting data is coming from. Have them set up media tests to see where the quality is for your various segments. Don’t let anyone in your ecosystem out of their responsibility to separate the signal from the noise.
Happy to take any questions.