Pointlogic Analysis Data Fusion

A Pointlogic White Paper
Data Fusion:
Combining Multiple Analyses

Susanne Hartog-Buijtenhek

enabling smart decisions
www.pointlogic.com


Preface
Every year, a great deal of money is spent on advertising. Advertisers want
to know what the pay-off of their advertising will be, and therefore how
many people will see it (in other words, how many people will be reached).
Several respondent studies are available to meet this need for information.
For example, the reach of magazines and newspapers is measured by print
studies. In a print study, a so-called ‘reading probability’ is available
for every respondent. This ‘reading probability’ serves as an indicator in
computing the reach.

In contrast, the reach of websites is measured by an internet study that
tracks the behavior of internet respondents. The results are used to
compute the probability that a respondent visits a certain website in a
certain period. The resulting data are published by independent agencies
and serve as the currency in the market.

Advertisers show an increasing demand for combined reach figures. This is
the result of the increased use of several media in a single advertising
campaign. Moreover, publishers of print media often have an accompanying
website. Hence, the question is: who is reached by both an advertisement
in a magazine or newspaper and an advertisement on the internet?
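
To make the question concrete, the sketch below computes print reach, web
reach, the overlap and the combined reach for respondents that carry both a
reading probability and a visiting probability. The respondents, the weights
and the field names are hypothetical, and the overlap is computed under the
assumption that reading and visiting are independent within one respondent;
the paper does not state how the overlap is calculated.

```python
# Hypothetical respondents that carry both probabilities (illustration only).
respondents = [
    {"weight": 1.0, "read_prob": 0.8, "visit_prob": 0.5},
    {"weight": 1.0, "read_prob": 0.2, "visit_prob": 0.0},
    {"weight": 2.0, "read_prob": 0.0, "visit_prob": 0.1},
]


def reach_figures(rows):
    """Weighted reach of print, web, both media, and at least one medium."""
    total = sum(r["weight"] for r in rows)
    print_reach = sum(r["weight"] * r["read_prob"] for r in rows) / total
    web_reach = sum(r["weight"] * r["visit_prob"] for r in rows) / total
    # Assumption: within one respondent, reading and visiting are independent,
    # so the probability of being reached by both is the product.
    both = sum(r["weight"] * r["read_prob"] * r["visit_prob"] for r in rows) / total
    # Inclusion-exclusion: reached by at least one of the two media.
    either = print_reach + web_reach - both
    return {"print": print_reach, "web": web_reach, "both": both, "either": either}


figures = reach_figures(respondents)
print(figures)
```

The "both" figure is the overlap that the question above asks about; it can
only be computed when the two probabilities sit on the same respondent.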

Data fusion
A combined study, with information on both print reach and internet reach,
could be created by setting up a new study that measures both among the
same respondents. However, this is not cost-efficient. An alternative is to
complement the print study with information about internet reach. This is
done with a mathematical technique that uses overlapping information, i.e.
information that is available in both studies. This technique is called
data fusion.

Data fusion combines the information of two studies by using this
overlapping information. We used one of the data fusion techniques to
generate combined print and internet data. This technique is related to a
well-known statistical technique named “imputation”, which is used to fill
in missing data in a dataset. Essentially, the print study can be seen as a
study in which some data (the internet behavior) are missing.

The data fusion method consists of two sequential steps. First, econometric
models are estimated on the respondent data of the internet study. Second,
these models are applied to the respondents of the print study. Since the
dataset contains a large number of websites, both steps are carried out
fully automatically.
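
The two steps can be sketched as follows. This is a minimal illustration and
not the models used in the paper: the "model" is simply the average visiting
probability per age group, and the age groups, field names and values are
hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data. "age_group" is the overlapping variable that is
# present in both studies.
internet_study = [
    {"age_group": "18-34", "visit_prob": 0.40},
    {"age_group": "18-34", "visit_prob": 0.20},
    {"age_group": "35+", "visit_prob": 0.10},
    {"age_group": "35+", "visit_prob": 0.00},
]
print_study = [
    {"age_group": "18-34", "read_prob": 0.5},
    {"age_group": "35+", "read_prob": 0.8},
]


def estimate_model(donor_rows):
    """Step 1: estimate a (very crude) model on the internet study."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in donor_rows:
        totals[row["age_group"]] += row["visit_prob"]
        counts[row["age_group"]] += 1
    return {group: totals[group] / counts[group] for group in totals}


def apply_model(model, recipient_rows):
    """Step 2: apply the model to the respondents of the print study."""
    return [dict(row, visit_prob=model[row["age_group"]]) for row in recipient_rows]


model = estimate_model(internet_study)
fused = apply_model(model, print_study)
for row in fused:
    print(row)
```

After the second step every print respondent carries both a reading
probability and an imputed visiting probability.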

The estimation of models
The first step in data fusion is estimating econometric models that explain
the internet behavior of respondents in the internet analysis.

Estimating models on internet data is not a straightforward process,
because website reach data have a specific character. For most websites, a
respondent's visiting probability is equal to zero. Of the respondents with
a visiting probability greater than zero, a substantial number still have a
probability that is very small. This structure rules out a standard
regression model, and hence a more sophisticated model has to be chosen.
The overlapping information can be used for the explanatory variables.
However, the visiting probabilities of different websites are also highly
correlated with one another. If only the overlapping information is taken
into account, this correlation between websites is ignored. It can be
included by using the other websites as explanatory variables. The
challenge in this method lies in the application of the models: when a
model for one website is applied to the respondents of the print study, the
information on the other websites is still missing. The answer to this
problem is an iterative technique named the Gibbs sampler, which is
discussed later on.
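
The zero-heavy structure described above can be illustrated with a two-part
decomposition: one part for whether the probability is positive at all, and
one part for its size when it is positive. The paper does not say which
model was chosen, so this decomposition is an assumption made for
illustration only, and the values are hypothetical.

```python
# Hypothetical visiting probabilities for one website: mostly zero, and
# mostly very small when positive.
visit_probs = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.01, 0.02, 0.05, 0.60]

positives = [p for p in visit_probs if p > 0]

# Part 1: the share of respondents with any positive visiting probability.
share_positive = len(positives) / len(visit_probs)

# Part 2: the average visiting probability among those respondents.
mean_if_positive = sum(positives) / len(positives)

# The two parts multiply back to the overall average.
expected_value = share_positive * mean_if_positive
overall_mean = sum(visit_probs) / len(visit_probs)

print(share_positive, mean_if_positive, expected_value, overall_mean)
```

In a full model each part would depend on explanatory variables; here both
parts are plain averages to keep the structure visible.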

Models need to be estimated for over 300 websites, each with different
characteristics. Given this number, the models cannot be specified
manually. We have therefore developed an automated estimation procedure
that selects the relevant explanatory variables per website, based on their
correlation and partial correlation with the website's visiting
probability.
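
A selection based on correlation and partial correlation could look like the
sketch below. The exact selection rule is not given in the paper, so the
rule used here is an assumption: the candidate with the strongest
correlation is taken first, and further candidates are kept only if their
partial correlation with the target, controlling for that first variable,
reaches a threshold. The data, the variable names and the threshold of 0.3
are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when one series has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    denom = math.sqrt(max(0.0, 1 - rxz ** 2) * max(0.0, 1 - ryz ** 2))
    if denom < 1e-12:  # x or y is (almost) fully explained by z
        return 0.0
    return (rxy - rxz * ryz) / denom


def select_variables(target, candidates, threshold=0.3):
    """Pick the strongest candidate, then those that still add information."""
    ranked = sorted(
        candidates,
        key=lambda name: abs(pearson(candidates[name], target)),
        reverse=True,
    )
    first = ranked[0]
    selected = [first]
    for name in ranked[1:]:
        r = partial_corr(candidates[name], target, candidates[first])
        if abs(r) >= threshold:
            selected.append(name)
    return selected


# Hypothetical data for eight respondents of the internet study.
site_a = [0.00, 0.00, 0.05, 0.10, 0.00, 0.30, 0.20, 0.00]
candidates = {
    "young": [0, 0, 1, 1, 0, 1, 1, 0],
    "site_b": [0.00, 0.00, 0.10, 0.15, 0.00, 0.40, 0.25, 0.05],
    "household_size": [2, 4, 1, 3, 2, 2, 4, 1],
}
selected = select_variables(site_a, candidates)
print(selected)
```

Because the rule needs no manual judgment, it can be run in a loop over all
websites.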

Applying models
Once the models have been estimated, they have to be applied. Before the
models are actually applied, a starting value is created for every
respondent and every website. These starting values form the basis of the
Gibbs sampler implementation.

The models of the internet study also have other websites as explanatory
variables. Because starting values have been set, websites can be used as
explanatory variables when the models are applied. The models are then
applied iteratively: in every iteration the current values are updated, and
the updated values are carried through into the models of the other
websites. For the sake of convergence, it is important to limit the number
of websites included in each model.
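
The iterative application can be sketched as a Gibbs-style loop. Everything
specific in this sketch is an assumption made for illustration: there are
only two websites, the models are linear with normal noise, the coefficients
are invented, and drawn values are clipped to the range 0 to 1.

```python
import random

random.seed(0)

# Hypothetical fitted models: each site's visiting probability depends on an
# overlapping variable ("young") and on the other site's visiting probability.
MODELS = {
    "site_a": {"intercept": 0.05, "young": 0.10, "other": ("site_b", 0.5), "sd": 0.03},
    "site_b": {"intercept": 0.02, "young": 0.05, "other": ("site_a", 0.4), "sd": 0.03},
}


def draw(model, respondent, state):
    """Draw one site's value given the respondent and the other site's value."""
    other_site, weight = model["other"]
    mean = (
        model["intercept"]
        + model["young"] * respondent["young"]
        + weight * state[other_site]
    )
    # Clip to [0, 1] so the drawn value remains a valid probability.
    return min(1.0, max(0.0, random.gauss(mean, model["sd"])))


def gibbs_fuse(respondents, n_iter=50, start=0.1):
    """Start from a fixed value and update every site in every iteration."""
    states = [{site: start for site in MODELS} for _ in respondents]
    for _ in range(n_iter):
        for respondent, state in zip(respondents, states):
            for site, model in MODELS.items():
                # The value just drawn is used by the next site's model.
                state[site] = draw(model, respondent, state)
    return states


# Hypothetical respondents of the print study (overlapping variable only).
print_respondents = [{"young": 1}, {"young": 0}, {"young": 1}]
imputed = gibbs_fuse(print_respondents)
for row in imputed:
    print(row)
```

The cross-site weights in this sketch are kept small on purpose: if each
model leaned heavily on many other websites, the values could keep feeding
on each other instead of settling down.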

When applying the models, one could choose to impute the expected value per
respondent. However, in order to maintain the variance in visiting
probabilities, a better alternative is to impute random draws from the
probability distribution that the models imply for each respondent.
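
The difference between the two options can be shown with a small simulation.
The Beta distribution and its parameters are assumptions chosen for
illustration; the paper does not name the distribution implied by its
models.

```python
import random
import statistics

random.seed(1)

# Hypothetical fitted distribution for a group of similar respondents.
ALPHA, BETA = 0.5, 4.5
N_RESPONDENTS = 1000

expected = ALPHA / (ALPHA + BETA)  # mean of a Beta(ALPHA, BETA) distribution

# Option 1: every respondent receives the expected value.
mean_imputed = [expected] * N_RESPONDENTS

# Option 2: every respondent receives a random draw from the distribution.
draw_imputed = [random.betavariate(ALPHA, BETA) for _ in range(N_RESPONDENTS)]

print("variance, expected value imputed:", statistics.pvariance(mean_imputed))
print("variance, random draws imputed:  ", statistics.pvariance(draw_imputed))
print("mean, random draws imputed:      ", statistics.fmean(draw_imputed))
```

Both options give roughly the same average, but only the random draws keep
the spread between respondents that later reach calculations rely on.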

Results
The results of the data fusion technique have been extensively validated.
This was possible because a number of respondents were present in both
studies¹. Having both the true values and the imputed values for these
respondents provides an unbiased way to validate the model’s results, which
were remarkably positive.
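
A comparison on overlapping respondents could be set up as in the sketch
below. The observed and imputed values are hypothetical, and the chosen
measures (mean, variance and mean absolute error) are assumptions; the paper
does not list the statistics that were used.

```python
import statistics

# Hypothetical overlapping respondents: the visiting probability observed in
# the internet study next to the value imputed by the fusion.
observed = [0.00, 0.10, 0.00, 0.40, 0.05, 0.00]
imputed = [0.00, 0.05, 0.02, 0.35, 0.10, 0.00]


def validation_report(true_values, imputed_values):
    """Compare the level, the spread and the respondent-level error."""
    errors = [abs(t - i) for t, i in zip(true_values, imputed_values)]
    return {
        "mean_true": statistics.fmean(true_values),
        "mean_imputed": statistics.fmean(imputed_values),
        "var_true": statistics.pvariance(true_values),
        "var_imputed": statistics.pvariance(imputed_values),
        "mean_abs_error": statistics.fmean(errors),
    }


report = validation_report(observed, imputed)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

Comparing the means checks the reach level, and comparing the variances
checks that the spread between respondents has been maintained.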

Qualitative validations are at least as important as quantitative
validations. From a mathematical point of view, the results contain the
average reach as well as the variance of the unbiased estimators. The
comparison on the overlapping respondents provides a further validation of
the method used. Unfortunately, this does not automatically mean that the
analysis is accepted by the market. Other validations are necessary for
general acceptance.

Two validations that were carried out are an assessment of the significant
explanatory variables used in the models and an assessment of the final
combined reach data. The variables used simply need to be ‘logical’.

Most important, however, is whether the publishers recognize the final
overlap and experience it as logical. Some publishers strive to make the
overlap as small as possible and therefore try to attract a different
audience. Others aim to make the overlap as large as possible, for example
by placing a reference to their website in the magazine. Whether the final
overlap is recognized is an important part of the acceptance and hence of
the validation.

Future
To conclude, the column-wise data fusion method provides very good
results for combined print-internet reach. The method will be applied
semi-annually, each time on the basis of the new print and internet
analyses, in order to generate a combined analysis file.

These results make it desirable to test the methodology on other analyses.
The method can quite easily be extended with new model formulations, which
can then be used to determine combined reach with other media.

¹ Both analyses come from the same agency.

About Pointlogic | enabling smart decisions
Founded in 1992 by Peter Kloprogge and Sjoerd Mostert - with offices
in New York, London, Frankfurt, Sydney, Amsterdam, and Rotterdam
- Pointlogic combines cutting-edge research, advanced mathematical
modeling, and flexible software tools to enable our clients to make
smart decisions.

Pointlogic works together with clients, applying fresh, analytical
thinking to problems. We then use powerful mathematical modeling
to generate insight into clients’ choices. And then, most importantly,
we deliver concrete, software-based solutions that clients can both
implement and distribute across internal and partner networks.

For more information about any of Pointlogic’s products or for
press inquiries please contact Nicole Alexander:

Office: 212-683-2330
E-Mail: alexander@pointlogic.com
