Copyright

     by

Esteban Ribero

    2005
Brand Communications Modeling:

Developing and Using Econometric Models in Advertising.

        An Example of a Full Modeling Process



                           By



                  Esteban Ribero, B.A.




                         Report



     Presented to the Faculty of the Graduate School

          of The University of Texas at Austin

                  in Partial Fulfillment

                  of the Requirements

                    for the Degree of




                     Master of Arts




           The University of Texas at Austin

                    December, 2005
Brand Communications Modeling:

Developing and Using Econometric Models in Advertising.

        An Example of a Full Modeling Process




                                  APPROVED BY

                                  SUPERVISING COMMITTEE:


                                  __________________________
                                  John D. Leckenby


                                  __________________________
                                  Gary B. Wilcox
Brand Communications Modeling:

               Developing and Using Econometric Models in Advertising.

                          An Example of a Full Modeling Process

                                   Esteban Ribero, M.A.

                          The University of Texas at Austin, 2005

                            SUPERVISOR: John D. Leckenby



       This report presents a description and a complete example of the modeling process required to build a comprehensive market response model, one that accounts for the impact of past marketing actions on sales so that better and more informed decisions can be made about advertising and marketing management problems.

       Real marketing and sales data from a major competitor in the skin-care market of a Latin American country were analyzed using multivariate time-series regression. The report presents a full description and an example of the four major steps required to build a market response model: specification, estimation, verification and prediction. The model developed was then used to measure the ROI of the different marketing actions undertaken during the period analyzed. A market share decomposition analysis, along with other analyses, was provided to quantify the direction and strength of the impact of the market share drivers. The model was also used to simulate two slightly different scenarios, illustrating the “what-if” analysis that a market response model makes possible and suggesting different marketing and media strategies for the brand.




Table of Contents


List of tables…………………………………………………………………………….vii

List of figures……………………...……………………………………………………viii

Brand Communications Modeling: Developing and Using Econometric Models in

Advertising. An Example of a Full Modeling Process……………………………………1

      The Eras of Marketing Modeling………………………………………………….5

The Modeling Process……………………………………………………………………..7

      Specification………………………………………………………………………9

             The modeler’s toolbox…………………………………………………...13

                    Current effects functional forms…………………………………13

                    Lagged advertising effects……………………………………….18

                    Modeling with adstock………………………………………...…23

      Estimation………………………………………………………………………..24

             Ordinary Least Squares…………………………………………………..25

             Generalized Least Squares……………………………………………….30

             Nonlinear Least Squares…………………………………………………32

             Maximum Likelihood…………………………………………………....33

      Verification………………………………………………………………………34

      Prediction………………………………………………………………………...41

      Model building Summary………………………………………………………..43

An Example……………………………………………………………………………...45

      Specifying the model…………………………………………………………….45

      Estimating the model……………………………………………………………48



Verifying the model.……………………………………………………………52

   Validating the model……………………………………………………………55

Using the model………………………………………………………………………...60

Summary………………………………………………………………………………..69

References………………………………………………………………………………70

Vita……………………………………………………………………………………..72




List of Tables

Table 1…………………………………………………………………………………...24

Table 2…………………………………………………………………………………...39

Table 3…………………………………………………………………………………...40

Table 4…………………………………………………………………………………...49

Table 5…………………………………………………………………………………...50

Table 6…………………………………………………………………………………...51

Table 7…………………………………………………………………………………...56

Table 8…………………………………………………………………………………...58

Table 9…………………………………………………………………………………...65




List of Figures

Figure 1…………………………………………………………………………………....8

Figure 2…………………………………………………………………………….….....11

Figure 3…………………………………………………………………………….….....12

Figure 4…………………………………………………………………………….….....12

Figure 5…………………………………………………………………………….….....18

Figure 6…………………………………………………………………………….….....27

Figure 7…………………………………………………………………………….….....37

Figure 8…………………………………………………………………………….….....54

Figure 9……………………………………………………………………………...….. 57

Figure 10………………………………………………………………………………....59

Figure 11………………………………………………………………………………....61

Figure 12………………………………………………………………………………....64

Figure 13………………………………………………………………………………....68




Brand Communications Modeling:

                   Developing and Using Econometric Models in Advertising.

                              An Example of a Full Modeling Process



       The way advertising is planned and executed is changing. The media landscape has been changing at an impressive rate, and the development of new technologies has made possible the emergence of new and multiple media. The fragmentation of media channels, the shrinking audiences of traditional media and the empowerment of consumers create a new set of rules for marketing and advertising managers who want to succeed in an increasingly competitive landscape.

       Within this framework, being accountable is no longer a desire; it is a necessity. The famous statement attributed to John Wanamaker is more relevant now than ever: “I know half of my advertising budget is wasted. The problem is I don’t know which half.” Finding out which half is what we need now, and this applies not only to advertising but to all marketing activities. Being able to fully understand the effects of the different marketing policy instruments on sales should be a regular practice for marketing and advertising managers.

       Fortunately, with today’s improvements in data collection and statistical analysis techniques, it is possible to address the problem in a scientific, yet subjective, manner. As we will see, the use of mathematical models to help marketers and advertising professionals solve management problems is not new. However, econometric modeling has recently become an important activity in the advertising industry, and more and more companies are using the technique to improve their decision making




process. “Econometrics buzzes ad world as a way of measuring results,” claimed a recent article in the Wall Street Journal (Patrick, 2005). The article noted the recent rise in the number of employees working on econometric models in the advertising industry. For example, WPP’s MindShare has increased the number of people doing econometric modeling from 20 to 150 in just five years. Omnicom’s OMD has its own business unit (OMD Metrics) dedicated to building econometric models for its international and local clients, and its staff has grown from 6 to 45 in the past three years.

       Why is it so important to use formalized models in an industry that has traditionally been reluctant to submit to scientific scrutiny? The game has changed: the proliferation of options for promoting a brand’s sales and the pressure for accountability are demanding more measurable results from the advertising industry. The pressure to come up with ways to show which ads and media strategies boost sales of a product is the driving force behind this new interest in econometric modeling.

       There are many benefits to using formalized models to solve complex problems like those one might encounter in marketing and advertising. John Sterman, an MIT professor dedicated to the use of formalized models to improve our ability to comprehend and manage complex systems, discusses the advantages of formalized models over mental models. Following Sterman (1992), mental models have some advantages: they are flexible, take a wide range of information into account, can be adapted to new situations and are updated with new information. But mental models also have great disadvantages: they are not explicit and not easily examined by others, and their assumptions are hard to discuss, even those of our own mental models. The most important problem with mental models, however, is that our rationality is bounded: the best-intentioned mental analysis of




a complex problem cannot hope to account accurately for the effects of all the

interactions between the variables, especially if those interactions are nonlinear.

       Formal models’ assumptions, on the other hand, can be discussed openly. Formal models are able to relate many factors simultaneously and can be simulated under controlled conditions, allowing analysts to conduct experiments that are not feasible in the real world.

       This does not mean that formal models are correct. All models are wrong (Sterman, 2002): they represent reality; they are not reality. But formalized models can help us understand the systems we work in and for.

       Advertising and marketing managers can benefit greatly from using models to solve important problems. For example, econometric models can help a manager find the optimal or near-optimal advertising budget for future periods. The analysis would allow him or her to find the advertising budget adequate for attaining a specific sales goal or, if financial information is available, the model can incorporate short-term and long-term criteria to maximize profit. (For some examples, visit the following addresses:

http://www.ciadvertising.org/sa/spring_05/adv391k/eribero/frameset.htm

http://www.ciadvertising.org/sa/spring_05/adv391k/eribero/Solo2/frameset.htm ).

   Other applications of the modeling process could help managers to answer the

following questions:

   •   What is the optimal mix of TV vs. Posters vs. Radio?

   •   What happens to sales when we obtain a wider distribution?

   •   What happens to sales when we do not advertise?

   •   How much should we spend on advertising vs. promotion?


•   What is the best pattern and level of advertising for my brand?

   •   How effective is our pricing strategy?

   •   Which competitors hurt my brand and how?

   •   Which of my communications channels offers best value for money?

   •   How does advertising work and how can we prove this to the Financial Director?

   •   How do I spend the same budget but increase sales?

   •   What’s the impact of economic changes on my brand?


   •   Which copy strategy/campaign worked better?

   •   How much could we sell next period with budget X?



   Besides these direct practical applications for budgeting, forecasting and accountability, the modeling process can improve the manager’s ability to cope with a complex environment. Leeflang, Wittink, Wedel & Naert (2000, p. 25-27) list eight possible indirect benefits of using models in business, described as follows:

       1. “A model would force him [a manger] to explicate how the market works. This explication

       alone will often lead to an improved understanding of the role of advertising and how advertising

       effectiveness might depend on a variety of other marketing and environmental conditions.”

       2. “Models may work as problem-finding instruments. That is, problems may emerge after a

       model has been developed. Managers may identify problems by discovering differences between

       their perception of the environment and a model of that environment.”

       3. “Models can be instrumental in improving the process by which decision-makers deal with

       existing information”




                                                   4
4. “Models can help managers decide what information should be collected. Thus models may

       lead to improved data collection, and their use may avoid the collection and storage of large

        amounts of data without apparent purpose.”

       5. “Models can also guide research by identifying areas in which information is lacking, and by

       pointing out the kinds of experiments that can provide useful information.”

       6. “[A] model helps the manager to detect a possible problem more quickly, by giving him an

       early signal that something outside the model has happened”.

       7. “Models provide a framework for discussion. If a relevant performance measure (such as

       market share) is decreasing, the model user may be able to defend himself to point to the effects of

       changes in the environment that are beyond his control, such as new product introductions by the

       competition. Of course, a top manager may also employ a model to identify poor decisions by

       lower-level managers.”

       8. “Finally, a model may result in a beneficial reallocation of management time, which means less

       time spent on programmable, structured, or routine and recurring activities, and more time on less

       structured ones.”

                                The Eras of Marketing Modeling

       As Leckenby and Wedding (1982) said, “the concept of model building in advertising can be traced back only as far as the early 1950’s”. Even though this is a relatively short history, Leeflang et al (2000) identified five eras of model building in marketing.

       The first era is characterized by the emulation or transposition of Operations Research and Management Science into the marketing framework. The OR/MS tools, which included mathematical programming, computer simulation, game theory, and dynamic modeling, were initially developed to solve some of the strategic problems faced during World War II. The emphasis was on quantitative method sophistication rather than on the

marketing problem per se (Leckenby & Wedding, 1982). The advertising and marketing



problem was adjusted to fit the requirements of the technical methods available, rather

than the other way around. The methods were typically not realistic, and the use of those

methods in marketing applications was therefore very limited (Leeflang et al, 2000).

       The second era, which ended in the late sixties/early seventies, was characterized by attempts to adapt models to fit marketing problems in order to overcome the misuse of the OR approach in the advertising and marketing field. The models, however, were so complex that they lacked usability.

       The third era, which started around 1970, showed an increased emphasis on models that were good representations of reality and at the same time easier to use. John D.C. Little developed the concept of “Decision Calculus”. He used the term to describe models that would process judgments and data in a manner that would assist the manager in decision making (Leckenby and Wedding, 1982). This emphasis on helping decision making marked a major change in the direction of model building in advertising. Little (1970) suggested possible answers to the question of why models were not used: good models and parameterizations are hard to find; managers do not understand models; and models are incomplete. To overcome such problems, a model should be simple, robust, easy to control, adaptive, complete on important issues, and easy to communicate with. He also said that a model should be evolutionary (Little, 1975), meaning that a model should start with a simple structure to which detail is added later. The use of judgmental data as well as objective data in the model building process helped raise the rate of model implementation (Leeflang et al, 2000).

       Even though the third era of modeling in marketing and advertising was focused on the implementation and usability of models, it was not really until the fourth era (starting in the mid-1980s) that models were actually implemented (Leeflang et al, 2000). The

main factor that helped this implementation boom was the availability of precise

marketing data coming from scanning equipment that captured in-store and household-

level purchases. This era coincided with the proliferation of marketing support systems.

       The fifth era may be characterized by an increase in routinized model

applications. It is predicted that in the coming decades the age of marketing decision

support will usher in an era of marketing decision automation (Leeflang et al, 2000;

Bucklin et al, 1998). It is expected that marketing support systems take care of routine

marketing decisions like assortment decisions and shelf space allocation, customized

product offerings, coupon targeting, loyalty reward and frequent shopper club programs,

etc. The focus of this paper is in the model building process representative of the third

and fourth era.

                                    The Modeling Process

The model building process for any mathematical model, including response models, is

supposed to follow a sequence of steps. The traditional view assumes the following four

steps: specification, estimation, verification and prediction (Leckenby & Wedding, 1982;

Leeflang et al, 2000). Leeflang et al (2000) propose an alternative sequence more focused

on implementation (see figure 1; for a detailed explanation of the implementation view

see Leeflang et al, 2000, chapter 5). In order to keep it as simple as possible we are

focusing on the traditional view.




Figure 1. The implementation view on model building. (From Leeflang et al,

2000, p. 52)




Specification

       “A model is a representation of the most important elements of a perceived

realworld system.” (Leeflang et al, 2000)

       In order to better understand the model building process, and especially the specification stage, it is important to analyze the definition provided above. The definition indicates that models are representations, “simplified pictures” (Leeflang et al, 2000) of reality. Those representations may be useful for decision makers trying to understand the reality they deal with. The definition also has an extremely important implication: since a model is a representation of a perceived real world, it is something subjective. Different model builders could have different perceptions and interpretations of the same reality. Modelers could also have different opinions about which are “the most important elements” to represent. This makes the model building process not only more interesting but very dependent on the modeler’s “theory” of the reality he tries to represent.

       That is why it is so important in the model building process to fully specify the

variables and the relationship between them. That is exactly what is done in the

specification stage.

        For example, if we consider sales as the dependent variable and advertising and

the rest of the marketing policy instruments as the independent variables, specification

would be the process of deciding upon the functional form which will describe the

relationship between advertising (and the other marketing variables) and sales (Leckenby

& Wedding, 1982). In other words: “specification is the process by which the manager’s

theory of how advertising works for a particular brand or company is put into testable

form” (Leckenby & Wedding, 1982, p. 257).




Rephrasing Little’s suggestions for building good models (Little 1970), a model

should be:

               a. simple;

               b. complete on important issues;

               c. adaptive;

               d. robust.

Leeflang et al (2000) pointed out that it is easy to see that some of these criteria are in conflict. They state that “none of the criteria should be pushed to the limit. Instead, we can say that the more each individual criterion is satisfied, the higher the likelihood of model acceptance” (Leeflang et al, 2000, p. 53).

       While specifying a model one should then consider these elements. As a goal, models should be as simple as possible. That is, following the principle of parsimony, one should choose among competing models the one that fairly represents the reality with the simplest structure. Equally important is the trade-off between accuracy and usability. It is not uncommon to find two competing models that perform differently on these two criteria. If accurate forecasting is more important than understanding the effects of the independent variables, then the more accurate model should be chosen even though it might be more complex and thus harder to explain and use. But if it is more important to understand the market dynamics and the way the marketing variables affect sales, the simpler model should be used.

       Fortunately for modelers, there are several functional forms to choose from when specifying a model. The one selected depends on the above criteria as well as on the




underlying theory of marketing and advertising that the manager or modeler is

considering.

       We will first consider the different shapes that a response function might have. Then we will describe some of the response functions most used in advertising.

       The shapes of a response function can be classified as linear, concave or s-shaped. Any other shape can be produced by combining one or more of these shapes. Figure 2 shows a typical linear response, figure 3 shows different concave response shapes and figure 4 shows some s-shaped functions.




       Figure 2. A linear response function (sales Q plotted against advertising A)




Figure 3. Some concave response functions




Figure 4. Some s-shape response functions




The modeler’s toolbox

       “To the craftsman with a hammer, the entire world looks like a nail, but the

availability of a screwdriver introduces a host of opportunities!”

                                                       Lilien & Rangaswamy (1998)

       While it is true that one should not modify problems to fit the tools, it is easier for the modeler if he/she can choose from a series of predetermined functions that can then be modified to fit the problem. The decision to pick one or another depends on the problem at hand and on data availability. For example, a linear function (the simplest possible response function) could fit the data quite well if the data range corresponds to a linear section of a more complex response function (Lilien & Rangaswamy, 1998).

       The following are some of the most used response functions in advertising. Even

though a brief description of the functions is provided, for more details please refer to

Hanssens, Parsons & Schultz, 2001; Leeflang et al, 2000; or Kotler, 1971.

       Current effects functional forms. The simplest response functions, Current Effects (CE) functions, assume that the effects of the marketing variables occur in full in the same period in which they appear. For example, advertising expenditures in April are supposed to affect sales in April and only April. While this might not hold true for most brands, CE functions are useful for their simplicity and ease of explanation.

       The Linear response model has the following form:

                S = a + bA + u

       Where:

                S = Sales




a = the y intercept

               b = slope of the function

               A = Advertising expenditures

               u = disturbance term or error term

       The linear response function assumes constant returns to scale. That is, sales increase by a constant amount for each equivalent constant increase in marketing effort (Figure 2). The linear model would not lead to locally different conclusions than another function if the data are available only over a limited range. While adequate for asking “what if” questions around the current operating range, the linear model would be misleading if values outside that range are used, as would be the case in trying to find the optimal advertising effort.

       More realistic response models are said to have diminishing returns to scale.

These models suppose that sales always increase with increases in advertising or

marketing effort, but each additional unit of marketing effort brings less in incremental

sales than the previous unit did (Hanssens et al, 2001). The following concave downward

response functions show diminishing returns to scale:

       The Semilogarithmic (Log) function:

                S = a + b ln A + u

       The Square-root function:

                S = a + b √A + u

       The Quadratic function:

                S = a + b_1 A − b_2 A^2 + u




The quadratic function has an important property that differentiates it from the others: it can represent the concept of supersaturation, the phenomenon that occurs when too much marketing effort causes a negative response. The so-called “wearout” effect is an example of supersaturation in advertising.

        The following functions are nonlinear in the variables but linear in the parameters and can be linearized with some algebra so that they can be estimated through linear regression (see the Estimation section in this paper):

        The Power function:

                 a) S = a A^b

                 b) ln S = ln a + b ln A

        The power function is very flexible since, depending on the value of the parameter b, it can take very different forms (see Leeflang et al, 2000, p. 75-76; Kotler, 1971, p. 33). It also has the attractive characteristic that the coefficient b is the elasticity of demand with respect to advertising (Hanssens et al, 2001, p. 101; Broadbent, 1997). Also, when more than one independent variable is considered, the power function, also known as the multiplicative function, accounts for possible interactions between the independent variables.
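
        To make the linearization concrete, the following sketch (in Python, added here purely for illustration; the data are simulated and every name and value is hypothetical) recovers the elasticity b by regressing ln S on ln A:

import numpy as np

rng = np.random.default_rng(42)

# Simulated data for a multiplicative response: S = a * A^b * e^u
a_true, b_true = 50.0, 0.3            # b_true is the advertising elasticity
A = rng.uniform(100, 1000, size=52)   # hypothetical weekly advertising spend
u = rng.normal(0, 0.05, size=52)      # disturbance term
S = a_true * A**b_true * np.exp(u)

# Linearize (ln S = ln a + b ln A + u) and estimate by OLS
X = np.column_stack([np.ones_like(A), np.log(A)])
coef, *_ = np.linalg.lstsq(X, np.log(S), rcond=None)
print(f"estimated ln a = {coef[0]:.3f} (true: {np.log(a_true):.3f})")
print(f"estimated elasticity b = {coef[1]:.3f} (true: {b_true})")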

        The Modified Exponential function:

                 a) S = S̄ (1 − e^(a+bA))

                 b) ln [1 − S/S̄] = a + bA

        Where:

                 S̄ = upper bound level or saturation point

                 e = the mathematical constant, approximately 2.71828…

          An attractive characteristic of the modified exponential function, and of some of the functions that follow as well, is that it supposes an upper limit or saturation point where the market potential reaches its maximum. A special characteristic is that it implies that the marginal sales response will be proportional to the level of untapped potential (Kotler, 1971).

          All the previous functional forms except the linear one are concave downward functions (figure 3), which implies diminishing returns at every point of the response. The modeler or manager sometimes wishes to represent the intuitive concept of a “threshold effect” in advertising: the idea that small doses of advertising do not count for much and that there is a tipping point that must be crossed before real effects of advertising on sales can be expected. Even though there is little evidence that such a phenomenon occurs in advertising (Kotler, 1971; Leckenby & Wedding, 1982; Hanssens et al, 2001), it is possible to represent the concept using s-shaped functions (figure 4). These functions assume increasing marginal returns at first and then diminishing marginal returns with respect to increasing levels of advertising. The following are the most common s-shaped functions:

          The Gompertz function:

                 a) S = S̄ e^(−e^(a+bA))

                 b) ln(ln S̄ − ln S) = a + bA

          The Logistic function:

                 a) S = S̄ / (1 + e^(−(a+bA)))

                 b) ln [S / (S̄ − S)] = a + bA

          The Lower-Bound Logistic function:

                 a) S = (S̄ + S_LB e^(a+bA)) / (1 + e^(a+bA))

                 b) ln [(S̄ − S) / (S − S_LB)] = a + bA

          Where:

                 S_LB = lower bound level, or minimum sales when advertising is 0

       As described above, these functions are just approximations of different

“realities” and the modeler can modify them to incorporate other elements to better

address the problem at hand. For example, these functions only consider one independent

variable and do not account for special situations like seasonality or special events during

the period analyzed. The modeler can then add different variables to these functions or

use dummy variables to represent qualitative differences or changes in the data (see some

examples at Hanssens et al, 2001, p. 97-99). Figure 5 shows some of the functions

discussed above.




Figure 5. Graphical representation of some CE functional forms (from Leckenby &

Wedding, 1982).
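
       For readers who wish to experiment with these shapes, the following sketch (Python; the parameter values are arbitrary illustrations, not estimates from this report) implements several of the forms just described:

import numpy as np

def linear(A, a, b):
    return a + b * A                        # constant returns to scale

def semilog(A, a, b):
    return a + b * np.log(A)                # diminishing returns

def quadratic(A, a, b1, b2):
    return a + b1 * A - b2 * A**2           # allows supersaturation

def power(A, a, b):
    return a * A**b                         # b is the advertising elasticity

def modified_exponential(A, S_bar, a, b):
    return S_bar * (1 - np.exp(a + b * A))  # saturates at S_bar (requires b < 0)

def logistic(A, S_bar, a, b):
    return S_bar / (1 + np.exp(-(a + b * A)))  # s-shaped, saturates at S_bar

A = np.linspace(1, 500, 5)
print(np.round(logistic(A, S_bar=1000, a=-3, b=0.02), 1))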



       Lagged advertising effects. As discussed earlier, Current Effects response functions assume that the effects of an advertising or marketing expenditure in period t occur only, and completely, in period t. This assumption does not correspond with the common understanding of advertising theory, which holds that a large part of advertising’s effects unfold over time. So, in order to accommodate this in advertising response models, we first need to discuss some basic concepts about carryover effects.




       Carryover effect is the term used to describe the idea that marketing and advertising expenditures have effects on sales that carry over into future periods (Kotler, 1971). Two major categories of carryover effects can be distinguished: the delayed response effect and the customer holdover effect (Leckenby & Wedding, 1982).

       The delayed response effect develops because delays occur between the time the advertising dollars and programs are implemented and the time the advertising-generated purchases occur (Leckenby & Wedding, 1982). There are four types of delayed response effects: execution delay, noting delay, purchase delay and recording delay. The delay occurs because executing takes time, because consumers do not notice the ads immediately, or because they postpone the purchase to future periods. The recording delay is a problem with the data and may not represent a real delayed response, just a mismatch between data sources (for more detail see Kotler, 1971; Leckenby & Wedding, 1982).

       The customer holdover effect is clearly explained by Kotler (1971): “suppose that

a marketing stimulus is paid for today, appears today, is noted today, and leads to

purchase today. No delayed response is involved. The buyer finds the product agreeable

and decides to remain with this brand. On this basis it can be said that marketing stimulus

this period affected sales this period and for many future periods.” (p. 124)

       This repurchase scenario suggests that advertising should be credited, in some part, for holding the customer to the brand in future time periods. Retaining new and possibly old customers in future periods is not the only way a holdover effect can occur. A holdover effect can also occur even if the number of customers does not increase as a result of the advertising expenditure. This can happen when the advertising or other marketing stimulus increases the average quantity purchased per period per customer (Kotler, 1971).

        Regardless of the type of carryover that may be present for a brand at a particular time, it is possible to represent it with dynamic models. To better understand some of these models we will consider the simplest linear model with lagged effects. The model has the following form:

                  S_t = a + b A_t + b c A_{t-1} + b c^2 A_{t-2} + ...

        Where:

                  a = the intercept term

                  b = regression coefficient

                  c = carryover rate or retention rate (0 < c <1)

        The basic assumption behind this model is that the effect of advertising in period t

decays exponentially in subsequent periods. That is, the effect on sales in period t is the

result of the advertising in period t plus a fraction of advertising in t-1 plus a fraction of

advertising in t-2, etc. The rate of decay, or in other words the fraction of the advertising effect that is carried over into the immediately following period, is the carryover rate (c).

        Because estimating the parameters of this model requires us to know how many periods to look back, as well as dealing with autocorrelation (see Estimation in this paper), some modifications by Koyck and others give us the following lagged effect models:

        The Koyck Geometric Distributed Lag (GL) model:

                  S_t = a(1 − c) + b A_t + c S_{t-1} + {u_t − c u_{t-1}}

        Where:



                u_t = white noise (disturbance term)

                c = carryover rate or retention rate (0 < c < 1)

                b = β(1 − c) = short-term effect of advertising

                β = b / (1 − c) = long-term effect of advertising

       This model hypothesizes that the effect on current sales of advertising conducted in all preceding time periods can be summarized in one term: lagged sales. Sales are then assumed to be a function of advertising and of sales in the preceding time period. The model sometimes performs well; however, where strong sales trends are present, the effect of previous-period sales on current sales is so strong that the effect of current advertising on sales can hardly be detected (Leckenby & Wedding, 1982), something not totally in accordance with advertising theory.
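
       As an illustration, the following Python sketch (simulated data; all values are hypothetical) generates sales from the Koyck recursion and recovers b and c by regressing S_t on A_t and S_{t-1}. Note that this naive OLS treatment ignores the moving-average disturbance {u_t − c u_{t-1}}, which strictly calls for the methods discussed under Estimation below:

import numpy as np

rng = np.random.default_rng(0)
T, a, b, c = 104, 20.0, 0.8, 0.6          # c: carryover (retention) rate

A = rng.uniform(10, 100, size=T)          # hypothetical weekly advertising
S = np.zeros(T)
S[0] = a + b * A[0]
for t in range(1, T):
    # Koyck recursion: S_t = a(1-c) + b A_t + c S_{t-1} + noise
    S[t] = a * (1 - c) + b * A[t] + c * S[t - 1] + rng.normal(0, 1)

# Regress S_t on [1, A_t, S_{t-1}] to recover b (short-term) and c
X = np.column_stack([np.ones(T - 1), A[1:], S[:-1]])
coef, *_ = np.linalg.lstsq(X, S[1:], rcond=None)
b_hat, c_hat = coef[1], coef[2]
print(f"short-term effect b = {b_hat:.2f}, carryover c = {c_hat:.2f}")
print(f"long-term effect beta = {b_hat / (1 - c_hat):.2f}")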

        The Partial Adjustment (PA) model:

                 S_t = (1 − φ)[a + b A_t] + φ S_{t-1} + w_t

        Where:

                 1 − φ = adjustment rate

                 w_t = white noise

       The Partial Adjustment model is similar to the Geometric Lag in its structure. It

assumes that consumers can only partially adjust to advertising stimulus in the short-term

but they will gradually adjust to the desired consumption level, which causes the

advertising effect to be distributed over time (Hanssens et al, 2001).

        Note: The above Partial Adjustment model should not be confused with the Nerlove Partial Adjustment model (Nerlove PA). The latter may not be a carryover effect model, but it represents the concept of brand loyalty and assumes some inertia from the past. It could be tried after some unsuccessful attempts with the Current Effects models and before the more complex carryover effect models.

        The Nerlove PA functional form is:

                  S_t = a + b_1 A_t + b_2 S_{t-1} + u_t

        Another carryover effects model, similar to GL but with an autoregressive structure, is the following:

        The Geometric Lag Autoregressive (GLA) model:

                  S_t = a + b_1 A_t − b_1 ρ A_{t-1} + (c + ρ) S_{t-1} − c ρ S_{t-2} + {u_t − c u_{t-1}}

        Where:

                  c = carryover rate or retention rate (0 < c < 1)

                  ρ = autocorrelation coefficient

        The GLA model is a nested model, which means that lower-order equations are contained within the parameters of its higher-order structure (Hanssens et al, 2001). For example, where ρ = 0 the GLA becomes the GL model; where ρ = 0 and c = 0 it becomes the CE linear model; and in the special case where ρ = c (≡ φ) it becomes the Partial Adjustment model (Hanssens et al, 2001; Leeflang et al, 2000).

        A modeler should first try some of the CE models. If, after estimating the parameters (see Estimation in this paper), autocorrelation appears, he should try i) adding important explanatory (independent) variables or ii) changing the model specification through transformations. If after i) and ii) autocorrelation (the fact that a variable is correlated with itself in previous time periods) remains, it may be “true” autocorrelation, that is, a generalized carryover effect, so the modeler should specify this autocorrelation in the model (Leckenby, personal notes). The Geometric Lag Autoregressive (GLA) model is an example of that process (for other autoregressive models see Hanssens et al, 2001, chapter 4).

       It is important to know that these lagged effects response models can also take different functional forms in order to represent diminishing returns to scale or s-shaped behavior, much like the Current Effects models discussed earlier.

       Modeling with adstock. The concept of carryover effect can be modeled either explicitly, as we have seen in the previous models, or implicitly using stock variables. The latter approach was championed by Simon Broadbent in several publications (see Broadbent, 1979, 1984, 1997). The basic idea of stock variables is that they capture the present and past amount of advertising effect for any period in one single value for that specific period. The approach assumes the same geometric decline in advertising effect as the models presented above. The adstock variable is then simply added to the equation like any other independent or explanatory variable.

       Its key advantages are the ease of communicating results to management and a simpler estimation process, since the retention rate can be estimated subjectively using the concept of half-life (HL). The half-life is simply the time it takes for an advertising effort to produce half of its effects. Even though this time can vary from 3 to 10 weeks, it tends to be between 4 and 6 weeks (Broadbent, 1984).

       There is a carryover rate or retention rate (c) associated with every HL value. Table 1 shows the retention rate for different half-lives under the “first period counts full” (f = 1) and “first period counts half” (f = 1/2) conventions (see Broadbent, 1984; Hanssens et al, 2001 for a discussion of these conventions). To the extent that the adstock approach uses the same model of carryover, the results are no different from those of the models that specify the carryover effect explicitly (Hanssens et al, 2001).



Table 1.

Half-life and retention rate.

 Half-life        1         2         3         4         5         6         7         8
   f = 1        0.500     0.707     0.794     0.841     0.871     0.891     0.906     0.917
   f = 1/2      0.334     0.640     0.761     0.821     0.858     0.882     0.899     0.912

 Half-life        9         10        11        12        13        14        15        16
   f = 1        0.926     0.933     0.939     0.944     0.948     0.952     0.955     0.958
   f = 1/2      0.922     0.930     0.936     0.942     0.948     0.950     0.953     0.956
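
       As an illustration of this transformation (a sketch added here, not part of the original analysis; the spending figures are hypothetical), the adstock series and the half-life-to-retention-rate conversion behind the f = 1 row of Table 1 can be computed in a few lines of Python:

import numpy as np

def retention_rate(half_life):
    # The rate c such that c**half_life = 0.5: the effect halves after half_life periods
    return 0.5 ** (1.0 / half_life)

def adstock(A, c):
    # adstock_t = A_t + c * adstock_{t-1}: geometric decay of past advertising
    out = np.zeros(len(A))
    carry = 0.0
    for t, a_t in enumerate(A):
        carry = a_t + c * carry
        out[t] = carry
    return out

A = np.array([100.0, 0, 0, 0, 50, 0, 0, 0])  # hypothetical spending pattern
c = retention_rate(4)                        # half-life of 4 periods gives c = 0.841
print(np.round(adstock(A, c), 1))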



                                          Estimation

       Once the modeler has specified a model, based on theoretical relations between the explanatory and dependent variables or on examination of the available data, he or she must estimate the parameters of the function using historical or cross-sectional data (Leckenby & Wedding, 1982). The essence of the process is fitting a given equation to a set of data in order to find the best estimates of the different parameters in the model (a, b_1, b_2, c, etc.). There are many estimation techniques; however, the most “robust” and popular is regression analysis.

       We will now describe the basic concepts of the simplest form of regression analysis: Ordinary Least Squares (OLS). We will discuss the assumptions underlying this technique, the problems that arise when they are violated, and possible remedies.



       It is important to note that the process of model building is somewhat circular, in the sense that a model is specified, estimated, and verified, but very often violations of the assumptions or unsatisfactory results force the modeler to choose a different estimation technique or to modify the model specification and start the process again.

          Another note is that the estimation process in model building follows more of a confirmatory approach (see Hair, 1998) to multiple regression analysis. It differs from an exploratory approach in that a pre-established functional form based on theoretical relations between variables is “tested” or confirmed against empirical data. However, as noted earlier, it is an iterative process in which different functional forms might be “confirmed” until satisfactory results are found.

Ordinary Least Squares

         The basic idea of estimating the parameters of a response function is to find the

values for each parameter that would minimize the sum of errors or disturbance terms in

the equation. Let us consider the simplest linear functional form:

                  S = a + bA + u

         Where:

                  S = Sales

                  a = the intercept term

                  b = slope of the function

                  A = Advertising expenditures

                  u = disturbance term or error term




       Rephrasing, the objective of the estimation step in model building is to find the values of a and b that give the smallest value of u on average. Because what we are trying to find is the statistical relationship between the variables, there are always some random errors: for every value of an independent variable there might be more than one value of the dependent variable. These multiple values of the dependent variable for every value of the explanatory variables are the result of random components in the relationship (Hair, 1998).

       Ordinary Least Squares is the basic technique in which the parameters of a linear or linearized (see the Specification section in this paper) response function are estimated by minimizing the sum of the error terms at every point of the function. Because the difference between the value predicted by the function and the observed value can be positive or negative, the error terms are squared so they can be added to produce a measure of the fit of the model to the data in the sample. That measure is the residual sum of squares (RSS), or sum of squared errors (SSE) (Hair, 1998). There is also a measure of the improvement in the explanation of the dependent variable attributable to the independent variables, compared to just using the mean of the dependent variable. It is called the sum of squares of the regression (SSR), and it is calculated by adding the squared differences between the mean and the predicted value of the dependent variable over all observations (Hair, 1998). These two measures are crucial for assessing the model’s capacity to explain the variation of the dependent variable. If the SSR is divided by the total sum of squares (TSS), the total variance of the dependent variable, we obtain the coefficient of determination R^2, which represents the portion of the total variance of the dependent variable (usually sales S or market share) explained by the model. Figure 6 shows a graphical representation of those measures. The unexplained variance is SSE, the explained variance is SSR and the total variance is TSS.




       Figure 6. Variance in regression analysis (from Leckenby & Wedding, 1982).
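
       These quantities are straightforward to compute directly; the following sketch (Python; the data are simulated and all names are illustrative) fits the linear model and reports the variance decomposition:

import numpy as np

rng = np.random.default_rng(7)
A = rng.uniform(0, 100, size=60)                # hypothetical advertising data
S = 200 + 3.5 * A + rng.normal(0, 25, size=60)  # S = a + bA + u

X = np.column_stack([np.ones_like(A), A])
coef, *_ = np.linalg.lstsq(X, S, rcond=None)
S_hat = X @ coef                                # predicted values

SSE = np.sum((S - S_hat) ** 2)         # unexplained variance (residual sum of squares)
SSR = np.sum((S_hat - S.mean()) ** 2)  # explained variance
TSS = np.sum((S - S.mean()) ** 2)      # total variance; TSS = SSR + SSE
print(f"R^2 = {SSR / TSS:.3f} (equivalently 1 - SSE/TSS = {1 - SSE / TSS:.3f})")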



       The procedure underlying OLS has several restrictive assumptions that must be carefully considered in assessing the validity of the estimated model (see Verification in this paper). The fundamental assumptions are the following:

       a.) The mean of the error terms equals 0

       b.) Constant variance of the error terms

       c.) Independence of the error terms

       d.) Normality of the error terms’ distribution

       e.) Low multicollinearity




The basic idea behind these assumptions is that u is a random variable. This is clearly

explained by Koutsoyiannis in his Theory of Econometrics book (1978):

       “(…) u can assume various values in a chance way. For each value of an independent variable the

       term u may assume positive, negative or zero values each with a certain probability. We said that u

       is introduced into the model in order to take into account the influence of various 'errors', such as

       errors of omitted variables, errors of the mathematical form of the model, errors of measurement

       of the dependent variable, and the effects of the erratic element which is inherent in human

       behavior. Now, for u to be random the omitted variables should be numerous, each one

       individually unimportant, and they should change in different directions so that their overall effect

       on the dependent variable is unpredictable in any particular period.”



       If we agree that what we are trying to represent in model building is the relationship between the independent and dependent variables on average, it is imperative that the mean of the error term equal 0 (assumption a). Otherwise the parameters of the function are biased (Leeflang et al, 2000).

       Assumption b means that the dispersion of the error terms remains the same over all observations of the independent variables. The variance of the error terms around the zero mean is then said to be homoscedastic, meaning that it does not depend on the values of the independent variables. Conversely, heteroscedasticity is the case in which increasing or decreasing dispersion of the error terms is observed. The consequence of violating this assumption is that it is not possible to calculate effective confidence intervals for the parameters, reducing their efficiency (Leeflang et al, 2000) and the reliability of their statistical significance (Koutsoyiannis, 1978).

       Assumption c is also known as the absence of autocorrelation. It means that the error terms at any point of the function should be independent of each other. This is relevant mainly when the model is estimated using time series, because the autocorrelation is actually a serial correlation (Leeflang et al, 2000) between the error in one period and the error(s) in the previous period(s). There is positive autocorrelation and negative autocorrelation. Positive autocorrelation means that the residual in t tends to have the same sign as the residual in t-1. Negative autocorrelation is when a positive sign tends to be followed by a negative sign, or vice versa (Leeflang et al, 2000). The consequence of violating this assumption is that, even though the estimated parameters remain unbiased (as when assumption b is violated), the OLS formula underestimates their sampling variance and the model will seem to fit the data better than it actually does (Hanssens et al, 2001).
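
       A common diagnostic for first-order autocorrelation of the residuals is the Durbin-Watson statistic; the following sketch (Python with the statsmodels package; simulated data) illustrates it as one convenient check among several:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
A = rng.uniform(0, 100, size=60)
S = 200 + 3.5 * A + rng.normal(0, 25, size=60)

results = sm.OLS(S, sm.add_constant(A)).fit()
# Values near 2 suggest independent residuals; well below 2, positive autocorrelation
print(f"Durbin-Watson = {durbin_watson(results.resid):.2f}")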

        The assumption of normality (assumption d) is necessary for conducting the statistical tests of significance of the parameter estimates and for constructing confidence intervals. If this assumption is violated the estimates are still unbiased and best, but it is not possible to assess their statistical reliability with the classical tests of significance (t, F, etc.), because these tests are based on normal distributions.

        Multicollinearity results from correlation between the independent variables. When one independent variable “moves” at the same time as another, the two are said to be collinear. In marketing, as in many other areas, variables tend to be correlated all the time. For example, a price reduction is announced via TV advertising as well as radio; these variables will be correlated with each other since they vary at the same time. Managers rarely hold all variables constant and vary only one at a time. The degree of multicollinearity has an important impact on the parameters of the response function. A high level of multicollinearity limits the size of the coefficient of determination R^2 and makes determining the contribution of each independent variable difficult, because the effects of the independent variables are “mixed” or confounded (Hair, 1998). In consequence, the reliability of the parameter estimates is low (Leeflang et al, 2000).
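
        Multicollinearity is often quantified with variance inflation factors (VIFs); a common rule of thumb (not from this report) treats values above about 10 as problematic. The following sketch (Python with statsmodels) uses simulated TV and radio spends that move together, as in the price-reduction example above:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
tv = rng.uniform(0, 100, size=60)
radio = 0.8 * tv + rng.normal(0, 5, size=60)   # radio spend tracks TV spend

X = sm.add_constant(np.column_stack([tv, radio]))
for i, name in enumerate(["const", "tv", "radio"]):
    # A VIF well above 10 signals problematic collinearity for that variable
    print(name, round(variance_inflation_factor(X, i), 1))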

        The assumptions discussed above limit the applicability of OLS for estimating the parameters of the function, because these assumptions are often violated. There are many reasons why the assumptions may be violated, but violation is usually the result of misspecification of the response function. There are tests and procedures to check whether one or more of the assumptions are violated; some of them are described in the Verification section of this paper.

        Once the parameters are estimated and the underlying assumptions tested it is

sometimes possible to take some corrective actions if violations to the assumptions are

present. The simplest corrective action is modifying the specification of the response

function and estimating it again. However, sometimes the only solution is to use a

different estimation technique.

Generalized Least Squares

        In Generalized Least Squares (GLS) techniques, some of the restrictive assumptions about the disturbance term in OLS are relaxed, specifically the assumptions of constant variance and independence of the error terms (autocorrelation). These estimation methods are “generalized” because they can account for special cases or models; in fact, OLS is the special case of GLS in which all the assumptions are met (Leeflang et al, 2000). Another special case is when the variance is heteroscedastic. For example, when cases that are high on some attribute show more variability than cases that are low on that attribute, and the difference can be predicted from another variable, a weight estimation procedure can compute the coefficients or parameters of a linear model using weighted least squares (WLS), such that the more precise observations (that is, those with less variability) are given greater weight in determining the regression coefficients (Leeflang et al, 2000). The weight estimation procedure in statistical packages like SPSS tests a range of weight transformations and indicates which will give the best fit to the data.
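
        A minimal WLS sketch follows (Python with statsmodels; the data are simulated, and the weights assume the error variance grows with A^2, an assumption made here purely for illustration rather than a reproduction of SPSS’s automated weight search):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
A = rng.uniform(1, 100, size=80)
S = 200 + 3.5 * A + rng.normal(0, 0.5 * A)   # error dispersion grows with A

X = sm.add_constant(A)
ols = sm.OLS(S, X).fit()
# Weight each observation by its assumed inverse variance (here 1/A^2)
wls = sm.WLS(S, X, weights=1.0 / A**2).fit()
print("OLS coefficients:", np.round(ols.params, 2))
print("WLS coefficients:", np.round(wls.params, 2))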

        Another special case, typical of time series, is when there is a strong presence of autocorrelation in the disturbance terms but, at the same time, the variance is homoscedastic. Assuming that the autocorrelation is generated by a first-order autoregressive scheme (Markov scheme), some transformations are done to incorporate an autoregressive coefficient that allows better parameter estimates (see Leeflang et al, 2000, p. 371-376 for a detailed explanation). There are other GLS methods that account for special cases of the behavior of the disturbance term. For an extensive list of literature on those methods see Hanssens et al, 2001, chapter 5.
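
        One implementation of this idea, in the spirit of the iterative first-order autoregressive transformation just described, is the GLSAR class in statsmodels; the following is a hedged sketch on simulated data:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
T = 100
A = rng.uniform(0, 100, size=T)

# AR(1) (Markov scheme) disturbances: u_t = rho * u_{t-1} + e_t
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal(0, 10)
S = 200 + 3.5 * A + u

# GLSAR alternates between estimating rho and the regression coefficients
results = sm.GLSAR(S, sm.add_constant(A), rho=1).iterative_fit(maxiter=10)
print("coefficients:", np.round(results.params, 2))
print("estimated rho:", np.round(results.model.rho, 2))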

        One important note is that these GLS procedures for dealing with special patterns of the disturbance terms will not give better parameter estimates if the pattern is due to a misspecified model, as is usually the case (Leeflang et al, 2000). Additionally, “robustness may generally be lost if GLS estimation method are used” (Leeflang et al, 2000, p. 376). So, before using these procedures, the modeler should be convinced that he or she is using the best possible model specification (Leeflang et al, 2000).




Nonlinear Least Squares

         Some models are nonlinear and not linearizable. Additionally, other models violate the assumptions about the disturbance term in their very specification. These include the Koyck Geometric Lag (GL), Partial Adjustment (PA) and Geometric Lag Autoregressive (GLA) models, which cannot be accurately estimated by linear regression. To solve this problem, procedures have been created that allow the modeler to estimate those kinds of models. The common characteristic of these procedures is that they are iterative. In the simplest form, the parameter that is causing the model to be nonlinear is guessed by either subjective estimation or trial and error until a satisfactory result is achieved. Leeflang et al (2000) explain this grid search in the following terms: “For simplicity assume that for any value of y [the parameter causing the nonlinear attribute], the model is estimated by OLS, under the usual assumptions about the disturbance term. Then choose m values for y, covering a plausible wide range, and choose the value of y for which the model’s R^2 is maximized” (p. 384). This procedure is equivalent to the one used with adstock models, when different half-life values are tested to select the one that gives the best results (Broadbent, 1984).

         This grid search can also be done when, instead of replacing a parameter that is causing the nonlinearity, different transformations of the predictor variables are tested sequentially until satisfactory results are found (Leeflang et al, 2000). However, grid search procedures are costly and inefficient, especially if a model is nonlinear in several of its parameters (Leeflang et al, 2000).
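
         Combining the adstock idea above with OLS gives a simple version of this grid search. The following sketch (Python; the data are simulated, so all names and values are illustrative) estimates the model once per candidate half-life and keeps the one that maximizes R^2:

import numpy as np

def retention_rate(half_life):
    return 0.5 ** (1.0 / half_life)

def adstock(A, c):
    out, carry = np.zeros(len(A)), 0.0
    for t, a_t in enumerate(A):
        carry = a_t + c * carry
        out[t] = carry
    return out

def r_squared(y, x):
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(5)
A = rng.uniform(0, 100, size=104)
S = 50 + 2.0 * adstock(A, retention_rate(4)) + rng.normal(0, 10, size=104)

# Estimate the model once per candidate half-life and keep the best-fitting one
fits = [(r_squared(S, adstock(A, retention_rate(hl))), hl) for hl in range(1, 11)]
best_r2, best_hl = max(fits)
print(f"best half-life = {best_hl} periods (R^2 = {best_r2:.3f})")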




        More sophisticated methods have been developed in which initial estimates of some parameters are reintroduced into the equation in an iterative process until the whole process converges (Leeflang et al, 2000; Koutsoyiannis, 1978).

       All the techniques discussed above estimate the parameters in an attempt to

minimize the squares of the differences between the estimated points and the observed

ones. They are all Least Squares (LS) methods. A radically different approach is the

Maximum Likelihood (ML) method.

Maximum Likelihood

        The ML method is based on distributional assumptions about the data. Basically, it finds the values of the parameters that make the probability of obtaining the observed sample outcome as high as possible (Hanssens et al, 2001). In other words, “the maximum likelihood principle is an estimation principle that finds an estimate for one or more parameters such that [it] maximizes the likelihood of observing the data. The likelihood of a model (L) can be interpreted as the probability of the observed data y, given the model” (Leeflang et al, 2000, p. 390). Under this principle, some parameter values are more likely than others.
        The assumptions underlying the ML method are actually the ones involved in hypothesis testing in the social sciences (Leeflang et al, 2000) and, not surprisingly, the method is very sensitive to the sample size, giving better results with large samples.
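        A minimal sketch of the principle, assuming normally distributed disturbances in a linear response model, is to write the negative log-likelihood and minimize it numerically. The function names and the choice of scipy’s BFGS optimizer are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, y, X):
    """Negative Gaussian log-likelihood of y = a + Xb + u, with u ~ N(0, sigma^2)."""
    a, b, log_sigma = theta[0], theta[1:-1], theta[-1]
    sigma2 = np.exp(log_sigma) ** 2          # log-sigma keeps the variance positive
    mu = a + X @ b
    return 0.5 * len(y) * np.log(2 * np.pi * sigma2) + np.sum((y - mu) ** 2) / (2 * sigma2)

def fit_ml(y, X):
    """Find the parameter values that make the observed data most likely."""
    theta0 = np.zeros(X.shape[1] + 2)        # intercept, slopes, log-sigma
    return minimize(neg_log_likelihood, theta0, args=(y, X), method="BFGS").x
```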

        The ML method can also be used to select among competing models (see the Model Building Summary in this paper). For more details on ML and LS methods for estimating the parameters of a response function, consult Hanssens et al, 2001; and Leeflang et al, 2000.




Verification

        Another important step in developing market response models is to verify that the parameters estimated in the previous step truly represent the relationship between sales (or any other dependent variable) and the marketing variables. The usual way to do this is to use statistical significance testing (Leckenby & Wedding, 1982). By verifying the parameters it is possible to determine, with a certain risk level, how representative they are of the true advertising-marketing/sales relationship. In market response models (if commercially used) the significance level often used is about 15 percent (Leckenby & Wedding, 1982). At that level of significance, one could say that in at least 85 of every 100 samples of data used for estimating the response function, the parameters would fall between x and y (the confidence interval).

        The first measures that should be verified are those related to the fit of the model to the data in the sample. As discussed above, the R² value indicates the percentage of the variance of the dependent variable explained by the independent variables in the model. Because this measure is affected by the number of observations per independent variable, the modeler should focus on the adjusted R² when comparing competing models and controlling for “overfitting” the data (Hair, 1998). It is important to notice that the ratio of observations per independent variable should be at least 5 to 1 in order to avoid making the results too specific to the sample (“overfitting”), thus lacking generalizability. Verifying the statistical significance of R² and adjusted R² is critical in this step; the F ratio is the significance test that most statistical packages use for this. The parameters of the model should also be tested for statistical significance. The t value of a coefficient or parameter is the coefficient divided by its standard error. To determine if the parameter is significantly different from zero (no effect on, or relation with, the dependent variable), the computed t value is compared to the table value for the sample size and confidence level selected. This test is not that important for the intercept term in a linear model, since the intercept acts only to position the model (for details see Hair, 1998, p. 184).
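        For concreteness, a minimal sketch of how these fit statistics and t values can be computed from the data follows; the helper name ols_summary is an illustrative assumption.

```python
import numpy as np

def ols_summary(y, X):
    """OLS fit with t values, R-squared and adjusted R-squared (X has no intercept column)."""
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])            # add the intercept term
    b = np.linalg.solve(X1.T @ X1, X1.T @ y)         # coefficient estimates
    resid = y - X1 @ b
    dof = n - k - 1
    s2 = resid @ resid / dof                         # residual variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X1.T @ X1)))
    t = b / se                                       # t value = coefficient / standard error
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof        # penalizes extra variables
    return b, t, r2, adj_r2
```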

        Another measure, highly related to the overall fit of the model, that must also be checked in this step of model building is the RSS or SSE (the residual sum of squares, i.e., the sum of the squared errors or disturbance terms). Even though a high R² could be found for a specific model, the RSS could still be very large, indicating the inability of the model to make accurate predictions.

        As discussed in the previous section, the assumptions underlying the different estimation techniques are highly important for assessing the validity of the parameter estimates, since violations of the assumptions give biased coefficients or, more frequently, make their statistical significance hard to estimate (Leeflang et al, 2000). If the assumptions are violated, the confidence that the parameters truly represent the relationship under analysis is diminished. So another important task of the verification step is to check that the assumptions used for estimating the parameters are not violated. The simplest way to do this is a careful analysis of the residuals using scatter plots. It is recommended to use some form of standardization, as it makes the residuals directly comparable. The most widely used are studentized residuals, whose values correspond to t values (Hair, 1998). Figure 7 shows different plots that illustrate the patterns that the disturbance terms could take if some of the assumptions are violated.




        The null plot (Figure 7a) is the usual pattern when all the assumptions are met. “The null plot shows the residuals falling randomly, with relatively equal dispersion about zero and no strong tendency to be either greater or less than zero. Likewise, no pattern is found for large versus small values of the independent variable.” (Hair, 1998, p. 173).
        By analyzing these plots the modeler can find violations of the assumptions and then look for remedies. The remaining plots show the typical patterns one should find when violations occur: for example, nonlinearity in the relationship between the dependent and explanatory variables (b); heteroscedasticity of the variance (c) and (d); and autocorrelation (e). The histogram of the residuals (g) allows the modeler to test the assumption of normality. A pattern like (f) results when important events in the data are omitted from the specification of the response function (Hair, 1998), for example, missing dummy variables that account for seasonality or special promotional events.
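        A minimal sketch of how studentized residuals can be computed for such plots follows. The formula (internally studentized residuals, based on the hat matrix) is standard; the function name is an illustrative assumption.

```python
import numpy as np

def studentized_residuals(y, X):
    """Internally studentized residuals: e_i / (s * sqrt(1 - h_ii))."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    H = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T       # hat (projection) matrix
    e = y - H @ y                                  # raw residuals
    s2 = (e @ e) / (n - X1.shape[1])               # residual variance
    return e / np.sqrt(s2 * (1.0 - np.diag(H)))    # comparable to t values
```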




Figure 7. Graphical Analysis of residuals. (From Hair, 1998).




        Plotting the residuals against the independent variables is quite useful; however, the prototypical patterns depicted in Figure 7 are hard to detect in small samples, and sometimes in large samples as well. Some statistical tests have been developed to help the modeler find violations of the assumptions in a more systematic way. For example, the Durbin-Watson (D.W.) test allows the model builder to test for autocorrelation of the disturbance terms. The D.W. statistic varies between zero and four; small values indicate positive autocorrelation and large values negative autocorrelation (Leeflang et al, 2000). Durbin and Watson formulated lower and upper bounds (dL, dU) for various significance levels and for specific sample sizes and numbers of parameters. The test is used as follows (for details see Leeflang et al, 2000, p. 340):

For positive autocorrelation
        a. If D.W. < dL, there is positive autocorrelation
        b. If dL < D.W. < dU, the result is inconclusive
        c. If D.W. > dU, there is no positive autocorrelation
For negative autocorrelation
        d. If [4 − D.W.] < dL, there is negative autocorrelation
        e. If dL < [4 − D.W.] < dU, the result is inconclusive
        f. If [4 − D.W.] > dU, there is no negative autocorrelation
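        The statistic itself is simple to compute from the residuals; a minimal sketch (standard formula, illustrative function name) follows.

```python
import numpy as np

def durbin_watson(residuals):
    """D.W. = sum_t (e_t - e_(t-1))^2 / sum_t e_t^2; values near 2 suggest no autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```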

        Other tests have been developed for detecting violations of other assumptions. Their description is outside the scope of this paper; for details see Hanssens et al, 2001, chap. 5; and Leeflang et al, 2000, chap. 16.




        Leeflang et al (2000) developed a table (Table 2 in this paper) that summarizes the violations of the assumptions in model building using Least Squares, as well as possible reasons, consequences, tests for detecting them, and possible remedies.

Table 2.

Violation of the assumptions about the disturbance term: reasons, consequences, tests and

remedies. (From Leeflang et al, 2000, p. 332)




        As Table 2 shows, when violations of the assumptions are detected, by either plotting the residuals or applying specific tests, the modeler can try some remedies: often modifications to the specification of the model, or the use of another estimation technique that relaxes the violated assumption (see Estimation in this paper). As frequently mentioned by Leeflang et al (2000) and Hanssens et al (2001), violations of the assumptions are usually caused by specification errors, so the first thing a modeler should do if the results are not satisfactory is to try a different functional form (see Specification in this paper) or to modify the specification of the model under scrutiny.

        The process of model verification is clearly summarized in the following table (Table 3), taken from Hanssens et al (2001).

Table 3.

Steps in evaluating a regression model. (From Hanssens et al, 2001)




Prediction

        Verification is just one part of the validation of a response model. In order to be believed, the response function must be able to predict future sales or market share for the brand from the explanatory variables (Leckenby & Wedding, 1982). For example, if it is true that advertising expenditures can explain sales, a valid model should be able to predict the amount of sales in period x given a certain level of advertising expenditures in period x and, probably, previous periods. Because waiting for future sales data to test a model is not only risky but useless if we want to use the model to forecast or decide on future marketing and advertising expenditure levels, a process called “postdiction” is used. Postdiction refers to the idea of predicting values that are already known. For example, a model is estimated using a sample that includes all the data from the past two years but not from this year, even though we already know this year’s figures. Postdicting is then using the model to predict this year’s sales given this year’s marketing and advertising expenditures. If the accuracy of the predictions is good, the model is valid for future forecasts and can then be used in different managerial decision-making tasks.

        The way a modeler can perform this validation is to split the sample of data into two subsets: one for estimating the model and the other for validating it using the process described above. With large samples this can easily be done by leaving a fair number of observations for validation purposes. However, the modeler usually does not have a lot of data, so a minimum of three data points is left for validating the model.

        When the model is estimated using cross-sectional data, the validation sub-sample is chosen randomly, but when the model is estimated using time-series data, the last three or more periods are reserved for the validation process. The reason for doing this is that the modeler would like to take into account the prediction accuracy when carryover effects are involved in the response function (Leeflang et al, 2000), and also because the manager is more interested in the prediction accuracy of recent events than in that of distant ones.
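        A minimal sketch of this time-series holdout, assuming a linear model estimated by OLS (the helper name and the default holdout size are illustrative assumptions):

```python
import numpy as np

def postdiction_check(y, X, n_holdout=3):
    """Fit OLS on all but the last n_holdout periods, then 'postdict' them."""
    Xe, ye = X[:-n_holdout], y[:-n_holdout]              # estimation subset
    Xv, yv = X[-n_holdout:], y[-n_holdout:]              # validation subset
    X1 = np.column_stack([np.ones(len(ye)), Xe])
    b, *_ = np.linalg.lstsq(X1, ye, rcond=None)
    preds = np.column_stack([np.ones(len(yv)), Xv]) @ b  # predictions of known values
    mape = np.mean(np.abs(yv - preds) / yv) * 100        # accuracy on held-out periods
    return preds, mape
```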

        There are several measures of the prediction accuracy of a model (see Leeflang et al, 2000, chapter 18), but the basic principle is to compare the predicted values with the observed ones and calculate the average error of the predictions. The two most common measures are the Average Prediction Error (APE) and the Mean Absolute Percentage Error (MAPE).
        The Average Prediction Error is calculated by averaging the differences between the observed and the estimated values. The procedure allows negative and positive errors to offset each other (Leeflang et al, 2000). In accordance with the zero mean assumption in regression analysis (see Estimation in this paper), the APE should be close or equal to 0. However, even with an APE of 0 a model could still have large estimation errors if they offset each other.

        A better estimate of the prediction accuracy is the MAPE, since it allows the modeler to assess the error as a relative measure (a percentage) of the real or observed value. The MAPE is calculated by averaging the absolute percentage error, (|y − ŷ| / y) × 100, over each pair of predicted/observed data points in the validation sub-sample. It is important to notice that if data outside the range used to estimate the model are used to predict the outcome of the model, misleading results can occur. This is especially important when using “non-robust” models like the linear ones, where there is no limit to the response of the dependent variable for larger values of the explanatory ones (Hair, 1998).
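        A minimal sketch of the two measures (illustrative function names):

```python
import numpy as np

def ape(actual, predicted):
    """Average Prediction Error: mean of (y - y_hat); signed errors may offset each other."""
    return np.mean(np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float))

def mape(actual, predicted):
    """Mean Absolute Percentage Error: mean of |y - y_hat| / y, times 100."""
    y = np.asarray(actual, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    return np.mean(np.abs(y - y_hat) / y) * 100.0
```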

        The “postdiction” procedure described above is an adequate method for testing the validity of a model; however, “the acid test of the model’s validity still remain[s] with predictive test[s] into the future” (Leckenby & Wedding, 1982). If the model can fairly or acceptably predict sales figures that have not yet occurred, then the model is useful and can be used to solve marketing and advertising problems. A model should always and continually be checked for the prediction accuracy of future events as data become available.

                                 Model building summary

        Developing advertising and marketing response models is a fairly structured process with defined steps. However, model building is an iterative process in which the results of one step may suggest revising previous ones and starting the process again. The model building process also involves subjective judgments on the part of the modeler, as frequent tradeoffs arise whose solutions require judgment and personal experience. For example, a usual tradeoff the modeler faces is that, in order to enhance the prediction accuracy of a model, he must make important changes to its specification, making it harder to interpret and to extract significant economic meaning from. As Hair (1998) said, “Prediction is often maximized at the expense of interpretation” (p. 161). The important role of the model builder in developing response functions is what makes it part science and part art.

        Summarizing the steps in model building for marketing decisions, a good model should, first, be specified in accordance with advertising or marketing theory; second, estimated using an adequate estimation technique; third, verified using statistical significance tests and analysis of residuals to look for violations of the assumptions; and fourth, validated using postdiction and prediction accuracy tests.

        Sometimes a modeler has competing models that have all been verified and validated, and he or she must decide which one to choose. The principle of parsimony would suggest always picking the simplest one. However, it is sometimes hard to find the optimal one, since there is always a tradeoff between a model that is simple but less accurate and one that is more precise but more complex. One should always evaluate the models with the original objective of the model building process in mind. Why were we building the model in the first place? What do we want to do with the model? What is the managerial relevance or usefulness of the model? If the answers to those questions still do not point toward one single model, there are some additional procedures that can be used to select between competing models. There are informal decision rules like “choose the model with the higher adjusted R²” or “choose the one that has the smallest residual sum of squares,” and formalized decision rules involving hypothesis testing (Hanssens et al, 2001). The formal decision rules include the Maximum Likelihood (ML) statistic, Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC) (for details on those tests see Hanssens et al, 2001, p. 230-239).
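        For a Gaussian regression, both information criteria can be computed from the residual sum of squares; a minimal sketch (standard formulas, up to an additive constant) follows.

```python
import numpy as np

def aic(rss, n, k):
    """Akaike's Information Criterion for a Gaussian regression with k parameters."""
    return n * np.log(rss / n) + 2 * k

def bic(rss, n, k):
    """Bayesian Information Criterion; penalizes extra parameters more heavily than AIC."""
    return n * np.log(rss / n) + k * np.log(n)
```

Lower values indicate a better tradeoff between fit and complexity, so the model with the smallest AIC or BIC would be preferred.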

        Ideally, the model chosen should be the one with the highest adjusted R², the smallest RSS, statistically significant t values, no autocorrelation and the simplest structure. Fortunately, as Hanssens et al (2001) note: “the consequences in terms of deviation from the optimal level of discounted profits that arise from misspecifying market response is usually not great” (p. 239).



        Once a model has been specified, estimated, verified, validated and compared to other possible competing models, it can be used in decision making for planning future scenarios, running controlled simulations and deriving economic measures for better accountability of past actions. The latter is the essence of model building in marketing and advertising: the better we understand the past, the better we will predict the future.

                                        An Example

       In order to illustrate the process of developing marketing and advertising response

models, real data from an important brand in the skin-care market in a Latin American

country was used to build a model.

                                   Specifying the model

        After an initial exploration of the data, which included an analysis of the multiple correlations between several variables and preliminary estimations of very basic response functions, the following models were specified:

        1. The Linear Current Effects response model:

                 MS = a + b1TVR + b2U + b3RP + b4T + b5C + b6M

       Where:

                MS = Market Share

                TVR = TV GRPs

                U = Advertising expenditures for the Umbrella brand

                RP = Relative Price (brand’s price/main competitor’s price)

                T = Trend (linear trend over time)

                C = Total competitors’ advertising expenditures

                M = Magazine advertising expenditures



        2. The Modified Exponential Current Effects model:

                MS = MS_max (1 − e^(a + b1TVR + b2U + b3RP + b4T + b5C + b6M))

        Where:

                MS_max = upper bound level or saturation point

                e = the mathematical constant, approximately 2.71828…

        3. The Gompertz Current Effects model:

                MS = MS_max e^(−e^(a + b1TVR + b2U + b3RP + b4T + b5C + b6M))



        4. The Linear Partial Adjustment (Nerlove) model:

                MS_t = a + b1TVR_t + b2U_t + b3RP_t + b4T_t + b5C_t + b6M_t + b7MS_(t−1)

        5. The Logistic Partial Adjustment (Nerlove) model:

                MS_t = MS_max / (1 + e^(−(a + b1TVR_t + b2U_t + b3RP_t + b4T_t + b5C_t + b6M_t + b7MS_(t−1))))

        6. The Gompertz Partial Adjustment (Nerlove) model:

                MS_t = MS_max e^(−e^(a + b1TVR_t + b2U_t + b3RP_t + b4T_t + b5C_t + b6M_t + b7MS_(t−1)))

        7. The Modified Exponential Partial Adjustment (Nerlove) model:

                MS_t = MS_max (1 − e^(a + b1TVR_t + b2U_t + b3RP_t + b4T_t + b5C_t + b6M_t + b7MS_(t−1)))

8. The Linear Adstock model:

          MS = a + b1Adstock + b2U + b3RP + b4T + b5C + b6M

Where:

         Adstock = TV GRPs Adstock




        9. The Logistic Adstock model:

                MS = MS_max / (1 + e^(−(a + b1Adstock + b2U + b3RP + b4T + b5C + b6M)))

        10. The Gompertz Adstock model:

                MS = MS_max e^(−e^(a + b1Adstock + b2U + b3RP + b4T + b5C + b6M))

        11. The Modified Exponential Adstock model:

                MS = MS_max (1 − e^(a + b1Adstock + b2U + b3RP + b4T + b5C + b6M))
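        As an aside, the three saturating forms above can be written compactly as functions of the linear index z = a + b1X1 + … A minimal sketch follows (MS_max is the upper bound; the function names are illustrative assumptions):

```python
import numpy as np

def modified_exponential(z, ms_max):
    """MS = MS_max * (1 - e^z); approaches MS_max as z decreases."""
    return ms_max * (1.0 - np.exp(z))

def logistic(z, ms_max):
    """MS = MS_max / (1 + e^(-z)); S-shaped, saturating at MS_max."""
    return ms_max / (1.0 + np.exp(-z))

def gompertz(z, ms_max):
    """MS = MS_max * e^(-e^z); approaches MS_max as z decreases, 0 as z grows."""
    return ms_max * np.exp(-np.exp(z))
```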

        All the above models assume independent effects of the explanatory variables. For example, the Linear CE model (number 1) assumes that the market share of the brand is a constant, plus the effect of TV GRPs, plus the effect of the advertising expenditures on the umbrella or family brand, plus the effect of the price relative to the main competitor, plus a trend in time, plus (minus) the effect of the sum of all competitors’ advertising expenditures, plus the effect of the brand’s advertising expenditures in magazines.
        The assumption of independent effects means there are no interactions between the variables, for example between TV advertising and magazine advertising. This might not be true in reality; in consequence, some models that assumed such interactions were estimated, but they failed to deliver satisfactory results and no significant interactions were identified.

        It is important to notice that a trend in the data was incorporated into the model in order to gain more predictive power. However, as the saying goes: “a trend in a model is a factor you forgot to include in the explanatory consideration set.” Considering that usually not all the data are available, adding a trend component is a partial solution to the lack of information and sometimes helps enhance the model’s fit and its prediction accuracy. However, as discussed earlier, there is usually a trade-off between prediction accuracy and explanatory power. Trend components should be avoided if there is no important improvement in the capacity of the model to make fair estimations of the observed data. Knowing when to include or exclude a trend is part of the art of modeling.

        Other models were also specified but were discarded early in the process because they failed to fairly represent the relationship between advertising and market share for the brand. For example, the univariate Koyck Geometric Distributed Lag (GL) model:

                MS_t = a(1 − c) + bTVR_t + cMS_(t−1) + {u_t − cu_(t−1)}

and the univariate Geometric Lag Autoregressive (GLA) model:

                MS_t = a + b1TVR_t − b1ρTVR_(t−1) + (c + ρ)MS_(t−1) − cρMS_(t−2) + {u_t − cu_(t−1)}

failed to deliver satisfactory results. This occurred mainly because they used only one explanatory variable which, alone, seems not to contribute much to explaining the market share variance for this particular brand.

                                     Estimating the model

        Once specified, the above models (numbers 1 to 11) were estimated using Ordinary Least Squares. Table 4 shows the parameter estimates for the Current Effects functions and their derived statistics. Table 5 and Table 6 show the same information for the Nerlove Partial Adjustment models and the Adstock models.
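        The report does not spell out how the nonlinear forms were fitted by OLS. A common approach, sketched below under the assumption of a fixed upper bound MS_max (the “Upper Bound” reported in the tables that follow), is to transform MS so that each form becomes linear in the index a + Xb, run OLS on the transformed series, and map predictions back through the original functional form.

```python
import numpy as np

def linearize(ms, ms_max, form):
    """Transform MS so each saturating form becomes linear in a + Xb (OLS-ready)."""
    if form == "modified_exponential":    # MS = M(1 - e^z)   =>  z = ln(1 - MS/M)
        return np.log(1.0 - ms / ms_max)
    if form == "logistic":                # MS = M/(1 + e^-z) =>  z = -ln(M/MS - 1)
        return -np.log(ms_max / ms - 1.0)
    if form == "gompertz":                # MS = M e^(-e^z)   =>  z = ln(ln(M/MS))
        return np.log(np.log(ms_max / ms))
    raise ValueError(f"unknown form: {form}")
```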




Table 4.

Current Effects models’ statistics and parameter estimates




                                   Current Effects Models
                                 unstandardized
             Model                                t                 Rsq     Adj. Rsq   DW***     RSS
                                 coefficients
                              a   = 4.000368      4.62 *
                              b1 = 0.001629       4.45 *
                              b2 = 0.000112       6.08 *
  1. Linear                   b3 = 0.036254       4.12 *            93%       91%       2.629     1.54
                              b4 = 0.075848       6.08 *
                              b5 = -0.000011 -2.76 *
                              b6 = 0.000232       1.11
                              a   = -0.149235 -1.10
                              b1 = -0.000258 -4.51 *
                              b2 = -0.000017 -5.90 *
  2. Modified Exponential**** b3 = -0.005595 -4.08 *                92%       89%       2.655     1.76
                              b4 = -0.010991 -5.65 *
                              b5 = 0.000001       1.98 *
                              b6 = -0.000018 -0.55
                              a   = 0.469480      1.48
                              b1 = -0.000587 -4.40 *
                              b2 = -0.000038 -5.70 *
  3. Gompertz*****            b3 = -0.012564 -3.91 *                91%       88%       2.688     1.90
                              b4 = -0.023992 -5.27 *
                              b5 = 0.000003       1.71 **
                              b6 = -0.000030 -0.40
  Sample Size = 23
  *p < .05 **p < .15
  *** DW < .90: significant autocorrelation; .90 < DW < 1.92: inconclusive; DW > 1.92: no significant autocorrelation, at .05
  ****Upper Bound =15
  *****Upper Bound =12




Table 5.

Partial Adjustment models’ statistics and parameter estimates




                                 Partial Adjustment Models
                                 unstandardized
              Model                              t        Rsq               Adj. Rsq   DW***     RSS
                                 coefficients
                              a   = 3.729706     3.88 *
                              b1 = 0.001686      4.43 *
                              b2 = 0.000100      4.08 *
                              b3 = 0.031444      2.80 *
  1. Linear                                               94%                 91%       2.667     1.49
                              b4 = 0.068501      4.17 *
                              b5 = -0.000011 -2.71 *
                              b6 = 0.000243      1.14
                              b7 = 0.095134      0.70
                              a   = -0.126388 -0.83
                              b1 = -0.000262 -4.37 *
                              b2 = -0.000016 -4.11 *
                              b3 = -0.005189 -2.92 *
  2. Modified Exponential****                             92%                 89%       2.654     1.74
                              b4 = -0.010371 -4.00 *
                              b5 = 0.000001      1.92 *
                              b6 = -0.000019 -0.56
                              b7 = -0.008031 -0.38
                              a   = -1.020810 -3.86 *
                              b1 = 0.000461      4.40 *
                              b2 = 0.000028      4.09 *
                              b3 = 0.008621      2.78 *
  3. Logistic****                                         93%                 90%       2.668     1.52
                              b4 = 0.018575      4.11 *
                              b5 = -0.000003 -2.63 *
                              b6 = 0.000064      1.09
                              b7 = 0.024295      0.65
                              a   = 0.405843     2.01 *
                              b1 = -0.000353 -4.41 *
                              b2 = -0.000021 -4.11 *
                              b3 = -0.006763 -2.86 *
  4.Gompertz****                                          93%                 90%       2.667     1.59
                              b4 = -0.014045 -4.07 *
                              b5 = 0.000002      2.30 *
                              b6 = -0.000038 -0.84
                              b7 = -0.015138 -0.53
  Sample Size = 23
  *p < .05 **p < .15
  *** DW < .90: significant autocorrelation; .90 < DW < 1.92: inconclusive; DW > 1.92: no significant autocorrelation, at .05
  ****Upper Bound =15




Table 6.

Adstock models’ statistics and parameter estimates




                                      Adstock Models
                                unstandardized
             Model                              t                  Rsq     Adj. Rsq   DW***     RSS
                                coefficients
                             a   = 4.069391     4.89 *
                             b1 = 0.002388      4.72 *
                             b2 = 0.000082      4.67 *
 1. Linear                   b3 = 0.036416      4.28 *             94%       91%       2.223     1.44
                             b4 = 0.067203      5.67 *
                             b5 = -0.000013 -3.39 *
                             b6 = 0.000325      1.71 **
                             a   = -0.933690 -4.10 *
                              b1 = 0.000657     4.75 *
                             b2 = 0.000022      4.67 *
 2. Logistic****             b3 = 0.009899      4.26 *             94%       91%       2.218     1.47
                             b4 = 0.018084      5.58 *
                             b5 = -0.000003 -3.31 *
                             b6 = 0.000086      1.66 **
                             a   = 0.347017     1.98 *
                             b1 = -0.000501 -4.71 *
                             b2 = -0.000017 -4.54 *
 3. Gompertz*****            b3 = -0.007556 -4.22 *                93%       91%       2.161     1.60
                             b4 = -0.013388 -5.37 *
                             b5 = 0.000002      2.95 *
                             b6 = -0.000056 -1.40
                             a   = -0.162484 -1.23
                              b1 = -0.000372 -4.62 *
                             b2 = -0.000012 -4.38 *
 4. Modified Exponential**** b3 = -0.005608 -4.14 *                92%       90%       2.094     1.82
                             b4 = -0.009619 -5.10 *
                             b5 = 0.000002      2.53 *
                             b6 = -0.000034 -1.11
 Sample Size = 23
 Adstock Half Life = 1 period. Carry-over = 33%
 *p < .05 **p < .15
  *** DW < .90: significant autocorrelation; .90 < DW < 1.92: inconclusive; DW > 1.92: no significant autocorrelation, at .05
 ****Upper Bound =15




Verifying the model

        As Tables 4, 5 and 6 show, the R² and adjusted R² of all the models are 90% or above. That means they all explain at least 90% of the variance of the brand’s market share in the period analyzed. The adjusted R² is more useful for comparing the CE and Adstock models with the PA models, since the Partial Adjustment models include an additional lag parameter and the R² is sensitive to the number of variables in the model.

        Another way of measuring the ability of the models to fairly represent the relationship between the explanatory variables and the dependent one is to analyze the model’s fit to the data in the sample. The Residual Sum of Squares (RSS) delivers a direct measure of the “unfitness” of the model. The estimated models show small RSS values, varying from 1.44 to 1.90.

        The Durbin-Watson statistic shows that none of the estimated models exhibits significant autocorrelation. This is especially important if we want the estimation process to deliver unbiased and statistically significant parameter estimates.

        As discussed under the verification section in the first chapter, the estimation process should deliver statistically significant parameter estimates so that the modeler can project the model beyond the data sample. In other words, the parameter estimates should have a value different from zero, meaning that their associated variables have a real effect on the dependent variable. The estimated models vary on this criterion, since not all of them have statistically significant coefficients for every parameter. Actually, only the Adstock Linear and the Adstock Logistic models have a statistically significant b6 coefficient. Interestingly, the parameters corresponding to the lagged variable (b7) in the Nerlove PA models are not statistically significant. This means that these PA models actually reduce to the Current Effects ones, since the only difference between them is the additional lagged variable.

        Another very interesting result is that the coefficient for the relative price is positive. Since the variable was defined as the ratio between the brand’s price and the main competitor’s price (brand’s price/main competitor’s price), it is surprising to realize that, at least for the data analyzed, the higher the ratio, the higher the market share, all else being equal. Since the parameter’s sign is consistent across all models, it should not be discarded. There are situations in which raising the price actually raises the demand for the product, because price acts as a cue that signals good quality. This phenomenon has been detected in many specialty products, including beauty products (Kotler, 1971). The brand is a competitive brand in the “wrinkle prevention” market, a highly specialized category driven mainly by research and product innovation. It is not unlikely that this is one of those special cases where the relation between price and demand is reversed. The brand used to have a lower price than its main competitor, but it seems that the closer the brand’s price is to its main competitor’s, the higher the demand for the brand. This result should be taken with caution and would probably apply only to the data range analyzed (min = 51.5; max = 101.7; mean = 76.13; std. deviation = 11.43).

        In order to check for violations of the OLS assumptions, scatter plots of the residuals of the best four models (CE Linear model, PA Linear model, Adstock Linear model and Adstock Logistic model) were analyzed. Figure 8 shows the scatter plots of the studentized residuals vs. the actual market share values for the four models. No systematic pattern is observed for any of the models analyzed, showing that no fundamental assumption was violated. However, some outliers can be recognized, especially two outliers for the adstock models. The treatment of outliers is controversial (Hair, 1998), but a careful analysis should be provided in order to assess their impact on the overall performance of the model. We will discuss this later.




Figure 8. Scatter plots of the studentized residuals vs. the actual market share values for

the CE Linear model, PA Linear model, Adstock Linear model and Adstock Logistic

model.

        Ideally the best model should have all statistically significant coefficients, no autocorrelation, the highest R² or adjusted R², and the lowest RSS. However, these criteria cannot always all be found in one single model, as is the case for the Adstock Linear model in our example. Additionally, the best model is not really identified until the acid test is performed. So before selecting a single model, the best ones should be validated using the prediction/postdiction procedure.

                                      Validating the model

        The best competing models (CE Linear model, PA Linear model, Adstock Linear model and Adstock Logistic model) were selected to be validated using a subset of the sample.
        The sample of data was split into two subsets: one with the first 20 observations, used to re-estimate the parameters of the models, and another with the last 3 observations, to be predicted/postdicted by the models. The Mean Absolute Percentage Error (MAPE) was used to compare the predictive ability of the models. Table 7 shows the results and all the statistics for the selected models.

        All the models have MAPEs below 3.5%, which means that they all can make accurate predictions of future outcomes. However, the Adstock models clearly outperform the CE and PA linear models. The principle of parsimony would suggest choosing the simplest model between two competing ones. The MAPE criterion, as well as all the other criteria, also points to the Linear Adstock model as the winner. Figure 9 shows the modeled market share versus the actual market share for the Adstock Linear Model.




Table 7.

Best models comparison



                                                Best Models
                                   unstandardized
             Model                                   t           Rsq    Adj. Rsq    DW***    RSS         MAPE
                                   coefficients
                                a   = 4.000368      4.62 *
                                b1 = 0.001629       4.45 *
                                b2 = 0.000112       6.08 *
  1. CE Linear                  b3 = 0.036254       4.12 *       93%      91%       2.629    1.54        2.99%
                                b4 = 0.075848       6.08 *
                                b5 = -0.000011 -2.76 *
                                b6 = 0.000232       1.11
                                a   = 3.729706      3.88 *
                                b1 = 0.001686      4.43 *
                                b2 = 0.000100       4.08 *
                                b3 = 0.031444       2.80 *
  2. PA Linear                                                   94%      91%       2.667    1.49        3.01%
                                b4 = 0.068501       4.17 *
                                b5 = -0.000011 -2.71 *
                                b6 = 0.000243       1.14
                                b7 = 0.095134       0.70
                                a   = 4.069391      4.89 *
                                b1 = 0.002388       4.72 *
                                b2 = 0.000082       4.67 *
  3. Adstock Linear             b3 = 0.036416       4.28 *       94%      91%       2.223    1.44        1.65%
                                b4 = 0.067203       5.67 *
                                b5 = -0.000013 -3.39 *
                                b6 = 0.000325       1.71 **
                                a   = -0.933690 -4.10 *
                                b1 = 0.000657      4.75 *
                                b2 = 0.000022       4.67 *
  4. Adstock Logistic****       b3 = 0.009899       4.26 *       94%      91%       2.218    1.47        1.71%
                                b4 = 0.018084       5.58 *
                                b5 = -0.000003 -3.31 *
                                b6 = 0.000086       1.66 **
  Sample Size = 23 (note: the MAPE was calculated using parameter estimates from a data sample of 20)
  Adstock Half Life = 1 period. Carry-over = 33%
  *p < .05 **p < .15
  *** DW < .90: significant autocorrelation; .90 < DW < 1.92: inconclusive; DW > 1.92: no significant autocorrelation, at .05.
  ****Upper Bound =15




Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.
Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.

Weitere ähnliche Inhalte

Ähnlich wie Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.

Where we are with marketing ROI measurement
Where we are with marketing ROI measurement Where we are with marketing ROI measurement
Where we are with marketing ROI measurement Michael Wolfe
 
Chapter 17 designing and managing integrated marketing zhao
Chapter 17 designing and managing integrated marketing  zhaoChapter 17 designing and managing integrated marketing  zhao
Chapter 17 designing and managing integrated marketing zhaoyang zhao
 
New product development strategy of samsung
New product development strategy of samsungNew product development strategy of samsung
New product development strategy of samsunghiteshkrohra
 
Shows approach which expands the breadth of what marketing-mix models c
Shows approach which expands the breadth of what marketing-mix models cShows approach which expands the breadth of what marketing-mix models c
Shows approach which expands the breadth of what marketing-mix models cMichael Wolfe
 
ASSIGNMENT PROJECT FRONT SHEET CIM Membership Number Module Title
ASSIGNMENT PROJECT FRONT SHEET CIM Membership Number  Module TitleASSIGNMENT PROJECT FRONT SHEET CIM Membership Number  Module Title
ASSIGNMENT PROJECT FRONT SHEET CIM Membership Number Module TitleAudrey Britton
 
Core Concepts of MarketingThis book is licensed under .docx
Core Concepts of MarketingThis book is licensed under .docxCore Concepts of MarketingThis book is licensed under .docx
Core Concepts of MarketingThis book is licensed under .docxfaithxdunce63732
 
Predictive Analytics: How This Revolutionary Technology for Strategic Marketi...
Predictive Analytics: How This Revolutionary Technology for Strategic Marketi...Predictive Analytics: How This Revolutionary Technology for Strategic Marketi...
Predictive Analytics: How This Revolutionary Technology for Strategic Marketi...Media Needle
 
CRITICAL ANALYSIS OF EFFECTIVE INTERNET ADVERTISING DONE BY SMALL BUSINESSES
CRITICAL ANALYSIS OF EFFECTIVE INTERNET ADVERTISING DONE BY SMALL BUSINESSESCRITICAL ANALYSIS OF EFFECTIVE INTERNET ADVERTISING DONE BY SMALL BUSINESSES
CRITICAL ANALYSIS OF EFFECTIVE INTERNET ADVERTISING DONE BY SMALL BUSINESSESTushar Dalvi
 
Core concepts-of-marketing
Core concepts-of-marketingCore concepts-of-marketing
Core concepts-of-marketingbjahboi
 
Core concepts-of-marketing
Core concepts-of-marketingCore concepts-of-marketing
Core concepts-of-marketingbjahboi
 
SMCG for Management Consultants and Business Analysts
SMCG for Management Consultants and Business AnalystsSMCG for Management Consultants and Business Analysts
SMCG for Management Consultants and Business AnalystsAsen Gyczew
 
National Safety Council (2009). Supervisors safety manual (10t.docx
 National Safety Council (2009). Supervisors safety manual (10t.docx National Safety Council (2009). Supervisors safety manual (10t.docx
National Safety Council (2009). Supervisors safety manual (10t.docxaryan532920
 
Phil Shaps E Book Optimal Variation For Lead Generation
Phil Shaps E Book Optimal Variation For Lead GenerationPhil Shaps E Book Optimal Variation For Lead Generation
Phil Shaps E Book Optimal Variation For Lead GenerationPhilShaps
 
Economics_and_Management_Decisions.docx
Economics_and_Management_Decisions.docxEconomics_and_Management_Decisions.docx
Economics_and_Management_Decisions.docxManojMba2
 

Ähnlich wie Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process. (20)

Where we are with marketing ROI measurement
Where we are with marketing ROI measurement Where we are with marketing ROI measurement
Where we are with marketing ROI measurement
 
Book robert c blattberg sales promotion models
Book robert c  blattberg sales promotion modelsBook robert c  blattberg sales promotion models
Book robert c blattberg sales promotion models
 
Chapter 17 designing and managing integrated marketing zhao
Chapter 17 designing and managing integrated marketing  zhaoChapter 17 designing and managing integrated marketing  zhao
Chapter 17 designing and managing integrated marketing zhao
 
New product development strategy of samsung
New product development strategy of samsungNew product development strategy of samsung
New product development strategy of samsung
 
Shows approach which expands the breadth of what marketing-mix models c
Shows approach which expands the breadth of what marketing-mix models cShows approach which expands the breadth of what marketing-mix models c
Shows approach which expands the breadth of what marketing-mix models c
 
ASSIGNMENT PROJECT FRONT SHEET CIM Membership Number Module Title
ASSIGNMENT PROJECT FRONT SHEET CIM Membership Number  Module TitleASSIGNMENT PROJECT FRONT SHEET CIM Membership Number  Module Title
ASSIGNMENT PROJECT FRONT SHEET CIM Membership Number Module Title
 
240CoachFinalProduct
240CoachFinalProduct240CoachFinalProduct
240CoachFinalProduct
 
Core Concepts of MarketingThis book is licensed under .docx
Core Concepts of MarketingThis book is licensed under .docxCore Concepts of MarketingThis book is licensed under .docx
Core Concepts of MarketingThis book is licensed under .docx
 
Predictive Analytics: How This Revolutionary Technology for Strategic Marketi...
Predictive Analytics: How This Revolutionary Technology for Strategic Marketi...Predictive Analytics: How This Revolutionary Technology for Strategic Marketi...
Predictive Analytics: How This Revolutionary Technology for Strategic Marketi...
 
EMDT_1
EMDT_1EMDT_1
EMDT_1
 
CRITICAL ANALYSIS OF EFFECTIVE INTERNET ADVERTISING DONE BY SMALL BUSINESSES
CRITICAL ANALYSIS OF EFFECTIVE INTERNET ADVERTISING DONE BY SMALL BUSINESSESCRITICAL ANALYSIS OF EFFECTIVE INTERNET ADVERTISING DONE BY SMALL BUSINESSES
CRITICAL ANALYSIS OF EFFECTIVE INTERNET ADVERTISING DONE BY SMALL BUSINESSES
 
Core concepts-of-marketing
Core concepts-of-marketingCore concepts-of-marketing
Core concepts-of-marketing
 
Core concepts-of-marketing
Core concepts-of-marketingCore concepts-of-marketing
Core concepts-of-marketing
 
SMCG for Management Consultants and Business Analysts
SMCG for Management Consultants and Business AnalystsSMCG for Management Consultants and Business Analysts
SMCG for Management Consultants and Business Analysts
 
MMA-MM-Overview (1).ppt
MMA-MM-Overview (1).pptMMA-MM-Overview (1).ppt
MMA-MM-Overview (1).ppt
 
MMA Overview.ppt
MMA Overview.pptMMA Overview.ppt
MMA Overview.ppt
 
National Safety Council (2009). Supervisors safety manual (10t.docx
 National Safety Council (2009). Supervisors safety manual (10t.docx National Safety Council (2009). Supervisors safety manual (10t.docx
National Safety Council (2009). Supervisors safety manual (10t.docx
 
Phil Shaps E Book Optimal Variation For Lead Generation
Phil Shaps E Book Optimal Variation For Lead GenerationPhil Shaps E Book Optimal Variation For Lead Generation
Phil Shaps E Book Optimal Variation For Lead Generation
 
Economics_and_Management_Decisions.docx
Economics_and_Management_Decisions.docxEconomics_and_Management_Decisions.docx
Economics_and_Management_Decisions.docx
 
Sem2 mba springassignments
Sem2 mba springassignmentsSem2 mba springassignments
Sem2 mba springassignments
 

Mehr von Esteban Ribero

Conjoint analysis with mcmc
Conjoint analysis with mcmcConjoint analysis with mcmc
Conjoint analysis with mcmcEsteban Ribero
 
Binary search query classifier
Binary search query classifierBinary search query classifier
Binary search query classifierEsteban Ribero
 
Campaign response modeling
Campaign response modelingCampaign response modeling
Campaign response modelingEsteban Ribero
 
Consumer Segmentation with Bayesian Statistics
Consumer Segmentation with Bayesian StatisticsConsumer Segmentation with Bayesian Statistics
Consumer Segmentation with Bayesian StatisticsEsteban Ribero
 
Modeling Sexual Selection with Agent-Based Models
Modeling Sexual Selection with Agent-Based ModelsModeling Sexual Selection with Agent-Based Models
Modeling Sexual Selection with Agent-Based ModelsEsteban Ribero
 
ARF RE:THINK 2005. The Extension of The Concept of Brand to Cultural Event Ma...
ARF RE:THINK 2005. The Extension of The Concept of Brand to Cultural Event Ma...ARF RE:THINK 2005. The Extension of The Concept of Brand to Cultural Event Ma...
ARF RE:THINK 2005. The Extension of The Concept of Brand to Cultural Event Ma...Esteban Ribero
 
Is looking at consumers' brain the ultimate solution?
Is looking at consumers' brain the ultimate solution?Is looking at consumers' brain the ultimate solution?
Is looking at consumers' brain the ultimate solution?Esteban Ribero
 

Mehr von Esteban Ribero (8)

Conjoint analysis with mcmc
Conjoint analysis with mcmcConjoint analysis with mcmc
Conjoint analysis with mcmc
 
Binary search query classifier
Binary search query classifierBinary search query classifier
Binary search query classifier
 
Campaign response modeling
Campaign response modelingCampaign response modeling
Campaign response modeling
 
Consumer Segmentation with Bayesian Statistics
Consumer Segmentation with Bayesian StatisticsConsumer Segmentation with Bayesian Statistics
Consumer Segmentation with Bayesian Statistics
 
Modeling Sexual Selection with Agent-Based Models
Modeling Sexual Selection with Agent-Based ModelsModeling Sexual Selection with Agent-Based Models
Modeling Sexual Selection with Agent-Based Models
 
The Learning Lab
The Learning LabThe Learning Lab
The Learning Lab
 
ARF RE:THINK 2005. The Extension of The Concept of Brand to Cultural Event Ma...
ARF RE:THINK 2005. The Extension of The Concept of Brand to Cultural Event Ma...ARF RE:THINK 2005. The Extension of The Concept of Brand to Cultural Event Ma...
ARF RE:THINK 2005. The Extension of The Concept of Brand to Cultural Event Ma...
 
Is looking at consumers' brain the ultimate solution?
Is looking at consumers' brain the ultimate solution?Is looking at consumers' brain the ultimate solution?
Is looking at consumers' brain the ultimate solution?
 

Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process.

  • 1. Copyright by Esteban Ribero 2005
  • 2. Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process By Esteban Ribero, B.A. Report Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of Master of Arts The University of Texas at Austin December, 2005
  • 3. Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process APPROVED BY SUPERVISING COMMITTEE: __________________________ John D. Leckenby __________________________ Gary B. Wilcox
  • 4. Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process Esteban Ribero, M.A. The University of Texas at Austin, 2005 SUPERVISOR: John D. Leckenby This report presents a description and a complete example of the modeling process required to build a comprehensive market response model that would account for the impacts of previous marketing actions on sales in order to make better and more informed decisions that would help solve some advertising and marketing management problems. Real marketing and sales data of a big competitor in the skin-care market of a Latin American country was analyzed using multivariate regression analysis of time- series. The report presents a full description and an example of the four major steps required to build a market response model: specification, estimation, verification and prediction. The model developed was used then to measure the ROI of the different marketing actions developed during the time period analyzed. A market share decomposition analysis as well as other analysis was provided in order to quantify the direction and power of the impact of the market share drivers. The model was also used to simulate two slightly different scenarios as an attempt to illustrate the “what-if process” that can be done using a market response model suggesting different marketing and media strategies for the brand. iv
  • 5. Table of Contents List of tables…………………………………………………………………………….vii List of figures……………………...……………………………………………………viii Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process……………………………………1 The Eras of Marketing Modeling………………………………………………….5 The Modeling Process……………………………………………………………………..7 Specification………………………………………………………………………9 The modeler’s toolbox…………………………………………………...13 Current effects functional forms…………………………………13 Lagged advertising effects……………………………………….18 Modeling with adstock………………………………………...…23 Estimation………………………………………………………………………..24 Ordinary Least Squares…………………………………………………..25 Generalized Least Squares……………………………………………….30 Nonlinear Least Squares…………………………………………………32 Maximum Likelihood…………………………………………………....33 Verification………………………………………………………………………34 Prediction………………………………………………………………………...41 Model building Summary………………………………………………………..43 An Example……………………………………………………………………………...45 Specifying the model…………………………………………………………….45 Estimating the model……………………………………………………………48 v
  • 6. Verifying the model.……………………………………………………………52 Validating the model……………………………………………………………55 Using the model………………………………………………………………………...60 Summary………………………………………………………………………………..69 References………………………………………………………………………………70 Vita……………………………………………………………………………………..72 vi
  • 7. List of Tables Table 1…………………………………………………………………………………...24 Table 2…………………………………………………………………………………...39 Table 3…………………………………………………………………………………...40 Table 4…………………………………………………………………………………...49 Table 5…………………………………………………………………………………...50 Table 6…………………………………………………………………………………...51 Table 7…………………………………………………………………………………...56 Table 8…………………………………………………………………………………...58 Table 9…………………………………………………………………………………...65 vii
  • 8. List of Figures Figure 1…………………………………………………………………………………....8 Figure 2…………………………………………………………………………….….....11 Figure 3…………………………………………………………………………….….....12 Figure 4…………………………………………………………………………….….....12 Figure 5…………………………………………………………………………….….....18 Figure 6…………………………………………………………………………….….....27 Figure 7…………………………………………………………………………….….....37 Figure 8…………………………………………………………………………….….....54 Figure 9……………………………………………………………………………...….. 57 Figure 10………………………………………………………………………………....59 Figure 11………………………………………………………………………………....61 Figure 12………………………………………………………………………………....64 Figure 13………………………………………………………………………………....68 viii
  • 9. Brand Communications Modeling: Developing and Using Econometric Models in Advertising. An Example of a Full Modeling Process The way advertising is planned and executed is changing. The media landscape has been changing at an impressive rate. The development of new technologies has made possible the emergence of new and multiple media. The fragmentation of media channels, the decreasing audience’s size of traditional media and the empowerment of consumers create a new set of rules for marketing and advertising mangers who want to succeed in the increasing competitive landscape. Within this framework to be accountable is no more a desire, it is a need. The famous statement attributed to John Wanamaker is more relevant now than ever: “I know half of my advertising budget is wasted. The problem is I don’t know which half”. Finding which one is what we need now. And this is applicable not only to advertising but to all marketing activities. Being able to fully understand the effects of the different marketing policy instruments on sales should be a regular practice for marketing and advertising mangers. Fortunately with today’s improvement in data collection and statistical analysis’ techniques it is possible to address the problem in a scientific, yet subjective, manner. As we will se, the use of mathematical models to help marketers and advertising professionals to solve management problems is not new. However, the recent use of econometric modeling in the advertising industry is becoming an important activity and more and more companies are using the technique to improve their decision making 1
  • 10. process. “Econometrics buzzes ad world as a way of measuring results” claimed a recent article in the Wall Street Journal (Patrick, 2005). The article mentioned the recent raise on the number of employees working on econometric models in the advertising industry. For example, WPP’s MindShare has increase the number of people doing econometric modeling from 20 to 150 in just 5 years. Omnicom’s OMD has its own business unit (OMD Metrics) dedicated to built econometric models for their international and local clients, and its staff members have increase from 6 to 45 in the past three years. Why is it so important to use formalized models in an industry that has been traditionally reluctant to scientific scrutiny? Well, the game has changed: The proliferation of options to promote the sales of a brand and the pressure for accountability is demanding more measurable results for the advertising industry. The pressure to come up with ways to show which ads and media strategy boost sales of a product is the driving force of this new interest in econometric modeling. There are many benefits of using formalized models to solve complex problems like the ones one might encounter in marketing and advertising. John Sterman, an MIT professor dedicated to the use of formalized model to improve our ability to comprehend and manage complex systems, discuses the advantages of using formalized models versus mental models. Following Sterman (1992), mental models have some advantages: they are flexible, take a wide range of information into account, can be adapted to new situations and are updated with new information. But mental models also have great disadvantages: they are not explicit, not easily examined by others. Their assumptions are hard to discuss, even for our own mental models. But the most important problem with mental models is that our rationality is bounded: The best-intentioned mental analysis of 2
a complex problem cannot hope to account accurately for the effects of all the interactions between the variables, especially if those interactions are nonlinear. On the other hand, formal models' assumptions can be discussed openly. Formal models are able to relate many factors simultaneously and can be simulated under controlled conditions, allowing analysts to conduct experiments that are not feasible in the real world. This does not mean that formal models are correct. All models are wrong (Sterman, 2002): they represent reality; they are not reality. But formalized models can help us understand the systems we work in and for. Advertising and marketing managers can benefit greatly from using models to solve important problems. For example, the use of econometric models can help a manager find the optimal or near-optimal advertising budget for future periods. The analysis would allow him or her to find the adequate advertising budget for attaining a specific sales goal or, if financial information is available, the model can incorporate short-term and long-term criteria to maximize profit. (To see some examples, visit the following http addresses:
http://www.ciadvertising.org/sa/spring_05/adv391k/eribero/frameset.htm
http://www.ciadvertising.org/sa/spring_05/adv391k/eribero/Solo2/frameset.htm ).
Other applications of the modeling process could help managers answer the following questions:
• What is the optimal mix of TV vs. Posters vs. Radio?
• What happens to sales when we obtain a wider distribution?
• What happens to sales when we do not advertise?
• How much should we spend on advertising vs. promotion?
• What is the best pattern and level of advertising for my brand?
• How effective is our pricing strategy?
• Which competitors hurt my brand and how?
• Which of my communications channels offers best value for money?
• How does advertising work and how can we prove this to the Financial Director?
• How do I spend the same budget but increase sales?
• What's the impact of economic changes on my brand?
• Which copy strategy/campaign worked better?
• How much could we sell next period with X budget?

Besides these direct practical applications for budgeting, forecasting and accountability, the modeling process improves the manager's ability to cope with his complex environment. Leeflang, Wittink, Wedel & Naert (2000, p. 25-27) list eight possible indirect benefits of using models in business, described as follows:
1. "A model would force him [a manager] to explicate how the market works. This explication alone will often lead to an improved understanding of the role of advertising and how advertising effectiveness might depend on a variety of other marketing and environmental conditions."
2. "Models may work as problem-finding instruments. That is, problems may emerge after a model has been developed. Managers may identify problems by discovering differences between their perception of the environment and a model of that environment."
3. "Models can be instrumental in improving the process by which decision-makers deal with existing information"
4. "Models can help managers decide what information should be collected. Thus models may lead to improved data collection, and their use may avoid the collection and storage of large amounts of data without apparent purpose."
5. "Models can also guide research by identifying areas in which information is lacking, and by pointing out the kinds of experiments that can provide useful information."
6. "[A] model helps the manager to detect a possible problem more quickly, by giving him an early signal that something outside the model has happened."
7. "Models provide a framework for discussion. If a relevant performance measure (such as market share) is decreasing, the model user may be able to defend himself by pointing to the effects of changes in the environment that are beyond his control, such as new product introductions by the competition. Of course, a top manager may also employ a model to identify poor decisions by lower-level managers."
8. "Finally, a model may result in a beneficial reallocation of management time, which means less time spent on programmable, structured, or routine and recurring activities, and more time on less structured ones."

The Eras of Marketing Modeling

As Leckenby and Wedding (1982) said, "the concept of model building in advertising can be traced back only as far as the early 1950's". Even though this is a relatively short history, Leeflang et al (2000) identified five eras of model building in marketing. The first era is characterized by the emulation or transposition of Operations Research and Management Science into the marketing framework. The OR/MS tools, which included mathematical programming, computer simulations, game theory and dynamic modeling, were initially developed to solve some of the strategic problems faced during World War II. The emphasis was on quantitative method sophistication rather than on the marketing problem per se (Leckenby & Wedding, 1982). The advertising and marketing
problem was adjusted to fit the requirements of the technical methods available, rather than the other way around. The methods were typically not realistic, and the use of those methods in marketing applications was therefore very limited (Leeflang et al, 2000). The second era, which ended in the late sixties and early seventies, was characterized by the attempt to adapt the models to fit the marketing problems, in order to overcome the misuse of the OR approach in the advertising and marketing field. The models, however, were so complex that they lacked usability. The third era, which started around 1970, showed an increased emphasis on models that were good representations of reality and at the same time easier to use. John D.C. Little developed the concept of "Decision Calculus". He used the term to describe models that would process judgments and data in a manner that would assist the manager in decision making (Leckenby and Wedding, 1982). This emphasis on helping decision making marked a major change in the direction of model building in advertising. Little (1970) suggested possible answers to the question of why models were not used: good models and parameterizations are hard to find; managers do not understand models; and models are incomplete. So, in order to overcome such problems, a model should be simple, robust, easy to control, adaptive, complete on important issues, and easy to communicate with. He also said that a model should be evolutionary (Little, 1975), meaning that a model should start with a simple structure to which detail is added later. The use of judgmental data as well as objective data in the model building process helped raise the rate of model implementation (Leeflang et al, 2000). Even though the third era of modeling in marketing and advertising was focused on the implementation and usability of models, it was not really until the fourth era (starting
in the mid-1980s) that models were actually implemented (Leeflang et al, 2000). The main factor behind this implementation boom was the availability of precise marketing data coming from scanning equipment that captured in-store and household-level purchases. This era coincided with the proliferation of marketing support systems. The fifth era may be characterized by an increase in routinized model applications. It is predicted that in the coming decades the age of marketing decision support will usher in an era of marketing decision automation (Leeflang et al, 2000; Bucklin et al, 1998). It is expected that marketing support systems will take care of routine marketing decisions like assortment decisions and shelf space allocation, customized product offerings, coupon targeting, loyalty reward and frequent shopper club programs, etc. The focus of this paper is on the model building process representative of the third and fourth eras.

The Modeling Process

The model building process for any mathematical model, including response models, is supposed to follow a sequence of steps. The traditional view assumes the following four steps: specification, estimation, verification and prediction (Leckenby & Wedding, 1982; Leeflang et al, 2000). Leeflang et al (2000) propose an alternative sequence more focused on implementation (see figure 1; for a detailed explanation of the implementation view see Leeflang et al, 2000, chapter 5). In order to keep things as simple as possible, we focus on the traditional view.
Figure 1. The implementation view on model building. (From Leeflang et al, 2000, p. 52)
Specification

"A model is a representation of the most important elements of a perceived real-world system." (Leeflang et al, 2000)

In order to better understand the model building process, and especially the specification stage, it is important to analyze the definition provided above. The definition indicates that models are representations, "simplified pictures" (Leeflang et al, 2000), of reality. Those representations may be useful for decision makers trying to understand the reality they deal with. The definition above has an extremely important implication: since a model is a representation of a perceived real world, it is something subjective. Different model builders could have different perceptions and interpretations of the same reality. Modelers could also have different opinions about which are "the most important elements" to represent. This makes the model building process not only more interesting but also very dependent on the modeler's "theory" of the reality he tries to represent. That is why it is so important in the model building process to fully specify the variables and the relationships between them. That is exactly what is done in the specification stage. For example, if we consider sales as the dependent variable and advertising and the rest of the marketing policy instruments as the independent variables, specification would be the process of deciding upon the functional form that will describe the relationship between advertising (and the other marketing variables) and sales (Leckenby & Wedding, 1982). In other words: "specification is the process by which the manager's theory of how advertising works for a particular brand or company is put into testable form" (Leckenby & Wedding, 1982, p. 257).
Rephrasing Little's suggestions for building good models (Little, 1970), a model should be: a. simple; b. complete on important issues; c. adaptive; d. robust. Leeflang et al (2000) pointed out that it is easy to see that some of these criteria are in conflict. They state that "none of the criteria should be pushed to the limit. Instead, we can say that the more each individual criterion is satisfied, the higher the likelihood of model acceptance" (Leeflang et al, 2000, p. 53). While specifying a model one should therefore consider these elements. As a goal, models should be as simple as possible. That is, considering the principle of parsimony, one should choose among competing models the one that fairly represents reality with the simplest structure. Equally important is the trade-off between accuracy and usability. It is not uncommon to find two competing models that perform differently on these two criteria. If accurate forecasting is more important than understanding the effects of the independent variables, then the more accurate model should be chosen, even though it might be more complex and therefore less easy to explain and use. But if it is more important to understand the market dynamics and the way the marketing variables affect sales, the simpler model should be used. Fortunately for modelers, there are several functional forms to choose from while specifying a model. The one to be selected depends on the above criteria as well as on the
underlying theory of marketing and advertising that the manager or modeler is considering. We will first consider the different shapes that a response function might have. Then we will describe some of the most used response functions in advertising. The shapes of a response function can be classified as linear, concave or s-shape. Any other shape could be the result of a combination of one or more of these shapes. Figure 2 shows a typical linear response, figure 3 shows different concave response shapes and figure 4 shows some s-shape functions.

Figure 2. A linear shape function
Figure 3. Some concave response functions

Figure 4. Some s-shape response functions
The modeler's toolbox

"To the craftsman with a hammer, the entire world looks like a nail, but the availability of a screwdriver introduces a host of opportunities!" Lilien & Rangaswamy (1998)

While it is true that one should not modify the problem to fit the tools, it is easier for the modeler if he or she can choose from a series of predetermined functions that can then be modified to fit the problem. The decision to pick one or another depends on the problem at hand and on data availability. For example, a linear function (the simplest possible response function) could fit the data quite well if the data range corresponds to a linear section of a more complex response function (Lilien & Rangaswamy, 1998). The following are some of the most used response functions in advertising. Even though a brief description of the functions is provided, for more details please refer to Hanssens, Parsons & Schultz, 2001; Leeflang et al, 2000; or Kotler, 1971.

Current effects functional forms. The simplest response functions, Current Effects (CE) functions, assume that the effects of the marketing variables occur in full in the same period in which they appear. For example, advertising expenditures in April are supposed to affect sales in April and only April. While this might not hold true for most brands, CE functions are useful for their simplicity and ease of explanation. The Linear response model has the following form:

S = a + bA + u

Where:
S = Sales
a = the y-intercept
b = slope of the function
A = Advertising expenditures
u = disturbance term or error term

The linear response function assumes constant returns to scale. That is, sales increase by a constant amount for each equivalent increase in marketing effort (Figure 2). The linear model would not lead to locally different conclusions than another function if the data are available only over a limited range. While adequate for asking "what if" questions around the current operating range, the linear model would be misleading if data outside that range are used, as would be the case when trying to find the optimal advertising effort. More realistic response models are said to have diminishing returns to scale. These models suppose that sales always increase with increases in advertising or marketing effort, but each additional unit of marketing effort brings less in incremental sales than the previous unit did (Hanssens et al, 2001). The following concave downward response functions show diminishing returns to scale:

The Semilogarithmic (Log) function: S = a + b·ln A + u

The Square-root function: S = a + b·√A + u

The Quadratic function: S = a + b1·A − b2·A² + u
The quadratic function has an important property that differentiates it from the others: it can represent the concept of supersaturation, a phenomenon that occurs when too much marketing effort causes a negative response. The so-called "wearout" effect is an example of a case of supersaturation in advertising. The following functions are nonlinear in the variables but linear in the parameters, and can be linearized with some algebra so that they can be estimated through linear regression (see the Estimation section in this paper):

The Power function:
a) S = a·A^b
b) ln S = ln a + b·ln A

The power function is very flexible since, depending on the value of the parameter b, it can take very different forms (see Leeflang et al, 2000, p. 75-76; Kotler, 1971, p. 33). It also has the useful characteristic that the coefficient b is actually the elasticity of demand with respect to advertising (Hanssens et al, 2001, p. 101; Broadbent, 1997). Also, when more than one independent variable is considered, the power function, also known as the multiplicative function, accounts for possible interactions between the independent variables.

The Modified Exponential function:
a) S = S̄(1 − e^(a + bA))
b) ln(1 − S/S̄) = a + bA

Where:
S̄ = upper bound level or saturation point
e = the mathematical constant, approximately 2.71828

An attractive characteristic of the modified exponential function, and of some of the next functions as well, is that it supposes an upper limit or saturation point where the market potential reaches its maximum. One special characteristic is that it implies that the marginal sales response will be proportional to the level of untapped potential (Kotler, 1971). All previous functional forms except the linear one are concave downward functions (figure 3), which implies diminishing returns at all points in the response. It is sometimes the desire of the modeler or manager to represent the intuitive concept of a "threshold effect" in advertising: the idea that small doses of advertising do not count for much and that there is a tipping point that must be crossed in order to expect real effects of advertising on sales. Even though there is little evidence that such a phenomenon occurs in advertising (Kotler, 1971; Leckenby & Wedding, 1982; Hanssens et al, 2001), it is possible to represent the concept using s-shape functions (figure 4). These functions assume increasing marginal returns at first and then diminishing marginal returns with respect to various alternative levels of advertising. The following are the most common s-shape functions:

The Gompertz function:
a) S = S̄·e^(−e^(a + bA))
b) ln(ln S̄ − ln S) = a + bA

The Logistic function:
a) S = S̄ / (1 + e^(−(a + bA)))
b) ln[S / (S̄ − S)] = a + bA

The Lower-Bound Logistic function:
a) S = (S̄ + S_LB·e^(a + bA)) / (1 + e^(a + bA))
b) ln[(S̄ − S) / (S − S_LB)] = a + bA

Where:
S_LB = lower bound level, or minimum sales when advertising is 0.

As described above, these functions are just approximations of different "realities," and the modeler can modify them to incorporate other elements to better address the problem at hand. For example, these functions only consider one independent variable and do not account for special situations like seasonality or special events during the period analyzed. The modeler can then add different variables to these functions or use dummy variables to represent qualitative differences or changes in the data (see some examples in Hanssens et al, 2001, p. 97-99). Figure 5 shows some of the functions discussed above.
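To make the linearization idea concrete, the following short Python sketch fits the power function to hypothetical advertising and sales data (the figures are invented for illustration and are not from the study reported here). The log-log regression recovers the elasticity b directly:

import numpy as np

# Hypothetical data: advertising spend (A) and sales (S) for 24 periods.
rng = np.random.default_rng(0)
A = rng.uniform(50, 500, size=24)
S = 120 * A**0.3 * rng.lognormal(0.0, 0.05, size=24)  # true elasticity b = 0.3

# Linearize S = a*A^b as ln S = ln a + b ln A, then estimate by least squares.
X = np.column_stack([np.ones_like(A), np.log(A)])
(ln_a, b), *_ = np.linalg.lstsq(X, np.log(S), rcond=None)
print(f"estimated a = {np.exp(ln_a):.1f}, elasticity b = {b:.3f}")

The same pattern applies to the other linearizable forms above: transform the variables, run ordinary regression, and back-transform the parameters.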
Figure 5. Graphical representation of some CE functional forms (from Leckenby & Wedding, 1982).

Lagged advertising effects. As discussed earlier, Current Effects response functions assume that the effects of an advertising or marketing expenditure in period t occur only, and completely, in period t. This assumption does not correspond with the common understanding of advertising theory, since it is assumed that a large part of advertising's effects occurs over time. So, in order to accommodate this into advertising response models, we first need to discuss some basic concepts about carryover effects.
Carryover effect is the term used to describe the idea that marketing and advertising expenditures have effects on sales that carry over into future periods (Kotler, 1971). Two major categories of carryover effects can be distinguished: the delayed response effect and the customer holdover effect (Leckenby & Wedding, 1982). The delayed response effect develops because delays occur between the time the advertising dollars and programs are implemented and the time the advertising-generated purchases occur (Leckenby & Wedding, 1982). There are four types of delayed response effects: execution delay, noting delay, purchase delay and recording delay. The delay occurs either because execution takes time, because consumers do not notice the ads immediately, or because they postpone the purchase to future periods. The recording delay is a problem with the data and may not represent a real delayed response, just a mismatch between the data (for more detail see Kotler, 1971; Leckenby & Wedding, 1982). The customer holdover effect is clearly explained by Kotler (1971): "suppose that a marketing stimulus is paid for today, appears today, is noted today, and leads to purchase today. No delayed response is involved. The buyer finds the product agreeable and decides to remain with this brand. On this basis it can be said that marketing stimulus this period affected sales this period and for many future periods." (p. 124) This repurchase scenario suggests that advertising should be credited, in some part, for holding the customer to the brand in future time periods. Retaining new, and possibly old, customers in future periods is not the only way a holdover effect can occur. A holdover effect can also occur even if the number of customers does not increase as a result of the advertising expenditure. This can happen when the advertising or other
marketing stimulus increases the average quantity purchased per period per customer (Kotler, 1971). Regardless of the type of carryover that could be present for a brand at a particular time, it is possible to represent it with some dynamic models. To better understand some of these models we will consider the simplest linear model with lagged effects. The model has the following form:

S_t = a + b·A_t + b·c·A_(t-1) + b·c²·A_(t-2) + …

Where:
a = the intercept term
b = regression coefficient
c = carryover rate or retention rate (0 < c < 1)

The basic assumption behind this model is that the effect of advertising in period t decays exponentially in subsequent periods. That is, the effect on sales in period t is the result of the advertising in period t, plus a fraction of the advertising in t-1, plus a fraction of the advertising in t-2, and so on. The rate of decay, or in other words the share of the advertising effect that is carried over into the immediately following period, is the carryover rate (c). Because estimating the parameters of this model requires us to know how many periods we have to look back, as well as dealing with autocorrelation (see Estimation in this paper), some modifications by Koyck and others give us the following lagged effects models:

The Koyck Geometric Distributed Lag (GL) model:

S_t = a(1 − c) + b·A_t + c·S_(t-1) + {u_t − c·u_(t-1)}

Where:
u_t = white noise (disturbance term)
c = carryover rate or retention rate (0 < c < 1)
b = β(1 − c) = short-term effect of advertising
β = b / (1 − c) = long-term effect of advertising

This model hypothesizes that the effect on current sales of advertising conducted in all preceding time periods can be summarized in one term: lagged sales. Sales are then assumed to be a function of advertising and of sales in the preceding time period. The model sometimes performs well; however, where strong sales trends are noted, the effect of previous-period sales on current sales is so strong that the effect of current advertising on sales can hardly be detected (Leckenby & Wedding, 1982), something not totally in accordance with advertising theory.

The Partial Adjustment (PA) model:

S_t = (1 − φ)[a + b·A_t] + φ·S_(t-1) + w_t

Where:
1 − φ = adjustment rate
w = white noise

The Partial Adjustment model is similar to the Geometric Lag in its structure. It assumes that consumers can only partially adjust to an advertising stimulus in the short term but will gradually adjust to the desired consumption level, which causes the advertising effect to be distributed over time (Hanssens et al, 2001). Note: the above Partial Adjustment model should not be confused with the Nerlove Partial Adjustment model (Nerlove PA). The latter may not be a carryover effect
model, but it represents the concept of brand loyalty and assumes some inertia from the past. This model could be tried after some unsuccessful attempts with the Current Effects models and before the more complex carryover effects models. The Nerlove PA functional form is:

S_t = a + b1·A_t + b2·S_(t-1) + u_t

Another carryover effects model, similar to the GL but with an autoregressive structure, is the following:

The Geometric Lag Autoregressive (GLA) model:

S_t = a + b1·A_t − b1·ρ·A_(t-1) + (c + ρ)·S_(t-1) − c·ρ·S_(t-2) + {u_t − c·u_(t-1)}

Where:
c = carryover rate or retention rate (0 < c < 1)
ρ = autocorrelation coefficient

The GLA model is a nested model, which means that lower-order equations are contained within the parameters of its higher-order structure (Hanssens et al, 2001). For example, when ρ = 0 the GLA becomes the GL; when ρ = 0 and c = 0, the CE linear model; and in the special case where ρ = c (≡ φ), the Partial Adjustment model (Hanssens et al, 2001; Leeflang et al, 2000). A modeler should first try some of the CE models; then, if autocorrelation appears after estimating the parameters (see Estimation in this paper), he should try i) adding important explanatory (independent) variables or ii) changing the model specification through transformations. If after i) and ii) autocorrelation (the fact that a variable is correlated with itself in previous time periods) remains, it may be "true" autocorrelation, that is, a generalized carryover effect, so the modeler should specify this autocorrelation
in the model (Leckenby, personal notes). The Geometric Lag Autoregressive (GLA) model is an example of that process (for other autoregressive models see Hanssens et al, 2001, chap. 4). It is important to know that these lagged effects response models can also take different functional forms in order to represent diminishing returns to scale or s-shape behavior, much like the Current Effects models discussed earlier.

Modeling with adstock. The concept of carryover effect can be modeled either explicitly, as we have seen in the previous models, or implicitly using stock variables. The latter approach was championed by Simon Broadbent in several publications (see Broadbent, 1979, 1984, 1997). The basic idea behind stock variables is that they capture the present and past amount of advertising effect for any period in one single value for that specific period. The approach assumes the same geometric decline in advertising effect as the models presented above. The adstock variable is then simply added to the equation like any other independent or explanatory variable. Its key advantages are the ease of communicating results to management and a simpler estimation process, since the retention rate can be estimated subjectively using the concept of half-life (HL). Half-life is simply the time it takes for an advertising effort to produce half of its effects. Even though this time can vary from 3 to 10 weeks, it tends to be between 4 and 6 weeks (Broadbent, 1984). There is a carryover rate or retention rate (c) associated with every HL value. Table 1 shows the retention rate for different half-lives under the "first period counts full" and "first period counts half" conventions (see Broadbent, 1984; Hanssens et al, 2001 for a discussion of these conventions). To the extent that the adstock approach uses the same
model of carryover, the work is not different from that resulting from the models that specify the carryover effect explicitly (Hanssens et al, 2001).

Table 1. Half-life and retention rate.

Half-life   1      2      3      4      5      6      7      8
f = 1       0.500  0.707  0.794  0.841  0.871  0.891  0.906  0.917
f = 1/2     0.334  0.640  0.761  0.821  0.858  0.882  0.899  0.912

Half-life   9      10     11     12     13     14     15     16
f = 1       0.926  0.933  0.939  0.944  0.948  0.952  0.955  0.958
f = 1/2     0.922  0.930  0.936  0.942  0.948  0.950  0.953  0.956

Estimation

Once the modeler has specified a model, based on theoretical relations between the explanatory and dependent variables or on examination of the available data, he or she must estimate the parameters of the function using historical or cross-sectional data (Leckenby & Wedding, 1982). The essence of the process is fitting a given equation to a set of data in order to find the best estimates of the different parameters in the model (a, b1, b2, c, etc.). There are many estimation techniques; however, the most "robust" and popular is regression analysis. We will now describe the basic concepts of the simplest regression analysis, Ordinary Least Squares (OLS). We will discuss the assumptions underlying this technique, the problems when they are violated, and possible remedies.
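Before turning to estimation proper, the adstock construction just described can be made concrete with a minimal Python sketch. The GRP figures are hypothetical, and the half-life-to-retention conversion follows the relationship c^HL = 0.5 that underlies the "first period counts full" row of Table 1:

import numpy as np

def retention_from_half_life(hl):
    # Retention rate c such that the effect halves after hl periods: c**hl = 0.5.
    return 0.5 ** (1.0 / hl)

def adstock(grps, c):
    # Geometric adstock: stock_t = grps_t + c * stock_(t-1).
    stock = np.zeros(len(grps))
    for t, g in enumerate(grps):
        stock[t] = g + (c * stock[t - 1] if t > 0 else 0.0)
    return stock

grps = [120, 0, 0, 80, 0, 150, 0, 0]   # hypothetical weekly TV GRPs
c = retention_from_half_life(4)        # a 4-week half-life gives c ≈ 0.841
print(np.round(adstock(grps, c), 1))

The resulting series would then enter the regression as a single explanatory variable in place of the raw GRPs.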
It is important to notice that the process of model building is somewhat circular, in the sense that a model is specified, estimated and verified, but very often violations of the assumptions, as well as unsatisfactory results, force the modeler to choose a different estimation technique or to modify the model specification and start the process again. Another note is that the estimation process in model building is more of a confirmatory approach (see Hair, 1998) to multiple regression analysis. It differs somewhat from an exploratory approach because a pre-established functional form, based on theoretical relations between variables, is "tested" or confirmed against empirical data. However, as noted earlier, it is an iterative process where different functional forms might be "confirmed" until satisfactory results are found.

Ordinary Least Squares

The basic idea of estimating the parameters of a response function is to find the values for each parameter that would minimize the sum of errors or disturbance terms in the equation. Let us consider the simplest linear functional form:

S = a + bA + u

Where:
S = Sales
a = the intercept term
b = slope of the function
A = Advertising expenditures
u = disturbance term or error term
Rephrasing, the objective in the estimation process of model building is to find the values of a and b that would give the least value of u on average. Because what we are trying to find is the statistical relationship between the variables, there are always some random errors: for every value of an independent variable there might be more than one value of the dependent variable. These multiple values of the dependent variable for every value of the explanatory variables are the result of random components in the relationship (Hair, 1998). Ordinary Least Squares is the basic technique in which the parameters of a linear or linearized (see the Specification section in this paper) response function are estimated by minimizing the sum of the error terms at every point of the function. Because the difference between a value predicted by the function and the observed value could be positive or negative, the error terms are squared so they can be added to produce a measure of the fit of the model to the data in the sample. That measure is the residual sum of squares (RSS) or sum of squared errors (SSE) (Hair, 1998). There is also a measure of the improvement in the explanation of the dependent variable attributable to the independent variables, compared to just using the mean of the dependent variable. It is called the sum of squares of the regression (SSR), and it is calculated by adding the squared differences between the mean and the predicted value of the dependent variable for all observations (Hair, 1998). These two measures are crucial for assessing the model's capacity to explain the variation of the data of the dependent variable. If the SSR is divided by the total sum of squares (TSS), the total variance of the dependent variable, we obtain the coefficient of determination R², which represents the portion of the total variance of the dependent variable (usually sales S or market share) explained by the
model. Figure 6 shows a graphical representation of those measures. The unexplained variance is SSE, the explained variance is SSR and the total variance is TSS.

Figure 6. Variance in regression analysis (from Leckenby & Wedding, 1982).

The procedure underlying OLS has several restrictive assumptions that must be carefully considered in assessing the validity of the estimated model (see Verification in this paper). The fundamental assumptions are the following:

a.) The mean of the error terms equals 0
b.) Constant variance of the error terms
c.) Independence of the error terms
d.) Normality of the error terms' distribution
e.) Low multicollinearity
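As a concrete reference point for the discussion of these assumptions, here is a minimal OLS sketch in Python on hypothetical data (invented for illustration); it estimates a and b and computes the fit measures just described:

import numpy as np

# Hypothetical data: 20 periods of advertising spend and sales.
rng = np.random.default_rng(1)
A = rng.uniform(10, 100, size=20)
S = 50 + 2.0 * A + rng.normal(0, 8, size=20)   # true a = 50, b = 2

X = np.column_stack([np.ones_like(A), A])      # design matrix [1, A]
(a_hat, b_hat), *_ = np.linalg.lstsq(X, S, rcond=None)

residuals = S - (a_hat + b_hat * A)
sse = np.sum(residuals ** 2)                   # unexplained variance (RSS/SSE)
tss = np.sum((S - S.mean()) ** 2)              # total variance (TSS)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}, R-squared = {1 - sse / tss:.3f}")
print(f"mean of residuals = {residuals.mean():.2e}")  # assumption a.) holds by construction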
The basic idea behind these assumptions is that u is a random variable. This is clearly explained by Koutsoyiannis in his Theory of Econometrics (1978): "(…) u can assume various values in a chance way. For each value of an independent variable the term u may assume positive, negative or zero values each with a certain probability. We said that u is introduced into the model in order to take into account the influence of various 'errors', such as errors of omitted variables, errors of the mathematical form of the model, errors of measurement of the dependent variable, and the effects of the erratic element which is inherent in human behavior. Now, for u to be random the omitted variables should be numerous, each one individually unimportant, and they should change in different directions so that their overall effect on the dependent variable is unpredictable in any particular period." If we agree that what we are trying to represent in model building is the relationship between the independent and dependent variables on average, it is imperative that the mean of the error terms equals 0 (assumption a); otherwise the parameters of the function are biased (Leeflang et al, 2000). Assumption b means that the dispersion of the error terms remains the same over all observations of the independent variables. The variance of the error terms around the zero mean is then said to be homoscedastic, which means that it does not depend on the values of the independent variables. Conversely, heteroscedasticity is the case in which increasing or decreasing dispersion of the error terms is observed. The consequence of violating this assumption is that it is not possible to calculate an effective confidence interval for the parameters, reducing their efficiency (Leeflang et al, 2000) and their statistical significance (Koutsoyiannis, 1978). Assumption c is also known as absence of autocorrelation. It means that the error terms at any point in the function should be independent from each other. This
might be relevant only when the model is estimated using time series, because the autocorrelation is actually a serial correlation (Leeflang et al, 2000) between the error at one period and the error(s) at the previous period(s). There is positive autocorrelation and negative autocorrelation. Positive autocorrelation means that the residual in t tends to have the same sign as the residual in t-1. Negative autocorrelation is when a positive sign tends to be followed by a negative sign, or vice versa (Leeflang et al, 2000). The consequence of violating this assumption is that, even though the estimated parameters are unbiased (as when assumption b is violated), the OLS formula underestimates their sampling variance, and the model will seem to fit the data better than it actually does (Hanssens et al, 2001). The assumption of normality (assumption d) is necessary for conducting the statistical tests of significance of the parameter estimates and for constructing confidence intervals. If this assumption is violated the estimates are still unbiased and best, but it is not possible to assess their statistical reliability by the classical tests of significance (t, F, etc.) because these tests are based on normal distributions. Multicollinearity results from correlation between the independent variables. When one independent variable "moves" at the same time as another, they are said to be collinear. In marketing, as in many other areas, variables tend to be correlated all the time. For example, a price reduction is announced via some TV advertising as well as radio; these variables will be correlated with each other since they vary at the same time. Managers usually do not hold all variables constant and vary only one at a time. The degree of multicollinearity has an important impact on the parameters of the response function. A high level of multicollinearity limits the size of the coefficient of
determination R² and makes determining the contribution of each independent variable difficult, because the effects of the independent variables are "mixed" or confounded (Hair, 1998). In consequence, the reliability of the parameter estimates is low (Leeflang et al, 2000). The assumptions discussed above limit the applicability of OLS for estimating the parameters of the function, because these assumptions are often violated. There are many reasons why the assumptions are violated, but usually it is the result of misspecification of the response function. There are tests and procedures to check whether one or more of the assumptions are violated; some of them will be described in the Verification section of this paper. Once the parameters are estimated and the underlying assumptions tested, it is sometimes possible to take corrective actions if violations of the assumptions are present. The simplest corrective action is modifying the specification of the response function and estimating it again. However, sometimes the only solution is to use a different estimation technique.

Generalized Least Squares

In the Generalized Least Squares (GLS) techniques some of the restrictive assumptions about the disturbance term in OLS are relaxed, specifically the assumptions of constant variance and independence of the error terms (autocorrelation). These estimation methods are "generalized" because they can account for special cases or models. In fact, OLS is a special case of GLS in which all the assumptions are met (Leeflang et al, 2000). Another special case is when the variance is heteroscedastic. For example, when cases that are high on some attribute show more variability than cases that
are low on that attribute, and the difference can be predicted from another variable, a weight estimation procedure can compute the coefficients or parameters of a linear model using weighted least squares (WLS), such that the more precise observations (that is, those with less variability) are given greater weight in determining the regression coefficients (Leeflang et al, 2000). The weight estimation procedure in statistical packages like SPSS tests a range of weight transformations and indicates which will give the best fit to the data. Another special case, typical of time series, is when there is a strong presence of autocorrelation in the disturbance terms while the variance is homoscedastic. Assuming that the autocorrelation is generated by a first-order autoregressive scheme (Markov scheme), some transformations are done to incorporate an autoregressive coefficient that allows better parameter estimates (see Leeflang et al, 2000, p. 371-376 for a detailed explanation). There are other GLS methods that account for special cases of the behavior of the disturbance term; for an extensive list of literature on those methods see Hanssens et al, 2001, chapter 5. One important note is that these GLS procedures for dealing with special patterns of the disturbance terms will not give better parameter estimates if the pattern is due to a misspecified model, as is usually the case (Leeflang et al, 2000). Additionally, "robustness may generally be lost if GLS estimation methods are used" (Leeflang et al, 2000, p. 376). So, before using these procedures the modeler should be convinced that he or she is using the best possible model specification (Leeflang et al, 2000).
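A minimal sketch of the first-order autoregressive transformation mentioned above, assuming the disturbances follow such a scheme; the function names are hypothetical, and in practice ρ would be estimated, for example from the lag-1 correlation of the OLS residuals:

import numpy as np

def estimate_rho(residuals):
    # Rough estimate of the AR(1) coefficient from the OLS residuals.
    return np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

def ar1_transform(y, X, rho):
    # Quasi-difference the data: y*_t = y_t - rho*y_(t-1), and likewise for X.
    # OLS on the transformed series satisfies the independence assumption,
    # provided the disturbances truly follow a first-order autoregressive scheme.
    return y[1:] - rho * y[:-1], X[1:] - rho * X[:-1]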
Nonlinear Least Squares

Some models are nonlinear and not linearizable. Additionally, other models violate the assumptions about the disturbance term in their very specification. These include the Koyck Geometric Distributed Lag (GL), Partial Adjustment (PA) and Geometric Lag Autoregressive (GLA) models, which cannot be accurately estimated by linear regression. To solve this problem, procedures have been created that allow the modeler to estimate these kinds of models. Their most common characteristic is that they are iterative. In the simplest form, the parameter that is causing the model to be nonlinear is guessed, by either subjective estimation or trial and error, until a satisfactory result is achieved. Leeflang et al (2000) explain this grid search in the following terms: "For simplicity assume that for any value of y [the parameter causing the nonlinear attribute], the model is estimated by OLS, under the usual assumptions about the disturbance term. Then choose m values for y, covering a plausible wide range, and choose the value of y for which the model's R² is maximized" (p. 384). This procedure is equivalent to the one used with adstock models, where different half-life values are tested to select the one that gives the best results (Broadbent, 1984). This grid search can also be done when, instead of replacing a parameter that is causing the nonlinearity, different transformations of the predictor variables are tested sequentially until satisfactory results are found (Leeflang et al, 2000). However, grid search procedures are costly and inefficient, especially if a model is nonlinear in several of its parameters (Leeflang et al, 2000).
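The grid search idea can be sketched in a few lines, here applied to selecting the adstock retention rate on hypothetical data (the adstock helper repeats the earlier sketch; none of the figures come from the study reported here):

import numpy as np

def adstock(grps, c):
    stock = np.zeros(len(grps))
    for t, g in enumerate(grps):
        stock[t] = g + (c * stock[t - 1] if t > 0 else 0.0)
    return stock

def r_squared(y, X):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

# Hypothetical weekly series generated with a true retention rate of 0.841.
rng = np.random.default_rng(2)
grps = rng.uniform(0, 200, size=52)
sales = 100 + 0.4 * adstock(grps, 0.841) + rng.normal(0, 5, size=52)

# Re-estimate the model by OLS for each candidate c and keep the best R².
candidates = np.arange(0.05, 1.0, 0.05)
best_c = max(candidates,
             key=lambda c: r_squared(sales, np.column_stack([np.ones(52), adstock(grps, c)])))
print(f"retention rate maximizing R² ≈ {best_c:.2f}")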
More sophisticated methods have been developed in which initial estimates of some parameters are reintroduced into the equation in an iterative process until the whole process converges (Leeflang et al, 2000; Koutsoyiannis, 1978). All the techniques discussed above estimate the parameters by attempting to minimize the squares of the differences between the estimated points and the observed ones; they are all Least Squares (LS) methods. A radically different approach is the Maximum Likelihood (ML) method.

Maximum Likelihood

The ML method is based on distributional assumptions about the data. Basically, it finds the values of the parameters that make the probability of obtaining the observed sample outcome as high as possible (Hanssens et al, 2001). In other words, "the maximum likelihood principle is an estimation principle that finds an estimate for one or more parameters such that it maximizes the likelihood of observing the data. The likelihood of a model (L) can be interpreted as the probability of the observed data y, given the model" (Leeflang et al, 2000, p. 390). Under this principle, a certain parameter value is more likely than another. The assumptions underlying the ML method are actually the ones involved in hypothesis testing in the social sciences (Leeflang et al, 2000), and not surprisingly the method is very sensitive to sample size, giving better results with large samples. The ML method can also be used to select a model among competing ones (see the summary in this paper). For more details on ML and LS methods for estimating the parameters of a response function, consult Hanssens et al, 2001; and Leeflang et al, 2000.
Verification

Another important step in developing market response models is to verify that the parameters estimated in the previous step truly represent the relationship between sales (or any other dependent variable) and the marketing variables. The usual way to do this is statistical significance testing (Leckenby & Wedding, 1982). By verifying the parameters it is possible to determine, at a certain risk level, how representative they are of the true advertising-marketing/sales relationship. In market response models (when commercially used) the significance level often used is about 15 percent (Leckenby & Wedding, 1982). If that level of significance is achieved, one could say that in at least 85 of every 100 samples of data used for estimating the response function, the parameters would fall between x and y (the confidence interval). The first measures that should be verified are those related to the fit of the model to the data in the sample. As discussed above, the R² value indicates the percentage of the variance of the dependent variable explained by the independent variables in the model. Because this measure is affected by the number of observations per independent variable used, the modeler should focus on the adjusted R² for comparing competing models and to control for "overfitting" the data (Hair, 1998). It is important to notice that the minimum ratio of observations per independent variable should be 5 to 1 in order to avoid making the results too specific to the sample ("overfitting"), thus lacking generalizability. Verifying the statistical significance of R² and adjusted R² is critical in this step; the F ratio is the statistical significance test that most statistical packages use for this purpose. The parameters of the model should also be tested in terms of their statistical significance. The t value of a coefficient or parameter is
the coefficient divided by its standard error. To determine whether the parameter is significantly different from zero (no effect or relation with the dependent variable), the computed t value is compared to the table value for the sample size and confidence level selected. This test is not that important for the intercept term in a linear model, since the intercept acts only to position the model (for details see Hair, 1998, p. 184). Another measure highly related to the overall fit of the model that must also be checked in this step of model building is the RSS or SSE (the sum of squared errors or disturbance terms). Even though a high R² could be found for a specific model, the RSS could still be very large, indicating the inability of the model to make accurate predictions. As discussed in the previous section, the assumptions underlying the different estimation techniques are highly important for assessing the validity of the parameter estimates, since violations of the assumptions give biased coefficients or, more frequently, make their statistical significance hard to estimate (Leeflang et al, 2000). If the assumptions are violated, the confidence that the parameters truly represent the relationship under analysis is diminished. So another important task of the verification step is to verify that the assumptions used for estimating the parameters are not violated. The simplest way to do this is a careful analysis of the residuals using scatter plots. It is recommended to use some form of standardization, as it makes the residuals directly comparable. The most widely used are studentized residuals, whose values correspond to t values (Hair, 1998). Figure 7 shows different plots that illustrate the patterns the disturbance terms could take if some of the assumptions are violated.
The null plot (Figure 7a) is the usual pattern when all the assumptions are met. "The null plot shows the residuals falling randomly, with relatively equal dispersion about zero and no strong tendency to be either greater or less than zero. Likewise, no pattern is found for large versus small values of the independent variable." (Hair, 1998, p. 173). By analyzing these plots the modeler can find violations of the assumptions and then find remedies for those violations. The plots show the typical patterns one should find when violations occur: for example, nonlinearity (b) in the relationship between the dependent and explanatory variables; heteroscedasticity of the variance (c) and (d); and autocorrelation (e). The normal histogram of the residuals (g) allows the modeler to test the assumption of normality of the error terms' distribution. A pattern like (f) would result when important events in the data are omitted in the specification of the response function (Hair, 1998) (for example, missing dummy variables that account for seasonality or special promotional events).
Figure 7. Graphical analysis of residuals. (From Hair, 1998).
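A quick way to produce this kind of diagnostic plot is sketched below. The data are hypothetical, and for brevity the residuals are simply standardized rather than fully studentized (which would also adjust for leverage):

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical residuals from a fitted model, constructed with a fan shape.
rng = np.random.default_rng(3)
A = np.sort(rng.uniform(10, 100, size=40))
residuals = rng.normal(0, 1 + 0.03 * A)

std_resid = residuals / residuals.std(ddof=1)  # standardize for comparability

plt.scatter(A, std_resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Advertising (A)")
plt.ylabel("Standardized residual")
plt.title("Widening spread suggests heteroscedasticity (cf. Figure 7c)")
plt.show()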
Plotting the residuals against the independent variables is quite useful; however, the prototypical patterns depicted in figure 7 are hard to detect in small samples, and sometimes in large samples as well. Some statistical tests have been developed to help the modeler find violations of the assumptions in a more systematic way. For example, the Durbin-Watson (D.W.) test allows the model builder to test for autocorrelation of the disturbance terms. The D.W. statistic varies between zero and four: small values indicate positive autocorrelation and large values negative autocorrelation (Leeflang et al, 2000). Durbin and Watson formulated lower and upper bounds (d_L, d_U) for various significance levels and for specific sample sizes and numbers of parameters. The test is used as follows (for details see Leeflang et al, 2000, p. 340):

For positive autocorrelation:
a. If D.W. < d_L, there is positive autocorrelation
b. If d_L < D.W. < d_U, the result is inconclusive
c. If D.W. > d_U, there is no positive autocorrelation

For negative autocorrelation:
d. If [4 − D.W.] < d_L, there is negative autocorrelation
e. If d_L < [4 − D.W.] < d_U, the result is inconclusive
f. If [4 − D.W.] > d_U, there is no negative autocorrelation

Other tests have been developed for detecting violations of other assumptions. The description of those tests is outside the scope of this paper; for a detailed description see Hanssens et al, 2001, chap. 5; and Leeflang et al, 2000, chap. 16.
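Computing the statistic itself is straightforward; a minimal sketch, using residuals simulated with positive first-order autocorrelation:

import numpy as np

def durbin_watson(residuals):
    # D.W. = sum((e_t - e_(t-1))^2) / sum(e_t^2); values near 2 indicate no
    # autocorrelation, toward 0 positive and toward 4 negative autocorrelation.
    e = np.asarray(residuals)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Residuals following an AR(1) scheme with rho = 0.7:
rng = np.random.default_rng(4)
e = np.zeros(100)
for t in range(1, 100):
    e[t] = 0.7 * e[t - 1] + rng.normal()
print(f"D.W. = {durbin_watson(e):.2f}")   # well below 2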
Leeflang et al (2000) developed a table (table 2 in this paper) that summarizes the violations of the assumptions in model building using Least Squares, as well as possible reasons, consequences, tests for detecting them and possible remedies.

Table 2. Violation of the assumptions about the disturbance term: reasons, consequences, tests and remedies. (From Leeflang et al, 2000, p. 332)
As table 2 shows, when violations of the assumptions are detected, by either plotting the residuals or applying specific tests, the modeler can try some remedies, often modifications to the specification of the model, or the use of another estimation technique that relaxes the violated assumption (see Estimation in this paper). As frequently mentioned by Leeflang et al (2000) and Hanssens et al (2001), violations of the assumptions are usually caused by specification errors, so the first thing a modeler should do if the results are not satisfactory is to try a different functional form (see Specification in this paper) or to modify the specification of the model under scrutiny. The process of model verification is clearly laid out in the following table (table 3), taken from Hanssens et al (2001).

Table 3. Steps in evaluating a regression model. (From Hanssens et al, 2001)
Prediction

Verification is just one part of the validation of a response model. The response function, in order to be believed, must be able to predict future sales or market share for the brand relative to the explanatory variables (Leckenby & Wedding, 1982). For example, if it is true that advertising expenditures can explain sales, a valid model should be able to predict the amount of sales in period x given a certain level of advertising expenditures in period x and, probably, previous periods. Because waiting for future sales data to test a model is not only risky but useless if we want to use the model to forecast or to decide on future marketing and advertising expenditure levels, a process called "postdiction" is used. Postdiction refers to the idea of predicting values that are already known. For example, a model is estimated using a sample that includes all the data from the past two years but not from this year, even though we already know this year's figures. Postdicting then consists of using the model to predict this year's sales given this year's marketing and advertising expenditures. If the accuracy of the predictions is good, the model is valid for future forecasting and can be used in different managerial decision-making tasks. The way a modeler can perform this validation process is to split the sample of data into two subsets: one for estimating the model and the other for validating it using the process described above. With large samples this can easily be done by just leaving a fair number of data points for validation purposes. However, the modeler usually does not have a lot of data, so a minimum of three data points are left for validating the model. When the model is estimated using cross-sectional data, the validation sub-sample is chosen randomly, but when the model is estimated using time-series data the last three
or more periods are reserved for the validation process. The reason for doing this is that the modeler would like to take into account the prediction accuracy when carryover effects are involved in the response function (Leeflang et al, 2000), and also because the manager would be more interested in the prediction accuracy of recent events than in that of distant ones. There are several measures of the prediction accuracy of a model (see Leeflang et al, 2000, chapter 18), but the basic principle is to compare the predicted values with the observed ones and calculate the average error of the predictions. The two most common measures are the Average Prediction Error (APE) and the Mean Absolute Percentage Error (MAPE). The Average Prediction Error is calculated by averaging the differences between the observed and the estimated values. The procedure allows negative and positive errors to offset each other (Leeflang et al, 2000). In accordance with the zero mean assumption in regression analysis (see Estimation in this paper), the APE should be close or equal to 0. However, even with an APE of 0 a model could still have large estimation errors if they offset each other. A better estimate of the prediction accuracy is the MAPE, since it allows the modeler to assess the error as a relative measure (a percentage) of the real or observed value. The MAPE is calculated by averaging the absolute percentage error (|y − ŷ| / y · 100) over each pair of predicted/observed data points in the validation sub-sample. It is important to notice that if data outside the range used to estimate the model are used to predict the outcome of the model, misleading results can occur. This is especially important when using "non-robust" models like the linear ones, where there is
no limit to the response of the dependent variable for larger values of the explanatory ones (Hair, 1998). The "postdiction" procedure described above is an adequate method for testing the validity of a model; however, "the acid test of the model's validity still remains with predictive tests into the future" (Leckenby & Wedding, 1982). If the model can fairly or acceptably predict sales figures that have not yet occurred, then the model is useful and can be used to solve marketing and advertising problems. A model should always and continually be checked for its prediction accuracy on future events as data become available.

Model building summary

Developing advertising and marketing response models is a fairly structured process with defined steps. However, model building is an iterative process in which the results of one step could suggest revising previous ones and starting the process again. The model building process also involves subjective judgments on the part of the modeler, as tradeoffs frequently arise and the solutions require judgment and personal experience. For example, a usual tradeoff the modeler faces is when, in order to enhance the prediction accuracy of a model, he must make important changes to its specification, making it harder to interpret and to extract meaningful economic insight from. As Hair (1998) said, "Prediction is often maximized at the expense of interpretation" (p. 161). The important role of the model builder in developing response functions is what makes the activity part science and part art. Summarizing the steps in model building for marketing decisions, a good model should be, first, specified in accordance with advertising or marketing theory; second,
estimated using an adequate estimation technique; third, verified using statistical significance tests and analysis of residuals to look for violations of the assumptions; and fourth, validated using postdiction and prediction accuracy tests. Sometimes a modeler has competing models that have been verified and validated, and he or she must decide which one to choose. The principle of parsimony would suggest always picking the simplest one. However, it is sometimes hard to find the optimal one, since there is always a tradeoff involved in selecting between a model that is simple but less accurate and one that is more precise but more complex. One should always evaluate the models with the original objective of the model building process in mind. Why were we building the model in the first place? What do we want to do with the model? What is the managerial relevance or usefulness of the model? If the answers to those questions still do not point toward one single model, there are additional procedures that can be used to solve the problem of selecting between competing models. There are informal decision rules, like "choose the model with the higher adjusted R²" or "choose the one that has the least residual sum of squares," and formalized decision rules involving hypothesis testing (Hanssens et al, 2001). The formal decision rules include the Maximum Likelihood (ML) statistic, Akaike's Information Criterion (AIC) and the Bayesian information criterion (for details on those tests see Hanssens et al, 2001, p. 230-239). Ideally, the model chosen should be the one with the highest adjusted R², the smallest RSS, statistically significant t values, no autocorrelation and the simplest structure. Fortunately, as Hanssens et al (2001) note: "the consequences in terms of deviation from the optimal level of discounted profits that arise from misspecifying market response is usually not great" (p. 239).
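The two informal measures, together with a common OLS form of Akaike's criterion, are easy to compute side by side for competing specifications. A minimal sketch (the formulas are the standard textbook ones; the function itself is illustrative):

import numpy as np

def comparison_stats(y, X):
    # OLS fit plus two of the measures used to compare competing models:
    # adjusted R² (penalizes extra regressors) and AIC (lower is better).
    n, k = X.shape                        # k counts the intercept column too
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ coef) ** 2)
    r2 = 1 - sse / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    aic = n * np.log(sse / n) + 2 * k     # a common OLS form of Akaike's criterion
    return adj_r2, aic

Candidate design matrices, one per competing specification, can then be ranked on these two numbers alongside the residual checks described earlier.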
Once a model has been specified, estimated, verified, validated and compared to other possible competing models, it can be used in decision making: planning future scenarios, running controlled simulations and deriving economic measures for better accountability of past actions. The latter is the essence of model building in marketing and advertising: the better we understand the past, the better we will predict the future.

An Example

In order to illustrate the process of developing marketing and advertising response models, real data from an important brand in the skin-care market of a Latin American country was used to build a model.

Specifying the model

After an initial exploration of the data, which included an analysis of the multiple correlations between several variables and preliminary estimations of very basic response functions, the following models were specified:

1. The Linear Current Effects response model:

$$MS = a + b_1 TVR + b_2 U + b_3 RP + b_4 T + b_5 C + b_6 M$$

Where:
MS = market share
TVR = TV GRPs
U = advertising expenditures for the umbrella brand
RP = relative price (brand's price / main competitor's price)
T = trend (linear trend over time)
C = total competitors' advertising expenditures
M = magazine advertising expenditures
2. The Modified Exponential Current Effects model:

$$MS = \overline{MS}\left(1 - e^{\,a + b_1 TVR + b_2 U + b_3 RP + b_4 T + b_5 C + b_6 M}\right)$$

Where:
$\overline{MS}$ = upper-bound level or saturation point
e = the mathematical constant, approximately 2.71828

3. The Gompertz Current Effects model:

$$MS = \overline{MS}\,e^{-e^{\,a + b_1 TVR + b_2 U + b_3 RP + b_4 T + b_5 C + b_6 M}}$$

4. The Linear Partial Adjustment (Nerlove) model:

$$MS_t = a + b_1 TVR_t + b_2 U_t + b_3 RP_t + b_4 T_t + b_5 C_t + b_6 M_t + b_7 MS_{t-1}$$

5. The Logistic Partial Adjustment (Nerlove) model:

$$MS_t = \frac{\overline{MS}}{1 + e^{-(a + b_1 TVR_t + b_2 U_t + b_3 RP_t + b_4 T_t + b_5 C_t + b_6 M_t + b_7 MS_{t-1})}}$$

6. The Gompertz Partial Adjustment (Nerlove) model:

$$MS_t = \overline{MS}\,e^{-e^{\,a + b_1 TVR_t + b_2 U_t + b_3 RP_t + b_4 T_t + b_5 C_t + b_6 M_t + b_7 MS_{t-1}}}$$

7. The Modified Exponential Partial Adjustment (Nerlove) model:

$$MS_t = \overline{MS}\left(1 - e^{\,a + b_1 TVR_t + b_2 U_t + b_3 RP_t + b_4 T_t + b_5 C_t + b_6 M_t + b_7 MS_{t-1}}\right)$$

8. The Linear Adstock model:

$$MS = a + b_1 Adstock + b_2 U + b_3 RP + b_4 T + b_5 C + b_6 M$$

Where:
Adstock = TV GRPs adstock
9. The Logistic Adstock model:

$$MS = \frac{\overline{MS}}{1 + e^{-(a + b_1 Adstock + b_2 U + b_3 RP + b_4 T + b_5 C + b_6 M)}}$$

10. The Gompertz Adstock model:

$$MS = \overline{MS}\,e^{-e^{\,a + b_1 Adstock + b_2 U + b_3 RP + b_4 T + b_5 C + b_6 M}}$$

11. The Modified Exponential Adstock model:

$$MS = \overline{MS}\left(1 - e^{\,a + b_1 Adstock + b_2 U + b_3 RP + b_4 T + b_5 C + b_6 M}\right)$$

All of the above models assume independent effects of the explanatory variables. For example, the Linear CE model (number 1) assumes that the brand's market share is a constant, plus the effect of TV GRPs, plus the effect of the advertising expenditures on the umbrella or family brand, plus the effect of the price relative to the main competitor, plus a trend in time, plus (minus) the effect of the sum of all competitors' advertising expenditures, plus the effect of the brand's magazine advertising expenditures. The assumption of independent effects means there are no interactions between the variables, for example between TV advertising and magazine advertising. This might not be true in reality; in consequence, some models that assumed such interactions were estimated, but they failed to deliver satisfactory results and no significant interactions were identified. Although models 2 through 11 are nonlinear in MS, each can be linearized and estimated by OLS, as sketched below.
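The following sketch (Python with the statsmodels package; not part of the original analysis) shows the transformations that make the modified exponential, Gompertz and logistic forms estimable by OLS once an upper bound $\overline{MS}$ is fixed. All series here are synthetic placeholders, and the upper bound of 15 is an assumption borrowed from the tables below for illustration only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 23                                            # the study's sample size
X = sm.add_constant(rng.uniform(size=(n, 6)))     # stand-ins for TVR, U, RP, T, C, M
ms = rng.uniform(3, 9, size=n)                    # synthetic market share (share points)
ms_bar = 15.0                                     # assumed saturation point

# Each curve is linearized so OLS can estimate a + b1*TVR + ... + b6*M:
targets = {
    "Linear":   ms,                               # model 1: no transform needed
    "ModExp":   np.log(1 - ms / ms_bar),          # model 2: ln(1 - MS/MS_bar)
    "Gompertz": np.log(-np.log(ms / ms_bar)),     # model 3: ln(-ln(MS/MS_bar))
    "Logistic": -np.log(ms_bar / ms - 1),         # logistic: -ln(MS_bar/MS - 1)
}
for name, y in targets.items():
    fit = sm.OLS(y, X).fit()
    print(name, round(fit.rsquared, 3))
```

The Partial Adjustment and Adstock variants use the same transformations; they differ only in adding the lagged share MS(t-1) or replacing TVR with its adstock as a regressor.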
It is important to notice that a trend was incorporated into the model in order to gain more predictive power. However, as the saying goes, "a trend in a model is a factor you forgot to include in the explanatory consideration set." Considering that usually not all the data are available, adding a trend component is a partial solution to the lack of information, and it sometimes helps enhance the model's fit and its prediction accuracy. However, as discussed earlier, there is usually a trade-off between prediction accuracy and explanatory power. Trend components should be avoided if they bring no important improvement in the model's capacity to make fair estimates of the observed data. Knowing when to include or exclude a trend is part of the art of modeling.

Other models were also specified but were discarded early in the process because they failed to fairly represent the relationship between advertising and market share for the brand. For example, the univariate Koyck Geometric Distributed Lag (GL) model:

$$MS_t = a(1-c) + b\,TVR_t + c\,MS_{t-1} + \{u_t - c\,u_{t-1}\}$$

and the univariate Geometric Lag Autoregressive (GLA) model:

$$MS_t = a + b_1 TVR_t - b_1\rho\,TVR_{t-1} + (c + \rho)MS_{t-1} - c\rho\,MS_{t-2} + \{u_t - c\,u_{t-1}\}$$

failed to deliver satisfactory results. This occurred mainly because they use only one explanatory variable which, on its own, seems to contribute little to explaining the market share variance for this particular brand.

Estimating the model

Once specified, the models above (numbers 1 to 11) were estimated using Ordinary Least Squares; a sketch of the procedure follows. Table 4 shows the parameter estimates for the Current Effects functions and their derived statistics. Tables 5 and 6 show the same information for the Nerlove Partial Adjustment models and the Adstock models.
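As a concrete illustration of the estimation step, this minimal sketch builds a geometric adstock series with the 33% carry-over noted in the tables below and fits the Linear Adstock model (number 8) by OLS. All data here are synthetic placeholders; only the sample size of 23 and the carry-over rate are taken from the study.

```python
import numpy as np
import statsmodels.api as sm

def adstock(grps, carryover=0.33):
    """Geometric adstock: each period keeps `carryover` of the
    previous period's accumulated advertising pressure."""
    a = np.zeros(len(grps))
    a[0] = grps[0]
    for t in range(1, len(grps)):
        a[t] = grps[t] + carryover * a[t - 1]
    return a

rng = np.random.default_rng(1)
tvr = rng.uniform(0, 400, size=23)       # synthetic TV GRPs
others = rng.uniform(size=(23, 5))       # stand-ins for U, RP, T, C, M
ms = rng.uniform(3, 9, size=23)          # synthetic market share

# Linear Adstock model, estimated by Ordinary Least Squares:
X = sm.add_constant(np.column_stack([adstock(tvr), others]))
fit = sm.OLS(ms, X).fit()
print(fit.params)     # a, b1, ..., b6
print(fit.tvalues)    # t statistics, as reported in Table 6
```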
Table 4. Current Effects models' statistics and parameter estimates

1. Linear
   a  =  4.000368   (t =  4.62 *)
   b1 =  0.001629   (t =  4.45 *)
   b2 =  0.000112   (t =  6.08 *)
   b3 =  0.036254   (t =  4.12 *)
   b4 =  0.075848   (t =  6.08 *)
   b5 = -0.000011   (t = -2.76 *)
   b6 =  0.000232   (t =  1.11)
   R² = 93%   Adj. R² = 91%   DW*** = 2.629   RSS = 1.54

2. Modified Exponential****
   a  = -0.149235   (t = -1.10)
   b1 = -0.000258   (t = -4.51 *)
   b2 = -0.000017   (t = -5.90 *)
   b3 = -0.005595   (t = -4.08 *)
   b4 = -0.010991   (t = -5.65 *)
   b5 =  0.000001   (t =  1.98 *)
   b6 = -0.000018   (t = -0.55)
   R² = 92%   Adj. R² = 89%   DW = 2.655   RSS = 1.76

3. Gompertz*****
   a  =  0.469480   (t =  1.48)
   b1 = -0.000587   (t = -4.40 *)
   b2 = -0.000038   (t = -5.70 *)
   b3 = -0.012564   (t = -3.91 *)
   b4 = -0.023992   (t = -5.27 *)
   b5 =  0.000003   (t =  1.71 **)
   b6 = -0.000030   (t = -0.40)
   R² = 91%   Adj. R² = 88%   DW = 2.688   RSS = 1.90

Sample size = 23. Coefficients are unstandardized.
*p < .05   **p < .15
***DW < .90: significant autocorrelation; .90 < DW < 1.92: inconclusive; DW > 1.92: no significant autocorrelation at .05.
****Upper bound = 15   *****Upper bound = 12
Table 5. Partial Adjustment models' statistics and parameter estimates

1. Linear
   a  =  3.729706   (t =  3.88 *)
   b1 =  0.001686   (t =  4.43 *)
   b2 =  0.000100   (t =  4.08 *)
   b3 =  0.031444   (t =  2.80 *)
   b4 =  0.068501   (t =  4.17 *)
   b5 = -0.000011   (t = -2.71 *)
   b6 =  0.000243   (t =  1.14)
   b7 =  0.095134   (t =  0.70)
   R² = 94%   Adj. R² = 91%   DW*** = 2.667   RSS = 1.49

2. Modified Exponential****
   a  = -0.126388   (t = -0.83)
   b1 = -0.000262   (t = -4.37 *)
   b2 = -0.000016   (t = -4.11 *)
   b3 = -0.005189   (t = -2.92 *)
   b4 = -0.010371   (t = -4.00 *)
   b5 =  0.000001   (t =  1.92 *)
   b6 = -0.000019   (t = -0.56)
   b7 = -0.008031   (t = -0.38)
   R² = 92%   Adj. R² = 89%   DW = 2.654   RSS = 1.74

3. Logistic****
   a  = -1.020810   (t = -3.86 *)
   b1 =  0.000461   (t =  4.40 *)
   b2 =  0.000028   (t =  4.09 *)
   b3 =  0.008621   (t =  2.78 *)
   b4 =  0.018575   (t =  4.11 *)
   b5 = -0.000003   (t = -2.63 *)
   b6 =  0.000064   (t =  1.09)
   b7 =  0.024295   (t =  0.65)
   R² = 93%   Adj. R² = 90%   DW = 2.668   RSS = 1.52

4. Gompertz****
   a  =  0.405843   (t =  2.01 *)
   b1 = -0.000353   (t = -4.41 *)
   b2 = -0.000021   (t = -4.11 *)
   b3 = -0.006763   (t = -2.86 *)
   b4 = -0.014045   (t = -4.07 *)
   b5 =  0.000002   (t =  2.30 *)
   b6 = -0.000038   (t = -0.84)
   b7 = -0.015138   (t = -0.53)
   R² = 93%   Adj. R² = 90%   DW = 2.667   RSS = 1.59

Sample size = 23. Coefficients are unstandardized.
*p < .05   **p < .15
***DW < .90: significant autocorrelation; .90 < DW < 1.92: inconclusive; DW > 1.92: no significant autocorrelation at .05.
****Upper bound = 15
Table 6. Adstock models' statistics and parameter estimates

1. Linear
   a  =  4.069391   (t =  4.89 *)
   b1 =  0.002388   (t =  4.72 *)
   b2 =  0.000082   (t =  4.67 *)
   b3 =  0.036416   (t =  4.28 *)
   b4 =  0.067203   (t =  5.67 *)
   b5 = -0.000013   (t = -3.39 *)
   b6 =  0.000325   (t =  1.71 **)
   R² = 94%   Adj. R² = 91%   DW*** = 2.223   RSS = 1.44

2. Logistic****
   a  = -0.933690   (t = -4.10 *)
   b1 =  0.000657   (t =  4.75 *)
   b2 =  0.000022   (t =  4.67 *)
   b3 =  0.009899   (t =  4.26 *)
   b4 =  0.018084   (t =  5.58 *)
   b5 = -0.000003   (t = -3.31 *)
   b6 =  0.000086   (t =  1.66 **)
   R² = 94%   Adj. R² = 91%   DW = 2.218   RSS = 1.47

3. Gompertz*****
   a  =  0.347017   (t =  1.98 *)
   b1 = -0.000501   (t = -4.71 *)
   b2 = -0.000017   (t = -4.54 *)
   b3 = -0.007556   (t = -4.22 *)
   b4 = -0.013388   (t = -5.37 *)
   b5 =  0.000002   (t =  2.95 *)
   b6 = -0.000056   (t = -1.40)
   R² = 93%   Adj. R² = 91%   DW = 2.161   RSS = 1.60

4. Modified Exponential****
   a  = -0.162484   (t = -1.23)
   b1 = -0.000372   (t = -4.62 *)
   b2 = -0.000012   (t = -4.38 *)
   b3 = -0.005608   (t = -4.14 *)
   b4 = -0.009619   (t = -5.10 *)
   b5 =  0.000002   (t =  2.53 *)
   b6 = -0.000034   (t = -1.11)
   R² = 92%   Adj. R² = 90%   DW = 2.094   RSS = 1.82

Sample size = 23. Coefficients are unstandardized. Adstock half-life = 1 period; carry-over = 33%.
*p < .05   **p < .15
***DW < .90: significant autocorrelation; .90 < DW < 1.92: inconclusive; DW > 1.92: no significant autocorrelation at .05.
****Upper bound = 15   *****Upper bound = 12
Verifying the model

As Tables 4, 5 and 6 show, the R² of every model is above 90%, and the adjusted R² is at least 88%. That means they all explain at least 90% of the variance of the brand's market share over the period analyzed. The adjusted R² is the more useful figure for comparing the CE and Adstock models with the PA models, since the Partial Adjustment models include an additional lag parameter and the R² is sensitive to the number of variables in the model.

Another way of measuring the ability of the models to fairly represent the relationship between the explanatory variables and the dependent one is to analyze each model's fit to the data in the sample. The Residual Sum of Squares (RSS) delivers a direct measure of the "unfitness" of the model. The estimated models show small RSS values, varying from 1.44 to 1.90. The Durbin-Watson statistic shows that none of the estimated models exhibits significant autocorrelation. This is especially important if we want the estimation process to deliver unbiased and statistically significant parameter estimates.

As discussed under the verification section in the first chapter, the estimation process should deliver statistically significant parameter estimates so that the modeler can project the model beyond the data sample. In other words, the parameter estimates should have values different from zero, meaning that their associated variables have a real effect on the dependent variable. The estimated models vary on this criterion, since not all of them have statistically significant estimates for every parameter: only the Adstock Linear and Adstock Logistic models have a statistically significant b6 coefficient. Interestingly, the parameter corresponding to the lagged variable (b7) in the Nerlove PA models is not statistically significant in any of them, which means that these PA models actually reduce to the Current Effects ones, since the only difference between them is the additional lagged variable.
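These verification statistics are readily reproduced in code. The sketch below reuses the same synthetic placeholders as the estimation sketch above (none of the study's actual data appear here) and prints the t values, fit measures and Durbin-Watson statistic:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Same synthetic placeholders as in the estimation sketch above:
rng = np.random.default_rng(1)
X = sm.add_constant(rng.uniform(size=(23, 6)))
ms = rng.uniform(3, 9, size=23)
fit = sm.OLS(ms, X).fit()

print(fit.tvalues)                                 # t statistics
print(fit.pvalues < 0.05)                          # which coefficients are significant
print("R2:", round(fit.rsquared, 3), "adj. R2:", round(fit.rsquared_adj, 3))
print("RSS:", round(fit.ssr, 3))                   # residual sum of squares
print("DW:", round(durbin_watson(fit.resid), 3))   # values near 2: no autocorrelation
```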
Another very interesting result is that the coefficient for the relative price is positive. Since the variable was defined as the ratio of the brand's price to the main competitor's price, it is surprising to find that, at least for the data analyzed, the higher the ratio the higher the market share, all else being equal. Because the parameter's sign is consistent across all models, it should not be discarded. There are situations in which raising the price actually raises the demand for a product, because price acts as a cue signaling good quality. This phenomenon has been detected for many specialty products, including beauty products (Kotler, 1971). The brand competes in the "wrinkle prevention" market, a highly specialized category driven mainly by research and product innovation, so it is not unlikely that this is one of those special cases in which the relation between price and demand is reversed. The brand used to have a lower price than its main competitor, but it seems that the closer the brand's price gets to its main competitor's, the higher the demand for the brand. This result should be taken with caution and would probably apply only within the data range analyzed (min = 51.5; max = 101.7; mean = 76.13; std. deviation = 11.43).

In order to check for violations of the OLS assumptions, residual scatter plots of the best four models (the CE Linear, PA Linear, Adstock Linear and Adstock Logistic models) were analyzed. Figure 8 shows the scatter plots of the studentized residuals vs. the actual market share values for the four models. No systematic pattern is observed for any of the models, indicating that no fundamental assumption was violated. However, some outliers can be recognized,
especially two for the adstock models. The treatment of outliers is controversial (Hair, 1998), but a careful analysis should be provided in order to assess their impact on the overall performance of the model. We will discuss this later.

Figure 8. Scatter plots of the studentized residuals vs. the actual market share values for the CE Linear model, PA Linear model, Adstock Linear model and Adstock Logistic model.
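A diagnostic plot along the lines of Figure 8 can be produced as follows. This is a sketch on the same synthetic placeholders as before, and the ±2 bands are a common rule of thumb for flagging outliers, not a threshold used in the original study:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Same synthetic placeholders as in the estimation sketch above:
rng = np.random.default_rng(1)
X = sm.add_constant(rng.uniform(size=(23, 6)))
ms = rng.uniform(3, 9, size=23)
fit = sm.OLS(ms, X).fit()

stud = fit.get_influence().resid_studentized_internal
plt.scatter(ms, stud)                     # look for systematic patterns
plt.axhline(0, linestyle="--")
plt.axhline(2, color="r")                 # rough outlier bounds
plt.axhline(-2, color="r")
plt.xlabel("Actual market share")
plt.ylabel("Studentized residual")
plt.show()
```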
Ideally, the best model should have all statistically significant coefficients, no autocorrelation, the highest R² or adjusted R², and the lowest RSS. However, all of these criteria cannot always be found in one single model, as is the case with the Adstock Linear model in our example. Additionally, the best model is not really identified until the acid test is performed, so before selecting a single model the best candidates should be validated using the prediction/postdiction procedure.

Validating the model

The best competing models (the CE Linear, PA Linear, Adstock Linear and Adstock Logistic models) were selected to be validated using a subset of the sample. The data were split into two subsets: the first 20 observations, used to re-estimate the parameters of each model, and the last 3, to be predicted/postdicted by the model. The Mean Absolute Percentage Error (MAPE) was used to compare the predictive ability of the models; a sketch of the procedure is shown below. Table 7 shows the results and all the statistics for the selected models.

All the models have MAPEs below 3.5%, which means they can all make accurate predictions of future outcomes. However, the Adstock models clearly outperform the CE and PA linear models. The principle of parsimony suggests choosing the simplest of two competing models, and the MAPE criterion, like all the other criteria, points to the Adstock Linear model as the winner. Figure 9 shows the modeled market share versus the actual market share for the Adstock Linear model.
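A minimal sketch of this holdout procedure, again on the synthetic placeholders used throughout these examples rather than the study's data:

```python
import numpy as np
import statsmodels.api as sm

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return np.mean(np.abs((actual - predicted) / actual)) * 100

# Same synthetic placeholders as in the estimation sketch above:
rng = np.random.default_rng(1)
X = sm.add_constant(rng.uniform(size=(23, 6)))
ms = rng.uniform(3, 9, size=23)

# Re-estimate on the first 20 periods, predict the held-out last 3:
fit = sm.OLS(ms[:20], X[:20]).fit()
print(f"MAPE: {mape(ms[20:], fit.predict(X[20:])):.2f}%")
```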
Table 7. Best models comparison

1. CE Linear
   a  =  4.000368   (t =  4.62 *)
   b1 =  0.001629   (t =  4.45 *)
   b2 =  0.000112   (t =  6.08 *)
   b3 =  0.036254   (t =  4.12 *)
   b4 =  0.075848   (t =  6.08 *)
   b5 = -0.000011   (t = -2.76 *)
   b6 =  0.000232   (t =  1.11)
   R² = 93%   Adj. R² = 91%   DW*** = 2.629   RSS = 1.54   MAPE = 2.99%

2. PA Linear
   a  =  3.729706   (t =  3.88 *)
   b1 =  0.001686   (t =  4.43 *)
   b2 =  0.000100   (t =  4.08 *)
   b3 =  0.031444   (t =  2.80 *)
   b4 =  0.068501   (t =  4.17 *)
   b5 = -0.000011   (t = -2.71 *)
   b6 =  0.000243   (t =  1.14)
   b7 =  0.095134   (t =  0.70)
   R² = 94%   Adj. R² = 91%   DW = 2.667   RSS = 1.49   MAPE = 3.01%

3. Adstock Linear
   a  =  4.069391   (t =  4.89 *)
   b1 =  0.002388   (t =  4.72 *)
   b2 =  0.000082   (t =  4.67 *)
   b3 =  0.036416   (t =  4.28 *)
   b4 =  0.067203   (t =  5.67 *)
   b5 = -0.000013   (t = -3.39 *)
   b6 =  0.000325   (t =  1.71 **)
   R² = 94%   Adj. R² = 91%   DW = 2.223   RSS = 1.44   MAPE = 1.65%

4. Adstock Logistic****
   a  = -0.933690   (t = -4.10 *)
   b1 =  0.000657   (t =  4.75 *)
   b2 =  0.000022   (t =  4.67 *)
   b3 =  0.009899   (t =  4.26 *)
   b4 =  0.018084   (t =  5.58 *)
   b5 = -0.000003   (t = -3.31 *)
   b6 =  0.000086   (t =  1.66 **)
   R² = 94%   Adj. R² = 91%   DW = 2.218   RSS = 1.47   MAPE = 1.71%

Sample size = 23 (the MAPE was calculated using parameter estimates from a 20-observation subsample). Coefficients are unstandardized. Adstock half-life = 1 period; carry-over = 33%.
*p < .05   **p < .15
***DW < .90: significant autocorrelation; .90 < DW < 1.92: inconclusive; DW > 1.92: no significant autocorrelation at .05.
****Upper bound = 15