Deriving Value from Consumer Networks




                         Shawndra Hill
                       University of Pennsylvania


                         Supernova 2008
                          June 17, 2008


Joint work with: Bob Bell, Deepak Agarwal, Foster Provost, Chris Volinsky
Communication Networks


–   Nodes represent transactors
–   Edges are explicit transactions




                                                    2
How can firms use data on explicit
         consumer networks to improve
              consumer rankings?
For example, in order to rank customers by
 likelihood of …

Response to a target marketing offer
Fraud
Donating to a cause
Spreading information about a product
…
                                             3
Consumer Networks


Email
Web purchases
Call detail logs
Blogs
Discussion forums
Online auctions
Recommender sites
Networking portals

Dependencies
  – Nodes are interdependent
Scale
  – Tens or hundreds of millions of nodes and edges
Dynamic
  – Large numbers of nodes coming and going continuously
Business problem:
           Target consumers for new
           product

•   Large telecommunications company
•   Product: new telecom service
•   Large direct marketing campaign
•   Long experience with targeted marketing
•   Sophisticated segmentation models based
    on data and intuition
    e.g., regarding the types of customers known or
      thought to have affinity for this type of service 5
The Data

   The firm determined 21 segments (SEGMENT ID 1-21) by a
     combination of customer characteristics:

       Loyalty (L):       Existing Customer, Prior spending, Current plan, Frequent switch
       Geography (G):     State, Zip, Urban, Cable Region
       Demographics (D):  Age, Gender, Children, Head of Household
       Other (O):         Type of Mailer, Internet Type

Separately, we assessed >150 potential attributes from these categories.
What’s new?
        Directed Network-based Marketing

Store millions of inbound/outbound communications a day to/from existing customers

Constructed representation of consumer network over prior 6 months

[Figure: existing customers, non-customers, and “Network Neighbor” targets]

 Can this additional data improve customer
 ranking significantly?
What’s new?
       Directed Network-based Marketing

Store millions of inbound/outbound communications a day to/from existing customers

Constructed representation of consumer network over prior 6 months

SEGMENT ID: 1-21, plus segment 22 (marked “important”)
Results


Relative Take Rates for Marketing Segments

Segment          Relative take rate   Take rate
Non-NN 1-21      1                    0.28%
NN 1-21          4.82                 1.35%
NN 22            2.96                 0.83%
Non-Target NN    0.4                  0.11%
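The relative figures in the chart are each segment's take rate divided by the baseline take rate of the non-network-neighbor segments 1-21. A minimal sketch of that arithmetic follows; the dictionary and function names are illustrative and not part of the original study.

```python
# Absolute take rates (percent of targets responding), as read from the chart.
take_rate_pct = {
    "Non-NN 1-21": 0.28,
    "NN 1-21": 1.35,
    "NN 22": 0.83,
    "Non-Target NN": 0.11,
}


def relative_take_rates(rates, baseline):
    """Divide every segment's take rate by the baseline segment's rate."""
    base = rates[baseline]
    return {segment: rate / base for segment, rate in rates.items()}


relative = relative_take_rates(take_rate_pct, "Non-NN 1-21")
for segment, ratio in relative.items():
    print(f"{segment:15s} {ratio:.2f}")
```

Because the published percentages are rounded, the recomputed ratios agree with the chart only to about two decimal places.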
More Sophisticated Local
                 Network-based Attributes?

Attribute                             Description
Degree                                Number of unique customers communicated with before the mailer
# Transactions                        Number of transactions to/from customers before the mailer
Seconds of communication              Number of seconds communicated with customers before mailer
Connected to influencer?              Is an influencer in your local neighborhood?
Connected component size              Size of the connected component target belongs to.
Similarity (structural equivalence)   Max overlap in local neighborhood with existing customer
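The first three attributes in the table can be derived directly from call records. The sketch below assumes a hypothetical record layout of (caller, callee, seconds) tuples and a set of existing customer IDs; neither the layout nor the function name comes from the original study.

```python
from collections import defaultdict


def local_attributes(calls, customers):
    """Compute degree, transaction count and seconds for each non-customer.

    calls: iterable of (caller, callee, seconds) records before the mailer.
    customers: set of IDs of existing customers.
    Only communication between a non-customer and a customer is counted.
    """
    neighbors = defaultdict(set)
    transactions = defaultdict(int)
    seconds = defaultdict(int)
    for caller, callee, secs in calls:
        # Look at the call from both ends, since edges are inbound or outbound.
        for target, other in ((caller, callee), (callee, caller)):
            if other in customers and target not in customers:
                neighbors[target].add(other)
                transactions[target] += 1
                seconds[target] += secs
    return {
        t: {
            "degree": len(neighbors[t]),
            "transactions": transactions[t],
            "seconds": seconds[t],
        }
        for t in neighbors
    }


# Hypothetical example data.
calls = [("t1", "c1", 60), ("c1", "t1", 30), ("t1", "c2", 10), ("t2", "x", 5)]
customers = {"c1", "c2"}
attrs = local_attributes(calls, customers)
print(attrs)
```

Targets with no customer contact (such as `t2` here) simply do not appear in the result, which corresponds to not being a network neighbor.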
More sophisticated Network attributes? For example collective inference

Relational classifier
     – WvRN

\[ p(y_i = c \mid N_i) = \frac{1}{Z} \sum_{v_j \in N_i} w_{i,j} \cdot p(y_j = c \mid N_j) \]
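The weighted-vote estimate can be applied iteratively so that unlabeled nodes borrow information from neighbors that are themselves unlabeled. The following is a minimal sketch, assuming Z is the sum of the neighbor weights, that known labels stay fixed, and that all nodes are updated simultaneously from the previous round; the function name, the prior of 0.5, and the toy graph are illustrative choices rather than the authors' implementation.

```python
def wvrn(edges, known, prior=0.5, n_iter=50):
    """Iterate the weighted-vote update p(y_i = c | N_i).

    edges: dict node -> {neighbor: weight}; every neighbor must also be a key.
    known: dict node -> fixed probability of class c (e.g. 1.0 for customers).
    Returns a dict node -> estimated probability of class c.
    """
    p = {v: known.get(v, prior) for v in edges}
    for _ in range(n_iter):
        new_p = {}
        for v, nbrs in edges.items():
            if v in known:
                new_p[v] = known[v]  # labeled nodes are never re-estimated
                continue
            z = sum(nbrs.values())  # normalizer Z (assumed: total edge weight)
            new_p[v] = sum(w * p[u] for u, w in nbrs.items()) / z if z > 0 else prior
        p = new_p
    return p


# Toy chain a - b - c - d: a is a known adopter, d a known non-adopter.
edges = {
    "a": {"b": 3.0},
    "b": {"a": 3.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
known = {"a": 1.0, "d": 0.0}
p = wvrn(edges, known)
print(p)
```

In this toy graph the heavy edge to the known adopter pulls `b` toward class c, and `c`, two hops from `a`, inherits a weaker score through `b`.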
Contributions

Consumers that have already interacted with an existing
 customer adopt a product (e.g., respond to a direct
 mailer) at a higher rate than those that have not.

Variables constructed from the consumer’s immediate
 network enable the firm to classify and rank targets
 better, and so to generate more profit.

Global network attributes can be used to help rank
  consumers two hops away from existing customers.

Our ability to improve consumer ranking translated into
 significant profit to the firm.
Overview: Our Objective


  Design a generic definition,
representation, and approximation for
dynamic graphs that can be used for
problems where looking at entities through
time is of interest.

– What is the graph at time t: Gt
– How does one account for addition and
  attrition of nodes
Business problem:
          Repetitive Subscription Fraud

•   Large telecommunications company
•   telecom service
•   Long experience with fraud detection
•   Sophisticated models based on record
    linkage



                                           16
Motivating Example: Repetitive Fraud
    Lots of people can't pay their bill, but they want phone
      service anyway:
Name           Ted Hanley             Name        Debra Handley


Address        14 Pearl Dr            Address     14 Pearl Dr
               St Peters, MN                      St Peters, MN
Balance        $208.00                Balance     $142.00
Disconnected   2/19/04 (nonpayment)   Connected   2/22/04

Name           Elizabeth Harmon       Name        Elizabeth Harmon

Address        APT 1045               Address     180 N 40TH PL
               4301 ST JOHN RD                    APT 40
               SCOTTSDALE, AZ                     PHOENIX, AZ
Balance        $149.00                Balance     $72.00
Disconnected   2/19/04 (nonpayment)   Connected   1/31/04
                                                                     17
Motivating Example: Repetitive Fraud
 How can we identify that it is the same person behind both accounts?

            Field      Old                     New
            Account    67855232344             4215554597
            Date       2003-02-25              2003-02-13
            Name       DAVID ATKINS            DAVID WATKINS
            Address    10 NIGHT WAY APT 114    10 HATSWORTH DR
            City       FAYVILLE                BONDALE
            State      AL                      AL
            Zip        302141798               300021530
            II Code    5512127609901           5312074639501
            Balance    284.62                  5.83
Motivating Example: Challenges
• This is a problem of record linkage and
  graph matching, but because of obfuscation,
  we can only count on entity matching.
• But the number of potential matches is huge…
    – Connect pool: 10 K/day, 300 K/month
    – Restrict pool: 5 K/day, 150 K/month
    – 45 billion comparisons (300 K × 150 K)
• If we have an efficient representation of
  entities, we might be able to make a dent….
Our Approach: Defining Dynamic Graphs

We adopt an Exponentially Weighted Moving Average (EWMA):
\[ G_t = \theta G_{t-1} \oplus (1 - \theta)\, g_t \]
              i.e. today’s graph is defined recursively as a convex
              combination of yesterday’s graph and today’s data

            • Advantages:
                - recent data has most influence
                - only one most recent graph need be stored

We also use two types of approximation of the graph, by pruning:
Global pruning of edges – overall threshold (ε) below which edges are
removed from the graph
Local pruning of edges – designate a maximal in and out degree (k) for
each entity, and assign an overflow bin
Our Approach: Defining Dynamic
                 Graphs
    Selecting θ
θ closer to 1
• calls decay slower
• more historical data included
• smoother


θ closer to 0
• faster decay
• recent calls count more
• more power to detect changes
• less smooth



                                       21
Applying our Method

• Results:

   –   We identify 50-100 of these cases per day
   –   95% match rate
   –   85% block rate
   –   Credited with saving telecom millions of dollars

   – By far the most reliable matching criterion is the entity-based
     matching

   – Optimized parameter set outperforms both current process
     and current theta and optimized k

   *We also demonstrate our method on email and clickstream
     data
Other applications,
                     conclusions…
•   Our three-parameter representation of a dynamic graph is a powerful, flexible, and
    efficient way of analyzing problems where looking at entities through time is of
    interest.

•   Can be applied to any problem where entity modeling over time is of interest
         • Other fraud: Guilt by association
         • Email
         • Web pages
         • Social Networks
         • Terrorism
         • Viral Marketing

•   What class of problems is this good for? After all, there is no model!!!
•   Further work
     – More complex entities
     – Distance Functions
     – More flexible, adaptive parameter setting

                                                                                         23
Want more? Deriving Value
        from Consumer Networks

1. Network-based Marketing: Identifying Likely
   Adopters via Consumer Networks
     Shawndra Hill, F. Provost, C. Volinsky, Network-based Marketing: Identifying
     Likely Adopters via Consumer Networks, Statistical Science, Vol. 21, No. 2, pp.
     256-276


2. Collective Inference in Consumer Networks
     Shawndra Hill, F. Provost, C. Volinsky, Collective Inference in Consumer
     Networks, to be submitted to Marketing Science March 2007.


3. Building an Effective Representation for
    Dynamic Networks
     Shawndra Hill, D. Agarwal, R. Bell, C. Volinsky, Building an Effective
     Representation for Dynamic Networks, Journal of Computational & Graphical
     Statistics, Vol. 15, No. 3, pp. 584-608
Fraud Revisited: Applying our methods
•   Results:
    – We identify 50-100 of these cases per day
    – 95% match rate
    – 85% block rate
    – Credited with saving large telecom $5 million / year
    – By far the most reliable matching criterion is the entity-based matching
Other applications,
                 conclusions…
• Our three-parameter representation of a dynamic
  graph is a powerful, flexible, and efficient way of
  analyzing problems where looking at entities through
  time is of interest.
• Can be applied to any problem where entity modeling
  over time is of interest
      •   Other fraud: Guilt by association
      •   Language models
      •   Email
      •   Web pages
      •   Social Networks
      •   Terrorism
      •   Viral Marketing
                                                         26
Matching Algorithm

• What cases will we present to the reps?
• A combination of:
  – COI Overlap measures
     • At least two, and strength determined by uniqueness
       of overlap TNs
  – Name/address overlap
     • Edit distance no more than 50% of the longest name
       or address
  – $$ owed
     • Most interested in the ones that will generate the most
       $$
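The name/address rule can be sketched with a standard Levenshtein edit distance. This sketch assumes that "50% of the longest name or address" means half the length of the longer of the two strings being compared; that reading, and the function names, are assumptions rather than the deployed matching logic.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = cur
    return prev[-1]


def fields_overlap(old, new, max_ratio=0.5):
    """True if the edit distance is at most max_ratio of the longer string."""
    longest = max(len(old), len(new))
    if longest == 0:
        return True
    return levenshtein(old, new) <= max_ratio * longest


# Names from the motivating example: one inserted letter separates them.
print(levenshtein("DAVID ATKINS", "DAVID WATKINS"))
print(fields_overlap("DAVID ATKINS", "DAVID WATKINS"))
```

A threshold relative to string length keeps short names from matching on one or two shared characters while still tolerating small deliberate misspellings.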
Motivating Example: Repetitive
               Fraud
• When we catch a fraudster, we rarely catch the
  person, we simply shut down the line

• They will likely move on to another attempt at
  defrauding us, from a different network location

• Idea: record linkage - network identity has changed,
  but network behavior is the same

• We can use network behavior to indicate that the new
  line has the same “owner” as an old line
COI Signatures to COI
• To construct a COI from a COI signature:
  – Often the signature contains things we don’t
    want:
     • Businesses
     • High weight nodes
  – Often the signature doesn’t contain things we
    do want:
     • Local calls
     • Other carrier calls
• To combat this, create a COI by:
  – Recursively expanding the COI signature
here’s an example…
[Figures: the neighborhood around “me” and “other” nodes at four stages: COI signature, Extended COI, Enhanced COI, Pruned COI]
A likely case of the same fraudster showing up as a new number

[Figure: two COIs; pink nodes exist in both COI]
Fraud Revisited: Applying our
         methods
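• Calculate the “informative overlap” score:
\[ \mathrm{overlap}(a, b) = \sum_{o \,\in\, \mathrm{overlap}} \frac{w_{ao}\, w_{ob}}{w_o} \cdot \frac{1}{d_{ao}\, d_{ob}} \]

Where:
  w_ao = weight of edge from a to o
  w_ob = weight of edge from o to b
  w_o = sum weight of edges to o
  d_ao, d_ob are the graph distances from a and b to o

[Figure: nodes A, B, Z and shared node O, with edge weights w_ao, w_ob and total weight w_o]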
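The overlap score rewards shared nodes that are strongly tied to both accounts, close to both, and not widely connected. A minimal sketch follows, assuming each COI is held as a dict from node to edge weight and that the total weights and graph distances are supplied separately; all names and numbers are hypothetical.

```python
def informative_overlap(coi_a, coi_b, total_weight, dist_a, dist_b):
    """Sum over shared nodes o of (w_ao * w_ob / w_o) / (d_ao * d_ob).

    coi_a, coi_b: dict node -> edge weight between the account and that node.
    total_weight: dict node -> w_o, the sum of weights of all edges to o.
    dist_a, dist_b: dict node -> graph distance from a (or b) to o.
    """
    score = 0.0
    for o in coi_a.keys() & coi_b.keys():
        strength = coi_a[o] * coi_b[o] / total_weight[o]
        score += strength / (dist_a[o] * dist_b[o])
    return score


# Hypothetical old account a and new account b sharing one neighbor, o1.
coi_a = {"o1": 10.0, "o2": 4.0}
coi_b = {"o1": 5.0, "o3": 2.0}
total_weight = {"o1": 100.0, "o2": 50.0, "o3": 20.0}
dist_a = {"o1": 1, "o2": 1}
dist_b = {"o1": 1, "o3": 1}
score = informative_overlap(coi_a, coi_b, total_weight, dist_a, dist_b)
print(score)

# The same shared node contributes far less if it is a high-weight "supernode".
busy = dict(total_weight, o1=10000.0)
print(informative_overlap(coi_a, coi_b, busy, dist_a, dist_b))
```

Dividing by w_o is what makes the overlap "informative": a number that everyone calls says little about whether two accounts share an owner.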
Outline
• Defining a dynamic graph, and our
  objectives
• A motivating example: Repetitive
  fraud in telecommunications
• Our approach: representation and
  approximation of dynamic graphs
• Parameter setting and applications to
  other domains
• Fraud revisited – applying our methods
Defining a Dynamic Graph, and
         Our Objectives

                                37
Defining Dynamic Graphs


• Dynamic Graphs represent
  transactional data –
    – Telecommunications network traffic
    – Web connectivity data
    – Web logs
    – Credit card data
    – Online auction data

[Figure: example graph with nodes Corinna, Chris, Daryl, Anne, Jen, Debby, Kathleen, Fred, Zach, John]

     Transactional data can be represented
Defining Dynamic Graphs
  • Dynamic Graphs
     – Nodes represent transactors
     – Edges are directed transactions
     – All edges have a time stamp
     – All edges have a weight (?)
     – May contain
         • Other attributes on nodes (avg bill, calling plan)
         • Other attributes on edges (wireless, intl)

[Figure: example graph with nodes Corinna, Chris, Daryl, Anne, Jen, Debby, Kathleen, Fred, Zach, John]
Analysis of dynamic graphs

           Why is it hard?
• What do we want to know?
  – Clusters, social and behavioral patterns,
    fraud…



• Two main challenges:
  – Large Scale
    • Often tens or hundreds of millions of nodes
A motivating example: Repetitive
   fraud in telecommunications

                                   41
Motivating Example: Our data

• Our graph is large….
     • 350M Telephone numbers (TNs) currently
      active on our Long Distance network, 300M
      calls/day
• ….dynamic….
     • 4 Million TNs appear per week
     • 4 Million TNs disappear per week
Motivating Example: Our data
…and sparse:
For one year of long distance data:

[Figure: distribution plot annotated with Median = 34 and 95% = 171]
• Our Approach to Dynamic
  Graphs
 –Definition of the graph
 –Representation as atomic
Our Approach: Defining
             dynamic graphs
We adopt an Exponentially Weighted Moving Average (EWMA):
\[ G_t = \theta G_{t-1} \oplus (1 - \theta)\, g_t \]
         i.e. today’s graph is defined recursively as a convex
         combination of yesterday’s graph and today’s data

  Alternatively, this is:
\[ G_t = \omega_1 g_1 \oplus \omega_2 g_2 \oplus \cdots \oplus \omega_t g_t = \bigoplus_{i=1}^{t} \omega_i g_i, \quad \text{where } \omega_i = \theta^{t-i} (1 - \theta) \]

    Through time, edge weights decay with decay rate θ
       • Advantages:
           - recent data has most influence
           - only one most recent graph need be stored
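The recursive definition and the weighted-sum form give the same edge weight when the graph starts empty. The sketch below checks this for a single edge, treating ⊕ as ordinary addition of that edge's weight; the daily values and θ = 0.9 are made-up illustration values.

```python
theta = 0.9
# Hypothetical daily weights g_1 ... g_t observed on one edge.
daily = [20.0, 0.0, 5.0, 12.0, 0.0, 3.0]

# Recursive form: G_t = theta * G_{t-1} + (1 - theta) * g_t, with G_0 = 0.
G = 0.0
for g in daily:
    G = theta * G + (1 - theta) * g

# Closed form: G_t = sum_i omega_i * g_i with omega_i = theta^(t-i) * (1 - theta).
t = len(daily)
closed = sum(theta ** (t - i) * (1 - theta) * g for i, g in enumerate(daily, start=1))

print(G, closed)
```

Only the running value `G` has to be kept between days, which is the storage advantage noted on the slide.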
Our Approach: Defining dynamic graphs
• Q: for transactional data, what does the graph at time t (G_t) mean?
   - let g_t be the collection of nodes and edges during the time period t

   • We could use: \( G_t = g_t \)
        Too narrow!
   • We could use the union of all time periods:
        \[ G_t = g_1 \oplus g_2 \oplus \cdots \oplus g_t = \bigoplus_{i=1}^{t} g_i \]
        Too broad!
   • We could use a moving average of the most recent time periods:
        \[ G_t = g_{t-n} \oplus g_{t-n+1} \oplus \cdots \oplus g_t = \bigoplus_{i=t-n}^{t} g_i \]
        Too many!
Our Approach: Defining dynamic
                graphs
   Selecting θ
θ closer to 1
• calls decay slower
• more historical data included
• smoother

θ closer to 0
• faster decay
• recent calls count more
• more power to detect changes
• less smooth



     θ = 1 − 1/n means weight reduces to about 1/e times its original weight in n days
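As a quick numeric check of the decay rule, the sketch below assumes the intended relation is θ = 1 − 1/n, so that an edge weight multiplied by θ once a day shrinks to roughly 1/e of its value after n days. The function name is illustrative.

```python
import math


def theta_for_efold(n_days):
    """Decay rate for which a weight falls to about 1/e after n_days updates."""
    return 1.0 - 1.0 / n_days


for n in (7, 30, 90):
    theta = theta_for_efold(n)
    remaining = theta ** n  # fraction of the original weight left after n days
    print(f"n={n:3d}  theta={theta:.4f}  theta^n={remaining:.4f}  1/e={1 / math.e:.4f}")
```

The approximation tightens as n grows, since (1 − 1/n)^n converges to 1/e.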
Our Approach: Representation
• Because we are interested in entities, and
  to facilitate efficient storage, we represent
  the entire graph as a union of entity graphs.

• These are our atomic units of analysis, a
  signature of the node’s behavior.

• Storing hundreds of millions of small
  graphs is much more efficient than storing
  one massive graph, especially in an indexed
  database.

  Example signature:
    2222222222   100.3
    1111111111    90.1
    3213232423    27.0
    9098765453    11.3
    8876457326     5.4
    2122121212     3.0
    9908989898     0.9
    8887878787     0.1
Our Approach: Representation
 Update the graph by updating all of the atomic units daily –
  so any time we access the data we have the most recent
  representation.

   Yesterday’s graph          Today’s data            Today’s graph

  2222222222   100.3         1111111111   20.0       1111111111   92.1
  1111111111    90.1    +    2122121212   10.0   =   2222222222   90.3
  3213232423    27.0         9991119999    5.0       3213232423   24.3
  9098765453    11.3                                 9098765453   10.1
  8876457326     5.4                                 8876457326    4.9
  2122121212     3.0                                 2122121212    3.7
  9908989898     0.9                                 9991119999    0.5
  8887878787     0.1                                 3990898989    0.8
                                                     8887878787    0.09
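The daily update of one atomic unit can be sketched as a dictionary merge. The value θ = 0.9 used here is an assumption inferred from the decayed weights in the example (for instance 27.0 becoming 24.3); it is not stated on the slide, and not every printed value follows from it (for 1111111111 this sketch gives 83.09 where the slide shows 92.1).

```python
def ewma_update(yesterday, today, theta):
    """Return theta * yesterday (+) (1 - theta) * today for one signature.

    yesterday, today: dict of telephone number -> edge weight.
    Numbers missing from either side are treated as weight 0.
    """
    keys = yesterday.keys() | today.keys()
    return {
        k: theta * yesterday.get(k, 0.0) + (1 - theta) * today.get(k, 0.0)
        for k in keys
    }


yesterday_graph = {
    "2222222222": 100.3, "1111111111": 90.1, "3213232423": 27.0,
    "9098765453": 11.3, "8876457326": 5.4, "2122121212": 3.0,
    "9908989898": 0.9, "8887878787": 0.1,
}
today_data = {"1111111111": 20.0, "2122121212": 10.0, "9991119999": 5.0}

today_graph = ewma_update(yesterday_graph, today_data, theta=0.9)
for number, weight in sorted(today_graph.items(), key=lambda kv: -kv[1]):
    print(number, round(weight, 2))
```

Numbers not called today simply decay, numbers called today are reinforced, and a number seen for the first time enters with a small weight.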
Our Approach: Approximation
• We also use two types of approximation of
  the graph, by pruning.
  – Global pruning of edges – overall threshold (ε)
    below which edges are removed from the
    graph
  – Local pruning of edges – designate a maximal
    degree (k) for each entity



                                                 50
Our Approach: Approximation


  1111111111   92.1          1111111111   92.1
  2222222222   90.3          2222222222   90.3
  3213232423   24.3          3213232423   24.3
  9098765453   10.1     =    9098765453   10.1
  8876457326    4.9          8876457326    4.9
  2122121212    3.7          2122121212    3.7
  9991119999    0.5          Other         1.4
  3990898989    0.8
  8887878787    0.09

Removes stale edges
Reduces effect of supernodes
Increases efficiency
Preserves entity weight
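Both kinds of pruning can be sketched on a single signature. In the example above the six heaviest edges survive, so k = 6 is assumed here; the ε value and the function names are likewise illustrative rather than taken from the deployed system.

```python
def prune_global(signature, eps):
    """Global pruning: drop every edge whose weight is below eps."""
    return {k: w for k, w in signature.items() if w >= eps}


def prune_top_k(signature, k, overflow="Other"):
    """Local pruning: keep the k heaviest edges, pool the rest in an overflow bin."""
    ranked = sorted(signature.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[:k])
    spill = sum(w for _, w in ranked[k:])
    if spill > 0:
        kept[overflow] = kept.get(overflow, 0.0) + spill
    return kept


today_graph = {
    "1111111111": 92.1, "2222222222": 90.3, "3213232423": 24.3,
    "9098765453": 10.1, "8876457326": 4.9, "2122121212": 3.7,
    "9991119999": 0.5, "3990898989": 0.8, "8887878787": 0.09,
}

pruned = prune_top_k(today_graph, k=6)
print(pruned)
print(prune_global(today_graph, eps=0.1))
```

Pooling the dropped weight in the overflow bin keeps the signature's total weight unchanged, which is the "preserves entity weight" property; global pruning, by contrast, discards the weight outright.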
Our Approach: Approximation
• Defending k
  – Most entities have the vast majority of their
    weight in a fraction of their nodes




                                                    52
Our Approach: Parameter Setting
• Let A and B be two entities.
• Weighted Dice:
\[ WD(A, B) = \frac{\sum_{j \in A \cap B} \left( p_A(j) + p_B(j) \right)}{1 + \sum_{j} p_A(j)} \]
• Hellinger Distance:
\[ HD(A, B) = \sum_{j \in A \cap B} \sqrt{p_A(j)\, p_B(j)} \]
Viral Marketing


“Word-of-Mouth”?




                     55
Research Questions


How could a firm use the consumer network to
  improve target marketing (network targeting)?


Do consumers who have already interacted with
  someone on the existing customer network respond
  to a direct mailer at a higher rate than those that do
  not?

Can variables constructed from the network enable the
  firm to better classify targets?


Does collective inference help us to improve target
  marketing?
                                                      56
Outline of Talk


Experimental Setup
Directed network marketing
Local Network
Collective Network

[Thumbnail chart: relative take rates for Non-Viral 1-21 (1), Viral 1-21 (4.98), Viral 22 (3.87), Non-Target Viral (0.4)]
Motivation
Consumer vs. Consumer “Network”




   Consumer                   Consumer “Network”
    –   No link structure       –   Link structure
                                –   Additional consumer information
                                –   Proxy for homophily
                                                              58
Motivation
Consumer vs. Consumer “Network”
[Figure: Relational Database; Weighted Directed Graph (nodes 1-10); Relational Vectors]
   Consumer                   Consumer “Network”
    –   No link structure       –   Link structure
                                –   Additional Information
                                –   Proxy for homophily
                                                                    59
Analyzing Consumer Networks

                 Why is it hard?
Scale
  – Tens or hundreds of millions of nodes and edges
  – Entire network can’t fit in main memory
Dynamic
  – Large numbers of nodes coming and going
    continuously
  – Accounting for temporal component of changing
    graphs is a challenge
Dependencies
  – Nodes are heterogeneous
  – Nodes are interdependent
                                                      60
What is Viral Marketing?


Explicit advocacy
  – Word-of-Mouth


Implicit advocacy
  – Hotmail


Network targeting
  – My study
                           61
Viral Marketing Research




[Figure: research fields: Economics, Marketing, Info Sys, Statistics, Sociology, Epidemiology, CS]
Viral Marketing Research


Fields: Economics, Marketing, Info Sys, Statistics, Sociology, Epidemiology, CS

• Diffusion
• Customer Value
• Consumer Preferences
Viral Marketing Research
    The Ideal Dataset?

                                 in   dep
• Diffusion
• Customer Value
• Consumer Preferences

Fields: Economics, Marketing, Info Sys, Statistics, Sociology, Epidemiology, CS
Evidence of Viral Marketing?


We need explicit links as inputs and
 adoption response as the
 dependent variable

… Our Testbed is closer to the Ideal
 than any other published study!

  Remember wiretapping is illegal!   65
Viral Marketing Data: Call Detail

Internet telephony service
Millions of calls a day
We observe calls to and from existing customers

[Figure: existing customers and viral targets]
Viral Marketing Data: Response to Mailer

Two months after mailer, calculated how many targets responded
                                                                                                      67
Do consumers who have already interacted with someone on the existing
customer network respond to a direct mailer at a higher rate than those
that do not?

Model Variables
Dependent Variable: Response to direct mailer RES
   – If response is positive, RES = 1.
   – If negative, RES = 0.
Independent Variables: Segment, traditional marketing attribute, viral
   attribute
    – Segment 1-21
    – Loyalty, Demographics, Geographics
    – Binary Viral Attribute

Models
   – Odds Ratio
   – ANOVA
   – Analysis of Deviance Table
   – Classification with Logistic regression evaluated by Area under
     the ROC curve
                                                                     68
Do consumers who have already interacted with someone on the existing
customer network respond to a direct mailer at a higher rate than those
that do not?

Model Variables
Dependent Variable: Response to direct mailer RES
   – If response is positive, RES = 1.
   – If negative, RES = 0.
Independent Variables: Segment, traditional marketing attribute, viral
   attribute
    – Segment 1-21
    – Loyalty, Demographics, Geographics
    – Binary Viral Attribute
                                                                     69
Do consumers who have already interacted with someone on the existing
customer network respond to a direct mailer at a higher rate than those
that do not?

Analysis of Deviance: The table confirms the significance of the main
effects and of the interactions.

Variable                         Deviance   DF   Change Deviance   sig
Intercept                        11200
Segment                          10869      9    63                **
Segment + Cell                   10733      1    370               **
Segment + Cell + Interactions    10687      8    41                **

Each level of the nested model is significant when using a chi-squared
approximation for the differences of the deviances.

The fact that so many interactions are significant demonstrates that the
viral effect is stronger for some segments of the prospect population
than for others.
                                                                                                                                                                          70
Does collective inference help to improve target marketing?

Experiment Setup

Dependent Variable: Response to direct mailer RES
    – If response is positive, RES = 1
    – If negative, RES = 0
    – RES over two month time period after mailer

Independent Variables: Segment, traditional marketing attributes,
    viral attribute
     – Segment 1-21
     – Loyalty, demographics, geographics
     – Binary viral attribute
     – Local network attributes
     – Collective inference prediction

Sample: Subset of viral targets
                                                                     71
Does collective inference help to improve target marketing?

Model

Guilt-by-association: weighted-vote RN Classifier (wvRN)

$\eta = \beta_0 + \beta_1 (L) + \beta_2 (G) + \beta_3 (D) + \beta_4 (O) + \beta_5 (N_B) + \beta_6 (N_L) + \beta_7 (N_C)$

$\mathrm{RESP} = \exp(\eta) / (1 + \exp(\eta))$
                                                                                                                                                                                                             72
Relational classifiers

Relational classifiers for case study
  – wvRN
       $p(y_i = c \mid N_i) = \frac{1}{Z} \sum_{v_j \in N_i} w_{i,j} \cdot p(y_j = c \mid N_j)$
  – nBC
     • Naïve Bayes on neighbor class labels
     • Markov Random Field, following Chakrabarti et al. (1998)
          – when uncertainty in neighbor labels
          – some minor modifications
  – nLB
     • following Lu & Getoor’s (2003) Link-based Classifier
     • for a node i, form its neighbor-class vector CV(i)
     • logistic regression based on CV(i)
  – cdRN
     • for each class c, cdRN estimates the neighbor-class distribution
       RV(c)
     • $p(y_i = c \mid N_i)$ is the normalized distance between CV(i) and
       RV(c)
                                                                     73
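The wvRN estimate above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the graph layout (a dict of neighbor-to-weight dicts), the function name `wvrn_estimate`, the uniform fallback for isolated nodes, and the toy weights are all assumptions made for the example. The normalizer Z is computed as the sum of the weighted votes over all classes.

```python
def wvrn_estimate(node, graph, probs, classes):
    """Weighted-vote relational neighbor estimate for a single node.

    graph: {node: {neighbor: edge_weight}}
    probs: {neighbor: {class_label: probability}}, current neighbor estimates
    Returns {class_label: probability} for `node`.
    """
    scores = {c: 0.0 for c in classes}
    for neighbor, weight in graph[node].items():
        for c in classes:
            # Each neighbor votes for class c in proportion to the edge weight.
            scores[c] += weight * probs[neighbor][c]
    z = sum(scores.values())  # normalizer Z
    if z == 0:
        # Assumption: with no informative neighbors, return a uniform estimate.
        return {c: 1.0 / len(classes) for c in classes}
    return {c: s / z for c, s in scores.items()}


# Toy example with made-up weights: "a" talks to "b" three times as much as to "c".
graph = {"a": {"b": 3.0, "c": 1.0}}
probs = {"b": {1: 1.0, 0: 0.0}, "c": {1: 0.0, 0: 1.0}}
estimate = wvrn_estimate("a", graph, probs, classes=[0, 1])
```

In the toy example the heavier edge to the responding neighbor dominates, so the estimate for class 1 is three quarters.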
Collective inference

– iterative classification (following Lu & Getoor, 2003)
   • initially assign a “prior” to all nodes using local classifier:
     $p^{(0)}(y_i = c)$
   • select ordering O
   • walk down chain, classifying with MAP classification
   • final class labels selected upon convergence or 1000 iterations

– relaxation labeling (following Chakrabarti et al., 1998)
   • initially assign a “prior” to all nodes using local classifier:
     $p^{(0)}(y_i = c)$
   • estimate $p^{(t)}(y_i = c)$ using relational classifier based on
     $p^{(t-1)}$

– Gibbs sampling (following Geman & Geman, 1984)
   • select ordering O on nodes, randomly
   • initially sample labels based on priors
                                                                     74
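The relaxation labeling steps above can be sketched as follows for a binary response, using a weighted neighbor average as the relational classifier. This is a simplified illustration under stated assumptions: the function name `relaxation_labeling`, the fixed iteration count, the plain synchronous update, and the toy graph are hypothetical and are not taken from the talk or from the cited papers.

```python
def relaxation_labeling(graph, priors, known, iterations=50):
    """Estimate p(y = 1) for every node by repeated weighted neighbor averaging.

    graph:  {node: {neighbor: edge_weight}}, every node appears as a key
    priors: {node: p0} initial estimates for nodes with unknown labels
    known:  {node: 0 or 1} observed labels, held fixed throughout
    """
    p = dict(priors)
    p.update({node: float(label) for node, label in known.items()})
    for _ in range(iterations):
        new_p = {}
        for node, neighbors in graph.items():
            total = sum(neighbors.values())
            if node in known or total == 0:
                # Known labels and isolated nodes keep their current value.
                new_p[node] = p[node]
            else:
                new_p[node] = sum(w * p[j] for j, w in neighbors.items()) / total
        p = new_p  # synchronous update: step t uses only the estimates from step t-1
    return p


# Toy chain a - b - c with made-up weights: a responded, c did not, b is unknown.
graph = {"a": {"b": 3.0}, "b": {"a": 3.0, "c": 1.0}, "c": {"b": 1.0}}
result = relaxation_labeling(graph, priors={"b": 0.5}, known={"a": 1, "c": 0})
```

Keeping the previous estimates frozen during each pass is what distinguishes this from iterative classification, which updates nodes one at a time along an ordering.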
Overview of Contributions


Question 1 – This is the first evidence
 that viral marketing exists in explicit
 consumer networks
Question 2 – We show that constructed
 consumer network attributes improve
 over traditional target marketing
 methods
Question 3 – This is the first time
 collective inference has been used in a
 real-world target marketing problem
                                      75
Essay 1: Results




                   76
Prior Results


Model

Odds:
    $\mathrm{Odds} = \frac{p}{1 - p}$   (range on the odds scale: $0 \ldots \infty$)

Odds Ratio: ratio of odds (focus: risk indicator, covariate):
    odds of responding to the mailer in the network neighbor target
    group / odds in the non-network neighbor target group

    The odds ratio measures the ‘belief’ in a given outcome in two
    different populations or under two different conditions. If the
    odds ratio is one, the two populations or conditions are similar.
                                                                    77
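As a small worked illustration of the odds and odds ratio defined above, the sketch below computes both from two response rates. The rates are hypothetical values chosen only to show the arithmetic; they are not results from the study, and the function names are illustrative.

```python
def odds(p):
    """Odds of an event with probability p (0 <= p < 1)."""
    return p / (1.0 - p)


def odds_ratio(p_group, p_reference):
    """Ratio of the odds in one group to the odds in a reference group."""
    return odds(p_group) / odds(p_reference)


# Hypothetical response rates, for illustration only.
nn_rate = 0.02        # network neighbor targets
non_nn_rate = 0.005   # non-network neighbor targets
ratio = odds_ratio(nn_rate, non_nn_rate)
```

For small response rates the odds ratio is close to the simple ratio of the rates, which is why it is a convenient summary for direct-mail response data.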
Prior Results

[Figure: cumulative gains chart. x-axis: Cumulative % of Consumers Targeted (Ranked by Predicted Sales); y-axis: Cumulative % of Sales; series: All, "All + NN"]
                                                                   78
Network-based Marketing


Experiment Setup
Dependent Variable: Response to direct mailer RES
   – If response is positive, RES = 1
   – If negative, RES = 0
   – RES over two month time period after mailer

Independent Variables: Segment, traditional marketing attributes, viral
   attribute
    – Segment 1-21
    – Loyalty, demographics, geographics
    – Binary NN attribute

Sample: All targets                                                       79
Network-based Marketing


 Model

 Logistic Regression: across all segments, including viral attributes.

 $\eta = \beta_0 + \beta_1 (L) + \beta_2 (G) + \beta_3 (D) + \beta_4 (O) + \beta_5 (N_B)$

 $\mathrm{RESP} = \exp(\eta) / (1 + \exp(\eta))$

                                                                                           80
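The logistic model on the previous slide maps a linear predictor to a response probability. A minimal sketch of that mapping is shown below; the coefficient values are made up for illustration (no fitted coefficients appear in the slides), and the function name and the dict-based layout are assumptions.

```python
import math


def response_probability(coefficients, features):
    """Logistic response: exp(eta) / (1 + exp(eta)) for a linear predictor eta.

    coefficients: {"intercept": b0, feature_name: b_k, ...}
    features:     {feature_name: value, ...}
    """
    eta = coefficients["intercept"] + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return math.exp(eta) / (1.0 + math.exp(eta))


# Hypothetical coefficients: a baseline plus a binary network neighbor flag N_B.
coefficients = {"intercept": -4.0, "N_B": 1.4}
p_without_link = response_probability(coefficients, {"N_B": 0})
p_with_link = response_probability(coefficients, {"N_B": 1})
```

With a positive coefficient on the binary attribute, the predicted response probability is higher for targets flagged as network neighbors, which is the pattern the model is meant to test.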
Prior Results




                81
More Sophisticated Local Network-based Attributes?

 Experiment Setup
Dependent Variable: Response to direct mailer RES
    – If response is positive, RES = 1
    – If negative, RES = 0
    – RES over two month time period after mailer

Independent Variables: Segment, traditional marketing attributes, viral attribute
     – Segment 1-21
     – Loyalty, demographics, geographics
     – Binary viral attribute
     – Local network attributes


Sample: All NN targets
                                                                                    82
Local: Network Neighbor Attributes

    Model

Logistic Regression: across all segments, including viral attribute, local network
    attributes

$\eta = \beta_0 + \beta_1 (L) + \beta_2 (G) + \beta_3 (D) + \beta_4 (O) + \beta_5 (N_B) + \beta_6 (N_L)$

$\mathrm{RESP} = \exp(\eta) / (1 + \exp(\eta))$




                                                                                                       83
Ranking of “NN” targets


[Figure: cumulative gains chart. x-axis: Cumulative % of Consumers Targeted (Ranked by Predicted Sales); y-axis: Cumulative % of Sales; series: All, "All + net"]
                                                                  84
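A ranking chart of this kind (cumulative share of sales against cumulative share of consumers targeted, ranked by predicted score) can be computed as in the sketch below. The function name `cumulative_gains` and the toy scores and outcomes are hypothetical; the sketch assumes at least one sale in the data so that the total is non-zero.

```python
def cumulative_gains(scores, sales):
    """Points (fraction of consumers targeted, fraction of sales captured).

    scores: predicted score per consumer (higher means targeted earlier)
    sales:  observed outcome per consumer (e.g. 1 for a sale, 0 otherwise)
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = sum(sales)  # assumed to be greater than zero
    points = [(0.0, 0.0)]
    running = 0.0
    for rank, i in enumerate(order, start=1):
        running += sales[i]
        points.append((rank / len(order), running / total))
    return points


# Toy data: the two highest-scored consumers are the two buyers.
points = cumulative_gains(scores=[0.9, 0.1, 0.8, 0.3], sales=[1, 0, 1, 0])
```

A model that ranks well climbs toward 100% of sales after targeting only a small share of consumers, so comparing two such curves compares two ranking models.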
Results: The bottom line



  Hypothetical (future) profit improvement:

  targeted        5,000,000
  cost            0.2
  total cost      1,000,000
  resp 1-21       0.30%
  viral resp.     1.30%
  viral hyp       4.40%
  6-mo. profit    179.94

                  base profit      viral profit      hypothetical profit
  profit          $1,699,100.00    $10,696,100.00    $38,586,800.00
  improvement?                     $8,997,000.00     $36,887,700.00




                                                                                                               85
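The profit figures on this slide are consistent with a simple formula: targeted consumers times response rate times 6-month profit per responder, minus total mailing cost. That formula is inferred from the numbers rather than stated on the slide, and the function and variable names below are illustrative. The sketch reproduces the arithmetic.

```python
def campaign_profit(targeted, cost_per_target, response_rate, profit_per_response):
    """Campaign profit = revenue from responders minus total mailing cost."""
    revenue = targeted * response_rate * profit_per_response
    total_cost = targeted * cost_per_target
    return revenue - total_cost


targeted = 5_000_000
cost = 0.2                # cost per target, giving a total cost of 1,000,000
six_month_profit = 179.94

base = campaign_profit(targeted, cost, 0.003, six_month_profit)          # resp 1-21
viral = campaign_profit(targeted, cost, 0.013, six_month_profit)         # viral resp.
hypothetical = campaign_profit(targeted, cost, 0.044, six_month_profit)  # viral hyp

improvement_viral = viral - base
improvement_hypothetical = hypothetical - base
```

Because the mailing cost is the same in all three scenarios, the improvement comes entirely from the higher response rate among the targets.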
Contributions


 Results

Directed network-based marketing

   Consumers who have already interacted with an existing customer adopt a product (e.g., respond
      to a direct mailer) at a higher rate than those who have not.

   Variables constructed from the consumer’s immediate network enable the firm to classify and rank
      targets, and to generate profit, more effectively.




                                                                                                   86
Even more Sophisticated Network-based Attributes?


Can we use collective inference to make
simultaneous inferences about nodes on the
graph?
  – What about the massive size of the network?




                                         87
Our Approach: Parameter Setting
• We have now defined a representation of a dynamic
  graph by three parameters:

   θ − controls the decay of edges and edge weights
   ε − global pruning parameter
   k – local pruning parameter

• For a given application, we choose the parameter
  values by optimizing predictive performance,
  selecting the parameters which optimize a distance
  metric

  – Two distance metrics we apply:

     • Weighted Dice
     • Hellinger Distance

     … But may be domain dependent                     88
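Of the two metrics named above, the Hellinger distance has a standard definition and can be sketched directly. The sketch assumes that a node's representation is a dict of edge weights that is first normalized into a probability distribution; that normalization step, the function name, and the toy data are assumptions made for illustration, not details from the talk.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two non-negative weight profiles.

    p, q: {neighbor: weight}; each is normalized to sum to 1 before comparison.
    Returns a value in [0, 1]: 0 for identical profiles, 1 for disjoint ones.
    """
    total_p = sum(p.values())
    total_q = sum(q.values())
    keys = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(k, 0.0) / total_p) - math.sqrt(q.get(k, 0.0) / total_q)) ** 2
        for k in keys
    )
    return math.sqrt(squared / 2.0)


# Toy profiles: the same node's edge weights in two periods (made-up numbers).
last_month = {"x": 5.0, "y": 3.0, "z": 2.0}
this_month = {"x": 4.0, "y": 4.0, "w": 2.0}
distance = hellinger(last_month, this_month)
```

Under this reading, parameter values would be chosen so that a node's representation in one period stays close, by such a distance, to what is observed in the next.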
Our Approach: Parameter Setting
Defaults:
   θ = 1, controls the decay of edges and edge weights
   ε = 0, global pruning parameter
   k = ∞, local pruning parameter




                                                                  89
Our Approach: Summary
•   Entities are updated daily for all 350 million phone numbers

•   Up-to-date representation of all entities. These entities are kept in
    an indexed database for easy storage and retrieval

•   Our two main challenges:
     – Scale: updates the entities on a daily basis, don’t have to
       retrieve it. Entities are concise summaries, and are indexed for
       fast retrieval

     – Dynamic nature of data: entities are a summary of behavior
       over a time period (determined by θ) and can be tracked through
       time
                                                                         90

Weitere ähnliche Inhalte

Ähnlich wie Hill Supernova 2008

Customer centric network management
Customer centric network  management Customer centric network  management
Customer centric network management Rafael Junquera
 
Driving Customer Engagement Through Multichannel Marketing
Driving Customer Engagement Through Multichannel MarketingDriving Customer Engagement Through Multichannel Marketing
Driving Customer Engagement Through Multichannel MarketingTim Suther
 
Credit Suisse Presentation
Credit Suisse PresentationCredit Suisse Presentation
Credit Suisse Presentationfinance48
 
Credit Suisse Presentation
Credit Suisse PresentationCredit Suisse Presentation
Credit Suisse Presentationfinance48
 
DecisionPoint Investor Presentation-January 2011
DecisionPoint Investor Presentation-January 2011DecisionPoint Investor Presentation-January 2011
DecisionPoint Investor Presentation-January 2011EHodges
 
Worldwide Business Research
Worldwide Business ResearchWorldwide Business Research
Worldwide Business Researchwbr_marketing
 
Managing brands in digital and social channels
Managing brands in digital and social channelsManaging brands in digital and social channels
Managing brands in digital and social channelsGewoon Groen
 
Deals & Mobile: the Race for Hyper-Local
Deals & Mobile: the Race for Hyper-LocalDeals & Mobile: the Race for Hyper-Local
Deals & Mobile: the Race for Hyper-LocalJoshua Engroff
 
Seamless Receipts Investor Presentation January 2011
Seamless Receipts Investor Presentation January 2011Seamless Receipts Investor Presentation January 2011
Seamless Receipts Investor Presentation January 2011Keith Cowing
 
Fiber Optic Project Assessment
Fiber Optic Project AssessmentFiber Optic Project Assessment
Fiber Optic Project Assessmentquirozlf
 
LBG market insights march 2012
LBG market insights march 2012LBG market insights march 2012
LBG market insights march 2012Claire Calmejane
 
Digiday Mobile with Tapad: The New Imperative: Connecting with Consumers Cros...
Digiday Mobile with Tapad: The New Imperative: Connecting with Consumers Cros...Digiday Mobile with Tapad: The New Imperative: Connecting with Consumers Cros...
Digiday Mobile with Tapad: The New Imperative: Connecting with Consumers Cros...Digiday
 
Marketing Analytics Effectiveness
Marketing Analytics Effectiveness Marketing Analytics Effectiveness
Marketing Analytics Effectiveness IBM
 
Driving brand advocacy for telcos
Driving brand advocacy for telcosDriving brand advocacy for telcos
Driving brand advocacy for telcosA Meili
 
Bin3 Open Source BI, overhyped or undervalued?
Bin3 Open Source BI, overhyped or undervalued?Bin3 Open Source BI, overhyped or undervalued?
Bin3 Open Source BI, overhyped or undervalued?Jos van Dongen
 
Onlinet Case Study-Raiffeisen Bank
Onlinet Case Study-Raiffeisen BankOnlinet Case Study-Raiffeisen Bank
Onlinet Case Study-Raiffeisen Banknickpenev
 
Onlinet Case Study Raiffeisen Bank Eng
Onlinet Case Study   Raiffeisen Bank  EngOnlinet Case Study   Raiffeisen Bank  Eng
Onlinet Case Study Raiffeisen Bank Engradu_postolache
 
Thương mại di động M-Commerce
Thương mại di động M-CommerceThương mại di động M-Commerce
Thương mại di động M-CommerceCat Van Khoi
 

Ähnlich wie Hill Supernova 2008 (20)

Customer centric network management
Customer centric network  management Customer centric network  management
Customer centric network management
 
Driving Customer Engagement Through Multichannel Marketing
Driving Customer Engagement Through Multichannel MarketingDriving Customer Engagement Through Multichannel Marketing
Driving Customer Engagement Through Multichannel Marketing
 
Credit Suisse Presentation
Credit Suisse PresentationCredit Suisse Presentation
Credit Suisse Presentation
 
Credit Suisse Presentation
Credit Suisse PresentationCredit Suisse Presentation
Credit Suisse Presentation
 
DecisionPoint Investor Presentation-January 2011
DecisionPoint Investor Presentation-January 2011DecisionPoint Investor Presentation-January 2011
DecisionPoint Investor Presentation-January 2011
 
Worldwide Business Research
Worldwide Business ResearchWorldwide Business Research
Worldwide Business Research
 
IPMA Multi-channel Webinar
IPMA Multi-channel WebinarIPMA Multi-channel Webinar
IPMA Multi-channel Webinar
 
Managing brands in digital and social channels
Managing brands in digital and social channelsManaging brands in digital and social channels
Managing brands in digital and social channels
 
Deals & Mobile: the Race for Hyper-Local
Deals & Mobile: the Race for Hyper-LocalDeals & Mobile: the Race for Hyper-Local
Deals & Mobile: the Race for Hyper-Local
 
Seamless Receipts Investor Presentation January 2011
Seamless Receipts Investor Presentation January 2011Seamless Receipts Investor Presentation January 2011
Seamless Receipts Investor Presentation January 2011
 
IS Analysis for GENI.org
IS Analysis for GENI.orgIS Analysis for GENI.org
IS Analysis for GENI.org
 
Fiber Optic Project Assessment
Fiber Optic Project AssessmentFiber Optic Project Assessment
Fiber Optic Project Assessment
 
LBG market insights march 2012
LBG market insights march 2012LBG market insights march 2012
LBG market insights march 2012
 
Digiday Mobile with Tapad: The New Imperative: Connecting with Consumers Cros...
Digiday Mobile with Tapad: The New Imperative: Connecting with Consumers Cros...Digiday Mobile with Tapad: The New Imperative: Connecting with Consumers Cros...
Digiday Mobile with Tapad: The New Imperative: Connecting with Consumers Cros...
 
Marketing Analytics Effectiveness
Marketing Analytics Effectiveness Marketing Analytics Effectiveness
Marketing Analytics Effectiveness
 
Driving brand advocacy for telcos
Driving brand advocacy for telcosDriving brand advocacy for telcos
Driving brand advocacy for telcos
 
Bin3 Open Source BI, overhyped or undervalued?
Bin3 Open Source BI, overhyped or undervalued?Bin3 Open Source BI, overhyped or undervalued?
Bin3 Open Source BI, overhyped or undervalued?
 
Onlinet Case Study-Raiffeisen Bank
Onlinet Case Study-Raiffeisen BankOnlinet Case Study-Raiffeisen Bank
Onlinet Case Study-Raiffeisen Bank
 
Onlinet Case Study Raiffeisen Bank Eng
Onlinet Case Study   Raiffeisen Bank  EngOnlinet Case Study   Raiffeisen Bank  Eng
Onlinet Case Study Raiffeisen Bank Eng
 
Thương mại di động M-Commerce
Thương mại di động M-CommerceThương mại di động M-Commerce
Thương mại di động M-Commerce
 

Mehr von TerrorNova Guild

Supernova 2009: Stop Stalking Me: How To Engage Youth
Supernova 2009: Stop Stalking Me: How To Engage YouthSupernova 2009: Stop Stalking Me: How To Engage Youth
Supernova 2009: Stop Stalking Me: How To Engage YouthTerrorNova Guild
 
Supernova 2009: Eric Clemons and the Prospects for Antitrust Action Against G...
Supernova 2009: Eric Clemons and the Prospects for Antitrust Action Against G...Supernova 2009: Eric Clemons and the Prospects for Antitrust Action Against G...
Supernova 2009: Eric Clemons and the Prospects for Antitrust Action Against G...TerrorNova Guild
 
Supernova 2009 Overview and Resource Guide
Supernova 2009 Overview and Resource GuideSupernova 2009 Overview and Resource Guide
Supernova 2009 Overview and Resource GuideTerrorNova Guild
 
Supernova 2009: Day 1 Presenters
Supernova 2009: Day 1 PresentersSupernova 2009: Day 1 Presenters
Supernova 2009: Day 1 PresentersTerrorNova Guild
 
Supernova 2009 Legal Track Sponsors
Supernova 2009 Legal Track SponsorsSupernova 2009 Legal Track Sponsors
Supernova 2009 Legal Track SponsorsTerrorNova Guild
 
Supernova 2009: John Curran (ARIN) - IPv4 Depletion, IPv6 Adoption
Supernova 2009: John Curran (ARIN) - IPv4 Depletion, IPv6 AdoptionSupernova 2009: John Curran (ARIN) - IPv4 Depletion, IPv6 Adoption
Supernova 2009: John Curran (ARIN) - IPv4 Depletion, IPv6 AdoptionTerrorNova Guild
 
Supernova 2009: Chris Anderson (Wired) - Atoms are the New Bits
Supernova 2009: Chris Anderson (Wired) - Atoms are the New BitsSupernova 2009: Chris Anderson (Wired) - Atoms are the New Bits
Supernova 2009: Chris Anderson (Wired) - Atoms are the New BitsTerrorNova Guild
 
Monetizing without Advertising Supernova 2008
Monetizing without Advertising Supernova 2008Monetizing without Advertising Supernova 2008
Monetizing without Advertising Supernova 2008TerrorNova Guild
 
Clay Shirky Supernova 2008
Clay Shirky Supernova 2008Clay Shirky Supernova 2008
Clay Shirky Supernova 2008TerrorNova Guild
 

Hill Supernova 2008

  • 1. Deriving Value from Consumer Networks. Shawndra Hill, University of Pennsylvania. Supernova 2008, June 17, 2008. Joint work with: Bob Bell, Deepak Agarwal, Foster Provost, Chris Volinsky
  • 2. Communication Networks – Nodes represent transactors – Edges are explicit transactions
  • 3. How can firms use data on explicit consumer networks to improve consumer rankings? For example, in order to rank customers by likelihood of … • Response to a target marketing offer • Fraud • Donating to a cause • Spreading information about a product • …
  • 4. Consumer Networks: Email • Web purchases • Call detail logs • Blogs • Discussion forums • Online auctions • Recommender sites • Networking portals. Dependencies – Nodes are interdependent. Scale – Tens or hundreds of millions of nodes and edges. Dynamic – Large numbers of nodes coming and going continuously
  • 5. Business problem: Target consumers for new product • Large telecommunications company • Product: new telecom service • Large direct marketing campaign • Long experience with targeted marketing • Sophisticated segmentation models based on data and intuition, e.g., regarding the types of customers known or thought to have affinity for this type of service
  • 6. The Data. The firm determined 21 segments (segment IDs 1-21) by a combination of customer characteristics: Geography (G) – State – Zip – Urban – Cable Region. Loyalty (L) – Existing Customer – Prior spending – Current plan – Frequent switch. Demographics (D) – Age – Gender – Children – Head of Household. Other (O) – Type of Mailer – Internet Type. Separately, assessed >150 potential attributes from these categories
  • 7. What’s new? Directed Network-based Marketing. Store millions of inbound/outbound communications a day to/from existing customers. Constructed representation of consumer network over prior 6 months. Can this additional data improve customer ranking significantly? (Figure labels: Existing customers, “Network Neighbor” targets, Non-customers)
  • 8. What’s new? Directed Network-based Marketing. Store millions of inbound/outbound communications a day to/from existing customers. Constructed representation of consumer network over prior 6 months. (Figure: segment IDs 1-21 plus a new segment 22, marked “important”)
  • 9. Results: Relative Take Rates for Marketing Segments – Non-NN 1-21: 1 (0.28%) – NN 1-21: 4.82 (1.35%) – NN 22: 2.96 (0.83%) – Non-Target NN: 0.4 (0.11%)
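The relative take rates on this slide appear to be each group's take rate divided by the take rate of the Non-NN 1-21 group. A minimal sketch of that arithmetic follows; the pairing of percentages with groups is read off the chart labels and is an assumption, and the function name is hypothetical.

```python
# Take rates (percent) as read from the slide; the pairing with groups is assumed.
take_rates = {
    "Non-NN 1-21": 0.28,
    "NN 1-21": 1.35,
    "NN 22": 0.83,
    "Non-Target NN": 0.11,
}


def relative_take_rates(rates, baseline):
    """Divide every take rate by the baseline group's take rate."""
    base = rates[baseline]
    return {group: rate / base for group, rate in rates.items()}


relative = relative_take_rates(take_rates, "Non-NN 1-21")
for group, value in relative.items():
    print(f"{group}: {value:.2f}")
```

Under this reading, the network-neighbor targets in segments 1-21 respond at close to five times the baseline rate.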
  • 10. More Sophisticated Local Network-based Attributes? (Attribute: Description) – Degree: Number of unique customers communicated with before the mailer – # Transactions: Number of transactions to/from customers before the mailer – Seconds of communication: Number of seconds communicated with customers before the mailer – Connected to influencer?: Is an influencer in your local neighborhood? – Connected component size: Size of the connected component the target belongs to – Similarity (structural equivalence): Max overlap in local neighborhood with existing customer
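The first three attributes in this table are simple aggregates over a target's communications with existing customers. A minimal sketch, assuming call records arrive as `(target, existing_customer, seconds)` tuples; the record layout, the sample data, and the function name are hypothetical and not from the talk.

```python
from collections import defaultdict

# Hypothetical call records: (target, existing_customer, seconds).
calls = [
    ("t1", "c1", 120),
    ("t1", "c1", 30),
    ("t1", "c2", 60),
    ("t2", "c2", 15),
]


def local_attributes(records):
    """Return degree, transaction count, and total seconds per target."""
    neighbors = defaultdict(set)
    transactions = defaultdict(int)
    seconds = defaultdict(int)
    for target, customer, duration in records:
        neighbors[target].add(customer)   # unique customers -> degree
        transactions[target] += 1         # every record is one transaction
        seconds[target] += duration       # total seconds of communication
    return {
        t: {
            "degree": len(neighbors[t]),
            "transactions": transactions[t],
            "seconds": seconds[t],
        }
        for t in neighbors
    }


attrs = local_attributes(calls)
```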
  • 11. More sophisticated Network attributes? For example collective inference. Relational classifier – WvRN: $p(y_i = c \mid N_i) = \frac{1}{Z} \sum_{v_j \in N_i} w_{i,j} \cdot p(y_j = c \mid N_j)$
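The weighted-vote estimate can be sketched in a few lines. This is an illustration only: the slide does not define the normalizer Z, so the code assumes Z is the sum of the edge weights to the node's neighbors, and the dictionaries and function name are hypothetical.

```python
def wvrn_estimate(node, neighbors, weights, prob):
    """Weighted average of the neighbors' current class probabilities.

    neighbors: dict node -> list of neighbor nodes
    weights:   dict (node, neighbor) -> edge weight w_ij
    prob:      dict node -> current estimate of p(y = c)
    """
    z = sum(weights[(node, j)] for j in neighbors[node])  # assumed normalizer Z
    if z == 0:
        return prob[node]  # no weighted neighbors: keep the current estimate
    return sum(weights[(node, j)] * prob[j] for j in neighbors[node]) / z


neighbors = {"a": ["b", "c"]}
weights = {("a", "b"): 3.0, ("a", "c"): 1.0}
prob = {"a": 0.5, "b": 1.0, "c": 0.0}
p_a = wvrn_estimate("a", neighbors, weights, prob)
```

Here node `a` talks three times as much with the adopter `b` as with the non-adopter `c`, so its estimate is pulled toward `b`.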
  • 14. Contributions • Consumers that have already interacted with an existing customer adopt a product (e.g., respond to a direct mailer) at a higher rate than those that have not. • Variables constructed from the consumer’s immediate network enable the firm to classify and rank targets, and to generate profit, better. • Global network attributes can be used to help rank consumers two hops away from existing customers. • Our ability to improve consumer ranking translated into significant profit to the firm.
  • 15. Overview: Our Objective. Design a generic definition, representation, and approximation for dynamic graphs that can be used for problems where looking at entities through time is of interest. – What is the graph at time t: $G_t$ – How does one account for addition and attrition of nodes
  • 16. Business problem: Repetitive Subscription Fraud • Large telecommunications company • Telecom service • Long experience with fraud detection • Sophisticated models based on record linkage
  • 17. Motivating Example: Repetitive Fraud. Lots of people can’t pay their bill, but they want phone service anyway: – Disconnected account: Name Ted Hanley; Address 14 Pearl Dr, St Peters, MN; Balance $208.00; Disconnected 2/19/04 (nonpayment) – Connected account: Name Debra Handley; Address 14 Pearl Dr, St Peters, MN; Balance $142.00; Connected 2/22/04 – Disconnected account: Name Elizabeth Harmon; Address APT 1045, 4301 ST JOHN RD, SCOTTSDALE, AZ; Balance $149.00; Disconnected 2/19/04 (nonpayment) – Connected account: Name Elizabeth Harmon; Address 180 N 40TH PL APT 40, PHOENIX, AZ; Balance $72.00; Connected 1/31/04
  • 18. Motivating Example: Repetitive Fraud. How can we identify that it is the same person behind both accounts? – Account: Old 67855232344; New 4215554597 – Date: Old 2003-02-25; New 2003-02-13 – Name: Old DAVID ATKINS; New DAVID WATKINS – Address: Old 10 NIGHT WAY APT 114; New 10 HATSWORTH DR – City: Old FAYVILLE; New BONDALE – State: Old AL; New AL – Zip: Old 302141798; New 300021530 – II Code: Old 5512127609901; New 5312074639501 – Balance: Old 284.62; New 5.83
  • 19. Motivating Example: Challenges • This is a problem of record linkage and graph matching, but because of obfuscation, we can only count on entity matching. • But the number of potential matches is huge… (Figure: connect pool, 10K/day, 300K/month; restrict pool, 5K/day, 150K/month; 45 billion comparisons) • If we have an efficient representation of entities, we might be able to make a dent…
  • 20. Our Approach: Defining Dynamic Graphs. We adopt an Exponentially Weighted Moving Average (EWMA): $G_t = \theta G_{t-1} \oplus (1 - \theta) g_t$, i.e. today’s graph is defined recursively as a convex combination of yesterday’s graph and today’s data • Advantages: – recent data has most influence – only one most recent graph need be stored. We also use two types of approximation of the graph, by pruning: – Global pruning of edges – overall threshold ($\varepsilon$) below which edges are removed from the graph – Local pruning of edges – designate a maximal in and out degree ($k$) for each entity, and assign an overflow bin
  • 21. Our Approach: Defining Dynamic Graphs. Selecting θ – θ closer to 1 • calls decay slower • more historical data included • smoother – θ closer to 0 • faster decay • recent calls count more • more power to detect changes • less smooth
  • 22. Applying our Method • Results: – We identify 50-100 of these cases per day – 95% match rate – 85% block rate – Credited with saving telecom millions of dollars – By far the most reliable matching criterion is the entity-based matching – Optimized parameter set outperforms both current process and current theta and optimized k. *We also demonstrate our method on email and clickstream data
  • 23. Other applications, conclusions… • Our three-parameter representation of a dynamic graph is a powerful, flexible, and efficient way of analyzing problems where looking at entities through time is of interest. • Can be applied to any problem where entity modeling over time is of interest • Other fraud: Guilt by association • Email • Web pages • Social Networks • Terrorism • Viral Marketing • What class of problems is this good for? After all, there is no model!!! • Further work – More complex entities – Distance Functions – More flexible, adaptive parameter setting
  • 24. Want more? Deriving Value from Consumer Networks. 1. Network-based Marketing: Identifying Likely Adopters via Consumer Networks. Shawndra Hill, F. Provost, C. Volinsky, Statistical Science, Vol. 21, No. 2, pp. 256-276. 2. Collective Inference in Consumer Networks. Shawndra Hill, F. Provost, C. Volinsky, to be submitted to Marketing Science, March 2007. 3. Building an Effective Representation for Dynamic Networks. Shawndra Hill, D. Agarwal, R. Bell, C. Volinsky, Journal of Computational & Graphical Statistics, Vol. 15, No. 3, pp. 584-608
  • 25. Fraud Revisited: Applying our methods • Results: – We identify 50-100 of these cases per day – 95% match rate – 85% block rate – Credited with saving large telecom $5 million / year – By far the most reliable matching criterion is the entity-based matching
  • 26. Other applications, conclusions… • Our three-parameter representation of a dynamic graph is a powerful, flexible, and efficient way of analyzing problems where looking at entities through time is of interest. • Can be applied to any problem where entity modeling over time is of interest • Other fraud: Guilt by association • Language models • Email • Web pages • Social Networks • Terrorism • Viral Marketing
  • 27. Matching Algorithm • What cases will we present to the reps? • A combination of: – COI overlap measures • At least two, and strength determined by uniqueness of overlap TNs – Name/address overlap • Edit distance no more than 50% of the longest name or address – $$ owed • Most interested in the ones that will generate the most $$
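The name/address criterion can be illustrated with a plain Levenshtein edit distance. This is a sketch under assumptions: the talk does not say which edit distance is used, and reading "50% of the longest name or address" as half the length of the longer of the two strings is one interpretation. The function names are hypothetical; the sample names come from the old/new account example earlier in the deck.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def names_overlap(a, b, max_fraction=0.5):
    """True if the edit distance is at most max_fraction of the longer string."""
    longest = max(len(a), len(b))
    return longest > 0 and levenshtein(a, b) <= max_fraction * longest


print(names_overlap("DAVID ATKINS", "DAVID WATKINS"))
```

A single inserted letter separates the two names, so the pair passes this threshold comfortably.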
  • 28. Motivating Example: Repetitive Fraud • When we catch a fraudster, we rarely catch the person, we simply shut down the line • They will likely move on to another attempt at defrauding us, from a different network location • Idea: record linkage – network identity has changed, but network behavior is the same • We can use network behavior to indicate that the new line has the same “owner” as an old line
  • 29. COI Signatures to COI • To construct a COI from a COI signature: – Often the signature contains things we don’t want: • Businesses • High weight nodes – Often the signature doesn’t contain things we do want: • Local calls • Other carrier calls • To combat this, create a COI by: – Recursively expanding the COI signature • Here’s an example…
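  • 30. COI signature (figure: graph with nodes labeled “me” and “other”)
  • 31. Extended COI (figure: graph with nodes labeled “me” and “other”)
  • 32. Enhanced COI (figure: graph with nodes labeled “me” and “other”)
  • 33. Pruned COI (figure: graph with nodes labeled “me” and “other”)
  • 34. A likely case of the same fraudster showing up as a new number. Pink nodes exist in both COIs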
  • 35. Fraud Revisited: Applying our methods • Calculate the “informative overlap” score: $\mathrm{overlap}(a, b) = \sum_{o \in \mathrm{overlap}} \frac{w_{ao} w_{ob}}{w_o} \cdot \frac{1}{d_{ao} d_{ob}}$, where: $w_{ao}$ = weight of edge from a to o; $w_{ob}$ = weight of edge from o to b; $w_o$ = sum weight of edges to o; $d_{ao}$, $d_{ob}$ are the graph distances from a and b to o
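A minimal sketch of the informative overlap score, assuming the garbled slide formula reads as the sum over shared nodes o of (w_ao · w_ob / w_o) · 1 / (d_ao · d_ob). The dictionaries, the sample numbers, and the function name are hypothetical.

```python
def informative_overlap(shared, w_a, w_b, w_total, d_a, d_b):
    """Sum (w_ao * w_ob / w_o) * 1 / (d_ao * d_ob) over shared nodes o."""
    score = 0.0
    for o in shared:
        # Dividing by w_total[o] discounts nodes that everyone talks to.
        score += (w_a[o] * w_b[o] / w_total[o]) / (d_a[o] * d_b[o])
    return score


shared = ["o1", "o2"]
w_a = {"o1": 4.0, "o2": 1.0}          # edge weights from a to each shared node
w_b = {"o1": 2.0, "o2": 1.0}          # edge weights between each shared node and b
w_total = {"o1": 8.0, "o2": 100.0}    # total weight of edges into each shared node
d_a = {"o1": 1, "o2": 1}              # graph distance from a
d_b = {"o1": 1, "o2": 2}              # graph distance from b
score = informative_overlap(shared, w_a, w_b, w_total, d_a, d_b)
```

In this toy case the rarely called node `o1` contributes 1.0 while the heavily called, more distant node `o2` contributes only 0.005, which matches the idea that uniqueness of the overlap drives the score.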
  • 36. Outline • Defining a dynamic graph, and our objectives • A motivating example: Repetitive fraud in telecommunications • Our approach: representation and approximation of dynamic graphs • Parameter setting and applications to other domains • Fraud revisited – applying our methods
  • 37. Defining a Dynamic Graph, and Our Objectives
  • 38. Defining Dynamic Graphs • Dynamic Graphs represent transactional data – Telecommunications network traffic – Web connectivity data – Web logs – Credit card data – Online auction data. Transactional data can be represented … (figure: example graph with nodes Chris, Corinna, Daryl, Anne, Debby, Jen, Kathleen, Fred, Zach, John)
  • 39. Defining Dynamic Graphs • Dynamic Graphs – Nodes represent transactors – Edges are directed transactions – All edges have a time stamp – All edges have a weight (?) – May contain • Other attributes on nodes (avg bill, calling plan) • Other attributes on edges (wireless, intl) (figure: example graph with nodes Corinna, Chris, Daryl, Anne, Jen, Debby, Kathleen, Fred, Zach, John)
  • 40. Analysis of dynamic graphs: Why is it hard? • What do we want to know? – Clusters, social and behavioral patterns, fraud… • Two main challenges: – Large Scale • Often tens or hundreds of millions of nodes
  • 41. A motivating example: Repetitive fraud in telecommunications
  • 42. Motivating Example: Our data • Our graph is large… • 350M telephone numbers (TNs) currently active on our Long Distance network, 300M calls/day • …dynamic… • 4 million TNs appear per week • 4 million TNs disappear per week
  • 43. Motivating Example: Our data …and sparse: For one year of long distance data: 95% = 171, Median = 34
  • 44. Our Approach to Dynamic Graphs – Definition of the graph – Representation as atomic
  • 45. Our Approach: Defining dynamic graphs. We adopt an Exponentially Weighted Moving Average (EWMA): $G_t = \theta G_{t-1} \oplus (1 - \theta) g_t$, i.e. today’s graph is defined recursively as a convex combination of yesterday’s graph and today’s data. Alternatively, this is: $G_t = \omega_1 g_1 \oplus \omega_2 g_2 \oplus \cdots \oplus \omega_t g_t = \bigoplus_{i=1}^{t} \omega_i g_i$, where $\omega_i = \theta^{t-i}(1 - \theta)$. Through time, edge weights decay with decay rate θ • Advantages: – recent data has most influence – only one most recent graph need be stored
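The recursive EWMA update can be sketched with graphs stored as edge-to-weight dictionaries. This is an illustration, not the authors' implementation: it assumes the ⊕ operator adds weights edge by edge over the union of edges, and the edge names, numbers, and function name are hypothetical.

```python
def ewma_update(previous, today, theta):
    """Return G_t = theta * G_{t-1} (+) (1 - theta) * g_t as edge -> weight."""
    # Every existing edge decays by theta, whether or not it was seen today.
    updated = {edge: theta * w for edge, w in previous.items()}
    # Today's transactions are added with weight (1 - theta).
    for edge, w in today.items():
        updated[edge] = updated.get(edge, 0.0) + (1.0 - theta) * w
    return updated


yesterday = {"A": 10.0, "B": 4.0}
today = {"B": 6.0, "C": 2.0}
graph = ewma_update(yesterday, today, theta=0.8)
```

Only the previous graph and today's data are needed, which is the storage advantage the slide points out.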
  • 46. Our Approach: Defining dynamic graphs • Q: for transactional data, what does the graph at time t ($G_t$) mean? – Let $g_t$ be the collection of nodes and edges during the time period t • We could use: $G_t = g_t$. Too narrow! • We could use the union of all time periods: $G_t = g_1 \oplus g_2 \oplus \cdots \oplus g_t = \bigoplus_{i=1}^{t} g_i$. Too broad! • We could use a moving average of the most recent time periods: $G_t = g_{t-n} \oplus g_{t-n+1} \oplus \cdots \oplus g_t = \bigoplus_{i=t-n}^{t} g_i$. Too many!
  • 47. Our Approach: Defining dynamic graphs. Selecting θ – θ closer to 1 • calls decay slower • more historical data included • smoother – θ closer to 0 • faster decay • recent calls count more • more power to detect changes • less smooth. θ = 1 − 1/n means weight reduces to approximately 1/e times its original weight in n days
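The rule of thumb for choosing θ can be checked numerically. This sketch reads the rule as θ = 1 − 1/n, which is an assumption based on the limit (1 − 1/n)^n → 1/e; the function name and the 30-day horizon are hypothetical.

```python
import math


def theta_for_horizon(n_days):
    """theta = 1 - 1/n, so an edge weight shrinks to about 1/e after n days."""
    return 1.0 - 1.0 / n_days


n = 30
theta = theta_for_horizon(n)
remaining = theta ** n  # fraction of the original weight left after n days
print(f"theta={theta:.4f}, remaining={remaining:.3f}, 1/e={math.exp(-1):.3f}")
```

The match is approximate for small n and tightens as n grows.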
  • 48. Our Approach: Representation • Because we are interested in entities, and to facilitate efficient storage, we represent the entire graph as a union of entity graphs. • These are our atomic units of analysis, a signature of the node’s behavior. • Storing hundreds of millions of small graphs is much more efficient than storing one massive graph, especially in an indexed database. Example signature (TN: weight): 2222222222: 100.3; 1111111111: 90.1; 3213232423: 27.0; 9098765453: 11.3; 8876457326: 5.4; 2122121212: 3.0; 9908989898: 0.9; 8887878787: 0.1
  • 49. Our Approach: Representation. Update the graph by updating all of the atomic units daily – so any time we access the data we have the most recent representation. Yesterday’s graph (TN: weight): 2222222222: 100.3; 1111111111: 90.1; 3213232423: 27.0; 9098765453: 11.3; 8876457326: 5.4; 2122121212: 3.0; 9908989898: 0.9; 8887878787: 0.1. Today’s data: 1111111111: 20.0; 2122121212: 10.0; 9991119999: 5.0. Today’s graph: 1111111111: 92.1; 2222222222: 90.3; 3213232423: 24.3; 9098765453: 10.1; 8876457326: 4.9; 2122121212: 3.7; 9991119999: 0.5; 3990898989: 0.8; 8887878787: 0.09
  • 50. Our Approach: Approximation • We also use two types of approximation of the graph, by pruning. – Global pruning of edges – overall threshold ($\varepsilon$) below which edges are removed from the graph – Local pruning of edges – designate a maximal degree ($k$) for each entity
  • 51. Our Approach: Approximation • Removes stale edges • Reduces effect of supernodes • Increases efficiency • Preserves entity weight. Before pruning (TN: weight): 1111111111: 92.1; 2222222222: 90.3; 3213232423: 24.3; 9098765453: 10.1; 8876457326: 4.9; 2122121212: 3.7; 9991119999: 0.5; 3990898989: 0.8; 8887878787: 0.09. After pruning: 1111111111: 92.1; 2222222222: 90.3; 3213232423: 24.3; 9098765453: 10.1; 8876457326: 4.9; 2122121212: 3.7; Other: 1.4
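Both pruning steps can be sketched on the signature shown on this slide. The code is a sketch under assumptions: `k = 6` is inferred from the six entries that survive in the slide's example, the overflow bin is named "Other" as on the slide, and the function names are hypothetical.

```python
def global_prune(signature, epsilon):
    """Drop edges whose weight is below the overall threshold epsilon."""
    return {node: w for node, w in signature.items() if w >= epsilon}


def local_prune(signature, k, overflow="Other"):
    """Keep the k heaviest edges and pool the remaining weight in one bin."""
    ranked = sorted(signature.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[:k])
    spilled = sum(w for _, w in ranked[k:])
    if spilled > 0:
        kept[overflow] = spilled  # preserves the entity's total weight
    return kept


# Signature values as they appear on the slide before pruning.
signature = {
    "1111111111": 92.1, "2222222222": 90.3, "3213232423": 24.3,
    "9098765453": 10.1, "8876457326": 4.9, "2122121212": 3.7,
    "9991119999": 0.5, "3990898989": 0.8, "8887878787": 0.09,
}
pruned = local_prune(signature, k=6)
```

The three lightest edges sum to 1.39, consistent with the "Other 1.4" entry on the slide after rounding.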
  • 52. Our Approach: Approximation • Defending k – Most entities have the vast majority of their weight in a fraction of their nodes
  • 53. Our Approach: Parameter Setting • Let A and B be two entities. • Weighted Dice: $WD(A, B) = \frac{\sum_{j \in A \cap B} \left( p_A(j) + p_B(j) \right)}{1 + \sum_{j} p_A(j)}$ • Hellinger Distance: $HD(A, B) = \sum_{j \in A \cap B} \sqrt{p_A(j)\, p_B(j)}$
  • 56. Research Questions • How could a firm use the consumer network to improve target marketing (network targeting)? • Do consumers who have already interacted with someone on the existing customer network respond to a direct mailer at a higher rate than those that do not? • Can variables constructed from the network enable the firm to better classify targets? • Does collective inference help us to improve target marketing?
  • 57. Outline of Talk – Experimental Setup – Directed network marketing – Local Network – Collective Network (Figure: relative take rates; Non-Viral 1-21: 1; Viral 1-21: 4.98; Viral 22: 3.87; Non-Target Viral: 0.4)
  • 58. Motivation: Consumer vs. Consumer “Network”. Consumer – No link structure. Consumer “Network” – Link structure – Additional consumer information – Proxy for homophily
  • 59. Motivation: Consumer vs. Consumer “Network” (figure: relational database and relational vectors vs. weighted directed graph). Consumer – No link structure. Consumer “Network” – Link structure – Additional information – Proxy for homophily
  • 60. Analyzing Consumer Networks: Why is it hard? Scale – Tens or hundreds of millions of nodes and edges – Entire network can’t fit in main memory. Dynamic – Large numbers of nodes coming and going continuously – Accounting for temporal component of changing graphs is a challenge. Dependencies – Nodes are heterogeneous – Nodes are interdependent
  • 61. What is Viral Marketing? Explicit advocacy – Word-of-Mouth. Implicit advocacy – Hotmail. Network targeting – My study
  • 62. Viral Marketing Research: Economics, Marketing, Info Sys, Statistics, Sociology, Epidemiology, CS
  • 63. Viral Marketing Research: Economics, Marketing, Info Sys, Statistics, Sociology, Epidemiology, CS • Diffusion • Customer Value • Consumer Preferences
  • 64. Viral Marketing Research: The Ideal Dataset? Economics, Marketing, Info Sys, Statistics, Sociology, Epidemiology, CS • Diffusion • Customer Value • Consumer Preferences
  • 65. Evidence of Viral Marketing? We need explicit links as inputs and adoption response as the dependent … Our testbed is closer to the ideal than any other published study! Remember, wiretapping is illegal!
  • 66. Viral Marketing Data: Call Detail. Internet telephony service. Millions of calls a day. We observe calls to and from existing customers. (Figure labels: Existing customers, Viral targets)
  • 67. Viral Marketing Data: Response to Mailer. Two months after the mailer, calculated how many targets responded
  • 68. Do consumers who have already interacted with someone on the existing customer network respond to a direct mailer at a higher rate than those that do not? Model Variables. Dependent Variable: Response to direct mailer RES – If response is positive, RES = 1 – If negative, RES = 0. Independent Variables: Segment, traditional marketing attribute, viral attribute – Segment 1-21 – Loyalty, Demographics, Geographics – Binary Viral Attribute. Models – Odds Ratio – ANOVA: Analysis of Deviance Table – Classification with logistic regression evaluated by area under the ROC curve
  • 69. Do consumers who have already interacted with someone on the existing customer network respond to a direct mailer at a higher rate than those that do not? Model Variables. Dependent Variable: Response to direct mailer RES – If response is positive, RES = 1 – If negative, RES = 0. Independent Variables: Segment, traditional marketing attribute, viral attribute – Segment 1-21 – Loyalty, Demographics, Geographics – Binary Viral Attribute
  • 70. Do consumers who have already interacted with someone on the existing customer network respond to a direct mailer at a higher rate than those that do not? Analysis of Deviance table (Model Variable: Deviance; DF; Change in Deviance; Sig) – Intercept: 11200 – Segment: 10869; 9; 63; ** – Segment + Cell: 10733; 1; 370; ** – Segment + Cell + Interactions: 10687; 8; 41; **. Analysis of Deviance: The table confirms the significance of the main effects and of the interactions. Each level of the nested model is significant when using a chi-squared approximation for the differences of the deviances. The fact that so many interactions are significant demonstrates that the viral effect is stronger for different segments of the prospect population.
  • 71. Does collective inference help to improve target marketing? Experiment Setup. Dependent Variable: Response to direct mailer RES – If response is positive, RES = 1 – If negative, RES = 0 – RES over two month time period after mailer. Independent Variables: Segment, traditional marketing attributes, viral attribute – Segment 1-21 – Loyalty, demographics, geographics – Binary viral attribute – Local network attributes – Collective inference prediction. Sample: Subset of viral targets
  • 72. Does collective inference help to improve target marketing? Model: Guilt-by-association weighted-vote RN Classifier (wvRN). $\eta = \beta_0 + \beta_1(L) + \beta_2(G) + \beta_3(D) + \beta_4(O) + \beta_5(N_B) + \beta_6(N_L) + \beta_7(N_C)$, $\mathrm{RESP} = \exp(\eta) / (1 + \exp(\eta))$
  • 73. • Introduction  Toolkit Relational classifiers • Case study Relational classifiers for case study – wvRN 1 p ( yi = c | N i ) = Z ∑ wi , j ⋅ p ( y j = c | N j ) v j ∈ Ni – nBC • Naïve Bayes on neighbor class labels • Markov Random Field, following Chakrabarti et al. (1998) – when uncertainty in neighbor labels – some minor modifications – nLB • following Lu & Getoor’s (2003) Link-based Classifier • for a node i, form its neighbor-class vector CV(i) • logistic regression based on CV(i) – cdRN • for each class cdRN estimates neighbor-class distribution RV(c) 73 • p(yi = c|Ni) is the normalized distance between CV(i) and
  • 74. Collective inference
      – Iterative classification (following Lu & Getoor, 2003)
          • initially assign a “prior” to all nodes using the local classifier: p(0)(yi = C)
          • select an ordering O
          • walk down the chain, classifying with MAP classification
          • final class labels are selected upon convergence or after 1000 iterations
      – Relaxation labeling (following Chakrabarti et al., 1998)
          • initially assign a “prior” to all nodes using the local classifier: p(0)(yi = C)
          • estimate p(t)(yi = C) using the relational classifier based on p(t-1)
      – Gibbs sampling (following Geman & Geman, 1984)
          • select an ordering O on nodes, randomly
          • initially sample labels based on priors
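The relaxation-labeling loop can be sketched as follows, using a weighted vote over neighbors as the relational classifier. This is a simplified illustration under stated assumptions, not the procedure used in the case study: the class is binary, nodes with known labels are clamped, and the loop runs for a fixed number of iterations instead of testing for convergence. All names and the toy graph are hypothetical.

```python
def relaxation_labeling(graph, priors, known, iterations=50):
    """Estimate p(y_i = C) for every node by repeated synchronous updates.

    graph:  dict node -> {neighbor: edge weight}; every neighbor must also be a key.
    priors: dict node -> p(0)(y_i = C) from a local classifier.
    known:  dict node -> fixed probability for nodes whose label is observed.
    """
    current = {n: known.get(n, priors[n]) for n in graph}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in graph.items():
            if node in known:
                updated[node] = known[node]  # observed labels stay fixed
                continue
            z = sum(nbrs.values())
            if z == 0:
                updated[node] = current[node]  # isolated node keeps its estimate
            else:
                # p(t) is computed only from the p(t-1) estimates of the neighbors.
                updated[node] = sum(w * current[j] for j, w in nbrs.items()) / z
        current = updated
    return current

# Toy network: x is an existing customer; u and v are prospects.
graph = {"x": {"u": 1.0}, "u": {"x": 1.0, "v": 1.0}, "v": {"u": 1.0}}
priors = {"x": 0.5, "u": 0.5, "v": 0.5}
estimates = relaxation_labeling(graph, priors, known={"x": 1.0})
print(estimates)
```

Because every update reads only the previous iteration's estimates, the result does not depend on the order in which nodes are visited, unlike iterative classification and Gibbs sampling, which both select an ordering O.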
  • 75. Overview of Contributions
      – Question 1: This is the first evidence that viral marketing exists in explicit consumer networks
      – Question 2: We show that constructed consumer network attributes can be used to improve over traditional target marketing methods
      – Question 3: This is the first time collective inference has been used in a real-world target marketing problem
  • 77. Prior Results: Model
      Odds: $\mathrm{Odds} = \frac{p}{1 - p}$ (range on the odds scale: $0 \ldots \infty$)
      Odds Ratio: ratio of odds (focus: risk indicator, covariate) = odds of responding to the mailer in the network neighbor target group / odds in the non-network neighbor target group
      The odds ratio measures the ‘belief’ in a given outcome in two different populations or under two different conditions. If the odds ratio is one, the two populations or conditions are similar.
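A short sketch of the two definitions on this slide. The function names and the response rates in the example are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def odds(p):
    """Odds = p / (1 - p), defined for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)

def odds_ratio(p_network, p_non_network):
    """Odds of response among network neighbors divided by the odds among other targets."""
    denominator = odds(p_non_network)
    if denominator == 0.0:
        raise ValueError("the comparison group must have a non-zero response rate")
    return odds(p_network) / denominator

# Hypothetical response rates of 75% and 50%.
print(odds(0.75))              # 3.0
print(odds_ratio(0.75, 0.5))   # 3.0
print(odds_ratio(0.2, 0.2))    # 1.0, the two groups respond alike
```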
  • 78. Prior Results [Figure: cumulative % of sales versus cumulative % of consumers targeted (ranked by predicted sales), for the models “All” and “All + NN”]
  • 79. Network-based Marketing: Experiment Setup
      Dependent Variable: Response to direct mailer (RES)
        – If response is positive, RES = 1
        – If negative, RES = 0
        – RES is measured over the two-month period after the mailer
      Independent Variables: Segment, traditional marketing attributes, viral attribute
        – Segment 1-21
        – Loyalty, demographics, geographics
        – Binary NN attribute
      Sample: All targets
  • 80. Network-based Marketing: Model
      Logistic regression across all segments, including viral attributes
      $\eta = \beta_0 + \beta_1 L + \beta_2 G + \beta_3 D + \beta_4 O + \beta_5 N_B$
      $\mathrm{RESP} = \exp(\eta) / (1 + \exp(\eta))$
  • 82. More Sophisticated Local Network-based Attributes? Experiment Setup
      Dependent Variable: Response to direct mailer (RES)
        – If response is positive, RES = 1
        – If negative, RES = 0
        – RES is measured over the two-month period after the mailer
      Independent Variables: Segment, traditional marketing attributes, viral attribute
        – Segment 1-21
        – Loyalty, demographics, geographics
        – Binary viral attribute
        – Local network attributes
      Sample: All NN targets
  • 83. Local: Network Neighbor Attributes: Model
      Logistic regression across all segments, including the viral attribute and local network attributes
      $\eta = \beta_0 + \beta_1 L + \beta_2 G + \beta_3 D + \beta_4 O + \beta_5 N_B + \beta_6 N_L$
      $\mathrm{RESP} = \exp(\eta) / (1 + \exp(\eta))$
  • 84. Ranking of “NN” targets [Figure: cumulative % of sales versus cumulative % of consumers targeted (ranked by predicted sales), for the models “All” and “All + net”]
  • 85. Results: The bottom line. Hypothetical (future) profit improvement:
      targeted               5000000
      cost                   0.2
      total cost             1000000
      resp 1-21              0.30%
      viral resp.            1.30%
      viral hyp              4.40%
      6-mo. profit           179.94
      base profit            $1,699,100.00
      viral profit           $10,696,100.00
      hypothetical profit    $38,586,800.00
      improvement? viral over base: $8,997,000.00; hypothetical over base: $36,887,700.00
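The profit figures on this slide can be reproduced with a simple calculation. The formula below is an assumption inferred from the numbers, not something stated on the slide: profit = targeted × response rate × 6-month profit per responder − targeted × cost per target. The function and variable names are hypothetical. `Decimal` is used so the dollar amounts come out exact.

```python
from decimal import Decimal

def campaign_profit(targeted, cost_per_target, response_rate, profit_per_response):
    """Assumed profit model: revenue from responders minus the total mailing cost."""
    revenue = targeted * response_rate * profit_per_response
    total_cost = targeted * cost_per_target
    return revenue - total_cost

TARGETED = Decimal("5000000")
COST = Decimal("0.2")            # cost per targeted consumer
PROFIT_6MO = Decimal("179.94")   # read here as 6-month profit per responder (assumption)

base = campaign_profit(TARGETED, COST, Decimal("0.003"), PROFIT_6MO)          # resp 1-21
viral = campaign_profit(TARGETED, COST, Decimal("0.013"), PROFIT_6MO)         # viral resp.
hypothetical = campaign_profit(TARGETED, COST, Decimal("0.044"), PROFIT_6MO)  # viral hyp

print(f"base profit:         ${base:,.2f}")
print(f"viral profit:        ${viral:,.2f}")
print(f"hypothetical profit: ${hypothetical:,.2f}")
print(f"improvement (viral): ${viral - base:,.2f}")
print(f"improvement (hyp.):  ${hypothetical - base:,.2f}")
```

Under this reading, the two improvement figures are the differences between the viral or hypothetical profit and the base profit.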
  • 86. Contributions: Results. Directed network-based marketing: Consumers who have already interacted with an existing customer adopt a product (e.g., respond to a direct mailer) at a higher rate than those who have not. Variables constructed from the consumer’s immediate network enable the firm to classify and rank targets better, and so to generate more profit.
  • 87. Even more Sophisticated Network-based Attributes? Can we use collective inference to make simultaneous inferences about nodes on the graph? – What about the massive size of the network?
  • 88. Our Approach: Parameter Setting
      • We have now defined a representation of a dynamic graph by three parameters:
          – θ: controls the decay of edges and edge weights
          – ε: global pruning parameter
          – k: local pruning parameter
      • For a given application, we choose the parameter values by optimizing predictive performance, selecting the parameters that optimize a distance metric
          – Two distance metrics we apply:
              • Weighted Dice
              • Hellinger Distance
          – … but this may be domain dependent
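For reference, a minimal sketch of the Hellinger distance between two discrete distributions, one of the two metrics named on this slide. It uses the standard textbook definition; the slide does not say exactly how the metric is applied to graph entities, so comparing a node's normalized edge weights across two time periods, as in the example, is an assumption. The function name and the toy data are hypothetical.

```python
import math

def hellinger_distance(p, q):
    """Standard Hellinger distance between two discrete distributions.

    p, q: dicts mapping an outcome (here, a neighbor id) to a probability.
    Outcomes missing from one dict are treated as probability 0.
    The result is 0 for identical distributions and 1 for disjoint ones.
    """
    keys = set(p) | set(q)
    squared = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.sqrt(squared) / math.sqrt(2.0)

# Hypothetical normalized edge weights for one phone number in two periods.
period_1 = {"n1": 0.5, "n2": 0.5}
period_2 = {"n1": 0.5, "n3": 0.5}
print(hellinger_distance(period_1, period_1))  # 0.0
print(hellinger_distance(period_1, period_2))
```

A parameter search in this setting would compute such a distance for each candidate (θ, ε, k) and keep the setting with the best value.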
  • 89. Our Approach: Parameter Setting. Defaults: θ = 1 (controls the decay of edges and edge weights); ε = 0 (global pruning parameter); k = ∞ (local pruning parameter)
  • 90. Our Approach: Summary
      • Entities are updated daily for all 350 million phone numbers
      • Up-to-date representation of all entities, stored in an indexed database for easy storage and retrieval
      • Our two main challenges:
          – Scale: the entities are updated on a daily basis, so they do not have to be rebuilt when retrieved; entities are concise summaries and are indexed for fast retrieval
          – Dynamic nature of data: entities are a summary of behavior over a time period (determined by θ) and can be tracked through time