Machine learning on big data for personalized Internet advertising
1. Machine Learning on Big Data for Personalized Advertising
M. RECCE
QCon, 11/18/2011
© 2011 Quantcast. All Rights Reserved
2. Advertising has long wanted better algorithms
“Half the money I spend on advertising is wasted; the trouble is I don't know which half.”
John Wanamaker, “The Father of Modern Advertising”
3. Outline
• Internet advertising (the business)
• Internet advertising (the data)
• Understanding consumers (the models)
• Organizing for success
4. The Personalized Media Economy
Media is transitioning from a “one size fits all” broadcast model to dynamic real-time choice
[Figure: Online Advertising Ecosystem]
5. Money Follows Media Consumption
Globally, hundreds of billions of dollars of ad spend will shift
$30B opportunity
6. Why the Spending Disparity?
• Media spend processes are well established
• New media channels lag until audiences and value can be properly quantified
• Historically, digital audiences were poorly quantified
– Stratified sampling has been the norm in media measurement for decades
– Bias and sampling error prevail
7. Enter Quantcast
• Launched September 2006 to enable addressable advertising at scale
• First we had to fix audience measurement
• Launched a free service based on direct measurement of media consumption
• Use machine learning to infer audience characteristics
9. An Advertising Data Explosion
• Massive expansion in number of decisions
– Individuals, not whole audiences
– Impressions, not whole sites
– Screens/times/locations/…
• Decision timeframe reduced from weeks to milliseconds
• This problem can only be solved algorithmically
10. Data Rich Environment
• 4 Billion cookies/mo. observed
• 400,000+ events/sec real-time transactions
• 600+ Billion events/mo. media consumption
• 1.3 Billion global users
• 240 Million U.S. users (everyone)
• 800x/person per month avg. observations
• 5 Petabytes per day data processed
• 100+ Million destinations with QC tags
WHOLE LOT OF DATA!
11. Rise of Real-Time Audience Targeting
“...let advertisers buy ads in the milliseconds between the time someone enters a site’s Web address and the moment the page appears. The technology, called real-time bidding, allows advertisers to examine site visitors one by one and bid to serve them ads almost instantly...A consumer would barely notice the shift, except that ads might seem more relevant to exactly what they are shopping for.”
- New York Times, March 12
More relevant ads, more effective campaigns, higher inventory utilization & higher CPMs
12. RTB – A Rapid & Transformational Industry Shift
Quantcast Auction Volume (UK & US), Billions of Auctions / Day
Feb ’10: 300M
Apr ’10: 400M
Jul ’10: 800M
Oct ’10: 1.2B
Jan ’11: 2.0B
Apr ’11: 3.2B
Jul ’11: 5.4B
Sep ’11: 7.2B
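The growth implied by the auction volumes above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: the data points are the ones on the slide, while the month offsets (counted from Feb 2010) and the helper name `compound_monthly_growth` are assumptions introduced for illustration.

```python
# Auctions per day as read off the slide: (label, months since Feb 2010, volume).
volumes = [
    ("Feb '10", 0, 0.3e9),
    ("Apr '10", 2, 0.4e9),
    ("Jul '10", 5, 0.8e9),
    ("Oct '10", 8, 1.2e9),
    ("Jan '11", 11, 2.0e9),
    ("Apr '11", 14, 3.2e9),
    ("Jul '11", 17, 5.4e9),
    ("Sep '11", 19, 7.2e9),
]


def compound_monthly_growth(first, last):
    """Average month-over-month growth rate between two data points."""
    _, m0, v0 = first
    _, m1, v1 = last
    return (v1 / v0) ** (1.0 / (m1 - m0)) - 1.0


overall_multiple = volumes[-1][2] / volumes[0][2]
monthly_rate = compound_monthly_growth(volumes[0], volumes[-1])
print(f"{overall_multiple:.0f}x growth, roughly {monthly_rate:.1%} per month")
```

Under these assumptions the first and last points correspond to a 24-fold increase over 19 months, which works out to a compound rate of a little over 18% per month.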
13. Media Buying & Execution is Changing
2005 → Now
• $200B → $200B
• Transaction: Buy Whole Sites → Real-Time Bidding
• Supply Portfolio: 100 Publishers → 100’s of 1000’s Impressions/Second
• Data/Tools: Aggregate Report, Human Analysis → Petascale Computing + Machine Learning
14. Data Mining Challenges
Audience Estimation: Using reference data from a small number of people and a small number of web sites, infer the demographics/attributes of the audience of all sites.
User Estimation: Using media consumption records and audience estimates, determine the characteristics of an Internet user across arbitrary dimensions.
Lookalike Selection: From the behavior of a small number of buyers of a product, determine the set of people who will buy it next.
Live Traffic Modeling: Compute the value for showing an advertisement to a user as a function of the user, advertising environment, time of day, etc.
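One way to read the live traffic modeling challenge is as an expected-value calculation: the value of an impression is the probability of a conversion given the user, environment, and time of day, multiplied by what a conversion is worth. The sketch below assumes a logistic model; the slides do not say which model family Quantcast uses, and the feature names, weights, and bias are hypothetical placeholders rather than learned values.

```python
import math

# Hypothetical feature weights; in practice these would be learned from data.
WEIGHTS = {
    "user:lookalike_score": 2.0,
    "env:above_the_fold": 0.5,
    "hour:evening": 0.3,
}
BIAS = -6.0  # assumed baseline log-odds of a conversion


def conversion_probability(features):
    """Logistic model over a dict of feature name -> numeric value."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def impression_value(features, value_per_conversion):
    """Expected value of one impression = P(conversion) * value of a conversion."""
    return conversion_probability(features) * value_per_conversion


evening = {"user:lookalike_score": 1.0, "env:above_the_fold": 1.0, "hour:evening": 1.0}
daytime = {"user:lookalike_score": 1.0, "env:above_the_fold": 1.0}
print(impression_value(evening, 50.0), impression_value(daytime, 50.0))
```

With these made-up weights the evening impression is valued higher than the daytime one for the same user, which is the kind of dependence on context the slide describes.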
15. Quantcast Lookalikes for Marketers
Revolutionary Ad Targeting for Performance and Brand
1. Understand marketer’s BEST CUSTOMERS with Quantcast Measurement
2. Isolate DISTINCTIVE INTERESTS
3. Find MILLIONS OF LOOKALIKES
4. Reach them ANYWHERE
PERFORMANCE LOOKALIKES
• Quantcast technology continually optimizes real-time media for advertiser
BRAND LOOKALIKES
• Buy custom audiences from trusted media partners
[Figure label: Your Site Traffic]
16. Lookalike Selection
• Given an archetype group of users, find the feature set that best separates them from their complement
• Features can be positive or negative indicators of content relevance
• Find more that look like them
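The idea of finding features that separate an archetype group from its complement can be illustrated with a simple ranking. This is a minimal sketch, assuming each user is a set of binary features and using a smoothed log-odds difference as the separation score; the slides do not state which criterion is actually used, and `rank_features` and the feature names are hypothetical.

```python
import math
from collections import Counter


def rank_features(archetype, complement, smoothing=1.0):
    """Rank features by smoothed log-odds of appearing in archetype vs. complement.

    archetype, complement: lists of sets of feature names, one set per user.
    Positive scores point toward the archetype, negative toward the complement.
    """
    pos_counts = Counter(f for user in archetype for f in user)
    neg_counts = Counter(f for user in complement for f in user)
    scores = {}
    for feature in set(pos_counts) | set(neg_counts):
        p = (pos_counts[feature] + smoothing) / (len(archetype) + 2 * smoothing)
        q = (neg_counts[feature] + smoothing) / (len(complement) + 2 * smoothing)
        scores[feature] = math.log(p / (1 - p)) - math.log(q / (1 - q))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up users for illustration only.
archetype = [{"site:running", "kw:marathon"}, {"site:running", "site:news"},
             {"kw:marathon", "site:news"}]
complement = [{"site:news"}, {"site:games", "site:news"}, {"site:games"}, {"site:news"}]

ranked = rank_features(archetype, complement)
for feature, score in ranked:
    print(f"{feature:15s} {score:+.2f}")
```

In this toy data the running and marathon features come out as positive indicators, while the games feature is the strongest negative one, matching the slide's point that features can point either way.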
17. Problem Statement
• Math competition
• Largest number of “conversions” (purchasers) during contest “wins”
• Leverage information on prior purchasers to find more
• Decide how to compete
• Bring mathematicians
• More data on each converter
• Management by metrics
• Know what the competitors are doing
18. Lookalike Mass-Production Pipeline
[Diagram labels: Model Configuration; Model; Training; 10,000 Converters; 1000s of Concurrent Models; Trained Models; Scoring; 1.3 Billion Internet Users; 10M Potential Converters. Data volumes shown: 500 TB, 20 TB / Day, Multi PB]
19. Lookalikes Identify Consumers that Will Take Action
1. Start with consumers who purchased
2. Select consumers who didn’t purchase
3. Evaluate world’s largest database of human interests
4. Identify Positive & Negative indicators of purchase
5. If a new consumer looks more like a purchaser than a non-purchaser, they’re a Lookalike
[Figure panels: “Consumers who purchased product” and “Consumers who did not purchase product”; axis: days; legend: Positive, Negative]
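The five steps above can be sketched as a small scoring routine. This is an illustrative simplification in the spirit of naive Bayes (it sums log-likelihood ratios only for the features a consumer exhibits and ignores class priors), not a description of Quantcast's production models; the function names, threshold, and sample data are assumptions.

```python
import math
from collections import Counter


def train_indicators(purchasers, non_purchasers, smoothing=1.0):
    """Steps 1-4: per-feature log-likelihood ratio (positive = purchase indicator)."""
    pos = Counter(f for user in purchasers for f in user)
    neg = Counter(f for user in non_purchasers for f in user)
    weights = {}
    for feature in set(pos) | set(neg):
        p = (pos[feature] + smoothing) / (len(purchasers) + 2 * smoothing)
        q = (neg[feature] + smoothing) / (len(non_purchasers) + 2 * smoothing)
        weights[feature] = math.log(p / q)
    return weights


def is_lookalike(consumer, weights, threshold=0.0):
    """Step 5: sum the indicators present; a score above the threshold is a lookalike."""
    score = sum(weights.get(f, 0.0) for f in consumer)
    return score > threshold, score


# Made-up consumers for illustration only.
purchasers = [{"site:travel", "kw:flights"}, {"site:travel", "geo:nyc"}, {"kw:flights"}]
non_purchasers = [{"site:games"}, {"site:games", "geo:nyc"}, {"site:news"}]

weights = train_indicators(purchasers, non_purchasers)
print(is_lookalike({"site:travel", "geo:nyc"}, weights))
print(is_lookalike({"site:games", "site:news"}, weights))
```

Features never seen in training contribute nothing to the score, so a consumer with only unknown activity is not flagged as a lookalike under the default threshold.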
20. Wide Range of Activity
Websites, keywords, geo-location, ads and more
[Figure label: Conversion Event]
21. RTLAL Bidding Architecture
[Diagram labels: Pixel Data; Model Definition; Model Training and Scoring; Real Time Ad Exchange; Auction Mgmt; Bidding]
24. Media consumption is non-stationary
[Figure: ‘Michael Jackson’ Media Consumption, June 25, 2009. Pages consumed per minute, 13:00 to 19:00]
25. Choose the Right Objective!
Clicks don’t always lead to conversions
The right metric is critical!
[Figure: Indexed Click vs. Conversion Rates]
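To make the click-versus-conversion point concrete, rates can be indexed against the overall average, so that 100 means "average". The sketch below assumes that conventional definition of an index, which the slide does not spell out; the segment names and counts are invented purely for illustration.

```python
def indexed_rates(segments):
    """segments: name -> (impressions, clicks, conversions).

    Returns name -> (click_index, conversion_index), where 100 means
    "same as the overall rate across all segments".
    """
    total_imps = sum(s[0] for s in segments.values())
    overall_ctr = sum(s[1] for s in segments.values()) / total_imps
    overall_cvr = sum(s[2] for s in segments.values()) / total_imps
    return {
        name: (100 * (clicks / imps) / overall_ctr,
               100 * (convs / imps) / overall_cvr)
        for name, (imps, clicks, convs) in segments.items()
    }


# Made-up numbers for illustration only.
segments = {
    "A": (100_000, 300, 5),
    "B": (100_000, 100, 15),
}
indexes = indexed_rates(segments)
for name, (click_idx, conv_idx) in indexes.items():
    print(f"segment {name}: click index {click_idx:.0f}, conversion index {conv_idx:.0f}")
```

In this toy example segment A clicks at 1.5 times the average but converts at half of it, while segment B shows the reverse, so optimizing for clicks would favor the wrong segment.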
26. Machines
High Performance Platform
Multiple Global Datacenters
Ultra-high availability with advanced traffic management
450,000 / Second: real-time events
5PB / Day: processing throughput
27. Math Team Environment
Collaboration
• Regular brainstorming
• Group review meetings
• Shared wiki environment
• Team goals
Independence
• Everyone free to implement their own ideas
• Improved models
• Better metrics
• Visualization methods, etc.
28. Measuring Lift – ROC
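Since the slide's chart did not survive extraction, the sketch below shows one common way to compute the two quantities named in the title. It assumes binary conversion labels and model scores; the area under the ROC curve is computed with the pairwise (Mann-Whitney) formulation, and lift is taken as the conversion rate in the top-scored fraction divided by the overall rate. The function names and sample data are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive outscores a random
    negative, counting ties as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def lift_at(scores, labels, fraction):
    """Conversion rate in the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    overall_rate = sum(labels) / len(labels)
    return top_rate / overall_rate


# Made-up scores and outcomes for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
auc = roc_auc(scores, labels)
top_lift = lift_at(scores, labels, 0.2)
print(f"AUC = {auc:.3f}, lift in top 20% = {top_lift:.2f}")
```

The pairwise formulation is quadratic in the number of users, so it is only suitable for small samples; a rank-based computation would be the usual choice at scale.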
30. Learning ∝ experimentation
To influence billions of real-time decisions every day and millions of dollars of advertising spend
[Timeline labels: To process 100TB with first MapReduce job; New model development; New model in production; Live performance assessment. Durations shown: 6 Hours, 2 Days, Mins, Hours, 2 Weeks]
31. Technology Matters
Leaders will be world-class in every discipline, and will operate all as a fully integrated whole.
• Machine Learning & Optimization
• Comprehensive Coherent Data
• Petascale Big-Data Computing
• Real-Time Tech Mastery
32. If you have all that then.... Having more Data really matters.
33. Numerous Open Challenges
• Dealing with sparsity
• Feature selection
• Real-time scoring and bidding
• ‘True’ performance & attribution modeling
• Lift, lift and more lift!
• Handling 100,000’s of concurrent models
34. Summary
• Digital advertising is a vast analytical environment
– Enormous data volumes
– Rich behaviors
– Objective performance metrics
• Marketing will be transformed by computational approaches
• Hundreds of billions of dollars of spend are at stake