Criteo AI Lab: from applied to fundamental AI

Jeremie Mary, 17/09/18
From applied to fundamental research

Copyright © 2018 Criteo
AI applied to Criteo Dynamic Retargeting since 2008
• Universal Match: one user profile across all devices
• Product Recommendations: chooses the right products to display
• Kinetic Design: chooses the right look and feel for the banners in real time
• Predictive Bidding: chooses the right users / advertiser / publisher to display
Personalized Ads, Optimized Performance
eCPM = CPC * pCTR * pCR * pOV
Optimized on CTR + CR + Order Value

Outline
1. Fusion of modalities
2. Auction theory meets Machine Learning
3. Hot topics

Fusion of heterogeneous data
Problem: how to build a predictor based on completely different kinds of data?
E.g. pictures and texts, when you want to predict the interest of the user for the item.
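[Diagram: the picture goes through your favorite neural network for pictures (ResNet?) to an embedding, then Prediction 1. Some description text or tags go through a network for this (BiGRU with GA?) to an embedding, then Prediction 2. The two predictions are combined: vote, or average.]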
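A minimal sketch of the "vote or average" step, assuming each modality-specific network already outputs either a probability or a label. The function names and the numbers are hypothetical; the third label in the vote only illustrates how a majority is formed.

```python
from collections import Counter
from statistics import mean


def average_fusion(probabilities):
    """Late fusion: average the probabilities predicted by each modality."""
    return mean(probabilities)


def vote_fusion(labels):
    """Late fusion: majority vote (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs of the picture network and the text network.
p_image = 0.80
p_text = 0.40
fused = average_fusion([p_image, p_text])

# Hypothetical hard predictions from three predictors.
voted = vote_fusion(["click", "no_click", "click"])
```

In this scheme the two embeddings never interact: each network is trained and evaluated on its own modality, and only the final outputs are combined.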

Fusion of heterogeneous data
Problem: how to build a predictor based on completely different kinds of data?
E.g. pictures and texts, when you want to predict the interest of the user for the item.
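[Diagram: the picture goes through a network for pictures (ResNet?) to an embedding. The question "What is the color of the cat?" goes through a network for this (BiGRU with GA?) to an embedding. The two embeddings are merged into a single prediction.]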
Is it actually good to build the embeddings independently?
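A minimal sketch of the "merge" step, assuming the merge is a plain concatenation followed by a logistic layer. The slide does not say which merge is used, so this is only one possible choice, and all names and values are hypothetical.

```python
import math


def merge(image_embedding, text_embedding):
    """Merge by concatenation (an assumption; other merges are possible)."""
    return list(image_embedding) + list(text_embedding)


def predict(merged, weights, bias):
    """Single logistic unit on top of the merged embedding."""
    score = sum(w * x for w, x in zip(weights, merged)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical embeddings produced independently by the two networks.
image_embedding = [0.2, -0.1, 0.4]
text_embedding = [0.5, 0.3]
weights = [0.1, 0.2, -0.3, 0.4, 0.5]

merged = merge(image_embedding, text_embedding)
p = predict(merged, weights, bias=0.0)
```

Here the picture embedding is computed without ever seeing the question, which is exactly what the slide questions.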

Idea
Batch Norm Parameters
In a good network, the activations of neurons across the data should be similarly distributed [1].
This was introduced as a reparametrization trick to ensure faster convergence.
[1] S. Ioffe and C. Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ICML, 2015.
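A minimal sketch of batch normalization for a single feature, to make the "batch norm parameters" concrete: the learned scale gamma and shift beta are the parameters in question. Training-time statistics only; the running averages used at inference are left out.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize one feature over a batch, then scale by gamma and shift by beta."""
    n = len(batch)
    mu = sum(batch) / n
    var = sum((x - mu) ** 2 for x in batch) / n
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in batch]


# Toy activations of one neuron over a batch of four examples.
activations = [1.0, 2.0, 3.0, 4.0]
out = batch_norm(activations, gamma=2.0, beta=0.5)
```

After normalization the output has mean beta and standard deviation close to gamma, whatever the scale of the incoming activations.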

Few parameters but…
Although batch norm parameters usually make up only 0.2 to 5% of the parameters of the net, their impact on the output is huge [2]
[2] V. Dumoulin, J. Shlens, and M. Kudlur. A Learned Representation For Artistic Style. In Proc. of ICLR, 2017.

An alternative way to fuse modalities
Image Text

… and this works well on VQA
[13] Modulating early visual processing by language. H De Vries, F Strub, J Mary, H Larochelle, O Pietquin, AC Courville, NIPS’17
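A minimal sketch of the idea behind this fusion, assuming the text embedding predicts per-channel shifts of the batch norm scale and shift of the image network. The single linear layer, the weights and the toy activations are hypothetical stand-ins, not the configuration of [13].

```python
import math


def predict_deltas(text_embedding, w_gamma, w_beta):
    """One linear layer per parameter: text embedding -> per-channel shifts."""
    d_gamma = [sum(w * e for w, e in zip(row, text_embedding)) for row in w_gamma]
    d_beta = [sum(w * e for w, e in zip(row, text_embedding)) for row in w_beta]
    return d_gamma, d_beta


def conditional_batch_norm(channels, gamma, beta, d_gamma, d_beta, eps=1e-5):
    """channels[c] holds the batch of activations of image channel c."""
    out = []
    for c, batch in enumerate(channels):
        mu = sum(batch) / len(batch)
        var = sum((x - mu) ** 2 for x in batch) / len(batch)
        g = gamma[c] + d_gamma[c]  # scale modulated by the text
        b = beta[c] + d_beta[c]    # shift modulated by the text
        out.append([g * (x - mu) / math.sqrt(var + eps) + b for x in batch])
    return out


# Hypothetical text embedding and projection weights.
text_embedding = [1.0, -1.0]
w_gamma = [[0.5, 0.0], [0.0, 0.5]]
w_beta = [[0.0, 1.0], [1.0, 0.0]]
d_gamma, d_beta = predict_deltas(text_embedding, w_gamma, w_beta)

# Two image channels, batch of two activations each.
channels = [[1.0, 3.0], [2.0, 6.0]]
out = conditional_batch_norm(channels, [1.0, 1.0], [0.0, 0.0], d_gamma, d_beta)
```

The text changes how every modulated layer of the image network behaves, so the image embedding is no longer built independently of the question.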

And actually change the embedding construction from

And actually change the embedding construction to

Doing it using several states of the RNN

ReferIt / Guesswhat oracle problem

ReferIt / Guesswhat oracle
Visual Reasoning with a Multi-hop FiLM Generator
Florian Strub, Mathieu Seurin, Ethan Perez, Harm De Vries, Jeremie Mary, Philippe Preux, Aaron Courville, Olivier Pietquin

Cherry Picking

Cherry Picking Failures

We are a bidding company
More than 300 billion bids a day. Less than 10 ms to compute a price.
Problem
1 seller with 1 item
n bidders, bidder i has private valuation vi
“valuation” = maximum willingness-to-pay
“private” = initially known only to bidder i
Second-price auction
collect bid bi from each bidder i
winner = highest bidder
price = second-highest bid
Very often our price is way higher than the competition.
Theorem: the second-price auction renders truthful bidding a dominant strategy.
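A minimal sketch of the second-price rule described on this slide: collect the bids, give the item to the highest bidder, charge the second-highest bid. Bidder names and amounts are hypothetical, and ties are broken by sort order.

```python
def second_price_auction(bids):
    """bids: dict bidder -> bid. Returns (winner, price)."""
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # the winner pays the second-highest bid
    return winner, price


winner, price = second_price_auction({"a": 3.0, "b": 5.0, "c": 4.0})
```

The winner's own bid decides only whether she wins, never what she pays, which is the intuition behind truthful bidding being a dominant strategy.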

Reserve Prices (Seller point of view)
Will extract more $$$ at the cost of not selling some displays
How to choose it ?
Assumptions:
•Bidder’s valuation v drawn from distribution F. (F known to seller, v unknown)
•Seller aims to maximize expected revenue (w.r.t. v~F)
Solution: offer $r^* = \arg\max_{r \ge 0} \; r \, (1 - F(r))$, where the factor $r$ is the revenue of a sale and $1 - F(r)$ is the probability of a sale.
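A minimal sketch of the monopoly reserve computation, assuming F is the uniform distribution on [0, 1] (the distribution used in the illustrations later in the talk) and using a plain grid search rather than a closed form.

```python
def uniform_cdf(r):
    """CDF of U[0, 1]."""
    return min(max(r, 0.0), 1.0)


def monopoly_reserve(cdf, grid):
    """Grid search for the r maximizing r * (1 - F(r))."""
    return max(grid, key=lambda r: r * (1.0 - cdf(r)))


grid = [i / 1000 for i in range(1001)]
r_star = monopoly_reserve(uniform_cdf, grid)
expected_revenue = r_star * (1.0 - uniform_cdf(r_star))
```

For U[0, 1] the objective is r (1 - r), maximized at r = 0.5 with expected revenue 0.25: half of the displays go unsold, but the sold ones bring more.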

Reserve price with several bidders
Theorem [Myerson 81]: with n symmetric i.i.d. bidders, for a second-price auction with a reserve price, the revenue-maximizing reserve price is independent of the number of bidders.
Theorem [Bulow-Klemperer 96]: for every n,
expected revenue of reserve price 0 [with (n+1) i.i.d. bidders] ≥ expected revenue of monopoly reserve [with n i.i.d. bidders]
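A small Monte Carlo sketch of the Bulow-Klemperer comparison, assuming values drawn from U[0, 1] (monopoly reserve 0.5) and n = 2. The function names, the seed and the number of rounds are arbitrary choices for illustration.

```python
import random


def revenue_second_price(values, reserve):
    """Revenue of one second-price auction with a common reserve price."""
    ranked = sorted(values, reverse=True)
    if ranked[0] < reserve:
        return 0.0  # nobody clears the reserve: no sale
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    return max(reserve, runner_up)


def average_revenue(n_bidders, reserve, n_rounds, rng):
    """Average revenue over n_rounds auctions with truthful U[0, 1] bidders."""
    total = 0.0
    for _ in range(n_rounds):
        values = [rng.random() for _ in range(n_bidders)]
        total += revenue_second_price(values, reserve)
    return total / n_rounds


rng = random.Random(0)
n = 2
with_reserve = average_revenue(n, reserve=0.5, n_rounds=100_000, rng=rng)
extra_bidder = average_revenue(n + 1, reserve=0.0, n_rounds=100_000, rng=rng)
```

Under these assumptions the exact values are 5/12 for two bidders with the monopoly reserve and 1/2 for three bidders without reserve, so one extra bidder is worth more than the best reserve price.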

Personalized reserves¹…
Theorem [Hartline/Roughgarden 09]: for any valuation distributions F1,...,Fn:
expected revenue with monopoly reserves (ri = monopoly price for Fi) ≥ 50% of expected revenue of Myerson’s optimal auction for F1,...,Fn
¹ Yes, a bidder can lose the auction while having the highest bid.
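A minimal sketch of a second-price auction with personalized reserves, assuming the "eager" variant: bidders below their own reserve are removed first, then the auction runs among the rest. The slide does not say which variant is meant, and the names and amounts are hypothetical.

```python
def auction_with_personal_reserves(bids, reserves):
    """bids, reserves: dict bidder -> amount. Returns (winner, price) or (None, 0.0)."""
    eligible = {b: v for b, v in bids.items() if v >= reserves[b]}
    if not eligible:
        return None, 0.0
    ranked = sorted(eligible.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # The winner pays at least her own reserve.
    return winner, max(reserves[winner], runner_up)


bids = {"a": 0.6, "b": 0.5}
reserves = {"a": 0.7, "b": 0.4}
winner, price = auction_with_personal_reserves(bids, reserves)
```

Bidder "a" has the highest bid but sits below her personal reserve, so "b" wins and pays her own reserve: the situation the footnote points at.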

In real bidding
F is unknown and is estimated from the bids.
Done by [Ostrovsky/Schwarz 09] at Yahoo
Analysis leads to some finite-time, ML-style bounds by [Morgenstern/Roughgarden 15, 16].
Typically requires O(n log n) samples in the multiple-bidder setting to achieve expected revenue within ε of the best possible.
This assumes that the bidders reveal their true values.
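A minimal sketch of estimating the reserve price from observed bids, assuming a single bidder whose truthful bids are drawn from U[0, 1]. The estimator simply maximizes the empirical version of r (1 - F(r)) over the observed bids; it is an illustration, not the procedure of the cited works.

```python
import random


def empirical_monopoly_reserve(bids):
    """Pick r among the observed bids maximizing r * (fraction of bids >= r)."""
    ordered = sorted(bids)
    n = len(ordered)
    best_r, best_revenue = 0.0, 0.0
    for i, r in enumerate(ordered):
        revenue = r * (n - i) / n  # n - i bids are >= r when bids are distinct
        if revenue > best_revenue:
            best_r, best_revenue = r, revenue
    return best_r


rng = random.Random(0)
bids = [rng.random() for _ in range(20_000)]  # truthful bids, by assumption
r_hat = empirical_monopoly_reserve(bids)
```

The estimate lands near the true monopoly price 0.5, but only because the bids are the true values. A bidder who knows her bids feed this estimator has a reason to bid differently.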

One strategic bidder setting
A two stage game.
First day: the seller receives billions of bids from the bidders (we do not consider any approximation error).
Second day: she sets, for each bidder, the reserve price as the exact monopoly price computed on the bids she received during the first stage.
We denote by F1, ..., FN the value distributions of the bidders. We assume bidder 1 is strategic and the others continue to bid truthfully.
G is the distribution of the maximum value of the competitors of bidder 1.
In all illustrations, the true distribution of values is U[0, 1].

Myerson lemma
Defining virtual values
Suppose bidder i has value Xi with distribution Fi and associated density fi. fi is assumed to be positive on the support of Xi. For any incentive compatible auction, when G represents the distribution of the bids faced by bidder i, we have, if r is the reserve price set by the seller,
regardless of whether ψi is increasing.
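As a reminder of the standard textbook statement (assumed here to match the slide, and written for a second-price auction with reserve r), the virtual value and the expected payment of bidder i can be written as:

```latex
\psi_i(x) = x - \frac{1 - F_i(x)}{f_i(x)},
\qquad
\mathbb{E}\big[\text{payment of bidder } i\big]
  = \mathbb{E}\big[\psi_i(X_i)\, G(X_i)\, \mathbf{1}\{X_i \ge r\}\big].
```

For values drawn from U[0, 1], this gives ψ(x) = 2x - 1, which vanishes at x = 0.5, the monopoly price.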

Visualization of Myerson’s lemma

β shading
The payoff of the strategic bidder using the strategy β (ψB denotes the virtual value associated with the new distribution of bids) is:
And we can remark:
find a « good » ψB and then the corresponding β.

Which is the nicest ?

Thresholded virtual value
Just solve
On the uniform example this is
And identity for >0.5

Comparison of revenue
• The payoff of the strategic bidder increases from 0.083 to 0.132 (a 59% increase!).
• The payoff of the truthful bidder remains unchanged.
• The payoff of the seller remains unchanged.
• In particular, the seller does not lose money.
• Welfare increases from 0.583 to 0.632 (an 8% increase!).
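A small Monte Carlo sketch of the baseline (everyone truthful), assuming two bidders with values drawn from U[0, 1] and a reserve price of 0.5 for both. The number of bidders is an assumption; it is the setting in which the simulation lands on the "before" figures 0.083 and 0.583. The thresholded strategy itself is not simulated here.

```python
import random


def simulate_truthful(n_rounds, reserve, rng):
    """Two truthful bidders, values U[0, 1], same reserve price for both.

    Returns (payoff of bidder 1, seller revenue, welfare), averaged per round.
    """
    payoff_1 = revenue = welfare = 0.0
    for _ in range(n_rounds):
        v1, v2 = rng.random(), rng.random()
        high, low = max(v1, v2), min(v1, v2)
        if high < reserve:
            continue  # no sale this round
        price = max(reserve, low)
        revenue += price
        welfare += high
        if v1 > v2:
            payoff_1 += v1 - price
    return payoff_1 / n_rounds, revenue / n_rounds, welfare / n_rounds


payoff, revenue, welfare = simulate_truthful(200_000, reserve=0.5, rng=random.Random(0))
```

Under these assumptions the exact values are 1/12 for each bidder, 5/12 for the seller and 7/12 for the welfare.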

More on the topic
Does it cost something to the strategic bidder during the learning stage of the auctioneer? No! Since the strategy only changes bids below the reserve price, the strategic bidder pays nothing to try to convince the seller to decrease the reserve price.
Can we do better? Yes! We only presented the simplest way to improve a bidding strategy. There exist better strategies that lead to even higher payoffs.
In this setting, can we find a Nash equilibrium when all the bidders become strategic? Yes!
Are our proposed strategies stable against some approximation error of the seller? Yes!
Thresholding the virtual value: a simple method to increase welfare and lower reserve prices in online auction systems
Thomas Nedelec, Marc Abeille, Clément Calauzènes, Noureddine El Karoui, Benjamin Heymann, Vianney Perchet
Explicit shading strategies for repeated truthful auctions. arXiv preprint arXiv:1805.00256, 2018
Marc Abeille, Clement Calauzenes, Noureddine El Karoui, Thomas Nedelec, Vianney Perchet.

Alternating Recommender Systems
Recommender systems
• Users can get bored seeing similar movies over and over
• Getting to know a new system can take time: curiosity increases at first and then decreases after a while
Task scheduling
• It might take a while to master a new task, so performance increases after the task is repeated
• Always repeating the same task can reduce productivity because of weariness
Resource balancing
• Always exploiting the same area can diminish returns if the population cannot grow again
[Figure: example sequences alternating A and B, e.g. A B A B B and B A A B A B]

state | click probability on A
[A,A,B,B,A,A,A,B,B,A] | 8.53%
[A,B,B,A,B,B,A,B,A,B] | 9.12%
[B,B,B,B,A,A,A,B,B,A] | 8.91%
• We use a real-world A/B testing dataset where our model assumptions are no longer satisfied. Users have been exposed to both A and B. We investigate how a long-term policy alternating A and B on the basis of past choices can outperform each solution individually.
• Simulator: measures the click probability on a version based on the last w = 10 pulled versions.
reward = Bernoulli(p), with p the click probability of the pulled version
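A minimal sketch of such a simulator, assuming the state is the window of the last w = 10 pulled versions and the reward is a Bernoulli draw. The click model below (a base rate reduced by a boredom penalty) is purely hypothetical; the real simulator reads its click probabilities from the A/B test data.

```python
import random
from collections import deque

W = 10  # window length from the slide


def click_probability(state, version):
    """Hypothetical model: base rate minus a boredom penalty that grows with
    the share of `version` in the window."""
    base = {"A": 0.10, "B": 0.09}[version]
    share = sum(1 for v in state if v == version) / len(state)
    return base * (1.0 - 0.3 * share)


def step(state, version, rng):
    """Draw reward ~ Bernoulli(p), then slide the window."""
    p = click_probability(state, version)
    reward = 1 if rng.random() < p else 0
    state.append(version)  # deque(maxlen=W) drops the oldest entry
    return reward


rng = random.Random(0)
state = deque(["A", "B"] * (W // 2), maxlen=W)
# A simple alternating policy, played for 1000 steps.
rewards = [step(state, "A" if t % 2 == 0 else "B", rng) for t in range(1000)]
```

A policy is then judged by its average reward, which depends on the whole sequence of past pulls and not only on the version currently shown.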
Compared algorithms
• Oracle optimal: optimal policy given the true parameters
• Oracle greedy: greedy policy given the true parameters
• UCRL (Auer, Jaksch, and Ortner 2009): considers each action and state independently
• linUCRL: our algorithm
• Only B: always play B (click rate of state [B, …, B])
• Only A: always play A (click rate of state [A, …, A])
[Plots: average reward over the T steps; average reward after T = 1600]
On Criteo’s A/B tests (NIPS’18)
Romain Warlop , Alessandro Lazaric, Jeremie Mary

More
• DPPs for basket completion (look at work of Mike Gartrell)
• Exploration / Exploitation under Brownian evolution of the world
• GANs
• RNNs (and approximations) for session modeling
• Causality, Incrementality and offline A/B tests.

Thank you !
j.mary@criteo.com
https://aiaheadofusbycriteoailab.splashthat.com/
