This document discusses the data management platform (DMP) used for ad targeting and delivery in SmartNews Ads. The DMP collects, cleans, and aggregates over 14 million user profiles and ad data from multiple sources. It uses this first-party data to perform user clustering, CTR and CVR prediction using machine learning models, and lookalike targeting. Future work may include targeting based on user interests and collecting negative feedback to optimize the user experience.
2. Who am I
• Lan
• Veteran hacker, but new to the AD world
• "someone who can make a computer do what he wants, whether the computer wants to or not." (http://paulgraham.com/gba.html)
• ex-{Rakuten, GREE}
• Distributed Systems, Information Retrieval, ML
3. Today’s Talk
• DMP in SmartNews Ads
• #1. Prediction
• #2. Targeting
• Future Work & Summary
5. DMP in SmartNews Ads
• Private DMP (90%+ first-party data)
• Data collection, cleaning, aggregation
• ID Mapping
• User Profiling
• User Clustering
• CTR / CVR Prediction
• Lookalike
• Custom Audience
6. DMP
[Figure: DMP architecture diagram. Components shown: AD delivery cluster, Video AD delivery cluster, AD tracker, AD log in S3, Kinesis, SmartNews log, DMP clusters (streaming, Hadoop, ML, analytics), audience data in DynamoDB / RDB, models & targeting.]
Small company but not small data
• Article Meta > 200K/day
• Article x {read, share, read_related …}
• Channel x {subscribe, preview, view, …}
• Push, Live, Weather, Setting, …
• Survey result
• Audience Data > 14M (~5M MAU)
• AD Meta
• AD History
• AD Conversions
• AD Optout
• Managed/Compressed Data > 130TB
• Lookalike seeds
• ~1TB Data for training CTR prediction model
• > 1M unique features
• User Demographics
• Device
• Locations
• …
10. More than Ranking
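• When we run an AD auction
• eCPM (effective Cost per Mille) = CTR (Click Through Rate) x CPC (Cost per Click) x 1000, so ads are ranked by CTR x CPC (expected income per impression)
• Suppose the true values are
• CTR_ad1 = 0.05 > CTR_ad2 = 0.04 > CTR_ad3 = 0.03
• CPC_ad1 = 10 JPY, CPC_ad2 = 13 JPY, CPC_ad3 = 20 JPY (winner: 0.03 x 20 = 0.6 JPY per impression)
• but if we predict: pCTR_ad1 = 0.2 (winner) > pCTR_ad2 = 0.1 > pCTR_ad3 = 0.03
• then ad1 wins with a true value of 0.05 x 10 = 0.5 JPY, and we lose 0.1 JPY of potential income per impression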
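The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the dictionary layout and the function names (`expected_revenue`, `pick_winner`) are made up for illustration, and only the CTR, pCTR and CPC numbers come from the slide.

```python
# Numbers from the slide: true CTR, (mis)predicted CTR, and CPC in JPY.
# The data layout and helper names are hypothetical.
ads = {
    "ad1": {"ctr": 0.05, "pctr": 0.2, "cpc": 10},
    "ad2": {"ctr": 0.04, "pctr": 0.1, "cpc": 13},
    "ad3": {"ctr": 0.03, "pctr": 0.03, "cpc": 20},
}


def expected_revenue(ctr, cpc):
    """Expected income per impression in JPY (eCPM is 1000x this value)."""
    return ctr * cpc


def pick_winner(ads, ctr_key):
    """Return the ad with the highest CTR x CPC, using the given CTR column."""
    return max(
        ads,
        key=lambda name: expected_revenue(ads[name][ctr_key], ads[name]["cpc"]),
    )


ideal_winner = pick_winner(ads, "ctr")    # ranking with the true CTR
actual_winner = pick_winner(ads, "pctr")  # ranking with the predicted CTR

# Income is always realized at the true CTR, whatever the prediction said.
loss_per_impression = (
    expected_revenue(ads[ideal_winner]["ctr"], ads[ideal_winner]["cpc"])
    - expected_revenue(ads[actual_winner]["ctr"], ads[actual_winner]["cpc"])
)
print(ideal_winner, actual_winner, round(loss_per_impression, 2))
```

The point is that a prediction error does more than reorder the ranking: it changes which ad is shown, and the difference is lost income.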
12. CTR Prediction v1
• Training and scoring daily
• One GBDT (Gradient Boosting Decision Tree) model per AD campaign
• using ~1month’s data
• Hundreds of small batches inside Hadoop Yarn
• Quick and Simple
• developed in 1 month
• pick the best features for every campaign
• minutes ~ 1 hour for model training
• explainable Tree models
• no need for AD features
• Same approach for CVR prediction (CPC / CVR = CPA (Cost Per Acquisition) )
[Figure: CTR Prediction v1 pipeline. Delivery results and user features are used to generate samples; on Yarn, many parallel batches each run sample → model → scoring; the output is predictions for users.]
13. Metrics
• NE (Normalized Cross-Entropy)
• the average log loss per impression when using the predicted CTR / the average log loss per impression when predicting the background (average) CTR for every impression
• https://facebook.com//download/321355358042503/adkdd_2014_camera_ready_junfeng.pdf
• AUC (Area under the ROC curve, AUROC)
• measure ranking quality
• others: Precision/Recall, ECS (Effective Catalog Size), CTR / CVR / Sales, etc.
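As a rough illustration of the two main metrics, here is a small pure-Python sketch. The function names and the toy `labels` / `preds` are invented for this example; a production pipeline would compute the same quantities over the impression log.

```python
import math


def log_loss(labels, preds, eps=1e-12):
    """Average log loss per impression."""
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def normalized_entropy(labels, preds):
    """Log loss divided by the log loss of always predicting the average CTR."""
    base_ctr = sum(labels) / len(labels)
    baseline = log_loss(labels, [base_ctr] * len(labels))
    return log_loss(labels, preds) / baseline


def auc(labels, preds):
    """Probability that a random click is scored above a random non-click."""
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: 1 = click, 0 = no click (hypothetical values).
labels = [1, 0, 0, 1, 0, 0, 0, 0]
preds = [0.7, 0.2, 0.1, 0.4, 0.3, 0.1, 0.2, 0.5]

print("NE :", normalized_entropy(labels, preds))
print("AUC:", auc(labels, preds))
```

NE below 1 means the model beats the "always predict the average CTR" baseline; AUC only looks at ordering, so it says nothing about whether the predicted CTR values are calibrated.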
14. Review of CTR Prediction v1
• Marked improvement, moderate AUC & NE
• However
• hard to do overall tuning
• hard to predict online (the feature set differs per campaign)
• latency for new campaigns
• relatively poor performance for new campaigns (cold start)
• loses the connections between campaigns, even for the same advertiser
• …
15. CTR Prediction v2
• A single model for all campaigns
• AD features added
• Dynamic feature extraction
• All calculations distributed
• GBDT + Logistic Regression
• Train once per day, score twice
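The slide does not say how GBDT and Logistic Regression are combined. One common arrangement, used here purely as an assumption, is to let each tree map a user to a leaf and feed the one-hot leaf indicators into the logistic regression. The two hand-written trees, the feature names and the weights below are all hypothetical stand-ins for trained models.

```python
import math

# Hypothetical hand-written trees standing in for a trained GBDT.
# Internal node: (feature_name, threshold, left, right); leaf: int leaf id.
TREES = [
    ("age", 30, 0, ("is_ios", 0.5, 1, 2)),
    ("daily_reads", 5, 0, 1),
]
LEAVES_PER_TREE = [3, 2]


def leaf_index(tree, user):
    """Walk one tree and return the id of the leaf the user falls into."""
    node = tree
    while not isinstance(node, int):
        feature, threshold, left, right = node
        node = left if user[feature] <= threshold else right
    return node


def gbdt_features(user):
    """Concatenate one one-hot leaf indicator vector per tree."""
    vec = []
    for tree, n_leaves in zip(TREES, LEAVES_PER_TREE):
        one_hot = [0] * n_leaves
        one_hot[leaf_index(tree, user)] = 1
        vec.extend(one_hot)
    return vec


def lr_predict(weights, bias, features):
    """Logistic regression on top of the leaf indicators."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


user = {"age": 35, "is_ios": 1, "daily_reads": 3}
features = gbdt_features(user)              # one active leaf per tree
weights = [-1.0, 0.2, 0.8, -0.3, 0.5]       # hypothetical LR weights
pctr = lr_predict(weights, -3.0, features)
print(features, pctr)
```

In this arrangement the trees do the non-linear feature engineering and the linear model stays cheap to retrain and score.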
16. About the Features
• >1M unique features, sparse
• GBDT provides great feature engineering
• (sometimes) feature engineering is a matter of intuition and trial and error
• demographic, device, location, reading interests…
• AD history is helpful
• Feature Hashing, Binarization & Discretization, …
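To make the last bullet concrete, here is a minimal sketch of discretization plus feature hashing in plain Python. The bucket boundaries, the field names (`age`, `device`, `prefecture`, `channels`) and the bucket count are assumptions for illustration, not the production feature set.

```python
import hashlib

NUM_BUCKETS = 2 ** 20  # hypothetical size of the hashed feature space


def discretize_age(age):
    """Turn a numeric age into a categorical bucket label."""
    for upper, label in [(20, "<20"), (30, "20-29"), (40, "30-39"), (50, "40-49")]:
        if age < upper:
            return label
    return "50+"


def hash_feature(name, value, num_buckets=NUM_BUCKETS):
    """Map a 'name=value' token to a stable index in [0, num_buckets)."""
    digest = hashlib.md5(f"{name}={value}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def encode_user(user):
    """Return the sorted indices of the active (binary) features of a user."""
    tokens = [
        ("age_bucket", discretize_age(user["age"])),
        ("device", user["device"]),
        ("prefecture", user["prefecture"]),
    ]
    tokens += [("read_channel", channel) for channel in user["channels"]]
    return sorted({hash_feature(name, value) for name, value in tokens})


user = {
    "age": 34,
    "device": "iPhone",
    "prefecture": "Tokyo",
    "channels": ["tech", "sports"],
}
print(encode_user(user))
```

Each user becomes a short list of active indices in a large sparse vector, so new feature values need no dictionary update; the price is occasional hash collisions.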
20. Profiling Users by Statistics and ML
• Gender Prediction (precision: 0.90+), Age Prediction, …
• News Channel / Source Preference
• AD Slot Preference
• …
23. Lookalike Targeting
• Our solution
• Solve it as a classification problem
• Seed users as positive samples
• All targeting candidates as negative samples (with random sampling)
• based on Spark MLlib Logistic Regression
• CVR up 30%~50% compared to normal targeting
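The sampling scheme above can be sketched without Spark. The example below swaps Spark MLlib for a tiny hand-rolled logistic regression so that it runs on its own; the two binary features (`reads_tech`, `reads_sports`), the data and the hyperparameters are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(samples, labels, epochs=200, lr=0.1, seed=0):
    """Plain SGD logistic regression (a stand-in for Spark MLlib)."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


rng = random.Random(42)

# Features per user: [reads_tech, reads_sports] (hypothetical).
seeds = [[1, 0]] * 20 + [[1, 1]] * 5                  # advertiser's seed users
candidates = [[rng.randint(0, 1), rng.randint(0, 1)] for _ in range(200)]

# Positive = seed users; negative = random sample of all targeting candidates.
negatives = rng.sample(candidates, 50)
samples = seeds + negatives
labels = [1] * len(seeds) + [0] * len(negatives)

weights, bias = train_logistic_regression(samples, labels)


def lookalike_score(user):
    """Higher score = more similar to the seed users."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, user)))


print(lookalike_score([1, 0]), lookalike_score([0, 1]))
```

Candidates are then ranked by `lookalike_score` and the top slice is targeted. Treating unlabeled candidates as negatives is noisy (some of them do resemble the seeds), which is one reason to sample them randomly rather than hand-pick them.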
25. Custom Audience
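[Figure: Custom Audience flow. Your service / app / site sends any custom event (S2S request, web beacon, etc.) to the SmartNews AD tracker; events become an audience BloomFilter object, updated every several minutes; the SmartNews AD delivery cluster uses it for AD targeting / delete targeting; the diagram also links the audience to Lookalike and Lookalike Targeting.]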
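The audience object in this flow is a Bloom filter. A minimal sketch of such a filter follows; the class, its sizing parameters and the user IDs are hypothetical and do not reflect the production object format.

```python
import hashlib


class BloomFilter:
    """Compact set membership: no false negatives, rare false positives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Users who triggered a custom event (hypothetical IDs).
audience = BloomFilter()
for user_id in ["user-001", "user-002", "user-003"]:
    audience.add(user_id)

# At delivery time: target (or exclude) the user if the filter matches.
print("user-002" in audience)  # True: added IDs always match
print("user-999" in audience)  # almost certainly False at this fill level
```

A filter like this stays small and fast to query no matter how many events were sent, which suits an object that is rebuilt every few minutes and shipped to the delivery cluster. The trade-off is a small false-positive rate and no way to remove a single user without rebuilding.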
29. Summary of My 1st SmartNews Year
• A challenging place. We're a startup, so we can move fast and break things
• Learn from the industry leaders. Keep up the trial and error.
• Numbers don't lie. Don't trust your intuition over the numbers.
• But if you really doubt the numbers, look closely. There may be a bug hidden.