The document discusses cross-device tracking techniques used to cluster cookies and user profiles into graphs to identify when different devices belong to the same user. It describes using features like location, time, IP addresses, and device types with machine learning classifiers to determine cookie associations across devices with high precision and recall. Challenges include limited data, anti-tracking mechanisms, and obscurity of identifying information.
1. Cross-Device Tracking
June 2019
Morten Arngren, Lead Data Scientist
Vlad Sandulescu, Senior Data Scientist
Jan Kremer, Data Scientist
Damian Pawlowski, Senior Software Engineer
Sergey Gluschenko, Senior Software Engineer
Tomasz Perek, Senior Software Engineer
16. Cookie association by Pair-wise classification
Users: 👽 👻
Cookies: A B C D
Create cookie pairs as classification data set…
…with ground truth
Positive: A B, C D
Negative: A C, A D, B C, B D
Pairs for a new cookie E: A E, B E, C E, D E
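A minimal sketch of how the pair data set on this slide could be built. The user labels `user_1` and `user_2` and both helper names are hypothetical; the slide only shows that A B and C D are positive pairs and that the remaining combinations are negative.

```python
from itertools import combinations

def build_pair_dataset(cookie_to_user):
    """Return ((cookie_a, cookie_b), label) rows; label is 1 when both cookies share a user."""
    rows = []
    for a, b in combinations(sorted(cookie_to_user), 2):
        label = 1 if cookie_to_user[a] == cookie_to_user[b] else 0
        rows.append(((a, b), label))
    return rows

def candidate_pairs(new_cookie, known_cookies):
    """Pair an unlabeled cookie with every known cookie so a classifier can score each pair."""
    return [(c, new_cookie) for c in sorted(known_cookies)]

# Hypothetical ground truth matching the slide: A and B share a user, C and D share a user.
ground_truth = {"A": "user_1", "B": "user_1", "C": "user_2", "D": "user_2"}

dataset = build_pair_dataset(ground_truth)
positives = [pair for pair, label in dataset if label == 1]
negatives = [pair for pair, label in dataset if label == 0]
pairs_for_e = candidate_pairs("E", ground_truth)
```

Note that the number of pairs grows quadratically with the number of cookies, which is one reason a pre-filtering step is usually needed before pairs are generated at scale.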
20. Crafting Features
for each cookie pair…
Features: 26 dim. feature vector (Numerical, Sets, Vectorial)
Classifier: Gradient Boosted Decision Trees 🌴🌴🌴
(http://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html)
Example score for pair A B: 0.78
[Figure: Principal Component 1+2]
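To illustrate how boosted trees can turn a pair's feature vector into a score such as the 0.78 shown for A B, here is a deliberately simplified sketch: depth-1 trees fitted to the log-loss gradient in plain Python. It is not the authors' model and not XGBoost; the two toy features, the labels, and all function names are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Depth-1 regression tree: best single-feature threshold split on the residuals."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback: no split, predict the mean
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if err < best[0]:
                best = (err, j, thr, lmean, rmean)
    return best[1:]

def fit_gbdt(X, y, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the negative log-loss gradient (label minus probability)."""
    stumps, raw = [], [0.0] * len(X)
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, raw)]
        j, thr, lval, rval = fit_stump(X, residuals)
        stumps.append((j, thr, learning_rate * lval, learning_rate * rval))
        raw = [s + stumps[-1][2 if row[j] <= thr else 3] for row, s in zip(X, raw)]
    return stumps

def predict_score(stumps, row):
    """Probability-like score that a cookie pair belongs to the same user."""
    return sigmoid(sum(l if row[j] <= thr else r for j, thr, l, r in stumps))

# Toy pair features (hypothetical): [IP overlap, activity-hour similarity]
X = [[1.0, 0.9], [0.8, 0.7], [0.0, 0.1], [0.1, 0.0]]
y = [1, 1, 0, 0]
model = fit_gbdt(X, y)
score_similar = predict_score(model, [0.9, 0.8])
score_different = predict_score(model, [0.05, 0.05])
```

A production library such as XGBoost adds deeper trees, regularization, and second-order leaf weights; the sketch keeps only the core idea of adding small trees that each correct the current residual.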
Observations (per cookie), grouped on the slide as ID, Device, Location, Time:
cookie_id
device_type_id
os_id
browser_id
browser_language_codes
ip_v4
country_id
region_id
city_id
zip_code_id
log_time
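The slide does not list the 26 features themselves, so the following is only a sketch of what set-based, numerical, and vectorial pair features might look like when derived from the observation fields above. The four features chosen, the dict layout of an observation, and the treatment of `log_time` as epoch seconds are all assumptions.

```python
import math

def jaccard(a, b):
    """Set feature: overlap of two value sets, 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def hour_histogram(log_times):
    """Vectorial feature input: 24-bin count of activity per hour of day."""
    hist = [0] * 24
    for t in log_times:
        hist[(t // 3600) % 24] += 1
    return hist

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def pair_features(obs_a, obs_b):
    """obs_a, obs_b: lists of observation dicts, one list per cookie."""
    def field(obs, name):
        return {o[name] for o in obs}
    same_type = field(obs_a, "device_type_id") & field(obs_b, "device_type_id")
    return [
        jaccard(field(obs_a, "ip_v4"), field(obs_b, "ip_v4")),      # shared IP addresses
        jaccard(field(obs_a, "city_id"), field(obs_b, "city_id")),  # shared cities
        1.0 if same_type else 0.0,                                  # same device type seen
        cosine(hour_histogram([o["log_time"] for o in obs_a]),
               hour_histogram([o["log_time"] for o in obs_b])),     # activity-hour similarity
    ]

# Hypothetical observations for two cookies
cookie_a = [
    {"ip_v4": "10.0.0.1", "city_id": 7, "device_type_id": 1, "log_time": 3600},
    {"ip_v4": "10.0.0.2", "city_id": 7, "device_type_id": 1, "log_time": 7200},
]
cookie_b = [
    {"ip_v4": "10.0.0.1", "city_id": 7, "device_type_id": 2, "log_time": 3700},
]
features = pair_features(cookie_a, cookie_b)
```

Each feature is symmetric in the two cookies, so the pair (A, B) and the pair (B, A) produce the same vector.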
23. Classification Clustering
Steps: Pre-Clustering (Connected Components Algorithm), Classification (XGBoost), Pruning, Clustering (Connected Components Algorithm)
Graph stages: Full Graph, Components, Sub-Graph, Clusters, Users
Example nodes: A B C D E F G
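A small sketch of the pruning and clustering steps named on this slide, assuming that pruning means dropping edges whose classifier score falls below a threshold. The edge scores, the 0.5 threshold, and the function names are invented for illustration; the connected components are found with a basic union-find.

```python
def connected_components(nodes, edges):
    """Group nodes that are linked by any path of edges (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

def cluster_users(nodes, scored_edges, threshold=0.5):
    """Prune low-scoring edges, then treat each remaining component as one user."""
    kept = [(a, b) for a, b, score in scored_edges if score >= threshold]
    return connected_components(nodes, kept)

nodes = list("ABCDEFG")
# Hypothetical classifier scores for candidate cookie pairs
scored_edges = [
    ("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7),
    ("C", "D", 0.2), ("D", "E", 0.95), ("E", "F", 0.1), ("F", "G", 0.6),
]
full_graph = connected_components(nodes, [(a, b) for a, b, _ in scored_edges])
clusters = cluster_users(nodes, scored_edges)
```

With these made-up scores the unpruned graph is a single component, while pruning the two weak edges splits it into three clusters. The threshold trades precision against recall: a higher value yields smaller, purer clusters.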