User Behavior Hashing
for Audience Expansion
Praveen Pratury
Yingnan Zhu
Agenda
Praveen Pratury
▪ Overview of Samsung
▪ Samsung Audience platform
▪ Lookalike modeling introduction
Yingnan Zhu
▪ Lookalike approaches
▪ Speed up with Pandas UDF
▪ Model performance
▪ Results
▪ Q & A
Director of Engineering, Samsung Research America
Lead Data Scientist, Samsung Research America
Samsung Overview
Samsung Electronics Today
Samsung Audience Platform
LookAlike Modeling – Samsung Context
Goals
Improve incremental reach and targeting for:
▪ TV networks (identify new audiences to promote new shows)
▪ Samsung new TV purchases (8K, QLED, Terrace, etc.)
The goal is to improve reach and increase conversion for TV shows and new TV purchases.
Approach
By leveraging Samsung’s rich ACR viewership data on 50+ million TVs in the US and applying user behavior hashing techniques:
▪ Identify TV viewers similar to existing audiences based on user behavior
- Find audiences that will respond favorably to show-specific TV ads
▪ Identify existing premium TV owners to expand to future buyers
Look Alike Audience Expansion Example
[Figure: two scatter panels, each labeled "All TV Users’ hash code space". Legend: * = 8K TV users, + = non-8K TV users. A: seed segment. B: expanded segment. Annotation: "Targeted size to expand".]
LookAlike example for an 8K TV campaign. The goal is to identify users who look like 8K TV owners but do not yet own an 8K TV.
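The expansion from seed segment A to segment B can be sketched as a nearest-neighbor ranking in hash code space. This is a minimal illustration, not the platform's implementation: the user ids, the 8-bit codes, the `expand_segment` helper, and the choice of ranking by Hamming distance to the nearest seed user are all assumptions.

```python
from typing import Dict, List, Set


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as ints."""
    return bin(a ^ b).count("1")


def expand_segment(codes: Dict[str, int], seed: Set[str], target_size: int) -> List[str]:
    """Return up to target_size non-seed users closest to any seed user."""
    candidates = []
    for user, code in codes.items():
        if user in seed:
            continue  # seed users are already in segment A
        nearest = min(hamming(code, codes[s]) for s in seed)
        candidates.append((nearest, user))
    candidates.sort()  # closest first; ties broken by user id
    return [user for _, user in candidates[:target_size]]


# Hypothetical 8-bit hash codes; u1 plays the role of an 8K TV owner.
codes = {
    "u1": 0b10110010,
    "u2": 0b10110011,
    "u3": 0b01001101,
    "u4": 0b10100010,
    "u5": 0b10010110,
}
expanded = expand_segment(codes, seed={"u1"}, target_size=2)
```

Here `target_size` plays the role of the "targeted size to expand" in the figure. A brute-force scan like this would not scale to hundreds of millions of users.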
Challenges and solutions
▪ Challenges
▪ Large-scale data:
▪ Search space is huge: Hundreds of millions of Smart TV and Mobile users
▪ Efficiency:
▪ Each device could generate thousands of logs per day
▪ Lookalike user retrieval has a time constraint
▪ Possible solutions
▪ LSH, K-nearest neighbors, similar-user search in recommender systems, etc.
▪ These solutions sacrifice accuracy or efficiency, do not consider contextual information, or are not optimized for time-sequence data
▪ Our solution
▪ Heterogeneous user behavior hash code
▪ Provides LSH-like bucketized fast search while maintaining high accuracy of user similarity
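One way to picture bucketized search over binary hash codes is sketched below. The slides do not say how buckets are formed; keying each bucket on the leading bits of the code, the 16-bit code length, and the helper names are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, List

CODE_BITS = 16   # assumed hash code length
PREFIX_BITS = 4  # assumed number of leading bits used as the bucket key


def bucket_key(code: int) -> int:
    """Leading PREFIX_BITS of the code."""
    return code >> (CODE_BITS - PREFIX_BITS)


def build_index(codes: Dict[str, int]) -> Dict[int, List[str]]:
    """Group users by bucket key so a query scans one bucket only."""
    index = defaultdict(list)
    for user, code in codes.items():
        index[bucket_key(code)].append(user)
    return index


def search(query: int, codes: Dict[str, int], index: Dict[int, List[str]], k: int) -> List[str]:
    """Rank users in the query's bucket by Hamming distance to the query."""
    bucket = index.get(bucket_key(query), [])
    ranked = sorted(bucket, key=lambda u: (bin(codes[u] ^ query).count("1"), u))
    return ranked[:k]


# Hypothetical users and codes.
codes = {
    "a": 0b1010_0000_1111_0000,
    "b": 0b1010_0000_1111_0011,
    "c": 0b1010_1111_0000_1111,
    "d": 0b0101_0000_1111_0000,
}
index = build_index(codes)
query = 0b1010_0000_1111_0001
top2 = search(query, codes, index, k=2)
```

User `d` is never examined because it falls in a different bucket. That is the source of the speed-up, and it is also why the quality of the hash codes matters: similar users need to land in the same bucket.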
Look Alike Work Flow
[Diagram components: Raw Data; Processed User Behavior Data; Deep Binary Hashing Models (various bit lengths); User Hash Codes; Seed Segments; Lookalike Service; Expanded Segments. Stage labels: Offline, Online/Offline.]
Hashing Model Training Flow
[Diagram: User 1 and User 2 each pass through Input, Network Layers, and Hash Layer, followed by a sign function (SGN) that maps each dimension to +1 or -1. Similarity label: 1 = similar, 0 = dissimilar.]
• Given a pair of users, the model first generates a user embedding (a continuous vector) for each user from the Network Layers. The Hash Layer then projects each embedding to K dimensions. Finally, SGN is applied to obtain the binary representation. The output label is 1 for similar and 0 for dissimilar.
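The forward pass described above (embedding, K-dimensional hash layer, SGN) can be sketched in a few lines. The weights here are random stand-ins for a trained hash layer, and `K`, `EMBED_DIM`, the example embeddings, and the bit-match similarity are assumptions; the slides do not describe the loss or how the non-differentiable sign is handled during training.

```python
import random
from typing import List

K = 8          # assumed hash code length
EMBED_DIM = 4  # assumed size of the embedding produced by the network layers

rng = random.Random(0)
# Hypothetical hash-layer weights; in a real model these would be learned.
W = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(K)]


def hash_layer(embedding: List[float]) -> List[float]:
    """Project a continuous embedding to K real-valued dimensions."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in W]


def sgn(values: List[float]) -> List[int]:
    """Binarize to +1 / -1 (zero is mapped to +1 here by convention)."""
    return [1 if v >= 0 else -1 for v in values]


def similarity(code_a: List[int], code_b: List[int]) -> float:
    """Fraction of matching bits, between 0 and 1."""
    return sum(a == b for a, b in zip(code_a, code_b)) / len(code_a)


# Hypothetical embeddings for a pair of users.
user1 = [0.9, -0.2, 0.4, 0.1]
user2 = [0.8, -0.1, 0.5, 0.0]
code1 = sgn(hash_layer(user1))
code2 = sgn(hash_layer(user2))
pair_similarity = similarity(code1, code2)
```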
Approach #1: Time-Aware Attention CNN (TAACNN) Model
Model Explained
▪ Input layer:
▪ The input layer is the data pre-processing layer. It maps sequential behavior data into a 3D structure that a CNN can process.
▪ Pre-processing first embeds each item into a D-dimensional vector. It then sessionizes the user’s history by a specific time unit (e.g., an hour). For each session, all items the user interacted with are aggregated using the multi-hot encoding of those items, which summarizes the user’s behavior in that session. After sessionization, each user’s behavior input is mapped into the high-dimensional space.
▪ Embedding layer:
▪ Because the multi-hot encoding used during pre-processing is sparse and hand-crafted, it carries more conceptual information than similarity information. This would hurt the overall performance of TAACNN, particularly its ability to preserve similarity information at large scale. To overcome this limitation, we introduce an embedding layer as part of the model.
▪ Time-Aware attention layer:
▪ The time-aware attention layer extracts time-aware attention features in the TAACNN model. It separates attention features into short-term and long-term features.
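The sessionization and multi-hot encoding steps of the input layer can be sketched as follows. The event format (Unix timestamp, item id), the four-item vocabulary, and the helper names are assumptions; the one-hour session length follows the example given above.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

SESSION_SECONDS = 3600  # one-hour sessions


def sessionize(events: List[Tuple[int, str]]) -> Dict[int, Set[str]]:
    """Group (unix_timestamp, item_id) events into hourly sessions."""
    sessions: Dict[int, Set[str]] = defaultdict(set)
    for ts, item in events:
        sessions[ts // SESSION_SECONDS].add(item)
    return sessions


def multi_hot(items: Set[str], vocab: List[str]) -> List[int]:
    """1 for each vocabulary item seen in the session, 0 otherwise."""
    return [1 if v in items else 0 for v in vocab]


# Hypothetical item vocabulary and viewing events for one user.
vocab = ["news", "sports", "drama", "movies"]
events = [(10, "news"), (1200, "sports"), (3700, "drama"), (3800, "drama"), (7300, "news")]

sessions = sessionize(events)
# One multi-hot row per session, in time order.
matrix = [multi_hot(sessions[s], vocab) for s in sorted(sessions)]
```

Each row of `matrix` summarizes one session. Repeated interactions with the same item inside a session collapse to a single 1, which is part of why the encoding is sparse.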
Approach #2: Categorical Attention Model
Distributed Inference
▪ Issues:
▪ Data is large-scale: hundreds of millions of user profiles need to be updated within limited time and computing resources
▪ A standard Spark UDF processes one row at a time, which cannot meet these requirements
▪ Need efficient distributed inference methods
▪ Solution: Pandas UDF
▪ Scalar
▪ Scalar iterator
▪ Group map
▪ Group aggregate
Code Snippet
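A minimal sketch of the scalar Pandas UDF pattern follows; it is not the talk's original snippet. The batch function `hash_batch`, the random weight matrix standing in for a trained model, the 4-dimensional feature vectors, and the `as_spark_udf` wrapper are all assumptions. The point it illustrates is that the function receives a whole `pandas.Series` per call instead of one row.

```python
import numpy as np
import pandas as pd

K = 8  # assumed hash code length
rng = np.random.default_rng(0)
W = rng.normal(size=(4, K))  # hypothetical stand-in for a trained hashing model


def hash_batch(features: pd.Series) -> pd.Series:
    """Turn a batch of 4-dim feature vectors into K-bit hash code strings."""
    x = np.array(features.tolist(), dtype=float)  # shape: (batch, 4)
    bits = (x @ W >= 0).astype(int)               # shape: (batch, K)
    codes = ["".join(map(str, row)) for row in bits]
    return pd.Series(codes, index=features.index)


def as_spark_udf():
    """Wrap hash_batch as a scalar Pandas UDF (needs pyspark and pyarrow)."""
    from pyspark.sql.functions import pandas_udf
    return pandas_udf(hash_batch, returnType="string")


# Usage on a Spark DataFrame with an array<double> column named "features":
#   df.withColumn("hash_code", as_spark_udf()("features"))

# The batch function can be checked locally without Spark:
local_codes = hash_batch(pd.Series([[0.1, 0.2, 0.3, 0.4], [1.0, -1.0, 0.5, 0.0]]))
```

Keeping the model logic in a plain pandas function makes it testable without a cluster. For models that are expensive to load, the scalar iterator variant lets the model be loaded once per batch iterator instead of once per batch.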
Model Performance
▪ We used accuracy as the main performance metric for all
binary hashing algorithms because each user has an identical number
of similar and dissimilar user pairs.
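A pairwise accuracy computation on a balanced set of pairs might look like the sketch below. The slides do not state how a pair is predicted as similar; the Hamming-distance threshold, the 8-bit codes, and the example pairs are assumptions.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as ints."""
    return bin(a ^ b).count("1")


def pair_accuracy(pairs: List[Tuple[int, int, int]], threshold: int) -> float:
    """Fraction of (code_a, code_b, label) pairs classified correctly."""
    correct = 0
    for code_a, code_b, label in pairs:
        predicted = 1 if hamming(code_a, code_b) <= threshold else 0
        correct += int(predicted == label)
    return correct / len(pairs)


# Hypothetical balanced evaluation set: three similar and three dissimilar pairs.
pairs = [
    (0b1111_0000, 0b1111_0001, 1),  # distance 1
    (0b1010_1010, 0b1010_1000, 1),  # distance 1
    (0b0011_0011, 0b0011_0111, 1),  # distance 1
    (0b1111_0000, 0b0000_1111, 0),  # distance 8
    (0b1010_1010, 0b1010_0101, 0),  # distance 4
    (0b1100_1100, 0b1100_1111, 0),  # distance 2, misclassified at threshold 2
]
accuracy = pair_accuracy(pairs, threshold=2)
```

Because the similar and dissimilar classes are the same size, a model that always predicts one class scores 0.5, so accuracy is not inflated by class imbalance.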
Conclusion
▪ A novel deep binary hashing architecture to derive similarity-preserving
binary hash codes for sequential behavior data.
▪ TAACNN explores users’ evolving attention preferences separately at
different time-awareness levels. Experimental results show a
significant improvement over other well-known hashing
methods.
▪ Pandas UDFs improved efficiency significantly. They have been
adopted in many of our projects.
Thank you !!
We are hiring:
www.sra.samsung.com/open-positions
Contact: Praveen.Pratury@Samsung.com
Yingnan.z@Samsung.com
https://www.linkedin.com/in/praveenpratury
https://www.linkedin.com/in/yingnan-zhu-66651113/
Q & A