Data Science behind Display Ads in Digital Marketing
Kushal Wadhwani
Senior Data Scientist
We Help Marketers Increase Digital Share of Business
$30M FUNDING
Singapore, South East Asia
Bangalore, India
Dubai, UAE
Dallas, USA
CERTIFICATIONS
FOCUS
Clients
INDIA & UAE
Use Case: Bring back a prospective user
1) User visits the HDFC website and browses for a personal loan
2) Drops off without submitting a lead
3) Visits our publisher network
4) Vizury shows an ad with personalized banners and quotes
5) User clicks the banner
6) Returns to the HDFC website
Some of the Channels Powered by Vizury
Programmatic
Mobile Push
Browser Push
Facebook / Instagram
Programmatic flow
Optimization problem behind Programmatic
Pays for impression
Maximize clicks
Publishers
Clients
Parameters to Optimize
1. What to bid
• Depends on the probability that this user clicks
• Depends on the probability of a click on this ad slot
bidValue ∝ P(click | ad slot, user)
CTR (click-through rate) = 100 × P(click | ad slot, user)
2. What to Show
• Products visited by the user
• Products and message suggested by the client
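The bidding rule above can be illustrated with a minimal sketch. It assumes the bid is the predicted click probability multiplied by a constant; the name `value_per_click` and the numbers in the usage note are hypothetical and do not come from the slides.

```python
def bid_value(p_click: float, value_per_click: float) -> float:
    """Return a bid proportional to the predicted click probability.

    value_per_click is a hypothetical proportionality constant.
    """
    if not 0.0 <= p_click <= 1.0:
        raise ValueError("p_click must be a probability in [0, 1]")
    return value_per_click * p_click


def ctr_percent(p_click: float) -> float:
    """Express a click probability as a click-through rate in percent."""
    return 100.0 * p_click
```

For example, with a predicted click probability of 0.002, `ctr_percent(0.002)` is about 0.2 (percent), and doubling the predicted probability doubles the bid.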
Data : Collection and processing
Data Collection
Bids DB
Impressions DB
Clicks DB
User activity DB
User variables and Ad slot variables
User variables
1) Time spent on website
2) Products visited
3) Number of impressions shown
4) Number of clicks
Ad slot variables
1) Size of banner
2) URL of the ad slot
Problem formulation
• Classification problem
• 50 – 100 variables
• Both numerical and categorical variables
• Massive amount of training data
Id   Categorical   Categorical   Numerical    Numerical    ...   Click flag
     variable 1    variable 2    variable 1   variable 2
Historical data:
1    xyz           abc           1            0                  0
2    -             -             -            -                  1
3    -             -             -            -                  0
New bid request:
     xyz           abc           ?            ?                  ?
The columns are grouped into ad slot variables and user level variables.
ML Algorithms for classification
Logistic Regression
Pros:
• Handles all linear interactions between variables
• Established, scalable training algorithms exist
• Handles high-cardinality categorical variables
Cons:
• Assumes that variables are linearly related to the log odds
• Does not handle non-linear interactions well
$\ln\left[\frac{p}{1-p}\right] = \alpha + W^{T}X$
• p is the probability that the event Y occurs, p(Y=1)
• p/(1-p) is the odds
• ln[p/(1-p)] is the log odds, or "logit"
• α is the intercept and W is the vector of weights applied to the variables X
$p = \frac{1}{1 + \exp(-\alpha - W^{T}X)}$
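The relationship between the logit and the click probability can be checked numerically. The sketch below is only an illustration: the function names, the weights, and the feature values are made up, and training of the weights is not shown.

```python
import math


def logit(p: float) -> float:
    """Log odds ln[p / (1 - p)] of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def predict_click_probability(weights, features, intercept):
    """Apply p = 1 / (1 + exp(-(intercept + w . x)))."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative values only: two variables with hand-picked weights.
p = predict_click_probability([0.8, -0.5], [1.0, 2.0], intercept=-3.0)
# Taking the logit of p recovers the linear score intercept + w . x.
score = logit(p)
```

Here the linear score is -3.0 + 0.8 - 1.0 = -3.2, and `logit(p)` returns that same value up to floating-point error, which is what "linear in the log odds" means.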
Decision tree based Models
Pros:
• Handles non-linear correlation of input variables with the output variable
• Handles non-linear interactions
• Models are intuitive, easy to understand and explain
Cons:
• Challenges in handling high-cardinality categorical variables
Examples: Random Forest, XGBoost
Neural Networks
Pros:
• Handles non-linear correlation of input variables with the output variable
• Handles non-linear interactions of variables
• Handles high-cardinality categorical variables
• Works well for large data sets
Cons:
• Models are not easy to interpret
Variable Insights and triage
1. Visualize variables
• Plot distributions
• Variable vs. CTR: inspect the plot to see the nature of the correlation
• Cardinality of categorical variables
2. Decide how to preprocess each variable
3. Evaluate each variable against the ML techniques
Variable Insights : Numerical variables
[Figure: distribution and correlation plots for var1 and var2; panel labels: skewed distribution, non-linear correlation]
Handling Skew and non linearity
                              Non-linear correlation   Skewed distribution
Logistic regression           N                        N
Decision tree based models    Y                        Y
Neural networks               Y                        Y
• In general it is better to preprocess variables with skew
• Log transformation: newvalue = log(oldvalue)
• Bucketization
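Both preprocessing steps can be sketched in a few lines. Two assumptions are made here that are not in the slides: the transform uses log(1 + x) instead of log(x) so that zero counts stay defined, and the bucket edges `[1, 10, 100]` are arbitrary illustrative values.

```python
import math
from bisect import bisect_right


def log_transform(value: float) -> float:
    """Compress a skewed, non-negative variable with log(1 + value)."""
    return math.log1p(value)


def bucketize(value: float, edges: list) -> int:
    """Return the index of the bucket that value falls into.

    edges must be sorted in ascending order; bucket i covers
    edges[i - 1] <= value < edges[i], with open-ended first and last buckets.
    """
    return bisect_right(edges, value)


edges = [1, 10, 100]  # hypothetical bucket boundaries
buckets = [bucketize(v, edges) for v in [0, 1, 5, 10, 1000]]
```

With these edges, the values 0, 1, 5, 10 and 1000 land in buckets 0, 1, 1, 2 and 3. The bucket index can then be treated as a categorical variable, which lets a linear model learn a separate effect per bucket.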
Handling Skew and non linearity : Log transformation
[Figure: distribution and correlation plots, before and after the log transformation]
Handling Skew and non linearity : Bucketization
[Figure: bucketized var1; distribution within buckets]
Variable Insights : Interaction of variables
                              Non-linear interaction
Logistic regression           N
Decision tree based models    Y
Neural networks               Y
[Figure: var1 vs. var2, with the size of each circle representing CTR]
Variable Insights : Categorical variables
Cardinality 104
Cardinality 10
Categorical variables
Neural networks and logistic regression do not handle categorical variables
out of the box; the variables have to be converted into numerical variables.
1. One-hot encoding: creates one new variable for each categorical value
2. Replace each categorical value with its class weight, in our case CTR.
Interactions with other variables cannot be captured.

                              High cardinality         Interaction between
                              categorical variables    categorical variables
Logistic regression           Y                        N
Decision tree based models    N                        Y
Neural networks               Y                        Y
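The two encodings described above can be sketched as follows. This is a simplified illustration on made-up data: the function names are hypothetical, and the CTR encoding returns the click rate as a fraction (multiply by 100 for a percentage) without any smoothing for rarely seen values.

```python
from collections import defaultdict


def one_hot(value, categories):
    """Return one 0/1 indicator per known category."""
    return [1 if value == category else 0 for category in categories]


def ctr_encoding(values, clicks):
    """Map each categorical value to its observed click rate."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for value, click in zip(values, clicks):
        shown[value] += 1
        clicked[value] += click
    return {value: clicked[value] / shown[value] for value in shown}


# Toy data: banner sizes and whether each impression was clicked.
sizes = ["300x250", "300x250", "728x90", "728x90"]
clicks = [1, 0, 0, 0]
encoded = ctr_encoding(sizes, clicks)
indicator = one_hot("728x90", ["300x250", "728x90", "160x600"])
```

One-hot encoding adds one column per distinct value, so its width grows with cardinality, while the CTR encoding always yields a single numeric column.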
Evaluation Metrics
AUC (area under the ROC curve) : area under the 2-D plot of true positive rate vs. false positive rate, obtained by varying the classification threshold
• Random predictions give an AUC of 0.5
• The higher the AUC, the better the classification
• Quantifies how well the model has ranked the test data but does not
consider the magnitude of the response
Log Loss
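Both metrics can be computed by hand on a toy example. The sketch below uses the pairwise-ranking definition of AUC (the fraction of positive/negative pairs in which the positive example gets the higher score, with ties counted as half) and the standard binary log loss; the labels and scores are made up.

```python
import math


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def log_loss(labels, probabilities, eps=1e-15):
    """Mean negative log-likelihood of the labels under the predictions."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
shrunk = [0.1 * s for s in scores]  # same ranking, different magnitudes
```

Here `auc(labels, scores)` and `auc(labels, shrunk)` are both 0.75 because the ranking is unchanged, while `log_loss` differs between the two. This is the sense in which AUC ignores the magnitude of the predictions and log loss does not.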
Q & A
My Coordinates
LinkedIn : https://www.linkedin.com/in/kushal-wadhwani-02109a1a/
Email : kushal.wadhwani@vizury.com
To know more about Vizury visit : https://www.vizury.com/