Machine learning workshop
guodong@hulu.com

Machine learning introduction
Logistic regression
Feature selection
Boosting, tree boosting

See more machine learning posts: http://dongguo.me
Overview of machine learning

Machine Learning
•  Unsupervised Learning
•  Supervised Learning
   –  Classification (e.g. logistic regression)
   –  Regression
•  Semi-supervised Learning
How to choose a suitable model?

Characteristic                            Naïve Bayes   Trees   K Nearest    Logistic     Neural     SVM
                                                                Neighbor     Regression   Networks
Computational scalability                 3             3       1            3            1          1
Interpretability                          2             2       1            2            1          1
Predictive power                          1             1       3            2            3          3
Natural handling of "mixed"-type data     1             3       1            1            1          1
Robustness to outliers in input space     3             3       3            3            1          1

(Scale: 3 = good, 1 = poor. From <Elements of Statistical Learning>, 2nd edition, p. 351.)
Why a model can't perform perfectly on unseen data
•  Expected risk
•  Empirical risk
•  Choose a function family for the prediction function
•  Error
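
The risk formulas on this slide are images that did not survive extraction; the standard definitions they refer to (consistent with editor's note 3) are:

    R(f) = \int L(y, f(x)) \, dP(x, y)                          \quad \text{(expected risk over the joint distribution } P\text{)}
    R_{\mathrm{emp}}(f) = \frac{1}{n} \sum_{i=1}^{n} L(y_i, f(x_i))   \quad \text{(empirical risk on the } n \text{ training samples)}

Note 3 then decomposes the error into two parts: how close the best function in the chosen family is to the truly optimal one, and the gap introduced by minimizing empirical rather than expected risk.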
  
Logistic regression

Outline
•  Introduction
•  Inference
•  Regularization
•  Experiments
•  More
   –  Multi-nominal LR
   –  Generalized linear model
•  Application
Logit function and logistic function
•  Logit function
•  Logistic function: the inverse of the logit
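
The formulas here are images in the original; the standard definitions are:

    \mathrm{logit}(p) = \ln \frac{p}{1 - p}, \qquad p \in (0, 1)
    \sigma(x) = \mathrm{logit}^{-1}(x) = \frac{1}{1 + e^{-x}}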
  
Logistic regression
•  Prediction function
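
A minimal statement of the prediction function (shown as an image on the slide), using the logistic function from the previous slide:

    P(y = 1 \mid x; w) = \sigma(w^{\top} x) = \frac{1}{1 + e^{-w^{\top} x}}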
  
Inference with maximum likelihood (1)
•  Likelihood
•  Inference
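
Standard forms of the likelihood and log-likelihood (the slide's formulas are images), writing p_i = \sigma(w^{\top} x_i) for sample i with label y_i \in \{0, 1\}:

    L(w) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}
    \ell(w) = \sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right]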
  
Inference with maximum likelihood (2)
•  Inference (cont.)
•  Use gradient descent
•  Stochastic gradient descent
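
The gradient and update rules in standard form (images in the original); the (y_i - p_i) factor is exactly the "error" variable in the code on the Implementation slide:

    \frac{\partial \ell}{\partial w_j} = \sum_{i=1}^{n} (y_i - p_i) \, x_{ij}
    w_j \leftarrow w_j + \eta \sum_{i} (y_i - p_i) \, x_{ij}   \quad \text{(batch gradient ascent)}
    w_j \leftarrow w_j + \eta \, (y_i - p_i) \, x_{ij}         \quad \text{(stochastic: one sample at a time)}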
  
Regularization
•  Penalize large weights to avoid overfitting
   –  L2 regularization
   –  L1 regularization
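
The penalized log-likelihoods in standard form (the slide's formulas are images):

    \ell_{L2}(w) = \ell(w) - \frac{\lambda}{2} \lVert w \rVert_2^2
    \ell_{L1}(w) = \ell(w) - \lambda \lVert w \rVert_1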
  
Regularization: Maximum a posteriori
•  MAP
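
A standard statement of the MAP estimate (the slide's derivation is an image): maximize the posterior, i.e. likelihood times prior:

    w_{\mathrm{MAP}} = \arg\max_{w} \left[ \log P(D \mid w) + \log P(w) \right]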
  
L2 regularization: Gaussian prior
•  Gaussian prior
•  MAP
•  Gradient descent step
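
Standard forms (the slide shows them as images): a zero-mean Gaussian prior on each weight yields the L2 penalty, with \lambda \propto 1/\sigma^2, and the per-feature step below matches the L2 line on the Implementation slide:

    P(w_j) \propto \exp\!\left( -\frac{w_j^2}{2\sigma^2} \right)
    w_j \leftarrow w_j + \eta \left[ (y_i - p_i) \, x_{ij} - \lambda w_j \right]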
  
L1 regularization: Laplace prior
•  Laplace prior
•  MAP
•  Gradient descent step
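
Standard forms (slide images): a zero-mean Laplace prior yields the L1 penalty; its subgradient step uses sign(w_j), and per editor's note 10 a weight must not change sign within a single update, which is the clipping in the code that follows:

    P(w_j) \propto \exp\!\left( -\frac{|w_j|}{b} \right)
    w_j \leftarrow w_j + \eta \left[ (y_i - p_i) \, x_{ij} - \lambda \, \mathrm{sign}(w_j) \right]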
  
Implementation
•  L2 LR

// error = y - p(y=1|x) for the current sample; feaValue = x[fea];
// step = learning rate; reguParam = regularization strength.
_weightOfFeatures[fea] += step * (feaValue * error - reguParam * _weightOfFeatures[fea]);

•  L1 LR

if (_weightOfFeatures[fea] > 0)
{
    _weightOfFeatures[fea] += step * (feaValue * error) - step * reguParam;
    if (_weightOfFeatures[fea] < 0)   // penalty pushed the weight across zero
        _weightOfFeatures[fea] = 0;   // clip: a weight may not change sign in one update
}
else if (_weightOfFeatures[fea] < 0)
{
    _weightOfFeatures[fea] += step * (feaValue * error) + step * reguParam;
    if (_weightOfFeatures[fea] > 0)   // penalty pushed the weight across zero
        _weightOfFeatures[fea] = 0;
}
else
{
    // weight exactly zero: apply only the data gradient, no penalty term
    _weightOfFeatures[fea] += step * (feaValue * error);
}
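
The zero-clipping in the L1 branch enforces the constraint from the Laplace-prior slide: the penalty may shrink a weight toward zero but never push it past zero within one update. Truncating at zero is what produces the exact zeros (sparsity) discussed on the next slide.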
L2 vs. L1
•  L2 regularization
   –  Almost all weights end up non-zero
   –  Not suitable when training samples are scarce
•  L1 regularization
   –  Produces sparse parameter vectors
   –  More suitable when most features are irrelevant
   –  Handles scarce training samples better
Experiments
•  Dataset
   –  Goal: gender prediction
   –  Dataset: training samples (431k), test samples (167k)
•  Comparison algorithms
   –  A: gradient descent with L1 regularization
   –  B: gradient descent with L2 regularization
   –  C: OWL-QN (L-BFGS-based optimization with L1 regularization)
•  Parameter choices
   –  Regularization value
   –  Step (learning rate)
   –  Decay ratio (see the note after this list)
   –  Stopping condition
      •  Max iteration count (50) || AUC change <= 0.0005
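
The slides do not show how the decay ratio is applied; a common scheme, and presumably the one meant here, multiplies the step by the decay ratio after each iteration: \eta_{t+1} = \text{decay} \cdot \eta_t.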
  
Experiments (cont.)
•  Experiment results

Parameters and metrics       | Gradient descent, L1 | Gradient descent, L2 | OWL-QN
'Best' regularization term   | 0.001~0.005          | 0.0002~0.001         | 1
Best step                    | 0.05                 | 0.02~0.05            | -
Best decay ratio             | 0.85                 | 0.85                 | -
Iteration count              | 26                   | 20~26                | 48
Non-zero features / all      | 10492/10938          | 10938/10938          | 6629/10938
AUC                          | 0.8470               | 0.8463               | 0.8467
Multi-nominal logistic regression
•  Prediction function
•  Inference with maximum likelihood
•  Gradient descent step (L2)
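
A standard softmax form of the multi-nominal prediction function, with one weight vector w_k per class (the slide's formulas are images), and an L2 gradient step of the same shape as the binomial case:

    P(y = k \mid x; W) = \frac{\exp(w_k^{\top} x)}{\sum_{j=1}^{K} \exp(w_j^{\top} x)}
    w_{kj} \leftarrow w_{kj} + \eta \left[ \left( \mathbb{1}[y_i = k] - P(y = k \mid x_i) \right) x_{ij} - \lambda w_{kj} \right]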
  
More link functions
•  Inference with maximum likelihood
•  Link function
•  Link functions for the binomial distribution
   –  Logit function
   –  Probit function
   –  Log-log function
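
Standard definitions of these links for a binomial response (images in the original); as editor's note 14 observes, the inverse of any continuous CDF can serve as the inverse link:

    g(p) = \ln \frac{p}{1 - p}    \quad \text{(logit)}
    g(p) = \Phi^{-1}(p)           \quad \text{(probit, } \Phi \text{ the standard normal CDF)}
    g(p) = -\ln(-\ln p)           \quad \text{(log-log; the complementary log-log } \ln(-\ln(1 - p)) \text{ is the more common variant)}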
  
Generalized linear model
•  What is a GLM
   –  A generalization of linear regression
   –  Connects the linear model to the response variable through a link function
   –  Allows more distributions for the response variable
•  Typical GLMs
   –  Linear regression, logistic regression, Poisson regression
•  Overview
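
A compact statement of the GLM structure (the slide's overview is a diagram): a linear predictor tied to the mean of the response by a link function g,

    g\left( \mathbb{E}[y \mid x] \right) = w^{\top} x

with g = identity for linear regression (normal response), g = logit for logistic regression (binomial), and g = log for Poisson regression, per editor's note 15.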
  
Application
•  Yahoo
   –  <Personalized Click Prediction in Sponsored Search>, WSDM'10
•  Microsoft
   –  <Scalable Training of L1-Regularized Log-Linear Models>, ICML'07
•  Baidu
   –  Contextual ads CTR prediction
   –  http://www.docin.com/p-376254439.html
•  Hulu
   –  Demographic targeting
   –  Other ad-targeting projects
   –  Customer churn prediction
   –  More…
References
•  'Scalable Training of L1-Regularized Log-Linear Models', ICML'07
   –  http://www.docin.com/p-376254439.html#
•  'Generative and discriminative classifiers: Naïve Bayes and logistic regression', by Mitchell
•  'Feature selection, L1 vs. L2 regularization, and rotational invariance', ICML'04
Recommended resources
•  Machine Learning open class – by Andrew Ng
   –  //10.20.0.130/TempShare/Machine-Learning Open Class
•  http://www.cnblogs.com/vivounicorn/archive/2012/02/24/2365328.html
•  Logistic regression implementation [link]
   –  //10.20.0.130/TempShare/guodong/Logistic regression Implementation/
   –  Supports binomial and multi-nominal LR with L1 and L2 regularization
•  OWL-QN
   –  //10.20.0.130/TempShare/guodong/OWL-QN/
Thanks	
  


Editor's notes

1. Unsupervised learning (clustering, dimensionality reduction, e.g. topic models): learn structure from unlabeled data; closely related to density estimation; summarizes the data. Semi-supervised learning: use both labeled and unlabeled samples for training; collecting many labels can be costly, so use both.
2. Beyond these criteria, your familiarity with each model also matters.
3. Expected risk: given a loss function and a prediction function, integrate the loss over the joint distribution of the input variables and the response value; minimizing this expected risk would yield the optimal prediction function. In practice we do not know the joint distribution; we only have a finite sample drawn from it (possibly with bias, possibly with noise points), so instead we minimize the loss on that finite sample, i.e. the empirical risk. Moreover, we usually restrict the prediction function to a specified function family, which may well not contain the optimal or near-optimal functions. The error therefore has two parts: how close the best function in the family F is to the truly optimal prediction function, and the fact that we optimize the empirical rather than the expected risk.
4. Logistic regression is one of the most popular classifiers. Advantages: 1. easy to understand and implement; 2. reasonable performance; 3. lightweight, with little time needed for training and prediction (can handle large datasets); 4. easy to parallelize. Value to attendees: know what logistic regression is, its advantages and disadvantages, and what kinds of problems it suits; L1 and L2 regularization; how to do inference by maximizing the likelihood with gradient descent, and how to implement it.
5. For a generalized linear model, if the response variable follows a binomial or multinomial distribution and the logit function is chosen as the link function, the model is logistic regression. The logistic function is the inverse of the logit function.
6. Binary (binomial) logistic regression.
7. The negative gradient direction is the direction in which the function value decreases fastest.
8. Before recomputing the likelihood, let us look at the theory behind these two kinds of regularization.
9. Assume all weights follow the same distribution.
10. The first derivative of the Laplace distribution is discontinuous. Assume all weights follow the same distribution (mean 0, same Laplace parameter). W_k must not change sign within a single update.
11. Weights fitted with L1 are usually sparse, which brings two benefits: it helps with feature selection, and it is more convenient in engineering.
12. Adding a decay ratio raised AUC slightly (0.845 -> 0.847). The suitable decay ratio differs for different steps. The iteration count is related to the sample size.
13. Example: choosing a college major on the first day of the entrance exam: each student has several candidates but can choose only one (computer science, finance, chemistry, mathematics, physics, biology). Difference from the binomial case: a multi-class problem can be decomposed into several binary problems. If the problem is "find the top 10% of students in each course", binary logistic regression works; if it is "for each student, find the course (or the few courses) they do best in", binary models fit poorly, because the predicted probabilities over classes do not sum to 1 and cannot be compared across classes. Multi-nominal LR suits categorical response values, not ordinal ones. I implemented it.
14. Link function: (1) a key component of a generalized linear model, extending linear regression to the GLM; (2) the inverse of the link function takes arguments in (-inf, +inf), and if y follows a binomial distribution the response lies in [0, 1]. The inverse of any continuous cumulative distribution function (CDF) can be used for the link since the CDF's range is [0, 1].
15. A generalized linear model is a linear model in the broad sense: it has a basic linear unit W*X (as in linear regression) and connects that linear unit, through a link function, to response variables of various distributions. It covers linear regression (normal distribution), logistic regression (binomial/multi-nominal distribution), and Poisson regression (Poisson distribution). For a binomial/multi-nominal response we can also choose link functions other than the logit (generalized logistic regression).