16. L2 vs. L1
• L2 regularization
  – Almost all weights are non-zero
  – Not suitable when training samples are scarce
• L1 regularization
  – Produces sparse parameter vectors
  – More suitable when most features are irrelevant
  – Can handle scarce training samples better
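The sparsity contrast above can be seen in a toy experiment. The sketch below (my illustration, not the deck's code; data and hyperparameters are made up) fits a logistic model with L2 via plain gradient descent and with L1 via proximal gradient (ISTA), whose soft-thresholding step drives irrelevant weights to exactly zero:

```python
import numpy as np

# Toy data: 200 samples, 20 features, only the first 3 features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = (1 / (1 + np.exp(-X @ w_true)) > rng.random(200)).astype(float)

def grad_nll(w):
    """Gradient of the average negative log-likelihood of logistic regression."""
    p = 1 / (1 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

step, lam, iters = 0.5, 0.05, 500

# L2: gradient descent on NLL + (lam/2)*||w||^2 -- weights shrink but stay non-zero.
w_l2 = np.zeros(20)
for _ in range(iters):
    w_l2 -= step * (grad_nll(w_l2) + lam * w_l2)

# L1: proximal gradient (ISTA) -- soft-thresholding zeroes out irrelevant weights.
w_l1 = np.zeros(20)
for _ in range(iters):
    z = w_l1 - step * grad_nll(w_l1)
    w_l1 = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

print("non-zero weights, L2:", np.count_nonzero(w_l2))
print("non-zero weights, L1:", np.count_nonzero(w_l1))
```

On this data the L2 solution keeps all 20 weights non-zero, while the L1 solution keeps only a handful, close to the 3 truly relevant features.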
17. Experiments
• Dataset
  – Goal: gender prediction
  – Dataset: train samples (431k), test samples (167k)
• Comparison algorithms
  – A: gradient descent with L1 regularization
  – B: gradient descent with L2 regularization
  – C: OWL-QN (L-BFGS-based optimization with L1 regularization)
• Parameter choices
  – Regularization value
  – Step (learning rate)
  – Decay ratio
  – Iteration stop condition
    • Max iterations (50) || AUC change <= 0.0005
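The stop condition above can be written out directly. A minimal sketch (the function name and signature are my own, not from the deck):

```python
def should_stop(iteration, auc_history, max_iter=50, tol=0.0005):
    """Stop when the iteration cap is hit or the AUC change drops below tol,
    matching the rule: max iterations (50) || AUC change <= 0.0005."""
    if iteration >= max_iter:
        return True
    if len(auc_history) >= 2 and abs(auc_history[-1] - auc_history[-2]) <= tol:
        return True
    return False

print(should_stop(50, [0.84]))            # True (hit the iteration cap)
print(should_stop(10, [0.8460, 0.8463]))  # True (AUC change 0.0003 <= 0.0005)
print(should_stop(10, [0.83, 0.84]))      # False (still improving)
```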
18. Experiments (cont.)
• Experiment results

  Parameters and metrics            gradient descent with L1   gradient descent with L2   OWL-QN
  'Best' regularization term        0.001~0.005                0.0002~0.001               1
  Best step                         0.05                       0.02~0.05                  -
  Best decay ratio                  0.85                       0.85                       -
  Iteration times                   26                         20~26                      48
  Non-zero features / all features  10492/10938                10938/10938                6629/10938
  AUC                               0.8470                     0.8463                     0.8467
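The AUC metric reported in the table (and used in the stop condition) can be computed directly as the Mann-Whitney statistic: the fraction of (positive, negative) pairs where the positive sample is scored higher. A minimal pairwise implementation (my sketch, not the deck's code; O(n²), fine for illustration):

```python
import numpy as np

def auc(scores, labels):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a win."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```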
20. More link functions
• Inference with maximum likelihood
• Link function
• Link functions for the binomial distribution
  – Logit function
  – Probit function
  – Log-log function
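The three links above differ only in which inverse function maps the linear predictor to a probability in [0, 1]. A sketch of the inverses (I implement the slide's "log-log" as the complementary log-log link, the common choice; treat that exact variant as an assumption):

```python
import math

def inv_logit(eta):
    """Logit link: eta = log(p/(1-p)); inverse is the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Probit link: eta = Phi^{-1}(p); inverse is the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Complementary log-log link: eta = log(-log(1-p))."""
    return 1.0 - math.exp(-math.exp(eta))

for eta in (-2.0, 0.0, 2.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta))
```

All three inverses are continuous CDFs mapping (-inf, +inf) onto (0, 1), which is exactly why they qualify as binomial links.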
21. Generalized linear model
• What is a GLM
  – A generalization of linear regression
  – Connects the linear model to the response variable through a link function
  – Supports more distributions for the response variable
• Typical GLMs
• Overview
  – Linear regression, logistic regression, Poisson regression
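The three typical GLMs above share one structure: a linear predictor eta = X·w, plus an inverse link mapping eta to the mean of the response. A sketch of that shared skeleton (names and example values are my own):

```python
import numpy as np

# Each GLM pairs a response distribution with an inverse link that maps the
# linear predictor eta = X @ w to the mean of the response variable.
INVERSE_LINKS = {
    "linear":   lambda eta: eta,                     # identity link, Gaussian response
    "logistic": lambda eta: 1 / (1 + np.exp(-eta)),  # logit link, binomial response
    "poisson":  lambda eta: np.exp(eta),             # log link, Poisson response
}

def glm_mean(X, w, model):
    """Predicted mean of the response under the chosen GLM."""
    return INVERSE_LINKS[model](X @ w)

X = np.array([[1.0, 2.0], [1.0, -1.0]])
w = np.array([0.5, 0.25])
print(glm_mean(X, w, "logistic"))
```

Swapping the dictionary entry is all it takes to move between the three models; the linear unit W·X stays the same.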
22. Applications
• Yahoo
  – "Personalized Click Prediction in Sponsored Search", WSDM'10
• Microsoft
  – "Scalable Training of L1-Regularized Log-Linear Models", ICML'07
• Baidu
  – Contextual ads CTR prediction
    • http://www.docin.com/p-376254439.html
• Hulu
  – Demographic targeting
  – Other ad-targeting projects
  – Customer churn prediction
  – More…
23. References
• "Scalable Training of L1-Regularized Log-Linear Models", ICML'07
  – http://www.docin.com/p-376254439.html#
• "Generative and discriminative classifiers: Naïve Bayes and logistic regression", by Mitchell
• "Feature selection, L1 vs. L2 regularization, and rotational invariance", ICML'04
24. Recommended resources
• Machine Learning open class by Andrew Ng
  – //10.20.0.130/TempShare/Machine-Learning Open Class
• http://www.cnblogs.com/vivounicorn/archive/2012/02/24/2365328.html
• Logistic regression implementation [link]
  – //10.20.0.130/TempShare/guodong/Logistic regression Implementation/
  – Supports binomial and multinomial LR with L1 and L2 regularization
• OWL-QN
  – //10.20.0.130/TempShare/guodong/OWL-QN/
Unsupervised learning (clustering, dimensionality reduction, topic models): learn structure from unlabeled data; closely related to density estimation; summarizes the data. Semi-supervised learning: uses both labeled and unlabeled samples for training; collecting many labels is sometimes costly, so both are used.
Logistic regression is one of the most popular classifiers. Advantages: 1. easy to understand and implement; 2. decent performance; 3. lightweight, with low training and prediction cost (can handle large datasets); 4. easy to parallelize. Value to attendees: know what logistic regression is, its advantages and disadvantages, and which kinds of problems it suits; understand L1 and L2 regularization; know how to do inference via maximum likelihood with gradient descent, and how to implement it.
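The "easy to parallelize" point follows from the likelihood's structure: per-sample gradients of the logistic log-likelihood simply sum, so the data can be sharded and partial gradients computed independently. A sketch of that idea (thread-based here purely for illustration; names are my own):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def partial_grad(X_chunk, y_chunk, w):
    """Gradient contribution of one data shard to the logistic-regression NLL."""
    p = sigmoid(X_chunk @ w)
    return X_chunk.T @ (p - y_chunk)

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
w = rng.normal(size=5)
y = (rng.random(1000) < 0.5).astype(float)

# Shard the rows, compute partial gradients in parallel, then sum.
chunks = np.array_split(np.arange(1000), 4)
with ThreadPoolExecutor() as pool:
    parts = list(pool.map(lambda idx: partial_grad(X[idx], y[idx], w), chunks))
grad_parallel = sum(parts)
grad_serial = partial_grad(X, y, w)
print(np.allclose(grad_parallel, grad_serial))  # True
```

The same decomposition is what makes distributed implementations (e.g. map-reduce style gradient aggregation) straightforward.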
For a generalized linear model, if the response variable follows a binomial or multinomial distribution and the logit function is chosen as the link function, the model is logistic regression. The logistic function is the inverse of the logit function.
Link function: (1) a key component of the generalized linear model, extending linear regression to the GLM; (2) the inverse of the link function takes arguments in (-∞, +∞), and if y follows a binomial distribution the response lies in [0, 1]. The inverse of any continuous cumulative distribution function (CDF) can be used as the link, since a CDF's range is [0, 1].
A generalized linear model is a linear model in the broad sense: each has a basic linear unit W·X (as in linear regression) and connects that linear unit to a response variable of some distribution through a link function. GLMs include linear regression (normal distribution), logistic regression (binomial/multinomial distribution), and Poisson regression (Poisson distribution). For binomial/multinomial distributions, we can also choose link functions other than the logit (a generalized form of logistic regression).