Logistic regression is a classification algorithm that uses a linear regression approach to model the probability of discrete outcomes. It assumes the log-odds of each class is a linear combination of the input features. While logistic regression expects linear relationships, non-linear transformations of the features allow it to model non-linear data. It outputs coefficients whose scale indicates the impact, and whose sign indicates the direction, of each feature's influence.
BigML, Inc - Topic Models
Logistic Regression

Potential Confusion:
• Classification implies a discrete objective. How can this be a "regression"?
• Why do we need another classification algorithm?
• …and more questions

Logistic Regression is a classification algorithm.
Regression

Key Take-Away: Regression is the process of "fitting" a function to the data.

• Linear Regression: β₀ + β₁·(INPUT) ≈ OBJECTIVE
• Quadratic Regression: β₀ + β₁·(INPUT) + β₂·(INPUT)² ≈ OBJECTIVE
• Decision Tree Regression: DT(INPUT) ≈ OBJECTIVE
• Problem:
  • What if we want to do a classification problem: T/F or 1/0?
  • What function can we fit to discrete data?
Logistic Function

f(x) = 1 / (1 + e⁻ˣ)

• Goal: as x → −∞, f(x) → 0, and as x → ∞, f(x) → 1
• Looks promising, but still not "discrete"
• What about the "green" in the middle?
• Let's change the problem…
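As a quick sketch of the behavior described above, the logistic function is a one-liner in Python:

```python
import math

def logistic(x):
    """Logistic (sigmoid) function: squashes any real x into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(logistic(0.0))    # 0.5, the midpoint
print(logistic(10.0))   # very close to 1
print(logistic(-10.0))  # very close to 0
```

The output is never exactly 0 or 1, which is why the slide notes it is "still not discrete": the result is a probability, which we can threshold to get a class.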
Logistic Regression

Clarification: LR is a classification algorithm … that uses a regression … to model the probability of the discrete objective.

Caveats:
• Assumes that the output is linearly related to the "predictors"
  • What? (hang in there…)
  • Sometimes we can "fix" this with feature engineering
• Question: how do we "fit" the logistic function to real data?
Logistic Regression

• Given training data consisting of inputs x and probabilities P
• Solve for β₀ and β₁ to fit the logistic function:

  P(x) = 1 / (1 + e^(−(β₀ + β₁x)))

  where β₀ is the "intercept" and β₁ is the "coefficient"
• How? The inverse of the logistic function is called the "logit":

  ln( P(x) / (1 − P(x)) ) = β₀ + β₁x

• In which case solving is now a linear regression
• But this is only one dimension, that is, one feature x…
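The logit trick can be demonstrated end-to-end in a few lines of Python. This is a sketch, not BigML's actual solver: the training probabilities here are synthesized from known coefficients (β₀ = −2, β₁ = 0.5, chosen arbitrarily) so we can check that applying the logit and running ordinary least squares recovers them:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    # The inverse of the logistic function
    return math.log(p / (1.0 - p))

# Hypothetical training data: probabilities generated from beta0 = -2, beta1 = 0.5
xs = [0.0, 2.0, 4.0, 6.0, 8.0]
ps = [logistic(-2.0 + 0.5 * x) for x in xs]

# Applying the logit linearizes the problem: logit(P(x)) = beta0 + beta1 * x,
# so simple least-squares linear regression on (x, logit(p)) recovers the betas
ys = [logit(p) for p in ps]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
beta1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
beta0 = mean_y - beta1 * mean_x
print(round(beta0, 6), round(beta1, 6))  # recovers -2.0 and 0.5
```

Real data gives observed classes (0/1), not clean probabilities, so production solvers use maximum likelihood rather than this direct inversion; the sketch only illustrates why the logit makes the problem linear.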
LR Parameters

1. Default Numeric: Replaces missing numeric values
2. Missing Numeric: Adds a field flagging missing numerics
3. Stats: Extended statistics, ex: p-value (runs slower)
4. Bias: Enables/disables the intercept term β₀
   • Don't disable this…
5. Regularization: Reduces over-fitting by penalizing large coefficients βⱼ
   • L1: prefers driving individual coefficients to zero
   • L2 (default): prefers shrinking all coefficients together
6. Strength "C": Higher values mean weaker regularization
7. EPS: The minimum error between steps to stop; larger values stop earlier but may reduce quality
8. Auto-scaling: Ensures that all features contribute equally
   • Don't change this unless you have a specific reason
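The effect of regularization strength can be seen with a toy gradient-descent fit. This is a minimal sketch, not BigML's solver: a single-feature logistic regression with an L2 penalty `lam` on β₁ (larger `lam` corresponds to a smaller "C"), fit on a tiny made-up dataset:

```python
import math

def fit_l2(xs, ys, lam, steps=5000, lr=0.1):
    """Gradient descent for one-feature logistic regression with
    an L2 penalty lam * b1**2 (the intercept b0 is not penalized)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += (p - y) / n
            g1 += (p - y) * x / n
        b0 -= lr * g0
        b1 -= lr * (g1 + 2.0 * lam * b1)  # penalty pulls b1 toward 0
    return b0, b1

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 0, 0, 1, 1]
_, b1_weak = fit_l2(xs, ys, lam=0.001)   # weak regularization
_, b1_strong = fit_l2(xs, ys, lam=1.0)   # strong regularization
print(b1_weak, b1_strong)  # the strongly regularized coefficient is smaller
```

This also illustrates why regularization reduces over-fitting: on this perfectly separable data the unpenalized coefficient would grow without bound, while the penalty keeps it finite.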
LR Questions

Questions:
• How do we handle multiple classes?
  • A binary class True/False only needs to solve for one: P(True) ≡ 1 − P(False)
• What about non-numeric inputs?
  • Text/Items fields
  • Categorical fields
LR - Multi Class

• Instead of a binary class, ex: [true, false], we have multi-class, ex: [red, green, blue, …]
• "k" classes: C = [c₁, c₂, ⋯, c_k]
• Solve a one-vs-rest LR for each class
• Result: coefficients 𝛃ⱼ for each class cⱼ
• Apply a combiner to ensure all probabilities add to 1

ln( P(c₁) / (1 − P(c₁)) ) = β₁,₀ + 𝛃₁·X
ln( P(c₂) / (1 − P(c₂)) ) = β₂,₀ + 𝛃₂·X
⋯
ln( P(c_k) / (1 − P(c_k)) ) = β_k,₀ + 𝛃_k·X
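A sketch of the one-vs-rest combiner in Python, assuming the per-class fits are already done. The coefficient pairs below are made up for illustration (single feature x; each pair stands for one class-vs-rest model):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical one-vs-rest coefficients (beta_j0, beta_j1) for one feature x;
# in practice each pair comes from fitting class j against all the rest
coeffs = {"red": (0.2, 1.0), "green": (-0.5, 0.3), "blue": (0.1, -0.8)}

def predict_proba(x):
    # Each one-vs-rest model produces its own probability estimate...
    raw = {c: logistic(b0 + b1 * x) for c, (b0, b1) in coeffs.items()}
    # ...and the combiner renormalizes so the k probabilities sum to 1
    total = sum(raw.values())
    return {c: p / total for c, p in raw.items()}

probs = predict_proba(1.5)
print(probs)
```

Renormalization is needed because the k independent one-vs-rest estimates do not sum to 1 on their own.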
LR - Field Codings

• LR expects numeric values to perform the regression.
• How do we handle categorical values, or text?

One-hot encoding
• Only one feature is "hot" for each class
• This is the default

Class     color=red  color=blue  color=green  color=NULL
red       1          0           0            0
blue      0          1           0            0
green     0          0           1            0
MISSING   0          0           0            1
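The one-hot table above can be sketched as a small Python function (the `color=` column naming mirrors the table; a real implementation would build the column set from the training data):

```python
def one_hot(value, classes):
    """One column per known class plus one for missing; exactly one column is 'hot'."""
    row = {f"color={c}": 0 for c in classes}
    row["color=NULL"] = 0
    key = f"color={value}" if value in classes else "color=NULL"
    row[key] = 1
    return row

print(one_hot("red", ["red", "blue", "green"]))
print(one_hot(None, ["red", "blue", "green"]))  # missing falls into the NULL column
```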
LR - Field Codings

Dummy encoding
• Chooses a *reference class* (here, red)
• Requires one fewer degree of freedom

Class     color_1  color_2  color_3
*red*     0        0        0
blue      1        0        0
green     0        1        0
MISSING   0        0        1
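A sketch of dummy encoding matching the table, in Python. The reference class encodes as all zeros, which is why one fewer column is needed than in one-hot:

```python
def dummy_encode(value, reference, others):
    """Dummy coding: the reference class is the all-zeros row;
    every other class (including MISSING) gets one indicator column."""
    row = {f"color_{i}": 0 for i in range(1, len(others) + 1)}
    if value != reference and value in others:
        row[f"color_{others.index(value) + 1}"] = 1
    return row

# Matching the table: red is the reference class
others = ["blue", "green", "MISSING"]
print(dummy_encode("red", "red", others))   # all zeros
print(dummy_encode("blue", "red", others))  # color_1 = 1
```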
LR - Field Codings

Contrast encoding
• Field values must sum to zero
• Allows comparison between classes

Class     field    "influence"
red        0.5     positive
blue      -0.25    negative
green     -0.25    negative
MISSING    0       excluded
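The contrast codes from the table can be expressed as a simple mapping; the sum-to-zero constraint is what makes the coefficients interpretable as comparisons between classes:

```python
# Contrast codes from the table above: red is contrasted against blue and green,
# and the codes must sum to zero
contrast = {"red": 0.5, "blue": -0.25, "green": -0.25}

print(sum(contrast.values()))  # 0.0 -- the sum-to-zero constraint

def encode(value):
    # MISSING is excluded from the contrast, so it contributes 0
    return contrast.get(value, 0.0)

print(encode("red"), encode("MISSING"))
```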
LR - Field Codings

Which one to use?
• One-hot is the default
  • Use this unless you have a specific need
• Dummy
  • Use when there is a control group in mind, which becomes the reference class
• Contrast
  • Allows for testing specific hypotheses about relationships
  • Ex: customers give a "rating" of bad / ok / good

rating   Contrast Encoding
bad      -0.66
ok        0.33
good      0.33
Hypothesis: a good and an ok review have the same impact, but a bad review has a negative impact twice as great.

rating   Contrast Encoding
bad      -0.5
ok        0
good      0.5
Hypothesis: a good and a bad review have an equal but opposite impact, while an ok rating has no impact.
LR - Field Codings

Text / Items
• Text/Items field types are handled by creating a field for each text token/item and setting it to 1 or 0

Text                                    "hippo"  "safari"  "zebra"
"we saw hippos and zebras…"             1        0         1
"The best safari for seeing zebras"     0        1         1
"The Oregon coast is rainy in winter"   0        0         0
"Have you ever tried a hippo burger"    1        0         0
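A sketch of this token-presence encoding in Python. The prefix match below is a crude stand-in for the stemming a real text analyzer performs (so that "hippos" and "zebras" count for "hippo" and "zebra"), not BigML's actual tokenizer:

```python
def encode_text(text, vocab):
    """1 if any token in the text starts with the vocab term, else 0.
    Prefix matching is a rough stemming approximation for illustration."""
    tokens = text.lower().split()
    return {term: int(any(t.startswith(term) for t in tokens)) for term in vocab}

vocab = ["hippo", "safari", "zebra"]
print(encode_text("we saw hippos and zebras", vocab))
print(encode_text("The Oregon coast is rainy in winter", vocab))
```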
Curvilinear LR

• Logistic Regression expects a linear relationship between the features and the objective
  • Remember - it's a linear regression under the hood
• But non-linear relationships are actually pretty common in natural datasets, and they will impact model quality
• This can be addressed by adding non-linear transformations of the features
• Knowing which transformations to use requires:
  • domain knowledge
  • experimentation
  • or both
Curvilinear LR

Instead of
  β₀ + β₁x₁
we could add a feature
  β₀ + β₁x₁ + β₂x₂
where x₂ is a non-linear transformation of x₁ (for example, x₂ ≡ x₁²).
It is possible to add any higher-order terms or other functions to match the shape of the data.
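A sketch of this feature-engineering idea in Python, with made-up coefficients; the squared term is just one possible transformation:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def score_linear(x1, b0, b1):
    # Original model: the log-odds depend linearly on x1
    return logistic(b0 + b1 * x1)

def score_curvilinear(x1, b0, b1, b2):
    # Engineered feature x2 = x1**2 lets the model bend
    x2 = x1 ** 2
    return logistic(b0 + b1 * x1 + b2 * x2)

# With b1 = 0 and b2 > 0 the curvilinear model is symmetric in x1 --
# a U-shaped relationship no purely linear score can express
print(score_curvilinear(2.0, -1.0, 0.0, 1.0))
print(score_curvilinear(-2.0, -1.0, 0.0, 1.0))
```

Note that the model is still linear in its coefficients, so the same fitting machinery applies; only the inputs were transformed.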
LR vs DT

Logistic Regression:
• Expects a "smooth" linear relationship with predictors
• LR is concerned with the probability of a discrete outcome
• Lots of parameters to get wrong: regularization, scaling, codings
• Slightly less prone to over-fitting
• Because it fits a shape, it might work better when less data is available

Decision Tree:
• Adapts well to ragged, non-linear relationships
• No concern: classification, regression, multi-class are all fine
• Virtually parameter free
• Slightly more prone to over-fitting
• Prefers surfaces parallel to parameter axes, but given enough data will discover any shape
Summary

• Logistic Regression is a classification algorithm that models the probabilities of each class
• How the algorithm works and why this is important
• It expects a linear relationship between the features and the objective, and feature engineering can fix non-linearity
• Categorical encodings
• LR outputs a set of coefficients; how to interpret them:
  • Scale relates to impact
  • Sign relates to direction of impact
• Guidelines for comparing to Decision Trees