Sparse Linear Model
Jungkyu Lee
Daum Search Quality Team
13.1 Introduction
• We look at how to do feature selection using a model-based approach.
• Application
• In the small N, large D problem there are too many features, so we want to do feature selection.
• Chapter 14 covers kernel functions (sparse kernel machines); there, the analogue of feature selection is using only a subset of the N training examples.
5.3 Bayesian model selection
• In regression, using too high a polynomial degree can cause overfitting, and conversely too low a degree can cause underfitting.
• Given models of different complexity, which one is the best model in general?
• In Chapter 13, the "model" will be a feature subset.
• Approach
• One approach is to use cross-validation to estimate the generalization error of all the
candidate models, and then to pick the model that seems the best.
• A more efficient approach is to compute the posterior over models (Bayesian model selection):
  p(m|D) = p(D|m) p(m) / Σ_m' p(D|m') p(m')
• If we use a uniform prior over models, p(m) ∝ 1, this amounts to picking the model which maximizes the marginal likelihood
  p(D|m) = ∫ p(D|θ) p(θ|m) dθ
Cross-validation requires splitting the data into train and test sets (this is what the CS community usually does); the posterior-based approach seems to use only the training set (BIC, AIC). Honestly I don't yet fully understand why one would do it this way.
5.3.2.4 BIC approximation to log marginal likelihood
• In general, computing the integral in Equation 5.13 can be quite difficult.
• The Bayesian information criterion (BIC) approximates the log marginal likelihood as
  log p(D|m) ≈ log p(D|θ̂) − (dof(θ̂)/2) log N
  i.e. a likelihood term minus a model-complexity term.
• dof(θ̂) is the number of degrees of freedom of the model.
• This is a form of penalized log likelihood. (A sketch of this score follows below.)
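As an illustration (not from the slides), here is a minimal numpy sketch of such a BIC-style score for a Gaussian linear regression; the function name and the choice dof = number of weights are assumptions.

```python
import numpy as np

def bic_linreg(X, y):
    """BIC-style score for a Gaussian linear regression fit by least squares.

    Uses the approximation  log p(D|m) ~ log p(D | theta_hat) - (dof/2) * log N,
    taking dof to be the number of regression weights.
    """
    N, D = X.shape
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # MLE of the weights
    resid = y - X @ w_hat
    sigma2_hat = np.mean(resid ** 2)                # MLE of the noise variance
    loglik = -0.5 * N * (np.log(2 * np.pi * sigma2_hat) + 1)
    return loglik - 0.5 * D * np.log(N)             # likelihood minus complexity penalty
```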
Bayesian variable selection
• How to compute p(D|γ): the spike and slab model; the Bernoulli-Gaussian model (keeps every wj around)
• l0 regularization: hard to optimize
• l1 regularization (lasso)
13.2 Bayesian variable selection
• We treat whether each feature is relevant as a random variable.
• model = m = γ
• Let γj = 1 if feature j is "relevant", and let γj = 0 otherwise.
• Our goal is to compute the posterior over models, p(γ|D) ∝ p(D|γ) p(γ).
13.2 Bayesian variable selection
• linregAllsubsetsGraycodeDemo (exhaustive scoring of all feature subsets; see the sketch below).
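linregAllsubsetsGraycodeDemo is a demo from Murphy's PMTK toolbox; the following is only a simplified sketch in the same spirit, assuming the bic_linreg helper from the previous sketch and a constant per-feature log-prior penalty. It enumerates all 2^D subsets, so it is feasible only for small D.

```python
from itertools import product
import numpy as np

def all_subsets_posterior(X, y, log_odds_penalty=1.0):
    """Unnormalized log posterior for every feature subset gamma (2^D of them).

    score(gamma) = BIC(X[:, gamma], y) - log_odds_penalty * ||gamma||_0,
    mirroring log p(gamma|D) ~ log p(D|gamma) - lambda * ||gamma||_0.
    """
    N, D = X.shape
    scores = {}
    for gamma in product([0, 1], repeat=D):
        idx = [j for j in range(D) if gamma[j] == 1]
        if idx:
            fit = bic_linreg(X[:, idx], y)
        else:  # empty model: only the noise variance is fit
            fit = -0.5 * N * (np.log(2 * np.pi * np.mean(y ** 2)) + 1)
        scores[gamma] = fit - log_odds_penalty * sum(gamma)
    return scores
```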
Bayesian variable selection
• How to compute p(D|γ): the spike and slab model; the Bernoulli-Gaussian model (keeps every wj around)
• l0 regularization: hard to optimize
• l1 regularization (lasso)
13.2.1 The spike and slab model
• We discuss how to compute p(D|γ) concretely (for the linear regression case).
• The posterior is given by p(γ|D) ∝ p(D|γ) p(γ), with prior p(γ) = Π_j Ber(γj | π0), so that
  log p(γ) = −λ ||γ||0 + const,
  where ||γ||0 is the number of non-zero elements of the vector and λ = log((1 − π0)/π0) controls sparsity.
13.2 Bayesian variable selection
13.2.1 The spike and slab model
• Remove from X and w the features for which γj = 0, leaving Xγ and wγ.
• The variance of the marginal likelihood p(D|γ) changes with the feature selection γ.
13.2 Bayesian variable selection
13.2.1 The spike and slab model
• When the marginal likelihood cannot be computed in closed form (e.g., if we are using logistic regression or a nonlinear model), we can approximate it using BIC.
• This penalizes by model complexity.
13.2 Bayesian variable selection
13.2.1 The spike and slab model
• In summary, to compute p(γ|D):
• the resulting posterior of the feature relevance vector γ is, roughly,
  log p(γ|D) ≈ (marginal likelihood) − (model complexity)
             = (likelihood − model complexity) − (model complexity)
• The complexity penalty therefore shows up twice, but we simply collapse both into a single λ.
Bayesian variable selection
• How to compute p(D|γ): the spike and slab model; the Bernoulli-Gaussian model (keeps every wj around)
• l0 regularization: hard to optimize
• l1 regularization (lasso)
13.2 Bayesian variable selection
13.2.2 From the Bernoulli-Gaussian model to l0 regularization
• The Bernoulli-Gaussian model, also called the binary mask model.
• Unlike the spike and slab model, the irrelevant coefficients do not disappear.
• The binary mask model has the form γj → y ← wj, whereas the spike and slab model has the form γj → wj → y.
13.2 Bayesian variable selection
13.2.2 From the Bernoulli-Gaussian model to l0 regularization
• The Bernoulli-Gaussian model can be used to derive l0 regularization.
• Given the data, the posterior over γ and w is p(γ, w|D) ∝ p(D|γ, w) p(γ, w).
• The joint prior p(γ, w) is defined as a product of Bernoulli terms on γj and Gaussian terms on wj.
• The γ and w that minimize the resulting objective are exactly the γ and w with the largest posterior.
• This leads to l0 regularization.
13.2 Bayesian variable selection
13.2.2 From the Bernoulli-Gaussian model to l0 regularization
• If σ²w → ∞, then
• the objective takes a form similar to the BIC approximation: a likelihood term plus a model-complexity term.
• By dropping the bit vector γ and encoding sparsity directly through the non-zero wj, we can write this as
  f(w) = (1/(2σ²)) ||y − Xw||² + λ ||w||0
• This is called l0 regularization.
• However, l0 regularization is hard to optimize (the ||w||0 penalty is non-convex).
• The rest of this chapter looks at how to work around this optimization problem (the lasso).
13.2 Bayesian variable selection
13.2.3 Algorithms
• As we will see, γ can sometimes be found by direct optimization (the lasso).
• However, there are cases where such optimization over γ is not possible.
• Since there are 2^D models, we cannot explore the full posterior, or find the globally optimal model.
• Instead we will have to resort to heuristics of one form or another.
• All of the methods we will discuss involve searching through the space of models, and evaluating the cost f(γ) at each point.
13.2 Bayesian variable selection
13.2.3 Algorithms
13.2.3.1 Greedy search
• Single best replacement (SBR):
• The simplest approach is to use greedy hill climbing.
• At each step, we define the neighborhood of the current model as all models reachable by adding or deleting a single variable.
• That is, for each variable, if adding it improves on the current model we add it, and if deleting it improves on the current model we delete it (a sketch follows below).
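A hedged sketch of this single-best-replacement style hill climbing over bit vectors; `score(gamma)` is assumed to return a penalized model score to maximize, for instance the BIC-plus-prior score sketched earlier (in practice it would fit the γ-restricted regression each time).

```python
def single_best_replacement(score, D, max_iter=100):
    """Greedy hill climbing over bit vectors gamma of length D.

    score(gamma) returns the penalized model score to maximize. At each step we
    flip the single bit (add or delete one feature) that most improves the score,
    and stop when no single flip helps.
    """
    gamma = [0] * D                        # start from the empty model
    best = score(tuple(gamma))
    for _ in range(max_iter):
        moves = []
        for j in range(D):
            cand = list(gamma)
            cand[j] = 1 - cand[j]          # add feature j if absent, delete it if present
            moves.append((score(tuple(cand)), cand))
        cand_score, cand = max(moves)
        if cand_score <= best:             # no single add/delete improves the model
            break
        gamma, best = cand, cand_score
    return gamma, best
```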
13.2 Bayesian variable selection
13.2.3 Algorithms
13.2.3.1 Greedy search
• The cost being optimized has the form
  f(γ) = min_w (1/(2σ²)) ||y − Xγ wγ||² + λ ||γ||0   (13.27)
• Orthogonal least squares
• If λ = 0, the model-complexity penalty in Equation (13.27) vanishes and there is no longer any reason for a deletion step, because there is no benefit to leaving a variable out (the training error only keeps decreasing).
• In this case, SBR becomes equivalent to orthogonal least squares, i.e. greedy forwards selection.
• Starting from the current feature set, we try adding each remaining feature one at a time, re-optimize w, and pick the feature that gives the smallest error.
• We then update the active set by setting γ(t+1) = γ(t) ∪ {j∗}.
• To choose the next feature to add at step t, we need to solve D − Dt least squares problems, where Dt = |γt| is the cardinality of the current active set.
13.2 Bayesian variable selection
13.2.3 Algorithms
13.2.3.1 Greedy search
• Orthogonal matching pursuits (OMP)
• Instead of trying every candidate, we only test the feature most correlated with the current residual,
• so we are just looking for the column that is most correlated with the current residual.
• This only requires one least squares calculation per iteration and so is faster than orthogonal least squares, but is not quite as accurate (a sketch follows below).
• An even more aggressive approximation is to just greedily add the feature that is most correlated with the current residual, without refitting.
• This is called matching pursuits (Mallat and Zhang 1993).
• This is also equivalent to a method known as least squares boosting (Section 16.4.6).
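A rough sketch of OMP under these assumptions (standardized columns, a fixed number K of features to select); names are illustrative, not Murphy's code.

```python
import numpy as np

def orthogonal_matching_pursuit(X, y, K):
    """Select K features greedily: pick the column most correlated with the
    current residual, then refit the whole active set by least squares.
    Assumes the columns of X are standardized.
    """
    N, D = X.shape
    active, w = [], np.zeros(D)
    residual = y.copy()
    for _ in range(K):
        corr = X.T @ residual
        corr[active] = 0.0                       # never re-select an active column
        j = int(np.argmax(np.abs(corr)))         # most correlated column
        active.append(j)
        w_active, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)  # one LS solve per step
        residual = y - X[:, active] @ w_active
    w[active] = w_active
    return w, active
```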
13.2 Bayesian variable selection
13.2.3 Algorithms
13.2.3.1 Greedy search
• Backwards selection
• starts with all variables in the model (the so-called saturated model), and then deletes the worst one at each step.
• This is equivalent to performing a greedy search from the top of the lattice downwards.
• This can give better results than a bottom-up search, since the decision about whether to keep a variable or not is made in the context of all the other variables that might depend on it (selection happens with the potentially dependent features already present, so performance is better).
• However, this method is typically infeasible for large problems, since the saturated model will be too expensive to fit (there are many features to fit, so the computation is heavy).
• Bayesian matching pursuit
• The algorithm of (Schniter et al. 2008) is similar to OMP except it uses a Bayesian marginal likelihood scoring criterion (under a spike and slab model) instead of a least squares objective.
13.2 Bayesian variable selection
13.2.3 Algorithms
13.2.3.2 Stochastic search
• If we want to approximate the posterior, rather than just computing a mode (e.g. because we want to compute marginal inclusion probabilities), one option is to use MCMC.
• The standard approach is to use Metropolis-Hastings, where the proposal distribution just flips single bits (see the sketch below).
• This enables us to efficiently compute p(γ'|D) given p(γ|D).
• The probability of a state (bit configuration) is estimated by counting how many times the random walk visits this state.
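A minimal sketch of this bit-flipping Metropolis-Hastings scheme, assuming `score(gamma)` returns an unnormalized log posterior such as the BIC-plus-prior score above.

```python
import numpy as np

def mh_bitflip_sampler(score, D, n_samples=10_000, seed=0):
    """Metropolis-Hastings over bit vectors gamma with single-bit-flip proposals.

    score(gamma) should return an unnormalized log posterior log p(gamma|D).
    Returns visit counts, from which marginal inclusion probabilities can be estimated.
    """
    rng = np.random.default_rng(seed)
    gamma = tuple(rng.integers(0, 2, size=D))
    log_p = score(gamma)
    counts = {}
    for _ in range(n_samples):
        j = int(rng.integers(D))
        prop = list(gamma)
        prop[j] = 1 - prop[j]                  # flip one bit (symmetric proposal)
        prop = tuple(prop)
        log_p_prop = score(prop)
        if np.log(rng.random()) < log_p_prop - log_p:   # MH acceptance rule
            gamma, log_p = prop, log_p_prop
        counts[gamma] = counts.get(gamma, 0) + 1
    return counts
```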

Instead of exploring all 2^D combinations of γ, or relying on heuristics like these, is there a way to optimize analytically? → the lasso
Bayesian variable selection
• How to compute p(D|γ): the spike and slab model; the Bernoulli-Gaussian model (keeps every wj around)
• l0 regularization: hard to optimize
• l1 regularization (lasso)
13.3 l1 regularization: basics — why switch from l0 to l1?
• When we have many variables, it is computationally difficult to find the posterior mode of p(γ|D).
• Part of the problem is due to the fact that the γj variables are discrete, γj ∈ {0, 1}.
• In the optimization community, it is common to relax hard constraints of this form by replacing discrete variables with continuous variables.
• We can do this by replacing the spike-and-slab style prior, which assigns finite probability mass to the event that wj = 0, with continuous priors that "encourage" wj = 0 by putting a lot of probability density near the origin, such as a zero-mean Laplace distribution.
• This gives l1 regularization.
• In the case of linear regression, the l1 objective becomes
  f(w) = RSS(w) + λ ||w||1
13.3.1 Why does l1 regularization yield sparse solutions?
• lasso, which stands for "least absolute shrinkage and selection operator"
• Geometric picture (figure annotations: objective-function contours vs. the constraint set): the l1 constraint set is a diamond, so the optimum is more likely to land on a corner.
• The penalty is smaller at the corners, so a w sitting on a corner is preferred by the optimization, and a w on a corner of the l1 ball is a sparse w.
13.3 l1 regularization: basics
13.3.2 Optimality conditions for lasso
• The lasso objective has the form f(w) = RSS(w) + λ ||w||1.
• Unfortunately, the ||w||1 term is not differentiable whenever wj = 0.
• This is an example of a non-smooth optimization problem.
13.3 l1 regularization: basics
13.3.2 Optimality conditions for lasso
• To handle non-smooth functions, we need to extend the notion of a derivative.
• We define a subderivative or subgradient of a (convex) function f: I → R at a point θ0 to be a scalar g such that
  f(θ) − f(θ0) ≥ g (θ − θ0)  for all θ ∈ I
• We define the set of subderivatives as the interval [a, b], where a and b are the one-sided limits
  a = lim_{θ→θ0−} (f(θ) − f(θ0)) / (θ − θ0),   b = lim_{θ→θ0+} (f(θ) − f(θ0)) / (θ − θ0)
13.3 l1 regularization: basics
13.3.2 Optimality conditions for lasso
• The set [a, b] of all subderivatives is called the subdifferential of the function f at θ0 and is denoted ∂f(θ)|θ0.
• For example, in the case of the absolute value function f(θ) = |θ|, the subdifferential is given by
  ∂f(θ) = {−1} if θ < 0,  [−1, 1] if θ = 0,  {+1} if θ > 0
• If the function is everywhere differentiable, then ∂f(θ) = {df(θ)/dθ}.
• At θ = 0 there are infinitely many subderivatives.
13.3 l1 regularization: basics
13.3.2 Optimality conditions for lasso
• Let us apply these concepts to the lasso problem.
• Let us initially ignore the non-smooth penalty term. Differentiating the smooth (RSS) part with respect to wj gives
  ∂/∂wj RSS(w) = aj wj − cj,  where aj = 2 Σi x²ij and cj = 2 Σi xij (yi − wᵀ−j xi,−j)
• (cj is the correlation between feature j and the residual obtained by predicting with all the other features.)
• Here w−j is w without component j, and similarly for xi,−j.
• We see that cj is (proportional to) the correlation between the j'th feature x:,j and the residual due to the other features, r−j = y − X:,−j w−j.
• (In words: the correlation between the j'th column of X and the difference between the actual values and the prediction made without feature j.)
• Hence the magnitude of cj is an indication of how relevant feature j is for predicting y (relative to the other features and the current parameters).
• (Roughly, how much including feature j in the prediction could fill the remaining gap to y.)
• Therefore, depending on the range of cj, the w that optimizes f can be written as
  ŵj = soft(cj / aj ; λ / aj)
• where
  soft(a; δ) ≜ sign(a) (|a| − δ)+
• and x+ = max(x, 0) is the positive part of x. This is called soft thresholding (see the sketch below).
• In words: unless the correlation cj between feature j and the residual of the other features is strongly positive (above λ) or strongly negative (below −λ), we set wj = 0, i.e. feature j is not used.
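The soft thresholding operator itself is one line; a sketch, using the convention soft(a; δ) = sign(a)(|a| − δ)+ from above.

```python
import numpy as np

def soft_threshold(a, delta):
    """soft(a; delta) = sign(a) * (|a| - delta)_+ : shrink toward zero, clip to zero inside [-delta, delta]."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)
```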
13.4.1 Coordinate descent
• Hold all the features other than the jth fixed, and optimize only the jth coefficient.
Coordinate descent for lasso (aka shooting algorithm)
• Coordinate descent is useful when each one-dimensional optimization problem can be solved analytically.
• As we saw above, for the lasso we can optimize wj for a particular feature in closed form while the remaining coefficients are held fixed (a sketch follows below).
• See (Yuan et al. 2010) for some extensions of this method to the logistic regression case.
• The resulting algorithm was the fastest method in their experimental comparison, which concerned document classification with large sparse feature vectors (representing bags of words).
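Putting the previous pieces together, a minimal sketch of the shooting algorithm for the objective RSS(w) + λ||w||1, assuming the soft_threshold helper above and the aj, cj definitions from the optimality conditions (no convergence check, fixed number of sweeps).

```python
import numpy as np

def lasso_shooting(X, y, lam, n_sweeps=100):
    """Coordinate descent ("shooting") for the lasso objective RSS(w) + lam * ||w||_1.

    Each sweep cycles over the features, solving the one-dimensional problem for w_j
    in closed form (soft thresholding) while the other coefficients stay fixed.
    """
    N, D = X.shape
    w = np.zeros(D)
    a = 2.0 * np.sum(X ** 2, axis=0)              # a_j = 2 * sum_i x_ij^2
    for _ in range(n_sweeps):
        for j in range(D):
            r_j = y - X @ w + X[:, j] * w[j]      # residual with feature j's contribution removed
            c_j = 2.0 * X[:, j] @ r_j             # c_j = 2 * x_j^T r_j
            w[j] = soft_threshold(c_j / a[j], lam / a[j])
    return w
```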
• By contrast, in Figure 13.5(b), we illustrate hard thresholding.
• This sets wj to 0 if −λ ≤ cj ≤ λ, but it does not shrink the values of wj outside of this interval.
• The slope of the soft thresholding line does not coincide with the diagonal, which means that even
large coefficients are shrunk towards zero;
• consequently lasso is a biased estimator.
• This is undesirable, since if the likelihood indicates (via cj) that the coefficient wj should be large, we
do not want to shrink it. We will discuss this issue in more detail in Section 13.6.2.
13.3.3 Comparison of least squares, lasso, ridge and subset selection
• For simplicity, assume all the features of X are orthonormal, so XᵀX = I. In this case the RSS is given by
  RSS(w) = ||y − Xw||² = const + Σk wk² − 2 Σk wk ŵk^OLS,  where ŵk^OLS = x:,kᵀ y
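Since the slide's comparison table is an image, the standard closed forms under XᵀX = I are restated below; the exact thresholds (the λ/2 in the lasso formula, the τ in subset selection) depend on how each penalty is scaled in the objective, so treat the constants as indicative rather than as the slide's exact expressions.

```latex
\hat{w}_k^{\mathrm{OLS}} = \mathbf{x}_{:,k}^{\top}\mathbf{y}, \qquad
\hat{w}_k^{\mathrm{ridge}} = \frac{\hat{w}_k^{\mathrm{OLS}}}{1+\lambda}, \qquad
\hat{w}_k^{\mathrm{lasso}} = \operatorname{sign}\!\big(\hat{w}_k^{\mathrm{OLS}}\big)
   \big(|\hat{w}_k^{\mathrm{OLS}}| - \tfrac{\lambda}{2}\big)_{+}, \qquad
\hat{w}_k^{\mathrm{subset}} = \hat{w}_k^{\mathrm{OLS}}\,
   \mathbb{I}\big(|\hat{w}_k^{\mathrm{OLS}}| > \tau\big)
```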
13.3.3 Comparison of least squares, lasso, ridge and subset selection
• LS = least squares,
• Subset = best subset regression (the all-possible-subsets regression procedure)
• In this comparison, lasso gives better prediction accuracy.
• Lasso also gives rise to a sparse solution. Of course, for other problems, ridge may give better predictive accuracy.
• In practice, a combination of lasso and ridge, known as the elastic net, often performs best, since it provides a good combination of sparsity and regularization (see Section 13.5.3).
13.3.4 Regularization path
• As we increase λ, the solution vector ŵ(λ) will tend to get sparser, although not necessarily monotonically.
• We can plot the values ŵj(λ) for each feature j; this is known as the regularization path (see the sketch below).
• For each wj there is a critical value of λ at which it becomes non-zero ("switches on").
• LARS (least angle regression and shrinkage) is an algorithm that computes these critical points for every feature in a single fitting pass.
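A quick way to reproduce such a plot, as a sketch using scikit-learn's lasso_path on synthetic data (the dataset and plotting choices are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

# Toy data with a few informative features; any regression dataset would do.
X, y = make_regression(n_samples=100, n_features=10, n_informative=3, noise=5.0, random_state=0)

# lasso_path computes w_hat(lambda) over a whole grid of penalty strengths in one call.
alphas, coefs, _ = lasso_path(X, y)           # coefs has shape (n_features, n_alphas)

for j in range(coefs.shape[0]):
    plt.plot(alphas, coefs[j])
plt.xscale("log")
plt.gca().invert_xaxis()                      # strong penalty on the left: coefficients switch on as lambda shrinks
plt.xlabel("lambda (alpha)")
plt.ylabel("coefficient value")
plt.title("Lasso regularization path")
plt.show()
```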
13.5.3 Elastic net (ridge and lasso combined)
• When there are many strongly correlated features, lasso tends to pick just one of them, somewhat arbitrarily.
• In the D > N case, lasso can select at most N variables before it saturates.
• If N > D, but the variables are correlated, it has been empirically observed that the prediction performance of ridge is better than that of lasso.
• Grouping effect: highly correlated features tend to receive the same weight (whereas lasso picks one of them).
• For example, if two features are equal, so X:,j = X:,k, one can show that their estimates are also equal, ŵj = ŵk.
• By contrast, with lasso, we may have ŵj = 0 and ŵk ≠ 0, or vice versa.
• So the elastic net has the advantage of keeping features that would otherwise be dropped merely because of strong correlation, while still filtering out features that are unrelated to the response (see the sketch below).
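A small illustration of the grouping effect under these assumptions (two identical columns, arbitrary penalty strengths), using scikit-learn's Lasso and ElasticNet:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, x])                      # two identical, i.e. perfectly correlated, features
y = 3 * x + rng.normal(scale=0.1, size=200)

# Lasso tends to put the weight on a single column; the elastic net spreads it across both.
print("lasso      :", Lasso(alpha=0.1).fit(X, y).coef_)
print("elastic net:", ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_)
```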
Conclusion
Bayesian variable selection
• How to compute p(D|γ): the spike and slab model; the Bernoulli-Gaussian model (keeps every wj around)
• l0 regularization: hard to optimize
• l1 regularization (lasso)

Weitere ähnliche Inhalte

Was ist angesagt?

머피's 머신러닝, Mixture model and EM algorithm
머피's 머신러닝, Mixture model and EM algorithm머피's 머신러닝, Mixture model and EM algorithm
머피's 머신러닝, Mixture model and EM algorithmJungkyu Lee
 
PRML復々習レーン#9 前回までのあらすじ
PRML復々習レーン#9 前回までのあらすじPRML復々習レーン#9 前回までのあらすじ
PRML復々習レーン#9 前回までのあらすじsleepy_yoshi
 
PRML EP法 10.7 10.7.2
PRML EP法 10.7 10.7.2 PRML EP法 10.7 10.7.2
PRML EP法 10.7 10.7.2 tmtm otm
 
Approximate Inference (Chapter 10, PRML Reading)
Approximate Inference (Chapter 10, PRML Reading)Approximate Inference (Chapter 10, PRML Reading)
Approximate Inference (Chapter 10, PRML Reading)Ha Phuong
 
統計的学習の基礎 第2章後半
統計的学習の基礎 第2章後半統計的学習の基礎 第2章後半
統計的学習の基礎 第2章後半Prunus 1350
 
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.Yongho Ha
 
はじパタ11章 後半
はじパタ11章 後半はじパタ11章 後半
はじパタ11章 後半Atsushi Hayakawa
 
RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기Woong won Lee
 
MixMatch: A Holistic Approach to Semi- Supervised Learning
MixMatch: A Holistic Approach to Semi- Supervised LearningMixMatch: A Holistic Approach to Semi- Supervised Learning
MixMatch: A Holistic Approach to Semi- Supervised Learningharmonylab
 
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)Euijin Jeong
 
Kaggle Avito Demand Prediction Challenge 9th Place Solution
Kaggle Avito Demand Prediction Challenge 9th Place SolutionKaggle Avito Demand Prediction Challenge 9th Place Solution
Kaggle Avito Demand Prediction Challenge 9th Place SolutionJin Zhan
 
Neural network (perceptron)
Neural network (perceptron)Neural network (perceptron)
Neural network (perceptron)Jeonghun Yoon
 
PRML 6.1章 カーネル法と双対表現
PRML 6.1章 カーネル法と双対表現PRML 6.1章 カーネル法と双対表現
PRML 6.1章 カーネル法と双対表現hagino 3000
 
2014.02.20_5章ニューラルネットワーク
2014.02.20_5章ニューラルネットワーク2014.02.20_5章ニューラルネットワーク
2014.02.20_5章ニューラルネットワークTakeshi Sakaki
 
인공지능 방법론 - 딥러닝 이해하기
인공지능 방법론 - 딥러닝 이해하기인공지능 방법론 - 딥러닝 이해하기
인공지능 방법론 - 딥러닝 이해하기Byoung-Hee Kim
 
敵対的学習に対するラデマッハ複雑度
敵対的学習に対するラデマッハ複雑度敵対的学習に対するラデマッハ複雑度
敵対的学習に対するラデマッハ複雑度Masa Kato
 
Prml 2_3_8
Prml 2_3_8Prml 2_3_8
Prml 2_3_8brownbro
 
2 5 3.一般化線形モデル色々_Gamma回帰と対数線形モデル
2 5 3.一般化線形モデル色々_Gamma回帰と対数線形モデル2 5 3.一般化線形モデル色々_Gamma回帰と対数線形モデル
2 5 3.一般化線形モデル色々_Gamma回帰と対数線形モデルlogics-of-blue
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기NAVER Engineering
 
[DL輪読会]YOLO9000: Better, Faster, Stronger
[DL輪読会]YOLO9000: Better, Faster, Stronger[DL輪読会]YOLO9000: Better, Faster, Stronger
[DL輪読会]YOLO9000: Better, Faster, StrongerDeep Learning JP
 

Was ist angesagt? (20)

머피's 머신러닝, Mixture model and EM algorithm
머피's 머신러닝, Mixture model and EM algorithm머피's 머신러닝, Mixture model and EM algorithm
머피's 머신러닝, Mixture model and EM algorithm
 
PRML復々習レーン#9 前回までのあらすじ
PRML復々習レーン#9 前回までのあらすじPRML復々習レーン#9 前回までのあらすじ
PRML復々習レーン#9 前回までのあらすじ
 
PRML EP法 10.7 10.7.2
PRML EP法 10.7 10.7.2 PRML EP法 10.7 10.7.2
PRML EP法 10.7 10.7.2
 
Approximate Inference (Chapter 10, PRML Reading)
Approximate Inference (Chapter 10, PRML Reading)Approximate Inference (Chapter 10, PRML Reading)
Approximate Inference (Chapter 10, PRML Reading)
 
統計的学習の基礎 第2章後半
統計的学習の基礎 第2章後半統計的学習の基礎 第2章後半
統計的学習の基礎 第2章後半
 
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
 
はじパタ11章 後半
はじパタ11章 後半はじパタ11章 後半
はじパタ11章 後半
 
RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기
 
MixMatch: A Holistic Approach to Semi- Supervised Learning
MixMatch: A Holistic Approach to Semi- Supervised LearningMixMatch: A Holistic Approach to Semi- Supervised Learning
MixMatch: A Holistic Approach to Semi- Supervised Learning
 
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
 
Kaggle Avito Demand Prediction Challenge 9th Place Solution
Kaggle Avito Demand Prediction Challenge 9th Place SolutionKaggle Avito Demand Prediction Challenge 9th Place Solution
Kaggle Avito Demand Prediction Challenge 9th Place Solution
 
Neural network (perceptron)
Neural network (perceptron)Neural network (perceptron)
Neural network (perceptron)
 
PRML 6.1章 カーネル法と双対表現
PRML 6.1章 カーネル法と双対表現PRML 6.1章 カーネル法と双対表現
PRML 6.1章 カーネル法と双対表現
 
2014.02.20_5章ニューラルネットワーク
2014.02.20_5章ニューラルネットワーク2014.02.20_5章ニューラルネットワーク
2014.02.20_5章ニューラルネットワーク
 
인공지능 방법론 - 딥러닝 이해하기
인공지능 방법론 - 딥러닝 이해하기인공지능 방법론 - 딥러닝 이해하기
인공지능 방법론 - 딥러닝 이해하기
 
敵対的学習に対するラデマッハ複雑度
敵対的学習に対するラデマッハ複雑度敵対的学習に対するラデマッハ複雑度
敵対的学習に対するラデマッハ複雑度
 
Prml 2_3_8
Prml 2_3_8Prml 2_3_8
Prml 2_3_8
 
2 5 3.一般化線形モデル色々_Gamma回帰と対数線形モデル
2 5 3.一般化線形モデル色々_Gamma回帰と対数線形モデル2 5 3.一般化線形モデル色々_Gamma回帰と対数線形モデル
2 5 3.一般化線形モデル色々_Gamma回帰と対数線形モデル
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
 
[DL輪読会]YOLO9000: Better, Faster, Stronger
[DL輪読会]YOLO9000: Better, Faster, Stronger[DL輪読会]YOLO9000: Better, Faster, Stronger
[DL輪読会]YOLO9000: Better, Faster, Stronger
 

Andere mochten auch

Eigenvalues of regular graphs
Eigenvalues of regular graphsEigenvalues of regular graphs
Eigenvalues of regular graphsJungkyu Lee
 
3 Generative models for discrete data
3 Generative models for discrete data3 Generative models for discrete data
3 Generative models for discrete dataJungkyu Lee
 
Jensen's inequality, EM 알고리즘
Jensen's inequality, EM 알고리즘 Jensen's inequality, EM 알고리즘
Jensen's inequality, EM 알고리즘 Jungkyu Lee
 
머피의 머신러닝: Undirencted Graphical Model
머피의 머신러닝: Undirencted Graphical Model머피의 머신러닝: Undirencted Graphical Model
머피의 머신러닝: Undirencted Graphical ModelJungkyu Lee
 
파이널 판타지 3 루트 공략
파이널 판타지 3 루트 공략파이널 판타지 3 루트 공략
파이널 판타지 3 루트 공략Jungkyu Lee
 
ThinkBayes: chapter 13  simulation
ThinkBayes: chapter 13  simulationThinkBayes: chapter 13  simulation
ThinkBayes: chapter 13  simulationJungkyu Lee
 
Murpy's Machine Learning:14. Kernel
Murpy's Machine Learning:14. KernelMurpy's Machine Learning:14. Kernel
Murpy's Machine Learning:14. KernelJungkyu Lee
 
ThinkBayes: Chapter 9 two_dimensions
ThinkBayes: Chapter 9 two_dimensionsThinkBayes: Chapter 9 two_dimensions
ThinkBayes: Chapter 9 two_dimensionsJungkyu Lee
 
Murpy's Machine Learning 9. Generalize Linear Model
Murpy's Machine Learning 9. Generalize Linear ModelMurpy's Machine Learning 9. Generalize Linear Model
Murpy's Machine Learning 9. Generalize Linear ModelJungkyu Lee
 
머피's 머신러닝: Latent Linear Model
머피's 머신러닝: Latent Linear Model머피's 머신러닝: Latent Linear Model
머피's 머신러닝: Latent Linear ModelJungkyu Lee
 
Murpy's Machine Learing: 10. Directed Graphical Model
Murpy's Machine Learing: 10. Directed Graphical ModelMurpy's Machine Learing: 10. Directed Graphical Model
Murpy's Machine Learing: 10. Directed Graphical ModelJungkyu Lee
 
TETRIS AI WITH REINFORCEMENT LEARNING
TETRIS AI WITH REINFORCEMENT LEARNINGTETRIS AI WITH REINFORCEMENT LEARNING
TETRIS AI WITH REINFORCEMENT LEARNINGJungkyu Lee
 
머피's 머신러닝: Latent Linear Model
머피's 머신러닝: Latent Linear Model머피's 머신러닝: Latent Linear Model
머피's 머신러닝: Latent Linear ModelJungkyu Lee
 
7. Linear Regression
7. Linear Regression7. Linear Regression
7. Linear RegressionJungkyu Lee
 
4. Gaussian Model
4. Gaussian Model4. Gaussian Model
4. Gaussian ModelJungkyu Lee
 
앙상블 학습 기반의 추천시스템 개발
앙상블 학습 기반의 추천시스템 개발앙상블 학습 기반의 추천시스템 개발
앙상블 학습 기반의 추천시스템 개발Jungkyu Lee
 
머피의 머신러닝: 17장 Markov Chain and HMM
머피의 머신러닝: 17장  Markov Chain and HMM머피의 머신러닝: 17장  Markov Chain and HMM
머피의 머신러닝: 17장 Markov Chain and HMMJungkyu Lee
 
Fiddler 피들러에 대해 알아보자
Fiddler 피들러에 대해 알아보자Fiddler 피들러에 대해 알아보자
Fiddler 피들러에 대해 알아보자용진 조
 
From A Neural Probalistic Language Model to Word2vec
From A Neural Probalistic Language Model to Word2vecFrom A Neural Probalistic Language Model to Word2vec
From A Neural Probalistic Language Model to Word2vecJungkyu Lee
 
Support Vector Machine Tutorial 한국어
Support Vector Machine Tutorial 한국어Support Vector Machine Tutorial 한국어
Support Vector Machine Tutorial 한국어Jungkyu Lee
 

Andere mochten auch (20)

Eigenvalues of regular graphs
Eigenvalues of regular graphsEigenvalues of regular graphs
Eigenvalues of regular graphs
 
3 Generative models for discrete data
3 Generative models for discrete data3 Generative models for discrete data
3 Generative models for discrete data
 
Jensen's inequality, EM 알고리즘
Jensen's inequality, EM 알고리즘 Jensen's inequality, EM 알고리즘
Jensen's inequality, EM 알고리즘
 
머피의 머신러닝: Undirencted Graphical Model
머피의 머신러닝: Undirencted Graphical Model머피의 머신러닝: Undirencted Graphical Model
머피의 머신러닝: Undirencted Graphical Model
 
파이널 판타지 3 루트 공략
파이널 판타지 3 루트 공략파이널 판타지 3 루트 공략
파이널 판타지 3 루트 공략
 
ThinkBayes: chapter 13  simulation
ThinkBayes: chapter 13  simulationThinkBayes: chapter 13  simulation
ThinkBayes: chapter 13  simulation
 
Murpy's Machine Learning:14. Kernel
Murpy's Machine Learning:14. KernelMurpy's Machine Learning:14. Kernel
Murpy's Machine Learning:14. Kernel
 
ThinkBayes: Chapter 9 two_dimensions
ThinkBayes: Chapter 9 two_dimensionsThinkBayes: Chapter 9 two_dimensions
ThinkBayes: Chapter 9 two_dimensions
 
Murpy's Machine Learning 9. Generalize Linear Model
Murpy's Machine Learning 9. Generalize Linear ModelMurpy's Machine Learning 9. Generalize Linear Model
Murpy's Machine Learning 9. Generalize Linear Model
 
머피's 머신러닝: Latent Linear Model
머피's 머신러닝: Latent Linear Model머피's 머신러닝: Latent Linear Model
머피's 머신러닝: Latent Linear Model
 
Murpy's Machine Learing: 10. Directed Graphical Model
Murpy's Machine Learing: 10. Directed Graphical ModelMurpy's Machine Learing: 10. Directed Graphical Model
Murpy's Machine Learing: 10. Directed Graphical Model
 
TETRIS AI WITH REINFORCEMENT LEARNING
TETRIS AI WITH REINFORCEMENT LEARNINGTETRIS AI WITH REINFORCEMENT LEARNING
TETRIS AI WITH REINFORCEMENT LEARNING
 
머피's 머신러닝: Latent Linear Model
머피's 머신러닝: Latent Linear Model머피's 머신러닝: Latent Linear Model
머피's 머신러닝: Latent Linear Model
 
7. Linear Regression
7. Linear Regression7. Linear Regression
7. Linear Regression
 
4. Gaussian Model
4. Gaussian Model4. Gaussian Model
4. Gaussian Model
 
앙상블 학습 기반의 추천시스템 개발
앙상블 학습 기반의 추천시스템 개발앙상블 학습 기반의 추천시스템 개발
앙상블 학습 기반의 추천시스템 개발
 
머피의 머신러닝: 17장 Markov Chain and HMM
머피의 머신러닝: 17장  Markov Chain and HMM머피의 머신러닝: 17장  Markov Chain and HMM
머피의 머신러닝: 17장 Markov Chain and HMM
 
Fiddler 피들러에 대해 알아보자
Fiddler 피들러에 대해 알아보자Fiddler 피들러에 대해 알아보자
Fiddler 피들러에 대해 알아보자
 
From A Neural Probalistic Language Model to Word2vec
From A Neural Probalistic Language Model to Word2vecFrom A Neural Probalistic Language Model to Word2vec
From A Neural Probalistic Language Model to Word2vec
 
Support Vector Machine Tutorial 한국어
Support Vector Machine Tutorial 한국어Support Vector Machine Tutorial 한국어
Support Vector Machine Tutorial 한국어
 

Ähnlich wie Sparse Linear Models Guide

Sparsenet
SparsenetSparsenet
Sparsenetndronen
 
4. OPTIMIZATION NN AND FL.pptx
4. OPTIMIZATION NN AND FL.pptx4. OPTIMIZATION NN AND FL.pptx
4. OPTIMIZATION NN AND FL.pptxkumarkaushal17
 
Linear regression
Linear regressionLinear regression
Linear regressionansrivas21
 
Regression analysis and its type
Regression analysis and its typeRegression analysis and its type
Regression analysis and its typeEkta Bafna
 
Model Selection and Validation
Model Selection and ValidationModel Selection and Validation
Model Selection and Validationgmorishita
 
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui MengGeneralized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui MengSpark Summit
 
Generalized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRGeneralized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRDatabricks
 
Deep learning concepts
Deep learning conceptsDeep learning concepts
Deep learning conceptsJoe li
 
ngboost.pptx
ngboost.pptxngboost.pptx
ngboost.pptxHadrian7
 
Logistic Regression
Logistic RegressionLogistic Regression
Logistic RegressionDong Guo
 
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Maninda Edirisooriya
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Shrinkage Methods in Linear Regression
Shrinkage Methods in Linear RegressionShrinkage Methods in Linear Regression
Shrinkage Methods in Linear RegressionBennoG1
 
Linear Regression
Linear RegressionLinear Regression
Linear Regressionmailund
 

Ähnlich wie Sparse Linear Models Guide (20)

Sparsenet
SparsenetSparsenet
Sparsenet
 
4. OPTIMIZATION NN AND FL.pptx
4. OPTIMIZATION NN AND FL.pptx4. OPTIMIZATION NN AND FL.pptx
4. OPTIMIZATION NN AND FL.pptx
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Regression analysis and its type
Regression analysis and its typeRegression analysis and its type
Regression analysis and its type
 
Model Selection and Validation
Model Selection and ValidationModel Selection and Validation
Model Selection and Validation
 
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui MengGeneralized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
 
Generalized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRGeneralized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkR
 
Deep learning concepts
Deep learning conceptsDeep learning concepts
Deep learning concepts
 
ngboost.pptx
ngboost.pptxngboost.pptx
ngboost.pptx
 
Logistic Regression
Logistic RegressionLogistic Regression
Logistic Regression
 
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
Lecture 5 - Gradient Descent, a lecture in subject module Statistical & Machi...
 
15303589.ppt
15303589.ppt15303589.ppt
15303589.ppt
 
MF Presentation.pptx
MF Presentation.pptxMF Presentation.pptx
MF Presentation.pptx
 
Ai saturdays presentation
Ai saturdays presentationAi saturdays presentation
Ai saturdays presentation
 
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
Optimizing Deep Networks (D1L6 Insight@DCU Machine Learning Workshop 2017)
 
Shrinkage Methods in Linear Regression
Shrinkage Methods in Linear RegressionShrinkage Methods in Linear Regression
Shrinkage Methods in Linear Regression
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
 
Svm ms
Svm msSvm ms
Svm ms
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
 

Kürzlich hochgeladen

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Sparse Linear Models Guide

  • 1. Sparse Linear Model Jungkyu Lee Daum Search Quality Team
  • 2. 13.1 Introduction • model-based approach를 사용해서 feature selection하는 방법에 대해서 알아본다 • Application • small N, large D proble의 경우, featur가 너무 많기 때문에, feature selection을 하고 싶다 • 14장에서, kernel function에 대해서 다룬다. (sparse kernel machine) • feature selecton이 N개의 training example 중 부분 집합만 사용하는 방법이다.
  • 3. 5.3 Bayesian model selection • regression시 너무 높은 degree의 polynomial을 쓰면 overfitting이 일어날 수 있고 반대로 너무 낮 은 degee의 polynomial을 쓰면 underfiting이 일어날 수 있다 • 다른 복잡도를 가진 모델을 만날을 때, 일반적으로 어떤 것이 가장 좋은 모델인가? • 13장에서 다룰 것 model = feature subset 입니다 • Approach • One approach is to use cross-validation to estimate the generalization error of all the candidate models, and then to pick the model that seems the best. • A more efficient approach is to compute the posterior over models (Bayesian model selection. • If we use a uniform prior over models, p(m)∝1, this amounts to picking the model which maximizes marginal likelihood cross-validation은 train와 test셋을 나누어야 하고 (보통 cs community에서 많이 함) posterior 법은 train set으로만 하는 것 같다 (bic,aic)  이건 솔직히 왜 하는지는 아직 이해는 안가지만
  • 4. 5.3.2.4 BIC approximation to log marginal likelihood • In general, computing the integral in Equation 5.13 can be quite difficult. • Bayesian information criterion or BIC likelihood model complexity • dof (ˆθ) is the number of degrees of freedom • penalized log likelihood
  • 5. Bayesian variable selection p(D|γ) 구하는 방법 The spike and slab model wj를 계속 살림 Beroulli Gaussian Model l0 regulization 최적화 어려움 l1 regulization (lasso)
  • 6. 13.2 Bayesian variable selection • 어떤 피쳐가 릴러번트한지를 랜덤변수로 본다. • model = m = γ • Let γj =1 if feature j is “relevant”, and let γj =0 otherwise. • Our goal is to compute the posterior over models
  • 7. 13.2 Bayesian variable selection linregAllsubsetsGraycodeDemo.
  • 8. Bayesian variable selection p(D|γ) 구하는 방법 The spike and slab model wj를 계속 살림 Beroulli Gaussian Model l0 regulization 최적화 어려움 l1 regulization (lasso)
  • 9. 13.2.1 The spike and slab model • 을 구체적으로 구하는 방법에 대해서 논의한다 (linear regression의 경우) • The posterior is given by the number of non-zero elements of the vector.
  • 10. 13.2 Bayesian variable selection 13.2.1 The spike and slab model • γ이 0인 것의 feature를 X와 w에서 없앤다, Xr, wr feature selection γ 에 따라 p(D|γ)의 분산이 바뀐다
  • 11. 13.2 Bayesian variable selection 13.2.1 The spike and slab model • When the marginal likelihood cannot be computed in closed form (e.g., if we are using logistic regression or a nonlinear model) . we can approximate it using BIC model complexity로 페널티
  • 12. 13.2 Bayesian variable selection 13.2.1 The spike and slab model • 요약하면, p(γ|D)을 구하기 위해 • 결과적으로 feature relevance vector γ의 posterior는 • 즉 (maginal likelihood) – (model complexity) = (likelihood – model complexity) – (model complexity) • complexity에 대한 penalties가 두 번 일어나는데, 그냥 λ하나로 묶는다
  • 13. Bayesian variable selection p(D|γ) 구하는 방법 The spike and slab model wj를 계속 살림 Beroulli Gaussian Model l0 regulization 최적화 어려움 l1 regulization (lasso)
  • 14. 13.2 Bayesian variable selection 13.2.2 From the Bernoulli-Gaussian model to l0 regularization • Bernoulli Gaussian model, binary mask model • spike and slab model 과는 다르게, irrelevant한 coefficients들이 사라지지 않는다 • the binary mask model has the form γj →y←wj, whereas the spike and slab model has the form γj →wj →y.
  • 15. 13.2 Bayesian variable selection 13.2.2 From the Bernoulli-Gaussian model to • the Bernoulli-Gaussian model은 l0 regularization을 유도하는데 사용된다. • 데이터가 주어졌을 때, γ와 w의 posterior는 • joint prior p(γ, w)는 다음과 같이 정의한다 즉 위의 함수를 최소화하는 γ와 w = posterior가 가장 큰 γ와 w l0 regularization
  • 16. 13.2 Bayesian variable selection 13.2.2 From the Bernoulli-Gaussian model to l0 regularization • σ2w→∞,이면, • likelihood에 model complexity를 더한 BIC 근사와 비슷한 모양이 되었다 • bit vector γ을 없애고 0이 아닌 wj만 표현하므로써, 다음과 같이 표현할 수 있다. • 이것을 l0 regularization이라고 부른다. • 하지만 lo regularization은 최적화하기 어렵다. • 이 장의 나머지에서 l0 regularization을 최적화하는 방법에 대해서 알아본다(lasso)
  • 17. 13.2 Bayesian variable selection 13.2.3 Algorithms • 앞에서는 γ를 찾을 때 최적화로 찾을 수도 있다(lasso) • 하지만, 이러한 γ 최적화가 불가능한 경우도 있다. • Since there are 2D models, we cannot explore the full posterior, or find the globally optimal model. • Instead we will have to resort to heuristics of one form or another. • All of the methods we will discuss involve searching through the space of models, and evaluating the cost f(γ) at each point.
  • 18. 13.2 Bayesian variable selection 13.2.3 Algorithms 13.2.3.1 Greedy search • Single best replacement: • 가장 간단한 방법은 greedy hill climbing을 사용하는 것이다. • 각 단계에서, 변수 하나를 추가하거나 뺌으로써, 도달할 수 있는 모델의 이웃을 정의한다. • 즉 각 변수에 대해서, 그 것을 추가해서 현재 모델을 능가한다면 추가하고, 그 변수를 뺌으로써 능가한다면, 그 변수를 뺀다.
  • 19. 13.2 Bayesian variable selection 13.2.3 Algorithms 13.2.3.1 Greedy search (13.27) • Orthogonal least squares • λ=0 이면, 식(13.27)에서 모델의 complexity penalty는 없어지고, deletion step의 이유가 없어진다. 왜냐하면, 변수를 쓰지 않음으로써 얻는 이점이 사라지기 때문이다(training error는 계속 준다) • 이 경우, SBR은 orthogonal least squares = greedy forwards selection와 같아진다 • 현재 feature 집합에서, feature를 하나씩 추가해보고 w를 최적화하면서, 에러가 가장 적은 feature를 고른다. • We then update the active set by setting γ (t+1)=γ(t)∪{j∗} • To choose the next feature to add at step t, we need to solve D−Dt least squares problems at step t,where Dt =|γt| is cardinality of the current active set.
  • 20. 13.2 Bayesian variable selection 13.2.3 Algorithms 13.2.3.1 Greedy search • Orthogonal matching pursuits • so we are just looking for the column that is most correlated with the current residual • This only requires one least squares calculation per iteration and so is faster than orthogonal least squares, but is not quite as accurate • 다해보지 말고, 가장, residual과 연관 있는 feature만 테스트한다 • even more aggressive approximation is to just greedily add the feature that is most correlated with the current residual. • This is called matching pursuits(Mallat and Zhang 1993). • This is also equivalent to a method known as least squares boosting (Section 16.4.6).
  • 21. 13.2 Bayesian variable selection 13.2.3 Algorithms 13.2.3.1 Greedy search • Backwards selection Backwards selection • starts with all variables in the model (the so called saturated model), and then deletes the worst one at each step. • This is equivalent to performing a greedy search from the top of the lattice downwards. • This can give better results than a bottom-up search, since the decision about whether to keep a variable or not is made in the context of all the other variables that might depend on it. (의존 관계가 있을 feature들이 있는 상태에서 selection을 하므로, 성능은 더 좋음) • However, this method is typically infeasible for large problems, since the saturated model will be too expensive to fit.(=fit할 feature가 많아서 계산은 많이 한다) • Bayesian Matching pursuit • The algorithm of (Schniter et al. 2008) is similiar to OMP except it uses a Bayesian marginal likelihood scoring criterion (under a spike and slab model) instead of a least squares objective.
  • 22. 13.2 Bayesian variable selection 13.2.3 Algorithms 13.2.3.2 Stochastic search • If we want to approximate the posterior, rather than just computing a mode (e.g. because we want to compute marginal inclusion probabilities), one option is to use MCMC. • The standard approach is to use Metropolis Hastings, where the proposal distribution just flips single bits • This enables us to efficiently compute p(γ’|D) given p (γ|D). • The probability of a state (bit configuration) is estimated by counting how many times the random walk visits this state. γ 을 이렇게, 2^D 조합을 다 찾거나, 휴리스틱하게 찾는 방법 말고, analytically 최적화 하는 방법은 없는 걸까? lasso
  • 23. Bayesian variable selection p(D|γ) 구하는 방법 The spike and slab model wj를 계속 살림 Beroulli Gaussian Model l0 regulization 최적화 어려움 l1 regulization (lasso)
  • 24. 13.3 l1 regularization: basics 왜 l0에서 l1으로 바꾸는가? • When we have many variables, it is computationally difficult to find the posterior mode of p(γ|D). • Part of the problem is due to the fact that the γj variables are discrete, γj ∈{0,1}. • In the optimization community, it is common to relax hard constraints of this form by replacing discrete variables with continuous variables. • We can do this by replacing the spike-and-slab style prior, that assigns finite probability mass to the event that wj =0, to continuous priors that “encourage” wj =0 by putting a lot of probability density near the origin, such as a zero-mean Laplace distribution. • l1 regularization • In the case of linear regression, the l1 objective becomes
  • 25. 13.3.1 Why does l1 regularization yield sparse solutions? • lasso, which stands for “least absolute shrinkage and selection operator” • 코너에 거칠 확률이 더 커진다 목적함수 제약조건 모서리에서의 페널티가 더 작다. 모서리에 붙는 w가 최적화에 선호된다 모서리에 붙는 w라는 건 sparse한 w이다
  • 26. 13.3 l1 regularization: basics 13.3.2 Optimality conditions for lasso • The lasso objective has the form • Unfortunately, the||w||1 term is not differentiable whenever wj =0. • This is an example of a non-smooth optimization problem.
  • 27. 13.3 l1 regularization: basics 13.3.2 Optimality conditions for lasso • To handle non-smooth functions, we need to extend the notion of a derivative. • We define a subderivative or subgradient of a (convex) function f: I→R at a point θ0 to be a scalar g such that • We define the set of subderivatives as the interval[a, b] where a and b are the one-sided limits
  • 28. 13.3 l1 regularization: basics 13.3.2 Optimality conditions for lasso • The set [a, b] of all subderivatives is called the subdifferential of the function f at θ0 and is denoted ∂f(θ)|θ0. • For example, in the case of the absolute value function f(θ)=|θ|, the subderivative is given by • If the function is everywhere differentiable, then ∂f(θ)={df(θ)/dθ}. 0에서의 미분값이 무한히 많다
  • 29. 13.3 l1 regularization: basics 13.3.2 Optimality conditions for lasso • Let us apply these concepts to the lasso problem. • Let us initially ignore the non-smooth penalty term. j feature 를 제외한 나머지 feature 로 예측한 residual과 j feature와의 correlation • where w−j is w without component j, and similarly for xi,−j. • We see that cj is (proportional to) the correlation between the j’th feature x:,j and the residual due to the other features, r−j =y−X:,−jw−j. • X행렬에서 j 번째 feature만 고른 벡터와 j번째 feature만 빼고 예측한 값과 실제 값과의 차이 벡터와의 correlation • Hence the magnitude of cj is an indication of how relevant feature j is for predicting y(relative to the other features and the current parameters). • j가 예측에 포함됨으로써, y와의 차이를 메꿔줄 수 있는지의 정도?
  • 30.
  • 31. • 그러므로 f를 최적화하는 w는 cj의 범위에 따라 다음과 같이 정의할 수 있다 • where • and x+= max(x,0) is the positive part of x. This is called soft thresholding. j feature 를 제외한 나머지 feature로 예측한 residual과 j feature와의 correlation cj가 –λ보다 크게 음의 상관관 계가 있지 않거나, λ의 이상의 음의 상관관계가 있지 않으면 feature j는 0 즉 안쓴다
  • 32. 13.4.1 Coordinate descent • j번째 featur를 제외한 나머지 features는 고정하고, j번째 feature만 최적화 한다
  • 33. Coordinate descent for lasso (aka shooting algorithm) • Coordinate descent는 one-dimensianl optimization problem이 analytically 풀리면 유용하다. • 앞에서 보았듯이 lasso의 최적해 w는 나머지 coefficien가 고정된 상태에서, 특정 featur에 대한 wj 를 최적화 할 수 있다. • See (Yaun et al. 2010) for some extensions of this method to the logistic regression case. • resulting algorithm was the fastest method in their experimental comparison, which concerned document classification with large sparse feature vectors(representing bags of words)
  • 34. • By contrast, in Figure 13.5(b), we illustrate hard thresholding. • This sets values of wj to 0 if −λ≤cj ≤λ, but it does not shrink the values of wj outside of this interval. • The slope of the soft thresholding line does not coincide with the diagonal, which means that even large coefficients are shrunk towards zero; • consequently lasso is a biased estimator. • This is undesirable, since if the likelihood indicates (via cj) that the coefficient wj should be large, we do not want to shrink it. We will discuss this issue in more detail in Section 13.6.2.
  • 35. 13.3.3 Comparison of least squares, lasso, ridge and subset selection • For simplicity, assume all the features of X are orthonormal, so XTX=I. In this case, the RSS is given by
  • 36. 13.3.3 Comparison of least squares, lasso, ridge and subset selection • LS = least squares, • Subset = best subset regression(all possible subsets regression procedure) • lasso gives better prediction accuracy • Lasso also gives rise to a sparse solution. Of course, for other problems, ridge may give better predictive accuracy. • In practice, a combination of lasso and ridge, known as the elastic net, often performs best, since it provides a good combination of sparsity and regularization (see Section 13.5.3)
  • 37. 13.3.4 Regularization path • As we increase λ, the solution vector ˆ w(λ) will tend to get sparser, although not necessarily monotonically. • We can plot the values path. for each feature j; this is known as the regularization
  • 38. • W가 발현되는 critical한 시점이 있다 • 한번 fitting 시 feature마다 critical한 시점까지 구하는 알고리즘(LARS, least angle regression and shrinkage)
  • 39. 13.5.3 Elastic net (ridge and lasso combined) • 강하게 연관되어 있는 features들이 많을 때 lasso는 그 중에 하나를 임의적으로 고르려 한다. • In the D>N case, lasso can select at most N variables before it saturates. • If N>D, but the variables are correlated, it has been empirically observed that the prediction performance of ridge is better than that of lasso • grouping effect = 높은 연관 관계가 있는 feature들은 같은 weight를 가지려 한다(lasso는 고름) • For example, if two features are equal, so X:j =X:k, one can show that their estimates are also equal, ˆ wj =ˆwk. • By contrast, with lasso, we may have that ˆ wj =0and ˆ wk=0or vice versa. • 그니까 Elastic net은 강한 상관 관계 때문에 없어지는 feature는 살려주고, response랑 관계 없는 feature만 걸러주는 장점이 있는듯
  • 40. conclusion Bayesian variable selection p(D|γ) 구하는 방법 The spike and slab model wj를 계속 살림 Beroulli Gaussian Model l0 regulization 최적화 어려움 l1 regulization (lasso)