19. Mutual Information Common strategy: find $W$ that makes the components of $y = Wx$ as independent as possible. Mutual information is a good independence measure: $y_1, \dots, y_d$ are mutually independent $\Leftrightarrow I(y) = 0$, where $I(y) = \mathrm{KL}\!\left(p(y) \,\middle\|\, \prod_{i=1}^d p_i(y_i)\right)$, $p(y)$: joint distribution of $y$, $p_i(y_i)$: marginal distribution of $y_i$.
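As a minimal numeric sketch of this definition (the 2×2 joint distribution below is purely illustrative, not from the slides), mutual information is the KL divergence between the joint and the product of the marginals, and it vanishes exactly under independence:

```python
import math

# Hypothetical 2x2 joint distribution p(y1, y2); values are illustrative.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginal distributions of y1 and y2.
p1 = {a: sum(p for (i, j), p in joint.items() if i == a) for a in (0, 1)}
p2 = {b: sum(p for (i, j), p in joint.items() if j == b) for b in (0, 1)}

# I(y1; y2) = KL(joint || product of marginals); zero iff independent.
mi = sum(p * math.log(p / (p1[i] * p2[j])) for (i, j), p in joint.items())
print(round(mi, 4))  # prints 0.1927
```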
21. Estimation Method Estimate the density ratio via Legendre–Fenchel convex duality [Nguyen et al. 08]. We can write $\mathrm{KL}(p \,\|\, q) = \sup_g \left\{ \mathbb{E}_p[\log g] - \mathbb{E}_q[g] + 1 \right\}$, where the sup is taken over all positive measurable functions $g$; the optimal function is the density ratio $g^* = p/q$.
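A small numeric check of this variational form, under illustrative discrete distributions `p` and `q` of my own choosing: any positive $g$ gives a lower bound on the KL divergence, and the bound is tight exactly at the density ratio $g^* = p/q$.

```python
import math

# Two hypothetical discrete distributions on {0, 1, 2}.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

# Exact KL divergence KL(p || q).
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def objective(g):
    # E_p[log g] - E_q[g] + 1: a lower bound on KL(p||q) for any g > 0.
    return (sum(pi * math.log(gi) for pi, gi in zip(p, g))
            - sum(qi * gi for qi, gi in zip(q, g)) + 1.0)

# At the optimum g* = p/q (the density ratio) the bound is tight.
g_star = [pi / qi for pi, qi in zip(p, q)]
tight = objective(g_star)

# Any other positive g (here the constant function 1) is looser.
loose = objective([1.0, 1.0, 1.0])
```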
23. Linear model for g Linear model $g(y) = \sum_{l=1}^b \alpha_l \varphi_l(y)$, where $\varphi_l$ is a basis function, e.g., a Gaussian kernel, and a penalty term on $\alpha$ is added for regularization.
25. Gaussian Kernel We use Gaussian kernels for basis functions: $\varphi_l(y) = \exp\!\left(-\frac{\|y - c_l\|^2}{2\sigma^2}\right)$, where the centers $c_l$ are randomly chosen from the sample points. Linear combinations of Gaussian kernels span a broad function class, so the method is distribution-free.
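The basis construction above can be sketched as follows; the sample, bandwidth `sigma`, and number of centers are illustrative choices, not values from the slides:

```python
import math
import random

random.seed(0)

# Toy 1-D sample; sigma and the number of centers are illustrative.
y = [random.gauss(0.0, 1.0) for _ in range(20)]
sigma = 0.5
centers = random.sample(y, 5)  # centers picked randomly from the sample

def phi(v):
    # Gaussian-kernel basis functions centered at the chosen sample points.
    return [math.exp(-(v - c) ** 2 / (2 * sigma ** 2)) for c in centers]

# Design matrix for the linear model g(y) = sum_l alpha_l * phi_l(y).
Phi = [phi(v) for v in y]
```

Each row of `Phi` is one sample expressed in the Gaussian basis; fitting the coefficients $\alpha$ then reduces to a finite-dimensional optimization.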
27. Asymptotic Analysis Regularization parameter: Theorem (nonparametric case): the rate depends on the complexity of the model (large: complex, small: simple), under a bracketing entropy condition. Theorem (parametric case): the asymptotics are expressed by matrices like the Fisher information matrix.
30. Supervised Dimension Reduction Input $x$, output $y$: find a "good" low-dimensional representation $z = Wx$ -> Sufficient Dimension Reduction (SDR): $y \perp\!\!\!\perp x \mid Wx$. A natural choice of $W$: maximize the mutual information between $Wx$ and $y$.
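A toy illustration of why maximizing mutual information is a natural choice of $W$ (the data-generating process and the simple plug-in discrete MI estimator below are my own illustrative constructions): when $y$ depends only on the first coordinate of $x$, projecting onto that coordinate retains far more mutual information with $y$ than projecting onto the other.

```python
import math
import random

random.seed(1)

def discrete_mi(pairs):
    # Plug-in mutual information estimate for pairs of discrete values.
    n = len(pairs)
    pxy, px, py = {}, {}, {}
    for a, b in pairs:
        pxy[(a, b)] = pxy.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    return sum(c / n * math.log((c / n) / (px[a] / n * py[b] / n))
               for (a, b), c in pxy.items())

# y depends only on the first coordinate of the 2-D input x.
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2000)]
labels = [1 if x1 > 0 else 0 for x1, _ in data]

# MI between the label and each 1-D projection (discretized by sign).
mi_e1 = discrete_mi([(1 if x1 > 0 else 0, y) for (x1, _), y in zip(data, labels)])
mi_e2 = discrete_mi([(1 if x2 > 0 else 0, y) for (_, x2), y in zip(data, labels)])
```

Here `mi_e1` is close to $\log 2$ (the projection determines the label) while `mi_e2` is near zero, so an MI-maximizing $W$ would pick the first direction.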
33. Result One-sided t-test with significance level 1%. Mean and standard deviation over 50 trials. Our method performs well.
34. UCI Data Set One-sided t-test with significance level 1%. We choose 200 samples and train an SVM on the low-dimensional representation. Classification error over 20 trials.
38. Sparse Learning Given $n$ samples, minimize a convex loss (hinge, square, logistic) with $L_1$-regularization -> sparsity. Lasso [Tibshirani: JRSS1996]: $\min_\beta \sum_{i=1}^n \ell(y_i, x_i^\top \beta) + \lambda \|\beta\|_1$. Group Lasso [Yuan & Lin: JRSS2006]: the penalty $\sum_I \|\beta_I\|$, where $I$ is a subset of indices (a group).
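A minimal sketch of how the $L_1$ penalty produces sparsity, assuming the standard fact that with an orthonormal design the lasso solution is soft-thresholding of the least-squares coefficients (the coefficient values and `lam` below are illustrative):

```python
def soft_threshold(z, t):
    # Proximal operator of the L1 norm: shrinks z toward 0 by t.
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

# With an orthonormal design, the lasso solution is soft-thresholding
# of the least-squares coefficients; small ones are set exactly to 0.
ols = [2.0, -0.3, 0.7, 0.05]
lam = 0.5
lasso = [soft_threshold(b, lam) for b in ols]
```

The coefficients with magnitude below `lam` become exactly zero, which is the sparsity the slide refers to.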
40. Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}$: Hilbert space of real-valued functions. $\phi$: map to the Hilbert space such that $f(x) = \langle f, \phi(x) \rangle_{\mathcal{H}}$ (the reproducing property), with reproducing kernel $k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$. Representer theorem: the regularized empirical-risk minimizer can be written as $\hat{f} = \sum_{i=1}^n \alpha_i k(\cdot, x_i)$.
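A self-contained sketch of the representer theorem in action, assuming a Gaussian kernel and kernel ridge regression (the data points, bandwidth, and `lam` are illustrative): the solution is a finite kernel expansion whose coefficients solve $(K + \lambda I)\alpha = Y$.

```python
import math

def k(a, b, sigma=1.0):
    # Gaussian reproducing kernel (sigma is an illustrative bandwidth).
    return math.exp(-(a - b) ** 2 / (2 * sigma ** 2))

X = [0.0, 1.0, 2.0]
Y = [0.0, 1.0, 0.0]
lam = 0.1

# Representer theorem: f(x) = sum_i alpha_i k(x, x_i),
# with coefficients solving (K + lam * I) alpha = Y.
n = len(X)
A = [[k(X[i], X[j]) + (lam if i == j else 0.0) for j in range(n)]
     for i in range(n)]

def solve(A, b):
    # Plain Gaussian elimination with partial pivoting.
    A = [row[:] for row in A]
    b = b[:]
    m = len(b)
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, m))) / A[r][r]
    return x

alpha = solve(A, Y)

def f(x):
    # The fitted function lives in the span of k(., x_i).
    return sum(a * k(x, xi) for a, xi in zip(alpha, X))
```

By construction the fit satisfies $f(x_i) + \lambda \alpha_i = y_i$ at every training point, which is just the linear system rewritten.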
41. Moore-Aronszajn Theorem $k$: positive (semi-)definite, symmetric $\Leftrightarrow$ $\mathcal{H}_k$: RKHS with reproducing kernel $k$; the correspondence is one-to-one.
44. Relation to Kernel Weights [Micchelli & Pontil: JMLR2005] Minimize the objective function over convex combinations of kernel functions: given kernels $k_1, \dots, k_M$, $k$ is a convex combination of the $k_m$. The key tool is Young's inequality.
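A numeric sketch of the inner optimization over the kernel weights, assuming the standard multiple-kernel-learning identity (my framing, not stated on the slide) that minimizing $\sum_m \|f_m\|^2 / \beta_m$ over convex weights $\beta$ on the simplex is solved in closed form by $\beta_m \propto \|f_m\|$, with optimal value $(\sum_m \|f_m\|)^2$; the norm values below are illustrative.

```python
# Illustrative RKHS norms ||f_m|| for three candidate kernels.
norms = [3.0, 1.0, 2.0]

# Closed-form optimal convex weights: beta_m proportional to ||f_m||.
total = sum(norms)
beta = [v / total for v in norms]

def objective(b):
    # sum_m ||f_m||^2 / beta_m, the quantity minimized over the simplex.
    return sum(v ** 2 / bm for v, bm in zip(norms, b))

best = objective(beta)  # equals (sum_m ||f_m||)^2
```

Any other convex combination, e.g. uniform weights, gives a strictly larger value, which is what drives the weights toward the more useful kernels.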