Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions (ICML2003)

Motivation
Methods
Evaluation
Conclusion
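Pattern Recognition and Machine Learning Study Group #8
Shunya Ueta (University of Tsukuba)
First-year master's student, Information Mathematics Laboratory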
October 30, 2015

1 Motivation
2 Methods
3 Evaluation
4 Conclusion

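Overview & Authors
What problem does the paper address?
Overview
Proposes a semi-supervised learning (SSL) method based on Gaussian random fields
The data form a weighted graph whose edge weights encode similarity; its nodes are a mix of labeled and unlabeled data points
The resulting classification algorithm can be viewed as a form of nearest-neighbor (NN) classification, where the "nearest" labeled examples are determined by a random walk on the graph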

Intuition behind the proposed method
Figure 1: Label propagation using pixel-wise Euclidean distance between images [1]
[1] Semi-Supervised Learning Tutorial (ICML 2007)

Authors
1 Xiaojin Jerry Zhu (Assistant Professor, University of Wisconsin)
• Semi-supervised learning literature survey (2005)
2 Zoubin Ghahramani (Professor, University of Cambridge)
• An introduction to variational methods for graphical models (1999)
• Learning from labeled and unlabeled data with label propagation (2002)
3 J Lafferty (Professor, University of Chicago)
• Conditional random fields: Probabilistic models for segmenting and labeling sequence data (2001)
• Diffusion kernels on graphs and other discrete structures (2002)

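Problems with supervised learning, and the hypothesis
Supervised learning
Treats the given data as "worked examples" (i.e., advice from a teacher) and uses them as a guide for learning (i.e., fitting some model to the data) [2]
Annotating data is expensive and requires skilled annotators
Semi-supervised learning (SSL)
Uses both labeled and unlabeled data to improve accuracy
[2] Supervised learning (教師あり学習), Japanese Wikipedia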

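Existing methods
Proposed method
Semi-supervised learning
Semi-supervised learning (SSL)
Uses both labeled and unlabeled data to improve accuracy
Graph-based SSL
In SSL, the most important choice is the model assumption [3]
[3] "Model assumptions in semi-supervised learning" (半教師あり学習のモデル仮定), でっかいチーズをベーグルする (blog)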

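Notation
• Labeled data: l points $(x_1, y_1), \dots, (x_l, y_l)$
• Unlabeled data: u points $x_{l+1}, \dots, x_{l+u}$
• $l \ll u$
• Total number of data points: $n = l + u$
• Binary labels: $y \in \{0, 1\}$
• Graph $G = (V, E)$
• Labeled nodes $L = \{1, \dots, l\}$
• Unlabeled nodes $U = \{l+1, \dots, l+u\}$
Edge weights $w_{ij}$: Gaussian kernel, $w_{ij} = \exp\left(-\sum_{d=1}^{m} \frac{(x_{id} - x_{jd})^2}{\sigma_d^2}\right)$, where $x_{id}$ is the d-th of the m features of $x_i$ and $\sigma_d$ is a length-scale hyperparameter for dimension d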
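A minimal NumPy sketch of the weight matrix, assuming a single shared bandwidth σ for all dimensions (as on the demo slide), so the kernel reduces to $\exp(-\lVert x_i - x_j \rVert^2 / \sigma^2)$. The function name `gaussian_weights` is hypothetical, and setting the diagonal to zero (no self-loops) is an assumption of this sketch.

```python
import numpy as np

def gaussian_weights(X: np.ndarray, sigma: float) -> np.ndarray:
    """Dense weight matrix w_ij = exp(-||x_i - x_j||^2 / sigma^2).

    X has shape (n, m): n data points with m features each.
    Assumes one shared sigma for every feature dimension.
    """
    # Pairwise squared Euclidean distances via broadcasting: shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    W = np.exp(-sq_dist / sigma ** 2)
    # Assumption: no self-loops, so the diagonal is zeroed.
    np.fill_diagonal(W, 0.0)
    return W

X = np.array([[0.0], [0.1], [1.0]])
W = gaussian_weights(X, sigma=0.5)
print(np.round(W, 4))
```

Closer points get weights near 1 and distant points weights near 0, so σ controls how quickly similarity decays with distance.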

Hypothesis
Derive the method's hypothesis from the structure of the data
Figure 2: Weighted similarity graph of handwritten characters

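Proposed method
A real-valued function $f: V \to \mathbb{R}$ (maps each node to a real number)
• Why relax from discrete to continuous values?
• Relaxing discrete labels to continuous values has many advantages; in particular, the minimizer of the energy below can be computed in closed form with linear algebra
• Hypothesis (intuition): nearby nodes, including unlabeled ones, should have similar labels
$$E(f) = \frac{1}{2}\sum_{i,j} w_{ij}\,(f(i) - f(j))^2 \quad (2)$$
$$f = \arg\min_{f|_L = f_l} E(f)$$
(f is fixed to the given labels $f_l$ on the labeled nodes L.)
The minimizer $f$ is a harmonic function: $\Delta f = 0$ on the unlabeled nodes U, where $\Delta = D - W$ is the combinatorial graph Laplacian and $D = \mathrm{diag}(d_i)$ with $d_i = \sum_j w_{ij}$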
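Expanding the square shows that the energy (2) is a quadratic form in the graph Laplacian: for symmetric W, $E(f) = f^\top (D - W) f$. The sketch below checks this identity numerically on a random symmetric graph; `energy` and `laplacian` are hypothetical helper names.

```python
import numpy as np

def energy(W: np.ndarray, f: np.ndarray) -> float:
    """E(f) = 1/2 * sum_{i,j} w_ij (f(i) - f(j))^2, computed directly."""
    diff = f[:, None] - f[None, :]
    return 0.5 * float(np.sum(W * diff ** 2))

def laplacian(W: np.ndarray) -> np.ndarray:
    """Combinatorial graph Laplacian Delta = D - W, with D = diag(row sums of W)."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
A = rng.random((5, 5))
W = (A + A.T) / 2          # symmetric, non-negative weights
np.fill_diagonal(W, 0.0)
f = rng.random(5)

# The direct double sum and the quadratic form f^T (D - W) f agree.
print(energy(W, f), f @ laplacian(W) @ f)
```

Because $D - W$ is positive semidefinite, minimizing E with $f_l$ clamped is a convex quadratic problem, which is why the relaxed problem has a closed-form solution.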

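Label estimation
Estimate the values on the unlabeled nodes from the labeled data
$$f(j) = \frac{1}{d_j}\sum_{i \sim j} w_{ij}\, f(i), \quad j \in U \quad (3)$$
That is, each unlabeled value is the weighted average of its neighbors' values. In matrix form this reads $f = Pf$ (on U) with $P = D^{-1}W$. By the maximum principle for harmonic functions, f is unique and is either constant or satisfies $0 < f(j) < 1$ for all $j \in U$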
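Equation (3) suggests a simple fixed-point iteration: repeatedly replace each unlabeled value by the weighted average of its neighbors while keeping the labeled values clamped. The pure-Python sketch below is illustrative (the `propagate` name, the 0.5 starting value, the tolerance and the iteration cap are assumptions). On the 4-node path 0-1-2-3 with node 0 labeled 1 and node 3 labeled 0, solving (3) by hand gives f(1) = 2/3 and f(2) = 1/3, and the iteration converges to those values.

```python
def propagate(W, labels, n_iter=10_000, tol=1e-10):
    """Fixed-point iteration of f(j) = (1/d_j) * sum_i w_ij f(i) for unlabeled j.

    W: symmetric weight matrix as a list of lists.
    labels: dict {node index: label in {0, 1}} for the labeled nodes L.
    """
    n = len(W)
    degree = [sum(row) for row in W]
    # Labeled nodes start at (and stay clamped to) their labels; unlabeled start at 0.5.
    f = [float(labels.get(i, 0.5)) for i in range(n)]
    for _ in range(n_iter):
        max_change = 0.0
        new_f = f[:]
        for j in range(n):
            if j in labels:
                continue  # clamp f on L to the given labels
            new_value = sum(W[j][i] * f[i] for i in range(n)) / degree[j]
            max_change = max(max_change, abs(new_value - f[j]))
            new_f[j] = new_value
        f = new_f
        if max_change < tol:
            break
    return f

# Path graph 0 - 1 - 2 - 3 with unit weights.
W_path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
f = propagate(W_path, labels={0: 1, 3: 0})
print([round(v, 4) for v in f])  # [1.0, 0.6667, 0.3333, 0.0]
```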

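Label estimation
To compute the harmonic function, first split the matrix W into four blocks (labeled nodes first, then unlabeled nodes)
$$W = \begin{bmatrix} W_{ll} & W_{lu} \\ W_{ul} & W_{uu} \end{bmatrix} \quad (4)$$
and define $f = \begin{bmatrix} f_l \\ f_u \end{bmatrix}$, where $f_u$ holds the values on the unlabeled nodes.
The solution of $\Delta f = 0$ subject to $f|_L = f_l$ is given by
$$f_u = (D_{uu} - W_{uu})^{-1} W_{ul} f_l = (I - P_{uu})^{-1} P_{ul} f_l \quad (5)$$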
Matlab Demo
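As a Python analogue of the closed-form solution (5) (not the authors' Matlab demo), the NumPy sketch below assumes the nodes are ordered with the l labeled ones first, so the blocks of W can be sliced directly. `harmonic_solution` is a hypothetical name; it solves the linear system instead of forming the inverse explicitly, which is the usual numerically preferable choice.

```python
import numpy as np

def harmonic_solution(W: np.ndarray, f_l: np.ndarray) -> np.ndarray:
    """Return f_u = (D_uu - W_uu)^{-1} W_ul f_l  (eq. 5).

    W: (n, n) symmetric weights, labeled nodes first (indices 0..l-1).
    f_l: (l,) labels of the labeled nodes, in {0, 1}.
    """
    l = len(f_l)
    d = W.sum(axis=1)                 # degrees d_i = sum_j w_ij
    D_uu = np.diag(d[l:])
    W_uu = W[l:, l:]
    W_ul = W[l:, :l]
    # Solve (D_uu - W_uu) f_u = W_ul f_l rather than inverting the matrix.
    return np.linalg.solve(D_uu - W_uu, W_ul @ f_l)

# Labeled nodes first: node 0 (label 1), node 1 (label 0); unlabeled nodes 2 and 3.
# Edges form the path 0 - 2 - 3 - 1 with unit weights.
W = np.zeros((4, 4))
for i, j in [(0, 2), (2, 3), (3, 1)]:
    W[i, j] = W[j, i] = 1.0
f_u = harmonic_solution(W, np.array([1.0, 0.0]))
print(f_u)  # values for nodes 2 and 3, equal to 2/3 and 1/3 up to rounding
```

Only a u × u system is solved, so the cost is dominated by the number of unlabeled points.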

Harmonic function demo
• Left: 181 data points, $l = 3$, $u = 178$, $\sigma = 0.22$
• Right: 186 data points, $l = 2$, $u = 184$, $\sigma = 0.43$

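Random walks on the graph
• Row-normalizing the weights of G gives the transition probability matrix $P = D^{-1}W$
• Consider a random walk on the graph
• Start the walk at an unlabeled node and continue until it reaches a labeled node; $f(i)$ is then the probability that a walk started at node i first hits a labeled node with label 1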
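To make the random-walk interpretation concrete, the pure-Python sketch below simulates walks with transition probabilities $P_{ij} = w_{ij}/d_i$ from an unlabeled node until they hit a labeled node, and reports the fraction that end at a node labeled 1. On the 4-node path 0-1-2-3 (node 0 labeled 1, node 3 labeled 0), the harmonic value at node 1 is 2/3, so the estimate should approach that. The function name, seed and number of walks are illustrative assumptions; Monte Carlo is only for intuition, since the closed-form solution is exact.

```python
import random

def hit_probability(W, labels, start, n_walks=20_000, seed=0):
    """Estimate P(walk from `start` first hits a labeled node with label 1).

    W: symmetric weight matrix (list of lists); labels: {node: 0 or 1}.
    Labeled nodes are absorbing: a walk stops as soon as it reaches one.
    """
    rng = random.Random(seed)
    n = len(W)
    hits = 0
    for _ in range(n_walks):
        node = start
        while node not in labels:
            # Step to neighbor i with probability w_{node,i} / d_node.
            node = rng.choices(range(n), weights=W[node])[0]
        hits += labels[node]
    return hits / n_walks

W_path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
p = hit_probability(W_path, labels={0: 1, 3: 0}, start=1)
print(p)  # should be close to 2/3
```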

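CMN
CMN (Class Mass Normalization)
Adjusts the class distribution to match prior knowledge
Let the desired class proportions be q for class 1 and 1 − q for class 0, where q is estimated from the labeled data. Unlabeled point i is assigned to class 1 when
$$q\,\frac{f_u(i)}{\sum_j f_u(j)} > (1 - q)\,\frac{1 - f_u(i)}{\sum_j (1 - f_u(j))} \quad (9)$$
Using $\sum_j (1 - f_u(j)) = u - \sum_j f_u(j)$ and normalizing the two sides of (9) into a probability gives
$$f(i) = \frac{q\,(u - \sum_j f_u(j))\,f_u(i)}{q\,(u - \sum_j f_u(j))\,f_u(i) + (1 - q)\,(1 - f_u(i))\sum_j f_u(j)} \quad (15)$$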
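A minimal sketch of class mass normalization, assuming `f_u` holds the harmonic values on the unlabeled nodes and `q` is the class-1 proportion estimated from the labeled data. `cmn_classify` implements rule (9) and `cmn_probability` the normalized form (15), which exceeds 1/2 exactly when (9) holds. Both names and the toy values are hypothetical.

```python
def cmn_probability(f_u, q):
    """CMN-adjusted probability of class 1 for each unlabeled node (eq. 15)."""
    mass1 = sum(f_u)                    # class-1 mass: sum_j f_u(j)
    mass0 = len(f_u) - mass1            # class-0 mass: sum_j (1 - f_u(j)) = u - mass1
    probs = []
    for fi in f_u:
        a = q * mass0 * fi              # q (u - sum_j f_u(j)) f_u(i)
        b = (1 - q) * (1 - fi) * mass1  # (1 - q) (1 - f_u(i)) sum_j f_u(j)
        probs.append(a / (a + b))
    return probs

def cmn_classify(f_u, q):
    """Label 1 iff q f_u(i)/sum_j f_u(j) > (1-q)(1-f_u(i))/sum_j (1-f_u(j))  (eq. 9)."""
    mass1 = sum(f_u)
    mass0 = len(f_u) - mass1
    return [1 if q * fi / mass1 > (1 - q) * (1 - fi) / mass0 else 0 for fi in f_u]

labeled_y = [1, 0, 0, 0]                # toy labeled data, so q is estimated as 1/4
q = sum(labeled_y) / len(labeled_y)
f_u = [0.9, 0.6, 0.55, 0.2]             # toy harmonic values, biased towards class 1
print(cmn_classify(f_u, q))             # [1, 0, 0, 0]
print([round(p, 3) for p in cmn_probability(f_u, q)])
```

Plain thresholding at 0.5 would label three of the four toy points as class 1; CMN pulls the predicted class balance back towards the prior q.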

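Evaluation
Discussion
Evaluation
• Handwritten digit images are downsampled to 16×16 pixels and smoothed with a Gaussian filter
• Each pixel is a feature with a value in 0-255, so each image is treated as a 256-dimensional vector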
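The preprocessing above turns each image into a point in $\mathbb{R}^{256}$. A hedged NumPy sketch of the last step, assuming the images are already 16×16 grayscale arrays (the downsampling and Gaussian smoothing are not reproduced, and the random images are placeholders, not the paper's data): row-major flattening gives one 256-dimensional vector per image, and the Gaussian kernel is then applied to Euclidean distances in this pixel space.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data: 5 random 16x16 grayscale "images" with pixel values 0-255.
images = rng.integers(0, 256, size=(5, 16, 16)).astype(np.float64)

# Flatten each 16x16 image (row-major) into a 256-dimensional feature vector.
X = images.reshape(len(images), -1)

# Pairwise Euclidean distances between images in pixel space.
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(X.shape, dist.shape)  # (5, 256) (5, 5)
```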
Figure 3: Comparison of classification accuracy on handwritten digits

Evaluation
Figure 4: Classification accuracy on news articles (graph built from tf.idf vectors)
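For text, the graph is built from tf.idf vectors rather than pixels. The pure-Python sketch below is a hedged illustration, not the paper's exact pipeline: it assumes the plain tf × log(N/df) weighting and uses cosine similarity directly as the edge weight. The toy documents and function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse tf.idf vector (dict term -> weight) per tokenised document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [
    "the car engine needs oil".split(),
    "the car brakes need oil".split(),
    "the pitcher threw the baseball".split(),
]
vecs = tfidf_vectors(docs)
# Edge weights w_ij = cosine similarity of tf.idf vectors (no self-loops).
W = [[0.0 if i == j else cosine(vecs[i], vecs[j]) for j in range(len(docs))]
     for i in range(len(docs))]
print([[round(w, 3) for w in row] for row in W])
```

A word occurring in every document ("the") gets idf log(1) = 0, so it contributes nothing to similarity; the two car articles are linked, while the baseball article is disconnected from them.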

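Relationship with entropy
Figure 5: Effect of σ and the entropy
        H (bits)   CMN              thresh
start   0.6931     97.25 ± 0.73%    94.70 ± 1.19%
end     0.6542     98.56 ± 0.43%    98.02 ± 1.19%
(start/end: before and after tuning σ; CMN: accuracy with class mass normalization; thresh: accuracy when thresholding f at 0.5)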
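As we understand the paper, the entropy H here is the average label entropy of the unlabeled nodes, $H(f) = \frac{1}{u}\sum_{i \in U} H_i(f(i))$ with $H_i(f(i)) = -f(i)\log f(i) - (1 - f(i))\log(1 - f(i))$, which is minimized to tune σ. A minimal sketch of the criterion itself (the function name and the clipping constant `eps` are assumptions; natural logarithms are used, so an all-0.5 field gives the maximum value ln 2 ≈ 0.6931).

```python
import math

def average_label_entropy(f_u, eps=1e-12):
    """H(f) = (1/u) * sum_i [-f(i) log f(i) - (1 - f(i)) log(1 - f(i))], in nats.

    eps clips values away from 0 and 1 so that log(0) is never evaluated.
    """
    total = 0.0
    for p in f_u:
        p = min(max(p, eps), 1 - eps)
        total += -p * math.log(p) - (1 - p) * math.log(1 - p)
    return total / len(f_u)

print(round(average_label_entropy([0.5, 0.5, 0.5]), 4))  # 0.6931 (= ln 2, maximal)
print(average_label_entropy([0.95, 0.1, 0.02]))          # smaller: confident labels
```

Low entropy means the harmonic values sit close to 0 or 1, i.e. σ has been chosen so that the unlabeled points are classified confidently.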

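Relationship with entropy
Figure 6: Illustration of label propagation with pixel-wise Euclidean distance between images [4]
Figure 7: Varying σ (σ: the kernel bandwidth, tuned by minimizing the entropy)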
4

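Summary
Conclusions & contributions
Conclusions
1 Proposed a graph-based SSL method using harmonic functions and Gaussian random fields
2 In SSL, the assumption (hypothesis) about the data is what matters most
3 Graph-based SSL: labels propagate along the graph.
Figure 8: Illustration of label propagation with pixel-wise Euclidean distance between images [5]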
5