10. Self-Attention Generative Adversarial Networks
• arXiv preprint by Zhang et al. (the SAGAN paper)
• Han Zhang (Rutgers University), Ian Goodfellow (Google Brain), Dimitris Metaxas (Rutgers University), Augustus Odena (Google Brain)
• (Submitted on 21 May 2018)
12. • In this paper, we proposed Self-Attention Generative Adversarial
Networks (SAGANs), which incorporate a self-attention
mechanism into the GAN framework.
• The self-attention module is effective in modeling long-range
dependencies.
• In addition, we show that spectral normalization applied to the
generator stabilizes GAN training and that TTUR speeds up
training of regularized discriminators.
• SAGAN achieves state-of-the-art performance on class-conditional image generation on ImageNet.
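The self-attention computation above can be sketched in numpy on a flattened feature map (a minimal illustration, not the paper's code; the function and weight names are mine, and the paper's 1x1 convolutions become plain matrix multiplies here):

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv, gamma):
    """SAGAN-style self-attention sketch.

    x: (C, N) feature map flattened over its N spatial positions;
    Wq, Wk: (C//8, C) query/key projections; Wv: (C, C) value projection;
    gamma: learnable residual scale, initialized to 0 in the paper.
    """
    q, k, v = Wq @ x, Wk @ x, Wv @ x
    logits = q.T @ k                               # (N, N) pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over key positions
    out = v @ attn.T                               # each position attends to all others
    return gamma * out + x                         # residual connection
```

Because every output position is a weighted sum over all positions, the module can model long-range dependencies that a fixed convolutional receptive field cannot; with gamma = 0 the block starts out as the identity.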
32. Implementation notes
• The paper uses ImageNet, a dataset of over 14 million images with tags.
• This implementation instead uses the CelebFaces Attributes Dataset (CelebA): about 200K images, each annotated with 40 attributes.
• The paper applies SN on G/D + TTUR, updating the weights for 1M iterations, and reports an FID of 22.96.
• Following the paper, this implementation also applies SN on G/D + TTUR with 1M iterations of weight updates, and uses wgan-gp as the adversarial loss.
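The wgan-gp adversarial loss adds a gradient penalty that pushes the norm of the critic's input gradient toward 1 on points interpolated between real and fake samples. A numpy sketch for a linear critic D(x) = w @ x, where that gradient is just w and no autograd is needed (an illustration only; the names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_penalty(w, real, fake, lam=10.0):
    """WGAN-GP penalty for a linear critic D(x) = w @ x (sketch)."""
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake        # interpolate real/fake pairs
    grad = np.tile(w, (x_hat.shape[0], 1))         # for linear D, grad_x D(x) = w
    norms = np.linalg.norm(grad, axis=1)
    return lam * np.mean((norms - 1.0) ** 2)       # penalize deviation from norm 1
```

In a real PyTorch critic the input gradient comes from autograd; the penalty is one way of controlling the critic's Lipschitz constant, which is also what spectral normalization does.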
33. • Meta overview
• This repository provides a PyTorch implementation of SAGAN. Both wgan-gp and wgan-hinge losses are ready, but note that wgan-gp is somehow not compatible with spectral normalization; remove all the spectral normalization from the model to use wgan-gp.
• Self-attention is applied to the last two layers of both the discriminator and the generator.
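The wgan-hinge option is the hinge version of the adversarial loss, which the SAGAN paper itself trains with; on raw critic scores it can be sketched as (a numpy sketch, not the repository's code):

```python
import numpy as np

def d_hinge_loss(d_real, d_fake):
    """Discriminator hinge loss: push real scores above +1, fake below -1."""
    return (np.mean(np.maximum(0.0, 1.0 - d_real))
            + np.mean(np.maximum(0.0, 1.0 + d_fake)))

def g_hinge_loss(d_fake):
    """Generator loss: raise the critic's score on generated samples."""
    return -np.mean(d_fake)
```

Scores already beyond the margins contribute zero discriminator loss, so the critic stops pushing on samples it has confidently separated.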
65. GANs Trained by a Two Time-Scale Update Rule Converge to a
Local Nash Equilibrium
https://arxiv.org/abs/1706.08500 (the TTUR paper)
This paper introduces two things: TTUR training, and FID as an evaluation method for GANs.
It also reports that TTUR training improves DCGANs and Improved Wasserstein GANs (WGAN-GP).
TTUR
66. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
• Martin Heusel, Hubert Ramsauer, Thomas
Unterthiner, Bernhard Nessler, Sepp Hochreiter
• (Submitted on 26 Jun 2017 (v1), last revised 12 Jan 2018 (this
version, v6))
• In GAN training, if the generator's learning rate is set smaller than the discriminator's, convergence to a local Nash equilibrium can be shown using ODE theory. The paper also proposes FID, a measure of generated-image quality that improves on the Inception Score.
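FID compares two Gaussians fitted to Inception features of real and generated images: FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)). A numpy sketch restricted to diagonal covariances, where the matrix square root reduces to an element-wise sqrt (the general case needs a matrix square root such as scipy.linalg.sqrtm; the function name is mine):

```python
import numpy as np

def fid_diagonal(mu1, var1, mu2, var2):
    """Frechet distance between N(mu1, diag(var1)) and N(mu2, diag(var2))."""
    mean_term = np.sum((mu1 - mu2) ** 2)
    # for diagonal covariances, Tr(S1 + S2 - 2*sqrt(S1*S2)) simplifies to:
    cov_term = np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)
    return mean_term + cov_term
```

Lower is better; identical statistics give 0, and unlike the Inception Score the measure reacts to mode dropping because missing modes distort the generated-sample statistics.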
74. Adam Follows an HBF ODE and Ensures
TTUR Convergence
• In our experiments, we aim at using Adam stochastic
approximation to avoid mode collapsing. GANs suffer from
“mode collapsing” where large masses of probability are
mapped onto a few modes that cover only small regions.
While these regions represent meaningful samples, the
variety of the real world data is lost and only a few prototype
samples are generated. Different methods have been
proposed to avoid mode collapsing [11, 43]. We obviate mode
collapsing by using Adam stochastic approximation [29].
Adam can be described as Heavy Ball with Friction (HBF)
(see below), since it averages over past gradients
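The "averages over past gradients" in the quote is Adam's first-moment estimate; a minimal scalar sketch of the update rule (following Kingma and Ba, not taken from the paper's code):

```python
import numpy as np

def adam_minimize(grad, x, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=1000):
    """Scalar Adam: m averages past gradients (the 'heavy ball' momentum),
    v averages squared gradients and rescales the step size."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g          # exponential average of gradients
        v = b2 * v + (1 - b2) * g * g      # exponential average of g^2
        m_hat = m / (1 - b1 ** t)          # bias correction for the zero init
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x
```

For example, `adam_minimize(lambda x: 2 * x, 3.0)` runs Adam on f(x) = x^2; the momentum term m smooths the trajectory, which is the friction-damped heavy-ball behaviour the HBF view describes.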
75. Spectral Normalization
• https://arxiv.org/abs/1802.05957 (the Spectral Normalization paper)
• It proposes a new weight normalization technique called spectral normalization and claims it stabilizes the training of the discriminator. Note that it only constrains the discriminator during training.
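Spectral normalization divides each weight matrix by its largest singular value sigma(W), estimated cheaply by power iteration. A numpy sketch (the paper's method reuses the u, v vectors across training steps, which is omitted here; the function name is mine):

```python
import numpy as np

def spectral_normalize(W, n_iter=50):
    """Estimate sigma(W) by power iteration and return W / sigma(W)."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)             # right singular vector estimate
        u = W @ v
        u /= np.linalg.norm(u)             # left singular vector estimate
    sigma = u @ W @ v                      # largest singular value estimate
    return W / sigma, sigma
```

After the division the layer's spectral norm is about 1, which bounds the Lipschitz constant of the discriminator and is what stabilizes its training.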
77. • We used the Adam optimizer.
• The hyperparameters varied are (1) the number of updates of the discriminator per one update of the generator, and (2) the learning rate α and the first and second order momentum parameters (β1, β2) of Adam.
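The schedule above, with some number of discriminator updates per generator update and TTUR's two learning rates, can be sketched with toy scalar "models" (the quadratic losses are placeholders for illustration, not GAN objectives; the learning rates 1e-4 for G and 4e-4 for D are the ones reported in the SAGAN paper):

```python
import numpy as np

LR_G, LR_D = 1e-4, 4e-4    # TTUR: the discriminator learns on a faster time scale
N_DIS = 1                  # discriminator updates per generator update

def d_grad(g, d):
    # placeholder: gradient of a toy D loss (d - g)^2 / 2
    return d - g

def g_grad(g, d):
    # placeholder: gradient of a toy G loss (g - d)^2 / 2
    return g - d

theta_g, theta_d = 2.0, -1.0
for step in range(20000):
    for _ in range(N_DIS):                          # (1) inner discriminator updates
        theta_d -= LR_D * d_grad(theta_g, theta_d)
    theta_g -= LR_G * g_grad(theta_g, theta_d)      # (2) one generator update
```

Part of TTUR's appeal is that the faster discriminator can track the generator with N_DIS = 1, instead of the several inner discriminator steps that equal learning rates often require.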
78. pytorch-spectral-normalization-gan
• Main.py: uses the CIFAR10 dataset.
• get resnet model working with wasserstein and hinge losses
• Model.py: builds a DCGAN-like generator and discriminator.
• Model_resnet.py: builds a ResNet generator and discriminator.
• Spectral_normalization.py
• Spectral_normalization_nondiff.py
82. • The only boldface passage in this paper reads:
• "In no experiment did we see evidence of mode collapse for the WGAN algorithm."
• Indeed, WGAN does appear to avoid mode collapse.
85. GAN evaluation metrics (ICML 2018)
• Assessing Generative Models via Precision and Recall
• Recent advances in generative modeling have led to an increased interest in the study
of statistical divergences as means of model comparison. Commonly used evaluation
methods, such as Fréchet Inception Distance (FID), correlate well with the perceived
quality of samples and are sensitive to mode dropping. However, these metrics are
unable to distinguish between different failure cases since they yield one-dimensional
scores. We propose a novel definition of precision and recall for distributions which
disentangles the divergence into two separate dimensions. The proposed notion is
intuitive, retains desirable properties, and naturally leads to an efficient algorithm that
can be used to evaluate generative models. We relate this notion to total variation as
well as to recent evaluation metrics such as Inception Score and FID. To demonstrate
the practical utility of the proposed approach we perform an empirical study on
several variants of Generative Adversarial Networks and the Variational Autoencoder.
In an extensive set of experiments we show that the proposed metric is able to
disentangle the quality of generated samples from the coverage of the target
distribution.
Precision (quality) and Recall (diversity) for Distributions
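For discrete distributions over shared bins, a (precision, recall) pair can be computed directly from the paper's min-based definition: alpha(lmb) = sum_i min(lmb * q_i, p_i) and beta(lmb) = sum_i min(q_i, p_i / lmb). A numpy sketch (the variable names and the assignment of p to the model and q to the target are mine; consult the paper for its exact convention):

```python
import numpy as np

def prd_point(p, q, lmb):
    """One (precision, recall) pair for model distribution p and target q,
    both over the same discrete bins, at trade-off parameter lmb > 0."""
    alpha = np.sum(np.minimum(lmb * q, p))   # precision: quality of samples
    beta = np.sum(np.minimum(q, p / lmb))    # recall: coverage of the target
    return alpha, beta
```

Sweeping lmb over (0, inf) traces the whole curve: identical distributions reach (1, 1), while disjoint supports collapse to (0, 0) everywhere, which is the two-dimensional disentanglement of quality and coverage that a one-dimensional score like FID cannot provide.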
86. GAN evaluation metrics (ICML 2018)
• Geometry Score: A Method For Comparing Generative Adversarial Networks
Compares GANs using persistent homology.