The Critic Network and the Discrimination Problem
[Figure: the critic Cψ assigns a target value to the q samples and a different value so that the p samples are separated from the q samples]
Target assignment via a surrogate objective, e.g. the negative cross-entropy
−∑ᵢ p(y = i | x) log p̃(y = i | x, ψ)
where p(y|x) defines target values for the data and p̃(y|x, ψ) could be a sigmoid or a softmax applied to Cψ(x).
Problem: find ψ∗ such that the critic satisfies
p̃(y | x, ψ∗) = p(y | x)
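As a concrete illustration, here is a minimal NumPy sketch of this surrogate objective under the common binary assignment: real samples from p get target y = 1, generated samples from q get y = 0, and a sigmoid of the critic score plays the role of p̃(y = 1 | x, ψ). The function names and the toy critic scores are illustrative, not from the original slides.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def surrogate_loss(c_real, c_fake):
    """Negative cross-entropy target assignment: real samples x ~ p get
    target y = 1, generated samples get y = 0, with
    p~(y = 1 | x, psi) = sigmoid(C_psi(x))."""
    eps = 1e-12  # numerical safety for the logs
    return -(np.log(sigmoid(c_real) + eps).mean()
             + np.log(1.0 - sigmoid(c_fake) + eps).mean())

# Toy usage: a critic that separates the two sets well gets a low loss.
rng = np.random.default_rng(0)
print(surrogate_loss(c_real=rng.normal(3.0, 1.0, 64),
                     c_fake=rng.normal(-3.0, 1.0, 64)))
```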
Future Work
1. Modelling image data for empirical evidence:
   32x32: CIFAR-10
   128x128: ImageNet
   256x256: CelebA, LSUN
   1024x1024: CelebA-HQ
2. Further experiments in comparison with the methods that need the Gradient Penalty in order to converge
3. Significance of the invariant sets of the dynamics; completing the stability proof
4. Significance of SN or GP regularization in JSD-estimating methods (e.g. in GAN or XORGAN)
Bibliography (1/2)
Arjovsky, Martin, et al. (2017)
“Wasserstein GAN.”
International Conference on Machine Learning.
Arjovsky, Martin, and Léon Bottou (2017)
“Towards Principled Methods for Training Generative Adversarial Networks.”
International Conference on Learning Representations.
Goodfellow, Ian, et al. (2014)
“Generative Adversarial Nets.”
Advances in Neural Information Processing Systems.
Heusel, Martin, et al. (2017)
“GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash
Equilibrium.”
Advances in Neural Information Processing Systems.
Kingma, Diederik, and Jimmy Ba (2014)
“Adam: A Method for Stochastic Optimization.”
arXiv:1412.6980 [cs].
Bibliography (2/2)
Mescheder, Lars, et al. (2018)
“Which Training Methods for GANs Do Actually Converge?”
International Conference on Machine Learning.
Miyato, Takeru, et al. (2018)
“Spectral Normalization for Generative Adversarial Networks.”
International Conference on Learning Representations.
Nagarajan, Vaishnavh, and J. Zico Kolter (2017)
“Gradient Descent GAN Optimization Is Locally Stable.”
Advances in Neural Information Processing Systems.
Roth, Kevin, et al. (2017)
“Stabilizing Training of Generative Adversarial Networks through
Regularization.”
Advances in Neural Information Processing Systems.
Sriperumbudur, Bharath K., et al. (2008)
“Injective Hilbert Space Embeddings of Probability Measures.”
Proceedings of the 21st Annual Conference on Learning Theory.
Extras: GAN merits and challenges
Pros
End-to-end differentiable =⇒ gradient-based optimization
No explicit intractable integral
Cheap sampling process: one only has to sample a simple
distribution and perform a forward pass through a generative
model (a sketch follows after this list)
Potential for high-fidelity generated samples
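A minimal sketch of that sampling path, assuming a toy one-layer generator; all names and shapes below are illustrative, not from the slides:

```python
import numpy as np

def generator(z, theta):
    """Stand-in generative model G_theta: any deterministic map from
    latent codes to data space; here a toy one-layer tanh network."""
    w, b = theta
    return np.tanh(z @ w + b)

# Sampling from Q_theta: draw z from a simple prior, do one forward pass.
rng = np.random.default_rng(0)
theta = (rng.normal(size=(8, 2)), np.zeros(2))  # illustrative parameters
z = rng.standard_normal((64, 8))                # z ~ N(0, I)
samples = generator(z, theta)                   # 64 draws from Q_theta
```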
Cons
Hard to train: Instabilities, lack of robust hyperparameters,
unclear stopping criteria, vanishing gradients
Hard to objectively evaluate results
Hard to get an inference model from the generative one
No explicit representation of Qθ∗
Prone to mode dropping: insufficient modelling of the target
distribution
Extras: WGAN merits and challenges
Pros
Smooth discrimination of disconnected target and model
distributions; tackles the vanishing gradients problem.
Smooth and interpretable loss values over training steps.
Cons
Must satisfy the Lipschitz constraint in a way that does not
over-restrict the class of critic functions (methods are discussed
later; a weight-clipping sketch follows this list)
Mode dropping still exists but it is mitigated. It is attributed to
the local nature of gradient updates.
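One of the methods alluded to above is the weight clipping of the original WGAN [Arjovsky et al. (2017)]; a minimal sketch, using the paper's default threshold c = 0.01:

```python
import numpy as np

def clip_weights(params, c=0.01):
    """WGAN weight clipping: clamp every critic parameter to [-c, c] so
    the critic is (crudely) Lipschitz. This is exactly the kind of
    over-restriction of the critic class mentioned above, which later
    motivated Gradient Penalty and Spectral Normalization."""
    return [np.clip(w, -c, c) for w in params]

# Applied after each critic update, e.g.:
params = [np.random.default_rng(0).normal(size=(4, 4))]
params = clip_weights(params)
```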
Extras: Hypothesis on GAN quality (1/2)
GAN and WGAN Equilibria
Qθ∗ = P and Cψ∗(x) = 0 ∀x in a neighborhood of supp(P)
This class of equilibria appears to be a necessary assumption for the
GAN and WGAN objectives in order to guarantee local convergence,
because it implies that ∇xCψ∗(x) = 0 for every x ∈ supp(P).
However, one can argue that this is not ideal when the generated and
real data distributions are supported on lower-dimensional manifolds
in the data space. In that case the optimal critic cannot distinguish
real data points from generated points that lie close enough to the
support of P to fall inside the required neighborhood without lying
on supp(P) itself.
Extras: Hypothesis on GAN quality (2/2)
XORGAN Equilibria
Qθ∗ = P and Cψ∗(x) = 0 ∀x ∈ supp(P)
XORGAN requires fewer assumptions on the equilibria and provably
does not need the Gradient Penalty in order to converge locally.
Spectral Normalization ideally places an upper bound (≤ 1) on the
critic's Lipschitz constant; it does not zero out the gradient during
training, at the risk of under-capacity. However, this also means it
does not drive training to convergence.
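For reference, a rough NumPy sketch of the power-iteration scheme behind Spectral Normalization [Miyato et al. (2018)]; the single weight matrix and the iteration count are illustrative simplifications of the per-layer procedure:

```python
import numpy as np

def spectral_normalize(w, u, n_iters=1):
    """Estimate the top singular value of w by power iteration and
    divide it out, bounding the layer's Lipschitz constant by ~1.
    The vector u is reused across training steps, so one iteration
    per step usually suffices in practice."""
    for _ in range(n_iters):
        v = w.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = w @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ w @ v        # estimated largest singular value
    return w / sigma, u

rng = np.random.default_rng(0)
w, u = rng.normal(size=(16, 8)), rng.normal(size=16)
w_sn, u = spectral_normalize(w, u, n_iters=50)   # well-converged estimate
print(np.linalg.svd(w_sn, compute_uv=False)[0])  # ~ 1.0
```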
Extras: Maximum Mean Discrepancy
[Sriperumbudur et al. (2008)] proposed Maximum Mean Discrepancy
(MMD), a metric on probability measures that embeds each measure
injectively into a Reproducing Kernel Hilbert Space and computes the
distance in that kernel-dependent Hilbert space.
Maximum Mean Discrepancy
Let k be a characteristic kernel function; then γ_k is a metric:
γ_k(P, Q) := sup_{∥C∥_{H_k} ≤ 1} ( E_P[C] − E_Q[C] )
Closed-form expression:
γ_k²(P, Q) = E_{x,y∼P}[k(x, y)] + E_{x,y∼Q}[k(x, y)] − 2 E_{x∼P, y∼Q}[k(x, y)]
In the experiments, an average of k(x, y) = exp(−∥x − y∥₁ / σ) over
σ ∈ {0.01, 0.025, 0.1, 0.25, 1} was used.
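A small NumPy sketch of this estimator, assuming the biased (V-statistic) form of the closed-form expression and the σ grid above; the array shapes and function names are illustrative:

```python
import numpy as np

def mmd2(x, y, sigmas=(0.01, 0.025, 0.1, 0.25, 1.0)):
    """Biased squared-MMD estimate with the Laplacian-type kernel
    k(a, b) = exp(-||a - b||_1 / sigma), averaged over the sigma grid."""
    def kernel(a, b):
        # Pairwise L1 distances, shape (len(a), len(b)).
        d = np.abs(a[:, None, :] - b[None, :, :]).sum(-1)
        return np.mean([np.exp(-d / s) for s in sigmas], axis=0)
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Toy usage: samples from the same distribution give a value near 0.
rng = np.random.default_rng(0)
print(mmd2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2))))
```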