2. Problem description
Human parsing aims to segment a human image into
multiple semantic parts.
It is a pixel-wise labeling problem: every pixel is assigned one semantic label.
It is a supervised machine learning problem.
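Because every pixel carries a ground-truth label, the supervised loss can be computed pixel by pixel. The network diagrams name "NLL loss" terms. A minimal sketch of a per-pixel negative log-likelihood follows, assuming C x H x W score maps (the diagrams use 20 channels). It uses plain Python lists for illustration only. It is not the authors' code; a real model would use a deep learning framework.

```python
import math


def log_softmax(logits):
    """Per-pixel log-softmax over the class axis.

    logits: nested list indexed [class][row][col] (C x H x W).
    Returns log-probabilities with the same layout.
    """
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(num_classes)]
    for y in range(height):
        for x in range(width):
            scores = [logits[c][y][x] for c in range(num_classes)]
            peak = max(scores)  # subtract the max for numerical stability
            log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
            for c in range(num_classes):
                out[c][y][x] = scores[c] - log_norm
    return out


def pixelwise_nll(log_probs, labels):
    """Mean negative log-likelihood of the ground-truth class over all pixels.

    log_probs: C x H x W log-probabilities; labels: H x W integer class ids.
    """
    num_classes = len(log_probs)
    height, width = len(labels), len(labels[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            c = labels[y][x]
            if not 0 <= c < num_classes:
                raise ValueError(f"label {c} at ({y}, {x}) is out of range")
            total -= log_probs[c][y][x]
    return total / (height * width)


# Toy example: 2 classes, a 1 x 2 image, uninformative (all-zero) logits.
loss = pixelwise_nll(log_softmax([[[0.0, 0.0]], [[0.0, 0.0]]]), [[0, 1]])
print(round(loss, 4))  # 0.6931, i.e. log(2)
```

With uninformative scores every pixel costs log(2). Confident, correct scores drive the loss toward 0.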
3. Challenges
Occlusion (especially by other people)
Multi-scale
Cross-domain
Label conflict
Blur
Cavity
…
The main conflict is the need for both a larger
field of view and more accurate localization
(deeper or denser?):
Need a larger field of view
Need denser, more accurate localization
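One standard way to ease this conflict is atrous (dilated) convolution. It widens the field of view without further downsampling, so the feature map stays dense. The sketch below computes the input span of a single atrous kernel. The rates are an assumption for illustration: 6, 12, 18 and 24 are the rates of DeepLab v2's ASPP, and the slides do not state which rates PP-GAN uses.

```python
def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Input span covered by one atrous (dilated) kernel.

    A k x k kernel with dilation rate r leaves r - 1 pixels between taps,
    so it spans k + (k - 1) * (r - 1) pixels while keeping k * k weights.
    """
    if kernel_size < 1 or rate < 1:
        raise ValueError("kernel_size and rate must be positive")
    return kernel_size + (kernel_size - 1) * (rate - 1)


# Hypothetical ASPP rates; rate 1 is an ordinary 3 x 3 convolution.
for rate in (1, 6, 12, 18, 24):
    print(rate, effective_kernel_size(3, rate), "weights:", 3 * 3)
```

A 3 x 3 kernel at rate 24 spans 49 pixels with the same 9 weights. The larger view comes at no cost in resolution or parameters.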
9. Two GANs
Patch GAN focuses on low-level, local features,
which encourages sharp and clear label maps.
Pose GAN focuses on high-level, global features,
which helps generate label maps that are consistent with
human pose priors.
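Whether a discriminator judges local or global features depends on how many input pixels each of its outputs can see. This can be computed from the layer configuration. The layer list below is an assumption: it is the well-known pix2pix "70 x 70" PatchGAN (five 4 x 4 convolutions with strides 2, 2, 2, 1, 1). The slides do not give the Patch D architecture.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit of a conv stack.

    layers: sequence of (kernel_size, stride) pairs, first layer first.
    """
    rf, jump = 1, 1  # jump = input-pixel distance between adjacent units
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Hypothetical Patch D layout (pix2pix-style PatchGAN).
PATCH_D_LAYERS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(PATCH_D_LAYERS))  # 70
```

On a 256 x 256 label map, each such output only sees a 70 x 70 patch, so it can judge sharpness but not whole-body structure. That global role is left to the Pose GAN.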
10. ASPP
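[Figure: network diagram with the Patch discriminator (Patch D). Input: 3*256*256 RGB image. ResNet-101 feature maps: 64*128*128, 256*64*64, 512*32*32, 1024*16*16, 2048*16*16, 8192*16*16. Label maps: 20*16*16 and 20*256*256. Patch D takes fake and real 20*256*256 label maps together with a copy of the RGB image (Copy, Concat). Losses: shallow NLL loss, deep NLL loss and Patch GAN loss, combined into the total loss. Legend: Resnet101 Block, Resnet101 Block with Atrous Conv, Tensor Transfer, Upsampling, Resize.]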
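The feature-map sizes in the diagram show where the network stops downsampling. The sketch below reads them off (the shapes come from the diagram; the helper name is hypothetical). It computes the output stride of each stage and the upsampling factor needed to bring the 20*16*16 prediction back to 256*256. The 8192*16*16 tensor is left out because the diagram does not make its role explicit.

```python
INPUT_SIZE = 256  # the RGB input is 3*256*256

# (channels, height, width) of the ResNet-101 feature maps in the diagram.
FEATURE_SHAPES = [
    (64, 128, 128),
    (256, 64, 64),
    (512, 32, 32),
    (1024, 16, 16),
    (2048, 16, 16),
]


def output_strides(input_size, shapes):
    """Downsampling factor of each feature map relative to the input."""
    return [input_size // height for _, height, _ in shapes]


strides = output_strides(INPUT_SIZE, FEATURE_SHAPES)
print(strides)  # [2, 4, 8, 16, 16]

# Factor needed to upsample a 16 x 16 prediction to the 256 x 256 input.
upsample_factor = INPUT_SIZE // FEATURE_SHAPES[-1][1]
print(upsample_factor)  # 16
```

The last two stages share stride 16. The deeper block adds depth without further downsampling, which is the "denser" side of the deeper-or-denser trade-off.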
13. ASPP
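[Figure: network diagram with both the Patch discriminator (Patch D) and the Pose discriminator (Pose D). Input: 3*256*256 RGB image. ResNet-101 feature maps: 64*128*128, 256*64*64, 512*32*32, 1024*16*16, 2048*16*16, 8192*16*16. Label maps: 20*16*16 and 20*256*256. Patch D takes fake and real 20*256*256 label maps together with a copy of the RGB image. OpenPose produces 19*16*16 pose maps, which are concatenated with fake and real 20*16*16 label maps for Pose D (Resize, Concat). Losses: shallow NLL loss, deep NLL loss, Patch GAN loss and Pose GAN loss, combined into the total loss. Legend: Resnet101 Block, Resnet101 Block with Atrous Conv, Tensor Transfer, Upsampling.]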
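Both discriminators are conditional. Each one sees a label map together with its conditioning signal, joined along the channel axis. The sketch below assembles the two inputs from the shapes in the diagram. The exact pairing is an assumption: image plus 20*256*256 label map for Patch D, and 20*16*16 label map plus 19*16*16 OpenPose maps for Pose D. The helpers `zeros` and `concat_channels` are hypothetical stand-ins for real tensor operations.

```python
def zeros(channels, height, width):
    """C x H x W nested list of zeros (stand-in for a real tensor)."""
    return [[[0.0] * width for _ in range(height)] for _ in range(channels)]


def concat_channels(*tensors):
    """Concatenate C x H x W nested lists along the channel axis."""
    height, width = len(tensors[0][0]), len(tensors[0][0][0])
    for t in tensors:
        for channel in t:
            if len(channel) != height or any(len(row) != width for row in channel):
                raise ValueError("all inputs must share the same H x W")
    return [channel for t in tensors for channel in t]


# Patch D input (assumed): RGB image + full-resolution label map.
image = zeros(3, 256, 256)
label_map_full = zeros(20, 256, 256)  # fake (prediction) or real (ground truth)
patch_d_input = concat_channels(image, label_map_full)

# Pose D input (assumed): low-resolution label map + OpenPose maps.
label_map_small = zeros(20, 16, 16)
pose_maps = zeros(19, 16, 16)
pose_d_input = concat_channels(label_map_small, pose_maps)

print(len(patch_d_input), len(pose_d_input))  # 23 39
```

The spatial check matters: the pose maps are 16 x 16, so the label map must be resized to match before concatenation.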
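14. Difference between the two discriminators
Patch GAN (outputs a map, one score per patch):
Real: $\begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix}$
Fake: $\begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}$
Pose GAN (outputs a single score):
Real: 1
Fake: 0
[Figure legend: RGB image, Pose, Label map, Feature map]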
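The two target formats lead to two slightly different adversarial losses. Patch GAN pushes every entry of its output map toward 1 or 0, while Pose GAN pushes a single score. The sketch below assumes sigmoid outputs in (0, 1) and binary cross-entropy, the usual choice for pix2pix-style GANs. The slides do not state which adversarial loss PP-GAN uses, and the function names are hypothetical.

```python
import math

EPS = 1e-7  # clip probabilities away from 0 and 1 before taking logs


def bce(prob, target):
    """Binary cross-entropy for one discriminator output in (0, 1)."""
    p = min(max(prob, EPS), 1.0 - EPS)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def patch_gan_loss(score_map, is_real):
    """Patch D outputs a map; every entry is pushed toward 1 (real) or 0 (fake)."""
    target = 1.0 if is_real else 0.0
    scores = [s for row in score_map for s in row]
    return sum(bce(s, target) for s in scores) / len(scores)


def pose_gan_loss(score, is_real):
    """Pose D outputs one score for the whole label map."""
    return bce(score, 1.0 if is_real else 0.0)


# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
undecided = [[0.5, 0.5], [0.5, 0.5]]
print(round(patch_gan_loss(undecided, is_real=True), 4))  # 0.6931
print(round(pose_gan_loss(0.5, is_real=False), 4))        # 0.6931
```

Averaging over the map gives each patch an equal vote on local realism. The single Pose GAN score can only judge the label map as a whole.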
19. Contributions
We propose an effective PP-GAN for human parsing, which employs two
conditional GANs as supplementary supervision on the shallow, fine layers
and the deep, coarse layers of the network, respectively. Our model explicitly
divides human parsing into "what" and "where" subtasks within a unified
framework and improves parsing performance at both the image level and the
semantic level.
To the best of our knowledge, this is the first attempt to integrate human pose
information into a conditional GAN framework for human parsing,
and it significantly reduces the structural errors of parsing results.
In the proposed framework, the discrimination process is naturally divided into
two easier tasks, handled by two different discriminators. Experiments
demonstrate that multiple discriminators, each focusing only on its own area,
outperform a single discriminator, which is prone to saturation
when facing a complex task.
The proposed PP-GAN significantly surpasses previous methods on
both the challenging LIP and XXX benchmark datasets.
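The network diagrams feed four terms into one total loss: a shallow NLL loss, a deep NLL loss, a Patch GAN loss and a Pose GAN loss. The slides do not give the exact form. Assuming a weighted sum, the parsing network's objective could be written as below, where $p_{i,j}(y_{i,j})$ is the predicted probability of the ground-truth class $y_{i,j}$ at pixel $(i,j)$ of an $H \times W$ label map. The weights $\lambda_{\text{patch}}$ and $\lambda_{\text{pose}}$ are hypothetical, and for the parsing network the GAN terms are evaluated on its fake label maps.

```latex
\mathcal{L}_{\text{NLL}} = -\frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\log p_{i,j}\!\left(y_{i,j}\right)

\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{NLL}}^{\text{shallow}} + \mathcal{L}_{\text{NLL}}^{\text{deep}} + \lambda_{\text{patch}}\,\mathcal{L}_{\text{GAN}}^{\text{patch}} + \lambda_{\text{pose}}\,\mathcal{L}_{\text{GAN}}^{\text{pose}}
```

Setting both $\lambda$ to zero recovers a plain two-branch segmentation loss. This makes the two GANs the "supplementary supervision" described in the first contribution.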