4. 4
Introduction
• Human pose estimation
Single person Multi person
1. Right_Shoulder
2. Right_Elbow
3. Right_Wrist
4. Left_Shoulder
5. Left_Elbow
6. Left_Wrist
7. Right_Hip
8. Right_Knee
9. Right_Ankle
10. Left_Hip
11. Left_Knee
12. Left_Ankle
13. Head
14. Neck
15. Spine
16. Pelvis
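The 16-joint list above can be held in a small lookup structure. The sketch below is illustrative only: the `flip_index` and `flip_pose` helpers are hypothetical names, and horizontal-flip augmentation is an assumed use case rather than something the slides describe.

```python
# Joint names in the order listed on the slide (1-based there, 0-based here).
KEYPOINTS = [
    "Right_Shoulder", "Right_Elbow", "Right_Wrist",
    "Left_Shoulder", "Left_Elbow", "Left_Wrist",
    "Right_Hip", "Right_Knee", "Right_Ankle",
    "Left_Hip", "Left_Knee", "Left_Ankle",
    "Head", "Neck", "Spine", "Pelvis",
]


def flip_index(names):
    """Map each joint index to its mirrored counterpart (relies on the Left_/Right_ prefixes)."""
    index = {name: i for i, name in enumerate(names)}
    mapping = []
    for name in names:
        if name.startswith("Right_"):
            mapping.append(index["Left_" + name[len("Right_"):]])
        elif name.startswith("Left_"):
            mapping.append(index["Right_" + name[len("Left_"):]])
        else:
            mapping.append(index[name])  # central joints map to themselves
    return mapping


def flip_pose(pose, image_width):
    """Mirror a pose given as a list of (x, y) tuples, one per joint."""
    mapping = flip_index(KEYPOINTS)
    flipped = [None] * len(pose)
    for i, (x, y) in enumerate(pose):
        # Mirror the x coordinate and store it under the swapped joint index.
        flipped[mapping[i]] = (image_width - 1 - x, y)
    return flipped
```

Flipping a pose twice returns the original pose, which is a quick sanity check for the left/right mapping.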
5. 5
Applications
• Human action recognition
• Human-computer interaction
• Animation
• Intelligent retail, such as self-service supermarkets and intelligent
warehouses
6. 6
Challenges
• Varied appearance and low resolution
• Diverse human poses and views
• Occluded or invisible key points
• Crowded background
7. 7
Top-down Methods
[1] Stacked hourglass networks for human pose estimation. [Newell, ECCV2016]
[2] Towards accurate multi-person pose estimation in the wild. [Papandreou, CVPR2017]
[3] RMPE: Regional Multi-Person Pose Estimation. [Fang, ICCV2017]
[4] Simple Baselines for Human Pose Estimation and Tracking. [Xiao, ECCV2018]
[5] Cascaded Pyramid Network for Multi-Person Pose Estimation. [Chen, CVPR2018]
[6] HRNet: Deep High-Resolution Representation Learning for Human Pose Estimation. [Sun,
CVPR2019]
Human detection + single person key points detection
Advantage: State-of-the-art accuracy
Problem: Lower speed; accuracy depends on the human detector.
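The two-stage recipe above (human detection followed by single-person key point detection) can be sketched as follows. This is a minimal illustration, not any cited paper's implementation: `fake_detector` and `fake_pose_model` are hypothetical stand-ins for real models, and the convention that the pose model returns coordinates normalized to the crop is an assumption.

```python
def top_down_pose(image, detect_people, estimate_single_pose):
    """Run a person detector, then a single-person pose model on each box."""
    poses = []
    for (x0, y0, x1, y1) in detect_people(image):
        # Crop the person region (the image is a list of pixel rows).
        patch = [row[x0:x1] for row in image[y0:y1]]
        # Assumed convention: keypoints come back normalized to [0, 1] in the crop.
        normalized = estimate_single_pose(patch)
        # Map each keypoint back to full-image coordinates.
        poses.append(
            [(x0 + u * (x1 - x0), y0 + v * (y1 - y0)) for u, v in normalized]
        )
    return poses


# Hypothetical stand-ins for real models, used only to make the sketch runnable.
def fake_detector(image):
    return [(0, 0, 4, 8), (4, 0, 8, 8)]  # two boxes as (x0, y0, x1, y1)


def fake_pose_model(patch):
    return [(0.5, 0.25), (0.5, 0.75)]  # two keypoints per person


image = [[0] * 8 for _ in range(8)]
poses = top_down_pose(image, fake_detector, fake_pose_model)
```

The pose model runs once per detected box, so runtime grows with the number of people, and a missed detection means a missed pose.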
8. 8
Bottom-Up Methods
[1] Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields. [Cao, CVPR2017]
[2] Associative Embedding: End-to-End Learning for Joint Detection and Grouping. [Newell,
NeurIPS 2017]
[3] MultiPoseNet: Fast multi-person pose estimation using pose residual network. [Kocabas,
ECCV2018]
[4] PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up,
Part-Based, Geometric Embedding Model. [Papandreou, ECCV2018]
[5] PifPaf: Composite Fields for Human Pose Estimation. [Kreiss, CVPR2019]
[6] Multi-person Articulated Tracking with Spatial and Temporal Embeddings. [Jin, CVPR2019]
Detecting key points + synthesizing human bodies
Advantage: Higher speed, do not rely on human detection
Problem: Lower accuracy
9. 9
• Single person: stacked hourglass – basic network backbone [1]
• Each hourglass first subsamples the feature maps, and then upsamples the feature
maps with the combination of higher resolution features from bottom layers.
• This bottom-up, top-down processing is repeated several times.
Single Person
Alejandro Newell, Kaiyu Yang, Jia Deng: Stacked Hourglass Networks for
Human Pose Estimation. ECCV (8) 2016: 483-499.
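The subsample, upsample, and skip-combination pattern described on this slide can be illustrated with a toy example. The sketch below works on a 1-D list instead of 2-D feature maps and omits all learned layers, so it shows only the data flow; the function names are hypothetical, and the input length is assumed to be divisible by 2^depth.

```python
def downsample(x):
    """Halve the resolution by averaging neighbouring pairs (stand-in for pooling)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def upsample(x):
    """Double the resolution by nearest-neighbour repetition."""
    return [v for v in x for _ in range(2)]


def hourglass(x, depth):
    """One hourglass: recurse to lower resolution, then add back the skip branch."""
    if depth == 0:
        return x
    skip = x                                   # higher-resolution features kept aside
    low = hourglass(downsample(x), depth - 1)  # process at half resolution
    up = upsample(low)                         # return to the original resolution
    return [s + u for s, u in zip(skip, up)]   # combine with the skip branch


def stacked_hourglass(x, num_stacks, depth):
    """Repeat the bottom-up, top-down processing several times."""
    for _ in range(num_stacks):
        x = hourglass(x, depth)
    return x


print(hourglass([1, 3, 5, 7], depth=2))  # [7.0, 9.0, 15.0, 17.0]
```

The output has the same length as the input, which is what allows several hourglasses to be stacked end to end.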
10. 10
• Single person: feature pyramid module [2]
• Feature pyramid representation can provide sufficient context information,
especially for the occluded and invisible key points.
• The residual blocks are substituted by feature pyramid modules. Each module
consists of bottlenecks at different resolutions.
Learning feature pyramids for human pose estimation. W. Yang, S. Li, W. Ouyang, et al. ICCV 2017.
Single Person
https://github.com/bearpaw/PyraNet
11. 11
Top-down Methods
• George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris
Bregler, Kevin Murphy: Towards Accurate Multi-person Pose Estimation in the Wild. CVPR
2017: 3711-3719
14. 14
• Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun:
Cascaded Pyramid Network for Multi-Person Pose Estimation. CVPR 2018: 7103-7112
Top-down Methods
• This model builds on pyramid features. In GlobalNet, features from different levels are
added together to give a rough prediction of key point positions.
• RefineNet takes GlobalNet's output, upsamples the pyramid features, and uses
online hard keypoint mining to improve accuracy.
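Hard keypoint mining can be read as "compute a loss per key point, then backpropagate only through the hardest ones". The following is a minimal sketch of that idea on plain Python lists; the function name `ohkm_loss`, the flat-list heatmap representation, and the `top_k` parameter are illustrative assumptions, not the paper's code.

```python
def ohkm_loss(pred, target, top_k):
    """Average the squared error over the top_k hardest keypoints only.

    pred and target hold one heatmap per keypoint, each a flat list of floats.
    """
    per_keypoint = []
    for p, t in zip(pred, target):
        # Mean squared error of this keypoint's heatmap.
        mse = sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)
        per_keypoint.append(mse)
    # Keep only the keypoints with the largest loss.
    hardest = sorted(per_keypoint, reverse=True)[:top_k]
    return sum(hardest) / len(hardest)


pred = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
target = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(ohkm_loss(pred, target, top_k=2))  # 2.5
```

Easy key points (here the first one, with zero error) do not contribute, so training effort is concentrated on occluded or ambiguous joints.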
15. 15
• Bin Xiao, Haiping Wu, Yichen Wei: Simple Baselines for Human Pose Estimation and
Tracking. ECCV (6) 2018: 472-487
Top-down Methods
https://github.com/leoxiaobin/pose.pytorch
How high-resolution feature maps are generated:
this method folds the upsampling and convolutional
parameters into deconvolutional layers, a much simpler
design that does not use skip-layer connections.
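A short arithmetic check shows how a stack of deconvolutional layers restores resolution. The numbers below are assumptions made for illustration: a backbone with output stride 32, a 256x192 input, and three deconvolution layers with kernel 4, stride 2, and padding 1. The size formula is the standard one for transposed convolutions.

```python
def deconv_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Output length of a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def head_resolution(height, width, backbone_stride=32, num_deconv=3):
    """Heatmap size after the backbone and a stack of deconvolution layers."""
    h, w = height // backbone_stride, width // backbone_stride
    for _ in range(num_deconv):
        h, w = deconv_out(h), deconv_out(w)
    return h, w


print(head_resolution(256, 192))  # (64, 48)
```

With these settings each layer exactly doubles the resolution (8x6 to 16x12 to 32x24 to 64x48), so the heatmaps end up at one quarter of the input size.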
16. 16
• Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang: Deep High-Resolution Representation
Learning for Human Pose Estimation. CVPR 2019
Top-down Methods
1. The proposed network maintains high-resolution representations through the whole
process.
2. It starts from a high-resolution subnetwork as the first stage, gradually adds high-to-low resolution subnetworks
one by one to form more stages, and connects the multi-resolution subnetworks in parallel.
3. It performs repeated multi-scale fusions so that each of the high-to-low resolution representations receives
information from the other parallel representations over and over, leading to rich high-resolution representations.
https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
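The parallel-branch and repeated-fusion idea can be illustrated with a toy example on 1-D lists. This sketch assumes only two branches and replaces all learned layers with simple averaging, repetition, and addition, so it shows the information exchange pattern only; all function names are hypothetical.

```python
def downsample(x):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def upsample(x):
    """Double the resolution by nearest-neighbour repetition."""
    return [v for v in x for _ in range(2)]


def fuse(high, low):
    """Exchange information between a high- and a low-resolution branch."""
    new_high = [h + u for h, u in zip(high, upsample(low))]
    new_low = [l + d for l, d in zip(low, downsample(high))]
    return new_high, new_low


def parallel_branches(x, num_fusions):
    """Keep the high-resolution branch alive while a low-resolution one runs beside it."""
    high = x
    low = downsample(x)  # the low-resolution branch is added in parallel, not in series
    for _ in range(num_fusions):
        high, low = fuse(high, low)
    return high, low


high, low = parallel_branches([1, 3, 5, 7], num_fusions=1)
print(high, low)  # [3.0, 5.0, 11.0, 13.0] [4.0, 12.0]
```

Unlike an hourglass, the high-resolution branch is never discarded and recovered; it persists through every stage and is repeatedly enriched by the lower-resolution branch.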
20. 20
• Associative Embedding: End-to-end Learning for Joint Detection and Grouping. Alejandro Newell, Zhiao Huang,
and Jia Deng. Neural Information Processing Systems (NIPS), 2017.
Bottom-Up Methods
https://github.com/princeton-vl/pose-ae-train
Detection + Grouping
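The "Detection + Grouping" idea can be sketched as follows: each detected key point carries a scalar tag, and detections with similar tags are assigned to the same person. The greedy procedure, the `threshold` value, and the tuple layout below are simplifying assumptions for illustration, not the paper's exact grouping algorithm.

```python
def group_by_tag(detections, threshold=0.5):
    """Group keypoint detections into people by comparing scalar tags.

    detections: list of (joint_name, x, y, tag) tuples.
    Returns a list of dicts mapping joint_name -> (x, y), one dict per person.
    """
    people = []  # each entry: {"joints": {...}, "tags": [...]}
    for joint, x, y, tag in detections:
        best, best_dist = None, threshold
        for person in people:
            if joint in person["joints"]:
                continue  # a person has at most one joint of each type
            mean_tag = sum(person["tags"]) / len(person["tags"])
            dist = abs(tag - mean_tag)
            if dist < best_dist:
                best, best_dist = person, dist
        if best is None:
            # No existing person is close enough in tag space: start a new one.
            best = {"joints": {}, "tags": []}
            people.append(best)
        best["joints"][joint] = (x, y)
        best["tags"].append(tag)
    return [p["joints"] for p in people]


detections = [
    ("head", 10, 5, 0.10), ("head", 50, 6, 2.00),
    ("neck", 11, 9, 0.15), ("neck", 49, 10, 1.90),
]
groups = group_by_tag(detections)
```

Grouping depends only on the tag values, not on image distance, which is what lets the network learn detection and grouping jointly.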
21. 21
• Muhammed Kocabas, Salih Karagoz, Emre Akbas: MultiPoseNet: Fast Multi-Person Pose Estimation Using
Pose Residual Network. ECCV (11) 2018: 437-453
Bottom-Up Methods
https://github.com/mkocabas/pose-residual-network
MultiPoseNet can jointly handle person detection, keypoint detection, person
segmentation and pose estimation problems.
22. 22
• George Papandreou, Tyler Zhu, Liang-Chieh Chen, Spyros Gidaris, Jonathan Tompson, Kevin Murphy:
PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric
Embedding Model. ECCV (14) 2018: 282-299
Bottom-Up Methods
• PersonLab system consists of a CNN model that
predicts: (1) keypoint heatmaps, (2) short-range offsets,
(3) mid-range pairwise offsets, (4) person segmentation
maps, and (5) long-range offsets.
• The first three predictions are used by the Pose
Estimation Module in order to detect human poses.
• The latter two, along with the human pose detections,
are used by the Instance Segmentation Module in order
to predict person instance segmentation masks.
23. 24
Pose Estimation Dataset
Dataset         Single person   Multi-person   Num. of keypoints   Num. of persons
LSP             Y               N              14                  ~2K
FLIC            Y               N              9                   ~20K
MPII            Y               Y              16                  ~25K
COCO            N               Y              17                  ~100K
AI Challenger   N               Y              14                  ~700K
PoseTrack       N               Y              15                  ~160K
26. 27
Human Pose Estimation API @ Neuhub
(1)CVPR 2018 LIP Challenge Single Human Pose Estimation 1st place
(2)CVPR 2018 LIP Challenge Multi-Human Pose Estimation 1st place
27. 28
• ‘Finger Heart & 618’ gesture for AR Scan
• WeChat Mini Program for Halloween
• WeChat Mini Program for POPMART
Human Pose Estimation API @ Neuhub
29. 30
PoseTrack
• Mykhaylo Andriluka, Google Research, Zürich, Switzerland
• Umar Iqbal, University of Bonn, Germany
• Anton Milan, Amazon
• Christoph Lassner, Amazon
• Eldar Insafutdinov, MPI for Informatics, Saarbrücken, Germany
• Leonid Pishchulin, MPI for Informatics, Saarbrücken, Germany
• Juergen Gall, University of Bonn, Germany
• Bernt Schiele, MPI for Informatics, Saarbrücken, Germany
PoseTrack is a joint project of
the Max Planck Institute for
Informatics, University of Bonn
and the PoseTrack team.
30. 31
PoseTrack
Key Figures
1356 video sequences
46K annotated video frames
276K body pose annotations
Two challenges:
Multi-Person Pose Estimation
Multi-Person Pose Tracking
31. 32
Challenges
• Large pose and scale variations
• Fast motions
• A varying number of persons and of visible body parts due to occlusion or truncation
32. 33
Related Work
Bottom-up Methods
[1] Umar Iqbal, Anton Milan, and Juergen Gall. PoseTrack: Joint Multi-person Pose Estimation and Tracking. In CVPR 2017 & CVPR 2018.
[2] Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern Andres, and Bernt Schiele. ArtTrack:
Articulated Multi-Person Tracking in the Wild. In CVPR 2017.
[4] Andreas Doering, Umar Iqbal, and Juergen Gall. JointFlow: Temporal Flow Fields for Multi Person Pose Tracking. In BMVC 2018.
[5] Matteo Fabbri, Fabio Lanzi, Simone Calderara, Andrea Palazzi, Roberto Vezzani, and Rita Cucchiara. Learning to Detect and Track Visible
and Occluded Body Joints in a Virtual World. In ECCV 2018.
[7] Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian: Multi-person Articulated Tracking with Spatial and Temporal Embeddings. CVPR 2019
Top-down Methods
[1] Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, and Du Tran. Detect-and-Track: Efficient Pose Estimation in Videos. In
CVPR 2018.
[2] Yuliang Xiu, Jiefeng Li, Haoyu Wang, Yinghong Fang, and Cewu Lu. Pose Flow: Efficient Online Pose Tracking. In BMVC 2018.
[3] Bin Xiao, Haiping Wu, and Yichen Wei. Simple Baselines for Human Pose Estimation and Tracking. In ECCV 2018
33. 34
Top-down Methods
Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, and Du Tran.
Detect-and-Track: Efficient Pose Estimation in Videos. In CVPR 2018.
https://github.com/facebookresearch/DetectAndTrack
They propose a two-stage approach to keypoint estimation
and tracking in videos:
1) A novel video pose estimation formulation, 3D Mask R-CNN,
that takes a short video clip as input and produces
a tubelet per person and keypoints within those.
2) A lightweight optimization to link the detections over time.
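The linking step can be illustrated with a simple frame-to-frame association. The slide only says "lightweight optimization", so the sketch below swaps in greedy matching on bounding-box IoU as a stand-in; the function names and the `min_iou` threshold are assumptions, and a bipartite matching solver could replace the greedy loop.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    iw = max(0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def link_frames(frames, min_iou=0.3):
    """Assign a track id to every box, frame by frame, by greedy IoU matching."""
    next_id = 0
    prev = []    # (track_id, box) pairs from the previous frame
    result = []  # one list of track ids per frame
    for boxes in frames:
        # Score every (previous box, current box) pair, best overlap first.
        pairs = sorted(
            ((iou(pb, b), pi, bi)
             for pi, (_, pb) in enumerate(prev)
             for bi, b in enumerate(boxes)),
            reverse=True,
        )
        ids = [None] * len(boxes)
        used_prev = set()
        for score, pi, bi in pairs:
            if score < min_iou:
                break
            if pi in used_prev or ids[bi] is not None:
                continue
            ids[bi] = prev[pi][0]  # continue an existing track
            used_prev.add(pi)
        for bi in range(len(boxes)):
            if ids[bi] is None:    # unmatched box: start a new track
                ids[bi] = next_id
                next_id += 1
        prev = list(zip(ids, boxes))
        result.append(ids)
    return result


frames = [
    [(0, 0, 10, 10), (20, 0, 30, 10)],
    [(21, 0, 31, 10), (1, 0, 11, 10)],
    [(2, 0, 12, 10)],
]
print(link_frames(frames))  # [[0, 1], [1, 0], [0]]
```

Each track id stays attached to the same person even when the detection order changes between frames, and a person who disappears simply stops receiving that id.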
34. 35
Top-down Methods
Yuliang Xiu, Jiefeng Li, Haoyu Wang, Yinghong Fang, and Cewu Lu. Pose Flow: Efficient Online
Pose Tracking. In BMVC 2018. https://github.com/YuliangXiu/PoseFlow
• Overall Pipeline: 1) Pose Estimator. 2) Pose
Flow Builder. 3) Pose Flow NMS.
• First, they estimate multi-person poses.
• Second, they build pose flows by maximizing
overall confidence and purify them by Pose
Flow NMS.
• Finally, reasonable multi-pose trajectories
can be obtained.
35. 36
Top-down Methods
Bin Xiao, Haiping Wu, and Yichen Wei. Simple Baselines for Human Pose Estimation and Tracking.
In ECCV 2018
https://github.com/microsoft/human-pose-estimation.pytorch
37. 38
Bottom-up Methods
• Umar Iqbal, Anton Milan, and Juergen Gall. PoseTrack: Joint Multi-person Pose Estimation and
Tracking. In CVPR 2017.
• Mykhaylo Andriluka, Umar Iqbal, Anton Milan, Eldar Insafutdinov, Leonid Pishchulin, Juergen
Gall, and Bernt Schiele. PoseTrack: A Benchmark for Human Pose Estimation and Tracking. In
CVPR 2018.
OpenPose / DeepCut + Graph partition
38. 39
Bottom-up Methods
• Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern
Andres, and Bernt Schiele. ArtTrack: Articulated Multi-Person Tracking in the Wild. In CVPR
2017. https://github.com/eldar/pose-tensorflow
39. 40
Bottom-up Methods
• Andreas Doering, Umar Iqbal, and Juergen Gall. JointFlow: Temporal Flow Fields for
Multi Person Pose Tracking. In BMVC 2018.
40. 41
Bottom-up Methods
Matteo Fabbri, Fabio Lanzi, Simone Calderara, Andrea Palazzi, Roberto Vezzani, and Rita
Cucchiara. Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World. In
ECCV 2018.
43. 44
Bottom-up Methods
Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian: Multi-person Articulated Tracking with Spatial
and Temporal Embeddings. CVPR 2019
A unified framework for
Pose estimation and tracking
A bottom-up method
State-of-the-art result
Part-level grouping
Part appearance
Geometric information
Temporal grouping
Human embedding
Temporal embedding
Pose tracking
bipartite
graph matching
44. 45
Bottom-up Methods
Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian: Multi-person Articulated Tracking with Spatial
and Temporal Embeddings. CVPR 2019
Hourglass
Model [20]
Human
Embedding (HE)
Temporal Instance
Embedding (TIE)
Human-level
representation
Temporal representation for ID
association
45. 46
PoseTrack in JD AI Research
1. An end-to-end POINet: feature extraction and identity association in a unified network.
2. Pose-guided feature extraction network: pose information + part-alignment attention in hierarchical
convolution features.
3. Ovonic insight network to learn the identity matching and switching across frames.
[ACM MM 2019]
51. 52
Challenges of Human Parsing?
• Intrinsic
Varied Person Appearance
Ambiguity of Clothing
Complexity of Clothing
Low Efficiency
Small Targets
Unbalance of Data
• Extrinsic
Occlusion
Clutter
52. 53
Human Parsing History
[Timeline figure: from constrained to unconstrained settings]
• Pedestrian Parsing: [Bo et al., CVPR11]
• Clothing Parsing: [Yamaguchi et al., CVPR12]
• Fashion Parsing: [Liu et al., MM14, TMM14, MM15]
• Human & Object Parsing: [Liang et al., ICCV15, TPAMI15, ECCV16]
53. 54
Related Work
• Single Human parsing [Bo et al., CVPR11]
• Conventional methods:
• Unsupervised super-pixel
• Shape-based matching
• Spatial constraints
Yihang Bo, Charless C. Fowlkes: Shape-based pedestrian parsing. CVPR 2011: 2265-2272
54. 55
Related Work
• Single Human parsing
• Conventional methods:
• Yamaguchi, Kota, et al. "Parsing clothing in fashion photographs." CVPR, 2012.
• Yamaguchi, Kota, M. Hadi Kiapour, and Tamara L. Berg. "Paper doll parsing: Retrieving similar
styles to parse clothing items." ICCV, 2013.
• Dong, Jian, et al. "A deformable mixture parsing model with parselets." ICCV, 2013.
Pose Parsing
55. 56
Related Work
• Single Human parsing
• Conventional methods:
• Liu, Si, et al. "Fashion parsing with video context." MM2014, TMM2015.
• Liu, Si, et al. "Fashion parsing with weak color-category labels." TMM, 2014.
weak supervision
56. 57
Related Work
• Single Human parsing
• Deep learning-based methods before 2017:
• Luo, Ping, Xiaogang Wang, and Xiaoou Tang. "Pedestrian parsing via deep
decompositional network." ICCV, 2013.
HOG + DNN: Deep Decompositional Network
57. 58
Related Work
• Single Human parsing
• Deep learning-based methods before 2017:
• Liu, Si, et al. "Matching-cnn meets knn: Quasi-parametric human parsing." CVPR. 2015.
• Liang, Xiaodan, et al. "Deep human parsing with active template regression." TPAMI, 2015
Parsing by Matching
58. 59
Related Work
• Single Human parsing
• Deep learning-based methods before 2017:
• Liang, Xiaodan, et al. "Human parsing with contextualized convolutional neural network."
ICCV2015, TPAMI2017.
Parsing
Image-level
Label
Edge Superpixel
59. 60
Related Work
• Single Human parsing
• Deep learning-based methods in 2017
• Gong, Ke, et al. "Look into Person: Self-Supervised Structure-Sensitive Learning and
a New Benchmark for Human Parsing." CVPR. 2017.
SSL: Self-supervised Structure-sensitive Learning
https://github.com/Engineering-Course/LIP_SSL
60. 61
Related Work
• Single Human parsing
• Deep learning-based methods in 2017
• Liang, Xiaodan, et al. "Look into Person: Joint Body Parsing & Pose Estimation
Network and A New Benchmark." TPAMI, 2018.
JPP-Net: Joint Body Parsing & Pose Estimation Network
Pose
Parsing
https://github.com/Engineering-Course/LIP_JPPNet
61. 62
Related Work
• Single Human parsing
• Deep learning-based methods in 2018
• Luo, Yawei, et al. "Macro-micro adversarial network for human parsing." ECCV. 2018.
MMAN: Macro-Micro Adversarial Network
Parsing
GAN
https://github.com/RoyalVane/MMAN
62. 63
Related Work
• Single Human parsing
• Deep learning-based methods in 2018
• Liu, Si, et al. "Cross-domain human parsing via adversarial feature and label
adaptation." AAAI, 2018.
Cross-domain Human Parsing
Parsing
GAN
https://github.com/mathfinder/Cross-domain-Human-Parsing-via-Adversarial-Feature-and-Label-Adaptation
63. 64
Related Work
• Single Human parsing
• Deep learning-based methods in 2018
• Luo, Xianghui, et al. "Trusted Guidance Pyramid Network for Human Parsing."
ACMMM, 2018
TGPNet: Trusted Guidance Pyramid Network
64. 65
Related Work
• Multi Human parsing
• Li, Qizhu, Anurag Arnab, and Philip HS Torr. "Holistic, Instance-level Human Parsing."
BMVC, 2017.
Detector, FCN: “parsing-by-detection”
65. 67
Related Work
• Multi Human parsing
• Fang, Hao-Shu, et al. “Weakly and Semi Supervised Human Body Part Parsing via
Pose-Guided Knowledge Transfer.” CVPR, 2018.
Parsing Pose RefineNet
https://github.com/MVIG-SJTU/WSHP
66. 68
Related Work
• Multi Human parsing
• Gong, Ke, et al. "Instance-level human parsing via part grouping network." ECCV,
2018
Parsing Edge
https://github.com/Engineering-Course/CIHP_PGN
67. 69
Related Work
• Multi Human parsing
• Zhao, Jian, et al. "Understanding Humans in Crowded Scenes: Deep Nested Adversarial
Learning and A New Benchmark for Multi-Human Parsing." ACMMM, 2018, Best Student
Paper.
https://github.com/ZhaoJ9014/Multi-Human-Parsing
68. 70
Related Work
• Multi Human parsing
• Zhao, Jian, et al. "Understanding Humans in Crowded Scenes: Deep Nested Adversarial
Learning and A New Benchmark for Multi-Human Parsing." ACMMM, 2018, Best Student
Paper.
Parsing GAN
semantic saliency
prediction
instance-agnostic
parsing
instance-aware
clustering
https://github.com/ZhaoJ9014/Multi-Human-Parsing
69. 71
Related Work
• Multi Human parsing
• Li, Jianshu, et al. "Multi-Human Parsing Machines." ACM MM, 2018.
GAN
Instance
Segmentation
Parsing
70. 72
Related Work
• Multi Human parsing
• Tao Ruan, Ting Liu, et al. "Devil in the details: Towards accurate single and
multiple human parsing." AAAI, 2019.
Parsing Edge
Context Embedding with Edge Perceiving
PSPNet
U-Net
Edge-Net
https://github.com/liutinglt/CE2P
CE2P
71. 73
Related Work
• Multi Human parsing
• Ruan, Tao, Ting Liu, et al. "Devil in the details: Towards accurate single and multiple human parsing."
AAAI, 2019.
Parsing
Mask R-CNN
72. 74
Related Work
• Multi Human parsing
• Gong, Ke et al. "Graphonomy: Universal Human Parsing via Graph Transfer Learning."
CVPR, 2019.
Universal Human Parsing: One Model for Different Datasets
Parsing Graph
Transfer
Learning
https://github.com/Gaoyiminggithub/Graphonomy
73. 75
Related Work
• Multi Human parsing
• Yang, Lu et al. "Parsing R-CNN for Instance-Level Human Analysis." CVPR, 2019.
An End-to-end Framework for Multi-Human Parsing
FPN RPN
Non-Local
Parsing R-CNN
74. 76
Related Work
• Video Human parsing
• Zhou, Qixian, et al. “Adaptive Temporal Encoding Network for Video Instance-level
Human Parsing.” ACMMM, 2018. https://github.com/HCPLab-SYSU/ATEN
75. 77
Related Work
• Multi Human parsing
• Liu, Xinchen, et al. “BraidNet: Braiding Semantics and Details for Accurate
Human Parsing.” ACM MM, 2019.
A Braiding Network with
two sub-nets:
• A deep-and-narrow net to
learn semantic knowledge;
• A shallow-but-wide net to
capture local structures.
A novel Braiding Module:
• Exchange information
between the two sub-nets
• Learn robust and effective
features for small targets.
Pairwise Hard Region
Embedding:
• Differentiate ambiguous
parsing targets through a
hard-aware regional metric
learning loss.
77. 79
Evaluation Metric
• Single Human Parsing
• Pixel accuracy
• Mean pixel accuracy
• Mean IoU
• Frequency weighted IoU
• F1-score
$F_1 = 2 \cdot \frac{P \cdot R}{P + R}$
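The single-human parsing metrics listed above can all be computed from a class confusion matrix. The sketch below assumes the usual semantic-segmentation definitions, with `conf[i][j]` counting pixels of true class `i` predicted as class `j`, and assumes every class appears at least once. The function names are hypothetical.

```python
def parsing_metrics(conf):
    """Pixel accuracy, mean accuracy, mean IoU and frequency-weighted IoU."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    true_count = [sum(conf[i]) for i in range(n)]                        # pixels per true class
    pred_count = [sum(conf[i][j] for i in range(n)) for j in range(n)]   # pixels per predicted class
    correct = [conf[i][i] for i in range(n)]
    # Per-class IoU: correct pixels over the union of true and predicted pixels.
    iou = [correct[i] / (true_count[i] + pred_count[i] - correct[i]) for i in range(n)]
    return {
        "pixel_acc": sum(correct) / total,
        "mean_acc": sum(correct[i] / true_count[i] for i in range(n)) / n,
        "mean_iou": sum(iou) / n,
        "fw_iou": sum(true_count[i] * iou[i] for i in range(n)) / total,
    }


def f1_score(precision, recall):
    """Harmonic mean of precision P and recall R."""
    return 2 * precision * recall / (precision + recall)


metrics = parsing_metrics([[3, 1], [1, 5]])
print(metrics["pixel_acc"])  # 0.8
```

Mean IoU weights every class equally, while frequency-weighted IoU weights each class by its pixel count, so the two diverge when small parts such as accessories are parsed poorly.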
78. 80
Evaluation Metric
• Multi Human Parsing
• Mean IoU
• APr & mAP
• Percentage of Correctly Parsed (PCP)
• Video Human Parsing
• Similar to Single & Multi Human Parsing
• Additional: FPS