1
Human Behavior Understanding:
From Human-Oriented Analysis to Action Recognition
liuwu1@jd.com
CV Lab
JD AI Research
Wu Liu
2
Human Behavior Understanding: Human-Oriented Analysis
Parsing, Pose, PoseTrack
4
Introduction
• Human pose estimation
Single person Multi person
1. Right_Shoulder
2. Right_Elbow
3. Right_Wrist
4. Left_Shoulder
5. Left_Elbow
6. Left_Wrist
7. Right_Hip
8. Right_Knee
9. Right_Ankle
10. Left_Hip
11. Left_Knee
12. Left_Ankle
13. Head
14. Neck
15. Spine
16. Pelvis
5
Applications
• Human action recognition
• Human-computer interaction
• Animation
• Intelligent retail, such as self-service supermarkets and intelligent warehouses
6
Challenges
• Varied appearance and low resolution
• Diverse human poses and views
• Occluded or invisible key points
• Crowded background
7
Top-down Methods
[1] Stacked Hourglass Networks for Human Pose Estimation. [Newell, ECCV2016]
[2] Towards accurate multi-person pose estimation in the wild. [Papandreou, CVPR2017]
[3] RMPE: Regional Multi-Person Pose Estimation. [Fang, ICCV2017]
[4] Simple Baselines for Human Pose Estimation and Tracking. [Xiao, ECCV2018]
[5] Cascaded Pyramid Network for Multi-Person Pose Estimation. [Chen, CVPR2018]
[6] HRNet: Deep High-Resolution Representation Learning for Human Pose Estimation. [Sun, CVPR2019]
Human detection + single person key points detection
Advantage: State-of-the-art accuracy
Problem: Lower speed; accuracy depends on the human detector.
8
Bottom-Up Methods
[1] Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields. [Cao, CVPR2017]
[2] Associative Embedding : End-to-End Learning for Joint Detection and Grouping. [Newell
A, NeurIPS 2017]
[3] MultiPoseNet: Fast multi-person pose estimation using pose residual network. [Kocabas,
ECCV2018]
[4] PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model. [Papandreou, ECCV2018]
[5] PifPaf: Composite Fields for Human Pose Estimation. [Kreiss, CVPR2019]
[6] Multi-person Articulated Tracking with Spatial and Temporal Embeddings. [Jin, CVPR2019]
Detecting key points + synthesizing human bodies
Advantage: Higher speed; does not rely on human detection
Problem: Lower accuracy
9
• Single person: stacked hourglass – basic network backbone [1]
• Each hourglass first subsamples the feature maps, and then upsamples the feature
maps with the combination of higher resolution features from bottom layers.
• This bottom-up, top-down processing is repeated several times.
Single Person
Alejandro Newell, Kaiyu Yang, Jia Deng: Stacked Hourglass Networks for
Human Pose Estimation. ECCV (8) 2016: 483-499.
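The subsample / upsample / skip-connection pattern described on this slide can be sketched on a toy 1-D feature list. This is only an illustration under stated assumptions: pair averaging stands in for pooling, nearest-neighbour repetition stands in for upsampling, there are no learned convolutions, and the function names are hypothetical rather than taken from the paper's code. The input length is assumed to be divisible by 2 to the power of `depth`.

```python
def downsample(x):
    """Halve the resolution by averaging neighbouring pairs (stand-in for pooling)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample(x):
    """Double the resolution by repeating every value (nearest neighbour)."""
    return [v for v in x for _ in range(2)]


def hourglass(x, depth):
    """One hourglass: go down `depth` levels, come back up, add the skip branch."""
    if depth == 0 or len(x) < 2:
        return list(x)
    skip = list(x)                           # higher-resolution features kept aside
    low = hourglass(downsample(x), depth - 1)
    up = upsample(low)                       # back to the resolution of `skip`
    return [s + u for s, u in zip(skip, up)]


def stacked_hourglass(x, depth=2, stacks=2):
    """Repeat the bottom-up, top-down processing `stacks` times."""
    out = list(x)
    for _ in range(stacks):
        out = hourglass(out, depth)
    return out
```

Calling `stacked_hourglass([1, 2, 3, 4, 5, 6, 7, 8])` returns a list of the same length as the input, which mirrors how each hourglass restores the input resolution before the next one starts.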
10
• Single person: feature pyramid module [2]
• Feature pyramid representation can provide sufficient context information,
especially for the occluded and invisible key points.
• The residual blocks are substituted by feature pyramid modules. Each module
consists of bottlenecks at different resolutions.
Learning feature pyramids for human pose estimation. W. Yang, S. Li, W. Ouyang, et al. ICCV 2017.
Single Person
https://github.com/bearpaw/PyraNet
11
Top-down Methods
• George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris
Bregler, Kevin Murphy: Towards Accurate Multi-person Pose Estimation in the Wild. CVPR
2017: 3711-3719
12
• Haoshu Fang, Shuqin Xie, Yu-Wing Tai, Cewu Lu: RMPE: Regional Multi-person Pose
Estimation. ICCV 2017: 2353-2362
Top-down Methods
• Handle inaccurate bounding boxes and redundant detections
• Symmetric Spatial Transformer Network (SSTN)
• Parametric Pose Non-Maximum-Suppression (NMS)
• Pose-Guided Proposals Generator (PGPG)
https://cvsjtu.wordpress.com/rmpe-regional-multi-person-pose-estimation/
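To make the idea of suppressing redundant pose detections concrete, here is a minimal greedy pose NMS sketch. It is a simplification and not RMPE's parametric pose NMS: the pose distance is assumed to be a plain mean Euclidean distance between corresponding keypoints, and the function names and threshold are hypothetical.

```python
import math


def pose_distance(a, b):
    """Mean Euclidean distance between corresponding keypoints of two poses."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def greedy_pose_nms(poses, scores, threshold):
    """Keep the highest-scoring poses; drop any pose too close to a kept one."""
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(pose_distance(poses[i], poses[j]) > threshold for j in keep):
            keep.append(i)
    return keep


poses = [
    [(0.0, 0.0), (1.0, 1.0)],      # person A
    [(0.1, 0.0), (1.0, 1.1)],      # near-duplicate of person A
    [(10.0, 10.0), (11.0, 11.0)],  # person B
]
kept = greedy_pose_nms(poses, scores=[0.9, 0.8, 0.7], threshold=1.0)
```

Here `kept` contains the indices of person A and person B, while the near-duplicate is suppressed.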
13
• Haoshu Fang, Shuqin Xie, Yu-Wing Tai, Cewu Lu: RMPE: Regional Multi-person Pose
Estimation. ICCV 2017: 2353-2362
Top-down Methods
https://cvsjtu.wordpress.com/rmpe-regional-multi-person-pose-estimation/
Figure panels: problem of bounding box localization errors; Symmetric Spatial Transformer Network.
• Symmetric Spatial Transformer Network (SSTN)
• Parametric Pose Non-Maximum-Suppression (NMS)
• Pose-Guided Proposals Generator (PGPG)
14
• Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun:
Cascaded Pyramid Network for Multi-Person Pose Estimation. CVPR 2018: 7103-7112
Top-down Methods
• This model applies pyramid features. In GlobalNet, features from different levels are added together to give a rough prediction of key point positions.
• RefineNet takes GlobalNet's output, upsamples the pyramid features, and uses online hard keypoint mining to improve accuracy.
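Hard keypoint mining can be sketched as keeping only the largest per-keypoint losses when averaging. The snippet below is a minimal illustration, assuming a mean squared error per keypoint heatmap and a hypothetical `top_k` parameter; the actual loss and the number of mined keypoints in the paper may differ.

```python
def hard_keypoint_loss(pred, target, top_k):
    """Average the `top_k` largest per-keypoint MSE losses.

    pred, target: one flattened heatmap (list of floats) per keypoint.
    """
    per_keypoint = [
        sum((p - t) ** 2 for p, t in zip(ph, th)) / len(ph)
        for ph, th in zip(pred, target)
    ]
    hardest = sorted(per_keypoint, reverse=True)[:top_k]
    return sum(hardest) / len(hardest)


pred = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
target = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
loss = hard_keypoint_loss(pred, target, top_k=2)  # only the two hardest keypoints count
```

Easy keypoints (here the first one, with zero error) do not dilute the gradient signal, which is the motivation for mining the hard ones.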
15
• Bin Xiao, Haiping Wu, Yichen Wei: Simple Baselines for Human Pose Estimation and
Tracking. ECCV (6) 2018: 472-487
Top-down Methods
https://github.com/leoxiaobin/pose.pytorch
How high-resolution feature maps are generated:
This method combines the upsampling and convolutional parameters into deconvolutional layers in a much simpler way, without using skip layer connections.
16
• Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang: Deep High-Resolution Representation
Learning for Human Pose Estimation. CVPR 2019
Top-down Methods
1. The proposed human pose estimation network maintains high-resolution representations through the whole process.
2. It starts from a high-resolution subnetwork as the first stage, gradually adds high-to-low resolution subnetworks one by one to form more stages, and connects the multi-resolution subnetworks in parallel.
3. It conducts repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from the other parallel representations over and over, leading to rich high-resolution representations.
https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
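The repeated multi-scale fusion between parallel branches can be illustrated with two toy 1-D branches. This is a sketch under assumptions: pair averaging and value repetition stand in for the strided convolutions and upsampling layers of a real network, and `fuse` is a hypothetical name.

```python
def downsample(x):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample(x):
    """Double the resolution by repeating every value."""
    return [v for v in x for _ in range(2)]


def fuse(high, low):
    """Exchange information: each branch receives the other, resampled to its size."""
    new_high = [h + u for h, u in zip(high, upsample(low))]
    new_low = [l + d for l, d in zip(low, downsample(high))]
    return new_high, new_low


high, low = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0]
for _ in range(2):            # fusion is applied repeatedly across stages
    high, low = fuse(high, low)
```

The high-resolution branch keeps its length throughout, which corresponds to the first point above: the high-resolution representation is never dropped and later recovered.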
18
• Shih-En Wei, Varun Ramakrishna, Takeo Kanade, Yaser Sheikh: Convolutional Pose Machines. CVPR 2016:
4724-4732
• Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh: Realtime Multi-person 2D Pose Estimation Using Part
Affinity Fields. CVPR 2017: 1302-1310
• Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, Yaser Sheikh: OpenPose: Realtime Multi-Person 2D Pose
Estimation using Part Affinity Fields. CoRR abs/1812.08008 (2018)
Bottom-Up Methods
OpenPose
19
Bottom-Up Methods
OpenPose
Figure panels: part association strategies; architecture of the two-branch multi-stage CNN; graph matching.
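Part association with Part Affinity Fields can be sketched as a line integral: sample points on the segment between two candidate joints and average how well the field aligns with the limb direction. The sketch below assumes the field is given as a hypothetical callable `paf(x, y)` returning a 2-D vector; a real implementation would read the vectors from network output maps.

```python
import math


def paf_score(paf, p1, p2, num_samples=10):
    """Average alignment between the field and the unit vector from p1 to p2."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    norm = math.hypot(dx, dy)
    if norm == 0 or num_samples < 2:
        return 0.0
    ux, uy = dx / norm, dy / norm
    total = 0.0
    for k in range(num_samples):
        t = k / (num_samples - 1)              # evenly spaced samples along the limb
        vx, vy = paf(p1[0] + t * dx, p1[1] + t * dy)
        total += vx * ux + vy * uy             # dot product with the limb direction
    return total / num_samples


def uniform_field(x, y):
    """Toy field pointing along +x everywhere."""
    return (1.0, 0.0)


aligned = paf_score(uniform_field, (0.0, 0.0), (5.0, 0.0))
perpendicular = paf_score(uniform_field, (0.0, 0.0), (0.0, 5.0))
```

Candidate joint pairs with a high score are likely to belong to the same limb; these pairwise scores are what the graph matching step consumes.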
20
• Associative Embedding: End-to-end Learning for Joint Detection and Grouping. Alejandro Newell, Zhiao Huang,
and Jia Deng. Neural Information Processing Systems (NIPS), 2017.
Bottom-Up Methods
https://github.com/princeton-vl/pose-ae-train
Detection + Grouping
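The grouping step can be sketched as follows, assuming each detected joint carries a scalar tag and that joints of the same person have similar tags. This greedy procedure, the threshold, and the function name are illustrative assumptions, not the exact procedure from the paper's code.

```python
def group_by_tags(detections, threshold=0.5):
    """Group joint detections into people by tag similarity.

    detections: list of (joint_name, position, tag) tuples.
    Returns one dict {joint_name: position} per person.
    """
    people = []  # each entry: {"joints": {...}, "tags": [...]}
    for name, pos, tag in detections:
        best, best_dist = None, threshold
        for person in people:
            if name in person["joints"]:
                continue                      # a person has each joint at most once
            mean_tag = sum(person["tags"]) / len(person["tags"])
            dist = abs(tag - mean_tag)
            if dist < best_dist:
                best, best_dist = person, dist
        if best is None:                      # no close-enough person: start a new one
            best = {"joints": {}, "tags": []}
            people.append(best)
        best["joints"][name] = pos
        best["tags"].append(tag)
    return [p["joints"] for p in people]


detections = [
    ("head", (0, 0), 1.0),
    ("head", (5, 0), 3.0),
    ("neck", (0, 1), 1.1),
    ("neck", (5, 1), 2.9),
]
people = group_by_tags(detections)
```

With these toy tags, the two heads start two people and each neck joins the person whose tag is closest.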
21
• Muhammed Kocabas, Salih Karagoz, Emre Akbas: MultiPoseNet: Fast Multi-Person Pose Estimation Using
Pose Residual Network. ECCV (11) 2018: 437-453
Bottom-Up Methods
https://github.com/mkocabas/pose-residual-network
MultiPoseNet can jointly handle person detection, keypoint detection, person
segmentation and pose estimation problems.
22
• George Papandreou, Tyler Zhu, Liang-Chieh Chen, Spyros Gidaris, Jonathan Tompson, Kevin Murphy:
PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric
Embedding Model. ECCV (14) 2018: 282-299
Bottom-Up Methods
• PersonLab system consists of a CNN model that
predicts: (1) keypoint heatmaps, (2) short-range offsets,
(3) mid-range pairwise offsets, (4) person segmentation
maps, and (5) long-range offsets.
• The first three predictions are used by the Pose
Estimation Module in order to detect human poses.
• The latter two, along with the human pose detections,
are used by the Instance Segmentation Module in order
to predict person instance segmentation masks.
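How keypoint heatmaps and short-range offsets can be combined is sketched below as a simple voting scheme: every confident pixel votes for the location it points to, weighted by its heatmap probability. This is a toy illustration under assumptions (integer grid, a hypothetical `min_prob` threshold, a single keypoint type), not PersonLab's exact procedure.

```python
from collections import defaultdict


def hough_vote(heatmap, offsets, min_prob=0.5):
    """Return the (x, y) location with the most probability-weighted votes.

    heatmap[y][x]: keypoint probability at a pixel.
    offsets[y][x]: (dx, dy) short-range offset from the pixel to the keypoint.
    """
    votes = defaultdict(float)
    for y, row in enumerate(heatmap):
        for x, prob in enumerate(row):
            if prob >= min_prob:
                dx, dy = offsets[y][x]
                votes[(round(x + dx), round(y + dy))] += prob
    return max(votes, key=votes.get) if votes else None


heatmap = [
    [0.6, 0.7, 0.1],
    [0.8, 0.9, 0.2],
    [0.1, 0.1, 0.1],
]
# Every pixel's offset points at the pixel (1, 1).
offsets = [[(1 - x, 1 - y) for x in range(3)] for y in range(3)]
keypoint = hough_vote(heatmap, offsets)
```

Aggregating votes in this way lets several nearby pixels agree on one precise keypoint location instead of relying on a single heatmap peak.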
24
Pose Estimation Dataset
Dataset Single person Multi-person Num of Kpts Num of Person
LSP Y N 14 ~2K
FLIC Y N 9 ~20K
MPII Y Y 16 ~25K
COCO N Y 17 ~100K
AI Challenger N Y 14 ~700K
PoseTrack N Y 15 ~160K
25
Pose Estimation COCO leaderboard
26
Pose Estimation Paper Leaderboard
COCO
Method                              Pub            mAP
Bottom-up Methods
  OpenPose                          CVPR2017       61.8
  Associative Embedding             NeurIPS 2017   65.5
  MultiPoseNet                      ECCV2018       69.6
  PersonLab                         ECCV2018       68.7
  PifPaf                            CVPR2019       66.7
  Multi-person Articulated Tracking CVPR2019       68.0
Top-down Methods
  G-RMI                             CVPR2017       64.9
  Mask R-CNN                        ICCV2017       63.1
  RMPE                              ICCV2017       72.3
  Simple Baseline                   ECCV2018       73.7
  CPN                               CVPR2018       72.1
  HRNet                             CVPR2019       75.5

MPII
Method                              Pub            PCKh@0.5
Bottom-up Methods
  OpenPose                          CVPR2017       75.6
  Associative Embedding             NeurIPS 2017   77.5
Top-down Methods
  RMPE                              ICCV2017       82.1
  Simple Baseline                   ECCV2018       91.5
  HRNet                             CVPR2019       92.3
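For readers unfamiliar with the two metrics, here is a minimal sketch of the quantities they build on. PCKh counts a keypoint as correct when its error is within a fraction of the head size; COCO mAP is computed from Object Keypoint Similarity (OKS). The function names are hypothetical, and the per-keypoint constants (`sigmas`) and the head size are assumed to come from the dataset annotations.

```python
import math


def pckh(pred, gt, head_size, alpha=0.5):
    """Percentage of keypoints whose error is at most alpha * head_size."""
    correct = sum(
        1 for p, g in zip(pred, gt) if math.dist(p, g) <= alpha * head_size
    )
    return 100.0 * correct / len(gt)


def oks(pred, gt, visible, area, sigmas):
    """Object Keypoint Similarity averaged over the labelled keypoints."""
    total, count = 0.0, 0
    for p, g, v, s in zip(pred, gt, visible, sigmas):
        if v > 0:
            d2 = (p[0] - g[0]) ** 2 + (p[1] - g[1]) ** 2
            total += math.exp(-d2 / (2.0 * area * (2.0 * s) ** 2))
            count += 1
    return total / count if count else 0.0


gt = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
pred = [(0.0, 0.0), (10.0, 0.0), (3.0, 4.0)]
score = pckh(pred, gt, head_size=10.0)          # errors 0, 10, 5 against a threshold of 5
perfect = oks(gt, gt, visible=[1, 1, 1], area=100.0, sigmas=[0.05, 0.05, 0.05])
```

mAP on COCO is then obtained by thresholding OKS at several values and averaging the resulting precision, which this sketch does not attempt to reproduce.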
27
Human Pose Estimation API @ Neuhub
(1) CVPR 2018 LIP Challenge, Single Human Pose Estimation: 1st place
(2) CVPR 2018 LIP Challenge, Multi-Human Pose Estimation: 1st place
28
‘Finger Heart & 618’ Gesture for AR Scan
WeChat Mini Program for Halloween
WeChat Mini Program for POPMART
Human Pose Estimation API @ Neuhub
29
Parsing, Pose, PoseTrack
Human Behavior Understanding: Human-Oriented Analysis
30
PoseTrack
• Mykhaylo Andriluka, Google Research, Zürich, Switzerland
• Umar Iqbal, University of Bonn, Germany
• Anton Milan, Amazon
• Christoph Lassner, Amazon
• Eldar Insafutdinov, MPI for Informatics, Saarbrücken, Germany
• Leonid Pishchulin, MPI for Informatics, Saarbrücken, Germany
• Juergen Gall, University of Bonn, Germany
• Bernt Schiele, MPI for Informatics, Saarbrücken, Germany
PoseTrack is a joint project of the Max Planck Institute for Informatics, University of Bonn and the PoseTrack team.
31
PoseTrack
Key Figures
• 1356 video sequences
• 46K annotated video frames
• 276K body pose annotations
Two challenges:
• Multi-Person Pose Estimation
• Multi-Person Pose Tracking
32
Challenges
• Large pose and scale variations
• Fast motions
• A varying number of persons
• Varying visibility of body parts due to occlusion or truncation
33
Related Work
Bottom-up Methods
[1] Umar Iqbal, Anton Milan, and Juergen Gall. PoseTrack: Joint Multi-person Pose Estimation and Tracking. In CVPR 2017 & CVPR 2018.
[2] Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern Andres, and Bernt Schiele. ArtTrack: Articulated Multi-Person Tracking in the Wild. In CVPR 2017.
[3] Andreas Doering, Umar Iqbal, and Juergen Gall. JointFlow: Temporal Flow Fields for Multi Person Pose Tracking. In BMVC 2018.
[4] Matteo Fabbri, Fabio Lanzi, Simone Calderara, Andrea Palazzi, Roberto Vezzani, and Rita Cucchiara. Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World. In ECCV 2018.
[5] Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian: Multi-person Articulated Tracking with Spatial and Temporal Embeddings. CVPR 2019
Top-down Methods
[1] Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, and Du Tran. Detect-and-Track: Efficient Pose Estimation in Videos. In CVPR 2018.
[2] Yuliang Xiu, Jiefeng Li, Haoyu Wang, Yinghong Fang, and Cewu Lu. Pose Flow: Efficient Online Pose Tracking. In BMVC 2018.
[3] Bin Xiao, Haiping Wu, and Yichen Wei. Simple Baselines for Human Pose Estimation and Tracking. In ECCV 2018
34
Top-down Methods
Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, and Du Tran. Detect-and-Track: Efficient Pose Estimation in Videos. In CVPR 2018.
https://github.com/facebookresearch/DetectAndTrack
They propose a two-stage approach to keypoint estimation and tracking in videos:
1) A novel video pose estimation formulation, 3D Mask R-CNN, that takes a short video clip as input and produces a tubelet per person and the keypoints within it.
2) A lightweight optimization that links the detections over time.
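Linking detections over time can be pictured as bipartite matching between the person boxes of two consecutive frames. The sketch below is an assumption-laden illustration: it scores pairs by box IoU and searches all assignments by brute force, which is only practical for a handful of people; a real system would typically use the Hungarian algorithm and may use other affinities. All names are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_frames(prev_boxes, curr_boxes, min_iou=0.3):
    """Return {current_index: previous_index} maximizing the total IoU."""
    n, m = len(prev_boxes), len(curr_boxes)
    if m <= n:   # every current box gets a distinct previous box
        candidates = ([(perm[i], i) for i in range(m)]
                      for perm in permutations(range(n), m))
    else:        # every previous box gets a distinct current box
        candidates = ([(j, perm[j]) for j in range(n)]
                      for perm in permutations(range(m), n))
    best_pairs, best_score = [], -1.0
    for pairs in candidates:
        score = sum(iou(prev_boxes[j], curr_boxes[i]) for j, i in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score
    # Weak matches are dropped so that new people can start new tracks.
    return {i: j for j, i in best_pairs
            if iou(prev_boxes[j], curr_boxes[i]) >= min_iou}


prev_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
curr_boxes = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
links = link_frames(prev_boxes, curr_boxes)
```

In this toy example the first two current boxes inherit the identities of the previous frame, and the third box stays unmatched and would start a new track.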
35
Top-down Methods
Yuliang Xiu, Jiefeng Li, Haoyu Wang, Yinghong Fang, and Cewu Lu. Pose Flow: Efficient Online
Pose Tracking. In BMVC 2018. https://github.com/YuliangXiu/PoseFlow
• Overall Pipeline: 1) Pose Estimator. 2) Pose
Flow Builder. 3) Pose Flow NMS.
• First, they estimate multi-person poses.
• Second, they build pose flows by maximizing
overall confidence and purify them by Pose
Flow NMS.
• Finally, reasonable multi-pose trajectories
can be obtained.
36
Top-down Methods
Bin Xiao, Haiping Wu, and Yichen Wei. Simple Baselines for Human Pose Estimation and Tracking.
In ECCV 2018
https://github.com/microsoft/human-pose-estimation.pytorch
38
Bottom-up Methods
• Umar Iqbal, Anton Milan, and Juergen Gall. PoseTrack: Joint Multi-person Pose Estimation and
Tracking. In CVPR 2017.
• Mykhaylo Andriluka, Umar Iqbal, Anton Milan, Eldar Insafutdinov, Leonid Pishchulin, Juergen
Gall, and Bernt Schiele. PoseTrack: A Benchmark for Human Pose Estimation and Tracking. In
CVPR 2018.
OpenPose / DeepCut + Graph partition
39
Bottom-up Methods
• Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern
Andres, and Bernt Schiele. ArtTrack: Articulated Multi-Person Tracking in the Wild. In CVPR
2017. https://github.com/eldar/pose-tensorflow
40
Bottom-up Methods
• Andreas Doering, Umar Iqbal, and Juergen Gall. JointFlow: Temporal Flow Fields for
Multi Person Pose Tracking. In BMVC 2018.
41
Bottom-up Methods
Matteo Fabbri, Fabio Lanzi, Simone Calderara, Andrea Palazzi, Roberto Vezzani, and Rita
Cucchiara. Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World. In
ECCV 2018.
44
Bottom-up Methods
Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian: Multi-person Articulated Tracking with Spatial
and Temporal Embeddings. CVPR 2019
• A unified framework for pose estimation and tracking
• A bottom-up method
• State-of-the-art result
Part-level grouping
• Part appearance
• Geometric information
Temporal grouping
• Human embedding
• Temporal embedding
Pose tracking: bipartite graph matching
45
Bottom-up Methods
Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian: Multi-person Articulated Tracking with Spatial
and Temporal Embeddings. CVPR 2019
Hourglass Model [20]
Human Embedding (HE): human-level representation
Temporal Instance Embedding (TIE): temporal representation for ID association
46
PoseTrack in JD AI Research
1. An end-to-end POINet: feature extraction and identity association in a unified network.
2. Pose-guided feature extraction network: pose information + part-alignment attention in hierarchical
convolution features.
3. Ovonic insight network to learn the identity matching and switching across frames.
[ACM MM 2019]
47
Human pose estimation + human tracking technology
https://posetrack.net/leaderboard.php
48
49
Parsing, Pose, PoseTrack
Human Behavior Understanding: Human-Oriented Analysis
50
What is Human Parsing?
Single Human Parsing
Multiple Human Parsing
Instance-level Human Parsing
Fine-grained Human Parsing (59 categories)
51
Human Parsing Applications
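SnapShot Fashion Analysis
Recommendation
Fashion Captioning
Clothing Search
Outfit analysis (Fashion Analysis):
Popularity index ★★★★★
Temperament index ★★★★☆
Sexiness index ★★★★☆
Text generation: "Flowing long hair radiates youth and vitality; the swan-yellow long dress highlights a slender figure, and the brown coat and bag add a touch of elegance."
Human Parsing + X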
52
Challenges of Human Parsing
• Intrinsic
Varied Person Appearance
Ambiguity of Clothing
Complexity of Clothing
Low Efficiency
Small Targets
Data Imbalance
• Extrinsic
Occlusion
Clutter
53
Human Parsing History
Clothing
Parsing
Human & Object
Parsing
Pedestrian
Parsing
[Bo et al., CVPR11]
Fashion Parsing
[Yamaguchi et al., CVPR12] [Liu et al., MM14, TMM14, MM15]
[Liang et al., ICCV15, TPAMI15, ECCV16]
Constrained → Unconstrained
54
Related Work
• Single Human parsing
• Conventional methods: [Bo et al., CVPR11]
• Unsupervised super-pixel
• Shape-based matching
• Spatial constraints
Yihang Bo, Charless C. Fowlkes: Shape-based pedestrian parsing. CVPR 2011: 2265-2272
55
Related Work
• Single Human parsing
• Conventional methods:
• Yamaguchi, Kota, et al. "Parsing clothing in fashion photographs." CVPR, 2012.
• Yamaguchi, Kota, M. Hadi Kiapour, and Tamara L. Berg. "Paper doll parsing: Retrieving similar
styles to parse clothing items." ICCV, 2013.
• Dong, Jian, et al. "A deformable mixture parsing model with parselets." ICCV, 2013.
Pose Parsing
56
Related Work
• Single Human parsing
• Conventional methods:
• Liu, Si, et al. "Fashion parsing with video context." MM2014, TMM2015.
• Liu, Si, et al. "Fashion parsing with weak color-category labels." TMM, 2014.
weak supervision
57
Related Work
• Single Human parsing
• Deep learning-based methods before 2017:
• Luo, Ping, Xiaogang Wang, and Xiaoou Tang. "Pedestrian parsing via deep
decompositional network." ICCV, 2013.
Hog + DNN Deep Decompositional Network
58
Related Work
• Single Human parsing
• Deep learning-based methods before 2017:
• Liu, Si, et al. "Matching-cnn meets knn: Quasi-parametric human parsing." CVPR. 2015.
• Liang, Xiaodan, et al. "Deep human parsing with active template regression." TPAMI, 2015
Parsing by Matching
59
Related Work
• Single Human parsing
• Deep learning-based methods before 2017:
• Liang, Xiaodan, et al. "Human parsing with contextualized convolutional neural network."
ICCV2015, TPAMI2017.
Parsing
Image-level
Label
Edge Superpixel
60
Related Work
• Single Human parsing
• Deep learning-based methods in 2017
• Gong, Ke, et al. "Look into Person: Self-Supervised Structure-Sensitive Learning and
a New Benchmark for Human Parsing." CVPR. 2017.
SSL: Self-supervised Structure-sensitive Learning
https://github.com/Engineering-Course/LIP_SSL
61
Related Work
• Single Human parsing
• Deep learning-based methods in 2017
• Liang, Xiaodan, et al. "Look into Person: Joint Body Parsing & Pose Estimation
Network and A New Benchmark." TPAMI, 2018.
JPP-Net: Joint Body Parsing & Pose Estimation Network
Pose
Parsing
https://github.com/Engineering-Course/LIP_JPPNet
62
Related Work
• Single Human parsing
• Deep learning-based methods in 2018
• Luo, Yawei, et al. "Macro-micro adversarial network for human parsing." ECCV. 2018.
MMAN: Macro-Micro Adversarial Network
Parsing
GAN
https://github.com/RoyalVane/MMAN
63
Related Work
• Single Human parsing
• Deep learning-based methods in 2018
• Liu, Si, et al. "Cross-domain human parsing via adversarial feature and label
adaptation." AAAI, 2018.
Cross-domain Human Parsing
Parsing
GAN
https://github.com/mathfinder/Cross-domain-Human-Parsing-via-Adversarial-Feature-and-Label-Adaptation
64
Related Work
• Single Human parsing
• Deep learning-based methods in 2018
• Luo, Xianghui, et al. "Trusted Guidance Pyramid Network for Human Parsing."
ACMMM, 2018
TGPNet: Trusted Guidance Pyramid Network
65
Related Work
• Multi Human parsing
• Li, Qizhu, Anurag Arnab, and Philip HS Torr. "Holistic, Instance-level Human Parsing."
BMVC, 2017.
Detector + FCN: "parsing-by-detection"
67
Related Work
• Multi Human parsing
• Fang, Hao-Shu, et al. “Weakly and Semi Supervised Human Body Part Parsing via
Pose-Guided Knowledge Transfer.” CVPR, 2018.
Parsing Pose RefineNet
https://github.com/MVIG-SJTU/WSHP
68
Related Work
• Multi Human parsing
• Gong, Ke, et al. "Instance-level human parsing via part grouping network." ECCV,
2018
Parsing Edge
https://github.com/Engineering-Course/CIHP_PGN
69
Related Work
• Multi Human parsing
• Zhao, Jian, et al. "Understanding Humans in Crowded Scenes: Deep Nested Adversarial
Learning and A New Benchmark for Multi-Human Parsing." ACMMM, 2018, Best Student
Paper.
https://github.com/ZhaoJ9014/Multi-Human-Parsing
70
Related Work
• Multi Human parsing
• Zhao, Jian, et al. "Understanding Humans in Crowded Scenes: Deep Nested Adversarial
Learning and A New Benchmark for Multi-Human Parsing." ACMMM, 2018, Best Student
Paper.
Parsing GAN
semantic saliency
prediction
instance-agnostic
parsing
instance-aware
clustering
https://github.com/ZhaoJ9014/Multi-Human-Parsing
71
Related Work
• Multi Human parsing
• Li, Jianshu, et al. "Multi-Human Parsing Machines." ACM MM, 2018.
GAN
Instance
Segmentation
Parsing
72
Related Work
• Multi Human parsing
• Tao Ruan, Ting Liu, et al. "Devil in the details: Towards accurate single and
multiple human parsing." AAAI, 2019.
Parsing Edge
Context Embedding with Edge Perceiving
PSPNet
U-Net
Edge-Net
https://github.com/liutinglt/CE2P
CE2P
73
Related Work
• Multi Human parsing
• Tao Ruan, Ting Liu, et al. "Devil in the details: Towards accurate single and multiple human parsing."
AAAI, 2019.
Parsing
Mask R-CNN
74
Related Work
• Multi Human parsing
• Gong, Ke et al. "Graphonomy: Universal Human Parsing via Graph Transfer Learning."
CVPR, 2019.
Universal Human Parsing: One Model for Different Datasets
Parsing Graph
Transfer
Learning
https://github.com/Gaoyiminggithub/Graphonomy
75
Related Work
• Multi Human parsing
• Yang, Lu et al. "Parsing R-CNN for Instance-Level Human Analysis." CVPR, 2019.
An End-to-end Framework for Multi-Human Parsing
FPN, RPN, Non-Local
Parsing R-CNN
76
Related Work
• Video Human parsing
• Zhou, Qixian, et al. “Adaptive Temporal Encoding Network for Video Instance-level
Human Parsing.” ACMMM, 2018. https://github.com/HCPLab-SYSU/ATEN
77
Related Work
• Multi Human parsing
• Xinchen Liu, et al. “BraidNet: Braiding Semantics and Details for Accurate Human Parsing.” ACM MM, 2019.
• A Braiding Network with two sub-nets:
  • A deep-and-narrow net to learn semantic knowledge;
  • A shallow-but-wide net to capture local structures.
• A novel Braiding Module:
  • Exchanges information between the two sub-nets
  • Learns robust and effective features for small targets.
• Pairwise Hard Region Embedding:
  • Differentiates ambiguous parsing targets through a hard-aware regional metric learning loss.
78
Datasets
Dataset                    Total    Train    Val      Test     Class  Instance
Single
  Fashionista              685      456      -        229      56     1
  ATR                      17,700   16,000   700      1,000    18     1
  LIP                      50,462   30,462   10,000   10,000   20     1
  JD-Fashion               16,497   16,317   180      -        21     1
Multiple
  PASCAL-Person-Part       3,533    1,716    -        1,817    7      ×
  CIHP                     38,280   28,280   5,000    5,000    20     √
  MHP v1.0                 4,980    3,000    1,000    980      19     √
  MHP v2.0                 25,403   15,403   5,000    5,000    59     √
Video
  Indoor (1 frame label)   700      400      200      100      13     1
  Outdoor (1 frame label)  741      421      120      200      13     1
  VIP (1/25 frame label)   404      354      -        50       20     √
79
Evaluation Metric
• Single Human Parsing
• Pixel accuracy
• Mean pixel accuracy
• Mean IoU
• Frequency weighted IoU
• F1-score
$F_1 = 2 \cdot \frac{P \cdot R}{P + R}$
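The single-human parsing metrics listed on this slide can all be derived from a class confusion matrix. The sketch below is illustrative: the function names are hypothetical, and it assumes the common convention of averaging only over classes that occur in the ground truth, which individual benchmarks may define slightly differently.

```python
def parsing_metrics(conf):
    """Return (pixel acc, mean pixel acc, mean IoU, frequency-weighted IoU).

    conf[i][j] = number of pixels of true class i predicted as class j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    tp = [conf[i][i] for i in range(n)]
    gt = [sum(conf[i]) for i in range(n)]                          # pixels per true class
    pred = [sum(conf[i][j] for i in range(n)) for j in range(n)]   # pixels per predicted class
    present = [i for i in range(n) if gt[i] > 0]
    ious = {i: tp[i] / (gt[i] + pred[i] - tp[i]) for i in present}
    pixel_acc = sum(tp) / total
    mean_acc = sum(tp[i] / gt[i] for i in present) / len(present)
    mean_iou = sum(ious.values()) / len(present)
    fw_iou = sum(gt[i] * ious[i] for i in present) / total
    return pixel_acc, mean_acc, mean_iou, fw_iou


def f1_score(precision, recall):
    """Harmonic mean of precision P and recall R."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


conf = [[3, 1],
        [1, 5]]
pixel_acc, mean_acc, mean_iou, fw_iou = parsing_metrics(conf)
f1 = f1_score(0.5, 1.0)
```

Frequency-weighted IoU weights each class by its share of ground-truth pixels, so large regions such as background dominate it more than they dominate mean IoU.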
80
Evaluation Metric
• Multi Human Parsing
• Mean IoU
• APr & mAP
• Percentage of Correctly Parsed (PCP)
• Video Human Parsing
• Similar to Single & Multi Human Parsing
• Additional: FPS
81
Results of Single Human Parsing
• On ATR
Method Pub Pixel Acc F1-score
Paper Doll CVPR13 88.96 44.76
M-CNN CVPR15 89.57 62.81
ATR PAMI15 91.11 64.38
Deeplab-v2(vgg16) PAMI16 94.42 73.53
PSPnet (resnet101) CVPR17 95.20 75.84
Co-CNN ICCV15 95.23 76.95
Attention(vgg16) CVPR16 95.41 77.23
Deeplab-v3+ ECCV18 95.96 79.49
LG-LSTM CVPR16 96.18 80.97
TGPN MM18 96.45 81.76
Graph-LSTM ECCV16 97.60 83.76
Graphonomy CVPR19 98.32 90.89
82
Results of Single Human Parsing
• On LIP validation
Method Pub Pixel Acc mIoU
SegNet PAMI17 69.04 18.17
FCN-8s CVPR15 76.06 28.29
DeepLabV2 ICLR15 82.66 41.64
Attention CVPR16 83.43 42.92
DeepLabV2 + SSL CVPR17 83.16 42.44
Attention + SSL CVPR17 84.36 44.73
SS-NAN CVPRW17 87.59 47.92
MMAN ECCV18 - 46.81
JPPNet PAMI18 86.39 51.37
CE2P AAAI19 87.37 53.10
BraidNet MM19 87.60 54.42
83
Results of Multi Human Parsing
• On CIHP
Method Pub mIoU AP@0.5 AP@0.6 AP@0.7 mAP
PGN ECCV18 55.8 35.8 28.6 20.5 33.6
DMNet CVPR18 61.51 46.12 41.50
M-CE2P AAAI19 59.50 48.69 40.13 29.74 42.83
Graphonomy CVPR19 58.58 - - - -
Parsing RCNN CVPR19 59.8 - - - -
BraidNet MM19 60.62 48.99 41.67 32.71 43.59
84
Results of Multi Human Parsing
• On MHP v2
Method Pub PCP@0.5 AP@0.5 AP@0.6 AP@0.7 mAP
Mask R-CNN ICCV17 25.11 14.90
MH-Parser MM18 26.98 17.99 - - -
PGN ECCV18 32.25 25.14 - - 41.78
S-LAB CVPR18 38.27 31.47 - - 40.71
CE2P AAAI19 41.82 33.34 - - 42.25
Parsing RCNN CVPR19 44.2 - - - 40.3
85
Thinking in Human Parsing
• Methodology
  • Multi-task learning: Parsing + Pose + Edge
  • Multi-granularity supervision: Low + Middle + High, Un/Semi-supervised
  • Improve efficiency: to be real-time
  • Cross-domain: Fashion → Surveillance
  • Multi-modality: Image → Video
86
Thanks!
liuwu1@jd.com
Computer Vision and Multimedia Lab
AI Platform and Research
Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep Learning
 
[DL輪読会]ClearGrasp
[DL輪読会]ClearGrasp[DL輪読会]ClearGrasp
[DL輪読会]ClearGrasp
 
Learning to Find and Match Interest Points
Learning to Find and Match Interest PointsLearning to Find and Match Interest Points
Learning to Find and Match Interest Points
 
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
 
Fast Multi-frame Stereo Scene Flow with Motion Segmentation (CVPR 2017)
Fast Multi-frame Stereo Scene Flow with Motion Segmentation (CVPR 2017)Fast Multi-frame Stereo Scene Flow with Motion Segmentation (CVPR 2017)
Fast Multi-frame Stereo Scene Flow with Motion Segmentation (CVPR 2017)
 
Scaling up Deep Learning Based Super Resolution Algorithms
Scaling up Deep Learning Based Super Resolution AlgorithmsScaling up Deep Learning Based Super Resolution Algorithms
Scaling up Deep Learning Based Super Resolution Algorithms
 
Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
 
An Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchAn Introduction to Neural Architecture Search
An Introduction to Neural Architecture Search
 
Cv_Chap 4 Segmentation
Cv_Chap 4 SegmentationCv_Chap 4 Segmentation
Cv_Chap 4 Segmentation
 
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
 
Cvpr 2017 Summary Meetup
Cvpr 2017 Summary MeetupCvpr 2017 Summary Meetup
Cvpr 2017 Summary Meetup
 
3D human body modeling from RGB images
3D human body modeling from RGB images3D human body modeling from RGB images
3D human body modeling from RGB images
 
PR-240: Modulating Image Restoration with Continual Levels via Adaptive Featu...
PR-240: Modulating Image Restoration with Continual Levels viaAdaptive Featu...PR-240: Modulating Image Restoration with Continual Levels viaAdaptive Featu...
PR-240: Modulating Image Restoration with Continual Levels via Adaptive Featu...
 
Action Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene GraphsAction Genome: Action As Composition of Spatio Temporal Scene Graphs
Action Genome: Action As Composition of Spatio Temporal Scene Graphs
 
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen..."The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
"The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Gen...
 
PR100: SeedNet: Automatic Seed Generation with Deep Reinforcement Learning fo...
PR100: SeedNet: Automatic Seed Generation with Deep Reinforcement Learning fo...PR100: SeedNet: Automatic Seed Generation with Deep Reinforcement Learning fo...
PR100: SeedNet: Automatic Seed Generation with Deep Reinforcement Learning fo...
 
Mask R-CNN
Mask R-CNNMask R-CNN
Mask R-CNN
 
CV_Chap 6 Motion Representation
CV_Chap 6 Motion RepresentationCV_Chap 6 Motion Representation
CV_Chap 6 Motion Representation
 
PR-278: RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
PR-278: RAFT: Recurrent All-Pairs Field Transforms for Optical FlowPR-278: RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
PR-278: RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
 
Learning deep features for discriminative localization
Learning deep features for discriminative localizationLearning deep features for discriminative localization
Learning deep features for discriminative localization
 

Human Behavior Understanding: From Human-Oriented Analysis to Action Recognition II

  • 1. Human Behavior Understanding: From Human-Oriented Analysis to Action Recognition. Wu Liu (liuwu1@jd.com), CV Lab, JD AI Research
  • 2. Human Behavior Understanding: Human-Oriented Analysis (Parsing, Pose, PoseTrack)
  • 3. Human Behavior Understanding: Human-Oriented Analysis (Parsing, Pose, PoseTrack)
  • 4. Introduction
      • Human pose estimation: single person and multi-person
      • Keypoints: 1. Right_Shoulder, 2. Right_Elbow, 3. Right_Wrist, 4. Left_Shoulder, 5. Left_Elbow, 6. Left_Wrist, 7. Right_Hip, 8. Right_Knee, 9. Right_Ankle, 10. Left_Hip, 11. Left_Knee, 12. Left_Ankle, 13. Head, 14. Neck, 15. Spine, 16. Pelvis
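
As an illustration, the 16 keypoints listed on this slide could be held in an ordered list, with a pose stored as one (x, y, visible) triple per keypoint. This is only a minimal sketch: the 0-based indexing, the `make_pose` helper, and the visibility flag are assumptions made for the example and do not come from the talk.

```python
from typing import Dict, List, Tuple

# Keypoint order as listed on the slide (0-based index here is an assumption).
KEYPOINT_NAMES: List[str] = [
    "Right_Shoulder", "Right_Elbow", "Right_Wrist",
    "Left_Shoulder", "Left_Elbow", "Left_Wrist",
    "Right_Hip", "Right_Knee", "Right_Ankle",
    "Left_Hip", "Left_Knee", "Left_Ankle",
    "Head", "Neck", "Spine", "Pelvis",
]

Keypoint = Tuple[float, float, bool]  # (x, y, visible)


def make_pose(visible_points: Dict[str, Tuple[float, float]]) -> List[Keypoint]:
    """Hypothetical helper: build a fixed-length pose from named points.

    Keypoints that are not supplied are marked invisible, which mirrors the
    "occluded or invisible key points" case mentioned in the talk.
    """
    unknown = set(visible_points) - set(KEYPOINT_NAMES)
    if unknown:
        raise ValueError(f"unknown keypoint names: {sorted(unknown)}")
    pose: List[Keypoint] = []
    for name in KEYPOINT_NAMES:
        if name in visible_points:
            x, y = visible_points[name]
            pose.append((x, y, True))
        else:
            pose.append((0.0, 0.0, False))
    return pose
```

A fixed-length layout like this keeps every pose the same shape, so missing joints are explicit rather than silently dropped.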
  • 5. Applications
      • Human action recognition
      • Human-computer interaction
      • Animation
      • Intelligent retail, such as self-service supermarkets and intelligent warehouses
  • 6. Challenges
      • Varied appearances and low resolution
      • Diverse human poses and views
      • Occluded or invisible key points
      • Crowded backgrounds
  • 7. Top-down Methods
      [1] Stacked Hourglass Networks for Human Pose Estimation. [Newell, ECCV 2016]
      [2] Towards Accurate Multi-person Pose Estimation in the Wild. [Papandreou, CVPR 2017]
      [3] RMPE: Regional Multi-Person Pose Estimation. [Fang, ICCV 2017]
      [4] Simple Baselines for Human Pose Estimation and Tracking. [Xiao, ECCV 2018]
      [5] Cascaded Pyramid Network for Multi-Person Pose Estimation. [Chen, CVPR 2018]
      [6] HRNet: Deep High-Resolution Representation Learning for Human Pose Estimation. [Sun, CVPR 2019]
      Pipeline: human detection + single-person key point detection
      Advantage: state-of-the-art accuracy
      Problem: lower speed; results depend on human detection accuracy
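
The top-down pipeline described on this slide (detect people, then run a single-person estimator on each box) can be sketched roughly as follows. The `detect` and `estimate` callables are hypothetical stand-ins for a real person detector and a real single-person pose network; the image is a plain nested list so the example stays self-contained.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x, y, width, height), assumed convention
Point = Tuple[float, float]
Image = Sequence[Sequence[float]]


def top_down_pose(
    image: Image,
    detect: Callable[[Image], List[Box]],
    estimate: Callable[[Image], List[Point]],
) -> List[List[Point]]:
    """Run a single-person estimator once per detected person box."""
    poses: List[List[Point]] = []
    for x, y, w, h in detect(image):
        # Crop the person region; keypoints come back in crop coordinates.
        crop = [row[x:x + w] for row in image[y:y + h]]
        local_points = estimate(crop)
        # Shift keypoints back into full-image coordinates.
        poses.append([(px + x, py + y) for px, py in local_points])
    return poses


# Toy stand-ins (assumptions): two fixed boxes, and an "estimator" that
# returns the crop centre as its only keypoint.
def fake_detect(image: Image) -> List[Box]:
    return [(0, 0, 4, 4), (5, 5, 4, 4)]


def fake_estimate(crop: Image) -> List[Point]:
    return [(len(crop[0]) / 2, len(crop) / 2)]


example_image = [[0.0] * 10 for _ in range(10)]
example_poses = top_down_pose(example_image, fake_detect, fake_estimate)
```

The loop makes both drawbacks from the slide visible: the estimator runs once per person, so cost grows with crowd size, and a missed or misplaced box cannot be recovered by the pose stage.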
  • 8. Bottom-Up Methods
      [1] Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields. [Cao, CVPR 2017]
      [2] Associative Embedding: End-to-End Learning for Joint Detection and Grouping. [Newell, NeurIPS 2017]
      [3] MultiPoseNet: Fast Multi-Person Pose Estimation Using Pose Residual Network. [Kocabas, ECCV 2018]
      [4] PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model. [Papandreou, ECCV 2018]
      [5] PifPaf: Composite Fields for Human Pose Estimation. [Kreiss, CVPR 2019]
      [6] Multi-person Articulated Tracking with Spatial and Temporal Embeddings. [Jin, CVPR 2019]
      Pipeline: detecting key points + assembling them into human bodies
      Advantage: higher speed; does not rely on human detection
      Problem: lower accuracy
  • 9. Single Person
      • Single person: stacked hourglass, the basic network backbone [1]
      • Each hourglass first subsamples the feature maps, then upsamples them and combines the result with higher-resolution features from earlier layers.
      • This bottom-up, top-down processing is repeated several times.
      Alejandro Newell, Kaiyu Yang, Jia Deng: Stacked Hourglass Networks for Human Pose Estimation. ECCV (8) 2016: 483-499.
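
The subsample, upsample, and skip-combination structure of a single hourglass can be illustrated on a 1-D list of feature values. This is a structural toy only: there are no learned convolutions, and the function name `hourglass_1d` is made up for this example. Max pooling and nearest-neighbour upsampling are used here as simple stand-ins.

```python
from typing import List


def hourglass_1d(features: List[float], depth: int) -> List[float]:
    """Toy hourglass: pool by 2, recurse, upsample by 2, add the skip branch."""
    if depth == 0:
        return list(features)
    if len(features) % 2 != 0:
        raise ValueError("length must be divisible by 2**depth")
    # Bottom-up path: halve the resolution with max pooling.
    pooled = [max(features[i], features[i + 1]) for i in range(0, len(features), 2)]
    # Process at the lower resolution (recursively nested hourglass).
    inner = hourglass_1d(pooled, depth - 1)
    # Top-down path: nearest-neighbour upsampling back to the input length.
    upsampled = [value for value in inner for _ in range(2)]
    # Combine with the higher-resolution features kept on the skip branch.
    return [skip + up for skip, up in zip(features, upsampled)]
```

Stacking several such modules one after another corresponds to the repeated bottom-up, top-down processing mentioned on the slide.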
  • 10. Single Person
      • Single person: feature pyramid module [2]
      • A feature pyramid representation provides additional context, which helps especially with occluded and invisible key points.
      • The residual blocks are replaced by feature pyramid modules. Each module consists of bottlenecks at different resolutions.
      W. Yang, S. Li, W. Ouyang, et al.: Learning Feature Pyramids for Human Pose Estimation. ICCV 2017.
      https://github.com/bearpaw/PyraNet
  • 11. Top-down Methods
      • George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy: Towards Accurate Multi-person Pose Estimation in the Wild. CVPR 2017: 3711-3719
  • 12. Top-down Methods
      • Haoshu Fang, Shuqin Xie, Yu-Wing Tai, Cewu Lu: RMPE: Regional Multi-person Pose Estimation. ICCV 2017: 2353-2362
      • Handles inaccurate bounding boxes and redundant detections
      • Symmetric Spatial Transformer Network (SSTN)
      • Parametric Pose Non-Maximum-Suppression (NMS)
      • Pose-Guided Proposals Generator (PGPG)
      https://cvsjtu.wordpress.com/rmpe-regional-multi-person-pose-estimation/
  • 13. Top-down Methods
      • Haoshu Fang, Shuqin Xie, Yu-Wing Tai, Cewu Lu: RMPE: Regional Multi-person Pose Estimation. ICCV 2017: 2353-2362
      • Figures: the problem of bounding box localization errors; the Symmetric Spatial Transformer Network
      • Symmetric Spatial Transformer Network (SSTN)
      • Parametric Pose Non-Maximum-Suppression (NMS)
      • Pose-Guided Proposals Generator (PGPG)
      https://cvsjtu.wordpress.com/rmpe-regional-multi-person-pose-estimation/
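
To give a feel for how redundant pose detections can be suppressed, here is a simplified greedy pose NMS. It is not RMPE's parametric criterion: the plain mean keypoint distance and the `min_dist` threshold are assumptions chosen to keep the sketch short, and the function names are made up for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pose_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Mean Euclidean distance between corresponding keypoints."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def greedy_pose_nms(
    poses: Sequence[Sequence[Point]],
    scores: Sequence[float],
    min_dist: float,
) -> List[int]:
    """Keep the highest-scoring poses; drop any pose too close to a kept one."""
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(pose_distance(poses[i], poses[k]) >= min_dist for k in kept):
            kept.append(i)
    return kept


example_poses = [
    [(0.0, 0.0), (10.0, 0.0)],    # person A
    [(1.0, 0.0), (11.0, 0.0)],    # near-duplicate of person A
    [(50.0, 50.0), (60.0, 50.0)], # person B
]
example_kept = greedy_pose_nms(example_poses, [0.9, 0.8, 0.7], min_dist=5.0)
```

Comparing whole poses rather than boxes is the point of the design: two people standing close together may share most of a bounding box while still having clearly different keypoint layouts.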
  • 14. Top-down Methods
      • Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun: Cascaded Pyramid Network for Multi-Person Pose Estimation. CVPR 2018: 7103-7112
      • This model uses pyramid features. In GlobalNet, features from different levels are added together to give a rough prediction of key point positions.
      • RefineNet takes GlobalNet's output, upsamples the pyramid features, and uses hard keypoint mining to improve accuracy.
  • 15. Top-down Methods
      • Bin Xiao, Haiping Wu, Yichen Wei: Simple Baselines for Human Pose Estimation and Tracking. ECCV (6) 2018: 472-487
      https://github.com/leoxiaobin/pose.pytorch
      • How high-resolution feature maps are generated: this method combines the upsampling and convolutional parameters into deconvolutional layers in a much simpler way, without using skip-layer connections.
  • 16. Top-down Methods
      • Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang: Deep High-Resolution Representation Learning for Human Pose Estimation. CVPR 2019
      1. The proposed human pose estimation network maintains high-resolution representations through the whole process.
      2. It starts from a high-resolution subnetwork as the first stage, gradually adds high-to-low resolution subnetworks one by one to form more stages, and connects the multi-resolution subnetworks in parallel.
      3. It performs repeated multi-scale fusions so that each of the high-to-low resolution representations receives information from the other parallel representations over and over, leading to rich high-resolution representations.
      https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
  • 18. Bottom-Up Methods: OpenPose
      • Shih-En Wei, Varun Ramakrishna, Takeo Kanade, Yaser Sheikh: Convolutional Pose Machines. CVPR 2016: 4724-4732
      • Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh: Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields. CVPR 2017: 1302-1310
      • Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, Yaser Sheikh: OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. CoRR abs/1812.08008 (2018)
  • 19. Bottom-Up Methods: OpenPose
      • Figures: architecture of the two-branch multi-stage CNN; part association strategies; graph matching
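
Part association with Part Affinity Fields scores a candidate limb by how well the predicted 2-D vector field aligns with the segment joining two keypoint candidates. The sketch below approximates that line integral by sampling points along the segment. The grid layout (`paf_x[row][col]`, `paf_y[row][col]`), the function name, and the sample count are assumptions for illustration, not OpenPose's actual code.

```python
import math
from typing import Sequence, Tuple

Grid = Sequence[Sequence[float]]
Point = Tuple[float, float]  # (x, y) in pixel coordinates


def paf_limb_score(
    paf_x: Grid, paf_y: Grid, start: Point, end: Point, num_samples: int = 10
) -> float:
    """Average dot product between the field and the limb direction."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0 or num_samples < 2:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1)
        col = int(round(start[0] + t * dx))
        row = int(round(start[1] + t * dy))
        # High when the field points the same way as the limb.
        total += paf_x[row][col] * ux + paf_y[row][col] * uy
    return total / num_samples


# Toy field (assumption): every pixel says "the limb runs in +x".
field_x = [[1.0] * 5 for _ in range(5)]
field_y = [[0.0] * 5 for _ in range(5)]
aligned = paf_limb_score(field_x, field_y, (0.0, 0.0), (4.0, 0.0))
perpendicular = paf_limb_score(field_x, field_y, (0.0, 0.0), (0.0, 4.0))
```

Scores like these can then serve as edge weights when candidate joints are paired up in the graph matching step.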
  • 20. Bottom-Up Methods
      • Alejandro Newell, Zhiao Huang, Jia Deng: Associative Embedding: End-to-End Learning for Joint Detection and Grouping. Neural Information Processing Systems (NIPS), 2017.
      https://github.com/princeton-vl/pose-ae-train
      • Detection + grouping
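
The "detection + grouping" idea can be sketched as follows: each detected joint carries a scalar tag, and joints whose tags are close are assigned to the same person. The greedy assignment, the `max_tag_gap` threshold, and the data layout are simplifying assumptions for this example and are not taken from the paper's code.

```python
from typing import Dict, List, Tuple

Detection = Tuple[str, float, float, float]  # (joint_name, x, y, tag)


def group_by_tag(detections: List[Detection], max_tag_gap: float) -> List[Dict]:
    """Greedily assign each joint to the person with the closest mean tag."""
    people: List[Dict] = []  # each: {"tags": [...], "joints": {name: (x, y)}}
    for name, x, y, tag in detections:
        best = None
        best_gap = max_tag_gap
        for person in people:
            if name in person["joints"]:
                continue  # this person already has that joint
            mean_tag = sum(person["tags"]) / len(person["tags"])
            gap = abs(tag - mean_tag)
            if gap <= best_gap:
                best, best_gap = person, gap
        if best is None:
            # No compatible person yet: start a new one.
            best = {"tags": [], "joints": {}}
            people.append(best)
        best["tags"].append(tag)
        best["joints"][name] = (x, y)
    return people


example_detections = [
    ("Head", 10.0, 10.0, 0.10),
    ("Head", 50.0, 10.0, 0.90),
    ("Neck", 10.0, 20.0, 0.12),
    ("Neck", 50.0, 20.0, 0.88),
]
example_people = group_by_tag(example_detections, max_tag_gap=0.2)
```

Because grouping only compares tags, it needs no person detector, which is what makes the approach bottom-up.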
  • 21. Bottom-Up Methods
      • Muhammed Kocabas, Salih Karagoz, Emre Akbas: MultiPoseNet: Fast Multi-Person Pose Estimation Using Pose Residual Network. ECCV (11) 2018: 437-453
      https://github.com/mkocabas/pose-residual-network
      • MultiPoseNet can jointly handle person detection, keypoint detection, person segmentation, and pose estimation.
  • 22. Bottom-Up Methods
      • George Papandreou, Tyler Zhu, Liang-Chieh Chen, Spyros Gidaris, Jonathan Tompson, Kevin Murphy: PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model. ECCV (14) 2018: 282-299
      • The PersonLab system consists of a CNN model that predicts: (1) keypoint heatmaps, (2) short-range offsets, (3) mid-range pairwise offsets, (4) person segmentation maps, and (5) long-range offsets.
      • The first three predictions are used by the Pose Estimation Module to detect human poses.
      • The latter two, along with the human pose detections, are used by the Instance Segmentation Module to predict person instance segmentation masks.
  • 23. Pose Estimation Datasets
      Dataset         Single person   Multi-person   Keypoints   Persons
      LSP             Y               N              14          ~2K
      FLIC            Y               N              9           ~20K
      MPII            Y               Y              16          ~25K
      COCO            N               Y              17          ~100K
      AI Challenger   N               Y              14          ~700K
      PoseTrack       N               Y              15          ~160K
  • 24. Pose Estimation: COCO leaderboard
  • 25. Pose Estimation: Paper Leaderboard
      COCO
      Category     Method                              Pub            mAP
      Bottom-up    OpenPose                            CVPR 2017      61.8
      Bottom-up    Associative Embedding               NeurIPS 2017   65.5
      Bottom-up    MultiPoseNet                        ECCV 2018      69.6
      Bottom-up    PersonLab                           ECCV 2018      68.7
      Bottom-up    PifPaf                              CVPR 2019      66.7
      Bottom-up    Multi-person Articulated Tracking   CVPR 2019      68.0
      Top-down     G-RMI                               CVPR 2017      64.9
      Top-down     Mask R-CNN                          ICCV 2017      63.1
      Top-down     RMPE                                ICCV 2017      72.3
      Top-down     Simple Baseline                     ECCV 2018      73.7
      Top-down     CPN                                 CVPR 2018      72.1
      Top-down     HRNet                               CVPR 2019      75.5

      MPII
      Category     Method                  Pub            PCKh@0.5
      Bottom-up    OpenPose                CVPR 2017      75.6
      Bottom-up    Associative Embedding   NeurIPS 2017   77.5
      Top-down     RMPE                    ICCV 2017      82.1
      Top-down     Simple Baseline         ECCV 2018      91.5
      Top-down     HRNet                   CVPR 2019      92.3
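
For readers unfamiliar with the MPII metric, PCKh@0.5 counts a predicted keypoint as correct when it lies within half the head segment length of the ground truth. A minimal sketch for one person follows; the function name and the assumption that `head_size` is already given in pixels are choices made for this example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pckh(
    pred: Sequence[Point],
    gt: Sequence[Point],
    head_size: float,
    alpha: float = 0.5,
) -> float:
    """Fraction of keypoints within alpha * head_size of the ground truth."""
    threshold = alpha * head_size
    correct = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return correct / len(gt)


example_gt = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
example_pred = [(0.0, 1.0), (10.0, 4.0), (20.0, 6.0), (30.0, 0.0)]
# With head_size = 10 the threshold is 5 px, so errors of 1, 4, 6, 0 px
# give three correct keypoints out of four.
example_score = pckh(example_pred, example_gt, head_size=10.0)
```

Normalising by head size makes the score comparable across people who appear at different scales in the image.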
  • 26. Human Pose Estimation API @ Neuhub
      (1) CVPR 2018 LIP Challenge, Single Human Pose Estimation: 1st place
      (2) CVPR 2018 LIP Challenge, Multi-Human Pose Estimation: 1st place
  • 27. Human Pose Estimation API @ Neuhub
      • "Finger Heart & 618" gesture for AR scan
      • WeChat Mini Program for Halloween
      • WeChat Mini Program for POPMART
  • 28. Human Behavior Understanding: Human-Oriented Analysis (Parsing, Pose, PoseTrack)
  • 29. PoseTrack
      • Mykhaylo Andriluka, Google Research, Zürich, Switzerland
      • Umar Iqbal, University of Bonn, Germany
      • Anton Milan, Amazon
      • Christoph Lassner, Amazon
      • Eldar Insafutdinov, MPI for Informatics, Saarbrücken, Germany
      • Leonid Pishchulin, MPI for Informatics, Saarbrücken, Germany
      • Juergen Gall, University of Bonn, Germany
      • Bernt Schiele, MPI for Informatics, Saarbrücken, Germany
      PoseTrack is a joint project of the Max Planck Institute for Informatics, University of Bonn and the PoseTrack team.
  • 30. PoseTrack
      Key figures:
      • 1356 video sequences
      • 46K annotated video frames
      • 276K body pose annotations
      Two challenges:
      • Multi-Person Pose Estimation
      • Multi-Person Pose Tracking
  • 31. Challenges
      • Large pose and scale variations
      • Fast motions
      • A varying number of persons
      • Partially visible body parts due to occlusion or truncation
  • 32. Related Work
      Bottom-up Methods
      [1] Umar Iqbal, Anton Milan, and Juergen Gall. PoseTrack: Joint Multi-person Pose Estimation and Tracking. In CVPR 2017 & CVPR 2018.
      [2] Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern Andres, and Bernt Schiele. ArtTrack: Articulated Multi-Person Tracking in the Wild. In CVPR 2017.
      [3] Andreas Doering, Umar Iqbal, and Juergen Gall. JointFlow: Temporal Flow Fields for Multi Person Pose Tracking. In BMVC 2018.
      [4] Matteo Fabbri, Fabio Lanzi, Simone Calderara, Andrea Palazzi, Roberto Vezzani, and Rita Cucchiara. Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World. In ECCV 2018.
      [5] Sheng Jin, Wentao Liu, Wanli Ouyang, and Chen Qian. Multi-person Articulated Tracking with Spatial and Temporal Embeddings. In CVPR 2019.
      Top-down Methods
      [1] Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, and Du Tran. Detect-and-Track: Efficient Pose Estimation in Videos. In CVPR 2018.
      [2] Yuliang Xiu, Jiefeng Li, Haoyu Wang, Yinghong Fang, and Cewu Lu. Pose Flow: Efficient Online Pose Tracking. In BMVC 2018.
      [3] Bin Xiao, Haiping Wu, and Yichen Wei. Simple Baselines for Human Pose Estimation and Tracking. In ECCV 2018.
  • 33. Top-down Methods
      Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, and Du Tran. Detect-and-Track: Efficient Pose Estimation in Videos. In CVPR 2018.
      https://github.com/facebookresearch/DetectAndTrack
      They propose a two-stage approach to keypoint estimation and tracking in videos:
      1) a novel video pose estimation formulation, 3D Mask R-CNN, that takes a short video clip as input and produces a tubelet per person and the keypoints within it;
      2) a lightweight optimization to link the detections over time.
  • 34. Top-down Methods
      Yuliang Xiu, Jiefeng Li, Haoyu Wang, Yinghong Fang, and Cewu Lu. Pose Flow: Efficient Online Pose Tracking. In BMVC 2018.
      https://github.com/YuliangXiu/PoseFlow
      • Overall pipeline: 1) Pose Estimator, 2) Pose Flow Builder, 3) Pose Flow NMS.
      • First, they estimate multi-person poses.
      • Second, they build pose flows by maximizing overall confidence and purify them with Pose Flow NMS.
      • Finally, reasonable multi-pose trajectories are obtained.
  • 35. Top-down Methods
      Bin Xiao, Haiping Wu, and Yichen Wei. Simple Baselines for Human Pose Estimation and Tracking. In ECCV 2018.
      https://github.com/microsoft/human-pose-estimation.pytorch
  • 37. Bottom-up Methods
      • Umar Iqbal, Anton Milan, and Juergen Gall. PoseTrack: Joint Multi-person Pose Estimation and Tracking. In CVPR 2017.
      • Mykhaylo Andriluka, Umar Iqbal, Anton Milan, Eldar Insafutdinov, Leonid Pishchulin, Juergen Gall, and Bernt Schiele. PoseTrack: A Benchmark for Human Pose Estimation and Tracking. In CVPR 2018.
      • OpenPose / DeepCut + graph partition
  • 38. Bottom-up Methods
      • Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern Andres, and Bernt Schiele. ArtTrack: Articulated Multi-Person Tracking in the Wild. In CVPR 2017.
      https://github.com/eldar/pose-tensorflow
  • 39. Bottom-up Methods
      • Andreas Doering, Umar Iqbal, and Juergen Gall. JointFlow: Temporal Flow Fields for Multi Person Pose Tracking. In BMVC 2018.
  • 40. Bottom-up Methods
      • Matteo Fabbri, Fabio Lanzi, Simone Calderara, Andrea Palazzi, Roberto Vezzani, and Rita Cucchiara. Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World. In ECCV 2018.
  • 43. Bottom-up Methods
      Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian: Multi-person Articulated Tracking with Spatial and Temporal Embeddings. CVPR 2019
      • A unified framework for pose estimation and tracking
      • A bottom-up method
      • State-of-the-art result
      • Part-level grouping: part appearance, geometric information
      • Temporal grouping: human embedding, temporal embedding
      • Pose tracking: bipartite graph matching
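
Pose tracking by bipartite graph matching pairs each existing track with at most one detection in the new frame so that the total matching cost is minimal. The sketch below uses embedding distance as the cost and, to stay dependency-free, a brute-force search over permutations as a stand-in for the Hungarian algorithm; it is only practical for a handful of people. The function name and the use of plain Euclidean distance are assumptions for this example.

```python
import math
from itertools import permutations
from typing import Dict, Sequence

Embedding = Sequence[float]


def match_tracks(
    track_embs: Sequence[Embedding], det_embs: Sequence[Embedding]
) -> Dict[int, int]:
    """Return {track_index: detection_index} minimising total distance."""
    n_tracks, n_dets = len(track_embs), len(det_embs)
    if n_tracks > n_dets:
        raise ValueError("sketch assumes at least as many detections as tracks")
    best_cost = math.inf
    best_assignment: tuple = ()
    # Try every way of giving each track a distinct detection.
    for assignment in permutations(range(n_dets), n_tracks):
        cost = sum(
            math.dist(track_embs[t], det_embs[d]) for t, d in enumerate(assignment)
        )
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return {t: d for t, d in enumerate(best_assignment)}


example_tracks = [(0.0, 0.0), (10.0, 10.0)]
example_dets = [(9.0, 9.0), (1.0, 1.0), (30.0, 30.0)]
example_matches = match_tracks(example_tracks, example_dets)
```

Detections left unmatched (index 2 in the example) would typically start new tracks; a production system would use a polynomial-time assignment solver instead of enumerating permutations.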
  • 44. Bottom-up Methods
      Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian: Multi-person Articulated Tracking with Spatial and Temporal Embeddings. CVPR 2019
      • Hourglass Model [20]
      • Human Embedding (HE): human-level representation
      • Temporal Instance Embedding (TIE): temporal representation for ID association
  • 45. PoseTrack in JD AI Research [ACM MM 2019]
      1. An end-to-end POINet: feature extraction and identity association in a unified network.
      2. Pose-guided feature extraction network: pose information + part-alignment attention in hierarchical convolution features.
      3. Ovonic insight network to learn identity matching and switching across frames.
  • 48. 49 Human Behavior Understanding: Human-Oriented Analysis (Parsing, Pose, PoseTrack)
  • 49. 50 What is Human Parsing? Single Human Parsing Multiple Human Parsing Instance-level Human Parsing Fine-grained Human Parsing 59 Categories
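  • 50. 51 Human Parsing Applications: Human Parsing + X. SnapShot, Fashion Analysis, Recommendation, Fashion Captioning, Clothing Search. Fashion analysis example: trend index ★★★★★, elegance index ★★★★☆, sexiness index ★★★★☆. Text generation example: "Flowing long hair radiates youth and vitality; paired with a swan-yellow long dress it shows off a slender figure, while the brown jacket and bag add a touch of elegance."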
  • 51. 52 Challenges of Human Parsing • Intrinsic: varied person appearance, ambiguity of clothing, complexity of clothing, low efficiency, small targets, data imbalance • Extrinsic: occlusion, clutter
  • 52. 53 Human Parsing History Clothing Parsing Human & Object Parsing Pedestrian Parsing [Bo et al., CVPR11] Fashion Parsing [Yamaguchi et al., CVPR12 ] [Liu et al., MM14, TMM14, MM15 ] [Liang et al., ICCV15, TPAMI15, ECCV16 ] Constrained Un-constrained
  • 53. 54 Related Work • Single Human parsing [Bo et al., CVPR11 ] • Unsupervised super-pixel • Shape-based matching • Spatial constraints Conventional methods: Yihang Bo, Charless C. Fowlkes: Shape-based pedestrian parsing. CVPR 2011: 2265-2272
  • 54. 55 Related Work • Single Human parsing • Conventional methods: • Yamaguchi, Kota, et al. "Parsing clothing in fashion photographs." CVPR, 2012. • Yamaguchi, Kota, M. Hadi Kiapour, and Tamara L. Berg. "Paper doll parsing: Retrieving similar styles to parse clothing items." ICCV, 2013. • Dong, Jian, et al. "A deformable mixture parsing model with parselets." ICCV, 2013. Pose Parsing
  • 55. 56 Related Work • Single Human parsing • Conventional methods: • Liu, Si, et al. "Fashion parsing with video context." MM2014, TMM2015. • Liu, Si, et al. "Fashion parsing with weak color-category labels." TMM, 2014. weak supervision
  • 56. 57 Related Work • Single Human parsing • Deep learning-based methods before 2017: • Luo, Ping, Xiaogang Wang, and Xiaoou Tang. "Pedestrian parsing via deep decompositional network." ICCV, 2013. HOG + DNN: Deep Decompositional Network
  • 57. 58 Related Work • Single Human parsing • Deep learning-based methods before 2017: • Liu, Si, et al. "Matching-CNN meets KNN: Quasi-parametric human parsing." CVPR, 2015. • Liang, Xiaodan, et al. "Deep human parsing with active template regression." TPAMI, 2015. Parsing by Matching
  • 58. 59 Related Work • Single Human parsing • Deep learning-based methods before 2017: • Liang, Xiaodan, et al. "Human parsing with contextualized convolutional neural network." ICCV2015, TPAMI2017. Parsing Image-level Label Edge Superpixel
  • 59. 60 Related Work • Single Human parsing • Deep learning-based methods in 2017 • Gong, Ke, et al. "Look into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing." CVPR. 2017. SSL: Self-supervised Structure-sensitive Learning https://github.com/Engineering-Course/LIP_SSL
  • 60. 61 Related Work • Single Human parsing • Deep learning-based methods in 2017 • Liang, Xiaodan, et al. "Look into Person: Joint Body Parsing & Pose Estimation Network and A New Benchmark." TPAMI, 2018. JPP-Net: Joint Body Parsing & Pose Estimation Network Pose Parsing https://github.com/Engineering-Course/LIP_JPPNet
  • 61. 62 Related Work • Single Human parsing • Deep learning-based methods in 2018 • Luo, Yawei, et al. "Macro-micro adversarial network for human parsing." ECCV. 2018. MMAN: Macro-Micro Adversarial Network Parsing GAN https://github.com/RoyalVane/MMAN
  • 62. 63 Related Work • Single Human parsing • Deep learning-based methods in 2018 • Liu, Si, et al. "Cross-domain human parsing via adversarial feature and label adaptation." AAAI, 2018. Cross-domain Human Parsing Parsing GAN https://github.com/mathfinder/Cross-domain-Human-Parsing-via-Adversarial-Feature-and-Label-Adaptation
  • 63. 64 Related Work • Single Human parsing • Deep learning-based methods in 2018 • Luo, Xianghui, et al. "Trusted Guidance Pyramid Network for Human Parsing." ACMMM, 2018 TGPNet: Trusted Guidance Pyramid Network
  • 64. 65 Related Work • Multi Human parsing • Li, Qizhu, Anurag Arnab, and Philip HS Torr. "Holistic, Instance-level Human Parsing." BMVC, 2017. Detector + FCN: "parsing-by-detection"
  • 65. 67 Related Work • Multi Human parsing • Fang, Hao-Shu, et al. “Weakly and Semi Supervised Human Body Part Parsing via Pose-Guided Knowledge Transfer.” CVPR, 2018. Parsing Pose RefineNet https://github.com/MVIG-SJTU/WSHP
  • 66. 68 Related Work • Multi Human parsing • Gong, Ke, et al. "Instance-level human parsing via part grouping network." ECCV, 2018. Parsing Edge https://github.com/Engineering-Course/CIHP_PGN
  • 67. 69 Related Work • Multi Human parsing • Zhao, Jian, et al. "Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing." ACMMM, 2018, Best Student Paper. https://github.com/ZhaoJ9014/Multi-Human-Parsing
  • 68. 70 Related Work • Multi Human parsing • Zhao, Jian, et al. "Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing." ACMMM, 2018, Best Student Paper. Parsing GAN: semantic saliency prediction, instance-agnostic parsing, instance-aware clustering. https://github.com/ZhaoJ9014/Multi-Human-Parsing
  • 69. 71 Related Work • Multi Human parsing • Li, Jianshu, et al. "Multi-Human Parsing Machines." ACM MM, 2018. GAN Instance Segmentation Parsing
  • 70. 72 Related Work • Multi Human parsing • Tao Ruan, Ting Liu, et al. "Devil in the details: Towards accurate single and multiple human parsing." AAAI, 2019. CE2P: Context Embedding with Edge Perceiving. Parsing Edge. PSPNet, U-Net, Edge-Net. https://github.com/liutinglt/CE2P
  • 71. 73 Related Work • Multi Human parsing • Ruan, Tao, et al. "Devil in the details: Towards accurate single and multiple human parsing." AAAI, 2019. Parsing Mask R-CNN
  • 72. 74 Related Work • Multi Human parsing • Gong, Ke et al. "Graphonomy: Universal Human Parsing via Graph Transfer Learning." CVPR, 2019. Universal Human Parsing: One Model for Different Datasets. Parsing Graph Transfer Learning https://github.com/Gaoyiminggithub/Graphonomy
  • 73. 75 Related Work • Multi Human parsing • Yang, Lu et al. "Parsing R-CNN for Instance-Level Human Analysis." CVPR, 2019. An End-to-end Framework for Multi-Human Parsing. FPN, RPN, Non-local. Parsing R-CNN
  • 74. 76 Related Work • Video Human parsing • Zhou, Qixian, et al. “Adaptive Temporal Encoding Network for Video Instance-level Human Parsing.” ACMMM, 2018. https://github.com/HCPLab-SYSU/ATEN
  • 75. 77 Related Work • Multi Human parsing • Liu, Xinchen, et al. "BraidNet: Braiding Semantics and Details for Accurate Human Parsing." ACM MM, 2019. ✓ A Braiding Network with two sub-nets: • A deep-and-narrow net to learn semantic knowledge; • A shallow-but-wide net to capture local structures. ✓ A novel Braiding Module: • Exchange information between the two sub-nets • Learn robust and effective features for small targets. ✓ Pairwise Hard Region Embedding: • Differentiate ambiguous parsing targets through a hard-aware regional metric learning loss.
  • 76. 78 Datasets
      Dataset                     Total    Train    Val      Test     Class  Instance
      Single
      Fashionista                 685      456      -        229      56     1
      ATR                         17,700   16,000   700      1,000    18     1
      LIP                         50,462   30,462   10,000   10,000   20     1
      JD-Fashion                  16,497   16,317   180      -        21     1
      Multiple
      PASCAL-Person-Part          3,533    1,716    -        1,817    7      ×
      CIHP                        38,280   28,280   5,000    5,000    20     √
      MHP v1.0                    4,980    3,000    1,000    980      19     √
      MHP v2.0                    25,403   15,403   5,000    5,000    59     √
      Video
      Indoor (1 frame label)      700      400      200      100      13     1
      Outdoor (1 frame label)     741      421      120      200      13     1
      VIP (1/25 frame label)      404      354      -        50       20     √
  • 77. 79 Evaluation Metric • Single Human Parsing • Pixel accuracy • Mean pixel accuracy • Mean IoU • Frequency weighted IoU • F1-score: $F_1 = \frac{2 \cdot P \cdot R}{P + R}$
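  The single-human parsing metrics listed above can all be derived from a class confusion matrix. The following is a minimal sketch under stated assumptions, not a reference implementation from any benchmark toolkit: labels are flat sequences of integer class ids, averages are taken only over classes that occur in the ground truth, and the function names (`confusion_matrix`, `parsing_metrics`, `f1_score`) are hypothetical. Benchmarks differ in such conventions, for example in how background or absent classes are handled.

```python
def confusion_matrix(gt, pred, num_classes):
    """conf[i][j] counts pixels whose true class is i and predicted class is j."""
    conf = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        conf[g][p] += 1
    return conf


def parsing_metrics(conf):
    """Pixel accuracy, mean pixel accuracy, mean IoU and frequency weighted IoU."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    tp = [conf[i][i] for i in range(n)]
    gt_count = [sum(conf[i]) for i in range(n)]                         # pixels per true class
    pred_count = [sum(conf[i][j] for i in range(n)) for j in range(n)]  # pixels per predicted class
    present = [i for i in range(n) if gt_count[i] > 0]                  # assumption: skip absent classes

    iou = {i: tp[i] / (gt_count[i] + pred_count[i] - tp[i]) for i in present}
    return {
        "pixel_acc": sum(tp) / total,
        "mean_acc": sum(tp[i] / gt_count[i] for i in present) / len(present),
        "mean_iou": sum(iou.values()) / len(present),
        "fw_iou": sum(gt_count[i] * iou[i] for i in present) / total,
    }


def f1_score(precision, recall):
    """Harmonic mean of precision P and recall R; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: six pixels, three classes.
gt = [0, 0, 0, 1, 1, 2]
pred = [0, 0, 1, 1, 1, 2]
metrics = parsing_metrics(confusion_matrix(gt, pred, num_classes=3))
```

  In the toy example five of six pixels are correct, so pixel accuracy is 5/6, while mean IoU is lower (7/9) because the single error counts against both class 0 and class 1.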
  • 78. 80 Evaluation Metric • Multi Human Parsing • Mean IoU • $AP^r$ & mAP • Percentage of Correctly Parsed (PCP) • Video Human Parsing • Similar to Single & Multi Human Parsing • Additional: FPS
  • 79. 81 Results of Single Human Parsing • On ATR
      Method                Pub      Pixel Acc  F1-score
      Paper Doll            CVPR13   88.96      44.76
      M-CNN                 CVPR15   89.57      62.81
      ATR                   PAMI15   91.11      64.38
      Deeplab-v2 (vgg16)    PAMI16   94.42      73.53
      PSPnet (resnet101)    CVPR17   95.20      75.84
      Co-CNN                ICCV15   95.23      76.95
      Attention (vgg16)     CVPR16   95.41      77.23
      Deeplab-v3+           ECCV18   95.96      79.49
      LG-LSTM               CVPR16   96.18      80.97
      TGPN                  MM18     96.45      81.76
      Graph-LSTM            ECCV16   97.60      83.76
      Graphonomy            CVPR19   98.32      90.89
  • 80. 82 Results of Single Human Parsing • On LIP validation
      Method            Pub      Pixel Acc  mIoU
      SegNet            PAMI17   69.04      18.17
      FCN-8s            CVPR15   76.06      28.29
      DeepLabV2         ICLR15   82.66      41.64
      Attention         CVPR16   83.43      42.92
      DeepLabV2 + SSL   CVPR17   83.16      42.44
      Attention + SSL   CVPR17   84.36      44.73
      SS-NAN            CVPRW17  87.59      47.92
      MMAN              ECCV18   -          46.81
      JPPNet            PAMI18   86.39      51.37
      CE2P              AAAI19   87.37      53.10
      BraidNet          MM19     87.60      54.42
  • 81. 83 Results of Multi Human Parsing • On CIHP
      Method        Pub     mIoU    AP@0.5  AP@0.6  AP@0.7  mAP
      PGN           ECCV18  55.8    35.8    28.6    20.5    33.6
      DMNet         CVPR18  61.51   46.12   41.50
      M-CE2P        AAAI19  59.50   48.69   40.13   29.74   42.83
      Graphonomy    CVPR19  58.58   -       -       -       -
      Parsing RCNN  CVPR19  59.8    -       -       -       -
      BraidNet      MM19    60.62   48.99   41.67   32.71   43.59
  • 82. 84 Results of Multi Human Parsing • On MHP v2
      Method        Pub     PCP@0.5  AP@0.5  AP@0.6  AP@0.7  mAP
      Mask R-CNN    ICCV17  25.11    14.90
      MH-Parser     MM18    26.98    17.99   -       -       -
      PGN           ECCV18  32.25    25.14   -       -       41.78
      S-LAB         CVPR18  38.27    31.47   -       -       40.71
      CE2P          AAAI19  41.82    33.34   -       -       42.25
      Parsing RCNN  CVPR19  44.2     -       -       -       40.3
  • 83. 85 Thinking in Human Parsing • Methodology ➢ Multi-task learning: Parsing + Pose + Edge ➢ Multi-granularity supervision: Low + Middle + High; Un/Semi-supervised ➢ Improve efficiency: to be real-time ➢ Cross-domain: Fashion → Surveillance ➢ Multi-modality: Image → Video
  • 84. 86 Thanks! liuwu1@jd.com Computer Vision and Multimedia Lab AI Platform and Research