2. Self Introduction
ABEJA, Inc. (Researcher)
- Deep Learning (CV, Graph, NLP, ...)
- Machine Learning
- Mathematical Optimization
- https://github.com/TatsuyaShirakawa
Tech blog http://tech-blog.abeja.asia/
Poincaré Embeddings, Graph Convolution, Annotation, Hyperbolic
3. Today’s Paper
Exploring the Structure among Visual Tasks
by Measuring Transferability
(Taskonomy = Task + Taxonomy)
http://taskonomy.stanford.edu/ http://taskonomy.vision/
Super Large Dataset with 26 Task Annotations
+ Super Thorough Analysis
+ Potentially Promising Research Direction
= Super Interesting CVPR 2018 Best Paper!
4. Paper Introduction
• Considers transferability among visual tasks
• Analyzes transferability by means of AHP (Analytic Hierarchy Process)
• Uses combinatorial optimization to extract the visual Taskonomy
• Massive dataset & experiments (4.5M images, 26 tasks, 47,886 GPU hours)
http://taskonomy.stanford.edu/
6. Disclaimer
The paper, slides, live demos, and web pages are already great.
So in this talk, let’s focus on understanding
- the motivation,
- the task,
- the method, and
- some experimental results
of Taskonomy.
In the following, I quote extensively from the slides at
https://storage.googleapis.com/taskonomy_slides/taskonomy_slides.html
8. Zamir et al. Taskonomy 2018
Question: Vision problems - related or independent?
[Figure: Image, Depth, Normals, Layout, and Objects, with "?" marking the unknown relationships]
9. Zamir et al. Taskonomy 2018
Question: Vision problems - related or independent?
• Task relationships exist
• Can be computationally measured
• Tasks belonging to a structured space
• Unified model for transfer learning
[Figure: Image, Depth, Normals, Layout, and Objects, with edges labeled "derivative" and "spatial prior"]
12. Zamir et al. Taskonomy 2018
[Figure: query image and per-task panels: Autoencoding, In-painting, Object Class., Scene Class., Jigsaw puzzle, Colorization, 2D Segm., 2.5D Segm., Semantic Segm., Vanishing Points, 2D Edges, 3D Edges, 2D Keypoints, 3D Keypoints, 3D Curvature, Image Reshading, Denoising, Cam. Pose (non-fixated), Cam. Pose (fixated), Triplet Cam. Pose, Room Layout, Point Matching, Eucl. Distance, Surface Normals]
Object Class. top 5 predictions: sliding door; home theater, home theatre; studio couch, day bed; china cabinet, china closet; entertainment center
Scene Class. top 2 predictions: living room; television room
• Task Bank
  - 26 semantic, 2D, 3D, and other tasks
• Dataset
  - 4 million real images
  - Each image has the GT label for all tasks
• Task-Specific Networks
  - 26 networks, one per task
https://storage.googleapis.com/taskonomy_slides/taskonomy_slides.html
13. Dataset Creation
• Semantic tasks (e.g. scene classification)
=> “Knowledge distillation” from known methods
= predictions of trained models are used as labels
• Non-semantic tasks
=> Labels are computed programmatically from images captured by multiple RGB-D cameras
23. AHP (Analytic Hierarchy Process)
Mathematical Background
Consider ranking n items {1, 2, …, n}.
Let A = (a_ij), where a_ij measures how much the i-th item is superior to the j-th item.
Assume A has the form a_ij = u_i / u_j with every u_i > 0.
Then,
(1) A has rank 1
(2) Au = nu, so u is an eigenvector with eigenvalue n (the only non-zero eigenvalue; u is unique up to scaling)
=> u: importance vector
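The two properties above can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up importance vector u (the values are illustrative and not taken from the paper). It builds A from u, verifies the rank and the eigen-relation, and then recovers u from A alone.

```python
import numpy as np

# Hypothetical importance scores for n = 4 items (any positive values work).
u = np.array([4.0, 2.0, 1.0, 1.0])
n = len(u)

# Consistent pairwise-comparison matrix: a_ij = u_i / u_j.
A = u[:, None] / u[None, :]

# (1) A has rank 1.
rank = np.linalg.matrix_rank(A)

# (2) A u = n u.
Au = A @ u

# Recover the importance vector from A alone: take the eigenvector that
# belongs to the largest eigenvalue and normalize it to sum to 1.
# Dividing by the sum also removes the arbitrary sign of the eigenvector.
eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
v = eigvecs[:, k].real
importance = v / v.sum()

print("rank:", rank)
print("A u == n u:", np.allclose(Au, n * u))
print("importance:", importance)
```

In practice the measured matrix is only approximately of the form u_i / u_j, which is why AHP takes the principal eigenvector rather than reading u off a single column.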
24. AHP for Taskonomy
1. For each target t, take the win-lose ratio between
(a) transfer s_i -> t and (b) transfer s_j -> t as the (i, j) entry of a pairwise matrix
2. Take the principal eigenvector of the matrix (normalized to sum to 1)
3. Create the final matrix by stacking the principal eigenvectors of all targets
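The three steps can be sketched as follows, assuming NumPy. The function name `ahp_affinities`, the helper `pairwise`, the win rates, the clipping value `eps`, and the rows-are-targets layout are all illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def ahp_affinities(win_rate, eps=1e-3):
    """Turn one target's pairwise win rates into an affinity vector.

    win_rate[i, j] is the (hypothetical) fraction of test images on which
    the transfer s_i -> t beats the transfer s_j -> t.
    """
    # Clip so that the ratio below never divides by zero (eps is an assumption).
    W = np.clip(win_rate, eps, 1.0 - eps)
    # Step 1: win-lose ratio, ratio[i, j] = W[i, j] / W[j, i].
    ratio = W / W.T
    # Step 2: principal eigenvector, normalized to sum to 1.
    eigvals, eigvecs = np.linalg.eig(ratio)
    v = eigvecs[:, np.argmax(eigvals.real)].real
    return v / v.sum()

def pairwise(p01, p02, p12):
    """Build a 3-source win-rate matrix from its upper-triangular entries."""
    return np.array([
        [0.5,       p01,       p02],
        [1.0 - p01, 0.5,       p12],
        [1.0 - p02, 1.0 - p12, 0.5],
    ])

# Made-up win rates for two targets and three sources.
win_rates = {
    "target_a": pairwise(0.8, 0.9, 0.6),    # source 0 wins most comparisons
    "target_b": pairwise(0.3, 0.4, 0.55),   # source 1 wins most comparisons
}

# Step 3: stack the per-target vectors (here one row per target).
P = np.stack([ahp_affinities(W) for W in win_rates.values()])
print(P)
```

Each row of `P` sums to 1, so the affinities of different targets are on a comparable scale even when the underlying losses are not.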
29. Taxonomy Extraction
• Boolean Integer Programming (BIP)
- Find the subgraph composed of tasks (nodes) and transfers (edges)
that solves all the tasks at minimum cost
Constraint I
If a transfer is in the subgraph, all of its source nodes/tasks must be included too
Constraint II
Each target task has exactly one incoming transfer
Constraint III
The supervision budget is not exceeded
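To make the three constraints concrete, here is a toy sketch that enumerates every 0/1 assignment instead of calling a real BIP solver. The task names, candidate transfers, loss values, and the unit supervision cost per source task are all made up for illustration. The sketch follows the slide's "minimum cost" wording by minimizing the summed loss.

```python
from itertools import product

tasks = ["depth", "normals", "layout"]

# Candidate transfers: (source tasks, target task, hypothetical loss).
transfers = [
    (("depth",), "depth", 0.10),            # train the task on its own labels
    (("normals",), "normals", 0.10),
    (("layout",), "layout", 0.10),
    (("normals",), "depth", 0.20),
    (("depth",), "normals", 0.25),
    (("normals",), "layout", 0.30),
    (("depth", "normals"), "layout", 0.15),  # higher-order transfer
]

def solve(tasks, transfers, budget):
    """Brute-force search over node/edge indicator variables.

    Returns (total loss, selected source tasks, selected transfers),
    or None if no assignment satisfies the constraints.
    """
    best_key, best = None, None
    for bits in product([0, 1], repeat=len(tasks) + len(transfers)):
        node_on = dict(zip(tasks, bits[:len(tasks)]))
        chosen = [t for t, on in zip(transfers, bits[len(tasks):]) if on]
        # Constraint I: every source of a selected transfer is selected.
        if any(not node_on[s] for srcs, _, _ in chosen for s in srcs):
            continue
        # Constraint II: each target has exactly one incoming transfer.
        if any(sum(tgt == t for _, tgt, _ in chosen) != 1 for t in tasks):
            continue
        # Constraint III: the supervision budget is not exceeded
        # (assumption: each source task costs one unit).
        n_sources = sum(node_on.values())
        if n_sources > budget:
            continue
        cost = sum(loss for _, _, loss in chosen)
        key = (cost, n_sources)  # break ties in favor of fewer sources
        if best_key is None or key < best_key:
            sources = tuple(t for t in tasks if node_on[t])
            best_key, best = key, (cost, sources, chosen)
    return best

for budget in (1, 2):
    cost, sources, chosen = solve(tasks, transfers, budget)
    print(budget, sources, round(cost, 2))
```

Enumeration grows exponentially with the number of variables, so it only works for toy instances like this one. A real instance would hand the same variables and constraints to an integer-programming solver.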