SlideShare ist ein Scribd-Unternehmen logo
1 von 41
Downloaden Sie, um offline zu lesen
Towards Automatic Construction of Diverse,
High-quality Image Dataset
Jian Zhang
Multimedia and Data Analytics Lab
University of Technology Sydney
Outlines
➢ Introduction
➢ Related works
➢ Research challenges
➢ Motivations and solutions
➢ Experimental results
➢ Applications
2
Introduction
 Image dataset construction: is the process of collecting
relevant images for the given/target query.
 Significances:
1. Labeled image datasets are the backbone for high-
level image understanding tasks;
2. Continuously drive and evaluate the progress of
feature designing and supervised learning models.
3
Related Works
 Manual labelling based methods
 Active learning based methods
 Automatic learning based methods
4
Manual labelling based methods
Related works:
 LabelMe [Sinclair et al. 1990]
 Pascal VOC series [Everingham et al. 2007]
 ImageNet [Deng et al. 2009]
 CIFAR-10/100 [Krizhevsky et al. 2009]
Advantages: High accuracy
Disadvantages:
 Time consuming and labor intensive
 Scalability is limited
 Diversity is limited
5
Active learning based methods
Related works:
 Towards scalable dataset construction: An active learning
approach. [Collins et al. 2008]
 Large-scale live active learning: Training object detectors
with crawled data and crowds. [Grauman et al. 2014]
Disadvantages:
 Scalability is limited
 Diversity is limited
Advantages:
 The cost of manual labelling has some decrease.
 Relatively high accuracy.
6
Automatic learning based methods
Related works:
 Optimol: automatic online picture collection via
incremental model learning. [Li et al. 2010]
 Harvesting: harvesting image databases from the web.
[Schroff et al. 2011]
 Prajna: Towards recognizing whatever you want from
images without image labeling. [Hua et al. 2015]
Disadvantages:
 Low accuracy
 Limited diversity
Advantages: Scalability is no longer a problem
7
Research challenges
Our work focus on automatic learning based
methods.
The major challenges include:
 Visual polysemy
 Limited diversity
 Low accuracy
8
Research challenge: Visual Polysemy
Fig. 1: Visual polysemy. For example, the query “mouse” returns
multiple visual senses on the first page of results. The retrieved web
images suffer from the low precision of any particular visual sense.
9
Research challenge: Limited Diversity
Fig. 2: Images for “airplane” from four different datasets.
Each dataset has their preference for image selection.
10
Research challenge: Low Accuracy
Fig. 3: Due to the error indexing of image search engine, even we retrieve the
sense-specific images, some instance-level noise may also be included. The
noisy images are marked with red bounding boxes.
11
Motivations
Fig. 4: The average accuracy of top 500 images in Google Image
Search, Web Search and Flickr Search for 10 queries.
12
Motivations
 The first few images returned from image
search engine tend to have a relatively high
accuracy.
 Image search engine restricts the number of
returned images for each query.
 The diversity of the selected images not only
depends on the selection mechanisms, but
also relies on the diversity of the initial
candidate images.
13
Solutions
Query
Noisy images
pruning
Selected
images
Collected
candidate images
Query
Pruning
noisy images
Selected
images
Collected candidate
images
Discovering candidate
textual metadata
Pruning noisy
textual metadata
Traditional solution:
Our solution:
14
Solutions
Noisy textual metadata:
Fig. 5: A snapshot of the retrieved images for visual non-salient
and less relevant textual metadata.
15
Solutions
Our solution:
Fig. 6: Illustration of the process for obtaining multiple textual metadata. The input is a
textual query that we would like to find multiple textual metadata for. The output is a
set of selected textual metadata which will be used for raw image dataset construction.
16
Solutions
Why:
 Solving Visual Polysemy by Allowing Sense-specific Diversity
 Improving Scalability of Image Search Engine
 Increasing Diversity of the Initial Candidate Images
17
Solutions
Fig.7: Illustration of the process for obtaining selected images. The input is a set of
selected textual metadata. Artificial images, inter-class noisy images, and intra-class
noisy images are marked with red, green and blue bounding boxes separately. The
output is a group of selected images in which the images corresponding to different
textual metadata.
18
Solutions
Why:
 Improving the Accuracy of the Selected Images
 Ensuring the Diversity of the Selected Images
19
Experiments
Polysemy:
 Multiple Visual Senses Discovering
 Classifying Sense-specific Images
 Re-ranking Search Results
Accuracy:
 Image Dataset WSID-100 Introduction
 Image Classification Ability Comparison
Diversity:
 Cross-dataset Generalization Ability Comparison
20
Polysemy: Multiple Visual Senses Discovering
Fig. 8: Examples of multiple visual senses discovered by our proposed approach.
For example, our approach automatically discovers and distinguishes four senses
for “Note”: notes, galaxy note, note tablet and music note. For “Bass”, it discovers
multiple visual senses of: bass fish, bass guitar, bass amp, and Mr./Mrs. Bass etc.
21
Polysemy: Classifying Sense-specific Images
Fig. 9: The detailed performance comparison of classification accuracy
over 30 categories on the CMU-Poly-30 dataset
22
Polysemy: Classifying Sense-specific Images
Fig. 10: The detailed performance
comparison of classification accuracy
over 5 categories on the MIT-ISD
dataset.
Tab. 1: The average performance
comparison of classification accuracy
on the CMU-Poly-30 and MIT-ISD
dataset. 23
Polysemy: Re-ranking Search Results
Tab. 2: Web images for polysemy terms were annotated manually.
For each term, the number of annotated images, the semantic
senses, the visual senses and their distributions are provided, with
core semantic senses marked in boldface.
24
Polysemy: Re-ranking Search Results
Tab. 3: Area Under Curve (AUC) of all senses for “bass” and “mouse”.
25
Accuracy: Image Dataset WSID-100 Introduction
WSID-100: Web-supervised Image Dataset 100 Categories
http://www.multimediauts.org/dataset/WSID-100.html
26
Accuracy: Image Classification Ability Comparison
Fig. 11: The image classification accuracy (%) comparison over 14 categories on the PASCAL VOC 2007 dataset.
Fig. 12: The image classification accuracy (%)
comparison over 6 categories on the PASCAL VOC
2007 dataset.
Tab. 4: The average accuracy (%) comparison over 14
and 6 common categories on the PASCAL VOC 2007
dataset. 27
Diversity: Cross-dataset Generalization Ability Comparison
Fig. 13: The cross-dataset generalization ability of various datasets by using a varying number
of training images, and tested on (a) ImageNet, (b) Optimol, (c) Harvesting, (d) DRID-20,
(e) Ours, (f) Average. 28
Application: Fine-grained Visual Categorization
 Fine-grained visual recognition: aims to distinguish between subordinate
categories such as different birds, flowers, food, car, etc.

Large
variance
Little
variance
Fig. 14: Illustration of the two characteristics that fine-grained subcategories have: large
variance in the same subcategory as shown in the first line, and small variance among different
subcategories as shown in the second line. Images in (a) birds and (b) cars are from CUB-200-
2011 and Cars-196 datasets, respectively.
29
 Challenges:
1) Large variance in the same subcategory;
2) Small variance among different subcategories.
 Solutions:
1) Strongly supervised methods;
2) Weakly supervised methods;
3) Webly supervised methods.
30
Strongly supervised methods: require manually labeled
bounding boxes or part annotations during training.
Fig. 15: Bounding boxes and part annotations for fine-grained visual categorization.
31
Weakly supervised methods: require image-level labels
during training.
Fig. 15: Image-level labeling for fine-grained visual categorization.
32
Webly supervised methods: require no need for human
intervention during training.
Fig. 16: A snapshot of retrieved web images for “Bobolink”. Label noise
is marked in red bounding boxes and blue bounding boxes.
33
Our work focus on webly supervised based methods.
The major problems include:
 Label noise (low accuracy)
 Domain mismatch (limited diversity)
34
Fig. 17: Label noise and domain mismatch problems in the web data for fine-grained categorization.
Due to incorrect labeling, the web images collected through query “Fish Crow” may contain noise
(e.g., the “Purple Finch” image marked with red bounding box). Directly leverage the web images
without noise removal may lead to classifying the test image “Purple Finch” as “Fish Crow”. In
addition, the collected web images may contain multiple domains (e.g., natural, sketch, and cartoon
images). Due to the domain distribution between the test image “Bobolink” and the collected sketch
domain web images for “Fish Crow”, although the test image “Bobolink” has no noise in the
collected web images for “Fish Crow”, directly leverage the web images without domain mismatch
alleviation may also lead to classifying the test image “Bobolink” as “Fish Crow”.
35
Fig. 18: The architecture of our proposed DDN model. The input of our model is a set of “bags”, which consists
of multiple images (“instances”). The bag of images is first fed into the backbone net (e.g., VGG16) to generate
the feature map. Then the feature map goes into a fully connected layer followed by a ReLU, thus generating a
feature vector N. This feature vector is subsequently sent into two branches. The upper branch is our proposed
Attention Block, which consists of a fully connected layer followed by a tanh layer and another fully connected
layer. After the Attention Block, we obtain the instance probability vector A. In the bottom branch, the feature
vector N is multiplied by the output of the Attention Block A and then fed into a fully connected layer with a
Sigmoid layer to calculate the probability of the bag being positive. Finally, the bottom branch adopts the
Negative Bernoulli Log Loss, while the upper branch leverages our proposed Attentive Focal Loss. The
weighted sum of the two branch losses is leveraged as the final loss function for DDN. 36
Fig. 19: The architecture of instance-level model. Feature vector N is subsequently sent into two branches. The
upper branch consists of a fully connected layer followed by a tanh layer and another fully connected layer.
After the Attention Block, we obtain the instance probability vector. Finally, the upper branch leverages
Attentive Focal Loss.
37
Fig. 18: The architecture of bag-level model. Feature vector N is subsequently sent into two branches. In
bottom branch, the feature vector N is multiplied by the output of the Attention Block and then fed into a fully
connected layer with a Sigmoid layer to calculate the probability of the bag being positive. Finally, the bottom
branch adopts the Negative Bernoulli Log Loss.
38
Tab. 5: Fine-grained ACA (%) results on CUB200 and Sandford dogs. Tr-B/P means
bounding box or part annotation required in the training stage. Data means the
training data is manually labeled (anno.) or collected from the web (web). The best
result is marked in red, the second best in blue, and the third best in bold.
39
Publications
▪ Yazhou Yao, Jian Zhang, Fumin Shen, Li Liu, Fan Zhu, Dongxiang Zhang, and Heng-Tao Shen. "Towards Automatic Construction of Diverse,
High-quality Image Dataset", IEEE Transactions on Knowledge and Data Engineering (TKDE), 2019.
▪ Yazhou Yao, Zeren Sun, Fumin Shen, Li Liu, Limin Wang, Fan Zhu, Lizhong Ding, Gangshan Wu, Ling Shao. "Dynamically Visual
Disambiguation of Keyword-based Image Search", International Joint Conference on Artificial Intelligence (IJCAI), 2019.
▪ Yazhou Yao, Fumin Shen, Jian Zhang, Li Liu, Zhenmin Tang and Ling Shao. "Extracting Privileged Information for Enhancing Classifier
Learning", IEEE Transactions on Image Processing (TIP), 2018.
▪ Yazhou Yao, Fumin Shen, Jian Zhang, Li Liu, Zhenmin Tang and Ling Shao. " Extracting Multiple Visual Senses for Web Learning ", IEEE
Transactions on Multimedia (TMM), 2018.
▪ Yazhou Yao, Jian Zhang, Fumin Shen, Wankou Yang, Xiansheng Hua and Zhenmin Tang. “Extracting Privileged Information from Untagged
Corpora for Classifier Learning”, International Joint Conference on Artificial Intelligence (IJCAI), 2018.
▪ Yazhou Yao, Jian Zhang, Fumin Shen, Wankou Yang, Pu Huang and Zhenmin Tang. “Discovering and Distinguishing Multiple Visual Senses
for Polysemous Words”, AAAI Conference on Artificial Intelligence (AAAI), 2018.
▪ Yazhou Yao, Jian Zhang, Fumin Shen, Xian-Sheng Hua, Jingsong Xu and Zhenmin Tang. "Exploiting Web Images for Dataset Construction: A
Domain Robust Approach", IEEE Transactions on Multimedia (TMM), 2017.
▪ Yazhou Yao, Xiansheng Hua, Fumin Shen, Jian Zhang and Zhenmin Tang. "A Domain Robust Approach for Image Dataset Construction ",
ACM International Conference on Multimedia (ACM MM), 2016.
▪ Yazhou Yao, Jian Zhang, Fumin Shen, Xian-Sheng Hua, Jingsong Xu and Zhenmin Tang. "Automatic Image Dataset Construction with
Multiple Textual Metadata ", IEEE Conference on Multimedia and Expo (ICME), 2016.
40
Thank you!
Any questions?
41

Weitere ähnliche Inhalte

Was ist angesagt?

An Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchAn Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchBill Liu
 
Applications of Neural Networks
Applications of Neural NetworksApplications of Neural Networks
Applications of Neural NetworksMichael Motoki
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition Intel Nervana
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET Journal
 
Dario izzo - Machine Learning methods and space engineering
Dario izzo - Machine Learning methods and space engineeringDario izzo - Machine Learning methods and space engineering
Dario izzo - Machine Learning methods and space engineeringAdvanced-Concepts-Team
 
Object extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learningObject extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learningAly Abdelkareem
 
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会Eiji Sekiya
 
Cross-domain complementary learning with synthetic data for multi-person part...
Cross-domain complementary learning with synthetic data for multi-person part...Cross-domain complementary learning with synthetic data for multi-person part...
Cross-domain complementary learning with synthetic data for multi-person part...哲东 郑
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryIan Foster
 
Going Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFGoing Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFIan Foster
 
2019 cvpr paper_overview
2019 cvpr paper_overview2019 cvpr paper_overview
2019 cvpr paper_overviewLEE HOSEONG
 
How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?Anubhav Jain
 
Image Classification Done Simply using Keras and TensorFlow
Image Classification Done Simply using Keras and TensorFlow Image Classification Done Simply using Keras and TensorFlow
Image Classification Done Simply using Keras and TensorFlow Rajiv Shah
 
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...inside-BigData.com
 

Was ist angesagt? (20)

An Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchAn Introduction to Neural Architecture Search
An Introduction to Neural Architecture Search
 
Applications of Neural Networks
Applications of Neural NetworksApplications of Neural Networks
Applications of Neural Networks
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and Python
 
Dario izzo - Machine Learning methods and space engineering
Dario izzo - Machine Learning methods and space engineeringDario izzo - Machine Learning methods and space engineering
Dario izzo - Machine Learning methods and space engineering
 
Mask R-CNN
Mask R-CNNMask R-CNN
Mask R-CNN
 
Object extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learningObject extraction from satellite imagery using deep learning
Object extraction from satellite imagery using deep learning
 
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会
Semi-Supervised Classification with Graph Convolutional Networks @ICLR2017読み会
 
Cross-domain complementary learning with synthetic data for multi-person part...
Cross-domain complementary learning with synthetic data for multi-person part...Cross-domain complementary learning with synthetic data for multi-person part...
Cross-domain complementary learning with synthetic data for multi-person part...
 
Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and Chemistry
 
Going Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFGoing Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCF
 
2019 cvpr paper_overview
2019 cvpr paper_overview2019 cvpr paper_overview
2019 cvpr paper_overview
 
How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?
 
Image Classification Done Simply using Keras and TensorFlow
Image Classification Done Simply using Keras and TensorFlow Image Classification Done Simply using Keras and TensorFlow
Image Classification Done Simply using Keras and TensorFlow
 
Adaptive object detection using adjacency and zoom prediction
Adaptive object detection using adjacency and zoom predictionAdaptive object detection using adjacency and zoom prediction
Adaptive object detection using adjacency and zoom prediction
 
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
 
Step zhedong
Step zhedongStep zhedong
Step zhedong
 
Fine tuning a convolutional network for cultural event recognition
Fine tuning a convolutional network for cultural event recognitionFine tuning a convolutional network for cultural event recognition
Fine tuning a convolutional network for cultural event recognition
 
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
 

Ähnlich wie Intelligent Multimedia Recommendation

The deep learning technology on coco framework full report
The deep learning technology on coco framework full reportThe deep learning technology on coco framework full report
The deep learning technology on coco framework full reportJIEMS Akkalkuwa
 
Using Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesUsing Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesHJ van Veen
 
deep_stereo_arxiv_2015
deep_stereo_arxiv_2015deep_stereo_arxiv_2015
deep_stereo_arxiv_2015Ivan Neulander
 
Colour Object Recognition using Biologically Inspired Model
Colour Object Recognition using Biologically Inspired ModelColour Object Recognition using Biologically Inspired Model
Colour Object Recognition using Biologically Inspired Modelijsrd.com
 
Report face recognition : ArganRecogn
Report face recognition :  ArganRecognReport face recognition :  ArganRecogn
Report face recognition : ArganRecognIlyas CHAOUA
 
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...CSCJournals
 
Survey on Multiple Query Content Based Image Retrieval Systems
Survey on Multiple Query Content Based Image Retrieval SystemsSurvey on Multiple Query Content Based Image Retrieval Systems
Survey on Multiple Query Content Based Image Retrieval SystemsCSCJournals
 
ProjectReport_TeamTatin
ProjectReport_TeamTatinProjectReport_TeamTatin
ProjectReport_TeamTatinUrjit Patel
 
Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...
Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...
Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...csandit
 
EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...
EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...
EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...cscpconf
 
Deep learning in Computer Vision
Deep learning in Computer VisionDeep learning in Computer Vision
Deep learning in Computer VisionDavid Dao
 
Image Retrieval using Equalized Histogram Image Bins Moments
Image Retrieval using Equalized Histogram Image Bins MomentsImage Retrieval using Equalized Histogram Image Bins Moments
Image Retrieval using Equalized Histogram Image Bins MomentsIDES Editor
 
IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
 IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2 IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2IRJET Journal
 
Multimedia Databases: Performance Measure Benchmarking Model (PMBM) Framework
Multimedia Databases: Performance Measure Benchmarking Model (PMBM) FrameworkMultimedia Databases: Performance Measure Benchmarking Model (PMBM) Framework
Multimedia Databases: Performance Measure Benchmarking Model (PMBM) Frameworktheijes
 
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...Universitat Politècnica de Catalunya
 
IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...
IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...
IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...IRJET Journal
 
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION cscpconf
 
Web Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual DictionaryWeb Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual Dictionaryijwscjournal
 
Web Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual DictionaryWeb Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual Dictionaryijwscjournal
 
IRJET- Saliency based Image Co-Segmentation
IRJET- Saliency based Image Co-SegmentationIRJET- Saliency based Image Co-Segmentation
IRJET- Saliency based Image Co-SegmentationIRJET Journal
 

Ähnlich wie Intelligent Multimedia Recommendation (20)

The deep learning technology on coco framework full report
The deep learning technology on coco framework full reportThe deep learning technology on coco framework full report
The deep learning technology on coco framework full report
 
Using Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesUsing Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar Dresses
 
deep_stereo_arxiv_2015
deep_stereo_arxiv_2015deep_stereo_arxiv_2015
deep_stereo_arxiv_2015
 
Colour Object Recognition using Biologically Inspired Model
Colour Object Recognition using Biologically Inspired ModelColour Object Recognition using Biologically Inspired Model
Colour Object Recognition using Biologically Inspired Model
 
Report face recognition : ArganRecogn
Report face recognition :  ArganRecognReport face recognition :  ArganRecogn
Report face recognition : ArganRecogn
 
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
 
Survey on Multiple Query Content Based Image Retrieval Systems
Survey on Multiple Query Content Based Image Retrieval SystemsSurvey on Multiple Query Content Based Image Retrieval Systems
Survey on Multiple Query Content Based Image Retrieval Systems
 
ProjectReport_TeamTatin
ProjectReport_TeamTatinProjectReport_TeamTatin
ProjectReport_TeamTatin
 
Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...
Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...
Efficient Approach for Content Based Image Retrieval Using Multiple SVM in YA...
 
EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...
EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...
EFFICIENT APPROACH FOR CONTENT BASED IMAGE RETRIEVAL USING MULTIPLE SVM IN YA...
 
Deep learning in Computer Vision
Deep learning in Computer VisionDeep learning in Computer Vision
Deep learning in Computer Vision
 
Image Retrieval using Equalized Histogram Image Bins Moments
Image Retrieval using Equalized Histogram Image Bins MomentsImage Retrieval using Equalized Histogram Image Bins Moments
Image Retrieval using Equalized Histogram Image Bins Moments
 
IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
 IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2 IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
IRJET - Explicit Content Detection using Faster R-CNN and SSD Mobilenet V2
 
Multimedia Databases: Performance Measure Benchmarking Model (PMBM) Framework
Multimedia Databases: Performance Measure Benchmarking Model (PMBM) FrameworkMultimedia Databases: Performance Measure Benchmarking Model (PMBM) Framework
Multimedia Databases: Performance Measure Benchmarking Model (PMBM) Framework
 
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
Generative Models and Adversarial Training (D2L3 Insight@DCU Machine Learning...
 
IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...
IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...
IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...
 
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
CNN FEATURES ARE ALSO GREAT AT UNSUPERVISED CLASSIFICATION
 
Web Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual DictionaryWeb Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual Dictionary
 
Web Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual DictionaryWeb Image Retrieval Using Visual Dictionary
Web Image Retrieval Using Visual Dictionary
 
IRJET- Saliency based Image Co-Segmentation
IRJET- Saliency based Image Co-SegmentationIRJET- Saliency based Image Co-Segmentation
IRJET- Saliency based Image Co-Segmentation
 

Mehr von Wanjin Yu

Architecture Design for Deep Neural Networks II
Architecture Design for Deep Neural Networks IIArchitecture Design for Deep Neural Networks II
Architecture Design for Deep Neural Networks IIWanjin Yu
 
Causally regularized machine learning
Causally regularized machine learningCausally regularized machine learning
Causally regularized machine learningWanjin Yu
 
Object Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIObject Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIWanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Wanjin Yu
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Wanjin Yu
 
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Wanjin Yu
 

Mehr von Wanjin Yu (9)

Architecture Design for Deep Neural Networks II
Architecture Design for Deep Neural Networks IIArchitecture Design for Deep Neural Networks II
Architecture Design for Deep Neural Networks II
 
Causally regularized machine learning
Causally regularized machine learningCausally regularized machine learning
Causally regularized machine learning
 
Object Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIObject Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet III
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
Human Behavior Understanding: From Human-Oriented Analysis to Action Recognit...
 
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning Big Data Intelligence: from Correlation Discovery to Causal Reasoning
Big Data Intelligence: from Correlation Discovery to Causal Reasoning
 

Kürzlich hochgeladen

Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...singhpriety023
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$kojalkojal131
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGAPNIC
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024APNIC
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...SofiyaSharma5
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Servicesexy call girls service in goa
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Servicegwenoracqe6
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024APNIC
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Sheetaleventcompany
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...APNIC
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxellan12
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)Damian Radcliffe
 
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.CarlotaBedoya1
 
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night StandHot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Standkumarajju5765
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 

Kürzlich hochgeladen (20)

Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
 
Networking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOGNetworking in the Penumbra presented by Geoff Huston at NZNOG
Networking in the Penumbra presented by Geoff Huston at NZNOG
 
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
Russian Call Girls in %(+971524965298  )#  Call Girls in DubaiRussian Call Girls in %(+971524965298  )#  Call Girls in Dubai
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)
 
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
INDIVIDUAL ASSIGNMENT #3 CBG, PRESENTATION.
 
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
 
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night StandHot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
 

Intelligent Multimedia Recommendation

  • 1. Towards Automatic Construction of Diverse, High-quality Image Dataset Jian Zhang Multimedia and Data Analytics Lab University of Technology Sydney
  • 2. Outlines ➢ Introduction ➢ Related works ➢ Research challenges ➢ Motivations and solutions ➢ Experimental results ➢ Applications 2
  • 3. Introduction  Image dataset construction: is the process of collecting relevant images for the given/target query.  Significances: 1. Labeled image datasets are the backbone for high- level image understanding tasks; 2. Continuously drive and evaluate the progress of feature designing and supervised learning models. 3
  • 4. Related Works  Manual labelling based methods  Active learning based methods  Automatic learning based methods 4
  • 5. Manual labelling based methods Related works:  LabelMe [Sinclair et al. 1990]  Pascal VOC series [Everingham et al. 2007]  ImageNet [Deng et al. 2009]  CIFAR-10/100 [Krizhevsky et al. 2009] Advantages: High accuracy Disadvantages:  Time consuming and labor intensive  Scalability is limited  Diversity is limited 5
  • 6. Active learning based methods Related works:  Towards scalable dataset construction: An active learning approach. [Collins et al. 2008]  Large-scale live active learning: Training object detectors with crawled data and crowds. [Grauman et al. 2014] Disadvantages:  Scalability is limited  Diversity is limited Advantages:  The cost of manual labelling has some decrease.  Relatively high accuracy. 6
  • 7. Automatic learning based methods Related works:  Optimol: automatic online picture collection via incremental model learning. [Li et al. 2010]  Harvesting: harvesting image databases from the web. [Schroff et al. 2011]  Prajna: Towards recognizing whatever you want from images without image labeling. [Hua et al. 2015] Disadvantages:  Low accuracy  Limited diversity Advantages: Scalability is no longer a problem 7
  • 8. Research challenges Our work focus on automatic learning based methods. The major challenges include:  Visual polysemy  Limited diversity  Low accuracy 8
  • 9. Research challenge: Visual Polysemy Fig. 1: Visual polysemy. For example, the query “mouse” returns multiple visual senses on the first page of results. The retrieved web images suffer from the low precision of any particular visual sense. 9
  • 10. Research challenge: Limited Diversity Fig. 2: Images for “airplane” from four different datasets. Each dataset has their preference for image selection. 10
  • 11. Research challenge: Low Accuracy Fig. 3: Due to the error indexing of image search engine, even we retrieve the sense-specific images, some instance-level noise may also be included. The noisy images are marked with red bounding boxes. 11
  • 12. Motivations Fig. 4: The average accuracy of top 500 images in Google Image Search, Web Search and Flickr Search for 10 queries. 12
  • 13. Motivations  The first few images returned from image search engine tend to have a relatively high accuracy.  Image search engine restricts the number of returned images for each query.  The diversity of the selected images not only depends on the selection mechanisms, but also relies on the diversity of the initial candidate images. 13
  • 14. Solutions Query Noisy images pruning Selected images Collected candidate images Query Pruning noisy images Selected images Collected candidate images Discovering candidate textual metadata Pruning noisy textual metadata Traditional solution: Our solution: 14
  • 15. Solutions Noisy textual metadata: Fig. 5: A snapshot of the retrieved images for visual non-salient and less relevant textual metadata. 15
  • 16. Solutions Our solution: Fig. 6: Illustration of the process for obtaining multiple textual metadata. The input is a textual query that we would like to find multiple textual metadata for. The output is a set of selected textual metadata which will be used for raw image dataset construction. 16
  • 17. Solutions Why:  Solving Visual Polysemy by Allowing Sense-specific Diversity  Improving Scalability of Image Search Engine  Increasing Diversity of the Initial Candidate Images 17
  • 18. Solutions Fig.7: Illustration of the process for obtaining selected images. The input is a set of selected textual metadata. Artificial images, inter-class noisy images, and intra-class noisy images are marked with red, green and blue bounding boxes separately. The output is a group of selected images in which the images corresponding to different textual metadata. 18
  • 19. Solutions Why:  Improving the Accuracy of the Selected Images  Ensuring the Diversity of the Selected Images 19
  • 20. Experiments Polysemy:  Multiple Visual Senses Discovering  Classifying Sense-specific Images  Re-ranking Search Results Accuracy:  Image Dataset WSID-100 Introduction  Image Classification Ability Comparison Diversity:  Cross-dataset Generalization Ability Comparison 20
  • 21. Polysemy: Multiple Visual Senses Discovering Fig. 8: Examples of multiple visual senses discovered by our proposed approach. For example, our approach automatically discovers and distinguishes four senses for “Note”: notes, galaxy note, note tablet and music note. For “Bass”, it discovers multiple visual senses of: bass fish, bass guitar, bass amp, and Mr./Mrs. Bass etc. 21
  • 22. Polysemy: Classifying Sense-specific Images Fig. 9: The detailed performance comparison of classification accuracy over 30 categories on the CMU-Poly-30 dataset 22
  • 23. Polysemy: Classifying Sense-specific Images Fig. 10: The detailed performance comparison of classification accuracy over 5 categories on the MIT-ISD dataset. Tab. 1: The average performance comparison of classification accuracy on the CMU-Poly-30 and MIT-ISD dataset. 23
  • 24. Polysemy: Re-ranking Search Results Tab. 2: Web images for polysemy terms were annotated manually. For each term, the number of annotated images, the semantic senses, the visual senses and their distributions are provided, with core semantic senses marked in boldface. 24
  • 25. Polysemy: Re-ranking Search Results Tab. 3: Area Under Curve (AUC) of all senses for “bass” and “mouse”. 25
  • 26. Accuracy: Image Dataset WSID-100 Introduction WSID-100: Web-supervised Image Dataset 100 Categories http://www.multimediauts.org/dataset/WSID-100.html 26
  • 27. Accuracy: Image Classification Ability Comparison Fig. 11: The image classification accuracy (%) comparison over 14 categories on the PASCAL VOC 2007 dataset. Fig. 12: The image classification accuracy (%) comparison over 6 categories on the PASCAL VOC 2007 dataset. Tab. 4: The average accuracy (%) comparison over 14 and 6 common categories on the PASCAL VOC 2007 dataset. 27
  • 28. Diversity: Cross-dataset Generalization Ability Comparison Fig. 13: The cross-dataset generalization ability of various datasets by using a varying number of training images, and tested on (a) ImageNet, (b) Optimol, (c) Harvesting, (d) DRID-20, (e) Ours, (f) Average. 28
  • 29. Application: Fine-grained Visual Categorization  Fine-grained visual recognition: aims to distinguish between subordinate categories such as different birds, flowers, food, car, etc.  Large variance Little variance Fig. 14: Illustration of the two characteristics that fine-grained subcategories have: large variance in the same subcategory as shown in the first line, and small variance among different subcategories as shown in the second line. Images in (a) birds and (b) cars are from CUB-200- 2011 and Cars-196 datasets, respectively. 29
  • 30.  Challenges: 1) Large variance in the same subcategory; 2) Small variance among different subcategories.  Solutions: 1) Strongly supervised methods; 2) Weakly supervised methods; 3) Webly supervised methods. 30
  • 31. Strongly supervised methods: require manually labeled bounding boxes or part annotations during training. Fig. 15: Bounding boxes and part annotations for fine-grained visual categorization. 31
  • 32. Weakly supervised methods: require image-level labels during training. Fig. 15: Image-level labeling for fine-grained visual categorization. 32
  • 33. Webly supervised methods: require no need for human intervention during training. Fig. 16: A snapshot of retrieved web images for “Bobolink”. Label noise is marked in red bounding boxes and blue bounding boxes. 33
  • 34. Our work focus on webly supervised based methods. The major problems include:  Label noise (low accuracy)  Domain mismatch (limited diversity) 34
  • 35. Fig. 17: Label noise and domain mismatch problems in the web data for fine-grained categorization. Due to incorrect labeling, the web images collected through query “Fish Crow” may contain noise (e.g., the “Purple Finch” image marked with red bounding box). Directly leverage the web images without noise removal may lead to classifying the test image “Purple Finch” as “Fish Crow”. In addition, the collected web images may contain multiple domains (e.g., natural, sketch, and cartoon images). Due to the domain distribution between the test image “Bobolink” and the collected sketch domain web images for “Fish Crow”, although the test image “Bobolink” has no noise in the collected web images for “Fish Crow”, directly leverage the web images without domain mismatch alleviation may also lead to classifying the test image “Bobolink” as “Fish Crow”. 35
  • 36. Fig. 18: The architecture of our proposed DDN model. The input of our model is a set of “bags”, which consists of multiple images (“instances”). The bag of images is first fed into the backbone net (e.g., VGG16) to generate the feature map. Then the feature map goes into a fully connected layer followed by a ReLU, thus generating a feature vector N. This feature vector is subsequently sent into two branches. The upper branch is our proposed Attention Block, which consists of a fully connected layer followed by a tanh layer and another fully connected layer. After the Attention Block, we obtain the instance probability vector A. In the bottom branch, the feature vector N is multiplied by the output of the Attention Block A and then fed into a fully connected layer with a Sigmoid layer to calculate the probability of the bag being positive. Finally, the bottom branch adopts the Negative Bernoulli Log Loss, while the upper branch leverages our proposed Attentive Focal Loss. The weighted sum of the two branch losses is leveraged as the final loss function for DDN. 36
  • 37. Fig. 19: The architecture of instance-level model. Feature vector N is subsequently sent into two branches. The upper branch consists of a fully connected layer followed by a tanh layer and another fully connected layer. After the Attention Block, we obtain the instance probability vector. Finally, the upper branch leverages Attentive Focal Loss. 37
  • 38. Fig. 18: The architecture of bag-level model. Feature vector N is subsequently sent into two branches. In bottom branch, the feature vector N is multiplied by the output of the Attention Block and then fed into a fully connected layer with a Sigmoid layer to calculate the probability of the bag being positive. Finally, the bottom branch adopts the Negative Bernoulli Log Loss. 38
  • 39. Tab. 5: Fine-grained ACA (%) results on CUB200 and Sandford dogs. Tr-B/P means bounding box or part annotation required in the training stage. Data means the training data is manually labeled (anno.) or collected from the web (web). The best result is marked in red, the second best in blue, and the third best in bold. 39
  • 40. Publications ▪ Yazhou Yao, Jian Zhang, Fumin Shen, Li Liu, Fan Zhu, Dongxiang Zhang, and Heng-Tao Shen. "Towards Automatic Construction of Diverse, High-quality Image Dataset", IEEE Transactions on Knowledge and Data Engineering (TKDE), 2019. ▪ Yazhou Yao, Zeren Sun, Fumin Shen, Li Liu, Limin Wang, Fan Zhu, Lizhong Ding, Gangshan Wu, Ling Shao. "Dynamically Visual Disambiguation of Keyword-based Image Search", International Joint Conference on Artificial Intelligence (IJCAI), 2019. ▪ Yazhou Yao, Fumin Shen, Jian Zhang, Li Liu, Zhenmin Tang and Ling Shao. "Extracting Privileged Information for Enhancing Classifier Learning", IEEE Transactions on Image Processing (TIP), 2018. ▪ Yazhou Yao, Fumin Shen, Jian Zhang, Li Liu, Zhenmin Tang and Ling Shao. " Extracting Multiple Visual Senses for Web Learning ", IEEE Transactions on Multimedia (TMM), 2018. ▪ Yazhou Yao, Jian Zhang, Fumin Shen, Wankou Yang, Xiansheng Hua and Zhenmin Tang. “Extracting Privileged Information from Untagged Corpora for Classifier Learning”, International Joint Conference on Artificial Intelligence (IJCAI), 2018. ▪ Yazhou Yao, Jian Zhang, Fumin Shen, Wankou Yang, Pu Huang and Zhenmin Tang. “Discovering and Distinguishing Multiple Visual Senses for Polysemous Words”, AAAI Conference on Artificial Intelligence (AAAI), 2018. ▪ Yazhou Yao, Jian Zhang, Fumin Shen, Xian-Sheng Hua, Jingsong Xu and Zhenmin Tang. "Exploiting Web Images for Dataset Construction: A Domain Robust Approach", IEEE Transactions on Multimedia (TMM), 2017. ▪ Yazhou Yao, Xiansheng Hua, Fumin Shen, Jian Zhang and Zhenmin Tang. "A Domain Robust Approach for Image Dataset Construction ", ACM International Conference on Multimedia (ACM MM), 2016. ▪ Yazhou Yao, Jian Zhang, Fumin Shen, Xian-Sheng Hua, Jingsong Xu and Zhenmin Tang. "Automatic Image Dataset Construction with Multiple Textual Metadata ", IEEE Conference on Multimedia and Expo (ICME), 2016. 40