Talk on visual indexing of big media archives using Convolutional Neural Networks on different High-Performance Computing platforms, developing new parameterization schemes. Talk and poster were presented at the International Supercomputing Conference 2015 (Frankfurt am Main, Germany)
Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieval (CBVIR) of Massive Media Archives
1. Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieval (CBVIR) of Massive Media Archives
Christian Kehl and Ana Lucia Varbanescu
University of Amsterdam – Informatics Institute
3. Contributions
• A concept for parameterizing Visual Object classifiers, based on data sampling strategies
• An initial, small-scale study of accelerator performance impact using different sampling schemes
• Side contribution: a prototypical implementation of sampling-based CNN parameterization
4. Approach - Theory
• 2 key ideas:
– use a pre-trained network with successive refinement (see the sketch after this slide)
– even untrained, randomly-initialised models can be used [Jarrett2009]
– more re-training -> longer computation, higher accuracy
– CNNs learn “from the [input] data” => controlling the data means controlling the model [Zeiler2014]
– CNN parameter tuning: demands knowledge of the training process
– CNN input data “tuning”: demands knowledge of the data (e.g. images)
Figure: filter responses [Jarrett2009]
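A minimal sketch of the refinement idea (hypothetical, numpy-only, not the pylearn2 prototype): the pre-trained convolutional layers are kept frozen as a fixed feature extractor (here mocked by a fixed random projection, in the spirit of [Jarrett2009]), and only the final softmax layer is re-trained on new samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pre-trained" feature extractor: a fixed random projection
# stands in for the convolutional layers ([Jarrett2009]: even random
# filters yield usable responses).
W_frozen = rng.standard_normal((3072, 256))

def extract_features(images):
    """images: (n, 3072) flattened 32x32x3 inputs -> ReLU responses."""
    return np.maximum(images @ W_frozen, 0.0)

def retrain_last_layer(images, labels, n_classes, epochs=50, lr=0.1):
    """Refinement: re-train only the softmax output layer."""
    feats = extract_features(images)
    W = np.zeros((feats.shape[1], n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * feats.T @ (p - onehot) / len(labels)  # gradient step
    return W

# toy data: 100 random "images", 10 tags
X = rng.standard_normal((100, 3072))
y = rng.integers(0, 10, size=100)
W_out = retrain_last_layer(X, y, n_classes=10)
```

Full retraining would instead update all layers, trading longer computation for (potentially) higher accuracy, which is exactly the trade-off studied in the experiments.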
5. Approach - Practice
Visual Object Classification:
- CNN
- Bag of Words
=> White-Box Model; interchangeable
Images (database) and tags (thesaurus) are sampled according to operator input. The VOC white box takes the training sample to generate a classification model. After passing quality checks, the archive is re-classified (a cheap operation) to generate a score matrix for querying (see the sketch below).
Figure: state chart for indexing, showing data (boxes), states (upper ellipses) and transition functions (lower ellipses)
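The loop above can be sketched in a few lines of Python (hypothetical names; the interchangeable white-box classifier is reduced to a nearest-centroid model so the sketch stays self-contained and runnable):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_training_set(archive, tags, images_per_tag):
    """Operator-controlled sampling: pick N images per tag."""
    idx = [i for t in np.unique(tags)
           for i in rng.choice(np.flatnonzero(tags == t),
                               images_per_tag, replace=False)]
    return archive[idx], tags[idx]

def train_white_box(X, y):
    """Interchangeable VOC block; a nearest-centroid stand-in here."""
    return {t: X[y == t].mean(axis=0) for t in np.unique(y)}

def classify_archive(model, archive):
    """Cheap re-classification: returns a score matrix (images x tags)."""
    centroids = np.stack([model[t] for t in sorted(model)])
    dists = np.linalg.norm(archive[:, None, :] - centroids[None], axis=2)
    return -dists  # higher score = closer match

archive = rng.standard_normal((500, 64))  # toy image features
tags = rng.integers(0, 5, size=500)       # toy thesaurus tags

X, y = sample_training_set(archive, tags, images_per_tag=20)
model = train_white_box(X, y)
scores = classify_archive(model, archive)  # re-run after quality checks
```

The key point is that `images_per_tag` (the sample rate) acts as a model parameter even though it never touches the classifier's internals.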
6. Experiments
• VOC block: ConvNet [Krizhevsky2012]
• Prototype with model pre-training in pylearn2
• Focus on influences on precision:
– sample rate (images/tag)
– tag generalisation (sketch below)
– full retraining vs. pre-trained refinement (last-layer retraining)
• Hardware:
– Intel Xeon E5-1650 v3 in SMP (8 PEs used)
– dedicated graphics adapter (Quadro K4200)
– Intel E5-2620 + 1x NVIDIA GTX680
– Intel E5-2620 + 1x NVIDIA Tesla C2050
– Intel E5-2620 in SMP (16 PEs used)
• Software:
– pylearn2 + hardware-accelerated numpy
• Datasets:
– starting point: CIFAR-10 – 10 tags; 50,000 training images; 10,000 test images
– source of additional content: CIFAR-100 – 100 classes; 50,000 training images; 10,000 test images
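"Tag generalisation" here means merging fine-grained tags into coarser ones before training. A minimal sketch (the mapping below is hypothetical, in the style of CIFAR-100's fine-to-coarse superclasses):

```python
import numpy as np

# Hypothetical fine-to-coarse mapping, in the style of CIFAR-100,
# where 100 fine classes collapse onto 20 superclasses.
FINE_TO_COARSE = {
    "maple": "tree", "oak": "tree",
    "bicycle": "vehicle", "bus": "vehicle",
}

def generalise_tags(tags):
    """Replace each fine tag by its coarser superclass tag;
    unknown tags pass through unchanged."""
    return np.array([FINE_TO_COARSE.get(t, t) for t in tags])

fine = np.array(["maple", "bus", "oak", "bicycle"])
print(generalise_tags(fine))  # ['tree' 'vehicle' 'tree' 'vehicle']
```

Generalising tags increases the number of training images per (coarser) tag, which interacts directly with the sample-rate parameter above.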
8. Discussion
• Full model recomputation and last-layer re-training yield comparable precision
• re-training is a valid alternative for dynamic data updates
• the “tag samples : # images” ratio has a visible impact on precision => input sampling does parameterize the model
• Tag generalisation improves precision
• CNN computation times benefit more from accelerator usage than from algorithmic tuning (for small examples)
• technical note: the NVIDIA Quadro K4200 was not utilized by pylearn2 as an accelerator
9. Upcoming Research
• Experiments with ILSVRC 2010 + ImageNet (reduced overfitting)
• use distributed workflow environments (WS-VLAM)
• scaling on network nodes
• impact of retraining at different NN layers
• research how user feedback can control classification
• improve score matrix evaluation on query for growing archives (sparse matrix access patterns; see the sketch below)
• research “steering via sample”
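One possible access pattern for the growing-archive case (a sketch using scipy.sparse, not the project's actual implementation): keep the score matrix in CSC form so per-tag queries slice single columns cheaply, and append newly classified images as extra rows.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(2)

# Score matrix: rows = archive images, columns = tags; most scores
# fall below a relevance threshold and are stored as explicit zeros.
dense = rng.random((1000, 50))
dense[dense < 0.9] = 0.0
scores = sp.csc_matrix(dense)

def query_tag(scores, tag_idx, top_k=5):
    """Per-tag query: CSC makes single-column slicing cheap."""
    col = scores.getcol(tag_idx).toarray().ravel()
    return np.argsort(col)[::-1][:top_k]  # best-scoring image indices

# Growing archive: newly classified images append as sparse rows.
new_dense = rng.random((10, 50))
new_dense[new_dense < 0.9] = 0.0
scores = sp.vstack([scores, sp.csc_matrix(new_dense)]).tocsc()

print(query_tag(scores, tag_idx=3))
```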
10. Acknowledgements
• Lorentz Centrum, for the project initialization
• NWO and STW, for the KIEM project grant
• Roeland Ordelman and “Beeld en Geluid” (NISV), for the close project collaboration
• Adam Belloum and Thomas Mensink (UvA – IvI), for the discussions on Big Data workflow packages and Visual Indexing strategies
References:
[Jarrett2009] K. Jarrett, K. Kavukcuoglu, M. Ranzato and Y. LeCun, “What is the best multi-stage architecture for object recognition?”, in IEEE 12th International Conference on Computer Vision, 2009, pp. 2146-2153
[Zeiler2014] M.D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks”, in Computer Vision – ECCV 2014, Springer, 2014, pp. 818-833
[Krizhevsky2012] A. Krizhevsky, I. Sutskever, and G.E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105
Editor's Notes
entertainment industry-driven challenge: indexing and retrieval of images and videos from fast-growing, large media archives
our focus: an accelerated, semi-automatic, adaptive process for indexing image collections
Visual Object Recognition preferably as an exchangeable black box (approach for flexibility, APPROACH/METHOD SECTION)
Research Focus (HPC):
How to balance computing time demands for archive indexing against index quality (accuracy) within this process? [speed-accuracy tradeoff]
How to scale the process with flexibly available computing resources (accelerators, computing nodes in the cloud)? [speed-cost tradeoff]
Additionally: growing, non-static media archives differ significantly from the common VOC application scenario (a static data pool)
The presented poster mainly targets challenge 1