Talk on visual indexing of big media archives using Convolutional Neural Networks on different High-Performance Computing platforms, developing new parameterization schemes. Talk and poster were presented at the International Supercomputing Conference 2015 (Frankfurt am Main, Germany)
Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieval (CBVIR) of Massive Media Archives
1. Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieval (CBVIR) of Massive Media Archives
Christian Kehl and Ana Lucia Varbanescu
University of Amsterdam – Informatics Institute
3. Contributions
• A concept for parameterizing Visual Object classifiers, based on data sampling strategies
• An initial, small-scale study of accelerator performance impact using different sampling schemes
• Side contribution: a prototypical implementation of sampling-based CNN parameterization
4. Approach - Theory
• 2 key ideas:
– use a pre-trained network with successive refinement (see the sketch after this slide)
– even untrained, randomly-initialised models can be used [Jarrett2009]
– more re-training -> longer computation, higher accuracy
– CNNs learn “from the [input] data” => controlling the data means controlling the model [Zeiler2014]
– CNN parameter tuning: demands knowledge of the training process
– CNN input data “tuning”: demands knowledge of the data (e.g. images)
Figure: filter responses [Jarrett2009]
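A minimal sketch of the refinement idea (hypothetical, numpy-only, not the pylearn2 prototype): the pre-trained convolutional layers are kept frozen as a fixed feature extractor (here mocked by a fixed random projection, in the spirit of [Jarrett2009]), and only the final softmax layer is re-trained on new samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pre-trained" feature extractor: a fixed random projection
# stands in for the convolutional layers ([Jarrett2009]: even random
# filters yield usable responses).
W_frozen = rng.standard_normal((3072, 256))

def extract_features(images):
    """images: (n, 3072) flattened 32x32x3 inputs -> ReLU responses."""
    return np.maximum(images @ W_frozen, 0.0)

def retrain_last_layer(images, labels, n_classes, epochs=50, lr=0.1):
    """Refinement: re-train only the softmax output layer."""
    feats = extract_features(images)
    W = np.zeros((feats.shape[1], n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * feats.T @ (p - onehot) / len(labels)  # gradient step
    return W

# toy data: 100 random "images", 10 tags
X = rng.standard_normal((100, 3072))
y = rng.integers(0, 10, size=100)
W_out = retrain_last_layer(X, y, n_classes=10)
```

Full retraining would instead update all layers, trading longer computation for (potentially) higher accuracy, which is exactly the trade-off studied in the experiments.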
5. Approach - Practice
Visual Object Classification:
- CNN
- Bag of Words
=> White-Box Model; interchangeable
Images (database) and tags (thesaurus) are sampled according to operator input. The VOC white box takes the training sample to generate a classification model. After passing quality checks, the archive is re-classified (a cheap operation) to generate a score matrix for querying (see the sketch below).
Figure: state chart for indexing, showing data (boxes), states (upper ellipses) and transition functions (lower ellipses)
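The loop above can be sketched in a few lines of Python (hypothetical names; the interchangeable white-box classifier is reduced to a nearest-centroid model so the sketch stays self-contained and runnable):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_training_set(archive, tags, images_per_tag):
    """Operator-controlled sampling: pick N images per tag."""
    idx = [i for t in np.unique(tags)
           for i in rng.choice(np.flatnonzero(tags == t),
                               images_per_tag, replace=False)]
    return archive[idx], tags[idx]

def train_white_box(X, y):
    """Interchangeable VOC block; a nearest-centroid stand-in here."""
    return {t: X[y == t].mean(axis=0) for t in np.unique(y)}

def classify_archive(model, archive):
    """Cheap re-classification: returns a score matrix (images x tags)."""
    centroids = np.stack([model[t] for t in sorted(model)])
    dists = np.linalg.norm(archive[:, None, :] - centroids[None], axis=2)
    return -dists  # higher score = closer match

archive = rng.standard_normal((500, 64))  # toy image features
tags = rng.integers(0, 5, size=500)       # toy thesaurus tags

X, y = sample_training_set(archive, tags, images_per_tag=20)
model = train_white_box(X, y)
scores = classify_archive(model, archive)  # re-run after quality checks
```

The key point is that `images_per_tag` (the sample rate) acts as a model parameter even though it never touches the classifier's internals.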
6. Experiments
• VOC block: ConvNet [Krizhevsky2012]
• Prototype with model pre-training in pylearn2
• Focus on influences on precision:
– sample rate (images/tag)
– tag generalisation (sketch below)
– full retraining vs. pre-trained refinement (last-layer retraining)
• Hardware:
– Intel Xeon E5-1650 v3 in SMP (8 PEs used)
– dedicated graphics adapter (Quadro K4200)
– Intel E5-2620 + 1x NVIDIA GTX680
– Intel E5-2620 + 1x NVIDIA Tesla C2050
– Intel E5-2620 in SMP (16 PEs used)
• Software:
– pylearn2 + hardware-accelerated numpy
• Datasets:
– starting point: CIFAR-10 – 10 tags; 50,000 training images; 10,000 test images
– source of additional content: CIFAR-100 – 100 classes; 50,000 training images; 10,000 test images
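"Tag generalisation" here means merging fine-grained tags into coarser ones before training. A minimal sketch (the mapping below is hypothetical, in the style of CIFAR-100's fine-to-coarse superclasses):

```python
import numpy as np

# Hypothetical fine-to-coarse mapping, in the style of CIFAR-100,
# where 100 fine classes collapse onto 20 superclasses.
FINE_TO_COARSE = {
    "maple": "tree", "oak": "tree",
    "bicycle": "vehicle", "bus": "vehicle",
}

def generalise_tags(tags):
    """Replace each fine tag by its coarser superclass tag;
    unknown tags pass through unchanged."""
    return np.array([FINE_TO_COARSE.get(t, t) for t in tags])

fine = np.array(["maple", "bus", "oak", "bicycle"])
print(generalise_tags(fine))  # ['tree' 'vehicle' 'tree' 'vehicle']
```

Generalising tags increases the number of training images per (coarser) tag, which interacts directly with the sample-rate parameter above.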
8. Discussion
• Full model recomputation and last-layer re-training yield comparable precision
• re-training is a valid alternative for dynamic data updates
• the “tag samples : # images” ratio has a visible impact on precision => input sampling does parameterize the model
• Tag generalisation improves precision
• CNN computation times benefit more from accelerator usage than from algorithmic tuning (for small examples)
• technical note: the NVIDIA Quadro K4200 was not utilized by pylearn2 as an accelerator
9. Upcoming Research
• Experiments with ILSVRC 2010 + ImageNet (reduced overfitting)
• use distributed workflow environments (WS-VLAM)
• scaling on network nodes
• impact of retraining at different NN layers
• research how user feedback can control classification
• improve score matrix evaluation on query for growing archives (sparse matrix access patterns; see the sketch below)
• research “steering via sample”
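One possible access pattern for the growing-archive case (a sketch using scipy.sparse, not the project's actual implementation): keep the score matrix in CSC form so per-tag queries slice single columns cheaply, and append newly classified images as extra rows.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(2)

# Score matrix: rows = archive images, columns = tags; most scores
# fall below a relevance threshold and are stored as explicit zeros.
dense = rng.random((1000, 50))
dense[dense < 0.9] = 0.0
scores = sp.csc_matrix(dense)

def query_tag(scores, tag_idx, top_k=5):
    """Per-tag query: CSC makes single-column slicing cheap."""
    col = scores.getcol(tag_idx).toarray().ravel()
    return np.argsort(col)[::-1][:top_k]  # best-scoring image indices

# Growing archive: newly classified images append as sparse rows.
new_dense = rng.random((10, 50))
new_dense[new_dense < 0.9] = 0.0
scores = sp.vstack([scores, sp.csc_matrix(new_dense)]).tocsc()

print(query_tag(scores, tag_idx=3))
```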
10. Acknowledgements
• Lorentz Centrum, for the project initialization
• NWO and STW, for the KIEM project grant
• Roeland Ordelman and “Beeld en Geluid” (NISV), for the close project collaboration
• Adam Belloum and Thomas Mensink (UvA – IvI), for the discussions on Big Data workflow packages and Visual Indexing strategies
References:
[Jarrett2009] K. Jarrett, K. Kavukcuoglu, M. Ranzato and Y. LeCun, “What is the best multi-stage architecture for object recognition?”, in IEEE 12th International Conference on Computer Vision, 2009, pp. 2146-2153
[Zeiler2014] M.D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks”, in Computer Vision – ECCV 2014, Springer, 2014, pp. 818-833
[Krizhevsky2012] A. Krizhevsky, I. Sutskever, and G.E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105
Editor's Notes
entertainment industry-driven challenge: indexing and retrieval of images and videos from fast-growing, large media archives
our focus: an accelerated, semi-automatic, adaptive process for indexing image collections
Visual Object Recognition preferably as an exchangeable black box (approach for flexibility, APPROACH/METHOD SECTION)
Research Focus (HPC):
How to balance computing time demands for archive indexing against index quality (accuracy) within this process? [speed-accuracy tradeoff]
How to scale the process with flexibly available computing resources (accelerators, computing nodes in the cloud)? [speed-cost tradeoff]
Additionally: growing, non-static media archives differ significantly from the common VOC application scenario (a static data pool)
The presented poster mainly targets challenge 1