1. Automatic Image Annotation and
Retrieval Using the Joint Composite
Descriptor
Konstantinos Zagoris, Savvas A. Chatzichristofis, Nikos
Papamarkos and Yiannis S. Boutalis
Department of Electrical & Computer Engineering
Democritus University of Thrace, Xanthi, Greece
kzagoris@ee.duth.gr
2. Problem Definition
Today, capable tools are needed in order to
successfully search and retrieve images.
Content-Based Image Retrieval systems such as
img(Anaktisi) and img(Rummager) employ
low-level image features such as color, texture
and shape in order to locate similar images.
Although these approaches are successful,
they lack the ability to incorporate human
perception into the query.
3. Proposed Technique
A new image annotation technique
A keyword-based image retrieval system
Employs the Joint Composite Descriptor
Utilizes two sets of keywords
One set consists of color keywords
The other set consists of words
Queries can be specified more naturally
by the user
6. Joint Composite Descriptor
Belongs to the family of Compact
Composite Descriptors
Captures more than one feature at a time,
in a very compact representation
A global descriptor (global low-level
image features)
Fusion of the Color and Edge Directivity
Descriptor (CEDD) and the Fuzzy Color
and Texture Histogram (FCTH)
7. CEDD and FCTH Descriptors
The CEDD length is 54 bytes per image while
FCTH length is 72 bytes per image.
The structure of these descriptors consists of
several texture areas (6 for CEDD, 8 for FCTH).
In particular, each texture area is separated
into 24 sub-regions, with each sub-region
describing a color.
CEDD and FCTH use the same color
information, as it results from two fuzzy systems
that map the colors of the image to a 24-color
custom palette.
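The byte counts above follow directly from this layout, assuming the usual texture-area counts for these descriptors (6 for CEDD, 8 for FCTH) and 3-bit quantization per bin; a quick sketch:

```python
# Descriptor sizes from the bin layout: texture areas x 24 color bins,
# each bin quantized to 3 bits (counts assumed, per the usual CEDD/FCTH setup).
def descriptor_bytes(texture_areas, colors=24, bits_per_bin=3):
    total_bits = texture_areas * colors * bits_per_bin
    return total_bits // 8

cedd_bytes = descriptor_bytes(6)   # CEDD: 6 texture areas
fcth_bytes = descriptor_bytes(8)   # FCTH: 8 texture areas
print(cedd_bytes, fcth_bytes)      # 54 72
```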
9. The color information given by the two descriptors
comes from the same fuzzy system.
CEDD uses a fuzzy version of the five digital
filters proposed by the MPEG-7 Edge
Histogram Descriptor.
FCTH uses the high frequency bands of the
Haar wavelet Transform in a fuzzy system.
The joining of the descriptors is therefore
accomplished by fusing the texture areas
carried by each descriptor.
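A minimal sketch of such a fusion, merging corresponding texture areas bin-by-bin. The area-to-area pairing below is only a placeholder (the exact mapping is the one defined for the JCD), and 7 joint areas / 168 bins is the size usually reported for the JCD:

```python
# Illustrative fusion of CEDD (6 areas x 24 colors) and FCTH (8 areas x 24
# colors): both share the same 24-color sub-regions, so texture areas that
# describe the same texture type can be merged bin-by-bin.
def fuse(cedd, fcth, pairing):
    # pairing: one (cedd_area or None, fcth_area or None) tuple per joint area.
    joint = []
    for ca, fa in pairing:
        for color in range(24):
            vals = []
            if ca is not None:
                vals.append(cedd[ca * 24 + color])
            if fa is not None:
                vals.append(fcth[fa * 24 + color])
            joint.append(sum(vals) / len(vals))  # average the shared bins
    return joint

cedd = [0.0] * (6 * 24)
fcth = [0.0] * (8 * 24)
# Placeholder pairing giving 7 joint areas (168 bins), NOT the paper's mapping.
pairing = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (None, 6)]
print(len(fuse(cedd, fcth, pairing)))  # 168
```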
10. Joint Composite Descriptor (JCD)
Each descriptor bin can be indexed as CEDD_j^n or FCTH_j^k,
where n and k denote the texture area and j the color
sub-region; the flat bin index is (texture area) × 24 + (color).
For example:
CEDD_5^2 = bin(2 × 24 + 5) = bin(53)
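The bin indexing in the example above can be written as a small helper (names are illustrative):

```python
def bin_index(texture_area, color, colors_per_area=24):
    """Map a (texture area, color sub-region) pair to a flat histogram bin."""
    return texture_area * colors_per_area + color

print(bin_index(2, 5))  # 53, matching the CEDD example above
```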
14. Portrait Similarity Grade (PSG)
Defines the connection of the image
depiction with the corresponding word
It is calculated by normalizing the decision
function of a trained Support Vector
Machine (SVM)
For each word, an SVM is trained using as
training samples the JCD values from a
small subset of the available image
database
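A minimal sketch of the per-keyword setup: one decision function per word over JCD vectors. The linear weights below are hand-picked stand-ins, not trained values; in practice each f(x) = w·x + b would come from training an SVM on labeled JCD samples:

```python
# One decision function per keyword; positive f(x) places the image on the
# keyword's side of the separating hyperplane. Weights here are hypothetical.
def make_decision_function(w, b):
    def f(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return f

keyword_svms = {
    "mountain": make_decision_function([0.5, -0.2, 0.1], -0.1),
    "beach":    make_decision_function([-0.3, 0.4, 0.2], 0.05),
}

jcd = [0.8, 0.1, 0.3]  # a toy 3-dimensional stand-in for a JCD vector
for word, f in keyword_svms.items():
    print(word, f(jcd))
```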
15. Support Vector Machines (SVMs)
Based on statistical learning theory
Separate the space in which the training
samples reside into two classes
A new sample is classified depending on
where in that space it resides
16. Portrait Similarity Grade (PSG)
In this work, the following equation determines the
membership value R(x) of a sample x to class 1,
where f(x) is the SVM decision function:

R(x) = 100 · max( 1/(1 + e^(−f(x)/3)) − 1/(1 + e^(f(x)/3)), 0 ),          if f(x) ≥ 0
R(x) = 100 · ( 1 − max( 1/(1 + e^(f(x)/3)) − 1/(1 + e^(−f(x)/3)), 0 ) ),  if f(x) < 0
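As a simplified stand-in for R(x), a single logistic curve squashes the decision value into [0, 100] with the same 1/3 exponent scaling (the slide's actual formula is piecewise with a max(·, 0) clamp):

```python
import math

def psg(f_x, scale=3.0):
    # Simplified stand-in for the slide's R(x): map the SVM decision value
    # to a 0-100 similarity grade with a logistic curve (exponent / 3).
    return 100.0 / (1.0 + math.exp(-f_x / scale))

print(psg(0.0))  # 50.0 at the decision boundary
```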
17. Keyword Annotation Image
Retrieval System
The user can employ a number of keywords from
both sets in order to describe the image.
For each keyword A, an initial classification position
R_A(l) is calculated based on the Manhattan distance
and its PSG or its CSG.
Then the final rank is calculated for each image l
based on:

Rank_A(l) = (N − R_A(l)) / N

where N is the total number of images in the database.
For multiple keywords, the ranks for each keyword
are summed:

Rank(l) = Rank_A(l) + Rank_B(l) + Rank_C(l) + ⋯
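The ranking rule can be sketched directly; the positions and database size below are illustrative:

```python
def rank_for_keyword(position, n_images):
    # Rank_A(l) = (N - R_A(l)) / N : position 0 (best match) scores 1.0.
    return (n_images - position) / n_images

def total_rank(positions, n_images):
    # For several keywords, the per-keyword ranks are summed.
    return sum(rank_for_keyword(p, n_images) for p in positions)

N = 1000  # hypothetical database size
print(rank_for_keyword(0, N))        # 1.0 for the top-ranked image
print(total_rank([0, 250, 500], N))  # 1.0 + 0.75 + 0.5 = 2.25
```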
20. Image Retrieval
[Figure: image retrieval results using the keyword "mountain"]
[Figure: image retrieval results using the JCD of the first image as the query]
21. Conclusions
An automatic image annotation method
that maps low-level image features to the
high-level concepts humans employ.
This method provides two distinct sets of
keywords: colors and words.
The proposed method has been implemented
and evaluated on the Wang and NISTER
databases.
The results demonstrate the method's
effectiveness as it retrieved results more
consistent with human perception.