UNICAMP-UFMG at MediaEval 2012: Genre Tagging Task
1. UNICAMP-UFMG at MediaEval 2012: Genre Tagging Task
Jurandy Almeida1, Thiago Salles2, Eder F. Martins2, Otávio A. B. Penatti1, Ricardo da S. Torres1, Marcos A. Gonçalves2, and Jussara M. Almeida2
1 RECOD Lab, Institute of Computing, University of Campinas (UNICAMP)
{jurandy.almeida, penatti, rtorres}@ic.unicamp.br
2 Dept. of Computer Science, Federal University of Minas Gerais (UFMG)
{tsalles, ederfm, mgoncalv, jussara}@dcc.ufmg.br
MediaEval’12 – Pisa, Italy – October 4-5, 2012
2. Genre Tagging Task at MediaEval 2012
Automatically assign tags to Internet videos collected from blip.tv.
The focus is on tags related to the category of the video.
Each video belongs to only one category.
3. Available Resources
Table: Resources made available for the task.
Resources   Textual                      Visual
Used        Title, Tags, Descriptions    Videos
Not Used    Tweets, ASR transcripts      Shots and Keyframes
4. The 26 Genre Tags
Art, Autos and Vehicles, Business, Citizen Journalism, Comedy, Conferences and Other Events, Default Category, Documentary, Educational, Food & Drink, Gaming, Health, Literature, Movies and Television, Music and Entertainment, Personal or Auto-biographical, Politics, Religion, School and Education, Sports, Technology, The Environment, The Mainstream Media, Travel, Videoblogging, Web Development and Sites
5. Distribution of Genres (Development Set)
Figure: Bar chart of the number of videos per genre in the development set (y-axis: # of objects, 0 to 800; x-axis: the 26 genre tags).
6. Proposed Framework
It relies on two learning strategies:
1. A meta-learning scheme (stacked generalization) for combining classifiers trained on metadata information modeled as a bag-of-words.
2. A histogram of motion patterns combined with a KNN classifier for visual information (a minimal sketch follows this list).
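
To make the visual branch concrete, here is a minimal sketch of a KNN classifier over precomputed HMP signatures. The histogram-intersection distance, k = 1, the toy signature size, and the placeholder data are illustrative assumptions, not details given in the slides.

# Sketch of the visual branch: KNN over precomputed HMP signatures.
# Distance choice, k = 1, and the toy data are assumptions.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def histogram_distance(a, b):
    # 1 - histogram intersection; assumes L1-normalized histograms.
    return 1.0 - np.minimum(a, b).sum()

rng = np.random.default_rng(0)
X_train = rng.random((100, 64))                # placeholder signatures
X_train /= X_train.sum(axis=1, keepdims=True)  # (real ones have 4,888 bins)
y_train = rng.integers(0, 26, size=100)        # one of the 26 genre tags

knn = KNeighborsClassifier(n_neighbors=1, metric=histogram_distance)
knn.fit(X_train, y_train)
print(knn.predict(X_train[:3]))                # predicted genre tags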
7. Visual Approach (HMP)
Keyframes: not used.
Applies an algorithm for comparing video sequences:
“Comparison of Video Sequences with Histograms of Motion Patterns”. J. Almeida, N. J. Leite, and R. da S. Torres. IEEE International Conference on Image Processing (ICIP), 2011.
It relies on three steps (a simplified sketch follows the figure on the next slide):
1. partial decoding;
2. feature extraction;
3. signature generation.
8. Visual Approach (HMP)
Figure: Overview of the HMP approach: (1) partial decoding extracts the DC coefficients of macroblocks from the I-frames; (2) feature extraction derives motion patterns from the time series of macroblocks (previous, current, next); (3) signature generation accumulates the motion patterns into a histogram.
[Almeida et al., Comparison of Video Sequences with Histograms of Motion Patterns. ICIP 2011]
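
Below is a heavily simplified sketch of the three steps, assuming already-decoded grayscale frames. The real method works in the compressed domain (partial decoding of macroblock DC coefficients), approximated here by block-averaging, and the 8-neighbour sign pattern is an illustrative stand-in for the actual motion patterns.

import numpy as np

def hmp_signature(frames, block=16):
    # Step 1 (approximation): reduce each frame to a grid of block means,
    # standing in for the DC coefficients obtained by partial decoding.
    grids = [f.reshape(f.shape[0] // block, block,
                       f.shape[1] // block, block).mean(axis=(1, 3))
             for f in frames]
    hist = np.zeros(2 ** 8)  # one bin per 8-bit pattern (toy size)
    for prev, curr in zip(grids, grids[1:]):
        diff = curr - prev   # temporal change per block
        # Step 2: a binary pattern per interior block from the signs of
        # its 8 neighbours' temporal differences (an illustrative choice).
        for i in range(1, diff.shape[0] - 1):
            for j in range(1, diff.shape[1] - 1):
                neigh = np.delete(diff[i - 1:i + 2, j - 1:j + 2].ravel(), 4)
                code = int((neigh > 0) @ (2 ** np.arange(8)))
                hist[code] += 1  # Step 3: accumulate into the signature
    return hist / max(hist.sum(), 1)

# Toy video: 10 random 64x64 grayscale frames.
frames = np.random.default_rng(0).random((10, 64, 64))
print(hmp_signature(list(frames)).shape)  # -> (256,)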
9. Textual Approach (BoW)
Title, tags, and description used as textual features.
Only metadata in English.
Porter stemming algorithm to remove affixes.
Stop words removed.
Bag-of-words using TF×IDF weights (a minimal sketch follows).
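
A minimal sketch of the textual pipeline: stop-word removal, Porter stemming, and a TF×IDF-weighted bag-of-words, using NLTK's Porter stemmer and scikit-learn's TfidfVectorizer. Concatenating title, tags, and description into one string per video is an assumption.

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

stemmer = PorterStemmer()

def tokenize(doc):
    # Lowercase, drop stop words, then strip affixes via Porter stemming.
    return [stemmer.stem(t) for t in doc.lower().split()
            if t not in ENGLISH_STOP_WORDS]

docs = [  # placeholder title + tags + description strings
    "cooking show italian food and drink recipes",
    "weekly commentary on politics and the mainstream media",
]
vectorizer = TfidfVectorizer(tokenizer=tokenize, lowercase=False)
X = vectorizer.fit_transform(docs)  # videos x stemmed-terms TF-IDF matrix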
10. The Classifiers
KNN for visual features.
Stacked generalization for textual features (a sketch follows the list below).
Level 0
◮ K-Nearest Neighbors (KNN): uses the k nearest training examples to assign a class to a test example.
◮ Naïve Bayes (NB): a probabilistic learning method that infers a model for each class and assigns to a test example the class associated with the most probable model that would have generated it.
◮ Random Forests (RF): a variation of bagging of decision trees, in which an ensemble of de-correlated decision trees is learned.
Level 1
◮ Support Vector Machine (SVM): a classifier that finds an optimal separating hyperplane between the positive and negative training documents, maximizing the distance (margin) to the closest points from either class.
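
A minimal sketch of the two-level stacked generalization above, using scikit-learn's StackingClassifier: KNN, Naïve Bayes, and Random Forests at level 0, a linear SVM at level 1. The hyperparameters and toy data are illustrative assumptions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

level0 = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("nb", MultinomialNB()),  # suits non-negative TF-IDF features
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]
stack = StackingClassifier(estimators=level0, final_estimator=LinearSVC(), cv=5)

rng = np.random.default_rng(0)
X = rng.random((60, 20))          # placeholder TF-IDF matrix
y = rng.integers(0, 3, size=60)   # placeholder genre labels
stack.fit(X, y)
print(stack.predict(X[:5]))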
12. Experimental Protocol
5-fold cross-validation (a minimal sketch follows this list).
Development set size: 5,288 videos (∼33%)
Test set size: 10,161 videos (∼67%)
Vocabulary of 54,796 terms
Mean of 31.03 terms per video
Total number of visual features: 4,888
Mean of 3,703.59 visual features per video
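
For completeness, a minimal sketch of 5-fold cross-validation on the development set; the classifier, default accuracy scoring, and toy data are illustrative assumptions.

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
X_dev = rng.random((200, 50))          # placeholder features (dev set)
y_dev = rng.integers(0, 5, size=200)   # placeholder genre labels

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(MultinomialNB(), X_dev, y_dev, cv=cv)
print(scores.mean())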
13. Experimental Results
Table: MAP for each submitted run.
Experiment   Input                                MAP
Run 1        audio/visual only, no ASR            0.1238
Run 4        everything allowed, no uploader ID   0.2112
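
For reference, a minimal sketch of how MAP could be computed from per-tag ranked lists of videos; this is an illustrative implementation, not the official task scorer.

import numpy as np

def average_precision(relevant, ranked):
    # AP of one ranked list, given the set of relevant video ids.
    hits, precision_sum = 0, 0.0
    for rank, vid in enumerate(ranked, start=1):
        if vid in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / max(len(relevant), 1)

# MAP = mean of the per-tag APs (here two toy tags).
aps = [
    average_precision({"v1", "v3"}, ["v1", "v2", "v3", "v4"]),
    average_precision({"v2"}, ["v1", "v2", "v3", "v4"]),
]
print(np.mean(aps))  # -> ((1 + 2/3)/2 + 1/2) / 2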
14. Experimental Results
Figure: AP per class achieved in each run (run1 vs. run4; y-axis: Average Precision, 0 to 1; x-axis: the 26 genre tags).
15. Experimental Results
Visual features
perform better in the most imbalanced classes, such as “food and drink” (AP = 0.372), “school and education” (AP = 0.509), and “autos and vehicles” (AP = 0.692).
Textual features
perform better in the most frequent classes in the development set, such as “default category” (AP = 0.860) and “business” (AP = 0.557).
16. Conclusion
Remarks
We used a different learning strategy for each data modality: video similarity for processing visual content and an ensemble of classifiers for processing textual content.
Findings
The obtained results show that the proposed framework is promising: combining textual and visual features can contribute to better results.
Future work
Investigate learning strategies for combining features from different modalities, and consider other information sources, such as ASR transcripts, to include more features semantically related to each category.
17. Acknowledgements
Organizers of the Genre Tagging Task and MediaEval 2012
Brazilian funding agencies: FAPEMIG, FAPESP, CAPES, and CNPq