Sports Video Annotation: Detection of Strokes in Table Tennis Task
Paper: http://ceur-ws.org/Vol-2670/MediaEval_19_paper_6.pdf
Slides: https://www.youtube.com/watch?v=CvDhEPxF4H0
Pierre-Etienne Martin, Jenny Benois-Pineau, Boris Mansencal, Renaud Peteri, Laurent Mascarilla, Jordan Calandre and Julien Morlier, Sports Video Annotation: Detection of Strokes in Table Tennis task for MediaEval 2019. Proc. of MediaEval 2019, 27-29 October 2019, Sophia Antipolis, France.
Abstract: Action detection and classification is one of the main challenges in visual content analysis and mining. Sport video analysis has been a very popular research topic, due to the variety of application areas, ranging from multimedia intelligent devices with user-tailored digests, up to analysis of athletes' performances. Datasets with sport activities are available now for benchmarking of methods. A large amount of work is also devoted to the analysis of sport gestures using motion capture systems. However, body-worn sensors and markers could disturb the natural behaviour of sports players. Furthermore, motion capture devices are not always available for potential users, be it a University Faculty or a local sport team. Coming years will build upon the basic "Sports Video Annotation: Detection of Strokes in Table Tennis" task offered in 2019. The ultimate goal of this research is to produce automatic annotation tools for sport faculties, local clubs and associations to help coaches to better assess and advise athletes during training.
Presented by Pierre-Etienne Martin
Sports Video Annotation: Detection of Strokes in Table Tennis Task
1. ediaEval 2019
Organized by:
Pierre-Etienne Martin
Jenny Benois-Pineau
Boris Mansencal
Renaud Péteri
Laurent Mascarilla
Jordan Calandre
Julien Morlier
1
Sport Video Classification
Task: Table Tennis
“Brave new task”
5. TTStroke-21
5
➢ 129 videos at 120 fps
➢ 1 387 / 1 074 annotations before / after filtering for 20 classes
➢ 1 048 strokes (+ 272 negative samples extracted)
Acquisition
Annotation platform Samples TTStroke-21
6. Particular conditions
6
➢ institutional email address
➢ acceptation of the particular
conditions (deletion after a year,
blurring faces, no redistribution…)
Time consuming
7. Task
7
➢ Train set fully annotated
➢ Test set partially annotated
➢ 20 classes
➢ up to 4 runs
Offensive Forehand Loop
t
Unknown
8. Task
8
➢ We provide:
○ mp4 videos
○ train: xml files with temporal annotation and stroke class
○ test: xml files with temporal annotation and “Unknown” class
➢ return test xml files with Unknown replace with the prediction
Offensive Forehand Loop
t
Unknown
11. Evaluation
11
➢ Accuracy of the classification
➢ Confusion Matrix for the best run:
○ General
○ Hand : Forehand, Backhand
○ Type : Services, Offensive, Defensive
○ Hand + Type
12. Participation
12
➢ 13 subscriptions
○ 1 was a robot
○ 1 subscribed twice
○ 1 without institutional address
○ 3 did not answer when asked for institutional address
○ 1 did not signed the particular conditions
○ 6 participants with access to the data
➢ 3 submissions
○ 2 did not answer after granting access to the data
○ 1 gave up
➢ 3 working note papers
➢ 2 presentations
13. Results - Best run
13
Accuracies in %
Rank Team General Hand Type Hand + Type
1
Univ. Bordeaux -
LaBRI
22.9 76.8 65.8 54.8
2
Univ. La Rochelle -
MIA
14.1 61.6 48.9 29.1
3
SSN College of
Engineering - India
11.3 65.8 55.1 48.3
14. Suggestions for better performances
14
➢ Build negative samples
➢ Multi label or/and cascade
○ Forehand / Backhand
○ Offensive / Defensive / Services
○ Type of stroke
➢ Data augmentation
[1] P.-E. Martin et al., “Sport action recognition with siamese spatio-temporal cnns: Application to table tennis,” in IEEE CBMI, 2018.
[2] P.-E. Martin et al., “Optimal choice of motion estimation methods for fine-grained action classification with 3D convolutional
networks,” in IEEE ICIP, 2019.
[3] P.-E. Martin et al., “Fine-Grained Action Detection and Classification in Table Tennis with Siamese Spatio-Temporal Convolutional
Neural Network,” in IEEE ICIP, 2019.
[4] J. Carreira and A. Zisserman, “Quo vadis, action recognition? A new model and the kinetics dataset,” in IEEE CVPR, 2017.
[5] G. Varol, I. Laptev, and C. Schmid, “Long-term temporal convolutions for action recognition,” in IEEE TPAMI, 2018.
15. Conclusion
15
➢ Complicated task because
○ limited dataset
○ high inter-similarity
➢ Improvements possible
➢ Task will be reconducted
New participants are most welcome!
16. LaBRI - Building A30
351 cours de la Libération
33405, Talence
pierre-etienne.martin@u-bordeaux.fr
https://www.labri.fr/projet/AIV/CRISP_presentation.php
+33 5 40 00 38 80
www.linkedin.com/in/p-e-martin
@P_eMartin
Merci