This document discusses visual information retrieval in endoscopic surgery videos. Large amounts of surgery video are recorded every day but are difficult to search. The approach samples every 5th frame and indexes and searches the sampled frames using global features and localized global features. It is tested on a data set of 33 hours of laparoscopy video comprising 593,446 sampled frames. The evaluation indicates that sampling every 5th frame is enough to re-find specific frames through near duplicates, that late fusion of features works as expected, and that the localized SIMPLE descriptor works better than global features for semantically similar content. An exploratory user study suggests the approach is a useful baseline for interactive video retrieval that helps surgeons re-find specific moments in long procedure videos.
Visual Information Retrieval in Endoscopic Video Archives
1. Visual Information Retrieval in Endoscopic Video Archives
Jennifer Roldan Carlos, Mathias Lux, Xavier Giro-i-Nieto, Pia Munoz & Nektarios Anagnostopoulos
2. Motivation
• Surgery videos are taken every day
• Operating rooms are fully booked
• Many procedures already involve video
• Storing videos is or will be required by law
3. Amount of Videos
• 8-10 h of operations per room and day
• say 6 hours excluding setup, etc.
• 5-6 days a week
• 1,560 h of video per year and operating room (OR)
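The yearly figure can be reproduced with simple arithmetic. The sketch below assumes 52 working weeks per year and the lower bound of 5 days a week, which is the combination that matches the 1,560 h on the slide; the function and constant names are illustrative only.

```python
# Values from the bullets above; 52 weeks per year is an assumption.
HOURS_PER_DAY = 6      # usable recording hours per operating room and day
DAYS_PER_WEEK = 5      # lower bound of the stated 5-6 days
WEEKS_PER_YEAR = 52


def yearly_video_hours(hours_per_day, days_per_week, weeks_per_year=WEEKS_PER_YEAR):
    """Hours of video recorded per operating room and year."""
    return hours_per_day * days_per_week * weeks_per_year


print(yearly_video_hours(HOURS_PER_DAY, DAYS_PER_WEEK))  # 1560
print(yearly_video_hours(HOURS_PER_DAY, 6))              # 1872 with 6 days a week
```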
4. Use Case of Re-finding Frames
• Surgeons take "shots"
• for documentation, for patients, for discussion
• Shots are intentionally framed
• and make for excellent representative images
5. Approach
• Temporal sampling: every 5th frame
• Indexing and search based on
• a set of global features
• or localized global features
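A minimal sketch of the sampling and search steps, under stated assumptions: frames are represented as flat lists of 0-255 gray values, and a plain intensity histogram stands in for the actual global features (it is not one of the descriptors used in the talk). All function names are hypothetical.

```python
def sample_every_nth(frames, step=5):
    """Keep every `step`-th frame together with its original frame number."""
    return [(i, frame) for i, frame in enumerate(frames) if i % step == 0]


def gray_histogram(frame, bins=8):
    """Stand-in global feature: normalized histogram of 0-255 intensities."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def build_index(frames, step=5):
    """Index only the temporally sampled frames."""
    return [(i, gray_histogram(f)) for i, f in sample_every_nth(frames, step)]


def search(index, query_frame, top_k=3):
    """Return the top_k (frame_number, distance) pairs, nearest first."""
    q = gray_histogram(query_frame)
    ranked = sorted(index, key=lambda entry: l1_distance(q, entry[1]))
    return [(frame_no, l1_distance(q, feat)) for frame_no, feat in ranked[:top_k]]


# Toy video: 20 frames of 20 pixels that brighten gradually.
frames = [[200] * i + [50] * (20 - i) for i in range(20)]
index = build_index(frames)              # indexes frames 0, 5, 10, 15
results = search(index, frames[11])      # frame 11 itself is not indexed
print([frame_no for frame_no, _ in results])  # [10, 15, 5]
```

The query frame was skipped during sampling, yet its closest indexed neighbour is the adjacent sampled frame, which is the near-duplicate effect the approach relies on.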
7. Features Employed
• Pyramid HOG
• extensive and large texture feature
• Color and Edge Directivity Descriptor (CEDD)
• compact and well-performing joint histogram
• SIMPLE
• CEDD descriptors of patches at SURF key points
8. Data Set
• 33 hours of video
• from actual procedures focusing on laparoscopy
• 1,276 videos in total
• 593,446 frames after temporal sampling
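As a rough plausibility check, the number of sampled frames can be estimated from the video duration. The slides do not state the frame rate; the sketch below assumes 25 fps, and under that assumption the estimate lands close to the reported 593,446 frames.

```python
def sampled_frame_count(hours, fps, step):
    """Estimated number of frames kept when sampling every `step`-th frame."""
    return int(hours * 3600 * fps) // step


# 25 fps is an assumption, not a figure from the slides.
print(sampled_frame_count(33, 25, 5))  # 594000
```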
10. Evaluation – Re-Finding in Numbers
• Randomly selected more than 700 shots
• Excluding tests, white balance and out-of-patient shots
• Resulting in 600 sample queries
11. Evaluation – Re-Finding in Numbers
• Hypothesis I: every 5th frame is enough to re-find images.
• Hypothesis II: there is a noticeable difference between global and local features.
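One way to test Hypothesis I is to count how often a query shot leads back to its original position in the video. The slides do not say how a successful re-find was scored, so the criterion below is an assumption: a query counts as re-found if one of the top results comes from the same video and lies within 2 frames of the original, since with every-5th-frame sampling the nearest sampled frame is at most 2 frames away. `refind_rate` and `toy_search` are hypothetical names.

```python
def refind_rate(queries, search_fn, top_k=1, tolerance=2):
    """Fraction of queries whose origin is re-found.

    queries:   list of (video_id, frame_no, query_data)
    search_fn: callable (query_data, top_k) -> list of (video_id, frame_no)
    """
    hits = 0
    for video_id, frame_no, data in queries:
        results = search_fn(data, top_k)
        if any(v == video_id and abs(f - frame_no) <= tolerance for v, f in results):
            hits += 1
    return hits / len(queries) if queries else 0.0


def toy_search(data, top_k):
    """Idealized stand-in engine: returns the nearest sampled frame (step 5)."""
    video_id, frame_no = data
    nearest = round(frame_no / 5) * 5
    return [(video_id, nearest)][:top_k]


queries = [("a", 12, ("a", 12)), ("a", 13, ("a", 13)), ("b", 7, ("b", 7))]
print(refind_rate(queries, toy_search))  # 1.0
```

Running the same function once per feature gives comparable numbers for global and local features, which is what Hypothesis II needs.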
13. Evaluation – User Study
• Exploratory study, thinking aloud test
• Interactive web page presented to users
• ten cases with all available shots as queries
• three non-labeled search engines
15. Evaluation – User Study
• Population drawn from our projects
• experts in processing endoscopic videos
• well aware of the requirements surgeons have voiced
• Task was to ...
• browse diverse results and
• voice drawbacks and benefits
16. Findings
• Sampling every 5th frame works (with headroom)
• Study participants noted that
• late fusion works as expected and yields interesting results besides near duplicates
• SIMPLE works better for semantically similar content, e.g. translated instruments
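Late fusion combines the result lists of separate per-feature searches instead of merging the feature vectors beforehand. The slides do not specify the fusion rule, so the sketch below uses a simple rank sum as an assumption; the example rankings are made up for illustration.

```python
def late_fusion(rankings):
    """Fuse ranked result lists (best first), one list per feature.

    Each frame id is scored by the sum of its ranks; an id missing from a
    list is penalized with that list's length. Ties are broken by id.
    """
    ids = set().union(*map(set, rankings))

    def total_rank(frame_id):
        return sum(r.index(frame_id) if frame_id in r else len(r) for r in rankings)

    return sorted(ids, key=lambda i: (total_rank(i), i))


# Hypothetical result lists from two features, given as frame numbers.
phog_results = [10, 15, 5]
cedd_results = [15, 10, 20]
print(late_fusion([phog_results, cedd_results]))  # [10, 15, 5, 20]
```

Frames that only one feature retrieves still appear in the fused list, which is one plausible reason fused results contain more than near duplicates.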
17. Conclusions
• The system does not use
• domain-dependent methods and heuristics
• methods that are demanding in run time and storage
• Still, it works for the use case as a
• candidate support system for surgeons
• baseline to start on interactive video retrieval for laparoscopy.
18. Future Work
• Salient contours of images
• focus on being robust against lighting and noise
21. Time for questions?
Mathias Lux
Associate Professor @ Klagenfurt University, Austria
mlux@itec.aau.at
Thanks go to Jennifer Roldan Carlos, Xavier Giro-i-Nieto, Pia Munoz & Nektarios Anagnostopoulos