- The document proposes a novel semidense representation map that condenses dense visual features while highlighting relevant information and preserving uniform region data.
- It applies sparse visual features to enhance relevant points and uses a regular grid in uniform regions. Experimental results show this reduces data requirements while extracting key information and inherently regularizes outputs.
- The method is implemented efficiently on FPGA hardware, providing over 20x bandwidth savings and 15x memory usage reduction compared to dense representations. It allows for real-time integration of feedback from tasks like attention, ground plane detection, and obstacle detection.
Efficient architecture to condensate visual information driven by attention processes (PhD thesis presentation)
1. Efficient architecture to condensate visual information driven by attention processes
Mª Sara Granados Cabeza
Supervisors: Javier Díaz Alonso, Sonia Mota Fernández, Alberto Prieto Espinosa
2. Summary
• Introduction and Motivation
• Semidense Representation Map for Visual
Features
• Method Validation: Experimental Results and
Applications
• Implementation on Reconfigurable Hardware
• Conclusions and Future Work
4. Human Visual System
• Eyes do more than capture images:
▫ They extract spatio-temporal transitions
▫ They communicate efficiently using an event-driven scheme
• More than just the eyes:
▫ The visual cortex is also important
5. Human Visual System
• Main characteristics we try to emulate in artificial systems:
▫ Retina pre-processing capabilities
▫ Adaptation to changing environments
▫ Limited resources
▫ Active vision: gaze control and attention, both bottom-up (saliency) and top-down (target-driven)
11. Possible Solutions
• Retinomorphic grabbing systems
• Other hardware devices: optimized PC implementations (SSE, MMX, IPP), DSP, GPU, latest FPGA, ASIC
▫ Brute-force solution
• Compression
▫ Only group and/or reduce color components
▫ No feedback integration
▫ Transfer the problem to higher-level stages
13. Our solution
• Novel representation map
▫ Avoids sending unnecessary information: “smart” compression (condensation) without losing uniform-region data
▫ Versatile: generic algorithm for several visual features, suitable for commodity and embedded platforms
▫ Creates a feedback channel: bottom-up saliency and target-driven selection (top-down attention)
▫ Easy integration with higher-level algorithms
14. Tools and Methods
Figure (axes: Complexity vs. Time), stages in order:
1. Model
2. Low-performance implementation: PC-based (Matlab)
3. High-performance implementation: PC-based (basic approaches in Java, C, C++, etc.) with an accelerator (GPU and FPGA) or optimized code (C/C++)
4. Stand-alone platform: DSP- or FPGA-based prototyping board
5. Specific-purpose system: FPGA-ASIC based, specific-purpose board with certification capabilities
17. Semidense representation map
• What?
▫ Condenses dense visual features
▫ Highlights relevant information
▫ Keeps uniform-region information
• How?
▫ Using sparse visual features as relevance enhancer
▫ Applying a regular grid in the uniform regions
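The "What?" and "How?" points above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis implementation: the function name `semidense_map`, the use of `None` for discarded points, and the choice of the window centre as the grid sample are all assumptions made here.

```python
from typing import List, Optional

def semidense_map(dense: List[List[float]],
                  relevant: List[List[bool]],
                  window: int = 5) -> List[List[Optional[float]]]:
    """Keep a dense value only at relevant points or on a regular grid.

    `relevant` stands in for the sparse-feature mask (e.g. edge points);
    the grid keeps one sample per window so uniform regions are not lost.
    Discarded positions are marked with None (an assumption of this sketch).
    """
    half = window // 2
    result = []
    for y, row in enumerate(dense):
        out_row = []
        for x, value in enumerate(row):
            on_grid = (y % window == half) and (x % window == half)
            out_row.append(value if (relevant[y][x] or on_grid) else None)
        result.append(out_row)
    return result

# Toy input: a 10x10 dense feature with a single relevant point.
dense = [[float(10 * y + x) for x in range(10)] for y in range(10)]
relevant = [[False] * 10 for _ in range(10)]
relevant[0][0] = True

sd = semidense_map(dense, relevant)
kept = sum(v is not None for row in sd for v in row)
print(f"kept {kept} of {10 * 10} points")
```

With a 5x5 window the toy image keeps its four grid samples plus the one relevant point.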
26. Uniform Regions: Window Size
• Depends on:
▫ The condensation ratio:
5x5 window: 4% of the dense points
7x7 window: 2%
9x9 window: 1%
Fewer points imply fewer subsampling operations
▫ The image resolution
• We need to keep enough information
▫ Dense features are not so dense (NaN problem)
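The percentages quoted for each window size are consistent with keeping one grid sample per window, which is the assumption behind this small check (the helper name `grid_fraction` is hypothetical):

```python
def grid_fraction(window: int) -> float:
    """Fraction of dense points kept when one sample is retained per window."""
    return 1.0 / (window * window)

for w in (5, 7, 9):
    print(f"{w}x{w} window: {100 * grid_fraction(w):.1f}% of the dense points")
```

The slide figures appear to be these values rounded to whole percentages.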
27. Uniform Regions: Filter
• Instead of subsampling, filtering:
▫ Error smoothing
▫ Information spreading
• Filters assessed:
▫ Median
▫ Bilateral
▫ Anisotropic
• Benchmark:
▫ Middlebury
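As a rough illustration of filtering instead of subsampling, the sketch below computes a grid value as the median of the valid samples in its window, skipping NaN entries. It covers only the median filter from the list above; the function name `grid_value` and the NaN handling are assumptions of this sketch, not details taken from the thesis.

```python
import math
from statistics import median
from typing import List

def grid_value(dense: List[List[float]], cy: int, cx: int, window: int = 5) -> float:
    """Median of the non-NaN samples in the window centred at (cy, cx)."""
    half = window // 2
    height, width = len(dense), len(dense[0])
    valid = [
        dense[y][x]
        for y in range(max(0, cy - half), min(height, cy + half + 1))
        for x in range(max(0, cx - half), min(width, cx + half + 1))
        if not math.isnan(dense[y][x])
    ]
    # If the whole window is invalid there is nothing to spread.
    return median(valid) if valid else math.nan

# A 5x5 patch with a missing centre and one outlier.
patch = [[1.0] * 5 for _ in range(5)]
patch[2][2] = math.nan   # missing estimate at the grid point itself
patch[0][4] = 100.0      # isolated error
print(grid_value(patch, 2, 2))
```

Plain subsampling would have returned NaN for this grid point; the filtered value uses the surrounding estimates and ignores the outlier.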
34. Semidense representation map
• Trade-off configuration:
▫ Canny-based relevant point extractor
▫ 5x5 grid based on bilateral filter
• Results
▫ Reduces memory and bandwidth requirements
▫ Extracts relevant information
▫ Incorporates uniform-region information
▫ Inherently regularizes
▫ Easily integrates in higher-level algorithms
36. Applications
1. Attention integration:
▫ Bottom-up: Saliency maps as relevant-point extractor
▫ Top-down: Independently Moving Objects (IMOs)
2. High-level algorithm application:
▫ Using only semidense visual features
▫ First, we extract the ground-plane
▫ Then, we detect obstacles
39. Attention Processes: Top-Down
• IMOs extraction:
▫ Using Pauwels et al. (Journal of Vision, 2010)
• Integration with Semidense Maps
▫ IMOs from frame N are integrated into the semidense representation map of frame N+1
• Integration with other processes:
▫ such as Time-To-Contact (TTC)
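The frame N to frame N+1 feedback can be pictured as a one-frame delay loop in which the top-down mask is merged into the relevant-point (RP) mask. The sketch below is only a schematic under assumptions made here: masks are boolean grids, merging is a logical OR, and `detect_imos` is a hypothetical stand-in for the IMO detector rather than the method of Pauwels et al.

```python
from typing import Callable, List, Optional

Mask = List[List[bool]]

def merge_masks(rp_mask: Mask, feedback: Mask) -> Mask:
    """Mark a point as relevant if either the RP extractor or the feedback says so."""
    return [[a or b for a, b in zip(rp_row, fb_row)]
            for rp_row, fb_row in zip(rp_mask, feedback)]

def run_with_feedback(rp_masks: List[Mask],
                      detect_imos: Callable[[Mask], Mask]) -> List[Mask]:
    """Apply the feedback computed on frame N to the RP mask of frame N+1."""
    feedback: Optional[Mask] = None
    outputs = []
    for rp_mask in rp_masks:
        mask = merge_masks(rp_mask, feedback) if feedback is not None else rp_mask
        outputs.append(mask)
        feedback = detect_imos(mask)  # becomes available for the next frame
    return outputs

# Hypothetical detector that always reports a moving object at the top-left pixel.
def fake_detector(mask: Mask) -> Mask:
    out = [[False] * len(mask[0]) for _ in mask]
    out[0][0] = True
    return out

frames = [[[False, False], [False, False]] for _ in range(2)]
result = run_with_feedback(frames, fake_detector)
print(result)
```

The first frame is unaffected, while the second frame's mask includes the point reported by the detector on the first.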
42. Ground-Plane Detection
• Compare Original vs Semidense:
▫ Similar response
▫ Grid contains needed information
▫ Structure in the RP introduces noise
• Same accuracy with 8% of input resolution
45. Obstacle Detection Algorithm
• Final output (figure: original dense result compared with the semidense result)
• Equivalent output with 8% of input resolution
46. Obstacle Detection Algorithm
• Integration in several stages of a high-level algorithm
• Similar response
▫ Uniform region relevance
• Workload reduction:
11x, 1.5x
50. Condensation Architecture
• Hardware trade-off configuration:
▫ Canny
▫ Median filter
• Fine-grain pipeline
▫ Submodules: logical functionality
▫ Stages: number of simple operations
▫ Scalar units: number of features to process
• One processed datum per clock cycle
Diagram annotation per block: [#Operations, #Scalar]
51. Low-Level Vision System
[Block diagram. Recoverable labels: low-level vision system blocks (OF, E & O, D); hysteresis + non-max suppression; pyramid; three condensation modules, each producing RP + Grid outputs for Vx, Vy and disparity (Disp); grid extractor (size) and grid mask; grid feedback and RP feedback inputs; FIFO; MCU; memory; co-processor; storage of Vx, Vy and D as RP and grid values.]
52. Efficient Communication Protocol
• The grid is regular and known beforehand
▫ Store only the values
▫ Grid binary mask: computed or sent (more versatile)
• RPs are not regular:
▫ Under 4% of data: Address Event Representation (AER)
▫ Boahen et al. (IEEE, 2004)
• Vx, Vy and D encoded with 12 bits
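To make the address-event idea concrete, the sketch below packs one relevant point into a single word holding its address and a 12-bit value. Only the 12-bit value width comes from the slide; the 10-bit row and column fields, the field order, and the function names are assumptions chosen for illustration and do not describe the actual protocol.

```python
VALUE_BITS = 12  # from the slide: Vx, Vy and D use 12 bits
ADDR_BITS = 10   # assumption: enough for images up to 1024x1024

def encode_event(row: int, col: int, value: int) -> int:
    """Pack (row, col, value) into one integer word: [row | col | value]."""
    if not (0 <= row < (1 << ADDR_BITS) and 0 <= col < (1 << ADDR_BITS)):
        raise ValueError("address out of range")
    if not 0 <= value < (1 << VALUE_BITS):
        raise ValueError("value does not fit in 12 bits")
    return (row << (ADDR_BITS + VALUE_BITS)) | (col << VALUE_BITS) | value

def decode_event(word: int) -> tuple:
    """Recover (row, col, value) from a word produced by encode_event."""
    value = word & ((1 << VALUE_BITS) - 1)
    col = (word >> VALUE_BITS) & ((1 << ADDR_BITS) - 1)
    row = word >> (ADDR_BITS + VALUE_BITS)
    return row, col, value

word = encode_event(3, 517, 4095)
print(decode_event(word))
```

Grid samples need no address field because their positions are known beforehand, so only relevant points pay the addressing overhead.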
53. Hardware utilization (XC4VFX100)
• Integration with existing system:
▫ Tomasi et al. (IEEE, 2011): ~90% of slices used
▫ Barranco et al. (DSP, 2012): ~75% of slices used
• JPEG Compression Core:
▫ 17% per visual feature, 51% for all three (D, Vx, Vy)
57. Versatile Architecture
• Feedback integration in real-time
▫ Reconfigurable RP and Grid masks, programmable or computed in real time
▫ RP Feedback (Objects, TTC, IMOs)
▫ Grid Feedback (Ground-plane, Adaptive grid)
• Task-driven configuration
59. Conclusions
• Novel semidense representation map:
▫ Relevant data treated with higher priority
▫ Keeping uniform-region information
• Versatile
• Create a feedback channel
▫ Bottom-up saliency
▫ Target-driven transfer (top-down attention)
• Easy integration with higher-level algorithms
• Efficiently implemented in hardware
60. Future Work
• Integrate other enhancing signals
▫ Descriptors (SIFT, SURF, GLOH,…)
• SoC integration using latest platforms
• Incorporate different feedback signals:
▫ TTC estimations
▫ Adapt grid dynamically
• Explore new applications:
▫ Tracking
▫ Video surveillance
▫ Multi-camera systems
61. Main Contributions
• Semidense representation map
• Assessed several enhancing signals
• Evaluated different filters and window sizes
• Regularization capabilities
• Integration in multiple applications
• Efficient FPGA implementation
▫ 1 datum per clock cycle
• Framework to integrate different signals:
▫ signal-to-symbol loop