Efficient architecture to condensate visual information driven by attention processes (PhD thesis presentation)

Mª Sara Granados Cabeza
Supervisors:
• Javier Díaz Alonso
• Sonia Mota Fernández
• Alberto Prieto Espinosa

Summary
• Introduction and Motivation
• Semidense Representation Map for Visual Features
• Method Validation: Experimental Results and Applications
• Implementation on Reconfigurable Hardware
• Conclusions and Future Work

Human Visual System
• Eyes do more than capture images:
▫ Extract spatio-temporal transitions
▫ Communicate efficiently using an event-driven scheme
• More than just the eyes
▫ The visual cortex is also important

Human Visual System
• Main characteristics we try to emulate in artificial systems:
▫ Retina pre-processing capabilities
▫ Adaptation to changing environments
▫ Limited resources
▫ Active vision:
◦ Gazing control
◦ Attention: bottom-up (saliency) and top-down (target driven)

Artificial Visual Systems
• Robotics
• Vehicular applications
• Low Vision Aids

DRIVSCO Bandwidth Constraints
[Figure: bandwidth constraints; labeled levels: PCI Express, PCI]

DRIVSCO Memory Constraints
[Figure: memory constraints; label: XIRCA]

Possible Solutions
• Retinomorphic grabbing systems
• Other hardware devices: optimized PC implementations (SSE, MMX, IPP), DSP, GPU, latest FPGA, ASIC
▫ Brute-force solution
• Compression
▫ Only group and/or reduce color components
▫ No feedback integration
▫ Transfer the problem to higher-level stages

Possible Solutions
• Multi-modal descriptors
▫ Pugeault et al. (JVCI, 2011)

Our solution
• Novel representation map
▫ Avoid sending unnecessary information
◦ “Smart” compression = Condensation
◦ Without losing uniform-region data
▫ Versatile
◦ Generic algorithm for several visual features
◦ Suitable for commodity and embedded platforms
▫ Create a feedback channel
◦ Bottom-up saliency
◦ Target-driven selection (top-down attention)
▫ Easy integration with higher-level algorithms

Tools and Methods
[Figure: development stages; axes: Complexity, Time]
1. Model
2. Low-performance implementation: PC based (Matlab)
3. High-performance implementation: PC based (basic approaches in Java, C, C++, etc.) with accelerator (GPU and FPGA) or optimized code (C/C++)
4. Stand-alone platform: DSP- or FPGA-based prototyping board
5. Specific-purpose system: FPGA/ASIC-based, specific-purpose board with certification capabilities

Semidense representation map
• What?
▫ Condenses dense visual features
▫ Highlights relevant information
▫ Keeps uniform-region information
• How?
▫ Using sparse visual features as relevance enhancers
▫ Applying a regular grid in the uniform regions
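A minimal sketch of the condensation idea described in this slide, assuming the relevant-point (RP) mask has already been produced by some sparse extractor. The function name `condense`, the use of NaN to mark dropped points, and the placement of the grid point at the centre of each cell are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def condense(feature, rp_mask, window=5):
    """Keep relevant points plus one regular-grid point per window x window cell.

    feature : 2-D float array holding a dense visual feature (e.g. disparity).
    rp_mask : 2-D bool array, True where a sparse extractor marked a relevant point.
    Returns (semidense, keep); dropped positions are NaN in `semidense`.
    """
    grid_mask = np.zeros(feature.shape, dtype=bool)
    half = window // 2
    # Assumption: the grid point sits at the centre of each cell.
    grid_mask[half::window, half::window] = True
    keep = rp_mask | grid_mask
    semidense = np.where(keep, feature, np.nan)
    return semidense, keep

feature = np.arange(100, dtype=float).reshape(10, 10)
rp_mask = np.zeros((10, 10), dtype=bool)
rp_mask[0, 0] = True  # a single hypothetical relevant point
semidense, keep = condense(feature, rp_mask)
print(int(keep.sum()), "of", keep.size, "points kept")
```

With a 5x5 window on a 10x10 feature, four grid points and the one relevant point survive, so the uniform regions remain represented while most of the dense data is dropped.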

Relevant Point Extractors
• Saliency maps:
▫ Itti and Koch (Nature Reviews Neuroscience, 2010)

• Descriptors
▫ SIFT (IJCV, 2005), SURF (CVIU, 2008), etc.
[Figure: SIFT, SURF]

• Structure-based edge and corner detectors
▫ Canny (PAMI, 1986), Sobel (Journal of Microscopy, 1988), Intrinsic Dimension (BMVC, 2003)

Relevant Points: Selection mask
[Figure: selection masks for Canny, LIP-Sobel, Intrinsic Dimension 1 (id1), and Intrinsic Dimension 2 (id2)]

Disparity Benchmark Dataset
• Middlebury
• Metric: % of bad pixels (|ΔDisparity| > 1)
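The bad-pixel metric above can be sketched as follows. The function name `bad_pixel_percentage` and the choice to skip pixels that are NaN in either map are assumptions made for illustration; the official benchmark scripts may treat invalid pixels differently.

```python
import numpy as np

def bad_pixel_percentage(estimated, ground_truth, threshold=1.0):
    """Percentage of valid pixels whose absolute disparity error exceeds `threshold`."""
    estimated = np.asarray(estimated, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    # Assumption: pixels without a finite value in either map are ignored.
    valid = np.isfinite(estimated) & np.isfinite(ground_truth)
    if not valid.any():
        return float("nan")
    error = np.abs(estimated[valid] - ground_truth[valid])
    return 100.0 * np.count_nonzero(error > threshold) / error.size

ground_truth = np.zeros(4)
estimated = np.array([0.0, 0.5, 2.0, 3.0])
print(bad_pixel_percentage(estimated, ground_truth))  # two of four pixels are off by more than 1
```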

Uniform Regions
• Neighborhood:
▫ Size and shape
• Representative point:
▫ Subsampling? Filtering?

Uniform Regions: Window Size
• Depends on:
▫ The condensation ratio:
◦ 5x5 window → 4% of the dense points
◦ 7x7 window → 2%
◦ 9x9 window → 1%
◦ Fewer points imply fewer subsampling operations
▫ The image resolution
• We need to keep enough information
▫ Dense features are not fully dense (NaN problem)
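The percentages quoted for each window size follow from keeping one grid point per k x k cell, that is, a fraction 1/k² of the dense points. A short check, with `grid_fraction` as an illustrative helper name:

```python
def grid_fraction(window):
    """Fraction of dense points kept when one point is retained per window x window cell."""
    return 1.0 / (window * window)

for k in (5, 7, 9):
    print(f"{k}x{k} window: {100 * grid_fraction(k):.1f}% of the dense points")
```

This gives 4.0%, 2.0% and 1.2%, which the slide rounds to 4%, 2% and 1%. Relevant points are counted separately and add to these figures.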

Uniform Regions: Filter
• Instead of subsampling, filtering:
▫ Error smoothing
▫ Information spreading
• Filters assessed:
▫ Median
▫ Bilateral
▫ Anisotropic
• Benchmark:
▫ Middlebury
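To illustrate why filtering can be preferable to plain subsampling, the sketch below takes the median of each grid cell while ignoring NaN entries, so a missing or outlying value at the cell centre does not end up as the representative point. It uses the median filter, one of the three filters listed above; the function name `grid_representatives` and the cropping of incomplete border cells are assumptions.

```python
import warnings
import numpy as np

def grid_representatives(feature, window=5):
    """Return one NaN-aware median per window x window cell of a 2-D feature map."""
    h, w = feature.shape
    rows, cols = h // window, w // window
    # Assumption: incomplete cells at the right and bottom borders are dropped.
    cells = feature[:rows * window, :cols * window].reshape(rows, window, cols, window)
    cells = cells.transpose(0, 2, 1, 3).reshape(rows, cols, window * window)
    with warnings.catch_warnings():
        # Cells that contain only NaN yield NaN; silence NumPy's warning about them.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(cells, axis=2)

feature = np.ones((10, 10))
feature[2, 2] = np.nan   # the cell centre is missing: subsampling would return NaN
feature[0, 0] = 100.0    # an outlier that the median suppresses
print(grid_representatives(feature))
```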

Decondensation
[Figure: original input, extracted disparity, semidense feature, decondensed disparity]

Real-time approaches
• Low-cost hardware is noisy
[Figure: ground truth vs. noisy hardware output]

[Figure: noisy hardware output, affine-based regularization, semidense representation, decondensed map]

Semidense representation map
• Trade-off configuration:
▫ Canny-based relevant point extractor
▫ 5x5 grid based on bilateral filter
• Results
▫ Reduces memory and bandwidth requirements
▫ Extracts relevant information
▫ Incorporates uniform-region information
▫ Inherently regularizes
▫ Easily integrates in higher-level algorithms

Applications
1. Attention integration:
▫ Bottom-up: saliency maps as relevant-point extractor
▫ Top-down: Independently Moving Objects (IMOs)
2. High-level algorithm application:
▫ Using only semidense visual features
▫ First, we extract the ground plane
▫ Then, we detect obstacles

Attention Processes: Bottom-Up
[Figure: original, saliency maps, RP mask, zero-threshold RP]

Attention-Driven Condensation

Attention Processes: Top-Down
• IMO extraction:
▫ Using Pauwels et al. (Journal of Vision, 2010)
• Integration with semidense maps:
▫ IMOs from frame N are integrated into the semidense representation map of frame N+1
• Integration with other processes:
▫ Such as Time-To-Contact (TTC)
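One plausible reading of the frame N to frame N+1 integration is a one-frame-delayed feedback loop in which the IMO mask of the previous frame is merged into the relevant-point mask of the current one. The slides do not state how the merge is done; the logical OR below and the name `rp_mask_with_feedback` are assumptions for illustration only.

```python
import numpy as np

def rp_mask_with_feedback(bottom_up_mask, imo_mask_prev=None):
    """Merge the current bottom-up RP mask with the IMO mask from the previous frame."""
    if imo_mask_prev is None:  # first frame: no feedback is available yet
        return bottom_up_mask.copy()
    # Assumption: top-down feedback only adds points, it never removes bottom-up ones.
    return bottom_up_mask | imo_mask_prev

bottom_up = np.array([[True, False], [False, False]])
imo_prev = np.array([[False, False], [False, True]])
print(rp_mask_with_feedback(bottom_up, imo_prev))
```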

Obstacle Detection Algorithm
• Chumerin (PhD Dissertation, 2011)

Ground-Plane Detection
• Compare Original vs Semidense:
▫ Similar response
▫ Grid contains needed information
▫ Structure in the RP introduces noise
• Same accuracy with 8% of input resolution

• Elevation map
[Figure: original (dense) vs. semidense]

• Obstacle map

• Final output
• Equivalent output with 8% of input resolution

• Integration in several stages of a high-level algorithm
• Similar response
▫ Uniform region relevance
• Workload reduction: 11x and 1.5x

FPGAs
• High performance
▫ Parallel processing
• Low power consumption
• Reduced size
• Reconfigurable
▫ Application adaptation
▫ Real-time reconfiguration
• Industrial applications (robots, vehicles, inspection, surveillance, …)
• Certification capabilities (safety standards)

[Figure: system architecture block diagram. Blocks: External interface, Interface controller, Memory Controller Unit, Embedded Processors, Rectification, Warping, OF Core, Stereo Core, LF Core, Multi-scale Extension (two instances), Condensation Module]

Condensation Architecture
• Hardware trade-off configuration:
▫ Canny
▫ Median filter
• Fine-grain pipeline
▫ Submodules
◦ Logical functionality
▫ Stages
◦ Number of simple operations
▫ Scalar units
◦ Number of features to process
• One processed datum per clock cycle
[Figure legend: [#Operations, #Scalar]]

[Figure: condensation co-processor. Labels: Low-Level Vision System (OF, E & O, D), Pyramid, Hysteresis + non-max suppression, Condensation (three instances, RP + Grid), Grid extractor (size), Grid mask, FIFO, RP feedback, Grid feedback, Storage (three instances), MCU, Memory. Outputs: Vx RP, Vx Grid, Vy RP, Vy Grid, Disp RP, Disp Grid]

Efficient Communication Protocol
• Grid is regular and known beforehand
▫ Store only the values
▫ Grid binary mask:
◦ computed
◦ sent (more versatile)
• RP are not regular:
▫ Under 4% of data:
◦ Address Event Representation (AER)
▫ Boahen et al. (IEEE, 2004)
• Vx, Vy and D are encoded with 12 bits
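A rough sketch of how irregular relevant points might be sent as address events, each word carrying the pixel address together with its 12-bit value. Only the 12-bit value width comes from the slides; the 10-bit row and column fields, the resulting 32-bit word layout, and all function names are hypothetical choices for illustration.

```python
ADDR_BITS = 10   # assumption: enough for images up to 1024 x 1024 pixels
VALUE_BITS = 12  # from the slides: Vx, Vy and D use 12 bits

def pack_event(row, col, value):
    """Pack one relevant point into a single word: [row | col | value]."""
    if not (0 <= row < (1 << ADDR_BITS) and 0 <= col < (1 << ADDR_BITS)):
        raise ValueError("address out of range")
    if not 0 <= value < (1 << VALUE_BITS):
        raise ValueError("value does not fit in 12 bits")
    return (row << (ADDR_BITS + VALUE_BITS)) | (col << VALUE_BITS) | value

def unpack_event(word):
    """Recover (row, col, value) from a packed word."""
    value = word & ((1 << VALUE_BITS) - 1)
    col = (word >> VALUE_BITS) & ((1 << ADDR_BITS) - 1)
    row = word >> (ADDR_BITS + VALUE_BITS)
    return row, col, value

def encode_relevant_points(values, rp_mask):
    """Emit one event per relevant point; grid points need no address and are skipped."""
    events = []
    for r, (value_row, mask_row) in enumerate(zip(values, rp_mask)):
        for c, (value, is_rp) in enumerate(zip(value_row, mask_row)):
            if is_rp:
                events.append(pack_event(r, c, value))
    return events

print(unpack_event(pack_event(3, 700, 4095)))
```

Because the grid is regular and known in advance, only its values need to be stored in order; the per-point address overhead is paid just for the relevant points, which the slides put at under 4% of the data.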

Hardware utilization (XC4VFX100)
• Integration with existing system:
▫ Tomasi et al. (IEEE, 2011) → ~90% of slices used
▫ Barranco et al. (DSP, 2012) → ~75% of slices used
• JPEG Compression Core:
▫ 17% per visual feature → 51% (D, Vx, Vy)

DRIVSCO Bandwidth Constraints
[Figure: bandwidth constraints; labeled levels: PCI Express, PCI]

Semidense Maps Bandwidth
[Figure: semidense maps bandwidth; reduction > 20x]

Semidense Maps Memory
• Memory needs:
▫ 90 MB (highest resolution)
▫ 20 MB (lowest resolution)
• Semidense memory use:
▫ Lowest resolution:
◦ 1 feature: < 87 KB
◦ Whole system: < 2 MB
▫ Highest resolution:
◦ 1 feature: < 350 KB
◦ Whole system: < 6 MB
• Memory reduction: > 15x

Versatile Architecture
• Feedback integration in real time
▫ Reconfigurable RP and Grid masks
◦ Programmable or computed in real time
▫ RP feedback (objects, TTC, IMOs)
▫ Grid feedback (ground plane, adaptive grid)
• Task-driven configuration

Conclusions
• Novel semidense representation map:
▫ Relevant data treated with higher priority
▫ Keeping uniform-region information
• Versatile
• Create a feedback channel
▫ Bottom-up saliency
▫ Target-driven transfer (top-down attention)
• Easy integration with higher-level algorithms
• Efficiently implemented in hardware

Future Work
• Integrate other enhancing signals
▫ Descriptors (SIFT, SURF, GLOH,…)
• SoC integration using latest platforms
• Incorporate different feedback signals:
▫ TTC estimations
▫ Adapt grid dynamically
• Explore new applications:
▫ Tracking
▫ Video surveillance
▫ Multi-camera systems

Main Contributions
• Semidense representation map
• Assessed several enhancing signals
• Evaluated different filters and window sizes
• Regularization capabilities
• Integration in multiple applications
• Efficient FPGA implementation
▫ 1 datum per clock cycle
• Framework to integrate different signals:
▫ Signal-to-symbol loop
