2. Itseez
• Real-time computer vision solutions on embedded platforms:
– Mobile products: ItSeez3D, Facense
– Automotive: driver assistance systems
– Ecosystem: OpenCV, OpenVX
3. SCANNER
VICTOR ERUKHIMOV
victor.erukhimov@itseez3d.com
Capture the world in 3D!
7. Embedded vision challenges
• Intense and power-hungry computations
• Need to run in real time on embedded/mobile/wearable devices
• Very few specialized hardware products
• Software ecosystem not ready for embedded real-time scenarios
32. OpenVX 1.0 tiling and user kernels
• An implementation of OpenVX 1.0 can already do tiled processing with the standard kernels
– The user/programmer just needs to be sure to declare intermediate images as “virtual”
– “Virtual” indicates the user will not try to access the intermediate results, so they do not need to be fully allocated/constructed
• User can already create their own kernels per the existing OpenVX 1.0 specification
– There is a User Kernel section in the OpenVX 1.0 Advanced Framework API section
– But the image data for these user-defined kernels cannot be “tiled”
– Note: a “kernel” is analogous to a C++ “class” and a “node” is analogous to an “instance”. The use of kernels versus nodes enables object-oriented programming within the C programming language
• The new User Kernel Tiling Extension is only needed for tiled processing of user-defined kernels
– The user/programmer needs to provide additional information about their kernel to enable the OpenVX implementation to properly decompose the image into tiles and run the user node on these tiles
– The User Kernel Tiling Extension defines an API that can be used to provide this additional information
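The kernel/node analogy above can be sketched in plain C. This is a minimal illustration only: the types kernel_t and node_t and the functions make_node() and run_node() are hypothetical names invented for this sketch, not part of the OpenVX API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; NOT the OpenVX API. */
typedef int (*kernel_fn)(const int *params, size_t num_params);

/* A "kernel" plays the role of a class: a name plus shared behavior. */
typedef struct {
    const char *name;
    kernel_fn   run;
    size_t      num_params;
} kernel_t;

/* A "node" plays the role of an instance: one kernel bound to its own data. */
typedef struct {
    const kernel_t *kernel;
    const int      *params;
} node_t;

/* Example behavior shared by every node created from ADD_KERNEL. */
static int add_run(const int *params, size_t num_params)
{
    int sum = 0;
    for (size_t i = 0; i < num_params; ++i)
        sum += params[i];
    return sum;
}

static const kernel_t ADD_KERNEL = { "example.add", add_run, 2 };

/* Create a node; many nodes may share one kernel. */
static node_t make_node(const kernel_t *kernel, const int *params)
{
    assert(kernel != NULL && params != NULL);
    node_t node = { kernel, params };
    return node;
}

/* Execute a node by dispatching through its kernel's function pointer. */
static int run_node(const node_t *node)
{
    return node->kernel->run(node->params, node->kernel->num_params);
}
```

Two nodes built from ADD_KERNEL share the same function pointer but carry different parameters, which is the class/instance relationship the slide describes.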
33. The User Kernel Tiling Extension
1. The user writes the kernel function to be executed on each tile
– The OpenVX runtime will call this function on a specific tile during vxProcessGraph()
– The extension defines macros this function can use to determine information about the given tile and its parent image
– E.g., the tile’s height and width, the tile’s (x, y) location in the parent image, and the parent image’s height and width
2. The user adds this new kernel to the OpenVX system via vxAddTilingKernel()
– vxAddTilingKernel() takes a name, a pointer to the user’s function, and the number of kernel parameters
3. The user describes each of the kernel’s parameters via vxAddParameterToKernel()
– This is the same function used to describe non-tiled user kernel parameters
4. The user tells OpenVX about its pixel-access behavior via vxSetKernelAttribute()
– Must set the output block size, input neighborhood size, and border mode
5. The user calls vxFinalizeKernel() to indicate that the kernel description is complete
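Step 1 above (a per-tile function called once per tile by the runtime) can be sketched in self-contained C. Everything here is an assumption made for illustration: tile_t stands in for the information the extension exposes through macros, and process_tiled() is a stand-in for what an OpenVX runtime might do during vxProcessGraph(). Neither name comes from the extension.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tile descriptor; the real extension exposes this via macros. */
typedef struct {
    uint8_t *base;                 /* first pixel of the parent image */
    int image_width, image_height; /* parent image dimensions */
    int tile_x, tile_y;            /* tile's top-left corner in the parent */
    int tile_width, tile_height;   /* tile dimensions */
} tile_t;

typedef void (*tile_fn)(const tile_t *in, tile_t *out);

/* User-written per-tile function: pixelwise inversion of an 8-bit image. */
static void invert_tile(const tile_t *in, tile_t *out)
{
    for (int y = 0; y < out->tile_height; ++y) {
        for (int x = 0; x < out->tile_width; ++x) {
            int idx = (out->tile_y + y) * out->image_width + (out->tile_x + x);
            out->base[idx] = (uint8_t)(255 - in->base[idx]);
        }
    }
}

/* Stand-in for the runtime: split the image into tiles and call fn on each.
 * Tiles at the right and bottom edges may be smaller than tile_w x tile_h.
 * Returns the number of tiles processed. */
static int process_tiled(uint8_t *src, uint8_t *dst, int width, int height,
                         int tile_w, int tile_h, tile_fn fn)
{
    assert(tile_w > 0 && tile_h > 0);
    int calls = 0;
    for (int ty = 0; ty < height; ty += tile_h) {
        for (int tx = 0; tx < width; tx += tile_w) {
            int w = (tx + tile_w <= width)  ? tile_w : width - tx;
            int h = (ty + tile_h <= height) ? tile_h : height - ty;
            tile_t in  = { src, width, height, tx, ty, w, h };
            tile_t out = { dst, width, height, tx, ty, w, h };
            fn(&in, &out);
            ++calls;
        }
    }
    return calls;
}
```

The user function only ever sees one tile at a time, so the runtime is free to choose the tile size and order.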
34. Required user tiling kernel attributes
• VX_KERNEL_ATTRIBUTE_OUTPUT_TILE_BLOCK_SIZE
– The size of the region the user’s kernel prefers to write on each loop iteration
– The OpenVX implementation will ensure that the tile sizes are a multiple of this block size
– Except possibly at the edges of the image
• VX_KERNEL_ATTRIBUTE_INPUT_NEIGHBORHOOD
– The “extra” input pixels needed to compute an output block
– E.g., a pixelwise function has an input neighborhood of 0 on all sides
– A 3x3 filter has a neighborhood of 1, and a 5x5 filter has a neighborhood of 2 (on all sides)
• VX_KERNEL_ATTRIBUTE_BORDER
– Indicates whether the kernel function can correctly handle the odd-sized tiles near the edges of the image (VX_BORDER_MODE_SELF) or not (VX_BORDER_MODE_UNDEFINED)
• Examples:
– tileBlocksize = (1, 1), Neighborhood = (0, 0, 0, 0): e.g., pixelwise add
– tileBlocksize = (1, 1), Neighborhood = (1, 1, 1, 1): e.g., 3x3 box filter
– tileBlocksize = (1, 1), Neighborhood = (2, 2, 2, 2): e.g., 5x5 box filter
– tileBlocksize = (4, 4), Neighborhood = (0, 0, 0, 0): e.g., 4x4 pixelate
– tileBlocksize = (4, 1), Neighborhood = (2, 2, 2, 2): e.g., SIMD-optimized 5x5 box that writes 4 pixels/cycle
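The neighborhood values in these examples follow from the filter size: a centered k x k filter needs (k - 1) / 2 extra pixels on every side. A minimal sketch of that arithmetic, assuming a (left, right, top, bottom) ordering for the four neighborhood values; the slides do not state the ordering, and the types and function names below are hypothetical, not part of OpenVX:

```c
#include <assert.h>

/* Hypothetical types; the field order (left, right, top, bottom) is an assumption. */
typedef struct { int left, right, top, bottom; } neighborhood_t;
typedef struct { int x0, y0, x1, y1; } rect_t;   /* half-open: [x0, x1) x [y0, y1) */

/* Neighborhood of a centered, odd-sized k x k filter: (k - 1) / 2 on all sides. */
static neighborhood_t filter_neighborhood(int k)
{
    assert(k > 0 && k % 2 == 1);
    int r = (k - 1) / 2;
    neighborhood_t nb = { r, r, r, r };
    return nb;
}

/* Input pixels needed to compute one output region: the region itself,
 * grown by the neighborhood on each side. The result is not clamped, so it
 * may extend past the image boundary for tiles near the edges. */
static rect_t input_region(rect_t out, neighborhood_t nb)
{
    assert(nb.left >= 0 && nb.right >= 0 && nb.top >= 0 && nb.bottom >= 0);
    rect_t in = { out.x0 - nb.left, out.y0 - nb.top,
                  out.x1 + nb.right, out.y1 + nb.bottom };
    return in;
}
```

For the last example (block size (4, 1), 5x5 box), one output block covering x in [8, 12) on row 8 needs input pixels from x in [6, 14) and y in [6, 11).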
35. Additional optimization
• The user may provide two versions of the function for the user kernel
• The fast version and the flexible version
• The OpenVX implementation will only call the fast function when it’s “safe”
– The tile size is a whole-number multiple of the output tile block size
– The input neighborhood doesn’t extend beyond the boundaries of the input image
• The fast version of the function doesn’t have to check any edge conditions
– Computes efficiently without conditional checks and branches
• The flexible version needs to make the appropriate checks to handle the edge conditions
• There is a relationship between the fast function, flexible function, and border mode
– Read the spec
(Figure labels: Fast, Flexible)
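The two “safe” conditions listed on this slide can be written as a single predicate. This is a sketch of how an implementation might decide between the fast and flexible functions, not the rule from the specification; the types and the name fast_is_safe() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical descriptors for illustration; NOT the OpenVX API. */
typedef struct { int x, y, width, height; } tile_rect_t;
typedef struct {
    int block_w, block_h;                     /* output tile block size */
    int nb_left, nb_right, nb_top, nb_bottom; /* input neighborhood */
} tiling_attrs_t;

/* True when the fast function may be called on this tile:
 *  1. the tile size is a whole-number multiple of the output block size, and
 *  2. the input neighborhood stays inside the input image. */
static bool fast_is_safe(tile_rect_t t, tiling_attrs_t a, int image_w, int image_h)
{
    assert(a.block_w > 0 && a.block_h > 0);
    bool whole_blocks = (t.width % a.block_w == 0) &&
                        (t.height % a.block_h == 0);
    bool inside = (t.x - a.nb_left >= 0) &&
                  (t.y - a.nb_top >= 0) &&
                  (t.x + t.width + a.nb_right <= image_w) &&
                  (t.y + t.height + a.nb_bottom <= image_h);
    return whole_blocks && inside;
}
```

Tiles that fail either test would go to the flexible function, which performs its own edge checks.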