Unistroke and multistroke gesture recognizers have always striven to be robust to the variations encountered when people issue gestures by hand on touch surfaces or with sensing devices. For this purpose, successful stroke recognizers rely on a gesture recognition algorithm that satisfies a series of invariance properties, such as stroke-order, stroke-number, stroke-direction, position, scale, and rotation invariance. Before initiating any recognition activity, these algorithms ensure these properties by performing several pre-processing operations. These operations add computational cost to the recognition process, as well as a potential error bias. To cope with this problem, we introduce an algorithm that ensures all these properties analytically instead of statistically, based on vector algebra. Instead of points, the recognition algorithm works on vectors between vectors. We demonstrate that this approach not only eliminates the need for these pre-processing operations but also supports an entire class of structure-preserving transformations.
Paper available at https://dial.uclouvain.be/pr/boreal/en/object/boreal%3A217006
2. Vector-based, Structure Preserving Stroke Gesture Recognition
DMSVIVA’2019 (Lisbon, Portugal, July 8th-9th, 2019)
Nathan Magrofuoco1, Paolo Roselli2, Jorge Luis Perez-Medina1,3,
Jean Vanderdonckt1, Santiago Villarreal1
1LouRIM, Université catholique de Louvain, Belgium
2Università degli Studi di Roma “Tor Vergata”, Roma, Italy
3Universidad de Las Américas, Quito, Ecuador
3. Background on Stroke Gesture Recognition
• Two families of approaches
• Specific: tied to a particular gesture set
• Machine Learning, SVM, Hidden Markov Models, Neural
networks,…
• Two main limitations:
• Need to re-train, re-model if gesture set is modified
• Overfitting problem
• Generic: independent of any gesture set
• Nearest-Neighbor-Classification (NNC)
• Pattern matching
• Two advantages
• No need to re-train or re-model if the gesture set is modified
• No overfitting
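The generic, NNC-style approach above can be sketched in a few lines: classify a candidate stroke by its distance to each stored template, with no training phase. This is a minimal illustration, not the paper's algorithm; the function names and the per-point Euclidean distance are ours, and strokes are assumed to be equally sampled.

```python
import math

def stroke_distance(a, b):
    """Average point-wise Euclidean distance between two
    equally sampled strokes (illustrative distance only)."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def classify_1nn(candidate, templates):
    """1-NN: return the label of the nearest training template.
    Adding or removing a template needs no re-training."""
    return min(templates, key=lambda t: stroke_distance(candidate, t[1]))[0]

templates = [
    ("line", [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]),
    ("diag", [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]),
]
print(classify_1nn([(0.0, 0.1), (0.5, 0.1), (1.0, 0.1)], templates))  # "line"
```

Extending the gesture set is just appending another `(label, points)` pair to `templates`, which is exactly the advantage the slide highlights over retraining a specific classifier.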
4. • Invariance properties
• Variation of strokes => stroke clustering
Stroke invariance = independence of any combination of
strokes
• Variation of directions => direction interpretation
Direction invariance = independence of any direction in
gesture recognition
5. • Invariance properties
• Variation of sampling => re-sampling needed
Sampling invariance = independence of any sampling
Source: J.O. Wobbrock, A.D. Wilson, Y. Li, Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes, Proc. of UIST’2007
6. • Invariance properties
• Variation of scale => different sizes depending on the
surface of the platform: rescaling may be needed
Scale invariance = independence of any size/scale
7. • Invariance properties
• Variation of location => different locations depending
on the surface of the platform: translation may be
needed
Translation invariance = independence of any location
8. • Invariance properties
• Variation of angle => different orientations depending
on position
Rotation invariance = independence of any rotation
9. Nearest-Neighbor-Classification (NNC)
• Pre-processing steps to ensure invariance
• Re-sampling
• Equal spacing between points: isometricity
• Equal time between points: isochronicity
• Same number of points: isoparameterization
• Re-Scaling
• Normalisation of the bounding box into [0..1]x[0..1] square
• Rotation to reference angle
• Rotate to 0°
• Re-rotating and distance computation
• Distance computed between candidate gesture and
reference gestures (1-NN)
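The pre-processing steps listed above can be sketched as follows. This is a simplified, $1-style sketch under our own assumptions (strokes as lists of `(x, y)` tuples, a fixed resampling count `n`); the function names are illustrative, not the published code.

```python
import math

def resample(points, n=64):
    """Re-sampling: n equidistantly spaced points (isometricity)."""
    path_len = sum(math.dist(points[i - 1], points[i])
                   for i in range(1, len(points)))
    step, acc = path_len / (n - 1), 0.0
    pts, out, i = list(points), [points[0]], 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if acc + d >= step:
            # Interpolate a new point exactly one 'step' along the path.
            t = (step - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # continue measuring from the new point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:       # guard against floating-point shortfall
        out.append(pts[-1])
    return out

def scale_to_unit(points):
    """Re-scaling: normalize the bounding box into [0..1]x[0..1]."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    w, h = (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0
    return [((x - min(xs)) / w, (y - min(ys)) / h) for x, y in points]
```

These are exactly the steps the vector-based approach of this paper aims to make unnecessary.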
10. Nearest-Neighbor-Classification (NNC)
• Two families of approaches
• “Between points” distance
• $-Family recognizers: $1, $3, $N, $P, $P+,
$V, $Q,…
• Variants and optimizations: ProTractor,
Protactor3D,…
• “Vector between points” distance
• PennyPincher, JackKnife,…
• A third new family of approaches
• “Vector between vectors” distance:
this paper!
11. • Definition of a basic gesture as a vector
• From p1 to p2, create vector 𝑢
• From p2 to p3, create vector 𝑣
• By derivation, create vector 𝑏 = 𝑢 + 𝑣
• Note that −(𝑢 + 𝑣) is the inverse vector
[Figure: points p1, p2, p3 with vectors 𝑢 (p1 → p2), 𝑣 (p2 → p3), their sum 𝑏 = 𝑢 + 𝑣, and the inverse −(𝑢 + 𝑣)]
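The basic-gesture definition above translates directly into code: three consecutive points yield the vector pair (𝑢, 𝑣) and the derived third side 𝑏 = 𝑢 + 𝑣 of the local triangle. A minimal sketch (function name is ours):

```python
def vectorize(p1, p2, p3):
    """Build the vector pair (u, v) for three consecutive points,
    plus the derived vector b = u + v (the triangle's third side)."""
    u = (p2[0] - p1[0], p2[1] - p1[1])  # p1 -> p2
    v = (p3[0] - p2[0], p3[1] - p2[1])  # p2 -> p3
    b = (u[0] + v[0], u[1] + v[1])      # p1 -> p3, i.e. u + v
    return u, v, b

u, v, b = vectorize((0, 0), (1, 0), (1, 1))
print(u, v, b)  # (1, 0) (0, 1) (1, 1)
```

The inverse vector −(𝑢 + 𝑣) is simply `(-b[0], -b[1])`.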
12. • Local Shape Distance between 2 triangles based on similarity
[Figure: the LSD formula ("the only simple formula to compute") applied to the vector pairs (𝑎, 𝑏) and (𝑢, 𝑣), whose triangles have third sides 𝑎 + 𝑏 and 𝑢 + 𝑣]
13. • Step 1. Vectorization: for each triple of three consecutive points, create a pair of vectors
[Figure: training gesture sampled as points p1…p6, candidate gesture sampled as points q1…q6]
14. • Step 1 (continued). Vectorization: for each triple of three consecutive points, create a pair of vectors
[Figure: the same training (p1…p6) and candidate (q1…q6) gestures, with every triple of consecutive points turned into a pair of vectors]
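Step 1 over a whole stroke can be sketched as a sliding window of three consecutive points, each triple yielding one vector pair. The function name and data layout are ours:

```python
def to_vector_pairs(points):
    """Step 1 (vectorization): for every triple of consecutive points
    (p[i], p[i+1], p[i+2]), build the vector pair
    (p[i] -> p[i+1], p[i+1] -> p[i+2])."""
    pairs = []
    for i in range(len(points) - 2):
        p1, p2, p3 = points[i], points[i + 1], points[i + 2]
        pairs.append((
            (p2[0] - p1[0], p2[1] - p1[1]),   # u
            (p3[0] - p2[0], p3[1] - p2[1]),   # v
        ))
    return pairs

stroke = [(0, 0), (1, 0), (2, 1), (3, 3)]
print(len(to_vector_pairs(stroke)))  # 2 pairs for 4 points
```

A stroke of n points thus yields n − 2 local triangles, which is why the slides pair p1p2p3 with q1q2q3, p2p3p4 with q2q3q4, and so on.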
18. • Step 4. Summing all individual figures into the final one
• Step 5. Iterate for every training gesture
[Figure: the training gesture (p1…p6) is matched against the candidate gesture (q1…q6), triangle by triangle:]
(N)LSD(p1p2p3, q1q2q3) = 0.02
(N)LSD(p2p3p4, q2q3q4) = 0.04
(N)LSD(p3p4p5, q3q4q5) = 0.0001
(N)LSD(p4p5p6, q4q5q6) = 0.03
Sum = 0.02 + 0.04 + 0.0001 + 0.03 = 0.0901 (indicative figures)
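Steps 4 and 5 can be sketched as a sum of per-triangle distances followed by a 1-NN scan over the training set. Note the caveat: `local_shape_distance` below is a stand-in that compares the side lengths of the two local triangles, NOT the published LSD formula; the structure of steps 4 and 5 is what the sketch illustrates, and all names are ours.

```python
import math

def local_shape_distance(pair_a, pair_b):
    """Stand-in for the paper's LSD: compares the two local
    triangles by the lengths of their sides (|u|, |v|, |u+v|).
    NOT the published formula; illustrative only."""
    def sides(u, v):
        return (math.hypot(*u), math.hypot(*v),
                math.hypot(u[0] + v[0], u[1] + v[1]))
    sa, sb = sides(*pair_a), sides(*pair_b)
    return sum((x - y) ** 2 for x, y in zip(sa, sb))

def gesture_distance(candidate_pairs, training_pairs):
    """Step 4: sum all individual triangle distances into one figure."""
    return sum(local_shape_distance(a, b)
               for a, b in zip(candidate_pairs, training_pairs))

def recognize(candidate_pairs, training_set):
    """Step 5: iterate over every training gesture, keep the nearest."""
    return min(training_set,
               key=lambda t: gesture_distance(candidate_pairs, t[1]))[0]
```

With the slide's indicative figures, step 4 would produce 0.02 + 0.04 + 0.0001 + 0.03 = 0.0901 for this training gesture, and step 5 would repeat the computation for every other training gesture before picking the minimum.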
19. • Normalized Local Shape Distance between 2 triangles based on similarity
• LSD is not symmetric => anisotropic distance
• NLSD is symmetric => isotropic distance
𝑁𝐿𝑆𝐷((𝑎, 𝑏), (𝑐, 𝑑)) = 𝐿𝑆𝐷((𝑎/‖𝑎‖, 𝑏/‖𝑏‖), (𝑐/‖𝑐‖, 𝑑/‖𝑑‖))
20. Discussion
• Advantages
• !FTL matches $P, the state-of-the-art recognizer, in
recognition rate on this gesture set
• !FTL(LSD) is 4 times faster, and !FTL(NLSD) 3 times faster, than $P
• Position, Scale, Rotation invariances
• Are algebraically guaranteed due to vector-based approach
• Can be controlled on-demand
• No need to perform pre-processing such as
• Normalization
• Re-scaling
• Re-rotation