PIRF-Nav:
An Online Incremental Appearance-
 based Localization and Mapping in
            Dynamic Environments
                  Aram Kawewong

                    Hasegawa Laboratory
     Department of Computational Intelligence and Systems Science
      Interdisciplinary Graduate School of Science and Engineering
                      Tokyo Institute of Technology




                                 1
Introduction to SLAM
 Simultaneous Localization and Mapping, or SLAM, is a
  navigation capability needed by every kind of mobile
  robot
 In an unfamiliar environment, the robot must be able
  to perform two important tasks simultaneously
   Mapping a new place if it has never been visited
    previously
   Localizing itself to a mapped place if it has been
    visited before




                                2
Appearance-based Localization and
      Mapping (FAB-MAP)




                3
Why Visual SLAM? What are the
            Challenges?
 Why don’t we just use GPS?
   GPS is not always reliable in a crowded city centre
   GPS can only locate the coordinate/position of the agent, not
    the corresponding scene; how could a robot answer the
    question “look at this picture and tell me where it is?” or “have
    you ever visited this place before? Can you describe the
    nearby places?”
 No false positives (false negatives are acceptable)
   If the robot is not confident, it should answer “this is a new
    place”. If the robot answers “this place is the same place as
    the place ….”, it must be 100% correct.
   100% precision (all answers must be correct)
                                  4
Appearance-based Localization and
     Mapping vs. Place Recognition
                  Place Recognition            Localization and Mapping
                  (Computer Vision)            (Robotics)

Input Images      All testing images are       Every input image is a testing image;
                  known to come from           it might come from somewhere in the
                  somewhere in the map         map, or it might be a previously
                                               unseen place

Environment       Closed environment           Open environment

Precision         Precision-1 is not the       Precision-1 is the first-priority
                  main concern if the recall   concern; a single false positive may
                  rate is reasonably high      lead to a serious navigation error


                                      5
Appearance-based SLAM’s Common
           Objectives
 100% Precision with very high Recall Rates
 Can run incrementally in an online manner
 Life-long operation
   Low computation time
   Low memory consumption
 Suitable for navigating in large-scale environments
 Can solve 2 main problems:
   Dynamic changes
   Perceptual aliasing (different places that look similar)
 Note:
   Coordinate-based Localization is not required here

                                   6
Visual SLAM’s Related Works


1.    FAB-MAP (Cummins & Newman, IJRR’08)
      At 100% precision, the recall rate of FAB-MAP (a state-of-the-art
       method) is still not very high.
      An offline dictionary-generation process is required.
2. Fast Incremental Bag-of-Words (Angeli et al., T-RO’08)
      The system can run incrementally; no offline dictionary-generation
       process is needed.
      Accuracy is reported to be less than or equal to that of FAB-MAP
      Consumes much more memory than FAB-MAP
                                                                          7
What Do We Want? PIRF-Nav’s
                Advantages
                                     FAB-MAP           Inc. BoW        PIRF-Nav
                                     (IJRR’08)         (T-RO’08)       (prop.)
Ability to run incrementally
without an offline dictionary            No               Yes            Yes
generation process
Memory consumption                       Low             High          Moderate
Ability to run in real time              Yes              Yes            Yes
Robustness against dynamic             Moderate           Low            High
changes*                               (~40% on         (~20% on       (~85% on
                                      City Centre)     City Centre)   City Centre)

                  * Recall rate measured at 100% precision
                                           8
Basic Idea & Concept of PIRF-Nav


  Making use of PIRFs, we can detect good landmarks for
   each individual place
  The extracted PIRFs should be sufficiently informative to
   represent a place, so that the system does not need a
   pre-generated visual vocabulary
  The number of PIRFs is small enough for use in
   real-time applications
  Because PIRFs are robust against dynamic changes of
   scenes, the PIRF-based visual SLAM (called PIRF-Nav)
   becomes an efficient online, incremental visual SLAM
                                                                 9
Basic Idea of PIRFs (proposed)


 Outdoor scenes generally include distant objects
  whose appearance is robust against changes in
  camera position
 Averaging the “slow-moving” local features that
  capture such objects gives us fewer but more robust
  features




                           10
PIRF Extraction Algorithm

[Figure: an image sequence and the corresponding sequence of matching index
vectors; a sliding window of size w = 3 moves over the sequence, and features
whose matching indices stay linked across the whole window are kept as PIRFs.]
(A code sketch of this extraction step is given below.)

                                          11
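The following is a minimal Python sketch of the sliding-window idea from the
two slides above. It assumes SIFT descriptors have already been extracted for
each frame; the ratio-test matcher, the function names, and all parameter
values are illustrative assumptions, not the exact implementation used in the
thesis.

import numpy as np

def match_indices(desc_a, desc_b, ratio=0.8):
    # For each descriptor in desc_a, return the 1-based index of its nearest
    # neighbour in desc_b, or 0 if the ratio test fails (0 = "no match",
    # as in the matching index vectors of the figure above).
    idx = np.zeros(len(desc_a), dtype=int)
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        if len(order) >= 2 and dists[order[0]] < ratio * dists[order[1]]:
            idx[i] = order[0] + 1
    return idx

def extract_pirfs(descriptors, w=3):
    # descriptors: list of (n_i x 128) SIFT descriptor arrays, one per frame.
    # A PIRF is the average of the descriptors of a feature that stays matched
    # across all w consecutive frames of the sliding window ("slow-moving").
    pirfs = []
    for start in range(len(descriptors) - w + 1):
        window = descriptors[start:start + w]
        chains = [match_indices(window[k], window[k + 1]) for k in range(w - 1)]
        for f0 in range(len(window[0])):
            trail, cur = [f0], f0
            for k in range(w - 1):
                nxt = chains[k][cur]
                if nxt == 0:            # chain broken: feature moved or vanished
                    break
                cur = nxt - 1
                trail.append(cur)
            if len(trail) == w:         # survived the whole window
                pirfs.append(np.mean([window[k][trail[k]] for k in range(w)],
                                     axis=0))
    return np.array(pirfs)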
Briefly on PIRF’s Performance
 Exp. 1 Scenes From Suzukakedai




    Training (640x428)   Testing (640x428)
           580                 489

 Exp. 2 Scenes From O-okayama



    Training (640x428)   Testing (640x428)
           450                 493           12
PIRF’s Performance
    Recognition Rate of Suzukakedai and O-okayama

[Bar chart: recognition rates on the Suzukakedai and O-okayama datasets for
several descriptors. The best rates, 93.46% and 77.48% on the two datasets,
are markedly higher than those of the alternatives, which range from roughly
18% to 46%.]

                                           13
Even With These Strong Changes,
     PIRF Still Works Well !!!

      Highly Dynamic Changes in Scenes

      Illumination Changes in Scenes

                          14
PIRF (City Centre Dataset)

               Original Descriptors (SIFT)

   Position-invariant Robust Feature (PIRF) (proposed)

                                                         15
PIRF-Nav Processing Diagram (prop.)

Overall Processing Diagram
 Step 1: Perform simple feature
  matching. The score is
  calculated with the popular
  term frequency – inverse
  document frequency (tf-idf)
  weighting
 Steps 2-3: Adapt the score by
  considering the neighbors and
  then performing normalization
 Step 4: Perform a second
  integration over the score
  space for re-localization
                                      16
Notation Definition

 At time t, the map of the environment is a collection of
  n_t discrete and disjoint locations
                      $M^t = \{L_1, \ldots, L_{n_t}\}$
 Each of these locations $L_i$, which has been created
  from a past image $I_i$, has an associated model $M_i$
 The model $M_i$ is a set of PIRFs
  (a minimal data-structure sketch is given below)

                                   17
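As a reading aid, here is a small Python sketch of the notation above, assuming
PIRF descriptors are stored as numpy arrays; the class and field names are
illustrative, not taken from the thesis.

from dataclasses import dataclass, field
import numpy as np

@dataclass
class Location:
    # One discrete location L_i in the map, created from a past image I_i.
    image_id: int
    model: np.ndarray        # M_i: (num_pirfs x 128) array of PIRF descriptors

@dataclass
class Map:
    # The map M^t: the collection {L_1, ..., L_{n_t}} built so far.
    locations: list = field(default_factory=list)

    def add_location(self, image_id, pirfs):
        self.locations.append(Location(image_id, pirfs))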
STEP 1: Simple Feature Matching

 The current model $M_t$ is compared to each of the
  mapped models $\{M_0, \ldots, M_{n_t}\}$ using standard
  feature matching with a distance threshold
 Each matching outputs a similarity score s
 $M_0$ is the model of location $L_0$, a virtual
  location for the event “no loop closure occurred at
  time t”
 Based on the obtained scores s, the system proceeds to
  the next step if $\arg\max(s) \neq 0$
  (an illustrative matching sketch is given below)

                            18
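A minimal sketch of the nearest-neighbour matching in STEP 1, assuming each
model is a numpy array of PIRF descriptors; the threshold value and the
function name are placeholders, not the thesis's actual settings.

import numpy as np

def match_models(query_model, mapped_model, dist_thresh=0.6):
    # For each PIRF in the query model, find its nearest neighbour in the
    # mapped model and keep the pair only if the distance is below the
    # threshold. Returns the list of matched (query_idx, mapped_idx) pairs.
    pairs = []
    for i, p in enumerate(query_model):
        dists = np.linalg.norm(mapped_model - p, axis=1)
        j = int(np.argmin(dists))
        if dists[j] < dist_thresh:
            pairs.append((i, j))
    return pairs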
STEP 1: Simple Feature Matching
            (Continued)

 The similarity score s is calculated using the term
  frequency – inverse document frequency (tf-idf) weighting
  (Sivic & Zisserman, ICCV’03):
                     $\text{tf-idf} = \frac{n_{wi}}{n_i} \log \frac{N}{n_w}$
 $n_{wi}$ is the number of occurrences of visual word w in $M_i$
 $n_i$ is the total number of visual words in $M_i$
 $n_w$ is the number of models containing word w
 N is the total number of existing models

                                19
STEP 1: Simple Feature Matching
            (Continued)

 To be used with PIRFs, the function is converted to
                     $s_i = \sum_{k=1}^{m_i} \log \frac{n_t}{n_k}$
 $n_k$ is the number of models $M_j$, $0 \leq j \leq n_t$, $j \neq i$,
  containing PIRFs that match the k-th PIRF of the input
  model $M_t$
 $m_i$ is the number of all matched PIRFs between the input
  and the query model $M_i$
 The system proceeds to STEP 2 if and only if the maximum
  score does not belong to $M_0$ and is greater than a threshold
  $T_1$ (a scoring sketch is given below)
                                20
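A hedged sketch of the score above, assuming we already know, for each matched
PIRF, how many other models also contain a matching PIRF; function names and
the threshold value are illustrative.

import numpy as np

def pirf_tfidf_score(matched_pirf_model_counts, n_locations):
    # s_i = sum_{k=1..m_i} log(n_t / n_k): n_k is the number of other models
    # containing a PIRF matching the k-th matched PIRF, and n_t (n_locations)
    # is the number of locations currently in the map.
    counts = np.asarray(matched_pirf_model_counts, dtype=float)
    counts = np.maximum(counts, 1.0)          # guard against zero counts
    return float(np.sum(np.log(n_locations / counts)))

def step1_decision(scores, T1=1.0):
    # Proceed to STEP 2 only if the best-scoring model is not the virtual
    # "no loop closure" model M_0 and its score exceeds the threshold T1
    # (T1 = 1.0 is a placeholder value).
    best = int(np.argmax(scores))
    return best if (best != 0 and scores[best] > T1) else None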
STEP 2: Considering Neighbors

 Accepting or rejecting a loop-closure detection based on the
  score of only a single image is sensitive to noise
 This can be handled by considering the similarity scores of
  neighboring image models:
                $\beta_i = \sum_{k=i-d}^{i+d} s_k \cdot p_t(i, k)$
 The term $p_t(i, k)$ is the transition probability generated
  from a Gaussian on the distance in time between i and k
 $d$ stands for the number of neighbors examined
  (a smoothing sketch is given below)
                                  21
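A minimal sketch of the STEP 2 smoothing, assuming the scores of all models are
stored in a numpy array ordered by time; the window size d and the Gaussian
width are placeholder values.

import numpy as np

def smooth_scores(s, d=2, sigma_t=1.0):
    # beta_i = sum_{k=i-d}^{i+d} s_k * p_t(i, k), where p_t(i, k) is a Gaussian
    # weight on the time distance |i - k|, normalized over the window so that
    # it behaves like a transition probability.
    s = np.asarray(s, dtype=float)
    beta = np.zeros_like(s)
    for i in range(len(s)):
        ks = np.arange(max(0, i - d), min(len(s), i + d + 1))
        w = np.exp(-((i - ks) ** 2) / (2.0 * sigma_t ** 2))
        beta[i] = np.sum(s[ks] * (w / w.sum()))
    return beta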
STEP 3: Normalizing the Score
                   Done by considering the
                    standard deviation σ and
                    mean value μ over all scores
                   $l_n$ indicates the number of
                    neighbours taken into
                    consideration
                   The beta-scores are
                    converted into normalized
                    scores according to the
                    equation
                     $z_i = \begin{cases} \frac{\beta_i - \sigma}{\mu}, & \text{if } \beta_i \geq \lambda \\ 1, & \text{otherwise} \end{cases}$
                    where $\lambda = \mu + \sigma$
                    (a normalization sketch is given below)
                                                         22
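A sketch of the STEP 3 normalization as reconstructed above; the exact form of
the ratio is inferred from the slide's wording and from Angeli et al.'s
normalization scheme, so treat it as an assumption.

import numpy as np

def normalize_scores(beta):
    # z_i = (beta_i - sigma) / mu  if beta_i >= mu + sigma, else 1,
    # with mu and sigma the mean and standard deviation over all beta-scores.
    beta = np.asarray(beta, dtype=float)
    mu, sigma = float(np.mean(beta)), float(np.std(beta))
    z = np.ones_like(beta)
    above = beta >= (mu + sigma)
    z[above] = (beta[above] - sigma) / mu   # assumes mu > 0
    return z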
STEP 4: Re-localization

 The obtained location $L_j$ is accepted as a loop
  closure if $z_j - \sigma > T_2$
 Ideally, the neighboring model scores of location $L_j$
  should decrease symmetrically from its model score.
  However, scenes in dynamic environments usually
  contain moving objects that frequently cause
  occlusions, so the scores around the assigned location
  may not be symmetrical.

                           23
Step 4: Re-localization (Sample
          Problems)
                    The location assigned in
                    Step 3 does not have a
                    symmetrical score




                    Performing one more
                    summation can shift the
                    assignment to the correct
                    location


               24
STEP 4: Re-Localization

 Therefore, we perform a second summation over
  the neighbouring model scores to achieve a more
  accurate localization
                $z'_j = \sum_{k=j-d}^{j+d} z_k \cdot p_t(j, k)$

 The normalized scores obtained for all possible models
  determine the most likely loop-closure location $L_c$,
  where $c = \arg\max_j z'_j$
  (a re-localization sketch is given below)
                                  25
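A sketch of the STEP 4 second summation and final argmax, reusing the same
Gaussian weighting as in STEP 2; the window size and Gaussian width are
placeholder values.

import numpy as np

def relocalize(z, d=2, sigma_t=1.0):
    # z'_j = sum_{k=j-d}^{j+d} z_k * p_t(j, k); the loop-closure location is
    # the index c = argmax_j z'_j (index 0 would mean "new place").
    z = np.asarray(z, dtype=float)
    z2 = np.zeros_like(z)
    for j in range(len(z)):
        ks = np.arange(max(0, j - d), min(len(z), j + d + 1))
        w = np.exp(-((j - ks) ** 2) / (2.0 * sigma_t ** 2))
        z2[j] = np.sum(z[ks] * (w / w.sum()))
    return int(np.argmax(z2)), z2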
Results & Experiments : DATASETS


 Three datasets have been used
   City Centre (2474 images of size 640 x 480)
       The dataset was collected to address the problem of dynamic changes of
       scenes in a city centre.
   New College (2146 images of size 640 x 480)
       The dataset was collected to address the problem of perceptual aliasing:
       the robot visited the same place many times, and many different places
       look very similar.
   Suzukakedai (1079 images of size 1920 x 1080)
       The dataset was captured with a video camera fitted with an omnidirectional
       lens. It was collected to address the problem of highly dynamic changes,
       where a different event was being held (i.e., an open-campus event).



                                     26
Results & Experiments: DATASETS


  City Centre




                 27
Results & Experiments: DATASETS


  New College




                 28
Results & Experiments: DATASETS


  Suzukakedai




                 29
Results & Experiments: BASELINE


  Among the many visual SLAM methods, FAB-MAP (Cummins &
   Newman, IJRR’08) and the fast incremental BoW method
   of Angeli et al. (T-RO’08) are considered state-of-the-art.
  Both are based on the bag-of-words scheme
  Each offers different advantages
    FAB-MAP → high accuracy, but requires offline dictionary generation
    Angeli et al. → accuracy lower than or equal to FAB-MAP, but
     with online, incremental dictionary generation
  PIRF-Nav must offer higher accuracy than FAB-MAP while
   being an online, incremental method like Angeli et al.

                                30
Evaluation on Appearance-based
      Loop-closure Detection Problem

[Diagram: for each input image the system first makes a binary decision,
“Loop-closing?” (new place / old place). If no, the new place is added to the
map; if yes, the loop-closure place is found (an image-retrieval problem) and
the loop-closure location is output.]

        $\text{Precision}_A = \dfrac{\text{Correct loop-closures}}{\text{All detected loop-closures}}$

        $\text{Recall}_A = \dfrac{\text{Correct loop-closures}}{\text{All labeled loop-closures}}$

        $\text{Precision}_B = \dfrac{\text{Correctly retrieved images}}{\text{All retrieved images}}$

        $\text{Recall}_B = \dfrac{\text{Correctly retrieved images}}{\text{All labeled images}}$

  (a small evaluation sketch is given below)
                                                  31
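A tiny sketch of the precision/recall computations on the slide above; both
evaluation A (binary loop-closure decision) and evaluation B (image retrieval)
use the same ratio, only the counts differ.

def precision_recall(num_correct, num_reported, num_labeled):
    # Precision = correct / reported, Recall = correct / labeled.
    # For evaluation A the counts refer to loop-closure decisions;
    # for evaluation B they refer to retrieved images.
    precision = num_correct / num_reported if num_reported else 1.0
    recall = num_correct / num_labeled if num_labeled else 0.0
    return precision, recall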
Evaluation on Appearance-based
Loop-closure Detection Problem


 Strictly, performance should be evaluated with two graphs:
   Precision A – Recall A curve
   Precision B – Recall B curve
 However, for a compact representation, most works in
  visual SLAM use the Precision B – Recall B curve to show
  performance because
   The binary classification is currently not particularly problematic
   The important challenge lies in the performance of image
    retrieval

                                   32
Evaluation on Appearance-based
Loop-closure Detection Problem
          (City Centre)
                         Precision A – Recall A:
                    Focusing only on the problem of
                     saying “YES/NO” to loop-closure
                      detection is currently trivial




                      Precision B – Recall B:
                Instead, given that the precision of
                    the “YES/NO” loop-closure
                 decision is 100%, it is much more
               interesting to see how accurately the
                 system can retrieve the
                       corresponding image
               33
Result 1: City Centre
                         Vehicle Trajectory
                         Loop Closure Detection




PIRF-Nav (100% Precision) (proposed)                   FAB-MAP (100% Precision)
                                                  34
Result 1 : City Centre (Precision-Recall
                 Curve)




                   35
Result 1: City Centre
            (Computation Time)




                                                                         36
*It is noteworthy that all PIRF-Nav programs were written in MATLAB,
while FAB-MAP was written in C.
Result 2: New College
      Vehicle Trajectory
      Loop Closure Detection




PIRF-Nav (100% Precision) (proposed)     FAB-MAP (100% Precision)
                                         37
Result 2: New College (Precision-
          Recall Curve)




                 38
Result 3: Suzukakedai

Vehicle Trajectory
Loop Closure Detection

              PIRF-Nav (100% Precision)

                               39
Result 3: Suzukakedai (Precision-
           Recall Curve)




                                    40
Result 4: Combined Datasets
              (Precision-Recall Curve)




                                                                                                    41
Note: We did not test FAB-MAP in this experiment because FAB-MAP completely failed on the Suzukakedai
dataset. The results on City Centre and New College also clearly imply that FAB-MAP would not achieve
better accuracy in this experiment.
Sample Matched Images (Dynamical
  Changes in Major Part of Scene)




                42
Sample Matched Images (Different
         View-Points)




               43
Conclusions


 PIRF-Nav outperforms FAB-MAP in terms of accuracy, with
  more than 80% recall at 100% precision on all datasets
  provided by the authors
 PIRF-Nav offers the ability to run online and incrementally in
  very different environments
 Although PIRF-Nav is slower than FAB-MAP at the same
  image scale, it compensates for this drawback by processing
  at a smaller image scale, where its accuracy is still
  considerably higher than FAB-MAP’s

                              44
Thank you for Your Kind
        Attention
“DOUBT IS THE FATHER OF INVENTION”

                                     QUOTED BY GALILEO




                       45
Publication


 Journal
   1.   A. Kawewong and O. Hasegawa, "Classifying 3D Real-World Texture Images by
        Combining Maximum Response 8, 4th Order of Auto Correlation and Colortons, " Jour.
        of Advanced Comp. Intelligence and Intelligent Informatics, vol. 11, no. 5, 2007.
   2.   A. Kawewong, Y. Honda, M. Tsuboyama, and O. Hasegawa, "Reasoning on the Self-
        Organizing Incremental Associative Memory for Online Robot Path Planning," IEICE
        Trans. Inf. & Sys., vol. E93-D, no. 3, 2009. (impact factor 0.369)
   3.   Y. Honda, A. Kawewong, M. Tsuboyama, and O. Hasegawa, "Acquisition of Place Cells and
        Autonomous Robot Movement Control Using Semi-Supervised Neural Networks" (in Japanese),
        IEICE Trans. D, 2009. (accepted for publication)
   4.   A. Kawewong, N. Tongprasit, S. Tangruamsub and O. Hasegawa, “Online and
        Incremental Appearance-based SLAM in Highly Dynamic Environments, " Int’l Jour.
        Robotics Research (IJRR). (To Appear in 2010, impact factor 2.882, rank#1 in robotics)
   5.   A. Kawewong, S. Tangruamsub and O. Hasegawa, “Position-Invariant Robust Features
        for Long-term Recognition of Dynamic Outdoor Scenes," IEICE Trans. Inf. & Sys.
        (conditional accepted)


                                             46
Publication


 Conferences
  1.   A. Kawewong and O. Hasegawa, "3D Texture Classification by Using Pre-testing
       Stage and Reliability Table, " IEEE Proc. International Conference on Image
       Processing (ICIP), (2005).
  2.   A. Kawewong and O. Hasegawa, "Combining Rotationally Variant and Invariant
       Features Based on Between-Class Error for 3D Texture Classification, " IEEE Int’l
       Conf. On Computer Vision (ICCV) Workshop, 2005.
  3.   A. Kawewong, Y. Honda, M. Tsuboyama, O. Hasegawa, "A Common-Neural-
       Pattern Based Reasoning for Mobile Robot Cognitive Mapping, " In Proc. Int’l
       Conf. Neural Information Processing (ICONIP), 2008.
  4.   A. Kawewong, Y. Honda, M. Tsuboyama, O. Hasegawa, "Common-Patterns Based
       Mapping for Robot Navigation, " in Proc. IEEE Int’l Conf. Robotics and Biomimetics
       (ROBIO), 2008.
  5.   S. Tangruamsub, M. Tsuboyama, A. Kawewong and O. Hasegawa, "Mobile Robot
       Vision-Based Navigation Using Self-Organizing and Incremental Neural
       Networks," in Proc. Int’l Joint Conf. Neural Networks (IJCNN), 2009.

                                          47
Publication


 Conferences
  6. A. Kawewong, S. Tangruamsub, and O. Hasegawa, "Wide-baseline Visible
     Features for Highly Dynamic Scene Recognition," in Proc. Int'l Conf.
     Computer Analysis of Images and Patterns (CAIP), 2009.
  7. N. Tongprasit, A. Kawewong and O. Hasegawa, "Data Partitioning
     Technique for Online and Incremental Visual SLAM," in Proc. Int’l Conf. on
     Neural Information Processing (ICONIP), 2009. (oral & student travel award)




                                      48

More Related Content

What's hot

Robust and Efficient Coupling of Perception to Actuation with Metric and Non-...
Robust and Efficient Coupling of Perception to Actuation with Metric and Non-...Robust and Efficient Coupling of Perception to Actuation with Metric and Non-...
Robust and Efficient Coupling of Perception to Actuation with Metric and Non-...Darius Burschka
 
Deep Learning - a Path from Big Data Indexing to Robotic Applications
Deep Learning - a Path from Big Data Indexing to Robotic ApplicationsDeep Learning - a Path from Big Data Indexing to Robotic Applications
Deep Learning - a Path from Big Data Indexing to Robotic ApplicationsDarius Burschka
 
3D Shape and Indirect Appearance by Structured Light Transport
3D Shape and Indirect Appearance by Structured Light Transport3D Shape and Indirect Appearance by Structured Light Transport
3D Shape and Indirect Appearance by Structured Light TransportMatthew O'Toole
 
201109CVIM/PRMU Inverse Composite Alignment of a sphere under orthogonal proj...
201109CVIM/PRMU Inverse Composite Alignment of a sphere under orthogonal proj...201109CVIM/PRMU Inverse Composite Alignment of a sphere under orthogonal proj...
201109CVIM/PRMU Inverse Composite Alignment of a sphere under orthogonal proj...Toru Tamaki
 
SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)
SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)
SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)Matthew O'Toole
 
物体検出の歴史(R-CNNからSSD・YOLOまで)
物体検出の歴史(R-CNNからSSD・YOLOまで)物体検出の歴史(R-CNNからSSD・YOLOまで)
物体検出の歴史(R-CNNからSSD・YOLOまで)HironoriKanazawa
 
Motion capture document
Motion capture documentMotion capture document
Motion capture documentharini501
 
In tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robot
In tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robotIn tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robot
In tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robotSudhakar Spartan
 
Optical Computing for Fast Light Transport Analysis
Optical Computing for Fast Light Transport AnalysisOptical Computing for Fast Light Transport Analysis
Optical Computing for Fast Light Transport AnalysisMatthew O'Toole
 
20110326 CG・CVにおける散乱
20110326 CG・CVにおける散乱20110326 CG・CVにおける散乱
20110326 CG・CVにおける散乱Toru Tamaki
 
20110415 Scattering in CG and CV
20110415 Scattering in CG and CV20110415 Scattering in CG and CV
20110415 Scattering in CG and CVToru Tamaki
 
Temporal Frequency Probing for 5D Transient Analysis of Global Light Transport
Temporal Frequency Probing for 5D Transient Analysis of Global Light TransportTemporal Frequency Probing for 5D Transient Analysis of Global Light Transport
Temporal Frequency Probing for 5D Transient Analysis of Global Light TransportMatthew O'Toole
 
Primal-Dual Coding to Probe Light Transport
Primal-Dual Coding to Probe Light TransportPrimal-Dual Coding to Probe Light Transport
Primal-Dual Coding to Probe Light TransportMatthew O'Toole
 
IGARSS-SAR-Pritt.pptx
IGARSS-SAR-Pritt.pptxIGARSS-SAR-Pritt.pptx
IGARSS-SAR-Pritt.pptxgrssieee
 
MINIMUM ENDMEMBER-WISE DISTANCE CONSTRAINED NONNEGATIVE MATRIX FACTORIZATION ...
MINIMUM ENDMEMBER-WISE DISTANCE CONSTRAINED NONNEGATIVE MATRIX FACTORIZATION ...MINIMUM ENDMEMBER-WISE DISTANCE CONSTRAINED NONNEGATIVE MATRIX FACTORIZATION ...
MINIMUM ENDMEMBER-WISE DISTANCE CONSTRAINED NONNEGATIVE MATRIX FACTORIZATION ...grssieee
 

What's hot (18)

Robust and Efficient Coupling of Perception to Actuation with Metric and Non-...
Robust and Efficient Coupling of Perception to Actuation with Metric and Non-...Robust and Efficient Coupling of Perception to Actuation with Metric and Non-...
Robust and Efficient Coupling of Perception to Actuation with Metric and Non-...
 
Deep Learning - a Path from Big Data Indexing to Robotic Applications
Deep Learning - a Path from Big Data Indexing to Robotic ApplicationsDeep Learning - a Path from Big Data Indexing to Robotic Applications
Deep Learning - a Path from Big Data Indexing to Robotic Applications
 
3D Shape and Indirect Appearance by Structured Light Transport
3D Shape and Indirect Appearance by Structured Light Transport3D Shape and Indirect Appearance by Structured Light Transport
3D Shape and Indirect Appearance by Structured Light Transport
 
201109CVIM/PRMU Inverse Composite Alignment of a sphere under orthogonal proj...
201109CVIM/PRMU Inverse Composite Alignment of a sphere under orthogonal proj...201109CVIM/PRMU Inverse Composite Alignment of a sphere under orthogonal proj...
201109CVIM/PRMU Inverse Composite Alignment of a sphere under orthogonal proj...
 
Introduction of slam
Introduction of slamIntroduction of slam
Introduction of slam
 
Jp2516981701
Jp2516981701Jp2516981701
Jp2516981701
 
SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)
SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)
SIGGRAPH 2014 Course on Computational Cameras and Displays (part 4)
 
物体検出の歴史(R-CNNからSSD・YOLOまで)
物体検出の歴史(R-CNNからSSD・YOLOまで)物体検出の歴史(R-CNNからSSD・YOLOまで)
物体検出の歴史(R-CNNからSSD・YOLOまで)
 
Motion capture document
Motion capture documentMotion capture document
Motion capture document
 
In tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robot
In tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robotIn tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robot
In tech vision-based_obstacle_detection_module_for_a_wheeled_mobile_robot
 
Optical Computing for Fast Light Transport Analysis
Optical Computing for Fast Light Transport AnalysisOptical Computing for Fast Light Transport Analysis
Optical Computing for Fast Light Transport Analysis
 
Loop snakesiv05
Loop snakesiv05Loop snakesiv05
Loop snakesiv05
 
20110326 CG・CVにおける散乱
20110326 CG・CVにおける散乱20110326 CG・CVにおける散乱
20110326 CG・CVにおける散乱
 
20110415 Scattering in CG and CV
20110415 Scattering in CG and CV20110415 Scattering in CG and CV
20110415 Scattering in CG and CV
 
Temporal Frequency Probing for 5D Transient Analysis of Global Light Transport
Temporal Frequency Probing for 5D Transient Analysis of Global Light TransportTemporal Frequency Probing for 5D Transient Analysis of Global Light Transport
Temporal Frequency Probing for 5D Transient Analysis of Global Light Transport
 
Primal-Dual Coding to Probe Light Transport
Primal-Dual Coding to Probe Light TransportPrimal-Dual Coding to Probe Light Transport
Primal-Dual Coding to Probe Light Transport
 
IGARSS-SAR-Pritt.pptx
IGARSS-SAR-Pritt.pptxIGARSS-SAR-Pritt.pptx
IGARSS-SAR-Pritt.pptx
 
MINIMUM ENDMEMBER-WISE DISTANCE CONSTRAINED NONNEGATIVE MATRIX FACTORIZATION ...
MINIMUM ENDMEMBER-WISE DISTANCE CONSTRAINED NONNEGATIVE MATRIX FACTORIZATION ...MINIMUM ENDMEMBER-WISE DISTANCE CONSTRAINED NONNEGATIVE MATRIX FACTORIZATION ...
MINIMUM ENDMEMBER-WISE DISTANCE CONSTRAINED NONNEGATIVE MATRIX FACTORIZATION ...
 

Viewers also liked

東工大 長谷川修研の環境学習・認識・探索技術
東工大 長谷川修研の環境学習・認識・探索技術東工大 長谷川修研の環境学習・認識・探索技術
東工大 長谷川修研の環境学習・認識・探索技術SOINN Inc.
 
東工大長谷川修研紹介 2011 (8月1日版)
東工大長谷川修研紹介 2011 (8月1日版)東工大長谷川修研紹介 2011 (8月1日版)
東工大長谷川修研紹介 2011 (8月1日版)SOINN Inc.
 
ロボットによる一般問題解決
ロボットによる一般問題解決ロボットによる一般問題解決
ロボットによる一般問題解決SOINN Inc.
 

Viewers also liked (9)

東工大 長谷川修研の環境学習・認識・探索技術
東工大 長谷川修研の環境学習・認識・探索技術東工大 長谷川修研の環境学習・認識・探索技術
東工大 長谷川修研の環境学習・認識・探索技術
 
PIRF-NAV2
PIRF-NAV2PIRF-NAV2
PIRF-NAV2
 
PBAI
PBAIPBAI
PBAI
 
I
II
I
 
東工大長谷川修研紹介 2011 (8月1日版)
東工大長谷川修研紹介 2011 (8月1日版)東工大長谷川修研紹介 2011 (8月1日版)
東工大長谷川修研紹介 2011 (8月1日版)
 
SOINN-AM
SOINN-AMSOINN-AM
SOINN-AM
 
ロボットによる一般問題解決
ロボットによる一般問題解決ロボットによる一般問題解決
ロボットによる一般問題解決
 
SSA-SOINN
SSA-SOINNSSA-SOINN
SSA-SOINN
 
E-SOINN
E-SOINNE-SOINN
E-SOINN
 

Similar to Dr.Kawewong Ph.D Thesis

Practical Digital Image Processing 4
Practical Digital Image Processing 4Practical Digital Image Processing 4
Practical Digital Image Processing 4Aly Abdelkareem
 
Accommodation-invariant Computational Near-eye Displays - SIGGRAPH 2017
Accommodation-invariant Computational Near-eye Displays - SIGGRAPH 2017Accommodation-invariant Computational Near-eye Displays - SIGGRAPH 2017
Accommodation-invariant Computational Near-eye Displays - SIGGRAPH 2017StanfordComputationalImaging
 
Real-time Moving Object Detection using SURF
Real-time Moving Object Detection using SURFReal-time Moving Object Detection using SURF
Real-time Moving Object Detection using SURFiosrjce
 
Object Tracking with Instance Matching and Online Learning
Object Tracking with Instance Matching and Online LearningObject Tracking with Instance Matching and Online Learning
Object Tracking with Instance Matching and Online LearningJui-Hsin (Larry) Lai
 
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformHuman Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformFadwa Fouad
 
An Assessment of Image Matching Algorithms in Depth Estimation
An Assessment of Image Matching Algorithms in Depth EstimationAn Assessment of Image Matching Algorithms in Depth Estimation
An Assessment of Image Matching Algorithms in Depth EstimationCSCJournals
 
Video Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFTVideo Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFTIRJET Journal
 
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)npinto
 
Real Time Human Posture Detection with Multiple Depth Sensors
Real Time Human Posture Detection with Multiple Depth SensorsReal Time Human Posture Detection with Multiple Depth Sensors
Real Time Human Posture Detection with Multiple Depth SensorsWassim Filali
 
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro..."High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...Edge AI and Vision Alliance
 
5 ray casting computer graphics
5 ray casting computer graphics5 ray casting computer graphics
5 ray casting computer graphicscairo university
 
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
SLAM Zero to One
SLAM Zero to OneSLAM Zero to One
SLAM Zero to OneGavin Gao
 
Deliberately Planning and Acting for Angry Birds with Refinement Methods
Deliberately Planning and Acting for Angry Birds with Refinement MethodsDeliberately Planning and Acting for Angry Birds with Refinement Methods
Deliberately Planning and Acting for Angry Birds with Refinement MethodsRuofei Du
 

Similar to Dr.Kawewong Ph.D Thesis (20)

Practical Digital Image Processing 4
Practical Digital Image Processing 4Practical Digital Image Processing 4
Practical Digital Image Processing 4
 
Accommodation-invariant Computational Near-eye Displays - SIGGRAPH 2017
Accommodation-invariant Computational Near-eye Displays - SIGGRAPH 2017Accommodation-invariant Computational Near-eye Displays - SIGGRAPH 2017
Accommodation-invariant Computational Near-eye Displays - SIGGRAPH 2017
 
J017377578
J017377578J017377578
J017377578
 
Real-time Moving Object Detection using SURF
Real-time Moving Object Detection using SURFReal-time Moving Object Detection using SURF
Real-time Moving Object Detection using SURF
 
Object Tracking with Instance Matching and Online Learning
Object Tracking with Instance Matching and Online LearningObject Tracking with Instance Matching and Online Learning
Object Tracking with Instance Matching and Online Learning
 
December 4, Project
December 4, ProjectDecember 4, Project
December 4, Project
 
PSanthanam.ppt
PSanthanam.pptPSanthanam.ppt
PSanthanam.ppt
 
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformHuman Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
 
An Assessment of Image Matching Algorithms in Depth Estimation
An Assessment of Image Matching Algorithms in Depth EstimationAn Assessment of Image Matching Algorithms in Depth Estimation
An Assessment of Image Matching Algorithms in Depth Estimation
 
Video Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFTVideo Stitching using Improved RANSAC and SIFT
Video Stitching using Improved RANSAC and SIFT
 
Mapping mobile robotics
Mapping mobile roboticsMapping mobile robotics
Mapping mobile robotics
 
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
 
Lecture 06: Features and Uncertainty
Lecture 06: Features and UncertaintyLecture 06: Features and Uncertainty
Lecture 06: Features and Uncertainty
 
Real Time Human Posture Detection with Multiple Depth Sensors
Real Time Human Posture Detection with Multiple Depth SensorsReal Time Human Posture Detection with Multiple Depth Sensors
Real Time Human Posture Detection with Multiple Depth Sensors
 
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro..."High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
 
5 ray casting computer graphics
5 ray casting computer graphics5 ray casting computer graphics
5 ray casting computer graphics
 
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
 
2007 112
2007 1122007 112
2007 112
 
SLAM Zero to One
SLAM Zero to OneSLAM Zero to One
SLAM Zero to One
 
Deliberately Planning and Acting for Angry Birds with Refinement Methods
Deliberately Planning and Acting for Angry Birds with Refinement MethodsDeliberately Planning and Acting for Angry Birds with Refinement Methods
Deliberately Planning and Acting for Angry Birds with Refinement Methods
 

More from SOINN Inc.

Original SOINN
Original SOINNOriginal SOINN
Original SOINNSOINN Inc.
 
PhDThesis, Dr Shen Furao
PhDThesis, Dr Shen FuraoPhDThesis, Dr Shen Furao
PhDThesis, Dr Shen FuraoSOINN Inc.
 
SOIAM (SOINN-AM)
SOIAM (SOINN-AM)SOIAM (SOINN-AM)
SOIAM (SOINN-AM)SOINN Inc.
 
学生さんへのメッセージ
学生さんへのメッセージ学生さんへのメッセージ
学生さんへのメッセージSOINN Inc.
 
超高速オンライン転移学習
超高速オンライン転移学習超高速オンライン転移学習
超高速オンライン転移学習SOINN Inc.
 

More from SOINN Inc. (6)

Original SOINN
Original SOINNOriginal SOINN
Original SOINN
 
PhDThesis, Dr Shen Furao
PhDThesis, Dr Shen FuraoPhDThesis, Dr Shen Furao
PhDThesis, Dr Shen Furao
 
SOINN PBR
SOINN PBRSOINN PBR
SOINN PBR
 
SOIAM (SOINN-AM)
SOIAM (SOINN-AM)SOIAM (SOINN-AM)
SOIAM (SOINN-AM)
 
学生さんへのメッセージ
学生さんへのメッセージ学生さんへのメッセージ
学生さんへのメッセージ
 
超高速オンライン転移学習
超高速オンライン転移学習超高速オンライン転移学習
超高速オンライン転移学習
 

Recently uploaded

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Recently uploaded (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Dr.Kawewong Ph.D Thesis

  • 1. PIRF-Nav: An Online Incremental Appearance- based Localization and Mapping in Dynamic Environments Aram Kawewong Hasegawa Laboratory Department of Computational Intelligence and Systems Science Interdisciplinary Graduate School of Science and Engineering Tokyo Institute of Technology 1
  • 2. Introduction to SLAM  Simultaneous Localization and Mapping, or SLAM, is a navigation system needed for every kind of mobile robots  In the unfamiliar environment, the robot must be able to perform two important tasks simultaneously  Mapping the new place if the place has never been visited previously  Localizing itself to some mapped place if the place has been visited before 2
  • 3. Appearance-based Localization and Mapping (FAB-MAP) 3
  • 4. Why Visual SLAM ? What are the Challenging ?  Why don’t we just use GPS ?  GPS is not always reliable in the crowded city centre  GPS can only locate the coordinate/position of the agent but not the corresponded scene; how can the robot answer the question “look at this picture and tell me where it is ?” or “have you ever visited this place before ? Can you describe about the nearby places ?”  No false positive (can have false negative)  If the robot is not confident then it should answer “this is the new place”. If the robot is to answer “this place is the same place as the place ….”, it must be 100% correct.  100% precision (all answers must be correct) 4
  • 5. Appearance-based Localization and Mapping VS Place Recognition Place Recognition Localization and Mapping (Robotics) (Computer Vision) Input Images All testing images are Every input image is a testing image; it known to come from might come from somewhere in the map somewhere in the map or it might be the previously unseen place Environment Closed Environment Opened Environment Precision Precision-1 is not the Precision-1 is the first priority concern; main concern if the recall one false positive may lead the serious rate is reasonably high error in navigation 5
  • 6. Appearance-based SLAM’s Common Objectives  100% Precision with very high Recall Rates  Can run incrementally in an online manner  Life-long  Low computation time  Consume less memory  Suitable to navigate in large-scale environments  Can solve 2 main problems:  Dynamical Changes  Perceptual Aliasing (Different Places but look similar)  Note:  Coordinate-based Localization is not required here 6
  • 7. Visual SLAM’s Related Works 1. FAB-MAP (Cummins & Newman, IJRR’08)  Considering the efficiency at 100% precision, the obtained recall rate of FAB-MAP (a State-of-the-art method) is still not so high.  An offline generation process for dictionary generation is necessary. 2. Fast Incremental Bag-of-words (Angeli, et al. T-RO’08)  The system can run incrementally; offline dictionary generation process is not needed.  Accuracy is said to be less than or equal to that of FAB-MAP  Consume much higher memory than FAB-MAP 7
  • 8. What Do We Want ? :PIRF-Nav’s Advantages FAB-MAP Inc. BoW (T- PIRF-Nav (IJRR’08) RO’08) (prop.) Ability to incrementally run without needs for offline No Yes Yes dictionary generation process Memory Consumption Low High Moderate Ability to run in real-time Yes Yes Yes Robustness against dynamical Moderate Low High changes* (~40% on (~20% on City (~85% on City Centre) Centre) City Centre) * The recall rate is considered at 100% precision 8
  • 9. Basic Idea & Concept of PIRF-Nav  Making use of PIRF, we can detect the good landmarks of each individual place  The extracted PIRFs should be sufficiently informative to represent the place so that the system does not need the preliminary generated visual vocabulary  The number of PIRFs is sufficiently small to be used in the real-time application  Because the PIRF is robust against dynamical changes of scenes, the PIRF-based visual SLAM (called PIRF-Nav) become an efficient online incremental visual SLAM 9
  • 10. Basic Idea of PIRFs (proposed)  Outdoor Scenes generally include distant objects whose appearances are robust against the changes in camera position  Averaging the “slow-moving” local features which capture such objects give us the less and more robust features 10 10
  • 11. PIRF Extraction Algorithm Image Sequence 3 1 2 0 4 0 0 3 4 2 1 1 0 6 1 6 0 5 1 0 5 4 5 0 4 5 3 1 0 4 3 0 1 3 0 2 Sliding Window; w = 3 Sequence of Matching Index Vectors 11
  • 12. Briefly on PIRF’s Performance  Exp. 1 Scenes From Suzukakedai Training (640x428) Testing (640x428) 580 489  Exp. 1 Scenes From O-okayama Training (640x428) Testing (640x428) 450 493 12
  • 13. PIRF’s Performance Recognition Rate of Suzukakedai and O-okayama 93.46% 77.48% 100.00% 90.00% 80.00% 45.75% 70.00% 36.71% Suzukakedai 30.22% 31.08% O-Okayama 27.59% 60.00% 24.54% 22.29% 18.23% 50.00% 40.00% 30.00% 20.00% 10.00% 0.00% 13
  • 14. Even With These Strong Changes, PIRF Still Works Well !!! Highly Dynamic Changes in Scenes 14 Illumination Changes in Scenes
  • 15. PIRF (City Centre Dataset) Original Descriptors (SIFT) 15 Position-invariant Robust Feature (PIRF) (proposed)
  • 16. PIRF-Nav Processing Diagram (prop.) Overall Processing Diagram  Step 1: Perform simple feature matching. The score is calculated based on the popular term frequency- inverted document frequency weighting  Step 2-3: Adapt the score by considering the neighbors and then perform normalization  Step 4: Perform second integration over the score’s space for relocalization 16
  • 17. Notation Definition  At time t, a map of the environment is a collection of nt discrete and disjoint location ������ = {������1 , … , ������������ ������ }  Each of these locations ������������ , which has been created from past image ������������ , has an associated model ������������  The model ������������ is a set of PIRFs 17
  • 18. STEP 1: Simple Feature Matching  The current model ������������ is compared to each of all mapped models ������������ = {������0 , … , ������������ ������ } using standard feature matching with distant threshold ������2  Each matching outputs the similarity score s  ������0 is model of the location ������0 which is a virtual location for the event “no loop closure occurred at time t”.  Based on the obtained score s, the system proceed to the next step if ������������������������������������(������) ≠ 0 18
  • 19. STEP 1: Simple Feature Matching (Continued)  The similarity score s is calculated by considering the term frequency – inverted document frequency (tf-idf) weighting (Sivic & Zisserman, ICCV’ 03) :- ������������������ ������ tf − idf = log ������������ ������������  ������������������ is the number of occurrences of visual word w in ������������  ������������ is the total number of visual words in ������������  ������������ is the number of models containing word w  N is the total number of all existing models 19
  • 20. STEP 1: Simple Feature Matching (Continued)  To be used with PIRF, the function is then converted to mi ������������ ������������ = log k=1 ������������ ������  ������������ ������ is the number of models ������������ , 0 ≤ ������ ≤ ������������ , ������ ≠ ������ , containing PIRFs which match the kth PIRF of the input model ������������  ������������ is the number of all matched PIRFs between input and the query model  The system proceeds to STEP 2 if and only if the maximum score does not belong to ������0 and is greater than ������1 20
  • 21. STEP 2: Considering Neighbors  Accepting of rejecting loop-closure detection based on the score from only single image is sensitive to noise  This can be handled by considering the similarity score of neighboring image models as:- ������+������ ������������ = ������������ ∙ ������������ ������, ������ ������=������−������  The term ������������ ������, ������ is the transition probability generated from a Gaussian on the distance in time between i and k  ������ stands for the number of neighbors examined 21
  • 22. STEP 3: Normalizing the Score  Done by considering the standard deviation and mean value over all scores  ln indicates the number of neighbours taken into consideration  The beta-scores are converted into normalized score according to the equation ������������ − ������ , if ������������ ≥ ������ ������������ = ������ 1, Otherwise where 22 ������ = ������ + ������
• 23. STEP 4: Re-localization - The obtained location $L_j$ is accepted as a loop closure if $z_j - \lambda > T_2$ - Ideally, the scores of the models neighbouring location $L_j$ should decrease symmetrically on either side of its own score. However, scenes in dynamic environments usually contain moving objects that frequently cause occlusions, so the scores around the assigned location may not be symmetrical. 23
• 24. STEP 4: Re-localization (Sample Problems) [Figure: the location assigned in Step 3 does not have a symmetrical score distribution; performing one more summation can shift the assigned location to the correct one] 24
• 25. STEP 4: Re-localization - Therefore, we perform a second summation over the neighbouring score models to achieve a more accurate localization: $z'_i = \sum_{k=i-l}^{i+l} z_k \cdot p_t(i,k)$ - The obtained normalized scores over all possible models determine the most likely loop-closure location $L_c$, where $c = \operatorname{argmax}_i z'_i$ 25
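A compact sketch of this second summation and the final argmax is given below; it assumes the same Gaussian neighbourhood weighting and window as Step 2, which the slides do not state explicitly.

```python
import numpy as np

def relocalize(z: np.ndarray, l: int = 2, sigma: float = 1.0) -> int:
    """z'_i = sum_{k=i-l}^{i+l} z_k * p(i, k); return c = argmax_i z'_i,
    the index of the most likely loop-closure location L_c."""
    n = len(z)
    z_prime = np.zeros(n)
    for i in range(n):
        for k in range(max(0, i - l), min(n, i + l + 1)):
            z_prime[i] += z[k] * np.exp(-((i - k) ** 2) / (2.0 * sigma ** 2))
    return int(np.argmax(z_prime))
```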
• 26. Results & Experiments: DATASETS - Three datasets have been used - City Centre (2474 images, 640 x 480): taken to address the problem of dynamic changes of scenes in a city centre - New College (2146 images, 640 x 480): taken to address the problem of perceptual aliasing; in this dataset the robot visited the same places many times, and many different places look very similar - Suzukakedai (1079 images, 1920 x 1080): taken with a video camera fitted with an omnidirectional lens, to address the problem of highly dynamic changes while a special event (an open-campus event) was being held 26
• 27. Results & Experiments: DATASETS - City Centre 27
• 28. Results & Experiments: DATASETS - New College 28
• 29. Results & Experiments: DATASETS - Suzukakedai 29
• 30. Results & Experiments: BASELINE - Among many visual SLAM methods, FAB-MAP (Cummins & Newman, IJRR'08) and the fast incremental BoW method of Angeli et al. (T-RO'08) are considered the state-of-the-art - Both of them are based on the bag-of-words scheme - Each of them offers different advantages - FAB-MAP: high accuracy, with offline dictionary generation - Angeli et al.: accuracy lower than or equal to FAB-MAP's, but with online incremental dictionary generation - PIRF-Nav must offer higher accuracy than FAB-MAP while being an online incremental method like Angeli et al. 30
• 31. Evaluation on Appearance-based Loop-closure Detection Problem [Flow diagram: input image → binary classification (new place / old place). If new place: add the new place to the map. If loop-closing: image retrieval problem, retrieve the most likely place for the loop closure → output the loop-closure location] Precision_A = correct loop-closures / all detected loop-closures; Recall_A = correct loop-closures / all labelled loop-closures; Precision_B = correctly retrieved images / all retrieved images; Recall_B = correctly retrieved images / all labelled images 31
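Both pairs of metrics reduce to the same ratio computation; a trivial helper is sketched below, and the example counts in the comment are hypothetical.

```python
def precision_recall(correct: int, retrieved: int, labelled: int):
    """Precision = correct / retrieved, Recall = correct / labelled, used for both
    the A (binary loop-closure) and B (image retrieval) variants on the slide."""
    precision = correct / retrieved if retrieved else 1.0
    recall = correct / labelled if labelled else 0.0
    return precision, recall

# Hypothetical Precision_B / Recall_B for one run:
# p_b, r_b = precision_recall(correct=180, retrieved=180, labelled=220)  # -> 1.0, ~0.818
```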
• 32. Evaluation on Appearance-based Loop-closure Detection Problem - Strictly, performance should be evaluated with two curves: - Precision A – Recall A curve - Precision B – Recall B curve - However, for a compact representation, most works in visual SLAM use the Precision B – Recall B curve to show performance, because - the binary classification is currently not particularly problematic - the important challenge lies in the performance of the image retrieval 32
• 33. Evaluation on Appearance-based Loop-closure Detection Problem (City Centre) Precision A – Recall A: focusing only on the binary "YES/NO" loop-closure decision is currently trivial. Precision B – Recall B: instead, given that the precision of the "YES/NO" loop-closure decision is 100%, it is much more interesting to see how accurately the system can retrieve the corresponding image 33
• 34. Result 1: City Centre [Maps of vehicle trajectory with loop-closure detections: PIRF-Nav (proposed, 100% precision) vs. FAB-MAP (100% precision)] 34
• 35. Result 1: City Centre (Precision-Recall Curve) 35
  • 36. Result 1: City Centre (Computation Time) 36 *It is noteworthy that all programs of PIRF-Nav were written in MATLAB while FAB-MAP was written in C.
• 37. Result 2: New College [Maps of vehicle trajectory with loop-closure detections: PIRF-Nav (proposed, 100% precision) vs. FAB-MAP (100% precision)] 37
• 38. Result 2: New College (Precision-Recall Curve) 38
• 39. Result 3: Suzukakedai [Map of vehicle trajectory with loop-closure detections: PIRF-Nav (100% precision)] 39
• 40. Result 3: Suzukakedai (Precision-Recall Curve) 40
• 41. Result 4: Combined Datasets (Precision-Recall Curve) 41 Note: We did not test FAB-MAP in this experiment because FAB-MAP completely failed on the Suzukakedai dataset. Moreover, the results on City Centre and New College clearly imply that FAB-MAP would not achieve better accuracy in this experiment.
• 42. Sample Matched Images (Dynamic Changes in a Major Part of the Scene) 42
• 43. Sample Matched Images (Different Viewpoints) 43
• 44. Conclusions - PIRF-Nav outperforms FAB-MAP in terms of accuracy, with more than 80% recall at 100% precision on all datasets provided by the authors - PIRF-Nav offers an online and incremental ability to run in very different environments - Although PIRF-Nav's computation time at the same image scale is longer than FAB-MAP's, PIRF-Nav compensates for this drawback by processing images at a smaller scale, since its accuracy remains considerably higher than FAB-MAP's 44
• 45. Thank You for Your Kind Attention "Doubt is the father of invention" (attributed to Galileo) 45
• 46. Publication - Journal 1. A. Kawewong and O. Hasegawa, "Classifying 3D Real-World Texture Images by Combining Maximum Response 8, 4th Order of Auto Correlation and Colortons," Jour. of Advanced Comp. Intelligence and Intelligent Informatics, vol. 11, no. 5, 2007. 2. A. Kawewong, Y. Honda, M. Tsuboyama, and O. Hasegawa, "Reasoning on the Self-Organizing Incremental Associative Memory for Online Robot Path Planning," IEICE Trans. Inf. & Sys., vol. E93-D, no. 3, 2009. (impact factor 0.369) 3. Y. Honda, A. Kawewong, M. Tsuboyama, and O. Hasegawa, "Acquisition of Place Cells and Autonomous Mobile-Robot Control Using a Semi-Supervised Neural Network" (in Japanese), IEICE Trans. D, 2009, accepted for publication. 4. A. Kawewong, N. Tongprasit, S. Tangruamsub and O. Hasegawa, "Online and Incremental Appearance-based SLAM in Highly Dynamic Environments," Int'l Jour. Robotics Research (IJRR). (To appear in 2010; impact factor 2.882, rank #1 in robotics) 5. A. Kawewong, S. Tangruamsub and O. Hasegawa, "Position-Invariant Robust Features for Long-term Recognition of Dynamic Outdoor Scenes," IEICE Trans. Inf. & Sys. (conditionally accepted) 46
• 47. Publication - Conferences 1. A. Kawewong and O. Hasegawa, "3D Texture Classification by Using Pre-testing Stage and Reliability Table," in IEEE Proc. International Conference on Image Processing (ICIP), 2005. 2. A. Kawewong and O. Hasegawa, "Combining Rotationally Variant and Invariant Features Based on Between-Class Error for 3D Texture Classification," IEEE Int'l Conf. on Computer Vision (ICCV) Workshop, 2005. 3. A. Kawewong, Y. Honda, M. Tsuboyama, O. Hasegawa, "A Common-Neural-Pattern Based Reasoning for Mobile Robot Cognitive Mapping," in Proc. Int'l Conf. Neural Information Processing (ICONIP), 2008. 4. A. Kawewong, Y. Honda, M. Tsuboyama, O. Hasegawa, "Common-Patterns Based Mapping for Robot Navigation," in Proc. IEEE Int'l Conf. Robotics and Biomimetics (ROBIO), 2008. 5. S. Tangruamsub, M. Tsuboyama, A. Kawewong and O. Hasegawa, "Mobile Robot Vision-Based Navigation Using Self-Organizing and Incremental Neural Networks," in Proc. Int'l Joint Conf. Neural Networks (IJCNN), 2009. 47
• 48. Publication - Conferences 6. A. Kawewong, S. Tangruamsub, and O. Hasegawa, "Wide-baseline Visible Features for Highly Dynamic Scene Recognition," in Proc. Int'l Conf. Computer Analysis of Images and Patterns (CAIP), 2009. 7. N. Tongprasit, A. Kawewong and O. Hasegawa, "Data Partitioning Technique for Online and Incremental Visual SLAM," in Proc. Int'l Conf. on Neural Information Processing (ICONIP), 2009. (oral & student travel award) 48