The document describes a method for creating panoramic images from video frames. Key steps include camera calibration to determine intrinsic parameters, feature detection and matching between frames using SIFT or Shi-Tomasi features, selecting key frames when sufficient camera movement is detected, and stitching the key frames onto a cylindrical projection to create the panorama. Experimental results show Shi-Tomasi with optical flow is faster than SIFT with FLANN for feature matching.
4. Method

Camera calibration

Each camera has its own unique intrinsic parameters, which are expressed by a 3x3 camera calibration matrix, K. We obtain this matrix by using a video calibration technique implemented with OpenCV, using multiple checkerboard images as shown below (3 of 25 images are displayed here).

Figure: raw image and the same image with lens distortion removed.
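The sketch below shows one way this calibration step could look in Python with OpenCV, assuming a directory of checkerboard frames grabbed from the video; the board dimensions, square size, and file paths are illustrative assumptions rather than values from the poster.

# Minimal calibration sketch, assuming a 9x6 inner-corner checkerboard and a
# directory of frames saved from the video (both are assumptions).
import glob
import cv2
import numpy as np

PATTERN = (9, 6)          # inner corners per row and column (assumed)
SQUARE_SIZE = 0.025       # assumed board square size, in meters

# 3D positions of the board corners in the board's own coordinate frame
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_SIZE

obj_points, img_points = [], []
image_size = None
for path in glob.glob("calib_frames/*.png"):      # hypothetical path
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    image_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
        obj_points.append(objp)
        img_points.append(corners)

# K is the 3x3 intrinsic matrix; dist holds the lens-distortion coefficients
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)

# Remove lens distortion from a raw frame using the recovered parameters
raw = cv2.imread("calib_frames/frame_000.png")    # hypothetical frame
undistorted = cv2.undistort(raw, K, dist)

Once K and dist are known, every incoming video frame can be undistorted the same way, which is what the assumptions in the Introduction rely on.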
1. Introduction

Motivation
• Exploration using robots can be useful for examining interior structures
• Panoramas create a wider field of view to increase the teleoperator's 3D awareness
• To render a real-time visualization to the teleoperator, the implemented algorithm should stitch video frames together as efficiently as possible

Assumptions
• Lens distortion is removed from the video
• Intrinsic camera parameters are known
2. Problem Statement

From a video, detect features in frame I_n and match these features in consecutive frames in order to detect the key frame I_{n+1} from the calculated homography H_{n,n+1}. Use the saved key frames and homographies to create a stitched panorama of the environment.
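The loop below is a minimal sketch of this pipeline in Python with OpenCV; the video path, feature count, and re-detection strategy are assumptions rather than the authors' implementation, and the individual steps are detailed in the Method section.

# Sketch of the key-frame loop: track the key frame's features forward and
# save a new key frame plus its homography when enough motion has occurred.
import cv2
import numpy as np

cap = cv2.VideoCapture("input.mp4")            # hypothetical input video
ok, key_frame = cap.read()
key_gray = cv2.cvtColor(key_frame, cv2.COLOR_BGR2GRAY)
key_pts = cv2.goodFeaturesToTrack(key_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=10)

key_frames, homographies = [key_frame], []

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Track the key frame's features I_n -> current frame; tracking from the
    # key frame keeps correspondences aligned with key_pts
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(key_gray, gray, key_pts, None)
    good = status.ravel() == 1
    # Fewer than half the features tracked: enough motion for key frame I_{n+1}
    if 4 <= good.sum() < len(key_pts) / 2:
        H, _ = cv2.findHomography(key_pts[good], cur_pts[good], cv2.RANSAC, 3.0)
        key_frames.append(frame)
        homographies.append(H)                 # H_{n,n+1}
        key_gray = gray
        key_pts = cv2.goodFeaturesToTrack(gray, maxCorners=200,
                                          qualityLevel=0.01, minDistance=10)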
5. Experimental Results

Shi-Tomasi corner detection with optical flow tracking ran faster than SIFT with FLANN matching across the tested feature counts (see the timing comparison figure).
6. References

[1] Brown, Matthew, and David G. Lowe. "Automatic Panoramic Image Stitching Using Invariant Features." International Journal of Computer Vision 74.1 (2007): 59-73.

[2] Shi, Jianbo, and Carlo Tomasi. "Good Features to Track." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR-94), 1994.
7. Acknowledgments

This research was supported by the National Science Foundation Research Experiences for Undergraduates (REU) program on Interdisciplinary Research in Mechatronics, Robotics, and Automated System Design (NSF grant no. 1263293). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
3. System Overview
Figure: time comparison between Shi-Tomasi Corner/Optical Flow and SIFT/FLANN; time (s) versus number of features (25 to 100).
Figure: sample images for calibration.
Feature Detection and Matching

There are many different methods used to detect features. Scale-Invariant Feature Transform (SIFT) is a popular method that is known to be robust, as it is invariant to changes in image scale [1]. The Shi-Tomasi method is a technique that tracks only corner features in an image [2] and is often faster. SIFT features are matched between frames using the Fast Library for Approximate Nearest Neighbors (FLANN). The Shi-Tomasi technique tracks the corners with optical flow, which uses the gradient in the neighborhood of a feature to track its motion between frames.
Figure: feature detection using SIFT and matching using FLANN; feature detection using Shi-Tomasi Corner and matching using Optical Flow.
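As a sketch of how these two pipelines could be set up, the snippet below detects and matches features between two consecutive frames in Python with OpenCV; the frame file names, feature counts, and matcher parameters are illustrative assumptions, and cv2.SIFT_create requires an OpenCV build that includes SIFT.

# Two ways to obtain matched point pairs between consecutive frames.
import cv2
import numpy as np

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)   # hypothetical frames
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# SIFT detection + FLANN matching
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(prev, None)
kp2, des2 = sift.detectAndCompute(curr, None)
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5),   # KD-tree index
                              dict(checks=50))
matches = flann.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]  # ratio test

# Shi-Tomasi corners + pyramidal Lucas-Kanade optical flow
corners = cv2.goodFeaturesToTrack(prev, maxCorners=100,
                                  qualityLevel=0.01, minDistance=10)
tracked, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, corners, None)
ok = status.ravel() == 1
prev_pts, curr_pts = corners[ok], tracked[ok]   # matched point pairs

Either set of matched point pairs can then be fed to the homography estimation used for key frame selection.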
Key Frame Selection

In order to get a well-stitched panoramic image, it is important to choose a frame I' that will overlap well with its previous frame I. A homography H is a projective transformation that relates image points by x' = Hx, where x and x' are image points in I and I', respectively. Image stitching is accomplished by calculating H between the two images. Optical flow tracks features across frames. When tracking n features, once the number of successfully tracked features n' in the current frame falls below n/2, the camera has moved a sufficient amount to provide a good homography.
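The fragment below illustrates the two relations used here, the n' < n/2 key-frame test and the mapping x' = Hx, in Python with OpenCV; the helper name and the sample numbers are illustrative, not taken from the poster.

# Key-frame criterion and the homography relation x' = Hx.
import cv2
import numpy as np

def keyframe_due(status: np.ndarray, n: int) -> bool:
    """True once the number of successfully tracked features n' drops below n/2."""
    n_prime = int((status.ravel() == 1).sum())
    return n_prime < n / 2

# x' = Hx: map a homogeneous image point from frame I into frame I'
H = np.array([[1.0, 0.02,  5.0],
              [0.01, 1.0, -3.0],
              [0.0,  0.0,  1.0]])            # example homography
x = np.array([10.0, 20.0, 1.0])              # point in I (homogeneous)
x_prime = H @ x
x_prime /= x_prime[2]                        # back to pixel coordinates

# In practice, OpenCV estimates H from matched points with RANSAC:
pts_I = (np.random.rand(30, 1, 2) * 100).astype(np.float32)
pts_Ip = cv2.perspectiveTransform(pts_I, H)  # synthetic matches in I'
H_est, inliers = cv2.findHomography(pts_I, pts_Ip, cv2.RANSAC, 3.0)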
Image Stitching

The key frames are stitched into a panoramic texture that is mapped onto a cylinder. From H and K, the camera rotation R is estimated. Based on this pose, an image point x is mapped into the world space of the cylinder by X = R⁻¹K⁻¹x, where X = (x, y, z). These world coordinates are then mapped to UV texture coordinates, where u = arctan(x/z) and v = y/√(x² + z²). The final step is to texture-map the cylinder by x = r·cos(u), y = v, z = r·sin(u), where r is the radius of the cylinder.
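A hedged sketch of this mapping in Python with OpenCV and NumPy is shown below. The pure-rotation recovery of R from H (via H = K R K⁻¹, so R ∝ K⁻¹HK) and the panorama resolution are assumptions the poster does not spell out; the warp uses inverse mapping, sampling the image at the projection K R X of each point on the unit cylinder, which is consistent with X ∝ R⁻¹K⁻¹x.

# Cylindrical warp of one key frame, given intrinsics K and a homography H.
import cv2
import numpy as np

def rotation_from_homography(H: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Under a pure camera rotation, H = K R K^-1, so R is proportional to K^-1 H K."""
    R = np.linalg.inv(K) @ H @ K
    R /= np.cbrt(np.linalg.det(R))          # remove the unknown scale
    U, _, Vt = np.linalg.svd(R)             # project onto the nearest rotation
    return U @ Vt

def warp_to_cylinder(img, K, R, pano_w=2048, pano_h=512, v_range=1.0):
    """Inverse-map panorama pixels (u, v) through the cylinder into the image.

    A panorama column corresponds to u = arctan(x/z) and a row to
    v = y / sqrt(x^2 + z^2); the matching point on the unit cylinder is
    X = (sin u, v, cos u), which projects into the image as K R X.
    """
    us = (np.arange(pano_w) / pano_w) * 2 * np.pi - np.pi
    vs = (np.arange(pano_h) / pano_h) * 2 * v_range - v_range
    uu, vv = np.meshgrid(us, vs)
    X = np.stack([np.sin(uu), vv, np.cos(uu)], axis=-1)     # rays on the cylinder
    p = X @ (K @ R).T                                        # project: K R X
    map_x = (p[..., 0] / p[..., 2]).astype(np.float32)
    map_y = (p[..., 1] / p[..., 2]).astype(np.float32)
    map_x[p[..., 2] <= 0] = -1                               # drop points behind camera
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)

Each key frame warped this way can be composited into the shared panoramic texture, which is then rendered on the cylinder for the teleoperator.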
Figure: input images and the stitched images with cylindrical projection, produced with Shi-Tomasi/Optical Flow and with SIFT/FLANN.