Psr2010
A Stable Hand Tracking Method by Skin Color Blob Matching
Jung-Ho Ahn*, Jong-Hyoun Kim**
Abstract: Hand detection and tracking is one of the main research areas in computer vision for human
computer interaction, but many research results are not wholly satisfactory for practical purposes. In this
paper we propose a fast and stable hand detection and tracking method based on a human body model. We detect
the hand area by combining the information of the difference image and the skin color area, and reconstruct an
accurate hand shape. For hand tracking we suggest a skin color blob matching method with some tracking rules.
The experimental results show that the proposed algorithm performs very well in real time.
Keywords: Hand Tracking, Skin Color Blob Matching, Skin Color Model, Difference Image
* Professor, Division of Computer Media Information Engineering, Kangnam University, Korea. E-mail: jungho@kangnam.ac.kr
** Professor, Department of Gameware, Kaywon School of Art & Design. E-mail: hyoun@kaywon.ac.kr

INTRODUCTION

Recently, vision-based Human Computer Interaction (HCI) systems have been widely studied. In particular, hand detection and tracking is a key interaction technology for Human Robot Interaction (HRI) systems (Brethes et al. 2004) and augmented reality (AR) systems (Billinghurst et al. 2008; Kim et al. 2005; Yin and Davis, 2010). The main application area of our research is likewise tangible augmented reality with human gesture interaction, which provides a spatial operating environment.

Approaches to hand detection and tracking have been based on either hand models or skin color based detection. Hand models were constructed by 2D or 3D statistical pattern recognition using classifiers trained on collected gray-scale hand images (Black and Jepson, 1998; Kolsch and Turk, 2004). Skin color models have been studied in color spaces that represent the skin color area well, such as RGB, YCbCr, HSV and I1I2I3, using collected skin images. Many methods for modeling the skin area have been proposed and discussed (Caetano et al. 2003). Vezhnevets et al. (2003) and Kakumanu et al. (2007) have given excellent summaries of the state-of-the-art skin color detection techniques. With a skin color model, the detected skin color blobs are classified as hands by a predefined human body model obtained by statistical analysis. One of the main research areas using skin color models is face detection (Hsu and Abdel-Mottaleb, 2002; Singh et al., 2003). These techniques can be applied to hand detection and tracking tasks. General object tracking methods usually exploit the color distribution of a target object and search the next frame for the area whose color distribution is similar to that of the target object (Comaniciu et al. 2000; Yang et al. 2005; Shan et al. 2007).

Hand tracking systems, in general, have constraints that depend on their application domains. Our gesture interaction system, including hand tracking, will work in a laboratory setting, like the g-speak system developed by Oblong Industries. Therefore our hand tracking system assumes that one main person will show up and make predefined command gestures such as zoom-in, zoom-out, pointing, selecting and dragging. For now we also assume that the background is not heavily cluttered. In addition, many graphics and network computing modules will be running simultaneously with our system. The efficiency requirement of live processing has restricted us to algorithms that are capable of near frame-rate operation. Under these circumstances we need to avoid computationally expensive methods. Experimental results will show that the proposed hand tracking method is fast as well as stable.

The “Hand Detection” section describes hand detection methods that use skin color and moving area detection. The “Hand Tracking” section explains the proposed hand tracking method together with the face detection and hand gesture area definition. The experimental results are given in the next section, and then conclusions and discussions are presented.

System Overview
The main contribution of the proposed method is the design of an efficient integrated vision system for human gesture interaction. Under our circumstances we can detect and track both hands as well as the face. Fig. 1 shows a flow chart and some features of the proposed hand tracking system. Based on the detected skin color and moving area, we built an efficient and practical method that can detect both the face and the hands in every frame. This detection process leads to good tracking performance.
Fig. 1 Overview of the proposed hand tracking method
HAND DETECTION

Gestures are, by nature, a communication tool that works by moving parts of the human body. For gesture recognition we therefore detect the hands in motion. Our hand detection method is based on skin color detection and the image difference between two consecutive frames.

Skin Color Modeling
Skin color varies with the illumination condition. Under bright lighting the skin color is close to white, whereas under dark lighting it is close to black.
To detect the skin areas we collected skin colors from images taken under various lighting conditions and performed statistical analysis on the distribution. Some examples of skin images (patches) are shown in Fig. 2, and Fig. 3 shows their scatter plot in RGB color space.

Fig. 2 Example of skin patches

Fig. 3 RGB skin color distribution

As shown in Fig. 3, the skin colors gather in two groups. Therefore we modeled the skin color with two Gaussian models in RGB color space (Caetano et al. 2003). The two Gaussian distributions cover the distributions of bright and dark skin colors, respectively. For efficient computation we used Gaussian models with spherical covariance instead of full covariance.

Skin Color Detection
With the two spherical Gaussian models, a pixel is determined to have a skin color if

\[
\left(\frac{R - m_R^1}{s_R^1}\right)^2 + \left(\frac{G - m_G^1}{s_G^1}\right)^2 + \left(\frac{B - m_B^1}{s_B^1}\right)^2 < T_C^1, \quad \text{or}
\]
\[
\left(\frac{R - m_R^2}{s_R^2}\right)^2 + \left(\frac{G - m_G^2}{s_G^2}\right)^2 + \left(\frac{B - m_B^2}{s_B^2}\right)^2 < T_C^2,
\]

where $R$, $G$, $B$ are the RGB color components of a pixel, $m_i^j$ and $s_i^j$ are the mean and standard deviation of the $i$ component of the $j$-th Gaussian model, $i = R, G, B$, $j = 1, 2$, and the $T_C^j$ are the thresholds that set the skin color boundaries. The experiments show that the results are not very sensitive to the thresholds. As postprocessing we performed the following morphological operations on the detected skin pixels:
1. Dilation of size 3
2. Erosion of size 7
3. Dilation of size 5
The first dilation connects skin components and fills holes, the erosion removes salt-and-pepper noise, and the final dilation recovers the original size of the skin areas. The binary image in which pixel values are 255 for skin color and 0 for non-skin color is called a skin color map.
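The skin color test above can be illustrated with a minimal pure-Python sketch. The class name `SkinGaussian`, the functions `is_skin` and `skin_color_map`, and all numeric parameters are hypothetical and chosen only for illustration; the paper does not report its fitted means, deviations, or thresholds.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SkinGaussian:
    """One Gaussian skin model: per-channel mean m, deviation s, threshold T."""
    mean: Tuple[float, float, float]
    std: Tuple[float, float, float]
    threshold: float

    def distance(self, rgb):
        # Sum of squared, normalized channel deviations
        # (the left-hand side of the skin color inequality).
        return sum(((c - m) / s) ** 2 for c, m, s in zip(rgb, self.mean, self.std))


# Hypothetical parameters, NOT taken from the paper.
# Each model uses one deviation for all channels (spherical covariance).
BRIGHT = SkinGaussian(mean=(220.0, 180.0, 160.0), std=(20.0, 20.0, 20.0), threshold=9.0)
DARK = SkinGaussian(mean=(120.0, 80.0, 60.0), std=(25.0, 25.0, 25.0), threshold=9.0)


def is_skin(rgb, models=(BRIGHT, DARK)):
    """A pixel is skin if it lies inside the boundary of either Gaussian model."""
    return any(model.distance(rgb) < model.threshold for model in models)


def skin_color_map(image):
    """Map an image (rows of RGB tuples) to a binary map: 255 for skin, 0 otherwise."""
    return [[255 if is_skin(pixel) else 0 for pixel in row] for row in image]
```

Because each model needs only three subtractions, divisions, and squarings per pixel, the test stays cheap enough for per-frame use, which is the motivation the text gives for avoiding full covariance.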
(a) Original image (b) Skin color map (c) Difference image (d) Hand reconstruction
Fig. 4 Hand Detection Results
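The three post-processing steps that produce the skin color map in Fig. 4(b) (dilation of size 3, erosion of size 7, dilation of size 5) can be sketched as follows. This is an illustration under assumptions: "size n" is read as an n×n square structuring element, pixels outside the image are ignored, and the helper names `dilate`, `erode`, and `postprocess_skin_map` are hypothetical.

```python
def _morph(mask, size, combine):
    """Apply `combine` (any or all) over each size x size neighborhood of a 0/255 mask.

    The window is clipped at the image border, so outside pixels are ignored.
    """
    h, w, r = len(mask), len(mask[0]), size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = (
                mask[yy][xx] > 0
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            out[y][x] = 255 if combine(window) else 0
    return out


def dilate(mask, size):
    """A pixel is set if any pixel in its neighborhood is set."""
    return _morph(mask, size, any)


def erode(mask, size):
    """A pixel is kept only if every pixel in its neighborhood is set."""
    return _morph(mask, size, all)


def postprocess_skin_map(mask):
    """Dilation(3) -> erosion(7) -> dilation(5), in the order listed in the text."""
    connected = dilate(mask, 3)   # connect components, fill small holes
    denoised = erode(connected, 7)  # remove salt-and-pepper noise
    return dilate(denoised, 5)    # recover the original size of skin areas
```

Under the square-element assumption the two dilations grow a region by 1 + 2 = 3 pixels in total and the erosion shrinks it by 3, which is why a sufficiently large skin region returns to its original extent while isolated pixels disappear.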
Moving Hand Detection
To detect moving hands we identified the moving area by differencing two consecutive gray-scale source images. That is, a pixel $p$ belongs to the moving area if

\[
\left| I_t(p) - I_{t-1}(p) \right| > T_D,
\]

where $I_t(p)$ is the gray-scale value of the $t$-th frame at pixel $p$ and $T_D$ is a threshold value. In the experiments $T_D$ is set to 30.
We detect hands in motion because a gesture conveys the user’s intent through hand motion. Skin detection alone overestimates the skin areas, but we only consider skin areas that lie in the moving area. This removes skin-colored areas in the background. We therefore identified the spots that show both a pixel difference and skin color. These spots are usually small parts of the hands. To detect the hand position accurately we recovered the hands with the FloodFill algorithm. Taking the moving skin spots as seeds, we find all skin pixels that are connected to them. This recovery process gives accurate hand shapes. Fig. 4 (c) and (d) show the difference image and the recovered hand shapes.

HAND TRACKING

After detecting hands in the input images we perform hand tracking, which identifies the left and right hands. For stable hand tracking we restrict the area in which hands are tracked, which reduces tracking errors.

Hand Tracking Area
Under the assumption that only one person appears in the image, we can detect the face area with a simple rule. Connected component analysis yields a set of skin blobs. The face is identified as the biggest skin blob in the middle.
Command hand gestures are performed when the hands are above the waist. People usually move their hands freely, without any meaning, when the hands are below the waist, and this causes many tracking errors. This observation motivates the hand tracking area. Hence we ignore such motion and reset the hand tracking when the hands lie below the waist to avoid tracking errors. The resetting rule is simple. When both hands are below the waist, the right skin blob is set as the left hand and the left skin blob is set as the right hand.
After determining the face, we set a line two face-box heights below the face bounding box. We set the part of the image above this line as the region of interest (ROI) for hand tracking, i.e. the hand tracking area. In Fig. 5, the middle white box shows the face detection result and the white dashed line shows the boundary of the hand tracking area.

Fig. 5 Face and hand gesture area

Hand Tracking Method
The proposed hand tracking method is based on the hand detection described in the previous section. In general, object (e.g. hand) detection is performed once and an object tracking process then follows, since detection costs more than tracking. However, since our detection process is very fast and stable, we perform hand detection in every frame. By tracking we mean assigning the detected results (hands) to the left or right hand.
The proposed hand detection is essentially moving skin blob detection. Under the assumption that one person appears in the input image, the two detected skin blobs should be the two hands. Hence the left and right hand decision (tracking) rule is as follows:

\[
C_t = \arg\min_{C_i} \lVert C_{t-1} - C_i \rVert,
\]

where $C_i$ is the center of the $i$-th detected skin blob, and $C_{t-1}$ and $C_t$ are the centers of the left or right hand at the $(t-1)$-th and $t$-th frames, respectively.
There are two constraints for the left and right hands
decision. Sometimes small skin-colored parts of clothes are detected. So when more than two skin blobs are detected, we discard the skin blobs whose size is too different from the previous hand’s size. The other constraint is distance. We search for skin blobs that are within the hand tracking distance from the previous hand’s center position. When no skin blob satisfies the constraints, we conclude that the corresponding hand did not move and assign its previous position as its current position. Then we reconstruct the hand by using the FloodFill algorithm with the previous center pixel as the seed.

EXPERIMENTAL RESULTS

Experimental Environments
The proposed hand tracking method assumes the following:
- One main person shows up,
- The majority of the clothing color is not similar to skin color,
- The background is not heavily cluttered.
We used a computer with an Intel Core™2 Duo CPU E7500 @ 2.93GHz and a Logitech QuickCam Ultra Vision webcam. The image resolution of the input video stream is 640×480.

Result Analysis
Experiments were performed in many live demonstrations and showed very good tracking performance at near frame-rate speed. Fig. 6 shows some tracking results with hand segmentation.
One of the main features of the proposed algorithm is robustness to fast and large movement. Fig. 6(d) shows successful tracking of a fast movement that causes motion blur. Compared to the well-known mean shift tracking (Comaniciu et al. 2000; Yang et al. 2005), the proposed algorithm is very robust in the case of large movement. Mean shift based tracking models the target object with its color histogram and finds the most similar area within a predefined tracking boundary in the next frame. In our experiments this approach failed very often when the user moved the hands so fast that they left the tracking boundary. Therefore, the tracking boundary was very hard to set properly.
The proposed hand tracking method has almost no errors except in two cases. First, when a hand occludes the face, hand tracking is successful but the hand segmentation includes both the hand and the face, because the detected moving points are used as the seeds of the FloodFill algorithm. We will solve this problem with a temporal background subtraction method that sets the face as a temporal background image and subtracts it to isolate the hand in front. Second, when the hands cross, the proposed algorithm falsely identifies the left and right hands, since we find the skin blobs nearest to the previous hand positions. This problem is serious from the tracking point of view, but it is not serious from the gesture recognition point of view because most command hand gestures do not include this pose.

CONCLUSIONS AND DISCUSSIONS

This study explored the hand tracking problem for an HCI system with gesture interaction. Because the system must run together with other computing modules such as graphics and networking, real-time performance is crucial. Therefore we designed algorithms that are efficient in computation and memory consumption. The proposed algorithm shows very good performance under some constraints.
The main idea follows the observation that command gestures convey the user’s message during hand movement. So we detected the hands in motion via skin color and moving area detection. Object detection costs much more than object tracking. This is the reason that detection is, in general, performed once, and the detected objects are then tracked in the following frames without detection. However, since the proposed detection algorithm is very efficient, we performed the simple detection in every frame and tracked the hands by a matching rule for the detected skin blobs.
Future study will focus on improving skin color and face detection. For now the skin color detection is very successful only under normal office lighting, where the user is below fluorescent light. It is very hard to build a skin color model for every lighting condition, but we will devise a rule to check the given lighting condition and adapt the skin color model to it (Hsu and Abdel-Mottaleb, 2002; Brethes et al. 2004). The simplicity of the proposed face detection rule is the main reason that the proposed algorithm is not robust to cluttered backgrounds. We will add a more sophisticated face detection rule, such as eye and mouth detection, to determine whether a skin color blob is the face.
The goal of our research is to recognize command gestures for HCI in AR. Some examples of defined gestures are shown in Fig. 6. We will endeavor to recognize such gestures in the near future.
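As an illustration of the moving hand detection step described in the Hand Detection section, the sketch below finds the moving skin spots by frame differencing and then grows them over the skin color map. It is a sketch under assumptions: the function names are hypothetical, images are plain nested lists, and 4-connectivity with a breadth-first fill stands in for the FloodFill routine, whose exact settings the paper does not state. Only the threshold value 30 comes from the text.

```python
from collections import deque


def moving_skin_seeds(prev_gray, curr_gray, skin_map, t_d=30):
    """Pixels whose gray value changed by more than t_d and that are marked as skin."""
    return [
        (y, x)
        for y, row in enumerate(curr_gray)
        for x, value in enumerate(row)
        if abs(value - prev_gray[y][x]) > t_d and skin_map[y][x] > 0
    ]


def recover_hands(skin_map, seeds):
    """Grow the seeds over 4-connected skin pixels (breadth-first flood fill)."""
    h, w = len(skin_map), len(skin_map[0])
    hand = [[0] * w for _ in range(h)]
    # Keep only seeds that actually lie on skin pixels.
    queue = deque((y, x) for y, x in seeds if skin_map[y][x] > 0)
    for y, x in queue:
        hand[y][x] = 255
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside = 0 <= ny < h and 0 <= nx < w
            if inside and skin_map[ny][nx] > 0 and hand[ny][nx] == 0:
                hand[ny][nx] = 255
                queue.append((ny, nx))
    return hand
```

A static skin-colored region in the background never receives a seed, so it stays out of the result even though it appears in the skin color map. The same property explains the occlusion failure noted in the Result Analysis: when a moving hand touches the face, the fill spreads into the connected face pixels.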
(a) Pointing Gesture (b) Push Gesture
(c) Pull Gesture (d) Pass Gesture
Fig. 6 Hand Tracking and Segmentation Results
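The blob matching rule of the Hand Tracking Method subsection, together with its size and distance constraints, might look like the following for a single hand. The function name `match_hand` and the parameters `max_distance` and `max_size_ratio` are hypothetical; the paper gives no numeric values for the hand tracking distance or for how different a blob size may be.

```python
import math


def match_hand(prev_center, prev_size, blobs, max_distance, max_size_ratio=2.0):
    """Return the (center, size) of the blob that best continues one hand track.

    blobs: list of (center, size), with center = (x, y) and size = pixel count.
    max_distance and max_size_ratio are illustrative assumptions.
    """
    candidates = [
        (center, size)
        for center, size in blobs
        # Size constraint: drop blobs much smaller or larger than the previous hand.
        if prev_size / max_size_ratio <= size <= prev_size * max_size_ratio
        # Distance constraint: stay within the hand tracking distance.
        and math.dist(prev_center, center) <= max_distance
    ]
    if not candidates:
        # No valid blob: assume the hand did not move and keep its previous state.
        return prev_center, prev_size
    # Nearest remaining blob center, i.e. the argmin of ||C_{t-1} - C_i||.
    return min(candidates, key=lambda blob: math.dist(prev_center, blob[0]))
```

Calling this once per hand with that hand's previous center reproduces the behavior discussed in the Result Analysis: fast motion is handled as long as the blob stays within `max_distance`, while crossing hands can swap labels because each track simply takes its nearest blob.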
ACKNOWLEDGMENTS

This research is supported by the Ministry of Knowledge Economy and the Electronics and Telecommunications Research Institute (ETRI) in the Technology Innovation Program 2009.

REFERENCES

[1] M. Billinghurst, H. Kato and I. Poupyrev, “Tangible Augmented Reality”, International Conference on Computer Graphics and Interactive Techniques: ACM SIGGRAPH ASIA, 2008.
[2] M. J. Black and A. D. Jepson, “EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation”, International Journal of Computer Vision, 26(1): 63-84, 1998.
[3] L. Brethes, P. Menezes, F. Lerasle and J. Hayet, “Face Tracking and Hand Gesture Recognition for Human-Robot Interaction”, IEEE International Conference on Robotics and Automation, 2: 1901-1906, 2004.
[4] T. Caetano, S. Olabarriaga and D. Barone, “Do mixture models in chromaticity space improve skin detection?”, Pattern Recognition, 36(12): 3019-3021, 2003.
[5] D. Comaniciu, V. Ramesh and P. Meer, “Real-time tracking of non-rigid objects using mean shift”, IEEE Conference on Computer Vision and Pattern Recognition, 2: 142-149, 2000.
[6] R.-L. Hsu and M. Abdel-Mottaleb, “Face Detection in Color Images”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5): 696-706, 2002.
[7] P. Kakumanu, S. Makrogiannis and N. Bourbakis, “A survey of skin-color modeling and detection methods”, Pattern Recognition, 40: 1106-1122, 2007.
[8] H. Kim, G. Albuquerque, S. Havemann and D. W. Fellner, “Tangible 3D: Hand Gesture Interaction for Immersive 3D Modeling”, IPT & EGVE Workshop, 2005.
[9] M. Kolsch and M. Turk, “Robust Hand Detection”, IEEE International Conference on Automatic Face and Gesture Recognition, 614-619, 2004.
[10] C. Shan, T. Tan and Y. Wei, “Real-time hand tracking using a mean shift embedded particle filter”, Pattern Recognition, 40(7): 1958-1970, 2007.
[11] S. K. Singh, D. S. Chauhan, M. Vasta and R. Singh, “A Robust Skin Color Based Face Detection Algorithm”, Tamkang Journal of Science and Engineering, 6(4): 227-234, 2003.
[12] C. Yang, R. Duraiswami and L. Davis, “Efficient mean-shift tracking via a new similarity measure”, IEEE Conference on Computer Vision and Pattern Recognition, 1: 176-183, 2005.
[13] V. Vezhnevets, V. Sazonov and A. Andreeva, “A Survey on Pixel-Based Skin Color Detection Techniques”, GraphiCon Conference, Moscow, Russia, 82-92, 2003.
[14] Y. Yin and R. Davis, “Toward Natural Interaction in the Real World: Real-time Gesture Recognition”, ICMI-MLMI, Beijing, China, 2010.