
2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery

A NEW APPROACH TO HAND TRACKING AND GESTURE RECOGNITION BY A NEW FEATURE TYPE AND HMM

Pham The Bao, Nguyen Thanh Binh, Tu Duy Khoa
Faculty of Mathematics and Computer Science
University of Science, Ho Chi Minh City
ptbao@hcmuns.edu.vn, ntbinh@math.hcmuns.edu.vn
ABSTRACT

In this paper, we introduce a hand gesture recognition system that recognizes real-time gestures in the Vietnamese sign language (VSL). The system has three modules: real-time hand tracking, gesture training, and gesture recognition using pseudo two-dimensional hidden Markov models (P2-DHMMs). In the hand tracking module, we introduce a new robust algorithm for obtaining the hand region, called the Tower method, and use skin color for hand tracking and recognition. We then develop a gesture recognition system that reliably recognizes single-hand gestures from a standard camera. Furthermore, we propose a new feature type for gesture recognition to improve the accuracy of the overall system. In the experiments, we tested the system on a vocabulary of 29 VSL gestures and show the effectiveness of the system and of the Tower method.

Keywords: P2-DHMMs, Hand tracking, Tower method, Camshift method.

1. INTRODUCTION

Human-machine interfaces are playing a role of growing importance as information technology continues to evolve quickly. Keyboards have been replaced by handwriting recognition in Palm and Pocket PC PDAs [1]. Moreover, some companies have developed new cell phones that help deaf people communicate with one another through hand gesture recognition. Gesture recognition has become an active and necessary area of research in computer vision. In this paper, we focus on the problem of hand gesture recognition using a real-time hand tracking method with P2-DHMMs. We treat single-hand gestures as sequences of distinct hand shapes and hand regions.

Many approaches to gesture recognition have been developed, and a large variety of techniques have been used for modeling the hand. An approach based on the 2D locations of fingertips and palms was used by Davis and Shah [2]. N. D. Binh, E. Shuichi, and T. Ejima use a Kalman filter and hand blob analysis for hand tracking [3]. Bobick and Wilson proposed a framework for handling dynamic gestures [4]. Previous attempts to develop hand gesture recognition systems employed geometric feature-based methods, template-based methods, and active statistical models [5]. Given their success in speech recognition, HMMs have also been used successfully in hand gesture recognition systems [6].

Motivated by the desire to provide users with a capable gesture recognition system, we developed the Tower method to obtain the hand region and use P2D HMMs to recognize hand gestures. A key advantage of our system is that the Tower tracking method extracts the hand region effectively using an easily controlled threshold and tracks 2-5 times faster than the Camshift method.

2. SYSTEM OVERVIEW
978-0-7695-3735-1/09 $25.00 2009 IEEE DOI 10.1109/FSKD.2009.276 3

In our system (Diagram 1), only single-handed gestures are considered. A gesture is a specific combination of hand position, orientation, and flexion observed at some time instance. Our system identifies a gesture from the temporal sequence of hand regions in the image frames. The output of the hand tracking process is the input of the recognition process. The hand region is extracted by the Tower tracking method, and P2D HMMs are then used to recognize the gesture. The system proceeds in the following steps:

a. Choose the initial minimum size of the object for the Tower tracking method.
b. While the hand is visible to the camera:
   i. Track and extract the hand from the image sequence (frame) by skin color segmentation and the Tower tracking method.
   ii. Verify the extracted hand region.
c. Use P2D HMMs to recognize the gesture, choosing the one with the maximum probability.

3. HAND TRACKING

We developed a new real-time hand tracking method that is robust and reliable in unconstrained backgrounds. The hand gesture is captured by the camera. Skin color segmentation in the YCbCr color space is applied to obtain the hand region; it has proven to be an effective way to segment the hand in fairly unrestricted environments. Morphological operations are used to smooth the image and remove noise before the hand is extracted with the Tower tracking method.
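As a minimal sketch of this preprocessing step, the Python code below thresholds each pixel in the Cb/Cr plane and then applies a 3x3 morphological opening (erosion followed by dilation) to remove isolated noise. The conversion uses the full-range BT.601 (JPEG) coefficients. The Cb and Cr ranges are a commonly quoted skin range and are an assumption here, not the thresholds derived in [7]; the choice of an opening, the function names, and the list-of-rows image representation are also illustrative. A practical implementation would use an array library.

```python
# Assumed skin range in the Cb/Cr plane (illustrative, not taken from [7]).
CB_MIN, CB_MAX = 77, 127
CR_MIN, CR_MAX = 133, 173


def rgb_to_cbcr(r, g, b):
    """Chroma components of an 8-bit RGB pixel (full-range BT.601, as in JPEG)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def skin_mask(image):
    """image: rows of (r, g, b) tuples. Returns rows of booleans (True = skin)."""
    mask = []
    for row in image:
        out = []
        for r, g, b in row:
            cb, cr = rgb_to_cbcr(r, g, b)
            out.append(CB_MIN <= cb <= CB_MAX and CR_MIN <= cr <= CR_MAX)
        mask.append(out)
    return mask


def _morph(mask, combine):
    """Apply `combine` (all or any) over each 3x3 neighborhood.

    Pixels outside the image count as background (False).
    """
    h, w = len(mask), len(mask[0])

    def at(y, x):
        return 0 <= y < h and 0 <= x < w and mask[y][x]

    return [[combine(at(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]


def erode(mask):
    return _morph(mask, all)


def dilate(mask):
    return _morph(mask, any)


def opening(mask):
    """Erosion then dilation: removes specks smaller than the 3x3 element."""
    return dilate(erode(mask))
```

Calling `opening(skin_mask(frame))` on a frame gives the cleaned binary mask that a tracker could then search for the hand.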

Diagram 1. System overview (blocks: Camera, Hand tracking, Hand recognition, Training database, Results).

3.1. A new approach to hand tracking

The Tower tracking method is an effective way to solve a basic problem in image processing: object tracking. The main idea is that we observe only some groups of chosen points in the image to obtain the position and shape of the object with controlled error. Each group of chosen points is considered a tower (Figure 1). A tower is used to detect the presence of chosen features around it. The distribution of the towers in the image and the features of the object therefore determine the robustness and accuracy of the result.

Figure 1: Generating towers in the image

There are two kinds of towers. The first kind, the coarse tower, is used to locate the object quickly. A coarse tower is a 5x5 rectangle. To generate the coarse towers in the image, we need to know an approximate value of d1, the minimum size of the object in the camera. The distance between two consecutive coarse towers in each row or each column is d1, which ensures that the object always intersects at least one coarse tower.

Figure 2: The distribution of coarse towers.

The second kind (Figure 3) is the fine tower, which is used to find the shape of the object accurately after the object has been located. A fine tower is a 3x3 rectangle. The distance d2 between two consecutive fine towers in each row or each column is smaller than d1.

Figure 3: The distribution of the fine towers in the hand region

The steps of the Tower tracking method are as follows:

Step 1: Determine the features of the object and approximate values of the distances d1 and d2.
Step 2: Generate the coarse towers with a suitable distance d1.
Step 3: Scan all coarse towers for the signal (the features of the object). If a signal is found, save it and go to Step 4; otherwise, go back to Step 2.
Step 4: For each signal found in Step 3, execute the boundary spreading algorithm.
Step 5: Refine the set of points found in Step 4 to show the location and shape of the object.
Step 6: Output the results.

All towers rely on some basic features of the object. These features can vary and depend on the goal of each object tracking problem. The more distinctive the features are, the more effective the Tower tracking method is. The two kinds of towers are generated in whatever way suits that goal.

3.2. Hand tracking

In the hand tracking module, we use the motion feature and the skin color feature [7] in the Tower tracking method. We can control the values of d1 and d2, which directly influence the accuracy. The values of d1 and d2 used in our system are 40 pixels and 10 pixels, respectively, when the tester stands about 1.2 meters from the camera. We generate the coarse or fine towers as shown in Figure 4:

Figure 4: Distribution of coarse towers in the image

Figure 5: Illustration of six-neighbor towers of a red tower
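Steps 2 to 4 of the Tower algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input is a binary skin mask (a list of rows of booleans), a tower "matches" when any pixel in its square is set, the grid starts half a spacing from the image border, and the six-neighbor layout in `SIX_NEIGHBORS` is a guess, since Figure 5 is not reproduced in the text. The `in_boundary` test follows Definition 1 and `boundary_spreading` follows the boundary spreading steps of this section. All function names are hypothetical.

```python
# Assumed six-neighbor layout on the tower index grid (Figure 5 is not available).
SIX_NEIGHBORS = [(0, -1), (0, 1), (-1, 0), (1, 0), (-1, 1), (1, -1)]


def tower_centers(height, width, spacing):
    """Rows of pixel centers of a regular tower grid, one tower every `spacing` pixels."""
    offset = spacing // 2
    return [[(y, x) for x in range(offset, width, spacing)]
            for y in range(offset, height, spacing)]


def tower_has_signal(mask, center, half):
    """True if any pixel in the (2*half+1)-pixel square around `center` is set."""
    h, w = len(mask), len(mask[0])
    cy, cx = center
    return any(mask[y][x]
               for y in range(max(cy - half, 0), min(cy + half + 1, h))
               for x in range(max(cx - half, 0), min(cx + half + 1, w)))


def coarse_scan(mask, d1=40):
    """Steps 2-3: pixel center of the first 5x5 coarse tower with a signal, or None."""
    for row in tower_centers(len(mask), len(mask[0]), d1):
        for center in row:
            if tower_has_signal(mask, center, half=2):
                return center
    return None


def fine_match_grid(mask, d2=10):
    """Match flag of every 3x3 fine tower, indexed by (row, col) in the fine grid."""
    return [[tower_has_signal(mask, c, half=1) for c in row]
            for row in tower_centers(len(mask), len(mask[0]), d2)]


def neighbors(t, match):
    """In-grid neighbors of tower index t under the assumed layout."""
    i, j = t
    return [(i + di, j + dj) for di, dj in SIX_NEIGHBORS
            if 0 <= i + di < len(match) and 0 <= j + dj < len(match[0])]


def in_boundary(t, match):
    """Definition 1: t matches, and a neighbor is missing or does not match."""
    if not match[t[0]][t[1]]:
        return False
    nbrs = neighbors(t, match)
    return len(nbrs) < 6 or any(not match[a][b] for a, b in nbrs)


def boundary_spreading(start, match):
    """Step 4: boundary towers reachable from the first one at or left of `start`."""
    i, j = start
    if not match[i][j]:
        raise ValueError("start must be a fine tower that matches the object")
    while j > 0 and not in_boundary((i, j), match):
        j -= 1                      # move left to the first in-boundary tower t0
    found, frontier = {(i, j)}, [(i, j)]
    while frontier:                 # spread until a pass adds no new tower
        nxt = []
        for t in frontier:
            for n in neighbors(t, match):
                if n not in found and in_boundary(n, match):
                    found.add(n)
                    nxt.append(n)
        frontier = nxt
    return found
```

With d1 = 40 and d2 = 10 on a 240x320 frame, this layout yields 6 x 8 = 48 coarse towers and 24 x 32 = 768 fine towers, so the coarse scan inspects at most 48 x 25 = 1200 pixels per frame.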

Definition 1: A tower is in boundary if it matches the features of the object and at least one of its neighboring towers does not. Towers that have fewer than six neighbors around them are also considered in boundary.

The boundary spreading algorithm starts from the location of the signal received from a coarse tower and proceeds as follows:

Step 1: Move left to find the first in-boundary tower, say t0.
Step 2: Set T = {t0}.
Step 3: Set T1 = T, T2 = ∅.
Step 4: For each element of T1, scan its six-neighbor towers to find the next in-boundary tower t*. If t* ∉ T, then set T = T ∪ {t*} and T2 = T2 ∪ {t*}. Once all elements of T1 have been processed, if T2 = ∅, go to Step 6.
Step 5: Set T1 = T2, T2 = ∅, and go back to Step 4.
Step 6: Return T.

T is then the set of boundary towers, which exposes the shape of the object containing the initial signal.

4. GESTURE RECOGNITION

After extracting the hand with the Tower algorithm, we recognize the gesture. Since hand images are two-dimensional, it is natural to use pseudo 2D HMMs to recognize single-hand gestures. The pseudo 2-DHMMs in our system are realized as a vertical connection of horizontal HMMs (k) [3].

4.1. Design of hand gesture models in VSL

Each hand gesture in VSL is mapped to a P2DHMM. After obtaining the hand ROI, we resize it to a standard size. We then divide the hand image into five parts, corresponding to the five super states of a P2DHMM, according to the new feature type shown in Figure 6. This is a new feature type for hand gestures, which is more reliable for VSL than the one used for ASL. Moreover, the feature type used to define the super states is needed to distinguish the images of two different hand gestures, such as A and B. So, after reviewing our database, we chose it for the recognition system.

Figure 6: How to classify the hand ROI region into 5 superstates in the image.

The topology of the super state model is an N-state linear model, where only self-transitions and transitions to the following super state are allowed. Within each super state, a linear one-dimensional hidden Markov model (1DHMM) [10] models each row, independently of its neighboring rows. In the part of the image corresponding to each super state, we use P x L sampling windows, which scan the image from left to right and top to bottom, to obtain a sequence of observations for each 1DHMM in each super state. As the sampling window moves from left to right along a line, each observation has Q columns of overlap with the preceding observation. When the right edge or the last full frame on the current line is reached, the sampling window moves back to the beginning of the line and shifts down, with M rows of overlap between successive lines. We use the Cr value of each pixel as the feature in the recognition system.

Figure 7: A sampling technique with P=L=20, M=Q=10.

4.2. Training the hand model

Each P2DHMM is trained with the Baum-Welch algorithm on the images of its gesture in the training set of the database. We also used the intelligent selection of training images from [3] to improve the training database.

4.3. Gesture recognition

The embedded Viterbi algorithm for P2DHMMs [8] is used to determine the probability of each hand model. The image is recognized as the hand gesture whose model has the highest production probability.

5. EXPERIMENTS

We built a dataset of 80 images for each hand gesture in VSL for training and testing, so there are 2320 images in our database. In the training process, we used 30 images to train the P2DHMM of each hand gesture. The remaining 50 images of each gesture were used to measure the accuracy of each P2DHMM. The tests were run on a PC with a Pentium IV 2.8 GHz CPU and 480 MB of RAM, with a 240x320 webcam, in the Matlab R2006a programming environment.

5.1. Results of tracking

The complete system works at about 25 frames/sec. The speed of the Tower tracking algorithm was typically measured at about 0.1 s per frame on our PC, so we used it to obtain the hand region in every frame. An advantage of the Tower algorithm is that it tracks the hand accurately even when the tester moves the hand very fast, which is nearly impossible with the Camshift method [9] or with Camshift combined with a Kalman filter [3]. Because it operates only on the boundary of the object, the results also show that the Tower tracking algorithm tracks faster and more accurately than the Camshift method. We ran many speed comparisons between our method and Camshift. Normally, our method is about 2-3 times faster than Camshift; when the tester moves the hand quickly toward the webcam, it is about 4-6 times faster. However, our method requires good features of the object.

5.2. Results of recognition

We tested our system using VSL. The images of the same gesture were taken at different times and from different testers. The best recognition rate is 96% on our database (Table 1) and 89% when testing on the camera (Table 2).
Table 1. The accuracy rate in our database.

Number of substates in super state    5       7       10
Number of testing images              2320    2320    2320
Number of correct images              2158    2228    2180
Number of failed images               162     92      140
Accuracy rate                         93%     96%     94%
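The accuracy rates and failure counts in Table 1 follow directly from the numbers of tested and correct images, as the short Python check below shows. The variable and function names are illustrative.

```python
# Counts from Table 1: substates per super state -> (correct images, tested images).
TABLE_1 = {5: (2158, 2320), 7: (2228, 2320), 10: (2180, 2320)}


def accuracy_percent(correct, total):
    """Accuracy as a whole-number percentage."""
    return round(100 * correct / total)


rates = {n: accuracy_percent(c, t) for n, (c, t) in TABLE_1.items()}
failed = {n: t - c for n, (c, t) in TABLE_1.items()}
best = max(rates, key=rates.get)  # substate count with the highest accuracy
```

Here `rates` reproduces the 93%, 96%, and 94% entries, and `best` is the 7-substate configuration.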

Table 2. The accuracy rate in testing on camera.

Number of substates in super state    5       7       10
Number of testing images              870     870     870
Number of correct images              740     775     748
Number of failed images               130     95      122
Accuracy rate                         85%     89%     86%

6. CONCLUSION

We have developed a new hand tracking method and combined it with a new feature type for hand gestures in VSL; the resulting system is robust and works automatically and in real time. We will continue developing the Tower method, not only for object tracking but possibly also for object segmentation.

7. REFERENCES

[1] Palm Products-Ways to Enter Data into a Palm Handheld, Aug. 2003. Available at (online).
[2] J. Davis and M. Shah, Recognizing hand gestures, Proceedings of European Conference on Computer Vision (ECCV), pp. 331-340, 1994.
[3] N. D. Binh, E. Shuichi, and T. Ejima, Real-time hand tracking and hand recognition system, ICCI 2006, 5th IEEE International Conference, 2006.
[4] D. J. Sturman and D. Zeltzer, Survey of glove-based input, IEEE Computer Graphics and Applications, 14:30-39, 1994.
[5] J. Yang, Y. Xu, and C. S. Chen, Gesture Interface: Modeling and Learning, IEEE International Conference on Robotics and Automation, Vol. 2, pp. 1747-1752, 1994.
[6] R. Lockton and A. W. Fitzgibbon, Real-time gesture recognition using deterministic boosting, Proceedings of British Machine Vision Conference, 2002.
[7] Pham The Bao, Analysis Skin Color by PCA and Relations in YCbCr Color Space, Proceedings of The 21st Int. Technical Conference on Circuits/Systems, Computers and Communications, pp. 329-332, Thailand, 2006.
[8] Vesa Matti Mantyla, Discrete hidden Markov models with application to isolated user dependent hand gesture recognition, VTT Publications 449, Technical Research Center of Finland.
[9] Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Trans. Pattern Anal. Machine Intell., 17:790-799, 1995.
[10] L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. of IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
