Tracking hand motion with a color dotted glove for sign language recognition
TABLE OF CONTENTS
List of Tables and Figures
Summary
1.0 Introduction
2.0 American Sign Language (ASL)
3.0 Hand Tracking Technologies
3.1 Tracking with interface
3.1.1 Optical Tracking
3.1.1.1 Marker Systems
3.1.1.2 Silhouette Analysis
3.1.2 Magnetic tracking
3.1.3 Acoustic tracking
3.2 Glove Tracking
4.0 Interpreting sign language
4.1 Using a recurrent neural network
4.2 Using hidden Markov model
5.0 Proposed Methodology
5.1 Glove Design
5.1.1 Glove vs. Bare hand tracking
5.2 Rasterizing the frame
5.3 Color and pixel correcting
5.4 Indexing the library database
5.5 Matching and tracking
5.6 Pose estimation and finding the nearest neighbor
5.7 Blending nearest neighbor
5.8 Recognizing ASL
6.0 Discussion
7.0 Conclusion
8.0 Recommendation
References

FIGURES
Figure 2.0: Usage of ASL
Figure 3.1.1.2: Krueger's manipulation of graphics by hand
Figure 4.1.1: Sign language word recognition system by using recurrent neural network
Figure 4.1.2: Recurrent Neural Networks
Figure 4.2: The four states HMM used for recognition
Figure 5.1: Glove Design
Figure 5.1.1: Bare hand estimation and edge detection
Figure 5.6: Hausdorff-like image distance
Figure 5.8: Interpretation of sign language alphabets

TABLES
Table 5.3: Estimating neighboring color
Table 6.1: Technical details comparison
Table 6.2: Comparison of advantage and limitations
Summary
Sign language is an essential communication tool for deaf people. Its use is not tied to a specific ethnicity, location, or even household. Rather, people learn ASL because they are deaf, hearing impaired or, less commonly, speech impaired, or because they have family or friends who sign. Because sign language is not practiced in all walks of life, a person who relies on it faces difficulties in everyday conversation. To address this problem, hand and finger gestures can be tracked and recognized by a computer system and then translated through a voice output device. For the last two decades, hand-tracking systems have been widely used in industrial applications, virtual reality and medicine, but their expense and complexity have limited their deployment among ordinary consumers. The purpose of this report is to narrow the communication gap between deaf and hearing people by developing an inexpensive and simple hand tracking system that can be used to interpret and translate sign language. The report focuses on American Sign Language (ASL) because it is practiced around the world. Different hand tracking technologies and two possible strategies for interpreting ASL are discussed and compared in terms of their advantages and limitations. Based on the results, the report proposes a consumer hand tracking system that uses a real-time, data-driven pose estimation technique and requires only a webcam and a polymer glove with a specific pattern. The glove design and simple algorithms make it possible to employ a nearest-neighbor approach to track hands at interactive rates. The tracked motion can then be interpreted and translated using a sign language recognition library.
Glossary
Silhouette: A silhouette is the image of a person, an object or a scene represented as a solid shape of a single color, usually black, with its edges matching the outline of the subject.
Virtual Reality: Virtual reality (VR) is a term that applies to computer-simulated environments that can simulate physical presence in places in the real world, as well as in imaginary worlds.
Neural networks: The term neural network was traditionally used to refer to a network or circuit of biological neurons. The modern usage of the term often refers to artificial neural networks, which are composed of artificial neurons or nodes.
Rasterisation: Rasterisation is the task of taking an image described in a vector graphics format (shapes) and converting it into a raster image (pixels or dots) for output on a video display or printer, or for storage in a bitmap file format.
Degree of Freedom (DOF): For a hand model, the number of degrees of freedom is the number of independent parameters (such as global position, orientation and joint angles) needed to specify a pose.
Hausdorff distance: The Hausdorff distance measures how far two subsets of a metric space are from each other. It turns the set of nonempty compact subsets of a metric space into a metric space in its own right. It is named after Felix Hausdorff.
Gaussian radial basis kernel: A Gaussian radial basis kernel is a real-valued function whose value depends only on the distance from the origin and falls off as a Gaussian of that distance.
1.0 Introduction
Conveying meaning using hand shapes, body language and expression, otherwise known as sign language, is an essential tool that lets visually, hearing and speech impaired people communicate with others. Because of its diverse patterns and complexity, a person who excels at sign language often cannot communicate with a listener who does not know it. As sign language is not practiced in all walks of life, such a person faces difficulties in everyday conversation. Speech synthesizers (e.g. text to speech) are widely used by people with visual impairments or reading disabilities, but communicating with a listener through this type of device is rarely fluent and requires a significant amount of time. Computer recognition of sign language can enable communication with hearing, visually or speech impaired people. Articulated finger tracking systems have been widely used in professional and scientific settings, but they are rarely developed for consumer applications because of their price and complexity.
So far, hand tracking systems have been developed using different technologies, e.g. optical tracking of LEDs or infrared-reflecting markers, image-based visual tracking, magnetic tracking, acoustic tracking, LED gloves and digital data entry gloves. This report discusses different methods of finger tracking that can be used for computer recognition of sign language. Based on facts and research, it proposes a simple but effective system for real-time tracking of hand motion that requires only a webcam and a cloth glove with color markers placed in a custom pattern. A library database would be created for reference, and a pose estimate would be deduced to confirm the track.
2.0 American Sign Language (ASL)
Although the number of ASL speakers is unknown, there were 2.5 million deaf people in the United States in the year 2000 who were dependent on ASL [1]. The availability of ASL throughout the world is shown in Figure 2.0. ASL's grammar allows more flexibility in word order than English and sometimes uses redundancy for emphasis. ASL uses approximately 6000 gestures for common words and communicates obscure words or proper nouns through finger spelling [2]. Because of ASL's availability and ease of use, this report chose ASL as its working sign language.
3.0 Hand Tracking Technologies
In this section, hand tracking technologies are discussed with emphasis on their advantages and limitations. These technologies can be divided into interface tracking and glove technologies. Tracking with an interface uses optical, magnetic, or acoustic sensing to determine the 3-space position of the hand. Glove technologies use an electromechanical device fitted over the hand and fingers to determine hand shape [4].
3.1.1.1 Marker Systems
Flashing infrared LEDs have been widely used as markers in the medical and entertainment industries. Markers are placed on each operative finger of each hand, and cameras capture each marker and measure its position. Two types of marker system have been developed to record the motion of the limbs of the body; one is the infrared LED system, such as Selspot [7], Op-Eye, and Optotrak [8].

Limitations
1. A high processing time is required to analyze several camera images and to determine each marker's 3D position. [6]
2. A complex algorithm is needed to infer the pose estimate. [7]
3. Multiple cameras are needed to resolve the ambiguities that arise when markers coincide in the visual field.
4. The inability to resolve these ambiguities restricts its use for tracking fingers in interactive applications.
3.1.1.2 Silhouette Analysis
Silhouette analysis of an image can easily distinguish body parts such as the head, legs, arms and fingers. Myron Krueger successfully analyzed complex motions in real time by processing silhouette images with custom hardware. Based on this technique, he developed a wide collection of interactions and games that require neither gloves nor goggles. The movements and actions were integrated into his system, called Videoplace [10]. Inspired by Krueger's work, Pierre Wellner developed DigitalDesk [11]. The idea behind DigitalDesk is to mount a video camera above an ordinary physical desk, pointing down at the work surface. By processing the camera output, the system can determine when the user points (using an LED-tipped pen) or gestures above a real or projected object. This allows the user to run and edit a projected text file or a calculator by making gestures (see figure).

Figure 3.1.1.2: Krueger's manipulation of graphics by hand, fingertips controlling a spline curve [10].

Limitations
1. Consumer-grade cameras usually have a low frame rate (24-60 fps), which makes it difficult to track rapidly moving fingers.
2. The poor resolution (less than 300 dpi) of consumer-grade cameras makes it difficult to locate the fingers as they occlude each other and are occluded by the hand [5].
3. Complex algorithms and techniques are needed to interpret complex real-time motions.
structured sets belong to the sign languages. In sign language, where each gesture already has an assigned meaning, strong rules of context and grammar are applied to make recognition tractable. To date, most work on sign language recognition has employed expensive "datagloves" which tether the user to a stationary machine [21] or computer vision systems limited to a calibrated area [22]. Current successful gesture recognition systems are based on computer vision technology and Virtual Reality (VR) [23]. VR glove-based gesture recognition systems use a VR glove to extract a sequence of 3D hand configuration sets, which contain finger orientation angles, and use various structures of neural networks [24] or Hidden Markov Models (HMM) [25] to recognize the 3D motion data as gestures.
Figure 4.1.1: Sign language word recognition system using a recurrent neural network [25]. Diagram labels: Data Glove, Start, Result.
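... = 1

If the initial state distribution is $\pi = \{\pi_i\}$, an HMM can be written in the compact notation

$$\lambda = (A, B, \pi)$$

which represents the complete parameter set of the model: the state transition probabilities $A$, the observation probabilities $B$, and the initial state distribution $\pi$.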
HMMs are widely used in speech recognition, gesture recognition and signal processing systems. They provide a framework for modeling the dynamic dependencies and correlations between 3-D measurements. The dynamic dependencies are modeled implicitly by a Markov chain with a specified number of hidden states. The initial topology of an HMM can be determined by estimating how many different states are involved in specifying a sign. While better results might be obtained by tailoring a different topology to each sign, a four-state HMM with one skip transition was determined to be sufficient for this task [29] (Figure 4.2).
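As a rough illustration of how a four-state HMM with one skip transition scores an observation sequence, the following minimal sketch implements the standard forward algorithm. The names `A`, `B` and `PI`, the position of the skip transition (state 0 to state 2), the two-symbol observation alphabet and every probability value are assumptions chosen for illustration; they are not taken from [29].

```python
# Left-to-right HMM with four hidden states and one skip transition (0 -> 2).
# All probabilities are illustrative placeholders, not trained values.

A = [  # A[i][j] = P(next state j | current state i); each row sums to 1
    [0.5, 0.3, 0.2, 0.0],  # state 0: stay, advance, or skip to state 2
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 1.0],
]
B = [  # B[i][k] = P(observation symbol k | state i), two symbols assumed
    [0.9, 0.1],
    [0.6, 0.4],
    [0.3, 0.7],
    [0.1, 0.9],
]
PI = [1.0, 0.0, 0.0, 0.0]  # the model always starts in state 0


def forward_likelihood(observations, a=A, b=B, pi=PI):
    """Return P(observations | model) using the forward algorithm."""
    n = len(pi)
    # Initialisation: probability of starting in each state and emitting
    # the first symbol.
    alpha = [pi[i] * b[i][observations[0]] for i in range(n)]
    # Induction: propagate through the transitions, then emit the next symbol.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * a[i][j] for i in range(n)) * b[j][obs]
            for j in range(n)
        ]
    return sum(alpha)
```

In a recognizer built this way, one such model would typically be kept per sign, and an observed sequence would be assigned to the sign whose model gives the highest likelihood.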
Figure 4.2: The four-state HMM used for recognition [29].
Schlenzig et al. [31] used hidden Markov models to recognize "hello," "good-bye," and "rotate" in sign language. Wilson and Bobick [32] explored incorporating multiple representations in HMM frameworks, and Campbell et al. [33] used an HMM-based gesture system to recognize 18 T'ai Chi gestures with 98% accuracy.
looks for these three fully saturated colors (#FFA500, #00FF00, #FF00FF) to distinguish the front, the back and the fingertips of the hand. #FFA500 (orange), #00FF00 (lime) and #FF00FF (magenta) are classified as master colors. The color patches and the pattern on the glove enable quicker and more robust pose estimation with less complex color identification algorithms [36]. Orange and lime patches meet along the side of each finger, which makes the side of the finger easy to distinguish. The 3D hand model has 26 degrees of freedom (DOF): 6 DOFs for the global transformation and 4 DOFs per finger (6 + 5 x 4 = 26).
5.1.1 Glove vs. Bare Hand tracking
In bare-hand pose estimation, two very different poses can map to very similar images. This is a difficult challenge that requires slower and more complex inference algorithms to address. An extra step (edge detection) is needed to obtain the skin data for good results [37]. With a gloved hand, very different poses always map to very different images (see Figure 3). This allows us to use a simple image lookup approach.
significant amount of color pixels. To solve this problem, all colors close to magenta, lime and orange are classified as glove pixels (Table 5.3). The system rejects every other color in the frame, including the background color. For best results, nothing in these three colors other than the glove should be present in the visual area.

Table 5.3: Estimating neighboring color
Master color | RGB decimal | Accepted neighboring colors (RGB range)
#FFA500 | 255, 165, 0 | 205~255, 130~180, 0
#00FF00 | 0, 255, 0 | 0~90, 170~255, 0~90
#FF00FF | 255, 0, 255 | 170~255, 0~80, 170~255
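The range test in Table 5.3 can be sketched as a per-pixel lookup. This is a minimal illustration, assuming a frame is a list of rows of (R, G, B) tuples; the names `NEIGHBOR_RANGES`, `classify_pixel` and `glove_mask` are hypothetical, and the color names are used only as labels.

```python
# Accepted RGB ranges for each master color, copied from Table 5.3.
# Each entry is ((R_min, R_max), (G_min, G_max), (B_min, B_max)).
NEIGHBOR_RANGES = {
    "orange": ((205, 255), (130, 180), (0, 0)),
    "lime": ((0, 90), (170, 255), (0, 90)),
    "magenta": ((170, 255), (0, 80), (170, 255)),
}


def classify_pixel(rgb):
    """Return the master color label of a glove pixel, or None otherwise."""
    for name, ranges in NEIGHBOR_RANGES.items():
        if all(lo <= channel <= hi for channel, (lo, hi) in zip(rgb, ranges)):
            return name
    return None


def glove_mask(frame):
    """Label every pixel of a frame; None marks a non-glove pixel."""
    return [[classify_pixel(pixel) for pixel in row] for row in frame]
```

For example, `glove_mask([[(255, 165, 0), (10, 10, 10)]])` labels the first pixel as orange and rejects the second. The three ranges do not overlap, so the order of the checks does not affect the result.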
After color pixel classification, only two classes of pixel remain: glove pixels and non-glove pixels. The glove pixels are cropped and reduced to 40x40 pixel micro images. Let this be denoted the Micro Image Set; it is classified as the hand region. Once the hand region is acquired, it is queried against the library database for a positive match. Reducing the frame to micro images speeds up the query for the positive match.
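The cropping and reduction step might look like the sketch below. It assumes the classified frame is a list of rows of labels in which `None` marks a non-glove pixel, and it uses nearest-neighbor resampling because the labels are categorical; both choices, and the function names, are assumptions rather than part of the described system.

```python
def crop_to_glove(mask):
    """Crop a label mask (None = non-glove) to the glove bounding box."""
    rows = [r for r, row in enumerate(mask) if any(v is not None for v in row)]
    cols = [c for row in mask for c, v in enumerate(row) if v is not None]
    if not rows:
        return []  # no glove pixel in this frame
    top, bottom, left, right = rows[0], rows[-1], min(cols), max(cols)
    return [row[left:right + 1] for row in mask[top:bottom + 1]]


def to_micro_image(mask, size=40):
    """Resample the cropped glove region to a size x size micro image."""
    cropped = crop_to_glove(mask)
    if not cropped:
        return [[None] * size for _ in range(size)]
    height, width = len(cropped), len(cropped[0])
    # Nearest-neighbor lookup: each output pixel copies one source label.
    return [
        [cropped[(y * height) // size][(x * width) // size] for x in range(size)]
        for y in range(size)
    ]
```

Cropping to the bounding box before resampling keeps the micro image independent of where the hand appears in the frame.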
The members of D are denoted d1, d2, d3, ..., dn (where n is a natural number). The distance metric between dm and dn is denoted s(dm, dn). Low-dispersion sampling is used to create a uniform set of samples D from an overcomplete collection of finger configurations. A sampling algorithm [35] minimizes dispersion at each iteration by successively selecting the configuration furthest from the previously selected samples:

$$d_{i+1} = \arg\max_{d} \min_{1 \le j \le i} s(d, d_j)$$

where $d$ ranges over the overcomplete collection and $d_1, \dots, d_i$ are the samples selected up to iteration $i$.
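Low-dispersion sampling of this kind can be sketched as a greedy farthest-point selection: each new sample is the candidate whose distance to its nearest already-selected sample is largest. The sketch below is a generic version of that idea, not the algorithm of [35]; seeding with the first candidate and the Euclidean `joint_angle_distance` standing in for s(dm, dn) are both assumptions.

```python
def low_dispersion_sample(candidates, distance, count):
    """Greedily select up to `count` candidates, each time taking the one
    farthest from its nearest already-selected sample."""
    if not candidates or count <= 0:
        return []
    selected = [candidates[0]]  # assumption: seed with the first candidate
    # nearest[i] = distance from candidates[i] to its closest selected sample
    nearest = [distance(c, selected[0]) for c in candidates]
    while len(selected) < min(count, len(candidates)):
        best = max(range(len(candidates)), key=lambda i: nearest[i])
        selected.append(candidates[best])
        nearest = [
            min(nearest[i], distance(c, candidates[best]))
            for i, c in enumerate(candidates)
        ]
    return selected


def joint_angle_distance(p, q):
    """Hypothetical metric s(dm, dn): Euclidean distance between two
    vectors of joint angles."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
```

Keeping the running `nearest` list means each iteration costs one pass over the candidates instead of recomputing every pairwise distance.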
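The distance between two micro images $r_1$ and $r_2$ is built from the directed divergence

$$\tilde{d}(r_1, r_2) = \frac{1}{|C_1|} \sum_{(x,y) \in C_1} \min_{(u,v) \in S_{xy}} \sqrt{(x-u)^2 + (y-v)^2}$$

$$S_{xy} = \{(u,v) : r_1(x,y) = r_2(u,v)\}$$

$$C_1 = \{(x,y) : r_1(x,y) \neq \text{background}\}$$

$$d(r_1, r_2) = \tilde{d}(r_1, r_2) + \tilde{d}(r_2, r_1)$$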
Figure 5.6: Hausdorff-like image distance. A database image and a query image are compared by computing the divergence from the database to the query and from the query to the database. [33]
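The two-way comparison described in the caption of Figure 5.6 can be sketched as follows. Each image is assumed to be a list of rows of color labels with `None` as background; the fallback penalty used when a label has no counterpart in the other image (the image diagonal) is an assumption introduced here, as are the function names.

```python
import math

BACKGROUND = None  # assumption: non-glove pixels are stored as None


def directed_distance(src, dst):
    """Average, over the labeled pixels of src, of the distance to the
    nearest pixel in dst that carries the same label."""
    # Assumed penalty when dst has no pixel with a given label.
    penalty = math.hypot(len(src[0]), len(src)) if src else 0.0
    by_label = {}
    for v, row in enumerate(dst):
        for u, label in enumerate(row):
            if label is not BACKGROUND:
                by_label.setdefault(label, []).append((u, v))
    total, count = 0.0, 0
    for y, row in enumerate(src):
        for x, label in enumerate(row):
            if label is BACKGROUND:
                continue
            count += 1
            matches = by_label.get(label)
            if matches:
                total += min(math.hypot(x - u, y - v) for u, v in matches)
            else:
                total += penalty
    return total / count if count else 0.0


def image_distance(a, b):
    """Symmetric distance: database-to-query plus query-to-database."""
    return directed_distance(a, b) + directed_distance(b, a)
```

Summing both directions means a query that covers only part of a database image (or the reverse) is still penalized, which a single directed term would miss.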
Figure 5.8: Interpretation of sign language alphabets. Each panel pairs a sign alphabet with its captured frame.
6.0 Discussion
The proposed hand tracking system combines optical and glove tracking. Although LED markers, magnetic sensors, silhouette analysis and acoustic tracking provide robust and smooth tracking and are widely used in the automation, medical and entertainment industries, the proposed design avoids these technologies because they require more sophisticated algorithms, expense and time. This keeps the system affordable for consumers. A detailed comparison of the technical data of the available hand tracking systems and the proposed prototype is given in Table 6.1.

Table 6.1: Technical details comparison [7], [8], [9], [10], [11], [5].
Device/system name | Tracking | Retail price/manufacturing expense | Camera/sensor used/DOF | Speed | Weight | Resolution/area of coverage
trakSTAR | Magnetic tracking | $50,000.00 | 6 DOF | 1000 fps | 2 kg | -
Vicon system | Optical and camera | $30k-150k | 10-24 | 1000 fps | 6 kg | 6 megapixel, 1280x1024 pixel, 900
Ascension (Polhemus) | Magnetic tracking | $50,000.00 | - | 120 fps | 1.8 kg | -
Exoskeletons (joint sensors plus a gyroscope) | Mechanical system | $40,000.00 | 180 sensors | 500 samples/sec | 5 kg | -
Mattel Power Glove | Acoustic tracking | $10,000.00 | 6 DOF | 120 samples/sec | 1.1 kg | 2 m radial area
Digital Desk [5] | Optical tracking, silhouette analysis | $5,000.00 | 2 cameras | 30 fps | - | Less than 1 m
Videoplace | Optical tracking, silhouette analysis | $2,000.00 | 1 camera | 24-60 fps | - | Less than 1 m
MIT LED glove | LED glove technology | - | 16 DOF | 100~120 samples/sec | 0.8 kg | -
Cyber Glove | 22 thin foil strain gauges sewn into the fabric glove to track; electromagnetic | $5,000.00+ | 22 thin foil | 300 samples/sec | 0.5 kg | -
(device name missing) | - | $40,000.00 | 10 cameras | - | 2 kg | 16 megapixel
(device name missing) | - | $105.00 | - | - | - | -
A detailed comparison of advantages and limitations is given in Table 6.2.

Table 6.2: Comparison of advantage and limitations [7], [8], [9], [10], [11], [5].
Device/system name: trakSTAR; Vicon optical system (motion analysis); Optotrak Certus; Ascension (Polhemus); Exoskeletons (joint sensors plus a gyroscope), mechanical system; VPL Data Glove; Sayre Glove; Digital Data Entry Glove; Cyber Glove; Vicon MX; Proposed prototype: color dotted glove.
Advantage: High data rate, highly available in industry; High data rate, highly available in industry; Minimum data loss after occlusion is identified, no environment restriction; No occlusion, orientation information recorded; Reasonable cost; Effective for multi-functional control; First ASL recognizer; From Virtual Technologies, it is comfortable, easy to use, and has an accuracy and precision well suited for complex gestural work or fine manipulations; Precise and accurate tracking; Light weight, comfortable, faster than using HMM or a recurrent neural network because it queries the database for a positive match rather than processing topologies; Fits a rigid body skeleton well, high data rate.
Limitations: Expense, occlusion, relies on software for data resolution; Low capture rate, small region; Expense, occlusion; Slow estimation process time, limited accuracy due to inadequate library database; Slow processing time; Slow capture speed; Fewer gestures; Environment restriction, can be bulky; Not accurate in body location.
Although the proposed system has a slow estimation response time because it queries the database for every frame, it proved credible in terms of expense and complexity.
7.0 Conclusion
This report introduced a hand-tracking user-input device composed of a single camera and a polymer glove. The report argues that this system can work effectively without using an HMM or a recurrent neural network. The system is logically balanced and should work effectively in 3-D manipulation and pose recognition tasks. It could be improved by installing fine sensors and inverse kinematics algorithms, but that would undermine its cost effectiveness, because the primary purpose of this report is to deliver a robust and low-cost user-input device.
8.0 Recommendation:
The proposed system allows several possible extensions. More cameras can be installed for greater accuracy, as long as the hands do not occlude. Hand movement and finger configuration can be replaced with LED pens or multi-touch interfaces for a better user experience. Inverse kinematics [25] and optimal smoothness [33] can be applied to make detection and tracking more accurate. The camera calibration process can be improved with better sensor alignment and resolution. Alongside the recognition of sign language, the system can also be used in virtual surgery, virtual games and sports.
References:
1. Judith Holt, Sue Hotto and Kevin Cole, Demographic Aspects of Hearing Impairment: Questions and Answers, Third Edition, 1994.
2. Karen Nakamura, About ASL, Deaf Resource Library, http://www.deaflibrary.org.
3. Robert A. Heinlein, "Science fiction: its nature, faults and virtues", The Science Fiction Novel, Chicago: Advent, 1959.
4. G.J. Grimes, "Digital Data Entry Glove Interface Device", Bell Telephone Laboratories, Murray Hill, NJ, US Patent 4,414,537, Nov. 8, 1983.
5. D.J. Sturman and D. Zeltzer, "A survey of glove-based input", IEEE Computer Graphics and Applications, January 1994.
6. J. Rehg and T. Kanade, DigitEyes: Vision-Based Human Hand-Tracking, School of Computer Science Technical Report CMU-CS-93-220, December 1993.
7. Herman J. and Woltering, Optotrak, Selspot, Gait Measurement in Two- and Three-Dimensional Space: A Preliminary Report, Cleveland, 1994.
8. Optotrak Certus Motion Capture System, available at: http://www.ndigital.com/lifesciences/certus-techspecs.php.
9. Vicon MX, available at: http://www.vicon.com/products/sensors.html.
10. Myron Krueger, Artificial Reality 2, Addison-Wesley Professional, 1991.
11. Pierre Wellner, Interacting with paper on the DigitalDesk, Rank Xerox EuroPARC, Cambridge, UK, Volume 36, Issue 7, pp. 87-96, July 1993.
12. Jannick P. Rolland, Yohan Baillot, and Alexei A. Goon, A Survey of Tracking Technology for Virtual Environments, Center for Research and Education in Optics and Lasers (CREOL), University of Central Florida.
13. Polhemus FASTTRAK official website, available at: http://www.polhemus.com/?page=Motion_Fastrak
14. Ascension Technologies trakSTAR official website, available at: http://www.ascensiontech.com/medical/trakSTAR.php
15. Logitech video technologies, http://www.logitech.com/en-us/488/455
16. A.G.E. Tech, Abrams Gentile Entertainment, 2009.
17. Vitor F. Pamplona, Leandro A. F. Fernandes, João Prauchner, Luciana P. Nedel and Manuel M. Oliveira, The Image-Based Data Glove, Proceedings of X Symposium on Virtual Reality (SVR'2008), João Pessoa, 2008. Anais do SVR 2008, Porto Alegre: SBC, 2008, pp. 204-211.
18. Dr. G. Grimes, Digital Data Entry Glove, US Patent 4,414,537, patented Nov. 8, 1983.
19. Tom Zimmermann et al., Dataglove: A hand gesture interface device, 1985.
20. Ken Pimentel and Kevin Teixeira, "Virtual Reality: through the new looking glass", Intel/Windcrest/McGraw Hill, 1993.
21. L. Campbell, D. Becker, A. Azarbayejani, A. Bobick, and A. Pentland, "Invariant features for 3-D gesture recognition," Intl. Conf. on Face and Gesture Recogn., pp. 157-162, 1996.
22. Y. Cui and J. Weng, "Learning-based hand sign recognition," Intl. Work. Auto. Face Gest. Recog. (IWAFGR), pp. 201-206, 1995.
23. Thad Starner, Joshua Weaver, and Alex Pentland, A Wearable Computer Based American Sign Language Recognizer, The Media Laboratory, Massachusetts Institute of Technology, 2001.
24. Marcus Vinicius Lamar, Hand Gesture Recognition using T-CombNET: A Neural Network Model dedicated to Temporal Information Processing, Doctoral Thesis, Institute of Technology, Japan, 2001.
25. Ankit Chaudhary, J. L. Raheja, Karen Das, and Sonia Raheja, Intelligent Approaches to interact with Machines using Hand Gesture Recognition in Natural way: A Survey, International Journal of Computer Science & Engineering Survey (IJCSES), vol. 2(1), Feb. 2011.
26. Jian-kang Wu, Neural networks and Simulation methods, Marcel Dekker, Inc., USA, 1994. Available at: http://books.google.co.in/books/about/Neural_networks_and_simulation_methods.html?id=95iQOxLDdK4C&redir_esc=y
27. Manar Maraqa and Raed Abu-Zaiter, Recognition of Arabic Sign Language (ArSL) Using Recurrent Neural Networks, IEEE First International Conference on the Applications of Digital Information and Web Technologies, p. 478-48, 2008.
28. Tie Yang and Yangsheng Xu, Hidden Markov Model for Gesture Recognition, May 1994.
29. Thad Eugene Starner, Visual Recognition of American Sign Language Using Hidden Markov Models, 1999.
30. J. Schlenzig, E. Hunter, and R. Jain, "Recursive identification of gesture using hidden Markov models," Proc. Second Ann. Conf. on Appl. of Comp. Vision, pp. 187-194, 1994.
31. A. Wilson and A. Bobick, "Learning visual behavior for gesture analysis," Proc. IEEE Int'l. Symp. on Comp. Vis., Nov. 1995.
32. C.Y. Suen, M. Berthod, and S. Mori, Automatic recognition of handprinted characters: the state of the art, Proceedings of the IEEE, Vol. 68, No. 4, pp. 469-487, 1980.
33. B. Dorner, Chasing the colour glove: visual hand tracking, 1994.
34. Robert Y. Wang and Jovan Popovic, Real-Time Hand-Tracking with a Color Glove, 2009.
35. R. White, K. Crane, and D. A. Forsyth, Capturing and animating occluded cloth, ACM Transactions on Graphics, 2008.
36. M. Yuan, F. Farbiz, C.M. Manders, and T.K. Yin, Robust hand tracking using a simple color classification technique, The International Journal of Virtual Reality, 8(2), 2009.