Sign Recognition by Hand Tracking

University of Victoria Faculty of Engineering
Tracking hand motion with a color dotted glove for sign language recognition
ASM Saeedul Alam, Electrical Engineering, saeedul@uvic.ca, September 4, 2012
TABLE OF CONTENTS
List of Tables and Figures
Summary
1.0 Introduction
2.0 American Sign Language (ASL)
3.0 Hand Tracking Technologies
3.1 Tracking with interface
3.1.1 Optical Tracking
3.1.1.1 Marker Systems
3.1.1.2 Silhouette Analysis
3.1.2 Magnetic tracking
3.1.3 Acoustic tracking
3.2 Glove Tracking
4.0 Interpreting sign language
4.1 Using a recurrent neural network
4.2 Using hidden Markov model
5.0 Proposed Methodology
5.1 Glove Design
5.1.1 Glove vs. Bare hand tracking
5.2 Rasterizing the frame
5.3 Color and pixel correcting
5.4 Indexing the library database
5.5 Matching and tracking
5.6 Pose estimation and finding the nearest neighbor
5.7 Blending nearest neighbor
5.8 Recognizing ASL
6.0 Discussion
7.0 Conclusion
8.0 Recommendation
References
LIST OF TABLES AND FIGURES
FIGURES
Figure 2.0: Usage of ASL
Figure 3.1.1.2: Krueger's manipulation of graphics by hand
Figure 4.1.1: Sign language word recognition system by using recurrent neural network
Figure 4.1.2: Recurrent Neural Networks
Figure 4.2: The four-state HMM used for recognition
Figure 5.1: Glove Design
Figure 5.1.1: Bare hand estimation and edge detection
Figure 5.6: Hausdorff-like image distance
Figure 5.8: Interpretation of sign language alphabets
TABLES
Table 5.3: Estimating neighboring color
Table 6.1: Technical details comparison
Table 6.2: Comparison of advantage and limitations
Summary
Sign language is an essential communication tool for deaf people. Sign language use is not associated with a specific ethnicity, location, or even household. Rather, people learn ASL because they are deaf, hearing impaired or, less commonly, speech impaired, or because they have family or friends who sign. Because sign language is not practiced in all walks of life, a person who relies on it faces difficulties in everyday conversation. To address this problem, hand and finger gestures can be tracked, the signs can be recognized by a computer system, and the result can be translated through a voice output device. For the last two decades hand-tracking systems have been widely used in industrial applications, virtual reality and medicine, but because of their expense and complexity their deployment among regular consumers has been limited. The purpose of this report is to reduce the communication gap between deaf and hearing people by developing an inexpensive and simple hand tracking system that can be used for interpreting and translating sign language. This report focuses on American Sign Language (ASL) because it is practiced around the world. Different hand tracking technologies and two possible strategies for interpreting ASL are discussed and compared in terms of their advantages and limitations. Based on the results, the report proposes a consumer hand tracking system that uses a real-time, data-driven pose estimation technique with only a webcam and a polymer glove with a specific pattern. The glove design and simple algorithms make it possible to employ a nearest-neighbor approach to track hands at interactive rates. The tracked motion can be interpreted and translated using a sign language recognition library.
Glossary
Silhouette: A silhouette is the image of a person, an object or a scene represented as a solid shape of a single color, usually black, with its edges matching the outline of the subject.
Virtual Reality: Virtual reality (VR) is a term that applies to computer simulated environments that can simulate physical presence in places in the real world, as well as in imaginary worlds.
Neural networks: The term neural network was traditionally used to refer to a network or circuit of biological neurons. The modern usage of the term often refers to artificial neural networks, which are composed of artificial neurons or nodes.
Rasterisation: Rasterisation is the task of taking an image described in a vector graphics format (shapes) and converting it into a raster image (pixels or dots) for output on a video display or printer, or for storage in a bitmap file format.
Degree of Freedom (DOF): The number of degrees of freedom is the number of independent parameters that are free to vary, here the parameters that define the pose of the hand model.
Hausdorff distance: The Hausdorff distance measures how far two subsets of a metric space are from each other. It turns the set of nonempty compact subsets of a metric space into a metric space in its own right. It is named after Felix Hausdorff.
Gaussian radial basis kernel: A Gaussian radial basis kernel is a real-valued function whose value depends only on the distance from the origin.
1.0 Introduction
Conveying meaning using hand shapes, body language and expression, otherwise known as sign language, is an essential tool for visual, hearing and speech impaired people to communicate with others. Because of its diverse patterns and complexity, a person who excels at sign language often fails to communicate with a listener who does not know sign language. Because sign language is not practiced in all walks of life, such a person faces difficulties in everyday conversation. Speech synthesizers or generators (e.g. text to speech) are widely used by people with visual impairment or reading disabilities, but communication with a listener through this type of device is rarely fluent and requires a significant amount of time. Computer recognition of sign language can enable communication with hearing, visual or speech impaired people. Articulated finger tracking systems have been widely used in professional and scientific settings, but they are rarely developed for consumer applications because of their price and complexity.
So far hand tracking systems have been developed using different technologies, e.g. optical tracking of LED or infrared reflecting markers, image-based visual tracking, magnetic tracking, acoustic tracking, LED gloves and digital data entry gloves. This report discusses different methods of finger tracking that can be used for computer recognition of sign language. Based on facts and research, it proposes a simple but effective system for real-time tracking of hand motion that requires only a webcam and a cloth glove with color markers placed in a custom pattern. A database library is created for reference, and a pose estimate is derived to confirm the track.
2.0 American Sign Language (ASL)

American Sign Language (ASL) is the language of choice for most deaf people in the United States, Canada and Africa [1]. ASL is a sign language in which the hands, arms, head, facial expression and body language are used to speak without sound. ASL features an entirely different grammar and vocabulary from spoken languages such as English [2]. Although the number of ASL speakers is unknown, there were 2.5 million deaf people in the United States in the year 2000 who were dependent on ASL [1]. The availability of ASL throughout the world is shown in Figure 2.0. ASL's grammar allows more flexibility in word order than English and sometimes uses redundancy for emphasis. ASL uses approximately 6000 gestures for common words and communicates obscure words or proper nouns through finger spelling [2]. Because of ASL's availability and ease of use, this report chose ASL as its target sign language.
Figure 2.0: Usage of ASL [2]. Map legend: ASL is the national sign language; ASL is used alongside other sign languages; insignificant use of ASL.
3.0 Hand Tracking Technologies

The tracking system is developed with a focus on user-data interaction. The objective is to establish communication between human and computer and to synchronize virtual data with hand gestures. Different types of tracking technologies have been used so far to track the 3D position of the hand and to capture finger configuration. The history of hand tracking goes back to the post-WWII development of master-slave manipulator arms and, earlier, to the development of the pantograph during the Renaissance [3]. In this section, different types of tracking technologies are discussed with emphasis on their advantages and limitations. These technologies can be divided into interface tracking and glove technologies. Tracking with an interface uses optical, magnetic, or acoustic sensing to determine the 3-space position of the hand. Glove technologies use an electromechanical device fitted over the hand and fingers to determine hand shape [4].
3.1 Tracking with interface

In this system, hand position is tracked by following the orientation of the hand and the finger configuration. Position tracking can be done using the following three technologies [5]:
1. Optical tracking, using a single camera or multiple cameras from a certain distance.
2. Magnetic tracking, radiating a magnetic pulse from a fixed source.
3. Acoustic tracking, using triangulation of ultrasonic waves to locate the hand.
3.1.1 Optical Tracking

In optical tracking, small markers are put on the major bone segments of the body. The markers might emit infrared light and can be either LEDs or reflecting dots. A single camera or multiple cameras capture the motion of the subject along with the markers. The software system locates those markers in 2D image coordinates and triangulates to calculate the 3D position of each marker [6]. Another method uses a single camera to capture the silhouette image of the subject, which is analyzed to determine the positions of the various parts of the body and the user's gestures.
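The triangulation step described above can be illustrated with a minimal sketch. It assumes the simplest possible setup, which is not taken from the cited systems: two identical, rectified pinhole cameras separated by a horizontal baseline, with pixel coordinates measured from the principal point. The function name and parameters are hypothetical.

```python
def triangulate_rectified(x_left, x_right, y, focal_px, baseline_m):
    """Estimate the 3D position (X, Y, Z) in metres of one marker.

    Assumption: both cameras are rectified, so the marker appears on the
    same image row y and only its column differs between the two views.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("marker must have positive disparity")
    # Depth follows from similar triangles: Z = f * B / disparity.
    z = focal_px * baseline_m / disparity
    # Back-project the left-image pixel to metric X and Y at that depth.
    return (x_left * z / focal_px, y * z / focal_px, z)


# Usage: a marker seen 20 pixels apart by cameras 10 cm apart.
position = triangulate_rectified(100.0, 80.0, 50.0, focal_px=800.0, baseline_m=0.1)
print(position)
```

Real marker systems use calibrated, non-rectified cameras and solve a small least-squares problem per marker, which is part of the processing cost listed under the limitations.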
3.1.1.1 Marker Systems
Flashing infrared LEDs have been used widely as markers in the medical and entertainment industries. On each hand, markers are placed on each operative finger. Cameras capture each marker and measure its position. Two types of marker system have been developed to record the motion of the limbs of the body:
1. Infrared LED systems such as Selspot [7], Op-Eye, and Optotrak [8].
2. Reflective marker systems such as Elite and Vicon Avalon [9].
Limitations
1. High processing time is required to analyze several camera images and to determine each marker's 3D position [6].
2. A complex algorithm is needed to infer the pose [7].
3. Multiple cameras are needed to resolve the ambiguities that arise when markers coincide in the visual field.
4. The inability to resolve ambiguities restricts its use for tracking fingers in interactive applications.
3.1.1.2 Silhouette Analysis
Silhouette analysis of an image can easily distinguish body parts such as the head, legs, arms and fingers. Myron Krueger successfully analyzed complex motions in real time by processing silhouette images with custom hardware. Based on this technique, he developed a wide collection of interactions and games that need no gloves or goggles. The movements and actions were integrated into his system called Videoplace [10]. Inspired by Krueger's work, Pierre Wellner developed the DigitalDesk [11]. The idea behind the DigitalDesk is to mount a video camera above an ordinary physical desk, pointing down at the work surface. By processing the camera output, the system can determine when the user points (using an LED-tipped pen) or gestures above a real or projected object. This allows the user to run and edit a projected text file or a calculator by making gestures (figure).
Figure 3.1.1.2: Krueger's manipulation of graphics by hand, fingertips controlling a spline curve [10].
Limitations
1. Consumer grade cameras usually have a low frame rate (24 fps to 60 fps), which makes it difficult to track rapidly moving fingers.
2. The poor resolution (less than 300 dpi) of consumer grade cameras makes it difficult to locate the fingers as they occlude each other and are occluded by the hand [5].
3. Complex algorithms and techniques are needed to interpret complex real-time motions.
3.1.2 Magnetic tracking

Magnetic tracking technology is quite robust and widely used for single or double hand-tracking [12]. Magnetic tracking uses a source element radiating a magnetic field and a small sensor that reports its position and orientation with respect to the source. Unlike optical and acoustic systems, magnetic systems do not rely on line-of-sight observation. However, metallic objects in the environment distort the magnetic field and give erroneous readings. Magnetic systems also require cable attachment to a central device (as do LED and acoustic systems) [5]. Polhemus FASTRAK and Ascension Technologies trakSTAR provide various multi-source, multi-sensor magnetic systems that track a number of points at up to 100 Hz in ranges from 3 to 20 feet [13], [14].
3.1.3 Acoustic tracking

Acoustic tracking uses high-frequency sound to triangulate a source within the work area. Most systems, such as those from Logitech [15] and the Mattel Power Glove [16], send out pings from a source (usually mounted on the hand) that are received by microphones in the environment. Precise placement of the microphones allows the system to locate the source in space to within a few millimeters. These systems rely on line-of-sight between the source and the microphones, and can suffer from acoustic reflections if surrounded by hard walls or other acoustically reflective surfaces. Multiple acoustic trackers must operate at non-conflicting frequencies, a strategy also used in magnetic tracking [5].
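How microphone placement turns ping arrival times into a position can be sketched as follows. This is a simplified 2D illustration under stated assumptions, not the method of any named product: the emission time of each ping is known, the speed of sound is a constant 343 m/s, and exactly three microphones at known positions are used. All names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # metres per second, assumed constant (about 20 degrees C)


def distances_from_times(times_s):
    """Convert ping travel times in seconds to distances in metres."""
    return [SPEED_OF_SOUND * t for t in times_s]


def trilaterate_2d(mics, dists):
    """Locate a source from three microphone positions and three distances.

    Subtracting the first circle equation from the other two leaves a
    2x2 linear system in the unknown source position (x, y).
    """
    (x1, y1), (x2, y2), (x3, y3) = mics
    r1, r2, r3 = dists
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("microphones must not be collinear")
    # Cramer's rule for the 2x2 system.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


# Usage: three microphones on the corners of a 2 m work area.
mics = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
print(trilaterate_2d(mics, [0.5**0.5, 2.5**0.5, 2.5**0.5]))
```

The collinearity check reflects why precise, well-spread microphone placement matters: nearly collinear microphones make the system ill-conditioned.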
3.2 Glove Tracking

Motion tracking with gloves instrumented with sensors, or with gloves that emit or reflect infrared light, gives accurate results [12]. These techniques also give real-time results but are expensive and may put some constraints on the possible hand movements. Inspired by Rich Sayre's work on the world's first data glove [17], Thomas et al. developed an inexpensive, lightweight glove using flexible tubes with a light source at one end and a photocell at the other. They used the voltage from each photocell to correlate with finger configuration [5]. In 1983, Gary Grimes developed the Digital Data Entry Glove, the first glove designed to recognize sign language. He used a cloth glove with numerous sensors sewn at specific positions to track finger movement [18]. Thomas Zimmerman's DataGlove [19], the Dexterous HandMaster (DHM) [20] and the VPL DataGlove [19] are the finest examples of modern glove tracking technologies. The advantages of glove technologies are faster response time, minimal environmental restrictions, availability in industry and minimal data loss once occlusion is identified. On the other hand, reliance on software for data resolution and high expense are the limitations [6].
4.0 Interpreting sign language

Sign language recognition from static and dynamic hand gestures has been an active area of research for the last two decades. While there are many different types of gestures, the most structured sets belong to the sign languages. In sign language, where each gesture already has an assigned meaning, strong rules of context and grammar can be applied to make recognition tractable. To date, most work on sign language recognition has employed expensive "datagloves" which tether the user to a stationary machine [21] or computer vision systems limited to a calibrated area [22]. Current successful gesture recognition systems are based on computer vision technology and Virtual Reality (VR) [23]. VR glove-based gesture recognition systems use a VR glove to extract a sequence of 3D hand configuration sets, which contain finger orientation angles, and use various structures of neural networks [24] or Hidden Markov Models (HMM) [25] to recognize the 3D motion data as gestures.
4.1 Using a recurrent neural network

An artificial neural network (ANN) can be defined as a massively parallel distributed processor consisting of simple processing units (Figure 4.1), which has a natural tendency to store experiential knowledge and make it available for use [24]. An ANN consists of many interconnected processing elements (figure) [25] and is used for identification and control of gestures, game playing and decision making, pattern recognition and medical diagnosis [26]. An ANN is also capable of adaptive self-organization [25]. Manar [27] used two recurrent neural network architectures to recognize static hand gestures of Arabic Sign Language (ArSL): Elman recurrent neural networks and fully recurrent neural networks (Figure 4.1.2). A digital camera and a colored glove were used to acquire the input image data. RGB color classification was used to segment the video frames. Thirty features of the segmented hand image were then extracted and grouped to represent a single image. Angles and distances were measured between the fingertips and the wrist. 900 colored images were used as the training set and 300 colored images for testing. The results showed that the fully recurrent neural network (recognition rate 95.11%) performed better than the Elman neural network (89.67%) [27].
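The recurrence that distinguishes an Elman network from a feed-forward network can be shown in a few lines. The sketch below is illustrative only and is not the network trained in [27]: the weight matrices, the tanh activation and the function names are assumptions, and a real recognizer would take the thirty extracted features as input and learn its weights from the training images.

```python
import math


def elman_step(x, h_prev, w_xh, w_hh, w_hy):
    """One time step of an Elman network.

    The hidden layer sees the current input x and a copy of the previous
    hidden state h_prev (the context units).
    """
    h = [
        math.tanh(
            sum(w * xi for w, xi in zip(w_xh[j], x))
            + sum(w * hp for w, hp in zip(w_hh[j], h_prev))
        )
        for j in range(len(w_hh))
    ]
    y = [sum(w * hj for w, hj in zip(row, h)) for row in w_hy]
    return h, y


def run_sequence(xs, w_xh, w_hh, w_hy):
    """Feed a sequence of feature vectors, starting from a zero context."""
    h = [0.0] * len(w_hh)
    outputs = []
    for x in xs:
        h, y = elman_step(x, h, w_xh, w_hh, w_hy)
        outputs.append(y)
    return outputs


# Usage with toy 1-input, 1-hidden, 1-output weights (hypothetical values).
print(run_sequence([[1.0], [1.0]], [[1.0]], [[0.5]], [[2.0]]))
```

Feeding the same input twice yields two different outputs, because the second step also receives the context left by the first.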
Figure 4.1.1: Sign language word recognition system by using recurrent neural network [25]. Diagram blocks: Data Glove; Start; Verifying the start point (neural network for posture recognition); Sign language recognition (recurrent neural network); Verifying the sampling endpoint (History); Result; End.
Figure 4.1.2: Recurrent Neural Networks. Panels: fully recurrent neural networks; Elman recurrent neural networks.
4.2 Using Hidden Markov Model

A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov chain [Figure] with unobserved (hidden) states. A hidden Markov model can be defined by [28]:
1. A set of states $\{S\}$ with an initial state $S_I$ and a final state $S_F$.
2. The transition probability matrix $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of taking the transition from state $i$ to state $j$.
3. The output probability matrix $B$, with $B = \{b_j(O_k)\}$ for a discrete HMM and $B = \{b_j(x)\}$ for a continuous HMM, where $O_k$ stands for a discrete observation symbol and $x$ stands for continuous observations of $k$-dimensional random vectors.
For a discrete HMM, $a_{ij}$ and $b_j(O_k)$ have the following properties:
\[ a_{ij} \ge 0, \qquad b_j(O_k) \ge 0, \qquad \sum_j a_{ij} = 1, \qquad \sum_k b_j(O_k) = 1 \]
If the initial state distribution is $\pi = \{\pi_i\}$, an HMM can be written in compact notation to represent the complete parameter set of the model:
\[ \lambda = (A, B, \pi) \]
HMMs are widely used in speech recognition, gesture recognition and signal processing systems. HMMs provide a framework for modeling dynamic 3-D dependencies and correlations between measurements. The dynamic dependencies are modeled implicitly by a Markov chain with a specified number of hidden states. The initial topology of an HMM can be determined by estimating how many different states are involved in specifying a sign. While better results might be obtained by tailoring different topologies to each sign, a four-state HMM with one skip transition was determined to be sufficient for this task [29] (Figure 4.2).
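To make the role of the parameter set concrete, the sketch below scores an observation sequence against a discrete four-state left-to-right HMM with one skip transition, using the standard forward algorithm. All numbers are made up for illustration: the placement of the skip (first state to third state), the two-symbol alphabet and every probability are assumptions, not values from [29].

```python
# Transition matrix A: left-to-right topology, each row sums to 1.
A = [
    [0.5, 0.3, 0.2, 0.0],  # the 0.2 entry is the assumed skip transition 0 -> 2
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 1.0],
]
# Output matrix B: B[j][k] is the probability of symbol k in state j.
B = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
    [0.1, 0.9],
]
# Initial distribution pi: the model always starts in the first state.
PI = [1.0, 0.0, 0.0, 0.0]


def forward_likelihood(obs, a, b, pi):
    """Return P(obs | model) with the forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * a[i][j] for i in range(n)) * b[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)


# Usage: likelihood of a short sequence of discrete observation symbols.
print(forward_likelihood([0, 1, 0, 1], A, B, PI))
```

A recognizer built this way would keep one such model per sign and report the sign whose model gives the highest likelihood.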
Figure 4.2: The four-state HMM used for recognition [29].
Schlenzig et al. [31] used hidden Markov models to recognize "hello," "good-bye," and "rotate" in sign language. Wilson and Bobick [32] explored incorporating multiple representations in HMM frameworks, and Campbell et al. [33] used an HMM-based gesture system to recognize 18 T'ai Chi gestures with 98% accuracy.
5.0 Proposed Methodology

This report proposes an inexpensive and lightweight tracking device influenced by B. Dorner [34] and Robert's work [35]. An experiment was set up to validate the facts and data of this report. The principal method is to infer a pose from a still frame of the hand wearing a color dotted glove. The glove is designed so that this inference task reduces to searching for the pose in a library database. The library database is generated by sampling recordings of natural hand poses and is indexed by rasterized images of the poses. A (noisy) input image from the camera is first transformed into a normalized query. It is then compared to each entry in the database according to a robust distance metric. An evaluation of the data-driven pose estimation algorithm would show a steady increase in retrieval accuracy with the size of the database [33].
5.1 Glove Design

The glove design is distinctive enough that the pose of a hand can be inferred from a single frame captured by a consumer grade camera. The glove is made of a simple transparent polymer. It has 16 bright orange patches (hexadecimal code #FFA500) on the back, 16 lime patches (hexadecimal code #00FF00) on the front and 5 magenta patches (hexadecimal code #FF00FF) at the fingertips. The system looks only for these three fully saturated colors (#FFA500, #00FF00, #FF00FF) to distinguish the front, the back and the fingertips of the hand. #FFA500, #00FF00 and #FF00FF are classified as master colors. The color patches and the pattern on the glove enable quicker and more robust pose estimation with less complex color identification algorithms [36]. Orange and lime patches meet at the side of each finger, which makes it easy to distinguish the side of the finger. The 3D hand model has 21 degrees of freedom (DOF), including 6 DOFs for global transformation and 4 DOFs per finger.
Figure 5.1: Glove Design. Panels: back view, front view and side view of the glove.
5.1.1 Glove vs. Bare Hand tracking
In bare-hand pose estimation, two very different poses can map to very similar images. This is a difficult challenge that requires slower and more complex inference algorithms. An extra step (edge detection) is needed to obtain the skin data for good results [37]. With a gloved hand, very different poses always map to very different images (see Figure 3). This allows the use of a simple image lookup approach.
Figure 5.1.1: Bare hand estimation and edge detection [17]
5.2 Rasterizing the frame

A typical consumer webcam has a 30 Hz to 60 Hz refresh rate and captures 24 to 30 frames per second. In the experiment, a Sony Visual Communication 2.0 web camera (30 fps at 60 Hz refresh rate) was used. The video was captured with iPi Recorder. The captured video is a sequence of frames. A bilateral filter is used to reduce noise and to smooth each frame image. Each frame is then rasterized using Adobe Image Processor.
5.3 Color and pixel correcting

The rasterized frames are then converted into a primary pixel set of 100x100 pixel images. The system recognizes only three distinct colors through color pixel classification: magenta, lime and orange. Because of ambient light, webcam sensor sensitivity, image format conversion quality, image hue and shadow, the captured frame image might lose a significant number of color pixels. To solve this problem, all colors close to magenta, lime and orange are classified as glove pixels (Table 5.3). The system rejects any other color in the frame, including the background color. For best results, colors close to the three glove colors should not appear anywhere else in the visual area.
Table 5.3: Estimating neighboring color
| Master color | Neighboring colors |
| --- | --- |
| #FFA500 (RGB decimal 255, 165, 0) | Accept all colors ranging from RGB (205~255, 130~180, 0) |
| #00FF00 (RGB decimal 0, 255, 0) | Accept all colors ranging from RGB (0~90, 170~255, 0~90) |
| #FF00FF (RGB decimal 255, 0, 255) | Accept all colors ranging from RGB (170~255, 0~80, 170~255) |
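The acceptance ranges in Table 5.3 translate directly into a per-pixel classifier. The sketch below is a minimal illustration: the range values come from the table, while the function name, the dictionary layout and the use of `None` for non-glove pixels are assumptions made for the example.

```python
# Per-channel (low, high) bounds for R, G and B, copied from Table 5.3.
GLOVE_RANGES = {
    "orange": ((205, 255), (130, 180), (0, 0)),
    "lime": ((0, 90), (170, 255), (0, 90)),
    "magenta": ((170, 255), (0, 80), (170, 255)),
}


def classify_pixel(rgb):
    """Return the master color name for a glove pixel, or None otherwise."""
    for name, bounds in GLOVE_RANGES.items():
        if all(low <= channel <= high for channel, (low, high) in zip(rgb, bounds)):
            return name
    return None


def classify_frame(frame):
    """Classify every pixel of a frame given as rows of (R, G, B) tuples."""
    return [[classify_pixel(pixel) for pixel in row] for row in frame]


# Usage: a slightly darkened orange still counts as a glove pixel.
print(classify_pixel((210, 140, 0)), classify_pixel((128, 128, 128)))
```

Because the three ranges do not overlap, the order in which they are tested does not affect the result.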
After color pixel classification, only two classes of pixel remain: glove pixels and non-glove pixels. The glove pixels are cropped and reduced to 40x40 pixel micro images. The resulting micro image set is classified as the hand region. Once the hand region is acquired, it is queried against the library database for a positive match. Reducing the frame to micro images speeds up the query for a positive match.
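The crop-and-shrink step can be sketched as follows. The example assumes the frame has already been classified into a grid of labels in which `None` marks a non-glove pixel, and it uses nearest-neighbor sampling to reach 40x40 pixels; the report does not state which resampling method was used, so that choice and the function names are assumptions.

```python
def crop_to_glove(labels):
    """Crop a label grid to the bounding box of its glove pixels."""
    rows = [r for r, row in enumerate(labels) if any(v is not None for v in row)]
    if not rows:
        return []  # no glove pixel in this frame
    width = len(labels[0])
    cols = [c for c in range(width) if any(row[c] is not None for row in labels)]
    return [row[cols[0]:cols[-1] + 1] for row in labels[rows[0]:rows[-1] + 1]]


def resize_nearest(image, size=40):
    """Resample a label grid to size x size with nearest-neighbor sampling."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


# Usage: a 100x100 frame with a 50x50 lime region becomes a 40x40 micro image.
frame = [[None] * 100 for _ in range(100)]
for r in range(10, 60):
    for c in range(20, 70):
        frame[r][c] = "lime"
micro = resize_nearest(crop_to_glove(frame))
print(len(micro), len(micro[0]))
```

Nearest-neighbor sampling keeps every output pixel equal to one of the master color labels, which averaging-based resampling would not.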
5.4 Indexing the library database

The library database is produced by sampling 40x40 pixel images of natural hand configurations, sign language alphabets and common hand gestures. This database is used as a reference database. An enriched database that covers all natural hand gestures helps the system achieve high retrieval accuracy across gesture configurations [34]. In the experiment, a set $D$ of 1000 finger configurations was sampled using the iPiMoCap system. The members of $D$ are denoted $d_1, d_2, d_3, \dots, d_n$ (where $n$ is a natural number). The distance metric between $d_m$ and $d_n$ is denoted $s(d_m, d_n)$. Low-dispersion sampling was used to create a uniform set of samples $D$ from an overcomplete collection $L$ of finger configurations. A sampling algorithm [35] minimizes dispersion at each iteration by selecting the sample furthest from the previously selected samples:
\[ d_{i+1} = \arg\max_{d \in L} \min_{d_j \in D_i} s(d, d_j) \]
where $D_i$ is the set of samples selected up to iteration $i$.
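The selection rule described above, picking at each iteration the candidate furthest from everything already chosen, is commonly implemented as greedy farthest-point sampling. The sketch below is one such implementation under assumptions: the first sample is simply the first candidate, and the distance function is passed in by the caller. It is not the exact algorithm of [35].

```python
def low_dispersion_sample(candidates, k, dist):
    """Greedily select k well-spread samples from a list of candidates."""
    if not candidates or k <= 0:
        return []
    selected = [candidates[0]]  # assumption: start from the first candidate
    # min_d[i] is the distance from candidate i to its nearest selected sample.
    min_d = [dist(c, selected[0]) for c in candidates]
    while len(selected) < min(k, len(candidates)):
        idx = max(range(len(candidates)), key=lambda i: min_d[i])
        chosen = candidates[idx]
        selected.append(chosen)
        min_d = [min(m, dist(c, chosen)) for m, c in zip(min_d, candidates)]
    return selected


# Usage with toy one-dimensional "configurations".
print(low_dispersion_sample([0.0, 1.0, 2.0, 10.0], 3, lambda a, b: abs(a - b)))
```

For real finger configurations the distance function would compare joint angles or rasterized poses instead of scalars.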
5.5 Matching and Tracking

Tracking is done between frames. The centroid of each visible colored patch in the rasterized frame sequence is calculated. The system identifies the closest vertex to each centroid. The displacement of each centroid of the moving hand is then calculated from the difference between two consecutive frames. Correspondence is then established between the centroids of each frame.
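The centroid and displacement computations can be sketched as follows. The example assumes each visible patch has already been given its own label in the frame grid (for instance by connected-component labeling, which is not shown), so the patch identifiers and function names are hypothetical. Matching centroids to model vertices is also left out.

```python
def patch_centroids(labels):
    """Return {patch_id: (mean_x, mean_y)} for a grid of patch labels."""
    sums = {}
    for y, row in enumerate(labels):
        for x, patch in enumerate(row):
            if patch is not None:
                sx, sy, n = sums.get(patch, (0, 0, 0))
                sums[patch] = (sx + x, sy + y, n + 1)
    return {patch: (sx / n, sy / n) for patch, (sx, sy, n) in sums.items()}


def displacements(prev, curr):
    """Displacement of every patch centroid visible in both frames."""
    return {
        patch: (curr[patch][0] - prev[patch][0], curr[patch][1] - prev[patch][1])
        for patch in prev
        if patch in curr
    }


# Usage: patch "a" moves two pixels to the right between two frames.
frame_1 = [[None, "a", "a", None, None]]
frame_2 = [[None, None, None, "a", "a"]]
print(displacements(patch_centroids(frame_1), patch_centroids(frame_2)))
```

Patches that are occluded in either frame are skipped, since no correspondence can be established for them.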
5.6 Pose estimation and finding nearest neighbor

The nearest neighbor is found by calculating a distance metric between two micro images. Only Hausdorff-like distances between pixel points are used, to get a precise divergence. Each micro image is searched for in the library database for a positive match. To complete the process, each micro image is compared with the database images. The divergence from the database to the query and from the query to the database is calculated. For each non-background pixel in one image, the distance to the closest pixel of the same color in the other image is penalized.
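Given the distance metric [33] between two micro images $r_1$ and $r_2$:
\[ \tilde{d}(r_1, r_2) = \frac{1}{|C_1|} \sum_{(x,y) \in C_1} \min_{(u,v) \in S_{x,y}} \sqrt{(x - u)^2 + (y - v)^2} \]
\[ S_{x,y} = \{ (u,v) \mid r_1(x,y) = r_2(u,v) \} \]
\[ C_1 = \{ (x,y) \mid r_1(x,y) \neq \text{background} \} \]
\[ d(r_1, r_2) = \tilde{d}(r_1, r_2) + \tilde{d}(r_2, r_1) \]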
Figure 5.6: Hausdorff-like image distance. A database image and a query image are compared by computing the divergence from the database to the query and from the query to the database. [33]
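The two-way comparison in Figure 5.6 can be sketched in plain Python. Images are assumed to be grids of color labels with `None` for background. One detail is an assumption of this sketch rather than something stated in the report: a pixel whose color does not appear at all in the other image is charged the image diagonal as its penalty.

```python
import math


def one_way_divergence(a, b):
    """Mean distance from each colored pixel of a to the closest pixel of
    the same color in b."""
    by_color = {}
    for y, row in enumerate(b):
        for x, color in enumerate(row):
            if color is not None:
                by_color.setdefault(color, []).append((x, y))
    max_penalty = math.hypot(len(a[0]), len(a))  # assumed penalty for a missing color
    total, count = 0.0, 0
    for y, row in enumerate(a):
        for x, color in enumerate(row):
            if color is None:
                continue
            count += 1
            candidates = by_color.get(color)
            if candidates:
                total += min(math.hypot(x - u, y - v) for u, v in candidates)
            else:
                total += max_penalty
    return total / count if count else 0.0


def image_distance(a, b):
    """Symmetric distance: divergence from a to b plus divergence from b to a."""
    return one_way_divergence(a, b) + one_way_divergence(b, a)


# Usage: one lime pixel displaced by 3 columns and 4 rows.
query = [[None] * 5 for _ in range(5)]
entry = [[None] * 5 for _ in range(5)]
query[0][0] = "lime"
entry[4][3] = "lime"
print(image_distance(query, entry))
```

This brute-force version compares every pixel pair of the same color, which is affordable only because the micro images are 40x40 pixels.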
5.7 Blending nearest neighbor

To make tracking smoother, the $n$ closest micro images are blended rather than relying on a single nearest neighbor. Blending a certain number of neighbors in addition to pose estimation helps to find the distance and thus to calculate the motion more accurately. Let $q_b$ be the blend of the ten closest micro images $r_i$ to the query $r$, where $q_i$ is the database entry associated with $r_i$. It is calculated with a Gaussian radial basis kernel [33]:
\[ q_b = \frac{\sum_i q_i \exp\left(-d(r_i, r)^2 / \sigma^2\right)}{\sum_i \exp\left(-d(r_i, r)^2 / \sigma^2\right)} \]
where $\sigma$ is chosen to be the average distance to the neighbors.
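A minimal sketch of Gaussian radial basis blending follows. It assumes each neighbor carries a pose stored as a plain list of numbers (for example joint angles) and blends them component by component; that representation and the function name are assumptions, and linear blending of angles is a simplification.

```python
import math


def blend_neighbors(poses, dists):
    """Blend neighbor poses with Gaussian weights exp(-(d / sigma)^2),
    where sigma is the average distance to the neighbors."""
    sigma = sum(dists) / len(dists)
    if sigma == 0:
        weights = [1.0] * len(dists)  # all neighbors match exactly
    else:
        weights = [math.exp(-((d / sigma) ** 2)) for d in dists]
    total = sum(weights)
    return [
        sum(w * pose[k] for w, pose in zip(weights, poses)) / total
        for k in range(len(poses[0]))
    ]


# Usage: the closer neighbor dominates the blend.
print(blend_neighbors([[0.0, 0.0], [2.0, 4.0]], [0.0, 2.0]))
```

Tying sigma to the average neighbor distance means the weighting adapts to how densely the database covers the region around the query.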
5.8 Recognizing ASL

Once the hand region is determined, pose estimation is completed and the nearest neighbors are blended, the system queries the database library for a positive match. The results of the experiment are shown in Figure 5.8: hand tracking makes it possible to recognize the sign language alphabet (J and Z are not shown).
Figure 5.8: Interpretation of sign language alphabets. Panel columns: Sign Alphabet; Captured Frame; Image denoising and normalization; Database candidate match (Micro Image).
6.0 Discussion
The proposed hand tracking system combines optical and glove tracking. Although LED markers, magnetic sensors, silhouette analysis and acoustic tracking offer robust and smooth tracking and are widely used in the automation, medical and entertainment industries, the proposed design avoids these technologies because they require more sophisticated algorithms, more expense and more time. This keeps the system affordable for consumers. A detailed comparison of the technical data of the available hand tracking systems and the proposed prototype system is given in Table 6.1.
Table 6.1: Technical details comparison [7], [8], [9], [10], [11], [5].
| Device/System name | Tracking | Retail price/manufacturing expense | Camera/sensor used/DOF | Speed | Weight | Resolution/area of coverage |
| --- | --- | --- | --- | --- | --- | --- |
| trakSTAR | Magnetic tracking | $50,000.00 | 6 DOF | 1000 fps | 2 kg | |
| Vicon optical system | Optical and marker tracking | $30k-150k | 10-24 camera | 1000 fps | 6 kg | 6 megapixel, 1280x1024 pixel |
| Optotrak Certus | Marker system, optical tracking | $70,000.00+ | 540 camera, 8 sensor positions | 900 | 18 kg + 3.4 kg (system control) | Capture region 3x4 meter, marker 512, 3 m capture area |
| Ascension (Polhemus) | Magnetic tracking | $50,000.00 | 18 sensors, six DOFs | 120 fps | 1.8 kg | |
| Exoskeletons (joint sensors plus a gyroscope), mechanical system | Electromagnetic tracking | $40,000.00 | 180 sensors | 500 sample/sec | 5 kg | No range limit, wired system |
| Mattel Power Glove | Acoustic tracking | $10,000.00 | 6 DOF | 120 sample/sec | 1.1 kg | 2 m radial area |
| Digital Desk [5] | Optical tracking, silhouette analysis | $5,000.00 | 2 camera | 30 fps | | Less than 1 m |
| Videoplace | Optical tracking, silhouette analysis | $2000.00 | 1 camera | 24-60 fps | | Less than 1 m |
| MIT LED glove | LED glove technology | | 16 DOF | 100~120 sample/sec | 0.8 kg | |
| Cyber Glove | 22 thin foil strain gauges sewn into the fabric glove to track, electromagnetic | $5000.00+ | 22 thin foil | 300 sample/sec | 0.5 kg | |
| Boujou silver bullet | Optical tracking | $40,000.00 | 10 camera | Full frame 120 fps | 2 kg | 16 megapixel |
| Proposed prototype: Color dotted glove (glove + webcam + capturing software) | Marker and optical tracking | $105.00 | 1 camera, glove has 21 DOF | 30 fps at 60 Hz refresh rate | Glove weighs 100 grams | 0.1 meter at 640x360 pixel |
A detailed comparison of the advantages and limitations is given in Table 6.2. Table 6.2: Comparison of advantage and limitations [7], [8], [9], [10], [11], [5]. Device/System Name trakSTAR Advantage High rate data, Highly available in industry Vicon optical system (Motion analysis) Optotrak Certus High rate data, Highly available in industry Minimum data loss after Expense, Occlusion, relies on software for data resolution Low capture rate, small region Limitations Expense, Occlusion
occlusion identifies, no environment restriction Ascension (Polhemus) No occlusion, orientation information recorded Exoskeletons (joint sensors plus a gyroscope) mechanical system VPL Data Glove Sayre Glove Reasonable cost Effective for multi-functional control Digital Data Entry Glove Cyber Glove First ASL recognizer From Virtual Technologies. It is comfortable, easy to use, and has an accuracy and precision well suited for complex gestural work or fine manipulations Vicon MX Proposed prototype: Color dotted glove Precise and accurate tracking Light weight, comfortable, faster than using HMM or recurrent neural network because it queries the database for a positive match rather than processing topologies. Slow estimation process time, limited accuracy due to inadequate library database Slow processing time Slow speed capturing Less gesture Fits a rigid body skeleton well, high data rate Environment restriction, can be bulky Not accurate in body location
Although the proposed system has a slow estimation response time because the database is accessed for every frame, it proved credible with respect to expense and complexity.
7.0 Conclusion
This report introduced a hand-tracking user-input device composed of a single camera and a polymer glove. The report shows that the system can work effectively without using an HMM or a recurrent neural network. The system is logically balanced and should work effectively in 3-D manipulation and pose recognition tasks. The system could be improved by installing fine sensors and inverse kinematics algorithms, but that would undermine its cost effectiveness, because the primary purpose of this report is to deliver a robust and low-cost user-input device.
8.0 Recommendation
The proposed system allows several extensions. More cameras can be installed for greater accuracy as long as the hands do not occlude each other. Hand movement and finger configuration can be replaced with LED pens or multi-touch interfaces for ease of use. Inverse kinematics [25] and optimal smoothness [33] can be applied to improve the accuracy of detection and tracking. The camera calibration process can be improved with better sensor alignment and resolution. Besides sign language recognition, the system can also be used in virtual surgery, virtual games and sports.
References
1. Judith Holt, Sue Hotto and Kevin Cole, Demographic Aspects of Hearing Impairment: Questions and Answers, Third Edition, 1994.
2. Karen Nakamura, About ASL, Deaf Resource Library, http://www.deaflibrary.org.
3. Heinlein, Robert A., "Science fiction: its nature, faults and virtues", The Science Fiction Novel, Chicago: Advent, 1959.
4. G.J. Grimes, "Digital Data Entry Glove Interface Device", Bell Telephone Laboratories, Murray Hill, NJ, US Patent 4,414,537, Nov. 8, 1983.
5. Sturman, D.J., Zeltzer, D., "A survey of glove-based input", IEEE Computer Graphics and Applications, January 1994.
6. J. Rehg, T. Kanade, DigitEyes: Vision-Based Human Hand-Tracking, School of Computer Science Technical Report CMU-CS-93-220, December 1993.
7. Herman J. and Woltering, Optotrak, Selspot, Gait Measurement in Two- and Three-Dimensional Space: A Preliminary Report, Cleveland 1994.
8. Optotrak Certus Motion Capture System, available at: http://www.ndigital.com/lifesciences/certus-techspecs.php.
9. Vicon MX, available at: http://www.vicon.com/products/sensors.html.
10. Myron Krueger, Artificial Reality 2, Addison-Wesley Professional, 1991.
11. Pierre Wellner, Interacting with paper on the DigitalDesk, Rank Xerox EuroPARC, Cambridge, UK, Volume 36 Issue 7, Pages 87-96, July 1993.
12. Jannick P. Rolland, Yohan Baillot, and Alexei A. Goon, A Survey of Tracking Technology for Virtual Environments, Center for Research and Education in Optics and Lasers (CREOL), University of Central Florida.
13. Polhemus FASTTRAK official website, available at: http://www.polhemus.com/?page=Motion_Fastrak
14. Ascension Technologies trakSTAR official website, available at: http://www.ascensiontech.com/medical/trakSTAR.php
15. Logitech video technologies, http://www.logitech.com/en-us/488/455
16. A.G.E. Tech, Abrams Gentile Entertainment, 2009.
17. Vitor F. Pamplona, Leandro A. F. Fernandes, João Prauchner, Luciana P. Nedel and Manuel M. Oliveira, The Image-Based Data Glove, Proceedings of X Symposium on Virtual Reality (SVR'2008), João Pessoa, 2008. Anais do SVR 2008, Porto Alegre: SBC, 2008, pp. 204-211.
18. Dr. G. Grimes, Digital Data Entry Glove, US Patent 4,414,537, Patented Nov. 8, 1983.
19. Tom Zimmermann et al., Dataglove: A hand gesture interface device, 1985.
20. Ken Pimentel, Kevin Teixeira, "Virtual Reality: through the new looking glass", Intel/Windcrest/McGraw Hill, 1993.
21. L. Campbell, D. Becker, A. Azarbayejani, A. Bobick, and A. Pentland, "Invariant features for 3-D gesture recognition", Intl. Conf. on Face and Gesture Recogn., pp. 157-162, 1996.
22. Y. Cui and J. Weng, "Learning-based hand sign recognition", Intl. Work. Auto. Face Gest. Recog. (IWAFGR), pp. 201-206, 1995.
23. Thad Starner, Joshua Weaver, and Alex Pentland, A Wearable Computer Based American Sign Language Recognizer, The Media Laboratory, Massachusetts Institute of Technology, 2001.
24. Marcus Vinicius Lamar, Hand Gesture Recognition using T-CombNET: A Neural Network Model dedicated to Temporal Information Processing, Doctoral Thesis, Institute of Technology, Japan, 2001.
25. Ankit Chaudhary, J. L. Raheja, Karen Das, and Sonia Raheja, Intelligent Approaches to interact with Machines using Hand Gesture Recognition in Natural way: A Survey, International Journal of Computer Science & Engineering Survey (IJCSES), vol. 2(1), Feb. 2011.
26. Jian-kang Wu, Neural networks and Simulation methods, Marcel Dekker, Inc., USA, 1994. Available at: http://books.google.co.in/books/about/Neural_networks_and_simulation_methods.html?id=95iQOxLDdK4C&redir_esc=y
27. Manar Maraqa, Raed Abu-Zaiter, Recognition of Arabic Sign Language (ArSL) Using Recurrent Neural Networks, IEEE First International Conference on the Applications of Digital Information and Web Technologies, p. 478-48, 2008.
28. Tie Yang, Yangsheng Xu, Hidden Markov Model for Gesture Recognition, May 1994.
29. Thad Eugene Starner, Visual Recognition of American Sign Language Using Hidden Markov Models, 1999.
30. J. Schlenzig, E. Hunter, and R. Jain, "Recursive identification of gesture using hidden Markov models", Proc. Second Ann. Conf. on Appl. of Comp. Vision, pp. 187-194, 1994.
31. A. Wilson and A. Bobick, "Learning visual behavior for gesture analysis", Proc. IEEE Int'l. Symp. on Comp. Vis., Nov. 1995.
32. C.Y. Suen, M. Berthod, and S. Mori, Automatic recognition of handprinted characters: the state of the art, Proceedings of the IEEE, Vol. 68, No. 4, pp. 469-487, 1980.
33. B. Dorner, Chasing the colour glove: visual hand tracking, 1994.
34. Robert Y. Wang, Jovan Popovic, Real-Time Hand-Tracking with a Color Glove, 2009.
35. White, R., Crane, K., and Forsyth, D. A., Capturing and animating occluded cloth, ACM Transactions on Graphics, 2008.
36. M. Yuan, F. Farbiz, C.M. Manders, T.K. Yin, Robust hand tracking using a simple color classification technique, The International Journal of Virtual Reality, 8(2), 2009.