
PARK COLLEGE OF ENGINEERING AND TECHNOLOGY, KANIYUR, COIMBATORE

LIP READING USING NEURAL NETWORK

PRESENTED BY: RAHAMATHULLAH.M.R, 3rd YEAR EEE, SUNDARESWARAN.J, 3rd YEAR EEE

EMAIL ID: sanjv23@yahoo.com Contact: 9994472853

ABSTRACT:


Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an expert in the category of information it has been given to analyze. Neural networks are applied here to LIP READING, one of the easiest ways to recognize speech and one of the latest techniques widely preferred for speech recognition. We describe a lip reading system that uses both shape information from the lip contours and intensity information from the mouth area. Shape information is obtained by tracking and parameterising the inner and outer lip boundary in an image sequence. Intensity information is extracted from a grey level model, based on principal component analysis. In contrast to other approaches, the intensity area deforms with the shape model to ensure that similar object features are represented after non-rigid deformation of the lips. We describe speaker independent recognition experiments based on these features. Preliminary results suggest that similar performance can be achieved by using either shape or intensity information, and slightly higher performance by their combined use.

CONTENTS:
Neural Networks
Applications of Neural Networks
Speech Recognition
Lip Reading
Shape Modelling
Intensity Modelling
Lip Tracking: Matching the Intensity Model to the Image
Speech Modelling
Conclusion

NEURAL NETWORK: A neural network is a powerful data modeling tool that is able to capture and represent complex input/output relationships. The motivation for the development of neural network technology stemmed from the desire to develop an artificial system that could perform "intelligent" tasks similar to those performed by the human brain. Neural networks resemble the human brain in the following two ways:

A neural network acquires knowledge through learning.
A neural network's knowledge is stored within inter-neuron connection strengths known as synaptic weights.

Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an expert in the category of information it has been given to analyze. The true power and advantage of neural networks lies in their ability to represent both linear and non-linear relationships and to learn these relationships directly from the data being modeled. Traditional linear models are simply inadequate when it comes to modeling data that contains non-linear characteristics.

The most common neural network model is the multilayer perceptron (MLP). This type of neural network is known as a supervised network because it requires a desired output in order to learn. The goal of this type of network is to create a model that correctly maps the input to the output using historical data, so that the model can then be used to produce the output when the desired output is unknown. A graphical representation of an MLP is shown below.

The MLP and many other neural networks learn using an algorithm called backpropagation. With backpropagation, the input data is repeatedly presented to the neural network. With each presentation the output of the neural network is compared to the desired output and an error is computed. This error is then fed back (backpropagated) to the neural network and used to adjust the weights such that the error decreases with each iteration and the neural model gets closer and closer to producing the desired output. This process is known as "training".
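
The training procedure for an MLP can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and does not come from this paper: one hidden layer of four sigmoid units, a squared-error cost, a fixed learning rate, and the XOR toy problem as "historical data". It shows the forward pass, the comparison with the desired output, and the backward pass that adjusts the weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, Y, hidden=4, lr=0.5, epochs=5000, seed=0):
    """Train a one-hidden-layer MLP with plain backpropagation.

    Returns the learned parameters and the mean squared error per epoch.
    Layer size, learning rate and epoch count are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    errors = []
    for _ in range(epochs):
        # Forward pass: present the inputs to the network.
        H = sigmoid(X @ W1 + b1)
        out = sigmoid(H @ W2 + b2)
        # Compare the network output with the desired output.
        err = out - Y
        errors.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the error back through the layers.
        d_out = err * out * (1.0 - out)
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        # Adjust the weights in the direction that reduces the error.
        W2 -= lr * H.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_hid
        b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), errors

def predict(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)

# Toy data (XOR): a non-linear mapping that a linear model cannot represent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
params, errors = train_mlp(X, Y)
print("first error:", errors[0], "last error:", errors[-1])
```

Once trained, `predict(params, X_new)` produces outputs for inputs whose desired output is unknown, which is the use of a supervised network described in the text.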

APPLICATIONS OF NEURAL NETWORKS: Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs including:

Facial animation
Speech recognition
Detection and tracking of moving targets
Customer research
Data validation
Risk management

Neural networks are applied in LIP READING, one of the easiest ways to recognize speech. It is one of the latest techniques widely preferred for speech recognition.

SPEECH RECOGNITION: Speech recognition is one of the most exciting areas of modern computer science research. The aim is for computers to understand speech and gesture. The sheer variety and complexity of words makes recognizing similar words very difficult. A neural network is a model of the way in which the human brain works. Neural networks are ideally suited to pattern recognition tasks and have the extraordinary ability to learn. They are capable of incorporating multiple heterogeneous input features, which do not need to be treated as independent, and of finding the optimal combination of these features for classification. The purpose of this work is to exploit this potential of neural networks to improve speech recognition accuracy.

LIP READING: Lip reading involves the extraction of visual speech features. Most visual speech information is contained in the inner and outer lip contour, but it has also been shown that information about the visibility of teeth and tongue provides important speech cues. Particularly for fricatives, the place of articulation can often be determined visually, i.e. for labiodentals (upper teeth on lower lip), interdentals (tongue behind front teeth) and alveolars (tongue touching gum ridge). Other speech information might be contained in the protrusion and wrinkling of lips. Lip reading approaches can be classified into:

Image-based systems.
Model-based systems.

Image-based systems use grey level information from an image region containing the lips, either directly or after some processing, as speech features. Most image information is therefore retained, but it is left to the recognition system to discriminate speech information from linguistic variability and illumination variability. Model-based systems usually represent the lips by geometric measures, like the height or width of the outer or inner lip boundary, or by a parametric contour model which represents the lip boundaries. The extracted features are of low dimension and invariant to illumination. Model-based systems depend on the definition of speech related features by the user. The definition may therefore not include all speech relevant information, and features like the visibility of teeth and tongue are difficult to represent. The earlier system performed well for a speaker independent recognition task, but it did not contain any intensity information which might provide additional speech information. Here we extend this system by augmenting the feature vector with intensity information extracted from the mouth region. We evaluate the contribution of intensity information separately and in combination with shape features.

SHAPE MODELLING: For modelling the shape variability of lips, we use an approach based on active shape models. These are statistically based deformable models which represent a contour by a set of points. Patterns of characteristic shape variability are learned from a training set, using principal component analysis (PCA). The main modes of shape variation captured in the training set can therefore be described by a small number of parameters. The main advantage of this modelling technique is that heuristic assumptions about legal shape deformation are avoided. Instead, the model is only allowed to deform to shapes similar to the ones seen in the training set.

[Figure 1: Shape model for the inner and outer lip contour with profile vectors perpendicular to the lip contours.]

Any shape $x$ representing the co-ordinates of the contour points can be approximated by

$$x = \bar{x} + P b$$

where $\bar{x}$ is the mean shape, $P$ the matrix of eigenvectors of the covariance matrix and $b$ a vector containing the weights for each eigenvector. Only the first few eigenvectors, corresponding to the largest eigenvalues, are needed to describe the main shape variability.

[Figure: Lip model with mean shape and mean intensity.]

We built and tested two models of the lips: Model 1, which represents the outer lip boundary only, and Model 2, which represents the outer and inner lip boundary. The models are used to locate, track and parameterize lip movements in image sequences. The weights for the shape modes are recovered from the tracking results and serve as features for the recognition system.
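
The linear shape model described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the system used in the experiments: the training shapes are synthetic, the function names `build_shape_model`, `shape_to_weights` and `weights_to_shape` are hypothetical, and the shapes are assumed to be already aligned. It shows how the mean shape and the leading eigenvectors are estimated, and how a contour is turned into a short weight vector and back.

```python
import numpy as np

def build_shape_model(shapes, n_modes):
    """Estimate mean shape and leading eigenvectors from training shapes.

    shapes: array (n_samples, 2 * n_points); each row holds the contour
            point co-ordinates of one (already aligned) training shape.
    Returns the mean shape, the eigenvector matrix P (one mode per column)
    and the corresponding eigenvalues, largest first.
    """
    x_mean = shapes.mean(axis=0)
    cov = np.cov(shapes - x_mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_modes]   # keep the largest modes
    return x_mean, eigvecs[:, order], eigvals[order]

def shape_to_weights(x, x_mean, P):
    """Weights b such that x is approximately x_mean + P b."""
    return P.T @ (x - x_mean)

def weights_to_shape(b, x_mean, P):
    """Reconstruct a shape from its weight vector."""
    return x_mean + P @ b

# Synthetic training set (assumption): 50 shapes with 10 contour points
# each, generated from two underlying modes of variation plus small noise.
rng = np.random.default_rng(0)
n_coords = 20
true_modes, _ = np.linalg.qr(rng.normal(size=(n_coords, 2)))
coeffs = rng.normal(size=(50, 2)) * np.array([3.0, 1.0])
shapes = rng.normal(size=n_coords) + coeffs @ true_modes.T
shapes += 0.01 * rng.normal(size=shapes.shape)

x_mean, P, eigvals = build_shape_model(shapes, n_modes=2)
b = shape_to_weights(shapes[0], x_mean, P)
x_rec = weights_to_shape(b, x_mean, P)
print("weights:", b)
print("reconstruction error:", np.linalg.norm(x_rec - shapes[0]))
```

In this sketch the two-element vector `b` plays the role of the shape feature: it replaces twenty co-ordinates while still reconstructing the contour closely.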

INTENSITY MODELLING: Several approaches for speech reading based on intensity information have been developed. Our approach for extracting intensity information is based on principal component analysis and is related to the eigenlips approach. That approach placed a window around the mouth area on which PCA was performed. Since the window does not deform with the lips, the eigenvectors of the PCA mainly account for intensity variation due to different lip shape and mouth opening. We already obtain detailed information about the lip shape from our shape model through a small number of parameters, and are therefore mainly interested in intensity information which is independent of lip shape. We follow an approach where a one-dimensional profile is sampled perpendicular to the contour at each model point, as shown in Figure 1. But instead of using local grey level models, we construct a global grey-level model by concatenating the profile vectors of all model points to form a global intensity vector $h$. We then estimate the covariance matrix of the global profile vectors over the training set and perform PCA to obtain the principal modes of profile variation. Any profile $h$ can now be approximated by

$$h = \bar{h} + P_g b_g$$

where $\bar{h}$ is the mean profile, $P_g$ the matrix of the first few column eigenvectors, corresponding to the largest eigenvalues, and $b_g$ a vector containing the weights for each eigenvector.

[Figure: Example images of a person saying the word "three" with tracking results.]

LIP TRACKING: MATCHING THE INTENSITY MODEL TO THE IMAGE: The profile model was initially designed and tailored to enable robust tracking of the lips rather than to extract speech information from the profile vectors. The profile model is used to describe the fit between the image and the model. During image search the model is aligned to the image as closely as possible by calculating the optimal weights for the first few eigenvectors. The mean square error (MSE) between the aligned profile and the image is used as cost, and a minimization algorithm deforms the shape model to find a minimum cost. The profile weight vector for aligning the model is found using

$$b_g = P_g^{T} (h - \bar{h})$$

and the cost $E$ is obtained using

$$E = (h - \bar{h})^{T} (h - \bar{h}) - b_g^{T} b_g$$

The profile vectors deform with the shape model and therefore always represent the same object features. The weight vector $b_g$ provides information about the principal modes needed to align the model to the image. We recover the weights from the tracking results and use them as speech features.

SPEECH MODELLING: The weights for the shape model and the intensity model are extracted at each image frame to form frame dependent feature vectors for the recognition system. We use either the shape parameters, or the intensity parameters, or both parameter sets as the feature vector for the recognition system. Assuming accurate tracking performance, the shape and intensity parameters are invariant to translation, rotation and scale. The intensity modes account for both illumination differences and differences due to the visibility of teeth and tongue and to protrusion. Dynamic speech information is important and often less sensitive to inter-speaker variability, i.e. intensity values of the lips will remain fairly constant during speech while intensity values of the mouth opening will change during speech. The intensity values of the lips will vary between speakers, but the temporal changes of intensity might be similar for different speakers. Dynamic features will therefore be more robust to different illumination and different speakers.

[Recognition accuracy for Model 1]
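
The feature extraction steps of the tracking and speech modelling sections can be sketched as follows. This is a hedged illustration rather than the implementation used in the experiments. It assumes that the columns of the intensity eigenvector matrix are orthonormal, and it assumes that "dynamic features" means first-order differences between consecutive frames, which the text does not specify. The function names `profile_weights_and_cost`, `frame_features` and `add_delta_features` and all data are hypothetical.

```python
import numpy as np

def profile_weights_and_cost(h, h_mean, P_g):
    """Project a global profile h onto the intensity model.

    Assumes the columns of P_g are orthonormal eigenvectors.
    Returns the weight vector b_g and the squared residual between h and
    its reconstruction h_mean + P_g b_g.
    """
    d = h - h_mean
    b_g = P_g.T @ d
    cost = float(d @ d - b_g @ b_g)
    return b_g, cost

def frame_features(shape_w, intensity_w):
    """Concatenate per-frame shape and intensity weights.

    shape_w: (n_frames, n_shape_modes)
    intensity_w: (n_frames, n_intensity_modes)
    """
    return np.hstack([shape_w, intensity_w])

def add_delta_features(features):
    """Append first-order temporal differences to each frame's features.

    The first frame has no predecessor, so its difference is set to zero.
    """
    delta = np.diff(features, axis=0, prepend=features[:1])
    return np.hstack([features, delta])

# Hypothetical intensity model: 30-element profiles, 3 modes.
rng = np.random.default_rng(1)
P_g, _ = np.linalg.qr(rng.normal(size=(30, 3)))
h_mean = rng.normal(size=30)
b_true = np.array([2.0, -1.0, 0.5])
b_g, cost = profile_weights_and_cost(h_mean + P_g @ b_true, h_mean, P_g)

# Hypothetical tracking output: 6 frames, 4 shape and 3 intensity weights.
shape_w = rng.normal(size=(6, 4))
intensity_w = rng.normal(size=(6, 3))
static = frame_features(shape_w, intensity_w)
dynamic = add_delta_features(static)
shifted = add_delta_features(static + 5.0)  # constant per-speaker offset
print("static:", static.shape, "with deltas:", dynamic.shape)
```

In this toy setting a constant offset added to every frame, standing in for a speaker or illumination difference, changes the static part of the feature vector but leaves the difference part untouched, which is the kind of robustness the text attributes to dynamic features.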

[Recognition accuracy for Model 2]

CONCLUSION: The world of computing has a lot to gain from neural networks. Their ability to learn by example makes them very flexible and powerful. Neural networks also contribute to other areas of research such as neurology and psychology. We have described a lip reading system that uses both shape and intensity information. An important property of the intensity model is that it deforms with the lip contour model in order to represent the same object features after lip movements. Recognition tests using only intensity parameters indicate that much visual speech information is contained in grey level information, which might account for protrusion or for the visibility of teeth and tongue. Recognition performance was slightly higher for intensity features than for shape features, and their combined use outperformed both feature sets. This application of neural networks to lip reading is still under research and is expected to yield useful results. Its potential for wide use by the hearing impaired adds to the importance of this application.

BIBLIOGRAPHY:
Q. Summerfield, Lip reading and audio-visual speech perception.
A. Lanitis, C. J. Taylor and T. F. Cootes, An Automatic Face Identification System Using Flexible Appearance Models.
