
A Seminar on

Presented by: Mr. Sachin Deshmukh (9970406068)
Guided by: Prof. S. K. Sonkar

Presentation Flow

Introduction
Related Work
Various Approaches
Various Modules
Working
Performance Parameters Of The System
Applications
Advantages and Disadvantages
Future Scope
Conclusion
References

Introduction
Introduction and Motivation
History
Traditional Method
Need?

Overview Of Android Speech to Text Converter
Speech recognition is done via the Internet
Use of HMM

Related Work


1. Speech to Text Conversion using Android Platform

International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, Vol. 3, Issue 1, January-February 2013, pp. 253-258


2. Android Speech To Text Converter For SMS Application

IOSR Journal of Engineering Mar. 2012, Vol. 2(3) pp: 420-423


3. Smart Texting System

International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 2, Issue 2, Mar-Apr 2012, pp. 1126-1128

Literature Survey

4. SMS Application Using Speech To Text Converter In Android Mobiles

Association for International Journal in Computer Science & Electronics Volume I Reference ID: aijcse2005.

Various Approaches
Template-Based Approaches
Knowledge-Based Approaches
Neural Network-Based Approaches
Hidden Markov Model (HMM)-Based Speech Recognition

1. Template-Based Approach

An unknown speech sample is compared against a set of pre-recorded words (templates) to find the best match. The disadvantage is that the pre-recorded templates are fixed, so variations in speech can only be modeled by using many templates per word, which eventually becomes impractical. Template preparation and matching become prohibitively expensive once the vocabulary size increases beyond a few hundred words, because matching requires considerable storage and processing power. Template matching is also heavily speaker dependent, and continuous speech recognition is not possible with it.
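Template matching can be illustrated with a minimal sketch. The slides do not say which distance measure is used; dynamic time warping (DTW) is assumed here because it tolerates differences in speaking rate. The one-dimensional feature values and the names `dtw_distance`, `best_template`, and `templates` are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_template(unknown, templates):
    """Return the word whose stored template is closest to the unknown utterance."""
    return min(templates, key=lambda word: dtw_distance(unknown, templates[word]))


# Hypothetical pre-recorded templates (one per word) and a slower rendition of "yes".
templates = {
    "yes": [1.0, 3.0, 4.0, 3.0, 1.0],
    "no": [4.0, 2.0, 1.0, 1.0, 0.5],
}
unknown = [1.0, 1.0, 3.0, 4.0, 4.0, 3.0, 1.0]
print(best_template(unknown, templates))
```

Every unknown utterance is compared with every stored template, which is why the cost grows with vocabulary size and with the number of templates per word.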

2. Knowledge Based Approaches


Expert knowledge about variations in speech is hand-coded into the system. It uses a set of features from the speech, and the training system then generates a set of production rules automatically from the samples. This has the advantage of explicitly modeling variations in speech. Unfortunately, such expert knowledge is difficult to obtain and use successfully, so this approach was judged to be impractical, and automatic learning procedures were sought instead.

3. Neural Network-Based Approaches


Another approach in acoustic modeling is the use of neural networks. A neural network (NN) is an interconnected group of natural or artificial neurons that uses a mathematical or computational model for information processing. Neural networks are capable of solving much more complicated recognition tasks, but do not scale as well as Hidden Markov Models (HMMs) when it comes to large vocabularies.

4. Hidden Markov Model (HMM)-Based Speech Recognition

In this approach the speech data is used for training: a pattern representative of each sound class is created using one or more test patterns. HMM-based recognition is then used. Recognition, or pattern classification, is the process of comparing the unknown test pattern with each sound class reference pattern and computing a measure of similarity (distance) between the test pattern and each reference pattern. A Hidden Markov Model is used for speech recognition, which converts the speech to text.
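The comparison of an unknown pattern with each reference model can be sketched as follows. This is a toy illustration, not the recognizer used in the system: it assumes discrete observation symbols, one small HMM per word, and the forward algorithm as the similarity measure. All model parameters and the names `forward_likelihood`, `recognize`, and `models` are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm."""
    n_states = len(start)
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize(obs, models):
    """Pick the word whose HMM assigns the highest likelihood to the observations."""
    return max(models, key=lambda word: forward_likelihood(obs, *models[word]))


# Two hypothetical left-to-right word models over the symbols {0, 1}:
# (start probabilities, transition matrix, emission matrix).
trans = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "yes": ([1.0, 0.0], trans, [[0.9, 0.1], [0.2, 0.8]]),  # mostly 0s, then 1s
    "no": ([1.0, 0.0], trans, [[0.1, 0.9], [0.8, 0.2]]),   # mostly 1s, then 0s
}
print(recognize([0, 0, 1, 1], models))
```

Here the likelihood plays the role of the similarity measure: a higher likelihood means the unknown pattern is closer to that word's reference model.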

Various Modules
1. Speech Recognition
2. Speech Preprocessing
3. HMM Training

1. SPEECH RECOGNITION
Speech samples are obtained from the speaker in real time. Speech recognition requires a microphone. Samples from different users need to be stored to make the system more compatible with any type of voice.


Sr. No.   Voice    Text
1.        Speech   Nick\Now is
2.        Cute     Spite
3.        Google   and/this/yes
4.        Yahoo    Down to/now

Table showing output generated by acquisition module

2. SPEECH PREPROCESSING
The voice captured in real time contains background noise that needs to be removed, because the later stages require noise-free speech signals. Preprocessing reduces the amount of effort in the next stages. The input to speech preprocessing is the speech signal, which is then converted into speech frames that give unique samples.

Steps in Speech Preprocessing


Steps:
1. The system must identify useful or significant samples from the speech signal. To accomplish this goal, the system divides the speech samples into overlapped frames.
2. The system checks for voice activity using endpoint detection and energy threshold calculations.
3. The speech samples are then passed through a pre-emphasis filter.
4. The frames with voice activity are passed through a Hamming window, which minimizes signal discontinuities at the beginning and end of each frame.
5. The system performs autocorrelation analysis on each frame.
6. The system finds linear predictive coding (LPC) coefficients using the Levinson-Durbin algorithm.
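The preprocessing steps can be sketched in plain Python as below. This is only an illustration under stated assumptions: the frame size, frame step, LPC order, energy threshold, and pre-emphasis coefficient are hypothetical values, voice activity is reduced to a simple mean-energy threshold (no full endpoint detection), and the synthetic signal stands in for microphone input.

```python
import math
import random


def split_frames(signal, size, step):
    """Step 1: divide the samples into overlapping frames (step < size)."""
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, step)]


def has_voice(frame, threshold):
    """Step 2: crude voice activity check using the mean frame energy."""
    return sum(x * x for x in frame) / len(frame) > threshold


def pre_emphasis(frame, coeff=0.97):
    """Step 3: first-order filter y[n] = x[n] - coeff * x[n-1]."""
    return [frame[0]] + [frame[i] - coeff * frame[i - 1] for i in range(1, len(frame))]


def hamming(frame):
    """Step 4: taper the frame edges with a Hamming window."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


def autocorrelation(frame, order):
    """Step 5: autocorrelation values r[0..order] of one frame."""
    return [sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Step 6: LPC coefficients a[1..order] such that x[n] ~ sum(a[k] * x[n-k])."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


def lpc_features(signal, size=240, step=80, order=4, threshold=1e-4):
    """Run steps 1 to 6 and return one LPC vector per frame with voice activity."""
    features = []
    for frame in split_frames(signal, size, step):
        if not has_voice(frame, threshold):
            continue  # silent frames are dropped
        windowed = hamming(pre_emphasis(frame))
        coeffs, _ = levinson_durbin(autocorrelation(windowed, order), order)
        features.append(coeffs)
    return features


# Synthetic input: 400 silent samples followed by a noisy 200 Hz tone at 8 kHz.
rng = random.Random(0)
silence = [0.0] * 400
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) + 0.01 * rng.uniform(-1, 1)
        for n in range(800)]
features = lpc_features(silence + tone)
print(len(features), "frames with voice activity")
```

Frames that contain only silence fail the energy check and produce no features, which is how preprocessing reduces the work left for the later stages.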

3. HMM TRAINING
Training involves creating a pattern representative of the features of a class using one or more test patterns. A model commonly used for speech recognition is the HMM, which is a statistical model used for modeling an unknown system using an observed output sequence.

Working
[Block diagram: Speech To Text Conversion System. Blocks: Speech Recognition, Speech Processing, Noise Free Data, HMM Training, HMM Based Recognition (Google's speech recognition engine, Hidden Markov Model), Text Storage.]

Performance Parameters Of The System


1. Accuracy of Recognition
Accuracy is measured with the Word Error Rate (WER), whereas speed is measured with the Real Time Factor (RTF). WER is computed by the equation

WER = (S + D + I) / N

where S is the number of substitutions, D is the number of deletions, I is the number of insertions, and N is the number of words in the reference.
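As a small worked sketch, the total S + D + I can be obtained as the word-level edit distance between the reference and the recognized text. The function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N, with S + D + I found as the word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[-1][-1] / len(ref)


# One deletion ("a"), one substitution ("john" -> "joan"), one insertion ("please").
print(word_error_rate("send a message to john now",
                      "send message to joan now please"))  # 3 / 6 = 0.5
```

Because insertions are counted, WER can exceed 1 when the recognized text is much longer than the reference.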



2. Speed of Recognition
The speed of a speech recognition system is commonly measured in terms of the Real Time Factor (RTF). If the system takes time P to process an input of duration I (here I denotes the input duration, not the number of insertions), the RTF is defined by the formula

RTF = P / I
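A short sketch of how the RTF might be measured in practice. The helper names `real_time_factor` and `measure_rtf` are hypothetical, and the `recognizer` argument stands in for whatever recognition function is being timed.

```python
import time


def real_time_factor(processing_time_s, audio_duration_s):
    """RTF = P / I: processing time divided by the duration of the input."""
    return processing_time_s / audio_duration_s


def measure_rtf(recognizer, audio, audio_duration_s):
    """Time one call to recognizer(audio) and return the resulting RTF."""
    start = time.perf_counter()
    recognizer(audio)
    return real_time_factor(time.perf_counter() - start, audio_duration_s)


# Example: 2 s of processing for a 4 s utterance.
print(real_time_factor(2.0, 4.0))  # 0.5
```

An RTF below 1 means the system processes speech faster than it is spoken, which is what real-time use requires.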

Applications

1. Sending Message
2. Email
3. Web Search
4. Voice Dial
5. Data Entry
6. Speech To Text Converter in Mobile Phones

Advantages
Natural way of interaction: it is not necessary to sit at a keyboard.
Faster data processing.
No training required for users.
Useful for physically handicapped people.
Google's speech recognition engine can be used to save memory.

Disadvantages
If there is noise or some other sound in the room (e.g. the television or a kettle boiling), the number of errors will increase.
The microphone must be close to the user. More distant microphones (e.g. on a table or wall) tend to increase the number of errors.
Requires preprocessing after the speech is acquired.
The system requires voice training so that it recognizes the voice of that user correctly.
Needs a permanent Internet connection.

Future Scope
Further work is planned to implement the speech recognition model for different languages.

Conclusion

References
[1] B. Raghavendhar Reddy, E. Mahender: Speech to Text Conversion using Android Platform, International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, Vol. 3, Issue 1, January-February 2013, pp. 253-258.
[2] Ms. Anuja Jadhav, Prof. Arvind Patil: Android Speech to Text Converter for SMS Application, IOSR Journal of Engineering, Mar. 2012, Vol. 2(3), pp. 420-423.
[3] Jagriti Chand: SMS Application Using Speech To Text Convertor In Android Mobiles, Volume I, Issue I, Reference ID: aijcse2005.
[4] M. Bacchiani, F. Beaufays, J. Schalkwyk, M. Schuster, and B. Strope: Deploying GOOG-411: Early lessons in data, measurement, and testing. In Proceedings of ICASSP, pp. 5260-5263, 2008.

Questions???

Thank You.
