Speech Processing and Its Applications

A. Muthamizh Selvan
Research Scholar (Ph.D.) Department of Computer Science & Engg. Bharathiar University Coimbatore - 641 046.
muthamizh@ieee.org
A. Muthamizh Selvan, Research Scholar (Ph.D.), DCSE, Bharathiar University, Coimbatore
Speech = Sound
Sound is a wave (disturbance)
Sound needs a medium to travel
Sound vibrates the air like a slinky
What is a Wave?
A wave is a disturbance traveling through a medium, by which energy is transferred from one particle of the medium to another without causing any permanent displacement of the medium itself.
A wave can be described as a disturbance that travels through a medium, transporting energy from one location to another.
The medium is simply the material through which the disturbance is moving.
A wave is a transfer of energy from one point to another without the transfer of material between the two points.
Example
A sine wave is the simplest form of this wave motion
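As a minimal sketch of the sine wave mentioned above, the following Python snippet evaluates y(t) = A * sin(2 * pi * F * t) at evenly spaced instants. The function name `sine_wave` and the chosen values (a 1000 Hz tone sampled 8000 times per second) are illustrative assumptions, not values taken from the slides.

```python
import math

def sine_wave(frequency_hz, sample_rate_hz, num_samples, amplitude=1.0):
    """Return num_samples values of amplitude * sin(2*pi*F*t), with t = n / sample_rate."""
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

# One full cycle of a 1000 Hz tone sampled at 8000 Hz spans 8 samples.
cycle = sine_wave(1000, 8000, 8)
print([round(v, 3) for v in cycle])
```

The printed values rise from zero to the positive peak, return through zero, reach the negative peak, and come back, which is one complete wave cycle.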
What is Sound?
Sound is the result of a mechanical disturbance of some object in a physical medium, such as air. This mechanical disturbance generates vibrations that can be represented as electrical signals by means of a device (for example, a microphone) that converts these vibrations into a time-varying voltage. - Eduardo Reck Miranda, University of Plymouth
Sound is a wave motion propagated in an elastic medium, traveling in both transverse and longitudinal directions, producing an auditory sensation by the change of pressure at the ear.
Sound is a wave which is created by vibrating objects and propagated through a medium from one location to another.
Characteristics of Sound
Sound is a Mechanical Wave
Sound is a Longitudinal Wave
Sound is a Pressure Wave
Mechanical Wave
A sound wave is transported through a medium via the mechanism of particle interaction, so it is characterized as a mechanical wave.
Example: Tuning Fork
Longitudinal Wave
Longitudinal sound waves are waves in which the motion of the individual particles of the medium is parallel (and antiparallel) to the direction of energy transport.
Example: Slinky
[Figure: longitudinal sound wave on a slinky. The energy moves left and right and the coils move left and right, along the direction of energy transport.]
Note: Speech is a longitudinal sound wave

Pressure Wave
A sound wave consists of a repeating pattern of high-pressure and low-pressure regions moving through a medium, so it is referred to as a pressure wave.
[Figure: pressure wave. C = compressions, R = rarefactions.]
The compressions are regions of high air pressure while the rarefactions are regions of low air pressure.
Types of Sound Wave Signal
Discrete Time Signal (e.g. Speech Signal)
Continuous Time Signal (e.g. Music Signal)
Praat
Digital Representation of Sound
Analog-to-Digital
Analog-to-digital conversion of a signal takes two steps:
Sampling
Quantization (Code Word Generation)
Sampling
The sound pressure picked up by a microphone becomes an electrical (analog) signal. An Analog-to-Digital Converter (ADC) converts the electrical signal into digital form.
Quantization
The digitized signal is quantized as a sequence of numbers (code words of, for example, 4 bits or 16 bits) that represents the shape of the electrical signal, which in turn represents the shape of the sound wave.
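The quantization step can be sketched as a uniform quantizer that maps each sample to a fixed-width binary code word. This is only an illustration under stated assumptions: samples are taken to lie in the range -1.0 to 1.0, the function names `quantize` and `to_code_word` are hypothetical, and a real ADC may use a different level mapping.

```python
def quantize(sample, bits):
    """Map a sample in [-1.0, 1.0] to an integer level in [0, 2**bits - 1]."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, sample))  # out-of-range samples are clipped
    # Scale [-1, 1] onto [0, levels - 1] and round to the nearest level.
    return round((clipped + 1.0) / 2.0 * (levels - 1))

def to_code_word(level, bits):
    """Render an integer level as a zero-padded binary code word."""
    return format(level, "0{}b".format(bits))

samples = [-1.0, -0.5, 0.5, 1.0]
code_words = [to_code_word(quantize(s, 3), 3) for s in samples]
print(code_words)
```

With 3 bits there are only 8 levels, so the code words are coarse. With 16 bits there are 65536 levels, and the quantized shape follows the analog signal far more closely.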
Sample Frequency
The frequency of a sound is equal to the number of cycles which occur every second ("cycles per second", "cps" or "Hz").
Sample Points are 001, 100, 101, 110, 010, 001, 001, 010
Sampling Rate
The frequency of this sampling process is called the sampling frequency or sampling rate, and it is measured in Hertz (Hz).
Sampling Theorem
The sampling theorem states that in order to accurately represent a sound digitally, the sampling rate must be greater than twice the highest frequency contained in the signal.
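A small sketch can show what goes wrong when the theorem is violated. A pure tone sampled too slowly produces exactly the same samples as a lower "alias" tone. The helper names and the 8000 Hz sampling rate below are assumptions chosen for illustration.

```python
import math

def nyquist_rate(highest_frequency_hz):
    """Twice the highest frequency; the sampling rate must exceed this value."""
    return 2 * highest_frequency_hz

def alias_frequency(frequency_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling at sample_rate_hz."""
    folded = frequency_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

def sample_cosine(frequency_hz, sample_rate_hz, num_samples):
    """Samples of cos(2*pi*F*t) taken at t = n / sample_rate."""
    return [
        math.cos(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

sample_rate = 8000
too_high = sample_cosine(5000, sample_rate, 16)  # 5000 Hz needs more than 10000 Hz
alias = sample_cosine(alias_frequency(5000, sample_rate), sample_rate, 16)
largest_difference = max(abs(a - b) for a, b in zip(too_high, alias))
print(alias_frequency(5000, sample_rate), largest_difference)
```

The two sample sequences agree up to floating-point rounding, so once the signal has been sampled the 5000 Hz tone cannot be told apart from a 3000 Hz one.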
File formats of Sound
(.wav) adopted by Microsoft
(.voc) adopted by Creative Lab's Sound Blaster
(.snd and .au) originated by NeXT and Sun computers
(.aif) originated by Apple computers
(.avr) adopted by Atari and Apple computers
(.ils) new TIMIT database format
(.adf) CSRE software package format
(.adc) old TIMIT database format
Sound Examples
Highly Good Voice: 44100 samples per second results in a sound file of 669 kB
Good Voice: 11050 samples per second results in a sound file of 167 kB
Bad Voice: 5500 samples per second results in a sound file of 83 kB
Distorted Voice: 2250 samples per second results in a sound file of 35 kB
Highly Distorted Voice: 1125 samples per second results in a sound file of 17 kB
Voice with Noise: 11050 samples per second results in a sound file with noise
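The file sizes above shrink roughly in proportion to the sampling rate. The sketch below shows why, assuming uncompressed 16-bit mono PCM audio and ignoring any file header. The slides do not state the encoding or the clip length, so the 10-second duration and the function name `pcm_size_bytes` are hypothetical.

```python
def pcm_size_bytes(sample_rate_hz, duration_s, bits_per_sample=16, channels=1):
    """Size of uncompressed PCM audio data in bytes, excluding any file header."""
    return sample_rate_hz * duration_s * channels * bits_per_sample // 8

# Hypothetical 10-second clip at the sampling rates listed above.
for rate in (44100, 11050, 5500, 2250, 1125):
    print(rate, pcm_size_bytes(rate, 10))
```

Because the size is a product of rate, duration, channels and sample width, dropping the sampling rate to a quarter also cuts the storage to a quarter, at the cost of the audible distortion the examples demonstrate.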
Properties of Sound Waves

Period (T)
The time it takes to complete one wave cycle
1 / Frequency (T = 1 / F)
Amplitude (A)
Height of the wave
Speed of Sound (V)
Wavelength / Period (V = λ / T)
Frequency * Wavelength (V = F * λ)
Wavelength (λ)
The distance between adjacent crests
Speed of Sound / Frequency (λ = V / F)
V = F * λ Relation
The sound wave properties speed, frequency and wavelength are not independent:
(V = F * λ) (F = V / λ) (λ = V / F)
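A short worked sketch of this relation follows. It assumes a speed of sound of about 343 m/s, which holds for dry air near 20 °C and is not a value taken from the slides. The function names are illustrative.

```python
SPEED_OF_SOUND_AIR_M_S = 343.0  # assumption: dry air at about 20 degrees Celsius

def wavelength_m(frequency_hz, speed_m_s=SPEED_OF_SOUND_AIR_M_S):
    """Wavelength = V / F."""
    return speed_m_s / frequency_hz

def frequency_hz(wavelength, speed_m_s=SPEED_OF_SOUND_AIR_M_S):
    """F = V / wavelength."""
    return speed_m_s / wavelength

def period_s(frequency):
    """T = 1 / F."""
    return 1.0 / frequency

# A 100 Hz tone and a 1000 Hz tone travelling through the same air.
for f in (100.0, 1000.0):
    print(f, wavelength_m(f), period_s(f))
```

With the speed fixed by the medium, raising the frequency tenfold shortens the wavelength tenfold, so knowing any two of the three quantities determines the third.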
Frequency ( F )
The number of times the wavelength occurs in one second
Speed of Sound / Wavelength (F = V / λ)
Frequency is inversely proportional to wavelength

Higher frequencies are interpreted as a higher pitch
Lower frequencies are interpreted as a lower pitch
High Frequency Wave
Low Frequency Wave

Energy
Intensity = Energy / (Time * Area)
Intensity = Power / Area
Pitch (F0)

The fundamental frequency of an emitted sound
Formants (Fn = n * F1)

The harmonic frequencies of the wave are called Harmonics or Formants. The harmonic frequencies change the quality or timbre of the sound, but not the pitch
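The relation Fn = n * F1 can be illustrated with a few lines of Python. The 100 Hz fundamental is an assumed example value and the function name `harmonic_frequencies` is hypothetical.

```python
def harmonic_frequencies(fundamental_hz, count):
    """Return the first `count` harmonics Fn = n * F1 for n = 1..count."""
    return [n * fundamental_hz for n in range(1, count + 1)]

# Harmonics of an assumed 100 Hz fundamental.
print(harmonic_frequencies(100, 5))  # prints [100, 200, 300, 400, 500]
```

Every entry is an integer multiple of the fundamental, which is why changing the relative strength of the harmonics changes the timbre while the perceived pitch stays at F1.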
Zero Crossing Rate (ZCR = ZCs / S)

The number of time-domain zero-crossings within a defined region of signal, divided by the number of samples of that region
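The definition ZCR = ZCs / S translates directly into code. The sketch below counts sign changes between consecutive samples and divides by the number of samples in the region. Treating a sample of exactly zero as non-negative is an assumption made here; tools such as Praat may handle that case differently.

```python
def zero_crossing_rate(samples):
    """ZCR = ZCs / S: sign changes between neighbours, divided by the sample count."""
    if not samples:
        return 0.0
    crossings = sum(
        1
        for previous, current in zip(samples, samples[1:])
        if (previous >= 0) != (current >= 0)  # zero counts as non-negative
    )
    return crossings / len(samples)

print(zero_crossing_rate([0.5, -0.5, 0.5, -0.5]))
print(zero_crossing_rate([0.1, 0.2, 0.3, 0.2]))
```

A rapidly alternating region gives a high rate and a region that stays on one side of zero gives a rate of zero, which is why ZCR is often used as a cheap indicator of noisy or unvoiced segments.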
Example : Praat
Speech Processing
Speech Processing Applications
Text-To-Speech (TTS)
Spoken Language Recognition (Identification)
Speaker Recognition (Identification)
Speech-To-Text (Speech Recognition)
Spoken Language Recognition (Identification)
The recognizer gets an input utterance. It then performs the same signal preprocessing as the trainer. Then, it will identify the language which is spoken by extracting the feature vectors and comparing the features to all the stored example feature vectors of the languages
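The comparison step described above can be sketched as a nearest-neighbour search over stored feature vectors. The slides do not say which features or distance measure are used, so the Euclidean distance, the toy two-dimensional vectors and the labels `language_A` and `language_B` are all hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(features, stored_examples):
    """Return the label of the stored example vector closest to `features`."""
    best_label = None
    best_distance = float("inf")
    for label, vectors in stored_examples.items():
        for vector in vectors:
            distance = euclidean_distance(features, vector)
            if distance < best_distance:
                best_label, best_distance = label, distance
    return best_label

# Hypothetical 2-D feature vectors; real systems use many more dimensions.
stored_examples = {
    "language_A": [[1.0, 2.0], [1.2, 1.8]],
    "language_B": [[5.0, 5.5], [4.8, 6.0]],
}
print(identify([1.1, 2.1], stored_examples))
```

The same structure applies to speaker identification if the labels are user names instead of languages, since in both cases the input features are compared against every stored example.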
Speaker Recognition (Identification)

The recognizer records an utterance from a user. It then performs the same signal preprocessing as the language recognition system. Then it identifies the person who is speaking by extracting the feature vectors and comparing the features to all the stored example features of the utterances for the users.
Speech - To - Text (Speech Recognition)

Basically, it means talking to your computer and having it correctly recognize what you are saying.
This system allows a computer to identify the speech utterance that a person speaks into a microphone, transcribing the speech waveforms into alphanumeric text and navigational commands.
Types of Speech Recognition

Isolated Word Recognition
Connected Word Recognition
Continuous Speech Recognition
Spontaneous Speech Recognition
Uses and Applications

Dictation
Command and Control
Telephony
Medical/Disabilities
Embedded Applications
Multilingual Speech Recognition

Multilingual speech recognition consists of recognition engines for more than one language that run in one common framework, together with a language identification component which is able to switch between the recognizers.
E.g. Verbmobil is a multilingual speech recognition system for three languages: German, English, and Japanese.
Speech - To - Speech Translation ( STS )

STS translates the spoken form of a source language into a recipient language. Speech-to-Translated-Text and Text-to-Translated-Speech are applications within the common frame of an STS system. A multilingual translator and a synthesizer are the related systems required for Speech-to-Speech Translation.
Work Done in Speech Processing Applications

Speaker Recognition:
http://www.icsi.berkeley.edu/Speech/speakerid/
http://www.s3.kth.se/signal/edu/projekt/students/01/yellow/
http://www.owlnet.rice.edu/~elec301/Projects01/speaker_id/index.html
http://www.seas.upenn.edu/~archanaa/DSP_Project.html
http://www.data-compression.com/vq.html
http://www.cs.joensuu.fi/pages/tkinnu/research/
http://cslu.cse.ogi.edu/HLTsurvey/ch1node9.html
Speech Recognition:
http://www.dcs.shef.ac.uk/~stu/com326/sym.html
http://web1.mtnl.net.in/~nilami/dtw.html
Free Software for Speech Recognition

XVoice: http://www.onelist.com/community/xvoice
CVoiceControl/kVoiceControl: http://www.kiecza.de/daniel/linux/cvoicecontrol/index.html
Open Mind Speech: http://freespeech.sourceforge.net
GVoice: http://www.cse.ogi.edu/~omega/gnome/gvoice/
ISIP: http://www.isip.msstate.edu/project/speech/
CMU SphinX: http://www.speech.cs.cmu.edu/sphinx/Sphinx.html
Ears: ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/
NICO ANN Toolkit: http://www.speech.kth.se/NICO/index.html
Myers' Hidden Markov Model Software: http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/
Jialong He's Speech Recognition Research Tool: http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/
Commercial Software for Speech Recognition

IBM ViaVoice: http://www-4.ibm.com/software/speech/linux/dictation.html
Vocalis Speechware: http://www.vocalisspeechware.com/
Babel Technologies: http://www.babeltech.com
SpeechWorks: http://www.speechworks.com
Nuance: http://www.nuance.com
Abbot/Abbot Demo: http://www.softsound.com
Entropic: http://www.entropic.com
Questions
Please