
Speech Processing and its Applications

A. Muthamizh Selvan
Research Scholar (Ph.D.) Department of Computer Science & Engg. Bharathiar University Coimbatore - 641 046.

muthamizh@ieee.org

A. Muthamizh Selvan, Research Scholar (Ph.D.), DCSE, Bharathiar University, Coimbatore

Speech = Sound
Sound is a wave (disturbance)
Sound needs a medium to travel
Sound vibrates the air like a slinky

What is a Wave ?
A wave is a disturbance traveling through a medium, by which energy is transferred from one particle of the medium to another without causing any permanent displacement of the medium itself. The medium is simply the material through which the disturbance is moving. In other words, a wave transfers energy from one location to another without transferring material between the two points.


Example

A sine wave is the simplest form of this wave motion
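As a minimal sketch of the sine wave mentioned above, the following Python snippet evaluates y(t) = A * sin(2 * pi * F * t) at evenly spaced instants. The function name `sine_wave` and its parameters are illustrative assumptions, not part of the original slides.

```python
import math

def sine_wave(amplitude, frequency_hz, duration_s, num_points):
    """Return (t, y) pairs for y(t) = A * sin(2*pi*F*t) at evenly spaced times."""
    step = duration_s / num_points
    return [
        (i * step, amplitude * math.sin(2 * math.pi * frequency_hz * i * step))
        for i in range(num_points)
    ]

# One full cycle of an assumed 1 Hz wave, evaluated at 8 points.
for t, y in sine_wave(1.0, 1.0, 1.0, 8):
    print(f"t={t:.3f}s  y={y:+.3f}")
```

The printed values rise from zero to the positive peak, fall through zero to the negative peak, and return towards zero, which is one complete cycle.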


What is Sound ?
Sound is the result of a mechanical disturbance of some object in a physical medium, such as air. This mechanical disturbance generates vibrations that can be represented as electrical signals by means of a device (for example, a microphone) that converts these vibrations into a time-varying voltage.
- Eduardo Reck Miranda, University of Plymouth

Sound is a wave motion propagated in an elastic medium (longitudinal in fluids such as air; solids can also carry transverse waves), producing an auditory sensation through the change of pressure at the ear.

Sound is a wave which is created by vibrating objects and propagated through a medium from one location to another.


Characteristics of Sound
Sound is a Mechanical Wave
Sound is a Longitudinal Wave
Sound is a Pressure Wave

Mechanical Wave
A sound wave is transported through a medium via the mechanism of particle interaction, and is therefore characterized as a mechanical wave. Example : Tuning Fork


Longitudinal Wave
Longitudinal sound waves are waves in which the motion of the individual particles of the medium is parallel (and antiparallel) to the direction of energy transport. Example : Slinky
[Figure: longitudinal sound wave in a slinky. The coils move left and right, and the energy is transported left and right.]

Note: Speech is a longitudinal sound wave



Pressure Wave
A sound wave consists of a repeating pattern of high-pressure and low-pressure regions moving through a medium, and is therefore referred to as a pressure wave.

[Figure: pressure wave. C = Compressions, R = Rarefactions]

The compressions are regions of high air pressure while the rarefactions are regions of low air pressure.

Types of Sound Wave Signal
Discrete Time Signal ( Eg. Speech Signal )
Continuous Time Signal ( Eg. Music Signal )

Praat

Digital Representation of Sound

Analog-to-Digital conversion of a signal takes two steps:
Sampling
Quantization (Code Word Generation)

Sampling
The sound pressure level, when picked up by a microphone, becomes an electrical (analog) signal. An Analog-to-Digital Converter (ADC) converts the electrical signal into digital form.


Quantization
The digitized signal is quantized into a sequence of numbers (code words of, for example, 4 bits or 16 bits) that represents the shape of the electrical signal, which in turn represents the shape of the sound wave.

Sample Frequency
The frequency of a sound is equal to the number of cycles which occur every second ("cycles per second", "cps" or "Hz").

Sample Points are 001, 100, 101, 110, 010, 001, 001, 010
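The two conversion steps can be sketched in Python as follows. This is only an illustration under stated assumptions: a uniform quantizer over the range [-1, 1], 3-bit code words like the sample points listed above, and a 1 Hz sine wave as the test signal. The function names `sample` and `quantize` are hypothetical, and since the curve behind the slide's sample points is not given, the output is not expected to match them.

```python
import math

def sample(signal, sampling_rate_hz, duration_s):
    """Sampling: evaluate a continuous-time signal at evenly spaced instants."""
    count = int(round(sampling_rate_hz * duration_s))
    return [signal(n / sampling_rate_hz) for n in range(count)]

def quantize(value, bits, lo=-1.0, hi=1.0):
    """Quantization: map a value in [lo, hi] to one of 2**bits binary code words."""
    levels = 2 ** bits
    # Scale to [0, levels) and clamp so that value == hi falls in the top level.
    index = int((value - lo) / (hi - lo) * levels)
    index = max(0, min(levels - 1, index))
    return format(index, f"0{bits}b")

# Assumed test signal: a 1 Hz sine wave sampled 8 times per second for 1 second.
samples = sample(lambda t: math.sin(2 * math.pi * t), 8, 1.0)
code_words = [quantize(s, bits=3) for s in samples]
print(code_words)
```

Increasing `bits` gives more quantization levels and therefore a closer approximation of the original waveform, at the cost of longer code words.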


Sampling Rate
The frequency of this sampling process is called the sampling frequency or sampling rate, and it is measured in Hertz (Hz).

Sampling Theorem
The sampling theorem states that, in order to accurately represent a sound digitally, the sampling rate must be greater than twice the highest frequency contained in the signal.
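A small sketch of what the theorem implies in practice: the first function checks the condition, and the second computes the apparent (aliased) frequency of a pure tone when the condition is violated. The function names and the example frequencies are assumptions chosen for illustration.

```python
def satisfies_sampling_theorem(sampling_rate_hz, highest_frequency_hz):
    """True when the sampling rate exceeds twice the highest signal frequency."""
    return sampling_rate_hz > 2 * highest_frequency_hz

def alias_frequency(tone_hz, sampling_rate_hz):
    """Apparent frequency of a pure tone after sampling (folded into 0..rate/2)."""
    folded = tone_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)

print(satisfies_sampling_theorem(44100, 20000))  # 44100 > 40000
print(satisfies_sampling_theorem(11050, 8000))   # 11050 is not > 16000
print(alias_frequency(8000, 11050))              # the tone is heard at a lower frequency
```

When the rate is too low, the tone does not disappear; it is folded back to a lower frequency, which is one reason undersampled speech sounds distorted.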

File formats of Sound
(.wav) adopted by Microsoft
(.voc) adopted by Creative Labs' Sound Blaster
(.snd and .au) originated by NeXT and Sun computers
(.aif) originated by Apple computers
(.avr) adopted by Atari and Apple computers
(.ils) New TIMIT database format
(.adf) CSRE software package format
(.adc) old TIMIT database format

Sound Examples
Highly Good Voice: 44100 samples per second results in a sound file of 669 kB
Good Voice: 11050 samples per second results in a sound file of 167 kB
Bad Voice: 5500 samples per second results in a sound file of 83 kB
Distorted Voice: 2250 samples per second results in a sound file of 35 kB
Highly Distorted Voice: 1125 samples per second results in a sound file of 17 kB
Voice with Noise: 11050 samples per second results in a sound file with noise
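The file sizes above shrink roughly in proportion to the sampling rate (for example, 669 / 167 is about 4, as is 44100 / 11050). A hedged sketch of that relationship for uncompressed audio follows; the slides do not state the duration or the sample width of the recordings, so the 10-second duration and 8-bit mono format used here are assumptions, and `pcm_size_bytes` is a hypothetical helper name.

```python
def pcm_size_bytes(sampling_rate_hz, duration_s, bits_per_sample=8, channels=1):
    """Size of the raw sample data of an uncompressed recording, in bytes."""
    return int(sampling_rate_hz * duration_s * channels * bits_per_sample / 8)

# Assumed recording: 10 seconds, 8 bits per sample, one channel.
for rate in (44100, 11050, 5500, 2250, 1125):
    print(rate, "samples/s ->", round(pcm_size_bytes(rate, 10) / 1024, 1), "kB")
```

Halving the sampling rate halves the storage needed, which is the trade-off between quality and size that the examples illustrate.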

Properties of Sound Waves


Period ( T )
The time it takes to complete one wave cycle
1 / Frequency (T = 1 / F)

Amplitude ( A )
Height of the wave

Speed of Sound ( V )
Wavelength / Period (V = λ / T)
Frequency * Wavelength (V = F * λ)

Wavelength ( λ )
The distance between adjacent crests
Speed of Sound / Frequency (λ = V / F)

V, F, λ Relation
The sound wave properties Speed, Frequency and Wavelength are not independent:
(V = F * λ) (F = V / λ) (λ = V / F)
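The relation can be tried out numerically with a short sketch. The speed of sound used here (343 m/s, roughly the value for dry air at about 20 degrees C) and the function names are assumptions added for illustration.

```python
SPEED_OF_SOUND_AIR_M_S = 343.0  # assumed value for dry air at about 20 degrees C

def wavelength_of(frequency_hz, speed_m_s=SPEED_OF_SOUND_AIR_M_S):
    """Wavelength = V / F, in metres."""
    return speed_m_s / frequency_hz

def frequency_of(wavelength_m, speed_m_s=SPEED_OF_SOUND_AIR_M_S):
    """F = V / wavelength, in Hz."""
    return speed_m_s / wavelength_m

def period_of(frequency_hz):
    """T = 1 / F, in seconds."""
    return 1.0 / frequency_hz

for f in (100.0, 440.0, 4000.0):
    print(f"{f:7.1f} Hz  wavelength={wavelength_of(f):.4f} m  period={period_of(f):.6f} s")
```

At a fixed speed of sound, a higher frequency always gives a shorter wavelength and a shorter period.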

Frequency ( F )
The number of complete wave cycles (wavelengths) that occur in one second
Speed of Sound / Wavelength (F = V / λ)

Frequency is inversely proportional to wavelength

Higher frequencies are interpreted as a higher pitch
Lower frequencies are interpreted as a lower pitch

High Frequency Wave

Low Frequency Wave



Energy
Intensity = Energy / (Time * Area)
Intensity = Power / Area

Pitch ( F0 )


The fundamental frequency of an emitted sound

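Harmonics and Formants ( Fn = n*F1 )

The frequency components at integer multiples of the first harmonic F1 (the fundamental frequency) are called harmonics. Formants, strictly speaking, are the resonance peaks of the vocal tract, which emphasize some of these harmonics. Both shape the quality or timbre of the sound, but not the pitch.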

Zero Crossing Rate ( ZCR = ZCs / S )


The number of time-domain zero-crossings within a defined region of signal, divided by the number of samples of that region
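The definition above translates almost directly into code. In this minimal sketch a zero crossing is counted whenever two neighbouring samples differ in sign; treating a sample that is exactly zero as non-negative is an assumption, as are the function name and the toy frame of samples.

```python
def zero_crossing_rate(samples):
    """ZCR = ZCs / S: sign changes between neighbouring samples over sample count."""
    if not samples:
        return 0.0
    crossings = sum(
        1
        for prev, cur in zip(samples, samples[1:])
        if (prev >= 0) != (cur >= 0)  # assumption: zero counts as non-negative
    )
    return crossings / len(samples)

# Toy frame: four sign changes in the first half, none in the second half.
frame = [0.5, -0.5, 0.5, -0.5, 0.5, 0.5, 0.5, 0.5]
print(zero_crossing_rate(frame))
```

A high rate is typical of noisy or unvoiced segments, while voiced speech tends to give a lower rate.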

Example : Praat

Speech Processing
Speech Processing Applications
Text - To - Speech ( TTS )
Spoken Language Recognition (Identification)
Speaker Recognition (Identification)
Speech - To - Text (Speech Recognition)

Spoken Language Recognition (Identification)
The recognizer gets an input utterance and performs the same signal preprocessing as the trainer. It then identifies the spoken language by extracting the feature vectors of the utterance and comparing them to all the stored example feature vectors of the languages.

Speaker Recognition (Identification)


The recognizer records an utterance from a user and processes it in the same way as the Language Recognition System. It then identifies the person who is speaking by extracting the feature vectors and comparing them to all the stored example features of the users' utterances.
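The comparison step described for both language and speaker identification can be sketched as nearest-neighbour matching. The slides do not say which features or which distance measure are used, so the Euclidean distance, the two-dimensional toy vectors, and the names `euclidean` and `identify` are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(features, stored_examples):
    """Return the label whose stored example vector is closest to `features`.

    stored_examples maps a label (speaker or language) to a list of vectors.
    """
    best_label, best_distance = None, float("inf")
    for label, examples in stored_examples.items():
        for example in examples:
            distance = euclidean(features, example)
            if distance < best_distance:
                best_label, best_distance = label, distance
    return best_label

# Toy, made-up feature vectors; a real system would extract them from speech.
stored = {
    "speaker_a": [[1.0, 2.0], [1.2, 1.9]],
    "speaker_b": [[4.0, 0.5], [3.8, 0.7]],
}
print(identify([1.1, 2.1], stored))
```

The same function serves language identification if the labels are languages instead of speakers; only the stored examples change.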

Speech - To - Text (Speech Recognition)


Basically, it means talking to your computer and having it correctly recognize what you are saying. Such a system allows a computer to identify the speech utterance that a person speaks into a microphone, transcribing the speech waveforms into alphanumeric text and navigational commands.


Types of Speech Recognition


Isolated Word Recognition
Connected Word Recognition
Continuous Speech Recognition
Spontaneous Speech Recognition

Uses and Applications


Dictation
Command and Control
Telephony
Medical/Disabilities
Embedded Applications

Multilingual Speech Recognition


Multilingual Speech Recognition consists of recognition engines for more than one language that run in one common framework, together with a language identification component which is able to switch between the recognizers. Eg. Verbmobil is a Multilingual Speech Recognition system for three languages: German, English, and Japanese.

Speech - To - Speech Translation ( STS )

STS translates speech in a source language into speech in a recipient language. Speech-to-Translated-Text and Text-to-Translated-Speech are applications within the common framework of an STS system. A multilingual translator and a synthesizer are the related systems required for Speech-to-Speech Translation.


Work Done in Speech Processing Applications


Speaker Recognition:
http://www.icsi.berkeley.edu/Speech/speakerid/
http://www.s3.kth.se/signal/edu/projekt/students/01/yellow/
http://www.owlnet.rice.edu/~elec301/Projects01/speaker_id/index.html
http://www.seas.upenn.edu/~archanaa/DSP_Project.html
http://www.data-compression.com/vq.html
http://www.cs.joensuu.fi/pages/tkinnu/research/
http://cslu.cse.ogi.edu/HLTsurvey/ch1node9.html

Speech Recognition:
http://www.dcs.shef.ac.uk/~stu/com326/sym.html
http://web1.mtnl.net.in/~nilami/dtw.html


Free Software for Speech Recognition


XVoice: http://www.onelist.com/community/xvoice
CVoiceControl/kVoiceControl: http://www.kiecza.de/daniel/linux/cvoicecontrol/index.html
Open Mind Speech: http://freespeech.sourceforge.net
GVoice: http://www.cse.ogi.edu/~omega/gnome/gvoice/
ISIP: http://www.isip.msstate.edu/project/speech/
CMU Sphinx: http://www.speech.cs.cmu.edu/sphinx/Sphinx.html
Ears: ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/
NICO ANN Toolkit: http://www.speech.kth.se/NICO/index.html
Myers' Hidden Markov Model Software: http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/
Jialong He's Speech Recognition Research Tool: http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/

Commercial Software for Speech Recognition


IBM ViaVoice: http://www-4.ibm.com/software/speech/linux/dictation.html
Vocalis Speechware: http://www.vocalisspeechware.com/
Babel Technologies: http://www.babeltech.com
SpeechWorks: http://www.speechworks.com
Nuance: http://www.nuance.com
Abbot/Abbot Demo: http://www.softsound.com
Entropic: http://www.entropic.com


Questions

Please
