
Master of Science Thesis Data Communication Systems

Vector Quantization and Speech encoding methods

Brunel University Electronic and Computer Engineering London, UK

Ippokratis Karakotsoglou System Engineer Telecommunications

ABSTRACT
The purpose of this dissertation is to investigate the Vector Quantization methods used to maintain speech quality at low bit rates. It aims to provide the details needed to understand the various Vector Quantizers in use, their design and their algorithms. The performance of a quantizer is judged on two fundamental criteria: distortion and bit rate. The requirement in speech communications is toll-quality speech, free of noise and at an affordable cost. This can be achieved with basic Vector Quantization and especially with its more sophisticated variations. Vector Quantization was developed from Shannon's fidelity criterion; what Shannon called block source codes are what we now call vector quantizers. It is, in effect, the product of many years of development of Shannon's proposal on how to minimize distortion. Shannon's bound appears in the simulation result figures, and comparisons against it are made. The document defines the quantization problem and briefly describes the most basic form of quantization, Scalar Quantization. It then investigates Vector Quantization and the aspects of it that relate to speech coding. It finishes with the application of speech coding to cellular communications.

GLOSSARY
Signal to Noise Ratio: A way to express the effect of noise as a system parameter. It is the ratio of signal power to noise power, expressed in dB.
Quantization: The process by which a signal is approximated by a set of discrete values. The signal is first quantized to discrete values, which are then transmitted to the receiver, which approximates the signal from these values.
Stationary process: A process whose statistical properties do not change with time, that is, they remain the same over all time intervals.
Ergodic process: A process in which every sequence it produces has the same statistical properties.
Lossy coding: Coding in which some quality is lost when the signal is reconstructed at the receiver.
Optimality: The principal goals for a communication system with respect to quality and performance.
Signal Compression: Reducing the storage requirements, and hence the rate requirements, for a given signal.
Modulation: The modification of a signal into a form suitable for transmission.
Encoder: The system hardware or software that encodes a signal into a form that satisfies certain requirements for transmission.
Decoder: The system hardware or software that decodes a signal from its transmission form into another form for further processing. Usually located at the receiver.
Blocklength: The dimension of the vectors in the codebook used by a Vector Quantizer.
Correlation: A statistical measure of the dependence between signal samples, used to characterize a signal.
Sampling: The process of generating a discrete-time signal from a continuous-time signal.
Vocoder: Short for Voice Coder.

1 INTRODUCTION
1.1 Dissertation Subject Analysis
1.1.1 Speech Coding and Communications analysis
In recent years there has been increasing interest in achieving telephone (toll) quality speech coding over communication channels and data networks at rates of 4 kbit/s. Although the emergence of optical fibres has made bandwidth in wired communications inexpensive, there is a growing need for bandwidth conservation and enhanced privacy in wireless cellular and satellite communications. Voice communication at low bit rates requires the speech signal to be in digital format so that it can be processed, stored or transmitted under software control. Speech coding involves sampling and amplitude quantization. Sampling is always done at a rate equal to or greater than twice the bandwidth of the analogue speech. The representation of the sampled signal varies between methods and leaves considerable room for improvement. The objective of speech coding is to represent speech with the minimum number of bits while maintaining good quality for the listener. Speech is generally bandlimited to 4 kHz and sampled at 8 kHz. One technique for coding speech is Pulse-Code Modulation (PCM), which is simply a quantizer of the sampled amplitudes. Quantizing the amplitudes of the sampled signal results in data compression but also introduces some distortion. The minimization of this distortion is the subject of rate distortion theory.
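The sampling and amplitude quantization steps described above can be illustrated with a minimal sketch. It assumes a uniform (linear) quantizer on the range [-1, 1], a 440 Hz test tone in place of real speech, and 8 bits per sample; these choices and all function names are illustrative assumptions, not taken from any standard.

```python
import math


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of plain PCM in bits per second."""
    return sample_rate_hz * bits_per_sample


def uniform_quantize(x, bits, x_max=1.0):
    """Uniform quantizer on [-x_max, x_max].

    Returns (index, reconstruction): the index is what would be
    transmitted, the reconstruction is the centre of the chosen cell.
    """
    levels = 2 ** bits
    step = 2.0 * x_max / levels
    # Clip to the quantizer range so the index stays in 0..levels-1.
    clipped = max(-x_max, min(x, x_max - 1e-12))
    index = int(math.floor((clipped + x_max) / step))
    recon = -x_max + (index + 0.5) * step
    return index, recon


def snr_db(signal, reconstructed):
    """Signal-to-noise ratio in dB (signal power over error power)."""
    sig_power = sum(s * s for s in signal)
    noise_power = sum((s - r) ** 2 for s, r in zip(signal, reconstructed))
    return 10.0 * math.log10(sig_power / noise_power)


# Assumed test signal: a 440 Hz tone sampled at 8 kHz (not real speech).
SAMPLE_RATE = 8000
tone = [0.9 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(800)]

recon_8bit = [uniform_quantize(s, 8)[1] for s in tone]
recon_4bit = [uniform_quantize(s, 4)[1] for s in tone]

print("PCM rate at 8 bits/sample:", pcm_bit_rate(SAMPLE_RATE, 8), "bits/s")
print("SNR with 8 bits: %.1f dB" % snr_db(tone, recon_8bit))
print("SNR with 4 bits: %.1f dB" % snr_db(tone, recon_4bit))
```

Fewer bits per sample lower the bit rate but widen the quantization step, so the distortion grows. This is the rate and distortion trade-off that the rest of the document examines.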

A fundamental result of rate distortion theory is that quantizing vectors instead of scalars gives better performance, that is, lower distortion at a given rate. All the waveform-encoding methods (PCM, DPCM, ADPCM, DM, ADM) provide telephone (toll) quality speech, but there is noticeable distortion below 16,000 bits/s. Below 9,600 bits/s LPC is used, but there is distortion because of the synthetic quality of the speech signal. Speech coding at low rates is achieved using an analysis-synthesis process. In the analysis stage, speech is represented by a set of parameters that are encoded efficiently. In the synthesis stage these parameters are decoded and used with the reconstruction mechanism to form speech. Analysis-synthesis relies on the availability of a parametric model of how the source output is generated. When such a model exists, the transmitter analyses the source output and extracts the model parameters, which are transmitted to the receiver. The receiver uses the model together with the transmitted parameters to approximate the source output. Speech coders, or vocoders, rely on speech models that do not necessarily produce an exact match of the waveform. Rather, they produce a near match whose minor differences are unnoticeable to the listener. Signal coding can be non-parametric or parametric, depending on whether the actual signal or its parametric representation is quantized. There are two categories of quantization methods: Scalar Quantization and Vector Quantization. Although Vector Quantization has been studied intensively only since about 1980, Shannon had already described such a coding structure in his theory of source coding with a fidelity criterion. Shannon called vector quantizers block source codes with a fidelity criterion.

By applying Vector Quantization to speech encoding methods, both waveform-based and model-based, we can significantly improve the quality of speech. In other words, a listener would have difficulty telling the digitized speech from the analogue speech. Shannon proposed that the structure of the ear and brain implicitly determines a number of evaluations that are appropriate in the case of speech or music transmission. There is, for example, an intelligibility criterion in which ρ(x,y) is equal to the relative frequency of incorrectly interpreted words when message x(t) is received as y(t). Such evaluations could in principle be determined experimentally, and some properties are well known: the ear is relatively insensitive to phase, and its sensitivity to amplitude and frequency is roughly logarithmic. This proposal is fundamental, and work on speech coding or on improving speech communications should build on it.

Today Vector Quantization methods are widely used in speech coding for digital cellular systems, and there have been many advances in speech coding based on them. Robert Gray and Allen Gersho [6] have done a great deal of research in this area. Their work was based on the fundamentals of Shannon's theory, in which the optimality criterion is the minimization of the average distortion. This minimization is limited by the rate of the code. Shannon found that, in order to satisfy the optimality criterion, a source encoder has to operate by the nearest neighbor (NN) rule: given an input block (what Shannon called a block in his paper is what we call a vector today), the encoder must select the binary codeword that, when decoded and reproduced, gives the minimum distortion. The block codes, or source codes, defined by Shannon are what we call Vector Quantization today: a mapping of real vectors into binary vectors according to the minimum distortion rule. A vector is a set of signal samples.
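The nearest neighbor rule described above can be sketched as follows. This is a minimal illustration that assumes squared error as the distortion measure and a hand-picked four-entry codebook of dimension 2; the codebook values, the sample values and the function names are hypothetical and are not taken from any coder discussed in this document.

```python
import math


def squared_error(x, y):
    """Squared-error distortion between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def vq_encode(vector, codebook):
    """Nearest neighbor rule: index of the codevector with minimum distortion."""
    return min(range(len(codebook)),
               key=lambda i: squared_error(vector, codebook[i]))


def vq_decode(index, codebook):
    """The decoder is a table lookup of the transmitted index."""
    return codebook[index]


def rate_bits_per_sample(codebook):
    """Rate of a fixed-length VQ: log2(codebook size) / vector dimension."""
    return math.log2(len(codebook)) / len(codebook[0])


# Hypothetical codebook: 4 codevectors of dimension 2 (2 bits per vector).
codebook = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]

# Group consecutive signal samples into vectors (blocks) of dimension 2.
samples = [0.4, 0.6, -0.3, 0.2, -0.7, -0.4]
blocks = [tuple(samples[i:i + 2]) for i in range(0, len(samples), 2)]

indices = [vq_encode(b, codebook) for b in blocks]
reconstruction = [vq_decode(i, codebook) for i in indices]

print("indices:", indices)
print("reconstruction:", reconstruction)
print("rate:", rate_bits_per_sample(codebook), "bits/sample")
```

Only the indices are transmitted, so the rate depends on the codebook size and the vector dimension, while the distortion depends on how well the codevectors cover the source. The exhaustive search used here is the simplest possible encoder.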

1.1.2 Aims and Objectives of the dissertation


The main objective of the project is to investigate Vector Quantization (VQ) and the VQ coding methods intended for digital speech communications, and to provide a survey of the VQ methodologies used for speech coding. The objective is to show the improved efficiency and capability of Vector Quantization for speech encoding, and to show how effectively it can be used in wireless networks (cellular telephony) for speech coding and transmission. In writing this document I have been motivated, or rather inspired, by the advances in speech coding with Vector Quantization that have enabled the standardisation of low-rate coding algorithms for digital cellular communications. The efficiency of Vector Quantization attracts many researchers today and leaves room for further improvement. These standards are the result of more than fifty years of speech coding research, much of which this document will discuss. Until recently, low-rate coding algorithms were of interest only to researchers. Now speech coding is of interest to a large part of the engineering community, who are confronted with implementation problems such as fitting an algorithm onto a single chip for portable cellular telephones. This is very challenging, and an algorithm that fits onto a single chip could also be applied to a number of other applications and systems. Cellular companies also invest in further improvement at low rates. Vector Quantization has been studied intensively over the last twenty-five years and has been found to have enormous potential for high-quality speech coding at low rates.

Speech coding methods depend largely on numerical methods and statistical observations that are computationally intensive and sensitive to machine precision. In addition, these methods employ heuristic, mathematical and statistical methodologies and approximations. While signal processing and communication theory are associated with mathematical and statistical methods, the heuristic methods were established through years of experimental work. That is, researchers solved speech communication problems by evaluating past experience and moving towards a solution by trial and error. In this document I will try to sort through the literature and highlight the key theoretical and heuristic techniques employed in speech-coding algorithms, focusing on VQ algorithms. For each method I will give a technical explanation of how it works.

1.1.3 Deliverables
A brief overview of speech analysis and production models will be given. The communication problems will be identified and examined, based largely on Shannon's paper A Mathematical Theory of Communication. Speech coding and transmission problems will be discussed. Several other key issues for speech coding and transmission will be identified, itemised and examined. Vector Quantization methods will be investigated, together with how they manage compression and communication with so little distortion that it is unnoticeable to the human ear. By the end of this project the following will have happened: the superior capability and effectiveness of Vector Quantization speech coding methods will have been understood and demonstrated; a report explaining Vector Quantization speech coding in cellular telephony in detail will have been delivered; and speech analysis and coding methods will have been clarified. More specifically, the project will try to give clear and understandable answers to the following questions: How can we minimize the bit rate in the digital representation of a signal without a noticeable or objectionable loss of quality? How can we transmit the encoded signal and recover it at the receiver end without noticeable distortion?

1.1.3 Organization of the document and important considerations


Before turning to the subject of Vector Quantization and its application to speech, I need to consider some important aspects of speech coding and communications. Speech properties need to be discussed in order to define what a speech signal is and how it can be transmitted and reconstructed with unnoticeable distortion. The importance of Rate Distortion Theory will be discussed, and Shannon's work will be taken as the basis of all my work. Shannon's work is fundamental in communications, and most of the work in this document is based on his theory. It is worth mentioning here that, although Shannon stated that coding vectors instead of scalars could achieve better performance, VQ appeared only in the late 1970s and no significant research results were reported before then. More recently, however, VQ has become associated with high-quality speech coding at low rates, as noted earlier. The document has two parts. The first part is shorter and deals with speech signals, communication theory and other waveform encoding methods. The second part deals with Vector Quantization. The conclusions and results from the first part are used in the second part.

First Part

First, I will explain Shannon's theory of Communication, Rate Distortion Theory, Information Theory and Data Compression. I will then describe speech as a signal and explain its properties. Next, I will turn to the signal processing of speech, the processing of speech by the human ear, and mathematical models. I will then explain terms such as quality of speech and speech compression. My starting point will be the definition of speech quality and what we mean by high-fidelity speech coders. This is based on Rate Distortion Theory; more specifically, my starting point is the fact that the reconstructed speech signal does not need to be an exact match of the original. I will briefly describe the three categories of codecs, which are the waveform codecs, the vocoders and the hybrid codecs. I will briefly describe the Channel Vocoder and the transmission of speech. Next, the Linear Predictive Coder will be described, followed by CELP, which will be examined in detail since it is a basic starting point for any study of speech communications. G.728, which is fundamental in speech coding, will be discussed in detail together with CELP coding methods, and the operation of the G.728 encoder and decoder will be presented. I will then turn to Scalar Quantization and explain how it works, with a detailed explanation of Uniform Quantization. Within the same context I will try to analyse the quantization problem in signal compression. Next, I will briefly describe the waveform-encoding methods (PCM, DPCM, ADPCM, DM, ADM) that provide telephone (toll) quality speech, and I will explain why there is noticeable distortion below 16,000 bits/s and why LPC, which is used below 9,600 bits/s, suffers distortion from the synthetic quality of its speech signal.

Second Part

Initially I will give a thorough explanation of Vector Quantization as a general quantization method, and I will try to answer why we would want to use Vector Quantization rather than Scalar Quantization. I will present one of the most important aspects of Vector Quantization, namely the design of a Vector Quantizer and the generation of a codebook. I will then explain the Linde-Buzo-Gray (LBG) algorithm. Optimality and computational complexity issues will be examined, and the structural properties of VQ will be discussed. I will then describe most of the memoryless VQ methods, after which Predictive Vector Quantization will be studied. I will then describe Trellis-Coded Quantization (TCQ). The application of various VQ methods to speech coding will be considered, along with how they improve communications. Finally, the application of VQ methods to cellular communications will be discussed, and a detailed mobile system will be presented to explain the operation of VQ coders and decoders.

1.1.4 Historical Perspective of the subject of the dissertation


Speech coding research started over fifty years ago. The motivation for most researchers entering this field was to develop systems for transmitting speech over low bandwidth. In the early stages of speech coding research, they were trying to transmit speech through telegraph cables. Homer Dudley of Bell Laboratories was the first researcher to show that the speech signal does not need to be reconstructed accurately, and he eventually provided the first analysis-synthesis method for speech coding. Dudley invented the first of what we call vocoders (voice coders). See the image at the end of the document. The vocoder consists of an analyser (transmitter) and a synthesizer (receiver). The speech is fed into the 16 filters of the analyser, each of which determines the strength of the speech signal in a particular band of frequencies and transmits it to the synthesizer. Some processing is also done to determine whether the sound is voiceless or voiced and, if voiced, what the pitch is. At the synthesizer, if the signal is voiceless then noise is produced, and if it is voiced a series of electrical pulses is produced at the same rate at which the air puffs were produced. These are then passed through a filter to produce the speech. At that time, the need for communication over low-bandwidth channels drove speech coding research. Dudley's idea was to analyse speech in terms of its pitch and spectrum and to use ten band-pass filters to reconstruct it.

PCM appeared around 1940, followed later by DPCM and ADPCM; these methods gave bit rates of 64 kbit/s and 32 kbit/s. After the Second World War, new methods for the digital representation of speech gained the interest of researchers because of the need for encryption and high-fidelity transmission. PCM, DPCM and ADPCM were proposed, but the need for low bit rate speech coding kept increasing. The need for low-rate speech coders capable of producing high-quality speech for communications was growing, and Vector Quantization, promoted by Gersho and Gray in the 1980s and 1990s, gave a solution to this problem. Vector Quantization proved very useful for encoding LPC parameters when it was first applied to the analysis/synthesis of speech. This allowed Linear Predictive Coding (LPC) rates to be dramatically reduced to 800 b/s with very little loss in speech quality. Atal and Schroeder proposed a linear prediction algorithm with stochastic vector excitation, which they called Code Excited Linear Prediction (CELP).

During the past 20 years VQ has proved to be a valuable method for coding speech, particularly in noisy environments or against heavily disturbed backgrounds. This has given researchers a significant push to study VQ methods further. I believe this is only the beginning of a new period in quantization methods for speech and image coding.
