
Speech Compression Using Linear Predictive Coding

Saad Shahid Khokhar, saad.khokhar@hotmail.com
Muhammad Junaid Ashfaq, m.junaid.a@hotmail.com
University of Engineering and Technology, Taxila.
Abstract: Speech signal coding has been, and still is, a major issue in the area of digital speech processing. Advances in information technology have catalysed the generation of information. Since unlimited bandwidth is not available, signal compression is required for high-quality speech encryption, storage and communication. This project aims to develop an algorithm for encoding and decoding high-quality speech signals at a low bit rate. The proposed algorithm is analysed in terms of signal quality, bit rate, processing delay and system complexity. We use a robust technique, Linear Predictive Coding. The system uses the tenth-order Levinson-Durbin recursion algorithm. The difference in pitch between human speakers is also addressed by the proposed system.

INTRODUCTION

The compression of data signals is an integral part of information theory. In particular, the field of rate-distortion theory deals with the qualitatively different problems of lossless coding and lossy coding, using the concept of source entropy in the process. Speech coding is the act of transforming the speech signal into a more compact form, which can then be transmitted over less bandwidth or stored in considerably less memory. It is not possible to access unlimited bandwidth. Therefore, there is a need to code and compress speech signals.

Speech compression is required in long-distance communication, high-quality speech storage and message encryption. For example, in digital cellular technology many users need to share the same frequency bandwidth. Speech compression makes it possible for more users to share the available system. Another example is digital voice storage: for a fixed amount of available memory, compression makes it possible to store longer messages.

Speech coding is a lossy type of coding, which means that the output signal does not sound exactly like the input. The input and the output signal can be distinguished from each other. Several speech coding techniques exist, such as Linear Predictive Coding (LPC), waveform coding and sub-band coding. The speech signals that need to be coded are wide-band signals with frequencies ranging from 0 to 8 kHz. We have used one of the oldest compression methods, Linear Predictive Coding (LPC). It is known for its simplicity and uses the least system resources. It is based on the principle of redundancy removal. It analyses the voiced and unvoiced signal to obtain the predictive coefficients, which then form the basis of the algorithm.

The speech coder that is developed is analysed using both subjective and objective analysis. The subjective analysis consists of listening to the encoded speech signal and making quick judgments on it. The quality of the played-back speech is based solely on the opinion of the listener. An objective analysis is introduced to assess the speech quality technically and to minimize human bias. The objective analysis is performed by computing the segmental signal-to-noise ratio (SEGSNR) between the original and the coded speech signal. The analysis reviews all four parameters of system performance: bit rate, signal quality, processing delay and complexity. Pitch is also taken into account, which makes the algorithm robust enough to compress voice signals from both male and female utterances.
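
The SEGSNR measure mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the common definition (the per-frame SNR in dB, averaged over frames), the function name `segmental_snr` and the default `frame_len` of 480 samples are hypothetical, and frames where the ratio is undefined are simply skipped. It uses only the Python standard library.

```python
import math

def segmental_snr(original, coded, frame_len=480):
    """Mean of the per-frame SNR (in dB) between two equal-rate signals.

    frame_len is an assumed value; the paper does not state one.
    """
    n_frames = min(len(original), len(coded)) // frame_len
    snrs = []
    for i in range(n_frames):
        lo, hi = i * frame_len, (i + 1) * frame_len
        signal = sum(s * s for s in original[lo:hi])
        noise = sum((s - c) ** 2 for s, c in zip(original[lo:hi], coded[lo:hi]))
        if signal == 0.0 or noise == 0.0:
            continue  # ratio undefined for silent or perfectly coded frames
        snrs.append(10.0 * math.log10(signal / noise))
    if not snrs:
        raise ValueError("no frame with a defined SNR")
    return sum(snrs) / len(snrs)
```

Averaging in the dB domain gives quiet frames the same weight as loud ones, which is the usual reason for preferring a segmental measure over a single SNR computed on the whole signal.
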
OBJECTIVES

Speech encoding
Speech synthesis

PRE STUDY

Studying how humans produce speech yields the following results: For voiced sounds, the vocal cords vibrate. The rate at which the vocal cords vibrate determines the pitch of the voice. For certain fricative and plosive sounds, the vocal cords do not vibrate but remain constantly open. The shape of the vocal tract determines the sound. As one speaks, the vocal tract changes its shape, producing different sounds.

RESEARCH WORK

LPC is used to estimate the basic speech parameters that are needed to reconstruct the original signal. These parameters include the voicing decision, pitch, gain and predictive coefficients (a_k). First, we divide the speech signal into frames of length 30 ms. These frames start every 20 ms, so each frame overlaps with the previous and the next frame, as shown in the figure below.

Figure 1: Audio signal to separate frames

After separating the frames, the LPC analyzer decides whether each frame is voiced or unvoiced. To do so, it checks whether the frame has a dominant frequency. If it does, the frame is voiced; if there is no dominant frequency, the frame is unvoiced. For a voiced frame the pitch can be found; the pitch of an unvoiced frame is simply 0. The pitch of a voiced frame is in fact the dominant frequency in that frame. One way of finding the pitch is to correlate the frame with itself. This strengthens the dominant frequency components and cancels out most of the weaker ones. If the two largest data point magnitudes are within a factor of 100 of each other, there is some repetition, and the distance between these two data points is the pitch period.

The predictive coefficients are also found separately for each frame. The gain and the filter coefficients are found using Levinson's method. The transfer function of the time-varying digital filter is given by:

$$H(z) = \frac{G}{1 - \sum_{k=1}^{10} a_k z^{-k}}$$

where G is the gain and a_k are the predictive coefficients. The summation is computed from k = 1 to k = 10. This is the 10th-order Levinson recursion, which means that only the first 10 coefficients are transmitted to the LPC synthesizer. To compute the coefficients we use the autocorrelation method. It is selected because with this method the denominator of the above equation has all its roots inside the unit circle, guaranteeing system stability. If the frame is voiced, a pulse train is used to represent it, with non-zero taps at the start of each pitch period. If the frame is unvoiced, white noise is used to represent it and a pitch of 0 is transmitted.
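
The analysis steps of this section (30 ms frames starting every 20 ms, the voiced/unvoiced decision with pitch taken from the correlation peaks, and the 10th-order Levinson recursion on the autocorrelation) can be sketched in Python as below. It is an illustration under assumptions rather than the authors' code: all function names are hypothetical, the pitch search range `fmin`/`fmax` and the gain definition (RMS of the prediction error) are choices made here, the sampling rate `fs` is left as a parameter because the paper does not state it, and the predictor sign convention is s[n] ≈ Σ a_k s[n-k]. The factor-of-100 rule from the text is exposed as `max_ratio`. Only the standard library is used.

```python
import math

def split_frames(x, fs, frame_ms=30, hop_ms=20):
    """Overlapping frames: frame_ms long, one starting every hop_ms."""
    n = fs * frame_ms // 1000
    hop = fs * hop_ms // 1000
    return [x[i:i + n] for i in range(0, len(x) - n + 1, hop)]

def autocorr(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson(r, order=10):
    """Levinson-Durbin recursion on autocorrelation r.

    Returns (a, err): predictor coefficients a[0..order-1] (a[k-1] is a_k)
    and the final prediction error energy.
    """
    a, err = [], r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # degenerate (e.g. silent) frame
            a.append(0.0)
            continue
        # reflection coefficient for order i
        k = (r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))) / err
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a, max(err, 0.0)

def analyze_frame(frame, fs, order=10, fmin=50.0, fmax=400.0, max_ratio=100.0):
    """Voicing, pitch, gain and predictor coefficients for one frame."""
    lag_lo, lag_hi = int(fs / fmax), int(fs / fmin)   # assumed search range
    r = autocorr(frame, max(order, lag_hi))
    a, err = levinson(r, order)
    # Largest value is r[0]; look for the second peak among plausible lags.
    lag = max(range(lag_lo, lag_hi + 1), key=lambda k: r[k])
    voiced = r[lag] > 0.0 and r[0] <= max_ratio * r[lag]
    return {
        "voiced": voiced,
        "pitch_hz": fs / lag if voiced else 0.0,      # 0 for unvoiced frames
        "gain": math.sqrt(err / len(frame)),          # assumed definition
        "a": a,
    }
```

A typical use would be `[analyze_frame(f, fs) for f in split_frames(x, fs)]`, which yields one parameter set per frame for transmission to the synthesizer.
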

Figure 2: Model of the LPC coder

The speech signal is then reconstructed by arranging all the frames together with the help of the information sent.
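
The reconstruction step can be sketched in the same way: each frame's excitation (a pulse train for voiced frames, white noise for unvoiced ones) is passed through the all-pole filter defined by that frame's gain and predictive coefficients, and the outputs are placed one after another. This is a minimal sketch under assumptions: the names `excitation` and `synthesize` are hypothetical, each frame is represented as a `(pitch_period, gain, a)` tuple with the pitch period in samples (0 meaning unvoiced), every frame contributes `hop_len` output samples, and the sign convention is s[n] = G·e[n] + Σ a_k s[n-k].

```python
import random

def excitation(n, pitch_period, rng):
    """Pulse train for voiced frames, white noise for unvoiced (period 0)."""
    if pitch_period > 0:
        # non-zero tap at the start of every pitch period
        return [1.0 if i % pitch_period == 0 else 0.0 for i in range(n)]
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def synthesize(frames, hop_len, seed=0):
    """Rebuild a signal from per-frame (pitch_period, gain, a) tuples."""
    rng = random.Random(seed)
    out = []
    for pitch_period, gain, a in frames:
        for e in excitation(hop_len, pitch_period, rng):
            # Most recent outputs first: s[n-1], s[n-2], ...
            # Samples before the start of the signal count as zero.
            past = out[-1:-len(a) - 1:-1]
            out.append(gain * e + sum(ak * s for ak, s in zip(a, past)))
    return out
```

Because the filter memory is simply the tail of `out`, it carries over from one frame to the next, which avoids a click at every frame boundary. Restarting the pulse train at each frame start is a simplification of this sketch.
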

Figure 3: Block diagram of the LPC system

RESULTS ANALYSIS

After subjective and objective analysis, the compressed signal is not fully comparable to the original signal and the distortion is detectable, but all parameters fall within an acceptable range.

Figure 4: Input speech signal (test1)

Figure 5: LPC reconstructed signal

As can be seen, the two signals are very much alike. The distortion is due to the lossy nature of the system and channel.

AUTHORS' BIOGRAPHY

Mr. Muhammad Junaid Ashfaq completed his FSc at Pakistan International School, Riyadh, in 2009 with an A grade. He is now a student of Telecommunication Engineering at the University of Engineering and Technology, Taxila. He has previously written a research paper titled "Job Saturation in Global Telecom Industry".

Mr. Saad Shahid Khokhar completed his FSc at F.G. Degree College for Men, Wah Cantt, in 2009 with an A grade. He was awarded a Merit Scholarship Certificate in 2008 for obtaining an A+ grade in FSc Part-I. He is now enrolled in a four-year BSc program in Telecommunication Engineering at the University of Engineering and Technology, Taxila. He has previously written a research paper titled "Job Saturation in Global Telecom Industry".
