
SUBBAND CODING USING FILTER BANKS OCTOBER 2011
DEPT OF ECE, SBC, PATTOOR 1

CHAPTER 1
INTRODUCTION

Speech coding is the process of obtaining a compact representation of voice
signals for efficient transmission over band-limited wired and wireless channels
and/or storage. In general, speech coding is a procedure to represent a digitized
speech signal using as few bits as possible, maintaining at the same time a reasonable
level of speech quality. Speech coding is also known, less commonly, as speech
compression. Speech coding has matured to the point where it now constitutes an
important application area of signal processing.

1.1 MOTIVATION
In the era of third-generation (3G) wireless personal communications
standards, despite the emergence of broad-band access network standard proposals,
the most important mobile radio services are still based on voice communications.
Even when the predicted surge of wireless data and Internet services becomes a
reality, voice will remain the most natural means of human communication, although
it may be delivered via the Internet, predominantly after compression.
Due to the increasing demand for speech communication, speech coding
technology has received increasing interest from the research,
standardization, and business communities. Advances in microelectronics and the
vast availability of low-cost programmable processors and dedicated chips have
enabled rapid technology transfer from research to product development; this
encourages the research community to investigate alternative schemes for speech
coding, with the objectives of overcoming deficiencies and limitations. The
standardization community pursues the establishment of standard speech coding
methods for various applications that will be widely accepted and implemented by
the industry. The business communities capitalize on the ever-increasing demand and
opportunities in the consumer, corporate, and network environments for speech
processing products.

1.2 OVERVIEW OF SPEECH CODING
This section describes the structure, properties, and applications of speech
coding technology. Speech coding is the art of creating a minimally redundant
representation of the speech signal that can be efficiently transmitted or stored in
digital media, and decoding the signal with the best possible perceptual quality. Like
any other continuous-time signal, speech may be represented digitally through the
processes of sampling and quantization; speech is typically quantized using either
16-bit uniform or 8-bit companded quantization. Like many other signals, however, a
sampled speech signal contains a great deal of information that is either redundant
(nonzero mutual information between successive samples in the signal) or
perceptually irrelevant (information that is not perceived by human listeners). Most
telecommunications coders are lossy, meaning that the synthesized speech is
perceptually similar to the original but may be physically dissimilar.
Speech coding is performed using numerous steps or operations specified as
an algorithm. An algorithm is any well-defined computational procedure that takes
some value, or set of values, as input and produces some value, or set of values, as
output. An algorithm is thus a sequence of computational steps that transform the
input into the output. Many signal processing problems, including speech coding,
can be formulated as well-specified computational problems; hence, a particular
coding scheme can be defined as an algorithm. In general, an algorithm is specified
with a set of instructions, providing the computational steps needed to perform a
task. A computer or processor can execute these instructions to
complete the coding task. The instructions can also be translated to the structure of a
digital circuit, carrying out the computation directly at the hardware level.

1.3 APPLICATIONS OF SPEECH CODERS
Speech coding has an important role in modern voice-enabled technology,
particularly for digital speech communication, where quality and complexity have a
direct impact on the marketability and cost of the underlying products or services.
Many speech coding standards exist, each designed to suit the needs of a given
application.

More recently, with the explosive growth of the internet, the potential market
of voice over internet protocol (voice over IP, or VoIP) has lured many companies to
develop products and services around the concept. Speech coding will play a central
role in this revolution.
Another smaller-scale area of application includes voice storage or digital
recording, with some outstanding representatives being the digital telephone
answering device (DTAD) and solid-state recorders. For these products to be
competitive in the marketplace, their costs must be driven to a minimum. By
compressing the digital speech signal before storage, longer-duration voice messages
can be recorded for a given amount of memory, leading to improved cost
effectiveness.
Techniques developed for speech coding have also been applied to other
application areas such as speech synthesis, audio coding, speech recognition, and
speaker recognition. Because of the prominent position that speech coding occupies in
modern technology, it will remain at the center of attention for years to come.

1.4 OBJECTIVE OF PRESENT WORK
The main objectives of the project can be divided into four goals:
- To study the basics of speech coding.
- To study the design and implementation of filter banks.
- To understand the relevant Matlab functions.
- To determine the compression ratio achieved.


CHAPTER 2

LITERATURE REVIEW

The history of audio and music compression began in the 1930s with
research into pulse-code modulation (PCM) coding. Compression of
digital audio was started in the 1960s by telephone companies that were concerned
with the cost of transmission bandwidth. The 1990s saw improvements in these
earlier algorithms and an increase in compression ratios at given audio quality levels.
Speech compression is often referred to as speech coding, which is defined as a
method for reducing the amount of information needed to represent a speech signal.
Most speech coders are based on lossy algorithms. Lossy
algorithms are considered acceptable when encoding speech because the loss of
quality is often undetectable to the human ear.

2.1 INTRODUCTION
Speech coding is fundamental to the operation of the public switched
telephone network (PSTN), videoconferencing systems, digital cellular
communications, and emerging voice over Internet protocol (VoIP) applications. The
goal of speech coding is to represent speech in digital form with as few bits as
possible while maintaining the intelligibility and quality required for the particular
application. Interest in speech coding is motivated by the evolution to digital
communications and the requirement to minimize bit rate, and hence, conserve
bandwidth. There is always a tradeoff between lowering the bit rate and maintaining
the delivered voice quality and intelligibility; however, depending on the application,
many other constraints also must be considered, such as complexity, delay, and
performance with bit errors or packet losses.
Based on these developments, it is possible today, and likely in the near
future, that our day-to-day voice communications will involve multiple hops,
including heterogeneous networks. This is a considerable departure from the plain
old telephone service (POTS) on the PSTN, and indeed, these future voice
connections will differ greatly even from the digital cellular calls connected through
the PSTN today. As the networks supporting our voice calls become less
homogeneous and include more wireless links, many new challenges and
opportunities emerge. The 1990s saw almost exponential growth in speech coding
standards for a wide range of networks and applications, including the
PSTN, digital cellular, and multimedia streaming.
In order to compare the various speech coding methods and standards, it is
necessary to have methods for establishing the quality and intelligibility produced by
a speech coder. It is a difficult task to find objective measures of speech quality, and
often, the only acceptable approach is to perform subjective listening tests [5].
However, there have been some recent successes in developing objective quantities,
experimental procedures, and mathematical expressions that have a good correlation
with speech quality and intelligibility.

2.2 BASIC ISSUES IN SPEECH CODING
Speech and audio coding can be classified according to the bandwidth
occupied by the input and the reproduced source. Narrowband or telephone
bandwidth speech occupies the band from 200 to 3400 Hz, while wideband speech is
contained in the range of 50 Hz to 7 kHz. High quality audio is generally taken to
cover the range of 20 Hz to 20 kHz.
Given a particular source, the classic tradeoff in lossy source compression is
rate versus distortion: the higher the rate, the smaller the average distortion in the
reproduced signal. Of course, since a higher bit rate implies a greater bandwidth
requirement, the goal is always to minimize the rate required to satisfy the distortion
constraint. For speech coding, we are interested in achieving a quality as close to the
original speech as possible. Encompassed in the term quality are intelligibility,
speaker identification, and naturalness. Absolute category rating tests are subjective
tests of speech quality and involve listeners assigning a category and rating for each
speech utterance according to classifications such as Excellent (5), Good (4),
Fair (3), Poor (2), and Bad (1). The average for each utterance over all listeners is the
Mean Opinion Score (MOS).
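Since the MOS is just the average of the listeners' category ratings, it can be sketched in a few lines of Python. This assumes integer ratings on the five-point absolute category rating scale listed above; the function name and the ratings are hypothetical and chosen only for illustration.

```python
def mean_opinion_score(ratings):
    """Average absolute category ratings (1 = Bad ... 5 = Excellent) for one utterance."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    return sum(ratings) / len(ratings)

# Hypothetical ratings from eight listeners for a single utterance
scores = [4, 5, 3, 4, 4, 3, 5, 4]
print(f"MOS = {mean_opinion_score(scores):.2f}")   # MOS = 4.00
```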
Although important, the MOS values obtained by listening to isolated
utterances do not capture the dynamics of conversational voice communications in
the various network environments. It is intuitive that speech codecs should be tested
within the environment, and while executing the tasks, for which they are designed.
Thus, since we are interested in conversational (two-way) voice communications, a
more realistic test would be conducted in this scenario. Recently, the perceptual
evaluation of speech quality (PESQ) method was developed to provide an
assessment of speech codec performance in conversational voice communications.
The PESQ has been standardized by the ITU-T as P.862 and can be used to generate
MOS values for both narrowband and wideband speech.

2.4 SPEECH CODING STANDARDS
Standards exist because there are strong needs to have common means for
communication: it is in everyone's best interest to be able to develop and utilize
products and services based on the same reference [2]. The standard bodies are
organizations responsible for overseeing the development of standards for a
particular application. Brief descriptions of some well-known standard bodies are
given here.
- International Telecommunications Union (ITU): The Telecommunications
Standardization Sector of the ITU (ITU-T) is responsible for creating speech
coding standards for network telephony. This includes both wired and
wireless networks.
- Telecommunications Industry Association (TIA): The TIA is in charge of
promulgating speech coding standards for specific applications. It is part of
the American National Standards Institute (ANSI). The TIA has successfully
developed standards for North American digital cellular telephony, including
time division multiple access (TDMA) and code division multiple access
(CDMA) systems.

- European Telecommunications Standards Institute (ETSI): The ETSI has
memberships from European countries and companies and is mainly an
organization of equipment manufacturers. ETSI is organized by application;
the most influential group in speech coding is the Groupe Spécial Mobile
(GSM), which has several prominent standards under its belt.
- United States Department of Defense (DoD): The DoD is involved with the
creation of speech coding standards, known as U.S. Federal standards, mainly
for military applications.
- Research and Development Center for Radio Systems of Japan (RCR):
Japan's digital cellular standards are created by the RCR.

2.5 METHODS OF SPEECH CODING.
The process of breaking the input speech into subbands via bandpass filters
and coding each band separately is called subband coding. To keep the number of
samples to be coded at a minimum, the sampling rate for the signals in each band is
reduced by decimation. Of course, since the bandpass filters are not ideal, there is
some overlap between adjacent bands and aliasing occurs during decimation.
Ignoring the distortion or noise due to compression, quadrature mirror filter (QMF)
banks allow the aliasing that occurs during filtering and subsampling at the encoder
to be cancelled at the decoder. The codecs used in each band can be PCM, ADPCM,
or even an analysis-by-synthesis method. The advantage of subband coding is that
each band can be coded differently and that the coding error in each band can be
controlled in relation to human perceptual characteristics.
Transform coding methods were first applied to still images but later
investigated for speech. The basic principle is that a block of speech samples is
operated on by a discrete unitary transform and the resulting transform coefficients
are quantized and coded for transmission to the receiver. Low bit rates and good
performance can be obtained because more bits can be allocated to the perceptually
important coefficients, and for well-designed transforms, many coefficients need not
be coded at all, but are simply discarded, and acceptable performance is still
achieved. Although classical transform coding has not had a major impact on
narrowband speech coding and subband coding has fallen out of favor in recent
years, filter bank and transform methods play a critical role in high quality audio
coding, and at least one important standard for wideband speech coding (G.722.1) is
based upon filter bank and transform methods. Although it is intuitive that subband
filtering and discrete transforms are closely related, by the early 1990s the
relationships between filter bank methods and transforms were well understood.
Today, the distinction between transforms and filter bank methods is somewhat
blurred, and the choice between a filter bank implementation and a transform method
may simply be a design choice.
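The transform coding principle described above (apply a unitary transform to a block, quantize the coefficients, discard the unimportant ones) can be illustrated with a small Python sketch. The orthonormal DCT-II is used here as the unitary transform; this is an assumption, since the text does not name a specific transform. The function names, block length, number of retained coefficients and quantizer step size are hypothetical choices.

```python
import math

def dct_ii(x):
    """Orthonormal DCT-II of one block (a discrete unitary transform)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out

def idct_ii(c):
    """Inverse of dct_ii (the transpose of the orthonormal DCT-II matrix)."""
    n = len(c)
    return [c[0] * math.sqrt(1.0 / n)
            + sum(math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                  for k in range(1, n))
            for i in range(n)]

def transform_code(block, keep, step):
    """Quantize the `keep` largest coefficients with step size `step`, discard the rest."""
    coeffs = dct_ii(block)
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(order[:keep])
    quantized = [round(c / step) * step if k in kept else 0.0
                 for k, c in enumerate(coeffs)]
    return idct_ii(quantized)

# A smooth (hypothetical) 32-sample block: most of its energy lands in a few coefficients
block = [math.sin(2 * math.pi * 2 * i / 32) for i in range(32)]
approx = transform_code(block, keep=6, step=0.05)
print(f"max error with 6 of 32 coefficients: {max(abs(a - b) for a, b in zip(block, approx)):.3f}")
```

Because the transform is unitary, the squared quantization error in the coefficients equals the squared error in the reconstructed samples, which is why bits can be budgeted directly in the coefficient domain.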


CHAPTER 3

SUBBAND CODING.

3.1 INTRODUCTION

Sub-Band Coding (SBC) is a powerful and general method of encoding audio
signals efficiently. Unlike source-specific methods (like LPC, which works only on
speech), SBC can encode any audio signal from any source, making it ideal for
music recordings, movie soundtracks, and the like. MPEG Audio is the most popular
example of SBC. This chapter describes the basic ideas behind SBC and discusses
some of the issues involved in its use.

3.1.1 BASIC PRINCIPLES
SBC depends on a phenomenon of the human hearing system called masking.
Normal human ears are sensitive to a wide range of frequencies. However, when a
lot of signal energy is present at one frequency, the ear cannot hear lower energy at
nearby frequencies. We say that the louder frequency masks the softer frequencies.
The louder frequency is called the masker. (Strictly speaking, what is described
here is simultaneous masking, that is, masking across frequency. There are
also non-simultaneous masking phenomena (masking across time), as well as many
other phenomena of human hearing, which are not considered here.)
The basic idea of SBC is to save signal bandwidth by throwing away information
about frequencies which are masked. The result won't be the same as the original
signal, but if the computation is done right, human ears can't hear the difference.


3.1.2 ENCODING OF AUDIO SIGNALS
The simplest way to encode audio signals is Pulse Code Modulation (PCM),
which is used on music CDs, DAT recordings, and so on. Like all digitization, PCM
adds noise to the signal, which is generally undesirable. The fewer bits used in
digitization, the more noise gets added. The way to keep this noise from being a
problem is to use enough bits to ensure that the noise is always low enough to be
masked either by the signal or by other sources of noise. This produces a high quality
signal, but at a high bit rate (over 700 kbps for one channel of CD audio). A lot of
those bits are encoding masked portions of the signal, and are being wasted.
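The claim that fewer bits add more noise can be checked numerically. The following Python sketch assumes a mid-rise uniform quantizer over [-1, 1) and a hypothetical sine test tone; it measures the signal-to-quantization-noise ratio, which grows by roughly 6 dB for every extra bit.

```python
import math

def uniform_quantize(x, bits, full_scale=1.0):
    """Mid-rise uniform quantizer with 2**bits levels over [-full_scale, full_scale)."""
    levels = 2 ** bits
    step = 2.0 * full_scale / levels
    out = []
    for v in x:
        idx = math.floor(v / step)                           # index of the cell containing v
        idx = max(-levels // 2, min(levels // 2 - 1, idx))   # clip to the valid cells
        out.append((idx + 0.5) * step)                       # reconstruct at the cell midpoint
    return out

def snr_db(ref, test):
    """Signal-to-quantization-noise ratio in dB."""
    signal = sum(v * v for v in ref)
    noise = sum((a - b) ** 2 for a, b in zip(ref, test))
    return 10.0 * math.log10(signal / noise)

# Hypothetical test tone, amplitude 0.9 of full scale
tone = [0.9 * math.sin(2 * math.pi * 0.01234 * n) for n in range(20000)]
for bits in (4, 8, 12, 16):
    print(f"{bits:2d} bits: SNR = {snr_db(tone, uniform_quantize(tone, bits)):5.1f} dB")
```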
There are more clever ways of digitizing an audio signal, which can save
some of that wasted bandwidth. A classic method is nonlinear PCM, such as mu-law
encoding (named after the parameter mu of its logarithmic compression curve). This is
like PCM on a logarithmic scale, and the effect is to add noise that is proportional to
the signal strength. Sun's .au format for sound files is a popular example of mu-law
encoding. Using 8-bit mu-law encoding would cut our one channel of CD audio
down to about 350 kbps, which is better but still pretty high, and is often audibly
poorer in quality than the original (this scheme doesn't really model masking effects).
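A minimal Python sketch of mu-law companding follows. It uses the continuous mu-law characteristic with mu = 255 (the value used by G.711 mu-law); real G.711 codecs use a segmented piecewise-linear approximation of this curve, so the sketch is an idealization. The function names are hypothetical. The last two lines reproduce the bit-rate arithmetic from the text for 44.1 kHz mono audio.

```python
import math

MU = 255.0  # mu parameter of the G.711 mu-law curve

def mu_law_compress(x, mu=MU):
    """Continuous mu-law characteristic: x in [-1, 1] -> y in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)

def mu_law_8bit(x):
    """Compress, quantize uniformly to 256 levels, then expand (encode + decode)."""
    y = mu_law_compress(x)
    code = max(-128, min(127, math.floor(y * 128)))   # 8-bit code word
    return mu_law_expand((code + 0.5) / 128)

# The error grows with the amplitude, so the relative error stays small
for amp in (0.01, 0.1, 0.5, 0.9):
    print(f"amplitude {amp}: relative error {abs(mu_law_8bit(amp) - amp) / amp:.4f}")

print(44100 * 16, "bit/s for 16-bit linear PCM")   # 705600 bit/s for 16-bit linear PCM
print(44100 * 8, "bit/s for 8-bit mu-law")         # 352800 bit/s for 8-bit mu-law
```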

3.1.3 A BASIC SBC SCHEME
To enable higher quality compression, one may use subband coding. First, a
digital filter bank divides the input signal spectrum into some number of subbands.
The psychoacoustic model looks at the energy in each of these subbands, as well as
in the original signal, and computes masking thresholds using psychoacoustic
information. Each of the subband samples is quantized and encoded so as to keep the
quantization noise below the dynamically computed masking threshold. The final
step is to format all these quantized samples into groups of data called frames, to
facilitate eventual playback by a decoder. Decoding is much easier than encoding,
since no psychoacoustic model is involved. The frames are unpacked, subband
samples are decoded, and a frequency-time mapping reconstructs an output audio
signal.
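The quantization step of this scheme (keep the noise in each subband below its masking threshold) can be sketched in Python. This is a simplification, not the MPEG procedure: it assumes the rule of thumb that each extra bit of a uniform quantizer lowers the noise by about 6.02 dB, and it takes the subband levels and masking thresholds as given numbers instead of computing them with a psychoacoustic model. All values and names below are hypothetical.

```python
import math

DB_PER_BIT = 6.02  # approximate noise reduction per extra bit of a uniform quantizer

def bits_from_smr(signal_db, mask_db, max_bits=16):
    """Bits per subband so that quantization noise stays below the masking threshold.

    signal_db[i] and mask_db[i] are the level and masking threshold of subband i.
    A subband whose level is already below its threshold is masked and gets 0 bits.
    """
    alloc = []
    for s, m in zip(signal_db, mask_db):
        smr = s - m                                    # signal-to-mask ratio in dB
        bits = 0 if smr <= 0 else math.ceil(smr / DB_PER_BIT)
        alloc.append(min(bits, max_bits))
    return alloc

# Hypothetical levels and masking thresholds for four subbands
levels = [70.0, 62.0, 45.0, 30.0]
thresholds = [40.0, 41.0, 38.0, 35.0]
print(bits_from_smr(levels, thresholds))   # [5, 4, 2, 0]
```

The last subband is entirely masked, so it costs no bits at all; this is where SBC saves bandwidth compared with plain PCM.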

Most SBC encoders use a structure like this. First, a time-frequency mapping
(a filter bank, or FFT, or something else) decomposes the input signal into subbands.

Fig3.1 Block diagram of SBC

The psychoacoustic model looks at these subbands as well as the original
signal, and determines masking thresholds using psychoacoustic information. Using
these masking thresholds, each of the subband samples is quantized and encoded so
as to keep the quantization noise below the masking threshold. The final step is to
assemble all these quantized samples into frames, so that the decoder can unpack
them unambiguously.
Decoding is easier, since there is no need for a psychoacoustic model. The
frames are unpacked, subband samples are decoded, and a frequency-time mapping
turns them back into a single output audio signal. This is a basic, generic sketch of
how SBC works. Notice that we haven't looked at how much computation it takes to
do this. For practical systems that need to run in real time, computation is a major
issue, and is usually the main constraint on what can be done.
Over the last five to ten years, SBC systems have been developed by many of
the key companies and laboratories in the audio industry. Beginning in the late
1980s, a standardization body of the ISO called the Moving Picture Experts Group
(MPEG) developed generic standards for coding of both audio and video. MPEG
Audio is a specific example of a practical SBC system.

3.2 FILTER BANKS

In signal processing, a filter bank is an array of band-pass filters that
separates the input signal into multiple components, each one carrying a single
frequency subband of the original signal. One application of a filter bank is a graphic
equalizer, which can attenuate the components differently and recombine them into a
modified version of the original signal. The process of decomposition performed by
the filter bank is called analysis (meaning analysis of the signal in terms of its
components in each sub-band); the output of analysis is referred to as a subband
signal with as many subbands as there are filters in the filter bank. The
reconstruction process is called synthesis, meaning reconstitution of a complete
signal resulting from the filtering process.
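Analysis and synthesis can be illustrated with the simplest possible two-band filter bank. In this Python sketch (an assumption, not the filter bank used later in the report), the low band is a two-tap moving average and the high band a two-tap difference; because the two impulse responses add up to a unit impulse, summing the subbands reconstructs the input. No decimation is performed here. Optional per-band gains mimic the graphic equalizer mentioned above.

```python
def fir_filter(h, x):
    """Causal FIR filter y[n] = sum_k h[k] x[n-k] with zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

# Hypothetical two-band bank: LOWPASS + HIGHPASS = [1, 0], a unit impulse
LOWPASS = [0.5, 0.5]     # moving average: passes slow variations
HIGHPASS = [0.5, -0.5]   # first difference: passes fast variations

def analysis(x):
    """Decompose x into a list of subband signals."""
    return [fir_filter(LOWPASS, x), fir_filter(HIGHPASS, x)]

def synthesis(subbands, gains=None):
    """Recombine the subbands; per-band gains act like a graphic equalizer."""
    if gains is None:
        gains = [1.0] * len(subbands)
    return [sum(g * s[n] for g, s in zip(gains, subbands))
            for n in range(len(subbands[0]))]

x = [3, -1, 4, 1, -5, 9, 2, -6]
print(synthesis(analysis(x)))              # unity gains: reconstructs x
print(synthesis(analysis(x), [1.0, 0.0]))  # high band removed: a smoothed version of x
```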
In digital signal processing, the term filter bank is also commonly applied to
a bank of receivers. The difference is that receivers also down-convert the subbands
to a low center frequency that can be re-sampled at a reduced rate. The same result
can sometimes be achieved by undersampling the bandpass subbands.
Another application of filter banks is signal compression, when some
frequencies are more important than others. After decomposition, the important
frequencies can be coded with a fine resolution. Small differences at these
frequencies are significant and a coding scheme that preserves these differences must
be used. On the other hand, less important frequencies do not have to be exact. A
coarser coding scheme can be used, even though some of the finer (but less
important) details will be lost in the coding.

The vocoder uses a filter bank to determine the amplitude information of the
subbands of a modulator signal (such as a voice) and uses them to control the
amplitude of the subbands of a carrier signal (such as the output of a guitar or
synthesizer), thus imposing the dynamic characteristics of the modulator on the
carrier.

Fig 3.2 An Eight-Band Filter Bank

3.3 UP-SAMPLING
An up-sampler is used to increase the sampling rate by an integer factor. An up-sampler
with an up-sampling factor L, where L is a positive integer, develops an
output sequence x_u[n] with a sampling rate that is L times larger than that of the
input sequence x[n]. The up-sampler is a linear but time-varying discrete-time system.

Fig 3.3 Up-sampler

The up-sampling operation is implemented by inserting L-1 equidistant zero-valued
samples between two consecutive samples of x[n].
The input-output relation is given by the following equation:

$$x_u[n] = \begin{cases} x[n/L], & n = 0, \pm L, \pm 2L, \ldots \\ 0, & \text{otherwise} \end{cases}$$

Figure below shows the up-sampling by a factor of 3:

Fig3.4 Example of Up-sampler with factor L=3

In practice, the zero-valued samples inserted by the up-sampler are replaced
with appropriate nonzero values using some type of filtering process; this process is
called interpolation and will be discussed later.
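The up-sampler's input-output relation translates directly into code. A minimal Python sketch (the function name is hypothetical) that inserts L-1 zeros after every input sample, matching the factor-of-3 example in Fig 3.4:

```python
def upsample(x, L):
    """Factor-L up-sampler: x_u[n] = x[n/L] when L divides n, otherwise 0."""
    if L < 1:
        raise ValueError("L must be a positive integer")
    out = []
    for v in x:
        out.append(v)
        out.extend([0] * (L - 1))   # L-1 zero-valued samples between input samples
    return out

print(upsample([1, 2, 3], 3))   # [1, 0, 0, 2, 0, 0, 3, 0, 0]
```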

3.3.1 FREQUENCY-DOMAIN CHARACTERIZATION
Consider first a factor-of-2 up-sampler whose input-output relation in the
time-domain is given by

$$x_u[n] = \begin{cases} x[n/2], & n = 0, \pm 2, \pm 4, \ldots \\ 0, & \text{otherwise} \end{cases}$$

In terms of the z-transform, the input-output relation is then given by:

$$X_u(z) = \sum_{n=-\infty}^{\infty} x_u[n]\, z^{-n} = \sum_{n\ \text{even}} x[n/2]\, z^{-n} = \sum_{m=-\infty}^{\infty} x[m]\, z^{-2m} = X(z^2)$$

In a similar manner, we can show that for a factor-of-L up-sampler

$$X_u(z) = X(z^L)$$

On the unit circle, for $z = e^{j\omega}$, the input-output relation is given by

$$X_u(e^{j\omega}) = X(e^{j\omega L})$$

Figure below shows the relation between $X(e^{j\omega})$ and $X_u(e^{j\omega})$ for L = 2 in the case
of a typical sequence x[n].

Fig 3.5 Relation between $X(e^{j\omega})$ and $X_u(e^{j\omega})$ for L = 2

3.4 DOWN-SAMPLING

A down-sampler is used to decrease the sampling rate by an integer factor. A
down-sampler with a down-sampling factor M, where M is a positive integer,
develops an output sequence y[n] with a sampling rate that is (1/M)th of that of the
input sequence x[n]. The down-sampler is a linear but time-varying discrete-time system.

Fig 3.6 Down-sampler

The down-sampling operation is implemented by keeping every M-th sample of x[n] and
removing the M-1 in-between samples to generate y[n].
The input-output relation is given by the following equation:
$$y[n] = x[nM]$$
Figure below shows the down-sampling by a factor of 3:

Fig 3.7 Example for down-sampling with factor M=3
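The down-sampler is a one-liner in Python (slice with step M). The sketch below, with a hypothetical function name, also shows why the down-sampler is time-varying: delaying the input by one sample does not simply delay the output, it selects a different set of samples.

```python
def downsample(x, M):
    """Factor-M down-sampler: y[n] = x[nM]."""
    if M < 1:
        raise ValueError("M must be a positive integer")
    return x[::M]

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(downsample(x, 3))          # [1, 4, 7]
# Delay the input by one sample: the output is not a delayed copy of [1, 4, 7]
print(downsample([0] + x, 3))    # [0, 3, 6, 9]
```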

3.4.1 FREQUENCY-DOMAIN CHARACTERIZATION

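Applying the z-transform to the input-output relation of a factor-of-M down-sampler, y[n] = x[Mn],
we get

$$Y(z) = \sum_{n=-\infty}^{\infty} x[Mn]\, z^{-n}$$

The expression on the right-hand side cannot be directly expressed in terms of X(z).
To get around this problem, define a new sequence $x_{\mathrm{int}}[n]$, where

$$x_{\mathrm{int}}[n] = \begin{cases} x[n], & n = 0, \pm M, \pm 2M, \ldots \\ 0, & \text{otherwise} \end{cases}$$

so that $Y(z) = \sum_{n} x_{\mathrm{int}}[Mn]\, z^{-n} = \sum_{k} x_{\mathrm{int}}[k]\, z^{-k/M} = X_{\mathrm{int}}(z^{1/M})$.
Now $x_{\mathrm{int}}[n]$ can be formally related to x[n] by the equation

$$x_{\mathrm{int}}[n] = c[n]\, x[n]$$

where

$$c[n] = \begin{cases} 1, & n = 0, \pm M, \pm 2M, \ldots \\ 0, & \text{otherwise} \end{cases} \;=\; \frac{1}{M} \sum_{k=0}^{M-1} W_M^{-kn}, \qquad W_M = e^{-j2\pi/M}$$

Then applying the z-transform and simplifying, we finally get the following equations:

$$X_{\mathrm{int}}(z) = \frac{1}{M} \sum_{k=0}^{M-1} X\!\left(z\, W_M^{k}\right), \qquad Y(z) = \frac{1}{M} \sum_{k=0}^{M-1} X\!\left(z^{1/M}\, W_M^{k}\right)$$

Consider a factor-of-2 down-sampler with an input x[n] whose spectrum is as shown
below. The DTFTs of the output and the input sequences of this down-sampler are
then related as:

$$Y(e^{j\omega}) = \frac{1}{2}\left\{ X(e^{j\omega/2}) + X(-e^{j\omega/2}) \right\}$$

Figure below shows the relation between $X(e^{j\omega})$ and $X(e^{j\omega/2})$ for M = 2.

Fig3.8 Relation between $X(e^{j\omega})$ and $X(e^{j\omega/2})$ for M = 2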
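The factor-of-2 relation $Y(e^{j\omega}) = \frac{1}{2}\{X(e^{j\omega/2}) + X(-e^{j\omega/2})\}$ can be checked numerically. This Python sketch evaluates the DTFT of an arbitrary finite test sequence directly from its definition and uses the fact that $-e^{j\omega/2} = e^{j(\omega/2 + \pi)}$; the second term is the aliased copy of the spectrum. The test sequence and frequencies are hypothetical.

```python
import cmath
import math

def dtft(x, w):
    """DTFT of a finite sequence x[0..N-1] evaluated at w (radians/sample)."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(x))

x = [0.3, -1.2, 2.0, 0.7, -0.4, 1.1, 0.05, -0.9]   # arbitrary test sequence
y = x[::2]                                          # factor-of-2 down-sampler

for w in (0.0, 0.4, 1.3, 2.9):
    lhs = dtft(y, w)
    # X(-e^{jw/2}) is the DTFT of x evaluated at w/2 + pi
    rhs = 0.5 * (dtft(x, w / 2) + dtft(x, w / 2 + math.pi))
    print(f"w = {w}: |Y - RHS| = {abs(lhs - rhs):.1e}")
```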

3.5 SAMPLING RATE CONVERSION BY A FACTOR L/M

With an understanding of the down-sampling and up-sampling processes, we
now study sampling rate conversion by a non-integer factor of L/M. This can be
viewed as two sampling rate conversion processes. In step 1, we up-sample by an
integer factor L and then apply an interpolation filter H1(z);
in step 2, we filter the output of the interpolation filter with an anti-aliasing
filter H2(z), and finally down-sample by M. The entire process is
illustrated in the figure below.

Fig3.9 Sampling rate conversion by a factor L/M

Since the interpolation and anti-aliasing filters are cascaded and
operate at the same rate, they can be replaced by a single low-pass filter. We choose
the lower of the two stopband edges and adopt the more demanding of the passband
gain and stopband attenuation requirements for the filter design. Using one low-pass
filter yields considerable computational savings. As an example, consider conversion
from CD to DAT format. The sampling rate of CD audio is 44.1 kHz and that of DAT is 48 kHz.
How can 44.1 kHz data be converted to 48 kHz data? The conversion factor is
shown below, so that L = 160 and M = 147:
48 / 44.1 = 160 / 147

Fig3.10 44.1 kHz to 48 kHz sampling rate conversion
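The factors L and M for the CD-to-DAT example follow from reducing the ratio of the two sampling rates to lowest terms. A short Python sketch using the standard `fractions` module (the helper name is hypothetical); it also shows the intermediate rate at which the single low-pass filter runs and that filter's cutoff, π/max(L, M):

```python
from fractions import Fraction

def conversion_factors(f_in, f_out):
    """Smallest integers (L, M) with f_out = f_in * L / M."""
    ratio = Fraction(f_out, f_in)   # Fraction reduces to lowest terms automatically
    return ratio.numerator, ratio.denominator

L, M = conversion_factors(44100, 48000)
print(L, M)              # 160 147
print(44100 * L)         # intermediate rate after up-sampling (Hz): 7056000
print(44100 * L // M)    # 48000
print(f"single low-pass cutoff: pi/{max(L, M)} rad/sample at the intermediate rate")
```

The intermediate rate of about 7 MHz is why practical converters use polyphase structures instead of filtering the up-sampled signal directly.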


3.6 ANALYSIS AND SYNTHESIS FILTERS
A basic operation in multirate signal processing is to decompose a signal into
a number of sub-band components, which can be processed at a lower rate
corresponding to the bandwidth of the frequency bands. Down-sampling mixes
frequency components in the original signal by aliasing and frequency folding.
Therefore, the signal should be filtered before decimation. The figure below shows
the decomposition of a signal into two sub-band components. The purpose of the
filters H1 and H2 is to extract the low- and high-frequency components of the signal
x before decimation. The set of filters shown in the figure is called an analysis
filter bank. If required, the signals may be decimated further into
narrower sub-band components.
A convenient way to implement the decimation is to use stages with a
decimation factor M = 2, as shown in the figure. Then only one low-pass and one
high-pass filter is required. For M > 2, band-pass filters with different passbands are
required as well. The sub-band components obtained from the analysis filter bank are
then available for processing.

Fig 3.11 Analysis Filter Bank

After processing of the separate sub-band components, they are combined to
reconstruct (a properly processed version of) the original signal at the original,
higher sampling rate. Up-sampling generates spectral images (imaging). Therefore, the
expanded signals should be filtered in order to extract the correct frequency
components. The set of filters used to reconstruct the desired signal is called the
synthesis filter bank.

The figure below shows the block diagram of the synthesis filter bank:

Fig 3.12 Synthesis Filter Bank

3.6.1 QUADRATURE MIRROR FILTER (QMF)
The basic building block in applications of quadrature mirror filters (QMF)
is the two-channel QMF bank shown in the figure below. This is a multirate
digital filter structure that employs two decimators in the signal analysis section
and two interpolators in the signal synthesis section. The low-pass and high-pass
filters in the analysis section have transfer functions H0(z) and H1(z),
respectively. Similarly, the low-pass and high-pass filters in the synthesis
section have transfer functions F0(z) and F1(z), respectively.

Fig 3.13 Two channel QMF structure


The analysis and the synthesis filters in the above figure are typically
complementary low-pass and high-pass filters that mirror each other about the digital
frequency π/2, as shown in the figure below. Such filters are often called quadrature
mirror filters (QMF), since π/2 corresponds to one-fourth of the sampling frequency.

Fig 3.14 Magnitude response of Analysis and Synthesis filter
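A minimal Python sketch of the two-channel QMF bank, assuming the Haar filter $H_0(z) = (1 + z^{-1})/\sqrt{2}$ and the classical alias-cancelling choice $H_1(z) = H_0(-z)$, $F_0(z) = H_0(z)$, $F_1(z) = -H_1(z)$. These filters are an illustrative assumption, not the design used later in the project. With this choice the aliasing term cancels and the overall transfer function is $\frac{1}{2}[H_0^2(z) - H_1^2(z)] = z^{-1}$, so the output is the input delayed by one sample.

```python
import math

R = 1.0 / math.sqrt(2.0)
H0 = [R, R]     # analysis low-pass (Haar): H0(z) = (1 + z^-1)/sqrt(2)
H1 = [R, -R]    # analysis high-pass: H1(z) = H0(-z)
F0 = [R, R]     # synthesis low-pass: F0(z) = H0(z)
F1 = [-R, R]    # synthesis high-pass: F1(z) = -H1(z)

def fir(h, x):
    """Causal FIR filter with zero initial state; output has the length of x."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def down2(x):
    return x[::2]                 # keep the even-indexed samples

def up2(x):
    out = []
    for v in x:
        out.extend([v, 0.0])      # insert one zero after each sample
    return out

def qmf_analysis(x):
    return down2(fir(H0, x)), down2(fir(H1, x))

def qmf_synthesis(v0, v1):
    return [a + b for a, b in zip(fir(F0, up2(v0)), fir(F1, up2(v1)))]

x = [1.0, 4.0, -2.0, 3.0, 0.5, -1.0, 2.0, 5.0]
x_hat = qmf_synthesis(*qmf_analysis(x))
print([round(v, 12) for v in x_hat])   # [0.0, 1.0, 4.0, -2.0, 3.0, 0.5, -1.0, 2.0]
```

Each subband carries only half as many samples as the input, yet the two together reconstruct the input exactly, which is the property subband coders rely on before any quantization is applied.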


CHAPTER 4
SYSTEM IMPLEMENTATION

4.1 DESIGN OVERVIEW
In this project, we design a Quadrature Mirror Filter (QMF) bank with
application to sub-band image coding. Then we extend our result to a four-band filter
bank. This work concerns signal decomposition and reconstruction in sub-band
coding, and more particularly analysis and synthesis filter banks that are
designed according to the quadrature mirror filter concept, so that sub-band
coding of various types of signals may be accomplished with minimal computational
complexity while achieving perfect signal reconstruction.
For a non-trivial perfect reconstruction (PR) FIR QMF design, we abandon the
H1(z) = H0(-z) condition, because with that choice the only FIR filters giving perfect
reconstruction are trivial filters with two nonzero coefficients. Instead, we design H0(z)
so that it has the power symmetry property, that is,
$|H_0(e^{j\omega})|^2 + |H_0(e^{j(\omega-\pi)})|^2$ is constant for all $\omega$. After
designing the two-channel PR FIR QMF bank, we explore the compression strategy. To
transform a signal into digital form, we have to quantize it, representing each sample
with n bits. The problem is how many bits we should allocate to each output of the
analysis filter bank. Each output of the analysis filter bank needs a different number
of bits because each output has a different signal property (e.g. dynamic range).
At the same time, we need to consider reconstruction quality, since the fewer bits we
use, the more information of the original signal we lose. The key to compression is to
minimize the total number of bits for representing all the outputs of the analysis
filter bank while meeting the reconstruction quality requirement. In this project, we
use a greedy approach to handle the compression problem.
In the second part of the project, we will explore the effectiveness of subband
coding for lossy image compression. We will use four compression systems:
1) 2-channel PR FIR QMF bank
2) 2-channel Uniform FIR QMF bank
3) 3-channel PR FIR QMF bank
4) 2D PR FIR QMF bank

At each compression system, we will use the greedy approach to solve the bit
allocation problem as in the first part of the project. Also, we will investigate the
effect of the optimized quantization method against the uniform quantization
method. For the bit allocation problem, we first need to define a criterion of good
reconstruction quality, so that we know when to stop allocating bits to each
output of the analysis bank. We will use two ways of evaluating the reconstruction
quality: quantitative and subjective. For the quantitative evaluation we use the SNR,
while for the subjective evaluation we use a 7-level impairment scale.
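A greedy bit allocation with an SNR stopping criterion can be sketched as follows. This Python sketch is an assumption, not necessarily the exact procedure used in the project: it models the quantization noise of a band with variance v and b bits as v·2^(-2b) per sample (about 6.02 dB less noise per bit), and repeatedly gives one more bit to the band where that bit removes the most noise until the overall SNR target is met. The band variances, sample counts and target are hypothetical.

```python
import math

def greedy_bit_allocation(variances, counts, target_snr_db, max_bits=16):
    """Greedy bit allocation across analysis-bank outputs (a sketch).

    Assumed noise model: b bits on a band with variance v leave a noise power
    of v * 2**(-2*b) per sample.
    """
    bits = [0] * len(variances)
    signal = sum(v * n for v, n in zip(variances, counts))

    def band_noise(i, b):
        return counts[i] * variances[i] * 2.0 ** (-2 * b)

    def snr_db():
        noise = sum(band_noise(i, b) for i, b in enumerate(bits))
        return 10.0 * math.log10(signal / noise)

    while snr_db() < target_snr_db:
        open_bands = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not open_bands:
            raise ValueError("target SNR is not reachable with max_bits per band")
        # Give the next bit to the band where it removes the most noise
        best = max(open_bands,
                   key=lambda i: band_noise(i, bits[i]) - band_noise(i, bits[i] + 1))
        bits[best] += 1
    return bits, snr_db()

# Hypothetical two-band example: strong low band, weak high band, 500 samples each
bits, snr = greedy_bit_allocation([1.0, 0.05], [500, 500], target_snr_db=20.0)
print(bits, round(snr, 2))   # [4, 2] 21.74
```

The strong low band ends up with twice as many bits as the weak high band, which is the unequal allocation motivated in the text.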

4.1.1 PERFECT RECONSTRUCTION FIR QMF BANK DESIGN
In the design of a two-channel perfect reconstruction FIR QMF filter bank,
we try to compress the output signal of the analysis bank by quantization. We will
quantize it with two methods, a uniform quantizer and an optimized quantizer designed
by the Lloyd-Max method, and discuss the difference. Also, for the problem of allocating
number of bits to each output of the analysis bank, we will present a greedy
approach.
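The optimized quantizer can be sketched with Lloyd's iteration, which designs a Lloyd-Max quantizer empirically from training samples. It alternates the two optimality conditions: decision thresholds midway between neighbouring reconstruction levels, and each reconstruction level at the mean (centroid) of the samples in its cell. This Python sketch assumes Gaussian training data and 8 levels for illustration (speech subband samples are often closer to Laplacian); the function names are hypothetical.

```python
import random

def lloyd_max(samples, levels, iterations=30):
    """Design a `levels`-level quantizer from training samples with Lloyd's iteration."""
    lo, hi = min(samples), max(samples)
    # Start from a uniform quantizer spanning the sample range
    recon = [lo + (hi - lo) * (k + 0.5) / levels for k in range(levels)]
    for _ in range(iterations):
        # Condition 1: thresholds halfway between neighbouring reconstruction levels
        thresholds = [(a + b) / 2 for a, b in zip(recon, recon[1:])]
        cells = [[] for _ in range(levels)]
        for s in samples:
            cells[sum(s > t for t in thresholds)].append(s)
        # Condition 2: each reconstruction level moves to the centroid of its cell
        recon = [sum(c) / len(c) if c else r for c, r in zip(cells, recon)]
    return recon

def quantize(x, recon):
    """Map each sample to the nearest reconstruction level."""
    return [min(recon, key=lambda r: abs(v - r)) for v in x]

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(2000)]   # assumed Gaussian test data
lo, hi = min(samples), max(samples)
uniform = [lo + (hi - lo) * (k + 0.5) / 8 for k in range(8)]
design = lloyd_max(samples, 8)
print("uniform MSE:  ", round(mse(samples, quantize(samples, uniform)), 4))
print("Lloyd-Max MSE:", round(mse(samples, quantize(samples, design)), 4))
```

Each iteration can only lower the mean squared error on the training data, so the designed quantizer is never worse than the uniform one it starts from; the gain comes from placing more levels where samples are dense.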

Fig 4.1 Two channel QMF bank

4.2 SOFTWARE IMPLEMENTATION

A variety of techniques have been developed to efficiently represent speech
signals in digital form for either transmission or storage. Since most of the speech
energy is contained in the lower frequencies, we would like to encode the
lower-frequency band with more bits than the high-frequency band. Sub-band coding is
a method in which the speech signal is subdivided into several frequency bands and
each band is digitally encoded separately. As an example of frequency subdivision,
assume that the speech signal is sampled at a rate of Fs samples per second. The first
frequency subdivision splits the signal spectrum into two equal-width segments, a
low pass signal (0 < F < Fs/4) and a high pass signal (Fs/4 < F < Fs/2). The second
frequency subdivision splits the low pass signal from the first stage into two equal
bands, a low pass signal (0 < F < Fs/8) and a high pass signal (Fs/8 < F < Fs/4).
Finally, the third frequency subdivision splits the low pass signal from the second
stage into two equal-bandwidth signals (0 < F < Fs/16 and Fs/16 < F < Fs/8). Thus,
the signal is subdivided into 4 frequency bands, covering 3 octaves.
Decimation by a factor of 2 is performed after each frequency subdivision. By
allocating a different number of bits per sample to the signals in the 4 sub-bands,
we can reduce the bit rate of the digitized speech signal.
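The band edges, per-band sample rates, and resulting bit rate of this three-stage split can be worked out with the short sketch below. The bit allocation [5, 4, 3, 2] (more bits in the lower bands) is hypothetical, chosen only to illustrate the arithmetic, and 8-bit PCM at 8 kHz is taken as an assumed baseline. The function names are also hypothetical.

```python
def octave_tree_bands(fs, stages):
    """(low edge Hz, high edge Hz, sample rate) for repeated 2-band splits
    of the lowpass branch, each followed by decimation by 2."""
    bands = []
    upper = fs / 2          # upper edge of the current lowpass branch
    rate = float(fs)
    for _ in range(stages):
        rate /= 2
        mid = upper / 2
        bands.append((mid, upper, rate))  # highpass output of this stage
        upper = mid
    bands.append((0.0, upper, rate))      # final lowpass band
    return bands[::-1]                    # lowest band first

def bit_rate(bands, bits_per_sample):
    """Total bit rate: sum of (band sample rate x bits per sample)."""
    return sum(rate * b for (_, _, rate), b in zip(bands, bits_per_sample))

bands = octave_tree_bands(8000, 3)
print(bands)  # [(0.0, 500.0, 1000.0), (500.0, 1000.0, 1000.0), (1000.0, 2000.0, 2000.0), (2000.0, 4000.0, 4000.0)]
print(bit_rate(bands, [5, 4, 3, 2]))  # 23000.0 bit/s, versus 64000 bit/s for 8-bit PCM at 8 kHz
```

The band sample rates add up to Fs (critical sampling), so the saving comes entirely from spending fewer bits per sample in the bands that need them least.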

4.2.1 BASIC SUBBAND CODER

Fig 4.1 Subband coder

In subband coding, the speech is first split into frequency bands using a bank
of band-pass filters. The individual band-pass signals are then decimated and
encoded for transmission. A filter bank is a collection of band-pass filters, all
processing the same input signal. The important parameters in subband coders are
the number of frequency bands, the frequency coverage of the system, and the way
the subband signals are coded. There are two kinds of subband structures: uniform
band structures, where all the bands have equal widths, and octave band structures,
where each band is half as wide as the next higher band and twice as wide as the
next lower band.
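An octave band structure can be built as a tree: split into lowpass and highpass halves, decimate both by 2, and re-split only the lowpass branch. The sketch below assumes two-tap Haar filters for brevity (a practical coder would use longer filters with sharper transition bands) and truncates each filter output to the input length; the function names are hypothetical.

```python
import math

H0 = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # Haar lowpass (assumed)
H1 = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # Haar highpass (assumed)

def filt(x, h):
    """Causal FIR filtering, output truncated to len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def decimate2(x):
    """Keep every second sample (x[0], x[2], ...)."""
    return x[::2]

def analysis_tree(x, stages):
    """Octave-band analysis: split, decimate, then re-split the lowpass branch."""
    bands = []
    low = x
    for _ in range(stages):
        bands.append(decimate2(filt(low, H1)))  # highpass band of this stage
        low = decimate2(filt(low, H0))
    bands.append(low)                            # final lowpass band
    return bands[::-1]                           # lowest band first

x = [math.sin(2 * math.pi * 0.05 * n) for n in range(64)]
bands = analysis_tree(x, 3)
print([len(b) for b in bands])  # [8, 8, 16, 32]
```

Each band's length is proportional to its bandwidth, so the total number of subband samples equals the number of input samples.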


4.2.2 PROCESS SPEECH CODING AND DECODING SYSTEM
The task of the speech coder is to digitize the speech signal and represent it
as a digital bit stream, producing the highest possible speech quality at the
lowest possible bit rate. The speech coder generally consists of three components:
speech analysis, parameter quantization and parameter coding. After analysis, the
samples must be quantized to reduce the number of bits required. The output of the
quantizer is passed to the coder, which assigns a unique binary code to each
possible quantized value. These binary codes are packed together for efficient
transmission. Quantizers are generally divided into two groups: uniform or
non-uniform quantizers, and adaptive quantizers. A non-uniform quantizer followed by
an encoder that assigns a code to each quantization level is called companding pulse
code modulation (companding PCM); an adaptive quantizer followed by such an encoder
is called adaptive PCM.
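Companding PCM can be sketched as compress, quantize uniformly, then expand. The sketch assumes the continuous mu-law curve F(x) = sgn(x) ln(1 + mu|x|) / ln(1 + mu) with mu = 255 on inputs scaled to [-1, 1]; ITU-T G.711 uses a piecewise-linear approximation of this curve, so the codes here are not G.711 codes. Function names are hypothetical.

```python
import math

MU = 255.0  # mu-law parameter (the value used by mu-law telephony)

def mu_law_compress(x):
    """F(x) = sgn(x) ln(1 + MU|x|) / ln(1 + MU), for x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def uniform_code(y, bits):
    """Mid-rise uniform quantizer on [-1, 1]; returns an integer code."""
    levels = 2 ** bits
    k = int((y + 1) / 2 * levels)
    return min(max(k, 0), levels - 1)

def uniform_decode(k, bits):
    """Reconstruction value (cell midpoint) for code k."""
    return -1 + (k + 0.5) * 2 / 2 ** bits

def companded_pcm(x, bits=8):
    """Companding PCM round trip: returns (code, reconstructed value)."""
    code = uniform_code(mu_law_compress(x), bits)
    return code, mu_law_expand(uniform_decode(code, bits))

for x in (0.01, 0.5):
    code, x_hat = companded_pcm(x)
    print(x, code, x_hat)
```

The logarithmic curve spends more quantization levels near zero, so quiet speech samples are reproduced more accurately than with plain uniform PCM at the same number of bits.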

4.2.3 SYNTHESIS AND RECOMBINATION
The synthesis system performs the inverse of the analysis operation. A zero is
inserted between every two samples in each band to double the sample rate. Then
the two lowest bands, for example 0-1000 Hz and 1000-2000 Hz, are merged into the
reconstructed 0-2000 Hz band. The same operation merges the other two bands,
2000-3000 Hz and 3000-4000 Hz, into one 2000-4000 Hz band. Finally, the sample rate
of these two bands is increased again and they are merged into the full-band signal.
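One merge step can be sketched as: insert zeros (upsample by 2), pass each band through its synthesis filter to remove the spectral images created by zero insertion, and add the two results. The sketch assumes Haar analysis filters with synthesis filters F0(z) = H0(z) and F1(z) = -H1(z); with these choices a single analysis/synthesis stage returns the input delayed by one sample. The filters and function names are assumptions, not the report's code.

```python
import math

S = 1 / math.sqrt(2)
H0, H1 = [S, S], [S, -S]     # Haar analysis filters (assumed)
F0, F1 = [S, S], [-S, S]     # synthesis filters: F0 = H0, F1 = -H1

def filt(x, h):
    """Causal FIR filtering, output truncated to len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def upsample2(v):
    """Insert a zero after every sample (doubles the sample rate)."""
    out = []
    for s in v:
        out.extend([s, 0.0])
    return out

def analysis(x):
    """Filter into low and high bands, keep every second sample."""
    return filt(x, H0)[::2], filt(x, H1)[::2]

def synthesis(low, high):
    """Upsample each band, apply its synthesis filter, and add the results."""
    a = filt(upsample2(low), F0)
    b = filt(upsample2(high), F1)
    return [p + q for p, q in zip(a, b)]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = synthesis(*analysis(x))
print([round(v, 12) for v in y])  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]: x delayed by one sample
```

The four-band system described above repeats this merge in a tree; with longer filters, each branch must also be delayed so that the bands line up before they are added.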

4.2.4 ANALYSIS AND SYNTHESIS
First, we implemented the subband coder by creating the analysis and synthesis
sections. These stages do not necessarily add distortion to the voice, and the output
is equal to the source except for some loss of higher frequencies. We recorded our
voice with Sound Recorder at 8 kHz, so the highest frequency component is 4 kHz,
which corresponds to a normalized frequency of 1 in MATLAB. Because it is
computationally very intensive to take the FFT of the entire signal, we instead took a
finite number of samples, say 36000, for computing the FFT.
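The mapping between FFT bins, hertz, and MATLAB's normalized frequency (1.0 corresponds to Fs/2) can be sketched as follows. The sketch uses a simple recursive radix-2 FFT, an assumption made to keep the code short, so the segment length must be a power of two (MATLAB's fft accepts any length, such as the report's 36000 samples). The 1 kHz test tone and function names are hypothetical.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def segment_spectrum(signal, fs, n_fft):
    """Magnitude spectrum of the first n_fft samples, with each bin's
    frequency in Hz and in normalized units (1.0 == fs/2)."""
    seg = signal[:n_fft]
    mags = [abs(c) for c in fft(seg)[: n_fft // 2 + 1]]
    hz = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    normalized = [f / (fs / 2) for f in hz]
    return hz, normalized, mags

fs = 8000
tone = [math.cos(2 * math.pi * 1000 * n / fs) for n in range(1024)]
hz, normalized, mags = segment_spectrum(tone, fs, 256)
peak = max(range(len(mags)), key=mags.__getitem__)
print(hz[peak], normalized[peak])  # 1000.0 0.25
```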


We implemented the code using two methods.
a) Without noble identities: In this method we passed the input signal (speech)
through low pass and high pass filters in the first stage and then decimated the
two signals (lower band and upper band). This has the disadvantage of computing
samples that are eventually thrown away. However, the end result was good, with no
aliasing in the final signal.
b) Using noble identities: In this method we decimated the signals first and then
passed them through low pass and high pass filters in the first stage. We continued
with this approach until we obtained the four bands in the analysis section. However,
the end result showed aliasing in the final signal.
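The noble identity for decimation states that filtering with H(z^2) and then decimating by 2 is equivalent to decimating by 2 and then filtering with H(z). So decimating first and then applying the original H(z) is equivalent to filtering with H(z^2), not H(z); since H(z^2) has a second passband around Fs/2, high-frequency content is not removed before decimation and aliases. This is one likely explanation of the aliasing in method (b); the usual way to save computation correctly is a polyphase implementation, which applies the identity to each polyphase component of H(z). The two-tap filter and test tone below are assumptions chosen for illustration.

```python
def filt(x, h):
    """Causal FIR filtering, output truncated to len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def decimate2(x):
    """Keep every second sample (x[0], x[2], ...)."""
    return x[::2]

def upsample_filter(h):
    """H(z) -> H(z^2): insert a zero between successive taps."""
    out = []
    for c in h:
        out.extend([c, 0.0])
    return out[:-1]

x = [1.0, -1.0] * 4   # tone at Fs/2, the highest representable frequency
h = [0.5, 0.5]        # simple two-tap lowpass (assumed example filter)

filter_then_decimate = decimate2(filt(x, h))   # [0.5, 0.0, 0.0, 0.0]: tone removed
decimate_then_filter = filt(decimate2(x), h)   # [0.5, 1.0, 1.0, 1.0]: tone aliased to DC
noble_equivalent = decimate2(filt(x, upsample_filter(h)))
print(noble_equivalent == decimate_then_filter)  # True
```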


CHAPTER 5
CONCLUSION

Sub-band coding is any form of transform coding that breaks a signal into a
number of different frequency bands and encodes each one independently. It is an
important extension of filter-bank theory and is widely used in data compression.

Speech coding is an integral part of our backbone communications services.
As wireline VoIP becomes more important and voice over Wi-Fi is introduced, the
end-to-end networks that support conversational voice will become much less
homogeneous with respect to protocols, latency, physical layer characteristics, and
voice codecs. Furthermore, it will be more difficult to develop and standardize
optimal end-to-end designs that incorporate these disparate multihop connections.
As a result, continued work that addresses asynchronous tandem connections of
speech coders, latency, packet jitter, and packet loss concealment will be essential if
we are to maintain the high quality of voice services that we have come to expect.
Cross-layer designs involving the Application, Media Access, and Physical
layers will be necessary to obtain the requisite quality and reliability and to improve
the efficiency of the wireless links. This will require that protocol developers, speech
codec designers, and physical layer engineers collaborate in establishing future voice
communications solutions. It is expected that mobile ad hoc networks and mesh
networks will be used for voice communications as well. These networks will
motivate the development of increased speech codec functionalities, such as bit rate
scalability, bandwidth scalability, and diversity-oriented methods along the lines of
multiple description coding. Finally, we note that voice is the preferred method of
human communication. Although there have been times when it seemed that the
voice communications problem was solved, such as when the PSTN was our primary
network or later when digital cellular networks reached maturity, such is not the case
today. Reflecting upon the issues and developments highlighted in this report, it is
evident that there is a diverse set of challenges and opportunities for research and
innovation in speech coding and voice communications.



APPENDIX

FILTER BANK DESIGN
The interest in digital filter banks has grown dramatically over the last few
years. Owing to the trend toward lower-cost, higher-speed microprocessors, digital
solutions are becoming attractive for a wide variety of applications. Filter banks
allow signals to be decomposed into subbands, often facilitating more efficient and
effective processing. They are particularly visible in the areas of image compression,
speech coding, and image analysis. The desired characteristics of sub band
decomposition will naturally vary from application to application. Moreover, within
any given application, there is a myriad of issues to consider. First, one might
consider whether to use FIR or IIR filters. IIR designs can offer computational
advantages, while FIR designs can offer greater flexibility in filter characteristics. In
this appendix we focus exclusively on FIR design. Second, one might identify the
time-frequency or space-frequency representation that is most appropriate. Uniform
decompositions and octave-band decompositions are particularly popular at present.
At the next level, characteristics of the analysis filters should be defined. This
involves imposing specifications on the analysis filter pass band deviations,
transition bands, and stop band deviations. Alternatively or in addition, time domain
characteristics may be imposed, such as limits on the step response ripples, and
degree of regularity. One can consider similar constraints for the synthesis filters.
For coding applications, the characteristics of the synthesis filters often have a
dominant effect on the subjective quality of the output.
Finally, one should consider analysis-synthesis characteristics. That is, one
has flexibility to specify the overall behavior of the system. In most cases, one views
having exact reconstruction as being ideal. Occasionally, however, it may be
possible to trade some small loss in reconstruction quality for significant gains in
computation, speed, or cost. In addition to specifying the quality of reconstruction, it
is generally possible to control the overall delay of the system from end to end. In
some applications, such as two-way speech and video coding, latency represents a
source of quality degradation. Thus, having explicit control over the
analysis-synthesis delay can lead to improvement in quality. The intelligent design of
application-specific filter banks involves first identifying the relevant parameters
and optimizing the system with respect to them. As is typical, the filter bank analysis
and reconstruction equations lead to complex tradeoffs among complexity, system
delay, filter quality, filter length, and quality of performance. This appendix is devoted
to presenting an introduction to filter bank design. Filter bank design has reached a
state of maturity in many regards. To cover all of the important contributions in any
level of detail would be impossible in a single appendix. However, it is possible to
gain some insight and appreciation for general design strategies germane to this
topic. In addition to discussing design methodologies for linear analysis-synthesis
systems, we also consider the design of a couple of new nonlinear classes of filter
banks that are currently receiving attention in the literature.
