

A SURVEY ON THE TECHNIQUES FOR THE TRANSPORT


OF MPEG-4 VIDEO OVER WIRELESS NETWORKS
Bo Yan and Kam W. Ng

Abstract—In this paper, we present a survey on the techniques
for the transport of MPEG-4 video over wireless networks. As a
new object-based video compression standard, MPEG-4 has been
proved to be suitable for wireless applications. However, because
of the characteristics of wireless channels, such as varying channel
capacity and burst bit errors, some new techniques such as
scalable and error-resilient video coding techniques are required
to be incorporated into MPEG-4 to improve the coding efficiency.
These techniques are discussed in this paper. The experimental
results show that although these new techniques have improved
the performance greatly, there is still a lot of research work to be
pursued.

known that the available bandwidth of wireless networks is not
constant and may vary over a wide range at any given time. To
make full use of the varying channel capacity, scalable video
coding methods are introduced into MPEG-4. In this section,
some general methods are discussed. Thirdly, wireless
channels are error-prone, and their burst errors can degrade the
video quality greatly. In order to alleviate this effect, some
error-resilient video coding techniques are incorporated in
MPEG-4, including error detection, recovery and concealment,
which will be described. Finally, we conclude that these
improvements are not enough, and future work will be
discussed.

Index Terms—Error-resilient video coding, MPEG-4, scalable
video coding, wireless channel.

II. OVERVIEW OF MPEG-4


A. The MPEG-4 Standard


MPEG-4 is an ISO/IEC standard developed by MPEG
(Moving Picture Experts Group), the committee that also
developed the Emmy Award winning standards known as
MPEG-1, MPEG-2 and MPEG-7. Among these standards,
except for MPEG-7, the others are video compression
standards.
1) MPEG-1: it is the general video and audio
compression standard, on which such products as
Video CD and MP3 are based.
2) MPEG-2: it can provide better video quality, on which
such products as Digital Television, set-top boxes and
DVD are based.
3) MPEG-7: it is different from others and not concerned
about video compression. The standard is for
description and search of audio and visual content.
MPEG-4 is a new object-based method concerned with video
and audio compression, which is different from MPEG-1 and
MPEG-2. The objects can be:
1) Still images (e.g. a fixed background)
2) Video objects (e.g. a talking person without the
background)
3) Audio objects (e.g. the voice associated with that
person)
Although MPEG-4 is concerned with both video and audio
compression, in this paper only the issues about video are
discussed. The following are the main characteristics of MPEG-4
video:
1) Bitrates: typically between 5 kbit/s and 10 Mbit/s;
2) Formats: progressive as well as interlaced video;

I. INTRODUCTION

In the last decade, mobile communication has grown rapidly.


With the explosive growth, the need for the robust
transmission of multimedia information-audio, video, text,
graphics, and speech data-over wireless links is becoming an
increasingly important application requirement. Since the video
data may occupy more bandwidth than the other media data
during transmission, it should be given more consideration in
the wireless multimedia communication. As we know, in order
to be transmitted over the wireless network, the source video
data must be compressed before transmission. During the last
decade, many video compression standards have been proposed
for different applications, such as MPEG-1, MPEG-2, H.26X
and so on. But these traditional video compression techniques
cannot efficiently meet the requirement for video transmission
in wireless networks. In the last few years, MPEG-4 has been
proposed, which is suitable for wireless networks because of its
characteristics [1][2][3]. In this paper, the techniques of
MPEG-4, which make it suitable for applications in wireless
multimedia communication, will be introduced. This paper is
divided into four sections.
Firstly, an overview of the MPEG-4 standard is given. In this
section, the basic characteristics of MPEG-4 will be described,
and the basic coding and decoding procedures will also be
explained. Then we will give the reasons why MPEG-4 is useful
for wireless multimedia communication. Secondly, it is well
Bo Yan and Kam W. Ng are both with the Department of Computer Science
and Engineering, the Chinese University of Hong Kong, Hong Kong (e-mail:
{byan,kwng}@cse.cuhk.edu.hk).

Contributed Paper
Manuscript received May 23, 2002

0098-3063/00 $10.00 © 2002 IEEE


IEEE Transactions on Consumer Electronics, Vol. 48, No. 4, NOVEMBER 2002


3) Resolutions: typically from sub-QCIF to beyond
HDTV;
4) Compression Efficiency: from "acceptable" to "near
lossless".

Fig. 1. The structure of an MPEG-4 terminal

Fig. 2. The structure of the MPEG-4 encoder

Fig. 1 shows how streams coming from the network (or a
storage device), as TransMux Streams, are demultiplexed into
FlexMux Streams and passed to appropriate FlexMux
demultiplexers that retrieve the Elementary Streams. The
Elementary Streams (ESs) are parsed and passed to the
appropriate decoders. Decoding recovers the data in an AV
object from its encoded form and performs the necessary
operations to reconstruct the original AV object ready for
rendering on the appropriate device. The reconstructed AV
object is made available to the composition layer for potential
use during scene rendering. Decoded AV objects, along with
scene description information, are used to compose the scene as
described by the author. The user can, to the extent allowed by
the author, interact with the scene, which is eventually rendered
and presented [1].
There are four hierarchically organized classes in the MPEG-4
visual standard as follows [2]:
1) Video Session: Each video session (VS) is made up of
one or more Video Objects (VO), corresponding to the
various objects in the scene.
2) Video Object: Each of the VOs can have several
scalability layers (spatial, temporal, or SNR),
corresponding to different Video Object Layers
(VOL).
3) Video Object Layer: Each VOL consists of an ordered
sequence of snapshots in time, called Video Object
Planes (VOP).
4) Video Object Plane: Each VOP is a snapshot in time
of a VO for a certain VOL.
This way, one VS can have several VOs, each of these VOs
having several VOLs, which are the sequences in time of several
VOPs. And finally, each VOP is characterized by its shape,
motion and texture.
Fig. 2 outlines the basic approach of the MPEG-4 video
algorithms to encode not only the rectangular but also arbitrarily
shaped input image sequences. The basic coding structure
involves shape coding (for arbitrarily shaped VOs) and motion
compensation as well as DCT-based texture coding (using
standard 8x8 DCT or shape adaptive DCT).

Firstly, the shape coding is applied. Each macroblock (MB) is
analyzed and classified according to three possible classes:
transparent (MB outside the object but inside the bounding
box), opaque (MB completely inside the object) or border (MB
over the border). The shape coding method called
Content-based Arithmetic Encoding (CAE) is applied on the
border MBs [4].
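For illustration, the classification step above can be sketched as follows. The helper names, the 16x16 block size layout and the binary alpha mask representation are our own illustrative assumptions, not MPEG-4 syntax; CAE itself is not shown.

```python
# Illustrative sketch of macroblock classification for shape coding.
# A binary alpha mask marks which pixels belong to the video object; each
# 16x16 macroblock is classified as transparent, opaque, or border.

MB_SIZE = 16

def classify_macroblock(alpha_mask, mb_row, mb_col):
    """Classify one macroblock from a 2-D binary alpha mask (1 = object pixel)."""
    rows = range(mb_row * MB_SIZE, (mb_row + 1) * MB_SIZE)
    cols = range(mb_col * MB_SIZE, (mb_col + 1) * MB_SIZE)
    pixels = [alpha_mask[r][c] for r in rows for c in cols]
    if all(p == 0 for p in pixels):
        return "transparent"      # MB entirely outside the object
    if all(p == 1 for p in pixels):
        return "opaque"           # MB entirely inside the object
    return "border"               # MB straddles the boundary -> CAE shape coding

# Example: a 32x32 mask whose pixels with column >= 8 belong to the object.
mask = [[1 if c >= 8 else 0 for c in range(32)] for r in range(32)]
labels = [classify_macroblock(mask, r, c) for r in range(2) for c in range(2)]
```

Only the MBs labelled "border" would then be passed to the CAE shape coder.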
Secondly, similar to other traditional methods, the motion
information is encoded by means of motion vectors. Each MB
can either have one or four motion vectors. When four motion
vectors are used, each of them is associated with an 8x8 block
within the MB. With the motion vectors, the decoder can know
which block of pixels in the previous VOP is closest to the
current one, and it can be used for the prediction of the texture.
In the receiver, the decoder will use the motion vectors for
motion compensation.
Then the texture data can be encoded in two modes: intra and
inter. In the intra mode, a given MB is encoded by itself (with
no temporal prediction), only exploiting the spatial
redundancies. On the other hand, in the inter mode, motion
compensation is used to exploit the temporal redundancy, and
the difference between the current and prediction MB is
encoded. Both the absolute texture value (intra coding) and the
differential texture value (inter coding) are then encoded using
the DCT transform. Then the DCT coefficients are encoded by
run-length coding and variable length coding (VLC). Instead of
the regular VLC, reversible VLC can be used to code the
texture, if error resilience is a requirement.
At this stage we get all the encoded information including
shape, motion and texture [2][5].
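The run-length step mentioned above can be sketched as a pair of toy functions. This is a generic run-level illustration under our own naming, not the actual MPEG-4 VLC tables, and the subsequent variable-length entropy coding is omitted.

```python
# Minimal sketch of run-level coding of quantized DCT coefficients, the step
# the text describes before variable length coding (VLC).

def run_level_encode(coeffs):
    """Turn a zig-zag-scanned coefficient list into (run, level) pairs."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))   # run = zeros preceding this nonzero level
            run = 0
    return pairs

def run_level_decode(pairs, length):
    """Rebuild the coefficient list, padding the trailing zeros."""
    coeffs = []
    for run, level in pairs:
        coeffs.extend([0] * run)
        coeffs.append(level)
    coeffs.extend([0] * (length - len(coeffs)))
    return coeffs

block = [34, 0, 0, 5, -2, 0, 0, 0, 1] + [0] * 55   # 64 coefficients
pairs = run_level_encode(block)
decoded = run_level_decode(pairs, 64)
```

In the standard, each (run, level) pair is then mapped to a variable-length codeword; when error resilience is required, a reversible VLC can replace the regular one, as discussed in Section IV.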

B. Application Description
Wireless multimedia applications face technical challenges
that are significantly different from the problems typically
encountered with desktop multimedia applications. This is
because current wireless channels are subject to inherent
limitations, which are as follows.
1) The bandwidth is very narrow and the channel
capacity is not fixed but may vary over a wide range at
any given time.
2) The channel is error-prone with bit errors and burst
errors due to fading and multi-path reflections.
3) Diversity of wireless networks (e.g. GSM, or Satellite)
in regard to network topology, protocols, bandwidth,


reliability etc.
4) The need to be able to make a trade-off between
quality, performance and cost.
MPEG-4, designed as an adaptive representation scheme that
also accommodates very low bitrate applications, is very
appropriate for wireless multimedia applications. Concretely,
MPEG-4 is useful because:
1) It can provide high compression performance. The
lowest bitrate can be 5 kbps.
2) Scalable video coding techniques are introduced into
MPEG-4 to provide the varying coding bit rate for the
varying channel capacity.
3) Many error-resilient tools are incorporated into
MPEG-4, which guarantee the correctness of the data
in the error-prone wireless channel.
4) Object-based coding functionalities allow for
interaction with audio-visual objects and enable new
interactive applications in a wireless environment.
5) Face animation parameters can be used to reduce
bandwidth consumption for real-time communication
applications in a wireless environment, e.g. mobile
conferencing [6].
In the following, the techniques on scalable video coding and
error-resilient video coding will be discussed in detail.

see that the non-scalable video encoder compresses the raw
video data sequence into only one compressed bit-stream. In
contrast, the scalable video encoder generates multiple
sub-streams as shown in Fig. 4. One of the compressed
sub-streams is the base sub-stream, which can be independently
decoded and provides coarse visual quality. Other compressed
sub-streams, which are called enhancement sub-streams, can
only be decoded together with the base sub-stream and can
provide better visual quality. The complete bit-stream (i.e., the
combination of all the sub-streams) provides the highest quality
[8].

Fig. 3. Non-scalable video encoder (a) and decoder (b). (DCT: Discrete
Cosine Transform; Q: Quantization; VLC: Variable Length Coding; IDCT:
Inverse DCT; IQ: Inverse Quantization; VLD: Variable Length Decoding)

III. SCALABLE VIDEO CODING

The objective of traditional video coding has been to
optimize video quality at a given bit rate. But this has to be
changed for wireless applications. It is well known that the
bandwidth availability of wireless links is limited and may vary
over a wide range depending on network traffic at any given
time.
In a traditional video communication system, the video data
can be compressed into a bit rate that is less than, and close to,
the channel capacity, and the decoder reconstructs the video
signal using all the bits received from the channel. But in this
model, one requirement that must be satisfied is that the encoder
knows the channel capacity. Actually, the encoder no longer
knows the channel capacity and at which bit rate the video
quality should be optimized, because of the varying channel
capacity in the wireless application. Therefore, the objective of
video coding for wireless is changed to optimize the video
quality over a given bit rate range instead of at a given bit rate.
The bitstream should be partially decodable at any bit rate
within the bit rate range, and from it the decoder can reconstruct
video of optimized quality [7]. Scalable video coding can solve
this kind of problem, which will be explained as follows.
A. What is scalable video coding?

Basically, video compression schemes can be classified into


two approaches: scalable and non-scalable video coding. The
structures of both can be seen from Fig. 3 and Fig. 4. For
simplicity, the encoder and decoder are only shown in
intra-mode and only the DCT is adopted. From Fig. 3, we can

Fig. 4. Scalable video encoder (a) and decoder (b)

Fig. 5 illustrates the difference in video quality between the
scalable and non-scalable video coding techniques. In the
figure, the horizontal axis indicates the channel bit rate, while
the vertical axis indicates the video quality received by a user.
The distortion-rate curve indicates the upper bound in quality
for any coding technique at any given bit rate. The two staircase
curves indicate the performance of an optimal non-scalable
coding technique. In the technique, when a given bit rate is
chosen (either low, medium, or high), the coder tries to
achieve the optimal quality indicated by having the upper corner
of the staircase curve very close to the distortion-rate curve. The
receiver can get the best video quality if the channel bit rate
happens to be at the video-coding bit rate. However, as depicted
by Fig. 5, if the channel bit rate is lower than the video
coding bit rate, the received video quality becomes very poor
because the decoder cannot get enough bits to reconstruct the
video. In contrast, if the channel bit rate is higher than the
video-coding bit rate, the received video quality does not
become any better [7].
As mentioned above, the scalable video coding techniques can
compress the video sequence into a base layer and an
enhancement layer. Actually the enhancement layer bitstream is
similar to the base layer bitstream in the sense that it has to be



either completely received and decoded or it does not enhance


the video quality at all. So such techniques can change the

Fig. 5. The relationship between the bit rate and video quality

non-scalable single staircase curve to a curve with two stairs as
shown in Fig. 5. The base-layer bit rate determines where the
first stair is, and the enhancement-layer bit rate determines the
second stair. More detailed discussions about the techniques
will be presented later. As shown in Fig. 5, the optimal
technique is to achieve the distortion-rate curve at any given
bitrate, which is the objective of the fine granularity scalability
(FGS) video-coding technique in MPEG-4 [7].
B. The categories of scalable video coding

Specifically, compared with decoding the complete
bit-stream, decoding the base sub-stream or multiple
sub-streams produces pictures with degraded quality, a
smaller image size or a lower frame rate. The scalability of
quality, image size, or frame rate is called SNR, spatial, or
temporal scalability, respectively. These three scalabilities are
the basic scalable mechanisms.

1) SNR Scalable Video Coding


Signal-to-noise ratio (SNR) scalability is a scheme to
compress the raw video data into two layers at the same frame
rate and the same spatial resolution, but different quantization
accuracy. Fig. 6 shows the SNR scalability decoder [7]. Firstly,
the base-layer bitstream is decoded by the base-layer
variable-length decoder (VLD). Then it is inversely quantized
to produce the reconstructed DCT coefficients. The
enhancement bitstream is decoded by the VLD in the
enhancement layer, and the enhancement residues of the DCT
coefficients are produced by the inverse quantizer in the
enhancement layer. Thus the higher-accuracy DCT coefficients
are obtained by adding the base-layer reconstructed DCT
coefficients and the enhancement-layer DCT residues. The DCT
coefficients with a higher accuracy are given to the inverse DCT
(IDCT) unit to produce the reconstructed image-domain
residues that are to be added to the motion-compensated block
from the previous frame [7][9].
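The two-layer quantization behind SNR scalability can be sketched on a single coefficient. The step sizes and helper names here are our own illustrative assumptions; a real codec applies this per DCT coefficient with standard-defined quantizers.

```python
# Sketch of SNR-scalable quantization: the base layer quantizes a DCT
# coefficient coarsely, the enhancement layer carries the residue at a
# finer quantization step.

BASE_STEP, ENH_STEP = 16, 4   # assumed step sizes, for illustration only

def quantize(v, step):
    return int(round(v / step))

def dequantize(q, step):
    return q * step

def snr_encode(coeff):
    base_q = quantize(coeff, BASE_STEP)
    base_rec = dequantize(base_q, BASE_STEP)
    enh_q = quantize(coeff - base_rec, ENH_STEP)   # residue at finer step
    return base_q, enh_q

def snr_decode(base_q, enh_q=None):
    rec = dequantize(base_q, BASE_STEP)
    if enh_q is not None:            # enhancement layer received as well
        rec += dequantize(enh_q, ENH_STEP)
    return rec

coeff = 37
base_q, enh_q = snr_encode(coeff)
base_only = snr_decode(base_q)       # coarse reconstruction (base layer only)
both = snr_decode(base_q, enh_q)     # finer reconstruction (both layers)
```

Receiving the enhancement residue shrinks the reconstruction error, which is exactly the quality gain the SNR decoder of Fig. 6 obtains.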

Fig. 7. Temporal scalable video coding


2) Temporal Scalable Video Coding
Temporal scalability is a scheme to compress the raw video
data into two layers at the same spatial resolution, but different
frame rates. The base layer is coded at a lower frame rate. In
contrast, the enhancement layer compresses the video at a
higher frame rate, providing the missing frames. Thus the coding
efficiency of temporal scalability is high and very close to
non-scalable coding. Fig. 7 shows a structure of temporal
scalability [7]. In the base layer only P-type prediction is used,
while in the enhancement layer prediction can be either P-type
or B-type from the base layer, or P-type from the enhancement
layer [7].

Fig. 8. Spatial scalable video decoding
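The frame-rate split behind temporal scalability can be sketched as follows. Assigning even frames to the base layer is our own simplifying assumption; the prediction structure (P/B-types) of Fig. 7 is not modelled.

```python
# Illustrative sketch of temporal scalability: even-indexed frames form the
# base layer (lower frame rate); odd-indexed frames go to the enhancement
# layer, which supplies the missing frames when bandwidth allows.

def split_temporal_layers(frames):
    base = frames[0::2]          # base layer: half the frame rate
    enhancement = frames[1::2]   # enhancement layer: the missing frames
    return base, enhancement

def merge_temporal_layers(base, enhancement):
    """Interleave the two layers back into the full-rate sequence."""
    merged = []
    for i, f in enumerate(base):
        merged.append(f)
        if i < len(enhancement):
            merged.append(enhancement[i])
    return merged

frames = ["f0", "f1", "f2", "f3", "f4", "f5"]
base, enh = split_temporal_layers(frames)
# A receiver with little bandwidth decodes only `base` (half frame rate);
# a receiver that also gets `enh` recovers the full-rate sequence.
full = merge_temporal_layers(base, enh)
```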

3) Spatial Scalable Video Coding
Spatial scalability is a scheme to compress the raw video data
into two layers at the same frame rate, but different spatial
resolutions. The base layer is coded at a lower spatial resolution.
The reconstructed base-layer picture is up-sampled to form the
prediction for the high-resolution picture in the enhancement
layer. Fig. 8 shows the simplified structure of spatial scalability
[7]. If the spatial resolution of the base layer is the same as that
of the enhancement layer, i.e., the up-sampling factor being 1,
this spatial scalability decoder can be considered as an SNR
scalability decoder too [7].
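The down-sample/up-sample/residual loop above can be sketched on a 1-D "picture". The pick-every-other down-sampling and nearest-neighbour up-sampling are crude placeholder filters of our own choosing, not the filters a real codec uses, and the residual is left uncoded.

```python
# Minimal sketch of spatial scalability: the base layer is coded at half
# resolution, then up-sampled to predict the full-resolution picture; the
# enhancement layer only carries the prediction residual.

def downsample(signal):
    return signal[0::2]

def upsample(signal):
    out = []
    for s in signal:
        out.extend([s, s])       # nearest-neighbour up-sampling
    return out

def encode_spatial(picture):
    base = downsample(picture)
    prediction = upsample(base)
    residual = [p - q for p, q in zip(picture, prediction)]
    return base, residual        # base layer + enhancement layer

def decode_spatial(base, residual=None):
    prediction = upsample(base)
    if residual is None:
        return prediction        # low-resolution receiver stops here
    return [p + r for p, r in zip(prediction, residual)]

picture = [10, 12, 14, 16, 18, 20, 22, 24]
base, residual = encode_spatial(picture)
reconstructed = decode_spatial(base, residual)
```

With an up-sampling factor of 1 (no down-sampling at all), the same loop degenerates into the SNR-scalability structure, as the text notes.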


C. Fine Granularity Scalability (FGS)
According to the introduction to the SNR, temporal and
spatial scalable video coding, we can see that they can provide
the best video quality at only two bit rates. But for wireless
applications, we need a more scalable scheme for video coding
because of the varying channel capacity. Therefore a new
scalable coding mechanism, called fine granularity scalability
(FGS), was proposed for MPEG-4 to provide more flexibility in
meeting the different demands of wireless channels.
Fig. 9 shows the structures of the FGS encoder and decoder
[8][10]. From it, we can know that an FGS encoder compresses
a raw video sequence into two sub-streams, i.e., a base-layer
bit-stream and an enhancement bit-stream. Different from other
scalable techniques, an FGS encoder uses bit-plane coding to
represent the enhancement stream. As we know, in conventional
DCT coding, the quantized DCT coefficients are coded using
run-level coding. The number of consecutive zeros before a
nonzero DCT coefficient is called a run, and the absolute value
of the nonzero DCT coefficient is called a level. The major
difference between the bit-plane coding method and the
run-level coding method is that the bit-plane coding method
considers each quantized DCT coefficient as a binary integer of
several bits instead of a decimal integer of a certain value. So
any bits coded by the bit-plane method can be used to
reconstruct the DCT coefficients [7][11][12]. With the help of
the bit-plane method, FGS is capable of achieving continuous
rate control for the enhancement stream as shown in Fig. 5. This
is because the enhancement bit stream can be truncated
anywhere to achieve the target bit-rate. Any bits received from
the enhancement layer can be adopted to improve the video
quality, which is impossible for other scalable video coding
methods. And this is the reason why FGS has good advantages
over the others.

Fig. 9. The structures of the FGS encoder (a) and decoder (b)

To further improve the performance of FGS enhancement
video, two advanced features are introduced into FGS, i.e.
frequency weighting and selective enhancement. The former
means that different weights are used for different frequency
components, so that more bits of the visually important
frequency components can be put in the bitstream ahead of
those of other frequency components. Similar to the former, the
latter uses different weighting for different spatial locations of a
frame, so that more bits of the more important parts of a frame
are put in the bitstream ahead of those of other parts of the
frame [7].
1) Frequency Weighting
As we know, different DCT coefficients can contribute to the
visual quality differently. Usually, the accuracy of the
low-frequency DCT coefficients is more important than that of
the high-frequency DCT coefficients. Better visual quality can
be achieved with more bits of the low-frequency coefficients.
Therefore, the bits of low-frequency DCT coefficients can be
put into the enhancement bitstream earlier so that they are more
likely to be included in a truncated bitstream. To achieve this
objective, the frequency weighting mechanism is included in the
FGS [7].
2) Selective Enhancement
For one frame of the video, a part of it may be visually more
significant than other parts. So the bits of the MBs of interest
may be put ahead so that they are more likely to be included in a
truncated bitstream [7].
D. Some Improvements about FGS
In order to cover a wide range of bit rates with a scalable
bit-stream, there is a need to combine FGS with other scalable
video coding methods. Some improvements have been
proposed.
1) FGST Method
FGST, which means FGS temporal scalability, is to combine
FGS with temporal scalability so that not only the quantization
accuracy can be scalable, but also the temporal resolution
(frame rate) can be scalable. In this mechanism, the quality of
each temporal enhancement frame does not affect the quality of
other frames, since the temporal prediction for the temporal
enhancement frame is restricted to the base layer. Therefore,
there is no problem in using bit-plane coding for the entire DCT
coefficients in the temporal enhancement frame. And for FGST,
not only can the temporal enhancement frames be dropped as in
regular temporal scalability, but the quantization accuracy is
also scalable within each temporal enhancement frame.
Therefore, its coding efficiency, using all bit-plane coding in the
temporal enhancement frame, will be higher than regular
temporal scalability for coding the DCT coefficients. This
compensates its potential loss of coding efficiency from not
allowing prediction in the enhancement layer. Fig. 10 shows the
structure of it [7][13].

Fig. 10. The structure of the FGST method



2) FGSS Method
Similar to FGST, FGSS is to combine FGS with spatial
scalability, which is illustrated in Fig. 11 [14]. In the FGSS
scheme, the base layer is still encoded as in traditional
spatially scalable coding. However, the enhancement layer now
adopts the bit-plane coding technique. Fig. 11 illustrates
conceptually the structure of the FGSS coding scheme. From the
figure, the input video sequence is first down-sampled and
compressed at low resolution to a given bit rate with any
existing non-scalable coding technique. In traditional
spatially scalable coding, the video will be up-sampled to
provide the high resolution for the enhancement-layer coding.
However, in FGSS, several FGS lower enhancement layers are
first used to improve the video quality at the low-resolution
level if the bit rate of the base layer is very low. Then if the
quality of the low-resolution video is good enough in the base
layer, the video can be up-sampled so that the spatial resolution
can be immediately switched to high resolution in the
enhancement layer. Therefore, the enhancement layers at
low resolution are optional. It depends on some factors such as
the bit rate of the base layer, sequence contents, application
requirements and so on [14].
Fig. 11. The structure of the FGSS method

3) PFGS Method
The PFGS method, progressive fine granularity scalable video
coding, has all the features of FGS, such as fine granularity
bit-rate scalability, channel adaptation, and error recovery. In
contrast to FGS, the PFGS framework uses several high-quality
references for the predictions in the enhancement-layer
encoding rather than always using the base layer. Using
high-quality references makes motion prediction more
accurate, and thus it can improve the coding efficiency. But the
bits of the enhancement layer are more likely to be lost than
those of the base layer. Therefore it may make the coder fragile.
PFGS proposes a method to solve this problem as shown in
Fig. 12, which illustrates the framework of PFGS [15][16].
We can see that a prediction path from the lowest layer to the
highest layer is maintained across several frames, which makes
it robust and qualified for error recovery. For example, if the
enhancement layers of frame 1 are corrupted or not received, the
enhancement layers of frames 2, 3 and 4 will be affected
because of the loss of the prediction references. But frame 5
will be fine, because there is a prediction path from the
lowest layer of frame 1 to the highest layer of frame 5, which
makes the scheme robust.
Fig. 12. The structure of the PFGS method

IV. ERROR-RESILIENT VIDEO CODING IN MPEG-4
Wireless channels are typically noisy and suffer from a number
of channel degradations such as bit errors and burst errors due
to fading and multi-path reflections. The channel errors can
affect the compressed video bit-stream very severely. On the
one hand, if the decoder loses information about a frame or a
group of pixels, it won't be able to decode the frame or the
pixels. And it also cannot decode the frames or the pixels coded
using them. On the other hand, the decoder can lose
synchronization with the encoder because of the lost
information, which leads to the remaining bit-stream being
incomprehensible. Therefore, for a robust video compression
method in wireless applications, resynchronization and
robustness techniques are necessary in the encoder.
Generally, the MPEG-4 decoder can apply syntactic and
semantic error detection techniques to detect when a bitstream
is corrupted by channel errors. In motion compensation and
DCT decoding, the decoder can detect bitstream errors by
applying the following checks [17]:
1) The motion vectors are out of range;
2) An invalid VLC table entry is found;
3) The DCT coefficient is out of range;
4) The number of DCT coefficients in a block exceeds
64.
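The four checks above can be sketched as one validator. The numeric limits below are placeholders of our own choosing, not the actual MPEG-4 profile limits, and a real decoder applies these tests while parsing rather than after the fact.

```python
# Illustrative sketch of the syntactic/semantic error checks a decoder can
# run on a decoded macroblock to spot bitstream corruption.

MV_RANGE = 2048          # assumed |motion vector| limit (placeholder)
COEFF_RANGE = 2048       # assumed |DCT coefficient| limit (placeholder)

def block_looks_corrupted(motion_vector, dct_coeffs, vlc_entry_valid=True):
    mvx, mvy = motion_vector
    if abs(mvx) > MV_RANGE or abs(mvy) > MV_RANGE:
        return True                      # check 1: motion vector out of range
    if not vlc_entry_valid:
        return True                      # check 2: invalid VLC table entry
    if any(abs(c) > COEFF_RANGE for c in dct_coeffs):
        return True                      # check 3: DCT coefficient out of range
    if len(dct_coeffs) > 64:
        return True                      # check 4: more than 64 coefficients
    return False

ok = block_looks_corrupted((3, -2), [12, 0, -5])
bad = block_looks_corrupted((9000, 0), [12])
```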
After errors are detected, some techniques incorporated into
MPEG-4 can be applied, which provide important
properties such as resynchronization, data recovery, and error
concealment. They are as follows:
1) Video packet resynchronization;
2) Data partitioning (DP);
3) Reversible Variable Length Codes (RVLCs);
4) Header extension code (HEC).
A. Video Packet Resynchronization

When errors occur in the bit-stream, the video decoder


decoding the corrupted stream may lose synchronization with
the encoder, i.e. the precise location of the stream in the video is



uncertain. This will degrade the decoded video quality greatly
and make it unusable.
A traditional scheme to solve this problem is to introduce
resynchronization markers into the bit-stream at various
locations. When the decoder detects errors, it can hunt for the
next resynchronization marker to regain synchronization. In
previous video coding standards such as MPEG-1, MPEG-2
and H.263, the resynchronization markers are fixed at the
beginning of each row of macroblocks. But the quantity of
information between two markers is not fixed, because of the
variable length coding, as shown in Fig. 13 [18].
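The marker-hunting step can be sketched as a simple scan. The marker pattern below is a hypothetical unique bit pattern of our own choosing, not the actual MPEG-4 resynchronization code, and the bitstream is modelled as a string of '0'/'1' characters for clarity.

```python
# Illustrative resynchronization: after detecting an error, the decoder
# scans forward for the next marker and resumes decoding just past it.

MARKER = "0000000000000001"   # hypothetical marker pattern (not MPEG-4's)

def next_resync_point(bitstream, error_pos):
    """Index just past the next marker at or after `error_pos`, or None."""
    idx = bitstream.find(MARKER, error_pos)
    if idx == -1:
        return None               # no marker left: discard to end of stream
    return idx + len(MARKER)

stream = "1011" + MARKER + "110010" + MARKER + "0111"
# Suppose an error is detected at bit 25, inside the second packet:
resume = next_resync_point(stream, 25)
```

Everything between the error position and the resume point is discarded; the shorter the marker spacing, the less data each error costs, at the price of marker overhead.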


be decoded and utilized by the decoder. Therefore MPEG-4
introduces two additional fields in addition to the
resynchronization marker at the beginning of each video packet,
as shown in Fig. 15. These are:
1) MB no.: The absolute macroblock number of the first
macroblock in the video packet, which indicates the
spatial location of the macroblock in the current
image.
2) QP: The quantization parameter, which denotes the
default quantization parameter used to quantize the
DCT coefficients in the video packet.
The predictive encoding method has also been modified for
coding the motion vectors, so that there are no predictions across
the video packet boundaries [17][19][20][21].
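Reading these fields can be sketched as follows. The field widths are assumptions made for illustration; the real MPEG-4 syntax sizes and orders these fields differently.

```python
# Sketch of parsing the extra fields that follow the resynchronization
# marker in a video packet (MB no., QP, HEC flag).

MB_NO_BITS = 9    # assumed width of the macroblock-number field
QP_BITS = 5       # assumed width of the quantization-parameter field

def parse_packet_header(bits):
    """Parse MB no., QP and the HEC flag from the bits after the marker."""
    pos = 0
    mb_no = int(bits[pos:pos + MB_NO_BITS], 2); pos += MB_NO_BITS
    qp = int(bits[pos:pos + QP_BITS], 2); pos += QP_BITS
    hec = bits[pos] == "1"; pos += 1
    return {"mb_no": mb_no, "qp": qp, "hec": hec}, pos

# A packet whose first MB is number 33, with QP 12 and HEC unset:
header_bits = format(33, "09b") + format(12, "05b") + "0" + "101"
header, consumed = parse_packet_header(header_bits)
```

Because MB no. and QP restate the decoding state, a packet remains decodable even when every earlier packet in the frame was lost.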
Resync. marker | MB no. | QP | HEC | Combined motion and DCT data

Fig. 15. Organization of the data within a video packet

Resync. marker | MB no. | QP | HEC | Motion data | MBM | DCT data


packet will be discarded by the decoder, because the texture
data is coded based on the motion data and it will be useless
without the motion data.
If no error is detected in the motion and texture sections of the
bitstream, but the resynchronization marker is not found at the
end of decoding all the macroblocks of the current packet, an
error is flagged. In this case, only the texture data in the VP needs
to be discarded. The motion data can still be used for the NMB
macroblocks, since we have higher confidence in the motion
vectors because of the detection of the MBM.
The data partitioning scheme is only used for I-VOPs and
P-VOPs, and for the two cases the content of the motion and
texture parts is different. For an I-VOP, the motion part contains
the coding modes and DC coefficients, and the texture part is the
AC coefficients. For a P-VOP, the motion part is the motion
vectors, and the texture part contains the DCT coefficients
[17][20][22].
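The salvage decision described above can be sketched as one function. The function and flag names are ours, for illustration; the real decoder derives these conditions while parsing the partitions.

```python
# Sketch of the data-partitioning recovery logic: what the decoder can keep
# depends on whether the error is detected before or after the motion
# boundary marker (MBM).

def salvage_partitioned_packet(mbm_found, error_in_texture_only):
    """Decide which partitions of a corrupted video packet are usable."""
    if not mbm_found:
        # Error hit the motion partition: texture is predicted from motion,
        # so the whole packet must be discarded.
        return {"motion": False, "texture": False}
    if error_in_texture_only:
        # The MBM was detected, so the motion vectors are trusted; only the
        # texture partition is discarded (motion-only concealment remains).
        return {"motion": True, "texture": False}
    return {"motion": True, "texture": True}

lost_all = salvage_partitioned_packet(mbm_found=False, error_in_texture_only=False)
kept_motion = salvage_partitioned_packet(mbm_found=True, error_in_texture_only=True)
```

Keeping the motion data alone still allows motion-compensated concealment of the lost texture, which is why partitioning improves over the combined layout of Fig. 15.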

frame has to be discarded. So the MPEG-4 standard introduces
the technique named Header Extension Code to reduce the
sensitivity of the header data, as shown in Fig. 16. In this
technique, a 1-bit field called HEC is inserted for every video
packet. If the bit is set, the header information of the video
frame is repeated following the HEC. The duplicated
information enables the decoder to recover the correct header
data. The HEC is usually used in the first video packets of the
VOP, but not in all of them [17][20].

C. Reversible variable Length codes (RVLC)


Forward decoding
Use

motion
data

MBM

Error

Error

Backward decoding
Fig I7 The data need to be discarded while using the RVLC

As mentioned above, the texture part is coded with variable


length code. Therefore during decoding, the decoder has to
discard all the texture data up to the next resynchronization
marker when it detects an error in the texture part because of
losing synchronization. RVLC can alleviate the problem and
recover more DCT data from a corrupted texture partition. The
RVLC is a special VLC, which can be decoded in both the
forward and reverse direction. When the decoder detects an
error while decoding the texture part in the forward direction, it
can find the next resynchronization marker to decode the texture
part in the backward direction until an error is detected.
Therefore only the data between the two errors locations is
discarded, and other data in the part can be recovered. In Fig.
17, only the data in the shaded area is discarded. It should be
noted that the RVLC scheme can only he used with the help of
resynchronization and data partitioning techniques [ 17][20]

I-VOP

VP Header

DC DCT data

AC DCT data
I

[23].
P-VOP
D. Header Extension Code (HEC)
At the beginnine of each video uacket. there is the most
important information for decoding-the header data, which
contains information about the spatial dimensions of the video
data, the time stamps associated with the decoding and
presentation of the video data, and the mode in which the
current video object is encoded (INTRA or INTER). If the
header information is corrupted by the channel errors, the whole

VPHeader

Motiondata

Texturedata

- -
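The HEC idea can be shown in a minimal sketch; the field names below are illustrative data structures of our own, not real MPEG-4 bitstream syntax:

```python
# Toy illustration of the Header Extension Code: when the HEC bit of a
# video packet is set, vital frame-header fields are duplicated inside
# the packet, so a decoder can still recover them if the VOP header
# itself was corrupted. Field names are illustrative only.

def recover_header(vop_header_ok, vop_header, packets):
    """Return a usable header: the VOP header if intact, else an HEC copy."""
    if vop_header_ok:
        return vop_header
    for p in packets:                 # usually only the first packets of a
        if p.get("hec") == 1:         # VOP carry the duplicated header
            return p["dup_header"]
    return None                       # no copy found: discard the frame

frame_header = {"width": 176, "height": 144, "coding": "INTER", "tstamp": 42}
packets = [
    {"hec": 1, "dup_header": frame_header, "payload": b"..."},
    {"hec": 0, "payload": b"..."},
]
# VOP header corrupted, but the first packet's HEC copy saves the frame:
print(recover_header(False, None, packets))
```

The sketch shows the trade-off the standard makes: one extra bit per packet, plus occasional header duplication, in exchange for not discarding a whole frame when its header is hit.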

Fig. 18 illustrates the protection scheme described above,
applied to the partitions of a video packet (the VP header, the
motion data and the texture data for a P-VOP; the VP header,
the DC DCT data and the AC DCT data for an I-VOP), in which
R1, R2 and R3 represent the bit rates of the three partitions
respectively and satisfy R1 < R2 < R3. To realize this, Rate
Compatible Punctured Convolutional (RCPC) codes are used
[28]. In this case, the codes considered


are obtained by puncturing the same mother code, so only
one coder and one decoder are required for the coding and
decoding of the whole bitstream [29].
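The rate-compatibility property can be illustrated with a toy puncturing of a rate-1/2 mother code. The puncturing patterns below are invented for illustration only (see Hagenauer [28] for actual RCPC code families):

```python
# Toy RCPC-style puncturing: one rate-1/2 mother convolutional code,
# several puncturing patterns. Rate-compatibility means every bit kept
# by a weaker (higher-rate) pattern is also kept by a stronger
# (lower-rate) one, so a single encoder/decoder pair serves all rates.
# The patterns are illustrative, not standardized values.

PATTERNS = {            # 1 = transmit this mother-code output bit (period 8)
    "2/3": [1, 1, 0, 1, 1, 0, 1, 1],   # keep 6 of 8 bits -> rate 4/6 = 2/3
    "4/7": [1, 1, 1, 1, 1, 0, 1, 1],   # keep 7 of 8 bits -> rate 4/7
    "1/2": [1, 1, 1, 1, 1, 1, 1, 1],   # full mother code  -> rate 1/2
}

def puncture(coded_bits, pattern):
    """Drop the mother-code output bits where the pattern has a 0."""
    reps = len(coded_bits) // len(pattern) + 1
    return [b for b, keep in zip(coded_bits, pattern * reps) if keep]

mother = [1, 0, 1, 1, 0, 0, 1, 0]   # 8 output bits for 4 info bits (rate 1/2)
print(len(puncture(mother, PATTERNS["2/3"])))   # 6 bits sent
print(len(puncture(mother, PATTERNS["1/2"])))   # 8 bits sent
```

Because the kept-bit sets are nested ("2/3" ⊆ "4/7" ⊆ "1/2"), the decoder can treat missing positions as erasures and run the same Viterbi decoder for every protection level, which is exactly why UEP with RCPC codes needs only one codec.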

V. CONCLUSIONS AND FUTURE WORK
This paper has presented two kinds of techniques for the
transport of MPEG-4 over wireless networks: scalable video
coding and error-resilient video coding. Our conclusions and
future work are organized accordingly.
A. Scalable video coding
As a new scalable video coding technique introduced into
MPEG-4, FGS was the focus of Section 2. Many experimental
results have proved that the FGS coding method has better
coding efficiency than SNR, temporal and spatial scalability.
Generally, two video formats are often used, the Common
Intermediate Format (CIF) and Quarter CIF (QCIF), whose
image sizes are 352x288 and 176x144 pixels respectively. In the
signal processing field, video quality is often evaluated in terms
of the Peak Signal-to-Noise Ratio (PSNR), in dB, defined as:

PSNR = 20 log10 (255 / RMSE)    (1)

where RMSE is the square root of the Mean Square Error
(MSE):

MSE = (1 / (MN)) sum_{i=1..M} sum_{j=1..N} [f(i, j) - F(i, j)]^2    (2)

where f(i, j) and F(i, j) are the source and the reconstructed
images, each containing MxN pixels [20]. To evaluate the video
quality, only the PSNR of the luminance (Y) component of the
frame needs to be considered.
An example of the experiments made by Li [7], comparing the
coding efficiency of FGS and SNR scalability, is introduced as
follows. Table I shows the results. As the bit rate increases, FGS
always has better PSNR than SNR scalability; at high bit rates,
FGS is about 2 dB better. The experiments use the Coastguard
and Foreman video sequences in QCIF format.

TABLE I
PSNR COMPARISON BETWEEN FGS AND MPEG-2 SNR CODING

As mentioned in Section 2, several improvements have been
made to the coding efficiency of FGS, including FGSS, FGST
and PFGS. Experimental results have proved that the coding
efficiency of PFGS is the best among these improvements. An
example of the experimental results made by Wu [15],
comparing the coding efficiency of FGS, PFGS and single-layer
coding, is as follows. It should be noted that "single" here means
non-scalable video coding, which achieves the best video quality
at a given bitrate. Table II shows the result. From the table, a
conclusion can be drawn that PFGS achieves better video quality
than FGS. Therefore the techniques described in Section 2 are
effective and efficient for wireless multimedia communication
with varying channel capacity. However, the experimental
results in Table II also indicate that the coding efficiency gap
between the FGS scheme and non-scalable video coding exceeds
3.0 dB. Although the PFGS scheme has significantly improved
the coding efficiency of FGS, the gap between the PFGS scheme
and non-scalable video coding is still large.

TABLE II
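The PSNR measure of equation (1) can be computed directly; the following is a minimal sketch for 8-bit luminance frames, using plain lists rather than any particular image library:

```python
import math

def psnr(source, recon):
    """PSNR in dB between two 8-bit luminance images, per equation (1).

    source, recon -- same-sized 2-D lists of pixel values in [0, 255].
    """
    m, n = len(source), len(source[0])
    # MSE over all MxN pixels, per equation (2).
    mse = sum((source[i][j] - recon[i][j]) ** 2
              for i in range(m) for j in range(n)) / (m * n)
    if mse == 0:
        return float("inf")            # identical images
    return 20 * math.log10(255 / math.sqrt(mse))

# Two tiny 2x2 "frames" differing by 5 in every pixel: MSE = 25, RMSE = 5.
src = [[100, 100], [100, 100]]
rec = [[105, 105], [105, 105]]
print(round(psnr(src, rec), 2))        # 20*log10(255/5) = 34.15 dB
```

As the text notes, only the luminance (Y) plane is passed to such a function when reporting video quality.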

The future work for scalable video coding is as follows. More
studies need to be done on how to further improve the coding
efficiency and robustness of FGS for transmitting the video
stream over wireless channels. As mentioned above, the coding
efficiency gap between non-scalable video coding and PFGS
video coding is still large, although it is closing.
Firstly, as we know, high-quality references for motion
prediction improve the coding efficiency, so if more
enhancement layers are used for prediction, the coding
efficiency will be higher. But the bits of the enhancement layers
are more likely to be discarded because of the limited channel
capacity, and in this situation, using more enhancement layers
for prediction makes the system fragile. To solve this, error
detection and concealment techniques may be introduced into
the enhancement layers. In this way, even if some bits of the
enhancement layers are lost, the resulting errors can be
concealed. With the help of such error-resilient techniques, the
coding efficiency of FGS can be improved further while keeping
the system robust.
Another proposal to improve FGS is to combine FGS with
shape coding of arbitrarily shaped objects. An important benefit
of this proposal is that FGS can benefit significantly from
knowing the shape and position of the various objects in the
scene, so that the selective enhancement mentioned in Section
3.3 can generate better video quality. It can also improve the
coding efficiency of FGS.
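Selective enhancement works by shifting the bitplanes of favored macroblocks upward so that they are transmitted earlier in the embedded enhancement stream; knowing object shape and position tells the encoder which macroblocks to favor. A toy sketch follows (the block selection and shift amount are illustrative assumptions, not MPEG-4 syntax):

```python
# Toy FGS selective enhancement: residual magnitudes of macroblocks
# inside a region of interest are left-shifted, so their bitplanes are
# coded (and thus received) before those of background blocks when the
# enhancement stream is truncated. Shift amount is illustrative.

def shift_residuals(mb_residuals, roi, shift=2):
    """Left-shift residual magnitudes of ROI macroblocks by `shift` bits."""
    return [r << shift if idx in roi else r
            for idx, r in enumerate(mb_residuals)]

residuals = [3, 9, 4, 7]          # max residual magnitude per macroblock
roi = {1, 2}                      # blocks covering a foreground object
shifted = shift_residuals(residuals, roi)
print(shifted)                    # [3, 36, 16, 7]

# Bitplane coding sends the most significant plane of all blocks first,
# so after the shift, planes of blocks 1 and 2 outrank the background:
print(max(shifted).bit_length())  # 6 bitplanes instead of 4
```

If the channel only delivers the top few bitplanes, the shifted (foreground) blocks receive proportionally more of the truncated budget, which is the quality gain the combination with shape coding aims at.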
B. Error-resilient video coding

In Section IV, some techniques to improve the video quality
against bit errors, including resynchronization, DP, RVLC and
HEC, have been discussed. Actually, all of these techniques were
evaluated thoroughly and independently verified by two parties
before being accepted into the MPEG-4 standard. The
techniques are rigorously tested over a wide variety of test
sequences, bit rates and error conditions. The compressed
bitstreams are corrupted using random bit errors, packet loss
errors and burst errors. To evaluate the performance of the
proposed techniques, the PSNR values generated by the
error-resilient video codec with and without each error-resilient
tool are compared. The techniques are also compared on the
basis of the number of frames discarded due to errors and the
number of bits discarded. The comparison results have proved
that these techniques improve the video quality greatly in
error-prone channels [17][31][32].
TABLE III
AVERAGE PSNR FOR EEP AND UEP TOOLS (QCIF AND CIF FORMATS)

However, these techniques are not sufficient for wireless
channels with high bit error rates (BERs), so a channel coding
method, UEP, is introduced. To test the performance of UEP
against alternatives such as EEP (Equal Error Protection), in
which all partitions are protected equally, many experiments
have been made on wireless channels. Their results have shown
that UEP can improve the quality of the reconstructed video at
high channel error rates more than EEP can. Table III shows the
results of the experiments made by Heinzelman [25] on a GSM
channel. In the table, UEP produces a higher average PSNR for
the reconstructed video for both CIF and QCIF images at high
channel error rates, which verifies the good performance of
UEP. In many cases, UEP may leave more errors in the
channel-decoded bitstream than EEP, but these errors are in less
important portions of the video packet and degrade the video
quality less.
TABLE IV
COMPARISON BETWEEN EEP AND UEP

Cai has made other experiments [24] comparing the coding
performance of EEP and UEP, in which the source coding rate is
fixed at 100 kbps and the channel coding rate varies. Table IV
shows the simulation results. For example, a total bandwidth of
110 kbps represents 100 kbps of fixed source coding rate and 10
kbps of channel coding rate; in this case, the channel coding
redundancy is 10%. The redundancy of the other cases can be
derived similarly. The experimental results demonstrate that in
bandwidth-stringent cases, UEP is much better than EEP, with a
gain of nearly 5 dB at 10% redundancy. When the channel
coding redundancy increases to 20% or more, the performance
of EEP is nearly the same as that of UEP. This is because
high-redundancy channel coding provides a virtually error-free
environment for all classes of the video bitstream, so the
advantage of unequal error protection diminishes.
In the future, the work on error-resilient video coding will be
focused on how to further improve its efficiency. The proposals
are as follows.
Firstly, future work on UEP should consider some issues
concerning the characteristics of wireless channels and video
signals. As we know, the power problem is an important factor
that greatly affects the performance of wireless communication.
Since UEP will generally consume more power than EEP, the
power problem should be taken into account when applying
UEP in wireless communication. This is particularly necessary
when the bandwidth is not very stringent and the gain of UEP
over EEP is not so significant, as shown in Table IV at rates of
120 kbps or over. There is a tradeoff between power and video
quality, so a decision scheme needs to be designed to determine
whether UEP should be adopted under a given channel
condition.
Secondly, to improve the performance of UEP, it is necessary
to obtain the optimal allocation of channel coding rates among
the different classes of video data. To carry out the optimal
allocation, the relationship of the error sensitivity among the
different partitions of the video packet is required; without such
an analytical relationship, it is difficult to design the optimal
allocation. Better results may be achieved if rates are chosen
according to accurate sensitivity studies on the bitstream. This
will also be one part of our future research work.
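Such a sensitivity-driven allocation might be sketched as follows. The sensitivity weights, candidate code rates and greedy strategy are all illustrative assumptions for this survey, not a published algorithm:

```python
from fractions import Fraction

# Toy unequal-error-protection allocation: given a redundancy budget,
# give stronger channel-code rates to the more error-sensitive
# partitions of a video packet. The candidate rates stand in for an
# RCPC family (cf. [28]); exact rationals avoid rounding at the budget.

CANDIDATE_RATES = [Fraction(8, 9), Fraction(4, 5),
                   Fraction(2, 3), Fraction(1, 2)]   # weak -> strong

def allocate(partitions, budget):
    """Greedily strengthen the most sensitive partitions while the total
    redundancy (channel bits added / source bits) stays within budget."""
    idx = {name: 0 for name, _, _ in partitions}
    total = sum(bits for _, bits, _ in partitions)
    def redundancy():
        return sum(bits / CANDIDATE_RATES[idx[name]] - bits
                   for name, bits, _ in partitions) / total
    # Visit partitions from most to least error-sensitive.
    for name, bits, sens in sorted(partitions, key=lambda p: -p[2]):
        while idx[name] + 1 < len(CANDIDATE_RATES):
            idx[name] += 1
            if redundancy() > budget:
                idx[name] -= 1     # stepping up would exceed the budget
                break
    return {name: CANDIDATE_RATES[i] for name, i in idx.items()}

# Header most sensitive, texture least; 20% redundancy budget.
packet = [("header", 100, 1.0), ("motion", 300, 0.6), ("texture", 600, 0.2)]
print(allocate(packet, budget=Fraction(1, 5)))
# strongest code on the header, weakest on the texture
```

With these made-up numbers the header ends up at rate 2/3, the motion data at 4/5 and the texture at 8/9, mirroring the R1 < R2 < R3 ordering of Fig. 18; a real scheme would replace the weights with measured sensitivity data.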
Finally, as mentioned in Section IV, by applying the RVLC to
the DCT data partition, the decoder can recover more data to
decode when errors occur. Experimental results have proved that
this greatly improves the performance of the MPEG-4 coder
over wireless channels [31]. Currently, the RVLCs are used only
for the DCT data partition. In future work, RVLCs may be
introduced into the motion data partition to code the motion
vectors. Since the motion part is more important than the DCT
part, we expect that this will further improve the efficiency of
the error-resilient video coding of MPEG-4.

In conclusion, although many techniques aiming to provide
strong support for robust transmission of video data in wireless
multimedia communication have been introduced into the
MPEG-4 standard, they are not yet sufficient for all applications.
So there is still a lot of research work to be pursued to improve
the performance of video transmission over wireless channels.

REFERENCES
[1] Rob Koenen, "Overview of the MPEG-4 Standard", ISO/IEC JTC1/SC29/WG11 N4030, 2001.
[2] Soares L.D., Pereira F., "MPEG-4: a flexible coding standard for the emerging mobile multimedia applications", The Ninth IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, Volume 3, 1998, pp. 1335-1339.
[3] Yaqin Zhang, "MPEG-4 Based Multimedia Information System", Digital Signal Processing Handbook, CRC Press, 1999, Chapter 7.
[4] MPEG Video & SNHC, "Coding of Audio-Visual Objects: Visual", ISO/IEC 14496-2, Doc. ISO/IEC JTC1/SC29/WG11 N2202, MPEG Tokyo Meeting, March 1998.
[5] Shigeru Fukunaga, "MPEG-4 Video Verification Model version 16.0", Coding of Moving Pictures and Audio, ISO/IEC JTC1/SC29/WG11 N3312, March 2000.
[6] Requirements Group, "MPEG-4 Applications", ISO/IEC JTC1/SC29/WG11 MPEG99/N2724, March 1999.
[7] Weiping Li, "Overview of fine granularity scalability in MPEG-4 video standard", IEEE Transactions on Circuits and Systems for Video Technology, Volume 11, Issue 3, March 2001, pp. 301-317.
[8] Dapeng Wu, Hou Y.T., Wenwu Zhu, Ya-Qin Zhang, Peha J.M., "Streaming video over the Internet: approaches and directions", IEEE Transactions on Circuits and Systems for Video Technology, Volume 11, Issue 3, March 2001, pp. 282-300.
[9] R. Aravind, M. R. Civanlar, and A. R. Reibman, "Packet loss resilience of MPEG-2 scalable video coding algorithms", IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, pp. 426-435, Oct. 1996.
[10] Radha H.M., van der Schaar M., Yingwei Chen, "The MPEG-4 fine-grained scalable video coding method for multimedia streaming over IP", IEEE Transactions on Multimedia, Volume 3, Issue 1, March 2001, pp. 53-68.
[11] F. Ling, W. Li, and H. Sun, "Bitplane coding of DCT coefficients for image and video compression", in Proc. SPIE Visual Communications and Image Processing (VCIP), San Jose, CA, Jan. 25-27, 1999.
[12] W. Li, F. Ling, and H. Sun, "Bitplane coding of DCT coefficients", ISO/IEC JTC1/SC29/WG11, MPEG97/M2691, Oct. 22, 1997.
[13] M. van der Schaar, H. Radha, and Y. Chen, "An all FGS solution for hybrid temporal-SNR scalability", ISO/IEC JTC1/SC29/WG11, MPEG99/M5552, Dec. 1999.
[14] Qi Wang, Feng Wu, Shipeng Li, Yuzhuo Zhong, Ya-Qin Zhang, "Fine-granularity spatially scalable video coding", 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 3, 2001, pp. 1801-1804.
[15] Feng Wu, Shipeng Li, Ya-Qin Zhang, "A framework for efficient progressive fine granularity scalable video coding", IEEE Transactions on Circuits and Systems for Video Technology, Volume 11, Issue 3, March 2001, pp. 332-344.
[16] Xiaoyan Sun, Feng Wu, Shipeng Li, Wen Gao, Ya-Qin Zhang, "Macroblock-based progressive fine granularity scalable (PFGS) video coding with flexible temporal-SNR scalabilities", 2001 International Conference on Image Processing, Volume 2, Oct. 2001, pp. 1025-1028.
[17] Talluri R., "Error-resilient video coding in the ISO MPEG-4 standard", IEEE Communications Magazine, Volume 36, Issue 6, June 1998, pp. 112-119.
[18] ITU-T Study Group 16, "Video Coding for Low Bit Rate Communication", ITU-T Recommendation H.263, 1996.
[19] Brailean J., "Wireless multimedia utilizing MPEG-4 error resilient tools", Wireless Communications and Networking Conference (WCNC 1999), IEEE, 1999.
[20] Delicado F., Cuenca P., Orozco-Barbosa L., Garrido A., Quiles F., "Performance evaluation of error-resilient mechanisms for MPEG-4 video communications over wireless ATM links", 2001 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM 2001), Volume 2, 2001, pp. 461-464.
[21] Miki T., Hotani S., Kawahara T., "Error resilience features of MPEG-4 audio-visual coding and their application to 3G multimedia terminals", Signal Processing Proceedings (WCCC-ICSP 2000), 5th International Conference on, Volume 1, 2000, pp. 40-43.
[22] Valente S., Dufour C., Groliere F., Snook D., "An efficient error concealment implementation for MPEG-4 video streams", IEEE Transactions on Consumer Electronics, Volume 47, Issue 3, August 2001, pp. 568-578.
[23] Takishima Y., Wada M., Murakami H., "Reversible variable length codes", IEEE Transactions on Communications, Volume 43, Issue 2/3/4, Feb.-April 1995, pp. 158-162.
[24] Cai J., Qian Zhang, Wenwu Zhu, Chen C.W., "An FEC-based error control scheme for wireless MPEG-4 video transmission", Wireless Communications and Networking Conference (WCNC 2000), IEEE, 2000, pp. 1243-1247, vol. 3.
[25] Heinzelman W.R., Budagavi M., Talluri R., "Unequal error protection of MPEG-4 compressed video", 1999 International Conference on Image Processing (ICIP 99), Volume 2, 1999, pp. 530-534.
[26] Martini M.G., Chiani M., "Proportional unequal error protection for MPEG-4 video transmission", IEEE International Conference on Communications (ICC 2001), Volume 4, 2001, pp. 1033-1037.
[27] S.T. Worrall, S. Fabri, A.H. Sadka, A.M. Kondoz, "Prioritisation of Data Partitioned MPEG-4 over Mobile Networks", ETT - European Transactions on Telecommunications, Vol. 12, No. 3, May/June 2001.
[28] Hagenauer J., "Rate-compatible punctured convolutional codes (RCPC codes) and their applications", IEEE Transactions on Communications, Volume 36, Issue 4, April 1988, pp. 389-400.
[29] Martini M.G., Chiani M., "Wireless transmission of MPEG-4 video: performance evaluation of unequal error protection over a block fading channel", Vehicular Technology Conference (VTC 2001 Spring), IEEE VTS 53rd, Volume 3, 2001, pp. 2056-2060.
[30] Yuenan Wu, "Data Compression", Publishing House of Electronics Industry, Jul. 2000, pp. 19-22.
[31] ISO/IEC JTC1/SC29/WG11, "Adhoc Group on Core Experiments on Error Resilience aspects of MPEG-4 video, Description of Error Resilience Core Experiments", N1473, Maceio, Brazil, Nov. 1996.
[32] T. Miki et al., "Revised Error Pattern Generation Programs for Core Experiments on Error Resilience", ISO/IEC JTC1/SC29/WG11 MPEG96/1492, Maceio, Brazil, Nov. 1996.

Bo Yan received his B.E. and M.E. degrees in electrical
engineering from Xi'an Jiaotong University, Xi'an, China, in
1998 and 2001 respectively. Since September 2001, he has been
a Ph.D. candidate in the Department of Computer Science and
Engineering at the Chinese University of Hong Kong, Hong
Kong. His research interests include video compression, video
transmission and wireless communication.

Kam W. Ng received the Ph.D. degree in electrical and
electronic engineering from the University of Bradford, U.K., in
1977. He is a Professor in the Department of Computer Science
and Engineering at the Chinese University of Hong Kong, Hong
Kong. His research interests include reconfigurable computing
and mobile ad hoc networking.
