
IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 14, NO. 3, JUNE 2012
923
Multiple Description of Coded Video for Path Diversity Streaming Adaptation

Pedro Correia, Pedro A. Assuncao, Member, IEEE, and Vitor Silva
Abstract: This paper extends the current concept of multiple description coding (MDC) to the compressed domain by proposing efficient splitting of standard single description coded (SDC) video into a multi-stream representation. A novel multiple description video splitting (MDVS) scheme is proposed to operate at network edges, for increased robustness in path diversity video streaming across heterogeneous communications chains. It is shown that the poor performance of existing methods is mainly due to distortion accumulation, i.e., drift, when decoding is carried out with missing descriptions. The proposed scheme is able to effectively control drift distortion in both intra and inter predictive coding, even when only one description reaches the decoder. This is achieved by generating a controlled amount of relevant side information to compensate for drift accumulation whenever any description is lost in its path. The simulation results show that any individual description can be decoded on its own without producing drift, achieving significant quality improvement at reduced redundancy cost. The overall performance evaluation, carried out by simulating video streaming over lossy networks with path diversity, also demonstrates that MDVS enables higher quality video in such heterogeneous networking environments, for a wide range of packet loss rates.
Index Terms: Adaptive video streaming, drift control, multiple description, networks with path diversity.
I. INTRODUCTION
MULTIPLE description coding (MDC) is a promising approach to improve the quality of multimedia streaming over error-prone networks with path diversity, as found in current heterogeneous communications. In MDC, a video signal is typically encoded into several independent descriptions, i.e., compressed streams, each of which can be delivered over a separate channel, making use of the available path diversity. If all descriptions are jointly decoded at the receiver, then the quality of the reconstructed signal is higher than that obtained from individual decoding of any single description [1]. These features of MDC are achieved at the cost of a higher coding rate, i.e., redundancy, compared with classic single description coding (SDC) [2]. Video streaming with path diversity is seen as a novel communication framework involving different technological fields and posing several research challenges. This is essentially driven by networks with multiple available paths from the sender to the receiver (e.g., mesh and overlay networks) and by multiple source coding representations (i.e., MDC) that go beyond the classical paradigm of SDC, where one source is encoded into one single representation [3]. However, the combination of MDC with path diversity has always been used in communication chains typically comprising an uncompressed source signal feeding an MDC encoder, followed by multiple transmission paths to the receiver [4], [5], or by streaming multiple complementary descriptions distributed across the edge servers of content delivery networks [6], [7]. A shortcoming of such a communication model is that it does not take into account the typical scenario of current heterogeneous networking, where single path routes co-exist with multiple paths in the same delivery chain. The concept of multiple description video splitting of coded streams (MDVS) addressed in this paper fills the existing gap in heterogeneous video communications where an SDC stream is transmitted over a single path network and then needs to be split into several MDC streams. This might be particularly useful at edge nodes, to benefit from path diversity over different networks where multiple paths are available. Since MDVS operates on coded streams, any network node with such processing capability can split an incoming SDC stream across the different outgoing paths that can be used from that particular node to the end-user terminal.
Manuscript received October 10, 2011; revised December 19, 2011; accepted December 20, 2011. Date of publication January 02, 2012; date of current version May 11, 2012. This work was supported by Fundação para a Ciência e Tecnologia (FCT), Portugal, under grants SFRH/BD/30087/2006 and SFRH/BD/50035/2009. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Monica Aguilar. P. Correia is with the Instituto de Telecomunicações, Coimbra, Portugal, and also with the Polytechnic Institute of Tomar, Tomar, Portugal (e-mail: pcorreia@co.it.pt). P. A. Assunção is with the Instituto de Telecomunicações, Coimbra, Portugal, and also with the Polytechnic Institute of Leiria, Leiria, Portugal (e-mail: paassunc@ieee.org). V. Silva is with the Instituto de Telecomunicações, Coimbra, Portugal, and also with the DEEC, University of Coimbra, Coimbra, Portugal (e-mail: vitor@co.it.pt). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMM.2011.2182184
1520-9210/$31.00 © 2012 IEEE
A recent work highlighting the advantages of using MDVS for robust video streaming and for dealing with handoff over wireless local area networks (WLAN) is presented in [8]. Whenever a coded video stream is processed, the predictive nature of video coding algorithms must be taken into account, because drift leads to distortion accumulation at the end-user decoding terminal. In MDVS, the effect of drift can be explained as follows. In the absence of errors or data loss, all descriptions are decoded and the reconstructed blocks/frames are then used as references for others, providing accurate predictions. However, if any description is lost, then the predictions reconstructed in the decoder do not match those originally used to encode the SDC stream. This mismatch is the origin of drift: it adds distortion to the decoded video, which is further accumulated in the reconstruction loop and propagated throughout all subsequent predicted blocks. In MDC, a possible solution to this problem is to design complex multi-loop encoders, such as those described in [9]. This paper proposes a novel MDVS scheme based on multiple description scalar quantization (MDSQ), using side information to control drift in both spatial and temporal prediction. The side information is generated from the original stream and its rate is controlled with an independent quantization parameter, which also controls redundancy. Then, a simplified architecture is devised to reduce the overall complexity in terms of the number of processing functions and memory requirements. No additional information is needed from the original SDC encoder in order to generate such side information at any MDVS-enabled network node.
The paper is organized as follows. Section II presents a brief overview of relevant work available in the literature. In Section III, the problem of drift is analysed in the context of MDVS, also highlighting the pertinence of this work. In Section IV, the proposed MDVS architecture is described. Section V presents simulation results and discussion, with particular emphasis on drift distortion and streaming quality over lossy networks with path diversity, using the proposed MDVS. Finally, Section VI concludes the paper.
II. RELATED WORK
Current MDC algorithms can be classified into different categories according to the methods used for generating multiple coded descriptions of the same source signal. In the past, several methods have been proposed, including scalar quantization [10], MD by subsampling the source signal in different domains (e.g., spatial, temporal, frequency) [11]–[13], MD transform coding using correlating transforms [14], MD of motion information [15], and partitioning of transform coefficients [16]. The advanced video coding tools and features available in H.264/AVC have also been used to form multiple MD schemes. In [17], a slice group scheme is presented with three motion compensation loops.
The video signal is encoded in the central encoder and then divided into two descriptions, each one corresponding to one slice group. Each slice group includes redundant information from the other one. Based on this scheme, [18] proposes a rate-controlled redundancy-adaptive model that takes into account the effects of error propagation and concealment. A similar approach is proposed in [19], where the temporal and spatial correlations between macroblocks are exploited to achieve efficient redundancy coding. In [20]–[22], the redundant slice feature of H.264/AVC is exploited in order to form two different descriptions with controlled redundancy. Multiple description schemes based on scalable coding using distinct quality resolutions [23] and coding structures [24] have also been proposed. MDC has also been investigated for non-standard video coding algorithms, such as in [25]–[27], where MDC using standard coding techniques is combined with distributed video coding (DVC). In comparison with the existing MD schemes cited above, this paper addresses a different MD problem, which consists in splitting compressed video streams rather than producing MD from uncompressed video signals. Moreover, since MDVS suffers from the same intrinsic problem of drift as MDC, i.e., accumulation of decoding distortion when any description is lost in the network, a novel aspect of the proposed scheme is its capability to limit this type of distortion by generating a controlled amount of side information specifically for this purpose. In the past, the problem of drift distortion in MDC has been dealt with using two distinct approaches. Firstly, multi-loop architectures with side information for each description have been used to eliminate or mitigate the decoding mismatch, at the cost of excess rate [28]–[32]. Secondly, reference picture selection (RPS) methods based on automatic repeat request (ARQ) have been used. In [33], the reference frames for motion compensated prediction are selected according to feedback information received from the transmission paths. A similar principle is used in [34], where routing messages are used to estimate the packet loss rate and the best reference frame is dynamically selected in order to alleviate error propagation. If the drift compensation process is considered as a bit stream switching problem (i.e., switching from decoding with two descriptions to decoding with a single description), then periodic switching frames (e.g., H.264/AVC SI/SP slices) might be used to enhance MDC error resilience [35]. However, this mechanism cannot be used directly with compressed streams, because SI/SP frames would need to be dynamically computed by full MDC encoders at MDVS network nodes. Since the multi-loop drift compensation methods referred to above lead to high-complexity implementations, a different approach is followed in the MDVS scheme proposed in this paper, which is based on a single-loop MDSQ. This comprises a novel MDVS architecture with low processing complexity, achieved by reusing most coding parameters of the incoming SDC video streams.
Two rather different approaches, based either on channel coding or on source coding, have been followed in video streaming applications using MDVS. In [36], the channel coding approach is proposed for an end-to-end video communication system where MDVS, based on forward error correction (FEC) codes, is integrated in a congestion control framework for video streaming over the Internet. Other examples of previous work in MDVS based on channel coding are reported in [37] and [38]. MDVS based on the source coding approach was also addressed in the past, but mainly focused on its application in some networking scenarios without taking into account the rate-distortion efficiency of actual MDVS processing architectures. In particular, drift-free MDVS architectures cannot be found in the available literature. Related work can be found in [39], where an MDVS scheme is proposed based on redundancy rate-distortion optimization for splitting DCT coefficients of the incoming bitstream. In [40], another MDVS scheme is proposed, based on replication and interleaving of DCT coefficients among all descriptions. However, these are open-loop schemes with no drift compensation, which also results in higher levels of rate redundancy. As highlighted in Section III-B, another novel aspect of this paper is to provide evidence of the catastrophic effect of drift, which drastically reduces performance if the splitting architecture does not adequately compensate for its accumulation.
Fig. 2. Classic MDVS scheme.
TABLE I INDEX ASSIGNMENT WITH
Fig. 1. MDVS application scenario.
A rather different approach to video adaptation using side information based on distributed source coding techniques is described in [41]. Although this may not be classified exactly as an MDC scheme, it is worth mentioning that such an adaptation scheme generates redundant side information taking drift into account.
III. MULTIPLE DESCRIPTION VIDEO SPLITTING OF CODED STREAMS (MDVS)
Fig. 1 shows a possible video streaming scenario where MDVS might be useful. A single video stream (i.e., SDC stream) is distributed from the streaming server to diverse user terminals over heterogeneous networks, some of them having disjoint paths from an intermediate node to the user terminal. Since the server storage capacity and streaming bandwidth required for SDC video are lower than those of equivalent MDC (i.e., at the same quality), due to the inherent redundancy of MDC, SDC is a more efficient coded format for storage and distribution over single path networks. The advantage of MDVS is to introduce a further level of flexibility in SDC video streaming, in order to benefit from transmission over multiple paths where these are available along the delivery chain. Therefore, MDVS can be seen as a novel adaptation functionality of edge nodes in the heterogeneous video streaming environments of the future media Internet. In this paper, the main novel aspects of the proposed MDVS scheme comprise: 1) a two-loop MDVS architecture with drift control in both intra and inter predictive coded slices; 2) an equivalent single-loop architecture; 3) a method to generate side information from SDC video; 4) the capability of controlling the amount of side information according to the expected decoder drift; and 5) an overall performance similar to MDC using uncompressed video.
A. Classic MDVS
MDVS can be regarded as a data partitioning scheme, capable of generating two descriptions from an SDC video stream.
Since current coded video formats convey a great deal of the source information in transform coefficients, MDSQ is a good candidate for designing low-complexity MDVS systems. Fig. 2 shows a classic MD video splitting scheme where each transform coefficient is represented by two different values, which results in dividing an SDC stream into two descriptions. For instance, this MDVS method was used in [8], where the coding information embedded in the original SDC stream, such as slice maps, prediction modes and motion vectors, is duplicated in the two resulting descriptions. In the scheme shown in Fig. 2, an index assignment function is used for mapping each quantization index of the original transform coefficients (i.e., central indices) into a pair of side indices, which are then entropy encoded. The index assignment function used in this paper follows the same approach as proposed in [1]. It is defined by an index assignment matrix as shown in Table I, whose elements are the SDC quantizer indices, i.e., central indices, each one corresponding to a pair of side indices defined by the respective column and row. Any individual description is a coarse representation because a null coefficient is obtained for several non-zero central indices, e.g., for central indices , 2, 0, 1, 4 and for , 1, 0, 2, 3. The redundancy is controlled by an index spread parameter related to the number of diagonals of the index matrix; in Table I there are 5 diagonals. In the general case of balanced descriptions, the same rate is used for all of them. At the decoder, if both descriptions are available, then an inverse index assignment process restores a unique central index to be inverse quantized and inverse transformed. If any description is not available for decoding, the central index cannot be unambiguously identified because there are multiple possible values for each individual description index.
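To make the index assignment concrete, the following Python sketch builds a hypothetical banded assignment in the spirit of [1]. Central indices are placed, in anti-diagonal scan order, only on cells within `d` diagonals of the main diagonal, which gives 2d + 1 occupied diagonals (5 for d = 2, matching the count in Table I). The scan order, the restriction to non-negative indices and all function names are assumptions made for illustration; the actual entries of Table I are not reproduced here. Single-description decoding follows the rule stated in Section IV-A: the central index on the main diagonal of the received row or column is selected.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (side index of description 1, side index of description 2)


def build_index_assignment(num_central: int, d: int) -> Dict[int, Cell]:
    """Map central indices 0..num_central-1 to (row, col) side-index pairs.

    Hypothetical banded assignment: cells are visited anti-diagonal by
    anti-diagonal (row + col = s) and only cells with |row - col| <= d are
    used, so the matrix has 2*d + 1 occupied diagonals.
    """
    assignment: Dict[int, Cell] = {}
    central, s = 0, 0
    while central < num_central:
        for row in range(s + 1):
            col = s - row
            if abs(row - col) <= d and central < num_central:
                assignment[central] = (row, col)
                central += 1
        s += 1
    return assignment


def invert(assignment: Dict[int, Cell]) -> Dict[Cell, int]:
    """Inverse index assignment table: (row, col) -> central index."""
    return {cell: c for c, cell in assignment.items()}


def decode_both(inverse: Dict[Cell, int], i1: int, i2: int) -> int:
    """Both descriptions received: the central index is recovered exactly."""
    return inverse[(i1, i2)]


def decode_one(inverse: Dict[Cell, int], side_index: int) -> int:
    """One description received: take the main-diagonal entry of that row/column.

    Assumes the table is large enough for the diagonal cell to exist.
    """
    return inverse[(side_index, side_index)]


if __name__ == "__main__":
    table = build_index_assignment(num_central=16, d=2)
    inv = invert(table)
    for c in (6, 7, 9):
        i1, i2 = table[c]
        print(c, (i1, i2), decode_both(inv, i1, i2),
              decode_one(inv, i1), decode_one(inv, i2))
```

With d = 2, central index 7 sits at cell (2, 1): description 1 alone decodes it as 9 and description 2 alone as 4, while the pair (2, 1) restores 7 exactly. A single description therefore yields only an approximation of the central index whenever the pair lies off the main diagonal.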
The ambiguity in the central index leads to index decoding errors, causing a mismatch between the original SDC prediction loop and that of the decoder. As pointed out before, the consequence of such mismatch is distortion accumulation in the decoder prediction, i.e., drift. In [42], the authors propose an error resilient method to minimize this problem, which improves decoding performance when only one description is received.
B. Drift Analysis
If the classic MDVS scheme of Fig. 2 is used for adapting coded video streaming to networks with path diversity, then drift is introduced at the decoder whenever any description is lost. The drift distortion component can be determined from the relevant signals involved in MD splitting and decoding. When two descriptions are received, both side indices are decoded and merged into the corresponding central index. In this case, for each block, the reconstructed central pixel values are given by
$$\hat{x}_0 = \hat{r}_0 + \hat{p}_0 \tag{1}$$
where $\hat{r}_0$ is the decoded residue and $\hat{p}_0$ its associated prediction, either from intra prediction or motion compensation, formed from decoding both descriptions. If only one description ($j = 1$ or $j = 2$) is decoded, then the reconstructed pixel values are given by
$$\hat{x}_j = \hat{r}_j + \hat{p}_j \tag{2}$$
where $\hat{r}_j$ is the decoded residue and $\hat{p}_j$ its prediction formed from decoding description $j$ only. Since $\hat{r}_j$ results from inverse index assignment with only one description as input, the difference between the original SDC residue and that decoded from only one description produces a reconstruction error $\varepsilon_j$, i.e.,
$$\varepsilon_j = \hat{r}_0 - \hat{r}_j. \tag{3}$$
Substituting (3) in (2) gives
$$\hat{x}_j = \hat{r}_0 - \varepsilon_j + \hat{p}_j \tag{4}$$
and then, using (1) in (4), $\hat{x}_j$ becomes
$$\hat{x}_j = \hat{x}_0 - \varepsilon_j - d_j \tag{5}$$
where
$$d_j = \hat{p}_0 - \hat{p}_j \tag{6}$$
is the drift component due to the mismatch between the SDC predictions used in the original encoder and those reconstructed at the final decoder from only one description. Note that the above analysis is valid for both the spatial and temporal drift components, though these can be identified as separate contributors to the overall drift distortion. The actual impact of the drift component given by (6) on the objective video quality was experimentally evaluated for the MDVS scheme of Fig. 2. The results for intra predicted and motion compensated (MC) frames are shown in Figs. 3 and 4, respectively.
1) Intra Prediction (I Frames): Fig. 3 shows the drift effect over one intra frame from the coastguard sequence, originally encoded using H.264/AVC with all intra prediction modes enabled. The peak signal-to-noise ratio (PSNR) is shown for each macroblock decoded from only one description and also from both of them. The SDC trace provides a reference for comparison at the same rate as the single description (1.59 bpp). Fig. 3 shows that the drift distortion introduced by classic MDVS of an I frame yields unacceptable quality when only one description is decoded. The more macroblocks are decoded, the higher the accumulated distortion, leading to a continuous drop of PSNR along each row of macroblocks. The peaks in PSNR correspond to the accumulated drift distortion being reset to zero at the beginning of each row of macroblocks, because the first macroblock of each row is not predicted from the previous ones.
Fig. 3. Distortion accumulation within an intra frame (Coastguard).
2) MC Prediction (P Frames): Fig. 4 shows the effect of drift accumulated over one GOP, comprising one initial I frame followed by 20 P frames. Both descriptions of the initial I frame are fully decoded so as not to influence drift in subsequent P frames. The PSNR is shown for each frame decoded from only one description and also from both of them. For comparison, the PSNR of the SDC stream at the same bit rate as that of a single description (3.9 Mbit/s) is also shown. Fig. 4 shows that the drift distortion introduced by classic MDVS over one GOP also yields unacceptable quality when only one description is decoded. The effect of drift is quite evident from the rapid decrease of PSNR due to distortion accumulation along the GOP. Since the SDC stream has the same rate as the single description, the continuous decrease of PSNR observed in the latter does not depend on the actual coding rate; rather, it is due to prediction mismatch between encoder and decoder. The above analysis and results clearly show that MDVS cannot be used without drift compensation. An efficient solution is described in the next sections.
Fig. 4. Distortion accumulation in MC predicted frames (Coastguard).
IV. MDVS WITH DRIFT COMPENSATION
A novel MDVS architecture and an equivalent simplified version are proposed to overcome the problem of spatial and temporal drift. In comparison with the classic scheme of Fig. 2, the proposed splitting architecture generates an additional amount of side information for the specific purpose of preventing drift when only one description is decoded. As explained in the next subsections, the side information is generated for each description by further encoding the difference between the original SDC signal and the one decoded from the description itself.
A. MDVS Architecture
The proposed MDVS architecture using side information for drift compensation is shown in Fig. 5. This architecture is used for drift-free MDVS of both intra predicted and motion compensated predicted frames. However, B frames that are not used as references may be processed with the classic MDVS scheme of Fig. 2, because they do not contribute to drift accumulation. As in classic MDVS, the headers, prediction modes, and motion vectors are duplicated in both descriptions. The side information is encoded using the same prediction modes and motion vectors, but this syntax is carried only in the descriptions and is not repeated in the side information. In the MDVS architecture of Fig. 5, the side information for each description $j$, with $j = 1, 2$, is defined as
$$s_j = Q_s\big(T(e_j - \hat{r}_j)\big) \tag{7}$$
where
$$e_j = \hat{x}_0 - \hat{p}_j \tag{8}$$
and $Q_s$ is the quantization with side quantizer $QP_s$ that determines the amount of side information sent to the decoder, $T$ is the transform operation, $\hat{x}_0$ is the current frame as reconstructed from the SDC stream in (1), and $\hat{p}_j$ is the motion compensated prediction from side encoder $j$. The residue $\hat{r}_j$, available at the decoder when only one description is correctly received, is given by
$$\hat{r}_j = T^{-1}\Big(Q_0^{-1}\big(A_j^{-1}(i_j)\big)\Big) \tag{9}$$
where $i_j$ is the side index of description $j$, $A_j^{-1}$ denotes the inverse index assignment operation when only description $j$ is available, and $Q_0^{-1}$ is the inverse quantization using the SDC quantization parameter $QP_0$. The central index is assigned to the value located in the main diagonal of the index assignment matrix that corresponds to either the row or the column of side index $i_j$. At the decoder, if both descriptions are available, then the inverse index assignment operation is responsible for restoring the exact value of the original SDC index, which is then inverse quantized and inverse transformed. However, if only one description is decoded, then frame $\tilde{x}_j$ is reconstructed without drift as follows:
$$\tilde{x}_j = \hat{p}_j + \hat{r}_j + \hat{s}_j \tag{10}$$
with
$$\hat{s}_j = T^{-1}\big(Q_s^{-1}(s_j)\big). \tag{11}$$
By comparing (2) with (10), one concludes that $\hat{s}_j$ is the drift compensation signal that is encoded as side information and sent to the decoder. Note that in the classic MDVS scheme of Fig. 2, only the signal $\hat{r}_j$ is sent for decoding, which is not enough to prevent drift.
Fig. 5. Two-loop MDVS architecture.
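The contrast between (2) and (10) can be illustrated with a deliberately simplified scalar model, which is an assumption made only for illustration: each "frame" is a single number, prediction is the previous reconstruction scaled by a hypothetical coefficient `a` (a = 1 mimics IPPP prediction), the transform is the identity, and the single-description residue is modelled by re-quantizing the SDC residue with a coarser step `coarse` instead of a real inverse index assignment. The names `simulate`, `coarse` and `qs` are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def quantize(value, step) -> int:
    """Uniform quantizer index (round to nearest, ties to even)."""
    return round(Fraction(value) / step)


def simulate(frames: Sequence[int], q0: int, coarse: int, qs: int,
             a: Fraction = Fraction(1)) -> Tuple[List[Fraction], List[Fraction]]:
    """Toy scalar model of one description (hypothetical, for illustration).

    q0     -- SDC quantizer step
    coarse -- coarser step standing in for inverse index assignment, eq. (9)
    qs     -- side-information quantizer step
    a      -- linear prediction coefficient (a = 1 mimics IPPP prediction)
    Returns per-frame |error| w.r.t. the SDC reconstruction for classic
    MDVS, eq. (2), and for drift-compensated MDVS, eqs. (7)-(11).
    """
    x0 = xc = xt = Fraction(0)  # common error-free reference (e.g. the I frame)
    err_classic: List[Fraction] = []
    err_comp: List[Fraction] = []
    for x in frames:
        p0 = a * x0                          # SDC prediction
        r0 = q0 * quantize(x - p0, q0)       # SDC residue
        x0 = p0 + r0                         # eq. (1)
        rj = coarse * quantize(r0, coarse)   # residue from one description only
        xc = a * xc + rj                     # classic MDVS, eq. (2): no side info
        pj = a * xt                          # prediction in the side loop
        sj = quantize(x0 - pj - rj, qs)      # side information, eqs. (7)-(8), T = identity
        xt = pj + rj + qs * sj               # eqs. (10)-(11)
        err_classic.append(abs(x0 - xc))
        err_comp.append(abs(x0 - xt))
    return err_classic, err_comp


if __name__ == "__main__":
    ec, et = simulate([7 * n for n in range(1, 21)], q0=1, coarse=4, qs=2)
    print([int(e) for e in ec])
    print([int(e) for e in et])
```

With these inputs the classic error grows by one unit per frame and reaches 20 after 20 frames, whereas the compensated error never exceeds `qs / 2`: the side information re-quantizes exactly the deviation of the side reconstruction from the SDC one, so that deviation is bounded by half a side-quantizer step in every frame instead of accumulating.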
Fig. 6. Single-loop MDVS architecture.
Fig. 7. Equivalent single-loop MDVS architecture.
B. Simplified MDVS
The MDVS architecture of Fig. 5 can be simplified by assuming that prediction is a linear function (this is valid except for rounding and truncation arithmetic). Here $\hat{x}_0$, $\hat{r}_0$ and $\hat{p}_0$ denote the SDC reconstruction, residue and prediction of (1), $e_j$ the side-encoder residue of (8), and $\tilde{x}_j$, $\hat{r}_j$, $\hat{p}_j$ and $\hat{s}_j$ the drift-compensated reconstruction, single-description residue, prediction and decoded side information of description $j$, as in (7)–(11). Using (1) and (8), the following relation can be derived:
$$e_j = \hat{r}_0 + \hat{p}_0 - \hat{p}_j. \tag{12}$$
Considering (12) and the previous assumption of linearity, the simplified architecture of Fig. 6 can be derived. In this architecture there is only one loop for intra prediction (i.e., IP) and one MC loop to accumulate the differential signal used for drift compensation. The equivalence between the architectures of Figs. 5 and 6 can be demonstrated as follows. In the architecture of Fig. 6, the side information is given by
$$s_j = Q_s\big(T(\hat{r}_0 - \hat{r}_j + \delta_j)\big) \tag{13}$$
where $\delta_j$ is the (intra or motion compensated) prediction formed in the local loop from the previously accumulated differential signal, and the signal accumulated in this local prediction loop for drift compensation is defined as follows:
$$D_j = \hat{r}_0 - \hat{r}_j + \delta_j - \hat{s}_j \tag{14}$$
then, by using (1), (10), (13), and (14), the resulting expression is
$$D_j = \hat{x}_0 - \tilde{x}_j - (\hat{p}_0 - \hat{p}_j) + \delta_j. \tag{15}$$
Considering the linearity of prediction, $\delta_j = \hat{p}_0 - \hat{p}_j$, and then
$$D_j = \hat{x}_0 - \tilde{x}_j. \tag{16}$$
Equation (16) represents the difference between frames reconstructed from the two prediction loops of Fig. 5. In the simplified architecture of Fig. 6, such difference is accumulated in only one loop and the result is used in the same manner as in Fig. 5, which demonstrates that both architectures are equivalent. Moreover, since transform and quantization can be implemented as independent operations, the architecture of Fig. 6 can be further simplified to that of Fig. 7. For each description, this scheme uses only one frame buffer and two transforms, while that of Fig. 5 needs two frame buffers and four transforms. Note that in H.264/AVC, the scheme of Fig. 7 needs to use scaling coefficients in the quantization and inverse quantization functions in order to make them independent from the transform. This can be easily done as described in [43]. Although some operations may involve nonlinear arithmetic, such as clipping functions in transform/quantization, rounding in sub-pel MC interpolation and deblocking filtering, the actual effect on the drift performance of the simplified MDVS is mostly negligible. However, it might be slightly more significant in high motion sequences. The proposed MDVS architectures were implemented in the publicly available H.264/AVC reference software. Each coded description produced by MDVS is standard-compliant, and the corresponding coded data is encapsulated into video coding layer (VCL) network abstraction layer (NAL) units. To include the side information in the standard syntax, a new type of VCL NAL unit must be defined for such coded data. This can be done by extending the existing NAL unit types using different approaches. For instance, in H.264/SVC [44], new NAL unit types were defined to accommodate several layers and associated information, and in [45] a new type of NAL unit is proposed for embedding redundant information inside standard video streams.
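The equivalence argument of (12)–(16) can be checked numerically with a toy scalar model under the same assumptions as the derivation: identity transform, linear prediction (here a hypothetical coefficient `a` applied to the previous frame) and exact rational arithmetic via `fractions.Fraction`, so that no rounding enters the prediction itself. The function names, and the coarse re-quantization standing in for inverse index assignment, are illustrative assumptions rather than the reference implementation.

```python
from fractions import Fraction
from typing import List, Sequence


def requant(value, step) -> Fraction:
    """Quantize and inverse-quantize with a uniform step (ties to even)."""
    return Fraction(step) * round(Fraction(value) / step)


def two_loop(r0s: Sequence[int], coarse: int, qs: int, a: Fraction) -> List[Fraction]:
    """Fig. 5 style: separate SDC and description-j loops, eqs. (7)-(11)."""
    x0 = xt = Fraction(0)
    side: List[Fraction] = []
    for r0 in r0s:
        p0, pj = a * x0, a * xt              # predictions from the two loops
        x0 = p0 + r0                         # eq. (1)
        rj = requant(r0, coarse)             # single-description residue
        s_hat = requant(x0 - pj - rj, qs)    # eqs. (7), (8) and (11)
        xt = pj + rj + s_hat                 # eq. (10)
        side.append(s_hat)
    return side


def single_loop(r0s: Sequence[int], coarse: int, qs: int, a: Fraction) -> List[Fraction]:
    """Fig. 6 style: one loop accumulating D_j, eqs. (13)-(14)."""
    d = Fraction(0)
    side: List[Fraction] = []
    for r0 in r0s:
        delta = a * d                         # prediction of the accumulated difference
        rj = requant(r0, coarse)
        s_hat = requant(r0 - rj + delta, qs)  # eq. (13), then inverse-quantized
        d = r0 - rj + delta - s_hat           # eq. (14)
        side.append(s_hat)
    return side


if __name__ == "__main__":
    residues = [5, -3, 2, 7, -1, 0, 4, -6, 3, 1]
    a = Fraction(3, 4)
    print(two_loop(residues, 4, 2, a) == single_loop(residues, 4, 2, a))
```

Both functions produce identical side information because, with exact linear prediction, a·x0 − a·xt equals a·(x0 − xt), which is the identity behind (15) and (16). Replacing the predictor by a rounded one, e.g. `round(a * x)`, breaks that identity, so the two outputs may then differ slightly; this mirrors the remark above that rounding and clipping make the simplified architecture only approximately equivalent.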
V. SIMULATION RESULTS
The performance of the proposed MDVS scheme was simulated by splitting SDC video streams into MDC ones for transmission over path diversity networks. An application scenario similar to the one illustrated in Fig. 1 was assumed. Two different types of results were obtained to demonstrate the advantages of the proposed MDVS: 1) the impact of drift accumulation when one description is totally lost, in either intra or inter predicted coded frames, and 2) the video streaming quality obtained in lossy networks with path diversity.
A. MDVS Drift Performance
The drift performance of MDVS was evaluated in two different aspects: 1) distortion accumulation in decoded video when only one of the two descriptions reaches the decoding terminal, and 2) the extra redundancy of the side information produced by the proposed MDVS to compensate for drift, also in one of the two descriptions. The reference used for comparison is one description obtained from the classic MDVS scheme of Fig. 2. The spatial and temporal drift performance were evaluated by using coded streams with intra predicted and MC predicted frames (P and B), respectively. In order to obtain a comparable evaluation, all streams were encoded at the same rate. The original headers, prediction modes, slice maps and motion vectors are duplicated into both descriptions. The side information is encoded using the same coding modes as the corresponding descriptions. Note that coding modes and motion vectors are not included in the side information because they are available from the respective coded description.
1) Intra Predicted Frames: The benefit of drift compensation in intra predictive MDVS is shown in Fig. 8, where the PSNR of each macroblock of one frame, in one of the two descriptions (bus sequence), is shown for classic MDVS and for two-loop MDVS (i.e., Fig. 5) at the same bit rate. The same rate is ensured by adjusting the average central quantizer for classic MDVS and the central and side quantizers for the proposed MDVS. Fig. 8 clearly shows that the proposed MDVS produces much higher and smoother PSNR along the I frame than classic MDVS, where drift compensation is not done. In the case of no drift compensation, i.e., classic MDVS, the lowest PSNR is below 20 dB, which is definitely not acceptable.
2) MC Predicted Frames: A different experiment was carried out to evaluate the performance of temporal drift compensation. A GOP structure with a high number of predicted frames was used to provide a worst-case scenario with regard to temporal drift, i.e., a sequence of P frames each using only one reference (IPPP). The GOP size was set to 20 frames. The entire loss of one description is simulated in the path diversity network, for the initial I frame and also for all subsequent P frames (i.e., only one description is decoded). In the error-free descriptions of all streams, the initial I frame is sent with side information so as not to affect the quality of subsequent P frames. Both the two-loop and the single-loop MDVS architectures were used in the experiment in order to evaluate the effect of the nonlinearity of motion compensation on drift accumulation over a significant number of temporally predicted frames.
Fig. 8. PSNR of intra frame macroblocks.
Fig. 9. PSNR for MC predicted frames (IPPP) for coastguard and foreman.
Fig. 9 shows the PSNR for coastguard and foreman sequences at the same bit rate, obtained by adjusting the average central quantizer for classic MDVS and the central and side quantizers for the proposed MDVS. It is quite evident that the proposed MDVS architectures are drift-free, while temporal drift accumulation is responsible for severe quality degradation in classic MDVS. At the end of the GOP, the PSNR obtained by using the proposed MDVS is about 6 dB higher than that of classic MDVS. For classic, two-loop, and single-loop MDVS, the average PSNR of coastguard is, respectively, 28.96 dB, 32.25 dB (+3.29 dB), and 32.42 dB (+3.46 dB), and that of foreman is 31.41 dB, 34.34 dB (+2.93 dB), and 34.43 dB (+3.02 dB). These results also show that the effect of the nonlinearity of motion compensation is negligible, since the PSNR obtained from the single-loop MDVS architecture is quite similar to that obtained from two-loop MDVS. Therefore, full decoding of the incoming stream is not necessary to achieve drift-free MDVS, and these results validate the proposed single-loop architecture.
3) Generic Regular GOP: The overall performance of the proposed MDVS using generic IBBP GOP structures was also evaluated (GOP size of 20 frames). In the proposed MDVS, drift compensation was used for both I and P frames but not for B frames, because these are not used as references for prediction. Performance evaluation was carried out by splitting the SDC stream of each sequence into two descriptions and then simulating that only one description reaches the decoder for all frames in the GOP. In all streams, the initial I frame is always sent with side information so as not to influence the quality of subsequent predicted frames. Fig. 10 shows the PSNR obtained from the proposed MDVS architectures and classic MDVS. The same bit rate was obtained for all streams by using one average central quantizer for classic MDVS and another for the proposed MDVS, which also used a side quantizer. The results in Fig. 10 clearly confirm the effectiveness of the proposed architecture in eliminating drift and consequently achieving significant quality improvement in MD adaptation of coded video streams. The proposed architecture compensates for the drift in P frames, which results in a significant overall quality improvement. The PSNR values of the two-loop and single-loop MDVS architectures are very similar, which further validates the effectiveness of the simplified single-loop architecture.
TABLE II PSNR VERSUS SIDE INFORMATION REDUNDANCY (ONE DESCRIPTION)
Fig. 10. PSNR for generic regular GOP (IPBBP) for coastguard.
4) Overall Effect of Side Information: The overall effect of the side information on the rate and video quality for different combinations of average central quantizer and side quantizer is presented in Table II, where the average PSNR and extra redundancy are shown. This extra redundancy is due to the side information, and it is measured as the percentage of total bit rate increase in each description, using as reference the SDC rate obtained at the same quantization parameter. Therefore, this is the actual cost of the side information for achieving drift compensation in MDVS. Without such extra redundancy, the overall redundancy is equal to that of classic MDVS and is in line with various MDC schemes, as discussed in Section V-C.
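One way to write the extra-redundancy measure reported in Table II, as we read the definition above (the symbols are ours, not the paper's): with $R_{s,j}$ the rate of the side information added to description $j$, and $R_{\mathrm{SDC}}$ the rate of the SDC stream obtained at the same quantization parameter,

$$\rho_j = \frac{R_{s,j}}{R_{\mathrm{SDC}}} \times 100\,\%$$

For instance, with purely hypothetical rates, side information of 29 kbit/s added to a description whose reference SDC stream runs at 1 Mbit/s gives $\rho_j = 2.9\,\%$.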
Note that, as previously pointed out, the side information includes neither coded motion vectors nor prediction modes. When a single description is received, these results demonstrate that the proposed MDVS can significantly improve the video quality at a small cost in additional redundancy. The table shows that the extra redundancy due to side information ranges from 1% to 14%, while the PSNR increases by 0.6 dB to 4.1 dB in comparison with classic MDVS. As previously pointed out, this PSNR improvement is due to drift compensation. Although the gain depends on the type of sequence, it is worth noting that the PSNR obtained with the proposed MDVS is consistently better for acceptable levels of extra redundancy. For instance, for the foreman sequence with an excess rate of 2.9%, the mean PSNR improves by 1.8 dB, and for the bus sequence with an excess rate of 3.7%, it improves by 2.5 dB. Larger improvements can be achieved with higher redundancy; for example, the quality of the bus sequence improves by 3.4 dB when 11.5% of excess rate is used by side information. The results in Table II also show that the redundancy of side information increases with the average central quantizer. This is because the amount of side information to be encoded increases for higher values of this quantizer, which in turn is due to the larger differences between the SDC stream and each description (i.e., the difference between the SDC signal and that of description i, i = 1, 2, in Fig. 5). Moreover, for each central quantizer, the extra redundancy decreases as the side-information quantizer increases. This is because the latter is the quantizer used to encode the side information itself: the higher its value, the smaller the resulting coded rate. These results provide useful insight for the future design of efficient MDVS rate control algorithms.
B. MDVS Streaming With Path Diversity
The performance of the proposed MDVS architectures was evaluated in a simulated path-diversity scenario, in which an SDC video stream is split into two descriptions at the network edge for streaming over different paths (e.g., Fig. 1). After the SDC stream is split at the MDVS edge node, each description is streamed over an independent path subject to the same packet loss rate (PLR) and average burst error length (BEL). The side information is multiplexed and packetized along with the corresponding description; therefore, when a packet is lost, both the coded description and its side information are lost. In the simulations, the packet size was set to 1000 bytes. The reference used for performance comparison is SDC streaming under the same networking conditions, i.e., using the same amount of bandwidth and suffering from equal PLR and BEL. Burst packet loss was simulated using a two-state Gilbert-Elliott Markov model in order to generate different average packet loss rates and mean burst durations [46]. To obtain statistically meaningful results, the transmission of each sequence was simulated 100 times under each network condition, i.e., average PLR values of 3%, 5%, 7%, and 10% and average BEL of 4 and 12 packets. Five CIF@30 Hz test sequences were used: Bus, Foreman, Mother-daughter, News, and City. The GOP structure was IBBPBBP.
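The loss process described above can be sketched with the classic two-state Gilbert model, a special case of the Gilbert-Elliott model in which the Good state never loses packets and the Bad state always does; the exact parameterization used in [46] may differ. Under that assumption, setting the Bad-to-Good transition probability to 1/BEL fixes the mean burst length, and choosing the Good-to-Bad probability so that the stationary probability of the Bad state equals PLR fixes the loss rate. The function names and the seed below are illustrative only.

```python
from __future__ import annotations

import random


def gilbert_transitions(plr: float, mean_burst: float) -> tuple[float, float]:
    """Return (p_gb, p_bg) for a two-state Gilbert loss model.

    Assumption: every packet sent in the Bad state is lost and none in the
    Good state, so the stationary loss rate equals P(Bad) and burst lengths
    are geometric with mean 1 / p_bg.
    """
    if not (0.0 < plr < 1.0) or mean_burst < 1.0:
        raise ValueError("need 0 < plr < 1 and mean_burst >= 1")
    p_bg = 1.0 / mean_burst                # Bad -> Good: sets the mean burst length
    p_gb = plr * p_bg / (1.0 - plr)        # Good -> Bad: makes P(Bad) equal to plr
    if p_gb > 1.0:
        raise ValueError("plr too high for this mean_burst")
    return p_gb, p_bg


def simulate_path(n: int, plr: float, mean_burst: float, rng: random.Random) -> list[bool]:
    """Loss pattern (True = lost) for n consecutive packets on one path."""
    p_gb, p_bg = gilbert_transitions(plr, mean_burst)
    bad = rng.random() < plr               # initial state drawn from the stationary law
    lost = []
    for _ in range(n):
        lost.append(bad)
        bad = (rng.random() >= p_bg) if bad else (rng.random() < p_gb)
    return lost


def mean_burst_length(lost: list[bool]) -> float:
    """Average length of runs of consecutive lost packets."""
    bursts, run = [], 0
    for is_lost in lost:
        if is_lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return sum(bursts) / len(bursts) if bursts else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    n, plr, bel = 200_000, 0.10, 4
    path1 = simulate_path(n, plr, bel, rng)   # description 1
    path2 = simulate_path(n, plr, bel, rng)   # description 2, independent path
    both = sum(a and b for a, b in zip(path1, path2)) / n
    print(f"loss rate {sum(path1) / n:.4f}, mean burst {mean_burst_length(path1):.2f}, "
          f"both lost {both:.4f} (PLR^2 = {plr ** 2:.4f})")
```

Because the two paths are simulated independently, the fraction of packet slots in which both descriptions are lost approaches PLR squared, e.g., 0.0009 at 3% and 0.01 at 10%, which is consistent with the observation that simultaneous loss of both descriptions becomes more likely as PLR grows.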
In all cases, an index assignment matrix with 3 diagonals was used to generate the two descriptions from the compressed SDC stream. In this case, a single index encoded in a description represents 3 coefficients of the original SDC stream. Frame-copy error concealment was used whenever a packet was lost. Note that, for this type of performance evaluation, such a low-performance concealment method is preferable to more efficient ones, because the quality results are then free from the masking effects of concealment. Figs. 11–16 show the average PSNR obtained for different PLR (3%, 5%, 7%, and 10%), BEL, and rates (1.25, 1.8, and 2.16 Mbit/s) using the bus CIF@30 Hz sequence. For PLR higher than 3%, the simulation results show that the proposed method achieves better average PSNR than both classic MDVS and SDC. The gains increase significantly with PLR, particularly for longer error bursts. For the harsher loss conditions tested, including PLR = 10%, the gains are 2–3 dB over SDC and 1–2 dB over classic MDVS, for all rates. In the remaining loss conditions, the proposed MDVS architecture also improves the decoded video quality, although by varying amounts. Note that for higher PLR, the probability of losing both descriptions simultaneously is also higher, which tends to increase the influence of error concealment at the decoder and to reduce the advantages of MDC streaming.
Fig. 11. Average PSNR for bus at 1.25 Mbit/s (burst length...).
Figs. 11–16 also show a critical PLR value, which indicates that for lower values of PLR it is better to keep SDC instead of MDVS. These results suggest that such a switching point should be exploited, keeping SDC when network losses are not severe. The optimal computation of this switching point under various networking conditions is an open issue that deserves further investigation. Note that similar switching points are reported in the literature (e.g., [32]). Other recent MDC schemes based on DVC (e.g., [26], [27]) also exhibit similar behavior at relatively higher PLR (e.g., 5%–15%). As expected, in the lossless case (i.e., PLR = 0%), both SDC and classic MDVS achieve better PSNR than the proposed MDVS. This is due to the overhead required to encode the side information, since the PSNR of the three streams is compared at exactly the same overall bit rate. The difference of about 2–3 dB is in line with other MDC schemes available in the literature, as discussed in Section V-C. Nevertheless, in the presence of packet loss, the perceptual quality is better because the variation of PSNR is much lower in
the case of MDVS. This is shown in Fig. 17 for the bus sequence by comparing classic MDVS with the proposed MDVS, both affected by the same lost packets. During the period affected by packet loss, the proposed MDVS achieves PSNR gains of about 3–4 dB and a much lower quality variation, i.e., about 3 dB in comparison with 7 dB for classic MDVS.
Fig. 17. Frame-by-frame PSNR for bus sequence.
Finally, Table III shows simulation results for different sequences at 1 Mbit/s, for PLR values of 3%, 5%, 7%, and 10% and average BEL of 4 and 12 packets. For one of the two average BEL values, the average PSNR gains of the proposed MDVS over classic MDVS are 1–2 dB, 1.8–2.6 dB, 2–2.5 dB, and 1.1–2.6 dB for PLR = 3%, 5%, 7%, and 10%, respectively. These results show that the proposed scheme achieves better quality, since the small excess rate is beneficial in avoiding drift in case of packet loss. In comparison with SDC, the corresponding average PSNR gains are 0.3–0.6 dB, 0.4–1.5 dB, 0.1–2.6 dB, and 0.1–2 dB, respectively. For the other average BEL value, the gains of the proposed MDVS over classic MDVS are 0.1–0.3 dB, 0.9–1 dB, 1.1–1.5 dB, and 1.6–2.4 dB for PLR = 3%, 5%, 7%, and 10%, respectively. In comparison with SDC, the average PSNR gains are 0–1 dB for PLR = 3%, 0.2–1.3 dB for PLR = 7%, and 0–3 dB for PLR = 10%. Higher gains are obtained in sequences with high motion and texture complexity, such as bus and foreman. This means that the proposed architecture is more efficient at reducing the overall drift distortion in such sequences. Overall, these results show that network-adaptive MDVS for path diversity consistently improves the robustness of video streaming across networks with multiple paths. Furthermore, the proposed MDVS outperforms existing classic schemes because of its improved drift characteristics. During packet loss periods, a significantly higher average PSNR is obtained at the expense of an acceptable redundancy, which is necessary for drift compensation.
TABLE III AVERAGE PSNR IN TRANSMISSION WITH DIFFERENT PACKET LOSS RATES FOR 1 Mbps@30 Hz
C. MDVS versus MDC: Comparative Discussion
Despite the inherent differences between MDVS and MDC due to the different nature of their input signals (i.e., compressed versus uncompressed video in MDVS and MDC, respectively), the rate-redundancy performance of the proposed MDVS can still be compared with that of previous MDC schemes at the same overall rate in both MDC and SDC, namely those based on multi-loop approaches, which cope better with drift than classic MDVS. In comparison with SDC, MDVS was found to have a PSNR drop of about 1.6 dB to 3.1 dB for rate-redundancies between 50% and 90% (including the extra redundancy of side information), using the set of sequences referred to above. Under the same conditions, the multi-loop MDC architectures proposed in [29], [30], and [32] exhibit rate-redundancies from 40% to 100% for PSNR drops of about 1.69 dB to 4 dB. Also, in [16], an open-loop MDC scheme is proposed whose results show rate-redundancies between 45% and 100% for PSNR drops between 1.81 dB and 3.35 dB. The MDC scheme proposed in [19] uses a multi-loop approach based on a spatial slice partitioning method previously used in [17].
This scheme has a temporal partitioning counterpart, also using multiple loops, proposed in [28]. In these papers, the overall rate-redundancy distortion performance (at the same overall rate) was found to be better than that obtained with MDSQ-based architectures. However, the coding approaches used by such MDC schemes cannot be used in MDVS without fully decoding the input SDC video and then performing independent MDC encoding. Therefore, using these multi-loop MDC methods in the same networking scenarios as MDVS is highly complex, which is a significant disadvantage in comparison with MDVS. A wavelet-based MDC scheme was recently proposed in [13], but it exhibits lower performance than MDVS. The results in [13] must be combined with those achieved in [12] to find that a 4 dB quality drop is obtained at a relatively low rate-redundancy (i.e., 30%). Similar conclusions can be drawn by comparing MDVS with another recent MDC work [27]. Overall, even though the rate-redundancy distortion performance of the proposed MDVS is affected by the coding distortion present in its input signal, the global rate-redundancy distortion of MDVS is in line with that of different MDC schemes.
VI. CONCLUSION
This paper demonstrates that splitting compressed video streams into multiple descriptions using a classic MDC architecture leads to unacceptable drift accumulation, which severely affects the quality of the decoded video when only one description reaches the decoder. Novel MDVS architectures were proposed to overcome the problem of drift. The proposed schemes effectively prevent drift by using a controlled amount of side information. The experimental results provide evidence that, for channels with distinct packet loss rates, the decoded video quality can be significantly improved in comparison with classic MDVS at the expense of an acceptable redundancy increase.
Overall, the proposed MDVS architecture finds application in heterogeneous multimedia networking environments, where lossy networks with single and multiple available paths coexist along the same delivery chain.
ACKNOWLEDGMENT
The authors would like to thank the reviewers for their valuable comments and suggestions.
REFERENCES
[1] V. A. Vaishampayan, "Design of multiple description scalar quantizers," IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 821–834, May 1993.
[2] V. Goyal, "Multiple description coding: Compression meets the network," IEEE Signal Process. Mag., vol. 18, no. 5, pp. 74–93, Sep. 2001.
[3] P. Frossard, J. de Martin, and M. Reha Civanlar, "Media streaming with network diversity," Proc. IEEE, vol. 96, no. 1, pp. 39–53, Jan. 2008.
[4] S. Mao, S. Lin, Y. Wang, S. Panwar, and Y. Li, "Multipath video transport over wireless ad hoc networks," IEEE Wireless Commun. Mag., vol. 12, no. 4, pp. 42–49, Aug. 2005.
[5] E. Akyol, A. M. Tekalp, and M. R. Civanlar, "A flexible multiple description coding framework for adaptive peer-to-peer video streaming," IEEE J. Select. Topics Signal Process., vol. 1, no. 2, pp. 231–245, Aug. 2007.
[6] J. Apostolopoulos, T. Wong, W. Tian Tan, and S. Wee, "On multiple description streaming with content delivery networks," in Proc. IEEE INFOCOM, 2002.
[7] S. Ahuja and M. Krunz, "Algorithms for server placement in multiple-description-based media streaming," IEEE Trans. Multimedia, vol. 10, no. 7, pp. 1382–1392, Nov. 2008.
[8] C.-M. Chen, C.-W. Lin, H.-C. Wei, and Y.-C. Chen, "Robust video streaming over wireless LANs using multiple description transcoding and prioritized retransmission," J. Vis. Commun. Image Represent., vol. 18, pp. 191–206, 2007.
[9] Y. Wang, A. R. Reibman, and S. Lin, "Multiple description coding for video delivery," Proc. IEEE, vol. 93, no. 1, pp. 57–70, Jan. 2005.
[10] O. Campana, R. Contiero, and G. A. Mian, "An H.264/AVC video coder based on a multiple description scalar quantizer," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 2, pp. 268–272, Feb. 2008.
[11] I. V. Bajic and J. W. Woods, "Domain based multiple description coding of images and video," IEEE Trans. Image Process., vol. 12, no. 10, pp. 1211–1225, Oct. 2003.
[12] C. Tillier, T. Petrisor, and B. P. Popescu, "A motion-compensated overcomplete temporal decomposition for multiple description scalable video coding," EURASIP J. Image Video Process., 2007.
[13] J. A. M. Biswas, M. Frater, and M. Pickering, "Improved resilience for video over packet loss networks with MDC and optimized packetization," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 10, pp. 1556–1560, Oct. 2009.
[14] Y. Wang, M. T. Orchard, V. Vaishampayan, and A. R. Reibman, "Multiple description coding using pairwise correlating transforms," IEEE Trans. Image Process., vol. 10, no. 3, pp. 351–366, Mar. 2001.
[15] C.-S. Kim and S.-U. Lee, "Multiple description coding of motion fields for robust video transmission," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 9, pp. 999–1010, Sep. 2001.
[16] K. R. Matty and L. P. Kondi, "Balanced multiple description video coding using optimal partitioning of the DCT coefficients," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp. 928–934, Jul. 2005.
[17] N. C. D. Wang and D. Bull, "Slice group based multiple description video coding with three motion compensation loops," in Proc. IEEE Int. Symp. Circuits and Systems, May 2005, pp. 960–963.
[18] D. A. N. Kamonoonwatana and C. Canagarajah, "Rate-controlled redundancy multiple description coding for video transmission over MIMO systems," in Proc. IEEE 17th Int. Conf. Image Processing (ICIP), Sep. 2010, pp. 1269–1272.
[19] C.-C. Su, H. H. Chen, J. J. Yao, and P. Huang, "H.264/AVC-based multiple description video coding using dynamic slice groups," Image Commun., vol. 23, no. 9, pp. 677–691, 2008.
[20] T. Tillo, M. Grangetto, and G. Olmo, "Redundancy slice optimal allocation for H.264 multiple description coding," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 1, pp. 59–70, Jan. 2008.
[21] L. Peraldo, E. Baccaglini, E. Magli, G. Olmo, R. Ansari, and Y. Yao, "Slice-level rate-distortion optimized multiple description coding for H.264/AVC," in Proc. IEEE Int. Conf. Acoustics Speech and Signal Processing (ICASSP), Mar. 2010, pp. 2330–2333.
[22] I. Radulovic, P. Frossard, Y.-K. Wang, M. M. Hannuksela, and A. Hallapuro, "Multiple description video coding with H.264/AVC redundant pictures," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 1, pp. 144–148, Jan. 2010.
[23] T. B. Abanoz and A. M. Tekalp, "SVC-based scalable multiple description video coding and optimization of encoding configuration," Signal Process.: Image Commun., vol. 24, pp. 691–701, 2009.
[24] C. Zhu and M. Liu, "Multiple description video coding based on hierarchical B pictures," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 4, pp. 511–521, Apr. 2009.
[25] O. Crave, B. Pesquet-Popescu, and C. Guillemot, "Robust video coding based on multiple description scalar quantization with side information," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 6, pp. 769–779, Jun. 2010.
[26] S. Milani and G. Calvagno, "Multiple description distributed video coding using redundant slices and lossy syndromes," IEEE Signal Process. Lett., vol. 17, no. 1, pp. 51–54, Jan. 2010.
[27] Y. Fan, J. Wang, and J. Sun, "Distributed multiple description video coding on packet loss channels," IEEE Trans. Image Process., vol. 20, no. 6, pp. 1768–1773, Jun. 2011.
[28] Y. Wang and S. Lin, "Error-resilient video coding using multiple description motion compensation," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 6, pp. 438–452, Jun. 2002.
[29] A. R. Reibman, H. Jafarkhani, Y. Wang, and M. T. Orchard, "Multiple-description video coding using motion-compensated temporal prediction," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 3, pp. 193–204, Mar. 2002.
[30] X. Tang and A. Zakhor, "Matching pursuits multiple description coding for wireless video," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 6, pp. 566–575, Jun. 2002.
[31] Y. A. Y.-C. Lee and R. M. Mersereau, "An enhanced multiple description video coder with drift reduction," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 1, pp. 122–127, Jan. 2004.
[32] N. Franchi, M. Fumagalli, R. Lancini, and S. Tubaro, "Multiple description video coding for scalable and robust transmission over IP," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 3, pp. 321–334, Mar. 2005.
[33] Y. W. S. Lin, S. Mao, and S. Panwar, "A reference picture selection scheme for video transmission over Ad-HOC networks using multiple paths," in Proc. IEEE Int. Conf. Multimedia and Expo (ICME), Aug. 2001, pp. 96–99.
[34] Y. Liao and J. D. Gibson, "Routing-aware multiple description video coding over mobile ad-hoc networks," IEEE Trans. Multimedia, vol. 13, no. 1, pp. 132–142, Feb. 2011.
[35] D. Wang, N. Canagarajah, and D. Bull, "S frame design for multiple description video coding," in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS 2005), May 2005, vol. 3, pp. 2719–2722.
[36] R. Puri, K. Won Lee, K. Ramchandran, and V. Bharghavan, "An integrated source transcoding and congestion control paradigm for video streaming in the internet," IEEE Trans. Multimedia, vol. 3, no. 1, pp. 18–32, Mar. 2001.
[37] T. Gan, L. Gan, and K.-K. Ma, "Reducing video-quality fluctuations for streaming scalable video using unequal error protection, retransmission, and interleaving," IEEE Trans. Image Process., vol. 15, no. 4, pp. 819–832, Apr. 2006.
[38] A. E. Essaili, S. Khan, W. Kellerer, and E. Steinbach, "Multiple description video transcoding," in Proc. IEEE Int. Conf. Image Processing (ICIP), 2007, vol. 6, pp. 77–78.
[39] I. K. Kim, N. I. Cho, and J. Nam, "Error resilient video transcoding based on the optimal multiple description of DCT coefficients," in Proc. Int. Workshop Advanced Image Technology, 2003.
[40] I. K. Kim and N. I. Cho, "Video transcoding for packet loss resilience based on the multiple descriptions," in Proc. Int. Conf. Multimedia and Expo (ICME), 2006, pp. 109–112.
[41] T. Shanableh, T. May, and F. Ishtiaq, "Error resiliency transcoding and decoding solutions using distributed video coding techniques," Signal Process.: Image Commun., vol. 23, pp. 610–623, 2008.
[42] R. Ma and F. Labeau, "Error-resilient multiple description coding," IEEE Trans. Signal Process., vol. 56, no. 8, pp. 3996–4007, Aug. 2008.
[43] D. Lefol, D. Bull, and N. Canagarajah, "Performance evaluation of transcoding algorithms for H.264," IEEE Trans. Consum. Electron., vol. 52, no. 1, pp. 215–221, Feb. 2006.
[44] W. Ye-Kui, M. Hannuksela, S. Pateux, A. Eleftheriadis, and S. Wenger, "System and transport interface of SVC," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1149–1163, Sep. 2007.
[45] C. Lamy-Bergot and B. Gadat, "Embedding protection inside H.264/AVC and SVC streams," EURASIP J. Wireless Commun. Netw., vol. 2010, 2010.
[46] Z. Li, J. Chakareski, X. Niu, Y. Zhang, and W. Gu, "Modeling of distortion caused by Markov-model burst packet losses in video transmission," in Proc. IEEE Int. Workshop Multimedia Signal Processing (MMSP), Oct. 2009, pp. 1–6.
Pedro A. Assuncao (M'98) received the Licenciado and M.Sc. degrees in electrical engineering from the University of Coimbra, Coimbra, Portugal, in 1988 and 1993, respectively, and the Ph.D. degree in electronic systems engineering from the University of Essex, Essex, U.K., in 1998. He is currently a Professor of Electrical Engineering and Multimedia Communication Systems at the Polytechnic Institute of Leiria and a researcher at the Instituto de Telecomunicações, Coimbra, Portugal. He is the author/co-author of more than 70 scientific/technical papers, three book chapters, and three U.S. patents. His current research interests include 2-D/3-D video coding, adaptation to diverse networking and user environments, multiple description coding, power-aware video coding, audiovisual error concealment, and perceptual quality evaluation.
Pedro Correia received the Licenciado degree in electrical engineering and the M.Sc. degree from the University of Coimbra, Coimbra, Portugal, in 1996 and 2003, respectively. He is currently pursuing the Ph.D. degree at the University of Coimbra. Since 1999, he has been with the Polytechnic Institute of Tomar, Tomar, Portugal. His research activities are carried out at the Instituto de Telecomunicações, Coimbra, Portugal. His research interests include image and video multiple description coding, rate control, and multipath network adaptation for multimedia communications.
Vitor Silva received the Licenciado and Ph.D. degrees in electrical engineering from the University of Coimbra, Coimbra, Portugal, in 1984 and 1996, respectively. He is currently an Auxiliary Professor in the Department of Electrical and Computer Engineering, University of Coimbra. His research activities in signal processing, image and video compression, coding theory, and parallel computing are mainly carried out at the Instituto de Telecomunicações, Coimbra, Portugal, where he is the head of the Multimedia Signal Processing group. He has contributed to more than 100 papers in journals and international conferences.

1520-9210/$31.00 2012 IEEE
