A Unified Transcoding Approach To Fast Forward and Reverse Playback of Compressed Video

IEEE Transactions on Consumer Electronics, Vol. 49, No. 4, NOVEMBER 2003
Yap-Peng Tan, Member, IEEE, and YongQing Liang
Abstract: We propose a unified approach for realizing fast-forward and fast-reverse playback of a pre-coded video by generating a new, lower-bitrate video through video transcoding. To reduce the computational complexity, we develop fast algorithms to estimate the motion vectors required for transcoding the video. To accommodate changes due to frame skipping and to sustain satisfactory coding efficiency, we adaptively alter the group-of-pictures structure of the transcoded video to suit different playback speeds. Subjective tests are conducted to assess the perceived quality of video played back at various fast speeds. Experimental results and complexity analysis are presented to show the efficacy of the proposed approach.

Index Terms: Fast video playback, motion estimation, intra-coding refreshment, video transcoding.
I. INTRODUCTION

Fast-forward and fast-reverse playback are two common browsing functions found in many conventional video players. They allow users to find and access video segments of interest by scanning through a video at a faster-than-normal playback speed. It is, however, technically challenging and complex to incorporate these two functions in today's digital video systems, such as DVD players and online video-on-demand portals [1]-[3]. The reason is that most popular video compression standards, including the MPEG-1/2/4 and H.26x series, use a motion-compensated prediction scheme to reduce temporal redundancy for high compression efficiency [4]. Owing to this inter-frame dependency, a predictive-coded frame (or inter-coded frame) cannot be decoded prior to any of its reference frames; hence, direct access to the frame for fast video playback is difficult. Besides, many of these standards are designed so that only minimum processing resources are required if a compressed video is decoded for playback in a pre-determined frame order; decoding the video in any other order requires more computation or memory resources and incurs a longer processing delay. For example, decoding a compressed video in the reverse order requires additional frame memory to store more intermediate frames during the decoding process [5]. The problem becomes even more involved when fast video browsing is to act on remote video sources over bandwidth-limited transmission channels. It is this problem that the approach proposed in this paper is concerned with.

There are a number of ways in which a pre-coded video can be processed for fast forward and reverse playback over a network. The most straightforward one is to first fully decode the video in the video server, select the frames required for fast playback at the desired speed, and deliver only these frames to the client's player for display. For example, if the video is to be played back at three times the normal speed, only one out of three frames needs to be selected for display. To reduce the data size, the selected frames can be re-encoded before transmission. This scheme is generally known as cascaded video re-encoding, and its schematic block diagram is shown in Fig. 1. The scheme, however, is not suitable for many real-time applications, as the re-encoding process, which involves full-fledged motion estimation for inter-coded frames, is computationally expensive [6].

[Figure 1: block diagram. Pre-coded video -> Decoder -> Frame Selection -> Encoder -> Transcoded video.]
Fig. 1. Cascaded video re-encoding scheme.
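The frame-selection step of the cascaded scheme can be illustrated with a short sketch. The function name `select_frames` and its parameters are illustrative assumptions, not part of the paper; the sketch only lists which decoded frame indices would be kept for a given speed-up factor, in forward or reverse display order.

```python
def select_frames(num_frames, speedup, reverse=False):
    """Return the indices of the decoded frames kept for fast playback.

    Hypothetical helper: keeps one frame out of every `speedup` frames.
    For reverse playback the kept indices are returned in descending order.
    """
    if speedup < 1:
        raise ValueError("speedup must be a positive integer")
    kept = list(range(0, num_frames, speedup))
    return kept[::-1] if reverse else kept


# Playing 9 decoded frames at three times the normal speed keeps one in three.
print(select_frames(9, 3))
print(select_frames(9, 3, reverse=True))
```

Only the selected frames would then be passed to the encoder, which is where the cost of full motion estimation arises.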
This work was supported in part by a SingTel-NTU Joint R&D grant.
Yap-Peng Tan is with the School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore (email: eyptan@ntu.edu.sg).
Yongqing Liang was with the School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore. He is now with the National University of Singapore (email: tsllyq@nus.edu.sg).
Contributed Paper. Manuscript received April 17, 2003. 0098 3063/00 $10.00 2003 IEEE
To minimize the computational cost, several approaches have been proposed in the literature for realizing these two video browsing functions [5], [7]. To facilitate reverse playback, Chen et al. [5] propose to convert all the predictive-coded frames (P-frames) of MPEG videos to intra-coded frames (I-frames), thus eliminating the inter-frame dependencies between P-frames and I-frames. This approach demands additional computational and storage resources to convert and store the I-frames. In [7], Lin et al. propose to encode each video sequence in both forward and reverse orders in such a way that the intra-coded frames of the two encoded streams are interleaved with each other. Fast-forward or fast-reverse playback of this video is then realized by selecting the required frames from these two encoded streams while minimizing some predefined cost, such as the processing cost at the client's decoder or the data size to be delivered over the transmission network. The method, however, requires extra storage for the two encoded streams, and more frames than necessary are delivered over the network and processed by the decoder.

To overcome the problems mentioned above, we propose to realize fast-forward and fast-reverse playback of a pre-coded video by using frame-skipping video transcoding with fast motion estimation and intra-coding refreshment schemes. Although the frame-skipping video transcoding method proposed in [8] could be extended to support fast-forward video playback, it neither applies to fast-reverse playback nor considers the quality requirements for various playback speeds and the need to alter the transcoded group-of-pictures (GOP) structure. Even though the method proposed in [9] for reverse playback of MPEG video at the normal speed can be applied to support fast-reverse playback, it would impose a heavy processing burden on both the transmission network and the decoder, as the video needs to be delivered and decoded at a multiple of the original frame rate. For example, if the original frame rate is 30 frames/s and the target playback speed is four times the normal speed, the method needs to process the video at 120 frames/s. This is not an efficient or viable solution for systems with limited access bandwidth or minimal computation resources. Furthermore, such a high frame rate is usually not necessary because the human visual system has only a limited ability to discern fast-moving video content [10].

It appears that no existing work has explored the possibility of using a transcoding approach for fast-forward and fast-reverse video playback over a network. Hence, the main objective of this paper is to show that such an approach is viable, and that the quality degradation incurred by fast video transcoding is below the perceptible level of the human visual system, as evidenced by the subjective test results reported in this paper.

[Figure 2: block diagram of the transcoder. A video decoder restores the pre-coded frames and passes the existing motion vectors to a fast motion estimation block; only selected frames are transcoded by an encoder loop made of DCT, quantizer, inverse quantizer, inverse DCT, motion compensation, frame memory, rate control, and entropy encoder, which outputs the transcoded video. Two examples on pre-coded frames 0 to 17 are shown: the fast-forward output is labeled I 0, P 4, P 8, I 12, P 16, and the fast-reverse output is labeled P 16, I 12, P 8, P 4, I 0.]
Fig. 2. The proposed video transcoder for generating new compressed video for fast forward and reverse playback.
II. THE PROPOSED VIDEO TRANSCODER

We propose to realize fast-forward and fast-reverse playback of a pre-coded video by producing a new, lower-bitrate video using efficient frame-skipping video transcoding. Different from existing methods, the proposed approach i) transcodes only the frames required for playback at the desired fast speed, ii) employs a fast algorithm to estimate the motion vectors required for transcoding the video, and iii) makes use of an adaptive intra-coding refreshment scheme to alter the video's GOP structure when necessary. Fig. 2 shows the block diagram of the proposed video transcoder. It consists of a typical video decoder followed by a customized video encoder. Note that because the motion vectors required to transcode the video are estimated directly from the motion vectors existing in the pre-coded video, full-fledged motion estimation, arguably the most computationally expensive operation in an encoder, is not required in the proposed transcoder. Also shown in Fig. 2 are examples of the proposed fast-forward and fast-reverse video transcoding processes, where frames 0 to 17 of a pre-coded video have been restored by the front-end decoder. In each example, the target playback speed is four times the normal speed. Hence, only one out of every four pre-coded frames needs to be transcoded at the video server, sent over the network, and processed by the client's decoder. The main functions of the proposed transcoder are explained below.

A. Change of GOP Structure

Consider again the examples shown in Fig. 2. The GOP structure of the pre-coded video consists of one intra-coded frame (I-frame) followed by eight predictive-coded frames (P-frames). To fast forward or reverse the video at four times the normal speed, the GOP structure is changed to one I-frame followed by two P-frames or one I-frame followed by one P-frame; that is, the GOPs of the transcoded video may not all have the same number of frames. In general, a frame is transcoded as an I-frame if it is the first frame selected from a new GOP in the pre-coded video, for there are no existing motion vectors carrying the motion information between this frame and a previously encoded frame. Hence, we use the following rule to determine whether a frame is to be transcoded as an I-frame or a P-frame. Let $L$ denote the total number of frames in a GOP of the pre-coded video, $\alpha$ the speed-up factor of the fast video playback, and $k$ the frame number of a pre-coded frame to be transcoded. Note that, for simplicity of notation, the pre-coded frames are re-numbered in the reverse order when they are to be transcoded for fast-reverse playback, as shown in Fig. 2. As only one out of every $\alpha$ frames needs to be transcoded, it follows that frame $k$ is transcoded as an I-frame if $k \bmod L < \alpha$; otherwise, it is transcoded as a P-frame.

B. Fast Motion Estimation

Owing to the change of GOP structure and the correlation between two consecutive frames to be transcoded, new motion vectors are required for motion-compensated prediction in the transcoded video. To avoid the computationally expensive block-matching operation, a good approximation of the required motion vectors can be obtained by interpolating the existing motion vectors from the pre-coded video. Based in part on the technique proposed for frame-skipping video transcoding [8] and the method for estimating the reverse motion vectors for MPEG video [9], we have devised a unified transcoding approach for fast forward and reverse video playback, and evaluated four fast motion estimation algorithms (in-place, area-weighted average, maximum-overlap, and median), as detailed below.

Let $\mathrm{MB}(\mathbf{p};k)$ denote the macroblock with its top-left corner pixel located at $\mathbf{p}=(x,y)$ in frame $k$; $\mathbf{mv}(\mathbf{p};k)$ the existing motion vector of macroblock $\mathrm{MB}(\mathbf{p};k)$ in the pre-coded video; $\mathbf{mv}_F^{(\alpha)}(\mathbf{p};k)$ and $\mathbf{mv}_R^{(\alpha)}(\mathbf{p};k)$ the motion vectors to be estimated for macroblock $\mathrm{MB}(\mathbf{p};k)$ in the transcoded videos for fast-forward and fast-reverse playback, respectively, with speed-up factor $\alpha$ (i.e., only one out of $\alpha$ frames is transcoded); and $A_{(\mathbf{p};k)}(\mathbf{q};l)$ the portion (in terms of pixel count) of macroblock $\mathrm{MB}(\mathbf{q};l)$ involved in predicting or referencing macroblock $\mathrm{MB}(\mathbf{p};k)$ in the pre-coded video.

1) In-place

Simple in both concept and computation, the in-place algorithm estimates the motion vector for macroblock $\mathrm{MB}(\mathbf{p};k)$ in the transcoded video as follows:

$$\mathbf{mv}_F^{(\alpha)}(\mathbf{p};k) = \sum_{t=k-\alpha+1}^{k} \mathbf{mv}(\mathbf{p};t), \qquad \mathbf{mv}_R^{(\alpha)}(\mathbf{p};k) = -\sum_{t=k-\alpha+1}^{k} \mathbf{mv}(\mathbf{p};t-1), \tag{1}$$

where the motion vectors of macroblocks in the same location, $\mathbf{p}$, are accumulated over the skipped frames.

For the other three algorithms, we use the following recursive formulas to estimate the required motion vectors:

$$\begin{aligned}
\mathbf{mv}_F^{(\alpha)}(\mathbf{p};k) &= \mathbf{mv}_F^{(\alpha-1)}(\mathbf{p};k) + \mathbf{mv}_F^{(1)}\big(\mathbf{p}+\mathbf{mv}_F^{(\alpha-1)}(\mathbf{p};k);\,k-\alpha+1\big),\\
\mathbf{mv}_R^{(\alpha)}(\mathbf{p};k) &= \mathbf{mv}_R^{(\alpha-1)}(\mathbf{p};k) + \mathbf{mv}_R^{(1)}\big(\mathbf{p}+\mathbf{mv}_R^{(\alpha-1)}(\mathbf{p};k);\,k-\alpha+1\big).
\end{aligned} \tag{2}$$

Fig. 3 gives a schematic illustration of how the proposed recursive estimation of motion vectors is performed. The main differences among the following three algorithms lie in how $\mathbf{mv}_F^{(1)}(\mathbf{p};k)$ and $\mathbf{mv}_R^{(1)}(\mathbf{p};k)$ are computed, as given below. Note that the position at which these one-step vectors are evaluated in (2) generally does not fall on the macroblock grid, so more than one macroblock of the pre-coded video can be involved.

[Figure 3: two panels, (a) fast-forward transcoding and (b) fast-reverse transcoding. Each panel shows frames $k-\alpha$ and $k-\alpha+1$, the macroblocks involved in predicting (a) or referencing (b) macroblock $\mathrm{MB}(\mathbf{p};k-\alpha+1)$, and the vectors $\mathbf{mv}^{(\alpha)}(\mathbf{p};k)$, $\mathbf{mv}^{(\alpha-1)}(\mathbf{p};k)$ and $\mathbf{mv}^{(1)}(\mathbf{p};k-\alpha+1)$.]
Fig. 3. Recursive motion estimation for fast-forward and fast-reverse video playbacks.

2) Area-weighted average

$$\mathbf{mv}_F^{(1)}(\mathbf{p};k) = +\frac{\sum_{\mathbf{q}\in Q_F} \mathbf{mv}(\mathbf{q};k)\,A_{(\mathbf{p};k)}(\mathbf{q};k)}{\sum_{\mathbf{q}\in Q_F} A_{(\mathbf{p};k)}(\mathbf{q};k)}, \qquad \mathbf{mv}_R^{(1)}(\mathbf{p};k) = -\frac{\sum_{\mathbf{q}\in Q_R} \mathbf{mv}(\mathbf{q};k-1)\,A_{(\mathbf{p};k)}(\mathbf{q};k-1)}{\sum_{\mathbf{q}\in Q_R} A_{(\mathbf{p};k)}(\mathbf{q};k-1)}, \tag{3}$$

where $Q_F = \{\mathbf{q} : A_{(\mathbf{p};k)}(\mathbf{q};k) > 0\}$ and $Q_R = \{\mathbf{q} : A_{(\mathbf{p};k)}(\mathbf{q};k-1) > 0\}$ are sets containing the indices of the macroblocks in frame $k$ (for fast-forward transcoding) or frame $k-1$ (for fast-reverse transcoding) involved in predicting or referencing macroblock $\mathrm{MB}(\mathbf{p};k)$ in the pre-coded video. The required motion vector is interpolated from the existing motion vectors of macroblocks in set $Q_F$ or $Q_R$, each weighted by the area of its portion involved in predicting or referencing macroblock $\mathrm{MB}(\mathbf{p};k)$. The reason is that the larger the portion of the predicting or referencing macroblock, the more influence it has over the motion-compensated residue of macroblock $\mathrm{MB}(\mathbf{p};k)$; hence, its motion vector should be given more emphasis.

3) Median

$$\mathbf{mv}_F^{(1)}(\mathbf{p};k) = +\mathbf{mv}(\mathbf{p}_F;k), \qquad \mathbf{mv}_R^{(1)}(\mathbf{p};k) = -\mathbf{mv}(\mathbf{p}_R;k-1), \tag{4}$$

and

$$\mathbf{p}_F = \arg\min_{\mathbf{r}\in Q_F} \sum_{\mathbf{q}\in Q_F} \big\|\mathbf{mv}(\mathbf{r};k)-\mathbf{mv}(\mathbf{q};k)\big\|, \qquad \mathbf{p}_R = \arg\min_{\mathbf{r}\in Q_R} \sum_{\mathbf{q}\in Q_R} \big\|\mathbf{mv}(\mathbf{r};k-1)-\mathbf{mv}(\mathbf{q};k-1)\big\|, \tag{5}$$

where $\|\cdot\|$ denotes the Euclidean norm, and the algorithm selects the motion vector that is most similar, on average, to all the motion vectors of macroblocks in set $Q_F$ or $Q_R$.

4) Maximum-overlap

$$\mathbf{mv}_F^{(1)}(\mathbf{p};k) = +\mathbf{mv}(\mathbf{p}_F;k), \qquad \mathbf{mv}_R^{(1)}(\mathbf{p};k) = -\mathbf{mv}(\mathbf{p}_R;k-1), \tag{6}$$

and

$$\mathbf{p}_F = \arg\max_{\mathbf{q}\in Q_F} A_{(\mathbf{p};k)}(\mathbf{q};k), \qquad \mathbf{p}_R = \arg\max_{\mathbf{q}\in Q_R} A_{(\mathbf{p};k)}(\mathbf{q};k-1). \tag{7}$$
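The fast-forward estimators of Section II-B can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: motion vectors are full-pixel integers stored per frame in a dictionary keyed by the macroblock's top-left corner, a vector points from a macroblock to its reference block in the previous frame, macroblocks missing from the dictionary (for example outside the picture) are ignored, and a zero vector is returned when no candidate exists. All names (`overlaps`, `one_step`, `mv_forward`, `mv_in_place`) are hypothetical. Only the in-place, maximum-overlap and area-weighted variants are shown, and only for the forward direction.

```python
MB = 16  # macroblock width and height in pixels


def overlaps(p):
    """Map each grid-aligned macroblock overlapped by the MB x MB block
    whose top-left corner is p to its overlap area in pixels."""
    x, y = p
    x0, y0 = (x // MB) * MB, (y // MB) * MB
    areas = {}
    for qx in (x0, x0 + MB):
        for qy in (y0, y0 + MB):
            w, h = MB - abs(x - qx), MB - abs(y - qy)
            if w > 0 and h > 0:
                areas[(qx, qy)] = w * h
    return areas


def one_step(field, p, mode):
    """One-step vector at position p from the pre-coded vectors of one frame."""
    cand = {q: a for q, a in overlaps(p).items() if q in field}
    if not cand:
        return (0, 0)  # assumption: fall back to zero motion
    if mode == "max_overlap":
        return field[max(cand, key=cand.get)]
    total = sum(cand.values())  # area-weighted average, rounded to full pixels
    return tuple(round(sum(field[q][i] * a for q, a in cand.items()) / total)
                 for i in (0, 1))


def mv_forward(fields, p, k, alpha, mode="max_overlap"):
    """Recursive estimate of the vector from frame k to frame k - alpha."""
    total = (0, 0)
    for step in range(alpha):
        pos = (p[0] + total[0], p[1] + total[1])
        dx, dy = one_step(fields[k - step], pos, mode)
        total = (total[0] + dx, total[1] + dy)
    return total


def mv_in_place(fields, p, k, alpha):
    """In-place estimate: sum the co-located vectors of frames k-alpha+1..k."""
    vecs = [fields[t][p] for t in range(k - alpha + 1, k + 1)]
    return (sum(v[0] for v in vecs), sum(v[1] for v in vecs))


# Two pre-coded frames, two macroblocks each, speed-up factor 2.
fields = {2: {(0, 0): (12, 0), (16, 0): (0, 0)},
          1: {(0, 0): (2, 4), (16, 0): (-6, 0)}}
print(mv_in_place(fields, (0, 0), 2, 2))
print(mv_forward(fields, (0, 0), 2, 2, "max_overlap"))
print(mv_forward(fields, (0, 0), 2, 2, "area_weighted"))
```

In this toy example the first step moves the block 12 pixels to the right, so in the next frame it overlaps the neighbouring macroblock more than the co-located one; the in-place estimate ignores this, while the two recursive variants follow the displaced block.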
In (6) and (7), the maximum-overlap algorithm selects the motion vector of the macroblock that has the largest portion involved in predicting or referencing macroblock $\mathrm{MB}(\mathbf{p};k)$ in the pre-coded video. Note that if all the macroblocks involved in estimating the new motion vector are intra-coded, the current macroblock will also be intra-coded. Furthermore, the motion vectors obtained for fast playback at one speed-up factor can be reused to estimate the motion vectors for a higher speed-up factor.

C. Intra-coding Refreshment

When the target playback speed increases, more frames are skipped and the correlation between two consecutive frames to be transcoded generally decreases. Consequently, more macroblocks cannot be well predicted from their reference frame, particularly for video with large and complex motion. When there are many such macroblocks, the frame can be transcoded more efficiently (i.e., with fewer bits) by intra-coding than by inter-coding. In the proposed transcoder, the motion-compensated residue of a frame is evaluated to decide whether the frame is to be intra- or inter-coded. For simplicity, we only provide the decision rule for fast-forward video transcoding in the following; a similar rule can be used for fast-reverse video transcoding. Let

$$\mathrm{SAD}\big(\mathbf{mv}_F^{(\alpha)}(\mathbf{p};k)\big) = \sum_{\mathbf{n}\in N} \big| I(\mathbf{p}+\mathbf{n};k) - I\big(\mathbf{p}+\mathbf{mv}_F^{(\alpha)}(\mathbf{p};k)+\mathbf{n};\,k-\alpha\big) \big|,$$

$$M(\mathbf{p};k) = \frac{1}{|N|}\sum_{\mathbf{n}\in N} I(\mathbf{p}+\mathbf{n};k),$$

$$D(\mathbf{p};k) = \sum_{\mathbf{n}\in N} \big| I(\mathbf{p}+\mathbf{n};k) - M(\mathbf{p};k) \big|,$$

where $\mathrm{SAD}\big(\mathbf{mv}_F^{(\alpha)}(\mathbf{p};k)\big)$ denotes the sum of absolute differences between macroblocks $\mathrm{MB}(\mathbf{p};k)$ and $\mathrm{MB}\big(\mathbf{p}+\mathbf{mv}_F^{(\alpha)}(\mathbf{p};k);\,k-\alpha\big)$, $N = \{(i,j) : i,j = 0,1,\ldots,15\}$, $|N| = 256$, $M(\mathbf{p};k)$ is the mean intensity value of macroblock $\mathrm{MB}(\mathbf{p};k)$, and $I(\mathbf{p};k)$ is the intensity value of pixel $\mathbf{p}$ in frame $k$ of the pre-coded video. For a given frame $k$, intra-coding is preferred over inter-coding if

$$\frac{1}{|P|}\sum_{\mathbf{p}\in P} 1\Big\{\big(\mathrm{SAD}\big(\mathbf{mv}_F^{(\alpha)}(\mathbf{p};k)\big) - D(\mathbf{p};k)\big) > C\Big\} > THD,$$

where $1\{\cdot\}$ is an indicator function that takes the value 1 or 0 depending on whether or not the argument condition is true, set $P$ contains all macroblocks in the transcoded frame, $|P|$ denotes the total number of these macroblocks, and $C$ and $THD$ are two pre-determined thresholds. In our experiments, $C$ and $THD$ are empirically set to 500 and 0.2, respectively, based on the results of a set of test sequences.

III. SUBJECTIVE QUALITY TEST

As most transcoding processes introduce coding distortion, some loss of video quality is unavoidable, especially when the target bitrate is low and a fast motion estimation algorithm is employed in the transcoding process. It is therefore necessary to examine whether such quality degradation can be perceived by human eyes when a video is played back at a faster-than-normal speed.

We conducted a series of subjective tests to assess the effect of fast video playback on human visual acuity. To simulate the perceived quality of video playback at various fast speeds, we temporally downsampled the original frame sequence of each test video at the desired fast speeds, and then used a standard H.263 encoder to compress the sampled frames at average peak signal-to-noise ratios (PSNR) ranging from 25 dB to 39 dB. For each fast speed tested, the sequence with PSNR equal to 39 dB was used as the assessment reference (or benchmark) for comparison.

Five subjects with normal visual acuity participated in the tests. They were research students in engineering disciplines, and three of them had conducted experiments involving visual stimuli. The subjective tests were conducted in a well-lit laboratory. A computer system with a 17-inch high-quality LCD display was used to show the test sequences at various fast speeds. The test sequences were displayed in CIF format at the center of a mid-gray screen. The subjects sat approximately 20 inches from the display so that they could clearly see the video details.

Three test videos (Foreman, News, and Stefan) were used in the subjective tests, and the DSCQS (Double Stimulus Continuous Quality Scale) test method of ITU-R BT.500-8 [11] was adopted. At the beginning of a test session, each subject was presented with a few trial videos of different qualities so that the subject could establish his/her internal sensitivity scale for a reliable judgment. During the test, a series of sequence pairs, each pair consisting of one 39-dB reference sequence and one impaired sequence at a lower PSNR value, were shown alternately, twice, at each fast playback speed. Fig. 4 shows the display order of the test sequence pairs. At the end of the two display cycles, the subject was asked to decide which sequence appeared to have a lower quality. For each test video, the minimum PSNR degradation perceptible to a subject was determined from the pair of sequences with the smallest PSNR difference that could be correctly identified by the subject. To obtain reliable and reproducible results, the test sequences were shown to each subject in no particular order, and some tests were randomly repeated without the subject's awareness.

Fig. 4. Display order of sequence pairs in subjective tests.

Table I lists the average minimum PSNR degradations of the three test videos perceptible to the five subjects at various fast playback speeds. A few observations from the results are in order: i) The faster the playback speed, the larger the minimum PSNR degradation that could be perceived by the subjects; ii) When the speed-up factor increases beyond a certain value, the minimum perceptible PSNR degradation changes slowly; and iii) Video with content of high and complex motion can withstand a larger PSNR degradation before the visual impairment (or distortion) becomes perceptible to the subjects. For example, the Stefan test video (a tennis video with relatively large motion) can withstand a larger PSNR degradation than the News test video.

The results of the subjective tests are consistent with the fact that human visual acuity decreases as video motion increases [10]. The results also support our proposal that a fast video transcoding method can be used to realize fast-forward and fast-reverse video playback with little or no apparent loss of quality, as long as the degradation incurred is smaller than what is perceptible to human observers.
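As an illustration of the two frame-type rules of Section II (the GOP rule of Section II-A and the intra-coding refreshment rule of Section II-C), the sketch below combines them into a single decision. The function names and the way the two rules are combined with a logical OR are assumptions made for this example; the per-macroblock SAD and D values are taken as given inputs rather than computed from pixels, and the default thresholds are the values C = 500 and THD = 0.2 reported in the text.

```python
def is_gop_intra(k, gop_len, alpha):
    """GOP rule: frame k is an I-frame if k mod L < alpha."""
    return k % gop_len < alpha


def prefers_intra(sad, dev, c=500, thd=0.2):
    """Refreshment rule: the fraction of macroblocks whose SAD exceeds
    their deviation D by more than c must be greater than thd."""
    flagged = sum(1 for s, d in zip(sad, dev) if s - d > c)
    return flagged / len(sad) > thd


def frame_type(k, gop_len, alpha, sad, dev):
    """Return "I" or "P" for transcoded frame k (assumed combination)."""
    if is_gop_intra(k, gop_len, alpha) or prefers_intra(sad, dev):
        return "I"
    return "P"


# GOP of 9 frames (one I-frame plus eight P-frames), speed-up factor 4,
# and macroblocks that are all well predicted (SAD - D = 0).
well_predicted = ([0] * 4, [0] * 4)
print([frame_type(k, 9, 4, *well_predicted) for k in (0, 4, 8, 12, 16)])
```

With a GOP length of 9 and a speed-up factor of 4, the selected frames 0, 4, 8, 12 and 16 come out as I, P, P, I, P, which agrees with the fast-forward labels in Fig. 2.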
TABLE I
THE AVERAGE MINIMUM PSNR DEGRADATIONS (IN dB) PERCEPTIBLE TO FIVE SUBJECTS AT VARIOUS FAST PLAYBACK SPEEDS

Speed-up factor   Foreman   News   Stefan
1                 4.5       2.1    5.5
2                 5.1       3.0    6.3
3                 5.6       3.3    9.0
4                 6.0       3.6    10.5
5                 6.9       4.0    11.3
6                 7.1       4.2    12.8
7                 7.1       4.6    13.2
8                 7.1       4.9    13.8
9                 7.4       5.4    14.0

IV. EXPERIMENTAL RESULTS

Experimental results are presented in this section to show the efficacy of the proposed approach. The first experiment was conducted to evaluate the performance of various fast motion estimation algorithms. Four test videos (Foreman, Stefan, Mobile & Calendar (M&C), and BBC) were pre-coded as H.263 bit streams in CIF format, at 30 frames/s, with an IPPP... GOP (group-of-pictures) structure. Each GOP had 60 frames. The motion vectors in the pre-coded video were obtained by using a full-search, block-matching motion estimation algorithm with a search window of 31 x 31 pixels at full-pixel accuracy. The pre-coded videos were then transcoded to various fast-forward and fast-reverse videos of lower bitrates with speed-up factors ranging from 1 to 10. For comparison, both the full-search motion estimation algorithm (full-search) and the fast motion estimation algorithms under examination were used to obtain the motion vectors of the transcoded video. Table II summarizes the pre-coded bitrates and the transcoded bitrates for the four test videos.

TABLE II
THE PRE-CODED BITRATES AND TRANSCODED BITRATES FOR THE TEST VIDEOS USED IN THE EXPERIMENTS

Test video   Pre-coded bitrate (Kbps)   Transcoded bitrate (Kbps)
Foreman      512                        256
Stefan       1500                       512
M&C          2048                       768
BBC          1500                       768

Fig. 5 shows the PSNR results of the BBC test video transcoded for fast-reverse playback at three times the normal speed with the use of different motion estimation algorithms. The PSNR result for each transcoded video was computed by comparing each transcoded frame with its original, uncompressed frame. The average PSNR values obtained by using the full-search, maximum-overlap, area-weighted average, median and in-place algorithms are 27.95 dB, 26.99 dB, 26.74 dB, 25.53 dB and 26.18 dB, respectively. Based on the results of the subjective tests in Section III, the average difference of about 1 dB between the videos transcoded by the full-search and maximum-overlap algorithms is not likely to be visible when the videos are played back at this fast speed. This is confirmed by an independent subjective test comparing these two transcoded videos.

[Figure 5: PSNR (dB, axis from 23 to 30) versus frame number (axis from 6 to 36) for test video "BBC", with one curve each for full-search, max-overlap, area-weighted avg, median, and in-place.]
Fig. 5. PSNRs of the BBC test video transcoded by different motion estimation algorithms for fast-reverse playback at three times the normal speed.
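For readers who want to reproduce this kind of measurement, the per-frame PSNR and its average over a sequence can be sketched as follows. The peak value of 255 (8-bit samples) and the representation of a frame as a flat list of intensities are assumptions of this example; the paper does not state these details, and the function names are hypothetical.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equal-length sequences of pixel intensities."""
    if len(original) != len(decoded):
        raise ValueError("frames must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def average_psnr(originals, decodeds, peak=255):
    """Mean of the per-frame PSNR values over a sequence of frames."""
    values = [psnr(o, d, peak) for o, d in zip(originals, decodeds)]
    return sum(values) / len(values)


# A frame whose every pixel is off by 10 intensity levels.
print(round(psnr([100] * 4, [110] * 4), 2))
```

Each transcoded frame is compared with its original, uncompressed counterpart, as described in the text above.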
Fig. 6 shows a sample BBC frame transcoded by using the full-search and the maximum-overlap algorithms at about the same bitrate. The PSNR values for these two transcoded frames are 29.17 dB and 27.75 dB, respectively. Although it may be possible to tell the visual difference between these two still frames on close inspection, it is rather difficult to do so when the videos are decoded for display as a fast-reverse playback at three times the normal speed. Hence, instead of the computationally expensive full-search algorithm, the maximum-overlap algorithm can be used to reduce the time required to transcode the video.
TABLE III
AVERAGE PSNRS (IN dB) OF THE TEST VIDEOS TRANSCODED BY VARIOUS FAST MOTION ESTIMATION ALGORITHMS AT A SPEED-UP FACTOR OF 2

(a) Fast-forward at two times the normal speed.

Algorithm \ Sequence    Foreman   Stefan   M&C     BBC
Full search             31.51     27.25    24.33   27.33
Max. overlap            30.65     26.56    23.53   26.05
Area-wght. average      29.75     26.18    23.17   25.63
Median                  29.63     26.33    23.31   25.47
In-place                30.62     26.44    23.53   25.31

(b) Fast-reverse at two times the normal speed.

Algorithm \ Sequence    Foreman   Stefan   M&C     BBC
Full search             31.23     26.94    24.25   27.28
Max. overlap            30.86     26.10    23.04   26.20
Area-wght. average      30.17     25.84    22.81   25.82
Median                  30.34     25.86    22.66   26.09
In-place                30.86     26.10    23.03   26.17

Fig. 6. A sample frame of the BBC test video transcoded by using the (a) full-search (29.17 dB) and (b) maximum-overlap (27.75 dB) motion estimation algorithms.

Table III shows the average PSNR results of the four test videos transcoded at a speed-up factor of 2 using the full-search and the fast motion estimation algorithms under examination. The average PSNR difference between the full-search and the maximum-overlap algorithms for these four test videos is less than about 1 dB. It should be noted that although the maximum-overlap algorithm generally outperforms the in-place algorithm, the two perform similarly if the speed-up factor is small or the video consists mainly of content with small motion. This is because, in the case of small motion, the reference macroblocks, and thus the motion vectors, selected by these two algorithms are mostly the same. Therefore, videos with primarily slow-moving scenes can be transcoded by the in-place algorithm for lower computational complexity, while videos with fast and complex motion can be transcoded by the maximum-overlap algorithm for better video quality.

[Figure 7: PSNR (dB) versus speed-up factor (1 to 10) in two panels, "Fast forward: test video Stefan" and "Fast reverse: test video M&C", each with curves for Full-search, Intra-coding, and Proposed.]
Fig. 7. Average PSNRs of the transcoded fast-forward Stefan video and fast-reverse Mobile & Calendar video at various fast speeds obtained by re-encoding scheme with full-search motion estimation, pure intra-coding scheme, and the proposed transcoding approach.
The second experiment was conducted to test the performance of the proposed adaptive intra-coding refreshment scheme. In the experiment, the Stefan and Mobile & Calendar test videos were transcoded for fast-forward and fast-reverse playback with speed-up factors ranging from 1 to 10 by using three methods: the re-encoding scheme with the full-search motion estimation algorithm, the pure intra-coding scheme (i.e., all the selected frames are intra-coded), and the proposed transcoding approach using the adaptive intra-coding refreshment scheme coupled with the maximum-overlap motion estimation algorithm. Fig. 7 shows the average PSNR results obtained. It can be seen that the re-encoding scheme performs the best at low speed-up factors, while the pure intra-coding scheme can perform better at high speed-up factors. The proposed transcoding approach performs close to the re-encoding scheme at low speed-up factors and similarly to the pure intra-coding scheme at high speed-up factors. In other words, the proposed approach performs nearly as well as the better of the two reference schemes at every speed-up factor tested.

V. COMPLEXITY ANALYSIS

As transcoding is to be performed prior to video transmission, its computational complexity is of much concern for the application of fast forward and reverse video playback. In this section, we compare the computational complexity of the proposed approach with that of the cascaded video re-encoding scheme (Fig. 1). Table IV shows the number of arithmetic operations required to transcode one predictive-coded frame (P-frame) of a pre-coded CIF video for fast-forward or fast-reverse playback at four times the normal speed. For the proposed transcoding approach, both the in-place and maximum-overlap fast motion estimation algorithms described in Section II are used in the analysis; for the cascaded re-encoding scheme, both the full-search and three-step search [12] motion estimation algorithms using a search window of 31 x 31 pixels at full-pixel accuracy are used as benchmarks. The details of the analysis are given in the Appendix. It can be seen from Table IV that most of the computations required by the cascaded re-encoding scheme come from motion estimation. With the use of a fast motion estimation algorithm (i.e., in-place or maximum-overlap), the proposed transcoding approach is about 27 times faster than the cascaded re-encoding scheme using full-search motion estimation and about 2 times faster than that using three-step search motion estimation.

VI. CONCLUSION

We have proposed in this paper an efficient transcoding approach for realizing fast-forward and fast-reverse playback of pre-coded videos over a bandwidth-limited network. In particular, a unified framework has been developed for estimating the motion vectors required to transcode the videos for fast forward and reverse playback at various speeds, and several fast motion estimation algorithms have been evaluated. To account for the properties of the human visual system, subjective tests have also been conducted to assess the minimum quality degradation that is perceptible to human eyes when a video is played back at various fast speeds. To this end, an efficient video transcoding approach combining the merits of intra-coding refreshment and inter-coding using a fast motion estimation algorithm is obtained. Experimental results and complexity analysis are reported to show the efficacy of the proposed approach.
APPENDIX The number of arithmetic operations required for each main function in Table IV is obtained as follows: For DCT/IDCT, the fast algorithms proposed by Arai et al. in [13] require 80 multiplications and 464 additions for each 8 8 pixel block. In addition, 64 divisions or multiplications are required to perform quantization or inverse quantization. Assuming that a division costs as much as a multiplication, the DCT/IDCT and quantization/ inverse quantization for each 8 8 block requires 144 multiplicative and 464 additive operations. Motion compensation (or its inverse) for each 16 16 macroblock (four 8 8 luminance blocks and two 8 8 chrominance blocks in 4:2:0 chroma sub-sampling) requires 6 8 8 = 384 additions. To perform full-search (FS) motion estimation within a search window of 31 31 pixels at full-pixel accuracy, there are 31 31 = 961 search locations to be examined. Assuming that the sum-of-absolute difference is used as the block-matching criterion, at each search location, 16 16 pixel pairs are compared, and each comparison requires three operationsa subtraction, an absolute-value calculation, and an addition. If it costs as much to perform a subtraction, an absolute-value calculation, or an addition, 961 16 2 3 = 738048 additive operations are required to perform full-search motion estimation for each macroblock. To perform three-step search (TSS) within a search window of 31 31 pixels at full-pixel accuracy, there are 1 + 8 log 2 (15 + 1) = 33 search locations to be examined. Hence, 33 16 2 3 = 25344 additive operations are required to perform TSS fast motion estimation for each macroblock. Referring to (1), 2 additions are required to compute the motion vector (both horizontal and vertical components) for each macroblock in the transcoded video for a speed-up factor of . 
If all the 396 macroblocks in a P-frame of CIF resolution are to be inter-coded with motion vectors computed by (1), 396 × 2 × 4 = 3168 additions are required for transcoding the frame for 4 times the normal playback speed.

With reference to (6) and (7), let |Q| denote the number of macroblocks in set Q (Q_F or Q_R). Each macroblock in set Q requires one multiplication (to compute the area of the involved portion) and two additions/subtractions (to compute the two dimensions of the involved portion). In addition, |Q| - 1 comparisons are required to identify from set Q the macroblock that has the largest portion involved in predicting or referencing macroblock MB(p; k). Another two additions are needed for (2) to accumulate the motion vector in each recursion step. Assuming that a comparison costs as much as an addition, overall 3|Q| + 1 additive and |Q| multiplicative operations are required for each macroblock in each recursion step.

For fast-forward transcoding, there are at most four macroblocks in set Q_F. For fast-reverse transcoding, the number of referencing macroblocks in set Q_R of (7) could vary from macroblock to macroblock and depends on the magnitudes of the pre-coded motion vectors. For the sake of simplicity and without loss of generality, we shall assume that the average number of macroblocks in set Q_R is equal to four. Hence, if all the 396 macroblocks in a P-frame of CIF resolution are to be inter-coded, 396 × 4 × 4 = 6336 multiplicative and 396 × (3 × 4 + 1) × 4 = 20592 additive operations are required to perform the maximum-overlap motion estimation algorithm for 4 times the normal playback speed.

ACKNOWLEDGMENT
The authors would like to thank Ms. Yu Jun for her assistance in conducting the subjective tests.
REFERENCES
[1] T. D. C. Little and D. Venkatesh, "Prospects for interactive video-on-demand," IEEE Multimedia, vol. 13, pp. 14-24, Aug. 1994.
[2] D. Deloddere, W. Verbiest, and H. Verhille, "Interactive video on demand," IEEE Communications Magazine, vol. 32, no. 5, pp. 82-88, May 1994.
[3] E. L. Abram-Profeta and K. G. Shin, "Providing unrestricted VCR functions in multicast video-on-demand servers," Intl. Conf. Multimedia Computing and Systems, pp. 66-75, July 1998.
[4] B. G. Haskell et al., "Image and video coding - emerging standards and beyond," IEEE Trans. Circ. and Syst. Video Tech., vol. 8, pp. 814-837, Nov. 1998.
[5] M.-S. Chen and D. D. Kandlur, "Stream conversion to support interactive video playout," IEEE Multimedia, vol. 3, no. 2, pp. 51-58, Summer 1996.
[6] B. Shen, I. K. Sethi, and B. Vasudev, "Adaptive motion-vector resampling for compressed video downscaling," IEEE Trans. Circ. and Syst. Video Tech., vol. 9, no. 6, pp. 929-936, Sept. 1999.
[7] C.-W. Lin et al., "MPEG video streaming with VCR-functionality," IEEE Trans. Circ. and Syst. Video Tech., vol. 11, pp. 415-425, Mar. 2001.
[8] J.-N. Hwang et al., "Dynamic frame-skipping in video transcoding," IEEE Workshop on Multimedia Signal Processing, pp. 616-621, 1998.
[9] S. Wee, "Reversing motion vector fields," Intl. Conf. on Image Processing, pp. 209-212, Oct. 1998.
[10] B. A. Wandell, Foundations of Vision, Sinauer Associates, Inc., 1995.
[11] ITU-R, Recommendation BT.500-8, "Methodology for the subjective assessment of the quality of television pictures," Sept. 1998.
[12] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, "Motion-compensated interframe coding for video conferencing," IEEE National Telecommunication Conf., vol. 4, pp. G5.3.1-G5.3.5, Nov. 1981.
[13] Y. Arai, T. Agui, and M. Nakajima, "A fast DCT-SQ scheme for images," Trans. of IEICE, vol. E71, no. 11, pp. 1095-1097, Nov. 1988.
Yap-Peng Tan (M'97) received the B.S. degree in electrical engineering from National Taiwan University, Taiwan, R.O.C., in 1993, and the M.A. and Ph.D. degrees in electrical engineering from Princeton University, Princeton, New Jersey, in 1995 and 1997, respectively. He was the recipient of an IBM Graduate Fellowship from IBM T. J. Watson Research Center, Yorktown Heights, New York, from 1995 to 1997. He was with Intel and Sharp Labs of America from 1997 to 1999. Since November 1999, he has been a faculty member at the Nanyang Technological University of Singapore. His current research interests include image and video processing, content-based multimedia analysis, computer vision, and pattern recognition. He is the principal inventor on four US patents in the areas of image and video processing.

Yongqing Liang received the B.S. degree from University of Science and Technology of China in 1997, and the M.Eng. degree from Nanyang Technological University, Singapore, in 2003. He is currently an associate scientist at National University of Singapore. His research interests include image/video processing, perceptual audio coding, communication, and multimedia systems.
TABLE IV
THE NUMBER OF ARITHMETIC OPERATIONS REQUIRED TO TRANSCODE ONE P-FRAME OF A PRE-CODED CIF VIDEO FOR FAST FORWARD AND REVERSE PLAYBACK AT FOUR TIMES THE NORMAL SPEED

Complexity (number of arithmetic operations)

Function                                                                        Multiplicative        Additive
Input four P-frames at CIF resolution
  - Inverse quantization + IDCT (144 muls., 464 adds. per 8 × 8 block)                 1368576         4409856
  - Inverse motion compensation (256 adds. per macroblock)                                   -          608256
Output one P-frame at CIF resolution
  - Motion compensation (384 adds. per macroblock)                                           -          152064
  - DCT + quantization (144 muls., 464 adds. per 8 × 8 block)                           342144         1102464
Motion estimation algorithm
  (a) Full search, 31 × 31 search window (747269 adds. per macroblock)                       -       292267008
  (b) Three-step search, 31 × 31 search window (34565 adds. per macroblock)                  -        10036224
  (c) In-place (8 adds. per macroblock)                                                      -            3168
  (d) Maximum-overlap (16 muls., 72 adds. per macroblock)                                 6336           20592

Different approaches                                                            Multiplicative        Additive          Total*
  1. Cascaded re-encoding using full-search ME algorithm                               1710720       298539648       303671808
  2. Cascaded re-encoding using three-step-search ME algorithm                         1710720        16308864        21441024
  3. Proposed approach using in-place ME algorithm                                     1710720         6275808        11407968
  4. Proposed approach using maximum-overlap ME algorithm                              1717056         6293232        11444400

* Assume that an addition requires one arithmetic operation and a multiplication requires three arithmetic operations.
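The totals in Table IV follow from the per-block and per-macroblock counts derived in the Appendix. The sketch below recomputes them as a consistency check. The constant and function names are hypothetical; the sketch assumes 396 macroblocks per CIF frame, six 8 × 8 blocks per macroblock (4:2:0), 384 additions per macroblock for motion compensation and its inverse, and the Appendix values for the motion estimation algorithms.

```python
MB_PER_FRAME = 396      # CIF, 352 x 288 pixels: 22 x 18 macroblocks
BLOCKS_PER_MB = 6       # four luminance + two chrominance 8 x 8 blocks
SPEEDUP = 4             # four input P-frames per output P-frame
MUL_WEIGHT = 3          # one multiplication counted as three operations


def fixed_cost():
    """(muls, adds) shared by all four approaches: decode SPEEDUP input
    frames and re-encode one output frame, excluding motion estimation."""
    blocks = (SPEEDUP + 1) * MB_PER_FRAME * BLOCKS_PER_MB
    muls = 144 * blocks                      # (I)DCT + (inverse) quantization
    adds = 464 * blocks                      # (I)DCT additions
    adds += 384 * MB_PER_FRAME * (SPEEDUP + 1)   # (inverse) motion compensation
    return muls, adds


def me_cost(algorithm, q_size=4):
    """(muls, adds) per output frame for one motion estimation algorithm."""
    if algorithm == "full-search":
        return 0, MB_PER_FRAME * 961 * 16 * 16 * 3
    if algorithm == "three-step":
        return 0, MB_PER_FRAME * 33 * 16 * 16 * 3
    if algorithm == "in-place":
        return 0, MB_PER_FRAME * 2 * SPEEDUP
    if algorithm == "maximum-overlap":
        return (MB_PER_FRAME * q_size * SPEEDUP,
                MB_PER_FRAME * (3 * q_size + 1) * SPEEDUP)
    raise ValueError(f"unknown algorithm: {algorithm}")


def total_cost(algorithm):
    """(muls, adds, weighted total) for transcoding one P-frame."""
    fm, fa = fixed_cost()
    mm, ma = me_cost(algorithm)
    muls, adds = fm + mm, fa + ma
    return muls, adds, MUL_WEIGHT * muls + adds


if __name__ == "__main__":
    for name in ("full-search", "three-step", "in-place", "maximum-overlap"):
        print(name, total_cost(name))
```

Under these assumptions the weighted total for cascaded re-encoding with three-step search is roughly 1.9 times that of either proposed approach, and full search is roughly 27 times as large.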


Fig. 1. Cascaded video re-encoding scheme.
[Block-diagram labels: pre-coded video; fast-reverse pre-coded video; frames restored from pre-coded video; Video Decoder; existing motion vectors; only selected frames are transcoded; Quantizer; Inverse Quantizer; Entropy Encoder; transcoded video.]


[Equation fragments: forward and reverse motion vectors mv_F(p; k) and mv_R(p; k); macroblocks involved in referencing a macroblock MB(p; k); selection of p_R as the arg max of A over q in Q_R; the Euclidean norm; SAD, the sum of absolute differences between macroblocks, taken over N = {(i, j) : i, j = 0, 1, ..., 15} with |N| = 256; M(p; k), the mean intensity value of a macroblock; an indicator test of the form 1{...} > THD involving SAD(mv_F(p; k)), D(p; k), and a constant C.]


Fig. 4. Display order of sequence pairs in subjective tests.

[Table column headings: Speed-up factor; Foreman; News; Stefan; Test video; Pre-coded bitrate (Kbps); Transcoded bitrate (Kbps).]
[Plot legend: full-search, max-overlap, area-weighted avg, median, in-place.]
[Unlabeled table values:]
31.51 27.25 24.33 27.33
30.65 26.56 23.53 26.05
29.75 26.18 23.17 25.63
29.63 26.33 23.31 25.47
30.62 26.44 23.53 25.31
31.23 26.94 24.25 27.28
30.86 26.10 23.04 26.20
30.17 25.84 22.81 25.82
30.34 25.86 22.66 26.09
30.86 26.10 23.03 26.17
(b) Fast-reverse at two times the normal speed.
[Plot panels: Fast forward: test video "Stefan"; Fast reverse: test video "M & C". Curves: Full-search, Intra-coding, Proposed.]
