Motion Compensation Based Video Coder

Information Engineering and Technology Faculty German University in Cairo
Motion Compensation Based video coder on a DSP board
Bachelor Thesis
Author: Mohamed Ismail Mohamed
Supervisor: Dr. Gamal Fahmy
Submission Date: 13 July, 2009
This is to certify that:
(i) The thesis comprises only my original work towards the Bachelor Degree
(ii) Due acknowledgement has been made in the text to all other material used
Mohamed Ismail Mohamed 13 July , 2009
Abstract
The goal of video compression is to remove redundancy from a video sequence while preserving its fidelity. A video sequence exhibits both temporal and spatial redundancy: temporal redundancy arises from the correlation between consecutive frames in the sequence, and spatial redundancy from the correlation between neighboring elements inside each frame. Motion estimation and compensation predict frames in order to reduce temporal redundancy, while transform coding, such as the discrete cosine transform, removes spatial redundancy in the visual data. Consequently, the encoder uses fewer bits, allowing more efficient transmission and storage of the visual data. This thesis has two major purposes: (1) to design a hybrid motion-compensated discrete cosine transform video coder based on the block matching algorithm, and (2) to investigate the effect of changing video coding parameters and strategies on the visual quality of the reconstructed videos and on the coder complexity. Three video sequences were used in the empirical part. They contain different scenes with various characteristics, chosen to demonstrate the results and to highlight how the quality of each reconstructed sequence responds to changes in the coding parameters. The analysis showed that searching the reference frame with a smaller block always resulted in better quality; however, this also requires dividing the frame into more blocks and is more complex. The results also showed that searching a larger region of the reference frame for a specific block gives a better chance of finding the best matching block. Furthermore, the results illustrated the exception: for low-motion videos, quality does not improve when the search region is increased. In addition, this thesis explores three different search strategies and compares their performance.
Finally, the full coder was tested, including the discrete cosine transform applied to the frames that are not predicted, in order to reduce their encoded bits and to observe the effect on the quality of the reconstructed sequences.
Dedication
To my parents, Ismail Hafez and Safaa Moghazy.
I am grateful to my supervisor, Dr. Gamal Fahmy, for all the support during the process.
Contents
Chapter 1 Introduction
1.1 Importance of Video Compression
1.2 Objective
1.3 Methodology
1.4 Thesis Organization
Chapter 2 Background
2.1 Digital Video
2.2 Objective Video Quality
2.3 Color Spaces
2.3.1 RGB
2.3.2 YCbCr
2.4 Chroma Sub-Sampling
2.5 Digital Video Formats and Applications
Chapter 3 Video Compression Fundamentals
3.1 Video Coding Standards
3.2 MPEG-2 Coding Standard
3.2.1 Group of Pictures
3.3 Motion Estimation and Compensation
Chapter 4 Block Matching Algorithm
4.1 Block Matching Algorithm Compression Criteria
4.2 Search Algorithm for Motion Estimation
4.2.1 Full Search Block Matching Algorithms
4.2.2 Fast Search Block Matching Algorithm
4.2.2.1 Logarithmic Search Strategy
Chapter 5 Transform Coding
5.1 Two Dimensional Discrete Cosine Transform
5.2 Quantization
Chapter 6 Analog Devices Hardware & Software Experience
6.1 ADZS-BF561-EZLITE
6.2 VisualDSP++ Release 5.0
6.3 Implementation and Testing
Chapter 7 Experiment and Analysis
7.1 Exact Procedure
7.2 Results for Different Schemes
Chapter 8 Conclusion & Future Work
8.1 Conclusion
8.2 Future Work
References
List of Figures
Figure 1.1 A Typical Video Encoder
Figure 2.1 Example for an image along with its RGB components
Figure 2.2 Example for an image along with its YCbCr components
Figure 2.3 Chroma Subsampling different versions
Figure 3.1 MPEG Group of pictures
Figure 3.2 Video codec with prediction
Figure 3.3 Video codec with motion estimation and compensation
Figure 4.1 Block matching process
Figure 4.2 Full search Raster and Spiral algorithms
Figure 4.3 Fast search Logarithmic algorithm
Figure 5.1 2-D DCT performed on an 8x8 block of an image
Figure 5.2 An image with the intensity map along with the compacted version
Figure 5.3 Inverse DCT of Trees; (a) DCT(100%); (b) DCT(75%); (c) DCT(50%); (d) DCT(25%)
Figure 6.1 Image for BF561 Hardware
Figure 6.2 Connector Locations
Figure 6.3 Visual DSP++ Release 5.0
Figure 6.4 Connection to Video In and Video Out devices
Figure 7.1 PSNR for {P} and {B} predicted frames using Logarithmic search
Figure 7.2 PSNR for {P} and {B} predicted frames using Raster full search
Figure 7.3 PSNR for {P} and {B} predicted frames using Spiral full search
Figure 7.4 PSNR for predicted frames, Foreman video, using Raster full search with different search window sizes
Figure 7.5 PSNR for {P} and {B} predicted "Stephan video" frames using Logarithmic search
Figure 7.6 PSNR for {P} and {B} predicted "Stephan video" frames using Full search algorithms
Figure 7.7 PSNR for {P} and {B} predicted "Stephan video" frames for different search windows using Raster full search algorithm
Figure 7.8 PSNR for {P} and {B} predicted "Fish video" frames using Logarithmic fast search algorithm
Figure 7.9 PSNR for {P} and {B} predicted "Fish video" frames using Raster full search algorithm
Figure 7.10 PSNR for {P} and {B} predicted "Fish video" frames for different search windows using Raster full search algorithm
Figure 7.11 Foreman video predicted frames macroblock size 1 "Logarithmic search"
Figure 7.12 Foreman video predicted frames macroblock size 8 "Logarithmic search"
Figure 7.13 Foreman video predicted frames macroblock size 16 "Logarithmic search"
Figure 7.14 Stephan video predicted frames macroblock size 1 "Logarithmic search"
Figure 7.15 Stephan video predicted frames macroblock size 8 "Logarithmic search"
Figure 7.16 Stephan video predicted frames macroblock size 16 "Logarithmic search"
Figure 7.17 Fish video predicted frames macroblock size 1 "Logarithmic search"
Figure 7.18 Fish video predicted frames macroblock size 8 "Logarithmic search"
Figure 7.19 Fish video predicted frames macroblock size 16 "Logarithmic search"
Figure 7.20 Foreman video predicted frames macroblock size 1 "Raster search"
Figure 7.21 Foreman video predicted frames macroblock size 8 "Raster search"
Figure 7.22 Foreman video predicted frames macroblock size 16 "Raster search"
Figure 7.23 Stephan video predicted frames macroblock size 1 "Raster search"
Figure 7.24 Stephan video predicted frames macroblock size 8 "Raster search"
Figure 7.25 Stephan video predicted frames macroblock size 16 "Raster search"
Figure 7.26 Fish video predicted frames macroblock size 1 "Raster search"
Figure 7.27 Fish video predicted frames macroblock size 8 "Raster search"
Figure 7.28 Fish video predicted frames macroblock size 16 "Raster search"
Figure 7.29 PSNR values for "Foreman video" with search window 7 using Raster full search
Figure 7.30 PSNR values for "Foreman video" with search window 15 using Raster full search
Figure 7.31 PSNR values for "Foreman video" with search window 25 using Raster full search
Figure 7.32 PSNR values for "Stephan video" with search window 7 using Raster full search
Figure 7.33 PSNR values for "Stephan video" with search window 15 using Raster full search
Figure 7.34 PSNR values for "Stephan video" with search window 25 using Raster full search
Figure 7.35 PSNR values for "Fish video" with search window 7 using Raster full search
Figure 7.36 PSNR values for "Fish video" with search window 15 using Raster full search
Figure 7.37 PSNR values for "Fish video" with search window 25 using Raster full search
Figure 7.38 PSNR for predicted frames using different 2D-DCT Compression Qualities
Figure 7.39 Foreman video predicted frames "NO DCT"
Figure 7.40 Foreman video predicted frames "DCT 36:64"
Figure 7.41 Foreman video predicted frames "DCT 21:64"
Figure 7.42 Foreman video predicted frames "DCT 10:64"
Figure 7.43 Foreman video predicted frames "DCT 1:64"
Figure 8.1 Block diagram for Search window size decision after motion is detected
List of Tables
Table 2.1 Video formats with each format specifications
Table 3.1 Digital video formats with no. of frames per second and bit rate
Abbreviations
ADSL     Asymmetric Digital Subscriber Line
AVC      Advanced Video Coding
B-frame  Bi-directionally predicted frame
BDS      Block Distortion Surface
BMA      Block Matching Algorithm
CIF      Common Intermediate Format
CMY      Cyan, Magenta, Yellow
CMYK     Cyan, Magenta, Yellow, Black
DCT      Discrete Cosine Transform
DPCM     Differential Pulse Code Modulation
DVD      Digital Versatile Disk
GOP      Group of Pictures
HDTV     High Definition Television
I-frame  Intra-coded frame
IDCT     Inverse Discrete Cosine Transform
ISDN     Integrated Services Digital Network
ISO      International Organization for Standardization
ITU      International Telecommunication Union
JPEG     Joint Photographic Experts Group
MAE      Mean Absolute Error
MC       Motion Compensation
ME       Motion Estimation
MSE      Mean Squared Error
MPEG     Moving Pictures Expert Group
NTSC     National Television System Committee
P-frame  Predictive frame
PAL      Phase Alternating Line
PSNR     Peak Signal to Noise Ratio
QCIF     Quarter Common Intermediate Format
RGB      Red, Green, Blue
SAE      Sum of Absolute Errors
SIF      Source Intermediate Format
SDTV     Standard Definition Television
UMTS     Universal Mobile Telecommunications System
VDSL     Very High Speed Subscriber Line
YCbCr    Luminance, Chrominance blue, Chrominance red
Chapter 1 Introduction
Chapter One
1. Introduction
1.1 Importance of video compression

Video communication is a rapidly evolving field with applications that include video telephony, videoconferencing, remote surveillance, and remote working and learning. It is also a key feature of the upcoming information and communication technologies based on residential digital lines (VDSL, ADSL and ISDN) and the third generation of mobile telephony (UMTS). In this scenario, video compression plays a fundamental role in reducing the enormous bit rate required for transmission and storage. For example, a high-quality HDTV picture with a spatial resolution of 1920 x 1080 square pixels, digitized at 8 bits per color component (24 bits per pixel) and 30 pictures per second, has an uncompressed bit rate of about 1.49 x 10^9 bit/s (1.3905 Gbit/s when counted in binary units). Consider also the Common Intermediate Format (CIF), the standard for video conferencing, which has a spatial resolution of 352 x 288. At 30 pictures per second and 8 bits per sample with 4:2:0 chroma sub-sampling (12 bits per pixel on average), the uncompressed bit rate is about 36.5 Mbit/s. Even for the smaller Quarter CIF (QCIF) format, the uncompressed bit rate is about 9.1 Mbit/s. An ISDN channel, for example, provides only 64 kbit/s, which means that without compression it is unrealistic to transmit such high-volume video data over a network or to store it [1] [2]. To this end, the ISO and ITU-T committees have worked on several compression standards such as JPEG, MPEG (versions 1, 2, 4), H.261, H.263 and H.26L. These compression standards depend mainly on discarding the temporal and spatial redundancies present in video data. Temporal redundancy is present because successive frames are highly correlated in most video scenes, the differences between them being caused by object motion, camera motion, panning and zooming; this correlation is exploited by inter-frame compression techniques such as motion estimation and compensation. Spatial redundancy is the high correlation between the values of neighboring pixels within a video frame, and it is typically removed by intra-frame coding techniques such as transform coding [1] [3].
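The uncompressed bit rates quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function name `raw_bit_rate` is hypothetical, and the bits-per-pixel values (24 for three 8-bit color components, 12 for 8-bit samples with 4:2:0 sub-sampling) are assumptions chosen because they reproduce the quoted figures.

```python
def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Return the uncompressed bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


# Assumed pixel depths: 24 bit/pixel for HDTV (3 x 8-bit components),
# 12 bit/pixel for CIF and QCIF (8-bit samples, 4:2:0 sub-sampling).
hdtv = raw_bit_rate(1920, 1080, 24, 30)
cif = raw_bit_rate(352, 288, 12, 30)
qcif = raw_bit_rate(176, 144, 12, 30)
isdn_channel = 64_000  # bit/s, as quoted in the text

print(f"HDTV: {hdtv / 2**30:.4f} Gbit/s (binary giga)")
print(f"CIF:  {cif / 1e6:.1f} Mbit/s")
print(f"QCIF: {qcif / 1e6:.1f} Mbit/s")
print(f"QCIF needs about {qcif / isdn_channel:.0f} ISDN channels uncompressed")
```

Under these assumptions even QCIF exceeds a single 64 kbit/s ISDN channel by more than two orders of magnitude, which is the gap that compression has to close.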
1.2 Objective
Most video coding techniques are based on a hybrid Motion-Compensated (MC) Discrete Cosine Transform (DCT) approach [4], which ensures the removal of both temporal and spatial redundancies as mentioned above; typical encoders sometimes also apply entropy coding after these processes. This thesis tests this video coding technique, as used in MPEG-2, and varies the video coding parameters discussed later (for example, macroblock size, search algorithm, search window, group-of-pictures structure and DCT quality level) to determine how better quality can be achieved. The objectives are:
1. Develop a motion estimation and compensation based codec that uses block matching algorithms.
2. Develop different search techniques for the block matching algorithm.
3. Test the prediction performance for different videos.
4. Test the prediction performance for different macroblock sizes and search windows, while using different search algorithms.
5. Develop the discrete cosine transform (DCT) code and the inverse DCT code.
6. Test the prediction for the full hybrid MC DCT codec.
1.3 Methodology
A typical encoder, shown in Figure 1.1, takes a video signal as a sequence of pictures. The pictures are processed one by one, each being divided into equal-sized, non-overlapping rectangular blocks (macroblocks), typically of 16x16 pixels. Ideally the frame dimensions are multiples of the block size, and square blocks are most common. If the frame will be used as a reference for other frames (an intraframe), it is coded without reference to any other frame: it passes through the transform coding and quantization blocks and is then transmitted to the receiver. Otherwise, if it is an interframe, it passes through the motion estimation and compensation blocks, where a block matching algorithm searches the reference frame for the best match to each block and records its location as a motion vector pointing to that location.
Block size affects the performance of compression techniques. The larger the block size, the fewer the blocks in each frame, and hence the fewer motion vectors need to be transmitted. However, the borders of moving objects do not normally coincide with the borders of blocks, so larger blocks require more correction data to be transmitted. Small blocks result in a greater number of motion vectors, but each matching block is more likely to closely match its target, so less correction data is required. Thus block size represents a tradeoff between minimizing the number of motion vectors and maximizing the quality of the matching blocks. The relationship between block size, image quality, and compression ratio has been the subject of much research and is well understood. Similarly, the size of the search region (the search window, i.e. the number of candidate blocks to search) in the reference frame represents a tradeoff between finding the best match, and hence better quality, and the cost of exhaustive computation and longer processing time [5].
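The block matching step and the two tradeoffs described above can be sketched in a few lines. This is a minimal illustration and not the coder developed in this thesis: the sum of absolute errors (SAE) is assumed as the matching criterion, the exhaustive scan order and the sign convention of the motion vector are assumptions, and all function names are hypothetical.

```python
def blocks_per_frame(width, height, block_size):
    """Number of non-overlapping blocks (one motion vector each) in a frame."""
    return (width // block_size) * (height // block_size)


def sae(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute errors between the n x n block of `cur` at (cx, cy)
    and the n x n block of `ref` at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, cx, cy, n, p):
    """Test every displacement within +/- p pixels and return (dx, dy, cost)
    of the candidate block in `ref` with the lowest SAE."""
    height, width = len(ref), len(ref[0])
    best = (0, 0, sae(cur, ref, cx, cy, cx, cy, n))
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate falls outside the reference frame
            cost = sae(cur, ref, cx, cy, rx, ry, n)
            if cost < best[2]:
                best = (dx, dy, cost)
    return best


# Toy 8x8 reference frame with a non-repeating pattern.
ref = [[x * x + 3 * y * y + x * y for x in range(8)] for y in range(8)]
# Current frame: the reference content moved 1 pixel right and 2 pixels down.
cur = [[ref[y - 2][x - 1] if y >= 2 and x >= 1 else 0 for x in range(8)]
       for y in range(8)]

dx, dy, cost = full_search(cur, ref, cx=3, cy=3, n=2, p=2)
print("motion vector:", (dx, dy), "SAE:", cost)

for size in (8, 16):
    print(f"CIF with {size}x{size} blocks:",
          blocks_per_frame(352, 288, size), "motion vectors per frame")
for p in (7, 15):
    print(f"search range +/-{p}:", (2 * p + 1) ** 2, "candidates per block")
```

Halving the block size quadruples the number of motion vectors, and the number of candidates per block grows with the square of the search range, which is the computational side of both tradeoffs.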
Figure 1.1 A Typical Video Encoder
1.4 Thesis Organization

To prepare the reader for the material discussed in this thesis, Chapter 2 provides background on digital video, objective video quality, color spaces, chroma sub-sampling and, finally, digital video formats and applications. Section 3.1 is devoted to video compression fundamentals and coding standards. Section 3.2 presents the specifications and compression features of the MPEG-2 coding standard. Section 3.3 describes motion estimation and compensation. Chapter 4 presents the block matching algorithm along with the comparison criteria used for matching, and then introduces full and fast search algorithms. Chapter 5 gives a brief description of transform coding, detailing the two-dimensional discrete cosine transform (2-D DCT), its inverse and quantization. Chapter 6 introduces the Analog Devices hardware and software development tools, namely the ADSP-BF561 processor desktop evaluation board and VisualDSP++ Release 5.0, describes some of the software features, and presents a simple test that uses the evaluation board to buffer PAL or NTSC video frames and display them on another device. Chapter 7 describes the experiment step by step and analyzes the effect of changing the different parameters, with figures and graphs to support the results. Finally, Chapter 8 concludes the thesis.
Chapter 2 Background
Chapter Two
2. Background
Video (from the Latin for "I see") is a sequence of images referred to as frames; the number of still pictures per unit of time is called the frame rate. Increasing the frame rate generally increases the perceived video quality, and many standards specify 25 to 30 frames per second. The main point is that the frame rate must exceed about 15 frames per second to achieve the illusion of a moving image. A visual scene is continuous both spatially and temporally. In order to represent and process a visual scene digitally, it is necessary to sample the real scene spatially (typically on a rectangular grid in the video image plane) and temporally (typically as a series of still frames sampled at regular intervals of time). Each frame element, known as a pixel, is represented digitally as one or more numbers that describe the brightness and color of the sample [6].
2.1 Digital Video
Digital video refers to the capture, manipulation and storage of video in digital formats. Digital video is obtained in one of two ways: (1) directly from digital cameras, or (2) by converting an analog video signal using both sampling and quantization. Video in the digital domain has several properties that make it preferable to analog video: it is less susceptible to noise, offers higher visual quality, allows advanced editing and processing, allows repeated reproduction without losses and, most importantly, allows better compression and encryption schemes. Before examining methods for compressing and transporting digital video, it is necessary to establish the concepts of video in the digital domain [7]. Digital video is visual information represented in a discrete form, suitable for digital electronic storage or transmission. This chapter describes concepts of digital video such as color spaces (RGB and YCbCr) and the measurement of visual quality. Video frames are formed using the tri-chromatic color mixing theory, which states that any color can be formed by mixing three primary colors (red, green, blue) in the right proportions. This is also how color monitors work, by exciting primary color phosphors using separate electron guns. For reflecting sources, the secondary colors cyan, magenta and yellow (CMY) are used; these colors are used in color printers, and sometimes black (K) is added to enhance the printing quality, which results in the CMYK model.
2.2 Objective Video Quality

In order to specify, evaluate and compare video communication systems, it is necessary to determine the quality of the video images displayed to the viewer. Measuring video quality using objective criteria gives accurate and repeatable results. Probably the most widely used objective measure is the peak signal to noise ratio (PSNR), calculated using Equation (2.1). PSNR is easy to calculate and is therefore a very popular quality measure. It is widely used for comparing the quality of compressed and decompressed video images. PSNR is measured on a logarithmic scale and is based on the mean squared error (MSE) between an original and an impaired video frame, relative to $(2^n - 1)^2$, the square of the highest possible signal value in the image, where $n$ is the number of bits per image sample:
$$\mathrm{PSNR}_{\mathrm{dB}} = 10 \log_{10} \frac{(2^n - 1)^2}{\mathrm{MSE}} \qquad (2.1)$$
As a widely used rule of thumb, a PSNR higher than 40 dB typically indicates an excellent image (i.e., very close to the original image), 30-40 dB usually means a good image (i.e., distortion is visible but acceptable), 20-30 dB is quite poor and, finally, a PSNR lower than 20 dB is unacceptable. The PSNR measure suffers from a number of limitations. Mainly, it requires an unimpaired original image for comparison, which may not be available in every case, and it may not be easy to verify that the original image has perfect fidelity [7] [8].
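As a worked illustration of the measure described above, the sketch below computes the MSE between two frames and the PSNR as ten times the base-10 logarithm of the squared peak value over the MSE. It is a minimal sketch: frames are assumed to be lists of rows of integer samples, the function names are hypothetical, and the handling of the exact threshold values in `quality_label` is an assumption because the text does not say which band a boundary value belongs to.

```python
import math


def mse(original, impaired):
    """Mean squared error between two frames of equal size."""
    total = 0
    count = 0
    for row_o, row_i in zip(original, impaired):
        for a, b in zip(row_o, row_i):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(original, impaired, bits=8):
    """PSNR in dB for samples of `bits` bits; infinite for identical frames."""
    error = mse(original, impaired)
    if error == 0:
        return math.inf
    peak_squared = (2 ** bits - 1) ** 2
    return 10 * math.log10(peak_squared / error)


def quality_label(psnr_db):
    """Map a PSNR value to the rule-of-thumb bands quoted in the text."""
    if psnr_db > 40:
        return "excellent"
    if psnr_db >= 30:
        return "good"
    if psnr_db >= 20:
        return "poor"
    return "unacceptable"


original = [[100, 100], [100, 100]]
impaired = [[101, 99], [100, 102]]  # squared errors 1, 1, 0, 4 -> MSE 1.5

value = psnr(original, impaired)
print(f"MSE = {mse(original, impaired)}, PSNR = {value:.2f} dB, {quality_label(value)}")
```

Because the peak value is fixed by the sample depth, the PSNR depends only on the MSE, which is why it needs the unimpaired original frame as a reference.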
2.3 Color Spaces

A monochrome (grey scale) video image may be represented using just one number per sample, which indicates the brightness (luminance) of the sample position. Conventionally, a larger number indicates a brighter sample. If the sample is represented by n bits, then a value of 0 represents black and a value of 2^n - 1 represents white, with the values in between representing shades of grey. Luminance is commonly represented by 8 bits per sample, in which case the value 255 represents white. Representing color requires multiple numbers per sample. There are several alternative systems for representing color, each known as a color space. Two of the most common color spaces are RGB (red/green/blue) and YCbCr (luminance/blue chrominance/red chrominance).
2.3.1 RGB
In the red/green/blue color space, each pixel is represented by three numbers indicating the relative proportions of red, green and blue. Because the three components have equal importance to the final color, RGB systems usually represent each component with the same precision and therefore the same number of bits. Using 8 bits per component is quite common, so that 3 x 8 = 24 bits are required to represent each pixel. Figure 2.1 shows an RGB image along with its separate R, G and B components. Note that the white snow consists of strong red, green and blue; the brown barn is composed of strong red and green with little blue; the dark green grass consists of strong green with little red or blue; and the light blue sky is composed of strong blue and moderately strong red and green [6] [8].
Figure 2.1 Example for an image along with its RGB components
2.3.2 YCbCr
The human visual system is less sensitive to color than to luminance (brightness). However, the RGB system does not take advantage of this, since the three colors are equally important and the luminance is present in all three color components. It is possible to represent a color image more efficiently by separating the luminance from the color information. A popular color space of this type is Y:Cb:Cr. Y is the luminance component, a monochrome version of the color image. Y is a weighted average of the three components R, G and B:
$$Y = k_r R + k_g G + k_b B \qquad (2.2)$$
where $k_r$, $k_g$ and $k_b$ are the weighting factors. The color information can be represented as color difference or chrominance components, where each chrominance component is the difference between R, G or B and Y:
$$C_b = B - Y \qquad (2.3)$$
$$C_r = R - Y \qquad (2.4)$$
$$C_g = G - Y \qquad (2.5)$$
The complete description is given by Y and the three color differences Cb, Cr and Cg, which represent the variation in color intensity relative to the luminance of the image. Since the weighted sum $k_r C_r + k_g C_g + k_b C_b$ is a constant (zero when the weights sum to one), the third chrominance component can always be derived from the other two, and therefore only two of the three chrominance components need to be transmitted. Figure 2.2 shows a color image and its Y, Cb and Cr components. Note that the Y image is essentially a greyscale copy of the main image; that the white snow is represented as a middle value in both Cr and Cb; that the brown barn is represented by weak Cb and strong Cr; that the green grass is represented by weak Cb and weak Cr; and that the blue sky is represented by strong Cb and weak Cr [6] [8].
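The weighted average and the color differences described above can be sketched as follows. This is only an illustration: the weights 0.299, 0.587 and 0.114 are an assumption (the ITU-R BT.601 values, which the text does not state), the function names are hypothetical, and the color differences are left unscaled and without the offset that practical Cb/Cr encodings add.

```python
# Assumed ITU-R BT.601 luma weights; the text only calls them weighting factors.
KR, KG, KB = 0.299, 0.587, 0.114


def rgb_to_ycbcr(r, g, b):
    """Luminance as a weighted average, chrominance as differences from it."""
    y = KR * r + KG * g + KB * b
    cb = b - y
    cr = r - y
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Recover R, G and B from Y and only two color differences."""
    b = y + cb
    r = y + cr
    # G follows from the definition of Y, so no third difference is needed.
    g = (y - KR * r - KB * b) / KG
    return r, g, b


pixel = (200, 120, 40)
y, cb, cr = rgb_to_ycbcr(*pixel)
restored = ycbcr_to_rgb(y, cb, cr)
print("Y, Cb, Cr:", round(y, 2), round(cb, 2), round(cr, 2))
print("restored RGB:", tuple(round(c, 6) for c in restored))

white = rgb_to_ycbcr(255, 255, 255)
print("white:", tuple(round(c, 6) for c in white))
```

The round trip shows why only two chrominance components need to be sent, and the white pixel shows that a colorless sample has chrominance differences of approximately zero.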
Figure 2.2 Example for an image along with its YCbCr components
2.4 Chroma Sub-Sampling

Visual redundancy can be reduced because human eyes are more sensitive to changes in brightness than to changes in color; assigning fewer bits to the chrominance components than to the luminance component takes advantage of this redundancy. There are three common types of chroma sub-sampling. The first is 4:4:4 sampling, where the three components (luminance Y and chrominance Cb and Cr) have the same resolution. The Y, Cb and Cr components are scaled and shifted versions of the analog Y, U and V components, where the scaling and shifting operations are introduced so that the resulting components take values in the range [0, 255]. In 4:4:4 sampling a sample of each component exists at each pixel position, so there is actually no sub-sampling. The second type is 4:2:2, where the chroma components have the same vertical resolution as the luminance but half the horizontal resolution; this format is typically used for high-quality reproduction. The third type is 4:2:0, the typical format for most applications, where the chrominance components have half the vertical and half the horizontal resolution of the luminance. In conclusion, the 4:2:0 version requires half as many bits as the 4:4:4 version [10] [11].
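The sample counts behind the three schemes can be checked with a short sketch. It is illustrative only: the names are hypothetical, frame dimensions are assumed to be even, and averaging each 2x2 group is just one possible way to form a 4:2:0 chroma sample, which the text does not specify.

```python
# (horizontal divisor, vertical divisor) applied to each chroma plane.
CHROMA_DIVISORS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def samples_per_frame(width, height, scheme):
    """Total number of Y, Cb and Cr samples in one frame."""
    h_div, v_div = CHROMA_DIVISORS[scheme]
    luma = width * height
    chroma = 2 * (width // h_div) * (height // v_div)  # Cb plane + Cr plane
    return luma + chroma


def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 groups
    (an assumed filter; other filters are possible)."""
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, len(plane[0]) - 1, 2)]
            for y in range(0, len(plane) - 1, 2)]


width, height = 352, 288  # CIF
for scheme in CHROMA_DIVISORS:
    total = samples_per_frame(width, height, scheme)
    print(scheme, total, "samples,", 8 * total / (width * height), "bits per pixel")

print(subsample_420([[1, 3, 5, 7], [1, 3, 5, 7]]))
```

With 8-bit samples this gives 24, 16 and 12 bits per pixel for 4:4:4, 4:2:2 and 4:2:0 respectively, which matches the statement that 4:2:0 needs half the bits of 4:4:4.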
Figure 2.3 Different versions of chroma sub-sampling
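The factor of one half quoted for 4:2:0 can be checked with a short calculation. The following is a minimal Python sketch, assuming 8 bits per sample; the function name `bits_per_pixel` and the lookup table are illustrative and are not part of the thesis code.

```python
# Number of Cb samples (and likewise Cr samples) per group of 4 luminance samples.
CHROMA_SAMPLES_PER_4_LUMA = {"4:4:4": 4, "4:2:2": 2, "4:2:0": 1}


def bits_per_pixel(sampling, bits_per_sample=8):
    """Average number of bits per pixel for a given chroma sampling format."""
    chroma = CHROMA_SAMPLES_PER_4_LUMA[sampling]
    samples = 4 + 2 * chroma  # 4 Y samples plus the Cb and Cr samples
    return samples * bits_per_sample / 4


for sampling in CHROMA_SAMPLES_PER_4_LUMA:
    print(sampling, bits_per_pixel(sampling), "bits per pixel")
```

Under this assumption 4:4:4 needs 24 bits per pixel, 4:2:2 needs 16 and 4:2:0 needs 12, which is half of the 4:4:4 figure.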
2.5 Digital Video Formats and Applications:

Many digital video formats are in use today. CIF (Common Intermediate Format) has a size of 352x288 and is color sampled with the 4:2:0 technique. CIF uses 30 frames per second and its raw data rate is about 36.5 Mbps, which can be compressed to about 128-384 Kbps; CIF is used for video conferencing over ISDN/internet. QCIF is a quarter of CIF, with a size of 176x144, and also uses 4:2:0 color sampling and 30 frames per second. Its raw data rate is about 9.1 Mbps and can be compressed to about 64-128 Kbps; QCIF is used for video telephony over wired/wireless modems. The newer H.263 video codec standard can compress QCIF to about 20 Kbps with better quality than H.261. The SIF (Source Intermediate Format) size is 352x240 at 30 frames per second and 352x288 at 25 frames per second. SIF also uses 4:2:0, with a raw data rate of 30 Mbps. This format is targeted at video applications that require medium quality, such as video games and CD movies. SIF is compressed using the MPEG-1 (Moving Picture Experts Group) technique to 1.1 Mbps and is used for intermediate-quality video distribution on VCD.
Table 2.1 Video formats with the specifications of each format
Video format       SIF            CIF        QCIF
Size               352x240/288    352x288    176x144
Color sampling     4:2:0          4:2:0      4:2:0
Frame rate         30/25 fps      30 fps     30 fps
Raw data (Mbps)    30             36.5       9.1
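The raw data rates in Table 2.1 follow directly from the frame size, the frame rate and the 4:2:0 sampling. A minimal Python sketch of this arithmetic is given below, assuming 8-bit samples (12 bits per pixel on average for 4:2:0); the function name `raw_bit_rate_mbps` is illustrative.

```python
def raw_bit_rate_mbps(width, height, fps, bits_per_pixel=12):
    """Uncompressed bit rate in Mbps; 12 bits per pixel corresponds to 8-bit 4:2:0."""
    return width * height * bits_per_pixel * fps / 1e6


formats = {"SIF": (352, 240, 30), "CIF": (352, 288, 30), "QCIF": (176, 144, 30)}
for name, (width, height, fps) in formats.items():
    print(name, round(raw_bit_rate_mbps(width, height, fps), 1), "Mbps")
```

The same function applied to QCIF at 15 frames per second gives about 4.6 Mbps.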
The last decade has seen a rapid increase in applications for digital video technology, and new, innovative applications continue to emerge, such as video conferencing, video telephony, remote learning, remote medicine, and games and entertainment [6] [8] [11].
Chapter Three
3. Video Compression Fundamentals
Video represented in digital form requires a large number of bits. The volume of data for this representation is too large for most storage and transmission systems, despite the continual increase in storage capacity and transmission bandwidth. Table (3.1) shows the uncompressed bit rates of several video formats. From this table it can be seen that even QCIF at 15 frames per second (low-quality video) requires 4.6 Mbps for transmission or storage.
Table 3.1 Digital video formats with number of frames per second and bit rate
Format                     ITU-R 601    CIF          QCIF
Frames per second          30 fps       30 fps       15 fps
Bit rate (uncompressed)    216 Mbps     36.5 Mbps    4.6 Mbps
The large gap between the high bit rate of uncompressed video data and the available capacity of transmission and storage systems is the reason video compression is needed. Video compression systems aim to reduce the amount of data required to store or transmit video while maintaining an acceptable level of video quality (described in part (2.2)). In general, higher compression results in a greater loss of quality [6].
3.1 Video Coding Standards

Most practical systems and standards for video compression are lossy (the volume of data is reduced at the expense of a loss of visual quality). There are several video coding standards, such as:
H.261: First video coding standard, targeted at video conferencing over ISDN. Uses a block-based hybrid coding framework with integer-pel MC.
H.263: Improved quality at lower bit rate, to enable video conferencing/telephony below 54 kbps. Half-pixel MC and other improvements.
MPEG-1 video: Video on CD and video on the Internet (good quality at 1.5 Mbps). Half-pixel MC and bidirectional MC.
MPEG-2 video: SDTV/HDTV/DVD (4-15 Mbps). Extended from MPEG-1, considering interlaced video.
MPEG-4: To enable object manipulation and scene composition at the decoder (interactive TV/virtual reality). Object-based video coding: new shape coding tools. Coding of synthetic video and audio: animation tools.
MPEG-7: To enable search and browsing of multimedia documents. Defines the syntax for describing the structural and conceptual content. To be covered later when discussing multimedia databases.
These standards use several techniques, such as DPCM (Differential Pulse Code Modulation), transform coding, predictive coding and model-based coding.
In predictive coding, also known as motion-compensated prediction, the encoder forms a model of the current frame based on the samples of a previously coded and transmitted frame. The encoder attempts to compensate for the motion in a video sequence by moving and warping the samples of the previously transmitted frame (the reference frame). The resulting predicted frame is subtracted from the current frame to produce a residual error frame. Further coding always follows motion-compensated prediction, e.g. transform coding of the residual frame [12].
3.2 MPEG-2 Coding Standard

MPEG-2 is a video coding standard created by the Moving Picture Experts Group (MPEG). It is now the standard format used for satellite TV, digital cable TV, DVD movies and HDTV. In addition, MPEG-2 is a commonly used format for distributing video files on the internet [12] [13].
MPEG-2 is an evolution of MPEG-1, an earlier MPEG coding standard. In fact, an MPEG-2 decoder can decode MPEG-1 video. The additions in MPEG-2 are therefore what make it a separate standard. The major additions are:
Support for higher resolution video
Support for interlaced video (as used on standard definition TV (SDTV))
Optimized for higher bit rates (typically 4 Mb/s and above, versus 1.5 Mb/s and below for MPEG-1)
Scalability via layered encoding to support a variety of quality levels/transmission bandwidths from one coded source
MPEG-2 Compression:
Color Space: YCbCr
Chroma Sub-sampling: 4:2:0
Block based coding:
MPEG-2 uses block based coding for motion estimation and compensation. This means that a frame is not encoded as a whole; it is divided into many independently coded blocks. A macroblock is 16x16 pixels and is the basic unit of MPEG-2 coding. Each macroblock is further divided into 8x8 pixel blocks. With 4:2:0 sampling this results in 6 blocks per macroblock: four luminance blocks plus one Cb block and one Cr block.
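The macroblock arithmetic above can be illustrated with a few lines of Python. This is only a sketch under the assumption of 4:2:0 sampling and frame dimensions that are multiples of the macroblock size; the function name `macroblock_layout` is hypothetical.

```python
def macroblock_layout(width, height, mb_size=16, block_size=8):
    """Return (macroblocks per frame, 8x8 blocks per macroblock) for 4:2:0 video."""
    macroblocks = (width // mb_size) * (height // mb_size)
    luma_blocks = (mb_size // block_size) ** 2
    # Cb and Cr each cover the macroblock at half resolution in both directions.
    chroma_blocks = 2 * ((mb_size // 2) // block_size) ** 2
    return macroblocks, luma_blocks + chroma_blocks


print("QCIF:", macroblock_layout(176, 144))
print("CIF: ", macroblock_layout(352, 288))
```

For a QCIF frame this gives 99 macroblocks of 6 blocks each, and for a CIF frame 396 macroblocks.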
Types of Frames:
1. I-frame: Intra-coded frame, coded independently of all other frames in the sequence. I-frames are the most important frames in the sequence: they are used as references for other frames and are compressed using only transform coding (DCT), which gives moderate compression performance.
2. P-frame: Predictively coded frame, coded based on a previously coded frame that precedes it. The MPEG-2 standard dictates that the past frame must be an I or P frame, but not a B frame. Coding is achieved using motion vectors. The basic idea is to match each macroblock in the current frame with the corresponding area in the past reference frame as closely as possible.
3. B-frame: Bi-directionally predicted frame, coded based on previously coded frames (I or P-frames) that precede or succeed the current frame in the temporal order of the image sequence. A B-frame is simply a more general version of a P-frame: motion vectors can refer not only to a past frame, but also to a future frame, or to both a past and a future frame.
Using a future frame works exactly as in a P-frame, except that the reference lies in the future. Using past and future frames together works by averaging the predicted past macroblock with the predicted future macroblock. The main advantage of B-frames is coding efficiency: in most cases, B-frames result in fewer bits being coded overall. Backward prediction also allows the encoder to make more intelligent decisions on how to encode the video within these areas. In addition, since B-frames are not used to predict other frames, errors generated in them are not propagated further within the sequence. One disadvantage is that the frame reconstruction memory buffers within the encoder and decoder must be doubled in size to accommodate the two reference frames. This is almost never an issue for the relatively expensive encoder. Another disadvantage is that there will necessarily be a delay throughout the system, as the frames are delivered out of order [6] [9] [12] [13].
3.2.1 Group of Pictures

Figure 3.1 MPEG Group of pictures
An I-frame together with all the frames before the next I-frame is referred to as a group of pictures (GOP). There are various possible GOP structures. The first, [IIIIII...], uses no temporal prediction and needs a high bit rate. The second, [IBIBIB...], uses a lower bit rate than the all-I-frame structure. The third, [IBBPBBPB...], shown in Figure (3.1), uses forward and bi-directional prediction and gives the best compression, but needs a large decoder memory. Finally, [IPPIPPIP...] uses only forward prediction and needs less decoder memory [12] [13].
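Because a B-frame depends on a later I or P-frame, the frames of a GOP cannot be coded in display order. The Python sketch below illustrates one way to derive a coding order in which every B-frame follows both of its references; the function name `coding_order` and the example GOP string are illustrative assumptions, not the procedure of any particular encoder.

```python
def coding_order(display_types):
    """Reorder frame indices so that each B-frame comes after both of its references."""
    order, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)   # must wait for the next I or P reference
        else:
            order.append(index)       # the reference frame is coded first
            order.extend(pending_b)   # then the B-frames that depend on it
            pending_b = []
    order.extend(pending_b)           # trailing B-frames, if any, are kept last
    return order


gop = "IBBPBBPBBI"
print([gop[i] + str(i) for i in coding_order(gop)])
```

The reordering is the source of the system delay mentioned for B-frames: the decoder has to hold the decoded future reference until the B-frames in between have been displayed.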
3.3 Motion Estimation and Compensation

A video signal consists of a sequence of individual frames. Each frame may be compressed using an image codec, so that each frame is coded without any reference to other frames. However, better compression performance may be achieved by exploiting the temporal redundancy in a video sequence. This can be done by adding a front end to the image codec with two main functions:
1. Prediction: create a prediction of the current frame by referencing one or more previously coded frames.
2. Compensation: subtract the prediction from the current frame to produce a residual frame.
The key to this approach is the prediction function: if the prediction is accurate, the residual frame will contain little data and will be compressed to a very small size by the image codec. In order to decode the frame, the decoder must reverse the compensation process, so it must add the prediction to the decoded residual frame [15].
Figure 3.2 Video codec with prediction
Frame differencing gives good compression performance when successive frames are very similar, but does not perform well if there is a significant change between the current and previous frames. Such changes are usually due to movement in the video scene, and a significantly better prediction can be achieved by estimating this movement and compensating for it. Figure (3.3) shows a video codec with motion prediction [15]. Two new steps are required in the encoder:
1. Motion estimation: a region of the current frame is compared with neighboring regions of the previous frame, and the motion estimator attempts to find the best matching macroblock.
2. Motion compensation: the best matching macroblock from the reference frame is subtracted from the current macroblock.
The decoder performs the same motion compensation operation to reconstruct the current frame. This means that the encoder has to transmit the coordinates of the best matching macroblock (usually called the motion vector) to the decoder [15].
Figure 3.3 Video codec with motion estimation and compensation
Chapter Four
4. Block Matching Algorithm
In the popular video coding standards (H.261, H.263, MPEG-1, MPEG-2 and MPEG-4), motion estimation and compensation are carried out on small non-overlapping regions (blocks) of the current frame. Motion estimation on a complete block is known as the block matching algorithm (BMA). For each block of a certain size in the current frame, the motion estimation algorithm searches a neighboring area of the reference frame for a matching area of the same block size. The best match is the one that minimizes the energy of the difference between the current block and the matching block. The area in which the search is carried out is usually centered on the position of the current block, because (a) a good match is likely to be found there, due to the high correlation between subsequent video frames, and (b) it would be computationally intensive to search the whole reference frame.
Figure (4.1) illustrates the block matching process. The current block in this case is (3x3) pixels; it is compared with the same position in the (5x5) reference frame region and with the immediately neighboring positions (±1 pixel in each direction). The mean squared error (MSE) between the current block and the same position in the reference frame (position (0,0)) is given by the equation in the figure as 2.44. The figure also shows the complete set of MSE values for each search position. Of the candidate positions available, (-1,1) gives the smallest MSE and is therefore the best match [13] [14].
Figure 4.1 Block matching process
A video encoder carries out this process for each block in the current frame using the following steps:
1. Calculate the energy of the difference between the current block and a set of neighboring blocks in the reference frame.
2. Select the block that gives the lowest error (for example, MSE).
3. Subtract the matching block from the current block, producing the difference block.
4. Encode and transmit the difference block.
5. Encode and transmit a motion vector that indicates the position of the matching region relative to the current block position (in the above example, the motion vector is (-1, 1)).
Steps 1 and 2 correspond to motion estimation and step 3 to motion compensation. The video decoder reconstructs the block as follows:
1. Decode the difference block and the motion vector.
2. Add the difference block to the matching region (pointed to by the motion vector) in the reference frame.
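The encoder and decoder steps can be sketched in Python for a single 3x3 block with a search range of ±1 pixel. The sample values below are invented for illustration and are not the values of Figure (4.1); the function names and the (dx, dy) vector convention (dx to the right, dy downwards) are likewise assumptions.

```python
def mse(block_a, block_b):
    """Mean squared error between two N x N blocks."""
    n = len(block_a)
    return sum((block_a[i][j] - block_b[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


def extract(frame, top, left, n):
    """Cut an N x N block out of a frame given its top-left corner."""
    return [row[left:left + n] for row in frame[top:top + n]]


def estimate_motion(current, reference, top, left, search_range=1):
    """Steps 1 and 2: test every offset in the window and keep the lowest MSE."""
    n = len(current)
    best_vector, best_error = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > len(reference) or x + n > len(reference[0]):
                continue  # candidate falls outside the reference region
            error = mse(current, extract(reference, y, x, n))
            if best_error is None or error < best_error:
                best_vector, best_error = (dx, dy), error
    return best_vector, best_error


# Hypothetical 5x5 reference region and 3x3 current block located at (top, left) = (1, 1).
reference = [[10, 20, 30, 40, 50],
             [11, 21, 31, 41, 51],
             [12, 22, 32, 42, 52],
             [13, 23, 33, 43, 53],
             [14, 24, 34, 44, 54]]
current = [[12, 22, 33],
           [13, 23, 33],
           [14, 25, 34]]
top, left = 1, 1

# Encoder: motion estimation, then step 3 (difference block).
vector, error = estimate_motion(current, reference, top, left)
dx, dy = vector
match = extract(reference, top + dy, left + dx, 3)
residual = [[current[i][j] - match[i][j] for j in range(3)] for i in range(3)]

# Decoder: add the difference block to the region the motion vector points to.
reconstructed = [[match[i][j] + residual[i][j] for j in range(3)] for i in range(3)]
print("motion vector:", vector, "residual:", residual)
```

In this sketch the residual is transmitted without loss, so the decoder output equals the current block exactly; in a real coder the residual is transformed and quantized first, which introduces a reconstruction error.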
4.1 Block Matching Algorithm Comparison Criteria:

Mean squared error provides a measure of the energy remaining in the difference block and can be calculated for an (N x N) sample block as:
$$ MSE = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left( C_{ij} - R_{ij} \right)^2 \tag{4.1} $$
where C_{ij} and R_{ij} are the samples of the current and reference blocks, and C_{00}, R_{00} are the top-left samples of the current and reference blocks. Mean absolute error (MAE) provides a reasonable approximation of the remaining energy and is much easier to calculate than MSE, since it requires a magnitude calculation instead of a square calculation for each pair of samples, as shown in the equation:
$$ MAE = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| C_{ij} - R_{ij} \right| \tag{4.2} $$
The comparison may be simplified further by dropping the factor 1/N^2 and simply calculating the sum of absolute errors (SAE), also called the sum of absolute differences (SAD):
$$ SAE = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| C_{ij} - R_{ij} \right| \tag{4.3} $$
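The three criteria differ only in the per-sample operation and the normalization. A minimal Python sketch is given below; the 2x2 sample blocks are made up for illustration.

```python
def sae(current, reference):
    """Sum of absolute errors (also called SAD), Equation (4.3)."""
    return sum(abs(c - r)
               for c_row, r_row in zip(current, reference)
               for c, r in zip(c_row, r_row))


def mae(current, reference):
    """Mean absolute error, Equation (4.2): the SAE scaled by 1/N^2."""
    return sae(current, reference) / len(current) ** 2


def mse(current, reference):
    """Mean squared error, Equation (4.1)."""
    total = sum((c - r) ** 2
                for c_row, r_row in zip(current, reference)
                for c, r in zip(c_row, r_row))
    return total / len(current) ** 2


current = [[1, 2], [3, 4]]
reference = [[1, 4], [0, 4]]
print(sae(current, reference), mae(current, reference), mse(current, reference))
```

Since N is the same for every candidate block, SAE and MAE always rank the candidates identically, which is why the division can be dropped during the search.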
4.2 Search Algorithm for Motion Estimation

In theory, finding the best matching region in the reference frame requires comparing the current block with every possible candidate in the reference frame, which is impractical because of the large number of comparisons required. In practice, a good match for the current block can usually be found in the immediate neighborhood of the block position in the reference frame. Hence, the search for a matching region is limited to a search window, which is centered on the current block position. The optimum size of the search window depends on several factors: (1) the resolution of each frame (a larger window for a higher resolution), (2) the type of scene (high-motion scenes benefit from a larger search window) and (3) the available processing resources, since a larger window requires more comparisons and therefore more processing.
4.2.1 Full Search Block Matching Algorithm
Figure 4.2 Full search Raster and Spiral algorithms
This type of search calculates the comparison criterion at every available position in the search window, which is computationally intensive, especially for large search windows. Full search motion estimation processes the locations either in raster order, starting from the top-left location, or in spiral order, starting from position (0, 0); both are shown in Figure (4.2). The spiral search order has an advantage over the raster order when early termination algorithms are used, because the best match is most likely to be near the center of the search region. Due to the intensive computation required by the full search, various fast algorithms have been developed, which trade estimation accuracy for reduced computation [6] [12] [13] [14].
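The two scan orders can be sketched as generators of candidate offsets (dx, dy) within a window of ±w pixels. The Python below is an illustrative sketch; the function names and the direction in which the spiral turns are assumptions, since the text only fixes the starting points.

```python
def raster_order(w):
    """Offsets row by row, starting from the top-left corner of the window."""
    return [(dx, dy) for dy in range(-w, w + 1) for dx in range(-w, w + 1)]


def spiral_order(w):
    """Offsets in a square spiral that starts at (0, 0) and winds outwards."""
    total = (2 * w + 1) ** 2
    points = [(0, 0)]
    x = y = 0
    dx, dy = 1, 0                 # first move is one step to the right
    step = 1
    while len(points) < total:
        for _ in range(2):        # two legs are walked for every leg length
            for _ in range(step):
                x, y = x + dx, y + dy
                if abs(x) <= w and abs(y) <= w:
                    points.append((x, y))
            dx, dy = -dy, dx      # turn by 90 degrees
        step += 1
    return points


print("raster:", raster_order(1))
print("spiral:", spiral_order(1))
```

Both functions visit the same set of positions, so a full search returns the same best match with either order; the spiral order only reaches the likely candidates near the center sooner, which matters when the search is allowed to stop early.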
4.2.2 Fast Search Block Matching Algorithm

These algorithms aim to reduce the number of comparison operations relative to the full search algorithm. Examples include the Logarithmic search, Three-step search, Cross search, One-at-a-time search, Nearest Neighbors search and Hierarchical search. A fast search samples only some of the possible locations in the search region. As a result, the difference block usually contains more energy than the one found by the full search; hence the number of coded bits generated by the video encoder increases, giving poorer compression performance than the full search.
4.2.2.1 Logarithmic Search Strategy
The Logarithmic search is one of the popular techniques. It starts from the position corresponding to zero displacement, and each step tests five points in a diamond arrangement. In the next step, the diamond search is repeated with its center shifted to the best matching point from the previous step; candidate positions outside the search window are not searched. The step size of the search (the radius of the diamond) is reduced if the best matching point is the center itself or if it lies on the border of the maximum search range. Otherwise the step size stays the same. The Logarithmic search is typically accurate for large search windows, and it returns a match of reasonable quality quickly [6] [12] [13] [14].
Figure 4.3 Fast search Logarithmic algorithm
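The logarithmic search described above can be sketched in Python as follows. This is a simplified illustration, not the thesis implementation: the matching error is supplied as a function `cost(dx, dy)`, the initial step size is assumed to be half the search range, the step is halved when it is reduced, and the search stops once the step size drops below one pixel. The quadratic test cost is invented so that the true minimum is known.

```python
def logarithmic_search(cost, search_range, step=None):
    """Return (best offset, best cost) found by a diamond-shaped logarithmic search."""
    if step is None:
        step = max(1, search_range // 2)   # assumed initial diamond radius
    cx, cy = 0, 0                          # start at zero displacement
    best_cost = cost(0, 0)
    while step >= 1:
        bx, by = cx, cy
        for x, y in ((cx + step, cy), (cx - step, cy), (cx, cy + step), (cx, cy - step)):
            if abs(x) > search_range or abs(y) > search_range:
                continue                   # outside the search window
            candidate = cost(x, y)
            if candidate < best_cost:
                best_cost, bx, by = candidate, x, y
        on_border = max(abs(bx), abs(by)) == search_range
        if (bx, by) == (cx, cy) or on_border:
            step //= 2                     # shrink the diamond
        cx, cy = bx, by                    # re-center on the best point
    return (cx, cy), best_cost


calls = []


def example_cost(dx, dy):
    """Hypothetical error surface with its minimum at offset (3, -2)."""
    calls.append((dx, dy))
    return (dx - 3) ** 2 + (dy + 2) ** 2


best_vector, best_cost = logarithmic_search(example_cost, search_range=7)
evaluations = len(calls)
print(best_vector, best_cost, evaluations)
```

On this smooth error surface the search reaches the minimum with far fewer evaluations than the 225 positions of a full search over the same ±7 window. On real video the error surface can have several local minima, which is why a fast search may settle on a worse match than the full search.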
Chapter Five
5. Transform Coding
Transform coding is a central component of most video coding systems and standards. Spatial image data (image samples or motion-compensated residual samples) are transformed into a different representation. The reason is that spatial image data is difficult to compress: neighboring samples are highly correlated and the energy is distributed across the image, which makes it difficult to discard data or even reduce the precision of the data without disturbing the image quality. A suitable transform should compact the image energy (concentrate the energy into a small number of significant values), decorrelate the data (so that discarding insignificant data has a minimal effect on image quality) and be suitable for practical implementation in software and hardware.
5.1 Two Dimensional Discrete Cosine Transform (2-D DCT)

The 2-D DCT transforms a 2-D block of samples into a block of coefficients. Figure (5.1) shows a 720x572 pixel image from which an 8x8 block is taken; the next step shows the sample values of the block, and finally the block is transformed with the 2-D DCT to produce the coefficients shown in the last part. The compaction and decorrelation performance of the DCT increases with the block size, but so does the computational complexity. A block size of 8x8 is commonly used in image and video coding applications, as it gives a good compromise between compression efficiency and computational efficiency.
Figure 5.1 2-D DCT performed on an 8x8 block of an image
Equation (5.1) is used to calculate the forward DCT for an 8x8 block of image samples [16].
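$$ F_{x,y} = \frac{C(x)\,C(y)}{4} \sum_{i=0}^{7} \sum_{j=0}^{7} f_{i,j} \cos\frac{(2i+1)x\pi}{16} \cos\frac{(2j+1)y\pi}{16} \tag{5.1} $$
where C(x) = 1/\sqrt{2} for x = 0 and C(x) = 1 otherwise, and likewise for C(y).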
The inverse DCT (IDCT) reconstructs a block of image samples from an array of DCT coefficients. The IDCT takes as its input a block of 8x8 DCT coefficients F_{x,y} and reconstructs a block of 8x8 image samples f_{i,j}, Equation (5.2).
$$ f_{i,j} = \sum_{x=0}^{7} \sum_{y=0}^{7} \frac{C(x)\,C(y)}{4} F_{x,y} \cos\frac{(2i+1)x\pi}{16} \cos\frac{(2j+1)y\pi}{16} \tag{5.2} $$
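The forward and inverse transforms can be written directly from their definitions. The Python sketch below uses the standard 8x8 DCT-II with the normalization factor C(k) = 1/sqrt(2) for k = 0 and 1 otherwise; it is a direct, unoptimized evaluation intended only to illustrate the equations, and the function names are illustrative.

```python
import math


def norm(k):
    """Normalization factor C(k) of the 8x8 DCT."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def basis(i, x):
    """Cosine term cos((2i + 1) x pi / 16)."""
    return math.cos((2 * i + 1) * x * math.pi / 16)


def dct_8x8(samples):
    """Forward 2-D DCT of an 8x8 block of samples, Equation (5.1)."""
    return [[norm(x) * norm(y) / 4 *
             sum(samples[i][j] * basis(i, x) * basis(j, y)
                 for i in range(8) for j in range(8))
             for y in range(8)] for x in range(8)]


def idct_8x8(coeffs):
    """Inverse 2-D DCT of an 8x8 block of coefficients, Equation (5.2)."""
    return [[sum(norm(x) * norm(y) / 4 * coeffs[x][y] * basis(i, x) * basis(j, y)
                 for x in range(8) for y in range(8))
             for j in range(8)] for i in range(8)]


flat_block = [[100] * 8 for _ in range(8)]
flat_coeffs = dct_8x8(flat_block)
print("DC coefficient of a flat block:", round(flat_coeffs[0][0], 6))
```

A block of constant value carries all of its energy in the single top-left (DC) coefficient, which is the extreme case of the energy compaction described in this section.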
Figure (5.2) shows the intensity map for a block of image samples and, next to it, the 2-D DCT coefficients. The energy in the transformed coefficients is concentrated around the top-left corner of the array of coefficients (compaction). The top-left coefficients correspond to low frequencies; there is a peak in energy in this area, and the values decrease towards the bottom right of the array (the higher-frequency coefficients) [17].
Figure 5.2 An image with the intensity map along with the compacted version (panels: Intensity map, DCT coefficients)
5.2 Quantization
The function of the coder is to transmit the DCT block to the decoder in a bit-rate-efficient manner, so that the decoder can perform the inverse transform to reconstruct the image. It has been observed that the numerical precision of the DCT coefficients may be reduced while still maintaining good image quality at the decoder. Quantization is used to reduce the number of possible values to be transmitted, reducing the required number of bits. In practice, the high-frequency coefficients are quantized more coarsely than the low-frequency coefficients. Note that the quantization noise introduced by the coder is not reversible in the decoder, making the coding and decoding process 'lossy'. At quality 50 (i.e. 84% zeros) there is almost no visible loss in the image, yet the compression is high. At lower quality levels, the quality drops considerably while the compression does not increase by much [16] [17].
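The effect of coarser quantization at high frequencies can be illustrated with a simple uniform quantizer. The Python sketch below is only an assumption-laden example: the step size rule (`base + slope * (x + y)`) and the coefficient values are invented for illustration and do not correspond to the quantization weights used in the experiments or to any standard quantization matrix.

```python
import math


def step_size(x, y, base=8, slope=4):
    """Hypothetical quantizer step that grows with the spatial frequency x + y."""
    return base + slope * (x + y)


def quantize(coeffs):
    """Map each coefficient to an integer level (rounding half up)."""
    return [[math.floor(coeffs[x][y] / step_size(x, y) + 0.5) for y in range(8)]
            for x in range(8)]


def dequantize(levels):
    """Rescale the levels at the decoder; the rounding error is not recoverable."""
    return [[levels[x][y] * step_size(x, y) for y in range(8)] for x in range(8)]


# Invented coefficient block: a large DC value and two small AC values.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[7][7] = 800.0, 30.0, 10.0

levels = quantize(coeffs)
restored = dequantize(levels)
zeros = sum(level == 0 for row in levels for level in row)
print("levels[0][:2] =", levels[0][:2], "zero levels:", zeros)
```

The small high-frequency coefficient is mapped to zero and is lost, while the low-frequency coefficient survives with a rounding error of at most half a step; this loss is what makes the scheme lossy.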
This part shows that the DCT exploits interpixel redundancies to provide excellent decorrelation for most natural images. Thus, all (uncorrelated) transform coefficients can be encoded independently without compromising coding efficiency. In addition, the DCT packs energy into the low-frequency regions. Therefore, some of the high-frequency content can be discarded without significant quality degradation. Such a quantization scheme causes a further reduction in the average number of bits per pixel. Lastly, successive frames in a video transmission exhibit high temporal correlation, which can be employed to improve coding efficiency [16] [17].
Figure 5.3 Inverse DCT of Trees; (a) DCT(100%); (b) DCT(75%); (c) DCT(50%); (d) DCT(25%).
Chapter Six
6. Analog Devices Hardware & Software Experience
The EZ-KIT Lite includes an ADSP-BF561 Processor desktop evaluation board along with an evaluation suite of the VisualDSP++ development and debugging environment with the C/C++ compiler, assembler, and linker. It also includes sample processor application programs.
Figure 6.1 Image of the BF561 hardware
6.1 ADZS-BF561-EZLITE
ADSP-BF561 Blackfin processor (600 MHz)
SDRAM: 64 MB
Flash memory: 8 MB
AD1836A Analog Devices 96 kHz audio codec
4 input RCA phono jacks (2 channels)
6 output RCA phono jacks (3 channels)
ADV7183A video decoder w/ 3 input RCA phono jacks
ADV7179 video encoder w/ 3 output RCA phono jacks
Universal asynchronous receiver/transmitter (UART)
20 LEDs: 1 power (green), 1 board reset (red), 1 USB (red), 16 general purpose (amber), and 1 USB monitor (amber)
5 push buttons with debounce logic: 1 reset, 4 programmable flags
Expansion interface
JTAG ICE 14-pin header
Figure 6.2 Connector Locations
6.2 VisualDSP++ Release 5.0

The ADSP-BF561 is supported by a complete set of CROSSCORE software and hardware development tools, including Analog Devices emulators and the VisualDSP++ development environment. The same emulator hardware that supports other Analog Devices processors also fully emulates the ADSP-BF561. The VisualDSP++ project management environment lets programmers develop and debug an application. This environment includes an easy-to-use assembler that is based on an algebraic syntax, an archiver (librarian/library builder), a linker, a loader, a cycle-accurate instruction-level simulator, a C/C++ compiler, and a C/C++ runtime library that includes DSP and mathematical functions. A key point for these tools is C/C++ code efficiency. The compiler has been developed for efficient translation of C/C++ code to Blackfin assembly.
VisualDSP++ Features: The Blackfin processor has architectural features that improve the efficiency of compiled C/C++ code.
The VisualDSP++ debugger has a number of important features. Data visualization is enhanced by a plotting package that offers a significant level of flexibility. This graphical representation of user data enables the programmer to quickly determine the performance of an algorithm. As algorithms grow in complexity, this capability can have increasing significance for the designer's development schedule, increasing productivity. Statistical profiling enables the programmer to nonintrusively poll the processor as it is running the program. This feature, unique to VisualDSP++, enables the software developer to passively gather important code execution metrics without interrupting the real-time characteristics of the program. Essentially, the developer can identify bottlenecks in software quickly and efficiently. By using the profiler, the programmer can focus on those areas in the program that impact performance and take corrective action.
Figure 6.3 VisualDSP++ Release 5.0
When debugging both C/C++ and assembly programs with the VisualDSP++ debugger, programmers can:
View mixed C/C++ and assembly code (interleaved source and object information).
Insert breakpoints.
Set conditional breakpoints on registers, memory, and stacks.
Trace instruction execution.
Perform linear or statistical profiling of program execution.
Fill, dump, and graphically plot the contents of memory.
Perform source level debugging.
Create custom debugger windows.
6.3 Implementation and Testing
The ADSP-BF561 Processor desktop evaluation board was used together with the VisualDSP++ software to test their performance. The board is supplied with a PAL or NTSC video signal, and the data is buffered in SDRAM. The buffered video frame is then sent out to the video monitor. In this application, no processing is done on the buffered video frames. Connect the board to the power supply and to the PC with the USB cable provided, then follow these steps to test this application:
1. ADSP-BF561 EZ-KIT LITE SETTINGS
SW2: 1-OFF 2-OFF 3-OFF 4-OFF 5-OFF 6-ON
SW3: 1-OFF 2-ON 3-ON 4-OFF
SW4: 1-ON 2-ON 3-ON 4-ON 5-OFF 6-OFF
SW5: 1-OFF 2-ON 3-ON 4-ON
SW10: 1-OFF 2-OFF 3-OFF 4-OFF 5-OFF 6-OFF
SW11: 1-OFF 2-OFF 3-OFF 4-OFF
SW12: 1-ON 2-ON 3-ON 4-ON
SW13: 1-ON 2-ON
2. External connections
Connect a monitor to the EZ-Kit video-out connector and a video source to the EZ-Kit video-in connector. The video connectors are the bank of 6 RCA-style jacks nearest the serial cable connector on the EZ-Kit, labeled J6.
Figure 6.4 Connection to Video In and Video Out devices
3. Operational Description
Open the "VideoInVideoOut.dpj" project in the VisualDSP++ Integrated Development Environment.
Under the "Project" tab, select "Build Project" (the program is then loaded automatically into the DSP).
Run the executables by pressing "multiprocessor run" (CTRL-F5) on the toolbar.
Halt the processor ("multiprocessor halt" button).
If you open a memory window and go to the addresses of sFrame0, 1, 2, 3, you see the video data of the four frames.
Chapter Seven
7. Experiments & Analysis
In this part, some of the properties and enhancements of the MPEG-2 video compression standard are tested using MATLAB, a high-level language and interactive environment that enables computationally intensive tasks to be performed faster than with traditional programming languages. The first step is to load a video into MATLAB and divide it into a number of frames, as mentioned before. Then comes the important part, which is to divide each of these frames into equal parts (macroblocks); the macroblock is the small element that undergoes each operation until the end.
7.1 Exact Procedure

[1] Use the fopen command to open the video file, and then set the frame components according to the given 4:2:0 ratio for luminance and chrominance. Also calculate the frame size from the luminance and chrominance sizes, and obtain the number of frames by dividing the file size by the frame size.
[2] The GOP used for this test is [ IBBPBBPBBI ]; therefore the next step is to classify each frame by type, so that it is easy to call each frame during the process. The fread command reads the video file data opened in MATLAB in binary format into matrices, and fseek is then used to move between video frames and classify them.
[3] In this step the P and B-frames pass through the motion estimation and compensation part. The main task in this step, as mentioned before, is to obtain the motion vectors for each frame along with the difference frame. The motion estimation function takes as input the current frame to be coded, the reference frame, the type of search, the macroblock size and the search window size. The search is divided into three branches: Raster, Spiral and Logarithmic search:
Raster search function: This function first calculates the number of macroblocks within each frame and then, for each macroblock, moves through the fixed search window in raster order, block by block from the beginning of the search window to its end, to find the minimum-difference macroblock and obtain the motion vectors.
Spiral search function: Operates in the same way as the raster search function; the only difference is that this function moves through the candidate positions in spiral order, starting from the current macroblock's own location, which is the center of the search window. This has a computational advantage because the best match is likely to occur near the center of the search window.
Logarithmic search function: Searches for the best matching block in the logarithmic way described in part (4.2.2.1). Unlike the raster and spiral full searches, the logarithmic search is a fast search technique. It does not take the search window into account: it searches the whole frame for the best match and, instead of the search window, takes another input parameter, which is the number of steps in each move, N.
[4] The second main part of the experiment is motion compensation, which takes as input the calculated motion vectors, the macroblock size and the reference frame. It creates a new frame with the size of the reference frame and first fills it with zeros, then divides the frame into non-overlapping macroblocks and finally fetches the matching macroblock from the reference frame.
[5] The peak signal-to-noise ratio (PSNR) function is the basis of all analysis and measurements. It takes two frames, the current frame and the compensated frame, and calculates the PSNR of the compensated frame using Equation (2.1), giving the final value in dB.
[6] The discrete cosine transform function is very useful for the analysis. It works on an individual frame to calculate the 2-D DCT coefficients for each 8x8 pixel block and to construct the DCT image with the preferred quantization weight of compression. First it reads the input image using the imread command and converts it with double to get better precision for the loaded image values. Next, the image is divided into non-overlapping 8x8 blocks and the DCT is performed on each block using the built-in dct2 function. After that, the resulting block is multiplied by a block of values that specifies the quantization weight (i.e. 1:64, 10:64, 32:64). The inverse DCT is done in the same way: the frame is again divided into blocks and the idct2 function is applied to each to obtain the reconstructed image.
[7] Finally, figures and graphs are plotted that show the real and the predicted frames, the PSNR for different frames and different search types, the complexity of each operation, and the DCT and IDCT results with different compression ratios.
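Steps [1], [4] and [5] of the procedure can be sketched in a few lines. The experiments themselves were carried out in MATLAB; the Python below is only an illustrative sketch, not the thesis code. It assumes 8-bit samples, motion vectors stored as (dx, dy) per macroblock that never point outside the frame, and the standard PSNR definition 10 log10((2^n - 1)^2 / MSE), which is assumed to correspond to Equation (2.1). All function names and sample values are hypothetical.

```python
import math


def yuv420_frame_bytes(width, height):
    """Step [1]: bytes per 8-bit 4:2:0 frame (full-size Y, quarter-size Cb and Cr)."""
    return width * height + 2 * (width // 2) * (height // 2)


def frame_count(file_size_bytes, width, height):
    """Step [1]: number of frames obtained by dividing file size by frame size."""
    return file_size_bytes // yuv420_frame_bytes(width, height)


def motion_compensate(reference, motion_vectors, mb_size):
    """Step [4]: build the predicted frame from the reference and the motion vectors."""
    height, width = len(reference), len(reference[0])
    predicted = [[0] * width for _ in range(height)]   # frame first filled with zeros
    for top in range(0, height, mb_size):
        for left in range(0, width, mb_size):
            dx, dy = motion_vectors[(top // mb_size, left // mb_size)]
            for y in range(mb_size):
                for x in range(mb_size):
                    predicted[top + y][left + x] = reference[top + y + dy][left + x + dx]
    return predicted


def psnr(current, predicted, bits=8):
    """Step [5]: PSNR in dB between the current frame and the compensated frame."""
    count = len(current) * len(current[0])
    error = sum((c - p) ** 2
                for c_row, p_row in zip(current, predicted)
                for c, p in zip(c_row, p_row)) / count
    if error == 0:
        return float("inf")
    return 10 * math.log10((2 ** bits - 1) ** 2 / error)


# Tiny invented example: a 4x4 frame, 2x2 macroblocks, one vector per macroblock.
reference = [[row * 4 + col for col in range(4)] for row in range(4)]
vectors = {(0, 0): (1, 1), (0, 1): (0, 0), (1, 0): (0, -1), (1, 1): (-1, 0)}
predicted = motion_compensate(reference, vectors, mb_size=2)
current = [[value + 1 for value in row] for row in predicted]

print("bytes per QCIF frame:", yuv420_frame_bytes(176, 144))
print("PSNR of the predicted frame:", round(psnr(current, predicted), 2), "dB")
```

A QCIF frame occupies 38016 bytes under these assumptions, so the frame count of a raw file follows from an integer division of its size by this value.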
7.2 Results for Different Schemes

In this part of the thesis, some parameters are varied and their effect on quality is observed. As mentioned before, reducing the macroblock size gives better performance for this compression technique, which can easily be observed from the output quality. The PSNR is calculated for the first GOP of the video for different macroblock sizes [ 1 , 8 , 16 ] and for different search algorithms [Logarithmic, Raster and Spiral search]. Before going through the test steps, it is important to mention the uncompressed video file used and its specifications. The video file is Foreman.yuv, which is very popular in video processing and testing. The video was converted from avi format to yuv format using the Windows Command Processor; it is a QCIF video with 30 frames per second, a resolution of 176x144 and 4:2:0 sub-sampling. Different macroblock sizes are now used while the search window is kept constant at 25x25 macroblocks for the full search algorithms, and the variation of quality for the different types of search is observed.
Figure 7.1 PSNR for {P} and {B} predicted frames using Logarithmic search
In the above graphs, the predicted frames from one to eight between two I-frames are [ B{1} B{2} P{1} B{3} B{4} P{2} B{5} B{6} ]. It is clear that the very small macroblock of 1x1 results in high quality, especially with the full search, which tries to find the best match among all candidate macroblocks. Note also that the search window is constant at 7. Looking at the PSNR values, B{1} and B{2} have the best PSNR. This was expected because they use the first I-frame and the first P-frame for motion estimation and compensation: I-frames are not compressed at all and are sent with full detail, and the first P-frame is always reasonably good because it is predicted from an I-frame. The second P-frame has the worst PSNR value because it was predicted from a predicted frame, P{1}. The last two B-frames rise to high quality again as they are close to an I-frame.
Figure 7.2 PSNR for {P} and {B} predicted frames using Raster full search
Figure 7.3 PSNR for {P} and {B} predicted frames using Spiral full search
It is now clear that the two full search algorithms perform better than the fast Logarithmic search algorithm. It is also important to mention that the Raster and Spiral full search algorithms resulted in the same PSNR values, as both search the whole search window and differ only in technique, that is, in the order in which the candidate positions are visited.
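The reason both full searches reach the same quality can be illustrated with a small sketch. It assumes the sum of absolute differences (SAD) as the matching criterion and a displacement range of plus or minus p pixels; the function names are hypothetical, and the centre-out ordering is only a simple ring-by-ring stand-in for a true spiral scan.

```python
def sad(cur, ref, br, bc, dr, dc, n):
    """SAD between the n x n block at (br, bc) in cur and the block
    displaced by (dr, dc) in ref."""
    return sum(abs(cur[br + i][bc + j] - ref[br + dr + i][bc + dc + j])
               for i in range(n) for j in range(n))


def raster_order(p):
    """All displacements in [-p, p] x [-p, p], row by row."""
    return [(dr, dc) for dr in range(-p, p + 1) for dc in range(-p, p + 1)]


def centre_out_order(p):
    """Same displacements, visited ring by ring from the centre outwards
    (ASSUMPTION: simplified substitute for a spiral scan)."""
    return sorted(raster_order(p), key=lambda d: max(abs(d[0]), abs(d[1])))


def full_search(cur, ref, br, bc, n, p, order):
    """Exhaustive block matching; returns (best_sad, (dr, dc))."""
    h, w = len(ref), len(ref[0])
    best = None
    for dr, dc in order:
        r, c = br + dr, bc + dc
        if r < 0 or c < 0 or r + n > h or c + n > w:
            continue  # candidate block falls outside the reference frame
        cost = sad(cur, ref, br, bc, dr, dc, n)
        if best is None or cost < best[0]:
            best = (cost, (dr, dc))
    return best
```

Because both orderings evaluate exactly the same set of candidates, the minimum SAD they find is identical; only the visiting order differs, and with it the motion vector chosen when several candidates tie.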
Figure 7.4 PSNR for predicted frames of the Foreman video using Raster full search with different search window sizes
Figure 7.4 shows the PSNR values of the predicted frames as the search window changes while the macroblock size is kept constant at 8x8 pixels. With the raster full search, increasing the search region clearly gives more precision and accuracy in finding the best match. Figure 7.4 also shows that the PSNR of all predicted frames changes together, as the macroblock size is constant. A closer look at the figure shows that the difference between search windows 7 and 15 is larger than the difference between 15 and 25: increasing the search window much further does not give much better quality, because the macroblock being searched for is likely to be near its original position in most video scenes.
Another video file, Stephan, is also tested. It is important for our test because its scenes, taken from a tennis match, contain a lot of motion and variation between frames. This video is a CIF sequence at 30 frames per second with a resolution of 352x288 and 4:2:0 sub-sampling. It is tested with different macroblock sizes while keeping a constant search window of 25 for the full search algorithms, observing the variation in quality for the different types of search. It is also tested with different search window sizes while keeping the macroblock size constant at 8x8.
Figure 7.5 PSNR for {P} and {B} predicted frames of the Stephan video using Logarithmic search
Figure 7.6 PSNR for {P} and {B} predicted frames of the Stephan video using full search algorithms
A noteworthy point from the previous two graphs is that, for a macroblock size of 1x1, there is a huge difference in the resulting PSNR between the fast and full search algorithms, whereas for macroblock sizes 8x8 and 16x16 the PSNR values are almost the same. This means that in a rapid-motion scene such as the tennis match, using a different search technique with a large block size will not improve the quality, and the only way to increase the PSNR values is to reduce the macroblock size.
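The cost of reducing the macroblock size can be made concrete by counting how many blocks have to be matched per frame. The short sketch below does this for the two frame formats used in the tests; it counts luma blocks only, which is an assumption made here for simplicity, and the function name is illustrative.

```python
def blocks_per_frame(width, height, n):
    """Number of non-overlapping n x n macroblocks in a width x height frame."""
    return (width // n) * (height // n)


FORMATS = {"QCIF": (176, 144), "CIF": (352, 288)}

for name, (w, h) in FORMATS.items():
    for n in (16, 8, 1):
        print(name, n, blocks_per_frame(w, h, n))
```

For QCIF this gives 99, 396 and 25344 blocks for macroblock sizes 16, 8 and 1; for CIF it gives 396, 1584 and 101376. Each of these blocks requires its own search in the reference frame, which is why the 1x1 case is by far the most expensive.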
Figure 7.7 PSNR for {P} and {B} predicted frames of the Stephan video for different search windows using the Raster full search algorithm
As further support for our compression technique, another video file, Fish, is tested; it is also well known in video processing and testing. This video is a CIF sequence at 30 frames per second with a resolution of 352x288 and 4:2:0 sub-sampling. It is tested with different macroblock sizes and different search algorithms, keeping a constant search window of 25 for the full search algorithms. It is also tested with different search window sizes while keeping the macroblock size constant at 8x8.
Figure 7.8 PSNR for {P} and {B} predicted frames of the Fish video using the Logarithmic fast search algorithm
Figure 7.9 PSNR for {P} and {B} predicted frames of the Fish video using the Raster full search algorithm
Figure 7.10 PSNR for {P} and {B} predicted frames of the Fish video for different search windows using the Raster full search algorithm
Looking closely at the above figure, it can be observed that the PSNR values for search windows 15 and 25 are the same. Since the Fish video shows a yellow fish moving along with the capturing device, there is no need for a large window size and exhaustive searching, as the matching region is likely to be found very near the block's original position. Therefore, increasing the search window beyond 15 adds nothing to the quality.
Figure 7.11 Foreman video predicted frames, macroblock size 1, Logarithmic search
Figure 7.14 Stephan video predicted frames, macroblock size 1, Logarithmic search
Figure 7.17 Fish video predicted frames, macroblock size 1, Logarithmic search
Figure 7.20 Foreman video predicted frames, macroblock size 1, Raster search
Figure 7.23 Stephan video predicted frames, macroblock size 1, Raster search
Figure 7.26 Fish video predicted frames, macroblock size 1, Raster search
Figure 7.29 PSNR values for the Foreman video with search window 7 using Raster full search
Figure 7.32 PSNR values for the Stephan video with search window 7 using Raster full search
Figure 7.35 PSNR values for the Fish video with search window 7 using Raster full search
Transform coding, in the form of the two-dimensional discrete cosine transform, is now introduced into the coder. The I-frames are passed through the 2-D DCT for more compression, since I-frames are otherwise coded without any prediction and therefore without any compression. Hence, the 2-D DCT effectively improves the compression technique. The performance and the quality of the predicted frames are tested using several DCT compression qualities (i.e. quantization compression ratios of 1:64, 10:64, 21:64 and 36:64) on the Foreman video sequence, with a macroblock size of 8x8 and a search window of 7 for simplicity, and the variation of the predicted frames is observed.
Figure 7.38 PSNR for predicted frames using different 2-D DCT compression qualities
Figure 7.39 Foreman video predicted frames, no DCT
Figure 7.40 Foreman video predicted frames, DCT 36:64
Chapter 8 Conclusion & Future Work
8.1 Conclusion
In this thesis, various techniques for the motion estimation block matching algorithm were implemented and tested, and then the complete hybrid motion estimator and compensator with the discrete cosine transform was tested. The results of the previous chapter showed that: (1) Full search algorithms lead to better visual quality than fast search algorithms, but with a significant increase in execution time due to the complexity of the search strategy. (2) A smaller macroblock size enhances the block matching algorithm: it increases the probability of finding the best matching block in the reference frame, which resulted in better quality for all video sequences tested. (3) Increasing the search window also gives the algorithm more flexibility to find the best matching block; trying different search window sizes for various videos showed a significant improvement in PSNR. An important exception is the Fish video, which gave the same results for both search windows (15 and 25). This means that, for video with low motion characteristics, increasing the search window beyond a certain threshold adds nothing to the quality and only increases execution time. For the Stephan video, on the other hand, the difference between frames is large, which means that blocks (if they are the same ones) change their locations rapidly, so increasing the search window and searching a larger region gives better quality.
Regarding the DCT applied to the I-frames, we can conclude that the DCT increases the compression: I-frames are coded without any prediction, so transform coding these frames is an effective step for reducing the data. Because this DCT-based coding is lossy, it shows degradation in visual quality. Testing different quantizer compression matrices leads to different visual qualities, as shown in Section 7.2: as the compression increases from 64:64 (i.e. no compression) to 1:64 (i.e. highest compression), the quality decreases.
8.2 Future work:

Techniques enhancing the Motion estimation block matching coder performance
Much empirical research is currently under way to:
1. Improve the quality of the predicted (decoded) frames.
2. Decrease the computational complexity of the algorithm.
3. Modify the coder to achieve efficient execution time.
4. Increase coding efficiency.
Among the most obvious and novel improvements to our thesis, we found that changing the search window for different videos yields different results and hence different quality. Many experiments try to take advantage of adapting the search window to the type of video. Search Window Size Decision is an innovative topic that enhances the efficiency of this hybrid coder; its main idea is to apply a motion detection algorithm to decide the search window used in motion estimation, which reduces the coder's complexity [20].
Figure 8.1 Block diagram for search window size decision after motion is detected
Another innovative feature used in H.264/MPEG-4 AVC is the Block Size Selection Algorithm for inter-frame coding, which increases the encoder efficiency with insignificant degradation in picture quality. Results of the algorithm demonstrate a speed-up in encoding time of up to 73% compared with the H.264 benchmark. Block size is no longer fixed, but ranges from 4x4 to 16x16 for inter-frame coding [21].
Since the performance of any of the mentioned algorithms depends highly on the characteristics of the video content, no single algorithm can adapt to all kinds of video content. A multiple-stage motion estimation scheme for video compression, called the Content Adaptive Search Technique (CAST), was proposed to tackle this issue; it adapts to the video content to maximize the overall performance. The CAST scheme consists of four stages: motion vector field prediction, block-based segmentation, motion parameter extraction, and adaptive search strategy. By preprocessing the motion vector field of the previous reference frame in the first three stages, CAST extracts the motion parameters for each region. The fourth stage is a combination of various techniques, including motion vector prediction, search area decision and an adaptive fast search algorithm that is adjusted by a mathematical model of the block distortion surface (BDS). The CAST scheme improves the visual quality while running faster than the other predictive ME algorithms [22].
References
[1] S. Dhahri, A. Zitouni, H. Chaouch, and R. Tourki, Adaptive Motion Estimator Based on Variable Block Size Scheme, Proceedings of World Academy of Science, Engineering and Technology, Volume 38, February 2009.
[2] U-V. Koc, Low Complexity and High Throughput Fully DCT-Based Motion Compensated Video Coders, National Science Foundation Engineering Research Center Program, University of Maryland, Harvard University, 1996.
[3] Fayez M. Idris, An Algorithm and Architecture for Video Compression, School of Graduate Studies and Research, University of Ottawa, 1993.
[4] Lai Kam Cheong, Enhancing Techniques for a Standard Conforming Real-Time Video Codec, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, September 2002.
[5] Colin E. Manning, Motion Compensated Video Compression Overview, http://www.newmediarepublic.com/dvideo/compression/adv08.html#blockmatching
[6] Iain E. G. Richardson, Video Codec Design: Developing Image and Video Compression Systems, Chichester: Wiley, 2002.
[7] John G. Proakis, Dimitris G. Manolakis, Digital Signal Processing, 4th edition, Prentice Hall, 2007.
[8] Yao Wang, Jorn Ostermann, Ya-Qin Zhang, Video Processing and Communications, Prentice Hall, Upper Saddle River, NJ 07458, 2002.
[9] Dave Marshall, http://www.cs.cf.ac.uk/Dave/Multimedia/node256.html, April 2001.
[10] Dr. Leonardo Chiariglione, http://www.chiariglione.org/mpeg/index.asp, I-10040 Villar Dora, Italy.
[11] A. Zakhor, EECS 290T: Multimedia Signal Processing, Communications and Networking, University of California at Berkeley, Department of Electrical Engineering & Computer Sciences, Spring 2004, http://inst.eecs.berkeley.edu/~ee290t/sp04/
[12] K. R. Rao, Z. S. Bojkovic, D. A. Milovanovic, Multimedia Communication Systems: Techniques, Standard and Networks, Prentice Hall PTR, 2002.
[13] J. G. Apostolopoulos and S. J. Wee, Video Compression Standards, Wiley Encyclopedia of Electrical and Electronics Engineering, John Wiley & Sons, Inc., New York, 1999.
[14] V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms and Architectures, Boston, Massachusetts: Kluwer Academic Publishers, 1997.
[15] Yu-Nan Pan, A Fast Search Algorithm for Motion Estimation on H.264/AVC, Department of Electrical Engineering, National Central University, Jhongli 320, Taiwan, July 2004.
[16] Syed Ali Khayam, The Discrete Cosine Transform (DCT): Theory and Application, Department of Electrical & Computer Engineering, Michigan State University, March 2003.
[17] Ken Kabeen, Peter Gent, Image Compression and the Discrete Cosine Transform, College of the Redwoods.
[18] Processor Development Tools, http://www.autex.ru/dspa/dspa2008/04.pdf
[19] ADSP-BF561 EZ-KIT Lite Evaluation System Manual, Analog Devices, Inc., 2008, http://www.analog.com/static/imported-files/eval_kit_manuals/ADSP-BF561%20EZKIT%20LIte%20Manual%20Rev%203-2%20March%202008.pdf
[20] Gianluca Bailo, Massimo Bariani, Ivano Barbieri, Marco Raggio, Search Window Size Decision for Motion Estimation Algorithm in H.264 Video Coder, Department of Biophysical and Electronic Engineering, University of Genova, Italy, 2004.
[21] Hyungjoon Kim and Yucel Altunbasak, Low-Complexity Macroblock Mode Selection for H.264/AVC Encoders, Center of Signal and Image Processing, Georgia Institute of Technology, Atlanta, 2004.
[22] Jiancong Luo, Ishfaq Ahmed, Yu Sun and Yongfang Liang, A Multistage Fast Motion Estimation Scheme for Video Compression, Department of Computer Science and Engineering, University of Texas, Arlington, 2004.