
Compression Techniques and Cyclic Redundancy Check

Prateek B K, Sr. No. 07917, 5th Semester
November 10, 2009

Contents

1 Data Compression
   1.1 Introduction
   1.2 Lossless Compressions
        1.2.1 Huffman Coding
        1.2.2 Run Length Coding
        1.2.3 Algorithm
   1.3 Lossy Compression

2 Cyclic Redundancy Check
   2.1 Introduction
   2.2 Specification
   2.3 Algorithm

1 Data Compression

1.1 Introduction

Coding comprises all techniques that aim to reduce redundancy in the signal. The amount of redundancy in a signal depends on the distribution of the signal values and the statistical dependencies between them. Uniformly distributed white noise, for instance, contains no redundancy. Coding techniques are reversible, which means the operations can be inverted without loss of digital information (identity between the original and the reconstructed signal). Data reduction, by contrast, comprises all methods that remove the irrelevant parts of the signal content; this results in changed signal information.

These two categories are supplemented by a third, which I have termed decorrelation. All techniques that concentrate the signal information (or signal energy) into a few signal values belong to this class. Typically, the distribution of the signal values becomes sharper after decorrelation. Most decorrelation algorithms work with floating point numbers; in that case the operations are not reversible, due to the limited precision of digital numbers in computers. Some methods can operate with integer numbers and guarantee reversibility. Generally, all processing steps are based on assumptions about the signal, in terms of a signal model and sets of parameters. If the parameters change or the model is altered, the compression result becomes worse; this can be avoided by adaptation. The analysis of existing compression systems, or the development of new ones, should always distinguish between the basic technique and its adaptation.
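To make the decorrelation idea concrete, here is a small sketch of an integer-reversible decorrelation step. Delta coding is chosen purely for illustration; the text does not name a specific method, and all identifiers below are assumptions of this sketch.

```python
def delta_encode(samples: list) -> list:
    """Store the first sample and then only the differences between
    neighbours. For a smooth signal the differences are small, so the
    signal energy is concentrated into a few small values."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas: list) -> list:
    """Invert delta_encode exactly: only integer additions are involved."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

x = [10, 12, 13, 13, 14, 20]
assert delta_decode(delta_encode(x)) == x
print(delta_encode(x))  # [10, 2, 1, 0, 1, 6]
```

Because only integer arithmetic is used, this step is one of the decorrelation methods that guarantee reversibility in the sense described above.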

1.2 Lossless Compressions

The main objective of this section is to introduce two important lossless compression algorithms: Huffman coding and Lempel-Ziv coding. A Huffman encoder takes a block of input characters of fixed length and produces a block of output bits of variable length; it is a fixed-to-variable length code. Lempel-Ziv, on the other hand, is a variable-to-fixed length code. The design of the Huffman code is optimal (for a fixed block length) assuming that the source statistics are known a priori. The Lempel-Ziv code is not designed for any particular source but for a large class of sources. Surprisingly, for any fixed stationary and ergodic source, the Lempel-Ziv algorithm performs just as well as if it had been designed for that source. Mainly for this reason, the Lempel-Ziv code is the most widely used technique for lossless file compression.

1.2.1 Huffman Coding

The basic idea in Huffman coding is to assign short codewords to input blocks with high probabilities and long codewords to those with low probabilities; the concept is similar to that of the Morse code. A Huffman code is designed by merging together the two least probable characters and repeating this process until there is only one character remaining. A code tree is thus generated, and the Huffman code is obtained from the labeling of the code tree. It does not matter in which order characters of equal probability are arranged, nor how the branches of the final code tree are labeled with 0s and 1s (for example, upper branches with 0s and lower branches with 1s). Where there is a tie for the two least probable characters, any tie-breaking procedure is acceptable, so Huffman codes are not unique. They are optimal in the sense that no other lossless fixed-to-variable length code has a lower average rate. In the worked example (not reproduced here), the rate of the resulting code is 2.94 bits/character, while the entropy lower bound is 2.88 bits/character.
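As a rough sketch of this merging procedure (the heap-based construction below and every identifier in it are assumptions of this illustration, not part of the original notes), a Huffman code can be built as follows:

```python
import heapq
from collections import Counter

def huffman_code(text: str) -> dict:
    """Build a Huffman code by repeatedly merging the two least probable
    subtrees, prefixing 0 to one branch and 1 to the other."""
    counts = Counter(text)
    if len(counts) == 1:                      # degenerate single-symbol source
        return {sym: "0" for sym in counts}
    # Heap entries: (weight, tie-breaker, {symbol: partial codeword}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, first = heapq.heappop(heap)    # least probable subtree
        w2, _, second = heapq.heappop(heap)   # second least probable subtree
        merged = {s: "0" + c for s, c in first.items()}
        merged.update({s: "1" + c for s, c in second.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

code = huffman_code("abracadabra")
avg = sum(len(code[ch]) for ch in "abracadabra") / len("abracadabra")
print(code, avg)   # about 2.09 bits per character for this example string
```

Ties are broken here by insertion order, which is one of the arbitrary but acceptable tie-breaking rules mentioned above; a different implementation may therefore produce a different, equally optimal code.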

1.2.2 Run Length Coding

RLE is probably the easiest compression algorithm there is. It replaces sequences of the same data value within a file by a count number and a single value. Suppose the following string of data (17 bytes) has to be compressed:

ABBBBBBBBBCDEEEEF

Using RLE compression, the compressed file takes up 10 bytes and could look like this:

A *8B C D *4E F

As you can see, repetitive strings of data are replaced by a control character (*) followed by the number of repeated characters and the repeated character itself. The control character is not fixed; it can differ from implementation to implementation. If the control character itself appears in the file, then one extra character is coded.
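A minimal sketch of such a scheme follows. The choice of '*' as the control character, the rule of doubling a literal control character, and the decimal run counts are all assumptions of this illustration; as the text notes, real implementations differ in these details.

```python
def rle_encode(data: str, control: str = "*") -> str:
    """Replace runs of 4 or more identical characters by
    control + run length + character; copy shorter runs literally."""
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == ch:
            run += 1
        if run >= 4 and ch != control:
            # A real scheme would cap the run length so the count fits one byte.
            out.append(f"{control}{run}{ch}")
        else:
            # Literal control characters are escaped by doubling them.
            out.append((ch * run).replace(control, control * 2))
        i += run
    return "".join(out)

print(rle_encode("ABBBBBBBBBCDEEEEF"))  # A*9BCD*4EF
```

The output differs slightly from the worked example above because the counting convention is an implementation choice; the point is only that long runs collapse into three characters.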

As the example above shows, RLE encoding is only effective if there are sequences of 4 or more repeating characters, because three characters are used to encode a run, so coding two repeating characters would even lead to an increase in file size. It is important to know that there are many different run-length encoding schemes; the example has only been used to demonstrate the basic principle of RLE encoding. Sometimes the implementation of RLE is adapted to the type of data being compressed.

Advantages and disadvantages: this algorithm is very easy to implement and does not require much CPU horsepower. RLE compression is only efficient with files that contain a lot of repetitive data. These can be text files if they contain lots of spaces for indenting, but line-art images that contain large white or black areas are far more suitable. Computer-generated colour images (e.g. architectural drawings) can also give fair compression ratios.

Where is RLE compression used? RLE compression can be used in the following file formats:

TIFF files

PDF files

1.2.3 Algorithm
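Section 1.2 introduces Lempel-Ziv coding as a variable-to-fixed length code that needs no prior knowledge of the source statistics. As a hedged illustration only (the notes do not spell out a particular variant, so the LZ78-style dictionary parsing below and all of its identifiers are assumptions), a minimal sketch:

```python
def lz78_parse(data: str) -> list:
    """Parse `data` into (dictionary index, next character) pairs.
    Index 0 stands for the empty phrase; every emitted pair adds one new
    phrase to the dictionary, so frequently repeated material is soon
    represented by a single index."""
    dictionary = {}
    output = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch                      # keep extending a known phrase
        else:
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:                                # flush a trailing known phrase
        output.append((dictionary.get(phrase, 0), ""))
    return output

print(lz78_parse("ABABABA"))  # [(0, 'A'), (0, 'B'), (1, 'B'), (3, 'A')]
```

The longer the input, the longer the phrases in the dictionary become, which is why the method adapts to the source without knowing its statistics in advance.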

1.3 Lossy Compression

Lossy compression is a compression technique that does not decompress the data back to 100 percent of the original. Audio, video and some imaging applications can tolerate loss, and in many cases it may not be noticeable to the human ear or eye. In other cases it may be noticeable, but not critical to the application. The more tolerance for loss, the smaller the file can be compressed, and the faster the file can be transmitted over a network. Examples of lossy file formats are MP3, AAC, MPEG and JPEG. Lossy compression is never used for business data and text, which demand a perfect, lossless restoration.

Transform coding transforms the raw data into a domain that more accurately reflects the information content. For example, rather than expressing a sound file as the amplitude levels over time, one may express it as the frequency spectrum over time, which corresponds more closely to human audio perception.

While data reduction (compression, be it lossy or lossless) is a main goal of transform coding, it also allows other goals: one may represent data more accurately for the original amount of space. For example, in principle, if one starts with an analog or high-resolution digital master, an MP3 file of a given bitrate (e.g. 320 kbit/s) should provide a better representation than raw uncompressed audio in a WAV or AIFF file of the same bitrate. (Uncompressed audio can only get a lower bitrate by lowering the sampling frequency and/or the sampling resolution.) Further, a transform coding may provide a better domain for manipulating or otherwise editing the data; for example, equalization of audio is most naturally expressed in the frequency domain (boost the bass, for instance) rather than in the raw time domain. From this point of view, perceptual encoding is not essentially about discarding data, but rather about a better representation of data.

Another use is for backward compatibility and graceful degradation: in color television, encoding color via a luminance-chrominance transform domain (such as YUV) means that black-and-white sets display the luminance while ignoring the color information. Another example is chroma subsampling: the use of color spaces such as YIQ, used in NTSC, allows one to reduce the resolution of the components to accord with human perception. Humans have the highest resolution for black-and-white (luma), lower resolution for mid-spectrum colors like yellow and green, and the lowest for red and blue. Thus NTSC displays approximately 350 pixels of luma per scanline, 150 pixels of yellow vs. green, and 50 pixels of blue vs. red, which are proportional to human sensitivity to each component.

2 Cyclic Redundancy Check

2.1 Introduction

In the CRC method, a certain number of check bits, often called a checksum, are appended to the message being transmitted. The receiver can determine whether or not the check bits agree with the data, to ascertain with a certain degree of probability whether or not an error occurred in transmission. If an error occurred, the receiver sends a negative acknowledgement (NAK) back to the sender, requesting that the message be retransmitted. The technique is also sometimes applied to data storage devices, such as a disk drive. In this situation each block on the disk would have check bits, and the hardware might automatically initiate a reread of the block when an error is detected, or it might report the error to software. The material that follows speaks in terms of a sender and a receiver of a message, but it should be understood that it applies to storage writing and reading as well.

The circuit is implemented as follows:

1. The register contains n bits, equal to the length of the FCS.
2. There are up to n XOR gates.
3. The presence or absence of a gate corresponds to the presence or absence of a term in the divisor polynomial P(X).

The same circuit is used for both creation and check of the CRC. When creating the FCS, the circuit accepts the bits of the raw frame and then a sequence of zeros; the length of the sequence is the same as the length of the FCS. The contents of the shift register will be the FCS to append. When checking the FCS, the circuit accepts the bits of the received frame (the raw frame appended by the FCS and perhaps corrupted by errors). The contents of the shift register should be zero, or else there are errors.
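A software analogue of this shift-register circuit might look as follows (a hedged sketch: the generator polynomial, register width and example frame are illustrative assumptions, not values taken from the text):

```python
def crc_remainder(bits, poly, width):
    """Long division of the bit sequence by the generator polynomial.
    `poly` is the divisor P(X) with its high-order bit omitted (see 2.2);
    the positions of its 1-bits correspond to the XOR gates of the circuit."""
    mask = (1 << width) - 1
    reg = 0
    for b in bits:
        top = (reg >> (width - 1)) & 1   # bit about to be shifted out
        reg = ((reg << 1) | b) & mask    # shift the next input bit in
        if top:
            reg ^= poly
    return reg

def make_fcs(frame, poly, width):
    # Creation: feed the raw frame followed by `width` zero bits.
    return crc_remainder(frame + [0] * width, poly, width)

def check_frame(frame_with_fcs, poly, width):
    # Check: feed the received frame (raw frame plus FCS); the remainder must be zero.
    return crc_remainder(frame_with_fcs, poly, width) == 0

# Illustrative values: P(X) = X^3 + X + 1, written 1011, stored as 011.
frame = [1, 1, 0, 1]
fcs = make_fcs(frame, poly=0b011, width=3)
fcs_bits = [(fcs >> i) & 1 for i in range(2, -1, -1)]
print(fcs_bits, check_frame(frame + fcs_bits, poly=0b011, width=3))  # [0, 0, 1] True
```

As in the circuit description, the same routine serves both roles: padding the frame with zeros produces the FCS, and feeding the frame with its FCS appended yields a zero remainder when no errors occurred.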

2.2 Specification

The concept of the CRC as an error-detecting code gets complicated when an implementer or standards committee turns it into a practical system. Here are some of the complications:

Sometimes an implementation prefixes a fixed bit pattern to the bitstream to be checked. This is useful when clocking errors might insert 0-bits in front of a message, an alteration that would otherwise leave the CRC unchanged.

Sometimes an implementation appends n 0-bits (n being the size of the CRC) to the bitstream to be checked before the polynomial division occurs. This has the convenience that the CRC of the original bitstream with the CRC appended is exactly zero, so the CRC can be checked simply by performing the polynomial division on the expanded bitstream and comparing the remainder with zero.

Sometimes an implementation exclusive-ORs a fixed bit pattern into the remainder of the polynomial division.

Bit order: some schemes view the low-order bit of each byte as first, which during polynomial division then means leftmost, contrary to our customary understanding of low-order. This convention makes sense when serial-port transmissions are CRC-checked in hardware, because some widespread serial-port transmission conventions transmit bytes least-significant bit first.

Byte order: with multi-byte CRCs, there can be confusion over whether the byte transmitted first (or stored in the lowest-addressed byte of memory) is the least-significant byte or the most-significant byte. For example, some 16-bit CRC schemes swap the bytes of the CRC.

Omission of the high-order bit of the divisor polynomial: since the high-order bit is always 1, and since an n-bit CRC must be defined by an (n + 1)-bit divisor which overflows an n-bit register, some writers assume that it is unnecessary to mention the divisor's high-order bit.
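Two of these conventions, bit reflection and a final exclusive-OR, are easy to express in code (a hedged sketch; the 8-bit width and the 0xFF pattern are arbitrary illustrative choices, not values required by any particular standard):

```python
def reflect(value, width):
    """Reverse the bit order over `width` bits, i.e. treat the
    least-significant bit as the one transmitted first."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out

def finalize(remainder, xorout):
    """Exclusive-OR a fixed pattern into the remainder before it is sent."""
    return remainder ^ xorout

print(hex(finalize(reflect(0b11010010, 8), 0xFF)))  # 0xb4
```

Both steps are purely conventions applied around the polynomial division itself; sender and receiver merely have to agree on them.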

2.3 Algorithm
