
TERM PAPER CSE-408

TOPIC: HUFFMAN CODES

SUBMITTED TO: MR. VIJAY GARG

SUBMITTED BY: KARANBIR SINGH, B.TECH CSE, 10804631, RK1R08B39

ACKNOWLEDGEMENT

First and foremost, I, KARANBIR SINGH, am very thankful to Lect. VIJAY GARG, who assigned me this term paper on HUFFMAN CODES. I am heartily thankful to the college library for providing the books, and to my roommates and classmates for helping me assemble the notes related to this topic. Last but not least, I am very thankful to my parents, who gave me the financial support to complete this term paper.

KARANBIR SINGH

Contents

1) Introduction
2) Types of Huffman coding
   a) N-ary Huffman coding
   b) Adaptive Huffman coding
   c) Huffman template algorithm
   d) Length-limited Huffman coding
   e) Huffman coding with unequal letter costs
   f) Hu-Tucker coding
   g) Canonical Huffman code
3) Properties
4) Advantages
5) Disadvantages
6) Applications

INTRODUCTION

Huffman coding is an entropy encoding algorithm used for lossless data compression. It was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". Huffman coding is based on the frequency of occurrence of a data item (e.g., a pixel in images). The principle is to use a smaller number of bits to encode the data that occurs more frequently. Codes are stored in a code book, which may be constructed for each image or for a set of images. In all cases the code book plus the encoded data must be transmitted to enable decoding.

Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called a "prefix-free code"): the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol. The most common source symbols are expressed using shorter strings of bits than the less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits produces a smaller average output size when the actual symbol frequencies agree with those used to create the code. A method was later found to design a Huffman code in linear time if the input probabilities (also known as weights) are sorted.

For a set of symbols with a uniform probability distribution and a number of members that is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code" even when such a code is not produced by Huffman's algorithm.
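As a concrete illustration, the following is a minimal Python sketch of the classic construction: repeatedly merge the two least frequent subtrees, then read codewords off the root-to-leaf paths. The function name and the tie-breaking counter are illustrative choices made here, not part of any standard API.

    import heapq
    from collections import Counter

    def huffman_codes(text):
        # Build a code book for the symbols of `text` by repeatedly
        # merging the two least frequent subtrees. A tree is either a
        # symbol (leaf) or a (left, right) pair; the counter `tiebreak`
        # keeps heap entries comparable when weights are equal.
        freq = Counter(text)
        heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            w1, _, left = heapq.heappop(heap)
            w2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (w1 + w2, tiebreak, (left, right)))
            tiebreak += 1
        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):          # internal node
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:                                # leaf: a source symbol
                codes[tree] = prefix or "0"      # single-symbol edge case
        walk(heap[0][2], "")
        return codes

    print(huffman_codes("abracadabra"))
    # the frequent 'a' gets a short code; the rare 'c' and 'd' get longer ones

The exact codewords depend on how ties are broken, but the codeword lengths, and hence the average output size, are the same for any valid tie-breaking rule.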

TYPES OF HUFFMAN CODING

N-ary Huffman coding


The n-ary Huffman algorithm uses the {0, 1, ..., n-1} alphabet to encode messages and build an n-ary tree. This approach was considered by Huffman in his original paper. The same algorithm applies as for binary (n = 2) codes, except that the n least probable symbols are taken together, instead of just the 2 least probable. Note that for n greater than 2, not all sets of source words can properly form an n-ary tree for Huffman coding. In these cases, additional 0-probability placeholders must be added, because the tree must form an n-to-1 contractor; for binary coding this is a 2-to-1 contractor, and any sized set can form such a contractor. If the number of source words is congruent to 1 modulo n-1, then the set of source words will form a proper Huffman tree.
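The congruence condition translates directly into the number of placeholders to add. A small helper (the function name is an illustrative choice) computes it:

    def nary_padding(num_symbols, n):
        # Zero-probability placeholders needed so that the number of
        # source words is congruent to 1 modulo (n - 1).
        if n < 2 or num_symbols < 1:
            raise ValueError("need n >= 2 and at least one symbol")
        return (1 - num_symbols) % (n - 1)

    print(nary_padding(6, 3))  # 1: padding 6 symbols to 7 gives 7 % 2 == 1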

Adaptive Huffman coding


A variation called adaptive Huffman coding involves calculating the probabilities dynamically based on recent actual frequencies in the sequence of source symbols, and changing the coding tree structure to match the updated probability estimates.
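The sketch below conveys only the idea: it naively rebuilds the whole tree from the running counts after every symbol, which is far too slow for real use; practical adaptive coders (the FGK and Vitter algorithms) update the existing tree incrementally instead. The fixed pre-agreed alphabet is an assumption made here to sidestep the question of how to transmit never-before-seen symbols.

    import heapq
    from collections import Counter

    def codes_from_counts(counts):
        # Static Huffman construction over a frequency table (same idea
        # as the sketch in the introduction, but starting from counts).
        heap = [(w, i, sym) for i, (sym, w) in enumerate(counts.items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            w1, _, left = heapq.heappop(heap)
            w2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (w1 + w2, tiebreak, (left, right)))
            tiebreak += 1
        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:
                codes[tree] = prefix or "0"
        walk(heap[0][2], "")
        return codes

    def adaptive_encode(text, alphabet):
        # Assume the alphabet is agreed in advance, so every symbol
        # always has a code; start all counts at 1 and let them adapt.
        # A decoder can mirror this exactly: decode with the current
        # code book, then apply the same count update.
        counts = Counter({s: 1 for s in alphabet})
        out = []
        for sym in text:
            out.append(codes_from_counts(counts)[sym])  # code from past stats
            counts[sym] += 1                            # then update
        return "".join(out)

    print(adaptive_encode("abracadabra", "abcdr"))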

Huffman template algorithm


Most often, the weights used in implementations of Huffman coding represent numeric probabilities, but the algorithm given above does not require this; it requires only that the weights form a totally ordered commutative monoid, meaning a way to order weights and to add them. The Huffman template algorithm enables one to use any kind of weights (costs, frequencies, pairs of weights, non-numerical weights) and one of many combining methods (not just addition).

Such algorithms can solve other minimization problems, such as minimizing max_i [w_i + length(c_i)], a problem first applied to circuit design.
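Here is a sketch of the template with the combining rule passed in as a function. The first call is ordinary Huffman coding (weights combined by addition); the second uses a max-plus-one combiner, which is the rule associated with the minimize-the-maximum circuit-design problem just mentioned (attributed to Golumbic's combinatorial merging; stated here as an assumption rather than verified from the original paper).

    import heapq

    def huffman_template(weights, combine):
        # Generic Huffman greedy: `weights` may be any totally ordered
        # values and `combine` is the associated merge operation. The
        # returned list gives each input's depth, i.e. codeword length.
        heap = [(w, i, [i]) for i, w in enumerate(weights)]
        heapq.heapify(heap)
        depths = [0] * len(weights)
        tiebreak = len(weights)
        while len(heap) > 1:
            w1, _, leaves1 = heapq.heappop(heap)
            w2, _, leaves2 = heapq.heappop(heap)
            for leaf in leaves1 + leaves2:   # everything merged sinks one level
                depths[leaf] += 1
            heapq.heappush(heap, (combine(w1, w2), tiebreak, leaves1 + leaves2))
            tiebreak += 1
        return depths

    # Ordinary Huffman coding: weights are frequencies, combined by addition.
    print(huffman_template([5, 2, 2, 1, 1], lambda a, b: a + b))
    # Minimizing max(w_i + depth_i): combine by max(a, b) + 1 instead.
    print(huffman_template([5, 2, 2, 1, 1], lambda a, b: max(a, b) + 1))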

Length-limited Huffman coding


Length-limited Huffman coding is a variant where the goal is still to achieve a minimum weighted path length, but there is an additional restriction that the length of each codeword must be less than a given constant. The package-merge algorithm solves this problem with a simple greedy approach very similar to that used by Huffman's algorithm. Its time complexity is O(nL), where L is the maximum length of a codeword. No algorithm is known to solve this problem in linear or linearithmic time, unlike the pre-sorted and unsorted conventional Huffman problems, respectively.
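A compact sketch of package-merge in its usual "coin" formulation follows (the function name is illustrative, and it assumes 2^L >= number of symbols so a feasible code exists): every symbol contributes one coin per level, coins are repeatedly packaged in adjacent pairs and re-merged with the originals, and each symbol's codeword length is the number of its coins among the cheapest 2(n-1).

    def length_limited_lengths(weights, L):
        # Package-merge: return codeword lengths, each <= L, minimizing
        # the weighted sum. Feasibility requires 2**L >= len(weights).
        n = len(weights)
        assert 2 ** L >= n, "L too small for this many symbols"
        # A "coin" is (weight, tuple of symbol indices it covers).
        items = sorted((w, (i,)) for i, w in enumerate(weights))
        coins = list(items)
        for _ in range(L - 1):
            # Package adjacent pairs (an unpaired largest coin is dropped),
            # then merge the packages with a fresh copy of the items.
            packages = [(a[0] + b[0], a[1] + b[1])
                        for a, b in zip(coins[0::2], coins[1::2])]
            coins = sorted(items + packages)
        lengths = [0] * n
        for _, syms in coins[:2 * (n - 1)]:   # pick the cheapest 2(n-1) coins
            for i in syms:
                lengths[i] += 1
        return lengths

    print(length_limited_lengths([1, 1, 2, 4], L=2))  # [2, 2, 2, 2]
    print(length_limited_lengths([1, 1, 2, 4], L=3))  # [3, 3, 2, 1], plain Huffman

With L = 3 the limit is not binding and the result matches unconstrained Huffman coding; with L = 2 the algorithm is forced to a fixed-length code.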

Huffman coding with unequal letter costs


In the standard Huffman coding problem, it is assumed that each symbol in the set from which the code words are constructed has an equal cost to transmit: a code word whose length is N digits will always have a cost of N, no matter how many of those digits are 0s, how many are 1s, etc. When working under this assumption, minimizing the total cost of the message and minimizing the total number of digits are the same thing.

Huffman coding with unequal letter costs is the generalization in which this assumption no longer holds: the letters of the encoding alphabet may have non-uniform lengths, due to non-uniform characteristics of the transmission medium. An example is the encoding alphabet of Morse code, where a 'dash' takes longer to send than a 'dot', and therefore the cost of a dash in transmission time is higher. The goal is still to minimize the weighted average codeword length, but it is no longer sufficient just to minimize the number of symbols used by the message. No algorithm is known to solve this problem in the same manner or with the same efficiency as conventional Huffman coding.

Optimal alphabetic binary trees (Hu-Tucker coding)


In the standard Huffman coding problem, it is assumed that any codeword can correspond to any input symbol. In the alphabetic version, the alphabetic order of inputs and outputs must be identical. Thus, for example, the alphabet {a, b, c} could not be assigned the code {00, 1, 01}, but should instead be assigned either {00, 01, 1} or {0, 10, 11}. This is also known as the Hu-Tucker problem, after the authors of the paper presenting the first linearithmic solution to this optimal binary alphabetic problem, which has some similarities to the Huffman algorithm but is not a variation of it. These optimal alphabetic binary trees are often used as binary search trees.

Canonical Huffman code


If weights corresponding to the alphabetically ordered inputs are in numerical order, the Huffman code has the same lengths as the optimal alphabetic code, which can be found by calculating these lengths, rendering Hu-Tucker coding unnecessary. The code resulting from numerically (re-)ordered input is sometimes called the canonical Huffman code and is often the code used in practice, due to its ease of encoding/decoding. The technique for finding this code is sometimes called Huffman-Shannon-Fano coding, since it is optimal like Huffman coding, but alphabetic in weight probability, like Shannon-Fano coding. A Huffman-Shannon-Fano code has the same codeword lengths as the ordinary Huffman solution, so it is also optimal.
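The appeal of the canonical form is that the whole code book is recoverable from the codeword lengths alone: sort symbols by length, hand out consecutive binary numbers, and left-shift the running value whenever the length increases. A sketch (illustrative function name, symbols represented as indices):

    def canonical_codes(lengths):
        # Assign canonical codewords from per-symbol codeword lengths
        # (produced by any Huffman construction). Sorting by
        # (length, symbol) keeps equal-length codes in symbol order.
        order = sorted(range(len(lengths)), key=lambda i: (lengths[i], i))
        codes, code, prev_len = {}, 0, 0
        for sym in order:
            code <<= lengths[sym] - prev_len   # lengthen the running code
            codes[sym] = format(code, "0%db" % lengths[sym])
            code += 1
            prev_len = lengths[sym]
        return codes

    print(canonical_codes([2, 2, 2, 3, 3]))
    # {0: '00', 1: '01', 2: '10', 3: '110', 4: '111'}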

PROPERTIES
1. Unique prefix property: no code is a prefix of any other code (all symbols are at the leaf nodes), which makes decoding unambiguous.
2. If prior statistics are available and accurate, then Huffman coding is very good.
3. The frequencies used can be generic ones for the application domain, based on average experience, or they can be the actual frequencies found in the text being compressed.
4. Huffman coding is optimal when the probability of each input symbol is a negative power of two (see the worked check after this list).
5. The worst case for Huffman coding can happen when the probability of a symbol exceeds 2^-1 = 0.5, making the upper limit of inefficiency unbounded. These situations often respond well to a form of blocking called run-length encoding.
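A quick worked check of property 4: when every probability is a negative power of two, the Huffman codeword lengths are exactly -log2(p), so the average codeword length equals the source entropy and no lossless symbol code can do better.

    import math

    probs = [0.5, 0.25, 0.125, 0.125]   # each a negative power of two
    lengths = [1, 2, 3, 3]              # Huffman lengths for these probabilities
    average = sum(p * l for p, l in zip(probs, lengths))
    entropy = -sum(p * math.log2(p) for p in probs)
    print(average, entropy)             # both print 1.75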

ADVANTAGES
1. The algorithm is easy to implement.
2. It produces a lossless compression of images.

DISADVANTAGES
1. Efficiency depends on the accuracy of the statistical model used and on the type of image.
2. The algorithm varies with different formats, but few get better than 8:1 compression.
3. Compression of image files that contain long runs of identical pixels is not as efficient with Huffman coding as with RLE.
4. The Huffman encoding process is usually done in two passes. During the first pass a statistical model is built, and in the second pass the image data is encoded based on the generated model. Huffman encoding is therefore a relatively slow process, as time is required to build the statistical model in order to achieve an efficient compression rate.
5. All codes in the encoded data are of different sizes (not of fixed length). It is therefore difficult for the decoder to know when it has reached the last bit of a code; the only way to know is to follow the paths of the upside-down tree until reaching the end of a branch. Thus, if the encoded data is corrupted, with additional bits added or bits missing, whatever is decoded will be wrong values, and the final image displayed will be garbage (the decoding sketch after this list illustrates both points).
6. The Huffman table must be sent at the beginning of the compressed file; otherwise the decompressor will not be able to decode it. This causes overhead.
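To illustrate point 5 above, here is a toy decoder over a hypothetical three-symbol code book: the prefix property means the first complete match is always the right symbol, but there is no resynchronization, so a single flipped bit garbles everything that follows.

    def decode(bits, codebook):
        # Walk the bit string, emitting a symbol whenever the bits read
        # so far form a complete codeword.
        inverse = {code: sym for sym, code in codebook.items()}
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in inverse:
                out.append(inverse[buf])
                buf = ""
        return "".join(out)

    book = {"a": "0", "b": "10", "c": "11"}   # a toy code book
    print(decode("01011", book))   # "abc"
    print(decode("11011", book))   # "cac": one flipped bit, wrong output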

APPLICATIONS
1. Arithmetic coding can be viewed as a generalization of Huffman coding; indeed, in practice arithmetic coding is often preceded by Huffman coding, as it is easier to find an arithmetic code for a binary input than for a non-binary input.
2. Huffman coding is in wide use because of its simplicity, high speed, and lack of encumbrance by patents.
3. Huffman coding today is often used as a "back-end" to some other compression method. DEFLATE (PKZIP's algorithm) and multimedia codecs such as JPEG and MP3 have a front-end model and quantization followed by Huffman coding.

